deployment of v4.3.3 on corefacility #159
Running Blazegraph on host, via …

Looks like #159 (comment) works for storing the database in …
Implemented moby/moby#1143 (comment) in our compose file:

```yaml
version: '2'
services:
  <container_name>:
    image: <image_name>
    networks:
      - dockernet
networks:
  dockernet:
    driver: bridge
    ipam:
      config:
        - subnet: 192.168.0.0/24
          gateway: 192.168.0.1
```

except we added the following to all our services:

```yaml
networks:
  - dockernet
```

which still lets us use the namespaces defined by Docker (such as …). Then:

```shell
sudo firewall-cmd --zone=public --add-port=9999/tcp --permanent
sudo firewall-cmd --reload
```

which also dangerously exposes port 9999 to the outside world, but for now we can run … to connect to the Blazegraph instance running on our host.
After #159 (comment), we can now start reactapp outside of corefacility and upload files / get results. Note that files are still being stored inside the VM instead of on …
Need to address superphy/grouch#14 now so we can run reactapp out of corefacility.

We're up! 9766690 uses a prefix for reactapp URIs + a …
Subtyping and Database work fine, but there is an issue with the Fisher's task not returning any results.

This is weird, as the Database task can retrieve serotype info without problems, which implies the ECTyper jobs finished correctly, whereas you can't compare VF results in the Fisher's task.

There are only 15 genomes in the db at the moment; perhaps this is just a case of no shared VFs between 2 H types (since there may be only 1 genome per H type)? Will upload a larger set of reference genomes to test.
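For reference, the comparison the Fisher's task performs reduces to a 2×2 Fisher's exact test per virulence factor: presence/absence counts in each of two groups (e.g. two H types). With 1 genome per group the table is degenerate and nothing significant can come out. A minimal stdlib-only sketch of the statistic involved (the counts below are made up for illustration; the real task builds its tables from Blazegraph query results and would normally use a stats library rather than this hand-rolled version):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins whose
    hypergeometric probability is <= that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def p_table(x):  # P(top-left cell == x) under the hypergeometric null
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo = max(0, col1 - row2)  # smallest feasible top-left cell
    hi = min(col1, row1)      # largest feasible top-left cell
    return sum(p_table(x) for x in range(lo, hi + 1) if p_table(x) <= p_obs + 1e-12)

# Hypothetical VF present/absent counts in genomes of two H types:
#            VF present   VF absent
#   H7            8            2
#   H11           1            9
p = fisher_exact_two_sided(8, 2, 1, 9)
print(round(p, 4))  # 0.0055
```

With only one genome per H type, the analogous table would be built from rows like (1, 0) vs (0, 1), which can never reach significance, consistent with the empty-results hypothesis above.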
Looks like we're hitting the timeouts again (superphy/grouch#43). Perhaps this is because the VM has lower specs? May need to bump the timeouts even more.

Retested #159 (comment) locally and we don't have this problem. It's almost like the corefacility deployment of Blazegraph is losing transactions.
Figured out the error. I didn't merge branch … in. New command: …

Still need to address #159 (comment)

Looks to be associated with the …

An addition to the fix was added: #168
Still didn't work, same error:

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Bad Gateway</title>
</head><body>
<h1>Bad Gateway</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
</p>
</body></html>
```
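For context, that stock 502 page means the proxy in front of the app got no valid response from its upstream (the uwsgi/Flask container), either because the upstream died or because it exceeded a proxy timeout during the long upload. One set of knobs worth checking is the proxy timeouts; an illustrative nginx fragment (location path and values are assumptions, not corefacility's actual config):

```nginx
location /api/ {
    proxy_pass http://127.0.0.1:8090;
    # Give long-running uploads time to complete before nginx gives up
    # and returns 502/504 to the client.
    proxy_connect_timeout 60s;
    proxy_send_timeout    600s;
    proxy_read_timeout    600s;
}
```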
So far tested: …

Tested on Cybera and still getting problems. This comment, pallets/flask#2086 (comment), looks super relevant and may be a possible fix.
Something different...

```
webserver_1 | 2017/07/17 16:03:41 [error] 10#10: *1 client intended to send too large body: 197175003 bytes, client: 192.168.0.1, server: , request: "POST /api/v0/upload HTTP/1.1", host: "localhost:8090", referrer: "https://lfz.corefacility.ca/superphy/grouch/subtyping"
webserver_1 | 192.168.0.1 - - [17/Jul/2017:16:03:41 +0000] "POST /api/v0/upload HTTP/1.1" 413 601 "https://lfz.corefacility.ca/superphy/grouch/subtyping" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36" "192.197.71.189"
```

This occurs only after …
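That 413 with "client intended to send too large body" is nginx rejecting the ~197 MB request before it ever reaches the app, which would also explain why nothing shows up in the application logs. The usual fix is raising the body-size limit; an illustrative fragment (the value and where it goes — http, server, or location block — are assumptions about this deployment's config):

```nginx
http {
    # nginx's default client_max_body_size is 1m; multi-GB genome
    # uploads need a much larger cap (0 disables the check entirely).
    client_max_body_size 4g;
}
```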
#159 (comment) is fixed in 33970dc

I'm wondering if this is possibly above the Docker level, because on successful POSTs we get the GET request in the Docker logs, but when it fails there is absolutely nothing.
🍰 Partial success! We just need to up the space available / offload the temporary storage of the files somewhere on the VM and we'll be set. EDIT: another webserver we're unaware of would also explain why we get generic 502 errors instead of 500 errors.
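Offloading the temporary storage could be done with a bind mount in the compose file, so large uploads land on the VM's disk instead of the container's writable layer. A sketch (the service name and both paths are hypothetical; the real service and upload directory may differ):

```yaml
services:
  webserver:
    volumes:
      # Keep large uploaded/unzipped files on the VM's disk rather
      # than filling the container filesystem.
      - /data/spfy-uploads:/tmp/uploads
```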
The space-when-unzipping issue (#159 (comment)) is confirmed in https://sentry.io/share/issue/3133353338392e333031353732303031/

Heads up! May have to adjust the speed of Blazegraph queries, as …
A new HDD was attached to the VM. Using https://forums.docker.com/t/how-do-i-change-the-docker-image-installation-directory/1169 to host the Docker data there via a symlink.
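Besides the symlink approach from that forum thread, the Docker daemon's data directory can be relocated explicitly via /etc/docker/daemon.json; a sketch, assuming the new HDD is mounted under /Warehouse (the path is an assumption, and on daemons older than 17.05 the key is `graph` rather than `data-root`):

```json
{
    "data-root": "/Warehouse/docker"
}
```

After editing the file, stop the daemon, copy the old /var/lib/docker contents to the new location, and restart; otherwise existing images and volumes won't be found.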
What a successful upload should look like in the logs:

```
webserver_1 | upload(): received req. at 2017-07-19-16-24
webserver_1 | [<FileStorage: u'GCA_001912665.1_ASM191266v1_genomic.fna' ('application/octet-stream')>]
webserver_1 | upload(): about to enqueue files
webserver_1 | upload(): all files enqueued, returning...
webserver_1 | handle_groupresults(): started
webserver_1 | handle_groupresults(): finished
webserver_1 | [pid: 21|app: 0|req: 90/162] 192.168.0.1 () {56 vars in 1113 bytes} [Wed Jul 19 16:24:42 2017] POST /api/v0/upload => generated 162 bytes in 2946 msecs (HTTP/1.1 200) 4 headers in 144 bytes (20 switches on core 0)
webserver_1 | 192.168.0.1 - - [19/Jul/2017:16:24:45 +0000] "POST /api/v0/upload HTTP/1.1" 200 162 "https://lfz.corefacility.ca/superphy/grouch/subtyping" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36" "10.0.10.83"
webserver_1 | [pid: 20|app: 0|req: 73/163] 192.168.0.1 () {52 vars in 962 bytes} [Wed Jul 19 16:24:45 2017] GET /api/v0/results/blob2907573319237084527 => generated 10 bytes in 6 msecs (HTTP/1.1 200) 3 headers in 103 bytes (2 switches on core 0)
webserver_1 | 192.168.0.1 - - [19/Jul/2017:16:24:45 +0000] "GET /api/v0/results/blob2907573319237084527 HTTP/1.1" 200 10 "https://lfz.corefacility.ca/superphy/grouch/subtyping" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36" "10.0.10.83"
```

What we got with large files (note: looks like files are still being enqueued, just no blob ID being returned):

```
webserver_1 | upload(): received req. at 2017-07-19-16-59
webserver_1 | [<FileStorage: u'656_files-3.6_GB-ecoli-genomes.zip' ('application/zip')>]
webserver_1 | upload(): about to enqueue files
webserver_1 | 192.168.0.1 - - [19/Jul/2017:17:00:01 +0000] "POST /api/v0/upload HTTP/1.1" 499 0 "https://lfz.corefacility.ca/superphy/grouch/subtyping" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36" "192.197.71.189"
webserver_1 | upload(): all files enqueued, returning...
webserver_1 | handle_groupresults(): started
webserver_1 | handle_groupresults(): finished
webserver_1 | Wed Jul 19 17:00:32 2017 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /api/v0/upload (ip 192.168.0.1) !!!
webserver_1 | Wed Jul 19 17:00:32 2017 - uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 296] during POST /api/v0/upload (192.168.0.1)
webserver_1 | IOError: write error
webserver_1 | [pid: 20|app: 0|req: 76/168] 192.168.0.1 () {52 vars in 1040 bytes} [Wed Jul 19 16:50:02 2017] POST /api/v0/upload => generated 0 bytes in 631156 msecs (HTTP/1.1 200) 4 headers in 0 bytes (3782 switches on core 0)
```
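Reading that failing trace: the client logged a 499 (client closed the connection) almost immediately, while uwsgi kept processing for ~631 seconds (631156 msecs) and only hit the SIGPIPE when it finally tried to write the response. If the bottleneck is uwsgi's handling of large bodies and long enqueue operations, the relevant uwsgi options would be along these lines (an illustrative ini fragment; the values are guesses, not the project's actual uwsgi config):

```ini
[uwsgi]
; Buffer large request bodies to disk before handing them to the app,
; so the worker isn't tied up reading a slow multi-GB upload.
post-buffering = 65536
; Only kill workers after 10 minutes, to survive long enqueue calls.
harakiri = 600
; Allow larger request headers than the 4 KB default.
buffer-size = 65535
```

Note this only stops uwsgi from giving up; the proxies in front still need matching timeouts or the client will see a 502/504 first.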
From #159 (comment), it looks like uwsgi is still active and the disconnect is happening in either nginx or reactapp.
Only immediately after a 502 error shows up on reactapp do we get this in the logs:

```
webserver_1 | upload(): received req. at 2017-07-19-21-05
webserver_1 | [<FileStorage: u'656_files-3.6_GB-ecoli-genomes.zip' ('application/zip')>]
webserver_1 | upload(): about to enqueue files
webserver_1 | 192.168.0.1 - - [19/Jul/2017:21:05:49 +0000] "POST /api/v0/upload HTTP/1.1" 499 0 "https://lfz.corefacility.ca/superphy/grouch/subtyping" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36"
```

#159 (comment) looks like an error with uwsgi in Docker. The …
```
[claing@superphy docker]$ sudo tail /var/log/nginx/error.log
2017/07/19 11:25:10 [warn] 10922#0: *323 an upstream response is buffered to a temporary file /var/lib/nginx/tmp/proxy/9/00/0000000009 while reading upstream, client: 10.0.0.10, server: lfz.corefacility.ca, request: "GET //grouch/static/js/main.bfae872c.js.map HTTP/1.1", upstream: "http://[::1]:8091/static/js/main.bfae872c.js.map", host: "lfz.corefacility.ca"
2017/07/19 11:25:19 [warn] 10922#0: *327 a client request body is buffered to a temporary file /var/lib/nginx/tmp/client_body/0000000010, client: 10.0.0.10, server: lfz.corefacility.ca, request: "POST //spfy/api/v0/upload HTTP/1.1", host: "lfz.corefacility.ca", referrer: "https://lfz.corefacility.ca/superphy/grouch/subtyping"
2017/07/19 13:28:58 [warn] 13540#0: *15 a client request body is buffered to a temporary file /var/lib/nginx/tmp/client_body/0000000001, client: 10.0.0.10, server: lfz.corefacility.ca, request: "POST //spfy/api/v0/upload HTTP/1.1", host: "lfz.corefacility.ca", referrer: "https://lfz.corefacility.ca/superphy/grouch/subtyping"
2017/07/19 14:04:37 [error] 13540#0: *15 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.0.0.10, server: lfz.corefacility.ca, request: "POST //spfy/api/v0/upload HTTP/1.1", upstream: "http://127.0.0.1:8090/api/v0/upload", host: "lfz.corefacility.ca", referrer: "https://lfz.corefacility.ca/superphy/grouch/subtyping"
2017/07/19 14:04:37 [warn] 13540#0: *15 upstream server temporarily disabled while reading response header from upstream, client: 10.0.0.10, server: lfz.corefacility.ca, request: "POST //spfy/api/v0/upload HTTP/1.1", upstream: "http://127.0.0.1:8090/api/v0/upload", host: "lfz.corefacility.ca", referrer: "https://lfz.corefacility.ca/superphy/grouch/subtyping"
```
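These entries come from the host's own nginx (/var/log/nginx on the VM, not a container), i.e. a second webserver sitting in front of the dockerized stack, and the "recv() failed (104: Connection reset by peer) while reading response header from upstream" line is precisely the failure nginx reports to the client as a 502. Directives worth checking on that host nginx, sketched for illustration (the location path is inferred from the //spfy/... requests above; all values are assumptions):

```nginx
location /spfy/ {
    proxy_pass http://127.0.0.1:8090/;
    # Accept multi-GB genome uploads at this hop too.
    client_max_body_size 4g;
    # Stream large request bodies straight to the upstream instead of
    # buffering them under /var/lib/nginx/tmp (needs nginx >= 1.7.11).
    proxy_request_buffering off;
    # Wait out long enqueue operations before declaring the upstream dead.
    proxy_read_timeout 600s;
}
```

If the host nginx has tighter limits than the containerized one, requests can fail here without anything ever appearing in the Docker logs, matching the observation above.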
```
[claing@superphy 2.1.4-inferencing]$ ls -lah
total 7.6G
drwxrwxr-x.  2 claing collaborators  152 Jul 12 16:14 .
drwxrwxr-x. 11 claing nobody         329 Jul 12 16:02 ..
-rwxrwxr-x.  1 claing collaborators 6.2G Jul 20 12:16 bigdata.jnl
-rwxrwxr-x.  1 claing nobody         54M May 23 22:08 blazegraph.jar
-rwxrwxr-x.  1 claing nobody         622 Jul 12 16:02 Dockerfile
-rwxrwxr-x.  1 claing nobody        701M Jul 20 12:16 rules.log
-rwxrwxr-x.  1 claing nobody        2.7K Jul 12 16:02 RWStore.properties
[claing@superphy 2.1.4-inferencing]$ pwd
/Warehouse/Users/claing/superphy/spfy/docker-blazegraph/2.1.4-inferencing
```
Looks like database status queries are working fine, but POSTs are failing erratically. This is with …
I'm closing this issue; there are a number of major changes that will probably be required to bring this into viable production usage, including a number of breaking changes. Everything will be tracked under https://github.com/superphy/backend/milestone/1 |