
deployment of v4.3.3 on corefacility #159

Closed
kevinkle opened this issue Jul 11, 2017 · 36 comments
@kevinkle

No description provided.

@kevinkle kevinkle changed the title deployment on corefacility deployment of v4.3.3 on corefacility Jul 11, 2017
@kevinkle

Running Blazegraph on the host, via screen, inside /Warehouse/Users/claing/superphy/spfy/docker-blazegraph/2.1.4; the bigdata.jnl will be stored there.

java -server -Xmx4g -Dbigdata.propertyFile=/Warehouse/Users/claing/superphy/spfy/docker-blazegraph/2.1.4/RWStore.properties -jar blazegraph.jar
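For reference, one way to keep that command running detached (the session name is illustrative):

```shell
# start a detached screen session named "blazegraph" running the server
screen -dmS blazegraph java -server -Xmx4g \
  -Dbigdata.propertyFile=/Warehouse/Users/claing/superphy/spfy/docker-blazegraph/2.1.4/RWStore.properties \
  -jar blazegraph.jar
# reattach later with: screen -r blazegraph
```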

@kevinkle

Looks like #159 (comment) works for storing the database in /Warehouse. But we now need to network the Docker composition to the Blazegraph instance running on the host (moby/moby#1143 (comment)); it doesn't look like there is an easy, Docker-approved method.

@kevinkle commented Jul 11, 2017

Implemented moby/moby#1143 (comment) in our docker-compose.yml file:
From the comment:

version: '2'
services:
  <container_name>:
    image: <image_name>
    networks:
      - dockernet

networks:
  dockernet:
    driver: bridge
    ipam:
      config:
        - subnet: 192.168.0.0/24
          gateway: 192.168.0.1

except that we added, to all of our services:

networks:
      - dockernet

which still lets us use the service names defined by Docker (such as redis) to link containers. Then we add a firewall rule on the CentOS host:

sudo firewall-cmd --zone=public --add-port=9999/tcp --permanent
sudo firewall-cmd --reload

This dangerously exposes port 9999 to the outside world, but for now it means we can run:

curl http://192.168.0.1:9999/blazegraph/

to connect to the blazegraph instance running on our host.
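Putting the pieces together, a minimal sketch of the resulting docker-compose.yml (the service and image names here are illustrative, not our actual composition):

```yaml
version: '2'
services:
  webserver:
    image: superphy/spfy-webserver   # illustrative image name
    networks:
      - dockernet
  redis:
    image: redis
    networks:
      - dockernet

networks:
  dockernet:
    driver: bridge
    ipam:
      config:
        - subnet: 192.168.0.0/24
          gateway: 192.168.0.1
```

Containers on dockernet reach the host's Blazegraph via the gateway IP, i.e. http://192.168.0.1:9999/blazegraph/.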

@kevinkle

After #159 (comment), we can now start reactapp outside of corefacility and upload files / get results. Note that files are still being stored inside the VM instead of on /Warehouse, but this will be addressed in #148 and doesn't require any modification on corefacility.

@kevinkle

Need to address superphy/grouch#14 now so we can run reactapp out of corefacility.

@kevinkle

screen -r 15365.pts-1.superphy

@kevinkle commented Jul 11, 2017

We're up! 9766690 uses a prefix for reactapp URIs plus a homepage spec. Weirdly, the commit is listed under Chad (probably because I pushed it from the VM, but whatever).
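For reference, in a create-react-app project the URI prefix is typically picked up from a `homepage` field in package.json; a sketch of what that spec might look like (the path is an assumption based on the deployed URL, not pulled from the actual commit):

```json
{
  "name": "grouch",
  "homepage": "https://lfz.corefacility.ca/superphy/grouch"
}
```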

@kevinkle

Subtyping and Database work fine, but there is an issue with the Fisher's task not returning any results.

@kevinkle kevinkle reopened this Jul 12, 2017
@kevinkle

This is weird, as the Database task can retrieve serotype info without problems, which implies the ECTyper jobs finished correctly, whereas you can't compare VF results in the Fisher's task.

@kevinkle

There are only 15 genomes in the db at the moment; perhaps this is just a case of no shared VFs between two H types (since there may be only one genome per H type)? Will upload a larger set of reference genomes to test.
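For context, the Fisher's task boils down to a Fisher's exact test on a 2x2 contingency table (VF present/absent x H-type group 1/group 2); with one genome per H type the table degenerates and nothing comes back. A minimal pure-Python sketch of the test itself (for illustration only, not Spfy's actual implementation):

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table.

    table = [[a, b], [c, d]], e.g. rows = VF present/absent,
    columns = genomes in H-type group 1 / group 2.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1 = a + b          # total genomes with the VF present
    col1 = a + c          # total genomes in group 1

    def pmf(x):
        # hypergeometric probability of seeing x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = pmf(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # two-sided: sum probabilities of all tables at least as extreme as observed
    return sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))
```

With e.g. `[[8, 2], [1, 5]]` this agrees with the standard two-sided p-value of about 0.035.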

@kevinkle

Looks like we're hitting the timeouts again superphy/grouch#43

Perhaps this is because the VM has lower specs? May need to bump timeouts even more.

@kevinkle

Retested #159 (comment) locally and we don't have this problem. It's almost as if the corefacility deployment of Blazegraph is losing transactions.

@kevinkle commented Jul 12, 2017

Figured out the error: I hadn't merged the inferencing branch of our docker-blazegraph repo into master, and was running Blazegraph on corefacility without inferencing.

In /Warehouse/Users/claing/superphy/spfy/docker-blazegraph/2.1.4-inferencing

New command: java -server -Xmx4g -Dbigdata.propertyFile=/Warehouse/Users/claing/superphy/spfy/docker-blazegraph/2.1.4-inferencing/RWStore.properties -jar blazegraph.jar

@kevinkle

Still need to address #159 (comment)

@kevinkle

Looks to be associated with the harakiri option: https://stackoverflow.com/questions/24127601/uwsgi-request-timeout-in-python
Added in #165, testing now...

@kevinkle

*** WARNING: you have enabled harakiri without post buffering. Slow upload could be rejected on post-unbuffered webservers ***
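As the warning says, harakiri should be paired with post buffering so a slow upload isn't rejected mid-POST. A sketch of the relevant uwsgi.ini options (the values are illustrative, not necessarily what #165 sets):

```ini
[uwsgi]
; kill any worker stuck on a single request for more than 10 minutes
harakiri = 600
; buffer request bodies larger than 8 KB before handing them to the app,
; so slow uploads don't hold a worker open unbuffered
post-buffering = 8192
```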

@kevinkle

An addition to the fix added: #168

@kevinkle

Still didn't work, same error:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Bad Gateway</title>
</head><body>
<h1>Bad Gateway</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
</p>
</body></html>

@kevinkle commented Jul 14, 2017

So far tested:
- ~40 genomes, unzipped: works
- 50+ genomes (e.g. 56 genomes, 290.9 MB), unzipped: 502 error
- 48 genomes (249.6 MB), zipped to 75.2 MB: works
- 124 genomes (652.5 MB), zipped to 197.2 MB: works
- 656 genomes (3.6 GB), zipped to 1.1 GB: 502 error

@kevinkle

Tested on Cybera and still getting problems. pallets/flask#2086 (comment) looks highly relevant and may be a possible fix.

@kevinkle commented Jul 17, 2017

Something different...

webserver_1              | 2017/07/17 16:03:41 [error] 10#10: *1 client intended to send too large body: 197175003 bytes, client: 192.168.0.1, server: , request: "POST /api/v0/upload HTTP/1.1", host: "localhost:8090", referrer: "https://lfz.corefacility.ca/superphy/grouch/subtyping"
webserver_1              | 192.168.0.1 - - [17/Jul/2017:16:03:41 +0000] "POST /api/v0/upload HTTP/1.1" 413 601 "https://lfz.corefacility.ca/superphy/grouch/subtyping" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36" "192.197.71.189"

This occurs only after Upload... hits 100%.

@kevinkle

#159 (comment) is fixed in 33970dc

@kevinkle

I'm wondering if this is possibly above the Docker level, because on successful POSTs we get the GET request in the Docker logs, but when it fails there is absolutely nothing.

@kevinkle commented Jul 17, 2017

🍰 Partial success!
Looks like the error was with the nginx.conf running above Docker on the VM. A max file size specified there was capping uploads at 200m; changing it to 60g (60 GB) now sends the files into Docker. Slightly weird, though, because the access.log on that nginx says nothing about the POST request until axios on the front end reports 100% upload; perhaps there is another webserver running at corefacility which is also handling buffering?

We just need to increase the space available / offload the temporary storage of the files somewhere on the VM and we'll be set.

EDIT: another webserver we're unaware of would also explain why we get generic 502 errors instead of 500 errors.
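For reference, the nginx directive involved is `client_max_body_size` (default 1m), which returns 413 when exceeded, matching the 413 seen in the earlier logs. The fix on the VM's nginx.conf was along these lines (surrounding context illustrative):

```nginx
http {
    # was 200m, which rejected large genome uploads with 413
    client_max_body_size 60g;
}
```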

@kevinkle

The space when unzipping issue (#159 (comment)) is confirmed in https://sentry.io/share/issue/3133353338392e333031353732303031/

@kevinkle

Heads up! May have to adjust speed of Blazegraph queries as /Warehouse seems slow https://sentry.io/share/issue/3133353338392e333132353036333439/

@kevinkle commented Jul 18, 2017

A new HDD was attached to the VM. Using https://forums.docker.com/t/how-do-i-change-the-docker-image-installation-directory/1169 to host Docker's storage there via a symlink.
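The symlink approach from that thread looks roughly like this (the mount point is illustrative; assumes the new HDD is mounted at /mnt/newdisk):

```shell
sudo systemctl stop docker
sudo mv /var/lib/docker /mnt/newdisk/docker
sudo ln -s /mnt/newdisk/docker /var/lib/docker
sudo systemctl start docker
```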

@kevinkle commented Jul 19, 2017

What a successful upload should look like in the logs:

webserver_1              | upload(): received req. at 2017-07-19-16-24
webserver_1              | [<FileStorage: u'GCA_001912665.1_ASM191266v1_genomic.fna' ('application/octet-stream')>]
webserver_1              | upload(): about to enqueue files
webserver_1              | upload(): all files enqueued, returning...
webserver_1              | handle_groupresults(): started
webserver_1              | handle_groupresults(): finished
webserver_1              | [pid: 21|app: 0|req: 90/162] 192.168.0.1 () {56 vars in 1113 bytes} [Wed Jul 19 16:24:42 2017] POST /api/v0/upload => generated 162 bytes in 2946 msecs (HTTP/1.1 200) 4 headers in 144 bytes (20 switches on core 0)
webserver_1              | 192.168.0.1 - - [19/Jul/2017:16:24:45 +0000] "POST /api/v0/upload HTTP/1.1" 200 162 "https://lfz.corefacility.ca/superphy/grouch/subtyping" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36" "10.0.10.83"
webserver_1              | [pid: 20|app: 0|req: 73/163] 192.168.0.1 () {52 vars in 962 bytes} [Wed Jul 19 16:24:45 2017] GET /api/v0/results/blob2907573319237084527 => generated 10 bytes in 6 msecs (HTTP/1.1 200) 3 headers in 103 bytes (2 switches on core 0)
webserver_1              | 192.168.0.1 - - [19/Jul/2017:16:24:45 +0000] "GET /api/v0/results/blob2907573319237084527 HTTP/1.1" 200 10 "https://lfz.corefacility.ca/superphy/grouch/subtyping" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36" "10.0.10.83"

What we got with large files (note: it looks like files are still being enqueued, just no blob id is returned):

webserver_1              | upload(): received req. at 2017-07-19-16-59
webserver_1              | [<FileStorage: u'656_files-3.6_GB-ecoli-genomes.zip' ('application/zip')>]
webserver_1              | upload(): about to enqueue files
webserver_1              | 192.168.0.1 - - [19/Jul/2017:17:00:01 +0000] "POST /api/v0/upload HTTP/1.1" 499 0 "https://lfz.corefacility.ca/superphy/grouch/subtyping" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36" "192.197.71.189"
webserver_1              | upload(): all files enqueued, returning...
webserver_1              | handle_groupresults(): started
webserver_1              | handle_groupresults(): finished
webserver_1              | Wed Jul 19 17:00:32 2017 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /api/v0/upload (ip 192.168.0.1) !!!
webserver_1              | Wed Jul 19 17:00:32 2017 - uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 296] during POST /api/v0/upload (192.168.0.1)
webserver_1              | IOError: write error
webserver_1              | [pid: 20|app: 0|req: 76/168] 192.168.0.1 () {52 vars in 1040 bytes} [Wed Jul 19 16:50:02 2017] POST /api/v0/upload => generated 0 bytes in 631156 msecs (HTTP/1.1 200) 4 headers in 0 bytes (3782 switches on core 0)

@kevinkle

From #159 (comment), it looks like uwsgi is still active and the disconnect is happening in either nginx or reactapp.

@kevinkle

Only immediately after a 502 error shows up on reactapp do we get this in logs:

webserver_1              | upload(): received req. at 2017-07-19-21-05
webserver_1              | [<FileStorage: u'656_files-3.6_GB-ecoli-genomes.zip' ('application/zip')>]
webserver_1              | upload(): about to enqueue files
webserver_1              | 192.168.0.1 - - [19/Jul/2017:21:05:49 +0000] "POST /api/v0/upload HTTP/1.1" 499 0 "https://lfz.corefacility.ca/superphy/grouch/subtyping" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36"

@kevinkle

#159 (comment) looks like an error with uwsgi in Docker; the 192.168.0.1 refers to the nginx running above Docker on the VM.

@kevinkle

[claing@superphy docker]$ sudo tail /var/log/nginx/error.log
2017/07/19 11:25:10 [warn] 10922#0: *323 an upstream response is buffered to a temporary file /var/lib/nginx/tmp/proxy/9/00/0000000009 while reading upstream, client: 10.0.0.10, server: lfz.corefacility.ca, request: "GET //grouch/static/js/main.bfae872c.js.map HTTP/1.1", upstream: "http://[::1]:8091/static/js/main.bfae872c.js.map", host: "lfz.corefacility.ca"
2017/07/19 11:25:19 [warn] 10922#0: *327 a client request body is buffered to a temporary file /var/lib/nginx/tmp/client_body/0000000010, client: 10.0.0.10, server: lfz.corefacility.ca, request: "POST //spfy/api/v0/upload HTTP/1.1", host: "lfz.corefacility.ca", referrer: "https://lfz.corefacility.ca/superphy/grouch/subtyping"
2017/07/19 13:28:58 [warn] 13540#0: *15 a client request body is buffered to a temporary file /var/lib/nginx/tmp/client_body/0000000001, client: 10.0.0.10, server: lfz.corefacility.ca, request: "POST //spfy/api/v0/upload HTTP/1.1", host: "lfz.corefacility.ca", referrer: "https://lfz.corefacility.ca/superphy/grouch/subtyping"
2017/07/19 14:04:37 [error] 13540#0: *15 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.0.0.10, server: lfz.corefacility.ca, request: "POST //spfy/api/v0/upload HTTP/1.1", upstream: "http://127.0.0.1:8090/api/v0/upload", host: "lfz.corefacility.ca", referrer: "https://lfz.corefacility.ca/superphy/grouch/subtyping"
2017/07/19 14:04:37 [warn] 13540#0: *15 upstream server temporarily disabled while reading response header from upstream, client: 10.0.0.10, server: lfz.corefacility.ca, request: "POST //spfy/api/v0/upload HTTP/1.1", upstream: "http://127.0.0.1:8090/api/v0/upload", host: "lfz.corefacility.ca", referrer: "https://lfz.corefacility.ca/superphy/grouch/subtyping"

@kevinkle

[claing@superphy 2.1.4-inferencing]$ ls -lah
total 7.6G
drwxrwxr-x.  2 claing collaborators  152 Jul 12 16:14 .
drwxrwxr-x. 11 claing nobody         329 Jul 12 16:02 ..
-rwxrwxr-x.  1 claing collaborators 6.2G Jul 20 12:16 bigdata.jnl
-rwxrwxr-x.  1 claing nobody         54M May 23 22:08 blazegraph.jar
-rwxrwxr-x.  1 claing nobody         622 Jul 12 16:02 Dockerfile
-rwxrwxr-x.  1 claing nobody        701M Jul 20 12:16 rules.log
-rwxrwxr-x.  1 claing nobody        2.7K Jul 12 16:02 RWStore.properties
[claing@superphy 2.1.4-inferencing]$ pwd
/Warehouse/Users/claing/superphy/spfy/docker-blazegraph/2.1.4-inferencing

@kevinkle

Looks like database status queries are working fine, but POSTs are failing erratically. This is with 637 genome files.

@kevinkle

I'm closing this issue; a number of major changes, including breaking changes, will probably be required to bring this into viable production use. Everything will be tracked under https://github.com/superphy/backend/milestone/1
