Unable to upload large files into Galaxy #999

Closed
flekschas opened this issue Apr 11, 2016 · 26 comments

Comments

@flekschas
Member

flekschas commented Apr 11, 2016

Commit: 8562f47
Dataset: http://stemcellcommons.org/sites/default/files/isa/isa_13293_727513.zip
Input file: http://stemcellcommons.org/sites/default/files/xf_bioassay_files/2S_HL60_ACAGTG_lane8_read2.fastq.gz

Steps to reproduce

  1. Select a reasonably large file (this is currently hard to judge because file sizes are not displayed); the one from the screenshot is 4.3 GB
  2. Select Analyze and run FastQC

Observed behavior

The download took about 10 minutes. Afterwards, the system got stuck exporting the file to Galaxy; I cleared the queue after more than an hour. VBoxHeadless was running at 100% CPU, while on the VM no single process was using more than 3% CPU. On my host, Galaxy was not doing anything either.

Expected behavior

The export to Galaxy shouldn't take that long.

@ngehlenborg
Contributor

@hackdna: Can you try to run this on an AWS instance?

@hackdna
Member

hackdna commented Apr 12, 2016

@flekschas: could you paste Celery and Galaxy logs?

@flekschas
Member Author

Celeryd-w1.log

2016-04-11 16:47:36 INFO     analysis_manager.tasks:82 run_analysis() - Starting analysis 'FastQC 2016-4-11@16:58:50 - fritz - None provided.'
2016-04-11 16:47:36 INFO     analysis_manager.tasks:85 run_analysis() - Starting input file import tasks for analysis 'FastQC 2016-4-11@16:58:50 - fritz - None provided.'
2016-04-11 16:47:41 DEBUG    analysis_manager.tasks:102 run_analysis() - Input file import pending for analysis 'FastQC 2016-4-11@16:58:50 - fritz - None provided.'
... (same output 200 times)
2016-04-11 16:59:07 DEBUG    analysis_manager.tasks:102 run_analysis() - Input file import pending for analysis 'FastQC 2016-4-11@16:58:50 - fritz - None provided.'
2016-04-11 16:59:13 DEBUG    analysis_manager.tasks:112 run_analysis() - Starting analysis execution in Galaxy
2016-04-11 16:59:13 DEBUG    galaxy_connector.galaxy_workflow:537 configure_workflow() - Configuring Galaxy workflow
2016-04-11 16:59:13 DEBUG    galaxy_connector.galaxy_workflow:561 configure_workflow() - Workflow processing: EXPANSION
2016-04-11 16:59:13 DEBUG    galaxy_connector.galaxy_workflow:95 createStepsAnnot() - Creating workflow steps annotation
2016-04-11 16:59:13 DEBUG    galaxy_connector.galaxy_workflow:500 countWorkflowSteps() - Counting workflow steps
2016-04-11 16:59:14 DEBUG    analysis_manager.tasks:221 import_analysis_in_galaxy() - Uploading analysis input files to Galaxy
2016-04-11 16:59:19 DEBUG    analysis_manager.tasks:136 run_analysis() - Analysis 'FastQC 2016-4-11@16:58:50 - fritz - None provided.' pending in Galaxy
2016-04-11 16:59:25 DEBUG    analysis_manager.tasks:136 run_analysis() - Analysis 'FastQC 2016-4-11@16:58:50 - fritz - None provided.' pending in Galaxy
2016-04-11 16:59:31 DEBUG    analysis_manager.tasks:136 run_analysis() - Analysis 'FastQC 2016-4-11@16:58:50 - fritz - None provided.' pending in Galaxy

worker: Warm shutdown (MainProcess)
[2016-04-11 17:59:25,771: ERROR/MainProcess] ...

I killed Celery and cleared the log, which is why we see "Warm shutdown".

@hackdna: Where can I find the Galaxy logs?

@ngehlenborg ngehlenborg modified the milestones: Next, Salem Apr 12, 2016
@hackdna
Member

hackdna commented Apr 12, 2016

I need to see celery-w2.log, since that is where file operations are logged. Galaxy logs are at the top level of your Galaxy installation, or printed directly to the terminal if you are not running Galaxy as a daemon.

@flekschas
Member Author

Celeryd-w2.log

2016-04-11 16:47:36 DEBUG    file_store.tasks:68 import_file() - Importing FileStoreItem with UUID '6f28805e-364a-462f-942b-504589479ea3'
2016-04-11 16:47:36 DEBUG    file_store.tasks:129 import_file() - Downloading from 'http://stemcellcommons.org/sites/default/files/xf_bioassay_files/2S_HL60_ACAGTG_lane8_read1.fastq.gz'
2016-04-11 16:59:11 DEBUG    file_store.tasks:159 import_file() - Finished downloading from 'http://stemcellcommons.org/sites/default/files/xf_bioassay_files/2S_HL60_ACAGTG_lane8_read1.fastq.gz'

worker: Warm shutdown (MainProcess)

 -------------- celery@refinery v3.1.20 (Cipater)
---- **** ----- 
--- * ***  * -- Linux-3.13.0-52-generic-x86_64-with-Ubuntu-14.04-trusty
-- * - **** --- 
- ** ---------- [config]
- ** ---------- .> app:         default:0x7fbe7cedf6d0 (djcelery.loaders.DjangoLoader)
- ** ---------- .> transport:   amqp://guest:**@localhost:5672//
- ** ---------- .> results:     database
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ---- 
--- ***** ----- [queues]
 -------------- .> file_import      exchange=file_import(direct) key=file_import


[2016-04-11 17:59:49,834: WARNING/MainProcess] celery@refinery ready.

I am not running Galaxy as a daemon, hence no logs. But as I said, Galaxy didn't do anything the whole time.

@hackdna
Member

hackdna commented Apr 12, 2016

It looks like the failure occurred in import_analysis_in_galaxy(), most likely during the call to connection.libraries.upload_file_from_local_path(). Since there are no Galaxy logs, one thing to try is to repeat the analysis manually: upload the file into a Galaxy library (preferably using the Bioblend function mentioned above), then import it into a history and run the workflow.
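
A minimal sketch of that manual reproduction, assuming `gi` is a bioblend `GalaxyInstance` from a release that provides `invoke_workflow()`. The helper name `reproduce_import`, the history name, and the workflow input key `"0"` are hypothetical placeholders. Step 1 is the same library upload Refinery performs, so a large file should fail here too if the problem lies in that call.

```python
def reproduce_import(gi, library_id, file_path, workflow_id, input_step="0"):
    """Hypothetical helper: library upload -> history import -> workflow run.

    `gi` is assumed to be a bioblend GalaxyInstance; `input_step` is an
    assumed workflow input key and must match the real workflow.
    """
    # 1. Same call as in import_analysis_in_galaxy(): the file is sent to
    #    Galaxy in a multipart POST, which is where the large upload fails.
    uploaded = gi.libraries.upload_file_from_local_path(library_id, file_path)
    library_dataset_id = uploaded[0]["id"]

    # 2. Copy the library dataset into a new history.
    history = gi.histories.create_history(name="Manual FastQC reproduction")
    hda = gi.histories.upload_dataset_from_library(history["id"],
                                                   library_dataset_id)

    # 3. Run the workflow on the imported history dataset.
    inputs = {input_step: {"id": hda["id"], "src": "hda"}}
    return gi.workflows.invoke_workflow(workflow_id, inputs=inputs,
                                        history_id=history["id"])
```

Against a real server this would be called with something like `gi = GalaxyInstance(url="https://galaxy.example.org", key="...")` (placeholder URL and key) and the path of the downloaded FASTQ file.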

@hackdna
Member

hackdna commented Apr 12, 2016

@ngehlenborg
Contributor

The FastQC workflow also failed on the refinery-dev instance during import into Galaxy (as far as I can tell, the Galaxy import never went beyond "Pending"). I was able to import the file in question into a Galaxy history on the same server and to run FastQC on it successfully.

Notes:

[5:58 PM] fritz: 2S_HL60_ACAGTG_lane8_read1.fastq.gz
[5:59 PM] nils: thanks
[5:59 PM] fritz: The refinery import step was fast but the galaxy import is still pending
[6:00 PM] nils: same here
[6:00 PM] nils: might take a while
[6:09 PM] nils: my analysis failed
[6:10 PM] nils: now trying to import the file directly into a galaxy history on the galaxy-dev instance
[6:19 PM] nils: i was able to import the file into a history
[6:19 PM] nils: it is 9.3 GB once imported
[6:26 PM] nils: running fastqc now
[6:26 PM] nils: finished successfully
[6:27 PM] nils: Job Runtime (Wall Clock)    7 minutes

@hackdna
Member

hackdna commented Apr 18, 2016

Refinery app log:

Traceback (most recent call last):
  File "/srv/scc/virtualenvs/refinery-platform/lib/python2.7/site-packages/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/srv/scc/apps/refinery-platform/refinery/analysis_manager/views.py", line 77, in analysis_status
    'galaxyImport': status.galaxy_import_state(),
  File "/srv/scc/apps/refinery-platform/refinery/analysis_manager/models.py", line 49, in galaxy_import_state
    return get_task_group_state(self.galaxy_import_task_group_id)
  File "/srv/scc/apps/refinery-platform/refinery/analysis_manager/models.py", line 80, in get_task_group_state
    percent_done = task.info.get('percent_done') or 0
AttributeError: 'exceptions.IOError' object has no attribute 'get'
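
The `AttributeError` arises because, once a Celery task has raised, `AsyncResult.info` holds the exception instance (here the `IOError` from the failed library upload) instead of a progress dict. A defensive variant of the progress lookup, sketched with the hypothetical helper name `percent_done_from_info`, calls `.get()` only on dicts:

```python
def percent_done_from_info(info):
    """Return the 'percent_done' value stored in a Celery task's info.

    `info` may be a progress dict (task running), None (task pending),
    or an exception instance (task failed); only dicts carry progress.
    """
    if isinstance(info, dict):
        return info.get("percent_done") or 0
    # Pending or failed task: there is no progress metadata to report.
    return 0
```

In `get_task_group_state()` this would replace `task.info.get('percent_done') or 0`. A caller that needs to surface the failure itself would additionally check `task.failed()` or `task.state`.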

Celery log:

2016-04-15 18:04:45 ERROR    analysis_manager.tasks:241 import_analysis_in_galaxy() - Failed adding file '70e94053-e1e9-47d4-99fc-453b73b369a6' to Galaxy library 'd250a755261486fd': HTTP Error 500: Internal Server Error
[2016-04-15 18:04:45,130: ERROR/MainProcess] Task analysis_manager.tasks.start_galaxy_analysis[4ac06834-aa39-46ba-b1ce-45cca654c67d] raised unexpected: IOError()
Traceback (most recent call last):
  File "/srv/scc/virtualenvs/refinery-platform/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/srv/scc/virtualenvs/refinery-platform/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/srv/scc/apps/refinery-platform/refinery/analysis_manager/tasks.py", line 274, in start_galaxy_analysis
    ret_list, analysis.library_id, connection)
  File "/srv/scc/apps/refinery-platform/refinery/analysis_manager/tasks.py", line 237, in import_analysis_in_galaxy
    library_id, file_path)[0]['id']
  File "/srv/scc/virtualenvs/refinery-platform/lib/python2.7/site-packages/bioblend/galaxy/libraries/__init__.py", line 251, in upload_file_from_local_path
    return self._do_upload(**vars)
  File "/srv/scc/virtualenvs/refinery-platform/lib/python2.7/site-packages/bioblend/galaxy/libraries/__init__.py", line 217, in _do_upload
    files_attached=files_attached)
  File "/srv/scc/virtualenvs/refinery-platform/lib/python2.7/site-packages/bioblend/galaxy/client.py", line 182, in _post
    r = self.gi.make_post_request(url, payload=payload, files_attached=files_attached)
  File "/srv/scc/virtualenvs/refinery-platform/lib/python2.7/site-packages/bioblend/galaxyclient.py", line 85, in make_post_request
    fp = urllib2.urlopen(request)
  File "/n/sw/centos6/python-2.7.3/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/n/sw/centos6/python-2.7.3/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/n/sw/centos6/python-2.7.3/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/n/sw/centos6/python-2.7.3/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/n/sw/centos6/python-2.7.3/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/n/sw/centos6/python-2.7.3/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError

Galaxy log:

10.242.110.105 - - [15/Apr/2016:18:00:20 -0400] "POST /api/libraries/6b8e564d8dba2e78/contents HTTP/1.1" 500 - "-" "Python-urllib/2.7"
Debug at: https://galaxy-dev.stemcellcommons.org/_debug/view/1460593874

URL: https://galaxy-dev.stemcellcommons.org/api/libraries/6b8e564d8dba2e78/contents
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/eggs/WebError-0.8a-py2.7.egg/weberror/evalexception/middleware.py', line 364 in respond
  app_iter = self.application(environ, detect_start_response)
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/eggs/Paste-1.7.5.1-py2.7.egg/paste/recursive.py', line 84 in __call__
  return self.application(environ, start_response)
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/lib/galaxy/web/framework/middleware/remoteuser.py', line 57 in __call__
  return self.app( environ, start_response )
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpexceptions.py', line 633 in __call__
  return self.application(environ, start_response)
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/lib/galaxy/web/framework/base.py', line 132 in __call__
  return self.handle_request( environ, start_response )
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/lib/galaxy/web/framework/base.py', line 159 in handle_request
  trans = self.transaction_factory( environ )
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/lib/galaxy/web/framework/__init__.py', line 402 in <lambda>
  self.set_transaction_factory( lambda e: self.transaction_chooser( e, galaxy_app, session_cookie ) )
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/lib/galaxy/web/framework/__init__.py', line 433 in transaction_chooser
  return GalaxyWebTransaction( environ, galaxy_app, self, session_cookie )
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/lib/galaxy/web/framework/__init__.py', line 524 in __init__
  self.error_message = self._authenticate_api( session_cookie )
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/lib/galaxy/web/framework/__init__.py', line 676 in _authenticate_api
  api_key = self.request.params.get('key', None)
File 'build/bdist.linux-x86_64/egg/webob/__init__.py', line 900 in params
File 'build/bdist.linux-x86_64/egg/webob/__init__.py', line 892 in str_params
File 'build/bdist.linux-x86_64/egg/webob/__init__.py', line 818 in str_POST
File '/n/sw/centos6/python-2.7.3/lib/python2.7/cgi.py', line 508 in __init__
  self.read_multi(environ, keep_blank_values, strict_parsing)
File '/n/sw/centos6/python-2.7.3/lib/python2.7/cgi.py', line 635 in read_multi
  headers = rfc822.Message(self.fp)
File '/n/sw/centos6/python-2.7.3/lib/python2.7/rfc822.py', line 108 in __init__
  self.readheaders()
File '/n/sw/centos6/python-2.7.3/lib/python2.7/rfc822.py', line 155 in readheaders
  line = self.fp.readline()
File '/n/galaxy/www/galaxy_scc_dev/galaxy-central-hbc-20121101220520/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py', line 482 in readline
  data = self.file.readline(max_read)
File '/n/sw/centos6/python-2.7.3/lib/python2.7/socket.py', line 412 in readline
  bline = buf.readline(size)
OverflowError: signed integer is greater than maximum

buf <cStringIO.StringO object at 0x7f9bf8171a78>
self    <socket._fileobject object at 0x7f9c000afb50>
size    3513897762
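
The `size` value in the debug output exceeds the largest 32-bit signed C `int` (2**31 - 1). That is consistent with the `OverflowError` message, assuming Python 2's `cStringIO` `readline()` converts its size argument to a C `int`. A quick check of the numbers, using only values from the log:

```python
# Numbers taken from the Galaxy debug output above.
INT32_MAX = 2**31 - 1        # largest 32-bit signed C int: 2147483647
requested = 3513897762       # `size` passed to buf.readline()

exceeds = requested > INT32_MAX
excess = requested - INT32_MAX

print(exceeds)  # True
print(excess)   # 1366414115
```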

@ngehlenborg
Contributor

Thanks @hackdna. This appears to be an issue with bioblend. Do you agree? make_post_request seems to choke on this file (or on its size).

@hackdna
Member

hackdna commented Apr 18, 2016

The error (HTTP 500) comes from Galaxy and is simply being reported back by urllib2.
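
Because urllib raises `HTTPError` for any non-2xx reply, both the status code and the response body it carries were produced by the server. A small sketch (Python 3, where the class lives in `urllib.error`; the helper name `describe_http_error` and the demo URL are hypothetical) that also logs the response body, which may carry more detail than "HTTP Error 500: Internal Server Error":

```python
import io
from urllib.error import HTTPError  # urllib2.HTTPError in Python 2


def describe_http_error(err, limit=200):
    """Summarize an HTTPError: which side failed, status code, body excerpt."""
    if 500 <= err.code < 600:
        side = "server"
    elif 400 <= err.code < 500:
        side = "client"
    else:
        side = "other"
    body = err.read().decode("utf-8", errors="replace")
    return "%s error %d: %s" % (side, err.code, body[:limit])


if __name__ == "__main__":
    # Simulated 500 reply; the URL and body are placeholders.
    demo = HTTPError("https://galaxy.example.org/api/libraries/abc/contents",
                     500, "Internal Server Error", {},
                     io.BytesIO(b"Internal Server Error"))
    print(describe_http_error(demo))  # server error 500: Internal Server Error
```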

@ngehlenborg
Contributor

The error occurs very deep in the socket library. Given that this is Python 2.7.3 (released in April 2012), we should try running this on a newer Python version before exploring further. @hackdna will try to run this on the latest AWS setup. Assigning to @hackdna for now.

@hackdna
Member

hackdna commented Apr 18, 2016

One potential workaround:

"In addition, since it's possible to upload to Amazon's Simple Storage Service (S3) in parallel, using Galaxy CloudMan may be a faster alternative. We are investigating incorporating easy access to S3 buckets for Galaxy instances on the Amazon Elastic Compute Cloud (EC2). But you don't need to wait for the pretty interface, you can already access contents of S3 buckets by pasting links to their contents in the "URL/Text:" field of the "Upload File" tool."
https://wiki.galaxyproject.org/FTPUpload

@ngehlenborg
Contributor

@hackdna: I was able to upload the file into a history in the same Galaxy instance directly (by providing the corresponding HTTP URL in the Galaxy upload tool), so at least in principle the file is not too big for this Galaxy instance.

@hackdna
Member

hackdna commented Apr 18, 2016

Yes, that would be a workaround. However, that makes a GET request from Galaxy, which is fundamentally different from making a POST request to Galaxy (the source of the error in question).

@hackdna
Member

hackdna commented Apr 21, 2016

The error is reproducible on Galaxy 16.01 (CloudMan), but with a different message. Galaxy log:

134.174.183.88 - - [21/Apr/2016:20:59:25 +0000] "POST /api/tools HTTP/1.0" 500 - "-" "python-requests/2.9.1"
Error - <class 'webob.request.DisconnectionError'>: The client disconnected while sending the POST/PUT body (2340199912 more bytes were expected)
URL: http://galaxy-dev.aws.stemcellcommons.org/api/tools
File '/mnt/galaxy/galaxy-app/lib/galaxy/web/framework/middleware/error.py', line 151 in __call__
  app_iter = self.application(environ, sr_checker)
File '/mnt/galaxy/galaxy-app/.venv/local/lib/python2.7/site-packages/paste/recursive.py', line 85 in __call__
  return self.application(environ, start_response)
File '/mnt/galaxy/galaxy-app/.venv/local/lib/python2.7/site-packages/paste/httpexceptions.py', line 640 in __call__
  return self.application(environ, start_response)
File '/mnt/galaxy/galaxy-app/lib/galaxy/web/framework/base.py', line 126 in __call__
  return self.handle_request( environ, start_response )
File '/mnt/galaxy/galaxy-app/lib/galaxy/web/framework/base.py', line 153 in handle_request
  trans = self.transaction_factory( environ )
File '/mnt/galaxy/galaxy-app/lib/galaxy/web/framework/webapp.py', line 66 in <lambda>
  self.set_transaction_factory( lambda e: self.transaction_chooser( e, galaxy_app, session_cookie ) )
File '/mnt/galaxy/galaxy-app/lib/galaxy/web/framework/webapp.py', line 97 in transaction_chooser
  return GalaxyWebTransaction( environ, galaxy_app, self, session_cookie )
File '/mnt/galaxy/galaxy-app/lib/galaxy/web/framework/webapp.py', line 193 in __init__
  self.error_message = self._authenticate_api( session_cookie )
File '/mnt/galaxy/galaxy-app/lib/galaxy/web/framework/webapp.py', line 308 in _authenticate_api
  api_key = self.request.params.get('key', None)
File '/mnt/galaxy/galaxy-app/.venv/local/lib/python2.7/site-packages/webob/request.py', line 853 in params
  params = NestedMultiDict(self.GET, self.POST)
File '/mnt/galaxy/galaxy-app/.venv/local/lib/python2.7/site-packages/webob/request.py', line 789 in POST
  self.make_body_seekable()
File '/mnt/galaxy/galaxy-app/.venv/local/lib/python2.7/site-packages/webob/request.py', line 943 in make_body_seekable
  self.copy_body()
File '/mnt/galaxy/galaxy-app/.venv/local/lib/python2.7/site-packages/webob/request.py', line 963 in copy_body
  did_copy = self._copy_body_tempfile()
File '/mnt/galaxy/galaxy-app/.venv/local/lib/python2.7/site-packages/webob/request.py', line 980 in _copy_body_tempfile
  data = input.read(min(todo, 65536))
File '/mnt/galaxy/galaxy-app/.venv/local/lib/python2.7/site-packages/webob/request.py', line 1549 in readinto
  + "(%d more bytes were expected)" % self.remaining
DisconnectionError: The client disconnected while sending the POST/PUT body (2340199912 more bytes were expected)


CGI Variables
-------------
  CONTENT_LENGTH: '4628837012'
  CONTENT_TYPE: 'multipart/form-data; boundary=c9d21b13c58b4c0db22e6f01499ba01d'
  HTTP_ACCEPT: '*/*'
  HTTP_ACCEPT_ENCODING: 'gzip, deflate'
  HTTP_CONNECTION: 'close'
  HTTP_HOST: 'galaxy-dev.aws.stemcellcommons.org'
  HTTP_USER_AGENT: 'python-requests/2.9.1'
  HTTP_X_FORWARDED_FOR: '134.174.183.88'
  HTTP_X_FORWARDED_HOST: 'galaxy-dev.aws.stemcellcommons.org'
  ORGINAL_HTTP_HOST: 'galaxy_app'
  ORGINAL_REMOTE_ADDR: '127.0.0.1'
  PATH_INFO: '/api/tools'
  REMOTE_ADDR: '134.174.183.88'
  REQUEST_METHOD: 'POST'
  SERVER_NAME: '127.0.0.1'
  SERVER_PORT: '8080'
  SERVER_PROTOCOL: 'HTTP/1.0'


WSGI Variables
--------------
  application: <paste.recursive.RecursiveMiddleware object at 0x7fcce108fcd0>
  is_api_request: True
  paste.expected_exceptions: [<class 'paste.httpexceptions.HTTPException'>]
  paste.httpexceptions: <paste.httpexceptions.HTTPExceptionHandler object at 0x7fcce108fc50>
  paste.httpserver.proxy.host: 'dummy'
  paste.httpserver.proxy.scheme: 'http'
  paste.httpserver.thread_pool: <paste.httpserver.ThreadPool object at 0x7fcce0aabb90>
  paste.recursive.forward: <paste.recursive.Forwarder from />
  paste.recursive.include: <paste.recursive.Includer from />
  paste.recursive.include_app_iter: <paste.recursive.IncluderAppIter from />
  paste.recursive.script_name: ''
  paste.throw_errors: True
  request_id: 'ede84834080311e683140a4d98d6c597'
  webob._body_file: (<_io.BufferedReader>, <socket._fileobject object at 0x7fcca06234d0 length=4628837012>)
  webob._parsed_query_vars: (GET([]), '')
  wsgi process: 'Multithreaded'
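
The CGI variables show how far the upload got before the client disconnected: a little under half of the request body had arrived. The arithmetic, using only the values logged above:

```python
content_length = 4628837012  # CONTENT_LENGTH of the multipart POST
remaining = 2340199912       # "... more bytes were expected"

received = content_length - remaining
fraction = received / content_length

print(received)                  # 2288637100
print(round(100 * fraction, 1))  # 49.4
```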

@hackdna
Member

hackdna commented Apr 22, 2016

Opened a CloudMan issue: galaxyproject/cloudman#47

@ngehlenborg ngehlenborg modified the milestones: Salem, Taunton Apr 26, 2016
@hackdna
Member

hackdna commented May 3, 2016

One workaround is to use upload_file_from_url().
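
With `upload_file_from_url()`, Galaxy fetches the file itself with a GET request (as in the manual import through the upload tool), so the POST sent to Galaxy carries only the URL rather than the multi-gigabyte body. A sketch assuming `libraries` is a bioblend `LibraryClient` (`gi.libraries`); the helper name `add_to_library` is hypothetical, and the URL must be reachable from the Galaxy server:

```python
def add_to_library(libraries, library_id, file_path=None, file_url=None):
    """Hypothetical helper: add one file to a Galaxy data library.

    `libraries` is assumed to be a bioblend LibraryClient. Returns the id
    of the new library dataset, as import_analysis_in_galaxy() expects.
    """
    if file_url is not None:
        # Galaxy downloads the file itself; the request body stays small.
        datasets = libraries.upload_file_from_url(library_id, file_url)
    elif file_path is not None:
        # Sends the whole file in a multipart POST (the failing path here).
        datasets = libraries.upload_file_from_local_path(library_id, file_path)
    else:
        raise ValueError("either file_path or file_url is required")
    return datasets[0]["id"]
```

Preferring the URL when both are given keeps large files off the POST path whenever the source is already online, as with the stemcellcommons.org input files.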

@ngehlenborg ngehlenborg assigned scottx611x and unassigned hackdna May 3, 2016
@scottx611x
Member

Relevant issue for refactoring the use of the get_full_url method: #1063

@hackdna hackdna changed the title Unable to run FastQC Unable to upload large files into Galaxy May 3, 2016
@scottx611x
Member

scottx611x commented May 4, 2016

This issue seems to be resolved on a local Galaxy 16.01 instance.

[Screenshot: large file in local Galaxy history]

TODO:

  • Test on Cloudman Galaxy 16.01 instance

@scottx611x
Member

Working with CloudMan/Galaxy 16.01:
[Screenshot]

@scottx611x
Member

This can be closed upon merging: #1092

@scottx611x
Member

#444