Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TypeError when uploading large files from ftp to s3 #4315

Merged
merged 1 commit into from Aug 7, 2017

Conversation

Projects
None yet
6 participants
@jlhg
Copy link
Contributor

commented Jul 14, 2017

This PR fixed TypeError during multipart upload to s3.

Here are steps to reproduce the issue:

  1. Setup config/object_store_conf.xml to use s3 bucket as file storage.
  2. Upload large file by sftp.
  3. Click Get Data->Upload File->Choose FTP file, then click start to upload file.
  4. Failed to upload file. Got this error:
Traceback (most recent call last):
  File "/galaxy-central/lib/galaxy/jobs/runners/__init__.py", line 630, in finish_job
    job_state.job_wrapper.finish( stdout, stderr, exit_code )
  File "/galaxy-central/lib/galaxy/jobs/__init__.py", line 1232, in finish
    self.app.object_store.update_from_file(dataset.dataset, create=True)
  File "/galaxy-central/lib/galaxy/objectstore/__init__.py", line 502, in update_from_file
    return self._call_method('update_from_file', obj, ObjectNotFound, True, **kwargs)
  File "/galaxy-central/lib/galaxy/objectstore/__init__.py", line 513, in _call_method
    return store.__getattribute__(method)(obj, **kwargs)
  File "/galaxy-central/lib/galaxy/objectstore/s3.py", line 604, in update_from_file
    self._push_to_os(rel_path, source_file)
  File "/galaxy-central/lib/galaxy/objectstore/s3.py", line 384, in _push_to_os
    multipart_upload(self.s3server, self.bucket, key.name, source_file, mb_size)
  File "/galaxy-central/lib/galaxy/objectstore/s3_multipart_upload.py", line 95, in multipart_upload
    for _ in pmap(transfer_part, ((s3server, mp.id, mp.key_name, mp.bucket_name, i, part) for (i, part) in enumerate(split_file(tarball, mb_size, cores)))):
  File "/galaxy-central/lib/galaxy/objectstore/s3_multipart_upload.py", line 112, in wrap
    return func(self, timeout=timeout if timeout is not None else 1e100)
  File "/galaxy-central/lib/galaxy/objectstore/s3_multipart_upload.py", line 112, in wrap
    return func(self, timeout=timeout if timeout is not None else 1e100)
  File "/galaxy-central/lib/galaxy/objectstore/s3_multipart_upload.py", line 112, in wrap
    return func(self, timeout=timeout if timeout is not None else 1e100)
  File "/galaxy-central/lib/galaxy/objectstore/s3_multipart_upload.py", line 112, in wrap
    return func(self, timeout=timeout if timeout is not None else 1e100)
  File "/galaxy-central/lib/galaxy/objectstore/s3_multipart_upload.py", line 112, in wrap
    return func(self, timeout=timeout if timeout is not None else 1e100)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
    raise value
TypeError: transfer_part() takes exactly 6 arguments (1 given)

@galaxybot galaxybot added the triage label Jul 14, 2017

@galaxybot galaxybot added this to the 17.09 milestone Jul 14, 2017

@bgruening bgruening added area/framework and removed triage labels Jul 14, 2017

@dannon

This comment has been minimized.

Copy link
Member

commented Jul 17, 2017

Not sure I understand at first glance the fix here. We throw away kwargs, but that's fine, there are none to consider. The wrapped function should take more than just the single args variable, though, right?

Long term, we probably want to do something more like this, for multipart uploads: https://github.com/spotify/luigi/pull/515/files#diff-68ac377fbe87ada8c1af3a743e141c9dR162

@jlhg

This comment has been minimized.

Copy link
Contributor Author

commented Jul 18, 2017

Yes. pmap passes only one argument (a tuple) to transfer_part, which requires 6 arguments.

    with multimap(cores) as pmap:
        for _ in pmap(transfer_part, ((s3server, mp.id, mp.key_name, mp.bucket_name, i, part)
                                      for (i, part) in
                                      enumerate(split_file(tarball, mb_size, cores)))):

Wrapper should take this one argument and expand it for transfer_part.

@jmchilton

This comment has been minimized.

Copy link
Member

commented Aug 7, 2017

@jlhg makes sense to me thanks for the clarification. I've looked into this and this seems like a solid fix to me. Seems this was broken by me in 66944ba - so I apologize to everyone involved.

@jmchilton jmchilton merged commit 3f270e9 into galaxyproject:dev Aug 7, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jmchilton

This comment has been minimized.

Copy link
Member

commented Aug 7, 2017

Thanks a ton for the contribution @jlhg - we really appreciate it and look forward to more 😃.

@martenson martenson added the kind/bug label Oct 9, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.