Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for the Docker Python SDK #2158

Merged
merged 50 commits into from Oct 15, 2017

Conversation

Projects
None yet
7 participants
@apierleoni
Copy link
Contributor

apierleoni commented Jun 15, 2017

Description

Hi there, we did some work to run a data processing pipeline using Docker containers with Luigi using the Docker Python SDK and we would like to contribute it back.

Motivation and Context

These PR allows to use the Docker Python SDK to run Docker containers instead of launching them from a subprocess.
The main advantage over calling an external process is that all the communications done with the Docker daemon are through the API and not a subprocess, being able to handle exceptions properly and potentially connect to remote APIs (we did not test this scenario)
It also solve non trivial problems with the docker SDK like mounting a single file, and makes sure that there is enough space in the container to run the job by mounting a temporary directory from the host , avoiding the container size limit of 10Gb.

Have you tested this? If so, how?

Unit tests are included. But they require a Docker daemon to run. Not sure it is available in Travis.
We are currently running this in our staging environment and we are ready to run production pipelines with this contrib.

@apierleoni

This comment has been minimized.

Copy link
Contributor Author

apierleoni commented Sep 9, 2017

Hi, sorry I have been off for a while.
@Tarrasch
I have extended the module docs to include the DockerTask use case. Not sure if you prefer this docs to be somewhere else. If yes, can you point me to it?
@dlstadther
Happy to do an other PR for the link to our blog if this one requires reviews.

@dlstadther

This comment has been minimized.

Copy link
Collaborator

dlstadther commented Sep 11, 2017

@apierleoni It would be great if you could separate "companies who use luigi" into another PR - that one can be merged quickly.

Also, looks like you've had some Travis errors; could you take a look and resolve? Thanks!

@elipapa

This comment has been minimized.

Copy link

elipapa commented Sep 11, 2017

@dlstadther the travis errors are not ours and apply to every PR. @Tarrasch has already acknowledged that they have yet to be fixed (see comment above: june 23)

on separating the PR: we have no problem waiting for this one to be merged for our "companies with luigi" link to show up. We'd rather keep them in the same PR for simplicity and just wait for the review for it to be merged.

@dlstadther

This comment has been minimized.

Copy link
Collaborator

dlstadther commented Sep 11, 2017

@elipapa Those errors were resolved elsewhere. Your current errors are flake8.

Andrea Pierleoni
@apierleoni

This comment has been minimized.

Copy link
Contributor Author

apierleoni commented Sep 11, 2017

flake8 errors fixed now, all tests passed on travis

tox.ini Outdated
@@ -10,6 +10,7 @@ deps=
moto<1.0
HTTPretty==0.8.10
nose<2.0
docker

This comment has been minimized.

@dlstadther

dlstadther Sep 11, 2017

Collaborator

Do you want to specify a specific version?

This comment has been minimized.

@apierleoni

apierleoni Sep 11, 2017

Author Contributor

yes thanks, well spotted!

@dlstadther
Copy link
Collaborator

dlstadther left a comment

If there are no objections, I'm ok to merge this.

Andrea Pierleoni
@drobakowski
Copy link

drobakowski left a comment

Hi @apierleoni thanks a lot for all this work! I'm using this DockerTask for quite some time and it is working fine overall so far. Even though I had to adapt some parts to make it work properly for our use case. I will start with this first small review because I think the test are not working correctly when they are fixed. See my comments (volumes to binds) in the tests. After that I will add some additional comments and suggestions, for some later improvements or extensions.

image = "busybox"
name = "MountLocalFileAsVolume"
# volumes= {'/tmp/local_file_test': {'bind': local_file.name, 'mode': 'rw'}}
volumes = [local_file.name + ':/tmp/local_file_test']

This comment has been minimized.

@drobakowski

drobakowski Sep 11, 2017

Shouldn't this be renamed to binds?

dummyopt = luigi.Parameter()
image = "busybox"
name = "MountLocalFileAsVolumeWithParam"
volumes = [local_file.name + ':/tmp/local_file_test']

This comment has been minimized.

@drobakowski

drobakowski Sep 11, 2017

Shouldn't this be renamed to binds?

name = "MountLocalFileAsVolumeWithParamRedef"

@property
def volumes(self):

This comment has been minimized.

@drobakowski

drobakowski Sep 11, 2017

Shouldn't this be renamed to binds?

# derive volumes (ie. list of container destination paths) from
# specified binds
self._volumes = [b.split(':')[1] for b in self._binds]

This comment has been minimized.

@drobakowski

drobakowski Sep 11, 2017

One addition I had to make to get this properly working in our scenario was to include a custom instance attribute self._complete = False and set this to True when the container exited.

**self.container_options)
self._client.start(container['Id'])

exit_status = self._client.wait(container['Id'])

This comment has been minimized.

@drobakowski

drobakowski Sep 11, 2017

+self._complete = True

I'm not sure anymore if I included this simply to True on purpose or if this should be set in subject to the exit_status.

In our use case we're chaining multiple DockerTask consecutive so that each task depend on the previous one and its output. As I'm quite new to working with luigi I don't have that much deep knowledge and experience so far, but I think I had some problems with the luigi scheduler while the scheduler was trying to resolve and start the DockerTask process chain properly. After including the previous changes and overwriting the complete methods, the chain worked fine for me. As far as I remember correctly, I've even got the tests to run fine after renaming volumes to binds. Perhaps this is use case specific, but maybe someone with better knowledge of luigi can overlook this better.

class ImportProcess(DockerTask):
    ....
    
    def complete(self):
        return self.output().exists() or self._complete
    ...

Of course this could also be implemented in the DockerTask class directly with maybe some modification like this:

class DockerTask(luigi.Task):
    ....
    def complete(self):
        return super(DockerTask, self).complete() or self._complete
@drobakowski

This comment has been minimized.

Copy link

drobakowski commented Sep 11, 2017

I do have two more smaller additions, but in order to keep this PR simple, I will create a PR after this one will be merged. Thanks again for DockerTask, this helps us a lot! 👍

@dlstadther

This comment has been minimized.

Copy link
Collaborator

dlstadther commented Sep 19, 2017

@apierleoni Where does this PR stand given @drobakowski 's review and comments?

@apierleoni

This comment has been minimized.

Copy link
Contributor Author

apierleoni commented Sep 25, 2017

@drobakowski
I am glad DockerTask was useful for you, and than you for the review.
Yes I have renamed volume to binds, and changed default behaviour for tmp directory mounting. Tests are now passing in my system.
I have added docker to the services in travis, so now the test should run in CI

Regarding the second issue, I am not 100% sure I understood what you want to do with the self._complete attribute. The way we use DockerTask is to make sure that the Target is labeled as completed, so that Luigi will know if he needs to rerun the task or not.
Would you prefer to trigger an error in case the container quit status 0? how do you know it is not completed at that stage?

@dlstadther
tests should be improved now, maybe you can help here on what is the preferred way to store a task to be completed.

@apierleoni

This comment has been minimized.

Copy link
Contributor Author

apierleoni commented Sep 25, 2017

some tests are failing for an error that looks related to boto non importing the google cloud platform module. this might not be limited to this PR
https://travis-ci.org/spotify/luigi/jobs/279538792#L565

@apierleoni

This comment has been minimized.

Copy link
Contributor Author

apierleoni commented Sep 26, 2017

merged with current master, still failing for boto

apierleoni added some commits Oct 14, 2017

@apierleoni

This comment has been minimized.

Copy link
Contributor Author

apierleoni commented Oct 14, 2017

@dlstadther
I have fixed the errors with boto in the tests and resolved conflicts with master.
Should be clear to merge if you want.

@apierleoni

This comment has been minimized.

Copy link
Contributor Author

apierleoni commented Oct 14, 2017

I guess the codecov issue is related with this https://github.com/codecov/support/issues/180

@dlstadther
Copy link
Collaborator

dlstadther left a comment

LGTM! I'm not concerned about that codecov thing. The tests pass, so let's merge and allow others the goodness of Luigi v1 DockerTask support!

@dlstadther dlstadther merged commit 4801fec into spotify:master Oct 15, 2017

3 of 4 checks passed

codecov/changes 1 file has unexpected coverage changes not visible in diff.
Details
codecov/patch 85.71% of diff hit (target 70.92%)
Details
codecov/project 71.04% (+0.12%) compared to 791abf3
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@dlstadther

This comment has been minimized.

Copy link
Collaborator

dlstadther commented Feb 16, 2018

Hey @apierleoni , do you have any insight into why two of the Docker tests started failing recently?

It'd be much appreciated if you could check it out! Thanks

@apierleoni

This comment has been minimized.

Copy link
Contributor Author

apierleoni commented Apr 11, 2018

@dlstadther sorry for the late reply, is it still an issue? I cannot find the failing tests.

@dlstadther

This comment has been minimized.

Copy link
Collaborator

dlstadther commented Apr 11, 2018

@apierleoni They've since been fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.