Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use list for file_dep (fixes #254). #430

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tillahoffmann
Copy link

This PR fixes #254 by changing file_dep to a list rather than using a set, ensuring the order of dependencies is preserved.

@schettino72
Copy link
Member

Thanks.

If my memory is correct I made it a set() instead of list for performance reasons.
If a task has hundreds of file_dep it would be inefficient to check in using a list.
Not sure this still applies... have you checked it?

Please also add an entry on CHANGES, and doc changes (not sure if relevant or where).

@tillahoffmann
Copy link
Author

I've had a look running the tests on master and the feature branch. There doesn't seem to be much in it, but I appreciate that the number of deps is likely low in the tests. See below for results.

Test duration results

Executing all tests

$ (master) repeat 5 time pytest > /dev/null
pytest -q > /dev/null  2.88s user 0.84s system 52% cpu 7.054 total
pytest -q > /dev/null  2.80s user 0.82s system 52% cpu 6.949 total
pytest -q > /dev/null  2.79s user 0.82s system 52% cpu 6.905 total
pytest -q > /dev/null  2.79s user 0.80s system 52% cpu 6.872 total
pytest -q > /dev/null  2.85s user 0.83s system 51% cpu 7.089 total
$ (file_dep-list) repeat 5 time pytest > /dev/null
pytest -q > /dev/null  2.80s user 0.82s system 52% cpu 6.925 total
pytest -q > /dev/null  2.77s user 0.80s system 52% cpu 6.855 total
pytest -q > /dev/null  2.76s user 0.80s system 51% cpu 6.853 total
pytest -q > /dev/null  2.79s user 0.81s system 52% cpu 6.898 total
pytest -q > /dev/null  2.77s user 0.80s system 52% cpu 6.857 total

Executing only tests in test_dependency.py

$ (master) repeat 5 time pytest tests/test_dependency.py > /dev/null 
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.553 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.17s system 14% cpu 3.541 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.17s system 14% cpu 3.546 total
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.556 total
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.557 total
$ (file_dep-list) repeat 5 time pytest tests/test_dependency.py > /dev/null 
pytest tests/test_dependency.py > /dev/null  0.35s user 0.17s system 14% cpu 3.569 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.16s system 14% cpu 3.536 total
pytest tests/test_dependency.py > /dev/null  0.32s user 0.15s system 13% cpu 3.531 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.17s system 14% cpu 3.540 total
pytest tests/test_dependency.py > /dev/null  0.34s user 0.16s system 14% cpu 3.533 total

@tillahoffmann
Copy link
Author

I've had another look at performance, and I think this change would not negatively affect performance. I use setup.py to generate 5,000 files and then consider the time it takes to execute doit for a task that depends on all 5,000 files.

# setup.py
import os


os.makedirs("inputs", exist_ok=True)
for i in range(5000):
    with open(f"inputs/{i}.txt", "w") as fp:
        fp.write(str(i) * i)
# dodo.py
import os


os.makedirs("inputs", exist_ok=True)
for i in range(5000):
    with open(f"inputs/{i}.txt", "w") as fp:
        fp.write(str(i) * i)

Here's the evaluation.

# git:file_dep-list
$ time (repeat 100 doit)
...
8.15s user 3.18s system 94% cpu 11.962 total

# git:master
$ time (repeat 100 doit)
...
8.19s user 3.19s system 94% cpu 12.044 total

@tillahoffmann
Copy link
Author

Hi @schettino72, just wanted to follow up whether this is something you're interested in merging.

@schettino72
Copy link
Member

@tillahoffmann sorry for delay. I am planning to dedicate some time for open-source in the last week of this month.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dependencies is a set not a list
2 participants