Option to not use md5 for file dependencies. #22

Closed
saimn opened this issue Feb 5, 2015 · 2 comments

@saimn
Contributor

saimn commented Feb 5, 2015

Hi,
First, thanks for doit, it is a great tool!
I'm currently using it to process a lot of huge files, and the bottleneck is computing the md5sums of the file dependencies. From what I have seen in the code, there is currently no way to avoid this. Even if I use task_dep and uptodate with check_timestamp_unchanged instead of file_dep, I still need to specify targets, so the md5sum is computed on the target files.
So, unless I missed another way to achieve this, what I would like is to use file_dep and targets, as that works well and is very practical with subtasks, but with only the timestamp check.
Maybe adding an option in DOIT_CONFIG to deactivate the use of md5sums could be a solution?
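
To make the trade-off concrete, here is a minimal standard-library sketch of the two kinds of check being compared. It is not doit's code: the function names md5_signature, timestamp_signature and has_changed are hypothetical and only illustrate why hashing scales with file size while a timestamp check does not.

```python
import hashlib
import os


def md5_signature(path, chunk_size=1024 * 1024):
    """Hash the full file contents; the cost grows with the file size."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so huge files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def timestamp_signature(path):
    """Read only file metadata; the cost does not depend on the file size."""
    info = os.stat(path)
    return (info.st_mtime, info.st_size)


def has_changed(path, saved_signature, signature_func):
    """Return True when the current signature differs from the saved one."""
    return signature_func(path) != saved_signature
```

The timestamp variant can report a change when a file is rewritten with identical contents, which is the usual price for skipping the hash.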

@schettino72
Member

It would be nice to add a way to plug in a different implementation for checking whether a file changed (not only being able to disable MD5).

I see this could be done at 3 different levels.

  • globally on DOIT_CONFIG
  • on task level with a new attribute
  • for each file_dep
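
One way the three levels could interact is sketched below, assuming the most specific setting wins. Everything here is hypothetical and not part of doit: the "checker" and "file_checkers" keys, the CHECKERS registry and resolve_checker exist only for illustration.

```python
import hashlib
import os


def md5_checker(path):
    """Signature based on file contents."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def timestamp_checker(path):
    """Signature based on the modification time only."""
    return os.stat(path).st_mtime


# Hypothetical registry mapping a configured name to an implementation.
CHECKERS = {"md5": md5_checker, "timestamp": timestamp_checker}


def resolve_checker(path, task, config):
    """Pick a checker: per-file setting, then task, then global, then md5."""
    per_file = task.get("file_checkers", {})  # hypothetical per-file_dep level
    name = (
        per_file.get(path)
        or task.get("checker")  # hypothetical task-level attribute
        or config.get("checker")  # hypothetical global (DOIT_CONFIG-like) level
        or "md5"  # keep the existing behaviour as the default
    )
    return CHECKERS[name]
```

For example, resolve_checker("big.dat", {"checker": "timestamp"}, {"checker": "md5"}) returns timestamp_checker, because the task-level setting overrides the global one.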

Did you actually measure it? I would like to see some benchmarks (or is your code open source?)

md5 is never computed for targets; where did you get this from?

Another problem is that I think doit is calculating the md5 of the same file more than once per run.
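
The repeated hashing could be avoided with a per-run cache. The sketch below is an assumption about one possible approach, not doit's implementation: Md5Cache is a hypothetical class, and the cache key includes the modification time and size so that a file rewritten during the run (for example a target that is also a file_dep of a later task) gets hashed again.

```python
import hashlib
import os


class Md5Cache:
    """Remember md5 digests for the duration of one run (hypothetical helper)."""

    def __init__(self):
        self._digests = {}
        self.computed = 0  # number of times a file was actually hashed

    def get(self, path):
        info = os.stat(path)
        # A changed mtime or size invalidates the cached digest.
        key = (os.path.realpath(path), info.st_mtime_ns, info.st_size)
        if key not in self._digests:
            digest = hashlib.md5()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            self._digests[key] = digest.hexdigest()
            self.computed += 1
        return self._digests[key]
```

Each lookup still costs one os.stat call, which is cheap compared with reading the whole file.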

@saimn
Contributor Author

saimn commented Feb 5, 2015

It would be nice to add a way to plug in a different implementation for checking whether a file changed (not only being able to disable MD5).

Yes, indeed!

Did you actually measure it? I would like to see some benchmarks (or is your code open source?)

No, the code is not open source. A small test with an 'echo' action and an 8.5 GB file takes roughly 40 seconds.

md5 is never computed for targets; where did you get this from?

My bad, I looked at the code a few days ago and I didn't remember correctly. So yes, it is computed only for file_dep.
