Track input data #232
95b87bf to 8ab6cd9
Cool, this already looks pretty nice and simple! 👍
However, for image projects or any other projects with larger datasets, you usually have folders of images and the labels in a JSON file.
Since this is a typical use case, we should probably generate md5 sums for all files in these folders to really be on the safe side.
The only problem is that this could be really time-intensive if someone runs it on a large dataset with, say, half a million samples. Maybe it's not too bad, though. Probably something to try out.
I've also implemented a recursive calculation of md5sums in my PR by the way, in case you want to take a shortcut :)
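For reference, a minimal sketch of what such a recursive md5 calculation could look like. This is not the code from either PR: the function names `md5_file` and `md5_dir` and the 1 MiB chunk size are assumptions made for illustration. Files are read in chunks so that large images do not have to fit in memory at once.

```python
import hashlib
import os
from typing import List, Tuple


def md5_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a single file (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def md5_dir(root: str) -> List[Tuple[str, str]]:
    """Walk root recursively and return sorted (file_path, md5sum) tuples."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            results.append((file_path, md5_file(file_path)))
    # Sort so the output order does not depend on the filesystem.
    return sorted(results)
```

Calling `md5_dir('data')` on a folder of images plus a labels file would return one tuple per file. Whether this is fast enough for half a million samples would still need to be measured.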
...cookiecutter.common_mlflow }}/{{ cookiecutter.project_slug_no_hyphen }}/mlf_core/mlf_core.py
- Agree with everything that @KevinMenden said
- We usually use single quotes in the template and it would be nice to stick with them here (super nitpick, don't worry :) )
- I slept on it and I prefer input_name_input_hash :)
- Besides that, I think this is what I/we had in mind 👍 We just need to update the documentation somewhere to mention that this functionality exists. I am not entirely sure yet where. Maybe you have an idea? If not, feel free to ping me and I will try to find a suitable spot.
- I increased the version of mlf-core in development -> please merge development into this branch and update the changelog
88900a4 to 58be7c5
Co-authored-by: Kevin Menden <kevin.menden@t-online.de>
e45fb0a to c1fad45
I think this is the only part left. I think the documentation of Do you want me to update the changelog in this PR also?
"""
Recursively go through directories and subdirectories
and generate tuples of (<file_path>, <md5sum>)
returns: list of tuples
Could be a type hint as well :)
+1 for updating the CL
This needs to be fixed :)
Yeah, that's why I am not sure whether you can really incorporate the functions nicely into the templates already. These dataloaders download and save the files in /data. However, be careful, because when training with Docker the Tensorboard logs are also saved in the /data directory. @Imipenem time to finally move the Tensorboard logs out of /data and save them in a new directory? /logs?
What if we comment them out with
So, a couple of options:
I am fine with any of those. So feel free to do what you think is best, or we can wait for a comment from @Imipenem
I went with 1. while we wait to hear from @Imipenem. Should be ready for review if everything passes.
@emiller88 @Zethson: Yes, we should definitely move the Tensorboard logging stuff into a new directory then. I just remember that I had some issues with sharing the volume path between Docker and the local system, which was the reason I stayed with the
@emiller88 @Imipenem can one of you please open an issue to change the tensorboard logging path?
Done #236
Closes #141
@Zethson is this the direction you had in mind so far?