Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional flag for RegisterDataDep for post_fetch_cleanup #14

Closed
Evizero opened this issue Dec 28, 2017 · 4 comments
Closed

Additional flag for RegisterDataDep for post_fetch_cleanup #14

Evizero opened this issue Dec 28, 2017 · 4 comments

Comments

@Evizero
Copy link
Collaborator

Evizero commented Dec 28, 2017

I think it could be nice to have an additional optional named flag for RegisterDataDep that could signal to DataDeps that after executing the post_fetch_method the downloaded files should be deleted. Maybe something like post_fetch_cleanup = true which defaults to false. This way the data for something like CIFAR-10 isn't kept around twice (once as the archive and once as the unpacked files)

This should work well with #10 (comment)

@Evizero
Copy link
Collaborator Author

Evizero commented Dec 28, 2017

There is an argument to be made that this can already be achieved by adding an rm instruction to the post_fetch_method. So in a sense this proposal is more about a convenience addition.

@Evizero
Copy link
Collaborator Author

Evizero commented Dec 28, 2017

Actually, now that I think about it the following isn't that bad and it works. Also, looking at the code for rm it seems cross platform.

post_fetch_method = file -> (run(BinDeps.unpack_cmd(file,dirname(file), ".gz", ".tar")); rm(file))  

I'll leave this issue open in case you have some opinion on this. otherwise feel free to close

@oxinabox
Copy link
Owner

oxinabox commented Jan 4, 2018

I was thinking about something similar to this initially.
I think you are right and that this is something that should have.

The way I was initially thinking of doing it was by including a helper a function: rm_after(fn) = file -> (fn(file); rm(file)).
Which is a specific case of your solution.

Related would be to allow post_fetch_method to take a iterator of commands and then run all of them.
Which is what Flux does for it's callbacks.
Problem there is that we already allow it to take a iterator, because of handling recursive structures.
and that would be ambiguous.

Another would be to include a helper a function like doall(xs...)= file-> foreach(xs(file), xs)

Yet another option would be to include helper functions, like:
extract(file) = run(BinDeps.unpack_cmd(file,dirname(file), ".gz", ".tar"))
which would have a partner
extract_and_rm(file) = (extract(file); rm(file))
which I think would be useful.

@oxinabox
Copy link
Owner

oxinabox commented Jan 9, 2018

Done>
via the inclusion of an unpack method, that has an option to no delete the orginals.

@oxinabox oxinabox closed this as completed Jan 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants