Improve temporary data management using temp directories #133
…y transformer weight files from temporary directory
Hopefully I did not break anything.
Hi Luca! Shouldn't the tmp directory be defined in Using only one tmp path in We could also probably use the tmp path as the "download" path for simplification? so replace
Good point. I've modified this by:
I'm not sure about this, so I kept them separate for the moment. I think the download path has a specific cleaning policy (clean everything at any time), while I'm not sure what the policy should be for the temp directory. The current implementation removes the
This is the review for the part of the PR related to adding an output path to the application scripts for the models, instead of using the default one only (except grobidTagger.py where it was already there).
    if output_directory:
        model.save(output_directory)
    else:
        model.save()
Maybe simpler in one line:
model.save(dir_path=output_directory)
(so this applies to all application scripts)
I updated all the application scripts; however, I was reluctant to remove the `output_directory=None` and change the signature of various application methods.
I added a check on `dir_path` in `save()` so that it falls back to the default directory if `dir_path` is None. Maybe you have a better solution than this.
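The fallback described in this comment could look roughly like the following. This is a minimal sketch, not the project's actual code: the `Model` class, its `default_dir` attribute and the placeholder `config.json` file are hypothetical, and only the `dir_path` parameter name and the `None` check come from the discussion.

```python
import os


class Model:
    """Hypothetical stand-in for a model wrapper with a save() method."""

    def __init__(self, name, default_dir):
        self.name = name
        # Assumed default location, used when the caller gives no output path.
        self.default_dir = default_dir

    def save(self, dir_path=None):
        # Fall back to the default directory when dir_path is None, so that
        # application scripts can always call save(dir_path=output_directory).
        if dir_path is None:
            dir_path = self.default_dir
        target = os.path.join(dir_path, self.name)
        os.makedirs(target, exist_ok=True)
        # Placeholder for the real serialization of config and weights.
        with open(os.path.join(target, "config.json"), "w") as f:
            f.write("{}")
        return target
```

With this shape, a script that received no output option can pass `output_directory=None` straight through, and the branching stays inside `save()`.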
The PR is doing two different things:
The second one is much more complex and really needs tests, so I am doing two separate reviews to help me :)
This PR should solve #124 and #126:
- `tmp_directory` (usually `tmp/model-architecture`) is deleted at the beginning
- `save()` will copy the data correctly to the output, which is either a) specified by the user with the `--output` parameter, at the discretion of the application, or b) the default location under `data/XYZ/models`
- The `eval_nfold` of sequence labelling evaluates the models and uses `self.model` to set the best one. That model is then the one copied to the output directory when we call `model.save(...)`
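The workflow summarized in this comment can be sketched as below. This is an illustration under assumptions, not the PR's implementation: the helper names (`reset_tmp_directory`, `best_fold`, `export_fold`) and the one-subdirectory-per-fold layout (`fold0`, `fold1`, ...) are hypothetical. Only the ideas of clearing the temporary directory at the start, picking the best fold, and copying it to the output come from the discussion.

```python
import os
import shutil


def reset_tmp_directory(tmp_directory):
    # Remove leftovers from a previous run, then start from an empty directory.
    shutil.rmtree(tmp_directory, ignore_errors=True)
    os.makedirs(tmp_directory)


def best_fold(fold_scores):
    # Index of the fold with the highest evaluation score (first one wins ties).
    return max(range(len(fold_scores)), key=lambda i: fold_scores[i])


def export_fold(tmp_directory, fold_index, output_directory):
    # Copy the files of one fold from the temporary directory to the output.
    # Assumed layout: one subdirectory per fold, named fold0, fold1, ...
    source = os.path.join(tmp_directory, "fold" + str(fold_index))
    # dirs_exist_ok requires Python 3.8 or later.
    shutil.copytree(source, output_directory, dirs_exist_ok=True)
    return output_directory
```

Keeping the per-fold files in the temporary directory until the best fold is known means the output directory only ever receives one complete model.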