New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make jiant pip installable #887
Conversation
This works for the tutorial experiment, but is still a work-in-progress. * Add pip dependencies - matched environment.yml as closely as possible * Move config into jiant package
Hello @davidbenton! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 You can repair most issues by installing black and running: Comment last updated at 2019-08-24 00:19:26 UTC |
see #877 |
I see I need to get demo.py running to pass CI, which should be straightforward. I have just been running The PEP 8 issues aren't picked up by black locally, and I'm not sure why, but I'll clean that up too. |
Okay, minor changes made. crossing my fingers on CI Another item to note: if these changes are going in the right direction, documentation will need some updates, including change to config paths and pip installation. |
Thanks for taking this on! On your questions, most definitely run the superglue baselines and make sure the changes don't break anything from those runs (scripts/superglue-baselines.sh). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together!
Moving config/ and main() seems reasonable. I'm fine with keeping a legacy main.py and deleting it at a future release.
@@ -0,0 +1,609 @@ | |||
"""Train a multi-task model using AllenNLP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub is treating this as a substantial diff rather than a move. Are there any substantial changes here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only changes are making a long string multi-line, and using pkg_resources
to locate defaults.conf
if no config is passed.
jiant/__main__.py
Outdated
|
||
To debug this, run with -m ipdb: | ||
|
||
python -m ipdb main.py --config_file ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get a double import warning for ipdb when I run it that way. ipdb docs say that formulation is python 2.7 only. Are we supporting python 2.7? 3.X? I've been using 3.6 and 3.7 for the two environments I've been working in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm—I haven't been using ipdb lately. Does anyone else here know what the recommended practice is here?
We're only aiming to support 3.6/3.7—we should also clarify that somewhere...
Also: We haven't been enforcing the PEP8 issues that don't also pop up in Black, though some of them can still be worth looking at. We should probably turn off the PEP8 bot... Thanks! |
And to confirm, this generally all looks productive/reasonable. LMK when it's ready for a final look. |
Thanks for review and comments! I'm glad it's going in the right direction. @pruksmhc I ran the boolq task locally using @sleepinyourhat Functionally, I think we have pip-installability complete, but depending what they're doing, users might run into some issues using jiant via pip. Specifically, they won't have easy access to the config files or shell scripts. You can see examples of this in my pip tutorial recipe above, where data and Do we need to make it easier for pip users to get config files or use some shell scripts? If yes, is that in scope for #877, or open new issue for it? |
That sounds reasonable. This PR seems to be net helpful as long as the standard workflow (clone, create/edit config files, modify code) isn't any harder. We don't have a great case study workflow to try that would involve pip yet, though. Once we start experimenting with flambé and similar tools, the pain points will become clearer. |
After a thorough review, I see that Is a symlink reasonable to make |
I suspect people will often want to name other config files explicitly, both at the command line, and as inheritances within their new config files, so we'll need to make sure that's relatively painless/documented. I'm fine with symlinks if they're generally supported/safe for multi-platform git projects. |
Symlink will work for most, but I am not confident it works for all Windows users, so I will move all config to the package (as it is in the PR now), update paths, and document. I think I have what I need, and I'll update the PR tomorrow. |
Got hung up debugging the CircleCI changes, but I have a pip environment passing tests there now. Not finished with other work, but going to call it a day on this. Will pick this up by the end of the week. |
Okay, I think this is ready for review @sleepinyourhat. Note the significant changes to CircleCI, and let me know if you have questions about anything there. I saw in my merge just now that #897 specified a version of conda, which this wipes out. Not sure if that's a red flag or not. I updated config paths everywhere, but did not attempt to run every script, etc. I did, as noted above, run one task from |
Oh, and I'll create the documentation PR if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks! I'll merge tomorrow unless anyone else raises concerns.
@@ -3,7 +3,7 @@ | |||
jobs: | |||
test: | |||
docker: | |||
- image: continuumio/miniconda3:4.6.14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine to me, as long as we have a fixed version and tests are passing.
Great! You'll need to create a PyPI account to upload package distributions, if you don't have one, and then the notes in setup.py should cover it. |
Hey @davidbenton , I need to verify a new branch in jiant is pip installable, what do I need to do? |
Installing git repos with pip should be pretty standard. Roughly, IIRC: Make a clone/copy of the repo, exit any environment/virtualenv you're using that already has jiant set up, and run something like |
@HaokunLiu Everything above seems right, but I'd recommend going a step further and creating a new virtual environment to make sure you're not pulling from existing global/user installed dependencies. The CircleCI config.yml steps create a virtual environment and pip install the package from local filesystem, so you can use that as a recipe, if it's helpful. |
Thanks, I finally did it. |
Yes, that's right. As pip use cases emerge, it might be necessary to add some more checks, but CircleCI currently does basic package installation via pip, including dependencies. |
@davidbenton I've got another problem. Using RoBERTa tokenizer needs to run "python -m spacy download en". I can manually run this when running experiments. But when I'm using Robert tokenizer in a test script, there is no place for me to ask spacy to download en. Is there any way we can do this in setup, or somewhere else? |
* Create initial package for PyPI (pip) distribution * Working PyPI package - can run tutorial experiment This works for the tutorial experiment, but is still a work-in-progress. * Add pip dependencies - matched environment.yml as closely as possible * Move config into jiant package * PEP 8 long strings changed to multi-line * Update CI demo.conf path * Use locally installed pip package for CircleCI * Update call to ipdb * Update config paths in non-python files * Document release and distribution * Name fix. * Add note to 'Contributing'
This should be considered a work-in-progress, not a complete PR. It passes tests, but might break some untested things, and there are a few things I'll flag for review.
Because I'm still getting to know jiant, I chose running the tutorial experiment using a pip-installed package as my goal. This changeset allows that (without requiring conda or manual dependency installation). It doesn't directly address setting up an environment or downloading data using the pip package. To try it yourself, you can:
Some items worth reviewing:
default.conf
, for example.main.main()
intojiant.__main__
for convenience of runningpython -m jiant
, and to keep necessary code inside thejiant
package. I kept main.py for backward compatibility, and there's some duplicated code, but not too much.