Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CLOSED] Make jiant pip installable #887

Closed
jeswan opened this issue Sep 17, 2020 · 22 comments
Closed

[CLOSED] Make jiant pip installable #887

jeswan opened this issue Sep 17, 2020 · 22 comments

Comments

@jeswan
Copy link
Contributor

jeswan commented Sep 17, 2020

Issue by davidbenton
Tuesday Aug 13, 2019 at 17:40 GMT
Originally opened as nyu-mll/jiant#887


This should be considered a work-in-progress, not a complete PR. It passes tests, but might break some untested things, and there are a few things I'll flag for review.

Because I'm still getting to know jiant, I chose running the tutorial experiment using a pip-installed package as my goal. This changeset allows that (without requiring conda or manual dependency installation). It doesn't directly address setting up an environment or downloading data using the pip package. To try it yourself, you can:

# create a new python virtual environment (conda not required)
# set up environment variables per tutorial, including pointing to an existing data directory, e.g.
export JIANT_PROJECT_PREFIX=~/jiant/tutorial
export JIANT_DATA_DIR=~/jiant/data

# copy default.conf from repo

# copy tutorial config file content from https://github.com/nyu-mll/jiant/blob/master/tutorials/setup_tutorial.md to tutorial.conf

# install package
pip install --upgrade --index-url https://test.pypi.org/simple/ jiant==1.1.0a0

# run experiment
python -m jiant --config_file tutorial.conf

Some items worth reviewing:

  • A maintainer should check the metadata in setup.py - I took a shot at it...
  • Config was moved to inside the package, per setuptools recommendation, but this could break things. I only tried the tutorial and the test suite.
  • Assuming a distributed package should include only what is required to run typical tasks, I only added package dependencies necessary to run the tutorial. Are there other tasks/tests that should be considered?
  • This doesn't expose any shell or python scripts as CLI commands, but we might want some for downloading data or generating a local copy of default.conf, for example.
  • I moved main.main() into jiant.__main__ for convenience of running python -m jiant, and to keep necessary code inside the jiant package. I kept main.py for backward compatibility, and there's some duplicated code, but not too much.
  • tests and other root-level packages were not included in the package. Does this satisfy the need for a pip package without the other packages? If no, it might make sense to reorganize some directories.

davidbenton included the following code: https://github.com/nyu-mll/jiant/pull/887/commits

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by pep8speaks
Tuesday Aug 13, 2019 at 17:40 GMT


Hello @davidbenton! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 You can repair most issues by installing black and running: black -l 100 ./*. If you contribute often, have a look at the 'Contributing' section of the README for instructions on doing this automatically.

Comment last updated at 2019-08-24 00:19:26 UTC

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Tuesday Aug 13, 2019 at 17:41 GMT


see #877

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Tuesday Aug 13, 2019 at 18:35 GMT


I see I need to get demo.py running to pass CI, which should be straightforward. I have just been running nose2.

The PEP 8 issues aren't picked up by black locally, and I'm not sure why, but I'll clean that up too.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Tuesday Aug 13, 2019 at 18:49 GMT


Okay, minor changes made. crossing my fingers on CI

Another item to note: if these changes are going in the right direction, documentation will need some updates, including change to config paths and pip installation.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by pruksmhc
Tuesday Aug 13, 2019 at 20:17 GMT


Thanks for taking this on! On your questions, most definitely run the superglue baselines and make sure the changes don't break anything from those runs (scripts/superglue-baselines.sh).
I'll defer to @iftenney on the last pip question.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Friday Aug 16, 2019 at 14:03 GMT


Also: We haven't been enforcing the PEP8 issues that don't also pop up in Black, though some of them can still be worth looking at. We should probably turn off the PEP8 bot...

Thanks!

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Friday Aug 16, 2019 at 14:03 GMT


And to confirm, this generally all looks productive/reasonable. LMK when it's ready for a final look.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Monday Aug 19, 2019 at 17:06 GMT


Thanks for review and comments! I'm glad it's going in the right direction.

@pruksmhc I ran the boolq task locally using superglue-baseline.sh. No surprise, but it's a bit slow to run with no GPU, ha! After about four days, it completed without error. That's a successful smoke test, but let me know if all (or other specific) tasks need to be run.

@sleepinyourhat Functionally, I think we have pip-installability complete, but depending what they're doing, users might run into some issues using jiant via pip. Specifically, they won't have easy access to the config files or shell scripts. You can see examples of this in my pip tutorial recipe above, where data and defaults.conf need to be manually fetched/located by the user. There are some options available to us for addressing this, but I don't have good sense of real-world use cases.

Do we need to make it easier for pip users to get config files or use some shell scripts? If yes, is that in scope for #877, or open new issue for it?

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Monday Aug 19, 2019 at 17:27 GMT


That sounds reasonable. This PR seems to be net helpful as long as the standard workflow (clone, create/edit config files, modify code) isn't any harder. We don't have a great case study workflow to try that would involve pip yet, though. Once we start experimenting with flambé and similar tools, the pain points will become clearer.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Tuesday Aug 20, 2019 at 16:19 GMT


After a thorough review, I see that defaults.conf is the only config file ever needed at runtime. Symlinking that into the jiant package would allow us to leave other config where it is, which would greatly simplify the documentation and script changes to this PR.

Is a symlink reasonable to make defaults.conf accessible from old and new paths? I've tested in OS X, and should also work the same in *nix. I believe Windows has full symlink support as of Windows 10, but I don't have an easy way to test.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Tuesday Aug 20, 2019 at 17:32 GMT


I suspect people will often want to name other config files explicitly, both at the command line, and as inheritances within their new config files, so we'll need to make sure that's relatively painless/documented.

I'm fine with symlinks if they're generally supported/safe for multi-platform git projects.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Tuesday Aug 20, 2019 at 19:49 GMT


Symlink will work for most, but I am not confident it works for all Windows users, so I will move all config to the package (as it is in the PR now), update paths, and document. I think I have what I need, and I'll update the PR tomorrow.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Thursday Aug 22, 2019 at 05:30 GMT


Got hung up debugging the CircleCI changes, but I have a pip environment passing tests there now. Not finished with other work, but going to call it a day on this. Will pick this up by the end of the week.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Friday Aug 23, 2019 at 23:22 GMT


Okay, I think this is ready for review @sleepinyourhat.

Note the significant changes to CircleCI, and let me know if you have questions about anything there. I saw in my merge just now that nyu-mll/jiant#897 specified a version of conda, which this wipes out. Not sure if that's a red flag or not.

I updated config paths everywhere, but did not attempt to run every script, etc. I did, as noted above, run one task from superglue-baselines.sh.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Friday Aug 23, 2019 at 23:22 GMT


Oh, and I'll create the documentation PR if the README.md copy is approved.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Saturday Aug 24, 2019 at 01:30 GMT


Great! You'll need to create a PyPI account to upload package distributions, if you don't have one, and then the notes in setup.py should cover it.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by HaokunLiu
Friday Sep 06, 2019 at 12:06 GMT


Hey @davidbenton , I need to verify a new branch in jiant is pip installable, what do I need to do?

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Friday Sep 06, 2019 at 13:19 GMT


Installing git repos with pip should be pretty standard. Roughly, IIRC: Make a clone/copy of the repo, exit any environment/virtualenv you're using that already has jiant set up, and run something like pip install jiant from the directory. Then try running demo.conf and/or any additional code you need to run to verify that the dependencies that you're adding loaded correctly.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Friday Sep 06, 2019 at 15:22 GMT


@HaokunLiu Everything above seems right, but I'd recommend going a step further and creating a new virtual environment to make sure you're not pulling from existing global/user installed dependencies. The CircleCI config.yml steps create a virtual environment and pip install the package from local filesystem, so you can use that as a recipe, if it's helpful.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by HaokunLiu
Friday Sep 06, 2019 at 17:42 GMT


Thanks, I finally did it.
BTW, does what you said mean, any PR that passes the CircleCI check is automatically pip installable?

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by davidbenton
Friday Sep 06, 2019 at 18:29 GMT


Yes, that's right. As pip use cases emerge, it might be necessary to add some more checks, but CircleCI currently does basic package installation via pip, including dependencies.

@jeswan
Copy link
Contributor Author

jeswan commented Sep 17, 2020

Comment by HaokunLiu
Saturday Sep 07, 2019 at 22:19 GMT


@davidbenton I've got another problem. Using RoBERTa tokenizer needs to run "python -m spacy download en". I can manually run this when running experiments. But when I'm using Robert tokenizer in a test script, there is no place for me to ask spacy to download en. Is there any way we can do this in setup, or somewhere else?

@jeswan jeswan closed this as completed Sep 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant