Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable lazy imports by the CLI & clear __init__.py files #56

Merged
merged 5 commits into from
Sep 24, 2018

Conversation

dhimmel
Copy link
Member

@dhimmel dhimmel commented Aug 22, 2018

Enables manubot CLI to be called without running these files and triggering the imports. #53 and #54 also aimed partially to do this but were incomplete.

How I noticed the issue was that when running manubot --version, I got the following output:

/home/dhimmel/anaconda3/envs/manubot-dev/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
v0.1.0

Basically, when we did from manubot.cite.cite_command import add_subparser_cite, this was executing manubot.cite.__init__.py. For manubot.process.__init__.py, this was importing pandas and other heavy dependencies, which:

  1. cause annoying messages like the one above
  2. slow down basic command line calls
  3. increase the change of improperly configured environments leading to unnecessary failure.

This PR is one approach to fix this. However, it has the aesthetically unfortunate consequence of changing the API to now need to specify another submodule, e.g. import manubot.cite.cite. However, it will also allow importing single functions from say manubot.cite.doi without having to run the entire init.py file.

An alternative solution would be to move add_subparser_cite and add_subparser_process to manubot.command.py and change the line parser.set_defaults(function=cli_cite) to use a string to specify the submodule/function, which would then be imported at runtime via importlib.

@agitter what do you think

Enables manubot CLI to be called without running these files and
triggering the imports
@agitter
Copy link
Member

agitter commented Aug 23, 2018

I'm unsure which option is better. The import manubot.cite.cite syntax does look awkward. However, I also appreciate the three problems with the current configuration that you noted above.

Is there anyone in the else in the greenelab we could ping to ask about these organizational questions?

@dhimmel
Copy link
Member Author

dhimmel commented Aug 23, 2018

Is there anyone in the else in the greenelab we could ping to ask about these organizational questions?

Maybe @Miserlou or @kurtwheeler has some wisdom here.

@Miserlou
Copy link

The stuff currently in __init__.py should moved to a util.py. I think you can then put your desired imports in __init__.py to avoid the cite.cite.blah ugliness.

@kurtwheeler
Copy link

kurtwheeler commented Aug 24, 2018

I don't think you could solve this through organizational changes. While import manubot.cite.cite is kinda awkward, I think it may be the best bet. The only thing that I could think of that might kinda do what you want would be to conditionally import things in your __init__.py files like so:

if not IS_SCRIPT:
  from manubot.cite.cite import standardize_citation
  from manubot.cite.cite import is_valid_citation_string
  etc

and then set IS_SCRIPT to true from your management commands. I think this is an uglier solution than import manubot.cite.cite though.

Creates slightly more aesthetic import statements such as
`import manubot.cite.util` rather than `import manubot.cite.cite`.
Submodules are now lazily imported, when using the manubot CLI.
@dhimmel dhimmel changed the title Move code out of __init__.py files Enable lazy imports by the CLI & clear __init__.py files Sep 19, 2018
@dhimmel
Copy link
Member Author

dhimmel commented Sep 19, 2018

Okay, thanks @Miserlou and @kurtwheeler for weighing in. Both comments were helpful:

  • In da90fe6, I renamed the former __init__.py files to util.py. For example, it's now manubot.cite.util rather than manubot.cite.cite. I think it's a bit less confusing when we don't repeat cite.

    If we want functions to be available under manubot.cite rather than manubot.cite.util, we can add these imports to __init__.py. I am thinking we may want this for citation_to_citeproc. However, this will have the undesirable effect of sometimes resulting in additional importing that is not required for a certain user behavior.

  • In c9d82d7, I moved subcommand arparse configuration to manubot.command.py and switched to lazy import of the actual worker function that executes the program. Hence, manubot --version will complete without loading any submodules.

@dhimmel
Copy link
Member Author

dhimmel commented Sep 20, 2018

@agitter do you think we should restore anyimports to manubot.cite or manubot.process? Does this look good to you?

Enables lazy import of citation_retreiver functions.
Define __all__ in manubot.cite.__init__.py to silence flake8
warning as per https://stackoverflow.com/a/49266468/4651668.
@dhimmel
Copy link
Member Author

dhimmel commented Sep 22, 2018

In 4f11fd9, I exposed citation_to_citeproc & standardize_citation under manubot.cite. I reformatted manubot.cite.util to be lazy such that no non-builtin packages are imported unless specific functions are actually called.

Copy link
Member

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Organizationally, these changes look good to me. I didn't test whether or not the lazy loading is working as expected. The design looks nice though.

I also find manubot.cite.util preferable to manubot.cite.cite.

Exposing citation_to_citeproc and standardize_citation should be sufficient for now. If we see other Manubot use cases emerge we can revisit that later.

@dhimmel
Copy link
Member Author

dhimmel commented Sep 24, 2018

I didn't test whether or not the lazy loading is working as expected.

I am pretty sure our test coverage should be sufficient here.

@dhimmel dhimmel merged commit 1148689 into manubot:master Sep 24, 2018
@dhimmel dhimmel deleted the clean__init__ branch September 24, 2018 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants