Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review dependencies #53

Closed
gentzkow opened this issue Apr 7, 2022 · 15 comments · Fixed by #55
Closed

Review dependencies #53

gentzkow opened this issue Apr 7, 2022 · 15 comments · Fixed by #55
Assignees

Comments

@gentzkow
Copy link
Owner

gentzkow commented Apr 7, 2022

Take a quick look at our conda_env.yaml and flag any dependencies you think we could omit (either in the sense that they're not being used at all or in the sense that we could get rid of them and tweak the other code without much loss.

@szahedian
Copy link
Contributor

Hi @gentzkow! I've taken a look at the dependencies. In order for python3 run_all.py to run, all of the python dependencies in conda_env.yaml are required. They are either used by the /data and /analysis scripts in template, or are used by gslab_make.

We can, however, remove the R dependencies without issue (recall that the base template only uses python after we moved the R code to /extensions/r in #48).

@gentzkow
Copy link
Owner Author

gentzkow commented Apr 8, 2022

Thanks!

Two questions:

  1. Can we make jupyter something you only install if you want to use the Jupyter functionality of gslab_make? (E.g., if you try to run make_jupyter you get an error asking you to add that to the install?) Seems parallel to other gslab_make functions like matlab, perl, etc. where we don't require the user to have that software installed to build the template.

  2. Can you remind me how we're using future? Can we update to get rid of this or is it needed to maintain backward compatibility to Python 2 somehow?

m

@szahedian
Copy link
Contributor

Thanks for these questions, @gentzkow.

  1. It seems jupyter is required for the following reason. In run_all.py in template, we load GSLab make here. That call invokes gslab_make/__init__.py, which imports run_jupyter(). So it's not that jupyter functionality is directly used in template, only that in importing gslab_make we also load its jupyter-dependent portion.

    The difference between run_jupyter() and something like run_matlab(), is that the latter doesn't import python modules for execution. You can see here that run_matlab() relies instead of program directives. By contrast, run_jupyter() in this line uses nbconvert.preprocessors.ExecutePreprocessor().

    I've looked into ways to run jupyter notebooks like .py scripts, and it seems they either resemble the approach we have here or use the jupyter shell command. So I'm not optimistic we can eliminate this dependency.

  2. future is used in gslab_make. It's imported in many of the scripts that are used in make.py e.g. update_exectuables and write_source_logs(). Anecdotally, these scripts import a host of functions from future, but only seem to use future.utils.raise_from().

    We could probably eliminate many of the future import statements throughout gslab_make(). To eliminate it wholly we would need some substitute for raise_from(). I could spend a bit more time looking into this if we think it's worthwhile.

@gentzkow
Copy link
Owner Author

Great. Thanks!

For (1), can we revise the setup to take the import out of __init__.py and instead have it manually added to the run_all.py or make.py scripts in the case where the user wants to run Jupyter? Or in some other way create some additional steps that allow this to only be loaded if the user affirmatively wants it?

For (2), if you're up for it, can you open a task for yourself to clean those future calls out of gslab_make? If it ends up looking more involved than a day or so's work you can let me know and we can try to find someone else to help out.

@szahedian
Copy link
Contributor

Hi @gentzkow!

  1. The jupyter module is required so long as it is imported in a .py that __init__.py imports from. Even if __init__.py no longer imports run_jupyter(), it imports other functions in run_program.py. And because run_program.py imports the jupyter module, the problem persists.

    1. One solution is to move run_jupyter() into its own script e.g. run_dependent_program.py, remove the import of jupyter from run_program.py, and manually import run_jupyter() in the make scripts.
    2. Another solution is to leave the run scripts in run_program.py and move the import statement for all run_program.py functions into the make scripts. This won't require

    Note that an issue with both these approaches is that they require a manual import. Any manually imported function would be run without the gs. prefix, like the other gslab_make functions. If you like, I can spend some more time checking if there is a way to get around this problem using importlib.

    A related question is the manner in which we update gslab_make for use in template. We could update master with our changes. Or we could fork a branch gentzkow_template in gslab_make that our lib/gslab_make submodule would point to. This way we leave in place the default behavior of __init__.py importing all functions.

  2. I've created Eliminate future dependency from package gslab-econ/gslab_make#49 for deleting future from gslab_make.

@gentzkow
Copy link
Owner Author

OK. Sounds like more hassle than it's worth. Let's not worry about making any changes related to jupyter.

@szahedian
Copy link
Contributor

Sounds good, @gentzkow. Once gslab-econ/gslab_make#49 closes, we can remove future from conda_env.yaml and wrap this issue.

@gentzkow
Copy link
Owner Author

gentzkow commented May 4, 2022

@szahedian Checking in on status here. Thanks!

@szahedian
Copy link
Contributor

Thanks for checking in, @gentzkow! I've posted an update for you in gslab-econ/gslab_make#49.

@gentzkow
Copy link
Owner Author

gentzkow commented May 6, 2022

Thanks! I'd vote we wrap this issue and continue work in gslab_make#49.

@szahedian
Copy link
Contributor

@gentzkow we will definitely continue with gslab-econ/gslab_make#49. I think we should keep this open until that issue closes, so that we can remove future from conda_env.yaml.

@gentzkow
Copy link
Owner Author

gentzkow commented May 6, 2022 via email

@szahedian
Copy link
Contributor

@gentzkow with the merged PR for gslab-econ/gslab_make#49 and the latest commit above, work on this issue is complete. Please see 50d1435 for the full details. In brief, we removed R dependencies, pip, and future from conda_env.yaml. I'm still able to run run_all.py.

If this looks good to you, I can go ahead and merge these changes!

@gentzkow
Copy link
Owner Author

gentzkow commented May 11, 2022 via email

@szahedian
Copy link
Contributor

Summary: In this issue, we removed pip, future, and R dependencies from conda_env.yaml.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants