There are a number of cases where your data scientists have developed some Jupter notebooks and want to run them weekly for data analysis and generating reports. In such cases, it becomes important to create a worflow and run the notebooks in an automated fashion.
You can achive this goal using papermill and airflow.
To learn more about papermill librray, visit their papermill repository.
Setting up your pmill conda environment: conda create -n pmill python=3.6
To activate this environment, run: conda activate pmill
To deactivate an active environment, run: conda deactivate
If you are experiencing "AttributeError: module 'enum' has no attribute 'IntFlag'" error, run: unset PYTHONPATH
You need to install papermill in Python 3 kernel. See notebooks/setup_environment.ipynb.
In this demo, we run a sample notebook to which we just pass two parameters and simply print them. After running the following command, check the notebooks/output.ipynb to view the output of your notebook run.
Note that we need to explicitly pass the kernel (i.e. python3) when running the notebook using papermill library:
papermill notebooks/run_me.ipynb notebooks/output.ipynb -p alpha 0.6 -p l1_ratio 0.1 -k python3