We often have to repeat the same steps for each analysis:
- create a Jupyter notebook and import numpy, pandas, matplotlib, etc.
- add a watermark to log your environment
- count the missing values in the data set
- check the data types and correct them if they are wrong
- train a mathematical model with the usual parameter grid
- draw the ROC curve of the trained model
- etc.
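For illustration, two of the steps above (counting missing values and correcting data types) might look like this in plain pandas. This is a minimal sketch on made-up toy data; the column names are hypothetical and the code does not use this module's own classes.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): "age" was read as strings and some values are missing.
df = pd.DataFrame({
    "age": ["23", "35", None, "41"],
    "income": [52000.0, np.nan, 61000.0, 58000.0],
    "city": ["Berlin", "Hamburg", "Berlin", None],
})

# Count the missing values in each column.
n_missing = df.isna().sum()

# Check the data types, then correct the wrong one.
print(df.dtypes)
df["age"] = pd.to_numeric(df["age"])  # strings become numbers, missing stays missing

# Put both pieces of information side by side.
summary = pd.DataFrame({"dtype": df.dtypes.astype(str), "n_missing": n_missing})
print(summary)
```

Writing these few lines again in every notebook is exactly the kind of repetition that generic helper functions are meant to remove.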
This module provides generic classes and functions which can be applied almost everywhere, together with documentation about
- how to prepare your analysis environment
- hints on useful commands
Supported Python version: 3.7
If you want to use virtualenv, you can create a new environment with the following command:
> python -m venv your_env
In the working directory you will find a directory your_env for the environment. You can activate the environment with
> source your_env/bin/activate
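If you prefer to create the environment from a Python script rather than from the shell, the standard library's venv module offers the same functionality. The following is a minimal sketch; the helper name create_env is hypothetical and not part of this module.

```python
import venv
from pathlib import Path


def create_env(env_dir: str = "your_env", with_pip: bool = True) -> Path:
    """Create a virtual environment, like `python -m venv env_dir` does."""
    # with_pip=True also installs pip into the new environment.
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir).resolve()
```

Calling create_env("your_env") creates the same directory as the command above. Activation still has to happen in your shell, because a Python process cannot change the environment of the shell that started it.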
In requirements.txt you can find a minimal set of libraries for ad hoc analysis. You can install them with
(your_env) > pip install -r requirements.txt
You will probably need more libraries. After installing them, you should record the list of installed libraries with
(your_env) > pip freeze > requirements.txt
jupytext generates a Python script when you create a Jupyter notebook, and keeps the pair synchronized. The main points are:
- You can reconstruct your Jupyter notebook from the paired Python script.
- While it is difficult to understand a change to a Jupyter notebook by reading the raw text, it is easy to understand a change to a Python script.
In short, jupytext makes it easy to manage notebooks in a git repository.
With the watermark library you can record your environment in your notebook concisely. Write the following two lines in a cell and run it.
%load_ext watermark
%watermark -v -n -m -p numpy,scipy,sklearn,pandas,matplotlib,seaborn
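The magic commands above work only inside IPython or Jupyter. For a plain Python script, a rough substitute can be built from the standard library. The sketch below is an assumption about what you might want to log, not a reimplementation of watermark; the function name describe_environment is hypothetical.

```python
import importlib
import platform


def describe_environment(packages):
    """Return the Python version, the platform and the versions of the given packages."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Not every package exposes __version__, so fall back to "unknown".
            info[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            info[name] = "not installed"
    return info


for key, value in describe_environment(["numpy", "pandas", "matplotlib"]).items():
    print(f"{key:<12}: {value}")
```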
pip install https://github.com/stdiff/adhoc/archive/v0.4.zip
Note that this library is not registered on PyPI, so the line
adhoc==0.4
in requirements.txt raises an error. To avoid this error, you can put the following line in its place:
git+git://github.com/stdiff/adhoc.git@v0.4#egg=adhoc
There are some useful extensions for JupyterLab.
- If you do not know the path to jupyter_notebook_config.py, the jupyter --paths command shows the directories where you might find the config file.
- If you have not created it yet, execute jupyter notebook --generate-config.
- jupyter labextension list shows the list of installed Jupyter extensions.
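The search for the config file can also be done from Python. This is a minimal sketch assuming that the jupyter_core package is installed (it is a dependency of Jupyter); its jupyter_config_path function returns the config directories that jupyter --paths prints.

```python
from pathlib import Path

from jupyter_core.paths import jupyter_config_path

CONFIG_NAME = "jupyter_notebook_config.py"

# One candidate location per directory on the Jupyter config search path.
candidates = [Path(directory) / CONFIG_NAME for directory in jupyter_config_path()]

for path in candidates:
    status = "found" if path.is_file() else "missing"
    print(f"{status:<8}{path}")
```

If every candidate is reported as missing, the config file has not been generated yet.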
Spellchecker works on Markdown cells and highlights misspelled words, but it does not correct them.
This extension makes it easy to use notebook templates, so you do not need to type the same import statements every time.
Note that the extension looks for template files under the subdirectories of the specified directories.
With DrawIO you can draw diagrams easily.
NB. This might not work because of this issue.