stdiff/adhoc

adhoc

  • Build Status: master, dev
  • Code Coverage (codecov): master, dev

Goal of this repository/library

We often have to do the same thing for each analysis.

  • create a Jupyter notebook and import numpy, pandas, matplotlib, etc.
  • add a watermark to log your environment
  • count the missing values in the data set
  • check the data types and correct them if they are wrong
  • train a model with the usual parameter grid
  • draw the ROC curve of the trained model
  • etc.
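
As an illustration of one such routine step, the missing-value count can be sketched without any dependency. This is not the API of this library; the function name count_missing and the set of markers treated as missing are assumptions made for the example. With pandas loaded, `df.isna().sum()` gives the same kind of per-column count.

```python
import csv
import io

# Assumed markers for a missing value; adjust them to your data set.
MISSING = {"", "NA", "NaN", "null"}


def count_missing(rows, missing=MISSING):
    """Count missing values per column in an iterable of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            # csv.DictReader yields None for fields absent from a short row.
            is_missing = value is None or value.strip() in missing
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts


sample = io.StringIO("age,city\n31,Berlin\n,Hamburg\nNA,\n")
print(count_missing(csv.DictReader(sample)))  # {'age': 2, 'city': 1}
```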

This module provides generic classes and functions that can be applied almost everywhere, together with documentation about

  • how to prepare your analysis environment
  • hints on useful commands

Read Notebooks on nbviewer

Setup: Python

Supported Python version: 3.7

If you want to use a virtual environment, you can create one with the following command:

> python -m venv your_env

This creates a directory your_env for the environment in the working directory. You can activate the environment with

> source your_env/bin/activate
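
The environment can also be created from Python with the standard-library venv module, which is the module that `python -m venv` runs. The following is a minimal sketch: the helper name create_env is hypothetical, and with_pip=False is an assumption made here to keep the example fast (the command line installs pip by default).

```python
import venv
from pathlib import Path


def create_env(parent: Path, name: str = "your_env") -> Path:
    """Create a virtual environment called `name` under `parent`."""
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip; pass with_pip=True to mirror
    # the default behaviour of `python -m venv`.
    venv.create(env_dir, with_pip=False)
    return env_dir
```

Calling `create_env(Path.cwd())` gives the same directory layout as the command above, including the pyvenv.cfg file at the root of the environment.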

Libraries

You can find a minimal set of libraries for ad hoc analysis in requirements.txt.

(your_env) > pip install -r requirements.txt

You will probably need more libraries. After installing them, record the list of installed libraries with

(your_env) > pip freeze > requirements.txt

jupytext

jupytext generates a Python script when you create a Jupyter notebook and keeps the pair synchronized. The main points are:

  • You can reconstruct your Jupyter notebook from the paired Python script.
  • A change to a Jupyter notebook is difficult to understand from the raw text, whereas a change to a Python script is easy to read.

In short, jupytext makes it easy to manage notebooks in a git repository.
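
The difference in diff size can be illustrated with the standard library alone. The sketch below does not call jupytext: the notebook is a hand-built minimal nbformat 4 document, the script is a hand-written imitation of the percent format (`# %%` cell markers), and all function names are hypothetical. It applies the same one-line edit to both and counts the changed lines.

```python
import difflib
import json


def make_notebook(value: int, run: int) -> str:
    """Serialize a one-cell notebook (nbformat 4) that prints `value`."""
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": run,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [f"{value}\n"]}
                ],
                "source": [f"x = {value}\n", "print(x)"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    return json.dumps(notebook, indent=1)


def make_script(value: int) -> str:
    """The same cell, written by hand in the percent format."""
    return f"# %%\nx = {value}\nprint(x)\n"


def changed_lines(old: str, new: str) -> list:
    """Return the added and removed lines of a unified diff, without headers."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# The same edit (x = 1 becomes x = 2, cell re-executed) in both representations.
notebook_changes = changed_lines(make_notebook(1, run=1), make_notebook(2, run=2))
script_changes = changed_lines(make_script(1), make_script(2))
print(len(notebook_changes), "changed lines in the .ipynb")
print(len(script_changes), "changed lines in the .py script")
```

The notebook diff also picks up the execution counter and the stored output, which is the noise that the paired script avoids.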

watermark

With this library you can briefly record your environment in your notebook. Write the following two lines in a cell and run it.

%load_ext watermark
%watermark -v -n -m -p numpy,scipy,sklearn,pandas,matplotlib,seaborn
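
For scripts that run outside a notebook, similar information can be collected with the standard library. This is a rough approximation and not how watermark is implemented; the function name environment_summary is hypothetical, and importlib.metadata requires Python 3.8 or later. Note that importlib.metadata looks up distribution names, so scikit-learn is queried as "scikit-learn" rather than "sklearn".

```python
import platform
from importlib import metadata


def environment_summary(packages):
    """Return Python and machine info plus the version of each package."""
    summary = {
        "python": platform.python_version(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            summary[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report packages that are absent instead of failing.
            summary[name] = "not installed"
    return summary


for key, value in environment_summary(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{key}: {value}")
```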

Install "adhoc"

pip install https://github.com/stdiff/adhoc/archive/v0.4.zip

Note that this library is not registered on PyPI, so the line

adhoc==0.4

in requirements.txt raises an error. To avoid this error, use the following line instead:

git+https://github.com/stdiff/adhoc.git@v0.4#egg=adhoc

Setup: JupyterLab

There are some useful extensions for JupyterLab.

  • If you do not know the path to jupyter_notebook_config.py, the jupyter --paths command shows the directories where the config file might be.
  • If you have not created it yet, run jupyter notebook --generate-config.
  • jupyter labextension list shows the installed JupyterLab extensions.
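
Once jupyter --paths has printed the candidate directories, checking them for the config file can be sketched as follows. The helper name find_config is hypothetical, and the directories in the usage line are only examples of what the command might print on your machine.

```python
from pathlib import Path

CONFIG_NAME = "jupyter_notebook_config.py"


def find_config(directories, name=CONFIG_NAME):
    """Return the first existing `name` found in `directories`, or None."""
    for directory in directories:
        candidate = Path(directory).expanduser() / name
        if candidate.is_file():
            return candidate
    return None


# Example directories (assumed); replace them with the output of `jupyter --paths`.
print(find_config(["~/.jupyter", "/etc/jupyter"]))
```

A result of None means the file has not been generated in any of the listed directories.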

Spell checker

The spell checker works on Markdown cells and highlights misspelled words, but it does not correct them.

JupyterLab Template

This extension makes it very easy to use a notebook template, so you do not need to type the same import statements every time.

Note that the extension looks for template files in the subdirectories of the specified directories.

DrawIO

With DrawIO you can draw diagrams easily.

NB. This might not work because of this issue.

Useful references

pandas

matplotlib

seaborn

scikit-learn

About

Helper classes and functions for an ad hoc analysis with/without machine learning
