Skip to content
/ soorgeon Public
forked from ploomber/soorgeon

Convert monolithic Jupyter notebooks πŸ“™ into maintainable Ploomber pipelines. πŸ“Š

License

Notifications You must be signed in to change notification settings

rrhg/soorgeon

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Soorgeon

Join our community | Newsletter | Contact us | Blog | Website | YouTube

header

Convert monolithic Jupyter notebooks into Ploomber pipelines.

soorgeon.mp4

3-minute video tutorial.

Try the interactive demo:

Open JupyerLab

Note: Soorgeon is in alpha, help us make it better.

Install

Compatible with Python 3.7 and higher.

pip install soorgeon

Usage

[Optional] Testing if the notebook runs

Before refactoring, you can optionally test if the original notebook or script runs without exceptions:

# works with ipynb files
soorgeon test path/to/notebook.ipynb

# and notebooks in percent format
soorgeon test path/to/notebook.py

Optionally, set the path to the output notebook:

soorgeon test path/to/notebook.ipynb path/to/output.ipynb

soorgeon test path/to/notebook.py path/to/output.ipynb

Refactoring

To refactor your notebook:

# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv
# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py

# use alternative serializer (cloudpickle or dill) if notebook 
# contains variables that cannot be serialized using pickle 
soorgeon refactor nb.ipynb --serializer cloudpickle
soorgeon refactor nb.ipynb --serializer dill

To learn more, check out our guide.

Cleaning

Soorgeon has a clean command that applies black for .ipynb and .py files:

soorgeon clean path/to/notebook.ipynb

or

soorgeon clean path/to/script.py

Linting

Soorgeon has a lint command that can apply [flake8]:

soorgeon lint path/to/notebook.ipynb

or

soorgeon lint path/to/script.py

Examples

git clone https://github.com/ploomber/soorgeon

Exploratory data analysis notebook:

cd soorgeon/examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

Machine learning notebook:

cd soorgeon/examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

To learn more, check out our guide.

Community

About Ploomber

Ploomber is a big community of data enthusiasts pushing the boundaries of Data Science and Machine Learning tooling.

Whatever your skillset is, you can contribute to our mission. So whether you're a beginner or an experienced professional, you're welcome to join us on this journey!

Click here to know how you can contribute to Ploomber.

Telemetry

We collect anonymous statistics to understand and improve usage. For details, see here

About

Convert monolithic Jupyter notebooks πŸ“™ into maintainable Ploomber pipelines. πŸ“Š

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%