<a href="https://colab.research.google.com/github/krixik-ai/krixik-docs/blob/main/docs/system/pipeline_creation/saving_and_loading_pipelines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import os
import sys
import json
import importlib
from pathlib import Path

# demo setup - including secrets instantiation, requirements installation, and path setting
if os.getenv("COLAB_RELEASE_TAG"):
    # if running this notebook in Google Colab - make sure to enter your secrets
    MY_API_KEY = "YOUR_API_KEY_HERE"
    MY_API_URL = "YOUR_API_URL_HERE"

    # if running this notebook on Google Colab - install requirements and pull required subdirectories
    # install Krixik python client
    !pip install krixik

    # install github clone - allows for easy cloning of subdirectories from docs repo: https://github.com/krixik-ai/krixik-docs
    !pip install github-clone

    # clone datasets
    if not Path("data").is_dir():
        !ghclone https://github.com/krixik-ai/krixik-docs/tree/main/data
    else:
        print("docs datasets already cloned!")

    # define data dir
    data_dir = "./data/"

    # create output dir
    Path(data_dir, "output").mkdir(parents=True, exist_ok=True)

    # pull utilities
    if not Path("utilities").is_dir():
        !ghclone https://github.com/krixik-ai/krixik-docs/tree/main/utilities
    else:
        print("docs utilities already cloned!")
else:
    # if running local pull of docs - set paths relative to local docs structure
    # import utilities
    sys.path.append("../../../")

    # define data_dir
    data_dir = "../../../data/"

    # if running this notebook locally from Krixik docs repo - load secrets from a .env placed at the base of the docs repo
    from dotenv import load_dotenv

    load_dotenv("../../../.env")

    MY_API_KEY = os.getenv("MY_API_KEY")
    MY_API_URL = os.getenv("MY_API_URL")


# load in reset
reset = importlib.import_module("utilities.reset")
reset_pipeline = reset.reset_pipeline


# import Krixik and initialize it with your personal secrets
from krixik import krixik

krixik.init(api_key=MY_API_KEY, api_url=MY_API_URL)

SUCCESS: You are now authenticated.


## Saving and Loading Pipelines
[🇨🇴 Versión en español de este documento](https://krixik-docs.readthedocs.io/es-main/sistema/creacion_de_pipelines/guardar_y_cargar_pipelines/)

This overview of saving and loading pipelines is divided into the following sections:

- [The `save_pipeline` Method](#the-save_pipeline-method)
- [The `load_pipeline` Method](#the-load_pipeline-method)
- [The `reset_pipeline` Function](#the-reset_pipeline-function)

### The `save_pipeline` Method

Saving your pipeline in Krixik means *saving its [configuration](pipeline_config.md)* to disk.

You can save the [configuration](pipeline_config.md) of a pipeline by using the `save_pipeline` method. This method takes one (required) argument:

- `config_path`: A valid local file path.

`config_path` must end with a `.yml` or `.yaml` extension; YAML is currently the only file format Krixik saves pipelines in.
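
A quick local check of the extension rule can catch a typo before you call `save_pipeline`. The sketch below is a hypothetical helper (`has_valid_config_extension` is not part of Krixik) that only inspects the file suffix. As an assumption, it treats the match as case-sensitive, since the text above doesn't say whether `.YAML` or `.YML` are accepted.

```python
from pathlib import Path

# Extensions the docs say Krixik accepts for pipeline config files
ALLOWED_SUFFIXES = {".yml", ".yaml"}


def has_valid_config_extension(config_path: str) -> bool:
    """Hypothetical helper: True if config_path ends in .yml or .yaml.

    The comparison is case-sensitive by assumption; Krixik's handling of
    upper-case extensions is not documented here.
    """
    return Path(config_path).suffix in ALLOWED_SUFFIXES


print(has_valid_config_extension("pipeline_configs/save-pipeline-demo.yaml"))  # True
print(has_valid_config_extension("pipeline_configs/save-pipeline-demo.json"))  # False
```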

To demonstrate how it works, first you'll need to create a pipeline with the [`create_pipeline`](create_pipeline.md) method:

In [2]:
# first create a pipeline
pipeline = krixik.create_pipeline(
    name="saving_and_loading_pipelines_1_summarize_summarize_keyword-db", module_chain=["summarize", "summarize", "keyword-db"]
)

Now that you have a pipeline, you can use the `save_pipeline` method to save its configuration to disk:

In [3]:
# save a pipeline's configuration to disk - example file path provided
pipeline.save_pipeline(config_path=data_dir + "pipeline_configs/save-pipeline-demo.yaml")

For your convenience, if a file by the given filename does not exist at the given location, Krixik will locally create the file and then save your pipeline into it.
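
The paragraph above covers a missing *file*; it doesn't say whether missing parent *directories* are created as well. If you're unsure, a defensive option is to create the folder yourself before saving. `ensure_config_dir` below is a hypothetical helper built only on the standard library, not a Krixik function.

```python
from pathlib import Path


def ensure_config_dir(config_path: str) -> Path:
    """Hypothetical helper: create the parent folder of config_path if needed.

    Returns the path as a Path object; pass str(path) on to save_pipeline.
    """
    path = Path(config_path)
    # parents=True creates intermediate folders; exist_ok=True makes repeat calls harmless
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Possible usage in this notebook (not executed here):
# config_path = ensure_config_dir(data_dir + "pipeline_configs/save-pipeline-demo.yaml")
# pipeline.save_pipeline(config_path=str(config_path))
```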

### The `load_pipeline` Method

Given that a pipeline's [configuration](pipeline_config.md) is its fundamental descriptor, any valid config file can be loaded into Krixik, thus reinstantiating its associated pipeline.

The `load_pipeline` method takes a single (required) argument:

- `config_path`: A valid local file path.

For the `load_pipeline` method to work, the file indicated by `config_path` must (a) exist, (b) have a `.yaml` or `.yml` extension, and (c) hold a properly formatted Krixik pipeline [configuration](pipeline_config.md). If any of these conditions is not met, the method will fail. If you previously [saved](#the-save_pipeline-method) a Krixik pipeline to that path, it should load without issue.
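
Conditions (a) and (b) are easy to check locally before calling `load_pipeline`. Condition (c) really depends on Krixik's own parsing, so the hypothetical `preflight_config_path` helper below (not a Krixik function) deliberately checks only the first two and returns any problems as a list instead of raising.

```python
from pathlib import Path


def preflight_config_path(config_path: str) -> list[str]:
    """Hypothetical helper: check conditions (a) and (b) for load_pipeline.

    Condition (c), a properly formatted Krixik configuration, is not checked;
    load_pipeline itself decides that. An empty list means (a) and (b) hold.
    """
    path = Path(config_path)
    problems = []
    # (a) the file must exist
    if not path.is_file():
        problems.append(f"(a) no file found at {path}")
    # (b) the extension must be .yml or .yaml
    if path.suffix not in {".yml", ".yaml"}:
        problems.append(f"(b) extension {path.suffix!r} is not .yml or .yaml")
    return problems


# Possible usage in this notebook (not executed here):
# problems = preflight_config_path(data_dir + "pipeline_configs/save-pipeline-demo.yaml")
# if not problems:
#     pipeline = krixik.load_pipeline(config_path=data_dir + "pipeline_configs/save-pipeline-demo.yaml")
```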

Using the `load_pipeline` method looks like this:

In [5]:
# load a pipeline into memory via its valid configuration file
pipeline = krixik.load_pipeline(config_path=data_dir + "pipeline_configs/save-pipeline-demo.yaml")

Note that you don't need to have created or saved the pipeline yourself. For instance, a colleague may have shared a pipeline [configuration](pipeline_config.md) file with you, or you may have written the file from scratch. As long as the config is properly formatted, the `load_pipeline` method will work as expected.

### The `reset_pipeline` Function

The `load_pipeline` method discussed above reinstantiates a previously existing pipeline with the same `name` and `module_chain`. Since files processed through a pipeline are attached to the pipeline's `name`, those files would continue to be attached to this newly instantiated pipeline.
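
The consequence of attaching files to a pipeline's `name` can be illustrated with a deliberately simplified, in-memory model. Nothing here talks to Krixik: `ToyStore`, `ToyPipeline`, and the file name `report.pdf` are hypothetical stand-ins that only mimic the behavior described above, namely that processed files are looked up by pipeline `name`, so a reloaded pipeline with the same `name` sees the same files.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for server-side storage: files keyed by pipeline name."""
    files_by_name: dict = field(default_factory=dict)

    def process(self, pipeline_name: str, filename: str) -> None:
        # Record that a file was processed through the pipeline with this name
        self.files_by_name.setdefault(pipeline_name, []).append(filename)

    def list_files(self, pipeline_name: str) -> list:
        return list(self.files_by_name.get(pipeline_name, []))


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline object: only name and module_chain."""
    name: str
    module_chain: list


store = ToyStore()
original = ToyPipeline("demo", ["summarize", "summarize", "keyword-db"])
store.process(original.name, "report.pdf")

# "Reloading" builds a new object from the same config, so it has the same name...
reloaded = ToyPipeline("demo", ["summarize", "summarize", "keyword-db"])
# ...and because lookups go by name, the earlier file is still attached
print(store.list_files(reloaded.name))  # ['report.pdf']
```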

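If you want to recreate a pipeline but start from a blank slate, the easiest way is the `reset_pipeline` function, which deletes all processed datapoints attached to that pipeline (i.e., anything relating to files previously processed through it). Note that `reset_pipeline` is a helper from the docs' `utilities` module, loaded in the setup cell at the top of this notebook, rather than a method of the pipeline object.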

The `reset_pipeline` function takes one argument (required):

- `pipeline`: The Python variable that the pipeline object is currently assigned to.

Note that this is _not_ the `name` of the pipeline. For instance, to reset the pipeline loaded in the `load_pipeline` example code above, you would set the `pipeline` argument of the `reset_pipeline` function to the variable `pipeline`, as follows:

In [6]:
# delete all processed datapoints belonging to this pipeline
reset_pipeline(pipeline)

In other words, the `pipeline` argument to the `reset_pipeline` function is a Python variable that a pipeline object has been assigned to, and `reset_pipeline` will delete any datapoints associated with that pipeline object's `name` on the Krixik system.
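
A deliberately simplified, in-memory model makes the distinction concrete. It is purely hypothetical: `files_by_name` and `toy_reset_pipeline` are stand-ins (the real `reset_pipeline` lives in the docs' `utilities.reset` module and acts on the Krixik system). The function receives an object and reads its `name`; the variable name you chose in your own code plays no role.

```python
from types import SimpleNamespace

# Hypothetical stand-in for server-side storage: processed files keyed by pipeline name
files_by_name = {"demo": ["report.pdf"], "other": ["notes.txt"]}


def toy_reset_pipeline(pipeline) -> None:
    """Hypothetical stand-in for reset_pipeline: deletes everything stored under pipeline.name."""
    files_by_name.pop(pipeline.name, None)


# The variable is called loaded_copy, but only the object's name ("demo") matters
loaded_copy = SimpleNamespace(name="demo", module_chain=["summarize", "summarize", "keyword-db"])
toy_reset_pipeline(loaded_copy)

print(files_by_name)  # {'other': ['notes.txt']}
```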
