<a href="https://colab.research.google.com/github/krixik-ai/krixik-docs/blob/main/docs/system/file_system/update_method.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import os
import sys
import json
import importlib
from pathlib import Path

# demo setup - including secrets instantiation, requirements installation, and path setting
if os.getenv("COLAB_RELEASE_TAG"):
    # if running this notebook in Google Colab - make sure to enter your secrets
    MY_API_KEY = "YOUR_API_KEY_HERE"
    MY_API_URL = "YOUR_API_URL_HERE"

    # if running this notebook on Google Colab - install requirements and pull required subdirectories
    # install Krixik python client
    !pip install krixik

    # install github clone - allows for easy cloning of subdirectories from docs repo: https://github.com/krixik-ai/krixik-docs
    !pip install github-clone

    # clone datasets
    if not Path("data").is_dir():
        !ghclone https://github.com/krixik-ai/krixik-docs/tree/main/data
    else:
        print("docs datasets already cloned!")

    # define data dir
    data_dir = "./data/"

    # create output dir
    from pathlib import Path

    Path(data_dir + "/output").mkdir(parents=True, exist_ok=True)

    # pull utilities
    if not Path("utilities").is_dir():
        !ghclone https://github.com/krixik-ai/krixik-docs/tree/main/utilities
    else:
        print("docs utilities already cloned!")
else:
    # if running local pull of docs - set paths relative to local docs structure
    # import utilities
    sys.path.append("../../../")

    # define data_dir
    data_dir = "../../../data/"

    # if running this notebook locally from Krixik docs repo - load secrets from a .env placed at the base of the docs repo
    from dotenv import load_dotenv

    load_dotenv("../../../.env")

    MY_API_KEY = os.getenv("MY_API_KEY")
    MY_API_URL = os.getenv("MY_API_URL")


# load in reset
reset = importlib.import_module("utilities.reset")
reset_pipeline = reset.reset_pipeline


# import Krixik and initialize it with your personal secrets
from krixik import krixik

krixik.init(api_key=MY_API_KEY, api_url=MY_API_URL)

SUCCESS: You are now authenticated.


## The `update` Method

You can update any metadata of any processed file by using the `update` method.

This overview of the `update` method is divided into the following sections:

- [`update` Method Arguments](#update-method-arguments)
- [`update` Method Example](#update-method-example)
- [Observations on the `update` Method](#observations-on-the-update-method)

### `update` Method Arguments

The `update` method takes one required argument and at least one of several optional arguments:

- `file_id` (required, str) - The `file_id` of the file whose metadata you wish to update.

- `expire_time` (optional, int) - The amount of time (in seconds) that file data will remain on Krixik servers, counting as of when the `update` method is run.

- `symbolic_directory_path` (optional, str) - A UNIX-formatted directory path under your account in the Krixik system.

- `file_name` (optional, str) - A custom file name that must end with the file extension of the original input file. **You cannot update the file extension.**

- `symbolic_file_path` (optional, str) - A combination of `symbolic_directory_path` and `file_name` in a single argument.

- `file_tags` (optional, list) - A list of custom file tags (each a key-value pair). Note that you must update the whole set, so if a file has three file tags and you update one of them, entirely excluding the other two from the `update` method `file_tags` argument, both of those will be deleted.

- `file_description` (optional, str) - A custom file description.

If none of the optional arguments are present, the `update` method will not work because there will be nothing to update.

### `update` Method Example

For this document's example we will use a pipeline consisting of a single [`parser`](../../modules/support_function_modules/parser_module.md) module.  We use the [`create_pipeline`](../pipeline_creation/create_pipeline.md) method to instantiate the pipeline, and then process a file through it:

In [None]:
# create an example pipeline with a single parser module
pipeline = krixik.create_pipeline(name="update_method_1_parser", module_chain=["parser"])

# process short input file
process_output = pipeline.process(
    local_file_path=data_dir + "input/frankenstein_very_short.txt",  # the initial local filepath where the input JSON file is stored
    local_save_directory=data_dir + "output",  # save output repo data output subdir
    expire_time=60 * 30,  # process data will be deleted from the Krixik system in 30 minutes
    wait_for_process=True,  # wait for process to complete before returning IDE control to user
    verbose=False,  # do not display process update printouts upon running code
    symbolic_directory_path="/novels/gothic",
    file_name="Draculas.txt",
    file_tags=[{"author": "Shelley"}, {"category": "gothic"}, {"century": "19"}],
)

Let's see what the file's record looks like with the [`list`](list_method.md) method:

In [3]:
# see the file's record with
list_output = pipeline.list(symbolic_directory_paths=["/novels/gothic"])

# nicely print the output of this
print(json.dumps(list_output, indent=2))

{
  "status_code": 200,
  "request_id": "6ccb47ec-574d-4d48-9dcb-7d0fe91f23b8",
  "message": "Successfully returned 1 item.  Note: all timestamps in UTC.",
  "items": [
    {
      "last_updated": "2024-06-05 15:20:51",
      "process_id": "d99653b8-d16a-981a-41ab-1a2a86e99e9f",
      "created_at": "2024-06-05 15:20:51",
      "file_metadata": {
        "modules": {
          "module_1": {
            "parser": {
              "model": "sentence"
            }
          }
        },
        "modules_data": {
          "module_1": {
            "parser": {
              "data_files_extensions": [
                ".json"
              ],
              "num_lines": 26
            }
          }
        }
      },
      "file_tags": [
        {
          "author": "shelley"
        },
        {
          "category": "gothic"
        },
        {
          "century": "19"
        }
      ],
      "file_description": "",
      "symbolic_directory_path": "/novels/gothic",
      "pipeline": "up

We can use the `update` method to update the file's metadata.

We'll update its `file_name`, since it's erroneous, change the `{"category": "gothic"}` file tag for something different, and add a `file_description`. We'll leave its `symbolic_directory_path` untouched.

In [4]:
# update metadata the metadata for the processed file
update_output = pipeline.update(
    file_id=process_output["file_id"],
    file_name="Frankenstein.txt",
    file_tags=[{"author": "Shelley"}, {"country": "UK"}, {"century": "19"}],
    file_description="Is the villain the monster or the doctor?",
)

# nicely print the output of this update
print(json.dumps(process_output, indent=2))

INFO: lower casing file_name Frankenstein.txt to frankenstein.txt
INFO: lower casing file tag {'author': 'Shelley'} to {'author': 'shelley'}
INFO: lower casing file tag {'country': 'UK'} to {'country': 'uk'}
{
  "status_code": 200,
  "pipeline": "update_method_1_parser",
  "request_id": "3c82cde3-dc63-431f-bf68-1a33b998c272",
  "file_id": "e53d3c35-6f4c-466f-ab7c-6971a6312a09",
  "message": "SUCCESS - output fetched for file_id e53d3c35-6f4c-466f-ab7c-6971a6312a09.Output saved to location(s) listed in process_output_files.",
  "process_output": [
    {
      "snippet": "\ufeffLetter 1\n\n_To Mrs. Saville, England._\n\n\nSt. Petersburgh, Dec. 11th, 17\u2014.",
      "line_numbers": [
        1,
        2,
        3,
        4,
        5,
        6
      ]
    },
    {
      "snippet": "You will rejoice to hear that no disaster has accompanied the\ncommencement of an enterprise which you have regarded with such evil\nforebodings.",
      "line_numbers": [
        7,
        8,
        9,

Now we invoke the [`list`](list_method.md) method to confirm that all metadata has indeed been updated as requested:

In [5]:
# call  to see the file's newly updated record
list_output = pipeline.list(symbolic_file_paths=["/novels/gothic/Frankenstein.txt"])

# nicely print the output of this
print(json.dumps(list_output, indent=2))

{
  "status_code": 200,
  "request_id": "396b969f-2aeb-4952-bd6a-67e1ccca14f1",
  "message": "Successfully returned 1 item.  Note: all timestamps in UTC.",
  "items": [
    {
      "last_updated": "2024-06-05 15:21:10",
      "process_id": "d99653b8-d16a-981a-41ab-1a2a86e99e9f",
      "created_at": "2024-06-05 15:20:51",
      "file_metadata": {
        "modules": {
          "module_1": {
            "parser": {
              "model": "sentence"
            }
          }
        },
        "modules_data": {
          "module_1": {
            "parser": {
              "data_files_extensions": [
                ".json"
              ],
              "num_lines": 26
            }
          }
        }
      },
      "file_tags": [
        {
          "author": "shelley"
        },
        {
          "country": "uk"
        },
        {
          "century": "19"
        }
      ],
      "file_description": "Is the villain the monster or the doctor?",
      "symbolic_directory_path": "/n

### Observations on the `update` Method

Four closing observation on the `update` method:

- Note that in the example above we updated `file_tags` by including the entire set of file tags: `[{"author": "Shelley"}, {"country": "UK"}, {"century": 19}]`. If we'd only used `[{"country": "UK"}]`, the "author" and "century" ones would have been deleted.

- You cannot update a `symbolic_directory_path`/`file_name` combination (a.k.a. a `symbolic_file_path`) so it's identical to that of another file. Krixik will not allow it.

- You can also not update a file's file extension. For instance, a `.txt` file cannot become a `.pdf` file through the `update` method.

- The `update` method allows you to extend a file's [`expire_time`](../parameters_processing_files_through_pipelines/process_method.md#core-process-method-arguments) indefinitely. Upon initially uploading a file, its [`expire_time`](../parameters_processing_files_through_pipelines/process_method.md#core-process-method-arguments) cannot be greater than 2,592,000 seconds (30 days). However, if you periodically invoke `update` on the file and reset its [`expire_time`](../parameters_processing_files_through_pipelines/process_method.md#core-process-method-arguments) to another 2,592,000 seconds (or however many seconds you please), the file will remain on-system for that much more time as of that moment, and so forth.

In [6]:
# delete all processed datapoints belonging to this pipeline
reset_pipeline(pipeline)