## Multi-Module Pipeline: Sentiment Analysis on Transcription

This document details a modular pipeline that takes in an audio file in English, [`transcribes`](../../modules/ai_model_modules/transcribe_module.md) it, and then performs [`sentiment analysis`](../../modules/ai_model_modules/sentiment_module.md) on each sentence of the transcript.

The document is divided into the following sections:

- [Pipeline Setup](#pipeline-setup)
- [Processing an Input File](#processing-an-input-file)

```python
# import utilities
import sys
import json
import importlib

sys.path.append("../../../")
reset = importlib.import_module("utilities.reset")
reset_pipeline = reset.reset_pipeline

# load secrets from a .env file using python-dotenv
import os
from dotenv import load_dotenv

load_dotenv("../../../.env")
MY_API_KEY = os.getenv("MY_API_KEY")
MY_API_URL = os.getenv("MY_API_URL")

# import krixik and initialize it with your personal secrets
from krixik import krixik

krixik.init(api_key=MY_API_KEY,
            api_url=MY_API_URL)
```

```
SUCCESS: You are now authenticated.
```


### Pipeline Setup

To achieve what we've described above, let's set up a pipeline consisting of the following modules, in this order:

- A [`transcribe`](../../modules/ai_model_modules/transcribe_module.md) module.

- A [`json-to-txt`](../../modules/support_function_modules/json-to-txt_module.md) module.

- A [`parser`](../../modules/ai_model_modules/parser_module.md) module.

- A [`sentiment`](../../modules/ai_model_modules/sentiment_module.md) module.

We use the [`json-to-txt`](../../modules/support_function_modules/json-to-txt_module.md) and [`parser`](../../modules/ai_model_modules/parser_module.md) combination, which joins the transcribed snippets into a single document and then splits it back into sentences. This ensures that pauses in speech don't produce partial snippets that could confuse the [`sentiment`](../../modules/ai_model_modules/sentiment_module.md) model.
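The effect of this re-segmentation can be illustrated locally. The sketch below is not how Krixik implements these modules. It assumes, hypothetically, that the transcription arrives as a list of snippet strings. It joins them into one document (the role `json-to-txt` plays here) and then re-splits the text after sentence-ending punctuation, using a naive regular expression as a stand-in for the `parser` module. The helper names `join_snippets` and `split_sentences` and the sample snippets are made up for illustration.

```python
import re


def join_snippets(snippets: list[str]) -> str:
    """Join transcript snippets into one document (stand-in for json-to-txt)."""
    return " ".join(s.strip() for s in snippets if s.strip())


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace (naive stand-in for parser)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Hypothetical transcription snippets: pauses in speech cut sentences mid-way.
snippets = [
    " We looked at some really",
    " basic facts. But in this video,",
    " we're going to go into a little bit more of a detailed look.",
]

sentences = split_sentences(join_snippets(snippets))
for sentence in sentences:
    print(sentence)
```

A real sentence parser handles abbreviations, numbers, and quotations far better than this regex. The point is only that a sentence split by a pause comes back whole after joining and re-splitting, so each unit passed to the sentiment model is a complete sentence.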

Pipeline setup is accomplished through the [`.create_pipeline`](../../system/pipeline_creation/create_pipeline.md) method, as follows:

```python
# create a pipeline as detailed above
pipeline_1 = krixik.create_pipeline(name="multi_sentiment_analysis_on_transcription",
                                    module_chain=["transcribe",
                                                  "json-to-txt",
                                                  "parser",
                                                  "sentiment"])
```

### Processing an Input File

Let's take a quick look at the short test audio file before processing.

```python
# examine contents of input file (an .mp3, so IPython's Audio player is used)
from IPython.display import Audio

Audio("../../../data/input/Interesting Facts About Colombia.mp3")
```

We will use the default models for every module in the pipeline, so the [`modules`](../../system/parameters_processing_files_through_pipelines/process_method.md#selecting-models-via-the-modules-argument) argument of the [`.process`](../../system/parameters_processing_files_through_pipelines/process_method.md) method can be omitted.

```python
# process the file through the pipeline, as described above
process_output_1 = pipeline_1.process(local_file_path="../../../data/input/Interesting Facts About Colombia.mp3", # the initial local filepath where the input file is stored
                                      local_save_directory="../../../data/output", # the local directory that the output file will be saved to
                                      expire_time=60*30, # process data will be deleted from the Krixik system in 30 minutes
                                      wait_for_process=True, # wait for process to complete before returning IDE control to user
                                      verbose=False) # do not display process update printouts upon running code
```


The output of this process is printed below. To learn more about each component of the output, review documentation for the [`.process`](../../system/parameters_processing_files_through_pipelines/process_method.md) method.

Because the output of this pipeline's final module-model pair is a JSON file, the process output is also included in this object (this is only the case for JSON outputs). The output file itself has also been saved to the location noted in the `process_output_files` key, and the `file_id` of the processed input is used as a filename prefix for that file.

```python
# nicely print the output of this process
print(json.dumps(process_output_1, indent=2))
```

```
{
  "status_code": 200,
  "pipeline": "transcribe-sentiment-pipeline",
  "request_id": "bca798e6-85de-4f8a-9974-744108545dae",
  "file_id": "dfaced90-11ed-41c8-9bf0-8751656be563",
  "message": "SUCCESS - output fetched for file_id dfaced90-11ed-41c8-9bf0-8751656be563.Output saved to location(s) listed in process_output_files.",
  "process_output": [
    {
      "snippet": " That's the episode looking at the great country of Columbia.",
      "positive": 0.993,
      "negative": 0.007,
      "neutral": 0.0
    },
    {
      "snippet": "We looked at some really basic facts.",
      "positive": 0.252,
      "negative": 0.748,
      "neutral": 0.0
    },
    {
      "snippet": "It's name, a bit of its history, the type of people that live there, land size, and all that jazz.",
      "positive": 0.998,
      "negative": 0.002,
      "neutral": 0.0
    },
    {
      "snippet": "But in this video, we're going to go into a little bit more of a detailed look.",
      "positive": 0.992,
```
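Each entry in `process_output` pairs a sentence with one score per sentiment label, so simple summaries take only a few lines of Python. The sketch below uses three entries copied from the output above rather than a live call. It labels each sentence by its highest score and averages the positive score. `dominant_label` is a hypothetical helper, and taking the maximum score is just one reasonable choice; it ignores how close the competing scores are.

```python
from collections import Counter

# Entries copied from the process output shown above. In the notebook you would
# use process_output_1["process_output"] instead.
process_output = [
    {"snippet": " That's the episode looking at the great country of Columbia.",
     "positive": 0.993, "negative": 0.007, "neutral": 0.0},
    {"snippet": "We looked at some really basic facts.",
     "positive": 0.252, "negative": 0.748, "neutral": 0.0},
    {"snippet": "It's name, a bit of its history, the type of people that live there, land size, and all that jazz.",
     "positive": 0.998, "negative": 0.002, "neutral": 0.0},
]

LABELS = ("positive", "negative", "neutral")


def dominant_label(entry: dict) -> str:
    """Return the sentiment label with the highest score for one sentence."""
    return max(LABELS, key=lambda label: entry[label])


labels = [dominant_label(e) for e in process_output]
counts = Counter(labels)
mean_positive = sum(e["positive"] for e in process_output) / len(process_output)

for entry, label in zip(process_output, labels):
    print(f"{label:>8}  {entry['snippet'].strip()}")
print(dict(counts), round(mean_positive, 3))
```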

To confirm that everything went as it should have, let's load the JSON output file listed in `process_output_files`:

```python
# load in process output from file
with open(process_output_1["process_output_files"][0]) as f:
    print(json.dumps(json.load(f), indent=2))
```

```
[
  {
    "snippet": " That's the episode looking at the great country of Columbia.",
    "positive": 0.993,
    "negative": 0.007,
    "neutral": 0.0
  },
  {
    "snippet": "We looked at some really basic facts.",
    "positive": 0.252,
    "negative": 0.748,
    "neutral": 0.0
  },
  {
    "snippet": "It's name, a bit of its history, the type of people that live there, land size, and all that jazz.",
    "positive": 0.998,
    "negative": 0.002,
    "neutral": 0.0
  },
  {
    "snippet": "But in this video, we're going to go into a little bit more of a detailed look.",
    "positive": 0.992,
    "negative": 0.008,
    "neutral": 0.0
  },
  {
    "snippet": "Yo, what is going on guys?",
    "positive": 0.005,
    "negative": 0.995,
    "neutral": 0.0
  },
  {
    "snippet": "Welcome back to F2D facts.",
    "positive": 0.999,
    "negative": 0.001,
    "neutral": 0.0
  },
  {
    "snippet": "The channel where I look at people cultures and places.",
    "positive": 0.999,
    "negativ
```
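Because the `file_id` of the processed input is used as a filename prefix for the output file, the saved output can also be located later without the `process_output_1` object. What follows is a minimal sketch that relies only on that naming convention. `find_output_files` and `load_json` are hypothetical helpers. The demo writes a made-up file (its `.json` extension is also an assumption) into a temporary directory so the code runs on its own.

```python
import json
import tempfile
from pathlib import Path


def find_output_files(save_dir: str, file_id: str) -> list[Path]:
    """Return files in save_dir whose names start with file_id.

    Assumes output files use the file_id as a filename prefix. UUID-style
    file_ids contain no glob wildcards, so they are safe inside a pattern.
    """
    return sorted(p for p in Path(save_dir).glob(f"{file_id}*") if p.is_file())


def load_json(path: Path):
    """Read and parse one JSON output file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Self-contained demo: a made-up file_id and output file in a temporary directory.
demo_id = "00000000-0000-0000-0000-000000000000"
with tempfile.TemporaryDirectory() as tmp:
    sample = [{"snippet": "Hello there.", "positive": 0.9, "negative": 0.1, "neutral": 0.0}]
    (Path(tmp) / f"{demo_id}.json").write_text(json.dumps(sample), encoding="utf-8")
    (Path(tmp) / "unrelated.json").write_text("[]", encoding="utf-8")

    found = find_output_files(tmp, demo_id)
    records = load_json(found[0])

print([p.name for p in found])
print(records[0]["snippet"])

# In the notebook, the equivalent call would be (hypothetical usage):
# find_output_files("../../../data/output", process_output_1["file_id"])
```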

```python
# delete all processed datapoints belonging to this pipeline
reset_pipeline(pipeline_1)
```