# Streamlined Analysis and Imports

If you want to quickly run the analysis for a CSV file containing Zendesk tickets handled by the virtual assistant (VA), specify the file name below and press the ▶▶ button (Restart kernel and run all cells).


In [None]:
import os

from utils.dialog_flow_analysis.common import (
    DETRACTORS_UPPER_LIMIT,
    PROMOTERS_LOWER_LIMIT,
    RATING_MAX,
    RATING_MIN,
    analyze_flows,
    extract_flows_from_session_id,
)
from utils.dialog_flow_analysis.zendesk import match_zendesk_id_to_session_id
from utils.logger import logger

%load_ext autoreload
%autoreload 2

# Change the values below
CSV_FNAME = "acc_issues_GR&CY_virtual_assistant.csv"

sessions_csv_fname = f"{os.path.splitext(CSV_FNAME)[0]}_session_id.csv"
sessions_content_csv_fname = f"{os.path.splitext(CSV_FNAME)[0]}_content.csv"
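
The two derived file names above follow a simple pattern: strip the extension from the input name, then append a suffix. The sketch below repeats that logic in a standalone form. `derive_output_names` is a hypothetical helper introduced only for illustration; it is not part of the `utils` package.

```python
import os


def derive_output_names(csv_fname):
    """Return the (session IDs, session content) file names for an input CSV.

    Hypothetical helper: it only repeats the two f-strings from the cell above.
    """
    stem = os.path.splitext(csv_fname)[0]  # drop the final ".csv" extension
    return f"{stem}_session_id.csv", f"{stem}_content.csv"


session_ids_name, content_name = derive_output_names("account_issues_br.csv")
print(session_ids_name)  # account_issues_br_session_id.csv
print(content_name)      # account_issues_br_content.csv
```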

# 1. Configuration and Setup


## Set your credentials and configuration in the .env file

If this is your first time using this notebook, create a `.env` file in the main directory. Then, copy the contents of `.env.example` into `.env` and provide your credentials there.

Input your Moveo and Zendesk credentials necessary for retrieving the data.

Under `Data visualization config` you can change the scale of the **rating** for your use case. You can also define the ranges for `Detractors`, `Neutrals`, and `Promoters`. The defaults are:

- Rating scale of 0-10
- Detractors 0-3
- Neutrals 4-6
- Promoters 7-10
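
The default boundaries above can be read as a classification rule. The sketch below is only an illustration: the constant values are assumed to mirror the defaults listed here (the notebook imports the real ones from `utils.dialog_flow_analysis.common`), and `rating_group` is a hypothetical helper.

```python
# Assumed values mirroring the defaults listed above; the notebook imports the
# real ones from utils.dialog_flow_analysis.common.
RATING_MIN = 0
RATING_MAX = 10
DETRACTORS_UPPER_LIMIT = 3
PROMOTERS_LOWER_LIMIT = 7


def rating_group(rating):
    """Classify a rating (hypothetical helper, not part of the utils package)."""
    if not RATING_MIN <= rating <= RATING_MAX:
        raise ValueError(f"rating {rating} is outside {RATING_MIN}-{RATING_MAX}")
    if rating <= DETRACTORS_UPPER_LIMIT:
        return "Detractor"
    if rating >= PROMOTERS_LOWER_LIMIT:
        return "Promoter"
    return "Neutral"


# Print the group for the values at each boundary.
for rating in (0, 3, 4, 6, 7, 10):
    print(rating, rating_group(rating))
```

With these assumed defaults, 3 is the last Detractor rating, 4 to 6 are Neutral, and 7 is the first Promoter rating.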


## Confirm that the rating configuration is correct


In [None]:
logger.info(f"Working with a rating scale of {RATING_MIN} to {RATING_MAX}")
logger.info(
    "Detractors are users who provide ratings from "
    f"{RATING_MIN} to {DETRACTORS_UPPER_LIMIT}"
)
logger.info(
    "Neutrals are users who provide ratings from "
    f"{DETRACTORS_UPPER_LIMIT + 1} to {PROMOTERS_LOWER_LIMIT - 1}"
)
logger.info(
    "Promoters are users who provide ratings from "
    f"{PROMOTERS_LOWER_LIMIT} to {RATING_MAX}"
)

# 2. Get Moveo sessions from Zendesk conversations


## Function Overview: **match_zendesk_id_to_session_id**

To get the session content from Moveo Analytics, we first have to retrieve the session_ids from the Zendesk conversations.

We will do this by getting the **tags** starting with `moveo_session_id_` for each conversation through the Zendesk API.
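
As a rough illustration of the tag lookup described above, the sketch below picks the session ID out of a list of tags. It makes two assumptions that the notebook does not confirm: that the tags arrive as a list of strings, and that the session ID is whatever follows the `moveo_session_id_` prefix. `session_id_from_tags` is a hypothetical helper, and no Zendesk API call is made here.

```python
SESSION_TAG_PREFIX = "moveo_session_id_"


def session_id_from_tags(tags):
    """Return the session ID from the first matching tag, or None.

    Hypothetical helper; `tags` is assumed to be a list of strings.
    """
    for tag in tags:
        if tag.startswith(SESSION_TAG_PREFIX):
            # Assumption: the ID is the remainder of the tag after the prefix.
            return tag[len(SESSION_TAG_PREFIX):]
    return None


example_tags = ["virtual_assistant", "moveo_session_id_abc123", "greece"]
print(session_id_from_tags(example_tags))  # abc123
```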

For the analysis we will _need the user's rating_ for each ticket (the `Question1` column).

Combining the above, you will have to add a CSV file in the `/data` directory that contains at least these two columns for every ticket (the order does not matter):

| ChatEvaluationTicketID | Question1 |
| ---------------------- | --------- |
| ticket_id_123          | 10        |
| ...                    | ...       |
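
Before calling the function, it can help to confirm that the input file has both required columns. The following is a minimal sketch using only the standard library. `missing_columns` is a hypothetical check, and the in-memory sample stands in for a real file.

```python
import csv
import io

REQUIRED_COLUMNS = {"ChatEvaluationTicketID", "Question1"}


def missing_columns(csv_file):
    """Return the required columns absent from the CSV header (hypothetical check)."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return REQUIRED_COLUMNS - header


# In-memory stand-in for a real input file; extra columns and a different
# column order are fine.
sample = io.StringIO("Question1,Agent,ChatEvaluationTicketID\n10,VA,ticket_id_123\n")
print(missing_columns(sample))  # set()
```

For a real file, the same function can be called on an open file handle, for example inside `with open(path, newline="") as f:`.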

### Parameters

This function has two parameters:

1. The **input** CSV file that contains the Zendesk ticket_ids (Default value: `CSV_FNAME` defined at the top)
2. The **output** CSV file that will include the Moveo session_ids (Default value: `sessions_csv_fname` defined at the top)

### Output

The output will be a CSV file with the following format:

| ChatEvaluationTicketID | Question1 | SessionID      |
| ---------------------- | --------- | -------------- |
| ticket_id_123          | 10        | session_id_123 |

**Note 1**: If you wish to run this function with file names other than those initially defined, modify `CSV_FNAME` and `sessions_csv_fname` before executing it, or pass the names directly (e.g., `match_zendesk_id_to_session_id("account_issues_br.csv", "account_issues_br_session_id.csv")`).

**Note 2**: If the output file already exists, you will be prompted to confirm whether you want to overwrite it. Please respond with either y (yes) or n (no).


## Call the function and retrieve session_ids


In [None]:
match_zendesk_id_to_session_id(CSV_FNAME, sessions_csv_fname)

# 3. Fetch session data from Moveo Analytics


## Function Overview: **extract_flows_from_session_id**

This function retrieves the content of each session from the Moveo Analytics API.

### Parameters

This function requires two parameters:

1. The **input** CSV file containing the SessionIDs (output by the previous function). (Default value: `sessions_csv_fname` defined at the top)
2. The **output** CSV file including the SessionContent. (Default value: `sessions_content_csv_fname` defined at the top)

### Output

The output CSV file will have the following format:

| ChatEvaluationTicketID | SessionID      | SessionContent                                 |
| ---------------------- | -------------- | ---------------------------------------------- |
| ticket_id_123          | session_id_123 | This will be the entire content of the session |

**Note 1**: If you wish to run this function with file names other than those initially defined, modify `sessions_csv_fname` and `sessions_content_csv_fname` before executing it, or pass the names directly (e.g., `extract_flows_from_session_id("account_issues_br_session_id.csv", "account_issues_br_content.csv")`).

**Note 2**: If the output file already exists, you will be prompted to confirm whether you want to overwrite it. Please respond with either y (yes) or n (no).


## Call the function to get the content


In [None]:
extract_flows_from_session_id(sessions_csv_fname, sessions_content_csv_fname)

# 4. Generate the charts


## Function Overview: **analyze_flows**

This function utilizes the SessionContent to generate insights aimed at enhancing the brain's performance.

### Parameters

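This function takes an input CSV file containing the `SessionContent` produced by the preceding function. (Default value: `sessions_content_csv_fname` defined at the top) The call in this notebook also passes `MIN_TRANSITIONS_DISPLAY_THRESHOLD`, the minimum number of transitions (dialog turns) to display in the Sankey diagrams.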

### Output

1. **Sankey Diagrams**: Three Sankey diagrams portraying the Journey distribution, one for each customer group. Rectangles represent dialog nodes, and columns represent dialog turns initiated by customer inputs. These diagrams are interactive, allowing you to rearrange nodes by dragging them.
2. **Rating Frequency Histogram**: A histogram that visualizes the distribution of rating frequencies across all rating values.
3. **Containment Percentage Pie Charts**: Three pie charts that illustrate the containment percentage for each customer group.
4. **Coverage Percentage Pie Charts**: Three pie charts that display the coverage percentage for each customer group.
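
As a rough idea of what the rating frequency histogram in item 2 summarizes, the sketch below counts how often each rating value occurs. It is not how `analyze_flows` builds its chart. The ratings are made up, the 0-10 scale is the assumed default, and `rating_frequencies` is a hypothetical helper.

```python
from collections import Counter


def rating_frequencies(ratings, rating_min=0, rating_max=10):
    """Count occurrences of every rating value on the scale, including zeros."""
    counts = Counter(ratings)
    return {value: counts[value] for value in range(rating_min, rating_max + 1)}


made_up_ratings = [10, 7, 3, 10, 0, 7, 10]  # illustrative values, not real data
frequencies = rating_frequencies(made_up_ratings)

# Print one line per rating value, including values that never occur.
for value, count in frequencies.items():
    print(value, count)
```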

❗ **_The diagrams above will also be generated as html files within the `/data/{fname}_plots` directory. Open them in a browser to view in more detail._** ❗

### Returns

The function returns the dataframe, which you can use to compute statistics such as the **mean** or **median** and to conduct further analysis.

**Note 1**: All generated diagrams are interactive, enabling you to access counts and additional details simply by **hovering over them**.

**Note 2**: If you wish to run this function for a file other than the one initially defined, modify `sessions_content_csv_fname` before executing it, or pass the name directly (e.g., `analyze_flows("account_issues_br_content.csv")`).


## Run the analysis


In [None]:
# Adjust the threshold for the minimum number of transitions (dialog turns) to
# display in the Sankey diagrams.
# Increase with larger datasets and decrease with smaller ones.
MIN_TRANSITIONS_DISPLAY_THRESHOLD = 3

In [None]:
df = analyze_flows(sessions_content_csv_fname, MIN_TRANSITIONS_DISPLAY_THRESHOLD)

## Use the dataframe for further analysis


In [None]:
df["Question1"].skew()

In [None]:
df["Question1"].mean()

In [None]:
df["Question1"].median()
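
One further analysis that may be useful is counting how many ratings fall into each customer group. The sketch below does this with `pd.cut` on a toy dataframe. It assumes the returned dataframe is a pandas dataframe with a numeric `Question1` column and that the thresholds match the defaults described earlier; the ratings are made up.

```python
import pandas as pd

# Assumed thresholds mirroring the notebook's documented defaults.
RATING_MIN, RATING_MAX = 0, 10
DETRACTORS_UPPER_LIMIT, PROMOTERS_LOWER_LIMIT = 3, 7

# Toy stand-in for the dataframe returned by analyze_flows; made-up ratings.
toy_df = pd.DataFrame({"Question1": [0, 3, 4, 6, 7, 10, 10]})

# Bins are right-inclusive: (-1, 3], (3, 6], (6, 10].
groups = pd.cut(
    toy_df["Question1"],
    bins=[RATING_MIN - 1, DETRACTORS_UPPER_LIMIT, PROMOTERS_LOWER_LIMIT - 1, RATING_MAX],
    labels=["Detractors", "Neutrals", "Promoters"],
)
group_counts = groups.value_counts()
print(group_counts)
```

To apply the same idea to the real data, replace `toy_df` with `df` and use the constants imported at the top of the notebook.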