# **Project: A Case Study of ExpressWay Logistics**

**Business Overview:**

ExpressWay Logistics is a dynamic logistics service provider, committed to delivering efficient, reliable and cost-effective courier transportation and warehousing solutions. With a focus on speed, precision and customer satisfaction, we aim to be the go-to partner for customers seeking seamless courier services. Our core service is ensuring operational efficiency throughout our delivery and courier services, including inventory management, durable packaging, swift dispatch of couriers, real-time tracking of shipments and on-time delivery as promised. We are committed to enhancing our logistics and courier services and improving connectivity for our customers.

**Current Challenge:**

ExpressWay Logistics faces numerous challenges in ensuring seamless deliveries and customer satisfaction. These challenges include managing various customer demands simultaneously, addressing delays in deliveries and ensuring products arrive intact and safe. Additionally, the company struggles with the complexity of efficiently storing and handling a large volume of packages while meeting customer expectations. Moreover, maintaining a skilled workforce capable of handling various aspects of logistics operations presents its own set of challenges. Overcoming these obstacles requires a comprehensive approach that integrates innovative technology, strategic planning, and continuous improvement initiatives to ensure smooth operations and exceptional service delivery.

**Objective:**

Our primary objective is to conduct a sentiment analysis of user-generated reviews across various digital channels and platforms. By paying attention to their feedback, we want to find ways to make our services better - like handling different customer demands simultaneously, dealing with late deliveries, and keeping packages secured and intact. Through the application of prompt engineering methodologies and sentiment analysis, we'll figure out if sentiments expressed by users for our courier services are Positive or Negative. This will help us understand where we need to improve in order to meet customer expectations and keep them happy. With a focus on getting better all the time, we'll overcome the challenges at ExpressWay Logistics and make our services the best.

**Data Description:**

The dataset titled "courier-service_reviews.csv" is structured to facilitate sentiment analysis for courier service reviews. Here's a brief description of the data columns:

1. id: This column contains unique identifiers for each review entry. It helps in distinguishing and referencing individual reviews.
2. review: This column includes the actual text of the courier service reviews. The reviews are likely composed of customer opinions and experiences regarding different aspects of the services provided by ExpressWay Logistics.
3. sentiment: This column provides an additional layer of classification (positive and negative) for the mentioned reviews.

## **Step 1: Setup**

(A) Writing/Creating the config.json file

## 1.1 Installation

In [None]:
!pip install -q datasets boto3


## 1.2 Imports

In [None]:
# Import all Python packages required to access the Bedrock Claude API.
# Import additional packages required to access datasets and create examples.

import boto3
import json
import random

import pandas as pd
import numpy as np

from datasets import load_dataset
from collections import Counter
from tqdm import tqdm

from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from tabulate import tabulate

## 1.3 Authentication

**(A) Writing/Creating the config.json file**

In [None]:
!mkdir -p ~/.aws
!cp aws_config.ini ~/.aws/credentials
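The cell above assumes that `aws_config.ini` follows the standard AWS shared-credentials layout (a `[default]` profile with an access key ID and a secret access key). Below is a minimal sketch for checking that layout before copying the file, using only the standard library. The helper name `check_credentials_text` and the placeholder key values are illustrative assumptions, not part of the notebook.

```python
import configparser

def check_credentials_text(text, profile="default"):
    """Return True if `text` has the given profile with both AWS key fields."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        return False
    required = ("aws_access_key_id", "aws_secret_access_key")
    return all(parser.has_option(profile, key) for key in required)

# Placeholder values only; real keys should never be written into a notebook.
sample = """
[default]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = EXAMPLESECRET
"""
print(check_credentials_text(sample))
```

In practice the same check could be run on the contents of `aws_config.ini` read from disk.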

In [None]:
client = boto3.client('bedrock-runtime')

In [None]:
model_id = 'anthropic.claude-instant-v1'

## Task : Sentiment Analysis

## **Step 2: Assemble Data**

(A) Upload and Read csv File

(B) Count Positive and Negative Sentiment Reviews

(C) Split the Dataset

For the sentiment analysis classification task, we will use a dataset of courier service reviews for ExpressWay Logistics. Our investigation will focus on assigning positive or negative sentiment to courier service reviews that customers have posted on online channels. During prompt engineering, we will use a hold-out set of reviews (i.e., gold examples) to ascertain the quality of the sentiment assignment.

**(A) Upload and read csv file (2 Marks)**

In [None]:
# Read CSV File Here
cs_reviews_df = pd.read_csv("/content/courier-service_reviews.csv")

In [None]:
cs_reviews_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131 entries, 0 to 130
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         131 non-null    int64 
 1   review     131 non-null    object
 2   sentiment  131 non-null    object
dtypes: int64(1), object(2)
memory usage: 3.2+ KB


In [None]:
cs_reviews_df.sample(5)

Unnamed: 0,id,review,sentiment
122,123,ExpressWay Logistics caught my attention with ...,Negative
129,130,What I appreciate most about ExpressWay Logist...,Positive
21,22,ExpressWay Logistics is known for its prompt d...,Positive
32,33,ExpressWay Logistics' customer service represe...,Negative
30,31,I am extremely disappointed with the service p...,Negative


**(B) Count Positive and Negative Sentiment Reviews**

In [None]:
#cs_reviews_df['label'] = np.where(cs_reviews_df.sentiment == "Positive", 1,0)

In [None]:
cs_reviews_df.sample(5)

Unnamed: 0,id,review,sentiment
111,112,I had a disappointing experience with ExpressW...,Negative
99,100,I have complete confidence in the packaging pr...,Positive
108,109,ExpressWay Logistics' lack of attention to det...,Negative
77,78,ExpressWay Logistics is unreliable and untrust...,Negative
105,106,I had a sensitive legal document that needed t...,Positive


In [None]:
cs_reviews_df.shape

(131, 3)

In [None]:
cs_reviews_df.sentiment.unique()

array(['Positive', 'Negative'], dtype=object)

In [None]:
cs_reviews_df.sentiment.value_counts()

sentiment
Positive    68
Negative    63
Name: count, dtype: int64
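With classes this close to balanced, always predicting the majority label would score only slightly above 50%, which gives a useful floor when judging the prompts later. Here is a quick arithmetic sketch using the counts printed above (the variable names are illustrative):

```python
# Class counts taken from the value_counts() output above
counts = {"Positive": 68, "Negative": 63}

total = sum(counts.values())
majority_label = max(counts, key=counts.get)

# Accuracy of a classifier that always answers with the majority label
baseline_accuracy = counts[majority_label] / total
print(majority_label, round(baseline_accuracy, 3))
```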

**(C) Split the Dataset**

Now that the preprocessing is done, let us split the data into two segments (use split_ratio of 0.2) - one segment (80%) that gives us a pool to draw few-shot examples from and another segment (20%) that gives us a pool of gold examples.

In [None]:
cs_examples_df, cs_gold_examples_df = train_test_split(
    cs_reviews_df, #<- the full dataset
    test_size=0.2, #<- 20% random sample selected for gold examples
    random_state=42 #<- ensures that the splits are the same for every session
)

In [None]:
(cs_examples_df.shape, cs_gold_examples_df.shape)

((104, 3), (27, 3))
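The shapes follow from how a fractional `test_size` is turned into a row count. To our understanding, scikit-learn rounds the test portion up and gives the remainder to the training portion. Here is a short arithmetic check under that assumption:

```python
import math

n_samples = 131
test_size = 0.2

# 0.2 * 131 = 26.2, which is rounded up for the gold (test) pool
n_gold = math.ceil(test_size * n_samples)
n_examples = n_samples - n_gold
print(n_examples, n_gold)
```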

Select the columns needed for further analysis: the review text and its sentiment label. The `id` column is excluded.

In [None]:
columns_to_select = ['review','sentiment']

To select gold examples for this session, we sample 21 of the 27 held-out reviews using `random_state=42`. This ensures that the examples from multiple runs of the sampling are the same (i.e., they are randomly selected but do not change between different runs of the notebook). Note that we are doing this only to keep execution times low for illustration. In practice, a larger number of gold examples gives more robust estimates of model accuracy.

In [None]:
gold_examples = (
    cs_gold_examples_df.loc[:, columns_to_select]
                       .sample(21, random_state=42) #<- ensures that gold examples are the same for every session
                       .to_json(orient='records')
)

Let us print and look at the gold_examples

In [None]:
gold_examples

'[{"review":"The delivery executive assigned by ExpressWay Logistics was courteous and professional during the delivery process. They tried their best to handle the package with care.Unfortunately, the package arrived with slight damage despite the delivery executive\'s efforts. The packaging seemed more than adequate to protect the contents during transit.","sentiment":"Positive"},{"review":"ExpressWay Logistics failed to meet my expectations. The delivery was delayed, and the customer support team was unresponsive and unhelpful when I tried to inquire about the status of my parcel.","sentiment":"Negative"},{"review":"ExpressWay Logistics\' incompetence resulted in a major inconvenience when my package was delivered to the wrong recipient. Despite providing accurate delivery information, the package ended up in the hands of someone else, and efforts to retrieve it were unsuccessful. When I contacted customer service for assistance, I was met with apathy and a lack of urgency. Their fa

In [None]:
json.loads(gold_examples)[0]     #Json format

{'review': "The delivery executive assigned by ExpressWay Logistics was courteous and professional during the delivery process. They tried their best to handle the package with care.Unfortunately, the package arrived with slight damage despite the delivery executive's efforts. The packaging seemed more than adequate to protect the contents during transit.",
 'sentiment': 'Positive'}

## **Step 3: Derive Prompt**

(A) Writing Zero Shot System Message

(B) Creating Zero Shot Prompt

(C) Writing Few Shot System Message

(D) Creating Examples For Few Shot Prompt

(E) Creating Few Shot Prompt

#### Create prompts

In [None]:
claude_prompt_template = """

Human: {system_message}
{prompt}

Assistant: {response}
"""

In [None]:
user_message_template = """```{courier_service_review}```"""
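To see how the two templates nest, the sketch below fills the user message first and then places it inside the Human/Assistant template. The system message and review are hypothetical stand-ins, and the backtick delimiter is built with `"`" * 3` only so that the snippet is easy to embed.

```python
# Same templates as above, written on one line each
TICKS = "`" * 3
claude_prompt_template = "\n\nHuman: {system_message}\n{prompt}\n\nAssistant: {response}\n"
user_message_template = TICKS + "{courier_service_review}" + TICKS

# Hypothetical inputs for illustration
system_message = "Classify the review as Positive or Negative."
review = "The parcel arrived a day early."

# Step 1: wrap the review in the delimiter
user_message = user_message_template.format(courier_service_review=review)

# Step 2: place system message and user message in the Human turn,
# leaving the Assistant turn empty for the model to complete
prompt = claude_prompt_template.format(
    system_message=system_message,
    prompt=user_message,
    response="",
)
print(prompt)
```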

**(A) Write Zero Shot System Message**

In [None]:
# Write zero shot system message here
zero_shot_system_message = """You are a customer service representative for courier service.You will pay attention to customer review.Classify customer sentiment presented in the review input into one of the following categories.
Categories - ['Positive ', 'Negative']
Courier service review will be delimited by triple backticks in the input.
Answer only 'Positive' or 'Negative'. Nothing Else. Do not explain your answer."""

**(B) Create Zero Shot Prompt**

In [None]:
# Get the input review
input_review = cs_reviews_df.iloc[0, 1]

# Create user input message
user_input = {
    'role': 'user',
    'content': user_message_template.format(courier_service_review=input_review)
}

# Convert user input to string format
user_input_str = f"Role: {user_input['role']}, Content: {user_input['content']}"

In [None]:
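# Create zero shot prompt to be input ready for completion function.
# Note: the review is filled in twice here (once via user_input_str and once
# via prompt), and the result has no {courier_service_review} placeholder left.
zero_shot_prompt = claude_prompt_template.format(
    system_message=zero_shot_system_message + "\n" + user_input_str,
    prompt=user_message_template.format(courier_service_review=input_review),
    response=''
)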

print(zero_shot_prompt)



Human: You are a customer service representative for courier service.You will pay attention to customer review.Classify customer sentiment presented in the review input into one of the following categories.
Categories - ['Positive ', 'Negative']
Courier service review will be delimited by triple backticks in the input.
Answer only 'Positive' or 'Negative'. Nothing Else. Do not explain your answer.
Role: user, Content: ```ExpressWay Logistics' commitment to transparency gives us confidence in their services. They provide clear and upfront pricing, so we know exactly what to expect. With ExpressWay Logistics, there are no hidden fees or surprises, just reliable service at a fair price.```
```ExpressWay Logistics' commitment to transparency gives us confidence in their services. They provide clear and upfront pricing, so we know exactly what to expect. With ExpressWay Logistics, there are no hidden fees or surprises, just reliable service at a fair price.```

Assistant: 



In [None]:
payload = json.dumps({
    "prompt": zero_shot_prompt,
    "max_tokens_to_sample": 4,
    "temperature": 0
})

response = client.invoke_model(
    body=payload,
    modelId=model_id
)

response_body = json.loads(response['body'].read())

print(response_body['completion'])

 Positive
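The prompt built above already has the first review filled in, so a later call to `.format(courier_service_review=...)` on it returns the same text unchanged. The sketch below contrasts that with a reusable variant that keeps the placeholder. The shortened system message and the names `fixed_prompt` and `reusable_prompt` are illustrative assumptions.

```python
TICKS = "`" * 3
claude_prompt_template = "\n\nHuman: {system_message}\n{prompt}\n\nAssistant: {response}\n"
user_message_template = TICKS + "{courier_service_review}" + TICKS
system_message = "Classify the review as Positive or Negative."  # shortened stand-in

# Filled-in prompt: the review text is baked in, no placeholder remains
fixed_prompt = claude_prompt_template.format(
    system_message=system_message,
    prompt=user_message_template.format(courier_service_review="First review."),
    response="",
)

# Reusable prompt: pass the template itself so the placeholder survives
reusable_prompt = claude_prompt_template.format(
    system_message=system_message,
    prompt=user_message_template,
    response="",
)

other = "Second review."
print(other in fixed_prompt.format(courier_service_review=other))     # False
print(other in reusable_prompt.format(courier_service_review=other))  # True
```

This matters when the same prompt string is formatted once per gold example, as `evaluate_prompt` does in Step 4.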


**(C) Write Few Shot System Message**

**Prompt 2: Few-shot**

For the few-shot prompt, the system message is the zero-shot message with one added sentence that directs attention to complaints about late deliveries and package safety. We then augment this system message with few-shot examples.

In [None]:
few_shot_system_message = """You are a customer service representative for courier service.You will pay attention to customer review. Pay attention to complaints related to issues such as late deliveries, and keeping packages secured and intact.  Classify customer sentiment presented in the review input into one of the following categories.
Categories - ['Positive ', 'Negative']
Courier service review will be delimited by triple backticks in the input.
Answer only 'Positive' or 'Negative'. Nothing Else. Do not explain your answer.
"""

Merely selecting random samples from the polarity subsets is not enough because the examples included in a prompt are prone to a set of known biases such as:
 - Majority label bias (frequent answers in predictions)
 - Recency bias (examples near the end of the prompt)


To avoid these biases, it is important to have a balanced set of examples that are arranged in random order. Let us create a Python function that generates balanced, randomly ordered examples:

In [None]:
def create_examples(dataset, n=4):

    """
    Return a JSON list of randomized examples of size 2n with two classes.
    Create subsets of each class, choose random samples from the subsets,
    merge and randomize the order of samples in the merged list.
    Each run of this function creates a different random sample of examples
    chosen from the training data.

    Args:
        dataset (DataFrame): A DataFrame with examples (review + label)
        n (int): number of examples of each class to be selected

    Output:
        randomized_examples (JSON): A JSON with examples in random order
    """

    positive_reviews = (dataset.sentiment == 'Positive')
    negative_reviews = (dataset.sentiment == 'Negative')
    columns_to_select = ['review', 'sentiment']

    positive_examples = dataset.loc[positive_reviews, columns_to_select].sample(n)
    negative_examples = dataset.loc[negative_reviews, columns_to_select].sample(n)

    examples = pd.concat([positive_examples, negative_examples])

    # sampling without replacement is equivalent to random shuffling

    randomized_examples = examples.sample(2*n, replace=False)

    return randomized_examples.to_json(orient='records')

**(D) Create Examples For Few Shot Prompt**

In [None]:
# Create Examples
examples = create_examples(cs_examples_df, 2)

In [None]:
json.loads(examples)

[{'review': 'ExpressWay Logistics is known for its prompt delivery times.It ensured that my parcels arrive well ahead of schedule.I am happy with the service.',
  'sentiment': 'Positive'},
 {'review': 'Expressway Logistics is extremely unreliable when it comes to packaging. My recent shipment arrived with items badly damaged due to poor packaging. Despite raising the issue with customer service, there was no satisfactory resolution provided. Their lack of attention to packaging standards is concerning and reflects poorly on their overall service quality.',
  'sentiment': 'Negative'},
 {'review': 'ExpressWay Logistics is unreliable and untrustworthy. They failed to deliver my parcel on time, and the customer support team was unapologetic and unwilling to assist me in resolving the issue.',
  'sentiment': 'Negative'},
 {'review': 'Thanks to ExpressWay Logistics, I can confidently say that my shipping woes are a thing of the past. They have earned my trust and loyalty, and I look forward 
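Since the point of `create_examples` is a balanced set, it can be worth confirming the label counts on the JSON it returns. Below is a minimal sketch. The `examples` string is a hypothetical stand-in with made-up reviews, not the output shown above.

```python
import json
from collections import Counter

# Hypothetical stand-in for the JSON string returned by create_examples
examples = json.dumps([
    {"review": "Fast delivery.", "sentiment": "Positive"},
    {"review": "Box was crushed.", "sentiment": "Negative"},
    {"review": "Never arrived.", "sentiment": "Negative"},
    {"review": "Great tracking.", "sentiment": "Positive"},
])

# Count how many examples carry each label
label_counts = Counter(e["sentiment"] for e in json.loads(examples))

# Balanced means every label appears the same number of times
is_balanced = len(set(label_counts.values())) == 1
print(dict(label_counts), is_balanced)
```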

With the examples in place, we can now assemble a few-shot prompt. Since we will be using the few-shot prompt several times during evaluation, let us write a function to create a few-shot prompt.

In [None]:
def create_prompt(system_message, examples, user_message_template):

    """
    Return a few-shot prompt string in the Human/Assistant text format
    defined by claude_prompt_template.
    Loop through the examples and render each one as a Human turn (system
    message + review) followed by an Assistant turn (sentiment). The final
    Human turn keeps the {courier_service_review} placeholder open.

    Args:
        system_message (str): system message with instructions for sentiment analysis
        examples (str): JSON string with list of examples
        user_message_template (str): string with a placeholder for courier service reviews

    Output:
        few_shot_prompt (str): prompt text ending with an empty Assistant turn
    """

    few_shot_prompt = ''

    for example in json.loads(examples):
        example_review = example['review']
        example_sentiment = example['sentiment']
        few_shot_prompt += claude_prompt_template.format(
                system_message=system_message,
                prompt=user_message_template.format(courier_service_review=example_review),
                response=example_sentiment
        )

    return (
        few_shot_prompt +
        claude_prompt_template.format(
            system_message=system_message,
            prompt=user_message_template,
            response=''
        )
    )
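One caveat with this design: the returned string is passed through `str.format` again during evaluation, so an example review that happens to contain curly braces would raise a `KeyError` or `ValueError` at that point. Here is a minimal sketch of one way to guard against that, assuming a hypothetical helper `escape_braces` and a made-up review:

```python
def escape_braces(text):
    """Double curly braces so a later str.format call leaves them literal."""
    return text.replace("{", "{{").replace("}", "}}")

TICKS = "`" * 3
user_message_template = TICKS + "{courier_service_review}" + TICKS

# Hypothetical example review that happens to contain braces
example_review = "Tracking page showed {status} instead of a real update."

# The escaped example is placed in the prompt; the final .format call
# turns the doubled braces back into single braces.
prompt = (
    "\n\nHuman: " + TICKS + escape_braces(example_review) + TICKS
    + "\n\nAssistant: Negative"
    + "\n\nHuman: " + user_message_template
    + "\n\nAssistant: "
)
final = prompt.format(courier_service_review="Parcel arrived on time.")
print(example_review in final)
```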

**(E) Create Few Shot Prompt**

In [None]:
# Create Few shot prompt
few_shot_prompt = create_prompt(
    few_shot_system_message,
    examples,
    user_message_template
)

In [None]:
few_shot_prompt

"\n\nHuman: You are a customer service representative for courier service.You will pay attention to customer review. Pay attention to complaints related to issues such as late deliveries, and keeping packages secured and intact.  Classify customer sentiment presented in the review input into one of the following categories.\nCategories - ['Positive ', 'Negative']\nCourier service review will be delimited by triple backticks in the input.\nAnswer only 'Positive' or 'Negative'. Nothing Else. Do not explain your answer.\n\n```ExpressWay Logistics is known for its prompt delivery times.It ensured that my parcels arrive well ahead of schedule.I am happy with the service.```\n\nAssistant: Positive\n\n\nHuman: You are a customer service representative for courier service.You will pay attention to customer review. Pay attention to complaints related to issues such as late deliveries, and keeping packages secured and intact.  Classify customer sentiment presented in the review input into one of

## **Step 4: Evaluate prompts**

(A) Evaluate Zero Shot Prompt

(B) Evaluate Few Shot Prompt

(C) Calculate Mean and Standard Deviation for Zero Shot Prompt and Few Shot Prompt

Now we have two sets of prompts that we need to evaluate using gold labels. Since the few-shot prompt depends on the sample of examples that was drawn to make up the prompt, we expect some variability in evaluation. Hence, we evaluate each prompt multiple times to get a sense of the average and the variation around the average.

To reiterate, the choice of prompt should account for variability due to the choice of the random sample. To aid repeated evaluation, we assemble an evaluation function.

In [None]:
def evaluate_prompt(prompt, gold_examples, user_message_template):

    """
    Return the micro-F1 score for predictions on gold examples.
    For each example, we fill the review into the prompt and request a
    completion. Gold labels and model predictions are aggregated into lists
    and compared to compute the F1 score.

    Args:
        prompt (str): prompt text containing a {courier_service_review} placeholder
        gold_examples (str): JSON string with list of gold examples
        user_message_template (str): string with a placeholder for courier service review
                                     (not used inside this function)

    Output:
        micro_f1_score (float): Micro-F1 score computed by comparing model predictions
                                with ground truth
    """

    model_predictions, ground_truths, review_texts = [], [], []

    for example in json.loads(gold_examples):
        gold_input = example['review']

        try:
            payload = json.dumps({
                "prompt": prompt.format(courier_service_review=gold_input),
                "max_tokens_to_sample": 2,
                "temperature": 0
            })

            response = client.invoke_model(
                body=payload,
                modelId=model_id
            )

            response_body = json.loads(response['body'].read())

            prediction = response_body['completion']
            model_predictions.append(prediction.strip()) # <- removes extraneous white spaces
            ground_truths.append(example['sentiment'])
            review_texts.append(gold_input)

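        except Exception as e:
            # Report the failure instead of dropping the example silently
            print(f"Skipping example after error: {e}")
            continue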

    micro_f1_score = f1_score(ground_truths, model_predictions, average="micro")

    table_data = [[text, pred, truth] for text, pred, truth in zip(review_texts, model_predictions, ground_truths)]
    headers = ["Review", "Model Prediction", "Ground Truth"]
    print(tabulate(table_data, headers=headers, tablefmt="grid"))

    return micro_f1_score
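When every review receives exactly one predicted label, micro-averaged F1 works out to the same number as plain accuracy, which helps when reading the scores below. Here is a minimal pure-Python sketch of that equivalence. The function `micro_f1` is written here for illustration and the labels are made up.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 from pooled true positives, false positives and false negatives."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif pred != label and truth == label:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical labels for illustration
y_true = ["Positive", "Negative", "Negative", "Positive", "Negative"]
y_pred = ["Positive", "Positive", "Negative", "Positive", "Negative"]

# Plain accuracy: share of predictions that match the ground truth
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro_f1(y_true, y_pred), accuracy)
```

Each wrong prediction counts once as a false positive (for the predicted label) and once as a false negative (for the true label), which is why the two measures coincide.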


Let us now use this function to do one evaluation of both prompts assembled so far, each time computing the Micro-F1 score.

**(A) Evaluate zero shot prompt**

In [None]:
evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

[Table output omitted: one row per gold example with the columns Review, Model Prediction and Ground Truth.]

0.47619047619047616

**(B) Evaluate few shot prompt**

In [None]:
evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

[Table output omitted: one row per gold example with the columns Review, Model Prediction and Ground Truth.]

0.9523809523809523

However, this is just *one* choice of examples. We will need to run these evaluations with multiple choices of examples to get a sense of variability in F1 score for the few-shot prompt. As an example, let us run evaluations for both prompts 5 times, drawing a new set of few-shot examples each time.

In [None]:
num_eval_runs = 5

In [None]:
zero_shot_performance = []
few_shot_performance = []

In [None]:
for _ in tqdm(range(num_eval_runs)):

    # For each run create a new sample of examples
    examples = create_examples(cs_examples_df)

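    # Assemble the zero shot prompt (it does not depend on the sampled examples,
    # so it is identical in every run; evaluate_prompt fills in the review placeholder)
    zero_shot_prompt = f"""Human: {zero_shot_system_message}\n\nHuman: {user_message_template}\n\nAssistant: """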

    # Assemble the few shot prompt with these examples
    few_shot_prompt = create_prompt(few_shot_system_message, examples, user_message_template)

    # Evaluate zero shot prompt accuracy on gold examples
    zero_shot_micro_f1 = evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

    # Evaluate few shot prompt accuracy on gold examples
    few_shot_micro_f1 = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

    zero_shot_performance.append(zero_shot_micro_f1)
    few_shot_performance.append(few_shot_micro_f1)

100%|██████████| 5/5 [02:40<00:00, 32.19s/it]

[Table output omitted: each call to evaluate_prompt prints a grid with the columns Review, Model Prediction and Ground Truth.]




**(C) Calculate Mean and Standard Deviation for Zero Shot Prompt and Few Shot Prompt**

Compute the average (mean) and measure the variability (standard deviation) of the evaluation scores for both zero shot and few shot prompts.

In [None]:
np.array(zero_shot_performance).mean(), np.array(zero_shot_performance).std()
# Calculate for Zero Shot

(0.9047619047619048, 0.0)

In [None]:
np.array(few_shot_performance).mean(), np.array(few_shot_performance).std()
# Calculate for Few Shot

(0.9333333333333333, 0.023328473740792145)
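Note that `ndarray.std()` computes the population standard deviation by default (`ddof=0`). With only five runs, the sample standard deviation (`ddof=1`) is noticeably larger. The sketch below uses the standard library to show both. The per-run scores are an inference: three runs at 20/21 and two at 19/21 reproduce the reported few-shot mean and standard deviation, but the individual run scores are not printed above.

```python
import statistics

# Inferred per-run scores (not printed in the notebook): three runs with
# 20 of 21 gold examples correct and two runs with 19 of 21 correct.
few_shot_scores = [20 / 21, 20 / 21, 20 / 21, 19 / 21, 19 / 21]

mean = statistics.mean(few_shot_scores)
population_std = statistics.pstdev(few_shot_scores)  # same definition as ndarray.std() with ddof=0
sample_std = statistics.stdev(few_shot_scores)       # ddof=1, larger for small samples
print(round(mean, 4), round(population_std, 4), round(sample_std, 4))
```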

## **Step 5: Observations, Insights and Business Perspective**

## *Observations:*
The dataset contains 131 customer reviews, of which 68 are positive and 63 are negative, so the classes are close to balanced. Twenty percent of the data (27 reviews) was held out, and 21 of those were sampled as gold examples. In the single evaluation, the micro-F1 score was 0.48 for the zero-shot prompt and 0.95 for the few-shot prompt. Over the five repeated runs, the zero-shot prompt had a mean of 0.90 with a standard deviation of 0.0, and the few-shot prompt had a mean of 0.93 with a standard deviation of 0.02.

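## *Insights:*
The few-shot prompt scored higher than the zero-shot prompt in both the single evaluation and the repeated runs. The size of the gap in the single evaluation should be read with caution, however. Every zero-shot prediction in that run was Positive, and the zero-shot prompt used there already had the first review of the dataset filled in, so each gold example was scored against the same review text rather than its own. The repeated runs rebuild the zero-shot prompt with the review placeholder, and there it reaches a mean of 0.90 against 0.93 for the few-shot prompt. The zero standard deviation for the zero-shot prompt is expected, since that prompt is identical in every run and the temperature is 0. The few-shot standard deviation of about 0.02 reflects the different examples drawn each time. With only 21 gold examples, a single review changes the score by about 0.05, so the difference between the two prompts in the repeated runs is small.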

## *Recommendations:*
We recommend that ExpressWay Logistics use the few-shot prompt with this model, since it gave the higher accuracy in determining customer sentiment. The reviews should then be analyzed by category of customer issue to pinpoint root causes in the strategic target areas. This would enable the company to take the right corrective action toward efficient, reliable and cost-effective courier transportation and warehousing, with a focus on speed, precision and customer satisfaction. It would also support continuous improvement of operational efficiency across inventory management, packaging, dispatch and on-time delivery, and help the company become the seamless courier service it aspires to be.