# **Project: A Case Study of ExpressWay Logistics**

**Business Overview:**

ExpressWay Logistics is a dynamic logistics service provider committed to delivering efficient, reliable and cost-effective courier transportation and warehousing solutions. With a focus on speed, precision and customer satisfaction, we aim to be the go-to partner for customers seeking seamless courier services. Our core service involves ensuring operational efficiency throughout our delivery and courier operations, including inventory management, durable packaging, swift dispatch of couriers, real-time shipment tracking and on-time delivery as promised. We are committed to enhancing our logistics and courier services and improving seamless connectivity for our customers.

**Current Challenge:**

ExpressWay Logistics faces numerous challenges in ensuring seamless deliveries and customer satisfaction. These include managing various customer demands simultaneously, addressing delivery delays and ensuring products arrive safe and intact. Additionally, the company struggles with the complexity of efficiently storing and handling a large volume of packages while meeting customer expectations. Maintaining a skilled workforce capable of handling the various aspects of logistics operations presents its own set of challenges. Overcoming these obstacles requires a comprehensive approach that integrates innovative technology, strategic planning and continuous improvement initiatives to ensure smooth operations and exceptional service delivery.

**Objective:**

Our primary objective is to conduct a sentiment analysis of user-generated reviews across various digital channels and platforms. By paying attention to customer feedback, we want to find ways to improve our services, such as handling different customer demands simultaneously, dealing with late deliveries, and keeping packages secure and intact. Using prompt engineering and sentiment analysis, we will determine whether the sentiment users express about our courier services is Positive or Negative. This will help us understand where we need to improve in order to meet customer expectations and keep customers satisfied. With a focus on continuous improvement, we aim to overcome the challenges at ExpressWay Logistics and make our services the best.

**Data Description:**

The dataset titled "courier-service_reviews.csv" is structured to facilitate sentiment analysis for courier service reviews. Here's a brief description of the data columns:

1. id: This column contains unique identifiers for each review entry. It helps in distinguishing and referencing individual reviews.
2. review: This column includes the actual text of the courier service reviews. The reviews are likely composed of customer opinions and experiences regarding different aspects of the services provided by ExpressWay Logistics.
3. sentiment: This column contains the class label for each review, either Positive or Negative, and serves as the ground truth for the sentiment analysis.

## **Step 1: Setup (2 Marks)**

(A) Writing/Creating the config.json file (2 Marks)

### Installation

In [1]:
%pip install openai==0.28.0 tiktoken datasets session-info scikit-learn tabulate --quiet

Note: you may need to restart the kernel to use updated packages.


### Imports

Import all Python packages required to access the Azure OpenAI API, read the dataset and create examples.

In [2]:
# Import all Python packages required to access the Azure OpenAI API.
# Import additional packages required to access datasets and create examples.

import openai
import json
import random
import tiktoken
import session_info
import pprint as pp

import pandas as pd
import numpy as np

from collections import Counter
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from tabulate import tabulate

### Authentication

**(A) Writing/Creating the config.json file (2 Marks)**

In [3]:
# Define your configuration information
config_data = {
    "AZURE_OPENAI_KEY": "",
    "AZURE_OPENAI_ENDPOINT": "",
    "AZURE_OPENAI_APITYPE": "",
    "AZURE_OPENAI_APIVERSION": "",
    "CHATGPT_MODEL": ""
}

# Replace each "" with your credentials

In [4]:
# Write the configuration information into the config.json file.
# Uncomment to create config.json; note that running it overwrites any existing file.
# with open('config.json', 'w') as config_file:
#     json.dump(config_data, config_file, indent=4)

# print("Config file created successfully!")

Reading the config.json file

In [5]:
with open('config.json', 'r') as az_creds:
    data = az_creds.read()

In [6]:
creds = json.loads(data)

In [7]:
openai.api_key = creds["AZURE_OPENAI_KEY"]
openai.api_base = creds["AZURE_OPENAI_ENDPOINT"]
openai.api_type = creds["AZURE_OPENAI_APITYPE"]
openai.api_version = creds["AZURE_OPENAI_APIVERSION"]

In [8]:
chat_model_id = creds["CHATGPT_MODEL"]

### Utilities

Define a token-counting function to keep track of how much of the model's context window a prompt uses.

In [9]:
def num_tokens_from_messages(messages):

    """
    Return the number of tokens used by a list of messages.
    Adapted from the OpenAI Cookbook token counter
    """

    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

    # Each message is sandwiched with <|start|>role and <|end|>
    # Hence, messages look like: <|start|>system or user or assistant{message}<|end|>

    tokens_per_message = 3 # token1:<|start|>, token2:system(or user or assistant), token3:<|end|>

    num_tokens = 0

    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))

    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>

    return num_tokens

# Task: Sentiment Analysis

## **Step 2: Assemble Data (4 Marks)**

(A) Upload and Read csv File (2 Marks)

(B) Creating a new column named as "label" (target column) corresponding to the sentiments in the dataset (2 Marks)

Read the file

**(A) Upload and read csv file (2 Marks)**

In [10]:
cs_reviews_df = pd.read_csv("courier-service_reviews.csv")
# Read CSV File Here

**(B) Creating a new column named as "label" (target column) corresponding to the sentiments in the dataset (2 Marks)**

In [11]:
cs_reviews_df['label'] = cs_reviews_df['sentiment']
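Here `label` is a direct copy of `sentiment`, and the rest of the notebook reads from `sentiment`. As a sketch of an alternative encoding, assuming the only values are `Positive` and `Negative`, the label can be mapped to {1, -1}; the mean of such a column is then a net sentiment score. The toy DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; cs_reviews_df has the same 'sentiment' column
toy_df = pd.DataFrame({"sentiment": ["Positive", "Negative", "Positive", "Positive"]})

# Map the text labels to +1 / -1 (assumes only these two values occur)
label_map = {"Positive": 1, "Negative": -1}
toy_df["label"] = toy_df["sentiment"].map(label_map)

# Values missing from label_map become NaN, so this check catches unexpected labels
assert toy_df["label"].notna().all(), "unexpected sentiment value"

# Mean of +1/-1 labels: +1 means all positive, -1 all negative, 0 balanced
net_sentiment = toy_df["label"].mean()
print(net_sentiment)
```

Applied to the full dataset (Step 5 reports about 51.9% positive and 48.1% negative reviews), this mean would be roughly 0.04.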

Split the data into two segments (use a `test_size` of 0.2): one segment (80%) serves as a pool to draw few-shot examples from, and the other (20%) serves as a pool of gold examples.

In [12]:
cs_examples_df, cs_gold_examples_df = train_test_split(
    cs_reviews_df, #<- the full dataset
    test_size=0.2, #<- 20% random sample selected for gold examples
    random_state=42 #<- ensures that the splits are the same for every session
)
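The split above draws a purely random 20% sample. The classes here are close to balanced (Step 5 reports roughly 52% positive and 48% negative reviews), so this matters little, but passing scikit-learn's `stratify` argument guarantees that both pools keep the same class ratio. A minimal sketch on a hypothetical toy DataFrame:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for courier-service_reviews.csv
toy_df = pd.DataFrame({
    "review": [f"review {i}" for i in range(20)],
    "sentiment": ["Positive"] * 10 + ["Negative"] * 10,
})

examples_df, gold_df = train_test_split(
    toy_df,
    test_size=0.2,
    random_state=42,
    stratify=toy_df["sentiment"],  # keep the Positive/Negative ratio in both splits
)

print(gold_df["sentiment"].value_counts())
```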

Select the columns needed for further analysis. This excludes the target column `label` (and `id`); `sentiment` holds the same values and is kept as the ground truth.

In [13]:
columns_to_select = ['review','sentiment']

Create the gold examples by selecting a random sample of rows from the gold examples DataFrame (`cs_gold_examples_df`). The sample size is up to the learner and depends on the session runtime; 21 rows are used here.

In [14]:
gold_examples = (
    cs_gold_examples_df.loc[:, columns_to_select]
    .sample(21, random_state=42)  # <- ensures that gold examples are the same for every session
    .to_json(orient='records')    # <- serialize as a JSON string of {"review", "sentiment"} records
)

To select gold examples for this session, we sample randomly from the test data using `random_state=42`. This ensures that repeated runs of the sampling return the same examples (i.e., they are randomly selected but do not change between runs of the notebook). We do this only to keep execution times low for illustration. In practice, a larger number of gold examples gives more robust estimates of model accuracy.
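Because each review receives exactly one predicted label and `f1_score(..., average="micro")` counts every label that occurs, the micro-F1 computed later in `evaluate_prompt` equals plain accuracy. With 21 gold examples a single review therefore moves the score by 1/21; the scores reported in Step 4 (0.381 and 0.810) correspond to 8 and 17 correct predictions out of 21. The hypothetical helper below quantifies this and adds the normal-approximation (Wald) standard error as a rough indication of how uncertain such a score is.

```python
import math


def score_resolution_and_se(n_gold: int, score: float) -> tuple:
    """For an accuracy-like score on n_gold examples, return:
    - step: the change caused by a single example (1 / n_gold)
    - se: the approximate binomial (Wald) standard error sqrt(p * (1 - p) / n_gold)
    """
    step = 1.0 / n_gold
    se = math.sqrt(score * (1.0 - score) / n_gold)
    return step, se


# 21 gold examples, 17 of them predicted correctly
step, se = score_resolution_and_se(21, 17 / 21)
print(f"one example = {step:.3f} of the score, approx. standard error = {se:.3f}")
```

This prints a step of about 0.048 and a standard error of about 0.086, so differences of a few hundredths between prompts evaluated on 21 examples are within noise.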

## **Step 3: Derive Prompt (14 Marks)**

(A) Write Zero Shot Prompt (5 Marks)

(B) Write Few Shot Prompt (5 Marks)

(C) Print Created Examples (2 Marks)

(D) Print Few Shot Prompt (2 Marks)

#### Create prompts

In [15]:
user_message_template = """```{courier_service_review}```"""

**(A) Write Zero Shot Prompt (5 Marks)**

In [16]:
zero_shot_system_message = """You have been tasked with analyzing the sentiment of customer reviews for ExpressWay Logistics.
You are provided with a dataset of customer reviews for ExpressWay's courier service.
Each review is labeled as either positive or negative.
Your task is to classify the sentiment of the review as either 'Positive' for positive or 'Negative' for negative.
Limit your response to those labels. Do not explain your reasoning."""
# Note: the stricter instruction """Limit your response to exactly one of those two labels and do not explain your reasoning."""
# makes the zero-shot prompt perform like the few-shot prompt, which defeats the point of this comparison.

#  Write Zero Shot Prompt Here

In [17]:
zero_shot_prompt = [{'role':'system', 'content': zero_shot_system_message}]

**(B) Write Few Shot Prompt (5 Marks)**

For the few-shot prompt, the system message is unchanged from the zero-shot prompt. Instead, we augment it with few-shot examples.

In [18]:
few_shot_system_message = zero_shot_system_message
#  Write Few Shot Prompt Here

Merely selecting random samples from the polarity subsets is not enough, because the examples included in a prompt are prone to a set of known biases such as:
 - Majority label bias (the model favors the label that appears most often among the examples)
 - Recency bias (the model favors the labels of examples placed near the end of the prompt)

To mitigate these biases, it is important to use a balanced set of examples arranged in random order.

Let us now look at how we can assemble examples to go along with this few-shot system message and compose a few-shot prompt.

### 1. Define "create_examples" function

In [19]:
def create_examples(dataset, n=4):

    """
    Return a JSON list of randomized examples of size 2n with two classes.
    Create subsets of each class, choose random samples from the subsets,
    merge and randomize the order of samples in the merged list.
    Each run of this function creates a different random sample of examples
    chosen from the training data.

    Args:
        dataset (DataFrame): A DataFrame with examples (review + label)
        n (int): number of examples of each class to be selected

    Output:
        randomized_examples (JSON): A JSON with examples in random order
    """

    positive_reviews = (dataset.sentiment == 'Positive')
    negative_reviews = (dataset.sentiment == 'Negative')
    columns_to_select = ['review', 'sentiment']

    positive_examples = dataset.loc[positive_reviews, columns_to_select].sample(n)
    negative_examples = dataset.loc[negative_reviews, columns_to_select].sample(n)

    examples = pd.concat([positive_examples, negative_examples])
    # sampling without replacement is equivalent to random shuffling
    randomized_examples = examples.sample(2*n, replace=False)

    return randomized_examples.to_json(orient='records')
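`create_examples` draws a fresh sample on every call, which is what the multi-run evaluation in Step 4 needs. For debugging or reproducing one particular run, a seeded variant can help. The sketch below assumes the same column names; `create_examples_seeded` and its `seed` parameter are hypothetical, and reproducibility comes from pandas' `DataFrame.sample(random_state=...)`.

```python
import json
from typing import Optional

import pandas as pd


def create_examples_seeded(dataset: pd.DataFrame, n: int = 4, seed: Optional[int] = None) -> str:
    """Hypothetical variant of create_examples: draws n Positive and n Negative
    reviews and shuffles them, but is reproducible when `seed` is an int."""
    columns_to_select = ["review", "sentiment"]
    positive = dataset.loc[dataset.sentiment == "Positive", columns_to_select].sample(n, random_state=seed)
    negative = dataset.loc[dataset.sentiment == "Negative", columns_to_select].sample(n, random_state=seed)
    examples = pd.concat([positive, negative])
    # frac=1 returns every row exactly once, i.e. a random shuffle
    shuffled = examples.sample(frac=1, random_state=seed)
    return shuffled.to_json(orient="records")


# Hypothetical toy data with five reviews per class
toy_df = pd.DataFrame({
    "review": [f"review {i}" for i in range(10)],
    "sentiment": ["Positive", "Negative"] * 5,
})

first = create_examples_seeded(toy_df, n=2, seed=7)
second = create_examples_seeded(toy_df, n=2, seed=7)
print(first == second)  # True: same seed, same examples in the same order
print([e["sentiment"] for e in json.loads(first)])
```

Passing `seed=None` falls back to the original behavior of a new random draw on each call.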

Use the create_examples function above to create examples for the few-shot prompt (n=2, i.e. two examples per class).

In [20]:
examples = create_examples(cs_examples_df, 2)

**(C) Print Created Examples (2 Marks)**

In [21]:
pp.pprint(examples, indent=4, width=120)
# Print the created examples here

('[{"review":"ExpressWay Logistics failed to deliver my package on the promised date, causing significant '
 'inconvenience and disruption to my plans. Despite assurances that the package would arrive on time, it was delayed '
 'without any explanation or notification from the company. The lack of transparency and communication from ExpressWay '
 "Logistics is unacceptable, and I'm deeply disappointed by their failure to uphold their end of the "
 'bargain.","sentiment":"Negative"},{"review":"ExpressWay Logistics\' delivery drivers seem to have no regard for '
 'punctuality. Despite paying for expedited shipping, my packages are consistently delayed, causing major '
 'inconvenience. It\'s time for them to step up their game.","sentiment":"Negative"},{"review":"I appreciate the '
 'attention to detail that ExpressWay Logistics puts into every aspect of their service, from packaging to delivery to '
 'customer support and the promised door to door deliveries.Overall a satisfied '
 'servi

### 2. Define "create_prompt" function

In [22]:
def create_prompt(system_message, examples, user_message_template):

    """
    Return a prompt message in the format expected by the OpenAI API.
    Loop through the examples and parse them as user message and assistant
    message.

    Args:
        system_message (str): system message with instructions for sentiment analysis
        examples (str): JSON string with list of examples
        user_message_template (str): string with a placeholder for courier service reviews

    Output:
        few_shot_prompt (List): A list of dictionaries in the OpenAI prompt format
    """

    few_shot_prompt = [{'role':'system', 'content': system_message}]

    for example in json.loads(examples):
        example_review = example['review']
        example_sentiment = example['sentiment']

        few_shot_prompt.append(
            {
                'role': 'user',
                'content': user_message_template.format(
                    courier_service_review=example_review
                )
            }
        )

        few_shot_prompt.append(
            {'role': 'assistant', 'content': f"{example_sentiment}"}
        )

    return few_shot_prompt
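`create_prompt` produces a system message followed by alternating user/assistant pairs, one pair per example. A quick structural check before calling the API can catch mistakes such as a missing answer or an unexpected label. The helper `check_prompt_structure` and the hand-built prompt below are hypothetical; the roles and the backtick-wrapped review text mirror the notebook's format.

```python
from typing import Dict, List


def check_prompt_structure(prompt: List[Dict[str, str]]) -> int:
    """Hypothetical helper: verify a few-shot prompt has one system message
    followed by alternating user/assistant pairs; return the number of pairs."""
    if not prompt or prompt[0]["role"] != "system":
        raise ValueError("prompt must start with a system message")
    turns = prompt[1:]
    if len(turns) % 2 != 0:
        raise ValueError("every user example needs an assistant answer")
    for user_msg, assistant_msg in zip(turns[0::2], turns[1::2]):
        if user_msg["role"] != "user" or assistant_msg["role"] != "assistant":
            raise ValueError("expected alternating user/assistant messages")
        if assistant_msg["content"] not in {"Positive", "Negative"}:
            raise ValueError(f"unexpected label: {assistant_msg['content']!r}")
    return len(turns) // 2


example_prompt = [
    {"role": "system", "content": "Classify the sentiment as 'Positive' or 'Negative'."},
    {"role": "user", "content": "```Arrived early and intact.```"},
    {"role": "assistant", "content": "Positive"},
    {"role": "user", "content": "```Package was crushed.```"},
    {"role": "assistant", "content": "Negative"},
]
print(check_prompt_structure(example_prompt))  # 2
```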

Use the above create_prompt function to create few_shot_prompt

In [23]:
few_shot_prompt = create_prompt(
    few_shot_system_message,
    examples,
    user_message_template
)

**(D) Print the Few Shot Prompt (2 Marks)**

In [24]:
pp.pprint(few_shot_prompt, indent=2, width=120)
# Print the created few shot prompt here

[ { 'content': 'You have been tasked with analyzing the sentiment of customer reviews for ExpressWay Logistics.\n'
               "You are provided with a dataset of customer reviews for ExpressWay's courier service.\n"
               'Each review is labeled as either positive or negative.\n'
               "Your task is to classify the sentiment of the review as either 'Positive' for positive or 'Negative' "
               'for negative.\n'
               'Limit your response to those labels. Do not explain your reasoning.',
    'role': 'system'},
  { 'content': '```ExpressWay Logistics failed to deliver my package on the promised date, causing significant '
               'inconvenience and disruption to my plans. Despite assurances that the package would arrive on time, it '
               'was delayed without any explanation or notification from the company. The lack of transparency and '
               "communication from ExpressWay Logistics is unacceptable, and I'm deeply disapp

## **Step 4: Evaluate Prompts (10 Marks)**

(A) Evaluate Zero Shot Prompt (3 Marks)

(B) Evaluate Few Shot Prompt (3 Marks)

(C) Calculate Mean and Standard Deviation for Zero Shot and Few Shot (4 Marks)

### Define Evaluation scorer

In [25]:
def evaluate_prompt(prompt, gold_examples, user_message_template):

    """
    Return the micro-F1 score for predictions on gold examples.
    For each example, we make a prediction using the prompt. Gold labels and
    model predictions are aggregated into lists and compared to compute the
    F1 score.

    Args:
        prompt (List): list of messages in the OpenAI prompt format
        gold_examples (str): JSON string with list of gold examples
        user_message_template (str): string with a placeholder for courier service review

    Output:
        micro_f1_score (float): Micro-F1 score computed by comparing model predictions
                                with ground truth
    """

    model_predictions, ground_truths, review_texts = [], [], []

    for example in json.loads(gold_examples):
        gold_input = example['review']
        user_input = [
            {
                'role':'user',
                'content': user_message_template.format(courier_service_review=gold_input)
            }
        ]

        try:
            response = openai.ChatCompletion.create(
                deployment_id=chat_model_id,
                messages=prompt+user_input,
                temperature=0, # <- temperature 0 for (near-)deterministic responses
                max_tokens=2 # <- Note how we restrict the output to not more than 2 tokens
            )

            prediction = response['choices'][0]['message']['content']
            model_predictions.append(prediction.strip()) # <- removes extraneous white spaces
            ground_truths.append(example['sentiment'])
            review_texts.append(gold_input)

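        except Exception as e:
            # Skip this example, but report the error: skipped examples are
            # excluded from the F1 score, so they should not go unnoticed
            print(f"Skipping example due to API error: {e}")
            continue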

    micro_f1_score = f1_score(ground_truths, model_predictions, average="micro")

    table_data = [[text, pred, truth] for text, pred, truth in zip(review_texts, model_predictions, ground_truths)]
    headers = ["Review", "Model Prediction", "Ground Truth"]
    print(tabulate(table_data, headers=headers, maxcolwidths=80, tablefmt="grid"))

    return micro_f1_score
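`evaluate_prompt` skips any example whose API call raises, so transient errors (for instance rate limits) silently shrink the gold set and the score is then computed on fewer than 21 examples. A retry wrapper with exponential backoff is one common mitigation. The sketch below is generic Python; `call_with_retries` is a hypothetical helper, and it catches all exceptions rather than assuming specific `openai` exception classes.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_attempts: int = 3, base_delay: float = 1.0) -> T:
    """Hypothetical helper: call fn(); if it raises, wait with exponential backoff
    (base_delay, 2*base_delay, ...) plus random jitter, then try again.
    The last exception is re-raised once max_attempts calls have failed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


# Demo with a stand-in function that fails twice before succeeding
calls = {"count": 0}


def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient error")
    return "ok"


result = call_with_retries(flaky_call, max_attempts=3, base_delay=0.01)
print(result, calls["count"])  # ok 3
```

Inside `evaluate_prompt`, the API call could be wrapped as `call_with_retries(lambda: openai.ChatCompletion.create(...))` with the same keyword arguments as before.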


**(A) Evaluate zero shot prompt (3 Marks)**

In [26]:
evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

+----------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                           | Model Prediction   | Ground Truth   |
| The delivery executive assigned by ExpressWay Logistics was courteous and        | Negative.          | Positive       |
| professional during the delivery process. They tried their best to handle the    |                    |                |
| package with care.Unfortunately, the package arrived with slight damage despite  |                    |                |
| the delivery executive's efforts. The packaging seemed more than adequate to     |                    |                |
| protect the contents during transit.                                             |                    |                |
+----------------------------------------------------------------------------------+--------------------+----------------+
| ExpressWay Log

0.38095238095238093

**(B) Evaluate few shot prompt (3 Marks)**

In [27]:
evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

+----------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                           | Model Prediction   | Ground Truth   |
| The delivery executive assigned by ExpressWay Logistics was courteous and        | Neutral            | Positive       |
| professional during the delivery process. They tried their best to handle the    |                    |                |
| package with care.Unfortunately, the package arrived with slight damage despite  |                    |                |
| the delivery executive's efforts. The packaging seemed more than adequate to     |                    |                |
| protect the contents during transit.                                             |                    |                |
+----------------------------------------------------------------------------------+--------------------+----------------+
| ExpressWay Log

0.8095238095238095
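The few-shot table above shows the model answering "Neutral" for a review labeled Positive. `evaluate_prompt` returns only the micro-F1 score; assuming it were changed to also return its `ground_truths` and `model_predictions` lists (a hypothetical modification), `pandas.crosstab` summarizes which raw outputs occur for each gold label. The lists below are illustrative, not results from this notebook.

```python
import pandas as pd

# Hypothetical lists shaped like the ones evaluate_prompt builds internally
ground_truths = ["Positive", "Positive", "Negative", "Negative", "Positive"]
model_predictions = ["Neutral", "Positive", "Negative.", "Negative", "Positive"]

# Rows: gold labels; columns: raw model outputs (including out-of-band answers)
confusion = pd.crosstab(
    pd.Series(ground_truths, name="Ground Truth"),
    pd.Series(model_predictions, name="Model Prediction"),
)
print(confusion)
```

Out-of-band columns such as `Neutral` or `Negative.` then stand out immediately, instead of being buried in the row-by-row table.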

However, this is just *one* choice of examples. We need to run these evaluations with multiple choices of examples to get a sense of the variability in F1 score for the few-shot prompt. As an example, let us run the evaluations 5 times, drawing a fresh set of few-shot examples in each run and re-evaluating the zero-shot prompt alongside for comparison.

In [28]:
num_eval_runs = 5

In [29]:
zero_shot_performance = []
few_shot_performance = []

In [30]:
for _ in tqdm(range(num_eval_runs)):

    # For each run create a new sample of examples
    examples = create_examples(cs_examples_df)

    # The zero-shot prompt uses no examples, so it is identical in every run;
    # at temperature 0 its score should barely vary across runs
    zero_shot_prompt = [{'role':'system', 'content': zero_shot_system_message}]

    # Assemble the few shot prompt with these examples
    few_shot_prompt = create_prompt(few_shot_system_message, examples, user_message_template)

    # Evaluate zero shot prompt accuracy on gold examples
    zero_shot_micro_f1 = evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

    # Evaluate few shot prompt accuracy on gold examples
    few_shot_micro_f1 = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

    zero_shot_performance.append(zero_shot_micro_f1)
    few_shot_performance.append(few_shot_micro_f1)

  0%|          | 0/5 [00:00<?, ?it/s]

+----------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                           | Model Prediction   | Ground Truth   |
| The delivery executive assigned by ExpressWay Logistics was courteous and        | Negative.          | Positive       |
| professional during the delivery process. They tried their best to handle the    |                    |                |
| package with care.Unfortunately, the package arrived with slight damage despite  |                    |                |
| the delivery executive's efforts. The packaging seemed more than adequate to     |                    |                |
| protect the contents during transit.                                             |                    |                |
+----------------------------------------------------------------------------------+--------------------+----------------+
| ExpressWay Log

 20%|██        | 1/5 [00:05<00:20,  5.06s/it]

+----------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                           | Model Prediction   | Ground Truth   |
| The delivery executive assigned by ExpressWay Logistics was courteous and        | Neutral            | Positive       |
| professional during the delivery process. They tried their best to handle the    |                    |                |
| package with care.Unfortunately, the package arrived with slight damage despite  |                    |                |
| the delivery executive's efforts. The packaging seemed more than adequate to     |                    |                |
| protect the contents during transit.                                             |                    |                |
+----------------------------------------------------------------------------------+--------------------+----------------+
| ExpressWay Log

 40%|████      | 2/5 [00:10<00:15,  5.05s/it]

+----------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                           | Model Prediction   | Ground Truth   |
| The delivery executive assigned by ExpressWay Logistics was courteous and        | Neutral            | Positive       |
| professional during the delivery process. They tried their best to handle the    |                    |                |
| package with care.Unfortunately, the package arrived with slight damage despite  |                    |                |
| the delivery executive's efforts. The packaging seemed more than adequate to     |                    |                |
| protect the contents during transit.                                             |                    |                |
+----------------------------------------------------------------------------------+--------------------+----------------+
| ExpressWay Log

 60%|██████    | 3/5 [00:15<00:10,  5.04s/it]

+----------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                           | Model Prediction   | Ground Truth   |
| The delivery executive assigned by ExpressWay Logistics was courteous and        | Neutral            | Positive       |
| professional during the delivery process. They tried their best to handle the    |                    |                |
| package with care.Unfortunately, the package arrived with slight damage despite  |                    |                |
| the delivery executive's efforts. The packaging seemed more than adequate to     |                    |                |
| protect the contents during transit.                                             |                    |                |
+----------------------------------------------------------------------------------+--------------------+----------------+
| ExpressWay Log

 80%|████████  | 4/5 [00:20<00:05,  5.02s/it]

+----------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                           | Model Prediction   | Ground Truth   |
| The delivery executive assigned by ExpressWay Logistics was courteous and        | Neutral            | Positive       |
| professional during the delivery process. They tried their best to handle the    |                    |                |
| package with care.Unfortunately, the package arrived with slight damage despite  |                    |                |
| the delivery executive's efforts. The packaging seemed more than adequate to     |                    |                |
| protect the contents during transit.                                             |                    |                |
+----------------------------------------------------------------------------------+--------------------+----------------+
| ExpressWay Log

100%|██████████| 5/5 [00:25<00:00,  5.12s/it]

+----------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                           | Model Prediction   | Ground Truth   |
| The delivery executive assigned by ExpressWay Logistics was courteous and        | Neutral            | Positive       |
| professional during the delivery process. They tried their best to handle the    |                    |                |
| package with care.Unfortunately, the package arrived with slight damage despite  |                    |                |
| the delivery executive's efforts. The packaging seemed more than adequate to     |                    |                |
| protect the contents during transit.                                             |                    |                |
+----------------------------------------------------------------------------------+--------------------+----------------+
| ExpressWay Log




**(C) Calculate Mean and Standard Deviation for Zero Shot and Few Shot (4 Marks)**

Compute the average (mean) and measure the variability (standard deviation) of the evaluation scores for both zero shot and few shot prompts.

In [34]:
zero_shot_performance_mean = np.mean(zero_shot_performance)
zero_shot_performance_std = np.std(zero_shot_performance)
display(f"Zero Shot Performance: {zero_shot_performance_mean:.2f} +/- {zero_shot_performance_std:.2f}")
# Calculate for Zero Shot

'Zero Shot Performance: 0.38 +/- 0.00'

In [32]:
few_shot_performance_mean = np.mean(few_shot_performance)
few_shot_performance_std = np.std(few_shot_performance)
display(f"Few Shot Performance: {few_shot_performance_mean:.2f} +/- {few_shot_performance_std:.2f}")
# Calculate for Few Shot

'Few Shot Performance: 0.81 +/- 0.00'
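`np.std` defaults to `ddof=0`, the population standard deviation. With only five runs, the sample standard deviation (`ddof=1`) is the more common choice; it is larger by a factor of sqrt(5/4). A 0.00 spread is expected for the zero-shot prompt, since the same prompt is sent at temperature 0 in every run. The helper below is hypothetical and the scores are illustrative, not results from this notebook.

```python
import numpy as np


def summarize_scores(scores):
    """Return the mean, the sample standard deviation (ddof=1) and the
    standard error of the mean for a list of per-run scores."""
    arr = np.asarray(scores, dtype=float)
    mean = arr.mean()
    sample_std = arr.std(ddof=1) if arr.size > 1 else 0.0
    sem = sample_std / np.sqrt(arr.size)
    return mean, sample_std, sem


# Hypothetical per-run scores on 21 gold examples (so multiples of 1/21)
scores = [17 / 21, 16 / 21, 17 / 21, 18 / 21, 17 / 21]
mean, std, sem = summarize_scores(scores)
print(f"{mean:.2f} +/- {std:.2f} (standard error of the mean {sem:.2f})")
```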

## **Step 5: Observations, Insights and Business Perspective (10 Marks)**

Based on the project, the learner needs to share observations, learnings, insights and the business use case where these learnings can be beneficial.
Provide a breakdown of the percentage of positive and negative reviews. Additionally, explain how this classification can assist ExpressWay Logistics in addressing the issues identified.


In [38]:
# Calculate the percentage of each value in cs_reviews_df['sentiment'] column
sentiment_distribution = cs_reviews_df['sentiment'].value_counts(normalize=True) * 100
sentiment_distribution

sentiment
Positive    51.908397
Negative    48.091603
Name: proportion, dtype: float64

### Results and Conclusions

#### Model Observations

1. **LLM prompts produce "fuzzy" results. Approach with caution.** I noticed that even when specifying a binary classification with deterministic settings (temperature 0), the model produced variable outputs. It sometimes adds spurious characters, which artificially degrade the measured results. _The F1 scores above contain a downward bias_, i.e. the models actually perform better than the code suggests. Even when the model correctly classified a review as negative, it sometimes produced the label "Negative." (with a trailing period), which is computationally distinct from the ground-truth label "Negative".

2. **Careful prompt construction and/or post-processing is required to get at true performance.** A better choice of labels and limiting the response to one token would alleviate the fuzzy-results problem. For example, {-1, 1} would be preferable as output labels: for a binary classifier this is ideal, since the mean quickly summarizes overall sentiment. However, here the model was clearly taking a more nuanced approach to classification, stubbornly but correctly classifying some reviews as "Mixed" or "Neutral." In my view this makes GPT-3.5 on its own a poor choice for binary classification of text like this. We can post-process the outputs to "cast" them into binary results, but before doing so we should note what percentage of the results are out of band and act accordingly.

3. **Obviously, zero shot underperforms few shot.** The few-shot prompt is more constrained and gives the LLM more context for completing the task. Here the F1 score more than doubles, from a deeply underperforming model (0.38) to one with reasonable performance (0.81). I would regard an F1 score of 0.81 as indicating a good but not great model. It is good enough to say general things about customer sentiment ("We have too many negative reviews."). It is not good enough, in my view, to make very specific claims about sentiment ("We are net positive 4% on total reviews.").

4. **It was possible to make a zero shot model perform equally to a few shot with the right prompt.** As I iterated on my prompts looking at the results, I was able to tweak them to the point where they performed (F1 score) identically to the few shot example prompts. This has implications for efficiencies in operations but also risks. I wouldn't recommend it for anything but a binary classifier.
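The post-processing suggested in observations 1 and 2 can be sketched as follows, assuming the raw predictions and gold labels are available as lists. Outputs such as "Negative." are mapped to their canonical label, while anything else ("Neutral", "Mixed") becomes `Other`, so it still counts as an error and is tallied rather than silently dropped. The function names and example lists are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

from sklearn.metrics import f1_score

# Canonical labels keyed by their lower-case, letters-only form
CANONICAL = {"positive": "Positive", "negative": "Negative"}


def normalize_prediction(raw: str) -> str:
    """Map raw output such as 'Negative.' or ' positive' to a canonical label.
    Anything else (e.g. 'Neutral', 'Mixed') becomes 'Other'."""
    letters_only = re.sub(r"[^a-z]", "", raw.lower())
    return CANONICAL.get(letters_only, "Other")


def score_with_normalization(ground_truths: List[str], raw_predictions: List[str]) -> Tuple[float, Counter]:
    """Return micro-F1 on normalized predictions and a tally of out-of-band outputs."""
    predictions = [normalize_prediction(p) for p in raw_predictions]
    out_of_band = Counter(raw.strip() for raw, norm in zip(raw_predictions, predictions) if norm == "Other")
    return f1_score(ground_truths, predictions, average="micro"), out_of_band


# Hypothetical example: one trailing period and one 'Neutral' answer
truths = ["Positive", "Negative", "Negative", "Positive"]
raw = ["Neutral", "Negative.", "Negative", "Positive"]

raw_score = f1_score(truths, raw, average="micro")
clean_score, out_of_band = score_with_normalization(truths, raw)
print(raw_score, clean_score, out_of_band)
```

Here the raw score is 0.5 and the normalized score is 0.75; the remaining error is the single "Neutral" answer, and dividing the tally by the number of gold examples gives the out-of-band percentage.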

#### Business Observations

1. **General conclusions are better than nothing and that can be enough to get started.** The old adage still applies: Know your customer. While I wouldn't depend on these models to tell me the exact figures, they provide me with plenty of important insights. For example, there are too damn many negative reviews! ExpressWay Logistics has problems that need to be addressed.

2. **A lot more fine-grained analysis needs to be done.** This binary classifier is more of a quick, back-of-the-envelope analysis. What really needs to be done is:
    - **Let the LLM do its job and classify things as it sees fit.** If a review is "Mixed," it's mixed! It's not purely positive or negative but both. It makes no sense to throw away results you paid dearly for that can drive important insights. Azure is not free! You need to understand your customer!
    - **We need to have multi-class classification along multiple dimensions.** Even if we could perfectly classify reviews as positive or negative, we would still have no clue what these reviews are about. What kind of problem did the customer have? Late delivery? Damaged product? On the other hand, how did we delight them? What do they love about ExpressWay? We have no idea what problems to address nor what we should emphasize to improve customer experience.