# Review Analysis and Sales Data Integration


## Introduction
Product reviews show how customers feel about what they bought, which makes them a useful signal for improving products and customer satisfaction. This use case walks through classifying product reviews into emotional categories with natural language processing (NLP) and linking the classified reviews with sales data. By the end, you will be able to see how each emotional category of review correlates with product sales, which can inform product and marketing decisions.

## Running the Tutorial in Different Environments
This tutorial runs both in ThanoSQL Lab and in a local Python/Jupyter environment. The steps below apply to either setup.

## Dataset
We will be working with the following datasets:
- **Review Comments Table (review_comments)**: Contains textual reviews of products.
  - `ProductId`: Unique identifier for each product.
  - `UserId`: Unique identifier for each user.
  - `Text`: Text of the product review.
- **Review Comments Sentiment Table (review_comments_sentiment)**: Contains product reviews along with their sentiment classification.
  - `ProductId`: Unique identifier for each product.
  - `UserId`: Unique identifier for each user.
  - `Text`: Text of the product review.
  - `Sentiment`: Sentiment classification of the review.
- **Sales Data Table (review_sales)**: Contains sales data for each product. A product can appear on several rows.
  - `ProductId`: Unique identifier for each product.
  - `Score`: Sales score for the product.

## Goals
1. Classify product reviews into emotional categories.
2. Link classified reviews with corresponding sales data.
3. Generate actionable insights based on the analysis of linked data.

## Procedure

### Download Datasets

First, download the dataset archive and unzip it.

In [17]:
!wget -O use_case_3_data.zip https://raw.githubusercontent.com/smartmind-team/assets/main/datasets/use_cases/use_case_3_data.zip

--2024-06-11 01:41:25--  https://raw.githubusercontent.com/smartmind-team/assets/main/datasets/use_cases/use_case_3_data.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1461928 (1.4M) [application/zip]
Saving to: ‘use_case_3_data.zip’


2024-06-11 01:41:25 (138 MB/s) - ‘use_case_3_data.zip’ saved [1461928/1461928]



In [18]:
!unzip use_case_3_data.zip

Archive:  use_case_3_data.zip
  inflating: comments_data_sentiment.csv  
  inflating: comments_data.csv       
  inflating: sales_data.csv          
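
If `wget` or `unzip` is not available, for example in a local Python environment, the same two steps can be sketched with the standard library. The helper names `download_archive` and `extract_archive` are illustrative and not part of ThanoSQL; the URL and file name are the ones used above.

```python
import urllib.request
import zipfile
from pathlib import Path

DATA_URL = (
    "https://raw.githubusercontent.com/smartmind-team/assets/main/"
    "datasets/use_cases/use_case_3_data.zip"
)


def download_archive(url: str, dest: str) -> Path:
    """Download `url` to `dest` and return the local path."""
    path, _headers = urllib.request.urlretrieve(url, dest)
    return Path(path)


def extract_archive(zip_path, target_dir=".") -> list:
    """Extract every member of the archive and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Example usage (requires network access):
# archive = download_archive(DATA_URL, "use_case_3_data.zip")
# print(extract_archive(archive))
```

The usage lines are left as comments so that nothing is downloaded until you run them yourself.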


### Import ThanoSQL Library
Import the ThanoSQL library and create a client instance. This client will be used to interact with the ThanoSQL engine.

**You can find your API Token and Engine URL by following these steps:**

1. Go to your workspace’s settings page.
2. Navigate to the "Developer" tab.
3. Locate and copy your API Token and Engine URL.

In [None]:
from thanosql import ThanoSQL
client = ThanoSQL(api_token="your_api_token", engine_url="engine_url")
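
To keep the token out of the notebook itself, one option is to read both values from environment variables. This is a minimal sketch: the variable names `THANOSQL_API_TOKEN` and `THANOSQL_ENGINE_URL` and the helper `load_credentials` are assumptions made for illustration, not names defined by ThanoSQL.

```python
import os


def load_credentials(env=os.environ) -> dict:
    """Read the API token and engine URL from environment variables.

    THANOSQL_API_TOKEN and THANOSQL_ENGINE_URL are hypothetical names
    chosen for this sketch; use whatever names suit your setup.
    """
    try:
        return {
            "api_token": env["THANOSQL_API_TOKEN"],
            "engine_url": env["THANOSQL_ENGINE_URL"],
        }
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


# Example usage, with the constructor arguments shown above:
# from thanosql import ThanoSQL
# client = ThanoSQL(**load_credentials())
```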

### Upload Data to Tables

#### Upload the `review_sales` table which contains sales data for each product.

In [2]:
table = client.table.upload("review_sales", "sales_data.csv", if_exists='replace')
table.get_records(limit=10).to_df()

| | ProductId | Score |
|---|---|---|
| 0 | B005K4Q1VI | 5 |
| 1 | B005K4Q1VI | 4 |
| 2 | B005K4Q1VI | 5 |
| 3 | B005K4Q1VI | 5 |
| 4 | B005K4Q1VI | 5 |
| 5 | B005K4Q1VI | 3 |
| 6 | B005K4Q1VI | 5 |
| 7 | B005K4Q1VI | 5 |
| 8 | B005K4Q1VI | 1 |
| 9 | B005K4Q1VI | 5 |


This step uploads the `review_sales` data to ThanoSQL and retrieves the first 10 records to confirm the upload.

#### Upload the `review_comments` table which contains textual reviews of products.

In [3]:
table = client.table.upload("review_comments", "comments_data.csv", if_exists='replace')
table.get_records(limit=10).to_df()

| | ProductId | UserId | Text |
|---|---|---|---|
| 0 | B005K4Q1VI | A1BJHZE41QWBX6 | I received the K-cups on time and in execellan... |
| 1 | B005K4Q1VI | A365A90ZY1Q80T | This is the best hot cocoa I've had in a K-Cup... |
| 2 | B005K4Q1VI | AQ80LJNDLVZPG | I'm pregnant and had to,give up my hot cup of ... |
| 3 | B005K4Q1VI | A2NWIF2APZNNV0 | This is quite the buy compared to other hot ch... |
| 4 | B005K4Q1VI | A9RIPXAHUQBKY | Kids loved it. Great product at a great price.... |
| 5 | B005K4Q1VI | A1NK86G7OK4E54 | Even on the small cup setting of the elite, th... |
| 6 | B005K4Q1VI | A3D7QMIGNEID0P | The hot chocolate is my husband's favorite. S... |
| 7 | B005K4Q1VI | A103U3KR4L2ZXT | I like the rich dark smooth taste of the Grove... |
| 8 | B005K4Q1VI | A2CJKQ0DOE5ARF | Looks like and tastes like chocolate colored w... |
| 9 | B005K4Q1VI | A14Z8PWKLIRDLA | Excellent coffee for people on a tight schedul... |


This step uploads the `review_comments` data to ThanoSQL and retrieves the first 10 records to confirm the upload.

#### Upload the `review_comments_sentiment` table which contains product reviews along with their sentiment classification.

In [4]:
table = client.table.upload("review_comments_sentiment", "comments_data_sentiment.csv", if_exists='replace')
table.get_records(limit=10).to_df()

| | ProductId | UserId | Text | Sentiment |
|---|---|---|---|---|
| 0 | B005K4Q1VI | A1BJHZE41QWBX6 | I received the K-cups on time and in execellan... | love |
| 1 | B005K4Q1VI | A365A90ZY1Q80T | This is the best hot cocoa I've had in a K-Cup... | joy |
| 2 | B005K4Q1VI | AQ80LJNDLVZPG | I'm pregnant and had to,give up my hot cup of ... | love |
| 3 | B005K4Q1VI | A2NWIF2APZNNV0 | This is quite the buy compared to other hot ch... | love |
| 4 | B005K4Q1VI | A9RIPXAHUQBKY | Kids loved it. Great product at a great price.... | love |
| 5 | B005K4Q1VI | A1NK86G7OK4E54 | Even on the small cup setting of the elite, th... | sadness |
| 6 | B005K4Q1VI | A3D7QMIGNEID0P | The hot chocolate is my husband's favorite. S... | joy |
| 7 | B005K4Q1VI | A103U3KR4L2ZXT | I like the rich dark smooth taste of the Grove... | joy |
| 8 | B005K4Q1VI | A2CJKQ0DOE5ARF | Looks like and tastes like chocolate colored w... | joy |
| 9 | B005K4Q1VI | A14Z8PWKLIRDLA | Excellent coffee for people on a tight schedul... | joy |


This step uploads the `review_comments_sentiment` data to ThanoSQL and retrieves the first 10 records to confirm the upload.
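
Before uploading, it can help to confirm that each CSV carries the columns listed in the Dataset section. The sketch below reads only the header row with the standard `csv` module; `EXPECTED_COLUMNS` and `missing_columns` are names introduced here for illustration, and the file-to-column mapping is taken from the descriptions above.

```python
import csv

EXPECTED_COLUMNS = {
    "sales_data.csv": {"ProductId", "Score"},
    "comments_data.csv": {"ProductId", "UserId", "Text"},
    "comments_data_sentiment.csv": {"ProductId", "UserId", "Text", "Sentiment"},
}


def missing_columns(csv_path: str, expected: set) -> set:
    """Return the expected column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return expected - set(header)


# Example usage, run from the directory holding the extracted files:
# for name, columns in EXPECTED_COLUMNS.items():
#     print(name, missing_columns(name, columns) or "ok")
```

The check is a subset test, so extra columns in a file (such as a saved index column) do not count as errors.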

### Classify Reviews and Aggregate Sales Data

Predict sentiment for reviews using a pre-trained model and aggregate the data to link reviews with sales.

#### Predict Sentiment for Reviews

In [5]:
query_result = client.query.execute("""
    SELECT
      "ProductId",
      "Text",
      (jsonb_array_elements_text(
          (thanosql.predict(
              input := "Text",
              model := 'bhadresh-savani/distilbert-base-uncased-emotion',
              model_args := '{"top_k": 1}',
              token := 'your_token'
          )::jsonb)
      )::jsonb ->> 'label') AS "Sentiment"
    FROM
      review_comments
    LIMIT 10;
""")
query_result.records.to_df()

| | ProductId | Text | Sentiment |
|---|---|---|---|
| 0 | B005K4Q1VI | I received the K-cups on time and in execellan... | love |
| 1 | B005K4Q1VI | This is the best hot cocoa I've had in a K-Cup... | joy |
| 2 | B005K4Q1VI | I'm pregnant and had to,give up my hot cup of ... | love |
| 3 | B005K4Q1VI | This is quite the buy compared to other hot ch... | love |
| 4 | B005K4Q1VI | Kids loved it. Great product at a great price.... | love |
| 5 | B005K4Q1VI | Even on the small cup setting of the elite, th... | sadness |
| 6 | B005K4Q1VI | The hot chocolate is my husband's favorite. S... | joy |
| 7 | B005K4Q1VI | I like the rich dark smooth taste of the Grove... | joy |
| 8 | B005K4Q1VI | Looks like and tastes like chocolate colored w... | joy |
| 9 | B005K4Q1VI | Excellent coffee for people on a tight schedul... | joy |


This query predicts the sentiment of each review using the `distilbert-base-uncased-emotion` model and retrieves the first 10 predictions to confirm the process.
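
The nested casts in the query unpack the model output step by step: the prediction is cast to `jsonb`, `jsonb_array_elements_text` expands the array into its elements, and `->> 'label'` reads the label of each element. The Python sketch below mirrors those steps on a hand-written sample. The sample's shape (an array of objects with `label` and `score` keys) is an assumption inferred from the query, and the score value is made up.

```python
import json


def top_label(prediction_json: str) -> str:
    """Return the label of the first element in a JSON array of predictions."""
    predictions = json.loads(prediction_json)  # corresponds to the ::jsonb cast
    first = predictions[0]                     # one element, since top_k is 1
    if isinstance(first, str):                 # element stored as JSON text
        first = json.loads(first)
    return first["label"]                      # corresponds to ->> 'label'


# Assumed shape of the model output; the score value is made up.
sample = '[{"label": "joy", "score": 0.98}]'
print(top_label(sample))  # joy
```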


#### Link Reviews with Sales Data

Group reviews by `ProductId` and `Sentiment` to show the count of each sentiment and the corresponding sales data for each product.


In [6]:
query_result = client.query.execute("""
    SELECT 
      s."ProductId",
      SUM(s."Score") AS total_quantity_sold,
      r."Sentiment",
      COUNT(r."Sentiment") AS sentiment_count
    FROM 
      review_sales s
    JOIN
      (
          SELECT
              "ProductId",
              "Text",
              (jsonb_array_elements_text(
                  (thanosql.predict(
                      input := "Text",
                      model := 'bhadresh-savani/distilbert-base-uncased-emotion',
                      model_args := '{"top_k": 1}',
                      token := 'your_token'
                  )::jsonb)
              )::jsonb ->> 'label') AS "Sentiment"
          FROM
              review_comments
          LIMIT 100
      ) r
    ON 
      s."ProductId" = r."ProductId"
    GROUP BY 
      s."ProductId", r."Sentiment"
    ORDER BY 
      s."ProductId", sentiment_count DESC;
""")
query_result.records.to_df()

| | ProductId | total_quantity_sold | Sentiment | sentiment_count |
|---|---|---|---|---|
| 0 | B005K4Q1VI | 79546.0 | joy | 20088 |
| 1 | B005K4Q1VI | 24377.0 | love | 6156 |
| 2 | B005K4Q1VI | 11547.0 | anger | 2916 |
| 3 | B005K4Q1VI | 8981.0 | sadness | 2268 |
| 4 | B005K4Q1VI | 3849.0 | surprise | 972 |


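This query links the reviews with sales data by joining them on `ProductId` and grouping the result by `ProductId` and `Sentiment`. Note that the join pairs every sales row with every classified review of the same product, so both aggregates are multiplied: `sentiment_count` is the number of reviews with that sentiment times the number of sales rows, and `total_quantity_sold` is the product's summed `Score` times the number of reviews with that sentiment. The output is consistent with this: the five counts sum to 32,400, which matches 100 classified reviews times 324 sales rows, giving an underlying split of 62 `joy`, 19 `love`, 9 `anger`, 7 `sadness`, and 3 `surprise`. Compare the rows by their relative sizes rather than reading them as absolute totals.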
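
How a row-to-row join affects these aggregates can be reproduced on toy data. The sketch below uses an in-memory SQLite database rather than ThanoSQL, with a plain `reviews` table standing in for the classified subquery; the data is made up. It contrasts the join used above with an alternative that aggregates each side before joining. The alternative is shown for comparison only and is not the tutorial's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE review_sales ("ProductId" TEXT, "Score" INTEGER);
    CREATE TABLE reviews ("ProductId" TEXT, "Sentiment" TEXT);
    -- Toy data: 3 sales rows and 2 classified reviews for one product.
    INSERT INTO review_sales VALUES ('P1', 5), ('P1', 4), ('P1', 1);
    INSERT INTO reviews VALUES ('P1', 'joy'), ('P1', 'joy');
""")

# Row-to-row join: each review pairs with each sales row (3 x 2 = 6 rows).
naive = conn.execute("""
    SELECT s."ProductId", SUM(s."Score"), r."Sentiment", COUNT(r."Sentiment")
    FROM review_sales s
    JOIN reviews r ON s."ProductId" = r."ProductId"
    GROUP BY s."ProductId", r."Sentiment"
""").fetchall()

# Aggregating each side first keeps one row per product and per sentiment.
pre_aggregated = conn.execute("""
    WITH total_sales AS (
        SELECT "ProductId", SUM("Score") AS total_quantity_sold
        FROM review_sales GROUP BY "ProductId"
    ),
    sentiment_counts AS (
        SELECT "ProductId", "Sentiment", COUNT(*) AS sentiment_count
        FROM reviews GROUP BY "ProductId", "Sentiment"
    )
    SELECT ts."ProductId", ts.total_quantity_sold, sc."Sentiment", sc.sentiment_count
    FROM total_sales ts
    JOIN sentiment_counts sc ON ts."ProductId" = sc."ProductId"
""").fetchall()

print(naive)           # [('P1', 20, 'joy', 6)]
print(pre_aggregated)  # [('P1', 10, 'joy', 2)]
```

With pre-aggregation, the sales total is the plain sum of `Score` (10) and the count is the number of reviews (2), independent of how many rows the other table holds.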

#### Analyze Positive Reviews and Sales Performance

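Show the relationship between sales and positive reviews for each product. The query treats the labels `joy`, `love`, `surprise`, and `neutral` as positive. The emotion model used here classifies into sadness, joy, love, anger, fear, and surprise, so the `neutral` entry matches nothing with this model and only matters if you swap in a model that emits it.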

In [8]:
query_result = client.query.execute("""
    WITH positive_reviews AS (
      SELECT 
          "ProductId",
          COUNT(*) AS positive_count
      FROM 
          (
              SELECT
                  "ProductId",
                  "Text",
                  (jsonb_array_elements_text(
                      (thanosql.predict(
                          input := "Text",
                          model := 'bhadresh-savani/distilbert-base-uncased-emotion',
                          model_args := '{"top_k": 1}',
                          token := 'your_token'
                      )::jsonb)
                  )::jsonb ->> 'label') AS "Sentiment"
              FROM
                  review_comments
              LIMIT 100
          ) AS classified_reviews
      WHERE 
          "Sentiment" IN ('joy', 'love', 'surprise', 'neutral')
      GROUP BY 
          "ProductId"
    ),
    total_sales AS (
      SELECT 
          "ProductId",
          SUM("Score") AS total_quantity_sold
      FROM 
          review_sales
      GROUP BY 
          "ProductId"
    )
    SELECT 
      ts."ProductId",
      ts.total_quantity_sold,
      COALESCE(pr.positive_count, 0) AS positive_count,
      CASE 
          WHEN pr.positive_count IS NULL THEN 'No Positive Reviews'
          WHEN ts.total_quantity_sold > (
              SELECT AVG(total_quantity_sold) 
              FROM total_sales
          ) THEN 'Above Average Sales'
          ELSE 'Below Average Sales'
      END AS sales_performance
    FROM 
      total_sales ts
    LEFT JOIN 
      positive_reviews pr
    ON 
      ts."ProductId" = pr."ProductId"
    ORDER BY 
      ts.total_quantity_sold DESC;
""")
query_result.records.to_df()

| | ProductId | total_quantity_sold | positive_count | sales_performance |
|---|---|---|---|---|
| 0 | B0034EDMCW | 1490.0 | 0 | No Positive Reviews |
| 1 | B0034EDM2W | 1490.0 | 0 | No Positive Reviews |
| 2 | B0034EDLS2 | 1490.0 | 0 | No Positive Reviews |
| 3 | B0034EFIYC | 1490.0 | 0 | No Positive Reviews |
| 4 | B0034EDMLI | 1490.0 | 0 | No Positive Reviews |
| 5 | B006H34CUS | 1476.0 | 0 | No Positive Reviews |
| 6 | B000EVOSE4 | 1439.0 | 0 | No Positive Reviews |
| 7 | B0076MLL12 | 1283.0 | 0 | No Positive Reviews |
| 8 | B005K4Q4KG | 1283.0 | 0 | No Positive Reviews |
| 9 | B005K4Q1T0 | 1283.0 | 0 | No Positive Reviews |


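This query relates positive reviews to sales performance for each product. It sums `Score` per product, counts the positive reviews per product, and labels each product as above or below the average total, or as 'No Positive Reviews' when the left join finds no positive review for it. Because the subquery classifies only 100 reviews (`LIMIT 100`), any product whose reviews fall outside that sample receives the 'No Positive Reviews' label, which is the likely reason every row shown above carries it. Raise or remove the limit to classify more reviews.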
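
The labeling rule in the `CASE` expression can be restated in a few lines of Python. This is a sketch on made-up numbers, not output from the dataset; the function name `label_sales_performance` and the product IDs are hypothetical.

```python
from statistics import mean


def label_sales_performance(total_sales: dict, positive_counts: dict) -> dict:
    """Label each product the way the CASE expression above does."""
    average = mean(total_sales.values())
    labels = {}
    for product_id, total in total_sales.items():
        if product_id not in positive_counts:  # LEFT JOIN found no match
            labels[product_id] = "No Positive Reviews"
        elif total > average:
            labels[product_id] = "Above Average Sales"
        else:
            labels[product_id] = "Below Average Sales"
    return labels


# Made-up totals and counts, for illustration only.
totals = {"P1": 1500, "P2": 1200, "P3": 300}
positives = {"P2": 12, "P3": 4}
print(label_sales_performance(totals, positives))
```

Note the order of the checks: a product without positive reviews is labeled as such even when its sales are above average, exactly as in the SQL, where the `IS NULL` branch comes first.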

## Conclusion
This use case has walked through using ThanoSQL to classify product reviews into emotional categories and link them with sales data. Seeing how customer sentiment correlates with sales can inform your marketing strategies, product development, and customer service approaches, and in turn support customer satisfaction and business growth.
