# Documentation for `SimilarityScore.ipynb`

## Overview
This notebook computes semantic similarity scores between Amazon product reviews and reference complaint or shipping-related sentences using sentence embeddings. It also incorporates sentiment analysis to improve the precision of similarity-based filtering. The resulting features are aggregated at the product level.

## Main Steps

1.  Loads Amazon reviews with precomputed embeddings from `amazon_reviews_with_embeddings.parquet` (generated by `EmbeddingCalculationReviewText.ipynb`).

2.  Computes similarity scores between each review's text and a collection of reference complaint sentences (`sentences_complaint.pkl`), using the `all-MiniLM-L6-v2` sentence-embedding model. The same procedure is applied to a set of shipping-related complaint sentences.

3.  Uses a sentiment classifier (`distilbert-base-uncased-finetuned-sst-2-english`) to compute a positive-sentiment score for each review, scored on the review `summary` field. To reduce false positives, reviews that are highly similar to the complaint sentences but positive in sentiment are filtered out.

4.  Aggregates the mean similarity and sentiment scores per ASIN, records the maximum complaint similarity per ASIN, and computes a per-ASIN review embedding that blends the plain mean with a sentiment-weighted mean giving more weight to negative reviews.

5.  Saves the final product-level features and embeddings to `../Data/review_features_df.pkl`.

## Input Files

- `amazon_reviews_with_embeddings.parquet` (Amazon reviews with sentence embeddings)
- `sentences_complaint.pkl` (reference complaint sentences)

## Output Files

- `amazon_reviews_with_similarity_scores.parquet` (reviews with similarity scores)
- `amazon_reviews_with_sim_sent_scores.parquet` (reviews with similarity and sentiment scores)
- `../Data/review_features_df.pkl` (aggregated product-level features and embeddings)


In [3]:
import pandas as pd
import numpy as np
import re
from sklearn.metrics.pairwise import cosine_similarity

## Load Amazon Reviews Data With Embeddings

In [3]:
amazon_df = pd.read_parquet("amazon_reviews_with_embeddings.parquet")

In [4]:
amazon_df.shape
amazon_df.columns

Index(['overall', 'vote', 'verified', 'reviewTime', 'reviewerID', 'asin',
       'reviewerName', 'reviewText', 'summary', 'unixReviewTime', 'image',
       'style', 'review_len_words', 'review_len_chars', 'reviewText_clean',
       'embedding'],
      dtype='object')

## Loading Model `all-MiniLM-L6-v2`

In [6]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2', device='cuda')
model.max_seq_length = 256  



In [7]:
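def clean_text(text):
    """Remove HTML tags, collapse whitespace, and trim the ends."""
    text = str(text)
    text = re.sub(r'<.*?>', '', text)   # drop HTML tags first
    text = re.sub(r'\s+', ' ', text)    # then collapse whitespace left behind
    return text.strip()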

In [8]:
def add_similarity_scores(amazon_df, sentences, model, similarity_column):
    """Add the cosine similarity between each review embedding and the
    mean embedding of the reference sentences as `similarity_column`."""
    # Clean and encode input sentences
    cleaned_sentences = [clean_text(s) for s in sentences if pd.notnull(s)]
    sentence_embeddings = model.encode(cleaned_sentences)
    mean_embedding = np.mean(sentence_embeddings, axis=0)

    # Create a copy to avoid modifying original DataFrame
    df = amazon_df.copy()

    # Initialize similarity column as float NaN (None would give an object column)
    df[similarity_column] = np.nan

    # Process non-null embeddings
    valid_mask = df['embedding'].notnull()
    if valid_mask.sum() == 0:
        return df  # No valid embeddings

    # Stack the per-review vectors into one (n_reviews, dim) matrix
    embedding_matrix = np.vstack(df.loc[valid_mask, 'embedding'])
    similarities = cosine_similarity(embedding_matrix, [mean_embedding]).flatten()
    df.loc[valid_mask, similarity_column] = similarities

    return df
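
As a minimal sketch of what `add_similarity_scores` computes, the toy example below averages a few reference vectors into a single mean vector and then takes the cosine similarity of each review vector against it. The 3-dimensional vectors and the helper name `cosine_to_reference` are illustrative assumptions. The notebook itself uses scikit-learn's `cosine_similarity` on 384-dimensional embeddings.

```python
import numpy as np

# Toy 3-dimensional vectors standing in for the real sentence embeddings.
reference_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])
review_embeddings = np.array([
    [1.0, 1.0, 0.0],   # points the same way as the mean reference vector
    [0.0, 0.0, 1.0],   # orthogonal to both reference vectors
])

# Step 1: collapse the reference sentences into one mean vector.
mean_reference = reference_embeddings.mean(axis=0)


def cosine_to_reference(matrix, reference):
    """Cosine similarity of each row of `matrix` with the vector `reference`."""
    dots = matrix @ reference
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(reference)
    return dots / norms


# Step 2: score every review against that single vector.
scores = cosine_to_reference(review_embeddings, mean_reference)
print(scores)
```

Because the reference sentences are averaged first, each review is compared with their centroid rather than with its closest individual sentence. That keeps the computation to one matrix-vector product over all reviews.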

In [9]:
def print_top_similar_reviews(df, score_column, text_column='reviewText', top_n=20):

    # Sort and get top reviews
    top_reviews = df.sort_values(by=score_column, ascending=False).head(top_n)

    # Print them nicely
    for i, (_, row) in enumerate(top_reviews.iterrows(), 1):
        print(f"--- Review {i} ---")
        print(f"Similarity Score: {row[score_column]:.4f}")
        print(f"{text_column}:\n{row[text_column]}\n")

### Try Similarity Scores With Five Complaint Sentences

In [11]:
sentence31 = "I began experiencing issues shortly after using the product. There was a strange fume that caused discomfort, and a burning sensation followed. Upon closer inspection, I noticed parts that seemed cracked or wet, possibly from a toxic or excessive chemical leak. My doctor advised a checkup after I developed irritation, and the specialist mentioned possible exposure to a carcinogen. I also saw rusted metal and a joint that had become loose, potentially leading to an internal hazard. It even skidded on smooth surfaces, and at one point got wedged under a cabinet. I sent a pic to support, but the response was delayed. This whole process was uncomfortable, and honestly, the product's failure left me feeling unsafe and frustrated."
sentence31 = "I want to issue a serious warning about this product. It posed multiple hazards and could potentially be dangerous under normal use. While following all instructions, I experienced a rash that later turned infected, possibly due to a contaminated coating or poor manufacturing. The item had visible fragment damage and arrived from a questionable batch, emitting a fishy smell that didn’t seem okay. During use, a piece suddenly split, and I nearly tripped, injuring my shoulder. The situation was upsetting enough that I contacted an attorney, and I’m currently investigating whether a teratogen or other harmful substance was involved. My doctor mentioned GI issues could be linked. I also noticed sharp barbs near the attachment point, and the vinyl edges were poorly sealed. The unit would shake uncontrollably on acceleration, making operation unsafe. My eardrum felt strained from a sudden boom sound, and I had to schedule an extraction due to an object blocking my nostril. Honestly, the experience has been a complete disaster, and it ought to be declared unfit for sale. I’ve saved the product, the decal, and even the knife used to open the package in case I need to settle this through legal means. I’d strongly advise others to avoid this item entirely. It's not just poorly made, it’s lethality without warning."
sentence31 = "After purchasing this item, I had a truly unsettling experience. I was preparing dinner and attempted to remove the lid, but it was jammed so tightly I had to tilt and pry it off, which led to a sudden faceplant as the container slipped. I pinched my hand in the process and spilled part of the frozen contents across the floor. The included packet of seasoning smelled bitter, and after cooking, I noticed the meat had a strange misshapen texture and an off taste. Within hours, I felt sick, with abdominal cramping and redness forming around my mouth — likely an allergic reaction. I checked the use-by date and realized it had passed. I’ve since submitted a complaint and requested the product be replaced, but the process has been frustrating. I feel this could have been a deadlydanger for someone with more severe allergies. The entire situation was terrifying — from the injury while opening it to the symptoms that followed. Honestly, the quality and safety of this product are unacceptable."
sentence13 = "I had an awful experience with this product and want to warn others. It was advertised as sturdy and safe, but that’s far from the truth. Shortly after unboxing, I noticed parts were rusted and chipped, and some metal components had started to flake. While using it, the mechanism suddenly jammed, and when I tried to adjust it, it abruptly collapsed, causing me to injure my ankle. The pain was intense enough that I later had to visit a neurologist due to nerve issues from the scrapes and swelling. Even worse, the smell of smoke emerged after brief use, and I had to immediately unplug it. I contacted a representative, but they simply told me to return the item without offering support or concern. It felt inevitable that someone would get seriously hurt using this. At one point, I even dropped it near my child, and one of the sharp parts nearly slit the surface of our posterior cabinet. I’m honestly afraid of using anything similar again — the risk of a crash or breakdown is just too high. Products like this should undergo more thorough safety checks before being sold."
sentence17 = "I had a terrible experience that left me deeply disappointed and honestly a little shaken. After just a few uses, the product began to malfunction, and a sharp edge cut my finger while trying to disassemble it for cleaning. The injury bled more than expected, and I had to visit emergency care. I was told the wound might result in a permanent scar. Even before that, parts had become sticky, and the material gave off a scenty, almost petroleum-like odor that irritated my skin. Shortly after storing it, a piece shattered near the axle, nearly hitting me in the chest. It’s shocking that a product with such volatile materials is allowed on the market. I’ve now removed it completely from my home and am trying to get it recalled. I wouldn’t wish this on anyone — it’s a threat to safety, and the company should take professional responsibility for the painful consequences I’ve endured. I’ll be much more cautious in the future and urge others to stay away."

sentences = [sentence31, sentence31, sentence31, sentence13, sentence17]

In [12]:
amazon_df = add_similarity_scores(amazon_df, sentences, model, "complaint_similarity5")

In [13]:
print_top_similar_reviews(amazon_df, score_column="complaint_similarity5", text_column='reviewText', top_n=20)

--- Review 1 ---
Similarity Score: 0.5947
reviewText:

--- Review 2 ---
Similarity Score: 0.5880
reviewText:
My son was just hospitalized by this I'm very upset and I will be doing everything I can to sue and have this removed from shelf everywhere this is no joke and is very harmful and deadly please be more careful using this product.

--- Review 3 ---
Similarity Score: 0.5867
reviewText:
The package was broken  and pieces were everywhere in the envelope and on the back it actually said could be poison and could effect health.  It was horrible. My husband threw it away instead of returning the product

--- Review 4 ---
Similarity Score: 0.5858
reviewText:
Product damaged and unusable.

--- Review 5 ---
Similarity Score: 0.5794
reviewText:
I ignored the several reviews that also stated they received theirs damaged, thought it was priced ok for a wood product. My box was thin but not damaged the product itself was damaged in several areas see photos, the aroma from the game is awful it

## Use More Sentences To Find The Cosine Similarities

In [15]:
# Load sentences:
import pickle

with open("sentences_complaint.pkl", "rb") as f:
    sentences_complaint = pickle.load(f)

In [16]:
amazon_df = add_similarity_scores(amazon_df, sentences_complaint, model, "complaint_similarity")

In [17]:
print_top_similar_reviews(amazon_df, score_column="complaint_similarity", text_column='reviewText', top_n=20)

--- Review 1 ---
Similarity Score: 0.6818
reviewText:
Product damaged and unusable.

--- Review 2 ---
Similarity Score: 0.6750
reviewText:

--- Review 3 ---
Similarity Score: 0.6651
reviewText:
I contacted the company directly to report a product that almost caused a fire and burned me, and they didn't care. They wouldn't send me a label to ship it back directly and had no interest in resolving a major product defect and potential fire hazard.

--- Review 4 ---
Similarity Score: 0.6623
reviewText:

--- Review 5 ---
Similarity Score: 0.6567
reviewText:
Product was severely damaged.

--- Review 6 ---
Similarity Score: 0.6499
reviewText:
Product was intact and undamaged.

--- Review 7 ---
Similarity Score: 0.6499
reviewText:
DO NOT PURCHASE!!! DANGEROUS AND FAULTY MANUFACTURING PRACTICES MAKE THIS PRODUCT UNSAFE AND HARMFUL, POTENTIALLY DANGEROUS LEADING TO DEATH! I HAVE CONTACTED THE POISON CONTROL CENTER AFTER MY DAUGHTER BECAME VIOLENTLY ILL AND DISCOVERED THAT THIS TOY BEGAN LEAKING C

## Shipping-Related Similarity Scores:

In [19]:
sentence_shipping = [
    "The package arrived late and the box was completely crushed.",
    "Item was missing from the box when it finally showed up.",
    "Took over two weeks to deliver, and it was the wrong item.",
    "Product arrived broken with pieces rattling in the box.",
    "Shipping took forever and tracking info was never updated.",
    "It never arrived. I had to contact customer service twice to get a refund.",
    "Arrived open and missing the accessories. Very disappointing.",
    "The outer box was soaked and falling apart. Product was unusable.",
    "Delivery was delayed multiple times. Not acceptable for a paid service.",
    "Item was tossed at the door. Packaging was torn and dented.",
    "Poorly packaged.",
    "Product was severely damaged"
]

In [20]:
amazon_df = add_similarity_scores(amazon_df, sentence_shipping, model, "shipping_similarity")

In [21]:
print_top_similar_reviews(amazon_df, score_column="shipping_similarity", text_column='reviewText', top_n=20)

--- Review 1 ---
Similarity Score: 0.8184
reviewText:
package came damaged, the box was squished and broken.

--- Review 2 ---
Similarity Score: 0.8180
reviewText:
It was delivered all out of the box, open and broken.

--- Review 3 ---
Similarity Score: 0.8168
reviewText:
Box was badly damaged during shipping.

--- Review 4 ---
Similarity Score: 0.8126
reviewText:
The package was received damaged.

--- Review 5 ---
Similarity Score: 0.8115
reviewText:
It was just as expected though the box was damaged on arrival, the product was intact.

--- Review 6 ---
Similarity Score: 0.8106
reviewText:
Package was damaged upon arrival. Very disappointing.

--- Review 7 ---
Similarity Score: 0.8087
reviewText:
Package came slightly damaged and some of the box was scratched and dented.

--- Review 8 ---
Similarity Score: 0.8068
reviewText:
The delivery was super late, and the product came damaged.

--- Review 9 ---
Similarity Score: 0.8068
reviewText:
poor packaging, arrived broken.


### Improving Similarity Score Precision with Sentiment Filtering

Computing similarity scores between review embeddings and a complaint-related reference vector helped identify potential safety issues, but it also produced false positives: several reviews with high similarity scores were actually positive in sentiment. These reviews often used words such as "product", "damage", or "issue" in a non-negative context. For example, "Product was intact and undamaged." received a complaint similarity of 0.6499.

To address this, we added a sentiment classification step. For reviews classified as positive, the similarity score is set to zero, so that reviews with high semantic similarity but a positive tone are not mistakenly flagged. This hybrid approach improves precision by filtering out matches that share complaint-related vocabulary but not complaint-related intent.
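
The zeroing rule described above could be sketched as follows. This is an illustration under stated assumptions, not the notebook's own code: the 0.5 cut-off, the function name `zero_out_positive`, and the sample values are all hypothetical.

```python
import numpy as np

# Assumption: a review counts as positive when P(positive) >= 0.5.
POSITIVE_THRESHOLD = 0.5


def zero_out_positive(similarity, positive_prob, threshold=POSITIVE_THRESHOLD):
    """Return similarity scores with positive-sentiment entries set to 0."""
    similarity = np.asarray(similarity, dtype=float)
    positive_prob = np.asarray(positive_prob, dtype=float)
    return np.where(positive_prob >= threshold, 0.0, similarity)


# Illustrative values: two negative reviews and one positive review.
similarity = [0.6818, 0.6651, 0.6499]
positive_prob = [0.0003, 0.0169, 0.9998]

filtered = zero_out_positive(similarity, positive_prob)
print(filtered)
```

Only the third score is replaced by zero. The first two keep their original similarity because their positive-sentiment probability is below the assumed threshold.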


In [28]:
# Save the amazon_df with new columns:
amazon_df.drop("complaint_similarity5", axis=1, inplace=True)
amazon_df.to_parquet("amazon_reviews_with_similarity_scores.parquet", index=True)

In [5]:
amazon_df = pd.read_parquet("amazon_reviews_with_similarity_scores.parquet")

In [6]:
amazon_df.shape

(8201231, 19)

In [7]:
amazon_df.columns

Index(['overall', 'vote', 'verified', 'reviewTime', 'reviewerID', 'asin',
       'reviewerName', 'reviewText', 'summary', 'unixReviewTime', 'image',
       'style', 'review_len_words', 'review_len_chars', 'reviewText_clean',
       'embedding', 'complaint_similarity', 'shipping_similarity',
       'sentiment'],
      dtype='object')

## Drop Null Values and Some Columns:

In [8]:
amazon_df = amazon_df.dropna(subset=['summary','reviewText', 'reviewerName'])

In [9]:
amazon_df = amazon_df.drop(columns=['image', 'vote', 'style', 'reviewTime'])

In [10]:
amazon_df.shape

(8191295, 15)

In [23]:
duplicates = amazon_df.duplicated(subset=['reviewerID', 'asin', 'unixReviewTime', 'overall'], keep=False)
print(f"Number of duplicate reviews: {duplicates.sum()}")

Number of duplicate reviews: 391965
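
With `keep=False`, `duplicated` flags every member of a duplicate group, so the count above includes the first occurrence of each repeated review as well as its copies. The toy example below (made-up data, two key columns instead of four) contrasts that count with `keep="first"`, which counts only the extra copies.

```python
import pandas as pd

# Made-up reviews: (A, X1) appears twice and (C, X2) appears three times.
toy = pd.DataFrame({
    "reviewerID": ["A", "A", "B", "C", "C", "C"],
    "asin": ["X1", "X1", "X1", "X2", "X2", "X2"],
})
key = ["reviewerID", "asin"]

# Every row that belongs to a duplicate group.
all_members = toy.duplicated(subset=key, keep=False).sum()

# Only the rows that would be removed if the first copy were kept.
extra_copies = toy.duplicated(subset=key, keep="first").sum()

# One row per unique key.
deduplicated = toy.drop_duplicates(subset=key, keep="first")

print(all_members, extra_copies, len(deduplicated))
```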


## Calculating Sentiment Scores

In [22]:
import torch
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from tqdm import tqdm

# Load model and tokenizer once
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model_sent = AutoModelForSequenceClassification.from_pretrained(model_name)

def batched_sentiment_scores(texts, batch_size=64, score_type="positive"):
    """
    Return one sentiment probability per input text.

    score_type="positive" returns P(POSITIVE); any other value returns P(NEGATIVE).
    """
    sentiment_scores = []
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model_sent.to(device)
    model_sent.eval()  # inference only: make sure dropout is disabled

    # Label index in the model output: 0 = NEGATIVE, 1 = POSITIVE
    idx = 1 if score_type.lower() == "positive" else 0

    for i in tqdm(range(0, len(texts), batch_size), desc="Sentiment Scoring"):
        batch_texts = texts[i:i+batch_size]

        # Tokenize and move to device
        inputs = tokenizer(batch_texts, return_tensors='pt', truncation=True, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Run inference without tracking gradients
        with torch.no_grad():
            logits = model_sent(**inputs).logits
            probs = softmax(logits, dim=1).cpu().numpy()

        sentiment_scores.extend(probs[:, idx])

    return sentiment_scores
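
To make the scoring step concrete, the sketch below applies a softmax to a single made-up pair of logits and reads off the entry at index 1, which is the POSITIVE class in the label order used above. The logit values and the pure-Python `softmax` helper are illustrative assumptions. The notebook uses `torch.nn.functional.softmax` on whole batches.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


NEGATIVE, POSITIVE = 0, 1

# Made-up logits for one review: the model leans strongly negative.
logits = [2.0, -1.0]
probs = softmax(logits)

positive_score = probs[POSITIVE]
print(f"P(negative)={probs[NEGATIVE]:.4f}  P(positive)={positive_score:.4f}")
```

With two classes the two probabilities sum to one, so a low positive score is equivalent to a high negative score.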



In [24]:
amazon_df['positive_sentiment_score'] = batched_sentiment_scores(
    amazon_df['summary'].fillna("").tolist(), 
    batch_size=64, 
    score_type="positive"
)

Sentiment Scoring: 100%|███████████████████████████████████████████████████████| 127989/127989 [28:07<00:00, 75.83it/s]


In [47]:
positive_sentiment_score = amazon_df['positive_sentiment_score']

In [26]:
def print_top_similar_reviews_sentiment(df, score_column, text_column='reviewText', top_n=20):
    # Sort and get top reviews
    top_reviews = df.sort_values(by=score_column, ascending=False).head(top_n)

    # Print them nicely
    for i, (_, row) in enumerate(top_reviews.iterrows(), 1):
        print(f"--- Review {i} ---")
        print(f"Similarity Score: {row[score_column]:.4f}")
        print(f"Sentiment: {row['positive_sentiment_score']}")
        print(f"{text_column}:\n{row[text_column]}\n")

In [28]:
print_top_similar_reviews_sentiment(amazon_df, score_column="complaint_similarity", text_column='reviewText', top_n=20)

--- Review 1 ---
Similarity Score: 0.6818
Sentiment: 0.000347054039593786
reviewText:
Product damaged and unusable.

--- Review 2 ---
Similarity Score: 0.6750
Sentiment: 0.027122223749756813
reviewText:

--- Review 3 ---
Similarity Score: 0.6651
Sentiment: 0.016852473840117455
reviewText:
I contacted the company directly to report a product that almost caused a fire and burned me, and they didn't care. They wouldn't send me a label to ship it back directly and had no interest in resolving a major product defect and potential fire hazard.

--- Review 4 ---
Similarity Score: 0.6623
Sentiment: 0.04538879916071892
reviewText:

--- Review 5 ---
Similarity Score: 0.6567
Sentiment: 0.9622771739959717
reviewText:
Product was severely damaged.

--- Review 6 ---
Similarity Score: 0.6499
Sentiment: 0.9997640252113342
reviewText:
Product was intact and undamaged.

--- Review 7 ---
Similarity Score: 0.6499
Sentiment: 0.0064238193444907665
reviewText:
DO NOT PURCHASE!!! DANGEROUS AND FAULTY MANUFACT

In [33]:
amazon_df.to_parquet("amazon_reviews_with_sim_sent_scores.parquet", index=True)

In [27]:
amazon_df = pd.read_parquet("amazon_reviews_with_sim_sent_scores.parquet")

In [31]:
amazon_df.dropna(subset=['embedding'], inplace=True)

## Aggregate Features

### Mean Scores per ASIN:

In [32]:
mean_scores_by_asin = amazon_df.groupby('asin').agg({
    'positive_sentiment_score': 'mean',
    'complaint_similarity': 'mean',
    'shipping_similarity': 'mean'
}).reset_index()

mean_scores_by_asin.rename(columns={
    'positive_sentiment_score': 'mean_sentiment_score',
    'complaint_similarity': 'mean_complaint_similarity',
    'shipping_similarity': 'mean_shipping_similarity'
}, inplace=True)

In [33]:
max_complaint_idx = amazon_df.groupby('asin')['complaint_similarity'].idxmax()
max_rows = amazon_df.loc[max_complaint_idx, ['asin', 'complaint_similarity', 'shipping_similarity', 'positive_sentiment_score']]
max_rows = max_rows.rename(columns={
    'complaint_similarity': 'max_complaint_similarity',
    'shipping_similarity': 'shipping_similarity_at_max_complaint',
    'positive_sentiment_score': 'sentiment_score_at_max_complaint'
})
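
The `idxmax` pattern above selects, for each ASIN, the single review with the highest complaint similarity and then reads the other columns from that same row. The toy example below (made-up values) shows how this differs from taking each column's maximum independently.

```python
import pandas as pd

# Made-up scores: product X1 has two reviews, product X2 has one.
toy = pd.DataFrame({
    "asin": ["X1", "X1", "X2"],
    "complaint_similarity": [0.10, 0.60, 0.30],
    "shipping_similarity": [0.50, 0.20, 0.40],
})

# Row label of the most complaint-like review per product.
best_idx = toy.groupby("asin")["complaint_similarity"].idxmax()

# Companion columns are read from that same row.
at_max = toy.loc[best_idx, ["asin", "complaint_similarity", "shipping_similarity"]]

# For contrast: per-column maxima, which may mix values from different reviews.
independent_max = toy.groupby("asin")[["complaint_similarity", "shipping_similarity"]].max()

print(at_max)
print(independent_max)
```

For X1 the row-wise selection reports a shipping similarity of 0.20 (the value on the review with complaint similarity 0.60), whereas the independent maximum would report 0.50 from a different review.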

In [41]:
mean_scores_by_asin.head()

| | asin | mean_sentiment_score | mean_complaint_similarity | mean_shipping_similarity |
|---|---|---|---|---|
| 0 | 191639 | 0.871852 | 0.043511 | 0.1769 |
| 1 | 4950763 | 0.999009 | 0.022326 | 0.086034 |
| 2 | 4983289 | 0.999608 | 0.010901 | 0.10404 |
| 3 | 5069491 | 0.030873 | 0.074038 | 0.085383 |
| 4 | 6466222 | 0.999009 | 0.03532 | 0.129227 |


In [39]:
max_rows.head()

| | asin | max_complaint_similarity | shipping_similarity_at_max_complaint | sentiment_score_at_max_complaint |
|---|---|---|---|---|
| 5742413 | 191639 | 0.043511 | 0.1769 | 0.871852 |
| 6515097 | 4950763 | 0.022326 | 0.086034 | 0.999009 |
| 5742417 | 4983289 | 0.045305 | 0.229722 | 0.999874 |
| 6697985 | 5069491 | 0.074038 | 0.085383 | 0.030873 |
| 5742419 | 6466222 | 0.03532 | 0.129227 | 0.999009 |


## Weighted Embedding and Mean Embedding of ReviewText

In [46]:
def combined_embedding(group, alpha=0.5):
    embeddings = np.vstack(group['embedding'])
    sentiment_scores = group['positive_sentiment_score'].values

    # Mean embedding (standard average)
    mean_emb = embeddings.mean(axis=0)

    # Weighted embedding (more weight to negative reviews)
    weights = 1 - sentiment_scores
    if weights.sum() == 0:
        weights = np.ones_like(weights)
    weights = weights / weights.sum()
    weighted_emb = np.average(embeddings, axis=0, weights=weights)

    # Final combined embedding
    return alpha * weighted_emb + (1 - alpha) * mean_emb
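
As a worked example of the blend computed by `combined_embedding`, the sketch below re-implements the same arithmetic on two made-up 2-dimensional review vectors. The function name `combined_embedding_sketch` and the inputs are hypothetical. With `alpha=0.5` the result is the midpoint between the sentiment-weighted mean and the plain mean.

```python
import numpy as np


def combined_embedding_sketch(embeddings, positive_scores, alpha=0.5):
    """Blend a negativity-weighted mean with the plain mean of the embeddings."""
    embeddings = np.asarray(embeddings, dtype=float)
    # Negative reviews (low positive score) receive the larger weights.
    weights = 1.0 - np.asarray(positive_scores, dtype=float)
    if weights.sum() == 0:
        # Every review is fully positive: fall back to equal weights.
        weights = np.ones_like(weights)
    mean_emb = embeddings.mean(axis=0)
    weighted_emb = np.average(embeddings, axis=0, weights=weights)
    return alpha * weighted_emb + (1 - alpha) * mean_emb


# One fully negative review and one fully positive review.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
positive_scores = [0.0, 1.0]

combined = combined_embedding_sketch(embeddings, positive_scores)
print(combined)

# When all reviews are positive, the blend reduces to the plain mean.
all_positive = combined_embedding_sketch(embeddings, [1.0, 1.0])
print(all_positive)
```

In the first call the weighted mean equals the negative review's vector `[1, 0]` and the plain mean is `[0.5, 0.5]`, so the blend lands halfway between them. The product-level embedding is pulled toward the negative review without discarding the positive one.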

In [57]:
# Compute final embeddings per ASIN
review_embeddings = amazon_df.groupby('asin')[['embedding', 'positive_sentiment_score']].apply(combined_embedding)

In [61]:
embedding_df = pd.DataFrame(review_embeddings.tolist(), index=review_embeddings.index)
embedding_df.columns = [f'embedding_{i}' for i in range(embedding_df.shape[1])]
embedding_df = embedding_df.reset_index()

In [62]:
embedding_df.head()

| | asin | embedding_0 | embedding_1 | embedding_2 | embedding_3 | embedding_4 | embedding_5 | embedding_6 | embedding_7 | embedding_8 | ... | embedding_374 | embedding_375 | embedding_376 | embedding_377 | embedding_378 | embedding_379 | embedding_380 | embedding_381 | embedding_382 | embedding_383 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 191639 | 0.027758 | 0.065257 | 0.017953 | -0.051408 | -0.115517 | 0.043882 | -0.052921 | -0.040914 | -0.05521 | ... | 0.081221 | 0.00093 | 0.020542 | -0.000513 | -0.017386 | 0.039457 | -0.001327 | 0.092034 | 0.026612 | 0.110444 |
| 1 | 4950763 | -0.094509 | 0.001439 | 0.04751 | -0.052813 | -0.019436 | -0.030759 | 0.070818 | -0.02399 | 0.047919 | ... | 0.029028 | 0.017909 | -0.015086 | -0.018589 | 0.001332 | 0.042599 | 0.161014 | -0.016039 | -0.000539 | 0.075859 |
| 2 | 4983289 | 0.003233 | 0.037895 | -0.020209 | -0.065908 | -0.076476 | 0.015781 | 0.030195 | -0.062512 | 0.004691 | ... | 0.094576 | 0.031822 | -0.011569 | 0.012821 | 0.026619 | 0.047444 | 0.065055 | 0.032397 | -0.048743 | 0.048678 |
| 3 | 5069491 | -0.019407 | -0.085453 | 0.039475 | -0.008537 | -0.043666 | 0.025446 | 0.028984 | -0.028728 | 0.05547 | ... | 0.029462 | 0.064632 | -0.056036 | 0.085632 | -0.007459 | 0.025162 | -0.024561 | 0.022096 | 0.016897 | 0.061657 |
| 4 | 6466222 | -0.043714 | 0.038217 | -0.052901 | 0.010238 | -0.103368 | -0.003938 | 0.03367 | 0.066004 | 0.091853 | ... | 0.067709 | -0.049012 | -0.03867 | -0.042237 | -0.067665 | 0.007688 | 0.146708 | 0.049473 | 0.004074 | -0.054181 |


In [67]:
review_features_df = embedding_df.merge(mean_scores_by_asin, on='asin', how='inner')
review_features_df = review_features_df.merge(max_rows, on='asin', how='inner')

In [69]:
review_features_df.columns

Index(['asin', 'embedding_0', 'embedding_1', 'embedding_2', 'embedding_3',
       'embedding_4', 'embedding_5', 'embedding_6', 'embedding_7',
       'embedding_8',
       ...
       'embedding_380', 'embedding_381', 'embedding_382', 'embedding_383',
       'mean_sentiment_score', 'mean_complaint_similarity',
       'mean_shipping_similarity', 'max_complaint_similarity',
       'shipping_similarity_at_max_complaint',
       'sentiment_score_at_max_complaint'],
      dtype='object', length=391)

In [75]:
review_features_df.shape

(624529, 391)

In [71]:
review_features_df.head()

| | asin | embedding_0 | embedding_1 | embedding_2 | embedding_3 | embedding_4 | embedding_5 | embedding_6 | embedding_7 | embedding_8 | ... | embedding_380 | embedding_381 | embedding_382 | embedding_383 | mean_sentiment_score | mean_complaint_similarity | mean_shipping_similarity | max_complaint_similarity | shipping_similarity_at_max_complaint | sentiment_score_at_max_complaint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 191639 | 0.027758 | 0.065257 | 0.017953 | -0.051408 | -0.115517 | 0.043882 | -0.052921 | -0.040914 | -0.05521 | ... | -0.001327 | 0.092034 | 0.026612 | 0.110444 | 0.871852 | 0.043511 | 0.1769 | 0.043511 | 0.1769 | 0.871852 |
| 1 | 4950763 | -0.094509 | 0.001439 | 0.04751 | -0.052813 | -0.019436 | -0.030759 | 0.070818 | -0.02399 | 0.047919 | ... | 0.161014 | -0.016039 | -0.000539 | 0.075859 | 0.999009 | 0.022326 | 0.086034 | 0.022326 | 0.086034 | 0.999009 |
| 2 | 4983289 | 0.003233 | 0.037895 | -0.020209 | -0.065908 | -0.076476 | 0.015781 | 0.030195 | -0.062512 | 0.004691 | ... | 0.065055 | 0.032397 | -0.048743 | 0.048678 | 0.999608 | 0.010901 | 0.10404 | 0.045305 | 0.229722 | 0.999874 |
| 3 | 5069491 | -0.019407 | -0.085453 | 0.039475 | -0.008537 | -0.043666 | 0.025446 | 0.028984 | -0.028728 | 0.05547 | ... | -0.024561 | 0.022096 | 0.016897 | 0.061657 | 0.030873 | 0.074038 | 0.085383 | 0.074038 | 0.085383 | 0.030873 |
| 4 | 6466222 | -0.043714 | 0.038217 | -0.052901 | 0.010238 | -0.103368 | -0.003938 | 0.03367 | 0.066004 | 0.091853 | ... | 0.146708 | 0.049473 | 0.004074 | -0.054181 | 0.999009 | 0.03532 | 0.129227 | 0.03532 | 0.129227 | 0.999009 |


In [None]:
review_features_df.to_pickle("../Data/review_features_df.pkl")