# About the dataset

The "Consumer Reviews of Amazon Products" dataset contains detailed information about products sold on Amazon, along with customer reviews. It is useful for research and analysis in fields such as Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML). In summary, the dataset contains:

1. Basic Product Information: This includes details about the products being reviewed, such as descriptions, category information, price, brand, and image features.
2. Reviews: The dataset encompasses reviews which contain ratings, text, and votes on the helpfulness of the review. This allows for analysis of customer sentiment and the perceived quality of products.
3. Metadata: Product metadata, such as descriptions and category information, which supports more detailed analysis and categorization of products.
4. Usage Conditions: Accessing and using these datasets comes with specific conditions, especially for academic research. Commercial use is generally restricted, and users are prohibited from attempting to identify the authors of the reviews.

The dataset serves as a rich source of information for analyzing customer product experiences, understanding variations in product perception across different regions, and studying promotional intent or bias in reviews. It's constructed to represent a wide range of customer evaluations and opinions, making it a valuable resource for researchers and analysts looking to dive deep into consumer behavior.

# Preprocessing

I preprocess the data in several steps.

1. I first drop the reviews that do not contain a review text.
2. I apply basic text cleaning to the review text, such as `.lower()` and `.strip()`.
3. Finally, I filter out stop words using the `spacy` library.

# Evaluation of results

I choose random samples from the dataset and apply my sentiment analysis. To evaluate my results, I cross-check the polarity score predicted through the `spacytextblob` extension of `spacy` against the review text and the rating score.
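The cross-check above is done by eye, but it could also be expressed as a simple agreement rate. The following is a minimal sketch, not part of the notebook: the helper names (`rating_to_label`, `polarity_to_label`, `agreement_rate`), the rating cut-offs (4 and above is positive, 2 and below is negative), and the `neutral_band` of 0.1 are all assumptions chosen for illustration.

```python
from typing import Iterable, Tuple


def rating_to_label(rating: float) -> str:
    """Map a star rating to a coarse sentiment label (assumed cut-offs)."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


def polarity_to_label(polarity: float, neutral_band: float = 0.1) -> str:
    """Map a polarity in [-1, 1] to a label; |polarity| <= neutral_band is neutral."""
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


def agreement_rate(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (rating, polarity) pairs whose two labels match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    matches = sum(
        rating_to_label(rating) == polarity_to_label(polarity)
        for rating, polarity in pairs
    )
    return matches / len(pairs)
```

For example, `agreement_rate([(5.0, 0.55), (5.0, 0.0)])` counts the first pair as a match and the second as a mismatch, because a polarity of 0.0 falls inside the neutral band while a 5-star rating is labeled positive.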

# Limitations

The `en_core_web_sm` `spacy` model is a small, limited model for NLP purposes, and the predicted polarity is not always accurate. For example:
```
Review: my husband loves this. we coupled it with the playstation vue streaming service for $30/mo and dropped cox cable totally. we found that the amazon fire box offers a channel guide for ps vue that the roku does not, which is def a helpful feature to have with ps vue.
Rating: 5.0
Polarity: 0.05
```
As we can see, although the review text and rating score are highly positive, the polarity score is almost neutral. This might be caused by the model's over-sensitivity to phrases such as 'dropped cox cable totally' in the review.
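One plausible mechanism is that lexicon-based scorers, such as the TextBlob analyzer that `spacytextblob` wraps, score only the words they know and combine those scores, so a single negatively scored word can cancel out much of a positive one. The toy sketch below illustrates that averaging effect only. `TOY_LEXICON` and its values are invented for this example and are not TextBlob's real word scores, and `toy_polarity` is a hypothetical helper, not the library's algorithm.

```python
# Hypothetical lexicon for illustration only; not TextBlob's actual scores.
TOY_LEXICON = {
    "loves": 0.6,
    "helpful": 0.5,
    "great": 0.8,
    "dropped": -0.4,
}


def toy_polarity(text: str) -> float:
    """Average the scores of words found in TOY_LEXICON; 0.0 if none match."""
    # Lowercase, split on whitespace, and trim common punctuation.
    words = [word.strip(".,!?") for word in text.lower().split()]
    scores = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0
```

Under these made-up scores, a sentence containing both "loves" and "dropped" averages to a value close to zero, and a sentence with no lexicon words at all scores exactly 0.0, even if a human reader would call it positive.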

In [1]:
import spacy
import pandas
from spacytextblob.spacytextblob import SpacyTextBlob

In [2]:
# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe('spacytextblob')


<spacytextblob.spacytextblob.SpacyTextBlob at 0x11fc6059e50>

In [3]:
df = pandas.read_csv('data/amazon_product_reviews.csv')
clean_data = df.dropna(subset=['reviews.text'])


In [12]:
def sentiment_analysis(text: str):
    # Basic text cleaning
    text = text.lower()
    text = text.strip()
    # Process text
    document = nlp(text)
    # Remove stop words
    filtered_tokens = [token.text for token in document if not token.is_stop]
    clean_text = ' '.join(filtered_tokens)
    clean_doc = nlp(clean_text)
    polarity = clean_doc._.blob.polarity
    return polarity

In [18]:
n_samples = 10
# Select random rows
sample_df = clean_data.sample(n_samples).reset_index()
for i in range(n_samples):
    print(f'---Processing sample {i}')
    text = sample_df['reviews.text'][i]
    rating = sample_df['reviews.rating'][i]
    polarity = sentiment_analysis(text)
    print(f'Review: {text}')
    print(f'Rating: {rating}')
    print(f'Polarity: {polarity}')

---Processing sample 0
Review: My older kids helped me put in the child safety features and then it works great for my special needs daughter
Rating: 5.0
Polarity: 0.44126984126984126
---Processing sample 1
Review: I like the size of the device as well as the expandable storage
Rating: 5.0
Polarity: 0.0
---Processing sample 2
Review: Good features alexa integrates very well into daily lives also speaker sound is quite good
Rating: 5.0
Polarity: 0.45
---Processing sample 3
Review: I'm really enjoying my kindle fire. I love the size and the ability to fit it into my purse. Great for reading on the plane.
Rating: 5.0
Polarity: 0.55
---Processing sample 4
Review: Unit has fast response time. Graphics are very clear.
Rating: 4.0
Polarity: 0.15000000000000002
---Processing sample 5
Review: This is a great tablet. I bought it for my daughter and it's very easy for her to use.
Rating: 5.0
Polarity: 0.6166666666666667
---Processing sample 6
Review: Amazon FireTV is a powerful little box. Worth 