# <div style="padding: 30px; color:white; margin:10; font-size:95%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b> Intel Review - Product Pros & Cons</b></div>

<div style="background-color: #3b3745; border-radius: 12px; padding: 20px; box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2);">
    <h2 style="color: #F1A424; text-align: center;">Table of Contents</h2>
    <ul style="list-style: none; padding: 0;">
        <li><a href="#section-1" style="color: white; text-decoration: none; display: flex; align-items: center; padding: 8px 15px; border-radius: 6px; transition: background-color 0.3s;"><span style="margin-right: 10px; font-weight: bold; color: #F1A424;">1.</span>  Importing Libraries</a></li>
        <li><a href="#section-2" style="color: white; text-decoration: none; display: flex; align-items: center; padding: 8px 15px; border-radius: 6px; transition: background-color 0.3s;"><span style="margin-right: 10px; font-weight: bold; color: #F1A424;">2.1</span> Loading the CSV file into a DataFrame</a></li>
        <li><a href="#section-3" style="color: white; text-decoration: none; display: flex; align-items: center; padding: 8px 15px; border-radius: 6px; transition: background-color 0.3s;"><span style="margin-right: 10px; font-weight: bold; color: #F1A424;">2.2</span>  Text Pre-Processing</a></li>
        <li><a href="#section-4" style="color: white; text-decoration: none; display: flex; align-items: center; padding: 8px 15px; border-radius: 6px; transition: background-color 0.3s;"><span style="margin-right: 10px; font-weight: bold; color: #F1A424;">3.</span> Pros & Cons of Intel Processors in a tabular format</a></li>    
    </ul>
</div>

<a id="section-1"></a>
# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'>1 |</span></b> <b>  Importing Libraries</b></div>

In [2]:
import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from tabulate import tabulate
nltk.download('stopwords')
nltk.download('vader_lexicon')

[nltk_data] Downloading package stopwords to /Users/jyo/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/jyo/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True


 <a id="section-2"></a>
# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'> 2.1 |</span></b> <b>Loading the CSV file into a DataFrame
</b></div>


In [3]:
df = pd.read_csv('2-dataset_7(senti)_vader.csv')
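
The later cells read a `Review` column and group by a `Product` column, so it can help to confirm that both exist right after loading. The following is a minimal sketch under that assumption; the helper name `check_review_columns` and the in-memory sample data are hypothetical and not part of the original notebook.

```python
import io
import pandas as pd

# Columns the later cells rely on: df['Review'] and df.groupby('Product').
REQUIRED_COLUMNS = ('Product', 'Review')

def check_review_columns(frame):
    """Hypothetical helper: fail early if an expected column is missing,
    then drop rows whose review text is missing."""
    missing = [col for col in REQUIRED_COLUMNS if col not in frame.columns]
    if missing:
        raise KeyError(f"Missing expected column(s): {missing}")
    return frame.dropna(subset=['Review']).reset_index(drop=True)

# Made-up sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "Product,Review\n"
    "intel-i3-10100,Good price and runs cool\n"
    "intel-i3-10100,\n"
    "intel-i3-12100f,Easy install\n"
)
sample_df = check_review_columns(pd.read_csv(sample_csv))
print(sample_df)
```

Applied to the real data, this would look like `df = check_review_columns(df)`. Dropping rows with missing reviews is one possible policy; keeping them and treating them as empty text would work as well.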



 <a id="section-3"></a>
# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'> 2.2 |</span></b> <b> Text Pre-Processing
</b></div>


In [4]:
# Additional stopwords
additional_stopwords = set([
    'also', 'would', 'could', 'like', 'make', 'sure', 'go', 
    'get', 'got', 'recommend', 'highly', 'review', 'want', 'hard'
])

# Combine NLTK and additional stopwords
stop_words = set(stopwords.words('english')).union(additional_stopwords)

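def preprocess(text):
    """Return text lowercased, with punctuation, digits, and stopwords removed."""
    # Treat missing or non-string values (e.g. NaN) as empty text
    if not isinstance(text, str):
        return ''
    # Remove punctuation and digits, then convert to lowercase
    text = re.sub(r'[^\w\s]', '', text)
    text = re.sub(r'\d+', '', text).lower()
    # Remove stopwords
    text = ' '.join(word for word in text.split() if word not in stop_words)
    return text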

# Apply preprocessing to the review column
df['cleaned_review'] = df['Review'].apply(preprocess)
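
To see what the cleaning steps do to a single review, here is a small self-contained sketch. It mirrors the two regular expressions used in `preprocess`, but swaps in a tiny hand-written stopword set (`demo_stop_words`, an assumption for illustration) so that it runs without the NLTK download. The sample sentence is made up.

```python
import re

# Tiny stand-in stopword list (assumption); the notebook itself uses NLTK's
# English stopwords combined with its own additional_stopwords set.
demo_stop_words = {'it', 'at', 'and', 'the', 'a', 'is', 'would', 'recommend'}

def preprocess_demo(text):
    text = re.sub(r'[^\w\s]', '', text)       # strip punctuation
    text = re.sub(r'\d+', '', text).lower()   # strip digits, then lowercase
    # Drop stopwords
    return ' '.join(w for w in text.split() if w not in demo_stop_words)

print(preprocess_demo("I'm happy: it runs cool at 65W, would recommend!"))
# im happy runs cool w
```

Because punctuation is stripped before stopwords are matched, "I'm" becomes `im`, which would explain bigrams such as `im happy` in the results. Removing digits can also leave stray fragments like `w` from "65W".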


 <a id="section-4"></a>
# <div style="padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745"><b><span style='color:#F1A424'> 3 |</span></b> <b> Pros & Cons of Intel Processors in a tabular format
</b></div>


In [2]:
sid = SentimentIntensityAnalyzer()

def get_sentiment(bigram):
    return sid.polarity_scores(bigram)['compound']

def analyze_reviews(reviews):
    # Extract bigrams with minimum document frequency of 2
    vectorizer = CountVectorizer(ngram_range=(2, 2), min_df=2)
    X = vectorizer.fit_transform(reviews)
    
    # Sum up the counts of each bigram
    bigram_counts = X.sum(axis=0).A1
    bigram_features = vectorizer.get_feature_names_out()
    
    # Create a DataFrame with the bigrams and their counts
    bigram_df = pd.DataFrame({'bigram': bigram_features, 'count': bigram_counts})
    bigram_df = bigram_df.sort_values(by='count', ascending=False)
    
    # Apply sentiment analysis to the bigrams
    bigram_df['sentiment'] = bigram_df['bigram'].apply(get_sentiment)
    
    # Separate positive and negative bigrams
    positive_bigrams = bigram_df[bigram_df['sentiment'] > 0]
    negative_bigrams = bigram_df[bigram_df['sentiment'] < 0]
    
    # Filter bigrams to keep only meaningful phrases (two words)
    positive_bigrams = positive_bigrams[positive_bigrams['bigram'].apply(lambda x: len(x.split()) == 2)]
    negative_bigrams = negative_bigrams[negative_bigrams['bigram'].apply(lambda x: len(x.split()) == 2)]
    
    # Get the top 5 positive and negative bigrams
    top_5_pros = positive_bigrams.head(5)
    top_5_cons = negative_bigrams.head(5)
    
    return top_5_pros, top_5_cons

# Initialize dictionaries to store the pros and cons for each product
product_pros = {}
product_cons = {}

# Group reviews by product and analyze each group
for product, group in df.groupby('Product'):
    top_5_pros, top_5_cons = analyze_reviews(group['cleaned_review'])
    product_pros[product] = top_5_pros
    product_cons[product] = top_5_cons

# Display the results in a structured format
for product in product_pros:
    print(f"Product: {product}")
    print("Top 5 Pros:")
    if not product_pros[product].empty:
        print(tabulate(product_pros[product], headers='keys', tablefmt='psql'))
    else:
        print("No significant positive bigrams found.")
    print("\nTop 5 Cons:")
    if not product_cons[product].empty:
        print(tabulate(product_cons[product], headers='keys', tablefmt='psql'))
    else:
        print("No significant negative bigrams found.")
    print("\n" + "="*50 + "\n")

Product: intel-i3-10100
Top 5 Pros:
+-----+--------------+---------+-------------+
|     | bigram       |   count |   sentiment |
|-----+--------------+---------+-------------|
|  59 | good price   |       6 |      0.4404 |
| 131 | run cool     |       5 |      0.3182 |
|  36 | easy install |       4 |      0.4404 |
|  65 | great cpu    |       4 |      0.6249 |
|  82 | im happy     |       4 |      0.5719 |
+-----+--------------+---------+-------------+

Top 5 Cons:
+-----+----------------+---------+-------------+
|     | bigram         |   count |   sentiment |
|-----+----------------+---------+-------------|
|  97 | low power      |       4 |     -0.2732 |
|  98 | low setting    |       2 |     -0.2732 |
| 119 | price terrible |       2 |     -0.4767 |
+-----+----------------+---------+-------------+


Product: intel-i3-12100f
Top 5 Pros:
+-----+----------------+---------+-------------+
|     | bigram         |   count |   sentiment |
|-----+----------------+---------+-------------|
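
The `count` column in these tables is the total number of occurrences of a bigram, while `min_df=2` filters on the number of reviews that contain it. The sketch below illustrates that distinction in plain Python. It is a simplified stand-in and not `CountVectorizer` itself (which, among other things, lowercases text and by default ignores single-character tokens); the function name `bigram_counts` and the sample reviews are made up for illustration.

```python
from collections import Counter

def bigram_counts(docs, min_df=2):
    """Count bigrams, keeping only those found in at least min_df documents."""
    total = Counter()      # total occurrences across all documents
    doc_freq = Counter()   # number of documents containing each bigram
    for doc in docs:
        tokens = doc.split()
        bigrams = [' '.join(pair) for pair in zip(tokens, tokens[1:])]
        total.update(bigrams)
        doc_freq.update(set(bigrams))  # each document counts at most once
    kept = {bg: n for bg, n in total.items() if doc_freq[bg] >= min_df}
    # Most frequent first, ties broken alphabetically
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))

docs = [
    "good price good price",
    "good price run cool",
    "run cool great cpu",
]
print(bigram_counts(docs))
# [('good price', 3), ('run cool', 2)]
```

Here `good price` appears three times but in only two reviews; it passes the filter because of the two reviews, and it is ranked by the three occurrences.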

This concludes the analysis presented in this notebook.
