
# Module: Insights Through AI

## Importing Libraries

In [None]:
import os

import google.generativeai as genai

## Model Setup

In [None]:
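# Configure the API key. Read it from an environment variable instead of
# hard-coding it (GEMINI_API_KEY is an assumed name; use whichever name you set).
genai.configure(api_key=os.environ["GEMINI_API_KEY"])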
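
Keeping the key out of the notebook avoids leaking it when the notebook is shared. The following is a minimal sketch of one way to do that, assuming the key is stored in an environment variable; the helper `get_api_key` and the variable name `GEMINI_API_KEY` are illustrative assumptions, not part of the Gemini SDK.

```python
import os


def get_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before running this notebook."
        )
    return key
```

The result can then be passed to `genai.configure(api_key=get_api_key())`, so a missing key fails early with a readable message.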

In [None]:
# Create the model
# See https://ai.google.dev/api/python/google/generativeai/GenerativeModel
generation_config = {
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 64,
  "max_output_tokens": 8192,
  "response_mime_type": "text/plain",
}


In [None]:
model = genai.GenerativeModel(
  model_name="gemini-1.5-flash",
  generation_config=generation_config,
  # safety_settings = Adjust safety settings
  # See https://ai.google.dev/gemini-api/docs/safety-settings
)


## Start a Chat Session

In [None]:
chat_session = model.start_chat(
  history=[
    {
      "role": "user",
      "parts": [
        "You are a market research analyst specializing in consumer electronics.\n",
      ],
    }
  ]
)


In [None]:
# Test different prompts
response1 = chat_session.send_message("What are the key factors customers consider when buying headphones?")
print(response1.text)

Customers consider a multitude of factors when purchasing headphones, which can be broadly categorized as follows:

**1. Sound Quality:** This is arguably the most crucial factor.  Specific elements within this category include:

* **Frequency Response:**  The range of frequencies the headphones can reproduce accurately (bass, mids, treble).  Consumers often prioritize balanced sound, strong bass, or clear treble depending on their musical preferences.
* **Clarity and Detail:** The ability to discern individual instruments and nuances in the audio.
* **Soundstage:** The perception of width and depth of the sound.  Important for immersive listening experiences.
* **Distortion:**  Unwanted additions or alterations to the original sound.  Lower distortion is preferred.


**2. Comfort and Fit:**  Long listening sessions necessitate comfortable headphones.  This includes:

* **Earcup Size and Design:**  Over-ear, on-ear, or in-ear styles cater to different preferences and ear shapes.  Paddi

In [None]:
response2 = chat_session.send_message("How would you analyze customer feedback about sound quality?")
print(response2.text)

Analyzing customer feedback on sound quality requires a systematic approach to uncover actionable insights.  Here's a breakdown of how I would analyze such feedback:

**1. Data Collection:**

* **Source Identification:**  Gather feedback from various sources including online reviews (Amazon, Best Buy, specialized audio forums), social media mentions, customer surveys, and focus groups.
* **Data Consolidation:**  Centralize all collected feedback into a single repository for efficient analysis.  This might involve using qualitative data analysis software.
* **Data Cleaning:**  Remove irrelevant information, duplicates, and spam.

**2. Qualitative Analysis:**

* **Thematic Analysis:** Identify recurring themes and patterns in the feedback.  This involves coding the feedback to categorize it into meaningful themes related to sound quality (e.g., bass response, treble clarity, soundstage, distortion, overall balance).
* **Sentiment Analysis:**  Determine the overall sentiment (positive, ne

## Section 3: Integrating Gemini AI with Your Dataset

Now that you've set up the API and started prompting Gemini, let's use Gemini AI with a dataset. We'll load the reviews into a pandas DataFrame and send them to the API for analysis.

In [None]:
import os

import pandas as pd
import google.generativeai as genai

# Load your dataset
df = pd.read_csv('/content/smartwatch-reviews.csv')

# Read the API key from an environment variable instead of hard-coding it
# (GEMINI_API_KEY is an assumed variable name; use whichever name you set)
api_key = os.environ["GEMINI_API_KEY"]

# Function to analyze reviews using Gemini AI
def analyze_reviews(reviews, api_key, prompt_message):
    genai.configure(api_key=api_key)

    model = genai.GenerativeModel(
        model_name="gemini-1.5-flash",
        generation_config={
            "temperature": 1,
            "top_p": 0.95,
            "top_k": 64,
            "max_output_tokens": 8192,
            "response_mime_type": "text/plain",
        },
    )

    chat_session = model.start_chat(history=[])

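    # Separate the instructions from the reviews, and coerce each review to str
    # so that missing values (NaN) do not break the join
    prompt = prompt_message + "\n\n" + "\n\n".join(str(r) for r in reviews)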
    response = chat_session.send_message(prompt)
    return response.text
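
`analyze_reviews` joins every review into a single prompt, so a long list can grow past what one request comfortably holds. A minimal sketch of splitting reviews into batches follows. It assumes a character budget as a rough stand-in for the model's token limit; the helper name `chunk_reviews` and the default of 20,000 characters are illustrative assumptions.

```python
def chunk_reviews(reviews, max_chars=20000, separator="\n\n"):
    """Group reviews into batches whose joined length stays within max_chars.

    A single review longer than max_chars is placed in a batch of its own.
    """
    batches, current, current_len = [], [], 0
    for review in reviews:
        text = str(review)
        # Length this review adds, including the separator if the batch is non-empty
        added = len(text) + (len(separator) if current else 0)
        if current and current_len + added > max_chars:
            batches.append(current)
            current, current_len = [], 0
            added = len(text)
        current.append(text)
        current_len += added
    if current:
        batches.append(current)
    return batches
```

Each batch can then be passed to `analyze_reviews(batch, api_key, prompt_message)` in a loop, and the per-batch summaries combined afterwards.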



In [None]:
# Select reviews to analyze
reviews = df['Description'].head(100).tolist()

# Create your prompt
prompt_message = "Analyze the following reviews and summarize the key insights:"

# Get insights
insights = analyze_reviews(reviews, api_key, prompt_message)
print(insights)

The reviews on the Apple Watch SE 2023 are largely positive, praising its features and user experience, but highlight significant battery life concerns.

**Positive Aspects:**

* **Excellent User Interface (UI):**  Reviewers consistently lauded the WatchOS as smooth, intuitive, and visually appealing, even for first-time Apple Watch users.
* **High Accuracy:**  The watch's fitness tracking capabilities were praised for their accuracy, closely matching high-end devices.
* **Call Quality:**  The quality of calls made through the watch was exceptionally good.
* **Build Quality and Design:** The watch's build quality is considered top-notch, with many reviewers appreciating the comfortable strap and sleek design.  The 40mm size was particularly noted as fitting smaller wrists well.
* **Value for Money (at certain price points):** Several reviewers emphasized receiving the watch at a discounted price, making it a good value compared to the Apple Watch 8.  Others considered it a good option 

In [None]:
df.head()

| Unnamed: 0 | Product Name | Name | Stars | Title | Date | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smar... | Karthikeyan | 5.0 | 5.0 out of 5 stars\nGood and reliable watch. | | Very good in terms of features and amazing qua... |
| 1 | Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smar... | Vishvesh | 4.0 | 4.0 out of 5 stars\nThe Perfect Budget Apple W... | | This is the first time I am using an Apple Wat... |
| 2 | Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smar... | Bindu | 5.0 | 5.0 out of 5 stars\nGood One | | Simple look with basic features.\nRead more |
| 3 | Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smar... | Sandhyaram | 2.0 | 2.0 out of 5 stars\nBattery problem | | 2h charging 100%\nRead more |
| 4 | Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smar... | Suresh | 1.0 | 1.0 out of 5 stars\nNot as per image | | Screen is very small\nRead more |


## Section 4: Gathering Insights

Now that you've set up Gemini AI and integrated it with your dataset, it's time to dive into gathering insights. This section will guide you through developing effective prompts, extracting key information, and interpreting the results to gain meaningful insights.

First, we will do a little cleaning of the data.
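
The `df.head()` output shows that some descriptions end with a scraped "Read more" label, which adds noise to every prompt. A minimal sketch of a text-cleaning helper follows. The name `clean_review_text` is an illustrative assumption, and the helper only handles `None` and float NaN as missing values.

```python
import re

# Trailing "Read more" label left over from the review page
_READ_MORE = re.compile(r"\s*Read more\s*$", flags=re.IGNORECASE)


def clean_review_text(text):
    """Return the review without the trailing 'Read more' label and extra whitespace."""
    if text is None or text != text:  # None, or NaN (NaN is never equal to itself)
        return ""
    cleaned = _READ_MORE.sub("", str(text))
    return " ".join(cleaned.split())  # collapse newlines and repeated spaces
```

It could be applied with `df['Description'] = df['Description'].apply(clean_review_text)` before any reviews are sent to the API.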



### Step 3.1: Organizing Product Information

We start by listing all the products in our dataset.

In [None]:
# Check unique products in your dataset
unique_products = df['Product Name'].unique()
print("Products in dataset:")
for product in unique_products:
    print(f"- {product}")

Products in dataset:
- Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smartwatch with Starlight Aluminum Case with Starlight Sport Loop One Size. Fitness & Sleep Tracker, Crash Detection, Heart Rate Monitor, Retina Display
- Apple Watch SE (GPS + Cellular, 40mm) - Space Grey Aluminium Case with Midnight Sport Band - Regular
- Apple Watch Series 8 [GPS 45 mm] Smart Watch w/Midnight Aluminium Case with Midnight Sport Band. Fitness Tracker, Blood Oxygen & ECG Apps, Always-On Retina Display, Water Resistant
- Fitbit Inspire 2 Health & Fitness Tracker with a Free 1-Year Premium Trial, 24/7 Heart Rate, Black/Black, One Size (S & L Bands Included)
- Fitbit Versa 4 Fitness Watch (Black/Graphite Aluminium) with 6-Month Premium Membership
- Fitbit Inspire 3 Health & Fitness Tracker (Midnight Zen/Black) with 6-Month Premium Membership
- Garmin Vivoactive 3 GPS Smartwatch, 1.2 inch
- Garmin Forerunner 55, GPS Running Watch with Daily Suggested Workouts, Up to 2 Weeks of Battery Life (Whitestone)
- Garm

Now, let's create a mapping for these products.

In [None]:
# Create a mapping dictionary for product names
# NOTE: CREATE MAPPING FOR YOUR PRODUCTS HERE
product_mapping = {
    "Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smartwatch with Starlight Aluminum Case with Starlight Sport Loop One Size. Fitness & Sleep Tracker, Crash Detection, Heart Rate Monitor, Retina Display": {
        "short_name": "Apple Watch SE 2023 GPS",
        "brand": "Apple",
    },
    "Apple Watch SE (GPS + Cellular, 40mm) - Space Grey Aluminium Case with Midnight Sport Band - Regular": {
        "short_name": "Apple Watch SE Cellular",
        "brand": "Apple"
    },
    "Apple Watch Series 8 [GPS 45 mm] Smart Watch w/Midnight Aluminium Case with Midnight Sport Band. Fitness Tracker, Blood Oxygen & ECG Apps, Always-On Retina Display, Water Resistant": {
        "short_name": "Apple Watch Series 8",
        "brand": "Apple"
    },
    "Fitbit Inspire 2 Health & Fitness Tracker with a Free 1-Year Premium Trial, 24/7 Heart Rate, Black/Black, One Size (S & L Bands Included)": {
        "short_name": "Fitbit Inspire 2",
        "brand": "Fitbit"
    },
    "Fitbit Versa 4 Fitness Watch (Black/Graphite Aluminium) with 6-Month Premium Membership": {
        "short_name": "Fitbit Versa 4",
        "brand": "Fitbit"
    },
    "Fitbit Inspire 3 Health & Fitness Tracker (Midnight Zen/Black) with 6-Month Premium Membership": {
        "short_name": "Fitbit Inspire 3",
        "brand": "Fitbit"
    },
    "Garmin Vivoactive 3 GPS Smartwatch, 1.2 inch": {
        "short_name": "Garmin Vivoactive 3",
        "brand": "Garmin"
    },
    "Garmin Forerunner 55, GPS Running Watch with Daily Suggested Workouts, Up to 2 Weeks of Battery Life (Whitestone)": {
        "short_name": "Garmin Forerunner 55",
        "brand": "Garmin"
    },
    "Garmin Forerunner 45, 42MM Easy-to-Use GPS Running Watch with Garmin Coach Free Training Plan Support, Red (No-Cost EMI Available)": {
        "short_name": "Garmin Forerunner 45",
        "brand": "Garmin"
    },
    "Fossil Gen 6 Digital Black Dial Men's Watch-FTW4059": {
        "short_name": "Fossil Gen 6",
        "brand": "Fossil"
    },
    "Fossil Gen 6 Display Wellness Edition Black Smartwatch FTW4069": {
        "short_name": "Fossil Gen 6 Wellness",
        "brand": "Fossil"
    }
}



In [None]:
# Function to apply the mapping to your dataset
def clean_product_info(df, mapping):
    # Create new columns using the mapping
    df['short_name'] = df['Product Name'].map({k: v['short_name'] for k, v in mapping.items()})
    df['brand'] = df['Product Name'].map({k: v['brand'] for k, v in mapping.items()})

    # Handle any products not in mapping
    df['short_name'] = df['short_name'].fillna(df['Product Name'])
    df['brand'] = df['brand'].fillna('Unknown')

    return df
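
Because the mapping is keyed on the full product name, a single character of difference sends a product to the `'Unknown'` brand fallback. A minimal sketch of a check for such names follows; the helper `find_unmapped` is an illustrative assumption, not part of pandas.

```python
def find_unmapped(product_names, mapping):
    """Return the distinct product names with no entry in mapping, in first-seen order."""
    seen, missing = set(), []
    for name in product_names:
        if name not in mapping and name not in seen:
            seen.add(name)
            missing.append(name)
    return missing
```

Calling `find_unmapped(df['Product Name'], product_mapping)` before cleaning should return an empty list when every product is covered.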


In [None]:
# Clean your dataset
df = clean_product_info(df, product_mapping)

# View the results
print("\nSample of cleaned data:")
print(df[['Product Name', 'short_name', 'brand']].head())

Sample of cleaned data:
                                        Product Name               short_name  \
0  Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smar...  Apple Watch SE 2023 GPS   
1  Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smar...  Apple Watch SE 2023 GPS   
2  Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smar...  Apple Watch SE 2023 GPS   
3  Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smar...  Apple Watch SE 2023 GPS   
4  Apple Watch SE (2nd Gen, 2023) [GPS 40mm] Smar...  Apple Watch SE 2023 GPS   

   brand  
0  Apple  
1  Apple  
2  Apple  
3  Apple  
4  Apple  


### Step 3.2: Adding Sentiment Information

Before we dive into advanced analysis, let's add sentiment scores to our reviews:

In [None]:
from textblob import TextBlob

def add_sentiment_info(df):
    # Add sentiment scores
    df['sentiment_score'] = df['Description'].apply(lambda x: TextBlob(str(x)).sentiment.polarity)

    # Categorize sentiments
    df['sentiment_category'] = df['sentiment_score'].apply(lambda x:
        'Positive' if x > 0.1
        else 'Negative' if x < -0.1
        else 'Neutral')

    return df
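
TextBlob polarity ranges from -1 to 1, and the function above treats scores within ±0.1 as neutral. The thresholding can be sketched as a standalone function, which makes the boundary behavior easy to check; the names `categorize_sentiment` and `sentiment_shares` are illustrative assumptions.

```python
from collections import Counter


def categorize_sentiment(score, threshold=0.1):
    """Map a polarity score in [-1, 1] to a label; scores within ±threshold are Neutral."""
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    return "Neutral"


def sentiment_shares(scores, threshold=0.1):
    """Return the percentage of scores falling into each sentiment label."""
    counts = Counter(categorize_sentiment(s, threshold) for s in scores)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()} if total else {}
```

Raising `threshold` widens the neutral band, which is worth trying if nearly every review lands in one category.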


In [None]:
# Add sentiment information
df = add_sentiment_info(df)

# Quick view of sentiment distribution
print("\nSentiment Distribution:")
print(df['sentiment_category'].value_counts() * 100 / len(df))

Sentiment Distribution:
sentiment_category
Positive    96.145455
Neutral      2.836364
Negative     1.018182
Name: count, dtype: float64


### Step 3.3: Basic Product Analysis

Let's start with analyzing individual products:

In [None]:
def analyze_basic_product(reviews, product_name):
    prompt_message = f"""Analyze these reviews for {product_name}:
    1. What are the top 3 most mentioned features?
    2. What do customers love most about this product?
    3. What are the most common complaints?

    Please provide specific examples from the reviews."""


    return analyze_reviews(reviews, api_key, prompt_message)
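
The model only sees the reviews it is given, so repeated rows in the dataset can dominate a small sample. A minimal sketch of removing duplicates before prompting follows. The helper `dedupe_reviews` is an illustrative assumption; it treats reviews as duplicates when they match after lowercasing and whitespace normalization.

```python
def dedupe_reviews(reviews):
    """Drop empty and repeated reviews, keeping the first occurrence of each."""
    seen, unique = set(), []
    for review in reviews:
        # Normalize case and whitespace so trivially different copies compare equal
        key = " ".join(str(review).lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(review)
    return unique
```

Applying it to the list of descriptions before slicing (for example `dedupe_reviews(all_reviews)[:30]`) means the sample holds 30 distinct reviews, as long as at least 30 exist.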


In [None]:
# Example: Analyze a single product
product_reviews = df[df['short_name'] == 'Apple Watch SE 2023 GPS']['Description'].tolist()[:30]

basic_insights = analyze_basic_product(product_reviews, 'Apple Watch SE 2023 GPS')
print(basic_insights)

Based on the provided Apple Watch SE 2023 GPS reviews:

**1. Consumer Sentiment on Battery Life:**

The reviews present a mixed picture regarding battery life.  While some users report  "reliable in terms of performance and battery capacity" and a full charge in 2 hours,  a significant portion express dissatisfaction, describing the battery as "terrible" and calling for improvement from Apple.  There's no consensus; experience seems to vary.

**2. Reviews for Reference:**

Here are some representative reviews categorized by sentiment:

* **Positive:**  "Very good in terms of features and amazing quality. Reliable in terms of performance and battery capacity."  (This review is repeated multiple times, suggesting a common positive viewpoint, although the "reliable battery capacity" part should be considered in the context of the other negative feedback)  Another positive review extensively details the user's positive experience with the watch's features, UI, call quality, and accuracy, n

### Step 3.4: Sentiment-Based Analysis

Now let's analyze reviews based on sentiment:

In [None]:
def analyze_sentiment_patterns(reviews, sentiment_category, product_name):
    prompt_message = f"""Analyze these {sentiment_category.lower()} reviews for {product_name}:
    1. What specific aspects create {sentiment_category.lower()} reactions?
    2. Identify common phrases or descriptions used
    3. Note any comparisons with competitor products
    4. Find unexpected or unique feedback

    Support your analysis with specific examples."""

    return analyze_reviews(reviews, api_key, prompt_message)


In [None]:
# Example: Analyze positive reviews for a product
positive_reviews = df[
    (df['short_name'] == 'Apple Watch SE 2023 GPS') &
    (df['sentiment_category'] == 'Positive')
]['Description'].tolist()[:30]

positive_insights = analyze_sentiment_patterns(
    positive_reviews,
    'Positive',
    'Apple Watch SE 2023 GPS'
)
print(positive_insights)

Analysis of Apple Watch SE 2023 GPS Negative Reviews:

The provided reviews are overwhelmingly focused on a single aspect: **battery life**.

1. **Specific Aspects Creating Negative Reactions:**

The sole source of negativity in these reviews is the perceived poor battery life of the Apple Watch SE 2023 GPS.  There is no mention of other features or aspects of the watch.

2. **Common Phrases or Descriptions:**

The phrase "Battery of this watch is terrible" is repeated verbatim in every review.  The addition of  "I think if apple improve its Battery" is also consistently present, indicating a plea for improvement from Apple rather than complete dissatisfaction with the product.

3. **Comparisons with Competitor Products:**

No comparisons to competitor products are made in these reviews.  The feedback is solely focused on the Apple Watch SE 2023 GPS itself and its shortcomings.

4. **Unexpected or Unique Feedback:**

The most unexpected aspect is the complete lack of diversity in the f

### Step 3.5: Comparative Analysis

Let's compare products within and across brands:

In [None]:
def analyze_comparative_reviews(reviews_dict):
    # Combine all product names for the prompt
    products_list = ", ".join(reviews_dict.keys())

    prompt_message = f"""Compare customer feedback across these products: {products_list}

    For each product, analyze:
    1. Standout features and unique selling points
    2. Common complaints or limitations
    3. Price-to-value perception
    4. Target user preferences

    Then compare:
    1. Key differences in customer satisfaction
    2. Feature preferences across products
    3. Quality and reliability mentions

    Format as a structured comparison with specific examples."""

    # Combine reviews with product labels
    combined_reviews = []
    for product, reviews in reviews_dict.items():
        labeled_reviews = [f"[{product}] {review}" for review in reviews]
        combined_reviews.extend(labeled_reviews[:20])  # Take first 20 reviews per product

    return analyze_reviews(combined_reviews, api_key, prompt_message)
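
The function above takes the first 20 reviews per product, which may not be representative if the file is ordered in some way. A minimal sketch of drawing a reproducible random sample instead follows. The helper `sample_labeled_reviews` and its `seed` parameter are illustrative assumptions.

```python
import random


def sample_labeled_reviews(reviews_dict, per_product=20, seed=0):
    """Randomly pick up to per_product reviews for each product and prefix each with its label."""
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same reviews
    labeled = []
    for product, reviews in reviews_dict.items():
        reviews = list(reviews)
        k = min(per_product, len(reviews))
        for review in rng.sample(reviews, k):
            labeled.append(f"[{product}] {review}")
    return labeled
```

The returned list has the same `[product] review` shape as `combined_reviews`, so it could be passed to `analyze_reviews` in its place.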


In [None]:
# Example: Compare two products
products_to_compare = ['Apple Watch SE 2023 GPS', 'Fitbit Versa 4']
reviews_dict = {}
for product in products_to_compare:
    reviews = df[df['short_name'] == product]['Description'].tolist()[:20]
    reviews_dict[product] = reviews

comparison_insights = analyze_comparative_reviews(reviews_dict)
print(comparison_insights)

## Apple Watch SE 2023 GPS vs. Fitbit Versa 4: A Customer Feedback Comparison

This comparison analyzes customer feedback for the Apple Watch SE 2023 GPS and the Fitbit Versa 4, focusing on key aspects of user experience.

**Apple Watch SE 2023 GPS**

**1. Standout Features & Unique Selling Points:**

* **Seamless Apple Ecosystem Integration:**  The strong integration with iPhones and other Apple devices is a major draw.  Users praise the smooth user interface (WatchOS), ease of use, and excellent call quality.
* **Accuracy of Health Tracking:** Many users highlight the accuracy of fitness tracking metrics (sleep, steps, heart rate) compared to other devices.  The crisp AMOLED display, although lacking Always-On Display, is also frequently mentioned positively.

**2. Common Complaints & Limitations:**

* **Lack of Advanced Health Features:** The absence of features like Always-On Display (AOD), blood oxygen monitoring (SpO2), and ECG is a recurring complaint, especially when compared t

### Step 3.6: Feature-Specific Analysis

Finally, let's analyze specific features across products:

In [None]:
def analyze_feature(reviews, feature_name):
    prompt_message = f"""Analyze mentions of {feature_name} in these reviews:
    1. How do customers describe the {feature_name}?
    2. What specific aspects are praised or criticized?
    3. How does {feature_name} compare across products?
    4. What improvements do customers suggest?

    Include specific quotes and examples."""

    return analyze_reviews(reviews, api_key, prompt_message)


In [None]:
# Example: Analyze battery life mentions
battery_reviews = df[
    df['Description'].str.contains('battery|charge|power', case=False, na=False)
]['Description'].tolist()[:200]

battery_insights = analyze_feature(battery_reviews, "battery life")
print(battery_insights)

Based on the provided reviews, here's an analysis of the Apple Watch battery life:

**1. How do customers describe the battery life?**

Customer descriptions of battery life are highly varied, ranging from overwhelmingly positive to extremely negative:

* **Positive:**  "Reliable in terms of performance and battery capacity," "A days battery life," "satisfactory."  Some users find that a single charge lasts a full day, even with workouts and calls.  One user even states that they only need to charge while commuting to work.

* **Negative:** "Battery of this watch is terrible," "Battery gets drained very fast when a workout mode is on," "You have to charge it daily. If you workout more then charge twice a day," "Poor battery back up. Lasts max up to 24 hrs."  Many users express frustration with needing to charge daily or even multiple times a day depending on usage.  The battery's performance during workouts and calls is specifically cited as a problem.  One reviewer even suggests a sma
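
The filter above matches substrings, so `power` also matches "powerful", while `charge` does not match "charging". A minimal sketch of whole-word keyword matching with explicit word variants follows. The helper names and the keyword list are illustrative assumptions.

```python
import re


def build_keyword_pattern(keywords):
    """Compile a case-insensitive pattern that matches any keyword as a whole word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", flags=re.IGNORECASE)


def mentions_feature(text, pattern):
    """Return True if the text contains at least one of the keywords."""
    return bool(pattern.search(str(text)))


# Assumed keyword list: word variants are spelled out rather than matched by prefix
battery_pattern = build_keyword_pattern(
    ["battery", "batteries", "charge", "charging", "charger", "power"]
)
```

With pandas, the same check could be applied as `df[df['Description'].apply(lambda t: mentions_feature(t, battery_pattern))]`.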

## Conclusion
Now that you know how to use the Gemini API to gather insights, feel free to experiment as much as possible. Remember: The goal is to understand the voice of the customer through their feedback. By developing effective prompts and interpreting the AI's responses, you can uncover valuable insights that drive better decision-making.