# Name: Prateek Majumder

### **Exercise: Sentiment Analysis and Key Insights Extraction from Ford Car Reviews**

### **Problem Statement:**
You have been provided with a dataset containing Ford car reviews. Use LangChain and the concepts you’ve learned to complete the following tasks:

1. **Sentiment Analysis**: Analyze the sentiment of each review, categorize it as positive, neutral, or negative, and store the result.
2. **Key Insights Extraction**: Extract key pieces of information from each review, such as the pros and cons mentioned, and the specific features the reviewer liked or disliked (e.g., vehicle performance, comfort, price).

You will build a LangChain-based solution that leverages language models to automatically extract this information and provide a structured summary of the reviews.

---
### **Steps to Solve:**

#### **Step 1: Load the Dataset**
- The dataset file is named `ford_car_reviews.csv` and is sourced from Kaggle: [Edmunds Consumer Car Ratings and Reviews](https://www.kaggle.com/datasets/ankkur13/edmundsconsumer-car-ratings-and-reviews).
- For this exercise, **limit the data to the first 25 records**. This can be achieved by using `df.head(25)` or `df.iloc[:25]` when loading the data into a DataFrame.

#### **Step 2: Define the Sentiment Analysis Task**
- Use LangChain to create a pipeline to classify the sentiment of each review.
- Define prompts that can guide the model to evaluate the sentiment. For example:
  - "Given the following car review, classify the sentiment as positive, neutral, or negative."

#### **Step 3: Key Insights Extraction**
- Use LangChain to create a pipeline to extract pros, cons, and notable features from each review. Define prompts such as:
  - "What are the pros and cons of the vehicle described in the following review?"
  - "What specific features of the vehicle does the reviewer like or dislike?"

#### **Step 4: Update the DataFrame with New Information**
- Run the pipeline for each review and collect the sentiment and insights.
- Once the analysis and extraction are complete, update the original DataFrame with additional columns to include:
  - Sentiment (positive, neutral, negative)
  - Pros
  - Cons
  - Liked_Features
  - Disliked_Features

---

### **Example Output:**

```json
{
  "Review_Date": "03/07/13",
  "Vehicle_Title": "2006 Ford Mustang Coupe",
  "Review_Text": "With the expected arrival of our 6th child...",
  "Rating": 4.125,
  "Sentiment": "Positive",
  "Pros": "Good driving experience, Large seating capacity, Great options",
  "Cons": "None mentioned",
  "Liked_Features": ["Driving experience", "Seating capacity", "Options available"],
  "Disliked_Features": []
}
```

In [3]:
%pip install langchain-community==0.3.0 langgraph==0.2.22 langchain-groq==0.2.0 python-dotenv


In [7]:
import pandas as pd
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Load the dataset (limit to the first 25 records)
df = pd.read_csv('/content/ford_car_reviews.csv', nrows=25)


In [5]:
import os, json, re, getpass
from dotenv import load_dotenv

load_dotenv(".env", override=True)
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("GROQ API Key: ")

GROQ API Key: ··········


In [6]:
# LLM
from langchain_groq import ChatGroq

# Other Groq model IDs considered: llama-3.1-8b-instant, llama3-groq-8b-8192-tool-use-preview, llama3-groq-70b-8192-tool-use-preview
model_id = "llama3-8b-8192"
llm = ChatGroq(model_name=model_id, temperature=0)

In [8]:
# Sentiment Analysis Chain
sentiment_template = """Given the following car review, classify the sentiment as positive, neutral, or negative in a single word:

{review}
"""
sentiment_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that analyzes text."),
        ("human", sentiment_template),
    ]
)
sentiment_chain = sentiment_prompt | llm | StrOutputParser()
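
The prompt asks for a single word, but a chat model may still reply with punctuation or a full sentence. A minimal sketch of a post-processing step follows. The helper `normalize_sentiment` and its `"Unknown"` fallback are hypothetical names introduced here, not part of the original notebook.

```python
import re

# Matches any of the three labels requested in the prompt, as a whole word.
LABEL_PATTERN = re.compile(r"\b(positive|neutral|negative)\b", flags=re.IGNORECASE)


def normalize_sentiment(raw, default="Unknown"):
    """Map free-form model output onto one of the three expected labels."""
    # Collect the distinct labels mentioned in the reply, ignoring case.
    found = {match.lower() for match in LABEL_PATTERN.findall(raw)}
    # Accept the reply only when it names exactly one label.
    if len(found) == 1:
        return found.pop().capitalize()
    return default


print(normalize_sentiment("Positive."))
print(normalize_sentiment("The sentiment is **negative**"))
print(normalize_sentiment("positive or negative, hard to say"))
```

Under these assumptions, the helper could be applied to the string returned by `sentiment_chain.invoke(...)` before the value is stored in the DataFrame.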


In [9]:
# Insight Extraction Chain
insights_template = """What are the pros and cons of the vehicle described in the following review?
What specific features of the vehicle does the reviewer like or dislike?  Provide your answer as a JSON object with keys "pros", "cons", "liked_features", and "disliked_features".

Review:
{review}
"""

insights_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that analyzes text and provides insights in JSON format."),
        ("human", insights_template),
    ]
)
insights_chain = insights_prompt | llm | JsonOutputParser()
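
`JsonOutputParser` returns whatever JSON the model produced, so keys can be missing and a value may come back as a string rather than a list. The sketch below shows one way to coerce the parsed result into a uniform shape. The helper `coerce_insights` is hypothetical, and splitting strings on commas is a heuristic assumption that may not suit every reply.

```python
EXPECTED_KEYS = ("pros", "cons", "liked_features", "disliked_features")


def coerce_insights(parsed):
    """Return a dict with exactly the four expected keys, each a list of strings."""
    if not isinstance(parsed, dict):
        parsed = {}
    cleaned = {}
    for key in EXPECTED_KEYS:
        value = parsed.get(key)
        if value is None:
            items = []
        elif isinstance(value, str):
            # Heuristic: treat a comma-separated string as several items.
            items = [part.strip() for part in value.split(",")]
        elif isinstance(value, (list, tuple)):
            items = [str(item).strip() for item in value]
        else:
            items = [str(value).strip()]
        # Drop empty entries left over after stripping.
        cleaned[key] = [item for item in items if item]
    return cleaned


example = {"pros": "Good mileage, Quiet cabin", "cons": None, "liked_features": ["Engine"]}
print(coerce_insights(example))
```

Unexpected keys are discarded, so every row ends up with the same four columns.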


In [10]:
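def get_sentiment(review):
    """Return the model's sentiment label for a single review."""
    return sentiment_chain.invoke({"review": review})


def get_insights(review):
    """Return the parsed insights dict, or empty defaults if the call or JSON parsing fails."""
    try:
        return insights_chain.invoke({"review": review})
    except Exception as e:
        # Fall back to empty lists so every row has the same keys and value types.
        print(f"Error processing review: {e}")
        return {"pros": [], "cons": [], "liked_features": [], "disliked_features": []}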
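
Hosted model APIs can fail transiently (for example, on rate limits), in which case falling back to empty values on the first error loses data. The following sketch retries a call with exponential backoff before giving up. `call_with_retries` and its parameters are hypothetical names for illustration. LangChain runnables also expose a `with_retry` method that may cover the same need.

```python
import time


def call_with_retries(func, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func(*args)
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** attempt)


# Hypothetical usage with the chain defined earlier:
# insights = call_with_retries(insights_chain.invoke, {"review": review})
```

The `sleep` parameter exists only so the delay can be replaced with a stub when testing.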


In [11]:
# Apply functions to DataFrame
df["Sentiment"] = df["Review"].apply(get_sentiment)
insights_df = df["Review"].apply(get_insights).apply(pd.Series)

df = pd.concat([df, insights_df], axis=1)
df.rename(columns={
    "pros": "Pros", "cons": "Cons",
    "liked_features": "Liked_Features", "disliked_features": "Disliked_Features"
}, inplace=True)
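
An alternative way to attach the insight columns is to build one DataFrame from the list of result dictionaries and join it on the original index, rather than expanding each row with `apply(pd.Series)`. This is a sketch on toy data. The two sample dictionaries stand in for real chain output and are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the reviews DataFrame, with a non-default index.
reviews = pd.DataFrame({"Review": ["Fast and fun", "Too noisy"]}, index=[10, 11])

# Stand-ins for the per-review dictionaries the insights chain would return.
insights = [
    {"pros": ["fast"], "cons": [], "liked_features": ["engine"], "disliked_features": []},
    {"pros": [], "cons": ["noisy"], "liked_features": [], "disliked_features": ["cabin noise"]},
]

column_names = {
    "pros": "Pros",
    "cons": "Cons",
    "liked_features": "Liked_Features",
    "disliked_features": "Disliked_Features",
}

# Build all insight columns at once, aligned to the reviews' index.
insights_df = pd.DataFrame(insights, index=reviews.index).rename(columns=column_names)
combined = reviews.join(insights_df)

print(combined.columns.tolist())
```

Passing `index=reviews.index` keeps the rows aligned even when the index is not 0..n-1.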

In [12]:
df.head()

| Unnamed: 0.1 | Unnamed: 0 | Review_Date | Author_Name | Vehicle_Title | Review_Title | Review | Rating | Sentiment | Pros | Cons | Liked_Features | Disliked_Features |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | on 06/06/18 14:19 PM (PDT) | Vicki | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | 2006 Mustang GT | Doesn’t disappoint | 5.0 | Positive | [] | [] | [] | [] |
| 1 | 1 | on 08/12/17 06:06 AM (PDT) | Tom | 2006 Ford Mustang Coupe V6 Standard 2dr Coupe ... | DREAM CAR | I bought mine 4/17 with 98K. Have been wantin... | 3.0 | Neutral | [Great mileage, Good power, Smokin' hot lookin... | [Orneriest transmission I've ever used, Harsh ... | [Engine, Appearance] | [Transmission, Ride quality, Noise level] |
| 2 | 2 | on 06/15/17 05:43 AM (PDT) | Ray | 2006 Ford Mustang Coupe V6 Premium 2dr Coupe (... | Great Ride | There will always be a 05-09 mustang for sale... | 5.0 | Positive | [fairly reasonable, great investment] | [] | [price, investment potential] | [] |
| 3 | 3 | on 05/18/17 17:33 PM (PDT) | Don Watson | 2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4... | I have wanted a Mustang for 40 years. | I bought my car from an auction I work at ( A... | 5.0 | Positive | [love it, beast will smoke any G6 or v6 camaro... | [] | [v6 engine, air aid cold air injector, throttl... | [] |
| 4 | 4 | on 01/03/16 18:03 PM (PST) | One owner | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | One owner | I bought this car spankin new and i still am ... | 5.0 | Positive | [hugs the road, does whatever you ask at a mom... | [alternator had to be replaced, repairs like t... | [handling, performance, reliability] | [alternator, maintenance requirements] |


**Results:**

In [14]:
# Export the DataFrame to an Excel file
df.to_excel('ford_car_reviews_analysis.xlsx', index=False)
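
The insight columns hold Python lists, which are not a native spreadsheet cell type. One option before exporting is to flatten them into delimited strings so they read cleanly in Excel. The helper `flatten_cell` and the `"; "` separator are hypothetical choices for this sketch. Writing `.xlsx` files with pandas also requires an Excel engine such as `openpyxl` to be installed.

```python
def flatten_cell(value, sep="; "):
    """Join list-like cells into one string; leave other values untouched."""
    if isinstance(value, (list, tuple)):
        return sep.join(str(item) for item in value)
    return value


# Toy row shaped like the notebook's results.
row = {"Rating": 4.5, "Pros": ["handles well", "turns heads"], "Cons": []}
flat_row = {key: flatten_cell(value) for key, value in row.items()}
print(flat_row)

# Hypothetical usage on the real DataFrame before calling to_excel:
# for col in ["Pros", "Cons", "Liked_Features", "Disliked_Features"]:
#     df[col] = df[col].apply(flatten_cell)
```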

In [21]:

val = df.iloc[10]

# Convert the row to a dictionary
val_dict = val.to_dict()

# Print the dictionary
val_dict

{'Unnamed: 0': 10,
 'Review_Date': ' on 10/29/10 00:00 AM (PDT)',
 'Author_Name': 'Stu ',
 'Vehicle_Title': '2006 Ford Mustang Coupe V6 Premium 2dr Coupe (4.0L 6cyl 5M)',
 'Review_Title': 'Fun stang',
 'Review': " Only had car for one week and this car put the fun back into driving.Even though it's a V6 it still get up and goes.I love the way it handles and turns heads.",
 'Rating': 4.5,
 'Sentiment': 'Positive',
 'Pros': ['puts the fun back into driving', 'handles well', 'turns heads'],
 'Cons': [],
 'Liked_Features': ['way it handles', 'acceleration'],
 'Disliked_Features': []}
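
The raw `Review_Date` values look like `' on 10/29/10 00:00 AM (PDT)'`, whereas the example output in the exercise shows a bare `MM/DD/YY` date. The sketch below extracts only the date part. It assumes the `MM/DD/YY` order suggested by values such as `06/15/17`, and it ignores the time because entries like `14:19 PM` are not valid 12-hour times. `extract_review_date` is a hypothetical helper.

```python
import re
from datetime import datetime

# First MM/DD/YY-shaped token in the string (assumed date format).
DATE_PATTERN = re.compile(r"\b(\d{2}/\d{2}/\d{2})\b")


def extract_review_date(raw):
    """Return the review date as a datetime.date, or None if no valid date is found."""
    match = DATE_PATTERN.search(raw)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%m/%d/%y").date()
    except ValueError:
        # The token looked like a date but is not a real calendar day.
        return None


print(extract_review_date(" on 10/29/10 00:00 AM (PDT)"))  # 2010-10-29
```

Calling `.strftime("%m/%d/%y")` on the result would give the compact form used in the example output.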