# ReadMe

This notebook demonstrates the process of extracting structured data from unstructured Tesla customer reviews using a open-source LLM. 

## Table of Contents

1. [Required Libraries](#required-libraries)
2. [Dataset of Reviews](#dataset-of-reviews)
3. [Utilities for Function Calling](#utilities-for-function-calling)
4. [Build Inference Chain](#build-inference-chain)
5. [Inference on One Example](#inference-on-one-example)
6. [Processing All Reviews](#processing-all-reviews)
7. [Saving Structured Reviews](#saving-structured-reviews)

## Required Libraries

%pip install langchain==0.3.0

%pip install langchain-community

%pip install langchain-core

%pip install pydantic

# 1. Required libraries

In [1]:
import json
import logging
import random
import pandas as pd
from tqdm import tqdm

from enum import Enum
from langchain_community.llms import Ollama
from langchain.tools.render import render_text_description
from langchain_core.tools import tool
from langchain.prompts import PromptTemplate
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel
from typing import List, Optional, Union, Dict, Any
from langchain_core.output_parsers import JsonOutputParser

# 2. Dataset of Reviews

In [2]:
reviews = pd.read_csv("dataset/tesla_customer_reviews.csv")

print("Dataset shape ", reviews.shape)
print("Dataset columns ", reviews.columns)

# shuffle and visualize a few random reviews
reviews_indices = reviews.index.tolist()
selected_ix = random.choices(reviews_indices, k=10)
reviews["review"].loc[selected_ix].tolist()

Dataset shape  (995, 4)
Dataset columns  Index(['manufacturer', 'url', 'review', 'review_struct'], dtype='object')


["Reviewed: 0       on 08/30/18 15:36 PM (PDT)\n1       on 07/04/18 17:15 PM (PDT)\n2       on 06/21/18 05:22 AM (PDT)\n3       on 04/07/18 15:01 PM (PDT)\n4       on 12/30/17 21:45 PM (PST)\n                  ...             \n135     on 06/30/13 17:27 PM (PDT)\n136     on 06/13/13 20:30 PM (PDT)\n137     on 05/20/13 17:43 PM (PDT)\n138     on 04/14/13 15:47 PM (PDT)\n139     on 03/09/13 15:39 PM (PST)\nName: Review_Date, Length: 140, dtype: object\nrating: 0      5.000\n1      4.000\n2      5.000\n3      4.000\n4      5.000\n       ...  \n135    3.375\n136    5.000\n137    4.625\n138    5.000\n139    5.000\nName: Rating, Length: 140, dtype: float64\nvehicle title: 0      2018 Tesla Model X SUV 75D 4dr SUV AWD (electr...\n1      2018 Tesla Model X SUV 100D 4dr SUV AWD (elect...\n2      2017 Tesla Model X SUV 100D 4dr SUV AWD (elect...\n3      2017 Tesla Model X SUV 100D 4dr SUV AWD (elect...\n4      2017 Tesla Model X SUV P100D 4dr SUV AWD (elec...\n                             ...   

# 3. Utilities for function calling

In [3]:
class ModelName(Enum):
    """
    Enumeration of model names for the language model.
    """
    LLAMA3 = "llama3"
    MISTRAL = "mistral:7b-instruct-v0.2-q8_0"
    NEXUSRAVEN = "nexusraven"
    LLAMA31_8B_TOOLS = "interstellarninja/llama3.1-8b-tools"
    LLAMA3_8B_INSTRUCT_FUNCTION_CALLING = "smangrul/llama-3-8b-instruct-function-calling"
    LLAMA3_GROQ_TOOLUSE = "llama3-groq-tool-use"
    LLAMA318B_LATEST = "llama3.1:latest"

llm = Ollama(model = ModelName.LLAMA3.value, temperature=0, num_predict=600)


class RobustJsonOutputParser(JsonOutputParser):
    """
    A robust JSON output parser that extends the JsonOutputParser class.
    This parser attempts to handle JSON decoding errors gracefully by extracting
    a JSON-like structure from the text if the initial parsing fails.
    """
    def parse(self, text: str) -> Dict[str, Any]:
        try:
            return super().parse(text)
        except json.JSONDecodeError as e:
            logging.error(f"Failed to parse JSON: {e}")
            logging.debug(f"Problematic text: {text}")
            # Attempt to extract a JSON-like structure from the text
            start = text.find('{')
            end = text.rfind('}')
            if start != -1 and end != -1:
                potential_json = text[start:end+1]
                try:
                    return json.loads(potential_json)
                except json.JSONDecodeError:
                    pass
            # If extraction fails, return a default structure
            return {"name": "ReviewStruct", "arguments": {}}


def safe_select_arguments(response: dict) -> dict:
    """
    Select and execute the appropriate function based on the response.
    The error handling for:
        a. Invalid tool names or missing arguments (KeyError).
        b. Invalid argument types (TypeError).
    In both error cases, it falls back to creating a minimal ReviewStruct with an error message.
    Args:
        response (dict): A dictionary containing the function name and arguments.
    Returns:
        dict: The result of executing the specified function with the given arguments.
    """
    try:
        return globals()[response["name"]](response["arguments"])
    except KeyError as e:
        logging.error(f"Invalid tool name or missing arguments: {e}")
        # Fallback to default ReviewStruct with minimal information
        return ReviewStruct(
            date_of_review=None,
            vehicle_model=None,
            customer_rating=None,
            review_summary="Unable to parse review due to invalid LLM output")
    except TypeError as e:
        logging.error(f"Invalid argument types: {e}")
        # Another fallback option
        return ReviewStruct(
            date_of_review=None,
            vehicle_model=None,
            customer_rating=None,
            review_summary="Unable to parse review due to invalid argument types")


@tool
def ReviewStruct(date: Optional[str], 
            vehicleModel: Optional[str], 
            customerRating: Optional[Union[str, int]],  # Allow customer_rating to be either str or int
            reviewSummary: str,
            technicalPros: Optional[List[str]],
            otherPros: Optional[List[str]],
            technicalIssues: Optional[List[str]],
            otherIssues: Optional[List[str]],
            serviceExperience: Optional[str],
            overallExperience: Optional[str]) -> dict:
    """Parsing customer review information.
    
    - date: the date the review was written or posted. The date must be in the format: year-month-day. If not available leave blank. 
    - vehicleModel: name of the model being reviewed. If not available leave blank
    - customerRating: rating of the vehicle_model provided in the review.
                       If no rating is provided in the review, estimate the rating based on the content of the review. The estimated rating is a value between 1 and 5 where 1 is exceptional and 5 is exceptional. 
    - reviewSummary: summary of the review
    - technicalPros: A list of specific positive feedbacks related to the car (optional). 
                    Example: ["Smooth ride", "Great fuel efficiency"]. If no technical pros found, leave blank.
    - otherPros: A list of specific non-technical positive feedbacks related to service.
                If no non technical pros found, leave blank.
    - technicalIssues: A list of technical issues. Each entry should clearly identify a specific issue related to the working of the car.
                     Avoid vague statements and ensure that each entry is concise and focuses on a particular problem. 
                     If none found, leave blank.
    - otherIssues: A list of non-technical issues. 
                    Each entry should clearly identify a specific issue that is not related to the working of the car.
                    If none found, leave blank.
    - serviceExperience: service related experience of the customer. The value can be any of the following: 
                        exceptional|good|average|poor|terrible.
    - overallExperience: overall experience of the customer. The value can be any of the following: 
                        exceptional|good|average|poor|terrible.   
    """
    # Default empty lists if None
    if technicalPros is None:
        technicalPros = []
    if otherPros is None:
        otherPros = []
    if technicalIssues is None:
        technicalIssues = []
    if otherIssues is None:
        otherIssues = []

    return {"date_of_review": date, 
            "vehicle_model": vehicleModel, 
            "customer_rating": customerRating, 
            "review_summary": reviewSummary,
            "technical_pros": technicalPros,
            "other_pros": otherPros,
            "technical_issues": technicalIssues,
            "other_issues": otherIssues,
            "service_experience": serviceExperience,
            "overall_experience": overallExperience}

# 4. Build inference chain

In [4]:
#####################################################
# create a string representation of the tool, which is 
# similar to how functions are described in standard function calling.
#####################################################
reviewstruct_tool_as_str = render_text_description([ReviewStruct])


#####################################################
# Build system prompt and create chat_prompt
#####################################################
system_prompt = f"""You are an assistant that has access to the following tools. Here are the name and description of the tool:

{reviewstruct_tool_as_str}

Given the user input, return the name and input of the tool to use. Return your response as a JSON blob with 'name' and 'arguments' keys. 
Also make sure to return arguments value as dictionary. Do not return any other text data."""

prompt = ChatPromptTemplate.from_messages([("system", system_prompt), ("user", "{input}")])


#####################################################
# Create the LLM chain
#####################################################
robust_parser = RobustJsonOutputParser()

chain = prompt | llm | robust_parser | safe_select_arguments

# 5. Inference on 1 example

In [5]:
# select_ix = random.choice(reviews_indices)  <-- uncomment if you wan tto try random reviews
select_ix = 100

unstructured_review = reviews.loc[select_ix]["review"]
print(f"Raw customer review ({select_ix}) \n\n{unstructured_review}\n\n")

struct_review = chain.invoke(unstructured_review)

print(json.dumps(struct_review, indent=2))

Raw customer review (100) 



{
  "date_of_review": "2023-09-08",
  "vehicle_model": "",
  "customer_rating": "average",
  "review_summary": "Summary after three years of Model S Performance ownership. Car is good, software temperamental but tolerable, gimmicks are fun.",
  "technical_pros": [
    "Smooth ride",
    "Great fuel efficiency"
  ],
  "other_pros": [],
  "technical_issues": [
    "software temperamental"
  ],
  "other_issues": [
    "dire customer service",
    "unreliable and dishonest virtual assistant",
    "feeling fobbed off"
  ],
  "service_experience": "poor",
  "overall_experience": "terrible"
}


  return globals()[response["name"]](response["arguments"])


# 6. Processing all the reviews

In [13]:
def generate_struct_review(r):
    # we will save both valid LLM json output and not valid LLM json output
    r["valid_struct"] = False
    raw_review = r["review"]
    try:
        result = chain.invoke(raw_review)
        r["review_struct"] = result
        r["valid_struct"] = True
    except Exception as e:
        r["review_struct"] = e

    return r


# get pandas to use tqdm progress
tqdm.pandas()

reviews = reviews.progress_apply(lambda r: generate_struct_review(r), axis=1)

100%|█████████████████████████████████████████| 995/995 [23:59<00:00,  1.45s/it]


# 7. Structured Reviews

In [14]:
# let's save the structrued reviews:
reviews.to_csv("dataset/tesla_customer_review_with_structured_data.csv", index=False)

print("Number of failed/success ")
reviews["valid_struct"].value_counts()

Number of failed/success 


True     801
False    194
Name: valid_struct, dtype: int64

In [15]:
reviews.head()

Unnamed: 0,manufacturer,url,review,review_struct,valid_struct
0,tesla,https://www.consumeraffairs.com/automotive/tes...,"Reviewed Oct. 26, 2022\nWe put a deposit down ...","{'date_of_review': 'Oct. 26, 2022', 'vehicle_m...",True
1,tesla,https://www.consumeraffairs.com/automotive/tes...,"Reviewed Dec. 20, 2019\nI ordered a model 3 SR...","{'date_of_review': '2019-12-20', 'vehicle_mode...",True
2,tesla,https://www.trustpilot.com/review/tesla.com?pa...,Best car i have ever ownedBest car i have ever...,"{'date_of_review': '2024-03-06', 'vehicle_mode...",True
3,tesla,https://www.consumeraffairs.com/automotive/tes...,"Reviewed Aug. 15, 2022\nHorrible customer serv...","{'date_of_review': '2022-08-15', 'vehicle_mode...",True
4,tesla,https://www.consumeraffairs.com/automotive/tes...,"Reviewed March 29, 2022\nI bought my 2013 Mode...","{'date_of_review': '2022-03-29', 'vehicle_mode...",True
