### Problem Statement
Create a simple Python script that uses a pre-trained language model (e.g., GPT-3, GPT-4) to perform sentiment analysis on a set of product reviews. The script should read the reviews from a text file, analyze the sentiment of each review (positive, negative, neutral), and output the results to another text file.

In [46]:
import os
import dotenv

In [52]:
from dotenv import load_dotenv
load_dotenv()

hf_auth_token = os.getenv("HF_AUTH_TOKEN")
if not hf_auth_token:
    raise ValueError("HF_AUTH_TOKEN is not set in the environment variables")

In [56]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-classification", model="meta-llama/Meta-Llama-3-8B", use_auth_token=hf_auth_token)

def analyze_sentiment(rating, review):
    # Call the pipeline object defined above; the f-string interpolates rating and review
    response = pipe(f"Rating: {rating}\n\nReview: {review}",
                    candidate_labels=["Positive", "Negative", "Neutral"])

    return response

OSError: meta-llama/Meta-Llama-3-8B is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

In [11]:
import pandas as pd

In [25]:
df = pd.read_table("data/amazon_alexa.tsv")
df.head()

Unnamed: 0,rating,date,variation,verified_reviews,feedback
0,5,31-Jul-18,Charcoal Fabric,Love my Echo!,1
1,5,31-Jul-18,Charcoal Fabric,Loved it!,1
2,4,31-Jul-18,Walnut Finish,"Sometimes while playing a game, you can answer...",1
3,5,31-Jul-18,Charcoal Fabric,I have had a lot of fun with this thing. My 4 ...,1
4,5,31-Jul-18,Charcoal Fabric,Music,1


In [16]:
df['rating'].value_counts()

rating
5    2286
4     455
1     161
3     152
2      96
Name: count, dtype: int64

In [17]:
df['feedback'].value_counts()

feedback
1    2893
0     257
Name: count, dtype: int64

In [18]:
df.isnull().sum()

rating              0
date                0
variation           0
verified_reviews    1
feedback            0
dtype: int64

In [19]:
# remove null values
df.dropna(inplace=True)

In [20]:
df.isnull().sum()

rating              0
date                0
variation           0
verified_reviews    0
feedback            0
dtype: int64

In [22]:
df.shape

(3149, 5)

Each request to the model will include the following fields from the dataset:
1. rating
2. review (the `verified_reviews` column)
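
A minimal sketch of how these two fields might be combined into a single prompt string. The helper name `build_prompt` is an assumption made for illustration and does not appear elsewhere in this notebook; the wording mirrors the prompt used in the later `analyze_sentiment` cell.

```python
def build_prompt(rating, review):
    """Combine the star rating and the review text into one prompt string."""
    return (
        "Analyze the sentiment of the following review and classify it as "
        "'Positive', 'Negative', or 'Neutral':\n\n"
        f"Rating: {rating}\nReview: {review}\n\nSentiment:"
    )

# Example with the first row of the dataset
prompt = build_prompt(5, "Love my Echo!")
print(prompt)
```

Building the prompt in a separate function makes it easy to check that the values are actually interpolated before any API call is made.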

In [23]:
df.head()

Unnamed: 0,rating,date,variation,verified_reviews,feedback
0,5,31-Jul-18,Charcoal Fabric,Love my Echo!,1
1,5,31-Jul-18,Charcoal Fabric,Loved it!,1
2,4,31-Jul-18,Walnut Finish,"Sometimes while playing a game, you can answer...",1
3,5,31-Jul-18,Charcoal Fabric,I have had a lot of fun with this thing. My 4 ...,1
4,5,31-Jul-18,Charcoal Fabric,Music,1


In [27]:
df = df[['rating', 'verified_reviews']]
df.head()

Unnamed: 0,rating,verified_reviews
0,5,Love my Echo!
1,5,Loved it!
2,4,"Sometimes while playing a game, you can answer..."
3,5,I have had a lot of fun with this thing. My 4 ...
4,5,Music


In [36]:
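def read_reviews(input_file):
    """Load the reviews file and keep only the rating and review text columns."""
    try:
        df = pd.read_table(input_file)  # read_table() for .tsv files, read_csv() for .csv files
        df.dropna(inplace=True)  # drop rows with missing values
        df = df[['rating', 'verified_reviews']]
        return df

    except Exception as e:
        print(f"Error reading input file: {e}")
        # Return an empty DataFrame so callers always receive the same type
        return pd.DataFrame(columns=['rating', 'verified_reviews'])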

In [41]:
from openai import OpenAI

def analyze_sentiment(rating, review):
    try:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                # f-string so the rating and review are interpolated into the prompt
                {"role": "user", "content": f"Analyze the sentiment of the following review and classify it as 'Positive', 'Negative', or 'Neutral':\n\nRating: {rating}\nReview: {review}\n\nSentiment:"}
            ]
        )

        sentiment = response.choices[0].message.content

        return sentiment

    except Exception as e:
        print(f"Error analyzing sentiment: {e}")
        return "Error"
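
The chat model replies with free text, so the returned string is not guaranteed to be exactly one of the three labels. Below is a rough sketch of one way to map a reply onto a fixed label. The names `LABELS` and `normalize_label` and the `"Unknown"` fallback are hypothetical, and the substring match is a simplification (a reply such as "not positive" would still map to `Positive`).

```python
LABELS = ("Positive", "Negative", "Neutral")

def normalize_label(reply):
    """Map a free-text model reply onto one of LABELS, or 'Unknown'."""
    text = reply.strip().lower()
    # Prefer a reply that starts with a label, e.g. "Positive."
    for label in LABELS:
        if text.startswith(label.lower()):
            return label
    # Otherwise accept a label mentioned anywhere in the reply
    for label in LABELS:
        if label.lower() in text:
            return label
    return "Unknown"

print(normalize_label(" positive."))
print(normalize_label("The sentiment is Negative"))
print(normalize_label("Error"))
```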

In [37]:
test_df = read_reviews("data/amazon_alexa.tsv")

In [38]:
test_df.head()

Unnamed: 0,rating,verified_reviews
0,5,Love my Echo!
1,5,Loved it!
2,4,"Sometimes while playing a game, you can answer..."
3,5,I have had a lot of fun with this thing. My 4 ...
4,5,Music


In [42]:
for row in test_df.head().itertuples():
    rating = row.rating
    review = row.verified_reviews
    sentiment = analyze_sentiment(rating, review)
    print(f"Rating: {rating}, Review: {review}, Sentiment: {sentiment}")
    break

Error analyzing sentiment: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
Rating: 5, Review: Love my Echo!, Sentiment: Error


In [7]:
def write_sentiments(output_file, sentiments):
    try:
        with open(output_file, 'w') as file:
            for sentiment in sentiments:
                file.write(f"{sentiment}\n")
    except Exception as e:
        print(f"Error writing to output file: {e}")

In [8]:
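def main(input_file, output_file):
    reviews = read_reviews(input_file)
    sentiments = []

    # Iterate over rows, not column names, and only when the file was read successfully
    if len(reviews) > 0:
        for row in reviews.itertuples():
            sentiment = analyze_sentiment(row.rating, row.verified_reviews)
            sentiments.append(f"Review: {row.verified_reviews}\nSentiment: {sentiment}\n")

    write_sentiments(output_file, sentiments)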

In [10]:
input_file = 'input_reviews.txt'  # Replace with your input file path
output_file = 'output_sentiments.txt'  # Replace with your output file path

main(input_file, output_file)

Error reading input file: [Errno 2] No such file or directory: 'input_reviews.txt'
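
Because `read_reviews` catches the exception and prints it, `main` carries on after a missing file and still writes an empty output file. One possible alternative, sketched here under the assumption that failing fast is preferred, is to check the path before starting. The helper name `require_file` is hypothetical.

```python
from pathlib import Path

def require_file(path):
    """Return the path as a Path if it is an existing file, else raise FileNotFoundError."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Input file not found: {p}")
    return p

# Example: report a missing input file before any API calls are made
try:
    require_file("input_reviews.txt")
except FileNotFoundError as e:
    print(e)
```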
