**What is JSON?**
JavaScript Object Notation (JSON) is a standard text-based format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web applications (e.g., sending some data from the server to the client, so it can be displayed on a web page, or vice versa).
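
As a minimal illustration, the sketch below round-trips a small record through Python's standard `json` module. The field names mirror the review data used later in this notebook; the values are made up for the example.

```python
import json

# A made-up record; the field names mirror the review dataset used later.
record = {"reviewText": "Works well.", "overall": 5.0, "helpful": [2, 3]}

# Serialize the Python dict to a JSON string.
encoded = json.dumps(record)

# Parse the JSON string back into a Python dict.
decoded = json.loads(encoded)

print(encoded)
print(decoded == record)
```

JSON objects map to Python dicts, arrays to lists, and numbers to `int` or `float`, which is why the parsed reviews can be handled as ordinary dictionaries.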

In [None]:
import json
import string


Read and parse the JavaScript Object Notation (JSON) data into a suitable
Python data structure.

In [None]:


# Read a JSON Lines file (one JSON object per line) into a dictionary.
def read_json_file(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as json_file:
            json_data_dict = {}  # valid JSON objects read from the file, keyed by identifier
            for line_number, line in enumerate(json_file, start=1):
                try:
                    json_object = json.loads(line)
                except json.JSONDecodeError:
                    # Blank or malformed line: skip it.
                    continue
                if not isinstance(json_object, dict):
                    # Valid JSON but not an object (e.g. a bare number): skip it.
                    continue
                # Use the object's "unique_identifier" field as the key if it exists;
                # otherwise fall back to the line number (as a string).
                identifier = json_object.get("unique_identifier", str(line_number))
                json_data_dict[identifier] = json_object
            return json_data_dict
    except FileNotFoundError:
        print(f"Error: File '{file_path}' not found.")
        return None

# Path to the dataset; adjust it to wherever the JSON file is stored.
json_data_dict = read_json_file('/content/Cell_Phones_and_Accessories_5.json')

# 'json_data_dict' now holds only the lines that parsed as JSON objects;
# blank and malformed lines were skipped.
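
To make the skipping behaviour concrete, here is a small self-contained sketch of the same line-by-line idea applied to an in-memory sample instead of the real file. The helper name `parse_json_lines` and the sample lines are hypothetical, introduced only for this illustration.

```python
import io
import json

def parse_json_lines(lines):
    """Parse an iterable of text lines, keeping only lines that hold a JSON object."""
    records = {}
    for line_number, line in enumerate(lines, start=1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # blank or malformed line
        if isinstance(obj, dict):
            # Key each record by its line number, as a string.
            records[str(line_number)] = obj
    return records

# Hypothetical sample: two valid objects, one malformed line, one blank line.
sample = io.StringIO(
    '{"reviewText": "Great case", "overall": 5.0}\n'
    'this line is not JSON\n'
    '\n'
    '{"reviewText": "Stopped charging", "overall": 1.0}\n'
)

records = parse_json_lines(sample)
print(sorted(records))  # line numbers of the lines that parsed
```

Because keys come from line numbers, skipped lines leave gaps in the key sequence rather than shifting later records.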


**Exploring the Dataset**


In [None]:

def explore_and_filter_dataset(json_data_dict, sample_size=30):
    if not json_data_dict:
        print("Error: No data to explore.")
        return None, None

    print("Dataset Size:", len(json_data_dict))

    # Print the column names of the first entry (assumed to be representative)
    first_entry = next(iter(json_data_dict.values()), {})
    column_names = list(first_entry.keys())
    print("Columns in the Dataset:", column_names)

    necessary_columns = ['reviewText', 'overall']

    # Keep only the necessary columns; entries missing any of them are dropped
    # so that later steps never hit a KeyError.
    filtered_data_dict = {
        key: {col: entry[col] for col in necessary_columns}
        for key, entry in json_data_dict.items()
        if all(col in entry for col in necessary_columns)
    }

    # Print the first 'sample_size' entries (30 by default)
    print(f"\nSample of {sample_size} Entries:")
    for key, entry in list(filtered_data_dict.items())[:sample_size]:
        print(f"Key: {key}, Data: {entry}")

    return filtered_data_dict, necessary_columns

# Explore and filter the dataset
filtered_data_dict, necessary_columns = explore_and_filter_dataset(json_data_dict)


Dataset Size: 194439
Columns in the Dataset: ['reviewerID', 'asin', 'reviewerName', 'helpful', 'reviewText', 'overall', 'summary', 'unixReviewTime', 'reviewTime']

Sample of 30 Entries:
Key: 1, Data: {'reviewText': "They look good and stick good! I just don't like the rounded shape because I was always bumping it and Siri kept popping up and it was irritating. I just won't buy a product like this again", 'overall': 4.0}
Key: 2, Data: {'reviewText': 'These stickers work like the review says they do. They stick on great and they stay on the phone. They are super stylish and I can share them with my sister. :)', 'overall': 5.0}
Key: 3, Data: {'reviewText': 'These are awesome and make my phone look so stylish! I have only used one so far and have had it on for almost a year! CAN YOU BELIEVE THAT! ONE YEAR!! Great quality!', 'overall': 5.0}
Key: 4, Data: {'reviewText': "Item arrived in great time and was in perfect condition. However, I ordered these buttons because they were a great deal a

**Text Processing**


In [None]:

# Stop words, listed manually; stored as a set so membership tests are fast
stopwords = set("""
i me my myself we our ours ourselves you your yours yourself yourselves
he him his himself she her hers herself it its itself they them their theirs
themselves what which who whom this that these those am is are was were be
been being have has had having do does did doing a an the and but if or because
as until while of at by for with about against between into through during
before after above below to from up down in out on off over under again further
then once here there when where why how all any both each few more most other
some such no nor not only own same so than too very s t can will just don should now
""".split())

# Build the translation table once: it deletes every ASCII punctuation character.
PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)

# Preprocess the 'reviewText' column for thematic analysis:
# strip punctuation and convert to lowercase.
def preprocess_text_thematic(text):
    return text.translate(PUNCTUATION_TABLE).lower()


for key, entry in filtered_data_dict.items():
    entry['reviewText'] = preprocess_text_thematic(entry['reviewText'])

# Print a sample of entries after text preprocessing
print("\nSample of Entries After Text Preprocessing:")
for key, entry in list(filtered_data_dict.items())[:30]:
    print(f"Key: {key}, Data: {entry}")



Sample of Entries After Text Preprocessing:
Key: 1, Data: {'reviewText': 'they look good and stick good i just dont like the rounded shape because i was always bumping it and siri kept popping up and it was irritating i just wont buy a product like this again', 'overall': 4.0}
Key: 2, Data: {'reviewText': 'these stickers work like the review says they do they stick on great and they stay on the phone they are super stylish and i can share them with my sister ', 'overall': 5.0}
Key: 3, Data: {'reviewText': 'these are awesome and make my phone look so stylish i have only used one so far and have had it on for almost a year can you believe that one year great quality', 'overall': 5.0}
Key: 4, Data: {'reviewText': 'item arrived in great time and was in perfect condition however i ordered these buttons because they were a great deal and included a free screen protector i never received one though its not a big deal it wouldve been nice to get it since they claim it comes with one', 'overal
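
One side effect of stripping punctuation is visible in the sample output above: contractions lose their apostrophe, so "don't" becomes "dont". The stop-word list contains "don" and "t" but not "dont", so such tokens survive stop-word filtering. The sketch below shows this on a made-up sentence, using a small subset of the stop-word list; the helper name `strip_and_lower` is hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT = str.maketrans('', '', string.punctuation)

def strip_and_lower(text):
    """Remove punctuation and lowercase, mirroring the preprocessing step."""
    return text.translate(_PUNCT).lower()

# A small subset of the notebook's stop-word list, for illustration only.
stopwords = {"i", "just", "don", "t", "this", "it", "its", "s"}

tokens = strip_and_lower("I just don't like this. It's OK!").split()
kept = [token for token in tokens if token not in stopwords]

print(tokens)
print(kept)
```

If this matters for the analysis, one option is to add the apostrophe-free forms ("dont", "wont", and so on) to the stop-word list.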

**Thematic Analysis**


In [None]:

def thematic_analysis(data, positive_threshold=4.0, negative_threshold=2.0):
    # Note: the "phrases" counted here are single words.
    positive_phrases = {}  # word -> frequency in positive reviews
    negative_phrases = {}  # word -> frequency in negative reviews

    for entry in data.values():
        # 'overall' is the column containing the rating
        rating = entry.get('overall', 0)

        # Pick the counter that matches the rating; skip mid-range ratings
        if rating >= positive_threshold:
            counts = positive_phrases
        elif rating <= negative_threshold:
            counts = negative_phrases
        else:
            continue

        # Tokenize on whitespace and count every word that is not a stop word
        for word in entry.get('reviewText', '').split():
            if word not in stopwords:
                counts[word] = counts.get(word, 0) + 1

    # Sort the words by frequency, most frequent first
    sorted_positive_phrases = sorted(positive_phrases.items(), key=lambda x: x[1], reverse=True)
    sorted_negative_phrases = sorted(negative_phrases.items(), key=lambda x: x[1], reverse=True)

    return sorted_positive_phrases, sorted_negative_phrases

#thematic analysis on the filtered data
positive_phrases, negative_phrases = thematic_analysis(filtered_data_dict)

#print the top positive and negative phrases
print("\nTop Positive Phrases:")
for phrase, frequency in positive_phrases[:10]:
    print(f"{phrase}: {frequency}")

print("\nTop Negative Phrases:")
for phrase, frequency in negative_phrases[:10]:
    print(f"{phrase}: {frequency}")



Top Positive Phrases:
phone: 133060
case: 109127
one: 66801
great: 59181
like: 56142
use: 48113
battery: 46205
good: 46087
screen: 46036
well: 41158

Top Negative Phrases:
phone: 20632
case: 16608
one: 10293
would: 8261
like: 7445
get: 7290
screen: 7198
use: 6143
product: 5709
battery: 5617
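
The two lists above share most of their entries ("phone", "case", "one", "like", "use", "screen", "battery"), because raw counts favour words that are common in every review. One possible refinement, sketched below under stated assumptions, is to rank words by how much more often they occur in one group than in the other. The function `distinctive_words`, its parameters, and the toy counts are hypothetical and are not part of the original analysis.

```python
def distinctive_words(target_counts, other_counts, top_n=3, min_count=2):
    """Rank words by the ratio of their relative frequency in the target group
    to their relative frequency in the other group."""
    target_total = sum(target_counts.values())
    other_total = sum(other_counts.values())
    scores = {}
    for word, count in target_counts.items():
        if count < min_count:
            continue  # ignore very rare words, whose ratios are unstable
        target_rate = count / target_total
        # A small additive offset avoids division by zero for words
        # that never occur in the other group.
        other_rate = (other_counts.get(word, 0) + 1) / (other_total + 1)
        scores[word] = target_rate / other_rate
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy counts, made up for the example (not taken from the dataset).
positive = {"phone": 50, "case": 40, "great": 30, "love": 10}
negative = {"phone": 25, "case": 20, "broke": 12, "great": 1}

print(distinctive_words(positive, negative))
print(distinctive_words(negative, positive))
```

In the toy data, "phone" and "case" are frequent in both groups and so score close to 1, while words concentrated in one group rise to the top.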


**Sentiment Analysis**

In [None]:
#dictionaries with word weights for sentiment analysis
positive_word_weights = {
    'good': 0.8,
    'excellent': 0.9,
    'positive': 0.85,
    'amazing': 0.9,
    'fantastic': 0.85,
    'outstanding': 0.9,
    'love': 0.8,
    'great': 0.8,
    'awesome': 0.85,
    'happy': 0.75,

}

negative_word_weights = {
    'bad': -0.1,
    'poor': -0.2,
    'negative': -0.15,
    'horrible': -0.2,
    'awful': -0.2,
    'terrible': -0.2,
    'disappointing': -0.15,
    'hate': -0.2,
    'dislike': -0.15,
    'regret': -0.15,

}

# Rule-based sentiment analysis: sum the weights of all matched words
def rule_based_sentiment_analysis(review, positive_weights, negative_weights):
    score = 0

    for word in review.split():
        word = word.lower()
        # Add the word's weight if it appears in either dictionary
        if word in positive_weights:
            score += positive_weights[word]
        elif word in negative_weights:
            score += negative_weights[word]

    return score

# Scores above +threshold are labeled Positive, scores below -threshold Negative,
# and everything in between Neutral
threshold = 0.5

# Apply sentiment analysis to each review in the filtered data
for key, entry in filtered_data_dict.items():
    review_text = entry.get('reviewText', '')
    sentiment_score = rule_based_sentiment_analysis(review_text, positive_word_weights, negative_word_weights)

    if sentiment_score > threshold:
        sentiment_label = 'Positive'
    elif sentiment_score < -threshold:
        sentiment_label = 'Negative'
    else:
        sentiment_label = 'Neutral'

    # Store the sentiment information in the entry
    entry['sentiment_score'] = sentiment_score
    entry['sentiment_label'] = sentiment_label

# Print a sample of entries with their sentiment analysis results
print("\nSample of Entries After Sentiment Analysis:")
for key, entry in list(filtered_data_dict.items())[:30]:
    print(f"Key: {key}, Sentiment Score: {entry.get('sentiment_score', 0)}, Sentiment Label: {entry.get('sentiment_label', 'Neutral')}")



Sample of Entries After Sentiment Analysis:
Key: 1, Sentiment Score: 1.6, Sentiment Label: Positive
Key: 2, Sentiment Score: 0.8, Sentiment Label: Positive
Key: 3, Sentiment Score: 1.65, Sentiment Label: Positive
Key: 4, Sentiment Score: 1.6, Sentiment Label: Positive
Key: 5, Sentiment Score: 1.65, Sentiment Label: Positive
Key: 6, Sentiment Score: 0, Sentiment Label: Neutral
Key: 7, Sentiment Score: 0.8, Sentiment Label: Positive
Key: 8, Sentiment Score: 0, Sentiment Label: Neutral
Key: 9, Sentiment Score: 1.6, Sentiment Label: Positive
Key: 10, Sentiment Score: 1.65, Sentiment Label: Positive
Key: 11, Sentiment Score: 0, Sentiment Label: Neutral
Key: 12, Sentiment Score: 0, Sentiment Label: Neutral
Key: 13, Sentiment Score: 2.4000000000000004, Sentiment Label: Positive
Key: 14, Sentiment Score: 0.85, Sentiment Label: Positive
Key: 15, Sentiment Score: 0.85, Sentiment Label: Positive
Key: 16, Sentiment Score: 0.8, Sentiment Label: Positive
Key: 17, Sentiment Score: 0.8, Sentiment Lab
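
The weights above are asymmetric: positive words weigh 0.75 to 0.9, negative words only -0.1 to -0.2. With a threshold of 0.5, a single positive word is enough for a Positive label, while a Negative label needs at least three negative words and no positive ones. That is consistent with the sample above, which shows only Positive and Neutral labels. The sketch below works through this on made-up reviews, using a subset of the weights; the helper names `score_review` and `label` are hypothetical.

```python
# Subset of the notebook's weights, for illustration only.
positive_weights = {"good": 0.8, "great": 0.8, "awesome": 0.85}
negative_weights = {"bad": -0.1, "terrible": -0.2, "hate": -0.2}

def score_review(review):
    """Sum the weights of all words found in either dictionary."""
    score = 0
    for word in review.split():
        score += positive_weights.get(word, 0) + negative_weights.get(word, 0)
    return score

def label(score, threshold=0.5):
    """Map a score to a label using the same symmetric threshold."""
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    return "Neutral"

# Made-up reviews, already lowercased and stripped of punctuation.
reviews = [
    "great phone",
    "terrible phone i hate it",
    "good case but terrible battery",
]

for review in reviews:
    score = score_review(review)
    print(f"{review!r}: score={round(score, 2)}, label={label(score)}")
```

Here a review with two strongly negative words still comes out Neutral, and one positive word outweighs a negative one. Negative weights of comparable magnitude to the positive ones would make the labels more balanced; that would be a change to the method, not something the original notebook does.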

**Storage**

In [None]:
# Save the final results to a text file
output_file_path = '/content/sentiment_results.txt'

with open(output_file_path, 'w', encoding='utf-8') as output_file:
    # Iterate over filtered_data_dict: the sentiment labels were stored there,
    # not in json_data_dict.
    for key, entry in filtered_data_dict.items():
        # Take the original (unprocessed) review text from json_data_dict
        review_text = json_data_dict[key].get('reviewText', '')
        sentiment_label = entry.get('sentiment_label', 'Neutral')
        output_line = f"Sentiment: {sentiment_label}\nReview Text: {review_text}\n\n"
        output_file.write(output_line)

print(f"Results saved to {output_file_path}")


Results saved to /content/sentiment_results.txt
