#### Assignment
1. Cleaning & structuring the data
2. Getting meaningful insights from the data
    1. What is the average rating?
    2. What percentage of users gave a poor rating?
3. Building a product recommendation feature (rule-based, not ML)
    1. If a user's rating is >= 4, recommend products from the same brand.
    2. If a user's rating is < 4, recommend products from a different brand.

In [5]:
import json

In [6]:
# Function to load data from a JSON file
def load_data(filename):
    # Expects the file to contain a JSON array of user records
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f)

In [7]:
# Load the raw data and print it
data = load_data("Dummy_Store_Data.json")
print(data)

[{'name': 'Alice', 'rating': '5 ', 'feedback': 'Great product!!', 'age': '25'}, {'name': 'Bob', 'rating': 'four', 'feedback': 'ok but late Delivery', 'age': '30'}, {'name': ' Charlie', 'rating': 'two', 'feedback': 'BAD EXPERIENCE '}, {'name': 'Diana', 'feedback': 'Loved it!', 'rating': '5', 'age': '28'}, {'name': 'Eve', 'rating': '3.5', 'feedback': 'Average - could be better', 'age': '20'}, {'name': 'Alice', 'rating': '5', 'feedback': 'Great product again!', 'age': '25'}]


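#### Data quality issues
Analyzing the dummy data reveals several problems, so it is not yet clean or structured:
1. Duplicate data: Alice appears twice.
2. Inconsistent data: ratings mix numeric strings (`'5 '`, `'3.5'`) with words (`'four'`, `'two'`).
3. Missing data: Charlie's record has no `age`.

Some values also carry stray whitespace (`' Charlie'`, `'5 '`, `'BAD EXPERIENCE '`).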
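These issues can also be confirmed programmatically rather than by eye. The following is a minimal sketch, not part of the original assignment: the `audit` helper and its `required` key list are assumptions, and the sample records are copied from the raw output printed above.

```python
from collections import Counter

def audit(records, required=("name", "rating", "feedback", "age")):
    """Report duplicate names, non-numeric ratings, and missing keys (hypothetical helper)."""
    # Count names after stripping whitespace so " Charlie" and "Charlie" match
    names = Counter(r.get("name", "").strip() for r in records)
    duplicates = sorted(n for n, c in names.items() if c > 1)

    # A rating is "non-numeric" if float() cannot parse it
    non_numeric = []
    for r in records:
        try:
            float(str(r.get("rating", "")).strip())
        except ValueError:
            non_numeric.append(r.get("rating"))

    # Record (name, missing_key) pairs for every absent required key
    missing = [(r.get("name", "").strip(), key)
               for r in records for key in required if key not in r]
    return {"duplicates": duplicates, "non_numeric_ratings": non_numeric, "missing": missing}

sample = [
    {'name': 'Alice', 'rating': '5 ', 'feedback': 'Great product!!', 'age': '25'},
    {'name': 'Bob', 'rating': 'four', 'feedback': 'ok but late Delivery', 'age': '30'},
    {'name': ' Charlie', 'rating': 'two', 'feedback': 'BAD EXPERIENCE '},
    {'name': 'Diana', 'feedback': 'Loved it!', 'rating': '5', 'age': '28'},
    {'name': 'Eve', 'rating': '3.5', 'feedback': 'Average - could be better', 'age': '20'},
    {'name': 'Alice', 'rating': '5', 'feedback': 'Great product again!', 'age': '25'},
]
print(audit(sample))
# {'duplicates': ['Alice'], 'non_numeric_ratings': ['four', 'two'], 'missing': [('Charlie', 'age')]}
```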

In [8]:
# Function to clean & structure the data
def clean_data(data):
    # Map spelled-out ratings ("four", "two") to numbers
    text_to_num = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
    cleaned_data = []
    unique_users = set()
    for user in data:
        # Work on a copy so the original records in `data` are not modified,
        # which also makes it safe to re-run this cell
        record = dict(user)

        # Deduplication: compare stripped names so " Charlie" and "Charlie" count as one user
        name_key = record["name"].strip()
        if name_key in unique_users:
            continue  # duplicate: skip to the next record
        unique_users.add(name_key)

        # Data inconsistency: str() makes .strip() safe even if the rating is already a number;
        # .strip() removes leading/trailing whitespace only, not whitespace in the middle
        raw_rating = str(record["rating"]).strip().lower()
        raw_rating = text_to_num.get(raw_rating, raw_rating)
        # Convert every rating to float so the column has a single type
        record["rating"] = float(raw_rating)

        # Missing values: make an absent "age" key explicit as None
        record.setdefault("age", None)

        cleaned_data.append(record)
    return cleaned_data

In [9]:
data2 = clean_data(data)
print(data2)

[{'name': 'Alice', 'rating': 5.0, 'feedback': 'Great product!!', 'age': '25'}, {'name': 'Bob', 'rating': 4.0, 'feedback': 'ok but late Delivery', 'age': '30'}, {'name': ' Charlie', 'rating': 2.0, 'feedback': 'BAD EXPERIENCE ', 'age': None}, {'name': 'Diana', 'feedback': 'Loved it!', 'rating': 5.0, 'age': '28'}, {'name': 'Eve', 'rating': 3.5, 'feedback': 'Average - could be better', 'age': '20'}]
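The cleaned output still keeps the leading space in `' Charlie'` and stores ages as strings. A stricter variant might normalize every field and set aside records whose values cannot be parsed. This is a hypothetical sketch: `normalize_record`, `clean_records`, the assumed 1 to 5 rating scale, and the `"Zed"` test record are illustrative and not from the original data.

```python
WORD_TO_NUM = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def normalize_record(record):
    """Return a cleaned copy of one record; raise ValueError if rating or age cannot be parsed.

    Assumes every record has a "name" key and ratings on a 1-5 scale.
    """
    raw = str(record.get("rating", "")).strip().lower()
    rating = float(WORD_TO_NUM.get(raw, raw))  # raises ValueError for words like "great"
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    age = record.get("age")
    return {
        "name": record["name"].strip(),
        "rating": rating,
        "feedback": record.get("feedback", "").strip(),
        "age": int(age) if age not in (None, "") else None,
    }

def clean_records(records):
    """Normalize records, keep the first record per name, and collect unparseable ones."""
    cleaned, rejected, seen = [], [], set()
    for record in records:
        try:
            row = normalize_record(record)
        except ValueError:
            rejected.append(record)
            continue
        if row["name"] in seen:
            continue  # duplicate name: keep only the first occurrence
        seen.add(row["name"])
        cleaned.append(row)
    return cleaned, rejected

records = [
    {"name": " Charlie", "rating": "two", "feedback": "BAD EXPERIENCE "},
    {"name": "Alice", "rating": "5 ", "feedback": "Great product!!", "age": "25"},
    {"name": "Alice", "rating": "5", "feedback": "Great product again!", "age": "25"},
    {"name": "Zed", "rating": "great", "feedback": "?"},  # hypothetical bad record
]
cleaned, rejected = clean_records(records)
print(cleaned)
print(rejected)
```

Returning the rejected records, instead of silently dropping them, makes it easy to inspect what the cleaning step could not handle.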


In [10]:
# Getting meaningful insights from the data
def get_insights(data):
    if not data:
        print("No data to analyze")
        return

    # Average rating
    total_rating = sum(float(user["rating"]) for user in data)
    print(f"Avg rating = {total_rating / len(data)}")

    # Percentage of users with a poor rating (treated here as a rating below 3)
    poor_ratings = sum(1 for user in data if float(user["rating"]) < 3)
    print(f"Users with poor rating (< 3) = {poor_ratings / len(data) * 100}%")

In [11]:
get_insights(data2)

Avg rating = 3.9
Users with poor rating (< 3) = 20.0%
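Because `get_insights` prints its results, the numbers are hard to reuse or test. A variant that returns them instead might look like this. `summarize` and its `poor_below` parameter are hypothetical names; the threshold of 3 mirrors the `< 3` check above, and the sample ratings are the cleaned values from this notebook.

```python
from statistics import mean

def summarize(records, poor_below=3.0):
    """Return the average rating and the percentage of ratings below `poor_below`."""
    if not records:
        return {"avg_rating": None, "poor_pct": None}
    ratings = [float(r["rating"]) for r in records]
    poor = sum(1 for x in ratings if x < poor_below)
    return {"avg_rating": mean(ratings), "poor_pct": 100 * poor / len(ratings)}

sample = [{"rating": 5.0}, {"rating": 4.0}, {"rating": 2.0}, {"rating": 5.0}, {"rating": 3.5}]
print(summarize(sample))  # {'avg_rating': 3.9, 'poor_pct': 20.0}
```

Returning `None` for an empty list avoids the `ZeroDivisionError` that dividing by `len(records)` would otherwise raise.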


In [12]:
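# Recommendation feature (rule-based)
# The data has no brand field, so this code assumes every user reviewed an "Apple" product;
# "Samsung" stands in for a different brand.
def get_recommendation(data):
    recommendations = []
    for user in data:
        curr_recomm = {"name": user["name"]}
        if float(user["rating"]) >= 4:
            curr_recomm["brand"] = "Apple"    # satisfied: recommend the same brand
        else:
            curr_recomm["brand"] = "Samsung"  # not satisfied: recommend a different brand
        recommendations.append(curr_recomm)
    return recommendations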

In [13]:
data3 = get_recommendation(data2)
print(data3)

[{'name': 'Alice', 'brand': 'Apple'}, {'name': 'Bob', 'brand': 'Apple'}, {'name': ' Charlie', 'brand': 'Samsung'}, {'name': 'Diana', 'brand': 'Apple'}, {'name': 'Eve', 'brand': 'Samsung'}]
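`get_recommendation` hard-codes the two brands because the dataset records no brand. If each review carried the brand and product, and a product catalog were available, the assignment's two rules could be applied generically. Everything in this sketch is hypothetical: the `brand` and `product` fields, the `catalog` list, and the `recommend` function.

```python
def recommend(reviews, catalog, threshold=4.0):
    """Same-brand products if rating >= threshold, other-brand products otherwise."""
    results = []
    for review in reviews:
        liked = float(review["rating"]) >= threshold
        picks = [
            p["product"] for p in catalog
            # same brand when liked, different brand when not
            if (p["brand"] == review["brand"]) == liked
            and p["product"] != review["product"]  # skip the product already reviewed
        ]
        results.append({"name": review["name"], "recommend": picks})
    return results

catalog = [
    {"product": "Phone A", "brand": "Apple"},
    {"product": "Tablet A", "brand": "Apple"},
    {"product": "Phone S", "brand": "Samsung"},
    {"product": "Watch S", "brand": "Samsung"},
]
reviews = [
    {"name": "Alice", "rating": 5.0, "brand": "Apple", "product": "Phone A"},
    {"name": "Charlie", "rating": 2.0, "brand": "Apple", "product": "Phone A"},
]
for row in recommend(reviews, catalog):
    print(row)
```

The comparison `(p["brand"] == review["brand"]) == liked` folds both rules into one filter: it keeps same-brand products for satisfied users and other-brand products for everyone else.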
