# Code book for the data analyst project using Python 

We have a JSON dataset, and the goal is to extract meaningful insights from it.

We start by reading the data and understanding its structure. The data contains three main components:

1. **Users**: Each user has an ID, a name, a list of friends (by their IDs), and a list of liked pages (by their IDs).

2. **Pages**: Each page has an ID and a name.

3. **Connections**: Users can have multiple friends and can like multiple pages.
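
As a rough illustration of the structure described above, the snippet below builds a tiny in-memory sample and checks that each record carries the expected keys. The helper name `check_structure`, the constants `USER_KEYS` and `PAGE_KEYS`, and the sample records are assumptions made for this sketch; they are not part of the project code.

```python
# Hypothetical sample that mirrors the three components described above.
sample = {
    "users": [
        {"id": 1, "name": "Amit", "friends": [2], "liked_pages": [101]},
        {"id": 2, "name": "Priya", "friends": [1], "liked_pages": []},
    ],
    "pages": [{"id": 101, "name": "Python Developers"}],
}

# Keys we assume every record should have (an assumption for this sketch).
USER_KEYS = {"id", "name", "friends", "liked_pages"}
PAGE_KEYS = {"id", "name"}


def check_structure(data):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    for user in data.get("users", []):
        missing = USER_KEYS - set(user)
        if missing:
            problems.append(f"user {user.get('id')} is missing {sorted(missing)}")
    for page in data.get("pages", []):
        missing = PAGE_KEYS - set(page)
        if missing:
            problems.append(f"page {page.get('id')} is missing {sorted(missing)}")
    return problems


print(check_structure(sample))  # [] because every record has all keys
```

Running a check like this before any analysis makes it easier to tell a data problem from a logic problem later on.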

### Starting the implementation of the project

In [1]:
import json

In [2]:
# Load a JSON file and return its contents as Python objects.
def load_data(filename):  # reusable template: works for any JSON file
    with open(filename, "r") as f:
        data = json.load(f)
    return data
    

In [3]:
data = load_data("data.json")


In [4]:
data

{'users': [{'id': 1, 'name': 'Amit', 'friends': [2, 3], 'liked_pages': [101]},
  {'id': 2, 'name': 'Priya', 'friends': [1, 4], 'liked_pages': [102]},
  {'id': 3, 'name': 'Rahul', 'friends': [1], 'liked_pages': [101, 103]},
  {'id': 4, 'name': 'Sara', 'friends': [2], 'liked_pages': [104]}],
 'pages': [{'id': 101, 'name': 'Python Developers'},
  {'id': 102, 'name': 'Data Science Enthusiasts'},
  {'id': 103, 'name': 'AI & ML Community'},
  {'id': 104, 'name': 'Web Dev Hub'}]}

In [5]:
def display_user(data):
    print("users and their connections\n")
    for user in data['users']:
        print(f"ID:{user['id']} - {user['name']} is friends with : {user['friends']} and liked pages are {user['liked_pages']}")
    
    print("\n pages information")
    for page in data['pages']:
        print(f"{page['id']}:{page['name']}")

display_user(data)


users and their connections

ID:1 - Amit is friends with : [2, 3] and liked pages are [101]
ID:2 - Priya is friends with : [1, 4] and liked pages are [102]
ID:3 - Rahul is friends with : [1] and liked pages are [101, 103]
ID:4 - Sara is friends with : [2] and liked pages are [104]

 pages information
101:Python Developers
102:Data Science Enthusiasts
103:AI & ML Community
104:Web Dev Hub
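
The output above prints raw IDs for friends and pages. One possible refinement, sketched here, is to build ID-to-name lookups once and print names instead. The data is copied from the notebook output above; the names `user_names`, `page_names`, and `describe_user` are hypothetical helpers introduced only for this example.

```python
# Data copied from the notebook output shown earlier.
data = {
    "users": [
        {"id": 1, "name": "Amit", "friends": [2, 3], "liked_pages": [101]},
        {"id": 2, "name": "Priya", "friends": [1, 4], "liked_pages": [102]},
        {"id": 3, "name": "Rahul", "friends": [1], "liked_pages": [101, 103]},
        {"id": 4, "name": "Sara", "friends": [2], "liked_pages": [104]},
    ],
    "pages": [
        {"id": 101, "name": "Python Developers"},
        {"id": 102, "name": "Data Science Enthusiasts"},
        {"id": 103, "name": "AI & ML Community"},
        {"id": 104, "name": "Web Dev Hub"},
    ],
}

# Build id -> name lookups once so each later access is a dictionary lookup.
user_names = {user["id"]: user["name"] for user in data["users"]}
page_names = {page["id"]: page["name"] for page in data["pages"]}


def describe_user(user):
    """Return one readable line for a user, falling back to the raw ID if unknown."""
    friends = [user_names.get(fid, str(fid)) for fid in user["friends"]]
    pages = [page_names.get(pid, str(pid)) for pid in user["liked_pages"]]
    return f"{user['name']} is friends with {', '.join(friends)} and likes {', '.join(pages)}"


for user in data["users"]:
    print(describe_user(user))
```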


# Cleaning and Structuring the Data

### We need to **clean and structure the data** properly.

#### The task is to:

- Handle missing values

- Remove duplicate or inconsistent data

- Standardize the data format
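
To make the "standardize the data format" task more concrete, here is a minimal sketch of one way to normalize a single user record. It assumes that IDs should be integers, names should be stripped strings, and missing lists should become empty lists; those rules and the helper name `standardize_user` are assumptions for illustration, and the messy record is invented.

```python
def standardize_user(user):
    """Return a copy of `user` with a consistent shape and consistent types.

    Assumptions for this sketch: IDs are integers, names are stripped
    strings, and missing or null lists become empty lists.
    """
    return {
        "id": int(user["id"]),
        "name": str(user.get("name") or "").strip(),
        "friends": [int(fid) for fid in user.get("friends") or []],
        "liked_pages": [int(pid) for pid in user.get("liked_pages") or []],
    }


# Hypothetical messy record, invented for illustration.
raw = {"id": "5", "name": "  Neha ", "friends": ["1", 2], "liked_pages": None}
print(standardize_user(raw))
```

Normalizing types first means that later steps, such as removing duplicates, compare like with like (for example, `"1"` and `1` would otherwise count as different friend IDs).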

In [6]:
import json

def clean_data(data):
    # Remove users with a missing or blank name (also handles a missing key or None)
    data["users"] = [user for user in data["users"] if (user.get("name") or "").strip()]
    # Remove duplicate friend IDs while keeping their original order
    for user in data["users"]:
        user["friends"] = list(dict.fromkeys(user["friends"]))
    # Remove inactive users (no friends and no liked pages)
    data["users"] = [user for user in data["users"] if user["friends"] or user["liked_pages"]]

    # Remove duplicate pages: for a repeated ID, the last occurrence wins
    unique_pages = {}
    for page in data["pages"]:
        unique_pages[page["id"]] = page
    data["pages"] = list(unique_pages.values())
    return data

# loading the data 
with open("data2.json") as f:
    data = json.load(f)

data = clean_data(data)

with open("cleaned_data2.json", "w") as f:
    json.dump(data, f, indent=4)

print("Data has been cleaned successfully")


Data has been cleaned successfully
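
One inconsistency that a cleaning pass of this kind may leave behind: when a user is removed, other users can still list that user's ID as a friend. The sketch below assumes such dangling references should simply be dropped. The helper name `drop_dangling_friends` and the sample data are hypothetical.

```python
def drop_dangling_friends(data):
    """Remove friend IDs that do not correspond to any remaining user."""
    known_ids = {user["id"] for user in data["users"]}
    for user in data["users"]:
        user["friends"] = [fid for fid in user["friends"] if fid in known_ids]
    return data


# Hypothetical sample: user 2 was removed earlier, but user 1 still lists them.
sample = {
    "users": [
        {"id": 1, "name": "Amit", "friends": [2, 3], "liked_pages": [101]},
        {"id": 3, "name": "Rahul", "friends": [1], "liked_pages": [101]},
    ],
    "pages": [{"id": 101, "name": "Python Developers"}],
}

print(drop_dangling_friends(sample)["users"][0]["friends"])  # [3]
```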


# Finding **"People You May Know"**
Now that our data is cleaned and structured, we need to build a 'People You May Know' feature!

In social networks, this feature helps users connect with others by suggesting friends based on mutual connections. Your job is to analyse mutual friends and recommend potential connections.

### Task 1: Understand the Logic
How 'People You May Know' works:

- If User A and User B are not friends but have mutual friends, we suggest User B to User A and vice versa.
- More mutual friends = higher priority recommendation.

Example:

- Amit (ID: 1) is friends with Priya (ID: 2) and Rahul (ID: 3).
- Priya (ID: 2) is friends with Sara (ID: 4).
- Amit is not directly friends with Sara, but they share Priya as a mutual friend.
- Suggest Sara to Amit as "People You May Know".
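
The Amit and Sara example can be checked with plain set operations. This is a small sketch using the friend lists from the sample data shown earlier in this notebook; the dictionary name `friends` and the helper `mutual_friends` are introduced here only for illustration.

```python
# Friend lists from the sample data shown earlier in this notebook.
friends = {
    1: {2, 3},  # Amit
    2: {1, 4},  # Priya
    3: {1},     # Rahul
    4: {2},     # Sara
}


def mutual_friends(a, b):
    """Return the set of users who are friends with both `a` and `b`."""
    return friends[a] & friends[b]


amit, sara = 1, 4
already_friends = sara in friends[amit]

print(already_friends)             # False: Amit and Sara are not direct friends
print(mutual_friends(amit, sara))  # {2}: Priya is the one mutual friend
```

Because the two are not yet friends and share one mutual friend, Sara qualifies as a suggestion for Amit with a mutual count of 1.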

In [8]:
import json 

def load_data(filename):
    with open(filename, "r") as f:
        return json.load(f)

def find_people_you_may_know(user_id, data):
    # Map each user ID to the set of that user's friend IDs
    user_friends = {}
    for user in data['users']:
        user_friends[user['id']] = set(user['friends'])

    # Unknown user: nothing to suggest
    if user_id not in user_friends:
        return []

    direct_friends = user_friends[user_id]
    suggestions = {}
    for friend in direct_friends:
        # .get() guards against friend IDs that have no user record
        for mutual in user_friends.get(friend, set()):
            if mutual != user_id and mutual not in direct_friends:
                # count mutual friends
                suggestions[mutual] = suggestions.get(mutual, 0) + 1

    # Most mutual friends first
    sorted_suggestions = sorted(suggestions.items(), key=lambda x: x[1], reverse=True)
    return [uid for uid, mutual_count in sorted_suggestions]

# load the data 
data = load_data("massive_data.json")
user_id = 1
recc = find_people_you_may_know(user_id, data)
print(recc)


[7, 8, 9, 10, 11, 12]
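
When several candidates have the same number of mutual friends, a sort on the count alone leaves them in whatever order they were first encountered. If a predictable order is wanted, one option is to break ties by user ID. The sketch below assumes that tie-breaking rule; the helper name `rank_suggestions` and the counts are hypothetical.

```python
def rank_suggestions(suggestions):
    """Order candidate IDs by mutual-friend count (descending), then by ID (ascending)."""
    ranked = sorted(suggestions.items(), key=lambda item: (-item[1], item[0]))
    return [uid for uid, _count in ranked]


# Hypothetical counts: candidate ID -> number of mutual friends.
counts = {9: 1, 8: 2, 7: 2}
print(rank_suggestions(counts))  # [7, 8, 9]
```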


# Finding "Pages You Might Like"

#### After cleaning messy data and building features like People You May Know, it's time to launch our last feature: Pages You Might Like.

##### Why This Matters
In real-world social networks, content discovery keeps users engaged. This feature simulates that experience using pure Python, showing how even simple logic can power useful insights. Your manager is impressed with your 'People You May Know' feature and now assigns you a new challenge: recommend pages that users might like!

On social media platforms, users interact with pages by liking, following, or engaging with posts. The goal is to analyze these interactions and suggest relevant pages based on user behavior.
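
One simple way to read "based on user behavior" is to compare liked pages between two users: the more pages they share, the more weight the other user's remaining pages get. The sketch below illustrates that idea with the liked pages from the sample data shown earlier; the names `liked`, `overlap`, and `new_pages` are assumptions for this example.

```python
# Liked pages per user, taken from the sample data shown earlier.
liked = {
    1: {101},       # Amit
    2: {102},       # Priya
    3: {101, 103},  # Rahul
    4: {104},       # Sara
}


def overlap(a, b):
    """Number of pages liked by both users."""
    return len(liked[a] & liked[b])


def new_pages(a, b):
    """Pages that user `b` likes and user `a` has not liked yet."""
    return liked[b] - liked[a]


print(overlap(1, 3))    # 1: Amit and Rahul both like page 101
print(new_pages(1, 3))  # {103}: a candidate page for Amit
print(overlap(1, 2))    # 0: Amit and Priya share no pages
```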

In [10]:
import json 

# function to load the JSON data 
def load_data(filename):
    with open(filename, "r") as f:
        return json.load(f)

# function to find pages a user might like based on common interests 
def find_pages_you_might_like(user_id, data):
    # dictionary to store user interactions with pages 
    user_pages = {}

    # populate the dictionary 
    for user in data['users']:
        user_pages[user['id']] = set(user['liked_pages'])

    # if the user is not found, return an empty list 
    if user_id not in user_pages:
        return []

    user_liked_pages = user_pages[user_id]
    page_suggestions = {}

    for other_user, pages in user_pages.items():
        if other_user != user_id:
            # pages liked by both users; the count is the weight of this other user
            shared_pages = user_liked_pages.intersection(pages)
            for page in pages:
                if page not in user_liked_pages:
                    # pages from users with no shared likes still get an entry, with score 0
                    page_suggestions[page] = page_suggestions.get(page, 0) + len(shared_pages)

    # sort recommended pages by score, highest first
    sorted_pages = sorted(page_suggestions.items(), key=lambda x: x[1], reverse=True)
    return sorted_pages

# load the data 
data = load_data("massive_data.json")
user_id = 1
page_recommendation = find_pages_you_might_like(user_id, data)
print(page_recommendation)


[(103, 2), (105, 1), (107, 1), (104, 0), (106, 0), (108, 0), (109, 0), (110, 0), (111, 0), (112, 0), (113, 0), (114, 0), (115, 0), (116, 0), (117, 0), (118, 0), (119, 0), (120, 0), (121, 0), (122, 0), (123, 0), (124, 0), (125, 0), (126, 0), (127, 0)]
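
The output above contains many pages with a score of 0, meaning no user with shared interests liked them. If only evidence-backed recommendations are wanted, a small post-processing step can drop those entries. This is a sketch under that assumption; `keep_positive_scores` is a hypothetical helper, and `recs` holds only the first few entries of the output above.

```python
def keep_positive_scores(recommendations):
    """Keep only (page_id, score) pairs whose score is greater than zero."""
    return [(page_id, score) for page_id, score in recommendations if score > 0]


# First few entries of the output shown above.
recs = [(103, 2), (105, 1), (107, 1), (104, 0), (106, 0)]
print(keep_positive_scores(recs))  # [(103, 2), (105, 1), (107, 1)]
```

Filtering after the fact keeps the scoring function unchanged, so the full list remains available if zero-score pages are ever needed as a fallback.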
