# Assignment 1 - Creating and Manipulating Graphs

Eight employees at a small company were asked to choose 3 movies that they would most enjoy watching for the upcoming company movie night. These choices are stored in the file `assets/Employee_Movie_Choices.txt`.

A second file, `assets/Employee_Relationships.txt`, has data on the relationships between different coworkers. 

The relationship score ranges from `-100` (Enemies) to `+100` (Best Friends). A value of zero means the two employees haven't interacted or are indifferent.

Both files are tab delimited.

In [25]:
import pandas as pd
import networkx as nx

### Question 1
Using NetworkX, load the bipartite graph from `assets/Employee_Movie_Choices.txt` and return that graph.

In [26]:
# Read the tab-delimited choices file; the first row holds the column names
G_df = pd.read_csv("assets/Employee_Movie_Choices.txt", delimiter="\t", header=0)
G_df

Unnamed: 0,#Employee,Movie
0,Andy,Anaconda
1,Andy,Mean Girls
2,Andy,The Matrix
3,Claude,Anaconda
4,Claude,Monty Python and the Holy Grail
5,Claude,Snakes on a Plane
6,Frida,The Matrix
7,Frida,The Shawshank Redemption
8,Frida,The Social Network
9,Georgia,Anaconda


In [27]:
from networkx.algorithms import bipartite

# Build the bipartite graph: employees in set 0, movies in set 1
B = nx.Graph()
B.add_nodes_from(G_df['#Employee'], bipartite=0)
B.add_nodes_from(G_df['Movie'], bipartite=1)
# One edge per (employee, movie) choice
B.add_edges_from(zip(G_df['#Employee'], G_df['Movie']))

In [28]:
bipartite.is_bipartite(B)

True

In [29]:
len(B.nodes())

19

In [30]:
len(B.edges())

24
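As an alternative to going through pandas, the edge list can be parsed by NetworkX directly. The sketch below uses `nx.parse_edgelist` on a few in-memory lines that stand in for the file (an assumption made to keep the example self-contained); because the header line begins with `#`, it is treated as a comment and skipped.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Stand-in for the file contents: tab-separated, header line starting with '#'
lines = [
    "#Employee\tMovie",
    "Andy\tAnaconda",
    "Andy\tThe Matrix",
    "Claude\tAnaconda",
    "Frida\tThe Matrix",
]

# Everything after the comment character is dropped, so the header is ignored;
# splitting on tabs keeps multi-word titles such as "The Matrix" intact.
G = nx.parse_edgelist(lines, comments="#", delimiter="\t")

print(sorted(G.nodes()))
print(bipartite.is_bipartite(G))
```

For a file on disk, `nx.read_edgelist(path, delimiter="\t")` takes the same `comments` and `delimiter` arguments.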

### Question 2
Using the graph from the previous question, add a node attribute named `'type'`, where movies have the value `'movie'` and employees have the value `'employee'`, and return that graph.

This function should return a bipartite networkx graph with node attributes `{'type': 'movie'}` or `{'type': 'employee'}`.

In [31]:
B.add_nodes_from(G_df['#Employee'],type='employee')
B.add_nodes_from(G_df['Movie'],type='movie')

In [32]:
B.nodes(data=True)

NodeDataView({'Andy': {'bipartite': 0, 'type': 'employee'}, 'Claude': {'bipartite': 0, 'type': 'employee'}, 'Frida': {'bipartite': 0, 'type': 'employee'}, 'Georgia': {'bipartite': 0, 'type': 'employee'}, 'Joan': {'bipartite': 0, 'type': 'employee'}, 'Lee': {'bipartite': 0, 'type': 'employee'}, 'Pablo': {'bipartite': 0, 'type': 'employee'}, 'Vincent': {'bipartite': 0, 'type': 'employee'}, 'Anaconda': {'bipartite': 1, 'type': 'movie'}, 'Mean Girls': {'bipartite': 1, 'type': 'movie'}, 'The Matrix': {'bipartite': 1, 'type': 'movie'}, 'Monty Python and the Holy Grail': {'bipartite': 1, 'type': 'movie'}, 'Snakes on a Plane': {'bipartite': 1, 'type': 'movie'}, 'The Shawshank Redemption': {'bipartite': 1, 'type': 'movie'}, 'The Social Network': {'bipartite': 1, 'type': 'movie'}, 'Forrest Gump': {'bipartite': 1, 'type': 'movie'}, 'Kung Fu Panda': {'bipartite': 1, 'type': 'movie'}, 'The Dark Knight': {'bipartite': 1, 'type': 'movie'}, 'The Godfather': {'bipartite': 1, 'type': 'movie'}})
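The same attributes can also be assigned in one call with `nx.set_node_attributes`, which takes a dict mapping each node to its value. This is a minimal sketch on a toy graph; the `employees` set and the `node_types` dict are names introduced here for illustration.

```python
import networkx as nx

# Toy graph with two employees and two movies
B = nx.Graph()
B.add_edges_from([
    ("Andy", "Anaconda"),
    ("Claude", "Anaconda"),
    ("Andy", "The Matrix"),
])
employees = {"Andy", "Claude"}

# Map every node to its type, then attach the mapping as the 'type' attribute
node_types = {n: ("employee" if n in employees else "movie") for n in B.nodes()}
nx.set_node_attributes(B, node_types, name="type")

print(B.nodes(data=True))
```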

### Question 3
Find a weighted projection of the graph from Question 2 that tells us how many movies each pair of employees has in common.

This function should return a weighted projected graph.

In [41]:
X = set(G_df['#Employee'])
P=bipartite.weighted_projected_graph(B,X)

In [45]:
a=P.edges(data=True)
a

EdgeDataView([('Vincent', 'Frida', {'weight': 2}), ('Vincent', 'Pablo', {'weight': 1}), ('Pablo', 'Frida', {'weight': 2}), ('Pablo', 'Andy', {'weight': 1}), ('Joan', 'Lee', {'weight': 3}), ('Joan', 'Andy', {'weight': 1}), ('Lee', 'Andy', {'weight': 1}), ('Frida', 'Andy', {'weight': 1}), ('Claude', 'Andy', {'weight': 1}), ('Claude', 'Georgia', {'weight': 3}), ('Andy', 'Georgia', {'weight': 1})])
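To see what the weights mean, the sketch below rebuilds a small part of the graph from the choices listed earlier for Andy, Claude and Frida, and checks that each projected edge weight equals the number of movies the two employees share. The `choices` dict is a helper introduced for this illustration.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Movie choices for three employees, taken from the table shown earlier
choices = {
    "Andy": ["Anaconda", "Mean Girls", "The Matrix"],
    "Claude": ["Anaconda", "Monty Python and the Holy Grail", "Snakes on a Plane"],
    "Frida": ["The Matrix", "The Shawshank Redemption", "The Social Network"],
}

B = nx.Graph()
for employee, movies in choices.items():
    B.add_node(employee, bipartite=0)
    for movie in movies:
        B.add_node(movie, bipartite=1)
        B.add_edge(employee, movie)

# Project onto the employee side; weight = number of shared neighbours (movies)
P = bipartite.weighted_projected_graph(B, list(choices))

for u, v, w in P.edges(data="weight"):
    shared = set(B[u]) & set(B[v])
    assert w == len(shared)
    print(u, v, w, sorted(shared))
```

Pairs with no movie in common, such as Claude and Frida here, get no edge at all in the projection, which is why they have to be filled in as 0 later.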

In [50]:
# Convert the EdgeDataView to a DataFrame
edge_df = pd.DataFrame(a, columns=['Employee1', 'Employee2', 'attributes'])
edge_df

Unnamed: 0,Employee1,Employee2,attributes
0,Vincent,Frida,{'weight': 2}
1,Vincent,Pablo,{'weight': 1}
2,Pablo,Frida,{'weight': 2}
3,Pablo,Andy,{'weight': 1}
4,Joan,Lee,{'weight': 3}
5,Joan,Andy,{'weight': 1}
6,Lee,Andy,{'weight': 1}
7,Frida,Andy,{'weight': 1}
8,Claude,Andy,{'weight': 1}
9,Claude,Georgia,{'weight': 3}


In [51]:
# Extract 'weight' attribute into a separate column
edge_df['weight'] = edge_df['attributes'].apply(lambda x: x['weight'])
edge_df

Unnamed: 0,Employee1,Employee2,attributes,weight
0,Vincent,Frida,{'weight': 2},2
1,Vincent,Pablo,{'weight': 1},1
2,Pablo,Frida,{'weight': 2},2
3,Pablo,Andy,{'weight': 1},1
4,Joan,Lee,{'weight': 3},3
5,Joan,Andy,{'weight': 1},1
6,Lee,Andy,{'weight': 1},1
7,Frida,Andy,{'weight': 1},1
8,Claude,Andy,{'weight': 1},1
9,Claude,Georgia,{'weight': 3},3


In [52]:
# Drop the 'attributes' column if not needed
edge_df.drop('attributes', axis=1, inplace=True)
edge_df

Unnamed: 0,Employee1,Employee2,weight
0,Vincent,Frida,2
1,Vincent,Pablo,1
2,Pablo,Frida,2
3,Pablo,Andy,1
4,Joan,Lee,3
5,Joan,Andy,1
6,Lee,Andy,1
7,Frida,Andy,1
8,Claude,Andy,1
9,Claude,Georgia,3
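The three steps above (build the frame, extract the weight, drop the dict column) can probably be collapsed with `nx.to_pandas_edgelist`, which writes each edge attribute to its own column. A minimal sketch on a two-edge graph, using weights that appear in the projection above:

```python
import networkx as nx

# Two edges from the projected graph shown above
P = nx.Graph()
P.add_edge("Joan", "Lee", weight=3)
P.add_edge("Vincent", "Frida", weight=2)

# source/target set the names of the two endpoint columns
edge_df = nx.to_pandas_edgelist(P, source="Employee1", target="Employee2")
print(edge_df)
```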


### Question 4
Suppose you'd like to find out if people who have a high relationship score also like the same types of movies.

Find the Pearson correlation between employee relationship scores and the number of movies they have in common. If two employees have no movies in common, it should be treated as a 0, not a missing value, and should be included in the correlation calculation.

This function should return a float.
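For reference, the Pearson correlation of two columns is the sum of products of their deviations from the mean, divided by the square root of the product of their summed squared deviations. The sketch below computes it by hand on made-up numbers (not the assignment data) and compares the result with `Series.corr`, which uses the Pearson method by default.

```python
import numpy as np
import pandas as pd

# Illustrative values only; the zero in `shared` is an ordinary observation
scores = pd.Series([0, 20, -10, 90], dtype=float)
shared = pd.Series([1, 0, 1, 3], dtype=float)

# Deviations from the mean
dx = scores - scores.mean()
dy = shared - shared.mean()

# Pearson r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
r_pandas = scores.corr(shared)

print(r_manual, r_pandas)
```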

In [53]:
# Read the tab-delimited relationships file; the first row holds the column names
G_df1 = pd.read_csv("assets/Employee_Relationships.txt", delimiter="\t", header=0)
G_df1

Unnamed: 0,Employee1,Employee2,relationship_score
0,Andy,Claude,0
1,Andy,Frida,20
2,Andy,Georgia,-10
3,Andy,Joan,30
4,Andy,Lee,-10
5,Andy,Pablo,-10
6,Andy,Vincent,20
7,Claude,Frida,0
8,Claude,Georgia,90
9,Claude,Joan,0


In [57]:
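# Left-join the projection weights onto the relationship table; unmatched pairs get weight 0.
# Caution: the join matches (Employee1, Employee2) in the exact order given, so an edge the
# projection lists as ('Claude', 'Andy') does not match the row (Andy, Claude).
comb = pd.merge(left=G_df1, right=edge_df, on=['Employee1', 'Employee2'], how='left')
comb = comb.fillna(0)
comb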

Unnamed: 0,Employee1,Employee2,relationship_score,weight
0,Andy,Claude,0,0.0
1,Andy,Frida,20,0.0
2,Andy,Georgia,-10,1.0
3,Andy,Joan,30,0.0
4,Andy,Lee,-10,0.0
5,Andy,Pablo,-10,0.0
6,Andy,Vincent,20,0.0
7,Claude,Frida,0,0.0
8,Claude,Georgia,90,3.0
9,Claude,Joan,0,0.0


In [61]:
comb['relationship_score'].corr(comb['weight'])

0.5031625849659712
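The merged table above gives Andy and Claude a weight of 0 even though the projection contains the edge `('Claude', 'Andy')` with weight 1. A merge on `['Employee1', 'Employee2']` is sensitive to the order of the two names, so pairs listed the other way round are filled with 0 and the correlation is computed on incomplete weights. One way to make the join order-insensitive is to merge on a key built from the sorted pair. The sketch below uses a few rows taken from the tables above; `with_pair_key` is a hypothetical helper name.

```python
import pandas as pd

# A few rows from the relationships table (pairs in alphabetical order)
rel = pd.DataFrame({
    "Employee1": ["Andy", "Andy", "Claude"],
    "Employee2": ["Claude", "Vincent", "Georgia"],
    "relationship_score": [0, 20, 90],
})
# A few projection edges, in the order NetworkX happened to report them
edges = pd.DataFrame({
    "Employee1": ["Claude", "Claude"],
    "Employee2": ["Andy", "Georgia"],
    "weight": [1, 3],
})

def with_pair_key(df):
    """Return a copy with a 'pair' column holding both names in sorted order."""
    out = df.copy()
    out["pair"] = ["|".join(sorted(p)) for p in zip(out["Employee1"], out["Employee2"])]
    return out

merged = (
    with_pair_key(rel)
    .merge(with_pair_key(edges)[["pair", "weight"]], on="pair", how="left")
    .drop(columns="pair")
)
# Pairs with no shared movie have no edge, so they count as 0
merged["weight"] = merged["weight"].fillna(0).astype(int)
print(merged)
```

Running `merged['relationship_score'].corr(merged['weight'])` on the full data after such a merge would include every shared-movie count, so the result should be expected to differ from the value printed above.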