---

_You are currently looking at **version 1.1** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-social-network-analysis/resources/yPcBs) course resource._

---

# Assignment 1 - Creating and Manipulating Graphs

Eight employees at a small company were asked to choose 3 movies that they would most enjoy watching for the upcoming company movie night. These choices are stored in the file `Employee_Movie_Choices.txt`.

A second file, `Employee_Relationships.txt`, has data on the relationships between different coworkers. 

The relationship score ranges from `-100` (Enemies) to `+100` (Best Friends). A value of zero means the two employees haven't interacted or are indifferent.

Both files are tab delimited.
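
As a minimal sketch of what "tab delimited" means for these files, the snippet below parses a few rows in the `name<TAB>name<TAB>score` layout of `Employee_Relationships.txt` with the standard-library `csv` module. The helper name `read_relationships` and the in-memory buffer are assumptions made for illustration; the notebook itself reads the files with pandas.

```python
import csv
import io

# Three rows in the layout of Employee_Relationships.txt; the in-memory
# buffer stands in for the real file so the sketch runs on its own.
sample = "Andy\tClaude\t0\nAndy\tFrida\t20\nAndy\tGeorgia\t-10\n"


def read_relationships(handle):
    """Parse tab-delimited 'name, name, score' rows into (str, str, int) tuples."""
    # read_relationships is a hypothetical helper, not part of the assignment
    return [(a, b, int(score)) for a, b, score in csv.reader(handle, delimiter='\t')]


rows = read_relationships(io.StringIO(sample))
print(rows)
```

Passing `delimiter='\t'` is the important part: with the default comma delimiter each line would come back as a single field.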

In [70]:
import networkx as nx
import pandas as pd
import numpy as np
from networkx.algorithms import bipartite


# This is the set of employees
employees = set(['Pablo',
                 'Lee',
                 'Georgia',
                 'Vincent',
                 'Andy',
                 'Frida',
                 'Joan',
                 'Claude'])

# This is the set of movies
movies = set(['The Shawshank Redemption',
              'Forrest Gump',
              'The Matrix',
              'Anaconda',
              'The Social Network',
              'The Godfather',
              'Monty Python and the Holy Grail',
              'Snakes on a Plane',
              'Kung Fu Panda',
              'The Dark Knight',
              'Mean Girls'])


# you can use the following function to plot graphs
# make sure to comment it out before submitting to the autograder
def plot_graph(G, weight_name=None):
    '''
    G: a networkx graph
    weight_name: name of the attribute for plotting edge weights (if G is weighted)
    '''
    %matplotlib notebook
    import matplotlib.pyplot as plt
    
    plt.figure()
    pos = nx.spring_layout(G)
    edges = G.edges()
    weights = None
    
    if weight_name:
        weights = [int(G[u][v][weight_name]) for u,v in edges]
        labels = nx.get_edge_attributes(G,weight_name)
        nx.draw_networkx_edge_labels(G,pos,edge_labels=labels)
        # draw_networkx takes the edges to draw via the `edgelist` keyword
        nx.draw_networkx(G, pos, edgelist=edges, width=weights);
    else:
        nx.draw_networkx(G, pos, edgelist=edges);

### Question 1

Using NetworkX, load in the bipartite graph from `Employee_Movie_Choices.txt` and return that graph.

*This function should return a networkx graph with 19 nodes and 24 edges*

In [32]:
#!cat Employee_Relationships.txt

Andy	Claude	0
Andy	Frida	20
Andy	Georgia	-10
Andy	Joan	30
Andy	Lee	-10
Andy	Pablo	-10
Andy	Vincent	20
Claude	Frida	0
Claude	Georgia	90
Claude	Joan	0
Claude	Lee	0
Claude	Pablo	10
Claude	Vincent	0
Frida	Georgia	0
Frida	Joan	0
Frida	Lee	0
Frida	Pablo	50
Frida	Vincent	60
Georgia	Joan	0
Georgia	Lee	10
Georgia	Pablo	0
Georgia	Vincent	0
Joan	Lee	70
Joan	Pablo	0
Joan	Vincent	10
Lee	Pablo	0
Lee	Vincent	0
Pablo	Vincent	-20


In [71]:
#!cat Employee_Movie_Choices.txt

# The file is tab delimited with a single header row, so let pandas split the
# columns and rename them in one step
G_df = pd.read_csv('Employee_Movie_Choices.txt', sep='\t',
                   header=0, names=['employee', 'movie'])

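def answer_one():
    # from_pandas_dataframe is the networkx 1.x name; networkx 2.0 and later
    # call this function from_pandas_edgelist
    G = nx.from_pandas_dataframe(G_df, 'employee', 'movie')
    return G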

answer_one()

<networkx.classes.graph.Graph at 0x7fb1c902f860>
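
The expected size of this graph can be sanity-checked from the problem statement alone. The sketch below assumes that every movie in the list is chosen by at least one employee and that no employee lists the same movie twice; under those assumptions the node count is the number of employees plus the number of movies, and the edge count is three choices per employee.

```python
employees = {'Pablo', 'Lee', 'Georgia', 'Vincent',
             'Andy', 'Frida', 'Joan', 'Claude'}

movies = {'The Shawshank Redemption', 'Forrest Gump', 'The Matrix',
          'Anaconda', 'The Social Network', 'The Godfather',
          'Monty Python and the Holy Grail', 'Snakes on a Plane',
          'Kung Fu Panda', 'The Dark Knight', 'Mean Girls'}

choices_per_employee = 3  # each employee was asked to choose 3 movies

# One node per employee and per movie (assumes every movie is chosen at least once)
expected_nodes = len(employees) + len(movies)

# One edge per (employee, chosen movie) pair (assumes no repeated choices)
expected_edges = len(employees) * choices_per_employee

print(expected_nodes, expected_edges)
```

Comparing these two numbers with the node and edge counts of the graph returned by `answer_one()` is a quick way to catch a parsing mistake before submitting.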

### Question 2

Using the graph from the previous question, add node attributes named `'type'` where movies have the value `'movie'` and employees have the value `'employee'` and return that graph.

*This function should return a networkx graph with node attributes `{'type': 'movie'}` or `{'type': 'employee'}`*

In [72]:
employee_list = G_df['employee'].tolist()
movie_list = G_df['movie'].tolist()

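def answer_two():
    G = answer_one()
    # add_nodes_from updates the attributes of nodes that already exist,
    # so it tags every employee and movie without touching the edges
    G.add_nodes_from(employee_list, type='employee')
    G.add_nodes_from(movie_list, type='movie')
    return G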

answer_two()

<networkx.classes.graph.Graph at 0x7fb1c902b908>

### Question 3

Find a weighted projection of the graph from `answer_two` which tells us how many movies different pairs of employees have in common.

*This function should return a weighted projected graph.*

In [73]:
def answer_three():
    G = answer_two()
    X = set(employee_list)
    P = bipartite.weighted_projected_graph(G, X)    
    return P

answer_three()

<networkx.classes.graph.Graph at 0x7fb1c9050b70>
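
To make the edge weights concrete, here is a library-free sketch of what a weighted projection onto the employee side computes: for each pair of employees, the number of movies they both chose. It is not how networkx implements the projection, and the names `A`, `B`, `C` and movies `m1` to `m6` are made-up toy data, not the contents of `Employee_Movie_Choices.txt`.

```python
from itertools import combinations

# Hypothetical toy choices (not the real assignment data)
choices = {
    'A': {'m1', 'm2', 'm3'},
    'B': {'m2', 'm3', 'm4'},
    'C': {'m4', 'm5', 'm6'},
}


def shared_movie_counts(choices):
    """Return {(u, v): number of shared movies} for pairs sharing at least one."""
    weights = {}
    for u, v in combinations(sorted(choices), 2):
        common = len(choices[u] & choices[v])
        if common:  # pairs with nothing in common get no edge
            weights[(u, v)] = common
    return weights


projection = shared_movie_counts(choices)
print(projection)  # {('A', 'B'): 2, ('B', 'C'): 1}
```

Note that `A` and `C` share no movies, so they have no edge at all in the projection. That absence is why pairs with nothing in common need explicit handling when the weights are compared against other pairwise data.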

### Question 4

Suppose you'd like to find out if people that have a high relationship score also like the same types of movies.

Find the Pearson correlation (using `DataFrame.corr()`) between employee relationship scores and the number of movies they have in common. If two employees have no movies in common, the count should be treated as 0, not as a missing value, and the pair should be included in the correlation calculation.

*This function should return a float.*

In [75]:
def answer_four():
    # Relationship scores, indexed by the (n1, n2) employee pair
    relationship_df = pd.read_csv('Employee_Relationships.txt', sep='\t',
                                  header=None, names=['n1', 'n2', 'score'])
    relationship_df = relationship_df.set_index(keys=['n1', 'n2'])
    G = answer_three()
    # G.adj maps each node to {neighbor: edge attributes}; every edge shows up
    # in both directions, so one of them matches the pair order in the file
    weight_list = []
    for employee in set(employee_list):
        for neighbor, attrs in G.adj[employee].items():
            weight_list.append({"n1": employee,
                                "n2": neighbor,
                                "score2": attrs['weight']})
    interest_df = pd.DataFrame(weight_list).set_index(keys=['n1', 'n2'])
    # A left merge keeps every pair from the relationship file; pairs with no
    # shared movies come back as NaN and are then counted as 0
    new_df = relationship_df.merge(interest_df, how='left', left_index=True, right_index=True)
    new_df['score2'] = new_df['score2'].fillna(0)
    return new_df.corr()['score2']['score']

answer_four()

0.78839622217334759
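
The requirement that pairs with no movies in common count as 0 can be illustrated without pandas. The sketch below uses hypothetical toy scores and a hand-written `pearson` helper (both assumptions for illustration, not the assignment data or the `DataFrame.corr()` implementation) to show the missing pair being filled with 0 before the correlation is computed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical toy data: a relationship score for every pair ...
scores = {('A', 'B'): 50, ('A', 'C'): -10, ('B', 'C'): 0}
# ... but shared-movie counts only for pairs that have an edge in the projection
shared = {('A', 'B'): 2}

# Treat a missing pair as 0 shared movies instead of dropping it
common = [shared.get(pair, 0) for pair in scores]

r = pearson(list(scores.values()), common)
print(common, r)
```

Dropping the two pairs without shared movies would leave a single data point, for which a correlation is undefined; filling with 0 keeps all three pairs in the calculation.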


Help on method set_index in module pandas.core.frame:

set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False) method of pandas.core.frame.DataFrame instance
    Set the DataFrame index (row labels) using one or more existing
    columns. By default yields a new object.
    
    Parameters
    ----------
    keys : column label or list of column labels / arrays
    drop : boolean, default True
        Delete columns to be used as the new index
    append : boolean, default False
        Whether to append columns to existing index
    inplace : boolean, default False
        Modify the DataFrame in place (do not create a new object)
    verify_integrity : boolean, default False
        Check the new index for duplicates. Otherwise defer the check until
        necessary. Setting to False will improve the performance of this
        method
    
    Examples
    --------
    >>> indexed_df = df.set_index(['A', 'B'])
    >>> indexed_df2 = df.set_index(['A', [0, 1, 2