# Instructions
This first assignment requires applying the basic functions of the igraph package. The following dataset will be used:

* Actors dataset (undirected graph): A dataset from the IMDB movie database was provided for the 2005 Graph Drawing conference. We will use a reduced version of it, which contains an actor-actor collaboration edge for every pair of actors who co-starred in at least 2 movies together between 1995 and 2004.


You have to complete the code chunks in this document, but also analyze the results, extract insights, and answer the short questions. Fill in the attached CSV with your answers; sometimes a number is enough, other times a short sentence or paragraph is needed. Remember to change the header to your email.

In your submission, please upload both this document in HTML and the CSV with the solutions.


# Loading data

In this section, the goal is to load the given datasets, build the graph, and analyze basic metrics. Include the edge or node attributes you consider relevant.

Describe the values provided by the summary function on the graph object.

In [49]:
from igraph import *
import cairo
import pandas as pd
import plotly.express as px

# Actor (node) table: one tab-separated row per actor.
actorsdf = pd.read_csv('/Users/francoandresbenvenuto/Downloads/imdb_actors_key.tsv', sep='\t', encoding='unicode_escape')

actorsdf.head(10)

Unnamed: 0,id,name,movies_95_04,main_genre,genres
0,15629,"Rudder, Michael (I)",12,Thriller,"Action:1,Comedy:1,Drama:1,Fantasy:1,Horror:1,N..."
1,5026,"Morgan, Debbi",16,Drama,"Comedy:2,Documentary:1,Drama:6,Horror:2,NULL:3..."
2,11252,"Bellows, Gil",33,Drama,"Comedy:6,Documentary:1,Drama:7,Family:1,Fantas..."
3,5150,"Dray, Albert",20,Comedy,"Comedy:6,Crime:1,Documentary:1,Drama:4,NULL:5,..."
4,4057,"Daly, Shane (I)",18,Drama,"Comedy:2,Crime:1,Drama:7,Horror:1,Music:1,Musi..."
5,12373,"Macfadyen, Angus",24,Drama,"Action:1,Adventure:1,Documentary:1,Drama:7,Fam..."
6,3453,"Djola, Badja",11,Drama,"Adult:1,Drama:7,Thriller:3"
7,9878,Twiggy,12,Music,"Documentary:5,Drama:1,Music:4,Romance:2"
8,4988,"Winfrey, Oprah",21,Music,"Comedy:1,Documentary:8,Drama:1,Family:1,Music:..."
9,13032,Champagne (II),11,Adult,"Adult:8,NULL:3"


In [50]:
# Edge table: one tab-separated row per actor pair, with a weight column.
edges_actorsdf = pd.read_csv('/Users/francoandresbenvenuto/Downloads/imdb_actor_edges.tsv', sep='\t', encoding='unicode_escape')

edges_actorsdf.head(10)

Unnamed: 0,from,to,weight
0,17776,17778,6
1,5578,9770,3
2,5578,929,2
3,5578,9982,2
4,1835,6278,2
5,1835,1664,7
6,1835,1791,2
7,1835,6435,2
8,1835,10037,4
9,1835,10697,3


**1) How many nodes are there?**

In [51]:

print('Number of nodes:')

actorsdf['id'].nunique()


Number of nodes:


17577

**2) How many edges are there?**

In [52]:
print('Number of edges:') 

len(edges_actorsdf.index)


Number of edges:


287074
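
The node count above comes from the actor table, while the edge count comes from the edge table. As a cross-check, the number of distinct endpoints in the edge list can be compared with the number of actors: an actor without any edge would be counted in the first but not in the second. The following is a minimal pure-Python sketch that uses only the ten edge rows printed earlier; the variable names are illustrative and not part of the assignment.

```python
# The first ten (from, to, weight) rows printed by edges_actorsdf.head(10).
sample_edges = [
    (17776, 17778, 6), (5578, 9770, 3), (5578, 929, 2), (5578, 9982, 2),
    (1835, 6278, 2), (1835, 1664, 7), (1835, 1791, 2), (1835, 6435, 2),
    (1835, 10037, 4), (1835, 10697, 3),
]

# Every actor id that appears as an endpoint of at least one edge.
endpoints = {node for src, dst, _ in sample_edges for node in (src, dst)}

n_nodes = len(endpoints)
n_edges = len(sample_edges)
print(n_nodes, n_edges)
```

On the graph object itself, `vcount()` and `ecount()` report the vertex and edge counts of the graph that was actually built.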

# Degree distribution

Analyse the degree distribution. Compute the total degree distribution.
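
As a small illustration of what a total degree distribution is, the sketch below tallies degrees by hand for the ten edge rows printed in the loading section. It is pure Python and independent of igraph; the names `degree` and `distribution` are chosen here for illustration only.

```python
from collections import Counter

# Endpoints of the first ten rows of the edge table (weights dropped).
sample_edges = [
    (17776, 17778), (5578, 9770), (5578, 929), (5578, 9982),
    (1835, 6278), (1835, 1664), (1835, 1791), (1835, 6435),
    (1835, 10037), (1835, 10697),
]

# In an undirected graph each edge adds one to the degree of both endpoints.
degree = Counter()
for src, dst in sample_edges:
    degree[src] += 1
    degree[dst] += 1

# Degree distribution: how many nodes have each degree value.
distribution = Counter(degree.values())
print(sorted(distribution.items()))
print(max(degree.values()), min(degree.values()))
```

Even in this tiny sample most nodes have degree 1 and a couple have a much larger degree, which is the kind of skew a histogram of the full graph makes visible.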

In [53]:
df_graph = Graph.DataFrame(edges_actorsdf, directed=False)

actorsdata = df_graph.get_vertex_dataframe()

In [54]:
actorsdata['amount_of_degrees'] = df_graph.degree(mode='all')

**3) What does this distribution look like?**

In [55]:
import plotly.express as px

fig1 = px.histogram(actorsdata, x="amount_of_degrees")
fig1.show()

**4) What is the maximum degree?**

In [56]:
max(df_graph.degree())

784

**5) What is the minimum degree?**

In [57]:
min(df_graph.degree())

1

# Network Diameter and Average Path Length

igraph provides functions to calculate the diameter and the average path length. Think about whether you should consider the weights, the directions, etc.
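
To make the two quantities concrete, here is a minimal sketch that computes them by breadth-first search on a hypothetical four-node graph. It counts hops and ignores weights; this assumes that the `weight` column counts shared movies, so it would measure tie strength rather than distance. The helper `bfs_distances` and the toy graph are illustrative, not part of igraph.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Hop distance from `source` to every vertex it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Hypothetical toy graph: edge 0-1 attached to the triangle 1-2-3.
toy_edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
adj = {}
for a, b in toy_edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

all_dist = {v: bfs_distances(adj, v) for v in adj}

# One shortest-path length per unordered pair of connected vertices.
pair_lengths = [
    all_dist[a][b]
    for a, b in combinations(sorted(adj), 2)
    if b in all_dist[a]
]

diameter = max(pair_lengths)                                 # longest shortest path
average_path_length = sum(pair_lengths) / len(pair_lengths)  # mean over pairs
print(diameter, round(average_path_length, 4))
```

Only connected pairs enter the average, which matters for a graph that is split into several components.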

**6) What is the diameter of the graph?**

In [58]:
df_graph.diameter()

16

**7) What is the average path length of the graph?**

In [59]:
df_graph.average_path_length()

4.890545545798965

# Node importance: Centrality measures

(Optional but recommended): Obtain the distribution of the number of movies made by an actor and the number of genres in which an actor starred. It may be useful for analyzing and discussing the results obtained in the following exercises.

In [60]:
actorsdf['amount_of_genres'] = actorsdf.genres.apply(lambda x: len(x.split(',')) )
actorsdf

Unnamed: 0,id,name,movies_95_04,main_genre,genres,amount_of_genres
0,15629,"Rudder, Michael (I)",12,Thriller,"Action:1,Comedy:1,Drama:1,Fantasy:1,Horror:1,N...",10
1,5026,"Morgan, Debbi",16,Drama,"Comedy:2,Documentary:1,Drama:6,Horror:2,NULL:3...",6
2,11252,"Bellows, Gil",33,Drama,"Comedy:6,Documentary:1,Drama:7,Family:1,Fantas...",11
3,5150,"Dray, Albert",20,Comedy,"Comedy:6,Crime:1,Documentary:1,Drama:4,NULL:5,...",8
4,4057,"Daly, Shane (I)",18,Drama,"Comedy:2,Crime:1,Drama:7,Horror:1,Music:1,Musi...",8
...,...,...,...,...,...,...
17572,16211,"Urrutia, Paulina",10,Romance,"Comedy:1,Drama:2,NULL:4,Romance:2,Short:1",5
17573,4910,"Kay, Lisa (I)",10,Comedy,"Comedy:5,Drama:1,Fantasy:1,NULL:2,Romance:1",5
17574,5746,"Sutherland, Kiefer",43,Drama,"Action:2,Comedy:3,Documentary:10,Drama:7,Famil...",13
17575,1645,"Glyde, Billy",182,Adult,"Adult:139,Drama:1,NULL:38,Sci-Fi:3,Short:1",5


Obtain three vectors with the degree, betweenness and closeness for each vertex of the actors' graph.

In [61]:
actorsdata['betweeness'] = df_graph.betweenness()
actorsdata['closeness'] = df_graph.closeness()

actorsdata.head(10)

vertex ID,name,amount_of_degrees,betweeness,closeness
0,0,8,94.526366,0.168868
1,1,40,9914.32713,0.16893
2,2,29,134403.55082,0.196873
3,3,6,154.697492,0.167262
4,4,42,31566.54805,0.185074
5,5,40,25408.367555,0.181312
6,6,9,1181.540684,0.171957
7,7,7,13.781625,0.157331
8,8,25,1153.017469,0.168902
9,9,25,8810.661314,0.172383


---
Obtain the list of the 20 actors with the largest degree centrality. It can be useful to show a list with the degree, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.

In [62]:
centrality_actors = pd.merge(actorsdf,actorsdata, left_on='id', right_on='name')
centrality_actors.sort_values('amount_of_degrees',ascending=False).head(20)

Unnamed: 0,id,name_x,movies_95_04,main_genre,genres,amount_of_genres,name_y,amount_of_degrees,betweeness,closeness
12147,162,"Davis, Mark (V)",540,Adult,"Action:1,Adult:429,Comedy:3,Crime:1,Documentar...",10,162,784,931853.1,0.2493
1761,1743,"Sanders, Alex (I)",467,Adult,"Action:1,Adult:380,Adventure:1,Comedy:2,Docume...",10,1743,610,557236.5,0.245821
13442,1754,"North, Peter (I)",460,Adult,"Action:1,Adult:389,Documentary:5,Drama:5,NULL:...",8,1754,599,417338.5,0.241765
11272,1802,"Marcus, Mr.",435,Adult,"Adult:343,Crime:1,Documentary:2,NULL:86,Short:...",6,1802,584,1463808.0,0.249964
4092,407,"Tedeschi, Tony",364,Adult,"Adult:286,Adventure:1,Comedy:1,Documentary:2,D...",11,407,561,672163.5,0.245693
8354,164,"Dough, Jon",300,Adult,"Adult:248,Adventure:1,Comedy:1,Documentary:1,D...",8,164,555,863647.9,0.248562
5968,179,"Stone, Lee (II)",403,Adult,"Adult:310,Comedy:1,Documentary:1,Fantasy:2,NUL...",7,179,545,339310.9,0.238488
2236,176,"Voyeur, Vince",370,Adult,"Action:1,Adult:303,Comedy:3,Crime:1,Documentar...",10,176,533,381060.6,0.245783
5752,175,"Lawrence, Joel (II)",315,Adult,"Adult:257,Comedy:1,Documentary:1,Musical:1,NUL...",7,175,500,285123.6,0.241337
15511,160,"Steele, Lexington",429,Adult,"Adult:340,Comedy:1,Documentary:4,Drama:1,Fanta...",8,160,493,297173.5,0.240841


**8) Who is the actor with highest degree centrality?**

The actor with the highest degree centrality is **Mark Davis** (degree 784).

**9) How do you explain the high degree of the top-20 list?**

Degree grows with the number of movies an actor appears in, since each movie adds potential co-stars. The top of the list is dominated by adult-film actors, who appear in far more movies than Hollywood actors (for example, 540 movies for Mark Davis between 1995 and 2004) because those productions are shorter and cheaper to make. Tom Hanks is an outlier: he is not from the adult film industry, yet he has a high degree owing to his popularity.

Obtain the list of the 20 actors with the largest betweenness centrality. Show a list with the betweenness, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.

In [63]:
centrality_actors.sort_values('betweeness',ascending=False).head(20)

Unnamed: 0,id,name_x,movies_95_04,main_genre,genres,amount_of_genres,name_y,amount_of_degrees,betweeness,closeness
10548,2108,"Jeremy, Ron",280,Adult,"Adult:149,Adventure:1,Animation:1,Comedy:15,Do...",14,2108,471,9748544.0,0.28272
4693,3284,"Chan, Jackie (I)",59,Comedy,"Action:2,Comedy:13,Crime:4,Documentary:18,Fami...",12,3284,135,4716909.0,0.287238
2563,564,"Cruz, Penélope",46,Drama,"Adventure:1,Comedy:2,Documentary:5,Drama:6,Fam...",13,564,182,4330663.0,0.295555
14433,14458,"Shahlavi, Darren",16,Action,"Action:4,Comedy:3,Documentary:1,Drama:1,Fantas...",9,14458,8,4295503.0,0.193886
15720,17308,"Del Rosario, Monsour",20,Action,"Action:8,Drama:3,Fantasy:1,Horror:2,NULL:1,Rom...",9,17308,6,4267099.0,0.163154
17458,285,"Depardieu, Gérard",56,Comedy,"Adventure:1,Comedy:15,Crime:2,Documentary:11,D...",11,285,159,4037356.0,0.278351
8799,13723,"Bachchan, Amitabh",35,Romance,"Action:1,Comedy:1,Crime:1,Documentary:1,Drama:...",13,13723,66,2570247.0,0.226349
10412,1529,"Jackson, Samuel L.",97,Drama,"Action:3,Adventure:1,Comedy:3,Crime:3,Document...",14,1529,427,2539614.0,0.309265
5517,5083,"Soualem, Zinedine",65,Comedy,"Animation:1,Comedy:17,Crime:3,Documentary:1,Dr...",12,5083,121,2368164.0,0.249825
15894,1923,"Del Rio, Olivia",84,Adult,"Adult:64,Drama:1,Fantasy:2,NULL:14,Sci-Fi:1,Sh...",6,1923,168,2316388.0,0.240033


**10) Who is the actor with the highest betweenness?**

The actor with the highest betweenness is **Ron Jeremy**.

**11) How do you explain the high betweenness of the top-20 list?**

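High betweenness tends to go with fame: well-known actors such as Samuel L. Jackson and Salma Hayek have high betweenness because, as high-profile Hollywood stars, they work with many different casts and so lie on many shortest paths between otherwise distant groups of actors.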
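
The table above also contains actors with a very low degree and a very high betweenness (for example, degree 8 or 6 in the fourth and fifth rows). A possible explanation is that such nodes act as bridges between groups. The sketch below illustrates this on a hypothetical toy graph, using a pure-Python version of Brandes' algorithm for unweighted graphs; it is an illustration of the concept, not igraph's implementation.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest vertices first.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was visited from both ends, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy graph: triangles {0, 1, 2} and {4, 5, 6} joined through node 3.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {}
for a, b in toy_edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

bc = betweenness(adj)
print(len(adj[3]), bc[3])  # degree and betweenness of the bridge node
print(len(adj[2]), bc[2])  # degree and betweenness of a triangle node
```

Node 3 has the lowest degree in the toy graph but the highest betweenness, because every path between the two triangles passes through it.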

Obtain the list of the 20 actors with the largest closeness centrality. Show a list with the closeness, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.

In [64]:
centrality_actors[centrality_actors['closeness'] < 1].sort_values('closeness',ascending=False).head(20)

Unnamed: 0,id,name_x,movies_95_04,main_genre,genres,amount_of_genres,name_y,amount_of_degrees,betweeness,closeness
2109,16747,"Armanis, Julian",12,Adult,"Adult:11,Documentary:1",2,16747,6,24.0,0.714286
14828,13582,"Fazira, Erra",13,Romance,"Animation:1,Crime:1,NULL:2,Romance:9",4,13582,1,0.0,0.666667
13001,16913,"Lim, Kay Tong",11,Drama,"Comedy:3,Drama:3,Romance:2,Short:1,Thriller:1,...",6,16913,1,0.0,0.666667
6367,17822,"Lee, Mark (X)",10,Comedy,"Comedy:4,Crime:2,Drama:1,Family:1,NULL:1,Roman...",6,17822,1,0.0,0.666667
17467,13581,"Hassan, Jalaluddin",14,Romance,"Comedy:1,Drama:5,NULL:1,Romance:6,Sci-Fi:1",5,13581,1,0.0,0.666667
2567,17804,"Kovac, Erik",11,Adult,"Adult:8,Documentary:1,NULL:2",3,17804,6,1.0,0.588235
9514,17803,"Sulik, Dano",21,Adult,"Adult:17,Documentary:1,NULL:2,Romance:1",4,17803,6,1.0,0.588235
7288,16740,"Novotny, Pavel",15,Adult,Adult:15,1,16740,4,21.0,0.588235
6659,16745,"Bonnet, Sebastian",17,Adult,"Adult:14,Documentary:1,NULL:1,Romance:1",4,16745,6,1.0,0.588235
5377,16748,"Davidov, Ion",10,Adult,"Adult:7,Documentary:1,NULL:2",3,16748,6,1.0,0.588235


**12) Who is the actor with highest closeness centrality?**

The actor with the highest closeness centrality is **Julian Armanis**, with 0.714286.

**13) How do you explain the high closeness of the top-20 list?**

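The top 20 actors have a short average distance to the other actors they can reach, so those actors can reach them in few steps. Note, however, that the rows shown have very low degrees (between 1 and 6), which suggests that these actors belong to small components isolated from the main network, where every reachable actor is only one or two steps away. Their high closeness therefore does not indicate a central position in the network as a whole.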
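
The closeness values in the table above (for example, 0.666667 for actors with degree 1) are consistent with closeness being computed only over the vertices each actor can reach. That is an assumption about the igraph version used here, not something the notebook states. The following pure-Python sketch, on a hypothetical graph with one larger and one smaller component, shows how this favors nodes in small components.

```python
from collections import deque

def closeness(adj, source):
    """Closeness of `source`, computed over the vertices it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    reachable = len(dist) - 1  # exclude the source itself
    return reachable / sum(dist.values()) if reachable else float("nan")

# Hypothetical toy graph: a 7-node path 0..6 plus a separate 3-node path 10-11-12.
toy_edges = [(i, i + 1) for i in range(6)] + [(10, 11), (11, 12)]
adj = {}
for a, b in toy_edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

scores = {v: closeness(adj, v) for v in adj}
print(round(scores[3], 3))   # centre of the larger component
print(round(scores[10], 3))  # end of the small component
print(scores[11])            # middle of the small component
```

The end of the three-node path scores 2/3, higher than the centre of the larger component, and the middle of the three-node path scores exactly 1, the kind of value that the `closeness < 1` filter in the cell above excludes.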