# Visualization 1 - Degree

## Libraries

```python
import pandas as pd
import plotly.express as px
```

## Load data

```python
# Load the node table (one row per node, with its in-, out- and total degree)
df = pd.read_csv('gephi-inicial.csv')
df.head()
```

| Unnamed: 0 | Id | Label | timeset | d0 | indegree | outdegree | degree |
|---|---|---|---|---|---|---|---|
| 0 | 2010 Fifa World Cup Qualification (Inter-Confe... | 2010 Fifa World Cup Qualification (Inter-Confe... | | 0 | 6 | 0 | 6 |
| 1 | Julio Alberto | Julio Alberto | | 0 | 6 | 0 | 6 |
| 2 | Miklos Molnar | Miklos Molnar | | 0 | 4 | 0 | 4 |
| 3 | Odawara | Odawara | | 0 | 8 | 0 | 8 |
| 4 | Chinese Football Association Footballer Of The... | Chinese Football Association Footballer Of The... | | 0 | 5 | 0 | 5 |


## Boxplot and histogram

The box plot and histogram below show how node degrees are distributed. The box-plot statistics are then used to choose the degree ranges for a new group column:

*   Lower fence: 4
*   1st quartile: 5
*   Median: 7
*   3rd quartile: 11
*   Upper fence: 20

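```python
# Box plot of node degree: quartiles, median and fences
fig = px.box(df, y='degree')
fig.show()
```

```python
# Histogram of node degree; many bins because the distribution is heavily right-skewed
fig = px.histogram(df, x='degree', nbins=400)
fig.show()
```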
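The statistics listed above can also be computed directly instead of being read off the plot. The sketch below is minimal and makes three assumptions. Quartiles use linear interpolation (pandas' default), and Plotly's own quartile method may differ slightly on some data. Whiskers follow Tukey's 1.5 × IQR rule, clipped to the most extreme data point inside that range. `box_stats` is a hypothetical helper name, and the sample values are made up for illustration:

```python
import pandas as pd


def box_stats(values: pd.Series) -> dict:
    """Quartiles and Tukey fences (1.5 * IQR), clipped to the observed data."""
    q1 = values.quantile(0.25)  # linear interpolation (pandas default)
    median = values.quantile(0.5)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    # Fences: the most extreme data points still within 1.5 * IQR of the box
    lower = values[values >= q1 - 1.5 * iqr].min()
    upper = values[values <= q3 + 1.5 * iqr].max()
    return {"lower": lower, "q1": q1, "median": median, "q3": q3, "upper": upper}


# Hypothetical sample (not the real network data); 60 lies above the upper fence
sample = pd.Series([4, 5, 5, 6, 7, 9, 11, 20, 60])
print(box_stats(sample))
```

On the real data, `box_stats(df['degree'])` should give values close to the list above, subject to the quartile-method caveat.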

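## Grouping by degree

The cut points follow the box-plot statistics above. Group 1 ends at the 1st quartile, Group 2 at the median, Group 3 at the 3rd quartile and Group 4 at the upper fence. Group 5 holds the outliers above the upper fence, up to the maximum degree of 1369.

*   Group 1: $4 \leq \text{degree} \leq 5$
*   Group 2: $6 \leq \text{degree} \leq 7$
*   Group 3: $8 \leq \text{degree} \leq 11$
*   Group 4: $12 \leq \text{degree} \leq 20$
*   Group 5: $21 \leq \text{degree} \leq 1369$
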
```python
# Assign each node to a degree group; each test's upper bound is inclusive
list_group = []
for row in df.itertuples():
    if row.degree <= 5:
        list_group.append('Group 1')
    elif row.degree <= 7:
        list_group.append('Group 2')
    elif row.degree <= 11:
        list_group.append('Group 3')
    elif row.degree <= 20:
        list_group.append('Group 4')
    else:
        list_group.append('Group 5')

list_group[0:10]
```

```text
['Group 2',
 'Group 2',
 'Group 1',
 'Group 3',
 'Group 1',
 'Group 1',
 'Group 2',
 'Group 1',
 'Group 4',
 'Group 2']
```
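The if/elif loop above can also be written as one vectorized call. The sketch below is minimal and uses `pandas.cut`. It assumes right-closed bins, so each upper bound is inclusive like the `<=` tests. `degree_groups` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

# Right-closed bins reproduce the loop: (-inf, 5], (5, 7], (7, 11], (11, 20], (20, inf)
BINS = [-np.inf, 5, 7, 11, 20, np.inf]
LABELS = ["Group 1", "Group 2", "Group 3", "Group 4", "Group 5"]


def degree_groups(degree: pd.Series) -> pd.Series:
    """Vectorized equivalent of the loop: one group label per node."""
    # astype(str) turns the categorical result into plain strings like the loop's output
    return pd.cut(degree, bins=BINS, labels=LABELS, right=True).astype(str)


# Degrees of the first five nodes above, plus a few boundary values
print(degree_groups(pd.Series([6, 6, 4, 8, 5, 20, 21, 1369])).tolist())
```

On the real data, this would be `df['degree_group'] = degree_groups(df['degree'])`.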

```python
df['degree_group'] = list_group
df.head()
```

| Unnamed: 0 | Id | Label | timeset | d0 | indegree | outdegree | degree | degree_group |
|---|---|---|---|---|---|---|---|---|
| 0 | 2010 Fifa World Cup Qualification (Inter-Confe... | 2010 Fifa World Cup Qualification (Inter-Confe... | | 0 | 6 | 0 | 6 | Group 2 |
| 1 | Julio Alberto | Julio Alberto | | 0 | 6 | 0 | 6 | Group 2 |
| 2 | Miklos Molnar | Miklos Molnar | | 0 | 4 | 0 | 4 | Group 1 |
| 3 | Odawara | Odawara | | 0 | 8 | 0 | 8 | Group 3 |
| 4 | Chinese Football Association Footballer Of The... | Chinese Football Association Footballer Of The... | | 0 | 5 | 0 | 5 | Group 1 |
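Degrees are integers with many ties, so quartile-based cut points do not guarantee equal-sized groups. Counting the nodes in each group is a quick check on the split. The sketch below is minimal: `group_summary` is a hypothetical helper, and the sample labels are the five values in the table above:

```python
import pandas as pd


def group_summary(groups: pd.Series) -> pd.DataFrame:
    """Node count and share per group, listed in group order rather than by frequency."""
    counts = groups.value_counts().sort_index()  # lexicographic order works for Group 1..5
    return pd.DataFrame({"nodes": counts, "share": counts / counts.sum()})


sample = pd.Series(["Group 2", "Group 2", "Group 1", "Group 3", "Group 1"])
print(group_summary(sample))
```

On the real data, this would be `group_summary(df['degree_group'])`.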


## Saving data

```python
df.to_csv('gephi-groups.csv', index=False)
```

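# Visualization 2 - Degree (improved)

This version keeps only nodes with a degree of at least 100 and splits them into five wider degree ranges.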

## Load data

```python
df = pd.read_csv('gephi-inicial.csv')
df.head()
```

| Unnamed: 0 | Id | Label | timeset | d0 | indegree | outdegree | degree |
|---|---|---|---|---|---|---|---|
| 0 | 2010 Fifa World Cup Qualification (Inter-Confe... | 2010 Fifa World Cup Qualification (Inter-Confe... | | 0 | 6 | 0 | 6 |
| 1 | Julio Alberto | Julio Alberto | | 0 | 6 | 0 | 6 |
| 2 | Miklos Molnar | Miklos Molnar | | 0 | 4 | 0 | 4 |
| 3 | Odawara | Odawara | | 0 | 8 | 0 | 8 |
| 4 | Chinese Football Association Footballer Of The... | Chinese Football Association Footballer Of The... | | 0 | 5 | 0 | 5 |


## Node filtering and counting

```python
# Keep only high-degree nodes (degree >= 100) and report how many remain
df_100 = df[df.degree >= 100]
print(f"{df_100.shape[0]} nodes")
df_100
```

```text
460 nodes
```

| Unnamed: 0 | Id | Label | timeset | d0 | indegree | outdegree | degree |
|---|---|---|---|---|---|---|---|
| 90 | El-Hadji Diouf | El-Hadji Diouf | | 0 | 39 | 329 | 368 |
| 94 | Fifa World Cup Mascot | Fifa World Cup Mascot | | 0 | 12 | 170 | 182 |
| 128 | 2010 Fifa World Cup | 2010 Fifa World Cup | | 0 | 197 | 576 | 773 |
| 135 | Russia National Football Team | Russia National Football Team | | 0 | 129 | 460 | 589 |
| 137 | Julius Aghahowa | Julius Aghahowa | | 0 | 7 | 126 | 133 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 9883 | 2002 Fifa World Cup Group E | 2002 Fifa World Cup Group E | | 0 | 27 | 143 | 170 |
| 9900 | Ecuadorian Football Federation | Ecuadorian Football Federation | | 0 | 21 | 118 | 139 |
| 9903 | Takayuki Suzuki | Takayuki Suzuki | | 0 | 14 | 136 | 150 |
| 9905 | Ahn Jung-Hwan | Ahn Jung-Hwan | | 0 | 18 | 218 | 236 |


## Grouping by degree

*   Group 1: $100 \leq \text{degree} \leq 199$
*   Group 2: $200 \leq \text{degree} \leq 299$
*   Group 3: $300 \leq \text{degree} \leq 499$
*   Group 4: $500 \leq \text{degree} \leq 699$
*   Group 5: $700 \leq \text{degree} \leq 1369$

```python
# Assign each high-degree node to a group; each test's upper bound is inclusive
list_group = []
for row in df_100.itertuples():
    if row.degree <= 199:
        list_group.append('Group 1')
    elif row.degree <= 299:
        list_group.append('Group 2')
    elif row.degree <= 499:
        list_group.append('Group 3')
    elif row.degree <= 699:
        list_group.append('Group 4')
    else:
        list_group.append('Group 5')

list_group[0:10]
```

```text
['Group 3',
 'Group 1',
 'Group 5',
 'Group 4',
 'Group 1',
 'Group 1',
 'Group 1',
 'Group 1',
 'Group 1',
 'Group 2']
```
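The same if/elif chain can be driven by a list of inclusive upper bounds, so changing the ranges only means editing one list. The sketch below is minimal and uses `numpy.searchsorted`. `groups_from_bounds` and `UPPER_BOUNDS` are hypothetical names. The sample degrees come from the table above, plus a few boundary values:

```python
import numpy as np
import pandas as pd

# Inclusive upper bounds of Groups 1-4 for nodes with degree >= 100; Group 5 takes the rest
UPPER_BOUNDS = [199, 299, 499, 699]


def groups_from_bounds(degree: pd.Series, upper_bounds=UPPER_BOUNDS) -> pd.Series:
    """Label each value 'Group k', where k - 1 is the number of bounds strictly below it."""
    # side="left" keeps a value equal to a bound in that bound's group (degree <= bound)
    idx = np.searchsorted(upper_bounds, degree.to_numpy(), side="left")
    return pd.Series([f"Group {i + 1}" for i in idx], index=degree.index)


sample = pd.Series([368, 182, 773, 589, 133, 199, 200, 700])
print(groups_from_bounds(sample).tolist())
```

Passing `upper_bounds=[5, 7, 11, 20]` would reproduce the grouping from Visualization 1.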

```python
df_100 = df_100.copy()  # explicit copy of the filtered slice avoids SettingWithCopyWarning
df_100['degree_group'] = list_group
df_100.head()
```


| Unnamed: 0 | Id | Label | timeset | d0 | indegree | outdegree | degree | degree_group |
|---|---|---|---|---|---|---|---|---|
| 90 | El-Hadji Diouf | El-Hadji Diouf | | 0 | 39 | 329 | 368 | Group 3 |
| 94 | Fifa World Cup Mascot | Fifa World Cup Mascot | | 0 | 12 | 170 | 182 | Group 1 |
| 128 | 2010 Fifa World Cup | 2010 Fifa World Cup | | 0 | 197 | 576 | 773 | Group 5 |
| 135 | Russia National Football Team | Russia National Football Team | | 0 | 129 | 460 | 589 | Group 4 |
| 137 | Julius Aghahowa | Julius Aghahowa | | 0 | 7 | 126 | 133 | Group 1 |


## Saving data

```python
df_100.to_csv('gephi-groups-100.csv', index=False)
```