# Table of contents
1. Introduction
2. Datasets and preprocessing
3. Visualisations
    - property 1 etc.
4. Reflection
5. Work distribution


# 1. Introduction
- Aims and goals
- Introduce a general audience to network science
- Explain the multiple perspectives

The project looks at the relationships between entities in a network.

The project is designed so that a person with little knowledge of networks can understand the main properties of social networks.

# 2. Datasets and preprocessing
This project is based on three different datasets of social networks that can be found on Kaggle (refer to the links). Each dataset represents a network as a list of edges. Some datasets also come with a file of node attributes, which is often a `.json` file. The last dataset differs slightly from the first two: it is a collection that contains multiple social network datasets.

Before the datasets can be read in, the datasets to include in the project have to be selected manually. This selection is based on the size of the datasets, to reduce the risk of very long processing times. The table below gives an overview of all the networks, retrieved from the datasets, that will be used in this project. For each network it lists what the nodes represent, what the links represent and, if applicable, what the node attributes are.

|Dataset|Node representation|Link representation|Node attributes if applicable|
|---|---|---|---|
|**NashvilleMeetupNetwork**|Member of a Meetup group|Shared membership of 'weight' groups|n/a|
|**DeezerHR**|Deezer users from Croatia|Friendship between users|Genre preferences of each user|
|**DeezerHU**|Deezer users from Hungary|Friendship between users|Genre preferences of each user|
|**DeezerRO**|Deezer users from Romania|Friendship between users|Genre preferences of each user|
|**FacebookLargePage**|Official Facebook pages|The number of mutual likes between pages|Descriptions of the purpose of the page|
|**FeatherDeezerSocial**|Deezer users from Europe|Friendship between users|Artists liked by the users|
|**FeatherLastfmSocial**|
|**GemsecFacebook**|
|**GithubSocial**|
|**TwitchGames**|
|**TwitchSocialNetworks**|

Other basic info that is excluded:
- Time span of the network
- Where the data is retrieved from

## Importing the relevant Python libraries

In [1]:
import networkx as nx
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Read in datasets with pandas
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

In [9]:
NashvilleMeetupNetwork_member_edges = pd.read_csv('datasets/NashvilleMeetupNetwork/member-edges.csv', index_col=0)
DeezerHR_edges = pd.read_csv('datasets/DeezerSocialNetworks/HR/HR_edges.csv')
DeezerHU_edges = pd.read_csv('datasets/DeezerSocialNetworks/HU/HU_edges.csv')
DeezerRO_edges = pd.read_csv('datasets/DeezerSocialNetworks/RO/RO_edges.csv')
FacebookLargePage_edges = pd.read_csv('datasets/facebook-large-page-page-network/musae_facebook_edges.csv')
FeatherDeezerSocial_edges = pd.read_csv('datasets/feather-deezer-social/deezer_europe_edges.csv')
FeatherLastfmSocial_edges = pd.read_csv('datasets/feather-lastfm-social/lastfm_asia_edges.csv')
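# The edge lists below have not been added yet (file paths still to be filled in);
# they are commented out so that the cell runs without a SyntaxError.
# GemsecFacebook_edges =
# GithubSocial_edges =
# TwitchGames_edges =
# TwitchSocialNetworks_edges =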
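
Some of the edge-list paths above are still missing. One possible way to keep the cell runnable in the meantime is to store the paths in a dictionary and only read the files that exist. This is a minimal sketch under assumptions, not the notebook's actual loading code: the helper `load_edge_lists` and the temporary demo file are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_edge_lists(paths):
    """Read every edge list whose file exists and collect the names that are missing."""
    edge_lists, missing = {}, []
    for name, path in paths.items():
        if path is not None and Path(path).is_file():
            edge_lists[name] = pd.read_csv(path)
        else:
            missing.append(name)
    return edge_lists, missing


# Demo with a temporary file standing in for a real edge list (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_edges.csv"
    demo_path.write_text("node_1,node_2\n0,1\n1,2\n")
    demo_paths = {"Demo": demo_path, "GithubSocial": None}  # None = path not known yet
    edge_lists, missing = load_edge_lists(demo_paths)

print(edge_lists["Demo"].shape, missing)
```

With this layout, a dataset can be added later by filling in its path, without touching the loading code.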

## Construct graphs with networkx
https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.from_pandas_edgelist.html

In [3]:
NashvilleMeetupNetwork = nx.from_pandas_edgelist(NashvilleMeetupNetwork_member_edges, 'member1', 'member2', edge_attr='weight', create_using=nx.Graph)
DeezerHR = nx.from_pandas_edgelist(DeezerHR_edges, 'node_1', 'node_2', create_using=nx.Graph)
DeezerHU = nx.from_pandas_edgelist(DeezerHU_edges, 'node_1', 'node_2', create_using=nx.Graph)
DeezerRO = nx.from_pandas_edgelist(DeezerRO_edges, 'node_1', 'node_2', create_using=nx.Graph)
FacebookLargePage = nx.from_pandas_edgelist(FacebookLargePage_edges, 'id_1', 'id_2', create_using=nx.Graph)
FeatherDeezerSocial = nx.from_pandas_edgelist(FeatherDeezerSocial_edges, 'node_1', 'node_2', create_using=nx.Graph)
FeatherLastfmSocial = nx.from_pandas_edgelist(FeatherLastfmSocial_edges, 'node_1', 'node_2', create_using=nx.Graph)
# Graphs for the remaining datasets can only be built once their edge lists are read in.
# GemsecFacebook =
# GithubSocial =
# TwitchGames =
# TwitchSocialNetworks =
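
To illustrate what `nx.from_pandas_edgelist` does with the `edge_attr` argument, here is a small sketch on a hand-made edge list. The data is hypothetical; it only mirrors the column names (`member1`, `member2`, `weight`) used in the call for `NashvilleMeetupNetwork` above.

```python
import networkx as nx
import pandas as pd

# Hypothetical edge list: one row per link, with an extra 'weight' column.
edges = pd.DataFrame(
    {"member1": [1, 1, 2], "member2": [2, 3, 3], "weight": [4, 1, 2]}
)

# Source and target columns become nodes; 'weight' is stored as an edge attribute.
G = nx.from_pandas_edgelist(
    edges, "member1", "member2", edge_attr="weight", create_using=nx.Graph
)

print(G.number_of_nodes(), G.number_of_edges())
print(G[1][2]["weight"])
```

Without `edge_attr`, the same call would build the same nodes and links but drop the weights, which is what happens for the Deezer and Facebook graphs.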

# Descriptive statistics of different networks
- number of nodes
- number of edges
- density
- connectedness
- average clustering coefficient
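
As a worked example of one of these statistics: for an undirected graph without self-loops, the density is the number of edges divided by the number of possible edges, $n(n-1)/2$. The sketch below recomputes it by hand from the node and edge counts this notebook reports for NashvilleMeetupNetwork; the function name `density` is only illustrative.

```python
def density(n_nodes, n_edges):
    """Density of an undirected simple graph: realised edges / possible edges."""
    possible_edges = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible_edges


# Node and edge counts of NashvilleMeetupNetwork as reported in this notebook.
nashville_density = density(11372, 1176368)
print(round(nashville_density, 6))
```

This agrees with the value that `nx.density` returns for the same network.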

In [4]:
networks = {
    'NashvilleMeetupNetwork' : NashvilleMeetupNetwork,
    'DeezerHR' : DeezerHR,
    'DeezerHU' : DeezerHU,
    'DeezerRO' : DeezerRO,
    'FacebookLargepage' : FacebookLargePage
}

In [5]:
descriptive_stats = pd.DataFrame(index=list(networks.keys()), columns=['number_of_nodes', 'number_of_edges', 'density', 'connected_network', 'avg_cc'])
for name, network in networks.items():
    # Assign with a single .loc[row, column] call; chained indexing may write to a copy
    descriptive_stats.loc[name, 'number_of_nodes'] = nx.number_of_nodes(network)
    descriptive_stats.loc[name, 'number_of_edges'] = nx.number_of_edges(network)
    descriptive_stats.loc[name, 'density'] = nx.density(network)
    descriptive_stats.loc[name, 'connected_network'] = nx.is_connected(network)
    descriptive_stats.loc[name, 'avg_cc'] = nx.average_clustering(network)  # also takes a long time to compute

descriptive_stats

| |number_of_nodes|number_of_edges|density|connected_network|avg_cc|
|---|---|---|---|---|---|
|NashvilleMeetupNetwork|11372|1176368|0.018194|True|0.884957|
|DeezerHR|54573|498202|0.000335|True|0.136477|
|DeezerHU|47538|222887|0.000197|True|0.116187|
|DeezerRO|41773|125826|0.000144|True|0.091212|
|FacebookLargepage|22470|171002|0.000677|True|0.359738|
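
Since the exact average clustering coefficient is slow to compute on the larger networks, a sampled estimate may be an option. The sketch below is a suggestion rather than part of the original analysis: it compares `nx.average_clustering` with the sampling-based estimate from `networkx.algorithms.approximation` on the small built-in karate club graph, which stands in for the real datasets. The values of `trials` and `seed` are arbitrary choices.

```python
import networkx as nx
from networkx.algorithms import approximation

# Small built-in example graph used as a stand-in for the real networks.
G = nx.karate_club_graph()

exact = nx.average_clustering(G)
# Estimate based on randomly sampled nodes and neighbour pairs; more trials give less noise.
estimate = approximation.average_clustering(G, trials=2000, seed=42)

print(f"exact={exact:.3f}, estimate={estimate:.3f}")
```

The estimate is random, so it only approximates the exact value; whether that accuracy is sufficient depends on how the statistic is used later on.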


In [None]:
# degree_sequence = sorted((d for n, d in DeezerHR.degree()), reverse=True)
# dmax = max(degree_sequence)

# fig = plt.figure("Degree of a random graph", figsize=(8, 8))
# # Create a gridspec for adding subplots of different sizes
# axgrid = fig.add_gridspec(5, 4)

# ax2 = fig.add_subplot(axgrid[3:, 2:])
# ax2.bar(*np.unique(degree_sequence, return_counts=True))
# ax2.set_title("Degree histogram")
# ax2.set_xlabel("Degree")
# ax2.set_ylabel("# of Nodes")

# fig.tight_layout()
# plt.show()

In [6]:
networks.keys()

dict_keys(['NashvilleMeetupNetwork', 'DeezerHR', 'DeezerHU', 'DeezerRO', 'FacebookLargepage'])

# Ideas about what else we can say about networks
- Nodes in the largest SCC
- Average clustering coefficient
- Number of triangles
- Fraction of closed triangles
- Diameter (longest shortest path), in order to say something about the small-world property
- Relationship between network density and diameter, average clustering coefficient, etc.
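
Several of these ideas can be sketched with standard networkx functions. The example below uses a small hypothetical graph rather than one of the project datasets, and it assumes undirected graphs, for which the largest connected component plays the role of the largest SCC.

```python
import networkx as nx

# Hypothetical graph: a triangle (1, 2, 3) with a tail (3-4-5) and a separate pair (6-7).
G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (6, 7)])

# Largest connected component (the undirected counterpart of the largest SCC).
largest_cc = max(nx.connected_components(G), key=len)
core = G.subgraph(largest_cc)

# nx.triangles counts triangles per node, so every triangle is counted three times.
n_triangles = sum(nx.triangles(G).values()) // 3
# Fraction of closed triangles: 3 * triangles / number of connected triples.
closed_fraction = nx.transitivity(G)
# The diameter is only defined for a connected graph, hence the restriction to 'core'.
diameter = nx.diameter(core)

print(len(largest_cc), n_triangles, closed_fraction, diameter)
```

On networks of the size used in this project, `nx.diameter` can be expensive because it is based on shortest paths from every node, so it may be worth trying it on the smallest network first.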

## Old text for datasets and preprocessing

### 1. Nashville Meetup Network
https://www.kaggle.com/datasets/stkbailey/nashville-meetup?select=rsvps.csv

Dataset about who goes to what meetups. From these relations a social network can be constructed.

- `member-to-group-edges.csv`: Edge list for constructing a member-to-group bipartite graph. Weights represent number of events attended in each group.
- `group-edges.csv`: Edge list for constructing a group-to-group graph. Weights represent shared members between groups.
- `member-edges.csv`: Edge list for constructing a member-to-member graph. Weights represent shared group membership.
- `rsvps.csv`: Raw member-to-event attendance data, which was aggregated to form member-to-group-edges.csv.

In short the relations mean:
- `member-to-group-edges`: member is part of group and has attended 'weight' events in this group
- `group-edges`: group A has 'weight' shared members with group B
- `member-edges`: member A shares membership of 'weight' groups with member B
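
To make the meaning of the member-to-member weights concrete, the sketch below derives them from group memberships. It assumes, as the file description states, that a weight counts the groups two members have in common; the member names and groups are hypothetical.

```python
from itertools import combinations

# Hypothetical memberships: member -> set of groups the member belongs to.
memberships = {
    "A": {"hiking", "python", "chess"},
    "B": {"python", "chess"},
    "C": {"hiking"},
}

member_edges = {}
for m1, m2 in combinations(sorted(memberships), 2):
    shared = memberships[m1] & memberships[m2]
    if shared:  # only members with at least one common group are linked
        member_edges[(m1, m2)] = len(shared)

print(member_edges)
```

Members B and C share no group, so no edge is created between them.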

### 2. Deezer Social Networks
https://www.kaggle.com/datasets/andreagarritano/deezer-social-networks

Dataset about friendship networks of users on Deezer in three European countries: Romania, Croatia and Hungary. The edges represent the relationship 'friendship'. Since the data comes from three countries, it basically consists of three different networks. (There is also a json file which contains properties of a node, namely its genre preferences.)

### 3. Facebook Large Page Page Network
https://www.kaggle.com/datasets/wolfram77/graphs-social?select=feather-lastfm-social

This is a webgraph of verified Facebook pages. The nodes represent official Facebook pages. The edges represent mutual likes between the pages. (There is also a json file which contains properties of a node, namely a description of the purpose of the page.) This graph was collected through the Facebook Graph API in November 2017 and restricted to pages from 4 categories which are defined by Facebook. These categories are: politicians, governmental organizations, television shows and companies.

# Datasets that we can still add:
- feather-deezer-social
- feather-lastfm-social
- gemsec-Facebook
- github-social
- twitch_games
- twitch-social-networks

# Datasets that we can add with some preprocessing
The edges of these datasets are stored as a txt file and not as a csv file. This means that pandas `read_csv` cannot be applied to them right away; the commas (comma-separated values) first have to be added to the file.
- soc-Epinions1.txt
- soc-sign-Slashdot081106.txt
- soc-sign-Slashdot090216.txt
- soc-sign-Slashdot090221.txt
- soc-Slashdot0811.txt
- soc-Slashdot0902.txt
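
As a possible alternative to rewriting the files with commas, `pd.read_csv` accepts a custom separator. The sketch below assumes that the txt files contain `#` comment lines followed by whitespace-separated node pairs; this layout has not been verified against the actual files, and the sample content and column names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical excerpt in the assumed layout: '#' comments, then tab-separated node pairs.
sample = "# Directed graph\n# FromNodeId\tToNodeId\n0\t4\n0\t5\n1\t4\n"

edges = pd.read_csv(
    io.StringIO(sample),         # replace with the path to the txt file
    sep=r"\s+",                  # split on tabs or spaces instead of commas
    comment="#",                 # skip the comment lines at the top
    header=None,
    names=["source", "target"],  # assumed column names
)

print(edges)
```

If the signed Slashdot files carry an extra column for the sign of each edge, `names` would need a third entry.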


