### Dataset by:
```
@misc{rozemberczki2021twitch,
      title={Twitch Gamers: a Dataset for Evaluating Proximity Preserving and Structural Role-based Node Embeddings}, 
      author={Benedek Rozemberczki and Rik Sarkar},
      year={2021},
      eprint={2101.03091},
      archivePrefix={arXiv},
      primaryClass={cs.SI}
}
```

### Social Network Analysis of Twitch Partners
A social network of Twitch users collected from the public API in spring 2018. Nodes are Twitch users, and edges are mutual follower relationships between them. The graph forms a single connected component and has no missing node attributes. The machine learning tasks associated with the graph are count data regression and node classification.
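The description says that edges are mutual follower relationships and that the graph is a single connected component. The raw directed follow data is not part of the dataset, so the following is only a minimal sketch with made-up user IDs. `DiGraph.to_undirected(reciprocal=True)` keeps an undirected edge only where a follow exists in both directions, and `nx.is_connected` tests the single-component property once users without any mutual follow are dropped.

```python
import networkx as nx

# Hypothetical directed follow relations: (follower, followed)
follows = [
    ("a", "b"), ("b", "a"),  # mutual follow, kept
    ("b", "c"), ("c", "b"),  # mutual follow, kept
    ("c", "d"),              # one-way follow, dropped
]
D = nx.DiGraph(follows)

# reciprocal=True keeps an undirected edge only if both directions exist
mutual = D.to_undirected(reciprocal=True)

# Users without any mutual follow end up isolated; drop them
mutual.remove_nodes_from(list(nx.isolates(mutual)))

print(sorted(tuple(sorted(e)) for e in mutual.edges()))
print(nx.is_connected(mutual))
```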

In [61]:
import networkx as nx   
import pandas as pd
import seaborn as sns

## Load the network into NetworkX

In [20]:
# Load the network (read_edgelist treats the CSV header row as an ordinary edge)
with open("data/large_twitch_edges.csv", "rb") as fh:
    G = nx.read_edgelist(fh, delimiter=',')

In [23]:
# Remove the two spurious nodes created from the CSV header row
G.remove_node('numeric_id_1')
G.remove_node('numeric_id_2')
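Instead of removing the header nodes after the fact, the header line can be skipped before parsing. A minimal sketch, using a few toy lines in the format of `large_twitch_edges.csv` and `nx.parse_edgelist`, which parses the same kind of comma-separated lines that `nx.read_edgelist` reads from a file:

```python
import networkx as nx

# Toy excerpt in the same format as large_twitch_edges.csv (header + ID pairs)
lines = [
    "numeric_id_1,numeric_id_2",
    "98343,141493",
    "98343,58736",
    "141493,58736",
]

# Parsing every line turns the header into a spurious edge
with_header = nx.parse_edgelist(lines, delimiter=",")
print(sorted(with_header.nodes()))

# Skipping the first line avoids the header nodes altogether
G_small = nx.parse_edgelist(lines[1:], delimiter=",")
print(sorted(G_small.nodes()))
```

For the real file, the same idea should amount to calling `next(fh)` on the open file handle before passing it to `nx.read_edgelist`.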

### Manual check that the network was loaded correctly

In [44]:
edge_list = pd.read_csv('data/large_twitch_edges.csv').astype(str)
edge_list.head()

|   | numeric_id_1 | numeric_id_2 |
|---|---|---|
| 0 | 98343 | 141493 |
| 1 | 98343 | 58736 |
| 2 | 98343 | 140703 |
| 3 | 98343 | 151401 |
| 4 | 98343 | 157118 |


In [49]:
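# Check that the neighbors of 98343 were loaded correctly.
# The graph is undirected, so 98343 can appear in either column of the CSV.
node = '98343'
expected = (set(edge_list.loc[edge_list.numeric_id_1 == node, 'numeric_id_2'])
            | set(edge_list.loc[edge_list.numeric_id_2 == node, 'numeric_id_1']))
assert set(G.neighbors(node)) == expected, f"Neighbors of {node} are not correct."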
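The assertion above checks a single node. A sketch of a check over the whole edge list compares unordered node pairs, so that an edge stored as (u, v) in the CSV matches (v, u) in the undirected graph. The function name `edges_match` is hypothetical, and the toy data stands in for `G` and `edge_list`:

```python
import networkx as nx
import pandas as pd


def edges_match(G, edge_list, col_u="numeric_id_1", col_v="numeric_id_2"):
    """Return True if the undirected edge set of G equals the edge set in edge_list."""
    # frozenset makes (u, v) and (v, u) compare equal, as in an undirected graph
    csv_edges = {frozenset(pair) for pair in zip(edge_list[col_u], edge_list[col_v])}
    graph_edges = {frozenset(edge) for edge in G.edges()}
    return csv_edges == graph_edges


# Toy stand-ins for edge_list and G (IDs as strings, as in the notebook)
toy_edges = pd.DataFrame({"numeric_id_1": ["1", "1", "2"],
                          "numeric_id_2": ["2", "3", "3"]})
toy_G = nx.Graph([("2", "1"), ("1", "3"), ("3", "2")])

print(edges_match(toy_G, toy_edges))
```

Duplicate rows in the CSV collapse into a single pair, just as repeated edges collapse into one edge in a `Graph`.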

## EDA on node attributes and network size
### Network size

In [80]:
print(f'Number of nodes: {G.number_of_nodes()}')
print(f'Number of edges: {G.number_of_edges()}')
print(f'Density: {nx.density(G)}')

Number of nodes: 168114
Number of edges: 6797557
Density: 0.00048103610439398153


The network is therefore large but sparse: with a density of about 0.00048, fewer than 0.05% of all possible pairs of users are connected. This is plausible, because an edge only exists when two users follow each other. Users who mostly watch other channels are presumably rarely followed back and therefore have few edges. To keep processing feasible, we will filter the network by selected node attributes.
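For an undirected simple graph with n nodes and m edges, density is m divided by the n(n-1)/2 possible pairs, and the average degree is 2m/n. Recomputing both from the counts printed by the network-size cell reproduces the NetworkX density and shows that each user has roughly 81 mutual followers on average:

```python
# Counts printed by the network-size cell
n = 168_114      # nodes
m = 6_797_557    # undirected edges

# Density of an undirected simple graph: m / (n (n - 1) / 2)
density = 2 * m / (n * (n - 1))

# Average degree: every edge adds 1 to the degree of each of its two endpoints
avg_degree = 2 * m / n

print(f"Density: {density}")
print(f"Average degree: {avg_degree:.1f}")
```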

### Knotenattribute

In [4]:
features = pd.read_csv('data/large_twitch_features.csv')

In [56]:
features.columns

Index(['views', 'mature', 'life_time', 'created_at', 'updated_at',
       'numeric_id', 'dead_account', 'language', 'affiliate'],
      dtype='object')

The dataset has the following attributes:
- `views`: number of views of the user / channel
- `mature`: whether the channel is aimed at adults
- `life_time`: number of days between the first and the last stream
- `created_at`: when the user account was created
- `updated_at`: when the user last streamed
- `numeric_id`: unique ID of the user
- `dead_account`: whether the account was deactivated or the user has been inactive for a long time (~3 months)
- `language`: language of the user / channel
- `affiliate`: whether the user has Twitch Affiliate status
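A first look at these attributes could summarize the binary flags and the language distribution. The following is a sketch on made-up rows that share some of the column names above; it assumes that `mature`, `dead_account` and `affiliate` are stored as 0/1 integers, so that a column mean equals the share of users with the flag set:

```python
import pandas as pd

# Made-up rows using some of the columns of large_twitch_features.csv
toy = pd.DataFrame({
    "views": [500, 12000, 80, 3400],
    "mature": [0, 1, 0, 0],
    "dead_account": [0, 0, 1, 0],
    "affiliate": [1, 1, 0, 0],
    "language": ["EN", "DE", "EN", "EN"],
})

# Share of users per binary flag (mean of a 0/1 column = fraction of 1s)
flag_shares = toy[["mature", "dead_account", "affiliate"]].mean()

# Relative frequency of each language
language_shares = toy["language"].value_counts(normalize=True)

print(flag_shares)
print(language_shares)
```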


In [79]:
features.isna().sum()

views           0
mature          0
life_time       0
created_at      0
updated_at      0
numeric_id      0
dead_account    0
language        0
affiliate       0
dtype: int64

In [6]:
# Collect (node_id, attributes) tuples; IDs are strings to match the edge list
node_list = []
for _, row in features.iterrows():
    node_list.append((str(row['numeric_id']), {'views': row['views']}))

In [11]:
G.add_nodes_from(node_list)
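Building `node_list` with `iterrows` creates one Series per row, which is comparatively slow for 168,114 users. A vectorized sketch builds a dict from node ID to views and attaches it with `nx.set_node_attributes`; toy data replaces the real `G` and `features`:

```python
import networkx as nx
import pandas as pd

# Toy stand-ins for the notebook's graph and feature table
toy_G = nx.Graph([("1", "2"), ("2", "3")])
toy_features = pd.DataFrame({"numeric_id": [1, 2, 3], "views": [500, 80, 12000]})

# Map string node IDs to views without iterating row by row;
# .tolist() yields plain Python ints rather than NumPy scalars
views_by_node = dict(zip(toy_features["numeric_id"].astype(str),
                         toy_features["views"].tolist()))

# Unlike add_nodes_from, this only updates nodes that already exist in the graph
nx.set_node_attributes(toy_G, views_by_node, name="views")

print(toy_G.nodes["1"])
```

One behavioral difference: `add_nodes_from` also adds users that are missing from the edge list as isolated nodes, whereas `set_node_attributes` silently skips IDs that are not already in the graph.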

In [12]:
list(G.adj['1'])

['113417',
 '95914',
 '155127',
 '71050',
 '112881',
 '57532',
 '64605',
 '8079',
 '6250',
 '125642',
 '30268',
 '53724',
 '123076',
 '165361',
 '149069',
 '42688',
 '90650',
 '35358',
 '8176',
 '54063',
 '32920',
 '31140',
 '56352',
 '83691',
 '87271',
 '32338',
 '61862',
 '37818',
 '55192',
 '85756',
 '110345',
 '85701',
 '52703',
 '119034',
 '152296',
 '144643',
 '161362',
 '108809',
 '128864',
 '35408',
 '94483',
 '83095',
 '61780',
 '78367',
 '26309',
 '86321',
 '119017',
 '79844',
 '89069',
 '158403',
 '78354',
 '39372',
 '58493',
 '126773',
 '128328',
 '130232',
 '105992',
 '5507',
 '64419',
 '55261',
 '45001',
 '120819',
 '98175',
 '53444',
 '38031',
 '34625',
 '41855',
 '86266',
 '137899',
 '21357',
 '40868',
 '148905',
 '117105',
 '67333',
 '8121',
 '31559',
 '85877',
 '111372',
 '107854',
 '161658',
 '14649',
 '77138',
 '11288',
 '11604',
 '84226',
 '146335',
 '37297',
 '139448',
 '113692',
 '165834',
 '149295',
 '96173',
 '32147',
 '1852',
 '146235',
 '88518',
 '145545',
 …

In [13]:
G.nodes['1']

{'views': 500}
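With `views` stored on the nodes, the attribute-based filtering planned in the network-size discussion could look like the sketch below. The threshold of 1,000 views is an arbitrary assumption for illustration, and the toy graph stands in for `G`:

```python
import networkx as nx

# Toy graph with a 'views' attribute on every node (values made up)
toy_G = nx.Graph([("1", "2"), ("2", "3"), ("3", "4")])
nx.set_node_attributes(toy_G, {"1": 500, "2": 12000, "3": 3400, "4": 80}, name="views")

MIN_VIEWS = 1_000  # hypothetical threshold

# Select nodes whose view count reaches the threshold
keep = [n for n, views in toy_G.nodes(data="views") if views >= MIN_VIEWS]

# .copy() turns the read-only subgraph view into an independent graph
filtered = toy_G.subgraph(keep).copy()

print(sorted(filtered.nodes()), list(filtered.edges()))
```

Filtering on nodes keeps only edges whose both endpoints pass the threshold, so the result can split into several components even though the full graph is connected.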

In [14]:
G['1']

AtlasView({'113417': {}, '95914': {}, '155127': {}, '71050': {}, '112881': {}, '57532': {}, '64605': {}, '8079': {}, '6250': {}, '125642': {}, '30268': {}, '53724': {}, '123076': {}, '165361': {}, '149069': {}, '42688': {}, '90650': {}, '35358': {}, '8176': {}, '54063': {}, '32920': {}, '31140': {}, '56352': {}, '83691': {}, '87271': {}, '32338': {}, '61862': {}, '37818': {}, '55192': {}, '85756': {}, '110345': {}, '85701': {}, '52703': {}, '119034': {}, '152296': {}, '144643': {}, '161362': {}, '108809': {}, '128864': {}, '35408': {}, '94483': {}, '83095': {}, '61780': {}, '78367': {}, '26309': {}, '86321': {}, '119017': {}, '79844': {}, '89069': {}, '158403': {}, '78354': {}, '39372': {}, '58493': {}, '126773': {}, '128328': {}, '130232': {}, '105992': {}, '5507': {}, '64419': {}, '55261': {}, '45001': {}, '120819': {}, '98175': {}, '53444': {}, '38031': {}, '34625': {}, '41855': {}, '86266': {}, '137899': {}, '21357': {}, '40868': {}, '148905': {}, '117105': {}, '67333': {}, '8121':

In [17]:
print(*list(G.neighbors('98343')), sep=" ")

141493 58736 140703 151401 157118 125430 3635 495 116648 1679 123861 89631 113417 145281 10408 3181 40675 95914 155127 124827 16783 122269 87516 106969 10372 66893 75403 143081 44326 95697 48850 59892 159097 63150 90148 78899 30520 54230 90697
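The cells above reach a node's neighbors in three ways. A small sketch on a toy graph shows that they agree: `G.adj[n]` and `G[n]` return the same read-only `AtlasView`, which maps each neighbor to its edge-attribute dict (empty here, hence the `{}` values in the output above), while `G.neighbors(n)` yields only the neighbor IDs:

```python
import networkx as nx

toy_G = nx.Graph([("1", "2"), ("1", "3"), ("2", "3")])

# Three ways to reach the neighbors of node "1"
via_adj = set(toy_G.adj["1"])               # adjacency view (AtlasView)
via_getitem = set(toy_G["1"])               # G[n] is shorthand for G.adj[n]
via_neighbors = set(toy_G.neighbors("1"))   # iterator over neighbor IDs

print(via_adj == via_getitem == via_neighbors, toy_G.degree("1"))
```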
