# Twitch network analysis

The Twitch social network consists of users. A small percentage of those users broadcast their gameplay or other activities through live streams. In the graph model, users who stream are tagged with a secondary label, `Stream`. The graph also records which teams they belong to, which games they play on stream, and in which language they present their content. You also know how many followers they had at the moment of scraping, their all-time view count, and when they created their user account. The most relevant information for network analysis is which users engaged in a streamer's chat. You can distinguish whether the user who chatted in the stream was a regular user (CHATTER relationship), a moderator of the stream (MODERATOR relationship), or a VIP of the stream.

The network information was scraped between the 7th and the 10th of May 2021.

In [1]:
# Import Libraries
%matplotlib inline
from neo4j import GraphDatabase
import pandas as pd
import seaborn as sns
from matplotlib import pyplot
# Visual
sns.set(font_scale = 1.5)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', 1000) # Enables Pandas to display long strings properly

import warnings 
warnings.filterwarnings('ignore')

In [2]:
# Connect to Neo4j
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'letmein'))

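def read_query(query):
    """Run a Cypher query and return the result as a pandas DataFrame."""
    with driver.session() as session:
        result = session.run(query)
        # One row per record, with the query's return keys as column names
        return pd.DataFrame([r.values() for r in result], columns=result.keys())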

## Community detection
The last category of graph algorithms we will look at is community detection. Community detection, or clustering, algorithms are used to infer the community structure of a given network. Communities are loosely defined as groups of nodes within a network that are more densely connected to one another than to other nodes. We could try to examine the community structure of the whole user network, but that does not produce a readable network visualization of the results. First of all, we will release the existing projected network from memory.
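To make the "more densely connected" idea concrete, here is a minimal sketch on a made-up toy graph (the node names, edges, and community assignment are all hypothetical and not taken from the Twitch data). It compares the share of possible links that exist inside communities with the share that exists between them.

```python
from itertools import combinations

# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = {
    ("a", "b"), ("a", "c"), ("b", "c"),  # first group
    ("x", "y"), ("x", "z"), ("y", "z"),  # second group
    ("c", "x"),                          # bridge between the groups
}
# Hypothetical community assignment for each node.
community = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}


def connected(u, v):
    """True if an edge exists between u and v in either direction."""
    return (u, v) in edges or (v, u) in edges


pairs = list(combinations(community, 2))
intra = [p for p in pairs if community[p[0]] == community[p[1]]]
inter = [p for p in pairs if community[p[0]] != community[p[1]]]

# Fraction of possible node pairs that are actually linked.
intra_density = sum(connected(u, v) for u, v in intra) / len(intra)
inter_density = sum(connected(u, v) for u, v in inter) / len(inter)

print(f"within communities:  {intra_density:.2f}")
print(f"between communities: {inter_density:.2f}")
```

A good community assignment is one where the first number is much larger than the second.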

Since the two focal points were ESL, which covered CSGO, and Riot Games, which covered League of Legends, it is likely that other users were streaming the same tournaments between the 7th and the 10th of May. Moving forward, the analysis looks at the community structure of a subgraph that contains CSGO and League of Legends streamers. To simplify later queries, we first tag the relevant nodes with an additional node label.

In [3]:
read_query("""
MATCH (s:Stream)-[:PLAYS]->(g:Game)
WHERE g.name in ["League of Legends","Counter-Strike: Global Offensive"]
SET s:leagueCSGO
""")

![title](img/node_community_overview.png)

There were a total of 164 streamers that broadcast either CSGO or League of Legends on their channel. We can see that many of the nodes are isolated, a couple form pairs, and a few belong to groups in which many nodes are connected with one another.
Following [this](https://towardsdatascience.com/twitchverse-a-network-analysis-of-twitch-universe-using-neo4j-graph-data-science-d7218b4453ff) blog post, the Louvain Modularity algorithm will be used to infer the community structure of this subgraph. The network will be treated as undirected, because a streamer who engages in another streamer's chat is probably a friend of that streamer. The results of the Louvain Modularity algorithm will be stored back onto the graph, so the community structure information can be used in visualizations.
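Louvain searches for a community assignment that maximizes modularity, the quantity reported in the `modularity` column of the algorithm output. As a rough illustration of what that score measures, the sketch below computes modularity for an unweighted, undirected toy graph. The `modularity` helper, the edges, and the community assignments are hypothetical and are not part of the GDS API or the Twitch data.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of an unweighted graph, ignoring edge direction.

    edges: list of (u, v) pairs, one entry per undirected edge
    community: dict mapping each node to its community id
    """
    m = len(edges)
    degree_sum = defaultdict(int)   # sum of node degrees per community
    intra_edges = defaultdict(int)  # edges with both endpoints in the community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra_edges[community[u]] += 1
    # Observed share of intra-community edges minus the share expected by chance.
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by one bridge edge.
toy_edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("x", "y"), ("x", "z"), ("y", "z"),
    ("c", "x"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than putting every node into a single community, which is the kind of improvement Louvain looks for at each step.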

In [4]:
read_query("""
CALL gds.graph.project('CSGOleague','leagueCSGO', {ALL: {orientation:'UNDIRECTED', type:'*'}})
""")

Unnamed: 0,nodeProjection,relationshipProjection,graphName,nodeCount,relationshipCount,projectMillis
0,"{'leagueCSGO': {'label': 'leagueCSGO', 'properties': {}}}","{'ALL': {'orientation': 'UNDIRECTED', 'aggregation': 'DEFAULT', 'type': '*', 'properties': {}}}",CSGOleague,422,622,823


In [5]:
read_query("""CALL gds.louvain.write('CSGOleague', {
    writeProperty:'louvain_CSGOleague'
})""")

Unnamed: 0,writeMillis,nodePropertiesWritten,modularity,modularities,ranLevels,communityCount,communityDistribution,postProcessingMillis,preProcessingMillis,computeMillis,configuration
0,7,422,0.781821,"[0.6012189700271916, 0.7588993083198065, 0.7818209075588549]",3,208,"{'p99': 25, 'min': 1, 'max': 39, 'mean': 2.0288461538461537, 'p90': 2, 'p50': 1, 'p999': 39, 'p95': 5, 'p75': 1}",1,0,88,"{'maxIterations': 10, 'writeConcurrency': 4, 'seedProperty': None, 'consecutiveIds': False, 'maxLevels': 10, 'relationshipWeightProperty': None, 'concurrency': 4, 'writeProperty': 'louvain_CSGOleague', 'includeIntermediateCommunities': False, 'nodeLabels': ['*'], 'sudo': False, 'relationshipTypes': ['*'], 'tolerance': 0.0001, 'username': None}"


![title](img/node_community_closeup.png)

After the algorithm has been applied, I take a closer look at the larger groups of connected nodes. The color-coded nodes seem to suggest that the streamers within a community are isolated within their own language, even though they probably play the same game. Users whose "CHATTER" relationships connect to several different streamers also suggest that the user is only interacting with a bot: the user enters a command and the bot responds with the corresponding message.

In [6]:
# Label users with more than one outgoing relationship to streams as Audiences (batched with APOC)
read_query("""CALL apoc.periodic.iterate("
    MATCH (u:User)
    WHERE size((u)-->(:Stream)) > 1
    RETURN u",
    "SET u:Audiences",
    {batchSize:50000, parallel:true}
)""")

Unnamed: 0,batches,total,timeTaken,committedOperations,failedOperations,failedBatches,retries,errorMessages,batch,operations,wasTerminated,failedParams,updateStatistics
0,67,3304360,10,3304360,0,0,0,{},"{'total': 67, 'committed': 67, 'failed': 0, 'errors': {}}","{'total': 3304360, 'committed': 3304360, 'failed': 0, 'errors': {}}",False,{},"{'nodesDeleted': 0, 'labelsAdded': 0, 'relationshipsCreated': 0, 'nodesCreated': 0, 'propertiesSet': 0, 'relationshipsDeleted': 0, 'labelsRemoved': 0}"


In [7]:
# Project the tagged streamers and their audiences; REVERSE flips each relationship's direction
read_query("""
CALL gds.graph.project('shared-audience',
  ['leagueCSGO', 'Audiences'],
  {CHATTERS: {type:'*', orientation:'REVERSE'}})
""")

Unnamed: 0,nodeProjection,relationshipProjection,graphName,nodeCount,relationshipCount,projectMillis
0,"{'leagueCSGO': {'label': 'leagueCSGO', 'properties': {}}, 'Audiences': {'label': 'Audiences', 'properties': {}}}","{'CHATTERS': {'orientation': 'REVERSE', 'aggregation': 'DEFAULT', 'type': '*', 'properties': {}}}",shared-audience,3304639,4016885,1084


In [8]:
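# Score node pairs by Jaccard similarity and add SHARED_AUDIENCE relationships to the in-memory graph
read_query("""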
CALL gds.nodeSimilarity.mutate('shared-audience',
 {similarityCutoff:0.05, topK:15, mutateProperty:'score', mutateRelationshipType:'SHARED_AUDIENCE', similarityMetric: 'Jaccard'})
""")

Unnamed: 0,preProcessingMillis,computeMillis,mutateMillis,postProcessingMillis,nodesCompared,relationshipsWritten,similarityDistribution,configuration
0,0,3743,46,-1,1298,5736,"{'p1': 0.05023908615112305, 'max': 0.541187047958374, 'p5': 0.05135941505432129, 'p90': 0.17142844200134277, 'p50': 0.07407402992248535, 'p95': 0.21311545372009277, 'p10': 0.05328202247619629, 'p75': 0.11111092567443848, 'p99': 0.30909132957458496, 'p25': 0.05963301658630371, 'p100': 0.541187047958374, 'min': 0.04999995231628418, 'mean': 0.09602355158977428, 'stdDev': 0.057099034034213056}","{'topK': 15, 'similarityMetric': 'Jaccard', 'bottomK': 10, 'bottomN': 0, 'relationshipWeightProperty': None, 'mutateRelationshipType': 'SHARED_AUDIENCE', 'topN': 0, 'concurrency': 4, 'degreeCutoff': 1, 'similarityCutoff': 0.05, 'nodeLabels': ['*'], 'sudo': False, 'relationshipTypes': ['*'], 'mutateProperty': 'score', 'username': None}"
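The node similarity step above scores pairs of nodes by the Jaccard overlap of their neighbors, which here means the overlap between two streamers' audiences. The following is a minimal sketch of that idea in plain Python, with made-up streamer and user names. The `similar_streamers` helper and its `cutoff` and `top_k` arguments only mimic the roles of `similarityCutoff` and `topK`; the exact tie-breaking and boundary handling in GDS may differ.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical audiences: streamer -> set of users who chatted in the stream.
audiences = {
    "streamer_a": {"u1", "u2", "u3", "u4"},
    "streamer_b": {"u3", "u4", "u5"},
    "streamer_c": {"u6"},
}


def similar_streamers(audiences, cutoff=0.05, top_k=15):
    """For each streamer, keep up to top_k other streamers scoring at least cutoff."""
    result = {}
    for streamer, audience in audiences.items():
        scored = [
            (other, jaccard(audience, other_audience))
            for other, other_audience in audiences.items()
            if other != streamer
        ]
        kept = [(other, score) for other, score in scored if score >= cutoff]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        result[streamer] = kept[:top_k]
    return result


shared = similar_streamers(audiences)
for streamer, neighbours in shared.items():
    print(streamer, neighbours)
```

In this toy data, `streamer_a` and `streamer_b` share two of five distinct users, so they would be linked, while `streamer_c` shares no audience with anyone and stays isolated.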


In [9]:
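# Run weighted Louvain on the streamers only, using the SHARED_AUDIENCE similarity scores as weights
read_query("""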
CALL gds.louvain.write('shared-audience', 
       { nodeLabels:['leagueCSGO'],
         relationshipTypes:['SHARED_AUDIENCE'], 
         relationshipWeightProperty:'score',
         writeProperty:'louvain_shared_audience'})
""")

Unnamed: 0,writeMillis,nodePropertiesWritten,modularity,modularities,ranLevels,communityCount,communityDistribution,postProcessingMillis,preProcessingMillis,computeMillis,configuration
0,5,422,0.788481,"[0.7660461923438506, 0.7884806730461174]",2,164,"{'p99': 15, 'min': 1, 'max': 83, 'mean': 2.573170731707317, 'p90': 5, 'p50': 1, 'p999': 83, 'p95': 7, 'p75': 2}",2,1,91,"{'maxIterations': 10, 'writeConcurrency': 4, 'seedProperty': None, 'consecutiveIds': False, 'maxLevels': 10, 'relationshipWeightProperty': 'score', 'concurrency': 4, 'writeProperty': 'louvain_shared_audience', 'includeIntermediateCommunities': False, 'nodeLabels': ['leagueCSGO'], 'sudo': False, 'relationshipTypes': ['SHARED_AUDIENCE'], 'tolerance': 0.0001, 'username': None}"


In [10]:
# Drop the projected graph so the notebook can be restarted and re-run without recreating the database from scratch
read_query("""
CALL gds.graph.drop("CSGOleague")
""")

Unnamed: 0,graphName,database,memoryUsage,sizeInBytes,nodeCount,relationshipCount,configuration,density,creationTime,modificationTime,schema
0,CSGOleague,neo4j,,-1,422,622,"{'relationshipProjection': {'ALL': {'orientation': 'UNDIRECTED', 'aggregation': 'DEFAULT', 'type': '*', 'properties': {}}}, 'nodeProjection': {'leagueCSGO': {'label': 'leagueCSGO', 'properties': {}}}, 'relationshipProperties': [], 'creationTime': 2022-06-08T06:44:36.865747300+08:00, 'validateRelationships': False, 'readConcurrency': 4, 'sudo': False, 'nodeProperties': [], 'username': None}",0.003501,2022-06-08T06:44:36.865747300+08:00,2022-06-08T06:44:37.689601200+08:00,"{'relationships': {'ALL': {}}, 'nodes': {'leagueCSGO': {}}}"


In [11]:
# Drop the projected graph so the notebook can be restarted and re-run without recreating the database from scratch
read_query("""
CALL gds.graph.drop("shared-audience")
""")

Unnamed: 0,graphName,database,memoryUsage,sizeInBytes,nodeCount,relationshipCount,configuration,density,creationTime,modificationTime,schema
0,shared-audience,neo4j,,-1,3304639,4022621,"{'relationshipProjection': {'CHATTERS': {'orientation': 'REVERSE', 'aggregation': 'DEFAULT', 'type': '*', 'properties': {}}}, 'nodeProjection': {'leagueCSGO': {'label': 'leagueCSGO', 'properties': {}}, 'Audiences': {'label': 'Audiences', 'properties': {}}}, 'relationshipProperties': [], 'creationTime': 2022-06-08T06:44:48.549505100+08:00, 'validateRelationships': False, 'readConcurrency': 4, 'sudo': False, 'nodeProperties': [], 'username': None}",3.683504e-07,2022-06-08T06:44:48.549505100+08:00,2022-06-08T06:44:53.465351400+08:00,"{'relationships': {'SHARED_AUDIENCE': {'score': 'Float (DefaultValue(NaN), TRANSIENT, Aggregation.NONE)'}, 'CHATTERS': {}}, 'nodes': {'leagueCSGO': {}, 'Audiences': {}}}"
