# Event-related community changes
Communities may change in response to events. This notebook examines event-related behaviour, especially with regard to interactions. Considering real-life events would be very useful, but our data covers only one month of postings, so we instead treat the articles that attract the most activity (postings and votes) as significant events. This is reasonable because, in the context of a forum, such articles are the relevant events.

__Research Question: Is there a correlation between community interactions and events, and if so, how can it be quantified?__

* TODO
    * Find Events
    * Look at Interactions in relation to events

* IDEAS
    * There are 4 communities in the final state. How did they merge over time?
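
One way to approach the merge question is to detect communities on successive time snapshots and match each earlier community to the later community it overlaps most. The sketch below does this with Jaccard similarity on plain sets of user IDs. The snapshot data and the helper names `jaccard` and `match_communities` are hypothetical and not part of `utils`.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(earlier, later):
    """For each earlier community, return (index of best later match, similarity)."""
    matches = []
    for comm in earlier:
        scores = [jaccard(comm, other) for other in later]
        best = max(range(len(later)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


# Toy snapshots (made-up user IDs): three early communities, two later ones.
week_1 = [{1, 2, 3}, {4, 5}, {6, 7, 8}]
week_2 = [{1, 2, 3, 4, 5}, {6, 7, 8, 9}]

matches = match_communities(week_1, week_2)
for i, (j, score) in enumerate(matches):
    print(f"early community {i} -> later community {j} (Jaccard {score:.2f})")
```

Two earlier communities that map to the same later community are candidates for a merge.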

In [48]:
# automatically reload imports before executing any line in case you changed something
%load_ext autoreload
%autoreload 2

In [161]:
import utils
import read_graph
import networkx as nx
import pandas as pd
import pickle
import plotly.express as px

## Create Interaction Graph

In [83]:
all_postings = utils.read_all_postings()
all_votes = utils.read_all_votes()
G_int = read_graph.get_all_users_interactions(all_postings, all_votes)
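
`read_graph.get_all_users_interactions` is not shown in this notebook. As a rough illustration of what such a graph could contain, the sketch below builds a directed graph in which an edge points from the user who replies or votes to the author of the target posting. The edge semantics are an assumption, not the actual implementation; `build_interaction_graph` and the toy frames are hypothetical, and only the column names are taken from the data.

```python
import networkx as nx
import pandas as pd


def build_interaction_graph(postings, votes):
    """Directed graph with an edge u -> v if u replied to or voted on a posting by v."""
    # Lookup table: posting ID -> author of that posting.
    author = dict(zip(postings["ID_Posting"], postings["ID_CommunityIdentity"]))

    edges = []
    # Replies: the posting's author interacts with the parent posting's author.
    for user, parent in zip(postings["ID_CommunityIdentity"],
                            postings["ID_Posting_Parent"]):
        if pd.notna(parent):
            edges.append((user, author.get(parent)))
    # Votes: the voter interacts with the author of the voted posting.
    for voter, posting in zip(votes["ID_CommunityIdentity"], votes["ID_Posting"]):
        edges.append((voter, author.get(posting)))

    G = nx.DiGraph()
    for src, dst in edges:
        if dst is None or src == dst:  # skip unknown targets and self-loops
            continue
        # Repeated interactions increase the edge weight.
        if G.has_edge(src, dst):
            G[src][dst]["weight"] += 1
        else:
            G.add_edge(src, dst, weight=1)
    return G


# Toy data (made up): user 10 posts, users 20 and 30 reply, then two votes.
toy_postings = pd.DataFrame({
    "ID_Posting": [1, 2, 3],
    "ID_Posting_Parent": [None, 1, 1],
    "ID_CommunityIdentity": [10, 20, 30],
})
toy_votes = pd.DataFrame({"ID_CommunityIdentity": [20, 30], "ID_Posting": [1, 2]})

G_toy = build_interaction_graph(toy_postings, toy_votes)
print(sorted(G_toy.edges(data="weight")))
```

Whether replies and votes should share one edge weight or be kept apart depends on the analysis. The sketch merges them for brevity.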

In [59]:
all_postings.columns

Index(['ID_Posting', 'ID_Posting_Parent', 'ID_CommunityIdentity',
       'PostingHeadline', 'PostingComment', 'PostingCreatedAt', 'ID_Article',
       'ArticlePublishingDate', 'ArticleTitle', 'ArticleChannel',
       'ArticleRessortName', 'UserCommunityName', 'UserGender',
       'UserCreatedAt'],
      dtype='object')

In [58]:
all_votes.columns

Index(['ID_CommunityIdentity', 'ID_Posting', 'VoteNegative', 'VotePositive',
       'VoteCreatedAt', 'UserCommunityName', 'UserGender', 'UserCreatedAt'],
      dtype='object')

In [84]:
print(nx.info(G_int))

Name: 
Type: DiGraph
Number of nodes: 78199
Number of edges: 3247011
Average in degree:  41.5224
Average out degree:  41.5224


## Get Events

In [168]:
# move this to util

def extract_events(article_ids):
    """Score each article by its number of postings plus the votes on those postings."""
    events = {}
    for aid in article_ids:
        postings = all_postings[all_postings.ID_Article == aid]
        votes = all_votes[all_votes.ID_Posting.isin(postings.ID_Posting)]
        events[aid] = len(postings) + len(votes)

    # Sort by score, highest first (dicts preserve insertion order).
    return dict(sorted(events.items(), key=lambda x: x[1], reverse=True))

def save_events(events, path='../data/events.p'):
    with open(path, 'wb') as fp:
        pickle.dump(events, fp, protocol=pickle.HIGHEST_PROTOCOL)
        
def load_events(path='../data/events.p'):
    with open(path, 'rb') as fp:
        data = pickle.load(fp)
    return data

In [169]:
# Try to load cached events; if the file does not exist, compute and save them
try:
    events = load_events()
except FileNotFoundError:
    article_ids = pd.unique(all_postings.ID_Article)
    events = extract_events(article_ids)
    save_events(events)
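
The loop in `extract_events` filters both frames once per article, which can be slow when there are many articles. A vectorized sketch of the same score (postings plus votes per article) is shown below. `extract_events_vectorized` is a hypothetical name, the frames are passed in explicitly instead of being read from globals, and the sketch assumes that each `ID_Posting` appears only once in the postings frame.

```python
import pandas as pd


def extract_events_vectorized(postings, votes):
    """Score each article by its number of postings plus votes on those postings."""
    # Postings per article.
    n_postings = postings["ID_Article"].value_counts()
    # Attach the article to every vote; votes on unknown postings are dropped
    # by the inner join.
    vote_articles = votes.merge(
        postings[["ID_Posting", "ID_Article"]], on="ID_Posting"
    )["ID_Article"]
    n_votes = vote_articles.value_counts()
    # Articles without any votes count as 0 votes.
    scores = n_postings.add(n_votes, fill_value=0).astype(int)
    return scores.sort_values(ascending=False).to_dict()


# Toy data (made up): three articles, five votes, one vote on an unknown posting.
toy_postings = pd.DataFrame({
    "ID_Posting": [1, 2, 3, 4],
    "ID_Article": [100, 100, 200, 300],
})
toy_votes = pd.DataFrame({"ID_Posting": [1, 1, 2, 3, 99]})

toy_events = extract_events_vectorized(toy_postings, toy_votes)
print(toy_events)
```

If posting IDs can repeat, the merge would count a vote once per duplicate, so the loop-based version is the safer reference.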

In [181]:
px.histogram(x=list(events.values()))
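
The histogram shows how activity is distributed across articles, but a rule is still needed for which articles count as significant events. A minimal sketch is to keep the articles whose score lies at or above a chosen quantile. The 0.99 default, the helper name `select_significant_events`, and the toy scores are assumptions for illustration, not values derived from the data.

```python
import math


def select_significant_events(events, quantile=0.99):
    """Return the articles whose score is at or above the given quantile of all scores."""
    if not events:
        return {}
    scores = sorted(events.values())
    # Nearest-rank quantile: the smallest score such that at least
    # `quantile` of all scores are less than or equal to it.
    rank = max(1, math.ceil(quantile * len(scores)))
    threshold = scores[rank - 1]
    return {aid: score for aid, score in events.items() if score >= threshold}


# Toy scores (made up): article ID -> postings + votes.
toy_events = {1: 5000, 2: 40, 3: 35, 4: 30, 5: 12, 6: 10, 7: 8, 8: 5, 9: 3, 10: 1}
significant = select_significant_events(toy_events, quantile=0.9)
print(significant)
```

A fixed top-k cut would work as well; a quantile has the advantage of scaling with the number of articles.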

## Find Communities

In [28]:
# For global state
global_state_community = utils.get_communities(G_int, min_size=100)
[len(x) for x in global_state_community]

[25912, 20315, 16735, 15059]
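
As a quick sanity check, the four community sizes can be compared with the node count reported for `G_int`. The sketch below assumes that `utils.get_communities` returns disjoint communities and that `min_size=100` simply drops smaller ones; neither is confirmed by the notebook.

```python
# Values copied from the outputs above.
community_sizes = [25912, 20315, 16735, 15059]
n_nodes = 78199

covered = sum(community_sizes)   # users inside one of the four communities
uncovered = n_nodes - covered    # users outside all of them
coverage = covered / n_nodes

print(f"{covered} of {n_nodes} users are in a large community ({coverage:.2%}); "
      f"{uncovered} fall outside.")
```

Under those assumptions, 178 users (about 0.23%) are not in any of the four large communities.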