# Event-related community changes
Communities may change in response to events. This notebook examines event-related behaviour, particularly the interactions between users. Considering real-life events would be very valuable, but our data covers only one month of postings, so we instead treat the articles that attracted the most postings as significant events. This is a reasonable proxy, since such articles are the relevant events in the context of a forum.

__Research Question: Is there a correlation between community interactions and events, and if so, how can it be quantified?__

* TODO
    * Find Events
    * Look at Interactions in relation to events

* IDEAS
    * There are 4 communities in the final state. How did they merge over time?

In [48]:
# automatically reload imports before executing any line in case you changed something
%load_ext autoreload
%autoreload 2

In [416]:
import utils
import read_graph
import networkx as nx
import pandas as pd
import pickle
import plotly.express as px
import plotly.graph_objs as go
import numpy as np
from tqdm import tqdm

## Load Data

In [83]:
all_postings = utils.read_all_postings()
all_votes = utils.read_all_votes()

## Get Events

In [420]:
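# count postings per article; heavily commented articles serve as "events"
events = (
    all_postings
    .groupby(by=['ID_Article', 'ArticlePublishingDate'])['ID_Posting']
    .count()
    .sort_values(ascending=False)
)
# keep only articles above the 99.5% quantile of posting counts
qtop = np.quantile(events, 0.995)
big_events = events[events > qtop]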
big_events.shape

(22,)
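
The selection rule above can be illustrated on a tiny example. This is only a sketch under stated assumptions: the article ids and counts are made up, the quantile is set to 0.75 so that the toy data yields a visible cut, and `quantile_linear` is a hypothetical stand-in that applies linear interpolation between order statistics, which is the rule `np.quantile` uses by default.

```python
from collections import Counter


def quantile_linear(values, q):
    """Quantile with linear interpolation between order statistics
    (hypothetical helper; same rule as the np.quantile default)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Made-up toy data: one entry per posting, holding the id of its article.
posting_article_ids = ["a"] * 2 + ["b"] * 3 + ["c"] * 4 + ["d"] * 5 + ["e"] * 50

# Postings per article, analogous to the groupby/count in the notebook.
posts_per_article = Counter(posting_article_ids)

# Articles strictly above the threshold count as "big events".
threshold = quantile_linear(posts_per_article.values(), 0.75)
big_events_toy = {art: n for art, n in posts_per_article.items() if n > threshold}

print(threshold, big_events_toy)
```

Because the comparison is strict (`>`), an article whose posting count equals the threshold exactly is not selected.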

## Calculate Metrics

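- For each event (article)
    - For each community
        - Calculate the community metric (edges per node within the community)

Then plot the articles in chronological order, showing the mean, minimum, and maximum of the community metric for each article.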
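
The metric used in the next cell is the number of edges divided by the number of nodes in each community's induced subgraph. The sketch below reproduces that calculation on a toy graph with plain Python. The edge list and the two communities are invented for illustration, and `edges_per_node` is a hypothetical helper; the notebook itself relies on `interactions.subgraph(com)` and `utils.get_communities`, whose internals are not shown here.

```python
def edges_per_node(edges, community):
    """Edges of the induced subgraph divided by the number of member nodes."""
    members = set(community)
    if not members:
        return 0.0  # guard against an empty community
    # keep only edges whose two endpoints both belong to the community
    internal = [(u, v) for u, v in edges if u in members and v in members]
    return len(internal) / len(members)


# Invented toy interaction graph: a triangle, a 4-node path, and one bridge edge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),   # triangle
    ("w", "x"), ("x", "y"), ("y", "z"),   # path
    ("c", "w"),                           # bridge between the two groups
]
communities = [{"a", "b", "c"}, {"w", "x", "y", "z"}]

com_metrics = [edges_per_node(edges, com) for com in communities]

# Per-article summary, matching the mean/min/max columns built later on.
summary = {
    "mean": sum(com_metrics) / len(com_metrics),
    "min": min(com_metrics),
    "max": max(com_metrics),
}
print(com_metrics, summary)
```

The bridge edge is not counted for either community because only one of its endpoints lies inside each. If the interaction graph is undirected and has no parallel edges, this ratio equals half the average within-community degree.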

In [451]:
result = {}

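# compute the per-community metric for every big event (article);
# the index is (ID_Article, ArticlePublishingDate), so x[0] is the article id
for art_idx in tqdm([x[0] for x in big_events.index]):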
    posts = all_postings[all_postings.ID_Article == art_idx]
    interactions = read_graph.get_all_users_interactions(posts, all_votes, salvage_original_node_ids=True)
    communities = utils.get_communities(interactions, min_size=100)
    
    # edges per node within each community's induced subgraph
    com_metrics = []
    for com in communities:
        com_interactions = interactions.subgraph(com)
        metric = len(com_interactions.edges) / len(com_interactions.nodes)
        # metric = len(com_interactions.edges)
        com_metrics.append(metric)
    
    result[art_idx] = com_metrics

100%|██████████| 22/22 [09:40<00:00, 26.38s/it]


In [452]:
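# prepare viz: one row per article (publishing date, mean/min/max of the metric, title);
# articles for which no community was found (min_size=100) are skipped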
viz_arr = [
    (all_postings[all_postings.ID_Article == x]['ArticlePublishingDate'].iloc[0],
     np.mean(result[x]), 
     min(result[x]), 
     max(result[x]),
     all_postings[all_postings.ID_Article == x]['ArticleTitle'].iloc[0])
    for x in result.keys() if len(result[x]) > 0]

viz_df = pd.DataFrame(viz_arr, columns = ['date', 'mean', 'min', 'max', 'title'])
viz_df = viz_df.sort_values(by='date')

In [453]:
# visualize
x = list(viz_df['date'])
y = list(viz_df['mean'])
y_upper = list(viz_df['max'])
y_lower = list(viz_df['min'])
titles = list(viz_df['title'])

fig = go.Figure([
    go.Scatter(
        x=x,
        y=y,
        line=dict(color='rgb(0,100,80)'),
        mode='lines',
        name='interactions'
    ),
    go.Scatter(
        x=x+x[::-1], # x, then x reversed
        y=y_upper+y_lower[::-1], # upper, then lower reversed
        fill='toself',
        fillcolor='rgba(0,100,80,0.2)',
        line=dict(color='rgba(255,255,255,0)'),
        hoverinfo='skip',
        showlegend=False
    )
])

fig.update_layout(
    yaxis_title='Interactions per node (edges / nodes)',
    title='Average Community Interactions per Event (band: min to max)',
    hovermode='x unified'
)

fig.show()