## Making and saving distribution graphs of networks

For each network we make two distribution graphs: one for the node title embeddings and one for the edge sentiments.

### Node title embeddings

Each node in our networks has a title embedding, which we use in both of our measures (from Michele's algorithm(s)).

We made distribution graphs to see whether there are concentrations of common title embeddings.

### Edge sentiments

Each edge in our networks has a sentiment value, which we use in our correlation measure.

We made distribution bar plots of these values.
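As a rough illustration of where such per-edge values come from, the sketch below reads a numeric edge attribute out of a small GEXF document using only the standard library. The sample document, the attribute title `sentiment`, and the function names are assumptions made for illustration; the helpers in `py_files.distribution_graphs` may work differently.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written GEXF document. The attribute title "sentiment" is an
# assumption for illustration, not necessarily the name used in our files.
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="directed">
    <attributes class="edge">
      <attribute id="0" title="sentiment" type="double"/>
    </attributes>
    <nodes>
      <node id="a"/>
      <node id="b"/>
      <node id="c"/>
    </nodes>
    <edges>
      <edge id="0" source="a" target="b">
        <attvalues><attvalue for="0" value="0.75"/></attvalues>
      </edge>
      <edge id="1" source="b" target="c">
        <attvalues><attvalue for="0" value="-0.4"/></attvalues>
      </edge>
    </edges>
  </graph>
</gexf>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def edge_attribute_values(gexf_text, attribute_title):
    """Collect the float values of one named edge attribute, in document order."""
    root = ET.fromstring(gexf_text)

    # Find the id(s) that the edge attribute declarations assign to the title.
    attr_ids = set()
    for elem in root.iter():
        if local_name(elem.tag) == 'attributes' and elem.get('class') == 'edge':
            for attr in elem:
                if attr.get('title') == attribute_title:
                    attr_ids.add(attr.get('id'))

    # Read the matching attvalue entries from every edge.
    values = []
    for elem in root.iter():
        if local_name(elem.tag) == 'edge':
            for att in elem.iter():
                if local_name(att.tag) == 'attvalue' and att.get('for') in attr_ids:
                    values.append(float(att.get('value')))
    return values


sentiments = edge_attribute_values(SAMPLE_GEXF, 'sentiment')
print(sentiments)
```

The same approach would apply to node attributes by looking at `class="node"` declarations and `node` elements instead.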

### Run below to create plots and save them

In [9]:
import sys
sys.path.append('../')
from py_files.distribution_graphs import *

In [10]:
# list of subreddits and paths to their .gexf files
# list is ordered from newest to oldest scraped subreddits
network_list = [['News', '../data/date_folders/april_24/graphs/News.gexf'],
                ['Communism', '../data/date_folders/april_24/graphs/communism.gexf'],
                ['Democrats', '../data/date_folders/april_23/graphs/democrats.gexf'],
                ['Political Discussion', '../data/date_folders/april_23/graphs/PoliticalDiscussion.gexf'],
                ['Republican', '../data/date_folders/april_23/graphs/Republican.gexf'],
                ['UK Politics', '../data/date_folders/april_23/graphs/ukpolitics.gexf'],
                ['World News', '../data/date_folders/april_23/graphs/worldnews.gexf'],
                ['Anti-Work', '../data/date_folders/april_17/graphs/antiwork.gexf'],
                ['Politics', '../data/date_folders/march_23/graphs/politics.gexf']
                # ['Gaming', '../data/date_folders/april_4/graphs/gaming.gexf'],
                # ['Music', '../data/date_folders/april_2/graphs/2apr_2Music.gexf'],
                # ['FIFA', '../data/date_folders/april_18/graphs/FIFA.gexf']
                ]
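Because the plotting cells below read every path in this list, it can help to check that the files exist before starting a long run. The following is a minimal sketch of such a check; `missing_graph_files` and the example entry are hypothetical and not part of `py_files.distribution_graphs`.

```python
from pathlib import Path


def missing_graph_files(networks):
    """Return the (title, path) pairs whose file does not exist on disk."""
    return [(title, path) for title, path in networks if not Path(path).is_file()]


# Hypothetical entry for illustration; in the notebook this would be network_list.
example_networks = [['Example', 'this/path/does/not/exist.gexf']]

missing = missing_graph_files(example_networks)
for title, path in missing:
    print(f'{title}: missing file {path}')
```

An empty result means every listed file was found.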

### Creating and saving visualizations

In [None]:
for title, filepath in network_list:
    do_both_save(title, filepath)

## Creating grid visualizations

In [11]:
# general seaborn settings
import matplotlib.pyplot as plt
import seaborn as sns  # imported explicitly rather than relying on the star import above

sns.set_style('darkgrid')
sns.set(rc={"figure.dpi":300, 'savefig.dpi':300})

### Title Embeddings

In [None]:
fig, axes = plt.subplots(3, 3, figsize=(20, 18))
fig.suptitle('Distributions of Title Embeddings', size=35, y=.925)

for idx, (title, path) in enumerate(network_list):
    # the 3x3 grid only has room for 9 networks
    if idx >= axes.size:
        print('Too many networks!')
        break
    # axes.flat walks the grid row by row: idx 0..8 maps to axes[0, 0]..axes[2, 2]
    grid_coords = axes.flat[idx]

    title_embeddings = get_title_embeddings(path)

    # bw_adjust < 1 reduces the line smoothing, retaining more information at the cost of prettiness
    title_plot = sns.kdeplot(ax=grid_coords,
                             data=title_embeddings,
                             bw_adjust=.3)

    title_plot.set(ylabel=None, title=title, xlim=(-11.5, 16.5), ylim=(0,.18))
    
    # label only the bottom-middle x-axis and the middle-left y-axis to keep the grid uncluttered
    if idx == 7:
        title_plot.set_xlabel('Title Embeddings', size=25)
    if idx == 3:
        title_plot.set_ylabel('Density', size=25)

# fig = title_plot.get_figure()
# fig.savefig('../data/plots/gridplot_titles.png', bbox_inches='tight')
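To make the effect of `bw_adjust` concrete, here is a simplified pure-Python Gaussian kernel density estimate. It assumes the bandwidth is chosen by Scott's rule for one-dimensional data and then multiplied by `bw_adjust`, which is how seaborn documents the parameter; the sample values are made up and the function is a sketch, not seaborn's implementation.

```python
import math
import statistics


def gaussian_kde(samples, bw_adjust=1.0):
    """Return a density function for 1-D samples using Gaussian kernels."""
    n = len(samples)
    # Scott's rule for 1-D data: std * n**(-1/5), then scaled by bw_adjust.
    bandwidth = statistics.stdev(samples) * n ** (-1 / 5) * bw_adjust
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Made-up values forming two clusters, for illustration only.
samples = [-2.1, -1.9, -2.0, -1.8, 3.0, 3.2, 2.9, 3.1]

smooth = gaussian_kde(samples, bw_adjust=1.0)
sharp = gaussian_kde(samples, bw_adjust=0.3)

# Compare the two curves inside a cluster and in the gap between clusters.
for x in (-1.95, 0.5):
    print(f'x={x}: bw_adjust=1.0 -> {smooth(x):.4f}, bw_adjust=0.3 -> {sharp(x):.4f}')
```

With the smaller bandwidth the curve is higher inside each cluster and close to zero in the gap, which is why concentrations of similar title embeddings are easier to see with `bw_adjust=.3`.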

### Comment Sentiments

In [None]:
fig, axes = plt.subplots(3, 3, figsize=(20, 18))
fig.suptitle('Distributions of Comment Sentiments', size=35, y=.925)

for idx, (title, path) in enumerate(network_list):
    # the 3x3 grid only has room for 9 networks
    if idx >= axes.size:
        print('Too many networks!')
        break
    # axes.flat walks the grid row by row: idx 0..8 maps to axes[0, 0]..axes[2, 2]
    grid_coords = axes.flat[idx]

    sentiments = get_comment_sentiments(path)

    sentiment_plot = sns.histplot(ax=grid_coords,
                                  data=sentiments,
                                  stat='density',
                                  binrange=(-1,1),
                                  bins=11) # bin number should be odd so that there is 1 bin in the middle

    sentiment_plot.set(ylabel=None, title=title, xlim=(-1.02, 1.02), ylim=(0,4))

    # label only the bottom-middle x-axis and the middle-left y-axis to keep the grid uncluttered
    if idx == 7:
        sentiment_plot.set_xlabel('Comment Sentiment Score', size=25)
    if idx == 3:
        sentiment_plot.set_ylabel('Density', size=25)

# fig = sentiment_plot.get_figure()
# fig.savefig('../data/plots/gridplot_sentiments.png', bbox_inches='tight')
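The comment about using an odd number of bins can be checked with a small sketch of equal-width binning over `(-1, 1)`. The helper names are hypothetical and the code only mirrors the usual binning convention (last bin closed on the right); it is not taken from seaborn.

```python
def bin_edges(low, high, bins):
    """Edges of `bins` equal-width bins covering [low, high]."""
    width = (high - low) / bins
    return [low + i * width for i in range(bins + 1)]


def bin_index(value, low, high, bins):
    """Index of the bin containing `value`; the last bin includes `high`."""
    if value == high:
        return bins - 1
    return int((value - low) / (high - low) * bins)


odd_edges = bin_edges(-1, 1, 11)
even_edges = bin_edges(-1, 1, 10)

# With 11 bins, a neutral score of 0 falls inside the middle bin (index 5).
center = bin_index(0.0, -1, 1, 11)
print(center, odd_edges[center], odd_edges[center + 1])

# With 10 bins, 0 sits on a bin edge, so neutral scores are split from
# slightly negative ones instead of sharing one central bin.
print(any(abs(edge) < 1e-12 for edge in even_edges))
```

With 11 bins the middle bin is centred on zero, so near-neutral comments on either side of zero are counted together.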

### Testing visualizations

In [None]:
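# print the extreme sentiment scores (|score| > 0.9) of the first network;
# get_comment_sentiments takes only the file path, as in the grid cell above
for i in get_comment_sentiments(network_list[0][1]):
    if abs(i) > 0.9:
        print(i)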

In [None]:
do_both_save(network_list[8][0], network_list[8][1])