## Networks Notebook

This week's notebook is very simple. We will take a set of documents (e.g. posts from Instagram) and create a network based on the hashtags they contain.

We have provided some dummy data: a very noisy sample of #foodie posts from Instagram that mixes many languages and contains many messy "communities". This is fine for testing, but you should use your own dataset.

Your dataset should be a CSV or Excel file, so use the matching pandas function (`pd.read_csv` or `pd.read_excel`).
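If you switch between file types often, a minimal sketch of choosing the reader from the file extension is below. It assumes the file ends in `.csv`, `.xlsx` or `.xls`; `load_dataset` is a hypothetical helper, not part of `drs_network`, and `pd.read_excel` additionally needs an Excel engine such as `openpyxl` installed.

```python
import os
import pandas as pd

def load_dataset(path):
    """Read a CSV or Excel file into a DataFrame based on its extension.

    Hypothetical helper for illustration; not part of drs_network.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        return pd.read_csv(path)
    if ext in (".xlsx", ".xls"):
        # Requires an Excel engine (e.g. openpyxl for .xlsx files)
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {ext}")
```

Calling `load_dataset("../../input_data/anxiety.xlsx")` would then behave like the `pd.read_excel` call used in the notebook.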

We have two networks to choose from, depending on what you want:
1. Create a Hashtag-to-Hashtag network (or a co-occurrence network)
2. Create a User-to-Hashtag network
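The two options differ in what the nodes are: hashtags only (linked when they appear in the same post), or users and hashtags (a user points to each hashtag they use). A minimal sketch of how each edgelist could be built is below. It assumes a hashtag is `#` followed by word characters and that tags are lower-cased; the function names are hypothetical, and this is an illustration of the idea, not the actual code inside `create_hashtag2hashtag_network` or `create_user2hashtag_network`.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a hashtag is "#" followed by one or more word characters.
# In Python 3, \w matches Unicode letters, so non-English tags are kept.
HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(text):
    """Return the unique, lower-cased hashtags in a post, sorted."""
    return sorted({tag.lower() for tag in HASHTAG_RE.findall(str(text))})

def hashtag_cooccurrence_edges(docs):
    """Undirected edges: two hashtags are linked each time they share a post."""
    edges = Counter()
    for doc in docs:
        # Sorted tags make (a, b) and (b, a) count as the same pair
        for a, b in combinations(extract_hashtags(doc), 2):
            edges[(a, b)] += 1
    return edges

def user_hashtag_edges(users, docs):
    """Directed edges: user -> hashtag, weighted by how many posts use it."""
    edges = Counter()
    for user, doc in zip(users, docs):
        for tag in extract_hashtags(doc):
            edges[(user, tag)] += 1
    return edges
```

For example, the posts "Dinner #foodie #pasta" and "#Pasta night #foodie" give a single co-occurrence edge `("foodie", "pasta")` with weight 2, because tags are lower-cased before pairing.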

Make sure to pick the right column names from your dataset and put them in the right place in the function. Also, don't forget to set the filename.
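A mismatched column name only surfaces as a `KeyError` when the column is first used. A small check like the one below fails early with a clearer message and lists the columns that do exist. `check_columns` is a hypothetical helper for illustration; it is not provided by `drs_network`.

```python
import pandas as pd

def check_columns(df, required):
    """Raise a readable error if any required column is missing from df."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError(
            f"Missing column(s) {missing}; available columns: {list(df.columns)}"
        )
```

Once `DOC_COL` and `USR_COL` are set, calling `check_columns(df, [DOC_COL, USR_COL])` confirms both names exist before any network is built.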

If neither network fits your needs, I can help you write a custom network creation function.

Once you have created the network, you will find a new CSV file. You can open this in Gephi using File | Import Spreadsheet.
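Gephi's spreadsheet importer recognises an edges table with `Source` and `Target` columns, plus optional `Type` (`Directed` or `Undirected`) and `Weight` columns. If you write a custom network function, a minimal sketch of saving a compatible CSV with pandas is below. `save_gephi_edgelist` is a hypothetical helper, and this does not claim that `drs_network` writes exactly these columns.

```python
import pandas as pd

def save_gephi_edgelist(edges, path, directed=False):
    """Write {(source, target): weight} as a CSV Gephi can import as an edges table."""
    rows = [
        {
            "Source": s,
            "Target": t,
            "Type": "Directed" if directed else "Undirected",
            "Weight": w,
        }
        for (s, t), w in edges.items()
    ]
    out = pd.DataFrame(rows, columns=["Source", "Target", "Type", "Weight"])
    out.to_csv(path, index=False)
    return out
```

Use `directed=True` for a user-to-hashtag network and the default for a hashtag co-occurrence network.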

## Import packages and Read Data

```python
# Import network utils functions and read the dataset
from drs_network import *
import os
import pandas as pd

df = pd.read_excel("../../input_data/anxiety.xlsx")

# Filter out blank cells in the documents column.
# Use the same column as DOC_COL below: this dataset has no "caption" column.
df = df[df["body"].notna()]
```

## Set Parameters

```python
# Documents/posts column
DOC_COL = "body"

# User/author column
USR_COL = "author"

# Set topic
TOPIC = "network"

# Set project
PROJECT = "anxiety"

# Create a folder for the intermediate network data (no error if it already exists)
os.makedirs(f"../../intermediate_data/{PROJECT}_{TOPIC}", exist_ok=True)
```

```python
# Check the output folder path
f"../../intermediate_data/{PROJECT}_{TOPIC}"
```

'../../intermediate_data/anxiety_network'

```python
# Create a hashtag-to-hashtag network
h2h = create_hashtag2hashtag_network(
    list(df[DOC_COL]),
    save_name = f"../../intermediate_data/{PROJECT}_{TOPIC}/{PROJECT}_h2h_network.csv"
)
```

Network file saved. Open in Gephi for further processing using File | Import Spreadsheet. This is an undirected edgelist.


```python
# Create a user-to-hashtag network
u2h = create_user2hashtag_network(
    df,
    user_column = USR_COL,
    doc_column = DOC_COL,
    save_name = f"../../intermediate_data/{PROJECT}_{TOPIC}/{PROJECT}_u2h_network.csv"
)
```

Network file saved. Open in Gephi for further processing using File | Import Spreadsheet. This is a directed edgelist.


## Gephi processing and file export

Here I went to Gephi, ran all the statistics (including Modularity), and then exported the Nodes table, which contains the modularity class of each node.

You can do a lot of interesting stuff with just the network statistics in Excel, but here we will replicate Freelon's (2018) measurement of proximity.

I saved my Nodes table as "Modularity_Example.csv". Recall that we already created the user-to-hashtag edgelist ("anxiety_u2h_network.csv" in the intermediate data folder), which we will use as our edgelist.

You can also do this with the hashtag-to-hashtag network.
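To make the idea concrete, here is a minimal sketch of a first step behind any partition-level proximity measure: summing the edges that run between each pair of modularity classes. It assumes the Gephi Nodes table has `Id` and `modularity_class` columns and the edgelist has `Source`/`Target` (and optionally `Weight`) columns. It is a plain cross-tabulation, not a reproduction of Freelon's (2018) exact measure or of what `create_proximity_matrix` computes.

```python
import pandas as pd

def partition_edge_counts(nodes, edges, id_col="Id", class_col="modularity_class"):
    """Sum edge weights between every pair of partitions (modularity classes).

    nodes: one row per node, e.g. Gephi's exported Nodes table.
    edges: Source/Target columns and an optional Weight column.
    """
    cls = nodes.set_index(id_col)[class_col]
    e = edges.copy()
    if "Weight" not in e.columns:
        e["Weight"] = 1  # treat every edge as weight 1
    e["src_class"] = e["Source"].map(cls)
    e["tgt_class"] = e["Target"].map(cls)
    # Drop edges whose endpoints are missing from the nodes table
    e = e.dropna(subset=["src_class", "tgt_class"])
    # Rows: source partition; columns: target partition
    return e.pivot_table(index="src_class", columns="tgt_class",
                         values="Weight", aggfunc="sum", fill_value=0)
```

Rows are source partitions and columns are target partitions, so for an undirected network you may want to add the result to its transpose before comparing partitions.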

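```python
# Compute a proximity matrix between the modularity classes found in Gephi,
# using the user-to-hashtag edgelist saved above.
# To use the hashtag-to-hashtag network instead, point edgelist at {PROJECT}_h2h_network.csv.
proximity_matrix = create_proximity_matrix(
    gephi_nodes_table = "./Modularity_Example.csv",
    edgelist = f"../../intermediate_data/{PROJECT}_{TOPIC}/{PROJECT}_u2h_network.csv",
    save_name = "u2h_proximity_matrix.csv"
)
```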
There are also other ways to slice your data. For example, you could use the partitions on users as a variable for keyness analysis. To do that, you will have to join the partition labels from the Nodes table back onto your original dataset.
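A minimal sketch of that join with `pandas.merge` is below. It assumes the Nodes table's `Id` column holds the same user names as your author column and that Gephi named the partition column `modularity_class`; the toy data are made up for illustration.

```python
import pandas as pd

# Toy stand-ins: in the notebook these would be df and the exported Gephi Nodes table
posts = pd.DataFrame({
    "author": ["ana", "ben", "ana", "cai"],
    "body": ["#foodie #pasta", "#sushi", "#pasta", "#vegan"],
})
nodes = pd.DataFrame({
    "Id": ["ana", "ben", "foodie", "pasta"],
    "modularity_class": [0, 1, 0, 0],
})

# Keep every post; authors without a partition (here "cai") get NaN
merged = posts.merge(
    nodes[["Id", "modularity_class"]],
    left_on="author", right_on="Id", how="left",
).drop(columns="Id")

# Posts per partition: a starting point for comparing word use across groups
print(merged.groupby("modularity_class")["body"].count())
```

A left join keeps posts from users who were dropped or unlabelled in Gephi, so you can decide explicitly whether to exclude them from the keyness comparison.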