# Adjacency Matrices

In this notebook, we build the adjacency matrices used to construct the graph. Each matrix contains the retweet network for one 3-day window, with windows starting on each day from April 28 to June 27, 2021.

In [2]:
import pandas as pd
import numpy as np
import pickle
import os
from tqdm import tqdm
import scipy.sparse as sp
from scipy.sparse import lil_matrix

Here we load the tweet data and take a first look at it.

In [3]:
tweets = pd.read_pickle('../fcastrillon/Data/tweets_lite.pkl')

In [4]:
print('Shape:', tweets.shape)
tweets.head()

Shape: (45330426, 4)


| | Author ID | Date | Reference Type | Referenced Tweet Author ID |
|---|---|---|---|---|
| 0 | 1.000014e+18 | 2021/06/28 08:17:49 | retweeted | 352373166.0 |
| 1 | 1.000014e+18 | 2021/06/25 12:00:06 | retweeted | 14834302.0 |
| 2 | 1.000014e+18 | 2021/06/25 11:52:30 | retweeted | 528290945.0 |
| 3 | 1.000014e+18 | 2021/06/24 17:49:16 | retweeted | 753376280.0 |
| 4 | 1.000014e+18 | 2021/06/24 15:21:04 | retweeted | 132102878.0 |


In [5]:
# Return True if the sparse matrix has at least one nonzero entry
def is_matrix_nonzero(matrix):
    return len(matrix.nonzero()[0]) > 0
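
As a quick sanity check of the helper above, here is a minimal sketch on two toy matrices. The 3x3 shape and the single entry are made up for illustration and are not taken from the dataset.

```python
from scipy.sparse import lil_matrix

def is_matrix_nonzero(matrix):
    # Same helper as above: True if at least one nonzero entry exists
    return len(matrix.nonzero()[0]) > 0

empty = lil_matrix((3, 3))      # toy matrix with no entries set
filled = lil_matrix((3, 3))
filled[0, 2] = 1                # a single toy edge

empty_result = is_matrix_nonzero(empty)
filled_result = is_matrix_nonzero(filled)
print(empty_result, filled_result)
```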

We create a dictionary that maps each Author ID to its position in the sorted list of unique users. This position is the row and column index of that user in every adjacency matrix, so we can look it up directly for each tweet and retweet.

In [57]:
tweets['Date'] = pd.to_datetime(tweets['Date'], errors = 'coerce')

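# Sorted list of unique Twitter users (Author IDs cast to int)
users = np.unique(tweets[['Author ID']].values)
users = [ int(x) for x in users ]

# Window start dates during the Paro Nacional (one window per day)
v1_start = '2021-04-28 00:00:00'
v1_end = '2021-06-27 00:00:00'
date1 = pd.date_range(start = v1_start, end = v1_end, freq = 'D')

# Matching window end dates: each window covers 3 calendar days
v2_start = '2021-04-30 23:59:59'
v2_end = '2021-06-29 23:59:59'
date2 = pd.date_range(start = v2_start, end = v2_end, freq = 'D')

# Map each Author ID to its row/column index in the adjacency matrices
user_indices = {user: idx for idx, user in enumerate(users)}
# Window end dates as strings, used in the output filenames
datestr = list(date2.strftime("%d-%m-%Y"))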
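
To see what the two date ranges above produce when paired, here is a minimal sketch that rebuilds them under the hypothetical names `starts` and `ends` and inspects the resulting windows. It only uses the boundaries from the cell above.

```python
import pandas as pd

# Same boundaries as in the cell above
starts = pd.date_range(start='2021-04-28 00:00:00', end='2021-06-27 00:00:00', freq='D')
ends = pd.date_range(start='2021-04-30 23:59:59', end='2021-06-29 23:59:59', freq='D')

# Pair each start date with its end date, as zip(date1, date2) does
windows = list(zip(starts, ends))
n_windows = len(windows)
first_start, first_end = windows[0]

print(n_windows)
print(first_start, '->', first_end)
```

Consecutive windows start one day apart, so neighbouring windows overlap by two days.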

In [7]:
# We save this file for further usage
with open('../fcastrillon/Data/user_indices', 'wb') as file:
    pickle.dump(user_indices, file)
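
For later use, the saved mapping can be read back with `pickle.load`. The following is a minimal sketch with a toy dictionary and a temporary directory; the IDs and the names `toy_indices` and `index_to_user` are made up for illustration. It also builds the inverse mapping, which is handy for translating matrix indices back to Author IDs.

```python
import os
import pickle
import tempfile

# Toy mapping standing in for the real user_indices (IDs are made up)
toy_indices = {101: 0, 205: 1, 999: 2}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'user_indices')
    # Write the mapping the same way as in the cell above
    with open(path, 'wb') as file:
        pickle.dump(toy_indices, file)
    # Read it back
    with open(path, 'rb') as file:
        restored = pickle.load(file)

# Inverse lookup: matrix index -> Author ID
index_to_user = {idx: user for user, idx in restored.items()}
print(index_to_user)
```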

In this _for loop_ we build one adjacency matrix per time window; these matrices are later used to construct the graph.

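Each cell _RT<sub>i,j</sub>_ counts the retweets between user _i_ and user _j_ within the window. Every retweet that one of the two users makes of the other is added to both _RT<sub>i,j</sub>_ and _RT<sub>j,i</sub>_, so the matrix is symmetric.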

This process is repeated for every 3-day window during the Paro Nacional. Consecutive windows start one day apart.

Each adjacency matrix is stored in the Matrices folder of Data.
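
To make the cell definition concrete, here is a minimal sketch on a toy retweet list. It counts every retweet in both directions, as the loop in the next cell does. Note that it uses `coo_matrix` (whose conversion to CSR sums duplicate entries) as a vectorised alternative for illustration; it is not the method used in this notebook, and all IDs and names such as `toy_retweets` are made up.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy data (made up): each pair is (retweeter ID, original author ID)
toy_retweets = [(10, 20), (10, 20), (30, 10)]
toy_user_indices = {10: 0, 20: 1, 30: 2}

rows = np.array([toy_user_indices[u] for u, _ in toy_retweets])
cols = np.array([toy_user_indices[v] for _, v in toy_retweets])
ones = np.ones(len(toy_retweets), dtype=int)

n = len(toy_user_indices)
# Directed counts: entry [i, j] = times user i retweeted user j.
# Duplicate (row, col) pairs are summed when converting to CSR.
directed = coo_matrix((ones, (rows, cols)), shape=(n, n)).tocsr()

# Count every retweet in both directions to get a symmetric matrix
symmetric = (directed + directed.T).tocsr()
print(symmetric.toarray())
```

On the full dataset this kind of construction avoids a Python-level loop over rows, at the cost of holding the index arrays in memory.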

In [58]:
k = 0
os.chdir('../Matrices/')
for start_date, end_date in tqdm(zip(date1, date2)):
    # Tweets posted inside the current window [start_date, end_date]
    test = tweets[(tweets['Date'] >= start_date) & (tweets['Date'] <= end_date)]

    # 'rts' holds the Author ID and the Referenced Tweet Author ID of every
    # retweet in the window whose original author is also in 'users'.
    rts = test.loc[(test["Reference Type"] == "retweeted") & (test["Referenced Tweet Author ID"].isin(users)),
                                                    ["Author ID", "Referenced Tweet Author ID"]]

    # Rename the 'rts' columns for convenience.
    new_column_names = {'Author ID':'user1', 'Referenced Tweet Author ID':'user2'}
    rts = rts.rename(columns = new_column_names)
    
    # Most user pairs never interact, so we use a sparse matrix. The LIL
    # format is used while filling it because it supports incremental updates.
    lil = lil_matrix((len(users), len(users)))

    for row in rts.itertuples(index = False):
        user1, user2 = row.user1, row.user2
    
        idx_user1 = user_indices[user1]
        idx_user2 = user_indices[user2]

        lil[idx_user1, idx_user2] += 1
        lil[idx_user2, idx_user1] += 1

    if is_matrix_nonzero(lil):
        print("Matrix is nonzero")
    else:
        print("Matrix is zero")
    
    # The matrix is sparse, so we convert it to CSR and save it in sparse format.
    A = lil.tocsr()
    filename = f'adj_end_of_{datestr[k]}.csr'
    sp.save_npz(filename, A, compressed = False)
    k += 1

1it [00:09,  9.03s/it]

Matrix is nonzero


2it [00:19,  9.77s/it]

Matrix is nonzero


3it [00:31, 10.72s/it]

Matrix is nonzero


4it [00:43, 11.52s/it]

Matrix is nonzero


5it [00:57, 12.24s/it]

Matrix is nonzero


6it [01:11, 12.98s/it]

Matrix is nonzero


7it [01:25, 13.28s/it]

Matrix is nonzero


8it [01:37, 12.84s/it]

Matrix is nonzero


9it [01:48, 12.10s/it]

Matrix is nonzero


10it [01:58, 11.57s/it]

Matrix is nonzero


11it [02:08, 11.17s/it]

Matrix is nonzero


12it [02:18, 10.69s/it]

Matrix is nonzero


13it [02:27, 10.08s/it]

Matrix is nonzero


14it [02:35,  9.52s/it]

Matrix is nonzero


15it [02:43,  9.13s/it]

Matrix is nonzero


16it [02:51,  8.81s/it]

Matrix is nonzero


17it [02:59,  8.43s/it]

Matrix is nonzero


18it [03:06,  8.17s/it]

Matrix is nonzero


19it [03:14,  7.97s/it]

Matrix is nonzero


20it [03:22,  8.05s/it]

Matrix is nonzero


21it [03:30,  8.08s/it]

Matrix is nonzero


22it [03:38,  8.04s/it]

Matrix is nonzero


23it [03:46,  7.98s/it]

Matrix is nonzero


24it [03:54,  7.93s/it]

Matrix is nonzero


25it [04:02,  8.08s/it]

Matrix is nonzero


26it [04:11,  8.32s/it]

Matrix is nonzero


27it [04:20,  8.55s/it]

Matrix is nonzero


28it [04:29,  8.66s/it]

Matrix is nonzero


29it [04:38,  8.87s/it]

Matrix is nonzero


30it [04:48,  8.98s/it]

Matrix is nonzero


31it [04:57,  9.03s/it]

Matrix is nonzero


32it [05:05,  8.83s/it]

Matrix is nonzero


33it [05:13,  8.56s/it]

Matrix is nonzero


34it [05:20,  8.20s/it]

Matrix is nonzero


35it [05:27,  7.75s/it]

Matrix is nonzero


36it [05:33,  7.29s/it]

Matrix is nonzero


37it [05:39,  6.79s/it]

Matrix is nonzero


38it [05:44,  6.40s/it]

Matrix is nonzero


39it [05:50,  6.08s/it]

Matrix is nonzero


40it [05:55,  5.89s/it]

Matrix is nonzero


41it [06:01,  5.84s/it]

Matrix is nonzero


42it [06:07,  5.83s/it]

Matrix is nonzero


43it [06:12,  5.79s/it]

Matrix is nonzero


44it [06:18,  5.61s/it]

Matrix is nonzero


45it [06:22,  5.32s/it]

Matrix is nonzero


46it [06:27,  5.07s/it]

Matrix is nonzero


47it [06:31,  4.87s/it]

Matrix is nonzero


48it [06:36,  4.80s/it]

Matrix is nonzero


49it [06:40,  4.72s/it]

Matrix is nonzero


50it [06:45,  4.61s/it]

Matrix is nonzero


51it [06:49,  4.48s/it]

Matrix is nonzero


52it [06:53,  4.32s/it]

Matrix is nonzero


53it [06:57,  4.35s/it]

Matrix is nonzero


54it [07:02,  4.46s/it]

Matrix is nonzero


55it [07:07,  4.58s/it]

Matrix is nonzero


56it [07:11,  4.50s/it]

Matrix is nonzero


57it [07:15,  4.32s/it]

Matrix is nonzero


58it [07:19,  4.22s/it]

Matrix is nonzero


59it [07:23,  4.13s/it]

Matrix is nonzero


60it [07:27,  4.07s/it]

Matrix is nonzero


61it [07:31,  7.40s/it]

Matrix is nonzero





In [48]:
datestr[0]

'30-04'
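
The saved matrices can be read back with `sp.load_npz`. Here is a minimal round-trip sketch on a toy matrix in a temporary directory; the matrix values and the example filename are made up. One detail worth knowing: `sp.save_npz` appends `.npz` when the given filename does not already end with it, so a name ending in `.csr` is stored on disk as `.csr.npz`.

```python
import os
import tempfile
import scipy.sparse as sp

# Toy symmetric matrix standing in for one adjacency matrix
toy = sp.csr_matrix([[0, 2], [2, 0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical filename following the pattern used in the loop
    target = os.path.join(tmp_dir, 'adj_end_of_30-04-2021.csr')
    sp.save_npz(target, toy, compressed=False)

    # The file on disk carries the appended '.npz' extension
    saved_files = os.listdir(tmp_dir)
    loaded = sp.load_npz(target + '.npz')

print(saved_files)
print(loaded.toarray())
```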