# Practice Session 04: Networks from text

Author: <font color="blue">Nil Tomas Plans</font>

E-mail: <font color="blue">nil.tomas01@estudiant.upf.edu</font>

Date: <font color="blue">06/10/2023</font>

# 1. Create the directed mention network

In [6]:
import io
import json
import gzip
import csv
import re

from IPython.display import Image

In [11]:
# Leave this code as-is

# Input file
COMPRESSED_INPUT_FILENAME = "CovidLockdownCatalonia.json.gz"

# These are the output files, leave as-is
OUTPUT_ALL_EDGES_FILENAME = "CovidLockdownCatalonia.csv"
OUTPUT_FILTERED_EDGES_FILENAME = "CovidLockdownCatalonia-min-weight-filtered.csv"
OUTPUT_CO_MENTIONS_FILENAME = "CovidLockdownCatalonia-co-mentions.csv"

## 1.1. Extract mentions

In [12]:
# Leave this code as-is

def extract_mentions(text):
    return re.findall("@([a-zA-Z0-9_]{5,20})", text)

print(extract_mentions("RT @DiariDeSabadell: check this post by @EspaiNaturaSbd"))

['DiariDeSabadell', 'EspaiNaturaSbd']


## 1.2. Count mentions

In [13]:
#code to read the compressed input file and create the mentions_counter dictionary.
mentions_counter = {}
with gzip.open(COMPRESSED_INPUT_FILENAME, "rt", encoding="utf-8") as input_file:
    for line in input_file:
        tweet = json.loads(line)
        author = tweet["user"]["screen_name"]
        message = tweet["full_text"]
        
        mentions = extract_mentions(message)
        for mention in mentions:
            key = (author,mention)
            if key in mentions_counter:
                mentions_counter[key] += 1
            else:
                mentions_counter[key] = 1

print (mentions_counter[("BCN_Mobilitat", "TMBinfo")])


8


In [46]:
#code to print all the pairs of accounts (u,v) in which account u mentioned account v, and account v mentioned account u.
mentions_counter = {}
mentioned_each_other = set()

with gzip.open(COMPRESSED_INPUT_FILENAME, "rt", encoding="utf-8") as input_file:
    for line in input_file:
        tweet = json.loads(line)
        author = tweet["user"]["screen_name"]
        message = tweet["full_text"]
        
        mentions = extract_mentions(message)
        
        for mention in mentions:
            key = (author, mention)
            re_key = (mention, author) #creo una key a l'inrevés
            
            
            if re_key in mentions_counter:#comprovem que la key inversa exista
                if re_key not in mentioned_each_other: #si aun no se han mencionado
                    print(f"La cuenta @ {author} y @{mention} se han mencionado mútuamente")#imprimimos
                    mentioned_each_other.add(re_key)#i ara l'afegim
            else:#si no existeix la key inversa
                if key in mentions_counter: #actualitzem el diccionari
                    mentions_counter[key] += 1
                else:
                    mentions_counter[key] = 1




La cuenta @ CanalTerrassa y @eseiaat_upc se han mencionado mútuamente
La cuenta @ EspaiNaturaSbd y @DiariDeSabadell se han mencionado mútuamente
La cuenta @ infolliteras y @infolliteras se han mencionado mútuamente
La cuenta @ MuseuMaritim y @MuseuMaritim se han mencionado mútuamente
La cuenta @ LaVanguardia y @LaVanguardia se han mencionado mútuamente
La cuenta @ josepjover y @josepjover se han mencionado mútuamente
La cuenta @ dmegiasuoc y @AuntySue se han mencionado mútuamente
La cuenta @ TGNAjuntament y @TGNAjuntament se han mencionado mútuamente
La cuenta @ manelmarquez y @manelmarquez se han mencionado mútuamente
La cuenta @ armayones y @armayones se han mencionado mútuamente
La cuenta @ diaridtarragona y @diaridtarragona se han mencionado mútuamente
La cuenta @ nlbigas y @bbglab se han mencionado mútuamente
La cuenta @ OscarAllue26 y @Laporteriabtv se han mencionado mútuamente
La cuenta @ JuntsxCatBCN y @elsa_artadi se han mencionado mútuamente
La cuenta @ MargaXrepublica y @Mar

In [15]:
# Leave this code as-is

lines_written = 0
with io.open(OUTPUT_ALL_EDGES_FILENAME, "w") as output_file:
    writer = csv.writer(output_file, delimiter='\t', quotechar='"', lineterminator='\n')
    writer.writerow(["Source", "Target", "Weight"])
    for key in mentions_counter:
        author = key[0]
        mention = key[1]
        weight = mentions_counter[key]
        writer.writerow([author, mention, weight])
        lines_written += 1
        
print("Wrote %d lines to file %s" % (lines_written, OUTPUT_ALL_EDGES_FILENAME))

Wrote 34025 lines to file CovidLockdownCatalonia.csv


In [16]:
#code to create a file named OUTPUT_FILTERED_EDGES_FILENAME containing all (author, mention) pairs with a value greater or equal to 2
lines_written = 0
with io.open(OUTPUT_FILTERED_EDGES_FILENAME, "w") as output_file:
    writer = csv.writer(output_file, delimiter='\t', quotechar='"', lineterminator='\n')
    writer.writerow(["Source", "Target", "Weight"])
    for key in mentions_counter:
        author = key[0]
        mention = key[1]
        weight = mentions_counter[key]
        if weight >=2:#si es cumpleix la condició que es mencionen més de 2 cops
            writer.writerow([author, mention, weight])
            lines_written += 1
        
print("Wrote %d lines to file %s" % (lines_written, OUTPUT_FILTERED_EDGES_FILENAME))

Wrote 1338 lines to file CovidLockdownCatalonia-min-weight-filtered.csv


# 2. Visualize the directed mention network

## 2.1. Visualize the largest connected component


<font size="+1"> La mida d.aquesta component connexa és de 699 nodes, i el percentatge de nodes respecte el graf inicial és del 43,8%. El diametre és igual a 20 </font>

In [17]:
# Adjust width/height as needed

Image(url="mentions-largest-cc.png", width=1200)

<font size="+1">Els perfils que estan més mencionats són: el nacional, QuimTorraiPla, emergenciescat, govern,sanchezcastejon i vilaweb.
Aquests van ser molt mencionats durant el període en què es van recollir, durant la pandèmia, i per tant eren el focus de totes les informacions del moment. 
El president de Catalunya i el dEspanya, el demergenciescat o diaris digitals com són el cas del nacionalcat i vilaweb.
També podem observar usuaris que en mencionen molts altres com: emocionycambio, SpanishDan1 principalment. És possible que ho fagin per
difondre informació sigui o no certa amb lobjectiu darribar al maxim nombre de gent tant per ajudar i col·laborar o per difondre missatges de por i mites falsos</font>

## 2.2. Cluster the largest connected component


<font size="+1" >En el cluster de @salutcat, alguns altres perfils que hi estan relacionats són la Clínica Teknon i lhospital del mar, pel fet de ser hospitals i necessiten estar al dia de 
les notícies i canvis dultima hora que es decideixi des del departament de salut. O també la FececFederació (Federació catalana dentitats contra el cancer) o ajuntaments com: el de Castelldefels o Palamós.
També, necessiten estar al corrent de les informacions actualitzades per aplicar protocols de confinament com va ser el cas</font>

## 2.3. Examine degree distributions

In [9]:
# Adjust width/height as needed

display(Image(url="mentions-largest-cc-indegree.png", width=400))

display(Image(url="mentions-largest-cc-outdegree.png", width=400))

<font size="+1">Des del meu punt de vista aquests dos gràfics mostren els numero de grafs duna banda que tenen vertexs que apunten a ells i per altra banda vèrtexs que surten dells. 
Aleshores podem extreure que la majoria dels nodes tenen entre 0 i 2 edges que apunten a ells i entre 0 i 2.5 edges que surten dells .</font>

# 3. Create the undirected co-mention network

In [25]:
#create the co_mentions_counter
co_mentions_counter = {}
with gzip.open(COMPRESSED_INPUT_FILENAME, "rt", encoding="utf-8") as input_file:
    for line in input_file:
        tweet = json.loads(line)
        author = tweet["user"]["screen_name"]
        message = tweet["full_text"]

        mentions=extract_mentions(message)
        for mention1 in mentions:
            for mention2 in mentions:
                if mention1 < mention2:
                    key = (mention1, mention2)
                    if key in co_mentions_counter:#si es cumpleix la condició actualitzem el diccionari comentions_counter
                        co_mentions_counter[key] += 1
                    else:
                        co_mentions_counter[key] = 1

                    


In [26]:
# KEEP AS-IS

print(co_mentions_counter[('emergenciescat', 'govern')])

31


In [39]:
#print all pairs of accounts that have been co-mentioned 20 times or more.
for key, contar in co_mentions_counter.items(): #va iterant key i contar 
    account1,account2=key #assigno a cada account un usuari de la tupla
    if contar >= 20:
        print("co-mentioned ",account1 ,"con",account2," un total de ",contar," cops")
        

co-mentioned  VilaWeb con mossos  un total de  20  cops
co-mentioned  QuimTorraiPla con govern  un total de  92  cops
co-mentioned  sanchezcastejon con tjparfitt  un total de  28  cops
co-mentioned  elnacionalcat con joseantich  un total de  90  cops
co-mentioned  QuimTorraiPla con tjparfitt  un total de  59  cops
co-mentioned  QuimTorraiPla con sanchezcastejon  un total de  25  cops
co-mentioned  emergenciescat con govern  un total de  31  cops
co-mentioned  josepcosta con sanchezcastejon  un total de  49  cops
co-mentioned  eldiarioes con iescolar  un total de  28  cops
co-mentioned  gencat con govern  un total de  105  cops
co-mentioned  mossos con semgencat  un total de  44  cops
co-mentioned  elnacionalcat con juansrod1  un total de  30  cops
co-mentioned  QuimTorraiPla con emergenciescat  un total de  75  cops
co-mentioned  Antoni_Gelonch con sanchezcastejon  un total de  106  cops


In [42]:
lines_written = 0
with io.open(OUTPUT_CO_MENTIONS_FILENAME, "w") as output_file:
    writer = csv.writer(output_file, delimiter='\t', quotechar='"', lineterminator='\n')
    writer.writerow(["Source", "Target", "Weight"])
    for key in co_mentions_counter:
        author = key[0]
        mention = key[1]
        weight = co_mentions_counter[key]
        writer.writerow([author, mention, weight])
        lines_written += 1
        
print("Wrote %d lines to file %s" % (lines_written, OUTPUT_CO_MENTIONS_FILENAME))

Wrote 7816 lines to file CovidLockdownCatalonia-co-mentions.csv


# 4. Visualize the undirected co-mention network in Cytoscape


In [43]:
# Adjust width/height as needed

Image(url="co-mentions-min-degree-15.png", width=1200)

<font size="+1">La comunitat o subgraf que he triat està situat al lateral esquerre formant una esfera. Està formada per usuaris que cadascun representa un diari, com per exemple newYorkTimes11, LeMondefr, o la BBCScotlandNews. Per tant tots estan molt relacionats entre ells, ja que tots tenen com a funció principal informar a la població. Una altra comunitat de nodes, però que no són subgrafs, són els usuaris de emergenciescat, govern, gencat, QuimTorraiPla en què tots formen part del govern de catalunya d'alguna forma o altre i per tant estan interrelacionats.</font>


# DELIVER (individually)

Deliver a zip file containing:

* Your code as a Python notebook (a `.ipynb` file).
   * Remove all unnecessary elements
   * Add comments when needed
* Any png files that you inserted in the notebook

## Extra points available

For more learning and extra points, create a file `account-type.csv` containing the type of account of the top 50 accounts with the most mentions. You can use types "journalist", "media", "politician", "government institution", "individual", "health-related", etc. which you should categorize manually. Create a visualization of the **mentions** graph either including only these 50 accounts, or including more accounts but highlighting these top 50 with colors. Use broad categories as needed and **do not worry if there are some ambiguities in the categorization,** e.g., if you are not 100% sure on whether someone should be in one category or another; just do your best.

**Note:** if you go for the extra points, add ``<font size="+2" color="blue">Additional results: account types</font>`` at the top of your notebook.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>


<font size="+2" color="#003300">I hereby declare that, except for the code provided by the course instructors, all of my code, text, and figures were produced by myself.</font>
