# Visualization through graphs

Based on Text Network Analysis (ref: [Dmitry Paranyushkin](https://noduslabs.com/wp-content/uploads/2019/06/InfraNodus-Paranyushkin-WWW19-Conference.pdf?ref=github))

We will build a DataFrame of connections between nearby words and convert it into a graph with the pelote library.

The graph is then processed and displayed by ipysigma, a widget that uses sigma.js to render graphs in Jupyter notebooks.

In [9]:
%pip install ipysigma

Note: you may need to restart the kernel to use updated packages.


In [10]:
%pip install networkx

Note: you may need to restart the kernel to use updated packages.


In [11]:
%pip install pelote

Note: you may need to restart the kernel to use updated packages.


In [12]:
from pelote import edges_table_to_graph
import pandas as pd 
import numpy as np
from ipysigma import Sigma
import wordpreproc as wp
from nltk.util import ngrams
from gensim.corpora import Dictionary
import networkx as nx

In [13]:
full_data = pd.read_csv("../data/processed/balanced_train.csv",index_col=0).reset_index(drop=True)
full_files = full_data.iloc[:,0]
highschool_files = full_data.query("target == 0").iloc[:,0]
college_files = full_data.query("target == 1").iloc[:,0]
adult_files = full_data.query("target == 2").iloc[:,0]
highschool_files

0      dream huge dark house random scary guy laptop ...
1      dream write essay spanish embarrass thing ever...
2      bizarre dream dreamt like friend house like th...
3      last night dreamt somehow get pregnant make gu...
4      tell crush jose water give bottle middle discu...
                             ...                        
395    really scary dream go see movie day later supp...
396    dreamt bee also vampires begin vampire others ...
397    last night pretty crazy dream dream three four...
398    last night first dream get new flavor ice crea...
399    many dream last night first dream study hall a...
Name: text_cleaned, Length: 400, dtype: object

# Example use of the graph libraries

In [14]:
# each row is one connection: node 1 is linked to node 2 with a strength of "weight"
df = pd.DataFrame(np.array([[1, 2,3], [2, 3, 3],[3,4,5]]),columns=["node1","node2","weight"])
df


|   | node1 | node2 | weight |
|---|-------|-------|--------|
| 0 | 1 | 2 | 3 |
| 1 | 2 | 3 | 3 |
| 2 | 3 | 4 | 5 |


In [15]:
prueba = edges_table_to_graph(edge_table=df,edge_source_col="node1",edge_target_col="node2",edge_weight_attr="weight",directed=True,edge_data=["weight"])
Sigma(prueba,edge_size="weight")

Sigma(nx.DiGraph with 4 nodes and 3 edges)
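
As a cross-check of the node and edge counts reported above, the same toy edge list can be built with plain networkx. This is only a sketch: the names `toy_edges` and `toy_graph` are illustrative and not part of the notebook, and it does not go through pelote or ipysigma.

```python
import networkx as nx

# Same toy edge list as the DataFrame above: (source, target, weight)
toy_edges = [(1, 2, 3), (2, 3, 3), (3, 4, 5)]

toy_graph = nx.DiGraph()
# add_weighted_edges_from stores the third item of each tuple as the "weight" attribute
toy_graph.add_weighted_edges_from(toy_edges)

print(toy_graph.number_of_nodes(), toy_graph.number_of_edges())
print(toy_graph[3][4]["weight"])
```

The resulting `DiGraph` has one node per distinct value in the `node1` and `node2` columns and one directed edge per row.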

In [16]:
# Set up the preprocessor
new_prep = wp.WordPrep()
new_prep.update_stopwords()

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/manuel/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [17]:
def processed_text_todf(text_proc):
    """
    processed_text_todf(processed_text) --> DataFrame()

    Converts a processed text (a list of tokens) into a DataFrame holding the connections between nodes and their weight.

    Connections between words:
    Example sentence: word1 word2 word3 word4
    Weight 3: each word and the word right after it, word1-word2 (bigram)
    Weight 2: word1-word3
    Weight 1: word1-word4

    (the farther apart two words are, the lower the weight)

    """
    # Distance 1: each word paired with the next one
    df_data2 = pd.DataFrame()
    df_data2["word"] = text_proc[:-1]
    df_data2["n_gram"] = [el[1] for el in ngrams(text_proc, 2)]
    df_data2["weight"] = 3

    # Distance 2: each word paired with the word two positions ahead
    df_data3 = pd.DataFrame()
    df_data3["word"] = text_proc[:-2]
    df_data3["n_gram"] = [el[2] for el in ngrams(text_proc, 3)]
    df_data3["weight"] = 2

    # Distance 3: each word paired with the word three positions ahead
    df_data4 = pd.DataFrame()
    df_data4["word"] = text_proc[:-3]
    df_data4["n_gram"] = [el[3] for el in ngrams(text_proc, 4)]
    df_data4["weight"] = 1

    df_all = pd.concat([df_data2,df_data3,df_data4],axis=0)
    return df_all
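
The weighting scheme described in the docstring can also be written as a sliding window in plain Python. The sketch below is an illustration under assumptions, not the notebook's implementation: `window_edges` and `max_distance` are hypothetical names, and it returns tuples instead of a DataFrame.

```python
def window_edges(tokens, max_distance=3):
    """Return (source, target, weight) tuples for tokens up to max_distance apart."""
    edges = []
    for distance in range(1, max_distance + 1):
        # Closer words get a higher weight: distance 1 -> 3, 2 -> 2, 3 -> 1
        weight = max_distance + 1 - distance
        for i in range(len(tokens) - distance):
            edges.append((tokens[i], tokens[i + distance], weight))
    return edges


example = window_edges(["word1", "word2", "word3", "word4"])
for edge in example:
    print(edge)
```

For a four-token sentence this yields three weight-3 edges, two weight-2 edges and a single weight-1 edge, which matches the row counts the three n-gram passes would produce.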

In [18]:
text_new = "This is a test text to be processed and visualized in a graph. This is a repeated repeated word word, here it is again word cat"
text_proc = new_prep.corpus_text_preprocessing(text_new).split()
print(text_proc)

['test', 'text', 'process', 'visualize', 'graph', 'repeat', 'repeat', 'word', 'word', 'word', 'cat']


In [19]:
edges_df = processed_text_todf(text_proc)
edges_df

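|   | word | n_gram | weight |
|---|------|--------|--------|
| 0 | test | text | 3 |
| 1 | text | process | 3 |
| 2 | process | visualize | 3 |
| 3 | visualize | graph | 3 |
| 4 | graph | repeat | 3 |
| 5 | repeat | repeat | 3 |
| 6 | repeat | word | 3 |
| 7 | word | word | 3 |
| 8 | word | word | 3 |
| 9 | word | cat | 3 |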


In [20]:
prueba = edges_table_to_graph(edge_table=edges_df,edge_source_col="word",edge_target_col="n_gram",edge_weight_attr="weight",directed=True,edge_data=["weight"])
Sigma(prueba,edge_size="weight",node_metrics=["louvain"], node_color="louvain")

Sigma(nx.DiGraph with 8 nodes and 17 edges)
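
The edge table for these 11 tokens has more rows than the graph has edges, because a `DiGraph` keeps at most one edge per ordered pair of nodes. The sketch below counts rows, distinct pairs and distinct words in plain Python. Summing the weights of repeated pairs is an assumption made here for illustration; the source does not show how pelote resolves duplicate rows.

```python
from collections import Counter

tokens = ['test', 'text', 'process', 'visualize', 'graph',
          'repeat', 'repeat', 'word', 'word', 'word', 'cat']

pair_weights = Counter()  # (source, target) -> summed weight (assumed aggregation)
rows = 0                  # number of rows the edge table would contain
for distance, weight in ((1, 3), (2, 2), (3, 1)):
    for i in range(len(tokens) - distance):
        pair_weights[(tokens[i], tokens[i + distance])] += weight
        rows += 1

nodes = {word for pair in pair_weights for word in pair}
print(rows, len(pair_weights), len(nodes))  # 27 17 8
```

The 27 rows collapse into 17 distinct directed pairs over 8 distinct words, which agrees with the `8 nodes and 17 edges` reported by the widget.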

In [21]:
# Flatten every high school dream into a single token list
lista_final = []
for i in range(len(highschool_files)):
    lista_final += highschool_files.iloc[i].split()
highschool_edges = processed_text_todf(lista_final)

# Edge table for the first dream only
highschool_edges1 = processed_text_todf(highschool_files.iloc[0].split())

# Basic graph

In [22]:
# Settings for the ForceAtlas2 layout algorithm
forceatlas2_dict = {
    'adjustSizes':False,
    'barnesHutOptimize':False,
    'barnesHutTheta':0.5,
    'edgeWeightInfluence':0.5,
    'gravity':1,
    'linLogMode':False,
    'outboundAttractionDistribution':False,
    'scalingRatio':2, 
    'slowDown':10,
    'strongGravityMode':False
    }

prueba1 = edges_table_to_graph(edge_table=highschool_edges1,edge_source_col="word",edge_target_col="n_gram",edge_weight_attr="weight",directed=True,edge_data=["weight"])
dream1 = Sigma(prueba1,node_size=nx.betweenness_centrality(prueba1),node_metrics=["louvain"], node_color="louvain",layout_settings=forceatlas2_dict)
dream1

Sigma(nx.DiGraph with 90 nodes and 356 edges)
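
Node size in the cell above comes from `nx.betweenness_centrality`, which returns a dictionary mapping each node to a score that grows with the number of shortest paths passing through it. A minimal sketch on a three-word chain (the words and the name `chain` are illustrative only):

```python
import networkx as nx

# Directed chain: dream -> house -> dark
chain = nx.DiGraph([("dream", "house"), ("house", "dark")])

centrality = nx.betweenness_centrality(chain)
most_central = max(centrality, key=centrality.get)

print(centrality)
print(most_central)
```

Only the middle word lies on a shortest path between two other words, so it gets the highest score and would be drawn as the largest node.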

# Styled graph

In [23]:
highschool_edges_2 = processed_text_todf(highschool_files[0].split())
prueba2 = edges_table_to_graph(edge_table=highschool_edges_2,edge_source_col="word",edge_target_col="n_gram",edge_weight_attr="weight",directed=True,edge_data=["weight"])
dream2 = Sigma(
    prueba2,
    node_size=nx.betweenness_centrality(prueba2),
    node_metrics=["louvain"], 
    node_color="louvain",
    layout_settings=forceatlas2_dict,
    node_label_size=nx.betweenness_centrality(prueba2),
    label_font='cursive',
    edge_color_from="source"
    )
dream2

Sigma(nx.DiGraph with 90 nodes and 356 edges)

# Several dreams

In [16]:
lista_varios = []
for i in range(3):
    lista_varios += highschool_files[i].split()
highschool_varios = processed_text_todf(lista_varios)
highschool_varios

prueba_varios = edges_table_to_graph(edge_table=highschool_varios,edge_source_col="word",edge_target_col="n_gram",edge_weight_attr="weight",directed=True,edge_data=["weight"])

dream_varios = Sigma(
    prueba_varios,
    node_size=nx.betweenness_centrality(prueba_varios),
    node_metrics=["louvain"], 
    node_color="louvain",
    layout_settings=forceatlas2_dict,
    node_label_size=nx.betweenness_centrality(prueba_varios),
    label_font='cursive',
    edge_color_from="source"
    )
dream_varios

Sigma(nx.DiGraph with 178 nodes and 779 edges)
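
Concatenating several dreams into one token list also links the last words of one dream to the first words of the next. Whether that matters depends on the analysis. As one possible variation, not the approach used in this notebook, the sketch below builds the edges dream by dream so that no connection crosses a dream boundary. The name `dream_pairs` and the two sample strings are hypothetical.

```python
def dream_pairs(dreams, max_distance=3):
    """Yield (source, target, weight) per dream, never across two dreams."""
    for dream in dreams:
        tokens = dream.split()
        for distance in range(1, max_distance + 1):
            weight = max_distance + 1 - distance
            # zip stops at the shorter sequence, so pairs stay inside this dream
            for source, target in zip(tokens, tokens[distance:]):
                yield source, target, weight


sample_dreams = ["dream dark house", "write essay spanish"]
edges = list(dream_pairs(sample_dreams))
print(len(edges))
print(("house", "write", 3) in edges)
```

The resulting tuples could be loaded into a DataFrame with the `word`, `n_gram` and `weight` columns expected by the cells above.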

# Removing sparsely connected nodes to display more dreams

## High school connections

In [24]:
lista_varios = []
for i in range(100):
    lista_varios += highschool_files[i].split()
highschool_varios = processed_text_todf(lista_varios)

prueba_varios = edges_table_to_graph(edge_table=highschool_varios,edge_source_col="word",edge_target_col="n_gram",edge_weight_attr="weight",directed=True,edge_data=["weight"])

in_degree = dict(prueba_varios.in_degree())

# Filter that removes nodes with fewer than 70 incoming connections
for node, degree in in_degree.items():
    if degree < 70:
        prueba_varios.remove_node(node)

dream_varios = Sigma(
    prueba_varios,
    node_size=nx.betweenness_centrality(prueba_varios),
    node_metrics=["louvain"], 
    node_color="louvain",
    layout_settings=forceatlas2_dict,
    node_label_size=nx.betweenness_centrality(prueba_varios),
    label_font='cursive',
    edge_color_from="source"
    )
dream_varios

Sigma(nx.DiGraph with 48 nodes and 1,221 edges)
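
The in-degree filter can be tried on a toy graph to see what it keeps. In this sketch the graph, the threshold and the names `toy`, `min_in_degree` and `sparse_nodes` are made up for illustration. Note that `in_degree()` called without a weight argument counts distinct incoming edges and ignores the `weight` attribute.

```python
import networkx as nx

toy = nx.DiGraph([("a", "c"), ("b", "c"), ("d", "c"), ("a", "b")])
min_in_degree = 2  # hypothetical threshold for this toy graph

# Collect the nodes first, then remove them, so the graph is not modified while iterating
sparse_nodes = [node for node, degree in toy.in_degree() if degree < min_in_degree]
toy.remove_nodes_from(sparse_nodes)

print(sorted(toy.nodes()))
print(toy.number_of_edges())
```

Removing a node also removes every edge attached to it, so a surviving node may end up with fewer connections than the threshold that kept it.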

## College connections

In [18]:
lista_varios = []
for i in range(100):
    lista_varios += college_files.iloc[i].split()
college_varios = processed_text_todf(lista_varios)

prueba_varios = edges_table_to_graph(edge_table=college_varios,edge_source_col="word",edge_target_col="n_gram",edge_weight_attr="weight",directed=True,edge_data=["weight"])

in_degree = dict(prueba_varios.in_degree())

for node, degree in in_degree.items():
    if degree < 70:
        prueba_varios.remove_node(node)

dream_varios = Sigma(
    prueba_varios,
    node_size=nx.betweenness_centrality(prueba_varios),
    node_metrics=["louvain"], 
    node_color="louvain",
    layout_settings=forceatlas2_dict,
    node_label_size=nx.betweenness_centrality(prueba_varios),
    label_font='cursive',
    edge_color_from="source"
    )
dream_varios

Sigma(nx.DiGraph with 43 nodes and 883 edges)

## Adult connections

In [19]:
lista_varios = []
for i in range(len(adult_files)):
    lista_varios += adult_files.iloc[i].split()
adult_varios = processed_text_todf(lista_varios)


prueba_varios = edges_table_to_graph(edge_table=adult_varios,edge_source_col="word",edge_target_col="n_gram",edge_weight_attr="weight",directed=True,edge_data=["weight"])

in_degree = dict(prueba_varios.in_degree())
for node, degree in in_degree.items():
    if degree < 300:
        prueba_varios.remove_node(node)

dream_varios = Sigma(
    prueba_varios,
    node_size=nx.betweenness_centrality(prueba_varios),
    node_metrics=["louvain"], 
    node_color="louvain",
    layout_settings=forceatlas2_dict,
    node_label_size=nx.betweenness_centrality(prueba_varios),
    label_font='cursive',
    edge_color_from="source"
    )
dream_varios

Sigma(nx.DiGraph with 43 nodes and 1,762 edges)

# All

In [20]:
lista_varios = []
for i in range(len(full_files)):
    lista_varios += full_files.iloc[i].split()
full_varios = processed_text_todf(lista_varios)


prueba_varios = edges_table_to_graph(edge_table=full_varios,edge_source_col="word",edge_target_col="n_gram",edge_weight_attr="weight",directed=True,edge_data=["weight"])

in_degree = dict(prueba_varios.in_degree())
for node, degree in in_degree.items():
    if degree < 500:
        prueba_varios.remove_node(node)

dream_varios = Sigma(
    prueba_varios,
    node_size=nx.betweenness_centrality(prueba_varios),
    node_metrics=["louvain"], 
    node_color="louvain",
    layout_settings=forceatlas2_dict,
    node_label_size=nx.betweenness_centrality(prueba_varios),
    label_font='cursive',
    edge_color_from="source"
    )
dream_varios

Sigma(nx.DiGraph with 55 nodes and 2,935 edges)

# Final function: from text to graph visualization

In [21]:
def dream_graph(your_dream_text): 
    """
    dream_graph(your_raw_text) --> graph visualization with ipysigma

    Final function that goes from raw text to a graph visualization

    """
    processed_dream = new_prep.corpus_text_preprocessing(your_dream_text).split()
    print("Your text processed:",processed_dream)

    your_forceatlas2_dict = {
        'adjustSizes':False,
        'barnesHutOptimize':False,
        'barnesHutTheta':0.5,
        'edgeWeightInfluence':0.5,
        'gravity':1,
        'linLogMode':False,
        'outboundAttractionDistribution':False,
        'scalingRatio':1,
        'slowDown':10,
        'strongGravityMode':False
        }

    your_dream = processed_text_todf(processed_dream)
    # Named "graph" so the local variable does not shadow the function name
    graph = edges_table_to_graph(edge_table=your_dream,edge_source_col="word",edge_target_col="n_gram",edge_weight_attr="weight",directed=True,edge_data=["weight"])
    your_dream_graph = Sigma(
        graph,
        node_size=nx.betweenness_centrality(graph),
        node_metrics=["louvain"], 
        node_color="louvain",
        layout_settings=your_forceatlas2_dict,
        node_label_size=nx.betweenness_centrality(graph),
        label_font='cursive',
        edge_color_from="source"
        )
    return your_dream_graph


## Testing the final function

In [22]:
# You can also try other kinds of text, for example a short story by Edgar Allan Poe
dream_graph(
    """
    "Listen to me," said the Demon, as he placed his hand upon my head. 
    "There is a spot upon this accursed earth which thou hast never yet beheld 
    And if by any chance thou hast beheld it, it must have been in one of those vigorous dreams 
    which come like the Simoon upon the brain of the sleeper who hath lain down to sleep among the 
    forbidden sunbeams --among the sunbeams, I say, which slide from off the solemn columns of the melancholy temples
    in the wilderness. The region of which I speak is a dreary region in Libya, by the borders of the river Zaire. 
    And there is no quiet there, nor silence.
    "The waters of the river have a saffron and sickly hue --and they flow not onwards to the sea, 
    but palpitate forever and forever beneath the red eye of the sun with a tumultuous and convulsive motion. 
    For many miles on either side of the river's oozy bed is a pale desert of gigantic water-lilies. 
    They sigh one unto the other in that solitude, and stretch towards the heaven their long ghastly necks, 
    and nod to and fro their everlasting heads. And there is an indistinct murmur which cometh out from among them like 
    the rushing of subterrene water. And they sigh one unto the other.
    "But there is a boundary to their realm --the boundary of the dark, horrible, lofty forest. 
    There, like the waves about the Hebrides, the low underwood is agitated continually. 
    But there is no wind throughout the heaven. And the tall primeval trees rock eternally hither and 
    thither with a crashing and mighty sound. And from their high summits, one by one, drop everlasting dews. 
    And at the roots strange poisonous flowers lie writhing in perturbed slumber. And overhead, with a rustling and loud noise, 
    the grey clouds rush westwardly forever, until they roll, a cataract, over the fiery wall of the horizon. 
    But there is no wind throughout the heaven. And by the shores of the river Zaire there is neither quiet nor silence.
    "It was night, and the rain fell; and, falling, it was rain, but, having fallen, it was blood. 
    And I stood in the morass among the tall lilies, and the rain fell upon my head --and the lilies sighed one unto 
    the other in the solemnity of their desolation.
    "And, all at once, the moon arose through the thin ghastly mist, and was crimson in color. 
    And mine eyes fell upon a huge grey rock which stood by the shore of the river, and was litten by the light of the moon. 
    And the rock was grey, and ghastly, and tall, --and the rock was grey. Upon its front were characters engraven in the stone; 
    and I walked through the morass of water-lilies, until I came close unto the shore, that I might read the characters upon the stone. 
    But I could not decypher the characters. And I was going back into the morass, when the moon shone with a fuller red, and I turned and looked again upon the rock, and upon the characters --and the characters were DESOLATION.
    "And I looked upwards, and there stood a man upon the summit of the rock, and I hid myself among the water-lilies that I might discover the actions of the man. 
    And the man was tall and stately in form, and was wrapped up from his shoulders to his feet in the toga of old Rome. And the outlines of his figure were indistinct --but his features were the features of a Deity; for the mantle of the night, and of the mist, and of the moon, and of the dew, had left uncovered the features of his face. And his brow was lofty with thought, and his eye wild with care; and, in the few furrows upon his cheek I read the fables of sorrow, and weariness, and disgust with mankind, and a longing after solitude. And the moon shone upon his face, and upon the features of his face, and oh! they were more beautiful than the airy dreams which hovered about the souls of the daughters of Delos!
    "And the man sat down upon the rock, and leaned his head upon his hand, and looked out upon the desolation. 
    He looked down into the low unquiet shrubbery, and up into the tall primeval trees, and up higher at the rustling heaven, 
    and into the crimson moon. And I lay close within shelter of the lilies, and observed the actions of the man. 
    And the man trembled in the solitude --but the night waned and he sat upon the rock.
    "And the man turned his attention from the heaven, and looked out upon the dreary river Zaire, 
    and upon the yellow ghastly waters, and upon the pale legions of the water-lilies. 
    And the man listened to the sighs of the water-lilies, and of the murmur that came up from among them. 
    And I lay close within my covert and observed the actions of the man. And the man trembled in the solitude 
    --but the night waned and he sat upon the rock.
    "Then I went down into the recesses of the morass, and waded afar in among the wilderness of the lilies, 
    and called unto the hippopotami which dwelt among the fens in the recesses of the morass. And the hippopotami 
    heard my call, and came, with the behemoth, unto the foot of the rock, and roared loudly and fearfully beneath the moon. 
    And I lay close within my covert and observed the actions of the man. And the man trembled in the solitude --but the night waned and he sat upon the rock.
    "Then I cursed the elements with the curse of tumult; and a frightful tempest gathered in the heaven where before there 
    had been no wind. And the heaven became livid with the violence of the tempest --and the rain beat upon the head of the man 
    --and the floods of the river came down --and the river was tormented into foam --and the water-lilies shrieked within their beds 
    --and the forest crumbled before the wind --and the thunder rolled, --and the lightning fell --and the rock rocked to its foundation. 
    And I lay close within my covert and observed the actions of the man. And the man trembled in the solitude -- but the night waned and he sat upon the rock.
    "Then I grew angry and cursed, with the curse of silence, the river, and the lilies, and the wind, and the forest, and the heaven, 
    and the thunder, and the sighs of the water-lilies. And they became accursed and were still. And the moon ceased to totter in its 
    pathway up the heaven --and the thunder died away --and the lightning did not flash --and the clouds hung motionless --and the waters 
    sunk to their level and remained --and the trees ceased to rock --and the water-lilies sighed no more --and the murmur was heard no 
    longer from among them, nor any shadow of sound throughout the vast illimitable desert. And I looked upon the characters of the rock, 
    and they were changed --and the characters were SILENCE.
    "And mine eyes fell upon the countenance of the man, and his countenance was wan with terror. 
    And, hurriedly, he raised his head from his hand, and stood forth upon the rock, and listened. 
    But there was no voice throughout the vast illimitable desert, and the characters upon the rock were SILENCE. 
    And the man shuddered, and turned his face away, and fled afar off, and I beheld him no more."
    """
    )


Your text processed: ['listen', 'say', 'demon', 'place', 'hand', 'upon', 'head', 'spot', 'upon', 'accursed', 'earth', 'thou', 'hast', 'never', 'yet', 'behold', 'chance', 'thou', 'hast', 'beheld', 'must', 'one', 'vigorous', 'dream', 'come', 'like', 'simoon', 'upon', 'brain', 'sleeper', 'hath', 'lain', 'sleep', 'among', 'forbidden', 'sunbeam', 'among', 'sunbeam', 'say', 'slide', 'solemn', 'column', 'melancholy', 'temples', 'wilderness', 'region', 'speak', 'dreary', 'region', 'libya', 'border', 'river', 'zaire', 'quiet', 'silence', 'water', 'river', 'saffron', 'sickly', 'hue', 'flow', 'onwards', 'sea', 'palpitate', 'forever', 'forever', 'beneath', 'red', 'eye', 'sun', 'tumultuous', 'convulsive', 'motion', 'many', 'mile', 'either', 'side', 'river', 'oozy', 'bed', 'pale', 'desert', 'gigantic', 'water', 'lily', 'sigh', 'one', 'unto', 'solitude', 'stretch', 'towards', 'heaven', 'long', 'ghastly', 'neck', 'nod', 'fro', 'everlasting', 'head', 'indistinct', 'murmur', 'cometh', 'among', 'like', 'r

Sigma(nx.DiGraph with 289 nodes and 1,499 edges)

## Now try it with your own dreams (in English)

In [23]:
dream_graph(
    """
    Now is your turn, try your own dream. 
    """
    )

Your text processed: ['turn', 'try', 'dream']


Sigma(nx.DiGraph with 3 nodes and 3 edges)