# Narreme Visualization

This second part of the code focuses on visualizing and exploring the documents and topics extracted in the previous notebook. Doing so helps us understand how the documents link to one another, so that we can create narrative connections between them.

This notebook is part of a master's thesis project in Digital Interaction Design at Politecnico di Milano, by Federico Denni.

In [1]:
from IPython.display import clear_output
!pip install pyvis networkx seaborn spacy
!python -m spacy download xx_sent_ud_sm
!python -m spacy download en_core_web_sm
clear_output()

In [2]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import spacy #nlp
from spacy import displacy #spacy visualizer library
import pyvis #interactive visualization
import networkx as nx
from pyvis.network import Network
import matplotlib.pyplot as plt #plots
import seaborn as sns #make these plots nice

plt.style.use('ggplot')

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/topic-data/ready_data (1).csv
/kaggle/input/test-data/ready_data.csv


In [3]:
try:
    # Change this path to load a different dataset
    df = pd.read_csv(r'/kaggle/input/topic-data/ready_data (1).csv', encoding="utf-8")
except FileNotFoundError:
    print("Error: file not found. Please upload the file or provide the correct path.")
    raise  # stop here: the following cells need df to exist

In [4]:
df.head(10)

Unnamed: 0,Name,Labels,Count,Keywords,Topic,docs_1,docs_2,docs_3,docs_1 Sentiment,docs_1 Compound Score,docs_2 Sentiment,docs_2 Compound Score,docs_3 Sentiment,docs_3 Compound Score
0,-1_the_it_to_you,"""Card Game Design and Symbolism""",5706,"['symbols', 'cards', 'text', 'game', 'editing'...",0,This is for everyone Just how you didn’t expec...,#4 is definitely one of the biggest criticisms...,"On adding symbols to the card, I disagree that...",Neutral,-0.032442,Neutral,-0.051918,Negative,-0.08475
1,Family and Games,Family bonding through Uno,526,"['playing', 'game', 'played', 'games', 'play',...",1,"I'm a game developer, and I realized that my i...",How me and my friends play uno,I like the card game of uno,Positive,0.083079,Neutral,0.041889,Positive,0.136026
2,Rules of Games,"""Rule breaches in game design""",139,"['symbols', 'rules', 'games', 'confusing', 'pl...",2,You should be careful with the symbols (Golden...,Thanks for this list! Soooo many games don't d...,Thanks for this list! Soooo many games don't d...,Neutral,0.025101,Negative,-0.008004,Negative,-0.008004
3,Games Collection,"""Grande Jack Collection""",123,"['collezione', 'collection', 'bellissima', 'gr...",3,che bella collezione,Che bella collezione 😍😍,Che collezione 💪💪💪,Positive,0.226141,Positive,0.233989,Positive,0.224398
4,Traditional Games,Card games and techniques,77,"['playing', 'games', 'game', 'played', 'play',...",4,"​@@TheHistoryGuyChannel ​Yah, as I recall, the...","@@alymbouras it doesn't really matter, the imp...","@@alymbouras it doesn't really matter, the imp...",Positive,0.054061,Neutral,-0.02018,Neutral,-0.02018
5,Memories,Facts are relatable and true,433,"['facts', 'honestly', 'interessante', 'interes...",5,Facts,Facts,FACTS,Neutral,-0.030185,Neutral,-0.030185,Neutral,-0.058812
6,5_fr_bruh_frr_bruhv,"""Fr Bruh""",307,"['fr', 'frr', 'ft', 'frrr', 'fe', 'hi', 'fcf',...",6,Fr,Fr,Fr,Negative,-0.060402,Negative,-0.060402,Negative,-0.060402
7,6_the_to_of_and,Card games without indexes,7791,"['playing', 'cards', 'game', 'card', 'games', ...",7,It seems that every country has a card game of...,One thing most cards are missing (which is wei...,@@StefanLopuszanski I have looked you up and y...,Neutral,-0.056865,Negative,-0.05464,Negative,-0.16614


---

Now that the data is loaded, let's proceed to analyze and clean the different elements.

In [5]:
# Let's split the docs_n columns into sentences
nlp = spacy.load("xx_sent_ud_sm")  # multilingual sentence-segmentation model

# Define a function to tokenize text using spaCy
def tokenize(text):
    if isinstance(text, str):
        doc = nlp(text)
        return [token.text for token in doc]
    return []

# Define a function to split text into sentences using spaCy
def split_sentences(text):
    if isinstance(text, str):
        doc = nlp(text)
        return [sent.text for sent in doc.sents]
    return []

# Apply the sentence-splitting function to create the new sentence columns
df['docs_1_sentences'] = df['docs_1'].apply(split_sentences)
df['docs_2_sentences'] = df['docs_2'].apply(split_sentences)
df['docs_3_sentences'] = df['docs_3'].apply(split_sentences)

df.to_csv(r'/kaggle/working/sentences.csv', index=False)

df.head(10)

Unnamed: 0,Name,Labels,Count,Keywords,Topic,docs_1,docs_2,docs_3,docs_1 Sentiment,docs_1 Compound Score,docs_2 Sentiment,docs_2 Compound Score,docs_3 Sentiment,docs_3 Compound Score,docs_1_sentences,docs_2_sentences,docs_3_sentences
0,-1_the_it_to_you,"""Card Game Design and Symbolism""",5706,"['symbols', 'cards', 'text', 'game', 'editing'...",0,This is for everyone Just how you didn’t expec...,#4 is definitely one of the biggest criticisms...,"On adding symbols to the card, I disagree that...",Neutral,-0.032442,Neutral,-0.051918,Negative,-0.08475,[This is for everyone Just how you didn’t expe...,"[#, 4 is definitely one of the biggest critici...","[On adding symbols to the card, I disagree tha..."
1,Family and Games,Family bonding through Uno,526,"['playing', 'game', 'played', 'games', 'play',...",1,"I'm a game developer, and I realized that my i...",How me and my friends play uno,I like the card game of uno,Positive,0.083079,Neutral,0.041889,Positive,0.136026,"[I'm a game developer, and I realized that my ...",[How me and my friends play uno],[I like the card game of uno]
2,Rules of Games,"""Rule breaches in game design""",139,"['symbols', 'rules', 'games', 'confusing', 'pl...",2,You should be careful with the symbols (Golden...,Thanks for this list! Soooo many games don't d...,Thanks for this list! Soooo many games don't d...,Neutral,0.025101,Negative,-0.008004,Negative,-0.008004,[You should be careful with the symbols (Golde...,"[Thanks for this list!, Soooo many games don't...","[Thanks for this list!, Soooo many games don't..."
3,Games Collection,"""Grande Jack Collection""",123,"['collezione', 'collection', 'bellissima', 'gr...",3,che bella collezione,Che bella collezione 😍😍,Che collezione 💪💪💪,Positive,0.226141,Positive,0.233989,Positive,0.224398,[che bella collezione],[Che bella collezione 😍😍],[Che collezione 💪💪💪]
4,Traditional Games,Card games and techniques,77,"['playing', 'games', 'game', 'played', 'play',...",4,"​@@TheHistoryGuyChannel ​Yah, as I recall, the...","@@alymbouras it doesn't really matter, the imp...","@@alymbouras it doesn't really matter, the imp...",Positive,0.054061,Neutral,-0.02018,Neutral,-0.02018,"[​@@TheHistoryGuyChannel ​Yah, as I recall, th...","[@@alymbouras it doesn't really matter, the im...","[@@alymbouras it doesn't really matter, the im..."
5,Memories,Facts are relatable and true,433,"['facts', 'honestly', 'interessante', 'interes...",5,Facts,Facts,FACTS,Neutral,-0.030185,Neutral,-0.030185,Neutral,-0.058812,[Facts],[Facts],[FACTS]
6,5_fr_bruh_frr_bruhv,"""Fr Bruh""",307,"['fr', 'frr', 'ft', 'frrr', 'fe', 'hi', 'fcf',...",6,Fr,Fr,Fr,Negative,-0.060402,Negative,-0.060402,Negative,-0.060402,[Fr],[Fr],[Fr]
7,6_the_to_of_and,Card games without indexes,7791,"['playing', 'cards', 'game', 'card', 'games', ...",7,It seems that every country has a card game of...,One thing most cards are missing (which is wei...,@@StefanLopuszanski I have looked you up and y...,Neutral,-0.056865,Negative,-0.05464,Negative,-0.16614,[It seems that every country has a card game o...,[One thing most cards are missing (which is we...,[@@StefanLopuszanski I have looked you up and ...
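
The sentence lists above still carry some noise from the raw comments, for example the zero-width spaces and `@@` reply handles visible in row 4. The following is a minimal sketch of an optional pre-cleaning step that could run before `split_sentences`. The helper name `clean_comment` and the choice of which characters to strip are assumptions made for illustration; they are not part of the original notebook.

```python
import re

# Zero-width characters sometimes embedded in comments
# (assumption: these three are the only ones worth stripping here)
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d]")
# Reply handles such as "@@alymbouras" or "@name"
MENTION = re.compile(r"@+\w+")

def clean_comment(text):
    """Hypothetical helper: normalise a raw comment before sentence splitting."""
    if not isinstance(text, str):
        return ""  # NaN or other missing value
    text = ZERO_WIDTH.sub("", text)
    text = MENTION.sub("", text)
    # Collapse the whitespace left behind by the removals
    return re.sub(r"\s+", " ", text).strip()

print(clean_comment("\u200b@@TheHistoryGuyChannel \u200bYah, as I recall"))
```

If this step is useful, it could be applied with something like `df['docs_1'].apply(clean_comment)` before splitting. Whether dropping the handles is desirable depends on whether the addressee matters for the narrative analysis.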


In [6]:
# Let's now identify and extract the entity types in the sentences
import ast  # safe parsing of the list literals stored as strings in the CSV

nlp = spacy.load("en_core_web_sm")  # English model: translate all sentences to English before using this

df = pd.read_csv(r'/kaggle/working/sentences.csv', encoding="utf-8")
columns_to_extract = ['docs_1_sentences', 'docs_2_sentences', 'docs_3_sentences']

# For each sentence, collect its spaCy entity labels plus any topic keywords it contains
def get_entities_and_keywords(sentences, keywords):
    entities = []
    for sentence in sentences:
        doc = nlp(sentence)
        sentence_entities = [ent.label_ for ent in doc.ents]
        # Case-sensitive substring match, so 'card' also matches 'cards'
        for keyword in keywords:
            if keyword in sentence:
                sentence_entities.append(keyword)

        entities.append(sentence_entities)
    return entities

# Apply the function to each column (the lists come back from the CSV as strings, so parse them first)
df['doc_1_entity_list'] = df.apply(lambda row: get_entities_and_keywords(ast.literal_eval(row['docs_1_sentences']), ast.literal_eval(row['Keywords'])), axis=1)
df['doc_2_entity_list'] = df.apply(lambda row: get_entities_and_keywords(ast.literal_eval(row['docs_2_sentences']), ast.literal_eval(row['Keywords'])), axis=1)
df['doc_3_entity_list'] = df.apply(lambda row: get_entities_and_keywords(ast.literal_eval(row['docs_3_sentences']), ast.literal_eval(row['Keywords'])), axis=1)

df.to_csv(r'/kaggle/working/ent_list.csv', index=False)
df.head(10)

Unnamed: 0,Name,Labels,Count,Keywords,Topic,docs_1,docs_2,docs_3,docs_1 Sentiment,docs_1 Compound Score,docs_2 Sentiment,docs_2 Compound Score,docs_3 Sentiment,docs_3 Compound Score,docs_1_sentences,docs_2_sentences,docs_3_sentences,doc_1_entity_list,doc_2_entity_list,doc_3_entity_list
0,-1_the_it_to_you,"""Card Game Design and Symbolism""",5706,"['symbols', 'cards', 'text', 'game', 'editing'...",0,This is for everyone Just how you didn’t expec...,#4 is definitely one of the biggest criticisms...,"On adding symbols to the card, I disagree that...",Neutral,-0.032442,Neutral,-0.051918,Negative,-0.08475,['This is for everyone Just how you didn’t exp...,"['#', '4 is definitely one of the biggest crit...","['On adding symbols to the card, I disagree th...","[[PERSON, this], [PERSON, your], [], [this], [...","[[], [CARDINAL, PERSON], [], [CARDINAL, text],...","[[symbols, text, card, should], [ORG, cards, t..."
1,Family and Games,Family bonding through Uno,526,"['playing', 'game', 'played', 'games', 'play',...",1,"I'm a game developer, and I realized that my i...",How me and my friends play uno,I like the card game of uno,Positive,0.083079,Neutral,0.041889,Positive,0.136026,"[""I'm a game developer, and I realized that my...",['How me and my friends play uno'],['I like the card game of uno'],"[[PERSON, playing, game, games, play, watching...","[[play, uno]]","[[game, like, uno]]"
2,Rules of Games,"""Rule breaches in game design""",139,"['symbols', 'rules', 'games', 'confusing', 'pl...",2,You should be careful with the symbols (Golden...,Thanks for this list! Soooo many games don't d...,Thanks for this list! Soooo many games don't d...,Neutral,0.025101,Negative,-0.008004,Negative,-0.008004,['You should be careful with the symbols (Gold...,"['Thanks for this list!', ""Soooo many games do...","['Thanks for this list!', ""Soooo many games do...","[[symbols, should], [CARDINAL, rule], [symbols...","[[], [CARDINAL, games, game, rule], [confusing...","[[], [CARDINAL, games, game, rule], [confusing..."
3,Games Collection,"""Grande Jack Collection""",123,"['collezione', 'collection', 'bellissima', 'gr...",3,che bella collezione,Che bella collezione 😍😍,Che collezione 💪💪💪,Positive,0.226141,Positive,0.233989,Positive,0.224398,['che bella collezione'],['Che bella collezione 😍😍'],['Che collezione 💪💪💪'],"[[ORG, collezione]]",[[collezione]],[[collezione]]
4,Traditional Games,Card games and techniques,77,"['playing', 'games', 'game', 'played', 'play',...",4,"​@@TheHistoryGuyChannel ​Yah, as I recall, the...","@@alymbouras it doesn't really matter, the imp...","@@alymbouras it doesn't really matter, the imp...",Positive,0.054061,Neutral,-0.02018,Neutral,-0.02018,"['\u200b@@TheHistoryGuyChannel \u200bYah, as I...","[""@@alymbouras it doesn't really matter, the i...","[""@@alymbouras it doesn't really matter, the i...","[[CARDINAL, played, play], [], [WORK_OF_ART, p...","[[], [ORG, games, game, cards]]","[[], [ORG, games, game, cards]]"
5,Memories,Facts are relatable and true,433,"['facts', 'honestly', 'interessante', 'interes...",5,Facts,Facts,FACTS,Neutral,-0.030185,Neutral,-0.030185,Neutral,-0.058812,['Facts'],['Facts'],['FACTS'],[[]],[[]],[[ORG]]
6,5_fr_bruh_frr_bruhv,"""Fr Bruh""",307,"['fr', 'frr', 'ft', 'frrr', 'fe', 'hi', 'fcf',...",6,Fr,Fr,Fr,Negative,-0.060402,Negative,-0.060402,Negative,-0.060402,['Fr'],['Fr'],['Fr'],[[]],[[]],[[]]
7,6_the_to_of_and,Card games without indexes,7791,"['playing', 'cards', 'game', 'card', 'games', ...",7,It seems that every country has a card game of...,One thing most cards are missing (which is wei...,@@StefanLopuszanski I have looked you up and y...,Neutral,-0.056865,Negative,-0.05464,Negative,-0.16614,['It seems that every country has a card game ...,['One thing most cards are missing (which is w...,['@@StefanLopuszanski I have looked you up and...,"[[game, card, this], [ORG], [GPE, PERSON, game...","[[CARDINAL, playing, cards, card, play], [card...","[[], [PERCENT], [], [], [ORG], [ORG, this], [O..."
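
Because the sentence and entity lists are written to CSV, they come back as plain strings when the file is read again (note the quoted list literals in the output above). A minimal sketch of parsing such cells defensively is shown here. The helper name `parse_list_cell` is hypothetical, and returning an empty list for missing or malformed cells is an assumption about the desired behaviour.

```python
import ast

def parse_list_cell(cell):
    """Hypothetical helper: turn a CSV cell like "['a', 'b']" back into a list."""
    if isinstance(cell, list):
        return cell  # already a list (e.g. never written to disk)
    if not isinstance(cell, str):
        return []    # NaN or other missing value
    try:
        value = ast.literal_eval(cell)  # only evaluates Python literals
    except (ValueError, SyntaxError):
        return []    # malformed text: treat as empty rather than crash
    return value if isinstance(value, list) else []

print(parse_list_cell("[['PERSON', 'this'], [], ['TIME']]"))
print(parse_list_cell(float("nan")))
```

Unlike `eval`, `ast.literal_eval` refuses anything that is not a literal, so a stray expression in a cell cannot execute code.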


In [12]:
# Let's now prepare the data for the network visualization
import ast
from itertools import combinations

df = pd.read_csv(r'/kaggle/working/ent_list.csv', encoding="utf-8")

# Build source and target lists from one row's entity list:
# every sentence is paired with each later sentence of the same document
def create_source_target(entity_list):
    entities = ast.literal_eval(entity_list)
    source = []
    target = []
    for first, second in combinations(entities, 2):
        source.append(str(first))
        target.append(str(second))
    return pd.Series([source, target])

# Create source and target columns for each document column, one row at a time
df[['doc_1_source', 'doc_1_target']] = df.apply(lambda row: create_source_target(row['doc_1_entity_list']), axis=1)
df[['doc_2_source', 'doc_2_target']] = df.apply(lambda row: create_source_target(row['doc_2_entity_list']), axis=1)
df[['doc_3_source', 'doc_3_target']] = df.apply(lambda row: create_source_target(row['doc_3_entity_list']), axis=1)

In [13]:
# Combine all source and target columns into a single DataFrame
source_target_df = pd.DataFrame({
    'source': df['doc_1_source'].explode().tolist() + df['doc_2_source'].explode().tolist() + df['doc_3_source'].explode().tolist(),
    'target': df['doc_1_target'].explode().tolist() + df['doc_2_target'].explode().tolist() + df['doc_3_target'].explode().tolist()
}).dropna().reset_index(drop=True)  # documents with a single sentence yield no pairs

source_target_df.head(10)

Unnamed: 0,source,target
0,"['PERSON', 'this']","['PERSON', 'your']"
1,"['PERSON', 'this']",[]
2,"['PERSON', 'this']",['this']
3,"['PERSON', 'this']",['TIME']
4,"['PERSON', 'this']",['PERSON']
5,"['PERSON', 'this']",[]
6,"['PERSON', 'this']",['WORK_OF_ART']
7,"['PERSON', 'your']",[]
8,"['PERSON', 'your']",['this']
9,"['PERSON', 'your']",['TIME']
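
The edge list above contains many pairs whose target is the empty list `[]`, and the same pair can appear several times. A minimal sketch of one way to tidy this before building the graph is shown here: it skips sentences with no entities or keywords and counts how often each pair co-occurs. The helper name `weighted_edges` and the decision to treat edges as undirected are assumptions for illustration, not the notebook's own method.

```python
from collections import Counter
from itertools import combinations

def weighted_edges(rows):
    """Hypothetical helper: count co-occurring sentence pairs across documents.

    `rows` is an iterable of entity lists, one per document, each shaped like
    [['PERSON', 'this'], [], ['TIME']].
    """
    counts = Counter()
    for entity_lists in rows:
        # Drop sentences without entities or keywords, so "[]" never becomes a node
        nodes = [str(entities) for entities in entity_lists if entities]
        for source, target in combinations(nodes, 2):
            if source != target:  # skip self-loops
                # Sort the pair so (a, b) and (b, a) count as the same edge
                counts[tuple(sorted((source, target)))] += 1
    return counts

rows = [
    [['PERSON', 'this'], [], ['TIME']],
    [['TIME'], ['PERSON', 'this']],
]
for (source, target), weight in weighted_edges(rows).items():
    print(source, target, weight)
```

The resulting counts could be placed in a DataFrame with a `weight` column and passed to `nx.from_pandas_edgelist` through its `edge_attr` parameter, so that frequent connections stand out in the network.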


In [14]:
# Create a graph using networkx
G = nx.from_pandas_edgelist(source_target_df, 'source', 'target')

# Visualize the graph using pyvis
net = Network(notebook=True)
net.from_nx(G)

# Save the network visualization to an HTML file
net.save_graph('entity_network.html')
net.show('entity_network.html')

entity_network.html
