# Tables - Fake Covid-19 Dataset

## Tweets Visualization

### tabella_fake and style.css

We've used the following packages to build a table that shows the Tweets and the links to them:

In [None]:
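import itertools
import json
import re

import pandas as pd
from dateutil.parser import parse

import dash
# Since Dash 2.0, dcc, html and dash_table ship with the dash package;
# the standalone dash_core_components, dash_html_components and dash_table
# packages are deprecated
from dash import dcc, html, dash_table
from dash.dependencies import Input, Output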

We've defined a function to remove URLs from the Tweet's text:

In [None]:
def remove_urls(text):
    # Delete every token that starts with "http" (covers http:// and https:// links)
    result = re.sub(r"http\S+", "", text)
    return result
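The pattern `http\S+` deletes every run of non-whitespace characters that starts with "http", so it removes both `http://` and `https://` links, such as the `https://t.co/...` short links Twitter adds to Tweets. It leaves the surrounding spaces in place, though. The sketch below shows this on a made-up Tweet, together with a hypothetical variant, `remove_urls_clean`, that also collapses the leftover whitespace; the variant is our own addition, not part of the original script.

```python
import re


def remove_urls(text):
    # Same pattern as above: delete every "http..." token up to the next whitespace
    return re.sub(r"http\S+", "", text)


def remove_urls_clean(text):
    # Hypothetical variant: remove the URLs, then collapse runs of whitespace
    # and trim the ends so no double or trailing spaces remain
    return re.sub(r"\s+", " ", remove_urls(text)).strip()


sample = "Wash your hands https://t.co/abc123 every day"
print(repr(remove_urls(sample)))        # 'Wash your hands  every day'
print(repr(remove_urls_clean(sample)))  # 'Wash your hands every day'
```

Whether the double spaces matter depends on the use: they are invisible in the rendered table, but they would affect, for example, exact text comparisons.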

To classify the Tweets, we read the CSV file, which holds the label of each tweet ID, and the JSON file, which holds the Tweets themselves:

In [None]:
csv_dataframe = pd.read_csv('dataset/FINAL_fakecovid_final_filtered_dataset_clean.csv', sep=";")
csv_dataframe['tweet_id'] = csv_dataframe['tweet_id'].astype(str)
# Flatten all CSV rows into a single list: the label of a Tweet is the
# value that immediately follows its tweet_id
csv_list = csv_dataframe.values.tolist()
lista_unica_csv = list(itertools.chain.from_iterable(csv_list))

# The JSON file contains one Tweet object per line
data = []
with open('dataset/fakecovid_result_final_translated_full.json', 'r', encoding='utf-8') as f:
    for line in f:
        data.append(json.loads(line))
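Looking up a label with `lista_unica_csv.index(token_id)` scans the whole flattened list for every Tweet, so the total cost grows quadratically with the size of the dataset, and it only works because the label column comes right after `tweet_id`. A possible alternative, sketched below under that same assumption, is to build a dictionary from tweet ID to label once and then look each Tweet up directly. `build_label_lookup` is a hypothetical helper, and the column name `label` in the toy DataFrame is made up for illustration.

```python
import pandas as pd


def build_label_lookup(csv_dataframe):
    """Map each tweet ID (as a string) to the value in the column right after 'tweet_id'.

    Assumes, like the original code, that the label column immediately
    follows 'tweet_id'. For duplicated IDs the first row wins, matching
    the behaviour of list.index().
    """
    id_pos = csv_dataframe.columns.get_loc('tweet_id')
    label_column = csv_dataframe.columns[id_pos + 1]
    lookup = {}
    for tweet_id, label in zip(csv_dataframe['tweet_id'].astype(str), csv_dataframe[label_column]):
        lookup.setdefault(tweet_id, label)  # keep the first occurrence only
    return lookup


# Toy DataFrame with a hypothetical 'label' column
toy = pd.DataFrame({'tweet_id': [1250, 1251], 'label': ['False', 'partially false']})
label_lookup = build_label_lookup(toy)
print(label_lookup['1250'])  # False
```

In the loop, `value_cat = label_lookup[element['id_str']].lower()` would then replace the two lines that call `index()` and read the following element.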

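We display the Tweets and their links in a table, classifying each Tweet into one of two categories: "fake" and "partially false". The labels come from the CSV file: they are lowercased, "false" is renamed "fake", and spaces are removed, so "partially false" appears in the table as "partiallyfalse".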

In [None]:
category = []
date = []
txt = []
link = []

for element in data:
    # Label of the Tweet: the value that follows its ID in the flattened CSV list;
    # "false" is renamed "fake" and spaces are removed ("partially false" -> "partiallyfalse")
    token_id = element['id_str']
    indice_csv = lista_unica_csv.index(token_id)
    value_cat = lista_unica_csv[indice_csv + 1].lower()
    if value_cat == "false":
        value_cat = "fake"
    category.append(value_cat.replace(" ", ""))

    # Creation date in YYYY/MM/DD format
    d = parse(element['created_at'])
    date.append(d.strftime('%Y/%m/%d'))

    # Tweet text without URLs
    txt.append(remove_urls(element['full_text']))

    # Markdown link to the Tweet, rendered by the 'Link' column of the DataTable
    tweet_url = "http://twitter.com/anyuser/status/" + element['id_str']
    link.append("[" + tweet_url + "](" + tweet_url + ")")
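Each iteration of the loop turns one Tweet into one table row. The sketch below isolates that step in a hypothetical function, `tweet_to_row`, so it can be checked on a single Tweet. It assumes the Twitter API v1.1 `created_at` format (for example `Wed Oct 10 20:19:24 +0000 2018`) and parses it with the standard library's `datetime.strptime` instead of `dateutil.parser.parse`; the sample Tweet and its ID are made up.

```python
import re
from datetime import datetime

TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"  # assumed v1.1 'created_at' format


def normalize_category(raw_label):
    # Same rules as the loop: lowercase, rename "false" to "fake", drop spaces
    value_cat = raw_label.lower()
    if value_cat == "false":
        value_cat = "fake"
    return value_cat.replace(" ", "")


def tweet_to_row(tweet, raw_label):
    """Build one table row from a Tweet dict and its CSV label (hypothetical helper)."""
    tweet_url = "http://twitter.com/anyuser/status/" + tweet['id_str']
    created = datetime.strptime(tweet['created_at'], TWITTER_DATE_FORMAT)
    return {
        'Type': normalize_category(raw_label),
        'Date': created.strftime('%Y/%m/%d'),
        'Tweet': re.sub(r"http\S+", "", tweet['full_text']),  # same as remove_urls
        'Link': f"[{tweet_url}]({tweet_url})",  # Markdown link for the 'Link' column
    }


sample_tweet = {
    'id_str': '1234567890',
    'created_at': 'Wed Mar 25 13:37:00 +0000 2020',
    'full_text': 'Wash your hands https://t.co/abc123',
}
print(tweet_to_row(sample_tweet, 'Partially False'))
```

With such a helper, the four parallel lists could be replaced by a list of rows passed straight to `pd.DataFrame(rows)`, which keeps each row's fields together.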

We then gather these lists into a pandas DataFrame, with one row per Tweet, which is the data source of the table:

In [None]:
df = pd.DataFrame(
    {'Type': category,
    'Date': date,
    'Tweet': txt,
    'Link': link
    })

To build the table we used Dash, whose `DataTable` component generates an interactive table; the options below enable native filtering, sorting on multiple columns, and pagination with 8 rows per page.

In [None]:
app = dash.Dash(__name__)
#https://dash.plotly.com/datatable/filtering
app.layout = html.Div([
    dash_table.DataTable(
        id='datatable-interactivity',
        columns=[
            {'name': 'Type', 'id': 'Type'},
            {'name': 'Date', 'id': 'Date'},
            {'name': 'Tweet', 'id': 'Tweet'},
            {'name': 'Link', 'id':'Link', 'type': 'text', 'presentation':'markdown'}],
        data=df.to_dict('records'),
        style_filter={
            "backgroundColor":"white"
        },
        style_data_conditional=[
        {
            'if': {
                'column_id': 'Type',
            },
            'font-weight':'bold',
            'width':'200px'
        },
        {
            'if': {
                'column_id': 'Date',
            },
            'width':'200px'
        },
        {
            'if': {
                'column_id': 'Tweet',
            },
            'width':'2500px'
        },
        {
            'if': {
                'column_id': 'Link',
            },
            'font-size':'16px'
        }],
        style_cell={
            'textAlign':'left',
            'font-family': 'Helvetica Neue',
            'whiteSpace': 'normal',
            'padding-bottom': '15px',
            'border':'0px solid darkslategray',
            'font-size':'16px',
            'height': 'auto'
        },
        style_header={
            'backgroundColor':"white", #moccasin
            'font-family':'Helvetica Neue',
            'font-weight': 'bold',
            'whiteSpace': 'normal',
            'padding': '10px',
            'border-bottom':'1px solid darkslategray',
            'font-size':'18px',
            'height': 'auto'
        },
        style_data={
            'whiteSpace': 'normal',
            'height': 'auto'
        },
        filter_action="native",
        sort_action="native",
        sort_mode="multi",
        page_action="native",
        page_current=0,
        page_size=8,
        fill_width=False
    ),
    html.Div(id='datatable-interactivity-container')
])



if __name__ == '__main__':
    # app.run supersedes the deprecated app.run_server
    app.run(debug=False)
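The `data` property of `DataTable` expects a list of dictionaries, one per row, keyed by the `id` of each entry in `columns`; `df.to_dict('records')` produces exactly that shape. A small check with a made-up row:

```python
import pandas as pd

df = pd.DataFrame({
    'Type': ['fake'],
    'Date': ['2020/03/25'],
    'Tweet': ['Wash your hands '],
    'Link': ['[http://twitter.com/anyuser/status/1234567890](http://twitter.com/anyuser/status/1234567890)'],
})

records = df.to_dict('records')
# One dict per row; the keys must match the 'id' values declared in `columns`
column_ids = ['Type', 'Date', 'Tweet', 'Link']
assert all(set(row) == set(column_ids) for row in records)
print(records[0]['Type'])  # fake
```

Because the `Link` column is declared with `'presentation': 'markdown'`, its string is rendered as a clickable link rather than shown as raw text.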

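Unlike the other table cells, the "filter data" fields were styled with a CSS file, style.css, rather than through the DataTable style properties (Dash automatically loads every `.css` file placed in the `assets/` folder next to the app):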

In [None]:
/* Text inputs inside the table, i.e. the "filter data" fields of the filter row */
.dash-table-container .dash-spreadsheet-container .dash-spreadsheet-inner input:not([type=radio]):not([type=checkbox]){
    color: black!important;
    text-align: left!important;
    font-family: 'Helvetica Neue'!important;
    font-size: 16px!important;
    padding: 20px!important;
    font-weight: bold!important;
}

/* Links, including the Markdown links rendered in the table */
a {
    color:black!important;
}

## Links Inside Tweets Visualization

### urlcount.py and tabella_urls_fake.py

We've used the following packages:

In [None]:
import itertools
import json

import pandas as pd
import requests
# BeautifulSoup(..., features="lxml") also requires the lxml package
from bs4 import BeautifulSoup
from dateutil.parser import parse

import dash
# Since Dash 2.0, dcc, html and dash_table ship with the dash package
from dash import dcc, html, dash_table
from dash.dependencies import Input, Output

As in the first script, we read the CSV file with the label of each tweet ID and the JSON file with the Tweets:

In [None]:
csv_dataframe = pd.read_csv('dataset/FINAL_fakecovid_final_filtered_dataset_clean.csv', sep=";")
csv_dataframe['tweet_id'] = csv_dataframe['tweet_id'].astype(str)
# Flatten all CSV rows into a single list: the label of a Tweet is the
# value that immediately follows its tweet_id
csv_list = csv_dataframe.values.tolist()
lista_unica_csv = list(itertools.chain.from_iterable(csv_list))

# The JSON file contains one Tweet object per line
data = []
with open('dataset/fakecovid_result_final_translated_full.json', 'r', encoding='utf-8') as f:
    for line in f:
        data.append(json.loads(line))

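For each link contained in a Tweet (its `entities.urls`), we download the page with requests and use BeautifulSoup to extract the page title, which becomes the text of a Markdown link to that page. Each URL is processed only once. If the request times out, the connection fails, or the page has no `<title>`, an error label ("TIMEOUT ERROR", "CONNECTION ERROR" or "NO TITLE ERROR") is used in place of the title.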

In [None]:
urls = []       # URLs already processed, lowercased for de-duplication
titles = []
dates = []
category = []
for index, element in enumerate(data):
    print(index)  # progress indicator
    if element['entities']['urls'] is not None:
        for entity in element['entities']['urls']:
            # Timeout handling based on:
            # https://stackoverflow.com/questions/16511337/correct-way-to-try-except-using-python-requests-module

            # ARE THESE REALLY THE ONLY FIELDS WE NEED?
            url = entity['expanded_url']
            if url.lower() in urls:
                print("URL already present")
                continue

            # Tweet label: the value right after the tweet ID in the flattened CSV list
            token_id = element['id_str']
            indice_csv = lista_unica_csv.index(token_id)
            value_cat = lista_unica_csv[indice_csv + 1].lower()
            if value_cat == "false":
                value_cat = "fake"

            # Link text: the page title, or an error label if it cannot be retrieved
            try:
                r = requests.get(url, timeout=10)
            except requests.exceptions.Timeout:
                label = "TIMEOUT ERROR"
            except requests.exceptions.ConnectionError:
                label = "CONNECTION ERROR"
            else:
                soup = BeautifulSoup(r.text, features="lxml")
                label = "NO TITLE ERROR" if soup.title is None else soup.title.text

            # Markdown link for the DataTable; the URL keeps its original case
            # because URL paths are case-sensitive
            titles.append("[" + label + "](" + url + ")")
            urls.append(url.lower())
            category.append(value_cat.replace(" ", ""))
            dates.append(parse(element['created_at']).strftime('%Y/%m/%d'))
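The title lookup can be tested without network access by separating it from the download. The sketch below swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` (a different parser, chosen only so that the example has no third-party dependencies) and reproduces the labelling rule of the loop: the page title if there is one, otherwise "NO TITLE ERROR". `extract_title` and `link_label` are hypothetical names.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collect the text of the first <title> element, if any."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == 'title' and self.title is None:
            self._in_title = True
            self.title = ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html_text):
    # Returns None when the page has no <title>, like soup.title in the loop
    parser = _TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


def link_label(html_text):
    # Same rule as the loop: page title, or an error label if it is missing
    title = extract_title(html_text)
    return "NO TITLE ERROR" if title is None else title


print(link_label("<html><head><title>WHO advice</title></head></html>"))  # WHO advice
print(link_label("<p>no head here</p>"))  # NO TITLE ERROR
```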

We collect the results into a pandas DataFrame with one row per link: its category, the Markdown link, and the date of the Tweet in which the URL was first found:

In [None]:
df = pd.DataFrame(
    {'Type': category,
    'Link': titles,
    'First-Shared': dates
    })
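Page titles are inserted verbatim between `[` and `]`, so a title that itself contains square brackets (for example `Update [VIDEO]`) can break the Markdown link shown in the `Link` column. One possible safeguard, not part of the original scripts, is to backslash-escape the brackets and collapse newlines before building the link. `markdown_link` is a hypothetical helper, and how strictly the DataTable's Markdown renderer honours these escapes is worth checking in practice.

```python
import re


def markdown_link(text, url):
    """Build "[text](url)", escaping characters that would end the link text early."""
    # Titles often contain newlines and indentation: collapse them to single spaces
    text = re.sub(r"\s+", " ", text).strip()
    # Escape the backslash itself first, then the square brackets
    text = text.replace("\\", "\\\\").replace("[", "\\[").replace("]", "\\]")
    return f"[{text}]({url})"


print(markdown_link("Update [VIDEO]\n", "https://example.com/news"))
# [Update \[VIDEO\]](https://example.com/news)
```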

Then we save it to a CSV file, urls.csv, which is the input of the Dash DataTable:

In [None]:
df.to_csv('urls.csv', sep=';', index=False)  # same separator used by read_csv below
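Page titles can contain the separator character, but `to_csv` quotes such fields, so they survive a round trip as long as the file is read back with the same `sep` it was written with. A quick in-memory check, using `io.StringIO` in place of `urls.csv` and a made-up row:

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Type': ['fake'],
    'Link': ['[Masks; myths and facts](https://example.com/masks)'],
    'First-Shared': ['2020/03/25'],
})

buffer = io.StringIO()
df.to_csv(buffer, sep=';', index=False)  # the ';' inside the title gets quoted
buffer.seek(0)
df_back = pd.read_csv(buffer, sep=';')   # read back with the same separator

print(df_back.to_dict('records') == df.to_dict('records'))  # True
```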

### Let's create the Dash DataTable... but first

The generated URLs dataset (urls.csv) contained links that referred to:
- private Tweets, which are not publicly visible and were therefore considered not relevant
- Tweets of suspended accounts, so no longer available
- deleted Tweets
- web pages no longer available

Thus, we manually cleaned the dataset by verifying every single link and removing the rows that contained these irrelevant links.
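Part of this review can be narrowed down automatically: rows whose `Link` text starts with one of the error labels written by the URL-collection loop (`[TIMEOUT ERROR]`, `[CONNECTION ERROR]`, `[NO TITLE ERROR]`) are natural first candidates. The sketch below only flags them; deciding whether a link is really irrelevant (a private or deleted Tweet, a suspended account) still needs the manual check described above. `flag_error_rows` and the sample rows are hypothetical.

```python
import pandas as pd

# Error labels produced by the URL-collection loop
ERROR_LABELS = ("[TIMEOUT ERROR]", "[CONNECTION ERROR]", "[NO TITLE ERROR]")


def flag_error_rows(df):
    """Return a boolean Series: True where the Link text starts with an error label."""
    return df['Link'].map(lambda link: str(link).startswith(ERROR_LABELS))


urls_df = pd.DataFrame({
    'Type': ['fake', 'partiallyfalse'],
    'Link': ['[TIMEOUT ERROR](https://example.com/a)', '[WHO advice](https://example.com/b)'],
    'First-Shared': ['2020/03/25', '2020/03/26'],
})
to_review = urls_df[flag_error_rows(urls_df)]
print(to_review['Link'].tolist())  # ['[TIMEOUT ERROR](https://example.com/a)']
```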

### Now we can start

First, we read the cleaned CSV file:

In [None]:
df = pd.read_csv('urls.csv',sep=';')
print(df)

Then, we create the Dash DataTable:

In [None]:
app = dash.Dash(__name__)
#https://dash.plotly.com/datatable/filtering
app.layout = html.Div([
    dash_table.DataTable(
        id='datatable-interactivity',
        columns=[{'name': 'Type', 'id':'Type'},
            {'name': 'Link', 'id':'Link', 'type': 'text', 'presentation':'markdown'},
            {'name': 'First-Shared', 'id': 'First-Shared'}],
        data=df.to_dict('records'),
        style_data_conditional=[{
            'if': {
                'column_id': 'Type',
            },
            'font-weight':'bold',
            'width':'200px'
        }],
        style_filter={
            "backgroundColor":"white"
        },
        style_cell={
            'textAlign':'left',
            'font-family': 'Helvetica Neue',
            'border':'0px solid darkslategray',
            'font-size':'16px',
        },
        style_header={
            'backgroundColor':"moccasin", #moccasin
            'font-family':'Helvetica Neue',
            'font-weight': 'bold',
            'border-bottom':'1px solid darkslategray',
            'font-size':'18px',
        },
        filter_action="native",
        sort_action="native",
        sort_mode="multi",
        page_action="native",
        page_current=0,
        page_size=12,
    ),
    html.Div(id='datatable-interactivity-container')
])



if __name__ == '__main__':
    # app.run supersedes the deprecated app.run_server
    app.run(debug=False)