# Debate analysis example

In this notebook we present an example of how to evaluate the aspects we defined on short debates, using the topic of Universal Basic Income.
We use three different debates:
* one from [Wikidebate](https://en.wikiversity.org/wiki/Category:Wikidebates) titled ['_Should universal basic income be established?_'](https://en.wikiversity.org/wiki/Should_universal_basic_income_be_established%3F)
* one from [Kialo](https://www.kialo.com) titled ['_Should governments provide a universal basic income?_'](https://www.kialo.com/should-governments-provide-a-universal-basic-income-14053)
* one from [r/changemyview](https://www.reddit.com/r/changemyview/) titled ['_CMV: Universal basic income is the way of the future._'](https://www.reddit.com/r/changemyview/comments/tdmuae/cmv_universal_basic_income_is_the_way_of_the/)

For more information on data collection, see the README file; the collection code is in the `src/data_collection/` folder.

We start by importing all the functions we will use to evaluate the defined metrics.

In [1]:
from src.complexity_utils import *
from src.disagreement_utils import *
from src.equality_engagement_utils import *
from src.reason_utils import *
from src.sentiment_utils import *
from src.sourcing_utils import *
from src.topic_distance_utils import *

import pandas as pd

Next, we import the data. In this notebook we use three separate CSV files; each row contains a post with the following information:
* **id**: a unique id identifying the post
* **page_id**: an id identifying the topic 
* **item**: the content of the post
* **parent_id**: the id of the parent post; for root posts this value is 0
* **title**: a title identifying the topic
* **debate_id**: an id identifying the debate (in the original dataset, Kialo and CMV may have several debates on the same topic)
* **length**: post length in characters
* **level**: depth of the post in the reply tree (root posts have level 0)
* **thread_id**: an id identifying the thread each post belongs to
* **author**: author(s) of the post; on Wikidebate more than one user can be involved in a single post. On Kialo posts cannot be matched to their authors, so this column contains NaN values, and we use unmatched statistics where needed.
* **platform**: short name of the platform the post was published in. 

In addition, the Wikidebate CSV has:

* **references**: the number of references in each post

while the Change My View CSV has:

* **original_item**: post content before preprocessing, used to extract references. 
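The `parent_id`, `level` and `thread_id` columns together encode each debate as a reply tree. As a minimal sketch of how they relate, the snippet below derives depth and thread from `parent_id` alone on invented toy data. The helper `add_level_and_thread` is hypothetical, and it assumes a thread is identified by the id of its root post; the actual datasets may assign `thread_id` differently.

```python
import pandas as pd

# Toy posts with invented values; parent_id == 0 marks a root post.
posts = pd.DataFrame({
    "id":        [1, 2, 3, 4, 5],
    "parent_id": [0, 1, 2, 0, 4],
})

def add_level_and_thread(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive depth and root ancestor from parent_id."""
    parent = dict(zip(df["id"], df["parent_id"]))
    levels, threads = [], []
    for post_id in df["id"]:
        depth, node = 0, post_id
        # Walk up the reply chain until a root post (parent_id == 0) is reached.
        while parent[node] != 0:
            node = parent[node]
            depth += 1
        levels.append(depth)
        # Assumption: a thread is identified by the id of its root post.
        threads.append(node)
    out = df.copy()
    out["level"] = levels
    out["thread_id"] = threads
    return out

result = add_level_and_thread(posts)
print(result)
```

Here posts 1 to 3 form one chain (levels 0, 1, 2) and posts 4 and 5 form a second thread (levels 0, 1).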

In [2]:
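platforms = ['wiki', 'kialo', 'cmv']
# Load one CSV per platform (the first column is the index)
wiki_data = pd.read_csv('data/UBI_wiki.csv', index_col=0)
kialo_data = pd.read_csv('data/UBI_kialo.csv', index_col=0)
cmv_data = pd.read_csv('data/UBI_cmv.csv', index_col=0)
# Stack all posts into one DataFrame; the 'platform' column keeps them apart
merged_data = pd.concat([wiki_data, kialo_data, cmv_data], ignore_index=True)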

In [3]:
engagement_value,equality_value=engagement_and_equality_assignment(merged_data,platforms)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  wiki_data['author']=wiki_data['author'].apply(ast.literal_eval)

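The warning above is raised because the utility assigns to a filtered slice of the merged DataFrame (`wiki_data['author']=...`). The usual pandas remedy is to take an explicit `.copy()` of the slice (or to assign through `.loc`). The sketch below shows that pattern on invented toy data, and also how the Wikidebate `author` column, stored as the string form of a list, can be parsed with `ast.literal_eval` and counted per author. The posts-per-author count is only an illustration, not the notebook's definition of engagement or equality.

```python
import ast
import pandas as pd

# Toy merged table with invented values: Wikidebate authors are stored as the
# string form of a list, while Kialo has no author information (NaN).
merged = pd.DataFrame({
    "platform": ["wiki", "wiki", "kialo"],
    "author": ["['Alice', 'Bob']", "['Alice']", float("nan")],
})

# An explicit copy makes the slice an independent DataFrame, so the
# assignment below does not trigger SettingWithCopyWarning.
wiki = merged[merged["platform"] == "wiki"].copy()
wiki["author"] = wiki["author"].apply(ast.literal_eval)

# One row per (post, author) pair, then count how many posts each author touched.
posts_per_author = wiki.explode("author")["author"].value_counts()
print(posts_per_author)
```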

In [4]:
wiki_num_ref,kialo_num_ref,cmv_num_ref=get_platforms_reference_number(merged_data,platforms)
print(wiki_num_ref)
print(kialo_num_ref)
print(cmv_num_ref)

Should universal basic income be established?
125    I recently started learning more about Andrew ...
126    If the the point of UBI is to provide for basi...
127    Biggest issue is always gonna be the cost, rig...
128    Universal basic income advocates like to think...
129    So... UBI Happens right. Like they put it in p...
                             ...                        
482    >\tHomeschooling is already allowed though, so...
483    But you're still reaping the benefits of havin...
484    >But you're still reaping the benefits of havi...
485    So that once again brings me back to: why is p...
486    My guess is that people don’t question why thi...
Name: original_item, Length: 362, dtype: object
2
103
40
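For CMV, references are extracted from `original_item`, the post text before preprocessing. The exact rule used by `get_platforms_reference_number` is not shown here. A minimal sketch, assuming a reference is any `http(s)` URL in the raw text (including the targets of Markdown links), could look like this; `URL_PATTERN` and `count_references` are hypothetical names.

```python
import re

# Assumption: a "reference" is any http(s) URL in the raw post text.
# The stop characters ")" and "]" keep Markdown link syntax out of the match.
URL_PATTERN = re.compile(r"https?://[^\s)\]]+")

def count_references(texts):
    """Hypothetical helper: total number of URLs across post texts (non-strings skipped)."""
    return sum(len(URL_PATTERN.findall(text)) for text in texts if isinstance(text, str))

sample = [
    "See [this study](https://example.org/ubi-study) and https://example.com/data.",
    "No sources here.",
]
print(count_references(sample))
```

Skipping non-string values means missing texts (NaN) contribute zero instead of raising an error.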


In [5]:
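# Sentiment: VADER polarity and TextBlob sentiment
polarity = VADER_sentiment(merged_data)
sentiment = tex_blob_sentiment(merged_data, platforms)
# Lexical diversity and readability: store results under new names so the
# imported functions mltd and readability_score are not overwritten
# (otherwise re-running this cell fails because the names no longer refer to functions)
mltd_value = mltd(merged_data)
readability_value = readability_score(merged_data)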
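Assuming `mltd` refers to MTLD (Measure of Textual Lexical Diversity), the idea is to walk through the tokens and count how many "factors" it takes for the running type-token ratio (TTR) to fall to a threshold, conventionally 0.72. MTLD is the number of tokens divided by the number of factors, averaged over a forward and a backward pass. The pure-Python sketch below is not the implementation in `src/complexity_utils`. It uses simple lower-cased whitespace tokenization, and `mtld_score` and `_mtld_pass` are hypothetical names.

```python
def _mtld_pass(tokens, threshold=0.72):
    """One directional pass: number of tokens divided by the number of TTR factors."""
    factors = 0.0
    types = set()
    count = 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count <= threshold:
            # TTR has fallen to the threshold: close a full factor, start a new segment.
            factors += 1
            types = set()
            count = 0
    if count > 0:
        # Partial factor for the leftover segment, scaled by how far its TTR has fallen.
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    # With no factor at all (every token distinct) the measure is undefined; return inf.
    return len(tokens) / factors if factors > 0 else float("inf")

def mtld_score(text, threshold=0.72):
    """Average of forward and backward MTLD passes over lower-cased whitespace tokens."""
    tokens = text.lower().split()
    if not tokens:
        return float("nan")
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(tokens[::-1], threshold)
    return (forward + backward) / 2

print(mtld_score("a a a a"))
print(mtld_score("a b a b"))
```

Applied to the `item` column, for example `merged_data['item'].apply(mtld_score)`, this would give one score per post. MTLD is generally considered less reliable on very short texts, which matters for short debate posts.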