## ANTAGONIZER

Map polarity/partisanship between two categories of authors, based on two manually provided lists of classifying words.

In [None]:
import antagonizer as az

In [None]:
import pandas as pd
dataset = pd.read_csv('climate_data.csv') # file not provided with the repo

----
#### `.prepare_data(df, threshold)`
Read a pandas dataframe that includes the columns `author` and `text`. Keep only authors whose number of documents is `> threshold`.

Merge each author's documents into a single document per author. Preprocess the text and add bigrams.

In [None]:
prep_df = az.prepare_data(dataset, 5)
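
To make the filtering and merging step concrete, here is a minimal sketch using only the standard library. It is not the code inside `az.prepare_data`: the helper names `merge_docs_per_author` and `add_bigrams` are hypothetical, and the underscore-joined bigram format is an assumption.

```python
from collections import defaultdict

def merge_docs_per_author(rows, threshold):
    """Group (author, text) pairs by author and keep only authors
    with more than `threshold` documents, merged into one string."""
    docs = defaultdict(list)
    for author, text in rows:
        docs[author].append(text)
    return {a: " ".join(t) for a, t in docs.items() if len(t) > threshold}

def add_bigrams(tokens):
    """Append adjacent-token bigrams (joined by '_', an assumed format)."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]

rows = [("ann", "climate hoax"), ("ann", "more text"), ("bob", "single post")]
merged = merge_docs_per_author(rows, threshold=1)  # bob has only 1 doc and is dropped
print(merged)
print(add_bigrams("climate change is real".split()))
```

With `threshold=1`, only authors with at least two documents survive, which mirrors the `> threshold` rule described above.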

----
#### `.categorize(prep_df,cat1,cat2,cat3,tags1,tags2)`
Read the prepared dataframe. Categorize it into two named categories, as well as a hybrid category, based on the use of words in two manually defined lists.

In [None]:
# the following are examples
cat1 = 'denialism'
cat2 = 'activism'
cat3 = 'hybrid'
tags1 = ['hoax', 'leftist']
tags2 = ['fridaysforfuture', 'denial']

In [None]:
cat_df = az.categorize(prep_df,cat1,cat2,cat3,tags1,tags2)
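
The categorisation rule can be sketched as follows. This is an illustration rather than the library's implementation: the helper `categorize_text` is hypothetical, matching on whitespace-separated lowercase tokens is an assumption, and returning `None` for authors who use neither list is also an assumption, since the description above does not say how such authors are handled.

```python
def categorize_text(text, cat1, cat2, cat3, tags1, tags2):
    """Assign a category name based on which tag list the text draws on."""
    tokens = set(text.lower().split())
    uses1 = any(tag in tokens for tag in tags1)
    uses2 = any(tag in tokens for tag in tags2)
    if uses1 and uses2:
        return cat3   # words from both lists: hybrid
    if uses1:
        return cat1
    if uses2:
        return cat2
    return None       # assumption: no tag words used, so no category

tags1 = ['hoax', 'leftist']
tags2 = ['fridaysforfuture', 'denial']
args = ('denialism', 'activism', 'hybrid', tags1, tags2)
print(categorize_text("the climate hoax again", *args))
print(categorize_text("hoax and denial everywhere", *args))
```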

----
#### `.partisan_phrases(cat_df,max_df,min_df,cat1,cat2)`

Calculate a `bias_score` for each phrase, reflecting how partisan its use is. The method draws on a workflow described [here](https://towardsdatascience.com/detecting-politically-biased-phrases-from-u-s-senators-with-natural-language-processing-tutorial-d6273211d331). The maths for calculating bias is based on the paper _Auditing the partisanship of Google search snippets_ ([Hu et al., 2019](https://dl.acm.org/doi/10.1145/3308558.3313654)).

The `max_df` and `min_df` parameters correspond to the parameters of the same name in [sklearn's CountVectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html).

In [None]:
phrases_df = az.partisan_phrases(cat_df,max_df=0.8,min_df=10,cat1=cat1,cat2=cat2)
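
As a rough illustration of a score bounded by -1 and 1, the sketch below computes a normalised difference between a phrase's share of all phrase counts in each category. That this matches the exact formula used by `az.partisan_phrases` is an assumption; the function name `bias_scores` and the toy counts are hypothetical.

```python
from collections import Counter

def bias_scores(counts1, counts2):
    """Return {phrase: score} with -1 = used only in category 1,
    +1 = used only in category 2, 0 = equal relative use.
    Assumes both categories contain at least one phrase."""
    total1 = sum(counts1.values())
    total2 = sum(counts2.values())
    scores = {}
    for phrase in set(counts1) | set(counts2):
        share1 = counts1.get(phrase, 0) / total1  # relative frequency in category 1
        share2 = counts2.get(phrase, 0) / total2  # relative frequency in category 2
        scores[phrase] = (share2 - share1) / (share2 + share1)
    return scores

# Toy phrase counts per category (made up for illustration)
counts1 = Counter({"hoax": 8, "climate": 10, "science": 2})
counts2 = Counter({"strike": 6, "climate": 10, "science": 4})
scores = bias_scores(counts1, counts2)
for phrase in sorted(scores, key=scores.get):
    print(phrase, round(scores[phrase], 2))
```

Using relative rather than raw counts keeps the score comparable when one category has many more documents than the other.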

----
#### `.score_authors(phrases_df,cat_df,polarity_cutoff)`
Phrases in `phrases_df` have a bias score between -1 (category 1 partisan) and 1 (category 2 partisan). This function will score authors (mean bias score) based on their use of phrases.

We want to use the most polarising phrases to score authors. We set a `polarity_cutoff`, where e.g. `0.6` means including only phrases with a score below `-0.6` or above `0.6`.

In [None]:
polarity_cutoff = 0.6 # example value
final_df = az.score_authors(phrases_df,cat_df,polarity_cutoff)
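
The effect of the cutoff on an author's mean score can be sketched like this. The helper `score_author` and the toy scores are hypothetical. Counting each occurrence of a phrase separately, and returning `None` when no phrase passes the cutoff, are assumptions rather than documented behaviour of `az.score_authors`.

```python
def score_author(tokens, phrase_scores, polarity_cutoff):
    """Mean bias score over the author's phrases whose absolute
    score exceeds `polarity_cutoff`; None if no phrase qualifies."""
    used = [phrase_scores[t] for t in tokens
            if t in phrase_scores and abs(phrase_scores[t]) > polarity_cutoff]
    return sum(used) / len(used) if used else None

# Toy phrase scores (made up for illustration)
phrase_scores = {"hoax": -1.0, "strike": 0.8, "climate": 0.0, "science": 0.33}
author_tokens = "hoax climate strike hoax".split()

# "climate" (0.0) is ignored; the mean is taken over -1.0, 0.8, -1.0
author_score = score_author(author_tokens, phrase_scores, polarity_cutoff=0.6)
print(author_score)
```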

----
#### `.reduce(final_df,cat1,cat2,cat3)`
Reduce the dataframe by removing less active authors, to make plotting less demanding.

In [None]:
plot_df = az.reduce(final_df,cat1,cat2,cat3)

----
#### `.plot(plot_df,plotcut,colour1,colour2,colour3,width,height)`

Plot the data using [Bokeh](https://bokeh.org/). The `plotcut` parameter decides how many items are drawn on the plot (start with a low value and increase it).

To set other parameters than the colours, width, and height, edit the source of the `plot()` function.

This draws an interactive plot for inspection, in which hovering over a point reveals the author's data.

In [None]:
p = az.plot(plot_df,4000, 'purple','green','grey',800,550)

----
#### Deluxe plot
Plot the full `final_df` using [Seaborn](https://seaborn.pydata.org/). This method draws a scatterplot with KDE (kernel density estimate) contours.


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

final_df = pd.read_csv('final_df.csv') # assumes final_df was previously saved to this file

sns.set(rc={'axes.facecolor':'lightgrey', 'figure.facecolor':'white', 
            'legend.markerscale':1.4, 'font.family' : "monospace"})
x = final_df.authorscores
y = final_df.numdocs
category = final_df.category

# Draw a scatterplot with density contours
f, ax = plt.subplots(figsize=(20, 8))
ax.set(yscale="log", xlabel = 'Polarity', ylabel='Log number of documents', yticks= [500,1000,8000], yticklabels = ['500','1000','8000'], xlim = (-1,1))

scatter = sns.scatterplot(x=x, y=y, s=80, color=".15", data= final_df, alpha=0.6, hue=category, palette=dict(hybrid="darkgrey", denialism="purple", activism="green"), linewidth=0)
density = sns.kdeplot(x=x, y=y, levels=50, color="black", linewidths=0.6, alpha = 0.8)

scatter.legend(fontsize=15,
               bbox_to_anchor=(1.05, 1),
               loc=2,
               borderaxespad=0,
               title="Discursive orientation",
               title_fontsize=16,
               facecolor='white',
               edgecolor='white')
plt.savefig('full_plot.png')
plt.savefig('full_plot.pdf')