# Regression Results and Descriptive Statistics

Now that the sentiment scores have been generated, we can run our regressions and compare the Senator sentiment scores with the random name sentiment scores.

### Importing Packages

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import linregress
from IPython.display import HTML
import statsmodels.formula.api as smf
!pip install stargazer
from stargazer.stargazer import Stargazer

Collecting stargazer
  Using cached stargazer-0.0.5-py3-none-any.whl (9.7 kB)
Installing collected packages: stargazer
Successfully installed stargazer-0.0.5


### Importing Data and Coding New Variables

First we import the data and create a "NonNeutral" score equal to 1 minus the Neutral score. We also fill the blank text entries with the word "Blank". Otherwise they are read as NA values, which cannot be matched by an equality comparison when we split the dataframe by text entry.

In [None]:
# starting with the senators data
senators = pd.read_csv("senators_sentiment_fixed.csv")
senators["NonNeutral"] = 1 - senators["Neutral"]
senators["Text"]=senators["Text"].fillna("Blank")

# repeating with the random names data
random = pd.read_csv("random_names_scores.csv")
random["NonNeutral"] = 1 - random["Neutral"]
random["Text"]=random["Text"].fillna("Blank")
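The reason for filling the blank entries can be seen in a small sketch. The series below is a toy stand-in with invented phrases, not the real data; it only illustrates that a missing value never compares equal to anything, so an equality filter on it selects no rows until it is filled.

```python
import numpy as np
import pandas as pd

# Toy column (invented phrases) mimicking an empty CSV field, which pandas reads as NaN.
text = pd.Series(["is great", np.nan, "is terrible"])

# NaN never compares equal to anything, including itself, so this filter matches nothing.
matches_before = (text == text[1]).sum()

# After filling, the blank rows can be selected like any other phrase.
filled = text.fillna("Blank")
matches_after = (filled == "Blank").sum()

print(matches_before, matches_after)
```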

### Separating Phrases Into Different Groups

We can now split the senators dataframe into separate dataframes depending on whether each row uses a positive, negative, or neutral phrase, or no phrase at all (the blank entries).

In [None]:
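# Separating dataframe by each phrase for the senators data.
# This relies on the first seven rows holding the seven distinct texts,
# in the order: two neutral, two positive, two negative, then "Blank".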
phrase1 = senators[senators['Text'] == senators["Text"][0]]
phrase2 = senators[senators['Text'] == senators["Text"][1]]
phrase3 = senators[senators['Text'] == senators["Text"][2]]
phrase4 = senators[senators['Text'] == senators["Text"][3]]
phrase5 = senators[senators['Text'] == senators["Text"][4]]
phrase6 = senators[senators['Text'] == senators["Text"][5]]
names_only = senators[senators['Text'] == senators["Text"][6]]

# Creating dataframes of the neutral, positive, and negative phrases
neutral = pd.concat([phrase1,phrase2])
positive = pd.concat([phrase3,phrase4])
negative = pd.concat([phrase5,phrase6])
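The split above depends on row order. As an alternative, here is a minimal sketch that labels each phrase explicitly and groups on that label. The dataframe, the phrases, and the `phrase_category` mapping are all hypothetical stand-ins; the real phrases would need to be substituted.

```python
import pandas as pd

# Hypothetical stand-in for the senators dataframe; names, phrases, and scores are invented.
scores = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B"],
    "Text": ["is great", "is terrible", "Blank", "is great", "is terrible", "Blank"],
    "Positive": [0.90, 0.01, 0.02, 0.80, 0.02, 0.01],
})

# Hypothetical mapping from each phrase to its category.
phrase_category = {
    "is great": "positive",
    "is terrible": "negative",
    "Blank": "names_only",
}
scores["category"] = scores["Text"].map(phrase_category)

# One dataframe per category, keyed by the category label.
groups = {label: frame for label, frame in scores.groupby("category")}
print(groups["positive"])
```

With this layout, a phrase that is missing from the mapping shows up as a missing `category` value instead of silently landing in the wrong group.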

### Defining a Regression Function
We will now create a regression function which regresses each of the score categories on the `Democrat` and `female` variables in the Senate data. We can change the `group` input of the function to produce a regression table for our positive, negative, neutral, and names-only phrases.

The `output` option lets us decide whether to export the table as HTML or LaTeX. We will use the LaTeX output for our blog, but the HTML is easier to read within SageMaker.

In [None]:
def regress_sentiments(group, output="HTML"):
    """Regress each score on party and gender; return an HTML table or print LaTeX."""
    # One OLS model per score, each with HC3 heteroskedasticity-robust standard errors
    m1 = smf.ols(data=group, formula='NonNeutral ~ Democrat + female').fit(cov_type='HC3')
    m2 = smf.ols(data=group, formula='Positive ~ Democrat + female').fit(cov_type='HC3')
    m3 = smf.ols(data=group, formula='Mixed ~ Democrat + female').fit(cov_type='HC3')
    m4 = smf.ols(data=group, formula='Negative ~ Democrat + female').fit(cov_type='HC3')
    # Collect the four models into one side-by-side table
    st1 = Stargazer([m1, m2, m3, m4])
    st1.rename_covariates({"Democrat": "Democratic", "female": "Female"})
    st1.custom_columns(['NonNeutral Scores', 'Positive Scores', 'Mixed Scores', 'Negative Scores'], [1, 1, 1, 1])
    if output == "HTML":
        return HTML(st1.render_html())
    if output == "latex":
        print(st1.render_latex())
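To make the `cov_type='HC3'` option less opaque, here is a rough NumPy sketch of an OLS fit with HC3 standard errors. It illustrates the textbook formula and is not the statsmodels implementation; the function name `ols_hc3` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def ols_hc3(X, y):
    """Return OLS coefficients and HC3 heteroskedasticity-robust standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_i: the diagonal of the hat matrix X (X'X)^-1 X'
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # HC3 scales each squared residual by 1 / (1 - h_i)^2
    omega = (resid / (1.0 - leverage)) ** 2
    meat = X.T @ (X * omega[:, None])
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Synthetic data (invented): two 0/1 indicators standing in for Democrat and female.
rng = np.random.default_rng(0)
n = 200
democrat = rng.integers(0, 2, n)
female = rng.integers(0, 2, n)
score = 0.5 - 0.1 * democrat + 0.02 * female + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), democrat, female])  # intercept, Democrat, female
beta, se = ols_hc3(X, score)
print(beta, se)
```

The coefficients are the usual OLS estimates; only the standard errors change, because observations with high leverage have their squared residuals inflated before entering the covariance estimate.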

### Running Regression Results

1. Positive Phrases

In [None]:
#regress_sentiments(positive, "latex")
regress_sentiments(positive)

| | NonNeutral Scores | Positive Scores | Mixed Scores | Negative Scores |
|---|---|---|---|---|
| | (1) | (2) | (3) | (4) |
| Democratic | -0.019 | -0.016 | -0.000 | -0.003*** |
| | (0.064) | (0.064) | (0.001) | (0.001) |
| Intercept | 0.565*** | 0.555*** | 0.004*** | 0.006*** |
| | (0.041) | (0.041) | (0.001) | (0.001) |
| Female | -0.005 | -0.004 | -0.001 | -0.001 |


2. Negative Phrases

In [None]:
#regress_sentiments(negative, "latex")
regress_sentiments(negative)

| | NonNeutral Scores | Positive Scores | Mixed Scores | Negative Scores |
|---|---|---|---|---|
| | (1) | (2) | (3) | (4) |
| Democratic | -0.079*** | -0.000 | 0.000 | -0.078** |
| | (0.030) | (0.001) | (0.000) | (0.031) |
| Intercept | 0.673*** | 0.004*** | 0.000*** | 0.668*** |
| | (0.019) | (0.001) | (0.000) | (0.020) |
| Female | -0.032 | -0.000 | 0.000 | -0.031 |


3. Neutral Phrases

In [None]:
#regress_sentiments(neutral, "latex")
regress_sentiments(neutral)

| | NonNeutral Scores | Positive Scores | Mixed Scores | Negative Scores |
|---|---|---|---|---|
| | (1) | (2) | (3) | (4) |
| Democratic | -0.004 | -0.001 | -0.001 | -0.002 |
| | (0.008) | (0.002) | (0.001) | (0.007) |
| Intercept | 0.048*** | 0.015*** | 0.003*** | 0.030*** |
| | (0.004) | (0.001) | (0.001) | (0.004) |
| Female | -0.003 | -0.003* | 0.002 | -0.002 |


4. Names Only

In [None]:
#regress_sentiments(names_only, "latex")
regress_sentiments(names_only)

| | NonNeutral Scores | Positive Scores | Mixed Scores | Negative Scores |
|---|---|---|---|---|
| | (1) | (2) | (3) | (4) |
| Democratic | 0.005 | 0.008 | -0.000 | -0.002 |
| | (0.009) | (0.009) | (0.000) | (0.002) |
| Intercept | 0.011*** | 0.006** | 0.000*** | 0.005** |
| | (0.003) | (0.003) | (0.000) | (0.002) |
| Female | 0.005 | 0.003 | 0.000 | 0.003 |


### Testing Variances of Scores Between Senators and Random Names
Next we compare the variance of each score category across the Democratic, Republican, and random name groups.

We'll start by combining the Senator and random name datasets.

In [None]:
filtered_senators = senators[["name", "Text", "group", "Overall", "Negative", "Positive", "NonNeutral", "Mixed"]]
# .copy() makes an independent dataframe, so adding a column does not raise SettingWithCopyWarning
filtered_random = random[["name", "Text", "Overall", "Negative", "Positive", "NonNeutral", "Mixed"]].copy()
filtered_random["group"] = "Non-Partisan"
all_scores = pd.concat([filtered_senators, filtered_random])


We can now produce a table of variances for each group.

In [None]:
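# Select the numeric score columns explicitly; recent pandas versions raise an error
# when var() is applied to string columns such as "name" or "Text"
all_scores.groupby("group")[["Negative", "Positive", "NonNeutral", "Mixed"]].var()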

| group | Negative | Positive | NonNeutral | Mixed |
|---|---|---|---|---|
| Democrat | 0.078416 | 0.110416 | 0.135435 | 2.8e-05 |
| Non-Partisan | 0.097105 | 0.105279 | 0.136823 | 0.000454 |
| Republican | 0.097381 | 0.108726 | 0.143259 | 5.2e-05 |


Now we can use Levene's test to check whether the variances of the Democratic, Non-Partisan, and Republican groups are significantly different.

Levene's test has the null hypothesis that the variances are equal across all groups. The alternative hypothesis is that at least one group's variance differs from the others. We will need to import `levene` from `scipy.stats` to run this test. We will also need to split the senators dataframe by party affiliation.

In [219]:
# Separate Democrats from Republicans
Dem = senators[senators['group'] == 'Democrat']
Rep = senators[senators['group'] == 'Republican']

# Importing the function for levene test
from scipy.stats import levene
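For intuition about what the test measures, the following is a rough NumPy sketch of the Levene statistic with deviations taken from each group's mean, which corresponds to the `center='mean'` option used in the cells below. The function name `levene_statistic` and the toy samples are invented for illustration. The sketch computes only the statistic; the p-value is obtained by comparing it against an F distribution with k - 1 and N - k degrees of freedom.

```python
import numpy as np

def levene_statistic(*groups):
    """Levene's W using absolute deviations from each group's mean."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean
    deviations = [np.abs(g - g.mean()) for g in groups]
    group_means = [z.mean() for z in deviations]
    grand_mean = np.concatenate(deviations).mean()
    # Between-group and within-group sums of squares of the deviations
    between = sum(len(z) * (m - grand_mean) ** 2 for z, m in zip(deviations, group_means))
    within = sum(((z - m) ** 2).sum() for z, m in zip(deviations, group_means))
    return (n_total - k) / (k - 1) * between / within

# Toy samples (invented): same centre, different spread
narrow = [1, 2, 3]
wide = [0, 2, 4]
print(levene_statistic(narrow, wide))
```

In effect this is a one-way ANOVA on the absolute deviations: groups whose observations sit at very different typical distances from their own mean produce a large statistic.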

**Non-Neutral Variances Test:** Since the p-value is ~0.27, we fail to reject the null hypothesis that the variances are equal in each group.

In [220]:
levene(Dem["NonNeutral"], Rep["NonNeutral"], random["NonNeutral"], center='mean')

LeveneResult(statistic=1.3184126862436258, pvalue=0.2678923646859416)

**Negative Variances Test:** Since the p-value is ~0.008, we can reject the null hypothesis at the 1% significance level.

In [None]:
levene(Dem["Negative"], Rep["Negative"], random["Negative"], center='mean')

LeveneResult(statistic=4.832247148219397, pvalue=0.008102286175880418)

**Positive Variances Test:** Since the p-value is ~0.805, we fail to reject the null hypothesis.

In [None]:
levene(Dem["Positive"], Rep["Positive"], random["Positive"], center = "mean")

LeveneResult(statistic=0.21666711848368647, pvalue=0.805225012863384)

**Mixed Variances Test:** Since the p-value is < 0.001, we reject the null hypothesis at the 1% significance level.

In [None]:
levene(Dem["Mixed"], Rep["Mixed"], random["Mixed"], center = "mean")

LeveneResult(statistic=81.13526052984164, pvalue=4.6136432898191235e-34)
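
The four decisions can be collected in one place with a short sketch. The p-values are copied from the Levene outputs above; the 1% threshold is the one used in the text, and the variable names are only illustrative.

```python
# p-values copied from the four Levene test outputs in this section
p_values = {
    "NonNeutral": 0.2678923646859416,
    "Negative": 0.008102286175880418,
    "Positive": 0.805225012863384,
    "Mixed": 4.6136432898191235e-34,
}
alpha = 0.01  # 1% significance level, as used in the text

# Reject the null of equal variances whenever the p-value falls below alpha
decisions = {
    score: "reject" if p < alpha else "fail to reject"
    for score, p in p_values.items()
}
for score, decision in decisions.items():
    print(f"{score}: p = {p_values[score]:.3g} -> {decision} equal variances")
```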