<center>
<img src="https://laelgelcpublic.s3.sa-east-1.amazonaws.com/lael_50_years_narrow_white.png.no_years.400px_96dpi.png" width="300" alt="LAEL 50 years logo">
<h3>APPLIED LINGUISTICS GRADUATE PROGRAMME (LAEL)</h3>
</center>
<hr>

# Corpus Linguistics - Study 1 - ANOVA Preparation - INRS

## What is `ANOVA`?

ANOVA (Analysis of Variance) is a statistical method for comparing the means of groups, typically three or more, to determine whether at least one of them differs significantly from the others. It does so by comparing the variation between the groups with the variation within each group, which indicates whether the observed differences are likely to reflect real group effects or just random chance.

Please refer to:
- [Analysis of Variance](https://en.wikipedia.org/wiki/Analysis_of_variance)
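
As a rough illustration of the idea, the sketch below computes the F statistic of a one-way ANOVA by hand, using only the Python standard library. The three groups and their values are made up for the example and are not taken from the corpus.

```python
from statistics import fmean

# Hypothetical scores for three groups (illustrative values only)
groups = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 3.0, 4.0],
    "C": [6.0, 7.0, 8.0],
}

all_values = [value for values in groups.values() for value in values]
grand_mean = fmean(all_values)

# Variation between groups: distance of each group mean from the grand mean
ss_between = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values())

# Variation within groups: distance of each value from its own group mean
ss_within = sum((x - fmean(v)) ** 2 for v in groups.values() for x in v)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F = mean square between groups / mean square within groups
f_statistic = (ss_between / df_between) / (ss_within / df_within)
print(f_statistic)  # 21.0
```

A large F value means that the group means are spread out far more than the scatter inside each group would suggest. Turning F into a p-value requires the F distribution, which is left to the statistical package used later in the study.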

## Required Python packages

- pandas

## Importing the required libraries

In [1]:
import pandas as pd

## Defining input variables

In [2]:
input_directory = 'cl_st1_inrs_tc'
output_directory = 'cl_st1_inrs_anova'

## Enriching the Target Corpus with additional metadata

### Importing the Target Corpus into a DataFrame

#### `Republican + Democratic + Independent` data set

In [3]:
df_debates_turns = pd.read_json(f"{input_directory}/debates_turns.jsonl", lines=True)

In [4]:
df_debates_turns['Date'] = pd.to_datetime(df_debates_turns['Date'], unit='ms')

In [5]:
df_debates_turns.dtypes

Title                   object
Debate                  object
Date            datetime64[ns]
Participants            object
Moderators              object
Speaker                 object
Text                    object
dtype: object

In [6]:
df_debates_turns.head(5)

Unnamed: 0,Title,Debate,Date,Participants,Moderators,Speaker,Text
0,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,"Thank you very much, Chris. I will tell you ve..."
1,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,"Well, first of all, thank you for doing this a..."
2,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,The American people have a right to have a say...
3,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,There aren’t a hundred million people with pre...
4,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,"During that period of time, during that period..."


##### Checking the number of texts

In [7]:
df_debates_turns.shape

(3478, 7)

### Adding the column `Decade` from the column `Date`

This piece of code extracts the year from each date, then divides the year by 10, discards the remainder (using integer division), and finally multiplies the result back by 10 to get the start year of the decade.

In [8]:
df_debates_turns['Decade'] = (df_debates_turns['Date'].dt.year // 10) * 10
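
The same expression can be tried on a handful of dates to see the rounding at work. This is a minimal sketch on a throwaway Series; the dates are illustrative.

```python
import pandas as pd

# A few illustrative dates
dates = pd.Series(pd.to_datetime(["1960-10-21", "1988-09-25", "2020-09-29"]))

# Integer division drops the last digit of the year; multiplying by 10 restores the scale
decades = (dates.dt.year // 10) * 10
print(decades.tolist())  # [1960, 1980, 2020]
```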

### Adding the column `Election` from the column `Date`

This piece of code extracts the year from each date, converts it to a string with `astype(str)`, and appends `' Election'` to it, yielding labels such as `2020 Election`.

In [9]:
df_debates_turns['Election'] = df_debates_turns['Date'].dt.year.astype(str) + ' Election'
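
A small sketch, on illustrative dates, shows that debates held in the same calendar year end up with the same label, which is what lets the column work as a grouping variable.

```python
import pandas as pd

# Two debates in 1960 and one in 2020 (illustrative dates)
dates = pd.Series(pd.to_datetime(["1960-09-26", "1960-10-21", "2020-09-29"]))

election = dates.dt.year.astype(str) + " Election"
print(election.tolist())   # ['1960 Election', '1960 Election', '2020 Election']
print(election.nunique())  # 2
```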

In [10]:
df_debates_turns

Unnamed: 0,Title,Debate,Date,Participants,Moderators,Speaker,Text,Decade,Election
0,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,"Thank you very much, Chris. I will tell you ve...",2020,2020 Election
1,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,"Well, first of all, thank you for doing this a...",2020,2020 Election
2,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,The American people have a right to have a say...,2020,2020 Election
3,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,There aren’t a hundred million people with pre...,2020,2020 Election
4,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,"During that period of time, during that period...",2020,2020 Election
...,...,...,...,...,...,...,...,...,...
3473,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,Kennedy-Nixon,QUINCY HOWE,MR. NIXON,I would say that the issue will stay with us a...,1960,1960 Election
3474,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,"Well, Mr. Nixon, to go back to 1955. The resol...",1960,1960 Election
3475,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,And that’s the testimony of uh – General Twini...,1960,1960 Election
3476,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,I uh – said that I’ve served this country for ...,1960,1960 Election


### Defining the dictionary that maps each speaker label to a political party

In [11]:
candidates_parties = {
    'ADMIRAL STOCKDALE': 'independent', 
    'ANDERSON': 'independent', 
    'BENTSEN': 'Democratic', 
    'BIDEN': 'Democratic', 
    'BUSH': 'Republican', 
    'CHENEY': 'Republican', 
    'CLINTON': 'Democratic', 
    'DOLE': 'Republican', 
    'DUKAKIS': 'Democratic', 
    'EDWARDS': 'Democratic', 
    'FERRARO': 'Democratic', 
    'GORE': 'Democratic', 
    'GOV. RONALD REAGAN': 'Republican', 
    'GOVERNOR CLINTON': 'Democratic', 
    'HARRIS': 'Democratic', 
    'KAINE': 'Democratic', 
    'KEMP': 'Republican', 
    'KERRY': 'Democratic', 
    'LIEBERMAN': 'Democratic', 
    'MCCAIN': 'Republican', 
    'MR. CARTER': 'Democratic', 
    'MR. FORD': 'Republican', 
    'Mr. KENNEDY': 'Democratic', 
    'MR. KENNEDY': 'Democratic', 
    'MR. MONDALE': 'Democratic', 
    'Mr. NIXON': 'Republican', 
    'MR. NIXON': 'Republican', 
    'MR. REAGAN': 'Republican', 
    'MR.FORD': 'Republican', 
    'OBAMA': 'Democratic', 
    'PALIN': 'Republican', 
    'PENCE': 'Republican', 
    'PEROT': 'independent', 
    'PRESIDENT BUSH': 'Republican', 
    'PRESIDENT GEORGE BUSH': 'Republican', 
    'QUAYLE': 'Republican', 
    'REAGAN': 'Republican', 
    'REP. JOHN B. ANDERSON': 'independent', 
    'RYAN': 'Republican', 
    'ROMNEHY': 'Republican', 
    'ROMNEY': 'Republican', 
    'ROSS PEROT': 'independent', 
    'SENATOR GORE': 'Democratic', 
    'SENATOR KENNEDY': 'Democratic', 
    'STOCKDALE': 'independent', 
    'THE PRESIDENT': 'Republican', 
    'The President': 'Republican', 
    'TRUMP': 'Republican', 
    'VICE PRESIDENT QUAYLE': 'Republican'
}

### Creating the column `Party` based on the dictionary `candidates_parties`

In [12]:
df_debates_turns['Party'] = df_debates_turns['Speaker'].map(candidates_parties)
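
`Series.map` with a dictionary returns `NaN` for every value that is not a key of the dictionary, so any speaker label missing from the mapping can be listed afterwards. The sketch below uses a reduced dictionary and a hypothetical `MODERATOR` label to show this behaviour.

```python
import pandas as pd

# Reduced, illustrative mapping; 'MODERATOR' is a hypothetical unmapped label
speakers = pd.Series(["TRUMP", "BIDEN", "MODERATOR", "BIDEN"])
parties = {"TRUMP": "Republican", "BIDEN": "Democratic"}

mapped = speakers.map(parties)

# Labels that found no key in the dictionary come back as NaN
unmapped = speakers[mapped.isna()].unique().tolist()
print(unmapped)  # ['MODERATOR']
```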

In [13]:
df_debates_turns

Unnamed: 0,Title,Debate,Date,Participants,Moderators,Speaker,Text,Decade,Election,Party
0,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,"Thank you very much, Chris. I will tell you ve...",2020,2020 Election,Republican
1,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,"Well, first of all, thank you for doing this a...",2020,2020 Election,Democratic
2,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,The American people have a right to have a say...,2020,2020 Election,Democratic
3,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,There aren’t a hundred million people with pre...,2020,2020 Election,Republican
4,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,"During that period of time, during that period...",2020,2020 Election,Republican
...,...,...,...,...,...,...,...,...,...,...
3473,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,Kennedy-Nixon,QUINCY HOWE,MR. NIXON,I would say that the issue will stay with us a...,1960,1960 Election,Republican
3474,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,"Well, Mr. Nixon, to go back to 1955. The resol...",1960,1960 Election,Democratic
3475,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,And that’s the testimony of uh – General Twini...,1960,1960 Election,Democratic
3476,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,I uh – said that I’ve served this country for ...,1960,1960 Election,Democratic


In [14]:
df_debates_turns.dtypes

Title                   object
Debate                  object
Date            datetime64[ns]
Participants            object
Moderators              object
Speaker                 object
Text                    object
Decade                   int32
Election                object
Party                   object
dtype: object

#### Checking if there are any missing values in the column `Party`

In [15]:
df_debates_turns['Party'].isnull().sum()

0
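
Note that `isnull()` only counts `NaN`/`None`. A speaker mapped to an empty string would not be counted as missing. A stricter check, sketched here on a toy Series, counts blank labels as well.

```python
import pandas as pd

# Toy column with one empty string and one true missing value
party = pd.Series(["Republican", "", None, "Democratic"])

n_null = int(party.isnull().sum())
# Treat missing values and whitespace-only strings alike
n_blank = int(party.fillna("").str.strip().eq("").sum())

print(n_null)   # 1
print(n_blank)  # 2
```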

### Exporting to a file

In [16]:
df_debates_turns[['Title', 'Debate', 'Date', 'Decade', 'Election', 'Participants', 'Moderators', 'Speaker', 'Party', 'Text']].to_json(f'{output_directory}/debates_turns_parties.jsonl', orient='records', lines=True)

## Preparing data for ANOVA

### Importing the factor scores

In [17]:
df_cl_st1_ph4_inrs_scores_only = pd.read_csv(f"{output_directory}/cl_st1_ph4_inrs_scores_only.tsv", sep='\t')

In [18]:
df_cl_st1_ph4_inrs_scores_only

Unnamed: 0,file,fac1,fac2,fac3,fac4,fac5,fac6
0,t000000,1,8,0,3,0,3
1,t000001,0,1,0,0,0,0
2,t000002,11,11,-1,5,-7,0
3,t000003,1,3,0,0,-2,0
4,t000004,4,2,0,0,-1,0
...,...,...,...,...,...,...,...
3423,t003473,-7,25,-3,-14,-3,0
3424,t003474,0,25,2,-15,0,0
3425,t003475,0,0,0,-1,0,0
3426,t003476,12,72,-4,-7,1,-1


### Importing the enriched Target Corpus into a DataFrame

In [19]:
df_debates_turns_parties = pd.read_json(f"{output_directory}/debates_turns_parties.jsonl", lines=True)

In [20]:
df_debates_turns_parties['Date'] = pd.to_datetime(df_debates_turns_parties['Date'], unit='ms')

In [21]:
df_debates_turns_parties.dtypes

Title                   object
Debate                  object
Date            datetime64[ns]
Decade                   int64
Election                object
Participants            object
Moderators              object
Speaker                 object
Party                   object
Text                    object
dtype: object

In [22]:
df_debates_turns_parties

Unnamed: 0,Title,Debate,Date,Decade,Election,Participants,Moderators,Speaker,Party,Text
0,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,"Thank you very much, Chris. I will tell you ve..."
1,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,"Well, first of all, thank you for doing this a..."
2,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,The American people have a right to have a say...
3,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,There aren’t a hundred million people with pre...
4,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,"During that period of time, during that period..."
...,...,...,...,...,...,...,...,...,...,...
3473,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. NIXON,Republican,I would say that the issue will stay with us a...
3474,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,"Well, Mr. Nixon, to go back to 1955. The resol..."
3475,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,And that’s the testimony of uh – General Twini...
3476,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,I uh – said that I’ve served this country for ...


#### Adding the column `File`

In [23]:
df_debates_turns_parties['File'] = 't' + df_debates_turns_parties.index.astype(str).str.zfill(6)
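
The expression above builds an identifier from the row position: the index is converted to a string, left-padded with zeros to six digits by `str.zfill(6)`, and prefixed with `t`. The sketch below reproduces it on a three-row DataFrame. It assumes, as the notebook appears to, that the row order matches the order in which the identifiers of the factor scores file were generated.

```python
import pandas as pd

# Three placeholder texts with the default 0..n-1 index
df = pd.DataFrame({"Text": ["first turn", "second turn", "third turn"]})

df["File"] = "t" + df.index.astype(str).str.zfill(6)
print(df["File"].tolist())  # ['t000000', 't000001', 't000002']
```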

In [24]:
df_debates_turns_parties

Unnamed: 0,Title,Debate,Date,Decade,Election,Participants,Moderators,Speaker,Party,Text,File
0,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,"Thank you very much, Chris. I will tell you ve...",t000000
1,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,"Well, first of all, thank you for doing this a...",t000001
2,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,The American people have a right to have a say...,t000002
3,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,There aren’t a hundred million people with pre...,t000003
4,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,"During that period of time, during that period...",t000004
...,...,...,...,...,...,...,...,...,...,...,...
3473,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. NIXON,Republican,I would say that the issue will stay with us a...,t003473
3474,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,"Well, Mr. Nixon, to go back to 1955. The resol...",t003474
3475,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,And that’s the testimony of uh – General Twini...,t003475
3476,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,I uh – said that I’ve served this country for ...,t003476


### Merging the DataFrames with a left join

The factor analysis step dropped a few texts, so the number of texts in the two DataFrames does not match. The following piece of code performs a left join: it keeps all the rows from the DataFrame on the "left" side of the join (in our case, `df_debates_turns_parties`) and adds the matching rows from the "right" DataFrame (`df_cl_st1_ph4_inrs_scores_only`). For the rows in the left DataFrame whose key (in the column `File`) has no match in the right DataFrame, the new columns from the right DataFrame are filled with `NaN` (Not a Number).

In [25]:
# Renaming 'file' column in 'df_cl_st1_ph4_inrs_scores_only' to match 'File' in 'df_debates_turns_parties'
df_cl_st1_ph4_inrs_scores_only.rename(columns={'file': 'File'}, inplace=True)

# Merge DataFrames on 'File', keeping all rows from 'df_debates_turns_parties'
df_merged = pd.merge(df_debates_turns_parties, df_cl_st1_ph4_inrs_scores_only, on='File', how='left')

# Rename the factor score columns
df_merged.rename(columns={
    'fac1': 'Factor 1',
    'fac2': 'Factor 2',
    'fac3': 'Factor 3',
    'fac4': 'Factor 4',
    'fac5': 'Factor 5',
    'fac6': 'Factor 6'
}, inplace=True)
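
To see the effect of a left join in isolation, the sketch below merges two toy DataFrames and uses the `indicator=True` option of `pd.merge`, which adds a `_merge` column telling where each row came from. The data are illustrative, not taken from the corpus.

```python
import pandas as pd

left = pd.DataFrame({
    "File": ["t000000", "t000001", "t000002"],
    "Party": ["Republican", "Democratic", "Democratic"],
})
# The right DataFrame lacks a row for 't000001'
right = pd.DataFrame({"File": ["t000000", "t000002"], "fac1": [1, 11]})

merged = pd.merge(left, right, on="File", how="left", indicator=True)

print(merged["_merge"].tolist())  # ['both', 'left_only', 'both']
print(int(merged["fac1"].isna().sum()))  # 1
```

Rows tagged `left_only` are the ones without a match. Their `fac1` value is `NaN`, which is also why the integer scores appear as floats after the merge.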

In [26]:
df_merged.head(10)

Unnamed: 0,Title,Debate,Date,Decade,Election,Participants,Moderators,Speaker,Party,Text,File,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5,Factor 6
0,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,"Thank you very much, Chris. I will tell you ve...",t000000,1.0,8.0,0.0,3.0,0.0,3.0
1,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,"Well, first of all, thank you for doing this a...",t000001,0.0,1.0,0.0,0.0,0.0,0.0
2,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,The American people have a right to have a say...,t000002,11.0,11.0,-1.0,5.0,-7.0,0.0
3,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,There aren’t a hundred million people with pre...,t000003,1.0,3.0,0.0,0.0,-2.0,0.0
4,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,"During that period of time, during that period...",t000004,4.0,2.0,0.0,0.0,-1.0,0.0
5,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,"Well, you’re certainly going to socialist. You...",t000005,,,,,,
6,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,"Number one, he knows what I proposed. What I p...",t000006,7.0,1.0,0.0,0.0,0.0,0.0
7,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,That’s not what you’ve said and it’s not what ...,t000007,0.0,0.0,0.0,0.0,0.0,-1.0
8,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,Your party doesn’t say it. Your party wants to...,t000008,0.0,0.0,0.0,0.0,0.0,-2.0
9,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,"The party is me. Right now, I am the Democrati...",t000009,0.0,1.0,0.0,0.0,0.0,-1.0


#### Checking for missing values in the factor score columns

As expected, the rows with missing factor scores are exactly those whose keys in the column `File` had no match in the factor scores DataFrame.

In [27]:
df_merged[['Factor 1', 'Factor 2', 'Factor 3', 'Factor 4', 'Factor 5', 'Factor 6']].isnull().sum()

Factor 1    50
Factor 2    50
Factor 3    50
Factor 4    50
Factor 5    50
Factor 6    50
dtype: int64

#### Dropping the rows where there are missing values

In [28]:
df_merged.dropna(subset=['Factor 1', 'Factor 2', 'Factor 3', 'Factor 4', 'Factor 5', 'Factor 6'], inplace=True)
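
An alternative to `inplace=True`, sketched below on toy data, is to assign the result of `dropna` to a new variable and, once no `NaN` remains, cast the factor scores back to integers. The column names follow the notebook; the values are illustrative.

```python
import pandas as pd

merged = pd.DataFrame({
    "File": ["t000000", "t000001", "t000002"],
    "Factor 1": [1.0, None, 11.0],
    "Factor 2": [8.0, None, 11.0],
})
factor_columns = ["Factor 1", "Factor 2"]

# Keep only the rows that have a value in every factor column
complete = merged.dropna(subset=factor_columns).copy()

# With no NaN left, the scores can be stored as integers again
complete[factor_columns] = complete[factor_columns].astype(int)

print(complete["File"].tolist())      # ['t000000', 't000002']
print(complete["Factor 1"].tolist())  # [1, 11]
```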

In [29]:
df_merged[['Factor 1', 'Factor 2', 'Factor 3', 'Factor 4', 'Factor 5', 'Factor 6']].isnull().sum()

Factor 1    0
Factor 2    0
Factor 3    0
Factor 4    0
Factor 5    0
Factor 6    0
dtype: int64

In [30]:
df_merged

Unnamed: 0,Title,Debate,Date,Decade,Election,Participants,Moderators,Speaker,Party,Text,File,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5,Factor 6
0,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,"Thank you very much, Chris. I will tell you ve...",t000000,1.0,8.0,0.0,3.0,0.0,3.0
1,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,"Well, first of all, thank you for doing this a...",t000001,0.0,1.0,0.0,0.0,0.0,0.0
2,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,The American people have a right to have a say...,t000002,11.0,11.0,-1.0,5.0,-7.0,0.0
3,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,There aren’t a hundred million people with pre...,t000003,1.0,3.0,0.0,0.0,-2.0,0.0
4,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,"During that period of time, during that period...",t000004,4.0,2.0,0.0,0.0,-1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3473,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. NIXON,Republican,I would say that the issue will stay with us a...,t003473,-7.0,25.0,-3.0,-14.0,-3.0,0.0
3474,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,"Well, Mr. Nixon, to go back to 1955. The resol...",t003474,0.0,25.0,2.0,-15.0,0.0,0.0
3475,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,And that’s the testimony of uh – General Twini...,t003475,0.0,0.0,0.0,-1.0,0.0,0.0
3476,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,I uh – said that I’ve served this country for ...,t003476,12.0,72.0,-4.0,-7.0,1.0,-1.0


### Exporting to a file

#### JSONL format

In [31]:
df_merged[['File', 'Title', 'Debate', 'Date', 'Decade', 'Election', 'Participants', 'Moderators', 'Speaker', 'Party', 'Factor 1', 'Factor 2', 'Factor 3', 'Factor 4', 'Factor 5', 'Factor 6', 'Text']].to_json(f'{output_directory}/debates_turns_parties_scores.jsonl', orient='records', lines=True)

#### TSV format

In [32]:
df_merged[['File', 'Title', 'Debate', 'Date', 'Decade', 'Election', 'Participants', 'Moderators', 'Speaker', 'Party', 'Factor 1', 'Factor 2', 'Factor 3', 'Factor 4', 'Factor 5', 'Factor 6', 'Text']].to_csv(f'{output_directory}/debates_turns_parties_scores.tsv', sep='\t', index=False, encoding='utf-8', lineterminator='\n')

## Appendices

### Preparing data for an exercise on SAS OnDemand for Academics

#### Defining input variables

In [2]:
input_directory = 'cl_st1_inrs_anova'
output_directory = 'cl_st1_inrs_anova/sas_exercise'

#### Importing the enriched Target Corpus into a DataFrame

In [3]:
df_debates_turns_parties = pd.read_json(f"{input_directory}/debates_turns_parties_scores.jsonl", lines=True)

In [4]:
df_debates_turns_parties['Date'] = pd.to_datetime(df_debates_turns_parties['Date'], unit='ms')

In [5]:
df_debates_turns_parties.dtypes

File                    object
Title                   object
Debate                  object
Date            datetime64[ns]
Decade                   int64
Election                object
Participants            object
Moderators              object
Speaker                 object
Party                   object
Factor 1                 int64
Factor 2                 int64
Factor 3                 int64
Factor 4                 int64
Factor 5                 int64
Factor 6                 int64
Text                    object
dtype: object

In [6]:
df_debates_turns_parties

Unnamed: 0,File,Title,Debate,Date,Decade,Election,Participants,Moderators,Speaker,Party,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5,Factor 6,Text
0,t000000,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,1,8,0,3,0,3,"Thank you very much, Chris. I will tell you ve..."
1,t000001,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,0,1,0,0,0,0,"Well, first of all, thank you for doing this a..."
2,t000002,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),BIDEN,Democratic,11,11,-1,5,-7,0,The American people have a right to have a say...
3,t000003,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,1,3,0,0,-2,0,There aren’t a hundred million people with pre...
4,t000004,"September 29, 2020 Debate Transcript",Presidential Debate at Case Western Reserve Un...,2020-09-29,2020,2020 Election,Former Vice President Joe Biden (D) and Presid...,Chris Wallace (Fox News),TRUMP,Republican,4,2,0,0,-1,0,"During that period of time, during that period..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3423,t003473,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. NIXON,Republican,-7,25,-3,-14,-3,0,I would say that the issue will stay with us a...
3424,t003474,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,0,25,2,-15,0,0,"Well, Mr. Nixon, to go back to 1955. The resol..."
3425,t003475,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,0,0,0,-1,0,0,And that’s the testimony of uh – General Twini...
3426,t003476,"October 21, 1960 Debate Transcript",The Fourth Kennedy-Nixon Presidential Debate,1960-10-21,1960,1960 Election,Kennedy-Nixon,QUINCY HOWE,MR. KENNEDY,Democratic,12,72,-4,-7,1,-1,I uh – said that I’ve served this country for ...


#### Creating one file per text named after the column `File` with the content of the column `Text`

In [7]:
# Writing each row to a separate text file
for i, row in df_debates_turns_parties.iterrows():
    filename = f"{output_directory}/{row['File']}.txt"
    with open(filename, 'w', encoding='utf8', newline='\n') as file:
        file.write(row['Text'])
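
The loop above assumes that the output directory already exists. A variant that creates it first is sketched below with `pathlib`. It writes two placeholder texts to a temporary directory, so the paths and contents are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    "File": ["t000000", "t000001"],
    "Text": ["First turn.", "Second turn."],
})

# Temporary location standing in for the real output directory
output_dir = Path(tempfile.mkdtemp()) / "sas_exercise"
output_dir.mkdir(parents=True, exist_ok=True)

# One text file per row, named after the 'File' column
for file_id, text in zip(df["File"], df["Text"]):
    (output_dir / f"{file_id}.txt").write_text(text, encoding="utf8")

written = sorted(path.name for path in output_dir.glob("*.txt"))
print(written)  # ['t000000.txt', 't000001.txt']
```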