# Data Processing 1: Manual Coding Reliability & Export of Training Codes 

## Online Appendix of "International News Coverage and Foreign Image Building"

### Gento Kato (Nov. 4, 2017)

<p style="text-align:right;"> Back to [Summary Page](v3_SummaryNotebook.ipynb) </p>

In [23]:
#################################################################################
## File Name: v3_Data1_ManualCodingTests.R                                     ##
## Creation Date: 4 Nov 2017                                                   ##
## Author: Gento Kato                                                          ##
## Project: Foreign Image News Project                                         ##
## Purpose: Test Reliability of Manual Coding and Export Final Coding Dataset  ##
#################################################################################

## For Jupyter Notebook (Ignore if Using Other Software) ##
library(IRdisplay)

display_html(
'<script>  
code_show=true; 
function code_toggle() {
  if (code_show){
    $(\'div.input\').hide();
  } else {
    $(\'div.input\').show();
  }
  code_show = !code_show
}  
$( document ).ready(code_toggle);
</script>
  <form action="javascript:code_toggle()">
    <input type="submit" value="Click here to toggle on/off the raw code.">
 </form>'
)


### Load Packages and Set Directory

In [24]:
#################
## Preparation ##
#################

## Clear Workspace
rm(list=ls())

## Library Required Packages
library(rprojroot); library(readstata13);library(xtable);library(irr)

## Set Working Directory (Automatically or Manually) ##
#setwd(dirname(rstudioapi::getActiveDocumentContext()$path)); setwd("../") #In RStudio
projdir <- find_root(has_file("README.md")); projdir; setwd(projdir) #In Atom
#setwd("C:/GoogleDrive/Projects/Agenda-Setting Persuasion Framing/Foreign_Image_News_Project")


### Import Original Manual Coding Dataset

In [25]:
#################
## Import Data ##
#################

# Read Manual Coding Data
mc <- read.csv("data/manual_codes.csv")

head(mc)

id,c1_us,c2_us,c5_us,c6_us,us1count,us2count,us3count,c3_chn,c4_chn,...,sko1count,sko2count,sko3count,c2_nko,c4_nko,c6_nko,c10_nko,nko1count,nko2count,nko3count
1,1,1,2,2,2,2,0,2,2,...,0,4,0,1,2,1,1,3,1,0
2,1,2,2,2,1,3,0,2,2,...,0,4,0,9,9,3,9,0,0,1
3,2,2,2,1,1,3,0,3,3,...,0,1,3,1,1,3,1,3,0,1
4,2,1,2,2,1,3,0,2,2,...,1,3,0,2,2,3,1,1,2,1
5,3,3,3,3,0,0,4,2,2,...,1,3,0,2,2,3,3,0,2,2
6,2,2,2,2,0,4,0,2,2,...,0,4,0,2,2,1,2,1,3,0


### Manage Missing Data (Revised Dataset 1)

In the original data, the codes "don't know" (8) and "irrelevant" (9) are recoded as follows. First, "don't know" is recoded as neutral (2), and the neutral count for that case is increased accordingly. Second, irrelevant codes are recoded as missing. To keep the coding consistent, the codes of <i>all</i> coders are treated as missing for a case if at least one coder marked it as irrelevant.

In [26]:
#############################
## Missing Recoded Dataset ##
#############################

# Copy Data
mc1 <- mc

# Add 1 to Count Scales of Neutral if coding==8 
for (i in grep("_us",names(mc))) {
  mc1[mc1[,i]==8,]$us2count <- mc1[mc1[,i]==8,]$us2count+1
}
for (i in grep("_chn",names(mc))) {
  mc1[mc1[,i]==8,]$chn2count <- mc1[mc1[,i]==8,]$chn2count+1
}
for (i in grep("_sko",names(mc))) {
  mc1[mc1[,i]==8,]$sko2count <- mc1[mc1[,i]==8,]$sko2count+1
}
for (i in grep("_nko",names(mc))) {
  mc1[mc1[,i]==8,]$nko2count <- mc1[mc1[,i]==8,]$nko2count+1
}

# 8 to Neutral
for (i in c(grep("_us",names(mc)),grep("_chn",names(mc)),
            grep("_sko",names(mc)),grep("_nko",names(mc)))) {
  mc1[mc1[,i]==8,i] <- 2
}

# 9 to Missing
for (i in c(grep("_us",names(mc)),grep("_chn",names(mc)),
           grep("_sko",names(mc)),grep("_nko",names(mc)))) {
  mc1[mc1[,i]==9,i] <- NA
}

# Set all codes of a country to missing if any one of them is missing
for (i in grep("_us",names(mc))) {
  mc1[is.na(mc1[,i]),grep("_us",names(mc))] <- rep(NA,4)
}
for (i in grep("_chn",names(mc))) {
  mc1[is.na(mc1[,i]),grep("_chn",names(mc))] <- rep(NA,4)
}
for (i in grep("_sko",names(mc))) {
  mc1[is.na(mc1[,i]),grep("_sko",names(mc))] <- rep(NA,4)
}
for (i in grep("_nko",names(mc))) {
  mc1[is.na(mc1[,i]),grep("_nko",names(mc))] <- rep(NA,4)
}

head(mc1)

id,c1_us,c2_us,c5_us,c6_us,us1count,us2count,us3count,c3_chn,c4_chn,...,sko1count,sko2count,sko3count,c2_nko,c4_nko,c6_nko,c10_nko,nko1count,nko2count,nko3count
1,1,1,2,2,2,2,0,2,2,...,0,4,0,1.0,2.0,1.0,1.0,3,1,0
2,1,2,2,2,1,3,0,2,2,...,0,4,0,,,,,0,0,1
3,2,2,2,1,1,3,0,3,3,...,0,1,3,1.0,1.0,3.0,1.0,3,0,1
4,2,1,2,2,1,3,0,2,2,...,1,3,0,2.0,2.0,3.0,1.0,1,2,1
5,3,3,3,3,0,0,4,2,2,...,1,3,0,2.0,2.0,3.0,3.0,0,2,2
6,2,2,2,2,0,4,0,2,2,...,0,4,0,2.0,2.0,1.0,2.0,1,3,0
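The recoding steps above can be traced on a small example. The following is a minimal sketch on invented data, not the project's dataset: the data frame `toy`, its column names (`c1_x` to `c4_x`, `x2count`), and all values are hypothetical, and the loops are replaced with vectorized base R for brevity.

```r
# Toy data: four hypothetical coders rating three articles for one country.
# 2 = neutral, 1 and 3 = directional, 8 = don't know, 9 = irrelevant.
toy <- data.frame(
  c1_x = c(1, 8, 2),
  c2_x = c(1, 2, 9),
  c3_x = c(2, 2, 3),
  c4_x = c(3, 2, 3),
  x2count = c(1, 3, 1)   # number of neutral (2) codes per article
)
code_cols <- grep("_x", names(toy))   # matches c1_x..c4_x, not x2count

# Count each "don't know" (8) as one additional neutral code
toy$x2count <- toy$x2count + rowSums(toy[, code_cols] == 8)

# Recode 8 to neutral (2) and 9 to missing
codes <- as.matrix(toy[, code_cols])
codes[codes == 8] <- 2
codes[codes == 9] <- NA

# If any coder marked the article irrelevant, blank it for all coders
codes[rowSums(is.na(codes)) > 0, ] <- NA
toy[, code_cols] <- codes

toy
```

In this sketch the second article ends up with four neutral codes and a neutral count of 4, while the third article is missing for every coder because one coder marked it as irrelevant.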


### Recode Overly Directional Code to Neutral (Revised Dataset 2)

To reduce variability among coders, I recode a directional code (i.e., positive/negative) to neutral when the three other coders coded the same case as neutral. This procedure adjusts for a coder's tendency to provide overly directional codes.

In [27]:
##########################################################
## Overly Extreme Coding to Neutral (Triple 2 to All 2) ##
##########################################################

# Copy the Previous Data
mc2 <- mc1

## Replace Overly Extreme Code to Neutral
mc2[mc1$us2count==3,grep("_us",names(mc))] <- rep(2,4)
mc2[mc1$chn2count==3,grep("_chn",names(mc))] <- rep(2,4)
mc2[mc1$sko2count==3,grep("_sko",names(mc))] <- rep(2,4)
mc2[mc1$nko2count==3,grep("_nko",names(mc))] <- rep(2,4)

## Put NA Values Back
mc2[is.na(mc1$c1_us),grep("_us",names(mc))] <- rep(NA,4)
mc2[is.na(mc1$c3_chn),grep("_chn",names(mc))] <- rep(NA,4)
mc2[is.na(mc1$c1_sko),grep("_sko",names(mc))] <- rep(NA,4)
mc2[is.na(mc1$c2_nko),grep("_nko",names(mc))] <- rep(NA,4)

head(mc2)

id,c1_us,c2_us,c5_us,c6_us,us1count,us2count,us3count,c3_chn,c4_chn,...,sko1count,sko2count,sko3count,c2_nko,c4_nko,c6_nko,c10_nko,nko1count,nko2count,nko3count
1,1,1,2,2,2,2,0,2,2,...,0,4,0,1.0,2.0,1.0,1.0,3,1,0
2,2,2,2,2,1,3,0,2,2,...,0,4,0,,,,,0,0,1
3,2,2,2,2,1,3,0,3,3,...,0,1,3,1.0,1.0,3.0,1.0,3,0,1
4,2,2,2,2,1,3,0,2,2,...,1,3,0,2.0,2.0,3.0,1.0,1,2,1
5,3,3,3,3,0,0,4,2,2,...,1,3,0,2.0,2.0,3.0,3.0,0,2,2
6,2,2,2,2,0,4,0,2,2,...,0,4,0,2.0,2.0,2.0,2.0,1,3,0
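As a sketch of the rule applied in this step, the toy example below recomputes the neutral count directly from a small matrix of invented codes, whereas the notebook relies on the precomputed `*2count` columns. The object names `codes`, `n_neutral`, and `adjusted` are hypothetical.

```r
# Toy codes from four hypothetical coders (rows = articles, columns = coders)
codes <- rbind(
  c(1, 2, 2, 2),      # one directional outlier, three neutral codes
  c(1, 1, 2, 2),      # even split: left unchanged in this step
  c(3, 3, 3, 2),      # directional majority: left unchanged in this step
  c(NA, NA, NA, NA)   # irrelevant case: stays missing
)

# Number of neutral (2) codes per article
n_neutral <- rowSums(codes == 2, na.rm = TRUE)

# Where exactly three coders chose neutral, overwrite the fourth code
adjusted <- codes
adjusted[n_neutral == 3, ] <- 2

adjusted
```

Only the first row changes here. Rows that are entirely missing have a neutral count of zero, so they are never overwritten in this sketch.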


### Recode Overly Neutral Codes to Directional (Revised Dataset 3)

In addition to the previous adjustments, one more adjustment can be made. Here, I recode a coder's code to a directional code if all three other coders coded the same case in the same direction. This procedure is intended to adjust for a coder's tendency to provide overly neutral codes.

In [28]:
###############################################
## Overly Neutral Coding to Directional Code ##
###############################################

# Copy Data
mc3 <- mc2

## Replace Overly Neutral Code to Directional
mc3[mc1$us1count==3,grep("_us",names(mc))] <- rep(1,4)
mc3[mc1$chn1count==3,grep("_chn",names(mc))] <- rep(1,4)
mc3[mc1$sko1count==3,grep("_sko",names(mc))] <- rep(1,4)
mc3[mc1$nko1count==3,grep("_nko",names(mc))] <- rep(1,4)
mc3[mc1$us3count==3,grep("_us",names(mc))] <- rep(3,4)
mc3[mc1$chn3count==3,grep("_chn",names(mc))] <- rep(3,4)
mc3[mc1$sko3count==3,grep("_sko",names(mc))] <- rep(3,4)
mc3[mc1$nko3count==3,grep("_nko",names(mc))] <- rep(3,4)

## Put NA Values Back
mc3[is.na(mc1$c1_us),grep("_us",names(mc))] <- rep(NA,4)
mc3[is.na(mc1$c3_chn),grep("_chn",names(mc))] <- rep(NA,4)
mc3[is.na(mc1$c1_sko),grep("_sko",names(mc))] <- rep(NA,4)
mc3[is.na(mc1$c2_nko),grep("_nko",names(mc))] <- rep(NA,4)

head(mc3)

id,c1_us,c2_us,c5_us,c6_us,us1count,us2count,us3count,c3_chn,c4_chn,...,sko1count,sko2count,sko3count,c2_nko,c4_nko,c6_nko,c10_nko,nko1count,nko2count,nko3count
1,1,1,2,2,2,2,0,2,2,...,0,4,0,1.0,1.0,1.0,1.0,3,1,0
2,2,2,2,2,1,3,0,2,2,...,0,4,0,,,,,0,0,1
3,2,2,2,2,1,3,0,3,3,...,0,1,3,1.0,1.0,1.0,1.0,3,0,1
4,2,2,2,2,1,3,0,2,2,...,1,3,0,2.0,2.0,3.0,1.0,1,2,1
5,3,3,3,3,0,0,4,2,2,...,1,3,0,2.0,2.0,3.0,3.0,0,2,2
6,2,2,2,2,0,4,0,2,2,...,0,4,0,2.0,2.0,2.0,2.0,1,3,0
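The directional counterpart of the previous rule can be sketched in the same way. This is an illustration on invented codes; the names `codes`, `n_d`, and `adjusted` are hypothetical, and the counts are recomputed here rather than read from the `*1count` and `*3count` columns.

```r
# Toy codes from four hypothetical coders (rows = articles, columns = coders)
codes <- rbind(
  c(1, 1, 1, 2),   # three coders chose direction 1, one chose neutral
  c(3, 2, 3, 3),   # three coders chose direction 3, one chose neutral
  c(1, 1, 3, 3),   # no three-coder agreement: left unchanged
  c(1, 1, 1, 3)    # three coders chose direction 1, one chose direction 3
)

adjusted <- codes
for (d in c(1, 3)) {
  # Number of coders who chose direction d for each article
  n_d <- rowSums(codes == d, na.rm = TRUE)
  # Where exactly three coders agree on d, assign d to all four
  adjusted[n_d == 3, ] <- d
}

adjusted
```

Because the condition only checks that a directional count equals three, the fourth row is overwritten even though the dissenting code is the opposite direction rather than neutral. The notebook's code, which tests `us1count==3` and similar, behaves the same way.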


### Krippendorff's Alpha (Inter-Coder Reliability)

Krippendorff's Alpha (Hayes and Krippendorff 2007) has been proposed as a standard measure of inter-coder reliability. The conventional threshold for "good" reliability is around 0.70. In this study, I use the "ordinal" version of Krippendorff's Alpha. The results suggest that reliability is relatively low in the original dataset (0.40 to 0.50 across countries), but after recoding overly directional and overly neutral codes it reaches, or at least approaches, the 0.70 threshold (0.63 to 0.79).

<i>Hayes, A. F. & Krippendorff, K. (2007). Answering the Call for a Standard Reliability Measure for Coding Data. Communication Methods and Measures, 1, pp. 77-89.</i>

In [29]:
##########################
## Krippendorff's Alpha ##
##########################

## Missing Recoded Data
ka1us <- kripp.alpha(t(mc1[,grep("_us",names(mc1))]),"ordinal")
ka1chn <- kripp.alpha(t(mc1[,grep("_chn",names(mc1))]),"ordinal")
ka1sko <- kripp.alpha(t(mc1[,grep("_sko",names(mc1))]),"ordinal")
ka1nko <- kripp.alpha(t(mc1[,grep("_nko",names(mc1))]),"ordinal")
ka1 <- c(ka1us$value,ka1chn$value,ka1sko$value,ka1nko$value)

## Overly Extreme to Neutral
ka2us <- kripp.alpha(t(mc2[,grep("_us",names(mc2))]),"ordinal")
ka2chn <- kripp.alpha(t(mc2[,grep("_chn",names(mc2))]),"ordinal")
ka2sko <- kripp.alpha(t(mc2[,grep("_sko",names(mc2))]),"ordinal")
ka2nko <- kripp.alpha(t(mc2[,grep("_nko",names(mc2))]),"ordinal")
ka2 <- c(ka2us$value,ka2chn$value,ka2sko$value,ka2nko$value)

## Overly Neutral to Directional
ka3us <- kripp.alpha(t(mc3[,grep("_us",names(mc3))]),"ordinal")
ka3chn <- kripp.alpha(t(mc3[,grep("_chn",names(mc3))]),"ordinal")
ka3sko <- kripp.alpha(t(mc3[,grep("_sko",names(mc3))]),"ordinal")
ka3nko <- kripp.alpha(t(mc3[,grep("_nko",names(mc3))]),"ordinal")
ka3 <- c(ka3us$value,ka3chn$value,ka3sko$value,ka3nko$value)

## Summary Table
katab <- rbind(ka1,ka2,ka3)
rownames(katab) <- c("Original Coding","Overly Directional Codes Recoded",
                      "Overly Neutral Codes Recoded")
colnames(katab) <- c("US","China","S.Korea","N.Korea")
round(katab,3)

,US,China,S.Korea,N.Korea
Original Coding,0.428,0.476,0.504,0.401
Overly Directional Codes Recoded,0.54,0.658,0.669,0.44
Overly Neutral Codes Recoded,0.685,0.786,0.791,0.629
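To make the input layout of `kripp.alpha()` concrete, here is a minimal sketch on invented ratings. It assumes the `irr` package is installed and skips silently otherwise. The matrices `ratings` and `perfect` and their values are hypothetical and unrelated to the project's data.

```r
if (requireNamespace("irr", quietly = TRUE)) {
  # Rows = articles, columns = four hypothetical coders (codes 1, 2, 3)
  ratings <- cbind(
    coder1 = c(1, 2, 3, 2, 1, 3),
    coder2 = c(1, 2, 3, 2, 2, 3),
    coder3 = c(1, 2, 3, 3, 1, 3),
    coder4 = c(1, 2, 2, 2, 1, 3)
  )

  # kripp.alpha() expects coders in rows and articles in columns,
  # hence the transpose (the notebook does the same with t(...))
  alpha_obs <- irr::kripp.alpha(t(ratings), method = "ordinal")$value

  # Identical ratings from every coder should yield an alpha of 1
  perfect <- cbind(c(1, 2, 3, 2), c(1, 2, 3, 2), c(1, 2, 3, 2))
  alpha_perfect <- irr::kripp.alpha(t(perfect), method = "ordinal")$value

  print(round(c(observed = alpha_obs, perfect = alpha_perfect), 3))
}
```

Any disagreement among coders pulls the observed value below 1, which is why overwriting outlying codes, as in Revised Datasets 2 and 3, raises the statistic.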


### Consistent & Conservative Training Codes *(Currently Not Used in Machine-Learning)*

In addition to the simple majority rule for assigning each code, the training codes for machine learning can be constructed in two other ways. The first is the "consistent" scheme. Under this scheme, if three or more coders provide the same code, that majority code is final. If the codes are evenly split between neutral and a single directional code, the directional code is assigned. All other cases are treated as neutral. The second is the "conservative" scheme, which assigns a directional code only when three or more coders agree on the same directional code. The neutral code is assigned in all other cases.

In [30]:
########################################################################
## Generate Training Codes by Consistent & Conservative Coding Scheme ##
########################################################################

# Copy Data
mc4 <- mc1

## Consistent Coding Scheme ##

# US Coding
mc4$us_final <- 999
mc4[is.na(mc1$c1_us),c("us1count","us2count","us3count")] <- NA
mc4$us_final[which(mc4$us1count>=3)] <- 1
mc4$us_final[which(mc4$us2count>=3)] <- 2
mc4$us_final[which(mc4$us3count>=3)] <- 3
mc4$us_final[which(mc4$us1count==2 & mc4$us2count==2)] <- 1
mc4$us_final[which(mc4$us2count==2 & mc4$us3count==2)] <- 3
mc4$us_final[which(mc4$us1count==2 & mc4$us3count==2)] <- 2
mc4$us_final[which(mc4$us1count==1 & mc4$us2count==2 & mc4$us3count==1)] <- 2
mc4$us_final[which(mc4$us1count==2 & mc4$us2count==1 & mc4$us3count==1)] <- 2
mc4$us_final[which(mc4$us1count==1 & mc4$us2count==1 & mc4$us3count==2)] <- 2
mc4$us_final[is.na(mc4$us1count)] <- NA
usfin <- t(table(mc4$us_final,useNA="always"))

# China Coding
mc4$chn_final <- 999
mc4[is.na(mc1$c3_chn),c("chn1count","chn2count","chn3count")] <- NA
mc4$chn_final[which(mc4$chn1count>=3)] <- 1
mc4$chn_final[which(mc4$chn2count>=3)] <- 2
mc4$chn_final[which(mc4$chn3count>=3)] <- 3
mc4$chn_final[which(mc4$chn1count==2 & mc4$chn2count==2)] <- 1
mc4$chn_final[which(mc4$chn2count==2 & mc4$chn3count==2)] <- 3
mc4$chn_final[which(mc4$chn1count==2 & mc4$chn3count==2)] <- 2
mc4$chn_final[which(mc4$chn1count==1 & mc4$chn2count==2 & mc4$chn3count==1)] <- 2
mc4$chn_final[which(mc4$chn1count==2 & mc4$chn2count==1 & mc4$chn3count==1)] <- 2
mc4$chn_final[which(mc4$chn1count==1 & mc4$chn2count==1 & mc4$chn3count==2)] <- 2
mc4$chn_final[is.na(mc4$chn1count)] <- NA
chnfin <- t(table(mc4$chn_final,useNA="always"))

# South Korea Coding
mc4$sko_final <- 999
mc4[is.na(mc1$c1_sko),c("sko1count","sko2count","sko3count")] <- NA
mc4$sko_final[which(mc4$sko1count>=3)] <- 1
mc4$sko_final[which(mc4$sko2count>=3)] <- 2
mc4$sko_final[which(mc4$sko3count>=3)] <- 3
mc4$sko_final[which(mc4$sko1count==2 & mc4$sko2count==2)] <- 1
mc4$sko_final[which(mc4$sko2count==2 & mc4$sko3count==2)] <- 3
mc4$sko_final[which(mc4$sko1count==2 & mc4$sko3count==2)] <- 2
mc4$sko_final[which(mc4$sko1count==1 & mc4$sko2count==2 & mc4$sko3count==1)] <- 2
mc4$sko_final[which(mc4$sko1count==2 & mc4$sko2count==1 & mc4$sko3count==1)] <- 2
mc4$sko_final[which(mc4$sko1count==1 & mc4$sko2count==1 & mc4$sko3count==2)] <- 2
mc4$sko_final[is.na(mc4$sko1count)] <- NA
skofin <- t(table(mc4$sko_final,useNA="always"))

# North Korea Coding
mc4$nko_final <- 999
mc4[is.na(mc1$c2_nko),c("nko1count","nko2count","nko3count")] <- NA
mc4$nko_final[which(mc4$nko1count>=3)] <- 1
mc4$nko_final[which(mc4$nko2count>=3)] <- 2
mc4$nko_final[which(mc4$nko3count>=3)] <- 3
mc4$nko_final[which(mc4$nko1count==2 & mc4$nko2count==2)] <- 1
mc4$nko_final[which(mc4$nko2count==2 & mc4$nko3count==2)] <- 3
mc4$nko_final[which(mc4$nko1count==2 & mc4$nko3count==2)] <- 2
mc4$nko_final[which(mc4$nko1count==1 & mc4$nko2count==2 & mc4$nko3count==1)] <- 2
mc4$nko_final[which(mc4$nko1count==2 & mc4$nko2count==1 & mc4$nko3count==1)] <- 2
mc4$nko_final[which(mc4$nko1count==1 & mc4$nko2count==1 & mc4$nko3count==2)] <- 2
mc4$nko_final[is.na(mc4$nko1count)] <- NA
nkofin <- t(table(mc4$nko_final,useNA="always"))

## Conservative Coding Scheme ##

# US
mc4$us_final2 <- 2
mc4$us_final2[which(mc4$us1count>=3)] <- 1
mc4$us_final2[which(mc4$us3count>=3)] <- 3
mc4$us_final2[is.na(mc4$us1count)] <- NA
usfin2 <- t(table(mc4$us_final2,useNA="always"))

# China
mc4$chn_final2 <- 2
mc4$chn_final2[which(mc4$chn1count>=3)] <- 1
mc4$chn_final2[which(mc4$chn3count>=3)] <- 3
mc4$chn_final2[is.na(mc4$chn1count)] <- NA
chnfin2 <- t(table(mc4$chn_final2,useNA="always"))

# South Korea
mc4$sko_final2 <- 2
mc4$sko_final2[which(mc4$sko1count>=3)] <- 1
mc4$sko_final2[which(mc4$sko3count>=3)] <- 3
mc4$sko_final2[is.na(mc4$sko1count)] <- NA
skofin2 <- t(table(mc4$sko_final2,useNA="always"))

# North Korea
mc4$nko_final2 <- 2
mc4$nko_final2[which(mc4$nko1count>=3)] <- 1
mc4$nko_final2[which(mc4$nko3count>=3)] <- 3
mc4$nko_final2[is.na(mc4$nko1count)] <- NA
nkofin2 <- t(table(mc4$nko_final2,useNA="always"))

fintab <- rbind(usfin,chnfin,skofin,nkofin,
                usfin2,chnfin2,skofin2,nkofin2)
rownames(fintab) <- c("US Code (Consistent)","China Code (Consistent)",
                      "S.Korea Code (Consistent)","N.Korea Code (Consistent)",
                      "US Code (Conservative)","China Code (Conservative)",
                      "S.Korea Code (Conservative)","N.Korea Code (Conservative)")
fintab

,1,2,3,NA
US Code (Consistent),264,590,133,13
China Code (Consistent),251,625,120,4
S.Korea Code (Consistent),111,778,104,7
N.Korea Code (Consistent),409,383,157,51
US Code (Conservative),152,783,52,13
China Code (Conservative),165,791,40,4
S.Korea Code (Conservative),74,878,41,7
N.Korea Code (Conservative),292,604,53,51
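The two schemes described above can be condensed into small helper functions. This is a sketch under the assumption that the three counts for a case sum to four (four coders, no missing codes). The function names `consistent_code` and `conservative_code` and the toy `counts` data frame are hypothetical and do not appear in the notebook.

```r
# "Consistent" scheme: a majority code wins; a 2-2 split between neutral
# and one direction goes to that direction; everything else is neutral.
consistent_code <- function(n1, n2, n3) {
  out <- rep(2, length(n1))
  out[which(n1 >= 3 | (n1 == 2 & n2 == 2))] <- 1
  out[which(n3 >= 3 | (n3 == 2 & n2 == 2))] <- 3
  out[is.na(n1)] <- NA
  out
}

# "Conservative" scheme: directional only with three or more agreeing coders.
conservative_code <- function(n1, n2, n3) {
  out <- rep(2, length(n1))
  out[which(n1 >= 3)] <- 1
  out[which(n3 >= 3)] <- 3
  out[is.na(n1)] <- NA
  out
}

# Toy counts of codes 1, 2, and 3 per article (last row: irrelevant case)
counts <- data.frame(
  n1 = c(4, 2, 0, 2, 1, NA),
  n2 = c(0, 2, 2, 0, 2, NA),
  n3 = c(0, 0, 2, 2, 1, NA)
)
counts$consistent   <- with(counts, consistent_code(n1, n2, n3))
counts$conservative <- with(counts, conservative_code(n1, n2, n3))

counts
```

The two schemes differ only in the second and third rows, where an even split between neutral and one direction is resolved as directional under the consistent scheme and as neutral under the conservative scheme. `which()` is used, as in the notebook, so that missing counts are skipped rather than propagated into the index.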


### Save Datasets

The R environment is saved to the <code>data_heavy</code> directory. The training codes dataset is saved to the <code>data</code> directory under the name <code>trainingcode.csv</code>.

In [31]:
###############
## Save Data ##
###############

# All Data
save.image("data/v3_Data1_ManualCodingTests.RData")
# Training Codes Dataset
write.csv(mc4,"data/trainingcode.csv",fileEncoding = "CP932",row.names=FALSE)                    