In [1]:
library(dplyr)
library(tidytext)
library(magrittr)

data.bbc <- as_tibble(read.table("https://jsienkiewicz.pl/TEXT/lab/data_bbc.csv", stringsAsFactors = FALSE))



Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union




In [2]:
# Shuffle the rows first so that the subjective class ends up with a mix of -1 and 1
data.bbc <- data.bbc[sample(1:nrow(data.bbc)),]

data.bbc %<>% 
    mutate(emo = abs(emo)) %>%   # merge negative and positive into one class (subjective)
    arrange(desc(emo)) %>%       # subjective rows (1) first, objective rows (0) last
    slice(-c(701:(n()-700)))     # keep only the first 700 and the last 700 rows

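data.bbc$text <- enc2utf8(data.bbc$text)   # enc2utf8() is vectorised, so sapply() is not needed
data.bbc$emo <- as.factor(data.bbc$emo)

# Shuffle again so that the two classes are interleaved
data.bbc <- data.bbc[sample(1:nrow(data.bbc)),]

data.bbc$doc_id <- 1:nrow(data.bbc)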
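
The `slice(-c(701:(n()-700)))` call in the cell above drops the middle of the sorted table and keeps 700 rows at each end. A minimal base-R sketch of the same negative-index idiom on a toy vector (the vector `x` and the size `k` are made up for illustration):

```r
# Toy data: ten values already sorted in descending order
x <- 10:1
k <- 3                      # rows to keep at each end (700 in the notebook)
n <- length(x)

# Drop positions (k + 1) .. (n - k), keeping the first k and the last k
kept <- x[-((k + 1):(n - k))]

# The same selection written with head() and tail()
kept_alt <- c(head(x, k), tail(x, k))

print(kept)
print(identical(kept, kept_alt))
```

The idiom assumes n is larger than 2k. Otherwise `(k + 1):(n - k)` counts backwards and removes rows that should have been kept, which the `head()`/`tail()` form avoids.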

In [3]:
data.bbc %>% 
  group_by(emo) %>% 
  summarise(n = n())

emo,n
<fct>,<int>
0,700
1,700


In [4]:
data.bbc

text,emo,doc_id
<chr>,<fct>,<int>
"You don't know much about western culture, do you feller?",1,1
Ask FYP or check back.,0,2
"""Less than 5% of the population and there are people here who really believe that we are about to 'take over', the mind boggles."" A 5% radical element seem to control allot through Terror (see Spain).....or didn't you notice the influence in Lebenon or Afghanistan or the Mountain Districts of Pakistan or the Sudan and Darfur or anywhere else these Islamists ply their trade. There is a direct pipeline from those Countries to the UK...and all the Terrorists head to those enclaves before returning with their radicalizied notions... Is it a threat? ....It's not my Country but if recent arrests in the UK are any indication...You got a problem.",0,3
"I think for every temple that is destroyed, a mosque should be destroyed. Eye for an eye.",1,4
The VIDEO EVIDENCE says it all.,1,5
"YOu forgot to mention, that those remarks were translated and taken out of context by the translator. I suggest you watch the movie, Lost in translation.",1,6
"I don't think a sustained hostility towards Jews by Arabs predates the arrival of Zionists in Palestine, Grant. I mean Zionists and not Jews. I mean people who arrived there with the express intention of creating a new nation. Undoubtedly, Jews did not enjoy equality of status with Muslims in Muslim lands over the centuries, but then nor did people of any other faith. They were certainly not of the lowest status. Nor were they subject to the periodic pogroms in Arab lands that they endured in Europe. On the whole, they were not badly treated in the context of the times. Most histories that I have read of the region, predominently written by Jewish authors as it happens, are agreed on this.",1,7
"Don't be a fool Grant, nobdy is suggesting that anything with Bob Dylan in is likely to be accurate. Any more than the fantasies of you and Gavin. Now do something useful if you remember and remind me of the name of the film!!",0,8
"Mistress, Yes AQ types make up their own lines, but is it really such a good idea to confirm what they are saying? Today's paper reports that the US and Israel planned the attack on Lebanon months ago. What is the justification for that? And doesn't it play straight into the hands of those who say the US is out to destroy Islam? AQ is a small group, no more than a few thousand active members. From the press here it is reported that last weeks plot was foiled through a tip-off from a British Muslim and a few arrests in Pakistan. My take is we would be far more effective in dealing with that threat if we saw it as a police and security issue and not a ""war on terror"". Recent events would appear to bear that out. Seeing enemies rather than criminals legitimises those you are up against and therefore weakens the cooperation you are likely to get from those who do not believe in your self-righteousness.",0,9
"You mean when he was alive? Didn't stop the US helping him out by bombing the Serbs though, assisting a process of ethnic cleansing.",1,10


In [5]:
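library(tm)
# DataframeSource() expects columns named doc_id and text
source <- DataframeSource(as.data.frame(data.bbc))

corpus <- VCorpus(source)

# Lower-case the text, then remove punctuation, extra whitespace, and digits
corpus %<>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace) %>%
  tm_map(removeNumbers)

# Despite its name, tdm is a document-term matrix (documents in rows, terms in columns)
tdm <- DocumentTermMatrix(corpus)

# Keep terms that occur more than 4 times in the whole corpus
tdm <- tdm[, apply(tdm, 2, sum) > 4]

tdm <- as.matrix(tdm)

# Keep documents that still contain more than one term occurrence
ind <- apply(tdm, 1, sum) > 1

tdm <- tdm[ind, ]
class <- data.bbc$emo[ind]   # labels of the retained documents

dim(tdm); length(class)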

Loading required package: NLP



In [6]:
library(MASS)

# Overall accuracy: share of cases on the diagonal of the confusion table
CM <- function(org.class, pred.class) {

  CM <- table(org.class, pred.class)
  return(sum(diag(CM)) / sum(CM))
}



Attaching package: ‘MASS’


The following object is masked from ‘package:dplyr’:

    select
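
The `CM` helper defined above is not called in the cells that follow. As a rough illustration of what it returns, the sketch below applies an equivalent function to made-up labels. `truth` and `pred` are hypothetical vectors, and both are built as factors with the same levels so that the table is square and its diagonal lines up.

```r
# Accuracy from a confusion table (same logic as CM above)
accuracy <- function(org.class, pred.class) {
  tab <- table(org.class, pred.class)
  sum(diag(tab)) / sum(tab)
}

# Made-up labels; sharing the levels keeps the table square
lev   <- c("0", "1")
truth <- factor(c("0", "0", "1", "1", "1"), levels = lev)
pred  <- factor(c("0", "1", "1", "1", "0"), levels = lev)

acc <- accuracy(truth, pred)
print(table(truth, pred))
print(acc)
```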




In [7]:
library(e1071)

bbc.svml <- svm(tdm, class, type = "C-classification", kernel = "linear")
# Predictions are made on the training data itself, so the table below shows training fit only
bbc.svml.pred <- predict(bbc.svml, tdm)

table(class, bbc.svml.pred)

     bbc.svml.pred
class   0   1
    0 677   0
    1   0 681
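
The table above compares predictions with labels for the same documents the model was fitted on, so the perfect diagonal describes training fit rather than performance on unseen text. One common way to check the latter is to hold out part of the data. The sketch below shows only the index split, in base R, on made-up labels. `labels`, `train_idx`, and `test_idx` are illustrative names, and the 80/20 ratio is an assumption.

```r
set.seed(1)  # fixed seed so the toy split is reproducible

# Made-up class labels standing in for the notebook's `class` vector
labels <- factor(rep(c("0", "1"), each = 10))

# Sample 80% of the positions within each class (stratified split)
by_class <- split(seq_along(labels), labels)
train_idx <- unlist(
  lapply(by_class, function(idx) {
    idx[sample.int(length(idx), size = floor(0.8 * length(idx)))]
  }),
  use.names = FALSE
)
test_idx <- setdiff(seq_along(labels), train_idx)

print(table(labels[train_idx]))
print(table(labels[test_idx]))
```

With indices like these, the model could be fitted on `tdm[train_idx, ]` and evaluated on `tdm[test_idx, ]`.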

In [8]:
library(caret)
levels(class) <- c("obj", "sub") # 0 becomes objective, 1 becomes subjective
data <- cbind(as.data.frame(tdm), class.out = class)


# 10-fold cross-validation
fit <- trainControl(method = "cv", number = 10)
model <- train(class.out ~ ., data = data, method = "svmLinear", trControl = fit)

Loading required package: ggplot2


Attaching package: ‘ggplot2’


The following object is masked from ‘package:NLP’:

    annotate


Loading required package: lattice

“Variable(s) `' constant. Cannot scale data.”
“Variable(s) `' constant. Cannot scale data.”
“Variable(s) `' constant. Cannot scale data.”
“Variable(s) `' constant. Cannot scale data.”
“Variable(s) `' constant. Cannot scale data.”


In [9]:
model

Support Vector Machines with Linear Kernel 

1358 samples
2778 predictors
   2 classes: 'obj', 'sub' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 1223, 1222, 1222, 1222, 1222, 1223, ... 
Resampling results:

  Accuracy  Kappa    
  0.56628   0.1323631

Tuning parameter 'C' was held constant at a value of 1

In [10]:
confusionMatrix(model)

Cross-Validated (10 fold) Confusion Matrix 

(entries are percentual average cell counts across resamples)
 
          Reference
Prediction  obj  sub
       obj 26.4 19.9
       sub 23.5 30.3
                            
 Accuracy (average) : 0.5663
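
The accuracy and Kappa reported by `model` can be approximately recovered from the averaged percentages in this table. The sketch below copies the four cell values from the output above and applies the standard Cohen's kappa formula. Because the printed cells are rounded (they sum to 100.1) and caret averages the per-fold statistics, the result should be expected to agree with the reported figures only to about two or three decimal places.

```r
# Cell percentages copied from the confusionMatrix(model) output
# (rows = prediction, columns = reference)
cm <- matrix(c(26.4, 19.9,
               23.5, 30.3),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("obj", "sub"),
                             Reference  = c("obj", "sub")))

p <- cm / sum(cm)                         # normalise to proportions
acc <- sum(diag(p))                       # observed agreement
expected <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
kappa_hat <- (acc - expected) / (1 - expected)

print(round(c(accuracy = acc, kappa = kappa_hat), 3))
```

A Kappa this close to zero indicates that the cross-validated predictions agree with the labels only slightly more often than chance would.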
