# Selecting the examples for analysis
Long, medium, and short text entries: one ad hominem and one non-ad hominem example of each length.


In [1]:
import numpy as np
import pandas as pd
import random
from gensim.utils import simple_preprocess
import matplotlib.pyplot as plt

## Importing data

In [2]:
df = pd.read_csv("../../data/ad_hominem/ad_hominems_cleaned_Murilo.csv", sep=",", index_col=0, header=0, names=["body", "isAdHominem"])
df = df[~df.isin([np.nan, np.inf, -np.inf, 'nan']).any(axis=1)] # Drop rows containing NaN, +/-inf, or the string 'nan'
print(df.shape)

(29218, 2)
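Cell 2 keeps only the rows in which no field is NaN, positive or negative infinity, or the literal string `'nan'`. A minimal plain-Python sketch of the same rule, with rows as tuples (the helpers `is_bad` and `drop_bad_rows` are hypothetical, not part of the notebook):

```python
import math

def is_bad(value):
    """True for the values cell 2 filters out: NaN, +/-inf, or the string 'nan'."""
    if isinstance(value, float):
        return math.isnan(value) or math.isinf(value)
    return value == "nan"

def drop_bad_rows(rows):
    """Keep only the rows in which no field is bad (the ~mask.any(axis=1) step)."""
    return [row for row in rows if not any(is_bad(v) for v in row)]

rows = [("ok", 0), (float("nan"), 1), ("nan", 0), ("x", float("inf")), ("fine", 1)]
print(drop_bad_rows(rows))  # [('ok', 0), ('fine', 1)]
```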


In [3]:
df["length"] = df["body"].apply(lambda x: len(simple_preprocess(x, deacc=True))) # Add a column with the number of tokens in each body
df = df.reset_index(drop=True)

In [4]:
pd.set_option('display.max_colwidth', 0)
df.head(n=5)

index,body,isAdHominem,length
0,What makes corporations different in this case? They have interests too.,0,11
1,"I'm sorry if your smugness gets in the way. Like I said elsewhere in this thread. Somolia is not close to anything I advocate for so why on earth would I move there? Any time the Somolia ""argument"" is brought up, I instantly know I'm dealing with someone who refuses to learn the difference between a voluntary society and a third world country ravaged by warlords and foreign policies of other countries. If you want a thoughtful response to an argument, make sure you're not comparing Antarctica to the Bahamas. Otherwise, take your circlejerk, ""arguments"" elsewhere. You have contributed absolutely nothing to this thread but ad hominem Attacks and the typical liberal/conservative talking points and almost everyone in here knows it.",1,114
2,"Basically to believe a patriarchy exists, you must believe that men are maintaining a system of oppression against women, despite knowing the harm it does to both women and men.EG - Wanting to maintain a system that, among other things, condones severe anti-male bias in all facets of the legal system, simply isn't rational. Thus men, being the ones in power, want to oppress women so much they are willing to harm themselves to do it. It'd be like cutting off your own arm so you had something to club someone with.A long time ago one could say it was ignorance, but with how mainstream feminism thoughts are today this can no longer be true. So the actions of men to maintain the patriarchy must also be willful.How can a person believe this, and not hate men?",0,135
3,The punishment for heresy was being burned at the stake.,0,10
4,No it doesn't. Sex is defined by DNA. DNA cannot be changed from male to female. A sex change is putting lipstick on a pig. It may look different but it's still a pig,0,31


In [5]:
ilong_true = df.loc[(df["length"] > 300) & (df["length"] < 400) & (df['isAdHominem'] == 1)].sample(n=1).index[0]
ilong_false = df.loc[(df["length"] > 300) & (df["length"] < 400) & (df['isAdHominem'] == 0)].sample(n=1).index[0]
imed_true = df.loc[(df["length"] > 100) & (df["length"] < 150) & (df['isAdHominem'] == 1)].sample(n=1).index[0]
imed_false = df.loc[(df["length"] > 100) & (df["length"] < 150) & (df['isAdHominem'] == 0)].sample(n=1).index[0]
ishort_true = df.loc[(df["length"] > 10)  & (df["length"] < 20)  & (df['isAdHominem'] == 1)].sample(n=1).index[0]
ishort_false = df.loc[(df["length"] > 10)  & (df["length"] < 20)  & (df['isAdHominem'] == 0)].sample(n=1).index[0]
print("The indexes for the examples picked (in the original data frame) are {}, {}, {}, {}, {} and {}.".format(ilong_true, ilong_false, imed_true, imed_false, ishort_true, ishort_false))

The indexes for the examples picked (in the original data frame) are 3206, 7888, 15733, 4361, 7166 and 23308.
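Cell 5 draws one row per (length bucket, label) pair, with exclusive bounds on the token count. Because `.sample(n=1)` is called without a seed, rerunning the notebook picks different examples than the indexes printed above. The sketch below shows the same selection in plain Python with a seeded generator; `pick_examples` and `BUCKETS` are illustrative names, and the bucket bounds are copied from cell 5.

```python
import random

# Length buckets copied from cell 5: (low, high) token counts, both bounds exclusive
BUCKETS = {"long": (300, 400), "medium": (100, 150), "short": (10, 20)}

def pick_examples(lengths, labels, buckets=BUCKETS, seed=0):
    """Return {(bucket, label): index} with one randomly chosen row per cell.

    lengths and labels are parallel sequences. A fixed seed makes the
    selection reproducible across runs.
    """
    rng = random.Random(seed)
    picked = {}
    for name, (low, high) in buckets.items():
        for label in (1, 0):
            candidates = [i for i, (n, y) in enumerate(zip(lengths, labels))
                          if low < n < high and y == label]
            if candidates:  # skip empty cells instead of raising like .sample() would
                picked[(name, label)] = rng.choice(candidates)
    return picked

lengths = [343, 358, 124, 116, 17, 13, 5]
labels = [1, 0, 1, 0, 1, 0, 1]
# Each cell has exactly one candidate here, so the result does not depend on the seed
print(pick_examples(lengths, labels, seed=3))
```

In the notebook itself, passing `random_state` to `DataFrame.sample` would give the same reproducibility.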


In [6]:
indexes = [ilong_true, ilong_false, imed_true, imed_false, ishort_true, ishort_false]
df_samples = df.loc[indexes, :].copy() # Explicit copy, since new columns are assigned to it below

## Filtering the dataset

In [7]:
df_samples["body"] = df_samples["body"].apply(lambda x: " ".join(simple_preprocess(str(x), deacc=True))) # Tokenize, lower-case, strip accents and punctuation, drop tokens shorter than 2 or longer than 15 characters (stop words are NOT removed)
df_samples = df_samples.reset_index(drop=True).reindex(["length", "body", "isAdHominem"], axis=1)        # Reset new indexes for data frame and reorder columns (visualization purposes)
df_samples

index,length,body,isAdHominem
0,343,alright here the meat of my aggressiveness you show great lack of empathy for those who didn have the opportunities that you did while you were busy with college some of the people you don want making higher wage were already busy in the work force why does their time working and years of experience count as less work than your school work did that tight budget allow you to purchase home and vehicle feed kid pay all your bills and tuition costs or did you have outside help as well it varies by state where live the minimum wage is hour and again wages should keep up with inflation cost of living and worker productivity americans have the longest working hours of all the industrialized nations with the least amount of vacation days per year also the us sits at th in median individual to sum this up you re basically arguing to keep minimum wages low which keeps median wages from going as well you re advocating against future raise for yourself then you re short sighted individual who doesn understand economics and slightly selfish ass to boot ll explain let make it simple if you re an entrepreneur what do you want the most profits how do you get those profits if people use your service or consume your product or use your product how will they do that buy buying it that means you and me will spend money to buy things what do the entrepreneur do with those profits he ll take some home and the rest he ll re invest in his business to make it grow make it grow how if you re manufacturer or producer you will increase manufacturing or production capacity this will mean buying machines employing more people using up more raw materials for the process of production and manufacturing giving the people at the bottom more power to support their lives is good for everyone including you and its why you should not only care about your wages but others as well,1
1,358,don think donald trump was bragging about getting away with sexual assault in the access hollywood tapes first of all yes of course it is sexual assault to grab woman genitalia without their consent but it is not sexual assault if it consensual and that what donald trump is bragging about in the tape for reference here the transcript of the relevant portion trump yeah that her with the gold better use some tic tacs just in case start kissing her you know automatically attracted to beautiful just start kissing them it like magnet just kiss don even wait and when you re star they let you do it you can do anything bush whatever you want trump grab them by the you can do anything ve bolded the phrase they let you do it because that reference to consent with they referring to the woman on the receiving end of his actions trump is not bragging about his wealth and fame allowing him to get away with groping women without their consent which again absolutely would be sexual assault he is instead bragging that his wealth and his fame make him so sexually attractive to women that they consent to being groped by him and that why he and his defenders are so insistent on it being locker room talk because while bragging about sexual assault is not normal in locker rooms outlandish claims about one sexual attractiveness and prowess absolutely are and just to be clear not defending trump here think that trump did not brag about sexual assault in the tapes but not saying that if he did what he said he did those women actually did consent in fact given the stories from the women who have come forward so far think it likely that they did not and think the takeaway of my view trump believes he was bragging about consensual sexual activity is that trump doesn truly understand what consent is and that might actually be even worse but it worse in way that underlines just how deeply rooted misogyny is and that can be dismissed as trump sociopath and an aberration,0
2,124,except the unquestionably well recorded long history handed down generation to generation that any sociologist worth their salt would argue is absolutely real evidence that you can find records of an atrocity buried in the dust of ruined empire official whitewashed record is hardly leg to stand on and just for the record using lack of evidence as evidence when jewish people have holiday literally thousands of years old proof to the contrary seems particularly doltish even for reddit might remind you that many scholars believe jesus last supper was passover no matter lets forget egypt and instead use the whole rest of jewish history from the end of the roman empire through the dark ages crusades until today my point still stands satisfied,1
3,116,yeah uh thats not understanding pro choice the point is that the women can how to regard the fetus do you think that the women who exercises her choice by carrying the child to term regards it as parasite of course not it the child she psyched to have however what the pro choice individual believes is that women_ gets to decide how to regard the fetus because it is her body_ and she doen have to set aside her right to determine how to regard things that take up residence inside her for any reason of course there are other reasons people are pro choice but this hardly shift from the mainstream perspective on choice,0
4,17,your inability to understand clear sentences is not failing on my part also it spelled you re,1
5,13,it is more effective to report it than downvote it speaking of which,0
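`simple_preprocess` (used both for the `length` column and for the bodies above) tokenizes, lower-cases and, with `deacc=True`, strips accents; with its defaults it also discards tokens shorter than 2 or longer than 15 characters. It does not remove stop words. This is why contractions show up as `didn`, `don` or `re`, why `i` and `a` are missing, and why tokens such as `women_` survive (`_` counts as a word character). The function below is an approximation written for illustration (`approx_simple_preprocess` is hypothetical); it assumes gensim's token pattern and accent stripping behave as described in the comments.

```python
import re
import unicodedata

# Runs of word characters that are not digits; "_" is a word character,
# so "women_" stays one token while "2" is skipped entirely.
TOKEN_PATTERN = re.compile(r"(((?![\d])\w)+)", re.UNICODE)

def deaccent(text):
    """Strip accents by dropping combining marks after NFD normalization."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)

def approx_simple_preprocess(doc, deacc=False, min_len=2, max_len=15):
    """Hypothetical stand-in for gensim's simple_preprocess (approximation)."""
    doc = doc.lower()
    if deacc:
        doc = deaccent(doc)
    tokens = [m.group() for m in TOKEN_PATTERN.finditer(doc)]
    # Length filter and leading-underscore filter; no stop-word removal
    return [t for t in tokens
            if min_len <= len(t) <= max_len and not t.startswith("_")]

print(approx_simple_preprocess("I'm sorry, I didn't see the 2 cafés!", deacc=True))
# ['sorry', 'didn', 'see', 'the', 'cafes']
```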


## Neural Network/TFIDF
From [here](../02_tfidf/neural_network.ipynb).

In [8]:
from keras.preprocessing import text
from keras.models import model_from_json

vocab_size = 3000

tokenize = text.Tokenizer(num_words=vocab_size)
# NOTE: this fits the tokenizer on the six samples, not on the training data,
# so its word indices do not match the ones the saved model was trained with;
# fitting it on the same training data as the saved model would avoid this.
tokenize.fit_on_texts(df_samples["body"])
x_test = tokenize.texts_to_matrix(df_samples["body"]) # mode defaults to 'binary'
x_test.shape

Using TensorFlow backend.


(6, 3000)
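The Keras `Tokenizer` assigns word indices by frequency in the texts it is fitted on, and `texts_to_matrix` places each word in the column given by that index. Fitting on a different corpus therefore moves words to different columns, so the input the saved model sees no longer lines up with the features it was trained on; this may explain why `NN/TFIDF` labels every sample as ad hominem. The sketch below is a simplified imitation of that indexing (no lower-casing or punctuation filtering; `build_word_index` and `binary_matrix` are hypothetical names), not the Keras implementation.

```python
from collections import Counter

def build_word_index(texts):
    """Most frequent word gets index 1; ties keep first-seen order
    (Counter keeps insertion order and sorted() is stable)."""
    counts = Counter()
    for t in texts:
        counts.update(t.split())
    ranked = sorted(counts, key=lambda w: -counts[w])
    return {w: i + 1 for i, w in enumerate(ranked)}

def binary_matrix(texts, word_index, num_words):
    """One row per text; column j is 1 if the word with index j occurs.
    Only indices below num_words are kept, so column 0 is never used."""
    rows = []
    for t in texts:
        row = [0] * num_words
        for w in t.split():
            j = word_index.get(w)
            if j is not None and j < num_words:
                row[j] = 1
        rows.append(row)
    return rows

train_texts = ["you are wrong", "you are right", "you know"]
sample_texts = ["wrong wrong you", "know"]
# The same word lands in a different column depending on the fitted corpus
print(build_word_index(train_texts)["you"], build_word_index(sample_texts)["you"])  # 1 2
```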

In [9]:
# Load the model architecture from JSON
with open('../02_tfidf/model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)
# Load the trained weights into the model
loaded_model.load_weights("../02_tfidf/model.h5")
print("Loaded model from disk")

# Compile the model (needed for evaluate, not for predict)
loaded_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
#score = loaded_model.evaluate(x_test, df_samples['isAdHominem'], verbose=0)
# predict_classes only exists in older Keras versions
ynew = loaded_model.predict_classes(x_test)

Loaded model from disk
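`predict_classes` exists only in older Keras versions; it was later removed. For a model with a single sigmoid output it thresholds the predicted probability at 0.5, and for a multi-unit output it takes the argmax. A plain-Python sketch of that rule, applied to rows of `predict` output (the helper name is hypothetical; the real method returns a NumPy array, shaped `(n, 1)` in the binary case, while this returns a flat list):

```python
def predict_classes_from_proba(proba):
    """Turn rows of predicted probabilities into class labels."""
    if len(proba[0]) > 1:
        # Several output units (e.g. softmax): pick the most probable class
        return [max(range(len(row)), key=row.__getitem__) for row in proba]
    # Single sigmoid unit: class 1 only if the probability exceeds 0.5
    return [int(row[0] > 0.5) for row in proba]

print(predict_classes_from_proba([[0.2], [0.7], [0.5]]))  # [0, 1, 0]
```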


In [10]:
df_samples["NN/TFIDF"] = ynew
df_samples

index,length,body,isAdHominem,NN/TFIDF
0,343,alright here the meat of my aggressiveness you show great lack of empathy for those who didn have the opportunities that you did while you were busy with college some of the people you don want making higher wage were already busy in the work force why does their time working and years of experience count as less work than your school work did that tight budget allow you to purchase home and vehicle feed kid pay all your bills and tuition costs or did you have outside help as well it varies by state where live the minimum wage is hour and again wages should keep up with inflation cost of living and worker productivity americans have the longest working hours of all the industrialized nations with the least amount of vacation days per year also the us sits at th in median individual to sum this up you re basically arguing to keep minimum wages low which keeps median wages from going as well you re advocating against future raise for yourself then you re short sighted individual who doesn understand economics and slightly selfish ass to boot ll explain let make it simple if you re an entrepreneur what do you want the most profits how do you get those profits if people use your service or consume your product or use your product how will they do that buy buying it that means you and me will spend money to buy things what do the entrepreneur do with those profits he ll take some home and the rest he ll re invest in his business to make it grow make it grow how if you re manufacturer or producer you will increase manufacturing or production capacity this will mean buying machines employing more people using up more raw materials for the process of production and manufacturing giving the people at the bottom more power to support their lives is good for everyone including you and its why you should not only care about your wages but others as well,1,1
1,358,don think donald trump was bragging about getting away with sexual assault in the access hollywood tapes first of all yes of course it is sexual assault to grab woman genitalia without their consent but it is not sexual assault if it consensual and that what donald trump is bragging about in the tape for reference here the transcript of the relevant portion trump yeah that her with the gold better use some tic tacs just in case start kissing her you know automatically attracted to beautiful just start kissing them it like magnet just kiss don even wait and when you re star they let you do it you can do anything bush whatever you want trump grab them by the you can do anything ve bolded the phrase they let you do it because that reference to consent with they referring to the woman on the receiving end of his actions trump is not bragging about his wealth and fame allowing him to get away with groping women without their consent which again absolutely would be sexual assault he is instead bragging that his wealth and his fame make him so sexually attractive to women that they consent to being groped by him and that why he and his defenders are so insistent on it being locker room talk because while bragging about sexual assault is not normal in locker rooms outlandish claims about one sexual attractiveness and prowess absolutely are and just to be clear not defending trump here think that trump did not brag about sexual assault in the tapes but not saying that if he did what he said he did those women actually did consent in fact given the stories from the women who have come forward so far think it likely that they did not and think the takeaway of my view trump believes he was bragging about consensual sexual activity is that trump doesn truly understand what consent is and that might actually be even worse but it worse in way that underlines just how deeply rooted misogyny is and that can be dismissed as trump sociopath and an aberration,0,1
2,124,except the unquestionably well recorded long history handed down generation to generation that any sociologist worth their salt would argue is absolutely real evidence that you can find records of an atrocity buried in the dust of ruined empire official whitewashed record is hardly leg to stand on and just for the record using lack of evidence as evidence when jewish people have holiday literally thousands of years old proof to the contrary seems particularly doltish even for reddit might remind you that many scholars believe jesus last supper was passover no matter lets forget egypt and instead use the whole rest of jewish history from the end of the roman empire through the dark ages crusades until today my point still stands satisfied,1,1
3,116,yeah uh thats not understanding pro choice the point is that the women can how to regard the fetus do you think that the women who exercises her choice by carrying the child to term regards it as parasite of course not it the child she psyched to have however what the pro choice individual believes is that women_ gets to decide how to regard the fetus because it is her body_ and she doen have to set aside her right to determine how to regard things that take up residence inside her for any reason of course there are other reasons people are pro choice but this hardly shift from the mainstream perspective on choice,0,1
4,17,your inability to understand clear sentences is not failing on my part also it spelled you re,1,1
5,13,it is more effective to report it than downvote it speaking of which,0,1


## Neural Network/Word2Vec
From [here](../02_tfidf/tfidf.ipynb).

In [11]:
# I can't run it on my local computer due to memory limitations

## LinearSVC/Word2Vec
From [here](../02_tfidf/tfidf.ipynb).

In [12]:
# I can't run it on my local computer due to memory limitations

# SVMs/TFIDF

SVMs tested:
* `svm.NuSVC([nu, kernel, degree, gamma, …])`: Nu-Support Vector Classification.
* `svm.SVC([C, kernel, degree, gamma, coef0, …])`: C-Support Vector Classification.
* `svm.LinearSVC([penalty, loss, dual, tol, C, …])`: Linear Support Vector Classification.

Kernels tested:
* `linear`
* `poly`
* `sigmoid`
* `rbf`

From [here](../05_SVM/SVMs-kernels.ipynb).
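For reference, scikit-learn defines the four kernels as linear <x, x'>, polynomial (gamma * <x, x'> + r)^d, RBF exp(-gamma * ||x - x'||^2), and sigmoid tanh(gamma * <x, x'> + r), where `coef0` is r (default 0) and `degree` is d (default 3). The plain-Python functions below evaluate those formulas for two dense vectors; they are a sketch for intuition, not what libsvm runs internally.

```python
import math

def dot(x, y):
    """Inner product <x, y> of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear(x, y):
    return dot(x, y)

def poly(x, y, gamma, coef0=0.0, degree=3):
    return (gamma * dot(x, y) + coef0) ** degree

def rbf(x, y, gamma):
    # exp(-gamma * squared Euclidean distance)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid(x, y, gamma, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

print(linear([1, 2], [3, 4]))                                 # 11
print(poly([1, 2], [3, 4], gamma=0.5, coef0=1.0, degree=2))   # 42.25
```

Since `TfidfVectorizer` L2-normalizes each row by default and TF-IDF weights are non-negative, the linear kernel between two documents is their cosine similarity and lies between 0 and 1.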

In [13]:
# Only the classifiers and helpers used below are imported
# (pandas, numpy and matplotlib are already imported in cell 1)
from sklearn.svm import LinearSVC, NuSVC, SVC
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

#### TfidfVectorizer used [here](../02_tfidf/tfidf.ipynb).

In [14]:
v = TfidfVectorizer(ngram_range=(1, 1), max_features=3000)
# Exclude the six selected examples so they are never used for training
# (df has a 0..n-1 index after reset_index, so labels equal positions)
df_notInSamples = df.drop(index=indexes)

# Only train_data is used in this notebook; test_data is left unused
train_data, test_data = train_test_split(df_notInSamples, test_size=0.3, random_state=3)
v.fit(train_data['body'].values.astype('U'))

x_train_tfidf = v.transform(train_data['body'].values.astype('U'))
y_train = list(train_data["isAdHominem"])

# NOTE: df_samples['body'] was already passed through simple_preprocess, unlike the training bodies
x_test_tfidf = v.transform(df_samples['body'].values.astype('U'))
y_test = list(df_samples["isAdHominem"])
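With its defaults (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`), `TfidfVectorizer` weights each term by its raw count times idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number of documents containing t, and then scales each row to unit length. The sketch below recomputes that for a toy corpus in plain Python; it ignores `max_features`, which in the notebook keeps only the 3000 terms with the highest frequency across the training corpus, and the function name `tfidf_rows` is hypothetical.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token_pattern

def tfidf_rows(docs):
    """Smoothed TF-IDF with L2-normalized rows (no max_features cap, unigrams only)."""
    tokenized = [TOKEN.findall(doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})  # alphabetical columns
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[tok] * idf[tok] for tok in vocab]  # raw term count times idf
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows

vocab, rows = tfidf_rows(["you are wrong", "you are right"])
print(vocab)  # ['are', 'right', 'wrong', 'you']
```

Words that appear in every document (here `you` and `are`) get the minimum idf of 1, so rarer words such as `wrong` dominate each row.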

## NuSVC/TFIDF
The kernels to be used are:
* `linear`
* `poly`
* `sigmoid`
* `rbf`

The `rbf` kernel was already used in [SVMs.ipynb](./SVMs.ipynb); it is repeated here for comparison.
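In `NuSVC`, `nu` replaces `C`: according to the scikit-learn documentation it is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors. The arithmetic below turns `nu=0.05` into sample counts, assuming the 29,218 rows printed in cell 2 minus the six held-out examples and scikit-learn's rule of rounding the test share of `train_test_split` up; `split_sizes` is a hypothetical helper.

```python
import math

def split_sizes(n_samples, test_size):
    """Train/test sizes for a float test_size: the test share is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_train, n_test = split_sizes(29218 - 6, 0.3)  # rows from cell 2 minus the six samples
nu = 0.05
print(n_train, n_test)         # 20448 8764
print(round(nu * n_train, 1))  # 1022.4
```

So with `nu=0.05` each NuSVC fit keeps at least about 1,022 support vectors and allows at most about 1,022 margin errors among the 20,448 training rows.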

In [15]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='linear').fit(x_train_tfidf, y_train)
print("Done!")

predicted = nuModel.predict(x_test_tfidf)
df_samples["NuSVC-linear/TFIDF"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 3min 22s, sys: 380 ms, total: 3min 22s
Wall time: 3min 23s


In [16]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='poly').fit(x_train_tfidf, y_train)
print("Done!")

predicted = nuModel.predict(x_test_tfidf)

df_samples["NuSVC-poly/TFIDF"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 12.8 s, sys: 31.5 ms, total: 12.8 s
Wall time: 13 s


In [17]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='sigmoid').fit(x_train_tfidf, y_train)
print("Done!")

predicted = nuModel.predict(x_test_tfidf)

df_samples["NuSVC-sigmoid/TFIDF"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 14.8 s, sys: 128 ms, total: 15 s
Wall time: 15.1 s


In [18]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='rbf').fit(x_train_tfidf, y_train)
print("Done!")

predicted = nuModel.predict(x_test_tfidf)

df_samples["NuSVC-rbf/TFIDF"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 15 s, sys: 75.8 ms, total: 15.1 s
Wall time: 15.1 s


## SVC/TFIDF
Closely related to `LinearSVC` and `NuSVC`, but with different implementations and formulations.
* `LinearSVC` is similar to `SVC(kernel='linear')`, but it is implemented with liblinear rather than libsvm and, by default, minimizes the squared hinge loss, so results can differ.
* From the documentation: *`SVC` and `NuSVC` are similar methods, but accept slightly different sets of parameters and have different mathematical formulations* (see section [Mathematical formulation](https://scikit-learn.org/stable/modules/svm.html#svm-mathematical-formulation)).

The kernels to be used are:
* `linear`
* `poly`
* `sigmoid`
* `rbf`
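One concrete difference between `LinearSVC` and `SVC(kernel='linear')` is the loss: `SVC` minimizes the hinge loss, while `LinearSVC` defaults to `loss='squared_hinge'`. The sketch below compares the two on a few (label, score) pairs, with labels mapped from 0/1 to -1/+1; the function names are illustrative.

```python
def hinge(y, score):
    """Hinge loss (SVC) for a label y in {-1, +1} and a decision score f(x)."""
    return max(0.0, 1.0 - y * score)

def squared_hinge(y, score):
    """Squared hinge loss, LinearSVC's default loss='squared_hinge'."""
    return hinge(y, score) ** 2

# isAdHominem labels 0/1 correspond to y = -1/+1 here
for y, score in [(1, 2.0), (1, 0.5), (-1, 0.5), (1, -2.0)]:
    print(y, score, hinge(y, score), squared_hinge(y, score))
```

The squared version penalizes small margin violations less and large ones more, which is one reason the two classifiers can disagree even with the same `C`.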

In [19]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='poly').fit(x_train_tfidf, y_train)
print("Done!")

predicted = svcModel.predict(x_test_tfidf)

df_samples["SVC-poly/TFIDF"] = predicted

Fitting SVC model...
Done!
CPU times: user 37.6 s, sys: 180 ms, total: 37.8 s
Wall time: 37.9 s


In [20]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='sigmoid').fit(x_train_tfidf, y_train)
print("Done!")

predicted = svcModel.predict(x_test_tfidf)

df_samples["SVC-sigmoid/TFIDF"] = predicted

Fitting SVC model...
Done!
CPU times: user 39.7 s, sys: 220 ms, total: 39.9 s
Wall time: 40 s


In [21]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='rbf').fit(x_train_tfidf, y_train)
print("Done!")

predicted = svcModel.predict(x_test_tfidf)

df_samples["SVC-rbf/TFIDF"] = predicted

Fitting SVC model...
Done!
CPU times: user 49.1 s, sys: 327 ms, total: 49.4 s
Wall time: 50 s


In [22]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='linear').fit(x_train_tfidf, y_train)
print("Done!")

predicted = svcModel.predict(x_test_tfidf)

df_samples["SVC-linear/TFIDF"] = predicted

Fitting SVC model...
Done!
CPU times: user 1min 2s, sys: 168 ms, total: 1min 2s
Wall time: 1min 3s


## LinearSVC/TFIDF
`LinearSVC` is similar to `SVC(kernel='linear')`, but it uses a different implementation (liblinear instead of libsvm) and, by default, the squared hinge loss.

In [23]:
%%time

print("Fitting linear model...")
linearModel = LinearSVC().fit(x_train_tfidf, y_train)
print("Done!")

predicted = linearModel.predict(x_test_tfidf)

df_samples["linearSVC/TFIDF"] = predicted

Fitting linear model...
Done!
CPU times: user 411 ms, sys: 0 ns, total: 411 ms
Wall time: 425 ms


## Analyze the results
The predicted labels for the six examples are shown below; `isAdHominem` is the ground truth.

In [24]:
pd.set_option('display.max_colwidth', 30)
df_samples

index,length,body,isAdHominem,NN/TFIDF,NuSVC-linear/TFIDF,NuSVC-poly/TFIDF,NuSVC-sigmoid/TFIDF,NuSVC-rbf/TFIDF,SVC-poly/TFIDF,SVC-sigmoid/TFIDF,SVC-rbf/TFIDF,SVC-linear/TFIDF,linearSVC/TFIDF
0,343,alright here the meat of m...,1,1,1,1,1,1,0,0,0,0,0
1,358,don think donald trump was...,0,1,0,0,0,0,0,0,0,0,0
2,124,except the unquestionably ...,1,1,0,0,0,1,0,0,0,0,0
3,116,yeah uh thats not understa...,0,1,1,0,0,1,0,0,0,0,0
4,17,your inability to understa...,1,1,0,0,0,1,0,0,0,0,0
5,13,it is more effective to re...,0,1,0,0,0,0,0,0,0,0,0
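With only six examples any score is anecdotal, but counting agreements with `isAdHominem` condenses the table. The labels below are copied from four columns of the printed output; `accuracy` is a hypothetical helper.

```python
# Labels copied from the printed table above (rows 0-5)
truth = [1, 0, 1, 0, 1, 0]  # isAdHominem
predictions = {
    "NN/TFIDF":           [1, 1, 1, 1, 1, 1],
    "NuSVC-linear/TFIDF": [1, 0, 0, 1, 0, 0],
    "NuSVC-rbf/TFIDF":    [1, 0, 1, 1, 1, 0],
    "SVC-rbf/TFIDF":      [0, 0, 0, 0, 0, 0],
}

def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for name, y_pred in predictions.items():
    print(f"{name}: {accuracy(truth, y_pred):.2f}")
# NN/TFIDF: 0.50
# NuSVC-linear/TFIDF: 0.50
# NuSVC-rbf/TFIDF: 0.83
# SVC-rbf/TFIDF: 0.50
```

Because the six examples are balanced (three of each class), a model that predicts the same label for every row, such as `NN/TFIDF` or `SVC-rbf/TFIDF` here, scores exactly 0.50. In the notebook, `(df_samples[col] == df_samples["isAdHominem"]).mean()` gives the same number for any column `col`.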


## Mixed Neural Network/Word2Vec + POS tags + Doc2Vec

In [25]:
# Can't run locally due to memory limitations.

## Using Doc2Vec

In [26]:
from gensim.models.doc2vec import Doc2Vec
doc2vec_model = Doc2Vec.load("reddit-doc2vec.model")

In [27]:
%%time

train_data = train_data.reset_index(drop=True)

x_train_vec = list(train_data["body"].apply(lambda x:simple_preprocess(str(x), deacc=True))) # Tokenize bodies
x_train_vec = [doc2vec_model.infer_vector(i) for i in x_train_vec]                           # Infer vectors
y_train = list(train_data["isAdHominem"])                                                    # Same as before, here for reference

x_test_vec = list(df_samples["body"].apply(lambda x:simple_preprocess(str(x), deacc=True))) # Tokenize bodies
x_test_vec = [doc2vec_model.infer_vector(i) for i in x_test_vec]                            # Infer vectors
y_test = list(df_samples["isAdHominem"])                                                    # Same as before, here for reference

CPU times: user 1min 43s, sys: 212 ms, total: 1min 43s
Wall time: 1min 43s


## NuSVC/Doc2Vec
The kernels to be used are:
* `linear`
* `poly`
* `sigmoid`
* `rbf`

The `rbf` kernel was already used in [SVMs.ipynb](./SVMs.ipynb); it is repeated here for comparison.

In [28]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='linear').fit(x_train_vec, y_train)
print("Done!")

predicted = nuModel.predict(x_test_vec)
df_samples["NuSVC-linear/Doc2Vec"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 3min 50s, sys: 728 ms, total: 3min 51s
Wall time: 3min 51s


In [29]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='poly').fit(x_train_vec, y_train)
print("Done!")

predicted = nuModel.predict(x_test_vec)

df_samples["NuSVC-poly/Doc2Vec"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 25.5 s, sys: 220 ms, total: 25.7 s
Wall time: 25.7 s


In [30]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='sigmoid').fit(x_train_vec, y_train)
print("Done!")

predicted = nuModel.predict(x_test_vec)

df_samples["NuSVC-sigmoid/Doc2Vec"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 35.5 s, sys: 268 ms, total: 35.7 s
Wall time: 35.8 s


In [31]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='rbf').fit(x_train_vec, y_train)
print("Done!")

predicted = nuModel.predict(x_test_vec)

df_samples["NuSVC-rbf/Doc2Vec"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 1min 2s, sys: 412 ms, total: 1min 2s
Wall time: 1min 2s


## SVC/Doc2Vec
Closely related to `LinearSVC` and `NuSVC`, but with different implementations and formulations.
* `LinearSVC` is similar to `SVC(kernel='linear')`, but it is implemented with liblinear rather than libsvm and, by default, minimizes the squared hinge loss, so results can differ.
* From the documentation: *`SVC` and `NuSVC` are similar methods, but accept slightly different sets of parameters and have different mathematical formulations* (see section [Mathematical formulation](https://scikit-learn.org/stable/modules/svm.html#svm-mathematical-formulation)).

The kernels to be used are:
* `linear`
* `poly`
* `sigmoid`
* `rbf`

In [32]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='poly').fit(x_train_vec, y_train)
print("Done!")

predicted = svcModel.predict(x_test_vec)

df_samples["SVC-poly/Doc2Vec"] = predicted

Fitting SVC model...
Done!
CPU times: user 1min 27s, sys: 508 ms, total: 1min 28s
Wall time: 1min 28s


In [33]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='sigmoid').fit(x_train_vec, y_train)
print("Done!")

predicted = svcModel.predict(x_test_vec)

df_samples["SVC-sigmoid/Doc2Vec"] = predicted

Fitting SVC model...
Done!
CPU times: user 1min 45s, sys: 428 ms, total: 1min 45s
Wall time: 1min 46s


In [34]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='rbf').fit(x_train_vec, y_train)
print("Done!")

predicted = svcModel.predict(x_test_vec)

df_samples["SVC-rbf/Doc2Vec"] = predicted

Fitting SVC model...
Done!
CPU times: user 1min 59s, sys: 396 ms, total: 1min 59s
Wall time: 1min 59s


In [35]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='linear').fit(x_train_vec, y_train)
print("Done!")

predicted = svcModel.predict(x_test_vec)

df_samples["SVC-linear/Doc2Vec"] = predicted

Fitting SVC model...
Done!
CPU times: user 7min 5s, sys: 639 ms, total: 7min 6s
Wall time: 7min 8s


## LinearSVC/Doc2Vec
`LinearSVC` is similar to `SVC(kernel='linear')`, but it uses a different implementation (liblinear instead of libsvm) and, by default, the squared hinge loss.

In [36]:
%%time

print("Fitting linear model...")
linearModel = LinearSVC().fit(x_train_vec, y_train)
print("Done!")

predicted = linearModel.predict(x_test_vec)

df_samples["linearSVC/Doc2Vec"] = predicted

Fitting linear model...
Done!
CPU times: user 58.1 s, sys: 196 ms, total: 58.3 s
Wall time: 58.7 s


## Once again, let's look at the results
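The predictions for both feature sets (TF-IDF and Doc2Vec) are shown below; `isAdHominem` is the ground truth, and some columns are elided (`...`) in the printed output.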

In [37]:
df_samples.drop('NN/TFIDF', axis=1) # Hidden from this view: in the script, its accuracy did not improve with training

index,length,body,isAdHominem,NuSVC-linear/TFIDF,NuSVC-poly/TFIDF,NuSVC-sigmoid/TFIDF,NuSVC-rbf/TFIDF,SVC-poly/TFIDF,SVC-sigmoid/TFIDF,SVC-rbf/TFIDF,...,linearSVC/TFIDF,NuSVC-linear/Doc2Vec,NuSVC-poly/Doc2Vec,NuSVC-sigmoid/Doc2Vec,NuSVC-rbf/Doc2Vec,SVC-poly/Doc2Vec,SVC-sigmoid/Doc2Vec,SVC-rbf/Doc2Vec,SVC-linear/Doc2Vec,linearSVC/Doc2Vec
0,343,alright here the meat of m...,1,1,1,1,1,0,0,0,...,0,0,1,0,0,0,0,0,0,1
1,358,don think donald trump was...,0,0,0,0,0,0,0,0,...,0,1,1,1,0,0,0,0,0,0
2,124,except the unquestionably ...,1,0,0,0,1,0,0,0,...,0,0,1,1,1,0,0,0,0,0
3,116,yeah uh thats not understa...,0,1,0,0,1,0,0,0,...,0,0,1,1,0,0,0,0,0,0
4,17,your inability to understa...,1,0,0,0,1,0,0,0,...,0,0,1,0,1,0,0,0,0,0
5,13,it is more effective to re...,0,0,0,0,0,0,0,0,...,0,1,1,0,0,0,0,0,0,0
