# Selecting the examples for analysis
Long, medium and short text entries


In [1]:
import numpy as np
import pandas as pd
import random
from gensim.utils import simple_preprocess
import matplotlib.pyplot as plt

## Importing data

In [2]:
df = pd.read_csv("../../data/ad_hominem/ad_hominems_cleaned_Murilo.csv", sep=",", index_col=0, header=0, names=["body", "isAdHominem"])
df = df[~df.isin([np.nan, np.inf, -np.inf, 'nan']).any(axis=1)] ## Remove rows with NaN or infinite values
print(df.shape)

(29218, 2)


In [3]:
df["length"] = df["body"].apply(lambda x: len(simple_preprocess(x, deacc=True))) # Add a column with the token count of each body
df = df.reset_index(drop=True)

In [4]:
pd.set_option('display.max_colwidth', 0)
df.head(n=5)

Unnamed: 0,body,isAdHominem,length
0,What makes corporations different in this case? They have interests too.,0,11
1,"I'm sorry if your smugness gets in the way. Like I said elsewhere in this thread. Somolia is not close to anything I advocate for so why on earth would I move there? Any time the Somolia ""argument"" is brought up, I instantly know I'm dealing with someone who refuses to learn the difference between a voluntary society and a third world country ravaged by warlords and foreign policies of other countries. If you want a thoughtful response to an argument, make sure you're not comparing Antarctica to the Bahamas. Otherwise, take your circlejerk, ""arguments"" elsewhere. You have contributed absolutely nothing to this thread but ad hominem Attacks and the typical liberal/conservative talking points and almost everyone in here knows it.",1,114
2,"Basically to believe a patriarchy exists, you must believe that men are maintaining a system of oppression against women, despite knowing the harm it does to both women and men.EG - Wanting to maintain a system that, among other things, condones severe anti-male bias in all facets of the legal system, simply isn't rational. Thus men, being the ones in power, want to oppress women so much they are willing to harm themselves to do it. It'd be like cutting off your own arm so you had something to club someone with.A long time ago one could say it was ignorance, but with how mainstream feminism thoughts are today this can no longer be true. So the actions of men to maintain the patriarchy must also be willful.How can a person believe this, and not hate men?",0,135
3,The punishment for heresy was being burned at the stake.,0,10
4,No it doesn't. Sex is defined by DNA. DNA cannot be changed from male to female. A sex change is putting lipstick on a pig. It may look different but it's still a pig,0,31


In [5]:
ilong_true = df.loc[(df["length"] > 300) & (df["length"] < 400) & (df['isAdHominem'] == 1)].sample(n=1).index[0]
ilong_false = df.loc[(df["length"] > 300) & (df["length"] < 400) & (df['isAdHominem'] == 0)].sample(n=1).index[0]
imed_true = df.loc[(df["length"] > 100) & (df["length"] < 150) & (df['isAdHominem'] == 1)].sample(n=1).index[0]
imed_false = df.loc[(df["length"] > 100) & (df["length"] < 150) & (df['isAdHominem'] == 0)].sample(n=1).index[0]
ishort_true = df.loc[(df["length"] > 10)  & (df["length"] < 20)  & (df['isAdHominem'] == 1)].sample(n=1).index[0]
ishort_false = df.loc[(df["length"] > 10)  & (df["length"] < 20)  & (df['isAdHominem'] == 0)].sample(n=1).index[0]
print("The indexes for the examples picked (in the original data frame) are {}, {}, {}, {}, {} and {}.".format(ilong_true, ilong_false, imed_true, imed_false, ishort_true, ishort_false))

The indexes for the examples picked (in the original data frame) are 18326, 13376, 1989, 5391, 17927 and 14290.
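
The six picks above change on every run because `sample` is called without a seed. A minimal sketch of a reproducible version follows, assuming a hypothetical helper `pick_example` and a toy data frame (neither is part of the original notebook); it only relies on the `random_state` argument of `DataFrame.sample`.

```python
import pandas as pd

def pick_example(df, low, high, label, seed=0):
    """Return the index of one random row with low < length < high and the given label."""
    mask = (df["length"] > low) & (df["length"] < high) & (df["isAdHominem"] == label)
    # A fixed random_state makes the pick repeatable across runs
    return df.loc[mask].sample(n=1, random_state=seed).index[0]

# Toy stand-in for the real data frame (hypothetical values)
toy = pd.DataFrame({
    "length":      [12, 15, 120, 130, 310, 350],
    "isAdHominem": [1,  0,  1,   0,   1,   0],
})

# Same length buckets as in the cell above
buckets = {"long": (300, 400), "med": (100, 150), "short": (10, 20)}
picked = {
    f"{name}_{label}": pick_example(toy, low, high, label)
    for name, (low, high) in buckets.items()
    for label in (1, 0)
}
print(picked)
```

Keeping the bucket limits in one dictionary also avoids repeating the same filter six times.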


In [6]:
indexes = [ilong_true, ilong_false, imed_true, imed_false, ishort_true, ishort_false]
df_samples = df.loc[indexes,:].copy() # Copy, so that later column assignments do not act on a view of df

## Filtering the dataset

In [7]:
df_samples["body"] = df_samples["body"].apply(lambda x: " ".join(simple_preprocess(str(x), deacc=True))) # Lower-case, strip accents and punctuation, drop very short or very long tokens (stop words are kept)
df_samples = df_samples.reset_index(drop=True).reindex(["length", "body", "isAdHominem"], axis=1)        # Reset new indexes for data frame and reorder columns (visualization purposes)
df_samples

Unnamed: 0,length,body,isAdHominem
0,301,yeah people like non whites are already getting death threats here and civil war is actually way more likely than anyone wants to believe just because it was fair and legal election well sorry godwin but so was the election that put hitler into power also just because someone problems don measure up to people with worse problems doesn mean that their problems aren real and that bullshit fucking thing to say and not good way to start out discourse who says they ll be doing that firstly that bullshit if you have less stuff you actually have an easier time packing everything up and moving it as long as you have means of transportation even that can be overcome with either bit of saving or bit of larceny secondly if it is panic situation they might just abandon shit as refugees commonly do they aren refugees because they have nothing they have nothing because the active verb in fled their country is fled don know that you understand the utter helpless feeling of some people so let summarize people here were trying to change them and that shit didn fucking work personally am going to stay and change them but can understand someone who already given up hope also there that actual problems thing again and this time it got me bit riled so let see if you think me pulling pistol and saying should just kill you now as happened to minority friend after this election seems like fucking actual problem to you nowhere is perfect the people leaving don think it going to be perfect just better and don know if they re right or not but my guess is that with at least somewhat tighter gun control you won have people flashing their pieces and threatening minorities,1
1,341,in more recent years the concept of white privilege has gained considerable standing in national discussions over race developed as term for societal privileges that benefit people identified as white social privileges that are not commonly experienced by non white people under the same conditions there are two primary problems with this as see it first the idea of white privilege reifies race unlike the color blindness approach to racial inequality the white privilege or critical race theory approach makes white identity dominant force of history but whiteness in this view is not divorced from historical agents white people therefore become the architects of history and the only group with agency while claiming to deconstruct whiteness proponents of white privilege theory end up essentializing it and providing whites with self identification as whites which leads to the second problem the group interests attached to whiteness white privilege is described as set of privileges and even rights for example the privilege of not being targeted by police because of your race indeed many of the examples used by proponents of the theory involve law enforcement other examples include advantages in housing education and employment markets even when there is no race conscious decision being made as result sometimes of social networks unconscious bias and of course historical advantages the net result of reifying white privilege and identifying it with concrete material interests however is to create situation where proponents of white privilege in essence tell whites that they are distinct and advantaged social group that has material interests rational political actor would ordinarily advance their material interests but proponents of white privilege appear to believe that moral case against white privilege can persuade or perhaps shame whites into acting against these material interests is that at all likely or is it more likely that proponents of this theory will 
unwittingly unleash new forms of racial consciousness among whites that will advance white interests at the expense of racial minorities think that the latter is more likely if you disagree change my view,0
2,132,oh look someone who can read stop raping orphans for moment and think the lifetime statistics methods are never released and are supposedly derived somehow from the month figure any statistical process that isn made public basically doesn exist as it can be independently verified going to guess they are using previous results from other intellectually incorrect studies but because they never share how they arrive at this number cannot verify its correctness and therefore can be discarded even if the lifetime figure is correct it has no bearing on current risk if the rate of assault for women was higher years ago this would be reflected in higher lifetime rate but far lower month rate current score me billionyou negative infinity also stop raping orphan children and kill yourself you cunt,1
3,120,okay since you re discussing dna you re getting into dangerous territory so ll oblige your statement implies that black people are genetically pre disposed to crime ok then by the same token how do you explain that nearly every major serial killer of the th century was white man are white people genetically pre disposed to be murdering sociopaths by your logic would have to arrive at that conclusion know what just said is ridiculous however that outlandish claim still holds more water than what you re trying to argue because while poor people naturally turn to crime as means to survive serial killers and sociopaths are actually physically biologically different in the way that their brains are wired,0
4,13,honestly just shut the hell up you make massive fool out of yourself,1
5,15,what you said was so he wouldn be identified as criminal not he did it,0


## Neural Network/TFIDF
From [here](../02_tfidf/neural_network.ipynb).

In [8]:
from keras import utils
from keras.preprocessing import text, sequence
from sklearn.model_selection import train_test_split
from keras.models import model_from_json

vocab_size = 3000

tokenize = text.Tokenizer(num_words=vocab_size)
#tokenize.fit_on_texts(result.headline_text)

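tokenize.fit_on_texts(df_samples["body"]) # Caution: fitted on the six samples, not on the training data, so word indices may differ from those the saved model was trained with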
x_test = tokenize.texts_to_matrix(df_samples["body"])
x_test.shape

Using TensorFlow backend.


(6, 3000)

In [9]:
# load json and create model
with open('../02_tfidf/model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("../02_tfidf/model.h5")
print("Loaded model from disk")
 
# evaluate loaded model on test data
loaded_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
#score = loaded_model.evaluate(x_test, df_samples['isAdHominem'], verbose=0)
ynew = loaded_model.predict_classes(x_test)

Loaded model from disk


In [10]:
df_samples["NN/TFIDF"] = ynew
df_samples

Unnamed: 0,length,body,isAdHominem,NN/TFIDF
0,301,yeah people like non whites are already getting death threats here and civil war is actually way more likely than anyone wants to believe just because it was fair and legal election well sorry godwin but so was the election that put hitler into power also just because someone problems don measure up to people with worse problems doesn mean that their problems aren real and that bullshit fucking thing to say and not good way to start out discourse who says they ll be doing that firstly that bullshit if you have less stuff you actually have an easier time packing everything up and moving it as long as you have means of transportation even that can be overcome with either bit of saving or bit of larceny secondly if it is panic situation they might just abandon shit as refugees commonly do they aren refugees because they have nothing they have nothing because the active verb in fled their country is fled don know that you understand the utter helpless feeling of some people so let summarize people here were trying to change them and that shit didn fucking work personally am going to stay and change them but can understand someone who already given up hope also there that actual problems thing again and this time it got me bit riled so let see if you think me pulling pistol and saying should just kill you now as happened to minority friend after this election seems like fucking actual problem to you nowhere is perfect the people leaving don think it going to be perfect just better and don know if they re right or not but my guess is that with at least somewhat tighter gun control you won have people flashing their pieces and threatening minorities,1,1
1,341,in more recent years the concept of white privilege has gained considerable standing in national discussions over race developed as term for societal privileges that benefit people identified as white social privileges that are not commonly experienced by non white people under the same conditions there are two primary problems with this as see it first the idea of white privilege reifies race unlike the color blindness approach to racial inequality the white privilege or critical race theory approach makes white identity dominant force of history but whiteness in this view is not divorced from historical agents white people therefore become the architects of history and the only group with agency while claiming to deconstruct whiteness proponents of white privilege theory end up essentializing it and providing whites with self identification as whites which leads to the second problem the group interests attached to whiteness white privilege is described as set of privileges and even rights for example the privilege of not being targeted by police because of your race indeed many of the examples used by proponents of the theory involve law enforcement other examples include advantages in housing education and employment markets even when there is no race conscious decision being made as result sometimes of social networks unconscious bias and of course historical advantages the net result of reifying white privilege and identifying it with concrete material interests however is to create situation where proponents of white privilege in essence tell whites that they are distinct and advantaged social group that has material interests rational political actor would ordinarily advance their material interests but proponents of white privilege appear to believe that moral case against white privilege can persuade or perhaps shame whites into acting against these material interests is that at all likely or is it more likely that proponents of this theory will 
unwittingly unleash new forms of racial consciousness among whites that will advance white interests at the expense of racial minorities think that the latter is more likely if you disagree change my view,0,1
2,132,oh look someone who can read stop raping orphans for moment and think the lifetime statistics methods are never released and are supposedly derived somehow from the month figure any statistical process that isn made public basically doesn exist as it can be independently verified going to guess they are using previous results from other intellectually incorrect studies but because they never share how they arrive at this number cannot verify its correctness and therefore can be discarded even if the lifetime figure is correct it has no bearing on current risk if the rate of assault for women was higher years ago this would be reflected in higher lifetime rate but far lower month rate current score me billionyou negative infinity also stop raping orphan children and kill yourself you cunt,1,1
3,120,okay since you re discussing dna you re getting into dangerous territory so ll oblige your statement implies that black people are genetically pre disposed to crime ok then by the same token how do you explain that nearly every major serial killer of the th century was white man are white people genetically pre disposed to be murdering sociopaths by your logic would have to arrive at that conclusion know what just said is ridiculous however that outlandish claim still holds more water than what you re trying to argue because while poor people naturally turn to crime as means to survive serial killers and sociopaths are actually physically biologically different in the way that their brains are wired,0,1
4,13,honestly just shut the hell up you make massive fool out of yourself,1,1
5,15,what you said was so he wouldn be identified as criminal not he did it,0,1


## Neural Network/Word2Vec
From [here](../02_tfidf/tfidf.ipynb).

In [11]:
# I can't run it on my local computer due to memory limitations

## LinearSVC/Word2Vec
From [here](../02_tfidf/tfidf.ipynb).

In [12]:
# I can't run it on my local computer due to memory limitations

# SVMs/TFIDF

SVMs tested:
* `svm.NuSVC([nu, kernel, degree, gamma, …])`: Nu-Support Vector Classification.
* `svm.SVC([C, kernel, degree, gamma, coef0, …])`: C-Support Vector Classification.
* `svm.LinearSVC([penalty, loss, dual, tol, C, …])`: Linear Support Vector Classification.

Kernels tested:
* `linear`
* `poly`
* `sigmoid`
* `rbf`

From [here](../05_SVM/SVMs-kernels.ipynb).
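
The classifier and kernel combinations listed above can also be run in a single loop rather than one cell per combination. The sketch below is only an illustration: it uses synthetic data from `make_classification` in place of the TF-IDF features, and the `predictions` dictionary is a hypothetical container, not something from the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import NuSVC, SVC

# Synthetic binary classification data standing in for the real features
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)

kernels = ["linear", "poly", "sigmoid", "rbf"]
predictions = {}
for kernel in kernels:
    models = {
        "NuSVC": NuSVC(nu=0.05, kernel=kernel),  # same nu as in the cells below
        "SVC": SVC(kernel=kernel),
    }
    for name, model in models.items():
        # One entry per classifier/kernel pair, e.g. "NuSVC-rbf"
        predictions[f"{name}-{kernel}"] = model.fit(X_tr, y_tr).predict(X_te)

for key, pred in predictions.items():
    print(key, (pred == y_te).mean())
```

Each entry of `predictions` can then be assigned to a data frame column in the same way as the individual cells do.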

In [13]:
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC, NuSVC, OneClassSVM, SVC, SVR, l1_min_c
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import gensim
import sys
sys.path.insert(0, '/home/mcunha/Documents/Classes/KW/G0B34a_knowledge_and_the_web/')
import data.ad_hominem.tokenize_df
from sklearn.metrics import confusion_matrix
import itertools
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer

#### TfidfVectorizer used [here](../02_tfidf/tfidf.ipynb).

In [42]:
v = TfidfVectorizer(ngram_range = (1, 1), max_features=3000)
desired_indices = [i for i in range(len(df.index)) if i not in indexes]
df_notInSamples = df.iloc[desired_indices]

train_data, test_data = train_test_split(df_notInSamples, test_size=0.3, random_state=3)
v.fit(train_data['body'].values.astype('U'))

x_train_tfidf = v.transform(train_data['body'].values.astype('U'))
y_train = list(train_data["isAdHominem"])

x_test_tfidf = v.transform(df_samples['body'].values.astype('U'))
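y_test = list(df_samples["isAdHominem"])

x_train, x_test = x_train_tfidf, x_test_tfidf # The model cells below use these names; without this line they would not receive the TF-IDF features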
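
The fit-on-train, transform-on-new-text pattern used in the cell above can be seen on a toy corpus. This is a minimal sketch with made-up sentences (`train_texts` and `new_texts` are assumptions for illustration); it shows that the vocabulary is fixed at fit time and that words not seen during fitting contribute nothing to the transformed vector.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, not taken from the dataset
train_texts = ["you are a fool", "the argument is weak", "a weak argument from a fool"]
new_texts = ["you are weak", "completely unseen words"]

vec = TfidfVectorizer(ngram_range=(1, 1), max_features=3000)
vec.fit(train_texts)                 # vocabulary and IDF weights come from the training texts only

x_tr = vec.transform(train_texts)
x_new = vec.transform(new_texts)     # same columns as x_tr, whatever the new texts contain

print(sorted(vec.vocabulary_))       # single-character tokens such as "a" are dropped by the default tokenizer
print(x_tr.shape, x_new.shape)
print(x_new[1].nnz)                  # number of non-zero entries for the text with only unseen words
```

Because the vectorizer is fitted only on `train_texts`, both matrices share the same column layout, which is what allows a model trained on one to predict on the other.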

## NuSVC/TFIDF
The kernels to be used are:
* `linear`
* `poly`
* `sigmoid`
* `rbf`

The `rbf` kernel was already used in [SVMs.ipynb](./SVMs.ipynb); it is repeated here for comparison.

In [15]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='linear').fit(x_train, y_train)
print("Done!")

predicted = nuModel.predict(x_test)
df_samples["NuSVC-linear/TFIDF"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 3min 1s, sys: 221 ms, total: 3min 1s
Wall time: 3min 1s


In [16]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='poly').fit(x_train, y_train)
print("Done!")

predicted = nuModel.predict(x_test)

df_samples["NuSVC-poly/TFIDF"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 10.4 s, sys: 20 ms, total: 10.4 s
Wall time: 10.4 s


In [17]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='sigmoid').fit(x_train, y_train)
print("Done!")

predicted = nuModel.predict(x_test)

df_samples["NuSVC-sigmoid/TFIDF"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 12.5 s, sys: 100 ms, total: 12.6 s
Wall time: 12.6 s


In [18]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='rbf').fit(x_train, y_train)
print("Done!")

predicted = nuModel.predict(x_test)

df_samples["NuSVC-rbf/TFIDF"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 14.1 s, sys: 108 ms, total: 14.2 s
Wall time: 14.2 s


## SVC/TFIDF
Closely related to the other methods (`LinearSVC` and `NuSVC`), but with a different implementation.
* `LinearSVC` is similar to `SVC(kernel='linear')`, but is implemented with liblinear rather than libsvm.
* From the documentation: *`SVC` and `NuSVC` are similar methods, but accept slightly different sets of parameters and have different mathematical formulations (see section [Mathematical formulation](https://scikit-learn.org/stable/modules/svm.html#svm-mathematical-formulation)).*

The kernels to be used are:
* `linear`
* `poly`
* `sigmoid`
* `rbf`
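
To make the relation between `LinearSVC` and `SVC(kernel='linear')` concrete, the sketch below fits both on two well-separated synthetic clusters (generated with `make_blobs`, an assumption for illustration only). Both find a separating hyperplane, but the learned coefficients are not required to match, since the two classes rely on different solvers and default loss functions.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, LinearSVC

# Two clearly separated clusters, so any reasonable linear classifier separates them
X, y = make_blobs(n_samples=200, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=0)

svc = SVC(kernel="linear").fit(X, y)   # libsvm-based
lin = LinearSVC().fit(X, y)            # liblinear-based

print("SVC(kernel='linear'):", svc.coef_, svc.intercept_)
print("LinearSVC:           ", lin.coef_, lin.intercept_)
print("training accuracy:", svc.score(X, y), lin.score(X, y))
```

On real, noisy data the two can disagree on individual examples, so it is reasonable to keep both in the comparison.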

In [19]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='poly').fit(x_train, y_train)
print("Done!")

predicted = svcModel.predict(x_test)

df_samples["SVC-poly/TFIDF"] = predicted

Fitting SVC model...
Done!
CPU times: user 40.8 s, sys: 295 ms, total: 41.1 s
Wall time: 41.5 s


In [20]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='sigmoid').fit(x_train, y_train)
print("Done!")

predicted = svcModel.predict(x_test)

df_samples["SVC-sigmoid/TFIDF"] = predicted

Fitting SVC model...
Done!
CPU times: user 40.5 s, sys: 216 ms, total: 40.8 s
Wall time: 40.9 s


In [21]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='rbf').fit(x_train, y_train)
print("Done!")

predicted = svcModel.predict(x_test)

df_samples["SVC-rbf/TFIDF"] = predicted

Fitting SVC model...
Done!
CPU times: user 42.7 s, sys: 256 ms, total: 42.9 s
Wall time: 43.2 s


In [22]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='linear').fit(x_train, y_train)
print("Done!")

predicted = svcModel.predict(x_test)

df_samples["SVC-linear/TFIDF"] = predicted

Fitting SVC model...
Done!
CPU times: user 59.2 s, sys: 92.3 ms, total: 59.3 s
Wall time: 59.4 s


## LinearSVC/TFIDF
As seen before, it is similar to `SVC(kernel='linear')`, with implementation differences.

In [23]:
%%time

print("Fitting linear model...")
linearModel = LinearSVC().fit(x_train, y_train)
print("Done!")

predicted = linearModel.predict(x_test)

df_samples["linearSVC/TFIDF"] = predicted

Fitting linear model...
Done!
CPU times: user 388 ms, sys: 3.92 ms, total: 392 ms
Wall time: 401 ms


## Analyze the results
See the classifications for the examples below

In [24]:
pd.set_option('display.max_colwidth', 30)
df_samples

Unnamed: 0,length,body,isAdHominem,NN/TFIDF,NuSVC-linear/TFIDF,NuSVC-poly/TFIDF,NuSVC-sigmoid/TFIDF,NuSVC-rbf/TFIDF,SVC-poly/TFIDF,SVC-sigmoid/TFIDF,SVC-rbf/TFIDF,SVC-linear/TFIDF,linearSVC/TFIDF
0,301,yeah people like non white...,1,1,0,1,1,0,0,0,0,0,0
1,341,in more recent years the c...,0,1,0,0,1,0,0,0,0,0,0
2,132,oh look someone who can re...,1,1,0,1,1,0,0,0,0,0,0
3,120,okay since you re discussi...,0,1,0,1,1,0,0,0,0,0,0
4,13,honestly just shut the hel...,1,1,0,1,1,0,0,0,0,0,0
5,15,what you said was so he wo...,0,1,0,0,1,0,0,0,0,0,0
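
A quick way to read the table above is to count, per model, how many of the six examples match the `isAdHominem` label. The sketch below copies a subset of the columns from that table by hand (the remaining `SVC` and `NuSVC-rbf` columns are all zeros and behave like `NuSVC-linear/TFIDF`); in the notebook itself the same comparison could be applied to `df_samples` directly.

```python
import pandas as pd

# Labels and predictions copied from the table above (subset of columns)
truth = pd.Series([1, 0, 1, 0, 1, 0], name="isAdHominem")
preds = pd.DataFrame({
    "NN/TFIDF":            [1, 1, 1, 1, 1, 1],
    "NuSVC-linear/TFIDF":  [0, 0, 0, 0, 0, 0],
    "NuSVC-poly/TFIDF":    [1, 0, 1, 1, 1, 0],
    "NuSVC-sigmoid/TFIDF": [1, 1, 1, 1, 1, 1],
    "linearSVC/TFIDF":     [0, 0, 0, 0, 0, 0],
})

# Row-wise comparison of every prediction column against the labels
n_correct = preds.eq(truth, axis=0).sum()
print(n_correct.sort_values(ascending=False))
```

A model that always predicts the same class gets three of six right here, because the sample is balanced by construction. Six examples are far too few to rank the models.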


## Mixed Neural Network/Word2Vec + POS tags + Doc2Vec

In [25]:
# Can't run locally due to memory limitations.

## Using Doc2Vec...

In [27]:
from gensim.models.doc2vec import Doc2Vec
doc2vec_model = Doc2Vec.load("reddit-doc2vec.model")

In [48]:
%%time

train_data = train_data.reset_index(drop=True)

x_train_vec = list(train_data["body"].apply(lambda x:simple_preprocess(str(x), deacc=True))) # Tokenize bodies
x_train_vec = [doc2vec_model.infer_vector(i) for i in x_train_vec]                           # Infer vectors
y_train = list(train_data["isAdHominem"])                                                    # Same as before, here for reference

x_test_vec = list(df_samples["body"].apply(lambda x:simple_preprocess(str(x), deacc=True))) # Tokenize bodies
x_test_vec = [doc2vec_model.infer_vector(i) for i in x_test_vec]                            # Infer vectors
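y_test = list(df_samples["isAdHominem"])                                                    # Same as before, here for reference

x_train, x_test = x_train_vec, x_test_vec                                                   # The model cells below use these names; without this line they would not receive the Doc2Vec vectors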

## NuSVC/Doc2Vec
The kernels to be used are:
* `linear`
* `poly`
* `sigmoid`
* `rbf`

The `rbf` kernel was already used in [SVMs.ipynb](./SVMs.ipynb); it is repeated here for comparison.

In [49]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='linear').fit(x_train, y_train)
print("Done!")

predicted = nuModel.predict(x_test)
df_samples["NuSVC-linear/Doc2Vec"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 3min 5s, sys: 264 ms, total: 3min 5s
Wall time: 3min 6s


In [50]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='poly').fit(x_train, y_train)
print("Done!")

predicted = nuModel.predict(x_test)

df_samples["NuSVC-poly/Doc2Vec"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 10.5 s, sys: 7.9 ms, total: 10.5 s
Wall time: 10.5 s


In [51]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='sigmoid').fit(x_train, y_train)
print("Done!")

predicted = nuModel.predict(x_test)

df_samples["NuSVC-sigmoid/Doc2Vec"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 12.4 s, sys: 39 µs, total: 12.4 s
Wall time: 12.4 s


In [52]:
%%time

print("Fitting NuSVC model...")
nuModel = NuSVC(nu=0.05, kernel='rbf').fit(x_train, y_train)
print("Done!")

predicted = nuModel.predict(x_test)

df_samples["NuSVC-rbf/Doc2Vec"] = predicted

Fitting NuSVC model...
Done!
CPU times: user 14 s, sys: 3.44 ms, total: 14 s
Wall time: 14.1 s


## SVC/Doc2Vec
Closely related to the other methods (`LinearSVC` and `NuSVC`), but with a different implementation.
* `LinearSVC` is similar to `SVC(kernel='linear')`, but is implemented with liblinear rather than libsvm.
* From the documentation: *`SVC` and `NuSVC` are similar methods, but accept slightly different sets of parameters and have different mathematical formulations (see section [Mathematical formulation](https://scikit-learn.org/stable/modules/svm.html#svm-mathematical-formulation)).*

The kernels to be used are:
* `linear`
* `poly`
* `sigmoid`
* `rbf`

In [53]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='poly').fit(x_train, y_train)
print("Done!")

predicted = svcModel.predict(x_test)

df_samples["SVC-poly/Doc2Vec"] = predicted

Fitting SVC model...
Done!
CPU times: user 38.9 s, sys: 76 ms, total: 39 s
Wall time: 39.1 s


In [54]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='sigmoid').fit(x_train, y_train)
print("Done!")

predicted = svcModel.predict(x_test)

df_samples["SVC-sigmoid/Doc2Vec"] = predicted

Fitting SVC model...
Done!
CPU times: user 39.9 s, sys: 100 ms, total: 40 s
Wall time: 40 s


In [55]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='rbf').fit(x_train, y_train)
print("Done!")

predicted = svcModel.predict(x_test)

df_samples["SVC-rbf/Doc2Vec"] = predicted

Fitting SVC model...
Done!
CPU times: user 38.6 s, sys: 116 ms, total: 38.7 s
Wall time: 38.7 s


In [56]:
%%time

print("Fitting SVC model...")
svcModel = SVC(kernel='linear').fit(x_train, y_train)
print("Done!")

predicted = svcModel.predict(x_test)

df_samples["SVC-linear/Doc2Vec"] = predicted

Fitting SVC model...
Done!
CPU times: user 55.5 s, sys: 96.2 ms, total: 55.6 s
Wall time: 55.6 s


## LinearSVC/Doc2Vec
As seen before, it is similar to `SVC(kernel='linear')`, with implementation differences.

In [57]:
%%time

print("Fitting linear model...")
linearModel = LinearSVC().fit(x_train, y_train)
print("Done!")

predicted = linearModel.predict(x_test)

df_samples["linearSVC/Doc2Vec"] = predicted

Fitting linear model...
Done!
CPU times: user 272 ms, sys: 0 ns, total: 272 ms
Wall time: 273 ms


## Once again, let's look at the results
See the classifications for the examples below

In [58]:
pd.set_option('display.max_colwidth', 30)
df_samples

Unnamed: 0,length,body,isAdHominem,NN/TFIDF,NuSVC-linear/TFIDF,NuSVC-poly/TFIDF,NuSVC-sigmoid/TFIDF,NuSVC-rbf/TFIDF,SVC-poly/TFIDF,SVC-sigmoid/TFIDF,...,linearSVC/TFIDF,NuSVC-linear/Doc2Vec,NuSVC-poly/Doc2Vec,NuSVC-sigmoid/Doc2Vec,NuSVC-rbf/Doc2Vec,SVC-poly/Doc2Vec,SVC-sigmoid/Doc2Vec,SVC-rbf/Doc2Vec,SVC-linear/Doc2Vec,linearSVC/Doc2Vec
0,301,yeah people like non white...,1,1,0,1,1,0,0,0,...,0,0,1,1,0,0,0,0,0,0
1,341,in more recent years the c...,0,1,0,0,1,0,0,0,...,0,0,0,1,0,0,0,0,0,0
2,132,oh look someone who can re...,1,1,0,1,1,0,0,0,...,0,0,1,1,0,0,0,0,0,0
3,120,okay since you re discussi...,0,1,0,1,1,0,0,0,...,0,0,1,1,0,0,0,0,0,0
4,13,honestly just shut the hel...,1,1,0,1,1,0,0,0,...,0,0,1,1,0,0,0,0,0,0
5,15,what you said was so he wo...,0,1,0,0,1,0,0,0,...,0,0,0,1,0,0,0,0,0,0
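
In the output above, the Doc2Vec columns that are visible carry the same values as their TFIDF counterparts. The sketch below checks this for a hand-copied subset of columns; the two small data frames are transcriptions of the table, not new results.

```python
import pandas as pd

# Values copied from the table above; column names shortened to the model name
tfidf = pd.DataFrame({
    "NuSVC-linear":  [0, 0, 0, 0, 0, 0],
    "NuSVC-poly":    [1, 0, 1, 1, 1, 0],
    "NuSVC-sigmoid": [1, 1, 1, 1, 1, 1],
    "NuSVC-rbf":     [0, 0, 0, 0, 0, 0],
    "linearSVC":     [0, 0, 0, 0, 0, 0],
})
doc2vec = pd.DataFrame({
    "NuSVC-linear":  [0, 0, 0, 0, 0, 0],
    "NuSVC-poly":    [1, 0, 1, 1, 1, 0],
    "NuSVC-sigmoid": [1, 1, 1, 1, 1, 1],
    "NuSVC-rbf":     [0, 0, 0, 0, 0, 0],
    "linearSVC":     [0, 0, 0, 0, 0, 0],
})

# True for every model whose predictions are the same under both feature sets
identical = {col: tfidf[col].equals(doc2vec[col]) for col in tfidf.columns}
print(identical)
```

Identical columns would be expected if both sets of models received the same feature matrices, so it may be worth confirming that the Doc2Vec vectors are the ones actually passed to `fit` and `predict` before drawing conclusions from this table.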
