Sarcasm has long been part of everyday language. It means saying the opposite of what you actually mean, usually in a distinctive tone of voice and often for humorous effect. Understanding sarcasm depends on both your language skills and your ability to infer what other people are thinking.
But what about a computer? Is it possible to train a machine learning model to detect whether a sentence is sarcastic? Yes, it is! In this project, I detect sarcasm in news headlines with machine learning, using Python.

In [None]:
# Importing required libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB    # Naive Bayes variant for binary (present/absent) features


In [None]:
# Dataset used to train the sarcasm detector.
# It contains news article links and their headlines, each labeled as sarcastic or not.

# Step 1: Load the dataset into a DataFrame
url = "https://storage.googleapis.com/laurencemoroney-blog.appspot.com/sarcasm.json"
df = pd.read_json(url)
df.head()

   article_link                                       headline                                           is_sarcastic
0  https://www.huffingtonpost.com/entry/versace-b...  former versace store clerk sues over secret 'b...  0
1  https://www.huffingtonpost.com/entry/roseanne-...  the 'roseanne' revival catches up to our thorn...  0
2  https://local.theonion.com/mom-starting-to-fea...  mom starting to fear son's web series closest ...  1
3  https://politics.theonion.com/boehner-just-wan...  boehner just wants wife to listen, not come up...  1
4  https://www.huffingtonpost.com/entry/jk-rowlin...  j.k. rowling wishes snape happy birthday in th...  0
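`pd.read_json` works here because the file is, as far as the output above suggests, a JSON array of records: one object per headline with the keys `article_link`, `headline` and `is_sarcastic`. The sketch below assumes that layout and parses a two-record sample offline. The records, the `example.com` links and the names `sample_json` and `sample_df` are all made up for illustration. This can be handy for testing later cells without downloading the file.

```python
from io import StringIO

import pandas as pd

# Two made-up records in the assumed layout of sarcasm.json (a JSON array of objects).
sample_json = """[
  {"article_link": "https://example.com/a", "headline": "scientists confirm water is wet", "is_sarcastic": 1},
  {"article_link": "https://example.com/b", "headline": "city council approves new budget", "is_sarcastic": 0}
]"""

# Wrapping the string in StringIO avoids the literal-string deprecation in recent pandas.
sample_df = pd.read_json(StringIO(sample_json))
print(sample_df.columns.tolist())  # ['article_link', 'headline', 'is_sarcastic']
```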


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26709 entries, 0 to 26708
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   article_link  26709 non-null  object
 1   headline      26709 non-null  object
 2   is_sarcastic  26709 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 626.1+ KB


The "is_sarcastic" column indicates whether the headline is sarcastic (1) or not (0).
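The model below is judged by accuracy, and accuracy only means something relative to a trivial baseline that always predicts the more common label. It therefore helps to know how the two labels are distributed. The sketch below computes this with `value_counts(normalize=True)` on a made-up label series. In the notebook, the same call would be `df['is_sarcastic'].value_counts(normalize=True)`, run before the labels are mapped to strings.

```python
import pandas as pd

# Made-up labels standing in for df['is_sarcastic'] (1 = sarcastic, 0 = not).
labels = pd.Series([0, 1, 0, 0, 1, 0, 1, 0])

# Fraction of rows in each class; the largest share is the accuracy of an
# "always predict the majority class" baseline.
shares = labels.value_counts(normalize=True)
baseline_accuracy = shares.max()
print(shares.to_dict())      # {0: 0.625, 1: 0.375}
print(baseline_accuracy)     # 0.625
```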

In [None]:
# Map the numeric labels to readable names for convenience
df['is_sarcastic'] = df['is_sarcastic'].map({0: "Not Sarcasm", 1: "Sarcasm"})
df.head()

   article_link                                       headline                                           is_sarcastic
0  https://www.huffingtonpost.com/entry/versace-b...  former versace store clerk sues over secret 'b...  Not Sarcasm
1  https://www.huffingtonpost.com/entry/roseanne-...  the 'roseanne' revival catches up to our thorn...  Not Sarcasm
2  https://local.theonion.com/mom-starting-to-fea...  mom starting to fear son's web series closest ...  Sarcasm
3  https://politics.theonion.com/boehner-just-wan...  boehner just wants wife to listen, not come up...  Sarcasm
4  https://www.huffingtonpost.com/entry/jk-rowlin...  j.k. rowling wishes snape happy birthday in th...  Not Sarcasm
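`Series.map` with a dict replaces each value by its dictionary entry. Any value missing from the dict silently becomes NaN instead of raising an error. A quick check like the one below can catch such gaps before the labels are used for training. It runs on a made-up series containing an unexpected label, 2.

```python
import pandas as pd

label_names = {0: "Not Sarcasm", 1: "Sarcasm"}

# Made-up labels; 2 is deliberately not in label_names.
raw = pd.Series([0, 1, 2])
mapped = raw.map(label_names)

print(mapped.tolist())          # ['Not Sarcasm', 'Sarcasm', nan]
unmapped = raw[mapped.isna()]   # original values that had no entry in the dict
print(unmapped.tolist())        # [2]
```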


In [None]:
# Drop the article_link column: every link is unique, and the model should learn from the
# headline text alone (the link's domain, e.g. theonion.com, could otherwise leak the label)
df = df.drop(columns='article_link')
df.head()

   headline                                           is_sarcastic
0  former versace store clerk sues over secret 'b...  Not Sarcasm
1  the 'roseanne' revival catches up to our thorn...  Not Sarcasm
2  mom starting to fear son's web series closest ...  Sarcasm
3  boehner just wants wife to listen, not come up...  Sarcasm
4  j.k. rowling wishes snape happy birthday in th...  Not Sarcasm


In [None]:
# Now prepare the inputs for the training and test sets.
# The "headline" column is the feature and the "is_sarcastic" column is the label.
x = np.array(df['headline'])
y = np.array(df['is_sarcastic'])

In [None]:
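# Build a bag-of-words count vector for every headline.
vec = CountVectorizer()
X = vec.fit_transform(x)   # sparse matrix; used below to create the train/test split

# Inspect the counts as a DataFrame with one column per vocabulary term.
# from_spmatrix keeps the data sparse; X.toarray() would allocate a dense
# 26709 x vocabulary-size array. get_feature_names_out() needs scikit-learn >= 1.0.
df_headline = pd.DataFrame.sparse.from_spmatrix(X, columns=vec.get_feature_names_out())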
df_headline



        00  000  00000000001  00003  000th  025   03  047  071   10  ...  zoologist  zoologists  zoomed  zoroastrianism  zsa  zucker  zuckerberg   zz  éclairs  ünited
0        0    0            0      0      0    0    0    0    0    0  ...          0           0       0               0    0       0           0    0        0       0
1        0    0            0      0      0    0    0    0    0    0  ...          0           0       0               0    0       0           0    0        0       0
2        0    0            0      0      0    0    0    0    0    0  ...          0           0       0               0    0       0           0    0        0       0
3        0    0            0      0      0    0    0    0    0    0  ...          0           0       0               0    0       0           0    0        0       0
4        0    0            0      0      0    0    0    0    0    0  ...          0           0       0               0    0       0           0    0        0       0
...    ...  ...          ...    ...    ...  ...  ...  ...  ...  ...  ...        ...         ...     ...             ...  ...     ...         ...  ...      ...     ...
26704    0    0            0      0      0    0    0    0    0    0  ...          0           0       0               0    0       0           0    0        0       0
26705    0    0            0      0      0    0    0    0    0    0  ...          0           0       0               0    0       0           0    0        0       0
26706    0    0            0      0      0    0    0    0    0    0  ...          0           0       0               0    0       0           0    0        0       0
26707    0    0            0      0      0    0    0    0    0    0  ...          0           0       0               0    0       0           0    0        0       0


In [None]:
# Splitting the data into 80% Training set and 20% Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
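With a fractional `test_size`, `train_test_split` rounds the test-set size up to ceil(0.20 × n_samples) and puts the remaining rows in the training set. For the 26,709 headlines reported by `df.info()`, that gives 5,342 test rows and 21,367 training rows. The sketch below checks this arithmetic on a placeholder array of the same length. `placeholder`, `train_part` and `test_part` are illustrative names, not part of the notebook.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 26709                    # row count reported by df.info()
placeholder = np.arange(n_samples)   # stands in for the feature matrix

train_part, test_part = train_test_split(placeholder, test_size=0.20, random_state=42)

print(len(test_part), math.ceil(0.20 * n_samples))   # 5342 5342
print(len(train_part))                               # 21367
```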

In [None]:
# Build the model: Bernoulli Naive Bayes
model = BernoulliNB()
# Fit the model on the training set
model.fit(X_train, y_train)

# score() returns the mean accuracy on the test set as a fraction in [0, 1]
print("The accuracy of the model is: " + str(int(model.score(X_test, y_test) * 100)) + "%")

The accuracy of the model is: 84%
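BernoulliNB treats each vocabulary term as a yes/no event. It estimates the following for each class c:

* N_ci is the number of class-c training headlines that contain term i.
* N_c is the number of class-c headlines.
* p_ci = (N_ci + α) / (N_c + 2α) is the smoothed probability that term i appears in a class-c headline. α = 1 by default (Laplace smoothing).

A headline's score for class c is log P(c), plus log p_ci for every term that is present, plus log(1 − p_ci) for every term that is absent. The class with the higher score wins. The NumPy sketch below implements this by hand on a made-up 0/1 matrix and checks it against scikit-learn's `predict_log_proba`. The helper name `bernoulli_nb_log_proba` is hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import BernoulliNB


def bernoulli_nb_log_proba(X_train, y_train, X_new, alpha=1.0):
    """Hypothetical helper: Bernoulli Naive Bayes log-posteriors computed by hand."""
    # Binarize at 0, like BernoulliNB's default binarize=0.0.
    X_train = (np.asarray(X_train) > 0).astype(float)
    X_new = (np.asarray(X_new) > 0).astype(float)
    classes = np.unique(y_train)  # sorted, same order as BernoulliNB.classes_

    scores = []
    for c in classes:
        X_c = X_train[y_train == c]
        # Smoothed probability that each term is present in a class-c headline.
        p = (X_c.sum(axis=0) + alpha) / (X_c.shape[0] + 2 * alpha)
        log_prior = np.log(X_c.shape[0] / X_train.shape[0])
        # Present terms contribute log p, absent terms contribute log(1 - p).
        scores.append(X_new @ np.log(p) + (1 - X_new) @ np.log(1 - p) + log_prior)

    joint = np.column_stack(scores)
    # Normalize in log space so each row's probabilities sum to 1.
    return classes, joint - logsumexp(joint, axis=1, keepdims=True)


# Made-up 0/1 term matrix (rows = headlines, columns = terms) and labels.
X_toy = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])
y_toy = np.array(["Not Sarcasm", "Sarcasm", "Sarcasm", "Not Sarcasm"])

classes, log_proba = bernoulli_nb_log_proba(X_toy, y_toy, X_toy)
reference = BernoulliNB().fit(X_toy, y_toy)

print(np.allclose(log_proba, reference.predict_log_proba(X_toy)))  # True
print(classes[log_proba.argmax(axis=1)].tolist() == reference.predict(X_toy).tolist())  # True
```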


In [None]:
user = input("Enter a text to check for sarcasm: ")
# Map the input onto the vocabulary learned above, then predict its label
data = vec.transform([user]).toarray()
output = model.predict(data)
print(output)

Enter a text to check for sarcasm:  I'm glad we're having a rehearsal dinner. I rarely practice my meals before I eat.


['Sarcasm']
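The prediction cell reuses the fitted `vec`, so new text is mapped onto the training vocabulary. Words never seen during training are simply ignored. One way to keep the vectorizer and the classifier together is scikit-learn's `make_pipeline`, which lets `predict()` take raw strings directly. The sketch below is an illustrative alternative to the interactive cell, not the notebook's own code. It trains on four made-up headlines instead of the real dataset, so its prediction only demonstrates the mechanics. The names `toy_headlines`, `toy_labels` and `sarcasm_pipeline` are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Made-up training headlines and labels (the notebook uses the full dataset).
toy_headlines = [
    "area man thrilled to attend fourth meeting of the day",
    "nation celebrates another totally normal week",
    "senate passes infrastructure bill",
    "company reports quarterly earnings",
]
toy_labels = ["Sarcasm", "Sarcasm", "Not Sarcasm", "Not Sarcasm"]

# The pipeline fits CountVectorizer, then BernoulliNB, and applies both on predict().
sarcasm_pipeline = make_pipeline(CountVectorizer(), BernoulliNB())
sarcasm_pipeline.fit(toy_headlines, toy_labels)

print(sarcasm_pipeline.predict(["area man thrilled about another meeting"]))  # ['Sarcasm']

# Words outside the training vocabulary produce an all-zero row.
unseen = sarcasm_pipeline.named_steps["countvectorizer"].transform(["zebra xylophone"])
print(unseen.nnz)  # 0
```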
