# Sarcasm Detection 

The aim of this study is to detect sarcastic expressions in sentences using machine learning with Python.

[Sarcasm Detection with Machine Learning | AMAN KHARWAL](https://amanxai.com/2021/08/24/sarcasm-detection-with-machine-learning/)

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQxWr2qbP_SnB1U-rWHScU-XvnMLEjQpePMJoY6IS9R6F620yBqyGPnQ7moKNYjDJlbTcQ&usqp=CAU" >

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

In [2]:
df = pd.read_json("Sarcasm.json", lines=True)  

# lines=True tells pandas the file is in JSON Lines format.
# In regular JSON, all records sit inside a single list.
# In JSON Lines, each line is a separate JSON object.
# Use lines=True when the file is JSON Lines; each line is then read as one record.
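
To make the difference between the two layouts concrete, here is a minimal sketch that reads the same two records once as JSON Lines and once as a regular JSON array. The records are invented for illustration and are not taken from `Sarcasm.json`.

```python
import io

import pandas as pd

# Two hypothetical records, invented for illustration only.
json_lines_text = (
    '{"headline": "first example headline", "is_sarcastic": 0}\n'
    '{"headline": "second example headline", "is_sarcastic": 1}\n'
)
json_array_text = (
    '[{"headline": "first example headline", "is_sarcastic": 0},'
    ' {"headline": "second example headline", "is_sarcastic": 1}]'
)

# JSON Lines: one object per line, so lines=True is needed.
df_lines = pd.read_json(io.StringIO(json_lines_text), lines=True)

# Regular JSON: a single array of objects, read without lines=True.
df_array = pd.read_json(io.StringIO(json_array_text))

print(df_lines.shape, df_array.shape)
```

Both calls should produce a two-row frame with the same columns; only the on-disk layout differs.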

In [3]:
df.head()

| | article_link | headline | is_sarcastic |
|---|---|---|---|
| 0 | https://www.huffingtonpost.com/entry/versace-b... | former versace store clerk sues over secret 'b... | 0 |
| 1 | https://www.huffingtonpost.com/entry/roseanne-... | the 'roseanne' revival catches up to our thorn... | 0 |
| 2 | https://local.theonion.com/mom-starting-to-fea... | mom starting to fear son's web series closest ... | 1 |
| 3 | https://politics.theonion.com/boehner-just-wan... | boehner just wants wife to listen, not come up... | 1 |
| 4 | https://www.huffingtonpost.com/entry/jk-rowlin... | j.k. rowling wishes snape happy birthday in th... | 0 |


In [4]:
df["is_sarcastic"] = df["is_sarcastic"].map({0: "Not Sarcasm", 1: "Sarcasm"})
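
One detail worth knowing about `Series.map` with a dictionary: any value that is not a key in the dictionary becomes `NaN`. The sketch below uses a small invented Series (the value `2` is hypothetical and is not claimed to occur in the dataset) to show that behaviour.

```python
import pandas as pd

# Hypothetical label column; the value 2 is included only to show
# what happens to a value that is missing from the mapping.
raw_labels = pd.Series([0, 1, 2])

mapped = raw_labels.map({0: "Not Sarcasm", 1: "Sarcasm"})

# Values without a dictionary entry become NaN.
n_unmapped = int(mapped.isna().sum())
print(mapped.tolist(), n_unmapped)
```

Checking `isna().sum()` after a mapping like this is a cheap way to confirm that every label was covered.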

In [5]:
df.head()

| | article_link | headline | is_sarcastic |
|---|---|---|---|
| 0 | https://www.huffingtonpost.com/entry/versace-b... | former versace store clerk sues over secret 'b... | Not Sarcasm |
| 1 | https://www.huffingtonpost.com/entry/roseanne-... | the 'roseanne' revival catches up to our thorn... | Not Sarcasm |
| 2 | https://local.theonion.com/mom-starting-to-fea... | mom starting to fear son's web series closest ... | Sarcasm |
| 3 | https://politics.theonion.com/boehner-just-wan... | boehner just wants wife to listen, not come up... | Sarcasm |
| 4 | https://www.huffingtonpost.com/entry/jk-rowlin... | j.k. rowling wishes snape happy birthday in th... | Not Sarcasm |


In [7]:
df.loc[0, "headline"]

"former versace store clerk sues over secret 'black code' for minority shoppers"

In [8]:
df.loc[2, "headline"]

"mom starting to fear son's web series closest thing she will have to grandchild"

In [9]:
x = df["headline"]
y = df["is_sarcastic"]

In [10]:
cv = CountVectorizer() 

# CountVectorizer: converts texts into numeric vectors based on word counts (Bag-of-Words).

In [11]:
x = cv.fit_transform(x)  # Learn the vocabulary and convert the texts to numeric vectors

In [15]:
x.shape

(26709, 25292)

In [16]:
x[:5].toarray()  # first few rows (slice first to avoid densifying the whole matrix)

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)
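
The rows above are almost entirely zeros because `fit_transform` returns a SciPy sparse matrix in which only the non-zero counts are stored. The sketch below, using three invented headlines, shows one way to inspect such a matrix without converting all of it to a dense array.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented headlines, used only to build a tiny sparse matrix.
toy_headlines = [
    "mom fears web series",
    "cows lose jobs",
    "milk prices drop",
]
toy_x = CountVectorizer().fit_transform(toy_headlines)

n_rows, n_cols = toy_x.shape
# nnz is the number of explicitly stored (non-zero) entries.
density = toy_x.nnz / (n_rows * n_cols)
print(toy_x.shape, toy_x.nnz, round(density, 3))

# Densify only the slice that is needed for display.
first_row_dense = toy_x[:1].toarray()
print(first_row_dense)
```

On a matrix the size of the one in this notebook, slicing before calling `toarray()` keeps memory use small.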

In [12]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [13]:
model = BernoulliNB()

In [17]:
model.fit(x_train, y_train)

BernoulliNB(alpha=1.0, force_alpha=True, binarize=0.0, fit_prior=True, class_prior=None)
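
The `binarize=0.0` parameter means that `BernoulliNB` only looks at whether a word is present, not how often it occurs: every count above 0 is treated as 1. The sketch below checks this on a small invented count matrix by comparing a model fitted on raw counts with one fitted on presence/absence values.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Invented count matrix (4 documents, 3 words) and labels.
counts = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 0], [0, 1, 2]])
labels = np.array(["Sarcasm", "Not Sarcasm", "Sarcasm", "Not Sarcasm"])

# Presence/absence version of the same matrix.
presence = (counts > 0).astype(int)

nb_counts = BernoulliNB().fit(counts, labels)
nb_presence = BernoulliNB().fit(presence, labels)

# With binarize=0.0 both models learn the same feature probabilities.
same = np.allclose(nb_counts.feature_log_prob_, nb_presence.feature_log_prob_)
print(same)
```

If word frequency is expected to matter, `MultinomialNB` is the usual count-based alternative; whether it would do better on this dataset is not tested here.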


In [18]:
model.score(x_test, y_test)

0.8448146761512542
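
For a classifier, `score` returns the mean accuracy on the given data. Accuracy alone does not show which class the errors fall on, so a confusion matrix can be a useful complement. The sketch below uses invented true and predicted labels, not the notebook's actual predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Invented labels for illustration; not results from the model above.
y_true = ["Sarcasm", "Sarcasm", "Not Sarcasm", "Not Sarcasm", "Sarcasm", "Not Sarcasm"]
y_pred = ["Sarcasm", "Not Sarcasm", "Not Sarcasm", "Not Sarcasm", "Sarcasm", "Sarcasm"]

class_order = ["Not Sarcasm", "Sarcasm"]
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=class_order)
print(round(acc, 3))
print(cm)
```

In the notebook, the same calls could be applied to `y_test` and `model.predict(x_test)`.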

In [19]:
user = input("Enter a Text: ")

Enter a Text:   Cows lose their jobs as milk prices drop


In [22]:
user_vec = cv.transform([user]).toarray()  # separate name so the DataFrame `df` is not overwritten

In [23]:
output = model.predict(user_vec)

In [25]:
print(output)

['Sarcasm']
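
`predict` returns only the most likely label. When it helps to see how confident the model is, `predict_proba` gives one probability per class, in the order of `classes_`. The sketch below trains on four invented headlines, so the numbers it prints say nothing about the real model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Invented training data, for illustration only.
toy_texts = [
    "area man wins argument with mirror",
    "senate passes budget bill",
    "local dog elected mayor again",
    "city opens new library branch",
]
toy_labels = ["Sarcasm", "Not Sarcasm", "Sarcasm", "Not Sarcasm"]

toy_cv = CountVectorizer()
toy_model = BernoulliNB().fit(toy_cv.fit_transform(toy_texts), toy_labels)

# Vectorize new text with the already fitted vectorizer.
new_vec = toy_cv.transform(["local man wins budget"])
proba = toy_model.predict_proba(new_vec)

for label, p in zip(toy_model.classes_, proba[0]):
    print(f"{label}: {p:.3f}")
```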


# Conclusion 

Using the sarcasm dataset, we applied a Bag-of-Words representation with a Bernoulli Naive Bayes classifier and reached an accuracy of about 84.5% on the held-out 20% test split. The fitted vectorizer and classifier are then saved together as a pipeline, which can be used to predict whether new sentences are sarcastic.

In [26]:
import joblib
from sklearn.pipeline import Pipeline

pipe = Pipeline([("vectorizer", cv), ("clf", model)])
joblib.dump(pipe, "sarcasm_nb_pipeline.joblib")

['sarcasm_nb_pipeline.joblib']
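
Because the pipeline bundles the vectorizer with the classifier, a loaded pipeline can take raw strings directly. The sketch below shows the save/load round trip on a toy pipeline trained on invented headlines and written to a temporary directory; the file name `toy_pipeline.joblib` is hypothetical, and for the real model the path would be `sarcasm_nb_pipeline.joblib`.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Invented training data, for illustration only.
toy_texts = [
    "area man wins argument with mirror",
    "senate passes budget bill",
    "local dog elected mayor again",
    "city opens new library branch",
]
toy_labels = ["Sarcasm", "Not Sarcasm", "Sarcasm", "Not Sarcasm"]

toy_pipe = Pipeline([("vectorizer", CountVectorizer()), ("clf", BernoulliNB())])
toy_pipe.fit(toy_texts, toy_labels)

# Save to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_pipeline.joblib")
    joblib.dump(toy_pipe, path)
    loaded = joblib.load(path)

# The loaded pipeline vectorizes and classifies raw text in one call.
prediction = loaded.predict(["local man elected mayor"])
print(prediction)
```

Files written by `joblib` are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that saved them.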