**Sentiment Analysis**


*   Sentiment Polarity: positive / negative / neutral

Labels derived from the VADER compound score:
*   >= 0.05: positive
*   <= -0.05: negative
*   -0.05 < x < 0.05: neutral
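The thresholds above can be written as a small labeling function. This is a minimal sketch. The name `label_compound` is hypothetical and not part of NLTK. Its input is the compound score, which in this notebook would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
def label_compound(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a three-class polarity label.

    Hypothetical helper illustrating the 0.05 / -0.05 convention.
    """
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"  # -0.05 < compound < 0.05


for score in (0.6, 0.05, 0.0, -0.049, -0.3):
    print(score, label_compound(score))
```

Note that both boundaries are inclusive on the polar side. A score of exactly 0.05 counts as positive, and a score of exactly -0.05 counts as negative.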



In [1]:
import pandas as pd
import nltk
import re

from nltk.sentiment.vader import SentimentIntensityAnalyzer # VADER sentiment analysis tool
from nltk.corpus import stopwords # English stopword list
from nltk.tokenize import word_tokenize # tokenizer that splits text into words
from nltk.stem import WordNetLemmatizer # lemmatizer that reduces words to their base form

nltk.download("vader_lexicon") # VADER lexicon (word-level positive/negative scores)
nltk.download("stopwords") # stopword lists
nltk.download("punkt_tab") # model files required by the tokenizer
nltk.download("wordnet") # WordNet dictionary used by the lemmatizer
nltk.download("omw-1.4") # Open Multilingual WordNet data (supports lemmatization)


[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

In [3]:
df = pd.read_csv("duygu_analizi_amazon_veri_seti.csv") # load the dataset
df.info()
df.head()
df.tail() # last rows; only the cell's final expression is displayed, so the output shows df.tail()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   reviewText  20000 non-null  object
 1   Positive    20000 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 312.6+ KB


|  | reviewText | Positive |
|---|---|---|
| 19995 | this app is fricken stupid.it froze on the kin... | 0 |
| 19996 | Please add me!!!!! I need neighbors! Ginger101... | 1 |
| 19997 | love it! this game. is awesome. wish it had m... | 1 |
| 19998 | I love love love this app on my side of fashio... | 1 |
| 19999 | This game is a rip off. Here is a list of thin... | 0 |


In [4]:
# text preprocessing and cleaning
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english')) # build the stopword set once instead of once per token

def clean_preprocess_data(text):
  tokens = word_tokenize(text.lower()) # lowercase and tokenize
  filtered_tokens = [token for token in tokens if token not in stop_words] # remove stopwords
  lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens] # lemmatize
  processed_text = " ".join(lemmatized_tokens) # join tokens back into a single string

  return processed_text
df["reviewText2"] = df["reviewText"].apply(clean_preprocess_data)
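VADER handles negation, so "not good" scores lower than "good". However, NLTK's English stopword list contains negation words such as "not" and "no". The preprocessing above can therefore strip exactly the cues VADER relies on.

Below is a minimal sketch of a filter that keeps negations. It uses a small hypothetical stopword set in place of `stopwords.words('english')` so that it runs without NLTK. `STOP_WORDS`, `NEGATIONS`, and `remove_stopwords_keep_negations` are illustrative names, not NLTK APIs.

```python
# Hypothetical stand-in for stopwords.words('english'); the real NLTK list
# is much longer but likewise contains "not" and "no".
STOP_WORDS = {"this", "is", "a", "the", "it", "not", "no", "was"}

# Hypothetical set of negation words to keep even though they are stopwords.
NEGATIONS = {"not", "no", "nor", "never"}


def remove_stopwords_keep_negations(tokens):
    """Drop stopwords from a token list, but keep negation words."""
    to_drop = STOP_WORDS - NEGATIONS  # stopwords that are actually removed
    return [t for t in tokens if t not in to_drop]


tokens = "this game is not good".split()
print([t for t in tokens if t not in STOP_WORDS])  # plain filter: negation lost
print(remove_stopwords_keep_negations(tokens))     # negation kept
```

In the notebook, the equivalent change would be to filter against `stop_words - NEGATIONS` instead of `stop_words`. Another option is to score the raw `reviewText`, since VADER was designed for unprocessed social-media text and also uses capitalization and punctuation. The effect of either change on the results below would need to be measured.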



In [5]:
df.head()

|  | reviewText | Positive | reviewText2 |
|---|---|---|---|
| 0 | This is a one of the best apps acording to a b... | 1 | one best apps acording bunch people agree bomb... |
| 1 | This is a pretty good version of the game for ... | 1 | pretty good version game free . lot different ... |
| 2 | this is a really cool game. there are a bunch ... | 1 | really cool game . bunch level find golden egg... |
| 3 | This is a silly game and can be frustrating, b... | 1 | silly game frustrating , lot fun definitely re... |
| 4 | This is a terrific game on any pad. Hrs of fun... | 1 | terrific game pad . hr fun . grandkids love . ... |


In [6]:
analyzer = SentimentIntensityAnalyzer() # create an instance of the VADER analyzer

def get_sentiments(text):
  score = analyzer.polarity_scores(text) # compute the sentiment scores of the text

  # binary label: 1 if compound >= 0.5, else 0. This cutoff is stricter than the 0.05
  # positive threshold listed above, so every review scoring below 0.5 (including
  # neutral and mildly positive ones) is labeled 0
  sentiment = 1 if score["compound"] >= 0.5 else 0

  return sentiment # return the computed sentiment label

df["sentiment"] = df["reviewText2"].apply(get_sentiments) # apply get_sentiments to each preprocessed review
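The cutoff in `get_sentiments` decides how many reviews are labeled positive. Below is a minimal sketch comparing the 0.5 cutoff used here with the 0.05 convention listed at the top. The compound scores are made-up illustrative values, not taken from the dataset, and `binary_label` is a hypothetical helper.

```python
def binary_label(compound: float, threshold: float = 0.5) -> int:
    """Return 1 (positive) if compound >= threshold, else 0 (negative or neutral)."""
    return 1 if compound >= threshold else 0


# Illustrative compound scores (hypothetical, not from the dataset).
scores = [0.92, 0.44, 0.1, 0.0, -0.6]

for threshold in (0.5, 0.05):
    labels = [binary_label(s, threshold) for s in scores]
    print(threshold, labels)
```

Lowering the cutoff moves mildly positive reviews into class 1. This raises (or at least does not lower) recall for class 1 and may lower its precision. Which cutoff suits this dataset depends on the trade-off you want.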

In [8]:
df.tail()

|  | reviewText | Positive | reviewText2 | sentiment |
|---|---|---|---|---|
| 19995 | this app is fricken stupid.it froze on the kin... | 0 | app fricken stupid.it froze kindle wont allow ... | 0 |
| 19996 | Please add me!!!!! I need neighbors! Ginger101... | 1 | please add ! ! ! ! ! need neighbor ! ginger101... | 1 |
| 19997 | love it! this game. is awesome. wish it had m... | 1 | love ! game . awesome . wish free stuff house ... | 1 |
| 19998 | I love love love this app on my side of fashio... | 1 | love love love app side fashion story fight wo... | 1 |
| 19999 | This game is a rip off. Here is a list of thin... | 0 | game rip . list thing make better & bull ; fir... | 1 |


In [12]:
from sklearn.metrics import confusion_matrix, classification_report

cm= confusion_matrix(df["Positive"], df["sentiment"])
print("Confusion matrix:\n", cm)

cr= classification_report(df["Positive"], df["sentiment"])
print(f"Classification report: \n{cr}")

Confusion matrix:
 [[ 3468  1299]
 [ 3486 11747]]
Classification report: 
              precision    recall  f1-score   support

           0       0.50      0.73      0.59      4767
           1       0.90      0.77      0.83     15233

    accuracy                           0.76     20000
   macro avg       0.70      0.75      0.71     20000
weighted avg       0.80      0.76      0.77     20000
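Each figure in the report can be recovered from the confusion matrix. Its rows are the true labels (`Positive`) and its columns are the predicted labels (`sentiment`), both ordered 0, 1. The sketch below recomputes the metrics by hand from the counts printed above. It also adds a majority-class baseline (always predicting 1) for comparison; that baseline is an addition, not part of the original analysis.

```python
# Confusion matrix printed above: rows = true label, columns = predicted label.
cm = [[3468, 1299],
      [3486, 11747]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total  # correct predictions / all reviews

for cls in (0, 1):
    tp = cm[cls][cls]                    # correctly predicted as cls
    predicted = cm[0][cls] + cm[1][cls]  # column sum: all predicted as cls
    actual = sum(cm[cls])                # row sum: all truly cls (support)
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    print(f"class {cls}: precision={precision:.2f} recall={recall:.2f} "
          f"f1={f1:.2f} support={actual}")

# Baseline that always predicts the majority class (1).
baseline = sum(cm[1]) / total
print(f"accuracy={accuracy:.3f} majority baseline={baseline:.3f}")
```

The accuracy (15215 / 20000, about 0.761) is marginally below the majority-class baseline (15233 / 20000, about 0.762). Accuracy alone therefore says little about this classifier. The per-class scores are more informative, especially the 0.50 precision for class 0.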

