# NLP Applications

**N-Gram**

In [8]:
metin = """ N-gram metin içindeki birlikte kullanılan 
kelimelerin kombinasyonunu gösterir. Alex ile Semih müthiş bir ikiliydi bence.
"""

In [9]:
import textblob
from textblob import TextBlob
import nltk
nltk.download("punkt")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [11]:
TextBlob(metin).ngrams(3)

[WordList(['N-gram', 'metin', 'içindeki']),
 WordList(['metin', 'içindeki', 'birlikte']),
 WordList(['içindeki', 'birlikte', 'kullanılan']),
 WordList(['birlikte', 'kullanılan', 'kelimelerin']),
 WordList(['kullanılan', 'kelimelerin', 'kombinasyonunu']),
 WordList(['kelimelerin', 'kombinasyonunu', 'gösterir']),
 WordList(['kombinasyonunu', 'gösterir', 'Alex']),
 WordList(['gösterir', 'Alex', 'ile']),
 WordList(['Alex', 'ile', 'Semih']),
 WordList(['ile', 'Semih', 'müthiş']),
 WordList(['Semih', 'müthiş', 'bir']),
 WordList(['müthiş', 'bir', 'ikiliydi']),
 WordList(['bir', 'ikiliydi', 'bence'])]
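
The sliding-window idea behind the output above can be sketched in plain Python. This is a minimal illustration and not TextBlob's implementation: the helper name `ngrams` is hypothetical, and the input is assumed to be already tokenized with punctuation removed (TextBlob handles that step itself).

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # A sequence of k tokens yields k - n + 1 windows (none if k < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: tokens taken from the second sentence of `metin`, split by hand.
tokens = "Alex ile Semih müthiş bir ikiliydi bence".split()
trigrams = ngrams(tokens, 3)

print(trigrams[0])    # the first window of three tokens
print(len(trigrams))  # 7 tokens produce 5 trigrams
```

Changing `n` to 2 gives bigrams; the window simply gets narrower.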

**Part of speech tagging (POS)**

POS tagging identifies the grammatical category of each word in a text, such as adjective or adverb.

In [13]:
nltk.download("averaged_perceptron_tagger")

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [18]:
metin2 = "Hello, I have black car and yellow cat"

In [19]:
TextBlob(metin2).tags

[('Hello', 'NNP'),
 ('I', 'PRP'),
 ('have', 'VBP'),
 ('black', 'JJ'),
 ('car', 'NN'),
 ('and', 'CC'),
 ('yellow', 'JJ'),
 ('cat', 'NN')]
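
The tag codes above come from the Penn Treebank tag set. As a reading aid, the sketch below attaches a short description to each tag that appears in the output and filters the adjectives. The `TAG_NAMES` dictionary is a hand-written helper for this example, not part of TextBlob or NLTK, and the tagged pairs are copied from the output rather than recomputed.

```python
# Descriptions of the Penn Treebank tags that appear in the output above.
TAG_NAMES = {
    "NNP": "proper noun, singular",
    "PRP": "personal pronoun",
    "VBP": "verb, non-3rd person singular present",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "CC": "coordinating conjunction",
}

# Tagged pairs copied from the output above.
tags = [("Hello", "NNP"), ("I", "PRP"), ("have", "VBP"), ("black", "JJ"),
        ("car", "NN"), ("and", "CC"), ("yellow", "JJ"), ("cat", "NN")]

# Keep only the adjectives, for example as a simple feature for later steps.
adjectives = [word for word, tag in tags if tag == "JJ"]
print(adjectives)

# Print each word next to a readable description of its tag.
for word, tag in tags:
    print(f"{word:<8}{tag:<5}{TAG_NAMES.get(tag, 'unknown')}")
```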

**Chunking (shallow parsing)**

We can show how the tagged words (adjectives, nouns, and so on) group into phrases using a tree diagram.

In [20]:
pos = TextBlob(metin2).tags
pos

[('Hello', 'NNP'),
 ('I', 'PRP'),
 ('have', 'VBP'),
 ('black', 'JJ'),
 ('car', 'NN'),
 ('and', 'CC'),
 ('yellow', 'JJ'),
 ('cat', 'NN')]

In [27]:
reg_exp = "NP: {<DT>?<JJ>*<NN>}"
rp = nltk.RegexpParser(reg_exp)
sonuc = rp.parse(pos)
print(sonuc)
sonuc.draw()  # normally opens a new window that shows the result as a tree diagram

(S
  Hello/NNP
  I/PRP
  have/VBP
  (NP black/JJ car/NN)
  and/CC
  (NP yellow/JJ cat/NN))
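
To make the rule `NP: {<DT>?<JJ>*<NN>}` more concrete, here is a simplified pure-Python sketch of what it matches: an optional determiner, any number of adjectives, then one noun tagged exactly `NN`. The function name `np_chunks` is hypothetical, and this is not how `nltk.RegexpParser` is implemented internally; it only reproduces the grouping for this one pattern.

```python
def np_chunks(tagged):
    """Group tokens matching: optional DT, any number of JJ, then one NN."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":                          # <DT>?
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":   # <JJ>*
            j += 1
        if j < len(tagged) and tagged[j][1] == "NN":      # <NN>
            chunks.append([word for word, _ in tagged[i:j + 1]])
            i = j + 1                                     # continue after the chunk
        else:
            i += 1                                        # no chunk starts here
    return chunks


# Tagged pairs copied from the POS output above.
pos = [("Hello", "NNP"), ("I", "PRP"), ("have", "VBP"), ("black", "JJ"),
       ("car", "NN"), ("and", "CC"), ("yellow", "JJ"), ("cat", "NN")]

chunks = np_chunks(pos)
print(chunks)
```

The two groups it finds correspond to the two `NP` subtrees in the printed parse.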


**Named Entity Recognition**

An approach that lets us identify what kind of entity a proper noun refers to, such as a person or a place.

In [28]:
from nltk import word_tokenize, pos_tag, ne_chunk
nltk.download("maxent_ne_chunker")
nltk.download("words")

[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping chunkers/maxent_ne_chunker.zip.
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


True

In [29]:
metin3 = "Hello, I have black car and yellow cat his name is JOHN CENA"

In [31]:
print(ne_chunk(pos_tag(word_tokenize(metin3))))

(S
  (GPE Hello/NNP)
  ,/,
  I/PRP
  have/VBP
  black/JJ
  car/NN
  and/CC
  yellow/JJ
  cat/NN
  his/PRP$
  name/NN
  is/VBZ
  (PERSON JOHN/NNP CENA/NNP))


Note to self: add more application methods here at some point; this section is too thin as it stands.

# Mathematical Operations and Simple Feature Extraction

Counting how often words occur in a text is itself a simple form of feature extraction.
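
As a minimal sketch of that idea, word frequencies can be computed with `collections.Counter` from the standard library. The three titles are taken from the example list used in this section; lowercasing and whitespace splitting are simplifying assumptions, not a full tokenizer.

```python
from collections import Counter

# A few titles from the example list used in this section.
titles = [
    "The Adventure of the Blue Carbuncle",
    "The Adventure of the Speckled Band",
    "The Five Orange Pips1",
]

# Lowercase so that "The" and "the" count as the same word.
words = " ".join(titles).lower().split()
freq = Counter(words)

print(freq["the"])          # how often "the" occurs across the titles
print(freq.most_common(3))  # the three most frequent words
```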

**Letter/Character Count**

In [38]:
metin = """
A Scandal in Bohemia! 01
The Red-headed League,2
A Case, of Identity 33
The Boscombe Valley Mystery4
The Five Orange Pips1
The Man with? the Twisted Lip
The Adventure of the Blue Carbuncle
The Adventure of the Speckled Band
The Adventure of the Engineer's Thumb
The Adventure of the Noble Bachelor
The Adventure of the Beryl Coronet
The Adventure of the Copper Beeches"""
v_metin = metin.split("\n")
import pandas as pd
v = pd.Series(v_metin)
metin_vektor = v[1:len(v)]
metin_df = pd.DataFrame(metin_vektor,columns = ["Romanlar"])
metin_df


|    | Romanlar |
|---:|:---|
| 1 | A Scandal in Bohemia! 01 |
| 2 | The Red-headed League,2 |
| 3 | A Case, of Identity 33 |
| 4 | The Boscombe Valley Mystery4 |
| 5 | The Five Orange Pips1 |
| 6 | The Man with? the Twisted Lip |
| 7 | The Adventure of the Blue Carbuncle |
| 8 | The Adventure of the Speckled Band |
| 9 | The Adventure of the Engineer's Thumb |
| 10 | The Adventure of the Noble Bachelor |


In [39]:
metin_df["Romanlar"].str.len()

1     24
2     23
3     22
4     28
5     21
6     29
7     35
8     34
9     37
10    35
11    34
12    35
Name: Romanlar, dtype: int64

Note that spaces are also counted in this operation.
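
If spaces should not contribute to the count, they can be excluded explicitly. The sketch below uses plain Python on a single title to show the difference; the variable names are chosen for this example only.

```python
title = "A Scandal in Bohemia! 01"

total_chars = len(title)                                 # spaces included
non_space_chars = sum(not ch.isspace() for ch in title)  # spaces excluded
letters_only = sum(ch.isalpha() for ch in title)         # letters only

print(total_chars, non_space_chars, letters_only)  # 24 20 17
```

On the DataFrame, one possible equivalent for the middle case would be `metin_df["Romanlar"].str.replace(" ", "").str.len()`.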

In [42]:
metin_df["Harf_sayisi"] = metin_df["Romanlar"].str.len()
metin_df

|    | Romanlar | Harf_sayisi |
|---:|:---|---:|
| 1 | A Scandal in Bohemia! 01 | 24 |
| 2 | The Red-headed League,2 | 23 |
| 3 | A Case, of Identity 33 | 22 |
| 4 | The Boscombe Valley Mystery4 | 28 |
| 5 | The Five Orange Pips1 | 21 |
| 6 | The Man with? the Twisted Lip | 29 |
| 7 | The Adventure of the Blue Carbuncle | 35 |
| 8 | The Adventure of the Speckled Band | 34 |
| 9 | The Adventure of the Engineer's Thumb | 37 |
| 10 | The Adventure of the Noble Bachelor | 35 |


**Word Count**

In [44]:
metin_df["Romanlar"].apply(lambda x: len(str(x).split(" ")))

1     5
2     3
3     5
4     4
5     4
6     6
7     6
8     6
9     6
10    6
11    6
12    6
Name: Romanlar, dtype: int64
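
One detail worth knowing about the approach above: `split(" ")` splits on every single space, so repeated or trailing spaces produce empty strings that are counted as words, while `split()` with no argument ignores them. The title below is a hypothetical variant with a double space and a trailing space, made up to show the difference.

```python
# Hypothetical title with a double space and a trailing space.
title = "The  Five Orange Pips1 "

by_single_space = title.split(" ")  # keeps empty strings between spaces
by_whitespace = title.split()       # keeps only non-empty tokens

print(len(by_single_space))  # 6
print(len(by_whitespace))    # 4
```

For the clean titles in this example both variants give the same counts.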

In [45]:
metin_df["Kelime_Sayisi"] = metin_df["Romanlar"].apply(lambda x: len(str(x).split(" ")))

In [46]:
metin_df

|    | Romanlar | Harf_sayisi | Kelime_Sayisi |
|---:|:---|---:|---:|
| 1 | A Scandal in Bohemia! 01 | 24 | 5 |
| 2 | The Red-headed League,2 | 23 | 3 |
| 3 | A Case, of Identity 33 | 22 | 5 |
| 4 | The Boscombe Valley Mystery4 | 28 | 4 |
| 5 | The Five Orange Pips1 | 21 | 4 |
| 6 | The Man with? the Twisted Lip | 29 | 6 |
| 7 | The Adventure of the Blue Carbuncle | 35 | 6 |
| 8 | The Adventure of the Speckled Band | 34 | 6 |
| 9 | The Adventure of the Engineer's Thumb | 37 | 6 |
| 10 | The Adventure of the Noble Bachelor | 35 | 6 |


**Capturing and Counting Special Characters or Words**

In [53]:
metin_df["Romanlar"].apply(lambda x: len([w for w in x.split() if w.startswith("Adventure")]))

1     0
2     0
3     0
4     0
5     0
6     0
7     1
8     1
9     1
10    1
11    1
12    1
Name: Romanlar, dtype: int64

In [54]:
metin_df["OzelKarakter(Adventure)"] = metin_df["Romanlar"].apply(lambda x: len([w for w in x.split() if w.startswith("Adventure")]))

In [55]:
metin_df

|    | Romanlar | Harf_sayisi | Kelime_Sayisi | OzelKarakter(Adventure) |
|---:|:---|---:|---:|---:|
| 1 | A Scandal in Bohemia! 01 | 24 | 5 | 0 |
| 2 | The Red-headed League,2 | 23 | 3 | 0 |
| 3 | A Case, of Identity 33 | 22 | 5 | 0 |
| 4 | The Boscombe Valley Mystery4 | 28 | 4 | 0 |
| 5 | The Five Orange Pips1 | 21 | 4 | 0 |
| 6 | The Man with? the Twisted Lip | 29 | 6 | 0 |
| 7 | The Adventure of the Blue Carbuncle | 35 | 6 | 1 |
| 8 | The Adventure of the Speckled Band | 34 | 6 | 1 |
| 9 | The Adventure of the Engineer's Thumb | 37 | 6 | 1 |
| 10 | The Adventure of the Noble Bachelor | 35 | 6 | 1 |


**Capturing and Counting Numbers**

In [56]:
metin_df["Romanlar"].apply(lambda x: len([w for w in x.split() if w.isdigit()]))

1     1
2     0
3     1
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
Name: Romanlar, dtype: int64
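
The counts above only catch numbers that stand alone as a whitespace-separated token, which is why "League,2" and "Mystery4" score 0. If digits attached to other characters should also be counted, a regular expression is one option. The sketch below compares both approaches on three of the titles; it is an alternative for illustration, not the method used in the cells above.

```python
import re

titles = [
    "A Scandal in Bohemia! 01",
    "The Red-headed League,2",
    "The Boscombe Valley Mystery4",
]

# Tokens that consist only of digits (the approach used above).
standalone = [len([tok for tok in t.split() if tok.isdigit()]) for t in titles]

# Every run of digits, even when attached to letters or punctuation.
embedded = [len(re.findall(r"\d+", t)) for t in titles]

print(standalone)  # [1, 0, 0]
print(embedded)    # [1, 1, 1]
```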

In [57]:
metin_df["Sayi_sayisi"] = metin_df["Romanlar"].apply(lambda x: len([w for w in x.split() if w.isdigit()]))

In [58]:
metin_df

|    | Romanlar | Harf_sayisi | Kelime_Sayisi | OzelKarakter(Adventure) | Sayi_sayisi |
|---:|:---|---:|---:|---:|---:|
| 1 | A Scandal in Bohemia! 01 | 24 | 5 | 0 | 1 |
| 2 | The Red-headed League,2 | 23 | 3 | 0 | 0 |
| 3 | A Case, of Identity 33 | 22 | 5 | 0 | 1 |
| 4 | The Boscombe Valley Mystery4 | 28 | 4 | 0 | 0 |
| 5 | The Five Orange Pips1 | 21 | 4 | 0 | 0 |
| 6 | The Man with? the Twisted Lip | 29 | 6 | 0 | 0 |
| 7 | The Adventure of the Blue Carbuncle | 35 | 6 | 1 | 0 |
| 8 | The Adventure of the Speckled Band | 34 | 6 | 1 | 0 |
| 9 | The Adventure of the Engineer's Thumb | 37 | 6 | 1 | 0 |
| 10 | The Adventure of the Noble Bachelor | 35 | 6 | 1 | 0 |
