# Introduction to Natural Language Processing with fastText
This notebook introduces Natural Language Processing (NLP) and shows how to implement several NLP projects with the [fastText](https://github.com/facebookresearch/fastText) library.

In [2]:
# Load all libraries
import os
import sys
import pandas as pd
import numpy as np
import fasttext

print(sys.version)

3.5.2 |Anaconda custom (64-bit)| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]


## Text classification
The first task is text classification on the DBpedia dataset.

In [25]:
# Load the training CSV, which has no header row
train_file = 'dbpedia_train.csv'
df = pd.read_csv(train_file, header=None, names=['class', 'name', 'description'])

# Map the numeric class ids to human-readable names
class_dict = {
    1: 'Company',
    2: 'EducationalInstitution',
    3: 'Artist',
    4: 'Athlete',
    5: 'OfficeHolder',
    6: 'MeanOfTransportation',
    7: 'Building',
    8: 'NaturalPlace',
    9: 'Village',
    10: 'Animal',
    11: 'Plant',
    12: 'Album',
    13: 'Film',
    14: 'WrittenWork',
}
df['class_name'] = df['class'].map(class_dict)
df.head()

index,class,name,description,class_name
0,1,E. D. Abbott Ltd,Abbott of Farnham E D Abbott Limited was a Br...,Company
1,1,Schwan-Stabilo,Schwan-STABILO is a German maker of pens for ...,Company
2,1,Q-workshop,Q-workshop is a Polish company located in Poz...,Company
3,1,Marvell Software Solutions Israel,Marvell Software Solutions Israel known as RA...,Company
4,1,Bergan Mercy Medical Center,Bergan Mercy Medical Center is a hospital loc...,Company


In [24]:
# Per-class summary statistics (count, unique, top, freq) for each text column
desc = df.groupby('class')
desc.describe().transpose()

class,1,1,1,1,2,2,2,2,3,3,...,12,12,13,13,13,13,14,14,14,14
Unnamed: 0_level_1,count,unique,top,freq,count,unique,top,freq,count,unique,...,top,freq,count,unique,top,freq,count,unique,top,freq
class_name,40000,1,Company,40000,40000,1,EducationalInstitution,40000,40000,1,...,Album,40000,40000,1,Film,40000,40000,1,WrittenWork,40000
description,40000,39996,The Spanish Royal Society of Chemistry (RSEQ)...,2,40000,39992,R.B. Govt. High School (Bengali: রামদেও বাজলা...,2,40000,40000,...,Before Smile Empty Soul became Smile Empty So...,2,40000,40000,Vagabond (French: Sans toit ni loi without ro...,1,40000,39984,Tom Clancy's Net Force Explorers or Net Force...,15
name,40000,40000,Blue Arrow,1,40000,40000,Gavar Special School,1,40000,40000,...,2nd (The Rasmus EP),1,40000,40000,Eye of the Needle (film),1,40000,40000,Er ist wieder da,1
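
The classifier is trained on `data/dbpedia.train`, not on the CSV loaded above, and the notebook does not show how that file was produced. fastText's supervised mode expects one example per line, with the label marked by a prefix such as `__label__`. The following is a minimal sketch of that conversion under stated assumptions: the helper `to_fasttext_line` and the sample rows are hypothetical, and the original file may have been preprocessed differently (for example lowercased or stripped of punctuation).

```python
def to_fasttext_line(label, name, description, label_prefix='__label__'):
    """Return one training line: '<label_prefix><label> <text>'."""
    # Join title and abstract, collapsing newlines and repeated spaces,
    # because each example must sit on a single line.
    text = ' '.join('{} {}'.format(name, description).split())
    return '{}{} {}'.format(label_prefix, label, text)

# Hypothetical rows standing in for (class, name, description) from the CSV
rows = [
    (1, 'Acme Ltd', 'Acme Ltd is a made-up company.'),
    (3, 'Jane Doe', 'Jane Doe is a made-up\npainter.'),
]
lines = [to_fasttext_line(c, n, d) for c, n, d in rows]
print('\n'.join(lines))
```

With the real DataFrame, the same helper could be applied to `zip(df['class'], df['name'], df['description'])` and the resulting lines written to a text file.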


In [33]:
%%time
# Train a supervised classifier; labels in the training file start with '__label__'
output_file = 'dp_model'
classifier = fasttext.supervised('data/dbpedia.train', output_file, label_prefix='__label__')

CPU times: user 1min 28s, sys: 1.1 s, total: 1min 29s
Wall time: 20.3 s


In [34]:
%%time
# Evaluate the classifier on the held-out test set
test_file = 'data/dbpedia.test'
result = classifier.test(test_file)
print('P@1:', result.precision)
print('R@1:', result.recall)
print('Number of examples:', result.nexamples)

P@1: 0.9834571428571428
R@1: 0.9834571428571428
Number of examples: 70000
CPU times: user 580 ms, sys: 12 ms, total: 592 ms
Wall time: 591 ms
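
P@1 and R@1 are identical here because every DBpedia example carries exactly one label and the classifier returns exactly one prediction, so both metrics reduce to plain accuracy. The sketch below illustrates this with a hypothetical helper, `precision_recall_at_1`, and made-up labels; it is not how fastText computes the numbers internally.

```python
def precision_recall_at_1(true_labels, predicted_labels):
    """Compute precision@1 and recall@1.

    true_labels: list of sets of gold labels, one set per example.
    predicted_labels: list with the single top prediction per example.
    """
    correct = sum(1 for gold, pred in zip(true_labels, predicted_labels)
                  if pred in gold)
    n_predictions = len(predicted_labels)        # one prediction per example
    n_gold = sum(len(gold) for gold in true_labels)
    return correct / n_predictions, correct / n_gold

# Made-up example: four documents, one gold label each
gold = [{'1'}, {'3'}, {'3'}, {'14'}]
pred = ['1', '3', '4', '14']
p_at_1, r_at_1 = precision_recall_at_1(gold, pred)
print('P@1:', p_at_1)
print('R@1:', r_at_1)
```

The two values would diverge only if some examples had more than one gold label, since the recall denominator would then exceed the number of predictions.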


In [38]:
# Predict the class of a new sentence (predict takes a list of texts)
sentence1 = ['Picasso was a famous painter born in Malaga, Spain. He revolutionized the art in the 20th century.']
labels = classifier.predict(sentence1)
print(labels)

[['3']]
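
The prediction `[['3']]` is a class id returned as a string, with one inner list per input sentence. A small sketch of translating it back into a readable name follows; it repeats the id-to-name mapping from the data-loading cell so that it runs on its own, and hard-codes the output shown above in place of a live `classifier.predict` call.

```python
# Same id-to-name mapping as in the data-loading cell
class_dict = {
    1: 'Company', 2: 'EducationalInstitution', 3: 'Artist', 4: 'Athlete',
    5: 'OfficeHolder', 6: 'MeanOfTransportation', 7: 'Building',
    8: 'NaturalPlace', 9: 'Village', 10: 'Animal', 11: 'Plant',
    12: 'Album', 13: 'Film', 14: 'WrittenWork',
}

# Output copied from the prediction above
labels = [['3']]

# Labels are strings, so convert each to int before the lookup
label_names = [[class_dict[int(label)] for label in row] for row in labels]
print(label_names)  # [['Artist']]
```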
