![](https://fasttext.cc/img/ogimage.png)

# What is FastText?

FastText is a free, open-source, lightweight library for learning text representations and training text classifiers. It runs on standard, generic hardware, and trained models can later be compressed to fit even on mobile devices.

## Here we go

**This notebook puts the submission first: the goal is a working submission, not exploration.**

### Load Libraries

In [None]:
import numpy as np
import pandas as pd
import re
import csv

import fasttext

### Load Data

In [None]:
train=pd.read_csv('/kaggle/input/nlp-getting-started/train.csv')
test=pd.read_csv('/kaggle/input/nlp-getting-started/test.csv')
submit=pd.read_csv('/kaggle/input/nlp-getting-started/sample_submission.csv')

### View Data 

In [None]:
train.head()

### Preprocessing: fill missing values

In [None]:
# train.isna().sum()

train['keyword']=train['keyword'].fillna('none')
train['location']=train['location'].fillna('none')
test['keyword']=test['keyword'].fillna('none')
test['location']=test['location'].fillna('none')
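
Filling the gaps matters because pandas stores missing strings as float `NaN`, and adding a float to a string raises `TypeError`, so the later concatenation `row['keyword']+' '+row['location']+' '+row['text']` would fail on any row with a missing keyword or location. A minimal plain-Python sketch of the same idea; `join_fields` is a hypothetical helper, not part of the notebook:

```python
import math


def join_fields(keyword, location, text, placeholder="none"):
    """Join keyword, location and text, substituting a placeholder for missing values.

    Hypothetical helper: pandas represents missing strings as float NaN,
    and `float + str` raises TypeError, so gaps are filled before joining.
    """

    def clean(value):
        # Treat None and float NaN as missing.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return placeholder
        return value

    return " ".join(clean(v) for v in (keyword, location, text))


print(join_fields(float("nan"), "London", "Forest fire near La Ronge"))
```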

### Prepare the data for fastText

In [None]:
from sklearn.model_selection import train_test_split

Train=train.drop('target',axis=1)
Target=train['target']

X_tr,X_val,y_tr,y_val=train_test_split(Train,Target,test_size=0.15,random_state=71,stratify=train['target'])
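
`stratify=train['target']` keeps the share of disaster (1) and non-disaster (0) tweets the same in the training and validation parts, so the 15% validation set stays representative. The idea can be sketched in plain Python as below; `stratified_split` is a hypothetical helper that splits each class separately, an illustration of stratification rather than scikit-learn's exact allocation algorithm.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size, seed=71):
    """Return (train_idx, val_idx), splitting every class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_val = round(len(idxs) * test_size)  # this class's share of the validation set
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


labels = [0] * 60 + [1] * 40
train_idx, val_idx = stratified_split(labels, test_size=0.15)
print(len(train_idx), len(val_idx), sum(labels[i] for i in val_idx))  # 85 15 6
```

With 60 negatives and 40 positives, the validation part receives 9 and 6 of them, preserving the 60/40 ratio.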

In [None]:
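tr_arr=[]
val_arr=[]
test_arr=[]

# Training lines use fastText's supervised format: "__label__<target> <text>".
# keyword and location are prepended to the tweet as extra tokens.
for i,row in X_tr.iterrows():
    target=y_tr.loc[i]
    label=f'__label__{target}'
    text=row['keyword']+' '+row['location']+' '+row['text']
    label+=' '+text
    tr_arr.append(label)

# Validation and test texts use the same layout, without a label.
for i,row in X_val.iterrows():
    text=row['keyword']+' '+row['location']+' '+row['text']
    val_arr.append(text)

for i,row in test.iterrows():
    text=row['keyword']+' '+row['location']+' '+row['text']
    test_arr.append(text)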
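
Each training line therefore has the shape `__label__<target> <keyword> <location> <text>`: fastText treats tokens that start with the label prefix as labels and the remaining tokens as the input, one example per line. A sketch of building such a line; `to_fasttext_line` is a hypothetical helper, and collapsing whitespace is an added precaution (not in the loop above) so that tweets containing newlines stay on a single line.

```python
def to_fasttext_line(target, *fields, prefix="__label__"):
    """Build one supervised fastText training line: '<prefix><target> <text>'."""
    # str.split() with no argument splits on any whitespace, including '\n',
    # so re-joining with single spaces keeps the whole example on one line.
    text = " ".join(" ".join(fields).split())
    return f"{prefix}{target} {text}"


print(to_fasttext_line(1, "none", "none", "Forest fire near La Ronge\nSask. Canada"))
```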

### Write the training data to a .txt file

In [None]:
# One "__label__<target> <text>" example per line; collapsing whitespace keeps multi-line tweets on one line.
with open('train.txt','w',encoding='utf-8') as f:
    for line in tr_arr:
        f.write(' '.join(line.split())+'\n')
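
Before training, it can be worth confirming that the file really holds one labelled example per line; a tweet split by an embedded newline shows up as a line without a label. A minimal check, assuming the default `__label__` prefix; `check_fasttext_file` is a hypothetical helper:

```python
import os
import tempfile


def check_fasttext_file(path, prefix="__label__"):
    """Count the lines of a fastText training file, raising if any line lacks a label."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            # A line without the prefix usually means a tweet was split by a newline.
            if not line.startswith(prefix):
                raise ValueError(f"line {lineno} has no label: {line[:60]!r}")
            count += 1
    return count


# Usage on a small throwaway file; in the notebook the check would be
# check_fasttext_file('train.txt') == len(tr_arr).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("__label__1 none none Forest fire\n__label__0 none none Lovely day\n")
demo_path = tmp.name
print(check_fasttext_file(demo_path))  # 2
os.remove(demo_path)
```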

In [None]:
model=fasttext.train_supervised('train.txt',label_prefix='__label__',epoch=10)
print('Labels the model predicts:',model.labels)

In [None]:
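from sklearn.metrics import accuracy_score

# model.predict rejects text containing '\n', so replace newlines with spaces (deleting them would merge words).
val_arr=[re.sub(r'\n',' ',text) for text in val_arr]

# predict returns (labels, probabilities); each labels entry is a tuple like ('__label__1',).
pred=[int(label[0][-1]) for label in model.predict(val_arr)[0]]
print(f'val_acc : {accuracy_score(y_val.values,pred)}')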
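
The cell above reports accuracy, but the leaderboard of this competition is scored with F1 on the positive (disaster) class, so it is worth checking F1 on the validation split as well; `sklearn.metrics.f1_score(y_val, pred)` computes it directly. What F1 measures can be sketched in plain Python; `f1_binary` is a hypothetical helper:

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 score for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(f1_binary([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

Unlike accuracy, F1 ignores true negatives, so a model that mostly predicts the majority class 0 cannot score well on it.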

### Inference

In [None]:
# Replace newlines with spaces so predict accepts each text as a single line.
test_arr=[re.sub(r'\n',' ',text) for text in test_arr]

pred=[int(label[0][-1]) for label in model.predict(test_arr)[0]]
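
`label[0][-1]` keeps only the last character of the top label, which works here because the classes are the single digits 0 and 1. A sketch of a prefix-based parse that would also handle multi-character labels; `parse_labels` is a hypothetical helper that takes the first element of what `model.predict` returns for a list of texts (one tuple of label strings per text):

```python
def parse_labels(predicted, prefix="__label__"):
    """Turn fastText label tuples such as ('__label__1',) into ints, using the top label."""
    parsed = []
    for labels in predicted:
        top = labels[0]  # highest-probability label for this text
        if not top.startswith(prefix):
            raise ValueError(f"unexpected label format: {top!r}")
        parsed.append(int(top[len(prefix):]))
    return parsed


print(parse_labels([("__label__1",), ("__label__0",), ("__label__12",)]))  # [1, 0, 12]
```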

In [None]:
# Assumes sample_submission.csv lists the ids in the same order as test.csv.
submit['target']=pred
submit.head()

### Make submission

In [None]:
submit.to_csv('submission.csv',index=False)
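
A quick sanity check on the written file can catch malformed rows before uploading. This sketch assumes the submission should contain exactly the columns `id` and `target` (as in `sample_submission.csv`), one row per test id, and 0/1 targets; `validate_submission` is a hypothetical helper built on the standard `csv` module, and in the notebook it would be called as `validate_submission('submission.csv', test['id'])`.

```python
import csv
import os
import tempfile


def validate_submission(path, expected_ids):
    """Raise ValueError unless the CSV has columns id,target, one row per expected id, and 0/1 targets."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["id", "target"]:
            raise ValueError(f"unexpected columns: {reader.fieldnames}")
        rows = list(reader)

    ids = [row["id"] for row in rows]
    # Every expected id must appear exactly once, with no extras.
    if len(ids) != len(set(ids)) or set(ids) != {str(i) for i in expected_ids}:
        raise ValueError("ids are missing, duplicated, or unexpected")
    bad = [row for row in rows if row["target"] not in ("0", "1")]
    if bad:
        raise ValueError(f"{len(bad)} rows have a target other than 0 or 1")
    return len(rows)


# Usage on a small throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="", encoding="utf-8") as tmp:
    tmp.write("id,target\n0,1\n2,0\n3,1\n")
demo_path = tmp.name
print(validate_submission(demo_path, [0, 2, 3]))  # 3
os.remove(demo_path)
```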