# fastText Commit Classification

In this notebook, we use a pre-trained fastText classifier, trained on labeled datasets, to classify unlabeled commit messages gathered from GitHub.

#### Import the required libraries

In [12]:
import pandas as pd
from fasttext import load_model

#### Read the data

In [13]:
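# load the pre-trained fastText model
classifier = load_model("../model/model.bin")
# read the unlabeled commit messages
df = pd.read_csv('../data/commits.csv', lineterminator='\n', encoding="utf-8")
# drop leftover index columns such as 'Unnamed: 0'
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]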



#### Format the data

In [15]:
# remove embedded newlines so that each commit message is a single line (they are deleted, not replaced with a space)
df = df.replace('\n','', regex=True)
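Because the cell above deletes newlines outright, the last word of one line is fused with the first word of the next. A minimal sketch of an alternative, assuming a space-separated message is preferred: collapse every run of whitespace into a single space. The `sample` frame and its contents are hypothetical stand-ins for `df`.

```python
import pandas as pd

# Hypothetical sample; the notebook itself loads ../data/commits.csv
sample = pd.DataFrame(
    {"message": ["Fix build\nIn 1.8 the flag changed", "Add docs\r\n\r\nSee README"]}
)

# Collapse each run of whitespace (including \r and \n) into one space,
# then trim leading and trailing whitespace
sample["message"] = (
    sample["message"].str.replace(r"\s+", " ", regex=True).str.strip()
)

print(sample["message"].tolist())
```

With this variant the words on either side of a line break stay separate tokens, which may matter for a bag-of-words model such as fastText.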

In [16]:
# check the shape of the GitHub data
df.shape

(50000, 1)

In [17]:
df.columns

Index(['message'], dtype='object')

##### Normalize the data

In [18]:
# convert the commit message column to a list of strings
commits = list(df['message'].astype(str))

##### Predict using fastText 

In [19]:
# predict the top label for every commit; the result is a (labels, probabilities) pair
labels = classifier.predict(commits)
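When `predict` is given a list of strings, the fastText Python binding returns a pair: one sequence of labels per input and a matching sequence of probabilities. The sketch below imitates that shape with hand-written values (hypothetical, not model output) to show how the pair can be unpacked. The names `predictions`, `label_lists` and `prob_lists` are illustrative only.

```python
# Hypothetical stand-in for classifier.predict(["msg a", "msg b"]);
# the values are made up to illustrate the structure only
predictions = (
    [["__label__corrective"], ["__label__features"]],  # labels per input
    [[0.91], [0.64]],                                   # probabilities per input
)

# Unpack the pair, then keep the top label and its probability for each input
label_lists, prob_lists = predictions
top = [(labels[0], probs[0]) for labels, probs in zip(label_lists, prob_lists)]

for label, prob in top:
    print(label, prob)
```

Keeping the probability next to the label makes it possible to filter out low-confidence predictions later.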

In [20]:
# labels[0] holds one sequence of predicted labels per commit; keep the top label of each
top_labels = [commit_labels[0] for commit_labels in labels[0]]
df['labels_predicted'] = top_labels
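The predicted values carry fastText's default `__label__` prefix. If plain category names are wanted for reporting, the prefix can be stripped. This is a minimal sketch: `strip_prefix` is a hypothetical helper, not part of fastText, and the `raw` list is illustrative.

```python
PREFIX = "__label__"  # fastText's default label prefix


def strip_prefix(label: str, prefix: str = PREFIX) -> str:
    """Return the label without the prefix; leave other strings unchanged."""
    return label[len(prefix):] if label.startswith(prefix) else label


# Hypothetical values in the style of df['labels_predicted']
raw = ["__label__features", "__label__corrective", "nonfunctional"]
clean = [strip_prefix(label) for label in raw]

print(clean)
```

In the notebook, the same helper could be applied to the column with `df['labels_predicted'].map(strip_prefix)`.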

##### Check the predictions made

In [21]:
df

| | message | labels_predicted |
|---|---|---|
| 0 | Use github's new relative path format. | `__label__features` |
| 1 | Fix conditional variance of LS estimate.In 18.... | `__label__corrective` |
| 2 | Merge pull request #8 from cortex/masterShared... | `__label__nonfunctional` |
| 3 | Integrate Mathieu Bryen's pull request. | `__label__perfective` |
| 4 | Merge branch 'master' of github.com:mavam/stat... | `__label__features` |
| ... | ... | ... |
| 49995 | 10.6 build fix | `__label__corrective` |
| 49996 | Don't delete beanstalkd.spec during distclean. | `__label__perfective` |
| 49997 | Re-add bitcoin(32&80).xpm | `__label__perfective` |
| 49998 | fixed some ClassInfo bugsSummary:for better re... | `__label__corrective` |
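Scrolling through individual rows gives only a rough impression of the predictions. A quick way to check them in aggregate is to look at how often each label was assigned. The sketch below uses a small hypothetical series in place of `df['labels_predicted']`, and the counts are illustrative, not results from the model.

```python
import pandas as pd

# Hypothetical predictions standing in for df['labels_predicted']
preds = pd.Series(
    [
        "__label__corrective",
        "__label__features",
        "__label__corrective",
        "__label__perfective",
    ],
    name="labels_predicted",
)

# Absolute count and relative share of each predicted label
summary = pd.DataFrame(
    {
        "count": preds.value_counts(),
        "share": preds.value_counts(normalize=True),
    }
)

print(summary)
```

A heavily skewed distribution can be a hint that the classifier favors one class and that a sample of its predictions should be reviewed by hand.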


##### End!