<h2>Compounds and Compounding </h2>

Compounding is a productive process of vocabulary expansion in languages where two or more nouns are used together to generate a new lexeme. Compound analysis is computationally challenging primarily due to three factors:
<br>
<br>
i) Compounds are highly productive in nature,<br>
ii) The relation between the components is implicit, and<br>
iii) The correct interpretation of a compound often depends on contextual or pragmatic features.<br>

For example, ‘houseboat’ and ‘boathouse’ are compounds formed from the same pair of nouns, ‘house’ and ‘boat’, but they do not mean the same thing. Similarly, the relation between ‘olive’ and ‘oil’ in ‘olive oil’ does not hold between ‘baby’ and ‘oil’ in ‘baby oil’.

<h2>Compounds in Sanskrit </h2>

Compounds in Sanskrit exhibit numerous regularities. They are concatenative in nature, with a strict preference for the ordering of the components. A generated compound is treated as a fully qualified word (pada), so it is subject to all the inflectional and derivational modifications applicable to nouns. As in languages such as Greek, affixation occurs at the end of the compound and not within the components. Any compound can be analysed by decomposing it into two immediate component nouns.

Sanskrit linguists have discussed exceptions to these regularities in depth, leading to different categorisations and further sub-categorisations of the compound types. We consider only the four broad classes, together with the discriminative aspects of those classes that can be abstracted from the generated forms and used in our system. In Sanskrit grammar, compounds are classified into four general categories, namely Avyayībhāva, Tatpuruṣa, Bahuvrīhi and Dvandva.

<h2>Dataset Description </h2>

The dataset is a labelled collection of compounds and their decomposed pairs of components. It contains more than 32000 unique compounds, obtained from ancient digitised texts including the Śrīmad Bhagavat Gīta and the Caraka Saṃhitā, among others.

The dataset contains the sandhi-split components along with the compounds. Since more than 75% of the dataset consists of Tatpuruṣa compounds, we down-sample the Tatpuruṣa compounds to a count of 4000 to match the second-largest class, Bahuvrīhi. The Avyayībhāva compounds are severely under-represented in the dataset, at about 5% of the size of the Bahuvrīhi class. From the dataset, we filtered <b>9952</b> data points, split into <b>7957</b> data points for training and the remaining 1995 as the held-out dataset.
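
The down-sampling step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `downsample_class`, the fixed seed and the toy rows are hypothetical, and the original filtering script is not shown in this notebook.

```python
import random
from collections import Counter

def downsample_class(rows, label, target, seed=0):
    """Keep at most `target` randomly chosen rows carrying `label`; keep all other rows."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the sketch is repeatable
    chosen = [r for r in rows if r[2] == label]
    others = [r for r in rows if r[2] != label]
    if len(chosen) > target:
        chosen = rng.sample(chosen, target)
    return others + chosen

# Toy rows of (word1, word2, label); the words are placeholders, not real data.
toy_rows = [("w%d" % i, "v%d" % i, "T") for i in range(8)]
toy_rows += [("w%d" % i, "v%d" % i, "B") for i in range(8, 11)]

# Cut the over-represented class "T" down to the size of class "B".
balanced = downsample_class(toy_rows, "T", 3)
label_counts = Counter(r[2] for r in balanced)
print(sorted(label_counts.items()))
```

On the real data the same idea would be applied with `label="T"` and `target=4000`.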

<h4>Original Dataset without features</h4>

Below are the first 10 lines of the training file (the header and 9 rows), showing the data before any additional features are added. The columns word1 and word2 hold the two components of a Sanskrit compound. The label column holds the class of the compound, which is what we have to predict for the held-out dataset:
- Avyayībhāva (A)
- Tatpuruṣa (T)
- Bahuvrīhi (B)
- Dvandva (D)

In [1]:
!head -10 Data/EssentialsofML/trainingFiltered.csv

,word1,word2,label
0,svAxu,paxArWAn,T
1,xvAxaSa,varRANi,T
2,AxAna,xurbale,T
3,go,BakwAH,T
4,viXi,Bexena,T
5,nIwi,kuSalaM,T
6,lafkA,upamam,B
7,yaSas,lipsoH,T
8,kaSa,AGAwam,T


<h3>Required packages</h3><pre>
1. scikit-learn
2. numpy</pre>

<h4>Installing scikit-learn and numpy</h4>

In [None]:
# uncomment to install
# !pip install -U scikit-learn
# !pip install -U numpy

<h4>Enumerating unigrams, bigrams and trigrams</h4>

In [2]:
from functools import reduce

file = open("Data/EssentialsofML/trainingFiltered.csv")
file.readline()  #skip the header row

#get unigrams list of word
def getUnigrams(word):
    return list(word)

#get bigrams list of word
def getBigrams(word):
    return [ word[i]+word[i+1] for i in range(0,len(word)-1)]

#get trigrams list of word
def getTrigrams(word):
    return [ word[i]+word[i+1]+word[i+2] for i in range(0,len(word)-2)]

#read columns word1, word2 and labels
word1Vec = []
word2Vec = []
labelVec = []

for line in file.readlines():
    splited_line = line.split(',')
    word1Vec.append(splited_line[1])
    word2Vec.append(splited_line[2])
    labelVec.append(splited_line[3].strip())
file.close()

#enumerate the distinct unigrams, bigrams and trigrams for columns word1 and word2
unigramWord1 = reduce(lambda a,b: list(set(a+b)),map(getUnigrams,word1Vec))
bigramWord1 = reduce(lambda a,b: list(set(a+b)),map(getBigrams,word1Vec))
trigramWord1 = reduce(lambda a,b: list(set(a+b)),map(getTrigrams,word1Vec))

unigramWord2 = reduce(lambda a,b: list(set(a+b)),map(getUnigrams,word2Vec))
bigramWord2 = reduce(lambda a,b: list(set(a+b)),map(getBigrams,word2Vec))
trigramWord2 = reduce(lambda a,b: list(set(a+b)),map(getTrigrams,word2Vec))

print "10 unigrams for word1 : ","\t".join(unigramWord1[:10])
print "10 bigrams  for word1 : ","\t".join(bigramWord1[:10])
print "10 trigrams for word1 : ","\t".join(trigramWord1[:10])
print "10 unigrams for word2 : ","\t".join(unigramWord2[:10])
print "10 bigrams  for word2 : ","\t".join(bigramWord2[:10])
print "10 trigrams for word2 : ","\t".join(trigramWord2[:10])
    
#combining n-grams for word1 and word2
featureVecForWord1 = unigramWord1+bigramWord1+trigramWord1
featureVecForWord2 = unigramWord2+bigramWord2+trigramWord2

10 unigrams for word1 :  )	3	A	C	B	E	D	G	F	I
10 bigrams  for word1 :  iP	gu	gr	gq	gy	gg	ge	ga	tI	go
10 trigrams for word1 :  nwu	nwv	AGI	nwy	yay	nwa	bXa	nwi	gnI	nwr
10 unigrams for word2 :  !	7	?	A	C	B	E	D	G	F
10 bigrams  for word2 :  gv	gu	gr	gq	tU	gy	tO	ge	ga	tI
10 trigrams for word2 :  nwu	aGo	OkR	nwr	nun	bXu	nwy	AGA	bXi	nwe
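
The three helper functions above are all instances of one operation: sliding a window of length n over the characters of a word. A minimal general sketch follows; the names `char_ngrams` and `ngram_vocabulary` are hypothetical and not part of the notebook. It sorts the vocabulary so that the feature order is stable between runs, whereas the set-based enumeration above yields an arbitrary order.

```python
def char_ngrams(word, n):
    """Return all contiguous character n-grams of `word`, in order of appearance."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def ngram_vocabulary(words, n):
    """Collect the distinct n-grams over a list of words, sorted for a stable order."""
    vocab = set()
    for w in words:
        vocab.update(char_ngrams(w, n))
    return sorted(vocab)

sample = ["nIwi", "viXi"]  # two word1 entries from the rows shown earlier
print(char_ngrams("nIwi", 2))       # ['nI', 'Iw', 'wi']
print(ngram_vocabulary(sample, 3))  # ['Iwi', 'iXi', 'nIw', 'viX']
```

A word shorter than n simply contributes no n-grams, because the range is empty.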


<h4>Calculating Entropy</h4>

In [3]:
import math

#count of all features over columns word1, word2
countNgramsWord1 = [ sum([ word1.count(ngram) for word1 in word1Vec ]) for ngram in featureVecForWord1]
countNgramsWord2 = [ sum([ word2.count(ngram) for word2 in word2Vec ]) for ngram in featureVecForWord2]

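#for each class, construct a vector of occurrence counts for each feature
#note: 'K' is not one of the four labels listed above (A, T, B, D); if it never occurs in the data, the k* counts are all zero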
aCountWord1 = [ sum([ word1Vec[i].count(ngram) for i in range(len(word1Vec)) if labelVec[i]=='A' ]) for ngram in featureVecForWord1 ]
kCountWord1 = [ sum([ word1Vec[i].count(ngram) for i in range(len(word1Vec)) if labelVec[i]=='K' ]) for ngram in featureVecForWord1 ]
bCountWord1 = [ sum([ word1Vec[i].count(ngram) for i in range(len(word1Vec)) if labelVec[i]=='B' ]) for ngram in featureVecForWord1 ]
tCountWord1 = [ sum([ word1Vec[i].count(ngram) for i in range(len(word1Vec)) if labelVec[i]=='T' ]) for ngram in featureVecForWord1 ]
dCountWord1 = [ sum([ word1Vec[i].count(ngram) for i in range(len(word1Vec)) if labelVec[i]=='D' ]) for ngram in featureVecForWord1 ]

aCountWord2 = [ sum([ word2Vec[i].count(ngram) for i in range(len(word2Vec)) if labelVec[i]=='A' ]) for ngram in featureVecForWord2 ]
kCountWord2 = [ sum([ word2Vec[i].count(ngram) for i in range(len(word2Vec)) if labelVec[i]=='K' ]) for ngram in featureVecForWord2 ]
bCountWord2 = [ sum([ word2Vec[i].count(ngram) for i in range(len(word2Vec)) if labelVec[i]=='B' ]) for ngram in featureVecForWord2 ]
tCountWord2 = [ sum([ word2Vec[i].count(ngram) for i in range(len(word2Vec)) if labelVec[i]=='T' ]) for ngram in featureVecForWord2 ]
dCountWord2 = [ sum([ word2Vec[i].count(ngram) for i in range(len(word2Vec)) if labelVec[i]=='D' ]) for ngram in featureVecForWord2 ]

def entropy(x):
    return -(x*math.log(x))

def get_prob_vec(tagCountWordx,countNgramsWordx,featureVecForWordx):
    return [ (tagCountWordx[i]+1)/(float(countNgramsWordx[i])+len(featureVecForWordx)) for i in range(len(tagCountWordx)) ]

#calculating add-one smoothed probabilities and applying -p(x)*log(p(x))
aCountWord1 = map(entropy, get_prob_vec(aCountWord1, countNgramsWord1, featureVecForWord1))
kCountWord1 = map(entropy, get_prob_vec(kCountWord1, countNgramsWord1, featureVecForWord1))
bCountWord1 = map(entropy, get_prob_vec(bCountWord1, countNgramsWord1, featureVecForWord1))
tCountWord1 = map(entropy, get_prob_vec(tCountWord1, countNgramsWord1, featureVecForWord1))
dCountWord1 = map(entropy, get_prob_vec(dCountWord1, countNgramsWord1, featureVecForWord1))

aCountWord2 = map(entropy, get_prob_vec(aCountWord2, countNgramsWord2, featureVecForWord2))
kCountWord2 = map(entropy, get_prob_vec(kCountWord2, countNgramsWord2, featureVecForWord2))
bCountWord2 = map(entropy, get_prob_vec(bCountWord2, countNgramsWord2, featureVecForWord2))
tCountWord2 = map(entropy, get_prob_vec(tCountWord2, countNgramsWord2, featureVecForWord2))
dCountWord2 = map(entropy, get_prob_vec(dCountWord2, countNgramsWord2, featureVecForWord2))

#calculating entropy for each feature
ngramEntropyWord1 = [ aCountWord1[i] + kCountWord1[i] + bCountWord1[i] + tCountWord1[i] + dCountWord1[i] for i in range(len(featureVecForWord1))]
ngramEntropyWord2 = [ aCountWord2[i] + kCountWord2[i] + bCountWord2[i] + tCountWord2[i] + dCountWord2[i] for i in range(len(featureVecForWord2))]

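#taking the 1000 highest-scoring features for both word1 and word2
#the indices are in ascending order of score, so [:10] below holds the lowest-scoring and [-10:] the highest-scoring of the selected features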
topIndices1 = sorted(range(len(ngramEntropyWord1)), key=lambda i: ngramEntropyWord1[i])[-1000:]
topIndices2 = sorted(range(len(ngramEntropyWord2)), key=lambda i: ngramEntropyWord2[i])[-1000:]

#list containing best 1000 features for word1 and word2
colfor1 = [featureVecForWord1[i] for i in topIndices1]
colfor2 = [featureVecForWord2[i] for i in topIndices2]

print 'Best 10  features for word1 : ',"\t".join(colfor1[:10])
print 'Worst 10 features for word1 : ',"\t".join(colfor1[-10:])
print 'Best 10  features for word2 : ',"\t".join(colfor2[:10])
print 'Worst 10 features for word2 : ',"\t".join(colfor2[-10:])

Best 10  features for word1 :  wrA	uRp	dga	aws	adg	sI	nwu	nim	viB	vim
Worst 10 features for word1 :  p	v	u	s	i	n	w	A	r	a
Best 10  features for word2 :  Axe	yaj	Rya	vip	lin	Sar	Asi	val	ame	aNe
Worst 10 features for word2 :  k	y	i	H	n	w	r	m	A	a
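
To make the scoring in the cell above concrete, here is a small worked sketch of the same formula for a single n-gram: each class count is smoothed as (count + 1) / (total count + vocabulary size), and the score is the sum of -p*log(p) over the classes. The function name, the counts and the vocabulary size of 5000 are made up for illustration. Because the smoothed values do not sum to 1, the score is not a normalised entropy; with a large vocabulary it tends to grow with n-gram frequency, which appears consistent with the single-character features at the high-scoring end of the output above.

```python
import math

def smoothed_class_scores(class_counts, vocab_size):
    """Return (smoothed per-class values, summed -p*log(p) score) for one n-gram."""
    total = sum(class_counts.values())
    probs = {c: (k + 1) / (float(total) + vocab_size) for c, k in class_counts.items()}
    score = sum(-p * math.log(p) for p in probs.values())
    return probs, score

VOCAB_SIZE = 5000  # hypothetical number of candidate n-gram features

# Hypothetical per-class occurrence counts for a frequent and a rare n-gram.
frequent = {"A": 0, "B": 40, "D": 10, "T": 50}
rare = {"A": 0, "B": 1, "D": 0, "T": 1}

frequent_probs, frequent_score = smoothed_class_scores(frequent, VOCAB_SIZE)
rare_probs, rare_score = smoothed_class_scores(rare, VOCAB_SIZE)

print(frequent_score > rare_score)   # True
print(sum(frequent_probs.values()))  # well below 1, so not a probability distribution
```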


<h4>Generating files with new features</h4>

In [4]:
def calculate_features(inFileName,outFileName):
    #append one count column per selected n-gram (colfor1, colfor2) to every row of inFileName
    word1Vec = []
    word2Vec = []
    remainingCols = []

    file = open(inFileName)
    outFile = open(outFileName,"w")

    #header: drop the leading comma of the unnamed index column, then add the n-gram column names
    outFile.write(file.readline().strip('\n')[1:]+','+','.join(colfor1)+','+','.join(colfor2)+"\n")

    for line in file.readlines():
        splited_line = line.split(',')
        word1Vec.append(splited_line[1])
        word2Vec.append(splited_line[2])
        remainingCols.append(','.join(splited_line[3:]).split('\n')[0])
    file.close()

    #count how often each selected n-gram occurs in each word
    ngramCountWord1 = [[ word1.count(ngram) for ngram in colfor1 ] for word1 in word1Vec ]
    ngramCountWord2 = [[ word2.count(ngram) for ngram in colfor2 ] for word2 in word2Vec ]

    #write word1, word2, label and the count features for every row
    for i in range(len(word1Vec)):
        outFile.write(str(word1Vec[i])+','+str(word2Vec[i])+','+str(remainingCols[i])+','+','.join(map(str,ngramCountWord1[i]))+','+','.join(map(str,ngramCountWord2[i]))+"\n")

    outFile.close()
    
calculate_features("Data/EssentialsofML/trainingFiltered.csv","Data/EssentialsofML/trainingFilteredWithNewFeatures.csv")
calculate_features("Data/EssentialsofML/heldoutFiltered.csv","Data/EssentialsofML/heldoutFilteredWithNewFeatures.csv")
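
As a small illustration of what one generated feature row contains, the sketch below builds the count vector for a single word. The selected n-grams are hypothetical, and `count_features` and `overlapping_count` are illustrative names rather than notebook functions. It also shows a detail of `str.count` that may matter for repeated characters: it counts non-overlapping occurrences only.

```python
def count_features(word, ngrams):
    """Count vector for `word`, using str.count as calculate_features does."""
    return [word.count(g) for g in ngrams]

def overlapping_count(word, gram):
    """Count occurrences of `gram` in `word`, allowing overlaps."""
    width = len(gram)
    return sum(1 for i in range(len(word) - width + 1) if word[i:i + width] == gram)

selected = ["A", "pa", "xu"]  # hypothetical selected n-grams
print(count_features("paxArWAn", selected))  # [2, 1, 0]

# str.count skips overlapping matches; a sliding window does not.
print("aaa".count("aa"), overlapping_count("aaa", "aa"))  # 1 2
```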

<h4>Reading training and heldout data</h4>

In [5]:
import numpy
import csv

reader=csv.reader(open("Data/EssentialsofML/trainingFilteredWithNewFeatures.csv","rb"),delimiter=',')
result=numpy.matrix(list(reader))[1:]

X = result[:,3:].astype('int')
y = numpy.squeeze(numpy.asarray(result[:,2]))

readerHeldout=csv.reader(open("Data/EssentialsofML/heldoutFilteredWithNewFeatures.csv","rb"),delimiter=',')
resultHeldout=numpy.matrix(list(readerHeldout))[1:]

heldoutX = resultHeldout[:,3:].astype('int')
heldouty = numpy.squeeze(numpy.asarray(resultHeldout[:,2]))

<h3> Cross Validation: </h3>

Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data. This situation is called overfitting.   
To avoid it, it is common practice when performing a (supervised) machine learning experiment to hold out part of the available data as a test set.

There is still a risk of overfitting on the test set because the parameters can be tweaked until the estimator performs optimally. This way, knowledge about the test set can “leak” into the model and evaluation metrics no longer report on generalization performance. To solve this problem, yet another part of the dataset can be held out as a so-called “validation set”: training proceeds on the training set, after which evaluation is done on the validation set, and when the experiment seems to be successful, final evaluation can be done on the test set.
However, by partitioning the available data into three sets, we drastically reduce the number of samples which can be used for learning the model, and the results can depend on a particular random choice for the pair of (train, validation) sets.

A solution to this problem is a procedure called <b>cross-validation (CV for short)</b>.
A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV. In the basic approach, called k-fold CV, the training set is split into k smaller sets (other approaches exist, but they generally follow the same principles). The following procedure is followed for each of the k “folds”:

- A model is trained using k-1 of the folds as training data;
- the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).

The performance measure reported by k-fold cross-validation is then the average of the values computed in the loop. This approach can be computationally expensive, but does not waste too much data (as is the case when fixing an arbitrary test set), which is a major advantage in problems such as inverse inference where the number of samples is very small.
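
The k-fold procedure can be sketched in plain Python as follows. This is a simplified illustration, not scikit-learn's implementation: the folds are contiguous and unshuffled, the label list is a toy example, and the "model" is a hypothetical majority-label baseline standing in for a real classifier.

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop

labels = ["T", "T", "B", "T", "B", "T", "T", "D", "T", "B"]  # toy labels

def majority_baseline_score(train, validation):
    """'Train' by picking the most common training label; score accuracy on the fold."""
    majority = Counter(labels[i] for i in train).most_common(1)[0][0]
    hits = sum(1 for i in validation if labels[i] == majority)
    return hits / float(len(validation))

fold_scores = [majority_baseline_score(tr, va) for tr, va in k_fold_indices(len(labels), 3)]
mean_score = sum(fold_scores) / len(fold_scores)  # the value k-fold CV reports
print(fold_scores, mean_score)
```

Every sample is used for validation exactly once and for training k-1 times, which is why no separate validation set has to be set aside.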
<br>

Below, we apply cross-validation on the training data for each of the algorithms and report the mean accuracy together with its spread.

<h3>1. Support vector machines (SVMs) </h3>

SVMs are a set of supervised learning methods used for classification, regression and outlier detection.

The advantages of support vector machines are:<pre>
1. Effective in high dimensional spaces.
2. Still effective in cases where the number of dimensions is greater than the number of samples.
3. Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
4. Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.</pre>

The disadvantages of support vector machines include:<pre>
1. If the number of features is much greater than the number of samples, the method is likely to perform poorly.
2. SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.</pre>

<h5>Class required to implement support vector machines </h5><br>
class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None)

To read more, follow : http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

<h4>Cross validation for SVM</h4>

In [None]:
from sklearn import svm
from sklearn.model_selection import cross_val_score

#create model
clf = svm.SVC(kernel="linear",max_iter=-1)
'''
max_iter is a hard limit on the number of iterations within the solver; -1 means no limit.

In our runs the linear kernel gave better accuracy than the RBF (radial) kernel.
'''
#Evaluate a score by cross-validation
scores = cross_val_score(clf, X, y, cv=2)

#report accuracy
print("Accuracy for SVM: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

<h4>SVM on heldout data</h4>

In [6]:
from sklearn import svm
from sklearn.metrics import classification_report

#create model
clf = svm.SVC(kernel="linear",max_iter=-1)

#fit the model according to the training data.
clf.fit(X, y)

#predict labels for heldout data
predictedy = clf.predict(heldoutX)

#print precision recall table
print(classification_report(heldouty, predictedy))

#report accuracy on heldout data
print "Accuracy= ",clf.score(heldoutX, heldouty)

             precision    recall  f1-score   support

          A       0.59      0.56      0.57        54
          B       0.79      0.81      0.80       860
          D       0.65      0.60      0.62       228
          T       0.75      0.75      0.75       853

avg / total       0.75      0.75      0.75      1995

Accuracy=  0.751378446115
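
For readers unfamiliar with the columns of the report above, the following sketch computes precision, recall, F1 and support per class from two label lists. It is a hand-rolled illustration on toy labels, not the scikit-learn code; the function name `per_class_report` is hypothetical. The "avg / total" row in the report appears to be the support-weighted average of the per-class values.

```python
def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} computed from two label lists."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)  # tp + false positives
        support = sum(1 for t in y_true if t == label)    # tp + false negatives
        precision = tp / float(predicted) if predicted else 0.0
        recall = tp / float(support) if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1, support)
    return report

# Toy gold labels and predictions (not taken from the held-out data).
y_true = ["T", "T", "B", "B", "D", "T"]
y_pred = ["T", "B", "B", "B", "T", "T"]

report = per_class_report(y_true, y_pred)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / float(len(y_true))
for label, row in sorted(report.items()):
    print(label, row)
print(accuracy)
```

In this toy example class D is never predicted, so its precision, recall and F1 are all reported as 0.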


<h3>2. Random forest classifier</h3>

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement if bootstrap=True (default).
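
The two ingredients named above, bootstrap sub-samples and averaging, can be sketched in a few lines. This is a conceptual illustration only: the per-tree probabilities are made-up numbers, and `bootstrap_sample` and `average_probabilities` are hypothetical helper names, not scikit-learn functions.

```python
import random

def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement (same size as the original sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def average_probabilities(tree_probs):
    """Average per-tree class-probability dicts; return (averaged, best_label)."""
    labels = sorted(tree_probs[0])
    averaged = {c: sum(p[c] for p in tree_probs) / float(len(tree_probs)) for c in labels}
    return averaged, max(labels, key=lambda c: averaged[c])

rng = random.Random(0)
indices = bootstrap_sample(10, rng)
# Same size as the input; duplicates are possible, so some rows may be left out.
print(len(indices), len(set(indices)))

# Hypothetical class probabilities from three trees for one sample.
tree_probs = [{"B": 0.9, "T": 0.1}, {"B": 0.4, "T": 0.6}, {"B": 0.4, "T": 0.6}]
averaged, best = average_probabilities(tree_probs)
print(averaged, best)
```

In this example two of the three trees individually favour T, yet the averaged probabilities favour B, which shows how averaging differs from a simple majority vote.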

<h5>Class required to implement random forest classifier</h5><br>
class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)

To read more, follow : http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

<h4>Cross validation for Random Forest Classifier</h4>

In [6]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

#create model
clf3 = RandomForestClassifier(n_estimators=90,min_samples_split=2,max_features='sqrt')
'''
increasing n_estimators (the number of trees in the forest) increased accuracy in our runs

min_samples_split is the minimum number of samples required to split an internal node; 2 is the smallest integer value (recent scikit-learn releases reject an integer value of 1)

max_features is the number of features to consider when looking for the best split
'''
#Evaluate a score by cross-validation
scores3 = cross_val_score(clf3, X, y, cv=10)

#report accuracy
print("Accuracy for Random Forest Classifier: %0.2f (+/- %0.2f)" % (scores3.mean(), scores3.std() * 2))

Accuracy for Random Forest Classifier: 0.75 (+/- 0.04)


<h4>Random forest classifier on heldout data</h4>

In [8]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

#create model
clf = RandomForestClassifier(n_estimators=90,min_samples_split=2,max_features='sqrt')  #2 is the smallest integer min_samples_split that recent scikit-learn releases accept

#fit the model according to the training data.
clf.fit(X, y)

#predict labels for heldout data
predictedy = clf.predict(heldoutX)

#print precision recall table
print(classification_report(heldouty, predictedy))

#report accuracy on heldout data
print "Accuracy= ",clf.score(heldoutX, heldouty)

             precision    recall  f1-score   support

          A       0.92      0.43      0.58        54
          B       0.79      0.79      0.79       860
          D       0.91      0.47      0.62       228
          T       0.71      0.82      0.76       853

avg / total       0.77      0.76      0.75      1995

Accuracy=  0.757393483709


<h3>3. Logistic Regression (aka logit, MaxEnt) classifier</h3>

In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the ‘multi_class’ option is set to ‘ovr’, and uses the cross-entropy loss if the ‘multi_class’ option is set to ‘multinomial’. (Currently the ‘multinomial’ option is supported only by the ‘lbfgs’, ‘sag’ and ‘newton-cg’ solvers.)

This class implements regularized logistic regression using the ‘liblinear’ library, ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).

The ‘newton-cg’, ‘sag’, and ‘lbfgs’ solvers support only L2 regularization with primal formulation. The ‘liblinear’ solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty.
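
The difference between the one-vs-rest and multinomial schemes can be illustrated on a single sample. The sketch below assumes we already have one linear score per class (the numbers are hypothetical) and shows how each scheme would turn them into class probabilities. It is a simplified picture of the idea, not the library's code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ovr_probabilities(scores):
    """One-vs-rest: squash each class score independently, then renormalise."""
    raw = {c: sigmoid(z) for c, z in scores.items()}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}

def softmax_probabilities(scores):
    """Multinomial: a single softmax over all class scores."""
    m = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(z - m) for c, z in scores.items()}
    total = sum(exps.values())
    return {c: v / total for c, v in exps.items()}

def cross_entropy(probs, true_label):
    """Cross-entropy loss of one sample: -log of the probability of the true class."""
    return -math.log(probs[true_label])

scores = {"A": -2.0, "B": 1.5, "D": -0.5, "T": 1.0}  # hypothetical values of w_c . x + b_c

ovr = ovr_probabilities(scores)
multinomial = softmax_probabilities(scores)
print(max(ovr, key=ovr.get), max(multinomial, key=multinomial.get))
print(cross_entropy(multinomial, "B"))
```

Both schemes rank the classes identically for these scores. In practice they can differ because the per-class weights are fitted differently: independently per class for OvR, jointly for the multinomial loss.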

<h5>Class required to implement Logistic Regression (aka logit, MaxEnt) classifier</h5><br>
class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)

To read more, follow : http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

<h4>Cross validation for Logistic Regression</h4>

In [None]:
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import cross_val_score

#create model
clf2 = LogisticRegressionCV(solver='liblinear')
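'''
solver is the algorithm used for the optimization problem; liblinear gave good accuracy here.
LogisticRegressionCV additionally selects the regularization strength C using its own internal cross-validation.
'''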
#Evaluate a score by cross-validation
scores2 = cross_val_score(clf2, X, y, cv=3)

#report accuracy
print("Accuracy for Logistic Regression: %0.2f (+/- %0.2f)" % (scores2.mean(), scores2.std() * 2))

<h4>Logistic Regression Classifier on heldout data</h4>

In [7]:
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import classification_report

#create model
clf =  LogisticRegressionCV(solver='liblinear')

#fit the model according to the training data.
clf.fit(X, y)

#predict labels for heldout data
predictedy = clf.predict(heldoutX)

#print precision recall table
print(classification_report(heldouty, predictedy))

#report accuracy on heldout data
print "Accuracy= ",clf.score(heldoutX, heldouty)

             precision    recall  f1-score   support

          A       1.00      0.39      0.56        54
          B       0.80      0.81      0.80       860
          D       0.69      0.52      0.59       228
          T       0.73      0.80      0.77       853

avg / total       0.76      0.76      0.76      1995

Accuracy=  0.760902255639


<h3>4. Decision trees </h3>

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
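
A decision rule is chosen by how well it separates the classes. With the 'gini' criterion that appears in the class signature below, a candidate split is judged by the size-weighted Gini impurity of the two resulting groups. A minimal sketch on toy labels follows; the helper names are hypothetical, and the split stands for some rule such as "the count of a given n-gram is at least 1".

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_c^2, where p_c is a class proportion."""
    n = float(len(labels))
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split."""
    n = float(len(left) + len(right))
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = ["T", "T", "T", "B", "B", "D"]
left, right = ["T", "T", "T"], ["B", "B", "D"]  # groups produced by a hypothetical rule

print(gini(parent))                 # 22/36, about 0.611
print(split_impurity(left, right))  # 2/9, about 0.222
```

The split lowers the impurity from about 0.611 to about 0.222, so a tree learner would prefer it over rules that leave the groups more mixed.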

<h5>Class required to implement Decision trees </h5><br>
class sklearn.tree.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_split=1e-07, class_weight=None, presort=False)

To read more, follow : http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

<h4>Cross validation for Decision Tree Classifier</h4>

In [None]:
from sklearn import tree
from sklearn.model_selection import cross_val_score

#create model
clf1 = tree.DecisionTreeClassifier(splitter='random',max_features='sqrt')
'''
splitter='random' chooses the best random split at each node instead of searching for the overall best split

max_features is the number of features to consider when looking for the best split (default: None); 'sqrt' gave good accuracy here.

'''

#Evaluate a score by cross-validation
scores1 = cross_val_score(clf1, X, y, cv=10)

#report accuracy
print("Accuracy for Decision Tree Classifier: %0.2f (+/- %0.2f)" % (scores1.mean(), scores1.std() * 2))

<h4>Decision Tree Classifier on heldout data</h4>

In [None]:
from sklearn import tree
from sklearn.metrics import classification_report

#create model
clf = tree.DecisionTreeClassifier(splitter='random',max_features='sqrt')

#fit the model according to the training data.
clf.fit(X, y)

#predict labels for heldout data
predictedy = clf.predict(heldoutX)

#print precision recall table
print(classification_report(heldouty, predictedy))

#report accuracy on heldout data
print "Accuracy= ",clf.score(heldoutX, heldouty)

<h3>Comparison of the algorithms implemented above</h3><br>
<table>
<tr>
<td><b>Classification Algorithm</b></td>
<td><b>Accuracy</b></td>
</tr>
<tr>
<td>SVM with RBF kernel</td>
<td>0.60</td>
</tr>
<tr>
<td>SVM with linear kernel</td>
<td>0.75</td>
</tr>
<tr>
<td>Random Forest</td>
<td>0.76</td>
</tr>
<tr>
<td>Logistic Regression</td>
<td>0.76</td>
</tr>
<tr>
<td>Decision Tree</td>
<td>0.65</td>
</tr>

</table>

<h2>References</h2>
- Scikit Learn : http://scikit-learn.org
- Numpy : http://www.numpy.org/
- Krishna, Amrith, Satuluri, Pavankumar, Sharma, Shubham, Kumar, Apurv and Goyal, Pawan (2016). Compound Type Identification in Sanskrit: What Roles do the Corpus and Grammar Play? WSSANLP, Workshop at Coling 2016, Osaka, Japan, December 11-16.

<h2>Developers</h2>
<ul>
<li>Bhargavkumar Patel <a href="mailto:bhargav079@gmail.com">bhargav079@gmail.com</a><br></li>
<li>Minesh Gandhi <a href="mailto:mineshmini33@gmail.com">mineshmini33@gmail.com</a><br></li>
<li>Prachi Agarwal <a href="mailto:24prachiagarwal@gmail.com">24prachiagarwal@gmail.com</a></li>
</ul>