Lab Assignment 2
We have learned about different machine learning algorithms and the use of the Natural Language Toolkit (NLTK).
The main objective of this assignment is to implement several machine learning algorithms, compare their accuracy scores, and apply the Natural Language Toolkit to sample text.
Comparison of Linear Discriminant Analysis and Logistic Regression Classification Algorithms:
For this task we took the iris dataset from the sklearn library and split the data into training and testing datasets.
Steps to make prediction using LDA:
1. Create the object for Linear Discriminant Analysis.
2. Train the model on the training dataset.
3. Predict the values for the test dataset using the predict() function.
4. Find the accuracy score using the metrics.accuracy_score() function.
In our run, Linear Discriminant Analysis scored higher accuracy than logistic regression on the three iris classes. LDA handles a dependent variable with two or more groups directly. Logistic regression in its basic form models two categories and estimates their probabilities; it extends to more classes through one-vs-rest or multinomial formulations, and scikit-learn's LogisticRegression accepts multiclass targets such as the three iris classes.
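The comparison above can be sketched as follows. This is a minimal illustration, not the exact script used for the assignment: the 20% test split, `random_state=0`, and `max_iter=1000` are assumptions, so the scores it prints may differ from the ones we obtained.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the iris data (150 samples, 3 classes).
X, y = load_iris(return_X_y=True)

# Assumption: 20% test split with a fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    # Assumption: a higher max_iter so the solver converges on unscaled data.
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)            # train on the training split
    y_pred = model.predict(X_test)         # predict on the test split
    accuracies[name] = metrics.accuracy_score(y_test, y_pred)
    print(f"{name}: {accuracies[name]:.3f}")
```

Because the iris test split is small, the gap between the two models can change with the seed, so averaging over several splits would give a steadier comparison.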
Support Vector Machine Implementation
For SVM, we chose the digits dataset from the sklearn library. We split the data into training and testing data, with 20% held out as test data.
Create an SVM object with a linear kernel, train it on the training data, predict on the test data, and compute the accuracy score.
Create an SVM object with an RBF kernel and train it on the same training data. Predict with the trained model and compute the accuracy score. Accuracy can be improved by normalizing the data and also by using the BaggingClassifier from sklearn.ensemble.
On the digits dataset, the SVM with the linear kernel achieved higher accuracy than the SVM with the RBF kernel.
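A minimal sketch of the two SVM runs is shown below. The `random_state=0` seed and the default `SVC` hyperparameters are assumptions. Note that the RBF score depends strongly on `gamma`: scikit-learn changed the default from `"auto"` to `"scale"` in version 0.22, so on a recent version the ranking of the two kernels may not match the one reported here.

```python
from sklearn import metrics, svm
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the 8x8 handwritten digits data (10 classes).
X, y = load_digits(return_X_y=True)

# 20% test split as described above; the seed is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

svm_accuracies = {}
for kernel in ("linear", "rbf"):
    clf = svm.SVC(kernel=kernel)           # default C and gamma (assumption)
    clf.fit(X_train, y_train)              # train on the training split
    y_pred = clf.predict(X_test)           # predict on the test split
    svm_accuracies[kernel] = metrics.accuracy_score(y_test, y_pred)
    print(f"{kernel} kernel: {svm_accuracies[kernel]:.3f}")
```

Passing `gamma="auto"` to the RBF model is one way to check how much the result depends on that setting.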
Usage of Natural Language Tool Kit
We saved sample text in a file, read its contents into a variable, and applied word and sentence tokenization to the data.
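The tokenization step can be sketched as follows. Two things here are assumptions: an inline string stands in for the contents of the sample file, and the sketch uses NLTK's `PunktSentenceTokenizer` and `TreebankWordTokenizer` classes directly (untrained, with default parameters) so that it runs without downloading any NLTK data. The helper functions `sent_tokenize` and `word_tokenize` are the more common route, but they need the Punkt models to be downloaded first.

```python
from nltk.tokenize import TreebankWordTokenizer
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Assumption: stand-in for the text read from the sample file.
text = (
    "NLTK is a toolkit for text processing. "
    "It splits text into sentences and words."
)

# Untrained Punkt tokenizer: splits on sentence-final punctuation.
sentence_tokenizer = PunktSentenceTokenizer()
# Treebank tokenizer: splits words and separates punctuation.
word_tokenizer = TreebankWordTokenizer()

sentences = sentence_tokenizer.tokenize(text)
words = [word_tokenizer.tokenize(sentence) for sentence in sentences]

for sentence, tokens in zip(sentences, words):
    print(sentence)
    print(tokens)
```

An untrained Punkt tokenizer knows no abbreviations, so text containing items such as "Dr." or "e.g." is better handled with the pretrained model.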
Lemmatization
K Nearest Neighbor Algorithm
For the KNN algorithm we used the digits dataset from the sklearn library and split the data into training and testing data.
Steps to get the accuracy of KNN with different k values:
1. Select the k range from 1 to 60.
2. Create an object for the KNN classifier with a varying number of neighbors.
3. Train the model with the training data.
4. Predict the values.
5. Get the accuracy.
6. Plot the graph.
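The steps above can be sketched as follows. The 20% test split, the `random_state=0` seed, and treating the k range as 1 to 60 inclusive are assumptions, and `plot_scores` is a hypothetical helper name used only for this illustration.

```python
from sklearn import metrics
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)

# Assumption: 20% test split with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 1: k values from 1 to 60 (inclusive, by assumption).
k_values = list(range(1, 61))

scores = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)   # step 2: classifier with k neighbors
    knn.fit(X_train, y_train)                   # step 3: train
    y_pred = knn.predict(X_test)                # step 4: predict
    scores.append(metrics.accuracy_score(y_test, y_pred))  # step 5: accuracy


def plot_scores(k_values, scores):
    """Step 6: plot test accuracy against k (hypothetical helper)."""
    # Imported here so the loop above runs even without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.plot(k_values, scores)
    plt.xlabel("k (number of neighbors)")
    plt.ylabel("Test accuracy")
    plt.title("KNN accuracy on the digits dataset")
    plt.show()
```

Calling `plot_scores(k_values, scores)` draws the accuracy curve for the chosen split.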
As k increases, the accuracy score tends to decrease. When k is small, the model has high variance and low bias. As k increases, the model has lower variance and higher bias, with smoother decision boundaries.
All the given tasks have been implemented.