# SVM to classify rice species.

This exercise uses a support vector machine (SVM) to classify rice species.<br>
The dataset contains traits for two species of rice.
 

### Steps:<br>
- Create the classifier using all attributes (except id and Class).<br>
- Separate the data into training and testing.<br>
- Scale the attribute values to speed up code execution.<br>
- Use the polynomial kernel function.<br>
- Check the accuracy of the classifier.<br>
- Re-create the classifier, now using two attributes (e.g. Area and MajorAxisLength) with the linear, polynomial and sigmoidal kernel functions.<br>
- Repeat the splitting and scaling steps for the two-attribute set.<br>
- Check the accuracy of the classifiers.<br>
- Plot the results obtained.<br>
- What was the best performance you got?<br>

The dataset contains the attributes described below, and is available at the link below:

https://raw.githubusercontent.com/higoramario/univesp-com410-aprendizado-de-maquinas/main/rice-classification.csv

The dataset for this exercise is based on the Rice type classification dataset, which is at:
https://www.kaggle.com/datasets/mssmartypants/rice-type-classification

### Attributes:

- id: object identifier<br>
- Area: grain area<br>
- MajorAxisLength: length of the major axis<br>
- MinorAxisLength: length of the minor axis<br>
- Eccentricity: eccentricity<br>
- ConvexArea: convex area<br>
- EquivDiameter: equivalent diameter<br>
- Extent: extent<br>
- Perimeter: perimeter<br>
- Roundness: roundness<br>
- AspectRation: aspect ratio (the column name is spelled this way in the dataset)<br>
- Class (target attribute): rice type (1 - Jasmine, 0 - Gonen)<br>

### Libraries: 
scikit-learn<br>
pandas<br>
numpy<br>
matplotlib<br>

### 1. Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
plt.rcParams['figure.figsize']=[15,10]

### 2. Data

In [None]:
url = 'https://raw.githubusercontent.com/higoramario/univesp-com410-aprendizado-de-maquinas/main/rice-classification.csv'
rice = pd.read_csv(url, sep=',')
rice.head()

### 3. Class attribute distribution.

In [None]:
rice['Class'].plot.hist()
plt.show()
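
The histogram gives a visual impression of the class balance; the exact counts can also be read with `value_counts`. The snippet below is a minimal sketch on a small hypothetical `classes_demo` series (a stand-in for `rice['Class']`, not the real data).

```python
import pandas as pd

# Hypothetical stand-in for rice['Class']; the real column comes from the CSV.
classes_demo = pd.Series([1, 0, 1, 1, 0, 1, 0, 1], name='Class')

# Absolute count and relative share of each class.
counts = classes_demo.value_counts()
shares = classes_demo.value_counts(normalize=True)

print(counts)
print(shares.round(3))
```

If one class dominates, accuracy alone can be misleading, so it is worth checking the balance before interpreting the results.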

### 4. Selecting all attributes

In [None]:
all_attributes = rice[['Area','MajorAxisLength','MinorAxisLength','Eccentricity','ConvexArea','EquivDiameter','Extent','Perimeter','Roundness','AspectRation']]

classes = rice['Class']

### 5. Dividing the data between training (90%) and testing (10%).

In [None]:
training_all_rice, testing_all_rice, classes_treining_all, classes_testing_all = train_test_split(all_attributes, classes, test_size = 0.1)
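
Because `train_test_split` shuffles the rows at random, each run produces a different split. The following sketch, which uses synthetic data and hypothetical variable names (`X_demo`, `y_demo`), shows one possible way to make the split reproducible with `random_state` and to keep the class proportions in both sets with `stratify`. It is an optional variation, not what the cell above does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 samples, 2 features, 40% / 60% classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = np.array([0] * 80 + [1] * 120)

# random_state fixes the shuffle; stratify keeps the class ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42, stratify=y_demo)

print(X_train.shape, X_test.shape)
print('Share of class 1 in the test set:', y_test.mean())
```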

### 6. Scaling the data to speed up model training and the plotting of the results.

In [None]:
scaler = StandardScaler()
training_all_rice = scaler.fit_transform(training_all_rice)
testing_all_rice = scaler.transform(testing_all_rice)
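
The scaler is fitted on the training set only and then applied to the test set, so the test data does not influence the scaling statistics. A minimal sketch with made-up numbers (the arrays `train_demo` and `test_demo` are hypothetical) illustrates the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values on very different scales (assumption, not the rice data).
train_demo = np.array([[1000.0, 50.0], [2000.0, 60.0], [3000.0, 70.0]])
test_demo = np.array([[2500.0, 55.0]])

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train_demo)  # learns mean and std from training data
test_scaled = scaler_demo.transform(test_demo)        # reuses the training mean and std

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column
print(test_scaled)
```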

### 7. Training the model to generate the classifier using the polynomial kernel function.

The following `SVC` parameters can be tuned to improve the model:

C: regularization parameter; a large C penalizes misclassified training points more heavily and gives a narrower margin, while a small C tolerates more errors and gives a wider margin

gamma: kernel coefficient that scales the dot product in the polynomial and sigmoidal functions

coef0: kappa parameter, the independent constant term in the polynomial and sigmoidal functions

degree: degree of the polynomial function
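
To make the role of these parameters concrete, the sketch below evaluates both kernels by hand for a single pair of points and compares the result with scikit-learn's pairwise kernel helpers. scikit-learn documents the polynomial kernel as `(gamma * <x, z> + coef0) ** degree` and the sigmoidal kernel as `tanh(gamma * <x, z> + coef0)`. The points `x`, `z` and the numeric parameter values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, sigmoid_kernel

# Two arbitrary points and illustrative parameter values (assumptions).
x = np.array([[1.0, 2.0]])
z = np.array([[0.5, -1.0]])
gamma, coef0, degree = 0.5, 2.0, 3

dot = (x @ z.T).item()  # dot product <x, z>

# Kernels computed by hand from the formulas above.
poly_manual = (gamma * dot + coef0) ** degree
sig_manual = np.tanh(gamma * dot + coef0)

# The same kernels computed by scikit-learn.
poly_sklearn = polynomial_kernel(x, z, degree=degree, gamma=gamma, coef0=coef0)[0, 0]
sig_sklearn = sigmoid_kernel(x, z, gamma=gamma, coef0=coef0)[0, 0]

print(poly_manual, poly_sklearn)
print(sig_manual, sig_sklearn)
```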

In [None]:
SVM_polynomial_all = SVC(kernel = 'poly', degree = 3, gamma = 'scale', C = 1.0, coef0 = 2)

SVM_polynomial_all.fit(training_all_rice, classes_treining_all)

### 8. Checking the classification accuracy, which may vary from run to run because the train/test split is random.

In [None]:
predict_all = SVM_polynomial_all.predict(testing_all_rice)

accuracy_all = accuracy_score(classes_testing_all,predict_all)

# The .1% format multiplies by 100 and avoids floating-point noise such as 98.90000000000001
print(f'Classification accuracy with all attributes: {accuracy_all:.1%}')
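
Since a single split gives an accuracy that depends on which rows land in the test set, one way to get a steadier estimate is k-fold cross-validation. The sketch below is an optional extension, not part of the original exercise: it uses synthetic data from `make_classification` (an assumption, standing in for the rice attributes) and wraps the scaler and the classifier in a pipeline so that scaling is re-fitted on each training fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 10 rice attributes (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0)

# Same kernel settings as the classifier above, with scaling inside the pipeline.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel='poly', degree=3, gamma='scale', C=1.0, coef0=2))

# 5-fold cross-validation: five accuracy values, one per held-out fold.
scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy')

print('Accuracy per fold:', np.round(scores, 3))
print(f'Mean accuracy: {scores.mean():.1%} (std {scores.std():.1%})')
```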

### 9. Creating a new set with just two attributes to plot the results.

In [None]:
attributes = rice[['Area','MajorAxisLength']]

classes = rice['Class']

### 10. Separating the set into training and testing.

In [None]:
training_rice, testing_rice, classes_training, classes_testing = train_test_split(attributes, classes, test_size = 0.1)

### 11. Scaling the data to speed up model training and the plotting of the results.

In [None]:
scaler = StandardScaler()

training_rice = scaler.fit_transform(training_rice)

testing_rice = scaler.transform(testing_rice)

### 12. Generating the linear classifier.

In [None]:
SVM_linear = SVC(kernel = 'linear', C = 0.2)

SVM_linear.fit(training_rice, classes_training)

### 13. Checking classification accuracy.

In [None]:
linear_prediction = SVM_linear.predict(testing_rice)

linear_accuracy = accuracy_score(classes_testing,linear_prediction)

print(f'Linear SVM classification accuracy with two attributes: {linear_accuracy:.1%}')

### 14. Plotting the result to see how the model classifies the test set.
### - Create a function that uses numpy's meshgrid, which builds a rectangular grid of points, and matplotlib's contourf, which draws filled contours, to color the region assigned to each class.

In [None]:
# Function that receives the scaled attributes and classes of the test set and the SVM classifier, and plots the decision regions.

def viewSVM(t_attributes, classes_t, classifier):
    attributes, classes = t_attributes, np.asarray(classes_t)
    colors = ListedColormap(('red', 'green'))

    # Rectangular grid covering the range of both attributes, with a margin of 1 on each side
    x1, x2 = np.meshgrid(np.arange(start = attributes[:, 0].min() - 1, stop = attributes[:, 0].max() + 1, step = 0.01),
                         np.arange(start = attributes[:, 1].min() - 1, stop = attributes[:, 1].max() + 1, step = 0.01))

    # Predict the class of every grid point and fill each decision region with its color
    plt.contourf(x1, x2, classifier.predict(np.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
                 alpha = 0.75, cmap = colors)
    plt.xlim(x1.min(), x1.max())
    plt.ylim(x2.min(), x2.max())

    # Overlay the test points, one color per class
    for i, j in enumerate(np.unique(classes)):
        plt.scatter(attributes[classes == j, 0], attributes[classes == j, 1],
                    color = colors(i), label = j)

    # Labels, legend and display go after the loop so that both classes appear in one figure
    plt.title('SVM classification')
    plt.xlabel('Area (scaled)')
    plt.ylabel('MajorAxisLength (scaled)')
    plt.legend()
    plt.show()
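
The grid logic inside `viewSVM` can be hard to follow at full resolution. The sketch below reproduces it on a tiny grid, with a simple threshold rule standing in for `classifier.predict` (that rule and all variable names here are hypothetical and used only for illustration).

```python
import numpy as np

# Coordinates along each axis of a tiny grid.
xs = np.arange(0.0, 1.0, 0.5)  # [0.0, 0.5]
ys = np.arange(0.0, 1.5, 0.5)  # [0.0, 0.5, 1.0]

# meshgrid returns two arrays of shape (len(ys), len(xs)) holding every coordinate pair.
xx, yy = np.meshgrid(xs, ys)

# Flatten the grid into a list of (x, y) points, the layout a classifier's predict expects.
grid_points = np.column_stack([xx.ravel(), yy.ravel()])

# Hypothetical stand-in for classifier.predict: class 1 when x + y > 0.75.
labels = (grid_points[:, 0] + grid_points[:, 1] > 0.75).astype(int)

# Reshape the predictions back to the grid shape, which is what contourf needs.
label_grid = labels.reshape(xx.shape)

print(xx.shape, grid_points.shape)
print(label_grid)
```

With a step of 0.01, as in `viewSVM`, the same procedure produces a grid fine enough for the filled regions to look smooth.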

### 15. Plotting the linear classifier.

In [None]:
viewSVM(testing_rice, classes_testing, SVM_linear)

### 16. Train the model to generate the classifier using the polynomial kernel function.

In [None]:
SVM_polynomial = SVC(kernel = 'poly', degree = 3, gamma = 'scale', C = 1.0, coef0 = 2)
SVM_polynomial.fit(training_rice, classes_training)

### 17. Checking the classification accuracy of the polynomial SVM.

In [None]:
poly_prediction = SVM_polynomial.predict(testing_rice)
poly_accuracy = accuracy_score(classes_testing,poly_prediction)
print(f'Classification accuracy of polynomial SVM with two attributes: {poly_accuracy:.1%}')

### 18. Plot the classifier using the polynomial function.

In [None]:
viewSVM(testing_rice, classes_testing, SVM_polynomial)

### 19. Train the model to generate the classifier using the sigmoidal kernel function.

In [None]:
SVM_sigmoidal = SVC(kernel = 'sigmoid', gamma = 'scale', C = 2, coef0 = 2)
SVM_sigmoidal.fit(training_rice, classes_training)

### 20. Checking the classification accuracy of the sigmoidal SVM.

In [None]:
sigmoidal_prediction = SVM_sigmoidal.predict(testing_rice)
sigmoidal_accuracy = accuracy_score(classes_testing,sigmoidal_prediction)

print(f'Classification accuracy of sigmoidal SVM with two attributes: {sigmoidal_accuracy:.1%}')

### 21. Plot the classifier using the sigmoidal function.

In [None]:
viewSVM(testing_rice, classes_testing, SVM_sigmoidal)

### Conclusion

The results show that the generated classifiers perform well on this dataset, even when only two of the attributes are used. The sigmoidal kernel is slightly less accurate than the other two and also takes longer to run.
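
One way to check such a comparison is to train the three kernels on the same split and record both accuracy and fitting time. The sketch below does this on synthetic two-feature data from `make_classification` (an assumption, standing in for Area and MajorAxisLength), so its numbers say nothing about the rice dataset; the `results` dictionary and the other names are hypothetical.

```python
import time
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-feature data (assumption).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=2, n_informative=2, n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=0, stratify=y_demo)

# Fit the scaler on the training set only, then apply it to both sets.
scaler_demo = StandardScaler()
X_tr = scaler_demo.fit_transform(X_tr)
X_te = scaler_demo.transform(X_te)

# Same kernel settings as in the notebook cells.
kernels = {
    'linear': SVC(kernel='linear', C=0.2),
    'poly': SVC(kernel='poly', degree=3, gamma='scale', C=1.0, coef0=2),
    'sigmoid': SVC(kernel='sigmoid', gamma='scale', C=2, coef0=2),
}

results = {}
for name, clf in kernels.items():
    start = time.perf_counter()
    clf.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - start
    accuracy = accuracy_score(y_te, clf.predict(X_te))
    results[name] = (accuracy, elapsed)
    print(f'{name}: accuracy {accuracy:.1%}, fit time {elapsed:.4f} s')
```

Timings from a single run are noisy, so repeating the loop several times gives a fairer picture of which kernel is slower.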