# Task
Analyze the "IRIS.csv" dataset, train a classification model to predict the species of iris flowers, and evaluate the model's accuracy.

## Load data

### Subtask:
Load the iris dataset from the provided path into a pandas DataFrame.


**Reasoning**:
Import pandas, read `IRIS.csv` into a DataFrame, and preview the first rows with `df.head()`.



In [2]:
import pandas as pd

df = pd.read_csv('IRIS.csv')
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
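
A quick schema check right after loading can catch a misnamed or missing column before it surfaces later as a `KeyError`. The sketch below is illustrative only: it reads the five preview rows shown above from an in-memory string instead of `IRIS.csv`, and the `EXPECTED_COLUMNS` list and `check_columns` helper are hypothetical names, not part of the original notebook.

```python
import io

import pandas as pd

# The five preview rows from the output above, used as a stand-in for IRIS.csv.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
"""

# Hypothetical list of the columns the rest of the notebook relies on.
EXPECTED_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def check_columns(frame, expected):
    """Raise a ValueError naming any expected column that is absent."""
    missing = [col for col in expected if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return frame


sample_df = check_columns(pd.read_csv(io.StringIO(SAMPLE_CSV)), EXPECTED_COLUMNS)
print(sample_df.dtypes)
```

With the real file, the same helper could be applied to `df` directly after `pd.read_csv('IRIS.csv')`.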


## Explore and preprocess data

### Subtask:
Display the first 5 rows, check for missing values, and analyze the distribution of the species column.


**Reasoning**:
Preview the first five rows, count missing values per column with `isnull().sum()`, and check the class balance of the `species` column with `value_counts()`.



In [3]:
print("First 5 rows of the DataFrame:")
display(df.head())

print("\nMissing values in the DataFrame:")
display(df.isnull().sum())

print("\nDistribution of the 'species' column:")
display(df['species'].value_counts())

First 5 rows of the DataFrame:


Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa



Missing values in the DataFrame:


Unnamed: 0,0
sepal_length,0
sepal_width,0
petal_length,0
petal_width,0
species,0



Distribution of the 'species' column:


species,count
Iris-setosa,50
Iris-versicolor,50
Iris-virginica,50
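
Beyond missing values and class counts, a per-species summary of the features gives a first hint of how separable the classes are. The following is a minimal sketch, not part of the original notebook. It assumes scikit-learn's bundled `load_iris` data as a stand-in for `IRIS.csv` (the two may differ slightly) and renames its columns to match the names used here; `demo_df` and `summary` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for IRIS.csv: scikit-learn's bundled copy of the iris data.
iris = load_iris(as_frame=True)
demo_df = iris.data.copy()
demo_df.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Map integer targets (0, 1, 2) to labels in the same style as the CSV.
demo_df["species"] = ["Iris-" + iris.target_names[i] for i in iris.target]

# Mean of each measurement per species.
summary = demo_df.groupby("species").mean()
print(summary)
```

Large gaps between the per-species means of a feature suggest that feature will be useful to the classifier.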


## Prepare data for modeling

### Subtask:
Separate features (X) and target (y) variables, and split the data into training and testing sets.


**Reasoning**:
Use the four measurement columns as the features (X) and `species` as the target (y), then hold out 20% of the rows as a test set with `train_test_split`. Fixing `random_state=42` makes the split reproducible.



In [4]:
from sklearn.model_selection import train_test_split

X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = df['species']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Shape of X_train:", X_train.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of y_test:", y_test.shape)

Shape of X_train: (120, 4)
Shape of X_test: (30, 4)
Shape of y_train: (120,)
Shape of y_test: (30,)
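
The split above is random but not stratified, so the 30 test rows are not guaranteed to contain the three species in equal numbers. As a hedged alternative that the original notebook does not use, `train_test_split` accepts a `stratify` argument that preserves class proportions in both subsets. The sketch assumes `load_iris` as a stand-in for `IRIS.csv`; the `_demo`, `_tr` and `_te` names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in data: 150 rows, 4 features, integer class labels 0..2.
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo keeps the 50/50/50 class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Count test rows per class; with 50 rows per class and a 20% test
# split, each class contributes 10 rows.
print(np.bincount(y_te))  # [10 10 10]
```

With the notebook's own variables, the equivalent call would pass `stratify=y`.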


## Train a machine learning model

### Subtask:
Train a classification model, such as a Support Vector Machine (SVM), using the training data.


**Reasoning**:
Fit scikit-learn's `SVC` on the training data, keeping its default settings (an RBF kernel).



In [5]:
from sklearn.svm import SVC

svm_model = SVC()
svm_model.fit(X_train, y_train)
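
Calling `SVC()` with no arguments relies on scikit-learn's defaults: `kernel='rbf'`, `C=1.0` and `gamma='scale'`. The sketch below spells those defaults out and, as a common variation that is not part of the original notebook, standardizes the features first, since SVMs are sensitive to feature scale. It assumes `load_iris` as a stand-in for `IRIS.csv`; `scaled_svm` and the `_demo` names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data and the same 80/20 split parameters as the notebook.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Same hyperparameters as SVC(), written explicitly, preceded by scaling.
# The scaler is fitted on the training rows only, inside the pipeline.
scaled_svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
scaled_svm.fit(X_tr, y_tr)

demo_accuracy = scaled_svm.score(X_te, y_te)
print(f"Test accuracy with scaling: {demo_accuracy:.4f}")
```

Keeping the scaler inside the pipeline means the test rows never influence the scaling statistics.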

## Evaluate the model

### Subtask:
Evaluate the performance of the trained model on the testing data using metrics like accuracy.


**Reasoning**:
Predict the species for the held-out test set and compare the predictions against the true labels with `accuracy_score`.



In [6]:
from sklearn.metrics import accuracy_score

y_pred = svm_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy of the SVM model on the test set: {accuracy:.4f}")

Accuracy of the SVM model on the test set: 1.0000
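
A perfect score on 30 test rows is encouraging but coarse: a single misclassified flower would lower accuracy by about 3.3 percentage points, and the result reflects one particular split. The sketch below, which is not part of the original notebook, estimates accuracy with 5-fold cross-validation so that every row is used for testing exactly once. It assumes `load_iris` as a stand-in for `IRIS.csv`; `cv_scores` is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in data: 150 rows, 4 features, integer class labels.
X_demo, y_demo = load_iris(return_X_y=True)

# For classifiers, cv=5 uses stratified folds: 5 fits, each scored on
# a different 30-row fold.
cv_scores = cross_val_score(SVC(), X_demo, y_demo, cv=5, scoring="accuracy")

print("Per-fold accuracy:", cv_scores.round(4))
print(f"Mean accuracy: {cv_scores.mean():.4f} (std {cv_scores.std():.4f})")
```

The spread across folds gives a rough sense of how much the single-split accuracy could vary.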
