
# **Experiment 1: Extract the data from a database using Python.**

**Dataset Used: Iris Dataset**

**Load the dataset**

In [3]:
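import pandas as pd

# Path to the Iris CSV file uploaded to the Colab session
file_path = '/content/IRIS.csv'

# Read the CSV into a DataFrame
df = pd.read_csv(file_path)

# Preview the first five rows
print(df.head())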


   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
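
The experiment title mentions a database, while the cell above reads a CSV file. The following is a minimal sketch of the database route, assuming an in-memory SQLite database and a hypothetical table named `iris` seeded with the five rows printed above. Neither the table name nor the use of SQLite comes from the original notebook.

```python
import sqlite3

import pandas as pd

# The five rows shown by df.head() above, used to seed a throwaway database.
rows = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
    (4.7, 3.2, 1.3, 0.2, "Iris-setosa"),
    (4.6, 3.1, 1.5, 0.2, "Iris-setosa"),
    (5.0, 3.6, 1.4, 0.2, "Iris-setosa"),
]

# In-memory SQLite database; the table name `iris` is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE iris ("
    "sepal_length REAL, sepal_width REAL, "
    "petal_length REAL, petal_width REAL, species TEXT)"
)
conn.executemany("INSERT INTO iris VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Pull the table back into a DataFrame with an ordinary SQL query.
df_sql = pd.read_sql_query("SELECT * FROM iris", conn)
conn.close()

print(df_sql.head())
```

For a real database, only the connection object and the query would change; the resulting DataFrame can be used exactly like the one loaded from the CSV.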


# **Experiment 2: Apply data preprocessing techniques to make data suitable for Machine Learning**

## **1. Check Structure**

In [4]:
df.shape

(150, 5)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [6]:
df.describe()

       sepal_length  sepal_width  petal_length  petal_width
count         150.0        150.0         150.0        150.0
mean       5.843333        3.054      3.758667     1.198667
std        0.828066     0.433594       1.76442     0.763161
min             4.3          2.0           1.0          0.1
25%             5.1          2.8           1.6          0.3
50%             5.8          3.0          4.35          1.3
75%             6.4          3.3           5.1          1.8
max             7.9          4.4           6.9          2.5


In [7]:
df.isnull().sum()

sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
species         0


## **2. Drop Duplicates**

In [8]:
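# Count duplicate rows (not displayed, because only a cell's last expression is shown)
df.duplicated().sum()
# Drop the duplicates in place, keeping the first occurrence of each row
df.drop_duplicates(inplace=True)
df.shape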

(147, 5)
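
The shape drops from 150 to 147 rows, so three rows were removed. The sketch below, on a small hypothetical frame rather than the Iris data, shows how the duplicate count can be printed explicitly and how `reset_index(drop=True)` renumbers the rows afterwards, since `drop_duplicates` keeps the original index labels.

```python
import pandas as pd

# Toy frame (hypothetical values) with one exact duplicate of row 0.
toy = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 5.1, 4.7],
        "species": ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa"],
    }
)

# duplicated() flags every repeat of an earlier row (keep="first" by default).
n_dupes = int(toy.duplicated().sum())
print("Duplicate rows:", n_dupes)

# drop_duplicates() keeps the original index labels, so label 2 goes missing.
deduped = toy.drop_duplicates()
labels_before = deduped.index.tolist()
print(labels_before)

# reset_index(drop=True) renumbers the rows 0..n-1.
deduped = deduped.reset_index(drop=True)
labels_after = deduped.index.tolist()
print(labels_after)
```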

## **3. Encoding Categorical Data**

In [10]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['species'] = le.fit_transform(df['species'])
df


     sepal_length  sepal_width  petal_length  petal_width  species
0             5.1          3.5           1.4          0.2        0
1             4.9          3.0           1.4          0.2        0
2             4.7          3.2           1.3          0.2        0
3             4.6          3.1           1.5          0.2        0
4             5.0          3.6           1.4          0.2        0
...           ...          ...           ...          ...      ...
145           6.7          3.0           5.2          2.3        2
146           6.3          2.5           5.0          1.9        2
147           6.5          3.0           5.2          2.0        2
148           6.2          3.4           5.4          2.3        2
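
After encoding, the `species` column holds only integers, so it helps to know which code stands for which name. The sketch below assumes the three species strings follow the `Iris-setosa` pattern seen earlier (`Iris-versicolor` and `Iris-virginica` are assumed names, not shown in the output above). It prints the mapping that `LabelEncoder` builds and converts codes back to names.

```python
from sklearn.preprocessing import LabelEncoder

# Assumed species names, deliberately out of alphabetical order.
species = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]

le = LabelEncoder()
codes = le.fit_transform(species)

# classes_ is sorted, and each class's position is its integer code.
mapping = {name: code for code, name in enumerate(le.classes_)}
print(mapping)

# inverse_transform maps the codes back to the original strings.
decoded = le.inverse_transform(codes).tolist()
print(decoded)
```

Because the codes are assigned in sorted order, they carry no numeric meaning beyond identifying the class.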


## **4. Feature and Target Selection**

In [11]:
print(df.columns.tolist())

['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']


In [14]:
x = df[['sepal_length','sepal_width','petal_length','petal_width']]
y = df['species']


## **5. Train-test split**

In [15]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

print("Training size:", x_train.shape)
print("Testing size:", x_test.shape)

Training size: (117, 4)
Testing size: (30, 4)
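
A plain random split does not guarantee that each class is equally represented in the test set. As a sketch of one alternative, the example below passes `stratify` to `train_test_split` on hypothetical toy arrays (`X_toy`, `y_toy`) so that both splits keep the same class proportions. The original notebook does not use this option.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 30 samples, 3 balanced classes of 10 each.
X_toy = np.arange(60, dtype=float).reshape(30, 2)
y_toy = np.repeat([0, 1, 2], 10)

# stratify=y_toy asks for the same class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

train_counts = np.bincount(y_tr).tolist()
test_counts = np.bincount(y_te).tolist()
print("Train class counts:", train_counts)
print("Test class counts:", test_counts)
```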


# **Experiment 3: Develop a Linear Regression Model for the dataset.**

In [16]:
print("x_train shape:", x_train.shape)
print("y_train shape:", y_train.shape)
print("x_test shape:", x_test.shape)
print("y_test shape:", y_test.shape)

x_train shape: (117, 4)
y_train shape: (117,)
x_test shape: (30, 4)
y_test shape: (30,)


In [17]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Initialize and train the model
model = LinearRegression()
model.fit(x_train, y_train)

# Predict on test set
y_pred = model.predict(x_test)

# Evaluate predictions
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)


Mean Squared Error: 0.05486571591818331
Coefficients: [-0.12720371 -0.02514952  0.26045115  0.55511277]
Intercept: 0.17446181364084545
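
Here the regression target is the encoded species (0, 1, 2), so the model outputs real numbers and the mean squared error measures how far they fall from the integer codes. One rough way to read such outputs as class labels is to round and clip them. The sketch below uses hypothetical prediction values, not the ones produced by the model above.

```python
import numpy as np

# Hypothetical continuous outputs of a regressor and the true class codes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_cont = np.array([-0.08, 0.12, 0.95, 1.62, 1.88, 2.31])

# Round to the nearest integer and clip into the valid label range 0..2.
y_label = np.clip(np.rint(y_cont), 0, 2).astype(int)

# MSE uses the raw outputs; accuracy uses the rounded labels.
mse = float(np.mean((y_true - y_cont) ** 2))
accuracy = float(np.mean(y_label == y_true))

print("Rounded labels:", y_label.tolist())
print("MSE:", round(mse, 4))
print("Accuracy:", round(accuracy, 4))
```

This treats the class codes as if they were ordered quantities, which is a simplification; the Naive Bayes classifiers in Experiment 5 model the classes directly.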


# **Experiment 5: Naive Bayes**

In [18]:
from sklearn.naive_bayes import GaussianNB
# Gaussian Naive Bayes
model = GaussianNB()
model.fit(x_train, y_train)
print("GaussianNB Accuracy:", model.score(x_test, y_test))

GaussianNB Accuracy: 0.9666666666666667


In [19]:
from sklearn.naive_bayes import MultinomialNB
# Multinomial Naive Bayes
model = MultinomialNB()
model.fit(x_train, y_train)
print("MultinomialNB Accuracy:", model.score(x_test, y_test))

MultinomialNB Accuracy: 0.9333333333333333


In [20]:
from sklearn.naive_bayes import BernoulliNB
# Bernoulli Naive Bayes
model = BernoulliNB()
model.fit(x_train, y_train)
print("BernoulliNB Accuracy:", model.score(x_test, y_test))

BernoulliNB Accuracy: 0.3333333333333333
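
A likely reason for the low score is that `BernoulliNB` binarizes its inputs with a default threshold of `binarize=0.0`. Every Iris measurement is positive, so every feature becomes 1 and the rows become indistinguishable. The sketch below illustrates this on hypothetical toy data and shows the effect of a threshold inside the data range; the value `1.0` is chosen only for this toy example.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy data (hypothetical values): strictly positive measurements, two classes.
X_toy = np.array([
    [0.5, 0.2], [0.6, 0.3], [0.4, 0.2],               # class 0
    [4.7, 1.4], [4.5, 1.5], [4.9, 1.5], [5.1, 1.8],   # class 1
])
y_toy = np.array([0, 0, 0, 1, 1, 1, 1])

# Default binarize=0.0: every value is > 0, so every row becomes [1, 1].
default_preds = BernoulliNB().fit(X_toy, y_toy).predict(X_toy)
print("binarize=0.0:", default_preds.tolist())

# A threshold inside the data range keeps some information after binarizing.
thresh_preds = BernoulliNB(binarize=1.0).fit(X_toy, y_toy).predict(X_toy)
print("binarize=1.0:", thresh_preds.tolist())
```

With the default threshold the model can only fall back on class frequencies, so it predicts a single class for every row.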


In [21]:
# Evaluation of the most recently fitted model (`model` is the BernoulliNB from the previous cell)
from sklearn.metrics import classification_report, accuracy_score
y_pred = model.predict(x_test)
print(classification_report(y_test, y_pred, zero_division=0))
print("Accuracy Score:", accuracy_score(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.00      0.00      0.00        11
           1       0.33      1.00      0.50        10
           2       0.00      0.00      0.00         9

    accuracy                           0.33        30
   macro avg       0.11      0.33      0.17        30
weighted avg       0.11      0.33      0.17        30

Accuracy Score: 0.3333333333333333
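
Because each cell reassigns `model`, the report above describes only the last classifier that was fitted. A minimal sketch of comparing all three variants side by side follows. It assumes scikit-learn's bundled `load_iris` data as a stand-in for `IRIS.csv`, so the scores it prints need not match the ones above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Assumption: scikit-learn's bundled copy of Iris stands in for IRIS.csv.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# One estimator per name, so no fitted model overwrites another.
models = {
    "GaussianNB": GaussianNB(),
    "MultinomialNB": MultinomialNB(),
    "BernoulliNB": BernoulliNB(),
}

scores = {}
for name, estimator in models.items():
    estimator.fit(X_tr, y_tr)
    scores[name] = estimator.score(X_te, y_te)
    print(f"{name} accuracy: {scores[name]:.3f}")
```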
