---
**Iris Flower Dataset**

The `Iris flower dataset`, also known as Fisher's Iris data set, is a classic data set in statistics and machine learning. Ronald Fisher introduced it in 1936, using measurements collected by Edgar Anderson.

The data set contains 50 samples from each of three Iris species:
- `Iris setosa`
- `Iris versicolor`
- `Iris virginica`

For each sample, four features were measured in centimeters:
- `sepal length`
- `sepal width`
- `petal length`
- `petal width`

You can load this dataset with `load_iris` from `sklearn.datasets`.

In [None]:
from sklearn.datasets import load_iris
import pandas as pd

# Load the dataset
iris = load_iris()
print(f"{type(iris) = }\n")
print(f"Dataset features: {iris.feature_names}\n")  # Feature names
print(f"Target Species  : {iris.target_names}\n")  # Species names
print(f"Data:\n{iris.data[0:10]}\n")  # First 10 rows of measurements
print(f"Species:\n{iris.target[0:10]}\n")  # First 10 species codes (0 = setosa)

type(iris) = <class 'sklearn.utils._bunch.Bunch'>

Dataset features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Target Species  : ['setosa' 'versicolor' 'virginica']

Data:
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]]

Species:
[0 0 0 0 0 0 0 0 0 0]



---
The data about each flower is divided into two categories:
- `features`: [sepal length, sepal width, petal length, petal width]
- `target`: [species]

Let's combine them into a single DataFrame.

In [34]:
# Convert the dataset into a dataframe
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
print(f"{type(iris_df) = }\n")
print(f"Dataset size = {iris_df.shape[0]} rows & {iris_df.shape[1]} columns.")
print("50 of these rows are for Iris setosa, 50 are for Iris versicolor, 50 are for Iris virginica")
iris_df.head()

type(iris_df) = <class 'pandas.core.frame.DataFrame'>

Dataset size = 150 rows & 5 columns.
50 of these rows are for Iris setosa, 50 are for Iris versicolor, 50 are for Iris virginica


| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
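
The per-species row counts printed above are hard-coded in the message. As a small sketch (the name `species_counts` is introduced here only for illustration), the same numbers can be read from the DataFrame itself:

```python
from sklearn.datasets import load_iris
import pandas as pd

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Count the rows for each species instead of hard-coding the numbers
species_counts = iris_df["species"].value_counts()
print(species_counts)
```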


---
1. Create a new DataFrame containing only the rows for `Iris setosa`.

In [None]:
# write your code here

iris_setosa_df = None
iris_setosa_df.head()
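
If you get stuck, here is one possible approach, offered as a sketch rather than the only valid answer. It uses a boolean mask on the `species` column. The name `setosa_mask` and the call to `reset_index` are choices made for this example, not requirements of the exercise.

```python
from sklearn.datasets import load_iris
import pandas as pd

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# True for rows whose species is setosa, False otherwise
setosa_mask = iris_df["species"] == "setosa"

# Keep only the setosa rows and renumber them from 0
iris_setosa_df = iris_df[setosa_mask].reset_index(drop=True)
print(iris_setosa_df.shape)
```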

---
2. Create two NumPy arrays:
- the first containing the `sepal lengths` of Iris setosa
- the second containing the `sepal widths` of Iris setosa

In [None]:
# write your code here
import numpy as np

setosa_sepal_length = np.array()
setosa_sepal_width  = np.array()
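
Note that `np.array()` needs some data as its first argument, so the placeholders above raise an error until they are filled in. One possible way to complete them, assuming the column names shown in `iris.feature_names`, is to select each column and convert it with `to_numpy()`:

```python
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd

# Rebuild the setosa-only DataFrame so this cell runs on its own
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
iris_setosa_df = iris_df[iris_df["species"] == "setosa"]

# Select one column at a time and convert it to a NumPy array
setosa_sepal_length = iris_setosa_df["sepal length (cm)"].to_numpy()
setosa_sepal_width = iris_setosa_df["sepal width (cm)"].to_numpy()

print(type(setosa_sepal_length), setosa_sepal_length.shape)
print(type(setosa_sepal_width), setosa_sepal_width.shape)
```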

---
3. Draw a scatter plot with:
- X = `setosa_sepal_length`
- Y = `setosa_sepal_width`

In [None]:
# write your code here
import matplotlib.pyplot as plt

plt.scatter()
plt.show()
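
A minimal sketch of one way to draw the plot follows. The axis labels and title are additions made for this example, since the exercise only specifies what goes on X and Y.

```python
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Rebuild the two arrays so this cell runs on its own
iris = load_iris()
setosa_rows = iris.data[iris.target == 0]  # target code 0 is setosa
setosa_sepal_length = setosa_rows[:, 0]    # column 0: sepal length (cm)
setosa_sepal_width = setosa_rows[:, 1]     # column 1: sepal width (cm)

# Scatter plot with sepal length on X and sepal width on Y
fig, ax = plt.subplots()
ax.scatter(setosa_sepal_length, setosa_sepal_width)
ax.set_xlabel("sepal length (cm)")
ax.set_ylabel("sepal width (cm)")
ax.set_title("Iris setosa: sepal width vs. sepal length")
plt.show()
```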

---
4. Explain, with reasoning, whether it is a good idea to use a linear regression model to predict the sepal width of an Iris setosa flower from its sepal length.
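
To support your reasoning with numbers, one option (a sketch, not the expected answer) is to compute the Pearson correlation between the two arrays and fit a straight line by least squares with `np.polyfit`. The names `r`, `slope`, `intercept` and `r_squared` are introduced here for illustration. As a rough guide, a correlation close to 1 or -1, together with a scatter plot that shows no obvious curvature, supports a linear model, while a value near 0 does not.

```python
from sklearn.datasets import load_iris
import numpy as np

# Rebuild the two arrays so this cell runs on its own
iris = load_iris()
setosa_rows = iris.data[iris.target == 0]  # target code 0 is setosa
setosa_sepal_length = setosa_rows[:, 0]
setosa_sepal_width = setosa_rows[:, 1]

# Pearson correlation coefficient between length and width
r = np.corrcoef(setosa_sepal_length, setosa_sepal_width)[0, 1]

# Least-squares line: width is approximately slope * length + intercept
slope, intercept = np.polyfit(setosa_sepal_length, setosa_sepal_width, deg=1)

# For a one-variable linear fit, R^2 equals the squared correlation
r_squared = r ** 2

print(f"Pearson r = {r:.3f}")
print(f"R^2       = {r_squared:.3f}")
print(f"Fit: width = {slope:.3f} * length + {intercept:.3f}")
```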