# Iris

### Introduction:

This exercise may seem a little strange, but stick with it.

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd


### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data). 

### Step 3. Assign it to a variable called iris

In [2]:
iris = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")

In [3]:
iris

| | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
|---|---|---|---|---|---|
| 0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 3 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 144 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 145 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 146 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 147 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
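
The header of the output above (`5.1`, `3.5`, ...) suggests that `read_csv` used the first record as column names, because the file has no header line. A minimal sketch of the difference, assuming a small inline sample instead of the URL so that it runs offline; `sample`, `with_default` and `with_names` are illustrative names, not part of the exercise.

```python
import io
import pandas as pd

# Three header-less records in the same layout as iris.data (inline sample, no network needed).
sample = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# Default behaviour: the first record is consumed as the header row.
with_default = pd.read_csv(io.StringIO(sample))

# header=None keeps every record as data; names= supplies the column labels.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
with_names = pd.read_csv(io.StringIO(sample), header=None, names=columns)

print(len(with_default), len(with_names))
```

With `header=None` no record is lost to the header, so the row count matches the number of lines in the file.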


### Step 4. Create columns for the dataset

In [6]:
# 1. sepal_length (in cm)
# 2. sepal_width (in cm)
# 3. petal_length (in cm)
# 4. petal_width (in cm)
# 5. class

iris.columns = ["sepal_length","sepal_width","petal_length","petal_width","class"]  

In [7]:
iris

| | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 3 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 144 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 145 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 146 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 147 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |


### Step 5. Are there any missing values in the DataFrame?

In [14]:
iris.isna().astype(int).value_counts()

# No, there are no NaN values.

sepal_length  sepal_width  petal_length  petal_width  class
0             0            0             0            0        149
Name: count, dtype: int64
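
A more common way to answer the same question is to count the missing cells per column, or to reduce everything to a single boolean. The sketch below uses a small made-up frame (`df` and its values are hypothetical, not the iris data) to show both.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical values, not the iris data).
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

# Number of missing cells in each column.
missing_per_column = df.isna().sum()

# True if at least one cell anywhere in the frame is missing.
has_any_missing = df.isna().any().any()

print(missing_per_column)
print(has_any_missing)
```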

### Step 6. Let's set the values of rows 10 to 29 of the column 'petal_length' to NaN

In [16]:
import numpy as np 

In [20]:
# Assign in one step through .loc (chained assignment stops working under Copy-on-Write in pandas 3.0).
# Label slices include both endpoints, so 10:29 covers rows 10 to 29; np.NaN was removed in NumPy 2.0.
iris.loc[10:29, "petal_length"] = np.nan
iris.iloc[10:30]

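| | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 10 | 4.8 | 3.4 | NaN | 0.2 | Iris-setosa |
| 11 | 4.8 | 3.0 | NaN | 0.1 | Iris-setosa |
| 12 | 4.3 | 3.0 | NaN | 0.1 | Iris-setosa |
| 13 | 5.8 | 4.0 | NaN | 0.2 | Iris-setosa |
| 14 | 5.7 | 4.4 | NaN | 0.4 | Iris-setosa |
| 15 | 5.4 | 3.9 | NaN | 0.4 | Iris-setosa |
| 16 | 5.1 | 3.5 | NaN | 0.3 | Iris-setosa |
| 17 | 5.7 | 3.8 | NaN | 0.3 | Iris-setosa |
| 18 | 5.1 | 3.8 | NaN | 0.3 | Iris-setosa |
| 19 | 5.4 | 3.4 | NaN | 0.2 | Iris-setosa |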
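
pandas recommends assigning through `.loc` in a single step for this kind of update. One detail that is easy to miss is that `.loc` slices by label and includes the stop, while `.iloc` slices by position and excludes it. A minimal sketch on a hypothetical 40-row stand-in frame (`df` is not the iris data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the iris frame, with a default 0..39 integer index.
df = pd.DataFrame({"petal_length": np.arange(40, dtype=float)})

# .loc is label-based and includes both endpoints: labels 10 through 29.
df.loc[10:29, "petal_length"] = np.nan

# .iloc is position-based and excludes the stop: positions 10 through 29.
nan_rows = df.iloc[10:30]

print(int(df["petal_length"].isna().sum()))
```

Both slices address the same 20 rows here only because the index labels coincide with the row positions.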


### Step 7. Good, now let's replace the NaN values with 1.0

In [21]:
# Reassign the filled column; calling fillna(inplace=True) on a selected column will not update iris in pandas 3.0.
iris["petal_length"] = iris["petal_length"].fillna(1.0)


In [22]:
iris.petal_length[10:30]

10    1.0
11    1.0
12    1.0
13    1.0
14    1.0
15    1.0
16    1.0
17    1.0
18    1.0
19    1.0
20    1.0
21    1.0
22    1.0
23    1.0
24    1.0
25    1.0
26    1.0
27    1.0
28    1.0
29    1.0
Name: petal_length, dtype: float64
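
Filling a single column can be written either by reassigning the column or by passing a `{column: value}` mapping to `DataFrame.fillna`. The sketch below compares the two on a hypothetical three-row frame; the names `by_column` and `by_mapping` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with NaN in two columns.
df = pd.DataFrame(
    {"petal_length": [1.4, np.nan, 1.3], "petal_width": [0.2, np.nan, 0.2]}
)

# Option 1: reassign the column with the filled result.
by_column = df.copy()
by_column["petal_length"] = by_column["petal_length"].fillna(1.0)

# Option 2: pass a {column: value} mapping so that only that column is filled.
by_mapping = df.fillna({"petal_length": 1.0})

print(by_mapping)
```

In both cases `petal_width` keeps its NaN, since only `petal_length` is targeted.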

### Step 8. Now let's delete the column class

In [23]:
iris.drop(columns="class",inplace=True)
iris

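| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 4.9 | 3.0 | 1.4 | 0.2 |
| 1 | 4.7 | 3.2 | 1.3 | 0.2 |
| 2 | 4.6 | 3.1 | 1.5 | 0.2 |
| 3 | 5.0 | 3.6 | 1.4 | 0.2 |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 |
| ... | ... | ... | ... | ... |
| 144 | 6.7 | 3.0 | 5.2 | 2.3 |
| 145 | 6.3 | 2.5 | 5.0 | 1.9 |
| 146 | 6.5 | 3.0 | 5.2 | 2.0 |
| 147 | 6.2 | 3.4 | 5.4 | 2.3 |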
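
Without `inplace=True`, `drop` leaves the original frame untouched and returns a new one, which is often easier to reason about. A small sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame with a label column.
df = pd.DataFrame(
    {"petal_width": [0.2, 0.2], "class": ["Iris-setosa", "Iris-setosa"]}
)

# drop returns a new frame; df itself still has the "class" column.
without_class = df.drop(columns="class")

print(list(df.columns), list(without_class.columns))
```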


### Step 9. Set the first 3 rows to NaN

In [31]:
# Assign NaN directly; the ffill(inplace=True) call only worked because it returns None.
iris.iloc[:3] = np.nan

In [32]:
iris.head()

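| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN |
| 3 | 5.0 | 3.6 | 1.4 | 0.2 |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 |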
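
Blanking whole rows needs nothing more than a positional slice and a scalar, which pandas broadcasts to every selected cell. A minimal sketch, assuming an all-float hypothetical frame (`df` is not the iris data):

```python
import numpy as np
import pandas as pd

# Hypothetical all-float frame with five rows.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [0.1, 0.2, 0.3, 0.4, 0.5]}
)

# The positional slice :3 selects the first three rows; the scalar fills every cell in them.
df.iloc[:3] = np.nan

print(df)
```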


### Step 10.  Delete the rows that have NaN

In [33]:
# how="any" drops every row that contains at least one NaN, as the step asks.
iris.dropna(how="any", inplace=True)
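
`dropna` can remove rows that contain any NaN or only rows that are NaN in every column. For the frame in this exercise both give the same result, because the only rows with NaN are the first three and they are empty in every column. The sketch below shows where the two options differ, using a hypothetical three-row frame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is entirely NaN, row 1 is partly NaN, row 2 is complete.
df = pd.DataFrame({"a": [np.nan, 1.0, 2.0], "b": [np.nan, np.nan, 3.0]})

dropped_all = df.dropna(how="all")  # removes only row 0
dropped_any = df.dropna(how="any")  # removes rows 0 and 1

print(list(dropped_all.index), list(dropped_any.index))
```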

### Step 11. Reset the index so it begins with 0 again

In [34]:
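# drop=True discards the old labels; reassign because reset_index returns a new DataFrame.
iris = iris.reset_index(drop=True)
iris

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.0 | 3.6 | 1.4 | 0.2 |
| 1 | 5.4 | 3.9 | 1.7 | 0.4 |
| 2 | 4.6 | 3.4 | 1.4 | 0.3 |
| 3 | 5.0 | 3.4 | 1.5 | 0.2 |
| 4 | 4.4 | 2.9 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 141 | 6.7 | 3.0 | 5.2 | 2.3 |
| 142 | 6.3 | 2.5 | 5.0 | 1.9 |
| 143 | 6.5 | 3.0 | 5.2 | 2.0 |
| 144 | 6.2 | 3.4 | 5.4 | 2.3 |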
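
`reset_index` returns a new DataFrame rather than modifying the original, and by default it stores the old labels in a column named `index`; passing `drop=True` discards them instead. A minimal sketch on a hypothetical frame whose index starts at 3:

```python
import pandas as pd

# Hypothetical frame whose index starts at 3, as after dropping the first rows.
df = pd.DataFrame({"sepal_length": [5.0, 5.4, 4.6]}, index=[3, 4, 5])

kept = df.reset_index()              # old labels are kept in a new "index" column
dropped = df.reset_index(drop=True)  # old labels are discarded

print(list(kept.columns), list(dropped.columns))
```

Neither call changes `df` itself, so the result has to be assigned to a name to be kept.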


### BONUS: Create your own question and answer it.