## The Iris Dataset and Pandas

![Iris flowers](https://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Iris_versicolor_3.jpg/1280px-Iris_versicolor_3.jpg)

In [1]:
```python
import pandas as pd
```

The dataset can either be stored locally, in the same folder as this notebook, or read from an online resource, as below. This copy of the dataset comes from my final project in the first semester.

In [2]:
```python
df = pd.read_csv("https://raw.githubusercontent.com/kiehozero/pands-project/main/iris.csv")
```

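If the file is kept locally instead, `pd.read_csv` accepts a file path in the same way as the URL above. In this sketch the local `iris.csv` is an assumption: to keep the example self-contained, it writes three rows copied from the outputs in this notebook to a temporary folder and reads them back.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Three rows copied from the outputs in this notebook, standing in for a local iris.csv
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
4.7,3.2,1.3,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical local copy; in practice it would sit next to the notebook
    local_path = Path(folder) / "iris.csv"
    local_path.write_text(csv_text)
    # A path is read exactly like the URL used above
    local_df = pd.read_csv(local_path)

print(local_df.shape)  # (3, 5)
```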
The resulting DataFrame can be sliced by one or more columns, by a range of rows, or by both.

In [3]:
```python
df['species']
```

```
0         setosa
1         setosa
2         setosa
3         setosa
4         setosa
         ...    
145    virginica
146    virginica
147    virginica
148    virginica
149    virginica
Name: species, Length: 150, dtype: object
```

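In [4]:
```python
df[['sepal_length','species']]
```

|     | sepal_length | species |
| --- | --- | --- |
| 0 | 5.1 | setosa |
| 1 | 4.9 | setosa |
| 2 | 4.7 | setosa |
| 3 | 4.6 | setosa |
| 4 | 5.0 | setosa |
| ... | ... | ... |
| 145 | 6.7 | virginica |
| 146 | 6.3 | virginica |
| 147 | 6.5 | virginica |
| 148 | 6.2 | virginica |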


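In [5]:
```python
df[2:6]
```

|     | sepal_length | sepal_width | petal_length | petal_width | species |
| --- | --- | --- | --- | --- | --- |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |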


In [6]:
```python
df[['sepal_length','species']][2:6]
```

|     | sepal_length | species |
| --- | --- | --- |
| 2 | 4.7 | setosa |
| 3 | 4.6 | setosa |
| 4 | 5.0 | setosa |
| 5 | 5.4 | setosa |

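The three `[]` forms above return different kinds of object, and an integer slice counts positions rather than labels. A minimal sketch, using a hand-typed stand-in built from rows 2 to 6 shown above (original labels kept, so labels and positions no longer line up):

```python
import pandas as pd

# Hand-typed stand-in for df: rows 2-6 from the outputs above, labels kept
mini_df = pd.DataFrame(
    {
        "sepal_length": [4.7, 4.6, 5.0, 5.4, 4.6],
        "species": ["setosa"] * 5,
    },
    index=[2, 3, 4, 5, 6],
)

one_col = mini_df["species"]                         # single name -> Series
two_cols = mini_df[["sepal_length", "species"]]      # list of names -> DataFrame
some_rows = mini_df[1:3]                             # integer slice -> rows by position
chained = mini_df[["sepal_length", "species"]][1:3]  # columns first, then rows

print(type(one_col).__name__, type(two_cols).__name__)  # Series DataFrame
print(some_rows.index.tolist())                         # [3, 4]
```

Here `mini_df[1:3]` returns the rows labelled 3 and 4, because positions 1 and 2 hold those labels.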

### `loc`, `iloc` and `at`

`loc` selects by label, that is, by the index key of each row (and the name of each column). Standard Python slicing includes the start value and excludes the stop value, but `loc` treats both ends of a slice as labels and returns both, so `df.loc[2:6]` returns five rows where `df[2:6]` returned four.

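In [7]:
```python
df.loc[2:6]
```

|     | sepal_length | sepal_width | petal_length | petal_width | species |
| --- | --- | --- | --- | --- | --- |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |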

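In the full dataset the labels 0 to 149 coincide with the positions, which hides how `loc` and `iloc` differ. This sketch uses a few rows typed in from the outputs in this notebook, keeping their original labels so there is a gap between 6 and 50; it illustrates the slicing rules only and is not the full dataset.

```python
import pandas as pd

# A few rows from the outputs in this notebook, original labels kept.
# The jump from label 6 to label 50 makes labels and positions disagree.
sample = pd.DataFrame(
    {
        "sepal_length": [4.7, 4.6, 5.0, 5.4, 4.6, 7.0, 6.4],
        "species": ["setosa"] * 5 + ["versicolor"] * 2,
    },
    index=[2, 3, 4, 5, 6, 50, 51],
)

by_label = sample.loc[2:6]      # labels 2 through 6, both ends included
by_position = sample.iloc[2:6]  # positions 2 to 5, stop excluded

print(by_label.index.tolist())     # [2, 3, 4, 5, 6]
print(by_position.index.tolist())  # [4, 5, 6, 50]
```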

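You can assign such a selection to a new variable, but the new object keeps the index labels of the original rows rather than renumbering them from 0. Plain `[]` slicing on the result still counts positions, so `newdf[1:13]` returns positions 1 to 12, which carry the labels 20 to 31.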

In [8]:
```python
newdf = df.loc[19:43,'species']
```

In [9]:
```python
newdf[1:13]
```

```
20    setosa
21    setosa
22    setosa
23    setosa
24    setosa
25    setosa
26    setosa
27    setosa
28    setosa
29    setosa
30    setosa
31    setosa
Name: species, dtype: object
```

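The sketch below rebuilds a stand-in for `newdf` without downloading anything, relying on the fact that rows 0 to 49 of the dataset are all setosa. It contrasts positional `[]` slicing with label-based `loc` slicing, and shows `reset_index(drop=True)` as one way to renumber the labels from 0 if the original keys are not wanted.

```python
import pandas as pd

# Stand-in for df.loc[19:43, 'species']: rows 19-43 all fall in the setosa block
species_slice = pd.Series(["setosa"] * 25, index=range(19, 44), name="species")

print(species_slice.index[0])                 # 19: labels are kept from df
print(species_slice[1:3].index.tolist())      # [20, 21]: [] slicing counts positions
print(species_slice.loc[19:21].index.tolist())  # [19, 20, 21]: labels, both ends
print(species_slice.loc[1:3].empty)           # True: no labels between 1 and 3

# drop=True discards the old labels instead of keeping them as a column
renumbered = species_slice.reset_index(drop=True)
print(renumbered.index[0])                    # 0
```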
In contrast, `iloc` selects by integer position, counting from 0 regardless of the labels, and follows the usual slicing rule of including the start and excluding the stop. Passing a single position returns that row as a Series.

In [10]:
```python
df.iloc[2]
```

```
sepal_length       4.7
sepal_width        3.2
petal_length       1.3
petal_width        0.2
species         setosa
Name: 2, dtype: object
```

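`iloc` also accepts a column position as a second argument, and negative positions count back from the end. A sketch on a hand-typed subset of rows from this notebook, with the original labels kept so that a row's label and position differ:

```python
import pandas as pd

# Hand-typed subset of rows shown in this notebook (original labels kept)
sample = pd.DataFrame(
    {
        "sepal_length": [4.7, 4.6, 5.0, 5.4, 4.6, 7.0, 6.4, 6.3],
        "sepal_width": [3.2, 3.1, 3.6, 3.9, 3.4, 3.2, 3.2, 3.3],
        "petal_length": [1.3, 1.5, 1.4, 1.7, 1.4, 4.7, 4.5, 6.0],
        "petal_width": [0.2, 0.2, 0.2, 0.4, 0.3, 1.4, 1.5, 2.5],
        "species": ["setosa"] * 5 + ["versicolor"] * 2 + ["virginica"],
    },
    index=[2, 3, 4, 5, 6, 50, 51, 100],
)

row = sample.iloc[2]                 # third row, returned as a Series
cell = sample.iloc[2, 4]             # third row, fifth column ('species')
last_two = sample.iloc[-2:, [0, 4]]  # last two rows, first and last columns

print(row.name)                 # 4, the label at position 2
print(cell)                     # setosa
print(last_two.index.tolist())  # [51, 100]
```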
Both indexers also accept a step, plus a second argument for columns. Below, `1:12:2` takes every second row from position 1 up to, but not including, position 12, and `:` keeps every column.

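In [11]:
```python
df.iloc[1:12:2,:]
```

|     | sepal_length | sepal_width | petal_length | petal_width | species |
| --- | --- | --- | --- | --- | --- |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 11 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |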

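The step also works with `loc`, but it counts rows rather than label values, and a range of column names can be sliced as well, with both ends included. A sketch on a hand-typed subset of rows from this notebook (labels 2 to 6, 50, 51 and 100):

```python
import pandas as pd

# Hand-typed subset of rows shown in this notebook (original labels kept)
sample = pd.DataFrame(
    {
        "sepal_length": [4.7, 4.6, 5.0, 5.4, 4.6, 7.0, 6.4, 6.3],
        "sepal_width": [3.2, 3.1, 3.6, 3.9, 3.4, 3.2, 3.2, 3.3],
        "petal_length": [1.3, 1.5, 1.4, 1.7, 1.4, 4.7, 4.5, 6.0],
        "petal_width": [0.2, 0.2, 0.2, 0.4, 0.3, 1.4, 1.5, 2.5],
        "species": ["setosa"] * 5 + ["versicolor"] * 2 + ["virginica"],
    },
    index=[2, 3, 4, 5, 6, 50, 51, 100],
)

every_other = sample.iloc[::2]                          # positions 0, 2, 4, 6
label_step = sample.loc[2:51:2, "sepal_length"]         # labels 2 to 51, every 2nd row
col_range = sample.loc[:, "sepal_width":"petal_width"]  # inclusive column slice

print(every_other.index.tolist())  # [2, 4, 6, 51]
print(label_step.index.tolist())   # [2, 4, 6, 51]
print(col_range.columns.tolist())  # ['sepal_width', 'petal_length', 'petal_width']
```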

`at` accesses a single cell by row label and column name and returns the bare scalar value rather than a Series or DataFrame. It is intended for getting or setting one value at a time.

In [12]:
```python
df.at[2,'species']
```

```
'setosa'
```

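`loc` also returns a scalar when given one row label and one column name, but it returns a Series if the row is passed as a list; `iat` is the position-based counterpart of `at`. A sketch on two hand-typed rows from the outputs above:

```python
import pandas as pd

# Rows 2 and 3 as shown above, original labels kept
sample = pd.DataFrame(
    {"sepal_length": [4.7, 4.6], "species": ["setosa", "setosa"]},
    index=[2, 3],
)

by_label = sample.at[2, "species"]      # row label 2, column 'species'
by_position = sample.iat[0, 1]          # first row, second column: the same cell
with_loc = sample.loc[2, "species"]     # also a scalar
as_series = sample.loc[[2], "species"]  # a list of labels keeps the Series wrapper

print(by_label, by_position, with_loc)  # setosa setosa setosa
print(type(as_series).__name__)         # Series
```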
### Boolean selections

Much like a WHERE clause in SQL, a conditional expression can filter rows by their contents. On its own, comparing a column to a value returns a Boolean Series with one `True` or `False` per row.

In [13]:
```python
df.loc[:,'species'] == 'setosa'
```

```
0       True
1       True
2       True
3       True
4       True
       ...  
145    False
146    False
147    False
148    False
149    False
Name: species, Length: 150, dtype: bool
```

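Conditions can be combined element-wise with `&` (and), `|` (or) and `~` (not). Python's `and`/`or` raise an error on a Series, and each comparison needs its own parentheses because `&` binds more tightly than `==`. Since `True` counts as 1, `.sum()` gives the number of matching rows. A sketch on a hand-typed subset of rows from this notebook:

```python
import pandas as pd

# Hand-typed subset of rows shown in this notebook (original labels kept)
sample = pd.DataFrame(
    {
        "petal_width": [0.2, 0.2, 0.2, 0.4, 0.3, 1.4, 1.5, 2.5],
        "species": ["setosa"] * 5 + ["versicolor"] * 2 + ["virginica"],
    },
    index=[2, 3, 4, 5, 6, 50, 51, 100],
)

is_setosa = sample["species"] == "setosa"
wide_petal = sample["petal_width"] >= 1.4

# Parentheses are required: without them, & would be evaluated before == and >=
versi_wide = (sample["species"] == "versicolor") & (sample["petal_width"] >= 1.5)
either = is_setosa | wide_petal
neither = ~either

print(is_setosa.sum(), wide_petal.sum())      # 5 3
print(sample.loc[versi_wide].index.tolist())  # [51]
print(neither.any())                          # False
```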
Passing that Boolean Series to `loc` keeps only the rows where it is `True`, so the result is the filtered rows themselves rather than a `True`/`False` flag for every row. The selection can then be stored in a new variable.

In [14]:
```python
versi_df = df.loc[df.loc[:,'species'] == 'versicolor']
```

In [15]:
```python
versi_df
```

|     | sepal_length | sepal_width | petal_length | petal_width | species |
| --- | --- | --- | --- | --- | --- |
| 50 | 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 51 | 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 52 | 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 53 | 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 54 | 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 55 | 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 56 | 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 57 | 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 58 | 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 59 | 5.2 | 2.7 | 3.9 | 1.4 | versicolor |

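The outer `loc` is not the only way to apply a mask: plain `[]` accepts a Boolean Series too, `isin` builds a mask from several allowed values, and `DataFrame.query` takes the condition as a string. A sketch comparing them on a hand-typed subset of rows from this notebook:

```python
import pandas as pd

# Hand-typed subset of rows shown in this notebook (original labels kept)
sample = pd.DataFrame(
    {"species": ["setosa"] * 5 + ["versicolor"] * 2 + ["virginica"]},
    index=[2, 3, 4, 5, 6, 50, 51, 100],
)

mask = sample["species"] == "versicolor"
with_loc = sample.loc[mask]                         # as in the cells above
with_brackets = sample[mask]                        # [] also accepts a Boolean Series
with_query = sample.query("species == 'versicolor'")  # condition written as a string
not_setosa = sample[sample["species"].isin(["versicolor", "virginica"])]

print(with_loc.equals(with_brackets))  # True
print(with_query.index.tolist())       # [50, 51]
print(not_setosa.index.tolist())       # [50, 51, 100]
```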

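The same pattern works with numeric comparisons. Here it selects the rows whose petal width is at least 2.2:

In [18]:
```python
width_df = df.loc[df.loc[:,'petal_width'] >= 2.2]
width_df
```

|     | sepal_length | sepal_width | petal_length | petal_width | species |
| --- | --- | --- | --- | --- | --- |
| 100 | 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 104 | 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 109 | 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 114 | 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 115 | 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 117 | 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 118 | 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 120 | 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 132 | 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 135 | 7.7 | 3.0 | 6.1 | 2.3 | virginica |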

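The mask and a column selection can also go into a single `loc` call, and `value_counts` then summarises the filtered rows. A sketch on a hand-typed subset of rows from this notebook; the threshold of 1.4 is arbitrary, chosen so that more than one species passes:

```python
import pandas as pd

# Hand-typed subset of rows shown in this notebook (original labels kept)
sample = pd.DataFrame(
    {
        "sepal_length": [4.7, 4.6, 5.0, 5.4, 4.6, 7.0, 6.4, 6.3],
        "petal_width": [0.2, 0.2, 0.2, 0.4, 0.3, 1.4, 1.5, 2.5],
        "species": ["setosa"] * 5 + ["versicolor"] * 2 + ["virginica"],
    },
    index=[2, 3, 4, 5, 6, 50, 51, 100],
)

wide_mask = sample["petal_width"] >= 1.4  # arbitrary illustrative threshold

# Rows from the mask, columns from the list, in one loc call
wide = sample.loc[wide_mask, ["petal_width", "species"]]
species_counts = sample.loc[wide_mask, "species"].value_counts()

print(wide.index.tolist())                                         # [50, 51, 100]
print(species_counts["versicolor"], species_counts["virginica"])  # 2 1
```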