# Hands-On Exercise 2.1: Working With Data in Python
***

## Objectives

### In this exercise, you will become familiar with the Python commands used to explore data sets.

### Overview

In this exercise, you will use Python commands to examine the structure of a data set and to query its rows, columns, and subsets.

**Major Step: Querying Data Sets**
1. ❏ Import the **iris** dataset from the file **iris.csv**<br><br>
*Hint: Remember to import the pandas library first*

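```python
import pandas as pd

# Load the CSV file into a DataFrame and display it
data = pd.read_csv("iris.csv")
data
```

|     | sepal_length | sepal_width | petal_length | petal_width | species        |
|-----|--------------|-------------|--------------|-------------|----------------|
| 0   | 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa    |
| 1   | 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa    |
| 2   | 4.7          | 3.2         | 1.3          | 0.2         | Iris-setosa    |
| 3   | 4.6          | 3.1         | 1.5          | 0.2         | Iris-setosa    |
| 4   | 5.0          | 3.6         | 1.4          | 0.2         | Iris-setosa    |
| ... | ...          | ...         | ...          | ...         | ...            |
| 145 | 6.7          | 3.0         | 5.2          | 2.3         | Iris-virginica |
| 146 | 6.3          | 2.5         | 5.0          | 1.9         | Iris-virginica |
| 147 | 6.5          | 3.0         | 5.2          | 2.0         | Iris-virginica |
| 148 | 6.2          | 3.4         | 5.4          | 2.3         | Iris-virginica |
| 149 | 5.9          | 3.0         | 5.1          | 1.8         | Iris-virginica |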
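`pd.read_csv()` accepts either a file path or any file-like object. The minimal sketch below assumes `iris.csv` is not at hand and instead reads three rows copied from the output above out of an in-memory string; `csv_text` and `sample` are names introduced here for illustration only.

```python
import io

import pandas as pd

# Three rows copied from the iris output above, held as CSV text.
# io.StringIO stands in for iris.csv so the sketch runs without the file.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
6.7,3.0,5.2,2.3,Iris-virginica
"""

# read_csv parses the header row into column names and infers numeric types
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (3, 5)
print(list(sample.columns))
```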


2. ❏ Display the data in the dataframe with and without the **print()** function and note the difference in output format

```python
print(data)
```

```
     sepal_length  sepal_width  petal_length  petal_width         species
0             5.1          3.5           1.4          0.2     Iris-setosa
1             4.9          3.0           1.4          0.2     Iris-setosa
2             4.7          3.2           1.3          0.2     Iris-setosa
3             4.6          3.1           1.5          0.2     Iris-setosa
4             5.0          3.6           1.4          0.2     Iris-setosa
..            ...          ...           ...          ...             ...
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica

[150 rows x 5 columns]
```


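```python
data
```

|     | sepal_length | sepal_width | petal_length | petal_width | species        |
|-----|--------------|-------------|--------------|-------------|----------------|
| 0   | 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa    |
| 1   | 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa    |
| 2   | 4.7          | 3.2         | 1.3          | 0.2         | Iris-setosa    |
| 3   | 4.6          | 3.1         | 1.5          | 0.2         | Iris-setosa    |
| 4   | 5.0          | 3.6         | 1.4          | 0.2         | Iris-setosa    |
| ... | ...          | ...         | ...          | ...         | ...            |
| 145 | 6.7          | 3.0         | 5.2          | 2.3         | Iris-virginica |
| 146 | 6.3          | 2.5         | 5.0          | 1.9         | Iris-virginica |
| 147 | 6.5          | 3.0         | 5.2          | 2.0         | Iris-virginica |
| 148 | 6.2          | 3.4         | 5.4          | 2.3         | Iris-virginica |
| 149 | 5.9          | 3.0         | 5.1          | 1.8         | Iris-virginica |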
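The two outputs differ because the same DataFrame has two representations: **print()** writes its plain-text form, while Jupyter renders a bare expression at the end of a cell as an HTML table. The sketch below compares the two on a small hypothetical sample; `to_html()` is used as a stand-in for the notebook's HTML rendering, which we assume is similar but not byte-for-byte identical.

```python
import pandas as pd

# Small sample built from rows shown in the output above (illustrative only)
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.7],
    "species": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# print() writes the plain-text form, which is the same string str() returns
plain_text = str(sample)

# to_html() produces an HTML table, comparable to what Jupyter displays
html_text = sample.to_html()

print(plain_text)
print(html_text[:60])
```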


3. ❏ Use the **.head()** method to display just the first 5 rows and take note of the index

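```python
data.head(5)
```

|   | sepal_length | sepal_width | petal_length | petal_width | species     |
|---|--------------|-------------|--------------|-------------|-------------|
| 0 | 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa |
| 1 | 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa |
| 2 | 4.7          | 3.2         | 1.3          | 0.2         | Iris-setosa |
| 3 | 4.6          | 3.1         | 1.5          | 0.2         | Iris-setosa |
| 4 | 5.0          | 3.6         | 1.4          | 0.2         | Iris-setosa |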
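**.head()** takes an optional row count that defaults to 5, and **.tail()** is its counterpart for the end of the dataframe. A minimal sketch on a hypothetical six-row sample of *sepal_length* values taken from the output above:

```python
import pandas as pd

# Six sepal_length values taken from the iris output above (illustrative sample)
sample = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 6.7]})

first_default = sample.head()   # n defaults to 5
first_two = sample.head(2)
last_two = sample.tail(2)       # tail() counts from the end

print(list(first_default.index))  # [0, 1, 2, 3, 4]
print(list(last_two.index))       # [4, 5]
```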


4. ❏ Change the **index** to the *species* column using the **.set_index()** method and confirm it has been changed using the **.head()** method

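```python
# Use species as the row index; drop=False keeps species as a regular column too,
# which is why species still appears in the column list and shape below
data_indexed = data.set_index("species", drop=False)

# Confirm the change: the row labels should now be species names
data_indexed.head()
```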
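Whether *species* also stays a column depends on the `drop` parameter of **.set_index()**. The sketch below, on a hypothetical three-row sample, contrasts the default (`drop=True`) with `drop=False` and shows **.reset_index()** moving the index back into the columns.

```python
import pandas as pd

# Hypothetical three-row sample using values from the iris output above
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.7],
    "species": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# Default drop=True: species moves into the index and leaves the columns
moved = sample.set_index("species")

# drop=False: species becomes the index and also stays as a column
kept = sample.set_index("species", drop=False)

# reset_index() turns the index back into a regular (first) column
restored = moved.reset_index()

print(list(moved.columns))     # ['sepal_length']
print(list(kept.columns))      # ['sepal_length', 'species']
print(list(restored.columns))  # ['species', 'sepal_length']
```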

5. ❏ Check the number of rows and columns in the dataframe using the **.shape** attribute

```python
data_indexed.shape
```

```
(150, 5)
```
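**.shape** is an attribute (no parentheses) holding a `(rows, columns)` tuple, so it can be unpacked directly. A minimal sketch on a hypothetical sample:

```python
import pandas as pd

# Hypothetical sample with three rows and two columns
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.7],
    "sepal_width": [3.5, 3.0, 3.0],
})

# Unpack the (rows, columns) tuple into two names
n_rows, n_cols = sample.shape

print(n_rows, n_cols)         # 3 2
print(len(sample) == n_rows)  # True: len() counts rows only
```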

6. ❏ List the column names of the dataframe using the **.columns** attribute

```python
data_indexed.columns
```

```
Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')
```
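**.columns** returns a pandas `Index` rather than a plain list. The sketch below, on a hypothetical sample, converts it to a list and uses `in` to test whether a column label exists; note that `in` checks labels, not the values stored in the column.

```python
import pandas as pd

# Hypothetical two-row sample using values from the iris output above
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
    "species": ["Iris-setosa", "Iris-setosa"],
})

column_names = list(sample.columns)          # Index -> plain Python list
has_species = "species" in sample.columns    # True: "species" is a column label
has_value = "Iris-setosa" in sample.columns  # False: a stored value, not a label

print(column_names)
print(has_species, has_value)
```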

7. ❏ Check the datatypes of the columns using either the **.dtypes** attribute or the **.info()** method

```python
data_indexed.dtypes
```

```
sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
species          object
dtype: object
```
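The step also mentions **.info()**, which prints a summary that includes the data types and non-null counts. The sketch below captures that summary in a string through the `buf` parameter and uses **.select_dtypes()** to keep only the numeric columns; the sample data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample with two numeric columns and one text column
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.7],
    "sepal_width": [3.5, 3.0, 3.0],
    "species": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# info() writes to stdout by default; buf= redirects it into a string buffer
buffer = io.StringIO()
sample.info(buf=buffer)
summary = buffer.getvalue()

# select_dtypes() keeps only the columns of the requested kinds
numeric = sample.select_dtypes(include="number")

print(summary)
print(list(numeric.columns))  # ['sepal_length', 'sepal_width']
```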

8. ❏ Retrieve the **sepal_width** column from the **iris** dataframe. What data type is the result? Can you return the data as a dataframe as well as a series?

```python
data_indexed.sepal_width
```

```
species
Iris-setosa       3.5
Iris-setosa       3.0
Iris-setosa       3.2
Iris-setosa       3.1
Iris-setosa       3.6
                 ... 
Iris-virginica    3.0
Iris-virginica    2.5
Iris-virginica    3.0
Iris-virginica    3.4
Iris-virginica    3.0
Name: sepal_width, Length: 150, dtype: float64
```

```python
print(type(data_indexed[["sepal_width"]]))
data_indexed[["sepal_width"]]
```

```
<class 'pandas.core.frame.DataFrame'>
```

| species        | sepal_width |
|----------------|-------------|
| Iris-setosa    | 3.5         |
| Iris-setosa    | 3.0         |
| Iris-setosa    | 3.2         |
| Iris-setosa    | 3.1         |
| Iris-setosa    | 3.6         |
| ...            | ...         |
| Iris-virginica | 3.0         |
| Iris-virginica | 2.5         |
| Iris-virginica | 3.0         |
| Iris-virginica | 3.4         |
| Iris-virginica | 3.0         |
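The rule behind the two results: a single column label inside `[]` returns a `Series`, while a list of labels (even a one-element list) returns a `DataFrame`. A `Series` can also be converted afterwards with **.to_frame()**. A minimal sketch on a hypothetical sample:

```python
import pandas as pd

# Hypothetical sample using values from the iris output above
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.7],
    "sepal_width": [3.5, 3.0, 3.0],
})

as_series = sample["sepal_width"]    # single label -> Series
as_frame = sample[["sepal_width"]]   # list of labels -> DataFrame
also_frame = as_series.to_frame()    # convert a Series after the fact

print(type(as_series).__name__)          # Series
print(type(as_frame).__name__)           # DataFrame
print(as_frame.shape, as_series.shape)   # (3, 1) (3,)
```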


9. ❏ Retrieve both the **sepal_length** and **sepal_width** columns

```python
data_indexed[["sepal_length", "sepal_width"]]
```

| species        | sepal_length | sepal_width |
|----------------|--------------|-------------|
| Iris-setosa    | 5.1          | 3.5         |
| Iris-setosa    | 4.9          | 3.0         |
| Iris-setosa    | 4.7          | 3.2         |
| Iris-setosa    | 4.6          | 3.1         |
| Iris-setosa    | 5.0          | 3.6         |
| ...            | ...          | ...         |
| Iris-virginica | 6.7          | 3.0         |
| Iris-virginica | 6.3          | 2.5         |
| Iris-virginica | 6.5          | 3.0         |
| Iris-virginica | 6.2          | 3.4         |
| Iris-virginica | 5.9          | 3.0         |
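When several columns are selected with a list, they come back in the order the list gives, and a label that does not exist raises a `KeyError`. In the sketch below, the sample data and the column name `petal_area` are hypothetical.

```python
import pandas as pd

# Hypothetical sample using values from the iris output above
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
})

# Columns are returned in list order, not in the dataframe's order
subset = sample[["sepal_width", "sepal_length"]]
print(list(subset.columns))  # ['sepal_width', 'sepal_length']

# A missing label ("petal_area" is a hypothetical name) raises KeyError
missing_error = None
try:
    sample[["sepal_length", "petal_area"]]
except KeyError as err:
    missing_error = err
print(type(missing_error).__name__)  # KeyError
```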


10. ❏ Retrieve the first 5 rows of the **sepal_length** column in the **iris** dataframe using the **.head()** method

```python
data_indexed.sepal_length.head(5)
```

```
species
Iris-setosa    5.1
Iris-setosa    4.9
Iris-setosa    4.7
Iris-setosa    4.6
Iris-setosa    5.0
Name: sepal_length, dtype: float64
```
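Attribute access such as `data_indexed.sepal_length` is a shortcut for bracket access. It only works when the column name is a valid Python identifier that does not clash with a DataFrame attribute; otherwise brackets are required. In the sketch below, the sample and the column name `"sepal length (cm)"` are hypothetical.

```python
import pandas as pd

# Six sepal_length values taken from the iris output above (illustrative sample)
sample = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 6.7]})

# Attribute access and bracket access return the same Series
by_attribute = sample.sepal_length.head(3)
by_bracket = sample["sepal_length"].head(3)
print(by_attribute.equals(by_bracket))  # True

# A name with spaces cannot be used as an attribute, so brackets are needed
renamed = sample.rename(columns={"sepal_length": "sepal length (cm)"})
print(renamed["sepal length (cm)"].head(3).tolist())  # [5.1, 4.9, 4.7]
```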

11. ❏ View the first two rows of the **iris** dataframe.

```python
# A slice inside [] selects rows by position, even with a species index
data_indexed[0:2]
```

| species     | sepal_length | sepal_width | petal_length | petal_width | species     |
|-------------|--------------|-------------|--------------|-------------|-------------|
| Iris-setosa | 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa |
| Iris-setosa | 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa |
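A positional row slice can be written in several equivalent ways, and **.loc[]** selects rows by index label instead. Because the species index repeats labels, `.loc["Iris-setosa"]` returns every matching row. A minimal sketch on a hypothetical four-row sample:

```python
import pandas as pd

# Hypothetical sample indexed by species, with species also kept as a column
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 6.7],
    "species": ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-virginica"],
}).set_index("species", drop=False)

# Three positional ways to get the first two rows
by_slice = sample[0:2]
by_iloc = sample.iloc[0:2]
by_head = sample.head(2)

# loc selects by label; a repeated label returns all matching rows
setosa_rows = sample.loc["Iris-setosa"]

print(by_slice.equals(by_iloc), by_slice.equals(by_head))  # True True
print(len(setosa_rows))                                     # 3
```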


12. ❏ Retrieve rows from the **iris** dataframe that have a **sepal_length** greater than 7

```python
data_indexed[data_indexed.sepal_length > 7]
```

| species        | sepal_length | sepal_width | petal_length | petal_width | species        |
|----------------|--------------|-------------|--------------|-------------|----------------|
| Iris-virginica | 7.1          | 3.0         | 5.9          | 2.1         | Iris-virginica |
| Iris-virginica | 7.6          | 3.0         | 6.6          | 2.1         | Iris-virginica |
| Iris-virginica | 7.3          | 2.9         | 6.3          | 1.8         | Iris-virginica |
| Iris-virginica | 7.2          | 3.6         | 6.1          | 2.5         | Iris-virginica |
| Iris-virginica | 7.7          | 3.8         | 6.7          | 2.2         | Iris-virginica |
| Iris-virginica | 7.7          | 2.6         | 6.9          | 2.3         | Iris-virginica |
| Iris-virginica | 7.7          | 2.8         | 6.7          | 2.0         | Iris-virginica |
| Iris-virginica | 7.2          | 3.2         | 6.0          | 1.8         | Iris-virginica |
| Iris-virginica | 7.2          | 3.0         | 5.8          | 1.6         | Iris-virginica |
| Iris-virginica | 7.4          | 2.8         | 6.1          | 1.9         | Iris-virginica |
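The filter works because the comparison produces a boolean `Series`, and indexing with it keeps only the rows marked `True`. Conditions can be combined with `&` (and) or `|` (or), each wrapped in parentheses, or written as a string with **.query()**. A minimal sketch on a hypothetical sample of rows from the outputs above:

```python
import pandas as pd

# Hypothetical sample using sepal measurements from the outputs above
sample = pd.DataFrame({
    "sepal_length": [5.1, 7.1, 7.6, 6.7, 7.7],
    "sepal_width": [3.5, 3.0, 3.0, 3.0, 3.8],
})

# A comparison yields a boolean mask; indexing with it keeps the True rows
mask = sample["sepal_length"] > 7
long_sepals = sample[mask]

# Combine conditions with &; parentheses are required around each one
long_and_wide = sample[(sample["sepal_length"] > 7) & (sample["sepal_width"] > 3.5)]

# query() expresses the same combined filter as a string
same_via_query = sample.query("sepal_length > 7 and sepal_width > 3.5")

print(len(long_sepals), len(long_and_wide))  # 3 1
```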


13. ❏ Using the **.iloc[]** indexer, view the fourth column of the **iris** dataframe<br><br>
*Hint: Remember that column and row positions are zero-based*<br><br>
*Hint: Use **:** to specify all rows*

```python
data_indexed.iloc[:, 3]
```

```
species
Iris-setosa       0.2
Iris-setosa       0.2
Iris-setosa       0.2
Iris-setosa       0.2
Iris-setosa       0.2
                 ... 
Iris-virginica    2.3
Iris-virginica    1.9
Iris-virginica    2.0
Iris-virginica    2.3
Iris-virginica    1.8
Name: petal_width, Length: 150, dtype: float64
```
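**.iloc[]** selects by zero-based integer position, while **.loc[]** selects by label, so position 3 and the label `"petal_width"` refer to the same column here. A minimal sketch on a hypothetical two-row sample:

```python
import pandas as pd

# Hypothetical two-row sample with the four iris measurement columns
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
    "petal_length": [1.4, 1.4],
    "petal_width": [0.2, 0.2],
})

# ":" selects all rows; 3 is the fourth column because positions start at 0
by_position = sample.iloc[:, 3]
by_label = sample.loc[:, "petal_width"]

print(by_position.name)               # petal_width
print(by_position.equals(by_label))   # True
```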

14. ❏ Using the **.iloc[]** indexer, view the **third** and **fourth** columns of the **iris** dataframe<br><br>
*Hint: Use **:** to specify all rows*

```python
data_indexed.iloc[:, [2, 3]]
```

| species        | petal_length | petal_width |
|----------------|--------------|-------------|
| Iris-setosa    | 1.4          | 0.2         |
| Iris-setosa    | 1.4          | 0.2         |
| Iris-setosa    | 1.3          | 0.2         |
| Iris-setosa    | 1.5          | 0.2         |
| Iris-setosa    | 1.4          | 0.2         |
| ...            | ...          | ...         |
| Iris-virginica | 5.2          | 2.3         |
| Iris-virginica | 5.0          | 1.9         |
| Iris-virginica | 5.2          | 2.0         |
| Iris-virginica | 5.4          | 2.3         |
| Iris-virginica | 5.1          | 1.8         |
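Adjacent columns can also be selected with a slice. Note the different end rules: an **.iloc[]** slice excludes its stop position, while a **.loc[]** label slice includes both ends. A minimal sketch on a hypothetical sample:

```python
import pandas as pd

# Hypothetical two-row sample with the four iris measurement columns
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
    "petal_length": [1.4, 1.4],
    "petal_width": [0.2, 0.2],
})

by_list = sample.iloc[:, [2, 3]]   # explicit list of positions
by_slice = sample.iloc[:, 2:4]     # stop position 4 is excluded
by_label_slice = sample.loc[:, "petal_length":"petal_width"]  # both ends included

print(list(by_slice.columns))  # ['petal_length', 'petal_width']
print(by_list.equals(by_slice), by_slice.equals(by_label_slice))  # True True
```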


15. ❏ Drop the *sepal_length* column from the *iris* dataframe<br><br>
*Hint: Set axis=1 to drop a column rather than a row*<br><br>
*Hint: **.drop()** returns a new dataframe rather than modifying the original, so the **iris** dataframe is unaffected unless you assign the result back to it.*

```python
data_indexed.drop(['sepal_length'], axis=1)
```

| species        | sepal_width | petal_length | petal_width | species        |
|----------------|-------------|--------------|-------------|----------------|
| Iris-setosa    | 3.5         | 1.4          | 0.2         | Iris-setosa    |
| Iris-setosa    | 3.0         | 1.4          | 0.2         | Iris-setosa    |
| Iris-setosa    | 3.2         | 1.3          | 0.2         | Iris-setosa    |
| Iris-setosa    | 3.1         | 1.5          | 0.2         | Iris-setosa    |
| Iris-setosa    | 3.6         | 1.4          | 0.2         | Iris-setosa    |
| ...            | ...         | ...          | ...         | ...            |
| Iris-virginica | 3.0         | 5.2          | 2.3         | Iris-virginica |
| Iris-virginica | 2.5         | 5.0          | 1.9         | Iris-virginica |
| Iris-virginica | 3.0         | 5.2          | 2.0         | Iris-virginica |
| Iris-virginica | 3.4         | 5.4          | 2.3         | Iris-virginica |
| Iris-virginica | 3.0         | 5.1          | 1.8         | Iris-virginica |
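**.drop()** also accepts a `columns=` keyword, which is equivalent to passing the labels with `axis=1`. The sketch below, on a hypothetical sample, confirms that both forms give the same result and that the original dataframe keeps all of its columns.

```python
import pandas as pd

# Hypothetical two-row sample using values from the iris output above
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
    "species": ["Iris-setosa", "Iris-setosa"],
})

# Two equivalent ways to drop a column; both return a new DataFrame
via_axis = sample.drop(["sepal_length"], axis=1)
via_keyword = sample.drop(columns=["sepal_length"])

print(list(via_keyword.columns))  # ['sepal_width', 'species']
print(list(sample.columns))       # unchanged: ['sepal_length', 'sepal_width', 'species']
```

To keep the result, assign it to a name, for example `trimmed = sample.drop(columns=["sepal_length"])` (`trimmed` is a hypothetical name).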


## <center>**Congratulations! You have completed the exercise.**</center>


# <center>**This is the end of the exercise.**</center>