# Lesson 3 Practice: Pandas Part 1

Use this notebook to follow along with the lesson in the corresponding lesson notebook: [L03-Pandas_Part1-Lesson.ipynb](./L03-Pandas_Part1-Lesson.ipynb).  

## Instructions
Follow along with the teaching material in the lesson. Sections labeled "Tasks" are interspersed throughout the tutorial and marked with the icon: ![Task](http://icons.iconarchive.com/icons/sbstnblnd/plateau/16/Apps-gnome-info-icon.png). Follow the instructions in these sections by completing them in this practice notebook. When you have finished the tutorial, you can turn in the final practice notebook. For each task, use the cell below it to write and test your code. You may add cells for any task as needed.

## Task 1a: Setup

Import the following packages:

+ `numpy` as `np`
+ `pandas` as `pd`


In [1]:
import numpy as np
import pandas as pd

## Task 2a: Create a `pd.Series` object

+ Create a series of your own design.

In [2]:
a_series = pd.Series([2, 4, 6, 8, np.nan, np.nan])
a_series

0    2.0
1    4.0
2    6.0
3    8.0
4    NaN
5    NaN
dtype: float64
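
The output above shows `float64` even though the values were written as integers. The sketch below illustrates why: `np.nan` is a float, so including it upcasts the whole Series. The variable names and the custom index labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Integers alone keep an integer dtype.
ints_only = pd.Series([2, 4, 6, 8])

# Adding np.nan (a float) upcasts the whole Series to float64.
with_nan = pd.Series([2, 4, 6, 8, np.nan, np.nan])

# A custom index replaces the default 0..n-1 labels (labels are hypothetical).
labeled = pd.Series([2, 4, 6], index=["a", "b", "c"])

print(ints_only.dtype)         # an integer dtype
print(with_nan.dtype)          # float64
print(with_nan.isna().sum())   # count of missing values
print(labeled["b"])            # look up a value by its label
```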

## Task 2b: Creating a DataFrame

+ Create a `pd.DataFrame` object from a Python dictionary. Design the data as you like.

In [3]:
df_2 = pd.DataFrame(
    {'fruit': [0, 1, 2, 3, 4],
    'type': ["apple", "orange", "banana", "lemon", "raspberry"]})
df_2

Unnamed: 0,fruit,type
0,0,apple
1,1,orange
2,2,banana
3,3,lemon
4,4,raspberry
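
As a small illustration of how the dictionary maps onto the table, the sketch below builds the same kind of DataFrame two ways: from a dictionary of columns and from a list of row dictionaries. The row labels `a`, `b`, `c` and the variable names are hypothetical.

```python
import pandas as pd

# Dictionary keys become column names; each list becomes one column.
fruit_df = pd.DataFrame(
    {"fruit": [0, 1, 2],
     "type": ["apple", "orange", "banana"]},
    index=["a", "b", "c"],  # hypothetical row labels
)

# The same table built from a list of row dictionaries ("records").
records_df = pd.DataFrame(
    [{"fruit": 0, "type": "apple"},
     {"fruit": 1, "type": "orange"},
     {"fruit": 2, "type": "banana"}],
    index=["a", "b", "c"],
)

print(fruit_df)
print(fruit_df.equals(records_df))
```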


## Task 2c: Create DataFrame with labels

+ Create a 10x5 DataFrame of random numbers that follow a [Gaussian (normal) distribution](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.normal.html).
  + Center the distribution at 0.85.
  + We will use these values as assumed grades for a class of students.
+ Adjust the row indexes to be the names of hypothetical students.
+ Adjust the columns to be the names of hypothetical projects, homework, exam names, etc.

In [23]:
df_3 = pd.DataFrame(np.random.normal(loc=0.85, size=(10, 5)))
df_3.index = ['Sarah', 'John', 'Amanda', 'Steve', 'Bob', 'Katie', 'Karen', 'John', 'Zach', 'Linda']
df_3.columns = ['HW1', 'HW2', 'Proj1', 'Exam1', 'Exam2']
df_3



Unnamed: 0,HW1,HW2,Proj1,Exam1,Exam2
Sarah,1.924845,1.854562,0.613801,1.067394,-0.015847
John,1.214865,1.65794,1.954365,0.850892,1.551447
Amanda,1.010829,0.782328,-0.661906,0.680307,0.587896
Steve,0.932667,-0.303121,1.414541,-0.546478,2.258526
Bob,3.133895,0.203571,0.548253,0.255101,3.201276
Katie,0.453087,0.505436,1.073193,-0.377952,2.032694
Karen,0.197839,1.902005,1.155668,-0.390353,1.937802
John,0.633744,-1.086726,1.520625,-0.363841,0.995249
Zach,0.062569,0.173087,0.779895,0.902107,0.64719
Linda,0.506457,1.222316,0.105167,0.955655,0.44693
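
Two details of the output above are worth noting. `np.random.normal` uses a standard deviation (`scale`) of 1.0 unless told otherwise, which is why values such as 3.133895 and -1.086726 appear, and the label `John` occurs twice in the index. The following is a minimal sketch of one way to get more grade-like values, assuming a standard deviation of 0.05, a fixed seed of 0, and the spelling `Jon` for the second student. All three choices are illustrative, not part of the task.

```python
import numpy as np
import pandas as pd

# Hypothetical choices: seed 0 and a standard deviation of 0.05.
rng = np.random.RandomState(0)

students = ["Sarah", "John", "Amanda", "Steve", "Bob",
            "Katie", "Karen", "Jon", "Zach", "Linda"]  # every label unique
assessments = ["HW1", "HW2", "Proj1", "Exam1", "Exam2"]

grades = pd.DataFrame(
    rng.normal(loc=0.85, scale=0.05, size=(10, 5)),
    index=students,
    columns=assessments,
).clip(lower=0.0, upper=1.0)  # keep every value between 0 and 1

print(grades.round(2))
print(grades.index.is_unique)
```

Passing `index` and `columns` to the constructor is equivalent to assigning `df.index` and `df.columns` afterwards, as done in the cell above.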


## Task 3a: Import the iris.csv file

+ Import the iris dataset.
+ Take a look at the `pd.read_csv` online documentation. Write example code in a Markdown cell for how you would import this file if it were tab-delimited.

In [25]:
iris_df = pd.read_csv('data/iris.csv')

`iris_df = pd.read_csv('data/iris.csv', sep="\t")`
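
To try the `sep="\t"` argument without a tab-delimited file on disk, the sketch below reads from an in-memory string instead. The three-row sample text and the variable names are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical tab-delimited text standing in for a file on disk.
tsv_text = (
    "sepal_length\tsepal_width\tspecies\n"
    "5.1\t3.5\tsetosa\n"
    "4.9\t3.0\tsetosa\n"
)

# sep="\t" tells read_csv to split fields on tabs instead of commas.
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(tsv_df.shape)
print(list(tsv_df.columns))
```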

## Task 4a: Explore Data

 + Use `head`, `tail` and `sample` with the iris dataset.
 + Do the same with the dataset you created in task 2c.

In [27]:
iris_df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [28]:
iris_df.tail()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica
149,5.9,3.0,5.1,1.8,virginica


In [29]:
iris_df.sample(10)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
41,4.5,2.3,1.3,0.3,setosa
21,5.1,3.7,1.5,0.4,setosa
121,5.6,2.8,4.9,2.0,virginica
1,4.9,3.0,1.4,0.2,setosa
71,6.1,2.8,4.0,1.3,versicolor
14,5.8,4.0,1.2,0.2,setosa
49,5.0,3.3,1.4,0.2,setosa
57,4.9,2.4,3.3,1.0,versicolor
131,7.9,3.8,6.4,2.0,virginica
98,5.1,2.5,3.0,1.1,versicolor


In [30]:
df_3.head()

Unnamed: 0,HW1,HW2,Proj1,Exam1,Exam2
Sarah,1.924845,1.854562,0.613801,1.067394,-0.015847
John,1.214865,1.65794,1.954365,0.850892,1.551447
Amanda,1.010829,0.782328,-0.661906,0.680307,0.587896
Steve,0.932667,-0.303121,1.414541,-0.546478,2.258526
Bob,3.133895,0.203571,0.548253,0.255101,3.201276


In [31]:
df_3.tail()

Unnamed: 0,HW1,HW2,Proj1,Exam1,Exam2
Katie,0.453087,0.505436,1.073193,-0.377952,2.032694
Karen,0.197839,1.902005,1.155668,-0.390353,1.937802
John,0.633744,-1.086726,1.520625,-0.363841,0.995249
Zach,0.062569,0.173087,0.779895,0.902107,0.64719
Linda,0.506457,1.222316,0.105167,0.955655,0.44693


In [32]:
df_3.sample(5)

Unnamed: 0,HW1,HW2,Proj1,Exam1,Exam2
Bob,3.133895,0.203571,0.548253,0.255101,3.201276
John,1.214865,1.65794,1.954365,0.850892,1.551447
Steve,0.932667,-0.303121,1.414541,-0.546478,2.258526
Katie,0.453087,0.505436,1.073193,-0.377952,2.032694
Linda,0.506457,1.222316,0.105167,0.955655,0.44693


## Task 5a: Viewing columns and rows

+ Display the columns and indexes of the iris dataset.
+ Do the same with the dataset you created in Task 2c.

In [34]:
iris_df.columns

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')

In [35]:
iris_df.index

RangeIndex(start=0, stop=150, step=1)

In [36]:
df_3.columns

Index(['HW1', 'HW2', 'Proj1', 'Exam1', 'Exam2'], dtype='object')

In [37]:
df_3.index

Index(['Sarah', 'John', 'Amanda', 'Steve', 'Bob', 'Katie', 'Karen', 'John',
       'Zach', 'Linda'],
      dtype='object')

## Task 5b: Get Values

+ Check which version of `pandas` you have installed.
+ Use the appropriate method to convert the iris data to a dictionary.
+ Do the same with the dataset you created in Task 2c.


In [38]:
pd.__version__

'0.23.4'

In [41]:
iris_dictionary = iris_df.to_dict()
iris_dictionary

{'sepal_length': {0: 5.1,
  1: 4.9,
  2: 4.7,
  3: 4.6,
  4: 5.0,
  5: 5.4,
  6: 4.6,
  7: 5.0,
  8: 4.4,
  9: 4.9,
  10: 5.4,
  11: 4.8,
  12: 4.8,
  13: 4.3,
  14: 5.8,
  15: 5.7,
  16: 5.4,
  17: 5.1,
  18: 5.7,
  19: 5.1,
  20: 5.4,
  21: 5.1,
  22: 4.6,
  23: 5.1,
  24: 4.8,
  25: 5.0,
  26: 5.0,
  27: 5.2,
  28: 5.2,
  29: 4.7,
  30: 4.8,
  31: 5.4,
  32: 5.2,
  33: 5.5,
  34: 4.9,
  35: 5.0,
  36: 5.5,
  37: 4.9,
  38: 4.4,
  39: 5.1,
  40: 5.0,
  41: 4.5,
  42: 4.4,
  43: 5.0,
  44: 5.1,
  45: 4.8,
  46: 5.1,
  47: 4.6,
  48: 5.3,
  49: 5.0,
  50: 7.0,
  51: 6.4,
  52: 6.9,
  53: 5.5,
  54: 6.5,
  55: 5.7,
  56: 6.3,
  57: 4.9,
  58: 6.6,
  59: 5.2,
  60: 5.0,
  61: 5.9,
  62: 6.0,
  63: 6.1,
  64: 5.6,
  65: 6.7,
  66: 5.6,
  67: 5.8,
  68: 6.2,
  69: 5.6,
  70: 5.9,
  71: 6.1,
  72: 6.3,
  73: 6.1,
  74: 6.4,
  75: 6.6,
  76: 6.8,
  77: 6.7,
  78: 6.0,
  79: 5.7,
  80: 5.5,
  81: 5.5,
  82: 5.8,
  83: 6.0,
  84: 5.4,
  85: 6.0,
  86: 6.7,
  87: 6.3,
  88: 5.6,
  89: 5.5,
  90

In [42]:
dict_3 = df_3.to_dict()
dict_3

{'HW1': {'Sarah': 1.9248447830986843,
  'John': 0.6337436165195754,
  'Amanda': 1.0108290990760473,
  'Steve': 0.9326665141042522,
  'Bob': 3.13389495066975,
  'Katie': 0.45308667043690604,
  'Karen': 0.19783936656383805,
  'Zach': 0.06256859760149747,
  'Linda': 0.5064567482495917},
 'HW2': {'Sarah': 1.8545617771703453,
  'John': -1.0867260878290552,
  'Amanda': 0.7823276402004852,
  'Steve': -0.30312146600046475,
  'Bob': 0.203570839620821,
  'Katie': 0.5054360568733085,
  'Karen': 1.9020050758710147,
  'Zach': 0.17308715160892363,
  'Linda': 1.222316156149576},
 'Proj1': {'Sarah': 0.6138010441648719,
  'John': 1.5206252773349478,
  'Amanda': -0.661905689632229,
  'Steve': 1.4145410047338984,
  'Bob': 0.548253095677178,
  'Katie': 1.0731926798427571,
  'Karen': 1.155668408824421,
  'Zach': 0.779895366696042,
  'Linda': 0.10516748976633983},
 'Exam1': {'Sarah': 1.0673941978403858,
  'John': -0.3638412903369882,
  'Amanda': 0.6803073365714023,
  'Steve': -0.5464779992285406,
  'Bob': 0
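
In the `dict_3` output above, each column holds only nine entries and `John` maps to the value from the second `John` row. With the default orientation, index labels become dictionary keys, so a repeated label overwrites the earlier one. The sketch below reproduces this on a tiny frame and shows two other `orient` values that keep every row. The data and variable names are hypothetical.

```python
import pandas as pd

scores = pd.DataFrame(
    {"HW1": [0.9, 0.8, 0.7]},
    index=["Sarah", "John", "John"],  # "John" is deliberately duplicated
)

# Default orient ("dict"): {column -> {index label -> value}}.
# Duplicate labels collide as dictionary keys, so only the last one survives.
nested = scores.to_dict()

# orient="list": {column -> [values]} keeps every row.
as_lists = scores.to_dict(orient="list")

# orient="records": one dictionary per row, index labels dropped.
as_records = scores.to_dict(orient="records")

print(nested)
print(as_lists)
print(as_records)
```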

## Task 5c: Using `loc`

+ Use any iris dataframe to:
  + Select a row slice with `loc`.
  + Select a row and column slice with `loc`.
  + Take a look at the [Pandas documentation for the `at` selector](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html). Use what you learn there to select a single item with the pandas `at` accessor.

In [43]:
iris_df.loc[5:10]

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
5,5.4,3.9,1.7,0.4,setosa
6,4.6,3.4,1.4,0.3,setosa
7,5.0,3.4,1.5,0.2,setosa
8,4.4,2.9,1.4,0.2,setosa
9,4.9,3.1,1.5,0.1,setosa
10,5.4,3.7,1.5,0.2,setosa


In [44]:
iris_df.loc[1:3, 'sepal_length':'sepal_width']

Unnamed: 0,sepal_length,sepal_width
1,4.9,3.0
2,4.7,3.2
3,4.6,3.1


In [45]:
iris_df.at[5,'sepal_length']

5.4
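
`loc` and `at` work on labels, which is easier to see when the index is not a run of integers. The sketch below uses a small, made-up grade table with student names as row labels. It also shows that `at` can assign a value as well as read one.

```python
import pandas as pd

marks = pd.DataFrame(
    {"HW1": [0.90, 0.80, 0.70, 0.60],
     "HW2": [0.95, 0.85, 0.75, 0.65],
     "Exam1": [0.88, 0.78, 0.68, 0.58]},
    index=["Sarah", "Amanda", "Steve", "Bob"],  # hypothetical students
)

# Label slices with loc include BOTH endpoints.
rows = marks.loc["Amanda":"Steve"]
block = marks.loc["Sarah":"Amanda", "HW1":"HW2"]

# at reads or writes exactly one cell, addressed by row and column label.
before = marks.at["Bob", "Exam1"]
marks.at["Bob", "Exam1"] = 0.60

print(rows.index.tolist())
print(block.shape)
print(before, marks.at["Bob", "Exam1"])
```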

## Task 5d: Using `iloc`

+ Use any iris dataframe to:    
    + Select a row slice with `iloc`.
    + Select a row and column slice with `iloc`.
    + Take a look at the [Pandas documentation for the `iat` selector](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iat.html). Use what you learn there to select a single item with the pandas `iat` accessor.



In [46]:
iris_df.iloc[1:3]

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa


In [47]:
iris_df.iloc[1:3, 2:5]

Unnamed: 0,petal_length,petal_width,species
1,1.4,0.2,setosa
2,1.3,0.2,setosa


In [48]:
iris_df.iat[1,1]

3.0
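
Compare the outputs above with Task 5c: `iris_df.loc[5:10]` returned six rows, while `iris_df.iloc[1:3]` returned two. The sketch below isolates that difference on a small frame with a default integer index, where the same slice `1:3` means different things to the two selectors. The `letters` frame is invented for illustration.

```python
import pandas as pd

letters = pd.DataFrame(
    {"letter": list("abcdef"), "code": [97, 98, 99, 100, 101, 102]}
)  # default integer index 0..5

by_label = letters.loc[1:3]       # labels 1, 2, 3 (end included)
by_position = letters.iloc[1:3]   # positions 1, 2 (end excluded)

# iat addresses one cell by integer row position and column position.
cell = letters.iat[2, 0]

print(len(by_label), len(by_position))
print(cell)
```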

## Task 5e: Boolean Indexing

+ Create subsets of the iris dataset using boolean indexes that:
    + Use one boolean operator.
    + Use two boolean operators.



In [50]:
iris_df[iris_df['sepal_width'] > 3.0].head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
5,5.4,3.9,1.7,0.4,setosa


In [61]:
limit1 = iris_df['sepal_width'] > 3.0
limit2 = iris_df['sepal_length'] < 6.0
iris_df[limit1 & limit2]

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
5,5.4,3.9,1.7,0.4,setosa
6,4.6,3.4,1.4,0.3,setosa
7,5.0,3.4,1.5,0.2,setosa
9,4.9,3.1,1.5,0.1,setosa
10,5.4,3.7,1.5,0.2,setosa
11,4.8,3.4,1.6,0.2,setosa
