# Practicing with DataFrames

## Task 1. Create DataFrame

In [2]:
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90], 'Katie': [100, 81, 82], 'Bob': [83, 65, 85]}

grades = pd.DataFrame(grades_dict)

grades


   Wally  Eva  Sam  Katie  Bob
0     87  100   94    100   83
1     96   87   77     81   65
2     70   90   90     82   85


## Task 2. Custom Index

Follow the instructions to apply a custom index (instead of 0, 1, 2); let's call the rows 'Test1', 'Test2', and 'Test3'. See the 'Customizing a DataFrame's Indices with the index Attribute' subsection.
As we did in Part 1, we can request grades by key using square brackets with a quoted string. Find Eva's grades with grades['Eva'].
If the key contains no spaces, we can use the simpler dot-attribute approach. Find Sam's grades with grades.Sam, as shown in the 'Accessing a DataFrame's columns' subsection.

In [3]:
grades.index = ['Test1', 'Test2', 'Test3']

print(f"Eva's grades: {grades['Eva']}")
print(f"Sam's grades: {grades.Sam}")

Eva's grades: Test1    100
Test2     87
Test3     90
Name: Eva, dtype: int64
Sam's grades: Test1    94
Test2    77
Test3    90
Name: Sam, dtype: int64
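
To illustrate why the dot approach depends on the key, here is a minimal sketch. It assumes a hypothetical extra column named 'Mary Ann', which is not part of the gradebook above and is added only to show a label containing a space.

```python
import pandas as pd

# Two students from the gradebook, plus a hypothetical 'Mary Ann' column
# (an assumption for illustration) whose label contains a space.
grades = pd.DataFrame(
    {'Eva': [100, 87, 90], 'Sam': [94, 77, 90], 'Mary Ann': [88, 79, 91]},
    index=['Test1', 'Test2', 'Test3'],
)

# For a simple label, dot access and bracket access return the same Series.
same = grades.Sam.equals(grades['Sam'])
print(same)

# grades.Mary Ann is not valid Python syntax, so brackets are needed here.
mary_ann = grades['Mary Ann']
print(mary_ann)
```

Bracket access is the safer default because it works for every label. Dot access is a convenience for labels that are valid Python names.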


## Task 3. Accessing Rows (loc, iloc)

Like spreadsheets, we can access specific rows or columns.
We use loc['RowLabel'] and iloc[i] to access rows by label and by integer position, respectively.
Execute the examples using loc['Test1'] or iloc[0] to get scores for the first exam, and iloc[1] to get scores for the second exam. Which do you prefer?
We can also get slices of rows, e.g., with loc['Test1':'Test3'], which includes both endpoints, or, using integer positions, with iloc[0:2], which excludes the stop position and so returns only the first two rows.

I prefer using loc with a row label, such as loc['Test1'], because selecting a row by name is intuitive and makes it easy to see what the code is referencing.

In [4]:
print('Different ways of selecting the first row')
print(f"Using loc: \n{grades.loc['Test1']}") 
print(f"\nUsing iloc: \n{grades.iloc[0]}")

print('\nSelecting slices of rows')
print(f"Using loc: \n{grades.loc['Test1':'Test3']}")
print(f"\nUsing iloc: \n{grades.iloc[0:2]}")

Different ways of selecting the first row
Using loc: 
Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64

Using iloc: 
Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64

Selecting slices of rows
Using loc: 
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85

Using iloc: 
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
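
The two slices above return different numbers of rows because label slices and position slices treat the stop value differently. A small sketch of that difference, rebuilding the same gradebook so it runs on its own (the variable names by_label, by_position, and all_by_position are chosen here for illustration):

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'],
)

# Label slices include the stop label: Test1, Test2, and Test3.
by_label = grades.loc['Test1':'Test3']

# Position slices exclude the stop position: rows 0 and 1 only.
by_position = grades.iloc[0:2]

# To reach the third row by position, the stop must be one past it.
all_by_position = grades.iloc[0:3]

print(len(by_label), len(by_position), len(all_by_position))  # 3 2 3
```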


## Task 3. Accessing Subsets (at, iat)

We can use similar notation to get a single cell in the DataFrame (much like getting a single cell in a spreadsheet). 
Use grades.at['Test2', 'Eva'] to find her score on the second exam using labels.
Use grades.iat[2, 0] to get the score on the third exam for the first student (Wally), using integer positions.

In [5]:
print(f"Eva's score on the second exam: {grades.at['Test2', 'Eva']}")
print(f"Wally's score on the third exam: {grades.iat[2,0]}")

Eva's score on the second exam: 87
Wally's score on the third exam: 70
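
As a further sketch, the same cell can be reached by label or by position, and at can also assign a new value. The example works on a copy (grades_copy, a name chosen here) so the gradebook itself is unchanged; the replacement score of 100 is made up for illustration.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'],
)

# Row 'Test2' is at position 1 and column 'Eva' is at position 1,
# so both expressions read the same cell.
by_label = grades.at['Test2', 'Eva']
by_position = grades.iat[1, 1]
print(by_label, by_position)  # 87 87

# Assigning through at on a copy leaves the original untouched.
grades_copy = grades.copy()
grades_copy.at['Test2', 'Eva'] = 100  # hypothetical corrected score
print(grades_copy.at['Test2', 'Eva'], grades.at['Test2', 'Eva'])  # 100 87
```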


## Task 4. Describe (By Column)

Use grades.describe() to get descriptive statistics for our gradebook columns.
See why learning Python and DataFrames can be so powerful?  
Try to set the precision using the pd.set_option('precision',2) provided.
Does it work? Libraries are evolving. If you get an error, copy the error text and do a web search.
Can you find something about a newer option using "display.precision"?
If so, try pd.set_option("display.precision",2).
Being able to debug evolving features on your own is important for success.
Our field, languages, and libraries are constantly evolving. 

In [6]:
grades.describe()

           Wally         Eva        Sam       Katie        Bob
count   3.000000    3.000000   3.000000    3.000000   3.000000
mean   84.333333   92.333333  87.000000   87.666667  77.666667
std    13.203535    6.806859   8.888194   10.692677  11.015141
min    70.000000   87.000000  77.000000   81.000000  65.000000
25%    78.500000   88.500000  83.500000   81.500000  74.000000
50%    87.000000   90.000000  90.000000   82.000000  83.000000
75%    91.500000   95.000000  92.000000   91.000000  84.000000
max    96.000000  100.000000  94.000000  100.000000  85.000000


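pd.set_option('precision', 2) raises an error: 'Pattern matched multiple keys'.
The short name 'precision' is ambiguous in the installed pandas version, so this way of adjusting precision is outdated. The full option name "display.precision" is needed instead.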


In [7]:
pd.set_option("display.precision",2)
grades.describe()

       Wally     Eva    Sam   Katie    Bob
count   3.00    3.00   3.00    3.00   3.00
mean   84.33   92.33  87.00   87.67  77.67
std    13.20    6.81   8.89   10.69  11.02
min    70.00   87.00  77.00   81.00  65.00
25%    78.50   88.50  83.50   81.50  74.00
50%    87.00   90.00  90.00   82.00  83.00
75%    91.50   95.00  92.00   91.00  84.00
max    96.00  100.00  94.00  100.00  85.00
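
pd.set_option changes the display precision for the rest of the session. If the change should apply to one block only, one possible approach is pd.option_context, which restores the previous value when the block ends. A minimal sketch (the names before, inside, and after are chosen here for illustration):

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'],
)

# Remember the current setting so we can confirm it is restored.
before = pd.get_option('display.precision')

# Inside the with block, floats are displayed with two decimal places.
with pd.option_context('display.precision', 2):
    inside = pd.get_option('display.precision')
    print(grades.describe())

# After the block, the earlier setting is back in effect.
after = pd.get_option('display.precision')
print(before, inside, after)
```

Either way, only the display changes. The underlying values keep their full precision.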


## Task 5. Transpose (rows <--> columns)

Get the average for each column by calling grades.mean().
Transpose the DataFrame using the T attribute.
Get summary statistics, including the mean, for the new columns with .T.describe().

In [12]:
print(f'grades mean:\n{grades.mean()}')
# transposing grades so that columns represent test1, test2, and test3
grades_t = grades.T
grades_t.describe()

grades mean:
Wally    84.33
Eva      92.33
Sam      87.00
Katie    87.67
Bob      77.67
dtype: float64


        Test1  Test2  Test3
count    5.00   5.00   5.00
mean    92.80  81.20  83.40
std      7.66  11.54   8.23
min     83.00  65.00  70.00
25%     87.00  77.00  82.00
50%     94.00  81.00  85.00
75%    100.00  87.00  90.00
max    100.00  96.00  90.00
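
If only the per-test averages are needed, transposing is one option. Another is to pass axis=1 to mean so that it averages across each row. The sketch below compares the two, using variable names chosen here for illustration.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'],
)

# Default axis=0: one average per column, i.e. per student.
by_student = grades.mean()

# axis=1: one average per row, i.e. per test.
by_test = grades.mean(axis=1)

# Transposing first and averaging the new columns gives the same numbers.
by_test_via_T = grades.T.mean()

print(by_test)
print((by_test - by_test_via_T).abs().max() < 1e-9)
```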


## Task 6. Sort 

Sort the gradebook rows in reverse order, so the most recent exam row appears at the top, with grades.sort_index(ascending=False).
Sort the gradebook columns so the names appear in order using grades.sort_index(axis=1).
We can sort our data pretty much however we like. 

In [14]:
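# Only the last expression in a cell is displayed, so this result is not shown.
grades.sort_index(ascending=False)
# Columns sorted alphabetically by student name.
grades.sort_index(axis=1)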

       Bob  Eva  Katie  Sam  Wally
Test1   83  100    100   94     87
Test2   65   87     81   77     96
Test3   85   90     82   90     70
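
The sort_index calls return new DataFrames and leave grades unchanged, so each result needs its own name if both are to be kept. The sketch below shows that, and adds sorting by values with sort_values as one more way to order the gradebook. The variable names are chosen here for illustration.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'],
)

# Rows in reverse label order: most recent exam first.
newest_first = grades.sort_index(ascending=False)

# Columns in alphabetical order by student name.
by_name = grades.sort_index(axis=1)

# Columns ordered by the Test1 scores, highest first.
by_test1 = grades.sort_values(by='Test1', axis=1, ascending=False)

print(newest_first)
print(by_name)
print(by_test1)

# The original DataFrame keeps its row and column order.
print(list(grades.index), list(grades.columns))
```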
