# Practicing with DataFrames

## Task 1. Create DataFrame

In [15]:
import pandas as pd

grades_dict = {
    'Wally': [87, 96, 70],
    'Eva': [100, 87, 90],
    'Sam': [94, 77, 90],
    'Katie': [100, 81, 82],
    'Bob': [83, 65, 85],
}

grades = pd.DataFrame(grades_dict)

grades


   Wally  Eva  Sam  Katie  Bob
0     87  100   94    100   83
1     96   87   77     81   65
2     70   90   90     82   85


## Task 2. Custom Index

Apply a custom index to grades so that each row is labeled by test.

Explore two ways of selecting grades by student: bracket notation (grades['Eva']) and attribute access (grades.Sam).

In [16]:
grades.index = ['Test1', 'Test2', 'Test3']

print(f"Eva's grades: {grades['Eva']}")
print(f"Sam's grades: {grades.Sam}")

Eva's grades: Test1    100
Test2     87
Test3     90
Name: Eva, dtype: int64
Sam's grades: Test1    94
Test2    77
Test3    90
Name: Sam, dtype: int64
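As a complement to the cell above, here is a minimal sketch of two related options: passing the row labels to the constructor instead of assigning grades.index afterwards, and selecting several students at once with a list of column names. The names grades_alt and eva_and_sam are hypothetical and used only for illustration.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}

# Pass the row labels to the constructor instead of setting .index later
# (grades_alt is a hypothetical name for this illustration)
grades_alt = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

# A list of column names inside the brackets returns a DataFrame, not a Series
eva_and_sam = grades_alt[['Eva', 'Sam']]
print(eva_and_sam)
```

Bracket notation works for any column name, while attribute access such as grades.Sam only works when the name is a valid Python identifier and does not clash with an existing DataFrame attribute.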


## Task 3. Accessing Rows (loc, iloc)

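Use loc and iloc to view scores by test.

I prefer loc['RowLabel'] because selecting a row by its label is intuitive and makes it easy to see what the code references. Note that a loc slice includes its end label, while an iloc slice excludes its end position, so the two slices in this cell return three rows and two rows respectively.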

In [17]:
print('Different ways of selecting the first row')
print(f"Using loc: \n{grades.loc['Test1']}") 
print(f"\nUsing iloc: \n{grades.iloc[0]}")

print('\nSelecting slices of rows')
print(f"Using loc: \n{grades.loc['Test1':'Test3']}")
print(f"\nUsing iloc: \n{grades.iloc[0:2]}")

Different ways of selecting the first row
Using loc: 
Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64

Using iloc: 
Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64

Selecting slices of rows
Using loc: 
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85

Using iloc: 
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
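The two slices above return different numbers of rows because label slices and position slices treat their end point differently. A small sketch, rebuilding the same gradebook, that makes the two selections match (by_label and by_position are hypothetical names):

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'],
)

# loc slices by label and INCLUDES the end label 'Test2'
by_label = grades.loc['Test1':'Test2']

# iloc slices by position and EXCLUDES the end position 2
by_position = grades.iloc[0:2]

# Both selections contain the rows Test1 and Test2
print(by_label.equals(by_position))
```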


## Task 3 (continued). Accessing Single Cells (at, iat)

Use at (label-based) and iat (position-based) to get a single cell in the DataFrame.

In [18]:
print(f"Eva's score on the second exam: {grades.at['Test2', 'Eva']}")
print(f"Wally's score on the third exam: {grades.iat[2,0]}")

Eva's score on the second exam: 87
Wally's score on the third exam: 70
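The at and iat accessors can also assign to a single cell. The sketch below works on a copy so the original gradebook is left unchanged; the corrected score of 75 and the name grades_copy are made up for illustration.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'],
)

# Work on a copy so the original gradebook is not modified
grades_copy = grades.copy()

# Assign a single cell by row label and column label (75 is a made-up value)
grades_copy.at['Test2', 'Bob'] = 75

# Read the same cell by integer position: row 1, column 4
print(grades_copy.iat[1, 4])

# loc with two scalar labels returns the same single value as at
print(grades.loc['Test2', 'Eva'] == grades.at['Test2', 'Eva'])
```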


## Task 4. Describe (By Column)

Use grades.describe() to get descriptive statistics for our gradebook columns.

In [19]:
grades.describe()

       Wally    Eva   Sam  Katie    Bob
count    3.0    3.0   3.0    3.0    3.0
mean   84.33  92.33  87.0  87.67  77.67
std     13.2   6.81  8.89  10.69  11.02
min     70.0   87.0  77.0   81.0   65.0
25%     78.5   88.5  83.5   81.5   74.0
50%     87.0   90.0  90.0   82.0   83.0
75%     91.5   95.0  92.0   91.0   84.0
max     96.0  100.0  94.0  100.0   85.0


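Rounding to two decimal places in pandas

pd.set_option('precision', 2) is outdated and raises an error in recent pandas versions. Use the full option name instead: pd.set_option('display.precision', 2). This option changes only how values are displayed, not the values stored in the DataFrame.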


In [20]:
pd.set_option("display.precision", 2)
grades.describe()

       Wally    Eva   Sam  Katie    Bob
count    3.0    3.0   3.0    3.0    3.0
mean   84.33  92.33  87.0  87.67  77.67
std     13.2   6.81  8.89  10.69  11.02
min     70.0   87.0  77.0   81.0   65.0
25%     78.5   88.5  83.5   81.5   74.0
50%     87.0   90.0  90.0   82.0   83.0
75%     91.5   95.0  92.0   91.0   84.0
max     96.0  100.0  94.0  100.0   85.0
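pd.set_option changes the display precision for the rest of the session. As a hedged alternative, the sketch below shows two other approaches: pd.option_context, which applies the setting only inside a with block, and DataFrame.round, which rounds the values themselves rather than their display. The names before and summary are hypothetical.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'],
)

# Remember the current setting so we can confirm it is restored afterwards
before = pd.get_option('display.precision')

# The precision applies only inside this block
with pd.option_context('display.precision', 2):
    print(grades.describe())

# Outside the block the previous setting is back in effect
print(pd.get_option('display.precision') == before)

# round() returns a new DataFrame whose stored values are rounded
summary = grades.describe().round(2)
print(summary.loc['mean', 'Wally'])
```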


## Task 5. Transpose (rows <--> columns)

Get the average for each column (each student) by calling grades.mean().
Transpose the DataFrame using the T attribute so that each test becomes a column.
Get summary statistics for each test, including the mean, with grades.T.describe().

In [21]:
print(f'grades mean:\n{grades.mean()}')
# Transpose grades so that the columns represent Test1, Test2, and Test3
grades_t = grades.T
grades_t.describe()

grades mean:
Wally    84.33
Eva      92.33
Sam      87.00
Katie    87.67
Bob      77.67
dtype: float64


       Test1  Test2  Test3
count    5.0    5.0    5.0
mean    92.8   81.2   83.4
std     7.66  11.54   8.23
min     83.0   65.0   70.0
25%     87.0   77.0   82.0
50%     94.0   81.0   85.0
75%    100.0   87.0   90.0
max    100.0   96.0   90.0
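If only the per-test average is needed, transposing is not strictly required. A minimal sketch, assuming the same gradebook, that computes the mean across each row with the axis parameter (per_test_mean is a hypothetical name):

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'],
)

# axis=1 averages across the columns, giving one mean per row (per test)
per_test_mean = grades.mean(axis=1)
print(per_test_mean)

# The transposed route gives the same numbers
print(grades.T.mean())
```

The transposed describe() remains the more convenient choice when the other statistics (std, quartiles, min, max) are also wanted.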


## Task 6. Sort 

Sort the gradebook rows in descending index order with grades.sort_index(ascending=False), so the most recent exam appears at the top.

In [22]:
grades.sort_index(ascending=False)

       Wally  Eva  Sam  Katie  Bob
Test3     70   90   90     82   85
Test2     96   87   77     81   65
Test1     87  100   94    100   83


Sort the gradebook columns so the student names appear in alphabetical order using grades.sort_index(axis=1).

In [23]:
grades.sort_index(axis=1)

       Bob  Eva  Katie  Sam  Wally
Test1   83  100    100   94     87
Test2   65   87     81   77     96
Test3   85   90     82   90     70
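sort_index orders by the row or column labels. To order by the scores themselves, sort_values can be used instead. The sketch below is an illustrative extension of this task, not part of the original exercise; by_test2 and by_eva are hypothetical names.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'],
)

# Reorder the columns (students) by their Test2 scores, highest first
by_test2 = grades.sort_values(by='Test2', axis=1, ascending=False)
print(by_test2)

# Reorder the rows (tests) by Eva's scores, lowest first
by_eva = grades.sort_values(by='Eva')
print(by_eva)
```

Like sort_index, sort_values returns a new DataFrame and leaves grades unchanged.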
