# DataFrame Access

In [None]:
import numpy as np
from pandas import DataFrame
from pandas import Series
import pandas as pd

## Column Access

In [None]:
df = DataFrame({'A': [ 1,  2,  3,  4], 
                'B': [ 5,  6,  7,  8], 
                'C': [ 9, 10, 11, 12],
                'D': [13, 14, 15, 16], 
                'E': [17, 18, 19, 20]},
        index = ['one', 'two', 'three', 'four'] )
print(df)

In [None]:
print(df['B'])
print(type(df['B']))

- **NOTICE:** Selecting a single column returns a `Series`, not a `DataFrame`

In [None]:
print(df.B)
print(type(df.B))

- `df.B` is an alternative format for `df['B']`
- Cannot create a new column by assigning to `df.B`; the column must already exist (use `df['B'] = ...` to create one)
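
The point about attribute assignment can be sketched as follows. This is a minimal illustration, not part of the original notebook: `df_demo` and the column name `Z` are made up, and the warning is silenced only to keep the output clean.

```python
import warnings
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                       index=['one', 'two', 'three', 'four'])

# 'B' already exists, so attribute assignment updates the column.
df_demo.B = [50, 60, 70, 80]

# 'Z' does not exist yet: this sets a plain attribute, not a column.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')      # pandas normally warns here
    df_demo.Z = [0, 0, 0, 0]

columns_after_attribute = list(df_demo.columns)
print(columns_after_attribute)           # 'Z' is not among the columns

# Bracket assignment creates the new column.
df_demo['Z'] = [0, 0, 0, 0]
columns_after_bracket = list(df_demo.columns)
print(columns_after_bracket)
```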

In [None]:
print(df[['C', 'A']])

- To get more than one column, use a `list` of the column names wanted
- The columns are returned in the order listed, so the list can also reorder them
- To get just one column of the DataFrame as a DataFrame, place the column name in a one-element list: `df[['B']]`

In [None]:
print(df[['B']])
print(type(df[['B']]))

## `.loc`  (Label-Based) Access

### Row Access

- In `.loc` access, the first entry specifies the row (axis = 0) and the second entry specifies the column (axis = 1). If the second entry is omitted, it is assumed to be `:`, which means all columns.

In [None]:
print(df.loc['two', 'A':'C'])

In [None]:
print(df.loc['two'])

In [None]:
print(df.loc[:, 'A'])

In [None]:
print(df.loc['two', :])

In [None]:
print(df.loc['two'])

- Remember, `'two'` is a row label (an entry in the index). `df.loc['two']` is the same as `df.loc['two', :]` because `:` (all columns) is the default when no column is specified.
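
A small check of the equivalence described above. It is only a sketch on a reduced, illustrative frame (`df_demo` is not part of the original notebook).

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                       index=['one', 'two', 'three', 'four'])

row_short = df_demo.loc['two']       # columns default to ':'
row_full = df_demo.loc['two', :]     # columns spelled out

print(row_short.equals(row_full))    # compares the two selections
print(row_short.index.tolist())      # the column labels become the Series index
```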

### Column Access

In [None]:
df['B']

In [None]:
print(df.loc[:, 'B'])

In [None]:
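# This does not work: 'B' is a column label, and a single label
# passed to `.loc` is looked up in the row index.
#print(df.loc['B'])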

- To select columns with `.loc`, you must also specify the rows (use `:` for all rows); `df.loc['B']` looks for a row labeled `'B'`
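
The difference can be sketched on a small illustrative frame (`df_demo` is an assumption, not the notebook's `df`): a bare column label fails as a row lookup, while adding `:` for the rows succeeds.

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                       index=['one', 'two', 'three', 'four'])

lookup_failed = False
try:
    df_demo.loc['B']                 # 'B' is not a row label
except KeyError as exc:
    lookup_failed = True
    print('KeyError:', exc)

column_b = df_demo.loc[:, 'B']       # all rows, column 'B'
print(column_b)
```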

### Specifying Row and Column

In [None]:
df.loc['one':'two', 'B':'D']

- **NOTICE:** With `.loc`, the stop label of a slice is included!

In [None]:
df.loc[['one', 'three'], ['B', 'D']]

- Each list names the exact row and column labels to return, in the order given.

In [None]:
import time
# perf_counter() is a higher-resolution clock than time() for short timings
start = time.perf_counter()
df.loc['four', 'E']
stop = time.perf_counter()
stop - start

- `df.loc['four', 'E']` returns the same value as `df.at['four', 'E']`, but `.at` only handles a single element and is typically faster
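
A single timed call is noisy, so a repeated measurement gives a fairer comparison. The sketch below uses the standard-library `timeit` module on an illustrative frame (`df_demo` and the repeat count `n` are assumptions); the actual timings depend on the machine and the pandas version.

```python
import timeit
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3, 4], 'E': [17, 18, 19, 20]},
                       index=['one', 'two', 'three', 'four'])

n = 2000  # number of repeated calls per measurement

t_loc = timeit.timeit(lambda: df_demo.loc['four', 'E'], number=n)
t_at = timeit.timeit(lambda: df_demo.at['four', 'E'], number=n)

print(f'.loc: {t_loc / n * 1e6:.2f} microseconds per call')
print(f'.at:  {t_at / n * 1e6:.2f} microseconds per call')
```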

In [None]:
import time
# perf_counter() is a higher-resolution clock than time() for short timings
start = time.perf_counter()
df.at['four', 'E']
stop = time.perf_counter()
stop - start

## `.iloc` (Position-Based) Access

In [None]:
df

In [None]:
print(df.iloc[1])
print(type(df.iloc[1]))

- Every axis also has integer positions, starting at zero. Elements 
  can be accessed by position regardless of the labels assigned.
- The first entry is the row position (axis = 0). Again, if the second entry is 
  left out, it is assumed to be `:`, meaning all columns.
- `df.iloc[1]` is the same as `df.iloc[1, :]`

In [None]:
df

In [None]:
print(df.iloc[[3, 1], [2, 0]])


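- **NOTICE:** The rows and columns are selected by position, but the result keeps the labels that were assigned (`four`, `two` and `C`, `A`)
- Again, the first list specifies the rows you want and the second list specifies the columns.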
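
To make the label behavior concrete, the sketch below rebuilds the same data under the illustrative name `df_demo` and inspects the labels of a positional selection.

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3, 4],
                        'B': [5, 6, 7, 8],
                        'C': [9, 10, 11, 12]},
                       index=['one', 'two', 'three', 'four'])

# Rows at positions 3 and 1, columns at positions 2 and 0, in that order.
picked = df_demo.iloc[[3, 1], [2, 0]]

print(picked.index.tolist())     # row labels of the selected positions
print(picked.columns.tolist())   # column labels of the selected positions
print(picked)
```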

In [None]:
print(df.iloc[0:1, 0:1])

In [None]:
print(df.loc['one':'two', 'A':'B'])

In [None]:
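# This does not work: `df` has string row labels, so the integer
# slice `0:1` is not a valid label slice for `.loc`.
#print(df.loc[0:1, 'A':'B'])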

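- **WARNING:** When you slice with `.iloc`, the `<stop>` value is not included! `df.iloc[0:1, 0:1]` returns a single cell, while `df.loc['one':'two', 'A':'B']` returns two rows and two columns.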
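
The two slicing rules can be compared side by side. This is a sketch on a reduced, illustrative frame (`df_demo`): to cover the same block, the `.iloc` stop has to be one past the last position wanted.

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                       index=['one', 'two', 'three', 'four'])

by_position = df_demo.iloc[0:2, 0:2]           # positions 0 and 1; stop 2 excluded
by_label = df_demo.loc['one':'two', 'A':'B']   # stop labels 'two' and 'B' included
single_cell = df_demo.iloc[0:1, 0:1]           # only position 0 on each axis

print(by_position.equals(by_label))
print(single_cell.shape)
```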

In [None]:
print(df.iloc[0:1, :])
#print(type(df.iloc[0:1, :]))

- **NOTICE:** For `iloc` the stop value is not included!

In [None]:
print(df.iloc[0, 0])
#print(type(df.iloc[0,0]))

- `df.iloc[0,0]` is the same as `df.iat[0,0]` except `.iat` is faster

## `.loc` with Location Access

In [None]:
df2 = DataFrame({'A': [ 1,  2,  3,  4], 
                 'B': [ 5,  6,  7,  8], 
                 'C': [ 9, 10, 11, 12],
                 'D': [13, 14, 15, 16], 
                 'E': [17, 18, 19, 20]})
print(df2)

### With Assigned Row (index, axis=0) Index

In [None]:
df.loc['two':'four', 'D']

In [None]:
# This does not work because row labels have been assigned to `df`, 
# so `.loc` cannot look rows up by integer. 
# Use `.iloc` and position access to access elements by number.
#df.loc[1:3, 'D']

In [None]:
# This does not work for the same reason.
#df.loc[[1, 2, 3], 'D']

### Without Assigned Row (index, axis=0) Index

In [None]:
df2 = DataFrame({'A': [ 1,  2,  3,  4], 
                 'B': [ 5,  6,  7,  8], 
                 'C': [ 9, 10, 11, 12],
                 'D': [13, 14, 15, 16], 
                 'E': [17, 18, 19, 20]})
print(df2)

In [None]:
print(df2.loc[1:3, 'D'])
#print(type(df2.loc[1:3, 'D']))

- If the row or column labels are the default integers (`0`, `1`, `2`, ...), 
  `.loc` can take those integers, because they are the labels. They are still treated as labels, not positions.
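
A short sketch of why integer labels are not the same thing as positions. The one-column frame `df2_demo` and the reversed copy `reversed_rows` are illustrative names, not part of the original notebook.

```python
import pandas as pd

df2_demo = pd.DataFrame({'D': [13, 14, 15, 16]})   # default labels 0..3

by_label = df2_demo.loc[1:3, 'D']    # labels 1, 2, 3 (stop included)
by_position = df2_demo.iloc[1:3, 0]  # positions 1, 2 (stop excluded)
print(by_label.tolist())
print(by_position.tolist())

# Reversing the rows keeps the labels attached to their values,
# so label 3 is now at position 0.
reversed_rows = df2_demo.iloc[::-1]
value_at_label_3 = reversed_rows.loc[3, 'D']
value_at_position_3 = reversed_rows.iloc[3, 0]
print(value_at_label_3, value_at_position_3)
```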

In [None]:
print(df2.iloc[1:3, 3])
#print(type(df2.iloc[1:3, 3]))

- **Remember, `.loc` includes the `<stop>` value and `.iloc` does not include it.**

### Only Default Labels

In [None]:
df3 = pd.DataFrame([[ 1,  2,  3,  4], 
                    [ 5,  6,  7,  8], 
                    [ 9, 10, 11, 12],
                    [13, 14, 15, 16], 
                    [17, 18, 19, 20]])
print(df3)

In [None]:
print(df3.loc[1:3, 1:3])

In [None]:
print(df3.iloc[1:3, 1:3])

- **Remember, `.loc` includes the `<stop>` value and `.iloc` does not include it.**

## `.at` Access

- Fast access to a single element using label access

In [None]:
print(df.at['one', 'B'])

In [None]:
print(df2.at[1, 'C'])

In [None]:
print(df3.at[1,1])

## `.iat` Access

- Fast access to a single element using location access

In [None]:
print(df.iat[1,1])

In [None]:
# This raises an error: `.iat` accepts only integer positions, not labels such as 'B'
# print(df2.iat[1, 'B'])

In [None]:
print(df3.iat[1,1])

## Attribute Access

### Accessing Index Names

In [None]:
print(df.index)
print(df.index[1])

### Accessing Column Names

In [None]:
print(df.columns)
print(df.columns[1])

## Filter Access (Boolean Access)

### Using a Boolean Series for Row Selection

In [None]:
# showing df again 
df

- Creating a Boolean Series based on column `B`

In [None]:
truth_series_row = df.loc[:, 'B'] < 7
print(truth_series_row)
#print(type(truth_series_row))

- For `df.loc[:, 'B']`, you could have used `df.B` or `df['B']`
- **NOTICE:** The index for the Series aligns with the row index of the `df` DataFrame

In [None]:
df.loc[truth_series_row]

In [None]:
df.loc[df.loc[:,'B'] < 7]

- This works because the `truth_series_row` aligns with the row index. 
  Rows `one` and `two` are `True`, the rest are `False`. 

In [None]:
# This does not work (see the note below).
#df.loc[:, truth_series_row]

- This does not work because the index of `truth_series_row` holds row labels, so it cannot 
  be aligned with the column (axis=1) labels
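
The alignment failure can be observed safely by catching the exception. This is a sketch on an illustrative frame (`df_demo`); the broad `except Exception` is a deliberate simplification, since the exact exception class is a pandas implementation detail.

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                       index=['one', 'two', 'three', 'four'])

mask_rows = df_demo['B'] < 7         # index: 'one' .. 'four'

# Used on the row axis, the mask aligns with the row labels.
selected_rows = df_demo.loc[mask_rows]
print(selected_rows)

# Used on the column axis, its labels do not match 'A' and 'B'.
column_use_failed = False
try:
    df_demo.loc[:, mask_rows]
except Exception as exc:
    column_use_failed = True
    print(type(exc).__name__, '-', exc)
```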

In [None]:
df

### Using a Boolean Series for Column Selection

In [None]:
truth_series_column = df.loc['two'] > 10
print(truth_series_column)

- **NOTICE:** The index for the Series matches the column names for `df`

In [None]:
df.loc[:, truth_series_column]

In [None]:
df.loc[['two', 'one'], df.loc['two'] > 10]

- A row selector (`:` or a list of row labels) must come first because `truth_series_column` aligns with the columns
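
Row and column masks can also be combined in one `.loc` call. The sketch below rebuilds the same data under the illustrative name `df_demo`; `row_mask` and `col_mask` are made-up names.

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3, 4],
                        'B': [5, 6, 7, 8],
                        'C': [9, 10, 11, 12],
                        'D': [13, 14, 15, 16],
                        'E': [17, 18, 19, 20]},
                       index=['one', 'two', 'three', 'four'])

row_mask = df_demo['B'] < 7           # indexed by row labels
col_mask = df_demo.loc['two'] > 10    # indexed by column labels

# The first mask filters rows, the second filters columns.
subset = df_demo.loc[row_mask, col_mask]
print(subset)
```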


## Do Now!

1. For the DataFrame `df`, select all rows where column `C` is greater than `9`

In [None]:
df

In [None]:
# Place your answer here
df





In [None]:
truth = df['C'] > 9
truth

In [None]:
df.loc[df['C'] > 9, :]

In [None]:
df[df['C'] > 9]

# End of Notebook