# Anatomy of a DataFrame

In [None]:
import pandas as pd

In [None]:
nashville_art = pd.read_csv('./Art_in_Public_Places.csv')
nashville_art.head()

---

## What makes up a DataFrame?

First and foremost..
DataFrames are like SQL tables.

You have:
* Columns
* Rows
* And data points

![dataframe 1](./assets/dataframe_anatomy_1.png)

---

How many rows and columns are in our `nashville_art` DataFrame?

In [None]:
n_rows, n_columns = nashville_art.shape
print(f'There are {n_rows} rows and {n_columns} columns in the nashville_art DataFrame.')

What are the columns in my DataFrame?

In [None]:
print(f'Column names: {nashville_art.columns.values}')

---

Of course, like a SQL table, the columns can have different types. 
The general data types are:

* Integer
* Float
* Boolean
* Text

There are more complex types than this of course.. 
But this is what we'll focus on.

What are the data types in the `nashville_art` DataFrame?

In [None]:
nashville_art.dtypes

Ok.. 
What are the `object` types?

In this case.. Text.
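
If you want to see each of those general types side by side, here is a minimal sketch. The `toy` DataFrame and its rows are made up for illustration; they are not taken from the Nashville data. The text column usually reports `object`, though newer pandas versions may report a dedicated string type instead.

```python
import pandas as pd

# Made-up rows for illustration only (not from Art_in_Public_Places.csv).
toy = pd.DataFrame({
    'Title': ['Blue Wall', 'Stone Bird', 'River Song'],  # text
    'Year': [2000, 2010, 2020],                          # integer
    'Height': [3.5, 1.2, 2.0],                           # float
    'Outdoors': [True, False, True],                     # boolean
})

# One dtype per column
print(toy.dtypes)
```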

---

## Accessing columns

Generally, you want to work with your data. 
To do this, you will need to know how to handle columns!!

You can select columns using brackets. 
Syntax is like: 
```python
df[['column1', 'column2', ...]]
```

Go ahead and select the "Title" and "Location" columns.

In [None]:
# code here!

---

## Accessing a single column

You can also work with a single column at a time. 
This will allow you to perform operations specific to that column's datatype, rather than working with the entire DataFrame.

To do this, use a single set of brackets. 
_But_, remember to pass only a single column name!
Syntax is like:
```python
df[ 'column' ]
```

In [None]:
# code here!

What is different here?? 

* 2 sets of brackets - Returned a **_DataFrame_**
* 1 set of brackets - Returned a **_Series_**
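
You can check this yourself with `type()`. A small sketch on a made-up `toy` DataFrame (the rows are invented for illustration):

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Title': ['Blue Wall', 'Stone Bird', 'River Song'],
    'Location': ['Main St', 'City Park', 'Riverfront'],
    'Type': ['Mural', 'Sculpture', 'Mural'],
})

two_brackets = toy[['Title', 'Location']]  # list of column names
one_bracket = toy['Title']                 # single column name

print(type(two_brackets).__name__)  # DataFrame
print(type(one_bracket).__name__)   # Series

# A list with only one name still gives a DataFrame
still_a_frame = toy[['Title']]
print(type(still_a_frame).__name__)  # DataFrame
```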

---

### What is a Series object?
Think of a Series as a column.
Series and DataFrames are different. 
They behave differently, represent different things, and have different methods associated with them (though some share the same name).

Series is not bad, just different. 

You can find documentation for **_DataFrame_** objects [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).  
You can find documentation for **_Series_** objects [here](https://pandas.pydata.org/pandas-docs/stable/reference/series.html).  

A DataFrame references an entire set of data (or a subset of it) in general terms, whereas a Series references a single column or row and gives you access to type-specific functionality.
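
As a rough sketch of what "type-specific functionality" can look like: a text Series has string methods under `.str`, and a numeric Series has math methods like `.mean()`. The `toy` DataFrame below is made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Title': ['Blue Wall', 'Stone Bird', 'River Song'],
    'Year': [2000, 2010, 2020],
})

# Text Series: string methods live under .str
upper_titles = toy['Title'].str.upper()
print(upper_titles.tolist())  # ['BLUE WALL', 'STONE BIRD', 'RIVER SONG']

# Numeric Series: math methods
average_year = toy['Year'].mean()
print(average_year)  # 2010.0
```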

---

## Getting Fancy

Pandas gives you the option to select certain rows as well!
Not just columns. 
The best way to do that is with the `.iloc` property.

Remember slicing lists? Accessing single elements of a list.. Multiple elements in a list.. Etc.

As a reminder..

In [None]:
my_list = [1, 2, 3, 4, 5, 6]

# first element
print('First element is: ', my_list[0])

# last element
print('Last element is: ', my_list[-1])

# first 3 elements
print('First 3 elements are: ', my_list[:3])

# last 3 elements
print('Last 3 elements are: ', my_list[-3:])

# middle 2 elements
print('Middle 2 elements are: ', my_list[2:4])

You are able to access/slice DataFrames and Series in a similar way.

Let's do the same thing for our DataFrame to **_access the rows_** we're interested in.
Like we mentioned earlier, we need a special property of the DataFrame called `.iloc`.

Example: 
```python
df.iloc[0]
df.iloc[-1]
df.iloc[0:3]
df.iloc[-3:]
df.iloc[2:4]
```

In [None]:
# Accessing the first element (we'll do the first one for you)
nashville_art.iloc[0]

In [None]:
# Accessing the last element


In [None]:
# Accessing first 3 elements


In [None]:
# Accessing last 3 elements


In [None]:
# Accessing third and fourth elements


In these examples, different types were returned. 
Try to figure out what the types were, and when they showed up.
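
If you are not sure how to check, `type()` works here too. A minimal sketch on a made-up `toy` DataFrame (rows invented for illustration); try the same idea on `nashville_art`:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Title': ['Blue Wall', 'Stone Bird', 'River Song'],
    'Location': ['Main St', 'City Park', 'Riverfront'],
})

single_row = toy.iloc[0]    # one integer position
row_slice = toy.iloc[:2]    # a slice of positions

print(type(single_row).__name__)  # Series
print(type(row_slice).__name__)   # DataFrame
```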

---

## Getting Fancier

But what if you want to filter **BOTH** the rows and the columns?? 
You can still use `.iloc` for that.

### Slice By Row and Column Indices

Let's blend rows and column slicing together into a single command.

Example: 
```python
df.iloc[ <row_indexer> , <column_indexer> ]
df.iloc[ 0 , 0 ]
df.iloc[ -1 , -1 ]
df.iloc[ :3 , :3 ]
df.iloc[ -3: , -3: ]
df.iloc[ 3:5 , 3:5 ]

# or a mix!!
df.iloc[0, 3:5]
df.iloc[:3, -3:]
df.iloc[:, :3]
df.iloc[-3:, :]
```

Just like we've been slicing the rows with `.iloc`, put a comma right after your row slicer and add a second slicer for the columns.
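
Here is a small sketch of the row-and-column form on a made-up `toy` DataFrame (rows invented for illustration). What comes back depends on whether each indexer is a single position or a slice.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Title': ['Blue Wall', 'Stone Bird', 'River Song'],
    'Location': ['Main St', 'City Park', 'Riverfront'],
    'Type': ['Mural', 'Sculpture', 'Mural'],
})

cell = toy.iloc[0, 0]           # single row, single column: one value
part_of_row = toy.iloc[0, 1:3]  # single row, column slice: a Series
block = toy.iloc[:2, :2]        # row slice, column slice: a DataFrame

print(cell)                        # Blue Wall
print(type(part_of_row).__name__)  # Series
print(type(block).__name__)        # DataFrame
```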

Apply all the example iloc operations to your DataFrame, and try to understand what data types are being returned by each operation.

In [None]:
# Code all the things here!

---

## Bringing it back

OKOK.
Now we are frustrated because we hate thinking about numbers.
We name our columns for a reason, right? 
Shouldn't we be able to slice-n-dice with the names too?

Why, of course you should.
And you can!
You can do _anything_!!

You just can't do everything with `.iloc`. 
`.iloc` does have a best friend that knows what to do!!

Introducing.. `.loc`!!!

Example: 
```python
df.loc[0:5]
```

In [None]:
nashville_art.loc[0:5]

In [None]:
nashville_art.loc[0]

---

## Why didn't it change?!?!

![dataframe 2](./assets/dataframe_anatomy_2.png)

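Well first, we need to understand that there are **2 parts** to the columns and rows. 
Up until now, we have been working with the _index_ (the integer position) of either.
But now, we need to think about the names/labels.

Everything you don't see is the index.
This calls for `.iloc`.

Everything you see is a label. 
This calls for `.loc`.

Our row labels just happen to be the numbers 0, 1, 2, and so on, which is why `.loc[0]` and `.iloc[0]` return the same row.
One thing to watch for: a `.loc` slice includes its end label, so `.loc[0:5]` returns 6 rows, while `.iloc[0:5]` returns 5.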
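
The difference is easier to see when the row labels are not 0, 1, 2, and so on. A minimal sketch, assuming a made-up `toy` DataFrame whose row labels are 10, 20, 30, 40 (both the rows and the labels are invented for illustration):

```python
import pandas as pd

# Made-up rows and row labels for illustration only.
toy = pd.DataFrame(
    {
        'Title': ['Blue Wall', 'Stone Bird', 'River Song', 'Glass Path'],
        'Location': ['Main St', 'City Park', 'Riverfront', 'Library'],
    },
    index=[10, 20, 30, 40],
)

by_position = toy.iloc[0:2]  # positions 0 and 1 (end is excluded)
by_label = toy.loc[10:30]    # labels 10, 20, and 30 (end is included)

print(len(by_position))  # 2
print(len(by_label))     # 3

# Same cell, reached two ways
print(toy.iloc[1, 0])        # Stone Bird
print(toy.loc[20, 'Title'])  # Stone Bird
```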

In [None]:
# boom!
nashville_art.loc[0:5, 'Title':'Location']

In [None]:
nashville_art.loc[[1, 3, 5], ['Title', 'Location']]

In [None]:
# query the first 2 rows with just the title

In [None]:
# query the first rows with the title and the location only.

In [None]:
# Using loc, get the 50th row with all the columns.

---

## The Fanciest Of All!

Finally, you can add clauses to your slicers.
Just like a **WHERE** clause in SQL.
This pairs up with `.loc`!
(`.iloc` works with integer positions and will not accept a boolean Series like the ones below.)

In [None]:
# Example
nashville_art.loc[
    nashville_art['Type'] == 'Mural'
]

In [None]:
# Example
nashville_art.loc[
    (nashville_art['Type'] == 'Mural') & 
    (nashville_art['Medium'] == 'Paint'),
    'Title':'Location'
]
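
What is actually going on inside those brackets? The comparison builds a Series of `True`/`False` values, one per row, and `.loc` keeps the rows marked `True`. A small sketch on a made-up `toy` DataFrame (rows invented for illustration):

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Title': ['Blue Wall', 'Stone Bird', 'River Song', 'Glass Path'],
    'Type': ['Mural', 'Sculpture', 'Mural', 'Mosaic'],
})

# The comparison alone gives one True/False per row
is_mural = toy['Type'] == 'Mural'
print(is_mural.tolist())  # [True, False, True, False]

# .loc keeps only the rows where the mask is True
murals = toy.loc[is_mural]
print(murals['Title'].tolist())  # ['Blue Wall', 'River Song']
```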

---

### A couple last notes

Be sure to make use of the special operators when querying your dataframe.

* `&` - and
* `|` - or
* `~` - not

Also, **_PARENTHESES MATTER_** when you are combining multiple conditions.
Example: 

```python
df.loc[ (df['column'] == 'foo') & (df['another_column'] != 'bar') , :]
```

Note the parentheses around each one. 
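
Here is a short sketch of all three operators on a made-up `toy` DataFrame (rows invented for illustration). The parentheses are needed because Python evaluates `&` and `|` before `==`, so without them the comparison gets grouped the wrong way and typically raises an error.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Title': ['Blue Wall', 'Stone Bird', 'River Song', 'Glass Path'],
    'Type': ['Mural', 'Sculpture', 'Mural', 'Mosaic'],
    'Medium': ['Paint', 'Bronze', 'Tile', 'Glass'],
})

# and: both conditions must be True
painted_murals = toy.loc[
    (toy['Type'] == 'Mural') & (toy['Medium'] == 'Paint'), 'Title'
]

# or: either condition can be True
mural_or_mosaic = toy.loc[
    (toy['Type'] == 'Mural') | (toy['Type'] == 'Mosaic'), 'Title'
]

# not: flip the condition
not_mural = toy.loc[~(toy['Type'] == 'Mural'), 'Title']

print(painted_murals.tolist())   # ['Blue Wall']
print(mural_or_mosaic.tolist())  # ['Blue Wall', 'River Song', 'Glass Path']
print(not_mural.tolist())        # ['Stone Bird', 'Glass Path']
```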