In [None]:
import numpy as np 
import pandas as pd 
df = pd.read_csv('ca_wac_S000_JT00_2015.csv') # Read the CSV file (in the working directory) into a Data Frame

## Exploring the Data Frame

Now that we've loaded in the data set as a Data Frame, let's check the number of rows and columns. We can do this by looking at the `shape` attribute of a data frame.

In [None]:
df.shape

It looks like there are 243,462 rows and 53 columns.
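The value of `shape` is an ordinary Python tuple of `(rows, columns)`, so it can be unpacked into two variables, and `len()` on a Data Frame gives just the row count. Here is a minimal sketch on a small made-up frame rather than the real file. The values are invented; only the column names come from this data set.

```python
import pandas as pd

# Small made-up stand-in for the real data: 3 rows, 4 columns
toy = pd.DataFrame({
    "C000": [5, 12, 3],
    "CA01": [1, 4, 0],
    "CA02": [3, 6, 2],
    "CA03": [1, 2, 1],
})

n_rows, n_cols = toy.shape  # shape is a (rows, columns) tuple
print(n_rows, n_cols)       # 3 4
print(len(toy) == n_rows)   # True: len() counts rows only
```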

Let's also find out the names of all the variables in this data set. 

In [None]:
df.columns
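`df.columns` is a pandas `Index` object rather than a plain list. It can be converted with `.tolist()`, and the `in` operator checks whether a column exists. A small sketch, again on an invented frame that only borrows this data set's column names:

```python
import pandas as pd

# Made-up frame reusing column names from the real data set
toy = pd.DataFrame({"C000": [5, 12], "CA01": [1, 4], "CA02": [3, 6], "CA03": [1, 2]})

cols = toy.columns     # a pandas Index object, not a plain list
names = cols.tolist()  # convert to an ordinary Python list
print(names)           # ['C000', 'CA01', 'CA02', 'CA03']
print("C000" in cols)  # True: check whether a column exists
```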

We can use the `head` and `tail` methods to look at the first or last few rows of the data frame.

In [None]:
df.head() # Default is to show first 5 rows.

In [None]:
df.head(10) # We can specify how many rows we want to see.

In [None]:
df.tail(10) # Same as head, except it shows the last 10 rows instead of the first 10
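One way to think about `head(n)` and `tail(n)` is as shortcuts for taking the first or last `n` rows by position. The sketch below checks this on a 12-row frame of made-up values, comparing against `.iloc`, which appears later in this notebook:

```python
import pandas as pd

toy = pd.DataFrame({"C000": range(100, 112)})  # 12 made-up rows

first_default = toy.head()  # no argument: first 5 rows
last_three = toy.tail(3)    # last 3 rows

# head(n) and tail(n) behave like positional slices
print(first_default.equals(toy.iloc[:5]))  # True
print(last_three.equals(toy.iloc[-3:]))    # True
print(last_three["C000"].tolist())         # [109, 110, 111]
```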

> #### Side Note: Instance variables and Methods
Note that we used `head()` with parentheses, but `shape` and `columns` without them. This is because `shape` and `columns` are **instance variables** (attributes), while `head` is a **method**. In other words, `shape` and `columns` are values that every Data Frame object carries, and we're just displaying them. A method, on the other hand, is a function that belongs to a particular type of object (in this case, a Data Frame), so we call it with parentheses, optionally passing arguments such as the `10` in `df.head(10)`.
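Python's built-in `callable()` makes the distinction visible: an attribute like `shape` is just a value, while `head` is something you call. A minimal sketch on a one-column made-up frame:

```python
import pandas as pd

toy = pd.DataFrame({"C000": [5, 12, 3]})

print(toy.shape)            # (3, 1): an attribute, read without parentheses
print(callable(toy.shape))  # False: a tuple can't be called
print(callable(toy.head))   # True: head is a method
first_two = toy.head(2)     # calling the method runs it and returns a new DataFrame
print(len(first_two))       # 2
```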

### Accessing the Data Frame
What if we only want to look at certain cells or certain columns? pandas gives us several ways to do just that.

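To access an individual column, we can use square brackets with the column name, or we can use dot notation. Dot notation is shorter, but it only works when the column name is a valid Python identifier (no spaces, not starting with a digit) and doesn't clash with an existing Data Frame attribute or method. Square brackets always work.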

In [None]:
# Look at just the total number of jobs (C000) using square brackets
df["C000"]

# Dot notation does the same thing (in a notebook, only this last expression's result is displayed)
df.C000
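The sketch below checks that both forms return the same Series, and shows why square brackets are the safer habit. The column named `count` is a hypothetical name chosen because it clashes with the built-in `DataFrame.count()` method; the values are made up.

```python
import pandas as pd

# Made-up frame; "count" is a hypothetical column name chosen to clash with a method
toy = pd.DataFrame({"C000": [5, 12, 3], "count": [1, 2, 3]})

print(toy["C000"].equals(toy.C000))  # True: both forms return the same Series

# toy.count is the DataFrame.count() method, not the column named "count"
print(callable(toy.count))           # True
print(toy["count"].tolist())         # [1, 2, 3]: brackets always reach the column
```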

What if we want to get certain rows? For that we can use `loc`, also with square brackets. A colon specifies a range (a slice) with a start and an end; unlike ordinary Python slicing, a `loc` slice includes *both* endpoints. We can also leave one side of the colon empty to take all the rows from the beginning, or all the rows through the end.

In [None]:
# Show rows 10 through 20 (11 rows, since loc includes both ends). Remember, the first row is row 0
df.loc[10:20] 

In [None]:
df.loc[:10] # From the first row up to and including row 10

In [None]:
df.loc[:] # This gives all rows
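Because a `loc` slice includes its end label, it returns one more row than the "same" slice done by position with `.iloc` (introduced later in this notebook). A quick check on 30 rows of made-up data with the default index, where labels and positions happen to coincide:

```python
import pandas as pd

toy = pd.DataFrame({"C000": range(30)})  # made-up data; default index labels 0..29

by_label = toy.loc[10:20]      # label slice: both endpoints included
by_position = toy.iloc[10:20]  # position slice: end excluded, like Python lists
print(len(by_label), len(by_position))            # 11 10
print(by_label.index[-1], by_position.index[-1])  # 20 19
print(len(toy.loc[:10]))                          # 11: rows 0 through 10
```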

In addition, we can use `loc` to select specific rows and specific columns at the same time: the row selection goes before the comma and the column selection after it.

In [None]:
# Look at rows 10 - 20 for total number of jobs (C000)
df.loc[10:20, "C000"]

In [None]:
# Look at rows 10 - 20 for total number of jobs (C000) and jobs by age group
df.loc[10:20, ['C000', 'CA01', 'CA02', 'CA03']]

Here, we wanted to select 4 variables to look at. Notice that we replaced `"C000"` with `['C000','CA01','CA02','CA03']`. The square brackets create a list with 4 elements, `'C000'`,`'CA01'`,`'CA02'`, and `'CA03'`.

In [None]:
type(['C000','CA01','CA02','CA03'])
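Whether the column selector is a single label or a list also changes what comes back: a single label gives a one-dimensional Series, while a list (even a list with just one name) gives a Data Frame. A sketch on made-up data with this data set's column names:

```python
import pandas as pd

# Made-up data: 30 rows, same column names as the real file
toy = pd.DataFrame({name: range(30) for name in ["C000", "CA01", "CA02", "CA03"]})

one_col = toy.loc[10:20, "C000"]              # single label -> Series
many_cols = toy.loc[10:20, ["C000", "CA01"]]  # list of labels -> DataFrame
list_of_one = toy.loc[10:20, ["C000"]]        # list with one label -> still a DataFrame

print(type(one_col).__name__)      # Series
print(type(many_cols).__name__)    # DataFrame
print(type(list_of_one).__name__)  # DataFrame
```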

In [None]:
vars_to_show = ['CA01','CA02','CA03'] # A list of strings containing names of variables (jobs by age group)
df.iloc[-5:][vars_to_show] # Take the last 5 rows by position, then keep only the listed columns

In this case, we used "`-5:`" to ask for the last 5 rows of the data frame. The same trick doesn't work with `.loc`: `df.loc[-5:]` does not return the last 5 rows. This is because `.loc` selects rows by their index *labels*, so `-5` is treated as a label (and no row in this data frame is labeled -5), while `.iloc` selects rows by their integer *positions*, where negative numbers count back from the end.
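The difference is easiest to see when the index labels are not 0, 1, 2, .... In the sketch below the labels `blk_a` to `blk_d` are hypothetical stand-ins and the values are made up. It picks the same cell by label and by position, then gets the last two rows both ways: `.iloc` accepts negative positions directly, while with `.loc` one option is to look up the last labels from `df.index` first.

```python
import pandas as pd

# Made-up frame whose index labels are strings (hypothetical IDs), not 0, 1, 2, ...
toy = pd.DataFrame(
    {"CA01": [1, 4, 0, 2], "CA02": [3, 6, 2, 5], "CA03": [1, 2, 1, 0]},
    index=["blk_a", "blk_b", "blk_c", "blk_d"],
)

print(toy.loc["blk_b", "CA02"])  # 6: row chosen by its label
print(toy.iloc[1]["CA02"])       # 6: the same row, chosen by position 1

# Last 2 rows: iloc takes negative positions; with loc, look the labels up first
last_by_position = toy.iloc[-2:]
last_by_label = toy.loc[toy.index[-2:]]
print(last_by_position.equals(last_by_label))  # True
```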