# DATA VISUALISATION WORK

## Accessing Data with Pandas

In [1]:
import pandas as pd
df = pd.read_csv('parks.csv', index_col=['Park Code'])
df.head(3)

| Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|
| ACAD | Acadia National Park | ME | 47390 | 44.35 | -68.21 |
| ARCH | Arches National Park | UT | 76519 | 38.68 | -109.57 |
| BADL | Badlands National Park | SD | 242756 | 43.75 | -102.5 |


## Accessing Rows

In [2]:
df.iloc[2]

Park Name    Badlands National Park
State                            SD
Acres                        242756
Latitude                      43.75
Longitude                    -102.5
Name: BADL, dtype: object

## Passing labels from the dataframe index to the loc method

In [3]:
df.loc['BADL']

Park Name    Badlands National Park
State                            SD
Acres                        242756
Latitude                      43.75
Longitude                    -102.5
Name: BADL, dtype: object

In [4]:
df.loc[['BADL', 'ARCH', 'ACAD']]

| Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|
| BADL | Badlands National Park | SD | 242756 | 43.75 | -102.5 |
| ARCH | Arches National Park | UT | 76519 | 38.68 | -109.57 |
| ACAD | Acadia National Park | ME | 47390 | 44.35 | -68.21 |


In [5]:
df.iloc[[2, 1, 0]]

| Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|
| BADL | Badlands National Park | SD | 242756 | 43.75 | -102.5 |
| ARCH | Arches National Park | UT | 76519 | 38.68 | -109.57 |
| ACAD | Acadia National Park | ME | 47390 | 44.35 | -68.21 |
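
The cells above return the same rows whether they are addressed by position (iloc) or by index label (loc). The sketch below checks that on a small hand-built frame; the three rows are copied from the output above, and the names by_position and by_label are only illustrative.

```python
import pandas as pd

# Three rows copied from the output above; the notebook itself reads parks.csv.
df = pd.DataFrame(
    {
        "Park Name": [
            "Acadia National Park",
            "Arches National Park",
            "Badlands National Park",
        ],
        "State": ["ME", "UT", "SD"],
        "Acres": [47390, 76519, 242756],
    },
    index=pd.Index(["ACAD", "ARCH", "BADL"], name="Park Code"),
)

by_position = df.iloc[2]    # third row, counting from zero
by_label = df.loc["BADL"]   # row whose index label is 'BADL'
print(by_position.equals(by_label))

# A list selects several rows and returns them in the order requested.
by_labels = df.loc[["BADL", "ARCH", "ACAD"]]
by_positions = df.iloc[[2, 1, 0]]
print(by_labels.equals(by_positions))
```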


## Slicing the dataframe as if it were a list

In [6]:
df[:3]

| Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|
| ACAD | Acadia National Park | ME | 47390 | 44.35 | -68.21 |
| ARCH | Arches National Park | UT | 76519 | 38.68 | -109.57 |
| BADL | Badlands National Park | SD | 242756 | 43.75 | -102.5 |


In [7]:
df[3:6]

| Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|
| BIBE | Big Bend National Park | TX | 801163 | 29.25 | -103.25 |
| BISC | Biscayne National Park | FL | 172924 | 25.65 | -80.08 |
| BLCA | Black Canyon of the Gunnison National Park | CO | 32950 | 38.57 | -107.72 |
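
List-style slices on a dataframe pick rows by position and exclude the end point, while label slices through loc include both end points. A minimal sketch of that difference, assuming a six-row frame that reuses the park codes and states shown above (the variable name middle is illustrative):

```python
import pandas as pd

# Park codes and states taken from the outputs above.
df = pd.DataFrame(
    {"State": ["ME", "UT", "SD", "TX", "FL", "CO"]},
    index=pd.Index(
        ["ACAD", "ARCH", "BADL", "BIBE", "BISC", "BLCA"], name="Park Code"
    ),
)

# Integer slice: rows at positions 3, 4 and 5 (position 6 is excluded).
middle = df[3:6]
print(list(middle.index))

# .iloc states the positional intent explicitly and selects the same rows.
print(middle.equals(df.iloc[3:6]))

# Label slice through .loc: both 'BIBE' and 'BLCA' are included.
print(list(df.loc["BIBE":"BLCA"].index))
```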


## Indexing Columns
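We can access a single column by placing its name in brackets, which returns a Series. Placing a list of column names in brackets instead returns a dataframe containing just those columns. A single column looks like so: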

In [8]:
df['State'].head(3)

Park Code
ACAD    ME
ARCH    UT
BADL    SD
Name: State, dtype: object
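
The return type depends on whether the brackets hold one name or a list of names. A small sketch of that distinction, using a hand-built frame with values copied from the first output (single and subset are illustrative names):

```python
import pandas as pd

# Values copied from the first three rows of the dataset shown earlier.
df = pd.DataFrame(
    {"State": ["ME", "UT", "SD"], "Acres": [47390, 76519, 242756]},
    index=pd.Index(["ACAD", "ARCH", "BADL"], name="Park Code"),
)

single = df["State"]             # one column name: a Series
subset = df[["State"]]           # a list, even of one name: a DataFrame
both = df[["State", "Acres"]]    # a list of several names: a DataFrame

print(type(single).__name__)
print(type(subset).__name__)
print(list(both.columns))
```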

## Indexing Columns and Rows
If we need to subset by both columns and rows, we can chain the two selections:

In [10]:
df[['State', 'Acres']][:3]

| Park Code | State | Acres |
|---|---|---|
| ACAD | ME | 47390 |
| ARCH | UT | 76519 |
| BADL | SD | 242756 |
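
Chaining works well for reading data. As an alternative, the same rows and columns can be requested in a single loc or iloc call, which is usually the safer form when values are going to be assigned. The sketch below assumes a four-row frame built from values shown earlier; stacked, one_step and by_position are illustrative names.

```python
import pandas as pd

# Four rows copied from the outputs above.
df = pd.DataFrame(
    {
        "Park Name": [
            "Acadia National Park",
            "Arches National Park",
            "Badlands National Park",
            "Big Bend National Park",
        ],
        "State": ["ME", "UT", "SD", "TX"],
        "Acres": [47390, 76519, 242756, 801163],
    },
    index=pd.Index(["ACAD", "ARCH", "BADL", "BIBE"], name="Park Code"),
)

# Chained form used in the cell above: columns first, then rows.
stacked = df[["State", "Acres"]][:3]

# Single-call forms: rows and columns are given together.
one_step = df.loc[df.index[:3], ["State", "Acres"]]   # by label
by_position = df.iloc[:3, [1, 2]]                     # by position

print(stacked.equals(one_step))
print(stacked.equals(by_position))
```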


## Indexing: Scalar Values

In [12]:
df.State.iloc[2]

'SD'

Note that you will get a different return type if you pass a single value in a list: the result is a one-element Series rather than a scalar.

In [13]:
df.State.iloc[[2]]

Park Code
BADL    SD
Name: State, dtype: object
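
The two return types can be compared side by side. The sketch below uses a hand-built frame with values from the first output; it also shows the at accessor, which is one way to fetch a single value by label. The variable names are illustrative.

```python
import pandas as pd

# Values copied from the first three rows of the dataset shown earlier.
df = pd.DataFrame(
    {"State": ["ME", "UT", "SD"], "Acres": [47390, 76519, 242756]},
    index=pd.Index(["ACAD", "ARCH", "BADL"], name="Park Code"),
)

scalar = df.State.iloc[2]       # a plain Python string
one_row = df.State.iloc[[2]]    # a Series holding one element
by_label = df.at["BADL", "State"]   # scalar lookup by row and column label

print(type(scalar).__name__, type(one_row).__name__)
print(scalar == by_label)
```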

## Selecting a subset of the data
The main method for subsetting data in pandas is called boolean indexing. First, let's take a look at what pandas does when we ask it to evaluate a boolean expression on a column:

In [15]:
(df.State == 'UT').head(3)

Park Code
ACAD    False
ARCH     True
BADL    False
Name: State, dtype: bool

We get a Series holding the result of the comparison for each row. Passing that Series into the dataframe's brackets gives us the subset of rows where the comparison evaluates to True.

In [16]:
df[df.State == 'UT']

| Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|
| ARCH | Arches National Park | UT | 76519 | 38.68 | -109.57 |
| BRCA | Bryce Canyon National Park | UT | 35835 | 37.57 | -112.18 |
| CANY | Canyonlands National Park | UT | 337598 | 38.2 | -109.93 |
| CARE | Capitol Reef National Park | UT | 241904 | 38.2 | -111.17 |
| ZION | Zion National Park | UT | 146598 | 37.3 | -113.05 |
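
It can help to keep the boolean Series in its own variable, since it can be inspected, counted and reused. A minimal sketch on a five-row frame built from values shown above (mask and utah are illustrative names):

```python
import pandas as pd

# Park codes, states and acreages copied from the outputs above.
df = pd.DataFrame(
    {
        "State": ["ME", "UT", "SD", "UT", "UT"],
        "Acres": [47390, 76519, 242756, 35835, 146598],
    },
    index=pd.Index(["ACAD", "ARCH", "BADL", "BRCA", "ZION"], name="Park Code"),
)

mask = df.State == "UT"   # boolean Series aligned with the dataframe index
print(mask.sum())         # True counts as 1, so this is the number of matches

utah = df[mask]           # keep only the rows where the mask is True
print(list(utah.index))

# .loc accepts the same mask and can select columns in the same call.
print(df.loc[mask, "Acres"].tolist())
```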


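If you combine multiple conditions, each one needs to be wrapped in parentheses, because the element-wise operators & and | bind more tightly than comparisons such as > and ==: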

In [18]:
df[(df.Latitude > 60) | (df.Acres > 10**6)].head(3)

| Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|
| DENA | Denali National Park and Preserve | AK | 3372402 | 63.33 | -150.5 |
| DEVA | Death Valley National Park | CA, NV | 4740912 | 36.24 | -116.82 |
| EVER | Everglades National Park | FL | 1508538 | 25.32 | -80.93 |
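
Conditions can be combined with | (or), & (and) and ~ (not), each applied element by element. The sketch below assumes a four-row frame with latitudes and acreages copied from the outputs above; either, both and neither are illustrative names.

```python
import pandas as pd

# Latitude and acreage values copied from the outputs above.
df = pd.DataFrame(
    {
        "Acres": [47390, 76519, 3372402, 1508538],
        "Latitude": [44.35, 38.68, 63.33, 25.32],
    },
    index=pd.Index(["ACAD", "ARCH", "DENA", "EVER"], name="Park Code"),
)

far_north = df.Latitude > 60
very_large = df.Acres > 10**6

either = df[far_north | very_large]       # at least one condition holds
both = df[far_north & very_large]         # both conditions hold
neither = df[~(far_north | very_large)]   # no condition holds

print(list(either.index))
print(list(both.index))
print(list(neither.index))
```

Naming each condition first, as done here, is one way to avoid a long line of nested parentheses.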


You can also use more complicated expressions, including lambdas.

In [22]:
df[df['Park Name'].str.split().apply(lambda x: len(x) == 3)].head(3)

| Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|
| ACAD | Acadia National Park | ME | 47390 | 44.35 | -68.21 |
| ARCH | Arches National Park | UT | 76519 | 38.68 | -109.57 |
| BADL | Badlands National Park | SD | 242756 | 43.75 | -102.5 |
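
For this particular test a lambda is not strictly required: the str accessor can also report the length of each split list. The sketch below compares the two forms on four park names taken from the outputs above; with_lambda and vectorised are illustrative names.

```python
import pandas as pd

# Park names copied from the outputs above.
df = pd.DataFrame(
    {
        "Park Name": [
            "Acadia National Park",
            "Big Bend National Park",
            "Black Canyon of the Gunnison National Park",
            "Arches National Park",
        ]
    },
    index=pd.Index(["ACAD", "BIBE", "BLCA", "ARCH"], name="Park Code"),
)

words = df["Park Name"].str.split()   # each element is a list of words

with_lambda = words.apply(lambda x: len(x) == 3)
vectorised = words.str.len() == 3     # .str.len() also measures lists

print(list(with_lambda) == list(vectorised))
print(list(df[vectorised].index))
```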


## Key Companion Methods: isin and isnull

Suppose we wanted to find all parks on the West Coast. isin makes that simple:

In [23]:
df[df.State.isin(['WA', 'OR', 'CA'])].head()

| Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|
| CHIS | Channel Islands National Park | CA | 249561 | 34.01 | -119.42 |
| CRLA | Crater Lake National Park | OR | 183224 | 42.94 | -122.1 |
| JOTR | Joshua Tree National Park | CA | 789745 | 33.79 | -115.9 |
| LAVO | Lassen Volcanic National Park | CA | 106372 | 40.49 | -121.51 |
| MORA | Mount Rainier National Park | WA | 235625 | 46.85 | -121.75 |
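
isin compares whole values, so a park whose State entry lists two states, such as Death Valley ('CA, NV') in an earlier output, is not matched by 'CA'. The sketch below shows that behaviour, the negated form with ~, and a substring test as one possible workaround; the five rows are copied from outputs above and the variable names are illustrative.

```python
import pandas as pd

# State values copied from the outputs above.
df = pd.DataFrame(
    {"State": ["CA", "OR", "WA", "ME", "CA, NV"]},
    index=pd.Index(["CHIS", "CRLA", "MORA", "ACAD", "DEVA"], name="Park Code"),
)

west_states = ["WA", "OR", "CA"]

west = df[df.State.isin(west_states)]        # exact matches only
not_west = df[~df.State.isin(west_states)]   # everything else, DEVA included

print(list(west.index))
print(list(not_west.index))

# Substring match: catches 'CA, NV' as well as 'CA'.
mentions_ca = df[df.State.str.contains("CA", regex=False)]
print(list(mentions_ca.index))
```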
