## Import Pandas

Import `Pandas` with the alias `pd`.

In [1]:
import pandas as pd

# DataFrame Slicing

A `DataFrame` behaves in many ways like a standard Python dictionary (mapping column names to columns) and like a two-dimensional NumPy array, so it supports many of the same patterns for data indexing and slicing.

Let's start by creating a DataFrame.

In [2]:
s1_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 
           'Florida': 19552860}
s1 = pd.Series(s1_dict)

s2_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297, 
           'Florida': 170312}
s2 = pd.Series(s2_dict)

d1 = pd.DataFrame({'Population':s1,'Area':s2})
print(d1)

            Population    Area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
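
To make the dictionary and array analogy concrete, the following minimal sketch rebuilds the same data and probes it both ways. The variable names (`has_area`, `column_names`, `values_shape`, `first_row`) are chosen here for illustration only.

```python
import pandas as pd

# Same data as above, rebuilt so the snippet runs on its own.
d1 = pd.DataFrame(
    {'Population': [38332521, 26448193, 19651127, 19552860],
     'Area': [423967, 695662, 141297, 170312]},
    index=['California', 'Texas', 'New York', 'Florida'])

# Dictionary-like view: column names act as keys.
has_area = 'Area' in d1            # membership test on column names
column_names = list(d1.keys())     # keys() returns the column labels

# Array-like view: the values form a two-dimensional NumPy array.
values_shape = d1.values.shape     # (rows, columns)
first_row = d1.values[0]           # first row, i.e. California

print(has_area, column_names, values_shape, first_row)
```

Note that the dictionary-style operations work on columns, while indexing the raw `values` array with a single integer returns a row.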


Access a column via the explicit index.

In [3]:
d1['Population']

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Name: Population, dtype: int64

Access multiple columns via the explicit index.

In [4]:
d1[['Population', 'Area']]

            Population    Area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312


Add a column.

In [5]:
Density = d1['Population'] / d1['Area']
d1['Density'] = Density
print(d1)

            Population    Area     Density
California    38332521  423967   90.413926
Texas         26448193  695662   38.018740
New York      19651127  141297  139.076746
Florida       19552860  170312  114.806121
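
Assigning to `d1['Density']` modifies `d1` in place. As a possible alternative, the sketch below uses `DataFrame.assign`, which returns a new DataFrame and leaves the original untouched. The name `d2` is introduced here only for illustration.

```python
import pandas as pd

# Same data as above, rebuilt so the snippet runs on its own.
d1 = pd.DataFrame(
    {'Population': [38332521, 26448193, 19651127, 19552860],
     'Area': [423967, 695662, 141297, 170312]},
    index=['California', 'Texas', 'New York', 'Florida'])

# assign() returns a copy with the extra column; d1 itself is unchanged.
d2 = d1.assign(Density=d1['Population'] / d1['Area'])

print(list(d1.columns))  # original columns only
print(list(d2.columns))  # original columns plus 'Density'
```

Which form is preferable depends on whether the original frame should stay unchanged.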


Apply boolean indexing.

In [6]:
d1['Density'] > 100

California    False
Texas         False
New York       True
Florida        True
Name: Density, dtype: bool

In [7]:
d1[d1['Density'] > 100]

          Population    Area     Density
New York    19651127  141297  139.076746
Florida     19552860  170312  114.806121
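
Boolean masks can also be combined before they are used to select rows. The sketch below assumes an extra, purely illustrative threshold on `Area` (150000) and combines masks with `&` (and) and `~` (not). The names `dense`, `large`, `dense_and_large`, and `not_dense` are not part of the original notebook.

```python
import pandas as pd

# Same data as above, rebuilt so the snippet runs on its own.
d1 = pd.DataFrame(
    {'Population': [38332521, 26448193, 19651127, 19552860],
     'Area': [423967, 695662, 141297, 170312]},
    index=['California', 'Texas', 'New York', 'Florida'])
d1['Density'] = d1['Population'] / d1['Area']

dense = d1['Density'] > 100   # True for New York and Florida
large = d1['Area'] > 150000   # illustrative threshold, not from the notebook

# Element-wise "and": rows where both masks are True.
dense_and_large = d1[dense & large]

# Element-wise "not": rows where the mask is False.
not_dense = d1[~dense]

print(dense_and_large)
print(not_dense)
```

The operators `&`, `|`, and `~` are used instead of the Python keywords `and`, `or`, and `not`, because the keywords do not operate element-wise on a Series.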


# DataFrame Indexing

Access a single value via the explicit index.

In [8]:
d1['Area']['Texas']

695662

Access multiple values via the explicit index.

In [9]:
d1['Area'][['Texas', 'Florida']]

Texas      695662
Florida    170312
Name: Area, dtype: int64

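Access a single value via the implicit (positional) index. Recent pandas releases deprecate this positional use of `[]` on a label-indexed Series and emit a `FutureWarning`; the `iloc` attribute described below is the unambiguous way to select by position.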

In [10]:
d1['Area'][1]

695662

# DataFrame Indexers

The indexing and slicing patterns shown above can be confusing, because plain `[]` sometimes refers to labels and sometimes to positions. To avoid this ambiguity, Pandas provides special indexer attributes that explicitly expose a particular indexing scheme.

The `loc` attribute indexes and slices a DataFrame as if it were a simple NumPy array, but always using the explicit index (the row and column labels). The index and column labels are maintained in the result, and label slices include both endpoints.
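
The ambiguity is easiest to see on a Series whose explicit index is itself made of integers. The sketch below uses a small made-up Series `s` (not part of the notebook's data) together with the `loc` and `iloc` attributes introduced next.

```python
import pandas as pd

# Hypothetical Series whose labels are integers that differ from positions.
s = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

plain = s[1]                    # plain [] with an integer key uses the label here
by_label = s.loc[1]             # explicit index: the entry labelled 1
by_position = s.iloc[1]         # implicit index: the entry at position 1

label_slice = s.loc[1:3]        # labels 1 through 3, end label included
position_slice = s.iloc[1:3]    # positions 1 and 2, end position excluded

print(plain, by_label, by_position)
print(list(label_slice), list(position_slice))
```

With `loc` and `iloc` the intent is spelled out in the code, so the reader does not need to know the index type to tell labels from positions.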

In [11]:
d1.loc['Texas', 'Density']

38.01874042279153

In [12]:
d1.loc['New York':'Florida', 'Area':'Density']

            Area     Density
New York  141297  139.076746
Florida   170312  114.806121
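
`loc` also accepts a boolean mask for the rows and a list of labels for the columns, which combines row filtering and column selection in one step. A minimal sketch, with `result` as an illustrative name:

```python
import pandas as pd

# Same data as above, rebuilt so the snippet runs on its own.
d1 = pd.DataFrame(
    {'Population': [38332521, 26448193, 19651127, 19552860],
     'Area': [423967, 695662, 141297, 170312]},
    index=['California', 'Texas', 'New York', 'Florida'])
d1['Density'] = d1['Population'] / d1['Area']

# Rows: states with density above 100. Columns: an explicit list of labels.
result = d1.loc[d1['Density'] > 100, ['Population', 'Density']]

print(result)
```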


The `iloc` attribute indexes and slices a DataFrame as if it were a simple NumPy array, using the implicit Python-style index (integer positions starting at 0, with slices excluding the end position). The index and column labels are maintained in the result.

In [13]:
d1.iloc[1, 2]

38.01874042279153

In [14]:
d1.iloc[2:, 1:]

            Area     Density
New York  141297  139.076746
Florida   170312  114.806121
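
One practical difference between the two indexers is how slices treat their end point. The sketch below contrasts a label slice with a position slice on the same data. The names `by_label`, `by_position`, and `last_population` are illustrative.

```python
import pandas as pd

# Same data as above, rebuilt so the snippet runs on its own.
d1 = pd.DataFrame(
    {'Population': [38332521, 26448193, 19651127, 19552860],
     'Area': [423967, 695662, 141297, 170312]},
    index=['California', 'Texas', 'New York', 'Florida'])

# loc: label slice, the end label 'New York' is included (3 rows).
by_label = d1.loc['California':'New York', 'Area']

# iloc: position slice, the end position 2 is excluded (2 rows).
by_position = d1.iloc[0:2, 1]

# Negative positions count from the end, as in NumPy.
last_population = d1.iloc[-1, 0]

print(by_label)
print(by_position)
print(last_population)
```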
