# Chapter 02: Working with Pandas ``DataFrame`` Objects

## Setup

In this section, I import the packages I need.

In [1]:
# Necessary imports
import pandas as pd
import numpy as np

## Creating a ``DataFrame`` Object from a CSV File

### Reading a CSV File

Using ``pd.read_csv()``, I can read a CSV file into a ``DataFrame`` object ([documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)). Pandas usually infers sensible options from the input data, so I often won't need to add arguments to the call; however, many options are available should I need them, including the following:

| Parameter | Purpose |
| --- | --- |
| `sep` | Specifies the delimiter |
| `header` | Row number where the column names are located; the default option has `pandas` infer whether they are present |
| `names` | List of column names to use as the header |
| `index_col` | Column to use as the index |
| `usecols` | Specifies which columns to read in |
| `dtype` | Specifies data types for the columns | 
| `converters` | Specifies functions for converting data in certain columns |
| `skiprows` | Rows to skip |
| `nrows` | Number of rows to read at a time (combine with `skiprows` to read a file bit by bit) |
| `parse_dates` | Automatically parse columns containing dates into datetime objects |
| `chunksize` | For reading the file in chunks |
| `compression` | For reading in compressed files without extracting beforehand |
| `encoding` | Specifies the file encoding |
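
To see a few of these parameters working together, here is a minimal sketch. It assumes a small, made-up semicolon-delimited CSV held in memory (``raw_csv`` and ``sample_df`` are hypothetical names, and the rows are not taken from the earthquake file).

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real file on disk
raw_csv = io.StringIO(
    "id;mag;place;when\n"
    "a1;1.35;Aguanga;2018-10-13\n"
    "a2;3.42;Avenal;2018-10-12\n"
    "a3;0.44;Julian;2018-10-11\n"
)

sample_df = pd.read_csv(
    raw_csv,
    sep=";",                        # the sample uses semicolons as the delimiter
    index_col="id",                 # use the id column as the index
    usecols=["id", "mag", "when"],  # skip the place column entirely
    dtype={"mag": "float64"},       # set the data type of the mag column
    parse_dates=["when"],           # parse the when column into datetimes
    nrows=2,                        # read only the first two data rows
)

print(sample_df)
print(sample_df.dtypes)
```

Only the columns listed in ``usecols`` are loaded, and ``nrows`` stops the read after two data rows, which is handy for peeking at a large file.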

Notably, I could pull the data from:

* The CSV file stored in the data directory (``data/earthquakes.csv``)
* The GitHub repository for this book (https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition/blob/master/ch_02/data/earthquakes.csv?raw=True)

I will use the pre-downloaded CSV file to keep things simple.

In [2]:
csv_file_path = 'data/earthquakes.csv'
"""
str: The path to the earthquake data from September 18, 2018 - October 13, 2018

Obtained from the US Geological Survey (USGS) using the USGS API
"""

df_csv = pd.read_csv(csv_file_path)
"""
pandas.DataFrame: The DataFrame I will examine throughout this notebook.

It is built from the CSV file at ``csv_file_path``.
"""

df_csv

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,sources,status,time,title,tsunami,type,types,tz,updated,url
0,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,",ci,",automatic,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...
1,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.020030,,79.0,",ci37389202,",1.29,ml,...,",ci,",automatic,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...
2,,4.4,37389194,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.021370,28.0,21.0,",ci37389194,",3.42,ml,...,",ci,",automatic,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0,earthquake,",dyfi,focal-mechanism,geoserve,nearby-cities,o...",-480.0,1539536756176,https://earthquake.usgs.gov/earthquakes/eventp...
3,,,37389186,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.026180,,39.0,",ci37389186,",0.44,ml,...,",ci,",automatic,1539474978070,"M 0.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475196167,https://earthquake.usgs.gov/earthquakes/eventp...
4,,,73096941,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.077990,,192.0,",nc73096941,",2.16,md,...,",nc,",automatic,1539474716050,"M 2.2 - 10km NW of Avenal, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1539477547926,https://earthquake.usgs.gov/earthquakes/eventp...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9327,,,73086771,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.018060,,185.0,",nc73086771,",0.62,md,...,",nc,",reviewed,1537230228060,"M 0.6 - 9km ENE of Mammoth Lakes, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1537285598315,https://earthquake.usgs.gov/earthquakes/eventp...
9328,,,38063967,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.030410,,50.0,",ci38063967,",1.00,ml,...,",ci,",reviewed,1537230135130,"M 1.0 - 3km W of Julian, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1537276800970,https://earthquake.usgs.gov/earthquakes/eventp...
9329,,,2018261000,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.452600,,276.0,",pr2018261000,",2.40,md,...,",pr,",reviewed,1537229908180,"M 2.4 - 35km NNE of Hatillo, Puerto Rico",0,earthquake,",geoserve,origin,phase-data,",-240.0,1537243777410,https://earthquake.usgs.gov/earthquakes/eventp...
9330,,,38063959,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.018650,,61.0,",ci38063959,",1.10,ml,...,",ci,",reviewed,1537229545350,"M 1.1 - 9km NE of Aguanga, CA",0,earthquake,",focal-mechanism,geoserve,nearby-cities,origin...",-480.0,1537230211640,https://earthquake.usgs.gov/earthquakes/eventp...


## Inspecting a ``DataFrame`` Object

Now I can inspect several properties of the ``df_csv`` object.

### Whether the ``DataFrame`` Object Is Empty or Not

The ``empty`` property returns ``True`` if the ``DataFrame`` object has no elements (that is, if either of its axes has length 0). Otherwise, it returns ``False``.

In [3]:
# Checks to see if DataFrame object is empty or not
df_csv.empty

False

### The Dimensions of the ``DataFrame`` Object

The ``shape`` property returns the number of rows and columns in the ``DataFrame`` object as a ``(rows, columns)`` tuple.

In [4]:
# Get the shape of the DataFrame object
df_csv.shape

(9332, 26)

### The Columns Inside the ``DataFrame`` Object

The ``columns`` property returns an ``Index`` object holding the label of every column in the ``DataFrame`` object. As the loop below shows, it is iterable.

In [5]:
df_csv.columns

# Test to see if this returns an iterable item
test = df_csv.columns
for col in test:
    print(col)

# See the original return of df_csv.columns
test

alert
cdi
code
detail
dmin
felt
gap
ids
mag
magType
mmi
net
nst
place
rms
sig
sources
status
time
title
tsunami
type
types
tz
updated
url


Index(['alert', 'cdi', 'code', 'detail', 'dmin', 'felt', 'gap', 'ids', 'mag',
       'magType', 'mmi', 'net', 'nst', 'place', 'rms', 'sig', 'sources',
       'status', 'time', 'title', 'tsunami', 'type', 'types', 'tz', 'updated',
       'url'],
      dtype='object')

### The Head and Tail of the ``DataFrame`` Object

The head of the ``DataFrame`` object can be retrieved by using the ``head()`` method. If no integer argument is passed, then it automatically gets the first 5 rows of data.

In [6]:
# Get the head of the DataFrame object
df_csv.head()

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,sources,status,time,title,tsunami,type,types,tz,updated,url
0,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,",ci,",automatic,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...
1,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02003,,79.0,",ci37389202,",1.29,ml,...,",ci,",automatic,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...
2,,4.4,37389194,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02137,28.0,21.0,",ci37389194,",3.42,ml,...,",ci,",automatic,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0,earthquake,",dyfi,focal-mechanism,geoserve,nearby-cities,o...",-480.0,1539536756176,https://earthquake.usgs.gov/earthquakes/eventp...
3,,,37389186,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02618,,39.0,",ci37389186,",0.44,ml,...,",ci,",automatic,1539474978070,"M 0.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475196167,https://earthquake.usgs.gov/earthquakes/eventp...
4,,,73096941,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.07799,,192.0,",nc73096941,",2.16,md,...,",nc,",automatic,1539474716050,"M 2.2 - 10km NW of Avenal, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1539477547926,https://earthquake.usgs.gov/earthquakes/eventp...


The tail of the ``DataFrame`` object can be retrieved using the ``tail()`` method, which works like ``head()`` but returns rows from the end. Here I pass ``2`` to get only the last two rows.

In [7]:
# Get the tail of the DataFrame object
df_csv.tail(2)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,sources,status,time,title,tsunami,type,types,tz,updated,url
9330,,,38063959,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.01865,,61.0,",ci38063959,",1.1,ml,...,",ci,",reviewed,1537229545350,"M 1.1 - 9km NE of Aguanga, CA",0,earthquake,",focal-mechanism,geoserve,nearby-cities,origin...",-480.0,1537230211640,https://earthquake.usgs.gov/earthquakes/eventp...
9331,,,38063935,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.01698,,39.0,",ci38063935,",0.66,ml,...,",ci,",reviewed,1537228864470,"M 0.7 - 9km NE of Aguanga, CA",0,earthquake,",focal-mechanism,geoserve,nearby-cities,origin...",-480.0,1537305830770,https://earthquake.usgs.gov/earthquakes/eventp...


### The Data Types Within the ``DataFrame`` Object

The ``dtypes`` property returns the type of data within each column of the ``DataFrame`` object.

In [8]:
# Get the data types in each column of the DataFrame object
df_csv.dtypes

alert       object
cdi        float64
code        object
detail      object
dmin       float64
felt       float64
gap        float64
ids         object
mag        float64
magType     object
mmi        float64
net         object
nst        float64
place       object
rms        float64
sig          int64
sources     object
status      object
time         int64
title       object
tsunami      int64
type        object
types       object
tz         float64
updated      int64
url         object
dtype: object

### Getting Extra Information and Location of ``null`` Values in the ``DataFrame`` Object

The ``info()`` method prints a concise summary of the ``DataFrame`` object. It covers much of the information returned by the properties above (index range, column names, and data types) and also reports the number of non-null entries in each column, which makes columns with missing values easy to spot.

In [9]:
# Get a lot of information about the DataFrame object
df_csv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9332 entries, 0 to 9331
Data columns (total 26 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   alert    59 non-null     object 
 1   cdi      329 non-null    float64
 2   code     9332 non-null   object 
 3   detail   9332 non-null   object 
 4   dmin     6139 non-null   float64
 5   felt     329 non-null    float64
 6   gap      6164 non-null   float64
 7   ids      9332 non-null   object 
 8   mag      9331 non-null   float64
 9   magType  9331 non-null   object 
 10  mmi      93 non-null     float64
 11  net      9332 non-null   object 
 12  nst      5364 non-null   float64
 13  place    9332 non-null   object 
 14  rms      9332 non-null   float64
 15  sig      9332 non-null   int64  
 16  sources  9332 non-null   object 
 17  status   9332 non-null   object 
 18  time     9332 non-null   int64  
 19  title    9332 non-null   object 
 20  tsunami  9332 non-null   int64  
 21  type     9332 
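
Since ``info()`` reports non-null counts, I sometimes want the null counts directly. The sketch below is one way to get them, assuming a small made-up frame (``toy_df`` is a hypothetical stand-in for ``df_csv``).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values
toy_df = pd.DataFrame({
    "alert": ["green", None, None, "red"],
    "mag": [1.35, np.nan, 3.42, 0.44],
    "code": ["a", "b", "c", "d"],
})

# notna()/isna() return Boolean frames; summing them counts the True values per column
non_null_counts = toy_df.notna().sum()
null_counts = toy_df.isna().sum()

print(non_null_counts)
print(null_counts)
```

For every column, the two counts add up to the number of rows in the frame.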

## Describing & Summarizing ``DataFrame`` Objects

The ``describe()`` method provides a quick summary of the numeric (integer and float) columns in a ``DataFrame`` object. Called without any arguments, it reports the following statistics:

* Count
* Mean
* Standard deviation
* Minimum
* 25<sup>th</sup> percentile
* 50<sup>th</sup> percentile (the median)
* 75<sup>th</sup> percentile
* Maximum

In [10]:
# Base describe() method
df_csv.describe()

Unnamed: 0,cdi,dmin,felt,gap,mag,mmi,nst,rms,sig,time,tsunami,tz,updated
count,329.0,6139.0,329.0,6164.0,9331.0,93.0,5364.0,9332.0,9332.0,9332.0,9332.0,9331.0,9332.0
mean,2.754711,0.544925,12.31003,121.506588,1.497345,3.651398,19.053878,0.362122,56.899914,1538284000000.0,0.006537,-451.99014,1538537000000.0
std,1.010637,2.214305,48.954944,72.962363,1.203347,1.790523,15.492315,0.317784,91.872163,608030600.0,0.080589,231.752571,656413500.0
min,0.0,0.000648,0.0,12.0,-1.26,0.0,0.0,0.0,0.0,1537229000000.0,0.0,-720.0,1537230000000.0
25%,2.0,0.020425,1.0,66.1425,0.72,2.68,8.0,0.119675,8.0,1537793000000.0,0.0,-540.0,1537996000000.0
50%,2.7,0.05905,2.0,105.0,1.3,3.72,15.0,0.21,26.0,1538245000000.0,0.0,-480.0,1538621000000.0
75%,3.3,0.17725,5.0,159.0,1.9,4.57,25.0,0.59,56.0,1538766000000.0,0.0,-480.0,1539110000000.0
max,8.4,53.737,580.0,355.91,7.5,9.12,172.0,1.91,2015.0,1539475000000.0,1.0,720.0,1539537000000.0


I can also specify which percentiles to report by using the ``percentiles`` argument (for example, ``percentiles=[a, b]``), where $0 \leq$ ``a``, ``b`` $\leq 1$. The list can hold any number of percentiles.

In [11]:
floor = 0.05
"""
float: The lowest percentile to look at
"""
ceiling = 0.95
"""
float: The highest percentile to look at
"""
percentile_range = [floor, ceiling]
"""
list(float, float): The floor and ceiling to use for our example percentile range
"""

# Describe the DataFrame object within a given range
df_csv.describe(percentiles=percentile_range)

Unnamed: 0,cdi,dmin,felt,gap,mag,mmi,nst,rms,sig,time,tsunami,tz,updated
count,329.0,6139.0,329.0,6164.0,9331.0,93.0,5364.0,9332.0,9332.0,9332.0,9332.0,9331.0,9332.0
mean,2.754711,0.544925,12.31003,121.506588,1.497345,3.651398,19.053878,0.362122,56.899914,1538284000000.0,0.006537,-451.99014,1538537000000.0
std,1.010637,2.214305,48.954944,72.962363,1.203347,1.790523,15.492315,0.317784,91.872163,608030600.0,0.080589,231.752571,656413500.0
min,0.0,0.000648,0.0,12.0,-1.26,0.0,0.0,0.0,0.0,1537229000000.0,0.0,-720.0,1537230000000.0
5%,2.0,0.005491,1.0,35.0,-0.04,0.0,4.0,0.03,0.0,1537344000000.0,0.0,-600.0,1537387000000.0
50%,2.7,0.05905,2.0,105.0,1.3,3.72,15.0,0.21,26.0,1538245000000.0,0.0,-480.0,1538621000000.0
95%,4.3,2.6789,40.2,276.0,4.4,6.38,49.0,0.96,298.0,1539319000000.0,0.0,-60.0,1539400000000.0
max,8.4,53.737,580.0,355.91,7.5,9.12,172.0,1.91,2015.0,1539475000000.0,1.0,720.0,1539537000000.0
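
To check how the ``percentiles`` argument maps onto the row labels, here is a minimal sketch on a made-up ``Series`` of the integers 1 through 101 (``values`` and ``summary`` are hypothetical names), chosen so that the percentiles are easy to verify by hand.

```python
import pandas as pd

# Hypothetical data: the integers 1..101
values = pd.Series(range(1, 102), name="values")

# Ask for the 10th and 90th percentiles instead of the defaults
summary = values.describe(percentiles=[0.1, 0.9])

print(summary)
print(summary["10%"], summary["90%"])
```

Each requested percentile shows up as a row labeled with its percentage, so individual statistics can be pulled out of the result by label.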


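The ``include`` argument specifies which data types to summarize. For ``object`` columns, the summary reports the count, the number of unique values, the most common value (``top``), and how often it occurs (``freq``).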

In [12]:
data_type = object
"""
object: The data type to test the include argument with
"""
df_csv.describe(include = data_type)

Unnamed: 0,alert,code,detail,ids,magType,net,place,sources,status,title,type,types,url
count,59,9332,9332,9332,9331,9332,9332,9332,9332,9332,9332,9332,9332
unique,2,9332,9332,9332,10,14,5433,52,2,7807,5,42,9332
top,green,73089491,https://earthquake.usgs.gov/fdsnws/event/1/que...,",nn00659707,",ml,ak,"10km NE of Aguanga, CA",",ak,",reviewed,"M 0.4 - 10km NE of Aguanga, CA",earthquake,",geoserve,origin,phase-data,",https://earthquake.usgs.gov/earthquakes/eventp...
freq,58,1,1,1,6803,3166,306,2981,7797,55,9081,5301,1


Or, if I set ``include = 'all'``, then I'll get a summary of every data type in the ``DataFrame`` object.

In [13]:
data_type = "all"
"""
str: The data type to test the include argument with
"""
df_csv.describe(include = data_type)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,sources,status,time,title,tsunami,type,types,tz,updated,url
count,59,329.0,9332.0,9332,6139.0,329.0,6164.0,9332,9331.0,9331,...,9332,9332,9332.0,9332,9332.0,9332,9332,9331.0,9332.0,9332
unique,2,,9332.0,9332,,,,9332,,10,...,52,2,,7807,,5,42,,,9332
top,green,,73089491.0,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",nn00659707,",,ml,...,",ak,",reviewed,,"M 0.4 - 10km NE of Aguanga, CA",,earthquake,",geoserve,origin,phase-data,",,,https://earthquake.usgs.gov/earthquakes/eventp...
freq,58,,1.0,1,,,,1,,6803,...,2981,7797,,55,,9081,5301,,,1
mean,,2.754711,,,0.544925,12.31003,121.506588,,1.497345,,...,,,1538284000000.0,,0.006537,,,-451.99014,1538537000000.0,
std,,1.010637,,,2.214305,48.954944,72.962363,,1.203347,,...,,,608030600.0,,0.080589,,,231.752571,656413500.0,
min,,0.0,,,0.000648,0.0,12.0,,-1.26,,...,,,1537229000000.0,,0.0,,,-720.0,1537230000000.0,
25%,,2.0,,,0.020425,1.0,66.1425,,0.72,,...,,,1537793000000.0,,0.0,,,-540.0,1537996000000.0,
50%,,2.7,,,0.05905,2.0,105.0,,1.3,,...,,,1538245000000.0,,0.0,,,-480.0,1538621000000.0,
75%,,3.3,,,0.17725,5.0,159.0,,1.9,,...,,,1538766000000.0,,0.0,,,-480.0,1539110000000.0,


The ``describe()`` method also works on a single column (a ``Series`` object).

In [14]:
df_csv.felt.describe()

count    329.000000
mean      12.310030
std       48.954944
min        0.000000
25%        1.000000
50%        2.000000
75%        5.000000
max      580.000000
Name: felt, dtype: float64

There are methods for specific statistics as well. Here is a sampling of them:

| Method | Description | Data types |
| --- | --- | --- |
| `count()` | The number of non-null observations | Any |
| `nunique()` | The number of unique values | Any |
| `sum()` | The total of the values | Numerical or Boolean |
| `mean()` | The average of the values | Numerical or Boolean |
| `median()` | The median of the values | Numerical |
| `min()` | The minimum of the values | Numerical |
| `idxmin()` | The index where the minimum value occurs | Numerical |
| `max()` | The maximum of the values | Numerical |
| `idxmax()` | The index where the maximum value occurs | Numerical |
| `abs()` | The absolute values of the data | Numerical |
| `std()` | The standard deviation | Numerical |
| `var()` | The variance |  Numerical |
| `cov()` | The covariance between two `Series`, or a covariance matrix for all column combinations in a `DataFrame` | Numerical |
| `corr()` | The correlation between two `Series`, or a correlation matrix for all column combinations in a `DataFrame` | Numerical |
| `quantile()` | Calculates a specific quantile | Numerical |
| `cumsum()` | The cumulative sum | Numerical or Boolean |
| `cummin()` | The cumulative minimum | Numerical |
| `cummax()` | The cumulative maximum | Numerical |

Note that `Index` objects also have several methods to help describe and summarize our data:

| Method | Description |
| --- | --- |
| `argmax()`/`argmin()` | Find the location of the maximum/minimum value in the index |
| `equals()` | Compare the index to another `Index` object for equality |
| `isin()` | Check if the index values are in a list of values and return an array of Booleans |
| `max()`/`min()` | Find the maximum/minimum value in the index |
| `nunique()` | Get the number of unique values in the index |
| `to_series()` | Create a `Series` object from the index |
| `unique()` | Find the unique values of the index |
| `value_counts()`| Create a frequency table for the unique values in the index |
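
As a quick illustration, the sketch below applies several of these methods to two made-up ``Index`` objects (``labels`` and ``numbers`` are hypothetical names).

```python
import pandas as pd

# Hypothetical indexes: one of labels (with a duplicate), one of integers
labels = pd.Index(["mag", "magType", "time", "title", "time"])
numbers = pd.Index([10, 30, 20])

n_unique = labels.nunique()                 # number of distinct labels
membership = labels.isin(["time", "tz"])    # Boolean array, one entry per label
frequencies = labels.value_counts()         # frequency table of the labels
same = labels.equals(pd.Index(["mag", "magType", "time", "title", "time"]))
position_of_max = numbers.argmax()          # position (not value) of the maximum
as_series = numbers.to_series()             # Series whose index and values are both the index

print(n_unique, list(membership), same, position_of_max)
print(frequencies)
```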

## Subsetting Data in a ``DataFrame`` Object

### Selecting Columns from the ``DataFrame`` Object Using Attributes

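I can use attribute notation to get all the elements in a specific column. This only works when the column name is a valid Python identifier that does not clash with an existing ``DataFrame`` attribute or method.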

In [15]:
# Retrieve a column using attributes
df_csv.time

0       1539475168010
1       1539475129610
2       1539475062610
3       1539474978070
4       1539474716050
            ...      
9327    1537230228060
9328    1537230135130
9329    1537229908180
9330    1537229545350
9331    1537228864470
Name: time, Length: 9332, dtype: int64

### Selecting Columns from the ``DataFrame`` Object Using Dictionary Syntax

I can also use dictionary syntax (square brackets around the column name) to retrieve a specific column from a ``DataFrame`` object.

In [16]:
# Retrieve a column using dictionary syntax
first_column_to_get = 'time'
"""
str: A unique column header from my DataFrame instance.

Used to test getting one or more columns from a pandas DataFrame.
"""
df_csv[first_column_to_get]

0       1539475168010
1       1539475129610
2       1539475062610
3       1539474978070
4       1539474716050
            ...      
9327    1537230228060
9328    1537230135130
9329    1537229908180
9330    1537229545350
9331    1537228864470
Name: time, Length: 9332, dtype: int64

In fact, I can use dictionary syntax to get more than one column at a time.

In [17]:
# Retrieving more than one column using dictionaries
second_column_to_get = 'title'
"""
str: A unique column header from my DataFrame instance.

Used to test getting one or more columns from a pandas DataFrame.
"""
columns_dictionary = [first_column_to_get, second_column_to_get]
"""
list(str): A list containing two or more column headings from my DataFrame instance.
"""
df_csv[columns_dictionary]

Unnamed: 0,time,title
0,1539475168010,"M 1.4 - 9km NE of Aguanga, CA"
1,1539475129610,"M 1.3 - 9km NE of Aguanga, CA"
2,1539475062610,"M 3.4 - 8km NE of Aguanga, CA"
3,1539474978070,"M 0.4 - 9km NE of Aguanga, CA"
4,1539474716050,"M 2.2 - 10km NW of Avenal, CA"
...,...,...
9327,1537230228060,"M 0.6 - 9km ENE of Mammoth Lakes, CA"
9328,1537230135130,"M 1.0 - 3km W of Julian, CA"
9329,1537229908180,"M 2.4 - 35km NNE of Hatillo, Puerto Rico"
9330,1537229545350,"M 1.1 - 9km NE of Aguanga, CA"


Dictionary syntax becomes even more powerful when I combine it with list comprehensions and string operations.

In [18]:
# Extend the column list using a list comprehension and a string operation
columns_dictionary = columns_dictionary + [column for column in df_csv.columns if column.startswith('mag')]
columns_dictionary

['time', 'title', 'mag', 'magType']

In [19]:
df_csv[columns_dictionary]

Unnamed: 0,time,title,mag,magType
0,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",1.35,ml
1,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",1.29,ml
2,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",3.42,ml
3,1539474978070,"M 0.4 - 9km NE of Aguanga, CA",0.44,ml
4,1539474716050,"M 2.2 - 10km NW of Avenal, CA",2.16,md
...,...,...,...,...
9327,1537230228060,"M 0.6 - 9km ENE of Mammoth Lakes, CA",0.62,md
9328,1537230135130,"M 1.0 - 3km W of Julian, CA",1.00,ml
9329,1537229908180,"M 2.4 - 35km NNE of Hatillo, Puerto Rico",2.40,md
9330,1537229545350,"M 1.1 - 9km NE of Aguanga, CA",1.10,ml
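
The same pattern works with any string test. As a minimal sketch, assuming a made-up one-row frame with a handful of the same column names (``toy_df`` and ``t_columns`` are hypothetical names):

```python
import pandas as pd

# Hypothetical frame: only the column names matter here
toy_df = pd.DataFrame(
    [[1.35, "ml", 1539475168010, "M 1.4", -480.0]],
    columns=["mag", "magType", "time", "title", "tz"],
)

# Build the list of column names first, then pass the list inside the brackets
t_columns = [column for column in toy_df.columns if column.startswith("t")]
subset = toy_df[t_columns]

print(t_columns)
print(subset)
```

Passing a list inside the brackets returns a ``DataFrame`` object, even when the list holds a single column name.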


### Slicing ``DataFrame`` Objects

I can slice rows from my ``DataFrame`` instance like so:

```python
DataFrame[row_start:row_stop]
```

Be aware the ``row_start`` argument will be *inclusive* whereas the ``row_stop`` argument will be **exclusive**.

In [20]:
row_start = 101
"""
int: The row to start the slice at.

This is an inclusive value.
"""
row_stop = 104
"""
int: The row to stop the slice at.

This is an exclusive value.
"""
df_csv[row_start: row_stop]

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,sources,status,time,title,tsunami,type,types,tz,updated,url
101,,,73096756,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.01355,,185.0,",nc73096756,",0.59,md,...,",nc,",automatic,1539435391320,"M 0.6 - 8km ESE of Mammoth Lakes, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1539439802162,https://earthquake.usgs.gov/earthquakes/eventp...
102,,,37388730,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02987,,39.0,",ci37388730,",1.33,ml,...,",ci,",automatic,1539435293090,"M 1.3 - 8km ENE of Aguanga, CA",0,earthquake,",focal-mechanism,geoserve,nearby-cities,origin...",-480.0,1539435940470,https://earthquake.usgs.gov/earthquakes/eventp...
103,,,37388722,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.03667,,183.0,",ci37388722,",0.03,ml,...,",ci,",automatic,1539434854250,"M 0.0 - 5km WSW of Anza, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539435060684,https://earthquake.usgs.gov/earthquakes/eventp...


I can chain a column selection and a row slice together to pull selected data from my ``DataFrame`` object.

In [21]:
df_csv[columns_dictionary][row_start: row_stop]

Unnamed: 0,time,title,mag,magType
101,1539435391320,"M 0.6 - 8km ESE of Mammoth Lakes, CA",0.59,md
102,1539435293090,"M 1.3 - 8km ENE of Aguanga, CA",1.33,ml
103,1539434854250,"M 0.0 - 5km WSW of Anza, CA",0.03,ml


### Indexing a ``DataFrame`` Object

#### Indexing a ``DataFrame`` Object Using the ``loc`` Attribute

I can use the ``loc`` attribute to pull out specific ranges of my ``DataFrame``. All I need to do is set up code like this:

```python
DataFrame.loc[row_indexer, column_indexer]
```
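Note that ``loc`` selects by *label*. The ``row_indexer`` can be a single label, a list of labels, a label slice such as ``101:104``, or a Boolean mask, and the ``column_indexer`` can be a ``str`` or a ``[str, str, ...]`` list of column names. Because ``df_csv`` uses the default ``RangeIndex``, its row labels happen to be integers. Also, unlike ordinary Python slices, a ``loc`` slice is *inclusive* of both endpoints.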

<div class='alert alert-info' role='alert'>
    By using a <code>:</code> for the <code>row_indexer</code>, I can retrieve all of the elements from a given column/columns.
</div>

In [22]:
# Indexing with loc
df_csv.loc[row_start:row_stop, columns_dictionary]

Unnamed: 0,time,title,mag,magType
101,1539435391320,"M 0.6 - 8km ESE of Mammoth Lakes, CA",0.59,md
102,1539435293090,"M 1.3 - 8km ENE of Aguanga, CA",1.33,ml
103,1539434854250,"M 0.0 - 5km WSW of Anza, CA",0.03,ml
104,1539434531500,"M 1.6 - 10km S of Progreso, B.C., MX",1.64,ml


#### Indexing a ``DataFrame`` Object Using the ``iloc`` Attribute

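I can use the ``iloc`` attribute to pull out specific ranges of my ``DataFrame`` in much the same way as ``loc``. The differences are that ``iloc`` selects by integer *position* rather than by label, so the ``row_indexer`` and ``column_indexer`` are typically integers, lists of integers, or integer slices, and the end of a slice is **exclusive**. That is why the cell below uses ``row_stop + 1`` to return the same rows as the ``loc`` example.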

In [23]:
# Indexing with iloc
first_column_to_get_index = 18
second_column_to_get_index = 19
columns_to_get_indices = [first_column_to_get_index, second_column_to_get_index]
df_csv.iloc[row_start: row_stop + 1, columns_to_get_indices]

Unnamed: 0,time,title
101,1539435391320,"M 0.6 - 8km ESE of Mammoth Lakes, CA"
102,1539435293090,"M 1.3 - 8km ENE of Aguanga, CA"
103,1539434854250,"M 0.0 - 5km WSW of Anza, CA"
104,1539434531500,"M 1.6 - 10km S of Progreso, B.C., MX"
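
The difference between label-based and position-based selection is easier to see when the index is not the default one. Here is a minimal sketch, assuming a made-up frame with string row labels (``quakes`` is a hypothetical name):

```python
import pandas as pd

# Hypothetical frame whose row labels are strings rather than 0, 1, 2, ...
quakes = pd.DataFrame(
    {"mag": [1.35, 1.29, 3.42, 0.44], "magType": ["ml", "ml", "ml", "md"]},
    index=["a", "b", "c", "d"],
)

# loc slices by label and includes both endpoints: rows "b" and "c"
by_label = quakes.loc["b":"c", ["mag"]]

# iloc slices by position and excludes the stop: positions 1 and 2
by_position = quakes.iloc[1:3, [0]]

print(by_label)
print(by_position)
```

Both selections return the same two rows, but the ``loc`` slice names its last row explicitly while the ``iloc`` slice stops one position past it.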


### Looking Up Scalar Values in a ``DataFrame`` Object

#### Looking Up Scalar Values in a ``DataFrame`` Object Using ``at``

By using ``at``, I can get the element at a specific row label and column label.

In [24]:
# Scalars with at
test_row = 10
"""
int: A random row to test at and iat with.
"""
test_column_title = 'mag'
"""
str: A random column heading to test at with.
"""
df_csv.at[test_row, test_column_title]

0.5

#### Looking Up Scalar Values in a ``DataFrame`` Object Using ``iat``

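``iat`` is very similar to ``at``, except that both the row and column arguments are integer positions rather than labels. The row argument is the same in both cells here only because ``df_csv`` uses the default ``RangeIndex``, whose labels match the positions.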

In [25]:
# Scalars with iat
test_column_index = 8
"""
int: A random column index to test iat with.
"""
df_csv.iat[test_row, test_column_index]

0.5
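
To make the label versus position distinction concrete, here is a small sketch on a made-up frame with string row labels (``quakes`` is a hypothetical name):

```python
import pandas as pd

# Hypothetical frame with non-default row labels
quakes = pd.DataFrame(
    {"mag": [1.35, 3.42, 0.44], "title": ["M 1.4", "M 3.4", "M 0.4"]},
    index=["a", "b", "c"],
)

value_by_label = quakes.at["b", "mag"]   # row label "b", column label "mag"
value_by_position = quakes.iat[1, 0]     # second row, first column

print(value_by_label, value_by_position)
```

Both lookups return the same scalar; ``at`` needs the labels, whereas ``iat`` needs the positions.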

### Filtering a ``DataFrame`` Object

#### Using Boolean Masks to Filter a ``DataFrame`` Object

A **Boolean mask** is an array-like structure of ``True``/``False`` values, one per row, produced by testing each element against a condition. Selecting with the mask keeps only the rows where it is ``True``.

In [26]:
# Boolean mask on a column
bool_mag_value = 7.0
"""
float: The magnitude of the earthquake I want to filter for.
"""
df_csv.mag > bool_mag_value

0       False
1       False
2       False
3       False
4       False
        ...  
9327    False
9328    False
9329    False
9330    False
9331    False
Name: mag, Length: 9332, dtype: bool

By placing a Boolean mask within brackets, I can select all rows of the ``DataFrame`` instance that meet that criterion.

In [27]:
# Boolean mask on a selection
df_csv[df_csv.mag >= bool_mag_value]

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,sources,status,time,title,tsunami,type,types,tz,updated,url
837,green,4.1,1000haa3,https://earthquake.usgs.gov/fdsnws/event/1/que...,1.763,3.0,14.0,",us1000haa3,pt18283003,at00pgehsk,",7.0,mww,...,",us,pt,at,",reviewed,1539204500290,"M 7.0 - 117km E of Kimbe, Papua New Guinea",1,earthquake,",dyfi,finite-fault,general-text,geoserve,groun...",600.0,1539378744253,https://earthquake.usgs.gov/earthquakes/eventp...
5263,red,8.4,1000h3p4,https://earthquake.usgs.gov/fdsnws/event/1/que...,1.589,18.0,27.0,",us1000h3p4,us1000h4p4,",7.5,mww,...,",us,us,",reviewed,1538128963480,"M 7.5 - 78km N of Palu, Indonesia",1,earthquake,",dyfi,finite-fault,general-text,geoserve,groun...",480.0,1539123134531,https://earthquake.usgs.gov/earthquakes/eventp...


Boolean masks also work with ``loc``, which lets me select columns at the same time.

In [28]:
# Boolean mask + loc
df_csv.loc[
    df_csv.mag >= bool_mag_value,
    columns_dictionary
]

Unnamed: 0,time,title,mag,magType
837,1539204500290,"M 7.0 - 117km E of Kimbe, Papua New Guinea",7.0,mww
5263,1538128963480,"M 7.5 - 78km N of Palu, Indonesia",7.5,mww


Combining two or more Boolean masks, each wrapped in parentheses, with the operators ``&`` (AND) and ``|`` (OR) lets me filter ``DataFrame`` objects on more than one criterion.

In [29]:
bool_alert_value = 'red'
"""
str: The alert level I want to filter for.

In this dataset, the value is either ``red`` or ``green``.
"""
## Using & with Boolean masks.
df_csv.loc[
    (df_csv.tsunami >= 1) & (df_csv.alert == bool_alert_value),
    columns_dictionary
]

Unnamed: 0,time,title,mag,magType
5263,1538128963480,"M 7.5 - 78km N of Palu, Indonesia",7.5,mww


In [30]:
## Using | with Boolean masks.
df_csv.loc[
    (df_csv.tsunami >= 1) | (df_csv.alert == bool_alert_value),
    columns_dictionary
]

Unnamed: 0,time,title,mag,magType
36,1539459504090,"M 5.0 - 165km NNW of Flying Fish Cove, Christm...",5.0,mww
118,1539429023560,"M 6.7 - 262km NW of Ozernovskiy, Russia",6.7,mww
501,1539312723620,"M 5.6 - 128km SE of Kimbe, Papua New Guinea",5.6,mww
799,1539213362130,"M 6.5 - 148km S of Severo-Kuril'sk, Russia",6.5,mww
816,1539208835130,"M 6.2 - 94km SW of Kokopo, Papua New Guinea",6.2,mww
...,...,...,...,...
8561,1537427126700,"M 5.4 - 228km S of Taron, Papua New Guinea",5.4,mb
8624,1537411002190,"M 5.1 - 278km SE of Pondaguitan, Philippines",5.1,mb
9133,1537274456960,"M 5.1 - 64km SSW of Kaktovik, Alaska",5.1,ml
9175,1537262729590,"M 5.2 - 126km N of Dili, East Timor",5.2,mb
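
The parentheses matter because ``&`` and ``|`` bind more tightly than comparison operators such as ``==`` and ``>=``. The sketch below, which assumes a made-up four-row frame (``quakes`` is a hypothetical name), also shows ``~`` (NOT) for inverting a mask.

```python
import pandas as pd

# Hypothetical toy data; missing alerts are stored as None
quakes = pd.DataFrame({
    "mag": [7.5, 5.0, 6.7, 1.3],
    "tsunami": [1, 1, 0, 0],
    "alert": ["red", "green", None, None],
})

has_tsunami = quakes.tsunami == 1
is_red = quakes.alert == "red"

both = quakes[has_tsunami & is_red]         # rows meeting both conditions
either = quakes[has_tsunami | is_red]       # rows meeting at least one condition
neither = quakes[~(has_tsunami | is_red)]   # rows meeting neither condition

print(both)
print(either)
print(neither)
```

Storing each mask in its own variable is one way to sidestep the parentheses and keep long filters readable.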


#### Using the ``notnull()`` Method to Filter a ``DataFrame`` Object

To keep only the rows where a column is not null, I can use the ``notnull()`` method, which returns a Boolean mask that is ``True`` wherever a value is present.

In [31]:
location_to_check = 'Alaska'
"""
str: A location whose earthquakes I want to look up.
"""
# Filtering for all earthquakes occurring in Alaska.
# Use the notnull() method to return instances of earthquakes that triggered alerts.
columns_dictionary = [
    'alert',
    'mag',
    'magType',
    'title',
    'tsunami',
    'type'
]
df_csv.loc[(df_csv.place.str.contains(location_to_check)) & (df_csv.alert.notnull()), columns_dictionary]

Unnamed: 0,alert,mag,magType,title,tsunami,type
1015,green,5.0,ml,"M 5.0 - 61km SSW of Chignik Lake, Alaska",1,earthquake
1273,green,4.0,ml,"M 4.0 - 71km SW of Kaktovik, Alaska",1,earthquake
1795,green,4.0,ml,"M 4.0 - 60km WNW of Valdez, Alaska",1,earthquake
2752,green,4.0,ml,"M 4.0 - 67km SSW of Kaktovik, Alaska",1,earthquake
3260,green,3.9,ml,"M 3.9 - 44km N of North Nenana, Alaska",0,earthquake
4101,green,4.2,ml,"M 4.2 - 131km NNW of Arctic Village, Alaska",0,earthquake
6897,green,3.8,ml,"M 3.8 - 80km SSW of Kaktovik, Alaska",0,earthquake
8524,green,3.8,ml,"M 3.8 - 69km SSW of Kaktovik, Alaska",0,earthquake
9133,green,5.1,ml,"M 5.1 - 64km SSW of Kaktovik, Alaska",1,earthquake


#### Using Regular Expressions to Filter a ``DataFrame`` Object

I can even use regular expressions to filter a ``DataFrame`` instance, since ``str.contains()`` interprets its pattern as a regular expression by default.

In [32]:
mag_filter = 3.8
"""
float: The magnitude to use for this filter test.
"""
df_csv.loc[(df_csv.place.str.contains(r'CA|California$')) & (df_csv.mag > mag_filter), columns_dictionary]

Unnamed: 0,alert,mag,magType,title,tsunami,type
1465,green,3.83,mw,"M 3.8 - 109km WNW of Trinidad, CA",0,earthquake
2414,green,3.83,mw,"M 3.8 - 5km SW of Tres Pinos, CA",1,earthquake


#### Using the ``between()`` Method to Filter a ``DataFrame`` Object

I can also use the ``between(a, b)`` method to keep the numeric values $x$ that satisfy ``a`` $ \leq x \leq $ ``b``; both endpoints are included by default.

In [33]:
min_mag = 6.5
"""
float: The minimum magnitude to include in the filter.
"""
max_mag = 7.5
"""
float: The maximum magnitude to include in the filter.
"""
df_csv.loc[df_csv.mag.between(min_mag, max_mag), columns_dictionary]

Unnamed: 0,alert,mag,magType,title,tsunami,type
118,green,6.7,mww,"M 6.7 - 262km NW of Ozernovskiy, Russia",1,earthquake
799,green,6.5,mww,"M 6.5 - 148km S of Severo-Kuril'sk, Russia",1,earthquake
837,green,7.0,mww,"M 7.0 - 117km E of Kimbe, Papua New Guinea",1,earthquake
4363,green,6.7,mww,"M 6.7 - 263km NNE of Ndoi Island, Fiji",1,earthquake
5263,red,7.5,mww,"M 7.5 - 78km N of Palu, Indonesia",1,earthquake
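
To see how the endpoints are treated, here is a minimal sketch on a made-up ``Series`` of magnitudes (not the earthquake data). It assumes pandas 1.3 or later, where the ``inclusive`` parameter accepts the strings ``'both'``, ``'neither'``, ``'left'``, and ``'right'``.

```python
import pandas as pd

# Hypothetical magnitudes, made up for illustration only
mags = pd.Series([6.4, 6.5, 7.0, 7.5, 7.6])

# Default behavior: both endpoints are included
both = mags.between(6.5, 7.5)

# Exclude both endpoints (string values require pandas 1.3 or later)
neither = mags.between(6.5, 7.5, inclusive='neither')

print(mags[both].tolist())
print(mags[neither].tolist())
```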


#### Using the ``isin()`` Method to Filter a ``DataFrame`` Object

I can use the ``isin()`` method to keep the rows whose value in a column matches any item in a list.

In [34]:
isin_dict = ['mw', 'mwb']
"""
list(str): The items to use as a filter for a column.

I'll use this list to return rows with either a `mw` element or a `mwb` element in the `magType` column.
"""
df_csv.loc[df_csv.magType.isin(isin_dict), columns_dictionary]

Unnamed: 0,alert,mag,magType,title,tsunami,type
995,,3.35,mw,"M 3.4 - 9km WNW of Cobb, CA",0,earthquake
1465,green,3.83,mw,"M 3.8 - 109km WNW of Trinidad, CA",0,earthquake
2414,green,3.83,mw,"M 3.8 - 5km SW of Tres Pinos, CA",1,earthquake
4988,green,4.41,mw,"M 4.4 - 1km SE of Delta, B.C., MX",1,earthquake
6307,green,5.8,mwb,"M 5.8 - 297km NNE of Ndoi Island, Fiji",0,earthquake
8257,green,5.7,mwb,"M 5.7 - 175km SSE of Lambasa, Fiji",0,earthquake
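
The same mask can be inverted with ``~`` to keep everything that is not in the list. This is a small sketch on a made-up ``DataFrame``; the names ``toy``, ``wanted``, ``matches``, and ``others`` are hypothetical.

```python
import pandas as pd

# Hypothetical data, made up for illustration only
toy = pd.DataFrame({
    'mag': [3.35, 1.29, 5.8, 2.16],
    'magType': ['mw', 'ml', 'mwb', 'md'],
})
wanted = ['mw', 'mwb']

# Rows whose magType is in the list
matches = toy[toy.magType.isin(wanted)]

# Rows whose magType is NOT in the list
others = toy[~toy.magType.isin(wanted)]

print(matches)
print(others)
```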


#### Using the ``idxmin()`` and ``idxmax()`` Methods to Filter a ``DataFrame`` Object

The ``idxmin()`` and ``idxmax()`` methods return the index labels of the minimum and maximum values in a column, which I can then pass to ``loc`` to retrieve those rows.

In [35]:
mag_min_max_dict = [df_csv.mag.idxmin(), df_csv.mag.idxmax()]
"""
[int, int]: The index for the minimum and maximum values in the `mag` column.
"""
df_csv.loc[mag_min_max_dict, columns_dictionary]

Unnamed: 0,alert,mag,magType,title,tsunami,type
2409,,-1.26,ml,"M -1.3 - 41km ENE of Adak, Alaska",0,earthquake
5263,red,7.5,mww,"M 7.5 - 78km N of Palu, Indonesia",1,earthquake
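
Because these methods return labels rather than values or positions, the difference is easiest to see with a non-numeric index. The sketch below uses a made-up ``DataFrame`` with string labels; all names in it are hypothetical.

```python
import pandas as pd

# Hypothetical data with string index labels, made up for illustration only
toy = pd.DataFrame(
    {'mag': [1.35, -1.26, 7.5]},
    index=['a', 'b', 'c'],
)

lowest_label = toy.mag.idxmin()   # label of the row holding the smallest mag
highest_label = toy.mag.idxmax()  # label of the row holding the largest mag

# Pass the labels to loc to pull out the full rows
extremes = toy.loc[[lowest_label, highest_label]]
print(extremes)
```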


#### Using the ``filter()`` Method to Filter a ``DataFrame`` Object

The ``filter()`` method does not filter ``DataFrame`` objects in the same way I've done so far: it selects rows or columns by their labels rather than by the values they contain. Here, I'll show multiple ways to use the ``filter()`` method. For example:

* I can grab columns of a ``DataFrame`` instance by passing a list to ``items``:

In [36]:
items_list = ['mag', 'magType']
"""
[str, str]: The columns to get from my DataFrame instance.
"""
df_csv.filter(items_list).head()

Unnamed: 0,mag,magType
0,1.35,ml
1,1.29,ml
2,3.42,ml
3,0.44,ml
4,2.16,md


* I can grab columns of a ``DataFrame`` instance by passing a ``str`` value to the ``like`` parameter.

In [37]:
df_csv.filter(like = 'mag').head()

Unnamed: 0,mag,magType
0,1.35,ml
1,1.29,ml
2,3.42,ml
3,0.44,ml
4,2.16,md


* I can use a regular expression to filter a ``DataFrame`` instance using the ``regex`` argument.

In [38]:
df_csv.filter(regex = r'^t').head()

Unnamed: 0,time,title,tsunami,type,types,tz
0,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0
1,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0
2,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0,earthquake,",dyfi,focal-mechanism,geoserve,nearby-cities,o...",-480.0
3,1539474978070,"M 0.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0
4,1539474716050,"M 2.2 - 10km NW of Avenal, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0


* I can also use ``filter()`` along the rows by passing in ``axis = 0``.

<div class='alert alert-info' role='alert'>
    I will learn more about the <code>set_index()</code> method in the next chapter.
</div>

In [39]:
df_csv.set_index('place').filter(like = 'Japan', axis = 0).filter(items = columns_dictionary).head()

place,alert,mag,magType,title,tsunami,type
"160km NNW of Nago, Japan",,4.6,mb,"M 4.6 - 160km NNW of Nago, Japan",0,earthquake
"7km ESE of Asahi, Japan",,5.2,mww,"M 5.2 - 7km ESE of Asahi, Japan",0,earthquake
"14km E of Tomakomai, Japan",,4.5,mwr,"M 4.5 - 14km E of Tomakomai, Japan",0,earthquake
"139km WSW of Naze, Japan",,4.7,mb,"M 4.7 - 139km WSW of Naze, Japan",0,earthquake
"53km ESE of Kamaishi, Japan",,4.6,mb,"M 4.6 - 53km ESE of Kamaishi, Japan",0,earthquake
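
To put the four variants side by side, here is a compact sketch on a made-up ``DataFrame`` whose index holds place names. The data and variable names are hypothetical; only ``items``, ``like``, ``regex``, and ``axis`` come from the calls shown in this section.

```python
import pandas as pd

# Hypothetical data, made up for illustration only
toy = pd.DataFrame(
    {
        'mag': [4.6, 5.2, 1.3],
        'magType': ['mb', 'mww', 'ml'],
        'tsunami': [0, 0, 0],
    },
    index=['Nago, Japan', 'Asahi, Japan', 'Aguanga, CA'],
)

by_items = toy.filter(items=['mag', 'tsunami'])  # exact column names
by_like = toy.filter(like='mag')                 # column names containing "mag"
by_regex = toy.filter(regex=r'^t')               # column names starting with "t"
japan_rows = toy.filter(like='Japan', axis=0)    # index labels containing "Japan"

print(japan_rows)
```

In every case the selection is made on labels, so the values inside the cells never affect which rows or columns come back.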


## Creating Data in a ``DataFrame`` Object

### Adding New Columns to a ``DataFrame`` Object

When I add a new column to a ``DataFrame`` instance, it will be added to the right end of the instance. Additionally, if I assign a single value to the new column, that value will be **broadcast** to every row of the ``DataFrame`` instance.

In [40]:
new_column_title = 'source'
"""
str: The heading for the new column I want to add.
"""
new_column_element = 'USGS API'
"""
str: The element I want to broadcast down my DataFrame instance.
"""
df_csv[new_column_title] = new_column_element
# I just want to see if the new elements are added
df_csv.head()

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,status,time,title,tsunami,type,types,tz,updated,url,source
0,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,automatic,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API
1,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02003,,79.0,",ci37389202,",1.29,ml,...,automatic,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API
2,,4.4,37389194,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02137,28.0,21.0,",ci37389194,",3.42,ml,...,automatic,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0,earthquake,",dyfi,focal-mechanism,geoserve,nearby-cities,o...",-480.0,1539536756176,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API
3,,,37389186,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02618,,39.0,",ci37389186,",0.44,ml,...,automatic,1539474978070,"M 0.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475196167,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API
4,,,73096941,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.07799,,192.0,",nc73096941,",2.16,md,...,automatic,1539474716050,"M 2.2 - 10km NW of Avenal, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1539477547926,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API


I can even use a Boolean mask to help build a new column.

In [41]:
new_column_title = "mag_negative"
new_column_element = df_csv.mag < 0
df_csv[new_column_title] = new_column_element
df_csv.head()

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,time,title,tsunami,type,types,tz,updated,url,source,mag_negative
0,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False
1,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02003,,79.0,",ci37389202,",1.29,ml,...,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False
2,,4.4,37389194,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02137,28.0,21.0,",ci37389194,",3.42,ml,...,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0,earthquake,",dyfi,focal-mechanism,geoserve,nearby-cities,o...",-480.0,1539536756176,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False
3,,,37389186,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02618,,39.0,",ci37389186,",0.44,ml,...,1539474978070,"M 0.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475196167,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False
4,,,73096941,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.07799,,192.0,",nc73096941,",2.16,md,...,1539474716050,"M 2.2 - 10km NW of Avenal, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1539477547926,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False
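
Both ways of adding a column can be tried out on a tiny made-up ``DataFrame``, which makes the broadcasting easier to see than in the wide earthquake table. The name ``toy`` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data, made up for illustration only
toy = pd.DataFrame({'mag': [1.35, -0.1, 3.42]})

# A single value is broadcast to every row
toy['source'] = 'USGS API'

# A Boolean mask has one value per row, so it is assigned element by element
toy['mag_negative'] = toy.mag < 0

print(toy)
```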


#### Example: Adding a ``parsed_place`` Column to My ``DataFrame`` Object

Currently, the ``place`` column uses different names for the same location (for example, both CA and California appear).

In [42]:
df_csv.place.str.extract(r', (.*$)')[0].sort_values().unique()

array(['Afghanistan', 'Alaska', 'Argentina', 'Arizona', 'Arkansas',
       'Australia', 'Azerbaijan', 'B.C., MX', 'Barbuda', 'Bolivia',
       'Bonaire, Saint Eustatius and Saba ', 'British Virgin Islands',
       'Burma', 'CA', 'California', 'Canada', 'Chile', 'China',
       'Christmas Island', 'Colombia', 'Colorado', 'Costa Rica',
       'Dominican Republic', 'East Timor', 'Ecuador', 'Ecuador region',
       'El Salvador', 'Fiji', 'Greece', 'Greenland', 'Guam', 'Guatemala',
       'Haiti', 'Hawaii', 'Honduras', 'Idaho', 'Illinois', 'India',
       'Indonesia', 'Iran', 'Iraq', 'Italy', 'Jamaica', 'Japan', 'Kansas',
       'Kentucky', 'Kyrgyzstan', 'Martinique', 'Mauritius', 'Mayotte',
       'Mexico', 'Missouri', 'Montana', 'NV', 'Nevada', 'New Caledonia',
       'New Hampshire', 'New Mexico', 'New Zealand', 'Nicaragua',
       'North Carolina', 'Northern Mariana Islands', 'Oklahoma', 'Oregon',
       'Pakistan', 'Papua New Guinea', 'Peru', 'Philippines',
       'Puerto Rico', 'Roman

The code below will fix that.

In [43]:
df_csv['parsed_place'] = df_csv.place.str.replace(
    r'.* of ', '', regex=True # remove anything saying <something> of <something>
).str.replace(
    'the ', '' # remove "the "
).str.replace(
    r'CA$', 'California', regex=True # fix California
).str.replace(
    r'NV$', 'Nevada', regex=True # fix Nevada
).str.replace(
    r'MX$', 'Mexico', regex=True # fix Mexico
).str.replace(
    r' region$', '', regex=True # chop off endings with " region"
).str.replace(
    'northern ', '' # remove "northern "
).str.replace(
    'Fiji Islands', 'Fiji' # line up the Fiji places
).str.replace(
    r'^.*, ', '', regex=True # remove anything else extraneous from the beginning
).str.strip() # remove any extra spaces
df_csv.parsed_place.sort_values().unique()

array(['Afghanistan', 'Alaska', 'Argentina', 'Arizona', 'Arkansas',
       'Ascension Island', 'Australia', 'Azerbaijan', 'Balleny Islands',
       'Barbuda', 'Bolivia', 'British Virgin Islands', 'Burma',
       'California', 'Canada', 'Carlsberg Ridge',
       'Central East Pacific Rise', 'Central Mid-Atlantic Ridge', 'Chile',
       'China', 'Christmas Island', 'Colombia', 'Colorado', 'Costa Rica',
       'Dominican Republic', 'East Timor', 'Ecuador', 'El Salvador',
       'Fiji', 'Greece', 'Greenland', 'Guam', 'Guatemala', 'Haiti',
       'Hawaii', 'Honduras', 'Idaho', 'Illinois', 'India',
       'Indian Ocean Triple Junction', 'Indonesia', 'Iran', 'Iraq',
       'Italy', 'Jamaica', 'Japan', 'Kansas', 'Kentucky',
       'Kermadec Islands', 'Kuril Islands', 'Kyrgyzstan', 'Martinique',
       'Mauritius', 'Mayotte', 'Mexico', 'Mid-Indian Ridge', 'Missouri',
       'Montana', 'Nevada', 'New Caledonia', 'New Hampshire',
       'New Mexico', 'New Zealand', 'Nicaragua', 'North Carolina',
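
To follow what the chain of ``str.replace()`` calls does to a single value, I can run a shortened version of it on a few made-up place strings. This sketch keeps only some of the steps, and the ``places`` and ``parsed`` names are hypothetical.

```python
import pandas as pd

# Hypothetical place strings, made up for illustration only
places = pd.Series([
    '9km NE of Aguanga, CA',
    '262km NW of Ozernovskiy, Russia',
    'south of the Fiji Islands',
])

parsed = (
    places
    .str.replace(r'.* of ', '', regex=True)            # drop "<distance> of "
    .str.replace('the ', '', regex=False)              # drop "the "
    .str.replace(r'CA$', 'California', regex=True)     # expand the abbreviation
    .str.replace('Fiji Islands', 'Fiji', regex=False)  # line up the Fiji places
    .str.replace(r'^.*, ', '', regex=True)             # keep only what follows the last comma
    .str.strip()
)

print(parsed.tolist())
```

The order matters: the abbreviation is expanded before the text in front of the comma is removed, so each pattern sees the string it expects.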


#### Using the ``assign()`` Method to Create Columns in a ``DataFrame`` Object

The ``assign()`` method allows me to create more than one column at a time, or update existing columns. It returns a new ``DataFrame`` instance and leaves the original unchanged.

In [44]:
in_california = df_csv.parsed_place.str.endswith('California')
"""
pandas.core.series.Series: A column for whether the earthquake occurred in California.
"""
in_alaska = df_csv.parsed_place.str.endswith('Alaska')
"""
pandas.core.series.Series: A column for whether the earthquake occurred in Alaska.
"""
number_of_columns_to_sample = 10
"""
int: The number of rows to sample from the original DataFrame instance.
"""
df_csv.assign(
    in_california = in_california,
    in_alaska = in_alaska
).sample(number_of_columns_to_sample, random_state = 0)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,type,types,tz,updated,url,source,mag_negative,parsed_place,in_california,in_alaska
7207,,,2000hj64,https://earthquake.usgs.gov/fdsnws/event/1/que...,1.298,,80.0,",us2000hj64,",4.8,mwr,...,earthquake,",geoserve,moment-tensor,origin,phase-data,",-360.0,1539400902040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Nicaragua,False,False
4755,,,61424302,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02342,,89.0,",uw61424302,",1.09,ml,...,earthquake,",geoserve,origin,phase-data,",-480.0,1538446772590,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Washington,False,False
4595,,,20267171,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20267171,",1.8,ml,...,earthquake,",geoserve,origin,phase-data,",-540.0,1539067248080,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Alaska,False,True
3566,,,20268839,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20268839,",1.5,ml,...,earthquake,",geoserve,origin,phase-data,",-540.0,1539390064104,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Alaska,False,True
2182,,,38316584,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.1339,,128.0,",ci38316584,",0.9,ml,...,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539109448181,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California,True,False
4561,,,20267197,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20267197,us1000h4hb,",2.9,ml,...,earthquake,",geoserve,origin,phase-data,",-540.0,1539491199040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Alaska,False,True
2379,,,20271488,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20271488,",1.1,ml,...,earthquake,",geoserve,origin,phase-data,",-540.0,1539219549977,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Alaska,False,True
2846,,,73093341,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.0305,,139.0,",nc73093341,",0.44,md,...,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1538677923501,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California,True,False
8600,,,20258931,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20258931,",1.4,ml,...,earthquake,",geoserve,origin,",-540.0,1537418635961,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Alaska,False,True
4256,,,37374994,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.09785,,35.0,",ci37374994,",2.15,ml,...,earthquake,",focal-mechanism,geoserve,nearby-cities,origin...",-480.0,1538410222140,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California,True,False


##### Using Lambda Functions in the ``assign()`` Method to Create Columns in a ``DataFrame`` Object

With the use of `lambda` functions, the `assign()` method becomes even more powerful.

**Lambda functions** are anonymous functions usually defined in one line and for single use.

The `assign()` method passes the entire dataframe into the `lambda` function as `x`; from there, I can select the columns `in_california` and `in_alaska`, which are being created in that same call to `assign()`.

Here, I use a `lambda` function to create a new column, `neither`, which tells if the earthquake was neither in Alaska nor California:

In [45]:
df_csv.assign(
    in_california = df_csv.parsed_place == 'California',
    in_alaska = df_csv.parsed_place == 'Alaska',
    neither = lambda x: ~x.in_california & ~x.in_alaska
).sample(number_of_columns_to_sample, random_state = 0)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,types,tz,updated,url,source,mag_negative,parsed_place,in_california,in_alaska,neither
7207,,,2000hj64,https://earthquake.usgs.gov/fdsnws/event/1/que...,1.298,,80.0,",us2000hj64,",4.8,mwr,...,",geoserve,moment-tensor,origin,phase-data,",-360.0,1539400902040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Nicaragua,False,False,True
4755,,,61424302,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02342,,89.0,",uw61424302,",1.09,ml,...,",geoserve,origin,phase-data,",-480.0,1538446772590,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Washington,False,False,True
4595,,,20267171,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20267171,",1.8,ml,...,",geoserve,origin,phase-data,",-540.0,1539067248080,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Alaska,False,True,False
3566,,,20268839,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20268839,",1.5,ml,...,",geoserve,origin,phase-data,",-540.0,1539390064104,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Alaska,False,True,False
2182,,,38316584,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.1339,,128.0,",ci38316584,",0.9,ml,...,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539109448181,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California,True,False,False
4561,,,20267197,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20267197,us1000h4hb,",2.9,ml,...,",geoserve,origin,phase-data,",-540.0,1539491199040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Alaska,False,True,False
2379,,,20271488,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20271488,",1.1,ml,...,",geoserve,origin,phase-data,",-540.0,1539219549977,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Alaska,False,True,False
2846,,,73093341,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.0305,,139.0,",nc73093341,",0.44,md,...,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1538677923501,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California,True,False,False
8600,,,20258931,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20258931,",1.4,ml,...,",geoserve,origin,",-540.0,1537418635961,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Alaska,False,True,False
4256,,,37374994,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.09785,,35.0,",ci37374994,",2.15,ml,...,",focal-mechanism,geoserve,nearby-cities,origin...",-480.0,1538410222140,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California,True,False,False
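
The same pattern is easier to inspect on a three-row made-up ``DataFrame``. In this sketch (the ``toy`` and ``flagged`` names are hypothetical), the ``lambda`` can see ``in_california`` and ``in_alaska`` because ``assign()`` evaluates its keyword arguments in order.

```python
import pandas as pd

# Hypothetical data, made up for illustration only
toy = pd.DataFrame({'parsed_place': ['California', 'Alaska', 'Nicaragua']})

flagged = toy.assign(
    in_california=toy.parsed_place == 'California',
    in_alaska=toy.parsed_place == 'Alaska',
    # x is the frame as built so far, so it already holds the two columns above
    neither=lambda x: ~x.in_california & ~x.in_alaska,
)

print(flagged)
print(toy.columns.tolist())  # the original frame is left untouched
```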


### Concatenating ``DataFrame`` Objects with the ``pd.concat()`` Function

I can use ``pd.concat([DataFrame_1, DataFrame_2], axis = 0)`` to append ``DataFrame_2`` to the bottom of ``DataFrame_1``.

In [46]:
tsunami = df_csv[df_csv.tsunami == 1]
"""
pandas.core.frame.DataFrame: A DataFrame instance that includes all earthquakes that led to a tsunami.
"""
no_tsunami = df_csv[df_csv.tsunami == 0]
"""
pandas.core.frame.DataFrame: A DataFrame instance that includes all earthquakes that did not lead to a tsunami.
"""
pd.concat([tsunami, no_tsunami], axis = 0)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,title,tsunami,type,types,tz,updated,url,source,mag_negative,parsed_place
36,,,1000hbsa,https://earthquake.usgs.gov/fdsnws/event/1/que...,1.54100,,51.0,",us1000hbsa,",5.00,mww,...,"M 5.0 - 165km NNW of Flying Fish Cove, Christm...",1,earthquake,",geoserve,origin,phase-data,",420.0,1539461285040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Christmas Island
118,green,,1000hbkz,https://earthquake.usgs.gov/fdsnws/event/1/que...,2.62300,,25.0,",pt18286001,at00pgjb1a,us1000hbkz,",6.70,mww,...,"M 6.7 - 262km NW of Ozernovskiy, Russia",1,earthquake,",geoserve,ground-failure,impact-link,losspager...",600.0,1539455437040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Russia
501,green,,1000hax9,https://earthquake.usgs.gov/fdsnws/event/1/que...,2.35600,,45.0,",us1000hax9,",5.60,mww,...,"M 5.6 - 128km SE of Kimbe, Papua New Guinea",1,earthquake,",geoserve,losspager,moment-tensor,origin,phase...",600.0,1539359431040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Papua New Guinea
799,green,,1000hacw,https://earthquake.usgs.gov/fdsnws/event/1/que...,3.87900,,18.0,",pt18283004,at00pgeomo,us1000hacw,",6.50,mww,...,"M 6.5 - 148km S of Severo-Kuril'sk, Russia",1,earthquake,",geoserve,ground-failure,impact-link,losspager...",600.0,1539224915040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Russia
816,green,,1000habl,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.90700,,12.0,",us1000habl,",6.20,mww,...,"M 6.2 - 94km SW of Kokopo, Papua New Guinea",1,earthquake,",geoserve,ground-failure,losspager,moment-tens...",600.0,1539219109963,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Papua New Guinea
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9327,,,73086771,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.01806,,185.0,",nc73086771,",0.62,md,...,"M 0.6 - 9km ENE of Mammoth Lakes, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1537285598315,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California
9328,,,38063967,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.03041,,50.0,",ci38063967,",1.00,ml,...,"M 1.0 - 3km W of Julian, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1537276800970,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California
9329,,,2018261000,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.45260,,276.0,",pr2018261000,",2.40,md,...,"M 2.4 - 35km NNE of Hatillo, Puerto Rico",0,earthquake,",geoserve,origin,phase-data,",-240.0,1537243777410,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Puerto Rico
9330,,,38063959,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.01865,,61.0,",ci38063959,",1.10,ml,...,"M 1.1 - 9km NE of Aguanga, CA",0,earthquake,",focal-mechanism,geoserve,nearby-cities,origin...",-480.0,1537230211640,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California


I can also change ``axis = 0`` to ``axis = 1`` in ``pd.concat()`` to add more columns to the right end of the ``DataFrame`` instance. The rows are matched up by their index values.

In [47]:
get_columns = ['tz', 'felt', 'ids']
"""
list(str): A set of columns I want to add to my DataFrame instance.
"""
n_rows_to_get = 2
"""
int: The number of rows to return from the updated DataFrame instance.
"""
additional_columns = pd.read_csv(csv_file_path, usecols = get_columns)
"""
pandas.core.frame.DataFrame: A DataFrame instance with some columns I want to retrieve.
"""
build_list = [df_csv.head(n_rows_to_get), additional_columns.head(n_rows_to_get)]
"""
list(pandas.core.frame.DataFrame): The two DataFrame instances to concatenate side by side.
"""
pd.concat(build_list, axis = 1)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,types,tz,updated,url,source,mag_negative,parsed_place,felt.1,ids.1,tz.1
0,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California,,",ci37389218,",-480.0
1,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02003,,79.0,",ci37389202,",1.29,ml,...,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California,,",ci37389202,",-480.0
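
When the two pieces do not share exactly the same index, ``pd.concat()`` with ``axis = 1`` lines the rows up by label and fills the gaps with ``NaN``. Here is a minimal sketch with two made-up frames whose indices only partly overlap; the names ``left``, ``right``, and ``side_by_side`` are hypothetical.

```python
import pandas as pd

# Hypothetical frames with partly overlapping indices, made up for illustration
left = pd.DataFrame({'mag': [1.35, 1.29]}, index=[0, 1])
right = pd.DataFrame({'tz': [-480.0, -540.0]}, index=[1, 2])

# Rows are matched on index labels, not on position
side_by_side = pd.concat([left, right], axis=1)

print(side_by_side)
```

Only the row labeled ``1`` has a value in both columns, since it is the only label present in both frames.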


#### Using the ``join`` Parameter to Concatenate ``DataFrame`` Objects

The ``join`` parameter specifies how to handle any overlap in column names (when appending to the bottom) or in row names (when concatenating to the left/right).

By default, this is ``outer``, which keeps everything from all of the ``DataFrame`` objects and fills the gaps with ``NaN``. However, if I use ``inner``, I will only keep what the ``DataFrame`` objects have in common.

In [48]:
pd.concat(
    [tsunami.head(2), no_tsunami.head(2).assign(type = 'earthquake')], join = 'inner'
)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,title,tsunami,type,types,tz,updated,url,source,mag_negative,parsed_place
36,,,1000hbsa,https://earthquake.usgs.gov/fdsnws/event/1/que...,1.541,,51.0,",us1000hbsa,",5.0,mww,...,"M 5.0 - 165km NNW of Flying Fish Cove, Christm...",1,earthquake,",geoserve,origin,phase-data,",420.0,1539461285040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Christmas Island
118,green,,1000hbkz,https://earthquake.usgs.gov/fdsnws/event/1/que...,2.623,,25.0,",pt18286001,at00pgjb1a,us1000hbkz,",6.7,mww,...,"M 6.7 - 262km NW of Ozernovskiy, Russia",1,earthquake,",geoserve,ground-failure,impact-link,losspager...",600.0,1539455437040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Russia
0,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California
1,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02003,,79.0,",ci37389202,",1.29,ml,...,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California


##### Using the ``ignore_index`` Parameter with the ``join`` Parameter

If I don't care about the indices, I can use the ``ignore_index`` parameter with the ``join`` parameter. This will give me a fresh sequential index (0, 1, 2, ...) instead of the original row labels.

In [49]:
pd.concat(
    [tsunami.head(2), no_tsunami.head(2).assign(type = 'earthquake')], join = 'inner', ignore_index = True
)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,title,tsunami,type,types,tz,updated,url,source,mag_negative,parsed_place
0,,,1000hbsa,https://earthquake.usgs.gov/fdsnws/event/1/que...,1.541,,51.0,",us1000hbsa,",5.0,mww,...,"M 5.0 - 165km NNW of Flying Fish Cove, Christm...",1,earthquake,",geoserve,origin,phase-data,",420.0,1539461285040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Christmas Island
1,green,,1000hbkz,https://earthquake.usgs.gov/fdsnws/event/1/que...,2.623,,25.0,",pt18286001,at00pgjb1a,us1000hbkz,",6.7,mww,...,"M 6.7 - 262km NW of Ozernovskiy, Russia",1,earthquake,",geoserve,ground-failure,impact-link,losspager...",600.0,1539455437040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,Russia
2,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California
3,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02003,,79.0,",ci37389202,",1.29,ml,...,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,False,California
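
Since the frames in the earthquake example share all of their columns, the effect of ``join`` is hard to see there. The sketch below uses two made-up frames with different columns to contrast ``outer``, ``inner``, and ``ignore_index``; all names and values are hypothetical.

```python
import pandas as pd

# Hypothetical frames that share only the "mag" column
top = pd.DataFrame({'mag': [5.0, 6.7], 'alert': ['green', 'red']}, index=[36, 118])
bottom = pd.DataFrame({'mag': [1.35, 1.29], 'tz': [-480.0, -480.0]}, index=[0, 1])

# Default (outer): keep every column, filling the gaps with NaN
outer = pd.concat([top, bottom])

# Inner: keep only the columns both frames have
inner = pd.concat([top, bottom], join='inner')

# Inner plus ignore_index: same rows, renumbered from zero
renumbered = pd.concat([top, bottom], join='inner', ignore_index=True)

print(outer)
print(inner)
print(renumbered)
```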


### Deleting Unwanted Data from a ``DataFrame`` Object

#### Deleting Unwanted Data from a ``DataFrame`` Object Using the ``del`` Command

I can use dictionary syntax and the ``del`` command to remove data from a ``DataFrame`` instance.

Before the deletion.

In [50]:
column_to_delete = 'sources'
"""
str: The column heading for the column I want to delete from my DataFrame instance.
"""
df_csv.columns

Index(['alert', 'cdi', 'code', 'detail', 'dmin', 'felt', 'gap', 'ids', 'mag',
       'magType', 'mmi', 'net', 'nst', 'place', 'rms', 'sig', 'sources',
       'status', 'time', 'title', 'tsunami', 'type', 'types', 'tz', 'updated',
       'url', 'source', 'mag_negative', 'parsed_place'],
      dtype='object')

After the deletion.

In [51]:
del df_csv[column_to_delete]
df_csv.columns

Index(['alert', 'cdi', 'code', 'detail', 'dmin', 'felt', 'gap', 'ids', 'mag',
       'magType', 'mmi', 'net', 'nst', 'place', 'rms', 'sig', 'status', 'time',
       'title', 'tsunami', 'type', 'types', 'tz', 'updated', 'url', 'source',
       'mag_negative', 'parsed_place'],
      dtype='object')

It is a good idea to pair the ``del`` command with a ``try``/``except`` block, since deleting a column that does not exist raises a ``KeyError``.

In [52]:
try:
    del df_csv[column_to_delete]
except KeyError:
    print(f'The {column_to_delete} column is not in this DataFrame instance anymore.')

The sources column is not in this DataFrame instance anymore.
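
A different approach from the ``del`` pattern used here is the ``drop()`` method with ``errors = 'ignore'``, which skips labels that are not present instead of raising. This is only a sketch on a made-up ``DataFrame`` named ``toy``; note that ``drop()`` returns a new ``DataFrame`` instance rather than changing the original in place.

```python
import pandas as pd

# Hypothetical data, made up for illustration only
toy = pd.DataFrame({'mag': [1.35, 1.29], 'sources': [',ci,', ',ci,']})

# First call removes the column
toy = toy.drop(columns='sources', errors='ignore')

# Second call finds nothing to remove and quietly returns the frame unchanged
toy = toy.drop(columns='sources', errors='ignore')

print(toy.columns.tolist())
```

The trade-off is that a misspelled column name is also ignored silently, so the ``try``/``except`` version is the better fit when I want to be told that the column was missing.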


#### Removing Data from a ``DataFrame`` Object Using the ``pop()`` Method

I can also use the ``pop()`` method to remove a column from a ``DataFrame`` instance. The beauty of this method is that the popped column is not permanently deleted. It is, instead, returned as a ``Series`` instance.

In [53]:
column_to_pop = 'mag_negative'
"""
str: The column heading for the column I want to pop out of the DataFrame instance.
"""
popped_column = df_csv.pop(column_to_pop)
"""
pandas.core.series.Series: The column I popped out of my DataFrame instance.
"""
print(type(popped_column))

<class 'pandas.core.series.Series'>


In [54]:
df_csv

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,time,title,tsunami,type,types,tz,updated,url,source,parsed_place
0,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California
1,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.020030,,79.0,",ci37389202,",1.29,ml,...,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California
2,,4.4,37389194,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.021370,28.0,21.0,",ci37389194,",3.42,ml,...,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0,earthquake,",dyfi,focal-mechanism,geoserve,nearby-cities,o...",-480.0,1539536756176,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California
3,,,37389186,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.026180,,39.0,",ci37389186,",0.44,ml,...,1539474978070,"M 0.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475196167,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California
4,,,73096941,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.077990,,192.0,",nc73096941,",2.16,md,...,1539474716050,"M 2.2 - 10km NW of Avenal, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1539477547926,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9327,,,73086771,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.018060,,185.0,",nc73086771,",0.62,md,...,1537230228060,"M 0.6 - 9km ENE of Mammoth Lakes, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1537285598315,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California
9328,,,38063967,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.030410,,50.0,",ci38063967,",1.00,ml,...,1537230135130,"M 1.0 - 3km W of Julian, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1537276800970,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California
9329,,,2018261000,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.452600,,276.0,",pr2018261000,",2.40,md,...,1537229908180,"M 2.4 - 35km NNE of Hatillo, Puerto Rico",0,earthquake,",geoserve,origin,phase-data,",-240.0,1537243777410,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,Puerto Rico
9330,,,38063959,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.018650,,61.0,",ci38063959,",1.10,ml,...,1537229545350,"M 1.1 - 9km NE of Aguanga, CA",0,earthquake,",focal-mechanism,geoserve,nearby-cities,origin...",-480.0,1537230211640,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California


In [55]:
# This cell exists solely to include the variable name in a Markdown cell.
# The import needed to display text as Markdown:
# from IPython.display import Markdown as md

Notice that the popped column is a Boolean mask.

In [56]:
popped_column

0       False
1       False
2       False
3       False
4       False
        ...  
9327    False
9328    False
9329    False
9330    False
9331    False
Name: mag_negative, Length: 9332, dtype: bool

I can use the popped column to filter my data.

In [57]:
df_csv[popped_column].head()

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,time,title,tsunami,type,types,tz,updated,url,source,parsed_place
39,,,660886,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.095,,155.25,",nn00660886,",-0.1,ml,...,1539458844506,"M -0.1 - 6km NW of Lemmon Valley, Nevada",0,earthquake,",geoserve,origin,phase-data,",-480.0,1539482703428,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,Nevada
49,,,660884,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.097,,155.98,",nn00660884,",-0.1,ml,...,1539455017464,"M -0.1 - 6km NW of Lemmon Valley, Nevada",0,earthquake,",geoserve,origin,phase-data,",-480.0,1539482700579,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,Nevada
135,,,660897,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.092,,71.82,",nn00660897,",-0.4,ml,...,1539422175717,"M -0.4 - 10km SSE of Beatty, Nevada",0,earthquake,",geoserve,origin,phase-data,",-480.0,1539482715521,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,Nevada
161,,,80314084,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.148,,117.0,",mb80314084,",-0.02,md,...,1539412475360,"M -0.0 - 20km SSE of Ronan, Montana",0,earthquake,",geoserve,origin,phase-data,",-420.0,1539433905970,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,Montana
198,,,660745,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.047,,65.16,",nn00660745,",-0.2,ml,...,1539398340822,"M -0.2 - 60km N of Pahrump, Nevada",0,earthquake,",geoserve,origin,phase-data,",-480.0,1539482652322,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,Nevada
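
To make the pop-then-filter pattern easier to follow, here is a minimal sketch on made-up data. The ``toy`` frame and its values are my own assumption for illustration; they are not the earthquake data.

```python
import pandas as pd

# Hypothetical toy data standing in for the earthquake DataFrame
toy = pd.DataFrame({
    'mag': [1.35, -0.1, 3.42, -0.4],
    'place': ['California', 'Nevada', 'California', 'Nevada'],
})

# Add a Boolean column, then pop it: pop() removes the column
# from the DataFrame and returns it as a Series
toy['mag_negative'] = toy['mag'] < 0
mask = toy.pop('mag_negative')

# The popped Series keeps the same index as the DataFrame,
# so it can be used directly as a Boolean mask
negative_quakes = toy[mask]
print(negative_quakes)
```

The mask still works after the column is gone because pandas aligns it with the ``DataFrame`` on the index, not on the column that used to hold it.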


#### Using the ``drop()`` Method to Drop Rows from a ``DataFrame`` Object

I can use the ``drop()`` method to drop specific rows from a ``DataFrame`` instance. 

In [62]:
rows_to_drop = [0, 1, 2, 3, 4, 5]
"""
list of int: The index labels to drop from my DataFrame instance.
"""
df_csv.drop(rows_to_drop).head(5)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,time,title,tsunami,type,types,tz,updated,url,source,parsed_place
6,,,20280432,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ak20280432,",1.7,ml,...,1539473176017,"M 1.7 - 105km W of Talkeetna, Alaska",0,earthquake,",geoserve,origin,",-540.0,1539473596465,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,Alaska
7,,,73096936,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.01622,,83.0,",nc73096936,",1.13,md,...,1539473060280,"M 1.1 - 10km NW of Parkfield, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1539476642808,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California
8,,,73096931,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.009138,,52.0,",nc73096931,",0.92,md,...,1539473042310,"M 0.9 - 6km NW of The Geysers, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1539475027632,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California
9,,,1000hbtn,https://earthquake.usgs.gov/fdsnws/event/1/que...,3.191,,37.0,",us1000hbtn,",4.7,mb,...,1539472814760,"M 4.7 - 219km SSE of Saparua, Indonesia",0,earthquake,",geoserve,origin,phase-data,",540.0,1539473712040,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,Indonesia
10,,,37389162,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.005116,,57.0,",ci37389162,",0.5,ml,...,1539471831030,"M 0.5 - 10km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539472054436,https://earthquake.usgs.gov/earthquakes/eventp...,USGS API,California
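
One detail worth sketching is that ``drop()`` matches index labels rather than positions. The example below uses a hypothetical ``toy`` frame with a non-default index (my own assumption, not the earthquake data) to show the difference.

```python
import pandas as pd

# Hypothetical toy data with index labels that differ from positions
toy = pd.DataFrame({'mag': [1.35, 1.29, 3.42, 0.44]}, index=[10, 20, 30, 40])

# drop() looks up the labels 10 and 30 in the index
trimmed = toy.drop([10, 30])

# To drop by position instead, translate positions into labels first
trimmed_by_position = toy.drop(toy.index[:2])

print(trimmed)
print(trimmed_by_position)
```

In both cases ``toy`` itself is left untouched, since ``drop()`` returns a new ``DataFrame``.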


#### Using the ``drop()`` Method to Drop Columns from a ``DataFrame`` Object

Using the ``columns`` argument allows me to drop columns from a ``DataFrame`` instance. The same can be done by passing the column labels together with the ``axis=1`` argument.

In [64]:
# The columns I want to keep; every other column gets dropped
columns_to_keep = ['alert', 'mag', 'title', 'time', 'tsunami']
columns_to_drop = [
    col for col in df_csv.columns if col not in columns_to_keep
]
df_csv.drop(columns=columns_to_drop).head()

Unnamed: 0,alert,mag,time,title,tsunami
0,,1.35,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0
1,,1.29,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0
2,,3.42,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0
3,,0.44,1539474978070,"M 0.4 - 9km NE of Aguanga, CA",0
4,,2.16,1539474716050,"M 2.2 - 10km NW of Avenal, CA",0
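
As a small sketch of the equivalence between the ``columns`` argument and ``axis=1``, the example below compares both calls on a hypothetical ``toy`` frame (made-up values, my own assumption). It also shows that when I already know which columns to keep, selecting them directly gives the same result.

```python
import pandas as pd

# Hypothetical toy data with a few of the earthquake column names
toy = pd.DataFrame({
    'alert': ['green', 'yellow'],
    'mag': [1.35, 4.7],
    'gap': [21.0, 37.0],
    'tz': [-480.0, 540.0],
})

# Two equivalent ways to drop columns
by_keyword = toy.drop(columns=['gap', 'tz'])
by_axis = toy.drop(['gap', 'tz'], axis=1)

# Selecting the columns to keep yields the same DataFrame
by_selection = toy[['alert', 'mag']]

print(by_keyword.equals(by_axis))
print(by_keyword.equals(by_selection))
```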


By default, ``drop()``, along with the majority of ``DataFrame`` methods, will return a new ``DataFrame`` object. If I want to change the one I am working with, I can pass ``inplace=True``.

<div class='alert alert-warning' role='alert'>
    <span>I need to be very careful if I choose to do this, because the dropped data cannot be recovered without reloading it or keeping a copy.</span>
</div>

In [66]:
df_csv.drop(columns = columns_to_drop, inplace = True)
df_csv.head()

Unnamed: 0,alert,mag,time,title,tsunami
0,,1.35,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0
1,,1.29,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0
2,,3.42,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0
3,,0.44,1539474978070,"M 0.4 - 9km NE of Aguanga, CA",0
4,,2.16,1539474716050,"M 2.2 - 10km NW of Avenal, CA",0
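
The difference between the default behavior and ``inplace=True`` can be sketched on a hypothetical ``toy`` frame (made-up values, my own assumption). Note that the in-place call returns ``None``, so assigning its result back to a variable would discard the data.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'mag': [1.35, 4.7], 'gap': [21.0, 37.0]})

# Default: a new DataFrame comes back and toy keeps both columns
new_df = toy.drop(columns=['gap'])
columns_after_default = list(toy.columns)

# inplace=True: toy itself is modified and the call returns None
result = toy.drop(columns=['gap'], inplace=True)
columns_after_inplace = list(toy.columns)

print(columns_after_default)
print(result)
print(columns_after_inplace)
```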
