---
<center><h1>Basic intro to pandas</h1></center>

<center><h2>Work with pandas DataFrames: filtering, indexing and missing data</h2></center>
---

## Table of Contents

- [Work with pandas DataFrames: filtering, indexing and missing data](#Work-with-pandas-DataFrames:-filtering,-indexing-and-missing-data)
    * [Get basic information](#Get-basic-information)
    * [Conditional indexing and selection](#Conditional-indexing-and-selection)
    * [Work with indexes and MultiIndex option](#Work-with-indexes-and-MultiIndex-option)
    * [Selection by label and position](#Selection-by-label-and-position)
    * [Work with missing data](#Work-with-missing-data)
    - [*Exercise 1*](#Exercise-1)

In [52]:
import pandas as pd
import numpy as np
import random

## Work with pandas DataFrames: filtering, indexing and missing data

[[back to top]](#Table-of-Contents)

In this part we will continue getting acquainted with DataFrames and learn:
1.	how to get basic information about a DataFrame and its contents;
2.	how to get a subset of a DataFrame and select the rows that satisfy given conditions;
3.	how to change the index of a DataFrame and use advanced (multi-level) indexing;
4.	how to select rows by their index labels and positions;
5.	how to work with missing data.

The lesson is divided into code blocks that follow the points above. In the following posts we will continue learning pandas and consider its other features.

In [53]:
url="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv"
eqPastMonth=pd.read_csv(url)
eqPastMonth.head(10)

,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2021-02-24T19:50:07.610Z,19.181,-155.470993,32.860001,2.15,md,42.0,153.0,,0.1,hv,hv72363462,2021-02-24T19:53:30.510Z,"2 km SSE of Pāhala, Hawaii",earthquake,0.68,0.98,0.43,24.0,automatic,hv,hv
1,2021-02-24T19:46:29.747Z,64.7437,-149.2695,20.1,1.1,ml,,,,0.3,ak,ak0212ja58vx,2021-02-24T19:50:43.341Z,"17 km NNW of Four Mile Road, Alaska",earthquake,,0.6,,,automatic,ak,ak
2,2021-02-24T19:35:10.940Z,37.3513,-115.6738,0.3,1.6,ml,9.0,263.87,0.317,0.63,nn,nn00801159,2021-02-24T19:40:13.150Z,"33 km S of Rachel, Nevada",earthquake,,40.5,,,automatic,nn,nn
3,2021-02-24T19:34:35.290Z,19.453667,-155.597504,-1.39,1.79,ml,10.0,79.0,,0.16,hv,hv72363447,2021-02-24T19:40:06.880Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.52,0.36,0.07,3.0,automatic,hv,hv
4,2021-02-24T19:25:06.150Z,17.821,-66.8773,10.0,2.41,md,11.0,261.0,0.2234,0.16,pr,pr2021055003,2021-02-24T20:06:29.340Z,"16 km S of Guánica, Puerto Rico",earthquake,0.56,0.62,0.1,5.0,reviewed,pr,pr
5,2021-02-24T19:19:50.122Z,-15.1086,-173.3268,10.0,5.0,mb,,133.0,1.913,0.97,us,us7000ddeg,2021-02-24T19:53:29.704Z,"106 km NNE of Hihifo, Tonga",earthquake,5.4,1.8,0.052,120.0,reviewed,us,us
6,2021-02-24T19:12:39.950Z,38.1733,-117.9185,5.6,1.1,ml,8.0,105.08,0.035,0.25,nn,nn00801158,2021-02-24T19:36:32.286Z,"29 km SE of Mina, Nevada",earthquake,,1.6,,,automatic,nn,nn
7,2021-02-24T19:11:05.610Z,19.464333,-155.591995,-1.02,1.83,ml,14.0,58.0,,0.31,hv,hv72363427,2021-02-24T19:16:35.330Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.5,0.31,3.66,6.0,automatic,hv,hv
8,2021-02-24T19:07:06.420Z,19.151167,-155.474503,36.98,2.05,md,33.0,168.0,,0.11,hv,hv72363422,2021-02-24T19:10:35.040Z,"5 km S of Pāhala, Hawaii",earthquake,0.79,1.01,0.94,7.0,automatic,hv,hv
9,2021-02-24T19:05:11.260Z,32.537333,-115.230667,20.92,2.89,ml,23.0,64.0,0.1332,0.26,ci,ci39800680,2021-02-24T19:15:41.440Z,"12km ESE of Puebla, B.C., MX",earthquake,0.69,1.27,0.24,27.0,automatic,ci,ci


### Get basic information

[[back to top]](#Table-of-Contents)

pandas has a set of methods and attributes for getting basic information about a DataFrame.

Let's take a look at the type of each column of `eqPastMonth`:

In [54]:
eqPastMonth.dtypes

time                object
latitude           float64
longitude          float64
depth              float64
mag                float64
magType             object
nst                float64
gap                float64
dmin               float64
rms                float64
net                 object
id                  object
updated             object
place               object
type                object
horizontalError    float64
depthError         float64
magError           float64
magNst             float64
status              object
locationSource      object
magSource           object
dtype: object

You may notice that the dtype of the `time` column is `object` by default, which here means the values are stored as Python strings. You can change this with the `apply` method, which calls a function on every element of a Series (or on every column or row of a DataFrame). A `lambda` is a shorthand way to write a small anonymous function inline.

In [58]:
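import datetime
# parse each ISO-8601 'time' string (e.g. 2021-02-24T19:50:07.610Z) into a datetime object
eqPastMonth['datetime'] = eqPastMonth['time'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))
eqPastMonth.head(5)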

,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus1,datetime
0,2021-02-24T19:50:07.610Z,19.181,-155.470993,32.860001,2.15,md,42.0,153.0,,0.1,hv,hv72363462,2021-02-24T19:53:30.510Z,"2 km SSE of Pāhala, Hawaii",earthquake,0.68,0.98,0.43,24.0,automatic,hv,hv,2.25,2021-02-24 19:50:07.610
1,2021-02-24T19:46:29.747Z,64.7437,-149.2695,20.1,1.1,ml,,,,0.3,ak,ak0212ja58vx,2021-02-24T19:50:43.341Z,"17 km NNW of Four Mile Road, Alaska",earthquake,,0.6,,,automatic,ak,ak,1.2,2021-02-24 19:46:29.747
2,2021-02-24T19:35:10.940Z,37.3513,-115.6738,0.3,1.6,ml,9.0,263.87,0.317,0.63,nn,nn00801159,2021-02-24T19:40:13.150Z,"33 km S of Rachel, Nevada",earthquake,,40.5,,,automatic,nn,nn,1.7,2021-02-24 19:35:10.940
3,2021-02-24T19:34:35.290Z,19.453667,-155.597504,-1.39,1.79,ml,10.0,79.0,,0.16,hv,hv72363447,2021-02-24T19:40:06.880Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.52,0.36,0.07,3.0,automatic,hv,hv,1.89,2021-02-24 19:34:35.290
4,2021-02-24T19:25:06.150Z,17.821,-66.8773,10.0,2.41,md,11.0,261.0,0.2234,0.16,pr,pr2021055003,2021-02-24T20:06:29.340Z,"16 km S of Guánica, Puerto Rico",earthquake,0.56,0.62,0.1,5.0,reviewed,pr,pr,2.51,2021-02-24 19:25:06.150
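
As an alternative to `apply` with a lambda, pandas can parse a whole Series of strings at once with `pd.to_datetime`. The sketch below uses two sample strings copied from the `time` column; passing the same `format` string as above makes the trailing `Z` match literally, so the result is timezone-naive, like the `datetime` column created above.

```python
import pandas as pd

# sample strings in the same format as the USGS 'time' column
times = pd.Series(['2021-02-24T19:50:07.610Z', '2021-02-24T19:46:29.747Z'])

# vectorized parsing of the whole Series, no lambda needed;
# the trailing 'Z' in the format is matched literally, so the result has no timezone
parsed = pd.to_datetime(times, format='%Y-%m-%dT%H:%M:%S.%fZ')

print(parsed.dtype)
print(parsed.dt.minute.tolist())  # [50, 46]
```

On large DataFrames the vectorized call is usually much faster than calling `strptime` once per row.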


In [59]:
# note: despite its name, 'magPlus1' adds 0.1 (not 1) to each magnitude
eqPastMonth['magPlus1'] = eqPastMonth['mag'].apply(lambda x: x + 0.1)
eqPastMonth.head(5)

,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus1,datetime
0,2021-02-24T19:50:07.610Z,19.181,-155.470993,32.860001,2.15,md,42.0,153.0,,0.1,hv,hv72363462,2021-02-24T19:53:30.510Z,"2 km SSE of Pāhala, Hawaii",earthquake,0.68,0.98,0.43,24.0,automatic,hv,hv,2.25,2021-02-24 19:50:07.610
1,2021-02-24T19:46:29.747Z,64.7437,-149.2695,20.1,1.1,ml,,,,0.3,ak,ak0212ja58vx,2021-02-24T19:50:43.341Z,"17 km NNW of Four Mile Road, Alaska",earthquake,,0.6,,,automatic,ak,ak,1.2,2021-02-24 19:46:29.747
2,2021-02-24T19:35:10.940Z,37.3513,-115.6738,0.3,1.6,ml,9.0,263.87,0.317,0.63,nn,nn00801159,2021-02-24T19:40:13.150Z,"33 km S of Rachel, Nevada",earthquake,,40.5,,,automatic,nn,nn,1.7,2021-02-24 19:35:10.940
3,2021-02-24T19:34:35.290Z,19.453667,-155.597504,-1.39,1.79,ml,10.0,79.0,,0.16,hv,hv72363447,2021-02-24T19:40:06.880Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.52,0.36,0.07,3.0,automatic,hv,hv,1.89,2021-02-24 19:34:35.290
4,2021-02-24T19:25:06.150Z,17.821,-66.8773,10.0,2.41,md,11.0,261.0,0.2234,0.16,pr,pr2021055003,2021-02-24T20:06:29.340Z,"16 km S of Guánica, Puerto Rico",earthquake,0.56,0.62,0.1,5.0,reviewed,pr,pr,2.51,2021-02-24 19:25:06.150
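
For simple arithmetic, `apply` is not required: pandas operators work element-wise on a whole Series. This sketch, using a few sample magnitudes, checks that both approaches give the same values.

```python
import pandas as pd

mags = pd.Series([2.15, 1.10, 1.60])  # sample magnitudes

# element-wise with apply and a lambda, as in the cell above
with_apply = mags.apply(lambda x: x + 0.1)

# the same result with vectorized arithmetic, usually shorter and faster
vectorized = mags + 0.1

print(vectorized.equals(with_apply))  # True
```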


Notice the new `datetime` column: its dtype is `datetime64[ns]`, as `dtypes` shows.

In [60]:
eqPastMonth.dtypes

time                       object
latitude                  float64
longitude                 float64
depth                     float64
mag                       float64
magType                    object
nst                       float64
gap                       float64
dmin                      float64
rms                       float64
net                        object
id                         object
updated                    object
place                      object
type                       object
horizontalError           float64
depthError                float64
magError                  float64
magNst                    float64
status                     object
locationSource             object
magSource                  object
magPlus1                  float64
datetime           datetime64[ns]
dtype: object

You can also get a concise summary of the DataFrame with `info()`:

In [195]:
eqPastMonth.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10620 entries, 0 to 10619
Data columns (total 23 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   time             10620 non-null  object        
 1   latitude         10620 non-null  float64       
 2   longitude        10620 non-null  float64       
 3   depth            10620 non-null  float64       
 4   mag              10620 non-null  float64       
 5   magType          10620 non-null  object        
 6   nst              6919 non-null   float64       
 7   gap              8113 non-null   float64       
 8   dmin             7172 non-null   float64       
 9   rms              10619 non-null  float64       
 10  net              10620 non-null  object        
 11  id               10620 non-null  object        
 12  updated          10620 non-null  object        
 13  place            10620 non-null  object        
 14  type             10620 non-null  objec

Method `info()` shows (from top to bottom):
+ that `eqPastMonth` is an instance of the DataFrame class (the same information you get from the built-in function `type()`);
+ the number of rows and the range of the index;
+ the dtype of each column and the number of non-null values in it (`dtypes` gives the dtypes alone, in a shorter form);
+ the memory usage of the DataFrame, etc.

The method `describe()` quickly computes summary statistics for each numeric column: the count of non-null values, the mean, the standard deviation, the minimum and maximum, and the 25%, 50% (median) and 75% percentiles.

In [196]:
eqPastMonth.describe()

,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst
count,10620.0,10620.0,10620.0,10620.0,6919.0,8113.0,7172.0,10619.0,7074.0,10619.0,7449.0,7828.0
mean,38.057262,-111.55976,22.700524,1.701339,22.034543,116.841777,0.647677,0.29834,1.771323,2.689489,0.240811,15.951456
std,20.09134,66.637466,48.77827,1.21,16.079741,65.360299,2.196168,0.276879,3.13608,49.156653,0.389112,28.485505
min,-64.7563,-179.9899,-4.7,-1.27,2.0,11.0,0.0,0.0,0.09,0.0,0.0,0.0
25%,33.381708,-150.1127,4.34,0.91,11.0,70.0,0.023915,0.10985,0.27,0.4,0.106,5.0
50%,38.15835,-118.932,8.9,1.43,18.0,100.0,0.067395,0.19,0.46,0.7,0.162,9.0
75%,53.5783,-115.638542,18.3,2.1,28.0,148.0,0.189075,0.45,0.96,1.5,0.23,18.0
max,85.0818,179.6491,662.1,7.7,145.0,350.0,52.111,2.93,26.8,5041.1,5.19,620.0
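
The small sketch below, on a hypothetical four-row DataFrame, shows how `shape`, `count()` and `describe()` relate to each other: `count()` gives the same non-null counts that `info()` prints, and `describe()` skips both missing values and non-numeric columns by default (`describe(include='all')` includes the other columns too).

```python
import pandas as pd
import numpy as np

# small hypothetical sample with one missing magnitude
df = pd.DataFrame({
    'mag': [2.15, 1.10, np.nan, 5.00],
    'depth': [32.86, 20.10, 0.30, 10.00],
    'net': ['hv', 'ak', 'nn', 'us'],
})

print(df.shape)        # (4, 3): rows, columns
print(df.count())      # non-null values per column, as in info()

stats = df.describe()  # numeric columns only: 'net' is skipped
print(stats.loc['count', 'mag'])  # 3.0: the NaN is not counted
print(stats.loc['mean', 'mag'])   # mean of the three non-missing values
```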


### Conditional indexing and selection

[[back to top]](#Table-of-Contents)

As mentioned earlier, a DataFrame is a collection of Series objects, one per column. This lets you select a single column, which gives you a Series, or several columns, which gives you a new DataFrame:

In [61]:
eqPastMonth_mag = eqPastMonth['mag']
# Here we are showing only one column, i.e. a Series
print ('type:', type(eqPastMonth_mag))
eqPastMonth_mag.head(10)

type: <class 'pandas.core.series.Series'>


0    2.15
1    1.10
2    1.60
3    1.79
4    2.41
5    5.00
6    1.10
7    1.83
8    2.05
9    2.89
Name: mag, dtype: float64

In [62]:
eqPastMonth_record = eqPastMonth[['time','depth', 'mag', 'place']]
# Here we are showing four columns, i.e. a new DataFrame
print ('type:', type(eqPastMonth_record))
eqPastMonth_record.tail()

type: <class 'pandas.core.frame.DataFrame'>


,time,depth,mag,place
10571,2021-01-25T20:26:37.822Z,10.0,4.5,Reykjanes Ridge
10572,2021-01-25T20:20:08.400Z,10.3,1.7,"12 km SSE of Talkeetna, Alaska"
10573,2021-01-25T20:16:10.458Z,2.1,1.5,"47 km SSW of Denali National Park, Alaska"
10574,2021-01-25T20:13:31.390Z,6.99,1.02,"23km ESE of Little Lake, CA"
10575,2021-01-25T20:09:26.382Z,64.5,1.2,"1 km NW of Susitna, Alaska"
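
What decides the result type is whether you pass a single label or a list of labels. A minimal sketch with a hypothetical two-column DataFrame: even a list containing one column name returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'mag': [2.15, 1.10], 'depth': [32.86, 20.10]})

print(type(df['mag']).__name__)    # Series: a single label
print(type(df[['mag']]).__name__)  # DataFrame: a list with one label
```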


You can also access a single column as an attribute:

In [64]:
eqPastMonth_record.time

0        2021-02-24T19:50:07.610Z
1        2021-02-24T19:46:29.747Z
2        2021-02-24T19:35:10.940Z
3        2021-02-24T19:34:35.290Z
4        2021-02-24T19:25:06.150Z
                   ...           
10571    2021-01-25T20:26:37.822Z
10572    2021-01-25T20:20:08.400Z
10573    2021-01-25T20:16:10.458Z
10574    2021-01-25T20:13:31.390Z
10575    2021-01-25T20:09:26.382Z
Name: time, Length: 10576, dtype: object
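
Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The sketch below uses a hypothetical DataFrame with a column named `count`, which collides with the `DataFrame.count` method, and a column name containing a space; bracket notation works in every case.

```python
import pandas as pd

df = pd.DataFrame({'mag': [2.15, 1.10], 'count': [3, 5], 'place name': ['A', 'B']})

print(df.mag.equals(df['mag']))   # True: attribute access works for simple names
print(callable(df.count))         # True: df.count is the DataFrame.count method
print(df['count'].tolist())       # [3, 5]: brackets always return the column
print(df['place name'].tolist())  # ['A', 'B']: names with spaces need brackets
```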

You can filter rows by passing a boolean condition inside the brackets:

In [200]:
# Let's display only large earthquakes
eqPastMonth_large = eqPastMonth[eqPastMonth['mag'] > 5]
eqPastMonth_large.head(10)

,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime
5,2021-02-17T03:36:07.740Z,38.4489,22.0086,10.0,5.5,mww,,39.0,0.626,0.84,us,us6000diae,2021-02-17T04:38:13.265Z,"16 km N of Kamárai, Greece",earthquake,4.8,1.9,0.047,44.0,reviewed,us,us,2021-02-17 03:36:07.740
100,2021-02-16T19:09:41.237Z,-2.8881,139.201,48.19,5.4,mww,,47.0,6.943,0.74,us,usd000eesi,2021-02-16T21:12:58.341Z,"162 km WSW of Abepura, Indonesia",earthquake,7.6,5.1,0.083,14.0,reviewed,us,us,2021-02-16 19:09:41.237
131,2021-02-16T15:54:07.766Z,-18.8866,-173.6117,10.0,5.2,mww,,149.0,5.252,0.52,us,us6000di3t,2021-02-16T22:51:05.852Z,"47 km ESE of Neiafu, Tonga",earthquake,11.1,1.9,0.098,10.0,reviewed,us,us,2021-02-16 15:54:07.766
180,2021-02-16T09:55:32.318Z,52.0601,156.989,152.04,5.3,mb,,80.0,1.135,0.87,us,us6000di0n,2021-02-16T10:10:36.040Z,"71 km NNE of Ozernovskiy, Russia",earthquake,3.9,6.0,0.023,620.0,reviewed,us,us,2021-02-16 09:55:32.318
189,2021-02-16T08:42:11.806Z,-6.0804,112.1674,599.74,5.1,mww,,25.0,1.967,1.19,us,us6000di0i,2021-02-16T08:55:23.040Z,"91 km NNW of Paciran, Indonesia",earthquake,10.4,7.3,0.093,11.0,reviewed,us,us,2021-02-16 08:42:11.806
194,2021-02-16T08:07:23.479Z,-15.6949,167.8335,101.67,5.1,mb,,115.0,5.758,0.81,us,us6000di0g,2021-02-16T08:50:41.040Z,"62 km NE of Norsup, Vanuatu",earthquake,10.0,4.6,0.087,43.0,reviewed,us,us,2021-02-16 08:07:23.479
203,2021-02-16T06:42:01.809Z,-13.3581,166.8405,52.65,5.1,mb,,59.0,7.558,0.68,us,us6000dhzx,2021-02-16T07:45:14.040Z,"95 km NW of Sola, Vanuatu",earthquake,8.3,7.2,0.063,81.0,reviewed,us,us,2021-02-16 06:42:01.809
233,2021-02-16T03:37:11.369Z,-17.9892,167.5914,10.0,5.3,mww,,38.0,3.498,0.98,us,us6000dhyf,2021-02-16T06:25:18.040Z,"81 km WSW of Port-Vila, Vanuatu",earthquake,7.5,1.8,0.098,10.0,reviewed,us,us,2021-02-16 03:37:11.369
252,2021-02-16T01:31:59.756Z,-17.8082,167.6351,10.0,5.9,mww,,38.0,3.672,0.79,us,us6000dhxu,2021-02-17T01:35:13.282Z,"72 km W of Port-Vila, Vanuatu",earthquake,7.8,1.7,0.056,31.0,reviewed,us,us,2021-02-16 01:31:59.756
255,2021-02-16T01:22:19.619Z,-17.6551,167.7198,10.0,5.4,mww,,25.0,4.34,0.85,us,us6000dhxs,2021-02-17T01:25:18.841Z,"63 km W of Port-Vila, Vanuatu",earthquake,8.6,1.7,0.063,24.0,reviewed,us,us,2021-02-16 01:22:19.619
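
Under the hood, the comparison produces a boolean Series (a mask) with the same index as the DataFrame, and indexing with it keeps the rows where the mask is `True`. A sketch on hypothetical data, which also shows that `DataFrame.query` expresses the same filter as a string:

```python
import pandas as pd

df = pd.DataFrame({'mag': [2.15, 5.5, 4.9, 6.1], 'latitude': [19.2, 38.4, -25.1, -17.8]})

mask = df['mag'] > 5   # boolean Series aligned with df's index
print(mask.tolist())   # [False, True, False, True]

large = df[mask]               # keep the rows where mask is True
large_q = df.query('mag > 5')  # the same filter written as a query string
print(large.equals(large_q))   # True
print(large.index.tolist())    # [1, 3]: the original labels are kept
```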


In [65]:
#Getting records that are large (>5mag) earthquakes and that occurred in the northern hemisphere
filtered_df_1 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['latitude'] > 0)]
filtered_df_1.describe()

,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst,magPlus1
count,44.0,44.0,44.0,44.0,0.0,43.0,43.0,44.0,43.0,44.0,43.0,43.0,44.0
mean,28.556809,65.848882,37.869545,5.352273,,71.023256,2.088581,0.8625,6.583721,3.786364,0.058977,112.325581,5.452273
std,17.622015,102.851326,35.121386,0.357958,,36.141579,1.963879,0.240959,1.343579,1.849776,0.022141,173.080317,0.357958
min,1.2795,-179.3142,5.34,5.1,,16.0,0.346,0.35,3.9,0.7,0.023,10.0,5.2
25%,14.21865,25.084825,10.0,5.1,,41.0,0.643,0.7375,5.5,1.8,0.045,19.5,5.2
50%,29.83095,121.9089,23.765,5.2,,71.0,1.57,0.815,6.8,3.75,0.055,34.0,5.3
75%,39.469025,140.829175,49.6325,5.425,,84.0,2.7955,1.0825,7.55,5.5,0.072,108.5,5.525
max,63.9602,169.2391,152.04,7.1,,187.0,8.439,1.38,9.4,6.8,0.098,657.0,7.2


In [66]:
#Getting records that are large (>5mag) earthquakes and that occurred in the southern hemisphere
filtered_df_1 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['latitude'] < 0)]
filtered_df_1.describe()

,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst,magPlus1
count,147.0,147.0,147.0,147.0,0.0,147.0,147.0,147.0,147.0,147.0,146.0,146.0,147.0
mean,-20.353922,122.277174,26.483401,5.39932,,68.14966,6.747966,0.874694,9.302041,2.491156,0.073664,65.061644,5.49932
std,9.570325,103.133707,60.993346,0.381866,,31.066732,4.364112,0.228444,2.488865,1.454559,0.024729,85.489037,0.381866
min,-61.7259,-179.4181,5.59,5.1,,10.0,0.415,0.44,2.7,0.5,0.025,8.0,5.2
25%,-23.22395,145.08195,10.0,5.2,,44.5,3.5645,0.71,7.55,1.8,0.056,16.0,5.3
50%,-22.7553,171.2118,10.0,5.2,,63.0,7.814,0.83,9.0,1.9,0.0715,36.5,5.3
75%,-17.65405,171.6712,10.0,5.6,,90.5,8.1595,1.0,11.15,1.9,0.09175,80.0,5.7
max,-0.134,177.9523,599.74,7.7,,149.0,33.522,1.7,15.1,7.7,0.203,515.0,7.8


In [67]:
# large (>5 mag) earthquakes in the western hemisphere, but east of 120° W (-120 < longitude < 0); keep only some columns
filtered_df_2 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['longitude'] < 0) & (eqPastMonth['longitude'] > -120)][['depth', 'mag', 'place']]
filtered_df_2.head(10)

,depth,mag,place
118,10.0,5.6,"5 km ESE of Vogar, Iceland"
451,54.37,5.2,South Sandwich Islands region
635,19.79,5.2,"78 km WSW of Puerto Madero, Mexico"
1534,10.0,5.1,central Mid-Atlantic Ridge
1689,10.0,5.8,central East Pacific Rise
1801,98.55,5.2,"12 km SSW of Yantzaza, Ecuador"
3013,35.0,5.1,"60 km SSW of La Gomera, Guatemala"
3183,13.66,5.2,"62 km WSW of Iquique, Chile"
3375,69.69,5.3,"33 km W of Celica, Ecuador"
4241,48.07,5.2,"20 km SW of Illapel, Chile"
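
Two details of the cell above are worth stressing. Each comparison must be wrapped in parentheses, because `&` binds more tightly than `>` and `<`. And the row filter and the column list can be given in a single `.loc[rows, columns]` call (`.loc` is covered in "Selection by label and position" below), which avoids chained indexing; this matters mostly when assigning values. A sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    'mag': [5.6, 5.2, 4.0, 5.8],
    'longitude': [-22.4, -130.0, -70.1, -105.0],
    'place': ['Iceland', 'Pacific', 'Chile', 'East Pacific Rise'],
})

# parentheses are required around each comparison
mask = (df['mag'] > 5) & (df['longitude'] < 0) & (df['longitude'] > -120)

# rows from the mask, columns from the list, in one step
result = df.loc[mask, ['mag', 'place']]
print(result)
```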


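You can also use the method `isin(values)` to check whether each Series item is one of the given values, the method `isnull()` to detect `null` (`NaN`) values, and the boolean operators `&` (`AND`), `|` (`OR`) and `~` (`NOT`) to build complex conditions. To check whether values fall within an interval, use `between(left, right)`.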
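
A minimal sketch of these methods on a hypothetical four-row DataFrame (`between` includes both ends by default):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'net': ['hv', 'ak', 'us', 'nn'],
    'mag': [2.15, 1.10, 5.00, np.nan],
})

# isin: is each value one of the listed values?
hawaii_or_alaska = df['net'].isin(['hv', 'ak'])
# between: is each value inside the interval [2, 5]? (NaN gives False)
moderate = df['mag'].between(2, 5)
# isnull: which magnitudes are missing?
missing_mag = df['mag'].isnull()

either = df[hawaii_or_alaska | missing_mag]   # OR
known_moderate = df[moderate & ~missing_mag]  # AND with NOT
print(either)
print(known_moderate)
```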

As you can see, after filtering, the resulting DataFrames keep the index labels of the original rows, so the index is no longer consecutive. To renumber the rows you can write:

In [203]:
filtered_df_2.reset_index().head(10)

,index,depth,mag,place
0,670,35.0,5.1,"60 km SSW of La Gomera, Guatemala"
1,815,30.87,5.2,"54 km SW of Iquique, Chile"
2,989,71.44,5.3,"22 km W of Celica, Ecuador"
3,1772,48.67,5.2,"25 km WSW of Illapel, Chile"
4,1942,10.0,5.7,central East Pacific Rise
5,2421,10.63,5.1,northern Peru
6,2525,103.95,5.1,"4 km ENE of Santa Rita de Siguas, Peru"
7,4599,10.0,6.7,West Chile Rise
8,5576,7.22,5.6,"82 km SSE of Lethem, Guyana"
9,5816,168.65,5.1,"63 km W of San Antonio de los Cobres, Argentina"


This renumbers the rows starting from 0; the old labels are kept in a new `index` column.
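
If you do not need the old labels at all, `reset_index(drop=True)` discards them instead of adding the `index` column. A sketch with hypothetical labels:

```python
import pandas as pd

df = pd.DataFrame({'mag': [5.1, 5.2, 5.3]}, index=[670, 815, 989])

kept = df.reset_index()              # old labels move into an 'index' column
dropped = df.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())     # ['index', 'mag']
print(dropped.columns.tolist())  # ['mag']
print(dropped.index.tolist())    # [0, 1, 2]
```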

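Remember also that you can add new columns and rows to a DataFrame. Here `np.where` creates a column that labels each earthquake as `Small` (magnitude below 5) or `Large`: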

In [204]:
# create a 'custom_mag' column: 'Small' for magnitudes below 5, 'Large' otherwise
eqPastMonth['custom_mag'] = np.where(eqPastMonth['mag'] < 5, 'Small', 'Large')
eqPastMonth.head(10)

,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime,custom_mag
0,2021-02-17T04:25:59.180Z,35.763168,-120.329834,9.28,1.91,md,17.0,90.0,0.01278,0.08,nc,nc73524166,2021-02-17T04:27:35.667Z,"6km NNW of Cholame, CA",earthquake,0.38,0.52,0.16,4.0,automatic,nc,nc,2021-02-17 04:25:59.180,Small
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820,Small
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210,Small
3,2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us,2021-02-17 03:58:33.367,Large
4,2021-02-17T03:36:18.420Z,37.2757,-114.907,16.6,1.2,ml,9.0,94.16,0.177,0.24,nn,nn00800559,2021-02-17T03:44:17.647Z,"24 km ESE of Alamo, Nevada",earthquake,,1.7,,,automatic,nn,nn,2021-02-17 03:36:18.420,Small
5,2021-02-17T03:36:07.740Z,38.4489,22.0086,10.0,5.5,mww,,39.0,0.626,0.84,us,us6000diae,2021-02-17T04:38:13.265Z,"16 km N of Kamárai, Greece",earthquake,4.8,1.9,0.047,44.0,reviewed,us,us,2021-02-17 03:36:07.740,Large
6,2021-02-17T03:27:43.060Z,38.835834,-122.786835,1.84,0.91,md,22.0,73.0,0.003758,0.04,nc,nc73524141,2021-02-17T03:43:04.287Z,"6km WNW of Cobb, CA",earthquake,0.22,0.39,0.13,3.0,automatic,nc,nc,2021-02-17 03:27:43.060,Small
7,2021-02-17T03:25:29.491Z,-25.0984,-71.0266,10.0,4.4,mb,,169.0,0.735,0.56,us,us6000diaa,2021-02-17T03:37:29.040Z,"64 km WNW of Taltal, Chile",earthquake,4.5,2.0,0.24,5.0,reviewed,us,us,2021-02-17 03:25:29.491,Small
8,2021-02-17T03:24:15.431Z,59.8278,-152.2928,78.8,2.1,ml,,,,0.44,ak,ak02127fq4w8,2021-02-17T03:31:19.831Z,"26 km WNW of Anchor Point, Alaska",earthquake,,0.8,,,automatic,ak,ak,2021-02-17 03:24:15.431,Small
9,2021-02-17T02:58:09.380Z,18.7791,-66.1428,29.0,2.96,md,19.0,264.0,0.4738,0.22,pr,pr2021048001,2021-02-17T03:31:00.983Z,"34 km N of San Juan, Puerto Rico",earthquake,0.89,3.56,0.13,17.0,reviewed,pr,pr,2021-02-17 02:58:09.380,Small
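
The cell above only adds a column. A minimal sketch of adding a row as well, on a hypothetical DataFrame: build a one-row DataFrame and combine it with `pd.concat` (the older `DataFrame.append` method was removed in pandas 2.0).

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'mag': [1.91, 5.5, 4.4]})

# new column from a condition, as with 'custom_mag' above
df['custom_mag'] = np.where(df['mag'] < 5, 'Small', 'Large')

# new row: a one-row DataFrame with the same columns, appended at the end
new_row = pd.DataFrame({'mag': [6.1], 'custom_mag': ['Large']})
df = pd.concat([df, new_row], ignore_index=True)  # renumber the index 0..n-1

print(df)
```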


### Work with indexes and MultiIndex option

[[back to top]](#Table-of-Contents)

Pandas lets you assign a custom index to a DataFrame. It can be defined when the DataFrame is created:

In [70]:
import random
indexes = [random.randrange(0,100) for i in range(5)]
data = [{i:random.randint(0,10) for i in 'ABCDE'} for i in range(5)]
df = pd.DataFrame(data, index=indexes)
df

,A,B,C,D,E
53,5,4,10,4,4
21,8,9,8,4,6
26,2,5,4,0,5
14,3,10,3,4,0
81,9,1,3,1,7
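
Note that `random.randrange` can return the same number twice, so the cell above may produce duplicate index labels. pandas allows that, but then `.loc` can return several rows for one label. A sketch that guarantees unique labels by drawing without replacement with `random.sample` (the seed is only there to make the example reproducible):

```python
import random
import pandas as pd

random.seed(0)  # fixed seed for reproducibility

# random.sample draws without replacement, so the labels are unique
indexes = random.sample(range(100), 5)
data = [{col: random.randint(0, 10) for col in 'ABCDE'} for _ in range(5)]
df = pd.DataFrame(data, index=indexes)

print(df.index.is_unique)  # True
```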


Or changed at any time:

In [71]:
df.index = ['a', 'b', 'c', 'd', 'e']
df

,A,B,C,D,E
a,5,4,10,4,4
b,8,9,8,4,6
c,2,5,4,0,5
d,3,10,3,4,0
e,9,1,3,1,7


You can also use one or more existing columns as the index:

In [72]:
# if duplicates exist you can drop duplicates to get unique values
#eqPastMonth_nodups = eqPastMonth.drop_duplicates(subset='time', keep='last')
# we don't need to do that.
# set 'time' as index
eqPastMonth_indexChange = eqPastMonth.set_index('time')
eqPastMonth_indexChange.head(10)

time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus1,datetime
2021-02-24T19:50:07.610Z,19.181,-155.470993,32.860001,2.15,md,42.0,153.0,,0.1,hv,hv72363462,2021-02-24T19:53:30.510Z,"2 km SSE of Pāhala, Hawaii",earthquake,0.68,0.98,0.43,24.0,automatic,hv,hv,2.25,2021-02-24 19:50:07.610
2021-02-24T19:46:29.747Z,64.7437,-149.2695,20.1,1.1,ml,,,,0.3,ak,ak0212ja58vx,2021-02-24T19:50:43.341Z,"17 km NNW of Four Mile Road, Alaska",earthquake,,0.6,,,automatic,ak,ak,1.2,2021-02-24 19:46:29.747
2021-02-24T19:35:10.940Z,37.3513,-115.6738,0.3,1.6,ml,9.0,263.87,0.317,0.63,nn,nn00801159,2021-02-24T19:40:13.150Z,"33 km S of Rachel, Nevada",earthquake,,40.5,,,automatic,nn,nn,1.7,2021-02-24 19:35:10.940
2021-02-24T19:34:35.290Z,19.453667,-155.597504,-1.39,1.79,ml,10.0,79.0,,0.16,hv,hv72363447,2021-02-24T19:40:06.880Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.52,0.36,0.07,3.0,automatic,hv,hv,1.89,2021-02-24 19:34:35.290
2021-02-24T19:25:06.150Z,17.821,-66.8773,10.0,2.41,md,11.0,261.0,0.2234,0.16,pr,pr2021055003,2021-02-24T20:06:29.340Z,"16 km S of Guánica, Puerto Rico",earthquake,0.56,0.62,0.1,5.0,reviewed,pr,pr,2.51,2021-02-24 19:25:06.150
2021-02-24T19:19:50.122Z,-15.1086,-173.3268,10.0,5.0,mb,,133.0,1.913,0.97,us,us7000ddeg,2021-02-24T19:53:29.704Z,"106 km NNE of Hihifo, Tonga",earthquake,5.4,1.8,0.052,120.0,reviewed,us,us,5.1,2021-02-24 19:19:50.122
2021-02-24T19:12:39.950Z,38.1733,-117.9185,5.6,1.1,ml,8.0,105.08,0.035,0.25,nn,nn00801158,2021-02-24T19:36:32.286Z,"29 km SE of Mina, Nevada",earthquake,,1.6,,,automatic,nn,nn,1.2,2021-02-24 19:12:39.950
2021-02-24T19:11:05.610Z,19.464333,-155.591995,-1.02,1.83,ml,14.0,58.0,,0.31,hv,hv72363427,2021-02-24T19:16:35.330Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.5,0.31,3.66,6.0,automatic,hv,hv,1.93,2021-02-24 19:11:05.610
2021-02-24T19:07:06.420Z,19.151167,-155.474503,36.98,2.05,md,33.0,168.0,,0.11,hv,hv72363422,2021-02-24T19:10:35.040Z,"5 km S of Pāhala, Hawaii",earthquake,0.79,1.01,0.94,7.0,automatic,hv,hv,2.15,2021-02-24 19:07:06.420
2021-02-24T19:05:11.260Z,32.537333,-115.230667,20.92,2.89,ml,23.0,64.0,0.1332,0.26,ci,ci39800680,2021-02-24T19:15:41.440Z,"12km ESE of Puebla, B.C., MX",earthquake,0.69,1.27,0.24,27.0,automatic,ci,ci,2.99,2021-02-24 19:05:11.260


By default, `set_index()` returns a new DataFrame, so to change the original one you have to either assign the result back or pass `inplace=True`.
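
A short sketch of this behaviour on a hypothetical two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'id': ['hv1', 'ak2'], 'mag': [2.15, 1.10]})

indexed = df.set_index('id')  # a new DataFrame indexed by 'id'
print(df.columns.tolist())    # ['id', 'mag']: df itself is unchanged

df = df.set_index('id')       # option 1: assign the result back
# df.set_index('id', inplace=True)  # option 2: modify in place (returns None)
print(df.loc['ak2', 'mag'])   # 1.1
```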

Let's create a multi-level index for the `eqPastMonth` DataFrame from its `id` and `type` columns:

In [73]:
# set 'id' & 'type' as index
eqPastMonth_multi = eqPastMonth.set_index(['id','type'])[["latitude","longitude","depth","mag", "place"]]
eqPastMonth_multi.head(10)

id,type,latitude,longitude,depth,mag,place
hv72363462,earthquake,19.181,-155.470993,32.860001,2.15,"2 km SSE of Pāhala, Hawaii"
ak0212ja58vx,earthquake,64.7437,-149.2695,20.1,1.1,"17 km NNW of Four Mile Road, Alaska"
nn00801159,earthquake,37.3513,-115.6738,0.3,1.6,"33 km S of Rachel, Nevada"
hv72363447,earthquake,19.453667,-155.597504,-1.39,1.79,"28 km E of Honaunau-Napoopoo, Hawaii"
pr2021055003,earthquake,17.821,-66.8773,10.0,2.41,"16 km S of Guánica, Puerto Rico"
us7000ddeg,earthquake,-15.1086,-173.3268,10.0,5.0,"106 km NNE of Hihifo, Tonga"
nn00801158,earthquake,38.1733,-117.9185,5.6,1.1,"29 km SE of Mina, Nevada"
hv72363427,earthquake,19.464333,-155.591995,-1.02,1.83,"28 km E of Honaunau-Napoopoo, Hawaii"
hv72363422,earthquake,19.151167,-155.474503,36.98,2.05,"5 km S of Pāhala, Hawaii"
ci39800680,earthquake,32.537333,-115.230667,20.92,2.89,"12km ESE of Puebla, B.C., MX"


and check the type of `eqPastMonth_multi.index`:

In [74]:
print ('type: ', type(eqPastMonth_multi.index))

type:  <class 'pandas.core.indexes.multi.MultiIndex'>


So the index is an instance of the pandas class `MultiIndex`, which stores the hierarchical index of the DataFrame and provides methods for manipulating it. What do you think the type of `filtered_df_2.index` is?

You can get its levels, codes and names simply by accessing the attributes `levels`, `codes` and `names` (in pandas versions before 0.24, `codes` was called `labels`).
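
A sketch of these attributes on a small hypothetical DataFrame indexed by `id` and `type`, as above. Each level stores its unique values once (sorted), and the codes say which of those values each row uses. A row can be selected with `.loc` by a tuple holding one label per level.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['ak1', 'hv2', 'us3'],
    'type': ['earthquake', 'earthquake', 'explosion'],
    'mag': [1.10, 2.15, 5.0],
}).set_index(['id', 'type'])

mi = df.index
print(type(mi).__name__)      # MultiIndex
print(list(mi.names))         # ['id', 'type']
print(mi.levels[1].tolist())  # ['earthquake', 'explosion']: unique values of level 1
print(mi.codes[1].tolist())   # [0, 0, 1]: positions into levels[1], one per row

# one label per level selects a single row
print(df.loc[('us3', 'explosion'), 'mag'])  # 5.0
```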

### Selection by label and position

[[back to top]](#Table-of-Contents)

After reading the previous three subsections, you may have a question: now I know how to filter a DataFrame and make it multi-indexed, but how do I select a specific row of the table?
Object selection in pandas is supported by two main types of multi-axis indexing:

* `.loc` works on the labels in the index;
* `.iloc` works on the positions in the index (so it takes integer positions, not labels).

The following examples show how to select rows of a DataFrame.
First, let's get the first row of earthquakes from the past month.

In [75]:
# return a single record (i.e. a row): the one with index label 0
eqPastMonth.loc[0]

time                 2021-02-24T19:50:07.610Z
latitude                               19.181
longitude                         -155.470993
depth                               32.860001
mag                                      2.15
magType                                    md
nst                                      42.0
gap                                     153.0
dmin                                      NaN
rms                                       0.1
net                                        hv
id                                 hv72363462
updated              2021-02-24T19:53:30.510Z
place              2 km SSE of Pāhala, Hawaii
type                               earthquake
horizontalError                          0.68
depthError                               0.98
magError                                 0.43
magNst                                   24.0
status                              automatic
locationSource                             hv
magSource                         

Now let's get the rows with labels 1 to 3. Pay attention to slicing with `.loc`: the end of the range is included, which IS different from slicing Python lists and strings.

In [211]:
eqPastMonth.loc[1:3]

,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime,custom_mag
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820,Small
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210,Small
3,2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us,2021-02-17 03:58:33.367,Large
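
The difference between the two kinds of slices is easy to check on a hypothetical five-row DataFrame with the default integer index:

```python
import pandas as pd

df = pd.DataFrame({'mag': [2.15, 1.10, 1.60, 1.79, 2.41]})  # default index 0..4

print(df.loc[1:3].index.tolist())   # [1, 2, 3]: label slice, end included
print(df.iloc[1:3].index.tolist())  # [1, 2]: position slice, end excluded
print([10, 20, 30, 40][1:3])        # [20, 30]: Python lists also exclude the end
```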


As you can see, the first argument of `.loc` is the index label of the row. To return the value(s) of specific column(s), pass the column name(s) as the second argument:

In [76]:
eqPastMonth.loc[0, 'place']

'2 km SSE of Pāhala, Hawaii'

In [81]:
eqPastMonth.loc[3:10, ['place', 'mag']]

,place,mag
3,"28 km E of Honaunau-Napoopoo, Hawaii",1.79
4,"16 km S of Guánica, Puerto Rico",2.41
5,"106 km NNE of Hihifo, Tonga",5.0
6,"29 km SE of Mina, Nevada",1.1
7,"28 km E of Honaunau-Napoopoo, Hawaii",1.83
8,"5 km S of Pāhala, Hawaii",2.05
9,"12km ESE of Puebla, B.C., MX",2.89
10,"2km NW of The Geysers, CA",1.42


To repeat: the first argument of `.loc` is not a row number but the index label of the row.

If you need to select rows by their position instead, use `.iloc`:

In [214]:
eqPastMonth.iloc[0]

time                 2021-02-17T04:25:59.180Z
latitude                            35.763168
longitude                         -120.329834
depth                                    9.28
mag                                      1.91
magType                                    md
nst                                      17.0
gap                                      90.0
dmin                                  0.01278
rms                                      0.08
net                                        nc
id                                 nc73524166
updated              2021-02-17T04:27:35.667Z
place                  6km NNW of Cholame, CA
type                               earthquake
horizontalError                          0.38
depthError                               0.52
magError                                 0.16
magNst                                    4.0
status                              automatic
locationSource                             nc
magSource                         

In [84]:
eqPastMonth.iloc[1:5,3:5]

Unnamed: 0,depth,mag
1,20.1,1.1
2,0.3,1.6
3,-1.39,1.79
4,10.0,2.41
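`eqPastMonth` has the default integer index, so labels and positions happen to match. A minimal sketch with a hypothetical toy index `[10, 20, 30]` shows where `.loc` and `.iloc` diverge:

```python
import pandas as pd

# Hypothetical toy DataFrame whose index labels differ from row positions
df = pd.DataFrame({"mag": [2.15, 1.1, 1.6]}, index=[10, 20, 30])

by_label = df.loc[20, "mag"]   # the row whose index label is 20
by_position = df.iloc[1, 0]    # the second row (position 1), first column (position 0)
print(by_label, by_position)   # 1.1 1.1

# Asking .loc for label 1 fails, because no row carries that label
try:
    df.loc[1]
    label_1_found = True
except KeyError:
    label_1_found = False
print(label_1_found)  # False
```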


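In these examples the row positions coincide with the index labels, because `eqPastMonth` uses the default integer index. The next example shows the difference between `.loc` and `.iloc`: `.loc[1:5]` returns the rows labelled `1` through `5`, end label included, whereas `.iloc[1:5]` above stopped before position `5`.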

In [85]:
eqPastMonth.loc[1:5]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus1,datetime
1,2021-02-24T19:46:29.747Z,64.7437,-149.2695,20.1,1.1,ml,,,,0.3,ak,ak0212ja58vx,2021-02-24T19:50:43.341Z,"17 km NNW of Four Mile Road, Alaska",earthquake,,0.6,,,automatic,ak,ak,1.2,2021-02-24 19:46:29.747
2,2021-02-24T19:35:10.940Z,37.3513,-115.6738,0.3,1.6,ml,9.0,263.87,0.317,0.63,nn,nn00801159,2021-02-24T19:40:13.150Z,"33 km S of Rachel, Nevada",earthquake,,40.5,,,automatic,nn,nn,1.7,2021-02-24 19:35:10.940
3,2021-02-24T19:34:35.290Z,19.453667,-155.597504,-1.39,1.79,ml,10.0,79.0,,0.16,hv,hv72363447,2021-02-24T19:40:06.880Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.52,0.36,0.07,3.0,automatic,hv,hv,1.89,2021-02-24 19:34:35.290
4,2021-02-24T19:25:06.150Z,17.821,-66.8773,10.0,2.41,md,11.0,261.0,0.2234,0.16,pr,pr2021055003,2021-02-24T20:06:29.340Z,"16 km S of Guánica, Puerto Rico",earthquake,0.56,0.62,0.1,5.0,reviewed,pr,pr,2.51,2021-02-24 19:25:06.150
5,2021-02-24T19:19:50.122Z,-15.1086,-173.3268,10.0,5.0,mb,,133.0,1.913,0.97,us,us7000ddeg,2021-02-24T19:53:29.704Z,"106 km NNE of Hihifo, Tonga",earthquake,5.4,1.8,0.052,120.0,reviewed,us,us,5.1,2021-02-24 19:19:50.122
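Here is a small check of this slicing rule on a hypothetical seven-row DataFrame with the default index. `.loc` includes the end label, while `.iloc` excludes the end position, just as slicing a Python list does.

```python
import pandas as pd

# Hypothetical toy DataFrame with the default RangeIndex 0..6
df = pd.DataFrame({"mag": [2.15, 1.1, 1.6, 1.79, 2.41, 5.0, 1.1]})

label_slice = df.loc[1:5]      # labels 1 through 5, end label included
position_slice = df.iloc[1:5]  # positions 1 through 4, end position excluded

print(len(label_slice), len(position_slice))  # 5 4
```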


### Work with missing data

[[back to top]](#Table-of-Contents)

Pandas primarily uses the value `np.nan` to represent missing data (missing or empty values are shown as `NaN` in the table). By default, `NaN` values are excluded from pandas reductions such as `sum()` or `mean()`. Still, missing data causes many problems in mathematical and computational work with DataFrames and Series, so it is important to know how to handle these values.
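The following minimal sketch uses a hypothetical toy Series. `isna()` flags the missing entry, `count()` counts only the non-missing values, and `mean()` skips `NaN` by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series with one missing value
s = pd.Series([1.0, np.nan, 3.0])

print(s.isna().sum())  # 1 -> number of missing values
print(s.count())       # 2 -> number of non-missing values
print(s.mean())        # 2.0 -> NaN is skipped: (1.0 + 3.0) / 2
```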

Previously we learned how to check for `null` and `non-null` values in a DataFrame or Series and how to skip rows with `null` values. But what should we do if we need to use rows with `null` data, for example, to find the sum of all values in the dataset?

Let’s try it:


In [87]:
magError = eqPastMonth['magError']
sum(magError)

nan

The result may look unexpected, because the `eqPastMonth['magError']` Series contains many `non-null` values. The reason is that Python's built-in `sum()` propagates `NaN`, so a single missing value turns the whole total into `nan`. We could, of course, filter `magError` and keep only the `non-null` values. But what if we need to sum all numerical values in the dataset? Filtering rows then becomes impractical, because dropping a row removes all of its items even if only one of them is `null`. You can try this yourself.

To solve this task, you can use the convenient pandas method `fillna(value)`, which replaces all `null` values with `value`.


In [88]:
magError = eqPastMonth['magError'].fillna(0)
sum(magError)

1909.2427897239515
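Replacing `NaN` with `0` is one option. For a single Series, the pandas method `Series.sum()` already skips `NaN`, because its `skipna` parameter defaults to `True`. Python's built-in `sum()` does not skip them. Here is a sketch on a hypothetical stand-in for `eqPastMonth['magError']`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for eqPastMonth['magError']
magError = pd.Series([0.43, np.nan, 0.07, np.nan, 0.1])

builtin_total = sum(magError)              # Python's sum: one NaN makes the result nan
pandas_total = magError.sum()              # pandas skips NaN by default (skipna=True)
strict_total = magError.sum(skipna=False)  # opt back into NaN propagation
filtered_total = sum(magError.dropna())    # or drop the nulls before summing

print(builtin_total, pandas_total, strict_total, filtered_total)
```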

In [89]:
eqPastMonth_fillna = eqPastMonth.fillna(0)
eqPastMonth_fillna.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus1,datetime
0,2021-02-24T19:50:07.610Z,19.181,-155.470993,32.860001,2.15,md,42.0,153.0,0.0,0.1,hv,hv72363462,2021-02-24T19:53:30.510Z,"2 km SSE of Pāhala, Hawaii",earthquake,0.68,0.98,0.43,24.0,automatic,hv,hv,2.25,2021-02-24 19:50:07.610
1,2021-02-24T19:46:29.747Z,64.7437,-149.2695,20.1,1.1,ml,0.0,0.0,0.0,0.3,ak,ak0212ja58vx,2021-02-24T19:50:43.341Z,"17 km NNW of Four Mile Road, Alaska",earthquake,0.0,0.6,0.0,0.0,automatic,ak,ak,1.2,2021-02-24 19:46:29.747
2,2021-02-24T19:35:10.940Z,37.3513,-115.6738,0.3,1.6,ml,9.0,263.87,0.317,0.63,nn,nn00801159,2021-02-24T19:40:13.150Z,"33 km S of Rachel, Nevada",earthquake,0.0,40.5,0.0,0.0,automatic,nn,nn,1.7,2021-02-24 19:35:10.940
3,2021-02-24T19:34:35.290Z,19.453667,-155.597504,-1.39,1.79,ml,10.0,79.0,0.0,0.16,hv,hv72363447,2021-02-24T19:40:06.880Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.52,0.36,0.07,3.0,automatic,hv,hv,1.89,2021-02-24 19:34:35.290
4,2021-02-24T19:25:06.150Z,17.821,-66.8773,10.0,2.41,md,11.0,261.0,0.2234,0.16,pr,pr2021055003,2021-02-24T20:06:29.340Z,"16 km S of Guánica, Puerto Rico",earthquake,0.56,0.62,0.1,5.0,reviewed,pr,pr,2.51,2021-02-24 19:25:06.150
5,2021-02-24T19:19:50.122Z,-15.1086,-173.3268,10.0,5.0,mb,0.0,133.0,1.913,0.97,us,us7000ddeg,2021-02-24T19:53:29.704Z,"106 km NNE of Hihifo, Tonga",earthquake,5.4,1.8,0.052,120.0,reviewed,us,us,5.1,2021-02-24 19:19:50.122
6,2021-02-24T19:12:39.950Z,38.1733,-117.9185,5.6,1.1,ml,8.0,105.08,0.035,0.25,nn,nn00801158,2021-02-24T19:36:32.286Z,"29 km SE of Mina, Nevada",earthquake,0.0,1.6,0.0,0.0,automatic,nn,nn,1.2,2021-02-24 19:12:39.950
7,2021-02-24T19:11:05.610Z,19.464333,-155.591995,-1.02,1.83,ml,14.0,58.0,0.0,0.31,hv,hv72363427,2021-02-24T19:16:35.330Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.5,0.31,3.66,6.0,automatic,hv,hv,1.93,2021-02-24 19:11:05.610
8,2021-02-24T19:07:06.420Z,19.151167,-155.474503,36.98,2.05,md,33.0,168.0,0.0,0.11,hv,hv72363422,2021-02-24T19:10:35.040Z,"5 km S of Pāhala, Hawaii",earthquake,0.79,1.01,0.94,7.0,automatic,hv,hv,2.15,2021-02-24 19:07:06.420
9,2021-02-24T19:05:11.260Z,32.537333,-115.230667,20.92,2.89,ml,23.0,64.0,0.1332,0.26,ci,ci39800680,2021-02-24T19:15:41.440Z,"12km ESE of Puebla, B.C., MX",earthquake,0.69,1.27,0.24,27.0,automatic,ci,ci,2.99,2021-02-24 19:05:11.260


Thus, all `NaN` items are replaced with `0`. By default, `fillna()` returns a new object. If you pass `inplace=True`, the DataFrame itself is modified instead.
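`fillna()` also accepts a dict that maps column names to fill values. This helps when numeric and text columns need different placeholders. The minimal sketch below assumes a hypothetical toy DataFrame. The original stays unchanged because `inplace` is not set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame with gaps in a numeric and a text column
df = pd.DataFrame({
    "magError": [0.43, np.nan, 0.07],
    "status": ["automatic", None, "reviewed"],
})

# Per-column fill values; a new DataFrame is returned
filled = df.fillna({"magError": 0, "status": "unknown"})

print(df["magError"].isna().sum())  # 1 -> the original still has its NaN
print(filled)
```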
   
To keep only the rows without any `null` values, use the `dropna()` method:

In [92]:
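# drop every row that contains at least one NaN (axis=0 is the default)
eqPastMonth_dropna = eqPastMonth.dropna()
# numeric_only=True skips text columns, which recent pandas versions reject in mean()
print(eqPastMonth_dropna.mean(numeric_only=True))
eqPastMonth_dropna.head(10)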



latitude            35.553778
longitude         -112.862699
depth                7.835666
mag                  1.274546
nst                 21.227313
gap                109.896636
dmin                 0.107263
rms                  0.135629
horizontalError      0.529559
depthError           2.483006
magError             0.165700
magNst              12.850376
magPlus1             1.374546
dtype: float64


Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus1,datetime
4,2021-02-24T19:25:06.150Z,17.821,-66.8773,10.0,2.41,md,11.0,261.0,0.2234,0.16,pr,pr2021055003,2021-02-24T20:06:29.340Z,"16 km S of Guánica, Puerto Rico",earthquake,0.56,0.62,0.1,5.0,reviewed,pr,pr,2.51,2021-02-24 19:25:06.150
9,2021-02-24T19:05:11.260Z,32.537333,-115.230667,20.92,2.89,ml,23.0,64.0,0.1332,0.26,ci,ci39800680,2021-02-24T19:15:41.440Z,"12km ESE of Puebla, B.C., MX",earthquake,0.69,1.27,0.24,27.0,automatic,ci,ci,2.99,2021-02-24 19:05:11.260
10,2021-02-24T18:49:40.330Z,38.788502,-122.76767,1.09,1.42,md,29.0,79.0,0.01361,0.04,nc,nc73527321,2021-02-24T19:17:06.472Z,"2km NW of The Geysers, CA",earthquake,0.21,0.31,0.21,9.0,automatic,nc,nc,1.52,2021-02-24 18:49:40.330
11,2021-02-24T18:44:19.570Z,36.202331,-120.174004,4.84,2.33,md,12.0,114.0,0.192,0.13,nc,nc73527316,2021-02-24T19:06:05.527Z,"6km W of Huron, CA",earthquake,1.01,1.56,0.59,3.0,automatic,nc,nc,2.43,2021-02-24 18:44:19.570
12,2021-02-24T18:34:13.730Z,32.5355,-115.227833,17.89,2.57,ml,27.0,64.0,0.1328,0.27,ci,ci39800640,2021-02-24T19:53:22.185Z,"12km ESE of Puebla, B.C., MX",earthquake,0.44,1.01,0.217,17.0,reviewed,ci,ci,2.67,2021-02-24 18:34:13.730
17,2021-02-24T18:18:08.870Z,38.835167,-122.791168,1.83,1.34,md,29.0,35.0,0.003301,0.02,nc,nc73527291,2021-02-24T18:44:05.271Z,"6km WNW of Cobb, CA",earthquake,0.18,0.3,0.24,6.0,automatic,nc,nc,1.44,2021-02-24 18:18:08.870
19,2021-02-24T18:16:32.300Z,38.587166,-122.801666,9.73,1.27,md,11.0,138.0,0.1282,0.15,nc,nc73527286,2021-02-24T19:58:05.729Z,"5km NNE of Windsor, CA",earthquake,1.05,2.93,0.1,8.0,automatic,nc,nc,1.37,2021-02-24 18:16:32.300
25,2021-02-24T17:22:08.540Z,19.0258,-65.1088,12.0,3.46,md,9.0,330.0,1.1526,0.29,pr,pr2021055002,2021-02-24T17:55:51.940Z,"77 km NNW of Charlotte Amalie, U.S. Virgin Isl...",earthquake,13.32,19.2,0.18,7.0,reviewed,pr,pr,3.56,2021-02-24 17:22:08.540
26,2021-02-24T17:14:29.110Z,38.834499,-122.818001,1.08,0.79,md,21.0,50.0,0.0127,0.02,nc,nc73527261,2021-02-24T18:40:07.246Z,"8km NW of The Geysers, CA",earthquake,0.18,0.46,0.13,2.0,automatic,nc,nc,0.89,2021-02-24 17:14:29.110
27,2021-02-24T17:10:40.980Z,32.795333,-115.4645,5.21,1.42,ml,16.0,92.0,0.07059,0.3,ci,ci39800568,2021-02-24T17:14:27.810Z,"8km WSW of Holtville, CA",earthquake,0.75,1.85,0.215,15.0,automatic,ci,ci,1.52,2021-02-24 17:10:40.980


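The `dropna()` parameters `subset` and `how` control which columns are checked for `null` values and how strict the check is. With `how='any'` (the default), a row is dropped if any checked value is `null`. With `how='all'`, a row is dropped only if all of the checked values are `null`.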
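Here is a sketch on a hypothetical four-row DataFrame, showing how `subset` and `how` change which rows survive:

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame with different patterns of missing values
df = pd.DataFrame({
    "mag": [2.15, np.nan, np.nan, 1.6],
    "magError": [0.43, 0.3, np.nan, np.nan],
    "nst": [42.0, np.nan, np.nan, 11.0],
})

any_null = df.dropna()                    # drop rows with a NaN in any column
mag_only = df.dropna(subset=["mag"])      # only look at the 'mag' column
both_null = df.dropna(subset=["mag", "magError"], how="all")  # drop only if both are NaN

print(list(any_null.index))   # [0]
print(list(mag_only.index))   # [0, 3]
print(list(both_null.index))  # [0, 1, 3]
```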

> ### Exercise 1

> - Get the type of the `"latitude"` column in `eqPastMonth`.

> - In `eqPastMonth`, find all rows where `magType` equals `"md"`, `mag` is less than `5`, and `magError` is `not-null`. Name the resulting DataFrame `eqPastMonth_md_large`.

In [221]:
# type your code here
