---
<center><h1>Basic intro to pandas</h1></center>

<center><h2>Work with pandas DataFrames: filtering, indexing and missing data</h2></center>
---

## Table of Contents

- [Work with pandas DataFrames: filtering, indexing and missing data](#Work-with-pandas-DataFrames:-filtering,-indexing-and-missing-data)
    * [Get basic information](#Get-basic-information)
    * [Conditional indexing and selection](#Conditional-indexing-and-selection)
    * [Work with indexes and MultiIndex option](#Work-with-indexes-and-MultiIndex-option)
    * [Selection by label and position](#Selection-by-label-and-position)
    * [Work with missing data](#Work-with-missing-data)
    - [*Exercise 1*](#Exercise-1)

In [57]:
import pandas as pd
import numpy as np
import random

## Work with pandas DataFrames: filtering, indexing and missing data

[[back to top]](#Table-of-Contents)

In this part we continue working with DataFrames and learn
1.	how to get basic information about a DataFrame and its contents;
2.	how to take a slice of a DataFrame and select the rows that satisfy given conditions;
3.	how to change the index of a DataFrame and use advanced (multi-level) indexing;
4.	how to select rows by label and by position;
5.	how to work with missing data.

The lesson is divided into code blocks that follow the points above. In later posts we will continue with pandas and look at its other features.

In [58]:
url="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv"
eqPastMonth=pd.read_csv(url)
eqPastMonth.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2022-02-16T20:07:51.700Z,33.165667,-116.615167,-0.08,0.9,ml,13.0,178.0,0.3278,0.36,ci,ci40187648,2022-02-16T20:11:28.370Z,"10km N of Julian, CA",earthquake,1.86,31.61,0.135,9.0,automatic,ci,ci
1,2022-02-16T20:06:36.970Z,35.9375,-120.486336,4.92,1.13,md,15.0,110.0,0.01501,0.06,nc,nc73693791,2022-02-16T20:08:10.701Z,"7km NW of Parkfield, CA",earthquake,0.37,0.55,0.17,5.0,automatic,nc,nc
2,2022-02-16T20:01:07.410Z,19.186333,-155.403839,30.43,2.03,md,30.0,181.0,,0.14,hv,hv72919612,2022-02-16T20:04:17.960Z,"8 km ESE of Pāhala, Hawaii",earthquake,0.83,0.97,2.03,4.0,automatic,hv,hv
3,2022-02-16T19:40:50.170Z,33.451333,-116.823667,5.38,0.49,ml,13.0,129.0,0.06392,0.22,ci,ci40187616,2022-02-16T19:44:27.151Z,"4km ENE of Aguanga, CA",earthquake,0.73,1.65,0.05,9.0,automatic,ci,ci
4,2022-02-16T19:35:33.190Z,36.716499,-121.359497,2.41,2.36,md,34.0,113.0,0.0138,0.12,nc,nc73693786,2022-02-16T19:47:11.919Z,"9km SSW of Tres Pinos, CA",earthquake,0.35,0.58,0.16,35.0,automatic,nc,nc
5,2022-02-16T19:35:20.500Z,38.789166,-122.762497,3.43,0.84,md,15.0,96.0,0.01165,0.04,nc,nc73693781,2022-02-16T19:47:12.061Z,"1km NNW of The Geysers, CA",earthquake,0.35,0.69,0.01,2.0,automatic,nc,nc
6,2022-02-16T19:32:55.130Z,37.988834,-122.454666,1.62,1.53,md,10.0,114.0,0.02055,0.06,nc,nc73693776,2022-02-16T19:44:10.907Z,"6km E of Santa Venetia, CA",earthquake,0.54,0.47,0.2,6.0,automatic,nc,nc
7,2022-02-16T19:19:46.840Z,38.819832,-122.803169,2.35,0.84,md,11.0,89.0,0.006461,0.01,nc,nc73693766,2022-02-16T19:29:10.811Z,"6km NW of The Geysers, CA",earthquake,0.34,0.86,,1.0,automatic,nc,nc
8,2022-02-16T19:18:15.583Z,61.3641,-149.8488,31.4,1.7,ml,,,,0.55,ak,ak022261ofsa,2022-02-16T19:22:52.289Z,"6 km E of Point MacKenzie, Alaska",earthquake,,0.1,,,automatic,ak,ak
9,2022-02-16T19:17:25.040Z,33.43,-117.6585,13.95,1.16,ml,21.0,218.0,0.05564,0.13,ci,ci40187592,2022-02-16T19:31:45.245Z,"4km W of San Clemente, CA",earthquake,0.37,0.57,0.099,9.0,reviewed,ci,ci


### Get basic information

[[back to top]](#Table-of-Contents)

pandas provides several attributes and methods for getting basic information about a DataFrame.

Let's take a look at the types of the `eqPastMonth` columns:

In [59]:
eqPastMonth.dtypes

time                object
latitude           float64
longitude          float64
depth              float64
mag                float64
magType             object
nst                float64
gap                float64
dmin               float64
rms                float64
net                 object
id                  object
updated             object
place               object
type                object
horizontalError    float64
depthError         float64
magError           float64
magNst             float64
status              object
locationSource      object
magSource           object
dtype: object

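You may notice that the dtype of the `time` column is `object` by default, which here means it holds strings. You can change this with `apply`, which applies a function to every element of a Series (or to every row or column of a DataFrame). A "lambda" is a shorthand way to write a small function inline. The next two cells show `apply` with a lambda and with a named function; the cell after them uses it to parse `time`.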

In [69]:
eqPastMonth['magPlus5'] = eqPastMonth['mag'].apply(lambda x: x + 5)
eqPastMonth.head(5)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,longMagType,magPlus5
0,2022-02-16T20:07:51.700Z,33.165667,-116.615167,-0.08,0.9,ml,13.0,178.0,0.3278,0.36,ci,ci40187648,2022-02-16T20:11:28.370Z,"10km N of Julian, CA",earthquake,1.86,31.61,0.135,9.0,automatic,ci,ci,5.9,5.9
1,2022-02-16T20:06:36.970Z,35.9375,-120.486336,4.92,1.13,md,15.0,110.0,0.01501,0.06,nc,nc73693791,2022-02-16T20:08:10.701Z,"7km NW of Parkfield, CA",earthquake,0.37,0.55,0.17,5.0,automatic,nc,nc,6.13,6.13
2,2022-02-16T20:01:07.410Z,19.186333,-155.403839,30.43,2.03,md,30.0,181.0,,0.14,hv,hv72919612,2022-02-16T20:04:17.960Z,"8 km ESE of Pāhala, Hawaii",earthquake,0.83,0.97,2.03,4.0,automatic,hv,hv,7.03,7.03
3,2022-02-16T19:40:50.170Z,33.451333,-116.823667,5.38,0.49,ml,13.0,129.0,0.06392,0.22,ci,ci40187616,2022-02-16T19:44:27.151Z,"4km ENE of Aguanga, CA",earthquake,0.73,1.65,0.05,9.0,automatic,ci,ci,5.49,5.49
4,2022-02-16T19:35:33.190Z,36.716499,-121.359497,2.41,2.36,md,34.0,113.0,0.0138,0.12,nc,nc73693786,2022-02-16T19:47:11.919Z,"9km SSW of Tres Pinos, CA",earthquake,0.35,0.58,0.16,35.0,automatic,nc,nc,7.36,7.36


In [68]:
def addFive(x):
    return x + 5
eqPastMonth['magPlus5'] = eqPastMonth['mag'].apply(addFive)
eqPastMonth.head(5)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,longMagType,magPlus5
0,2022-02-16T20:07:51.700Z,33.165667,-116.615167,-0.08,0.9,ml,13.0,178.0,0.3278,0.36,ci,ci40187648,2022-02-16T20:11:28.370Z,"10km N of Julian, CA",earthquake,1.86,31.61,0.135,9.0,automatic,ci,ci,5.9,5.9
1,2022-02-16T20:06:36.970Z,35.9375,-120.486336,4.92,1.13,md,15.0,110.0,0.01501,0.06,nc,nc73693791,2022-02-16T20:08:10.701Z,"7km NW of Parkfield, CA",earthquake,0.37,0.55,0.17,5.0,automatic,nc,nc,6.13,6.13
2,2022-02-16T20:01:07.410Z,19.186333,-155.403839,30.43,2.03,md,30.0,181.0,,0.14,hv,hv72919612,2022-02-16T20:04:17.960Z,"8 km ESE of Pāhala, Hawaii",earthquake,0.83,0.97,2.03,4.0,automatic,hv,hv,7.03,7.03
3,2022-02-16T19:40:50.170Z,33.451333,-116.823667,5.38,0.49,ml,13.0,129.0,0.06392,0.22,ci,ci40187616,2022-02-16T19:44:27.151Z,"4km ENE of Aguanga, CA",earthquake,0.73,1.65,0.05,9.0,automatic,ci,ci,5.49,5.49
4,2022-02-16T19:35:33.190Z,36.716499,-121.359497,2.41,2.36,md,34.0,113.0,0.0138,0.12,nc,nc73693786,2022-02-16T19:47:11.919Z,"9km SSW of Tres Pinos, CA",earthquake,0.35,0.58,0.16,35.0,automatic,nc,nc,7.36,7.36


In [70]:
import datetime
# parse each ISO 8601 string in 'time' into a datetime object
eqPastMonth['datetime'] = eqPastMonth['time'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))
eqPastMonth.head(5)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,longMagType,magPlus5,datetime
0,2022-02-16T20:07:51.700Z,33.165667,-116.615167,-0.08,0.9,ml,13.0,178.0,0.3278,0.36,ci,ci40187648,2022-02-16T20:11:28.370Z,"10km N of Julian, CA",earthquake,1.86,31.61,0.135,9.0,automatic,ci,ci,5.9,5.9,2022-02-16 20:07:51.700
1,2022-02-16T20:06:36.970Z,35.9375,-120.486336,4.92,1.13,md,15.0,110.0,0.01501,0.06,nc,nc73693791,2022-02-16T20:08:10.701Z,"7km NW of Parkfield, CA",earthquake,0.37,0.55,0.17,5.0,automatic,nc,nc,6.13,6.13,2022-02-16 20:06:36.970
2,2022-02-16T20:01:07.410Z,19.186333,-155.403839,30.43,2.03,md,30.0,181.0,,0.14,hv,hv72919612,2022-02-16T20:04:17.960Z,"8 km ESE of Pāhala, Hawaii",earthquake,0.83,0.97,2.03,4.0,automatic,hv,hv,7.03,7.03,2022-02-16 20:01:07.410
3,2022-02-16T19:40:50.170Z,33.451333,-116.823667,5.38,0.49,ml,13.0,129.0,0.06392,0.22,ci,ci40187616,2022-02-16T19:44:27.151Z,"4km ENE of Aguanga, CA",earthquake,0.73,1.65,0.05,9.0,automatic,ci,ci,5.49,5.49,2022-02-16 19:40:50.170
4,2022-02-16T19:35:33.190Z,36.716499,-121.359497,2.41,2.36,md,34.0,113.0,0.0138,0.12,nc,nc73693786,2022-02-16T19:47:11.919Z,"9km SSW of Tres Pinos, CA",earthquake,0.35,0.58,0.16,35.0,automatic,nc,nc,7.36,7.36,2022-02-16 19:35:33.190


In [59]:
# note: despite the column name, this adds 0.1 (not 1) to each magnitude
eqPastMonth['magPlus1'] = eqPastMonth['mag'].apply(lambda x: x + 0.1)
eqPastMonth.head(5)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus1,datetime
0,2021-02-24T19:50:07.610Z,19.181,-155.470993,32.860001,2.15,md,42.0,153.0,,0.1,hv,hv72363462,2021-02-24T19:53:30.510Z,"2 km SSE of Pāhala, Hawaii",earthquake,0.68,0.98,0.43,24.0,automatic,hv,hv,2.25,2021-02-24 19:50:07.610
1,2021-02-24T19:46:29.747Z,64.7437,-149.2695,20.1,1.1,ml,,,,0.3,ak,ak0212ja58vx,2021-02-24T19:50:43.341Z,"17 km NNW of Four Mile Road, Alaska",earthquake,,0.6,,,automatic,ak,ak,1.2,2021-02-24 19:46:29.747
2,2021-02-24T19:35:10.940Z,37.3513,-115.6738,0.3,1.6,ml,9.0,263.87,0.317,0.63,nn,nn00801159,2021-02-24T19:40:13.150Z,"33 km S of Rachel, Nevada",earthquake,,40.5,,,automatic,nn,nn,1.7,2021-02-24 19:35:10.940
3,2021-02-24T19:34:35.290Z,19.453667,-155.597504,-1.39,1.79,ml,10.0,79.0,,0.16,hv,hv72363447,2021-02-24T19:40:06.880Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.52,0.36,0.07,3.0,automatic,hv,hv,1.89,2021-02-24 19:34:35.290
4,2021-02-24T19:25:06.150Z,17.821,-66.8773,10.0,2.41,md,11.0,261.0,0.2234,0.16,pr,pr2021055003,2021-02-24T20:06:29.340Z,"16 km S of Guánica, Puerto Rico",earthquake,0.56,0.62,0.1,5.0,reviewed,pr,pr,2.51,2021-02-24 19:25:06.150


Notice the new `datetime` column. It is of type `datetime64[ns]`, as the next cell shows.

In [71]:
eqPastMonth.dtypes

time                       object
latitude                  float64
longitude                 float64
depth                     float64
mag                       float64
magType                    object
nst                       float64
gap                       float64
dmin                      float64
rms                       float64
net                        object
id                         object
updated                    object
place                      object
type                       object
horizontalError           float64
depthError                float64
magError                  float64
magNst                    float64
status                     object
locationSource             object
magSource                  object
longMagType               float64
magPlus5                  float64
datetime           datetime64[ns]
dtype: object
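
As an aside, the same conversion can usually be done without `apply`. The sketch below assumes a small hand-made Series of timestamp strings in the same format as the `time` column (the values are illustrative, not read from the feed) and uses `pd.to_datetime`, which parses the whole column at once.

```python
import pandas as pd

# Hypothetical sample in the same ISO 8601 format as the 'time' column
times = pd.Series(['2022-02-16T20:07:51.700Z', '2022-02-16T20:06:36.970Z'])

# utc=True keeps the trailing 'Z' (UTC) as timezone information
parsed = pd.to_datetime(times, utc=True)

print(parsed.dtype)
print(parsed.dt.hour.tolist())
```

Unlike the `strptime` version, the result here is timezone-aware; whether that is desirable depends on how the timestamps are used later.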

You can also get a concise summary of the DataFrame's structure (row count, column names, non-null counts and dtypes):

In [72]:
eqPastMonth.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10272 entries, 0 to 10271
Data columns (total 25 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   time             10272 non-null  object        
 1   latitude         10272 non-null  float64       
 2   longitude        10272 non-null  float64       
 3   depth            10272 non-null  float64       
 4   mag              10271 non-null  float64       
 5   magType          10271 non-null  object        
 6   nst              7546 non-null   float64       
 7   gap              8806 non-null   float64       
 8   dmin             7144 non-null   float64       
 9   rms              10272 non-null  float64       
 10  net              10272 non-null  object        
 11  id               10272 non-null  object        
 12  updated          10272 non-null  object        
 13  place            10272 non-null  object        
 14  type             10272 non-null  objec

Method `info()` shows (top down)
+ that `eqPastMonth` is an instance of the DataFrame class; the built-in function `type()` gives the same information;
+ the number of rows in the DataFrame;
+ the type of each column and the number of non-null values in it; `dtypes` gives the types in a shorter form;
+ the memory usage of the DataFrame, etc.

The method `describe()` quickly gives the count, mean, minimum and maximum values, standard deviation and quartiles for each numeric column of a DataFrame. Here it is applied only to the rows with a non-negative magnitude:

In [73]:
eqPastMonth[eqPastMonth["mag"] >= 0].describe()

Unnamed: 0,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst,longMagType,magPlus5
count,9905.0,9905.0,9905.0,9905.0,7179.0,8439.0,6884.0,9905.0,7460.0,9905.0,7934.0,8430.0,9905.0,9905.0
mean,35.017782,-106.196538,21.570588,1.731784,20.446441,123.414913,0.693819,0.266042,1.830671,2.734028,0.279792,14.309015,6.731784,6.731784
std,20.007929,73.360067,52.519548,1.227295,16.007,68.726391,2.248273,0.280599,3.170461,25.377563,0.547204,26.998056,1.227295,1.227295
min,-63.5219,-179.9982,-3.5,0.0,0.0,13.0,0.0,0.0,0.08,0.0,0.0,0.0,5.0,5.0
25%,32.753167,-149.182,3.22,0.86,9.0,72.0,0.019482,0.09,0.28,0.46,0.101,4.0,5.86,5.86
50%,37.730999,-119.739,8.34,1.47,15.0,107.0,0.065,0.1415,0.48,0.8,0.157509,7.0,6.47,6.47
75%,44.373667,-115.126,15.96,2.13,27.0,157.0,0.1809,0.36,1.12,1.8,0.238564,16.0,7.13,7.13
max,85.3093,179.9498,644.77,6.5,184.0,351.81,49.154,2.98,23.7,1865.6,5.35,618.0,11.5,11.5
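
To make the `describe()` output easier to read, here is a minimal sketch on a tiny hand-made DataFrame (the column names mirror the earthquake data, but the values are made up). It illustrates that `count` ignores missing values and that non-numeric columns are skipped by default.

```python
import numpy as np
import pandas as pd

# Made-up values, for illustration only
small = pd.DataFrame({
    'mag': [1.0, 2.0, np.nan, 5.0],
    'depth': [10.0, 20.0, 30.0, 40.0],
    'place': ['A', 'B', 'C', 'D'],
})

summary = small.describe()

# 'place' is not numeric, so it is absent from the summary
print(summary.columns.tolist())
# 'count' is the number of non-missing values in each column
print(summary.loc['count'])
```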


### Conditional indexing and selection

[[back to top]](#Table-of-Contents)

As we said above, a DataFrame is a collection of Series objects. This lets you select a single column from the DataFrame (in which case you get a Series) or several columns (in which case you get another DataFrame).

In [61]:
eqPastMonth_mag = eqPastMonth['mag']
# Here we are showing only one column, i.e. a Series
print ('type:', type(eqPastMonth_mag))
eqPastMonth_mag.head(10)

type: <class 'pandas.core.series.Series'>


0    2.15
1    1.10
2    1.60
3    1.79
4    2.41
5    5.00
6    1.10
7    1.83
8    2.05
9    2.89
Name: mag, dtype: float64

In [74]:
eqPastMonth_record = eqPastMonth[['time','depth', 'mag', 'place']]
# Here we are showing four columns, i.e. a new DataFrame
print ('type:', type(eqPastMonth_record))
eqPastMonth_record.tail()

type: <class 'pandas.core.frame.DataFrame'>


Unnamed: 0,time,depth,mag,place
10267,2022-01-17T20:30:15.657Z,7.2,0.0,"52 km SE of Beatty, Nevada"
10268,2022-01-17T20:30:00.990Z,109.01,4.3,"113 km NW of Nuku‘alofa, Tonga"
10269,2022-01-17T20:27:16.238Z,118.4,1.5,"42 km WSW of Skwentna, Alaska"
10270,2022-01-17T20:23:10.180Z,13.98,0.5,"18km ESE of Anza, CA"
10271,2022-01-17T20:19:43.583Z,83.2,1.4,"38 km WSW of Anchor Point, Alaska"


You can also refer to a single column as an attribute of the DataFrame:

In [64]:
eqPastMonth_record.time

0        2021-02-24T19:50:07.610Z
1        2021-02-24T19:46:29.747Z
2        2021-02-24T19:35:10.940Z
3        2021-02-24T19:34:35.290Z
4        2021-02-24T19:25:06.150Z
                   ...           
10571    2021-01-25T20:26:37.822Z
10572    2021-01-25T20:20:08.400Z
10573    2021-01-25T20:16:10.458Z
10574    2021-01-25T20:13:31.390Z
10575    2021-01-25T20:09:26.382Z
Name: time, Length: 10576, dtype: object
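
Attribute access is convenient but has limits. The sketch below, on a small made-up DataFrame, illustrates two of them: a column whose name is not a valid Python identifier cannot be reached this way, and a column whose name clashes with an existing DataFrame method (here the hypothetical column `count`) is shadowed by that method. Bracket access works in both cases.

```python
import pandas as pd

# Made-up data; 'count' and 'mag type' are hypothetical column names
df_attr = pd.DataFrame({'mag': [1.5, 2.5], 'count': [3, 4], 'mag type': ['ml', 'md']})

# For an ordinary column both forms return the same Series
same = df_attr.mag.equals(df_attr['mag'])

# df_attr.count is the DataFrame.count method, not the column
shadowed = callable(df_attr.count)

# Brackets always return the column itself
counts = df_attr['count'].tolist()
mag_types = df_attr['mag type'].tolist()
print(same, shadowed, counts, mag_types)
```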

Filtered DataFrames can be obtained with comparison and logical operators:

In [200]:
# Let's display only large earthquakes
eqPastMonth_large = eqPastMonth[eqPastMonth['mag'] > 5]
eqPastMonth_large.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime
5,2021-02-17T03:36:07.740Z,38.4489,22.0086,10.0,5.5,mww,,39.0,0.626,0.84,us,us6000diae,2021-02-17T04:38:13.265Z,"16 km N of Kamárai, Greece",earthquake,4.8,1.9,0.047,44.0,reviewed,us,us,2021-02-17 03:36:07.740
100,2021-02-16T19:09:41.237Z,-2.8881,139.201,48.19,5.4,mww,,47.0,6.943,0.74,us,usd000eesi,2021-02-16T21:12:58.341Z,"162 km WSW of Abepura, Indonesia",earthquake,7.6,5.1,0.083,14.0,reviewed,us,us,2021-02-16 19:09:41.237
131,2021-02-16T15:54:07.766Z,-18.8866,-173.6117,10.0,5.2,mww,,149.0,5.252,0.52,us,us6000di3t,2021-02-16T22:51:05.852Z,"47 km ESE of Neiafu, Tonga",earthquake,11.1,1.9,0.098,10.0,reviewed,us,us,2021-02-16 15:54:07.766
180,2021-02-16T09:55:32.318Z,52.0601,156.989,152.04,5.3,mb,,80.0,1.135,0.87,us,us6000di0n,2021-02-16T10:10:36.040Z,"71 km NNE of Ozernovskiy, Russia",earthquake,3.9,6.0,0.023,620.0,reviewed,us,us,2021-02-16 09:55:32.318
189,2021-02-16T08:42:11.806Z,-6.0804,112.1674,599.74,5.1,mww,,25.0,1.967,1.19,us,us6000di0i,2021-02-16T08:55:23.040Z,"91 km NNW of Paciran, Indonesia",earthquake,10.4,7.3,0.093,11.0,reviewed,us,us,2021-02-16 08:42:11.806
194,2021-02-16T08:07:23.479Z,-15.6949,167.8335,101.67,5.1,mb,,115.0,5.758,0.81,us,us6000di0g,2021-02-16T08:50:41.040Z,"62 km NE of Norsup, Vanuatu",earthquake,10.0,4.6,0.087,43.0,reviewed,us,us,2021-02-16 08:07:23.479
203,2021-02-16T06:42:01.809Z,-13.3581,166.8405,52.65,5.1,mb,,59.0,7.558,0.68,us,us6000dhzx,2021-02-16T07:45:14.040Z,"95 km NW of Sola, Vanuatu",earthquake,8.3,7.2,0.063,81.0,reviewed,us,us,2021-02-16 06:42:01.809
233,2021-02-16T03:37:11.369Z,-17.9892,167.5914,10.0,5.3,mww,,38.0,3.498,0.98,us,us6000dhyf,2021-02-16T06:25:18.040Z,"81 km WSW of Port-Vila, Vanuatu",earthquake,7.5,1.8,0.098,10.0,reviewed,us,us,2021-02-16 03:37:11.369
252,2021-02-16T01:31:59.756Z,-17.8082,167.6351,10.0,5.9,mww,,38.0,3.672,0.79,us,us6000dhxu,2021-02-17T01:35:13.282Z,"72 km W of Port-Vila, Vanuatu",earthquake,7.8,1.7,0.056,31.0,reviewed,us,us,2021-02-16 01:31:59.756
255,2021-02-16T01:22:19.619Z,-17.6551,167.7198,10.0,5.4,mww,,25.0,4.34,0.85,us,us6000dhxs,2021-02-17T01:25:18.841Z,"63 km W of Port-Vila, Vanuatu",earthquake,8.6,1.7,0.063,24.0,reviewed,us,us,2021-02-16 01:22:19.619


In [76]:
#Getting records that are large (>5mag) earthquakes and that occurred in the northern hemisphere
filtered_df_1 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['latitude'] > 0)]
filtered_df_1.describe()

Unnamed: 0,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst,longMagType,magPlus5
count,44.0,44.0,44.0,44.0,0.0,42.0,42.0,44.0,42.0,44.0,42.0,42.0,44.0,44.0
mean,25.620648,35.227695,31.047045,5.388636,,58.0,3.81119,0.906136,6.87381,2.684091,0.057143,70.166667,10.388636,10.388636
std,17.152213,104.106748,38.66113,0.332164,,32.300419,4.280885,0.261331,2.019304,1.597373,0.017765,113.83855,0.332164,0.332164
min,0.8869,-166.6805,8.0,5.1,,19.0,0.199,0.45,2.2,0.3,0.023,10.0,10.1,10.1
25%,12.6925,-64.047775,10.0,5.175,,34.5,1.1755,0.7825,5.825,1.8,0.043,21.25,10.175,10.175
50%,21.6117,63.753,10.0,5.25,,47.5,1.896,0.89,7.1,1.9,0.056,32.5,10.25,10.25
75%,35.38555,126.389875,36.885,5.525,,77.0,3.78075,1.045,7.975,3.825,0.06875,64.5,10.525,10.525
max,70.7302,148.0745,209.05,6.3,,142.0,18.432,1.77,10.6,6.9,0.098,618.0,11.3,11.3


In [77]:
#Getting records that are large (>5mag) earthquakes and that occurred in the southern hemisphere
filtered_df_1 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['latitude'] < 0)]
filtered_df_1.describe()

Unnamed: 0,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst,longMagType,magPlus5
count,69.0,69.0,69.0,69.0,0.0,69.0,69.0,69.0,69.0,69.0,69.0,69.0,69.0,69.0
mean,-30.066945,-37.793472,34.569565,5.407246,,59.014493,5.344812,0.844783,8.515942,2.497101,0.073043,53.985507,10.407246,10.407246
std,19.177761,131.977693,49.622174,0.349503,,22.21651,4.728077,0.227697,2.266419,1.324761,0.027654,78.771541,0.349503,0.349503
min,-63.5219,-179.5549,4.17,5.1,,14.0,0.554,0.48,2.7,1.1,0.026,7.0,10.1,10.1
25%,-48.0343,-176.5393,10.0,5.2,,45.0,1.473,0.67,6.8,1.8,0.056,13.0,10.2,10.2
50%,-29.5677,-27.8756,10.0,5.3,,58.0,4.109,0.82,8.6,1.9,0.069,28.0,10.3,10.3
75%,-15.8805,105.9767,35.0,5.5,,76.0,7.751,0.94,10.1,3.1,0.089,55.0,10.5,10.5
max,-0.4019,179.852,240.84,6.5,,114.0,20.516,1.48,13.6,7.0,0.234,487.0,11.5,11.5


In [78]:
#Getting records that are large (>5mag) earthquakes and that occurred in the western hemisphere, but no farther west than 120° W longitude; also select which columns to output
filtered_df_2 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['longitude'] < 0) & (eqPastMonth['longitude'] > -120)][['depth', 'mag', 'place']]
filtered_df_2.head(10)

Unnamed: 0,depth,mag,place
122,83.61,6.2,"0 km SSE of Nueva Concepción, Guatemala"
140,35.33,5.5,South Sandwich Islands region
254,65.5,5.1,South Sandwich Islands region
512,10.0,5.7,"208 km W of Olonkinbyen, Svalbard and Jan Mayen"
979,99.34,5.3,South Sandwich Islands region
1105,10.0,5.5,central Mid-Atlantic Ridge
1624,10.0,5.3,Ascension Island region
1886,10.0,5.5,Ascension Island region
2501,10.0,6.2,central Mid-Atlantic Ridge
2578,10.0,5.1,east of the South Sandwich Islands
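
Long conditions are easier to read when each part gets its own name. The following sketch uses a small made-up DataFrame (not the live feed) and passes the combined mask and the column list to `.loc` in one step, which is an alternative to chaining two pairs of brackets.

```python
import pandas as pd

# Made-up records, for illustration only
quakes = pd.DataFrame({
    'mag': [6.2, 5.5, 4.0, 5.3],
    'longitude': [-91.0, -27.0, -100.0, 150.0],
    'depth': [83.61, 35.33, 10.0, 10.0],
    'place': ['P1', 'P2', 'P3', 'P4'],
})

is_large = quakes['mag'] > 5
# Between 120° W and the prime meridian (both ends excluded)
in_band = (quakes['longitude'] < 0) & (quakes['longitude'] > -120)

# Rows are chosen by the mask, columns by the list of names
result = quakes.loc[is_large & in_band, ['depth', 'mag', 'place']]
print(result)
```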


You can also use the method `isin(values)` to check whether the items of a Series are present in a collection of values, the method `isnull()` to detect missing (`NaN`) values, and the boolean operators `&` (`AND`) and `|` (`OR`) to build more complicated conditions.
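
A minimal sketch of these three tools, run on a small made-up DataFrame rather than on the earthquake feed:

```python
import numpy as np
import pandas as pd

# Made-up records, for illustration only
events = pd.DataFrame({
    'mag': [1.2, np.nan, 5.4, 0.8],
    'magType': ['ml', 'md', 'mww', 'mb'],
})

# Rows whose magType is one of the listed values
local_types = events[events['magType'].isin(['ml', 'md'])]

# Rows where the magnitude is missing
missing_mag = events[events['mag'].isnull()]

# | means OR: missing magnitude OR magnitude above 5
missing_or_large = events[events['mag'].isnull() | (events['mag'] > 5)]

print(local_types.index.tolist())
print(missing_mag.index.tolist())
print(missing_or_large.index.tolist())
```

Note the parentheses around the comparison: `&` and `|` bind more tightly than `>` in Python, so each comparison has to be wrapped.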

As you can see, after filtering the resulting tables (i.e. DataFrames) keep the index labels of the original rows, so the index is no longer consecutive. To fix this you may write the following:

In [79]:
filtered_df_2.reset_index().head(10)

Unnamed: 0,index,depth,mag,place
0,122,83.61,6.2,"0 km SSE of Nueva Concepción, Guatemala"
1,140,35.33,5.5,South Sandwich Islands region
2,254,65.5,5.1,South Sandwich Islands region
3,512,10.0,5.7,"208 km W of Olonkinbyen, Svalbard and Jan Mayen"
4,979,99.34,5.3,South Sandwich Islands region
5,1105,10.0,5.5,central Mid-Atlantic Ridge
6,1624,10.0,5.3,Ascension Island region
7,1886,10.0,5.5,Ascension Island region
8,2501,10.0,6.2,central Mid-Atlantic Ridge
9,2578,10.0,5.1,east of the South Sandwich Islands


This renumbers the index starting from 0; the old labels are kept in a new `index` column.
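
If the old labels are not needed, `reset_index` accepts `drop=True`, which discards them instead of storing them in a new column. A minimal sketch on a made-up DataFrame:

```python
import pandas as pd

# Made-up filtered result with gaps in the index
filtered = pd.DataFrame({'mag': [6.2, 5.5, 5.1]}, index=[122, 140, 254])

kept = filtered.reset_index()              # old labels go to an 'index' column
dropped = filtered.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())
print(dropped.columns.tolist())
print(dropped.index.tolist())
```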

Also remember that you can add new columns and rows to a DataFrame. Here we add a column:

In [80]:
#create a new custom_mag column and fill it with empty strings
eqPastMonth['custom_mag'] = ''
#label each earthquake 'Small' (mag < 5) or 'Large' (everything else)
eqPastMonth['custom_mag'] = np.where(eqPastMonth['mag'] < 5, 'Small', "Large")
eqPastMonth.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,longMagType,magPlus5,datetime,custom_mag
0,2022-02-16T20:07:51.700Z,33.165667,-116.615167,-0.08,0.9,ml,13.0,178.0,0.3278,0.36,ci,ci40187648,2022-02-16T20:11:28.370Z,"10km N of Julian, CA",earthquake,1.86,31.61,0.135,9.0,automatic,ci,ci,5.9,5.9,2022-02-16 20:07:51.700,Small
1,2022-02-16T20:06:36.970Z,35.9375,-120.486336,4.92,1.13,md,15.0,110.0,0.01501,0.06,nc,nc73693791,2022-02-16T20:08:10.701Z,"7km NW of Parkfield, CA",earthquake,0.37,0.55,0.17,5.0,automatic,nc,nc,6.13,6.13,2022-02-16 20:06:36.970,Small
2,2022-02-16T20:01:07.410Z,19.186333,-155.403839,30.43,2.03,md,30.0,181.0,,0.14,hv,hv72919612,2022-02-16T20:04:17.960Z,"8 km ESE of Pāhala, Hawaii",earthquake,0.83,0.97,2.03,4.0,automatic,hv,hv,7.03,7.03,2022-02-16 20:01:07.410,Small
3,2022-02-16T19:40:50.170Z,33.451333,-116.823667,5.38,0.49,ml,13.0,129.0,0.06392,0.22,ci,ci40187616,2022-02-16T19:44:27.151Z,"4km ENE of Aguanga, CA",earthquake,0.73,1.65,0.05,9.0,automatic,ci,ci,5.49,5.49,2022-02-16 19:40:50.170,Small
4,2022-02-16T19:35:33.190Z,36.716499,-121.359497,2.41,2.36,md,34.0,113.0,0.0138,0.12,nc,nc73693786,2022-02-16T19:47:11.919Z,"9km SSW of Tres Pinos, CA",earthquake,0.35,0.58,0.16,35.0,automatic,nc,nc,7.36,7.36,2022-02-16 19:35:33.190,Small
5,2022-02-16T19:35:20.500Z,38.789166,-122.762497,3.43,0.84,md,15.0,96.0,0.01165,0.04,nc,nc73693781,2022-02-16T19:47:12.061Z,"1km NNW of The Geysers, CA",earthquake,0.35,0.69,0.01,2.0,automatic,nc,nc,5.84,5.84,2022-02-16 19:35:20.500,Small
6,2022-02-16T19:32:55.130Z,37.988834,-122.454666,1.62,1.53,md,10.0,114.0,0.02055,0.06,nc,nc73693776,2022-02-16T19:44:10.907Z,"6km E of Santa Venetia, CA",earthquake,0.54,0.47,0.2,6.0,automatic,nc,nc,6.53,6.53,2022-02-16 19:32:55.130,Small
7,2022-02-16T19:19:46.840Z,38.819832,-122.803169,2.35,0.84,md,11.0,89.0,0.006461,0.01,nc,nc73693766,2022-02-16T19:29:10.811Z,"6km NW of The Geysers, CA",earthquake,0.34,0.86,,1.0,automatic,nc,nc,5.84,5.84,2022-02-16 19:19:46.840,Small
8,2022-02-16T19:18:15.583Z,61.3641,-149.8488,31.4,1.7,ml,,,,0.55,ak,ak022261ofsa,2022-02-16T19:22:52.289Z,"6 km E of Point MacKenzie, Alaska",earthquake,,0.1,,,automatic,ak,ak,6.7,6.7,2022-02-16 19:18:15.583,Small
9,2022-02-16T19:17:25.040Z,33.43,-117.6585,13.95,1.16,ml,21.0,218.0,0.05564,0.13,ci,ci40187592,2022-02-16T19:31:45.245Z,"4km W of San Clemente, CA",earthquake,0.37,0.57,0.099,9.0,reviewed,ci,ci,6.16,6.16,2022-02-16 19:17:25.040,Small
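
One detail worth knowing: a comparison with a missing value evaluates to `False`, so with `np.where` as used above a row with a missing `mag` would be labelled 'Large'. The sketch below, on made-up values, shows this and one possible way around it using `np.select`; the label 'Unknown' is an assumption made for illustration, not part of the original lesson.

```python
import numpy as np
import pandas as pd

# Made-up magnitudes, including one missing value
mags = pd.Series([0.9, 5.5, np.nan])

# NaN < 5 evaluates to False, so the missing value falls into 'Large'
two_way = np.where(mags < 5, 'Small', 'Large')

# np.select checks the conditions in order; the first match wins
three_way = np.select(
    [mags.isnull(), mags < 5],
    ['Unknown', 'Small'],
    default='Large',
)

print(two_way.tolist())
print(three_way.tolist())
```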


### Work with indexes and MultiIndex option

[[back to top]](#Table-of-Contents)

pandas lets you set a custom index for a DataFrame. The index can be defined when the DataFrame is created:

In [81]:
import random
indexes = [random.randrange(0,100) for i in range(5)]
data = [{i:random.randint(0,10) for i in 'ABCDE'} for i in range(5)]
df = pd.DataFrame(data, index=indexes)
df

Unnamed: 0,A,B,C,D,E
42,9,5,3,4,9
57,6,9,8,8,1
14,1,1,0,7,3
23,1,0,2,10,0
25,5,3,2,3,5


It can also be changed at any time:

In [82]:
df.index = ['a', 'b', 'c', 'd', 'e']
df

Unnamed: 0,A,B,C,D,E
a,9,5,3,4,9
b,6,9,8,8,1
c,1,1,0,7,3
d,1,0,2,10,0
e,5,3,2,3,5


You can also use one or more columns as the index:

In [83]:
# if duplicates exist you can drop duplicates to get unique values
#eqPastMonth_nodups = eqPastMonth.drop_duplicates(subset='time', keep='last')
# we don't need to do that.
# set 'time' as index
eqPastMonth_indexChange = eqPastMonth.set_index('time')
eqPastMonth_indexChange.head(10)

Unnamed: 0_level_0,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,longMagType,magPlus5,datetime,custom_mag
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1
2022-02-16T20:07:51.700Z,33.165667,-116.615167,-0.08,0.9,ml,13.0,178.0,0.3278,0.36,ci,ci40187648,2022-02-16T20:11:28.370Z,"10km N of Julian, CA",earthquake,1.86,31.61,0.135,9.0,automatic,ci,ci,5.9,5.9,2022-02-16 20:07:51.700,Small
2022-02-16T20:06:36.970Z,35.9375,-120.486336,4.92,1.13,md,15.0,110.0,0.01501,0.06,nc,nc73693791,2022-02-16T20:08:10.701Z,"7km NW of Parkfield, CA",earthquake,0.37,0.55,0.17,5.0,automatic,nc,nc,6.13,6.13,2022-02-16 20:06:36.970,Small
2022-02-16T20:01:07.410Z,19.186333,-155.403839,30.43,2.03,md,30.0,181.0,,0.14,hv,hv72919612,2022-02-16T20:04:17.960Z,"8 km ESE of Pāhala, Hawaii",earthquake,0.83,0.97,2.03,4.0,automatic,hv,hv,7.03,7.03,2022-02-16 20:01:07.410,Small
2022-02-16T19:40:50.170Z,33.451333,-116.823667,5.38,0.49,ml,13.0,129.0,0.06392,0.22,ci,ci40187616,2022-02-16T19:44:27.151Z,"4km ENE of Aguanga, CA",earthquake,0.73,1.65,0.05,9.0,automatic,ci,ci,5.49,5.49,2022-02-16 19:40:50.170,Small
2022-02-16T19:35:33.190Z,36.716499,-121.359497,2.41,2.36,md,34.0,113.0,0.0138,0.12,nc,nc73693786,2022-02-16T19:47:11.919Z,"9km SSW of Tres Pinos, CA",earthquake,0.35,0.58,0.16,35.0,automatic,nc,nc,7.36,7.36,2022-02-16 19:35:33.190,Small
2022-02-16T19:35:20.500Z,38.789166,-122.762497,3.43,0.84,md,15.0,96.0,0.01165,0.04,nc,nc73693781,2022-02-16T19:47:12.061Z,"1km NNW of The Geysers, CA",earthquake,0.35,0.69,0.01,2.0,automatic,nc,nc,5.84,5.84,2022-02-16 19:35:20.500,Small
2022-02-16T19:32:55.130Z,37.988834,-122.454666,1.62,1.53,md,10.0,114.0,0.02055,0.06,nc,nc73693776,2022-02-16T19:44:10.907Z,"6km E of Santa Venetia, CA",earthquake,0.54,0.47,0.2,6.0,automatic,nc,nc,6.53,6.53,2022-02-16 19:32:55.130,Small
2022-02-16T19:19:46.840Z,38.819832,-122.803169,2.35,0.84,md,11.0,89.0,0.006461,0.01,nc,nc73693766,2022-02-16T19:29:10.811Z,"6km NW of The Geysers, CA",earthquake,0.34,0.86,,1.0,automatic,nc,nc,5.84,5.84,2022-02-16 19:19:46.840,Small
2022-02-16T19:18:15.583Z,61.3641,-149.8488,31.4,1.7,ml,,,,0.55,ak,ak022261ofsa,2022-02-16T19:22:52.289Z,"6 km E of Point MacKenzie, Alaska",earthquake,,0.1,,,automatic,ak,ak,6.7,6.7,2022-02-16 19:18:15.583,Small
2022-02-16T19:17:25.040Z,33.43,-117.6585,13.95,1.16,ml,21.0,218.0,0.05564,0.13,ci,ci40187592,2022-02-16T19:31:45.245Z,"4km W of San Clemente, CA",earthquake,0.37,0.57,0.099,9.0,reviewed,ci,ci,6.16,6.16,2022-02-16 19:17:25.040,Small


By default, `set_index()` returns a new DataFrame and leaves the original unchanged, so assign the result to a variable (or pass `inplace=True`) if you want to keep the change.
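
A minimal sketch of this behaviour, using a made-up two-row DataFrame:

```python
import pandas as pd

# Made-up records, for illustration only
records = pd.DataFrame({'id': ['a1', 'b2'], 'mag': [1.1, 2.2]})

indexed = records.set_index('id')

# The original still has its default integer index and the 'id' column
print(records.index.tolist(), records.columns.tolist())
# The new DataFrame is indexed by 'id'
print(indexed.index.tolist(), indexed.columns.tolist())
```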

Let's create a multi-level index for the `eqPastMonth` DataFrame:

In [73]:
# set 'id' & 'type' as index
eqPastMonth_multi = eqPastMonth.set_index(['id','type'])[["latitude","longitude","depth","mag", "place"]]
eqPastMonth_multi.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,latitude,longitude,depth,mag,place
id,type,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
hv72363462,earthquake,19.181,-155.470993,32.860001,2.15,"2 km SSE of Pāhala, Hawaii"
ak0212ja58vx,earthquake,64.7437,-149.2695,20.1,1.1,"17 km NNW of Four Mile Road, Alaska"
nn00801159,earthquake,37.3513,-115.6738,0.3,1.6,"33 km S of Rachel, Nevada"
hv72363447,earthquake,19.453667,-155.597504,-1.39,1.79,"28 km E of Honaunau-Napoopoo, Hawaii"
pr2021055003,earthquake,17.821,-66.8773,10.0,2.41,"16 km S of Guánica, Puerto Rico"
us7000ddeg,earthquake,-15.1086,-173.3268,10.0,5.0,"106 km NNE of Hihifo, Tonga"
nn00801158,earthquake,38.1733,-117.9185,5.6,1.1,"29 km SE of Mina, Nevada"
hv72363427,earthquake,19.464333,-155.591995,-1.02,1.83,"28 km E of Honaunau-Napoopoo, Hawaii"
hv72363422,earthquake,19.151167,-155.474503,36.98,2.05,"5 km S of Pāhala, Hawaii"
ci39800680,earthquake,32.537333,-115.230667,20.92,2.89,"12km ESE of Puebla, B.C., MX"


and see the type of `eqPastMonth_multi.index` (it is an attribute, not a method):

In [74]:
print ('type: ', type(eqPastMonth_multi.index))

type:  <class 'pandas.core.indexes.multi.MultiIndex'>


Thus, we get an instance of the pandas class MultiIndex, which holds the index information of the DataFrame and lets you work with it. As a quick exercise: what is the type of `filtered_df_2.index`?

You can get the `levels`, `codes` and `names` of a MultiIndex by accessing them as attributes. (Older pandas versions called `codes` `labels`.)
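
As a rough illustration, the sketch below builds a two-level index on a made-up DataFrame (the ids and types are invented) and reads these attributes. It also shows that a row of a multi-indexed DataFrame is addressed by a tuple with one label per level.

```python
import pandas as pd

# Made-up records; the ids and types are illustrative
records = pd.DataFrame({
    'id': ['x1', 'x2', 'x3'],
    'type': ['earthquake', 'explosion', 'earthquake'],
    'mag': [1.1, 2.2, 3.3],
})
multi = records.set_index(['id', 'type'])

print(list(multi.index.names))      # names of the index levels
print(list(multi.index.levels[1]))  # unique values of the second level
print(list(multi.index.codes[1]))   # position of each row's value within that level

# A single row is addressed by a tuple of labels, one per level
row = multi.loc[('x2', 'explosion')]
print(row['mag'])
```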

### Selection by label and position
[[back to top]](#Table-of-Contents)

After reading the previous three subsections you probably have a question: OK, I now know how to filter a DataFrame and how to give it a multi-level index, but how do I select a specific row of the table?
pandas supports two main kinds of multi-axis indexing for object selection:

* `.loc` works on labels in the index;
* `.iloc` works on the positions in the index (so it only takes integers);

    
The following examples show how to work with a DataFrame's rows.
First, let's get the first row of the earthquakes from the past month.

In [87]:
#To return a single record(i.e. row), in this case the first one.
eqPastMonth.loc[0]

time                 2022-02-16T20:07:51.700Z
latitude                            33.165667
longitude                         -116.615167
depth                                   -0.08
mag                                       0.9
magType                                    ml
nst                                      13.0
gap                                     178.0
dmin                                   0.3278
rms                                      0.36
net                                        ci
id                                 ci40187648
updated              2022-02-16T20:11:28.370Z
place                    10km N of Julian, CA
type                               earthquake
horizontalError                          1.86
depthError                              31.61
magError                                0.135
magNst                                    9.0
status                              automatic
locationSource                             ci
magSource                         

and the rows labeled 1 to 3 (note how ranges work in `.loc`: the right boundary is included in the range, which is different from slicing Python lists and strings)

In [88]:
eqPastMonth.loc[1:3]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,longMagType,magPlus5,datetime,custom_mag
1,2022-02-16T20:06:36.970Z,35.9375,-120.486336,4.92,1.13,md,15.0,110.0,0.01501,0.06,nc,nc73693791,2022-02-16T20:08:10.701Z,"7km NW of Parkfield, CA",earthquake,0.37,0.55,0.17,5.0,automatic,nc,nc,6.13,6.13,2022-02-16 20:06:36.970,Small
2,2022-02-16T20:01:07.410Z,19.186333,-155.403839,30.43,2.03,md,30.0,181.0,,0.14,hv,hv72919612,2022-02-16T20:04:17.960Z,"8 km ESE of Pāhala, Hawaii",earthquake,0.83,0.97,2.03,4.0,automatic,hv,hv,7.03,7.03,2022-02-16 20:01:07.410,Small
3,2022-02-16T19:40:50.170Z,33.451333,-116.823667,5.38,0.49,ml,13.0,129.0,0.06392,0.22,ci,ci40187616,2022-02-16T19:44:27.151Z,"4km ENE of Aguanga, CA",earthquake,0.73,1.65,0.05,9.0,automatic,ci,ci,5.49,5.49,2022-02-16 19:40:50.170,Small


As you can see, the first argument of `.loc` corresponds to the index label. If you want to return the value of specific column(s), pass the name(s) of the column(s) as the second argument

In [76]:
eqPastMonth.loc[0, 'place']

'2 km SSE of Pāhala, Hawaii'

In [81]:
eqPastMonth.loc[3:10, ['place', 'mag']]

Unnamed: 0,place,mag
3,"28 km E of Honaunau-Napoopoo, Hawaii",1.79
4,"16 km S of Guánica, Puerto Rico",2.41
5,"106 km NNE of Hihifo, Tonga",5.0
6,"29 km SE of Mina, Nevada",1.1
7,"28 km E of Honaunau-Napoopoo, Hawaii",1.83
8,"5 km S of Pāhala, Hawaii",2.05
9,"12km ESE of Puebla, B.C., MX",2.89
10,"2km NW of The Geysers, CA",1.42


Let’s repeat: the first argument of `.loc` is not the row number but the index label of that row.

If you need to obtain rows by their position instead, you can use `.iloc`

In [214]:
eqPastMonth.iloc[0]

time                 2021-02-17T04:25:59.180Z
latitude                            35.763168
longitude                         -120.329834
depth                                    9.28
mag                                      1.91
magType                                    md
nst                                      17.0
gap                                      90.0
dmin                                  0.01278
rms                                      0.08
net                                        nc
id                                 nc73524166
updated              2021-02-17T04:27:35.667Z
place                  6km NNW of Cholame, CA
type                               earthquake
horizontalError                          0.38
depthError                               0.52
magError                                 0.16
magNst                                    4.0
status                              automatic
locationSource                             nc
magSource                         

In [84]:
eqPastMonth.iloc[1:5,3:5]

Unnamed: 0,depth,mag
1,20.1,1.1
2,0.3,1.6
3,-1.39,1.79
4,10.0,2.41


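In the first case the row’s position coincides with its index label, so `.iloc[0]` and `.loc[0]` return the same row. The second example shows the difference between `.loc` and `.iloc`: an `.iloc` slice excludes its right boundary, so `.iloc[1:5]` returns the rows at positions 1 to 4, while `.loc[1:5]` also includes the row labeled 5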

In [91]:
eqPastMonth.loc[1:5]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,longMagType,magPlus5,datetime,custom_mag
1,2022-02-16T20:06:36.970Z,35.9375,-120.486336,4.92,1.13,md,15.0,110.0,0.01501,0.06,nc,nc73693791,2022-02-16T20:08:10.701Z,"7km NW of Parkfield, CA",earthquake,0.37,0.55,0.17,5.0,automatic,nc,nc,6.13,6.13,2022-02-16 20:06:36.970,Small
2,2022-02-16T20:01:07.410Z,19.186333,-155.403839,30.43,2.03,md,30.0,181.0,,0.14,hv,hv72919612,2022-02-16T20:04:17.960Z,"8 km ESE of Pāhala, Hawaii",earthquake,0.83,0.97,2.03,4.0,automatic,hv,hv,7.03,7.03,2022-02-16 20:01:07.410,Small
3,2022-02-16T19:40:50.170Z,33.451333,-116.823667,5.38,0.49,ml,13.0,129.0,0.06392,0.22,ci,ci40187616,2022-02-16T19:44:27.151Z,"4km ENE of Aguanga, CA",earthquake,0.73,1.65,0.05,9.0,automatic,ci,ci,5.49,5.49,2022-02-16 19:40:50.170,Small
4,2022-02-16T19:35:33.190Z,36.716499,-121.359497,2.41,2.36,md,34.0,113.0,0.0138,0.12,nc,nc73693786,2022-02-16T19:47:11.919Z,"9km SSW of Tres Pinos, CA",earthquake,0.35,0.58,0.16,35.0,automatic,nc,nc,7.36,7.36,2022-02-16 19:35:33.190,Small
5,2022-02-16T19:35:20.500Z,38.789166,-122.762497,3.43,0.84,md,15.0,96.0,0.01165,0.04,nc,nc73693781,2022-02-16T19:47:12.061Z,"1km NNW of The Geysers, CA",earthquake,0.35,0.69,0.01,2.0,automatic,nc,nc,5.84,5.84,2022-02-16 19:35:20.500,Small
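
The difference becomes more visible once labels and positions no longer coincide, for example after filtering. Below is a minimal sketch on a made-up `quakes` table (the magnitudes and the `1.0` threshold are illustrative assumptions):

```python
import pandas as pd

quakes = pd.DataFrame({"mag": [0.9, 1.13, 2.03, 0.49, 2.36, 0.84]})

# Keep only the stronger events; the surviving rows keep labels 1, 2 and 4.
strong = quakes[quakes["mag"] > 1.0]

label_slice = strong.loc[1:4]      # labels from 1 through 4, right boundary included
position_slice = strong.iloc[1:4]  # positions from 1 up to, not including, 4

print(list(label_slice.index))
print(list(position_slice.index))
```

Here `.loc[1:4]` returns all three filtered rows, while `.iloc[1:4]` skips the row at position 0 and returns only the remaining two.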


### Work with missing data

[[back to top]](#Table-of-Contents)

Pandas primarily uses the value `np.nan` to represent missing data (in tables, missing or empty values are shown as `NaN`). By default, pandas' own computations such as `Series.sum()` skip it. Missing data causes many problems in mathematical and computational tasks with DataFrames and Series, so it is important to know how to deal with these values.

Previously we learned how to check for `null` and `non-null` values in a DataFrame or Series and how to skip rows with `null` values in the table. But what should we do if we need to use rows with `null` data, for example, to find the sum of all values in the dataset?

Let’s try to do this


In [93]:
magError = eqPastMonth['magError']
sum(magError)

nan

The result is unexpected, because there are many `non-null` values in the `eqPastMonth['magError']` Series; Python's built-in `sum` simply adds every element, so a single `NaN` turns the total into `nan`. Sure, we could filter `eqPastMonth['magError']` and keep only the `non-null` values. But what if we need to sum all numerical columns in `eqPastMonth`? Filtering becomes ineffective or too complicated, because we would drop an entire row even if it contains only one `null` value. You can try this yourself.
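
To see why the built-in `sum` and pandas behave differently here, consider this short sketch. The `errors` Series and its values are made up for illustration and only stand in for the `magError` column:

```python
import numpy as np
import pandas as pd

errors = pd.Series([0.135, np.nan, 0.17, 0.05])  # made-up values

builtin_total = sum(errors)              # plain Python addition propagates NaN
pandas_total = errors.sum()              # pandas skips NaN by default (skipna=True)
strict_total = errors.sum(skipna=False)  # ask pandas to propagate NaN as well

print(builtin_total, pandas_total, strict_total)
print(errors.count(), "non-null values out of", len(errors))
```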

To solve this task you can use the pandas method `fillna(value)`, which replaces all `null` values with `value`.


In [97]:
magError = eqPastMonth['magError'].fillna(0)
magError.median()

0.13

In [89]:
eqPastMonth_fillna = eqPastMonth.fillna(0)
eqPastMonth_fillna.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus1,datetime
0,2021-02-24T19:50:07.610Z,19.181,-155.470993,32.860001,2.15,md,42.0,153.0,0.0,0.1,hv,hv72363462,2021-02-24T19:53:30.510Z,"2 km SSE of Pāhala, Hawaii",earthquake,0.68,0.98,0.43,24.0,automatic,hv,hv,2.25,2021-02-24 19:50:07.610
1,2021-02-24T19:46:29.747Z,64.7437,-149.2695,20.1,1.1,ml,0.0,0.0,0.0,0.3,ak,ak0212ja58vx,2021-02-24T19:50:43.341Z,"17 km NNW of Four Mile Road, Alaska",earthquake,0.0,0.6,0.0,0.0,automatic,ak,ak,1.2,2021-02-24 19:46:29.747
2,2021-02-24T19:35:10.940Z,37.3513,-115.6738,0.3,1.6,ml,9.0,263.87,0.317,0.63,nn,nn00801159,2021-02-24T19:40:13.150Z,"33 km S of Rachel, Nevada",earthquake,0.0,40.5,0.0,0.0,automatic,nn,nn,1.7,2021-02-24 19:35:10.940
3,2021-02-24T19:34:35.290Z,19.453667,-155.597504,-1.39,1.79,ml,10.0,79.0,0.0,0.16,hv,hv72363447,2021-02-24T19:40:06.880Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.52,0.36,0.07,3.0,automatic,hv,hv,1.89,2021-02-24 19:34:35.290
4,2021-02-24T19:25:06.150Z,17.821,-66.8773,10.0,2.41,md,11.0,261.0,0.2234,0.16,pr,pr2021055003,2021-02-24T20:06:29.340Z,"16 km S of Guánica, Puerto Rico",earthquake,0.56,0.62,0.1,5.0,reviewed,pr,pr,2.51,2021-02-24 19:25:06.150
5,2021-02-24T19:19:50.122Z,-15.1086,-173.3268,10.0,5.0,mb,0.0,133.0,1.913,0.97,us,us7000ddeg,2021-02-24T19:53:29.704Z,"106 km NNE of Hihifo, Tonga",earthquake,5.4,1.8,0.052,120.0,reviewed,us,us,5.1,2021-02-24 19:19:50.122
6,2021-02-24T19:12:39.950Z,38.1733,-117.9185,5.6,1.1,ml,8.0,105.08,0.035,0.25,nn,nn00801158,2021-02-24T19:36:32.286Z,"29 km SE of Mina, Nevada",earthquake,0.0,1.6,0.0,0.0,automatic,nn,nn,1.2,2021-02-24 19:12:39.950
7,2021-02-24T19:11:05.610Z,19.464333,-155.591995,-1.02,1.83,ml,14.0,58.0,0.0,0.31,hv,hv72363427,2021-02-24T19:16:35.330Z,"28 km E of Honaunau-Napoopoo, Hawaii",earthquake,0.5,0.31,3.66,6.0,automatic,hv,hv,1.93,2021-02-24 19:11:05.610
8,2021-02-24T19:07:06.420Z,19.151167,-155.474503,36.98,2.05,md,33.0,168.0,0.0,0.11,hv,hv72363422,2021-02-24T19:10:35.040Z,"5 km S of Pāhala, Hawaii",earthquake,0.79,1.01,0.94,7.0,automatic,hv,hv,2.15,2021-02-24 19:07:06.420
9,2021-02-24T19:05:11.260Z,32.537333,-115.230667,20.92,2.89,ml,23.0,64.0,0.1332,0.26,ci,ci39800680,2021-02-24T19:15:41.440Z,"12km ESE of Puebla, B.C., MX",earthquake,0.69,1.27,0.24,27.0,automatic,ci,ci,2.99,2021-02-24 19:05:11.260


Thus, we replace all `NaN` items with `0`. By default `fillna()` returns a new DataFrame; if `inplace=True` is passed, the DataFrame is modified in place instead.
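
Filling every column with `0` is not always meaningful, for instance in text columns. As a hedged sketch, `fillna()` also accepts a dictionary that maps column names to fill values; the `toy` table and the `"unknown"` placeholder below are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"magError": [0.135, np.nan, 0.17], "magType": ["ml", None, "md"]}
)

# fillna returns a new DataFrame; the original keeps its missing values.
filled = toy.fillna({"magError": 0, "magType": "unknown"})

print(toy.isna().sum().sum())     # missing values still present in the original
print(filled.isna().sum().sum())  # missing values left in the filled copy
print(filled)
```

Keep in mind that filling with `0` changes statistics such as the mean and the median, so the fill value should match the computation you plan to run.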
   
To keep only the rows without `null` values, you can use the `dropna()` method

In [98]:
# Drop every row that contains at least one null value (axis=0 means rows).
eqPastMonth_fillna = eqPastMonth.dropna(axis=0)
# Average only the numeric columns; text and datetime columns are skipped.
print(eqPastMonth_fillna.mean(numeric_only=True))
eqPastMonth_fillna.head(10)




latitude            37.080326
longitude         -115.843078
depth                6.737574
mag                  1.165225
nst                 20.367261
gap                108.646107
dmin                 0.086803
rms                  0.119242
horizontalError      0.520107
depthError           2.486723
magError             0.167475
magNst              11.791979
longMagType          6.165225
magPlus5             6.165225
dtype: float64


Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,longMagType,magPlus5,datetime,custom_mag
0,2022-02-16T20:07:51.700Z,33.165667,-116.615167,-0.08,0.9,ml,13.0,178.0,0.3278,0.36,ci,ci40187648,2022-02-16T20:11:28.370Z,"10km N of Julian, CA",earthquake,1.86,31.61,0.135,9.0,automatic,ci,ci,5.9,5.9,2022-02-16 20:07:51.700,Small
1,2022-02-16T20:06:36.970Z,35.9375,-120.486336,4.92,1.13,md,15.0,110.0,0.01501,0.06,nc,nc73693791,2022-02-16T20:08:10.701Z,"7km NW of Parkfield, CA",earthquake,0.37,0.55,0.17,5.0,automatic,nc,nc,6.13,6.13,2022-02-16 20:06:36.970,Small
3,2022-02-16T19:40:50.170Z,33.451333,-116.823667,5.38,0.49,ml,13.0,129.0,0.06392,0.22,ci,ci40187616,2022-02-16T19:44:27.151Z,"4km ENE of Aguanga, CA",earthquake,0.73,1.65,0.05,9.0,automatic,ci,ci,5.49,5.49,2022-02-16 19:40:50.170,Small
4,2022-02-16T19:35:33.190Z,36.716499,-121.359497,2.41,2.36,md,34.0,113.0,0.0138,0.12,nc,nc73693786,2022-02-16T19:47:11.919Z,"9km SSW of Tres Pinos, CA",earthquake,0.35,0.58,0.16,35.0,automatic,nc,nc,7.36,7.36,2022-02-16 19:35:33.190,Small
5,2022-02-16T19:35:20.500Z,38.789166,-122.762497,3.43,0.84,md,15.0,96.0,0.01165,0.04,nc,nc73693781,2022-02-16T19:47:12.061Z,"1km NNW of The Geysers, CA",earthquake,0.35,0.69,0.01,2.0,automatic,nc,nc,5.84,5.84,2022-02-16 19:35:20.500,Small
6,2022-02-16T19:32:55.130Z,37.988834,-122.454666,1.62,1.53,md,10.0,114.0,0.02055,0.06,nc,nc73693776,2022-02-16T19:44:10.907Z,"6km E of Santa Venetia, CA",earthquake,0.54,0.47,0.2,6.0,automatic,nc,nc,6.53,6.53,2022-02-16 19:32:55.130,Small
9,2022-02-16T19:17:25.040Z,33.43,-117.6585,13.95,1.16,ml,21.0,218.0,0.05564,0.13,ci,ci40187592,2022-02-16T19:31:45.245Z,"4km W of San Clemente, CA",earthquake,0.37,0.57,0.099,9.0,reviewed,ci,ci,6.16,6.16,2022-02-16 19:17:25.040,Small
13,2022-02-16T18:26:44.470Z,33.721667,-116.821333,16.89,0.55,ml,12.0,189.0,0.08994,0.07,ci,ci40187560,2022-02-16T18:42:09.514Z,"7km ESE of Valle Vista, CA",earthquake,0.42,0.36,0.274,3.0,reviewed,ci,ci,5.55,5.55,2022-02-16 18:26:44.470,Small
14,2022-02-16T18:19:01.510Z,33.715667,-116.821667,16.88,1.2,ml,13.0,162.0,0.08818,0.1,ci,ci37392956,2022-02-16T19:17:32.158Z,"8km ESE of Valle Vista, CA",earthquake,0.47,0.39,0.095,5.0,reviewed,ci,ci,6.2,6.2,2022-02-16 18:19:01.510,Small
15,2022-02-16T18:18:53.530Z,33.7195,-116.824,16.67,1.27,ml,42.0,70.0,0.08928,0.12,ci,ci40187552,2022-02-16T19:11:01.990Z,"7km ESE of Valle Vista, CA",earthquake,0.18,0.29,0.168,25.0,reviewed,ci,ci,6.27,6.27,2022-02-16 18:18:53.530,Small


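We can control which `null` values cause a row to be dropped with the `subset` and `how` parameters of `dropna()`: `subset` sets the columns to check, and `how` sets whether a row is dropped when any (`'any'`) or all (`'all'`) of those values are missing.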
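
The effect of these two parameters can be sketched on a small made-up table (the `toy` DataFrame and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "mag": [0.9, 1.13, np.nan, np.nan],
        "magError": [0.135, np.nan, 0.17, np.nan],
        "place": ["A", "B", "C", "D"],
    }
)

# Default how="any": drop rows with at least one missing value in any column.
any_missing = toy.dropna()
# Check a single column only.
only_mag_error = toy.dropna(subset=["magError"])
# Drop a row only when both listed columns are missing.
all_missing = toy.dropna(subset=["mag", "magError"], how="all")

print(list(any_missing.index))
print(list(only_mag_error.index))
print(list(all_missing.index))
```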

> ### Exercise 1

> - Get the type of the `"latitude"` column in `eqPastMonth`.

> - In `eqPastMonth`, find all rows where `magType` equals `"md"`, `mag` is less than `5`, and `magError` is not null. Name the resulting DataFrame `eqPastMonth_md_large`.

In [221]:
# type your code here
eqPastMonth_md_large = eqPastMonth