---
<center><h1>Basic intro to pandas</h1></center>

<center><h2>Work with pandas DataFrames: filtering, indexing and missing data</h2></center>
---

## Table of Contents

- [Work with pandas DataFrames: filtering, indexing and missing data](#Work-with-pandas-DataFrames:-filtering,-indexing-and-missing-data)
    * [Get basic information](#Get-basic-information)
    * [Conditional indexing and selection](#Conditional-indexing-and-selection)
    * [Work with indexes and MultiIndex option](#Work-with-indexes-and-MultiIndex-option)
    * [Selection by label and position](#Selection-by-label-and-position)
    * [Work with missing data](#Work-with-missing-data)
    - [*Exercise 1*](#Exercise-1)

In [44]:
import pandas as pd
import numpy as np
import random
import datetime  # needed later for datetime.datetime.strptime

## Work with pandas DataFrames: filtering, indexing and missing data

[[back to top]](#Table-of-Contents)

In this part we continue working with DataFrames and learn:
1. how to get basic information about a DataFrame and its contents;
2. how to take a slice of a DataFrame and select the rows that satisfy given conditions;
3. how to change a DataFrame's index and use advanced (multi-level) indexing;
4. how to select rows by label and by position;
5. how to work with missing data.

The lesson is therefore divided into code blocks that follow the points listed above. In later posts we will continue with pandas and look at its other features.

In [45]:
url = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv"
eqPastMonth = pd.read_csv(url)
eqPastMonth.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2025-04-16T15:22:14.240Z,38.821499,-122.762337,1.78,1.03,md,14.0,123.0,0.01074,0.02,nc,nc75166321,2025-04-16T15:23:50.969Z,"3 km W of Cobb, CA",earthquake,0.29,0.53,0.09,15.0,automatic,nc,nc
1,2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv
2,2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc
3,2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb
4,2025-04-16T14:59:44.190Z,38.803665,-122.779831,3.24,0.28,md,13.0,84.0,0.009603,0.02,nc,nc75166291,2025-04-16T15:17:21.353Z,"4 km NNW of The Geysers, CA",earthquake,0.33,0.92,0.12,12.0,automatic,nc,nc
5,2025-04-16T14:42:52.624Z,64.5566,-149.2173,0.8,1.6,ml,,,,0.45,ak,ak0254vjjyy6,2025-04-16T14:44:46.021Z,"5 km W of Nenana, Alaska",earthquake,,0.3,,,automatic,ak,ak
6,2025-04-16T14:38:39.010Z,36.940498,-121.473,11.03,1.34,md,5.0,190.0,0.1241,0.03,nc,nc75166286,2025-04-16T14:57:17.213Z,"11 km SE of Gilroy, CA",earthquake,1.19,1.93,,1.0,automatic,nc,nc
7,2025-04-16T14:34:00.340Z,46.615833,-119.797333,7.74,0.93,ml,11.0,80.0,0.02353,0.08,uw,uw62091961,2025-04-16T15:02:51.910Z,"11 km SE of Desert Aire, Washington",earthquake,0.26,0.18,0.139248,11.0,reviewed,uw,uw
8,2025-04-16T14:30:22.000Z,33.499,-116.441833,7.48,0.94,ml,44.0,73.0,0.1341,0.22,ci,ci40930047,2025-04-16T14:46:05.920Z,"22 km SW of La Quinta, CA",earthquake,0.21,1.04,0.166,22.0,reviewed,ci,ci
9,2025-04-16T14:29:29.390Z,38.799168,-122.752167,2.1,0.74,md,9.0,97.0,0.006073,0.02,nc,nc75166276,2025-04-16T14:47:19.151Z,"2 km NNE of The Geysers, CA",earthquake,0.4,0.9,0.07,9.0,automatic,nc,nc


### Get basic information

[[back to top]](#Table-of-Contents)

pandas provides several attributes and methods for getting basic information about a DataFrame.

Let's take a look at the types of the `eqPastMonth` columns:

In [46]:
eqPastMonth.dtypes

time                object
latitude           float64
longitude          float64
depth              float64
mag                float64
magType             object
nst                float64
gap                float64
dmin               float64
rms                float64
net                 object
id                  object
updated             object
place               object
type                object
horizontalError    float64
depthError         float64
magError           float64
magNst             float64
status              object
locationSource      object
magSource           object
dtype: object

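You may notice that the dtype of the `time` column is `object` by default, which here means it holds strings. You can change this with the `apply` method, which applies a function to every element of a Series (or to every row or column of a DataFrame). A `lambda` is a shorthand way to write your own function inline. The next cells first demonstrate `apply` on the `mag` column, with a lambda and then with a named function, before converting `time`.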

In [47]:
eqPastMonth['magPlus5'] = eqPastMonth['mag'].apply(lambda x: x + 5)
eqPastMonth.head(5)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5
0,2025-04-16T15:22:14.240Z,38.821499,-122.762337,1.78,1.03,md,14.0,123.0,0.01074,0.02,nc,nc75166321,2025-04-16T15:23:50.969Z,"3 km W of Cobb, CA",earthquake,0.29,0.53,0.09,15.0,automatic,nc,nc,6.03
1,2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv,7.16
2,2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc,8.81
3,2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb,6.76
4,2025-04-16T14:59:44.190Z,38.803665,-122.779831,3.24,0.28,md,13.0,84.0,0.009603,0.02,nc,nc75166291,2025-04-16T15:17:21.353Z,"4 km NNW of The Geysers, CA",earthquake,0.33,0.92,0.12,12.0,automatic,nc,nc,5.28


In [48]:
def addFive(x):
    return x + 5
eqPastMonth['magPlus5'] = eqPastMonth['mag'].apply(addFive)
eqPastMonth.head(5)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5
0,2025-04-16T15:22:14.240Z,38.821499,-122.762337,1.78,1.03,md,14.0,123.0,0.01074,0.02,nc,nc75166321,2025-04-16T15:23:50.969Z,"3 km W of Cobb, CA",earthquake,0.29,0.53,0.09,15.0,automatic,nc,nc,6.03
1,2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv,7.16
2,2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc,8.81
3,2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb,6.76
4,2025-04-16T14:59:44.190Z,38.803665,-122.779831,3.24,0.28,md,13.0,84.0,0.009603,0.02,nc,nc75166291,2025-04-16T15:17:21.353Z,"4 km NNW of The Geysers, CA",earthquake,0.33,0.92,0.12,12.0,automatic,nc,nc,5.28


In [49]:
eqPastMonth['datetime'] = eqPastMonth['time'].apply(lambda x: (datetime.datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ')))
eqPastMonth.head(5)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5,datetime
0,2025-04-16T15:22:14.240Z,38.821499,-122.762337,1.78,1.03,md,14.0,123.0,0.01074,0.02,nc,nc75166321,2025-04-16T15:23:50.969Z,"3 km W of Cobb, CA",earthquake,0.29,0.53,0.09,15.0,automatic,nc,nc,6.03,2025-04-16 15:22:14.240
1,2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv,7.16,2025-04-16 15:19:27.860
2,2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc,8.81,2025-04-16 15:04:56.790
3,2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb,6.76,2025-04-16 15:00:49.220
4,2025-04-16T14:59:44.190Z,38.803665,-122.779831,3.24,0.28,md,13.0,84.0,0.009603,0.02,nc,nc75166291,2025-04-16T15:17:21.353Z,"4 km NNW of The Geysers, CA",earthquake,0.33,0.92,0.12,12.0,automatic,nc,nc,5.28,2025-04-16 14:59:44.190


In [50]:
# Note: despite the column name, this adds 0.1 (not 1) to each magnitude
eqPastMonth['magPlus1'] = eqPastMonth['mag'].apply(lambda x: x + 0.1)
eqPastMonth.head(5)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5,datetime,magPlus1
0,2025-04-16T15:22:14.240Z,38.821499,-122.762337,1.78,1.03,md,14.0,123.0,0.01074,0.02,nc,nc75166321,2025-04-16T15:23:50.969Z,"3 km W of Cobb, CA",earthquake,0.29,0.53,0.09,15.0,automatic,nc,nc,6.03,2025-04-16 15:22:14.240,1.13
1,2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv,7.16,2025-04-16 15:19:27.860,2.26
2,2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc,8.81,2025-04-16 15:04:56.790,3.91
3,2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb,6.76,2025-04-16 15:00:49.220,1.86
4,2025-04-16T14:59:44.190Z,38.803665,-122.779831,3.24,0.28,md,13.0,84.0,0.009603,0.02,nc,nc75166291,2025-04-16T15:17:21.353Z,"4 km NNW of The Geysers, CA",earthquake,0.33,0.92,0.12,12.0,automatic,nc,nc,5.28,2025-04-16 14:59:44.190,0.38


Notice the new `datetime` column. Its dtype is `datetime64[ns]`, as the next cell confirms.

In [51]:
eqPastMonth.dtypes

time                       object
latitude                  float64
longitude                 float64
depth                     float64
mag                       float64
magType                    object
nst                       float64
gap                       float64
dmin                      float64
rms                       float64
net                        object
id                         object
updated                    object
place                      object
type                       object
horizontalError           float64
depthError                float64
magError                  float64
magNst                    float64
status                     object
locationSource             object
magSource                  object
magPlus5                  float64
datetime           datetime64[ns]
magPlus1                  float64
dtype: object
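The same conversion can also be sketched without `apply`, using `pd.to_datetime`, which parses the whole column at once. This is only an illustration on a toy Series (the two timestamps are copied from the rows shown above; the names `times`, `parsed` and `naive` are made up for this example). Note one difference from the `strptime` approach: with `utc=True` the result is timezone-aware, so the sketch strips the timezone to get naive values like those in the `datetime` column.

```python
import pandas as pd

# Toy stand-in for the 'time' column (hypothetical variable names).
times = pd.Series(["2025-04-16T15:22:14.240Z", "2025-04-16T15:19:27.860Z"])

# Vectorized parsing of ISO 8601 strings; utc=True yields timezone-aware values.
parsed = pd.to_datetime(times, utc=True)

# Drop the timezone to get naive timestamps, as strptime produced above.
naive = parsed.dt.tz_localize(None)

print(parsed.dtype)
print(naive.iloc[0])
```

On a column with thousands of rows this is usually noticeably faster than calling `strptime` once per element.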

You can also get a concise summary of the DataFrame (index range, columns, non-null counts and dtypes) with `info()`:

In [52]:
eqPastMonth.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10449 entries, 0 to 10448
Data columns (total 25 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   time             10449 non-null  object        
 1   latitude         10449 non-null  float64       
 2   longitude        10449 non-null  float64       
 3   depth            10449 non-null  float64       
 4   mag              10449 non-null  float64       
 5   magType          10449 non-null  object        
 6   nst              8793 non-null   float64       
 7   gap              8793 non-null   float64       
 8   dmin             8793 non-null   float64       
 9   rms              10448 non-null  float64       
 10  net              10449 non-null  object        
 11  id               10449 non-null  object        
 12  updated          10449 non-null  object        
 13  place            10449 non-null  object        
 14  type             10449 non-null  object        

The `info()` method shows (from top to bottom)
+ that `eqPastMonth` is an instance of the DataFrame class; this is the same information the `type()` function gives;
+ the number of rows in the DataFrame;
+ the dtype of each column and the number of non-null values in it; `dtypes` gave the dtypes alone in a shorter form;
+ the memory usage of the DataFrame, etc.

The `describe()` method quickly gives the count, mean, minimum, maximum, quartiles, standard deviation, etc. of each numeric column of a DataFrame. Here it is applied only to the rows with a non-negative magnitude:

In [53]:
eqPastMonth[eqPastMonth["mag"] >= 0].describe()

Unnamed: 0,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst,magPlus5,datetime,magPlus1
count,9674.0,9674.0,9674.0,9674.0,8019.0,8019.0,8019.0,9673.0,7436.0,9673.0,8002.0,8015.0,9674.0,9674,9674.0
mean,38.513899,-108.297087,21.590463,1.639205,25.248784,107.793214,0.505618,0.269452,1.701874,2.346176,0.175227,19.776669,6.639205,2025-04-01 05:23:36.860396544,1.739205
min,-65.0709,-179.9966,-10.0,0.0,0.0,11.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2025-03-17 15:26:17.490000,0.1
25%,33.462875,-146.3323,2.95,0.8125,11.0,61.0,0.01343,0.09,0.25,0.45,0.108,8.0,5.8125,2025-03-23 22:36:29.522999808,0.9125
50%,38.800999,-121.264167,7.64,1.33,18.0,88.0,0.0544,0.17,0.42,0.73,0.159,13.0,6.33,2025-04-01 07:26:52.044999936,1.43
75%,49.452375,-115.1185,14.5,2.0,31.0,138.0,0.1327,0.37,0.99,1.61,0.213,23.0,7.0,2025-04-09 01:41:44.199749888,2.1
max,83.5057,179.9283,645.749,7.7,324.0,359.0,52.015,3.83,41.39,1026.0,2.49,624.0,12.7,2025-04-16 15:22:14.240000,7.8
std,18.204728,69.220946,55.719778,1.198945,23.528425,64.937323,2.176155,0.276421,3.139211,11.75461,0.110006,30.327191,1.198945,,1.198945


### Conditional indexing and selection

[[back to top]](#Table-of-Contents)

As mentioned earlier, a DataFrame is a collection of Series objects. This lets you select a single column from the DataFrame (you get a Series) or several columns (you get another DataFrame).

In [54]:
eqPastMonth_mag = eqPastMonth['mag']
# Here we are showing only one column, i.e. a Series
print('type:', type(eqPastMonth_mag))
eqPastMonth_mag.head(10)

type: <class 'pandas.core.series.Series'>


0    1.03
1    2.16
2    3.81
3    1.76
4    0.28
5    1.60
6    1.34
7    0.93
8    0.94
9    0.74
Name: mag, dtype: float64

In [55]:
eqPastMonth_record = eqPastMonth[['time','depth', 'mag', 'place']]
# Here we are showing four columns, i.e. a new DataFrame
print('type:', type(eqPastMonth_record))
eqPastMonth_record.tail()

type: <class 'pandas.core.frame.DataFrame'>


Unnamed: 0,time,depth,mag,place
10444,2025-03-17T15:51:07.140Z,1.04,-0.18,"85 km NW of Karluk, Alaska"
10445,2025-03-17T15:43:30.730Z,11.27,0.5,"3 km ESE of Lake Henshaw, CA"
10446,2025-03-17T15:41:51.772Z,35.0,4.9,"42 km ENE of Santa Maria, Philippines"
10447,2025-03-17T15:41:11.314Z,58.1,1.6,"1 km ENE of Nikolaevsk, Alaska"
10448,2025-03-17T15:26:17.490Z,1.27,1.1,"0 km NW of The Geysers, CA"


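You can also access a single column as an attribute (this works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method):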

In [56]:
eqPastMonth_record.time

0        2025-04-16T15:22:14.240Z
1        2025-04-16T15:19:27.860Z
2        2025-04-16T15:04:56.790Z
3        2025-04-16T15:00:49.220Z
4        2025-04-16T14:59:44.190Z
                   ...           
10444    2025-03-17T15:51:07.140Z
10445    2025-03-17T15:43:30.730Z
10446    2025-03-17T15:41:51.772Z
10447    2025-03-17T15:41:11.314Z
10448    2025-03-17T15:26:17.490Z
Name: time, Length: 10449, dtype: object

Filtered DataFrames can be obtained with comparison and logical operators:

In [57]:
# Let's display only large earthquakes
eqPastMonth_large = eqPastMonth[eqPastMonth['mag'] > 5]
eqPastMonth_large.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5,datetime,magPlus1
27,2025-04-16T12:41:01.967Z,-32.4632,-14.187,10.0,5.6,mb,52.0,66.0,4.845,1.07,us,us6000q6fi,2025-04-16T15:05:07.589Z,southern Mid-Atlantic Ridge,earthquake,10.78,1.865,0.06,99.0,reviewed,us,us,10.6,2025-04-16 12:41:01.967,5.7
127,2025-04-16T03:49:49.620Z,51.1546,-178.4009,39.929,5.2,mwr,111.0,126.0,0.476,1.03,us,us6000q6d9,2025-04-16T09:59:55.041Z,"146 km SW of Adak, Alaska",earthquake,4.74,6.226,0.083,14.0,reviewed,us,us,10.2,2025-04-16 03:49:49.620,5.3
136,2025-04-16T02:25:57.691Z,-47.9686,99.7924,10.0,5.2,mb,36.0,71.0,19.207,0.95,us,us6000q6d3,2025-04-16T02:43:22.040Z,southeast Indian Ridge,earthquake,11.85,1.903,0.097,35.0,reviewed,us,us,10.2,2025-04-16 02:25:57.691,5.3
147,2025-04-16T01:42:59.982Z,-47.8431,99.7581,10.0,6.6,mww,61.0,49.0,19.334,0.65,us,us6000q6cs,2025-04-16T04:07:08.865Z,southeast Indian Ridge,earthquake,10.95,1.856,0.093,11.0,reviewed,us,us,11.6,2025-04-16 01:42:59.982,6.7
173,2025-04-15T23:13:59.032Z,35.9389,70.564,99.0,5.6,mww,114.0,21.0,1.84,0.64,us,us6000q6c3,2025-04-16T13:59:58.403Z,"66 km NNW of Pārūn, Afghanistan",earthquake,7.22,1.898,0.073,18.0,reviewed,us,us,10.6,2025-04-15 23:13:59.032,5.7
196,2025-04-15T21:42:42.424Z,5.7983,124.2278,10.0,5.6,mww,85.0,38.0,1.635,0.45,us,us6000q6br,2025-04-16T14:47:57.797Z,"38 km SSW of Maguling, Philippines",earthquake,5.8,1.812,0.068,21.0,reviewed,us,us,10.6,2025-04-15 21:42:42.424,5.7
261,2025-04-15T15:49:03.235Z,-6.5082,154.9681,79.751,5.2,mww,99.0,47.0,3.619,0.59,us,us6000q65u,2025-04-15T16:16:04.040Z,"60 km WSW of Panguna, Papua New Guinea",earthquake,9.09,5.116,0.08,15.0,reviewed,us,us,10.2,2025-04-15 15:49:03.235,5.3
317,2025-04-15T11:22:33.624Z,-16.7724,-23.4304,10.0,5.1,mb,74.0,62.0,12.472,0.71,us,us6000q64r,2025-04-15T11:42:00.040Z,South Atlantic Ocean,earthquake,10.26,1.863,0.049,135.0,reviewed,us,us,10.1,2025-04-15 11:22:33.624,5.2
762,2025-04-14T17:08:28.110Z,33.035833,-116.594833,14.29,5.21,mw,119.0,20.0,0.04508,0.21,ci,ci40925991,2025-04-16T15:06:17.433Z,"5 km S of Julian, CA",earthquake,0.12,0.3,,6.0,reviewed,ci,ci,10.21,2025-04-14 17:08:28.110,5.31
957,2025-04-14T01:33:14.326Z,0.4975,126.1491,38.218,5.3,mb,56.0,37.0,1.247,0.85,us,us6000q5vw,2025-04-14T01:43:49.040Z,"141 km WSW of Ternate, Indonesia",earthquake,5.32,10.403,0.081,52.0,reviewed,us,us,10.3,2025-04-14 01:33:14.326,5.4


In [58]:
# Get records of large (mag > 5) earthquakes that occurred in the northern hemisphere
filtered_df_1 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['latitude'] > 0)]
filtered_df_1.describe()

Unnamed: 0,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst,magPlus5,datetime,magPlus1
count,47.0,47.0,47.0,47.0,46.0,46.0,46.0,47.0,46.0,47.0,45.0,46.0,47.0,47,47.0
mean,28.638222,23.513046,33.390596,5.570426,132.847826,62.543478,3.923806,0.75234,7.001522,2.988574,0.057778,98.347826,10.570426,2025-04-01 01:33:28.168893696,5.670426
min,0.4975,-178.4009,7.673,5.1,56.0,20.0,0.04508,0.21,0.12,0.2,0.023,6.0,10.1,2025-03-17 22:23:37.279000,5.2
25%,7.53875,-33.5404,10.0,5.2,93.25,33.5,0.99075,0.63,5.34,1.832,0.041,18.0,10.2,2025-03-23 13:21:58.152000,5.3
50%,24.8678,70.7895,10.0,5.4,113.0,56.0,2.11,0.73,7.435,1.868,0.052,39.5,10.4,2025-04-01 16:54:52.984999936,5.5
75%,51.15375,118.36235,29.3475,5.8,157.25,82.0,4.9645,0.855,8.6575,4.147,0.073,97.75,10.8,2025-04-07 13:36:41.872000,5.9
max,83.5057,172.8099,382.253,7.7,322.0,176.0,13.967,1.54,10.66,10.403,0.117,624.0,12.7,2025-04-16 03:49:49.620000,7.8
std,22.133357,114.699,60.403762,0.554644,59.682853,36.34997,3.922397,0.221774,2.239911,2.065459,0.023241,141.983758,0.554644,,0.554644


In [59]:
# Get records of large (mag > 5) earthquakes that occurred in the southern hemisphere
filtered_df_1 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['latitude'] < 0)]
filtered_df_1.describe()

Unnamed: 0,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst,magPlus5,datetime,magPlus1
count,72.0,72.0,72.0,72.0,72.0,72.0,72.0,72.0,72.0,72.0,72.0,72.0,72.0,72,72.0
mean,-23.133233,40.121068,41.872139,5.426389,79.694444,66.375,4.728958,0.836528,9.140556,3.133333,0.063708,79.416667,10.426389,2025-04-02 12:49:36.277972224,5.526389
min,-65.0709,-178.7644,10.0,5.1,24.0,11.0,0.083,0.47,2.73,1.682,0.025,8.0,10.1,2025-03-17 17:32:18.505000,5.2
25%,-31.81115,-69.667375,10.0,5.1,52.0,41.0,2.10475,0.67,7.4275,1.842,0.0505,24.0,10.1,2025-03-26 07:16:05.243249920,5.2
50%,-17.94455,121.6061,10.0,5.2,72.0,60.0,3.2585,0.815,9.4,1.892,0.0625,48.5,10.2,2025-04-04 20:57:08.292499968,5.3
75%,-6.2971,151.605175,57.36775,5.6,92.0,76.5,4.977,0.9525,10.585,4.656,0.075,102.75,10.6,2025-04-08 19:48:30.720750080,5.7
max,-2.831,178.627,347.281,7.0,324.0,169.0,25.295,1.52,13.47,10.821,0.11,525.0,12.0,2025-04-16 12:41:01.967000,7.1
std,19.322958,130.52117,61.149012,0.454397,47.515643,35.340372,4.926695,0.216724,2.196999,2.036073,0.018664,88.382133,0.454397,,0.454397


In [60]:
# Get records of large (mag > 5) earthquakes in the western hemisphere between 120°W and the prime meridian, and keep only some columns in the output
filtered_df_2 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['longitude'] < 0) & (eqPastMonth['longitude'] > -120)][['depth', 'mag', 'place']]
filtered_df_2.head(10)

Unnamed: 0,depth,mag,place
27,10.0,5.6,southern Mid-Atlantic Ridge
317,10.0,5.1,South Atlantic Ocean
762,14.29,5.21,"5 km S of Julian, CA"
1083,10.0,5.1,West Chile Rise
1262,10.0,5.2,South Sandwich Islands region
2219,10.0,5.1,Easter Island region
2516,10.0,5.3,central Mid-Atlantic Ridge
2772,10.0,5.2,southeast central Pacific Ocean
2854,10.0,5.3,central Mid-Atlantic Ridge
3674,27.745,5.6,South Sandwich Islands region


You can also use the `isin(values)` method to check whether each item of a Series is contained in a collection of values, the `isnull()` method to detect missing (`NaN`) values, and the boolean operators `&` (`AND`) and `|` (`OR`) to build compound conditions.
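These three tools can be sketched on a small hand-made DataFrame. The frame `toy` and its values are made up for illustration (the column names mirror the earthquake data); the same expressions should work on `eqPastMonth`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same column names as the earthquake feed.
toy = pd.DataFrame({
    "mag": [1.03, 5.6, 0.28, 6.6],
    "net": ["nc", "us", "nc", "us"],
    "nst": [14.0, np.nan, 13.0, 61.0],
})

# isin: keep rows whose 'net' value is one of the listed networks.
in_networks = toy[toy["net"].isin(["nc", "hv"])]

# isnull: keep rows where 'nst' is missing (NaN).
missing_nst = toy[toy["nst"].isnull()]

# | (OR): keep rows that are either very large or have a missing 'nst'.
either = toy[(toy["mag"] > 6) | (toy["nst"].isnull())]

print(in_networks)
print(missing_nst)
print(either)
```

Each condition is wrapped in parentheses because `&` and `|` bind more tightly than comparison operators in Python.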

As you can see, after filtering the resulting DataFrames keep the original row labels, so their indexes are no longer consecutive. To fix this you can write the following:

In [61]:
filtered_df_2.reset_index().head(10)

Unnamed: 0,index,depth,mag,place
0,27,10.0,5.6,southern Mid-Atlantic Ridge
1,317,10.0,5.1,South Atlantic Ocean
2,762,14.29,5.21,"5 km S of Julian, CA"
3,1083,10.0,5.1,West Chile Rise
4,1262,10.0,5.2,South Sandwich Islands region
5,2219,10.0,5.1,Easter Island region
6,2516,10.0,5.3,central Mid-Atlantic Ridge
7,2772,10.0,5.2,southeast central Pacific Ocean
8,2854,10.0,5.3,central Mid-Atlantic Ridge
9,3674,27.745,5.6,South Sandwich Islands region


This renumbers the rows starting from 0; the old labels are kept in a new `index` column.
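If the old labels are not needed, `reset_index` accepts `drop=True`. A minimal sketch on a toy frame (the name `toy` and its three rows are illustrative, with labels borrowed from the filtered output above):

```python
import pandas as pd

# Toy frame with non-consecutive labels, as left behind by filtering.
toy = pd.DataFrame({"mag": [5.6, 5.1, 5.21]}, index=[27, 317, 762])

kept = toy.reset_index()              # old labels move into an 'index' column
dropped = toy.reset_index(drop=True)  # old labels are discarded

print(kept)
print(dropped)
```

In both cases `toy` itself is left unchanged, because `reset_index` returns a new DataFrame.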

Also remember that you can add new columns and rows to the DataFrame:

In [62]:
# create a new custom_mag column and fill it with empty strings
eqPastMonth['custom_mag'] = ''
# then label each earthquake 'Small' (mag < 5) or 'Large'
eqPastMonth['custom_mag'] = np.where(eqPastMonth['mag'] < 5, 'Small', "Large")
eqPastMonth.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5,datetime,magPlus1,custom_mag
0,2025-04-16T15:22:14.240Z,38.821499,-122.762337,1.78,1.03,md,14.0,123.0,0.01074,0.02,nc,nc75166321,2025-04-16T15:23:50.969Z,"3 km W of Cobb, CA",earthquake,0.29,0.53,0.09,15.0,automatic,nc,nc,6.03,2025-04-16 15:22:14.240,1.13,Small
1,2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv,7.16,2025-04-16 15:19:27.860,2.26,Small
2,2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc,8.81,2025-04-16 15:04:56.790,3.91,Small
3,2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb,6.76,2025-04-16 15:00:49.220,1.86,Small
4,2025-04-16T14:59:44.190Z,38.803665,-122.779831,3.24,0.28,md,13.0,84.0,0.009603,0.02,nc,nc75166291,2025-04-16T15:17:21.353Z,"4 km NNW of The Geysers, CA",earthquake,0.33,0.92,0.12,12.0,automatic,nc,nc,5.28,2025-04-16 14:59:44.190,0.38,Small
5,2025-04-16T14:42:52.624Z,64.5566,-149.2173,0.8,1.6,ml,,,,0.45,ak,ak0254vjjyy6,2025-04-16T14:44:46.021Z,"5 km W of Nenana, Alaska",earthquake,,0.3,,,automatic,ak,ak,6.6,2025-04-16 14:42:52.624,1.7,Small
6,2025-04-16T14:38:39.010Z,36.940498,-121.473,11.03,1.34,md,5.0,190.0,0.1241,0.03,nc,nc75166286,2025-04-16T14:57:17.213Z,"11 km SE of Gilroy, CA",earthquake,1.19,1.93,,1.0,automatic,nc,nc,6.34,2025-04-16 14:38:39.010,1.44,Small
7,2025-04-16T14:34:00.340Z,46.615833,-119.797333,7.74,0.93,ml,11.0,80.0,0.02353,0.08,uw,uw62091961,2025-04-16T15:02:51.910Z,"11 km SE of Desert Aire, Washington",earthquake,0.26,0.18,0.139248,11.0,reviewed,uw,uw,5.93,2025-04-16 14:34:00.340,1.03,Small
8,2025-04-16T14:30:22.000Z,33.499,-116.441833,7.48,0.94,ml,44.0,73.0,0.1341,0.22,ci,ci40930047,2025-04-16T14:46:05.920Z,"22 km SW of La Quinta, CA",earthquake,0.21,1.04,0.166,22.0,reviewed,ci,ci,5.94,2025-04-16 14:30:22.000,1.04,Small
9,2025-04-16T14:29:29.390Z,38.799168,-122.752167,2.1,0.74,md,9.0,97.0,0.006073,0.02,nc,nc75166276,2025-04-16T14:47:19.151Z,"2 km NNE of The Geysers, CA",earthquake,0.4,0.9,0.07,9.0,automatic,nc,nc,5.74,2025-04-16 14:29:29.390,0.84,Small
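The cell above adds a column. Adding rows can be sketched with `pd.concat`, which stacks DataFrames on top of each other. The frames `toy` and `new_row` below are made up for illustration, and the appended record is hypothetical data, not a real event.

```python
import pandas as pd

# Hypothetical two-row frame with columns named like the earthquake data.
toy = pd.DataFrame({
    "mag": [1.03, 5.6],
    "place": ["3 km W of Cobb, CA", "southern Mid-Atlantic Ridge"],
})

# The row to add, wrapped in its own one-row DataFrame (made-up values).
new_row = pd.DataFrame({"mag": [2.5], "place": ["hypothetical location"]})

# ignore_index=True renumbers the result 0..n-1 instead of repeating label 0.
extended = pd.concat([toy, new_row], ignore_index=True)

print(extended)
```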


### Work with indexes and MultiIndex option

[[back to top]](#Table-of-Contents)

pandas lets you set a custom index on a DataFrame. It can be defined when the DataFrame is created:

In [63]:
import random
indexes = [random.randrange(0, 100) for _ in range(5)]
# one dict per row, mapping column name -> random integer in [0, 10]
data = [{col: random.randint(0, 10) for col in 'ABCDE'} for _ in range(5)]
df = pd.DataFrame(data, index=indexes)
df

Unnamed: 0,A,B,C,D,E
84,4,10,0,3,10
84,5,4,8,9,4
63,0,7,0,6,0
21,7,7,9,6,0
74,6,5,6,2,0


Or it can be changed at any time:

In [64]:
df.index = ['a', 'b', 'c', 'd', 'e']
df

Unnamed: 0,A,B,C,D,E
a,4,10,0,3,10
b,5,4,8,9,4
c,0,7,0,6,0
d,7,7,9,6,0
e,6,5,6,2,0


You can also use any column (or several columns) as the index:

In [65]:
# if duplicates exist you can drop duplicates to get unique values
#eqPastMonth_nodups = eqPastMonth.drop_duplicates(subset='time', keep='last')
# we don't need to do that.
# set 'time' as index
eqPastMonth_indexChange = eqPastMonth.set_index('time')
eqPastMonth_indexChange.head(10)

Unnamed: 0_level_0,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5,datetime,magPlus1,custom_mag
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1
2025-04-16T15:22:14.240Z,38.821499,-122.762337,1.78,1.03,md,14.0,123.0,0.01074,0.02,nc,nc75166321,2025-04-16T15:23:50.969Z,"3 km W of Cobb, CA",earthquake,0.29,0.53,0.09,15.0,automatic,nc,nc,6.03,2025-04-16 15:22:14.240,1.13,Small
2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv,7.16,2025-04-16 15:19:27.860,2.26,Small
2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc,8.81,2025-04-16 15:04:56.790,3.91,Small
2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb,6.76,2025-04-16 15:00:49.220,1.86,Small
2025-04-16T14:59:44.190Z,38.803665,-122.779831,3.24,0.28,md,13.0,84.0,0.009603,0.02,nc,nc75166291,2025-04-16T15:17:21.353Z,"4 km NNW of The Geysers, CA",earthquake,0.33,0.92,0.12,12.0,automatic,nc,nc,5.28,2025-04-16 14:59:44.190,0.38,Small
2025-04-16T14:42:52.624Z,64.5566,-149.2173,0.8,1.6,ml,,,,0.45,ak,ak0254vjjyy6,2025-04-16T14:44:46.021Z,"5 km W of Nenana, Alaska",earthquake,,0.3,,,automatic,ak,ak,6.6,2025-04-16 14:42:52.624,1.7,Small
2025-04-16T14:38:39.010Z,36.940498,-121.473,11.03,1.34,md,5.0,190.0,0.1241,0.03,nc,nc75166286,2025-04-16T14:57:17.213Z,"11 km SE of Gilroy, CA",earthquake,1.19,1.93,,1.0,automatic,nc,nc,6.34,2025-04-16 14:38:39.010,1.44,Small
2025-04-16T14:34:00.340Z,46.615833,-119.797333,7.74,0.93,ml,11.0,80.0,0.02353,0.08,uw,uw62091961,2025-04-16T15:02:51.910Z,"11 km SE of Desert Aire, Washington",earthquake,0.26,0.18,0.139248,11.0,reviewed,uw,uw,5.93,2025-04-16 14:34:00.340,1.03,Small
2025-04-16T14:30:22.000Z,33.499,-116.441833,7.48,0.94,ml,44.0,73.0,0.1341,0.22,ci,ci40930047,2025-04-16T14:46:05.920Z,"22 km SW of La Quinta, CA",earthquake,0.21,1.04,0.166,22.0,reviewed,ci,ci,5.94,2025-04-16 14:30:22.000,1.04,Small
2025-04-16T14:29:29.390Z,38.799168,-122.752167,2.1,0.74,md,9.0,97.0,0.006073,0.02,nc,nc75166276,2025-04-16T14:47:19.151Z,"2 km NNE of The Geysers, CA",earthquake,0.4,0.9,0.07,9.0,automatic,nc,nc,5.74,2025-04-16 14:29:29.390,0.84,Small


By default, `set_index()` returns a new DataFrame and leaves the original unchanged, so assign the result to a variable (or pass `inplace=True`) if you want to keep the change.
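A small sketch of this behaviour, using a made-up two-row frame (the names `quakes` and `indexed` are illustrative):

```python
import pandas as pd

# Hypothetical frame; ids copied from the first rows of the data above.
quakes = pd.DataFrame({"id": ["nc75166321", "hv74654467"], "mag": [1.03, 2.16]})

# set_index returns a new DataFrame...
indexed = quakes.set_index("id")

# ...while the original still has 'id' as an ordinary column.
print(quakes.columns.tolist())
print(indexed.index.name)
```

Assigning the result to a new name, as done with `eqPastMonth_indexChange` above, keeps both versions available.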

Let's create a multi-level index for the `eqPastMonth` DataFrame:

In [66]:
# set 'id' & 'type' as index
eqPastMonth_multi = eqPastMonth.set_index(['id','type'])[["latitude","longitude","depth","mag", "place"]]
eqPastMonth_multi.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,latitude,longitude,depth,mag,place
id,type,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
nc75166321,earthquake,38.821499,-122.762337,1.78,1.03,"3 km W of Cobb, CA"
hv74654467,earthquake,19.203501,-155.373001,30.68,2.16,"11 km E of Pāhala, Hawaii"
nc75166296,earthquake,40.368999,-125.008835,2.99,3.81,"62 km WNW of Petrolia, CA"
mb90078908,earthquake,44.342667,-115.176667,10.84,1.76,"23 km NW of Stanley, Idaho"
nc75166291,earthquake,38.803665,-122.779831,3.24,0.28,"4 km NNW of The Geysers, CA"
ak0254vjjyy6,earthquake,64.5566,-149.2173,0.8,1.6,"5 km W of Nenana, Alaska"
nc75166286,earthquake,36.940498,-121.473,11.03,1.34,"11 km SE of Gilroy, CA"
uw62091961,earthquake,46.615833,-119.797333,7.74,0.93,"11 km SE of Desert Aire, Washington"
ci40930047,earthquake,33.499,-116.441833,7.48,0.94,"22 km SW of La Quinta, CA"
nc75166276,earthquake,38.799168,-122.752167,2.1,0.74,"2 km NNE of The Geysers, CA"


and look at the type of `eqPastMonth_multi.index`:

In [67]:
print('type: ', type(eqPastMonth_multi.index))

type:  <class 'pandas.core.indexes.multi.MultiIndex'>


Thus we get an instance of the pandas class `MultiIndex`, which holds the indexing information of the DataFrame and lets you manipulate it. As a question for the reader: what is the type of `filtered_df_2.index`?

You can get the levels, codes (called labels in older pandas versions) and names of a MultiIndex simply by accessing them as attributes: `levels`, `codes` and `names`.
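As a rough illustration, the attributes of a MultiIndex can be inspected like this. The frame `toy` is a made-up three-row stand-in for `eqPastMonth_multi`; the `codes` attribute assumes pandas 0.24 or newer.

```python
import pandas as pd

# Hypothetical stand-in for eqPastMonth_multi, indexed by ('id', 'type').
toy = pd.DataFrame({
    "id": ["nc75166321", "hv74654467", "nc75166296"],
    "type": ["earthquake", "earthquake", "earthquake"],
    "mag": [1.03, 2.16, 3.81],
}).set_index(["id", "type"])

names = list(toy.index.names)            # names of the index levels
nlevels = toy.index.nlevels              # number of levels
type_values = list(toy.index.levels[1])  # unique values of the 'type' level
codes = toy.index.codes                  # integer positions into each level

# A full key (one value per level) selects a single row by label.
row = toy.loc[("hv74654467", "earthquake"), :]

print(names, nlevels, type_values)
print(row)
```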

### Selection by label and position
[[back to top]](#Table-of-Contents)

After reading previous three subparagraphs probably you have the question: Ok, I know now filter a DataFrame, how make it multi-indexed, but I don’t know how select any specific row in the table.
Object selection in pandas is now supported by two types of multi-axis indexing.

* `.loc` works on labels in the index;
* `.iloc` works on the positions in the index (so it only takes integers).

    
The following examples show how to work with a DataFrame’s rows.
First, let’s get the first row of the past month's earthquakes.

In [68]:
# Return a single record (i.e. a row), in this case the first one.
eqPastMonth.loc[0]

time                 2025-04-16T15:22:14.240Z
latitude                            38.821499
longitude                         -122.762337
depth                                    1.78
mag                                      1.03
magType                                    md
nst                                      14.0
gap                                     123.0
dmin                                  0.01074
rms                                      0.02
net                                        nc
id                                 nc75166321
updated              2025-04-16T15:23:50.969Z
place                      3 km W of Cobb, CA
type                               earthquake
horizontalError                          0.29
depthError                               0.53
magError                                 0.09
magNst                                   15.0
status                              automatic
locationSource                             nc
magSource                         

and rows 1 to 3 (pay attention to how ranges work in `.loc`: the right boundary is included in the range, which is different from slicing Python lists and strings)

In [69]:
eqPastMonth.loc[1:3]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5,datetime,magPlus1,custom_mag
1,2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv,7.16,2025-04-16 15:19:27.860,2.26,Small
2,2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc,8.81,2025-04-16 15:04:56.790,3.91,Small
3,2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb,6.76,2025-04-16 15:00:49.220,1.86,Small


As you can see, the first argument of `.loc` is an index label. If you want to return the values of specific column(s), pass the name(s) of the column(s) as the second argument

In [70]:
eqPastMonth.loc[0, 'place']

'3 km W of Cobb, CA'

In [71]:
eqPastMonth.loc[3:10, ['place', 'mag']]

Unnamed: 0,place,mag
3,"23 km NW of Stanley, Idaho",1.76
4,"4 km NNW of The Geysers, CA",0.28
5,"5 km W of Nenana, Alaska",1.6
6,"11 km SE of Gilroy, CA",1.34
7,"11 km SE of Desert Aire, Washington",0.93
8,"22 km SW of La Quinta, CA",0.94
9,"2 km NNE of The Geysers, CA",0.74
10,"9 km NW of Tonasket, Washington",3.03


Let’s repeat: the first argument of `.loc` is not a row number but the index label of that row.
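The difference between a label and a row number only becomes visible once the index is no longer `0, 1, 2, ...`. The sketch below uses a small made-up table (`toy` and `strong` are hypothetical names) in which filtering removes label `0`.

```python
import pandas as pd

# Illustrative values only, not taken from the live feed.
toy = pd.DataFrame(
    {"place": ["Cobb", "Pahala", "Petrolia", "Stanley"],
     "mag": [1.03, 2.16, 3.81, 1.76]}
)

# Filtering keeps the original labels, so the index is now [1, 2, 3].
strong = toy[toy["mag"] > 1.5]
print(list(strong.index))

# Label 1 exists, so this returns the "Pahala" row ...
first_strong = strong.loc[1]
print(first_strong["place"])

# ... but label 0 was filtered out, so .loc[0] raises KeyError.
try:
    strong.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)
```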

But if you need to get rows by their position, you can use `.iloc`

In [72]:
eqPastMonth.iloc[0]

time                 2025-04-16T15:22:14.240Z
latitude                            38.821499
longitude                         -122.762337
depth                                    1.78
mag                                      1.03
magType                                    md
nst                                      14.0
gap                                     123.0
dmin                                  0.01074
rms                                      0.02
net                                        nc
id                                 nc75166321
updated              2025-04-16T15:23:50.969Z
place                      3 km W of Cobb, CA
type                               earthquake
horizontalError                          0.29
depthError                               0.53
magError                                 0.09
magNst                                   15.0
status                              automatic
locationSource                             nc
magSource                         

In [73]:
eqPastMonth.iloc[1:5,3:5]

Unnamed: 0,depth,mag
1,30.68,2.16
2,2.99,3.81
3,10.84,1.76
4,3.24,0.28


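In the first case the row’s position coincides with its index label, so `.iloc[0]` and `.loc[0]` return the same row. The second example shows how `.iloc` differs from `.loc`: columns are selected by position rather than by name, and the right boundary of a slice is excluded. Compare it with the same row range in `.loc`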

In [74]:
eqPastMonth.loc[1:5]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5,datetime,magPlus1,custom_mag
1,2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv,7.16,2025-04-16 15:19:27.860,2.26,Small
2,2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc,8.81,2025-04-16 15:04:56.790,3.91,Small
3,2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb,6.76,2025-04-16 15:00:49.220,1.86,Small
4,2025-04-16T14:59:44.190Z,38.803665,-122.779831,3.24,0.28,md,13.0,84.0,0.009603,0.02,nc,nc75166291,2025-04-16T15:17:21.353Z,"4 km NNW of The Geysers, CA",earthquake,0.33,0.92,0.12,12.0,automatic,nc,nc,5.28,2025-04-16 14:59:44.190,0.38,Small
5,2025-04-16T14:42:52.624Z,64.5566,-149.2173,0.8,1.6,ml,,,,0.45,ak,ak0254vjjyy6,2025-04-16T14:44:46.021Z,"5 km W of Nenana, Alaska",earthquake,,0.3,,,automatic,ak,ak,6.6,2025-04-16 14:42:52.624,1.7,Small
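The slice behaviour seen in the last two cells can be reproduced on a small made-up Series of magnitudes. This is only a sketch; `toy`, `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

# Six illustrative magnitudes with the default 0..5 index.
toy = pd.DataFrame({"mag": [1.03, 2.16, 3.81, 1.76, 0.28, 1.6]})

by_label = toy.loc[1:5]      # labels 1..5, end label included
by_position = toy.iloc[1:5]  # positions 1..4, end position excluded

print(len(by_label), len(by_position))
print(list(by_label.index))
print(list(by_position.index))
```

With the same `1:5` slice, `.loc` returns five rows and `.iloc` returns four.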


### Work with missing data

[[back to top]](#Table-of-Contents)

Pandas primarily uses the value `np.nan` to represent missing data (in tables, missing/empty values are shown as `NaN`). By default pandas methods leave it out of computations. Missing data causes many problems in mathematical and computational tasks with DataFrames and Series, so it is important to know how to deal with these values.

Previously we learned how to check for `null` and `non-null` values in a DataFrame or Series and how to skip rows with `null` values. But what should we do if we need to use rows with `null` data, for example, to find the sum of all values in the dataset?

Let’s try to do this


In [75]:
magError = eqPastMonth['magError']
sum(magError)

nan

The result is unexpected, because there are many `non-null` values in the `eqPastMonth['magError']` Series: Python's built-in `sum()` simply adds the values one by one, so a single `NaN` turns the whole result into `NaN`. Sure, we could filter `eqPastMonth['magError']` and keep only the `non-null` values. But what if we need to sum the values in every numeric column of `eqPastMonth`? Filtering becomes powerless or too complicated, because we would drop a whole row even if it has only one `null` value. You can try to do this yourself.
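Before replacing anything, it helps to see where the `nan` comes from. The sketch below compares Python's built-in `sum()` with the pandas `Series.sum()` method on a short made-up Series (`errors` is an illustrative name).

```python
import numpy as np
import pandas as pd

# Illustrative values with one missing entry.
errors = pd.Series([0.09, np.nan, 0.145])

builtin_total = sum(errors)              # plain Python addition, NaN propagates
pandas_total = errors.sum()              # skipna=True by default, NaN is ignored
strict_total = errors.sum(skipna=False)  # opt in to NaN propagation

print(builtin_total, pandas_total, strict_total)
```

So the "not included in computations" rule applies to pandas methods, not to Python built-ins applied to a Series.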

To solve this task you can use the elegant pandas method `fillna(value)`, which replaces all `null` values with `value`. Keep in mind that the filled values take part in later statistics, such as the median below.


In [76]:
magError = eqPastMonth['magError'].fillna(0)
magError.median()

0.141

In [79]:
eqPastMonth_fillna = eqPastMonth.fillna(0)
eqPastMonth_fillna.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5,datetime,magPlus1,custom_mag
0,2025-04-16T15:22:14.240Z,38.821499,-122.762337,1.78,1.03,md,14.0,123.0,0.01074,0.02,nc,nc75166321,2025-04-16T15:23:50.969Z,"3 km W of Cobb, CA",earthquake,0.29,0.53,0.09,15.0,automatic,nc,nc,6.03,2025-04-16 15:22:14.240,1.13,Small
1,2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv,7.16,2025-04-16 15:19:27.860,2.26,Small
2,2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc,8.81,2025-04-16 15:04:56.790,3.91,Small
3,2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb,6.76,2025-04-16 15:00:49.220,1.86,Small
4,2025-04-16T14:59:44.190Z,38.803665,-122.779831,3.24,0.28,md,13.0,84.0,0.009603,0.02,nc,nc75166291,2025-04-16T15:17:21.353Z,"4 km NNW of The Geysers, CA",earthquake,0.33,0.92,0.12,12.0,automatic,nc,nc,5.28,2025-04-16 14:59:44.190,0.38,Small
5,2025-04-16T14:42:52.624Z,64.5566,-149.2173,0.8,1.6,ml,0.0,0.0,0.0,0.45,ak,ak0254vjjyy6,2025-04-16T14:44:46.021Z,"5 km W of Nenana, Alaska",earthquake,0.0,0.3,0.0,0.0,automatic,ak,ak,6.6,2025-04-16 14:42:52.624,1.7,Small
6,2025-04-16T14:38:39.010Z,36.940498,-121.473,11.03,1.34,md,5.0,190.0,0.1241,0.03,nc,nc75166286,2025-04-16T14:57:17.213Z,"11 km SE of Gilroy, CA",earthquake,1.19,1.93,0.0,1.0,automatic,nc,nc,6.34,2025-04-16 14:38:39.010,1.44,Small
7,2025-04-16T14:34:00.340Z,46.615833,-119.797333,7.74,0.93,ml,11.0,80.0,0.02353,0.08,uw,uw62091961,2025-04-16T15:02:51.910Z,"11 km SE of Desert Aire, Washington",earthquake,0.26,0.18,0.139248,11.0,reviewed,uw,uw,5.93,2025-04-16 14:34:00.340,1.03,Small
8,2025-04-16T14:30:22.000Z,33.499,-116.441833,7.48,0.94,ml,44.0,73.0,0.1341,0.22,ci,ci40930047,2025-04-16T14:46:05.920Z,"22 km SW of La Quinta, CA",earthquake,0.21,1.04,0.166,22.0,reviewed,ci,ci,5.94,2025-04-16 14:30:22.000,1.04,Small
9,2025-04-16T14:29:29.390Z,38.799168,-122.752167,2.1,0.74,md,9.0,97.0,0.006073,0.02,nc,nc75166276,2025-04-16T14:47:19.151Z,"2 km NNE of The Geysers, CA",earthquake,0.4,0.9,0.07,9.0,automatic,nc,nc,5.74,2025-04-16 14:29:29.390,0.84,Small


Thus, we replaced all `NaN` items with `0`. `fillna()` returns a new object by default; if you pass `inplace=True`, the DataFrame itself is modified instead.
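As a small sketch of this behaviour, the block below fills a made-up two-column table. It also passes a dict to `fillna()` to fill only one column, an option the text above does not use. The names `toy`, `filled` and `partly_filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative table with one fully missing row.
toy = pd.DataFrame({
    "magError": [0.09, np.nan, 0.145],
    "nst": [14.0, np.nan, 103.0],
})

filled = toy.fillna(0)                       # new object, `toy` is unchanged
partly_filled = toy.fillna({"magError": 0})  # fill only the listed column

print(toy["magError"].isna().sum())       # missing values still in the original
print(filled.isna().sum().sum())          # nothing missing after fillna(0)
print(partly_filled["nst"].isna().sum())  # `nst` was not touched
```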
   
To keep only the rows without `null` values you can use the `dropna()` method

In [85]:
# Drop rows with any missing values
eqPastMonth_fillna = eqPastMonth.dropna(axis=0)
# Print mean of each numeric column
print(eqPastMonth_fillna.mean(numeric_only=True))
# Show the first 10 rows of the cleaned DataFrame
eqPastMonth_fillna.head(10)

latitude            35.520464
longitude         -101.924453
depth               17.320879
mag                  1.470220
nst                 23.799040
gap                112.544884
dmin                 0.492024
rms                  0.207807
horizontalError      1.613937
depthError           2.385313
magError             0.171903
magNst              19.354020
magPlus5             6.470220
magPlus1             1.570220
dtype: float64


Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,magPlus5,datetime,magPlus1,custom_mag
0,2025-04-16T15:22:14.240Z,38.821499,-122.762337,1.78,1.03,md,14.0,123.0,0.01074,0.02,nc,nc75166321,2025-04-16T15:23:50.969Z,"3 km W of Cobb, CA",earthquake,0.29,0.53,0.09,15.0,automatic,nc,nc,6.03,2025-04-16 15:22:14.240,1.13,Small
1,2025-04-16T15:19:27.860Z,19.203501,-155.373001,30.68,2.16,ml,61.0,162.0,0.04099,0.13,hv,hv74654467,2025-04-16T15:23:16.150Z,"11 km E of Pāhala, Hawaii",earthquake,0.42,0.45,0.72,17.0,automatic,hv,hv,7.16,2025-04-16 15:19:27.860,2.26,Small
2,2025-04-16T15:04:56.790Z,40.368999,-125.008835,2.99,3.81,ml,103.0,247.0,0.5156,0.19,nc,nc75166296,2025-04-16T15:18:08.320Z,"62 km WNW of Petrolia, CA",earthquake,1.27,2.0,0.145,11.0,reviewed,nc,nc,8.81,2025-04-16 15:04:56.790,3.91,Small
3,2025-04-16T15:00:49.220Z,44.342667,-115.176667,10.84,1.76,ml,16.0,123.0,0.3349,0.24,mb,mb90078908,2025-04-16T15:15:17.250Z,"23 km NW of Stanley, Idaho",earthquake,0.63,1.07,0.123255,14.0,reviewed,mb,mb,6.76,2025-04-16 15:00:49.220,1.86,Small
4,2025-04-16T14:59:44.190Z,38.803665,-122.779831,3.24,0.28,md,13.0,84.0,0.009603,0.02,nc,nc75166291,2025-04-16T15:17:21.353Z,"4 km NNW of The Geysers, CA",earthquake,0.33,0.92,0.12,12.0,automatic,nc,nc,5.28,2025-04-16 14:59:44.190,0.38,Small
7,2025-04-16T14:34:00.340Z,46.615833,-119.797333,7.74,0.93,ml,11.0,80.0,0.02353,0.08,uw,uw62091961,2025-04-16T15:02:51.910Z,"11 km SE of Desert Aire, Washington",earthquake,0.26,0.18,0.139248,11.0,reviewed,uw,uw,5.93,2025-04-16 14:34:00.340,1.03,Small
8,2025-04-16T14:30:22.000Z,33.499,-116.441833,7.48,0.94,ml,44.0,73.0,0.1341,0.22,ci,ci40930047,2025-04-16T14:46:05.920Z,"22 km SW of La Quinta, CA",earthquake,0.21,1.04,0.166,22.0,reviewed,ci,ci,5.94,2025-04-16 14:30:22.000,1.04,Small
9,2025-04-16T14:29:29.390Z,38.799168,-122.752167,2.1,0.74,md,9.0,97.0,0.006073,0.02,nc,nc75166276,2025-04-16T14:47:19.151Z,"2 km NNE of The Geysers, CA",earthquake,0.4,0.9,0.07,9.0,automatic,nc,nc,5.74,2025-04-16 14:29:29.390,0.84,Small
10,2025-04-16T14:23:43.970Z,48.76,-119.534833,13.55,3.03,ml,16.0,219.0,0.07199,0.13,uw,uw62091951,2025-04-16T15:20:05.329Z,"9 km NW of Tonasket, Washington",earthquake,0.61,0.43,0.137904,12.0,reviewed,uw,uw,8.03,2025-04-16 14:23:43.970,3.13,Small
12,2025-04-16T14:15:14.290Z,33.050167,-116.606167,15.75,0.68,ml,35.0,41.0,0.03157,0.18,ci,ci40930023,2025-04-16T14:45:30.115Z,"3 km S of Julian, CA",earthquake,0.23,0.5,0.184,12.0,reviewed,ci,ci,5.68,2025-04-16 14:15:14.290,0.78,Small


We can control how `dropna()` treats `null` values with the parameters `subset` and `how`: `subset` sets the columns to check, and `how` sets the rule (`'any'` or `'all'`) for dropping a row.
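A minimal sketch of both parameters on a made-up table follows. The column names mirror the earthquake data, while `toy` and its values are only illustrative.

```python
import numpy as np
import pandas as pd

# Rows 1-3 have different patterns of missing values.
toy = pd.DataFrame({
    "mag": [1.03, 1.6, np.nan, np.nan],
    "nst": [14.0, np.nan, np.nan, 9.0],
    "magError": [0.09, np.nan, np.nan, 0.07],
})

any_missing = toy.dropna()                 # default how="any": drop rows with at least one NaN
all_missing = toy.dropna(how="all")        # drop only rows where every value is NaN
by_column = toy.dropna(subset=["mag"])     # look at the `mag` column only
both = toy.dropna(subset=["nst", "magError"], how="all")  # drop if both listed columns are NaN

print(list(any_missing.index))
print(list(all_missing.index))
print(list(by_column.index))
print(list(both.index))
```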

> ### Exercise 1

> - Get the type of the `"latitude"` column in `eqPastMonth`.

> - In `eqPastMonth` find all rows where `magType` equals `"md"`, `mag` is less than `5` and `magError` is `not-null`. Call the resulting DataFrame `eqPastMonth_md_large`.

In [86]:
# type your code here
eqPastMonth_md_large = eqPastMonth