---
<center><h1>Basic intro to pandas</h1></center>

<center><h2>Work with pandas DataFrames: filtering, indexing and missing data</h2></center>
---

## Table of Contents

- [Work with pandas DataFrames: filtering, indexing and missing data](#Work-with-pandas-DataFrames:-filtering,-indexing-and-missing-data)
    * [Get basic information](#Get-basic-information)
    * [Conditional indexing and selection](#Conditional-indexing-and-selection)
    * [Work with indexes and MultiIndex option](#Work-with-indexes-and-MultiIndex-option)
    * [Selection by label and position](#Selection-by-label-and-position)
    * [Work with missing data](#Work-with-missing-data)
    - [*Exercise 1*](#Exercise-1)

In [190]:
import datetime
import pandas as pd
import numpy as np
import random

## Work with pandas DataFrames: filtering, indexing and missing data

[[back to top]](#Table-of-Contents)

In this part we continue working with DataFrames and learn

1. how to get basic information about a DataFrame and its content;
2. how to get a segment of a DataFrame and select the rows that satisfy some conditions;
3. how to change the index of a DataFrame and use advanced indexing;
4. how to select rows by their index labels and positions;
5. how to work with missing data.

The lesson is therefore divided into code blocks that follow the points above. In the following posts we will continue learning pandas and consider its other features.

In [191]:
url="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv"
eqPastMonth=pd.read_csv(url)
eqPastMonth.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2021-02-17T04:25:59.180Z,35.763168,-120.329834,9.28,1.91,md,17.0,90.0,0.01278,0.08,nc,nc73524166,2021-02-17T04:27:35.667Z,"6km NNW of Cholame, CA",earthquake,0.38,0.52,0.16,4.0,automatic,nc,nc
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci
3,2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us
4,2021-02-17T03:36:18.420Z,37.2757,-114.907,16.6,1.2,ml,9.0,94.16,0.177,0.24,nn,nn00800559,2021-02-17T03:44:17.647Z,"24 km ESE of Alamo, Nevada",earthquake,,1.7,,,automatic,nn,nn
5,2021-02-17T03:36:07.740Z,38.4489,22.0086,10.0,5.5,mww,,39.0,0.626,0.84,us,us6000diae,2021-02-17T04:38:13.265Z,"16 km N of Kamárai, Greece",earthquake,4.8,1.9,0.047,44.0,reviewed,us,us
6,2021-02-17T03:27:43.060Z,38.835834,-122.786835,1.84,0.91,md,22.0,73.0,0.003758,0.04,nc,nc73524141,2021-02-17T03:43:04.287Z,"6km WNW of Cobb, CA",earthquake,0.22,0.39,0.13,3.0,automatic,nc,nc
7,2021-02-17T03:25:29.491Z,-25.0984,-71.0266,10.0,4.4,mb,,169.0,0.735,0.56,us,us6000diaa,2021-02-17T03:37:29.040Z,"64 km WNW of Taltal, Chile",earthquake,4.5,2.0,0.24,5.0,reviewed,us,us
8,2021-02-17T03:24:15.431Z,59.8278,-152.2928,78.8,2.1,ml,,,,0.44,ak,ak02127fq4w8,2021-02-17T03:31:19.831Z,"26 km WNW of Anchor Point, Alaska",earthquake,,0.8,,,automatic,ak,ak
9,2021-02-17T02:58:09.380Z,18.7791,-66.1428,29.0,2.96,md,19.0,264.0,0.4738,0.22,pr,pr2021048001,2021-02-17T03:31:00.983Z,"34 km N of San Juan, Puerto Rico",earthquake,0.89,3.56,0.13,17.0,reviewed,pr,pr


### Get basic information

[[back to top]](#Table-of-Contents)

pandas has a set of attributes and methods for getting basic information about a DataFrame.

Let’s take a look at the types of the `eqPastMonth` columns

In [192]:
eqPastMonth.dtypes

time                object
latitude           float64
longitude          float64
depth              float64
mag                float64
magType             object
nst                float64
gap                float64
dmin               float64
rms                float64
net                 object
id                  object
updated             object
place               object
type                object
horizontalError    float64
depthError         float64
magError           float64
magNst             float64
status              object
locationSource      object
magSource           object
dtype: object

You may notice that the dtype of the `time` column is by default `object`, which here means it holds strings. You can change this with the `apply` method, which applies a function to every element of a Series (or to every row or column of a DataFrame). A `lambda` is a shorthand way to write your own function inline.

In [193]:
eqPastMonth['datetime'] = eqPastMonth['time'].apply(lambda x: (datetime.datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ')))
eqPastMonth.head(5)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime
0,2021-02-17T04:25:59.180Z,35.763168,-120.329834,9.28,1.91,md,17.0,90.0,0.01278,0.08,nc,nc73524166,2021-02-17T04:27:35.667Z,"6km NNW of Cholame, CA",earthquake,0.38,0.52,0.16,4.0,automatic,nc,nc,2021-02-17 04:25:59.180
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210
3,2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us,2021-02-17 03:58:33.367
4,2021-02-17T03:36:18.420Z,37.2757,-114.907,16.6,1.2,ml,9.0,94.16,0.177,0.24,nn,nn00800559,2021-02-17T03:44:17.647Z,"24 km ESE of Alamo, Nevada",earthquake,,1.7,,,automatic,nn,nn,2021-02-17 03:36:18.420


Notice the new `datetime` column. It is of type datetime.

In [194]:
eqPastMonth.dtypes

time                       object
latitude                  float64
longitude                 float64
depth                     float64
mag                       float64
magType                    object
nst                       float64
gap                       float64
dmin                      float64
rms                       float64
net                        object
id                         object
updated                    object
place                      object
type                       object
horizontalError           float64
depthError                float64
magError                  float64
magNst                    float64
status                     object
locationSource             object
magSource                  object
datetime           datetime64[ns]
dtype: object
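The element-by-element `apply` call can also be replaced by a vectorized conversion. The following is a minimal sketch, assuming a small hypothetical `sample` Series that mimics the strings in the `time` column. It uses `pd.to_datetime`; note that with `utc=True` the result is timezone-aware, unlike the naive `datetime` column created with `strptime`.

```python
import pandas as pd

# Hypothetical sample mimicking the ISO 8601 strings in the `time` column
sample = pd.Series(["2021-02-17T04:25:59.180Z", "2021-02-17T04:23:30.820Z"])

# Vectorized parsing; utc=True keeps the trailing "Z" as an explicit UTC timezone
parsed = pd.to_datetime(sample, utc=True)

print(parsed.dtype)             # a datetime dtype instead of object
print(parsed.dt.hour.tolist())  # datetime parts are available through .dt
```

Once the column has a datetime dtype, the `.dt` accessor gives access to parts such as year, hour or weekday without writing a lambda.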

You can also get a concise summary of the DataFrame: its size, its columns, their types and their non-null counts

In [195]:
eqPastMonth.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10620 entries, 0 to 10619
Data columns (total 23 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   time             10620 non-null  object        
 1   latitude         10620 non-null  float64       
 2   longitude        10620 non-null  float64       
 3   depth            10620 non-null  float64       
 4   mag              10620 non-null  float64       
 5   magType          10620 non-null  object        
 6   nst              6919 non-null   float64       
 7   gap              8113 non-null   float64       
 8   dmin             7172 non-null   float64       
 9   rms              10619 non-null  float64       
 10  net              10620 non-null  object        
 11  id               10620 non-null  object        
 12  updated          10620 non-null  object        
 13  place            10620 non-null  object        
 14  type             10620 non-null  objec

Method `info()` shows (top down)
+ that `eqPastMonth` is an instance of the DataFrame class; the same information can be obtained with the function `type()`;
+ the number of rows in the DataFrame;
+ the type of each column and the number of non-null values in it; `dtypes` gave the types in a shorter form;
+ the memory usage of the DataFrame, etc.

The method `describe()` quickly gives the count, mean, minimum and maximum values, standard deviation and quartiles of each numeric column of the DataFrame

In [196]:
eqPastMonth.describe()

Unnamed: 0,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst
count,10620.0,10620.0,10620.0,10620.0,6919.0,8113.0,7172.0,10619.0,7074.0,10619.0,7449.0,7828.0
mean,38.057262,-111.55976,22.700524,1.701339,22.034543,116.841777,0.647677,0.29834,1.771323,2.689489,0.240811,15.951456
std,20.09134,66.637466,48.77827,1.21,16.079741,65.360299,2.196168,0.276879,3.13608,49.156653,0.389112,28.485505
min,-64.7563,-179.9899,-4.7,-1.27,2.0,11.0,0.0,0.0,0.09,0.0,0.0,0.0
25%,33.381708,-150.1127,4.34,0.91,11.0,70.0,0.023915,0.10985,0.27,0.4,0.106,5.0
50%,38.15835,-118.932,8.9,1.43,18.0,100.0,0.067395,0.19,0.46,0.7,0.162,9.0
75%,53.5783,-115.638542,18.3,2.1,28.0,148.0,0.189075,0.45,0.96,1.5,0.23,18.0
max,85.0818,179.6491,662.1,7.7,145.0,350.0,52.111,2.93,26.8,5041.1,5.19,620.0
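To see how `describe()` treats missing values and non-numeric columns, here is a small sketch on a hypothetical three-row table (the `quakes` frame and its values are made up for illustration). By default only numeric columns are summarized and `NaN` is left out of the count; `include="all"` also summarizes text columns.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the earthquake table
quakes = pd.DataFrame({
    "mag": [1.9, 5.0, np.nan],
    "place": ["Cobb, CA", "Vanuatu", "Alamo, Nevada"],
})

stats = quakes.describe()                   # numeric columns only, NaN is ignored
print(stats.loc[["count", "mean"]])

all_stats = quakes.describe(include="all")  # also summarizes the text column
print(all_stats.loc[["count", "unique"]])
```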


### Conditional indexing and selection

[[back to top]](#Table-of-Contents)

As we said above, a DataFrame is a group of Series objects. This allows you to select a single column from the DataFrame (in this case you get a Series) or several columns (in this case you get another DataFrame)

In [197]:
eqPastMonth_mag = eqPastMonth['mag']
# Here we are showing only one column, i.e. a Series
print ('type:', type(eqPastMonth_mag))
eqPastMonth_mag.head(10)

type: <class 'pandas.core.series.Series'>


0    1.91
1    1.18
2    0.86
3    5.00
4    1.20
5    5.50
6    0.91
7    4.40
8    2.10
9    2.96
Name: mag, dtype: float64

In [198]:
eqPastMonth_record = eqPastMonth[['time','depth', 'mag', 'place']]
# Here we are showing four columns, i.e. a new DataFrame
print ('type:', type(eqPastMonth_record))
eqPastMonth_record.tail()

type: <class 'pandas.core.frame.DataFrame'>


Unnamed: 0,time,depth,mag,place
10615,2021-01-18T04:55:20.700Z,34.759998,2.15,"7 km ENE of Pāhala, Hawaii"
10616,2021-01-18T04:51:51.831Z,11.4,2.4,"87 km SSE of Sand Point, Alaska"
10617,2021-01-18T04:49:26.600Z,5.79,1.51,"6 km ENE of Government Camp, Oregon"
10618,2021-01-18T04:48:52.780Z,6.09,1.11,"6 km ENE of Government Camp, Oregon"
10619,2021-01-18T04:43:45.240Z,12.14,1.7,"5km W of Morongo Valley, CA"


You can also refer to a single column as an attribute of the DataFrame

In [199]:
eqPastMonth_record.time

0        2021-02-17T04:25:59.180Z
1        2021-02-17T04:23:30.820Z
2        2021-02-17T04:20:41.210Z
3        2021-02-17T03:58:33.367Z
4        2021-02-17T03:36:18.420Z
                   ...           
10615    2021-01-18T04:55:20.700Z
10616    2021-01-18T04:51:51.831Z
10617    2021-01-18T04:49:26.600Z
10618    2021-01-18T04:48:52.780Z
10619    2021-01-18T04:43:45.240Z
Name: time, Length: 10620, dtype: object
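Attribute access is convenient, but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. A short sketch with a hypothetical frame (the column names are chosen only to show the clash):

```python
import pandas as pd

# Hypothetical frame: one column clashes with a method name, one contains a space
df = pd.DataFrame({"count": [3, 7], "mag type": ["md", "ml"], "mag": [1.9, 0.9]})

print(type(df.mag))             # a Series, same as df["mag"]
print(callable(df.count))       # df.count is the DataFrame.count method, not the column
print(df["count"].tolist())     # bracket access always reaches the column
print(df["mag type"].tolist())  # names with spaces need brackets too
```

For this reason bracket access is the safer choice in code that has to work with arbitrary column names.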

Filtered DataFrames can be obtained with comparison and logical operators

In [200]:
# Let's display only large earthquakes
eqPastMonth_large = eqPastMonth[eqPastMonth['mag'] > 5]
eqPastMonth_large.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime
5,2021-02-17T03:36:07.740Z,38.4489,22.0086,10.0,5.5,mww,,39.0,0.626,0.84,us,us6000diae,2021-02-17T04:38:13.265Z,"16 km N of Kamárai, Greece",earthquake,4.8,1.9,0.047,44.0,reviewed,us,us,2021-02-17 03:36:07.740
100,2021-02-16T19:09:41.237Z,-2.8881,139.201,48.19,5.4,mww,,47.0,6.943,0.74,us,usd000eesi,2021-02-16T21:12:58.341Z,"162 km WSW of Abepura, Indonesia",earthquake,7.6,5.1,0.083,14.0,reviewed,us,us,2021-02-16 19:09:41.237
131,2021-02-16T15:54:07.766Z,-18.8866,-173.6117,10.0,5.2,mww,,149.0,5.252,0.52,us,us6000di3t,2021-02-16T22:51:05.852Z,"47 km ESE of Neiafu, Tonga",earthquake,11.1,1.9,0.098,10.0,reviewed,us,us,2021-02-16 15:54:07.766
180,2021-02-16T09:55:32.318Z,52.0601,156.989,152.04,5.3,mb,,80.0,1.135,0.87,us,us6000di0n,2021-02-16T10:10:36.040Z,"71 km NNE of Ozernovskiy, Russia",earthquake,3.9,6.0,0.023,620.0,reviewed,us,us,2021-02-16 09:55:32.318
189,2021-02-16T08:42:11.806Z,-6.0804,112.1674,599.74,5.1,mww,,25.0,1.967,1.19,us,us6000di0i,2021-02-16T08:55:23.040Z,"91 km NNW of Paciran, Indonesia",earthquake,10.4,7.3,0.093,11.0,reviewed,us,us,2021-02-16 08:42:11.806
194,2021-02-16T08:07:23.479Z,-15.6949,167.8335,101.67,5.1,mb,,115.0,5.758,0.81,us,us6000di0g,2021-02-16T08:50:41.040Z,"62 km NE of Norsup, Vanuatu",earthquake,10.0,4.6,0.087,43.0,reviewed,us,us,2021-02-16 08:07:23.479
203,2021-02-16T06:42:01.809Z,-13.3581,166.8405,52.65,5.1,mb,,59.0,7.558,0.68,us,us6000dhzx,2021-02-16T07:45:14.040Z,"95 km NW of Sola, Vanuatu",earthquake,8.3,7.2,0.063,81.0,reviewed,us,us,2021-02-16 06:42:01.809
233,2021-02-16T03:37:11.369Z,-17.9892,167.5914,10.0,5.3,mww,,38.0,3.498,0.98,us,us6000dhyf,2021-02-16T06:25:18.040Z,"81 km WSW of Port-Vila, Vanuatu",earthquake,7.5,1.8,0.098,10.0,reviewed,us,us,2021-02-16 03:37:11.369
252,2021-02-16T01:31:59.756Z,-17.8082,167.6351,10.0,5.9,mww,,38.0,3.672,0.79,us,us6000dhxu,2021-02-17T01:35:13.282Z,"72 km W of Port-Vila, Vanuatu",earthquake,7.8,1.7,0.056,31.0,reviewed,us,us,2021-02-16 01:31:59.756
255,2021-02-16T01:22:19.619Z,-17.6551,167.7198,10.0,5.4,mww,,25.0,4.34,0.85,us,us6000dhxs,2021-02-17T01:25:18.841Z,"63 km W of Port-Vila, Vanuatu",earthquake,8.6,1.7,0.063,24.0,reviewed,us,us,2021-02-16 01:22:19.619


In [201]:
#Getting records that are large (>5mag) earthquakes and that occurred in the northern hemisphere
filtered_df_1 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['latitude'] > 0)]
filtered_df_1.describe()

Unnamed: 0,latitude,longitude,depth,mag,nst,gap,dmin,rms,horizontalError,depthError,magError,magNst
count,40.0,40.0,40.0,40.0,1.0,39.0,39.0,40.0,39.0,40.0,39.0,39.0
mean,28.12435,72.432125,58.69025,5.39,28.0,65.615385,2.044187,0.84075,6.319231,3.9265,0.060667,96.74359
std,16.41843,99.664015,93.880675,0.445375,,33.940052,1.879899,0.224641,1.364668,1.875015,0.020691,155.272887
min,1.2795,-179.3142,7.16,5.1,28.0,16.0,0.346,0.46,3.9,0.7,0.023,10.0
25%,14.310225,40.542625,10.0,5.1,28.0,39.5,0.624,0.6775,5.15,1.9,0.048,20.0
50%,28.78135,122.5354,35.39,5.2,28.0,64.0,1.618,0.8,6.2,4.05,0.06,34.0
75%,38.59945,141.417,67.2675,5.425,28.0,80.5,2.8045,1.005,7.35,5.5,0.0715,88.5
max,57.1226,169.2715,587.69,7.1,28.0,185.0,8.439,1.38,9.1,7.8,0.098,620.0


In [202]:
#Getting records that are large (>5mag) earthquakes and that occurred in the western hemisphere, but east of 120°W longitude; also select only some columns in the output
filtered_df_2 = eqPastMonth[(eqPastMonth['mag'] > 5 ) & (eqPastMonth['longitude'] < 0) & (eqPastMonth['longitude'] > -120)][['depth', 'mag', 'place']]
filtered_df_2.head(10)

Unnamed: 0,depth,mag,place
670,35.0,5.1,"60 km SSW of La Gomera, Guatemala"
815,30.87,5.2,"54 km SW of Iquique, Chile"
989,71.44,5.3,"22 km W of Celica, Ecuador"
1772,48.67,5.2,"25 km WSW of Illapel, Chile"
1942,10.0,5.7,central East Pacific Rise
2421,10.63,5.1,northern Peru
2525,103.95,5.1,"4 km ENE of Santa Rita de Siguas, Peru"
4599,10.0,6.7,West Chile Rise
5576,7.22,5.6,"82 km SSE of Lethem, Guyana"
5816,168.65,5.1,"63 km W of San Antonio de los Cobres, Argentina"


You can also use the method `isin(values)` to check whether Series items are present in a list of values, the method `isnull()` to find `null` (`NaN`) values, and the boolean operators `&` (`AND`) and `|` (`OR`) to build compound conditions.
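The three tools just mentioned can be sketched on a small hypothetical table (the `quakes` frame and its values are assumptions made for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the earthquake table
quakes = pd.DataFrame({
    "mag": [1.9, 5.5, np.nan, 4.4],
    "magType": ["md", "mww", "ml", "mb"],
})

# isin: keep rows whose magType is one of the listed values
by_type = quakes[quakes["magType"].isin(["mb", "mww"])]

# isnull: keep rows where the magnitude is missing
missing = quakes[quakes["mag"].isnull()]

# |: keep rows that satisfy at least one of the two conditions
either = quakes[(quakes["mag"] > 5) | (quakes["magType"] == "md")]

print(by_type, missing, either, sep="\n\n")
```

Each condition needs its own pair of parentheses, because `&` and `|` bind more tightly than the comparison operators.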

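As you can see, after filtering the resulting tables (i.e. DataFrames) keep the original row labels, so their indexes are no longer consecutive. To fix this you may call `reset_index()`, which moves the old index into a column named `index` (pass `drop=True` to discard it instead):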

In [203]:
filtered_df_2.reset_index().head(10)

Unnamed: 0,index,depth,mag,place
0,670,35.0,5.1,"60 km SSW of La Gomera, Guatemala"
1,815,30.87,5.2,"54 km SW of Iquique, Chile"
2,989,71.44,5.3,"22 km W of Celica, Ecuador"
3,1772,48.67,5.2,"25 km WSW of Illapel, Chile"
4,1942,10.0,5.7,central East Pacific Rise
5,2421,10.63,5.1,northern Peru
6,2525,103.95,5.1,"4 km ENE of Santa Rita de Siguas, Peru"
7,4599,10.0,6.7,West Chile Rise
8,5576,7.22,5.6,"82 km SSE of Lethem, Guyana"
9,5816,168.65,5.1,"63 km W of San Antonio de los Cobres, Argentina"


The index now starts from 0 and is consecutive again.
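A minimal sketch of the two variants of `reset_index()`, using a hypothetical three-row frame whose labels imitate the filtered result:

```python
import pandas as pd

# Hypothetical filtered result with non-consecutive row labels
df = pd.DataFrame({"mag": [5.1, 5.2, 5.3]}, index=[670, 815, 989])

kept = df.reset_index()              # old labels are stored in a new 'index' column
dropped = df.reset_index(drop=True)  # old labels are discarded

print(list(kept.columns))
print(list(dropped.columns))
print(list(dropped.index))
```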

Also remember that you can add new columns and rows to the DataFrame:

In [204]:
#set new custom_mag column: first fill it with empty strings, then label each earthquake by magnitude
eqPastMonth['custom_mag'] = ''
eqPastMonth['custom_mag'] = np.where(eqPastMonth['mag'] < 5, 'Small', "Large")
eqPastMonth.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime,custom_mag
0,2021-02-17T04:25:59.180Z,35.763168,-120.329834,9.28,1.91,md,17.0,90.0,0.01278,0.08,nc,nc73524166,2021-02-17T04:27:35.667Z,"6km NNW of Cholame, CA",earthquake,0.38,0.52,0.16,4.0,automatic,nc,nc,2021-02-17 04:25:59.180,Small
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820,Small
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210,Small
3,2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us,2021-02-17 03:58:33.367,Large
4,2021-02-17T03:36:18.420Z,37.2757,-114.907,16.6,1.2,ml,9.0,94.16,0.177,0.24,nn,nn00800559,2021-02-17T03:44:17.647Z,"24 km ESE of Alamo, Nevada",earthquake,,1.7,,,automatic,nn,nn,2021-02-17 03:36:18.420,Small
5,2021-02-17T03:36:07.740Z,38.4489,22.0086,10.0,5.5,mww,,39.0,0.626,0.84,us,us6000diae,2021-02-17T04:38:13.265Z,"16 km N of Kamárai, Greece",earthquake,4.8,1.9,0.047,44.0,reviewed,us,us,2021-02-17 03:36:07.740,Large
6,2021-02-17T03:27:43.060Z,38.835834,-122.786835,1.84,0.91,md,22.0,73.0,0.003758,0.04,nc,nc73524141,2021-02-17T03:43:04.287Z,"6km WNW of Cobb, CA",earthquake,0.22,0.39,0.13,3.0,automatic,nc,nc,2021-02-17 03:27:43.060,Small
7,2021-02-17T03:25:29.491Z,-25.0984,-71.0266,10.0,4.4,mb,,169.0,0.735,0.56,us,us6000diaa,2021-02-17T03:37:29.040Z,"64 km WNW of Taltal, Chile",earthquake,4.5,2.0,0.24,5.0,reviewed,us,us,2021-02-17 03:25:29.491,Small
8,2021-02-17T03:24:15.431Z,59.8278,-152.2928,78.8,2.1,ml,,,,0.44,ak,ak02127fq4w8,2021-02-17T03:31:19.831Z,"26 km WNW of Anchor Point, Alaska",earthquake,,0.8,,,automatic,ak,ak,2021-02-17 03:24:15.431,Small
9,2021-02-17T02:58:09.380Z,18.7791,-66.1428,29.0,2.96,md,19.0,264.0,0.4738,0.22,pr,pr2021048001,2021-02-17T03:31:00.983Z,"34 km N of San Juan, Puerto Rico",earthquake,0.89,3.56,0.13,17.0,reviewed,pr,pr,2021-02-17 02:58:09.380,Small
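The cell above adds a column. Adding rows can be sketched as follows, assuming a small hypothetical `quakes` frame: a single row can be appended by assigning to a new label with `.loc`, and several rows can be appended with `pd.concat`.

```python
import pandas as pd

# Hypothetical two-row table
quakes = pd.DataFrame({"mag": [1.9, 5.5], "place": ["Cobb, CA", "Greece"]})

# Append a single row by assigning to a label that does not exist yet
quakes.loc[len(quakes)] = [4.4, "Chile"]

# Append several rows at once; ignore_index=True renumbers the result from 0
more = pd.DataFrame({"mag": [2.1], "place": ["Alaska"]})
quakes = pd.concat([quakes, more], ignore_index=True)

print(quakes)
```

Growing a DataFrame row by row in a loop is slow; when many rows are added, collecting them first and calling `pd.concat` once is usually the better choice.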


### Work with indexes and MultiIndex option

[[back to top]](#Table-of-Contents)

Pandas allows you to set a specific index for a DataFrame. It can be defined when the DataFrame is created:

In [205]:
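import random
# five random row labels (duplicates are possible)
indexes = [random.randrange(0, 100) for _ in range(5)]
# five rows, each with a random integer in columns A to E
data = [{col: random.randint(0, 10) for col in 'ABCDE'} for _ in range(5)]
df = pd.DataFrame(data, index=indexes)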
df

Unnamed: 0,A,B,C,D,E
40,1,8,1,6,8
97,4,7,9,2,2
45,9,8,3,0,10
66,4,9,0,9,7
87,5,2,7,1,10


Or it can be changed at any time

In [206]:
df.index = ['a', 'b', 'c', 'd', 'e']
df

Unnamed: 0,A,B,C,D,E
a,1,8,1,6,8
b,4,7,9,2,2
c,9,8,3,0,10
d,4,9,0,9,7
e,5,2,7,1,10


It is also possible to use any column (one or more) as the index

In [207]:
# if duplicates exist you can drop duplicates to get unique values
#eqPastMonth_nodups = eqPastMonth.drop_duplicates(subset='time', keep='last')
# we don't need to do that.
# set 'time' as index
eqPastMonth_indexChange = eqPastMonth.set_index('time')
eqPastMonth_indexChange.head(10)

time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime,custom_mag
2021-02-17T04:25:59.180Z,35.763168,-120.329834,9.28,1.91,md,17.0,90.0,0.01278,0.08,nc,nc73524166,2021-02-17T04:27:35.667Z,"6km NNW of Cholame, CA",earthquake,0.38,0.52,0.16,4.0,automatic,nc,nc,2021-02-17 04:25:59.180,Small
2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820,Small
2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210,Small
2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us,2021-02-17 03:58:33.367,Large
2021-02-17T03:36:18.420Z,37.2757,-114.907,16.6,1.2,ml,9.0,94.16,0.177,0.24,nn,nn00800559,2021-02-17T03:44:17.647Z,"24 km ESE of Alamo, Nevada",earthquake,,1.7,,,automatic,nn,nn,2021-02-17 03:36:18.420,Small
2021-02-17T03:36:07.740Z,38.4489,22.0086,10.0,5.5,mww,,39.0,0.626,0.84,us,us6000diae,2021-02-17T04:38:13.265Z,"16 km N of Kamárai, Greece",earthquake,4.8,1.9,0.047,44.0,reviewed,us,us,2021-02-17 03:36:07.740,Large
2021-02-17T03:27:43.060Z,38.835834,-122.786835,1.84,0.91,md,22.0,73.0,0.003758,0.04,nc,nc73524141,2021-02-17T03:43:04.287Z,"6km WNW of Cobb, CA",earthquake,0.22,0.39,0.13,3.0,automatic,nc,nc,2021-02-17 03:27:43.060,Small
2021-02-17T03:25:29.491Z,-25.0984,-71.0266,10.0,4.4,mb,,169.0,0.735,0.56,us,us6000diaa,2021-02-17T03:37:29.040Z,"64 km WNW of Taltal, Chile",earthquake,4.5,2.0,0.24,5.0,reviewed,us,us,2021-02-17 03:25:29.491,Small
2021-02-17T03:24:15.431Z,59.8278,-152.2928,78.8,2.1,ml,,,,0.44,ak,ak02127fq4w8,2021-02-17T03:31:19.831Z,"26 km WNW of Anchor Point, Alaska",earthquake,,0.8,,,automatic,ak,ak,2021-02-17 03:24:15.431,Small
2021-02-17T02:58:09.380Z,18.7791,-66.1428,29.0,2.96,md,19.0,264.0,0.4738,0.22,pr,pr2021048001,2021-02-17T03:31:00.983Z,"34 km N of San Juan, Puerto Rico",earthquake,0.89,3.56,0.13,17.0,reviewed,pr,pr,2021-02-17 02:58:09.380,Small


By default, `set_index()` returns a new DataFrame and leaves the original unchanged, so assign the result to a variable (or pass `inplace=True`) if you want to keep the change.
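A short sketch of this behaviour on a hypothetical two-row frame (names and values are illustrative only):

```python
import pandas as pd

# Hypothetical two-row table
quakes = pd.DataFrame({"id": ["nc1", "us2"], "mag": [1.9, 5.0]})

indexed = quakes.set_index("id")  # new DataFrame, `quakes` is untouched
print(list(quakes.columns))       # 'id' is still a regular column here
print(list(indexed.columns))      # 'id' has moved into the index

restored = indexed.reset_index()  # turns the index back into a column
print(restored.equals(quakes))
```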

Let’s create a multi-level index for the `eqPastMonth` DataFrame

In [208]:
# set 'id' & 'type' as index
eqPastMonth_multi = eqPastMonth.set_index(['id','type'])[["latitude","longitude","depth","mag", "place"]]
eqPastMonth_multi.head(10)

id,type,latitude,longitude,depth,mag,place
nc73524166,earthquake,35.763168,-120.329834,9.28,1.91,"6km NNW of Cholame, CA"
nc73524156,earthquake,38.819332,-122.763664,1.74,1.18,"4km W of Cobb, CA"
ci39553703,earthquake,33.598667,-116.8075,5.88,0.86,"13km WNW of Anza, CA"
us6000diah,earthquake,-17.6378,167.511,10.0,5.0,Vanuatu
nn00800559,earthquake,37.2757,-114.907,16.6,1.2,"24 km ESE of Alamo, Nevada"
us6000diae,earthquake,38.4489,22.0086,10.0,5.5,"16 km N of Kamárai, Greece"
nc73524141,earthquake,38.835834,-122.786835,1.84,0.91,"6km WNW of Cobb, CA"
us6000diaa,earthquake,-25.0984,-71.0266,10.0,4.4,"64 km WNW of Taltal, Chile"
ak02127fq4w8,earthquake,59.8278,-152.2928,78.8,2.1,"26 km WNW of Anchor Point, Alaska"
pr2021048001,earthquake,18.7791,-66.1428,29.0,2.96,"34 km N of San Juan, Puerto Rico"


and see the type of `eqPastMonth_multi.index`

In [209]:
print ('type: ', type(eqPastMonth_multi.index))

type:  <class 'pandas.core.indexes.multi.MultiIndex'>


Thus, we get a new pandas class, MultiIndex, which holds the indexing information of the DataFrame and allows you to manipulate it. An interesting question: what is the type of `filtered_df_2.index`?

You can get its `levels`, `codes` (called `labels` in older pandas versions) and `names` values simply by accessing them as attributes
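As an illustration of these attributes, here is a minimal sketch on a hypothetical three-row table indexed by `id` and `type` (the identifiers and values are made up). It also shows that a row of a multi-indexed DataFrame is addressed with a tuple of labels.

```python
import pandas as pd

# Hypothetical miniature of the earthquake table
df = pd.DataFrame({
    "id": ["nc1", "us2", "ak3"],
    "type": ["earthquake", "earthquake", "explosion"],
    "mag": [1.9, 5.0, 2.1],
})
multi = df.set_index(["id", "type"])

print(list(multi.index.names))      # names of the index levels
print(list(multi.index.levels[1]))  # unique values of the second level
print(multi.index.codes)            # integer positions into `levels` for each row

# Select one row with a tuple holding one label per level
mag_us2 = multi.loc[("us2", "earthquake"), "mag"]
print(mag_us2)
```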

### Selection by label and position
[[back to top]](#Table-of-Contents)

After reading the previous three subsections you probably have a question: OK, I now know how to filter a DataFrame and how to make it multi-indexed, but I don’t know how to select a specific row in the table.
Object selection in pandas is supported by two types of multi-axis indexing.

* `.loc` works on the labels in the index;
* `.iloc` works on the positions in the index (so it only takes integers).

    
The following sequence of examples demonstrates how to work with a DataFrame’s rows.
First, let’s get the first row of the earthquakes from the past month.

In [210]:
#To return a single record(i.e. row), in this case the first one.
eqPastMonth.loc[0]

time                 2021-02-17T04:25:59.180Z
latitude                            35.763168
longitude                         -120.329834
depth                                    9.28
mag                                      1.91
magType                                    md
nst                                      17.0
gap                                      90.0
dmin                                  0.01278
rms                                      0.08
net                                        nc
id                                 nc73524166
updated              2021-02-17T04:27:35.667Z
place                  6km NNW of Cholame, CA
type                               earthquake
horizontalError                          0.38
depthError                               0.52
magError                                 0.16
magNst                                    4.0
status                              automatic
locationSource                             nc
magSource                         

and rows from 1 to 3 (pay attention to how ranges work in `.loc`: the right boundary is included in the range, which IS different from slicing Python lists and strings)

In [211]:
eqPastMonth.loc[1:3]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime,custom_mag
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820,Small
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210,Small
3,2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us,2021-02-17 03:58:33.367,Large


As you can see, the first argument of `.loc` corresponds to the index label. If you want to return the value of specific column(s), you should also give the name(s) of the column(s)

In [212]:
eqPastMonth.loc[0, 'place']

'6km NNW of Cholame, CA'

In [213]:
eqPastMonth.loc[:, ['place', 'mag']].head()

Unnamed: 0,place,mag
0,"6km NNW of Cholame, CA",1.91
1,"4km W of Cobb, CA",1.18
2,"13km WNW of Anza, CA",0.86
3,Vanuatu,5.0
4,"24 km ESE of Alamo, Nevada",1.2


Let’s repeat that the first argument of `.loc` is not the row number but the index label of the row.

If you need to get rows by their position, you may use `.iloc`

In [214]:
eqPastMonth.iloc[0]

time                 2021-02-17T04:25:59.180Z
latitude                            35.763168
longitude                         -120.329834
depth                                    9.28
mag                                      1.91
magType                                    md
nst                                      17.0
gap                                      90.0
dmin                                  0.01278
rms                                      0.08
net                                        nc
id                                 nc73524166
updated              2021-02-17T04:27:35.667Z
place                  6km NNW of Cholame, CA
type                               earthquake
horizontalError                          0.38
depthError                               0.52
magError                                 0.16
magNst                                    4.0
status                              automatic
locationSource                             nc
magSource                         

In [215]:
eqPastMonth.iloc[1:5]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime,custom_mag
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820,Small
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210,Small
3,2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us,2021-02-17 03:58:33.367,Large
4,2021-02-17T03:36:18.420Z,37.2757,-114.907,16.6,1.2,ml,9.0,94.16,0.177,0.24,nn,nn00800559,2021-02-17T03:44:17.647Z,"24 km ESE of Alamo, Nevada",earthquake,,1.7,,,automatic,nn,nn,2021-02-17 03:36:18.420,Small


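In the first case the row’s position coincides with its label, so `.iloc[0]` and `.loc[0]` return the same row. The second example demonstrates the difference between `.loc` and `.iloc`: `.iloc[1:5]` excludes the right boundary and returns four rows, while `.loc[1:5]` includes it and returns five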

In [216]:
eqPastMonth.loc[1:5]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime,custom_mag
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820,Small
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210,Small
3,2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us,2021-02-17 03:58:33.367,Large
4,2021-02-17T03:36:18.420Z,37.2757,-114.907,16.6,1.2,ml,9.0,94.16,0.177,0.24,nn,nn00800559,2021-02-17T03:44:17.647Z,"24 km ESE of Alamo, Nevada",earthquake,,1.7,,,automatic,nn,nn,2021-02-17 03:36:18.420,Small
5,2021-02-17T03:36:07.740Z,38.4489,22.0086,10.0,5.5,mww,,39.0,0.626,0.84,us,us6000diae,2021-02-17T04:38:13.265Z,"16 km N of Kamárai, Greece",earthquake,4.8,1.9,0.047,44.0,reviewed,us,us,2021-02-17 03:36:07.740,Large
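With the default index, labels and positions coincide, which hides most of the difference. It becomes clearer on a frame whose labels are not 0, 1, 2, ... The sketch below uses a hypothetical three-row frame with labels that imitate a filtered result:

```python
import pandas as pd

# Hypothetical frame whose row labels differ from the row positions
df = pd.DataFrame({"mag": [5.1, 5.2, 5.3]}, index=[670, 815, 989])

print(df.loc[670, "mag"])    # label 670
print(df.iloc[0]["mag"])     # position 0, the same row
print(len(df.loc[670:815]))  # label slice, right boundary included
print(len(df.iloc[0:1]))     # position slice, right boundary excluded

# There is no row labelled 0, so .loc[0] fails while .iloc[0] works
try:
    df.loc[0]
    found = True
except KeyError:
    found = False
print(found)
```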


### Work with missing data

[[back to top]](#Table-of-Contents)

Pandas primarily uses the value `np.nan` to represent missing data (in tables, missing/empty values are marked by `NaN`). By default pandas methods leave it out of computations. Missing data causes many problems in mathematical or computational tasks with DataFrames and Series, and it’s important to know how to deal with these values.

Previously we have learned how to check for `null` and `non-null` values in a DataFrame or Series and how to skip `null` rows in a table. But what should we do if we need to use rows with `null` data, for example, to find the sum of all values in the dataset?

Let’s try to do this


In [217]:
magError = eqPastMonth['magError']
sum(magError)

nan

The result is unexpected, because there are many `non-null` values in the `eqPastMonth['magError']` Series. Sure, we could filter `eqPastMonth['magError']` and keep only the `non-null` values. But what if we need to sum all numerical values in `eqPastMonth`? Then this approach becomes useless or too complicated, because we would drop an entire row even if it contains only one `null` value. You can try this yourself.
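
For comparison, here is a small sketch on a made-up Series (the values are assumptions for illustration). It shows that Python's built-in `sum` propagates `NaN`, while the pandas `Series.sum()` method skips it by default through its `skipna=True` parameter:

```python
import math

import numpy as np
import pandas as pd

# Made-up Series with two missing values.
s = pd.Series([0.16, np.nan, 0.143, np.nan])

builtin_total = sum(s)              # built-in sum: NaN propagates through the addition
pandas_total = s.sum()              # pandas method: NaN is skipped (skipna=True)
strict_total = s.sum(skipna=False)  # NaN is kept, so the result is NaN again

print(builtin_total, pandas_total, strict_total)
print(math.isnan(builtin_total))    # True
```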

To solve this task you can use the elegant pandas method `fillna(value)`, which replaces all `null` values with `value`.


In [218]:
magError = eqPastMonth['magError'].fillna(0)
sum(magError)

1793.8011018308189

In [219]:
eqPastMonth_fillna = eqPastMonth.fillna(0)
eqPastMonth_fillna.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime,custom_mag
0,2021-02-17T04:25:59.180Z,35.763168,-120.329834,9.28,1.91,md,17.0,90.0,0.01278,0.08,nc,nc73524166,2021-02-17T04:27:35.667Z,"6km NNW of Cholame, CA",earthquake,0.38,0.52,0.16,4.0,automatic,nc,nc,2021-02-17 04:25:59.180,Small
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820,Small
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210,Small
3,2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,0.0,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us,2021-02-17 03:58:33.367,Large
4,2021-02-17T03:36:18.420Z,37.2757,-114.907,16.6,1.2,ml,9.0,94.16,0.177,0.24,nn,nn00800559,2021-02-17T03:44:17.647Z,"24 km ESE of Alamo, Nevada",earthquake,0.0,1.7,0.0,0.0,automatic,nn,nn,2021-02-17 03:36:18.420,Small
5,2021-02-17T03:36:07.740Z,38.4489,22.0086,10.0,5.5,mww,0.0,39.0,0.626,0.84,us,us6000diae,2021-02-17T04:38:13.265Z,"16 km N of Kamárai, Greece",earthquake,4.8,1.9,0.047,44.0,reviewed,us,us,2021-02-17 03:36:07.740,Large
6,2021-02-17T03:27:43.060Z,38.835834,-122.786835,1.84,0.91,md,22.0,73.0,0.003758,0.04,nc,nc73524141,2021-02-17T03:43:04.287Z,"6km WNW of Cobb, CA",earthquake,0.22,0.39,0.13,3.0,automatic,nc,nc,2021-02-17 03:27:43.060,Small
7,2021-02-17T03:25:29.491Z,-25.0984,-71.0266,10.0,4.4,mb,0.0,169.0,0.735,0.56,us,us6000diaa,2021-02-17T03:37:29.040Z,"64 km WNW of Taltal, Chile",earthquake,4.5,2.0,0.24,5.0,reviewed,us,us,2021-02-17 03:25:29.491,Small
8,2021-02-17T03:24:15.431Z,59.8278,-152.2928,78.8,2.1,ml,0.0,0.0,0.0,0.44,ak,ak02127fq4w8,2021-02-17T03:31:19.831Z,"26 km WNW of Anchor Point, Alaska",earthquake,0.0,0.8,0.0,0.0,automatic,ak,ak,2021-02-17 03:24:15.431,Small
9,2021-02-17T02:58:09.380Z,18.7791,-66.1428,29.0,2.96,md,19.0,264.0,0.4738,0.22,pr,pr2021048001,2021-02-17T03:31:00.983Z,"34 km N of San Juan, Puerto Rico",earthquake,0.89,3.56,0.13,17.0,reviewed,pr,pr,2021-02-17 02:58:09.380,Small


Thus, we have replaced all `NaN` items with `0`. If `inplace=True` is passed to the `fillna()` method, the DataFrame is modified in place instead of a new one being returned.
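
Zero is not always a sensible replacement. As a minimal sketch on a made-up DataFrame (the column names are borrowed from the dataset, the values are assumptions), `fillna()` also accepts a dict that sets a separate fill value for each column:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per column.
df = pd.DataFrame({"nst": [17.0, np.nan, 9.0],
                   "magError": [0.16, 0.127, np.nan]})

# fillna returns a new DataFrame by default; df itself is unchanged.
filled_zero = df.fillna(0)

# A dict sets a different fill value per column: 0 for nst,
# and the column mean for magError.
filled_mixed = df.fillna({"nst": 0, "magError": df["magError"].mean()})

print(df.isna().sum().sum())           # 2: the original still has its NaN values
print(filled_zero.isna().sum().sum())  # 0
print(filled_mixed)
```

Which fill value is appropriate depends on what the column means; filling an error estimate with `0` changes sums and means in a different way than filling it with the column mean.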
   
To keep only rows with `non-null` values, you can use the `dropna()` method:

In [220]:
# Drop every row that contains at least one NaN (axis=0 means rows)
eqPastMonth_dropna = eqPastMonth.dropna(axis=0)
eqPastMonth_dropna.head(10)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime,custom_mag
0,2021-02-17T04:25:59.180Z,35.763168,-120.329834,9.28,1.91,md,17.0,90.0,0.01278,0.08,nc,nc73524166,2021-02-17T04:27:35.667Z,"6km NNW of Cholame, CA",earthquake,0.38,0.52,0.16,4.0,automatic,nc,nc,2021-02-17 04:25:59.180,Small
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820,Small
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210,Small
6,2021-02-17T03:27:43.060Z,38.835834,-122.786835,1.84,0.91,md,22.0,73.0,0.003758,0.04,nc,nc73524141,2021-02-17T03:43:04.287Z,"6km WNW of Cobb, CA",earthquake,0.22,0.39,0.13,3.0,automatic,nc,nc,2021-02-17 03:27:43.060,Small
9,2021-02-17T02:58:09.380Z,18.7791,-66.1428,29.0,2.96,md,19.0,264.0,0.4738,0.22,pr,pr2021048001,2021-02-17T03:31:00.983Z,"34 km N of San Juan, Puerto Rico",earthquake,0.89,3.56,0.13,17.0,reviewed,pr,pr,2021-02-17 02:58:09.380,Small
10,2021-02-17T02:57:34.400Z,32.687667,-115.837,7.38,1.58,ml,23.0,129.0,0.08231,0.23,ci,ci39553679,2021-02-17T03:11:58.278Z,"16km ESE of Ocotillo, CA",earthquake,0.48,1.02,0.154,18.0,automatic,ci,ci,2021-02-17 02:57:34.400,Small
14,2021-02-17T02:28:56.390Z,17.9248,-66.889,12.0,2.41,md,10.0,221.0,0.1615,0.08,pr,pr2021048000,2021-02-17T02:47:43.624Z,"5 km SSE of Guánica, Puerto Rico",earthquake,0.51,0.49,0.05,6.0,reviewed,pr,pr,2021-02-17 02:28:56.390,Small
21,2021-02-17T02:02:34.040Z,38.837502,-122.80883,1.76,0.84,md,16.0,51.0,0.01412,0.03,nc,nc73524121,2021-02-17T02:04:10.070Z,"8km WNW of Cobb, CA",earthquake,0.23,0.54,0.06,2.0,automatic,nc,nc,2021-02-17 02:02:34.040,Small
23,2021-02-17T01:55:03.670Z,38.821667,-122.765167,1.91,1.25,md,26.0,118.0,0.009541,0.03,nc,nc73524106,2021-02-17T02:41:05.895Z,"4km W of Cobb, CA",earthquake,0.24,0.37,0.05,6.0,automatic,nc,nc,2021-02-17 01:55:03.670,Small
24,2021-02-17T01:54:56.420Z,38.824501,-122.764336,2.03,0.47,md,9.0,125.0,0.007383,0.01,nc,nc73524101,2021-02-17T02:29:05.824Z,"4km W of Cobb, CA",earthquake,0.55,1.14,0.32,2.0,automatic,nc,nc,2021-02-17 01:54:56.420,Small


With the `subset` and `how` parameters of `dropna()` we can control which columns are checked for `null` values and when a row is dropped (if any or if all of the checked values are `null`), respectively.
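
The effect of these two parameters can be sketched on a small made-up DataFrame (the column names are borrowed from the dataset, the values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data: row 2 is entirely NaN, rows 1 and 3 are partly NaN.
df = pd.DataFrame({
    "mag": [1.9, 5.0, np.nan, 2.1],
    "nst": [17.0, np.nan, np.nan, np.nan],
    "magError": [0.16, 0.127, np.nan, np.nan],
})

any_null = df.dropna(how="any")             # default: drop rows with at least one NaN
all_null = df.dropna(how="all")             # drop only rows where every value is NaN
by_subset = df.dropna(subset=["magError"])  # check the magError column only

print(any_null.index.tolist())   # [0]
print(all_null.index.tolist())   # [0, 1, 3]
print(by_subset.index.tolist())  # [0, 1]
```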

> ### Exercise 1

> - Get the type of the `"latitude"` column in `eqPastMonth`.

> - In `eqPastMonth`, find all rows where `magType` equals `"md"`, `mag` is less than `5` and `magError` is `non-null`. Call the resulting DataFrame `eqPastMonth_md_large`.

In [221]:
# type your code here


In [222]:
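import datetime  # needed for datetime.datetime.strptime below
# Parse the ISO 8601 strings in the 'time' column into datetime objects
eqPastMonth['datetime'] = eqPastMonth['time'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))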
eqPastMonth.head(5)

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,datetime,custom_mag
0,2021-02-17T04:25:59.180Z,35.763168,-120.329834,9.28,1.91,md,17.0,90.0,0.01278,0.08,nc,nc73524166,2021-02-17T04:27:35.667Z,"6km NNW of Cholame, CA",earthquake,0.38,0.52,0.16,4.0,automatic,nc,nc,2021-02-17 04:25:59.180,Small
1,2021-02-17T04:23:30.820Z,38.819332,-122.763664,1.74,1.18,md,28.0,102.0,0.01215,0.04,nc,nc73524156,2021-02-17T04:39:07.637Z,"4km W of Cobb, CA",earthquake,0.22,0.34,0.16,5.0,automatic,nc,nc,2021-02-17 04:23:30.820,Small
2,2021-02-17T04:20:41.210Z,33.598667,-116.8075,5.88,0.86,ml,34.0,68.0,0.03656,0.16,ci,ci39553703,2021-02-17T04:24:30.300Z,"13km WNW of Anza, CA",earthquake,0.21,0.5,0.143,26.0,automatic,ci,ci,2021-02-17 04:20:41.210,Small
3,2021-02-17T03:58:33.367Z,-17.6378,167.511,10.0,5.0,mb,,70.0,4.206,0.79,us,us6000diah,2021-02-17T04:16:00.040Z,Vanuatu,earthquake,7.5,1.9,0.127,20.0,reviewed,us,us,2021-02-17 03:58:33.367,Large
4,2021-02-17T03:36:18.420Z,37.2757,-114.907,16.6,1.2,ml,9.0,94.16,0.177,0.24,nn,nn00800559,2021-02-17T03:44:17.647Z,"24 km ESE of Alamo, Nevada",earthquake,,1.7,,,automatic,nn,nn,2021-02-17 03:36:18.420,Small
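
As an alternative to applying `strptime` row by row, pandas can parse the whole column in one call with `pd.to_datetime`. The sketch below uses two timestamp strings copied from the table above. Passing `utc=True` and then removing the timezone are choices made for this illustration so that the result is timezone-naive like the `datetime` column shown; this is not necessarily how the original notebook would do it.

```python
import pandas as pd

# Two timestamp strings in the same format as the 'time' column.
times = pd.Series(["2021-02-17T04:25:59.180Z", "2021-02-17T04:23:30.820Z"])

# Parse the ISO 8601 strings in one vectorized call; utc=True makes the
# trailing "Z" (UTC) explicit in the result.
parsed_utc = pd.to_datetime(times, utc=True)

# Drop the timezone to get naive timestamps like those in the table.
parsed_naive = parsed_utc.dt.tz_localize(None)

print(parsed_naive.iloc[0])  # 2021-02-17 04:25:59.180000
```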


In [223]:
eqPastMonth.dtypes

time                       object
latitude                  float64
longitude                 float64
depth                     float64
mag                       float64
magType                    object
nst                       float64
gap                       float64
dmin                      float64
rms                       float64
net                        object
id                         object
updated                    object
place                      object
type                       object
horizontalError           float64
depthError                float64
magError                  float64
magNst                    float64
status                     object
locationSource             object
magSource                  object
datetime           datetime64[ns]
custom_mag                 object
dtype: object