In [1]:
import pandas as pd

## Importing the data file

In [2]:
world_earthquakes = pd.read_csv('../data/world_earthquakes.csv')
world_earthquakes.head()

Unnamed: 0,date,country,latitude,longitude,depth,magnitude,secondary_effects,pde_shaking_deaths,pde_total_deaths,utsu_total_deaths,em_dat_total_deaths,others_source_deaths
0,1900-05-11 17:23,Japan,38.7,141.1,5.0,7.0 MJMA,,,,,,
1,1900-07-12 06:25,Turkey,40.3,43.1,,5.9 Muk,,,,140.0,,
2,1900-10-29 09:11,Venezuela,11.0,-66.0,0.0,7.7 Mw,,,,,,
3,1901-02-15 00:00,China,26.0,100.1,0.0,6.5 Ms,,,,,,
4,1901-03-31 07:11,Bulgaria,43.4,28.7,,6.4 Muk,,,,4.0,,


## Clean up the date column

The date column appears to follow a consistent format: `yyyy-mm-dd hh:mm`.
- Let's convert it to `datetime64[ns]`.
- Let's split it further into `year`, `month`, `day`, and `time` columns for more analysis options.
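The claim that every value follows `yyyy-mm-dd hh:mm` can be checked before converting. The sketch below is one possible check, not part of the original notebook: it parses a small hypothetical `sample` frame (the real data would be `world_earthquakes`) against the expected format and lists the values that fail. The names `sample`, `parsed`, and `bad_rows` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample standing in for the real `date` column;
# the last value is deliberately malformed.
sample = pd.DataFrame(
    {"date": ["1900-05-11 17:23", "1900-07-12 06:25", "unknown"]}
)

# Parse against the expected format; values that cannot be parsed become NaT
# instead of raising, because errors="coerce".
parsed = pd.to_datetime(sample["date"], format="%Y-%m-%d %H:%M", errors="coerce")

# Original strings for the rows that failed to parse.
bad_rows = sample.loc[parsed.isna(), "date"]
print(bad_rows.tolist())
```

If `bad_rows` is empty on the real data, the format is fully consistent and the conversion is safe to apply to the whole column.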

In [3]:
# Convert the original date column from string to datetime
# (pd.to_datetime is used because pandas 2.x rejects the unit-less "datetime64" dtype in astype)
world_earthquakes["date"] = pd.to_datetime(world_earthquakes["date"])
world_earthquakes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1340 entries, 0 to 1339
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   date                  1340 non-null   datetime64[ns]
 1   country               1340 non-null   object        
 2   latitude              1325 non-null   object        
 3   longitude             1325 non-null   object        
 4   depth                 1250 non-null   object        
 5   magnitude             1339 non-null   object        
 6   secondary_effects     373 non-null    object        
 7   pde_shaking_deaths    738 non-null    float64       
 8   pde_total_deaths      749 non-null    float64       
 9   utsu_total_deaths     1027 non-null   float64       
 10  em_dat_total_deaths   559 non-null    object        
 11  others_source_deaths  37 non-null     object        
dtypes: datetime64[ns](1), float64(3), object(8)
memory usage: 125.8+ KB


Now, let's extract each part: `year`, `month`, `day`, and `time`.

In [4]:
world_earthquakes['year'] = world_earthquakes['date'].dt.year
world_earthquakes['month'] = world_earthquakes['date'].dt.month
world_earthquakes['day'] = world_earthquakes['date'].dt.day
world_earthquakes['time'] = world_earthquakes['date'].dt.time

world_earthquakes.head()

Unnamed: 0,date,country,latitude,longitude,depth,magnitude,secondary_effects,pde_shaking_deaths,pde_total_deaths,utsu_total_deaths,em_dat_total_deaths,others_source_deaths,year,month,day,time
0,1900-05-11 17:23:00,Japan,38.7,141.1,5.0,7.0 MJMA,,,,,,,1900,5,11,17:23:00
1,1900-07-12 06:25:00,Turkey,40.3,43.1,,5.9 Muk,,,,140.0,,,1900,7,12,06:25:00
2,1900-10-29 09:11:00,Venezuela,11.0,-66.0,0.0,7.7 Mw,,,,,,,1900,10,29,09:11:00
3,1901-02-15 00:00:00,China,26.0,100.1,0.0,6.5 Ms,,,,,,,1901,2,15,00:00:00
4,1901-03-31 07:11:00,Bulgaria,43.4,28.7,,6.4 Muk,,,,4.0,,,1901,3,31,07:11:00


Finally, let's reorder the columns so that the date parts sit together.

In [5]:
# Reorder the columns so the date parts come first
world_earthquakes = world_earthquakes.reindex(columns=['date','year','month','day','time','country','latitude','longitude','depth','magnitude','secondary_effects','pde_shaking_deaths','pde_total_deaths','utsu_total_deaths','em_dat_total_deaths','others_source_deaths'])
world_earthquakes.head()

Unnamed: 0,date,year,month,day,time,country,latitude,longitude,depth,magnitude,secondary_effects,pde_shaking_deaths,pde_total_deaths,utsu_total_deaths,em_dat_total_deaths,others_source_deaths
0,1900-05-11 17:23:00,1900,5,11,17:23:00,Japan,38.7,141.1,5.0,7.0 MJMA,,,,,,
1,1900-07-12 06:25:00,1900,7,12,06:25:00,Turkey,40.3,43.1,,5.9 Muk,,,,140.0,,
2,1900-10-29 09:11:00,1900,10,29,09:11:00,Venezuela,11.0,-66.0,0.0,7.7 Mw,,,,,,
3,1901-02-15 00:00:00,1901,2,15,00:00:00,China,26.0,100.1,0.0,6.5 Ms,,,,,,
4,1901-03-31 07:11:00,1901,3,31,07:11:00,Bulgaria,43.4,28.7,,6.4 Muk,,,,4.0,,
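One caveat with `reindex(columns=...)`: it does not raise when a listed name is absent from the frame. It creates that column filled with `NaN` and drops any column that is not listed, so a misspelled name can silently discard data. The sketch below shows one way to guard against that. `reorder_columns` is a hypothetical helper, not a pandas function, and `df` is a small stand-in for `world_earthquakes`.

```python
import pandas as pd

# Hypothetical three-column frame standing in for `world_earthquakes`.
df = pd.DataFrame(
    {
        "country": ["Japan"],
        "date": pd.to_datetime(["1900-05-11 17:23"]),
        "depth": [5.0],
    }
)

def reorder_columns(frame, ordered):
    """Return `frame` with its columns in `ordered`, failing on any name mismatch."""
    missing = set(ordered) - set(frame.columns)   # listed but not in the frame
    omitted = set(frame.columns) - set(ordered)   # in the frame but not listed
    if missing or omitted:
        raise KeyError(
            f"unknown columns: {sorted(missing)}; omitted columns: {sorted(omitted)}"
        )
    return frame[ordered]

reordered = reorder_columns(df, ["date", "country", "depth"])
print(list(reordered.columns))
```

Calling `reorder_columns(df, ["date", "country", "dept"])` raises a `KeyError` naming both the unknown `dept` and the omitted `depth`, instead of returning a frame with an empty column.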
