### Earthquake analysis part 2

### Data Analysis and Pre-processing

The first cell imports the libraries and modules used for data analysis and manipulation:

*   pandas is a widely used library for data manipulation and analysis in Python.
*   json is a built-in module for reading and writing JSON data, a format commonly used by web applications and APIs.
*   datetime is a standard-library module for working with dates and times.

In [1]:
import pandas as pd
import json
from datetime import datetime

Retrieve Data

This code opens the JSON file "data.json" in read mode ('r') with the "open()" function and parses its contents into the "data" variable with the "json.load()" method. The "with" statement closes the file automatically when the block ends, even if an exception is raised while reading, so no explicit "file.close()" call is needed afterwards.

In [2]:
with open('data.json', 'r') as file:
    data = json.load(file)
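As a small, self-contained check of how the "with" statement handles the file, the sketch below writes a tiny JSON file to a temporary directory, reads it back, and confirms that the handle is already closed once the block ends. The file name "sample.json" and its contents are made up for illustration and are not part of the earthquake data.

```python
import json
import os
import tempfile

# Hypothetical sample payload, shaped loosely like the earthquake feed.
sample = {"features": [{"id": "demo1", "properties": {"mag": 1.5}}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")

    # Write the sample so there is something to read back.
    with open(path, "w") as out_file:
        json.dump(sample, out_file)

    # Read it back the same way the notebook reads data.json.
    with open(path, "r") as in_file:
        loaded = json.load(in_file)

    # The context manager has already closed the handle at this point.
    file_is_closed = in_file.closed

print(file_is_closed)
print(loaded["features"][0]["id"])
```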

Before analysing the data, we convert it to a pandas DataFrame. "pd.json_normalize()" is a convenient way to flatten nested or hierarchical JSON, where each element can have multiple levels of nesting, into columns with dotted names such as 'properties.mag'. In the earthquake data, all event records sit under the key 'features', so we pass data['features'] to the function and store the result in the variable 'df'. The last line of the cell displays the converted data.

In [3]:
df = pd.json_normalize(data['features'])
df

Unnamed: 0,type,id,properties.mag,properties.place,properties.time,properties.updated,properties.tz,properties.url,properties.detail,properties.felt,...,properties.types,properties.nst,properties.dmin,properties.rms,properties.gap,properties.magType,properties.type,properties.title,geometry.type,geometry.coordinates
0,Feature,ci40189903,1.79,"5km WNW of Borrego Springs, CA",1679774005670,1679774696230,,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,...,",focal-mechanism,nearby-cities,origin,phase-da...",59.0,0.01177,0.19,30.0,ml,earthquake,"M 1.8 - 5km WNW of Borrego Springs, CA",Point,"[-116.4311676, 33.267334, 4.56]"
1,Feature,ci40189895,2.13,"1km SSW of Santee, CA",1679773995770,1679774210394,,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,...,",nearby-cities,origin,scitech-link,",8.0,0.40720,0.18,189.0,ml,earthquake,"M 2.1 - 1km SSW of Santee, CA",Point,"[-116.9778333, 32.8308333, 5.19]"
2,Feature,ak0233v8qay0,2.00,Southern Alaska,1679773586327,1679773719021,,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,...,",origin,phase-data,",,,0.57,,ml,earthquake,M 2.0 - Southern Alaska,Point,"[-147.0013, 61.2602, 0]"
3,Feature,ak0233v8ikzw,1.20,"33 km NNE of Four Mile Road, Alaska",1679771424123,1679771562802,,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,...,",origin,phase-data,",,,0.54,,ml,earthquake,"M 1.2 - 33 km NNE of Four Mile Road, Alaska",Point,"[-148.8985, 64.89, 16.7]"
4,Feature,pr71401523,3.21,"17 km NNW of Cruz Bay, U.S. Virgin Islands",1679771233290,1679772243330,,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,...,",origin,phase-data,",6.0,0.16040,0.07,267.0,md,earthquake,"M 3.2 - 17 km NNW of Cruz Bay, U.S. Virgin Isl...",Point,"[-64.826, 18.4843333333333, 75.75]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,Feature,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,1677811394893,,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,...,",origin,phase-data,",,,0.35,,ml,earthquake,M 2.1 - Alaska Peninsula,Point,"[-155.3965, 58.1839, 1.6]"
12459,Feature,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,1677722441196,,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,...,",origin,phase-data,",,,0.17,,ml,earthquake,"M 1.6 - 39 km W of Salamatof, Alaska",Point,"[-152.0289, 60.6776, 82.4]"
12460,Feature,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,1677612340130,,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,...,",origin,phase-data,",42.0,,0.12,140.0,md,earthquake,"M 2.2 - 9 km ENE of Pāhala, Hawaii",Point,"[-155.395166666667, 19.2408333333333, 30.91]"
12461,Feature,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,1677190347330,,https://earthquake.usgs.gov/earthquakes/eventp...,https://earthquake.usgs.gov/earthquakes/feed/v...,,...,",origin,phase-data,",7.0,0.04425,0.19,199.0,ml,earthquake,"M 1.2 - 16 km S of Harrah, Washington",Point,"[-120.57866666666666, 46.25533333333333, 12.96]"
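To make the flattening step easier to follow, here is a minimal sketch that runs "pd.json_normalize()" on two made-up records with the same nesting as the 'features' list. The ids, places and values are invented for illustration only.

```python
import pandas as pd

# Two made-up records that mimic the nesting of the 'features' list.
features = [
    {
        "type": "Feature",
        "id": "demo1",
        "properties": {"mag": 1.8, "place": "5km WNW of Somewhere, CA"},
        "geometry": {"type": "Point", "coordinates": [-116.4, 33.3, 4.6]},
    },
    {
        "type": "Feature",
        "id": "demo2",
        "properties": {"mag": 2.1, "place": "Southern Alaska"},
        "geometry": {"type": "Point", "coordinates": [-147.0, 61.3, 0.0]},
    },
]

# Nested dictionaries become dotted column names; lists stay as list values.
flat = pd.json_normalize(features)
print(sorted(flat.columns))
```

Note that the coordinates remain a single list-valued column after flattening, which is why a separate splitting step is needed later.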


### Data Pre-processing

The cell below defines functions that perform the following tasks:

*   Split the coordinates list into new Longitude, Latitude and Depth columns.
*   Remove unused columns.
*   Convert the event time from milliseconds since the Unix epoch to a UTC date.
*   Report the number of duplicate records.
*   Encode the tsunami flag as a categorical 'Yes'/'No' column, which is easier to use in visualizations.
*   Drop the original tsunami column.

In [4]:
def split_coordinates(df):
    # Split the [longitude, latitude, depth] list into three new columns
    df[['Longitude', 'Latitude', 'Depth']] = pd.DataFrame(df['geometry.coordinates'].tolist(),
                                                          index=df.index)
    return df

# The url, detail and felt columns are not required for this analysis, and the raw
# coordinates column is redundant once it has been split.
def drop_unused_columns(df):
    df.drop(columns=['geometry.coordinates', 'properties.url',
                     'properties.detail', 'properties.felt'], inplace=True)
    return df

# Convert epoch milliseconds to a calendar date (UTC)
def dateConversion(df):
    df['date'] = pd.to_datetime(df['properties.time'], unit='ms').dt.date
    return df

def check_for_duplicate_records(earthquake_df):
    # Display the Number of Duplicate records in the dataframe
    print(f"There are {earthquake_df.duplicated().sum()} duplicate records found in inputdataframe.")

def create_new_column_for_tsunami(predicted_dataframe):
    # Raw data has an integer column 'properties.tsunami' (1 = tsunami flag set, otherwise 0)
    # To make visualization easier, create a new column with the values 'Yes' or 'No'
    predicted_dataframe.loc[predicted_dataframe['properties.tsunami'] == 1, 'tsunami_occurred'] = 'Yes'
    predicted_dataframe.loc[predicted_dataframe['properties.tsunami'] != 1, 'tsunami_occurred'] = 'No'
    # After creating the new column, drop the existing 'properties.tsunami' column from the raw data
    predicted_dataframe.drop(['properties.tsunami'], axis=1, inplace=True)
    # Drop the existing ID column as it will throw exception while merging with other dataframes when they have same keys
    # predicted_dataframe.drop(['id'], axis=1, inplace=True)

    return predicted_dataframe

Function Call to split the co-ordinates and drop the unused columns

In [5]:
split_coordinates(df);
drop_unused_columns(df)

Unnamed: 0,type,id,properties.mag,properties.place,properties.time,properties.updated,properties.tz,properties.cdi,properties.mmi,properties.alert,...,properties.dmin,properties.rms,properties.gap,properties.magType,properties.type,properties.title,geometry.type,Longitude,Latitude,Depth
0,Feature,ci40189903,1.79,"5km WNW of Borrego Springs, CA",1679774005670,1679774696230,,,,,...,0.01177,0.19,30.0,ml,earthquake,"M 1.8 - 5km WNW of Borrego Springs, CA",Point,-116.431168,33.267334,4.56
1,Feature,ci40189895,2.13,"1km SSW of Santee, CA",1679773995770,1679774210394,,,,,...,0.40720,0.18,189.0,ml,earthquake,"M 2.1 - 1km SSW of Santee, CA",Point,-116.977833,32.830833,5.19
2,Feature,ak0233v8qay0,2.00,Southern Alaska,1679773586327,1679773719021,,,,,...,,0.57,,ml,earthquake,M 2.0 - Southern Alaska,Point,-147.001300,61.260200,0.00
3,Feature,ak0233v8ikzw,1.20,"33 km NNE of Four Mile Road, Alaska",1679771424123,1679771562802,,,,,...,,0.54,,ml,earthquake,"M 1.2 - 33 km NNE of Four Mile Road, Alaska",Point,-148.898500,64.890000,16.70
4,Feature,pr71401523,3.21,"17 km NNW of Cruz Bay, U.S. Virgin Islands",1679771233290,1679772243330,,,,,...,0.16040,0.07,267.0,md,earthquake,"M 3.2 - 17 km NNW of Cruz Bay, U.S. Virgin Isl...",Point,-64.826000,18.484333,75.75
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,Feature,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,1677811394893,,,,,...,,0.35,,ml,earthquake,M 2.1 - Alaska Peninsula,Point,-155.396500,58.183900,1.60
12459,Feature,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,1677722441196,,,,,...,,0.17,,ml,earthquake,"M 1.6 - 39 km W of Salamatof, Alaska",Point,-152.028900,60.677600,82.40
12460,Feature,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,1677612340130,,,,,...,,0.12,140.0,md,earthquake,"M 2.2 - 9 km ENE of Pāhala, Hawaii",Point,-155.395167,19.240833,30.91
12461,Feature,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,1677190347330,,,,,...,0.04425,0.19,199.0,ml,earthquake,"M 1.2 - 16 km S of Harrah, Washington",Point,-120.578667,46.255333,12.96
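The coordinate split can be tried in isolation on a toy frame. This is a sketch under the assumption that every row holds a three-element list in the order longitude, latitude, depth, as in the output above; the frame name "toy" and its values are illustrative.

```python
import pandas as pd

# Toy frame with the same list-valued column as the earthquake data.
toy = pd.DataFrame(
    {"geometry.coordinates": [[-116.43, 33.27, 4.56], [-147.0, 61.26, 0.0]]}
)

# Expand each list into three named columns, keeping the original index.
coords = pd.DataFrame(
    toy["geometry.coordinates"].tolist(),
    index=toy.index,
    columns=["Longitude", "Latitude", "Depth"],
)

# Replace the list column with the three scalar columns.
result = toy.drop(columns=["geometry.coordinates"]).join(coords)
print(result)
```

Building a new frame and joining it, rather than assigning in place, leaves the input untouched, which can make the step easier to re-run in a notebook.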


In order to gather more insights from the data, it is important to know the data type of each column.

In [6]:
print(df.dtypes)

type                   object
id                     object
properties.mag        float64
properties.place       object
properties.time         int64
properties.updated      int64
properties.tz          object
properties.cdi        float64
properties.mmi        float64
properties.alert       object
properties.status      object
properties.tsunami      int64
properties.sig          int64
properties.net         object
properties.code        object
properties.ids         object
properties.sources     object
properties.types       object
properties.nst        float64
properties.dmin       float64
properties.rms        float64
properties.gap        float64
properties.magType     object
properties.type        object
properties.title       object
geometry.type          object
Longitude             float64
Latitude              float64
Depth                 float64
dtype: object


Function call to convert the event time from milliseconds to a UTC date.

In [7]:
dateConversion(df)

Unnamed: 0,type,id,properties.mag,properties.place,properties.time,properties.updated,properties.tz,properties.cdi,properties.mmi,properties.alert,...,properties.rms,properties.gap,properties.magType,properties.type,properties.title,geometry.type,Longitude,Latitude,Depth,date
0,Feature,ci40189903,1.79,"5km WNW of Borrego Springs, CA",1679774005670,1679774696230,,,,,...,0.19,30.0,ml,earthquake,"M 1.8 - 5km WNW of Borrego Springs, CA",Point,-116.431168,33.267334,4.56,2023-03-25
1,Feature,ci40189895,2.13,"1km SSW of Santee, CA",1679773995770,1679774210394,,,,,...,0.18,189.0,ml,earthquake,"M 2.1 - 1km SSW of Santee, CA",Point,-116.977833,32.830833,5.19,2023-03-25
2,Feature,ak0233v8qay0,2.00,Southern Alaska,1679773586327,1679773719021,,,,,...,0.57,,ml,earthquake,M 2.0 - Southern Alaska,Point,-147.001300,61.260200,0.00,2023-03-25
3,Feature,ak0233v8ikzw,1.20,"33 km NNE of Four Mile Road, Alaska",1679771424123,1679771562802,,,,,...,0.54,,ml,earthquake,"M 1.2 - 33 km NNE of Four Mile Road, Alaska",Point,-148.898500,64.890000,16.70,2023-03-25
4,Feature,pr71401523,3.21,"17 km NNW of Cruz Bay, U.S. Virgin Islands",1679771233290,1679772243330,,,,,...,0.07,267.0,md,earthquake,"M 3.2 - 17 km NNW of Cruz Bay, U.S. Virgin Isl...",Point,-64.826000,18.484333,75.75,2023-03-25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,Feature,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,1677811394893,,,,,...,0.35,,ml,earthquake,M 2.1 - Alaska Peninsula,Point,-155.396500,58.183900,1.60,2023-02-23
12459,Feature,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,1677722441196,,,,,...,0.17,,ml,earthquake,"M 1.6 - 39 km W of Salamatof, Alaska",Point,-152.028900,60.677600,82.40,2023-02-23
12460,Feature,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,1677612340130,,,,,...,0.12,140.0,md,earthquake,"M 2.2 - 9 km ENE of Pāhala, Hawaii",Point,-155.395167,19.240833,30.91,2023-02-23
12461,Feature,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,1677190347330,,,,,...,0.19,199.0,ml,earthquake,"M 1.2 - 16 km S of Harrah, Washington",Point,-120.578667,46.255333,12.96,2023-02-23
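The conversion can be checked on two of the 'properties.time' values shown above. The sketch below passes "utc=True" so that the intermediate timestamps are explicitly timezone-aware; the notebook's own function omits that argument, so treat this as a variation rather than the exact code used.

```python
import pandas as pd

# Two 'properties.time' values taken from the first and last rows above.
times_ms = pd.Series([1679774005670, 1677184317290])

# Interpret the integers as milliseconds since the Unix epoch, in UTC.
timestamps = pd.to_datetime(times_ms, unit="ms", utc=True)

# Keep only the calendar date, as the notebook does.
dates = timestamps.dt.date

print(timestamps.iloc[0])
print(dates.tolist())
```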



### Extract City Names

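The place where the event occurred is given in the 'properties.place' column, but the value also carries extra detail such as the distance and direction from a nearby city. To obtain a coarser place name, a lambda expression splits each value on commas and keeps the last part, which is usually the state, territory or country (for example 'CA' or 'Alaska'). Values without a comma, such as 'Southern Alaska', are kept whole.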

In [8]:
unique_names = df['properties.place'].value_counts()
df['place_name'] = df['properties.place'].apply(lambda x: x.split(',')[-1].strip())
df

Unnamed: 0,type,id,properties.mag,properties.place,properties.time,properties.updated,properties.tz,properties.cdi,properties.mmi,properties.alert,...,properties.gap,properties.magType,properties.type,properties.title,geometry.type,Longitude,Latitude,Depth,date,place_name
0,Feature,ci40189903,1.79,"5km WNW of Borrego Springs, CA",1679774005670,1679774696230,,,,,...,30.0,ml,earthquake,"M 1.8 - 5km WNW of Borrego Springs, CA",Point,-116.431168,33.267334,4.56,2023-03-25,CA
1,Feature,ci40189895,2.13,"1km SSW of Santee, CA",1679773995770,1679774210394,,,,,...,189.0,ml,earthquake,"M 2.1 - 1km SSW of Santee, CA",Point,-116.977833,32.830833,5.19,2023-03-25,CA
2,Feature,ak0233v8qay0,2.00,Southern Alaska,1679773586327,1679773719021,,,,,...,,ml,earthquake,M 2.0 - Southern Alaska,Point,-147.001300,61.260200,0.00,2023-03-25,Southern Alaska
3,Feature,ak0233v8ikzw,1.20,"33 km NNE of Four Mile Road, Alaska",1679771424123,1679771562802,,,,,...,,ml,earthquake,"M 1.2 - 33 km NNE of Four Mile Road, Alaska",Point,-148.898500,64.890000,16.70,2023-03-25,Alaska
4,Feature,pr71401523,3.21,"17 km NNW of Cruz Bay, U.S. Virgin Islands",1679771233290,1679772243330,,,,,...,267.0,md,earthquake,"M 3.2 - 17 km NNW of Cruz Bay, U.S. Virgin Isl...",Point,-64.826000,18.484333,75.75,2023-03-25,U.S. Virgin Islands
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,Feature,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,1677811394893,,,,,...,,ml,earthquake,M 2.1 - Alaska Peninsula,Point,-155.396500,58.183900,1.60,2023-02-23,Alaska Peninsula
12459,Feature,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,1677722441196,,,,,...,,ml,earthquake,"M 1.6 - 39 km W of Salamatof, Alaska",Point,-152.028900,60.677600,82.40,2023-02-23,Alaska
12460,Feature,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,1677612340130,,,,,...,140.0,md,earthquake,"M 2.2 - 9 km ENE of Pāhala, Hawaii",Point,-155.395167,19.240833,30.91,2023-02-23,Hawaii
12461,Feature,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,1677190347330,,,,,...,199.0,ml,earthquake,"M 1.2 - 16 km S of Harrah, Washington",Point,-120.578667,46.255333,12.96,2023-02-23,Washington
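The same extraction can be written with pandas string methods instead of a lambda. This is a sketch on three place strings taken from the output above; the variable names are illustrative, and stripping the leading space is an extra step added here.

```python
import pandas as pd

places = pd.Series(
    [
        "5km WNW of Borrego Springs, CA",
        "Southern Alaska",
        "17 km NNW of Cruz Bay, U.S. Virgin Islands",
    ]
)

# Split once from the right, keep the last piece, and trim the
# space that follows the comma.
region = places.str.rsplit(",", n=1).str[-1].str.strip()
print(region.tolist())
```

Without the final "strip()", values such as ' CA' would keep a leading space and could be counted separately from 'CA' in later groupings.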


In [9]:
check_for_duplicate_records(df)

There are 0 duplicate records found in inputdataframe.
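"duplicated()" with no arguments only flags rows that match an earlier row in every column. As a hedged illustration on made-up data, the sketch below contrasts that with a check restricted to the 'id' column, which may be a useful extra test if the same event can appear twice with updated values.

```python
import pandas as pd

# Made-up events: the id 'a1' appears twice with different magnitudes.
events = pd.DataFrame(
    {"id": ["a1", "a2", "a1"], "properties.mag": [1.0, 2.0, 1.5]}
)

# Rows identical to an earlier row in every column.
full_row_dupes = int(events.duplicated().sum())

# Rows whose id has already been seen, regardless of other columns.
id_dupes = int(events.duplicated(subset=["id"]).sum())

print(full_row_dupes, id_dupes)
```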


In [10]:
create_new_column_for_tsunami(df)

Unnamed: 0,type,id,properties.mag,properties.place,properties.time,properties.updated,properties.tz,properties.cdi,properties.mmi,properties.alert,...,properties.magType,properties.type,properties.title,geometry.type,Longitude,Latitude,Depth,date,place_name,tsunami_occurred
0,Feature,ci40189903,1.79,"5km WNW of Borrego Springs, CA",1679774005670,1679774696230,,,,,...,ml,earthquake,"M 1.8 - 5km WNW of Borrego Springs, CA",Point,-116.431168,33.267334,4.56,2023-03-25,CA,No
1,Feature,ci40189895,2.13,"1km SSW of Santee, CA",1679773995770,1679774210394,,,,,...,ml,earthquake,"M 2.1 - 1km SSW of Santee, CA",Point,-116.977833,32.830833,5.19,2023-03-25,CA,No
2,Feature,ak0233v8qay0,2.00,Southern Alaska,1679773586327,1679773719021,,,,,...,ml,earthquake,M 2.0 - Southern Alaska,Point,-147.001300,61.260200,0.00,2023-03-25,Southern Alaska,No
3,Feature,ak0233v8ikzw,1.20,"33 km NNE of Four Mile Road, Alaska",1679771424123,1679771562802,,,,,...,ml,earthquake,"M 1.2 - 33 km NNE of Four Mile Road, Alaska",Point,-148.898500,64.890000,16.70,2023-03-25,Alaska,No
4,Feature,pr71401523,3.21,"17 km NNW of Cruz Bay, U.S. Virgin Islands",1679771233290,1679772243330,,,,,...,md,earthquake,"M 3.2 - 17 km NNW of Cruz Bay, U.S. Virgin Isl...",Point,-64.826000,18.484333,75.75,2023-03-25,U.S. Virgin Islands,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,Feature,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,1677811394893,,,,,...,ml,earthquake,M 2.1 - Alaska Peninsula,Point,-155.396500,58.183900,1.60,2023-02-23,Alaska Peninsula,No
12459,Feature,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,1677722441196,,,,,...,ml,earthquake,"M 1.6 - 39 km W of Salamatof, Alaska",Point,-152.028900,60.677600,82.40,2023-02-23,Alaska,No
12460,Feature,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,1677612340130,,,,,...,md,earthquake,"M 2.2 - 9 km ENE of Pāhala, Hawaii",Point,-155.395167,19.240833,30.91,2023-02-23,Hawaii,No
12461,Feature,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,1677190347330,,,,,...,ml,earthquake,"M 1.2 - 16 km S of Harrah, Washington",Point,-120.578667,46.255333,12.96,2023-02-23,Washington,No
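The 'Yes'/'No' encoding can also be expressed in one step with "numpy.where". This is a sketch on a toy column, not the notebook's function; it follows the same rule (1 becomes 'Yes', anything else becomes 'No').

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"properties.tsunami": [0, 1, 0]})

# Vectorised equivalent of the two .loc assignments.
toy["tsunami_occurred"] = np.where(toy["properties.tsunami"] == 1, "Yes", "No")

# Drop the numeric flag once the label column exists.
toy = toy.drop(columns=["properties.tsunami"])
print(toy)
```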


This cell defines a function called "earthquakeMagnitude" that takes a pandas DataFrame named "predicted_dataframe" as an argument. Using the "loc" method with boolean conditions, it maps each magnitude-type abbreviation in 'properties.magType' to its full name and stores the result in a new column named "magnitude". Rows whose abbreviation is not in the list are left as NaN in that column. The function returns the updated DataFrame and is then called with "df" as its argument.

In [11]:
# MD: Duration magnitude
# ML: Local magnitude
# MS: Surface wave magnitude
# MW: Moment magnitude
# ME: Energy magnitude
# MI: Nuttli magnitude
# MB: Body-wave magnitude
# Mlg: Gutenberg-Richter magnitude
# md ml ms mw me mi mb mlg abbreviation


def earthquakeMagnitude(predicted_dataframe):
    predicted_dataframe.loc[predicted_dataframe['properties.magType'] == 'md', 'magnitude'] = 'Duration magnitude'
    predicted_dataframe.loc[predicted_dataframe['properties.magType'] == 'ml', 'magnitude'] = 'Local magnitude'

    predicted_dataframe.loc[predicted_dataframe['properties.magType'] == 'ms', 'magnitude'] = 'Surface wave magnitude'

    predicted_dataframe.loc[predicted_dataframe['properties.magType'] == 'mw', 'magnitude'] = 'Moment magnitude'

    predicted_dataframe.loc[predicted_dataframe['properties.magType'] == 'me', 'magnitude'] = 'Energy magnitude'

    predicted_dataframe.loc[predicted_dataframe['properties.magType'] == 'mi', 'magnitude'] = 'Nuttli magnitude'

    predicted_dataframe.loc[predicted_dataframe['properties.magType'] == 'mb', 'magnitude'] = 'Body-wave magnitude'

    predicted_dataframe.loc[predicted_dataframe['properties.magType'] == 'mlg', 'magnitude'] = 'Gutenberg-Richter magnitude'

    return predicted_dataframe
earthquakeMagnitude(df)

Unnamed: 0,type,id,properties.mag,properties.place,properties.time,properties.updated,properties.tz,properties.cdi,properties.mmi,properties.alert,...,properties.type,properties.title,geometry.type,Longitude,Latitude,Depth,date,place_name,tsunami_occurred,magnitude
0,Feature,ci40189903,1.79,"5km WNW of Borrego Springs, CA",1679774005670,1679774696230,,,,,...,earthquake,"M 1.8 - 5km WNW of Borrego Springs, CA",Point,-116.431168,33.267334,4.56,2023-03-25,CA,No,Local magnitude
1,Feature,ci40189895,2.13,"1km SSW of Santee, CA",1679773995770,1679774210394,,,,,...,earthquake,"M 2.1 - 1km SSW of Santee, CA",Point,-116.977833,32.830833,5.19,2023-03-25,CA,No,Local magnitude
2,Feature,ak0233v8qay0,2.00,Southern Alaska,1679773586327,1679773719021,,,,,...,earthquake,M 2.0 - Southern Alaska,Point,-147.001300,61.260200,0.00,2023-03-25,Southern Alaska,No,Local magnitude
3,Feature,ak0233v8ikzw,1.20,"33 km NNE of Four Mile Road, Alaska",1679771424123,1679771562802,,,,,...,earthquake,"M 1.2 - 33 km NNE of Four Mile Road, Alaska",Point,-148.898500,64.890000,16.70,2023-03-25,Alaska,No,Local magnitude
4,Feature,pr71401523,3.21,"17 km NNW of Cruz Bay, U.S. Virgin Islands",1679771233290,1679772243330,,,,,...,earthquake,"M 3.2 - 17 km NNW of Cruz Bay, U.S. Virgin Isl...",Point,-64.826000,18.484333,75.75,2023-03-25,U.S. Virgin Islands,No,Duration magnitude
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,Feature,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,1677811394893,,,,,...,earthquake,M 2.1 - Alaska Peninsula,Point,-155.396500,58.183900,1.60,2023-02-23,Alaska Peninsula,No,Local magnitude
12459,Feature,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,1677722441196,,,,,...,earthquake,"M 1.6 - 39 km W of Salamatof, Alaska",Point,-152.028900,60.677600,82.40,2023-02-23,Alaska,No,Local magnitude
12460,Feature,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,1677612340130,,,,,...,earthquake,"M 2.2 - 9 km ENE of Pāhala, Hawaii",Point,-155.395167,19.240833,30.91,2023-02-23,Hawaii,No,Duration magnitude
12461,Feature,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,1677190347330,,,,,...,earthquake,"M 1.2 - 16 km S of Harrah, Washington",Point,-120.578667,46.255333,12.96,2023-02-23,Washington,No,Local magnitude
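A dictionary lookup with "Series.map()" gives the same result as the chain of "loc" assignments and keeps the abbreviations in one place. The sketch below copies the names used in the function above; the constant name MAG_TYPE_NAMES and the toy frame are illustrative, and 'mww' stands in for any code that is missing from the mapping.

```python
import pandas as pd

# Abbreviation -> full name, copied from the notebook's function.
MAG_TYPE_NAMES = {
    "md": "Duration magnitude",
    "ml": "Local magnitude",
    "ms": "Surface wave magnitude",
    "mw": "Moment magnitude",
    "me": "Energy magnitude",
    "mi": "Nuttli magnitude",
    "mb": "Body-wave magnitude",
    "mlg": "Gutenberg-Richter magnitude",
}

toy = pd.DataFrame({"properties.magType": ["ml", "md", "mww", None]})

# Codes that are not in the dictionary (and missing values) become NaN.
toy["magnitude"] = toy["properties.magType"].map(MAG_TYPE_NAMES)
print(toy)
```

The NaN values produced for unmapped codes are the rows that show up later as missing in the 'magnitude' column.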


This code drops multiple columns from a pandas DataFrame object named "df". The "drop()" method is used with the "inplace=True" parameter to remove the specified columns from the DataFrame permanently.

In [12]:
df.drop(columns=['properties.tz', 'properties.cdi', 'properties.mmi', 'properties.alert',
                 'properties.updated', 'properties.net', 'properties.code', 'properties.ids',
                 'type', 'properties.sources', 'properties.magType', 'properties.title'],
        inplace=True)

The cell below calculates the total number of rows in "df" with the built-in "len()" function, stores the result in the variable "num_rows", and prints it.

In [13]:
# get the total number of rows
num_rows = len(df)

# print the result
print("Total number of rows:", num_rows)

Total number of rows: 12463


The next cell counts the missing (null) values in each column of the DataFrame. The resulting Series, which holds one count per column, is stored in the variable "missing_values".

In [14]:
# count the number of missing values in each column
missing_values = df.isnull().sum()

# print the result
print(missing_values)

id                      0
properties.mag          0
properties.place        0
properties.time         0
properties.status       0
properties.sig          0
properties.types        0
properties.nst       3465
properties.dmin      6539
properties.rms          0
properties.gap       3466
properties.type         0
geometry.type           0
Longitude               0
Latitude                0
Depth                   0
date                    0
place_name              0
tsunami_occurred        0
magnitude             138
dtype: int64
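Raw counts are easier to judge as a share of the rows; for example, 6539 missing values out of 12463 rows is roughly 52%. The sketch below shows one way to compute both on a toy frame with invented values: because "isnull()" returns booleans, "mean()" gives the fraction of missing cells per column.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "properties.nst": [59.0, np.nan, np.nan, 6.0],
        "properties.rms": [0.19, 0.18, 0.57, 0.54],
    }
)

# Number of missing cells per column.
missing_count = toy.isnull().sum()

# Fraction of missing cells per column (True counts as 1, False as 0).
missing_share = toy.isnull().mean()

print(missing_count)
print(missing_share)
```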


We remove the 'properties.dmin', 'properties.nst' and 'properties.gap' columns, which have 6539, 3465 and 3466 missing values respectively out of 12463 rows.

In [15]:
df = df.drop(columns=['properties.dmin'])
df = df.drop(columns=['properties.nst'])
df = df.drop(columns=['properties.gap'])
df

Unnamed: 0,id,properties.mag,properties.place,properties.time,properties.status,properties.sig,properties.types,properties.rms,properties.type,geometry.type,Longitude,Latitude,Depth,date,place_name,tsunami_occurred,magnitude
0,ci40189903,1.79,"5km WNW of Borrego Springs, CA",1679774005670,automatic,49,",focal-mechanism,nearby-cities,origin,phase-da...",0.19,earthquake,Point,-116.431168,33.267334,4.56,2023-03-25,CA,No,Local magnitude
1,ci40189895,2.13,"1km SSW of Santee, CA",1679773995770,automatic,70,",nearby-cities,origin,scitech-link,",0.18,earthquake,Point,-116.977833,32.830833,5.19,2023-03-25,CA,No,Local magnitude
2,ak0233v8qay0,2.00,Southern Alaska,1679773586327,automatic,62,",origin,phase-data,",0.57,earthquake,Point,-147.001300,61.260200,0.00,2023-03-25,Southern Alaska,No,Local magnitude
3,ak0233v8ikzw,1.20,"33 km NNE of Four Mile Road, Alaska",1679771424123,automatic,22,",origin,phase-data,",0.54,earthquake,Point,-148.898500,64.890000,16.70,2023-03-25,Alaska,No,Local magnitude
4,pr71401523,3.21,"17 km NNW of Cruz Bay, U.S. Virgin Islands",1679771233290,reviewed,159,",origin,phase-data,",0.07,earthquake,Point,-64.826000,18.484333,75.75,2023-03-25,U.S. Virgin Islands,No,Duration magnitude
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,reviewed,68,",origin,phase-data,",0.35,earthquake,Point,-155.396500,58.183900,1.60,2023-02-23,Alaska Peninsula,No,Local magnitude
12459,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,reviewed,39,",origin,phase-data,",0.17,earthquake,Point,-152.028900,60.677600,82.40,2023-02-23,Alaska,No,Local magnitude
12460,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,reviewed,77,",origin,phase-data,",0.12,earthquake,Point,-155.395167,19.240833,30.91,2023-02-23,Hawaii,No,Duration magnitude
12461,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,reviewed,21,",origin,phase-data,",0.19,earthquake,Point,-120.578667,46.255333,12.96,2023-02-23,Washington,No,Local magnitude


The next cell drops the rows that have a missing value in the 'magnitude' column and assigns the result back to the variable 'df'. The cleaned DataFrame is then displayed.

In [16]:
# Drop missing rows in 'magnitude' column
df = df.dropna(subset=['magnitude'])

# Print the resulting dataframe
df

Unnamed: 0,id,properties.mag,properties.place,properties.time,properties.status,properties.sig,properties.types,properties.rms,properties.type,geometry.type,Longitude,Latitude,Depth,date,place_name,tsunami_occurred,magnitude
0,ci40189903,1.79,"5km WNW of Borrego Springs, CA",1679774005670,automatic,49,",focal-mechanism,nearby-cities,origin,phase-da...",0.19,earthquake,Point,-116.431168,33.267334,4.56,2023-03-25,CA,No,Local magnitude
1,ci40189895,2.13,"1km SSW of Santee, CA",1679773995770,automatic,70,",nearby-cities,origin,scitech-link,",0.18,earthquake,Point,-116.977833,32.830833,5.19,2023-03-25,CA,No,Local magnitude
2,ak0233v8qay0,2.00,Southern Alaska,1679773586327,automatic,62,",origin,phase-data,",0.57,earthquake,Point,-147.001300,61.260200,0.00,2023-03-25,Southern Alaska,No,Local magnitude
3,ak0233v8ikzw,1.20,"33 km NNE of Four Mile Road, Alaska",1679771424123,automatic,22,",origin,phase-data,",0.54,earthquake,Point,-148.898500,64.890000,16.70,2023-03-25,Alaska,No,Local magnitude
4,pr71401523,3.21,"17 km NNW of Cruz Bay, U.S. Virgin Islands",1679771233290,reviewed,159,",origin,phase-data,",0.07,earthquake,Point,-64.826000,18.484333,75.75,2023-03-25,U.S. Virgin Islands,No,Duration magnitude
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,reviewed,68,",origin,phase-data,",0.35,earthquake,Point,-155.396500,58.183900,1.60,2023-02-23,Alaska Peninsula,No,Local magnitude
12459,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,reviewed,39,",origin,phase-data,",0.17,earthquake,Point,-152.028900,60.677600,82.40,2023-02-23,Alaska,No,Local magnitude
12460,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,reviewed,77,",origin,phase-data,",0.12,earthquake,Point,-155.395167,19.240833,30.91,2023-02-23,Hawaii,No,Duration magnitude
12461,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,reviewed,21,",origin,phase-data,",0.19,earthquake,Point,-120.578667,46.255333,12.96,2023-02-23,Washington,No,Local magnitude
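The "subset" argument limits the check to the named column, so rows with gaps elsewhere are kept. A minimal sketch on invented data follows; the 'note' column is hypothetical, and the "reset_index(drop=True)" call is an optional extra that the notebook does not use.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "id": ["a", "b", "c"],
        "magnitude": ["Local magnitude", np.nan, "Duration magnitude"],
        "note": [np.nan, "x", "y"],  # hypothetical column with its own gap
    }
)

# Only rows where 'magnitude' is missing are removed.
cleaned = toy.dropna(subset=["magnitude"]).reset_index(drop=True)
print(cleaned)
```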


The missing values in each column are counted again to confirm that none remain: "isnull()" marks each missing cell as True, and "sum()" totals the True values per column.

In [17]:
# count the number of missing values in each column
missing_values = df.isnull().sum()
# print the result
print(missing_values)

id                   0
properties.mag       0
properties.place     0
properties.time      0
properties.status    0
properties.sig       0
properties.types     0
properties.rms       0
properties.type      0
geometry.type        0
Longitude            0
Latitude             0
Depth                0
date                 0
place_name           0
tsunami_occurred     0
magnitude            0
dtype: int64


The next cell creates a new column called "month". The "map()" method applies a lambda function to every value in the 'date' column to extract its month number, and the results are assigned to the new column. Because 'df' was produced by "dropna()" in an earlier cell, pandas can raise a SettingWithCopyWarning on this assignment; taking an explicit copy first avoids it.

In [18]:
df = df.copy()
df["month"] = df['date'].map(lambda x: x.month)
df


Unnamed: 0,id,properties.mag,properties.place,properties.time,properties.status,properties.sig,properties.types,properties.rms,properties.type,geometry.type,Longitude,Latitude,Depth,date,place_name,tsunami_occurred,magnitude,month
0,ci40189903,1.79,"5km WNW of Borrego Springs, CA",1679774005670,automatic,49,",focal-mechanism,nearby-cities,origin,phase-da...",0.19,earthquake,Point,-116.431168,33.267334,4.56,2023-03-25,CA,No,Local magnitude,3
1,ci40189895,2.13,"1km SSW of Santee, CA",1679773995770,automatic,70,",nearby-cities,origin,scitech-link,",0.18,earthquake,Point,-116.977833,32.830833,5.19,2023-03-25,CA,No,Local magnitude,3
2,ak0233v8qay0,2.00,Southern Alaska,1679773586327,automatic,62,",origin,phase-data,",0.57,earthquake,Point,-147.001300,61.260200,0.00,2023-03-25,Southern Alaska,No,Local magnitude,3
3,ak0233v8ikzw,1.20,"33 km NNE of Four Mile Road, Alaska",1679771424123,automatic,22,",origin,phase-data,",0.54,earthquake,Point,-148.898500,64.890000,16.70,2023-03-25,Alaska,No,Local magnitude,3
4,pr71401523,3.21,"17 km NNW of Cruz Bay, U.S. Virgin Islands",1679771233290,reviewed,159,",origin,phase-data,",0.07,earthquake,Point,-64.826000,18.484333,75.75,2023-03-25,U.S. Virgin Islands,No,Duration magnitude,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,reviewed,68,",origin,phase-data,",0.35,earthquake,Point,-155.396500,58.183900,1.60,2023-02-23,Alaska Peninsula,No,Local magnitude,2
12459,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,reviewed,39,",origin,phase-data,",0.17,earthquake,Point,-152.028900,60.677600,82.40,2023-02-23,Alaska,No,Local magnitude,2
12460,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,reviewed,77,",origin,phase-data,",0.12,earthquake,Point,-155.395167,19.240833,30.91,2023-02-23,Hawaii,No,Duration magnitude,2
12461,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,reviewed,21,",origin,phase-data,",0.19,earthquake,Point,-120.578667,46.255333,12.96,2023-02-23,Washington,No,Local magnitude,2
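As an alternative to the lambda, the month can be extracted with the vectorised ".dt" accessor. This sketch assumes the 'date' column holds Python "date" objects, as it does after the conversion step, and converts them back to pandas datetimes first; the toy frame is illustrative.

```python
import datetime as dt

import pandas as pd

toy = pd.DataFrame({"date": [dt.date(2023, 3, 25), dt.date(2023, 2, 23)]})

# Work on an explicit copy so the assignment cannot target a slice.
toy = toy.copy()

# Convert the date objects to datetime64 and read the month number.
toy["month"] = pd.to_datetime(toy["date"]).dt.month
print(toy)
```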


This cell imports the NumPy library and, for several columns (significance, magnitude, RMS and depth), locates the rows holding the minimum and maximum values. It then sets the display options for the output table, including the desired width and the maximum number of columns to show.

Next, the code creates a dictionary containing the minimum and maximum significance and magnitude, together with the place and month of each of those events. It then uses this dictionary to create a new DataFrame 'data_frame', which is printed to the console using the 'print()' function. The RMS and depth extremes are computed but are not included in the table.

Output Explanation: The output summarises the extreme values among the recorded seismic events. The maximum magnitude in the dataset is 5.90, recorded in March 2023 near Tajikistan. The highest significance value, 694, was also recorded in Tajikistan.

In [19]:
# Minimum and Maximum
import numpy as np

# Rows holding the minimum and maximum of each column of interest
sig_minimum = df.loc[df['properties.sig'].idxmin()]
sig_maximum = df.loc[df['properties.sig'].idxmax()]
mag_minimum = df.loc[df['properties.mag'].idxmin()]
mag_maximum = df.loc[df['properties.mag'].idxmax()]
# RMS and depth extremes are computed here but not shown in the table below
rms_minimum = df.loc[df['properties.rms'].idxmin()]
rms_maximum = df.loc[df['properties.rms'].idxmax()]
depth_minimum = df.loc[df['Depth'].idxmin()]
depth_maximum = df.loc[df['Depth'].idxmax()]

# Widen the console output so the table is not wrapped
desired_width = 320
pd.set_option('display.width', desired_width)
np.set_printoptions(linewidth=desired_width)
pd.set_option('display.max_columns', 12)

# Use a new name so the JSON loaded into 'data' earlier is not overwritten
summary = {'Type': ['Minimum', 'Maximum'],
           'Significance Level': [sig_minimum['properties.sig'], sig_maximum['properties.sig']],
           'Country': [sig_minimum.place_name, sig_maximum.place_name],
           'Event Month': [sig_minimum.month, sig_maximum.month],
           'Magnitude': [mag_minimum['properties.mag'], mag_maximum['properties.mag']],
           'Place': [mag_minimum.place_name, mag_maximum.place_name],
           'Month': [mag_minimum.month, mag_maximum.month]}

# Create DataFrame
data_frame = pd.DataFrame(summary)
# Print the output.
print()
print(data_frame)



      Type  Significance Level      Country  Event Month  Magnitude        Place  Month
0  Minimum                   0       Alaska            3      -1.33       Alaska      3
1  Maximum                 694   Tajikistan            3       5.90   Tajikistan      3
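
As a compact cross-check of a summary like the one above, the extremes of several numeric columns can be read off in one call with DataFrame.agg. The sketch below is only an illustration: it uses a small hand-made frame (`toy`, a hypothetical name) whose column names follow the notebook's DataFrame, not the real dataset.

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook's DataFrame.
toy = pd.DataFrame({
    'properties.sig': [49, 70, 694, 0],
    'properties.mag': [1.79, 2.13, 5.90, -1.33],
    'Depth': [4.56, 5.19, 10.0, 0.0],
    'place_name': ['CA', 'CA', 'Tajikistan', 'Alaska'],
})

# One row per statistic ('min', 'max'), one column per numeric field.
extremes = toy[['properties.sig', 'properties.mag', 'Depth']].agg(['min', 'max'])
print(extremes)

# idxmax() returns the index label of the row holding the maximum,
# so .loc can recover the other fields of that event.
strongest = toy.loc[toy['properties.mag'].idxmax()]
print(strongest['place_name'])
```

The agg call gives the values only; idxmin()/idxmax() are still needed when the place or month of the extreme event is wanted, as in the cell above.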


The below code creates a bar chart to visualize the frequency of values in the 'place_name' column that occur more than 100 times in the original DataFrame.
*   It groups the DataFrame by the 'place_name' column and counts the number of occurrences of each value using the 'size()' method. The result is a new DataFrame that contains two columns: 'place_name' and 'Count'.
*   Next, the code filters the new DataFrame to only include values where the count is greater than 100, using boolean indexing.
*   Finally, the code uses the filtered DataFrame to create a bar chart using the 'px.bar()' method from Plotly Express. The x-axis of the chart represents the 'place_name' column, and the y-axis represents the count of each value. The resulting chart is then displayed using the 'show()' method.

Output Explanation: It is clear from the chart that Alaska recorded the highest number of seismic events, with a count of 6001. The second highest number, 2935 events, was recorded in CA (California), while Hawaii takes third place with 434 seismic events.

In [20]:
import plotly.express as px

# Count the occurrences of each value in the 'place_name' column
value_counts = df.groupby('place_name').size().reset_index(name='Count')
# print(value_counts)
# Filter values with count greater than 100
value_counts_filtered = value_counts[value_counts['Count'] > 100]

# Create bar chart
fig = px.bar(value_counts_filtered, x='place_name', y='Count')

# Show chart
fig.show()
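
The same counting and filtering can also be written with Series.value_counts, which counts and sorts in one step. This is a minimal sketch on made-up data (`toy` is a hypothetical stand-in for df, and the threshold of 2 replaces the notebook's 100 so that the tiny example has something to filter).

```python
import pandas as pd

# Hypothetical toy data standing in for df['place_name'].
toy = pd.DataFrame({'place_name': ['Alaska'] * 5 + ['CA'] * 3 + ['Hawaii']})

# value_counts() counts each distinct value and sorts by frequency (descending).
counts = toy['place_name'].value_counts()

# Keep only the places seen more than twice (the notebook uses 100).
frequent = counts[counts > 2]
print(frequent)
```

Because the result is already sorted, the bars come out ordered from most to least frequent if frequent.index and frequent.values are passed as x and y to px.bar().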

The below code is used to create a bar chart to visualize the frequency of values in the 'magnitude' column of the DataFrame. This can be useful for seeing which magnitude types occur in the data and how often each one is used.

*   It groups the DataFrame by the 'magnitude' column and counts the number of occurrences of each value using the 'size()' method. The result is a new DataFrame that contains two columns: 'magnitude' and 'Count'.
*   The code then prints this DataFrame using the 'print()' method to display the count of each value in the 'magnitude' column.
*   Finally, the code uses this counts DataFrame to create a bar chart using the 'px.bar()' method from Plotly Express. The x-axis of the chart represents the 'magnitude' column, and the y-axis represents the count of each value. The resulting chart is then displayed using the 'show()' method.

Output Explanation: The 'magnitude' column holds the method or algorithm used to calculate the preferred magnitude for the event. Most events (8679) use the 'Local magnitude' method, while 'Moment magnitude' is the least used, with only 9 events.

In [21]:
import plotly.express as px

# Count the occurrences of each value in the 'magnitude' column
value_counts = df.groupby('magnitude').size().reset_index(name='Count')
print(value_counts)

# Create bar chart
fig = px.bar(value_counts, x='magnitude', y='Count')

# Show chart
fig.show()

             magnitude  Count
0  Body-wave magnitude    840
1   Duration magnitude   2797
2      Local magnitude   8679
3     Moment magnitude      9
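
To put these counts in proportion, each one can be divided by the total. The sketch below re-enters the four counts printed above by hand (so it runs on its own) and adds a percentage column; the column name 'Share (%)' is an illustrative choice.

```python
import pandas as pd

# Counts copied from the table printed above.
value_counts = pd.DataFrame({
    'magnitude': ['Body-wave magnitude', 'Duration magnitude',
                  'Local magnitude', 'Moment magnitude'],
    'Count': [840, 2797, 8679, 9],
})

# Share of each magnitude type among all counted events, in percent.
total = value_counts['Count'].sum()
value_counts['Share (%)'] = (100 * value_counts['Count'] / total).round(1)

print(total)
print(value_counts)
```

With a total of 12325 counted events, local magnitude accounts for roughly 70% and duration magnitude for roughly 23%.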


#### World map visualization of earthquake records

The world_map() function can be used to visualize the geographical distribution of earthquake data points in a DataFrame and their association with a specified column like 'tsunami_occurred'.

*   The function takes two parameters - a DataFrame data_frame and a string color.
*   The function uses the px.scatter_geo() method from Plotly Express to create a scatter plot on a world map, where the latitude and longitude of each data point are represented by the 'Latitude' and 'Longitude' columns of the DataFrame. The color of each data point is determined by the column specified in the color parameter.
*   The function sets the projection of the map to "natural earth" using the 'projection' parameter of px.scatter_geo().
*   The function then updates the layout of the plot to add a title and centers it horizontally by setting 'title_x' to 0.5, using the 'title' and 'title_x' parameters of fig.update_layout().
*   Finally, the function displays the plot using the 'show()' method of fig.

Output Explanation: It is evident from the map that most of the seismic events were registered in North America, while the fewest occurred in Africa. In terms of oceans, most of the events were recorded in the Pacific Ocean.


In [22]:
def world_map(data_frame, color):
    fig = px.scatter_geo(data_frame, lat='Latitude', lon='Longitude', color=color, projection="natural earth")
    fig.update_layout(title='Earthquake Records all over the world', title_x=0.5)
    fig.show()
world_map(df,'tsunami_occurred')

The below code defines a function named stack_bar_with_count that creates a stacked bar chart with a text label on each bar, using the Plotly Express library in Python. Here's a brief explanation of the steps in the code:

*   The function takes six parameters: a DataFrame data_frame and five strings, values, names, color, text and title.
*   The function uses the px.bar() method from Plotly Express to create a stacked bar chart. The column named by the values parameter is plotted on the x-axis and the column named by the names parameter on the y-axis; despite the parameter names, the calls later in this notebook pass the category column ('properties.type') as values and the 'count' column as names. The color of each bar segment is determined by the column specified in the color parameter, and the height of the chart is set to 700 using the 'height' parameter of px.bar().
*   The 'text' parameter of px.bar() specifies the column whose values are displayed on each bar. The function then formats and positions these labels using the 'texttemplate' and 'textposition' parameters of fig.update_traces(); 'textposition' is set to 'outside' so that the labels sit outside the bars.
*   The function sets the title of the chart using the 'title' parameter of px.bar().
*   Finally, the function displays the chart using the 'show()' method of fig.

In [23]:
def stack_bar_with_count(data_frame: pd.DataFrame, values: str, names: str, color: str,
                         text: str, title: str) -> None:
    # 'values' is the x-axis column and 'names' the y-axis column;
    # 'text' is the column whose values label each bar segment.
    fig = px.bar(data_frame, x=values, y=names, color=color, height=700, text=text, title=title)
    # Draw the labels outside the bars.
    fig.update_traces(texttemplate='%{text:s}', textposition='outside')
    fig.show()

The below code performs two groupby operations on the DataFrame df and prints the second result.

*   In the first groupby operation, the code groups the DataFrame by two columns, 'properties.type' and 'tsunami_occurred', and counts the rows in each group by aggregating the 'properties.type' column with 'count'. It then calls 'reset_index(name="count")', which turns the group keys back into ordinary columns and names the column of counts 'count', and assigns the resulting DataFrame to the variable 'tsunami_result'. The resulting DataFrame contains three columns: 'properties.type', 'tsunami_occurred', and 'count', where the 'count' column contains the frequency of each combination of values in the 'properties.type' and 'tsunami_occurred' columns.

*   In the second groupby operation, the code groups the DataFrame by 'properties.type' and 'magnitude' in the same way and assigns the resulting DataFrame to the variable 'result'. The resulting DataFrame contains three columns: 'properties.type', 'magnitude', and 'count', where the 'count' column contains the frequency of each combination of values in the 'properties.type' and 'magnitude' columns.

Output Explanation: Quarry blasts are measured almost exclusively with the 'Local magnitude' method (59 of 62 events). 'Local magnitude' is also the most common method for earthquakes (8518 events), followed by 'Duration magnitude' (2791 events).

In [24]:
tsunami_result=df.groupby(['properties.type','tsunami_occurred'])['properties.type'].aggregate('count').reset_index(name="count")
result=df.groupby(['properties.type','magnitude'])['properties.type'].aggregate('count').reset_index(name="count")

print(result)


  properties.type            magnitude  count
0      earthquake  Body-wave magnitude    840
1      earthquake   Duration magnitude   2791
2      earthquake      Local magnitude   8518
3      earthquake     Moment magnitude      9
4       explosion   Duration magnitude      3
5       explosion      Local magnitude     62
6       ice quake      Local magnitude     32
7     other event      Local magnitude      8
8    quarry blast   Duration magnitude      3
9    quarry blast      Local magnitude     59
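
The long table above can be easier to read in wide form, with one row per event type and one column per magnitude method. The following sketch re-enters the printed rows by hand (as `result_copy`, a hypothetical name, so that the example runs on its own) and pivots them; the 'Total' column is an addition for illustration.

```python
import pandas as pd

# Rows copied from the `result` table printed above.
result_copy = pd.DataFrame({
    'properties.type': ['earthquake'] * 4 + ['explosion'] * 2
                       + ['ice quake', 'other event'] + ['quarry blast'] * 2,
    'magnitude': ['Body-wave magnitude', 'Duration magnitude', 'Local magnitude',
                  'Moment magnitude', 'Duration magnitude', 'Local magnitude',
                  'Local magnitude', 'Local magnitude',
                  'Duration magnitude', 'Local magnitude'],
    'count': [840, 2791, 8518, 9, 3, 62, 32, 8, 3, 59],
})

# One row per event type, one column per magnitude method;
# combinations that never occur become 0.
wide = result_copy.pivot(index='properties.type', columns='magnitude',
                         values='count').fillna(0).astype(int)

# Total number of events of each type across all methods.
wide['Total'] = wide.sum(axis=1)
print(wide)
```

The row totals give the number of events per type, which is also the height of each bar when 'count' is plotted against 'properties.type'.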


The below code creates a bar chart using the px.bar() function from the Plotly Express library. The function takes the DataFrame result as the first argument, and specifies the 'properties.type' column as the x-axis and the 'count' column as the y-axis. Because result has one row per combination of event type and magnitude method, each bar is a stack of those rows, and its total height is the number of events of that type.

Output Explanation: Of the events in result, 12,158 are earthquakes (8,518 of them measured with local magnitude) and 65 are explosions (62 of them measured with local magnitude).


In [25]:
import plotly.express as px

# Create bar chart
fig = px.bar(result, x='properties.type', y='count')

# Show chart
fig.show()

Output Explanation: The chart below shows that a tsunami was triggered in only 6 events, all of them earthquakes. No tsunami was triggered by any other event type, such as explosions, ice quakes or quarry blasts.

In [26]:
stack_bar_with_count(tsunami_result, 'properties.type', 'count', 'tsunami_occurred', 'count',
                         "Occurrence of tsunami due to earthquake")

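Output Explanation: Earthquakes are measured with all four magnitude methods. Explosions and quarry blasts are measured mostly with local magnitude, with a few cases (3 each) of duration magnitude, while ice quakes and other events use local magnitude only.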
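
Which event types rely on a single magnitude method can be checked from the (type, method) pairs listed in the result table printed earlier. A minimal sketch, with the pairs re-entered by hand and `pairs` and `methods_per_type` as hypothetical names:

```python
import pandas as pd

# (event type, magnitude method) pairs copied from the printed `result` table.
rows = [
    ('earthquake', 'Body-wave magnitude'), ('earthquake', 'Duration magnitude'),
    ('earthquake', 'Local magnitude'), ('earthquake', 'Moment magnitude'),
    ('explosion', 'Duration magnitude'), ('explosion', 'Local magnitude'),
    ('ice quake', 'Local magnitude'), ('other event', 'Local magnitude'),
    ('quarry blast', 'Duration magnitude'), ('quarry blast', 'Local magnitude'),
]
pairs = pd.DataFrame(rows, columns=['properties.type', 'magnitude'])

# Number of distinct magnitude methods recorded for each event type.
methods_per_type = pairs.groupby('properties.type')['magnitude'].nunique()
print(methods_per_type)

# Event types for which only one method appears.
single_method = methods_per_type[methods_per_type == 1].index.tolist()
print(single_method)
```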

In [27]:
stack_bar_with_count(result, 'properties.type', 'count', 'magnitude', 'count',
                         "Magnitude levels of earthquake")

### Challenges Faced

We tried converting the place names (cities and regions) to country and continent names, but this was difficult because the dataset has more than 10,000 records. Even after filtering the data down to February only, the conversion took a long time. In the end, the place-to-country conversion was carried out for the February data only.

When converting from country to continent, many of the country names contained special characters or were ambiguous, so it was difficult to determine the continent. That code is therefore commented out at the end of this notebook.
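
One way to cope with special characters in country names is to normalise them before the lookup. The sketch below is an assumption-laden illustration rather than the notebook's method: the `COUNTRY_TO_CONTINENT` and `ALIASES` tables are hypothetical, hand-written and cover only a few countries, and `get_continent` is a simplified stand-in for the commented-out function of the same name.

```python
import unicodedata

# Hypothetical hand-written lookup covering only a few countries.
COUNTRY_TO_CONTINENT = {
    'united states': 'North America',
    'canada': 'North America',
    'turkey': 'Asia',  # transcontinental; assigned to Asia here as a simplification
    'new zealand': 'Oceania',
    'tajikistan': 'Asia',
}

# Alternative spellings a geocoder may return, mapped to the keys used above.
ALIASES = {'turkiye': 'turkey'}

def normalize(name: str) -> str:
    """Lower-case a country name and strip accents (e.g. 'Türkiye' -> 'turkiye')."""
    decomposed = unicodedata.normalize('NFKD', name)
    without_accents = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.strip().lower()

def get_continent(country: str) -> str:
    """Return the continent for a country name, or 'Unknown' if it is not listed."""
    key = normalize(country)
    key = ALIASES.get(key, key)
    return COUNTRY_TO_CONTINENT.get(key, 'Unknown')

for country in ['United States', 'Türkiye', 'Atlantis']:
    print(country, '->', get_continent(country))
```

A plain dictionary keeps the lookup fast and offline; names missing from the table fall back to 'Unknown' instead of raising an error.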

In [28]:
# march_df = df[df['month'].dt.month == 3]
feb_month = df[df['month'] ==2]
feb_month

Unnamed: 0,id,properties.mag,properties.place,properties.time,properties.status,properties.sig,...,Depth,date,place_name,tsunami_occurred,magnitude,month
10429,av91091243,1.64,"92 km W of Adak, Alaska",1677628734320,reviewed,41,...,3.630000,2023-02-28,Alaska,No,Local magnitude,2
10430,av91777531,1.57,"92 km W of Adak, Alaska",1677628724110,reviewed,38,...,5.900000,2023-02-28,Alaska,No,Local magnitude,2
10431,tx2023edur,1.90,"37 km WSW of Mentone, Texas",1677628719055,reviewed,56,...,6.440063,2023-02-28,Texas,No,Local magnitude,2
10432,av91091248,0.92,"93 km W of Adak, Alaska",1677628717920,reviewed,13,...,6.890000,2023-02-28,Alaska,No,Local magnitude,2
10433,tx2023edup,2.90,"37 km WSW of Mentone, Texas",1677628656490,reviewed,129,...,7.249866,2023-02-28,Texas,No,Local magnitude,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,reviewed,68,...,1.600000,2023-02-23,Alaska Peninsula,No,Local magnitude,2
12459,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,reviewed,39,...,82.400000,2023-02-23,Alaska,No,Local magnitude,2
12460,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,reviewed,77,...,30.910000,2023-02-23,Hawaii,No,Duration magnitude,2
12461,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,reviewed,21,...,12.960000,2023-02-23,Washington,No,Local magnitude,2


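The given code defines a function called cityToCountry, which takes a place name as an argument and returns the name of the country that the place is located in. A set of region names is mapped to countries by hand; any other name is sent to the Nominatim geocoder from geopy, and the country is taken from the last component of the returned address. If the geocoder finds nothing, the function returns 'Unknown'.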

In [29]:
from geopy.geocoders import Nominatim

# Region names mapped to countries by hand.
# Keys are matched exactly (case-sensitive) after stripping whitespace;
# names without an entry here (for example 'CA') are sent to the geocoder.
REGION_TO_COUNTRY = {
    'Kuril Islands': 'Russia',
    'Tajikistan': 'Tajikistan',
    'MX': 'Mexico',
    'off the coast of Costa Rica': 'Costa Rica',
    'Central Turkey': 'Turkey',
    'Turkey': 'Turkey',
    'Norwegian Sea': 'Norway',
    'Fiji': 'Fiji',
    'south of the Fiji Islands': 'Fiji',
    'California-Nevada border region': 'United States',
    'Puerto Rico region': 'United States',
    'off the coast of Oregon': 'United States',
    'south of Alaska': 'United States',
    'Papua New Guinea': 'Papua New Guinea',
    'New Zealand': 'New Zealand',
    'Kermadec Islands': 'New Zealand',
    'Iran': 'Iran',
    'Iraq': 'Iraq',
    'Peru': 'Peru',
    'Eastern Xizang': 'China',
    'China': 'China',
    'Japan': 'Japan',
}

# Create the geocoder once instead of on every call.
geolocator = Nominatim(user_agent="my_app")

def cityToCountry(cityName):
    city_name = cityName.strip()
    if city_name in REGION_TO_COUNTRY:
        country_name = REGION_TO_COUNTRY[city_name]
    else:
        # Fall back to the Nominatim geocoding service (one network request per call).
        location = geolocator.geocode(city_name, exactly_one=True, timeout=10000)
        if location:
            # The country is the last comma-separated part of the display name.
            country_name = location.raw['display_name'].split(",")[-1].strip()
        else:
            country_name = 'Unknown'
    # print(city_name, "******", country_name)
    return country_name



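The next cell applies cityToCountry to each value of the place_name column and stores the result as a new column, country_name, in the feb_month DataFrame. The printed lines show each place name and the country it was resolved to. Note that 'CA' (California) is resolved to Canada and 'central Turkey' to 'Türkiye', because neither string exactly matches a hand-mapped entry and both are passed to the geocoder. The warning at the end of the output appears because feb_month is a filtered slice of df; creating it with .copy() would avoid the warning.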
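
Since the same place name occurs many times, one way to shorten the conversion is to resolve each distinct name once and then map the results back onto every row. The sketch below assumes this approach and uses `slow_lookup`, a hypothetical stand-in for cityToCountry that makes no network requests, so that the saving can be counted.

```python
import pandas as pd

calls = []  # records every lookup so the saving is visible

def slow_lookup(place: str) -> str:
    """Hypothetical stand-in for cityToCountry; a real call would hit the network."""
    calls.append(place)
    known = {'Alaska': 'United States', 'Texas': 'United States'}
    return known.get(place.strip(), 'Unknown')

# Toy place names with the leading space seen in the real column.
places = pd.Series([' Alaska', ' Texas', ' Alaska', ' Alaska', ' Texas'])

# Resolve each distinct place name once, then broadcast the result to all rows.
lookup = {place: slow_lookup(place) for place in places.unique()}
countries = places.map(lookup)

print(len(calls))          # number of lookups actually performed
print(countries.tolist())
```

With a real geocoder the number of requests then depends on the number of distinct place names rather than the number of rows. Public Nominatim servers also ask clients to limit their request rate, so some pause between requests may still be needed.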

In [30]:
feb_month['country_name'] = feb_month["place_name"].apply(lambda x: cityToCountry(x))

Alaska ****** United States
Alaska ****** United States
Texas ****** United States
Alaska ****** United States
Texas ****** United States
Alaska ****** United States
Oklahoma ****** United States
CA ****** Canada
Alaska ****** United States
Alaska ****** United States
Alaska Peninsula ****** United States
Alaska ****** United States
Alaska ****** United States
Alaska ****** United States
Alaska ****** United States
central Turkey ****** Türkiye
Texas ****** United States
Alaska ****** United States
Alaska ****** United States
Alaska ****** United States
Texas ****** United States
Texas ****** United States
Alaska ****** United States
CA ****** Canada
Alaska ****** United States
Alaska ****** United States
Alaska ****** United States
CA ****** Canada
Alaska ****** United States
Alaska ****** United States
Alaska ****** United States
CA ****** Canada
Alaska ****** United States
Alaska ****** United States
CA ****** Canada
Alaska ****** United States
Alaska ****** United States
Alaska ***



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
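
The warning above is emitted because the frame being modified was produced by filtering another DataFrame, so pandas cannot tell whether the assignment is meant to affect the original. A minimal sketch of the usual remedy, taking an explicit copy when filtering (`toy` and `feb` are hypothetical names and the data is made up):

```python
import pandas as pd

toy = pd.DataFrame({'month': [2, 3, 2], 'place_name': ['Alaska', 'CA', 'Texas']})

# .copy() makes the filtered frame independent of `toy`,
# so adding a column is unambiguous and does not touch the original.
feb = toy[toy['month'] == 2].copy()
feb['country_name'] = 'United States'

print(feb)
print('country_name' in toy.columns)
```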



In [31]:
feb_month

Unnamed: 0,id,properties.mag,properties.place,properties.time,properties.status,properties.sig,...,date,place_name,tsunami_occurred,magnitude,month,country_name
10429,av91091243,1.64,"92 km W of Adak, Alaska",1677628734320,reviewed,41,...,2023-02-28,Alaska,No,Local magnitude,2,United States
10430,av91777531,1.57,"92 km W of Adak, Alaska",1677628724110,reviewed,38,...,2023-02-28,Alaska,No,Local magnitude,2,United States
10431,tx2023edur,1.90,"37 km WSW of Mentone, Texas",1677628719055,reviewed,56,...,2023-02-28,Texas,No,Local magnitude,2,United States
10432,av91091248,0.92,"93 km W of Adak, Alaska",1677628717920,reviewed,13,...,2023-02-28,Alaska,No,Local magnitude,2,United States
10433,tx2023edup,2.90,"37 km WSW of Mentone, Texas",1677628656490,reviewed,129,...,2023-02-28,Texas,No,Local magnitude,2,United States
...,...,...,...,...,...,...,...,...,...,...,...,...,...
12458,ak0232hn4xtc,2.10,Alaska Peninsula,1677184513425,reviewed,68,...,2023-02-23,Alaska Peninsula,No,Local magnitude,2,United States
12459,ak0232hn4dmo,1.60,"39 km W of Salamatof, Alaska",1677184371205,reviewed,39,...,2023-02-23,Alaska,No,Local magnitude,2,United States
12460,hv73329557,2.24,"9 km ENE of Pāhala, Hawaii",1677184365830,reviewed,77,...,2023-02-23,Hawaii,No,Duration magnitude,2,United States
12461,uw61899432,1.16,"16 km S of Harrah, Washington",1677184317290,reviewed,21,...,2023-02-23,Washington,No,Local magnitude,2,United States


In [32]:
# !pip install pycountry
# !pip install pycountry-convert
# import pycountry_convert as pc



In [33]:
# import geopy
# import pycountry
# !pip install country_converter

# import country_converter as coco

# def get_continent(country):
#     # Get the country code for the given country
#     continent=''
#     country_code=''
#     # print(country)
#       country_code = pycountry.countries.get(name=country).alpha_2
#       continent = 'Unknown'
#     # Define a dictionary mapping country codes to continents
#       continent_dict = {
#         'AS': 'Asia',
#         'EU': 'Europe',
#         'AF': 'Africa',
#         'NA': 'North America',
#         'SA': 'South America',
#         'OC': 'Oceania',
#         'AN': 'Antarctica'
#       }
    
#     # Get the continent for the given country code
#       try:
#           continent_code = coco.convert(names=country_code, to='Continent_Code')
#           continent = continent_dict[continent_code[0]]
#       except KeyError:
#           continent = 'Unknown'
#     print(country,'country',continent)
    
#     return continent



In [34]:
# feb_month['Continent'] = feb_month["country_name"].apply(lambda x: get_continent(x))

United States country 
United States country 
United States country 
United States country 
United States country 
United States country 
United States country 
Canada country 
United States country 
United States country 
United States country 
United States country 
United States country 
United States country 
United States country 
Türkiye country Unknown
United States country 
United States country 
United States country 
United States country 
United States country 
United States country 
United States country 
Canada country 
United States country 
United States country 
United States country 
Canada country 
United States country 
United States country 
United States country 
Canada country 
United States country 
United States country 
Canada country 
United States country 
United States country 
United States country 
United States country 
New Zealand country 
United States country 
United States country 
United States country 
United States country 
United States country 
U






### Conclusion:

The analysis of earthquake data obtained from the USGS provides insight into the frequency, magnitude, and distribution of seismic events. The dataset contains more than 12,000 events recorded over roughly one month, from late February to late March 2023, the majority of them of low magnitude. The largest event in this period had a magnitude of 5.90 and occurred near Tajikistan.

As described under Challenges Faced, converting place names to countries was slow for more than 10,000 records and was therefore completed for the February data only. The country-to-continent conversion was not completed, because many country names contained special characters or were ambiguous.

The analysis also indicates that seismic events cluster in specific regions: most of the recorded events lie in North America and around the Pacific Ocean, which is consistent with the Pacific Ring of Fire, a highly seismically active area. The relationship between the magnitude of events and how often they occur was not quantified here, although low-magnitude events dominate the dataset.

Data visualizations, such as maps and plots, are used to better illustrate the distribution and frequency of earthquakes. These visualizations are useful in conveying the information to a wider audience and can also aid in identifying patterns and trends that may not be immediately apparent from the raw data.

Overall, the analysis of earthquake data can be used to better understand the patterns and trends of earthquakes, and to develop strategies for predicting and mitigating their impact. Further analysis of this data could involve exploring the relationship between earthquakes and other natural phenomena, such as volcanic activity or changes in ocean currents, to gain a more comprehensive understanding of seismic events.