<a href="https://colab.research.google.com/github/muhdadbachmid/Descriptive-Analysis-of-Earthquake-Indonesia/blob/main/Descriptive_Analysis_of_Earthquake_Indonesia.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import os
import sys
from tempfile import NamedTemporaryFile
from urllib.request import urlopen
from urllib.parse import unquote, urlparse
from urllib.error import HTTPError
from zipfile import ZipFile
import tarfile
import shutil

CHUNK_SIZE = 40960
DATA_SOURCE_MAPPING = 'earthquakes-in-indonesia:https%3A%2F%2Fstorage.googleapis.com%2Fkaggle-data-sets%2F1934565%2F8068603%2Fbundle%2Farchive.zip%3FX-Goog-Algorithm%3DGOOG4-RSA-SHA256%26X-Goog-Credential%3Dgcp-kaggle-com%2540kaggle-161607.iam.gserviceaccount.com%252F20240510%252Fauto%252Fstorage%252Fgoog4_request%26X-Goog-Date%3D20240510T052433Z%26X-Goog-Expires%3D259200%26X-Goog-SignedHeaders%3Dhost%26X-Goog-Signature%3D949e5df7f67ba4ff0eb74d1968390097b6e5c0cfcc5ced65c074695bf5a3965fcaed358acb2c2c9481406d892e6097ed43c2697b95ac07a5d899f86e4178bab9ca068f73724763896a35b98e4c871ba936c8c52896baeacd0339ae5d25970f36b2cc4a7d2f3e5d958c0db085957cd7e7dc5c308cb8d4e08bd60649a8b89e10f1bdaa87dff76611ef322b444728734014777689b646e3685c0eb1093075fa2b56404b7de6075d00051788d087dd6ae5b1351fabfcbd5b486eafa2a8e4c561b4f16908ce040f59e30465997c5e63b80fd89caa5daadb959ffe3f4fc7a97dbe4616270ba0265e53998d13356646afaec79fdc10a27842ea2067361901dc932fe577'

KAGGLE_INPUT_PATH='/kaggle/input'
KAGGLE_WORKING_PATH='/kaggle/working'
KAGGLE_SYMLINK='kaggle'

!umount /kaggle/input/ 2> /dev/null
shutil.rmtree('/kaggle/input', ignore_errors=True)
os.makedirs(KAGGLE_INPUT_PATH, 0o777, exist_ok=True)
os.makedirs(KAGGLE_WORKING_PATH, 0o777, exist_ok=True)

try:
  os.symlink(KAGGLE_INPUT_PATH, os.path.join("..", 'input'), target_is_directory=True)
except FileExistsError:
  pass
try:
  os.symlink(KAGGLE_WORKING_PATH, os.path.join("..", 'working'), target_is_directory=True)
except FileExistsError:
  pass

for data_source_mapping in DATA_SOURCE_MAPPING.split(','):
    directory, download_url_encoded = data_source_mapping.split(':')
    download_url = unquote(download_url_encoded)
    filename = urlparse(download_url).path
    destination_path = os.path.join(KAGGLE_INPUT_PATH, directory)
    try:
        with urlopen(download_url) as fileres, NamedTemporaryFile() as tfile:
            total_length = fileres.headers['content-length']
            print(f'Downloading {directory}, {total_length} bytes compressed')
            dl = 0
            data = fileres.read(CHUNK_SIZE)
            while len(data) > 0:
                dl += len(data)
                tfile.write(data)
                done = int(50 * dl / int(total_length))
                sys.stdout.write(f"\r[{'=' * done}{' ' * (50-done)}] {dl} bytes downloaded")
                sys.stdout.flush()
                data = fileres.read(CHUNK_SIZE)
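            # Flush buffered bytes so the archive on disk is complete before extraction.
            tfile.flush()
            if filename.endswith('.zip'):
              with ZipFile(tfile) as zfile:
                zfile.extractall(destination_path)
            else:
              # Bind to `tar`, not `tarfile`, so the tarfile module is not shadowed.
              with tarfile.open(tfile.name) as tar:
                tar.extractall(destination_path)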
            print(f'\nDownloaded and uncompressed: {directory}')
    except HTTPError as e:
        print(f'Failed to load (likely expired) {download_url} to path {destination_path}')
        continue
    except OSError as e:
        print(f'Failed to load {download_url} to path {destination_path}')
        continue

print('Data source import complete.')


Downloading earthquakes-in-indonesia, 7509531 bytes compressed
Downloaded and uncompressed: earthquakes-in-indonesia
Data source import complete.


# Descriptive Analysis of Earthquake Data in Indonesia

*Introduction*

In this notebook I analyze earthquake data from Indonesia. The dataset records, for each event, its depth, magnitude, latitude, and longitude. My aim is a descriptive analysis of how these attributes are distributed.



Step 1: Importing Necessary Libraries:

To begin, I import pandas for data manipulation and numpy for numerical operations.

In [3]:
import pandas as pd
import numpy as np

Step 2: Loading the Data:

I load the earthquake catalogue from the provided CSV file with the read_csv() function from pandas.

In [4]:
data = pd.read_csv('/kaggle/input/earthquakes-in-indonesia/katalog_gempa.csv')
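
Before cleaning, it can help to check the shape, column types, and missing-value counts of what was loaded. The sketch below uses a small in-memory CSV as a stand-in for katalog_gempa.csv; the rows are made up, and only the four column names used later in this notebook are assumed.

```python
import io
import pandas as pd

# Made-up stand-in for katalog_gempa.csv; the real file may hold more columns.
csv_text = """lat,lon,depth,mag
-8.29,128.38,10,3.3
-2.87,120.46,12,3.4
-3.10,119.00,,4.0
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)         # (rows, columns)
print(sample.dtypes)        # depth is read as float because of the gap
print(sample.isna().sum())  # missing values per column
```

The same three calls work unchanged on the `data` frame loaded above.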


Step 3: Data Cleansing:

First, I fill missing values with the forward-fill method, which copies the last valid value in each column down into the gaps that follow it. I then drop duplicate rows so that no event is counted twice.

In [5]:
data = data.ffill()
data = data.drop_duplicates()
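
To make the effect of these two calls concrete, here is a minimal sketch on a toy frame. The values are invented for illustration; only the column names match the notebook.

```python
import numpy as np
import pandas as pd

# Toy catalogue with made-up values.
toy = pd.DataFrame({
    "depth": [10.0, np.nan, 35.0, 35.0],
    "mag":   [3.2, 4.1, np.nan, np.nan],
})

# Count the gaps in each column before filling.
missing_before = toy.isna().sum()

# Forward fill copies the last valid value down each column.
filled = toy.ffill()

# The last two rows hold identical values, so one of them is dropped.
deduped = filled.drop_duplicates()

print(missing_before)
print(deduped)
```

Note that a forward-filled value is borrowed from the row above, which may belong to an unrelated event. Counting the gaps first shows how much of the data is affected.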

Step 4: Outlier Detection and Removal:

Next, I detect and remove outliers in depth, magnitude, latitude, and longitude with the Interquartile Range (IQR) method. For each column, rows whose value lies below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are dropped. The columns are filtered one after another, so each column's quartiles are computed on the rows that survived the earlier filters.



In [6]:
for column in ['depth', 'mag', 'lat', 'lon']:
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    data = data[~((data[column] < (Q1 - 1.5 * IQR)) | (data[column] > (Q3 + 1.5 * IQR)))]
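
The fence arithmetic is easier to follow on a handful of numbers. The sketch below applies the same 1.5 * IQR rule to a made-up depth series; `iqr_bounds` is a hypothetical helper name introduced here, not part of pandas.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up depths in km, with one extreme value.
depths = pd.Series([8, 10, 10, 12, 15, 18, 20, 600], name="depth")

lower, upper = iqr_bounds(depths)

# between() is inclusive, so values exactly on a fence are kept.
kept = depths[depths.between(lower, upper)]

print(lower, upper)
print(kept.tolist())
```

Here Q1 is 10 and Q3 is 18.5, so the fences are -2.75 and 31.25 and only the 600 km value is removed.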

Step 5: Calculating Descriptive Statistics:

I then compute the mean, median, mode, and standard deviation of depth, magnitude, latitude, and longitude. Together these describe the central tendency and spread of each attribute.

In [7]:
#depth

mean_depth = data['depth'].mean()
median_depth = data['depth'].median()
mode_depth = data['depth'].mode()[0]
std_dev_depth = data['depth'].std()

#mag

mean_mag = data['mag'].mean()
median_mag = data['mag'].median()
mode_mag = data['mag'].mode()[0]
std_dev_mag = data['mag'].std()

#lat

mean_lat = data['lat'].mean()
median_lat = data['lat'].median()
mode_lat = data['lat'].mode()[0]
std_dev_lat = data['lat'].std()

#lon

mean_lon = data['lon'].mean()
median_lon = data['lon'].median()
mode_lon = data['lon'].mode()[0]
std_dev_lon = data['lon'].std()
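
The sixteen assignments above could also be collected into one table. This is an alternative sketch on made-up values, not the code that produced the results in this notebook.

```python
import pandas as pd

# Toy frame with made-up values.
toy = pd.DataFrame({
    "depth": [10, 10, 12, 33, 60],
    "mag":   [3.3, 3.3, 3.4, 3.9, 4.6],
})

# One row per column, one column per statistic.
summary = toy.agg(["mean", "median", "std"]).T

# mode() can return several rows on ties; take the first, as mode()[0] does above.
summary["mode"] = toy.mode().iloc[0]

print(summary.round(2))
```

Passing `data[['depth', 'mag', 'lat', 'lon']]` in place of `toy` would give the same four statistics for the real columns.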

Step 6: Printing Results:

Finally, I print the computed statistics so they can be read and compared side by side.

In [8]:
print("Depth - Mean: ", mean_depth, " Median: ", median_depth, " Mode: ", mode_depth, " Std Dev: ", std_dev_depth)
print("Magnitude - Mean: ", mean_mag, " Median: ", median_mag, " Mode: ", mode_mag, " Std Dev: ", std_dev_mag)
print("Latitude - Mean: ", mean_lat, " Median: ", median_lat, " Mode: ", mode_lat, " Std Dev: ", std_dev_lat)
print("Longitude - Mean: ", mean_lon, " Median: ", median_lon, " Mode: ", mode_lon, " Std Dev: ", std_dev_lon)

Depth - Mean:  25.298617770787782  Median:  12.0  Mode:  10  Std Dev:  25.077021746480174
Magnitude - Mean:  3.481641029575358  Median:  3.4  Mode:  3.3  Std Dev:  0.7697849433720615
Latitude - Mean:  -3.4370139501579136  Median:  -2.87  Mode:  -8.29  Std Dev:  4.328459166930589
Longitude - Mean:  118.74502666001766  Median:  120.46  Mode:  128.38  Std Dev:  10.992302426928497


*Result*

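Depth: The mean depth of the earthquakes is approximately 25.30 km, with a median of 12 km. The most frequently occurring depth (mode) is 10 km. The standard deviation of 25.08 km is about as large as the mean, and the mean is more than twice the median, so the depth distribution is right-skewed: most events are shallow, with a tail of deeper ones pulling the mean upward.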

Magnitude: The mean magnitude of the earthquakes is approximately 3.48, with a median of 3.4. The most frequently occurring magnitude (mode) is 3.3. The standard deviation of 0.77 indicates a relatively narrow spread of magnitudes around the mean, and the closeness of mean, median, and mode suggests a roughly symmetric distribution.

Latitude: The mean latitude of the earthquake epicenters is approximately -3.44 degrees, with a median of -2.87 degrees. The most frequently occurring latitude (mode) is -8.29 degrees. The standard deviation of 4.33 degrees measures the north-south spread of epicenters around the mean latitude.

Longitude: The mean longitude of the earthquake epicenters is approximately 118.75 degrees, with a median of 120.46 degrees. The most frequently occurring longitude (mode) is 128.38 degrees. The standard deviation of 10.99 degrees measures the east-west spread of epicenters, which is more than twice the spread in latitude.
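
The depth figures above, a mean of 25.30 km against a median of 12 km, are typical of a right-skewed distribution. The sketch below shows the same pattern on made-up depths and quantifies it with pandas' sample skewness; the numbers are illustrative and not taken from the catalogue.

```python
import pandas as pd

# Made-up depths in km: mostly shallow, with a few deep events.
depths = pd.Series([5, 10, 10, 10, 12, 15, 20, 40, 106])

mean = depths.mean()
median = depths.median()
skew = depths.skew()  # positive values indicate a longer right tail

print(f"mean={mean:.2f} median={median:.2f} skew={skew:.2f}")
```

Running `data['depth'].skew()` on the cleaned catalogue would show whether the real depths follow the same pattern.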