## 1. Read the cleaned data from `quakes-cleaned.csv` into a pandas dataframe.

In [1]:
# Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

# Set pandas option to view the complete dataframe
pd.set_option('display.max_columns', 23)

In [2]:
dataset_location = './Earthquakes'
file_name = 'quakes-cleaned.csv'

In [3]:
df = pd.read_csv('quakes-cleaned.csv')

In [4]:
df.head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2023-11-15 09:05:07.304000+00:00,61.5808,-149.847,32.8,1.7,ml,31.315331,110.554586,1.0853,0.22,ak,2023-11-15 09:06:28.283000+00:00,"5 km SSW of Houston, Alaska",earthquake,2.794488,0.2,0.239882,21.952086,automatic,ak,ak
1,2023-11-15 08:53:06.688000+00:00,61.0794,-147.883,14.8,1.0,ml,31.315331,110.554586,1.0853,0.8,ak,2023-11-15 08:54:38.102000+00:00,"55 km NE of Whittier, Alaska",earthquake,2.794488,0.3,0.239882,21.952086,automatic,ak,ak
2,2023-11-15 08:41:52.480000+00:00,19.380667,-155.285339,0.32,1.73,md,15.0,153.0,1.0853,0.2,hv,2023-11-15 08:56:22.252000+00:00,"8 km SW of Volcano, Hawaii",earthquake,0.33,0.38,0.59,15.0,automatic,hv,hv
3,2023-11-15 07:44:53.035000+00:00,61.6382,-149.7828,32.9,1.9,ml,31.315331,110.554586,1.0853,0.31,ak,2023-11-15 07:46:10.981000+00:00,Southern Alaska,earthquake,2.794488,0.2,0.239882,21.952086,automatic,ak,ak
4,2023-11-15 07:19:44.540000+00:00,18.972166,-155.45166,34.759998,1.87,md,37.0,236.0,1.0853,0.12,hv,2023-11-15 07:22:58.830000+00:00,"17 km SE of Naalehu, Hawaii",earthquake,0.71,0.89,0.88,5.0,automatic,hv,hv
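
The `Unnamed: 0` column in the output above is most likely the index that was written out when the cleaned file was saved. A minimal sketch of reading such a file without that extra column, and with the `time` column parsed as datetimes, is shown below. The two-row `sample_csv` is a hypothetical stand-in for `quakes-cleaned.csv`, not the real file.

```python
import io

import pandas as pd

# Hypothetical two-row sample that mimics the layout of quakes-cleaned.csv
sample_csv = io.StringIO(
    ",time,latitude,longitude,depth,mag\n"
    "0,2023-11-15 09:05:07.304000+00:00,61.5808,-149.847,32.8,1.7\n"
    "1,2023-11-15 08:53:06.688000+00:00,61.0794,-147.883,14.8,1.0\n"
)

# index_col=0 reuses the saved index instead of creating "Unnamed: 0";
# parse_dates converts the time column from strings to datetimes
sample_df = pd.read_csv(sample_csv, index_col=0, parse_dates=["time"])
print(sample_df.dtypes)
```

The same two arguments could be passed to the `pd.read_csv` call above if the extra column is not wanted.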


## 2. Use pandas to identify five interesting insights from the data.

### Insight 1: Most Earthquakes by `locationSource`

```python
# Insight 1: Number of earthquakes by locationSource
count_by_location_source = df.groupby('locationSource')['mag'].count().reset_index()
print("Insight 1: Number of earthquakes by locationSource")
print(count_by_location_source)
```

**Description:**
This table lists the number of earthquakes recorded for each `locationSource`, one row per source. The data shows that the `ak` source has the highest number of earthquakes of all the sources.
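
To make the largest source easier to spot, the counts can be sorted and shown as a share of the total. The sketch below uses a small made-up `sample` frame rather than the real data. Note that `value_counts` counts non-null `locationSource` values, whereas the `groupby` above counts non-null `mag` values, so the two can differ if magnitudes are missing.

```python
import pandas as pd

# Made-up sample; the real notebook would use df instead
sample = pd.DataFrame({
    "locationSource": ["ak", "ak", "hv", "ak", "hv", "ci"],
    "mag": [1.7, 1.0, 1.73, 1.9, 1.87, 2.1],
})

# value_counts returns the counts sorted from most to least frequent
counts = sample["locationSource"].value_counts()

# Add each source's share of all earthquakes next to its count
summary = pd.DataFrame({
    "count": counts,
    "share": (counts / counts.sum()).round(3),
})
print(summary)
```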

### Insight 2: Top N locationSource with Highest Average Magnitude and Count of Earthquakes
```python
# Insight 2: Top N locationSource with the highest average earthquake magnitude and count of earthquakes
top_n = 10  # Change this to your desired top N value

# Calculate the mean magnitude and count of earthquakes by locationSource
region_stats = df.groupby('locationSource')['mag'].agg(['mean', 'count']).sort_values(by='mean', ascending=False)

# Select the top N locationSource values
top_regions = region_stats.head(top_n)

# Display the result
print(f"Insight 2: Top {top_n} locationSource with the highest average earthquake magnitude and count of earthquakes")
print(top_regions)
```

**Description:**
This insight ranks the `locationSource` values by mean earthquake magnitude and shows the top N, together with the number of earthquakes behind each mean. The count matters when reading the ranking: a source with only a few events can have a high mean without being representative.
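
One way to keep sources with very few events from dominating the ranking is to require a minimum count before sorting. This is a sketch on made-up data. The `min_events` threshold is a hypothetical parameter, not something the original analysis uses.

```python
import pandas as pd

# Made-up sample; "us" has a single strong event
sample = pd.DataFrame({
    "locationSource": ["ak", "ak", "ak", "hv", "hv", "us"],
    "mag": [1.7, 1.0, 1.9, 1.73, 1.87, 5.2],
})

min_events = 2  # hypothetical threshold, adjust to the dataset size

# Mean magnitude and number of earthquakes per source
stats = sample.groupby("locationSource")["mag"].agg(["mean", "count"])

# Keep only sources with enough events, then rank by mean magnitude
reliable = stats[stats["count"] >= min_events].sort_values("mean", ascending=False)
print(reliable)
```

Here `us` drops out of the ranking because its mean rests on a single earthquake.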

### Insight 3: Latitude and Longitude with the Highest Frequency of Earthquakes
```python
# Insight 3: Latitude and longitude with the highest frequency of earthquakes
top_coordinates = df.groupby(['latitude', 'longitude']).size().idxmax()
latitude, longitude = top_coordinates

# Find the number of earthquakes for the top coordinates
num_quakes_for_top_coordinates = df[(df['latitude'] == latitude) & (df['longitude'] == longitude)].shape[0]

print("Insight 3: Latitude and longitude with the highest frequency of earthquakes")
print(f"Latitude: {latitude}, Longitude: {longitude}")
print(f"Number of earthquakes for these coordinates: {num_quakes_for_top_coordinates}")
```

**Description:**
This insight finds the exact latitude and longitude pair that occurs most often in the data and reports how many earthquakes were recorded there. Because the match is on exact coordinate values, it highlights a single repeated point rather than a wider region.
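
Exact coordinate matches can be rare when positions are recorded to several decimal places. A possible variation, sketched here on made-up coordinates, rounds both columns first so that nearby events fall into the same grid cell. The `decimals` value is an assumption chosen for illustration.

```python
import pandas as pd

# Made-up coordinates; the first two points are close to each other
sample = pd.DataFrame({
    "latitude": [61.5808, 61.6382, 61.0794, 19.380667, 18.972166],
    "longitude": [-149.847, -149.7828, -147.883, -155.285339, -155.45166],
})

decimals = 1  # hypothetical grid size: cells of 0.1 degree

# Round the coordinates so nearby events share the same cell
cells = sample[["latitude", "longitude"]].round(decimals)

# Count events per cell and pick the busiest one
cell_counts = cells.groupby(["latitude", "longitude"]).size()
top_cell = cell_counts.idxmax()
print(f"Busiest cell: {top_cell}, earthquakes: {cell_counts.max()}")
```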

### Insight 4: Distribution of Earthquake Depths
```python
# Insight 4: Distribution of earthquake depths
num_bins = 30
depth_bins = pd.cut(df['depth'], bins=num_bins)

# Create a DataFrame to store the count for each bin
depth_distribution_df = pd.DataFrame({'Depth Bin': depth_bins, 'Count': 1})
depth_distribution_df = depth_distribution_df.groupby('Depth Bin').count().reset_index()

# Display the depth distribution DataFrame
print(depth_distribution_df)
```
**Description:**
This insight shows the distribution of earthquake depths. The depth range is split into 30 equal-width bins, and the table gives the number of earthquakes in each bin, which shows the depths at which most earthquakes in the dataset occur.
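
Equal-width bins can leave most earthquakes in the first few rows when depths are skewed toward shallow values. As an alternative sketch, `pd.cut` also accepts explicit bin edges. The `edges` list and the depth values below are made up for illustration.

```python
import pandas as pd

# Made-up depth values
depths = pd.Series([0.32, 14.8, 32.8, 32.9, 34.76, 120.0, 410.0], name="depth")

# Hypothetical bin edges; right=False makes each interval closed on the left
edges = [0, 10, 35, 70, 300, 700]
depth_bins = pd.cut(depths, bins=edges, right=False)

# Count the values per bin and keep the bins in depth order
depth_counts = depth_bins.value_counts().sort_index()
print(depth_counts)
```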

### Insight 5: Distribution of Earthquake Types
```python
# Insight 5: Distribution of Earthquake Types
earthquake_types_distribution = df['type'].value_counts().reset_index()
print(earthquake_types_distribution)
```
**Description:**
This table shows how many events of each `type` the dataset contains.
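
Raw counts can be hard to compare when one type dominates, so it may help to show percentages alongside them. The sketch below uses a made-up `types` series. The type labels are illustrative and may not match those in the dataset.

```python
import pandas as pd

# Made-up event types for illustration
types = pd.Series(
    ["earthquake", "earthquake", "earthquake", "quarry blast", "explosion", "earthquake"],
    name="type",
)

# normalize=True returns fractions instead of counts
type_summary = pd.DataFrame({
    "count": types.value_counts(),
    "percent": (types.value_counts(normalize=True) * 100).round(1),
})
print(type_summary)
```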
