##### All imports go here

In [1]:
import calendar
import datetime
from datetime import datetime, date, time, timezone
from geopy.geocoders import Nominatim
import numpy as np
import plotly.express as px
import os
import pandas as pd
import urllib.request
import requests
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.cm as cm
from math import radians, cos, sin, asin, sqrt
import seaborn as sns
from scipy import stats # For in-built method to get PCC
from scipy.stats import pearsonr
from urllib.parse import quote
from matplotlib import rc
import warnings
warnings.filterwarnings('ignore')
from matplotlib import pyplot, dates
plt.rcParams.update(plt.rcParamsDefault)
plt.rcParams.update({'figure.figsize': (15, 5)})
%matplotlib inline

from bokeh.plotting import figure, show
from bokeh.transform import factor_cmap
from bokeh.models import (BasicTicker, ColorBar, ColumnDataSource,FactorRange,
                          LinearColorMapper, PrintfTickFormatter,HoverTool,CategoricalColorMapper)
from bokeh.transform import transform
from bokeh.models import Legend
from bokeh.palettes import Spectral4 
from bokeh.palettes import Category20
from bokeh.palettes import Magma, Inferno, Plasma, Viridis, Cividis
from bokeh.io import output_notebook, push_notebook, show, output_file
from bokeh.layouts import row,column,gridplot
from bokeh.models.widgets import Tabs,Panel

In [2]:
rc('font',**{'family':'sans-serif','sans-serif':['Helvetica']})
# rc('font',**{'family':'serif','serif':['Times']})
rc('text', usetex=True)

SMALL_SIZE = 12
MEDIUM_SIZE = 14
BIGGER_SIZE = 16

plt.rc('font', size=SMALL_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title

**An explainer Jupyter Notebook.** The explainer notebook contains all the behind-the-scenes data analysis: details on the dataset, why we selected these particular visualizations, explanations of the methodology, etc. The purpose of the explainer notebook is to provide additional details for interested/scientific readers.
- The notebook contains our analysis and code, and it is structured into the following sections:

1. **Motivation**

- What is your dataset?
    - The Significant Earthquake Dataset for the period 1900–2023 is a thorough database of information on significant earthquakes that have happened all over the world in the past 123 years. The National Earthquake Information Center (NEIC)[[1]](https://www.usgs.gov/programs/earthquake-hazards/national-earthquake-information-center-neic) of the U.S. Geological Survey (USGS)[[2]](https://www.usgs.gov/) is responsible for compiling and maintaining this dataset. To give the most precise and current information on earthquake events, it is frequently updated.
    - Each entry in the dataset contains information about one of the more than 37,000 earthquakes that took place during this time period, including the earthquake's date, time, location, magnitude, and depth.
    - Seismologists, geologists, and other researchers who study earthquakes, as well as emergency management officials and other experts who work in disaster response and preparation, can all benefit from the dataset. The general public can also use it to learn more about the history of earthquakes and their effects on human societies.
    - The dataset was downloaded from Kaggle [[3]](https://www.kaggle.com/datasets/jahaidulislam/significant-earthquake-dataset-1900-2023?resource=download).

- Why did you choose this/these particular dataset(s)?
    - Since the last ice age, there have been numerous large earthquakes, but their effects have not always been observed or felt globally.
    - On February 6, 2023, two significant earthquakes struck Turkey close to the Syrian border. Both were stronger than magnitude 7, and 59,259 people were confirmed dead: 50,783 in Turkey and 8,476 in Syria. According to reports, it is the deadliest earthquake since the 2010 Haiti earthquake [[4]](https://en.wikipedia.org/wiki/2010_Haiti_earthquake) and the fifth-deadliest of the 21st century [[5]](https://en.wikipedia.org/wiki/Lists_of_21st-century_earthquakes#List_of_deadliest_earthquakes). It is also the fourth-costliest earthquake on record [[6]](https://en.wikipedia.org/wiki/List_of_costliest_earthquakes) [[7]](https://www.caabu.org/news/article/political-failure-has-killed-people-syria-and-after-earthquakes), with damages estimated at USD 104 billion in Turkey and USD 14.8 billion in Syria [[8]](https://www.barrons.com/news/donor-conference-seeks-to-rally-quake-aid-for-turkey-syria-bce11409).
    - Natural disasters have always had a big impact on human lives, frequently leaving a trail of death and destruction in their wake. Despite this, many areas of the world continue to lack the resources needed to implement necessary measures, particularly when it comes to earthquakes. This is why our project focuses on earthquake data analysis. We intend to analyze the data to determine how earthquakes have affected the social, political, and economic facets of human life. This information might help international authorities take the necessary steps for the affected communities. Our ultimate objective is to develop a deeper understanding of the social aspects of earthquake data in order to highlight vulnerable communities and communicate workable solutions for these communities to the rest of the world.

- What was your goal for the end user's experience?
    - Our project focuses on the effects of earthquakes on human life in various parts of the world. It examines the data through the lenses of reporting inequalities and discrepancies; social, political, geographic, and financial factors; GDP growth in various regions; and geospatial data. Using an interactive map and graphs, we hope to investigate the effects of earthquakes in various parts of the world. To give a more complete picture of the data, these could include features such as population density and comparisons on a day-by-day, week-by-week, and month-by-month basis.

2. **Basic stats**. Let's understand the dataset better

In [3]:
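# Function to download a file from GitHub (skipped if the file already exists locally)
def file_exist(file_name):
    return file_name in os.listdir(os.getcwd())
def download_file(url,file_name):
    if file_exist(file_name):
        return
    response = requests.get(url)
    response.raise_for_status()  # Fail loudly instead of saving an error page as the CSV
    # Use a context manager so the file handle is closed after writing
    with open(file_name, "wb") as f:
        f.write(response.content)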

In [4]:
# Load the dataset from git
url = "https://raw.githubusercontent.com/sifat-e-noor/Social_visualization/main/Projects/Significant%20Earthquake%20Dataset%201900-2023.csv"
download_file(url,"SignificantEarthquakeDataset1900-2023.csv")

# Create dataframe
df_globalEarthquake = pd.read_csv("SignificantEarthquakeDataset1900-2023.csv")
df_globalEarthquake.head(5)

Unnamed: 0,Time,Place,Latitude,Longitude,Depth,Mag,MagType,nst,gap,dmin,...,Updated,Unnamed: 14,Type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2023-02-17T09:37:34.868Z,"130 km SW of Tual, Indonesia",-6.5986,132.0763,38.615,6.1,mww,119.0,51.0,2.988,...,2023-02-17T17:58:24.040Z,,earthquake,6.41,5.595,0.065,23.0,reviewed,us,us
1,2023-02-16T05:37:05.138Z,"7 km SW of Port-Olry, Vanuatu",-15.0912,167.0294,36.029,5.6,mww,81.0,26.0,0.392,...,2023-02-17T05:41:32.448Z,,earthquake,5.99,6.08,0.073,18.0,reviewed,us,us
2,2023-02-15T18:10:10.060Z,"Masbate region, Philippines",12.3238,123.8662,20.088,6.1,mww,148.0,47.0,5.487,...,2023-02-16T20:12:32.595Z,,earthquake,8.61,4.399,0.037,71.0,reviewed,us,us
3,2023-02-15T06:38:09.034Z,"54 km WNW of Otaki, New Zealand",-40.5465,174.5709,74.32,5.7,mww,81.0,40.0,0.768,...,2023-02-16T06:42:09.738Z,,earthquake,3.68,4.922,0.065,23.0,reviewed,us,us
4,2023-02-14T13:16:51.072Z,"2 km NW of Lele?ti, Romania",45.1126,23.1781,10.0,5.6,mww,132.0,28.0,1.197,...,2023-02-17T09:15:18.586Z,,earthquake,4.85,1.794,0.032,95.0,reviewed,us,us


In [5]:
df_globalEarthquake.describe()

Unnamed: 0,Latitude,Longitude,Depth,Mag,nst,gap,dmin,rms,Unnamed: 14,horizontalError,depthError,magError,magNst
count,37331.0,37331.0,37197.0,37331.0,7473.0,10087.0,4395.0,20218.0,0.0,3970.0,20827.0,16551.0,5372.0
mean,5.457651,38.877695,58.583346,5.948616,265.481065,45.014891,4.315178,1.000779,,7.324982,10.679092,0.261882,46.97487
std,30.789822,123.090934,109.5634,0.45516,161.982149,34.311032,5.480411,0.356822,,5.400729,10.66051,0.169566,60.441745
min,-77.08,-179.997,-4.0,5.5,0.0,8.0,0.004505,0.005,,0.085,0.0,0.0,0.0
25%,-16.5198,-75.807,15.0,5.6,134.0,24.1,1.155,0.89,,5.7,3.6,0.2,17.0
50%,1.153,98.577,28.5,5.8,241.0,36.0,2.509,1.0,,7.1,6.1,0.2,31.0
75%,33.786,143.34785,41.0,6.14,372.0,54.8,5.1275,1.11,,8.5,16.2,0.33,55.0
max,87.199,180.0,700.0,9.5,934.0,360.0,39.73,42.41,,99.0,569.2,1.84,941.0


- Write about your choices in data cleaning and preprocessing

  - **Data cleaning and preprocessing:**
    - Checking for missing values in the dataset is the first step in data cleaning and preprocessing.

In [6]:
# Find the dataset's missing values per column
print('Following Columns have missing values:')
df_globalEarthquake.isnull().sum()

Following Columns have missing values:


Time                   0
Place                284
Latitude               0
Longitude              0
Depth                134
Mag                    0
MagType                0
nst                29858
gap                27244
dmin               32936
rms                17113
net                    0
ID                     0
Updated                0
Unnamed: 14        37331
Type                   0
horizontalError    33361
depthError         16504
magError           20780
magNst             31959
status                 0
locationSource         0
magSource              0
dtype: int64

In [7]:
# Clone dataframe for dataset cleaning and preprocessing
df_clean = df_globalEarthquake.copy()

# Check the percentage of missing values in each column
df_clean.isnull().sum()/df_globalEarthquake.shape[0]*100

Time                 0.000000
Place                0.760762
Latitude             0.000000
Longitude            0.000000
Depth                0.358951
Mag                  0.000000
MagType              0.000000
nst                 79.981785
gap                 72.979561
dmin                88.226943
rms                 45.841258
net                  0.000000
ID                   0.000000
Updated              0.000000
Unnamed: 14        100.000000
Type                 0.000000
horizontalError     89.365407
depthError          44.209906
magError            55.664193
magNst              85.609815
status               0.000000
locationSource       0.000000
magSource            0.000000
dtype: float64

In [8]:
# Drop every column in which more than 40% of the values are missing
Entire_df_clean = (df_clean.isnull().sum()/df_globalEarthquake.shape[0]*100)
RequirePerct_df_clean = Entire_df_clean > 40
unwantedColumnsName = Entire_df_clean[RequirePerct_df_clean].index
df_clean = df_clean.drop(columns=unwantedColumnsName)

In [None]:
# def country_name(lat,lot):
#     geolocator = Nominatim(user_agent="foo_bar")
#     coordinates = (lat,lot)
#     location = geolocator.reverse(coordinates)
    
# #     country = location.split(',')[-1]
#     return location

# newdf_clean['Country'] = df_clean.apply(lambda x: country_name(x.Latitude, x.Longitude), axis=1)

In [None]:
# newdf_clean

In [None]:
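# Convert the Place column's values to strings (missing places become the string 'nan')
df_clean['Place'] = df_clean['Place'].apply(str)

# Derive the country name from the Place column and create the Country column.
# Note: this keeps only the last whitespace-separated word, so multi-word
# names are truncated (e.g. "54 km WNW of Otaki, New Zealand" gives "Zealand").
def place_name(place):
    return place.split()[-1].strip()    

df_clean['Country'] = df_clean['Place'].apply(place_name)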

# Reordering columns
cols = list(df_clean.columns)
cols.insert(2, cols.pop(cols.index('Country')))
df_clean = df_clean[cols]

# Replace NaN values in the Depth column with 0.0
df_clean["Depth"] = df_clean["Depth"].fillna(0.0)
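
Taking the last word of `Place` truncates multi-word country names. A minimal alternative sketch, assuming the country (or region) is whatever follows the last comma in `Place`; the helper name `country_from_place` is hypothetical and the sample strings are copied from the `head()` output above. Entries without a comma are returned unchanged, which may or may not be the desired fallback.

```python
def country_from_place(place: str) -> str:
    """Return the text after the last comma in a USGS-style place string."""
    # rsplit with maxsplit=1 splits only at the last comma
    return place.rsplit(",", 1)[-1].strip()

# Sample place strings copied from the first rows of the dataset
samples = [
    "130 km SW of Tual, Indonesia",
    "54 km WNW of Otaki, New Zealand",
    "Masbate region, Philippines",
]
countries = [country_from_place(p) for p in samples]
print(countries)
```

Switching to this rule would change which rows match the country filters used later, so the country lists would need to be re-checked against the resulting values.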

In [None]:
# Rename the Time column to Datetime
df_clean.rename(columns = {'Time':'Datetime'}, inplace = True)

# Count the unique values in the Magnitude column
print(f'Magnitude column has: {len(df_clean.Mag.unique())} unique values')
# print("Unique values of "'mag(Magnitude)'" column are:")
# df_clean.Mag.unique()

In [None]:
# Convert the Datetime column to datetime format and derive Year, Month, Date, Day, Time, and Hour for further analysis
# Round the Mag column to the nearest integer for further analysis
df_clean['Datetime'] = pd.to_datetime(df_clean['Datetime'], infer_datetime_format=True)
df_clean['Date'] = df_clean['Datetime'].dt.date
df_clean['Year'] = df_clean['Datetime'].dt.year
df_clean['Month'] = df_clean['Datetime'].dt.month_name() 
df_clean['Day'] = df_clean['Datetime'].dt.day_name()
df_clean['Time'] = df_clean['Datetime'].dt.time
df_clean['Hour'] = df_clean['Datetime'].dt.hour
df_clean['Mag_round'] = df_clean['Mag'].round()
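
To see what these derived columns look like, here is a small self-contained sketch on two timestamps copied from the `head()` output above. It passes `utc=True` explicitly, which is an assumption of this sketch rather than what the cell above does, and the names `sample`, `parsed`, and `parts` are illustrative.

```python
import pandas as pd

# Two ISO 8601 timestamps copied from the first rows of the dataset
sample = pd.Series(["2023-02-17T09:37:34.868Z", "2023-02-16T05:37:05.138Z"])

# Parse as timezone-aware UTC datetimes
parsed = pd.to_datetime(sample, utc=True)

# Derive the same calendar fields as the cleaning cell
parts = pd.DataFrame({
    "Year": parsed.dt.year,
    "Month": parsed.dt.month_name(),
    "Day": parsed.dt.day_name(),
    "Hour": parsed.dt.hour,
})
print(parts)
```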

In [None]:
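# Keep only the time of day as hour:minute:second (drop the timezone and microseconds).
# The helper is named strip_time so it does not shadow datetime.time imported above.
def strip_time(t):
    Timezone_remove = t.replace(tzinfo=None)
    return Timezone_remove.replace(microsecond=0)
    
df_clean['Time'] = df_clean['Time'].apply(strip_time)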

In [None]:
# Reordering columns
cols = list(df_clean.columns)
cols.insert(2, cols.pop(cols.index('Country')))
cols.insert(3, cols.pop(cols.index('Year')))
cols.insert(4, cols.pop(cols.index('Month')))
cols.insert(5, cols.pop(cols.index('Date')))
cols.insert(6, cols.pop(cols.index('Time')))
cols.insert(7, cols.pop(cols.index('Hour')))
df_clean = df_clean[cols]

In [None]:
# Save data set to .csv format
# df_clean.to_csv('SignificantEarthquake_cleandataset1900-2023.csv')

df_clean.head(3)

In [None]:
# View the dataset's basic info per column
print('Basic info of dataset:\n') 
df_clean.info()
print('\n')
# Find the dataset's missing values per column
print('Following Columns have missing values:')
df_clean.isnull().sum()

- Write a short section that discusses the dataset stats, containing key points/plots from your exploratory data analysis.
  - __A brief part of data analysis:__
    - Let's begin by examining the dataset to gain a general understanding of the patterns and trends in the earthquakes.

In [None]:
# Describe the dataset stats
df_clean.describe()

- The minimum magnitude recorded is 5.5, and the maximum is 9.5, as shown in the table above.
- The average depth of the earthquakes is 58.37 km, and the average magnitude is 5.95.
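
The average depth here is slightly lower than the 58.58 km mean reported by the first `describe()` call because the 134 missing depths were replaced with 0.0 during cleaning. A quick arithmetic check, using only the counts and the mean printed earlier in this notebook (the variable names are illustrative):

```python
# Values copied from the describe() and isnull() outputs shown earlier
n_rows = 37331                # total number of earthquakes
n_depth_known = 37197         # rows with a recorded depth
mean_depth_known = 58.583346  # mean depth over the rows with a recorded depth

# Filling the missing depths with 0.0 keeps the sum but enlarges the divisor
n_filled = n_rows - n_depth_known
mean_depth_filled = mean_depth_known * n_depth_known / n_rows

print(n_filled)
print(round(mean_depth_filled, 2))
```

Filling with 0.0 therefore pulls the mean down by about 0.2 km; leaving the values as NaN would keep them out of the average instead.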

3. **Data Analysis**
 - Let's look at the annual number of earthquakes.

In [None]:
# Clone the dataset to compare earthquake counts by event year and by record-update year
df_clean1 = df_clean.copy()
df_clean1['Year'] = pd.DatetimeIndex(df_clean['Datetime']).year
yearly_earthquakes = df_clean1.groupby('Year').count()['ID'].reset_index()

df_clean2 = df_clean.copy()
df_clean2['Year'] = pd.DatetimeIndex(df_clean['Updated']).year
yearly_earthquakesUpadted = df_clean2.groupby('Year').count()['ID'].reset_index()
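
The plotting cells in this section mark the lowest and highest yearly counts at hardcoded years. A hedged sketch of deriving those years from the yearly counts instead, shown on made-up toy data (the `toy` frame and the names `quietest` and `busiest` are illustrative, not part of the notebook):

```python
import pandas as pd

# Toy stand-in for the Year and ID columns (made-up rows)
toy = pd.DataFrame({
    "Year": [1900, 1900, 1901, 1902, 1902, 1902],
    "ID": ["a", "b", "c", "d", "e", "f"],
})

# Same aggregation pattern as above: number of events per year
yearly = toy.groupby("Year")["ID"].count().reset_index()

# idxmin/idxmax return the row labels of the smallest and largest counts
quietest = yearly.loc[yearly["ID"].idxmin()]
busiest = yearly.loc[yearly["ID"].idxmax()]

print(int(quietest["Year"]), int(quietest["ID"]))
print(int(busiest["Year"]), int(busiest["ID"]))
```

The resulting year and count pairs could then be passed to the marker calls in place of literal years.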

In [None]:
# Now, let's plot the number of earthquakes per year
TOOLS = 'save,pan,box_zoom,reset,wheel_zoom,hover'
p = figure(title="Number of earthquakes per year", y_axis_type="linear", plot_height = 500,
           tools = TOOLS, plot_width = 950) # Year-wise total number of earthquakes
p.xaxis.axis_label = 'Year'
p.yaxis.axis_label = 'Total Earthquakes'
p.yaxis.axis_label_text_font_size = "13px"
p.xaxis.axis_label_text_font_size = "13px"
p.yaxis.axis_label_text_font_style = "normal"
p.xaxis.axis_label_text_font_style = "normal"
p.yaxis.axis_label_text_font = "helvetica"
p.xaxis.axis_label_text_font = "helvetica"
p.circle(1900, yearly_earthquakes.ID.min(), size = 10, color = 'green')
p.circle(2011, yearly_earthquakes.ID.max(), size = 10, color = 'red')

p.line(yearly_earthquakes.Year, yearly_earthquakes.ID, line_color="#9c9ede", line_width = 2)
p.select_one(HoverTool).tooltips = [
    ('Year', '@x'),
    ('Number of occurrences', '@y'),
]

# output_notebook()
# output_file("line_chart.html", title="Line Chart")
# show(p)

In [None]:
p1 = figure(title="Number of earthquakes per year", y_axis_type="linear", plot_height = 500,
           tools = TOOLS, plot_width = 950) # Year-wise total number of earthquakes
p1.xaxis.axis_label = 'Year'
p1.yaxis.axis_label = 'Total Earthquakes'
p1.yaxis.axis_label_text_font_size = "13px"
p1.xaxis.axis_label_text_font_size = "13px"
p1.yaxis.axis_label_text_font_style = "normal"
p1.xaxis.axis_label_text_font_style = "normal"
p1.yaxis.axis_label_text_font = "helvetica"
p1.xaxis.axis_label_text_font = "helvetica"
p1.circle(2013, yearly_earthquakesUpadted.ID.min(), size = 10, color = 'green')
p1.circle(2022, yearly_earthquakesUpadted.ID.max(), size = 10, color = 'red')

p1.line(yearly_earthquakesUpadted.Year, yearly_earthquakesUpadted.ID, line_color="#9c9ede", line_width = 2)
p1.select_one(HoverTool).tooltips = [
    ('Year', '@x'),
    ('Number of occurrences', '@y'),
]

In [None]:
# Tabbed layout
# Use the figures p and p1 created above
tab1 = Panel(child = p,title = "Earthquakes from: 1900-2023")
tab2 = Panel(child = p1,title = "Earthquakes from: 2013-2023")
tabs = Tabs(tabs=[tab1,tab2])
output_notebook()
show(tabs)

# output_file("tabs_line_chart.html", title="Tabs Line Chart")

In [None]:
# Now, let's plot the magnitudes of the earthquakes per year
sns.set_style("whitegrid")

# color = dict(boxes='DarkGreen', whiskers='DarkOrange',medians='DarkBlue', caps='Gray')
mag = sns.boxplot(x = 'Year', y = 'Mag', data = df_clean1, fliersize=1, whis=.8, palette='light:#5A9', saturation=0.5, linewidth=.5 , width=0.5)
mag.tick_params(axis='x', labelrotation=90)
mag.tick_params(axis='y',labelsize=10)
mag.tick_params(axis='x',labelsize=7)
mag.set_title("Year-wise Magnitude of earthquakes from 1900-2023",fontdict={'size': 13, 'weight': 'bold'})
plt.xlabel('Year', fontsize=12)
plt.ylabel('Magnitude', fontsize=12)
plt.show()

In [None]:
# Now, let's plot the depth of the earthquakes per year
hover = HoverTool(tooltips = [("Year","@Year"),("Depth:","@Depth"),("Mag:","@Mag")], mode="hline")
plot = figure(title="Year-wise Depth of earthquakes from 1900-2023", tools=[hover,"crosshair"], plot_width = 950)
plot.circle(x= "Year",y = "Depth",source=df_clean1,color ="lavender",hover_color ="red")
plot.xaxis.axis_label = 'Year'
plot.yaxis.axis_label = 'Depth of Earthquakes'
plot.yaxis.axis_label_text_font_size = "13px"
plot.xaxis.axis_label_text_font_size = "13px"
plot.yaxis.axis_label_text_font_style = "normal"
plot.xaxis.axis_label_text_font_style = "normal"
plot.yaxis.axis_label_text_font = "helvetica"
plot.xaxis.axis_label_text_font = "helvetica"
show(plot)

In [None]:
# Use data for the period 2010-2023 for further analysis
focusedYear_df = df_clean1.loc[(df_clean1.Year >= 2010) & (df_clean1.Year <= 2023)]

In [None]:
mask1 = ['Year', 'Mag', "Latitude", "Longitude", 'Place', 'Depth', 'Country', "locationSource", "magSource", "ID",'Mag_round']
place_dfmag = focusedYear_df.sort_values('Mag', ascending=False)[mask1]
place_dfmag.head(50)

In [None]:
place_dfdep = focusedYear_df.sort_values('Depth', ascending=False)[mask1]
# place_dfdep.head(30)

In [None]:
# Select the same columns, sorted by year, for the animated map
mask1 = ['Year', 'Mag', "Latitude", "Longitude", 'Place', 'Depth', 'Country', "locationSource", "magSource", "ID",'Mag_round']
place_df = focusedYear_df.sort_values('Year', ascending=True)[mask1]

In [None]:
import plotly.express as px
fig = px.scatter_geo(place_df, lat='Latitude',lon='Longitude', color="Mag",
                     hover_name="Country", hover_data=["Place", "Year", "Mag", "Depth", "locationSource", "magSource"], size="Depth",
                     animation_frame="Year",
                     projection="natural earth")
fig.update_layout(title = 'High Magnitudes and Deepest Earthquakes around the world: 2010-2023', title_x=0.5)
fig.show()

- Describe your data analysis and explain what you've learned about the dataset.
    - __The dataset has the following characteristics:__
      - Date: The date of the earthquake in YYYY-MM-DD format.
      - Time: The time of the earthquake in HH:MM:SS format.
      - Latitude: The latitude of the earthquake's epicenter.
      - Longitude: The longitude of the earthquake's epicenter.
      - Type: The type of seismic event, such as "earthquake," "nuclear explosion," "explosion," or "rockburst."
      - Depth: The earthquake's depth, measured in kilometers.
      - Mag: The earthquake's magnitude.
      - MagType: The type of magnitude measurement, for example ML for local magnitude, Mw for moment magnitude, and Ms for surface wave magnitude.
      - ID: An identifier that distinguishes each earthquake event.
 
  - We looked at the Significant Earthquake Dataset from 1900 to 2023 in this notebook. To better understand the data, we loaded the dataset into the notebook, explored the information, and made visualizations. We discovered that the majority of earthquakes in the dataset have magnitudes between 5.5 and about 6.1 (the minimum recorded magnitude is 5.5 and the 75th percentile is 6.14), and that the annual number of recorded earthquakes has grown over time. Additionally, we discovered that the Pacific Ring of Fire, a region with high seismic activity, is where the majority of earthquakes take place. Overall, the dataset contains a wealth of data about earthquakes, and with additional analysis, we can draw further conclusions from it.

- If relevant, talk about your machine-learning
  - We mainly focused on humanizing the data through visualization. Reasoning about and analyzing the events behind the data required more analysis and research than implementing a machine learning model would have, so ML was not relevant to our approach to data storytelling.

4. **Genre**. Which genre of data story did you use?
- We aim for magazine-style narrative storytelling of our data. Our aim is an author-driven magazine style, as we want to take the user through our exploration of the data linearly and present our key findings towards the end. This keeps the storytelling controlled, and we control the messaging. However, because we also aim to offer interactivity through maps, which is a reader-driven element, we ultimately end up with a hybrid approach.

- We believe this genre is the best way to communicate our story because it humanizes the data and makes it more relatable, which is important for topics such as natural disasters that have a significant impact on people’s lives. The magazine style also provides the opportunity to include interactivity, which makes our story more appealing because the reader engages with it.

- Which tools did you use from each of the 3 categories of Visual Narrative (Figure 7 in Segel and Heer). Why?

Visual Narrative(we have to pick any from every section)

**Visual Structuring:Section1**
- Establishing Shot / Splash Screen
- Consistent Visual Platform
- Progress Bar / Timebar
- "Checklist" Progress Tracker

**Highlighting:Section2**
- Close-Ups
- Feature Distinction
- Character Direction
- Motion
- Audio
- Zooming

**Transition Guidance:Section3**
- Familiar Objects (but still cuts)
- Viewing Angle
- Viewer (Camera) Motion
- Continuity Editing
- Object Continuity
- Animated Transitions

- Which tools did you use from each of the 3 categories of Narrative Structure (Figure 7 in Segel and Heer). Why?

Narrative Structure(we have to pick any one section)

**Ordering:Section1**
- Random Access
- User Directed Path
- Linear

**Interactivity:Section2**
- Hover Highlighting / Details
- Filtering / Selection / Search
- Navigation Buttons
- Very Limited Interactivity
- Explicit Instruction
- Tacit Tutorial
- Stimulating Default Views

**Messaging:Section3**
- Captions / Headlines
- Annotations
- Accompanying Article
- Multi-Messaging
- Comment Repetition
- Introductory Text
- Summary / Synthesis

In [None]:
focusedYearCountry_df = df_clean1.loc[(df_clean1.Year >= 2010) & (df_clean1.Year <= 2023) & (df_clean1.Country.isin(["Turkey","Indonesia","Japan","India","Haiti",'Chile', 'Colombia','Italy', "Syria"]))]

In [None]:
focusedYearCountry_df.head(2)

In [None]:
mask2 = ['Year', 'Country', "ID"]
vbar_df = focusedYearCountry_df.sort_values('Year', ascending=True)[mask2]

In [None]:
# Helper column: every row counts as one earthquake
vbar_df['abc'] = 1

In [None]:
vbartemp_df = vbar_df.groupby(['Country', 'Year'])['abc'].size().reset_index(name='Earthquakes')
vbartemp_df = vbartemp_df.pivot(index='Year', columns='Country', values='Earthquakes').reset_index()

# A missing Country/Year combination means no earthquakes were recorded, so fill with 0
vbartemp_df2 = vbartemp_df.fillna(0.0)
# vbartemp_df2
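
The count-then-pivot steps above can also be expressed with `pd.crosstab`, which builds the Year by Country table in one call and puts 0 where a combination does not occur. A minimal sketch on made-up toy rows (the `toy` frame and `counts` name are illustrative, not the notebook's data):

```python
import pandas as pd

# Toy stand-in for the Year/Country columns of the filtered dataframe (made-up rows)
toy = pd.DataFrame({
    "Year": [2010, 2010, 2011, 2011, 2012],
    "Country": ["Chile", "Japan", "Japan", "Japan", "Chile"],
})

# One row per year, one column per country; absent combinations become 0
counts = pd.crosstab(toy["Year"], toy["Country"])
print(counts)
```

Calling `counts.reset_index()` would give a frame with a `Year` column, matching the layout that the stacked bar chart expects.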

In [None]:
TOOLS = "save,pan,box_zoom,reset,wheel_zoom,tap"
cats = ["Turkey","Indonesia","Japan","India","Haiti",'Chile', 'Colombia','Italy',"Syria"]
source = ColumnDataSource(data=vbartemp_df2)
q = figure(plot_width=640, plot_height=600, title="Country wise count of earthquakes by year",toolbar_location='above', tools=TOOLS, tooltips="$name @Year: @$name")

colors = Category20[9]

q.vbar_stack(cats, x='Year', width=0.5, color=colors, source=source,
             legend_label=[x for x in cats])

q.y_range.start = 0
q.x_range.range_padding = 0.1
q.xgrid.grid_line_color = None
q.axis.minor_tick_line_color = None
q.outline_line_color = None
q.xaxis.axis_label = 'Year'
q.yaxis.axis_label = 'Earthquakes'
q.legend.location = "top_left"
q.legend.orientation = "horizontal"

output_notebook()
# output_file("stacked_bar.html", title="Stacked Bar Chart")

# set output to static HTML file
output_file("boekh_Viz_Country.html")
# save(q)

In [None]:
# Create bar charts of the earthquake counts per weekday for each focus country
focuscountries = set(["Turkey","Indonesia","Japan","India","Haiti",'Chile', 'Colombia','Italy'])
focuscountries = sorted(list(focuscountries))

In [None]:
x = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Create the bar subplots of earthquakes per weekday for each focus country
fig, axes = plt.subplots(nrows=4,ncols=2,sharex=True)
# sns.set(font_scale=1)
for i in range(len(focuscountries)):
    country = focuscountries[i]
    day_counts = focusedYearCountry_df.loc[(focusedYearCountry_df['Country'] == country)].Day.value_counts().reindex(index=x)
    day_counts.plot(ax= axes[i//2,i%2] ,kind='bar', figsize = (12, 7),grid=False, color ='lavender', edgecolor = "indigo")
    ylim_max= day_counts.max()
    axes[i//2,i%2].set_title(country, loc='left', y=.8, x=.05)
    # Setting the number of ticks
    axes[i//2,i%2].locator_params(axis='y',tight=True, nbins=6)
    axes[i//2,i%2].set_ylabel("Earthquake count", labelpad=7)
    axes[i//2,i%2].set_ylim(0,ylim_max*1.5)
    # Label the x-axis only on the bottom row (row index 3 of the 4x2 grid)
    if i//2 == 3:
        axes[i//2,i%2].set_xlabel("Day of week", labelpad=7)

plt.suptitle("No. of earthquakes per week-day by country", y=1)
fig.align_ylabels(axes[:, ])
plt.tight_layout()
plt.show()

In [None]:
focusedYearCountry_df.Month.value_counts()

In [None]:
mask3 = ['Hour', 'Country', "ID"]
gbar_df = focusedYearCountry_df.sort_values('Hour', ascending=True)[mask3]

# Helper column: every row counts as one earthquake
gbar_df['abc'] = 1

gbartemp_df = gbar_df.groupby(['Country', 'Hour'])['abc'].size().reset_index(name='Earthquakes')
gbartemp_df = gbartemp_df.pivot(index='Hour', columns='Country', values='Earthquakes')

# A missing Country/Hour combination means no earthquakes were recorded, so fill with 0
gbartemp_df2 = gbartemp_df.fillna(0.0)
gbartemp_df2

In [None]:
# Hour labels as strings; the Hour column derived with dt.hour runs from 0 to 23
hours = [str(h) for h in range(24)]

In [None]:
# # output_file("boekh_Viz_Line.html")

# source = ColumnDataSource(gbartemp_df2)
# output_notebook()
# colors = Category20[14]
# p = figure(x_range=FactorRange(factors=hours),width=640, height=260, title="Crimes per hour",
#            toolbar_location=None)
# #950,600
# #620,260
# bar = { } # to store vbars
# items = [] ### for the custom legend // you need to figure out where to add it
# ### here we will do a for loop:
# for indx,i in enumerate(focuscountries):   
#     bar[i] = p.line(x=gbartemp_df2.index.name, y=i, source= source,alpha=1, muted_alpha=0.2, line_color=colors[indx],line_width=2)
#     items.append((i, [bar[i]])) ### figure where to add it
#     hover = HoverTool(tooltips=[
#         ("%s" % i , "@{%s}" % i)
#     ], renderers=[bar[i]])
#     p.add_tools(hover)

# p.xaxis.axis_label = "Hour of the day"
# p.xaxis.axis_label_standoff = 10
# p.yaxis.axis_label = "Relative Frequency"
# p.yaxis.axis_label_standoff = 10    
# legend = Legend(items=items, location=(0, -30))
# p.add_layout(legend, 'left')
# p.legend.location ="top_center"
# p.legend.click_policy="mute"

# show(p)

5. **Visualizations.**

- Explain the visualizations you've chosen.

- Why are they right for the story you want to tell?

6. **Discussion.** Think critically about your creation

- What went well?

- What is still missing? What could be improved?, Why?

7. **Contributions**. Who did what?
You should write (just briefly) which group member was the main responsible for which elements of the assignment. (I want you guys to understand every part of the assignment, but usually there is someone who took lead role on certain portions of the work. That's what you should explain).
It is not OK simply to write "All group members contributed equally".

8. **References.** 
Make sure that you use references when they're needed and follow academic standards.

1. [National Earthquake Information Center (NEIC)](https://www.usgs.gov/programs/earthquake-hazards/national-earthquake-information-center-neic)
2. [USGS](https://www.usgs.gov/)
3. [Significant Earthquake Dataset 1900-2023 (Kaggle)](https://www.kaggle.com/datasets/jahaidulislam/significant-earthquake-dataset-1900-2023?resource=download)
4. [2010 Haiti earthquake (Wikipedia)](https://en.wikipedia.org/wiki/2010_Haiti_earthquake)
5. [Lists of 21st-century earthquakes: List of deadliest earthquakes (Wikipedia)](https://en.wikipedia.org/wiki/Lists_of_21st-century_earthquakes#List_of_deadliest_earthquakes)
6. [List of costliest earthquakes (Wikipedia)](https://en.wikipedia.org/wiki/List_of_costliest_earthquakes)
7. [Political failure has killed people in Syria before and after the earthquakes](https://www.caabu.org/news/article/political-failure-has-killed-people-syria-and-after-earthquakes)
8. [Donors Vow 7 Bn Euros For Turkey, Syria Quake Aid](https://www.barrons.com/news/donor-conference-seeks-to-rally-quake-aid-for-turkey-syria-bce11409)

In [None]:
# from bokeh.core.properties import value
# fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
# years = ["2015", "2016", "2017"]
# colors = ["#c9d9d3", "#718dbf", "#e84d60"]

# data = {'fruits' : fruits,
#         '2015'   : [2, 1, 4, 3, 2, 4],
#         '2016'   : [5, 3, 4, 2, 4, 6],
#         '2017'   : [3, 2, 4, 4, 5, 3]}

# source = ColumnDataSource(data=data)

# p = figure(x_range=fruits, plot_height=350, title="Fruit Counts by Year",
#            toolbar_location=None, tools="")

# renderers = p.vbar_stack(years, x='fruits', width=0.9, color=colors, source=source,
#                          legend=[value(x) for x in years], name=years)

# for r in renderers:
#     year = r.name
#     hover = HoverTool(tooltips=[
#         ("%s total" % year, "@%s" % year),
#         ("index", "$index")
#     ], renderers=[r])
#     p.add_tools(hover)

# p.y_range.start = 0
# p.x_range.range_padding = 0.1
# p.xgrid.grid_line_color = None
# p.axis.minor_tick_line_color = None
# p.outline_line_color = None
# p.legend.location = "top_left"
# p.legend.orientation = "horizontal"

# show(p)