This notebook selects impact sentences using the threshold defined in 01_input.ipynb. It then separates the selected sentences into structural impacts and community impacts, preparing them for subsequent summary generation.

Paste the input_path, which is the location where the output of 01_input.ipynb was saved, at the start of this notebook. This is the only edit needed to load all the information the code requires.

Recommended Google Colab Runtime Type: CPU (default).

In [None]:
# Specify the directory path where the output of the input file 01_input.ipynb was saved
input_path = "/content/drive/My Drive/ImpactDataMining/Turkiye_Earthquake/01_Input"

All sections below automatically load data from the output of 01_input.ipynb, as well as results from previous notebooks in this series. The code runs entirely on this information, so no further edits are required beyond this point.

In [None]:
import os
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

from google.colab import drive
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [None]:
import time

start_time = time.time()

In [None]:
def current_path():
  print("Current working directory")
  print(os.getcwd())
  print()

current_path()
drive.mount('/content/drive')
os.chdir(input_path)
current_path()

Current working directory
/content

Mounted at /content/drive
Current working directory
/content/drive/My Drive/ImpactDataMining/Turkiye_Earthquake/01_Input



In [None]:
with open('0_input.json', 'r') as file:
    data = json.load(file)
    output_path = data['output_path']
    labels_struct = data['keywords_struct']
    labels_comm = data['keywords_comm']
    labels_non_impact = data['keywords_non_impact']
    threshold = data['threshold']

In [None]:
current_path()
drive.mount('/content/drive')
os.chdir(output_path)
current_path()

Current working directory
/content/drive/My Drive/ImpactDataMining/Turkiye_Earthquake/01_Input

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Current working directory
/content/drive/My Drive/ImpactDataMining/Turkiye_Earthquake/03_Output



In [None]:
with open('1_output.json', 'r') as file:
    data = json.load(file)
    text_body = data['text_body']
    text_table = data['text_table']
    idx_body = data['idx_body']
    idx_table = data['idx_table']

with open('2a_output.json', 'r') as file:
    data = json.load(file)
    sent_all = data['sent_all']
    keywords = data['labels']
    label_pred = data['result_probs']
    result_labels = data['result_labels']

In [None]:
# Set the score to 0 for every sentence whose predicted label is a non-impact label
for i, n in enumerate(result_labels):
  if n in labels_non_impact:
    label_pred[i] = 0

In [None]:
# Binarize the scores: 1 if the score meets or exceeds the threshold, otherwise 0
y_pred = [1 if n >= threshold else 0 for n in label_pred]
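
The masking and thresholding steps above can be illustrated on a small made-up example. This is only a sketch: the labels, scores, and the 0.5 threshold below are hypothetical stand-ins, not values from 0_input.json or 2a_output.json, and the `toy_` names are chosen so that nothing in the notebook is overwritten.

```python
# Toy stand-ins (hypothetical values, not taken from the notebook's JSON files)
toy_labels = ["building collapse", "casualties", "weather", "road damage"]
toy_probs = [0.91, 0.40, 0.88, 0.75]
toy_non_impact = ["weather"]
toy_threshold = 0.5

# Step 1: zero out the score of any sentence whose label is a non-impact label
toy_masked = [0 if lab in toy_non_impact else p
              for lab, p in zip(toy_labels, toy_probs)]

# Step 2: binarize against the threshold (>= keeps scores equal to the threshold)
toy_y_pred = [1 if p >= toy_threshold else 0 for p in toy_masked]

print(toy_masked)
print(toy_y_pred)
```

In this toy case the "weather" sentence is dropped despite its high score, because its score is reset to 0 before the threshold is applied.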

In [None]:
idx_flat = [i for i, n in enumerate(y_pred) if n == 1]

idx_para_nested = idx_table + idx_body
idx_len = [len(n) for n in idx_para_nested]
idx_para_flat = [k for n in idx_para_nested for k in n]
idx_count = list(range(len(idx_para_flat)))

In [None]:
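# Group flat sentence positions: a value in idx_para_flat that does not
# increase marks the start of a new group (paragraph or table)
idx_count_nested = []
temp_list = []
for i, (n, k) in enumerate(zip(idx_para_flat, idx_count)):
  if i > 0 and n <= idx_para_flat[i-1]:
    idx_count_nested.append(temp_list)
    temp_list = []
  temp_list.append(k)

if temp_list:
  idx_count_nested.append(temp_list)

# Keep only the positions selected as impact sentences, dropping empty groups
idx_nested = []
for n in idx_count_nested:
  temp_list = [k for k in n if k in idx_flat]
  if temp_list:
    idx_nested.append(temp_list)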
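
A minimal sketch of the grouping logic on made-up data may help. It assumes, as the `n <= idx_para_flat[i-1]` test implies, that the flattened values restart (or at least stop increasing) at each group boundary. All `toy_` values are hypothetical.

```python
# Hypothetical flattened within-group sentence numbers: three groups of
# sizes 3, 1, and 2
toy_para_flat = [0, 1, 2, 0, 0, 1]
# Hypothetical flat positions that passed the threshold
toy_selected = {1, 3, 5}

toy_groups, current = [], []
for pos, local_idx in enumerate(toy_para_flat):
    # A value that does not increase marks the first sentence of a new group
    if pos > 0 and local_idx <= toy_para_flat[pos - 1]:
        toy_groups.append(current)
        current = []
    current.append(pos)
if current:
    toy_groups.append(current)

# Keep only selected positions, then drop groups left empty
toy_kept = [[p for p in g if p in toy_selected] for g in toy_groups]
toy_kept = [g for g in toy_kept if g]

print(toy_groups)
print(toy_kept)
```

Using `<=` rather than `<` matters for single-sentence groups: two consecutive values of 0 must still start a new group.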

In [None]:
sent_nested = []
for n in idx_nested:
  temp_list = [sent_all[i] for i in n]
  sent_nested.append(temp_list)

In [None]:
pos_labels = [result_labels[i] for i in idx_flat]

In [None]:
idx_struct_flat = [n for n, k in zip(idx_flat, pos_labels) if k in labels_struct]

idx_struct_nested = []
for n in idx_nested:
  temp_list = [k for k in n if k in idx_struct_flat]
  if temp_list:
    idx_struct_nested.append(temp_list)

sent_struct_nested = []
for n in idx_struct_nested:
  temp_list = [sent_all[i] for i in n]
  sent_struct_nested.append(temp_list)

idx_comm_flat = [n for n, k in zip(idx_flat, pos_labels) if k in labels_comm]

idx_comm_nested = []
for n in idx_nested:
  temp_list = [k for k in n if k in idx_comm_flat]
  if temp_list:
    idx_comm_nested.append(temp_list)

sent_comm_nested = []
for n in idx_comm_nested:
  temp_list = [sent_all[i] for i in n]
  sent_comm_nested.append(temp_list)
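
The structural and community splits above follow the same three steps, so they could be expressed once as a helper. The function below is a hypothetical refactoring sketch, not part of the original notebook; `split_by_labels` and all `toy_` values are illustrative names and data.

```python
def split_by_labels(groups, flat_idx, labels, wanted, sentences):
    """Return (nested indices, nested sentences) restricted to labels in `wanted`.

    groups    : nested list of selected sentence positions (like idx_nested)
    flat_idx  : flat list of selected positions (like idx_flat)
    labels    : label of each selected position (like pos_labels)
    wanted    : labels that define the category (like labels_struct)
    sentences : all sentences (like sent_all)
    """
    wanted = set(wanted)
    # Positions whose label belongs to the requested category
    keep = {i for i, lab in zip(flat_idx, labels) if lab in wanted}
    # Filter each group, then drop groups left empty
    nested_idx = [[i for i in g if i in keep] for g in groups]
    nested_idx = [g for g in nested_idx if g]
    nested_sent = [[sentences[i] for i in g] for g in nested_idx]
    return nested_idx, nested_sent


# Toy usage with made-up data
toy_sentences = ["s0", "s1", "s2", "s3"]
toy_groups = [[0, 1], [3]]
toy_flat = [0, 1, 3]
toy_pos_labels = ["collapse", "injury", "collapse"]

toy_idx, toy_sent = split_by_labels(
    toy_groups, toy_flat, toy_pos_labels, ["collapse"], toy_sentences
)
print(toy_idx)
print(toy_sent)
```

Building a set for `keep` also avoids repeated list scans, which the `k in idx_struct_flat` checks perform on every lookup.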

In [None]:
print('Structural Impact Sentences')
sent_struct_nested

Structural Impact Sentences


[['This earthquake was followed by many aftershocks, including several larger than magnitude 6 (one with Mw 6.6).'],
 ['As a result of this sequence of earthquakes and aftershocks, around 28,500 buildings partially or completely collapsed, while another 66,000 buildings were severely damaged in Türkiye.'],
 ['In Syria, more than 22,000 buildings were affected by the earthquakes, with 2,850 of them partially/completely collapsed or severely damaged.'],
 ['Around half of the buildings in the affected regions of Türkiye were constructed before 2000, i.e., before modern principles of earthquake design were implemented in the Turkish Seismic Code.'],
 ['Fragility functions developed for the building stock in the area showed that collapse under large shaking was possible for these relatively older buildings.'],
 ['However, several collapses of buildings constructed after 2000 were also observed.'],
 ['There are several reasons for the collapse of these relatively newer buildings, including: 

In [None]:
print('Community Impact Sentences')
sent_comm_nested

Community Impact Sentences


[['Due to the shallow depth of the earthquake and a bilateral rupture towards the southwest and the northeast with an area of approximately 100 km × 75 km, the earthquake impacted 10 provinces in Türkiye and several others in Syria, resulting in significant casualties due to the collapse of many buildings.'],
 ['As of March 8, the total official death toll due to these earthquakes was reported to be 45,968 confirmed deaths in Türkiye and 7,259 deaths in Syria.'],
 ['In Türkiye alone, more than 100,000 people were reported as injured.'],
 ['The objectives of this joint Preliminary Virtual Reconnaissance Report (PVRR) issued by the Structural Extreme Events Reconnaissance (StEER) network and Earthquake Engineering Research Institute (EERI) Learning From Earthquakes (LFE) Program are: 1) to provide details of the February 6 Mw 7.8 and Mw 7.5 earthquakes, 2) to describe local seismic codes and building construction practices, 3) to compare the recorded ground shaking with the parameters us

In [None]:
# Saving results to a JSON file
with open('2b_output.json', 'w') as file:
    json.dump(
        {'sent_nested': sent_nested, 'sent_struct_nested': sent_struct_nested,
         'sent_comm_nested': sent_comm_nested}, file
        )
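
Because the sentences contain non-ASCII characters such as "ü", note that `json.dump` escapes them by default (`ensure_ascii=True`). They are restored on load, so the cell above works as written. The sketch below, which uses a temporary directory and a hypothetical file name rather than the notebook's output path, shows an optional variant that keeps the saved file human-readable.

```python
import json
import os
import tempfile

# Made-up payload with the same shape as one key of the saved results
toy = {"sent_nested": [["Damage was reported in Türkiye."]]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_output.json")  # hypothetical file name

    # ensure_ascii=False writes "ü" as-is; the file must then be opened as UTF-8
    with open(path, "w", encoding="utf-8") as f:
        json.dump(toy, f, ensure_ascii=False, indent=2)

    # Read the file back both as parsed JSON and as raw text
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)
    with open(path, "r", encoding="utf-8") as f:
        raw = f.read()

print(loaded == toy)
```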

In [None]:
end_time = time.time()
execution_time = end_time - start_time

print("Execution time:", execution_time, "seconds")

Execution time: 19.625630378723145 seconds
