This notebook selects impactful sentences using the pre-defined threshold specified in 0_input.ipynb. It separates the sentences that describe structural impacts from those that describe community impacts, in preparation for subsequent summary generation.

Paste the input_path, which is the location of the 0_input.ipynb file, at the start of this notebook. This is the only step required to load all the information needed to run the code.

Recommended Google Colab Runtime Type: CPU, as this notebook does not involve running machine learning models.

In [1]:
# Input file path (must be set at the beginning of each notebook)
input_path = "/content/drive/My Drive/ImpactDataMining/Hurricane_Ian/Result"

All sections below automatically retrieve data from the 0_input.ipynb file, as well as results from the previous notebooks in this series. The code runs on this information alone, so no further edits are required beyond this point.

In [2]:
import os
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

from google.colab import drive
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [3]:
import time

start_time = time.time()

In [4]:
def current_path():
  print("Current working directory")
  print(os.getcwd())
  print()

current_path()
drive.mount('/content/drive')
os.chdir(input_path)
current_path()

Current working directory
/content

Mounted at /content/drive
Current working directory
/content/drive/My Drive/ImpactDataMining/Hurricane_Ian/Result



In [5]:
with open('0_input.json', 'r') as file:
    data = json.load(file)
    result_path = data['result_path']
    labels_struct = data['keywords_struct']
    labels_comm = data['keywords_comm']
    labels_non_impact = data['keywords_non_impact']
    threshold = data['threshold']

In [6]:
current_path()
drive.mount('/content/drive')
os.chdir(result_path)
current_path()

Current working directory
/content/drive/My Drive/ImpactDataMining/Hurricane_Ian/Result

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Current working directory
/content/drive/My Drive/ImpactDataMining/Hurricane_Ian/Result



In [7]:
with open('1_results.json', 'r') as file:
    data = json.load(file)
    text_body = data['text_body']
    text_table = data['text_table']
    idx_body = data['idx_body']
    idx_table = data['idx_table']

with open('2a_results.json', 'r') as file:
    data = json.load(file)
    sent_all = data['sent_all']
    keywords = data['labels']
    label_pred = data['result_probs']
    result_labels = data['result_labels']

In [8]:
# Zero the probability of every sentence whose predicted label is a non-impact keyword
for i, n in enumerate(result_labels):
  if n in labels_non_impact:
    label_pred[i] = 0

In [9]:
# Binary prediction: 1 if the probability meets the threshold, otherwise 0
y_pred = [1 if n >= threshold else 0 for n in label_pred]
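
The two cells above can be read as a single selection step: a sentence whose predicted label is a non-impact keyword has its probability set to zero, and every remaining probability is compared with the threshold. The sketch below repeats that logic on made-up data. The labels, probabilities, and threshold value are illustrative assumptions, not values taken from 0_input.json or 2a_results.json.

```python
# Toy stand-ins (hypothetical values, not taken from the project files)
toy_labels = ["roof damage", "weather forecast", "power outage", "roof damage"]
toy_probs = [0.91, 0.97, 0.42, 0.60]
toy_non_impact = ["weather forecast"]
toy_threshold = 0.6

# Step 1: zero the probability of sentences labelled as non-impact
masked = [0 if lab in toy_non_impact else p
          for lab, p in zip(toy_labels, toy_probs)]

# Step 2: a sentence is selected when its probability meets the threshold
toy_pred = [1 if p >= toy_threshold else 0 for p in masked]

print(masked)    # [0.91, 0, 0.42, 0.6]
print(toy_pred)  # [1, 0, 0, 1]
```

Note that the second sentence has the highest probability but is still rejected, because masking happens before thresholding.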

In [10]:
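# Positions in y_pred of the sentences predicted as impactful
idx_flat = [i for i, n in enumerate(y_pred) if n == 1]

# Nested index lists from the tables and the body text, tables first
idx_para_nested = idx_table + idx_body
idx_len = [len(n) for n in idx_para_nested]
idx_para_flat = [k for n in idx_para_nested for k in n]
# Running position of each entry in the flattened list
idx_count = list(range(len(idx_para_flat)))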

In [11]:
# Regroup the running positions: start a new group whenever the stored
# index stops increasing
idx_count_nested = []
temp_list = []
for i, (n, k) in enumerate(zip(idx_para_flat, idx_count)):
  if i > 0 and n <= idx_para_flat[i-1]:
    idx_count_nested.append(temp_list)
    temp_list = []
  temp_list.append(k)

if temp_list:
  idx_count_nested.append(temp_list)

# Keep only the selected positions in each group and drop empty groups
idx_nested = []
for n in idx_count_nested:
  temp_list = [k for k in n if k in idx_flat]
  if temp_list:
    idx_nested.append(temp_list)
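
The grouping in the cell above is easier to follow on a small example. The sketch below assumes the stored indices restart (or at least stop increasing) at each new group, which is the boundary test the cell applies. All values are made up, and a set is swapped in for the membership check only to make lookups faster; the result is the same as with a list.

```python
# Hypothetical stored indices: the numbering restarts at 0 for each group
toy_para_flat = [0, 1, 2, 0, 1, 0, 1, 2, 3]
toy_selected = [1, 4, 5, 8]  # running positions that passed the threshold

groups, current = [], []
for pos, idx in enumerate(toy_para_flat):
    # A non-increasing index marks the start of a new group
    if pos > 0 and idx <= toy_para_flat[pos - 1]:
        groups.append(current)
        current = []
    current.append(pos)
if current:
    groups.append(current)

# Keep only selected positions, then drop groups left empty
selected = set(toy_selected)
kept = [[p for p in g if p in selected] for g in groups]
kept = [g for g in kept if g]

print(groups)  # [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
print(kept)    # [[1], [4], [5, 8]]
```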

In [12]:
# Collect the selected sentences, grouped the same way as idx_nested
sent_nested = []
for n in idx_nested:
  temp_list = [sent_all[i] for i in n]
  sent_nested.append(temp_list)

In [13]:
# Predicted label of each selected sentence
pos_labels = [result_labels[i] for i in idx_flat]

In [14]:
# Selected positions whose label is a structural-impact keyword
idx_struct_flat = [n for n, k in zip(idx_flat, pos_labels) if k in labels_struct]

# Keep the grouping, dropping groups with no structural sentence
idx_struct_nested = []
for n in idx_nested:
  temp_list = [k for k in n if k in idx_struct_flat]
  if temp_list:
    idx_struct_nested.append(temp_list)

sent_struct_nested = []
for n in idx_struct_nested:
  temp_list = [sent_all[i] for i in n]
  sent_struct_nested.append(temp_list)

In [15]:
# Selected positions whose label is a community-impact keyword
idx_comm_flat = [n for n, k in zip(idx_flat, pos_labels) if k in labels_comm]

# Keep the grouping, dropping groups with no community sentence
idx_comm_nested = []
for n in idx_nested:
  temp_list = [k for k in n if k in idx_comm_flat]
  if temp_list:
    idx_comm_nested.append(temp_list)

sent_comm_nested = []
for n in idx_comm_nested:
  temp_list = [sent_all[i] for i in n]
  sent_comm_nested.append(temp_list)
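
The structural and community cells above apply the same filter with different keyword lists. As a sketch of how the two could share one helper, the function below takes the keyword collection as a parameter. The name split_by_category, the sentences, and the keyword sets are all hypothetical and are not part of the notebook.

```python
toy_sent = ["Roofs were torn off.", "Power was out for days.",
            "Bridges collapsed.", "Schools closed."]
toy_labels = ["roof damage", "power outage", "bridge failure", "school closure"]
toy_nested = [[0, 1], [2], [3]]                  # selected positions, grouped
toy_struct = {"roof damage", "bridge failure"}   # hypothetical structural keywords
toy_comm = {"power outage", "school closure"}    # hypothetical community keywords

def split_by_category(nested, labels, sentences, keywords):
    """Return sentences whose label is in `keywords`, preserving the grouping."""
    out = []
    for group in nested:
        hits = [sentences[i] for i in group if labels[i] in keywords]
        if hits:  # skip groups with no sentence in this category
            out.append(hits)
    return out

struct_out = split_by_category(toy_nested, toy_labels, toy_sent, toy_struct)
comm_out = split_by_category(toy_nested, toy_labels, toy_sent, toy_comm)

print(struct_out)  # [['Roofs were torn off.'], ['Bridges collapsed.']]
print(comm_out)    # [['Power was out for days.'], ['Schools closed.']]
```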

In [16]:
print('Structural Impact Sentences')
sent_struct_nested

Structural Impact Sentences


[['Unlike Hurricane Charley (2004), water more so than wind was the impetus behind the disaster that unfolded.'],
 ['The impacts from Hurricane Ian were most severe in the barrier islands from the combination of storm surge and high winds, with many buildings completely washed away, and others left to deal with significant scour and eroded foundations.',
  'Several mobile/manufactured home parks on the barrier islands fared particularly poorly, offering little to no protection to anyone unfortunate enough to shelter in them.',
  'The damage was not restricted to buildings, as the causeways out to the barrier islands were washed away in multiple locations.'],
 ['It is notable that extensive losses were in part driven by decades-long construction boom of residential structures in Ft. Myers and Cape Coral since the 1950s and 1960s, expanding communities and neighborhoods encroaching upon vulnerable coastlines.'],
 ['The surge impacted regions with high population densities housed in both 

In [17]:
print('Community Impact Sentences')
sent_comm_nested

Community Impact Sentences


[['As such, Hurricane Ian will likely be one of the costliest landfalling hurricanes of all time in the US, claiming over 100 lives.'],
 ['Interested readers may also consult the accompanying Outage/Restoration Database for a chronology of disruption/outage/restoration data for power, telecommunications, and transportation networks.'],
 ['The results were catastrophic in terms of both damage to infrastructure and loss of human life on the densely-populated west coast of Florida, particularly in the barrier islands off Ft. Myers and Cape Coral.',
  'Tragically, preliminary numbers available at the time of this report confirm that Ian has caused over 100 fatalities in Florida, the highest direct loss of life in any hurricane landfalling in Florida since the 1935 Labor Day hurricane.'],
 ['The fatalities are primarily associated with the heavy storm-surge that struck the barrier islands of Sanibel, Ft. Myers Beach, and Bonita Beach.',
  'Wind damage was generally less severe, but widespre

In [18]:
# Saving results to a JSON file
with open('2b_results.json', 'w') as file:
    json.dump(
        {'sent_nested': sent_nested, 'sent_struct_nested': sent_struct_nested,
         'sent_comm_nested': sent_comm_nested}, file
        )
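
A later notebook would read this file back with json.load. The sketch below shows the round trip on made-up data and writes to a temporary directory rather than the real result_path, so it can be run without touching the project files. The three keys match the ones saved above; the sentence values are placeholders.

```python
import json
import os
import tempfile

# Placeholder content using the same three keys as 2b_results.json
toy_results = {
    "sent_nested": [["A.", "B."], ["C."]],
    "sent_struct_nested": [["A."]],
    "sent_comm_nested": [["B."], ["C."]],
}

# Write to a temporary directory instead of the real result_path
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "2b_results.json")
    with open(path, "w") as file:
        json.dump(toy_results, file)
    with open(path, "r") as file:
        loaded = json.load(file)

# Count the sentences in each category
n_struct = sum(len(g) for g in loaded["sent_struct_nested"])
n_comm = sum(len(g) for g in loaded["sent_comm_nested"])
print(n_struct, n_comm)  # 1 2
```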

In [19]:
end_time = time.time()
execution_time = end_time - start_time

print("Execution time:", execution_time, "seconds")

Execution time: 19.91245198249817 seconds
