This notebook generates summaries from the impact sentences extracted in the previous notebooks, using the BART large CNN model (https://huggingface.co/facebook/bart-large-cnn). It produces two summaries: one covering structural impacts and one covering community impacts. The summaries are saved in .json format (3_results.json).

Set input_path, the location of the 0_input.ipynb file, at the start of this notebook. This is the only edit required; all other information needed to run the code is loaded from that location.

Recommended Google Colab Runtime Type: A100 GPU (preferred) or V100 GPU, as this notebook involves running machine learning models.
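
Before running the heavier cells, it can help to confirm which accelerator the runtime actually provides. The snippet below is a minimal sketch and not part of the original notebook; the helper name `describe_runtime` is hypothetical, and the import is guarded because PyTorch may not be installed outside Colab.

```python
def describe_runtime():
    """Return a short description of the available compute device."""
    try:
        import torch  # guarded in case PyTorch is not installed
    except ImportError:
        return "PyTorch not installed"
    if torch.cuda.is_available():
        # get_device_name reports the GPU model attached to the runtime
        return "GPU: " + torch.cuda.get_device_name(0)
    return "CPU only"

print(describe_runtime())
```

If this prints "CPU only", the notebook should still run, but the summarization cells will likely be much slower.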

In [None]:
# Input file path (must be set at the beginning of each notebook)
input_path = "/content/drive/My Drive/ImpactDataMining/Hurricane_Ian/Result"

All sections below automatically load their settings from the 0_input.ipynb file and the results of the previous notebooks in this series. No further edits are required beyond this point.

In [None]:
!pip install transformers
import os
import json
import torch
import math

from google.colab import drive
from transformers import pipeline, BartTokenizer, BartForConditionalGeneration



In [None]:
import time

start_time = time.time()

In [None]:
def current_path():
  print("Current working directory")
  print(os.getcwd())
  print()

current_path()
drive.mount('/content/drive')
os.chdir(input_path)
current_path()

Current working directory
/content

Mounted at /content/drive
Current working directory
/content/drive/My Drive/ResilienceDataMining/Hurricane_Ian/Result



In [None]:
with open('0_input.json', 'r') as file:
    data = json.load(file)
    result_path = data['result_path']
    overlap_tokens = data['overlap_tokens']
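
The cell above assumes that 0_input.json contains the keys 'result_path' and 'overlap_tokens'. As a hedged sketch of how a missing key could be caught early, the hypothetical helper `load_settings` below checks for both keys before returning them. The demonstration file and its values are made up for illustration and do not come from the project.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("result_path", "overlap_tokens")  # the two keys this notebook reads

def load_settings(path):
    """Load the settings file and fail early if an expected key is missing."""
    with open(path, "r") as file:
        data = json.load(file)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise KeyError(f"{path} is missing keys: {missing}")
    return data["result_path"], int(data["overlap_tokens"])

# Demonstration with a throwaway file; the values are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "0_input.json")
    with open(demo_path, "w") as file:
        json.dump({"result_path": tmp, "overlap_tokens": 200}, file)
    demo_result_path, demo_overlap = load_settings(demo_path)

print(demo_overlap)
```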

In [None]:
os.makedirs(result_path, exist_ok=True)
os.chdir(result_path)
current_path()

Current working directory
/content/drive/My Drive/ResilienceDataMining/Hurricane_Ian/Result



In [None]:
with open('2b_results.json', 'r') as file:
    data = json.load(file)
    sent_nested = data['sent_nested']
    sent_struct_nested = data['sent_struct_nested']
    sent_comm_nested = data['sent_comm_nested']

In [None]:
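# Load the summarization model and move it to the GPU when one is available.
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
# Note: the tokenizer is loaded from the bart-large-mnli checkpoint, not from bart-large-cnn.
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-mnli')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)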

config.json:   0%|          | 0.00/1.58k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.15k [00:00<?, ?B/s]

In [None]:
def summary_chunk(text):
  # `text` is a nested list: each element is a group (list) of sentences.
  # Groups with more than 30 words are summarized with BART; shorter groups are kept verbatim.
  result = []
  for n in text:
    if len(n) > 1:
      para = ' '.join(n)
      if len(para.split()) > 30:
        # Inputs longer than the 1024-token limit passed to the tokenizer are truncated.
        inputs = tokenizer(para, max_length=1024, return_tensors='pt', truncation=True)
        inputs = inputs.to(device)

        length_of_inputs = inputs['input_ids'].shape[1]
        summary_ids = model.generate(inputs['input_ids'], max_length=length_of_inputs, min_length=30,
                           do_sample=False, early_stopping=True, num_beams=4)
        result.append([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids][0])
      else:
        result.append(para)
    else:
      # Single-sentence group: nothing to summarize.
      result.append(n[0])
  return result
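
The branching in `summary_chunk` can be checked without loading the model. The sketch below mirrors the same three cases (single sentence, short group, long group) with a pluggable summarizer. The names `route_groups` and `first_five_words` are hypothetical, the stand-in summarizer is not BART, and the input strings are synthetic placeholders.

```python
def route_groups(groups, summarize, min_words=30):
    """Mirror the branching in summary_chunk with a pluggable summarizer."""
    result = []
    for group in groups:
        if len(group) > 1:
            para = " ".join(group)
            if len(para.split()) > min_words:
                result.append(summarize(para))  # long group: summarized
            else:
                result.append(para)             # short group: joined, kept verbatim
        else:
            result.append(group[0])             # single sentence: kept verbatim
    return result

# Stand-in summarizer (NOT BART): keeps only the first five words.
def first_five_words(text):
    return " ".join(text.split()[:5])

demo_groups = [
    ["alpha beta gamma."],                 # one sentence
    ["alpha beta.", "gamma delta."],       # two sentences, 4 words in total
    ["word " * 20, "word " * 20],          # two sentences, 40 words in total
]
demo_routed = route_groups(demo_groups, first_five_words)
print(demo_routed)
```

Only the third group crosses the 30-word threshold, so it is the only one passed to the summarizer.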

In [None]:
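def sliding_window(text, max_tokens=1024, overlap_tokens=200):
  # Pack the strings in `text` into chunks of at most `max_tokens` whitespace-separated
  # words (an approximation of the model's token count), carrying fewer than
  # `overlap_tokens` words from the end of one chunk into the start of the next.
  para = text
  para_len = [len(n.split()) for n in para]
  chunks_len = []; chunks = []

  running_sum_len = 0; running_sum_text = ''
  overlap = []; overlap_1 = []  # word counts and texts of the items added to the current chunk

  for count, sent in zip(para_len, para):
    if running_sum_len + count > max_tokens:
      chunks_len.append(running_sum_len)
      chunks.append(running_sum_text)

      # Walk backwards through the finished chunk and prepend each item,
      # so the carried-over text keeps its original order.
      overlap_count = 0; overlap_text = ''
      for prev_elem, prev_text in zip(reversed(overlap), reversed(overlap_1)):
        if overlap_count + prev_elem < overlap_tokens:
          overlap_count = overlap_count + prev_elem
          overlap_text = ' '.join([prev_text, overlap_text]).strip()
        else:
          break

      running_sum_len = overlap_count + count
      running_sum_text = ' '.join([overlap_text, sent]).strip()

      # The item that opens the new chunk is also a candidate for the next overlap.
      overlap = [count]; overlap_1 = [sent]
    else:
      running_sum_len = running_sum_len + count
      running_sum_text = ' '.join([running_sum_text, sent]).strip()

      overlap.append(count)
      overlap_1.append(sent)

  if running_sum_len:
    chunks_len.append(running_sum_len)
    chunks.append(running_sum_text)
  return chunks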

def summarize_with_bart(chunk):
  # Summarize one chunk with BART; chunks of 30 words or fewer are returned unchanged.
  summary = []
  if len(chunk.split()) > 30:
    inputs = tokenizer(chunk, max_length=1024, return_tensors='pt', truncation=True)
    inputs = inputs.to(device)

    inputs_len = inputs['input_ids'].shape[1]
    # Cap min_length at the input length so it can never exceed max_length.
    summary_ids = model.generate(inputs['input_ids'], max_length=inputs_len, min_length=min(512, inputs_len),
                  do_sample=False, early_stopping=True, num_beams=4)
    summary.append([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids][0])
  else:
    summary.append(chunk)
  return summary

def hierarchical_summarize(text, max_tokens=1024, overlap_tokens=200):
    # Summarize each chunk, join the results, and repeat until the joined text
    # is no longer than max_tokens words.
    chunks = sliding_window(text, max_tokens=max_tokens, overlap_tokens=overlap_tokens)
    summaries = []
    for chunk in chunks:
        summary = summarize_with_bart(chunk)
        summaries.append(summary[0])
    aggregated_summary = " ".join(summaries)

    if len(aggregated_summary.split()) > max_tokens:
        # Pass overlap_tokens through so the recursive call does not fall back to the default.
        return hierarchical_summarize(summaries, max_tokens=max_tokens, overlap_tokens=overlap_tokens)
    else:
        return aggregated_summary
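
The chunking idea used above (greedy packing by word count, with a few trailing sentences repeated at the start of the next chunk) is easier to see at a small scale. The following is a simplified sketch of that idea, not a copy of `sliding_window`. The function name `chunk_with_overlap`, the tiny limits, and the synthetic sentences are assumptions chosen for illustration. Like the notebook, it counts whitespace-separated words rather than model tokens.

```python
def chunk_with_overlap(sentences, max_words=10, overlap_words=4):
    """Greedy word-count chunking; each new chunk starts with trailing
    sentences of the previous chunk, up to the overlap budget."""
    chunks = []
    current = []      # sentences in the chunk being built
    current_len = 0   # whitespace word count of `current`
    for sent in sentences:
        n = len(sent.split())
        if current and current_len + n > max_words:
            chunks.append(" ".join(current))
            # Collect trailing sentences, newest first, while under the budget.
            carried, carried_len = [], 0
            for prev in reversed(current):
                prev_len = len(prev.split())
                if carried_len + prev_len < overlap_words:
                    carried.insert(0, prev)  # insert at front to keep original order
                    carried_len += prev_len
                else:
                    break
            current, current_len = carried, carried_len
        current.append(sent)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks

demo_sentences = ["a1 a2 a3", "b1 b2 b3", "c1 c2 c3", "d1 d2 d3", "e1 e2 e3"]
demo_chunks = chunk_with_overlap(demo_sentences)
for chunk in demo_chunks:
    print(chunk)
```

With these limits the input splits into two chunks, and the sentence "c1 c2 c3" appears at the end of the first chunk and again at the start of the second. That repetition is the overlap, which gives the summarizer some shared context across the chunk boundary.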

In [None]:
result_struct_summ = summary_chunk(sent_struct_nested)
summ_struct = hierarchical_summarize(result_struct_summ, overlap_tokens=overlap_tokens)

print('Structural impact summary:')
summ_struct

Structural impact summary:


"Unlike Hurricane Charley (2004), water more so than wind was the impetus behind the disaster that unfolded. The impacts from Hurricane Ian were most severe in the barrier islands. Many buildings were completely washed away, and others left to deal with significant scour and eroded foundations. Seawall collapses were reported along the Atlantic coastline of Florida at Daytona Beach Shores. A levee in Hidden River in Sarasota County, FL, was also breached, causing severe flooding (Clowe, 2022). According to the National Levee Database, the Hidden River levee is a 1.98 mile embankment levee along the Myakka River. Mobile/manufactured housing and RV parks were the most susceptible to the damage. Wind damage was primarily limited to building-built homes in Port Charlotte, FL. Wind-induced flooding in the emergency room at HCA Florida Fawcett Hospital was rated EF3 by the National Weather Service. More severe damage appears to be focusing on roof damage, including roof damage on schools, wi

In [None]:
result_comm_summ = summary_chunk(sent_comm_nested)
summ_comm = hierarchical_summarize(result_comm_summ, overlap_tokens=overlap_tokens)

print('Community impact summary:')
summ_comm

Community impact summary:


'Storm-related death toll from Hurricane Ian was 125 as of November 10, 2022. Death toll included 119 storm-related fatalities in Florida, five in North Carolina, and one in Virginia. The majority of the deaths (57) were reported in Lee County, FL, and an estimated 60% were caused by drowning. Risk modelers estimated wind and coastal storm surge losses of $40-$74 billion. As such, Hurricane Ian will likely be one of the costliest landfalling hurricanes of all time in the US, claiming over 100 lives. The strong hurricane winds associated with Hurricane Ian caused widespread power outages in Florida. North Carolina had power outage of more than 358,000 (Dean, K. Cataudella, K., 2022) The Federal Communications Commission (FCC) reported that cell service outages dropped from 65.0% to around 5.0%. The total number of wireline/cable users affected in Florida dropped from around 320,000 users to around 110,000. As Hurricane Ian moved towards the northwest, causing intense precipitation with 

In [None]:
# Saving results to a JSON file
with open('3_results.json', 'w') as file:
    json.dump({
        'Structural impact summary': summ_struct,
        'Community impact summary': summ_comm
        }, file)
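
To confirm that the file was written as expected, the results can be read back and previewed. This is a minimal sketch under stated assumptions: the helper `preview_summaries` is hypothetical, and the demonstration uses a throwaway file with placeholder text rather than the real output.

```python
import json
import os
import tempfile

def preview_summaries(path, width=80):
    """Return {key: first `width` characters} for each summary in the file."""
    with open(path, "r") as file:
        results = json.load(file)
    return {key: value[:width] for key, value in results.items()}

# Demonstration on a throwaway file with placeholder text (not real output).
with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "3_results.json")
    with open(demo_file, "w") as file:
        json.dump({
            "Structural impact summary": "placeholder structural text",
            "Community impact summary": "placeholder community text",
        }, file)
    demo_preview = preview_summaries(demo_file, width=11)

print(demo_preview)
```

In the notebook itself, calling `preview_summaries('3_results.json')` after the save cell would show the opening of each stored summary.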

In [None]:
end_time = time.time()
execution_time = end_time - start_time

print("Execution time:", execution_time, "seconds")

Execution time: 109.10354900360107 seconds
