This optional notebook helps users customize the project's scope. It prints the table of contents of the input document so that users can identify sections to exclude from the analysis. To exclude sections, note their headings in the printed table of contents and list them in the drop_headings variable in the 01_input.ipynb file. Users who want to include all sections in the report can skip this notebook and proceed directly to 01_input.ipynb. The table of contents is printed at the end of this notebook.

Paste the file path and name of the input document at the start of this notebook. This is the only input required; the rest of the code loads everything it needs from these two values.

Recommended Google Colab Runtime Type: CPU (default).

In [1]:
# Specify the file path and name of the input document
document_path = "/content/drive/My Drive/ImpactDataMining/Hurricane_Ian/01_Input"
document_name = "Hurricane Ian_PVRR.docx"
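
Because every later cell depends on these two values, a quick check can catch a mistyped folder or file name early. The sketch below is not part of the original notebook: `check_document` is a hypothetical helper, and on Colab it would only succeed after Google Drive has been mounted.

```python
from pathlib import Path


def check_document(folder, name):
    """Return the full path to the input document, or raise if it looks wrong.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    path = Path(folder) / name
    # python-docx reads .docx files, so reject other extensions up front
    if path.suffix.lower() != ".docx":
        raise ValueError(f"Expected a .docx file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"Input document not found: {path}")
    return path
```

It could be called as `check_document(document_path, document_name)` once the Drive folder is available.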

All subsequent sections automatically use the information in the document. The code is configured to run with this data, so no further edits are needed beyond this point.

In [2]:
!pip install python-docx
!pip install anytree

import docx
import os
import json
import csv

from google.colab import drive
from anytree import Node, RenderTree, search

Collecting python-docx
  Downloading python_docx-1.1.2-py3-none-any.whl (244 kB)
Installing collected packages: python-docx
Successfully installed python-docx-1.1.2
Collecting anytree
  Downloading anytree-2.12.1-py3-none-any.whl (44 kB)
Installing collected packages: anytree
Successfully installed anytree-2.12.1


In [3]:
import time
start_time = time.time()

In [4]:
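def current_path():
    """Print the current working directory."""
    print("Current working directory")
    print(os.getcwd())
    print()

current_path()
# Mount Google Drive, then work from the folder that holds the input document
drive.mount('/content/drive')
os.chdir(document_path)
current_path()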

Current working directory
/content

Mounted at /content/drive
Current working directory
/content/drive/My Drive/ImpactDataMining/Hurricane_Ian/01_Input



In [5]:
doc = docx.Document(document_name)

# Style name and text of every paragraph, in document order
names = [para.style.name for para in doc.paragraphs]
text = [para.text for para in doc.paragraphs]

# Cell text of every table: one list of rows per table
text_table = [
    [[cell.text for cell in row.cells] for row in table.rows]
    for table in doc.tables
]

**Drop unnecessary sections**

In [6]:
# Indices of paragraphs whose style name starts with 'Heading' (e.g. 'Heading 1')
headings_idx = [i for i, n in enumerate(names) if n.startswith('Heading')]
# (style name, heading text) pairs and the heading texts alone
headings_tree = [(names[i], text[i]) for i in headings_idx]
headings = [text[i] for i in headings_idx]

In [7]:
# Root node
root = Node("Document")

# Keep track of the last node for each heading level
last_nodes = {0: root}

for level, title in headings_tree:
    # Style names look like "Heading 2"; the trailing number is the heading level
    level_num = int(level.split(' ')[-1])

    # Reset the last nodes for levels greater than the current level
    for i in range(level_num + 1, max(last_nodes.keys()) + 1):
        if i in last_nodes:
            del last_nodes[i]

    # Find the correct parent by looking at the last node at the previous level or above
    parent = None
    for i in range(level_num - 1, -1, -1):
        parent = last_nodes.get(i)
        if parent:
            break

    # Create the new node and store it as the last node for its level
    node = Node(title, parent=parent)
    last_nodes[level_num] = node

# Print the tree
print('TABLE OF CONTENTS')
for pre, _, node in RenderTree(root):
    print(f"{pre}{node.name}")

TABLE OF CONTENTS
Document
├── PREFACE
├── ATTRIBUTION GUIDANCE
│   ├── Reference to PVRR Analyses, Discussions or Recommendations
│   └── Citing Images from this PVRR
├── ACKNOWLEDGMENTS
├── TABLE OF CONTENTS
│   └── Common Terms & Acronyms
└── EXECUTIVE SUMMARY
    ├── Introduction
    │   ├── Loss of Life and Injuries
    │   ├── Official Response
    │   └── Report Scope
    ├── Hazard Characteristics
    │   ├── Meteorological Background
    │   ├── Wind Field
    │   ├── Storm Surge and Coastal Flooding
    │   ├── Rainfall and Inland Flooding
    │   ├── Tornadoes
    │   └── Comparative Case: Hurricane Charley (2004)
    ├── Local Codes and Construction Practices
    ├── Building Performance
    ├── Infrastructure Performance
    │   ├── Power Outages & Restoration
    │   │   └── Power Case Study: Solar Panels
    │   ├── Transportation Disruptions & Restoration
    │   │   └── Transportation Case Study: Sanibel Causeway
    │   └── Telecommunications
    ├── Coastal Protectiv