### Full solution, receiving text blocks as array, outputting JSON

In [131]:
import pdftotext
import os
import re

# Example of converting a PDF file into an array of cleaned page texts
pdf_docs_path = os.path.join("PDF")
one_pdf_path = os.path.join(pdf_docs_path,"protect-your-home-from-wildfire.pdf")

with open(one_pdf_path, "rb") as f:
    pdf = pdftotext.PDF(f)
    
textArray = []
for page in pdf:
    docText = re.sub(r"[^a-zA-Z0-9:.,!?%$@]+", ' ', page).strip()
    textArray.append(docText)

textArray now holds one cleaned text string per page. It is the object to be posted in a single call that returns a JSON object with one list item per text block, including the disaster type and the extracted actions.
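
As a minimal sketch of what such a call might carry, the page array can be serialized with the standard json module. The sample pages and the key name "textBlocks" are assumptions for illustration only; the notebook does not specify the request format.

```python
import json
import re

# Hypothetical sample pages standing in for the pdftotext output
sample_pages = ["Protect your home\nfrom wildfire!", "Keep   grass mowed & watered."]

# Same cleaning step as above: each run of disallowed characters becomes one space
text_array = [
    re.sub(r"[^a-zA-Z0-9:.,!?%$@]+", " ", page).strip()
    for page in sample_pages
]

# "textBlocks" is an assumed key name, not one defined by the notebook
payload = json.dumps({"textBlocks": text_array})
print(payload)
```

Note that the character class also drops apostrophes, which is why possessives and contractions appear split in the outputs further down.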

### All NLP and other dependencies for Lambda Functions

In [132]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from collections import Counter
import string
import json
nlp = spacy.load("en_core_web_sm")

### Action extraction function

Three rules determine whether a sentence is imperative:
1. The sentence begins with a verb.
<br>Ex: Consider a sanitary wastewater backflow preventer valve to reduce the risk of sewage backup into your basement.<br>
    Caveat: if the opening verb is a gerund, the sentence must contain another verb. This filters out title-like sentences such as "ADAPTING TO CLIMATE CHANGE IN COASTAL COMMUNITIES OF THE ATLANTIC PROVINCES, CANADA".
<br><br>
2. The sentence has the structure Always + verb / Never + verb / Please + verb / Don't + verb / not + verb.
<br>Ex: Never pour kitchen grease, fats or oils into your house drains.
<br><br>
3. The sentence has multiple clauses and one of the clauses begins with a verb.
<br>Ex: If you live alone, develop a plan for yourself with links to neighbours and friends.<br>

Before these rules are applied, interrogative sentences must be ruled out.
<br>Ex: Considering a new house?<br>
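
The rules above can be sketched without spaCy on hand-tagged input. This is a simplified illustration, not the notebook's detector: the function name is_imperative is hypothetical, the Penn Treebank tags are assigned by hand, and rule 3 is approximated by looking for a base-form verb right after a comma, which is an assumption.

```python
SPECIAL_WORDS = {"always", "never", "please", "don't", "not"}

def is_imperative(tagged):
    """tagged: list of (word, Penn Treebank tag) pairs for one sentence."""
    if not tagged or tagged[-1][0] == "?":
        return False  # interrogative sentences are ruled out first
    tags = [tag for _, tag in tagged]
    # Rule 1: the sentence starts with a base-form verb
    if tags[0] == "VB":
        return True
    # Caveat: a gerund opener needs another base-form verb somewhere
    if tags[0] == "VBG":
        return "VB" in tags
    # Rule 2: a special word directly followed by a base-form verb
    for (word, _), next_tag in zip(tagged, tags[1:]):
        if word.lower() in SPECIAL_WORDS and next_tag == "VB":
            return True
    # Rule 3 (approximation): a clause after a comma starts with a base-form verb
    for (word, _), next_tag in zip(tagged, tags[1:]):
        if word == "," and next_tag == "VB":
            return True
    return False

# Hand-tagged versions of the examples above
never_pour = [("Never", "RB"), ("pour", "VB"), ("grease", "NN"), (".", ".")]
question = [("Considering", "VBG"), ("a", "DT"), ("new", "JJ"), ("house", "NN"), ("?", ".")]
if_alone = [("If", "IN"), ("you", "PRP"), ("live", "VBP"), ("alone", "RB"),
            (",", ","), ("develop", "VB"), ("a", "DT"), ("plan", "NN"), (".", ".")]
title = [("ADAPTING", "VBG"), ("TO", "IN"), ("CLIMATE", "NN"), ("CHANGE", "NN")]

print(is_imperative(never_pour), is_imperative(question),
      is_imperative(if_alone), is_imperative(title))
```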


In [151]:
# Defining the imperative sentence detector for action extraction as well as geopolitical entities (GPE)
def impSentenceExtractor(someText):
    
    doc = nlp(someText)
    impSentList=[]
    gpeList=[]
    specialWords = ("always","never","please","don't","not")
    # Just a list of words not to be confused with places
    gpeStopWords = ("Wildfire", "Flood", "Flooding", "Hurricane", "Store", "Driveway", "Building", "Uncut", "Rbon", "Complementary")
    # Extract sentences from block of text
    for sentence in doc.sents:

        # Skip interrogative sentences and fragments of one or two tokens
        if sentence[-1].text == "?" or len(sentence) <= 2:
            continue

        # Deduplicate on the capitalized form, since that is what gets stored
        candidate = sentence.text.capitalize()
        if candidate in impSentList:
            continue

        firstTag = sentence[0].tag_
        if firstTag == 'VB':
            # Rule 1
            isImperative = True
        elif firstTag == 'VBG':
            # Caveat to rule 1: a gerund opener needs another base-form verb
            isImperative = any(token.tag_ == 'VB' for token in sentence)
        else:
            # Rule 2: a special word directly followed by a base-form verb
            # (rule 3 has no separate clause check in this function)
            isImperative = False
            for token in sentence:
                if token.lower_ in specialWords:
                    n = token.i - sentence.start  # position of the special word in the sentence
                    # Guard against a special word that is the last token
                    if n + 1 < len(sentence) and sentence[n+1].pos_ == 'VERB' and sentence[n+1].tag_ == "VB":
                        isImperative = True
                        break

        if isImperative:
            impSentList.append(candidate)

    # Geopolitical entity extraction
    for ent in doc.ents:
        if ent.label_ == "GPE":
            # Append it if not already there:
            if not(ent.text in gpeList) and not (ent.text in gpeStopWords):
                gpeList.append(ent.text)
    return impSentList, gpeList

In [152]:
#Test
someText="Burn everything thoroughly in Canada in Rbon. Burn barrels and Re pits. adapting to climate change is good. close the doors. always close the doors. Always close the doors. Acknowledging how you feel can help you manage stress. In Hurricane."
impSentenceExtractor(someText)

(['Burn everything thoroughly in canada in rbon.',
  'Close the doors.',
  'Always close the doors.',
  'Acknowledging how you feel can help you manage stress.'],
 ['Canada'])

### Frequent words extractor function

In [135]:
# Defining a function for frequent word extraction, returning a simple string:
def frequentClimateWordsExtractor(text):

    # Dictionary of relevant words
    dictionary = ["snow","change","climate","heatwave","adaptation","tornado","water","icestorm","risk","impact","level","community","land","management","planning","development","http","plan","infrastructure","sea","event","action","vulnerability","flood","assessment","storm","temperature", "low","rise","resource","weather","strategy","damage","effect","precipitation","hazard","ice","protection","home","flooding","erosion","environment","emission","al","winter","heat","forest","wind","mitigation","emergency","coast","shoreline","greenhouse","elevation","carbon","wave","dike","wetland","disaster","conservation","reduction","fire","rain","drainage","ground","power","stormwater","roof","rainfall","extreme","wildfire","reference","vegetation","threat","drought","disease","coastline","sewer","nature","neutral","neutrality"]

    # Loads text with linguistic annotations from spaCy
    my_doc = nlp(text)

    filteredDoc = []
    filteredList = []
    
    # Collect the lowercased nouns and proper nouns that are not stop words;
    # they are filtered against the dictionary after counting
    for sentence in my_doc.sents:
        for word in sentence:
            if not(word.is_stop) and (word.pos_=='NOUN' or word.pos_=='PROPN'):
                filteredDoc.append(word.text.lower())

    nounsFreqDistribution = Counter(filteredDoc)
    
    listOfWords=""
    for word in nounsFreqDistribution.most_common(300):
        if word[0] in dictionary:
            listOfWords = listOfWords + word[0] + ", "
            filteredList.append(word[0])
    return listOfWords, filteredList
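
The counting and filtering step can be tried in isolation, without spaCy. The sketch below assumes the nouns have already been extracted and lowercased; the sample words and the three-term stand-in for the dictionary are hypothetical.

```python
from collections import Counter

# Hypothetical nouns, as if already extracted and lowercased
nouns = ["fire", "home", "fire", "tree", "roof", "home", "fire"]
climate_terms = {"fire", "home", "roof"}  # small stand-in for the full dictionary

counts = Counter(nouns)
# most_common() orders by descending count; keep only the dictionary terms
filtered = [word for word, _ in counts.most_common(300) if word in climate_terms]
print(filtered)
```

A set is used for the stand-in dictionary here because membership tests on a set do not scan the whole collection, unlike a list.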

### Disaster Classifier (rules-based)

In [136]:
# Trying a simple rule-based approach to disaster classification
def disasterType(key_arr):
    
    # Undefined to start with
    # disaster_class = "Undefined"
    numDetected = 0
    disaster_class = []
    
    if "carbon" in key_arr and ("neutral" in key_arr or "neutrality" in key_arr):
        disaster_class.append("Carbon Neutrality")
        numDetected+=1
        
    if "adaptation" in key_arr and ("change" in key_arr or "plan" in key_arr):
        disaster_class.append("Climate Change Adaptation")
        numDetected+=1
        
    if "drought" in key_arr:
        disaster_class.append("Drought")
        numDetected+=1
        
    if "flood" in key_arr or "flooding" in key_arr or "rainfall" in key_arr or "stormwater" in key_arr or ("sea" in key_arr and "level" in key_arr and "rise" in key_arr):
        disaster_class.append("Flooding")
        numDetected+=1
        
    if ("heat" in key_arr and "extreme" in key_arr) or "heatwave" in key_arr:
        disaster_class.append("Heatwave")
        numDetected+=1
        
    if "mitigation" in key_arr:
        disaster_class.append("Mitigation")
        numDetected+=1
        
    if "wind" in key_arr or "tornado" in key_arr:
        disaster_class.append("Severe Wind")
        numDetected+=1
        
    if "snow" in key_arr or "snowstorm" in key_arr:
        disaster_class.append("Snowstorm")
        numDetected+=1

    if "temperature" in key_arr and "low" in key_arr:
        disaster_class.append("Low Temperatures")
        numDetected+=1


    if "fire" in key_arr or "wildfire" in key_arr:
        disaster_class.append("Wildfire")
        numDetected+=1
        
    if numDetected==0:
        disaster_class.append("Undefined")
        
    if "http" in key_arr or "al" in key_arr or "reference" in key_arr:
        disaster_class = ["References"]
    
    return disaster_class
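
One possible way to keep such rules easier to extend is to store them as data. The following is only a sketch of that idea under stated assumptions: the names RULES and classify are hypothetical, only three of the rules above are reproduced, and the "References" override is left out.

```python
# Each rule pairs a label with a predicate over the set of keywords.
# Only three of the rules above are reproduced here, for brevity.
RULES = [
    ("Drought", lambda k: "drought" in k),
    ("Heatwave", lambda k: ("heat" in k and "extreme" in k) or "heatwave" in k),
    ("Wildfire", lambda k: "fire" in k or "wildfire" in k),
]

def classify(keywords):
    keys = set(keywords)
    labels = [label for label, matches in RULES if matches(keys)]
    # Fall back to "Undefined" when no rule fires
    return labels or ["Undefined"]

print(classify(["fire", "home"]))
print(classify(["heat", "extreme", "drought"]))
print(classify(["roof"]))
```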

### Core function execution, processing textArray

In [137]:
# Given textArray, loop through its text items
def mainProcessor(arrayOfText):
    returnBody = []

    # Blocks are numbered from 1, in the order they appear in the array
    for i, textBlock in enumerate(arrayOfText, start=1):
        impSents, places = impSentenceExtractor(textBlock)
        keywords, wordList = frequentClimateWordsExtractor(textBlock)
        # Only blocks with at least one extracted action are returned
        if len(impSents) > 0:
            returnBody.append({
                'block': i,
                'class': disasterType(wordList),
                'places': places,
                'actions': impSents,
            })

    return returnBody
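
The returned list contains only dicts, lists, integers and strings, so it serializes directly with json.dumps. The sketch below wraps it in a response shaped like an API Gateway proxy response; that deployment detail is an assumption, since the notebook only mentions Lambda functions, and the sample result is hypothetical.

```python
import json

# Hypothetical result with the same keys that mainProcessor builds
result = [
    {
        "block": 4,
        "class": ["Wildfire"],
        "places": [],
        "actions": ["Keep your grass mowed and watered."],
    }
]

# Assumed response shape: a status code plus the JSON text as the body
response = {"statusCode": 200, "body": json.dumps(result)}
print(response["body"])
```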

In [138]:
mainProcessor(textArray)

[{'block': 4,
  'class': ['Wildfire'],
  'places': [],
  'actions': ['Make sure you have adequate insurance on your home and property.',
   'Let s look at three areas where you can apply firesmart principles to protect or reduce the damage to your property from a wildfire.',
   'Remove flammable trees and shrubs, such as pine, spruce and juniper.',
   'Keep your grass mowed and watered.']},
 {'block': 5,
  'class': ['Wildfire'],
  'places': [],
  'actions': ['Remove any trees and debris that would support the rapid spread of a wildfire.',
   'Make sure to thin or space trees so that the crowns tops of individual trees are at least 3 to 6 metres apart.',
   'Remove tree branches up to 2 metres from the ground.',
   'In this zone the objective is not to remove all combustible fuels from the forest, but to thin the area so fires will be low intensity and more easily low stand density where tree crowns do not touch extinguished.',
   'Thin or reduce the shrubs and trees that make up the un

In [144]:
textArray

['Part of the FireSmart Protect your home from Protection Plan Wildfire Designed for safer living is a program endorsed by Canada s insurers to promote disaster resilient homes.',
 'for Catastrophic Loss Reduction The Institute for Catastrophic Loss Reduction ICLR , established in 1997, is a world class centre for multidisciplinary disaster prevention research and communication. ICLR is an independent, not for profit research institute founded by the insurance industry and affiliated with the University of Western Ontario. The Institute s mission is to reduce the loss of life and property caused by severe weather and earthquakes through the identification and support of sustained actions that improve society s capacity to adapt to, anticipate, mitigate, withstand and recover from natural disasters. ICLR s mandate is to confront the alarming increase in disaster losses caused by natural disasters and to work to reduce disaster deaths, injuries and property damage. Disaster damage has be