# Cherry-picked examples by label of interest

- Goal: identify expectations for AI-assisted annotation of *similar* documents
- Question: what level of detail do we expect from labels (with the caveat that any annotation method will be flexible enough to allow for "discovery" of new labels)?

In [1]:
# load packages
import pandas as pd
import numpy as np
import string
import re
import random

In [2]:
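# seed both RNGs: pandas' .sample() draws from NumPy's global state, not Python's `random`
random.seed(11)
np.random.seed(11)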

In [3]:
# path to RAS corpus
path_ras = '../data/processed/corpus_guidancev4.csv'

In [4]:
df = pd.read_csv(path_ras)
# drop rows with NaN snippet
df = df[df['snippet'].notna()]
df.head(5)

| | unique_id | snippet | tag_0 | tag_1 | tag_2 | source | group |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Appendix B: Federal Laws and Executive Orders... | IMPLEMENTATION AND MONITORING | Plan Compliance and Integration | | A5 - NOAA Coastal Management 2010 adaptationguide | Climate Adaptation |
| 1 | 1 | Appendix C: Regional Climate Change Summaries | FACT BASE | Fact Base (general) | Hazards | A5 - NOAA Coastal Management 2010 adaptationguide | Climate Adaptation |
| 2 | 2 | Adaptation: The adjustment of natural or human... | GUIDE PURPOSE AND OBJECTIVES | Term definition | | A8 - CA 2012 Adapting to Sea Level Rise | Climate Adaptation |
| 3 | 3 | Adaptive capacity: A communitys ability to re-... | GUIDE PURPOSE AND OBJECTIVES | Term definition | | A8 - CA 2012 Adapting to Sea Level Rise | Climate Adaptation |
| 4 | 4 | Resilience: The ability of an entity or system... | GUIDE PURPOSE AND OBJECTIVES | Term definition | | A8 - CA 2012 Adapting to Sea Level Rise | Climate Adaptation |


In [5]:
# count tag occurrence in each tag level
print('Tag occurrence by level:\n')
print('tag_0:\n', df['tag_0'].value_counts(), '\n')
print('tag_1:\n', df['tag_1'].value_counts(), '\n')
print('tag_2:\n', df['tag_2'].value_counts(), '\n')

Tag occurrence by level:

tag_0:
 STRATEGY IDENTIFICATION                    711
COMMUNITY GOAL SETTING AND COORDINATION    624
FACT BASE                                  400
ANALYSIS METHODS                           315
IMPLEMENTATION AND MONITORING              269
GUIDE PURPOSE AND OBJECTIVES               160
Name: tag_0, dtype: int64 

tag_1:
 Strategy Action (general)            585
Fact Base (general)                  400
Stakeholder Involvement (general)    332
Analysis Methods                     315
Planning Team (general)              168
Implementation                       138
Strategy Selection                   126
Community Goals Identification       124
Evaluation and Monitoring             87
Guide Purpose Statement               69
Term definition                       49
Plan Compliance and Integration       44
Guide Outcomes                        39
Figures                                3
Name: tag_1, dtype: int64 

tag_2:
 Specific models or methods            

In [6]:
n_snippets = 1

## Exercise

- Focus on guidance document labels about *strategies*
- For each strategy label, pick a random chunk
- Consider what level of detail is appropriate: existing label, or something more refined?
- Some ideas for potentially more refined labels are proposed
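
The "pick a random chunk per label" step can be wrapped in a small helper. This is only a sketch, not part of the original analysis: `sample_snippets` and its `random_state` parameter are hypothetical names, and the toy frame uses made-up rows that merely reuse the corpus column names (`snippet`, `tag_1`, `tag_2`). Passing `random_state` makes each draw reproducible without relying on a global seed.

```python
import pandas as pd


def sample_snippets(df, label, column="tag_2", n=1, random_state=None):
    """Return up to `n` randomly chosen snippets whose `column` equals `label`."""
    # Select the snippet column for the matching rows in a single .loc call.
    matches = df.loc[df[column] == label, "snippet"]
    if matches.empty:
        return []
    # Cap n so .sample() does not raise when a label has few snippets.
    n = min(n, len(matches))
    return matches.sample(n=n, random_state=random_state).astype(str).tolist()


# Toy frame with the same column names as the corpus (values are made up).
toy = pd.DataFrame({
    "snippet": ["grants for marsh restoration", "narrow streets", "buffer ordinances"],
    "tag_1": ["Strategy Action (general)"] * 3,
    "tag_2": ["Financing and Funding", "Physical Infrastructure",
              "Regulatory policy and legislation"],
})
print(sample_snippets(toy, "Physical Infrastructure", random_state=11))
```

With the real corpus, a call such as `sample_snippets(df, 'Financing and Funding', n=n_snippets)` would stand in for the longer `df.loc[...]` expressions used in the cells that follow.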

## Strategy: Financing and Funding

- Financing / Grants
- Infrastructure
- Public-private partnerships

In [7]:
df.loc[df['tag_2'] == 'Financing and Funding']['snippet'].sample(n_snippets).apply(str).tolist()

['NATIONAL COASTAL RESILIENCE FUND DESCRIPTION PROJECTS The National Coastal Resilience Fund, a pub-lic-private  partnership between the National Fish and Wildlife  Foundation, NOAA, Shell, and TransRe, provides grants to  support natural infrastructure. Established in 2018, the  Fund invests in projects that protect coastal com-munities  from extreme storm and flood events while enhancing  natural habitat.  Community capacity-building and planning, engineering,  design, and construction projects such as living shoreline,  floodplain-habitat restoration design, marsh and wetland  habitat restoration, and natural channel design.']

## Strategy: Capacity, Management, and Planning

- Building codes and standards
- Development
- Land use

In [8]:
df.loc[df['tag_2'] == 'Capacity, Management, and Planning']['snippet'].sample(n_snippets).apply(str).tolist()

['that only mimic the NFIPs minimum requirements  for construction in the currently mapped flood  hazard areas may not be enough to protect people  and property from future or present day storms.9  To  provide a higher level of protection and better prepare  coastal communities for climate change, state and  local governments are encouraged to adopt codes that  consider sea level rise and include higher regulatory  standards. Examples include:  Adding freeboard (an additional height  requirement above the BFE)10,11  Applying V-Zone requirements to the Coastal  A-Zone (area landward of V-Zone that is still  subject to storm surge and damaging waves  (1.5-3 feet)) or the entire Special Flood Hazard  Area  Applying codes outside of the Special Flood  Hazard Area (e.g., to a point landward of the  limit of the Special Flood Hazard Area where  the ground elevation is equal to that of the  adjacent A-Zones BFE plus freeboard)  Applying codes to all structures undergoing  improvements and rep

## Strategy: Physical Infrastructure

- Infrastructure
- Transportation network
- Co-benefits (heat and runoff)

In [9]:
df.loc[df['tag_2'] == 'Physical Infrastructure']['snippet'].sample(n_snippets).apply(str).tolist()

['Constructing narrow streets  Results in less heat-holding  asphalt and concrete  Yields less runoff']

## Strategy: Reduce Environmental Impacts

In [10]:
df.loc[df['tag_2'] == 'Reduce Environmental Impacts']['snippet'].sample(n_snippets).apply(str).tolist()

['biologs, matting), and sand fill or a hybrid approach  combining vegetative planting with low rock sills or  footers, living breakwaters (e.g., oysters), or other  shore protection structures designed to keep sediment  in place or reduce wave energy. The techniques and  materials used will depend on site-specific needs and  characteristics. There are a number of benefits to  living shorelines. Specifically, they:  Maintain natural shoreline dynamics and sand  movement   Trap sand to rebuild eroded shorelines or  maintain the current shoreline   Provide important shoreline habitat   Reduce wave energy and coastal erosion  Absorb storm surge and flood waters   Filter nutrients and pollutants from the  water   Maintain beach and intertidal areas that  offer public access   Are aesthetically pleasing  Allow for landward migration as sea levels  rise  Absorb atmospheric carbon dioxide  Are less costly than shore protection  structures  In some states, the regulatory framework makes  it mo

## Strategy: Regulatory policy and legislation

In [11]:
df.loc[df['tag_2'] == 'Regulatory policy and legislation']['snippet'].sample(n_snippets).apply(str).tolist()

['Buffer ordinances. Buffer design  requirements (width, vegetation, maintenance). Stormwater  credit']

## Strategy: Education and behavior change

In [12]:
df.loc[df['tag_2'] == 'Education and behavior change']['snippet'].sample(n_snippets).apply(str).tolist()

['Education and  Awareness  Programs These are actions to inform and educate citizens,  elected officials, and property owners about  hazards and potential ways to mitigate them.  These actions may also include participation  in national programs, such as StormReady1 or Firewise2 Communities. Although this  type of mitigation reduces risk less directly  than structural projects or regulation, it is an  important foundation. A greater understanding  and awareness of hazards and risk among local  officials, stakeholders, and the public is more  likely to lead to direct actions.  Radio or television spots  Websites with maps and information  Real estate disclosure  Presentations to school groups or neighborhood organizations  Mailings to residents in hazard-prone areas.  StormReady   Firewise Communities']

## Strategy: Strategy-Other

In [13]:
df.loc[df['tag_2'] == 'Strategy-Other']['snippet'].sample(n_snippets).apply(str).tolist()

['E.2 Low Emitting Materials Objective: To reduce the number of indoor air contaminants that could be irritating, harmful, or  odorous to building occupants. Rationale: Low-emitting materials release fewer Volatile Organic Compounds (VOCs) and other  harmful chemicals into the air. These chemicals are found in higher concentrations  indoors and can cause a variety of different health problems, such as eye, nose, and throat  irritation; headaches; and kidney and liver damage.']

## Bonus: Goals

In [14]:
df.loc[df['tag_1'] == 'Community Goals Identification']['snippet'].sample(n_snippets).apply(str).tolist()

['  An understanding of why trust is so important, and how to build it   Tools for effective storytelling   A Stakeholder Map that includes your project team, advisory group, leadership and  decision makers, interest groups, and the broader community   An Engagement and Outreach Plan that identifies goals, target audiences, key  messages, tools for outreach, strategies for outreach, and an implementation plan']

## Bonus 2: Truly Cherry-picked Examples

These are chunks from three separate guidance documents, one from each of R-A-S, that show similar policies/actions being used. I cherry-picked these a while ago for illustrative purposes.

It's worth noting that these chunks, like most of the strategy chunks, tend to present a set of possible actions available to a community. The chunk text might therefore provide valuable information when coupled with the R-A-S label, since we can then examine overlap across R-A-S with respect to the language in these chunks.
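
One simple way to put a number on "overlap in language" between two chunks is Jaccard similarity over their word sets. The sketch below is an assumption about how such a comparison could be done, not a method used in this notebook; `tokens` and `jaccard` are hypothetical helper names, and the two example strings are abbreviated from the chunks shown further down.

```python
import re


def tokens(text):
    """Lowercase a chunk and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two chunks' token sets (0 = disjoint, 1 = identical)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Abbreviated phrases from two of the cherry-picked chunks.
chunk_a = "Zoning incentives can allow a greater height or density"
chunk_b = "Zoning can be used to regulate density of development"
print(round(jaccard(chunk_a, chunk_b), 3))
```

Word-set overlap ignores word order and synonyms, so it is best read as a rough first signal rather than a measure of whether two chunks describe the same policy.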

In [15]:
snippet = 'Consider incentives to encourage development in more desirable locations'
# literal substring match (str.contains treats the pattern as a regex by default)
df.loc[df['snippet'].str.contains(snippet, regex=False), ['group', 'tag_0', 'tag_1', 'tag_2', 'snippet']]

| | group | tag_0 | tag_1 | tag_2 | snippet |
|---|---|---|---|---|---|
| 219 | Climate Adaptation | STRATEGY IDENTIFICATION | Strategy Action (general) | Regulatory policy and legislation | Incorporate projected climate impacts into po... |
| 230 | Climate Adaptation | STRATEGY IDENTIFICATION | Strategy Action (general) | Regulatory policy and legislation | Consider incentives to encourage development i... |


In [16]:
# take the second match (row 230), whose text begins with the search phrase
df.loc[df['snippet'].str.contains(snippet, regex=False), 'snippet'].apply(str).tolist()[1]

'Consider incentives to encourage development in more desirable locationsplaces that are both less vulnerable to climate-related impacts and well-connected to existing development, infrastructure, and transportation options. Incentives can be financial, such as tax credits for protecting natural resources, or procedural, such as height or floor area bonuses for designs that are more resilient to hazards.  Incorporate measures into hazard mitigation and other plans to rebuild in stronger and more resilient ways should a disaster occur. Identifying and assessing hazards as they relate to infrastructure, capital improvement, and/or transportation improvement plans lets a community prioritize projects and pursue funding from state and federal sources. Communities that upgrade local plans are also in a better position to request postdisaster assistance when the next disaster occurs because they already have projects identified and know where and how to direct rebuilding. Communities can ali

In [17]:
snippet = 'Communities can update their land use'
df.loc[df['snippet'].str.contains(snippet, regex=False), ['group', 'tag_0', 'tag_1', 'tag_2', 'snippet']]

| | group | tag_0 | tag_1 | tag_2 | snippet |
|---|---|---|---|---|---|
| 1428 | Resilience and Hazard Mitigation | STRATEGY IDENTIFICATION | Strategy Action (general) | Regulatory policy and legislation | Development or Redevelopment Incentives Commun... |


In [18]:
df.loc[df['snippet'].str.contains(snippet, regex=False), 'snippet'].apply(str).tolist()

['Development or Redevelopment Incentives Communities can update their land use, zoning, or other  local regulations to provide incentives for using nature-based  solutions. Zoning incentives can allow a greater height,  density, or intensity of development if a developer uses  nature-based approaches. One common zoning incentive  is an increased floor-to-area ratio (FAR), which regulates  the density of development on a site. The City of Portland,  Oregon offers increased FAR as an incentive for installing  green roofs. Communities can also exempt green roofs  or pervious pavements from any regulations that apply to  impervious cover.  More incentives for adopting nature-based solutions  approaches can be used in the development application  and review period. These include discounted application  fees and discounted or waived maintenance bonding  requirements. The City of Chicago, Illinois waives permit fees  for developments that meet specific nature-based solutions  thresholds. For

In [19]:
snippet = 'Zoning can be used to regulate parcel use'
df.loc[df['snippet'].str.contains(snippet, regex=False), ['group', 'tag_0', 'tag_1', 'tag_2', 'snippet']]

| | group | tag_0 | tag_1 | tag_2 | snippet |
|---|---|---|---|---|---|
| 550 | Climate Adaptation | STRATEGY IDENTIFICATION | Strategy Action (general) | Regulatory policy and legislation | Zoning Intended to create a healthy, safe, and... |


In [20]:
df.loc[df['snippet'].str.contains(snippet, regex=False), 'snippet'].apply(str).tolist()

['Zoning Intended to create a healthy, safe, and orderly  community while balancing a diversity of interests,  ideally as envisioned by a comprehensive plan, zoning is  one of the most commonly used methods of regulating  land use. A number of the measures discussed in this  guide can be implemented through zoning.  Zoning can be used to regulate parcel use, density of  development, building dimensions, setbacks, impervious  surfaces, type of construction (e.g., easily movable),  shore protection structures, landscaping, etc. It can also  be used to regulate where development can and cannot  take place, making it an invaluable tool in efforts to  protect natural resources and environmentally sensitive  areas and guide development away from hazard-prone  areas. Permissible uses and standards vary by zoning  district. Types of districts include general use districts;  overlay districts, where provisions in addition to those  on the underlying districts apply; and special use  districts, 