# Part 1: Endangered Species

In [8]:
import pandas as pd
import openai
import json
import pymupdf4llm
import pymupdf

In [9]:
species = pd.read_csv("sources/species.csv")

In [10]:
species

Unnamed: 0,Scientific Name,Common Name,Species Group,Federal Listing Status,Where Listed
0,Acanthorutilus handlirschi,Cicek (minnow),Fishes,Endangered,Wherever found
1,Accipiter fasciatus natalis,Christmas Island goshawk,Birds,Endangered,Wherever found
2,Accipiter francesii pusillus,Anjouan Island sparrowhawk,Birds,Endangered,Wherever found
3,Accipiter gentilis laingi,Queen Charlotte goshawk,Birds,Threatened,"British Columbia, Canada"
4,Accipiter striatus venator,Puerto Rican sharp-shinned hawk,Birds,Endangered,Wherever found
...,...,...,...,...,...
1500,Zapus hudsonius preblei,Preble's meadow jumping mouse,Mammals,Threatened,wherever found
1501,Zosterops albogularis,Norfolk Island white-eye,Birds,Endangered,Wherever found
1502,Zosterops modesta,Seychelles white-eye,Birds,Endangered,Wherever found
1503,Zosterops rotensis,Rota bridled white-eye,Birds,Endangered,Wherever found


In [11]:
client = openai.Client()

* Does the EIS reference a Section 7 consultation?
* Does it include a Biological Assessment?
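
Before sending a chunk to the model, a cheap keyword screen can show whether either criterion appears in the text at all. This is a minimal sketch and not part of the original notebook: `screen_esa_terms` and `ESA_PATTERNS` are hypothetical names, and the patterns are assumptions about how these terms are usually written.

```python
import re

# Hypothetical patterns for the two criteria noted above.
ESA_PATTERNS = {
    "section_7_consultation": re.compile(r"section\s+7\b", re.IGNORECASE),
    "biological_assessment": re.compile(r"biological\s+assessment", re.IGNORECASE),
}


def screen_esa_terms(text):
    """Return, for each criterion, whether its pattern occurs in the text."""
    return {name: bool(pattern.search(text)) for name, pattern in ESA_PATTERNS.items()}
```

A screen like this only reports that a term is present; judging whether the consultation or assessment is adequate is still left to the model or a reviewer.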

In [40]:
def get_species_comments(doc, species):

    ENDANGERED_SPECIES_ANALYSIS_PROMPT = f"""
You are an expert legal analyst tasked with analyzing an Environmental Impact Statement (EIS) for a proposed development project.

Here is the complete list of current endangered species in the United States:

```csv
{species.to_csv()}
```

Carefully read the document - the EIS may mention one or more of these endangered species. Identify ALL species from the list above mentioned in the EIS, and cite sections of the text that may pose increased risk or scrutiny for the project, specifically with reference to the following criteria:

* Whether it has a reference to a section 7 consultation
* Has a "Biological Assessment"
* Additionally, if any other related species could be impacted by the proposed development, flag it for review, suggest that the developer analyze the impact, and cite the specific details of that species from the table above.

Respond with the following format:

```json
{{
    "comments" : [
        {{
            "quote": "Exact text quote from the document",
            "comment": "Explanation for how the quoted text could introduce regulatory burden/risk, and what the developer might need to consider to mitigate this risk."
        }},
        ...
    ]
}}
```

If no relevant text is found, return an empty list.""".strip()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": ENDANGERED_SPECIES_ANALYSIS_PROMPT},
            {"role": "user", "content": doc},
        ],
        response_format={"type": "json_object"}
    )

    resp_obj = json.loads(resp.choices[0].message.content)

    return resp_obj
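
The function above trusts the model to return the requested shape. A minimal sketch of a defensive parser follows, assuming only the `comments`/`quote`/`comment` layout requested in the prompt; `parse_comments` is a hypothetical helper, not something the notebook defines.

```python
import json


def parse_comments(raw):
    """Parse a JSON string and keep only well-formed comment entries."""
    try:
        obj = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON, or no content at all (raw is None).
        return []
    if not isinstance(obj, dict):
        return []
    comments = obj.get("comments", [])
    if not isinstance(comments, list):
        return []
    return [
        c
        for c in comments
        if isinstance(c, dict)
        and isinstance(c.get("quote"), str)
        and isinstance(c.get("comment"), str)
    ]
```

It could be applied to `resp.choices[0].message.content` in place of the bare `json.loads` call, so that a malformed reply yields an empty list rather than an exception.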

In [13]:
# Step 1: Convert the PDF to markdown, PAGES_PER_CHUNK pages at a time
PAGES_PER_CHUNK = 50
page_count = pymupdf.open("sample.pdf").page_count
doc_chunks = [
    # Clamp the last range so it never runs past the final page
    pymupdf4llm.to_markdown("sample.pdf", pages=range(i, min(i + PAGES_PER_CHUNK, page_count)))
    for i in range(0, page_count, PAGES_PER_CHUNK)
]
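
To see which pages land in each chunk, the page ranges can be computed separately from the conversion. This is a minimal sketch: `chunk_page_ranges` is a hypothetical helper, and it clamps the final range to the page count so the last chunk never asks for pages that do not exist.

```python
def chunk_page_ranges(page_count, pages_per_chunk):
    """Split 0-based page numbers into consecutive ranges of at most pages_per_chunk."""
    return [
        range(start, min(start + pages_per_chunk, page_count))
        for start in range(0, page_count, pages_per_chunk)
    ]
```

Each returned range could be passed as the `pages` argument in the comprehension above.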

In [46]:
species_comments = get_species_comments(doc_chunks[0], species)

In [47]:
species_comments

{'comments': [{'quote': 'The Project site does not support suitable habitat for Bi-State sage-grouse and it is not located within any mapped habitat identified in the 2016 Record of Decision and Land Use Plan Amendment for the Nevada and California Greater Sage Grouse Bi-State Distinct Population Segment in the Carson City and Tonopah Field Office... The USFWS raised concerns regarding yellow-billed cuckoo, a federally listed endangered species. This species has not been documented in or near the Project area during recent surveys, nor in the past. The species could, theoretically, migrate along the Walker River corridor. A single crossing of the gen-tie occurs over the Walker River. As such, this species is addressed in the Draft EIS to identify the means of avoidance.',
   'comment': 'The text indicates a Section 7 consultation may be triggered due to the potential impact on the Yellow-billed Cuckoo, a listed endangered species, as well as concerns about the Bi-State sage-grouse. The

In [45]:
find_comments_in_doc(doc_chunks[0], species_comments['comments'])

[{'quote': {'start': 28969,
   'end': 30440,
   'text': 'General wildlife; special status species; and threatened, endangered, and candidate species|Commenters raised questions about potential impacts to big game species, small mammals, and migratory birds, as well as loss of habitat and the loss of movement corridors through the solar site. Several commenters raised questions about potential impacts to Bi-State sage grouse, a special status species under consideration for listing as endangered or threatened under the Endangered Species Act (ESA). The Project site does not support suitable habitat for Bi-State sage-grouse and it is not located within any mapped habitat identified in the 2016 Record of Decision and Land Use Plan Amendment for the Nevada and California Greater Sage Grouse Bi-State Distinct Population Segment in the Carson City and Tonopah Field Office (2016 Bi-State Sage Grouse Plan Amendment) (BLM 2016). This species is addressed in the Draft EIS, given questions and co

In [14]:
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": ENDANGERED_SPECIES_ANALYSIS_PROMPT},
        {"role": "user", "content": doc_chunks[0]},
    ],
)

* Yellow-billed cuckoo (Coccyzus americanus)
* Bi-State sage-grouse (Centrocercus urophasianus, Bi-State Distinct Population Segment)

In [15]:
print(resp.choices[0].message.content)

### Chapter 3 Affected Environment and Environmental Impacts

#### 3.8 General Wildlife; Special Status Wildlife Species; and Threatened, Endangered, Proposed, and Candidate Species

**3.8.1 Introduction**

This section addresses potential impacts to wildlife habitat and wildlife species subject to federal, state, and local regulations and policies, including species listed as threatened or endangered under the Endangered Species Act (ESA). The analysis includes consideration of potential loss of habitat, fragmentation, and displacement of species due to the proposed development of the Libra Solar Project.

**3.8.2 Special Status Wildlife Species**

The U.S. Fish and Wildlife Service (USFWS), Nevada Department of Wildlife (NDOW), and other agencies identified several special status wildlife species with the potential to be affected by the proposed project. Table 3.8-1 provides a list of the species discussed in this report.

**Table 3.8-1** **Special Status Wildlife Species Potentially

# National Historic Sites

In [18]:
sites = pd.read_excel("sources/sites.xlsx")

In [19]:
sites.columns

Index(['Ref#', 'Property Name', 'State', 'County', 'City ', 'Street & Number',
       'Status', 'Request Type', 'Status Date', 'Restricted Address',
       'Area of Significance', 'Category of Property', 'External Link',
       'Level of Significance - International',
       'Level of Significance - Local', 'Level of Significance - National',
       'Level of Significance - Not Indicated',
       'Level of Significance - State', 'Listed Date',
       'Name of Multiple Property Listing', 'NHL Designated Date',
       'Other Names', 'Park Name', 'Property ID'],
      dtype='object')

In [20]:
all_states = "\n".join(map(str, sites['State'].unique().tolist())) 
all_counties = "\n".join(map(str, sites['County'].unique().tolist()))
all_cities = "\n".join(map(str, sites['City '].unique().tolist()))

In [21]:
len(all_states), len(all_counties), len(all_cities)

(624, 15766, 157233)
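
Note that `len()` on these newline-joined strings counts characters, not entries, so the numbers above describe how much text will be pasted into the prompt. A minimal sketch of the difference, using made-up values and a hypothetical `count_entries` helper:

```python
def count_entries(joined):
    """Count the newline-separated entries in a joined string."""
    return len(joined.splitlines()) if joined else 0


# Illustrative values only; these are not taken from sites.xlsx.
example = "\n".join(["NEVADA", "CALIFORNIA", "UTAH"])

char_count = len(example)             # characters, including the two newlines
entry_count = count_entries(example)  # number of entries
```

The character count is the more relevant figure when estimating how much of the model's context window the location lists will consume.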

* Risk is higher if historic sites are mentioned at all
* If sites are mentioned, is the EIS handling them correctly?
* Mentions of Section 106 of the National Historic Preservation Act (NHPA)
* If a site is mentioned, does the EIS also mention State Historic Preservation Officers (SHPOs) or Tribal Historic Preservation Officers (THPOs)?

In [22]:
def get_region(doc_chunk):

    LOCATION_SEARCH_PROMPT = f"""
The following is a list of cities, counties, and states in the United States:

States:
-------

{all_states}

Counties:
---------

{all_counties}

Cities:
-------

{all_cities}

You will be provided with an Environmental Impact Statement (EIS) for a proposed development project. Identify the states, counties, and cities relevant to this project. Return your response in the following JSON format:

```json
{{
    "states": [""],
    "counties": [""],
    "cities": [""]
}}
```

DO NOT output any other information or text. If the EIS does not mention a state, county, or city, output an empty list for that field.
""".strip()
    # Ask the model which of the listed states, counties, and cities the document mentions
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": LOCATION_SEARCH_PROMPT,
            },
            {"role": "user", "content": doc_chunk},
        ],
        response_format={"type": "json_object"},
    )

    return json.loads(resp.choices[0].message.content)

In [27]:
def get_historic_sites_comments(doc_chunk):

    # Use the chunk passed in, not a hard-coded global chunk
    locations = get_region(doc_chunk)

    # Filter for states, counties, and cities mentioned in the document
    relevant_sites = sites[
        sites["State"].isin(locations["states"])
        & sites["County"].isin(locations["counties"])
        & sites["City "].isin(locations["cities"])
    ]

    HISTORIC_SITES_COMMENTS_PROMPT = f"""
You are an expert legal analyst tasked with analyzing an Environmental Impact Statement (EIS) for a proposed development project.

Here is a list of sites of national historic importance in the United States located in the region of the proposed development project:

{relevant_sites.to_markdown()}

Carefully read the document - the EIS may mention one or more of these sites. If it does, cite the specific text and comment on how it might contribute to a greater risk, specifically in relation to the following:

* If sites are mentioned, are they being handled correctly?
* Look for mentions of Section 106 of the National Historic Preservation Act (NHPA).
* If sites are mentioned, are state historic preservation officers (SHPOs) or tribal historic preservation officers (THPOs) mentioned?

Respond with the following format:

```json
{{
    "comments" : [
        {{
            "quote": "Exact text quote from the document",
            "comment": "Explanation for how the quoted text could introduce regulatory burden/risk, and what the developer might need to consider to mitigate this risk."
        }},
        ...
    ]
}}
```

If no relevant text is found, return an empty list.""".strip()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": HISTORIC_SITES_COMMENTS_PROMPT},
            {"role": "user", "content": doc_chunk},
        ],
        response_format={"type": "json_object"},
    )

    resp_obj = json.loads(resp.choices[0].message.content)

    return resp_obj

In [28]:
site_comments = get_historic_sites_comments(doc_chunks[0])

In [29]:
print(site_comments)

{'comments': [{'quote': 'The BLM is conducting on-going government-to-government consultation with Bridgeport Indian Colony, Fallon Paiute-Shoshone Tribe, Pyramid Lake Paiute Tribe, Reno-Sparks Indian Colony, Walker River Paiute Tribe, Washoe Tribe of Nevada and California, Yerington Paiute Tribe, and Yomba Shoshone Tribe.', 'comment': "The continued consultation with local tribes represents a correct and thorough adherence to required protocols for involving Tribal Historic Preservation Officers (THPOs) under Section 106 of the NHPA. This reduces the risk of overlooking tribes' cultural resources but any oversight can lead to project delays or legal disputes."}, {'quote': 'The BLM has identified potential impacts to cultural resources in this Draft EIS and is continuing discussions with Tribes through formal and informal consultation to ensure that all concerns are considered in proposed mitigation.', 'comment': 'This approach helps in addressing concerns raised by tribes about the po

In [31]:
from comments import find_comments_in_doc

[{'quote': 'The BLM is conducting on-going government-to-government consultation with Bridgeport Indian Colony, Fallon Paiute-Shoshone Tribe, Pyramid Lake Paiute Tribe, Reno-Sparks Indian Colony, Walker River Paiute Tribe, Washoe Tribe of Nevada and California, Yerington Paiute Tribe, and Yomba Shoshone Tribe.',
  'comment': "The continued consultation with local tribes represents a correct and thorough adherence to required protocols for involving Tribal Historic Preservation Officers (THPOs) under Section 106 of the NHPA. This reduces the risk of overlooking tribes' cultural resources but any oversight can lead to project delays or legal disputes."},
 {'quote': 'The BLM has identified potential impacts to cultural resources in this Draft EIS and is continuing discussions with Tribes through formal and informal consultation to ensure that all concerns are considered in proposed mitigation.',
  'comment': 'This approach helps in addressing concerns raised by tribes about the potential 

In [36]:
find_comments_in_doc(doc_chunks[0], site_comments['comments'])

[{'quote': {'start': 23737,
   'end': 24036,
   'text': 'The BLM is conducting on-going government-to-government consultation with Bridgeport Indian Colony,\nFallon Paiute-Shoshone Tribe, Pyramid Lake Paiute Tribe, Reno-Sparks Indian Colony, Walker River Paiute\nTribe, Washoe Tribe of Nevada and California, Yerington Paiute Tribe, and Yomba Shoshone Tribe.'},
  'comment': "The continued consultation with local tribes represents a correct and thorough adherence to required protocols for involving Tribal Historic Preservation Officers (THPOs) under Section 106 of the NHPA. This reduces the risk of overlooking tribes' cultural resources but any oversight can lead to project delays or legal disputes."},
 {'quote': {'start': 25140,
   'end': 25370,
   'text': 'The BLM has identified potential impacts to cultural resources in this Draft EIS and is continuing discussions\nwith Tribes through formal and informal consultation to ensure that all concerns are considered in proposed\nmitigation.'}
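
The matched text above contains line breaks where the model's quote had spaces, which suggests the lookup tolerates whitespace differences. The `comments` module is not shown, so the following is only a sketch of one way such a lookup could work: `locate_quote` is a hypothetical function, and it assumes the result shape (`start`, `end`, `text`) seen in the output.

```python
import re


def locate_quote(doc, quote):
    """Find a quote in doc, treating any run of whitespace as interchangeable."""
    tokens = quote.split()
    if not tokens:
        return None
    # Escape each word and allow spaces or newlines between consecutive words.
    pattern = r"\s+".join(re.escape(token) for token in tokens)
    match = re.search(pattern, doc)
    if match is None:
        return None
    return {"start": match.start(), "end": match.end(), "text": match.group(0)}
```

A quote the model paraphrased or stitched together with an ellipsis would still fail to match under this approach; handling that would need fuzzy matching, which this sketch does not attempt.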

# Legal Action Comments

In [70]:
def get_legal_comments(doc):

    LEGAL_ACTION_PROMPT = """
You are an expert legal analyst tasked with analyzing an Environmental Impact Statement (EIS) for a proposed development project.

Carefully read the document - the EIS may include a section on public comments where legal action is explicitly mentioned or threatened. If any such comments exist, check if they have been addressed elsewhere in the document.

Respond with the following format:

```json
{
    "comments" : [
        {
            "quote": "Exact text quote from the document",
            "comment": "Explanation for how the quoted text could introduce regulatory burden/risk, and what the developer might need to consider to mitigate this risk."
        },
        ...
    ]
}
```

DO NOT return any comments where EXPLICIT legal action is not threatened.""".strip()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": LEGAL_ACTION_PROMPT},
            {"role": "user", "content": doc},
        ],
        response_format={"type": "json_object"},
    )

    resp_obj = json.loads(resp.choices[0].message.content)

    return resp_obj

In [71]:
len(doc_chunks)

6

In [72]:
legal_comments = get_legal_comments(doc_chunks[5])

In [73]:
legal_comments

{'comments': [{'quote': 'Localized pressure on short-term rental housing would be greater in Yerington and the immediate vicinity, where some of the EJ communities of concern were identified. Communities identified as EJ low-income communities of concern such as Fallon, Silver Springs, and Stagecoach may also experience increased pressures on short-term rental housing supplies, which could put upward pressure on rental rates.',
   'comment': 'This situation could lead to legal challenges related to fair housing practices and violations of community rights. Developers need to carefully monitor and engage with local housing authorities to ensure adequate housing availability and affordability. They might need to contribute to or facilitate the development of additional housing or provide subsidies for affected populations to mitigate these risks.'},
  {'quote': 'EJ communities of concern could experience disproportionate adverse impacts to human quality of life from construction worker t

In [64]:
find_comments_in_doc(doc_chunks[4], legal_comments['comments'])

[]

# Public Comments 

In [None]:
def get_public_comments(doc):

    PUBLIC_COMMENTS_PROMPT = """
You are a public relations expert tasked with analyzing an Environmental Impact Statement (EIS) for a proposed development project.

Carefully read the document - the EIS may include a section with comments from the general public on their opinions of the project.

Respond with the following format:

```json
{
    "comments" : [
        {
            "quote": "Exact text quote from the document",
            "comment": "Explanation for how the quoted text could introduce regulatory burden/risk, and what the developer might need to consider to mitigate this risk."
        },
        ...
    ]
}
```

If no relevant text is found, return an empty list.""".strip()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PUBLIC_COMMENTS_PROMPT},
            {"role": "user", "content": doc},
        ],
        response_format={"type": "json_object"},
    )

    resp_obj = json.loads(resp.choices[0].message.content)

    return resp_obj