# Analyzing Environmental Justice: Integrating Census and Harmonized Landsat Sentinel-2 (HLS) NDVI Data with LLMs in Google Colab

## Overview
This exercise guides environmental and social scientists through analyzing environmental justice issues by combining US Census data with NASA's Harmonized Landsat Sentinel-2 (HLS) satellite imagery, using Large Language Models (LLMs) and prompt engineering. You'll develop skills in integrating geospatial and socioeconomic datasets to identify hotspots of social vulnerability.

## Learning Objectives
- Learn the basics of Python and Google Colab
- Integrate socioeconomic and remote sensing datasets
- Perform spatial analysis to identify areas of environmental concern
- Visualize and communicate environmental justice findings
- Become confident in using LLMs and prompt engineering to help with all your coding!!!

## Prerequisites
- Basic understanding of coding in some language (e.g., R, MATLAB)
- Basic understanding of data manipulation, data types, and LLMs
- A great attitude and the eye of the tiger!

## Part 1: Brief Introduction to Colab and Setting Up Your Environment and Working Directory

### Key Tasks
- Introduction to Google Colab and its features
- Install a suite of packages we will use in our workspace
- Set up our working directory
---



```python
# Mount your Google Drive so the notebook can read files stored there
from google.colab import drive
drive.mount('/content/drive')
```

```python
# Install required packages (pip skips any that Colab already provides)
!pip install pandas geopandas numpy matplotlib seaborn scikit-learn rasterio earthpy rasterstats
```

####<font color="#FF69B4">PROMPT: "I would like to set my working directory for my notebook and check that it worked"</font>
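
For reference, here is a minimal sketch of what the LLM's answer might look like. The folder path is a hypothetical placeholder. Point it at the Drive folder that holds your workshop files.

```python
import os

# Hypothetical folder: replace with the Drive folder that holds your workshop data.
WORKDIR = "/content/drive/MyDrive/ej_workshop"

def set_working_directory(path):
    """Change the current working directory to `path` and return the new location."""
    if not os.path.isdir(path):
        # Usually means Drive isn't mounted yet or the path has a typo.
        raise FileNotFoundError(f"Folder not found: {path}")
    os.chdir(path)
    return os.getcwd()

# In Colab, uncomment to switch folders and confirm it worked:
# print("Now in:", set_working_directory(WORKDIR))
# print("Files here:", os.listdir())
```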


## Part 2: Working with Census Data to Create a Social Vulnerability Index (SVI)

In this section, you'll work with a subset of the 2022 American Community Survey (ACS) data from the US Census Bureau for the Baltimore, MD region and create a basic social vulnerability index.

### Key Tasks
- Read in a .geojson file (a common geospatial vector format, similar to a shapefile, that loads as a spatial dataframe)
- Explore the variables in the dataset
- Create and view a simple social vulnerability index (SVI)


---



####<font color="#FF69B4">PROMPT: "I want to import a .geojson file with census data in it named baltimore_region_acs_2022_epsg4269.geojson from my working directory and get some info about it. I would like the variable to be called gdf"</font>

####<font color="#FF69B4">PROMPT: "Can you calculate three more metrics (the proportion of people in poverty, the proportion of renters, and the proportion of the population that is nonwhite) and add them to gdf?"

Specifically, calculate them in the following way:

`gdf['prop_poverty'] = (gdf['population_below_poverty'] / gdf['total_population'])`

`gdf['prop_renter'] = (gdf['renter_occupied_units'] / gdf['total_occupied_units'])`

`gdf['prop_nonwhite'] = 1 - (gdf['white_population'] / gdf['total_population'])`
</font>
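
The three formulas can be wrapped in a small helper that also handles one case the plain division misses. Tracts with zero population or zero occupied units (for example, parks or industrial areas) produce NaN or infinity. This sketch uses the column names from the formulas and turns zero denominators into NaN, so those tracts drop out of later calculations instead of distorting them.

```python
import numpy as np
import pandas as pd

def add_proportions(df):
    """Add prop_poverty, prop_renter, and prop_nonwhite columns to a copy of df."""
    out = df.copy()
    # Treat zero denominators as missing so empty tracts become NaN instead of inf.
    pop = out["total_population"].replace(0, np.nan)
    occupied = out["total_occupied_units"].replace(0, np.nan)
    out["prop_poverty"] = out["population_below_poverty"] / pop
    out["prop_renter"] = out["renter_occupied_units"] / occupied
    out["prop_nonwhite"] = 1 - out["white_population"] / pop
    return out

# gdf = add_proportions(gdf)
```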

####<font color="skyblue">CHALLENGE: Can you get the LLM to give you code to show only the first 10 rows of gdf and only for the new columns you just created?</font>

####<font color="#FF69B4">PROMPT: "I would like to plot the three new columns side-by-side with viridis colormaps"</font>

####<font color="skyblue">CHALLENGE: Is there anything you would change about the way the plots look (e.g., colorbar size, font size, colormap)? Ask the LLM and customize away! NOTE: If you get an error, you can click the "Explain error" button that appears below it and feed the error right back into Gemini, which will then try to correct your code!</font>

####<font color="#FF69B4">PROMPT: "I would like to make the colormaps "turbo", reduce the size of the colorbars and make the font slightly larger"</font>
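
Both plotting prompts tend to produce variations of the same loop over `GeoDataFrame.plot`. This sketch wraps that loop in a hypothetical helper, `plot_metrics`, so you can change the colormap, colorbar size, and font size in one place. `legend=True` draws a colorbar, and `legend_kwds` is passed through to it; that is where `shrink` is applied.

```python
import matplotlib.pyplot as plt

def plot_metrics(gdf, columns, cmap="viridis", cbar_shrink=1.0, font_size=10):
    """Map each column of a GeoDataFrame in its own panel with its own colorbar."""
    # Global setting: it also applies to figures you make afterwards.
    plt.rcParams.update({"font.size": font_size})
    fig, axes = plt.subplots(1, len(columns), figsize=(6 * len(columns), 6),
                             squeeze=False)
    axes = axes[0]  # squeeze=False keeps a 2-D array even for a single column
    for ax, col in zip(axes, columns):
        # legend=True draws a colorbar; legend_kwds is passed on to fig.colorbar
        gdf.plot(column=col, cmap=cmap, legend=True, ax=ax,
                 legend_kwds={"shrink": cbar_shrink})
        ax.set_title(col)
        ax.set_axis_off()
    fig.tight_layout()
    return fig, axes

cols = ["prop_poverty", "prop_renter", "prop_nonwhite"]
# First prompt:  plot_metrics(gdf, cols)
# Second prompt: plot_metrics(gdf, cols, cmap="turbo", cbar_shrink=0.5, font_size=13)
```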

####<font color="#FF69B4">PROMPT: "I would like to create a Social Vulnerability Index (SVI) from my three new metrics and plot it"</font>
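
The LLM may build the index differently. A common simple approach, and the one this sketch assumes, is to rescale each metric to the 0-1 range with min-max scaling, $x' = (x - \min x)/(\max x - \min x)$, and then take the equal-weighted average. This is the same formula scikit-learn's `MinMaxScaler` applies to each column with its default settings. The output column name `svi` is a hypothetical choice.

```python
import pandas as pd

SVI_COLUMNS = ["prop_poverty", "prop_renter", "prop_nonwhite"]

def min_max(series):
    """Rescale a Series to 0-1 (same formula as sklearn's MinMaxScaler default)."""
    return (series - series.min()) / (series.max() - series.min())

def add_svi(df, columns=SVI_COLUMNS):
    """Add an equal-weighted SVI: the mean of the min-max scaled metrics."""
    out = df.copy()
    scaled = out[columns].apply(min_max)   # scale each column independently
    out["svi"] = scaled.mean(axis=1)       # NaN metrics are skipped in the average
    return out

# gdf = add_svi(gdf)
# gdf.plot(column="svi", cmap="turbo", legend=True)
```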

####<font color="#FF69B4">PROMPT: "I am curious what the MinMaxScaler did to the data? Can you compare before and after it was applied?"</font>
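
A quick way to answer this yourself is to put summary statistics side by side. This sketch recomputes the min-max rescaling by hand (the formula `MinMaxScaler` uses by default) and pulls the min, max, and mean from `describe()` before and after scaling. Because the transform is linear, neither the shape of each distribution nor the ranking of tracts changes; only the range does.

```python
import pandas as pd

def compare_scaling(df, columns):
    """Summarize columns before and after min-max scaling, side by side."""
    raw = df[columns]
    scaled = (raw - raw.min()) / (raw.max() - raw.min())   # MinMaxScaler's formula
    summary = pd.concat(
        {"before": raw.describe().T, "after": scaled.describe().T}, axis=1
    )
    wanted = [(when, stat) for when in ("before", "after")
              for stat in ("min", "max", "mean")]
    return summary[wanted]

# compare_scaling(gdf, ["prop_poverty", "prop_renter", "prop_nonwhite"])
```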

####<font color="#FF69B4">PROMPT: "I would like to see a histogram of the SVI data"</font>
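
Here is a minimal matplotlib sketch. It assumes your index is stored in a column named `svi` (hypothetical; use whatever name your code created). The dashed line marks the median, a convenient reference point when the distribution is skewed.

```python
import matplotlib.pyplot as plt

def plot_svi_histogram(svi, bins=20):
    """Histogram of SVI values (a pandas Series), ignoring missing tracts."""
    values = svi.dropna()
    fig, ax = plt.subplots(figsize=(7, 4))
    counts, edges, _ = ax.hist(values, bins=bins, edgecolor="black")
    ax.axvline(values.median(), color="red", linestyle="--", label="median")
    ax.set_xlabel("Social Vulnerability Index")
    ax.set_ylabel("Number of census tracts")
    ax.legend()
    return fig, ax, counts

# plot_svi_histogram(gdf["svi"])
```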

## Part 3: Integrating Harmonized Landsat Sentinel-2 NDVI Data with Census Data to Illuminate Patterns of Social Vulnerability

In this section, you will work with a summer 2022 Normalized Difference Vegetation Index (NDVI) composite dataset derived from Harmonized Landsat Sentinel-2 [(HLS info)](https://hls.gsfc.nasa.gov/) and integrate it with the 2022 Census data for deeper exploration in the Baltimore, MD region.

### Key Tasks
- Import and view a geotiff raster file
- Extract NDVI statistics for each census tract and merge them into new columns
- Calculate a Green Space Inequity Index and view it
- Create a list of high priority environmental justice areas

### NDVI Primer
NDVI (Normalized Difference Vegetation Index) is a widely used remote sensing index that quantifies vegetation health and density using satellite or aerial imagery. It is calculated from the red (visible) and near-infrared light reflected by the land surface.
<br>
<br>

📊 NDVI Formula:
$$
\text{NDVI} = \frac{(NIR - RED)}{(NIR + RED)}
$$

- NIR = Reflectance in the near-infrared spectrum (plants strongly reflect this light)

- RED = Reflectance in the red spectrum (plants absorb this light for photosynthesis)
<br>
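
To make the formula concrete: a pixel with hypothetical reflectances NIR = 0.50 and RED = 0.08 gives (0.50 - 0.08) / (0.50 + 0.08) = 0.42 / 0.58, or about 0.72, which is typical of dense vegetation. The NumPy sketch below applies the same formula to whole arrays and returns NaN where NIR + RED = 0 to avoid dividing by zero.

```python
import numpy as np

def compute_ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED), returning NaN where the denominator is 0."""
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        ndvi = np.where(denom == 0, np.nan, (nir - red) / denom)
    return ndvi

# Hypothetical reflectances: vegetation, bare ground, water, and an empty pixel
print(compute_ndvi([0.50, 0.25, 0.02, 0.0], [0.08, 0.22, 0.05, 0.0]))
```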

🌱 NDVI Values:
NDVI values range from -1 to +1. Approximate rules of thumb for interpreting them:

| NDVI Value    | Interpretation                             |
| ------------- | ------------------------------------------ |
| **< 0**       | Water, and often clouds or snow            |
| **0 – 0.1**   | Bare soil, rock, sand, or built surfaces   |
| **0.1 – 0.2** | Sparse vegetation (e.g., shrubs)           |
| **0.2 – 0.5** | Moderate vegetation (e.g., grassland)      |
| **0.5 – 0.9** | Dense, healthy vegetation (e.g., forest)   |
<br>
<br>

🛰️ Why is NDVI Useful?
- Environmental Monitoring: Assess drought, deforestation, and land degradation.
- Agriculture: Monitor crop health and stress.
- Urban Planning: Evaluate green space distribution.
- Climate Studies: Understand carbon sinks and land surface processes.

---

####<font color="#FF69B4">PROMPT: "I want to load my NDVI geotiff now and plot the data"</font>

####<font color="#FF69B4">PROMPT: "I want to extract statistics from the ndvi dataset for each tract in my census data and add the data as new columns into my census dataset. Also make sure that the nodata is set for all values equal to -9999 so they aren't being considered"</font>
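
The LLM will most likely reach for `rasterstats.zonal_stats`, which accepts a `nodata` argument for exactly this purpose. Check two things in whatever code you get. First, the tracts should be reprojected to the raster's CRS (for example with `gdf.to_crs(...)`), because `zonal_stats` expects both inputs in the same coordinate system. Second, the -9999 pixels should be excluded. To show what is computed per tract, here is a NumPy-only sketch. It assumes the tracts have already been rasterized into a hypothetical integer array `zones` (one tract ID per pixel, 0 = outside all tracts) aligned with the NDVI grid.

```python
import numpy as np
import pandas as pd

def zonal_ndvi_stats(ndvi, zones, nodata=-9999):
    """Per-zone count, mean, median, min, and max of NDVI, skipping nodata pixels."""
    ndvi = np.asarray(ndvi, dtype="float64")
    zones = np.asarray(zones)
    valid = (ndvi != nodata) & ~np.isnan(ndvi)
    rows = {}
    for zone_id in np.unique(zones[zones > 0]):   # 0 marks pixels outside any tract
        vals = ndvi[(zones == zone_id) & valid]
        rows[zone_id] = {
            "ndvi_count": vals.size,
            "ndvi_mean": vals.mean() if vals.size else np.nan,
            "ndvi_median": np.median(vals) if vals.size else np.nan,
            "ndvi_min": vals.min() if vals.size else np.nan,
            "ndvi_max": vals.max() if vals.size else np.nan,
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```

In the notebook, `zonal_stats` returns a list of dicts, one per tract in the same order as `gdf`. With `stats=["count", "mean", "median", "min", "max"]`, a line like `gdf = gdf.join(pd.DataFrame(results).add_prefix("ndvi_"))` (where `results` is a hypothetical name for that list) would add columns named like the ones above, as long as `gdf` still has its default 0..n-1 index.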

####<font color="#FF69B4">PROMPT: "I would like to create a new green space inequity index that uses the census data and the ndvi data and plot the data on a scatter plot"</font>
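
There is no single standard definition of a green space inequity index, so expect the LLM's version to differ. This sketch assumes one simple, hypothetical construction: the average of the SVI and an inverted, min-max scaled greenness score. Tracts that are both highly vulnerable and low in vegetation then score near 1. The column names `svi`, `ndvi_median`, and `green_space_inequity` are assumptions; match them to your own.

```python
import matplotlib.pyplot as plt

def add_green_space_inequity(df, svi_col="svi", ndvi_col="ndvi_median"):
    """Hypothetical index: mean of SVI and inverted greenness (higher = more inequity)."""
    out = df.copy()
    ndvi = out[ndvi_col]
    greenness = (ndvi - ndvi.min()) / (ndvi.max() - ndvi.min())  # 0 = least green tract
    # Assumes the SVI is already on a 0-1 scale, like a min-max based SVI
    out["green_space_inequity"] = (out[svi_col] + (1 - greenness)) / 2
    return out

def scatter_svi_vs_ndvi(df, svi_col="svi", ndvi_col="ndvi_median"):
    """Scatter of median NDVI against SVI, colored by the inequity index."""
    fig, ax = plt.subplots(figsize=(7, 5))
    pts = ax.scatter(df[ndvi_col], df[svi_col], c=df["green_space_inequity"],
                     cmap="turbo", s=15)
    fig.colorbar(pts, ax=ax, label="Green Space Inequity Index")
    ax.set_xlabel("Median NDVI")
    ax.set_ylabel("SVI")
    return fig, ax

# gdf = add_green_space_inequity(gdf)
# scatter_svi_vs_ndvi(gdf)
```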

####<font color="#FF69B4">PROMPT: "I would like to plot maps of SVI and the Green Space Inequity Index next to each other to compare"</font>
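
When you compare two indices on maps, put both on the same color scale. Otherwise each map stretches its colormap to its own minimum and maximum, and identical colors can mean different values. This sketch assumes both indices range from 0 to 1 and live in columns named `svi` and `green_space_inequity` (hypothetical names). It draws a single shared colorbar.

```python
import matplotlib.pyplot as plt

def compare_index_maps(gdf, left="svi", right="green_space_inequity", cmap="turbo"):
    """Two maps side by side sharing one 0-1 color scale and one colorbar."""
    fig, axes = plt.subplots(1, 2, figsize=(14, 7))
    for ax, col in zip(axes, [left, right]):
        gdf.plot(column=col, cmap=cmap, vmin=0, vmax=1, ax=ax)  # same limits on both
        ax.set_title(col)
        ax.set_axis_off()
    # One colorbar for both panels, built from a ScalarMappable with the same limits.
    sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(vmin=0, vmax=1))
    sm.set_array([])
    fig.colorbar(sm, ax=axes, shrink=0.6, label="Index value (0-1)")
    return fig, axes

# compare_index_maps(gdf)
```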

####<font color="#FF69B4">PROMPT: "Can we run and view a correlation matrix for the variables that go into the green space inequity index?"</font>
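
Here is a minimal sketch using pandas' `DataFrame.corr` (Pearson by default) and a matplotlib heatmap with each coefficient printed in its cell. Seaborn's `sns.heatmap(corr, annot=True)` gives a similar result in one line. The column list in the usage comment is an assumption about which inputs your index uses.

```python
import matplotlib.pyplot as plt

def correlation_heatmap(df, columns):
    """Pearson correlation matrix of `columns`, drawn as an annotated heatmap."""
    corr = df[columns].corr()          # NaNs are dropped pair by pair
    n = len(columns)
    fig, ax = plt.subplots(figsize=(1.5 * n + 2, 1.5 * n + 1))
    im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(n))
    ax.set_xticklabels(columns, rotation=45, ha="right")
    ax.set_yticks(range(n))
    ax.set_yticklabels(columns)
    for i in range(n):
        for j in range(n):
            ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(im, ax=ax, label="Pearson r")
    fig.tight_layout()
    return corr, fig

# corr, fig = correlation_heatmap(gdf, ["prop_poverty", "prop_renter",
#                                       "prop_nonwhite", "ndvi_median"])
```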


####<font color="#FF69B4">PROMPT: "Can we make a plot showing census tracts with the 10 highest environmental justice priority scores based on the green space inequity index?"</font>
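
A ranked bar chart is one simple way to show the ten highest scores. This sketch assumes an index column named `green_space_inequity` and a tract identifier column `GEOID` (both hypothetical; use the names in your data). `nlargest` skips tracts whose score is missing.

```python
import matplotlib.pyplot as plt

def top_priority_tracts(df, score_col="green_space_inequity", id_col="GEOID", n=10):
    """Return the n highest-scoring tracts and a ranked horizontal bar chart."""
    top = df.nlargest(n, score_col)[[id_col, score_col]]
    fig, ax = plt.subplots(figsize=(7, 0.4 * n + 1))
    ax.barh(top[id_col].astype(str), top[score_col], color="firebrick")
    ax.invert_yaxis()                       # highest score at the top
    ax.set_xlabel(score_col)
    ax.set_title(f"Top {n} environmental justice priority tracts")
    fig.tight_layout()
    return top, fig

# top10, fig = top_priority_tracts(gdf)
```

For a map instead, you could draw all tracts in light gray and overlay the top rows in a bright color. For example: `ax = gdf.plot(color="lightgray")`, then `gdf.nlargest(10, "green_space_inequity").plot(ax=ax, color="red")`.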

## Part 4: Go Wild with LLMs!! (If we have time)

### Key Tasks
- Have fun exploring the data further in any way you see fit, using LLMs as your guide!
- I have provided a clustering example prompt with the code available in the cheat sheet.
- Test your new Colab, Python, and LLM skills!
<br>
<br>

### Suggestions for exploring the data with LLMs
- Ask LLM to convert parts or all of this code into R, Matlab, etc
- Run principal component analysis on the data
- Try Google Colab with some of your own data
- Try to build a script to download data from an API like the Census Data API (you will likely need to register for an API key) [Census API User Guide](https://www.census.gov/data/developers/guidance/api-user-guide.html)
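
If you try the Census API suggestion, this sketch shows the request format for the ACS 5-year endpoint. The query itself is only an example: total population (variable `B01003_001E`) for every tract in Baltimore city (state FIPS 24, county FIPS 510). Check the variable names you need in the API documentation. The API returns JSON as a list of rows, with the header row first, and the values come back as strings.

```python
import json
import urllib.request
from urllib.parse import urlencode

import pandas as pd

BASE_URL = "https://api.census.gov/data/2022/acs/acs5"  # ACS 5-year, 2022

def build_acs_url(variables, state="24", county="510", api_key=None):
    """Build a tract-level ACS query URL (defaults: Baltimore city, MD)."""
    params = [("get", ",".join(["NAME"] + variables)),
              ("for", "tract:*"),
              ("in", f"state:{state}"),
              ("in", f"county:{county}")]
    if api_key:
        params.append(("key", api_key))
    return BASE_URL + "?" + urlencode(params, safe=":,*")

def rows_to_dataframe(rows):
    """Convert the API's list-of-lists JSON (header row first) into a DataFrame."""
    df = pd.DataFrame(rows[1:], columns=rows[0])
    for col in df.columns.difference(["NAME", "state", "county", "tract"]):
        df[col] = pd.to_numeric(df[col], errors="coerce")  # estimates arrive as text
    df["GEOID"] = df["state"] + df["county"] + df["tract"]  # 11-digit tract ID
    return df

def fetch_acs(variables, api_key=None):
    """Download and parse one query (needs internet access)."""
    with urllib.request.urlopen(build_acs_url(variables, api_key=api_key)) as resp:
        return rows_to_dataframe(json.load(resp))

# acs = fetch_acs(["B01003_001E"], api_key="YOUR_KEY")  # total population per tract
```

The `GEOID` column built this way is meant for joining the downloaded table to tract geometries, assuming your tract file uses the same 11-digit identifier.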



---

####<font color="#FF69B4">PROMPT: "I would like to run a cluster analysis on the variables that went into the SVI and ndvi_median"</font>
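
The cheat sheet has the workshop's own code for this prompt. For comparison, here is a minimal k-means sketch; the cheat sheet may use a different method or settings. The variables are standardized first so that none dominates the distance calculation just because of its units. The choice of `k=4` is arbitrary, so test other values (for example with an elbow plot or silhouette scores). The returned `profiles` table is the kind of output the next prompt asks Gemini to interpret.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

CLUSTER_VARS = ["prop_poverty", "prop_renter", "prop_nonwhite", "ndvi_median"]

def cluster_tracts(df, columns=CLUSTER_VARS, k=4, seed=0):
    """K-means on standardized variables; returns a labeled copy and cluster profiles."""
    data = df.dropna(subset=columns).copy()           # k-means cannot handle NaN
    X = StandardScaler().fit_transform(data[columns])  # mean 0, std 1 per variable
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    data["cluster"] = model.fit_predict(X)
    # Cluster profiles: mean of each original variable plus tract count per cluster
    profiles = data.groupby("cluster")[columns].mean()
    profiles["n_tracts"] = data["cluster"].value_counts().sort_index()
    return data, profiles

# clustered, profiles = cluster_tracts(gdf)
# print(profiles.round(3))
```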

####<font color="#FF69B4">PROMPT: "Can you help me interpret these cluster profiles in a descriptive way?"

HINT: Try copying the table values printed out above into Gemini and asking it to interpret them</font>

####<font color="#FF69B4">PROMPT: "Can you plot cluster vs income?"</font>
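
A box plot per cluster shows both the typical income and its spread. This sketch assumes columns named `cluster` and `median_household_income` (the income name is hypothetical; check your data). If the income column contains large negative placeholder values, filter them out first. The Census API, for example, uses codes such as -666666666 for estimates that could not be computed.

```python
import matplotlib.pyplot as plt

def boxplot_by_cluster(df, value_col="median_household_income", cluster_col="cluster"):
    """One box per cluster showing the spread of `value_col` across its tracts."""
    groups = sorted(df[cluster_col].dropna().unique())
    data = [df.loc[df[cluster_col] == g, value_col].dropna() for g in groups]
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.boxplot(data)                        # boxes are placed at x = 1, 2, ...
    ax.set_xticks(range(1, len(groups) + 1))
    ax.set_xticklabels([str(g) for g in groups])
    ax.set_xlabel("Cluster")
    ax.set_ylabel(value_col)
    return fig, ax

# boxplot_by_cluster(clustered)
```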

---



####<font color="#FF69B4">PROMPT: "I would like to plot ndvi vs income with points colored by the Green Space Inequity Index"</font>
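
This sketch assumes columns named `median_household_income`, `ndvi_median`, and `green_space_inequity` (hypothetical names). The black line is an ordinary least-squares straight-line fit from `np.polyfit`. It is only a visual guide to the overall trend and says nothing about whether the relationship is actually linear.

```python
import matplotlib.pyplot as plt
import numpy as np

def scatter_ndvi_income(df, x="median_household_income", y="ndvi_median",
                        color="green_space_inequity"):
    """NDVI vs income, colored by the inequity index, with a least-squares trend line."""
    d = df[[x, y, color]].dropna()
    fig, ax = plt.subplots(figsize=(8, 5))
    pts = ax.scatter(d[x], d[y], c=d[color], cmap="turbo", s=15)
    fig.colorbar(pts, ax=ax, label=color)
    slope, intercept = np.polyfit(d[x], d[y], 1)     # degree-1 (straight-line) fit
    xs = np.linspace(d[x].min(), d[x].max(), 100)
    ax.plot(xs, slope * xs + intercept, color="black", linewidth=1)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return fig, ax, slope

# fig, ax, slope = scatter_ndvi_income(gdf)
```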


## Part 5: Show and Tell: Creating a Dashboard for Environmental Justice Data Exploration

In this final section, I'll show you a data dashboard that I created entirely by interacting with LLMs, to demonstrate how powerful Colab and LLMs can be for creating interactive visualizations to share with stakeholders.

#### YOU CAN ONLY RUN THIS IF YOU HAVE AN NGROK AUTHENTICATION TOKEN, SO I WILL JUST SHOW YOU A LIVE DEMO


---

## Wrap Up Discussion

What were the main caveats you ran into when coding with LLMs, today or in the past?
Any other thoughts?

####<font color="#FF69B4">PROMPT: "What are the main caveats when using Gemini or any LLM to help you code?"</font>

## Some Resources and References

- [NASA Harmonized Landsat Sentinel (HLS) Project](https://hls.gsfc.nasa.gov/)
- [US Census Bureau API](https://www.census.gov/data/developers/data-sets.html)
- [EPA's Environmental Justice Screening Tool (EJScreen)](https://www.epa.gov/ejscreen)
- [NASA SEDAC - Socioeconomic Data and Applications Center](https://sedac.ciesin.columbia.edu/)
- [LP DAAC - Getting Started with Cloud-Native HLS Data in Python](https://lpdaac.usgs.gov/resources/e-learning/getting-started-cloud-native-hls-data-python/)
- [Census Data API User Guide](https://www.census.gov/data/developers/guidance/api-user-guide.html)