# Structured Output with LangChain + OpenAI

Introduction:

This post walks through an Urban Heat Island analysis project that uses NASA's Earthdata API to access MODIS Land Surface Temperature (LST) data for a major metropolitan area, comparing the average temperatures within the dense urban areas with those of the rural areas outside the city.

To do so, we consulted ChatGPT to select a city and date range that would best show the contrast between urban and rural LST, and extracted a bounding box for an urban area and another for a rural area near the same metropolitan area.

We then manually saved this information in a Python parameter dictionary with the following structure:

In [15]:
# Starting Parameters (manual)
data_params = {'city_region_name':'Houston, TX',
               'coordinates': {
                   'urban': {
                       'SW': [29.69193, -95.47998],
                       'NE': [29.90719, -95.2251]
                       },
                   'rural': {
                       'SW': [30.5, -96.5],
                       'NE': [31.0, -96.0]
                       }
                   },
               'time': {
                   'start': '2023-06-01T00:00:00Z',
                   'end': '2023-08-31T23:59:59Z'
                   }
               }
data_params

{'city_region_name': 'Houston, TX',
 'coordinates': {'urban': {'SW': [29.69193, -95.47998],
   'NE': [29.90719, -95.2251]},
  'rural': {'SW': [30.5, -96.5], 'NE': [31.0, -96.0]}},
 'time': {'start': '2023-06-01T00:00:00Z', 'end': '2023-08-31T23:59:59Z'}}
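Because these values are typed in by hand (and later suggested by a language model), a quick sanity check on each bounding box can catch swapped corners or out-of-range values early. The following is a minimal sketch, not part of the original notebook; the helper name `validate_bounding_box` is hypothetical, and it assumes boxes never cross the antimeridian.

```python
def validate_bounding_box(box):
    """Return a list of problems found in a {'SW': [lat, lon], 'NE': [lat, lon]} box."""
    problems = []
    for corner in ("SW", "NE"):
        if corner not in box or len(box[corner]) != 2:
            problems.append(f"missing or malformed corner: {corner}")
    if problems:
        return problems

    (sw_lat, sw_lon), (ne_lat, ne_lon) = box["SW"], box["NE"]
    for lat in (sw_lat, ne_lat):
        if not -90 <= lat <= 90:
            problems.append(f"latitude out of range: {lat}")
    for lon in (sw_lon, ne_lon):
        if not -180 <= lon <= 180:
            problems.append(f"longitude out of range: {lon}")
    # Assumption: the box does not cross the antimeridian (180 degrees longitude)
    if sw_lat >= ne_lat:
        problems.append("SW latitude must be south of NE latitude")
    if sw_lon >= ne_lon:
        problems.append("SW longitude must be west of NE longitude")
    return problems


example_box = {"SW": [29.69193, -95.47998], "NE": [29.90719, -95.2251]}
print(validate_bounding_box(example_box))  # an empty list means no problems were found
```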

In [35]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import os, json, requests

In [36]:
## Plot the region suggested
# Function to generate sample points within a bounding box
def generate_sample_points(sw, ne, num_points=10):
    """
    Generate sample points within a given bounding box.

    Parameters:
    - sw (tuple): The southwest coordinates of the bounding box (latitude, longitude).
    - ne (tuple): The northeast coordinates of the bounding box (latitude, longitude).
    - num_points (int): The number of sample points to generate (default: 10).

    Returns:
    - list: A list of sample points, where each point is represented as a tuple (latitude, longitude).
    """
    latitudes = [sw[0] + i * (ne[0] - sw[0]) / (num_points - 1) for i in range(num_points)]
    longitudes = [sw[1] + i * (ne[1] - sw[1]) / (num_points - 1) for i in range(num_points)]
    return [(lat, lon) for lat in latitudes for lon in longitudes]
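Note that the function pairs every latitude with every longitude, so `num_points=10` yields a 10 x 10 grid of 100 points per region. The sketch below is a hypothetical variant (named `grid_points` here, not from the original notebook) that makes the step size explicit and guards against `num_points < 2`, which would otherwise divide by zero.

```python
def grid_points(sw, ne, num_points=10):
    """Return a num_points x num_points grid of (lat, lon) tuples inside the box."""
    if num_points < 2:
        # With a single point there is no spacing to compute; use the box centre.
        return [((sw[0] + ne[0]) / 2, (sw[1] + ne[1]) / 2)]
    lat_step = (ne[0] - sw[0]) / (num_points - 1)
    lon_step = (ne[1] - sw[1]) / (num_points - 1)
    latitudes = [sw[0] + i * lat_step for i in range(num_points)]
    longitudes = [sw[1] + j * lon_step for j in range(num_points)]
    return [(lat, lon) for lat in latitudes for lon in longitudes]


points = grid_points([30.5, -96.5], [31.0, -96.0], num_points=3)
print(len(points))   # 9 points: every latitude paired with every longitude
print(points[0])     # the first point is the SW corner
print(points[-1])    # the last point is the NE corner
```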


In [37]:


# List to collect the sampled coordinates (converted to a DataFrame below)
sampled_coordinates = []

# Sample a grid of points from each region's bounding box
for region, bounding_box in data_params['coordinates'].items():
    # Generate sample points within the bounding box
    sample_points = generate_sample_points(bounding_box['SW'], bounding_box['NE'], num_points=10)
    for lat, lon in sample_points:
        sampled_coordinates.append({'Region': region, 'Latitude': lat, 'Longitude': lon})

# Convert results to DataFrame
coords_df = pd.DataFrame(sampled_coordinates)
display(coords_df.head(2), coords_df.tail(2))

,Region,Latitude,Longitude
0,urban,29.69193,-95.47998
1,urban,29.69193,-95.45166


,Region,Latitude,Longitude
198,rural,31.0,-96.055556
199,rural,31.0,-96.0


We can visualize the selected regions with Plotly Express by plotting the sample coordinates generated from each bounding box.

In [39]:
# Folder where the figure image will be saved
img_dir = "images/blog/"
os.makedirs(img_dir, exist_ok=True)

In [62]:
## Plot the region suggested
fig = px.scatter_mapbox(coords_df, lat="Latitude", lon="Longitude", color='Region',
                        # color_continuous_scale="Viridis", 
                        mapbox_style="carto-positron",
                        title="Preview of Selected Bounding Boxes",
                        height=600, width=600)

# Remove left and right side margins
fig.update_layout(
    margin={"r":0, "l":0,'b':0, 't':100},
    legend={'orientation':"h", 'yanchor':"top", 'y':1.05, 'xanchor':"left", 'x':0},
    
)

fig.show()
## Save fig for README
fig.write_image(f"{img_dir}selected_regions.png")

If we are happy with the coordinates, we will proceed to use NASA's Earthdata search API (the CMR service at `cmr.earthdata.nasa.gov`) to find and download the corresponding MODIS data.


## Download Data from NASA's Earthdata API (CMR granules endpoint)

Now that we have the parameters for the locations, we need to request the data from the NASA Earthdata API.

#### Sign Up for NASA's Earthdata API

To use the NASA Earthdata API with a token, you need to generate a token from the NASA Earthdata Login (EDL) and use it in your API requests.



#### Step-by-Step Guide

1. **Generate a Token from NASA Earthdata Login:**
   - Log in to your [NASA Earthdata Login](https://urs.earthdata.nasa.gov).
   - Navigate to the "My Profile" section.
   - Generate a new token under the "User Profile" section.
   - Save the credentials in a `.json` file on your local machine.
    > - Note: if you store this file within a repository, make sure to add the credentials file to your `.gitignore`!

In [None]:
import json
# where we stored the token locally on our PC
creds_json = "./earthdata_creds.json"
with open(creds_json) as f:
    creds = json.load(f)
print(creds.keys())

# Your NASA Earthdata token
token = creds['token']

dict_keys(['username', 'password', 'token'])



2. **Use the Token in Your API Requests:**
   - Use the generated token in the `Authorization: Bearer` header of your API requests.

In [None]:
def search_and_download(region_name, bounding_box, time_range, token, dest_folder='./data/MODIS-LST/',
                        force_download=False, verbose=True):
    """
    Searches for granules using the NASA Earthdata API and downloads the data files for a given region.

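    Args:
        region_name (str): The name of the region.
        bounding_box (dict): The bounding box coordinates of the region in the format {'SW': [lat, lon], 'NE': [lat, lon]}.
        time_range (dict): The temporal range of the data in the format {'start': ..., 'end': ...}, with ISO 8601 values such as '2023-06-01T00:00:00Z'.
        token (str): The access token for the NASA Earthdata API.
        dest_folder (str, optional): The destination folder to save the downloaded data files. Defaults to './data/MODIS-LST/'.
        force_download (bool, optional): Passed on to download_file; if True, files are downloaded even if they already exist. Defaults to False.
        verbose (bool, optional): If True, print a message for each skipped link or existing file. Defaults to True.

    Returns:
        list: A list of dictionaries containing the region name, URL, local file path, and granule metadata for each data link.
    """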
    # Authorization header with the token
    headers = {
            'Authorization': f'Bearer {token}'
        }

    # Base URL for searching granules
    search_url = 'https://cmr.earthdata.nasa.gov/search/granules.json'
    
    # Pagination settings
    page_size = 10
    page_num = 1
    total_hits = None

    # List to store entries and links
    entries_links = []

    while True:
        # Set up the parameters for the search query
        params = {
            'short_name': 'MOD11A2',  # Dataset short name
            'version': '061',         # Dataset version
            'temporal': f"{time_range['start']},{time_range['end']}",  # Temporal range
            'bounding_box': f"{bounding_box['SW'][1]},{bounding_box['SW'][0]},{bounding_box['NE'][1]},{bounding_box['NE'][0]}",  # Bounding box coordinates
            'page_size': page_size,   # Number of results per page
            'page_num': page_num      # Current page number
        }
        
        # Send the request to the NASA Earthdata API
        response = requests.get(search_url, params=params, headers=headers)

        if response.status_code == 200:
            # Parse the JSON response
            data = response.json()
            
            # Determine the total number of hits on the first request
            if total_hits is None:
                total_hits = int(response.headers.get('CMR-Hits', 0))
                print(f"Total hits for {region_name}: {total_hits}")

            # Check if there are entries in the response
            if data['feed']['entry']:
                for entry in data['feed']['entry']:
                    # Extract relevant metadata from each entry
                    granule_id = entry.get('id', 'N/A')
                    dataset_id = entry.get('dataset_id', 'N/A')
                    start_time = entry.get('time_start', 'N/A')
                    end_time = entry.get('time_end', 'N/A')
                    spatial_extent = entry.get('boxes', ['N/A'])[0]
                    
                    
                    # Extract the data links for downloading
                    data_links = [link['href'] for link in entry['links'] if 'data#' in link['rel']]
                    
                    # Download each data link and store the entries and links
                    for url in data_links:
                        dir_for_dl = os.path.join(dest_folder, region_name)
                        # Define the local filename based on the URL (used for the skip checks below)
                        filename = os.path.join(dir_for_dl, url.split('/')[-1])
                        
                        # Check if directory
                        if os.path.isdir(filename):
                            if verbose:
                                print(f"- Skipping directory {filename}")
                            continue
                        
                        if "s3credentials" in filename:
                            if verbose:
                                print(f"- Skipping S3 credentials link {filename}")
                            continue
                        
                        if '?p' in filename:
                            if verbose:
                                print(f"- Skipping link with query parameters {filename}")
                            continue
                        # Remove question marks
                        filename = filename.replace("?", "-")
                        
                        # Use the helper function to download the file
                        filepath = download_file(url, dir_for_dl, token, force_download=force_download,
                                                 verbose=verbose)
                        # Add the entry to the list
                        entries_links.append({'region': region_name, 'url': url,"fpath":filepath, 'granule_id': granule_id, 'dataset_id': dataset_id,
                                            'start_time': start_time, 'end_time': end_time, 'spatial_extent': spatial_extent})
            else:
                print(f"\n[!] No entries found for region: {region_name}")

            # Check if we have fetched all results
            if page_num * page_size >= total_hits:
                break
            else:
                page_num += 1
        else:
            print(f"\n[!] Error: {response.status_code} - {response.text}")
            break

    return entries_links
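Two details in the function above are easy to get wrong: the CMR `bounding_box` parameter expects longitude before latitude (west, south, east, north), while our dictionary stores `[lat, lon]` pairs, and the pagination loop stops once `page_num * page_size` reaches the hit count. The sketch below isolates both pieces of logic; the helper names `to_cmr_bounding_box` and `pages_needed` are hypothetical, and the hit count of 23 is made up for illustration.

```python
import math


def to_cmr_bounding_box(box):
    """Convert {'SW': [lat, lon], 'NE': [lat, lon]} to a 'west,south,east,north' string."""
    south, west = box["SW"]
    north, east = box["NE"]
    return f"{west},{south},{east},{north}"


def pages_needed(total_hits, page_size):
    """Number of result pages required to fetch total_hits granules."""
    return math.ceil(total_hits / page_size) if total_hits > 0 else 0


urban_box = {"SW": [29.69193, -95.47998], "NE": [29.90719, -95.2251]}
print(to_cmr_bounding_box(urban_box))  # -95.47998,29.69193,-95.2251,29.90719
print(pages_needed(23, 10))            # 3 (hypothetical hit count)
```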

In [None]:
def download_file(url, dest_folder, token, force_download=False, verbose=True):
    """
    Downloads a file from the given URL and saves it to the specified destination folder.

    Args:
        url (str): The URL of the file to download.
        dest_folder (str): The destination folder where the file will be saved.
        token (str): The authorization token for accessing the file.
        force_download (bool, optional): If set to True, the file will be downloaded even if it already exists in the destination folder. Defaults to False.
        verbose (bool, optional): If True, print a message when an existing file is skipped. Defaults to True.

    Returns:
        str or None: The path of the downloaded (or already existing) file, or None if the download failed.
    """
    # Create the destination folder if it doesn't exist
    os.makedirs(dest_folder, exist_ok=True)
    
    # Define the filename based on the URL
    filename = os.path.join(dest_folder, url.split('/')[-1])
    
    # Check if the file already exists
    if os.path.exists(filename) and not force_download:
        if verbose:
            print(f"- File {filename} already exists, skipping download.")
        return filename

    # Authorization header with the token
    headers = {
        'Authorization': f'Bearer {token}'
    }
    
    try:
        # Send the request to download the file
        response = requests.get(url, headers=headers)
    except requests.RequestException as e:
        print(f"- [!] An error occurred while downloading {url}: {e}")
        return None
    
    # Save the file if the request is successful
    if response.status_code == 200:
        with open(filename, 'wb') as f:
            f.write(response.content)
        print(f"- Downloaded {filename}")
        return filename

    # Report the failure and return None so callers can tell that no file was written
    print(f"- [!] Failed to download {url}: {response.status_code}")
    return None

In [None]:
# Set DATA_DIR using region name from data params
region_name = data_params['city_region_name'].replace(',','_').replace(' ','')
DATA_DIR = f"./data/MODIS-LST/{region_name}/"
DATA_DIR

'./data/MODIS-LST/Atlanta_GA/'

In [None]:
# List to store all entries and links
all_entries_links = []

# Iterate through the regions and download data
for region_name, bounding_box in data_params['coordinates'].items():
    # group_links=  []
    entries_links = search_and_download(region_name, bounding_box, 
                                        data_params['time'], 
                                        token=creds['token'],
                                        dest_folder=DATA_DIR, 
                                        force_download=False, 
                                        verbose=False # new verbose flag
                                        )
    all_entries_links.extend(entries_links)
    print('\n\n')

The processing code for working with the downloaded files is beyond the scope of this blog post. However, suffice it to say that there may be data quality issues or other challenges that need to be addressed. 

For example, the downloaded files may contain a large amount of missing or corrupted data. These issues can be addressed using a combination of data preprocessing techniques, data cleaning methods, and data transformation tools. However, if we can adjust our starting parameters dynamically, we can find parameters with fewer issues, allowing a cleaner analysis.

To do so, we will leverage the OpenAI API with LangChain to query ChatGPT for new suggested parameters from within our notebook.
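To make the missing-data problem mentioned above concrete, here is a rough sketch of how one might measure it for a region. It is not the processing code from this project: it works on a plain list of raw pixel values, and it assumes the MOD11A2 LST layers use a fill value of 0 and a scale factor of 0.02 (raw integer to Kelvin). Check the product user guide before relying on either value. The pixel values are made up.

```python
FILL_VALUE = 0        # assumed fill value marking a missing pixel
SCALE_FACTOR = 0.02   # assumed scale factor (raw integer -> Kelvin)


def summarize_lst(raw_values):
    """Return (missing_fraction, mean_celsius) for a flat list of raw LST integers."""
    if not raw_values:
        return 1.0, None
    valid = [v for v in raw_values if v != FILL_VALUE]
    missing_fraction = 1 - len(valid) / len(raw_values)
    if not valid:
        return missing_fraction, None
    mean_kelvin = sum(valid) / len(valid) * SCALE_FACTOR
    return missing_fraction, mean_kelvin - 273.15


# Made-up raw pixel values for illustration only (0 marks a missing pixel)
raw = [15300, 0, 15400, 0, 15350, 15250]
missing, mean_c = summarize_lst(raw)
print(f"missing: {missing:.0%}, mean LST: {mean_c:.2f} C")
```

If the missing fraction for a suggested region is high, that is a signal to ask for a different region or time range.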

## Getting New Suggested Parameters from OpenAI 

### Setting Up OpenAI + LangChain

1. **Sign up for OpenAI's API:** 
   - Visit the [OpenAI website](https://www.openai.com) and sign up for an API key.

2. **Create a `.secret` folder:**
   ```bash
   cd ~
   mkdir .secret
   ```

3. **Save your API key as a text file in the `.secret` folder:**
   - Open a text editor and paste your API key.
   - Save the file as `open-ai.txt` in the `.secret` folder. For example, you can use the following command in the terminal to create the file and save the API key:
   ```bash
   echo "your_openai_api_key_here" > ~/.secret/open-ai.txt
   ```

4. **Export the key from the file to your `.bash_profile` or `.zshrc`:**
   - Open your `.bash_profile` for editing:
   ```bash
   code ~/.bash_profile
   ```
   If using zsh, use: 
   ```bash
   code ~/.zshrc
   ```
   - Add the following line to export the API key:
   ```bash
   export OPENAI_API_KEY=$(cat ~/.secret/open-ai.txt)
   ```
   - Save the file and exit the editor.

5. **Reload your `.bash_profile` to apply the changes:**
   ```bash
   source ~/.bash_profile
   ```
   If you're using zsh:
   ```bash
   source ~/.zshrc
   ```

After following these steps, your API key will be available in your environment variables as `OPENAI_API_KEY`.

You can confirm this with Python by importing the `os` module and checking the `os.environ` dictionary for 'OPENAI_API_KEY'.


In [1]:
import os
'OPENAI_API_KEY' in os.environ

True


> Note: Do NOT display the value of your OPENAI_API_KEY. If your API credentials are exposed publicly, OpenAI may automatically deactivate them, breaking any program or app that uses them.

### Using LangChain with ChatGPT

In [77]:
# !pip install langchain_openai langchain_core langchain_community


Prompt engineering/construction is vital for obtaining high-quality results from any Large Language Model (LLM). To get the best suggestions from the API, it is important to provide sufficient context in our prompt/query.

In [81]:
from langchain_openai import ChatOpenAI 
chat = ChatOpenAI(api_key=os.environ['OPENAI_API_KEY'], model="gpt-4o", temperature=0.0)

prompt = """I am performing an urban heat island analysis project with MODIS data comparing urban areas vs. rural areas. 
I need to download MODIS data for 2 nearby non-overlapping regions (urban area and rural area outside of city) and time range.
Help me select the urban and rural regions and time.
"""
response = chat.invoke(prompt)
response

AIMessage(content='Performing an urban heat island (UHI) analysis using MODIS data is a great way to understand the temperature differences between urban and rural areas. Here’s a step-by-step guide to help you select the regions and time range for your analysis:\n\n### Step 1: Select Urban and Rural Regions\n\n1. **Identify the Urban Area:**\n   - Choose a city that is well-documented and has a significant urban footprint. For example, you could select a major city like New York City, Los Angeles, or Chicago in the United States.\n   - Use geographic coordinates to define the urban area. For instance, for New York City, you might use coordinates that encompass the entire metropolitan area.\n\n2. **Identify the Rural Area:**\n   - Select a rural area that is sufficiently far from the urban area to avoid overlap but close enough to have similar climatic conditions.\n   - Ensure the rural area is predominantly non-urban, with minimal human development. For example, you could choose a rur

In [80]:
# ChatGPT responds with Markdown-styled text, so we can use IPython's `Markdown` class to render it
from IPython.display import Markdown, display
display(Markdown(response.content))

Certainly! Conducting an Urban Heat Island (UHI) analysis using MODIS (Moderate Resolution Imaging Spectroradiometer) data is a great approach. Here’s a step-by-step guide to help you select urban and rural regions and the time range for your analysis.

### Step 1: Selecting Urban and Rural Regions

**Urban Region:**
1. **Identify a City:** Choose a city known for having a significant urban heat island effect. For example, you could select cities like New York, Los Angeles, Beijing, or Delhi.
2. **Define the Urban Area:** Use a shapefile or a bounding box that covers the core urban area of the city. You may refer to urban extents or city boundaries available from sources such as the Global Urban Footprint (GUF) or the Global Human Settlement Layer (GHSL).

**Rural Region:**
1. **Select a Nearby Rural Area:** Choose an area outside the urban boundary with minimal human impact (e.g., farmlands, forests). Ensure this region is at least 10-20 km away from the urban boundary to avoid overlapping influences.
2. **Define the Rural Area:** Use a similar shapefile or bounding box approach to define the rural area.

### Example Selections:
- **Urban Region:**
  - City: New York City, USA
  - Bounding Box: (40.4774° N, -74.2591° W) to (40.9176° N, -73.7004° W)

- **Rural Region:**
  - Nearby rural area: Rural region outside New York City, towards the northwest
  - Bounding Box: (41.0° N, -74.5° W) to (41.5° N, -74.0° W)

### Step 2: Selecting the Time Range

1. **Seasonal Analysis:** UHI effects are often more pronounced during the summer. Choose a summer period to observe the maximum temperature differences.
2. **Multi-Year Analysis:** To observe trends, it’s beneficial to use data from multiple years. A common range is over a few years, such as 2015-2020.

### Example Time Range:
- **Start Date:** June 1, 2015
- **End Date:** August 31, 2020

### Step 3: Downloading MODIS Data

1. **MODIS Products:** For UHI analysis, commonly used MODIS products include:
   - **MOD11A2:** MODIS/Terra Land Surface Temperature/Emissivity 8-Day L3 Global 1km SIN Grid
   - **MYD11A2:** MODIS/Aqua Land Surface Temperature/Emissivity 8-Day L3 Global 1km SIN Grid

2. **Data Access:**
   - Visit NASA’s Earthdata website [https://earthdata.nasa.gov/](https://earthdata.nasa.gov/)
   - Use the Earthdata Search tool [https://search.earthdata.nasa.gov/](https://search.earthdata.nasa.gov/)
   - Log in with your Earthdata account (create one if you don't have it).
   - Search for MOD11A2 and MYD11A2 products.
   - Define your spatial and temporal extents using the bounding box and time range selected above.
   - Download data for further analysis.

### Step 4: Analysis

1. **Preprocessing:** Convert MODIS HDF files to a usable format (e.g., GeoTIFF). Use tools like GDAL or software like QGIS.
2. **Extracting Data:** Extract land surface temperature data for your urban and rural regions.
3. **Comparative Analysis:** Compare the temperature data between urban and rural regions to quantify the UHI effect.

This should help you get started with your UHI analysis project using MODIS data. If you have any more specific questions or need further assistance, feel free to ask!

### Reproducible and Customizable Queries/Prompts

We got a lot of useful information from our prompt:
- New coordinates are included in the result.
- A new time period is included as well.

However, there are 2 issues to resolve:
- We would have to keep tweaking and re-running the whole prompt by hand to get updated results.
- We received a large amount of unnecessary text, when all we want is the new values for our `data_params` dictionary.


Therefore, we will construct a detailed system prompt that includes a `query` parameter, which is where we will add additional details to our requests.


### Building a Prompt Template Chain

In [63]:
## Create a chain for the language model

# The prompt template for suggesting data parameters
prompt = """
I am performing an urban heat island analysis project with MODIS data comparing urban areas vs. rural areas. 
I need to download MODIS data for 2 nearby non-overlapping regions (urban area and rural area outside of city) and time range.
Help me select the urban and rural regions and time following the instructions below.
{query}
Provide me the data parameters for the download (city_region_name, coordinates as SW [lat,long] NE [lat,long], time_start named 'start', time_end named 'end'):
"""


Instead of using f-strings to construct the final prompt for the language model, we can use the `langchain` library to create an LLM chain that automatically formats the prompt with the query provided by the user. We do so by converting our string prompt to a `PromptTemplate` object using the `.from_template` method.
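Conceptually, a prompt template is a string with named placeholders that are filled in at call time. The sketch below is a plain-Python analogy using only the standard library; it is not LangChain's implementation, and the query text is an example that does not come from the original notebook.

```python
from string import Formatter

template = (
    "Help me select the urban and rural regions and time "
    "following the instructions below.\n{query}"
)

# List the placeholder names found in the template
placeholders = [name for _, name, _, _ in Formatter().parse(template) if name]
print(placeholders)  # ['query']

# Fill the placeholder with a user query (example text only)
filled = template.format(query="Select a city in the southern USA.")
print(filled)
```

A template object adds bookkeeping on top of this idea, such as tracking which variables are still unfilled and plugging into a chain with the `|` operator, which is what the cells that follow rely on.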


In [66]:
from langchain_core.prompts import PromptTemplate


In [67]:

from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from pydantic import BaseModel, Field, create_model
from typing import List, Optional, Text, Dict

# Create a PromptTemplate object
final_prompt_template = PromptTemplate.from_template(prompt)


In [None]:

# Get the API key for OpenAI from the environment
api_key = os.getenv('OPENAI_API_KEY')

# Model name and temperature (creativity level); same values as the earlier cell
model_type = "gpt-4o"
temperature = 0.0

# Instantiate the language model
llm = ChatOpenAI(temperature=temperature, model=model_type, api_key=api_key)

## StrOutputParser will return the response as a string
parser = StrOutputParser()

# Making the final chain
llm_chain = final_prompt_template | llm | parser


In [None]:

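# Invoke the chain with a query to get the response
# (the query text below is only an example)
query = "Select a region in the southern USA and a summer time range."
response = llm_chain.invoke(input=dict(query=query))
response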

### Structured Output

We can use a Pydantic data model to control the structure of the API parameters returned by ChatGPT.

In [None]:
## Defining the structured output desired from ChatGPT with nested Pydantic models
# Reference for nested dictionaries in Pydantic:
# https://stackoverflow.com/questions/63257839/best-way-to-specify-nested-dict-with-pydantic
class Coordinates(BaseModel):
    SW: List[float]
    NE: List[float]
    
class RegionCoordinates(BaseModel):
    rural: Optional[Coordinates]
    urban: Optional[Coordinates]
                            
class DataParams(BaseModel):
    """
    Represents the parameters for data analysis.

    Attributes:
        city_region_name (str): The name of the city or region.
        coordinates (Optional[RegionCoordinates]): The coordinates of the city or region.
        time (Dict[str, str]): A dictionary containing time-related information.
    """
    city_region_name: str
    coordinates: Optional[RegionCoordinates]
    time: Dict[str, str]

### Creating a Function That Asks ChatGPT for Suggested Parameters

In [None]:
def suggest_data_params(query: str, temperature=0.0, model_type='gpt-4o',
                       return_llm = False, return_json=True):
    """
    Suggests data parameters for downloading MODIS data for a specific region and time range.
    
    Args:
        query (str): The query describing the requirements for the data download.
        temperature (float, optional): The temperature parameter for the language model. Defaults to 0.0.
        model_type (str, optional): The type of language model to use. Defaults to 'gpt-4o'.
        return_llm (bool, optional): Whether to return the language model chain. Defaults to False.
        return_json (bool, optional): Whether to return the response as JSON. Defaults to True.
    
    Returns:
        dict, str, or chain: The parsed JSON response as a dictionary (return_json=True), the text response
        (return_json=False), or the chain itself without invoking it (return_llm=True).
    """
    
    # The prompt template for suggesting data parameters
    prompt = """
    I am performing an urban heat island analysis project with MODIS data comparing urban areas vs. rural areas. 
    I need to download MODIS data for 2 nearby non-overlapping regions (urban area and rural area outside of city) and time range.
    Help me select the urban and rural regions and time following the instructions below.
    {query}
    
    Provide me the data parameters for the download (city_region_name, coordinates as SW [lat,long] NE [lat,long], time_start named 'start', time_end named 'end') in the following format:
    Format Instructions:
    Use the 2-letter abbreviations for the state.
    {format_instructions}
    """
    # Create a PromptTemplate object
    final_prompt_template = PromptTemplate.from_template(prompt)

    # Get api key for OpenAI from the Streamlit session state (if running on Streamlit), else from the environment
    try:
        api_key = st.session_state.OPENAI_API_KEY
    except Exception:
        # Streamlit is not imported (NameError) or the key is not in the session state
        api_key = os.getenv('OPENAI_API_KEY')
        
    # Instantiate the language model with the chosen model name
    # and temperature (creativity level)
    llm = ChatOpenAI(temperature=temperature, model=model_type, api_key=api_key)
    
    if return_json:
        # JsonOutputParser will use the data model classes from above
        parser = JsonOutputParser(pydantic_object=DataParams)
        # Add formatting instructions for pydantic
        instructions = parser.get_format_instructions()
            
    else:
        ## StrOutputParser will return the response as a string
        parser = StrOutputParser()
        # Manually defining the format instructions
        instructions = "Respond with plain text listing the city/region name, the urban and rural bounding boxes, and the time range."
        
        
    ## Adding the instructions to the prompt template
    final_prompt_template = final_prompt_template.partial(format_instructions=instructions)
    
    
    # Making the final chain
    llm_chain = final_prompt_template | llm | parser
    
    # Return the chain if specified
    if return_llm:
        return llm_chain
    else:
    
        # Invoke the chain with the query to get the response
        response = llm_chain.invoke(input=dict(query=query))
        return response

In [None]:
# Set to False to reuse the parameters already saved in fpath_params
GET_NEW_LOCATION = True
# Where we are storing our parameters
fpath_params = "./config/data_params.json"

# If we want a new location
if GET_NEW_LOCATION:
    prompt = """Select a region that will be a perfect example of the effects of urban heat islands.
    Select a region in the southern USA to avoid political bias/spin 
    and a time range to highlight the effects of climate change (like 06/01/2023-08/31/2023).
    Make sure to select a region that does not cover a body of water.
    Select small identically-sized nearby non-overlapping regions from the selected area to minimize the size of the dataset.
    Do not use Texas."""

    # ask ChatGPT to suggest another set of parameters
    chatgpt_params = suggest_data_params(query=prompt, 
                                        return_json=True, temperature=0.0)


else:
    # otherwise, use the parameters we already have
    with open(fpath_params) as f:
        chatgpt_params = json.load(f)

chatgpt_params

{'city_region_name': 'Atlanta, GA',
 'coordinates': {'urban': {'SW': [33.640411, -84.442575],
   'NE': [33.790411, -84.292575]},
  'rural': {'SW': [33.290411, -84.842575], 'NE': [33.440411, -84.692575]}},
 'time': {'start': '2023-06-01', 'end': '2023-08-31'},
 'coordinates_lat_lon': {'urban': {'lat': [33.640411, 33.790411],
   'lon': [-84.442575, -84.292575]},
  'rural': {'lat': [33.290411, 33.440411], 'lon': [-84.842575, -84.692575]}}}
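The prompt asks for identically-sized, non-overlapping regions, but nothing guarantees the model complies, so it is worth checking the returned boxes before downloading anything. The following is a minimal sketch with hypothetical helper names (`boxes_overlap`, `box_size`, `same_size`), using the coordinates from the output above.

```python
import math


def boxes_overlap(a, b):
    """True if two {'SW': [lat, lon], 'NE': [lat, lon]} boxes share any area."""
    lat_overlap = a["SW"][0] < b["NE"][0] and b["SW"][0] < a["NE"][0]
    lon_overlap = a["SW"][1] < b["NE"][1] and b["SW"][1] < a["NE"][1]
    return lat_overlap and lon_overlap


def box_size(box):
    """Return (height, width) of a box in degrees."""
    return box["NE"][0] - box["SW"][0], box["NE"][1] - box["SW"][1]


def same_size(a, b, tol=1e-6):
    """True if both boxes have the same height and width in degrees, within tol."""
    (ha, wa), (hb, wb) = box_size(a), box_size(b)
    return math.isclose(ha, hb, abs_tol=tol) and math.isclose(wa, wb, abs_tol=tol)


coords = {
    "urban": {"SW": [33.640411, -84.442575], "NE": [33.790411, -84.292575]},
    "rural": {"SW": [33.290411, -84.842575], "NE": [33.440411, -84.692575]},
}
print(boxes_overlap(coords["urban"], coords["rural"]))  # False
print(same_size(coords["urban"], coords["rural"]))      # True
```

Sizes are compared in degrees, which is only an approximation of equal ground area; for two nearby boxes at similar latitudes the difference is small.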

In [None]:

data_params = {'city_region_name': 'Atlanta, GA',
               'coordinates': {'urban': {'SW': [33.640411, -84.442575],
                                         'NE': [33.790411, -84.292575]},
                               'rural': {'SW': [33.290411, -84.842575],
                                         'NE': [33.440411, -84.692575]}},
               'time': {'start': '2023-06-01', 'end': '2023-08-31'},
               }
data_params
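The earlier cell that loads `./config/data_params.json` assumes the suggested parameters were saved at some point, but the saving step is not shown. One way to do it is sketched below, assuming a plain JSON file; the helper names `save_params` and `load_params` are hypothetical, and the demo writes to a temporary folder rather than the project directory.

```python
import json
import os
import tempfile


def save_params(params, fpath):
    """Write the parameter dictionary to fpath as JSON, creating folders as needed."""
    folder = os.path.dirname(fpath)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with open(fpath, "w") as f:
        json.dump(params, f, indent=2)


def load_params(fpath):
    """Read a parameter dictionary back from a JSON file."""
    with open(fpath) as f:
        return json.load(f)


# Demo in a temporary folder so nothing is written to the project directory
example_params = {
    "city_region_name": "Atlanta, GA",
    "time": {"start": "2023-06-01", "end": "2023-08-31"},
}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "config", "data_params.json")
    save_params(example_params, demo_path)
    reloaded = load_params(demo_path)

print(reloaded == example_params)  # True
```

In the notebook, the equivalent call would be `save_params(chatgpt_params, fpath_params)` right after a new suggestion is accepted.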