### Application

In [98]:
mock_data = {
  "data": {
    "collection": {
      "abstract": "MiCASA is an extensive revision of CASA-GFED3. CASA-GFED3 derives from Potter et al. (1993), diverging in development since Randerson et al. (1996). CASA is a light use efficiency model: NPP is expressed as the product of photosynthetically active solar radiation, a light use efficiency parameter, scalars that capture temperature and moisture limitations, and fractional absorption of photosynthetically active radiation (fPAR) by the vegetation canopy derived from satellite data. Fire parameterization was incorporated into the model by van der Werf et al. (2004) leading to CASA-GFED3 after several revisions (van der Werf et al., 2006, 2010). Development of the GFED module has continued, now at GFED5 (Chen et al., 2023) with less focus on the CASA module. MiCASA diverges from GFED development at version 3, although future reconciliation is possible. Input datasets include air temperature, precipitation, incident solar radiation, a soil classification map, and several satellite derived products. These products are primarily based on Moderate Resolution Imaging Spectroradiometer (MODIS) Terra and Aqua combined datasets including land cover classification (MCD12Q1), burned area (MCD64A1), Nadir BRDF-Adjusted Reflectance (NBAR; MCD43A4), from which fPAR is derived, and tree/herbaceous/bare vegetated fractions from Terra only (MOD44B). Emissions due to fire and burning of coarse woody debris (fuel wood) are estimated separately. ",
      "archiveAndDistributionInformation": {
        "fileArchiveInformation": [
          {
            "format": "netCDF",
            "averageFileSize": 10,
            "averageFileSizeUnit": "MB"
          }
        ]
      },
      "associatedDois": None,
      "boxes": [
        "-90 -180 90 179"
      ],
      "cloudHosted": True,
      "conceptId": "C3273639213-GES_DISC",
      "coordinateSystem": "CARTESIAN",
      "dataCenter": "GES_DISC",
      "dataCenters": [
        {
          "roles": [
            "ARCHIVER"
          ],
          "shortName": "NASA/GSFC/SED/ESD/TISL/GESDISC",
          "longName": "Goddard Earth Sciences Data and Information Services Center (formerly Goddard DAAC), Terrestrial Information Systems Laboratory, Earth Sciences Division, Science and Exploration Directorate, Goddard Space Flight Center, NASA",
          "contactInformation": {
            "relatedUrls": [
              {
                "urlContentType": "DataCenterURL",
                "type": "HOME PAGE",
                "url": "https://disc.gsfc.nasa.gov",
                "description": "NASA GES DISC Website"
              }
            ]
          },
          "contactGroups": [
            {
              "roles": [
                "Data Center Contact"
              ],
              "groupName": "GES DISC HELP DESK SUPPORT GROUP",
              "contactInformation": {
                "addresses": [
                  {
                    "streetAddresses": [
                      "Goddard Earth Sciences Data and Information Services Center",
                      "Code 610.2",
                      "NASA Goddard Space Flight Center"
                    ],
                    "city": "Greenbelt",
                    "stateProvince": "MD",
                    "postalCode": "20771",
                    "country": "USA"
                  }
                ],
                "contactMechanisms": [
                  {
                    "type": "Telephone",
                    "value": "301-614-5224"
                  },
                  {
                    "type": "Email",
                    "value": "gsfc-dl-help-disc@mail.nasa.gov"
                  }
                ]
              }
            }
          ],
          "contactPersons": [
            {
              "roles": [
                "Data Center Contact"
              ],
              "firstName": "Kristan",
              "lastName": "Morgan",
              "contactInformation": {
                "addresses": [
                  {
                    "city": "Greenbelt",
                    "stateProvince": "MD",
                    "postalCode": "20771",
                    "country": "USA"
                  }
                ],
                "contactMechanisms": [
                  {
                    "type": "Email",
                    "value": "kristan.l.morgan@nasa.gov"
                  }
                ]
              }
            }
          ]
        }
      ],
      "directDistributionInformation": {
        "region": "us-west-2",
        "s3CredentialsApiEndpoint": "https://data.gesdisc.earthdata.nasa.gov/s3credentials",
        "s3CredentialsApiDocumentationUrl": "https://data.gesdisc.earthdata.nasa.gov/s3credentialsREADME",
        "s3BucketAndObjectPrefixNames": [
          "s3://gesdisc-cumulus-prod-protected/CMS/MICASA_FLUX_D.1/"
        ]
      },
      "doi": {
        "doi": "10.5067/ZBXSA1LEN453"
      },
      "duplicateCollections": {
        "count": 0,
        "items": []
      },
      "hasGranules": True,
      "lines": None,
      "nativeDataFormats": [],
      "points": None,
      "polygons": None,
      "relatedUrls": [
        {
          "url": "https://docserver.gesdisc.eosdis.nasa.gov/public/project/CMS/micasa_v1_sample.jpg",
          "urlContentType": "VisualizationURL",
          "type": "GET RELATED VISUALIZATION"
        },
        {
          "url": "https://disc.gsfc.nasa.gov/datacollection/MICASA_FLUX_D_1.html",
          "description": "Access the dataset landing page from the GES DISC website.",
          "type": "DATA SET LANDING PAGE",
          "urlContentType": "CollectionURL"
        },
        {
          "url": "https://acdisc.gsfc.nasa.gov/data/CMS/MICASA_FLUX_D.1/",
          "description": "Access the data via HTTPS.",
          "subtype": "DATA TREE",
          "type": "GET DATA",
          "urlContentType": "DistributionURL"
        },
        {
          "url": "https://acdisc.gsfc.nasa.gov/opendap/CMS/MICASA_FLUX_D.1/",
          "description": "Access the data via the OPeNDAP protocol.",
          "subtype": "OPENDAP DATA",
          "type": "USE SERVICE API",
          "urlContentType": "DistributionURL"
        },
        {
          "url": "https://acdisc.gsfc.nasa.gov/data/CMS/MICASA_FLUX_D.1/doc/MiCASA_README.pdf",
          "description": "README Document",
          "subtype": "READ-ME",
          "type": "VIEW RELATED INFORMATION",
          "urlContentType": "PublicationURL"
        },
        {
          "url": "carbon.nasa.gov",
          "description": "The NASA Carbon Monitoring System (CMS) page.",
          "type": "PROJECT HOME PAGE",
          "urlContentType": "CollectionURL"
        },
        {
          "url": "https://search.earthdata.nasa.gov/search?q=MICASA_FLUX_D",
          "description": "Use the Earthdata Search to find and retrieve data sets across multiple data centers.",
          "subtype": "Earthdata Search",
          "type": "GET DATA",
          "urlContentType": "DistributionURL"
        }
      ],
      "relatedCollections": {
        "count": 0,
        "items": []
      },
      "scienceKeywords": [
        {
          "category": "EARTH SCIENCE",
          "topic": "CLIMATE INDICATORS",
          "term": "CARBON FLUX"
        }
      ],
      "shortName": "MICASA_FLUX_D",
      "spatialExtent": {
        "granuleSpatialRepresentation": "CARTESIAN",
        "horizontalSpatialDomain": {
          "geometry": {
            "coordinateSystem": "CARTESIAN",
            "boundingRectangles": [
              {
                "westBoundingCoordinate": -180,
                "northBoundingCoordinate": 90,
                "eastBoundingCoordinate": 179,
                "southBoundingCoordinate": -90
              }
            ]
          }
        }
      },
      "tags": {
        "edsc.extra.serverless.collection_capabilities": {
          "data": {
            "cloud_cover": False,
            "day_night_flag": False,
            "granule_online_access_flag": True,
            "orbit_calculated_spatial_domains": False,
            "updated_at": "2025-07-01T18:47:33.478Z"
          }
        }
      },
      "temporalExtents": [
        {
          "rangeDateTimes": [
            {
              "beginningDateTime": "2001-01-01T00:00:00.000Z",
              "endingDateTime": "2024-12-31T23:59:59.999Z"
            }
          ],
          "endsAtPresentFlag": False
        }
      ],
      "timeStart": "2001-01-01T00:00:00.000Z",
      "timeEnd": "2024-12-31T23:59:59.999Z",
      "tilingIdentificationSystems": None,
      "title": "MiCASA Daily NPP Rh ATMC NEE FIRE FUEL Fluxes 0.1 degree x 0.1 degree",
      "versionId": "1",
      "services": {
        "count": 0,
        "items": []
      },
      "granules": {
        "count": 8766,
        "items": [
          {
            "conceptId": "G3274577363-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274574213-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274574067-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274577258-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274576831-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274576739-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274573924-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274576758-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274577354-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274577305-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274576674-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274577352-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274577345-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274573921-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274577067-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274574064-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274576933-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274577251-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274577311-GES_DISC",
            "onlineAccessFlag": True
          },
          {
            "conceptId": "G3274576621-GES_DISC",
            "onlineAccessFlag": True
          }
        ]
      },
      "subscriptions": {
        "count": 0,
        "items": []
      },
      "tools": {
        "count": 0,
        "items": []
      },
      "variables": {
        "count": 0,
        "cursor": None,
        "items": []
      }
    }
  }
}

In [474]:
from langgraph.graph import StateGraph, END, START
from langgraph.prebuilt import create_react_agent
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder



In [475]:
llm = init_chat_model("google_genai:gemini-2.0-flash")

In [610]:
# craete a data class
from typing import Optional, List, Dict, Any
from pydantic import BaseModel, Field

# this is the data that we will form.
# this will be used to create the config.json needed for airflow
# Note: everything is flattened, for simplicity.

class ConfigVars(BaseModel):
  """
  Always use this tool to structure your response to the user.
  """
  # Note that:
    # 1. Each field is an `optional` -- this allows the model to decline to extract it!
    # 2. Each field has a `description` -- this description is used by the LLM.
    # 3. Fields can have a `default` -- this default should be used if available.
    # Having a good description can help improve extraction results.

  # --- Top Level ---
  # type: Optional[str] = Field(description="The type of data", default="Collection")
  title: Optional[str] = Field(description="The title of the STAC Collection")
  collection: Optional[str] = Field(description="The Collection Id which is same as the concept id")
  description: Optional[str] = Field(description="The description of the STAC Collection") 
  
  # # --- Extent ---
  extent__spatial__bbox: Optional[List[List[float]]] = Field(description="The bounding box of the collection's spatial coverage in the form of four corners of the bbox.")
  extent__temporal__interval: Optional[List[List[str]]] = Field(description="The start and end datetimes of the collection's temporal coverage in a list of string as [start,end]")
 
  # --- Dashboard --- #
  dashboard_is_periodic: Optional[bool] = Field(description="Flag indicating if the collection's data is periodic over time over the granules of the collection. If not known, leave it as None")

  ## below commented for the MVP test. TODO: once the tools are created on the data filler to fill the following attributes, uncomment them.
  # dashboard_time_density:  Optional[str] = Field(description="The temporal density of the data (e.g., 'day', 'month'). Use concept_id to get the granules. And then find the density.")

  # # --- Cube Dimensions (Static: lon, lat, time) ---
  # # cube_dimensions__lon__axis:  Optional[str] = Field(description="The axis of the longitude dimension (e.g., 'x').",default  = 'x')
  # # cube_dimensions__lon__description:  Optional[str] = Field(description="Description of the longitude dimension ",default = "Latitude")
  # # cube_dimensions__lon__extent: Optional[List[float]]  = Field(description="The extent [min, max] of the longitude dimension in the reference system", default=[-180, 180])
  # # cube_dimensions__lon__reference_system: Optional[int] = Field(description="The coordinate reference system (e.g., EPSG:4326).", default="4326")
  # # cube_dimensions__lon__type:  Optional[str] = Field( description="The type of the dimension, typically 'spatial'.", default  = 'spatial')

  # # cube_dimensions__lat__axis:  Optional[str] = Field(description="The axis of the latitude dimension (e.g., 'y').",default = 'y')
  # # cube_dimensions__lat__description:  Optional[str] = Field(description="Description of the latitude dimension.",default = 'Longitude')
  # # cube_dimensions__lat__extent: Optional[List[float]]  = Field(description="The extent [min, max] of the latitude dimension in the refrence system", default=[-90, 90])
  # # cube_dimensions__lat__reference_system:  Optional[str] = Field( description="The coordinate reference system (e.g., EPSG:4326).", default="4326")
  # # cube_dimensions__lat__type:  Optional[str] = Field(description="The type of the dimension, typically 'spatial'.",  default  = 'spatial')

  # # cube_dimensions__time__description:  Optional[str] = Field( description="Description of the time dimension.", default = 'time')
  # cube_dimensions__time__extent: Optional[List[str]] = Field(description="The start and end datetimes of the collection's temporal coverage in a list of string as [start,end]")
  # cube_dimensions__time__step:  Optional[str] = Field(description="The temporal resolution step (e.g., 'P1D' for one day,'P1M' for one month). Use concept_id to get the granules. And then find the time step.")
  # # cube_dimensions__time__type:  Optional[str] = Field(description="The type of the dimension, typically 'temporal'.", default  = 'temporal')
  

  # cube_variables__NPP__description:  Optional[str] = Field(description="The description of the data bands variable. Is available on the header of the nc granule data.")
  # # cube_variables__NPP__dimensions:  Optional[List[str]] = Field(description="The available dimensions for the data bands variable.", default=["time","lat","lon"])
  # cube_variables__NPP__type:  Optional[str] = Field(description="The type represented by this bands variable")
  # cube_variables__NPP__units:  Optional[str] = Field(description="unit of the data bands variable. It is available on the header of the nc granule data.")

  # cube_variables__lat_bnds__description:  Optional[str] = Field(description="The description of the lat bands. Is available on the header of the nc granule data.")
  # # cube_variables__lat_bnds__dimensions:  Optional[List[str]] = Field(description="The available dimensions for the lat bands.", default=["lon", "nv"])
  # cube_variables__lat_bnds__type:  Optional[str] = Field(description="The type represented by this lat bands variable")
  # cube_variables__lat_bnds__units:  Optional[str] = Field(description="unit of the lat bands. It is available on the header of the nc granule data.")

  # cube_variables__lon_bnds__description:  Optional[str] = Field(description="The description of the lon bands. Is available on the header of the nc granule data.")
  # # cube_variables__lon_bnds__dimensions:  Optional[List[str]] = Field(description="The available dimensions for the lon bands.", default=["lat","nv"])
  # cube_variables__lon_bnds__type:  Optional[str] = Field(description="The type represented by this lon bands variable")
  # cube_variables__lon_bnds__units:  Optional[str] = Field(description="unit of the lon bands. It is available on the header of the nc granule data.")

  # cube_variables__time_bnds__description:  Optional[str] = Field(description="The description of the time band. Is available on the header of the nc granule data.")
  # # cube_variables__time_bnds__dimensions:  Optional[List[str]] = Field(description="The available dimensions for the time band.", default=["lon", "nv"])
  # cube_variables__time_bnds__type:  Optional[str] = Field(description="The type represented by this time band variable")
  # cube_variables__time_bnds__units:  Optional[str] = Field(description="unit of the time band. It is available on the header of the nc granule data.")


  # item_assets__data__description:  Optional[str] = Field(description="The description of the data bands variable. get it from the pdf attached to the granules.")
  # # item_assets__data__roles: Optional[List[str]]= Field(description="The role of the data bands variable", default=["data", "layers"])
  # item_assets__data__title:  Optional[str] = Field(description="The title of the data varialble. get it from the pdf attached to the granules.")
  # # item_assets__data__type:  Optional[str] = Field(description="The type of data. It is a constant for the Indexed CMR.", default="application/netcdf")
  
  # # renders__NPP__backend:  Optional[str] = Field(description="For CMR indexing into STAC, backend will be xarray", default="xarray") # default commented for now
  # renders__NPP__colormap_name:  Optional[str] = Field(description="This is a variable that represents the colormap for the given data bands variable. get it from the concept json.")
  # renders__NPP__resampling: Optional[str] = Field(description="This is the resampling value that represents the resampling value of the given data variale. get it from the concept json.")
  # renders__NPP__resacale: Optional[List[List[float]]] = Field(description="This is the rescale value for the given data bands variable. get it from the histogram for the provided variable.")
  # renders__NPP__title:  Optional[str] = Field(description="This is the title of the NPP data bands variable.")
  
  # ## defaults commented for now
  # # providers: Optional[List[Dict[str, str]]] = Field(description="The name of the data providers", default={"name": "NASA"})
  # # stac_version:  Optional[str] = Field(description="The STAC version", default="1.0.0")
  # # s_license:  Optional[str] = Field(description="The license of the STAC data", default="CC0-1.0")
  # # links: Optional[List[Any]] = Field(description="The links when the STAC is generated. Leave it as a Empty Array as it will be generated later on.", default=[])
  # # stac_extensions: Optional[List[str]] = Field(description="The list of STAC extensions used in the STAC", default=["https://stac-extensions.github.io/render/v1.0.0/schema.json", "https://stac-extensions.github.io/item-assets/v1.0.0/schema.json"])
  

In [611]:
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
import operator

# here along with the messaege, the graph node will share the configs.
# whatever is missing will be filled in the process by nodes/agents in the graph.
class AgentState(TypedDict):
  messages: Annotated[list, operator.add]
  configs: ConfigVars

In [612]:
# node 1
def create_flat_config(state: AgentState) -> AgentState:
  """
  This node will be responsible for creation of the pydantic enforced ConfigState.
  The values that are not available in the input CMR json should be assigned as None.
  The values that are available in CMR and corresponds to the ConfigVars should be filled up.
  """
  # TODO: validate the input CMR json to match the CMR pydentic schema. If error, throw that instead.
  # The message content is expected to have the CMR json
  model = llm.with_structured_output(schema=ConfigVars)

  structuring_prompt_template = ChatPromptTemplate.from_messages(
    [
      ( "system",
        "You are an expert extraction algorithm."
        "Only extract relevant information from the structured data."
        "If you do not know the value of an attribute asked to extract,"
        "return null for the attribute's value"
      ),
      ( "human", "{data}" )
    ]
  )

  prompt = structuring_prompt_template.invoke({ "data": state["messages"][0]["content"] })

  response = model.invoke(prompt)

  # TODO: validate if the response is as per the pydantic defination of ConfigVars.

  return {
    "messages": state["messages"],
    "configs": response
  }


Tools Start


In [None]:
# tools for node 2, which is a react agent

from langchain_core.tools import tool
import requests
from datetime import datetime, timedelta

@tool
def find_dashboard_is_periodic(concept_id: str) -> bool:
  """
  Provided the concept_id from cmr or collection from configVars,
  this tool provides if the data is periodic or not.
  """
  def get_granules_information(concept_id):
    result  = get_full_collection_details(concept_id)
    collection = result.get('data',{}).get('collection',{})
    has_granules = collection.get('hasGranules',{})
    granules = collection.get('granules',{})
    granules_count = granules.get('count',{})
    return has_granules,granules_count
  
  def check_for_periodicity(concept_id: str,page_num:int,page_size:int=10) -> bool:
    """
    Provided the concept_id from cmr, use it to get the list of granules.
    based on the granules, figure out the periodicity.
    """
    # CMR API endpoint for searching granules
    CMR_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"
    
    # Parameters for the API request
    # We sort by start date to easily calculate time differences
    params = {
        "collection_concept_id": concept_id,
        "sort_key": "start_date",
        "page_size": page_size ,
        "page_num":page_num
    }

    try:
      response = requests.get(CMR_URL, params=params)
      response.raise_for_status() # Raise an exception for bad status codes
      data = response.json()
      
      # Extract the start times from the response
      granules = data.get("feed", {}).get("entry", [])
      start_times = [g['time_start'] for g in granules]

      # We need at least 3 granules to reliably determine a period
      if len(start_times) < 3:
          return False

      # Convert time strings to datetime objects
      date_times = [datetime.fromisoformat(time.replace('Z', '+00:00')) for time in start_times]
      
      # Calculate the initial time difference
      initial_delta = date_times[1] - date_times[0]

      # If the initial delta is zero, it cannot be periodic
      if initial_delta.total_seconds() == 0:
          return False

      # Check if the remaining deltas are consistent with the initial one
      for i in range(1, len(date_times) - 1):
          current_delta = date_times[i+1] - date_times[i]
          # Allow a small tolerance (e.g., 1%) for minor variations
          if not (abs(current_delta.total_seconds() - initial_delta.total_seconds()) < 0.01 * initial_delta.total_seconds()):
              return False
              
      # If all time differences are consistent, it's periodic
      return True

    except requests.exceptions.RequestException as e:
      print(f"API request failed: {e}")
      return False
    except (ValueError, KeyError) as e:
      print(f"Failed to parse data: {e}")
      return False
  
  def check_sampled_periodicity(concept_id: str, total_count: int,page_size:int=10):
      """
      Checks for periodicity by sampling granules from the start, middle, and end.
      """
      if total_count < 15:
          print("Not enough granules to perform a sampled check.")
          return False

      # 1. Define pages to fetch
      start_page = 1
      middle_page = (total_count // page_size) // 2
      end_page = (total_count // page_size)
      
      # 2. Fetch granules from all three pages
      start_granules = check_for_periodicity(concept_id, start_page, page_size)
      middle_granules = check_for_periodicity(concept_id, middle_page, page_size)
      end_granules = check_for_periodicity(concept_id, end_page, page_size)
    
      # 3. Combine and process the timestamps
      all_granules = start_granules and middle_granules and end_granules
      if all_granules:
          print("\nPeriodicity is consistent across the entire dataset.")
          return True
      else:
          return False 
  
  has_granules,granules_count=get_granules_information(concept_id)
  page_size = 10
  if has_granules:
    is_periodic = check_sampled_periodicity(concept_id,granules_count,page_size)
    return is_periodic
  else :
    return False

  
# many more tools 

Tools End

In [622]:
# node 2
def call_filler_react_agent(state: AgentState):
  """
  for the missing value in the state.configs variables,
  try to fill in the value using available tools.
  Maybe create a template with system message to command it to do so
  """
  print("node 3 >>>>>>", state)
  
  tools = [find_dashboard_is_periodic]
  llm.bind_tools(tools)
  # model = llm.with_structured_output(schema=ConfigVars)
  filler_react_agent = create_react_agent(llm, tools)

  system_message = """
  You are a data expert designed to populate a configuration object based on a user's request.

  Your primary goal is to gather all the necessary information to completely fill out the `ConfigVars` schema.

  You have access to a set of tools to find information. You must use these tools whenever you do not have the information readily available. Do not make up or guess any values.

  Follow these steps carefully:
  1.  **Analyze the Request:** First, identify the core `concept_id` from the user's input. The `collection` field in your final answer will be this same `concept_id`.

  2.  **Plan Your Actions:** Look at the fields in the `ConfigVars` schema (`title`, `collection`, `description`, `is_periodic`). For each field that you don't have a value for, you must use a tool to find it.

  3.  **Execute and Observe:** Use the tools one by one to find the missing information.
      - To determine if the dataset is periodic, you **must** use the `find_periodicity` tool.
      - Use other available tools to find whatever is needed.

  4.  **Final Answer:** Once you have successfully found all the required pieces of None information (`title`, `collection`, `description`, `is_periodic`) return the structured output.
  """

  structuring_prompt_template = ChatPromptTemplate.from_messages(
    [
      ( "system", system_message),
      ( "human", "{data}" )
    ]
  )

  prompt = structuring_prompt_template.invoke({ "data": state["configs"] })

  response = filler_react_agent.invoke(prompt)

  print("!!!!------------")
  print(response.tool_calls())
 
   
  return {
    "messages": state["messages"],
    "configs": response       
  }

In [623]:
# node 3
def formulate_stac_config(state: AgentState) -> dict:
  """
  use the flattened configVars and then
  formulate the necessary config json
  """
  print("node 4", state)
  
  return {}

In [624]:
# node 4
def create_collection(stac_config: dict) -> None:
  """
  request the airflow create_collection DAG, with the complete STAC config.json
  """
  print("node 5", stac_config)
  
  return None

In [625]:
# finally create a graph
graph_builder = StateGraph(AgentState)

graph_builder.add_node("create_flat_config", create_flat_config)
graph_builder.add_node("call_filler_react_agent", call_filler_react_agent)
graph_builder.add_node("formulate_stac_config", formulate_stac_config)
graph_builder.add_node("create_collection", create_collection)

graph_builder.add_edge(START, "create_flat_config")
graph_builder.add_edge("create_flat_config", "call_filler_react_agent")
graph_builder.add_edge("call_filler_react_agent", "formulate_stac_config")
graph_builder.add_edge("formulate_stac_config", "create_collection")
graph_builder.add_edge("create_collection", END)

graph = graph_builder.compile()


In [626]:
# from IPython.display import Image, display

# try:
#   display(Image(graph.get_graph().draw_mermaid_png()))
# except exception:
#   pass

In [627]:
# from IPython.display import Image, display

# try:
#   display(Image(filler_react_agent.get_graph().draw_mermaid_png()))
# except exception:
#   pass

In [628]:
config = {
  "configurable": {
    "thread_id": "test123xyz123"
  }
}

messages = { "messages": [{
  "role": "user",
  "content": mock_data
  }]
}

print(messages)

{'messages': [{'role': 'user', 'content': {'data': {'collection': {'abstract': 'MiCASA is an extensive revision of CASA-GFED3. CASA-GFED3 derives from Potter et al. (1993), diverging in development since Randerson et al. (1996). CASA is a light use efficiency model: NPP is expressed as the product of photosynthetically active solar radiation, a light use efficiency parameter, scalars that capture temperature and moisture limitations, and fractional absorption of photosynthetically active radiation (fPAR) by the vegetation canopy derived from satellite data. Fire parameterization was incorporated into the model by van der Werf et al. (2004) leading to CASA-GFED3 after several revisions (van der Werf et al., 2006, 2010). Development of the GFED module has continued, now at GFED5 (Chen et al., 2023) with less focus on the CASA module. MiCASA diverges from GFED development at version 3, although future reconciliation is possible. Input datasets include air temperature, precipitation, incid

In [629]:
result = graph.invoke(messages, config)


node 3 >>>>>> {'messages': [{'role': 'user', 'content': {'data': {'collection': {'abstract': 'MiCASA is an extensive revision of CASA-GFED3. CASA-GFED3 derives from Potter et al. (1993), diverging in development since Randerson et al. (1996). CASA is a light use efficiency model: NPP is expressed as the product of photosynthetically active solar radiation, a light use efficiency parameter, scalars that capture temperature and moisture limitations, and fractional absorption of photosynthetically active radiation (fPAR) by the vegetation canopy derived from satellite data. Fire parameterization was incorporated into the model by van der Werf et al. (2004) leading to CASA-GFED3 after several revisions (van der Werf et al., 2006, 2010). Development of the GFED module has continued, now at GFED5 (Chen et al., 2023) with less focus on the CASA module. MiCASA diverges from GFED development at version 3, although future reconciliation is possible. Input datasets include air temperature, precip

AttributeError: 'dict' object has no attribute 'tool_calls'