# 3. Vancouver Open Data Example: Creating an Agent with multiple UDFs

In this notebook we'll build a multi-UDF Agent with access to several UDFs that fetch information from [Vancouver's Open Data portal](https://opendata.vancouver.ca/pages/home/).

In [1]:
import fused
import json
import os
import time
from pathlib import Path

In [2]:
# We still need your local paths
PATH_TO_CLAUDE_CONFIG = (
    f"{str(Path.home())}/Library/Application Support/Claude/claude_desktop_config.json"
)


if not os.path.exists(PATH_TO_CLAUDE_CONFIG):
    # Creating the config file
    os.makedirs(os.path.dirname(PATH_TO_CLAUDE_CONFIG), exist_ok=True)
    with open(PATH_TO_CLAUDE_CONFIG, "w") as f:
        json.dump({}, f)

assert os.path.exists(PATH_TO_CLAUDE_CONFIG), (
    "Please update the PATH_TO_CLAUDE_CONFIG variable with the correct path to your Claude config file"
)

In [3]:
# Local path to the Claude app
CLAUDE_APP_PATH = "/Applications/Claude.app"
assert os.path.exists(CLAUDE_APP_PATH), (
    "Please update the CLAUDE_APP_PATH variable with the correct path to your Claude app"
)

In [4]:
# Change this path if you're not running this from the repo root
WORKING_DIR = os.getcwd()

In [5]:
# We'll load the commons folder once again to have our helper functions
commit = "5dda36c"
common = fused.load(
    f"https://github.com/fusedio/udfs/tree/{commit}/public/common"
).utils

In [6]:
# And see which agents we have available
json.load(open(os.path.join(WORKING_DIR, "agents.json")))

{'agents': [{'name': 'get_current_time', 'udfs': ['current_utc_time']},
  {'name': 'fused_docs', 'udfs': ['list_public_udfs', 'reading_fused_docs']},
  {'name': 'vancouver_open_data',
   'udfs': ['hundred_parks_in_vancouver',
    'electric_vehicle_chargers_in_vancouver',
    'building_permits_in_vancouver',
    'internet_speeds_for_lat_lon']},
  {'name': 'elevation_stats_for_lat_lon_area', 'udfs': ['elevation_stats']},
  {'name': 'apple_banana_orange', 'udfs': ['apple_banana_orange_udf']},
  {'name': 'dynamic_output_vector', 'udfs': ['dynamic_output_vector_udf']}]}
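
The structure printed above can be queried with plain Python. The following is a minimal sketch, assuming `agents.json` keeps the `agents` list of `name` / `udfs` entries shown in the output. `udfs_for_agent` is a hypothetical helper written for illustration; it is not part of `fused` or of the `common` utilities.

```python
# Hypothetical helper: look up the UDF names registered under one agent.
# The sample config copies part of the structure printed above.
agents_config = {
    "agents": [
        {"name": "get_current_time", "udfs": ["current_utc_time"]},
        {"name": "fused_docs", "udfs": ["list_public_udfs", "reading_fused_docs"]},
    ]
}


def udfs_for_agent(config, agent_name):
    """Return the list of UDF names for `agent_name`, or [] if it is absent."""
    for agent in config.get("agents", []):
        if agent.get("name") == agent_name:
            return list(agent.get("udfs", []))
    return []


print(udfs_for_agent(agents_config, "fused_docs"))
```

A check like this is a quick way to confirm that an agent holds the UDFs you expect before pointing Claude at it.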

We'll make 5 UDFs:
- Returning location, name & size of 100 parks in Vancouver
- Returning the location of 100 EV chargers in the city
- Returning yearly crime statistics over a point of interest in Vancouver
- Returning the internet speed at any lat / lon (not limited to Vancouver, though it also covers the Vancouver area)
- Returning the location of community gardens in the city

In [7]:
AGENT_NAME = "vancouver_open_data_demo"

In [10]:
@fused.udf
def parks_vancouver():
    """
    UDF to get the polygon geometries of parks in Vancouver based on Open Data Portal
    """
    import requests
    import geopandas as gpd
    import pandas as pd
    from shapely.geometry import Polygon, MultiPolygon, shape
    import numpy as np
    import math
    from pandas import json_normalize

    @fused.cache
    def get_request(url):
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for HTTP errors
        return response

    limit = 100
    parks_url = f"https://opendata.vancouver.ca/api/explore/v2.1/catalog/datasets/parks-polygon-representation/records?limit={str(limit)}"
    response = get_request(url=parks_url)
    json_data = response.json()
    
    # First extract all non-geometry data
    df = json_normalize(json_data['results'])
    
    # Drop the geom column which will be replaced with proper geometry
    if 'geom' in df.columns:
        df = df.drop(columns=['geom'])

    # Filter for valid polygons or multipolygons
    valid_items = [
        item for item in json_data['results']
        if 'geom' in item 
        and item['geom'] is not None
        and isinstance(item['geom'], dict)
        and 'geometry' in item['geom'] 
        and item['geom']['geometry'] is not None
        and isinstance(item['geom']['geometry'], dict)
        and 'type' in item['geom']['geometry']
        and item['geom']['geometry']['type'] in ['Polygon', 'MultiPolygon']
    ]
    
    # Create geometries using shapely's shape function
    geometries = []
    for item in valid_items:
        try:
            geom = shape(item['geom']['geometry'])
            geometries.append(geom)
        except Exception as e:
            print(f"Error creating geometry: {e}")
            continue
    
    # Create filtered dataframe with matching indices
    filtered_df = pd.DataFrame(valid_items).drop(columns=['geom'])
    
    # Create GeoDataFrame with geometries
    gdf = gpd.GeoDataFrame(
        filtered_df, 
        geometry=geometries,
        crs="EPSG:4326"
    )

    # Estimate the radius of a circle with the same area as each park, so the next UDF knows by how much to buffer a lat / lon point to cover a similar area
    gdf['buffer_radius'] = gdf['area_ha'].apply(
        lambda x: math.sqrt((x * 10000) / math.pi)
    )

    # # Adding centroid calculation
    # centroids = gdf.geometry.centroid
    # gdf['centroid_lon'] = centroids.x
    # gdf['centroid_lat'] = centroids.y
    
    print(f"{gdf.sample()=}")
    print(f"{gdf.columns=}")
    print(gdf.shape)
    return gdf

In [11]:
fused.run(parks_vancouver)

Unnamed: 0,park_id,park_name,area_ha,park_url,geo_point_2d,geometry,buffer_radius
0,108.0,Connaught Park,5.993645,http://covapp.vancouver.ca/parkfinder/parkdeta...,"{'lat': 49.26205550109892, 'lon': -123.1601050...","POLYGON ((-123.1623 49.26292, -123.15784 49.26...",138.124454
1,81.0,Clark Park,4.295203,http://covapp.vancouver.ca/parkfinder/parkdeta...,"{'lat': 49.25710409057022, 'lon': -123.0723570...","POLYGON ((-123.07002 49.25766, -123.07004 49.2...",116.927561
2,37.0,Chaldecott Park,3.454305,http://covapp.vancouver.ca/parkfinder/parkdeta...,"{'lat': 49.249047303072075, 'lon': -123.192237...","POLYGON ((-123.19347 49.24991, -123.19099 49.2...",104.858925
3,174.0,Braemar Park,1.258929,http://covapp.vancouver.ca/parkfinder/parkdeta...,"{'lat': 49.24762496945391, 'lon': -123.1235807...","POLYGON ((-123.12462 49.24801, -123.12251 49.2...",63.303193
4,236.0,Ebisu Park,0.420569,http://covapp.vancouver.ca/parkfinder/parkdeta...,"{'lat': 49.20538526928567, 'lon': -123.1324784...","POLYGON ((-123.13189 49.20509, -123.13247 49.2...",36.588417
...,...,...,...,...,...,...,...
95,8.0,Trafalgar Park,4.860801,http://covapp.vancouver.ca/parkfinder/parkdeta...,"{'lat': 49.25097540413386, 'lon': -123.1624811...","POLYGON ((-123.16514 49.25106, -123.16512 49.2...",124.388145
96,188.0,Ross Park,1.511835,http://covapp.vancouver.ca/parkfinder/parkdeta...,"{'lat': 49.21726674556697, 'lon': -123.0823563...","POLYGON ((-123.08296 49.21803, -123.08173 49.2...",69.370880
97,140.0,Robson Park,1.563992,http://covapp.vancouver.ca/parkfinder/parkdeta...,"{'lat': 49.258160306611416, 'lon': -123.092024...","POLYGON ((-123.09127 49.25792, -123.09113 49.2...",70.557371
98,168.0,Riley Park,2.703745,http://covapp.vancouver.ca/parkfinder/parkdeta...,"{'lat': 49.24229347353439, 'lon': -123.1042653...","POLYGON ((-123.1051 49.24326, -123.10334 49.24...",92.770074
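
The `buffer_radius` column treats each park as a circle of equal area: hectares are converted to square meters, then the circle-area formula is inverted. A minimal standalone sketch of that arithmetic follows; `equal_area_radius_m` is a name introduced here for illustration, and the input value is the `area_ha` printed for Connaught Park above.

```python
import math


def equal_area_radius_m(area_ha):
    """Radius in meters of a circle whose area equals `area_ha` hectares."""
    area_m2 = area_ha * 10_000  # 1 hectare = 10,000 square meters
    return math.sqrt(area_m2 / math.pi)


# area_ha of Connaught Park, taken from the output above
radius = equal_area_radius_m(5.993645)
print(round(radius, 2))  # 138.12, in line with the buffer_radius column
```

Since real parks are rarely circular, this radius is only a rough scale for buffering a point, not a description of the park's shape.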


In [14]:
parks_mcp_metadata = {
    "description": """
Name: parks_vancouver
Purpose and Functionality:
The UDF 'parks_vancouver' is designed to extract and format the geographical and geometrical data related to parks in Vancouver. It fetches the data from the Open Data Portal of the city of Vancouver and further structures it in a geo-referenced format suitable for subsequent spatial analysis or visualization tasks. The function operates by facilitating HTTP requests, performing data extraction, geometry creation, data formatting, and providing results as a GeoDataFrame. 

Input Parameters:

The function doesn't require any explicit input parameters from the user. Internal parameters such as the 'limit' and 'parks_url' variables are already set within the function. The 'limit' parameter controls the maximum number of entries the function fetches from the Open Data Portal with a current preset value of 100. The 'parks_url' parameter is set to the API endpoint of the Vancouver parks dataset.
Output:
The output of this UDF is a GeoDataFrame, a spatial variant of a pandas DataFrame from the GeoPandas library. Each row describes one park, with its characteristics as columns: the park's name, area in hectares ('area_ha'), a lat/lon point ('geo_point_2d'), buffer radius in meters ('buffer_radius'), and geometry as shapely Polygon or MultiPolygon objects.

Technical Details and Limitations:
The function makes HTTP requests to external APIs, therefore its functionality depends on the availability and continuity of those APIs. It also assumes that the API responses follow a particular JSON structure, so it will fail if the web service changes its data structure or ceases to exist. HTTP errors are surfaced through the 'raise_for_status()' method from the requests library, which raises an exception and stops execution.
Furthermore, the function also uses the Shapely library to generate geometric shapes from the data. Any inconsistencies or errors in the data may lead to exceptions during this process, but these are handled and printed via a try-except block. Finally, the output GeoDataFrame's 'buffer_radius' column is computed using the assumption that the park areas are circular, which may not always be accurate for irregularly shaped parks.""",
    "parameters": [
        {
            "name": "",
            "type": "",
        }
    ],
}

parks_mcp_metadata

{'description': "\nName: parks_vancouver\nPurpose and Functionality:\nThe UDF 'parks_vancouver' is designed to extract and format the geographical and geometrical data related to parks in Vancouver. It fetches the data from the Open Data Portal of the city of Vancouver and further structures it in a geo-referenced format suitable for subsequent spatial analysis or visualization tasks. The function operates by facilitating HTTP requests, performing data extraction, geometry creation, data formatting, and providing results as a GeoDataFrame. \n\nInput Parameters:\n\nThe function doesn't require any explicit input parameters from the user. Internal parameters such as the 'limit' and 'parks_url' variables are already set within the function. The 'limit' parameter controls the maximum number of entries the function fetches from the Open Data Portal with a current preset value of 100. The 'parks_url' parameter is set to the API endpoint of the Vancouver parks dataset.\nOutput:\nThe output of

In [15]:
# Adding our UDF + MCP_metadata to our Agent
common.save_to_agent(
    agent_json_path=os.path.join(WORKING_DIR, "agents.json"),
    udf=parks_vancouver,
    agent_name=AGENT_NAME,
    udf_name="hundred_parks_in_vancouver",
    mcp_metadata=parks_mcp_metadata,
)

In [28]:
@fused.udf
def ev_chargers():
    """
    UDF to get the location of electric chargers around Vancouver based on Open Data Portal
    """
    import requests
    import geopandas as gpd
    import pandas as pd
    from shapely.geometry import Point
    import numpy as np
    from pandas import json_normalize

    @fused.cache
    def get_request(url):
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for HTTP errors
        return response

    limit = 100
    ev_chargers_url = f"https://opendata.vancouver.ca/api/explore/v2.1/catalog/datasets/electric-vehicle-charging-stations/records?limit={str(limit)}"
    response = get_request(url=ev_chargers_url)
    json_data = response.json()
    
    # First extract all non-geometry data
    df = json_normalize(json_data['results'])
    
    # Drop the geom column which will be replaced with proper geometry
    if 'geom' in df.columns:
        df = df.drop(columns=['geom'])

    # Skipping any point that doesn't have valid geom
    valid_items = [
        item for item in json_data['results']
        if 'geom' in item 
        and item['geom'] is not None
        and isinstance(item['geom'], dict)
        and 'geometry' in item['geom'] 
        and item['geom']['geometry'] is not None
        and isinstance(item['geom']['geometry'], dict)
        and 'type' in item['geom']['geometry']
        and item['geom']['geometry']['type'] == 'Point'
    ]
    
    # Extract coordinates directly into arrays
    coords = np.array([
        item['geom']['geometry']['coordinates'] 
        for item in valid_items
    ])
    
    # Create one Point per (lon, lat) coordinate pair
    geometries = [Point(x, y) for x, y in coords]
    
    # Create filtered dataframe with matching indices
    filtered_df = pd.DataFrame(valid_items).drop(columns=['geom'])
    
    # Create GeoDataFrame with geometries
    gdf = gpd.GeoDataFrame(
        filtered_df, 
        geometry=geometries,
        crs="EPSG:4326"
    )
    print(gdf.shape)
    return gdf
    


In [29]:
# We can run this UDF locally with `fused.run(udf)`
fused.run(ev_chargers)

Unnamed: 0,address,lot_operator,geo_local_area,geo_point_2d,geometry
0,Beach Ave. @ Cardero St,Easy Park / Park Board,West End,"{'lat': 49.283155, 'lon': -123.142173}",POINT (-123.14217 49.28316)
1,1 Kingsway,Easypark,Mount Pleasant,"{'lat': 49.2641594, 'lon': -123.1002054}",POINT (-123.10021 49.26416)
2,4575 Clancy Loranger Way,Park Board,Riley Park,"{'lat': 49.243665, 'lon': -123.106578}",POINT (-123.10658 49.24366)
3,845 Avison Way,Vancouver Aquarium,,"{'lat': 49.299534, 'lon': -123.13022}",POINT (-123.13022 49.29953)
4,273 E 53rd Ave,City of Vancouver,Sunset,"{'lat': 49.222235022762, 'lon': -123.100095218...",POINT (-123.1001 49.22224)
5,3311 E. Hastings,City of Vancouver,Hastings-Sunrise,"{'lat': 49.281369, 'lon': -123.033786}",POINT (-123.03379 49.28137)
6,959-979 Mainland St,City of Vancouver,Downtown,"{'lat': 49.2769232201811, 'lon': -123.11926106...",POINT (-123.11926 49.27692)
7,451 Beach Crescent,City of Vancouver,Downtown,"{'lat': 49.272514, 'lon': -123.12833}",POINT (-123.12833 49.27251)
8,646 E. 44th Ave,City of Vancouver,Sunset,"{'lat': 49.229928611422, 'lon': -123.09135336453}",POINT (-123.09135 49.22993)
9,5175 Dumfries St,City of Vancouver,Kensington-Cedar Cottage,"{'lat': 49.2385063171386, 'lon': -123.07548522...",POINT (-123.07549 49.23851)
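
Both the parks and EV charger UDFs guard against records with missing or malformed geometry using a long chain of `and` conditions. The same check can be sketched as a small helper. This is an illustrative refactoring under the assumption that records follow the `geom` → `geometry` → `type` nesting used above; `has_geometry_type` and the sample records are made up for the example.

```python
def has_geometry_type(item, allowed_types):
    """True if item['geom']['geometry']['type'] exists and is in allowed_types."""
    geom = item.get("geom")
    if not isinstance(geom, dict):
        return False  # missing key, None, or unexpected type
    geometry = geom.get("geometry")
    if not isinstance(geometry, dict):
        return False
    return geometry.get("type") in allowed_types


# Made-up records covering the cases the filter has to survive
records = [
    {"address": "A", "geom": {"geometry": {"type": "Point", "coordinates": [-123.1, 49.2]}}},
    {"address": "B", "geom": None},
    {"address": "C"},
    {"address": "D", "geom": {"geometry": {"type": "Polygon", "coordinates": []}}},
]

valid = [r for r in records if has_geometry_type(r, {"Point"})]
print([r["address"] for r in valid])  # ['A']
```

Passing `{"Polygon", "MultiPolygon"}` instead of `{"Point"}` would reproduce the parks filter.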


In [30]:
ev_charger_mcp_metadata = {
    "description": "This UDF returns the location of all the electric chargers in Vancouver as a GeoDataFrame with the name of the chargers and their lat lon",
    "parameters": [
        {
            "name": "",
            "type": "",
        }
    ],
}

ev_charger_mcp_metadata

{'description': 'This UDF returns the location of all the electric chargers in Vancouver as a GeoDataFrame with the name of the chargers and their lat lon',
 'parameters': [{'name': '', 'type': ''}]}

In [31]:
# We add this new UDF + mcp_metadata to the same agent
common.save_to_agent(
    agent_json_path=os.path.join(WORKING_DIR, "agents.json"),
    udf=ev_chargers,
    agent_name=AGENT_NAME,
    udf_name="electric_vehicle_chargers_in_vancouver",
    mcp_metadata=ev_charger_mcp_metadata,
)

In [50]:
@fused.udf
def yearly_crime_amount(
    up_to_year: int = 2021, 
    lat: float=49.2806, 
    lon: float=-123.1259,
    buffer_amount: float = 1000,
):
    """
    This UDF takes the lat / lon + buffer amount (in meters) of any area within Vancouver and returns
    the number of crimes per category for each year from `up_to_year` up to (but not including) the current year
    """
    import datetime
    import geopandas as gpd
    import pandas as pd
    import shapely
    from shapely.geometry import Point

    current_year = int(datetime.datetime.now().year)
    
    list_of_years_to_process = [year for year in range(up_to_year, current_year)]
    print(f"{list_of_years_to_process=}")

    yearly_type_summaries = {}
    yearly_crime_location = {}

    @fused.cache()
    def getting_yearly_data(year):
        path = f"s3://fused-users/fused/max/maxar_ted_demo/crimedata_csv_AllNeighbourhoods_{str(year)}.csv"
        
        df = pd.read_csv(path)
        df["geometry"] = gpd.points_from_xy(df["X"], df["Y"], crs="EPSG:32610")
    
        gdf = gpd.GeoDataFrame(df)
        print(gdf.shape)
        gdf.to_crs(4326, inplace=True)
    
        aoi_gdf = gpd.GeoDataFrame(geometry=[Point(lon, lat)], crs="EPSG:4326")
        
        # Project to a local UTM projection for accurate buffering in meters
        # Get UTM zone from longitude
        utm_zone = int(((lon + 180) / 6) % 60) + 1
        hemisphere = 'north' if lat >= 0 else 'south'
        utm_crs = f"+proj=utm +zone={utm_zone} +{hemisphere} +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
        
        gdf_utm = aoi_gdf.to_crs(utm_crs)
        gdf_utm['geometry'] = gdf_utm.buffer(buffer_amount)
        gdf_buffered = gdf_utm.to_crs("EPSG:4326")
    
        clipped_gdf = gpd.sjoin(gdf, gdf_buffered, predicate='intersects', how='inner')

        # Getting stats
        number_of_crimes = pd.DataFrame({"number_crimes": [clipped_gdf.shape[0]]})
        grouped_by_type = clipped_gdf.groupby("TYPE").size().reset_index(name="count")

        # print(f"{type(grouped_by_type)=}")
        # print(f"{grouped_by_type=}")
        # print(f"{grouped_by_type.columns=}")
        return clipped_gdf, grouped_by_type
    
    for year in list_of_years_to_process:
        clipped_gdf, grouped_by_type = getting_yearly_data(year)
        yearly_crime_location[year] = clipped_gdf
        yearly_type_summaries[year] = grouped_by_type

    df_all_years = pd.concat(
        [df.assign(year=year) for year, df in yearly_type_summaries.items()],
        ignore_index=True
    )
    
    print(f"{df_all_years=}")
    return df_all_years


In [51]:
fused.run(yearly_crime_amount, up_to_year=2020)

Unnamed: 0,TYPE,count,year
0,Break and Enter Commercial,693,2020
1,Break and Enter Residential/Other,98,2020
2,Mischief,1296,2020
3,Other Theft,2252,2020
4,Theft from Vehicle,2264,2020
5,Theft of Bicycle,334,2020
6,Theft of Vehicle,84,2020
7,Vehicle Collision or Pedestrian Struck (with F...,1,2020
8,Vehicle Collision or Pedestrian Struck (with I...,97,2020
9,Break and Enter Commercial,525,2021
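
The UDF buffers the point in a UTM projection so that `buffer_amount` is measured in meters. The zone is derived from the longitude with the same formula used in the code. Here is a small standalone sketch of that step; `utm_epsg_for_point` is a name introduced here, and it returns the WGS 84 / UTM EPSG code (326xx for the northern hemisphere, 327xx for the southern) rather than the PROJ string built in the UDF.

```python
def utm_epsg_for_point(lat, lon):
    """Return (utm_zone, epsg_code) for a WGS 84 lat / lon pair."""
    # Same zone formula as in the UDF: 6-degree bands starting at -180
    zone = int(((lon + 180) / 6) % 60) + 1
    # WGS 84 / UTM codes: 32601-32660 north, 32701-32760 south
    base = 32600 if lat >= 0 else 32700
    return zone, base + zone


zone, epsg = utm_epsg_for_point(49.2806, -123.1259)
print(zone, epsg)  # 10 32610
```

For the default downtown Vancouver point this gives zone 10 and EPSG:32610, the same CRS the crime CSV's `X` / `Y` columns are read in.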


In [52]:
yearly_crime_mcp_metadata = {
    "description": """
The User-Defined Function (UDF) 'yearly_crime_per_category' is designed to perform a geo-statistical analysis of crime data for a specific area within Vancouver based on the geographical coordinates provided. The function calculates the total number of crimes committed yearly since a specified year up to the current year within a specific radius of the given coordinates.

Input Parameters:
The function takes four parameters: 'up_to_year', 'lat', 'lon', and 'buffer_amount'. 
1. 'up_to_year': This integer value specifies the year since when the analysis should be performed up till the current year. The default value is set at 2021. 
2. 'lat' and 'lon': These float values define the geographical coordinates of the area of interest, specifically, latitude('lat') and longitude('lon'). These values are set to default to a location in Vancouver city, but must be redefined by the users according to the area for which data analysis is to be done. 
3. 'buffer_amount': This float value represents the radius (in meters) around the defined coordinate position within which the crime data analysis would be calculated. The buffer amount defaults to 1000 meters.

Functionality and Output: 
The function reads the crime data CSV files from an S3 bucket path for each year, starting from the specified 'up_to_year' to the current year. It then processes and clips the dataset according to the location and buffer amount parameter precisely. 
Afterward, it performs a grouped statistical analysis based on various crime categories, calculates the total number of crimes, and returns progressive data about the yearly crime rates and their respective categories within the specified location buffer.

Technical Details and Limitations:
1. This UDF operates by using several specialized libraries including 'pandas', 'geopandas', 'shapely', and datetime. It converts CSV data into geospatial data, and then adjusts the coordinate system for accurate buffer calculations. It uses spatial join operations to select the relevant data within the area of interest.
2. The function assumes the availability of the requisite crime data CSV files for all the years from the given 'up_to_year' up till the current year at the specified S3 path.
3. The accuracy of the function heavily depends on the quality, detail, and format of the input CSV files, as well as the appropriateness and correctness of the provided geographical parameters.
4. The function analyzes the crime data per year. Thus, if a wide range 'up_to_year' is given, the function can take a substantial amount of time due to the large amount of data processing involved.

By carefully considering the input parameters and limitations, the 'yearly_crime_per_category' UDF proves to be a powerful tool in performing progressive geo-analysis of crime data based on geographical coordinates.
""",
    "parameters": [
    {
        "name": "up_to_year",
        "type": "int"
    },
    {
        "name": "lat",
        "type": "float"
    },
    {
        "name": "lon",
        "type": "float"
    },
    {
        "name": "buffer_amount",
        "type": "float"
    }
],
}

yearly_crime_mcp_metadata

{'description': "\nThe User-Defined Function (UDF) 'yearly_crime_per_category' is designed to perform a geo-statistical analysis of crime data for a specific area within Vancouver based on the geographical coordinates provided. The function calculates the total number of crimes committed yearly since a specified year up to the current year within a specific radius of the given coordinates.\n\nInput Parameters:\nThe function takes four parameters: 'up_to_year', 'lat', 'lon', and 'buffer_amount'. \n1. 'up_to_year': This integer value specifies the year since when the analysis should be performed up till the current year. The default value is set at 2021. \n2. 'lat' and 'lon': These float values define the geographical coordinates of the area of interest, specifically, latitude('lat') and longitude('lon'). These values are set to default to a location in Vancouver city, but must be redefined by the users according to the area for which data analysis is to be done. \n3. 'buffer_amount': Th
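
The `parameters` list in the metadata above mirrors the UDF's signature, so it could be derived rather than typed by hand. Below is a minimal sketch using the standard library's `inspect` module on a plain Python function. `parameters_from_signature` and `crime_signature_stub` are hypothetical names, and whether this works directly on an object decorated with `@fused.udf` is not verified here.

```python
import inspect


def parameters_from_signature(func):
    """Build a [{'name': ..., 'type': ...}] list from a function's signature."""
    params = []
    for name, param in inspect.signature(func).parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            type_name = ""  # no type hint available
        elif isinstance(annotation, type):
            type_name = annotation.__name__  # e.g. int -> "int"
        else:
            type_name = str(annotation)
        params.append({"name": name, "type": type_name})
    return params


def crime_signature_stub(
    up_to_year: int = 2021,
    lat: float = 49.2806,
    lon: float = -123.1259,
    buffer_amount: float = 1000,
):
    """Stand-in with the same signature as the UDF above; it does no work."""


print(parameters_from_signature(crime_signature_stub))
```

Deriving the list this way keeps the metadata from drifting out of sync when a parameter is added or renamed.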

In [53]:
# We add this new UDF + mcp_metadata to the same agent
common.save_to_agent(
    agent_json_path=os.path.join(WORKING_DIR, "agents.json"),
    udf=yearly_crime_amount,
    agent_name=AGENT_NAME,
    udf_name="yearly_crime_amount",
    mcp_metadata=yearly_crime_mcp_metadata,
)

In [33]:
@fused.udf
def ookla_internet_speed(bounds: fused.types.Bounds=None, lat: float=37.7749, lon: float=-122.4194):
    
    file_path='s3://ookla-open-data/parquet/performance/type=mobile/year=2024/quarter=3/2024-07-01_performance_mobile_tiles.parquet'
    
    # Load pinned versions of utility functions.
    utils = fused.load("https://github.com/fusedio/udfs/tree/ee9bec5/public/common/").utils
    
    # Sample usage: Set default lat/lon for San Francisco if none provided
    if lat is None and lon is None and bounds is None:
        print("Using sample coordinates for San Francisco")
        lat = 37.7749
        lon = -122.4194
    
    # Check if we're using point query or bounds
    if lat is not None and lon is not None:
        # Create a small bounding box around the input lat/lon
        buffer = 0.01  # ~1km at equator
        total_bounds = [lon - buffer, lat - buffer, lon + buffer, lat + buffer]
        using_point_query = True
    else:
        # Use the provided bounds
        total_bounds = bounds.total_bounds
        using_point_query = False
    
    @fused.cache
    def get_data(total_bounds, file_path, h3_size):
        con = utils.duckdb_connect()
        # DuckDB query to:
        # 1. Convert lat/long to H3 cells
        # 2. Calculate average download speed per cell
        # 3. Filter by geographic bounds
        qr=f'''select  h3_latlng_to_cell(tile_y, tile_x, {h3_size}) as hex, 
                    avg(avg_d_kbps) as metric
        from read_parquet("{file_path}") 
        where 1=1
        and tile_x between {total_bounds[0]} and {total_bounds[2]}
        and tile_y between {total_bounds[1]} and {total_bounds[3]}
        group by 1
        ''' 
        df = con.sql(qr).df()
        return df
    
    # Calculate H3 resolution based on zoom level or use a fixed high resolution for point queries    
    if using_point_query:
        res = 8  # Use high resolution for point queries
    else:
        res_offset = 0
        res = max(min(int(2+bounds.z[0]/1.5),8)-res_offset,2)
    
    df = get_data(total_bounds, file_path, h3_size=res)
    
    # For point queries, look up the H3 cell containing the point and return its speed
    if using_point_query:
        con = utils.duckdb_connect()
        
        # Convert the input lat/lon to an H3 cell
        point_cell_query = f'''
        SELECT h3_latlng_to_cell({lat}, {lon}, {res}) as point_hex
        '''
        point_cell_df = con.sql(point_cell_query).df()
        
        if not point_cell_df.empty:
            point_cell = point_cell_df['point_hex'].iloc[0]
            
            # Find the cell in our results that matches the point's cell
            if 'hex' in df.columns and not df.empty:
                point_speed = df[df['hex'] == point_cell]
                
                if not point_speed.empty:
                    print(f"Speed at location ({lat}, {lon}): {point_speed['metric'].iloc[0]} kbps")
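                    # Rename on a new object; in-place rename on a filtered slice triggers pandas' SettingWithCopyWarning
                    point_speed = point_speed.rename(columns={'metric': 'internet_speed_kbs'})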
                    print(f"{point_speed=}")
                    return point_speed
                else:
                    print(f"No exact match found for location ({lat}, {lon}). Returning all cells in area.")
            else:
                print(f"No data found for location ({lat}, {lon})")
    
    print(df) 
    return df


In [34]:
fused.run(ookla_internet_speed)

Unnamed: 0,hex,internet_speed_kbs
6,613196570331971583,373333.333333
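
Two small pieces of arithmetic drive this UDF: the bounding box built around a point, and the H3 resolution chosen from the zoom level. The sketch below isolates both so their behavior is easy to inspect. The function names are introduced here for illustration, and `zoom` stands in for the `bounds.z[0]` value used in the UDF.

```python
def point_bbox(lat, lon, buffer_deg=0.01):
    """[min_lon, min_lat, max_lon, max_lat] around a point, in degrees."""
    return [lon - buffer_deg, lat - buffer_deg, lon + buffer_deg, lat + buffer_deg]


def h3_resolution_for_zoom(zoom, res_offset=0, min_res=2, max_res=8):
    """Same formula as the UDF: finer H3 cells as zoom grows, clamped to [min_res, max_res]."""
    return max(min(int(2 + zoom / 1.5), max_res) - res_offset, min_res)


bbox = point_bbox(37.7749, -122.4194)
print(bbox)

for zoom in (0, 3, 6, 9, 12):
    print(zoom, h3_resolution_for_zoom(zoom))
```

With the default offset, the resolution climbs from 2 at zoom 0 to the cap of 8 at zoom 9 and stays there, which is the same resolution the point query uses.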


In [35]:
internet_speed_mcp_metadata = {
    "description": "This example demonstrates how Ookla's mobile performance data can be dynamically processed into an H3 hexagonal grid system. The network metrics are aggregated (averaging download speeds) for H3 hexes at a resolution that adapts based on the zoom level. The performance data comes from Ookla's global speed test infrastructure, capturing real-world mobile network performance across diverse network operators and technologies. The data is stored in Parquet format on S3, structured by year and quarter, allowing for efficient geographic querying and temporal analysis. The resulting hexagonal grid provides a standardized way to visualize and analyze mobile network performance patterns across different geographic scales and regions.",
    "parameters": [
        {
            "name": "lat",
            "type": "float"
        },
        {
            "name": "lon",
            "type": "float"
        }
    ],
}

internet_speed_mcp_metadata

{'description': "This example demonstrates how Ookla's mobile performance data can be dynamically processed into an H3 hexagonal grid system. The network metrics are aggregated (averaging download speeds) for H3 hexes at a resolution that adapts based on the zoom level. The performance data comes from Ookla's global speed test infrastructure, capturing real-world mobile network performance across diverse network operators and technologies. The data is stored in Parquet format on S3, structured by year and quarter, allowing for efficient geographic querying and temporal analysis. The resulting hexagonal grid provides a standardized way to visualize and analyze mobile network performance patterns across different geographic scales and regions.",
 'parameters': [{'name': 'lat', 'type': 'float'},
  {'name': 'lon', 'type': 'float'}]}

In [36]:
# We add this new UDF + mcp_metadata to the same agent
common.save_to_agent(
    agent_json_path=os.path.join(WORKING_DIR, "agents.json"),
    udf=ookla_internet_speed,
    agent_name=AGENT_NAME,
    udf_name="internet_speeds_for_lat_lon",
    mcp_metadata=internet_speed_mcp_metadata,
)

In [38]:
@fused.udf
def community_gardens_vancouver():
    import requests
    import geopandas as gpd
    from shapely.geometry import shape

    params = {
        "limit": -1,  # -1 asks the API for the maximum number of records it returns per request
    }

    url = "https://opendata.vancouver.ca/api/explore/v2.1/catalog/datasets/community-gardens-and-food-trees/records"
    r = requests.get(url, params=params)

    gdf = gpd.GeoDataFrame(r.json()["results"])
    gdf["geometry"] = gdf["geom"].apply(lambda x: shape(x["geometry"]))
    gdf = gdf.set_geometry("geometry")
    del gdf["geom"]

    print(f"{gdf.shape=}")

    return gdf

In [39]:
fused.run(community_gardens_vancouver)

Unnamed: 0,mapid,year_created,name,street_number,street_direction,street_name,street_type,merged_address,number_of_plots,number_of_food_trees,notes,food_tree_varieties,other_food_assets,jurisdiction,steward_or_managing_organization,public_e_mail,website,geo_local_area,geo_point_2d,geometry
0,FA002,2014,15th Avenue Coop,1255,E,15th,Av,"1255 E 15th Av, Vancouver, BC",8.0,,,,,Private,,,,Mount Pleasant,"{'lat': 49.2571193, 'lon': -123.0788387}",POINT (-123.07884 49.25712)
1,FA003,2008,16 Oaks,1018,W,16th,Av,"1018 W 16th Av, Vancouver, BC",55.0,,,,,Private,,oak.16th.garden@gmail.com,,Shaughnessy,"{'lat': 49.2567482, 'lon': -123.1276645}",POINT (-123.12766 49.25675)
2,FA007,1942,East Boulevard Allotment Plots,7176,E,,Boulevard,"7176 E Boulevard, Vancouver, BC",71.0,,,,,City,City of Vancouver,communitygardens@vancouver.ca,,Kerrisdale,"{'lat': 49.2209834, 'lon': -123.1505481}",POINT (-123.15055 49.22098)
3,FA009,2010,Atira Community Garden,400,,Hawks,Av,"400 Hawks Av, Vancouver, BC",15.0,,,,,Private,Atira Community Resources,,http://www.atira.bc.ca/community-garden-kitchens,Strathcona,"{'lat': 49.2810913, 'lon': -123.0871212}",POINT (-123.08712 49.28109)
4,FA011,2015,BC Seniors Living Assoc. Garden,3355,E,5 th,Av,"3355 E 5 th Av, Vancouver, BC",11.0,,,,,Private,BC Seniors Living Assoc. Beulah Homes Society,,,Hastings-Sunrise,"{'lat': 49.266474, 'lon': -123.032578}",POINT (-123.03258 49.26647)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,FA166,2017,HFBC Housing Foundation,2085,W,5th,Av,"2085 W 5th Av, Vancouver, BC",12.0,,,,,Private,,,,Kitsilano,"{'lat': 49.267434, 'lon': -123.152475}",POINT (-123.15248 49.26743)
96,FA169,2018,Brightside Home Foundation - King's Daughters,1400,E,11th,Av,"1400 E 11th Av, Vancouver, BC",3.0,,,,,Private,Brightside Community Homes Foundation,,,Kensington-Cedar Cottage,"{'lat': 49.260319, 'lon': -123.075082}",POINT (-123.07508 49.26032)
97,FA174,2016,HFBC Housing Foundation,2924,,Venables,St,"2924 Venables St, Vancouver, BC",12.0,,,,,Private,HFBC Housing Foundation,,,Hastings-Sunrise,"{'lat': 49.276211, 'lon': -123.043281}",POINT (-123.04328 49.27621)
98,FA176,,John Hendry (Trout Lake) Park,3300,,Victoria,Drive,"3300 Victoria Drive, Vancouver, BC",4.0,,,,,Park Board,,,https://cedarcottagefoodnetwork.com/projects-e...,Kensington-Cedar Cottage,"{'lat': 49.255098, 'lon': -123.06367871}",POINT (-123.06368 49.2551)
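
The UDFs above fetch a single page of records. If a dataset holds more rows than one request returns, the requests would need to be paged. The following is a minimal sketch of that loop, assuming the endpoint accepts `limit` and `offset` query parameters and returns rows under a `results` key; check the portal's API documentation before relying on this. `fetch_all_records` is a hypothetical helper, and `fake_fetch_page` replaces the real HTTP call so the example runs offline.

```python
def fetch_all_records(fetch_page, page_size=100):
    """Collect records page by page until a short (or empty) page is returned."""
    records = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        results = page.get("results", [])
        records.extend(results)
        if len(results) < page_size:
            break  # last page reached
        offset += page_size
    return records


# Offline stand-in for the API: 250 made-up rows served in slices
FAKE_ROWS = [{"mapid": f"FA{i:03d}"} for i in range(250)]


def fake_fetch_page(limit, offset):
    return {"results": FAKE_ROWS[offset:offset + limit]}


all_rows = fetch_all_records(fake_fetch_page)
print(len(all_rows))  # 250
```

In a real UDF, `fetch_page` would wrap `requests.get(url, params={"limit": limit, "offset": offset})` and return the parsed JSON.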


In [40]:
community_garden_mcp_metadata = {
    "description": """
1) Purpose and Functionality:
The User Defined Function (UDF) 'community_gardens_vancouver' is designed to fetch and process geospatial data relating to community gardens and food trees in the city of Vancouver. It operates by sending a request to the Vancouver Open Data API and returns a GeoDataFrame containing the geographical coordinates and corresponding information relating to the specified city assets.

2) Input Parameters:
The function doesn't require any input parameters from the user, as the parameters necessary for the API request ('limit') and URL are hard-coded within the function.

3) Output:
The UDF returns a GeoDataFrame object containing spatial data obtained from the response of the API request. Specifically, the data includes geographical coordinates (as shapely geometry data) and other corresponding details about Vancouver's community gardens and food trees. A print statement within the function also provides the dimensions of the returned GeoDataFrame in the format (num_rows, num_columns).

4) Technical Details and Limitations:
- The function incorporates the use of the 'requests', 'geopandas' and 'shapely.geometry' Python libraries, which need to be installed and imported correctly for the UDF to operate.
- It connects to the Vancouver Open Data API, hence requires a stable internet connection for successful data retrieval.
- The 'limit' parameter for the API request is currently set to -1, the API's maximum data retrieval limit. Any changes to accommodate remote API updates may require adjustments to this parameter.
- 'geom', a temporary column for storing raw geometry data, is deleted after use to optimize memory.
- The function doesn't include any error handling capabilities, so it's not protected from potential issues such as unsuccessful API requests or changes in the data schema.

5) Technical Style:
The UDF adheres to Python's coding standards and practices, presenting a clean, unambiguous syntax beneficial for AI systems and human coders alike. Despite its complexity, its operations are orderly and segmented, boosting understanding and potential for extension or modification.""",
    "parameters": [
        {
            "name": "",
            "type": "",
        }
    ],
}

community_garden_mcp_metadata

{'description': "\n1) Purpose and Functionality:\nThe User Defined Function (UDF) 'community_gardens_vancouver' is designed to fetch and process geospatial data relating to community gardens and food trees in the city of Vancouver. It operates by sending a request to the Vancouver Open Data API and returns a GeoDataFrame containing the geographical coordinates and corresponding information relating to the specified city assets.\n\n2) Input Parameters:\nThe function doesn't require any input parameters from the user, as the parameters necessary for the API request ('limit') and URL are hard-coded within the function.\n\n3) Output:\nThe UDF returns a GeoDataFrame object containing spatial data obtained from the response of the API request. Specifically, the data includes geographical coordinates (as shapely geometry data) and other corresponding details about Vancouver's community gardens and food trees. A print statement within the function also provides the dimensions of the returned G

In [41]:
# We add this new UDF + mcp_metadata to the same agent
common.save_to_agent(
    agent_json_path=os.path.join(WORKING_DIR, "agents.json"),
    udf=community_gardens_vancouver,
    agent_name=AGENT_NAME,
    udf_name="community_gardens_vancouver",
    mcp_metadata=community_garden_mcp_metadata,
)

In [54]:
# Let's make sure we created our agent properly, with all our UDFs
agents = json.load(open(os.path.join(WORKING_DIR, "agents.json")))
print(json.dumps(agents, indent=4, sort_keys=True))

{
    "agents": [
        {
            "name": "get_current_time",
            "udfs": [
                "current_utc_time"
            ]
        },
        {
            "name": "fused_docs",
            "udfs": [
                "list_public_udfs",
                "reading_fused_docs"
            ]
        },
        {
            "name": "vancouver_open_data",
            "udfs": [
                "hundred_parks_in_vancouver",
                "internet_speeds_for_lat_lon"
            ]
        },
        {
            "name": "elevation_stats_for_lat_lon_area",
            "udfs": [
                "elevation_stats"
            ]
        },
        {
            "name": "apple_banana_orange",
            "udfs": [
                "apple_banana_orange_udf"
            ]
        },
        {
            "name": "dynamic_output_vector",
            "udfs": [
                "dynamic_output_vector_udf"
            ]
        },
        {
            "name": "vancouver_open_data_demo",
 

Now we can tell Claude to use the `vancouver_open_data_demo` Agent, whose name is stored in `AGENT_NAME`:

In [55]:
AGENT_NAME

'vancouver_open_data_demo'
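
The full `agents.json` dump can get long, so it can help to look up a single agent's UDFs directly. The snippet below is a minimal sketch of that lookup, assuming the `{"agents": [{"name": ..., "udfs": [...]}]}` layout printed above. The helper name `udfs_for_agent` and the sample data are hypothetical and are not part of the `common` utilities.

```python
def udfs_for_agent(agents: dict, agent_name: str) -> list:
    """Return the UDF names listed under agent_name, or [] if the agent is absent."""
    for agent in agents.get("agents", []):
        if agent.get("name") == agent_name:
            return list(agent.get("udfs", []))
    return []


# Illustrative stand-in for the parsed agents.json (not the real file contents)
sample_agents = {
    "agents": [
        {"name": "get_current_time", "udfs": ["current_utc_time"]},
        {"name": "example_agent", "udfs": ["udf_a", "udf_b"]},
    ]
}

print(udfs_for_agent(sample_agents, "example_agent"))
```

In the notebook, the same helper could be called as `udfs_for_agent(agents, AGENT_NAME)` on the `agents` dictionary loaded in the previous cell.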

In [56]:
# Finally, we can select which Agent we want to pass to Claude in our MCP server config
common.generate_local_mcp_config(
    config_path=PATH_TO_CLAUDE_CONFIG,
    agents_list=[AGENT_NAME],
    repo_path=WORKING_DIR,
)

Claude reads a specific config file (the one at `PATH_TO_CLAUDE_CONFIG`) to know what to run under the hood. The call above edits this file for you each time you change the agent you want to run.

In [57]:
# Let's read this Claude Desktop config to see what we're passing
claude_config = json.load(open(PATH_TO_CLAUDE_CONFIG))
print(json.dumps(claude_config, indent=4, sort_keys=True))

{
    "mcpServers": {
        "vancouver_open_data_demo": {
            "args": [
                "run",
                "--directory",
                "/Users/maximelenormand/Library/CloudStorage/Dropbox/Mac/Documents/repos/fused-mcp",
                "main.py",
                "--runtime=local",
                "--udf-names=hundred_parks_in_vancouver,electric_vehicle_chargers_in_vancouver,yearly_crime_amount,internet_speeds_for_lat_lon,community_gardens_vancouver",
                "--name=vancouver_open_data_demo"
            ],
            "command": "uv"
        }
    }
}
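
Before restarting Claude, it can be useful to check which UDFs the config actually passes to the server. The snippet below is a small sketch that pulls the names out of the `--udf-names=` argument, assuming the config keeps the `mcpServers -> <agent> -> args` layout printed above. The helper name `udf_names_from_config` and the sample config are hypothetical.

```python
def udf_names_from_config(config: dict, server_name: str) -> list:
    """Return the UDF names passed via --udf-names for one MCP server entry."""
    args = config.get("mcpServers", {}).get(server_name, {}).get("args", [])
    prefix = "--udf-names="
    for arg in args:
        if arg.startswith(prefix):
            # Split the comma-separated list and drop any empty entries
            return [name for name in arg[len(prefix):].split(",") if name]
    return []


# Illustrative config mirroring the structure above (path and names are made up)
sample_config = {
    "mcpServers": {
        "example_agent": {
            "command": "uv",
            "args": [
                "run",
                "--directory",
                "/path/to/repo",
                "main.py",
                "--runtime=local",
                "--udf-names=udf_a,udf_b",
                "--name=example_agent",
            ],
        }
    }
}

print(udf_names_from_config(sample_config, "example_agent"))
```

In the notebook this could be run as `udf_names_from_config(claude_config, AGENT_NAME)` to confirm that `community_gardens_vancouver` is in the list.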


## Now let's restart Claude with this new agent!

In [58]:
def restart_claude(claude_path: str = CLAUDE_APP_PATH):
    app_name = os.path.basename(claude_path)

    # os.system returns the exit status instead of raising an exception,
    # so we check it: pkill exits with 0 only if it matched a process
    if os.system(f"pkill -f '{app_name}'") == 0:
        print(f"Killed {app_name}")
        time.sleep(2)  # Wait for shutdown
    else:
        print("Claude wasn't running, so no need to kill it")

    print(f"Restarting {app_name}")
    os.system(f"open -a '{claude_path}'")  # Restart Claude

In [59]:
restart_claude()

Killed Claude.app
Restarting Claude.app
