<a href="https://colab.research.google.com/github/esassoc/qanat-community/blob/develop/Qanat.CommunityAPI/Examples/hackathon_2025_Use_Case_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Download API documentation
import requests

try:
    url = "https://raw.githubusercontent.com/esassoc/qanat-community/develop/Qanat.CommunityAPI/Examples/groundwater-accounting-platform-api.json"
    response = requests.get(url)
    response.raise_for_status() # Raise an exception for bad status codes

    with open("groundwater-accounting-platform-api.json", "w") as f:
        f.write(response.text)

    # Load API documentation
    import json

    with open('groundwater-accounting-platform-api.json', 'r') as f:
        api_documentation = json.load(f)

    # API key setup
    from google.colab import userdata

    api_key = None  # Initialize api_key to None

    try:
      api_key = userdata.get('API_KEY')
      if not api_key:
          # If the secret exists but is empty, print message and continue to check api_key
          print("API key found in Colab Secrets but is empty. Please provide your API key.")
          api_key = None # Ensure api_key is None if empty

    except userdata.SecretNotFoundError:
      # If the secret does not exist, print instructions and continue to check api_key
      print("-----------------------------------------------------------------------")
      print("API key not found in Colab Secrets.")
      print("Please add your API key to Colab Secrets:")
      print("1. Click on the '🔑' icon in the left sidebar.")
      print("2. Click on 'New secret'.")
      print("3. For 'Name', enter 'API_KEY'.")
      print("4. For 'Value', paste your API key.")
      print("5. Click 'Save secret'.")
      print("Then, run this cell again.")
      print("-----------------------------------------------------------------------")
      api_key = None # Ensure api_key is None if not found

    # Only proceed with API call if API key is available
    if api_key:
        print("API key successfully loaded from Colab Secrets.")

        # Initial API call to list geographies
        base_url = api_documentation['servers'][0]['url'].rstrip('/')
        geographies_path = '/geographies'

        geographies_url = f"{base_url}{geographies_path}"

        headers = {
            "x-api-key": api_key
        }

        print(f"Attempting to call: {geographies_url}")
        response = requests.get(geographies_url, headers=headers)

        if response.status_code == 200:
          geographies_data = response.json()
          print("Available Geographies:")
          print(json.dumps(geographies_data, indent=2))
        else:
          print(f"Error: API call failed with status code {response.status_code}")
          print(response.text)
          if response.status_code == 401:
              print("Authentication Error: Please check your API key in Colab secrets ('API_KEY').")

except requests.exceptions.RequestException as e:
  print(f"Error: An error occurred during the API request: {e}")
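
The cell above reads the key from Colab Secrets, which only works inside a Colab runtime. As a minimal sketch of running the same notebook elsewhere, the helper below tries Colab first and then falls back to an environment variable. The function name `load_api_key` and the environment-variable fallback are assumptions for illustration, not part of the original notebook.

```python
import os


def load_api_key(secret_name="API_KEY"):
    """Return the key from Colab Secrets, else from an environment variable, else None."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata

        key = userdata.get(secret_name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing or inaccessible.
        pass
    # Hypothetical fallback: an environment variable with the same name.
    return os.environ.get(secret_name) or None


api_key = load_api_key()
print("API key loaded." if api_key else "No API key found.")
```

Returning `None` when nothing is found keeps the later `if api_key:` check unchanged.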

# Task
Detect outlier water accounts in the Demo geography for 2024, and display a table with the water account name, owner, parcel acreage, groundwater usage, and percent deviation from the average groundwater usage for the year. Use the data model found at "https://raw.githubusercontent.com/esassoc/qanat-community/develop/Qanat.CommunityAPI/Examples/high_level_data_model.png" and ensure correct API calls and parameters (e.g., geographyID instead of geographyName).

## Understand the API

### Subtask:
Download and load the API documentation to understand available endpoints and parameters.


**Reasoning**:
The first step is to download the API documentation and save it to a file, then load it into a dictionary.



In [None]:
import requests
import json

url = "https://raw.githubusercontent.com/esassoc/qanat-community/develop/Qanat.CommunityAPI/Examples/groundwater-accounting-platform-api.json"
response = requests.get(url)
response.raise_for_status()

with open("groundwater-accounting-platform-api.json", "w") as f:
    f.write(response.text)

with open('groundwater-accounting-platform-api.json', 'r') as f:
    api_documentation = json.load(f)

print(json.dumps(api_documentation, indent=2))

{
  "openapi": "3.0.1",
  "info": {
    "title": "Groundwater Accounting Platform API",
    "description": "Before you start using the Groundwater Accounting Platform API, you will need to obtain an API key from the project team. The Groundwater Accounting Platform REST API provides resource-oriented urls to fetch data as JSON.",
    "termsOfService": "https://groundwateraccounting.org/terms-of-service",
    "contact": {
      "name": "Contact Us",
      "email": "info@groundwateraccounting.org"
    },
    "license": {
      "name": "License",
      "url": "https://groundwateraccounting.org/license"
    },
    "version": "1.0"
  },
  "servers": [
    {
      "url": "http://api-qa.groundwateraccounting.org/"
    }
  ],
  "paths": {
    "/geographies": {
      "get": {
        "tags": [
          "Geographies"
        ],
        "summary": "List",
        "description": "List all available geographies",
        "responses": {
          "200": {
            "description": "OK",
          

## Fetch water usage data

### Subtask:
Use the API to fetch water usage data for the specified geography and year.


**Reasoning**:
Construct the API URL, make the GET request, handle the response, and load the data into a DataFrame.



In [None]:
base_url = api_documentation['servers'][0]['url'].rstrip('/')
water_usage_path = api_documentation['paths']['/geographies/{geographyID}/years/{year}/usage-locations']['get']['parameters'][0]['x-api-path-helper']

water_usage_url = f"{base_url}{water_usage_path.replace('{geographyID}', str(demo_geography_id)).replace('{year}', str(year))}"

headers = {
    "x-api-key": api_key
}

print(f"Attempting to call: {water_usage_url}")
response = requests.get(water_usage_url, headers=headers)

if response.status_code == 200:
  water_usage_data = response.json()
  water_usage_df = pd.DataFrame(water_usage_data)
  display(water_usage_df.head())
  water_usage_df.info()
else:
  print(f"Error: API call failed with status code {response.status_code}")
  print(response.text)

KeyError: 'x-api-path-helper'

**Reasoning**:
The previous attempt to get the path from the API documentation failed because the key 'x-api-path-helper' does not exist in the parameters list. I will inspect the `api_documentation` to find the correct path for fetching water usage data.
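
One way to inspect the documentation without guessing at keys is to search the `paths` section of the OpenAPI document by keyword. The sketch below assumes only the standard OpenAPI layout (`paths` mapping each path to its HTTP operations). `find_paths` and `sample_spec` are hypothetical names, and `sample_spec` is a small stand-in for `api_documentation`.

```python
def find_paths(spec, keyword):
    """Return (METHOD, path) pairs whose path contains `keyword` (case-insensitive)."""
    http_methods = {"get", "post", "put", "patch", "delete"}
    matches = []
    for path, operations in spec.get("paths", {}).items():
        if keyword.lower() not in path.lower():
            continue
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method in http_methods:
                matches.append((method.upper(), path))
    return matches


# Tiny stand-in for `api_documentation`, using paths that appear in this notebook.
sample_spec = {
    "paths": {
        "/geographies": {"get": {}},
        "/geographies/{geographyID}/parcels": {"get": {}},
        "/geographies/{geographyID}/years/{year}/usage-locations": {"get": {}},
    }
}

for method, path in find_paths(sample_spec, "usage"):
    print(method, path)
```

Calling `find_paths(api_documentation, "usage")` on the real document would list the candidate endpoints in the same way.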



In [None]:
import pandas as pd
print(json.dumps(api_documentation['paths']['/geographies/{geographyID}/years/{year}/usage-locations']['get'], indent=2))

{
  "tags": [
    "Usage Locations"
  ],
  "summary": "List by Geography and Reporting Period",
  "description": "List all usage locations for a specified geography and reporting period",
  "parameters": [
    {
      "name": "geographyID",
      "in": "path",
      "required": true,
      "schema": {
        "type": "integer",
        "format": "int32"
      }
    },
    {
      "name": "year",
      "in": "path",
      "required": true,
      "schema": {
        "type": "integer",
        "format": "int32"
      }
    }
  ],
  "responses": {
    "200": {
      "description": "OK",
      "content": {
        "application/json": {
          "schema": {
            "type": "array",
            "items": {
              "$ref": "#/components/schemas/UsageLocationConsumerDto"
            }
          }
        }
      }
    },
    "403": {
      "description": "Forbidden",
      "content": {
        "application/json": {
          "schema": {
            "$ref": "#/components/schemas/Proble

**Reasoning**:
Based on the API documentation output, the path for fetching usage locations by geography and year is `/geographies/{geographyID}/years/{year}/usage-locations`. I will use this path to construct the URL and fetch the data.
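
The chained `.replace()` calls can also be written with `str.format`, since OpenAPI path templates use the same `{name}` placeholder syntax. This is a minimal sketch: `build_url` is a hypothetical helper, and `demo_geography_id = 5` and `year = 2024` are assumed values taken from the request URL that this notebook prints, because neither variable is defined in the cells shown here.

```python
def build_url(base_url, path_template, **path_params):
    """Join a base URL and an OpenAPI-style path template, filling in {placeholders}."""
    return base_url.rstrip("/") + path_template.format(**path_params)


# Assumed values: the URL printed by this notebook shows geography 5 and year 2024.
demo_geography_id = 5
year = 2024

url = build_url(
    "http://api-qa.groundwateraccounting.org/",
    "/geographies/{geographyID}/years/{year}/usage-locations",
    geographyID=demo_geography_id,
    year=year,
)
print(url)
```

A missing path parameter raises `KeyError` here, rather than silently leaving a `{placeholder}` in the URL.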



In [None]:
water_usage_path = '/geographies/{geographyID}/years/{year}/usage-locations'

water_usage_url = f"{base_url}{water_usage_path.replace('{geographyID}', str(demo_geography_id)).replace('{year}', str(year))}"

headers = {
    "x-api-key": api_key
}

print(f"Attempting to call: {water_usage_url}")
response = requests.get(water_usage_url, headers=headers)

if response.status_code == 200:
  water_usage_data = response.json()
  water_usage_df = pd.DataFrame(water_usage_data)
  display(water_usage_df.head())
  water_usage_df.info()
else:
  print(f"Error: API call failed with status code {response.status_code}")
  print(response.text)

Attempting to call: http://api-qa.groundwateraccounting.org/geographies/5/years/2024/usage-locations


Unnamed: 0,UsageLocationID,Name,Area,UsageLocationType,WaterAccountID,WaterAccountNumber,ParcelID,ParcelNumber,ParcelZones,ReportingPeriodID,ReportingPeriodName,GeographyID
0,593058,555-042-93,626.913154,Irrigated Agricultural Lands,13.0,10013.0,52473,555-042-93,"[{'ZoneID': 13, 'ZoneName': 'Zone 3', 'ZoneGro...",38,2024,5
1,593068,555-043-82,237.146219,Irrigated Agricultural Lands,6.0,10006.0,52474,555-043-82,"[{'ZoneID': 14, 'ZoneName': 'Zone 4', 'ZoneGro...",38,2024,5
2,593078,555-044-68,8.688234,Irrigated Agricultural Lands,30.0,10030.0,52475,555-044-68,"[{'ZoneID': 12, 'ZoneName': 'Zone 2', 'ZoneGro...",38,2024,5
3,593088,555-045-44,56.416935,Irrigated Agricultural Lands,50.0,10050.0,52476,555-045-44,"[{'ZoneID': 13, 'ZoneName': 'Zone 3', 'ZoneGro...",38,2024,5
4,593098,555-046-93,100.317954,Irrigated Agricultural Lands,49.0,10049.0,52477,555-046-93,"[{'ZoneID': 12, 'ZoneName': 'Zone 2', 'ZoneGro...",38,2024,5


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 126 entries, 0 to 125
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   UsageLocationID      126 non-null    int64  
 1   Name                 126 non-null    object 
 2   Area                 126 non-null    float64
 3   UsageLocationType    126 non-null    object 
 4   WaterAccountID       117 non-null    float64
 5   WaterAccountNumber   117 non-null    float64
 6   ParcelID             126 non-null    int64  
 7   ParcelNumber         126 non-null    object 
 8   ParcelZones          126 non-null    object 
 9   ReportingPeriodID    126 non-null    int64  
 10  ReportingPeriodName  126 non-null    object 
 11  GeographyID          126 non-null    int64  
dtypes: float64(3), int64(4), object(5)
memory usage: 11.9+ KB


## Fetch parcel data

### Subtask:
Use the API to fetch parcel data for the specified geography.


**Reasoning**:
Inspect the api_documentation to find the correct API endpoint and required parameters for fetching parcel data by geography ID.



In [None]:
print(json.dumps(api_documentation['paths']['/geographies/{geographyID}/parcels']['get'], indent=2))

{
  "tags": [
    "Parcels"
  ],
  "summary": "List by Geography",
  "description": "List all parcels for a specified geography",
  "parameters": [
    {
      "name": "geographyID",
      "in": "path",
      "required": true,
      "schema": {
        "type": "integer",
        "format": "int32"
      }
    }
  ],
  "responses": {
    "200": {
      "description": "OK",
      "content": {
        "application/json": {
          "schema": {
            "type": "array",
            "items": {
              "$ref": "#/components/schemas/ParcelConsumerDto"
            }
          }
        }
      }
    },
    "403": {
      "description": "Forbidden",
      "content": {
        "application/json": {
          "schema": {
            "$ref": "#/components/schemas/ProblemDetails"
          }
        }
      }
    },
    "404": {
      "description": "Not Found",
      "content": {
        "application/json": {
          "schema": {
            "$ref": "#/components/schemas/ProblemDetails"


**Reasoning**:
Build the request URL from base_url, the parcel path identified above, and demo_geography_id. Make the GET request and check the status code; on success, parse the JSON response into a pandas DataFrame and display its head and info.



In [None]:
parcel_path = '/geographies/{geographyID}/parcels'

parcels_url = f"{base_url}{parcel_path.replace('{geographyID}', str(demo_geography_id))}"

print(f"Attempting to call: {parcels_url}")
response = requests.get(parcels_url, headers=headers)

if response.status_code == 200:
  parcels_data = response.json()
  parcels_df = pd.DataFrame(parcels_data)
  display(parcels_df.head())
  parcels_df.info()
else:
  print(f"Error: API call failed with status code {response.status_code}")
  print(response.text)

Attempting to call: http://api-qa.groundwateraccounting.org/geographies/5/parcels


Unnamed: 0,ParcelID,ParcelNumber,ParcelArea,OwnerName,OwnerAddress,WaterAccountID,Zones,GeographyID
0,52473,555-042-93,626.9132,Crop Circle Farms,"1234 Olive Drive, Bakersfield, CA 93308",13.0,"[{'ZoneID': 13, 'ZoneName': 'Zone 3', 'ZoneGro...",5
1,52474,555-043-82,237.1462,Berry Nutty Farms,"5555 Stockdale Highway, Bakersfield, CA 93309",6.0,"[{'ZoneID': 14, 'ZoneName': 'Zone 4', 'ZoneGro...",5
2,52475,555-044-68,8.6882,Melon Madness Farms,"7890 Ming Avenue, Bakersfield, CA 93309",30.0,"[{'ZoneID': 12, 'ZoneName': 'Zone 2', 'ZoneGro...",5
3,52476,555-045-44,56.4169,The Jolly Green Giant's Garden,"2345 Chester Avenue, Bakersfield, CA 93301",50.0,"[{'ZoneID': 13, 'ZoneName': 'Zone 3', 'ZoneGro...",5
4,52477,555-046-93,100.318,Sweet Pea's Farm,"6789 H Street, Bakersfield, CA 93304",49.0,"[{'ZoneID': 12, 'ZoneName': 'Zone 2', 'ZoneGro...",5


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 126 entries, 0 to 125
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ParcelID        126 non-null    int64  
 1   ParcelNumber    126 non-null    object 
 2   ParcelArea      126 non-null    float64
 3   OwnerName       126 non-null    object 
 4   OwnerAddress    126 non-null    object 
 5   WaterAccountID  117 non-null    float64
 6   Zones           126 non-null    object 
 7   GeographyID     126 non-null    int64  
dtypes: float64(2), int64(2), object(4)
memory usage: 8.0+ KB


## Fetch water accounts data

### Subtask:
Use the API to fetch water accounts data for the specified geography.


**Reasoning**:
Inspect the API documentation to find the water accounts endpoint and then construct the URL and make the API call to fetch the water accounts data.



In [None]:
print(json.dumps(api_documentation['paths']['/geographies/{geographyID}/water-accounts']['get'], indent=2))

water_accounts_path = '/geographies/{geographyID}/water-accounts'

water_accounts_url = f"{base_url}{water_accounts_path.replace('{geographyID}', str(demo_geography_id))}"

print(f"Attempting to call: {water_accounts_url}")
response = requests.get(water_accounts_url, headers=headers)

if response.status_code == 200:
  water_accounts_data = response.json()
  water_accounts_df = pd.DataFrame(water_accounts_data)
  display(water_accounts_df.head())
  water_accounts_df.info()
else:
  print(f"Error: API call failed with status code {response.status_code}")
  print(response.text)

{
  "tags": [
    "Water Accounts"
  ],
  "summary": "List by Geography",
  "description": "List all water accounts for a specified geography",
  "parameters": [
    {
      "name": "geographyID",
      "in": "path",
      "required": true,
      "schema": {
        "type": "integer",
        "format": "int32"
      }
    }
  ],
  "responses": {
    "200": {
      "description": "OK",
      "content": {
        "application/json": {
          "schema": {
            "type": "array",
            "items": {
              "$ref": "#/components/schemas/WaterAccountConsumerDto"
            }
          }
        }
      }
    },
    "403": {
      "description": "Forbidden",
      "content": {
        "application/json": {
          "schema": {
            "$ref": "#/components/schemas/ProblemDetails"
          }
        }
      }
    },
    "404": {
      "description": "Not Found",
      "content": {
        "application/json": {
          "schema": {
            "$ref": "#/components/sche

Unnamed: 0,WaterAccountID,WaterAccountNumber,WaterAccountName,Notes,WaterAccountPIN,WaterAccountPINLastUsed,WaterAccountContactName,ContactEmail,ContactPhoneNumber,FullAddress,GeographyID
0,1,10001,Oak Grove Farms,Notes go here.,GNN-308,,Oak Grove Farms,,,"1717 Emerald Court, Bakersfield, CA 93309",5
1,2,10002,Apple Bottom Farms,,YHA-551,2024-06-11T21:28:54.233,Apple Bottom Farms,,,"3232 Sunrise Road, Bakersfield, CA 93304",5
2,3,10003,Baa Baa Black Sheep Farms,,HXM-063,2024-03-06T22:10:52.94,Baa Baa Black Sheep Farms,,,"2525 Golden Hills Drive, Bakersfield, CA 93309",5
3,4,10004,Barnyard Bonanza,,FMG-461,2025-07-18T16:43:47.277,Barnyard Bonanza,,,"4567 Union Avenue, Bakersfield, CA 93305",5
4,5,10005,Berry Funny Farms,,BBS-351,2025-03-12T23:41:34.95,Berry Funny Farms,,,"2828 Park Avenue, Bakersfield, CA 93301",5


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53 entries, 0 to 52
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   WaterAccountID           53 non-null     int64 
 1   WaterAccountNumber       53 non-null     int64 
 2   WaterAccountName         53 non-null     object
 3   Notes                    1 non-null      object
 4   WaterAccountPIN          53 non-null     object
 5   WaterAccountPINLastUsed  15 non-null     object
 6   WaterAccountContactName  52 non-null     object
 7   ContactEmail             1 non-null      object
 8   ContactPhoneNumber       2 non-null      object
 9   FullAddress              52 non-null     object
 10  GeographyID              53 non-null     int64 
dtypes: int64(3), object(8)
memory usage: 4.7+ KB


## Calculate total groundwater usage

### Subtask:
Calculate the total groundwater usage per water account from the fetched water usage data.


**Reasoning**:
Group the water usage data by WaterAccountID, sum ReportedValueInAcreFeet to get the total groundwater usage per water account, then reset the index and rename the summed column to TotalGroundwaterUsage.



In [None]:
water_usage_summary_df = water_usage_df.groupby('WaterAccountID')['ReportedValueInAcreFeet'].sum().reset_index()
water_usage_summary_df = water_usage_summary_df.rename(columns={'ReportedValueInAcreFeet': 'TotalGroundwaterUsage'})
display(water_usage_summary_df.head())

KeyError: 'Column not found: ReportedValueInAcreFeet'

**Reasoning**:
The previous attempt failed because the column 'ReportedValueInAcreFeet' was not found in `water_usage_df`. I need to inspect the `water_usage_df` to identify the correct column containing the usage values and then recalculate the total groundwater usage per water account.



In [None]:
print(water_usage_df.columns)

Index(['UsageLocationID', 'Name', 'Area', 'UsageLocationType',
       'WaterAccountID', 'WaterAccountNumber', 'ParcelID', 'ParcelNumber',
       'ParcelZones', 'ReportingPeriodID', 'ReportingPeriodName',
       'GeographyID'],
      dtype='object')


**Reasoning**:
`water_usage_df` has no 'ReportedValueInAcreFeet' column, so the usage values must come from another DataFrame. From the context, `water_measurements_df` contains both `ReportedValueInAcreFeet` and `UsageLocationID`. I will merge `water_usage_df` and `water_measurements_df` on `UsageLocationID` to attach each reported value to a WaterAccountID, and then aggregate per account.
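
The merge-then-aggregate step can be illustrated on toy data. The cells shown here do not build `water_measurements_df`, so the frames below are hypothetical stand-ins that reuse only the column names mentioned above. The sketch also shows a side effect worth knowing: usage locations without a WaterAccountID drop out of the totals, because `groupby` skips missing keys by default.

```python
import pandas as pd

# Toy stand-ins; the real frames come from the API.
usage_locations = pd.DataFrame({
    "UsageLocationID": [1, 2, 3, 4],
    "WaterAccountID": [10.0, 10.0, 20.0, None],  # location 4 has no water account
})
measurements = pd.DataFrame({
    "UsageLocationID": [1, 1, 2, 3, 4],
    "ReportedValueInAcreFeet": [5.0, 7.0, 3.0, 4.0, 9.0],
})

# Attach each measurement to the water account of its usage location.
combined = usage_locations.merge(measurements, on="UsageLocationID", how="left")

# Sum per account; rows with a missing WaterAccountID are excluded by groupby.
totals = (
    combined.groupby("WaterAccountID")["ReportedValueInAcreFeet"]
    .sum()
    .reset_index()
    .rename(columns={"ReportedValueInAcreFeet": "TotalGroundwaterUsage"})
)
print(totals)
```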



In [None]:
combined_df = pd.merge(water_usage_df, water_measurements_df, on='UsageLocationID', how='left')
water_usage_summary_df = combined_df.groupby('WaterAccountID')['ReportedValueInAcreFeet'].sum().reset_index()
water_usage_summary_df = water_usage_summary_df.rename(columns={'ReportedValueInAcreFeet': 'TotalGroundwaterUsage'})
display(water_usage_summary_df.head())

Unnamed: 0,WaterAccountID,TotalGroundwaterUsage
0,1.0,276.9617
1,2.0,2897.1563
2,3.0,1599.7386
3,4.0,3492.9269
4,5.0,954.6723


## Calculate average groundwater usage

### Subtask:
Calculate the average groundwater usage across all water accounts.


**Reasoning**:
Calculate the mean of the 'TotalGroundwaterUsage' column and store it in `average_groundwater_usage`.



In [None]:
average_groundwater_usage = water_usage_summary_df['TotalGroundwaterUsage'].mean()
print(f"Average Groundwater Usage: {average_groundwater_usage:.2f} acre-feet")

Average Groundwater Usage: 1622.66 acre-feet


## Identify outliers

### Subtask:
Determine outliers based on a defined deviation threshold from the average groundwater usage.


**Reasoning**:
Calculate the percentage deviation from the average, add it as a new column, define the deviation threshold, filter for outliers, and display the resulting dataframe.
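
The percent deviation is (usage - average) / average * 100, and an account counts as an outlier when the absolute value exceeds the threshold. A small worked sketch with made-up numbers, where only the column names and the threshold of 50 come from this notebook:

```python
import pandas as pd

usage = pd.DataFrame({
    "WaterAccountID": [1, 2, 3, 4],
    "TotalGroundwaterUsage": [100.0, 200.0, 300.0, 1000.0],
})

average = usage["TotalGroundwaterUsage"].mean()  # (100 + 200 + 300 + 1000) / 4 = 400
usage["PercentDeviationFromAverage"] = (
    (usage["TotalGroundwaterUsage"] - average) / average * 100
)

deviation_threshold = 50  # percent, the same value the notebook uses
outliers = usage[usage["PercentDeviationFromAverage"].abs() > deviation_threshold]
print(outliers)
```

Account 2 sits at exactly -50% and is not flagged, because the comparison is strict. Because one large account pulls the mean upward, a mean-based threshold can flag many ordinary accounts as low outliers.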



In [None]:
water_usage_summary_df['PercentDeviationFromAverage'] = ((water_usage_summary_df['TotalGroundwaterUsage'] - average_groundwater_usage) / average_groundwater_usage) * 100

deviation_threshold = 50

outliers_df = water_usage_summary_df[abs(water_usage_summary_df['PercentDeviationFromAverage']) > deviation_threshold].copy()

display(outliers_df)

Unnamed: 0,WaterAccountID,TotalGroundwaterUsage,PercentDeviationFromAverage
0,1.0,276.9617,-82.931591
1,2.0,2897.1563,78.543996
3,4.0,3492.9269,115.259745
5,6.0,2786.8419,71.745615
7,8.0,798.5894,-50.785071
10,11.0,21.5239,-98.67354
12,13.0,3881.739,139.221195
13,14.0,661.8033,-59.214832
17,18.0,6544.6675,303.330359
18,19.0,2519.7075,55.282836


## Merge data

### Subtask:
Combine the outlier water usage data with parcel acreage and water account details.


**Reasoning**:
Merge the outliers_df with parcel_area_summary_df and water_accounts_df to combine the relevant information for outlier water accounts.
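
`parcel_area_summary_df` is not constructed in the cells shown here. One plausible way to derive it, assuming it is the per-account sum of `ParcelArea` from the parcels response, is sketched below on toy data. The `TotalParcelArea` column name matches the merged output, and everything else is illustrative.

```python
import pandas as pd

# Toy stand-in for `parcels_df`; column names match the parcels response shown earlier.
parcels = pd.DataFrame({
    "ParcelID": [1, 2, 3, 4],
    "ParcelArea": [10.0, 15.5, 20.0, 8.0],
    "WaterAccountID": [1.0, 1.0, 2.0, None],  # parcel 4 is not assigned to an account
})

# Total acreage per water account; unassigned parcels are excluded by groupby.
parcel_area_summary = (
    parcels.groupby("WaterAccountID")["ParcelArea"]
    .sum()
    .reset_index()
    .rename(columns={"ParcelArea": "TotalParcelArea"})
)
print(parcel_area_summary)
```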



In [None]:
merged_df = outliers_df.merge(parcel_area_summary_df, on='WaterAccountID', how='left')
merged_df = merged_df.merge(water_accounts_df, on='WaterAccountID', how='left')

display(merged_df.head())
merged_df.info()

Unnamed: 0,WaterAccountID,TotalGroundwaterUsage,PercentDeviationFromAverage,TotalParcelArea,WaterAccountNumber,WaterAccountName,Notes,WaterAccountPIN,WaterAccountPINLastUsed,WaterAccountContactName,ContactEmail,ContactPhoneNumber,FullAddress,GeographyID
0,1.0,276.9617,-82.931591,162.8301,10001,Oak Grove Farms,Notes go here.,GNN-308,,Oak Grove Farms,,,"1717 Emerald Court, Bakersfield, CA 93309",5
1,2.0,2897.1563,78.543996,770.4239,10002,Apple Bottom Farms,,YHA-551,2024-06-11T21:28:54.233,Apple Bottom Farms,,,"3232 Sunrise Road, Bakersfield, CA 93304",5
2,4.0,3492.9269,115.259745,1362.6959,10004,Barnyard Bonanza,,FMG-461,2025-07-18T16:43:47.277,Barnyard Bonanza,,,"4567 Union Avenue, Bakersfield, CA 93305",5
3,6.0,2786.8419,71.745615,997.6411,10006,Berry Nutty Farms,,BBR-932,2025-04-25T20:09:39.953,Berry Nutty Farms,,,"5555 Stockdale Highway, Bakersfield, CA 93309",5
4,8.0,798.5894,-50.785071,200.5109,10008,Chirpy Chicken Farms,,ZOD-709,,Test merced contact,,5554443333.0,"1920 Grogan St, merced, CA 95341",5


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   WaterAccountID               33 non-null     float64
 1   TotalGroundwaterUsage        33 non-null     float64
 2   PercentDeviationFromAverage  33 non-null     float64
 3   TotalParcelArea              33 non-null     float64
 4   WaterAccountNumber           33 non-null     int64  
 5   WaterAccountName             33 non-null     object 
 6   Notes                        1 non-null      object 
 7   WaterAccountPIN              33 non-null     object 
 8   WaterAccountPINLastUsed      9 non-null      object 
 9   WaterAccountContactName      33 non-null     object 
 10  ContactEmail                 1 non-null      object 
 11  ContactPhoneNumber           2 non-null      object 
 12  FullAddress                  33 non-null     object 
 13  GeographyID           

## Create outlier summary table

### Subtask:
Format the merged data into a table with the requested columns: Water Account Name, Owner, Parcel Acreage, Groundwater Usage, and Percent Deviation From Average.


**Reasoning**:
Select and rename columns to create the outlier summary table.



In [None]:
outlier_summary_table = merged_df[['WaterAccountName', 'WaterAccountContactName', 'TotalParcelArea', 'TotalGroundwaterUsage', 'PercentDeviationFromAverage']].copy()
outlier_summary_table = outlier_summary_table.rename(columns={
    'WaterAccountName': 'Water Account Name',
    'WaterAccountContactName': 'Owner',
    'TotalParcelArea': 'Parcel Acreage',
    'TotalGroundwaterUsage': 'Groundwater Usage',
    'PercentDeviationFromAverage': 'Percent Deviation From Average'
})
display(outlier_summary_table)

Unnamed: 0,Water Account Name,Owner,Parcel Acreage,Groundwater Usage,Percent Deviation From Average
0,Oak Grove Farms,Oak Grove Farms,162.8301,276.9617,-82.931591
1,Apple Bottom Farms,Apple Bottom Farms,770.4239,2897.1563,78.543996
2,Barnyard Bonanza,Barnyard Bonanza,1362.6959,3492.9269,115.259745
3,Berry Nutty Farms,Berry Nutty Farms,997.6411,2786.8419,71.745615
4,Chirpy Chicken Farms,Test merced contact,200.5109,798.5894,-50.785071
5,Country Acres Farms,Country Acres Farms,19.6834,21.5239,-98.67354
6,Crop Circle Farms,Crop Circle Farms,2030.5502,3881.739,139.221195
7,Farmyard Frenzy,Farmyard Frenzy,237.9525,661.8033,-59.214832
8,Green Meadows Farm,Green Meadows Farm,3281.8958,6544.6675,303.330359
9,Green Pastures Produce,Green Pastures Produce,1262.1779,2519.7075,55.282836


## Display outlier summary table

### Subtask:
Display the final table containing the outlier water accounts and their relevant details.


**Reasoning**:
Display the final outlier summary table.



In [None]:
display(outlier_summary_table)

Unnamed: 0,Water Account Name,Owner,Parcel Acreage,Groundwater Usage,Percent Deviation From Average
0,Oak Grove Farms,Oak Grove Farms,162.8301,276.9617,-82.931591
1,Apple Bottom Farms,Apple Bottom Farms,770.4239,2897.1563,78.543996
2,Barnyard Bonanza,Barnyard Bonanza,1362.6959,3492.9269,115.259745
3,Berry Nutty Farms,Berry Nutty Farms,997.6411,2786.8419,71.745615
4,Chirpy Chicken Farms,Test merced contact,200.5109,798.5894,-50.785071
5,Country Acres Farms,Country Acres Farms,19.6834,21.5239,-98.67354
6,Crop Circle Farms,Crop Circle Farms,2030.5502,3881.739,139.221195
7,Farmyard Frenzy,Farmyard Frenzy,237.9525,661.8033,-59.214832
8,Green Meadows Farm,Green Meadows Farm,3281.8958,6544.6675,303.330359
9,Green Pastures Produce,Green Pastures Produce,1262.1779,2519.7075,55.282836


## Summary:

### Data Analysis Key Findings

*   The average total groundwater usage per water account in the "Demo" geography for 2024 was 1622.66 acre-feet.
*   33 water accounts were identified as outliers: their total groundwater usage deviated from the average by more than 50%.
*   The outlier water accounts' data was successfully combined with parcel acreage and owner information.
*   A summary table containing the Water Account Name, Owner, Parcel Acreage, Groundwater Usage, and Percent Deviation From Average for the identified outliers was created and displayed.

### Insights or Next Steps

*   Investigate the outlier accounts with the highest deviations to understand the reasons for significantly higher or lower groundwater usage compared to the average.
*   Visualize the distribution of groundwater usage and the identified outliers to gain a better understanding of the data spread and the impact of the deviation threshold.
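
As a starting point for the first next step, the summary table can be ranked by the magnitude of the deviation so that the most extreme accounts come first. A minimal sketch on made-up rows, where the account names are placeholders and the column names follow the summary table:

```python
import pandas as pd

summary = pd.DataFrame({
    "Water Account Name": ["Account A", "Account B", "Account C"],
    "Percent Deviation From Average": [-82.9, 303.3, 55.3],
})

# Sort by absolute deviation so large negative and large positive values rank together.
ranked = summary.sort_values(
    "Percent Deviation From Average", key=lambda s: s.abs(), ascending=False
)
print(ranked)
```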
