# WHISP pure Cloud Function

WHISP (WHat IS in that Plot?) is a tool developed by FAO to aid in due diligence reporting for supply chain sustainability. This notebook demonstrates how to use [WHISP](https://openforis.org/solutions/whisp/) to bring sustainable sourcing information from the Google geospatial stack into your workflow.

**WARNING**: These demos consume billable resources and may result in charges to your account!

# Setup

## Import Python packages for Setup

In [2]:
import os

## Determine the GCP project name and compute region

In [3]:
def is_running_in_colab():
    """
    Check if the code is being executed within Google Colab.
    Returns True if running in Colab, False otherwise.
    """
    try:
        from google.colab import userdata
        return True
    except ModuleNotFoundError:
        return False

def load_environment_variable(variable_name):
    """
    Load an environment variable from Colab secrets or from the local environment.
    Raises an exception if the variable is not found.
    """
    if is_running_in_colab():
        # Import here: the import inside is_running_in_colab() is local to that function.
        from google.colab import userdata
        value = userdata.get(variable_name)
        if value is None:
            raise ValueError(f"'{variable_name}' not found in Colab secrets")
        return value
    else:
        value = os.getenv(variable_name)
        if value is None:
            raise ValueError(f"'{variable_name}' not found in the local environment variables")
        return value

In [5]:
# Later cells refer to these values as {PROJECT} and {REGION}.
PROJECT = load_environment_variable('EE_PROJECT_ID')
REGION = load_environment_variable('GCP_REGION')

In [None]:
!gcloud auth login --project {PROJECT} --billing-project {PROJECT} --update-adc

# Create the Cloud function and deploy it

The workflow here is to create a working directory, download key parts of WHISP from the [WHISP GitHub repo](https://github.com/forestdatapartnership/whisp) and embed them in a custom analysis.  First, make a working directory to hold all the code.

In [None]:
!mkdir whisper

Get `datasets.py` from the WHISP repo. Note that you can inspect the code using the file browser on the left side of the notebook.

In [None]:
!curl https://raw.githubusercontent.com/forestdatapartnership/whisp/main/src/openforis_whisp/datasets.py --output whisper/datasets.py

## EEasify WHISP

Load `list_functions()` from the `datasets.py` file to get a list of images to use.

In [None]:
%%writefile whisper/easy_whisp.py

import logging
from typing import List

import ee
import google.auth

# First, initialize. Notebook variables such as PROJECT do not exist in the
# deployed function, so use the project ID that google.auth.default() reports
# for the runtime environment (it may be None outside Google Cloud).
credentials, project_id = google.auth.default(
    scopes=['https://www.googleapis.com/auth/earthengine']
)
ee.Initialize(credentials, project=project_id, opt_url='https://earthengine-highvolume.googleapis.com')

from datasets import list_functions

def easy_whisp() -> List[ee.Image]:
    """Returns the stack as a list of images."""
    images_list = []
    for func in list_functions():
        try:
            image = func()
            images_list.append(image)
        except ee.EEException as e:
            # Skip datasets that fail to load, but log the reason.
            logging.error(str(e))
    return images_list
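
The pattern in `easy_whisp()` can be tried without Earth Engine. The following is a minimal sketch, not the WHISP implementation: `collect_results`, `ok_factory`, and `broken_factory` are hypothetical stand-ins for the callables that `list_functions()` returns, and `ValueError` stands in for `ee.EEException`.

```python
import logging
from typing import Callable, List


def collect_results(factories: List[Callable[[], str]]) -> List[str]:
    """Calls each zero-argument factory and keeps the results that succeed."""
    results = []
    for factory in factories:
        try:
            results.append(factory())
        except ValueError as e:  # Stand-in for ee.EEException.
            logging.error('%s failed: %s', factory.__name__, e)
    return results


def ok_factory() -> str:
    # Hypothetical dataset function that succeeds.
    return 'image_a'


def broken_factory() -> str:
    # Hypothetical dataset function whose asset cannot be loaded.
    raise ValueError('asset not found')


stack = collect_results([ok_factory, broken_factory, ok_factory])
print(stack)
```

A failing dataset is logged and skipped, so the returned list can be shorter than the list of functions.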

For each image in the list of relevant datasets provided by WHISP, run `reduceRegion` in parallel.

In [None]:
%%writefile whisper/main.py

import json
import ee
from flask import jsonify
import functions_framework
import logging
import requests
import google.auth
import google.cloud.logging
from google.api_core import retry
import concurrent.futures

from easy_whisp import easy_whisp

client = google.cloud.logging.Client()
client.setup_logging()


_WHISP_IMAGES = easy_whisp()


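@retry.Retry()
def get_stats(region, image):
  """Returns the mean of each band of `image` over `region` as a dictionary."""
  return image.reduceRegion(
      reducer=ee.Reducer.mean(), geometry=region, scale=10).getInfo()


@retry.Retry()
def get_whisp_stats(geojson):
  """Returns the merged statistics of all WHISP images for a GeoJSON geometry."""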
  region = ee.Geometry(geojson)
  whisp_stats = {}
  # Use a ThreadPoolExecutor with 20 workers for parallel execution.
  with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_image = {executor.submit(
        get_stats, region=region, image=img): img for img in _WHISP_IMAGES}
    for future in concurrent.futures.as_completed(future_to_image):
      img = future_to_image[future]
      try:
        image_stats = future.result()
        whisp_stats.update(image_stats)
      except ee.EEException as e:
        logging.error(f'{img} generated an exception: {e}')
  return whisp_stats


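@functions_framework.http
def main(request):
  """Handles a BigQuery remote function request with one reply per call."""
  credentials, project_id = google.auth.default(
      scopes=['https://www.googleapis.com/auth/earthengine']
  )
  # PROJECT is a notebook variable and is not defined in the deployed function.
  ee.Initialize(credentials, project=project_id)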
  try:
    replies = []
    request_json = request.get_json(silent=True)
    calls = request_json['calls']
    for call in calls:
      geo_json = json.loads(call[0])
      try:
        logging.info([geo_json])
        response = get_whisp_stats(geo_json)
        logging.info(response)
        replies.append(json.dumps(response))
      except Exception as e:
        logging.error(str(e))
        replies.append(json.dumps( { "errorMessage": str(e) } ))
    return jsonify(replies=replies)
  except Exception as e:
    error_string = str(e)
    logging.error(error_string)
    # Set the HTTP status on the response itself rather than in the JSON body.
    return jsonify(errorMessage=error_string), 400
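
The fan-out in `get_whisp_stats` can be exercised locally without Earth Engine. The sketch below assumes a hypothetical `fake_stats` function in place of `get_stats` and plain strings in place of images. It shows only how the per-image dictionaries are merged as the futures complete, and how one failure leaves the other results intact.

```python
import concurrent.futures
import logging


def fake_stats(region, image):
    """Hypothetical stand-in for get_stats: one value per 'image'."""
    if image == 'bad':
        raise RuntimeError('computation failed')
    return {f'{image}_mean': float(len(image))}


def merge_stats(region, images, max_workers=20):
    """Runs fake_stats for every image in parallel and merges the results."""
    merged = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_image = {
            executor.submit(fake_stats, region=region, image=img): img
            for img in images}
        for future in concurrent.futures.as_completed(future_to_image):
            img = future_to_image[future]
            try:
                merged.update(future.result())
            except RuntimeError as e:
                logging.error('%s generated an exception: %s', img, e)
    return merged


stats = merge_stats(region=None, images=['treecover', 'bad', 'cocoa'])
print(sorted(stats.items()))
```

Because `dict.update` is applied in completion order, key names need to be unique across images; otherwise a later result overwrites an earlier one.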

In [None]:
%%writefile whisper/requirements.txt
earthengine-api
flask
functions-framework
google-api-core
google-cloud-logging
requests

## Deploy the Cloud Function

In [None]:
!gcloud functions deploy 'whisper' \
  --gen2 \
  --region={REGION} \
  --project={PROJECT} \
  --runtime=python312 \
  --source='whisper' \
  --entry-point=main \
  --trigger-http \
  --no-allow-unauthenticated \
  --timeout=300s

## Load WHISP example data

Get WHISP example data from GitHub and use it to test the Cloud Function.

In [None]:
import json

fc_list = !curl https://raw.githubusercontent.com/forestdatapartnership/whisp/main/tests/fixtures/geojson_example.geojson
fc_obj = json.loads("\n".join(fc_list))
features = fc_obj['features']
# See https://code.earthengine.google.com/e7d74cb4694589fc8a2e9923404730b4
feature = features[4]
feature

In [None]:
# Get the geometries.
geoms = [f['geometry'] for f in features]

In [None]:
geoms[0]

In [None]:
json.dumps(geoms[0], separators=(',', ':'))

In [None]:
import ee
ee.Initialize(project=PROJECT)

In [None]:
print(ee.Geometry(geoms[0]).getInfo())

## Test the deployed Cloud Function

In [None]:
!gcloud auth print-identity-token

Make a test request out of the WHISP sample data.

In [None]:
import json

test_calls = [[json.dumps(g), 'foo_string', 'bar_string'] for g in geoms]
# Wrap the payload in single quotes so the shell passes it to curl as one argument.
test_request = "'" + json.dumps({'calls': test_calls}, separators=(',', ':')) + "'"

In [None]:
test_request
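
For reference, the payload has the shape that BigQuery remote functions send: a `calls` array with one entry per row, each entry being the list of arguments for that row. The function is expected to answer with a `replies` array of the same length. The sketch below uses a made-up point geometry and a hypothetical `echo_handler` in place of the deployed function to show that round trip.

```python
import json

# Made-up geometry for illustration; the notebook uses the WHISP sample data.
sample_geoms = [{'type': 'Point', 'coordinates': [-1.5, 6.7]}]

# One call per row; the first argument of each call is a GeoJSON string.
request_body = json.dumps(
    {'calls': [[json.dumps(g)] for g in sample_geoms]}, separators=(',', ':'))


def echo_handler(body: str) -> str:
    """Hypothetical handler: replies with the geometry type of each call."""
    calls = json.loads(body)['calls']
    replies = [json.dumps({'type': json.loads(call[0])['type']}) for call in calls]
    return json.dumps({'replies': replies})


response_body = echo_handler(request_body)
print(request_body)
print(response_body)
```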

Make the request (might take a while).

In [None]:
responses = !curl -X POST https://{REGION}-{PROJECT}.cloudfunctions.net/whisper \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d {test_request}

### Inspect the output of the function

The keys of a reply are the statistic names; they are used later to build the SQL for BigQuery.

In [None]:
print(len(responses))
response = responses[0]
response_json = json.loads(response)
replies = response_json['replies']
print(len(replies))
reply_0 = replies[0]
reply_0_json = json.loads(reply_0)
reply_0_json.keys()
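
Each reply is itself a JSON string, so it is decoded a second time, and a reply may carry an `errorMessage` instead of statistics when a geometry fails. A minimal sketch of separating the two cases follows, using a fabricated response string (the key `example_band` and its value are placeholders, not real WHISP output) and a hypothetical `split_replies` helper:

```python
import json

# Placeholder response in the same shape as the function's output.
raw_response = json.dumps({'replies': [
    json.dumps({'example_band': 0.5}),
    json.dumps({'errorMessage': 'computation timed out'}),
]})


def split_replies(raw: str):
    """Decodes each reply and separates results from per-row errors."""
    results, errors = [], []
    for index, reply in enumerate(json.loads(raw)['replies']):
        decoded = json.loads(reply)
        if 'errorMessage' in decoded:
            errors.append((index, decoded['errorMessage']))
        else:
            results.append((index, decoded))
    return results, errors


results, errors = split_replies(raw_response)
print(results)
print(errors)
```

Keeping the row index with each entry makes it possible to match a failed reply back to the geometry that produced it.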

# Create a remote connection in BQ

Follow [this BigQuery guide](https://cloud.google.com/bigquery/docs/remote-functions#create_a_remote_function) to set up a connection to the Cloud Function deployed previously. Once the connection is set up, create a function to use in queries. Run this SQL in BQ, replacing the project ID (`forest-data-partnership`), dataset, region, and connection name with your own.

```
CREATE OR REPLACE FUNCTION `forest-data-partnership.WHISP_DEMO.whisp`(geom STRING) RETURNS STRING
REMOTE WITH CONNECTION `forest-data-partnership.us-central1.whisp`
OPTIONS (
  endpoint = 'https://us-central1-forest-data-partnership.cloudfunctions.net/whisper',
  max_batching_rows = 1
)
```

Once that's done, you can use your `whisp` function in queries! The keys extracted from the test response are useful for building the `SQL` that represents this query. Note that the input table must have a geometry column and that the geometries are passed to the function as GeoJSON strings:

In [None]:
SQL_TEMPLATE = [f"JSON_EXTRACT_SCALAR(json_data, '$.{key}') AS {key}," for key in reply_0_json.keys()]
SQL_TEMPLATE = ['SELECT', 'geometry,'] + SQL_TEMPLATE
SQL_TEMPLATE = SQL_TEMPLATE + [
    'FROM',
    '`forest-data-partnership.WHISP_DEMO.input_examples`,',
    'UNNEST([SAFE.PARSE_JSON(`forest-data-partnership.WHISP_DEMO`.whisp(ST_ASGEOJSON(geometry)))]) AS json_data']

print('\n'.join(SQL_TEMPLATE))
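
The same construction can be wrapped in a small function so the table and function names are parameters. This is a sketch under assumptions: `build_whisp_sql` is a hypothetical helper, the key names are placeholders, and the check that each key is a simple identifier is an added precaution because keys are interpolated into the SQL text.

```python
import re
from typing import Iterable


def build_whisp_sql(keys: Iterable[str], table: str, function: str) -> str:
    """Builds a query that expands the remote function's JSON into columns."""
    lines = ['SELECT', '  geometry,']
    for key in keys:
        # Keys become column aliases, so accept only simple identifiers.
        if not re.fullmatch(r'[A-Za-z_][A-Za-z0-9_]*', key):
            raise ValueError(f'Unsupported key for a column name: {key!r}')
        lines.append(f"  JSON_EXTRACT_SCALAR(json_data, '$.{key}') AS {key},")
    lines += [
        f'FROM `{table}`,',
        f'UNNEST([SAFE.PARSE_JSON({function}(ST_ASGEOJSON(geometry)))]) AS json_data',
    ]
    return '\n'.join(lines)


# Placeholder names; substitute your own project, dataset, table, and keys.
sql = build_whisp_sql(
    keys=['example_band_a', 'example_band_b'],
    table='my-project.my_dataset.input_examples',
    function='`my-project.my_dataset`.whisp',
)
print(sql)
```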

Take that `SQL` blob over to BigQuery and run it!

# Next Steps

- Try the [Sustainable Sourcing Cloud Function demo notebook](https://colab.research.google.com/drive/1cQyqNaiK3nP65I-LunRkQyLLXTakaYs9?resourcekey=0-YAkoE8VC9drgqa1PE6uAGA&usp=sharing).