<a href="https://colab.research.google.com/github/nicolevasos/GeoCitizens/blob/main/notebooks/Colab_whisp_geojson_to_csv.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/forestdatapartnership/whisp/blob/main/notebooks/Colab_whisp_geojson_to_csv.ipynb)

# Whisp a geojson

Python Notebook pathway for [Whisp](https://openforis.org/solutions/whisp/) running in the cloud via [Google Colab](https://colab.google/).

**To open:** click the badge at the top.

**To run:** click play buttons (or press shift + enter)

**Requirements:** Google Earth Engine (GEE) account and registered cloud project.



- **Aim:** support compliance with zero deforestation regulations
- **Input**: geojson file of plot boundaries or points
- **Output**: CSV table and geojson containing statistics and risk indicators
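
For orientation, the input can be as small as a single polygon. The sketch below builds a minimal GeoJSON FeatureCollection with the standard library and writes it to disk. The coordinates, the plot ID, the helper name `make_feature_collection`, and the file name `example_plot.geojson` are made up for illustration; the `user_id` property simply mirrors the optional `external_id_column` example that appears later in this notebook.

```python
import json

def make_feature_collection(plots):
    """Build a GeoJSON FeatureCollection from (plot_id, ring) pairs.

    Each ring is a list of [lon, lat] positions. The ring is closed
    automatically, because GeoJSON polygons must start and end on the
    same position.
    """
    features = []
    for plot_id, ring in plots:
        ring = [list(pt) for pt in ring]
        if ring[0] != ring[-1]:
            ring.append(list(ring[0]))
        features.append({
            "type": "Feature",
            "properties": {"user_id": plot_id},  # hypothetical ID property
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical square plot (lon, lat in decimal degrees)
example = make_feature_collection([
    ("plot-1", [[-75.79, 4.41], [-75.78, 4.41], [-75.78, 4.42], [-75.79, 4.42]]),
])

with open("example_plot.geojson", "w", encoding="utf-8") as f:
    json.dump(example, f)
```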

### Setup Google Earth Engine

In [1]:
import ee

# Google Earth Engine project name
gee_project_name = "ee-dnsalazar10"  # change to your project name. If unsure, see https://developers.google.com/earth-engine/cloud/assets

# NB opens browser to allow access
ee.Authenticate()

# initialize with chosen project
ee.Initialize(project=gee_project_name)



### Install and import packages

In [2]:
# Install openforis-whisp (if not already installed)
!pip install --pre openforis-whisp

Collecting openforis-whisp
  Downloading openforis_whisp-2.0.0a6-py3-none-any.whl.metadata (16 kB)
Collecting country_converter<2.0.0,>=0.7 (from openforis-whisp)
  Downloading country_converter-1.3.1-py3-none-any.whl.metadata (25 kB)
Collecting geojson<3.0.0,>=2.5.0 (from openforis-whisp)
  Downloading geojson-2.5.0-py2.py3-none-any.whl.metadata (15 kB)
Collecting pandera<1.0.0,>=0.22.1 (from pandera[io]<1.0.0,>=0.22.1->openforis-whisp)
  Downloading pandera-0.26.1-py3-none-any.whl.metadata (10 kB)
Collecting typing_inspect>=0.6.0 (from pandera<1.0.0,>=0.22.1->pandera[io]<1.0.0,>=0.22.1->openforis-whisp)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting black (from pandera[io]<1.0.0,>=0.22.1->openforis-whisp)
  Downloading black-25.9.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.metadata (83 kB)

In [3]:
import openforis_whisp as whisp

top-level pandera module will be **removed in a future version of pandera**.
If you're using pandera to validate pandas objects, we highly recommend updating
your import:

```
# old import
import pandera as pa

# new import
import pandera.pandas as pa
```

If you're using pandera to validate objects from other compatible libraries
like pyspark or polars, see the supported libraries section of the documentation
for more information on how to import pandera:

https://pandera.readthedocs.io/en/stable/supported_libraries.html





### Get a geojson

- Files are stored temporarily and can be viewed in the panel on the left (click the folder icon).
- Press refresh if updates are not showing.
- Alternatively, you can work with files in your Google Drive by mounting it: run `from google.colab import drive`, then `drive.mount('/content/drive')`.
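
The Google Drive option could look roughly like the sketch below. `drive.mount` is the Colab call mentioned above; the helpers `mount_drive` and `drive_path` and the `whisp/test1_poly.geojson` location are hypothetical and only illustrate how a Drive file path might be assembled. Outside Colab the import fails, so the helper returns `None` instead of mounting.

```python
import posixpath

def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; return None elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return None
    drive.mount(mount_point)  # opens an authorisation prompt
    return mount_point

def drive_path(mount_point, *parts):
    """Join a path under 'MyDrive', where Colab exposes your personal Drive files."""
    return posixpath.join(mount_point, "MyDrive", *parts)

mounted = mount_drive()
if mounted is not None:
    # Hypothetical location of a geojson file in your Drive
    GEOJSON_EXAMPLE_FILEPATH = drive_path(mounted, "whisp", "test1_poly.geojson")
```

If you go this route, the upload cells below can be skipped, provided `GEOJSON_EXAMPLE_FILEPATH` points at an existing file.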

In [4]:
# Upload a geojson file from your machine and return its path in Colab.
# Example files: https://github.com/andyarnell/whisp/tree/package-test-new-structure/tests/fixtures
def import_geojson():
    from google.colab import files
    # files.upload() returns {filename: bytes}; take the first uploaded file
    fn, content = next(iter(files.upload().items()))
    with open(f'/content/{fn}', 'wb') as f:
        f.write(content)
    return f'/content/{fn}'

In [5]:
GEOJSON_EXAMPLE_FILEPATH = import_geojson()
print(f"GEOJSON_EXAMPLE_FILEPATH: {GEOJSON_EXAMPLE_FILEPATH}")

Saving test1_poly.geojson to test1_poly.geojson
GEOJSON_EXAMPLE_FILEPATH: /content/test1_poly.geojson
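
Before sending the file to Earth Engine, it can be worth a quick look at what was uploaded. The sketch below uses only the standard library to count features and tally geometry types; `summarize_geojson` is a hypothetical helper, not part of Whisp.

```python
import json
from collections import Counter

def summarize_geojson(path):
    """Return the feature count and a tally of geometry types in a GeoJSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if data.get("type") != "FeatureCollection":
        raise ValueError(f"Expected a FeatureCollection, got {data.get('type')!r}")
    features = data.get("features", [])
    # Features with a null geometry are tallied as 'missing'
    geom_types = Counter(
        (feat.get("geometry") or {}).get("type", "missing") for feat in features
    )
    return {"n_features": len(features), "geometry_types": dict(geom_types)}
```

In this notebook it would be called as `summarize_geojson(GEOJSON_EXAMPLE_FILEPATH)`.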


### Whisp it

In [6]:
# Choose countries to process (currently three countries: 'co', 'ci', 'br')
iso2_codes_list = ['co', 'ci', 'br']  # Example ISO2 codes for including country specific data
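
A small guard against typos in the country list might look like this. The supported set is copied from the comment in the cell above and may change between Whisp releases; `normalize_national_codes` is a hypothetical helper, not a Whisp function.

```python
# Assumption: supported codes as stated in this notebook's comment
SUPPORTED_NATIONAL_CODES = {"co", "ci", "br"}

def normalize_national_codes(codes):
    """Lower-case, strip and de-duplicate ISO2 codes; reject unsupported ones."""
    cleaned = [str(code).strip().lower() for code in codes]
    unsupported = sorted(set(cleaned) - SUPPORTED_NATIONAL_CODES)
    if unsupported:
        raise ValueError(f"Unsupported national codes: {unsupported}")
    # dict.fromkeys drops duplicates while preserving order
    return list(dict.fromkeys(cleaned))
```

For example, `normalize_national_codes(["CO", " br", "co"])` yields `['co', 'br']`.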

In [7]:
import ee
import pandas as pd
import geopandas as gpd

# Read the geojson file directly into a GeoDataFrame
gdf = gpd.read_file(GEOJSON_EXAMPLE_FILEPATH)

# Convert any datetime columns to strings in the pandas DataFrame
for col in gdf.columns:
    if pd.api.types.is_datetime64_any_dtype(gdf[col]):
        gdf[col] = gdf[col].astype(str)

# Convert the GeoDataFrame to an Earth Engine FeatureCollection
ee_feature_collection = ee.FeatureCollection(gdf.__geo_interface__)

# Process the Earth Engine FeatureCollection with whisp
df_stats = whisp.whisp_formatted_stats_ee_to_df(
    ee_feature_collection,
    # external_id_column="user_id",# optional - specify which input column/property to map to the external ID.
    national_codes=iso2_codes_list,
    # unit_type='percent', # optional - to change unit type. Default is 'ha'.
    )

Whisp multiband image compiled
Creating schema for national_codes: ['co', 'ci', 'br']
external_id
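
The datetime conversion in the cell above is presumably needed because the features are serialised to JSON on their way to Earth Engine, and Python's `json` module cannot encode datetime objects. The standard-library sketch below illustrates the same idea on a plain dictionary; `stringify_datetimes` and the property values are hypothetical, and the ISO format used here differs slightly from what `astype(str)` produces in pandas.

```python
import json
from datetime import date, datetime

def stringify_datetimes(properties):
    """Return a copy of a properties dict with dates/datetimes as ISO strings."""
    return {
        key: value.isoformat() if isinstance(value, (date, datetime)) else value
        for key, value in properties.items()
    }

# Hypothetical feature properties
props = {"user_id": "plot-1", "surveyed": datetime(2024, 5, 1, 12, 30)}

try:
    json.dumps(props)
    raw_ok = True
except TypeError:
    raw_ok = False  # datetime objects are not JSON serialisable

encoded = json.dumps(stringify_datetimes(props))
```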


### Display results

In [8]:
df_stats

Unnamed: 0,plotId,external_id,Area,Geometry_type,Country,ProducerCountry,Admin_Level_1,Centroid_lon,Centroid_lat,Unit,...,nBR_MapBiomas_col9_palmoil_2020,nBR_MapBiomas_col9_pc_2020,nBR_INPE_TCamz_cer_annual_2020,nBR_MapBiomas_col9_soy_2020,nBR_MapBiomas_col9_annual_crops_2020,nBR_INPE_TCamz_pasture_2020,nBR_INPE_TCcer_pasture_2020,nBR_MapBiomas_col9_pasture_2020,nCI_Cocoa_bnetd,geo
0,1,,154.604996,Polygon,COL,CO,Quindío,-75.783744,4.419095,ha,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{'type': 'Polygon', 'coordinates': [[[-75.7912..."
1,2,,27.325001,MultiPolygon,COL,CO,Quindío,-75.780787,4.421473,ha,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{'type': 'Polygon', 'coordinates': [[[-75.7816..."
2,3,,7.255,Polygon,COL,CO,Quindío,-75.779172,4.419302,ha,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{'type': 'Polygon', 'coordinates': [[[-75.7809..."
3,4,,4.249,Polygon,COL,CO,Quindío,-75.784319,4.41691,ha,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{'type': 'Polygon', 'coordinates': [[[-75.7853..."
4,5,,6.043,Polygon,COL,CO,Quindío,-75.787801,4.420453,ha,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{'type': 'Polygon', 'coordinates': [[[-75.7892..."
5,6,,0.431,Polygon,COL,CO,Quindío,-75.782231,4.418663,ha,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{'type': 'Polygon', 'coordinates': [[[-75.7826..."
6,7,,0.289,Polygon,COL,CO,Quindío,-75.78736,4.41889,ha,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{'type': 'Polygon', 'coordinates': [[[-75.7877..."
7,8,,10.654,Polygon,COL,CO,Quindío,-75.788942,4.421013,ha,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{'type': 'Polygon', 'coordinates': [[[-75.7904..."
8,9,,0.43,Polygon,COL,CO,Quindío,-75.78375,4.422371,ha,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{'type': 'Polygon', 'coordinates': [[[-75.7841..."
9,10,,19.841,Polygon,COL,CO,Quindío,-75.786466,4.418606,ha,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"{'type': 'Polygon', 'coordinates': [[[-75.7893..."


### Add risk category columns

In [9]:
# adds risk columns to end of dataframe
df_w_risk = whisp.whisp_risk(df=df_stats,national_codes=iso2_codes_list)

Using unit type: ha


### Display updated table
- Scroll to far right to see additions

In [10]:
df_w_risk

Unnamed: 0,plotId,external_id,Area,Geometry_type,Country,ProducerCountry,Admin_Level_1,Centroid_lon,Centroid_lat,Unit,...,Ind_05_primary_2020,Ind_06_nat_reg_forest_2020,Ind_07_planted_plantations_2020,Ind_08_planted_plantations_after_2020,Ind_09_treecover_after_2020,Ind_10_agri_after_2020,Ind_11_logging_concession_before_2020,risk_pcrop,risk_acrop,risk_timber
0,1,,154.604996,Polygon,COL,CO,Quindío,-75.783744,4.419095,ha,...,no,yes,no,no,yes,yes,no,low,low,low
1,2,,27.325001,MultiPolygon,COL,CO,Quindío,-75.780787,4.421473,ha,...,no,yes,no,no,yes,yes,no,low,low,low
2,3,,7.255,Polygon,COL,CO,Quindío,-75.779172,4.419302,ha,...,no,yes,no,no,yes,yes,no,low,low,low
3,4,,4.249,Polygon,COL,CO,Quindío,-75.784319,4.41691,ha,...,no,yes,no,no,yes,yes,no,low,low,low
4,5,,6.043,Polygon,COL,CO,Quindío,-75.787801,4.420453,ha,...,no,yes,no,no,yes,yes,no,low,low,low
5,6,,0.431,Polygon,COL,CO,Quindío,-75.782231,4.418663,ha,...,no,yes,no,no,yes,yes,no,more_info_needed,more_info_needed,high
6,7,,0.289,Polygon,COL,CO,Quindío,-75.78736,4.41889,ha,...,no,yes,no,no,no,yes,no,low,low,low
7,8,,10.654,Polygon,COL,CO,Quindío,-75.788942,4.421013,ha,...,no,yes,no,no,yes,yes,no,low,low,low
8,9,,0.43,Polygon,COL,CO,Quindío,-75.78375,4.422371,ha,...,no,yes,no,no,yes,yes,no,more_info_needed,more_info_needed,high
9,10,,19.841,Polygon,COL,CO,Quindío,-75.786466,4.418606,ha,...,no,yes,no,no,yes,yes,no,low,low,low






In [69]:
columns_to_plot = list(df_w_risk.columns[:10]) + [df_w_risk.columns[-3]]
df_subset = df_w_risk[columns_to_plot]
df_subset[:5]

Unnamed: 0,plotId,external_id,Area,Geometry_type,Country,ProducerCountry,Admin_Level_1,Centroid_lon,Centroid_lat,Unit,risk_pcrop
0,1,,154.604996,Polygon,COL,CO,Quindío,-75.783744,4.419095,ha,low
1,2,,27.325001,MultiPolygon,COL,CO,Quindío,-75.780787,4.421473,ha,low
2,3,,7.255,Polygon,COL,CO,Quindío,-75.779172,4.419302,ha,low
3,4,,4.249,Polygon,COL,CO,Quindío,-75.784319,4.41691,ha,low
4,5,,6.043,Polygon,COL,CO,Quindío,-75.787801,4.420453,ha,low


### Export table with risk columns to CSV (temporary storage)

In [52]:
df_w_risk.to_csv("whisp_output_table_w_risk.csv",index=False)
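
As a quick check on the exported file, the risk categories can be tallied without pandas. This is a minimal sketch using the standard library; `count_risk_categories` is a hypothetical helper, and the default column name `risk_pcrop` is taken from the table displayed above.

```python
import csv
from collections import Counter

def count_risk_categories(csv_path, column="risk_pcrop"):
    """Tally the values of one risk column in an exported Whisp CSV."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"Column {column!r} not found in {csv_path}")
        # Empty cells are tallied as 'missing'
        return Counter(row[column] or "missing" for row in reader)
```

In this notebook it would be called as `count_risk_categories("whisp_output_table_w_risk.csv")`, or with `column="risk_timber"` for the timber indicator.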

### Export table with risk columns to geojson (temporary storage)

In [31]:
whisp.convert_df_to_geojson(df_w_risk,"whisp_output_table_w_risk.geojson") # builds a geojson file containing Whisp columns. Uses the geometry column "geo" to create the spatial features.

GeoJSON saved to whisp_output_table_w_risk.geojson


### Visualize data on a map

In [64]:
import geopandas as gpd
import folium

# Load the polygons exported in the previous step into a GeoDataFrame
gdf = gpd.read_file("whisp_output_table_w_risk.geojson")  # use the same file name as in the geojson export cell

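# Compute centroids in a projected CRS (centroids computed in degrees trigger a
# geopandas warning and can be slightly off), then convert back to the original CRS
gdf["centroid"] = gdf.geometry.to_crs(gdf.estimate_utm_crs()).centroid.to_crs(gdf.crs)

# Get the average center of your data for the initial map view
map_center = [gdf["centroid"].y.mean(), gdf["centroid"].x.mean()]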

# Create base map (default dark)
m = folium.Map(location=map_center, zoom_start=14, tiles=None)

# Add dark basemap
folium.TileLayer('CartoDB dark_matter', name='Dark').add_to(m)

# Add light basemap
folium.TileLayer('CartoDB positron', name='Light').add_to(m)

# Create FeatureGroups
polygon_layer = folium.FeatureGroup(name="Polygons")
centroid_layer = folium.FeatureGroup(name="Centroids")

# Add polygons (risk areas)
folium.GeoJson(
    gdf.drop(columns=['centroid']),  # exclude centroid from GeoJson
    name="Polygons",
    style_function=lambda x: {
        "fillColor": "lightgray",
        "color": "darkgray",
        "weight": 1,
        "fillOpacity": 0.3,
    }
).add_to(polygon_layer)

# Define color mapping for risk categories
color_map = {
    "low": "green",
    "more_info_needed": "orange",
    "high": "red"
}

# Add centroids colored by the risk_pcrop column
for _, row in gdf.iterrows():
    risk_cat = row.get("risk_pcrop", "more_info_needed")
    color = color_map.get(str(risk_cat).lower(), "gray")  # default gray if missing

    folium.CircleMarker(
        location=[row["centroid"].y, row["centroid"].x],
        radius=6,
        color=color,
        fill=True,
        fill_opacity=0.9,
        popup=f"risk_pcrop: {risk_cat}"
    ).add_to(centroid_layer)

# Add layers to the map
polygon_layer.add_to(m)
centroid_layer.add_to(m)

# Add legend
legend_html = """
<div style="position: fixed;
     bottom: 30px; left: 30px; width: 140px; height: 130px;
     border:2px solid grey; z-index:9999; font-size:14px;
     background-color:white; padding: 5px;">
<b>risk_pcrop</b><br>
<i style="background:green; width:10px; height:10px; float:left; margin-right:5px;"></i> Low<br>
<i style="background:orange; width:10px; height:10px; float:left; margin-right:5px;"></i> More info needed<br>
<i style="background:red; width:10px; height:10px; float:left; margin-right:5px;"></i> High<br>
<i style="background:gray; width:10px; height:10px; float:left; margin-right:5px;"></i> Unknown
</div>
"""
m.get_root().html.add_child(folium.Element(legend_html))

# Add layer control
folium.LayerControl().add_to(m)

# Show interactive map
m
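
The colour lookup used for the centroid markers can also be written as a small standalone function, which makes the handling of missing values explicit. This is a sketch under the same colour scheme as the cell above; `risk_color` is a hypothetical helper.

```python
import math

RISK_COLORS = {"low": "green", "more_info_needed": "orange", "high": "red"}

def risk_color(value, default="gray"):
    """Map a risk category to a marker colour, tolerating case and missing values."""
    # None and NaN (how pandas represents empty cells) fall back to the default
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return default
    return RISK_COLORS.get(str(value).strip().lower(), default)
```

Inside the marker loop this would replace the two lookup lines with `color = risk_color(row.get("risk_pcrop"))`.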





### Download outputs to local storage
- Saves files in "Downloads" folder on your machine
- If you see a "Downloads blocked" button at the top of your browser, click it to allow file downloads.
- Alternatively right click on file in the folder (in the panel on your left) and choose 'Download'.




In [None]:
from google.colab import files
files.download('whisp_output_table_w_risk.csv')


In [None]:
files.download('whisp_output_table_w_risk.geojson') # spatial output