# Tropical Tree Cover: Sample Pipeline (10 m, 40% threshold)
This notebook sets up a **minimal workflow** to:
1. Read the tile GeoJSON;
2. List and filter the `F10m_download` URLs;
3. **(Optional)** Download **one** sample raster (10 m) for testing;
4. Compute statistics and **apply the 40% threshold** (0.4) to produce a **binary mask**;
5. Export the mask as GeoTIFF/PNG.

> **Important notes**
- Run locally with internet access (needed for the *download*). External access is disabled in this demo environment.
- Adjust the paths and URLs to your setup.


## 0) Requirements
Install the dependencies if needed:

```bash
pip install rasterio requests numpy pandas matplotlib geopandas shapely
```

In Jupyter/Colab you can use `!pip install ...`.


In [1]:
# %% [markdown]
# ## 1) Initial configuration
# Adjust the paths as needed.
import os

GEOJSON_PATH = r"Tropical_Tree_Cover.geojson"  # tile GeoJSON, already provided
OUTPUT_DIR = "./outputs"
os.makedirs(OUTPUT_DIR, exist_ok=True)

THRESHOLD = 0.4  # 40%, expressed as a fraction

print("GeoJSON:", GEOJSON_PATH)
print("Outputs in:", OUTPUT_DIR)
print("Threshold:", THRESHOLD)


GeoJSON: Tropical_Tree_Cover.geojson
Outputs in: ./outputs
Threshold: 0.4
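
`THRESHOLD` is written as a fraction, but a raster may store tree cover on a different scale, for instance as integer percentages. The sketch below converts the fraction to the raster's units before any comparison is made. The helper name `threshold_in_raster_units` and its `value_max` argument are hypothetical, and the two scales shown are assumptions for illustration; the scale of the actual tiles should be checked against the data.

```python
def threshold_in_raster_units(fraction: float, value_max: float) -> float:
    """Convert a 0-1 fraction to the units of a raster whose full scale is value_max."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if value_max <= 0:
        raise ValueError("value_max must be positive")
    return fraction * value_max


# Assumed scales, for illustration only:
print(threshold_in_raster_units(0.4, 1.0))    # raster stored as 0-1 fractions
print(threshold_in_raster_units(0.4, 100.0))  # raster stored as 0-100 percentages
```

Comparing an integer raster directly against `0.4` would select every pixel with a value of 1 or more, so it is worth confirming the scale before thresholding.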


In [2]:
# %%
import os, json, math, statistics, tempfile
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# rasterio and geopandas may require GDAL; install as described in the README
try:
    import rasterio
    from rasterio.plot import show
    from rasterio.enums import Resampling
    from rasterio.shutil import copy as rio_copy
except ImportError as e:
    print("Warning: rasterio is not available in this environment yet:", e)

try:
    import geopandas as gpd
    from shapely.geometry import shape, box, mapping
except ImportError as e:
    print("Warning: geopandas/shapely are not available in this environment yet:", e)

try:
    import requests
except ImportError as e:
    print("Warning: requests is not available:", e)
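
The guarded imports above report a problem only when an import fails. As a complement, the following sketch lists which optional packages are missing without importing them, using the standard library's `importlib.util.find_spec`. The helper `missing_packages` and the `OPTIONAL_PACKAGES` list are illustrative names, not part of the original notebook.

```python
import importlib.util

OPTIONAL_PACKAGES = ["rasterio", "geopandas", "shapely", "requests"]


def missing_packages(names):
    """Return the top-level package names the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(OPTIONAL_PACKAGES)
if missing:
    print("Missing optional packages:", ", ".join(missing))
else:
    print("All optional packages are installed.")
```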


In [4]:
# %%
# 2) Read the GeoJSON and explore its attributes
with open(GEOJSON_PATH, "r", encoding="utf-8") as f:
    gj = json.load(f)

features = gj.get("features", [])
print("Features:", len(features))

# Table of feature properties
rows = [ft.get("properties", {}) for ft in features]
props_df = pd.DataFrame(rows)
display(props_df.head(10))

# Extract the valid F10m_download URLs
def looks_like_url(x) -> bool:
    """Return True if x is a string starting with http:// or https://."""
    return isinstance(x, str) and x.startswith(("http://", "https://"))

props_df["has_F10m"] = props_df["F10m_download"].apply(looks_like_url)
urls_df = props_df.loc[props_df["has_F10m"], ["tile_id", "F10m_download"]].drop_duplicates()
print("Tiles with an F10m_download URL:", len(urls_df))
display(urls_df.head(10))

# Save the URL list for reference
urls_csv = os.path.join(OUTPUT_DIR, "F10m_urls.csv")
urls_df.to_csv(urls_csv, index=False, encoding="utf-8")
print("Saved:", urls_csv)


Features: 106


| | tile_id | half_hectare_download | F10m_download | ObjectId | Shape__Area | Shape__Length |
|---|---|---|---|---|---|---|
| 0 | 00N_000E | https://data-api.globalforestwatch.org/dataset... | https://data-api.globalforestwatch.org/dataset... | 1 | 1245543000000.0 | 4464170.0 |
| 1 | 00N_010E | https://data-api.globalforestwatch.org/dataset... | https://data-api.globalforestwatch.org/dataset... | 2 | 1245543000000.0 | 4464170.0 |
| 2 | 00N_020E | https://data-api.globalforestwatch.org/dataset... | https://data-api.globalforestwatch.org/dataset... | 3 | 1245543000000.0 | 4464170.0 |
| 3 | 00N_030E | https://data-api.globalforestwatch.org/dataset... | https://data-api.globalforestwatch.org/dataset... | 4 | 1245543000000.0 | 4464170.0 |
| 4 | 00N_040E | https://data-api.globalforestwatch.org/dataset... | https://data-api.globalforestwatch.org/dataset... | 5 | 1245543000000.0 | 4464170.0 |
| 5 | 00N_040W | https://data-api.globalforestwatch.org/dataset... | https://data-api.globalforestwatch.org/dataset... | 6 | 1245543000000.0 | 4464170.0 |
| 6 | 00N_050W | https://data-api.globalforestwatch.org/dataset... | https://data-api.globalforestwatch.org/dataset... | 7 | 1245543000000.0 | 4464170.0 |
| 7 | 00N_060W | https://data-api.globalforestwatch.org/dataset... | https://data-api.globalforestwatch.org/dataset... | 8 | 1245543000000.0 | 4464170.0 |
| 8 | 00N_070W | https://data-api.globalforestwatch.org/dataset... | https://data-api.globalforestwatch.org/dataset... | 9 | 1245543000000.0 | 4464170.0 |
| 9 | 00N_080W | https://data-api.globalforestwatch.org/dataset... | https://data-api.globalforestwatch.org/dataset... | 10 | 1245543000000.0 | 4464170.0 |


Tiles with an F10m_download URL: 106


| | tile_id | F10m_download |
|---|---|---|
| 0 | 00N_000E | https://data-api.globalforestwatch.org/dataset... |
| 1 | 00N_010E | https://data-api.globalforestwatch.org/dataset... |
| 2 | 00N_020E | https://data-api.globalforestwatch.org/dataset... |
| 3 | 00N_030E | https://data-api.globalforestwatch.org/dataset... |
| 4 | 00N_040E | https://data-api.globalforestwatch.org/dataset... |
| 5 | 00N_040W | https://data-api.globalforestwatch.org/dataset... |
| 6 | 00N_050W | https://data-api.globalforestwatch.org/dataset... |
| 7 | 00N_060W | https://data-api.globalforestwatch.org/dataset... |
| 8 | 00N_070W | https://data-api.globalforestwatch.org/dataset... |
| 9 | 00N_080W | https://data-api.globalforestwatch.org/dataset... |


Saved: ./outputs\F10m_urls.csv
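
Step 2 of the workflow mentions filtering, which can be done on `tile_id` alone. The sketch below assumes each id encodes a latitude and a longitude in whole degrees, as in `00N_040W`; which tile corner those coordinates refer to should be confirmed against the tile geometries in the GeoJSON. `parse_tile_id` and `tiles_in_lon_range` are hypothetical helpers.

```python
import re

_TILE_RE = re.compile(r"^(\d{2})([NS])_(\d{3})([EW])$")


def parse_tile_id(tile_id: str) -> tuple[int, int]:
    """Return (lat, lon) in signed degrees for an id such as '00N_040W'."""
    m = _TILE_RE.match(tile_id)
    if m is None:
        raise ValueError(f"unexpected tile_id format: {tile_id!r}")
    lat = int(m.group(1)) * (1 if m.group(2) == "N" else -1)
    lon = int(m.group(3)) * (1 if m.group(4) == "E" else -1)
    return lat, lon


def tiles_in_lon_range(tile_ids, lon_min, lon_max):
    """Keep the ids whose encoded longitude lies in [lon_min, lon_max]."""
    return [t for t in tile_ids if lon_min <= parse_tile_id(t)[1] <= lon_max]


sample = ["00N_000E", "00N_040E", "00N_040W", "00N_080W"]
print(tiles_in_lon_range(sample, -80, -40))  # ['00N_040W', '00N_080W']
```

The resulting list can then be used to subset the URL table, for example with `urls_df[urls_df["tile_id"].isin(selected)]`.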


In [9]:
# %%
# 3) (Optional) Download one sample raster
# Pick the index of the tile to download from the table above (0 = first row).
tile_idx = 0

if len(urls_df) == 0:
    print("No valid F10m_download URLs in the GeoJSON.")
else:
    row = urls_df.iloc[tile_idx]
    url = row["F10m_download"]
    tile = row["tile_id"]
    print("Selected tile:", tile)
    print("URL:", url)

    # Output path
    local_tif = os.path.join(OUTPUT_DIR, f"tile_{tile}_F10m.tif")

    # Download only if requests was imported successfully
    if 'requests' in globals():
        try:
            # Stream to disk in chunks so the whole file is never held in memory
            with requests.get(url, stream=True, timeout=120) as r:
                r.raise_for_status()
                with open(local_tif, "wb") as f:
                    for chunk in r.iter_content(chunk_size=8192):
                        if chunk:
                            f.write(chunk)
            print("Download complete:", local_tif)
        except Exception as e:
            print("Download failed (check authentication/URL):", e)
            print("If a token or headers are required, pass them to requests.get().")
    else:
        print("requests is unavailable. Download manually and save to:", local_tif)

    print("If you already have the file locally, continue to the next step and adjust the path.")


Selected tile: 00N_000E
URL: https://data-api.globalforestwatch.org/dataset/wri_tropical_tree_cover_extent/v20220922/download/geotiff?grid=10/100000&tile_id=00N_000E&pixel_meaning=decile&x-api-key=2d60cd88-8348-4c0f-a6d5-bd9adb585a8c
Download complete: ./outputs\tile_00N_000E_F10m.tif
If you already have the file locally, continue to the next step and adjust the path.
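
The download URL carries an `x-api-key` query parameter, so printing it in full also prints the key. A minimal sketch of masking that parameter before logging, using only `urllib.parse`; `redact_url` and `SENSITIVE_PARAMS` are hypothetical names, and the example URL and key are placeholders rather than real values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SENSITIVE_PARAMS = {"x-api-key"}  # parameter name as it appears in the download URL


def redact_url(url: str, placeholder: str = "REDACTED") -> str:
    """Return url with the values of sensitive query parameters replaced."""
    parts = urlsplit(url)
    query = [
        (key, placeholder if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # safe="/" keeps values such as grid=10/100000 readable
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))


example = (
    "https://example.org/download/geotiff"
    "?grid=10/100000&tile_id=00N_000E&pixel_meaning=decile&x-api-key=not-a-real-key"
)
print(redact_url(example))
```

The unmodified `url` is still what gets passed to `requests.get()`; only the printed copy is redacted.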


In [10]:
# %%
# 4) Read the local raster, compute statistics, and apply the 40% threshold
# Any GeoTIFF in OUTPUT_DIR (downloaded or copied manually) is picked up here.
# Masks written by earlier runs are skipped so they are not mistaken for the input.
search_candidates = sorted(
    p for p in Path(OUTPUT_DIR).glob("*.tif") if not p.name.startswith("mask_")
)

if not search_candidates:
    print("No .tif found in", OUTPUT_DIR, "- place a GeoTIFF there to continue.")
else:
    tif_path = str(search_candidates[0])
    print("Using:", tif_path)

    # NOTE: this reads the whole band into memory; a full 10 m tile may not fit.
    with rasterio.open(tif_path) as src:
        arr = src.read(1, masked=True)  # first band
        profile = src.profile.copy()

    # Basic statistics
    valid = arr.compressed() if np.ma.isMaskedArray(arr) else arr.flatten()
    print("Valid pixels:", valid.size)
    if valid.size > 0:
        print("Min:", float(np.nanmin(valid)))
        print("P25:", float(np.nanpercentile(valid, 25)))
        print("Median:", float(np.nanmedian(valid)))
        print("P75:", float(np.nanpercentile(valid, 75)))
        print("Max:", float(np.nanmax(valid)))

        # Histogram
        plt.figure(figsize=(6,4))
        plt.hist(valid, bins=50)
        plt.title("Histogram of probability values (F10m)")
        plt.xlabel("Probability")
        plt.ylabel("Count")
        plt.show()

        # THRESHOLD is a fraction. Assumption: a raster with values above 1 is on
        # a 0-100 percent scale, so the threshold is rescaled to match it.
        threshold_value = THRESHOLD * 100 if float(np.nanmax(valid)) > 1 else THRESHOLD

        # Build the binary mask; masked (nodata) pixels become 0
        mask_bin = np.ma.filled(arr >= threshold_value, False).astype(np.uint8)
        out_tif = os.path.join(OUTPUT_DIR, f"mask_threshold_{int(THRESHOLD*100)}.tif")

        # Update the metadata for a single-band uint8 raster
        profile.update(dtype=rasterio.uint8, count=1, nodata=0)
        with rasterio.open(out_tif, "w", **profile) as dst:
            dst.write(mask_bin, 1)
        print("Mask saved to:", out_tif)

        # PNG preview
        try:
            plt.figure(figsize=(6,6))
            plt.imshow(mask_bin, interpolation="nearest")
            plt.title(f"Mask (>= {THRESHOLD*100:.0f}%)")
            plt.axis("off")
            out_png = os.path.join(OUTPUT_DIR, f"mask_threshold_{int(THRESHOLD*100)}.png")
            plt.savefig(out_png, dpi=150, bbox_inches="tight")
            print("PNG saved to:", out_png)
            plt.show()
        except Exception as e:
            print("Failed to create PNG:", e)


Using: outputs\tile_00N_000E_F10m.tif


MemoryError: Unable to allocate 18.6 GiB for an array with shape (1, 100000, 100000) and data type uint16
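
The `MemoryError` above follows from the tile size: 100000 × 100000 `uint16` pixels need about 18.6 GiB as a single array. One common way around this is to read and write the raster block by block. The sketch below is one possible approach, not the notebook's original method. `array_gib`, `iter_windows`, and `threshold_in_blocks` are hypothetical helpers, the 4096-pixel block size is arbitrary, the output nodata value of 255 is a choice made here, and `threshold_value` is assumed to be expressed in the raster's own units.

```python
def array_gib(height: int, width: int, bytes_per_pixel: int) -> float:
    """Memory needed to hold one band as a dense array, in GiB."""
    return height * width * bytes_per_pixel / 2**30


def iter_windows(height: int, width: int, block: int = 4096):
    """Yield (row_off, col_off, n_rows, n_cols) tuples that cover the raster."""
    for row_off in range(0, height, block):
        for col_off in range(0, width, block):
            yield (row_off, col_off,
                   min(block, height - row_off), min(block, width - col_off))


def threshold_in_blocks(src_path, dst_path, threshold_value, block=4096):
    """Write a uint8 mask: 1 = at or above threshold, 0 = below, 255 = nodata."""
    # Imported here so the helpers above work without numpy/rasterio installed.
    import numpy as np
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(src_path) as src:
        profile = src.profile.copy()
        profile.update(dtype="uint8", count=1, nodata=255)
        with rasterio.open(dst_path, "w", **profile) as dst:
            for row_off, col_off, n_rows, n_cols in iter_windows(src.height, src.width, block):
                window = Window(col_off, row_off, n_cols, n_rows)
                data = src.read(1, window=window, masked=True)
                out = (data.data >= threshold_value).astype("uint8")
                out[np.ma.getmaskarray(data)] = 255  # keep nodata distinct from 0
                dst.write(out, 1, window=window)


print(f"{array_gib(100000, 100000, 2):.1f} GiB")  # uint16 is 2 bytes per pixel
```

With this layout only one block is held in memory at a time, at the cost of not being able to compute exact percentiles in a single pass.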

In [8]:
# %%
# 5) (Optional) Clip to an AOI (bbox): set the bbox below
# Requires geopandas/shapely and rasterio.

# Coordinates must be in the raster's CRS.
AOI_BBOX = None  # example: (-48.5, -18.0, -48.0, -17.5)  # (minx, miny, maxx, maxy)

if AOI_BBOX and 'gpd' in globals() and 'rasterio' in globals():
    # rasterio.mask is a submodule and must be imported explicitly
    from rasterio.mask import mask as rio_mask

    minx, miny, maxx, maxy = AOI_BBOX
    aoi_geom = box(minx, miny, maxx, maxy)

    # Reopen the last mask file (in sorted name order)
    search_candidates = sorted(Path(OUTPUT_DIR).glob("mask_threshold_*.tif"))
    if search_candidates:
        tif_path = str(search_candidates[-1])
        with rasterio.open(tif_path) as src:
            out_image, out_transform = rio_mask(src, [mapping(aoi_geom)], crop=True, filled=True)
            out_meta = src.meta.copy()
            out_meta.update({
                "height": out_image.shape[1],
                "width": out_image.shape[2],
                "transform": out_transform
            })
        out_clip = os.path.join(OUTPUT_DIR, "mask_clipped.tif")
        with rasterio.open(out_clip, "w", **out_meta) as dst:
            dst.write(out_image)
        print("Clip saved to:", out_clip)
    else:
        print("No mask found to clip.")
else:
    print("AOI_BBOX is not set or dependencies are unavailable (geopandas/shapely/rasterio).")


AOI_BBOX is not set or dependencies are unavailable (geopandas/shapely/rasterio).
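
For a raster this large, it can help to translate the AOI bbox into pixel offsets before reading anything. The following is a minimal sketch for a north-up raster in geographic coordinates. `bbox_to_window` is a hypothetical helper; the 100000-pixel side matches the array shape in the error message, while the 10-degree extent and the tile origin used in the example are assumptions for illustration.

```python
import math


def bbox_to_window(bbox, left, top, tile_deg=10.0, tile_px=100000):
    """Return (col_off, row_off, width, height) covering bbox, clipped to the tile."""
    minx, miny, maxx, maxy = bbox
    px_per_deg = tile_px / tile_deg
    # Columns grow eastward from `left`; rows grow southward from `top`.
    col0 = max(0, math.floor((minx - left) * px_per_deg))
    col1 = min(tile_px, math.ceil((maxx - left) * px_per_deg))
    row0 = max(0, math.floor((top - maxy) * px_per_deg))
    row1 = min(tile_px, math.ceil((top - miny) * px_per_deg))
    if col1 <= col0 or row1 <= row0:
        raise ValueError("bbox does not overlap the tile")
    return col0, row0, col1 - col0, row1 - row0


# Example bbox from the cell above, in a tile assumed to span lon -50..-40, lat -20..-10.
print(bbox_to_window((-48.5, -18.0, -48.0, -17.5), left=-50.0, top=-10.0))
```

The four numbers map directly onto `rasterio.windows.Window(col_off, row_off, width, height)`, which can be passed to `src.read(1, window=...)` so that only the AOI is loaded.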
