<a href="https://colab.research.google.com/github/pedroheyerdahl/omdena_ocn/blob/main/get_gee_data_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Installing packages

# Downloading Google Earth Engine data to Google Drive

This notebook is a quick start for anyone who wants to download data from Google Earth Engine (GEE). It uses the code kindly provided by @Matteo Jucker in https://gist.github.com/ciskoh/fc0913bc9fed068ce42d26e07a87ec60

First we need to install geetools, which Matteo's code depends on and which does not come preinstalled on Colab.

In [1]:
!pip install geetools

Collecting geetools
  Downloading https://files.pythonhosted.org/packages/a9/02/91fe9e7f1ec5378ff38fc985564fbc476f41624841bd216ec1002d73b6dd/geetools-0.6.7.tar.gz (69kB)
     |████████████████████████████████| 71kB 3.5MB/s 
Collecting pyshp
  Downloading https://files.pythonhosted.org/packages/38/85/fbf87e7aa55103e0d06af756bdbc15cf821fa580414c23142d60a35d4f85/pyshp-2.1.3.tar.gz (219kB)
     |████████████████████████████████| 225kB 9.2MB/s 
Building wheels for collected packages: geetools, pyshp
  Building wheel for geetools (setup.py) ... done
  Create

In [2]:
from pathlib import Path
import os
import json
import ee
from geetools import cloud_mask
import requests
from datetime import datetime as dt, timedelta
import re
import numpy as np
from google.colab import drive

The next cell connects the notebook to your Earth Engine account.

If you don't have an account yet, first create one at https://signup.earthengine.google.com/

In [3]:
# Trigger the authentication flow.
ee.Authenticate()

# Initialize the library.
ee.Initialize()

To authorize access needed by Earth Engine, open the following URL in a web browser and follow the instructions. If the web browser does not start automatically, please manually browse the URL below.

    https://accounts.google.com/o/oauth2/auth?client_id=517222506229-vsmmajv00ul0bs7p89v5m89qs8eb9359.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fearthengine+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.full_control&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&code_challenge=FPbbpOhuGSDeK3f4vnIs2egVKVMqc6Y0D1gMHTiqpsg&code_challenge_method=S256

The authorization workflow will generate a code, which you should paste in the box below. 
Enter verification code: 4/1AY0e-g6o3Q_XJ0hPlgn9KKoFxEyGNErJqB-M0v8y4e6P38sjiqhtaCcDKQk

Successfully saved authorization token.


Now that we are linked to GEE, we also need to link our Google Drive to the notebook.

In [4]:
drive.mount("/content/drive")

Mounted at /content/drive


This is the code provided by Matteo. Given a GeoJSON polygon, it lets you download Sentinel images, NDVI, and the global land cover map for small areas (if the polygon is too big, the request will fail).

In [7]:
"""Functions to retrieve data from the Google Earth Engine API.
The main function to use is get_gee_data(), which has 3 modes:
1. sentinel_raw: downloads a cloud-free Sentinel image between the given dates
   and for the given bands.
2. global_land_cover: downloads the Copernicus global land cover map at 100 m resolution.
3. ndvi: downloads the monthly NDVI average over the given period (still from Sentinel, so only after 2017).
You need to be registered with Google Earth Engine to use it (free and quick).
Do not choose a big area: above 10 sq km the function will fail. I will update it in the future for bigger areas.
"""

def mask_clouds(ee_img):
    mask_all = cloud_mask.sentinel2()
    masked_ee_img = mask_all(ee_img)
    return masked_ee_img

def aggregate_ndvi(date_list, ndvi_coll):
    def simple_aggregate(end_date):
        end_date = ee.Date(end_date)
        begin_date = ee.Date(end_date).advance(-1, "month")
        filt_coll = ndvi_coll.filterDate(begin_date, end_date)
        return filt_coll.mean().set({"system:time_start": begin_date.millis(), "system:time_end": end_date.millis()})

    merged_ndvi_list = date_list.map(simple_aggregate)
    return merged_ndvi_list

def get_gee_data(aoi, date_range=["2020-05-01", "2020-07-01"], mode="sentinel_raw",
                 band_names=["B2", "B3", "B4", "B8"]):
    """Build a download link (zip file) for an image from Google Earth Engine.
    Parameters
    ----------
    aoi : area of interest as list of [xcoord, ycoord] points
    date_range : list of [start_date, end_date] in 'YYYY-MM-DD' format
    mode : 'sentinel_raw' for satellite images, 'global_land_cover' for Copernicus GLC maps, 'ndvi' for vegetation time series
    band_names : only for mode == 'sentinel_raw'. List of bands to keep from the original image; defaults to ["B2", "B3", "B4", "B8"]
    Returns
    -------
    link : download URL for the image; pass it to download_data_from_link() to save the zip file
    """
    # Initialize the Earth Engine module.
    try:
        ee.Initialize()
    except ee.ee_exception.EEException:
        print("MISSING credentials! \n You have to register with Google Earth Engine beforehand")
        ee.Authenticate()
        # Initialize again now that credentials are available
        ee.Initialize()

    # Area of interest as gee object
    aoi_obj = ee.Geometry.Polygon([aoi])

    print(f"Downloading {mode} image for coordinates {aoi}")
    # date_range as gee object
    start_date = ee.Date(date_range[0])
    end_date = ee.Date(date_range[1])
    if mode == "sentinel_raw":
        # get sentinel collection
        sent2 = ee.ImageCollection("COPERNICUS/S2_SR")
        sent_coll = sent2.filterBounds(aoi_obj).filterDate(start_date, end_date)
        # apply cloud removal
        # map function over collection
        cloud_free_coll = sent_coll.map(mask_clouds)
        # merge image using mean
        fin_img = cloud_free_coll.mean().select(band_names)

    # download global land cover
    if mode == "global_land_cover":
        glc = ee.ImageCollection("COPERNICUS/Landcover/100m/Proba-V-C3/Global")
        fin_img = ee.Image(glc.toList(10).reverse().get(0)).clip(aoi_obj)

    # download ndvi time series
    if mode == "ndvi":
        end_date = dt.now()
        start_date = end_date - timedelta(days=365)
        start_date_str = start_date.strftime("%Y-%m-%d")
        end_date_str = end_date.strftime("%Y-%m-%d")
        print(start_date_str, end_date_str)

        # get ndvi collection
        sent2 = ee.ImageCollection("COPERNICUS/S2_SR")
        sent_coll = sent2.filterBounds(aoi_obj).filterDate(start_date, end_date)
        sent_coll = sent_coll.filterMetadata("CLOUDY_PIXEL_PERCENTAGE", "less_than", 30)
        cloud_free_coll = sent_coll.map(mask_clouds)
        ndvi_coll = cloud_free_coll.map(
                            lambda img: img.normalizedDifference(["B8", "B4"])\
                                   .clip(aoi_obj)\
                                   .set("system:time_start", img.get("system:time_start"))
                                   )
        # get list of dates 12 month
        start_date = ee.Date(ee.List(ndvi_coll.get("date_range")).get(0))
        end_date = ee.Date(ee.List(ndvi_coll.get("date_range")).get(1))
        diff = end_date.difference(start_date, "month").round()
        date_seq = ee.List.sequence(1, diff, 1).map(lambda delay: start_date.advance(delay, "month") )
        print(date_seq.getInfo())
        # aggregate monthly ndvi

        monthly_ndvi_list = aggregate_ndvi(date_seq, ndvi_coll)

        fin_img = ee.ImageCollection.fromImages(monthly_ndvi_list).toBands()
        print(fin_img.getInfo())

    # build the download link for the final image
    link = fin_img.getDownloadURL({
        'scale': 10,
        'crs': 'EPSG:4326',
        'fileFormat': 'GeoTIFF',
        'region': aoi_obj})
    return link

def download_data_from_link(link, area_name, mode, data_parent_path=None):
    if not data_parent_path:
        data_parent_path = Path("..", "..", "data", "raw")

    response = requests.get(link)
    if response.status_code != 200:
        print(f"problem retrieving file from the link:\n {link}!")
        return None
    # set output filename

    out_file = data_parent_path / str(area_name + "_" + mode + ".zip")
    with open(out_file, "wb") as f:
        f.write(response.content)
    del response
    if os.path.exists(out_file):
        print(f"COMPLETED! image downloaded as zip file in \n {out_file}\n")
    return None


def download_dataset(aoi_path, data_parent_path=None, get_sent2=True, get_glc=True, get_ndvi=True):
    """ Retrieves input and target data from gee
    to train ml model"""
    with open(aoi_path) as f:
        coords = json.load(f)
    coords_list = coords['features']
    print(f" found {len(coords_list)} area(s) of interest")
    # cycle through areas to download all of them
    timestamp = re.sub("[^0-9]", "", dt.now().isoformat())
    for n, c_dict in enumerate(coords_list):
        c = c_dict["geometry"]["coordinates"][0]
        print(f" \n downloading data for area {n}")

        # if needed for area identification add this "_".join([str(abs(round(a[0]*10e2))) + str(abs(round(a[1]*10e2))) for a in c])
        area_name = timestamp + "_" + str(n)
        if get_sent2:
            link = get_gee_data(aoi=c, mode="sentinel_raw")
            download_data_from_link(link, area_name, mode="sentinel_raw", data_parent_path=data_parent_path)
        if get_glc:
            link2 = get_gee_data(aoi=c, mode="global_land_cover")
            download_data_from_link(link2, area_name,mode="global_land_cover", data_parent_path=data_parent_path)
        if get_ndvi:
            link3 = get_gee_data(aoi=c, mode="ndvi")
            download_data_from_link(link3, area_name, mode="ndvi", data_parent_path=data_parent_path)
    return timestamp
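
In the `ndvi` mode above, `normalizedDifference(["B8", "B4"])` computes (B8 - B4) / (B8 + B4) for each pixel, where B8 is the near-infrared band and B4 is the red band. The plain-Python sketch below illustrates that arithmetic only. The helper name `normalized_difference` and the reflectance values are made up for illustration and are not part of Matteo's code; returning `None` when both bands are zero is a choice of this sketch.

```python
def normalized_difference(nir, red):
    """Return (nir - red) / (nir + red), or None when the sum is zero."""
    total = nir + red
    if total == 0:
        # The ratio is undefined when both bands are zero
        return None
    return (nir - red) / total


# Hypothetical (B8, B4) reflectance pairs for three pixels
pixels = [(0.5, 0.1), (0.3, 0.3), (0.0, 0.0)]
for b8, b4 in pixels:
    print(b8, b4, normalized_difference(b8, b4))
```

Values close to 1 suggest dense vegetation, while values near 0 or below suggest bare soil or water.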

Now all we need to do is define an area of interest (AOI) as a list of coordinates. You can draw one in QGIS and save it as GeoJSON. If you're not used to GIS, you can also draw your AOI on http://geojson.io/ and then copy and paste the coordinates from the GeoJSON into the cell below. Be careful not to draw an area that is too big!

In [8]:
aoi = [ [ -9.044288326125624, 51.877416845948957 ], [ -9.064332525991922, 51.877302919413232 ], [ -9.064211073895905, 51.855934674768655 ], [ -9.044166874029592, 51.85604860130438 ], [ -9.044288326125624, 51.877416845948957 ] ]
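
If you prefer not to copy coordinates by hand, the sketch below shows one way to pull the outer ring out of a GeoJSON string and get a rough size estimate before requesting a download. The helper names `outer_ring` and `approx_area_km2`, the sample GeoJSON (the coordinates above, rounded), and the flat-earth approximation are assumptions made for illustration; the estimate is only meant as a sanity check against the size limit mentioned in Matteo's code, not an exact geodesic area.

```python
import json
import math

# Hypothetical GeoJSON text, e.g. copied from geojson.io (coordinates rounded)
geojson_text = """
{"type": "FeatureCollection", "features": [{"type": "Feature", "properties": {},
  "geometry": {"type": "Polygon", "coordinates": [[
    [-9.0443, 51.8774], [-9.0643, 51.8773], [-9.0642, 51.8559],
    [-9.0442, 51.8560], [-9.0443, 51.8774]]]}}]}
"""


def outer_ring(geojson_str, feature_index=0):
    """Return the outer ring of a Polygon feature as a list of [lon, lat] pairs."""
    feature = json.loads(geojson_str)["features"][feature_index]
    return feature["geometry"]["coordinates"][0]


def approx_area_km2(ring):
    """Rough polygon area in km^2: shoelace formula on a local flat projection."""
    earth_radius_km = 6371.0
    mean_lat = math.radians(sum(lat for _, lat in ring) / len(ring))
    # Convert degrees to km offsets; longitude spacing shrinks with cos(latitude)
    pts = [(math.radians(lon) * earth_radius_km * math.cos(mean_lat),
            math.radians(lat) * earth_radius_km) for lon, lat in ring]
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(twice_area) / 2


aoi_ring = outer_ring(geojson_text)
print(f"{len(aoi_ring)} points, roughly {approx_area_km2(aoi_ring):.2f} km^2")
```

The list returned by `outer_ring` has the same shape as the `aoi` variable, so it could be passed to `get_gee_data` directly.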

Passing the AOI to get_gee_data() returns a download link.

In [9]:
link = get_gee_data(aoi)

Downloading sentinel_raw image for coordinates [[-9.044288326125624, 51.87741684594896], [-9.064332525991922, 51.87730291941323], [-9.064211073895905, 51.855934674768655], [-9.044166874029592, 51.85604860130438], [-9.044288326125624, 51.87741684594896]]


The image will be saved to the Drive folder defined in the next cell.

In [10]:
parent_path = Path('/content/drive/MyDrive/omdena')
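
Note that `download_data_from_link` opens the output file directly, so the target folder has to exist before the download is written. A minimal sketch of creating it first is shown here; the helper name `ensure_folder` is hypothetical, and the example uses a temporary directory so that it runs outside Colab as well.

```python
from pathlib import Path
import tempfile


def ensure_folder(folder):
    """Create the folder (and any missing parents) if needed and return it as a Path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


# Demonstrated on a temporary directory; in Colab you would pass the Drive path instead
with tempfile.TemporaryDirectory() as tmp:
    out_dir = ensure_folder(Path(tmp) / "omdena")
    print(out_dir.is_dir())
```

In this notebook the equivalent call would be `ensure_folder(parent_path)`.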

Passing the link and this folder to the downloader function saves the zip file:

In [12]:
download_data_from_link(link, 'ireland_test', 'sentinel_raw', parent_path)

COMPLETED! image downloaded as zip file in 
 /content/drive/MyDrive/omdena/ireland_test_sentinel_raw.zip



And now you have downloaded your first batch of satellite images from Google Earth Engine. Congratulations!
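
The result is a zip archive, so a natural next step is to unpack it. The sketch below uses the standard `zipfile` module. The helper name `extract_zip` is hypothetical, and the archive is a small stand-in built on the fly (with a made-up file name) so the example runs without a real download.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir and return the archived names."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


with tempfile.TemporaryDirectory() as tmp:
    # Build a stand-in archive; the inner file name is an assumption for illustration
    demo_zip = Path(tmp) / "ireland_test_sentinel_raw.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("demo_band.tif", b"placeholder bytes")
    print(extract_zip(demo_zip, Path(tmp) / "unzipped"))
```

With the real download, you would point `extract_zip` at the zip file path printed by `download_data_from_link`.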