# Reducing an image collection with a feature collection

This notebook shows how to export time series of image data (including image properties) for an arbitrary collection of features.

## Sample fields

The [sample_fields_collection.geojson](../data/sample_fields_collection.geojson) file contains a sample of about 200 center-pivot fields:

In [None]:
import geopandas as gpd

fields = gpd.read_file('../data/sample_fields_collection.geojson')
fields.explore()

## The `reducers` module

The `eepredefined.reducers` module includes a robust `ee.ImageCollection` reducer that you can use to retrieve:

- Only the specified bands
- Selected image properties
- Selected feature properties
- A useful `date` property (based on the image's `system:time_start` property), formatted with a given `date_format` (default `YYYY-MM-dd'T'HH:mm:ss`, e.g. 2015-04-24T10:11:00)
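
To make the `date` property concrete: `system:time_start` is stored as milliseconds since the Unix epoch (UTC). The following is a plain-Python sketch of the conversion, not geeet's implementation. The function name `format_time_start` is hypothetical, and the `strftime` pattern is assumed to be equivalent to the default `date_format`.

```python
from datetime import datetime, timezone


def format_time_start(time_start_ms: int) -> str:
    """Format a `system:time_start` value (milliseconds since epoch, UTC)."""
    moment = datetime.fromtimestamp(time_start_ms / 1000, tz=timezone.utc)
    # Assumed Python equivalent of the pattern YYYY-MM-dd'T'HH:mm:ss
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


# 1429870260000 ms is the timestamp of the example date in the list above.
print(format_time_start(1429870260000))  # 2015-04-24T10:11:00
```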


## Simple Landsat example

Let's start with a simple example that uses the `landsat` module to retrieve the mean `albedo`, mean `NDVI`, and cloud cover of each feature for one year of data.

### Step 1: landsat.collection

In [None]:
import ee
import geeet
from geeet.eepredefined import landsat, reducers, parsers
ee.Initialize()

region = parsers.feature_collection(fields) #  👀geeet also provides a robust parser -> ee.FeatureCollection
coll = landsat.collection(
    date_start="2015-01-01",
    date_end="2016-01-01",
    max_cc=5,
    region=region
)

### Step 2: reducers.image_collection

If you are interested in the average band value over each feature, you just need to include the band name in the `mean_bands` list parameter. If you are interested in the total band value (i.e. sum of all pixels), then you would use the `sum_bands` list parameter. 
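
As a rough illustration of the difference between the two parameters, the sketch below reduces a handful of made-up pixel values in plain Python. It is not how the reducer is implemented, and it ignores any fractional pixel weighting that Earth Engine may apply along feature edges.

```python
from statistics import fmean

# Hypothetical values of one band for the pixels that fall inside one feature.
pixels_in_feature = [0.25, 0.5, 0.75, 0.5]

mean_value = fmean(pixels_in_feature)  # analogous to listing the band in `mean_bands`
sum_value = sum(pixels_in_feature)     # analogous to listing the band in `sum_bands`

print(mean_value, sum_value)  # 0.5 2.0
```

A mean suits intensive quantities such as NDVI or albedo, while a sum suits quantities that accumulate over area.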

We will also include some image properties in the data, e.g. `SPACECRAFT_ID` (to know which Landsat satellite was used for each data row), `WRS_PATH`, and `WRS_ROW`. We specify this using the `img_properties` parameter.

Similarly, if your original feature collection includes properties, you can select them with `feature_properties`. In this example, we will keep the `uid` property, which stands for unique ID. 



Finally, if you need to be specific about the [scale of analysis](https://developers.google.com/earth-engine/guides/scale#scale-of-analysis) (which in general you *should*), you can specify it using the `reducer_kwargs` parameter. Here we will use the [EPSG:32638](https://epsg.io/32638) [coordinate reference system (crs)](https://developers.google.com/earth-engine/guides/scale#scale-of-analysis) and a scale of 30 m.

>  ⚠️ When using Landsat data, it is always better to use crs and crsTransform instead of crs and scale. Read why [here](https://developers.google.com/earth-engine/guides/exporting_images#setting_scale). However, to keep this example simple we will stick to crs + scale. 
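
The difference between `scale` and `crsTransform` is easier to see with numbers. A `crsTransform` is a six-element affine transform, `[xScale, xShearing, xTranslation, yShearing, yScale, yTranslation]`, so it fixes both the pixel size and the grid origin. The sketch below is plain Python, not Earth Engine code. The helper `pixel_corner` and the transform values are hypothetical, and the scale-only grid is assumed to be anchored at the projection origin.

```python
from typing import Sequence, Tuple


def pixel_corner(col: int, row: int, crs_transform: Sequence[float]) -> Tuple[float, float]:
    """Map a pixel (col, row) index to the map coordinates of its upper-left corner."""
    x_scale, x_shear, x_origin, y_shear, y_scale, y_origin = crs_transform
    x = x_scale * col + x_shear * row + x_origin
    y = y_shear * col + y_scale * row + y_origin
    return x, y


# Hypothetical Landsat-like grid: 30 m pixels, origin not a multiple of 30 m.
landsat_like = [30, 0, 499785, 0, -30, 2500215]
# Grid implied by crs + scale alone (assumption: anchored at the origin).
scale_only = [30, 0, 0, 0, -30, 0]

print(pixel_corner(1, 1, landsat_like))  # (499815, 2500185)
# Offset between the two grids along x, in metres.
print(landsat_like[2] % scale_only[0])   # 15
```

With these numbers the two grids are shifted by half a pixel, which is the kind of misalignment that forces resampling. If `reducer_kwargs` is forwarded to the underlying Earth Engine reducer call (an assumption about geeet), passing `crs` together with `crsTransform` instead of `scale` would avoid it.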

In [None]:
region = parsers.feature_collection(fields)
data = reducers.image_collection(
    feature_collection=region,
    img_collection=coll,
    mean_bands = ['albedo', 'NDVI', 'cloud_cover'],
    img_properties=['SPACECRAFT_ID', 'WRS_PATH', 'WRS_ROW'],
    feature_properties=['uid'],
    reducer_kwargs=dict(crs="EPSG:32638", scale=30)
)

### Step 3: exporting the data

You can choose to export the data either as a vector file (GeoJSON or shapefile) or as a CSV.

When you export the data you can also choose which columns to export; otherwise all columns are included, the geometry among them. Exporting the geometry is redundant (you already have the input feature collection), so it is better to skip it by choosing CSV and leaving the geometry out of the selected columns. You do, however, need a unique ID for each feature so that you can relate the exported data to your input feature collection (in this example, `uid`).

To export the data, we use any of the [ee.batch.Export.table.to* functions](https://developers.google.com/earth-engine/apidocs/export-table-todrive#colab-python), each of which returns a [task](https://developers.google.com/earth-engine/guides/processing_environments#batch_environment) object. Here let's use `toDrive`.

In [None]:
task = ee.batch.Export.table.toDrive(
    collection=data,
    description='ee landsat fc reducer demo',
    fileNamePrefix='ee_landsat_fc_reducer_demo',
    fileFormat='CSV',
    selectors=['uid', 'date', 'SPACECRAFT_ID', 'WRS_PATH', 'WRS_ROW', 'albedo', 'NDVI', 'cloud_cover']
    # 👀 include `date`, which the reducer adds by default as a feature property,
    # but DON'T include it in the `feature_properties` parameter
)
task
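
One way to keep the `selectors` list consistent with the reducer call is to build it from the same lists. This is only a sketch of that idea in plain Python; the variable names mirror the parameters used above, and the column order is a matter of taste.

```python
feature_properties = ['uid']
img_properties = ['SPACECRAFT_ID', 'WRS_PATH', 'WRS_ROW']
mean_bands = ['albedo', 'NDVI', 'cloud_cover']

# `date` is added by the reducer itself, so it appears only here
# and not in `feature_properties`.
selectors = feature_properties + ['date'] + img_properties + mean_bands

# Guard against accidentally listing a column twice.
assert len(selectors) == len(set(selectors))
print(selectors)
```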

You need to use `task.start()` to submit it to the [batch processing environment](https://developers.google.com/earth-engine/guides/processing_environments#batch_environment).

In [None]:
task.start()

You can monitor the progress of the task in the Task tab on the [right panel in the code editor](https://developers.google.com/earth-engine/guides/playground):

![image](https://developers.google.com/static/earth-engine/images/Code_editor_diagram.png)

or directly in [https://code.earthengine.google.com/tasks](https://code.earthengine.google.com/tasks).

> Make sure you select the same Project where you submitted the Task!

If you use VS Code, give the [Earth Engine Task Manager](https://marketplace.visualstudio.com/items/?itemName=gee-community.eetasks) a try. You can view the progress of tasks directly in VS Code (see more information in the [GitHub repository](https://github.com/gee-community/eetasks)).

For reference, the above task took 34 seconds of runtime and 5 [EECU-seconds (click here for more information)](https://developers.google.com/earth-engine/guides/computation_overview#eecus) of processing power. In total, 65 Landsat images were required for this process.

## Result

Now we will read the exported result, which is [included here](../data/ee%20landsat%20fc%20reducer%20demo.csv) for comparison.


In [None]:
import pandas as pd

result = pd.read_csv("../data/ee_landsat_fc_reducer_demo.csv", 
                     index_col='uid',
                     parse_dates=['date'])
result

Let's plot the data for a specific field, e.g. the one with `uid=22162`:

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(10,3))
# NDVI (all)
result.loc[22162].set_index('date').NDVI.plot(ax=ax, label='NDVI')
# NDVI (only Landsat 7)
(result.loc[22162].set_index('date')
 .query("SPACECRAFT_ID=='LANDSAT_7'").NDVI
 .plot(ax=ax, label='NDVI (L7)', marker='X', linestyle=''))
# NDVI (only Landsat 8)
(result.loc[22162].set_index('date')
 .query("SPACECRAFT_ID=='LANDSAT_8'").NDVI
 .plot(ax=ax, label='NDVI (L8)', marker='X', linestyle=''))
ax.set_xlabel("")
ax.legend()
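
A note on selecting a field: with a repeated index, `result.loc[uid]` returns a DataFrame only when the `uid` has several rows, and a Series when it has exactly one. The sketch below uses a tiny synthetic table (the values are made up, and only a subset of the exported columns is reproduced) to show a selection that always yields a DataFrame, grouped by satellite.

```python
import pandas as pd

# Synthetic stand-in for the exported CSV; all values are hypothetical.
demo = pd.DataFrame(
    {
        "uid": [1, 1, 1, 2],
        "date": pd.to_datetime(["2015-01-05", "2015-01-13", "2015-01-21", "2015-01-05"]),
        "SPACECRAFT_ID": ["LANDSAT_8", "LANDSAT_7", "LANDSAT_8", "LANDSAT_8"],
        "NDVI": [0.25, 0.5, 0.75, 0.5],
    }
).set_index("uid")

# Double brackets keep the result a DataFrame even when a uid has a single row.
field = demo.loc[[1]].set_index("date")

# Number of observations per satellite for this field.
counts = field.groupby("SPACECRAFT_ID").size()
print(counts)
```

With the real `result` table, iterating over `field.groupby("SPACECRAFT_ID")` could replace the two near-identical per-satellite plotting blocks.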