In [1]:
from rich import print
import logging

logging.basicConfig(level=logging.INFO)

# MODIS / AρρEEARS

Retrieve MODIS data via [AρρEEARS](https://appeears.earthdatacloud.nasa.gov/).

You can also explore and download data through the web portal, which is worth
a look. Ongoing springtime requests can be monitored there as well. However,
you cannot (currently) download data through the web interface (without
springtime) and then load it with springtime.

To retrieve data you need a NASA Earthdata account. You can create one
[here](https://urs.earthdata.nasa.gov/users/new) and save your credentials in
`~/.config/springtime/appeears.json` as `{"username": "<your username>",
"password": "<your password>"}`.
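
As a minimal sketch, the credentials file described above could be written from Python as follows. The helper name `write_credentials` is hypothetical (it is not part of springtime); only the path and the two JSON keys come from the text above.

```python
import json
from pathlib import Path

# Location and keys as described above; the helper itself is hypothetical.
DEFAULT_PATH = Path.home() / ".config" / "springtime" / "appeears.json"


def write_credentials(username, password, path=DEFAULT_PATH):
    """Write Earthdata credentials as JSON and return the file path."""
    path = Path(path)
    # Create ~/.config/springtime if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"username": username, "password": password}))
    # Restrict access to the current user, since the file holds a password.
    path.chmod(0o600)
    return path


# Example (not executed here):
# write_credentials("<your username>", "<your password>")
```

The password is stored in plain text, so restricting the file permissions is a sensible precaution.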

Here we will download data from the MODIS Land Cover Dynamics dataset and LAI/FPAR products. For more information, see https://lpdaac.usgs.gov/products/mcd12q2v061/ and https://lpdaac.usgs.gov/products/mod15a2hv061/.


## Explore AppEEARS data

Before downloading anything, use the `products` and `layers` utility functions to figure out what is available.


In [2]:
from springtime.datasets.appeears import products

products()

[ProductInfo(Product='GPW_DataQualityInd', Platform='GPW', Description='Quality of Input Data for Population Count and Density Grids', Resolution='1000m', Version='411', ProductAndVersion='GPW_DataQualityInd.411', DOI='10.7927/H42Z13KG', Available=True, RasterType='Tile', TemporalGranularity='Quinquennial', DocLink='https://doi.org/10.7927/H42Z13KG', Source='SEDAC', TemporalExtentStart='2000-01-01', TemporalExtentEnd='2020-12-31', Deleted=False),
 ProductInfo(Product='GPW_UN_Adj_PopCount', Platform='GPW', Description='UN-adjusted Population Count', Resolution='1000m', Version='411', ProductAndVersion='GPW_UN_Adj_PopCount.411', DOI='10.7927/H4PN93PB', Available=True, RasterType='Tile', TemporalGranularity='Quinquennial', DocLink='https://doi.org/10.7927/H4PN93PB', Source='SEDAC', TemporalExtentStart='2000-01-01', TemporalExtentEnd='2020-12-31', Deleted=False),
 ProductInfo(Product='GPW_UN_Adj_PopDensity', Platform='GPW', Description='UN-adjusted Population Density', Resolution='1000m', 

We are interested in product MCD12Q2 on "Land Cover Dynamics". The product has
several layers that we can retrieve.


In [3]:
from springtime.datasets.appeears import layers

layers("MCD12Q2.061")

{'Dormancy': LayerInfo(AddOffset='', Available=True, DataType='int16', Description='Date when EVI2 last crossed 15% of the segment EVI2 amplitude', Dimensions=['time', 'YDim', 'XDim', 'Num_Modes'], FillValue=32767, IsQA=False, Layer='Dormancy', OrigDataType='int16', OrigValidMax=32766, OrigValidMin=11138, QualityLayers="['QA_Detailed_0','QA_Detailed_1','QA_Overall_0','QA_Overall_1']", QualityProductAndVersion='MCD12Q2.061', ScaleFactor=1.0, Units='Day', ValidMax=32766, ValidMin=11138, XSize=2400, YSize=2400),
 'EVI_Amplitude': LayerInfo(AddOffset='', Available=True, DataType='int16', Description='Segment maximum - minimum EVI2', Dimensions=['time', 'YDim', 'XDim', 'Num_Modes'], FillValue=32767, IsQA=False, Layer='EVI_Amplitude', OrigDataType='int16', OrigValidMax=10000, OrigValidMin=0, QualityLayers="['QA_Detailed_0','QA_Detailed_1','QA_Overall_0','QA_Overall_1']", QualityProductAndVersion='MCD12Q2.061', ScaleFactor=0.0001, Units='EVI2', ValidMax=10000, ValidMin=0, XSize=2400, YSize=24

From the [metadata](https://lpdaac.usgs.gov/products/mcd12q2v061/) we learn that these variables are reported for up to 2 growing seasons per year, depending on vegetation type.

## Retrieving point data

There are two main ways to download AρρEEARS data: as points or as an area. Springtime's behaviour depends on which of the `points` and `area` settings are given:

- Points given, area not given: use the point download of AρρEEARS
- Points not given, area given: use the area download of AρρEEARS
- Points and area given: download an area but extract points during load
- Neither points nor area given: invalid.
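
The four cases above can be summarized in a small decision function. This is only an illustration of the rules as listed; the function `resolve_mode` and its return strings are hypothetical and do not reflect springtime's internal code.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def resolve_mode(points: Optional[Sequence[Point]], area: Optional[dict]) -> str:
    """Return which download strategy applies (hypothetical helper)."""
    if points and area:
        # Download the whole area, then extract the points while loading.
        return "area download, points extracted on load"
    if points:
        return "point download"
    if area:
        return "area download"
    # Neither setting given: there is nothing to download.
    raise ValueError("Provide at least one of `points` or `area`.")


print(resolve_mode([(10.691330, 48.085350)], None))
print(resolve_mode(None, {"name": "eastfrankfurt", "bbox": [9.0, 49.0, 10.0, 50.0]}))
```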


In [4]:
from springtime.datasets import Appeears

dataset = Appeears(
    years=[2009, 2011],
    product="MCD12Q2",
    version="061",
    layers=["Greenup", "Dormancy"],
    points=[(10.691330, 48.085350), (8.892998, 47.097801)],
    infer_date_offset=True,
)

print(dataset)

In [5]:
dataset.raw_load()

INFO:springtime.datasets.appeears:Found /home/peter/.cache/springtime/appeears/MCD12Q2-2009-2011-Dormancy-Greenup-50f0093a3994764340e8bb6f70797f854dd3a4eb-MCD12Q2-061-results.csv


Unnamed: 0,Latitude,Longitude,Date,MODIS_Tile,MCD12Q2_061_Line_Y_500m,MCD12Q2_061_Sample_X_500m,MCD12Q2_061_Dormancy_0,MCD12Q2_061_Dormancy_1,MCD12Q2_061_Greenup_0,MCD12Q2_061_Greenup_1,...,MCD12Q2_061_QA_Detailed_1_Dormancy,MCD12Q2_061_QA_Detailed_1_Dormancy_Description,MCD12Q2_061_QA_Detailed_1_Unused,MCD12Q2_061_QA_Detailed_1_Unused_Description,MCD12Q2_061_QA_Overall_0_bitmask,MCD12Q2_061_QA_Overall_0_Name,MCD12Q2_061_QA_Overall_0_Name_Description,MCD12Q2_061_QA_Overall_1_bitmask,MCD12Q2_061_QA_Overall_1_Name,MCD12Q2_061_QA_Overall_1_Name_Description
0,47.097801,8.892998,2009-01-01,h18v04,696.0,1452.0,14572.0,32767.0,14353.0,32767.0,...,0b11,Poor,0b01,,0b0000000000000000,0b0000000000000000,Best,0b0111111111111111,0b0111111111111111,
1,47.097801,8.892998,2010-01-01,h18v04,696.0,1452.0,14899.0,32767.0,14724.0,32767.0,...,0b11,Poor,0b01,,0b0000000000000001,0b0000000000000001,Good,0b0111111111111111,0b0111111111111111,
2,47.097801,8.892998,2011-01-01,h18v04,696.0,1452.0,15286.0,32767.0,15075.0,32767.0,...,0b11,Poor,0b01,,0b0000000000000001,0b0000000000000001,Good,0b0111111111111111,0b0111111111111111,
3,48.08535,10.69133,2009-01-01,h18v04,459.0,1714.0,14571.0,32767.0,14326.0,32767.0,...,0b11,Poor,0b01,,0b0000000000000000,0b0000000000000000,Best,0b0111111111111111,0b0111111111111111,
4,48.08535,10.69133,2010-01-01,h18v04,459.0,1714.0,14933.0,32767.0,14665.0,32767.0,...,0b11,Poor,0b01,,0b0000000000000000,0b0000000000000000,Best,0b0111111111111111,0b0111111111111111,
5,48.08535,10.69133,2011-01-01,h18v04,459.0,1714.0,15297.0,32767.0,15046.0,32767.0,...,0b11,Poor,0b01,,0b0000000000000000,0b0000000000000000,Best,0b0111111111111111,0b0111111111111111,


Notice that many of the columns are not necessarily of interest. The geometry is available. As anticipated, there are two greenup cycles; however, the second one only contains the value `32767`, which we recognize as the fill value (`FillValue`) specified in the metadata (the output of `layers()`). The units ('Day') are a bit cryptic, but hypothesizing that they represent days since a certain offset, we can deduce that the offset is probably the "default" offset, i.e. 1970-01-01 00:00.


In [6]:
# guess: the values of greenup represent days since default offset??
from pandas import Timestamp

print(Timestamp(0, unit="D"))
print(Timestamp(14353.0, unit="D"))
print(Timestamp(14353.0, unit="D") - Timestamp("20090101"))

### Harmonization

So, in order to get the day of year (DOY), we need to subtract the number of days between 1970-01-01 and the start of the year in question. Note that this may be different for other products/layers, such as LAI.

The `load_points()` method, as opposed to the raw load, does the following:

- Remove unnecessary columns (filter the requested layers), and rename them to something more manageable.
- Convert fill-value to NaN and drop columns with only fill value
- If `infer_date_offset` is `True`, reconstruct the DOY and convert datetime index to year; otherwise split time in year and DOY and pivot the DOY column.
- Extract geometry and convert to geopandas
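
To make the fill-value and date-offset steps concrete, here is a rough sketch on a tiny hand-made table that copies three `Greenup` rows from the raw output above. The function `harmonize` is hypothetical and is not springtime's implementation; it assumes that `Date` always falls on 1 January, as in the raw table, and it skips the column renaming and geometry steps.

```python
import numpy as np
import pandas as pd

FILL_VALUE = 32767  # fill value reported by layers() for these layers

# Three rows copied from the raw point output (first location only).
raw = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2009-01-01", "2010-01-01", "2011-01-01"]),
        "Greenup_0": [14353.0, 14724.0, 15075.0],
        "Greenup_1": [32767.0, 32767.0, 32767.0],
    }
)


def harmonize(df, fill_value=FILL_VALUE):
    """Hypothetical sketch: mask fill values and convert to days within the year."""
    out = df.copy()
    layer_cols = [c for c in out.columns if c != "Date"]
    # Replace the fill value by NaN and drop columns that hold nothing else.
    out[layer_cols] = out[layer_cols].replace(fill_value, np.nan)
    out = out.dropna(axis="columns", how="all")
    # Days between 1970-01-01 and the start of each year (Date is 1 January).
    year_start_days = (out["Date"] - pd.Timestamp(0, unit="D")).dt.days
    for col in out.columns.drop("Date"):
        out[col] = out[col] - year_start_days
    out["year"] = out["Date"].dt.year
    return out.drop(columns="Date")


tidy = harmonize(raw)
print(tidy)
```

The resulting `Greenup_0` values (108, 114, 100) agree with the harmonized table shown for this location, which supports the guess about the 1970 offset.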


In [7]:
modis_df = dataset.load()
modis_df

INFO:springtime.datasets.appeears:Found /home/peter/.cache/springtime/appeears/MCD12Q2-2009-2011-Dormancy-Greenup-50f0093a3994764340e8bb6f70797f854dd3a4eb-MCD12Q2-061-results.csv


Unnamed: 0,Dormancy_0,Greenup_0,geometry,year
0,327,108,POINT (8.89300 47.09780),2009
1,289,114,POINT (8.89300 47.09780),2010
2,311,100,POINT (8.89300 47.09780),2011
3,326,81,POINT (10.69133 48.08535),2009
4,323,55,POINT (10.69133 48.08535),2010
5,322,71,POINT (10.69133 48.08535),2011


## Loading raster data

Raster data come in NetCDF format:


In [8]:
from springtime.datasets import Appeears

dataset = Appeears(
    years=[2009, 2011],
    product="MCD12Q2",
    version="061",
    layers=["Greenup", "Dormancy"],
    points=[(9.1, 49.1), (9.6, 49.6), (9.9, 49.9)],
    area={"name": "eastfrankfurt", "bbox": [9.0, 49.0, 10.0, 50.0]},
)

dataset.raw_load()

INFO:springtime.datasets.appeears:Looking for data...
INFO:springtime.datasets.appeears:Downloading /home/peter/.cache/springtime/appeears/eastfrankfurt/MCD12Q2.061_500m_aid0001.nc
INFO:springtime.datasets.appeears:Task dcc3183f-ee6c-4bf5-984c-601cf79251f4 completed


Notice that the two growing cycles are represented by the Num_Modes dimension. For now, springtime only keeps one of them. In the process of loading the data, we discard the QA columns and the CRS. Furthermore, we can again infer the DOY offset. Finally, we reuse the point extraction function we have seen previously, to obtain a dataframe that is (almost) equal to the point extraction.


In [9]:
modis_df = dataset.load()
modis_df.head()

INFO:springtime.datasets.appeears:Looking for data...
INFO:springtime.datasets.appeears:Found /home/peter/.cache/springtime/appeears/eastfrankfurt/MCD12Q2.061_500m_aid0001.nc
  datetimeindex = ds.indexes["time"].to_datetimeindex()


Unnamed: 0,Dormancy,Greenup,geometry,year
0,322,66,POINT (9.10000 49.10000),2009
1,318,83,POINT (9.10000 49.10000),2010
2,318,26,POINT (9.10000 49.10000),2011
3,220,78,POINT (9.60000 49.60000),2009
4,320,81,POINT (9.60000 49.60000),2010


## MODIS as target or predictor data

The phenological indices downloaded so far are similar to the field observations that we have typically referred to as "target" variables. However, there are other products that are more suitable as "predictors", for example, leaf area index. This product comes at a different (time) resolution and, if relevant, can be resampled to a different frequency. Below, we show an example of downloading LAI FPAR.


In [10]:
from springtime.datasets import Appeears

dataset = Appeears(
    years=[2009, 2011],
    product="MCD15A2H",
    version="061",
    layers=["Fpar_500m", "Lai_500m"],
    points=[(9.1, 49.1), (9.6, 49.6), (9.9, 49.9)],
    area={"name": "eastfrankfurt", "bbox": [9.0, 49.0, 10.0, 50.0]},
    infer_date_offset=False,
)

df = dataset.load()
df.head()

INFO:springtime.datasets.appeears:Looking for data...
INFO:springtime.datasets.appeears:Found /home/peter/.cache/springtime/appeears/eastfrankfurt/MCD15A2H.061_500m_aid0001.nc
  datetimeindex = ds.indexes["time"].to_datetimeindex()


Unnamed: 0,year,geometry,Fpar_500m|1,Fpar_500m|9,Fpar_500m|17,Fpar_500m|25,Fpar_500m|33,Fpar_500m|41,Fpar_500m|49,Fpar_500m|57,...,Lai_500m|289,Lai_500m|297,Lai_500m|305,Lai_500m|313,Lai_500m|321,Lai_500m|329,Lai_500m|337,Lai_500m|345,Lai_500m|353,Lai_500m|361
0,2002,POINT (9.10000 49.10000),,,,,,,,,...,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0
1,2002,POINT (9.60000 49.60000),,,,,,,,,...,1.2,1.3,1.2,1.2,1.3,1.3,1.7,0.6,1.6,0.1
2,2002,POINT (9.90000 49.90000),,,,,,,,,...,0.1,0.3,0.1,0.2,0.1,0.2,0.9,0.6,0.8,0.0
3,2003,POINT (9.10000 49.10000),2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,...,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0
4,2003,POINT (9.60000 49.60000),0.07,0.49,0.15,0.42,0.0,0.09,0.26,0.27,...,0.8,1.0,0.8,0.9,0.6,1.7,1.6,1.5,0.9,1.6
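
The column labels in the table above (`|1`, `|9`, ..., `|361`) suggest one value every 8 days. As a sketch of how such a series could be resampled to another frequency with plain pandas, the example below builds an 8-day series for a single year and averages it per month. The LAI values are made up purely for illustration, and this is not a springtime feature.

```python
import pandas as pd

year = 2009
doys = range(1, 366, 8)  # 1, 9, ..., 361, as in the column labels above

# Turn (year, day-of-year) pairs into timestamps.
dates = pd.to_datetime([f"{year}-{doy:03d}" for doy in doys], format="%Y-%j")

# Made-up LAI values, for illustration only.
lai = pd.Series([0.1 * (i % 10) for i in range(len(dates))], index=dates, name="Lai_500m")

# Monthly mean, labelled by the first day of each month.
monthly = lai.resample("MS").mean()
print(monthly)
```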


## Export to recipe

MODIS datasets can also be exported as recipes. Remember that it is possible to extract points from an area, but not an area from points.


In [11]:
print(dataset.to_recipe())