In [1]:
from rich import print
import logging

logging.basicConfig(level=logging.INFO)

# PPO

Observations from the [Plant Phenology Ontology (PPO)](https://www.ebi.ac.uk/ols/ontologies/ppo)

---

Uses [rppo](https://github.com/ropensci/rppo/) to get data from http://plantphenology.org/

See [dataset documentation](https://github.com/PlantPhenoOntology/ppo/blob/master/documentation/ppo.pdf)

## Example use case

The [paper](https://doi.org/10.3389/fpls.2018.00517) introducing the PPO portal suggests the following:

> ... we examined leafing out dates for the genera Acer (maples) and Quercus (oaks) and flowering dates for the genera Acer and Syringa (lilacs). [...] To estimate leafing out dates, we used all observations of plants with the PPO trait 'true leaves present' that did not also have the trait 'senescing true leaves present', and to estimate flowering dates, we used all observations of plants with the PPO trait 'flowers present' that did not also have the trait 'senesced flowers present'. All geographic locations (i.e., latitude and longitude) were rounded to a 0.1-degree grid, and the data were filtered to only keep the earliest relevant observation for each unique combination of grid cell and year.

Here, we will walk through the steps to do this from scratch with springtime, and finally see how we can do the same thing in one go.


### Getting the term IDs

The springtime interface is a thin wrapper around rppo, so the options you can
provide are similar to those you can provide directly to the R package.

First, we need to figure out the termIDs for "flowering" and "leafing out". We can use the `ppo_get_terms` function for that.


In [2]:
from springtime.datasets.insitu.ppo import ppo_get_terms

terms = ppo_get_terms()

INFO:rpy2.situation:cffi mode is CFFI_MODE.ANY
INFO:rpy2.situation:R home found: /home/peter/mambaforge/envs/springtime/lib/R
INFO:rpy2.situation:R library path: 
INFO:rpy2.situation:LD_LIBRARY_PATH: 
INFO:rpy2.rinterface_lib.embedded:Default options to initialize R: rpy2, --quiet, --no-save
INFO:rpy2.rinterface_lib.embedded:R is already initialized. No need to initialize.
INFO:springtime.datasets.insitu.ppo:Downloading terms


In [3]:
terms.query("label.str.contains('true leaves present')")

Unnamed: 0,termID,label,definition
10,obo:PPO_0002322,expanding true leaves present,An 'expanding true leaf presence' (PPO:0002024...
12,obo:PPO_0002320,expanding unfolded true leaves present,An 'expanding unfolded true leaf presence' (PP...
22,obo:PPO_0002318,immature unfolded true leaves present,An 'immature unfolded true leaf presence' (PPO...
25,obo:PPO_0002319,mature true leaves present,An 'mature true leaf presence' (PPO:0002021) t...
41,obo:PPO_0002316,non-senescing unfolded true leaves present,A 'non-senescing unfolded true leaf presence' ...
70,obo:PPO_0002317,senescing true leaves present,A 'senescing true leaf presence' (PPO:0002019)...
71,obo:PPO_0002313,true leaves present,A 'true leaf presence' (PPO:0002015) trait tha...
73,obo:PPO_0002315,unfolded true leaves present,An 'unfolded true leaf presence' (PPO:0002017)...
75,obo:PPO_0002314,unfolding true leaves present,An 'unfolding true leaf presence' (PPO:0002016...


In [4]:
terms.query("label.str.contains('flowers present')")

Unnamed: 0,termID,label,definition
16,obo:PPO_0002330,flowers present,A 'flower presence' (PPO:0002032) trait that i...
39,obo:PPO_0002331,non-senesced flowers present,A 'non-senesced flower presence' (PPO:0002033)...
46,obo:PPO_0002333,open flowers present,An 'open flower presence' (PPO:0002035) trait ...
52,obo:PPO_0002334,pollen-releasing flowers present,A 'pollen-releasing flower presence' (PPO:0002...
68,obo:PPO_0002335,senesced flowers present,A 'senesced flower presence' (PPO:0002037) tra...
80,obo:PPO_0002332,unopened flowers present,An 'unopened flower presence' (PPO:0002034) tr...
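The `contains` queries above return every label that includes the search text. When only one specific trait is wanted, an exact match on `label` narrows the result to a single row. The following is a minimal sketch on a hand-built stand-in for the terms table (a few rows copied from the output above), so it runs without R or network access; `term_id_for` is a hypothetical helper, not part of springtime.

```python
import pandas as pd

# Stand-in for the table returned by ppo_get_terms();
# rows copied from the output shown above.
terms = pd.DataFrame(
    {
        "termID": [
            "obo:PPO_0002313",
            "obo:PPO_0002317",
            "obo:PPO_0002330",
            "obo:PPO_0002335",
        ],
        "label": [
            "true leaves present",
            "senescing true leaves present",
            "flowers present",
            "senesced flowers present",
        ],
    }
)


def term_id_for(label: str) -> str:
    """Return the termID whose label matches exactly (hypothetical helper)."""
    matches = terms.loc[terms["label"] == label, "termID"]
    if len(matches) != 1:
        raise ValueError(f"expected one match for {label!r}, found {len(matches)}")
    return matches.iloc[0]


print(term_id_for("true leaves present"))
print(term_id_for("senesced flowers present"))
```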


### Getting the data

Now that we know the relevant termIDs, let's start with a simple dataset definition.


In [5]:
from springtime.datasets import RPPO

leafing_maples = RPPO(genus="Acer", termID="obo:PPO_0002313", years=[1990, 2020])
leafing_oaks = RPPO(genus="Quercus", termID="obo:PPO_0002313", years=[1990, 2020])
flowering_maples = RPPO(genus="Acer", termID="obo:PPO_0002032", years=[1990, 2020])
flowering_lilacs = RPPO(genus="Syringa", termID="obo:PPO_0002032", years=[1990, 2020])

Let's continue by exploring the flowering lilacs.


In [6]:
print(flowering_lilacs)

In [7]:
raw_df = flowering_lilacs.raw_load()

INFO:springtime.datasets.insitu.ppo:Locating data...


Found /home/peter/.cache/springtime/PPO/Syringa.obo:PPO_0002032.1990-2020.csv


In [8]:
raw_df

Unnamed: 0,dayOfYear,year,genus,specificEpithet,eventRemarks,latitude,longitude,termID,source,eventId
0,142,2016,Syringa,vulgaris,End of flowering (lilac/honeysuckle),44.930183,-93.209820,"obo:BFO_0000002,obo:BFO_0000001,obo:PPO_000200...",USA-NPN,urn:phenologicalObservingProcess/7956537
1,149,2016,Syringa,vulgaris,End of flowering (lilac/honeysuckle),44.930183,-93.209820,"obo:BFO_0000020,obo:PPO_0002037,obo:PPO_000232...",USA-NPN,urn:phenologicalObservingProcess/8021769
2,149,2016,Syringa,vulgaris,End of flowering (lilac/honeysuckle),44.930183,-93.209820,"obo:BFO_0000020,obo:PPO_0002324,obo:PPO_000232...",USA-NPN,urn:phenologicalObservingProcess/8021774
3,152,2020,Syringa,vulgaris,End of flowering (lilac/honeysuckle),44.930183,-93.209820,"obo:PATO_0000001,obo:BFO_0000002,obo:BFO_00000...",USA-NPN,urn:phenologicalObservingProcess/22739137
4,161,2020,Syringa,vulgaris,End of flowering (lilac/honeysuckle),44.930183,-93.209820,"obo:PATO_0000001,obo:PPO_0002323,obo:PPO_00023...",USA-NPN,urn:phenologicalObservingProcess/22808877
...,...,...,...,...,...,...,...,...,...,...
29374,99,2010,Syringa,vulgaris,Open flowers (lilac),42.168755,-88.371340,"obo:PATO_0000001,obo:PPO_0002041,obo:BFO_00000...",USA-NPN,urn:phenologicalObservingProcess/193882
29375,99,2010,Syringa,vulgaris,Full flowering (lilac),42.168755,-88.371340,"obo:PATO_0000001,obo:PPO_0002041,obo:BFO_00000...",USA-NPN,urn:phenologicalObservingProcess/193883
29376,115,2010,Syringa,vulgaris,Open flowers (lilac),42.162610,-88.398506,"obo:PPO_0002001,obo:PPO_0002331,obo:PPO_000200...",USA-NPN,urn:phenologicalObservingProcess/204229
29377,124,2010,Syringa,vulgaris,Open flowers (lilac),42.162610,-88.398506,"obo:PPO_0002025,obo:PPO_0002026,obo:PPO_000203...",USA-NPN,urn:phenologicalObservingProcess/232974


The raw data takes some getting used to. What's nice is that we already have columns for year, dayOfYear, latitude, and longitude. The other columns are less evident.
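If calendar dates are easier to read than day-of-year numbers, the `year` and `dayOfYear` columns can be combined with pandas. This is a small sketch on toy rows that reuse the column names of `raw_df`; it is an optional convenience and not something springtime does for you.

```python
import pandas as pd

# Toy rows with the same column names as raw_df
# (values copied from the table above).
obs = pd.DataFrame({"year": [2016, 2010], "dayOfYear": [142, 99]})

# Build strings such as "2016142" (year + zero-padded day of year)
# and parse them with the %Y%j format.
stamp = obs["year"].astype(str) + obs["dayOfYear"].astype(str).str.zfill(3)
obs["date"] = pd.to_datetime(stamp, format="%Y%j")
print(obs)
```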

### Filtering senesced flowers

The most relevant column is the `termID`. PPO is a state-based dataset, and the `termID` column contains every state that is applicable for a given observation. In our query, we looked for records with termID PPO:0002032 = flower presence, and we can verify that indeed, this term is present in all rows.


In [9]:
flowers = raw_df.termID.map(lambda x: "obo:PPO_0002032" in x)
print(f"{flowers.sum()} / {len(raw_df)}")

According to the paper, we need to disregard any observations that also have "senesced flowers present", so we need to filter our data. Unfortunately, this is a bit problematic, as the first 1000 results (note we set the limit to 1000) all seem to include both termIDs.


In [10]:
# fresh_flowers = raw_df.query("~termID.str.contains('obo:PPO_0002335')")
# equivalent but faster
fresh_flowers = raw_df[~raw_df.termID.map(lambda x: "obo:PPO_0002335" in x)]
print(f"{len(fresh_flowers)} / {len(raw_df)}")
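The substring test above works because PPO identifiers have a fixed width. A slightly stricter variant splits the comma-separated `termID` string into individual IDs and compares whole identifiers, which also extends naturally to excluding several terms at once. The sketch below uses toy rows that mimic the `termID` column; the `exclude` set is an illustrative name.

```python
import pandas as pd

# Toy rows mimicking the comma-separated termID column of raw_df.
toy = pd.DataFrame(
    {
        "dayOfYear": [120, 130, 149],
        "termID": [
            "obo:PPO_0002032,obo:PPO_0002330",
            "obo:PPO_0002032,obo:PPO_0002331",
            "obo:PPO_0002032,obo:PPO_0002335",
        ],
    }
)

exclude = {"obo:PPO_0002335"}  # 'senesced flowers present'

# Split each string into a set of IDs and keep only the rows
# that share no ID with the exclusion set.
term_sets = toy["termID"].str.split(",").map(set)
keep = term_sets.map(lambda ids: ids.isdisjoint(exclude))
fresh = toy[keep]
print(f"{len(fresh)} / {len(toy)}")
```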

### Conversion to event-based data

Notice that sometimes the same state may have been observed multiple times in the same year.


In [11]:
fresh_flowers.groupby(["year", "latitude", "longitude"])["dayOfYear"].agg(
    ["min", "max", "count"]
)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,min,max,count
year,latitude,longitude,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1990,30.930000,-100.120000,68,77,2
1990,32.650000,-103.380000,88,101,2
1990,32.670000,-116.300000,97,106,2
1990,32.850000,-116.620000,92,102,2
1990,32.930000,-107.570000,96,103,2
...,...,...,...,...,...
2020,48.024920,-122.469190,140,140,2
2020,48.033490,-122.598724,111,134,5
2020,48.919930,-122.640564,130,130,2
2020,49.356550,-124.414300,149,163,3


Following the procedure outlined in the reference paper, we can convert the data
to an event-based dataset by retaining only the first DOY.


In [12]:
groups = ["year", "latitude", "longitude"]
onset_of_flowers = fresh_flowers.groupby(groups)["dayOfYear"].agg("min")
onset_of_flowers

year  latitude   longitude  
1990  30.930000  -100.120000     68
      32.650000  -103.380000     88
      32.670000  -116.300000     97
      32.850000  -116.620000     92
      32.930000  -107.570000     96
                               ... 
2020  48.024920  -122.469190    140
      48.033490  -122.598724    111
      48.919930  -122.640564    130
      49.356550  -124.414300    149
      52.095947  -106.574160    157
Name: dayOfYear, Length: 3213, dtype: int64
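The paper quoted at the top also rounds all locations to a 0.1-degree grid before keeping the earliest observation per grid cell and year, whereas the grouping above uses the exact coordinates. A minimal sketch of that extra step, on toy data with the same column names, could look as follows; the `lat_cell` and `lon_cell` column names are made up for this illustration.

```python
import pandas as pd

# Toy observations: the first two rows fall in the same
# 0.1-degree grid cell in the same year.
obs = pd.DataFrame(
    {
        "year": [2010, 2010, 2010, 2011],
        "latitude": [42.168755, 42.162610, 44.930183, 42.168755],
        "longitude": [-88.371340, -88.398506, -93.209820, -88.371340],
        "dayOfYear": [115, 99, 142, 124],
    }
)

# Round coordinates to one decimal place to obtain grid cells.
obs["lat_cell"] = obs["latitude"].round(1)
obs["lon_cell"] = obs["longitude"].round(1)

# Keep the earliest day of year per grid cell and year.
onset = (
    obs.groupby(["year", "lat_cell", "lon_cell"])["dayOfYear"]
    .min()
    .reset_index()
)
print(onset)
```

With exact coordinates the first two rows would end up in separate groups; after rounding they collapse into one cell and only the earlier day is kept.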

### Final tweaks

The last step to make the PPO data fully springtime-compatible is to convert it to a geopandas dataframe.


In [13]:
import geopandas as gpd

df = onset_of_flowers.reset_index()
lon = df.pop("longitude")
lat = df.pop("latitude")
geometry = gpd.points_from_xy(lon, lat)
gdf = gpd.GeoDataFrame(df, geometry=geometry)
gdf

Unnamed: 0,year,dayOfYear,geometry
0,1990,68,POINT (-100.12000 30.93000)
1,1990,88,POINT (-103.38000 32.65000)
2,1990,97,POINT (-116.30000 32.67000)
3,1990,92,POINT (-116.62000 32.85000)
4,1990,96,POINT (-107.57000 32.93000)
...,...,...,...
3208,2020,140,POINT (-122.46919 48.02492)
3209,2020,111,POINT (-122.59872 48.03349)
3210,2020,130,POINT (-122.64056 48.91993)
3211,2020,149,POINT (-124.41430 49.35655)


## Bringing it all together

To do everything in one go, springtime adds the following keywords to the `RPPO` dataset: `exclude_terms` and `infer_event`.
As such, we can completely automate the steps above.


In [14]:
from springtime.datasets import RPPO

flowering_lilacs = RPPO(
    genus="Syringa",
    termID="obo:PPO_0002032",
    years=[1990, 2020],
    exclude_terms=["obo:PPO_0002335"],
    infer_event="first_yes_day",
)
df = flowering_lilacs.load()
df

INFO:springtime.datasets.insitu.ppo:Locating data...


Found /home/peter/.cache/springtime/PPO/Syringa.obo:PPO_0002032.1990-2020.csv


Unnamed: 0,year,dayOfYear,geometry
0,1990,68,POINT (-100.12000 30.93000)
1,1990,88,POINT (-103.38000 32.65000)
2,1990,97,POINT (-116.30000 32.67000)
3,1990,92,POINT (-116.62000 32.85000)
4,1990,96,POINT (-107.57000 32.93000)
...,...,...,...
3208,2020,140,POINT (-122.46919 48.02492)
3209,2020,111,POINT (-122.59872 48.03349)
3210,2020,130,POINT (-122.64056 48.91993)
3211,2020,149,POINT (-124.41430 49.35655)


## Export as recipe

Finally, we can export the dataset to a recipe for sharing and reproducibility.


In [15]:
print(flowering_lilacs.to_recipe())