# FluxNet Site Sampling

This notebook creates a sample of 15 FluxNet sites from the original 185. The pool falls to 92 when filtering on availability of data in 2014. The 2014 requirement comes from the launch of the OCO-2 satellite in July 2014, which makes it possible to explore CO2 mole fraction vertical column data from OCO-2 as a potential variable.

In [2]:
import pandas as pd
import numpy as np

In [8]:
flux_sites = pd.read_excel('Fluxnet Site Descs.xlsx')
flux_sites.head(5)

Unnamed: 0,Site,Country,Area,Continent,Lat,Long,Vegetation IGBP,Vegetation Meaning,Year Start,Year End
0,AR-Slu,Argentina,San Luis,South America,-33.4648,-66.4598,MF,Mixed Forest,2009,2011
1,AR-Vir,Argentina,Virasoro,South America,-28.2395,-56.1886,ENF,Evergreen Needleleaf Forest,2009,2012
2,AT-Neu,Austria,Neustift,Europe,47.1167,11.3175,GRA,Grasslands,2002,2012
3,AU-ASM,Australia,Alice Springs,Oceania,-22.283,133.249,SAV,Savannas,2010,2014
4,AU-Ade,Australia,Adelaide River,Oceania,-13.0769,131.1178,WSA,Woody Savannas,2007,2009


In [11]:
print('Before Filter: ' + str(len(flux_sites)))
flux_sites = flux_sites[(flux_sites['Year Start'] <= 2014) & (flux_sites['Year End'] >= 2014)]
print('After Filter: ' + str(len(flux_sites)))

Before Filter: 81
After Filter: 68


Filtering on the maximum and minimum latitudes found in one of the other datasets.

In [12]:
print('Before Filter: ' + str(len(flux_sites)))
flux_sites = flux_sites[(flux_sites['Lat'] <= 53.4) & (flux_sites['Lat'] >= -84.4)]
print('After Filter: ' + str(len(flux_sites)))

Before Filter: 68
After Filter: 68
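
The year and latitude filters above can be folded into one reusable function. The following is a minimal sketch, assuming the same column names as the spreadsheet (`Year Start`, `Year End`, `Lat`); the function name `filter_sites` and the toy rows are hypothetical and only illustrate the logic.

```python
import pandas as pd

def filter_sites(sites, year=2014, lat_min=-84.4, lat_max=53.4):
    """Keep sites active in `year` whose latitude lies in [lat_min, lat_max]."""
    # A site is active if its record starts on or before `year` and ends on or after it
    active = (sites['Year Start'] <= year) & (sites['Year End'] >= year)
    # Series.between is inclusive on both ends by default
    in_band = sites['Lat'].between(lat_min, lat_max)
    return sites[active & in_band]

# Toy rows for illustration only (not real FluxNet records)
toy = pd.DataFrame({
    'Site': ['A', 'B', 'C'],
    'Lat': [10.0, 60.0, -20.0],
    'Year Start': [2000, 2000, 2015],
    'Year End': [2014, 2014, 2016],
})
filtered = filter_sites(toy)
print(filtered['Site'].tolist())
```

Keeping the thresholds as keyword arguments makes it easier to reuse the same filter for a different year or a different latitude band.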


Ensuring the post-filter group still holds a strong variety of land cover types and continents.

In [13]:
flux_sites['Vegetation IGBP'].value_counts()

Vegetation IGBP
ENF    13
GRA    10
CRO     9
DBF     8
EBF     6
MF      6
WET     5
SAV     4
WSA     4
OSH     2
CSH     1
Name: count, dtype: int64

In [14]:
flux_sites['Continent'].value_counts()

Continent
Europe           30
North America    25
Oceania          11
South America     1
Africa            1
Name: count, dtype: int64
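
The two separate counts above do not show how land cover types are spread across continents. A cross-tabulation is one way to see that, and it also gives the number of non-empty land cover and continent combinations. This is a rough sketch on toy rows (not real FluxNet records), assuming the same column names as the spreadsheet.

```python
import pandas as pd

# Toy rows for illustration only (not real FluxNet records)
toy = pd.DataFrame({
    'Vegetation IGBP': ['ENF', 'ENF', 'GRA', 'GRA', 'CRO'],
    'Continent': ['Europe', 'North America', 'Europe', 'Europe', 'Oceania'],
})

# Rows are land cover types, columns are continents, cells are site counts
coverage = pd.crosstab(toy['Vegetation IGBP'], toy['Continent'])

# Number of (land cover, continent) combinations with at least one site
n_strata = int((coverage > 0).sum().sum())

print(coverage)
print('Non-empty strata:', n_strata)
```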

Sampling process: stratified sampling of 1 site per continent-land cover combination, followed by a random selection of 15 from the resulting 22 unique-combination sites.

In [5]:
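# Shuffle all rows with a fixed seed so the later "first row per group" pick is random but reproducible
flux_sites = flux_sites.sample(frac=1, random_state=42).reset_index(drop=True)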

In [6]:
# Take the first row of each (land cover, continent) group, i.e. one site per stratum
strata = ['Vegetation IGBP', 'Continent']
sample = flux_sites.groupby(strata).apply(lambda x: x.head(1)).reset_index(drop=True)

In [7]:
all_types = set(flux_sites['Vegetation IGBP'].unique())
sampled_types = set(sample['Vegetation IGBP'].unique())
print('All types: ' + str(all_types))
print('Sample types: ' + str(sampled_types))

All types: {'WET', 'WSA', 'OSH', 'CSH', 'MF', 'EBF', 'SAV', 'ENF', 'GRA', 'DBF', 'CRO'}
Sample types: {'WET', 'WSA', 'OSH', 'CSH', 'EBF', 'MF', 'SAV', 'ENF', 'GRA', 'DBF', 'CRO'}


In [8]:
len(sample)

22
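
The one-site-per-stratum step can also be written without `groupby().apply()`. The sketch below is an alternative formulation, not the code used in this notebook: it shuffles the rows and then keeps the first occurrence of each stratum with `drop_duplicates`. The helper name `one_per_stratum` and the toy rows are hypothetical.

```python
import pandas as pd

def one_per_stratum(sites, strata, seed=42):
    """Return one randomly chosen row per unique combination of `strata` columns."""
    # Shuffle so that "first row in each stratum" is a random pick
    shuffled = sites.sample(frac=1, random_state=seed)
    # Keep only the first occurrence of each stratum combination
    picked = shuffled.drop_duplicates(subset=strata, keep='first')
    return picked.sort_values(strata).reset_index(drop=True)

# Toy rows for illustration only (not real FluxNet records)
toy = pd.DataFrame({
    'Site': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'Vegetation IGBP': ['ENF', 'ENF', 'GRA', 'GRA', 'CRO'],
    'Continent': ['Europe', 'Europe', 'Europe', 'Oceania', 'Oceania'],
})
picked = one_per_stratum(toy, ['Vegetation IGBP', 'Continent'])
print(len(picked))
```

The number of rows returned equals the number of unique stratum combinations, which is the same quantity `len(sample)` reports above.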

In [9]:
sample['Vegetation IGBP'].value_counts()

Vegetation IGBP
EBF    4
GRA    3
CRO    2
DBF    2
ENF    2
MF     2
WET    2
WSA    2
CSH    1
OSH    1
SAV    1
Name: count, dtype: int64

In [10]:
# Randomly keep 15 of the 22 stratified sites (fixed seed for reproducibility)
sample = sample.sample(n=15, random_state=42)

Final sample, followed by a check for strong coverage of continents and land cover types. This matters for ensuring the final model can generalise well.

In [98]:
sample['Vegetation IGBP'].value_counts()

Vegetation IGBP
GRA    3
CRO    2
EBF    2
DBF    2
MF     1
WSA    1
SAV    1
WET    1
OSH    1
CSH    1
Name: count, dtype: int64

In [99]:
sample['Continent'].value_counts()

Continent
North America    6
Europe           5
Oceania          2
South America    1
Africa           1
Name: count, dtype: int64
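
Comparing the counts by eye works for a sample this small, but the check can also be made explicit. In the outputs above, ENF is among the 11 land cover types in the filtered pool but does not appear in the 15-site counts. A small set difference reports such gaps directly. This is a sketch on toy rows; the helper name `missing_categories` is hypothetical.

```python
import pandas as pd

def missing_categories(pool, sample, column):
    """Return values of `column` present in `pool` but absent from `sample`."""
    return sorted(set(pool[column].unique()) - set(sample[column].unique()))

# Toy rows for illustration only (not real FluxNet records)
toy_pool = pd.DataFrame({'Vegetation IGBP': ['ENF', 'GRA', 'CRO', 'GRA']})
toy_sample = pd.DataFrame({'Vegetation IGBP': ['GRA', 'CRO']})

gaps = missing_categories(toy_pool, toy_sample, 'Vegetation IGBP')
print(gaps)
```

The same helper could be called with `'Continent'` as the column to check continent coverage.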

In [100]:
sample.to_excel('FluxNet_Site_Sample.xlsx')

## Sample for Evaluation

In [None]:
# Re-importing original dataset
flux_sites = pd.read_excel('Fluxnet Site Descs.xlsx')
flux_sites.head(5)

In [None]:
# Filtering on new period - 2012-2014
print('Before Filter: ' + str(len(flux_sites)))
flux_sites = flux_sites[(flux_sites['Year Start'] <= 2014) & (flux_sites['Year End'] >= 2014)]
print('After Filter: ' + str(len(flux_sites)))

print('Before Filter: ' + str(len(flux_sites)))
flux_sites = flux_sites[(flux_sites['Lat'] <= 53.4) & (flux_sites['Lat'] >= -84.4)]
print('After Filter: ' + str(len(flux_sites)))

In [15]:
# Shuffle with a fixed seed, then take the first row of each (land cover, continent) group
sample = flux_sites.sample(frac=1, random_state=42).reset_index(drop=True)
strata = ['Vegetation IGBP', 'Continent']
eval_sample = sample.groupby(strata).apply(lambda x: x.head(1)).reset_index(drop=True)
len(eval_sample)

22

In [16]:
eval_sample

Unnamed: 0,Site,Country,Area,Continent,Lat,Long,Vegetation IGBP,Vegetation Meaning,Year Start,Year End
0,BE-Lon,Belgium,Lonzee,Europe,50.5516,4.7462,CRO,Croplands,2004,2014
1,US-Twt,USA,Twitchell Island,North America,38.1087,-121.6531,CRO,Croplands,2009,2014
2,IT-Noe,Italy,Arca di Noe,Europe,40.6062,8.1517,CSH,Closed Shrublands,2004,2014
3,IT-CA3,Italy,Castel d'Asso,Europe,42.38,12.0222,DBF,Deciduous Broadleaf Forest,2011,2014
4,US-WCr,USA,Willow Creek,North America,45.8059,-90.0799,DBF,Deciduous Broadleaf Forest,1999,2014
5,GH-Ank,Ghana,Ankasa,Africa,5.2685,-2.6942,EBF,Evergreen Broadleaf Forest,2011,2014
6,FR-Pue,France,Puechabon,Europe,43.7413,3.5957,EBF,Evergreen Broadleaf Forest,2000,2014
7,Au-Whr,Australia,Whroo,Oceania,-36.6732,145.0294,EBF,Evergreen Broadleaf Forest,2011,2014
8,GF-Guy,French Guiana,Guyaflux,South America,5.2788,-52.9249,EBF,Evergreen Broadleaf Forest,2004,2014
9,NL-Loo,Netherlands,Loobos,Europe,52.1666,5.7436,ENF,Evergreen Needleleaf Forest,1996,2014


In [18]:
flux_sites[flux_sites['Vegetation IGBP'] == 'ENF']

Unnamed: 0,Site,Country,Area,Continent,Lat,Long,Vegetation IGBP,Vegetation Meaning,Year Start,Year End
36,CA-TP1,Canada,Ontario Turkey Point,North America,42.6609,-80.5595,ENF,Evergreen Needleleaf Forest,2003,2014
38,CA-TP3,Canada,Ontario Turkey Point,North America,42.7068,-80.3483,ENF,Evergreen Needleleaf Forest,2002,2014
39,CA-TP4,Canada,Ontario Turkey Point,North America,42.7102,-80.3574,ENF,Evergreen Needleleaf Forest,2002,2014
43,CH-Dav,Switzerland,Davos,Europe,46.8153,9.8559,ENF,Evergreen Needleleaf Forest,1997,2014
58,CZ-BK1,Czechia,Bily Kriz,Europe,49.5021,18.5369,ENF,Evergreen Needleleaf Forest,2004,2014
68,DE-Obe,Germany,Oberbarenburg,Europe,50.7867,13.7213,ENF,Evergreen Needleleaf Forest,2008,2014
74,DE-Tha,Germany,Tharandt,Europe,50.9626,13.5651,ENF,Evergreen Needleleaf Forest,1996,2014
106,IT-Lav,Italy,Lavarone,Europe,45.9562,11.2813,ENF,Evergreen Needleleaf Forest,2003,2014
120,NL-Loo,Netherlands,Loobos,Europe,52.1666,5.7436,ENF,Evergreen Needleleaf Forest,1996,2014
141,US-GLE,USA,Glees,North America,41.3665,-106.2399,ENF,Evergreen Needleleaf Forest,2004,2014


Manually choosing 4 new sites based on less represented land covers: ENF, which is absent from the original sample, plus EBF and WSA represented at new locations.

In [20]:
eval_sites = ['NL-Loo','FR-Pue','US-NR1','AU-Gin']
eval_sample_fin = flux_sites[flux_sites['Site'].isin(eval_sites)]
eval_sample_fin

Unnamed: 0,Site,Country,Area,Continent,Lat,Long,Vegetation IGBP,Vegetation Meaning,Year Start,Year End
91,FR-Pue,France,Puechabon,Europe,43.7413,3.5957,EBF,Evergreen Broadleaf Forest,2000,2014
120,NL-Loo,Netherlands,Loobos,Europe,52.1666,5.7436,ENF,Evergreen Needleleaf Forest,1996,2014
158,US-NR1,USA,Niwot Ridge Forest,North America,40.0329,-105.5464,ENF,Evergreen Needleleaf Forest,1998,2014
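
The table above lists three rows for the four requested site IDs, because `isin` silently ignores IDs that are not present in the filtered frame. A quick check along the following lines would show which requested IDs were not matched. This is a sketch with made-up site IDs; the helper name `missing_sites` is hypothetical.

```python
import pandas as pd

def missing_sites(sites, requested):
    """Return the requested site IDs that do not appear in the `Site` column."""
    present = set(sites['Site'])
    return [s for s in requested if s not in present]

# Toy rows with made-up IDs for illustration only (not real FluxNet records)
toy = pd.DataFrame({'Site': ['XX-One', 'XX-Two']})
not_found = missing_sites(toy, ['XX-One', 'XX-Three'])
print(not_found)
```

Running the check against both the original and the filtered frame would distinguish an ID that is not in the spreadsheet from one that was dropped by the year or latitude filters.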
