### MAC Season 4 Data Cleaning
#### Traits
- aboveground dry biomass
- days & growing degree days (GDD) to flowering
- days & GDD to flag leaf emergence
- canopy height (time series)

This notebook contains the code Emily Cain used to clean and curate sorghum data from MAC Season 4. The input csv files were queried from betydb version 1 in February 2020 and downloaded from the MAC weather station website. To run the entire notebook and write the csv files for the traits above, select `Run` and then `Run All Cells` from the notebook menu. For now, the notebook must be run in the CyVerse Discovery Environment, where the input data are stored. Once all cells have executed, the output csv files will appear in the file panel on the left; right-click a file to download it. If you experience any problems or have questions, please e-mail ejcain@arizona.edu.

#### Custom Functions Used

In [1]:
def check_for_subplots(df):
    
    """
    Function takes a dataframe as argument and checks for sitename subplots ending in ' E' or ' W'
    Will return rows with subplots, if any.
    """
    return df.loc[(df.sitename.str.endswith(' E')) | (df.sitename.str.endswith(' W'))]
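
As a quick illustration of what the helper returns, here is a minimal sketch on a three-row dataframe. The sitenames follow the pattern seen in the dataset, but the `demo` dataframe itself is made up for this example.

```python
import pandas as pd


def check_for_subplots(df):
    """Return the rows whose sitename ends in ' E' or ' W'."""
    return df.loc[df.sitename.str.endswith(' E') | df.sitename.str.endswith(' W')]


# Illustrative data only: two subplot rows and one full-plot row
demo = pd.DataFrame({'sitename': [
    'MAC Field Scanner Season 4 Range 5 Column 7 E',
    'MAC Field Scanner Season 4 Range 5 Column 14',
    'MAC Field Scanner Season 4 Range 11 Column 3 W',
]})

subplot_rows = check_for_subplots(demo)
print(subplot_rows)
```

Only the first and third rows come back. A dataframe with no subplots yields an empty result with the same columns.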

#### I. Import python packages

In [2]:
import datetime
import numpy as np
import pandas as pd
import sqlalchemy
import sqlite3

#### II. Read in dataset
Future: query betydb directly with public API key for most recent data

In [3]:
df_0 = pd.read_csv('mac_season_four_2020-02-26.csv', low_memory=False)
print(df_0.shape)
df_0.head(3)

(397879, 39)


Unnamed: 0.1,Unnamed: 0,checked,result_type,id,citation_id,site_id,treatment_id,sitename,city,lat,...,n,statname,stat,notes,access_level,cultivar,entity,method_name,view_url,edit_url
0,1,0,traits,6002274258,,6000005457,,MAC Field Scanner Season 4 Range 5 Column 14,Maricopa,33.074691,...,,,,,2,PI653617,,Mean temperature from infrared images,https://terraref.ncsa.illinois.edu/bety/traits...,https://terraref.ncsa.illinois.edu/bety/traits...
1,2,0,traits,6002274259,,6000005645,,MAC Field Scanner Season 4 Range 17 Column 10,Maricopa,33.075123,...,,,,,2,PI569420,,Mean temperature from infrared images,https://terraref.ncsa.illinois.edu/bety/traits...,https://terraref.ncsa.illinois.edu/bety/traits...
2,3,0,traits,6002274265,,6000005871,,MAC Field Scanner Season 4 Range 40 Column 6,Maricopa,33.075949,...,,,,,2,PI329841,,Mean temperature from infrared images,https://terraref.ncsa.illinois.edu/bety/traits...,https://terraref.ncsa.illinois.edu/bety/traits...


#### III. Drop Columns

In [4]:
df_0.columns

Index(['Unnamed: 0', 'checked', 'result_type', 'id', 'citation_id', 'site_id',
       'treatment_id', 'sitename', 'city', 'lat', 'lon', 'scientificname',
       'commonname', 'genus', 'species_id', 'cultivar_id', 'author',
       'citation_year', 'treatment', 'date', 'time', 'raw_date', 'month',
       'year', 'dateloc', 'trait', 'trait_description', 'mean', 'units', 'n',
       'statname', 'stat', 'notes', 'access_level', 'cultivar', 'entity',
       'method_name', 'view_url', 'edit_url'],
      dtype='object')

In [5]:
cols_to_drop = ['Unnamed: 0', 'checked', 'result_type', 'id', 'citation_id', 'site_id', 'treatment_id', 'city', 
                'scientificname', 'commonname', 'genus', 'species_id', 'cultivar_id', 'author',
                'citation_year', 'time', 'raw_date', 'month', 'year', 'dateloc', 'n', 'statname', 'stat', 'notes', 
                'access_level', 'entity', 'view_url', 'edit_url']

In [6]:
df_1 = df_0.drop(labels=cols_to_drop, axis=1)
print(df_1.shape)
df_1.head(3)

(397879, 11)


Unnamed: 0,sitename,lat,lon,treatment,date,trait,trait_description,mean,units,cultivar,method_name
0,MAC Field Scanner Season 4 Range 5 Column 14,33.074691,-111.974835,,2017 Jul 9,surface_temperature,Surface temperature,38.090662,C,PI653617,Mean temperature from infrared images
1,MAC Field Scanner Season 4 Range 17 Column 10,33.075123,-111.9749,,2017 Jul 9,surface_temperature,Surface temperature,37.715112,C,PI569420,Mean temperature from infrared images
2,MAC Field Scanner Season 4 Range 40 Column 6,33.075949,-111.974966,,2017 Jul 9,surface_temperature,Surface temperature,37.458246,C,PI329841,Mean temperature from infrared images


In [7]:
# for col in df_1.columns:
#     print(f'{col}: {df_1[col].nunique()}')

#### IV. Change `date` format

In [8]:
new_dates = []

for d in df_1.date.values:
    
    # strip the trailing ' (America/Phoenix)' suffix (18 characters) from the date
    if 'Phoenix' in d:
        d = d[:-18]
    
    new_dates.append(d)
        

# check that length of new dates matches number of rows
print(len(new_dates))
print(df_1.shape[0])

397879
397879
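
The same cleanup can also be done without a Python loop. The sketch below is one possible vectorized version, not the method used above. It assumes the suffix always appears as a trailing ` (America/Phoenix)` and that the remaining text looks like `2017 Jul 9`; the two sample strings are invented for illustration.

```python
import pandas as pd

# Illustrative inputs: one plain date and one with the timezone suffix
raw_dates = pd.Series(['2017 Jul 9', '2017 May 3 (America/Phoenix)'])

# Remove an optional trailing ' (America/Phoenix)' using a regular expression
clean = raw_dates.str.replace(r'\s*\(America/Phoenix\)$', '', regex=True)

# Parse with an explicit format so every row is interpreted the same way
parsed = pd.to_datetime(clean, format='%Y %b %d')
print(parsed)
```

Anchoring the pattern at the end of the string avoids relying on a fixed character count.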


Convert string dates to datetime

In [9]:
iso_format_dates = pd.to_datetime(new_dates)

Add new column with datetime values

In [10]:
# copy df to avoid SettingWithCopyWarning
df_2 = df_1.copy()
df_2['date_1'] = iso_format_dates

print(df_2.shape)
df_2.head(3)

(397879, 12)


Unnamed: 0,sitename,lat,lon,treatment,date,trait,trait_description,mean,units,cultivar,method_name,date_1
0,MAC Field Scanner Season 4 Range 5 Column 14,33.074691,-111.974835,,2017 Jul 9,surface_temperature,Surface temperature,38.090662,C,PI653617,Mean temperature from infrared images,2017-07-09
1,MAC Field Scanner Season 4 Range 17 Column 10,33.075123,-111.9749,,2017 Jul 9,surface_temperature,Surface temperature,37.715112,C,PI569420,Mean temperature from infrared images,2017-07-09
2,MAC Field Scanner Season 4 Range 40 Column 6,33.075949,-111.974966,,2017 Jul 9,surface_temperature,Surface temperature,37.458246,C,PI329841,Mean temperature from infrared images,2017-07-09


#### V. Extract Range & Column Values for Location

In [11]:
df_3 = df_2.copy()

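# raw strings avoid invalid-escape warnings for \d; expand=False returns a Series
df_3['range'] = df_3['sitename'].str.extract(r"Range (\d+)", expand=False).astype(int)
df_3['column'] = df_3['sitename'].str.extract(r"Column (\d+)", expand=False).astype(int)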

df_3.sample(n=3)

Unnamed: 0,sitename,lat,lon,treatment,date,trait,trait_description,mean,units,cultivar,method_name,date_1,range,column
272374,MAC Field Scanner Season 4 Range 52 Column 7,33.076381,-111.97495,,2017 Aug 21,leaf_angle_chi,,1.952289,,PI329299,3D scanner to leaf angle distribution,2017-08-21,52,7
369996,MAC Field Scanner Season 4 Range 30 Column 13,33.07559,-111.974852,"BAP 2017, water-deficit stress Aug 1-14",2017 Aug 23,leaf_temperature_differential,Difference between the leaf temperature and th...,-7.950001,C,PI511355,MultispeQ v1.0 field measurements of fluoresce...,2017-08-23,30,13
166821,MAC Field Scanner Season 4 Range 28 Column 10,33.075518,-111.974901,,2017 May 3,canopy_height,"top of the general canopy of the plant, discou...",10.0,cm,PI273465,3D scanner to 98th quantile height,2017-05-03,28,10
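
For reference, both numbers can also be pulled out in a single pass with named capture groups. This is a small sketch on two sitenames written in the dataset's pattern; the variable names `sites` and `parts` are hypothetical.

```python
import pandas as pd

# Illustrative sitenames, one with a subplot suffix
sites = pd.Series([
    'MAC Field Scanner Season 4 Range 5 Column 14',
    'MAC Field Scanner Season 4 Range 11 Column 3 W',
])

# Named groups become the column names of the resulting dataframe
parts = sites.str.extract(r'Range (?P<range>\d+) Column (?P<column>\d+)').astype(int)
print(parts)
```

The trailing ` W` does not interfere, because the pattern only needs to match the `Range N Column M` portion.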


#### VI. Drop, Rename, & Reorder Columns

In [12]:
# drop string date column

df_4 = df_3.drop(labels=['date'], axis=1)

In [13]:
df_5 = df_4.rename({'date_1': 'date', 'mean': 'value'}, axis=1)

In [14]:
new_col_order = ['sitename', 'range', 'column', 'lat', 'lon', 'date', 'treatment', 'trait', 'trait_description', 'method_name', 'cultivar', 'value', 'units']

df_6 = pd.DataFrame(data=df_5, columns=new_col_order).reset_index(drop=True)
print(df_6.shape)
df_6.head(3)

(397879, 13)


Unnamed: 0,sitename,range,column,lat,lon,date,treatment,trait,trait_description,method_name,cultivar,value,units
0,MAC Field Scanner Season 4 Range 5 Column 14,5,14,33.074691,-111.974835,2017-07-09,,surface_temperature,Surface temperature,Mean temperature from infrared images,PI653617,38.090662,C
1,MAC Field Scanner Season 4 Range 17 Column 10,17,10,33.075123,-111.9749,2017-07-09,,surface_temperature,Surface temperature,Mean temperature from infrared images,PI569420,37.715112,C
2,MAC Field Scanner Season 4 Range 40 Column 6,40,6,33.075949,-111.974966,2017-07-09,,surface_temperature,Surface temperature,Mean temperature from infrared images,PI329841,37.458246,C


#### VII. Select for specific traits
- aboveground dry biomass
- days & GDD to flowering
- days & GDD to flag leaf emergence
- canopy height - time series

#### A. Aboveground Dry Biomass

In [None]:
adb_df = df_6.loc[df_6.trait == 'aboveground_dry_biomass']
print(adb_df.shape)
adb_df.tail(3)

##### Check for E and W subplots

In [None]:
# returns an empty dataframe (column headers only) if there are no subplots

check_for_subplots(adb_df)

#### Write dataframe to csv file with timestamp

In [None]:
timestamp = datetime.datetime.now().replace(microsecond=0).isoformat()
output_filename = f'aboveground_dry_biomass_season_4_{timestamp}.csv'.replace(':', '')

adb_df.to_csv(output_filename, index=False)
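
To show what the generated filename looks like, the sketch below applies the same steps to a fixed datetime instead of `datetime.datetime.now()`. The date and time used here are arbitrary and chosen only for illustration.

```python
import datetime

# Arbitrary fixed moment standing in for datetime.datetime.now()
fixed_now = datetime.datetime(2020, 2, 26, 14, 30, 5, 123456)

# Drop microseconds, format as ISO 8601, then remove the colons
timestamp = fixed_now.replace(microsecond=0).isoformat()
output_filename = f'aboveground_dry_biomass_season_4_{timestamp}.csv'.replace(':', '')

print(output_filename)  # aboveground_dry_biomass_season_4_2020-02-26T143005.csv
```

Removing the colons keeps the name valid on file systems that do not allow `:` in filenames.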

#### B. Days & Growing Degree Days (GDD) to Flowering

In [None]:
# df_5.trait.unique()

In [None]:
flower_df_0 = df_6.loc[df_6.trait == 'flowering_time']
print(flower_df_0.shape)
flower_df_0.head(3)

##### Check for E and W subplots

In [None]:
# returns an empty dataframe (column headers only) if there are no subplots

check_for_subplots(flower_df_0)

#### Read in Season Four Weather Data from MAC Weather Station

In [None]:
weather_df_0 = pd.read_csv('mac_weather_station_raw_daily_2017.csv')
print(weather_df_0.shape)
weather_df_0.head(3)

#### Slice dataframe for season dates only and add date column
* Planting Date: 2017-04-20, Day 110
* Last Day of Harvest: 2017-09-16, Day 259

In [None]:
weather_df_1 = weather_df_0.loc[(weather_df_0.day_of_year >= 110) & (weather_df_0.day_of_year <= 259)]
print(weather_df_1.shape)
weather_df_1.head(3)

In [None]:
season_4_date_range = pd.date_range(start='2017-04-20', end='2017-09-16')

In [None]:
weather_df_2 = weather_df_1.copy()
weather_df_2['date'] = season_4_date_range
print(weather_df_2.shape)
weather_df_2.tail(3)
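
Assigning a `pd.date_range` works when the sliced dataframe has exactly one row per day of the season (150 rows for days 110 through 259). As a hedged alternative, the date can be derived from `day_of_year` itself, which does not depend on the row count. The sketch below uses a made-up five-row dataframe containing only a `day_of_year` column.

```python
import pandas as pd

# Illustrative weather rows: two fall outside the season
weather = pd.DataFrame({'day_of_year': [109, 110, 111, 259, 260]})

# Keep planting day (110) through last day of harvest (259), inclusive
season = weather.loc[weather['day_of_year'].between(110, 259)].copy()

# Day 1 is 2017-01-01, so day N is N - 1 days after it
season['date'] = pd.Timestamp('2017-01-01') + pd.to_timedelta(season['day_of_year'] - 1, unit='D')
print(season)
```

With this approach a missing day in the weather file leaves a gap in the dates rather than raising a length mismatch or shifting later dates.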

#### Add Growing Degree Days
- Future: add LaTeX equation
- Future: add info about min and max daily values
- 10 degrees Celsius is base temp for sorghum
- Daily gdd value = ((max temp + min temp) / 2) - 10 (base temp)
- Growing Degree Days = cumulative sum of daily gdd values
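
Written out as equations, the two definitions above could look like the following. The symbols are chosen here for illustration: $d$ is a day of the season, $d_0$ the planting day, $T_{\max}$ and $T_{\min}$ the daily maximum and minimum air temperatures in degrees Celsius, and $T_{\mathrm{base}}$ the base temperature.

```latex
\begin{align}
  \mathrm{GDD}_{\mathrm{daily}}(d) &= \frac{T_{\max}(d) + T_{\min}(d)}{2} - T_{\mathrm{base}},
    \qquad T_{\mathrm{base}} = 10\,^{\circ}\mathrm{C} \\
  \mathrm{GDD}(D) &= \sum_{d = d_0}^{D} \mathrm{GDD}_{\mathrm{daily}}(d)
\end{align}
```

In the code cells, the cumulative sum is additionally rounded to the nearest integer with `np.rint`.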

In [None]:
weather_df_3 = weather_df_2.copy()
weather_df_3['daily_gdd'] = (((weather_df_3['air_temp_max'] + weather_df_3['air_temp_min'])) / 2) - 10
print(weather_df_3.shape)
weather_df_3.head(3)

In [None]:
weather_df_4 = weather_df_3.copy()
weather_df_4['gdd'] = np.rint(np.cumsum(weather_df_4['daily_gdd']))
print(weather_df_4.shape)
weather_df_4.tail(3)
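
A small worked example may make the arithmetic easier to follow. The temperatures below are invented; only the column names `air_temp_max` and `air_temp_min` and the base temperature of 10 °C come from the notebook.

```python
import numpy as np
import pandas as pd

BASE_TEMP_C = 10  # base temperature for sorghum, as stated above

# Illustrative daily max/min air temperatures in degrees Celsius
weather = pd.DataFrame({
    'air_temp_max': [30.0, 32.0, 28.0],
    'air_temp_min': [14.0, 16.0, 12.0],
})

# Daily value: mean of max and min, minus the base temperature
weather['daily_gdd'] = (weather['air_temp_max'] + weather['air_temp_min']) / 2 - BASE_TEMP_C

# Growing degree days: running total, rounded to the nearest integer
weather['gdd'] = np.rint(weather['daily_gdd'].cumsum())
print(weather)
```

The daily values are 12, 14, and 10, so the cumulative column reads 12, 26, and 36. Note that this formula, like the notebook's, does not floor negative daily values at zero, as some GDD conventions do.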

#### Add planting date 2017-04-20

In [None]:
day_of_planting = datetime.date(2017,4,20)
flower_df_1 = flower_df_0.copy()

flower_df_1['date_of_planting'] = day_of_planting
print(flower_df_1.shape)
flower_df_1.head(3)

#### Create timedelta using days to flowering

In [None]:
timedelta_values = flower_df_1['value'].values
dates_of_flowering = []

for val in timedelta_values:
    
    date_of_flowering = day_of_planting + datetime.timedelta(days=val)
    dates_of_flowering.append(date_of_flowering)
    
print(flower_df_1.shape[0])
print(len(dates_of_flowering))

In [None]:
flower_df_2 = flower_df_1.copy()
flower_df_2['date_of_flowering'] = dates_of_flowering
print(flower_df_2.shape)
flower_df_2.head(3)
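
The loop above can also be expressed as a single vectorized operation, which returns datetime values directly. This is a sketch under the assumption that `value` holds days after planting; the two day counts are made up.

```python
import pandas as pd

planting = pd.Timestamp('2017-04-20')

# Illustrative days-to-flowering values
days_to_flowering = pd.Series([60.0, 75.0])

# Convert day counts to timedeltas and add them to the planting date
dates_of_flowering = planting + pd.to_timedelta(days_to_flowering, unit='D')
print(dates_of_flowering)
```

Because the result is already `datetime64`, a later `pd.to_datetime` conversion before merging would not be needed with this variant.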

#### Add GDD to flowering dataframe

In [None]:
# slice df for date and cumulative gdd values only

season_4_gdd = weather_df_4[['date', 'gdd']]
print(season_4_gdd.shape)
season_4_gdd.head(3)

In [None]:
flower_df_2.dtypes

In [None]:
flower_df_3 = flower_df_2.copy()
flower_df_3.date_of_flowering = pd.to_datetime(flower_df_3.date_of_flowering)
flower_df_3.dtypes

In [None]:
flower_df_4 = flower_df_3.merge(season_4_gdd, how='left', left_on='date_of_flowering', right_on='date')
print(flower_df_4.shape)
flower_df_4.head(3)
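
A left merge keeps every flowering row and fills `gdd` with `NaN` where no weather date matches. The sketch below shows that behavior on invented data and uses two optional `merge` parameters, `validate` and `indicator`, to check the join; the cultivar labels and GDD values are hypothetical.

```python
import pandas as pd

# Illustrative flowering dates; the last one falls outside the season
flowering = pd.DataFrame({
    'cultivar': ['A', 'B', 'C'],
    'date_of_flowering': pd.to_datetime(['2017-06-19', '2017-07-04', '2017-10-01']),
})

# Illustrative cumulative GDD lookup, one row per date
gdd_lookup = pd.DataFrame({
    'date': pd.to_datetime(['2017-06-19', '2017-07-04']),
    'gdd': [1000.0, 1300.0],
})

merged = flowering.merge(
    gdd_lookup,
    how='left',
    left_on='date_of_flowering',
    right_on='date',
    validate='many_to_one',  # raises if a date appears twice in gdd_lookup
    indicator=True,          # adds a _merge column describing each row's match
)

# Rows with no matching weather date
unmatched = merged.loc[merged['_merge'] == 'left_only']
print(unmatched)
```

Inspecting `unmatched` is one way to spot flowering dates that fall outside the weather date range.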

#### Drop all date columns except `date_of_flowering`

In [None]:
date_cols_to_drop = ['date_x', 'date_of_planting', 'date_y']
flower_df_5 = flower_df_4.drop(labels=date_cols_to_drop, axis=1)
print(flower_df_5.shape)
flower_df_5.tail(3)

#### Check for duplicates

In [None]:
flower_df_5.duplicated().value_counts()

#### Write dataframe to csv file with timestamp

In [None]:
timestamp = datetime.datetime.now().replace(microsecond=0).isoformat()
output_filename = f'days_gdd_to_flowering_season_4_{timestamp}.csv'.replace(':', '')

flower_df_5.to_csv(output_filename, index=False)

#### C. Days & GDD to Flag Leaf Emergence

In [None]:
fle_0 = df_6.loc[df_6.trait == 'flag_leaf_emergence_time']
print(fle_0.shape)
fle_0.head(3)

##### Check for E and W subplots

In [None]:
# returns an empty dataframe (column headers only) if there are no subplots

check_for_subplots(fle_0)

#### Read in Season Four Weather Data from MAC Weather Station

In [None]:
# weather_df_0 was already read in section B; uncomment the next line to reload it
# weather_df_0 = pd.read_csv('mac_weather_station_raw_daily_2017.csv')
print(weather_df_0.shape)
weather_df_0.head(3)

#### Slice dataframe for season dates only and add date column
* Planting Date: 2017-04-20, Day 110
* Last Day of Harvest: 2017-09-16, Day 259

In [None]:
weather_df_1 = weather_df_0.loc[(weather_df_0.day_of_year >= 110) & (weather_df_0.day_of_year <= 259)]
print(weather_df_1.shape)
weather_df_1.head(3)

In [None]:
season_4_date_range = pd.date_range(start='2017-04-20', end='2017-09-16')

In [None]:
weather_df_2 = weather_df_1.copy()
weather_df_2['date'] = season_4_date_range
print(weather_df_2.shape)
weather_df_2.tail(3)

#### Add Growing Degree Days
- Future: add LaTeX equation
- Future: add info about min and max daily values
- 10 degrees Celsius is base temp for sorghum
- Daily gdd value = ((max temp + min temp) / 2) - 10 (base temp)
- Growing Degree Days = cumulative sum of daily gdd values

In [None]:
weather_df_3 = weather_df_2.copy()
weather_df_3['daily_gdd'] = (((weather_df_3['air_temp_max'] + weather_df_3['air_temp_min'])) / 2) - 10
print(weather_df_3.shape)
weather_df_3.head(3)

In [None]:
weather_df_4 = weather_df_3.copy()

# round to the nearest integer
weather_df_4['gdd'] = np.rint(np.cumsum(weather_df_4['daily_gdd']))
print(weather_df_4.shape)
weather_df_4.tail(3)

#### Add planting date 2017-04-20

In [None]:
day_of_planting = datetime.date(2017,4,20)
fle_1 = fle_0.copy()

fle_1['date_of_planting'] = day_of_planting
print(fle_1.shape)
fle_1.head(3)

#### Create timedelta using days to flag leaf emergence

In [None]:
timedelta_values = fle_1['value'].values
dates_of_flag_leaf_emergence = []

for val in timedelta_values:
    
    date_of_flag_leaf_emergence = day_of_planting + datetime.timedelta(days=val)
    dates_of_flag_leaf_emergence.append(date_of_flag_leaf_emergence)
    
print(fle_1.shape[0])
print(len(dates_of_flag_leaf_emergence))

In [None]:
fle_2 = fle_1.copy()
fle_2['date_of_flag_leaf_emergence'] = dates_of_flag_leaf_emergence
print(fle_2.shape)
fle_2.head(3)

#### Add GDD to flag leaf emergence

In [None]:
# slice df for date and cumulative gdd values only

season_4_gdd = weather_df_4[['date', 'gdd']]
print(season_4_gdd.shape)
season_4_gdd.head(3)

In [None]:
fle_2.dtypes

In [None]:
fle_3 = fle_2.copy()
fle_3.date_of_flag_leaf_emergence = pd.to_datetime(fle_3.date_of_flag_leaf_emergence)
fle_3.dtypes

In [None]:
fle_4 = fle_3.merge(season_4_gdd, how='left', left_on='date_of_flag_leaf_emergence', right_on='date')
print(fle_4.shape)
fle_4.head(3)

#### Drop all date columns except `date_of_flag_leaf_emergence`

In [None]:
date_cols_to_drop = ['date_x', 'date_of_planting', 'date_y']
fle_5 = fle_4.drop(labels=date_cols_to_drop, axis=1)
print(fle_5.shape)
fle_5.tail(3)

#### Check for duplicates

In [None]:
fle_5.duplicated().value_counts()

In [None]:
# keep duplicates for now?

#### Write dataframe to csv file with timestamp

In [None]:
timestamp = datetime.datetime.now().replace(microsecond=0).isoformat()
output_filename = f'days_gdd_to_flag_leaf_emergence_season_4_{timestamp}.csv'.replace(':', '')

fle_5.to_csv(output_filename, index=False)

#### D. Canopy Height - Time Series

In [15]:
ch_0 = df_6.loc[df_6.trait == 'canopy_height']
print(ch_0.shape)
ch_0.head(3)

(52154, 13)


Unnamed: 0,sitename,range,column,lat,lon,date,treatment,trait,trait_description,method_name,cultivar,value,units
21410,MAC Field Scanner Season 4 Range 5 Column 7 E,5,7,33.074691,-111.974945,2017-07-11,"BAP 2017, water-deficit stress Aug 1-14",canopy_height,"top of the general canopy of the plant, discou...",Manual canopy height,PI329300,310.0,cm
22171,MAC Field Scanner Season 4 Range 9 Column 6 E,9,6,33.074835,-111.974962,2017-05-29,"BAP 2017, water-deficit stress Aug 1-14",canopy_height,"top of the general canopy of the plant, discou...",Manual canopy height,PI329351,44.0,cm
22172,MAC Field Scanner Season 4 Range 11 Column 3 W,11,3,33.074907,-111.975019,2017-05-29,"BAP 2017, water-deficit stress Aug 1-14",canopy_height,"top of the general canopy of the plant, discou...",Manual canopy height,PI655978,58.0,cm


In [18]:
subplots = check_for_subplots(ch_0)
subplots.shape

(4430, 13)

#### Take average canopy height values for subplots on same day
- Strip ` E` and ` W` subplot designations
- Group by rows with the same sitename and date and take the average value

In [19]:
sitename_values = ch_0.sitename.values
no_e_w_names = []

for name in sitename_values:
    
    if name.endswith((' W', ' E')):
        name = name[:-2]
        no_e_w_names.append(name)
        
    else:
        no_e_w_names.append(name)
        
print(len(no_e_w_names))

52154


#### Add new sitename column with no subplots

In [20]:
ch_1 = ch_0.copy()
ch_1['sitename_1'] = no_e_w_names
print(ch_1.shape)
ch_1.head(3)

(52154, 14)


Unnamed: 0,sitename,range,column,lat,lon,date,treatment,trait,trait_description,method_name,cultivar,value,units,sitename_1
21410,MAC Field Scanner Season 4 Range 5 Column 7 E,5,7,33.074691,-111.974945,2017-07-11,"BAP 2017, water-deficit stress Aug 1-14",canopy_height,"top of the general canopy of the plant, discou...",Manual canopy height,PI329300,310.0,cm,MAC Field Scanner Season 4 Range 5 Column 7
22171,MAC Field Scanner Season 4 Range 9 Column 6 E,9,6,33.074835,-111.974962,2017-05-29,"BAP 2017, water-deficit stress Aug 1-14",canopy_height,"top of the general canopy of the plant, discou...",Manual canopy height,PI329351,44.0,cm,MAC Field Scanner Season 4 Range 9 Column 6
22172,MAC Field Scanner Season 4 Range 11 Column 3 W,11,3,33.074907,-111.975019,2017-05-29,"BAP 2017, water-deficit stress Aug 1-14",canopy_height,"top of the general canopy of the plant, discou...",Manual canopy height,PI655978,58.0,cm,MAC Field Scanner Season 4 Range 11 Column 3


#### Use sqlite database to group by `sitename_1` and `date`

In [21]:
conn = sqlite3.connect('canopy_heights_season_4.sqlite')
cursor = conn.cursor()
print("Opened database successfully")

Opened database successfully


In [23]:
# if_exists='replace' overwrites the table, so this cell can be rerun safely
ch_1.to_sql('canopy_heights_season_4.sqlite', conn, if_exists='replace')

In [24]:
ch_2 = pd.read_sql_query("""
                            SELECT sitename_1 AS sitename, range, column, lat, lon, date, treatment, 
                            trait, trait_description, method_name, cultivar, 
                            ROUND(AVG(value), 2) AS avg_canopy_height, units 
                            FROM 'canopy_heights_season_4.sqlite'
                            GROUP BY sitename_1, date, cultivar
                            ORDER BY date ASC;
                            """, conn)

print(ch_2.shape)
ch_2.head(3)

(34632, 13)


Unnamed: 0,sitename,range,column,lat,lon,date,treatment,trait,trait_description,method_name,cultivar,avg_canopy_height,units
0,MAC Field Scanner Season 4 Range 10 Column 10,10,10,33.074871,-111.9749,2017-05-01 00:00:00,,canopy_height,"top of the general canopy of the plant, discou...",3D scanner to 98th quantile height,PI152816,12.0,cm
1,MAC Field Scanner Season 4 Range 10 Column 11,10,11,33.074871,-111.974884,2017-05-01 00:00:00,,canopy_height,"top of the general canopy of the plant, discou...",3D scanner to 98th quantile height,PI195754,12.0,cm
2,MAC Field Scanner Season 4 Range 10 Column 12,10,12,33.074871,-111.974868,2017-05-01 00:00:00,,canopy_height,"top of the general canopy of the plant, discou...",3D scanner to 98th quantile height,PI329501,12.0,cm


In [25]:
# Sanity Check

sample_with_subplot = ch_1.loc[(ch_1.range == 5) & (ch_1.column == 7) & (ch_1.date == '2017-07-11')]
sample_with_subplot

Unnamed: 0,sitename,range,column,lat,lon,date,treatment,trait,trait_description,method_name,cultivar,value,units,sitename_1
21410,MAC Field Scanner Season 4 Range 5 Column 7 E,5,7,33.074691,-111.974945,2017-07-11,"BAP 2017, water-deficit stress Aug 1-14",canopy_height,"top of the general canopy of the plant, discou...",Manual canopy height,PI329300,310.0,cm,MAC Field Scanner Season 4 Range 5 Column 7
25752,MAC Field Scanner Season 4 Range 5 Column 7 W,5,7,33.074691,-111.974953,2017-07-11,"BAP 2017, water-deficit stress Aug 1-14",canopy_height,"top of the general canopy of the plant, discou...",Manual canopy height,PI329300,318.0,cm,MAC Field Scanner Season 4 Range 5 Column 7
45448,MAC Field Scanner Season 4 Range 5 Column 7 E,5,7,33.074691,-111.974945,2017-07-11,"BAP 2017, water-deficit stress Aug 1-14",canopy_height,"top of the general canopy of the plant, discou...",Manual canopy height,PI329300,310.0,cm,MAC Field Scanner Season 4 Range 5 Column 7


In [26]:
# Sanity Check - should have only one row for the above group

sample_without_subplot = ch_2.loc[(ch_2.range == 5) & (ch_2.column == 7) & (ch_2.date == '2017-07-11 00:00:00')]
sample_without_subplot

Unnamed: 0,sitename,range,column,lat,lon,date,treatment,trait,trait_description,method_name,cultivar,avg_canopy_height,units
26726,MAC Field Scanner Season 4 Range 5 Column 7,5,7,33.074691,-111.974945,2017-07-11 00:00:00,"BAP 2017, water-deficit stress Aug 1-14",canopy_height,"top of the general canopy of the plant, discou...",Manual canopy height,PI329300,312.67,cm
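
The same grouping can be sketched in pandas alone, without the sqlite step. The example below reuses the three rows from the sanity check above (values 310, 318, 310) and is an alternative illustration, not the notebook's method. It also shows how the repeated E row affects the mean; whether that row is a true duplicate record is an open question that this sketch does not settle.

```python
import pandas as pd

# The three sanity-check rows for Range 5 Column 7 on 2017-07-11
heights = pd.DataFrame({
    'sitename': [
        'MAC Field Scanner Season 4 Range 5 Column 7 E',
        'MAC Field Scanner Season 4 Range 5 Column 7 W',
        'MAC Field Scanner Season 4 Range 5 Column 7 E',
    ],
    'date': pd.to_datetime(['2017-07-11'] * 3),
    'cultivar': ['PI329300'] * 3,
    'value': [310.0, 318.0, 310.0],
})

# Strip a trailing ' E' or ' W' to get the plot-level sitename
heights['sitename_1'] = heights['sitename'].str.replace(r' [EW]$', '', regex=True)

keys = ['sitename_1', 'date', 'cultivar']

# Mean over all rows, matching the SQL query above
avg_all = heights.groupby(keys, as_index=False).agg(avg_canopy_height=('value', 'mean'))
avg_all['avg_canopy_height'] = avg_all['avg_canopy_height'].round(2)

# Mean after dropping fully identical rows first
deduped = heights.drop_duplicates()
avg_dedup = deduped.groupby(keys, as_index=False).agg(avg_canopy_height=('value', 'mean'))

print(avg_all['avg_canopy_height'].iloc[0])    # 312.67
print(avg_dedup['avg_canopy_height'].iloc[0])  # 314.0
```

The first figure matches the 312.67 returned by the query; dropping the repeated row first gives 314.0, the plain average of the E and W subplots.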


#### Write dataframe to csv file with timestamp

In [None]:
timestamp = datetime.datetime.now().replace(microsecond=0).isoformat()
output_filename = f'canopy_height_time_series_season_4_{timestamp}.csv'.replace(':', '')

ch_2.to_csv(output_filename, index=False)