### Kansas Data Cleaning Notebook
#### Data from Kansas State University Sorghum Experiments (Ashland Bottoms)

- Please contact Emily Cain at ejcain@email.arizona.edu with any questions, or create an [issue](https://github.com/MagicMilly/for-data-publication/issues) in this [repository](https://github.com/MagicMilly/for-data-publication).

In [4]:
import datetime
import numpy as np
import pandas as pd

#### Trait data queried and downloaded from BETYdb in `R` using this code:
```
library(traits)

options(betydb_url = "https://terraref.ncsa.illinois.edu/bety/",
        betydb_api_version = 'v1',
        betydb_key = 'secret_api_key_123456_abcde')
        
kansas <- betydb_query(sitename  = "~Bottoms",
                         limit     =  "none")
                      
write.csv(kansas, file = "kansas_experiments_2020-06-11.csv")
```

#### Weather data downloaded from [KSU Ashland Bottoms Weather Station](https://mesonet.k-state.edu/weather/historical/)

In [5]:
%cd '/Users/ejcain/UA-AG/for-data-publication/'

/Users/ejcain/UA-AG/for-data-publication


### I. Functions

In [6]:
def save_to_csv_without_timestamp(list_of_dfs, list_of_output_filenames):

    for i,j in zip(list_of_dfs, list_of_output_filenames):
        i.to_csv(j, index=False)
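
A short usage sketch for the helper above. The toy DataFrames, file names, and temporary directory are hypothetical and only illustrate how the two lists are paired up.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_to_csv_without_timestamp(list_of_dfs, list_of_output_filenames):
    # Same behaviour as the helper above: one CSV per DataFrame, no index column.
    for df, filename in zip(list_of_dfs, list_of_output_filenames):
        df.to_csv(filename, index=False)


# Hypothetical toy frames, for illustration only.
heights = pd.DataFrame({"cultivar": ["A", "B"], "canopy_height_cm": [113.0, 64.0]})
flowering = pd.DataFrame({"cultivar": ["A", "B"], "days_to_flowering": [74.0, 60.0]})

# Write into a throwaway directory so the example leaves the project untouched.
out_dir = Path(tempfile.mkdtemp())
paths = [out_dir / "heights.csv", out_dir / "flowering.csv"]
save_to_csv_without_timestamp([heights, flowering], paths)

# Reading a file back confirms that no index column was written.
round_trip = pd.read_csv(paths[0])
```

Note that `zip` stops at the shorter of the two lists, so a missing filename would silently skip a DataFrame; checking that both lists have the same length before calling the helper may be worthwhile.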

In [7]:
df_0 = pd.read_csv('data/raw/ksu_data_2020-06-11.csv', low_memory=False)
print(df_0.shape)
df_0.head(3)

(29079, 39)


Unnamed: 0.1,Unnamed: 0,checked,result_type,id,citation_id,site_id,treatment_id,sitename,city,lat,...,n,statname,stat,notes,access_level,cultivar,entity,method_name,view_url,edit_url
0,1,0,traits,6001901130,6000000010,6000004294,6000000022,Ashland Bottoms KSU 2016 Season Range 18 Pass 24,Ashland,39.139492,...,,,,,2,PI570053,,ksu_plant_height,https://terraref.ncsa.illinois.edu/bety/traits...,https://terraref.ncsa.illinois.edu/bety/traits...
1,2,0,traits,6001901128,6000000010,6000009167,6000000022,Ashland Bottoms KSU 2016 Season Range 32 Pass 12,Ashland,39.139995,...,,,,,2,PI570042,,ksu_plant_height,https://terraref.ncsa.illinois.edu/bety/traits...,https://terraref.ncsa.illinois.edu/bety/traits...
2,3,0,traits,6001901125,6000000010,6000004232,6000000022,Ashland Bottoms KSU 2016 Season Range 16 Pass 16,Ashland,39.139423,...,,,,,2,PI570038,,ksu_plant_height,https://terraref.ncsa.illinois.edu/bety/traits...,https://terraref.ncsa.illinois.edu/bety/traits...


### II. Slice for selected traits
- canopy height
- days & GDD to flowering

In [8]:
# traits available in this dataset

df_0.trait.unique()

array(['canopy_height', 'seedling_vigor', 'Sugar_content',
       'crude_protein', 'lodging_percent', 'ndf',
       'aboveground_biomass_moisture', 'crown_color',
       'aboveground_fresh_biomass_per_plot', 'emergence_score',
       'leaf_length', 'leaf_width', 'leaf_attachment_angle', 'adf',
       'stem_width', 'flowering_time'], dtype=object)

In [9]:
df_1 = df_0.loc[(df_0.trait == 'flowering_time') | (df_0.trait == 'canopy_height')]
print(df_1.shape)
# df_1.head(3)

(4700, 39)


Unnamed: 0.1,Unnamed: 0,checked,result_type,id,citation_id,site_id,treatment_id,sitename,city,lat,...,n,statname,stat,notes,access_level,cultivar,entity,method_name,view_url,edit_url
0,1,0,traits,6001901130,6000000010,6000004294,6000000022,Ashland Bottoms KSU 2016 Season Range 18 Pass 24,Ashland,39.139492,...,,,,,2,PI570053,,ksu_plant_height,https://terraref.ncsa.illinois.edu/bety/traits...,https://terraref.ncsa.illinois.edu/bety/traits...
1,2,0,traits,6001901128,6000000010,6000009167,6000000022,Ashland Bottoms KSU 2016 Season Range 32 Pass 12,Ashland,39.139995,...,,,,,2,PI570042,,ksu_plant_height,https://terraref.ncsa.illinois.edu/bety/traits...,https://terraref.ncsa.illinois.edu/bety/traits...
2,3,0,traits,6001901125,6000000010,6000004232,6000000022,Ashland Bottoms KSU 2016 Season Range 16 Pass 16,Ashland,39.139423,...,,,,,2,PI570038,,ksu_plant_height,https://terraref.ncsa.illinois.edu/bety/traits...,https://terraref.ncsa.illinois.edu/bety/traits...
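
The trait filter above chains two equality tests with `|`. As a sketch of an equivalent form, `Series.isin` expresses the same selection and scales more easily if further traits are added later. The toy DataFrame here is hypothetical.

```python
import pandas as pd

# Hypothetical toy data with the same `trait` column as the real dataset.
toy = pd.DataFrame({
    "trait": ["canopy_height", "stem_width", "flowering_time", "leaf_length"],
    "mean": [291.0, 2.1, 74.0, 55.0],
})

selected_traits = ["flowering_time", "canopy_height"]

# Keep only the rows whose trait is in the selected list.
subset = toy.loc[toy["trait"].isin(selected_traits)]

# The chained comparison used in the notebook selects the same rows.
same_rows = toy.loc[(toy.trait == "flowering_time") | (toy.trait == "canopy_height")]
```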


#### Drop Columns
- the original `date` column is dropped here along with the other unused columns
- `raw_date` is kept; the next step converts it to a datetime object and stores the result in a new `date` column

In [11]:
cols_to_drop = ['Unnamed: 0', 'checked', 'result_type', 'id', 'citation_id', 'site_id', 'treatment_id', 
                'scientificname', 'commonname', 'genus', 'species_id', 'cultivar_id', 'author', 
                'citation_year', 'time', 'month', 'year', 'n', 'statname', 'stat', 'notes', 'access_level', 
                'entity', 'view_url', 'edit_url', 'date', 'dateloc', 'city', 'treatment']

df_2 = df_1.drop(labels=cols_to_drop, axis=1)
print(df_2.shape)
# df_2.tail()

(4700, 10)


Unnamed: 0,sitename,lat,lon,raw_date,trait,trait_description,mean,units,cultivar,method_name
29074,Ashland Bottoms KSU 2016 Season Range 12 Pass 20,39.13928,-96.631346,2016-07-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",113.0,cm,PI641807,ksu_plant_height
29075,Ashland Bottoms KSU 2016 Season Range 31 Pass 7,39.139961,-96.631814,2016-07-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",64.0,cm,PI620072,ksu_plant_height
29076,Ashland Bottoms KSU 2016 Season Range 11 Pass 15,39.139245,-96.631522,2016-07-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",89.7,cm,PI641817,ksu_plant_height
29077,Ashland Bottoms KSU 2016 Season Range 15 Pass 19,39.139387,-96.631383,2016-07-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",78.3,cm,PI641815,ksu_plant_height
29078,Ashland Bottoms KSU 2016 Season Range 10 Pass 11,39.13921,-96.631662,2016-07-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",71.3,cm,PI641824,ksu_plant_height


#### Convert `raw_date` to datetime object

In [12]:
new_dates = pd.to_datetime(df_2.raw_date)

df_3 = df_2.copy()
df_3['date'] = new_dates

print(df_2.shape[0])
print(df_3.shape[0])

# df_3.head()

4700
4700


Unnamed: 0,sitename,lat,lon,raw_date,trait,trait_description,mean,units,cultivar,method_name,date
0,Ashland Bottoms KSU 2016 Season Range 18 Pass 24,39.139492,-96.631207,2016-10-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",291.0,cm,PI570053,ksu_plant_height,2016-10-21 00:00:00-05:00
1,Ashland Bottoms KSU 2016 Season Range 32 Pass 12,39.139995,-96.631637,2016-10-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",286.0,cm,PI570042,ksu_plant_height,2016-10-21 00:00:00-05:00
2,Ashland Bottoms KSU 2016 Season Range 16 Pass 16,39.139423,-96.631489,2016-10-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",313.0,cm,PI570038,ksu_plant_height,2016-10-21 00:00:00-05:00
3,Ashland Bottoms KSU 2016 Season Range 22 Pass 15,39.139638,-96.631527,2016-10-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",297.0,cm,PI570074,ksu_plant_height,2016-10-21 00:00:00-05:00
4,Ashland Bottoms KSU 2016 Season Range 23 Pass 9,39.139676,-96.631739,2016-10-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",355.0,cm,PI570075,ksu_plant_height,2016-10-21 00:00:00-05:00
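
Because the `raw_date` strings carry a `-0500` offset, the parsed `date` column is timezone-aware. A minimal sketch, using two strings in the same format, of how the offset could be dropped if a plain calendar date is needed later (for example, to compare against the timezone-naive weather dates). Whether the offset should be dropped is a judgment call that the notebook itself does not make.

```python
import pandas as pd

# Two strings in the same format as the `raw_date` column.
raw = pd.Series(["2016-10-21 00:00:00 -0500", "2016-07-21 00:00:00 -0500"])

# Parsing keeps the fixed UTC offset, so the result is timezone-aware.
parsed = pd.to_datetime(raw)

# tz_localize(None) drops the offset but keeps the local wall-clock time.
naive = parsed.dt.tz_localize(None)

# normalize() sets the time of day to midnight, leaving just the calendar day.
calendar_day = naive.dt.normalize()
```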


#### Extract `Range` and `Pass` values

In [13]:
df_4 = df_3.copy()

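# raw strings keep `\d` from being read as a string escape; expand=False returns a Series
df_4['range'] = df_4['sitename'].str.extract(r"Range (\d+)", expand=False).astype(int)
df_4['pass'] = df_4['sitename'].str.extract(r"Pass (\d+)", expand=False).astype(int)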

# df_4.sample(n=3)

Unnamed: 0,sitename,lat,lon,raw_date,trait,trait_description,mean,units,cultivar,method_name,date,range,pass
9198,Ashland Bottoms KSU 2016 Season Range 12 Pass 26,39.139278,-96.631134,2016-08-30 00:00:00 -0500,flowering_time,Number of days from sowing to the date when 50...,74.0,days,PI152862,KSU flowering time,2016-08-30 00:00:00-05:00,12,26
25418,Ashland Bottoms KSU 2016 Season Range 5 Pass 2,39.139034,-96.631978,2016-08-16 00:00:00 -0500,flowering_time,Number of days from sowing to the date when 50...,60.0,days,PI535783,KSU flowering time,2016-08-16 00:00:00-05:00,5,2
28465,Ashland Bottoms KSU 2016 Season Range 28 Pass 19,39.13985,-96.631388,2016-07-15 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",62.3,cm,PI329256,ksu_plant_height,2016-07-15 00:00:00-05:00,28,19
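
As an alternative sketch, both numbers can be pulled out in one pass with named capture groups, which `str.extract` turns into column names. This assumes every `sitename` follows the `... Range <n> Pass <n>` pattern seen in the rows above; a non-matching name would produce `NaN` and make `astype(int)` fail.

```python
import pandas as pd

# Two site names copied from the output above.
sites = pd.DataFrame({"sitename": [
    "Ashland Bottoms KSU 2016 Season Range 18 Pass 24",
    "Ashland Bottoms KSU 2016 Season Range 5 Pass 2",
]})

# Named groups become the column names of the extracted DataFrame.
pattern = r"Range (?P<range>\d+) Pass (?P<pass>\d+)"
parts = sites["sitename"].str.extract(pattern).astype(int)

# Attach the new integer columns to the original rows.
sites = sites.join(parts)
```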


#### A. Canopy Height

In [14]:
df_4.trait.value_counts()

canopy_height     4044
flowering_time     656
Name: trait, dtype: int64

In [15]:
ch_0 = df_4.loc[df_4.trait == 'canopy_height']
print(ch_0.shape)
# ch_0.head(3)

(4044, 13)


Unnamed: 0,sitename,lat,lon,raw_date,trait,trait_description,mean,units,cultivar,method_name,date,range,pass
0,Ashland Bottoms KSU 2016 Season Range 18 Pass 24,39.139492,-96.631207,2016-10-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",291.0,cm,PI570053,ksu_plant_height,2016-10-21 00:00:00-05:00,18,24
1,Ashland Bottoms KSU 2016 Season Range 32 Pass 12,39.139995,-96.631637,2016-10-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",286.0,cm,PI570042,ksu_plant_height,2016-10-21 00:00:00-05:00,32,12
2,Ashland Bottoms KSU 2016 Season Range 16 Pass 16,39.139423,-96.631489,2016-10-21 00:00:00 -0500,canopy_height,"top of the general canopy of the plant, discou...",313.0,cm,PI570038,ksu_plant_height,2016-10-21 00:00:00-05:00,16,16


#### Drop Columns
- trait
- units
- raw date
- method name

In [19]:
cols_to_drop = ['raw_date', 'trait', 'units', 'method_name']

ch_1 = ch_0.drop(labels=cols_to_drop, axis=1)
print(ch_1.shape)
# ch_1.head()

(4044, 9)


Unnamed: 0,sitename,lat,lon,trait_description,mean,cultivar,date,range,pass
0,Ashland Bottoms KSU 2016 Season Range 18 Pass 24,39.139492,-96.631207,"top of the general canopy of the plant, discou...",291.0,PI570053,2016-10-21 00:00:00-05:00,18,24
1,Ashland Bottoms KSU 2016 Season Range 32 Pass 12,39.139995,-96.631637,"top of the general canopy of the plant, discou...",286.0,PI570042,2016-10-21 00:00:00-05:00,32,12
2,Ashland Bottoms KSU 2016 Season Range 16 Pass 16,39.139423,-96.631489,"top of the general canopy of the plant, discou...",313.0,PI570038,2016-10-21 00:00:00-05:00,16,16
3,Ashland Bottoms KSU 2016 Season Range 22 Pass 15,39.139638,-96.631527,"top of the general canopy of the plant, discou...",297.0,PI570074,2016-10-21 00:00:00-05:00,22,15
4,Ashland Bottoms KSU 2016 Season Range 23 Pass 9,39.139676,-96.631739,"top of the general canopy of the plant, discou...",355.0,PI570075,2016-10-21 00:00:00-05:00,23,9


#### Rename Columns

In [20]:
ch_2 = ch_1.rename({'mean': 'canopy_height_cm'}, axis=1)
print(ch_2.shape)
# ch_2.tail()

(4044, 9)


Unnamed: 0,sitename,lat,lon,trait_description,canopy_height_cm,cultivar,date,range,pass
29074,Ashland Bottoms KSU 2016 Season Range 12 Pass 20,39.13928,-96.631346,"top of the general canopy of the plant, discou...",113.0,PI641807,2016-07-21 00:00:00-05:00,12,20
29075,Ashland Bottoms KSU 2016 Season Range 31 Pass 7,39.139961,-96.631814,"top of the general canopy of the plant, discou...",64.0,PI620072,2016-07-21 00:00:00-05:00,31,7
29076,Ashland Bottoms KSU 2016 Season Range 11 Pass 15,39.139245,-96.631522,"top of the general canopy of the plant, discou...",89.7,PI641817,2016-07-21 00:00:00-05:00,11,15
29077,Ashland Bottoms KSU 2016 Season Range 15 Pass 19,39.139387,-96.631383,"top of the general canopy of the plant, discou...",78.3,PI641815,2016-07-21 00:00:00-05:00,15,19
29078,Ashland Bottoms KSU 2016 Season Range 10 Pass 11,39.13921,-96.631662,"top of the general canopy of the plant, discou...",71.3,PI641824,2016-07-21 00:00:00-05:00,10,11


#### Reorder Columns

In [21]:
new_col_order = ['date', 'sitename', 'range', 'pass', 'lat', 'lon', 'cultivar', 'trait_description', 'canopy_height_cm']

ch_3 = pd.DataFrame(data=ch_2, columns=new_col_order)
print(ch_3.shape)
# ch_3.head()

(4044, 9)


#### Check for duplicates

In [22]:
ch_3.duplicated().value_counts()

True     2454
False    1590
dtype: int64

In [23]:
duplicates = ch_3.loc[ch_3.duplicated(subset=['date', 'sitename', 'range', 'pass', 'cultivar', 'canopy_height_cm'])]

duplicates.shape

(2454, 9)

In [24]:
# duplicates.sample(n=3)

Unnamed: 0,date,sitename,range,pass,lat,lon,cultivar,trait_description,canopy_height_cm
14282,2016-08-11 00:00:00-05:00,Ashland Bottoms KSU 2016 Season Range 17 Pass 1,17,1,39.139463,-96.632017,Blade,"top of the general canopy of the plant, discou...",283.0
28454,2016-07-15 00:00:00-05:00,Ashland Bottoms KSU 2016 Season Range 7 Pass 26,7,26,39.139099,-96.631131,PI22913,"top of the general canopy of the plant, discou...",38.3
14206,2016-08-04 00:00:00-05:00,Ashland Bottoms KSU 2016 Season Range 8 Pass 17,8,17,39.139137,-96.631451,PI586443,"top of the general canopy of the plant, discou...",201.0


In [27]:
# check one of the duplicate entries

# ch_3.loc[(ch_3['date'] == '2016-08-11') & (ch_3['sitename'] == 'Ashland Bottoms KSU 2016 Season Range 17 Pass 1') &
#         (ch_3['cultivar'] == 'Blade') & (ch_3['canopy_height_cm'] == 283.0)]

#### Drop Duplicates

In [28]:
ch_4 = ch_3.drop_duplicates(subset=['date', 'sitename', 'range', 'pass', 'cultivar', 'canopy_height_cm'], ignore_index=True)

print(ch_4.shape)
# ch_4.head(3)

(1590, 9)


Unnamed: 0,date,sitename,range,pass,lat,lon,cultivar,trait_description,canopy_height_cm
0,2016-10-21 00:00:00-05:00,Ashland Bottoms KSU 2016 Season Range 18 Pass 24,18,24,39.139492,-96.631207,PI570053,"top of the general canopy of the plant, discou...",291.0
1,2016-10-21 00:00:00-05:00,Ashland Bottoms KSU 2016 Season Range 32 Pass 12,32,12,39.139995,-96.631637,PI570042,"top of the general canopy of the plant, discou...",286.0
2,2016-10-21 00:00:00-05:00,Ashland Bottoms KSU 2016 Season Range 16 Pass 16,16,16,39.139423,-96.631489,PI570038,"top of the general canopy of the plant, discou...",313.0
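
A small sketch of how the `keep` argument changes what `duplicated` reports, which can help when inspecting duplicate groups before dropping them. The toy rows are hypothetical, and the key uses a shortened list of columns.

```python
import pandas as pd

# Hypothetical toy data: the first three rows are identical on the key columns.
toy = pd.DataFrame({
    "date": ["2016-08-11", "2016-08-11", "2016-08-11", "2016-08-04"],
    "sitename": ["Range 17 Pass 1", "Range 17 Pass 1", "Range 17 Pass 1", "Range 8 Pass 17"],
    "canopy_height_cm": [283.0, 283.0, 283.0, 201.0],
})
key = ["date", "sitename", "canopy_height_cm"]

# Default keep='first': only the second and later copies are flagged.
later_copies = toy.duplicated(subset=key)

# keep=False flags every member of a duplicated group, including the first.
all_copies = toy.duplicated(subset=key, keep=False)

# drop_duplicates keeps the first row of each group and renumbers the index.
deduped = toy.drop_duplicates(subset=key, ignore_index=True)
```

With the default setting, the count of `True` values equals the number of rows that `drop_duplicates` removes, which matches the 2454 flagged and 1590 remaining rows reported above.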


In [30]:
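# sort rows by date (the default axis=0 sorts rows by the values in the `date` column;
# axis=1 looks for a row labelled 'date' and raises KeyError)

ch_5 = ch_4.sort_values(by=['date'], ignore_index=True)
ch_5.head(3)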

#### B. Days & Growing Degree Days (GDD) to Flowering
- Weather data in csv format downloaded from [KSU Weather Station](https://mesonet.k-state.edu/weather/historical/) in Ashland Bottoms
- Queried using season dates
- planting date: 2016-06-17
- harvest date: 2016-10-21

In [61]:
# df_4.columns

Index(['sitename', 'lat', 'lon', 'raw_date', 'trait', 'trait_description',
       'mean', 'units', 'cultivar', 'method_name', 'date', 'range', 'pass'],
      dtype='object')

In [62]:
print(df_4.raw_date.min())
print(df_4.raw_date.max())

2016-07-15 00:00:00 -0500
2016-10-21 00:00:00 -0500


In [76]:
weather_0 = pd.read_csv('data/raw/ashland_bottoms_daily_weather_2016.csv')
print(weather_0.shape)
# weather_0.head(5)

(129, 16)


Unnamed: 0.1,Unnamed: 0,Timestamp,Station,AirTemperature,AirTemperature.1,RelativeHumidity,Precipitation,WindSpeed2m,WindSpeed2m.1,SoilTemperature5cm,SoilTemperature5cm.1,SoilTemperature10cm,SoilTemperature10cm.1,SolarRadiation,ETo,ETo.1
0,,,,max,min,avg,total,avg,max,max,min,max,min,total,grass,alfalfa
1,,,,°C,°C,%,mm,m/s,m/s,°C,°C,°C,°C,MJ/m²,mm,mm
2,,2016-06-17,Ashland Bottoms,37.8,24,66.6,0,2.3,9,27.5,25,25.7,24,30.4,7.96,9.92
3,,2016-06-18,Ashland Bottoms,33.1,21.7,66.4,5.33,2.8,12.1,26.7,24.5,25.4,24.3,22.7,6.33,8.26
4,,2016-06-19,Ashland Bottoms,35.3,21.9,62.5,0,2.9,7.4,26.7,24,25.2,23.7,29.3,7.7,9.9


In [24]:
# weather_0.dtypes

Timestamp                object
Station                  object
AirTemperature           object
AirTemperature.1         object
RelativeHumidity         object
Precipitation            object
WindSpeed2m              object
WindSpeed2m.1            object
SoilTemperature5cm       object
SoilTemperature5cm.1     object
SoilTemperature10cm      object
SoilTemperature10cm.1    object
SolarRadiation           object
ETo                      object
ETo.1                    object
dtype: object

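#### Change column names and drop first two rows
- the first two rows hold the statistic labels (max, min, avg, total) and the units, not observations
- add datetime column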

In [83]:
weather_1 = weather_0.rename({'Station': 'city', 'AirTemperature': 'air_temp_max_C', 
                                                  'AirTemperature.1': 'air_temp_min_C', 'RelativeHumidity': 'avg_rh',
                                                  'Precipitation': 'precip_mm', 'WindSpeed2m': 'avg_wind_speed', 
                                                  'WindSpeed2m.1': 'max_wind_speed', 'SoilTemperature5cm': 'soil_temp_5cm_max',
                                                  'SoilTemperature5cm.1': 'soil_temp_5cm_min', 
                                                  'SoilTemperature10cm': 'soil_temp_10cm_max', 
                                                  'SoilTemperature10cm.1': 'soil_temp_10cm_min', 'SolarRadiation': 'solar_rad',
                                                  'ETo': 'eto_grass', 'ETo.1': 'eto_alfalfa'}, axis=1)
print(weather_1.shape)
# weather_1.head()

(129, 16)


Unnamed: 0.1,Unnamed: 0,Timestamp,city,air_temp_max_C,air_temp_min_C,avg_rh,precip_mm,avg_wind_speed,max_wind_speed,soil_temp_5cm_max,soil_temp_5cm_min,soil_temp_10cm_max,soil_temp_10cm_min,solar_rad,eto_grass,eto_alfalfa
0,,,,max,min,avg,total,avg,max,max,min,max,min,total,grass,alfalfa
1,,,,°C,°C,%,mm,m/s,m/s,°C,°C,°C,°C,MJ/m²,mm,mm
2,,2016-06-17,Ashland Bottoms,37.8,24,66.6,0,2.3,9,27.5,25,25.7,24,30.4,7.96,9.92
3,,2016-06-18,Ashland Bottoms,33.1,21.7,66.4,5.33,2.8,12.1,26.7,24.5,25.4,24.3,22.7,6.33,8.26
4,,2016-06-19,Ashland Bottoms,35.3,21.9,62.5,0,2.9,7.4,26.7,24,25.2,23.7,29.3,7.7,9.9


In [84]:
# Drop first 2 rows

weather_2 = weather_1.iloc[2:]
print(weather_2.shape)
# weather_2.head()

(127, 16)


Unnamed: 0.1,Unnamed: 0,Timestamp,city,air_temp_max_C,air_temp_min_C,avg_rh,precip_mm,avg_wind_speed,max_wind_speed,soil_temp_5cm_max,soil_temp_5cm_min,soil_temp_10cm_max,soil_temp_10cm_min,solar_rad,eto_grass,eto_alfalfa
2,,2016-06-17,Ashland Bottoms,37.8,24.0,66.6,0.0,2.3,9.0,27.5,25.0,25.7,24.0,30.4,7.96,9.92
3,,2016-06-18,Ashland Bottoms,33.1,21.7,66.4,5.33,2.8,12.1,26.7,24.5,25.4,24.3,22.7,6.33,8.26
4,,2016-06-19,Ashland Bottoms,35.3,21.9,62.5,0.0,2.9,7.4,26.7,24.0,25.2,23.7,29.3,7.7,9.9
5,,2016-06-20,Ashland Bottoms,37.3,23.7,59.5,0.0,2.6,8.9,28.2,24.9,26.3,24.1,30.2,8.12,10.33
6,,2016-06-21,Ashland Bottoms,38.5,22.6,53.7,0.0,2.4,9.1,28.0,25.2,26.2,24.5,30.7,8.35,10.73


In [85]:
weather_3 = weather_2.copy()

datetimes = pd.to_datetime(weather_3['Timestamp'])
weather_3['date'] = datetimes

print(weather_3.shape)
# weather_3.tail()

(127, 17)


Unnamed: 0.1,Unnamed: 0,Timestamp,city,air_temp_max_C,air_temp_min_C,avg_rh,precip_mm,avg_wind_speed,max_wind_speed,soil_temp_5cm_max,soil_temp_5cm_min,soil_temp_10cm_max,soil_temp_10cm_min,solar_rad,eto_grass,eto_alfalfa,date
124,,2016-10-17,Ashland Bottoms,32.5,19.6,59.4,0.0,3.9,12.8,21.1,19.1,19.2,18.3,17.1,5.55,8.33,2016-10-17
125,,2016-10-18,Ashland Bottoms,23.5,9.7,57.1,0.0,1.8,7.2,19.4,16.9,19.0,17.4,18.0,2.98,4.1,2016-10-18
126,,2016-10-19,Ashland Bottoms,23.4,7.4,67.2,0.0,2.1,8.2,17.6,15.4,17.6,16.2,15.7,2.88,4.07,2016-10-19
127,,2016-10-20,Ashland Bottoms,16.7,5.1,74.6,2.03,2.0,8.1,16.6,14.4,16.8,15.7,13.8,2.21,3.1,2016-10-20
128,,2016-10-21,Ashland Bottoms,20.1,1.5,67.4,0.0,1.8,7.3,15.4,12.6,15.7,14.2,16.8,2.58,3.7,2016-10-21


#### Drop Columns
- `Timestamp`
- `Unnamed: 0`

In [86]:
cols_to_drop = ['Unnamed: 0', 'Timestamp']

weather_4 = weather_3.drop(labels=cols_to_drop, axis=1)
print(weather_4.shape)
# weather_4.head()

(127, 15)


Unnamed: 0,city,air_temp_max_C,air_temp_min_C,avg_rh,precip_mm,avg_wind_speed,max_wind_speed,soil_temp_5cm_max,soil_temp_5cm_min,soil_temp_10cm_max,soil_temp_10cm_min,solar_rad,eto_grass,eto_alfalfa,date
2,Ashland Bottoms,37.8,24.0,66.6,0.0,2.3,9.0,27.5,25.0,25.7,24.0,30.4,7.96,9.92,2016-06-17
3,Ashland Bottoms,33.1,21.7,66.4,5.33,2.8,12.1,26.7,24.5,25.4,24.3,22.7,6.33,8.26,2016-06-18
4,Ashland Bottoms,35.3,21.9,62.5,0.0,2.9,7.4,26.7,24.0,25.2,23.7,29.3,7.7,9.9,2016-06-19
5,Ashland Bottoms,37.3,23.7,59.5,0.0,2.6,8.9,28.2,24.9,26.3,24.1,30.2,8.12,10.33,2016-06-20
6,Ashland Bottoms,38.5,22.6,53.7,0.0,2.4,9.1,28.0,25.2,26.2,24.5,30.7,8.35,10.73,2016-06-21


In [110]:
# weather_4.dtypes

city                          object
air_temp_max_C                object
air_temp_min_C                object
avg_rh                        object
precip_mm                     object
avg_wind_speed                object
max_wind_speed                object
soil_temp_5cm_max             object
soil_temp_5cm_min             object
soil_temp_10cm_max            object
soil_temp_10cm_min            object
solar_rad                     object
eto_grass                     object
eto_alfalfa                   object
date                  datetime64[ns]
dtype: object

#### Convert numeric columns 

In [111]:
# weather_4.columns

Index(['city', 'air_temp_max_C', 'air_temp_min_C', 'avg_rh', 'precip_mm',
       'avg_wind_speed', 'max_wind_speed', 'soil_temp_5cm_max',
       'soil_temp_5cm_min', 'soil_temp_10cm_max', 'soil_temp_10cm_min',
       'solar_rad', 'eto_grass', 'eto_alfalfa', 'date'],
      dtype='object')

In [113]:
cols_to_convert = ['air_temp_max_C', 'air_temp_min_C', 'avg_rh', 'precip_mm', 'avg_wind_speed', 'max_wind_speed', 
                   'soil_temp_5cm_max', 'soil_temp_5cm_min', 'soil_temp_10cm_max', 'soil_temp_10cm_min', 'solar_rad', 
                   'eto_grass', 'eto_alfalfa', ]

weather_5 = weather_4.copy()
weather_5[cols_to_convert] = weather_5[cols_to_convert].apply(pd.to_numeric)
print(weather_5.shape)
print(weather_5.dtypes)

(127, 15)
city                          object
air_temp_max_C               float64
air_temp_min_C               float64
avg_rh                       float64
precip_mm                    float64
avg_wind_speed               float64
max_wind_speed               float64
soil_temp_5cm_max            float64
soil_temp_5cm_min            float64
soil_temp_10cm_max           float64
soil_temp_10cm_min           float64
solar_rad                    float64
eto_grass                    float64
eto_alfalfa                  float64
date                  datetime64[ns]
dtype: object
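
The weather columns were read as `object` because the two sub-header rows (statistic labels and units) are strings. A minimal sketch of the same clean-up on a hypothetical three-column excerpt: the rows are dropped first, then `pd.to_numeric` is applied. Passing `errors="coerce"` is an optional safeguard that is not used in the notebook; it turns any leftover non-numeric text into `NaN` instead of raising.

```python
import pandas as pd

# Hypothetical excerpt shaped like the raw weather file:
# row 0 holds statistic labels, row 1 holds units, data starts at row 2.
raw = pd.DataFrame({
    "Timestamp": [None, None, "2016-06-17", "2016-06-18"],
    "AirTemperature": ["max", "°C", "37.8", "33.1"],
    "AirTemperature.1": ["min", "°C", "24", "21.7"],
})

# Drop the two sub-header rows and renumber the index.
data = raw.iloc[2:].reset_index(drop=True)

# Convert the string columns to floats; unparseable values would become NaN.
num_cols = ["AirTemperature", "AirTemperature.1"]
data[num_cols] = data[num_cols].apply(pd.to_numeric, errors="coerce")

# A units string on its own cannot be parsed and is coerced to NaN.
units_as_number = pd.to_numeric(pd.Series(["°C"]), errors="coerce")
```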


#### Add `daily_gdd` column and check for negative values

In [114]:
weather_6 = weather_5.copy()
weather_6['daily_gdd'] = (((weather_6['air_temp_max_C'] + weather_6['air_temp_min_C'])) / 2) - 10
print(weather_6.shape)
# weather_6.head()

(127, 16)


Unnamed: 0,city,air_temp_max_C,air_temp_min_C,avg_rh,precip_mm,avg_wind_speed,max_wind_speed,soil_temp_5cm_max,soil_temp_5cm_min,soil_temp_10cm_max,soil_temp_10cm_min,solar_rad,eto_grass,eto_alfalfa,date,daily_gdd
2,Ashland Bottoms,37.8,24.0,66.6,0.0,2.3,9.0,27.5,25.0,25.7,24.0,30.4,7.96,9.92,2016-06-17,20.9
3,Ashland Bottoms,33.1,21.7,66.4,5.33,2.8,12.1,26.7,24.5,25.4,24.3,22.7,6.33,8.26,2016-06-18,17.4
4,Ashland Bottoms,35.3,21.9,62.5,0.0,2.9,7.4,26.7,24.0,25.2,23.7,29.3,7.7,9.9,2016-06-19,18.6
5,Ashland Bottoms,37.3,23.7,59.5,0.0,2.6,8.9,28.2,24.9,26.3,24.1,30.2,8.12,10.33,2016-06-20,20.5
6,Ashland Bottoms,38.5,22.6,53.7,0.0,2.4,9.1,28.0,25.2,26.2,24.5,30.7,8.35,10.73,2016-06-21,20.55


In [116]:
# Check for negative values

weather_6.loc[weather_6.daily_gdd < 0]

Unnamed: 0,city,air_temp_max_C,air_temp_min_C,avg_rh,precip_mm,avg_wind_speed,max_wind_speed,soil_temp_5cm_max,soil_temp_5cm_min,soil_temp_10cm_max,soil_temp_10cm_min,solar_rad,eto_grass,eto_alfalfa,date,daily_gdd
119,Ashland Bottoms,14.2,4.8,70.5,0.0,3.1,8.7,16.8,14.0,17.0,15.3,16.1,2.33,3.23,2016-10-12,-0.5
120,Ashland Bottoms,16.2,0.1,74.7,0.0,1.1,5.3,14.4,12.1,15.3,13.8,15.3,1.89,2.44,2016-10-13,-1.85


In [117]:
weather_7 = weather_6.copy()

# Set negative daily GDD values to 0. Assigning the clipped column back avoids the
# chained-assignment warning and `iteritems()`, which was removed in pandas 2.0.
weather_7['daily_gdd'] = weather_7['daily_gdd'].clip(lower=0)


In [119]:
# should return empty df now

weather_7.loc[weather_7.daily_gdd < 0]

Unnamed: 0,city,air_temp_max_C,air_temp_min_C,avg_rh,precip_mm,avg_wind_speed,max_wind_speed,soil_temp_5cm_max,soil_temp_5cm_min,soil_temp_10cm_max,soil_temp_10cm_min,solar_rad,eto_grass,eto_alfalfa,date,daily_gdd


In [120]:
# Add cumulative GDD, round to nearest integer

weather_8 = weather_7.copy()

weather_8['gdd'] = np.rint(np.cumsum(weather_8['daily_gdd']))
print(weather_8.shape)
# weather_8.tail()

(127, 17)


Unnamed: 0,city,air_temp_max_C,air_temp_min_C,avg_rh,precip_mm,avg_wind_speed,max_wind_speed,soil_temp_5cm_max,soil_temp_5cm_min,soil_temp_10cm_max,soil_temp_10cm_min,solar_rad,eto_grass,eto_alfalfa,date,daily_gdd,gdd
124,Ashland Bottoms,32.5,19.6,59.4,0.0,3.9,12.8,21.1,19.1,19.2,18.3,17.1,5.55,8.33,2016-10-17,16.05,1729.0
125,Ashland Bottoms,23.5,9.7,57.1,0.0,1.8,7.2,19.4,16.9,19.0,17.4,18.0,2.98,4.1,2016-10-18,6.6,1735.0
126,Ashland Bottoms,23.4,7.4,67.2,0.0,2.1,8.2,17.6,15.4,17.6,16.2,15.7,2.88,4.07,2016-10-19,5.4,1741.0
127,Ashland Bottoms,16.7,5.1,74.6,2.03,2.0,8.1,16.6,14.4,16.8,15.7,13.8,2.21,3.1,2016-10-20,0.9,1742.0
128,Ashland Bottoms,20.1,1.5,67.4,0.0,1.8,7.3,15.4,12.6,15.7,14.2,16.8,2.58,3.7,2016-10-21,0.8,1743.0
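
The GDD steps above (daily mean temperature minus a base of 10 °C, negative values set to 0, then a rounded running total) can be sketched in a few lines. The temperatures are taken from rows shown above but placed consecutively as a toy series, and the constant name `BASE_TEMP_C` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

BASE_TEMP_C = 10.0  # base temperature implied by the "- 10" in the daily_gdd cell

# Toy series: three June days plus one cool October day from the outputs above.
temps = pd.DataFrame({
    "air_temp_max_C": [37.8, 33.1, 35.3, 14.2],
    "air_temp_min_C": [24.0, 21.7, 21.9, 4.8],
})

# Daily GDD: mean of max and min minus the base, floored at zero.
daily_mean = (temps["air_temp_max_C"] + temps["air_temp_min_C"]) / 2
daily_gdd = (daily_mean - BASE_TEMP_C).clip(lower=0)

# Cumulative GDD, rounded to the nearest integer as in the notebook.
gdd = np.rint(daily_gdd.cumsum())
```

The first three values reproduce the 21, 38, and 57 shown for 2016-06-17 to 2016-06-19, and the cool day adds nothing to the running total.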


#### Drop `daily_gdd` column, add cumulative precipitation column

In [121]:
weather_9 = weather_8.drop(labels=['daily_gdd'], axis=1)
print(weather_9.shape)
# print(weather_9.columns)

(127, 16)
Index(['city', 'air_temp_max_C', 'air_temp_min_C', 'avg_rh', 'precip_mm',
       'avg_wind_speed', 'max_wind_speed', 'soil_temp_5cm_max',
       'soil_temp_5cm_min', 'soil_temp_10cm_max', 'soil_temp_10cm_min',
       'solar_rad', 'eto_grass', 'eto_alfalfa', 'date', 'gdd'],
      dtype='object')


In [122]:
weather_10 = weather_9.copy()
weather_10['cum_precip_mm'] = np.round(np.cumsum(weather_10.precip_mm), 2)

print(weather_10.shape)
weather_10.head(3)

(127, 17)


Unnamed: 0,city,air_temp_max_C,air_temp_min_C,avg_rh,precip_mm,avg_wind_speed,max_wind_speed,soil_temp_5cm_max,soil_temp_5cm_min,soil_temp_10cm_max,soil_temp_10cm_min,solar_rad,eto_grass,eto_alfalfa,date,gdd,cum_precip_mm
2,Ashland Bottoms,37.8,24.0,66.6,0.0,2.3,9.0,27.5,25.0,25.7,24.0,30.4,7.96,9.92,2016-06-17,21.0,0.0
3,Ashland Bottoms,33.1,21.7,66.4,5.33,2.8,12.1,26.7,24.5,25.4,24.3,22.7,6.33,8.26,2016-06-18,38.0,5.33
4,Ashland Bottoms,35.3,21.9,62.5,0.0,2.9,7.4,26.7,24.0,25.2,23.7,29.3,7.7,9.9,2016-06-19,57.0,5.33


#### Write Ashland Bottoms Weather Interim Data to `.csv`

In [124]:
timestamp = datetime.datetime.now().replace(microsecond=0).isoformat()
output_filename = f'data/interim/ashland_bottoms_weather_2016_daily_{timestamp}.csv'.replace(':', '')

weather_10.to_csv(output_filename, index=False)
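
A sketch of the same file-naming idea wrapped in a small function. The function name `timestamped_filename` and its `when` parameter are hypothetical; the fixed datetime is only there to make the result predictable.

```python
import datetime


def timestamped_filename(stem, when=None):
    """Return '<stem>_<ISO timestamp without colons>.csv'."""
    when = when or datetime.datetime.now()
    # Strip colons from the timestamp only, so the rest of the path is untouched.
    stamp = when.replace(microsecond=0).isoformat().replace(":", "")
    return f"{stem}_{stamp}.csv"


example = timestamped_filename(
    "data/interim/ashland_bottoms_weather_2016_daily",
    datetime.datetime(2020, 6, 11, 14, 30, 5),
)
```

Removing colons from the timestamp alone, rather than from the whole path, is a small design choice here: it leaves any colon elsewhere in the path (such as a Windows drive letter) intact.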

#### Add GDD to Days to Flowering DataFrame
- slice trait data to only include the `flowering_time` trait (renamed to `days_to_flowering` below)
- merge DataFrames on `date_of_flowering`

In [126]:
df_4.trait.value_counts()

canopy_height     4044
flowering_time     656
Name: trait, dtype: int64

In [127]:
fl_0 = df_4.loc[df_4.trait == 'flowering_time']
print(fl_0.shape)
# fl_0.head()

(656, 13)


Unnamed: 0,sitename,lat,lon,raw_date,trait,trait_description,mean,units,cultivar,method_name,date,range,pass
9164,Ashland Bottoms KSU 2016 Season Range 2 Pass 12,39.138925,-96.631623,2016-09-27 00:00:00 -0500,flowering_time,Number of days from sowing to the date when 50...,102.0,days,SC1345,KSU flowering time,2016-09-27 00:00:00-05:00,2,12
9165,Ashland Bottoms KSU 2016 Season Range 2 Pass 19,39.138923,-96.631378,2016-08-28 00:00:00 -0500,flowering_time,Number of days from sowing to the date when 50...,72.0,days,Macia,KSU flowering time,2016-08-28 00:00:00-05:00,2,19
9166,Ashland Bottoms KSU 2016 Season Range 2 Pass 21,39.138922,-96.631306,2016-08-08 00:00:00 -0500,flowering_time,Number of days from sowing to the date when 50...,52.0,days,SC1103,KSU flowering time,2016-08-08 00:00:00-05:00,2,21
9167,Ashland Bottoms KSU 2016 Season Range 2 Pass 22,39.138922,-96.631271,2016-08-13 00:00:00 -0500,flowering_time,Number of days from sowing to the date when 50...,57.0,days,SC1345,KSU flowering time,2016-08-13 00:00:00-05:00,2,22
9168,Ashland Bottoms KSU 2016 Season Range 2 Pass 5,39.138927,-96.63187,2016-08-21 00:00:00 -0500,flowering_time,Number of days from sowing to the date when 50...,65.0,days,P898012,KSU flowering time,2016-08-21 00:00:00-05:00,2,5


In [132]:
cols_to_drop = ['raw_date', 'trait', 'units', 'method_name']

fl_1 = fl_0.drop(labels=cols_to_drop, axis=1)
print(fl_1.shape)
# fl_1.tail()

(656, 9)


Unnamed: 0,sitename,lat,lon,trait_description,mean,cultivar,date,range,pass
25496,Ashland Bottoms KSU 2016 Season Range 9 Pass 23,39.13917,-96.631239,Number of days from sowing to the date when 50...,68.0,PI586435,2016-08-24 00:00:00-05:00,9,23
25497,Ashland Bottoms KSU 2016 Season Range 9 Pass 24,39.139169,-96.631203,Number of days from sowing to the date when 50...,46.0,PI197542,2016-08-02 00:00:00-05:00,9,24
25498,Ashland Bottoms KSU 2016 Season Range 9 Pass 25,39.139169,-96.631167,Number of days from sowing to the date when 50...,61.0,PI535795,2016-08-17 00:00:00-05:00,9,25
25499,Ashland Bottoms KSU 2016 Season Range 9 Pass 26,39.139169,-96.631132,Number of days from sowing to the date when 50...,87.0,PI217691,2016-09-12 00:00:00-05:00,9,26
25500,Ashland Bottoms KSU 2016 Season Range 9 Pass 9,39.139175,-96.631733,Number of days from sowing to the date when 50...,64.0,PI196586,2016-08-20 00:00:00-05:00,9,9


#### Rename trait column

In [134]:
fl_2 = fl_1.rename({'mean': 'days_to_flowering'}, axis=1)
print(fl_2.shape)
# fl_2.sample(n=3, random_state=42)

(656, 9)


Unnamed: 0,sitename,lat,lon,trait_description,days_to_flowering,cultivar,date,range,pass
13672,Ashland Bottoms KSU 2016 Season Range 13 Pass 12,39.139317,-96.631628,Number of days from sowing to the date when 50...,75.0,PI147224,2016-08-31 00:00:00-05:00,13,12
13745,Ashland Bottoms KSU 2016 Season Range 11 Pass 2,39.139249,-96.63198,Number of days from sowing to the date when 50...,63.0,PI641821,2016-08-19 00:00:00-05:00,11,2
9265,Ashland Bottoms KSU 2016 Season Range 11 Pass 2,39.139249,-96.63198,Number of days from sowing to the date when 50...,63.0,PI641821,2016-08-19 00:00:00-05:00,11,2


#### Add `date_of_planting`
- 2016-06-17

In [135]:
day_of_planting = datetime.date(2016,6,17)
fl_3 = fl_2.copy()

fl_3['date_of_planting'] = day_of_planting
print(fl_3.shape)
# fl_3.head()

(656, 10)


Unnamed: 0,sitename,lat,lon,trait_description,days_to_flowering,cultivar,date,range,pass,date_of_planting
9164,Ashland Bottoms KSU 2016 Season Range 2 Pass 12,39.138925,-96.631623,Number of days from sowing to the date when 50...,102.0,SC1345,2016-09-27 00:00:00-05:00,2,12,2016-06-17
9165,Ashland Bottoms KSU 2016 Season Range 2 Pass 19,39.138923,-96.631378,Number of days from sowing to the date when 50...,72.0,Macia,2016-08-28 00:00:00-05:00,2,19,2016-06-17
9166,Ashland Bottoms KSU 2016 Season Range 2 Pass 21,39.138922,-96.631306,Number of days from sowing to the date when 50...,52.0,SC1103,2016-08-08 00:00:00-05:00,2,21,2016-06-17
9167,Ashland Bottoms KSU 2016 Season Range 2 Pass 22,39.138922,-96.631271,Number of days from sowing to the date when 50...,57.0,SC1345,2016-08-13 00:00:00-05:00,2,22,2016-06-17
9168,Ashland Bottoms KSU 2016 Season Range 2 Pass 5,39.138927,-96.63187,Number of days from sowing to the date when 50...,65.0,P898012,2016-08-21 00:00:00-05:00,2,5,2016-06-17


#### Create timedelta using `days_to_flowering` values

In [136]:
timedelta_values = fl_3['days_to_flowering'].values
dates_of_flowering = []

for val in timedelta_values:
    
    date_of_flowering = day_of_planting + datetime.timedelta(days=val)
    dates_of_flowering.append(date_of_flowering)
    
print(fl_3.shape[0])
print(len(dates_of_flowering))

656
656


In [137]:
fl_4 = fl_3.copy()
fl_4['date_of_flowering'] = dates_of_flowering
print(fl_4.shape)
# fl_4.sample(n=5, random_state=42)

(656, 11)


Unnamed: 0,sitename,lat,lon,trait_description,days_to_flowering,cultivar,date,range,pass,date_of_planting,date_of_flowering
13672,Ashland Bottoms KSU 2016 Season Range 13 Pass 12,39.139317,-96.631628,Number of days from sowing to the date when 50...,75.0,PI147224,2016-08-31 00:00:00-05:00,13,12,2016-06-17,2016-08-31
13745,Ashland Bottoms KSU 2016 Season Range 11 Pass 2,39.139249,-96.63198,Number of days from sowing to the date when 50...,63.0,PI641821,2016-08-19 00:00:00-05:00,11,2,2016-06-17,2016-08-19
9265,Ashland Bottoms KSU 2016 Season Range 11 Pass 2,39.139249,-96.63198,Number of days from sowing to the date when 50...,63.0,PI641821,2016-08-19 00:00:00-05:00,11,2,2016-06-17,2016-08-19
25470,Ashland Bottoms KSU 2016 Season Range 26 Pass 7,39.139783,-96.631811,Number of days from sowing to the date when 50...,84.0,PI643008,2016-09-09 00:00:00-05:00,26,7,2016-06-17,2016-09-09
25368,Ashland Bottoms KSU 2016 Season Range 11 Pass 7,39.139247,-96.631804,Number of days from sowing to the date when 50...,74.0,PI152694,2016-08-30 00:00:00-05:00,11,7,2016-06-17,2016-08-30


#### Add GDD to flowering DataFrame

In [138]:
weather_gdd = weather_10[['date', 'gdd']]
print(weather_gdd.shape)
# weather_gdd.head()

(127, 2)


Unnamed: 0,date,gdd
2,2016-06-17,21.0
3,2016-06-18,38.0
4,2016-06-19,57.0
5,2016-06-20,77.0
6,2016-06-21,98.0


In [139]:
# fl_4.dtypes

sitename                                             object
lat                                                 float64
lon                                                 float64
trait_description                                    object
days_to_flowering                                   float64
cultivar                                             object
date                 datetime64[ns, pytz.FixedOffset(-300)]
range                                                 int64
pass                                                  int64
date_of_planting                                     object
date_of_flowering                                    object
dtype: object

In [140]:
fl_5 = fl_4.copy()
fl_5.date_of_flowering = pd.to_datetime(fl_5.date_of_flowering)
# fl_5.dtypes

sitename                                             object
lat                                                 float64
lon                                                 float64
trait_description                                    object
days_to_flowering                                   float64
cultivar                                             object
date                 datetime64[ns, pytz.FixedOffset(-300)]
range                                                 int64
pass                                                  int64
date_of_planting                                     object
date_of_flowering                            datetime64[ns]
dtype: object

In [141]:
fl_6 = fl_5.merge(weather_gdd, how='left', left_on='date_of_flowering', right_on='date')
print(fl_6.shape)
# fl_6.head()

(656, 13)


Unnamed: 0,sitename,lat,lon,trait_description,days_to_flowering,cultivar,date_x,range,pass,date_of_planting,date_of_flowering,date_y,gdd
0,Ashland Bottoms KSU 2016 Season Range 2 Pass 12,39.138925,-96.631623,Number of days from sowing to the date when 50...,102.0,SC1345,2016-09-27 00:00:00-05:00,2,12,2016-06-17,2016-09-27,2016-09-27,1590.0
1,Ashland Bottoms KSU 2016 Season Range 2 Pass 19,39.138923,-96.631378,Number of days from sowing to the date when 50...,72.0,Macia,2016-08-28 00:00:00-05:00,2,19,2016-06-17,2016-08-28,2016-08-28,1196.0
2,Ashland Bottoms KSU 2016 Season Range 2 Pass 21,39.138922,-96.631306,Number of days from sowing to the date when 50...,52.0,SC1103,2016-08-08 00:00:00-05:00,2,21,2016-06-17,2016-08-08,2016-08-08,900.0
3,Ashland Bottoms KSU 2016 Season Range 2 Pass 22,39.138922,-96.631271,Number of days from sowing to the date when 50...,57.0,SC1345,2016-08-13 00:00:00-05:00,2,22,2016-06-17,2016-08-13,2016-08-13,986.0
4,Ashland Bottoms KSU 2016 Season Range 2 Pass 5,39.138927,-96.63187,Number of days from sowing to the date when 50...,65.0,P898012,2016-08-21 00:00:00-05:00,2,5,2016-06-17,2016-08-21,2016-08-21,1098.0
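
A left merge keeps every flowering record even when no weather row matches, so it can be worth confirming that each record actually received a `gdd` value. The sketch below uses two small hypothetical frames (`flowering` and `weather`) rather than `fl_5` and `weather_gdd`; `validate` and `indicator` are standard `DataFrame.merge` parameters.

```python
import pandas as pd

# Hypothetical stand-ins for fl_5 and weather_gdd, using values shown in the tables above.
flowering = pd.DataFrame({
    'cultivar': ['SC1345', 'Macia'],
    'date_of_flowering': pd.to_datetime(['2016-09-27', '2016-08-28']),
})
weather = pd.DataFrame({
    'date': pd.to_datetime(['2016-08-28', '2016-09-27']),
    'gdd': [1196.0, 1590.0],
})

merged = flowering.merge(
    weather,
    how='left',
    left_on='date_of_flowering',
    right_on='date',
    validate='many_to_one',  # raises MergeError if a weather date appears more than once
    indicator=True,          # adds a _merge column recording where each row came from
)

# Rows marked left_only found no weather record and would have a missing gdd.
unmatched = merged.loc[merged['_merge'] == 'left_only']
print(len(unmatched))
```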


#### Check for duplicates

In [142]:
fl_6.duplicated().value_counts()

True     492
False    164
dtype: int64

In [143]:
duplicates = fl_6.loc[fl_6.duplicated(subset=['sitename', 'days_to_flowering', 'cultivar', 'date_of_flowering', 'gdd'])]
print(duplicates.shape)
# duplicates.head(3)

(492, 13)


Unnamed: 0,sitename,lat,lon,trait_description,days_to_flowering,cultivar,date_x,range,pass,date_of_planting,date_of_flowering,date_y,gdd
164,Ashland Bottoms KSU 2016 Season Range 2 Pass 12,39.138925,-96.631623,Number of days from sowing to the date when 50...,102.0,SC1345,2016-09-27 00:00:00-05:00,2,12,2016-06-17,2016-09-27,2016-09-27,1590.0
165,Ashland Bottoms KSU 2016 Season Range 2 Pass 19,39.138923,-96.631378,Number of days from sowing to the date when 50...,72.0,Macia,2016-08-28 00:00:00-05:00,2,19,2016-06-17,2016-08-28,2016-08-28,1196.0
166,Ashland Bottoms KSU 2016 Season Range 2 Pass 21,39.138922,-96.631306,Number of days from sowing to the date when 50...,52.0,SC1103,2016-08-08 00:00:00-05:00,2,21,2016-06-17,2016-08-08,2016-08-08,900.0


#### Check one of the duplicated results

In [145]:
fl_6.loc[(fl_6.sitename == 'Ashland Bottoms KSU 2016 Season Range 2 Pass 12') & (fl_6.days_to_flowering == 102.0) &
        (fl_6.cultivar == 'SC1345') & (fl_6.date_of_flowering == '2016-09-27') & (fl_6.gdd == 1590.0)]

Unnamed: 0,sitename,lat,lon,trait_description,days_to_flowering,cultivar,date_x,range,pass,date_of_planting,date_of_flowering,date_y,gdd
0,Ashland Bottoms KSU 2016 Season Range 2 Pass 12,39.138925,-96.631623,Number of days from sowing to the date when 50...,102.0,SC1345,2016-09-27 00:00:00-05:00,2,12,2016-06-17,2016-09-27,2016-09-27,1590.0
164,Ashland Bottoms KSU 2016 Season Range 2 Pass 12,39.138925,-96.631623,Number of days from sowing to the date when 50...,102.0,SC1345,2016-09-27 00:00:00-05:00,2,12,2016-06-17,2016-09-27,2016-09-27,1590.0
328,Ashland Bottoms KSU 2016 Season Range 2 Pass 12,39.138925,-96.631623,Number of days from sowing to the date when 50...,102.0,SC1345,2016-09-27 00:00:00-05:00,2,12,2016-06-17,2016-09-27,2016-09-27,1590.0
492,Ashland Bottoms KSU 2016 Season Range 2 Pass 12,39.138925,-96.631623,Number of days from sowing to the date when 50...,102.0,SC1345,2016-09-27 00:00:00-05:00,2,12,2016-06-17,2016-09-27,2016-09-27,1590.0
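
The record checked above appears four times, and 656 rows reducing to 164 unique rows is consistent with every record appearing four times. One way to confirm this for all records is to count the copies per key combination. This is a sketch on a hypothetical frame `example` with shortened site names, not a result from the actual data.

```python
import pandas as pd

key_cols = ['sitename', 'days_to_flowering', 'cultivar']

# Hypothetical data: two records, each present four times.
example = pd.DataFrame({
    'sitename': ['Range 2 Pass 12'] * 4 + ['Range 2 Pass 19'] * 4,
    'days_to_flowering': [102.0] * 4 + [72.0] * 4,
    'cultivar': ['SC1345'] * 4 + ['Macia'] * 4,
})

# Number of rows sharing each key combination.
copies_per_record = example.groupby(key_cols).size()
print(copies_per_record)

# duplicated() flags every copy after the first, so 8 rows yield 6 flagged duplicates.
print(example.duplicated(subset=key_cols).sum())
```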


In [146]:
fl_7 = fl_6.drop_duplicates(subset=['sitename', 'days_to_flowering', 'cultivar', 'date_of_flowering', 'gdd'], 
                           ignore_index=True)

print(fl_7.shape)
# fl_7.tail()

(164, 13)


Unnamed: 0,sitename,lat,lon,trait_description,days_to_flowering,cultivar,date_x,range,pass,date_of_planting,date_of_flowering,date_y,gdd
159,Ashland Bottoms KSU 2016 Season Range 9 Pass 23,39.13917,-96.631239,Number of days from sowing to the date when 50...,68.0,PI586435,2016-08-24 00:00:00-05:00,9,23,2016-06-17,2016-08-24,2016-08-24,1142.0
160,Ashland Bottoms KSU 2016 Season Range 9 Pass 24,39.139169,-96.631203,Number of days from sowing to the date when 50...,46.0,PI197542,2016-08-02 00:00:00-05:00,9,24,2016-06-17,2016-08-02,2016-08-02,808.0
161,Ashland Bottoms KSU 2016 Season Range 9 Pass 25,39.139169,-96.631167,Number of days from sowing to the date when 50...,61.0,PI535795,2016-08-17 00:00:00-05:00,9,25,2016-06-17,2016-08-17,2016-08-17,1046.0
162,Ashland Bottoms KSU 2016 Season Range 9 Pass 26,39.139169,-96.631132,Number of days from sowing to the date when 50...,87.0,PI217691,2016-09-12 00:00:00-05:00,9,26,2016-06-17,2016-09-12,2016-09-12,1402.0
163,Ashland Bottoms KSU 2016 Season Range 9 Pass 9,39.139175,-96.631733,Number of days from sowing to the date when 50...,64.0,PI196586,2016-08-20 00:00:00-05:00,9,9,2016-06-17,2016-08-20,2016-08-20,1088.0


#### Drop all date columns except `date_of_flowering`

In [147]:
date_cols_to_drop = ['date_x', 'date_of_planting', 'date_y']

fl_8 = fl_7.drop(labels=date_cols_to_drop, axis=1)
print(fl_8.shape)
# fl_8.head()

(164, 10)


Unnamed: 0,sitename,lat,lon,trait_description,days_to_flowering,cultivar,range,pass,date_of_flowering,gdd
0,Ashland Bottoms KSU 2016 Season Range 2 Pass 12,39.138925,-96.631623,Number of days from sowing to the date when 50...,102.0,SC1345,2,12,2016-09-27,1590.0
1,Ashland Bottoms KSU 2016 Season Range 2 Pass 19,39.138923,-96.631378,Number of days from sowing to the date when 50...,72.0,Macia,2,19,2016-08-28,1196.0
2,Ashland Bottoms KSU 2016 Season Range 2 Pass 21,39.138922,-96.631306,Number of days from sowing to the date when 50...,52.0,SC1103,2,21,2016-08-08,900.0
3,Ashland Bottoms KSU 2016 Season Range 2 Pass 22,39.138922,-96.631271,Number of days from sowing to the date when 50...,57.0,SC1345,2,22,2016-08-13,986.0
4,Ashland Bottoms KSU 2016 Season Range 2 Pass 5,39.138927,-96.63187,Number of days from sowing to the date when 50...,65.0,P898012,2,5,2016-08-21,1098.0


#### Sort flowering dataframe by `date_of_flowering`

In [149]:
fl_9 = fl_8.set_index(keys=['date_of_flowering']).sort_index()
fl_9.head(3)

date_of_flowering,sitename,lat,lon,trait_description,days_to_flowering,cultivar,range,pass,gdd
2016-08-02,Ashland Bottoms KSU 2016 Season Range 13 Pass 13,39.139317,-96.631593,Number of days from sowing to the date when 50...,46.0,PI152651,13,13,808.0
2016-08-02,Ashland Bottoms KSU 2016 Season Range 9 Pass 24,39.139169,-96.631203,Number of days from sowing to the date when 50...,46.0,PI197542,9,24,808.0
2016-08-03,Ashland Bottoms KSU 2016 Season Range 10 Pass 13,39.13921,-96.631592,Number of days from sowing to the date when 50...,47.0,PI197542,10,13,827.0


#### Write flowering dataframe to `.csv`

In [150]:
timestamp = datetime.datetime.now().replace(microsecond=0).isoformat()
output_filename = f'data/processed/ksu_flowering_{timestamp}.csv'.replace(':', '')

fl_9.to_csv(output_filename, index=True)
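
To illustrate what the filename looks like, the sketch below applies the same steps to a fixed, hypothetical datetime instead of `datetime.datetime.now()`. The colons are presumably removed because some file systems (Windows, for example) do not allow them in file names.

```python
import datetime

# Hypothetical fixed moment, used here so the result is reproducible.
fixed = datetime.datetime(2020, 6, 11, 14, 30, 5, 123456)

# Drop microseconds, format as ISO 8601, then strip the colons from the time part.
timestamp = fixed.replace(microsecond=0).isoformat()
filename = f'data/processed/ksu_flowering_{timestamp}.csv'.replace(':', '')

print(filename)  # data/processed/ksu_flowering_2020-06-11T143005.csv
```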