# AGRON 935 - Semester project

**Name:** Javier Fernandez <br/>
**Semester:** Spring 2019 <br/>
**Project area:** Agronomy <br/>

## Table of contents
1. [Motivation for the project](#motivation_project)
2. [Current state and roadblocks](#state_roadblocks)
3. [Future steps](#future_steps)

<a name="motivation_project"></a>
### 1. Motivation for the project

The objective is to write code that calculates hourly thermal time (HTT) and daily thermal time (DTT) for grain filling samples using the Kansas Mesonet database. For each data point, HTT and DTT are accumulated between a flowering date (the starting point) and a sampling date.

The code will also let the user choose the minimum and maximum cardinal temperatures used in the HTT and DTT calculations. This will be useful for evaluating and comparing different base temperatures for grain filling.

<a name="state_roadblocks"></a>
### 2. Current state and Roadblocks

The code currently calculates DTT for each sample in the data. It still requires the weather data to be supplied as a .csv file downloaded directly from the Kansas Mesonet website; it then organizes that data to obtain the DTT of each sample.

##### Steps:
1. [Import modules](#import_modules)
2. [Import data](#import_data)
3. [Handling missing values](#missing_values)
4. [Calculate thermal time for each day](#daily_TT)
5. [Summation of thermal time for each sample](#sample_TT)

The necessary input files are:

  - daily and hourly weather .csv files downloaded directly from the Kansas Mesonet website
  
  - grain filling or biomass data, with columns for the sampling date (here "/Samp..Date2") and the date from which counting starts (here "/Silking2")

<a name="import_modules"></a>
***2.1 Import modules***

In [2]:
import pandas as pd
import numpy as np

# Working directory for the project files
dirname = 'https://raw.githubusercontent.com/jafernandez01/project/master/march_presentation/'

<a name="import_data"></a>
***2.2 Import data***

Grain filling data, with sample and flowering dates:

In [3]:
# G, T, S, and Time are values representing treatments and experimental design
df = pd.read_csv(dirname + 'sample_dat.csv')  # read_csv already returns a DataFrame
df.head()

Unnamed: 0,Plot,G,T,S,Time,Sampling,Flowering,DW_mg
0,101,1,1,1,1,7/20/2017,7/8/2017,28.2436
1,102,2,2,2,1,7/20/2017,7/7/2017,39.0236
2,103,3,3,3,1,7/20/2017,7/8/2017,25.0096
3,104,1,2,4,1,7/20/2017,7/9/2017,37.0832
4,105,2,3,5,1,7/20/2017,7/8/2017,24.255


Daily weather data:

In [100]:
wtrD = pd.read_csv(dirname + 'daily_2017.csv',skiprows=3,header=None)

#Renaming and dropping unused columns
wtrD = wtrD.rename(columns={0: 'Day', 2: 'Max', 3: 'Min'})
wtrD = wtrD.iloc[:,[0,2,3]] # keep only selected columns

# Convert values to float; errors='coerce' turns invalid entries into NaN
wtrD['Min'] = pd.to_numeric(wtrD['Min'], errors='coerce')
wtrD['Max'] = pd.to_numeric(wtrD['Max'], errors='coerce')

# Calculating daily mean temperatures
wtrD['Average'] = ((wtrD['Min']) + (wtrD['Max']))/2

wtrD.head()

Unnamed: 0,Day,Max,Min,Average
0,2017-07-05,31.4,16.8,24.1
1,2017-07-06,34.4,17.7,26.05
2,2017-07-07,30.6,17.4,24.0
3,2017-07-08,32.3,14.5,23.4
4,2017-07-09,34.9,19.0,26.95


Hourly weather data:

In [101]:
# Mesonet does not allow downloading more than 30 days of hourly data in a single file.
# Here we work with a range of 3 months, so we have a total of 3 files.

frames = []  # one dataframe per monthly file

# Read the three monthly files and concatenate them at the end
for file in ['july_hourly.csv', 'aug_hourly.csv', 'sept_hourly.csv']:
    dat = pd.read_csv(dirname + file, skiprows=3, header=None)

    dat = dat.rename(columns={0: 'Day', 2: 'Temp'})
    dat = dat.iloc[:, [0, 2]]  # keep only the timestamp and temperature columns

    frames.append(dat)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
wtrH = pd.concat(frames)

# Convert values to float; errors='coerce' turns invalid entries into NaN
wtrH['Temp'] = pd.to_numeric(wtrH['Temp'], errors='coerce') 


# The "Day" column holds both the date and the hour, so we split it into two columns
# using the space between them
new = wtrH["Day"].str.split(" ", expand=True)

wtrH["Day"] = new[0]   # overwrite the old "Day" column with the date part only
wtrH["Time"] = new[1]  # hour of the reading

pd.concat([wtrH.head(), wtrH.tail()])  # Show the first five and last five rows to check that the data is OK

Unnamed: 0,Day,Temp,Time
0,2017-07-05,19.8,00:00
1,2017-07-05,19.0,01:00
2,2017-07-05,18.4,02:00
3,2017-07-05,17.9,03:00
4,2017-07-05,17.6,04:00
307,2017-09-13,25.9,19:00
308,2017-09-13,23.5,20:00
309,2017-09-13,23.4,21:00
310,2017-09-13,22.7,22:00
311,2017-09-13,22.0,23:00
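
As an alternative to splitting on the space, the timestamp can be parsed once and the date and hour derived from it. This is only a minimal sketch on toy data: the `stamps` values and the `"%Y-%m-%d %H:%M"` format are assumptions based on the output shown above, not a check of the raw Mesonet files.

```python
import pandas as pd

# Toy stand-in for the hourly Mesonet timestamps (hypothetical values)
stamps = pd.Series(["2017-07-05 00:00", "2017-07-05 01:00", "2017-07-06 23:00"])

# Parse the full timestamp once (the format string is an assumption)
parsed = pd.to_datetime(stamps, format="%Y-%m-%d %H:%M")

toy = pd.DataFrame({
    "Day": parsed.dt.normalize(),          # midnight of each calendar day
    "Time": parsed.dt.strftime("%H:%M"),   # hour of the reading as text
})
print(toy)
```

Keeping `Day` as a datetime from the start would also avoid a separate conversion step later on.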


<a name="missing_values"></a>
***2.3 Handling missing values***

In [102]:
# Now, let's check missing values in our dataframes
#print(df.isna().sum())
#print(wtrH.isna().sum())
#print(wtrD.isna().sum())

*This was the most difficult part to define, and I still have not decided whether this is the best way to do it. For now, I solved it as follows:*

- For the grain filling database, rows with missing values (1 row in this example) are simply dropped.

- For the weather database, missing temperatures are estimated with the previous recorded value. In this case the daily weather file has no missing values, so we focus on the hourly weather data.

In [103]:
df = df[pd.notnull(df['DW_mg'])] # Keep only rows where DW_mg is not null (rows with a missing value are dropped)

wtrH = wtrH.ffill()  # ffill stands for forward fill: each NaN is replaced by the last non-NaN value
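
To make the effect of forward filling concrete, here is a small sketch on a toy temperature series (hypothetical values). It also shows the `limit` argument, which caps how many consecutive missing readings are filled; whether such a cap suits this project is left open.

```python
import numpy as np
import pandas as pd

# Toy hourly temperatures with one short gap and one longer gap
temps = pd.Series([19.8, np.nan, 18.4, np.nan, np.nan, np.nan, 17.6])

filled_all = temps.ffill()            # every gap takes the last recorded value
filled_short = temps.ffill(limit=2)   # fill at most 2 consecutive missing readings

print(filled_all.tolist())
print(filled_short.tolist())
```

With `limit=2`, the third reading of the long gap stays missing, which makes long sensor outages visible instead of silently filled.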

<a name="daily_TT"></a>
***2.4 Calculate thermal time for each day***

We will use the McMaster and Wilhelm (1997) method for calculating DTT, which has been widely used in previous studies. It assumes a linear relationship between thermal time and the mean daily temperature, capped at an upper threshold.

- Tmax is the maximum temperature, Tmin is the minimum temperature, Tavg = (Tmax + Tmin)/2 
- Tbase is the base temperature = 8 °C
- Tupp is the upper threshold temperature = 40 °C

Input = Tmin and Tmax (daily)

In [104]:
# We define a function that calculates the thermal time (GD) for a given mean temperature.
def GDD1(T_avg, tbase=8, Tupp=40):
    if T_avg <= tbase:
        GD = 0
    elif T_avg >= Tupp:
        GD = Tupp - tbase
    else:
        GD = T_avg - tbase
    
    return GD

In [105]:
# We apply the function to each row in the daily weather file and we store it into a new column named GDD1.
wtrD['GDD1'] = wtrD.Average.apply(GDD1)
wtrD.head()

Unnamed: 0,Day,Max,Min,Average,GDD1
0,2017-07-05,31.4,16.8,24.1,16.1
1,2017-07-06,34.4,17.7,26.05,18.05
2,2017-07-07,30.6,17.4,24.0,16.0
3,2017-07-08,32.3,14.5,23.4,15.4
4,2017-07-09,34.9,19.0,26.95,18.95
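
The same piecewise rule can also be written without a row-by-row `apply`, by clamping the mean temperature to the interval [Tbase, Tupp] and subtracting Tbase. The sketch below uses a hypothetical helper name, `gdd_vectorized`, and toy temperatures; it is meant as an equivalent formulation, not a replacement of the function above.

```python
import numpy as np
import pandas as pd

def gdd_vectorized(t_avg, tbase=8.0, tupp=40.0):
    """Clamp the mean temperature to [tbase, tupp], then subtract tbase."""
    return np.clip(t_avg, tbase, tupp) - tbase

# Toy mean temperatures: normal day, below base, above upper threshold, missing
averages = pd.Series([24.1, 5.0, 45.0, np.nan])
gdd = gdd_vectorized(averages)
print(gdd.tolist())
```

Below Tbase the result is 0, above Tupp it is Tupp - Tbase, and a missing temperature stays missing, which matches the three branches of the `if`/`elif`/`else` version.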


<a name="sample_TT"></a>
***2.5 Summation of thermal time for each sample***

In [106]:
# Convert all the dates we will be using to the same datetime type
df['Sampling'] = pd.to_datetime(df['Sampling'], format='%m/%d/%Y')
df['Flowering'] = pd.to_datetime(df['Flowering'], format='%m/%d/%Y')
# The weather dates are written as YYYY-MM-DD (see the outputs above), so the format uses dashes
wtrD['Day'] = pd.to_datetime(wtrD['Day'], format='%Y-%m-%d')
wtrH['Day'] = pd.to_datetime(wtrH['Day'], format='%Y-%m-%d')

# In this section we calculate the sum of GDD (thermal time) from the daily weather data computed above.
# For this, we define a function that performs the summation between the flowering and sampling dates (both included)

def calc(start, end):
    TT = wtrD[(wtrD.Day >= start) & (wtrD.Day <= end)].GDD1.sum()
    return TT

In [107]:
# Then we need to apply this function to every row in the dataframe; the itertuples method is useful for this.

mylist = [] # create an empty list
for row in df.itertuples():
    TT = calc(row.Flowering, row.Sampling)
    mylist.append(TT)   #append the TT values for each row in a list

# Add that list as a new column named GDD1 in our dataframe

df['GDD1'] = np.asarray(mylist)
df.head()

Unnamed: 0,Plot,G,T,S,Time,Sampling,Flowering,DW_mg,GDD1
0,101,1,1,1,1,2017-07-20,2017-07-08,28.2436,269.7
1,102,2,2,2,1,2017-07-20,2017-07-07,39.0236,285.7
2,103,3,3,3,1,2017-07-20,2017-07-08,25.0096,269.7
3,104,1,2,4,1,2017-07-20,2017-07-09,37.0832,254.3
4,105,2,3,5,1,2017-07-20,2017-07-08,24.255,269.7
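
The per-sample summation can also be expressed by indexing the daily thermal time by date and slicing between the two dates. The following is a minimal sketch on toy data; the names `daily_gdd` and `samples` and all values are hypothetical.

```python
import pandas as pd

# Toy daily thermal time (hypothetical values), indexed by date
days = pd.date_range("2017-07-07", periods=5, freq="D")
daily_gdd = pd.Series([16.0, 15.4, 18.95, 17.0, 16.5], index=days)

samples = pd.DataFrame({
    "Flowering": pd.to_datetime(["2017-07-08", "2017-07-07"]),
    "Sampling": pd.to_datetime(["2017-07-10", "2017-07-09"]),
})

# Label-based slicing on a sorted DatetimeIndex includes both endpoints,
# so the flowering day and the sampling day are both counted
samples["GDD1"] = [
    daily_gdd.loc[start:end].sum()
    for start, end in zip(samples["Flowering"], samples["Sampling"])
]
print(samples)
```

Note that, as in `calc`, the flowering day itself contributes to the sum; whether the starting day should be included is a design choice worth stating explicitly.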


<a name="future_steps"></a>
### 3. Future Steps

- The next step will be to create a function that calculates HTT from the hourly data file, and then to calculate the summation corresponding to each sample date. Basically, it will do the same as for DTT but using hourly data.

- After that, the code should be able to retrieve the required .csv file from the Mesonet website given only the dates and the weather station.
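
One possible shape for the HTT step is sketched below on toy data. It rests on assumptions that are not settled in this project: each hourly reading is clamped with the same Tbase and Tupp as DTT, and it is divided by 24 so that the sum is expressed in degree-days. The names `hourly_tt`, `sum_htt`, and `wtr` are hypothetical.

```python
import numpy as np
import pandas as pd

def hourly_tt(temp, tbase=8.0, tupp=40.0):
    """Thermal time of one hourly reading, in degree-days (assumes 1/24 weighting)."""
    return (np.clip(temp, tbase, tupp) - tbase) / 24.0

# Toy hourly record (hypothetical): one day at a constant 20 C, one day at 32 C
hours = pd.Timestamp("2017-07-08") + pd.to_timedelta(np.arange(48), unit="h")
wtr = pd.DataFrame({"Stamp": hours, "Temp": [20.0] * 24 + [32.0] * 24})
wtr["HTT"] = hourly_tt(wtr["Temp"])

def sum_htt(start, end):
    """Sum hourly thermal time from the start day through the end day, inclusive."""
    day = wtr["Stamp"].dt.normalize()
    return wtr.loc[(day >= start) & (day <= end), "HTT"].sum()

total = sum_htt(pd.Timestamp("2017-07-08"), pd.Timestamp("2017-07-09"))
print(round(total, 2))
```

With constant temperatures the hourly and daily methods agree; they would differ on days when the temperature crosses Tbase or Tupp for only part of the day, which is the case the hourly calculation is meant to capture.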