## CNN Architecture and Training

Because both the training data and the trained CNN models are large, neither the models nor the full datasets have been uploaded to GitHub. Only the code for encoding the images through VGG16 (intentionally commented out) and for training the CNN models is given here. Although height is the target trait in this code, the same network architecture was used for all models trained in the paper.

In [5]:
import keras as k
import numpy as np
import cv2

In [4]:
"""Data set-up: because the files are too large, these lines are included only to show how the training and testing data sets
were constructed. Note that random splitting into training and testing sets occurred prior to this step, so training images
were stored first, followed by test images."""

#vg = k.applications.vgg16.VGG16(include_top=False, input_shape=(1028,1227,3))
#encd_img = np.zeros((2000,32,38,512))
#for i in range(2000):
#    img_test = np.divide(cv2.imread('C:/Users/jason/Documents/Soy/SoyImg'+str(i)+'.png'),255)
#    i_test = np.zeros((1,1028,1227,3))
#    i_test[0,:,:,:] = img_test
#    encd_img[i,:,:,:] = vg.predict(i_test)

#soy_hts = np.loadtxt('C:/Users/jason/Documents/Soy/Data/hts_from_seg.txt')
#soy_train = encd_img[0:1800,:,:,:]
#soy_test = encd_img[1800:2000,:,:,:]
#hts_train = soy_hts[0:1800]
#hts_test = soy_hts[1800:2000]
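
The encoded shape `(32,38,512)` used above follows from the VGG16 convolutional base: with `include_top=False` it applies five 2x2 max-pooling stages and ends with 512 channels, so each spatial dimension is floor-halved five times. The sketch below works through that arithmetic and estimates the memory taken by the `encd_img` array, assuming the default `float64` dtype of `np.zeros`. The helper name `vgg16_encoded_shape` is hypothetical and not part of the original code.

```python
def vgg16_encoded_shape(height, width, n_pool=5, channels=512):
    """Output shape of VGG16 (include_top=False) for a height x width image.

    Assumes the convolutions preserve spatial size and that each of the
    five 2x2, stride-2 max-pooling layers floor-halves height and width.
    """
    for _ in range(n_pool):
        height //= 2
        width //= 2
    return (height, width, channels)

encoded_shape = vgg16_encoded_shape(1028, 1227)

# Memory for 2000 encoded images stored as float64 (8 bytes per value).
values_per_image = encoded_shape[0] * encoded_shape[1] * encoded_shape[2]
total_bytes = 2000 * values_per_image * 8

print(encoded_shape)
print(total_bytes / 1e9, "GB")
```

Under these assumptions the full encoded array needs roughly 10 GB, which is consistent with the RAM constraints mentioned in this notebook.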

In [None]:
###Create model architecture into which encoded images can be fed###
soy_vgg = k.models.Sequential()
soy_vgg.add(k.layers.Flatten(data_format = 'channels_last',input_shape = (32,38,512)))
soy_vgg.add(k.layers.Dense(64,activation='relu'))
soy_vgg.add(k.layers.Dense(64,activation='relu'))
soy_vgg.add(k.layers.Dense(1,activation='linear'))
#soy_vgg.summary()  #used to view a summary of the model
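
As a rough check on model size, the parameter count of the architecture above can be worked out by hand. This is a minimal sketch using only the layer sizes given in the code; `dense_params` is a hypothetical helper, and `soy_vgg.summary()` remains the authoritative source for the actual counts.

```python
# Layer sizes taken from the architecture above.
flat_features = 32 * 38 * 512  # length of the flattened encoding

def dense_params(n_in, n_out):
    """Weights plus biases for one fully connected layer."""
    return n_in * n_out + n_out

layer_params = [
    dense_params(flat_features, 64),  # first hidden layer
    dense_params(64, 64),             # second hidden layer
    dense_params(64, 1),              # linear output layer
]
total_params = sum(layer_params)

print(flat_features, layer_params, total_params)
```

Almost all of the parameters sit in the first dense layer, because the flattened VGG16 encoding is so wide.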

In [None]:
###Model compilation - adam optimizer, mse loss function, and tracking mae through training###
soy_vgg.compile(loss = 'mse',optimizer = k.optimizers.Adam(0.0008), metrics = ['mae'])  #Adam with learning rate 0.0008
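
For reference, the two quantities named in the compile step can be computed directly with NumPy. The heights below are made-up values used purely for illustration; they are not data from the paper.

```python
import numpy as np

# Made-up heights purely for illustration; not values from the paper.
y_true = np.array([80.0, 100.0, 120.0])
y_pred = np.array([78.0, 103.0, 120.0])

errors = y_pred - y_true
mse = np.mean(errors ** 2)     # mean squared error: the loss being minimized
mae = np.mean(np.abs(errors))  # mean absolute error: the tracked metric

print(mse, mae)
```

MSE penalizes large errors more heavily, while MAE stays in the same units as the target trait and is easier to interpret.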

In [None]:
###Model training - validation split = 0.1111 holds out 1/9 (200) of the training images as validation images###

#hist = soy_vgg.fit(soy_train, hts_train, epochs = 100, batch_size = 128, validation_split = 0.1111)
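
Keras takes the validation samples from the end of the arrays passed to `fit`, before any shuffling. The sketch below shows which training images end up in the validation set, assuming the split index is computed as `int(n * (1 - validation_split))`; that formula is an assumption about the Keras internals and may differ between versions.

```python
import numpy as np

n_train = 1800
validation_split = 0.1111

# Assumed split rule: the first split_at samples are used for fitting,
# the remaining samples are held out for validation.
split_at = int(n_train * (1.0 - validation_split))

indices = np.arange(n_train)
fit_idx, val_idx = indices[:split_at], indices[split_at:]

print(len(fit_idx), len(val_idx), val_idx[0], val_idx[-1])
```

Under this assumption the last 200 training images (indices 1600 to 1799) form the validation set, matching the 1/9 hold-out described above.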

In [None]:
"""Model prediction on test images. With enough RAM, this section of code could simply be soy_vgg.predict(soy_test).
However, because of the large RAM requirements, it was necessary for us to predict on each image individually. Because the
model is trained on a four-dimensional array (one dimension each for the number of images, height, width, and channels),
a four-dimensional array is required as input. Thus such an array is first constructed for each image in the test set
before prediction takes place."""
pred_hts = np.zeros(200)
for i in range(200):
    img_in = np.zeros((1,32,38,512))
    img_in[0,:,:,:] = soy_test[i,:,:,:]
    pred_hts[i] = soy_vgg.predict(img_in)[0,0]  #predict returns shape (1,1); take the scalar
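
The per-image loop above can be generalized to small chunks, which keeps memory use bounded while reducing the number of `predict` calls. This is a minimal sketch, not the method used in the paper: `predict_in_chunks` is a hypothetical helper, and a stand-in function replaces the trained model so the example runs without the data. Slicing with `features[start:start + chunk_size]` keeps the leading image dimension, so no separate four-dimensional array has to be built.

```python
import numpy as np

def predict_in_chunks(predict_fn, features, chunk_size=1):
    """Apply predict_fn to consecutive chunks of features along axis 0.

    predict_fn is assumed to return one value per input sample,
    for example an array of shape (n, 1) as a single-output model would.
    """
    n = features.shape[0]
    preds = np.zeros(n)
    for start in range(0, n, chunk_size):
        chunk = features[start:start + chunk_size]  # stays four-dimensional
        out = np.asarray(predict_fn(chunk)).reshape(-1)
        preds[start:start + chunk.shape[0]] = out
    return preds

# Stand-in for a trained model: mean activation per image, shape (n, 1).
def dummy_predict(batch):
    return batch.mean(axis=(1, 2, 3)).reshape(-1, 1)

demo_features = np.arange(5 * 2 * 2 * 3, dtype=float).reshape(5, 2, 2, 3)
demo_preds = predict_in_chunks(dummy_predict, demo_features, chunk_size=2)
print(demo_preds)

# With the real model this would be called as (not run here):
# pred_hts = predict_in_chunks(soy_vgg.predict, soy_test, chunk_size=1)
```

Keras `predict` also accepts a `batch_size` argument, which may achieve a similar effect if the full `soy_test` array itself fits in memory.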