# ML Challenge 6 Part 2

Here I want to train a model on the PNG images generated in Part 1. To start, I'd like to train a convolutional network on the single-pulse images. This removes the need for the network to learn whether there are 0, 1, or 2 pulses, so I can focus on just getting the amplitude and time labels right. My hypothesis is that the resulting kernels will take the form of "V" shapes that can be fit to the downward-facing peaks. The network should then extract the time and amplitude from the position the kernel was in when it best matched the peak. To help get the best match, I chose a kernel shape matched to the peak shape (i.e. narrow and tall).
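The matched-kernel idea can be illustrated outside the network with a small NumPy sketch. This is not the trained model: the image is synthetic, the "V" template is built by hand, and the names `make_pulse_image`, `v_template`, `locate_peak`, `SLOPE`, and `BASELINE_ROW` are all hypothetical. It assumes a one-pixel-wide trace whose pulse flanks have a fixed slope, slides a narrow, tall template holding only the tip of the "V" over the image, and reads the time and amplitude from the position of the best match.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SLOPE = 4          # rows per column on each flank of the "V" (assumed)
BASELINE_ROW = 10  # row of the flat baseline (assumed); rows increase downward

def make_pulse_image(height, width, t_peak, amplitude):
    """Draw a 1-pixel-wide trace with a single downward-facing V-shaped pulse."""
    cols = np.arange(width)
    depth = np.maximum(0, amplitude - SLOPE * np.abs(cols - t_peak))
    img = np.zeros((height, width))
    img[BASELINE_ROW + depth, cols] = 1.0
    return img

def v_template(half_width=2):
    """Narrow, tall kernel holding only the tip of the V."""
    cols = np.arange(2 * half_width + 1)
    rows = SLOPE * (half_width - np.abs(cols - half_width))
    tmpl = np.zeros((SLOPE * half_width + 1, cols.size))
    tmpl[rows, cols] = 1.0
    return tmpl

def locate_peak(img, tmpl):
    """Slide the template over the image; return (time, amplitude) at the best match."""
    windows = sliding_window_view(img, tmpl.shape)    # every placement of the kernel
    score = np.einsum('ijkl,kl->ij', windows, tmpl)   # cross-correlation map
    r, c = np.unravel_index(np.argmax(score), score.shape)
    tip_row = r + tmpl.shape[0] - 1                   # bottom row of the kernel
    t_est = c + tmpl.shape[1] // 2                    # centre column of the kernel
    return int(t_est), int(tip_row - BASELINE_ROW)

img = make_pulse_image(height=64, width=128, t_peak=40, amplitude=30)
print(locate_peak(img, v_template()))  # prints (40, 30)
```

For this synthetic image the best match recovers the peak time (column 40) and the amplitude (30 rows below the baseline). Real pulses will not have a fixed slope or a known baseline, which is part of what the network is expected to learn.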

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

top_dir = '/work/halld2/home/davidl/2020.08.12.ML_challenge6/PNG/one_pulse_lines'
labels_file = top_dir+'/labels.csv'

# Read in filenames + labels  (filename,ped,A1,t1,A2,t2)
df = pd.read_csv(labels_file)
print('%d images in %s' % (len(df), top_dir))

labels_cols = ['ped','A1','t1']

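# Create data generator (pixel values rescaled from [0, 255] to [0, 1]).
# Note: validation_split only takes effect when a `subset` is passed to
# flow_from_dataframe. None is passed below, so all images are used.
datagen = ImageDataGenerator(rescale=1.0/255, validation_split=0.2)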

my_generator = datagen.flow_from_dataframe(
    dataframe=df,
    directory=top_dir,
    x_col='filename',
    y_col=labels_cols,
    target_size=(128, 612),
    color_mode='grayscale',
    classes=None,
    class_mode='raw',
    batch_size=32,
    shuffle=True,
    validate_filenames=False)


63000 images in /work/halld2/home/davidl/2020.08.12.ML_challenge6/PNG/one_pulse_lines
Found 63000 non-validated image filenames.


## Define the Model

In [4]:
from tensorflow.keras.models import Model, Sequential, load_model
from tensorflow.keras.layers import Dense, Reshape, Flatten, Input, Lambda, Conv2D, Activation, MaxPooling2D, Dropout
from tensorflow.keras.optimizers import Adadelta

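model = Sequential()
# One convolutional layer with 8 large kernels. Keras takes kernel_size as
# (height, width), so (20, 100) spans 20 rows by 100 columns of the
# 128 x 612 image, i.e. it is wider than it is tall.
model.add(Conv2D(8, (20, 100), padding='same', input_shape=(128,612,1), activation='relu'))
model.add(Flatten())
# Dense head. The 3 outputs are the regression targets ped, A1, t1.
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='linear'))
model.add(Dense(3, activation='linear'))

# Adadelta with gradient-norm clipping. Train on MAE and also report MSE.
opt = Adadelta(clipnorm=1.0)
model.compile(optimizer=opt, loss="mae", metrics=['mse','mae'])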

model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_2 (Conv2D)            (None, 128, 612, 8)       16008     
_________________________________________________________________
flatten_2 (Flatten)          (None, 626688)            0         
_________________________________________________________________
dense_4 (Dense)              (None, 32)                20054048  
_________________________________________________________________
dense_5 (Dense)              (None, 16)                528       
_________________________________________________________________
dense_6 (Dense)              (None, 3)                 51        
=================================================================
Total params: 20,070,635
Trainable params: 20,070,635
Non-trainable params: 0
_________________________________________________________________
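The parameter counts in the summary can be reproduced by hand, which makes it clear where the 20 million parameters come from. The sketch below uses the standard counting rules for convolutional and dense layers with biases. The helper names `conv2d_params` and `dense_params` are hypothetical and are not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter plus one bias, times the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_in, n_out):
    """One weight per input plus one bias, for each output unit."""
    return (n_in + 1) * n_out

height, width, filters = 128, 612, 8

conv = conv2d_params(20, 100, 1, filters)
flat = height * width * filters        # padding='same' keeps the 128 x 612 size
d1 = dense_params(flat, 32)
d2 = dense_params(32, 16)
d3 = dense_params(16, 3)
total = conv + d1 + d2 + d3

print(conv, flat, d1, d2, d3, total)
print(f"share of parameters in the first Dense layer: {d1 / total:.4f}")
```

The first Dense layer, which connects the 626,688 flattened convolution outputs to 32 units, holds more than 99.9% of the parameters. The convolutional kernels themselves account for only 16,008.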


In [5]:
model.fit(
    x=my_generator, epochs=1, verbose=1,
    shuffle=True,
    use_multiprocessing=False
)

   1/1969 [..............................] - ETA: 0s - loss: 75.4777 - mse: 28726.0801 - mae: 75.4777

KeyboardInterrupt: 
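The 1969 steps shown in the progress bar follow from the dataset size and batch size, and they indicate that all 63000 images are being used for training. The arithmetic below is a sketch of that check. The 80/20 numbers are only what one would expect if the generator were created with `subset='training'` and `subset='validation'`, which is an assumption about intent and not something the cells above do.

```python
import math

n_images = 63000
batch_size = 32
validation_split = 0.2

# Steps per epoch when every image is fed to fit(), as in the run above.
steps_all = math.ceil(n_images / batch_size)

# Hypothetical 80/20 split (assumes subset= is passed to flow_from_dataframe).
n_val = int(n_images * validation_split)
n_train = n_images - n_val
steps_train = math.ceil(n_train / batch_size)
steps_val = math.ceil(n_val / batch_size)

print(steps_all)                # 1969, matches the progress bar
print(n_train, steps_train)     # 50400 1575
print(n_val, steps_val)         # 12600 394
```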