# AwkwardNN: Trial 2

### Get ROOT Data

First, naive approach:
- ROOT files store the data in column (per-feature) format.
- The neural net, however, takes row (per-event) data in its forward pass.
- So I convert from column-based to row-based data as I read it in.
    - This is fairly slow, since I have to iterate through each row in each column. awkward-array does not currently support this type of operation in a vectorized manner.
- Result: a list of events, each with a varying number of particles, where each particle has a varying number of features.
- Note: `uproot` currently has a memory issue when reading in these features:
    - b'Particle.fBits'
    - b'Track.fBits'
    - b'Tower.fBits'
    - b'EFlowTrack.fBits'
    - b'EFlowPhoton.fBits'
    - b'EFlowNeutralHadron.fBits'
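
The column-to-row conversion described above can be sketched in plain Python. This is only an illustration, not the actual `get_events` implementation: the function name `columns_to_events` and the toy `columns` dictionary are made up for this example, and it assumes every feature column lists one value per particle per event.

```python
def columns_to_events(columns):
    """Convert {feature: [per-event list of per-particle values]} into a list of events.

    Hypothetical helper for illustration; assumes all columns describe the
    same particles, so they have equal lengths event by event.
    """
    if not columns:
        return []
    names = list(columns)
    n_events = len(columns[names[0]])
    events = []
    for i in range(n_events):
        # The number of particles varies from event to event.
        n_particles = len(columns[names[0]][i])
        event = []
        for j in range(n_particles):
            # One row per particle: gather its value from every feature column.
            event.append([float(columns[name][i][j]) for name in names])
        events.append(event)
    return events


# Toy data: 3 events with 2, 0, and 1 jets (values are invented).
columns = {
    "Jet.PT": [[50.0, 30.0], [], [20.0]],
    "Jet.Eta": [[0.1, -1.2], [], [2.3]],
}
events = columns_to_events(columns)
```

The explicit loops over events and particles are what makes this approach slow on large files. In this simplified sketch every particle ends up with the same number of features; the real data also has features whose length varies per particle.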


In [1]:
import uproot
from sklearn.model_selection import train_test_split
from awkwardNN.awkwardNN import awkwardNN
from awkwardNN.preprocessRoot import get_events

tree1 = uproot.open("../data/test_qcd_1000.root")["Delphes"]
tree2 = uproot.open("../data/test_ttbar_1000.root")["Delphes"]

# Choose which fields to train on by name (patterns such as "Jet*" are accepted)
fields = ["Jet*"]
X1 = get_events(tree1, fields)
X2 = get_events(tree2, fields)
# Label QCD events 1 and ttbar events 0
y1 = [1] * len(X1)
y2 = [0] * len(X2)
X = X1 + X2
y = y1 + y2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)




### Create and train pytorch neural net

Training procedure:
- For each event:
    - initialize first hidden state
    - For each particle
        - initialize second hidden state
        - For each feature in particle
            - pass feature + hidden states through RNN
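
The nested loops above can be sketched as a forward pass in plain Python. This is one possible reading of the procedure, not the awkwardNN implementation: it assumes the inner step sees the feature together with both hidden states, and that each finished particle then updates the event-level state. The helper names, the random weights, and the hidden size of 4 are all made up for illustration.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_step(weights, inputs):
    """One tanh RNN step: new_hidden = tanh(W @ inputs)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def forward_event(event, w_inner, w_outer, hidden_size):
    """Run the nested loops over one event; return the final event-level state."""
    h_event = [0.0] * hidden_size            # first hidden state, reset per event
    for particle in event:
        h_particle = [0.0] * hidden_size     # second hidden state, reset per particle
        for feature in particle:
            # Inner step: the feature plus both hidden states go through the cell.
            h_particle = rnn_step(w_inner, [feature] + h_event + h_particle)
        # Assumption: the particle summary then updates the event-level state.
        h_event = rnn_step(w_outer, h_particle + h_event)
    return h_event


rng = random.Random(0)
hidden_size = 4
w_inner = make_matrix(hidden_size, 1 + 2 * hidden_size, rng)
w_outer = make_matrix(hidden_size, 2 * hidden_size, rng)

# One toy event: three particles with 2, 3, and 1 features.
event = [[0.5, -1.2], [0.1, 0.3, 2.0], [1.5]]
h_final = forward_event(event, w_inner, w_outer, hidden_size)
```

Because both loops run over Python lists, ragged events need no padding. A classifier would map `h_final` to a class score with one more layer.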


In [2]:
from awkwardNN.awkwardNN import awkwardNN

# Train for 10 epochs.
# Other arguments can be set when initializing awkwardNN,
# e.g. hidden layer sizes, number of layers, etc.
model1 = awkwardNN(mode='rnn', max_iter=10)
#model1 = awkwardNN(mode='lstm', max_iter=10)
#model1 = awkwardNN(mode='gru', max_iter=10)

model1.train(X_train, y_train)
model1.test(X_test, y_test)


Valid set (epoch 1:
    Avg. loss: 0.6935, Accuracy: 93/180 (52%) [*]

Valid set (epoch 2:
    Avg. loss: 0.6931, Accuracy: 92/180 (51%)

Valid set (epoch 3:
    Avg. loss: 0.6919, Accuracy: 102/180 (57%) [*]

Valid set (epoch 4:
    Avg. loss: 0.6921, Accuracy: 106/180 (59%) [*]

Valid set (epoch 5:
    Avg. loss: 0.6934, Accuracy: 86/180 (48%)

Valid set (epoch 6:
    Avg. loss: 0.6941, Accuracy: 86/180 (48%)

Valid set (epoch 7:
    Avg. loss: 0.6933, Accuracy: 86/180 (48%)

Valid set (epoch 8:
    Avg. loss: 0.6940, Accuracy: 86/180 (48%)

Valid set (epoch 9:
    Avg. loss: 0.6939, Accuracy: 86/180 (48%)

Valid set (epoch 10:
    Avg. loss: 0.6939, Accuracy: 86/180 (48%)

[*] Test set:
    Avg. loss: 0.6925, Accuracy: 107/200 (54%)



(0.6924584510422283, tensor(53.5000))

### Can also make DeepSet networks

Model:
- Two DeepSets
    - one over the particles in an event
    - one over the features in a particle
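
The two-level structure can be sketched with sum pooling at each level. This is a minimal illustration under stated assumptions, not the awkwardNN model: the fixed `WEIGHTS`, the `tanh` embeddings, and the function names are hypothetical stand-ins for learned layers.

```python
import math

WEIGHTS = [0.5, -1.0, 2.0]  # hypothetical fixed weights; a real model would learn these


def phi(x):
    """Embed one scalar feature as a small vector."""
    return [math.tanh(w * x) for w in WEIGHTS]


def sum_pool(vectors):
    """Sum vectors element-wise; an empty collection gives the zero vector."""
    total = [0.0] * len(WEIGHTS)
    for vec in vectors:
        total = [t + v for t, v in zip(total, vec)]
    return total


def embed_particle(particle):
    """Inner DeepSet: pool over the features of one particle."""
    pooled = sum_pool(phi(f) for f in particle)
    return [math.tanh(v) for v in pooled]


def embed_event(event):
    """Outer DeepSet: pool over the particle embeddings of one event."""
    pooled = sum_pool(embed_particle(p) for p in event)
    return [math.tanh(v) for v in pooled]


event = [[0.5, -1.2], [0.1, 0.3, 2.0], [1.5]]
# Same event with the particles and the features inside them reordered.
shuffled = [[1.5], [2.0, 0.1, 0.3], [-1.2, 0.5]]
out = embed_event(event)
out_shuffled = embed_event(shuffled)
```

Since each level only sums its inputs, `out` and `out_shuffled` agree up to floating-point rounding. The output does not depend on the order of particles or features, unlike the RNN modes.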

In [3]:
model2 = awkwardNN(mode='deepset', max_iter=10)
model2.train(X_train, y_train)
model2.test(X_test, y_test)





Valid set (epoch 1:
    Avg. loss: 934.4925, Accuracy: 102/180 (57%) [*]

Valid set (epoch 2:
    Avg. loss: 312.9083, Accuracy: 77/180 (43%)

Valid set (epoch 3:
    Avg. loss: 228.3764, Accuracy: 77/180 (43%)

Valid set (epoch 4:
    Avg. loss: 1076.7963, Accuracy: 102/180 (57%)

Valid set (epoch 5:
    Avg. loss: 222.8156, Accuracy: 77/180 (43%)

Valid set (epoch 6:
    Avg. loss: 262.3893, Accuracy: 102/180 (57%)

Valid set (epoch 7:
    Avg. loss: 220.2862, Accuracy: 102/180 (57%)

Valid set (epoch 8:
    Avg. loss: 439.0076, Accuracy: 102/180 (57%)

Valid set (epoch 9:
    Avg. loss: 17.4613, Accuracy: 102/180 (57%)

Valid set (epoch 10:
    Avg. loss: 532.0515, Accuracy: 102/180 (57%)

[*] Test set:
    Avg. loss: 1053.0035, Accuracy: 107/200 (54%)



(1053.0034724855886, tensor(53.5000))