# Awkward NN Trial 2

## Get ROOT Data

### First (naive) approach to getting the data

- The data is stored in column/feature format in ROOT files.

- But we want (I assume) to pass data through the network in row/event format.

- So I convert the data from column-based to row-based format as I
read it in.

    - This is slow, though, since I have to iterate through each row in
      each column. Currently, awkward-array does not support this type of
      operation in a vectorized manner.

- Result: a list of events, each with a varying number of particles,
where each particle can have a varying number of fields.

- Note: Some fields are interesting in that...

    - they are also nested, e.g. for an event that has 3 jets, the field
      `Jet.Mass` has 3 floating-point numbers (one for each jet), but the
      field `Jet.Tau[5]` has 3 vectors of length five. When I convert this
      field to row/event format, I flatten it so that five numbers are added
      to the jet instead of a single 5-vector.

    - the length of the field is a constant multiple of the number of
      subevents, e.g. an event with 3 jets has a field `Jet.TrimmedP4[5]` with
      15 floating-point numbers. When I convert this, I add the first 5 to the
      first jet, the next 5 to the second jet, and so on. I do this for all
      such fields, checking that `len(field) % num_subevents == 0`.

    - some fields have varying lengths, e.g. while each jet has 1 `Jet.Mass`
      value and 5 `Jet.TrimmedP4[5]` values, it has a variable number of
      `Jet.Constituents`, which makes the data "double-jagged".

- Note: For fields of type `TLorentzVector`, I replace the field with its energy attribute `E`, although this choice can of course be changed.

- Note: `uproot` currently runs into a memory issue when reading the following features:
    - `b'Particle.fBits'`
    - `b'Track.fBits'`
    - `b'Tower.fBits'`
    - `b'EFlowTrack.fBits'`
    - `b'EFlowPhoton.fBits'`
    - `b'EFlowNeutralHadron.fBits'`
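
The conversion rules above (per-jet scalars, flattened nested vectors, fixed-multiple fields split into chunks, and variable-length fields) can be sketched in plain Python. This is a minimal sketch, not the `get_events` implementation: it assumes one event's fields arrive as plain Python lists keyed by field name, and the helper name `columns_to_rows` and the toy values are hypothetical.

```python
from typing import Dict, List, Sequence


def columns_to_rows(columns: Dict[str, Sequence], n_sub: int) -> List[list]:
    """Hypothetical helper: turn one event's per-field columns into one row per subevent (e.g. jet)."""
    if n_sub == 0:
        return []
    rows: List[list] = [[] for _ in range(n_sub)]
    for name, values in columns.items():
        # Every field must split evenly across the subevents.
        if len(values) % n_sub != 0:
            raise ValueError(f"{name}: length {len(values)} is not a multiple of {n_sub}")
        k = len(values) // n_sub  # 1 for per-jet fields, 5 for a flat field of 5 values per jet
        for i in range(n_sub):
            for v in values[i * k:(i + 1) * k]:
                if isinstance(v, (list, tuple)):
                    rows[i].extend(v)  # nested entry (fixed or variable length): flatten it
                else:
                    rows[i].append(v)
    return rows


# Toy event with 3 jets (hypothetical values, not read from a ROOT file).
event = {
    "Jet.Mass": [10.0, 20.0, 30.0],
    "Jet.Tau[5]": [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]],
    "Jet.TrimmedP4[5]": list(range(15)),          # 3 jets x 5 values, stored flat
    "Jet.Constituents": [[0.1], [0.2, 0.3], []],  # variable length per jet
}
rows = columns_to_rows(event, n_sub=3)
print([len(r) for r in rows])  # [12, 13, 11]
```

The differing row lengths (12, 13, 11) come entirely from the variable-length field, which is what makes this toy event double-jagged.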

```python
import uproot
from sklearn.model_selection import train_test_split
from awkwardNN.preprocessRoot import get_events
from awkwardNN.awkwardNN import awkwardNN

tree1 = uproot.open("../data/test_qcd_1000.root")["Delphes"]
tree2 = uproot.open("../data/test_ttbar_1000.root")["Delphes"]

# Get data that is "double-jagged":
# every event has a varying number of jets and every jet has a varying number of fields
varying_fields = ["Jet*"]
X1 = get_events(tree1, varying_fields)
X2 = get_events(tree2, varying_fields)
y1 = [1] * len(X1)  # QCD events
y2 = [0] * len(X2)  # ttbar events
X = X1 + X2
y = y1 + y2
X_train_double_jagged, X_test_double_jagged, y_train_double_jagged, y_test_double_jagged = train_test_split(
    X, y, test_size=0.1)

# Get data that is "single-jagged": every event has a varying number of particles,
# but every particle in every event has the same number of fields
fixed_fields = ['Particle.E', 'Particle.P[xyz]']
X1 = get_events(tree1, fixed_fields)
X2 = get_events(tree2, fixed_fields)
X = X1 + X2
y = [1] * len(X1) + [0] * len(X2)  # recompute labels instead of reusing y from above
X_train_single_jagged, X_test_single_jagged, y_train_single_jagged, y_test_single_jagged = train_test_split(
    X, y, test_size=0.1)
```


## Create and train a PyTorch neural net

Two types of AwkwardNN so far: RNN/LSTM/GRU and Deepset.

### For RNN single-jagged: training procedure (essentially a standard RNN)
- For each event:
    - Initialize hidden state
    - For each particle
        - Pass particle + hidden state through RNN
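
The single-jagged procedure above is an ordinary recurrent forward pass with one time step per particle. A minimal NumPy sketch, assuming a vanilla tanh RNN cell and a sigmoid readout of the final hidden state for the binary QCD/ttbar label; the weights and names (`W_x`, `W_h`, `w_out`, `rnn_event_score`) are hypothetical, training is omitted, and this is not the awkwardNN code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 4, 8

# Hypothetical, randomly initialised parameters of a vanilla (Elman) RNN cell.
W_x = rng.normal(scale=0.1, size=(n_hidden, n_features))
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
w_out = rng.normal(scale=0.1, size=n_hidden)


def rnn_event_score(event: np.ndarray) -> float:
    """event: array of shape (n_particles, n_features); n_particles may differ per event."""
    h = np.zeros(n_hidden)  # initialise the hidden state once per event
    for particle in event:  # one time step per particle
        h = np.tanh(W_x @ particle + W_h @ h + b_h)
    return float(1.0 / (1.0 + np.exp(-(w_out @ h))))  # sigmoid readout of the final state


# Three toy events with 3, 7 and 1 particles (random values).
events = [rng.normal(size=(n, n_features)) for n in (3, 7, 1)]
scores = [rnn_event_score(e) for e in events]
```

Because the loop runs once per particle, events with different particle counts need no padding.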

### For RNN double-jagged: training procedure (one RNN with two hidden states)

- For each event:
    - Initialize first hidden state
    - For each particle
        - Initialize second hidden state
        - For each feature in particle
            - Pass feature + hidden states through RNN
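
The procedure above does not spell out how the two hidden states interact, so this NumPy sketch is one plausible reading under stated assumptions: the second (inner) state steps over a particle's scalar features while also seeing the first (outer) state, and the outer state is updated once per particle from the inner state's final value. All weights and names are hypothetical; this is not the awkwardNN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_outer, n_inner = 8, 6

# Hypothetical parameters for a two-level recurrence.
w_f = rng.normal(scale=0.1, size=n_inner)               # scalar feature -> inner state
U_in = rng.normal(scale=0.1, size=(n_inner, n_inner))   # inner -> inner
V_in = rng.normal(scale=0.1, size=(n_inner, n_outer))   # outer state feeds each inner step
U_out = rng.normal(scale=0.1, size=(n_outer, n_outer))  # outer -> outer
V_out = rng.normal(scale=0.1, size=(n_outer, n_inner))  # particle summary -> outer
w_out = rng.normal(scale=0.1, size=n_outer)


def double_jagged_score(event) -> float:
    """event: list of particles, each a list of scalar features of any length."""
    h1 = np.zeros(n_outer)  # first hidden state: initialised once per event
    for particle in event:
        h2 = np.zeros(n_inner)  # second hidden state: initialised once per particle
        for feature in particle:  # one step per scalar feature
            h2 = np.tanh(w_f * feature + U_in @ h2 + V_in @ h1)
        h1 = np.tanh(U_out @ h1 + V_out @ h2)  # fold the particle summary into the event state
    return float(1.0 / (1.0 + np.exp(-(w_out @ h1))))


# Toy double-jagged event: particles with 3, 1 and 5 fields.
event = [[10.0, 1.0, 2.0], [20.0], [30.0, 0.5, 0.5, 0.5, 0.5]]
score = double_jagged_score(event)
```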

```python
# Use the keyword argument `mode` to specify whether it is an `rnn`, `lstm`, `gru`, or `deepset` (next section).

# Use the keyword argument `feature_size_fixed` to distinguish between the single-jagged RNN
# (feature_size_fixed=True) and the double-jagged RNN (feature_size_fixed=False (default)).

num_epochs = 100

# single jagged
rnn_single_jagged = awkwardNN(mode='rnn', max_iter=num_epochs, verbose=True, feature_size_fixed=True)
rnn_single_jagged.train(X_train_single_jagged, y_train_single_jagged)
rnn_single_jagged.test(X_test_single_jagged, y_test_single_jagged)

# double jagged
rnn_double_jagged = awkwardNN(mode='rnn', max_iter=num_epochs, verbose=True, feature_size_fixed=False)
rnn_double_jagged.train(X_train_double_jagged, y_train_double_jagged)
rnn_double_jagged.test(X_test_double_jagged, y_test_double_jagged)
```
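
Whether to pass `feature_size_fixed=True` or `False` depends on whether every particle has the same number of fields. A minimal sketch of that check, assuming events are nested Python lists (events of particles of field values) as described in the data section; `is_single_jagged` is a hypothetical helper, not part of awkwardNN.

```python
from typing import Sequence


def is_single_jagged(events: Sequence[Sequence[Sequence[float]]]) -> bool:
    """True if every particle in every event has the same number of fields.

    Events may still hold different numbers of particles (single-jagged);
    if particle lengths differ, the data is double-jagged.
    """
    lengths = {len(particle) for event in events for particle in event}
    return len(lengths) <= 1


# Toy data (hypothetical values, not from the ROOT files above).
single = [[[1.0, 0.1, 0.2, 0.3]], [[2.0, 0.4, 0.5, 0.6], [3.0, 0.7, 0.8, 0.9]]]
double = [[[10.0, 1.0, 2.0], [20.0, 3.0]], [[30.0]]]
print(is_single_jagged(single))  # True
print(is_single_jagged(double))  # False
```

Its result could be passed straight through, e.g. `feature_size_fixed=is_single_jagged(X_train)`.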


### For Deepset single-jagged: training procedure (a standard deep set)

- Two networks: a phi network and a rho network
- For each event:
    - For each particle:
        - Pass particle through phi network
    - Sum outputs from phi network
    - Pass sum through rho network
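
The phi/sum/rho procedure above can be sketched in NumPy. This is a minimal forward pass under stated assumptions: phi is a single ReLU layer applied to every particle, rho is a linear layer with a sigmoid output, and the weights and names are hypothetical; it is not the awkwardNN implementation. Summing over particles is what makes the score independent of particle order.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_latent = 4, 16

# Hypothetical random weights for a one-layer phi and a linear rho.
W_phi = rng.normal(scale=0.1, size=(n_latent, n_features))
b_phi = np.zeros(n_latent)
w_rho = rng.normal(scale=0.1, size=n_latent)


def phi(particles: np.ndarray) -> np.ndarray:
    """Apply phi to every particle: (n_particles, n_features) -> (n_particles, n_latent)."""
    return np.maximum(0.0, particles @ W_phi.T + b_phi)  # ReLU


def rho(summed: np.ndarray) -> float:
    return float(1.0 / (1.0 + np.exp(-(w_rho @ summed))))  # sigmoid score


def deepset_score(event: np.ndarray) -> float:
    return rho(phi(event).sum(axis=0))  # sum over particles, then rho


event = rng.normal(size=(5, n_features))  # toy event with 5 particles
score = deepset_score(event)
score_reversed = deepset_score(event[::-1])  # same particles, reversed order
```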

### For Deepset double-jagged: training procedure (essentially two Deepsets stacked on each other)

- One deepset for particles in events
- One deepset for features in particle
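
The stacked design above can be sketched as two nested deep sets: an inner one that turns a particle's scalar features (any number of them) into a fixed-size embedding, and an outer one over those embeddings. A minimal NumPy sketch with hypothetical weights, layer sizes, and names; the exact phi/rho architectures are assumptions, not the awkwardNN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inner, n_outer = 8, 16

# Hypothetical weights. Inner deep set: scalar features -> particle embedding.
w_phi_in = rng.normal(scale=0.1, size=n_inner)
W_rho_in = rng.normal(scale=0.1, size=(n_outer, n_inner))
# Outer deep set: particle embeddings -> event score.
W_phi_out = rng.normal(scale=0.1, size=(n_outer, n_outer))
w_rho_out = rng.normal(scale=0.1, size=n_outer)


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, z)


def embed_particle(features) -> np.ndarray:
    """Inner deep set over one particle's scalar features."""
    f = np.asarray(features, dtype=float)[:, None]  # (n_features, 1)
    summed = relu(f * w_phi_in).sum(axis=0)         # phi_inner per feature, then sum
    return relu(W_rho_in @ summed)                  # rho_inner -> (n_outer,)


def double_deepset_score(event) -> float:
    """Outer deep set over the particle embeddings of one event."""
    summed = np.zeros(n_outer)
    for particle in event:
        summed += relu(W_phi_out @ embed_particle(particle))  # phi_outer, summed
    return float(1.0 / (1.0 + np.exp(-(w_rho_out @ summed))))  # rho_outer


event = [[10.0, 1.0, 2.0], [20.0], [30.0, 0.5]]
permuted = [[0.5, 30.0], [20.0], [2.0, 10.0, 1.0]]  # shuffled particles and features
score = double_deepset_score(event)
score_permuted = double_deepset_score(permuted)
```

Because the inner deep set sums over a particle's features, this sketch treats them as an unordered set: a jet's mass and one of its `Tau` values are interchangeable inputs to it.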


```python
# single jagged
deepset_single_jagged = awkwardNN(mode='deepset', max_iter=num_epochs, verbose=True, feature_size_fixed=True)
deepset_single_jagged.train(X_train_single_jagged, y_train_single_jagged)
deepset_single_jagged.test(X_test_single_jagged, y_test_single_jagged)

# double jagged
deepset_double_jagged = awkwardNN(mode='deepset', max_iter=num_epochs, verbose=True, feature_size_fixed=False)
deepset_double_jagged.train(X_train_double_jagged, y_train_double_jagged)
deepset_double_jagged.test(X_test_double_jagged, y_test_double_jagged)
```
