# FCCSchema Showcase

In [1]:
import coffea
print(f"Coffea Development : {coffea.__version__}\nModified by : {coffea.__mod__}")

Coffea Development : 0.1.dev3583+ge06c4b8
Modified by : Prayag Yadav


## Some comments

- This is a first attempt to build a useful schema for FCC event samples
- I used the Spring 2021 samples to build the schema
- The EDM4HEP schema by @jbrewster7 and @lgray is incompatible with the FCC events in several respects:
    - Even though FCC events are built on edm4hep classes, the names of the branches are different.
    - The FCC events have indexed branches like Electron#0, which are not accommodated in the EDM4HEP schema
    - The EDM4HEP schema defines some special classes like Track, Cluster, RecoParticle and MCTruthParticle. I don't understand the purpose of all of them, apart from RecoParticle and MCTruthParticle.
- RecoParticle and MCTruthParticle are mixin classes defined in coffea/nanoevents/methods/fcc.py
- RecoParticle and MCTruthParticle inherit from LorentzVector, so branches with those record names can use the LorentzVector functions
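
To illustrate the inheritance idea in the last bullet, here is a minimal sketch in plain Python. The classes below are toy stand-ins written for this note, not the actual coffea classes: they only show how a record that stores raw components (x, y, z, E) picks up derived quantities from a parent mixin.

```python
import math


class LorentzVector:
    """Toy stand-in for a four-vector mixin; expects x, y, z, E attributes."""

    @property
    def pt(self):
        # Transverse momentum from the x and y momentum components.
        return math.hypot(self.x, self.y)

    @property
    def p(self):
        # Magnitude of the three-momentum.
        return math.sqrt(self.x**2 + self.y**2 + self.z**2)


class RecoParticle(LorentzVector):
    """Toy record: stores only raw components, inherits derived quantities."""

    def __init__(self, x, y, z, E):
        self.x, self.y, self.z, self.E = x, y, z, E


particle = RecoParticle(x=3.0, y=4.0, z=12.0, E=13.5)
print(particle.pt)  # 5.0
print(particle.p)   # 13.0
```

The real mixins operate on whole awkward arrays rather than single objects, but the principle is the same: the derived quantities are defined once on the parent class and are not stored in the file.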

## Load the data

In [7]:
test_file = 'root://eospublic.cern.ch//eos/experiment/fcc/ee/generation/DelphesEvents/spring2021/IDEA/p8_ee_ZH_ecm240/events_101027117.root'
from coffea.nanoevents import NanoEventsFactory, FCCSchema
events = NanoEventsFactory.from_root(
    test_file+":events",
    entry_stop=10000,
    schemaclass=FCCSchema,
    delayed=True  # delayed=False does not work; see "Some bugs and Issues" below
).events()

The number of fields in events is reduced, because related branches are collected together.

In [12]:
events.fields

['Particleidx',
 'MissingETidx',
 'AllMuonidx',
 'EFlowTrackidx',
 'Muonidx',
 'ReconstructedParticlesidx',
 'EFlowPhotonidx',
 'MCRecoAssociationsidx',
 'Photonidx',
 'EFlowNeutralHadronidx',
 'Jetidx',
 'Electronidx',
 'MissingET',
 'ParticleIDs',
 'EFlowPhoton',
 'ReconstructedParticles',
 'Jet',
 'EFlowTrack',
 'MCRecoAssociations',
 'Particle',
 'EFlowNeutralHadron',
 'EFlowTrack_1']

## Indexed fields
- All the indexed fields like Jet#0, Jet#1 etc. have been renamed to Jetidx0, Jetidx1 etc. and zipped together into the Jetidx branch

In [13]:
events.Jetidx.fields

['Jetidx4', 'Jetidx2', 'Jetidx3', 'Jetidx5', 'Jetidx1', 'Jetidx0']

In [14]:
events.Jetidx.Jetidx0.fields

['index', 'collectionID']

In [16]:
events.Muonidx.Muonidx0.index.compute()
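
The renaming and grouping described at the top of this section can be sketched with plain Python. This is only an illustration of the naming rule (Jet#0 becomes Jetidx0, grouped under Jetidx); the helper `group_indexed_branches` is hypothetical and is not the code used inside FCCSchema.

```python
import re
from collections import defaultdict


def group_indexed_branches(branch_names):
    """Map names like 'Jet#0' to 'Jetidx0' and group them under 'Jetidx'."""
    grouped = defaultdict(list)
    for name in branch_names:
        match = re.fullmatch(r"(\w+)#(\d+)", name)
        if match is None:
            continue  # not an indexed branch, so it is not regrouped here
        collection, number = match.groups()
        grouped[f"{collection}idx"].append(f"{collection}idx{number}")
    return dict(grouped)


names = ["Jet#0", "Jet#1", "Muon#0", "Jet"]
print(group_indexed_branches(names))
# {'Jetidx': ['Jetidx0', 'Jetidx1'], 'Muonidx': ['Muonidx0']}
```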

## Mixin class support
- A dictionary in coffea/nanoevents/schemas/fcc.py defines the mixin mapping, i.e., the behavior of each branch
- By default, a branch behaves like a simple NanoCollection object

In [18]:
# Get the class of all the collected branches
for field in events.fields:
    print(f"{field} : ", events[field].layout.content.parameter("__record__"))

Particleidx :  NanoCollection
MissingETidx :  NanoCollection
AllMuonidx :  NanoCollection
EFlowTrackidx :  NanoCollection
Muonidx :  NanoCollection
ReconstructedParticlesidx :  NanoCollection
EFlowPhotonidx :  NanoCollection
MCRecoAssociationsidx :  NanoCollection
Photonidx :  NanoCollection
EFlowNeutralHadronidx :  NanoCollection
Jetidx :  NanoCollection
Electronidx :  NanoCollection
MissingET :  RecoParticle
ParticleIDs :  NanoCollection
EFlowPhoton :  NanoCollection
ReconstructedParticles :  RecoParticle
Jet :  RecoParticle
EFlowTrack :  Cluster
MCRecoAssociations :  NanoCollection
Particle :  MCTruthParticle
EFlowNeutralHadron :  NanoCollection
EFlowTrack_1 :  NanoCollection
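
The output above is consistent with a lookup table plus a default. The sketch below reconstructs such a table from the printed record names only; the names `MIXIN_MAP`, `DEFAULT_MIXIN` and `record_name` are made up for illustration, and the actual dictionary in the schema may be organised differently.

```python
# Hypothetical mapping, reconstructed from the record names printed above.
MIXIN_MAP = {
    "MissingET": "RecoParticle",
    "ReconstructedParticles": "RecoParticle",
    "Jet": "RecoParticle",
    "EFlowTrack": "Cluster",
    "Particle": "MCTruthParticle",
}
DEFAULT_MIXIN = "NanoCollection"  # used for every branch not listed above


def record_name(collection):
    """Return the record name for a collection, falling back to the default."""
    return MIXIN_MAP.get(collection, DEFAULT_MIXIN)


for collection in ["Jet", "Particle", "Jetidx", "EFlowPhoton"]:
    print(f"{collection} : {record_name(collection)}")
```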


As an example, the ReconstructedParticles branch behaves like a 'RecoParticle' class object and hence has LorentzVector behaviour

In [26]:
events.ReconstructedParticles.layout.content.parameter("__record__")

'RecoParticle'

In [19]:
events.ReconstructedParticles.fields

['type',
 'E',
 'x',
 'y',
 'z',
 'referencePoint.x',
 'referencePoint.y',
 'referencePoint.z',
 'charge',
 'mass',
 'goodnessOfPID',
 'covMatrix[10]',
 'clusters_begin',
 'clusters_end',
 'tracks_begin',
 'tracks_end',
 'particles_begin',
 'particles_end',
 'particleIDs_begin',
 'particleIDs_end']

In [22]:
events.ReconstructedParticles.pt.compute()  # pt is not a stored field; it is available because RecoParticle inherits from LorentzVector

Extra methods for RecoParticle could be defined easily.
Similar to RecoParticle, the Particle branch behaves like an MCTruthParticle class object

In [27]:
events.Particle.layout.content.parameter("__record__")

'MCTruthParticle'

In [23]:
events.Particle.fields

['PDG',
 'generatorStatus',
 'simulatorStatus',
 'charge',
 'time',
 'mass',
 'vertex.x',
 'vertex.y',
 'vertex.z',
 'endpoint.x',
 'endpoint.y',
 'endpoint.z',
 'x',
 'y',
 'z',
 'momentumAtEndpoint.x',
 'momentumAtEndpoint.y',
 'momentumAtEndpoint.z',
 'spin.x',
 'spin.y',
 'spin.z',
 'colorFlow.a',
 'colorFlow.b',
 'parents_begin',
 'parents_end',
 'daughters_begin',
 'daughters_end']

In [25]:
events.Particle.pt.compute()
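
The `parents_begin`/`parents_end` and `daughters_begin`/`daughters_end` fields listed above look like half-open ranges. A plausible reading, which is an assumption here and has not been checked against these samples, is that each range selects a slice of a relation index (such as the `index` field of one of the indexed Particle branches), and the values in that slice point back into the Particle collection. A minimal sketch with toy lists for a single event:

```python
def resolve_relations(begin, end, relation_index):
    """Return, per particle, the particle indices referenced by [begin, end)."""
    return [relation_index[b:e] for b, e in zip(begin, end)]


# Toy event with three particles: particle 0 decays to particles 1 and 2.
daughters_begin = [0, 2, 2]
daughters_end = [2, 2, 2]
daughter_index = [1, 2]  # stand-in for the 'index' field of an indexed branch

print(resolve_relations(daughters_begin, daughters_end, daughter_index))
# [[1, 2], [], []]
```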

## Things to work on
- Create special classmethods to associate the index branches with their corresponding collections
  For example: Muons = events.ReconstructedParticles.somefunction(events.Muonidx.Muonidx0)
- Work on associating generated particles with their children and parent particles
- Work on associating MC and reco particles
- Understand the edm4hep structure that the EDM4HEP schema tries to model (I could not find consistent information on this and might need some help)
- Fix some bugs (see below)
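
As a rough picture of what the first item would do, the sketch below selects entries of a collection through a per-event index list. It uses nested Python lists in place of awkward arrays, and it assumes that the muon index values point into ReconstructedParticles; `select_by_index` is a hypothetical helper, not an existing coffea method.

```python
def select_by_index(collection, index):
    """Per event, pick the entries of `collection` referenced by `index`."""
    return [[event[i] for i in idx] for event, idx in zip(collection, index)]


# Two toy events; strings stand in for reconstructed-particle records.
reco_particles = [["e-", "mu+", "gamma", "mu-"], ["pi+", "mu-"]]
muon_index = [[1, 3], [1]]  # stand-in for events.Muonidx.Muonidx0.index

print(select_by_index(reco_particles, muon_index))
# [['mu+', 'mu-'], ['mu-']]
```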

## Some bugs and Issues

- No dask-free functionality

In [29]:
test_file = 'root://eospublic.cern.ch//eos/experiment/fcc/ee/generation/DelphesEvents/spring2021/IDEA/p8_ee_ZH_ecm240/events_101027117.root'
from coffea.nanoevents import NanoEventsFactory, FCCSchema
events = NanoEventsFactory.from_root(
    test_file+":events",
    entry_stop=10000,
    schemaclass=FCCSchema,
    delayed=False  # fails with the TypeError below; I can't understand why yet
).events()

TypeError: size of array (331938) is less than size of form (606222)

- The vector class defined in coffea is going to be deprecated soon and needs to be replaced with the scikit-hep vector package

- np.sum and np.mult overload artifact: adding two RecoParticle arrays fails

In [35]:
import dask_awkward as dak
goodevents = events.ReconstructedParticles[dak.num(events.ReconstructedParticles) >= 2]  # at least two reco particles
r1 = goodevents[:,0].compute() # first reco particle in each event
r2 = goodevents[:,1].compute() # second reco particle in each event

r1+r2

AttributeError: no field named 't'
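
The error message suggests that the overloaded addition looks for a field named `t`, while the ReconstructedParticles records shown earlier store the energy as `E`. That reading is an inference from the message and the field list, not a confirmed diagnosis. Until the overload is fixed, the sum can be formed component by component; the sketch below shows the idea with plain dictionaries standing in for the records, and `add_four_vectors` is a hypothetical helper.

```python
def add_four_vectors(a, b, components=("x", "y", "z", "E")):
    """Add two four-vector-like records field by field."""
    return {name: a[name] + b[name] for name in components}


first = {"x": 1.0, "y": 2.0, "z": 3.0, "E": 4.0}
second = {"x": 0.5, "y": -2.0, "z": 1.0, "E": 3.0}

print(add_four_vectors(first, second))
# {'x': 1.5, 'y': 0.0, 'z': 4.0, 'E': 7.0}
```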