Experiments with end-to-end training (#12)
* start adding gnn

* add gnn to benchmarks

* Update run_training.sh

* update graph_data and EdgeNet to include edge_attr and benchmarking

* add notebook for plotting

* added first end-to-end training example

* up

* up

* up

* added end2end training examples

* up

* up

* up

* up

* up

* up

* cmdline args

* added sequential conv

* added cls accuracy monitoring

* up

* remove additional edges

* add act

* dataset location

* elem id encoding, fix norm

* fix nans

* add ntest

* added num pred and true plotting

* added npy file saving

* switch to relu

* pfnet7 same setup as others

* up

* loss coefs configurable

* added model to predict only id

* fix bugs with relabeling

* fix plot title

* cosmetic

* up

* up

* added sinkhorn loss

* fixes

* added reordering code

* fix printout, reweighting

* added class weighting

* update readme

* dropout configurable, simplify cross-check model

* fix weight application

* update weights

Co-authored-by: Javier Duarte <jduarte@ucsd.edu>

Former-commit-id: 779431e
jpata committed Jan 30, 2020
1 parent e57b2cd commit da81f70
Showing 19 changed files with 2,703 additions and 245 deletions.
203 changes: 114 additions & 89 deletions README.md
@@ -1,96 +1,28 @@
Notes on modernizing CMS particle flow, in particular [PFBlockAlgo](https://github.com/cms-sw/cmssw/blob/master/RecoParticleFlow/PFProducer/src/PFBlockAlgo.cc) and [PFAlgo](https://github.com/cms-sw/cmssw/blob/master/RecoParticleFlow/PFProducer/src/PFAlgo.cc).

## Standard CMS offline PF
The following pseudocode illustrates how the standard offline PF works in CMS.

```python
import numpy as np

# Inputs and outputs of Particle Flow
# elements: array of ECAL clusters, HCAL clusters, tracks etc., size Nelem
# candidates: array of produced particle flow candidates (pions, kaons, photons etc.)

# Intermediate data structures
# link_matrix: whether or not two elements are linked by having a finite distance (sparse, Nelem x Nelem)

def particle_flow(elements):

    #based on https://github.com/cms-sw/cmssw/tree/master/RecoParticleFlow/PFProducer/plugins/linkers
    link_matrix = compute_links(elements)

    #based on https://github.com/cms-sw/cmssw/blob/master/RecoParticleFlow/PFProducer/src/PFBlockAlgo.cc
    blocks = create_blocks(elements, link_matrix)

    #based on https://github.com/cms-sw/cmssw/blob/master/RecoParticleFlow/PFProducer/src/PFAlgo.cc
    candidates = []
    for block in blocks:
        candidates.append(create_candidates(block))

    return candidates

def compute_links(elements):
    Nelem = len(elements)

    link_matrix = np.zeros((Nelem, Nelem))

    #test if two elements are close by, based on a neighborhood search implemented with KD-trees
    for ielem in range(Nelem):
        for jelem in range(Nelem):
            if in_neighbourhood(elements, ielem, jelem):
                link_matrix[ielem, jelem] = 1

    return link_matrix

def in_neighbourhood(elements, ielem, jelem):
    #This element-to-element neighborhood check is based on detector geometry,
    #e.g. for TRK to ECAL: https://github.com/cms-sw/cmssw/blob/master/RecoParticleFlow/PFProducer/plugins/linkers/TrackAndECALLinker.cc -> linkPrefilter
    return True

def distance(elements, ielem, jelem):
    #This element-to-element distance is computed based on detector geometry,
    #e.g. for TRK to ECAL: https://github.com/cms-sw/cmssw/blob/master/RecoParticleFlow/PFProducer/plugins/linkers/TrackAndECALLinker.cc -> testLink
    return 0.0

def create_blocks(elements, link_matrix):
    #Each block is a list of elements; this is a list of blocks
    blocks = []

    Nelem = len(elements)

    #Graph of the elements and the connections between them
    #(Graph and find_subgraphs stand in for a generic graph library)
    graph = Graph()
    for ielem in range(Nelem):
        graph.add_node(ielem)

    #Check the distance between all relevant element pairs
    for ielem in range(Nelem):
        for jelem in range(Nelem):
            if link_matrix[ielem, jelem]:
                dist = distance(elements, ielem, jelem)
                if dist > -0.5:
                    graph.add_edge(ielem, jelem)

    #Find the sets of elements that are connected
    for subgraph in find_subgraphs(graph):
        this_block = []
        for element in subgraph:
            this_block.append(element)
        blocks.append(this_block)

    return blocks

def create_candidates(block):
    #find all HCAL-ECAL-TRK triplets, produce pions
    #find all HCAL-TRK pairs, produce kaons
    #find all ECAL-TRK pairs, produce pions
    #find all independent ECAL elements, produce photons
    #etc.
    candidates = []
    return candidates
```
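
In the pseudocode above, `create_blocks` together with `find_subgraphs` amounts to a connected-components search on the element link graph. As an illustration only (not the CMSSW implementation), the same step can be sketched with `scipy.sparse.csgraph` on a made-up link matrix:

```python
# Minimal sketch: block building as connected components of the link graph.
# The toy link matrix below is invented for illustration.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# 5 elements: (0, 1, 2) are linked to each other, (3, 4) are linked
link_matrix = np.zeros((5, 5))
link_matrix[0, 1] = link_matrix[1, 2] = 1
link_matrix[3, 4] = 1

n_blocks, block_id = connected_components(csr_matrix(link_matrix), directed=False)
# block_id[i] is the block index of element i
blocks = [np.where(block_id == b)[0].tolist() for b in range(n_blocks)]
print(blocks)  # [[0, 1, 2], [3, 4]]
```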


# Overview

- [x] set up datasets and ntuples for detailed PF analysis
- [ ] GPU code for existing PF algorithms
- [x] test CLUE for element to block clustering
- [ ] port CLUE to PFBlockAlgo in CMSSW
- [ ] parallelize PFAlgo calls on blocks
- [ ] GPU-implementation of PFAlgo
- [ ] reproduce existing PF with machine learning
- [x] test element-to-block clustering with ML (Edge classifier, GNN); a minimal sketch follows this list
- [x] test block-to-candidate regression
- [ ] end-to-end training of elements to MLPF-candidates using GNNs (a multi-task loss sketch also follows this list)
- [x] first baseline training converges to multiclass accuracy > 0.96, momentum correlation > 0.9
- [ ] improve training speed
- [ ] detailed hyperparameter scan
- [ ] further reduce bias in end-to-end training (muons, electrons, momentum tails)
- [ ] reconstruct genparticles directly from detector elements a la HGCAL, neutrino experiments etc
- [ ] set up datasets for regressing genparticles from elements
- [ ] develop improved loss function for event-to-event comparison: EMD, GAN?
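
The element-to-block clustering above can be framed as binary edge classification: every candidate pair of elements is scored for whether the two belong to the same block, and the predicted blocks are the connected components of the accepted edges. Below is a minimal sketch in plain PyTorch; the class name, feature count and layer sizes are assumptions for illustration, not the repository's EdgeNet or PFNet implementation.

```python
# Illustrative edge classifier for element-to-block clustering (toy sketch).
import torch
import torch.nn as nn

class EdgeClassifier(nn.Module):
    def __init__(self, n_elem_features=15, hidden_dim=64):
        super().__init__()
        # scores a candidate edge from the concatenated features of its two elements
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_elem_features, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, elem_features, edge_index):
        # elem_features: [Nelem, n_elem_features]
        # edge_index: [2, Nedge] candidate element pairs
        src, dst = edge_index
        pairs = torch.cat([elem_features[src], elem_features[dst]], dim=-1)
        return self.mlp(pairs).squeeze(-1)  # [Nedge] edge logits

# toy usage: 100 elements, 500 random candidate edges, random binary targets
model = EdgeClassifier()
x = torch.randn(100, 15)
edge_index = torch.randint(0, 100, (2, 500))
target = torch.randint(0, 2, (500,)).float()
loss = nn.BCEWithLogitsLoss()(model(x, edge_index), target)
loss.backward()
```

In practice the candidate edge list would come from a geometric preselection similar to the one used by PFBlockAlgo, rather than from random pairs as in the toy usage.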
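For the end-to-end training item, the objective combines a per-element candidate ID classification term with a momentum regression term; the commit above makes the loss coefficients and class weights configurable. The following is a hedged sketch of such a multi-task loss, with all names, class counts and coefficient values chosen purely for illustration.

```python
# Illustrative multi-task loss: candidate ID classification + momentum regression.
import torch
import torch.nn.functional as F

def mlpf_loss(pred_id_logits, pred_momentum, true_id, true_momentum,
              class_weights, l_id=1.0, l_mom=1.0):
    # pred_id_logits: [Nelem, Nclass], pred_momentum: [Nelem, Nmom]
    # true_id: [Nelem] integer class labels, true_momentum: [Nelem, Nmom]
    loss_id = F.cross_entropy(pred_id_logits, true_id, weight=class_weights)
    loss_mom = F.mse_loss(pred_momentum, true_momentum)
    return l_id * loss_id + l_mom * loss_mom

# toy usage with invented shapes and weights
pred_id = torch.randn(1000, 6)
pred_mom = torch.randn(1000, 3)
true_id = torch.randint(0, 6, (1000,))
true_mom = torch.randn(1000, 3)
class_weights = torch.ones(6)  # rare classes could be up-weighted here
loss = mlpf_loss(pred_id, pred_mom, true_id, true_mom, class_weights, l_id=1.0, l_mom=0.5)
```
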
## Presentations

- Caltech group meeting, 2020-01-28: https://indico.cern.ch/event/881683/contributions/3714961/attachments/1977131/3291096/2020_01_21.pdf
- CMS PF group, 2020-01-17: https://indico.cern.ch/event/862200/contributions/3706909/attachments/1971145/3279010/2020_01_16.pdf
- CMS PF group, 2019-11-22: https://indico.cern.ch/event/862195/contributions/3649510/attachments/1949957/3236487/2019_11_22.pdf
- CMS PF group, 2019-11-08: https://indico.cern.ch/event/861409/contributions/3632204/attachments/1941376/3219105/2019_11_08.pdf
- Caltech ML meeting, 2019-10-31: https://indico.cern.ch/event/858644/contributions/3623446/attachments/1936711/3209684/2019_10_07_pf.pdf
@@ -277,3 +209,96 @@ python3 test/graph.py step3_AOD_1.root
- candidates: [Ncand, Ncand_feat] for the output PFCandidate data
- candidate_block_id: [Ncand, ] for the PFAlgo-based block id
- step3_AOD_1_dist.npz: sparse [Nelem, Nelem] distance matrix from PFBlockAlgo between the elements
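
A quick way to inspect these outputs is sketched below. The file name for the dense arrays and the exact storage format are assumptions for illustration (a numpy `.npz` archive for the element/candidate arrays and `scipy.sparse.save_npz` format for the distance matrix); the actual files written by `test/graph.py` may differ.

```python
# Hedged sketch for inspecting the ntupled arrays described above.
import numpy as np
import scipy.sparse

data = np.load("step3_AOD_1_ev0.npz")             # hypothetical file name
elements = data["elements"]                        # [Nelem, Nelem_feat]
candidates = data["candidates"]                    # [Ncand, Ncand_feat]
candidate_block_id = data["candidate_block_id"]    # [Ncand]

# assumes the sparse matrix was written with scipy.sparse.save_npz
dist = scipy.sparse.load_npz("step3_AOD_1_dist.npz")  # sparse [Nelem, Nelem]
print(elements.shape, candidates.shape, dist.nnz)
```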

## Standard CMS offline PF
The following pseudocode illustrates how the standard offline PF works in CMS.

```python
import numpy as np

# Inputs and outputs of Particle Flow
# elements: array of ECAL clusters, HCAL clusters, tracks etc., size Nelem
# candidates: array of produced particle flow candidates (pions, kaons, photons etc.)

# Intermediate data structures
# link_matrix: whether or not two elements are linked by having a finite distance (sparse, Nelem x Nelem)

def particle_flow(elements):

    #based on https://github.com/cms-sw/cmssw/tree/master/RecoParticleFlow/PFProducer/plugins/linkers
    link_matrix = compute_links(elements)

    #based on https://github.com/cms-sw/cmssw/blob/master/RecoParticleFlow/PFProducer/src/PFBlockAlgo.cc
    blocks = create_blocks(elements, link_matrix)

    #based on https://github.com/cms-sw/cmssw/blob/master/RecoParticleFlow/PFProducer/src/PFAlgo.cc
    candidates = []
    for block in blocks:
        candidates.append(create_candidates(block))

    return candidates

def compute_links(elements):
    Nelem = len(elements)

    link_matrix = np.zeros((Nelem, Nelem))

    #test if two elements are close by, based on a neighborhood search implemented with KD-trees
    for ielem in range(Nelem):
        for jelem in range(Nelem):
            if in_neighbourhood(elements, ielem, jelem):
                link_matrix[ielem, jelem] = 1

    return link_matrix

def in_neighbourhood(elements, ielem, jelem):
    #This element-to-element neighborhood check is based on detector geometry,
    #e.g. for TRK to ECAL: https://github.com/cms-sw/cmssw/blob/master/RecoParticleFlow/PFProducer/plugins/linkers/TrackAndECALLinker.cc -> linkPrefilter
    return True

def distance(elements, ielem, jelem):
    #This element-to-element distance is computed based on detector geometry,
    #e.g. for TRK to ECAL: https://github.com/cms-sw/cmssw/blob/master/RecoParticleFlow/PFProducer/plugins/linkers/TrackAndECALLinker.cc -> testLink
    return 0.0

def create_blocks(elements, link_matrix):
    #Each block is a list of elements; this is a list of blocks
    blocks = []

    Nelem = len(elements)

    #Graph of the elements and the connections between them
    #(Graph and find_subgraphs stand in for a generic graph library)
    graph = Graph()
    for ielem in range(Nelem):
        graph.add_node(ielem)

    #Check the distance between all relevant element pairs
    for ielem in range(Nelem):
        for jelem in range(Nelem):
            if link_matrix[ielem, jelem]:
                dist = distance(elements, ielem, jelem)
                if dist > -0.5:
                    graph.add_edge(ielem, jelem)

    #Find the sets of elements that are connected
    for subgraph in find_subgraphs(graph):
        this_block = []
        for element in subgraph:
            this_block.append(element)
        blocks.append(this_block)

    return blocks

def create_candidates(block):
    #find all HCAL-ECAL-TRK triplets, produce pions
    #find all HCAL-TRK pairs, produce kaons
    #find all ECAL-TRK pairs, produce pions
    #find all independent ECAL elements, produce photons
    #etc.
    candidates = []
    return candidates
```


## Acknowledgements

Part of this work was conducted at **iBanks**, the AI GPU cluster at Caltech. We acknowledge NVIDIA, SuperMicro and the Kavli Foundation for their support of **iBanks**.
Binary file added data/EdgeNet_14001_ca9bbfb3bb_jduarte.best.pth
Binary file not shown.
Binary file modified data/clustering.h5
Binary file not shown.
Binary file modified data/preprocessing.pkl
Binary file not shown.
Binary file modified data/regression.h5
Binary file not shown.
56 changes: 30 additions & 26 deletions notebooks/benchmarks.ipynb
@@ -47,11 +47,12 @@
"metadata": {},
"outputs": [],
"source": [
"def plot_num_blocks(df_blocks, df_blocks_dummy, df_blocks_glue, sample):\n",
"def plot_num_blocks(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, sample):\n",
" plt.figure(figsize=(5,5))\n",
" plt.scatter(df_blocks[\"num_blocks_true\"], df_blocks[\"num_blocks_pred\"], marker=\".\", label=\"Edge classifier\", alpha=0.5)\n",
" plt.scatter(df_blocks_dummy[\"num_blocks_true\"], df_blocks_dummy[\"num_blocks_pred\"], marker=\"x\", label=\"PFBlockAlgo\", alpha=0.5)\n",
" plt.scatter(df_blocks_glue[\"num_blocks_true\"], df_blocks_glue[\"num_blocks_pred\"], marker=\"^\", label=\"CLUE\", alpha=0.5)\n",
" plt.scatter(df_blocks_clue[\"num_blocks_true\"], df_blocks_clue[\"num_blocks_pred\"], marker=\"^\", label=\"CLUE\", alpha=0.5)\n",
" plt.scatter(df_blocks_gnn[\"num_blocks_true\"], df_blocks_gnn[\"num_blocks_pred\"], marker=\"^\", label=\"GNN\", alpha=0.5)\n",
" plt.xlim(0,5000)\n",
" plt.ylim(0,5000)\n",
" plt.plot([0,5000], [0,5000], color=\"black\", lw=1, ls=\"--\")\n",
@@ -68,12 +69,12 @@
"metadata": {},
"outputs": [],
"source": [
"def plot_block_size(df_blocks, df_blocks_dummy, df_blocks_glue, sample):\n",
"def plot_block_size(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, sample):\n",
" plt.figure(figsize=(5,5))\n",
" plt.scatter(df_blocks[\"max_block_size_true\"], df_blocks[\"max_block_size_pred\"], marker=\".\", label=\"Edge classifier\", alpha=0.3)\n",
" plt.scatter(df_blocks_dummy[\"max_block_size_true\"], df_blocks_dummy[\"max_block_size_pred\"], marker=\"x\", label=\"PFBlockAlgo\", alpha=0.3)\n",
" plt.scatter(df_blocks_glue[\"max_block_size_true\"], df_blocks_glue[\"max_block_size_pred\"], marker=\"^\", label=\"CLUE\", alpha=0.3)\n",
"\n",
" plt.scatter(df_blocks_clue[\"max_block_size_true\"], df_blocks_clue[\"max_block_size_pred\"], marker=\"^\", label=\"CLUE\", alpha=0.3)\n",
" plt.scatter(df_blocks_gnn[\"max_block_size_true\"], df_blocks_gnn[\"max_block_size_pred\"], marker=\"^\", label=\"GNN\", alpha=0.3)\n",
" plt.xlim(0,3000)\n",
" plt.ylim(0,3000)\n",
" plt.plot([0,3000], [0,3000], color=\"black\", lw=1, ls=\"--\")\n",
@@ -90,11 +91,12 @@
"metadata": {},
"outputs": [],
"source": [
"def plot_precision_recall(df_blocks, df_blocks_dummy, df_blocks_glue, sample):\n",
"def plot_precision_recall(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, sample):\n",
" plt.figure(figsize=(5,5))\n",
" plt.scatter(df_blocks[\"edge_precision\"], df_blocks[\"edge_recall\"], marker=\".\", alpha=0.5, label=\"Edge classifier\")\n",
" plt.scatter(df_blocks_dummy[\"edge_precision\"], df_blocks_dummy[\"edge_recall\"], marker=\"x\", alpha=0.5, label=\"PFBlockAlgo\")\n",
" plt.scatter(df_blocks_glue[\"edge_precision\"], df_blocks_glue[\"edge_recall\"], marker=\"^\", alpha=0.5, label=\"CLUE\")\n",
" plt.scatter(df_blocks_clue[\"edge_precision\"], df_blocks_clue[\"edge_recall\"], marker=\"^\", alpha=0.5, label=\"CLUE\")\n",
" plt.scatter(df_blocks_gnn[\"edge_precision\"], df_blocks_gnn[\"edge_recall\"], marker=\"^\", alpha=0.5, label=\"GNN\")\n",
"\n",
" plt.xlim(0,1.2)\n",
" plt.ylim(0,1.2)\n",
@@ -112,12 +114,13 @@
"metadata": {},
"outputs": [],
"source": [
"def plot_block_size_histo(df_blocks, df_blocks_dummy, df_blocks_glue, sample):\n",
"def plot_block_size_histo(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, sample):\n",
" plt.figure(figsize=(5,5))\n",
" b = np.logspace(0.1, 4, 40)\n",
" plt.hist(df_blocks[\"max_block_size_pred\"], bins=b, histtype=\"step\", lw=2, label=\"Edge classifier, m={0:.0f}\".format(np.mean(df_blocks[\"max_block_size_pred\"])));\n",
" plt.hist(df_blocks_dummy[\"max_block_size_pred\"], bins=b, histtype=\"step\", lw=2, label=\"PFBlockAlgo, m={0:.0f}\".format(np.mean(df_blocks_dummy[\"max_block_size_pred\"])));\n",
" plt.hist(df_blocks_glue[\"max_block_size_pred\"], bins=b, histtype=\"step\", lw=2, label=\"GLUE, m={0:.0f}\".format(np.mean(df_blocks_glue[\"max_block_size_pred\"])));\n",
" plt.hist(df_blocks_clue[\"max_block_size_pred\"], bins=b, histtype=\"step\", lw=2, label=\"GLUE, m={0:.0f}\".format(np.mean(df_blocks_clue[\"max_block_size_pred\"])));\n",
" plt.hist(df_blocks_gnn[\"max_block_size_pred\"], bins=b, histtype=\"step\", lw=2, label=\"GNN, m={0:.0f}\".format(np.mean(df_blocks_gnn[\"max_block_size_pred\"])));\n",
" plt.hist(df_blocks[\"max_block_size_true\"], bins=b, histtype=\"step\", lw=2, label=\"True blocks, m={0:.0f}\".format(np.mean(df_blocks[\"max_block_size_true\"])));\n",
" plt.xscale(\"log\")\n",
" plt.legend(frameon=False)\n",
@@ -134,12 +137,12 @@
"fl = glob.glob(\"../data/NuGun_run3/step3*.pkl\")\n",
"df_blocks = get_df(fl, \"blocks\")\n",
"df_blocks_dummy = get_df(fl, \"blocks_dummy\")\n",
"df_blocks_glue = get_df(fl, \"blocks_glue\")\n",
"df_blocks_clue = get_df(fl, \"blocks_clue\")\n",
"\n",
"plot_num_blocks(df_blocks, df_blocks_dummy, df_blocks_glue, \"NuGun-Run3\")\n",
"plot_block_size(df_blocks, df_blocks_dummy, df_blocks_glue, \"NuGun-Run3\")\n",
"plot_block_size_histo(df_blocks, df_blocks_dummy, df_blocks_glue, \"NuGun-Run3\")\n",
"plot_precision_recall(df_blocks, df_blocks_dummy, df_blocks_glue, \"NuGun-Run3\")"
"plot_num_blocks(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"NuGun-Run3\")\n",
"plot_block_size(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"NuGun-Run3\")\n",
"plot_block_size_histo(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"NuGun-Run3\")\n",
"plot_precision_recall(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"NuGun-Run3\")"
]
},
{
@@ -151,12 +154,12 @@
"fl = glob.glob(\"../data/QCD_run3/step3*.pkl\")\n",
"df_blocks = get_df(fl, \"blocks\")\n",
"df_blocks_dummy = get_df(fl, \"blocks_dummy\")\n",
"df_blocks_glue = get_df(fl, \"blocks_glue\")\n",
"df_blocks_clue = get_df(fl, \"blocks_clue\")\n",
"\n",
"plot_num_blocks(df_blocks, df_blocks_dummy, df_blocks_glue, \"QCD-Run3\")\n",
"plot_block_size(df_blocks, df_blocks_dummy, df_blocks_glue, \"QCD-Run3\")\n",
"plot_block_size_histo(df_blocks, df_blocks_dummy, df_blocks_glue, \"QCD-Run3\")\n",
"plot_precision_recall(df_blocks, df_blocks_dummy, df_blocks_glue, \"QCD-Run3\")"
"plot_num_blocks(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"QCD-Run3\")\n",
"plot_block_size(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"QCD-Run3\")\n",
"plot_block_size_histo(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"QCD-Run3\")\n",
"plot_precision_recall(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"QCD-Run3\")"
]
},
{
@@ -168,12 +171,13 @@
"fl = glob.glob(\"../data/TTbar_run3/step3*.pkl\")\n",
"df_blocks = get_df(fl, \"blocks\")\n",
"df_blocks_dummy = get_df(fl, \"blocks_dummy\")\n",
"df_blocks_glue = get_df(fl, \"blocks_glue\")\n",
"df_blocks_clue = get_df(fl, \"blocks_clue\")\n",
"df_blocks_gnn = get_df(fl, \"blocks_gnn\")\n",
"\n",
"plot_num_blocks(df_blocks, df_blocks_dummy, df_blocks_glue, \"TTbar-Run3\")\n",
"plot_block_size(df_blocks, df_blocks_dummy, df_blocks_glue, \"TTbar-Run3\")\n",
"plot_block_size_histo(df_blocks, df_blocks_dummy, df_blocks_glue, \"TTbar-Run3\")\n",
"plot_precision_recall(df_blocks, df_blocks_dummy, df_blocks_glue, \"TTbar-Run3\")"
"plot_num_blocks(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"TTbar-Run3\")\n",
"plot_block_size(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"TTbar-Run3\")\n",
"plot_block_size_histo(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"TTbar-Run3\")\n",
"plot_precision_recall(df_blocks, df_blocks_dummy, df_blocks_clue, df_blocks_gnn, \"TTbar-Run3\")"
]
},
{
@@ -431,9 +435,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}