MLPF datasets v2.0.0: track pythia-level genjets, genmet in datasets; add per-particle ispu flag #332

Merged: jpata merged 35 commits into main on Jul 17, 2024

Conversation

jpata
Owner

@jpata jpata commented Jun 17, 2024

Regenerated CMS and CLIC datasets as v2.0.0:

  • CMS: new CMSSW generation with CMSSW_14->postprocessing->tfds
    • add CMS ttbar no-pileup sample as a cross-check
    • improve and cross-check ground truth
  • CLIC: new Key4HEP generation->postprocessing->tfds, updated with the features below, crucially a corrected genStatus definition (introduced a while ago but never regenerated)
  • CLD: first generation, consistent setup with CLIC
  • Delphes: deprecated and removed

New features and fixes in v2.0.0 in postprocessing:

  • for CMS postprocessing2.py, use only CaloParticles (add track links from TrackingParticles)
  • track pythia-level genjets, genmet in datasets as the ultimate jet/MET reconstruction target
    • for CMS: generate v3_1 with updated PFAnalysisNtuplizer, added pythia genjets and genmet: jpata/cmssw@eac6192
    • for Key4HEP: compute genjets, genmet in postprocessing using visible stable particles from the .hepmc input events
  • propagate per-particle ispu flag
    • for CMS postprocessing2.py, implemented and cross-checked
    • for Key4HEP: currently a placeholder
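
As a rough illustration of the genmet item above, here is a minimal sketch of computing a generator-level MET from visible stable particles. It is not the code from this PR: the function name `compute_genmet`, the `(pt, phi, pdgid, status)` tuple layout, the use of `status == 1` to mean "stable", and the choice to treat only neutrinos as invisible are all assumptions made for the example.

```python
import math

# PDG IDs of neutrinos, treated as the only invisible particles (assumption of this sketch)
NEUTRINO_PDGIDS = {12, 14, 16}


def compute_genmet(particles):
    """Return (met, met_phi) from an iterable of (pt, phi, pdgid, status) tuples.

    Only stable (status == 1) and visible (non-neutrino) particles contribute.
    The MET vector is the negative vector sum of their transverse momenta.
    """
    sum_px = 0.0
    sum_py = 0.0
    for pt, phi, pdgid, status in particles:
        if status != 1:
            continue  # skip intermediate (non-final-state) particles
        if abs(pdgid) in NEUTRINO_PDGIDS:
            continue  # skip invisible particles
        sum_px += pt * math.cos(phi)
        sum_py += pt * math.sin(phi)
    met_x, met_y = -sum_px, -sum_py
    return math.hypot(met_x, met_y), math.atan2(met_y, met_x)


# Toy event with made-up values, for illustration only
event = [
    (30.0, 0.0, 211, 1),      # charged pion along +x
    (10.0, math.pi, 22, 1),   # photon along -x
    (20.0, math.pi, 12, 1),   # neutrino: invisible, ignored
    (50.0, 1.0, 21, 2),       # intermediate gluon: not stable, ignored
]
met, met_phi = compute_genmet(event)
```

Genjets would additionally need a jet clustering step over the same visible stable particles, which this sketch leaves out.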

Misc other things:

  • CMSSW validation also generates JME nano

still TODO:

  • revert T2_EE specific conf from cmssw script

@jpata jpata changed the title from "generate ttbar nopu events" to "track pythia-level genjets, genmet in datasets; add per-particle ispu flag; generate CMS ttbar+noPU" Jun 21, 2024
@jpata jpata changed the title from "track pythia-level genjets, genmet in datasets; add per-particle ispu flag; generate CMS ttbar+noPU" to "MLPF datasets v2.0: track pythia-level genjets, genmet in datasets; add per-particle ispu flag; generate CMS ttbar+noPU [WIP]" Jun 21, 2024
@jpata jpata marked this pull request as draft June 21, 2024 09:04
@jpata
Owner Author

jpata commented Jun 28, 2024

We find perfect alignment between genMetTrue and CaloParticleMET in single-particle gun samples at the PF ntuple (ROOT) level:
[6 images, not preserved in this text]

and at the MLPF truth level:
[6 images, not preserved in this text]
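
A closure check of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the validation code used for the plots: the helper names `met_from_particles` and `max_abs_residual` are hypothetical, and the toy events and reference values are made up.

```python
import math


def met_from_particles(pts, phis):
    """Magnitude of the vector sum of transverse momenta (same as that of its negative)."""
    px = sum(pt * math.cos(phi) for pt, phi in zip(pts, phis))
    py = sum(pt * math.sin(phi) for pt, phi in zip(pts, phis))
    return math.hypot(px, py)


def max_abs_residual(reference, candidate):
    """Largest per-event absolute difference between two lists of MET values."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Toy events (made-up numbers): each entry is (pts, phis) of the truth particles
events = [
    ([25.0], [0.3]),                 # single particle
    ([40.0, 40.0], [0.0, math.pi]),  # back-to-back pair, balanced in the transverse plane
]
reference_met = [25.0, 0.0]  # stand-in for a reference MET stored per event

recomputed_met = [met_from_particles(pts, phis) for pts, phis in events]
residual = max_abs_residual(reference_met, recomputed_met)
```

A residual consistent with floating-point rounding would correspond to the perfect alignment described above.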

@jpata jpata changed the title from "MLPF datasets v2.0: track pythia-level genjets, genmet in datasets; add per-particle ispu flag; generate CMS ttbar+noPU [WIP]" to "MLPF datasets v2.0.0: track pythia-level genjets, genmet in datasets; add per-particle ispu flag" Jul 15, 2024
@jpata jpata marked this pull request as ready for review July 16, 2024 10:37
@jpata jpata merged commit a0f4428 into main Jul 17, 2024
2 checks passed
erwulff pushed a commit to erwulff/particleflow that referenced this pull request Aug 9, 2024
… add per-particle ispu flag (jpata#332)

* generate ttbar nopu events

* up

* update postprocessing

* small sample generation

* v3_1 run

* updates for CMSSW 14 generation

* [skip ci] cleanup postprocessing

* [skip ci] update pu gen

* update postprocessing with new truth definition based only on caloparticles

* remove pdb, switch genjet to energy

* [skip ci] prepare for v3_3

* [skip ci] fix flag

* added time and mem limits

* pu files from scratch

* 20240702_cptruthdef submission

* ttbar nopu v2

* up

* added genjet, genmet to clic postprocessing

* remove delphes

* update tests

* add postprocessing jobs

* update torch

* update dataset version

* propagate genjets, genmet

* shared memory error

* training on v2.0.0 for cms

* fix occasional root file load bug

* add jmenano

* fix qq

* clic training

* up
jpata added a commit that referenced this pull request Aug 12, 2024
* chore: update raytune search space, utils and startscript

* fix: raytune deprecated env var for storage_path

Also add num samples to draw in HPO as cmd line arg

* chore: update clic config file for jureap57

* feat: script to build python env from scratch

* chore: update startscripts for raytrain and raytune

* fix CMS model path for ACAT2022

* MLPF datasets v2.0.0: track pythia-level genjets, genmet in datasets; add per-particle ispu flag (#332)

* CMS training instructions (#336)

* CMS training instructions

* Update pyg-clic.yaml

* Update pyg-clic.yaml

* fix: black formatting

* Enable CI/CD test of HPO workflow

* fix: typo in test script

---------

Co-authored-by: Joosep Pata <joosep.pata@gmail.com>
farakiko pushed a commit to farakiko/particleflow that referenced this pull request Aug 26, 2024
… add per-particle ispu flag (jpata#332)

farakiko pushed a commit to farakiko/particleflow that referenced this pull request Aug 26, 2024