REANA example - CMS Higgs-to-four-leptons

About

This REANA reproducible analysis example studies the Higgs-to-four-lepton decay channel that led to the experimental discovery of the Higgs boson in 2012. The example uses CMS open data collected in 2011 and 2012.

Analysis structure

Making a research data analysis reproducible basically means providing "runnable recipes" that address (1) where the input data is, (2) what software was used to analyse the data, (3) which computing environments were used to run the software and (4) which computational workflow steps were taken to run the analysis. This makes it possible to instantiate the analysis on the computational cloud and run it to obtain (5) the output results.

1. Input data

The analysis takes the following inputs:

  • the list of CMS validated runs included in the inputs directory:
    • Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt
  • a set of data files in the ROOT format, processed from CMS public datasets, included in the inputs directory:
    • DoubleE11.root
    • DoubleE12.root
    • DoubleMu11.root
    • DoubleMu12.root
    • DY1011.root
    • DY1012.root
    • DY101Jets12.root
    • DY50Mag12.root
    • DY50TuneZ11.root
    • DY50TuneZ12.root
    • DYTo2mu12.root
    • HZZ11.root
    • HZZ12.root
    • TTBar11.root
    • TTBar12.root
    • TTJets11.root
    • TTJets12.root
    • ZZ2mu2e11.root
    • ZZ2mu2e12.root
    • ZZ4e11.root
    • ZZ4e12.root
    • ZZ4mu11.root
    • ZZ4mu12.root
  • CMS collision data from 2011 and 2012, accessed "live" during analysis via the CERN Open Data portal
  • CMS simulated data from 2011 and 2012, accessed "live" during analysis via the CERN Open Data portal
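
Before launching the analysis, it can be handy to confirm that the bundled inputs are in place. The following is a minimal sketch, assuming the files sit directly in an `inputs` directory as listed above; the helper name `missing_inputs` is hypothetical and not part of this example repository.

```python
from pathlib import Path

# File names copied from the input list above.
RUNS_JSON = "Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt"
ROOT_SAMPLES = [
    "DoubleE11", "DoubleE12", "DoubleMu11", "DoubleMu12",
    "DY1011", "DY1012", "DY101Jets12", "DY50Mag12",
    "DY50TuneZ11", "DY50TuneZ12", "DYTo2mu12",
    "HZZ11", "HZZ12",
    "TTBar11", "TTBar12", "TTJets11", "TTJets12",
    "ZZ2mu2e11", "ZZ2mu2e12", "ZZ4e11", "ZZ4e12", "ZZ4mu11", "ZZ4mu12",
]


def missing_inputs(inputs_dir="inputs"):
    """Return the expected input file names that are absent from inputs_dir.

    Hypothetical helper: the directory layout is an assumption.
    """
    base = Path(inputs_dir)
    expected = [RUNS_JSON] + [f"{name}.root" for name in ROOT_SAMPLES]
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:")
        for name in missing:
            print("  " + name)
    else:
        print("All bundled input files are present.")
```

The check only covers the files shipped in the inputs directory; the data accessed "live" from the CERN Open Data portal is not verified here.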

2. Analysis code

The analysis consists of two stages. In the first stage, we process the original collision data (using demoanalyzer_cfg_level3data.py) and the simulated data (using demoanalyzer_cfg_level3MC.py) for one Higgs signal candidate with reduced statistics. In the second stage, we plot the results (using M4Lnormdatall_lvl3.cc). The HiggsDemoAnalyzer directory contains the analysis code plugin for the CMSSW analysis framework.

3. Compute environment

In order to be able to rerun the analysis even several years in the future, we need to "encapsulate the current compute environment", for example by freezing the versions of the software packages our analysis uses. We shall achieve this by preparing a Docker container image for our analysis steps.

This analysis example runs within the CMSSW analysis framework that was packaged for Docker in clelange/cmssw.

4. Analysis workflow

The analysis workflow is simple and consists of the two above-mentioned stages:

                           START
                          /     \
                         /       \
                        /         \
+-------------------------+     +------------------------+
| process collision data  |     | process simulated data |
+-------------------------+     +------------------------+
                \                       /
                 \ Higgs4L1file.root   / DoubleMuParked2012C_10000_Higgs.root
                  \                   /
               +-------------------------+
               |    produce final plot   |
               +-------------------------+
                          |
                          | mass4l_combine_userlvl3.pdf
                          V
                         STOP
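
The dependency structure in the diagram can also be written down as a small graph. Below is a sketch using Python's standard `graphlib` module (Python 3.9 or later); the step names are illustrative labels taken from the diagram boxes, not identifiers from the CWL files, and `execution_batches` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps it depends on.
# Labels mirror the boxes in the diagram above (illustrative only).
WORKFLOW = {
    "process collision data": set(),
    "process simulated data": set(),
    "produce final plot": {
        "process collision data",
        "process simulated data",
    },
}


def execution_batches(graph):
    """Group steps into batches; steps within one batch are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sort for a stable order
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(execution_batches(WORKFLOW), start=1):
        print(f"batch {number}: {', '.join(batch)}")
```

The two processing steps land in the same batch because neither depends on the other, which is what allows a workflow engine to run them in parallel before producing the plot.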

We shall use the CWL workflow specification to express the computational workflow and its individual steps.

5. Output results

The example produces a plot showing the Higgs signal:

mass4l_combine_userlvl3.png

Local testing

Optional

If you would like to test the analysis locally (i.e. outside of the REANA platform), you can proceed as follows.

Using pure Docker:

$ docker run -i -t --rm \
       -v `pwd`/inputs:/inputs \
       -v `pwd`/code:/code \
       -v `pwd`/outputs:/outputs \
       clelange/cmssw:5_3_32 \
   /bin/bash -c 'cp -r /code/HiggsExample20112012 .; \
                 scram b; \
                 cd /code/HiggsExample20112012/Level3; \
                 cmsRun ./demoanalyzer_cfg_level3data.py'

$ docker run -i -t --rm \
       -v `pwd`/inputs:/inputs \
       -v `pwd`/code:/code \
       -v `pwd`/outputs:/outputs \
       clelange/cmssw:5_3_32 \
   /bin/bash -c 'cp -r /code/HiggsExample20112012 .; \
                 scram b; \
                 cd /code/HiggsExample20112012/Level3; \
                 cmsRun ./demoanalyzer_cfg_level3MC.py'

$ docker run -i -t --rm \
       -v `pwd`/inputs:/inputs \
       -v `pwd`/code:/code \
       -v `pwd`/outputs:/outputs \
       clelange/cmssw:5_3_32 \
   /bin/bash -c 'cd /code/HiggsExample20112012/Level3; \
                 root -b -l -q ./M4Lnormdatall_lvl3.cc'
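
The three invocations above share the same image and volume mounts and differ only in the command passed to bash. As a rough sketch, assuming the same directory layout, the argument lists could be assembled in Python as follows; the helper names `docker_command` and `cmsrun_step` are hypothetical, and the script only prints the commands rather than running them.

```python
import os
import shlex

IMAGE = "clelange/cmssw:5_3_32"
LEVEL3 = "/code/HiggsExample20112012/Level3"


def docker_command(inner, workdir=None):
    """Build the `docker run` argument list for one analysis step."""
    workdir = workdir or os.getcwd()
    cmd = ["docker", "run", "-i", "-t", "--rm"]
    # Mount inputs, code and outputs exactly as in the commands above.
    for name in ("inputs", "code", "outputs"):
        cmd += ["-v", f"{os.path.join(workdir, name)}:/{name}"]
    cmd += [IMAGE, "/bin/bash", "-c", inner]
    return cmd


def cmsrun_step(config):
    """Shell snippet that builds the plugin and runs one cmsRun config."""
    return (
        "cp -r /code/HiggsExample20112012 .; scram b; "
        f"cd {LEVEL3}; cmsRun ./{config}"
    )


STEPS = [
    cmsrun_step("demoanalyzer_cfg_level3data.py"),
    cmsrun_step("demoanalyzer_cfg_level3MC.py"),
    f"cd {LEVEL3}; root -b -l -q ./M4Lnormdatall_lvl3.cc",
]

if __name__ == "__main__":
    # Print shell-quoted commands for inspection; nothing is executed.
    for inner in STEPS:
        print(shlex.join(docker_command(inner)))
```

Passing each list to `subprocess.run` would execute the step; printing first makes it easy to compare against the hand-written commands.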

Using CWL:

$ cwltool --outdir=./outputs ./workflow/workflow.cwl ./workflow/input.yaml

Running the example on REANA cloud

FIXME

Contributors

This example is based on the original open data analysis by Jomhari, Nur Zulaiha; Geiser, Achim; Bin Anuar, Afiq Aizuddin, "Higgs-to-four-lepton analysis example using 2011-2012 data", CERN Open Data Portal, 2017. DOI: 10.7483/OPENDATA.CMS.JKB8.RR42

The list of contributors to this REANA example in alphabetical order: