# EMTF++

## Table of Contents

- [Introduction](#Introduction)
  
  * [Version](#Version)
  * [Requirements](#Requirements)
  * [New forward muon detectors](#New-forward-muon-detectors)
  * [EMTF firmware, emulator, fastsim](#EMTF-firmware,-emulator,-fastsim)
  
- [Algorithm](#Algorithm)

  * [Primitive conversion](#Primitive-conversion)
  * [Pattern recognition](#Pattern-recognition)
  * [Track building](#Track-building)
  * [Parameter assignment](#Parameter-assignment)
  
- [Performance plots](#Performance-plots)

  * [Resolution](#Resolution)
  * [Efficiency](#Efficiency)
  * [Rates](#Rates)
  * [Robustness](#Robustness)
  
- [Summary](#Summary)

- [Future plans](#Future-plans)

  * [Run 3 configuration](#Run-3-configuration)
  * Displaced muons
  * Extension to overlap region
  * Etc

## Introduction

### Version

- __v1.1.0__ (2018-11-06): Fixed the incorrect number of bunches used in the rate calculation: ~~1866~~ 2808

- __v1.0.0__ (2018-10-18): Reworked the patterns and the NN to reduce resource usage without affecting the performance. Used as the starting point for (i) CMSSW integration and (ii) firmware implementation.
  * For the trigger primitives, `CMSSW_10_1_5` with the changes from `cms-l1t-offline:l1t-phase2-v2.16.6` is used. For rate studies, the official NeutrinoGun PU200 sample (500k events) `/SingleNeutrino/PhaseIIFall17D-L1TPU200_93X_upgrade2023_realistic_v5-v1/GEN-SIM-DIGI-RAW` is used.

### Requirements

- <span style="font-size: 120%">Maintain sensitivity to electroweak scale physics at higher luminosity and pileup of the HL-LHC</span>
  * Report all standalone muon coordinates and momenta in a convention that facilitates __global correlation__ with tracks from the __Track Trigger__
    + The tracker will have far better p<sub>T</sub> resolution for rate reduction
  * Incorporate additional HL-LHC forward muon detectors to improve:
    + Efficiency, redundancy, and standalone p<sub>T</sub> measurement
  * Maintain standalone muon trigger (without track combination) for sufficiently high p<sub>T</sub> threshold
    + HL-LHC is “only” 3-4X higher lumi, and we increased the max L1 rate
- <span style="font-size: 120%">Add sensitivity to new physics scenarios, i.e. acceptance for __displaced muons and HSCPs__ from long-lived particle decays</span>
  * Additional patterns/logic and momentum assignment required (vertex constrained and not)

### New forward muon detectors

<img src="figures/cms_upg_o_g_b_ni_gem_re1_me0_grid_160229.png" width="800px"/>
<a href="figures/cms_upg_o_g_b_ni_gem_re1_me0_grid_160229.pdf">Download PDF</a>

- New forward detectors improve redundancy, efficiency and timing (timing: iRPC only)
  * GEM detectors: GE1/1, GE2/1
  * iRPC detectors: RE3/1, RE4/1
  * ME0 detector
- New detectors also provide additional inputs and improved angular information (bend angles) for better correlation with TT tracks and better standalone muon p<sub>T</sub> measurement
- This requires increasing the bandwidth to the L1 muon electronics and adding more algorithm logic capability

### EMTF firmware, emulator, fastsim

While EMTF is really one algorithm, it is implemented in __two__ different ways: (i) firmware, written in Verilog and implemented in the MTF7 board; (ii) emulator, written in C++ and implemented as part of CMSSW software ([GitHub repo](https://github.com/cms-sw/cmssw/tree/master/L1Trigger/L1TMuonEndCap)). In a perfect world, they would be identical, but this is not always the case, and it is very important to recognize this fact when referring to the details of the EMTF algorithm.

In this document, I will try my best to be clear about which algorithm is being discussed. I'll denote the firmware algorithm simply as <span style="color: red; font-weight: bold">"firmware"</span>, and the emulator algorithm as <span style="color: red; font-weight: bold">"emulator"</span>. Whenever there is a discrepancy, __firmware is always right__. Alex has written a description of the firmware algorithm [here](docs/EMU_TF_algorithm-1.docx). 

In addition, in the course of developing the Phase-2 EMTF algorithm (namely EMTF++), I made another implementation in Python, which can be used outside of CMSSW. I'll denote this as <span style="color: red; font-weight: bold">"fastsim"</span>. The main reasons for doing this are that (i) it's very tedious to work with the emulator directly, and (ii) it allows me to use certain powerful Python libraries directly <sup><a href="#myfootnote1">[1]</a></sup>. Basically, I have made simple ntuples, and I can run Python scripts on them to do analysis much more quickly (instead of going through the full-blown CMSSW). This lets me run experiments much faster.

<a name="myfootnote1"><sup>1</sup></a>   I found the scientific stack in Python very powerful. These are the libraries I have used in my study: rootpy, NumPy, SciPy, Scikit-learn, Keras, TensorFlow.

At the time of writing, this is what I have:
- EMTF: firmware, emulator
- EMTF++: fastsim

My goal is to have the firmware and emulator versions of EMTF++ ready soon. (when?!)

## Algorithm

The EMTF system consists of 12 EMTF processors &mdash; 6 for each endcap, 1 for each 60&deg; trigger sector (at 15&deg;, 75&deg;, ...). Each sector works independently.

<img src="figures/trigger_sectors.png" width="500px"/>

To ensure coverage near the sector boundaries, each sector N also shares certain chambers with the neighbor sector N+1. So, although each sector is nominally 60&deg;, it actually covers 70&deg; in the rings with 10&deg;-wide chambers, or 80&deg; in the rings with 20&deg;-wide chambers.

EMTF is the successor to CSCTF. It was installed during the Phase-1 upgrade (see [Phase-1 L1 Trigger Upgrade TDR](https://cds.cern.ch/record/1556311/) for more info).

The EMTF algorithm starts by receiving the hits, or trigger primitives, and ends by producing muon tracks with certain information: (most importantly) p<sub>T</sub>, &eta;, &#981;, track quality, and the associated hits. I will break the EMTF algorithm into 4 building blocks. 

<img src="figures/algorithm_blocks.gv.svg" width="800px"/>

Each block may consist of smaller blocks. EMTF++ is an extension of the EMTF algorithm, and it follows the same design. At the moment, EMTF++ only builds "SingleMu"-quality tracks, defined as tracks that have at least 3 hits in 3 different stations, and at least one of the hits must be in ME1. Extensions to lower quality tracks are not yet considered.

In the following, I'll try to describe how each block works. For each block, I will review how it works in EMTF, then explain what has changed for EMTF++.

<hr/>

### Primitive conversion

In Phase-1, EMTF receives two types of trigger primitives: <span style="color: #31bd38">&#11035;</span> __CSC__ and <span style="color: #0096ff">&#11035;</span> __RPC__ hits. In Phase-2, EMTF++ will receive additional <span style="color: #e63600">&#11035;</span> __GEM__, <span style="color: #9437ff">&#11035;</span> __iRPC__ and <span style="color: #ffa25f">&#11035;</span> __ME0__ trigger primitives. 

The most important function here is to convert strip (half-strip, actually) and wire numbers to integer &#981; and &theta; units. The integer &#981; unit is a local coordinate defined within a sector. The scale of the unit is 1/60&deg; &asymp; 0.0167&deg;, internally called "1/8-strip", encoded as a 13-bit integer (0 to 8191). The position 0 corresponds to -22&deg; from the lower boundary of the sector (since Sep 2018). To convert from the integer &#981; unit to CMS global coordinate (in degrees):

\begin{align}
\phi_{\text{sector}} &= \phi_{\text{integer}} / 60 - 22 \\
\phi_{\text{CMS}} &= \phi_{\text{sector}} + 15 + (\text{sector}-1) \times 60 \\
& \quad \  \text{sector} \in \{1,2,3,4,5,6\}
\end{align}
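The two equations above can be written as a short Python helper. It is in the spirit of the fastsim, but it is not the fastsim code itself, and the function names are hypothetical. The final wrap into [-180&deg;, 180&deg;) is an assumption that the equations themselves do not include.

```python
def phi_int_to_sector_deg(phi_int: int) -> float:
    """Integer phi (1/8-strip units, 13 bits) -> local sector phi in degrees."""
    if not 0 <= phi_int <= 8191:
        raise ValueError("phi_int must be a 13-bit value (0..8191)")
    return phi_int / 60.0 - 22.0


def phi_int_to_global_deg(phi_int: int, sector: int, wrap: bool = True) -> float:
    """Integer phi -> CMS global phi in degrees, for sector 1..6."""
    if sector not in range(1, 7):
        raise ValueError("sector must be in 1..6")
    phi = phi_int_to_sector_deg(phi_int) + 15.0 + (sector - 1) * 60.0
    if wrap:
        # Assumption: fold the result into [-180, 180) degrees.
        phi = (phi + 180.0) % 360.0 - 180.0
    return phi
```

For example, `phi_int_to_global_deg(1320, 1)` returns `15.0`, which is the lower edge of sector 1.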

The integer &theta; unit has a scale of approx 0.285&deg;, encoded as a 7-bit integer (0 to 127). The position 0 corresponds to 8.5&deg;. To convert from the integer &theta; unit to CMS global coordinate (in degrees):

\begin{align}
\theta_{\text{CMS}} &= \theta_{\text{integer}} \cdot \left(\frac{45-8.5}{128}\right) + 8.5\\
\end{align}
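The &theta; conversion is sketched below in the same hypothetical helper style. The inverse function uses round-to-nearest and clamps to 7 bits, and both choices are assumptions. The pseudorapidity helper uses the standard relation &eta; = -ln(tan(&theta;/2)), which is convenient because the text mixes &eta; and &theta;.

```python
import math

THETA_MIN_DEG = 8.5
THETA_MAX_DEG = 45.0
THETA_LSB_DEG = (THETA_MAX_DEG - THETA_MIN_DEG) / 128  # ~0.285 deg per unit


def theta_int_to_deg(theta_int: int) -> float:
    """Integer theta (7 bits) -> CMS theta in degrees."""
    if not 0 <= theta_int <= 127:
        raise ValueError("theta_int must be a 7-bit value (0..127)")
    return theta_int * THETA_LSB_DEG + THETA_MIN_DEG


def theta_deg_to_int(theta_deg: float) -> int:
    """Inverse mapping (assumption: round to nearest, clamp to 0..127)."""
    value = round((theta_deg - THETA_MIN_DEG) / THETA_LSB_DEG)
    return max(0, min(127, value))


def theta_deg_to_eta(theta_deg: float) -> float:
    """Pseudorapidity eta = -ln(tan(theta / 2))."""
    return -math.log(math.tan(math.radians(theta_deg) / 2.0))
```

Note that `theta_int_to_deg(127)` gives about 44.7&deg;, so the 45&deg; upper value in the formula is not itself reachable.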

The conversions from half-strip number to the integer &#981; unit and from wire number to the integer &theta; unit are done using simple linear equations. The constants (i.e. offsets & slopes) are derived offline and stored in the Block RAM in the firmware. There are additional corrections to the &theta; conversion for ME1/1 chambers with tilted wires, which I won't describe here.

<span style="font-size: 120%; font-weight: bold">(i) CSC</span>

The CSC trigger primitive is called a Local Charged Track (LCT). It is a track segment reconstructed from up to 6 hits in the CSC detector. LCTs are built by the Trigger MotherBoard (TMB) and then sent to the Muon Port Card (MPC). The data are concentrated by the MPC and then sent to EMTF.

CSC geometry has several peculiarities. When designing the trigger, one must pay attention to the following features:
- ME1 has 3 rings, and the chambers are 10&deg; wide. ME2,3,4 have 2 rings: the ring 1 chambers are 20&deg; wide, and the ring 2 chambers are 10&deg; wide.
- ME1 ring 1 is located at a different z position (~6 m) compared to rings 2 & 3 (~7 m). The magnetic field is very different for ME1/1 because it is inside the solenoid ($B_z = 3.8\,\text{T}$). The ME1/1 wires are tilted by 29&deg; to compensate for the drift of the electrons in the strong magnetic field.
- ME1/1 is further split into ME1/1a and ME1/1b at &eta;~2.05. They have different numbers of strips.
  * ME1/1a has 48 strips, ME1/1b has 64 strips, ME1/2 has 80 strips, ME1/3 has 64 strips, ME2,3,4 have 80 strips
- ME1/3 is only 80% (?) efficient.
- Depending on the location of a chamber (even or odd chamber number, ME1,2 or ME3,4, positive or negative endcap), a chamber can be closer to or farther from the interaction point. We assign an F/R (front or rear) bit to each chamber to take this into account.
- Neighbor sharing:
  * ME1: chambers with CSCID=3,6,9, subsector=2 are shared
  * ME2,3,4: chambers with CSCID=3,9 are shared

See detector note [IN-2007/024](docs/IN2007_024.pdf) for more details.

The strips and wires are independent of each other. If there are two LCTs in a given chamber, we get 2 strip numbers and 2 wire numbers, and we do not know which strip number should be paired with which wire number. Thus, we consider all the possibilities. Since there are at most 2 LCTs in a given chamber, there are at most 2 x 2 = 4 combinations to consider.
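The pairing ambiguity amounts to a Cartesian product of the strip and wire lists of one chamber. Here is a minimal sketch. The function name is hypothetical, and strips and wires are plain integers.

```python
from itertools import product


def lct_strip_wire_combinations(strips, wires):
    """Pair every strip number with every wire number in one chamber.

    All possibilities are kept; with at most 2 LCTs per chamber this
    gives at most 2 x 2 = 4 (strip, wire) pairs.
    """
    if len(strips) > 2 or len(wires) > 2:
        raise ValueError("at most 2 LCTs per chamber")
    return list(product(strips, wires))
```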

A summary of the exact CSC inputs was given by Alex:
- &#981; of each LCT: 13-bit values
- &theta; of each LCT: 7-bit values
- pattern ID for each LCT: 4-bit values
- FR bit for each LCT: 1-bit values
- Pattern mask: 4-bit mask showing which CSC stations are present in track (ME1,2,3,4)

In 
<span style="display: inline-block; border-radius: .4rem; padding: 0.35rem .6rem; background: #3776ab; color: #ffffff; vertical-align: middle;">EMTF++</span>
, we will be using the same primitive conversion modules that are already in the firmware and in the emulator. In addition, we want to add a new __bend__ variable, if it can be done in the TMB. TAMU and UCLA have shown that doing a least squares fit to the CLCT comparator digis can improve the &#981; position (see Andrew Peck's [talk](https://indico.cern.ch/event/712513/contributions/2959915/attachments/1630667/2599369/20180410_otmb_pattern_logic.pdf)). But I find it more useful to extract the local bending angle from the fit, where the bend is defined as the &Delta;&#981; between the hit &#981;'s at the innermost and outermost layers of the CSC detector.

Basically, consider up to 6 hits in the 6 layers of the CSC detector. We can do a very simple linear fit with $x$ = layer number and $y$ = hit &#981;. Extract the slope of the fit and multiply it by 6 to get the new bend variable. The bend is a floating-point value. I multiply it by 4 (arbitrarily) before quantization, and the result should fit into 7 bits (-64 to 63). Since the CSC comparator digis are in half-strip units, the bend is then in 1/8-strip units. It is only useful for ME1/1 and ME1/2. For more details, see slides 7-9 in my [talk](https://indico.cern.ch/event/759873/contributions/3151457/subcontributions/264210/attachments/1722105/2780661/2018-09-25_phase2_emtf_v1.pdf).
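The bend recipe above (linear fit, slope x 6, then x 4 and 7-bit quantization) is sketched below with NumPy's `polyfit`. This is an illustration under stated assumptions: the function name is hypothetical, hit &#981; values are given in half-strips, and out-of-range values are clipped to the signed 7-bit range rather than handled some other way.

```python
import numpy as np


def csc_bend(layers, phis_halfstrip):
    """Bend from a straight-line fit of hit phi (half-strips) vs layer number.

    slope * 6 gives the bend in half-strips (recipe described above);
    the extra * 4 expresses it in 1/8-strip units before 7-bit quantization.
    """
    x = np.asarray(layers, dtype=float)
    y = np.asarray(phis_halfstrip, dtype=float)
    if np.unique(x).size < 2:
        raise ValueError("need hits in at least 2 distinct layers")
    slope, _intercept = np.polyfit(x, y, 1)
    bend = slope * 6.0 * 4.0
    return int(np.clip(np.round(bend), -64, 63))  # signed 7-bit range
```

Only the slope enters the result, so it does not matter whether the layers are numbered 0-5 or 1-6. Missing layers are also fine as long as at least two distinct layers have hits.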

<span style="font-size: 120%; font-weight: bold">(ii) RPC</span>

In Phase-1, EMTF also receives endcap RPC hits to improve efficiency and redundancy. Specifically, hits from the RPC chambers RE1/2, RE2/2, RE3/2, RE3/3, RE4/2, and RE4/3 are received. To distinguish them from the Phase-2 iRPC detectors, I will also refer to the Phase-1 RPC as "old RPC". 

Old RPC hits are pre-processed by CPPF (which stands for Concentration, PreProcessing and Fanout), which clusterizes the RPC digis and performs coordinate conversion, before sending up to 2 clusters per RPC chamber to EMTF. The integer &#981; and &theta; units used for the CPPF-converted hits are 4x coarser than the units used for CSC, because of the coarser detector resolution.

Old RPC geometry info:
- 10&deg;-wide chambers, 32 strips per chamber, 3 &eta; partitions per chamber (or "rolls").
- Max cluster width is 3 strips.
- To ensure coverage near the sector boundaries, each sector also receives one 10&deg; chamber from the neighbor sector.
- Because RPC is a single-layer detector, it is more prone to noise.

A summary of the exact CPPF inputs was given by Alex:
- &#981; of each hit: 11-bit values
- &theta; of each hit: 5-bit values
- Pattern mask: 4-bit mask showing which RPC stations are present in track (RE1,2,3,4)
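The bit widths above are consistent with the 4x coarser units. Going from 11 to 13 bits (&#981;) and from 5 to 7 bits (&theta;) is a factor of 4, i.e. a 2-bit left shift. The sketch below shows this bookkeeping. It assumes that the CPPF and EMTF units share the same origin, which the text does not state explicitly, and the function name is hypothetical.

```python
def cppf_to_emtf_units(phi_cppf: int, theta_cppf: int) -> tuple:
    """Rescale CPPF integer coordinates to EMTF integer units (assumed same origin)."""
    if not 0 <= phi_cppf < (1 << 11):
        raise ValueError("CPPF phi is an 11-bit value")
    if not 0 <= theta_cppf < (1 << 5):
        raise ValueError("CPPF theta is a 5-bit value")
    # A 4x coarser unit is a 2-bit left shift: 11 -> 13 bits (phi), 5 -> 7 bits (theta).
    return phi_cppf << 2, theta_cppf << 2
```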

No changes for 
<span style="display: inline-block; border-radius: .4rem; padding: 0.35rem .6rem; background: #3776ab; color: #ffffff; vertical-align: middle;">EMTF++</span>
. But I believe CPPF will be replaced by newer electronics during Phase-2 upgrade.

<span style="font-size: 120%; font-weight: bold">(iii) iRPC</span>

iRPCs are improved RPC detectors that will be available for Phase-2. The iRPC chambers are RE3/1 and RE4/1. In general, iRPC is very similar to RPC, but with better spatial resolution and timing resolution (~1.5 ns). See [Phase-2 Muon TDR](https://cds.cern.ch/record/2283189) for more info.

iRPC geometry info:
- 20&deg;-wide chambers, 192 strips per chamber, 5 &eta; partitions per chamber (or "rolls").
- Since the iRPC chamber is 2x larger than the old RPC chamber, with 6x more strips per chamber, the &#981; resolution is 3x better.
- Like RPC, iRPC is a single-layer detector, and it is more prone to noise.
- Note: I was told that iRPC will have 2D readout, which will provide better &eta; or &theta; resolution. However, the 2D readout was not present in the CMSSW release that I'm using. Also, I'm not sure about the exact cluster width cut, so I apply a cut of 9 strips (arbitrarily). I also only keep up to 2 clusters per chamber. To the extent possible, I try to treat RPC and iRPC hits the same, so that it doesn't complicate the workflow too much.
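The iRPC cluster selection described in the last bullet (width cut of 9 strips, at most 2 clusters per chamber) might look like the sketch below. The `(first_strip, last_strip)` cluster format is hypothetical. The text also does not say which 2 clusters are kept, so keeping the first two passing clusters in readout order is an assumption.

```python
def select_irpc_clusters(clusters, max_width=9, max_clusters=2):
    """Apply the cluster width cut, then keep at most max_clusters per chamber.

    clusters: (first_strip, last_strip) tuples in readout order (hypothetical format).
    """
    kept = [c for c in clusters if c[1] - c[0] + 1 <= max_width]
    return kept[:max_clusters]
```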

For the coordinate conversion, since iRPC has better resolution, I want to use the EMTF integer units, not the 4x coarser CPPF integer units. However, due to a bug that I discovered only recently, I was using the 4x coarser CPPF integer units. I will correct this in my results in the future.

Currently, I don't use the 1.5 ns timing in the baseline simulation, because I'm wary that the simulated performance might not be very realistic. It can always be added in the future when we are more confident. In fact, the timing is expected to be very important for triggering on HSCPs (heavy stable charged particles).

<span style="font-size: 120%; font-weight: bold">(iv) GEM</span>

GEM (Gas Electron Multiplier) is a new type of detector to be installed at GE1/1 and GE2/1. In particular, GE1/1 can provide a bend angle between GE1/1 and ME1/1, which is claimed to significantly improve the p<sub>T</sub> measurement. GE1/1 is planned to be installed during LS2, and be ready for physics in Run 3; GE2/1 will be installed during LS3 together with the rest of new Phase-2 detectors. See [GEM TDR](https://cds.cern.ch/record/2021453) for more info about GE1/1, and [Phase-2 Muon TDR](https://cds.cern.ch/record/2283189) for more info about GE2/1.

GEM geometry info:
- GE1/1 is 10&deg; wide with 192 pads; GE2/1 is 20&deg; wide with 384 pads. A pad is 2 strips ganged together (double-strip), and it is the unit that will be transmitted to the EMTF. GE1/1 and GE2/1 therefore have the same &#981; resolution.
- Both GE1/1 and GE2/1 have 8 &eta; partitions per chamber.
- Both GE1/1 and GE2/1 are 2-layer detectors.

According to Sven Dildick (TAMU), the GEM pads will be clustered before they are sent to EMTF. The max cluster width is 8 pads, and there are at most 8 clusters per chamber. According to his instructions, EMTF is supposed to declusterize the clusters to retrieve the original pads and use them (up to 8 x 8 = 64 pads) in the trigger, so as not to "lose" resolution. I'm not sure of the merit of this approach, but I have followed the instructions in my study.
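Declustering just expands each cluster back into its pads. The sketch below assumes a hypothetical `(first_pad, size)` encoding for each cluster, since the actual GEM cluster data format is not described here. Clusters are assumed not to overlap.

```python
def decluster_gem_pads(clusters, max_size=8, max_clusters=8):
    """Expand (first_pad, size) clusters back into individual pad numbers.

    With at most 8 clusters of at most 8 pads, this yields up to 64 pads per chamber.
    """
    if len(clusters) > max_clusters:
        raise ValueError(f"at most {max_clusters} clusters per chamber")
    pads = []
    for first_pad, size in clusters:
        if not 1 <= size <= max_size:
            raise ValueError(f"cluster size must be 1..{max_size}")
        pads.extend(range(first_pad, first_pad + size))
    return pads
```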

Furthermore, TAMU has developed algorithms that combine GEM & CSC hits to form new ILT (integrated LCT) trigger primitives. Doing this allows us to (i) reduce the number-of-layers requirement for the CSC LCT from 4 to 3, which improves efficiency; and (ii) improve background rejection with the increased number of hits during the trigger primitive reconstruction. The ILT algorithm is currently not used in my study, but it can always be added in the future. 

<span style="font-size: 120%; font-weight: bold">(v) ME0</span>

ME0 is a 6-layer GEM detector that will be mounted at the "nose" of the endcap. An ME0 segment will be built by dedicated electronics, and we will receive the &#981;, &theta; (or &eta;), the local bending angle and the quality. The bend is the &Delta;&#981; between the innermost and the outermost ME0 detector layers. See [Phase-2 Muon TDR](https://cds.cern.ch/record/2283189) for more info.

ME0 extends from &eta; = 2.0 to 2.8. Although this allows EMTF to trigger beyond &eta; of 2.4, I don't include it as part of the current results, because it adds extra rates, and makes it hard to do rate comparisons.

ME0 geometry info:
- ME0 chambers are 20&deg; wide, 384 strips (192 pads) per chamber, 8 &eta; partitions per chamber.
- I'm not sure whether we will receive ME0 pad numbers or the converted &#981; and &eta; coordinates. Currently, I assume the latter.

ME0 has really excellent efficiency and background rejection. It works so well that I'm kind of worried that the simulation performance might not be realistic. So, I think it's important to add some safety margin even though we see a large rate reduction.

<span style="font-size: 120%; font-weight: bold">(vi) Summary</span>

I try to summarize all the spatial resolutions that we have in the following table. (Maybe adding DT too?)

<img src="figures/chamber_types_and_resolutions.png" width="800px"/>

The relative &#981; resolution is relative to CSC 10&deg; chambers (non-ME1/1). For ME0, I assume the &#981; coordinate is extracted from the fit, so I use 384 strips and a factor of 1/sqrt(6). Note that the exact number of bits used for the new Phase-2 muon detectors might not be simulated correctly at the moment.
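A rough way to reproduce relative &#981; resolutions of this kind is to compare the angular channel pitch (chamber width / number of channels) of each detector with that of a 10&deg; CSC chamber with 80 strips. The sketch below does that under the assumption that resolution scales with the strip or pad pitch. It uses the geometry numbers quoted in this section, and it is not guaranteed to match the exact numbers in the figure.

```python
import math

# name: (chamber width in deg, number of phi channels, extra factor)
CHAMBERS = {
    "CSC 10deg (non-ME1/1)": (10.0, 80, 1.0),
    "RPC (old)": (10.0, 32, 1.0),
    "iRPC": (20.0, 192, 1.0),
    "GE1/1": (10.0, 192, 1.0),  # pads
    "GE2/1": (20.0, 384, 1.0),  # pads
    "ME0": (20.0, 384, 1.0 / math.sqrt(6)),  # strips, phi from a 6-layer fit
}


def channel_pitch_deg(width_deg, n_channels, factor):
    """Angular pitch of one channel, optionally scaled (e.g. by 1/sqrt(6))."""
    return width_deg / n_channels * factor


def relative_phi_resolution():
    """Pitch of each detector relative to the 10-degree CSC chamber."""
    ref = channel_pitch_deg(*CHAMBERS["CSC 10deg (non-ME1/1)"])
    return {name: channel_pitch_deg(*spec) / ref for name, spec in CHAMBERS.items()}
```

With these assumptions, the old RPC pitch is 2.5x the CSC pitch, and iRPC is 3x finer than old RPC, which is consistent with the statement in the iRPC section.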

In the Phase-2 scenario, each sector processor may receive up to 95 links. For the numbers of input links coming from each detector type, see slide 15 in Darin's [talk](https://indico.cern.ch/event/768406/contributions/3192943/subcontributions/266196/attachments/1743806/2822333/Phase2EndcapMuonTriggerDemos.pdf). 

### Pattern recognition

The trajectory of a muon as it passes through the endcap muon stations is bent by the magnetic field. Muons with low p<sub>T</sub> are bent more compared to muons with high p<sub>T</sub>. Note that as the magnetic field diminishes at large z, there is very little bending of the muon trajectories in the outer stations even for low p<sub>T</sub> muons. In EMTF, we use patterns of different straightness to detect muons by matching hits to the patterns. Each sector is divided into 4 "zones" (0-3) along the &theta; direction, and pattern matching is done in each zone. Multiple patterns may be activated ("fired") by the same hits, so a ghost busting step is done after pattern matching to get rid of the duplicates ("ghosts").

Due to latency in receiving the RPC hits from CPPF, RPC hits are not included in pattern matching. Due to possible BX misidentification, all the CSC hits from 2 consecutive BXs are used (since June 2018). The boundaries of the 4 zones are (0, 41, 49, 87, 127) in terms of the integer &theta; unit. To ensure coverage near the boundaries, an overlap of 2 units is allocated. E.g. CSC hits with &theta; from 0 to 43 are included in zone 0, and CSC hits with &theta; from 40 to 51 are included in zone 1, and so on.

The patterns originally used in EMTF ("asymmetric") and the symmetric patterns that have been in use since Sep 2016 are shown below. In the asymmetric case, patterns of straightness 0-3 also have mirror images, so there are 9 patterns in total. The unit used in pattern matching is a coarser &#981; unit, internally called "quad-strip". The scale of the quad-strip unit is approx 0.5&deg; (32/60&deg; &asymp; 0.53&deg;), which is 32x coarser than the integer &#981; unit described previously (1/8-strip).

<div class="row" style="width:800px; margin: auto;">
  <br/>
  <div class="column" style="width:48%; float: left">
    Asymmetric patterns<br/>
    <img src="figures/emtf_patterns_madorsky.png" style="width:100%"/>
  </div>
  <div class="column" style="width:48%; float: left">
    Symmetric patterns<br/>
    <img src="figures/emtf_patterns_symmetric_madorsky.png" style="width:100%"/>
  </div>
</div>

A pattern is considered fired if at least 2 stations in the pattern have found hits, but with the rule that ME3 and ME4 are counted as a single station.
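The firing rule above can be expressed in a couple of lines. In this hypothetical helper, the input is the set of ME station numbers (1-4) that have hits inside the pattern. ME4 is mapped onto ME3 so that the two count as a single station.

```python
def emtf_pattern_fired(stations_with_hits):
    """EMTF rule: at least 2 stations with hits, with ME3 and ME4 counted as one."""
    merged = {min(st, 3) for st in stations_with_hits}  # map ME4 -> ME3
    return len(merged) >= 2
```

For example, hits only in ME3 and ME4 do not fire the pattern, while hits in ME1 and ME4 do.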

ME2 is called the key station. These patterns are repeated 160 times, once for each quad-strip unit (from 0 to 159) at ME2, in order to fully cover the 80&deg; in &#981; of one sector (including the chambers from the neighbor sector). Therefore, the total number of patterns in one sector is:

$$
4\ \text{(zone)} \times 5\ \text{(pattern)} \times 160\ \text{(quad-strip)} = 3,200
$$

In a given zone, each fired pattern ("road") can be identified with 2 parameters: (pattern straightness, pattern keystrip), where keystrip is the quad-strip position at the key station of the pattern. A road is also assigned a 6-bit quality code:

<table style="width:500px">
<thead>
<tr>
<th style="width:14%; text-align: left;">Bit number</th>
<th style="width:14%; text-align: left;">bit 5</th>
<th style="width:14%; text-align: left;">bit 4</th>
<th style="width:14%; text-align: left;">bit 3</th>
<th style="width:14%; text-align: left;">bit 2</th>
<th style="width:14%; text-align: left;">bit 1</th>
<th style="width:14%; text-align: left;">bit 0</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">Contents</td>
<td style="text-align: center;">S2</td>
<td style="text-align: center;">ME1 hit</td>
<td style="text-align: center;">S1</td>
<td style="text-align: center;">ME2 hit</td>
<td style="text-align: center;">S0</td>
<td style="text-align: center;">ME3 or ME4 hit</td>
</tr>
</tbody>
</table>

where (S2, S1, S0) are (bit 2, bit 1, bit 0) of the pattern straightness. In pattern ghost busting, each road is compared against its neighbors (+/-1 keystrip). If the quality code of a road is lower than either of the neighbors, the road is cancelled. The 3 best (i.e. highest quality code) roads in each zone are sent to the next step.
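The quality code and the ghost-busting rule can be sketched as follows. The sketch assumes a hypothetical input: the best road quality at each keystrip of one zone, with 0 meaning no road. It implements the rule exactly as stated, so a road is cancelled only if it is strictly lower than a neighbor, and equal-quality neighbors both survive. The firmware tie-breaking is not described here.

```python
from __future__ import annotations


def road_quality(straightness: int, me1: bool, me2: bool, me34: bool) -> int:
    """6-bit quality code, bit 5..0 = S2, ME1 hit, S1, ME2 hit, S0, ME3-or-ME4 hit."""
    if not 0 <= straightness <= 4:
        raise ValueError("straightness must be 0..4")
    s2, s1, s0 = (straightness >> 2) & 1, (straightness >> 1) & 1, straightness & 1
    return (s2 << 5) | (int(me1) << 4) | (s1 << 3) | (int(me2) << 2) | (s0 << 1) | int(me34)


def ghost_bust(qualities: list[int], n_best: int = 3) -> list[tuple[int, int]]:
    """Cancel roads with lower quality than either +/-1 keystrip neighbor,
    then return the n_best survivors as (keystrip, quality), best first."""
    n = len(qualities)
    survivors = []
    for k, q in enumerate(qualities):
        if q == 0:
            continue  # no road at this keystrip
        left = qualities[k - 1] if k > 0 else 0
        right = qualities[k + 1] if k < n - 1 else 0
        if q < left or q < right:
            continue  # ghost: a neighbor has a higher quality code
        survivors.append((k, q))
    return sorted(survivors, key=lambda kq: -kq[1])[:n_best]
```

Interleaving the straightness bits with the station bits makes straightness the most significant factor (via S2) without letting it completely dominate the station content.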

I made a number of changes in the
<span style="display: inline-block; border-radius: .4rem; padding: 0.35rem .6rem; background: #3776ab; color: #ffffff; vertical-align: middle;">EMTF++</span>
algorithm, which I will describe below.

<span style="font-size: 120%; font-weight: bold">(i) Virtual stations</span>

Although there are nominally 4 ME stations, ME1/1 and ME1/2 are very distinct due to the very different magnetic field. The addition of ME0 also breaks the traditional four-station scheme. Thus, I decided to split the muon detectors by station and detector type into 12 virtual stations.

<table style="width:700px">
<thead>
<tr>
<th style="width:7.6%; text-align: left; background-color: white">Station number</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #31bd38">s0</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #31bd38">s1</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #31bd38">s2</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #31bd38">s3</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #31bd38">s4</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #0096ff">s5</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #0096ff">s6</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #0096ff">s7</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #0096ff">s8</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #ff3300">s9</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #ff3300">s10</th>
<th style="width:7.6%; text-align: center; font-size: 120%; color: white; background-color: #ffa25f">s11</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left; background-color: white">Muon detector</td>
<td style="text-align: center; font-size: 120%; color: #31bd38">ME1/1</td>
<td style="text-align: center; font-size: 120%; color: #31bd38">ME1/2</td> 
<td style="text-align: center; font-size: 120%; color: #31bd38">ME2</td>
<td style="text-align: center; font-size: 120%; color: #31bd38">ME3</td>
<td style="text-align: center; font-size: 120%; color: #31bd38">ME4</td>
<td style="text-align: center; font-size: 120%; color: #0096ff">RE1</td>
<td style="text-align: center; font-size: 120%; color: #0096ff">RE2</td>
<td style="text-align: center; font-size: 120%; color: #0096ff">RE3</td>
<td style="text-align: center; font-size: 120%; color: #0096ff">RE4</td>
<td style="text-align: center; font-size: 120%; color: #ff3300">GE1/1</td>
<td style="text-align: center; font-size: 120%; color: #ff3300">GE2/1</td>
<td style="text-align: center; font-size: 120%; color: #ffa25f">ME0</td>
</tr>
</tbody>
</table>

This might look like a bit of overkill. Initially I wanted to make use of all the possible information: it wasn't clear how useful the different muon detectors are, so I decided to include all of them and split them as above because of their different &#981; and &theta; resolutions. Now, I think it's clear that ME0 > CSC > GEM > iRPC > RPC, so one could try something different that uses fewer firmware resources.

Perhaps the right thing to do is to combine CSC+RPC and CSC+GEM to form "super" trigger primitives, but that will require a much larger time investment. I imagine it took TAMU a lot of time to implement the ILT algorithm, and even so, I believe there are still bugs in their code, which is why I decided not to use it.

<span style="font-size: 120%; font-weight: bold">(ii) Zones</span>

I revised the zone boundaries. Alex's original rationale was to place the zone boundaries near the edges of the ME rings to minimize the effects of the gaps. I follow the same rationale, but with more zones: 6 zones with &eta; boundaries of (1.2, 1.55, 1.7, 1.8, 1.98, 2.15, 2.4):

| Zone | &eta; range | Description |
| :--- | :--- | :--- |
| zone 0 | 2.15 - 2.4 | before GE1/1 begins (&eta; ~ 2.15) |
| zone 1 | 1.98 - 2.15 | from where GE1/1 begins until ME0 ends (&eta; ~ 1.98) |
| zone 2 | 1.8 - 1.98 | until ME4/1 ends (&eta; ~ 1.8) |
| zone 3 | 1.7 - 1.8 | before ME1/2 begins (&eta; ~ 1.7) |
| zone 4 | 1.55 - 1.7 | until ME1/1 ends (&eta; ~ 1.55) |
| zone 5 | 1.2 - 1.55 | until ME1/2 ends (&eta; ~ 1.2) |
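As a simplified illustration, the &eta; boundaries in the table translate into a zone lookup with a binary search. This is only a sketch. The actual implementation uses per-detector, per-ring &theta; windows with overlaps, as described in the next paragraphs, whereas this version assumes non-overlapping half-open [low, high) bins in |&eta;|. The names are hypothetical.

```python
from bisect import bisect_right

# Ascending |eta| boundaries of the 6 EMTF++ zones (zone 0 is the most forward)
ETA_BOUNDARIES = [1.2, 1.55, 1.7, 1.8, 1.98, 2.15, 2.4]


def zone_from_eta(abs_eta):
    """Return the zone (0..5) for |eta|, or None outside 1.2 <= |eta| < 2.4."""
    idx = bisect_right(ETA_BOUNDARIES, abs_eta) - 1  # ascending bin index
    if idx < 0 or idx >= len(ETA_BOUNDARIES) - 1:
        return None
    return (len(ETA_BOUNDARIES) - 2) - idx  # flip so zone 0 is highest eta
```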

As a result of this gerrymandering, there is less mixing of detector types or rings in the same zone (see below). The ring is important because the CSC ring 1 chambers and ring 2 chambers have different &#981; resolutions. Although RE4/1 and RE4/2 can both exist in zone 2, I decided to drop RE4/2 in that zone to avoid mixing detector types, as RE4/1 is iRPC but RE4/2 is old RPC. In any given zone, there are 8 or 9 active virtual stations.

<img src="figures/emtfpp_zones_1.png" width="800px"/>

My current implementation of the zones requires checking detector type, ring, and the integer &theta; value for each hit. The &theta; values are hardcoded (see below). The two values in each entry of the table are used as the zone boundaries for that particular detector type/ring. E.g. for ME1/1, the condition for zone 1 is `(16 <= theta_int && theta_int <= 26)`.
The reason I hardcoded the &theta; values is that, for detectors other than CSC, the hit &theta; values are very discrete, so simply adding +/-2 to the boundary &theta; values does not always capture them. 

<img src="figures/emtfpp_zones_2.png" width="800px"/>

The BX window for CSC hits is 2 consecutive BXs, as currently done in EMTF. For all the other detector types, only BX=0 hits are used. Note: in my fastsim Python script, I actually use BX=(-1,0) for CSC and BX=0 for everything else. In the simulation, the BX=0 stamp is well-defined, and the probability of CSC pre-firing seems to be much higher than that of post-firing.

<span style="font-size: 120%; font-weight: bold">(iii) Patterns</span>

The matching windows in the EMTF++ patterns are tuned separately for each virtual station and for each zone. They are tuned using a muon gun with p<sub>T</sub> &gt; 2 GeV. I use the same quad-strip unit and the 5 straightness codes, but revert to asymmetric patterns. I use muons in zone 1 to determine the p<sub>T</sub> binning that generates patterns with "straightness" similar to that of the current EMTF patterns. The p<sub>T</sub> binning is: (-0.5, -0.365, -0.26, -0.155, -0.07, 0.07, 0.155, 0.26, 0.365, 0.5) in q/p<sub>T</sub> [1/GeV].
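The q/p<sub>T</sub> edges above define 9 bins, one per asymmetric pattern. The sketch below maps q/p<sub>T</sub> to a pattern index and straightness code. It assumes that index 0 is the most-bent negatively charged pattern, consistent with the top-to-bottom ordering of the pattern figure, and that straightness 4 is the central bin. The function name is hypothetical.

```python
from bisect import bisect_right

QOVERPT_EDGES = [-0.5, -0.365, -0.26, -0.155, -0.07, 0.07, 0.155, 0.26, 0.365, 0.5]


def pattern_bin(q_over_pt):
    """Map q/pT [1/GeV] to (pattern index 0..8, straightness 0..4).

    Returns None outside the binning, i.e. for |q/pT| >= 0.5 (pT below 2 GeV).
    """
    idx = bisect_right(QOVERPT_EDGES, q_over_pt) - 1
    if idx < 0 or idx >= len(QOVERPT_EDGES) - 1:
        return None
    straightness = 4 - abs(idx - 4)  # 4 = straightest (central bin)
    return idx, straightness
```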

The following are visualizations of the straightness=0 EMTF pattern (recast back to asymmetric form) and the corresponding EMTF++ pattern:

<div class="row" style="width:800px; margin: auto">
  <br/>
  <div class="column" style="width:48%; float: left">
    EMTF pattern<br/>
    straightness=0, negative charge<br/>
    <img src="figures/emtf_patterns_strg0.png" style="width:100%"/>
  </div>
  <div class="column" style="width:48%; float: left">
    EMTF++ pattern (zone 1)<br/>
    straightness=0, negative charge<br/>
    <img src="figures/emtfpp_patterns_strg0.png" style="width:100%"/>
  </div>
</div>

The visualizations for all the EMTF++ patterns (from top to bottom: negatively charged straightness 0, ..., straightness 4, ..., positively charged straightness 0; from left to right: zone 0, ..., zone 5):

<img src="figures/emtfpp_patterns.png" width="600px"/>

To see the corresponding visualizations for the EMTF patterns, see [link](figures/emtf_patterns.png).

A pattern is considered fired if (i) the hits satisfy the SingleMu requirement, and (ii) the subset of CSC & ME0 hits satisfies the MuOpen requirement. SingleMu requires that there are at least 3 hits in 3 different stations, and at least one of the hits must be in YE1. MuOpen requires that there are at least 2 hits in 2 different stations. The virtual stations that belong to the YE1-4 stations are shown below:

| Station | Muon detectors |
| :------ | :------------- |
| YE1 | ME1/1, ME1/2, RE1, GE1/1, ME0 |
| YE2 | ME2, RE2, GE2/1 |
| YE3 | ME3, RE3 |
| YE4 | ME4, RE4 |
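Combining the virtual-station table with the YE mapping above, the EMTF++ firing condition can be sketched as follows. The dictionary and function names are hypothetical. "At least 3 hits in 3 different stations" is interpreted as at least 3 distinct YE stations, and MuOpen as at least 2 distinct YE stations among the CSC and ME0 hits.

```python
# Virtual station index -> (detector, YE station), from the two tables above
VIRTUAL_STATIONS = {
    0: ("ME1/1", 1), 1: ("ME1/2", 1), 2: ("ME2", 2), 3: ("ME3", 3), 4: ("ME4", 4),
    5: ("RE1", 1), 6: ("RE2", 2), 7: ("RE3", 3), 8: ("RE4", 4),
    9: ("GE1/1", 1), 10: ("GE2/1", 2), 11: ("ME0", 1),
}
CSC_OR_ME0 = {0, 1, 2, 3, 4, 11}


def ye_stations(virtual_stations):
    """Distinct YE stations covered by a set of virtual-station indices."""
    return {VIRTUAL_STATIONS[s][1] for s in virtual_stations}


def single_mu(virtual_stations):
    ye = ye_stations(virtual_stations)
    return len(ye) >= 3 and 1 in ye


def mu_open(virtual_stations):
    return len(ye_stations(virtual_stations)) >= 2


def emtfpp_pattern_fired(virtual_stations):
    """SingleMu on all matched hits AND MuOpen on the CSC + ME0 subset."""
    vs = set(virtual_stations)
    return single_mu(vs) and mu_open(vs & CSC_OR_ME0)
```

Note that ME1/1 and ME0 both map to YE1, so a road with only those two CSC/ME0 hits fails MuOpen even if RPC hits complete SingleMu.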

The total number of EMTF++ patterns in one sector is:

$$
6\ \text{(zone)} \times 9\ \text{(pattern)} \times 160\ \text{(quad-strip)} = 8,640
$$

In addition, the number of stations has increased from 4 to 8-9. With 8,640/3,200 = 2.7x more patterns, each covering roughly 2-2.25x more stations, I guess the resource usage might have increased by roughly a factor of 6. If this is too much for the firmware, the patterns can be reworked, e.g. by dropping the RPC hits.

Furthermore, in the EMTF++ pattern recognition step, the hits matched by a pattern are recorded. This simplifies the track building step (described later), since it avoids having to search for the hits again after pattern recognition. However, I'm not sure how feasible this is for the firmware implementation.

Note: currently I encode the EMTF++ pattern info in a big 4-D array (straightness) x (zone) x (station) x window, where window is a 3-tuple of (lower boundary, median, upper boundary). I call this the pattern bank.
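A minimal NumPy sketch of such a pattern bank and a window lookup is shown below. The dimensions (9 patterns, 6 zones, 12 virtual stations s0-s11 as in the partner table later in this section), the integer dtype, and the convention that windows are expressed in quad-strip units relative to the pattern keystrip are all assumptions for illustration. The bank is zero-filled here; the real windows come from the muon-gun tuning.

```python
import numpy as np

N_PATTERNS = 9   # "straightness" dimension: 9 patterns per zone
N_ZONES = 6
N_STATIONS = 12  # assumed: virtual stations s0..s11

# pattern_bank[pattern, zone, station] = (lower, median, upper), assumed to be
# in quad-strip units relative to the pattern keystrip.
pattern_bank = np.zeros((N_PATTERNS, N_ZONES, N_STATIONS, 3), dtype=np.int32)


def hit_in_window(bank, pattern, zone, station, keystrip, hit_quadstrip):
    """True if the hit lies inside the matching window of the given pattern."""
    lower, _median, upper = bank[pattern, zone, station]
    offset = hit_quadstrip - keystrip
    return bool(lower <= offset <= upper)
```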

Note: I also applied an extra correction to ME1/1a, ME1/1b and ME1/2 with the help of the new bend definition. For these chambers, whether a chamber is a front (F) or rear (R) chamber can affect the hit &#981; position. This can be corrected by applying a small F/R-dependent correction, such that the hit &#981; position becomes the same on average for both F and R. The correction is `new_phi = old_phi + k * bend`, where k is a constant:

- ME1/1a (R): -1.6419, (F): +1.6012
- ME1/1b (R): -1.3861, (F): +1.3692
- ME1/2  (R): -0.9237, (F): +0.8287
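The correction can be written as a small lookup, sketched below in Python with the constants listed above. The chamber/F-R keys and the function name are hypothetical, and no rounding back to the integer &#981; unit is applied here; how the firmware would round is not specified.

```python
# k constants from the list above, keyed by (chamber, "R" or "F").
FR_CORRECTION_K = {
    ("ME1/1a", "R"): -1.6419, ("ME1/1a", "F"): +1.6012,
    ("ME1/1b", "R"): -1.3861, ("ME1/1b", "F"): +1.3692,
    ("ME1/2", "R"): -0.9237, ("ME1/2", "F"): +0.8287,
}


def correct_phi(old_phi, bend, chamber, fr):
    """Apply new_phi = old_phi + k * bend for ME1/1a, ME1/1b and ME1/2 only."""
    k = FR_CORRECTION_K.get((chamber, fr))
    if k is None:  # other chambers are left unchanged
        return old_phi
    return old_phi + k * bend
```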

In practice, this doesn't really matter because the pattern quad-strip unit is very coarse. But since it has been implemented, I decided to keep it.

<span style="font-size: 120%; font-weight: bold">(iv) Pattern ghost busting</span>

The same ghost busting step from EMTF can be applied, but with a new definition for the quality code:

<table style="width:800px">
<thead>
<tr>
<th style="width:14%; text-align: left;">Bit number</th>
<th style="width:14%; text-align: left;">bit 11</th>
<th style="width:14%; text-align: left;">bit 10</th>
<th style="width:14%; text-align: left;">bit 9</th>
<th style="width:14%; text-align: left;">bit 8</th>
<th style="width:14%; text-align: left;">bit 7</th>
<th style="width:14%; text-align: left;">bit 6</th>
<th style="width:14%; text-align: left;">bit 5</th>
<th style="width:14%; text-align: left;">bit 4</th>
<th style="width:14%; text-align: left;">bit 3</th>
<th style="width:14%; text-align: left;">bit 2</th>
<th style="width:14%; text-align: left;">bit 1</th>
<th style="width:14%; text-align: left;">bit 0</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">Contents</td>
<td style="text-align: center;">ME0</td>
<td style="text-align: center;">ME1/1</td>
<td style="text-align: center;">GE1/1</td>
<td style="text-align: center;">ME1/2</td>
<td style="text-align: center;">ME2</td>
<td style="text-align: center;">GE2/1</td>
<td style="text-align: center;">ME3 or ME4</td>
<td style="text-align: center;">RE1 or RE2</td>
<td style="text-align: center;">RE3 or RE4</td>
<td style="text-align: center;">S2</td>
<td style="text-align: center;">S1</td>
<td style="text-align: center;">S0</td>
</tr>
</tbody>
</table>

It is possible to compress this if necessary. I prioritize having more hits over the pattern straightness (the hit bits occupy the higher-order bits), so that the neural network (described later) has more information to make its decision.
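Packing the 12-bit road quality code from the table above can be sketched in Python as follows. The shared bits ("ME3 or ME4", "RE1 or RE2", "RE3 or RE4") are set if either detector has a hit. The function name and the input format (a list of detector names with a hit, plus the straightness code 0-4) are assumptions for illustration.

```python
# Bit position for each detector, following the 12-bit table above.
QUALITY_BITS = {
    "ME0": 11, "ME1/1": 10, "GE1/1": 9, "ME1/2": 8, "ME2": 7, "GE2/1": 6,
    "ME3": 5, "ME4": 5,  # "ME3 or ME4" share bit 5
    "RE1": 4, "RE2": 4,  # "RE1 or RE2" share bit 4
    "RE3": 3, "RE4": 3,  # "RE3 or RE4" share bit 3
}


def road_quality(detectors, straightness):
    """12-bit code: hit bits in bits 11..3, straightness (S2 S1 S0) in bits 2..0."""
    if not 0 <= straightness <= 4:
        raise ValueError("straightness must be in 0..4")
    code = 0
    for det in detectors:
        code |= 1 << QUALITY_BITS[det]
    return code | straightness
```

Because the hit bits sit above the straightness bits, any road with an extra hit in a higher-priority detector outranks a straighter road without it.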

I implemented something more complex in my fastsim python script for the ghost busting:

- I cluster the neighboring roads that have the same pattern straightness. Two roads are neighbors if their keystrips differ by 1.
- For each road cluster, I pick the median (the road with keystrip = median among the keystrips of the cluster) to represent the cluster.
- I apply a cut on the BX's of the hits in the road cluster representative. I require at most 2 hits with BX=-1, and at least 2 hits with BX=0, i.e. `bx_minus_1_counter <= 2 and bx_zero_counter >= 2`
  * Note that I only include CSC hits with BX=(-1,0), and all the other hits with BX=0.
- I sort all the remaining road clusters based on the quality code.
- If the keystrip range of a road cluster intersects with the keystrip range of another road cluster with a higher quality code, the road is cancelled.
  * Two ranges `(x1, x2)` and `(y1, y2)` intersect if `(x2 >= y1) and (x1 <= y2)`.
  * I added an extra +/-2 unit to make the cancellation more aggressive: `(x2+2 >= y1) and (x1-2 <= y2)`

There is probably no need to implement the complex logic above in firmware, but this was done to remove the duplicates in a more aggressive way.
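The fastsim ghost-busting steps listed above can be sketched in Python as follows. This is an illustrative reimplementation, not the fastsim script itself. Assumptions: roads from a single zone, at most one road per (keystrip, straightness), the upper median for even-sized clusters, and cancellation only by clusters with a strictly higher quality code.

```python
from dataclasses import dataclass


@dataclass
class Road:
    keystrip: int
    straightness: int
    quality: int   # road quality code, e.g. from the 12-bit table above
    hit_bx: list   # BX of each hit (CSC hits: -1 or 0; other hits: 0)


def cluster_roads(roads):
    """Group roads with the same straightness whose keystrips differ by 1."""
    by_strg = {}
    for r in roads:
        by_strg.setdefault(r.straightness, []).append(r)
    clusters = []
    for group in by_strg.values():
        group.sort(key=lambda r: r.keystrip)
        current = [group[0]]
        for r in group[1:]:
            if r.keystrip - current[-1].keystrip == 1:
                current.append(r)
            else:
                clusters.append(current)
                current = [r]
        clusters.append(current)
    return clusters


def ghost_bust(roads, extend=2):
    """Return the surviving cluster representatives, highest quality first."""
    candidates = []
    for cluster in cluster_roads(roads):
        rep = cluster[len(cluster) // 2]  # road with the median keystrip
        n_bx_minus_1 = sum(1 for bx in rep.hit_bx if bx == -1)
        n_bx_zero = sum(1 for bx in rep.hit_bx if bx == 0)
        if n_bx_minus_1 <= 2 and n_bx_zero >= 2:
            candidates.append((rep, cluster[0].keystrip, cluster[-1].keystrip))
    candidates.sort(key=lambda c: c[0].quality, reverse=True)

    survivors = []
    for rep, x1, x2 in candidates:
        # Cancel if the extended keystrip range overlaps a higher-quality cluster.
        cancelled = any(
            other.quality > rep.quality and x2 + extend >= y1 and x1 - extend <= y2
            for other, y1, y2 in candidates
        )
        if not cancelled:
            survivors.append(rep)
    return survivors
```

Setting `extend=0` recovers the plain range-intersection test; the default of 2 reproduces the more aggressive cancellation.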

### Track building

In the Phase-1 track building step, we associate hits to the roads. If there are multiple hits in a station, we resolve the ambiguities by building the straightest track. We also calculate the deflection angles (&Delta;&#981; and &Delta;&theta; between stations), apply cuts on the &Delta;&theta;, and assign the track BX. The output is a proto-track, which is basically the final track, but without the p<sub>T</sub> (and also &eta; and &#981; in the &mu;GMT convention).

This step is also sometimes referred to as "primitive matching". 

Recall that the RPC hits are not used in the pattern recognition step. However, they are included in the track building step (since early 2017):
- Each RPC chamber can be mapped to a corresponding 10&deg; CSC chamber.
- If the corresponding CSC chamber has zero LCTs, the RPC hits are used in its place. If there is any LCT, the RPC hits are ignored.

The algorithm for building the straightest track is as follows:
- For each station, compute the absolute value of the &Delta;&#981; between hit &#981; and pattern keystrip x 32 for each hit.
  * The factor of 32 puts the pattern keystrip into the EMTF integer &#981; unit
- A hit is considered valid if the &Delta;&#981; is within 496 units for ME1, or 240 units for ME2,3,4.
- For each station, sort the valid hits and pick the hit with minimum &Delta;&#981;.
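A minimal Python sketch of this Phase-1 hit selection is below. It assumes the hit &#981; values are already in the EMTF integer &#981; unit and interprets "within" as an inclusive bound; the function name and input format are hypothetical.

```python
def select_hits_phase1(hits_by_station, keystrip):
    """hits_by_station: {station (1..4): [hit_phi, ...]} in EMTF integer phi units.
    Returns {station: hit_phi} with the minimum-|dphi| valid hit in each station."""
    road_phi = keystrip * 32  # pattern keystrip -> EMTF integer phi unit
    selected = {}
    for station, phis in hits_by_station.items():
        window = 496 if station == 1 else 240
        valid = [p for p in phis if abs(p - road_phi) <= window]
        if valid:
            selected[station] = min(valid, key=lambda p: abs(p - road_phi))
    return selected
```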

There are firmware constraints on the number of chambers used in each zone <sup><a href="#myfootnote2">[2]</a></sup>:

| Zone | Chambers (n) |
| :--- | :--- |
| zone 0 | ME1/1 (6+1), ME2/1 (3+1), ME3/1 (3+1), ME4/1 (3+1) |
| zone 1 | ME1/1 (6+1), ME2/1 (3+1), ME3/2 (6+1), ME4/2 (6+1) |
| zone 2 | ME1/2 (6+1), ME2/2 (6+1), ME3/2 (6+1), ME4/2 (6+1) |
| zone 3 | ME1/3 (6+1), ME2/2 (6+1), ME3/2 (6+1) |

<a name="myfootnote2"><sup>2</sup></a>   Because of these constraints, the revised zone boundaries proposed by Andrew were not implemented: there would be too many chambers if both rings were used in a given zone.

The algorithm for applying the &Delta;&theta; cuts is as follows:
- There are 4 stations, so there are 6 pairs of &Delta;&theta;'s to calculate: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4). An LCT can have up to 2 &theta; values due to ambiguities in matching the strips and wires. Thus, for each pair of stations, there are up to 4 possible &Delta;&theta;'s. Sort them and pick the minimum &Delta;&theta; (absolute value).
- Check that the &Delta;&theta; is <= window, where window is 4 for zone 0, or 8 for zone 1,2,3 (since June 2018).
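The pairwise &Delta;&theta; check can be sketched in Python as below. Each station carries a list of its (up to 2) &theta; values; stations without a hit are simply omitted from the input, which is an assumption of this sketch.

```python
from itertools import combinations, product


def dtheta_ok(thetas_by_station, zone):
    """thetas_by_station: {station: [theta, ...]} (up to 2 theta values per LCT).
    Returns {(i, j): bool} for every pair of stations that both have a hit."""
    window = 4 if zone == 0 else 8
    result = {}
    for i, j in combinations(sorted(thetas_by_station), 2):
        # Minimum |dtheta| over the (up to 4) theta combinations of the pair.
        dmin = min(abs(a - b) for a, b in product(thetas_by_station[i], thetas_by_station[j]))
        result[(i, j)] = dmin <= window
    return result
```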

The results of the &Delta;&theta; cuts are used to reject certain hits from the proto-track. The logic is implemented in a truth table:
- If all 6 &Delta;&theta;'s are good, keep all hits
- Else if (1,2), (2,3), (1,3) are good, reject ME4 hit
- Else if (1,2), (2,4), (1,4) are good, reject ME3 hit
- Else if (1,3), (3,4), (1,4) are good, reject ME2 hit
- Else if (2,3), (3,4), (2,4) are good, reject ME1 hit
- Else if (1,2) is good, reject ME3 and ME4 hits
- Else if (1,3) is good, reject ME2 and ME4 hits
- Else if (1,4) is good, reject ME2 and ME3 hits
- Else if (2,3) is good, reject ME1 and ME4 hits
- Else if (2,4) is good, reject ME1 and ME3 hits
- Else if (3,4) is good, reject ME1 and ME2 hits
- Else, reject all hits

If a proto-track ends up with fewer than 2 good hits, the proto-track is cancelled.
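The truth table and the final cancellation can be expressed as an ordered list of rules, sketched in Python below. One assumption here: a pair involving a station without a hit is treated as "not good", so for a 3-station track the first matching rule only rejects the missing station. The input `good` is a dict like the one returned by a pairwise &Delta;&theta; check.

```python
# Ordered truth table: (pairs that must all be good, stations to reject).
TRUTH_TABLE = [
    ([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)], set()),
    ([(1, 2), (2, 3), (1, 3)], {4}),
    ([(1, 2), (2, 4), (1, 4)], {3}),
    ([(1, 3), (3, 4), (1, 4)], {2}),
    ([(2, 3), (3, 4), (2, 4)], {1}),
    ([(1, 2)], {3, 4}),
    ([(1, 3)], {2, 4}),
    ([(1, 4)], {2, 3}),
    ([(2, 3)], {1, 4}),
    ([(2, 4)], {1, 3}),
    ([(3, 4)], {1, 2}),
]


def apply_dtheta_truth_table(stations, good):
    """stations: stations with a hit; good: {(i, j): bool} from the dtheta cuts.
    Returns the surviving stations, or an empty set if the proto-track is cancelled."""
    rejected = {1, 2, 3, 4}  # "Else, reject all hits"
    for pairs, reject in TRUTH_TABLE:
        if all(good.get(p, False) for p in pairs):  # missing pairs count as bad
            rejected = reject
            break
    survivors = set(stations) - rejected
    return survivors if len(survivors) >= 2 else set()
```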

The &#981; and &theta; values are obtained from the ME2 hit if it is available; else, the ME3 hit is used if it is available; else, the ME4 hit is used. The BX is obtained from the second earliest hit. Also, each proto-track is assigned a 7-bit quality code:

<table style="width:500px">
<thead>
<tr>
<th style="width:14%; text-align: left;">Bit number</th>
<th style="width:14%; text-align: left;">bit 6</th>
<th style="width:14%; text-align: left;">bit 5</th>
<th style="width:14%; text-align: left;">bit 4</th>
<th style="width:14%; text-align: left;">bit 3</th>
<th style="width:14%; text-align: left;">bit 2</th>
<th style="width:14%; text-align: left;">bit 1</th>
<th style="width:14%; text-align: left;">bit 0</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">Contents</td>
<td style="text-align: center;">S2</td>
<td style="text-align: center;">ME1 hit</td>
<td style="text-align: center;">S1</td>
<td style="text-align: center;">ME2 hit</td>
<td style="text-align: center;">S0</td>
<td style="text-align: center;">ME3 hit</td>
<td style="text-align: center;">ME4 hit</td>
</tr>
</tbody>
</table>
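The &#981;/&theta; source selection, the second-earliest BX and the interleaved 7-bit quality code can be sketched together in Python. The input format (a dict of surviving stations mapping to `(phi, theta, bx)`) and the function name are assumptions for illustration.

```python
def proto_track_summary(hits, straightness):
    """hits: {station (1..4): (phi, theta, bx)} after the dtheta cuts (>= 2 stations).
    Returns (phi, theta, bx, quality)."""
    # phi/theta from ME2 if available, else ME3, else ME4.
    for st in (2, 3, 4):
        if st in hits:
            phi, theta, _ = hits[st]
            break
    else:
        raise ValueError("a valid proto-track has a hit in ME2, ME3 or ME4")

    bx = sorted(h[2] for h in hits.values())[1]  # second earliest hit BX

    s2, s1, s0 = (straightness >> 2) & 1, (straightness >> 1) & 1, straightness & 1
    quality = (
        (s2 << 6) | (int(1 in hits) << 5) | (s1 << 4) | (int(2 in hits) << 3)
        | (s0 << 2) | (int(3 in hits) << 1) | int(4 in hits)
    )
    return phi, theta, bx, quality
```

The interleaving means that the straightness bit S2 and the ME1 hit dominate the ordering, followed by S1 and the ME2 hit, and so on.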

Finally, there is a proto-track ghost busting step. If a proto-track shares a hit with another proto-track with a higher quality code, the proto-track is cancelled. The 3 best (i.e. highest quality code) proto-tracks from a total of 36 (3 from each of the 4 zones from 3 consecutive BX's) are sent to the next step.
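A compact Python sketch of this proto-track ghost busting is shown below. It assumes each hit has a hashable identifier and that the 36 candidates (3 per zone, 4 zones, 3 BX's) are passed in as one flat list; both are illustrative choices.

```python
def proto_track_ghost_bust(tracks, n_best=3):
    """tracks: list of (quality, hit_ids) where hit_ids is a set of hit identifiers.
    Cancel a track that shares a hit with a higher-quality track, then keep n_best."""
    survivors = [
        (q, ids) for q, ids in tracks
        if not any(q2 > q and ids & ids2 for q2, ids2 in tracks)
    ]
    survivors.sort(key=lambda t: t[0], reverse=True)
    return survivors[:n_best]
```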

The changes in the
<span style="display: inline-block; border-radius: .4rem; padding: 0.35rem .6rem; background: #3776ab; color: #ffffff; vertical-align: middle;">EMTF++</span>
algorithm are described below.

<span style="font-size: 120%; font-weight: bold">(i) Unique hit for each station</span>

I'm assuming that the hits matched to a pattern can be recorded during the pattern recognition step, so only those hits need to be considered here. (In EMTF, the search during this step considers all the hits in all the chambers in a given station.)

The algorithm to build the straightest track is tweaked to instead build the track whose straightness matches the pattern straightness:
- First, find the median &theta; value of all the hit &theta;'s in the road. Use it as the road &theta;. Also, use the pattern keystrip x 32 as the road &#981;.
- For each virtual station, compute the absolute value of the &Delta;&#981; between hit &#981; and a partner hit &#981; including a bias term, i.e. `dphi = abs(hit_phi - partner_hit_phi - bias)`.
  * Each virtual station is assigned a partner virtual station (see table below). E.g. ME1/1 is partnered with ME2, so `dphi = abs(me11_hit_phi - me2_hit_phi - bias)`. 
    + The entry that says "ME1/1 or ME1/2" means ME1/1 for zones 0-4, or ME1/2 for zone 5.
    + The bias terms are stored in a big 4-D array (straightness) x (zone) x (station) x window, where window is a 3-tuple of (5-percentile, median, 95-percentile). Only the median is needed -- it is the bias term.
  * For the CSC stations, I compute a "best estimate" &#981; value using the road &#981; and the bias terms. These best estimate values help me patch up the missing CSC hits. Then, they can be used to look for hits in the other detectors (RPC, iRPC, GEM, ME0).
    + For ME1/1 and ME1/2, the best estimate is `estimate_phi = road_phi + bias`.
    + For ME2,3,4, the best estimate is `estimate_phi = me11_estimate_phi + bias` for zones 0-4, or `estimate_phi = me12_estimate_phi + bias` for zone 5.
  * When calculating the &Delta;&#981;, if there is no hit in the partner station, the best estimate value is used instead.


<table style="width: 800px">
<thead>
<tr>
<th style="width:7.6%; text-align: left;">Station number</th>
<th style="width:7.6%; text-align: center;">s0</th>
<th style="width:7.6%; text-align: center;">s1</th>
<th style="width:7.6%; text-align: center;">s2</th>
<th style="width:7.6%; text-align: center;">s3</th>
<th style="width:7.6%; text-align: center;">s4</th>
<th style="width:7.6%; text-align: center;">s5</th>
<th style="width:7.6%; text-align: center;">s6</th>
<th style="width:7.6%; text-align: center;">s7</th>
<th style="width:7.6%; text-align: center;">s8</th>
<th style="width:7.6%; text-align: center;">s9</th>
<th style="width:7.6%; text-align: center;">s10</th>
<th style="width:7.6%; text-align: center;">s11</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">Muon detector</td>
<td style="text-align: center;">ME1/1</td>
<td style="text-align: center;">ME1/2</td> 
<td style="text-align: center;">ME2</td>
<td style="text-align: center;">ME3</td>
<td style="text-align: center;">ME4</td>
<td style="text-align: center;">RE1</td>
<td style="text-align: center;">RE2</td>
<td style="text-align: center;">RE3</td>
<td style="text-align: center;">RE4</td>
<td style="text-align: center;">GE1/1</td>
<td style="text-align: center;">GE2/1</td>
<td style="text-align: center;">ME0</td>
</tr>
<tr>
<td style="text-align: left;">Partner station number</td>
<td style="text-align: center;">s2</td>
<td style="text-align: center;">s2</td>
<td style="text-align: center;">s0 or s1</td>
<td style="text-align: center;">s0 or s1</td>
<td style="text-align: center;">s0 or s1</td>
<td style="text-align: center;">s0 or s1</td>
<td style="text-align: center;">s2</td>
<td style="text-align: center;">s3</td>
<td style="text-align: center;">s4</td>
<td style="text-align: center;">s0 or s1</td>
<td style="text-align: center;">s2</td>
<td style="text-align: center;">s0 or s1</td>
</tr>
<tr>
<td style="text-align: left;">Partner muon detector</td>
<td style="text-align: center;">ME2</td>
<td style="text-align: center;">ME2</td>
<td style="text-align: center;">ME1/1 or ME1/2</td>
<td style="text-align: center;">ME1/1 or ME1/2</td>
<td style="text-align: center;">ME1/1 or ME1/2</td>
<td style="text-align: center;">ME1/1 or ME1/2</td>
<td style="text-align: center;">ME2</td>
<td style="text-align: center;">ME3</td>
<td style="text-align: center;">ME4</td>
<td style="text-align: center;">ME1/1 or ME1/2</td>
<td style="text-align: center;">ME2</td>
<td style="text-align: center;">ME1/1 or ME1/2</td>
</tr>
</tbody>
</table>


- For each virtual station, also compute the absolute value of the &Delta;&theta; between hit &theta; and the road &theta;, i.e. `dtheta = abs(hit_theta - road_theta)`

- Sort the hits in each virtual station by minimizing (&Delta;&theta;, &Delta;&#981;), i.e. select the hit with minimum &Delta;&theta;, and if the &Delta;&theta;'s are the same, select the one with minimum &Delta;&#981;.
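The EMTF++ hit selection described above can be sketched in Python as below, using the partner table. This is not the fastsim code, and several choices are assumptions: the input is the per-station list of `(phi, theta)` hits recorded during pattern recognition; `bias` holds one median bias per virtual station for the current pattern and zone, used both for the best estimates and for the &Delta;&#981;; `statistics.median_low` is used so the road &theta; stays an actual hit value; and when a partner station has several hits, its first hit is used (the text does not specify this case).

```python
import statistics

# Partner station for each virtual station s0..s11, following the table above.
# None stands for "s0 or s1": s0 (ME1/1) in zones 0-4, s1 (ME1/2) in zone 5.
PARTNER = {0: 2, 1: 2, 2: None, 3: None, 4: None, 5: None,
           6: 2, 7: 3, 8: 4, 9: None, 10: 2, 11: None}


def partner_station(station, zone):
    p = PARTNER[station]
    if p is None:
        return 1 if zone == 5 else 0
    return p


def best_estimates(road_phi, bias, zone):
    """Best-estimate phi for the CSC stations s0..s4."""
    est = {0: road_phi + bias[0], 1: road_phi + bias[1]}
    me1 = est[1] if zone == 5 else est[0]
    for s in (2, 3, 4):
        est[s] = me1 + bias[s]
    return est


def select_hits(hits, keystrip, zone, bias):
    """hits: {station: [(phi, theta), ...]}; bias: {station: median bias}.
    Returns {station: (phi, theta)}, one hit per virtual station."""
    road_phi = keystrip * 32
    road_theta = statistics.median_low([t for hs in hits.values() for _, t in hs])
    est = best_estimates(road_phi, bias, zone)

    selected = {}
    for s, hs in hits.items():
        if not hs:
            continue
        p = partner_station(s, zone)
        # Partner hit phi if the partner station has a hit, else its best estimate.
        pphi = hits[p][0][0] if hits.get(p) else est[p]
        selected[s] = min(
            hs, key=lambda h: (abs(h[1] - road_theta), abs(h[0] - pphi - bias[s]))
        )
    return selected
```

Sorting on the tuple `(dtheta, dphi)` gives exactly the "minimum &Delta;&theta;, then minimum &Delta;&#981;" ordering.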

<span style="font-size: 120%; font-weight: bold">(ii) Proto-track &#981;, &theta;, BX</span>

- Proto-track &#981; = pattern keystrip x 32.
- Proto-track &theta; = median &theta; value of all the hit &theta;'s.
- Proto-track BX = second earliest hit BX.
  + In the fastsim python script, I just cheat and always use BX = 0. Recall that I already made a cut that rejects any road with more than 2 hits with BX=-1.

<span style="font-size: 120%; font-weight: bold">(iii) Proto-track ghost busting</span>

- We can apply the same procedure as in EMTF.
  * In the fastsim python script, I didn't implement it because there are very few duplicates, but I should double-check this.

### Parameter assignment

We have all the information we need now, so finally we can assign the p<sub>T</sub>! In Phase-1, we use a very large (1 GB <sup><a href="#myfootnote3">[3]</a></sup>), fast <sup><a href="#myfootnote4">[4]</a></sup> LUT for p<sub>T</sub> assignment. It allows almost any algorithm to be implemented with very low latency, but the inputs to the algorithm must be compressed into the 30-bit address space of the PTLUT. The BDT is trained offline and then evaluated offline for all possible addresses. The BDT outputs, each encoded as a 9-bit word, are stored in the PTLUT. For Phase-2, a larger (128 GB) PTLUT with a 37-bit address space is under R&D.

<a name="myfootnote3"><sup>3</sup></a>   It is a little larger than 1 GB: $9\ \text{(bit)} \times \frac{1}{8}\ \text{(byte/bit)} \times 2^{30} = 1.125\ \text{GB}$

<a name="myfootnote4"><sup>4</sup></a>   The type of LUT is Reduced Latency DRAM (RLDRAM), which has low latency for random address lookup.


We make use of about 25 variables in the Phase-1 BDT. Depending on the track mode, a subset of the variables is selected and compressed into a 30-bit address. The mode is a 4-bit word:

|Bit number|Bit 3|Bit 2|Bit 1|Bit 0|
|:---------|:----|:----|:----|:----|
|Content|ME1 hit|ME2 hit|ME3 hit|ME4 hit|
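For reference, the mode word in the table above can be computed as in this short Python sketch (function name hypothetical): ME1 sets bit 3 and ME4 sets bit 0, so a 4-station track has mode 15.

```python
def track_mode(stations):
    """4-bit Phase-1 mode: bit 3 = ME1, bit 2 = ME2, bit 1 = ME3, bit 0 = ME4."""
    return sum(1 << (4 - s) for s in set(stations))
```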

To save address space, the deflection angles &Delta;&#981; are converted into a non-linear scale. The mode word is also compressed with a special scheme (since June 2017, BDT "v7") to allow more efficient use of the limited number of addresses -- more addresses allocated for important track modes such as the 4-station and 3-station tracks.

<img src="figures/emtf_ptlut_addresses_mode.png" width="300px"/>

The input variables (a.k.a. features), and the numbers of bits they use for the different track modes, are shown below:

<img src="figures/emtf_ptlut_addresses.png" width="800px"/>

The above table is a simplified view. The exact definitions of these variables, along with other details of the PTLUT address scheme, are documented by Alex and Andrew [here](docs/EMU_TF_PT_LUT_address_formation_2017_06_05.docx). Also see Andrew's [talk](https://indico.cern.ch/event/623713/contributions/2516606/subcontributions/222725/attachments/1429297/2194604/2017_03_09_EMTF_LUT_bit_assignment.pdf).

In addition, the BDT-predicted p<sub>T</sub> typically has a resolution with a mean of 0 and a sigma of approximately 20%. If the BDT output were used directly for the trigger threshold, the efficiency would only be 50% at the threshold. But at L1, we are supposed to report the p<sub>T</sub> at the 90% efficiency working point. Thus, a p<sub>T</sub>-dependent scale factor is applied to the BDT output, and the scaled p<sub>T</sub> is stored in the PTLUT. The scale factor is:

$$
\text{sf} = 1.2/(1 - 0.015 \times \min(20, p_{\mathrm{T}}))
$$
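A one-line Python version of this scaling is sketched below. It assumes that the p<sub>T</sub> inside the formula is the BDT-predicted p<sub>T</sub> in GeV and that the stored value is `sf * pt`; the function name is hypothetical. Because of the `min(20, pT)`, the factor grows from 1.2 at low p<sub>T</sub> to 1.2/0.7 (about 1.71) and stays constant above 20 GeV.

```python
def scale_pt(pt_bdt):
    """Scale the BDT pT (GeV) towards the 90%-efficiency working point."""
    sf = 1.2 / (1.0 - 0.015 * min(20.0, pt_bdt))
    return sf * pt_bdt
```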


The &#981; and &theta; are also converted into the &mu;GMT convention for &#981; and &eta;. The &mu;GMT convention is documented in [DN-2015/017](docs/DN2015_017_v3.pdf). The muon charge is determined by separate logic based on the signs of the &Delta;&#981;'s.

The BDT has been replaced by a Neural Network (NN) for
<span style="display: inline-block; border-radius: .4rem; padding: 0.35rem .6rem; background: #3776ab; color: #ffffff; vertical-align: middle;">EMTF++</span>.
The advantage is that it alleviates the bottleneck set by the number of PTLUT address bits, by using the logic and Digital Signal Processing (DSP) resources of a modern FPGA instead. This allows many more variables to be used without heavy compression. In addition, NNs are being actively developed, and we can benefit from future improvements to the technology.

<span style="font-size: 120%; font-weight: bold">(i) Neural network approach</span>

I take advantage of the possibility of using a larger number of inputs, and choose to use more low-level features. In the Phase-1 EMTF, a considerable amount of work goes into selecting and engineering features that can be encoded in the fewest bits, and the FPGA also does a considerable amount of preprocessing to compress the data into PTLUT addresses. The NN approach allows us to skip that, and let the machine learning do the work instead.

In "v1" EMTF++, the following 39 features are used:

<img src="figures/emtfpp_features.png" width="500px"/>

Explain NN structure ... Wikipedia ...


A feed-forward neural network with 3 hidden layers (50/30/20 nodes per layer) is used. Batch normalization is applied at all the hidden layers (see [arXiv:1502.03167](https://arxiv.org/abs/1502.03167)).

When I started training the NN, I observed an issue with fake tracks (e.g. combinatorial tracks from PU) at high p<sub>T</sub>. I believe this is because the NN was trained on real muons only, so when it is applied to &lt;PU&gt;=200 events it can produce erroneous outputs. I tried tuning the regression in different ways, but could not get rid of the fakes satisfactorily. I then tried adding one more output node to do a simple classification of S vs. B, where S is any real muon with p<sub>T</sub> &gt; 8 GeV, and B is any track from the &lt;PU&gt;=200 sample, after vetoing events with any muon with p<sub>T</sub> &gt; 8 GeV. This seems to work quite well, and I call this second output the PU discriminator.
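To make the structure concrete, here is an inference-only NumPy sketch of a 39-input network with 50/30/20 hidden nodes, batch normalization in each hidden layer, and two outputs (the p<sub>T</sub> regression and the PU discriminator). Several details are assumptions, not taken from the text: ReLU activations, batch normalization applied before the activation, a sigmoid on the discriminator output, `eps = 1e-3`, and random placeholder weights instead of trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)
LAYERS = [39, 50, 30, 20]  # 39 input features, then 3 hidden layers


def dense_bn(n_in, n_out):
    """Placeholder dense weights plus inference-time batch-norm parameters."""
    return {
        "W": rng.normal(0.0, 0.1, (n_in, n_out)), "b": np.zeros(n_out),
        "gamma": np.ones(n_out), "beta": np.zeros(n_out),
        "mean": np.zeros(n_out), "var": np.ones(n_out),
    }


hidden = [dense_bn(a, b) for a, b in zip(LAYERS[:-1], LAYERS[1:])]
head = {"W": rng.normal(0.0, 0.1, (LAYERS[-1], 2)), "b": np.zeros(2)}


def forward(x, eps=1e-3):
    """x: (batch, 39) features. Returns (regression output, PU discriminator)."""
    h = x
    for p in hidden:
        z = h @ p["W"] + p["b"]
        z = p["gamma"] * (z - p["mean"]) / np.sqrt(p["var"] + eps) + p["beta"]
        h = np.maximum(z, 0.0)  # ReLU (assumed)
    out = h @ head["W"] + head["b"]
    regression = out[:, 0]
    discriminator = 1.0 / (1.0 + np.exp(-out[:, 1]))  # sigmoid (assumed)
    return regression, discriminator
```

At inference time, batch normalization reduces to a fixed per-node scale and offset, so in firmware it could be folded into the preceding dense layer.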

The number of parameters ...

<img src="figures/neural_network.gv.svg" width="350px"/>


Refer to HLS4ML


The training of the NN is done offline and won't be described here.




## Performance plots

In the following results, you can find the efficiency and rate plots for <span style="color: red; font-weight: bold;">EMTF</span> and <span style="color: blue; font-weight: bold;">EMTF++</span>. 

Note that I'm currently using the EMTF emulator version from the beginning of 2018. A set of important changes was applied to EMTF in June 2018 that reduced the pileup dependence, giving a 30% rate reduction at &lt;PU&gt; = 200. I haven't yet included these changes (nor any of the other changes made since the beginning of 2018). I'll update the results once I'm able to do that.

Other assumptions:
- Performance of the ME0 trigger primitive in the current simulation is realistic.
- CSC CLCT comparator digi fits are possible.
- The new patterns can fit inside the FPGA.
- In the rate plots, the number of colliding bunches = 2808 is used. (see below)

$$
\begin{align*}
\text{Event pileup for one BX, at a given time, on average},\quad  &\mathcal{P} &=& \frac{\mathcal{L} \times \sigma_{pp}}{\mathcal{N} \times \text{freq}} \\
\text{Inelastic pp cross section at}\ \sqrt{s}\text{ = 14 TeV},\quad  &\sigma_{pp} &=& 80\ \mathrm{mb} \\
\text{Number of colliding pp bunches},\quad  &\mathcal{N} &=& 1...2808 \\
\text{LHC orbit frequency},\quad  &\text{freq} &=& 11246\ \mathrm{Hz}
\end{align*}
$$
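The pileup formula can be evaluated directly, as in the Python sketch below (1 mb = 10<sup>-27</sup> cm<sup>2</sup>). The second function converts a pass fraction measured on a zero-bias-like sample (such as the NeutrinoGun sample) into a rate by multiplying by the bunch-crossing rate N &times; freq; this is the usual convention for L1 rate estimates and is assumed here rather than taken from the text. The luminosity used in the example is illustrative.

```python
SIGMA_PP_CM2 = 80e-27     # 80 mb in cm^2
ORBIT_FREQ_HZ = 11246.0   # LHC orbit frequency
N_BUNCHES = 2808          # number of colliding bunches


def mean_pileup(lumi_cm2s, n_bunches=N_BUNCHES):
    """Average pileup per bunch crossing for a luminosity in cm^-2 s^-1."""
    return lumi_cm2s * SIGMA_PP_CM2 / (n_bunches * ORBIT_FREQ_HZ)


def trigger_rate_hz(n_pass, n_total, n_bunches=N_BUNCHES):
    """Rate = fraction of crossings passing the trigger x bunch-crossing rate."""
    return n_pass / n_total * n_bunches * ORBIT_FREQ_HZ
```

For example, `mean_pileup(7.5e34)` gives about 190, and with 2808 bunches the crossing rate is about 31.6 MHz, so a 20 GeV rate of 11.0 kHz corresponds to roughly 0.035% of crossings.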

### Resolution

<div class="row" style="width:800px; margin: auto;">
  <div class="column" style="width:48%; float: left">
    EMTF p<sub>T</sub> bias<br/>
    <img src="figures_perf/emtf_l1ptres_vs_genpt_bias.png" style="width:100%"/>
  </div>
  <div class="column" style="width:48%; float: left">
    EMTF++ p<sub>T</sub> bias<br/>
    <img src="figures_perf/emtf2023_l1ptres_vs_genpt_bias.png" style="width:100%"/>
  </div>
</div>
<div class="row" style="width:800px; margin: auto;">
  <div class="column" style="width:48%; float: left">
    EMTF p<sub>T</sub> resolution<br/>
    <img src="figures_perf/emtf_l1ptres_vs_genpt_res.png" style="width:100%"/>
  </div>
  <div class="column" style="width:48%; float: left">
    EMTF++ p<sub>T</sub> resolution<br/>
    <img src="figures_perf/emtf2023_l1ptres_vs_genpt_res.png" style="width:100%"/>
  </div>
</div>

<span style="color: red; font-weight: bold;">EMTF</span> noticeably underestimates p<sub>T</sub> at high p<sub>T</sub>. <span style="color: blue; font-weight: bold;">EMTF++</span> noticeably overestimates p<sub>T</sub> at low p<sub>T</sub> (2-3 GeV). This is because the NN output has a sharp cut-off at 2 GeV: 2 GeV muons are always assigned p<sub>T</sub> &gt; 2 GeV, never lower, so on average they are biased toward higher p<sub>T</sub>. As a result, the resolution is also skewed. In any case, we are not going to worry about the resolution at 2 GeV.

### Efficiency

<div class="row" style="width:800px; margin: auto;">
  <div class="column" style="width:48%; float: left">
    Efficiency vs. p<sub>T</sub> at 20 GeV trigger threshold<br/>
    <br/>
    <img src="figures_perf/emtf_eff_vs_genpt_l1pt20.png" style="width:100%"/>
  </div>
  <div class="column" style="width:48%; float: left">
    Efficiency vs. &eta; at 20 GeV trigger threshold<br/>
    (gen p<sub>T</sub> &gt; 20 GeV)<br/>
    <img src="figures_perf/emtf_eff_vs_geneta_l1pt20.png" style="width:100%"/>
  </div>
</div>

<span style="color: blue; font-weight: bold;">EMTF++</span> has a sharper turn-on curve and a higher plateau efficiency compared to <span style="color: red; font-weight: bold;">EMTF</span>. The efficiency is also flatter vs. &eta;, thanks to the additional Phase-2 muon detectors.

### Rates

<div class="row" style="width:800px; margin: auto;">
  <div class="column" style="width:48%; float: left">
    Rate vs. p<sub>T</sub> trigger threshold<br/>
    <img src="figures_perf/emtf2023_rate_reduction.png" style="width:100%"/>
  </div>
  <div class="column" style="width:48%; float: left">
    Rate vs. PU at 20 GeV trigger threshold<br/>
    <img src="figures_perf/emtf2023_rate_pu_dependence.png" style="width:100%"/>
  </div>
</div>

The <span style="color: blue; font-weight: bold;">EMTF++</span> rate at the 20 GeV threshold is 11.0 kHz; for <span style="color: red; font-weight: bold;">EMTF</span>, it is 44.5 kHz. So <span style="color: blue; font-weight: bold;">EMTF++</span> achieves a factor of 4 rate reduction (i.e. a 75% rate reduction). These numbers have O(10%) statistical uncertainties. Note that there is a change of slope around 10 GeV, which is due to the PU discriminator that is applied to tracks with p<sub>T</sub> &gt; 8 GeV (8 GeV becomes approx. 10 GeV after the online-to-offline scaling).

Comparing the rates at &lt;PU&gt;=140 &amp; 200, we also see that <span style="color: blue; font-weight: bold;">EMTF++</span> has a much more linear dependence on PU.

<div class="row" style="width:800px; margin: auto;">
  <div class="column" style="width:32%; float: left">
    Rate vs. p<sub>T</sub> trigger threshold<br/>
    1.24 &lt; |&eta;| &lt; 1.65<br/>
    (no new Phase-2 detectors)<br/>
    <img src="figures_perf/emtf2023_rate_reduction_1.png" style="width:100%"/>
  </div>
  <div class="column" style="width:32%; float: left">
    Rate vs. p<sub>T</sub> trigger threshold<br/>
    1.65 &lt; |&eta;| &lt; 2.15<br/>
    (GE1/1, GE2/1, ME0, iRPC)<br/>
    <img src="figures_perf/emtf2023_rate_reduction_2.png" style="width:100%"/>
  </div>
  <div class="column" style="width:32%; float: left">
    Rate vs. p<sub>T</sub> trigger threshold<br/>
    2.15 &lt; |&eta;| &lt; 2.4<br/>
    (GE2/1, ME0, iRPC)<br/>
    <img src="figures_perf/emtf2023_rate_reduction_3.png" style="width:100%"/>
  </div>
</div>

Looking at the rates separately in 3 regions (1.24 &lt; |&eta;| &lt; 1.65, 1.65 &lt; |&eta;| &lt; 2.15, and 2.15 &lt; |&eta;| &lt; 2.4), the rate reduction at 20 GeV is 49% in the first region, 85% in the second, and 81% in the third.

### Robustness

(pending)

## Summary

A "v1" version of EMTF++ has been implemented. It achieves a rate of 11.0 kHz at a 20 GeV trigger threshold for &lt;PU&gt;=200, a factor of 4 rate reduction. In the |&eta;| > 1.6 region covered by the new Phase-2 muon detectors, the rate reduction is about 6x. At the same time, the overall efficiency is also higher.

Unfortunately, it falls short of the goal of reducing the rate below 10 kHz at &lt;PU&gt;=200. However, I'm sure further improvements and optimizations can be made. Also, certain assumptions in this v1 version might be difficult to implement in firmware, and there may be bugs, so it might still need to be reworked and debugged. In any case, I think the most important point is that we now have a new baseline that can be used as the benchmark for future studies. For example, anyone developing a new algorithm (e.g. a Kalman Filter) for the endcap muon trigger can use it to measure how the new algorithm compares.

The NN p<sub>T</sub> assignment is a very interesting technology to be applied at L1 trigger. We plan to make it ready for Run 3 (not to replace the BDT-based p<sub>T</sub> assignment, but as an exercise to investigate the feasibility of NNs).

EMTF++ can also be extended to trigger on new scenarios, such as displaced muons, or to cover new detector regions, such as the OMTF region.

## Future plans

### Run 3 configuration

(Insert Sergo's results here ...)