In [None]:
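#include <algorithm>           // For std::find
#include <cmath>               // For std::sqrt
#include <iomanip>             // For std::setprecision
#include <iostream>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include <ROOT/RDataFrame.hxx> // For modern data frame tool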

In [1]:
// Open the ROOT file containing the tree.
TFile *file = TFile::Open("../Data/run6578.root");
if (!file || file->IsZombie()) { // Check if the file is opened successfully.
    std::cerr << "Error opening file or file not found." << std::endl;
    return;
}

// Retrieve the tree named "raw" from the file.
TTree *tree = (TTree*)file->Get("raw");
if (!tree) { // Check if the tree exists before using it.
    std::cerr << "Tree 'raw' not found in file." << std::endl;
    file->Close(); // Close the file if the tree is not found.
    return; // Stop here.
}
tree->Print(); // Print the branch layout only once the pointer is known to be valid.

******************************************************************************
*Tree    :raw       : rawapvdata                                             *
*Entries :    15976 : Total =        31315138 bytes  File  Size =   15214339 *
*        :          : Tree compression factor =   2.06                       *
******************************************************************************
*Br    0 :apv_evt   : apv_evt/i                                              *
*Entries :    15976 : Total  Size=      64616 bytes  File Size  =      22583 *
*Baskets :        3 : Basket Size=      32000 bytes  Compression=   2.84     *
*............................................................................*
*Br    1 :time_s    : time_s/I                                               *
*Entries :    15976 : Total  Size=      64609 bytes  File Size  =      31184 *
*Baskets :        3 : Basket Size=      32000 bytes  Compression=   2.06     *
*...................................................

Documentation [ROOT::RDataFrame](https://root.cern/doc/master/classROOT_1_1RDataFrame.html)

ROOT's RDataFrame offers a modern, high-level interface for the analysis of data stored in TTree, CSV, and other data formats, in C++ or Python.

In addition, multi-threading and other low-level optimisations allow users to exploit all the resources available on their machines completely transparently.

In [2]:
// Create a data frame from the TTree for easier processing.
ROOT::RDataFrame df(*tree);

**hasDetector13 Function**

1. **Lambda Expression:** `hasDetector13` is a lambda expression, a convenient way to define an anonymous (unnamed) function right where it is needed. In C++, a lambda is an inline function object that can capture variables from the surrounding context; here the capture list `[]` is empty, so nothing is captured.
2. **Function Signature:** This lambda takes a single parameter:
    * `const std::vector<unsigned int>& ids` - A constant reference to a vector of unsigned integers. The const keyword means the vector cannot be modified by the lambda, and `&` indicates that it is passed by reference, which avoids copying the entire vector and thus is more efficient.
3. **Function Body:** Inside the lambda, the `std::find` algorithm searches through the vector `ids`:
    * `std::find(ids.begin(), ids.end(), 13)` - This call to `std::find` starts looking at the beginning of the vector `(ids.begin())` and continues to the end of the vector `(ids.end())`, searching for the integer `13`.
    * The result of `std::find` is compared to `ids.end()`. The `std::find` function returns an iterator to the first element in the range that matches the value (`13` in this case). If no such element is found, `std::find` returns the end iterator `(ids.end())`, which represents a position one past the last element of the vector.
4. **Return Value:** The lambda returns a boolean value:
    * If `std::find` does not return `ids.end()`, it means the value `13` was found in the vector, so the lambda returns true.
    * If `std::find` returns `ids.end()`, it means the value `13` was not found in the vector, so the lambda returns false.


In [3]:
// Lambda function to check if detector with id 13 is activated in the event.
auto hasDetector13 = [](const std::vector<unsigned int>& ids) {
    return std::find(ids.begin(), ids.end(), 13) != ids.end();
};

The `hasDetector13` lambda is used to filter events in a `ROOT::RDataFrame`. It is passed as the filter criterion to `df.Filter()`, which evaluates it on each entry of the data frame:

In [4]:
// Filter the data frame to include only those events where detector 13 is activated.
auto filtered = df.Filter(hasDetector13, {"apv_id"});


Here, `df.Filter()` applies `hasDetector13` to each entry of the data frame, and `hasDetector13` checks whether the vector `apv_id` from each entry contains the number `13`. Only those entries where `hasDetector13` returns true (i.e., entries where detector 13 is activated) are included in the filtered data frame. `Filter()` is lazy: no data is read until an action such as `Foreach()` or `Count()` runs on `filtered`. All further processing uses this filtered data frame, so only events where detector 13 is active are considered.

This selection defines the reference sample for the efficiency calculation: only events that meet the condition (activation of detector 13) are counted and analyzed.

[**Unordered map**](https://www.geeksforgeeks.org/unordered_map-in-cpp-stl/)

Unordered maps are associative containers that store elements formed by the combination of a key and a mapped value, and that allow fast retrieval of individual elements based on their keys.

In an `unordered_map`, the key uniquely identifies the element, while the mapped value is the content associated with this key. The key and the mapped value may have different types.

In [5]:
// Map to store counts of activations for each detector.
std::unordered_map<int, int> counts;

[**Foreach**](https://root.cern/doc/master/classROOT_1_1RDF_1_1RInterface.html#ad2822a7ccb8a9afdf3e5b2ea321886ca)

Executes a user-defined function on each entry. It is an instant action, so the event loop runs as soon as it is called.

**Lambda Function in Foreach**

1. **Lambda Introduction `[&]`:**
    * `[&]` is the capture clause of the lambda. The ampersand `&` signifies that the lambda captures all external variables used within the lambda by reference. This allows the lambda to modify these external variables. In this context, it allows the lambda to modify the `counts` map that is defined outside the lambda.
2. **Parameter:**
    * **`const std::vector<unsigned int>& ids`:** The lambda takes a single parameter, `ids`, which is a reference to a `std::vector` of unsigned integers. The `const` qualifier indicates that the vector cannot be modified by the lambda. This vector is expected to contain the IDs of the detectors activated in a given event.
3. **Using `std::unordered_set`:** 
    * **`std::unordered_set<int> unique_ids(ids.begin(), ids.end());`:** This line creates a set of integers `(unique_ids)` initialized with the elements from the vector `ids`. The purpose of using a set is to eliminate any duplicate IDs from the vector, as sets only store unique elements. This ensures each detector ID is counted only once per event, even if it appears multiple times in the ids vector.
4. **Processing Detector IDs:**
    * **`for (int id : {8,9,10,11,12,13})`:** This loop iterates over a fixed list of detector IDs (from 8 to 13). These represent the specific detectors of interest.
    * **`if (unique_ids.count(id))`:** Inside the loop, the code checks if the current `id` from the list is present in the `unique_ids` set using the count method, which returns 1 if the element is found and 0 otherwise.
    * **`counts[id]++`**: If the detector ID is found in the set, the corresponding entry in the counts map is incremented. This map (`counts`) is used to tally how many events each detector was activated in.

The main goal of this lambda function is to analyze each event's detector activations and increment counts for the specific detectors of interest, ensuring each detector is counted only once per event. This data is then used to calculate efficiencies, reflecting the proportion of events (where detector 13 is activated) in which each of the other detectors was also activated.



In [6]:
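// Book the count of selected entries first. Count() is lazy, so it is filled
// in the same event loop that Foreach triggers below instead of a second pass.
auto nSelected = filtered.Count();

// Process each entry in the filtered data frame.
// The lambda updates `counts` without locking, so this assumes implicit multi-threading is not enabled.
filtered.Foreach([&](const std::vector<unsigned int>& ids) {
    std::unordered_set<int> unique_ids(ids.begin(), ids.end()); // Use a set to avoid counting duplicates.
    for (int id : {8,9,10,11,12,13}) {
        if (unique_ids.count(id)) {
            counts[id]++; // Increment count for each detector found.
        }
    }
}, {"apv_id"});

// Get the number of entries where detector 13 was hit to calculate efficiencies.
double N = nSelected.GetValue();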

std::cout << "number of entries where detector 13 was hit: " << N << std::endl;

number of entries where detector 13 was hit: 14900


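[**Count**](https://root.cern/doc/master/classROOT_1_1RDF_1_1RInterface.html#a876bfce418c82a93caf2b143c9c08704)

Returns the number of entries processed (a lazy action). Called on `filtered`, it gives the number of entries where detector 13 was hit, which is the denominator of the efficiencies.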

**Efficiency Calculation**

1. **Conditional Operator (Ternary Operator):** The condition checked is `N > 0`, which verifies that at least one event with detector 13 activated was found. If `N` is greater than 0, the efficiency is calculated. If `N` is 0 (no such events were found), the efficiency is set to 0 directly to avoid dividing by zero, which for floating-point values would produce `inf` or `NaN` rather than a meaningful efficiency.
2. **Division for Efficiency Calculation:**
    * **`n`:** This is `counts[id]`, the number of events in which the particular detector (identified by `id`) was activated among those events where detector 13 was also activated. It is read from the map `counts`, where each detector's ID is mapped to its activation count. It is the numerator.
    * **`N`:** Represents the total number of events where detector 13 was activated. This is the denominator in the efficiency calculation.
3. **Floating-Point Division:**
    * `N` is declared as `double`, so in `n / N` the `int` count `n` is converted to `double` and the division is a floating-point division. No explicit `static_cast<double>` is needed here. A cast would be required only if both operands were integers, because integer division discards the fractional part and would give incorrect efficiencies.
4. **Error:** The code also computes a statistical error on each efficiency from `n` and `N`, using the binomial formula.

The ternary expression therefore calculates the efficiency as the fraction of events where the specific detector was activated out of the events where detector 13 was activated, and sets it to 0 when there are no such events. The loop then prints each efficiency together with its error.


In [7]:
// Output the efficiency results and their errors for each detector.
for (int id : {8,9,10,11,12,13}) {
    int n = counts[id];
    double p = N > 0 ? n / N : 0; // Fraction of detector-13 events in which this detector also fired.
    double efficiency = p * 100; // Efficiency as a percentage.
    double error = N > 0 ? std::sqrt(p * (1 - p) / N) * 100 : 0; // Binomial error on the efficiency, as a percentage.

    // Print counts as integers without decimal places
    std::cout << "Detector " << id << "\t" << "Counts " << n << "\t"; // Display count as an integer

    // Set fixed point and two decimal places for efficiencies and errors
    std::cout << std::fixed << std::setprecision(2)
            << "Err Counts " << std::sqrt(n) << "\t" << "Efficiencies " << efficiency << "%\t" << "Error Eff " << error << "%" << std::endl;
}

Detector 8	Counts 12337	Err Counts 111.07	Efficiencies 82.80%	Error Eff 0.31%
Detector 9	Counts 11240	Err Counts 106.02	Efficiencies 75.44%	Error Eff 0.35%
Detector 10	Counts 8089	Err Counts 89.94	Efficiencies 54.29%	Error Eff 0.41%
Detector 11	Counts 11169	Err Counts 105.68	Efficiencies 74.96%	Error Eff 0.35%
Detector 12	Counts 7587	Err Counts 87.10	Efficiencies 50.92%	Error Eff 0.41%
Detector 13	Counts 14900	Err Counts 122.07	Efficiencies 100.00%	Error Eff 0.00%
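
The two error columns in this output can be read as simple counting statistics. As a sketch, with `n` the count for one detector and `N` the number of events with detector 13 (the symbols `\varepsilon` and `\sigma` are introduced here only for illustration), the printed quantities correspond to the following, shown with the detector 8 numbers as a check:

```latex
\sigma_n = \sqrt{n}, \qquad
\varepsilon = \frac{n}{N}, \qquad
\sigma_\varepsilon = \frac{1}{\sqrt{N}}\sqrt{\varepsilon\,(1-\varepsilon)}
                   = \sqrt{\frac{\varepsilon\,(1-\varepsilon)}{N}}

% Detector 8: n = 12337, N = 14900
\sigma_n = \sqrt{12337} \approx 111.07, \qquad
\varepsilon = \frac{12337}{14900} \approx 0.8280, \qquad
\sigma_\varepsilon = \sqrt{\frac{0.8280 \times 0.1720}{14900}} \approx 0.0031
```

Multiplying by 100 gives the 82.80% and 0.31% printed for detector 8. For detector 13 the efficiency is 1 by construction, so the binomial error is exactly 0.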
