1\. **Pandas DataFrame**

This exercise analyzes a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and ORBIT_CNT is incremented every `x` BX_COUNTER units. This makes it possible to store the time in a way similar to hours, minutes and seconds.

In [1]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [2]:
import pandas as pd # standard naming convention
import numpy as np

In [3]:
file_name = "./data/data_000637.txt" #1310720 rows
N = 1310720
df = pd.read_csv(file_name, nrows = N)
df

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
2,1,0,63,3869200167,2553,28
3,1,0,64,3869200167,2558,19
4,1,0,64,3869200167,2760,25
...,...,...,...,...,...,...
1310715,1,0,62,3869211171,762,14
1310716,1,1,4,3869211171,763,11
1310717,1,0,64,3869211171,764,0
1310718,1,0,139,3869211171,769,0


2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [4]:
unit = df['BX_COUNTER'].max() #3563
print(unit)

3563
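
The maximum printed above is the last value the counter takes before it resets. If the counter starts at 0, as the hint suggests, an orbit would hold one more BX than that maximum. A minimal sketch of this reasoning on a made-up column (the toy values and the names `bx_max` and `bx_per_orbit` are illustrative, not part of the dataset):

```python
import pandas as pd

# Toy BX_COUNTER column (made-up values): the counter runs 0..4, then resets.
toy = pd.DataFrame({"BX_COUNTER": [2, 3, 4, 0, 1, 4, 0]})

bx_max = toy["BX_COUNTER"].max()  # largest value seen before a reset
bx_per_orbit = bx_max + 1         # the values 0..bx_max are bx_max + 1 distinct BX
print(bx_max, bx_per_orbit)
```

Applied to the real data, the same reasoning would give 3563 + 1 = 3564, provided the true maximum is actually reached within the N rows that were read.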


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

In [13]:
pd.set_option("display.precision", 15)

df['ABSOLUTE_TIME'] = df['ORBIT_CNT']*unit*25 + df['BX_COUNTER']*25 + df["TDC_MEAS"]*25/30 
df

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS,ABSOLUTE_TIME
0,1,0,123,3869200167,2374,26,344649004934896.6875
1,1,0,124,3869200167,2374,27,344649004934897.5000
2,1,0,63,3869200167,2553,28,344649004939373.3125
3,1,0,64,3869200167,2558,19,344649004939490.8125
4,1,0,64,3869200167,2760,25,344649004944545.8125
...,...,...,...,...,...,...,...
1310715,1,0,62,3869211171,762,14,344649985075886.6875
1310716,1,1,4,3869211171,763,11,344649985075909.1875
1310717,1,0,64,3869211171,764,0,344649985075925.0000
1310718,1,0,139,3869211171,769,0,344649985076050.0000
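
The values above are counted from orbit 0 rather than from the start of the acquisition, and at about 3.4e14 ns a 64-bit float can only resolve steps of 1/16 ns, which is why every fractional part shown is a multiple of 0.0625. One possible way around both points is to subtract the first row's counters in integer arithmetic and convert to ns only at the end. This is a sketch under assumptions: `BX_PER_ORBIT = 3564` assumes the counter runs from 0 to 3563, `REL_TIME_NS` is a hypothetical column name, and the first row is taken as the start of the acquisition, which need not be the earliest hit.

```python
import pandas as pd

BX_PER_ORBIT = 3564  # assumption: largest BX_COUNTER value (3563) + 1
BX_NS = 25           # one BX_COUNTER unit, in ns
TDC_NS = 25 / 30     # one TDC_MEAS unit, in ns

# First rows of the table above, copied by hand.
df = pd.DataFrame({
    "ORBIT_CNT":  [3869200167, 3869200167, 3869200167],
    "BX_COUNTER": [2374, 2374, 2553],
    "TDC_MEAS":   [26, 27, 28],
})

# Whole number of BX elapsed since the first row (pure integer arithmetic).
first = df.iloc[0]
bx_elapsed = ((df["ORBIT_CNT"] - first["ORBIT_CNT"]) * BX_PER_ORBIT
              + (df["BX_COUNTER"] - first["BX_COUNTER"]))

# Convert to ns only at the end, so the float part stays small and precise.
df["REL_TIME_NS"] = bx_elapsed * BX_NS + (df["TDC_MEAS"] - first["TDC_MEAS"]) * TDC_NS
print(df)
```

Because the relative times start at 0, the fractional TDC contribution is no longer rounded away by the large orbit offset.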


In [24]:
ts = pd.to_datetime(df['ABSOLUTE_TIME'])

print("TS type:",type(ts))
ts

TS type: <class 'pandas.core.series.Series'>


0         1970-01-04 23:44:09.004934896
1         1970-01-04 23:44:09.004934897
2         1970-01-04 23:44:09.004939373
3         1970-01-04 23:44:09.004939490
4         1970-01-04 23:44:09.004944545
                       ...             
1310715   1970-01-04 23:44:09.985075886
1310716   1970-01-04 23:44:09.985075909
1310717   1970-01-04 23:44:09.985075925
1310718   1970-01-04 23:44:09.985076050
1310719   1970-01-04 23:44:09.985075890
Name: ABSOLUTE_TIME, Length: 1310720, dtype: datetime64[ns]

4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [25]:
pd.to_timedelta(df['ABSOLUTE_TIME'])

0         3 days 23:44:09.004934896
1         3 days 23:44:09.004934897
2         3 days 23:44:09.004939373
3         3 days 23:44:09.004939490
4         3 days 23:44:09.004944545
                     ...           
1310715   3 days 23:44:09.985075886
1310716   3 days 23:44:09.985075909
1310717   3 days 23:44:09.985075925
1310718   3 days 23:44:09.985076050
1310719   3 days 23:44:09.985075890
Name: ABSOLUTE_TIME, Length: 1310720, dtype: timedelta64[ns]
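
The cell above converts every row to a `Timedelta`, but the duration asked for is the difference between the latest and the earliest time. A minimal sketch of that last step, using two ABSOLUTE_TIME values copied from the table above and truncated to whole ns (they are illustrative and not necessarily the extremes of the full dataset):

```python
import pandas as pd

# Two ABSOLUTE_TIME values from the table above, truncated to whole ns.
abs_time_ns = pd.Series([344649004934896, 344649985076050])

t = pd.to_timedelta(abs_time_ns, unit="ns")
duration = t.max() - t.min()  # a single pandas Timedelta

c = duration.components       # named fields: days, hours, minutes, seconds, ...
print(f"{c.days * 24 + c.hours} h {c.minutes} min {c.seconds} s")
print(duration)
```

On the full dataset the same two lines (`t.max() - t.min()` and `.components`) would apply to the whole time column.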


5\. Use the `.groupby()` method to find the noisy channels, i.e. the TDC channels with the most counts (print the top 3 and the corresponding counts).

In [30]:
df.groupby(['TDC_CHANNEL','FPGA']).sum()

TDC_CHANNEL,FPGA,HEAD,ORBIT_CNT,BX_COUNTER,TDC_MEAS,ABSOLUTE_TIME
1,0,1215,4701085531941,2144401,17205,4.187491938112689e+17
1,1,28438,110032486229838,50854862,413328,9.801143712194535e+18
2,0,1602,6198468500264,2823572,23878,5.521285817316250e+17
2,1,32669,126403097729581,58194220,475221,1.125935593171768e+19
3,0,1493,5776724712057,2695019,21800,5.145617537938709e+17
...,...,...,...,...,...,...
137,1,36,139291431693,65003,504,1.240738427967947e+16
138,0,34,131553026550,63314,276,1.171808584152433e+16
138,1,36,139291431693,65003,180,1.240738427967920e+16
139,0,75617,292577762508697,134482540,0,2.606136419882425e+19
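
`.sum()` adds up the values of each column within a group. Since HEAD is always 1, its sum happens to equal the number of rows, but the table is neither sorted nor limited to the top 3. Counting rows per group with `.size()` and sorting is one possible way to get there. The sketch below uses made-up hits and, like the cell above, assumes a channel is identified by the pair (FPGA, TDC_CHANNEL):

```python
import pandas as pd

# Toy hits (made-up values), same column names as the dataset.
toy = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 0, 1, 0],
    "TDC_CHANNEL": [139, 139, 64, 2, 2, 139, 1, 64],
})

# Number of rows (hits) per channel, largest first.
counts = toy.groupby(["FPGA", "TDC_CHANNEL"]).size()
top3 = counts.sort_values(ascending=False).head(3)
print(top3)
```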


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
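
Items 6 to 8 are left without a cell. As a possible starting point, they can be sketched with `nunique()`, a boolean mask, and a per-FPGA `groupby`. The data below is made up and the variable names are hypothetical:

```python
import pandas as pd

# Toy hits (made-up values), same column names as the dataset.
toy = pd.DataFrame({
    "FPGA":        [0, 0, 1, 1, 0, 1],
    "TDC_CHANNEL": [139, 64, 2, 2, 139, 139],
    "ORBIT_CNT":   [10, 10, 10, 11, 13, 13],
})

# 6. Orbits with at least one hit: every distinct ORBIT_CNT present in the data.
n_orbits = toy["ORBIT_CNT"].nunique()

# 7. Same count, restricted to rows coming from TDC_CHANNEL 139.
n_orbits_139 = toy.loc[toy["TDC_CHANNEL"] == 139, "ORBIT_CNT"].nunique()

# 8. One Series per FPGA: index = TDC channel, values = number of hits.
counts_fpga0 = toy[toy["FPGA"] == 0].groupby("TDC_CHANNEL").size()
counts_fpga1 = toy[toy["FPGA"] == 1].groupby("TDC_CHANNEL").size()

print(n_orbits, n_orbits_139)
print(counts_fpga0)
print(counts_fpga1)
```

For item 9, each of the two Series could then be drawn with `Series.plot(kind="bar")`, which relies on matplotlib being installed.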