## Pandas analysis

This exercise consists in analyzing a dataset containg timing information from a series of Time-to-Digital-Converters (TDC) implemented in a couple of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and the ORBIT_CNT is increased every 'x' BX_COUNTER. This allows to store the time in a similar way to hours, minutes and seconds.

1\. Create a Pandas DataFrame reading N rows of the 'data_000637.txt' dataset. Choose N to be smaller than or equal to the maximum number of rows and larger that 10k.

2\. Find out the number of BX in a ORBIT (the value 'x').

3\. Find out how much the data taking lasted. You can either make an estimate based on the fraction of the measurements (rows) you read, or perform this check precisely by reading out the whole dataset.

4\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information).

5\. Replace the values (all 1) of the HEAD column randomly with 0 or 1.

6\. Create a new DataFrame that contains only the rows with HEAD=1.

7\. Make two occupancy plots (one for each FPGA), i.e. plot the number of counts per TDC channel

8\. Use the groupby method to find out the noisy channels, i.e. the TDC channels with most counts (say the top 3)

9\. Count the number of unique orbits. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139

In [43]:
# If you didn't download it yet, please get the relevant file now!
!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ~/data/

--2020-11-21 12:58:48--  https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt
Risoluzione di www.dropbox.com (www.dropbox.com)... 2620:100:6025:1::a27d:4501, 162.125.69.1
Connessione a www.dropbox.com (www.dropbox.com)|2620:100:6025:1::a27d:4501|:443... connesso.
Richiesta HTTP inviata, in attesa di risposta... 301 Moved Permanently
Posizione: /s/raw/xvjzaxzz3ysphme/data_000637.txt [segue]
--2020-11-21 12:58:48--  https://www.dropbox.com/s/raw/xvjzaxzz3ysphme/data_000637.txt
Riutilizzo della connessione esistente a [www.dropbox.com]:443.
Richiesta HTTP inviata, in attesa di risposta... 302 Found
Posizione: https://uc5ac9cbf73c864c028727b16ed6.dl.dropboxusercontent.com/cd/0/inline/BDkDZ1ZwyqXe5HUfWyElQqxsXhCJcdet3VlGowoiACSgQwEDz0hHRMzflmHw1W2Nn0bWx3DKfO0wf1L1_k58xoaarY8v9PJgrUjFcgW-dC9qrigJ_MSzjkuusCh_J8hH51k/file# [segue]
--2020-11-21 12:58:48--  https://uc5ac9cbf73c864c028727b16ed6.dl.dropboxusercontent.com/cd/0/inline/BDkDZ1ZwyqXe5HUfWyElQqxsXhCJcdet3VlGowoiACSgQwEDz0hHRMzflmHw

In [1]:
import pandas as pd
import numpy as np

In [2]:
#Create a Pandas DataFrame reading N rows of the 'data_000637.txt' dataset. 
#Choose N to be smaller than or equal to the maximum number of rows and larger that 10k.

N = 100000

data = pd.read_csv('/home/sabrina/data/data_000637.txt')
df = data.loc[0:N]
df



Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
2,1,0,63,3869200167,2553,28
3,1,0,64,3869200167,2558,19
4,1,0,64,3869200167,2760,25
...,...,...,...,...,...,...
99996,1,0,70,3869201161,2472,26
99997,1,0,58,3869201161,2558,0
99998,1,0,57,3869201161,2561,23
99999,1,0,56,3869201161,2565,12


In [3]:
#2. Find out the number of BX in a ORBIT (the value 'x').


#reach max value then it starts counting again 
print("The number of BX in a ORBIT is:", df['BX_COUNTER'].max()+1)




The number of BX in a ORBIT is: 3564


In [4]:
#3. Find out how much the data taking lasted. You can either make an estimate based on the fraction 
#of the measurements (rows) you read, or perform this check precisely by reading out the whole dataset.


#IT?S CORRECT
#OR

init = data.loc[0, ['ORBIT_CNT','BX_COUNTER','TDC_MEAS']]
end = data.loc[len(data)-1, ['ORBIT_CNT','BX_COUNTER','TDC_MEAS']]

init_ns=init[0]*3564*25+init[1]*25+init[2]
end_ns=end[0]*3564*25+end[1]*25+end[2]

#print(init)
#print(fin)
#print("\n")

#print(init_ns)
#print(end_ns)

print("The data taking lasted: ", end_ns-init_ns, "ns")
print("The data taking lasted: ", (end_ns-init_ns)*(1e-9), "sec")




The data taking lasted:  980416092 ns
The data taking lasted:  0.980416092 sec


In [7]:
#5. Replace the values (all 1) of the HEAD column randomly with 0 or 1.

df_random01=df.copy()
df_random01.loc[df_random01['HEAD'] == 1,'HEAD'] = df_random01['HEAD'].apply(lambda x: np.random.randint(0,2))
df_random01




Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
2,0,0,63,3869200167,2553,28
3,0,0,64,3869200167,2558,19
4,1,0,64,3869200167,2760,25
...,...,...,...,...,...,...
99996,1,0,70,3869201161,2472,26
99997,0,0,58,3869201161,2558,0
99998,1,0,57,3869201161,2561,23
99999,1,0,56,3869201161,2565,12


In [8]:
#4. Create a new column with the absolute time in ns
#(as a combination of the other three columns with timing information).


abs_time= data['ORBIT_CNT']*3564*25+data['BX_COUNTER']*25+data['TDC_MEAS']
data['ABS_TIME']=abs_time
data



Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS,ABS_TIME
0,1,0,123,3869200167,2374,26,344745734939076
1,1,0,124,3869200167,2374,27,344745734939077
2,1,0,63,3869200167,2553,28,344745734943553
3,1,0,64,3869200167,2558,19,344745734943669
4,1,0,64,3869200167,2760,25,344745734948725
...,...,...,...,...,...,...,...
1310715,1,0,62,3869211171,762,14,344746715355164
1310716,1,1,4,3869211171,763,11,344746715355186
1310717,1,0,64,3869211171,764,0,344746715355200
1310718,1,0,139,3869211171,769,0,344746715355325


In [9]:
#6. Create a new DataFrame that contains only the rows with HEAD=1.

df_head1=df_random01.copy()
df_head1 = df_random01.loc[df_random01['HEAD'] == 1]
df_head1

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
4,1,0,64,3869200167,2760,25
5,1,0,63,3869200167,2762,4
9,1,0,60,3869200167,2788,7
...,...,...,...,...,...,...
99989,1,1,106,3869201161,2291,19
99994,1,0,63,3869201161,2376,25
99996,1,0,70,3869201161,2472,26
99998,1,0,57,3869201161,2561,23


In [17]:
#7. Make two occupancy plots (one for each FPGA), i.e. plot the number of counts per TDC channel
import matplotlib.pyplot as plt

print(df[df["FPGA"] == 0])

print(df[df["FPGA"] == 1])

        HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
0          1     0          123  3869200167        2374        26
1          1     0          124  3869200167        2374        27
2          1     0           63  3869200167        2553        28
3          1     0           64  3869200167        2558        19
4          1     0           64  3869200167        2760        25
...      ...   ...          ...         ...         ...       ...
99996      1     0           70  3869201161        2472        26
99997      1     0           58  3869201161        2558         0
99998      1     0           57  3869201161        2561        23
99999      1     0           56  3869201161        2565        12
100000     1     0           63  3869201161        3450        23

[70517 rows x 6 columns]
       HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
10        1     1            7  3869200167        2785         4
12        1     1            6  3869200167        27

In [12]:
#8. Use the groupby method to find out the noisy channels, i.e. the TDC channels with most counts (say the top 3)
#contare quante volte appare ogni channel...

df.groupby(['TDC_CHANNEL']).sum()

Unnamed: 0_level_0,HEAD,FPGA,ORBIT_CNT,BX_COUNTER,TDC_MEAS
TDC_CHANNEL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1,2207,2116,8539325854261,3943441,31737
2,2553,2444,9878069296999,4577194,37206
3,1788,1670,6918130780014,3160654,25768
4,2162,2014,8365211850370,3899374,32023
5,1272,1151,4921623243900,2295462,18683
...,...,...,...,...,...
129,2,2,7738401741,2635,35
130,6,4,23215204676,6721,91
137,4,2,15476803482,5250,35
138,4,2,15476803482,5250,24


In [None]:
#9. Count the number of unique orbits. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139


