## Pandas analysis

The following exercises use a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row) consists of the address of the TDC providing the signal, 'FPGA' and 'TDC_CHANNEL', and the timing information itself, 'ORBIT_CNT', 'BX_COUNTER' and 'TDC_MEAS'. Each TDC count corresponds to 25/30 ns, whereas BX_COUNTER is incremented every 25 ns and ORBIT_CNT every 'x' BX_COUNTER counts. This way of storing the time is similar to hours, minutes and seconds.
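
The hours/minutes/seconds analogy can be made concrete with a small sketch. The helper name `to_ns` and the toy numbers are hypothetical, and `bx_per_orbit` stands for the unknown 'x', which is deliberately left as a parameter:

```python
# Hypothetical helper: combine the three counters into one time in ns,
# in the same way hours, minutes and seconds combine into seconds.
BX_NS = 25  # one BX_COUNTER count lasts 25 ns (from the text)


def to_ns(orbit_cnt, bx_counter, tdc_meas, bx_per_orbit):
    """Return the absolute time in ns.

    bx_per_orbit is the 'x' of the text; no value is assumed for it here.
    """
    coarse = (orbit_cnt * bx_per_orbit + bx_counter) * BX_NS
    fine = tdc_meas * 25 / 30  # one TDC count lasts 25/30 ns (from the text)
    return coarse + fine


# Toy numbers only: x = 100 is a placeholder, not the real value
t = to_ns(orbit_cnt=2, bx_counter=10, tdc_meas=15, bx_per_orbit=100)
print(t)
```

Because the function only uses arithmetic operators, it should work unchanged on whole DataFrame columns as well as on single numbers.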

1\. Create a Pandas DataFrame by reading N rows of the 'data_000637.txt' dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k.

2\. Find out the value of 'x'

3\. Find out how long the data taking lasted. You can either make an estimate based on the fraction of the measurements (rows) you read, or perform this check precisely by reading the whole dataset.

4\. Create a new column with the actual time in ns (as a combination of the other three columns with timing information)

5\. Replace the values (all 1) of the HEAD column randomly with 0 or 1

6\. Create a new DataFrame with only the rows with HEAD=1

7\. Make two occupancy plots (one per FPGA), i.e. plot the number of counts per TDC channel

8\. Use the groupby method to find out the noisy channels, i.e. the TDC channels with most counts (say the top 3)

9\. Count the number of unique orbits. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139
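
Exercises 7 to 9 all reduce to counting rows per group. A minimal sketch on a toy DataFrame, whose values are invented for illustration and not taken from 'data_000637.txt', might look like this:

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'FPGA':        [0, 0, 0, 1, 1, 1, 1],
    'TDC_CHANNEL': [5, 5, 7, 9, 9, 9, 9],
    'ORBIT_CNT':   [1, 1, 2, 2, 2, 3, 3],
})

# Occupancy: number of rows per (FPGA, TDC_CHANNEL) pair
occupancy = toy.groupby(['FPGA', 'TDC_CHANNEL']).size()

# Channels with the most counts, summed over both FPGAs
top2 = toy.groupby('TDC_CHANNEL').size().nlargest(2)

# Unique orbits overall, and those with at least one hit on channel 9
n_orbits = toy['ORBIT_CNT'].nunique()
n_orbits_ch9 = toy.loc[toy['TDC_CHANNEL'] == 9, 'ORBIT_CNT'].nunique()

print(occupancy, top2, n_orbits, n_orbits_ch9, sep='\n')
```

`groupby(...).size()` counts rows per group without depending on any particular column, which is why it is used here rather than `count()`.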

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

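file_name = '../../data/data_000637.txt'

####### (1.) #######
# number of rows of the dataset to be processed
N = 20000
# nrows reads only the first N rows, so `data` is a fresh DataFrame rather than
# a slice (this avoids SettingWithCopyWarning when columns are added below)
data = pd.read_csv(file_name, nrows=N)
#print('\n(1.) Data:\n', data)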


####### (2.) #######
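# value of x: assuming BX_COUNTER starts at 0 and wraps after x-1, x = max + 1
# (this also assumes the rows read contain the largest BX_COUNTER value)
x = data['BX_COUNTER'].max() + 1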
print('\n(2.) Value of x:', x)


####### (3.) - (4.) ########
data['TIMENS'] = data['TDC_MEAS']*25/30 + data['BX_COUNTER']*25 + (x*25)*data['ORBIT_CNT']
timens_min = np.min(data['TIMENS'])
timens_max = np.max(data['TIMENS'])
time_t = timens_max - timens_min
print('\n(3.) Time in seconds:', time_t*1e-9)
#print('\n(4.) Data:\n', data)


######## (5.) ########
# one random 0/1 per row; len(data) stays correct even if fewer than N rows were read
data['HEAD'] = np.random.randint(0, 2, len(data))


######## (6.) ########
data_head_1 = data[ data['HEAD'] == 1 ]


######## (7.) ########
# counts per TDC channel, separately for each FPGA, ordered by channel number
FPGA_0 = data[ data['FPGA'] == 0 ]
FPGA_1 = data[ data['FPGA'] == 1 ]
occ_0 = FPGA_0['TDC_CHANNEL'].value_counts().sort_index()
occ_1 = FPGA_1['TDC_CHANNEL'].value_counts().sort_index()

# Plotting: two occupancy plots, one panel per FPGA
%matplotlib inline
fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
for ax, occ, label in zip(axes, (occ_0, occ_1), ('FPGA 0', 'FPGA 1')):
    ax.scatter(occ.index, occ.values, marker='.', s=8)
    ax.set_title(label)
    ax.set_xlabel('TDC_CHANNEL')
axes[0].set_ylabel('Number of counts')


######## (8.) ########
num_noisy_ch = 3
# group by channel, count the rows in each group, keep the channels with most counts
occur = data.groupby('TDC_CHANNEL').size()
noisy_ch = occur.nlargest(num_noisy_ch).index.to_numpy()
print('\n(8.) Top 3 noisy channels:', noisy_ch)


######## (9.) ########
orbit_unique_count = data['ORBIT_CNT'].nunique()
tdc_139 = data[ data['TDC_CHANNEL'] == 139 ]
tdc139_orbit_unique_count = tdc_139['ORBIT_CNT'].nunique()
print( '\n(9.) Number of unique orbits:', orbit_unique_count )
print( '\n(9.) Number of unique orbits with TDC_CHANNEL = 139:' , tdc139_orbit_unique_count )