## Pandas analysis

The following exercises use a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row) consists of the address of the TDC providing the signal, 'FPGA' and 'TDC_CHANNEL', and the timing information itself, 'ORBIT_CNT', 'BX_COUNTER' and 'TDC_MEAS'. Each TDC count corresponds to 25/30 ns, whereas BX_COUNTER is updated every 25 ns and ORBIT_CNT every 'x' BX_COUNTER counts. You can think of this way of storing the time as similar to hours, minutes and seconds.
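
As a minimal sketch of how the three counters combine into one time, assuming the units stated above: the function name `to_ns` and the parameter `bx_per_orbit` are hypothetical, and the value 3564 passed below is only a placeholder for 'x', not a result derived from the dataset.

```python
BX_NS = 25  # one BX_COUNTER count lasts 25 ns (from the text above)


def to_ns(orbit_cnt, bx_counter, tdc_meas, bx_per_orbit):
    """Combine the three counters into a single time in ns.

    bx_per_orbit plays the role of 'x' and must be supplied by the caller.
    """
    bx_total = orbit_cnt * bx_per_orbit + bx_counter  # time in units of BX
    return bx_total * BX_NS + tdc_meas * 25 / 30      # each TDC count is 25/30 ns


# 3564 is a placeholder for 'x' (assumption), used only to exercise the formula.
print(to_ns(orbit_cnt=2, bx_counter=10, tdc_meas=12, bx_per_orbit=3564))
```

The structure mirrors the hours, minutes and seconds analogy: the coarsest counter is converted to the next unit first, and the finest counter is added last.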

1\. Create a Pandas DataFrame by reading N rows of the 'data_000637.txt' dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k.

2\. Find out the value of 'x'

3\. Find out how long the data taking lasted. You can either make an estimate on the basis of the fraction of the measurements (rows) you read, or perform this check precisely by reading out the whole dataset

4\. Create a new column with the actual time in ns (as a combination of the other three columns with timing information)

5\. Replace the values (all 1) of the HEAD column randomly with 0 or 1

6\. Create a new DataFrame with only the rows with HEAD=1

7\. Make two occupancy plots (one per FPGA), i.e. plot the number of counts per TDC channel

8\. Use the groupby method to find out the noisy channels, i.e. the TDC channels with most counts (say the top 3)

9\. Count the number of unique orbits. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139
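
For exercise 1, one option is to let `read_csv` stop after N rows via its `nrows` argument instead of loading the whole file. The sketch below uses a small in-memory CSV with made-up values; only the column names come from the exercises, and the variable names `toy_csv` and `toy` are hypothetical.

```python
import io

import pandas as pd

# Made-up rows with the column names used in the exercises; not real data.
toy_csv = io.StringIO(
    "HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS\n"
    "1,0,123,100,2374,26\n"
    "1,0,124,100,2374,27\n"
    "1,1,63,100,2553,28\n"
    "1,0,139,101,12,0\n"
    "1,1,2,101,40,7\n"
)

N = 3  # number of rows to read (tiny here; the exercise asks for more than 10k)
toy = pd.read_csv(toy_csv, nrows=N)
print(toy.shape)
```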

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Ex. 1.
# Read only the first N rows; N must be larger than 10k and no larger
# than the number of rows in the file.
N = int(1e4) + 1

data = pd.read_csv('data_000637.csv', nrows=N)

In [3]:
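# Ex. 2.
# Estimate 'x' (the number of BX_COUNTER counts per orbit) from the largest
# BX_COUNTER value observed. Note: if BX_COUNTER starts counting from 0,
# the number of counts per orbit would be this maximum plus one.
orbit_unit = data['BX_COUNTER'].max()

# Ex. 3.
# Last timestamp: largest BX_COUNTER within the last orbit.
max_ORBIT_CNT = data['ORBIT_CNT'].max()
data_max_BX_ORBIT = data.loc[data['ORBIT_CNT'] == max_ORBIT_CNT]
max_BX_ORBIT = data_max_BX_ORBIT['BX_COUNTER'].max()

# First timestamp: smallest BX_COUNTER within the first orbit.
min_ORBIT_CNT = data['ORBIT_CNT'].min()
data_min_BX_ORBIT = data.loc[data['ORBIT_CNT'] == min_ORBIT_CNT]
min_BX_ORBIT = data_min_BX_ORBIT['BX_COUNTER'].min()

# Start and end times in units of BX (25 ns each).
time_start = (min_ORBIT_CNT * orbit_unit + min_BX_ORBIT)
time_end = (max_ORBIT_CNT * orbit_unit + max_BX_ORBIT)
time_tot = time_end - time_start

# Duration in seconds for the N rows read.
time_tot_second = time_tot * 25 * 1e-9

# Prints the duration in units of BX, not in seconds.
print(time_tot)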

354703


In [4]:
# Ex. 4.
# Absolute time in ns: ORBIT_CNT and BX_COUNTER are combined in units of BX
# (25 ns each), and each TDC_MEAS count adds 25/30 ns.
data['TIME_NS'] = (
    (data['ORBIT_CNT'] * orbit_unit + data['BX_COUNTER']) * 25
    + data['TDC_MEAS'] * 25 / 30
)

In [5]:
# Ex. 5.
# Draw one random 0 or 1 per row (the upper bound of randint is exclusive).
data['HEAD'] = np.random.randint(0, 2, size=len(data))

In [6]:
# Ex. 6.
# Keep only the rows with HEAD == 1.
data1 = data.loc[data['HEAD'] == 1]
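
Exercises 5 and 6 can be checked on a small made-up table. The sketch below assumes NumPy's `default_rng` with a fixed seed, chosen here only for reproducibility, and selects rows with a boolean mask; the names `toy`, `rng` and `toy_head1` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy table with made-up values; only the HEAD column matters here.
toy = pd.DataFrame({"HEAD": [1] * 8, "TDC_CHANNEL": range(8)})

# Seeded generator so the random 0/1 values are reproducible (assumption).
rng = np.random.default_rng(seed=0)
toy["HEAD"] = rng.integers(0, 2, size=len(toy))  # upper bound is exclusive

# The boolean mask keeps only the rows where HEAD == 1.
toy_head1 = toy[toy["HEAD"] == 1]
print(len(toy), len(toy_head1))
```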

In [7]:
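# Ex. 7.

import matplotlib.pyplot as plt

# Split the HEAD == 1 rows by FPGA.
fpga0 = data1.loc[data1['FPGA'] == 0]
fpga1 = data1.loc[data1['FPGA'] == 1]

# Occupancy for FPGA 0. The number of bins is the square root of the number
# of rows, so a single bin may cover more than one TDC channel.
plt.hist(fpga0['TDC_CHANNEL'], bins=int(np.sqrt(fpga0['TDC_CHANNEL'].shape[0])))
plt.title("FPGA 0")
plt.xlabel("TDC_CHANNEL")
plt.ylabel("Counts")
plt.show()

# Occupancy for FPGA 1, with the same binning rule.
plt.hist(fpga1['TDC_CHANNEL'], bins=int(np.sqrt(fpga1['TDC_CHANNEL'].shape[0])))
plt.title("FPGA 1")
plt.xlabel("TDC_CHANNEL")
plt.ylabel("Counts")
plt.show()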

<Figure size 640x480 with 1 Axes>

<Figure size 640x480 with 1 Axes>

In [8]:
# Ex. 8.

print(data[data['FPGA'] == 0].groupby(['TDC_CHANNEL']).count()['FPGA'].nlargest(3))
print(data[data['FPGA'] == 1].groupby(['TDC_CHANNEL']).count()['FPGA'].nlargest(3))

TDC_CHANNEL
139    599
63     496
64     480
Name: FPGA, dtype: int64
TDC_CHANNEL
139    261
2      250
1      237
Name: FPGA, dtype: int64
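
The same ranking can be obtained with a single `groupby` over both address columns. The sketch below runs on a made-up table, so its counts are not the ones printed above; the names `toy`, `counts` and `top` are hypothetical.

```python
import pandas as pd

# Made-up addresses; only the column names come from the dataset.
toy = pd.DataFrame({
    "FPGA":        [0, 0, 0, 0, 1, 1, 1],
    "TDC_CHANNEL": [139, 139, 63, 64, 139, 2, 2],
})

# Number of rows per (FPGA, TDC_CHANNEL) pair.
counts = toy.groupby(["FPGA", "TDC_CHANNEL"]).size()

# Busiest channel for each FPGA (use nlargest(3) for the top three).
top = {fpga: counts.loc[fpga].nlargest(1) for fpga in (0, 1)}
for fpga, series in top.items():
    print(fpga, series.index[0], series.iloc[0])
```

`size()` counts rows per group directly, which avoids picking an arbitrary column after `count()`.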


In [9]:
# Ex. 9.

# Number of distinct orbits in the data.
n_orbits = data['ORBIT_CNT'].nunique()

# Number of distinct orbits with at least one row from TDC_CHANNEL == 139.
channel_139 = data.loc[data['TDC_CHANNEL'] == 139]
n_orbits_139 = channel_139['ORBIT_CNT'].nunique()

print("Number of unique orbits: ", n_orbits)
print("Number of unique orbits with at least one measurement from TDC_CHANNEL=139: ", n_orbits_139)
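
A small made-up table shows why counting rows and counting unique orbits give different answers for exercise 9. This is only a sketch; the values are invented and the variable names are hypothetical.

```python
import pandas as pd

# Made-up rows: orbit 1 has two measurements, one of them from channel 139.
toy = pd.DataFrame({
    "ORBIT_CNT":   [1, 1, 2, 2, 3, 4, 4],
    "TDC_CHANNEL": [139, 5, 7, 8, 139, 139, 139],
})

# Distinct orbits overall.
n_orbits = toy["ORBIT_CNT"].nunique()

# Rows from channel 139, and distinct orbits among those rows.
is_139 = toy["TDC_CHANNEL"] == 139
n_rows_139 = int(is_139.sum())
n_orbits_139 = toy.loc[is_139, "ORBIT_CNT"].nunique()

print(n_orbits, n_rows_139, n_orbits_139)
```

Orbit 4 contributes two rows from channel 139 but counts as a single orbit, so the row count overestimates the number of orbits.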
