1\. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and ORBIT_CNT is increased every `x` BX_COUNTER units. This allows the time to be stored in a way similar to hours, minutes and seconds.

In [3]:
import pandas as pd # standard naming convention
import numpy as np

In [4]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [5]:
data = pd.read_csv("./data/data_000637.txt")  # full dataset, reused in later cells
# First way: slice the first 100000 rows (copy so that new columns can be added safely)
DT = pd.DataFrame(data).iloc[:100000].copy()
print(DT)
# Second way: reindex onto the first 100000 row labels
data1 = pd.DataFrame(data, index=range(100000))
print(data1)


       HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
0         1     0          123  3869200167        2374        26
1         1     0          124  3869200167        2374        27
2         1     0           63  3869200167        2553        28
3         1     0           64  3869200167        2558        19
4         1     0           64  3869200167        2760        25
...     ...   ...          ...         ...         ...       ...
99995     1     0           64  3869201161        2378        29
99996     1     0           70  3869201161        2472        26
99997     1     0           58  3869201161        2558         0
99998     1     0           57  3869201161        2561        23
99999     1     0           56  3869201161        2565        12

[100000 rows x 6 columns]
       HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
0         1     0          123  3869200167        2374        26
1         1     0          124  3869200167        2374        2
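
Reading the whole file and slicing afterwards works, but `read_csv` can also stop after N rows through its `nrows` parameter, which is probably what the hint about the documentation points to. A minimal sketch, using a small in-memory CSV as a stand-in for `data/data_000637.txt` (the sample rows, the names `sample_csv` and `first_rows`, and the value of `N` are illustrative assumptions):

```python
import io
import pandas as pd

# Stand-in for ./data/data_000637.txt (assumption: same header, comma-separated)
sample_csv = io.StringIO(
    "HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS\n"
    "1,0,123,3869200167,2374,26\n"
    "1,0,124,3869200167,2374,27\n"
    "1,0,63,3869200167,2553,28\n"
    "1,0,64,3869200167,2558,19\n"
)

N = 3  # illustrative only; the exercise asks for N > 10k on the real file
first_rows = pd.read_csv(sample_csv, nrows=N)  # parse only the first N data rows
print(first_rows.shape)
```

On the real file, the same call would take the file path in place of `sample_csv`.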

2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [78]:
# Largest BX_COUNTER value observed before the counter is reset to 0
X = DT['BX_COUNTER'].max()
print('The Number of BX in an ORBIT =', X)

The Number of BX in an ORBIT = 3563
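
Because the counter starts at 0, a maximum observed value of 3563 corresponds to 3564 distinct BX values per orbit, so `x` would be the maximum plus one. This assumes the sample contains at least one hit at the largest counter value. A minimal sketch with a toy counter (the values and the name `bx_per_orbit` are illustrative assumptions):

```python
import pandas as pd

# Toy counter that runs 0..4 and wraps around (illustrative values only)
bx = pd.Series([3, 4, 0, 1, 2, 3, 4, 0])

# The counter takes the values 0..max, so one full cycle holds max + 1 steps
bx_per_orbit = bx.max() + 1
print(bx_per_orbit)
```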


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

In [79]:
# Combine the three timing columns described in the exercise text
DT['ABS_TIME'] = ((DT['ORBIT_CNT']*X + DT['BX_COUNTER']*25 + DT['TDC_MEAS']*(25/30))/3600)*(10**-9)
# Ask the tutor for help with this step
DT

       HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS  ABS_TIME
0         1     0          123  3869200167        2374        26  3.829433
1         1     0          124  3869200167        2374        27  3.829433
2         1     0           63  3869200167        2553        28  3.829433
3         1     0           64  3869200167        2558        19  3.829433
4         1     0           64  3869200167        2760        25  3.829433
...     ...   ...          ...         ...         ...       ...       ...
99995     1     0           64  3869201161        2378        29  3.829434
99996     1     0           70  3869201161        2472        26  3.829434
99997     1     0           58  3869201161        2558         0  3.829434
99998     1     0           57  3869201161        2561        23  3.829434
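
The task asks for nanoseconds elapsed since the first measurement, which suggests weighting each counter by its duration in ns and subtracting the first row, rather than dividing by 3600. A hedged sketch on toy rows, assuming 3564 BX per orbit (the observed maximum of 3563 plus one); `BX_PER_ORBIT`, `abs_ns` and `elapsed` are hypothetical names:

```python
import pandas as pd

BX_NS = 25.0          # duration of one BX_COUNTER unit, in ns
TDC_NS = 25.0 / 30.0  # duration of one TDC_MEAS count, in ns
BX_PER_ORBIT = 3564   # assumption: maximum BX_COUNTER (3563) + 1

# Toy rows (illustrative values, not the full dataset)
df = pd.DataFrame({
    "ORBIT_CNT": [3869200167, 3869200167, 3869200168],
    "BX_COUNTER": [2374, 2553, 2374],
    "TDC_MEAS": [26, 26, 26],
})

# Subtract the first row before scaling so the large orbit offset cancels exactly
d_orbit = df["ORBIT_CNT"] - df["ORBIT_CNT"].iloc[0]
d_bx = df["BX_COUNTER"] - df["BX_COUNTER"].iloc[0]
d_tdc = df["TDC_MEAS"] - df["TDC_MEAS"].iloc[0]
abs_ns = (d_orbit * BX_PER_ORBIT + d_bx) * BX_NS + d_tdc * TDC_NS

# Convert to a time series of elapsed durations (rounded to whole ns)
elapsed = pd.to_timedelta(abs_ns.round().astype("int64"), unit="ns")
print(elapsed)
```

Taking the differences in integer arithmetic first keeps the sub-nanosecond TDC contribution from being lost next to the very large orbit term.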


4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [8]:
import datetime as dt

init_time = dt.datetime.now()  # record the initial time

absTime = ((data['ORBIT_CNT']*X + data['BX_COUNTER']*25 + data['TDC_MEAS']*(25/30))/3600)*(10**-9)

final_time = dt.datetime.now()  # record the final time after the operation

time_taken = final_time - init_time  # total time taken by the operation

print("The time taken to read the whole dataset", time_taken)  # print the elapsed time





The time taken to read the whole dataset 0:00:00.045036
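
The cell above times the computation itself. The duration of the data taking would instead be the difference between the last and the first absolute time, which a `Timedelta` can split into hours, minutes and seconds. A sketch with made-up timestamps in ns (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical absolute times in ns since the start of the acquisition
abs_ns = pd.Series([0, 1_500_000_000, 3_723_000_000_000], dtype="int64")

times = pd.to_timedelta(abs_ns, unit="ns")
duration = times.max() - times.min()

# Timedelta.components splits the duration into days, hours, minutes, seconds, ...
c = duration.components
print(f"{c.hours} h {c.minutes} min {c.seconds} s")
```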


5\. Use the `.groupby()` method to find the noisy channels, i.e. the TDC channels with the most counts (print the top 3 and the corresponding counts to the screen).

In [80]:
A = data.groupby(['TDC_CHANNEL']).count()
print(A)
A.sort_values(by='TDC_CHANNEL', ascending=False).iloc[:3]

               HEAD    FPGA  ORBIT_CNT  BX_COUNTER  TDC_MEAS  ABS_TIME
TDC_CHANNEL                                                           
1             29653   29653      29653       29653     29653     29653
2             34271   34271      34271       34271     34271     34271
3             23463   23463      23463       23463     23463     23463
4             28755   28755      28755       28755     28755     28755
5             16435   16435      16435       16435     16435     16435
...             ...     ...        ...         ...       ...       ...
129              37      37         37          37        37        37
130              71      71         71          71        71        71
137              68      68         68          68        68        68
138              70      70         70          70        70        70
139          108059  108059     108059      108059    108059    108059

[133 rows x 6 columns]


               HEAD    FPGA  ORBIT_CNT  BX_COUNTER  TDC_MEAS  ABS_TIME
TDC_CHANNEL
139          108059  108059     108059      108059    108059    108059
138              70      70         70          70        70        70
137              68      68         68          68        68        68
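
Sorting by `TDC_CHANNEL` orders the rows by channel number, which is why the three highest-numbered channels appear above regardless of their counts. To rank by counts, one option is to take the group sizes and sort those values. A sketch on toy data (channel numbers and counts are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame(
    {"TDC_CHANNEL": [139, 64, 139, 63, 64, 139, 1, 63, 64, 139]}
)

# size() gives one count per channel; sort the counts, not the index
counts = toy.groupby("TDC_CHANNEL").size()
top3 = counts.sort_values(ascending=False).head(3)
print(top3)
```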


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [10]:
#To Count the Number of Non-Empty Orbits

Count = data.groupby(['ORBIT_CNT']).count()
print(Count)

            HEAD  FPGA  TDC_CHANNEL  BX_COUNTER  TDC_MEAS  ABS_TIME
ORBIT_CNT                                                          
3869200167    43    43           43          43        43        43
3869200168    85    85           85          85        85        85
3869200169   127   127          127         127       127       127
3869200170    98    98           98          98        98        98
3869200171   109   109          109         109       109       109
...          ...   ...          ...         ...       ...       ...
3869211167   208   208          208         208       208       208
3869211168   109   109          109         109       109       109
3869211169   191   191          191         191       191       191
3869211170   137   137          137         137       137       137
3869211171    22    22           22          22        22        22

[11001 rows x 6 columns]


7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [93]:

print('The length of unique orbits  =', len(data.loc[data['TDC_CHANNEL'] == 139].groupby(['ORBIT_CNT']).count()))


The length of unique orbits  = 10976
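
Both orbit counts can also be obtained without building the full grouped table, since `Series.nunique()` counts distinct values directly. A sketch on toy data (orbit and channel numbers are invented; `n_orbits` and `n_orbits_139` are hypothetical names):

```python
import pandas as pd

toy = pd.DataFrame({
    "ORBIT_CNT":   [10, 10, 11, 12, 12, 13],
    "TDC_CHANNEL": [139, 5, 7, 139, 139, 8],
})

# Orbits with at least one hit of any kind
n_orbits = toy["ORBIT_CNT"].nunique()

# Orbits with at least one hit from channel 139
n_orbits_139 = toy.loc[toy["TDC_CHANNEL"] == 139, "ORBIT_CNT"].nunique()
print(n_orbits, n_orbits_139)
```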


8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
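
One possible approach to the last two tasks, sketched on toy data, is to filter by FPGA and count rows per channel with `groupby(...).size()`, which returns a Series indexed by TDC channel. The names `counts_fpga0` and `counts_fpga1` and all values are hypothetical:

```python
import pandas as pd

toy = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 1, 1],
    "TDC_CHANNEL": [1, 1, 2, 1, 3, 3, 3],
})

# One Series per FPGA: index = TDC channel, values = number of hits
counts_fpga0 = toy[toy["FPGA"] == 0].groupby("TDC_CHANNEL").size()
counts_fpga1 = toy[toy["FPGA"] == 1].groupby("TDC_CHANNEL").size()
print(counts_fpga0)
print(counts_fpga1)
```

With matplotlib installed, calling `counts_fpga0.plot.bar()` on such a Series would draw the counts per channel as a bar chart.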