In [5]:
import numpy as np
import pandas as pd
from IPython.display import display

# Questions


### 🟡Step 1: Read the data (tar.gz file)
As a first step, we extracted the .dat file from the tar.gz archive using 7-zip.
Then, we read the .dat file into a list of rows (each row a list of string fields) and converted that list into a DataFrame.
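
The manual 7-zip step could also be done from Python with the standard-library `tarfile` module. The sketch below is an alternative, not what was run for this notebook; the helper name `extract_archive` and the archive name `tth_semihad.tar.gz` are assumptions (the latter guessed from the .dat file name).

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, out_dir="."):
    """Extract a .tar.gz archive into out_dir and return its member names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(out_dir)
    return names


# Hypothetical usage; the archive name is assumed, not taken from the notebook:
# extract_archive("tth_semihad.tar.gz")
```

Returning the member names makes it easy to check which .dat file the archive actually contains before reading it.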

#### Method 1 
.strip() --> removes whitespace at both ends of a line

.split() --> separates the values at whitespace (otherwise we'd get a single column)
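
The effect of the two string methods can be checked on a single line. The line below is made up for illustration and only mimics the whitespace-separated layout of the .dat file.

```python
# A made-up line in the same whitespace-separated layout as the .dat file.
line = "  2   2.35134  -2.18449 176.076\n"

stripped = line.strip()    # leading/trailing whitespace (incl. newline) removed
fields = stripped.split()  # split on any run of whitespace

print(fields)  # ['2', '2.35134', '-2.18449', '176.076']
```

Called without arguments, `.split()` already discards leading and trailing whitespace, so `.strip()` is harmless here but not strictly required.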

In [2]:
# Read the .dat file into a list of rows, each a list of string fields
with open("tth_semihad.dat") as f:
    datContent = [line.strip().split() for line in f]

# Convert the list of rows into a DataFrame
data = pd.DataFrame(datContent)
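
Since events have different numbers of constituents, the rows passed to `pd.DataFrame` have different lengths, and pandas pads the shorter ones with missing values. A toy sketch with made-up values (not taken from the file) shows the effect:

```python
import pandas as pd

# Made-up ragged rows: the first event has 2 constituents, the second has 1.
rows = [
    ["2", "0.1", "0.2", "30.0", "0.4", "0.5", "6.0"],
    ["1", "-1.2", "2.0", "45.0"],
]

toy = pd.DataFrame(rows)

# The frame is as wide as the longest row; shorter rows are padded.
print(toy.shape)                 # (2, 7)
print(toy.iloc[1].isna().sum())  # 3 padded cells in the shorter row
```

This is why the real DataFrame is as wide as its longest event and mostly empty in the right-hand columns.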

### 🟡Step 2: Explore the data
**Physics**

"The file was produced from a simulation of pp->tt~H where the top decays hadronically
and the anti-top decays leptonically. I selected events with exactly 1 fat jet with R=1.5."


**Notes**

- The rows represent events. 
- The first column represents the number of constituents. 
- The following columns represent the coordinates of the constituents, η, φ, pT, cycling in that order. (e.g. columns 1, 2, 3 are η, φ, pT for the 1st constituent, columns 4, 5, 6 are η, φ, pT for the 2nd constituent etc.)
- -infinity < η < infinity 
- -π < φ < π
- pT[GeV] > 0
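
The column layout described in the notes can be turned into a small helper that unpacks one event into an array of (η, φ, pT) triples. This is a minimal sketch under that layout; `unpack_event` is a hypothetical name and the example values are made up.

```python
import numpy as np


def unpack_event(row):
    """Return an (n, 3) float array of (eta, phi, pT) for one event row.

    `row` is a plain sequence whose first entry is the constituent count and
    whose remaining entries cycle through eta, phi, pT. Any padding after the
    last constituent is ignored.
    """
    n = int(row[0])
    values = np.asarray(row[1:1 + 3 * n], dtype=float)
    return values.reshape(n, 3)


# Made-up event with two constituents, stored as strings like the raw file,
# followed by one padded (missing) cell.
example = ["2", "2.35", "-2.18", "176.0", "2.46", "-1.50", "47.3", None]
constituents = unpack_event(example)
eta, phi, pt = constituents.T

print(constituents.shape)  # (2, 3)
```

For a row of the DataFrame, something like `unpack_event(data.iloc[0].tolist())` should work, since the helper indexes by position rather than by column label.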



In [4]:
# Display the data
data = data.rename(columns={0: 'Const'})
display(data)

# Print statements
print('There are {} events.'.format(data.shape[0]))
print('The maximum number of constituents in an event is {}.'.format((data.shape[1] - 1) // 3))

# Display data types
print('\nData Types: \n', data.dtypes)

data.describe()

| | Const | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 2.30474 | 0.221042 | 78.9436 | 1.00519 | 0.736657 | 61.9115 | 1.25546 | 0.748395 | 48.9755 | ... | | | | | | | | | | |
| 1 | 2 | 2.35134 | -2.18449 | 176.076 | 2.46233 | -1.50073 | 47.3355 | | | | ... | | | | | | | | | | |
| 2 | 6 | 0.492933 | 0.766876 | 51.5247 | -0.984489 | 2.29985 | 13.7463 | 0.103217 | 1.40088 | 5.31666 | ... | | | | | | | | | | |
| 3 | 10 | -0.624329 | 0.566723 | 130.197 | -0.602316 | 0.573666 | 38.5226 | -0.541426 | 0.449072 | 15.3244 | ... | | | | | | | | | | |
| 4 | 15 | -0.538961 | -0.617644 | 0.819517 | 0.527734 | 1.53319 | 1.94989 | 0.20174 | 0.916744 | 5.63418 | ... | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12172 | 9 | 0.920302 | -1.16412 | 2.7334 | 1.29659 | -0.802425 | 31.0219 | 1.43924 | -0.331847 | 41.0458 | ... | | | | | | | | | | |
| 12173 | 20 | -3.37552 | 0.408371 | 1.10438 | -2.20178 | -0.384944 | 9.30608 | -2.20033 | -0.250145 | 69.2937 | ... | | | | | | | | | | |
| 12174 | 10 | 1.39299 | -0.378084 | 121.604 | 0.58147 | -0.162943 | 6.92172 | 0.480303 | -0.159881 | 15.9922 | ... | | | | | | | | | | |
| 12175 | 9 | 1.72606 | 2.9924 | 2.48751 | 1.11057 | 2.81182 | 1.12 | 1.1923 | 2.66506 | 148.502 | ... | | | | | | | | | | |


There are 12177 events.
The maximum number of constituents in an event is 36.

Data Types: 
 Const    object
1        object
2        object
3        object
4        object
          ...  
104      object
105      object
106      object
107      object
108      object
Length: 109, dtype: object


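| | Const | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 12177 | 12177.0 | 12177.0 | 12177.0 | 12157.0 | 12157.0 | 12157.0 | 12048.0 | 12048.0 | 12048.0 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 |
| unique | 32 | 12129.0 | 12093.0 | 12133.0 | 12108.0 | 12069.0 | 12119.0 | 11996.0 | 11960.0 | 11989.0 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 |
| top | 9 | -1.71042 | 3.01345 | 2.35089 | -1.08374 | 2.63078 | 18.8627 | 1.1263 | -2.33466 | 1.51516 | ... | 1012.02 | -1.6808 | -2.79566 | 45.2066 | -0.516047 | -0.378023 | 109.598 | -0.506679 | -0.357684 | 167.894 |
| freq | 1260 | 2.0 | 2.0 | 2.0 | 3.0 | 2.0 | 2.0 | 2.0 | 3.0 | 2.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |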
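
Every column is reported as `object` because all fields were read as strings, which is why `describe()` returns count/unique/top/freq instead of mean and standard deviation. One possible follow-up, not part of the original notebook, is to convert the columns with `pd.to_numeric`. The sketch below uses a made-up toy frame as a stand-in for `data`.

```python
import pandas as pd

# Made-up stand-in for `data`: string fields, with padding in the short row.
toy = pd.DataFrame(
    [
        ["2", "0.1", "0.2", "30.0", "0.4", "0.5", "6.0"],
        ["1", "-1.2", "2.0", "45.0"],
    ]
).rename(columns={0: "Const"})

# Convert column by column; padded cells become NaN.
numeric = toy.apply(pd.to_numeric)

print(numeric.dtypes)
print(numeric.describe().loc[["count", "mean"]])
```

After the conversion, `describe()` reports numeric summaries, and the `count` row still reflects how many events have a value in each column.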


### 🟡Step 3: Construct Average Jet Image
