# Code to generate tensors for 3DCNN Head-tail classification
We use a small sample of simulated He recoil events for demonstration purposes.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from numba import jit
import torch

### Read sample data

In [2]:
df = pd.read_feather("../data/sample.feather")

### Shift events to origin of chip
Standardize the location of each event by subtracting the per-event minimum from each coordinate. Column, row, and BCID are the binned x, y, and z coordinates, respectively.

In [3]:
df['col_shift'] = df['column'].apply(lambda x: x-x.min())
df['row_shift'] = df['row'].apply(lambda x: x-x.min())
df['BCID_shift'] = df['BCID'].apply(lambda x: x-x.min())
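
As a minimal sketch of what the shift does, the snippet below applies the same operation to a hypothetical two-event frame. The toy values are assumptions, and each cell is assumed to hold a NumPy array of per-hit coordinates, as the `.apply` calls above imply.

```python
import numpy as np
import pandas as pd

# Hypothetical toy events: each cell holds an array of per-hit column indices.
toy = pd.DataFrame({"column": [np.array([12, 14, 13]), np.array([40, 41])]})

# Subtract each event's own minimum so every event starts at index 0.
toy["col_shift"] = toy["column"].apply(lambda x: x - x.min())

print(toy["col_shift"].tolist())
```

After the shift, the smallest coordinate in every event is 0, so events recorded at different places on the chip become directly comparable.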

### Determine truth direction, truth cos(theta), and truth cos(phi)

We train the 3DCNN using truth cos(phi) as our labels. This means our classifier determines whether the track points in the +x or -x "hemisphere."

In [4]:
### truth_vec is the true direction of the recoil track and is determined using 
### simulation such as in my BEAST_TPC_Fast_Digitizer repo

zhat = np.array([0,0,1])
df['truth_costheta'] = df['truth_vec'].apply(lambda x: np.dot(x,zhat)/np.linalg.norm(x)) # (vec . zhat)/|vec| = cos(theta)
df['truth_cosphi'] = df['truth_vec'].apply(lambda x: np.cos(np.arctan2(x[1],x[0]))) # phi = arctan2(y, x) is the azimuthal angle of the vector
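
To make the two angle formulas concrete, here is a small worked example on a hypothetical direction vector. The vector is an assumption, chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

# Hypothetical direction vector (x, y, z); its length is 13.
vec = np.array([-3.0, 4.0, 12.0])
zhat = np.array([0.0, 0.0, 1.0])

# cos(theta) = (vec . zhat) / |vec| = z / |vec| = 12/13
cos_theta = np.dot(vec, zhat) / np.linalg.norm(vec)

# cos(phi) with phi = arctan2(y, x), which equals x / sqrt(x^2 + y^2) = -3/5
cos_phi = np.cos(np.arctan2(vec[1], vec[0]))

print(cos_theta, cos_phi)
```

Here cos(phi) is negative, so this track points into the -x hemisphere.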

### Randomize order of events to be thorough

In [5]:
np.random.seed(1)
df = df.sample(frac=1)
df['original_index'] = df.index
df.index = range(len(df))
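
The shuffle-and-renumber pattern can be sketched on a hypothetical four-row frame. This variant passes `random_state` to `sample` instead of seeding NumPy globally, so it is an illustration of the idea and is not expected to reproduce the exact ordering of the cell above.

```python
import pandas as pd

# Hypothetical toy frame standing in for the event table.
toy = pd.DataFrame({"value": [10, 20, 30, 40]})

# Shuffle all rows reproducibly.
shuffled = toy.sample(frac=1, random_state=1)

# Remember where each row came from, then renumber rows 0..n-1.
shuffled["original_index"] = shuffled.index
shuffled = shuffled.reset_index(drop=True)

print(shuffled)
```

Keeping `original_index` makes it possible to map a shuffled row back to its position in the input file.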

In [6]:
# TOT is a 4-bit code our detectors read out that represents a quantized charge scale.
# Possible TOT values range from 0 to 13. TOT = 0 does not represent zero charge, so we
# add 1 to every TOT value to keep TOT = 0 distinct from voxels that hold no charge.

df['charge_new'] = df['tot'].apply(lambda x: x.astype('uint8'))+1 # add 1 so TOT = 0 is distinct from empty voxels
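
A short sketch of the offset on a hypothetical array of TOT codes (the values are assumptions covering both ends of the stated range):

```python
import numpy as np

# Hypothetical TOT codes for three hits, including the lowest and highest codes.
tot = np.array([0, 5, 13])

# Shift by one so a hit with TOT = 0 is stored as 1 and an empty voxel stays 0.
charge = tot.astype("uint8") + 1

print(charge)
```

With this offset, any nonzero voxel in the grid corresponds to at least one recorded hit.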

## Make labels

We assign labels based on the phi hemisphere of the true primary recoil. Events with cos(phi) < 0 are labeled 1 and events with cos(phi) > 0 are labeled 0.

In [7]:
df['label'] = 1
index = df.query('truth_cosphi > 0').index.to_numpy()
df.loc[index, 'label'] = 0 # .loc assigns in place and avoids pandas' SettingWithCopyWarning from chained indexing
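
The labeling rule can be sketched on hypothetical cos(phi) values using a boolean mask with `.loc`, which assigns in a single indexing step. The toy values are assumptions.

```python
import pandas as pd

# Hypothetical truth cos(phi) values for four events.
toy = pd.DataFrame({"truth_cosphi": [0.8, -0.3, 0.1, -0.9]})

# Start with label 1 everywhere, then set label 0 where the track points toward +x.
toy["label"] = 1
toy.loc[toy["truth_cosphi"] > 0, "label"] = 0

print(toy)
```

Selecting rows and the column in one `.loc` call writes to the frame itself instead of to a temporary copy.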


# Create and save voxelgrids

We store our voxelgrids as sparse tensors to save disk space. Our 3D tensor images are mostly filled with zeros, so storing them as sparse tensors may lead to a nearly 100-fold reduction in average file size, depending on the application at hand.

In [8]:
def voxelize_sparse(df, dim = (22,110,22)):
    """Build one voxelgrid per event and save it, with its label, as a sparse tensor."""
    for i in tqdm(range(0,len(df))):
        voxelgrid = np.zeros(dim) #treat voxel locations as indices
        for x, y, z,tot in zip(df['col_shift'].iloc[i], df['row_shift'].iloc[i], 
                               df['BCID_shift'].iloc[i], df['charge_new'].iloc[i]):
            try:
                voxelgrid[x, y, z] += tot #accumulate the charge of every hit in this voxel
            except IndexError:
                continue #skip hits that fall outside the grid
        voxelgrid = voxelgrid.astype('uint8')
        voxelgrid = torch.tensor(voxelgrid).to_sparse() #need to unsqueeze later
        torch.save((voxelgrid,df['label'].iloc[i]),'../tensors/%s.pt'%(i))
    return df
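
As an aside, the inner per-hit loop can also be written without Python-level iteration. The sketch below is an alternative, not the method used in this notebook: `voxelize_event` is a hypothetical helper, it accumulates in `int64` rather than float, and it assumes the shifted coordinates are non-negative.

```python
import numpy as np

def voxelize_event(cols, rows, bcids, charge, dim=(22, 110, 22)):
    """Hypothetical helper: accumulate hit charges into a dense grid for one event."""
    cols = np.asarray(cols, dtype=np.int64)
    rows = np.asarray(rows, dtype=np.int64)
    bcids = np.asarray(bcids, dtype=np.int64)
    charge = np.asarray(charge, dtype=np.int64)

    # Keep only hits inside the grid (coordinates are assumed to be >= 0).
    inside = (cols < dim[0]) & (rows < dim[1]) & (bcids < dim[2])

    grid = np.zeros(dim, dtype=np.int64)
    # np.add.at sums correctly even when several hits land in the same voxel.
    np.add.at(grid, (cols[inside], rows[inside], bcids[inside]), charge[inside])
    return grid

# Two hits share voxel (0, 0, 0); the last hit lies outside the grid and is dropped.
grid = voxelize_event([0, 0, 1, 30], [0, 0, 2, 0], [0, 0, 3, 0], [2, 3, 4, 5])
print(grid[0, 0, 0], grid[1, 2, 3], grid.sum())
```

`np.add.at` is used instead of `grid[idx] += charge` because plain fancy-index assignment would count only one of the hits that share a voxel.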

In [9]:
df = voxelize_sparse(df)

100%|██████████| 10000/10000 [00:06<00:00, 1618.94it/s]
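
The sparse round trip, and the "unsqueeze later" step mentioned in the code comment, can be sketched in memory as follows. The three charged voxels are hypothetical, and the leading axis added by `unsqueeze(0)` is assumed to be the channel axis the network expects.

```python
import torch

# Hypothetical event: a 22 x 110 x 22 grid with only three charged voxels.
dense = torch.zeros((22, 110, 22), dtype=torch.uint8)
dense[0, 0, 0] = 3
dense[1, 2, 1] = 7
dense[2, 5, 3] = 14

# Sparse (COO) storage keeps only the nonzero voxels and their indices.
sparse = dense.to_sparse()
n_stored = sparse.values().numel()

# Before training, convert back to a dense grid and add a leading axis.
restored = sparse.to_dense()
batch_ready = restored.unsqueeze(0).float()

print(n_stored, dense.numel(), tuple(batch_ready.shape))
```

Only 3 of the 53,240 voxels are stored in this toy case, which illustrates where the savings for mostly empty grids come from.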


### Save track info with new event ordering
We remove the raw hits to save space and analysis time. This is especially important with large samples of TPC data.

In [10]:
#All of the arrays in our dataframe represent individual pixel hits. These pixel hits are now
#contained in the saved tensors, so we don't need them in the dataframe we use for analysis after
#training and evaluating our 3DCNN

df[[col for col in df.columns if df[col].dtype != 'O']].to_feather('../data/sample_noHits.feather')
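
The column filter above keeps every column whose dtype is not `object`. A minimal sketch on a hypothetical frame (the column values are assumptions) shows which columns survive:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two scalar columns and one column of per-hit arrays.
toy = pd.DataFrame({
    "label": [0, 1],
    "truth_cosphi": [0.5, -0.2],
    "tot": [np.array([1, 2]), np.array([3])],
})

# Columns that hold arrays have dtype 'O' (object) and are dropped.
scalar_cols = [col for col in toy.columns if toy[col].dtype != "O"]

print(scalar_cols)
```

The array-valued `tot` column is excluded, leaving only per-event scalar quantities for later analysis.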