Machine Learning Final Project

Cole Kampa

Jacob McLaughlin

Zhiheng Sheng

# Introduction
Bubble chambers used in dark matter searches need to be able to count the number of bubbles produced by an event to achieve necessary levels of background discrimination. For the prototype scintillating bubble chamber (SBC) at Northwestern this counting was done by hand. Our objective is then to train an algorithm to perform this task automatically while maintaining a high degree of accuracy.

The input to the algorithm is the set of images taken when the chamber's data acquisition system triggered and photographed the chamber. An example trigger image is displayed below.
<img src="TriggerImageExample.png" style="width:250px"/>

This image shows two different views of the same jar. The slanted jar on the left is a rotated frontal view, while the vertical jar on the right is an unrotated side view. However, the image contains a lot of irrelevant information about portions of the detector that never change. Therefore, we take the difference between this image and the first frame of the event (taken a fixed amount of time before the trigger), which for the above event yields
<img src="DiffFirstEx.png" style="width:250px"/>
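The differencing step can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function name `difference_image`, the use of an absolute difference, and the synthetic frames are all assumptions made for the example.

```python
import numpy as np

def difference_image(trigger_frame, first_frame):
    """Absolute pixel-wise difference of two equally sized grayscale frames.

    Hypothetical helper: the project may use a signed or clipped difference instead.
    """
    # Cast to a signed type first so that subtracting uint8 images cannot wrap around.
    trigger = np.asarray(trigger_frame, dtype=np.int16)
    first = np.asarray(first_frame, dtype=np.int16)
    return np.abs(trigger - first).astype(np.uint8)

# Synthetic stand-in for real frames: a static background plus one bright "bubble".
background = np.full((6, 6), 100, dtype=np.uint8)
trigger = background.copy()
trigger[2:4, 2:4] = 180
diff = difference_image(trigger, background)
```

In the resulting array the unchanged detector structure is zero and only the region that changed between the two frames remains.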

An image like this is produced for every event, and we perform supervised learning on extracted features and evaluate the performance of various classifiers.

This file highlights only the important sections of code. The full code is available at https://github.com/shengzhiheng/MachineLearningProject

In [5]:
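import numpy as np
# The imports below are inferred from the calls used in BubbleEvent
from PIL import Image                #Image.open
from scipy import ndimage            #ndimage.rotate
from astropy.nddata import Cutout2D  #Cutout2D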

# Key Feature Extraction steps

In [2]:
class BubbleEvent:
    def __init__(self, File):
        #temp pixel arrays and event level meta data
        self.FileName = File
        Bot1PixelArray, Bot2PixelArray = BubbleEvent.GetPixelArray(self.FileName) #gets 2d array of pixel intensities
        self.Date, self.Run, self.EventID = BubbleEvent.GetRunInfo(self.FileName) #parses image name to get event info
        self.BubbleCount = 0
        #actual features to use to classify
        self.UsefulEdgeFeature0, self.UsefulEdgeFeature1, self.UsefulEdgeFeature2 = (GetEdgeFeature(
                                        DownSampleTheArray(2, Bot1PixelArray)) + 
                                        GetEdgeFeature(DownSampleTheArray(2, Bot2PixelArray))) #edge detect. sum
        self.UsefulBlobFeature = np.std(GetBlobs(Bot1PixelArray)) + np.std(GetBlobs(Bot2PixelArray)) #std. dev. of the blob-convolved image, summed over views
        self.CountBlobPeakFeature = GetPeaks(Bot1PixelArray) + GetPeaks(Bot2PixelArray)
    @staticmethod
    def GetPixelArray(FileName):
        im = Image.open(FileName)
        PixelArray = np.asarray(im)
        Cutout = Cutout2D(PixelArray, (530,140), 235) #just cut out the parts of the image with bottles
        Bot1PixelArray = Cutout.data
        PixelArray = ndimage.rotate(PixelArray, -45) #rotate the image before cropping the slanted view
        Cutout2 = Cutout2D(PixelArray, (270,310), 235) #other bottle view
        Bot2PixelArray = Cutout2.data
        return Bot1PixelArray, Bot2PixelArray
    @staticmethod
    def GetRunInfo(File):
        Parts = File.split("/")[-1].split("_") #file should be date_run_event
        Date = int(Parts[0])
        Run = int(Parts[1])
        Event = int("{}{}{}".format(Date, Run, Parts[2])) #ID built from date, run, and event number
        return Date, Run, Event

Important features were extracted from the difference images and stored in BubbleEvent objects. For each bubble event, the diff image file is provided as input and two pixel arrays are extracted.

The first array is a cropped version of the full diff image that includes the vertical view of the bubble chamber jar. For the event shown in the introduction this yields
<img src="View1Crop.png" style="width:200px"/>

The second array is a crop of the slanted view of the jar. For the event shown in the introduction this yields
<img src="View2Crop.png" style="width:200px"/>
These crops were chosen to highlight the area where bubbles may actually form.

These pixel arrays are then convolved with vertical and horizontal edge detection kernels (the Sobel kernels). The number of pixels with significant horizontal edges is stored in UsefulEdgeFeature0, the number with significant vertical edges in UsefulEdgeFeature1, and the number of pixels with both significant horizontal and vertical edges in UsefulEdgeFeature2. The results from each view of the jar are added together.
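A rough sketch of this kind of edge counting is given below. It is not the project's `GetEdgeFeature`: the name `edge_features`, the threshold of 50, and the test images are assumptions chosen for illustration. It relies on `scipy.ndimage.sobel` and returns a NumPy array so that the results from two views can be added element by element.

```python
import numpy as np
from scipy import ndimage

def edge_features(pixels, threshold=50.0):
    """Count pixels with strong horizontal, vertical, and combined Sobel responses.

    Hypothetical stand-in for GetEdgeFeature; the threshold value is an assumption.
    """
    pixels = np.asarray(pixels, dtype=float)
    # axis=0 differentiates down the rows, which responds to horizontal edges.
    horizontal = np.abs(ndimage.sobel(pixels, axis=0)) > threshold
    # axis=1 differentiates across the columns, which responds to vertical edges.
    vertical = np.abs(ndimage.sobel(pixels, axis=1)) > threshold
    both = horizontal & vertical
    return np.array([horizontal.sum(), vertical.sum(), both.sum()])

# Synthetic views: a featureless image and one containing a bright square.
flat = np.zeros((10, 10))
square = np.zeros((10, 10))
square[3:7, 3:7] = 100.0

flat_features = edge_features(flat)
square_features = edge_features(square)
summed = edge_features(flat) + edge_features(square)  # adding two views together
```

A featureless view contributes nothing to any of the three counts, while the square contributes edge pixels along its sides and "both" pixels near its corners.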

Next, each view is fed to a blob detection kernel, which combines a Laplacian kernel with a Gaussian blur kernel (a Laplacian of Gaussian, or LoG, kernel). The output of this kernel emphasizes bubbles in the views, as below
<img src="Bubble.png" style="width:200px"/>
Events with many bubbles have more high-intensity regions, so the standard deviation of the pixel values in the convolved image indicates how many bubbles are present. This value is stored in UsefulBlobFeature, again adding the contributions of each jar view.
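The idea behind this feature can be illustrated with `scipy.ndimage.gaussian_laplace`. The sketch below is not the project's `GetBlobs`: the name `blob_response`, the value `sigma=2.0`, the sign flip, and the synthetic images are assumptions.

```python
import numpy as np
from scipy import ndimage

def blob_response(pixels, sigma=2.0):
    """Laplacian-of-Gaussian response of an image.

    Hypothetical stand-in for GetBlobs. The sign is flipped so that bright blobs
    give positive peaks; sigma=2.0 is an assumed kernel width.
    """
    return -ndimage.gaussian_laplace(np.asarray(pixels, dtype=float), sigma=sigma)

def make_image(centers, size=60):
    """Synthetic difference image with a 3x3 bright patch at each center."""
    image = np.zeros((size, size))
    for row, col in centers:
        image[row - 1:row + 2, col - 1:col + 2] = 100.0
    return image

empty = make_image([])
one_bubble = make_image([(15, 15)])
three_bubbles = make_image([(15, 15), (15, 45), (45, 30)])

std_empty = np.std(blob_response(empty))
std_one = np.std(blob_response(one_bubble))
std_three = np.std(blob_response(three_bubbles))
```

For these synthetic images the standard deviation grows with the number of bubbles, which is the behavior the feature relies on.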


The final step on each image is to perform peak detection on the output of the LoG kernel. The resulting image is scanned with a 15x15 box centered on each pixel of interest. Each pixel that is both the maximum of the box centered on itself and sufficiently higher in intensity than the average pixel in that image increases the peak count by 1. The peaks found (and their locations) for an example event are displayed below.
<img src="Bubble.png" style="width:200px"/>
The result of peak detection is then stored in CountBlobPeakFeature, again adding the contributions of each jar view together.
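One way to sketch this peak search is with `scipy.ndimage.maximum_filter`. This is not the project's `GetPeaks`: the name `count_peaks` and the threshold rule (mean plus `n_sigma` standard deviations) are assumptions, since the text only says a peak must be sufficiently higher than the average pixel.

```python
import numpy as np
from scipy import ndimage

def count_peaks(response, box=15, n_sigma=5.0):
    """Count pixels that are the maximum of their box and well above the image mean.

    Hypothetical stand-in for GetPeaks; the mean + n_sigma * std threshold is assumed.
    """
    response = np.asarray(response, dtype=float)
    # Each output pixel holds the maximum of the box x box window centered on it.
    local_max = ndimage.maximum_filter(response, size=box, mode="nearest")
    threshold = response.mean() + n_sigma * response.std()
    is_peak = (response == local_max) & (response > threshold)
    return int(is_peak.sum())

# Synthetic response map: two separated peaks plus a weaker bump next to the first.
response = np.zeros((50, 50))
response[10, 10] = 10.0
response[30, 35] = 8.0
response[11, 12] = 5.0  # inside the 15x15 box of the first peak, so not counted

n_peaks = count_peaks(response)
n_peaks_empty = count_peaks(np.zeros((50, 50)))
```

Note that a flat plateau of equal maximal values would be counted once per pixel in this sketch, so a real implementation may need to merge neighbouring detections.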


These features were collected for every event in our dataset. This process takes ~0.8 seconds per image, so for all ~25000 images it takes ~20000 seconds, or about 5.5 hours, on a laptop. To avoid repeating this computation, the results were compiled into an array which was then pickled (Events.p in the GitHub repository). The classifiers then load this pickle to quickly access all the important features.
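The caching pattern described here might look like the sketch below. The helper `load_or_build`, the `build_features` placeholder, and the temporary directory are hypothetical; only the file name Events.p comes from the project.

```python
import os
import pickle
import tempfile

def load_or_build(cache_path, build_fn):
    """Load a pickled result if it exists; otherwise build it and pickle it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    result = build_fn()
    with open(cache_path, "wb") as handle:
        pickle.dump(result, handle)
    return result

calls = []

def build_features():
    """Placeholder for the slow feature extraction over all images."""
    calls.append(1)
    return [[12, 9, 3, 4.2, 1], [40, 37, 11, 9.8, 3]]

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "Events.p")
    first = load_or_build(path, build_features)   # runs the extraction and writes the pickle
    second = load_or_build(path, build_features)  # reads the pickle instead
```

Only the first call pays the extraction cost; later calls read the stored array directly. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.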


A full overview of the functions not highlighted here is available in the GitHub file BubbleClassification.ipynb.

# Perceptron Stuff

# Neural Net stuff