# Introduction

<span style="font-size: 1.1em;">
This notebook is a guided walkthrough of how to resample your "testing" (input) data to determine how likely it is that a source could be misclassified. Most of the steps are very similar to those in the previous notebooks; the only substantial difference is the resampling of the data.

### Setting Up and Running the Classifier

These first few steps are taken directly from the fire_svm notebook. Refer to that notebook if you want to know more about each step.

In [None]:
#Go to the right directory
cd /Users/RichardP/research/icyfire/py

#Import the packages
import numpy as np

import fire_data as dat
import fire_svm as clf
import fire_model as model
import fire_cv as cv

#Read out the data
file_read = dat.file_read('Insert path to a sage file here')  
data = dat.data_to_pytorch(file_read.data)
# Map each unique source name to an integer label, starting at 1
name_labels = {name: idx for idx, name in enumerate(data.name_unique, start=1)}
data.relabelling(name_labels)

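def multistep(actual):
    # Merge label 3 into label 2; the input array is left unchanged
    mask = np.where(actual == 3)
    acc = np.copy(actual)
    acc[mask] = 2
    return acc

def deletion(data, actual):
    # Drop every row of `data` (and its matching label) whose label is 1
    mask = np.where(actual == 1)
    data = np.delete(data, mask, axis=0)
    acc = np.delete(actual, mask)  # np.delete already returns a new array
    return data, acc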

#Data separation and randomization
label_carb = multistep(data.label)
spectra_oxy, label_oxy = deletion(data.spectra, data.label)
training_carb, testing_carb, train_carb, test_carb = data.randomization(label_carb, data.spectra, 90)
training_oxy, testing_oxy, train_oxy, test_oxy = data.randomization(label_oxy, spectra_oxy, 90)

#Generating our classifiers
fire_carb = clf.svm_network(
    training_carb['x'], training_carb['y'], 
    testing_carb['x'], testing_carb['y'], 
    c= 1, gamma = 0.01, kernel = 'rbf')
fire_oxy = clf.svm_network(
    training_oxy['x'], training_oxy['y'], 
    testing_oxy['x'], testing_oxy['y'], 
    c=2600, gamma = 0.0001, kernel = 'rbf')

### Resampling

<span style="font-size: 1.1em;">
With our classifiers set up, we can start to resample the data. At each wavelength, numpy.random.normal takes the actual flux value and the dflux value and draws a resampled flux value from a normal distribution centered on the flux, with dflux as its standard deviation.
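
The resampling idea can be sketched in isolation. This is a minimal illustration, not the `gen_spec` implementation: the function name `resample_flux`, its parameters, and the toy arrays are hypothetical, and it assumes dflux is the 1-sigma uncertainty on each flux value. It uses NumPy's `default_rng` generator (rather than calling `numpy.random.normal` directly) only so the draw can be seeded.

```python
import numpy as np

def resample_flux(flux, dflux, n_resamples, seed=None):
    """Draw noisy copies of one spectrum (hypothetical helper).

    Each flux value is used as the mean, and its dflux as the standard
    deviation, of a normal distribution at that wavelength.
    """
    rng = np.random.default_rng(seed)
    flux = np.asarray(flux, dtype=float)
    dflux = np.asarray(dflux, dtype=float)
    # Result has shape (n_resamples, n_wavelengths)
    return rng.normal(loc=flux, scale=dflux, size=(n_resamples, flux.size))

# Toy spectrum with three wavelengths; the last point has zero uncertainty
flux = np.array([1.0, 2.0, 3.0])
dflux = np.array([0.1, 0.2, 0.0])
resampled = resample_flux(flux, dflux, n_resamples=1000, seed=0)
```

Each row of `resampled` is one plausible realization of the spectrum, which can then be passed to the classifier like any other test spectrum.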

In [None]:
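# NOTE: `unc` is not imported in the setup cell above; import the project's
# uncertainty module under this alias before running this cell
uncertainties = unc.data_uncertainty(file_read.data, test_carb)
# Replace the placeholder string with the number of resamples you want
objects, fluxxing = uncertainties.gen_spec("Insert # of resamples for specific point")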

<span style="font-size: 1.1em;">
Next, we feed our resampled data into the classifier, iterating over every source that is being tested.

For each source index, we plot two graphs. The first is a bar chart showing the number of counts for each type of prediction. The second is a histogram of the distribution of residuals (resampled spectrum - actual spectrum) for each prediction.
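
The quantities behind these two graphs can be sketched with NumPy alone. This is an illustrative sketch, not what `plot_hist` does internally: the helper names `count_predictions` and `residuals_by_prediction` and the toy arrays are hypothetical, and it assumes one predicted label per resampled spectrum.

```python
import numpy as np

def count_predictions(predictions):
    """Number of resampled spectra assigned to each predicted label."""
    labels, counts = np.unique(predictions, return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))

def residuals_by_prediction(resampled, actual, predictions):
    """Residuals (resampled - actual), grouped by predicted label."""
    resid = np.asarray(resampled, dtype=float) - np.asarray(actual, dtype=float)
    predictions = np.asarray(predictions)
    return {label: resid[predictions == label]
            for label in np.unique(predictions).tolist()}

# Toy example: three resamples of a three-wavelength spectrum
actual = np.array([1.0, 2.0, 3.0])
resampled = np.array([[1.1, 2.0, 2.9],
                      [0.9, 2.2, 3.0],
                      [1.0, 1.8, 3.3]])
predictions = np.array([1, 2, 1])

counts = count_predictions(predictions)
grouped = residuals_by_prediction(resampled, actual, predictions)
```

`counts` holds the bar heights for the first graph, and each array in `grouped` can be flattened and histogrammed for the second.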

We will now look at the differences in spectra for each prediction. We take the mean value at each wavelength and plot it with an error equal to the standard deviation of the residuals computed earlier.
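
As a rough sketch of that calculation, assuming the mean is taken over the resampled spectra that share a prediction and the error is the per-wavelength standard deviation of their residuals (the helper name `residual_profile` and the toy values are hypothetical, and this is not the body of `plot_resid`):

```python
import numpy as np

def residual_profile(resampled, actual):
    """Per-wavelength mean spectrum and residual scatter (hypothetical helper)."""
    resampled = np.asarray(resampled, dtype=float)
    resid = resampled - np.asarray(actual, dtype=float)
    mean_spectrum = resampled.mean(axis=0)  # mean over resamples, per wavelength
    error = resid.std(axis=0)               # scatter of the residuals, per wavelength
    return mean_spectrum, error

# Two resamples of a two-wavelength spectrum
actual = np.array([2.0, 2.0])
resampled = np.array([[1.0, 2.0],
                      [3.0, 2.0]])
mean_spectrum, error = residual_profile(resampled, actual)
```

The two returned arrays are what an error-bar plot against wavelength would need.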

Lastly, we want to look at how the error is distributed in comparison to other spectra. The top graph is a histogram of the median residual values for each source, for each type of prediction. The bottom graph is a similar distribution, but it contains the counts for each individual spectrum rather than for the whole source.
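
One way to compute such medians is sketched below. It assumes the per-spectrum value is the median residual across wavelengths of a single resample, and the per-source value is the median over all resamples and wavelengths of that source; both readings, the helper name `median_residuals`, and the toy values are assumptions rather than the logic of `plot_meds`.

```python
import numpy as np

def median_residuals(resampled, actual):
    """Median residual per resampled spectrum and for the source as a whole."""
    resid = np.asarray(resampled, dtype=float) - np.asarray(actual, dtype=float)
    per_spectrum = np.median(resid, axis=1)  # one value per resample
    per_source = float(np.median(resid))     # one value for the whole source
    return per_spectrum, per_source

# Three resamples of a three-wavelength spectrum
actual = np.array([1.0, 2.0, 3.0])
resampled = np.array([[1.5, 2.0, 3.5],
                      [0.0, 2.0, 3.0],
                      [1.0, 3.0, 5.0]])
per_spectrum, per_source = median_residuals(resampled, actual)
```

Collecting `per_source` across sources gives the values for a source-level histogram, while concatenating `per_spectrum` gives the spectrum-level one.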

In [None]:
predict = {}
mask = {}
for i in sorted(objects):
    print(i)  # show which source is currently being processed
    predict[i], mask[i] = uncertainties.predicting(fire_carb.clf, objects, i)
    uncertainties.plot_hist(predict, mask, fluxxing, i)
    uncertainties.plot_resid(objects, fluxxing, mask, i)
    uncertainties.plot_meds(fluxxing, mask, i)

We now have all of the graphs for each source. Most sources will be fairly "boring" to look at, but sources whose predictions vary widely across resamples are of more importance. Sources with fairly noisy spectra are also worth taking note of.