# Introduction

<span style="font-size: 1.1em;">
This is a guided walkthrough of how to use the support vector machine (SVM) classifier. Our classifier is trained to classify stars as OAGB, CAGB, or RSG. We trained it on preexisting data that had already been classified. This walkthrough only covers how to run the classifier; it does not cover optimizing its parameters or measuring its accuracy. See fire_crossvalidation for parameter optimization and accuracy, and fire_uncertainty for how to resample the spectra.

### Setting Up and Running the Classifier

First, cd to the directory containing the code and import these packages:

In [None]:
cd /Users/RichardP/research/icyfire/py

In [None]:
import fire_data as dat
import fire_svm as clf
import fire_model as model
import fire_org as org

Next, we read the SAGE data into our variables; this is the data the classifier will be trained on. Our data set has 242 points, and we want to use all of them (100% of the data).

In [None]:
file_read = dat.file_read('Insert path to one of the sage files')  
data = dat.data_to_pytorch(file_read.data)

Most data sets will have more than 3 labels, so you'll want to relabel each object type with a number. The cell below builds a dict that assigns a number to each type of object. For example, in our case it was CAGB = 1, OAGB = 2, etc. The dict also serves as a key, so you can keep track of which object type corresponds to which number.

In [None]:
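# Assign each unique object type an integer label, starting at 1
name_labels = {}
for counter, name in enumerate(data.name_unique, start=1):
    name_labels[name] = counter
data.relabelling(name_labels)
print(sorted(data.__dict__))  # attributes now available on data
print(name_labels)            # key: object type -> number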
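
Since the classifier works with numbers, it can help to invert the dict so that numeric predictions can be translated back into object types. The snippet below is a minimal sketch of that idea; the contents of `name_labels` (in particular RSG = 3) and the `predicted` list are made-up values for illustration, not output from the classifier.

```python
# Hypothetical label dict with the same shape as name_labels above.
# CAGB = 1 and OAGB = 2 follow the text; RSG = 3 is an assumption.
name_labels = {"CAGB": 1, "OAGB": 2, "RSG": 3}

# Invert the mapping: number -> object type.
number_to_name = {number: name for name, number in name_labels.items()}

# Made-up numeric predictions, standing in for classifier output.
predicted = [2, 1, 3, 2]

# Translate each number back into its object type.
predicted_names = [number_to_name[p] for p in predicted]
print(predicted_names)
```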

### Normal Classification

<span style="font-size: 1.1em;">
Now that we have all our data, we will feed it into our classifier and train it. There are three parameters you can change: C, gamma, and kernel. For more information about them, see http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html. We randomize the training and test data sets here, selecting 100% of the data. We also classify all the objects at the same time; multi-step classification, described further down, is another option.

In [None]:
training, testing, train, test = data.randomization(data.label, data.spectra, 100)
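
How `data.randomization` treats its percentage argument internally is not shown in this walkthrough. As a rough illustration of the general idea only, the sketch below shuffles the objects and places a given percentage of them in the training set. The function name `random_split`, the dict layout with `'x'` and `'y'` keys, the returned index arrays, and the data are all assumptions made for this example.

```python
import numpy as np

def random_split(labels, spectra, train_percent, seed=0):
    """Shuffle the objects and put train_percent of them in the training set.

    Hypothetical helper; not the actual data.randomization implementation.
    """
    labels = np.asarray(labels)
    spectra = np.asarray(spectra)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))  # random ordering of object indices
    n_train = int(round(len(labels) * train_percent / 100))
    train_idx, test_idx = order[:n_train], order[n_train:]
    training = {"x": spectra[train_idx], "y": labels[train_idx]}
    testing = {"x": spectra[test_idx], "y": labels[test_idx]}
    return training, testing, train_idx, test_idx

# Made-up data: 10 objects with 4 flux features each.
spectra = np.arange(40, dtype=float).reshape(10, 4)
labels = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1])

training, testing, train_idx, test_idx = random_split(labels, spectra, 80)
print(training["x"].shape, testing["x"].shape)
```

Under this assumption, passing 100 would leave the test set empty, so every object is used for training.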

Next we want to classify some input data. If you happen to have the actual classifications of the objects, you can pass them in as an additional input.  <br /> Note: The data must be input as NumPy arrays in which the rows are different objects and the columns are the flux features of each object.

In [None]:
fire = clf.svm_network(
    data.training_x, data.training_y,
    "Insert your data here", testing_y=None,
    c=868, gamma=0.0001, kernel='rbf')
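
To make the roles of C, gamma, and kernel more concrete, here is a minimal sketch that uses scikit-learn's `SVC` directly on made-up data. It assumes, without confirmation from this walkthrough, that `clf.svm_network` does something broadly similar under the hood. The data, the parameter values, and all variable names are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Made-up training set: two well-separated classes, 20 objects each,
# with 5 flux features per object (rows = objects, columns = features).
class_1 = rng.normal(loc=0.0, scale=0.5, size=(20, 5))
class_2 = rng.normal(loc=3.0, scale=0.5, size=(20, 5))
training_x = np.vstack([class_1, class_2])
training_y = np.array([1] * 20 + [2] * 20)

# C, gamma, and kernel are the three tunable parameters named above.
# These values are arbitrary, not the tuned values from the walkthrough.
model = SVC(C=1.0, gamma=0.1, kernel="rbf")
model.fit(training_x, training_y)

# New objects to classify, in the same rows-by-features layout.
new_x = np.array([[0.0] * 5, [3.0] * 5])
predicted = model.predict(new_x)
print(predicted)
```

Larger values of C penalize misclassified training points more heavily, and larger values of gamma make each training point's influence more local, so both usually need tuning for a given data set.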

All the outputs are stored in the variables listed below:

In [None]:
print(sorted(fire.__dict__))

### Multi-Step Classification

<span style="font-size: 1.1em;">
If you have many classes, the classifier is sometimes better off predicting only a couple of them at a time. With many classes there is too much information and too much room for error because of possible overlap. One way to avoid this is multi-step classification.

<span style="font-size: 1.1em;">
Multi-step classification means you separate the data into subsets. For example, in our SAGE-Spec data we first separated the carbon stars from the non-carbon stars and classified those two groups, and then took the non-carbon stars (OAGB and RSG) and classified them afterwards. This reduces each step to distinguishing only 2 different things.

#### Isolation

In this step you recategorize your whole data set to isolate different "groupings". For example, in our case we categorized OAGB and RSG together so that the classifier would predict either carbon stars or non-carbon stars. After you've recategorized the objects, the rest is the same as above.

In [None]:
label_carb = org.multistep(data.label)
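
The internals of `org.multistep` are not shown here. As a rough illustration of what isolating a grouping can look like, the sketch below merges two labels into one with NumPy. The helper `merge_labels`, the label numbering (CAGB = 1, OAGB = 2, RSG = 3), and the data are assumptions made for this example.

```python
import numpy as np

def merge_labels(labels, members, merged_label):
    """Return a copy of labels in which every label in members becomes merged_label.

    Hypothetical helper; not the actual org.multistep implementation.
    """
    labels = np.asarray(labels)
    return np.where(np.isin(labels, members), merged_label, labels)

# Made-up integer labels, assuming CAGB = 1, OAGB = 2, RSG = 3.
labels = np.array([1, 2, 3, 2, 1, 3])

# Group OAGB (2) and RSG (3) into a single "non-carbon" class labelled 2,
# leaving the carbon stars (1) untouched.
label_carb = merge_labels(labels, members=[2, 3], merged_label=2)
print(label_carb)
```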

In [None]:
training_carb, testing_carb, train_carb, test_carb = data.randomization(label_carb, data.spectra, 100)

In [None]:
fire_carb = clf.svm_network(
    training_carb['x'], training_carb['y'],
    testing_carb['x'], testing_carb['y'],
    c=1, gamma=0.01, kernel='rbf')

#### Deletion

Now that you've isolated your data, you want to classify your subset (or maybe make another subset). To do this, you only want to use the data you're concerned with, so here we will "delete" the other data for now.

In [None]:
spectra_oxy, label_oxy = org.deletion(data.spectra, data.label)
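
The internals of `org.deletion` are not shown here either. One common way to drop a class from both the spectra and the labels at once is a boolean mask, sketched below. The helper `drop_label`, the label numbering (CAGB = 1, OAGB = 2, RSG = 3), and the data are assumptions made for this example.

```python
import numpy as np

def drop_label(spectra, labels, label_to_drop):
    """Remove every object whose label equals label_to_drop.

    Hypothetical helper; not the actual org.deletion implementation.
    """
    spectra = np.asarray(spectra)
    labels = np.asarray(labels)
    keep = labels != label_to_drop  # True for the objects we keep
    return spectra[keep], labels[keep]

# Made-up data: 6 objects with 4 flux features each,
# assuming CAGB = 1, OAGB = 2, RSG = 3.
spectra = np.arange(24, dtype=float).reshape(6, 4)
labels = np.array([1, 2, 3, 2, 1, 3])

# Drop the carbon stars (1); OAGB and RSG keep their original labels.
spectra_oxy, label_oxy = drop_label(spectra, labels, label_to_drop=1)
print(spectra_oxy.shape, label_oxy)
```

Applying the same mask to both arrays keeps each remaining spectrum lined up with its label.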

The deletion retains all the original labels, and from here you just repeat the same steps as above.

In [None]:
training_oxy, testing_oxy, train_oxy, test_oxy = data.randomization(label_oxy, spectra_oxy, 90)
fire_oxy = clf.svm_network(
    training_oxy['x'], training_oxy['y'],
    testing_oxy['x'], testing_oxy['y'],
    c=2600, gamma=0.0001, kernel='rbf')