# Classification Under y-Axis Projection
## Environment
We set up our environment first.

In [1]:
import pandas as pd
pd.options.display.max_rows = 10
pd.options.display.max_columns = 29

import numpy as np

from sklearn import svm, metrics

## Initial Handling of Data
### Brief Overview of Method
A digit's y-axis projection (the sum of pixel intensities in each of its 28 rows) may be easier to classify than the raw 784-pixel vector. To test this, we'll transform the data before running a classifier.
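
To make the transform concrete, here is a minimal sketch on a made-up 3x3 "image". It assumes images are stored flattened row by row, as the 28x28 digits appear to be. Summing each image row collapses the picture onto the y-axis and leaves one number per row. The names `y_projection` and `flat` are illustrative and not part of the notebook.

```python
import numpy as np

def y_projection(flat, side):
    """Sum each image row of a square image stored in row-major order."""
    image = np.asarray(flat).reshape(side, side)
    return image.sum(axis=1)

# A toy 3x3 image flattened row by row (hypothetical data).
flat = [0, 1, 0,
        2, 2, 2,
        0, 3, 0]
print(y_projection(flat, 3))  # prints [1 6 3]
```

Each output value is the total intensity of one image row, so a 28x28 digit shrinks from 784 features to 28.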

In [2]:
df = pd.read_csv('train.csv')

In [None]:
# Start from the label column; copy it so df itself is left untouched.
projy = df[['label']].copy()

# Column 0 is the label; columns 1..784 hold the 28x28 pixels, row by row.
pixel_columns = df.columns[1:].tolist()
for i in range(28):
    row_columns = pixel_columns[28*i:28*(i+1)]
    # Projection onto the y-axis: sum the 28 pixels of image row i.
    projy['proj' + str(i)] = df[row_columns].sum(axis=1)

projy

index,label,proj0,proj1,proj2,proj3,proj4,proj5,proj6,proj7,proj8,proj9,proj10,proj11,proj12,proj13,proj14,proj15,proj16,proj17,proj18,proj19,proj20,proj21,proj22,proj23,proj24,proj25,proj26,proj27
0,1,0,0,0,0,537,787,801,801,801,857,1025,899,1053,800,939,877,800,801,800,802,821,947,974,527,0,0,0,0
1,0,0,0,0,0,673,2011,2781,3224,3257,2648,2015,1836,1778,1675,1891,1890,1783,1952,2217,2589,3431,2990,2540,1428,0,0,0,0
2,1,0,0,0,0,286,525,525,623,701,701,701,629,527,588,701,701,701,701,701,848,879,877,877,633,0,0,0,0
3,4,0,0,0,0,0,491,521,741,719,726,612,682,750,753,776,751,797,1639,2022,1377,345,345,317,345,316,0,0,0
4,0,0,0,0,0,1262,2251,2856,3133,3395,2807,2757,2655,2451,2163,2223,2127,2100,2202,2315,2676,3436,3602,2932,1750,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41995,0,0,0,0,0,0,793,2052,2219,2356,1814,1415,1262,1288,1246,1164,1246,1110,977,1118,1147,1446,1852,2791,1863,151,0,0,0
41996,1,0,0,0,0,0,383,725,725,725,725,760,846,846,903,904,850,778,725,725,696,570,483,379,362,305,0,0,0
41997,7,0,0,0,0,0,0,0,1594,2936,4144,3953,2999,1467,1212,957,892,1084,956,956,1084,1084,1020,1148,1084,1148,1021,765,0
41998,6,0,0,503,824,886,692,737,758,763,824,763,753,802,915,1680,2300,2121,1948,2283,2790,2331,1708,0,0,0,0,0,0
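
The loop above can also be written as a single reshape and sum. The sketch below assumes the layout seen in this notebook: a `label` column first, then the pixel columns in row-major order. The helper `project_rows` and the tiny `demo` frame are hypothetical and exist only to show the idea.

```python
import numpy as np
import pandas as pd

def project_rows(df, side=28):
    """Return the label plus one row-sum column per image row.

    Assumes `df` has a 'label' column first, followed by side*side
    pixel columns in row-major order.
    """
    pixels = df.iloc[:, 1:].to_numpy().reshape(len(df), side, side)
    sums = pixels.sum(axis=2)  # sum across the pixels of each image row
    names = ["proj" + str(i) for i in range(side)]
    out = pd.DataFrame(sums, columns=names, index=df.index)
    out.insert(0, "label", df["label"].to_numpy())
    return out

# Tiny hypothetical frame holding two 2x2 "images".
demo = pd.DataFrame(
    [[7, 1, 2, 3, 4],
     [3, 0, 5, 0, 5]],
    columns=["label", "pixel0", "pixel1", "pixel2", "pixel3"],
)
result = project_rows(demo, side=2)
print(result)
```

On the full training set this would be called as `project_rows(df)`. It should give the same columns as the loop, provided the pixel columns really are in row-major order.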


## Initial Training
We'll train on the first 90% of the data (rows 0 through 37,799) and hold out the rest for testing, to see what kind of accuracy we can expect.

In [None]:
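# get_values() was removed in pandas 1.0; to_numpy() is its replacement.
projy_data = projy.to_numpy()
# Column 0 holds the label; the remaining 28 columns are the projections.
training_labels = projy_data[:37800,0]
training_data = projy_data[:37800,1:]
classifier = svm.LinearSVC()
classifier.fit(training_data, training_labels)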

In [None]:
# Evaluate on the held-out rows (37,800 onward), which the classifier never saw.
test_labels = projy_data[37800:,0]
test_data = projy_data[37800:,1:]
predicted = classifier.predict(test_data)
print(metrics.classification_report(test_labels, predicted))
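
Alongside the per-class report, a single accuracy figure and a confusion matrix can make it easier to see which digits get mixed up. The sketch below uses made-up labels rather than the notebook's predictions. The helper `holdout_split` is a hypothetical convenience that derives the cut point from a fraction instead of hard-coding 37800.

```python
import numpy as np
from sklearn import metrics

def holdout_split(data, train_fraction=0.9):
    """Split rows in order: the first part for training, the rest for testing."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

# Hypothetical labels and predictions, for illustration only.
true_labels = np.array([0, 1, 1, 2, 2, 2])
predicted_labels = np.array([0, 1, 2, 2, 2, 1])

accuracy = metrics.accuracy_score(true_labels, predicted_labels)
# Rows are true classes, columns are predicted classes.
confusion = metrics.confusion_matrix(true_labels, predicted_labels)
print(accuracy)
print(confusion)
```

In the notebook itself, the same two calls would take `test_labels` and `predicted` as arguments.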

## Conclusion

The first attempt didn't yield great results. That's okay. I still think that y-projection is promising, but we'll need to preprocess the data further.