# Objective 7. Classify the point cloud using the parametrized points.
Now that each point is described by these extra parameters, we can use
them as features for classification. Classification can be done with a
range of machine learning algorithms. A good Python library for this is
scikit-learn (already installed with Anaconda). It provides Support
Vector Machines (SVM), Random Forests (RF) and many other algorithms.
To use a supervised machine learning algorithm you need labelled training
and testing data. Obtaining these labels can be labour intensive.
CloudCompare is a good tool for labelling points manually.

## Assignment 21. Use CloudCompare to manually segment a part of your point cloud into different classes (for example buildings, vegetation, ground, water, etc.).

Tip: Use the segment tool to segment points belonging to a
class from the main point cloud. Use the merge tool to merge all segments of
a class together. Export the resulting point clouds separately per class.

## Assignment 22. Import the class point clouds into Python and merge them into one pandas DataFrame with a 'class' column containing the designated classes.

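
One way to build such a DataFrame is with `pd.concat`. The sketch below is only an illustration under stated assumptions: it is not the `merge_dataframes` helper imported in the notebook (its source is not shown here), and the function name `merge_class_dataframes` and the toy coordinates are made up for the example.

```python
import pandas as pd


def merge_class_dataframes(class_frames, column='class'):
    """Stack per-class DataFrames and label each row with its class name.

    Hypothetical helper, for illustration only.
    """
    labelled = [df.assign(**{column: name})
                for name, df in class_frames.items()]
    return pd.concat(labelled, ignore_index=True)


# Toy stand-ins for the per-class point clouds exported from CloudCompare.
veg = pd.DataFrame({'X': [0.0, 1.0], 'Y': [0.0, 1.0], 'Z': [5.0, 6.0]})
ground = pd.DataFrame({'X': [2.0], 'Y': [2.0], 'Z': [0.1]})

merged = merge_class_dataframes({'veg': veg, 'ground': ground})
print(merged)
```

With real exports, each frame would come from `pd.read_csv` on the file saved per class.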
We are going to use a random forest for classification. A random forest is
an algorithm that builds many decision trees, each on a random subset
of the data. Every tree classifies a point, and the class that receives the
most votes from the trees is used as the final classification.
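
The voting idea described above can be sketched by hand with scikit-learn decision trees. This is a simplified illustration on synthetic data, not the classifier used later in this notebook; the number of trees (25) and the data set are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for the point features.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

rng = np.random.RandomState(0)
trees = []
for _ in range(25):
    # Each tree sees a bootstrap sample: rows drawn at random with replacement.
    rows = rng.randint(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(max_features='sqrt', random_state=rng)
    trees.append(tree.fit(X[rows], y[rows]))

# Every tree votes; the most common label per point wins.
votes = np.array([tree.predict(X) for tree in trees])  # one row per tree
majority = np.array([np.bincount(col).argmax() for col in votes.T])

print("agreement with labels:", (majority == y).mean())
```

Note that scikit-learn's own `RandomForestClassifier` combines the trees by averaging their predicted class probabilities rather than counting hard votes, so its output can differ slightly from this sketch.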

In [1]:
# -*- coding: utf-8 -*-
"""

@author: Chris Lucas
"""

import pandas as pd
from clf_preprocessing import merge_dataframes, correlated_features
from clf_classifiers import BalancedRandomForest, classify_vegetation
from clf_assessment import (grid_search, cross_validation,
                            mean_decrease_impurity)

In [None]:
# %% Load ground truth data
print("loading data..")
classes = ['veg', 'non_veg']
veg_pc = pd.read_csv('../Data/C_39CN1_veg.csv', delimiter=';', header=0)
non_veg_pc = pd.read_csv('../Data/C_39CN1_nonveg.csv', delimiter=';', header=0)
data = merge_dataframes({'veg': veg_pc, 'non_veg': non_veg_pc}, 'class')
data.rename(columns={'//X': 'X'}, inplace=True)
data.rename(columns=lambda x: x.replace(',', '_'), inplace=True)
del veg_pc, non_veg_pc
class_cat, class_indexer = pd.factorize(data['class'])
data['class_cat'] = class_cat

# %% Define the feature space
# Drop labels, coordinates and unused attributes; skip names that are absent.
features = data.columns.drop(['class', 'class_cat', 'X', 'Y', 'Z',
                              'norm_x_50', 'norm_y_50', 'return_number'],
                             errors='ignore')
features = features.drop(correlated_features(data, features, corr_th=0.98))

# %% GridSearch (Cross Validated)
param_dict = {'min_samples_leaf': [5, 10],
              'min_samples_split': [5, 10],
              'ratio': [0.15, 0.1, 0.05]}

gs_scores, param_grid = grid_search(data, features, 'class_cat', param_dict)

# %% Cross Validation
cv_scores, conf_matrices = cross_validation(data, features, 'class_cat')

# %% Load all data
point_cloud = pd.read_csv("../Data/C_39CN1_ResearchArea_params.csv",
                          delimiter=',', header=0)

loading data..
Done 1 of 36..

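The cell above removes highly correlated features with `correlated_features`, a course helper whose source is not shown here. As a rough idea of what such a filter could look like, the sketch below flags every column whose absolute Pearson correlation with an earlier column exceeds a threshold. The function name `find_correlated_features` and the demo data are assumptions, and the real helper may work differently.

```python
import numpy as np
import pandas as pd


def find_correlated_features(df, columns, corr_th=0.98):
    """Return columns whose absolute Pearson correlation with an earlier
    column exceeds corr_th (hypothetical stand-in, for illustration)."""
    corr = df[list(columns)].corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    return [col for col in upper.columns if (upper[col] > corr_th).any()]


rng = np.random.RandomState(0)
demo = pd.DataFrame({'a': rng.rand(100)})
demo['b'] = demo['a'] * 2.0 + 0.001 * rng.rand(100)  # nearly a copy of 'a'
demo['c'] = rng.rand(100)                            # independent of 'a'

to_drop = find_correlated_features(demo, ['a', 'b', 'c'])
print(to_drop)
```
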
## Assignment 23. Read up on random forests in Python, for example at http://blog.yhat.com/posts/random-forests-in-python.html

## Assignment 24. Train a random forest classifier (from scikit-learn) with training data and use it to classify test data.
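
A minimal sketch of this assignment with plain scikit-learn is given below. It uses synthetic data in place of `data[features]` and `data['class_cat']`, and the parameter values (30% test split, 100 trees) are arbitrary assumptions rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data standing in for data[features] and data['class_cat'].
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           random_state=0)

# Hold out 30% of the labelled points for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=5,
                            random_state=0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

The confusion matrix shows, per true class (rows), how many test points were assigned to each predicted class (columns), which is more informative than accuracy alone when classes are unbalanced.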

In [None]:
# %% Create final classifier
clf = BalancedRandomForest(n_estimators=1000, min_samples_leaf=5,
                           min_samples_split=5, ratio=0.2)
clf.fit(data[features], data['class_cat'])

# %% Assess feature importances
fi_scores = mean_decrease_impurity(clf, features)
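
The `mean_decrease_impurity` helper above is a course function whose source is not shown. In plain scikit-learn, impurity-based importances are available through the `feature_importances_` attribute of a fitted forest. The sketch below assumes synthetic data and made-up feature names `f0` to `f4`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False keeps the informative features in the first columns.
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
names = ['f0', 'f1', 'f2', 'f3', 'f4']

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ holds the impurity-based importance per feature,
# normalised to sum to 1.
importances = pd.Series(forest.feature_importances_, index=names)
print(importances.sort_values(ascending=False))
```

Features with an importance close to zero contribute little to the splits and are candidates for removal from the feature space.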

## Assignment 27. Make a classifier you are satisfied with and use it to classify the entire point cloud.

In [None]:
# %% Classify vegetation / non-vegetation
classification = []
parts = 8
part = len(point_cloud) // parts  # integer chunk size
for i in range(parts):
    if i == parts-1:
        temp_pc = point_cloud.loc[point_cloud.index[i*part:]]
    else:
        temp_pc = point_cloud.loc[point_cloud.index[i*part:(i+1)*part]]
    preds = clf.predict(temp_pc[features])
    classification.extend(list(preds))

point_cloud['class'] = classification

# %% Classify trees / low vegetation
# Assumes class code 1 denotes vegetation; check class_indexer to confirm.
veg_mask = point_cloud['class'] == 1
points = point_cloud.loc[veg_mask, ['X', 'Y', 'Z']].to_numpy()
radius = 2.0
tree_th = 4.0
classification = classify_vegetation(points, radius, tree_th)
point_cloud['veg_class'] = 'non_veg'
point_cloud.loc[point_cloud['class'] == 1, 'veg_class'] = classification
point_cloud['class'], _ = pd.factorize(point_cloud['veg_class'])
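
Predicting in parts, as in the cell above, keeps memory use down for large point clouds. An alternative to the manual index arithmetic is `np.array_split`, sketched here on toy data. The names `model` and `cloud` and the random features are assumptions for the example only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
# Toy stand-ins for the labelled data and for the full point cloud.
train = pd.DataFrame(rng.rand(200, 3), columns=['f1', 'f2', 'f3'])
labels = (train['f1'] > 0.5).astype(int)
cloud = pd.DataFrame(rng.rand(1003, 3), columns=['f1', 'f2', 'f3'])

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(train, labels)

# np.array_split tolerates lengths that are not a multiple of the number of
# parts, so no special case is needed for the last chunk.
chunks = np.array_split(np.arange(len(cloud)), 8)
preds = np.concatenate([model.predict(cloud.iloc[idx]) for idx in chunks])

cloud['class'] = preds
print(cloud['class'].value_counts())
```

Because each point is classified independently, the chunked predictions are the same as those from a single `predict` call on the whole table.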

## Assignment 28. Export the point cloud and visualize the classification (in CloudCompare, ArcGIS or some other software). Explore the results and visually check whether the classification results seem good.

In [None]:
# %% Save results
point_cloud.to_csv('../Data/veg_classification.csv',
                   columns=['X', 'Y', 'Z', 'class'], index=False)