A little report writing.
Introduce tables in report with results from modifying the parameter C
after model has been selected. Misses the analysis. Introduce another
table showing the optimal values for C and gamma after model selection
by grid-search.
Thomas Bracht Laumann Jespersen committed Mar 18, 2012
1 parent c6ede21 commit f601ee1
Showing 6 changed files with 102 additions and 16 deletions.
2 changes: 1 addition & 1 deletion handin3/Code/freeBoundedSVs.eps

Generated file not shown.

21 changes: 21 additions & 0 deletions handin3/Code/regularization.m
@@ -0,0 +1,21 @@
%% Effect of the regularization parameter: retrain on knollC-train200
%% with the C found during model selection, then with C*100 and C/100,
%% and report the free and bounded support vector counts of each model.

data = loadknoll('knollC-train200.dt');

[C gamma] = modelselect(data);

model = train(data, C, gamma);
modelLarger = train(data, C*100, gamma);
modelSmaller = train(data, C/100, gamma);

%% The bound passed to dividesupportvectors must be the C each model
%% was trained with, since a support vector is bounded exactly when
%% its coefficient sits at that model's box constraint.
[f b] = dividesupportvectors(C, model.SVs, model.sv_coef);
[fl bl] = dividesupportvectors(C*100, modelLarger.SVs, modelLarger.sv_coef);
[fs bs] = dividesupportvectors(C/100, modelSmaller.SVs, ...
                               modelSmaller.sv_coef);

disp(sprintf('Original: #SVs: %d\t#free SVs: %d\t#bounded SVs: %d', ...
             length(model.SVs), length(f), length(b)));
disp(sprintf('C*100: #SVs: %d\t#free SVs: %d\t#bounded SVs: %d', ...
             length(modelLarger.SVs), length(fl), length(bl)));
disp(sprintf('C/100: #SVs: %d\t#free SVs: %d\t#bounded SVs: %d', ...
             length(modelSmaller.SVs), length(fs), length(bs)));
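Neither modelselect nor crossval ships with this commit. Below is a minimal sketch of the grid-search loop described in the report's Model Selection subsection, under assumed names; the grid for C is an assumption (the report only states the gamma grid), chosen so that it contains the winning values 100 and 1000.

function [bestC, bestGamma] = modelselect(knolldata)
  %% Grid search over (C, gamma), keeping the pair with the highest
  %% 5-fold cross-validation accuracy. The C grid is assumed.
  Cs     = [0.01 0.1 1 10 100 1000];
  gammas = [0.0001 0.001 0.01 0.1 1 10 100];
  bestAcc = -Inf;
  for C = Cs
    for gamma = gammas
      acc = crossval(knolldata, C, gamma);
      if acc > bestAcc
        bestAcc = acc; bestC = C; bestGamma = gamma;
      end
    end
  end
end

function acc = crossval(knolldata, C, gamma)
  %% With '-v 5', libsvm's svmtrain runs 5-fold cross validation and
  %% returns the accuracy instead of a model struct.
  opts = sprintf('-s 0 -t 2 -g %g -c %g -v 5', gamma, C);
  acc = svmtrain(knolldata(:,3), knolldata(:,1:2), opts);
end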
24 changes: 15 additions & 9 deletions handin3/Code/runsvm.m
@@ -22,18 +22,27 @@
%% We now train our SVM on each dataset using the respective values
%% we found for C and gamma.

-c100model=train(knollC100(:,1:2), knollC100(:,3), c100, gamma100);
+c100model=train(knollC100, c100, gamma100);

-c200model=train(knollC200(:,1:2), knollC200(:,3), c200, gamma200);
+c200model=train(knollC200, c200, gamma200);

-c400model=train(knollC400(:,1:2), knollC400(:,3), c400, gamma400);
+c400model=train(knollC400, c400, gamma400);

%% And run all instances on themselves (and the others?) and the test data

%% TODO



%% Get number of free and bounded support vectors
[free100 bounded100] = dividesupportvectors(c100, c100model.SVs, c100model.sv_coef);
[free200 bounded200] = dividesupportvectors(c200, c200model.SVs, c200model.sv_coef);
[free400 bounded400] = dividesupportvectors(c400, c400model.SVs, c400model.sv_coef);

disp(sprintf('knollC100: Free SVs: %d Bounded SVs: %d', length(free100), length(bounded100)));
disp(sprintf('knollC200: Free SVs: %d Bounded SVs: %d', length(free200), length(bounded200)));
disp(sprintf('knollC400: Free SVs: %d Bounded SVs: %d', length(free400), length(bounded400)));

%% Visualizing the SVM solution

%% We want to plot the original knollC-train200 data
@@ -45,11 +54,8 @@
hold on;
plot(class2(:, 1), class2(:, 2), 'bx');

-%% Get the free and bounded support vectors
-[free bounded] = dividesupportvectors(c200, c200model.SVs, c200model.sv_coef);
-
-%% And plot them: bounded SVs in green, free ones in black
-plot(bounded(:,1), bounded(:,2), 'go');
-plot(free(:,1), free(:,2), 'ko');
+%% Plot SVs: bounded SVs in green, free ones in black
+plot(bounded200(:,1), bounded200(:,2), 'go');
+plot(free200(:,1), free200(:,2), 'ko');

print -dpsc freeBoundedSVs.eps;
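dividesupportvectors is likewise not part of the commit. A sketch of the assumed semantics: libsvm stores sv_coef = y_i * alpha_i with 0 < alpha_i <= C, so a support vector is bounded exactly when |sv_coef| reaches the box constraint (up to a tolerance), and free otherwise.

function [free, bounded] = dividesupportvectors(C, SVs, coefs)
  %% Split the rows of SVs into free and bounded support vectors by
  %% comparing |coefs| against the box constraint C (assumed semantics).
  tol = 1e-8;
  atbound = abs(abs(coefs) - C) < tol;
  bounded = SVs(atbound, :);
  free    = SVs(~atbound, :);
end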
4 changes: 2 additions & 2 deletions handin3/Code/train.m
@@ -1,7 +1,7 @@
-function [ model ] = train( data, labels, c, gamma )
+function [ model ] = train(knolldata, c, gamma )
%% Trains the SVM on the given data using the given parameters.
%% Returns libsvm model data which can then be used with svmpredict.
commandstring = sprintf('-s 0 -t 2 -g %d -c %d', gamma, c);
-model = svmtrain(labels, data, commandstring);
+model = svmtrain(knolldata(:,3), knolldata(:,1:2), commandstring);

end
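For reference, a hypothetical usage sketch of the refactored train together with svmpredict, which the docstring mentions. The options '-s 0 -t 2' select libsvm's C-SVC with an RBF kernel; C = 1000 and gamma = 0.1 are the values the report finds for knollC-train200, and the test-file name is an assumption.

data  = loadknoll('knollC-train200.dt');   % n-by-3: inputs in columns 1-2, label in column 3
model = train(data, 1000, 0.1);
test  = loadknoll('knollC-test.dt');       % hypothetical test file
%% svmpredict's second output is a 3-vector; its first entry is the
%% classification accuracy in percent.
[pred, acc] = svmpredict(test(:,3), test(:,1:2), model);
disp(sprintf('Accuracy on test data: %.2f%%', acc(1)));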
Binary file modified handin3/handin3.pdf
Binary file not shown.
67 changes: 63 additions & 4 deletions handin3/handin3.tex
@@ -70,16 +70,75 @@ \section{Neural Networks}

\section{Support Vector Machines}

For this part of the assignment we chose to use the LIBSVM software.

\subsection{Model Selection}
-Description (we normalized the data, then used the builtin function of libsvm, tried these values for gamma: [])
-Result: best parameters are:

We performed a grid search using the following values of $\gamma$:
$\{ 0.0001, 0.001, 0.01, 0.1, 1, 10, 100 \}$.
%% TODO: justify this choice of grid.

LIBSVM has built-in functionality to perform $n$-fold cross-validation
via a command-line option. To perform model selection we iterate
through all combinations of $C$ and $\gamma$ and call a function
\texttt{crossval}, which invokes LIBSVM to run a 5-fold
cross-validation with the current pair. LIBSVM reports the resulting
accuracy, which we use to keep track of the configuration that
achieves the highest accuracy.

%% Result: best parameters are:
%% C: 1000, gamma: 0.100000 Cross Validation Accuracy = 98%
%% C: 1000, gamma: 0.100000 Cross Validation Accuracy = 97.5%
%% C: 100, gamma: 1.000000 Cross Validation Accuracy = 97.5%
\begin{table}[!h]
\centering
\begin{tabular}{l | c | c | c }
\hfill & $C$ & $\gamma$ & Acc.\\\hline
\texttt{knollC-train100} & 1000 & 0.1 & 98\%\\
\texttt{knollC-train200} & 1000 & 0.1 & 97.5\%\\
\texttt{knollC-train400} & 100 & 1 & 97.5\%
\end{tabular}
\caption{Optimal values of $C$ and $\gamma$ for each training set,
  found by grid search, together with the corresponding
  cross-validation accuracy.}
\end{table}

-Applied to the testdata, this gives the following results:
-\begin{figure}
-\includegraphics[width=\textwidth]{Code/freeBoundedSVs.eps}
-\caption{\texttt{knollC-train200} trained SVM model. Bounded support vectors are circled in green and free support vectors are circled in black.}
-\end{figure}

\subsection{Inspecting the kernel expansion}

\subsubsection{Visualization}

Fig.~\ref{fig:freebounded} shows a plot of the \texttt{knollC-train200} data set in which the support vectors are circled: free support vectors in black, bounded ones in green. Recall that a support vector is \emph{free} when its coefficient satisfies $0 < \alpha_i < C$, so that it lies exactly on the margin, and \emph{bounded} when $\alpha_i = C$. There are 87 bounded support vectors and just six free ones, for a total of 93 support vectors.

\begin{figure}[!ht]
\centering
\includegraphics[width=.8\textwidth]{Code/freeBoundedSVs.eps}
\caption{\texttt{knollC-train200} data set with circled support vectors.}
\label{fig:freebounded}
\end{figure}

\subsubsection{Effect of the regularization parameter}

%% Retrain model on \texttt{knollC-train200} using values of $C$ that are 100 times larger and 100 times smaller than the $C^*$ found during model selection. How does it change?

The file \texttt{regularization.m} performs the outlined procedure: it first trains the SVM using the values of $C$ and $\gamma$ found during model selection, and then trains two further models, one with $C$ multiplied by 100 and one with $C$ divided by 100.

The most notable change is in the number of support vectors. There is a total of 93 support vectors for the ``original'' value of $C$, of which 87 are bounded. When $C$ is a hundred times larger, the number of support vectors drops to just 19, all of which are free. Conversely, when $C$ is divided by a hundred, the number of support vectors increases to 199, but again all of them are free.

\subsubsection{Scaling behaviour}

Table~\ref{tab:knoll_free_bounded_SV} lists the number of free and bounded support vectors for each of the three training sets.

\begin{table}[h!]
\centering
\begin{tabular}{l | c | c}
\hfill & free & bounded\\\hline
\texttt{knollC-train100} & 5 & 60 \\
\texttt{knollC-train200} & 6 & 87 \\
\texttt{knollC-train400} & 12 & 153 \\
\end{tabular}
\caption{Number of free and bounded support vectors for each of the three training sets.}
\label{tab:knoll_free_bounded_SV}
\end{table}

\end{document}
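Finally, loadknoll is also external to this commit. Given how the data are indexed throughout, it presumably just reads a whitespace-separated .dt file into an n-by-3 matrix; a sketch under that assumption:

function data = loadknoll(filename)
  %% Assumed format: one point per line, two inputs followed by a
  %% class label, whitespace-separated.
  data = load(filename);
end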
