From bf40e1e8a95b08e872b1fef815a27cfb66e74fc0 Mon Sep 17 00:00:00 2001
From: Kumar Shridhar
Date: Tue, 8 Jan 2019 15:31:27 +0530
Subject: [PATCH] Updates from ShareLaTeX

---
 Abstract/abstract.tex     | 2 +-
 Chapter2/chapter2.tex     | 7 ++++++-
 Chapter6/chapter6.tex     | 4 ++--
 References/references.bib | 8 ++++++++
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/Abstract/abstract.tex b/Abstract/abstract.tex
index 8125557..edcc856 100644
--- a/Abstract/abstract.tex
+++ b/Abstract/abstract.tex
@@ -12,7 +12,7 @@
 
 \newline
 
-In the first part of the thesis, the Bayesian Neural Network is explained and it is applied to an Image Classification task. The results are compared to point-estimates based architectures on MNIST, CIFAR-10, CIFAR-100 and STL-10 datasets. Moreover, uncertainties are calculated and the architecture is pruned and a comparison between the results is drawn.
+In the first part of the thesis, the Bayesian Neural Network is explained and applied to an Image Classification task. The results are compared to point-estimate based architectures on the MNIST, CIFAR-10, and CIFAR-100 datasets. Moreover, uncertainties are calculated, the architecture is pruned, and the results are compared.
 
 In the second part of the thesis, the concept is further applied to other computer vision tasks namely, Image Super-Resolution and Generative Adversarial Networks. The concept of BayesCNN is tested and compared against other concepts in a similar domain.

diff --git a/Chapter2/chapter2.tex b/Chapter2/chapter2.tex
index ffecb99..f0ad61f 100644
--- a/Chapter2/chapter2.tex
+++ b/Chapter2/chapter2.tex
@@ -215,6 +215,11 @@ \section{Model Weights Pruning}
 efficient inference using these sparse models requires purpose-built hardware capable of loading sparse matrices and/or performing sparse matrix-vector operations.
 Thus the overall memory usage is reduced with the new pruned model.
 
-There are several ways of achieving the pruned model, the most popular one is to map the low contributing weights to zero and reducing the number of overall non-zero valued weights. This can be achieved by training a large sparse model and pruning it further which makes it comparable to training a small dense model. Pruning away the less salient features to zero has been used in this thesis and is explained in details in Chapter 4.
+There are several ways of obtaining a pruned model; the most popular one is to map the low-contributing weights to zero, reducing the overall number of non-zero valued weights. This can be achieved by training a large sparse model and pruning it further, which makes it comparable to training a small dense model.
+
+Assigning zero weights to most features and non-zero weights only to the important ones can be formalized by applying the $L_0$ norm, $||\theta||_0 = \sum_j \delta(\theta_j \neq 0)$, which applies a constant penalty to every non-zero weight.
+The $L_0$ norm can be thought of as a feature selector that assigns non-zero values only to the features that are important. However, the $L_0$ norm is non-convex and non-differentiable, which makes the resulting optimization problem NP-hard; it can be solved efficiently only if $P = NP$.
+The alternative that we use in our work is the $L_1$ norm, which is equal to the sum of the absolute weight values, $||\theta||_1 = \sum_j |\theta_j|$. The $L_1$ norm is convex and can be used as a convex relaxation of the $L_0$ norm \cite{tibshirani1996regression}. The $L_1$ norm works as a sparsity-inducing regularizer by driving a large number of coefficients to exactly zero, making it a good feature selector in our case. The only caveat is that the $L_1$ norm is not differentiable at $\theta_j = 0$, which has to be handled during optimization.
+Pruning away the less salient features to zero is the approach used in this thesis and is explained in detail in Chapter 4.
 
diff --git a/Chapter6/chapter6.tex b/Chapter6/chapter6.tex
index 1d0c08a..b74314f 100644
--- a/Chapter6/chapter6.tex
+++ b/Chapter6/chapter6.tex
@@ -83,9 +83,9 @@ \subsection{Empirical Analysis}
 
 \begin{figure}[H]
 \begin{center}
-\includegraphics{Chapter6/Figs/camel_LR.png}
+\includegraphics[height=.38\textheight]{Chapter6/Figs/camel_SR.png}
 \label{fig:CamelSR}
-\caption{Generated Super Resolution Image}
+\caption{Generated Super Resolution Image scaled to 40 percent to fit}
 \end{center}
 \end{figure}
 
diff --git a/References/references.bib b/References/references.bib
index 8b6d9d0..8bc96e1 100644
--- a/References/references.bib
+++ b/References/references.bib
@@ -603,4 +603,12 @@ @article{hafner2018reliable
   journal={arXiv preprint arXiv:1807.09289},
   year={2018}
 }
+@article{tibshirani1996regression,
+  title={Regression shrinkage and selection via the lasso},
+  author={Tibshirani, Robert},
+  journal={Journal of the Royal Statistical Society. Series B (Methodological)},
+  pages={267--288},
+  year={1996},
+  publisher={JSTOR}
+}
\ No newline at end of file
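A minimal sketch of the pruning scheme described in the Chapter2/chapter2.tex hunk above, namely $L_1$-regularised training followed by mapping low-contributing weights to zero. It assumes a PyTorch model; the penalty weight lam, the helper names, and the pruning threshold are illustrative placeholders rather than code from the thesis repository.

import torch
import torch.nn as nn

def l1_penalty(model: nn.Module) -> torch.Tensor:
    # ||theta||_1 = sum_j |theta_j| over all trainable weights
    return sum(p.abs().sum() for p in model.parameters() if p.requires_grad)

def training_step(model, loss_fn, optimizer, x, y, lam=1e-4):
    # Task loss plus the sparsity-inducing L1 term
    optimizer.zero_grad()
    loss = loss_fn(model(x), y) + lam * l1_penalty(model)
    loss.backward()  # autograd uses the subgradient 0 for |theta_j| at theta_j = 0
    optimizer.step()
    return loss.item()

def prune_by_magnitude(model, threshold=1e-3):
    # Map the low-contributing weights to zero, keeping only the salient ones
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() > threshold).to(p.dtype))

The $L_1$ term only encourages small coefficients during training; it is the explicit thresholding step that produces the exact zeros needed for a sparse model.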