From bf40e1e8a95b08e872b1fef815a27cfb66e74fc0 Mon Sep 17 00:00:00 2001
From: Kumar Shridhar
Date: Tue, 8 Jan 2019 15:31:27 +0530
Subject: [PATCH] Updates from ShareLaTeX

---
 Abstract/abstract.tex     | 2 +-
 Chapter2/chapter2.tex     | 7 ++++++-
 Chapter6/chapter6.tex     | 4 ++--
 References/references.bib | 8 ++++++++
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/Abstract/abstract.tex b/Abstract/abstract.tex
index 8125557..edcc856 100644
--- a/Abstract/abstract.tex
+++ b/Abstract/abstract.tex
@@ -12,7 +12,7 @@
 
 \newline
 
-In the first part of the thesis, the Bayesian Neural Network is explained and it is applied to an Image Classification task. The results are compared to point-estimates based architectures on MNIST, CIFAR-10, CIFAR-100 and STL-10 datasets. Moreover, uncertainties are calculated and the architecture is pruned and a comparison between the results is drawn.
+In the first part of the thesis, the Bayesian Neural Network is explained and applied to an Image Classification task. The results are compared to point-estimate based architectures on the MNIST, CIFAR-10, and CIFAR-100 datasets. Moreover, uncertainties are calculated, the architecture is pruned, and the results are compared.
 
 In the second part of the thesis, the concept is further applied to other computer vision tasks namely, Image Super-Resolution and Generative Adversarial Networks. The concept of BayesCNN is tested and compared against other concepts in a similar domain.

diff --git a/Chapter2/chapter2.tex b/Chapter2/chapter2.tex
index ffecb99..f0ad61f 100644
--- a/Chapter2/chapter2.tex
+++ b/Chapter2/chapter2.tex
@@ -215,6 +215,11 @@ \section{Model Weights Pruning}
 efficient inference using these sparse models requires purpose-built hardware capable of loading sparse matrices and/or performing sparse matrix-vector operations.
 Thus the overall memory usage is reduced with the new pruned model.
 
-There are several ways of achieving the pruned model, the most popular one is to map the low contributing weights to zero and reducing the number of overall non-zero valued weights. This can be achieved by training a large sparse model and pruning it further which makes it comparable to training a small dense model. Pruning away the less salient features to zero has been used in this thesis and is explained in details in Chapter 4.
+There are several ways of obtaining a pruned model; the most popular one is to map the low-contributing weights to zero, reducing the overall number of non-zero valued weights. This can be achieved by training a large sparse model and pruning it further, which makes it comparable to training a small dense model.
+
+Assigning zero weights to most features and non-zero weights only to the important ones can be formalized by applying the $L_0$ norm, $||\theta||_0 = \sum_j \delta(\theta_j \neq 0)$, which applies a constant penalty to every non-zero weight.
+The $L_0$ norm can be thought of as a feature selector that assigns non-zero values only to the features that are important. However, the $L_0$ norm is non-convex and non-differentiable, which makes the resulting optimization problem NP-hard; it can be solved efficiently only if $P = NP$.
+The alternative that we use in our work is the $L_1$ norm, which is equal to the sum of the absolute weight values, $||\theta||_1 = \sum_j |\theta_j|$. The $L_1$ norm is convex and can be used as a convex relaxation of the $L_0$ norm \cite{tibshirani1996regression}. The $L_1$ norm works as a sparsity-inducing regularizer by driving a large number of coefficients to exactly zero, making it a good feature selector in our case. The only caveat is that the $L_1$ norm is not differentiable at $\theta_j = 0$, which has to be handled during optimization.
+Pruning away the less salient features to zero is the approach used in this thesis and is explained in detail in Chapter 4.
 
diff --git a/Chapter6/chapter6.tex b/Chapter6/chapter6.tex
index 1d0c08a..b74314f 100644
--- a/Chapter6/chapter6.tex
+++ b/Chapter6/chapter6.tex
@@ -83,9 +83,9 @@ \subsection{Empirical Analysis}
 
 \begin{figure}[H]
 \begin{center}
-\includegraphics{Chapter6/Figs/camel_LR.png}
+\includegraphics[height=.38\textheight]{Chapter6/Figs/camel_SR.png}
 \label{fig:CamelSR}
-\caption{Generated Super Resolution Image}
+\caption{Generated Super Resolution Image scaled to 40 percent to fit}
 \end{center}
 \end{figure}
 
diff --git a/References/references.bib b/References/references.bib
index 8b6d9d0..8bc96e1 100644
--- a/References/references.bib
+++ b/References/references.bib
@@ -603,4 +603,12 @@ @article{hafner2018reliable
   journal={arXiv preprint arXiv:1807.09289},
   year={2018}
 }
+@article{tibshirani1996regression,
+  title={Regression shrinkage and selection via the lasso},
+  author={Tibshirani, Robert},
+  journal={Journal of the Royal Statistical Society. Series B (Methodological)},
+  pages={267--288},
+  year={1996},
+  publisher={JSTOR}
+}
\ No newline at end of file
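A minimal sketch of the pruning scheme described in the Chapter2/chapter2.tex hunk above, namely $L_1$-regularised training followed by mapping low-contributing weights to zero. It assumes a PyTorch model; the penalty weight lam, the helper names, and the pruning threshold are illustrative placeholders rather than code from the thesis repository.

import torch
import torch.nn as nn

def l1_penalty(model: nn.Module) -> torch.Tensor:
    # ||theta||_1 = sum_j |theta_j| over all trainable weights
    return sum(p.abs().sum() for p in model.parameters() if p.requires_grad)

def training_step(model, loss_fn, optimizer, x, y, lam=1e-4):
    # Task loss plus the sparsity-inducing L1 term
    optimizer.zero_grad()
    loss = loss_fn(model(x), y) + lam * l1_penalty(model)
    loss.backward()  # autograd uses the subgradient 0 for |theta_j| at theta_j = 0
    optimizer.step()
    return loss.item()

def prune_by_magnitude(model, threshold=1e-3):
    # Map the low-contributing weights to zero, keeping only the salient ones
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() > threshold).to(p.dtype))

The $L_1$ term only encourages small coefficients during training; it is the explicit thresholding step that produces the exact zeros needed for a sparse model.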