Commit

fixed bug in mnist demo
genekogan committed Mar 11, 2016
1 parent 538c417 commit 3dff853
Showing 24 changed files with 896 additions and 273 deletions.
2 changes: 1 addition & 1 deletion Gemfile
@@ -1 +1 @@
gem 'github-pages'
gem 'github-pages'
6 changes: 6 additions & 0 deletions _includes/video.html
@@ -0,0 +1,6 @@
{::nomarkdown}
<video loop autoplay width={{ include.width }}>
<source src="{{ include.mp4 }}" type="video/mp4">
<source src="{{ include.webm }}" type="video/webm">
</video>
{:/nomarkdown}
22 changes: 21 additions & 1 deletion _layouts/default.html
@@ -3,8 +3,28 @@
<head>
<title>{{ page.title }}</title>
<link rel="stylesheet" type="text/css" href="/css/main.css">
<script type="text/javascript" src="js/MathJax.js"></script>


<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML"></script>



<style>
.center {
text-align: center;
}
.section {
font-size:30px;
background-color:#eee;
padding:10px;
}
</style>


</head>



<body>
<nav>
<ul>
22 changes: 18 additions & 4 deletions _posts/2016-01-02-introduction.md
@@ -5,14 +5,12 @@ date: 2016-01-02
---


## What do artificial intelligence, machine learning, and deep learning mean?

It depends who you ask!

\"Artificial intelligence is anything computers [can\'t do yet](https://en.wikipedia.org/wiki/AI_effect)\" - Douglas Hofstadter, re: Tesler theorem

\"Machine learning is just applied computational statistics.\" - Chris Wiggins

Evidently, the meanings of the terms _artificial intelligence_ and _machine learning_ depend on who you ask.

The term artificial intelligence has been in [use since the 1950s](https://en.wikipedia.org/wiki/History_of_artificial_intelligence), and originally referred to ___. It has been prone to periods of great popular interest, as well as [pessimism and decreased interest](__ai winter__), in part due to its failure to achieve the sort of techno-utopianism prognosticated by [crackpot speculators](https://en.wikipedia.org/wiki/The_Singularity_Is_Near).

Pamela McCorduck attributes these periodic busts to the notion that "Practical AI successes, computational programs that actually achieved intelligent behavior, were soon assimilated into whatever application domain they were found to be useful in, and became silent partners alongside other problem-solving approaches, which left AI researchers to deal only with the 'failures.'"
@@ -35,6 +33,7 @@ Critique of Paper by 'Deep Learning Conspiracy'"](http://people.idsia.ch/~jue
The point is that none of these terms have well-defined meanings, and connotations are inconsistently associated with them, especially within the academic community itself. We are interested in machines which _do_ interesting things, and can do them _now_ or may do so in the near future.

## Core principles of this book
## Mathemagics

This book does not aim to be a scientific text; plenty of those already exist. Nor is it a pure programming manual. It contains math and software, but only as much as is needed for you to be capable of interrogating the field yourself. Machine learning is a large and fast-moving field, so we can't cover everything. A computer science background is not assumed.

@@ -49,6 +48,16 @@ This book tries to find a middle way. IT uses many different. such is the state
Machine learning is a fast-moving field and a quarter of the first draft of this book was obsolete by the time I finished writing it. I will try to keep up with the winds of change and welcome tips and contributions [github][twitter][email], and [your generous support](donate/paeon).


## Math

In general, this book will try to minimize the use of math and rely on visual aids more than equations, both because neural networks can be well understood this way, and because it reduces the need for prior qualifications. Nevertheless, you may still find it helpful to review the relevant math, and to attach probabilistic and geometric interpretations to the equations you encounter.
Francis Tseng has a nice guide to AI and ML which contains a concise review of the math needed.



One of the things we can do away with is much of the detail of training deep networks, letting us focus on applications instead (vindication for me, as it's all I used to do).


contents
- machine learning
- neural nets
@@ -59,6 +68,8 @@ contents
- encoding images (refer tSne)
- deepdream
- style transfer
- ethics & issues
------
- autoencoders
- @kyle audio, @dribnet
- gans
@@ -74,3 +85,6 @@ software
- dimensionality reduction, PCA, self-organizing maps
- tSne


pictures from arxiv
- http://cs231n.stanford.edu/slides/winter1516_lecture9.pdf
142 changes: 113 additions & 29 deletions _posts/2016-01-03-machine-learning.md
@@ -4,63 +4,115 @@ title: "Machine learning"
date: 2016-01-03
---

top link
- banner image with many examples of ML art
- more images
- tweet images

multilayer NN are universal approximators
- http://www.sciencedirect.com/science/article/pii/0893608089900208

http://frnsys.com/ai_notes/foundations/linear_algebra.html



You\'ve heard by now that machine learning refers to a broad set of techniques which allow computers to learn from data. But learn what exactly, and how? To understand this, we\'ll use several concrete examples in which techniques from machine learning can be applied.
You've heard by now that machine learning refers to a broad set of techniques which allow computers to learn from data. But learn what exactly, and how? Let's look at several concrete examples in which techniques from machine learning can be applied.

**Example 1**: Suppose you are a climatologist who is trying to devise a computer program which can predict whether or not it will rain on a given day. Turns out this is hard! But we understand intuitively that rain has something to do with the temperature, atmospheric pressure, humidity, wind, cloud cover, location and time of year, and so on.

**Example 2**: Gmail, Yahoo, and other e-mail services provide tools to automatically filter out spam e-mails before they reach your inbox. Like in the rain example, we have a few intuitions about this task. E-mails containing phrases like "make $$$ now" or "free weight loss pills" are probably suspicious, and we can surely think up a few more. Of course the presence of one suspicious term does not guarantee it's spam, and so we can't take the naive approach of labeling as spam any e-mail containing any suspicious phrase.

One way to approach solving these problems is a \"rule-based\" or \"expert\" approach in which a series of rules are carefully designed and tested at runtime to determine the output. In the spam example, this could take the form of a [decision tree](___). Upon receiving an e-mail, check to see if it\'s from an unknown sender; if it is, check to see if the phrase \"lose weight now!\" appears, and if it does appear and there is a link to an unknown website, classify it as spam. Obviously our decision tree would be much larger and more complicated than this, but it would still be characterized by a sequence of if-then statements leading to a decision.
One way to approach solving these problems is a "rule-based" or "[expert](https://en.wikipedia.org/wiki/Expert_system)" approach, in which a series of rules are carefully designed by hand and evaluated at runtime to determine the output. In the spam example, this could take the form of a [decision tree](___). Upon receiving an e-mail, check to see if it's from an unknown sender; if it is, check to see if the phrase "lose weight now!" appears, and if it does appear and there is a link to an unknown website, classify it as spam. Obviously our decision tree would be much larger and more complicated than this, but it would still be characterized by a sequence of if-then statements leading to a decision.

Such a strategy suffers from two major weaknesses. First, it requires a great deal of expert guidance and hand-engineering, which can be time-consuming and costly. Furthermore, spam trigger words and global climate patterns change continuously, and we'd have to reprogram our rules every so often for them to remain effective. Second, a rule-based approach does not _generalize_. Notice that our spam decision tree won't help us predict the rain, or vice-versa, nor will either easily apply to other problems we haven't talked about. Expert systems like these are domain-specific, and if our task changes even slightly, our carefully crafted algorithm must be reconstructed from scratch.
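To make the rule-based approach concrete, here is a minimal sketch of such a spam filter. The trigger phrases and rules are hypothetical illustrations of ours, not taken from any real filter.

```python
# A hypothetical rule-based spam filter: a fixed chain of hand-written
# if-then rules. The trigger phrases are illustrative only.
SUSPICIOUS_PHRASES = ["make $$$ now", "free weight loss pills", "lose weight now!"]

def is_spam(email_text, sender_known):
    if sender_known:
        return False  # trust known senders outright
    text = email_text.lower()
    # flag the e-mail if any hand-picked trigger phrase appears
    return any(phrase in text for phrase in SUSPICIOUS_PHRASES)
```

Every new kind of spam requires editing the rules by hand, which is exactly the maintenance burden described above.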

# Learning from past observations
{:.section}
Learning from past observations

With machine learning, we take a different approach. We start by reducing these two very different example problems given above to essentially the same generic task: given a set of observations about something, make a decision, or _**classification**_. Rain or no rain; spam or not spam. In other problem domains we may have more than two choices. Or we may have one continuous value to predict, e.g. _how much_ it will rain. In this last case, we call this problem _**regression**_.

With machine learning, we take a different approach. We start by reducing these two very different example problems given above to essentially the same generic task: given a set of observations about something, make a decision, or _**classification**_. Rain or no rain; spam or not spam. In other problem domains we may have more than 2 choices. Or we may have one continuous value to predict, e.g. _how much_ it will rain. In this last case, we call this problem _**regression**_. In each of these cases, we have posed a single abstract problem: determine the relationship between our observations or data, and our desired task.
In both of our examples, we have posed a single abstract problem: determine the relationship between our observations, or data, and our desired task. This relationship takes the form of a function, or model, which takes in our observations and calculates a decision from them. The model is determined from experience, by supplying it a set of known pairs of observations and decisions. Once the model is determined, it can be used to make predicted outputs for new, unseen observations.

[Known observations] -> [Learning] <- [Known outputs]
||
[Unknown observation] -> [ Model ] -> Predicted output

Furthermore, machine learning also takes the position that such a functional relationship can be _learned_ from past observations and their known outputs. For the rain prediction problem, we may have a database with thousands of examples where those variables we think are important (pressure, temperature, etc) were measured and we know whether or not it actually rained those days. In the spam example, we may have a database of e-mails which were labeled as spam or not spam by a human. Using this data, we can craft a function which is able to modify its own internal structure in response to new observations, so as to be able to improve its ability to accurately perform the task. Formally, the set of previous examples with its known outcomes is often called a _ground truth_ and it is used as a _training set_ to _train_ our predictive algorithm.
Machine learning also takes the position that such a functional relationship can be _learned_ from past observations and their known outputs. For the rain prediction problem, we may have a database with thousands of examples where the variables we think are important (pressure, temperature, and so on) were measured, and we know whether or not it actually rained on those days. In the spam example, we may have a database of e-mails which were labeled as spam or not spam by a human. Using this data, we can craft a function which modifies its own internal structure in response to new observations, so as to improve its ability to accurately perform the task. Formally, the set of previous examples with known outcomes is often called a _ground truth_, and it is used as a _training set_ to _train_ our predictive algorithm.

More generally, what's been defined in this section is called _**supervised learning**_ and is one of the foundational branches of machine learning. _**Unsupervised learning**_ refers to tasks involving data which is unlabeled, and _**reinforcement learning**_ is a hybrid of the two, but we will get to those later.

# The simplest machine learning algorithm: a linear classifier
data-driven ?

We've introduced the notion of an algorithm which which makes a series of empirical observations about something and uses those observations to make a decision about something data-driven machine learning
{:.section}
The simplest machine learning algorithm: a linear classifier

Using that as our blueprint, we will now make our first predictive model, a simple linear classifier. A linear classifier is defined as a function of our data, $$X$$, and
We've introduced the notion of an algorithm which makes a series of empirical observations about something and uses those observations to make a decision.

Now we will make our first predictive model, a simple _linear classifier_. A linear classifier makes its decision by taking a weighted combination of our data, $$X$$, and comparing it against a threshold.

Let's take our first example, that of predicting whether or not it will rain on a given day. We will use a simplified dataset which consists of only two measurements: atmospheric pressure and humidity. Suppose we have a dataset with eight days of past data.

[ 1 ] - 2d line classifier
[ 2 ] - 3d plane classifier

sometimes our data is not linearly separable. take the simple example of __
[ 3 ] - 2d non-linearly separable
* notes
- make two columns, put humidity + pressure into

|**Humidity (%)**|**Pressure (kPa)**|**Rain?**|
|---|---|---|
|29|101.7|-|
|60|98.6|+|
|40|101.1|-|
|62|99.9|+|
|39|103.2|-|
|51|97.6|+|
|46|102.1|-|
|55|100.2|+|

Let's plot these on a 2d graph.

{:.center}
![linear classifier](/images/lin_classifier_2d.png 'linear classifier')

Intuitively, we see that rainy days tend to have low pressure and high humidity, and non-rainy days are the opposite. If we look at the graph, we can see that we can easily separate the two classes with a line.

# Dimension X
If we let $$x_1$$ represent the humidity, and $$x_2$$ represent the pressure, we can plot a line on our graph with the following equation:

$$w_1*x_1 + w_2*x_2 + b = 0$$

matplotlib: http://matplotlib.org/examples/mplot3d/surface3d_demo.html
do gif rotation of planar classifier
where $$w_1$$, $$w_2$$, and $$b$$ are coefficients that we can freely choose. If we set $$w_1 = 1$$, $$w_2 = -2$$, and $$b = 152$$, and then plot the resulting line on the graph, we see it perfectly separates our two classes. We call this line our _decision boundary_.

(equation should be written beside the line, Ax+By+C)

{:.center}
![linear classifier](/images/lin_classifier_2d.png 'linear classifier')

Suppose we know today's humidity and pressure, and are asked to predict whether it will rain or not. Let's say we are given $$x_1 = 20$$ and $$x_2 = 103$$. We can plot this new point on our graph.

(now with new point, as a ?)
{:.center}
![linear classifier](/images/lin_classifier_2d.png 'linear classifier')

It appears on the negative side of our decision boundary, and thus we predict it won't rain. More concretely, our classification decision can be expressed as:

$$
\begin{eqnarray}
\mbox{classification} & = & \left\{ \begin{array}{ll}
1 & \mbox{if } w_1*x_1 + w_2*x_2 + b \gt 0 \\
0 & \mbox{if } w_1*x_1 + w_2*x_2 + b \leq 0
\end{array} \right.
\tag{1}\end{eqnarray}
$$

This is a 2-dimensional linear classifier.
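Expressed in code, decision rule (1) is only a few lines. The sketch below is our own Python illustration; the weights are chosen by hand so that the boundary separates the humidity/pressure table above (for instance $$w_1 = 1$$, $$w_2 = -2$$, $$b = 152$$).

```python
# 2d linear classifier: predict 1 (rain) if w1*x1 + w2*x2 + b > 0, else 0.
# Weights chosen by hand to separate the table above; a learning
# algorithm would find such values automatically.
W1, W2, B = 1.0, -2.0, 152.0

def classify(humidity, pressure):
    return 1 if W1 * humidity + W2 * pressure + B > 0 else 0

# (humidity %, pressure kPa, rained?) rows from the table above
data = [
    (29, 101.7, 0), (60, 98.6, 1), (40, 101.1, 0), (62, 99.9, 1),
    (39, 103.2, 0), (51, 97.6, 1), (46, 102.1, 0), (55, 100.2, 1),
]
correct = all(classify(h, p) == rained for h, p, rained in data)  # True
```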

Now let's do the same thing in three dimensions. Suppose we add a third variable to our dataset, as a third column in our table. Our decision boundary becomes a flat plane rather than a line:

{% include video.html mp4='/images/video.mp4' webm='/images/video.webm' width='400' %}

A flat plane in 3d is analogous to a line in 2d, and is thus called "linear." This is true in general for any n-dimensional hyperplane. Linear classifiers are limited because in reality, most problems that interest us don't have such flat behavior; different variables interact in various ways.


{:.section}
Dimension X

In practice, our data may have many dimensions, but everything works the same way: a linear classifier in n dimensions is a hyperplane which separates the two classes.
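In code, the n-dimensional rule is the same as before, with a dot product in place of the two explicit terms. This is a sketch of ours; the third feature and all the values are made up for illustration.

```python
import numpy as np

# n-dimensional linear classifier: predict 1 if w . x + b > 0, else 0.
def classify(w, x, b):
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([1.0, -2.0, 0.5])    # one weight per feature
b = 152.0
x = np.array([60.0, 98.6, 3.0])   # e.g. humidity, pressure, a third feature
classify(w, x, b)  # -> 1
```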


{:.section}
Limitations of linear classifier

Sometimes our data is not linearly separable. Suppose we receive a training set that looks like the figure below, with one class of points clustered in the middle of the other. Clearly, no single line is going to separate these two classes.
[ 3 ] - 2d non-linearly separable


# no labels

Later, unsupervised
@@ -76,6 +128,19 @@ the same linear classifier which is simply telling apart two objects from each o
** taken together, the weights and the biases are often also called _parameters_ because the behavior of our machine depends on how we set them.


# Supervised learning

This is supervised learning: learning a model from examples with known labels.

# Unsupervised learning

Unsupervised learning instead finds underlying structure in unlabeled data.

t-SNE is beautiful (my gif)

# Reinforcement learning

physics demo (balancing a stick) (top banner?). this book will probably mostly reference



@@ -89,6 +154,7 @@ deepdream
- top bar: mike tyka's original recursive ones
- on june __ someone mysteriously posted a photo
- mike tyka's new deepdream experiments
- VR: http://www.engadget.com/2016/02/27/google-deepdream-vr-experiment/

convnets
- top bar: ofxCcv viewer animated gif of me waving
@@ -108,13 +174,31 @@ t-SNE
- include olivia jack, moritz stefaner, myself, golan + aman

NLP
- top bar: kyle's ragas or my wikipedia concepts
- top bar: wikipedia concepts?
- quote: chris manning, deep learning will steamroll NLP
- word2vec
- ragas
- translation feedback loops?

ethics
- top bar: heather's faces
- kate crawford, hanna wallach
- many art projects offer a glimpse into a brave new world, filled with
- corporations prefer deep learning because it automates feature extraction



-----

For example, we might want to make predictions about the price of a house, so that $$y$$ represents the price of the house in dollars and the elements of $$x$$ represent "features" that describe the house (such as its size and the number of bedrooms). Suppose that we are given many examples of houses, where the features of the i'th house are denoted $$x^{(i)}$$ and its price $$y^{(i)}$$.

Our goal is to find a function $$h$$ so that $$y^{(i)} \approx h(x^{(i)})$$ for each training example. If we succeed in finding such a function, and we have seen enough examples of houses and their prices, we hope that $$h$$ will also be a good predictor of the house price even when we are given the features for a new house where the price is not known.
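Assuming a linear model, such a function can be fit with ordinary least squares. The house data below is invented purely for illustration.

```python
import numpy as np

# Fit a linear model h(x) = w . x + b to toy house data
# (size in m^2, number of bedrooms -> price in dollars).
X = np.array([[50, 1], [80, 2], [100, 3], [120, 3], [150, 4]], dtype=float)
y = np.array([150_000, 240_000, 310_000, 360_000, 450_000], dtype=float)

# Append a column of ones so the bias b is learned as an extra weight.
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def h(size, bedrooms):
    return w[0] * size + w[1] * bedrooms + w[2]

# Predict the price of an unseen house from its features.
price = h(110, 3)
```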

We initialize a sigmoid neural network with 3 input neurons and 1 output neuron, and 1 hidden layer with 2 neurons. Every connection has a random initial weight, and neurons in the hidden and output layers have a random bias.
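A sketch of that network in numpy, as our own illustration: the layer sizes follow the text, while the seed and input values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 3 input neurons -> 2 hidden neurons -> 1 output neuron, as in the text.
# Weights and biases start random; training would adjust them.
W1 = rng.standard_normal((2, 3))  # hidden-layer weights
b1 = rng.standard_normal(2)       # hidden-layer biases
W2 = rng.standard_normal((1, 2))  # output-layer weights
b2 = rng.standard_normal(1)       # output-layer bias

def forward(x):
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

out = forward(np.array([0.5, -1.0, 2.0]))  # a single value in (0, 1)
```

Training would adjust the weights and biases to reduce prediction error; here we only run the forward pass.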

