# Boltzmann Machines
# Geoffrey Hinton
# An RBM contains two layers: 1) Visible layer 2) Hidden layer




<b>So what was it that allowed researchers to overcome the vanishing gradient problem? The
answer to this question has two parts, the first of which involves a Restricted Boltzmann
Machine. This is a method that can automatically find patterns in our data by reconstructing
the input. Sounds complicated, but bear with me. We’ll take a closer look.
Geoff Hinton at the University of Toronto was one of the first researchers to devise
a breakthrough idea for training deep nets. His approach popularized the Restricted
Boltzmann Machine, also known as the RBM. Because of his pioneering work he’s often
referred to as one of the fathers of deep learning.
An RBM is a shallow, two-layer net; the first layer is known as the visible layer and the
second is called the hidden layer. Each node in the visible layer is connected to every
node in the hidden layer. An RBM is considered “restricted” because no two nodes in the
same layer share a connection.
An RBM is the mathematical equivalent of a two-way translator – in the forward pass,
an RBM takes the inputs and translates them into a set of numbers that encode the inputs.
In the backward pass, it takes this set of numbers and translates them back to form the
re-constructed inputs. A well-trained net will be able to perform the backwards translation
with a high degree of accuracy. In both steps, the weights and biases have a very important
role. They allow the RBM to decipher the interrelationships among the input features, and they also help
the RBM decide which input features are the most important when detecting patterns.
Through several forward and backward passes, an RBM is trained to reconstruct the input
data. Three steps are repeated over and over through the training process:
a) With a forward pass, every input is combined with an individual weight and one overall
bias, and the result is passed to the hidden layer, which may or may not activate. Here’s
how it flows for the whole net.
b) Next, in a backward pass, each activation is combined with an individual weight and an
overall bias, and the result is passed to the visible layer for reconstruction. Here’s how
it flows back.
c) At the visible layer, the reconstruction is compared against the original input to
determine the quality of the result.
RBMs use a measure called KL Divergence for step c); steps a) through c) are repeated with
varying weights and biases until the input and the reconstruction are as close as possible.
An interesting aspect of an RBM is that the data does not need to be labelled. This turns
out to be very important for real-world data sets like photos, videos, voices, and sensor
data – all of which tend to be unlabelled. Rather than having people manually label the
data and introduce errors, an RBM automatically sorts through the data, and by properly adjusting
the weights and biases, an RBM is able to extract the important features and reconstruct
the input. An important note is that an RBM is actually making decisions about which input
features are important and how they should be combined to form patterns. In other words,
an RBM is part of a family of feature extractor neural nets, which are all designed to recognize
inherent patterns in data. These nets are also called autoencoders, because in a way,
they have to encode their own structure.
So we saw that an RBM can extract features, but how does that help with the vanishing
gradient? We’ll get to the second part of our answer in the next video when we take
a look at the Deep Belief Net.</b>
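To make steps a) through c) a bit more concrete, here is a minimal Python sketch of one forward pass, one backward pass, and a KL Divergence comparison. It is only an illustration under assumptions that are not in the video: the layer sizes, the random starting weights, the sigmoid activation, and the helper names (`forward`, `backward`, `kl_divergence`) are all made up for the example, and no weight updates are performed. Real RBM training typically samples the hidden units and updates the weights after each pass.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(visible, weights, hidden_bias):
    """Step a): visible -> hidden. weights[i][j] links visible i to hidden j."""
    return [
        sigmoid(sum(visible[i] * weights[i][j] for i in range(len(visible)))
                + hidden_bias[j])
        for j in range(len(hidden_bias))
    ]


def backward(hidden, weights, visible_bias):
    """Step b): hidden -> visible, reusing the same weights in reverse."""
    return [
        sigmoid(sum(hidden[j] * weights[i][j] for j in range(len(hidden)))
                + visible_bias[i])
        for i in range(len(visible_bias))
    ]


def kl_divergence(p, q, eps=1e-12):
    """Step c): KL(p || q) after scaling both vectors to sum to 1."""
    p_sum, q_sum = sum(p), sum(q)
    total = 0.0
    for a, b in zip(p, q):
        a, b = a / p_sum, b / q_sum
        if a > 0:  # terms with p = 0 contribute nothing
            total += a * math.log(a / (b + eps))
    return total


random.seed(0)
n_visible, n_hidden = 6, 3  # assumed sizes, chosen only for the example
weights = [[random.gauss(0, 0.1) for _ in range(n_hidden)]
           for _ in range(n_visible)]
hidden_bias = [0.0] * n_hidden
visible_bias = [0.0] * n_visible

v = [1, 0, 1, 1, 0, 0]                                # original input
h = forward(v, weights, hidden_bias)                  # encoded input
reconstruction = backward(h, weights, visible_bias)   # re-constructed input
error = kl_divergence(v, reconstruction)
print("hidden activations:", h)
print("KL divergence between input and reconstruction:", error)
```

With untrained weights the reconstruction is poor, so the divergence is well above zero; training would adjust the weights and biases to push it down.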

# DBN - MLP
# Deep Belief Network

# Stack of RBMs

# First RBM will train based on input and next 
# MLP - Multi Layer Perceptron
# Implication - a DBN needs only a small labelled training set
# Limited GPU.





# CNN - ImageNet
# Reading: Andrej Karpathy CS231n
# Yann LeCun

If there’s one deep net that has completely dominated the machine vision space in recent
years, it’s certainly the convolutional neural net, or CNN. These nets are so influential
that they’ve made Deep Learning one of the hottest topics in AI today. But they can be
tricky to understand, so let’s take a closer look and see how they work.
CNNs were pioneered by Yann LeCun of New York University, who also serves as the director
of Facebook's AI group. It is currently believed that Facebook uses a CNN for its facial recognition
software.
A convolutional net has been the go-to solution for machine vision projects in the last few
years. Early in 2015, after a series of breakthroughs by Microsoft, Google, and Baidu, a machine
was able to beat a human at an object recognition challenge for the first time in the history
of AI.
It’s hard to mention a CNN without touching on the ImageNet challenge. ImageNet is a project
that was inspired by the growing need for high-quality data in the image processing
space. Every year, the top Deep Learning teams in the world compete with each other to create
the best possible object recognition software. Going back to 2012 when Geoff Hinton’s team
took first place in the challenge, every single winner has used a convolutional net as their
model. This isn’t surprising, since the error rate of image detection tasks has dropped
significantly with CNNs, as seen in this image.
We’ll keep our discussion of CNNs high level, but if you’re inclined to learn about the
math, be sure to check out Andrej Karpathy’s amazing CS231n course notes on these nets.
There are many component layers to a CNN, and we will explain them one at a time. Let’s
start with an analogy that will help describe the first component, which is the “convolutional
layer”.
Imagine that we have a wall, which will represent a digital image. Also imagine that we have
a series of flashlights shining at the wall, creating a group of overlapping circles. The
purpose of these flashlights is to seek out a certain pattern in the image, like an edge
or a color contrast for example. Each flashlight looks for the exact same pattern as all the
others, but they all search in a different section of the image, defined by the fixed
region created by the circle of light. When combined together, the flashlights form what’s
called a filter. A filter is able to determine if the given pattern occurs in the image,
and in what regions. What you see in this example is an 8x6 grid of lights, which is
all considered to be one filter.
Now let’s take a look from the top. In practice, flashlights from multiple different filters
will all be shining at the same spots in parallel, simultaneously detecting a wide array of patterns.
In this example, we have four filters all shining at the wall, all looking for a different
pattern. So this particular convolutional layer is an 8x6x4, 3-dimensional grid of
these flashlights.
Now let’s connect the dots of our explanation:
- Why is it called a convolutional net? The net uses the technical operation of convolution
to search for a particular pattern. While the exact definition of convolution is beyond the
scope of this video, to keep things simple, just think of it as the process of filtering
through the image for a specific pattern. One important note is that the weights and biases
of this layer affect how this operation is performed: tweaking these numbers impacts the
effectiveness of the filtering process.
- Each flashlight represents a neuron in the CNN. Typically, neurons in a layer activate
or fire. On the other hand, in the convolutional layer, neurons perform this “convolution”
operation. We're going to draw a box around one set of flashlights to make things look
a bit more organized.
- Unlike the nets we've seen thus far where every neuron in a layer is connected to every
neuron in the adjacent layers, a CNN has the flashlight structure. Each neuron is only
connected to the input neurons it "shines" upon.
The neurons in a given filter share the same weight and bias parameters. This means that,
anywhere on the filter, a given neuron is connected to the same number of input neurons
and has the same weights and biases. This is what allows the filter to look for the
same pattern in different sections of the image. By arranging these neurons in the same
structure as the flashlight grid, we ensure that the entire image is scanned.
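The flashlight analogy can be sketched in a few lines of Python. This is only an illustration, not how any particular library does it: the function name `convolve2d`, the 8-row by 10-column image, and the 3x3 vertical-edge kernel are all assumptions picked so that the output comes out as a 6-row by 8-column grid, like the 8x6 grid of lights in the example. Every output position reuses the same kernel values, which is the weight sharing described above. Like most deep learning code, the sketch slides the kernel without flipping it.

```python
def convolve2d(image, kernel, bias=0.0):
    """Slide one shared kernel over every position where it fully fits."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # One "flashlight": it only sees the k_rows x k_cols patch at (r, c)
            total = bias
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Assumed toy image: 8 rows x 10 columns, dark left half and bright right half
image = [[0] * 5 + [1] * 5 for _ in range(8)]

# Assumed pattern to look for: a vertical dark-to-bright edge
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = convolve2d(image, vertical_edge)
print(len(feature_map), "rows x", len(feature_map[0]), "columns")
for row in feature_map:
    print(row)
```

The output is non-zero only in the columns where the 3x3 window straddles the edge, so the filter reports both that the pattern occurs and where it occurs.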
The next two layers that follow are RELU and pooling, both of which help to build up the
simple patterns discovered by the convolutional layer. Each node in the convolutional layer
is connected to a node that fires like in other nets. The activation used is called
RELU, or rectified linear unit. CNNs are trained using backpropagation, so the vanishing gradient
is once again a potential issue. Because the slope of RELU is exactly 1 for any positive input,
the gradient passes through active nodes without shrinking, so it is held more or less constant
at every layer of the net. So the RELU activation allows the net to be properly trained, without
harmful slowdowns in the crucial early layers.
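A small Python sketch shows why the definition of RELU matters here. It compares how much gradient survives after passing through the activation at 10 layers in a row. The depth of 10, the pre-activation value of 0.5 at every layer, and the helper names are assumptions for illustration, and the weights are deliberately left out so that only the effect of the activation function is visible.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Slope is 1 for positive inputs and 0 otherwise
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25


depth = 10             # assumed number of layers
pre_activation = 0.5   # assumed input to the activation at each layer

relu_chain = relu_grad(pre_activation) ** depth
sigmoid_chain = sigmoid_grad(pre_activation) ** depth

print("gradient factor through 10 RELU layers:   ", relu_chain)
print("gradient factor through 10 sigmoid layers:", sigmoid_chain)
```

The RELU factor stays at 1, while the sigmoid factor shrinks to less than one millionth, which is the slowdown in the early layers that the text refers to.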
The pooling layer is used for dimensionality reduction. CNNs tile multiple instances of
convolutional layers and RELU layers together in a sequence, in order to build more and
more complex patterns. The problem with this is that the number of possible patterns becomes
exceedingly large. By introducing pooling layers, we ensure that the net focuses on
only the most relevant patterns discovered by convolution and RELU. This helps limit
both the memory and processing requirements for running a CNN.
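As a rough sketch of what a pooling layer does, here is a 2x2 max pooling step in Python. The video only says "pooling", so the choice of max pooling, the window size of 2, and the name `max_pool2d` are assumptions for the example.

```python
def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Assumed 4x4 output of a convolution + RELU stage
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 7]]
```

The 16 values shrink to 4, which is where the savings in memory and processing come from.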
Together, these three layers can discover a host of complex patterns, but the net will
have no understanding of what these patterns mean. So a fully connected layer is attached
to the end of the net in order to equip the net with the ability to classify data samples.
Let’s recap the major components of a CNN. A typical deep CNN has three sets of layers
– a convolutional layer, RELU, and pooling layers – all of which are repeated several
times. These layers are followed by a few fully connected layers in order to support
classification. Since CNNs are such deep nets, they most likely need to be trained using
server resources with GPUs.
Despite the power of CNNs, these nets have one drawback. Since they are a supervised
learning method, they require a large set of labelled data for training, which can be
challenging to obtain in a real-world application. In the next video, we’ll shift our attention
to another important deep learning model – the Recurrent Net.


# RNN - Simple; brainchild of Juergen Schmidhuber
# Speech Recognition and Self 

# RNN is not a feedforward network
# Will receive a sequence of inputs and provide a sequence of outputs
# Document Classification, Self Driving car, Statistical Forecasting,

# Vanishing Gradient is worse in RNN,
# Avoid it using Gating - LSTM, GRU
# Gradient Clipping

# RNN is for time-series data
# Classification / Regression / Forecasting.

What do you do if the patterns in your data change with time? In that case, your best
bet is to use a recurrent neural network. This deep learning model has a simple structure
with a built-in feedback loop, allowing it to act as a forecasting engine. Let’s take
a closer look.
Recurrent neural networks, or RNNs, have a long history, but their recent popularity
is mostly due to the works of Juergen Schmidhuber, Sepp Hochreiter, and Alex Graves. Their applications
are extremely versatile – ranging from speech recognition to driverless cars.
All the nets we’ve seen up to this point have been feedforward neural networks. In
a feedforward neural network, signals flow in only one direction from input to output,
one layer at a time. In a recurrent net, the output of a layer is added to the next input
and fed back into the same layer, which is typically the only layer in the entire network.
You can think of this process as a passage through time – shown here are 4 such time
steps. At t = 1, the net takes the output of time t = 0 and sends it back into the net
along with the next input. The net repeats this for t = 2, t = 3, and so on.
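The feedback loop can be sketched with a single recurrent unit in Python. This is a toy under stated assumptions: one scalar input and one scalar output per time step, a tanh activation, and hand-picked values for `w_x`, `w_h` and `b`; none of these come from the video.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """Combine the current input with the previous output of the same layer."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    h = h0
    outputs = []
    for x_t in inputs:  # one loop iteration per time step
        h = rnn_step(x_t, h, w_x, w_h, b)
        outputs.append(h)
    return outputs


# Four time steps: a single pulse followed by silence
outputs = run_rnn([1.0, 0.0, 0.0, 0.0])
for t, h in enumerate(outputs):
    print("t =", t, "output =", round(h, 4))
```

Even though the input is zero after the first step, the later outputs are still non-zero, because each step receives the previous output through the feedback connection.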
Unlike feedforward nets, a recurrent net can receive a sequence of values as input, and
it can also produce a sequence of values as output. The ability to operate with sequences
opens up these nets to a wide variety of applications. Here are a few examples. When the input is
singular and the output is a sequence, a potential application is image captioning. A sequence
of inputs with a single output can be used for document classification. When both the
input and output are sequences, these nets can classify videos frame by frame. If a time
delay is introduced, the net can statistically forecast the demand in supply chain planning.
Like we’ve seen with previous deep learning models, by stacking RNNs on top of each other,
you can form a net capable of more complex output than a single RNN working alone.
Typically, an RNN is an extremely difficult net to train. Since these nets use backpropagation,
we once again run into the problem of the vanishing gradient. Unfortunately, the vanishing
gradient is exponentially worse for an RNN. The reason for this is that each time step
is the equivalent of an entire layer in a feedforward network. So training an RNN for
100 time steps is like training a 100-layer feedforward net – this leads to exponentially
small gradients and a decay of information through time.
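The arithmetic behind this is easy to check. In the sketch below, the gradient is multiplied by the same factor at every time step on the way back; the factor of 0.9 and the name `gradient_after` are assumptions for illustration, since the real factor depends on the weights and activations at each step.

```python
def gradient_after(steps, per_step_factor):
    """Multiply a starting gradient of 1.0 by the same factor at every step."""
    grad = 1.0
    for _ in range(steps):
        grad *= per_step_factor
    return grad


for steps in (1, 10, 100):
    print(steps, "time steps ->", gradient_after(steps, 0.9))
```

With a factor only slightly below 1, the gradient is still about a third of its size after 10 steps, but after 100 steps it has dropped to a few hundred-thousandths.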
There are several ways to address this problem - the most popular of which is gating. Gating
is a technique that helps the net decide when to forget the current input, and when to remember
it for future time steps. The most popular gating types today are GRU and LSTM. Besides
gating, there are also a few other techniques like gradient clipping, steeper gates, and
better optimizers.
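Of the techniques listed, gradient clipping is the simplest to show in code. The sketch below rescales a gradient vector whenever its length exceeds a threshold; clipping by the overall norm, the threshold value, and the name `clip_by_norm` are assumptions for the example. Clipping is commonly used as a guard against gradients that grow very large during training.

```python
import math


def clip_by_norm(gradients, max_norm):
    """Rescale the gradient vector so its length never exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= max_norm or norm == 0.0:
        return list(gradients)  # already small enough, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in gradients]


clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)  # original length is 5
print(clipped)
```

The direction of the gradient is preserved; only its length is reduced to the chosen limit.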
When it comes to training a recurrent net, GPUs are an obvious choice over an ordinary
CPU. This was validated by a research team at Indico, which uses these nets on text processing
tasks like sentiment analysis and helpfulness extraction. The team found that GPUs were
able to train the nets 250 times faster! That’s the difference between one day of training,
and over eight months!
So under what circumstances would you use a recurrent net over a feedforward net? We
know that a feedforward net outputs one value, which in many cases was a class or a prediction.
A recurrent net is suited for time series data, where an output can be the next value
in a sequence, or the next several values. So the answer depends on whether the application
calls for classification, regression, or forecasting.
In the next video, we’ll take a look at a family of deep learning models known as
the autoencoders.

# Autoencoders - Help with dimensionality reduction, similar to PCA
# Denoising and contractive

It will perform feature extraction


# Shallow - Encode and decode - Backpropagation with Loss
# Deep is useful for MNIST Image


    There are times when it’s extremely useful to figure out the underlying structure of
    a data set. Having access to the most important data features gives you a lot of flexibility
    when you start applying labels. Autoencoders are an important family of neural networks
    that are well-suited for this task. Let’s take a look.
    In a previous video we looked at the Restricted Boltzmann Machine, which is a very popular
    example of an autoencoder. But there are other types of autoencoders like denoising and contractive,
    just to name a few. Just like an RBM, an autoencoder is a neural net that takes a set of typically
    unlabelled inputs, and after encoding them, tries to reconstruct them as accurately as
    possible. As a result of this, the net must decide which of the data features are the
    most important, essentially acting as a feature extraction engine.
    Autoencoders are typically very shallow, and are usually comprised of an input layer, an
    output layer and a hidden layer. An RBM is an example of an autoencoder with only two
    layers. Here is a forward pass that ends with a reconstruction of the input. There are two
    steps - the encoding and the decoding. Typically, the same weights that are used to encode a
    feature in the hidden layer are used to reconstruct an image in the output layer.
    Autoencoders are trained with backpropagation, using a metric called “loss”. As opposed
    to “cost”, loss measures the amount of information that was lost when the net tried
    to reconstruct the input. A net with a small loss value will produce reconstructions that
    look very similar to the originals.
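Here is a minimal Python sketch of the encode and decode steps with shared weights, followed by a loss calculation. Several things are assumptions for the sake of the example: the hand-picked weights, the sigmoid activation, the use of mean squared error as the loss, and the function names. No training is performed; the sketch only shows a single pass.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode(x, weights, hidden_bias):
    """Input -> hidden code. weights[j][i] links input i to hidden unit j."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(weights, hidden_bias)]


def decode(code, weights, output_bias):
    """Hidden code -> reconstruction, reusing the encoder weights."""
    return [sigmoid(sum(weights[j][i] * code[j] for j in range(len(code)))
                    + output_bias[i])
            for i in range(len(output_bias))]


def reconstruction_loss(x, x_hat):
    """Mean squared difference between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


x = [1.0, 0.0, 1.0, 0.0]
weights = [[2.0, -2.0, 2.0, -2.0],   # assumed values, not learned
           [-1.0, 1.0, -1.0, 1.0]]
hidden_bias = [0.0, 0.0]
output_bias = [0.0, 0.0, 0.0, 0.0]

code = encode(x, weights, hidden_bias)       # 4 inputs squeezed into 2 numbers
x_hat = decode(code, weights, output_bias)   # reconstruction of the 4 inputs
loss = reconstruction_loss(x, x_hat)
print("code:", code)
print("reconstruction:", x_hat)
print("loss:", loss)
```

Backpropagation would adjust the weights to make this loss smaller across the whole data set.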
    Not all of these nets are shallow. In fact, deep autoencoders are extremely useful tools
    for dimensionality reduction. Consider an image containing a 28x28 grid of pixels. A
    neural net would need to process 784 input values (28 x 28) just for one image – doing
    this across millions of images would waste significant amounts of memory and processing
    time. A deep autoencoder could encode this image into an impressive 30 numbers, and still
    maintain information about the key image features. When decoding the output, the net acts like
    a two-way translator. In this example, a well-trained net could translate these 30 encoded numbers
    back into a reconstruction that looks similar to the original image. Certain types of nets
    also introduce random noise to the encoding-decoding process, which has been shown to improve the
    robustness of the resulting patterns.
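The numbers in this example are quick to verify in Python. The variable names are only for illustration.

```python
height, width = 28, 28
n_inputs = height * width          # values per image before encoding
code_size = 30                     # values per image after encoding
compression = n_inputs / code_size

print("inputs per image:", n_inputs)
print("compression factor:", round(compression, 1))
```

So each image is represented with roughly 26 times fewer numbers after encoding.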
    Deep autoencoders perform better at dimensionality reduction than their predecessor, principal
    component analysis, or PCA. Below is a comparison of two letter codes for news stories of different
    topics – generated by both a deep autoencoder and a PCA. Labels were added to the picture
    for illustrative purposes.
    In the next video, we’ll take a look at Recursive Neural Tensor Nets, or RNTNs.


# Recursive Neural Tensor Nets (RNTN) - by Richard Socher
# Sentiment Analysis

# Structure
# Root and Leaf (Binary Tree)

# Output Class and Score via Recursion

# Usage: NLP, Sentiment Analysis, Image Tagging



Sometimes it’s useful to discover the hierarchical structure of a set of data, such as the parse
trees of a group of sentences. In these cases, Recursive Neural Tensor Networks, or RNTNs,
are a better fit than feedforward or recurrent nets. Let’s take a closer look and see why.
RNTNs were conceived by Richard Socher of MetaMind as part of his PhD thesis at Stanford.
The purpose of these nets was to analyze data that had a hierarchical structure. Originally,
they were designed for sentiment analysis, where the sentiment of a sentence depends
not just on its component words, but on the order in which they’re syntactically grouped.
So let’s take a look at the structure of an RNTN. An RNTN has three basic components
– a parent group, which we’ll call the root, and the child groups, which we’ll
call the leaves. Each group is simply a collection of neurons, where the number of neurons depends
on the complexity of the input data. As you can see, the root is connected to both leaves,
but the leaves are not connected to each other. Technically speaking, the three components
form what’s called a binary tree. In general, the leaf groups receive input, and the root
group uses a classifier to fire out a class and a score. We’ll get to the significance
of these two values in a moment. An RNTN’s structure may seem simple, but just like a
recurrent net, the complexity comes from the way in which data moves throughout the network.
In the case of an RNTN, this process is recursive.
To see how this recursion works, let’s take a look at an example. Let’s feed an English
sentence into the net, and receive the sentence’s parse tree as output. At step one, we feed
the first two words into leaf groups one and two, respectively. As a practical note, the
leaf groups do not actually receive the words per se, but rather vector representations
of the words. A vector is just an ordered set of numbers, and it’s been shown that
these nets work best with very specific vector representations – particularly, good results
are achieved when the numbers in the two vectors encode the similarities between the two words,
when compared to other words in the vocabulary. The exact details of this process are beyond
the scope of this video.
The two vectors move across the net to the root, which processes them and fires out two
values – the class and the score. The score represents the quality of the current parse,
and the class represents an encoding of a structure in the current parse. This is the
point where the net starts the recursion. At the next step, the first leaf group now
receives the current parse, rather than a single word. The second leaf group receives
the next word in the sentence. At this point, the root group would output the score of a
parse that is three words long. This continues until all the inputs are used up, and the
net has a parse tree with every single word included.
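The simplified left-to-right recursion can be sketched in Python as follows. This is a heavily reduced stand-in, not the actual RNTN: the root is modelled as a single tanh layer over the two leaf vectors (the tensor part and the class output are left out), and the word vectors, the weights `W`, `b`, `u`, and the function names are all made-up values for illustration.

```python
import math


def compose(left, right, W, b):
    """Root group: turn two child vectors into one parent vector."""
    children = left + right  # concatenate the two leaf inputs
    return [math.tanh(sum(w * c for w, c in zip(row, children)) + bias)
            for row, bias in zip(W, b)]


def score(parent, u):
    """Quality of the current parse, as a single number."""
    return sum(ui * pi for ui, pi in zip(u, parent))


def parse_left_to_right(word_vectors, W, b, u):
    current = word_vectors[0]
    tree = 0  # leaves are referred to by word position
    scores = []
    for idx in range(1, len(word_vectors)):
        # Leaf one gets the parse so far, leaf two gets the next word
        current = compose(current, word_vectors[idx], W, b)
        tree = (tree, idx)
        scores.append(score(current, u))
    return tree, current, scores


# Assumed 2-dimensional word vectors for a three-word sentence
word_vectors = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
W = [[0.5, -0.2, 0.1, 0.3],
     [-0.4, 0.6, 0.2, -0.1]]
b = [0.0, 0.0]
u = [1.0, -1.0]

tree, vector, scores = parse_left_to_right(word_vectors, W, b, u)
print("parse tree:", tree)
print("score after each step:", scores)
```

Because the parent vector has the same size as a word vector, it can be fed straight back into the first leaf group, which is what makes the recursion possible.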
This simplified example illustrates the main idea behind an RNTN; but in a practical application,
we typically encounter more complex recursive processes. Rather than use the next word in
the sentence for the second leaf group, an RNTN would try all of the next words, and
eventually, vectors that represent entire sub-parses. By doing this at every step of
the recursive process, the net is able to analyze – and score – every possible syntactic
parse.
Shown here are three possible parse trees for the same sentence. To pick the best one,
the net relies on the score value produced by the root group. By using this score to
select the best substructure at each step of the recursive process, the net will produce
the highest-scoring parse as its final output.
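One simple way to picture "select the best substructure at each step" is a greedy merge: score every pair of neighbouring groups, join the highest-scoring pair, and repeat until one tree is left. The Python sketch below does exactly that. It is an illustration under assumptions, not the method from the video: the root is a plain tanh layer without the tensor part, the weights and word vectors are made up, and the greedy strategy itself is a simplification.

```python
import math


def compose(left, right, W, b):
    """Root group: turn two child vectors into one parent vector."""
    children = left + right
    return [math.tanh(sum(w * c for w, c in zip(row, children)) + bias)
            for row, bias in zip(W, b)]


def score(parent, u):
    return sum(ui * pi for ui, pi in zip(u, parent))


def greedy_parse(word_vectors, W, b, u):
    # Each node is (tree, vector); leaves are referred to by word position
    nodes = [(i, vec) for i, vec in enumerate(word_vectors)]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes) - 1):
            parent = compose(nodes[i][1], nodes[i + 1][1], W, b)
            s = score(parent, u)
            if best is None or s > best[0]:
                best = (s, i, parent)
        _, i, parent = best
        # Replace the winning pair of neighbours with their parent
        nodes[i:i + 2] = [((nodes[i][0], nodes[i + 1][0]), parent)]
    return nodes[0][0]


def leaves(tree):
    """List the word positions in a tree from left to right."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


word_vectors = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.1]]
W = [[0.5, -0.2, 0.1, 0.3],
     [-0.4, 0.6, 0.2, -0.1]]
b = [0.0, 0.0]
u = [1.0, -1.0]

tree = greedy_parse(word_vectors, W, b, u)
print("parse tree:", tree)
print("word order:", leaves(tree))
```

Whatever shape the tree takes, the words stay in their original order, because only neighbouring groups are ever merged.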
Once the net has the final structure, it backtracks through the parse tree in order to figure
out the right grammatical label for each part of the sentence. Here, it does that one first
and labels it as a noun phrase. Then it works on this, and you get a verb phrase. It then
works its way up, and when it reaches the top, it adds a special label that signifies
the beginning of the parse structure.
RNTNs are trained with backpropagation by comparing the predicted sentence structure
with the proper sentence structure obtained from a set of labelled training data. Once
trained, the net will give a higher score to structures that are more similar to the
parse trees that it saw in training.
RNTNs are used in natural language processing for both syntactic parsing and sentiment analysis.
They are also used to parse images, typically when an image contains a scene with many different
components.
In the next video, we’ll take a closer look at the many applications of Deep Learning.

# Use cases of Deep Learning
Machine Vision
Image Classification / Auto tagging
Object Recognition
Speech Recognition 
Fact Extraction
Text to other languages
Sentiment Analysis - MetaMind
6K factors for cancer patient survival
Drug Discovery
Radiology
Trading - Deep Learning
Di
# https://www.clarifai.com/



    There are so many important use cases for Deep Learning, that it’s impossible to produce
    an exhaustive list. Deep Learning is just getting started, and new applications pop
    up all the time. Let’s take a look at some of the biggest ones today.
    At this point, it should be no surprise that machine vision is one of the biggest applications
    of deep learning. Image search systems use deep learning for image classification and
    automatic tagging, which allows the images to be accessible through a standard search
    query. Companies like Facebook use deep nets to scan pictures for faces at many different
    angles, and then label the face with the proper name. Deep nets are also used to recognize
    objects within images which allow for images to be searchable based on the objects within
    them. Let’s look at an example application – Clarifai.
    Let’s load Clarifai in a browser. Here is the URL, which you'll also find in the video
    description below. Clarifai is an app that uses a convolutional net to recognize things
    and concepts in a digital image. Let’s take a look. Right in the middle of the page you
    have the demo button. Let’s click that.
    It takes you to part of the webpage where you have the demo. You have two choices - a)
    either choose a URL where the image is located, or b) load the digital image yourself if
    you have it on file. I'm going with choice b) - loading an image; I am in the right folder
    now and am going to select the first one.
    When I select an image, it wants me to go through a verification process. In this case,
    it wants me to select all the squares that have a gift box, so I'm gonna go through and
    do that. This changes every time btw - you can have different tests.
    It’s come back and you see the predictions. Firstly, it says there's no person; it expected
    to find a person but there wasn’t one, so it identified that as a pattern for this image,
    which is cool! The other predictions are "tableware", "indoors", "party", "fashion" etc. So this
    is the list of tags it’s associated with this image.
    If you scroll down, it shows a list of example images and the items in them, like the first
    one with a coffee and croissant, which I think is cool. If you go to the one with the concert,
    it’s tagged it pretty accurately with "concert", "band", "singer" etc. You also get similar
    images.
    I'm going to pick another one, this time of a county fair. Again it goes through the same
    verification process - this time it wants me to pick images with cars. Ok - it came
    back and gave me some tags. It recognized a Ferris wheel, and though the carousel is only
    partly visible to the left, it still picked it out! It also picked out the word "fun".
    Also, the images it suggested as similar are accurate - they are virtually identical to
    the one I picked. Further, it presents the same example images as the last time.
    So there you have it, a demo of object recognition using Clarifai.
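
To make the tagging step in the demo a little more concrete, here is a minimal sketch of how per-concept scores could be turned into a ranked tag list. It is not Clarifai's actual code or API: the function names, the raw scores, and the 0.5 threshold are all assumptions made up for illustration. It only shows the common multi-label pattern of squashing each score with a sigmoid and keeping the concepts that clear a threshold.

```python
import math


def sigmoid(x):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tags_from_scores(scores, threshold=0.5):
    """Return (concept, probability) pairs at or above the threshold, best first.

    `scores` maps a concept name to a raw, unbounded score, standing in for
    the output layer of a convolutional net (hypothetical values here).
    """
    probs = {concept: sigmoid(score) for concept, score in scores.items()}
    kept = [(concept, p) for concept, p in probs.items() if p >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up scores for illustration only; a real net would produce these.
example_scores = {
    "no person": 3.2,
    "tableware": 2.1,
    "indoors": 1.4,
    "party": 0.6,
    "concert": -2.5,
}

for concept, prob in tags_from_scores(example_scores):
    print(f"{concept}: {prob:.2f}")
```

Each concept is scored independently here, which is why an image can carry several tags at once, such as "indoors" and "party".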
    Other uses of deep learning include image and video parsing. Video recognition systems
    are important tools for driverless cars, remote robots, and theft detection. And while not
    exactly a part of machine vision, the speech recognition field got a powerful boost from
    the introduction of deep nets.
    Deep Net parsers can be used to extract relations and facts from text, as well as automatically
    translate text to other languages. These nets are extremely useful in sentiment analysis
    applications, and can be used as part of movie ratings and new product intros. Here is a
    quick demo of Metamind - an RNTN that performs sentiment analysis.
    Let’s load Metamind in a browser. Here is the URL, which you'll also find in the video
    description below. Metamind is an app by Richard Socher that uses an RNTN for Twitter sentiment,
    amongst other things.
    You can search by user name, keyword, or hashtag. I'm going to search by hashtag.
    My first one's #coffee.
    When you click "Classify", it first downloads the tweets, which takes a little time. It then
    comes back and shows you two things. On the left, it shows you a pie chart of the
    3 different sentiments - positive, negative and neutral. For most searches, you'll get
    lots of neutral comments, which is natural, but here you have more positive comments than
    negative - 206 vs 41, which I think is good :-)
    On the right, it also lists some example comments classified as positive, neutral and negative.
    Let’s search a different one - #holidays. Not surprisingly, you find a ton more positive
    comments about the holidays. In this case, if you look at the example, even the negative
    ones are light-hearted.
    So there you have it, a demo of Twitter sentiment analysis using Metamind.
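
The pie chart in the demo is just a tally of per-tweet labels. The sketch below shows that aggregation step only, assuming each tweet has already been classified; it does not implement the RNTN or any Metamind API. The function name is hypothetical, and while the 206 positive and 41 negative counts come from the demo, the neutral count of 300 is a placeholder, since the demo does not state it.

```python
from collections import Counter

SENTIMENTS = ("positive", "neutral", "negative")


def sentiment_shares(labels):
    """Return the fraction of labels falling in each sentiment class."""
    counts = Counter(labels)
    total = sum(counts[s] for s in SENTIMENTS)
    if total == 0:
        # No classified tweets: report zero for every class.
        return {s: 0.0 for s in SENTIMENTS}
    return {s: counts[s] / total for s in SENTIMENTS}


# 206 and 41 are the counts quoted in the demo; 300 is an assumed placeholder.
labels = ["positive"] * 206 + ["negative"] * 41 + ["neutral"] * 300

shares = sentiment_shares(labels)
for sentiment in SENTIMENTS:
    print(f"{sentiment}: {shares[sentiment]:.1%}")
```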
    Even recurrent nets have found uses in character-level text processing and document classification.
    Deep nets are now beginning to thrive in the medical field. A Stanford team used deep learning
    to identify over 6000 factors that help predict the chances of a cancer patient surviving.
    Researchers from IDSIA in Switzerland created a deep net model to identify invasive breast
    cancer cells. Beyond this, deep nets are even used for drug discovery. In 2012, Merck hosted
    the Molecular Activity challenge on Kaggle in order to predict the biological activities
    of different drug molecules based solely on chemical structure. As a brief mention, this
    challenge was won by George Dahl of the University of Toronto, who led a team by the name of
    ‘gggg’. But one crucial application of deep nets is radiology. Convolutional nets
    can help detect anomalies like tumors and cancers through the use of data from MRI,
    fMRI, EKG, and CT scans.
    In the field of finance, deep nets can help make buy and sell predictions based on market
    data streams, portfolio allocations, and risk profiles. Depending on how they’re trained,
    they’re useful for both short-term trading and long-term investing. In digital advertising,
    deep nets are used to segment users by purchase history in order to offer relevant and personalized
    ads in real time. Based on historical ad price data and other factors, deep nets can learn
    to optimally bid for ad space on a given web page. In fraud detection, deep nets use multiple
    data sources to flag a transaction as fraudulent in real time. They can also determine which
    products and markets are typically the most susceptible to fraud. In marketing and sales,
    deep nets are used to gather and analyze customer information, in order to determine the best
    upselling strategies. In agriculture, deep nets use satellite feeds and sensor data to
    identify problematic environmental conditions.
    Which of these deep learning applications appeals to you the most? Please comment and
    share your thoughts.
    In the next video, we’ll take a look at the main ideas behind a Deep Learning Platform.



# Deep Learning Platforms and Libraries

In [5]:
# Software platform, infrastructure
# Platform - no code
# Library - need to use libraries + flexibility
# Ersatz Labs, H2O.ai, Dato - platforms


In [6]:
# H2O.ai - SriSatish Ambati
# Dato GraphLab

In [None]:
# Deep Learning Libraries
# Theano
# Well-tested code



# Caffe



# TensorFlow