
## Clustering


Clustering is a machine learning technique that groups data points. In theory, data points in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. (Source: https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68)


#### Basic Algorithm for K-Means
- Step 1: Randomly pick K points and place the K centroids there.
- Step 2: Assign every data point to a centroid by distance. Each point is assigned to the centroid closest to it.
- Step 3: Average all the points belonging to each centroid to find the middle of that cluster (its center of mass). Move the corresponding centroid to that position.
- Step 4: Reassign every point once again to its closest centroid.
- Step 5: Repeat steps 3-4 until no point changes which centroid it belongs to.
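
The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes Euclidean distance, it assumes Step 1 means choosing K of the data points as the starting centroids, and the names `k_means`, `closest`, `squared_distance` and the sample points are hypothetical, not part of the original notebook.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest(point, centroids):
    """Index of the centroid nearest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    # Step 1: pick K of the data points as the starting centroids (assumption).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    # Step 2: assign every point to its closest centroid.
    labels = [closest(p, centroids) for p in points]
    for _ in range(max_iters):
        # Step 3: move each centroid to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
        # Step 4: reassign every point to its closest centroid.
        new_labels = [closest(p, centroids) for p in points]
        # Step 5: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Two visibly separated groups of made-up 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

The `max_iters` cap is a safety net added for the sketch; the algorithm as described simply loops until the assignments stop changing.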


### Hidden Markov Models

A hidden Markov model works with probabilities to predict future events or states. In this section we will learn how to create a hidden Markov model that can predict the weather.

Components of a Markov model:

**States:** Each Markov model has a finite set of states. These states could be something like "warm" and "cold", or "high" and "low", or even "red", "green" and "blue". These states are "hidden" within the model, which means we do not directly observe them.

**Observations:** Each state has a particular outcome or observation associated with it, based on a probability distribution. An example of this is the following: *On a hot day Tim has an 80% chance of being happy and a 20% chance of being sad.*

**Transitions:** Each state has a probability defining the likelihood of transitioning to a different state. An example is the following: *a cold day has a 30% chance of being followed by a hot day and a 70% chance of being followed by another cold day.*

To create a hidden Markov model we need:
- States
- Observation Distribution
- Transition Distribution
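
To make the three ingredients concrete, here is a small plain-Python sketch that stores them as dictionaries and samples one week of hidden states and observations. The hot-day mood probabilities and the cold-day transition probabilities come from the examples above; the initial distribution and the hot-day transition row match the weather model built later in this section; the cold-day mood probabilities are an illustrative assumption. The helper names `draw` and `sample_sequence` are hypothetical.

```python
import random

# Initial distribution over the hidden states.
initial = {"cold": 0.8, "hot": 0.2}

# Transition distribution: probability of tomorrow's state given today's.
transition = {
    "cold": {"cold": 0.7, "hot": 0.3},
    "hot": {"cold": 0.2, "hot": 0.8},
}

# Observation distribution: Tim's mood given the hidden state.
# The "hot" row is from the text; the "cold" row is an assumed example.
observation = {
    "cold": {"happy": 0.4, "sad": 0.6},
    "hot": {"happy": 0.8, "sad": 0.2},
}


def draw(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_sequence(num_steps, rng):
    """Return (hidden_states, observations) for `num_steps` days."""
    hidden, observed = [], []
    state = draw(initial, rng)
    for _ in range(num_steps):
        hidden.append(state)
        observed.append(draw(observation[state], rng))
        state = draw(transition[state], rng)
    return hidden, observed


rng = random.Random(0)
hidden, observed = sample_sequence(7, rng)
print(hidden)    # the weather, which an observer would not see
print(observed)  # Tim's mood, which is all an observer sees
```

Only `observed` would be available in practice; the model's job is to reason about `hidden` from it.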


In [2]:
# This magic is only needed in a Colab notebook. Keep the comment on its own
# line: text after the version on the same line is passed to the magic as well.
%tensorflow_version 2.x


In [1]:
import tensorflow_probability as tfp  # TensorFlow Probability is a separate package from core TensorFlow
import tensorflow as tf

A Weather Model Using a Hidden Markov Model

In [3]:
tfd = tfp.distributions

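# Represent a cold day with 0 and a hot day with 1.
# The intent is for the first day of a sequence to have a 0.8 chance of being cold.
# Caution: the first positional argument of tfd.Categorical is `logits`, not `probs`,
# so the call below actually gives roughly a 0.65 chance of cold (softmax of [0.8, 0.2]).
# Use tfd.Categorical(probs=[0.8, 0.2]) to get exactly 0.8. The outputs recorded
# further down were produced with the call as written here.
initial_distributions = tfd.Categorical([0.8, 0.2]) # cold, hot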


# Suppose a cold day has a 30% chance of being followed by a hot day
# and a hot day has a 20% chance of being followed by a cold day.
# We can model this as:
transition_distribution = tfd.Categorical(probs=[[0.7, 0.3], 
                                                 [0.2, 0.8]])

# Suppose additionally that on each day the temperature is
# normally distributed with mean and standard deviation 0 and 5 on
# a cold day and mean and standard deviation 15 and 10 on a hot day.
# We can model this with:
observation_distribution = tfd.Normal(loc = [0., 15.], scale=[5., 10.])

In [4]:
# We can combine these distributions into a single week-long
# hidden Markov model with:
model = tfd.HiddenMarkovModel(
    initial_distribution=initial_distributions,
    transition_distribution=transition_distribution,
    observation_distribution=observation_distribution,
    num_steps=7
)

In [5]:
# The expected temperatures for each day are given by:

model.mean()  # shape [7], elements approach 9.0


<tf.Tensor: shape=(7,), dtype=float32, numpy=
array([5.315155, 7.157577, 8.078789, 8.539393, 8.769697, 8.884848,
       8.942423], dtype=float32)>
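
These expected temperatures can be checked by hand. The sketch below, in plain Python, pushes the state distribution through the transition matrix one day at a time and takes the probability-weighted mean of the two Normal means. It assumes the initial `[0.8, 0.2]` is interpreted as logits (which is what a positional argument to `tfd.Categorical` means); the variable names are illustrative.

```python
import math

# Initial state distribution: softmax of the logits [0.8, 0.2].
logits = [0.8, 0.2]
exps = [math.exp(x) for x in logits]
state_probs = [e / sum(exps) for e in exps]  # [P(cold), P(hot)]

transition = [[0.7, 0.3],   # from cold: to cold, to hot
              [0.2, 0.8]]   # from hot:  to cold, to hot
state_means = [0.0, 15.0]   # mean temperature on a cold day, on a hot day

expected = []
for _ in range(7):
    # Expected temperature today = sum over states of P(state) * mean(state).
    expected.append(sum(p * m for p, m in zip(state_probs, state_means)))
    # Advance the state distribution by one day.
    state_probs = [
        sum(state_probs[i] * transition[i][j] for i in range(2))
        for j in range(2)
    ]

print(expected)
```

The values approach 9.0 because the transition matrix has the stationary distribution 0.4 cold and 0.6 hot, and 0.6 × 15 = 9. The limit does not depend on the initial distribution; only the first few days do.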

In [6]:

# The log probability density of observing a temperature of 0 on all seven days is:

model.log_prob(tf.zeros(shape=[7]))

<tf.Tensor: shape=(), dtype=float32, numpy=-20.053804>
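
For intuition about where this number comes from, the log probability of an observation sequence under a hidden Markov model can be computed with the forward algorithm. The following plain-Python sketch is a hand-rolled version under stated assumptions, not TensorFlow Probability's implementation: it again treats the initial `[0.8, 0.2]` as logits, and the function names `normal_pdf` and `hmm_log_prob` are hypothetical.

```python
import math


def normal_pdf(x, loc, scale):
    """Density of a Normal(loc, scale) distribution at x."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def hmm_log_prob(observations, initial, transition, locs, scales):
    """Forward algorithm with per-step normalisation to avoid underflow."""
    n = len(initial)
    alpha = list(initial)  # state distribution before seeing the first day
    log_total = 0.0
    for t, x in enumerate(observations):
        if t > 0:
            # Predict: move the state distribution forward one day.
            alpha = [sum(alpha[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Update: weight each state by how well it explains the observation.
        alpha = [alpha[j] * normal_pdf(x, locs[j], scales[j]) for j in range(n)]
        norm = sum(alpha)
        log_total += math.log(norm)
        alpha = [a / norm for a in alpha]
    return log_total


# Initial distribution as softmax of the logits [0.8, 0.2].
exps = [math.exp(0.8), math.exp(0.2)]
initial = [e / sum(exps) for e in exps]

result = hmm_log_prob(
    observations=[0.0] * 7,
    initial=initial,
    transition=[[0.7, 0.3], [0.2, 0.8]],
    locs=[0.0, 15.0],
    scales=[5.0, 10.0],
)
print(result)
```

Under these assumptions the result should agree with the value reported by `model.log_prob` above to within floating-point precision.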