Developing a classifier:
========================
* Review of image as array
* Computing statistics on selected regions
* Filtering data via Boolean masking
* Using statistics to build an NDVI classifier
* Creating classifiers for water, building, clouds



In [None]:
#read in data
import landsat as lf
rgbpath = "data/landsat_RGBN.txt" 
rgbn = lf.landsat_read(rgbpath)

In [None]:
#find the clearest image
%matplotlib notebook
rgb_array = lf.rgb_display(rgbn[...,:3])

How do we unpack data?
============================================
![axis](figs/axis.png)

In [None]:
#rgb_array has three axes: [row, col, channel], i.e. axis = [0, 1, 2]
#unpack red, green, blue
red = rgb_array[...,0]
green = rgb_array[...,1]
blue = rgb_array[...,2]
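
The `...` (ellipsis) in the indexing above stands for "every index along all the remaining axes". As a minimal sketch, assuming a small made-up array in place of the real image (the names `demo` and `demo_red` are hypothetical and not part of the notebook), the unpacking works like this:

```python
import numpy as np

# Hypothetical stand-in for rgb_array: 2 rows x 3 columns x 3 channels
demo = np.arange(18).reshape(2, 3, 3)

# demo[..., 0] is shorthand for demo[:, :, 0]:
# keep every row and every column, take channel 0 only
demo_red = demo[..., 0]

print(demo_red.shape)                            # (2, 3)
print(np.array_equal(demo_red, demo[:, :, 0]))   # True
```

Selecting a single channel drops the last axis, so each unpacked band is a 2D [row, col] array that `imshow` can display directly.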

In [None]:
#plot each of the individual channels and compare to image
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2,2, sharex=True, sharey=True)
rgb = axes[0,0].imshow(rgb_array)
r = axes[0,1].imshow(red, cmap="Reds", vmin=0, vmax=1)
fig.colorbar(r, ax=axes[0,1])
g = axes[1,0].imshow(green, cmap="Greens", vmin=0, vmax=1)
fig.colorbar(g, ax=axes[1,0])
b = axes[1,1].imshow(blue, cmap="Blues", vmin=0, vmax=1)
fig.colorbar(b, ax=axes[1,1])
for ax in axes.flatten():
    ax.set_adjustable('box')  # 'box-forced' is no longer accepted by recent Matplotlib
#to do: explore the following images and see if you can find links
#between the raw data and each of the color bands

How do we look at the distribution of the data?
===============================================


In [None]:
#plot histogram of red channel
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.hist(red.flatten(), color="red")
ax.set_ylabel("# of observations")
ax.set_xlabel("red channel value")
fig.canvas.draw()

In [None]:
# add in blue channel
%matplotlib inline
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.hist(red.flatten(), color="red", alpha=0.3)
ax.hist(blue.flatten(), color='blue', alpha=0.3)
ax.set_ylabel("# of observations")
ax.set_xlabel("channel value")
fig.canvas.draw()
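
A histogram is just a count of how many pixels fall into each bin of values. The sketch below reproduces that counting step with `np.histogram` on synthetic data; `demo_channel` is a hypothetical stand-in for a channel such as `red`, and the bin range is fixed to 0..1 here as an assumption (by default `ax.hist` spans the data's own minimum and maximum instead).

```python
import numpy as np

# Hypothetical stand-in for one colour channel: 4 x 5 pixels with values in [0, 1)
rng = np.random.default_rng(0)
demo_channel = rng.random((4, 5))

# Flatten the 2D image to 1D, then count the pixels in 10 equal-width bins
counts, edges = np.histogram(demo_channel.flatten(), bins=10, range=(0, 1))

print(counts)        # number of pixels in each bin
print(edges)         # 11 bin edges for 10 bins
print(counts.sum())  # equals the total number of pixels
```

The counts always add up to the number of pixels, which is why the y axis of an unnormalized histogram reads "# of observations".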

ToDo:
====
1. plot histograms for green and blue channels
2. plot histograms for raw (uncontrasted) data


Histogram => pdf
=========================

![img](figs/distro.jpg)
src: [NausicaaDistribution](https://www.etsy.com/listing/71739287/collection-of-10-distribution-plushies)

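A pdf (probability density function) describes how likely an observation is to fall near each value. The area under the curve over a range of values is the probability of an observation falling in that range, and the total area is 1.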
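
One way to see the link between a histogram and a pdf is to rescale the bin counts so that the bars enclose a total area of 1. The following is a minimal sketch on synthetic data, not the Landsat image; `samples` and the choice of a beta distribution are assumptions made only for illustration.

```python
import numpy as np

# Synthetic values in [0, 1], standing in for a colour channel
rng = np.random.default_rng(1)
samples = rng.beta(2, 5, size=10_000)

# density=True rescales the counts so that sum(height * bin width) == 1
density, edges = np.histogram(samples, bins=20, range=(0, 1), density=True)
widths = np.diff(edges)
area = np.sum(density * widths)

print(area)           # total area under the bars
print(density.max())  # bar heights are densities, not probabilities
```

Note that individual bar heights can exceed 1. Only the area (height times width) is a probability.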

In [None]:
import scipy.stats as st
import numpy as np

#estimate the pdf of the red channel with a Gaussian kernel density estimate
kernel = st.gaussian_kde(red.flatten())
# color values range between 0 & 1 

x = np.linspace(0,1,100) #the range of potential color values
y = kernel(x) #estimated probability density at each x

In [None]:
#plot histogram and pdf of red channel
%matplotlib inline
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.hist(red.flatten(), color="red", density=True, alpha=0.3, label="Hist(r)")
ax.plot(x, y, color="red", label="$P(r)$")
ax.set_ylabel("probability density")
ax.set_xlabel("red channel value")
ax.legend()
fig.canvas.draw()
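
Once a kernel density estimate is fitted, probabilities come from integrating it over a range of values. The sketch below uses synthetic data rather than the red channel; `samples`, `kde`, and the 0.3 cutoff are hypothetical choices, while `integrate_box_1d` is the `scipy.stats.gaussian_kde` method for integrating the estimate between two bounds.

```python
import numpy as np
import scipy.stats as st

# Synthetic values in [0, 1], standing in for a colour channel
rng = np.random.default_rng(2)
samples = rng.beta(2, 5, size=2_000)

# Fit the Gaussian kernel density estimate and evaluate it on a grid
kde = st.gaussian_kde(samples)
x = np.linspace(0, 1, 100)
y = kde(x)  # estimated density at each x

# Probability that a value falls between 0 and 0.3 (area under the curve)
p_dark = kde.integrate_box_1d(0.0, 0.3)

# Over the whole real line the estimate integrates to 1
total = kde.integrate_box_1d(-np.inf, np.inf)

print(p_dark, total)
```

Because Gaussian kernels have infinite tails, a small share of the estimated probability can land outside the 0..1 range of valid color values, so the area between 0 and 1 alone may be slightly less than 1.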

ToDo
=====
1. plot pdfs for green and blue channels
2. plot pdfs for raw (uncontrasted) data