<i>STATUS: Draft</i>

In [30]:
import numpy as np
import sympy as sp
from IPython.display import HTML, IFrame, Image
import matplotlib.pyplot as plt
from matplotlib import animation
from matplotlib.patches import Rectangle
import sys
import HTM_Code as hc
import pandas as pd

In the last notebook, we built our first encoder. Admittedly, it was very simple: it lets us pass in integers and gives back the indices of the bits that encode them. It provided a little extra functionality so that it could track previously encoded values and compare them. We also established some rules that we would like to see in place.

We also discussed what similarity metrics in such an environment might look like, based on those rules. It's important to emphasize that we have really conflated the idea of semantic meaning with distance, which is problematic, and we will need to take a deeper look at what it means to really encode something semantically.
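To make the distance/similarity conflation concrete, here is a minimal sketch assuming the simple non-random scheme from the last notebook, where a value v (resolution 1, minimum 0) is encoded by the w contiguous bits starting at index v. The helper names `encode_contiguous` and `overlap` are hypothetical. Overlap, our similarity measure, falls off linearly with the numeric distance between the two values, so "similar" here only ever means "numerically close".

```python
import numpy as np

def encode_contiguous(value, n=64, w=8):
    """Hypothetical helper: indices of the w active bits for `value`.

    Assumes resolution 1 and a minimum of 0, so value v turns on bits v .. v + w - 1.
    """
    if not 0 <= value <= n - w:
        raise ValueError(f"value must be in [0, {n - w}]")
    return np.arange(value, value + w)

def overlap(a, b):
    """Number of active bits shared by two encodings."""
    return len(np.intersect1d(a, b))

base = encode_contiguous(10)
for other in [10, 11, 14, 17, 18, 30]:
    # prints: 10 8 / 11 7 / 14 4 / 17 1 / 18 0 / 30 0
    print(other, overlap(base, encode_contiguous(other)))
```

Each unit of distance costs exactly one shared bit, and values w or more apart share nothing at all.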

Finally, the Encoder class introduced in the last notebook is also included in the HTM_Code.py file for us to use.


<div style="background:#99ddff; color:black; padding: 10px">
<b>Add to these notes:</b>

I want to keep these notes in HTML so I don't have to host them on a server, but a great exercise is to use the IPython widgets to experience this interactively, as in the video.
</div>

Let's double-check that everything still works, now calling it from our functions. We will create 3 different encoders, with different types of features that we can now exploit:

In [31]:
class Encoder:
    def __init__(self, bit_space_size = None,
                number_of_bits_used_to_encode_value = None,
                min_val = None,
                max_val = None,
                is_randomly_distributed = None,
                clip_values_outside_range = None):

        self.bit_space_size = bit_space_size
        self.number_of_bits_used_to_encode_value = number_of_bits_used_to_encode_value
        self.clip_values_outside_range = clip_values_outside_range
        self.is_periodic = False
        self.is_randomly_distributed = is_randomly_distributed

        self.resolution = 1
        self.uniqueness = 1
        self.min_value_to_encode = min_val
        self.max_value_to_encode = max_val
        self.max_bit_space_value = bit_space_size
        self.min_bit_space_value = 0
        self.encoded_values = []
        self.encoded_values_bit_locations = []
        self.offset_for_array_indice = 1
        
        self.bucket_capacity = self.compute_bucket_capacity(self.bit_space_size, self.number_of_bits_used_to_encode_value)
        
        if self.is_randomly_distributed:
            self.initial_encoding = np.array(hc.create_randomised_sdr(self.bit_space_size, self.number_of_bits_used_to_encode_value))

            self.encoded_values_and_bit_locations = {str(self.min_value_to_encode):self.initial_encoding}
            self.encoded_values.append(self.min_value_to_encode)
            self.encoded_values_bit_locations.append(np.array(self.initial_encoding))
        
    def get_summary(self):
        print("----------------- SUMMARY -------------------------")
        print("|L3| Bit Space Size: ", self.bit_space_size)
        print("|L4| Number of bits to be used when encoding each value:", self.number_of_bits_used_to_encode_value)
        print("|L5| Range of values that can be encoded: From ", self.min_value_to_encode, ' to ', self.max_value_to_encode)
        print("|L6| Number of buckets available in bit space:", float(self.bucket_capacity))
        print("|L1| Encode periodically: ", self.is_periodic)
        print("|L1| Values are encoded as randomly distributed arrays: ", self.is_randomly_distributed)
        print("|L1| Resolution: ", self.resolution)
        print("|L1| Unique active bits per bucket: ", self.uniqueness)
        print("|L2| Values outside range will be clipped: ",self.clip_values_outside_range)
        print("|L7| Encoded values bit locations:\n ", self.encoded_values_bit_locations)
        print("|L8| Encoded values", self.encoded_values)
        print("----------------------------------------------------")

        
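    def compute_bucket_capacity(self, n, w):
        # Random encodings: any choice of w active bits out of n is a distinct bucket -> C(n, w).
        # Contiguous encodings: a window of w bits can start at n - w + 1 positions.
        if self.is_randomly_distributed:
            return sp.binomial(n, w)
        else:
            return n - w + 1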

    def create_buckets_for_randomly_encoded_values(self, iterations_needed):
        
        for i in range(0, iterations_needed):
            # Pick one of the active bits at random and shift it one position left or right.
            # Note: this can move a bit onto a position that is already active.
            random_bit_index_to_move = np.random.randint(0, self.number_of_bits_used_to_encode_value)
            random_direction_to_move = np.random.randint(0, 2)

            next_sdr = self.encoded_values_bit_locations[-1].copy()

            if random_direction_to_move == 1:
                value = next_sdr[random_bit_index_to_move] + 1
            else:
                value = next_sdr[random_bit_index_to_move] - 1

            # Valid bit indices run from 0 to bit_space_size - 1, so bounce back off either edge.
            if value >= self.max_bit_space_value:
                value = value - 2
            elif value < 0:
                value = value + 2

            next_sdr[random_bit_index_to_move] = value

            self.encoded_values_bit_locations.append(next_sdr.copy())
            self.encoded_values.append(np.array(self.encoded_values[-1] + 1))
            self.encoded_values_and_bit_locations[str(self.encoded_values[-1])] = next_sdr.copy()
  

    def encode_value_in_bit_space(self, value_choice):
        print("\nEncoding the value ->", value_choice)
        unclipped_value = value_choice
        is_outside_range = value_choice < self.min_value_to_encode or value_choice > self.max_value_to_encode
        if is_outside_range:
            if self.clip_values_outside_range:
                # Clamp the value to the nearest end of the encoder's range.
                value_choice = min(max(value_choice, self.min_value_to_encode), self.max_value_to_encode)
                print("The value of: ", unclipped_value, "has been clipped to ->", value_choice)
            else:
                # Without clipping, values outside the range cannot be encoded.
                print("Not a valid choice, ", value_choice, " is outside encoder range")
                return

        
        if self.is_randomly_distributed:
            if (value_choice < self.encoded_values[-1]):
                print("There is a bucket already created for the value", value_choice, "-> ", self.encoded_values_and_bit_locations[str(value_choice)])
                if unclipped_value < self.min_value_to_encode or unclipped_value > self.max_value_to_encode:
                    print("This bucket will be used to encode", unclipped_value)
                    self.encoded_values_and_bit_locations[str(unclipped_value)] = self.encoded_values_and_bit_locations[str(value_choice)]
                return
            
            buckets_needed_to_encode_value = value_choice - self.encoded_values[-1]
            print("Current number of buckets: " , len(self.encoded_values))
            print("Value held in first bucket: ", self.min_value_to_encode)
            print("Number of additional buckets required to accommodate the value choice of", value_choice, ": ", buckets_needed_to_encode_value)
            self.create_buckets_for_randomly_encoded_values(buckets_needed_to_encode_value)
            self.encoded_values_and_bit_locations[str(unclipped_value)] = self.encoded_values_and_bit_locations[str(value_choice)]
        
        else:
            window = [value_choice, value_choice + self.number_of_bits_used_to_encode_value]
            all_values = np.arange(window[0], window[1])
            self.encoded_values_bit_locations.append(all_values)
            self.encoded_values.append(value_choice)
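The `create_buckets_for_randomly_encoded_values` method builds each new bucket by nudging one active bit of the previous bucket by one position. Below is a standalone sketch of the same random-walk idea. It assumes a random initial SDR drawn with NumPy's `default_rng` (standing in for `hc.create_randomised_sdr`, whose implementation isn't shown here) and, unlike the method above, rejects moves that would leave the bit space or land on an already-active bit, so every bucket keeps exactly w distinct bits. The function name `random_walk_buckets` is hypothetical.

```python
import numpy as np

def random_walk_buckets(n=64, w=8, num_buckets=20, seed=0):
    """Hypothetical helper: num_buckets SDRs (sorted index arrays), each one bit-move from the previous."""
    rng = np.random.default_rng(seed)
    current = np.sort(rng.choice(n, size=w, replace=False))  # random initial bucket
    buckets = [current.copy()]
    while len(buckets) < num_buckets:
        i = rng.integers(w)              # which active bit to move
        step = rng.choice([-1, 1])       # move it one position left or right
        new_pos = current[i] + step
        # Reject moves that leave the bit space or land on an already-active bit.
        if new_pos < 0 or new_pos >= n or new_pos in current:
            continue
        current = current.copy()
        current[i] = new_pos
        current.sort()
        buckets.append(current)
    return buckets

buckets = random_walk_buckets()
neighbour_overlaps = [len(np.intersect1d(a, b)) for a, b in zip(buckets, buckets[1:])]
overlap_with_first = [len(np.intersect1d(buckets[0], b)) for b in buckets]
print(neighbour_overlaps)   # every entry is w - 1 = 7
print(overlap_with_first)
```

Adjacent buckets always share w - 1 bits, while buckets further apart in the walk tend to share fewer, which is how this scheme keeps nearby values similar without using contiguous windows.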
 

In [32]:
bit_space_size_choice = 64
number_of_bits_used_to_encode_value_choice = 8

e1 = Encoder(bit_space_size = bit_space_size_choice,
             number_of_bits_used_to_encode_value = number_of_bits_used_to_encode_value_choice,
             min_val = 0,
             max_val = 1,
             is_randomly_distributed = False,
             clip_values_outside_range = False)

In [33]:
e1.get_summary()

----------------- SUMMARY -------------------------
|L3| Bit Space Size:  64
|L4| Number of bits to be used when encoding each value: 8
|L5| Range of values that can be encoded: From  0  to  1
|L6| Number of buckets available in bit space: 57.0
|L1| Encode periodically:  False
|L1| Values are encoded as randomly distributed arrays:  False
|L1| Resolution:  1
|L1| Unique active bits per bucket:  1
|L2| Values outside range will be clipped:  False
|L7| Encoded values bit locations:
  []
|L8| Encoded values []
----------------------------------------------------
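The "Number of buckets available" line comes from `compute_bucket_capacity`. For the contiguous scheme, a window of w = 8 bits can start at any of n - w + 1 = 64 - 8 + 1 = 57 positions. For the random scheme the method instead returns the binomial coefficient C(n, w), the number of ways to choose 8 active bits out of 64. A quick check with the standard library's `math.comb` (used here in place of sympy) shows how large the gap is:

```python
import math

n, w = 64, 8
contiguous_buckets = n - w + 1     # start positions for a window of w contiguous bits
random_buckets = math.comb(n, w)   # ways to choose w active bits from n

print(contiguous_buckets)  # 57
print(random_buckets)      # 4426165368
```

C(n, w) counts every possible SDR; it says nothing about how many of them keep a useful overlap with their neighbours.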


   date_time    power_consumption
1  7/2/10 0:00               21.2
2  7/2/10 1:00               16.4
3  7/2/10 2:00                4.7
4  7/2/10 3:00                4.7
5  7/2/10 4:00                4.6


In [None]:
df = pd.read_csv("./data/gymdata.csv", header=1)
df = df.rename(columns={"datetime": "date_time", "float": "power_consumption"})
df = df.iloc[1:].copy()  # skip the first row below the header; copy to avoid chained-assignment warnings
df['date_time'] = pd.to_datetime(df['date_time'], format="%m/%d/%y %H:%M")
df['date'] = df['date_time'].dt.date
df['time_of_day'] = df['date_time'].dt.time
df['weekday'] = df['date_time'].dt.weekday              # Monday = 0 ... Sunday = 6
df['is_weekend'] = (df['weekday'] >= 5).astype(int)     # 1 for Saturday/Sunday, 0 otherwise
df = df.drop('date_time', axis=1)
#df = df.drop('weekday', axis=1)

df.weekday

Now on to the next episode of HTM School.

In [5]:
IFrame("https://www.youtube.com/embed/PTYlge2K1G8", width=600, height=300)

This episode is about how we might build a date and time encoder, but the way I like to think of it is as joining multiple encoders together. For example, I might be interested in an encoder that takes 365 values (one for each day of the year), just 2 values (whether or not it is the weekend), 4 values (which season of the year it is), or a minute encoder (which minute of the day it is).

Under the hood, each of these is no different from the encoders we have already looked at. Each stored value is simply an SDR, described by its active bits and the size of its bit space.

I could join these encoders together by simply concatenating their arrays, and in principle I would have a single encoder that tracks the meaning of each of them. I could then feed it two dates and times and see how different they are.
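Here is a minimal sketch of that idea, assuming each sub-encoder produces a dense 0/1 array (not the index form used by `Encoder`). The sub-encoders chosen here (a weekend flag, a season, and an hour of day), their sizes, and all helper names are hypothetical illustrations. The season mapping assumes Northern Hemisphere meteorological seasons, and the hour encoder wraps around so that 23:00 and 00:00 stay close.

```python
import numpy as np
from datetime import datetime

def category_sdr(index, num_categories, w=3):
    """Hypothetical category encoder: each category gets its own non-overlapping block of w bits."""
    sdr = np.zeros(num_categories * w, dtype=np.uint8)
    sdr[index * w:(index + 1) * w] = 1
    return sdr

def periodic_sdr(value, period, w=3):
    """Hypothetical periodic encoder: w contiguous bits on a ring of `period` bits, wrapping around."""
    sdr = np.zeros(period, dtype=np.uint8)
    sdr[(value + np.arange(w)) % period] = 1
    return sdr

def season_index(month):
    """0 = winter (Dec-Feb), 1 = spring, 2 = summer, 3 = autumn (assumed mapping)."""
    return (month % 12) // 3

def encode_datetime(dt):
    """Concatenate the sub-encodings into a single SDR."""
    is_weekend = int(dt.weekday() >= 5)  # Monday = 0 ... Sunday = 6
    return np.concatenate([
        category_sdr(is_weekend, 2),               # bits 0-5
        category_sdr(season_index(dt.month), 4),   # bits 6-17
        periodic_sdr(dt.hour, 24),                 # bits 18-41
    ])

a = encode_datetime(datetime(2010, 7, 2, 0, 0))   # Friday, midnight
b = encode_datetime(datetime(2010, 7, 2, 23, 0))  # Friday, 11 pm
c = encode_datetime(datetime(2010, 7, 3, 12, 0))  # Saturday, noon

print(len(a), int(a.sum()))                        # 42 9
print(int((a & b).sum()), int((a & c).sum()))      # 8 3
```

Midnight and 11 pm on the same Friday share 8 of 9 bits, while Saturday noon shares only the season block. The wrap-around in the hour sub-encoder is presumably the kind of behaviour the `is_periodic` flag in `Encoder` is meant to control.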

To explore this further, let's build another class: a multi-encoder.

In [6]:
class MultiEncoder:
    def __init__(self):
        self.encoders = []
        self.bit_space_size = None
    def add_encoder(self, encoder):
        self.encoders.append(encoder)
    def join_encoders(self):
        pass
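`join_encoders` is still a stub. One way it could work with the index representation that `Encoder` uses (a list of active-bit indices per value) is to shift each sub-encoder's indices by the total size of the sub-encoders before it, which is equivalent to concatenating the dense arrays end to end. The sketch below assumes each sub-encoding is given as an `(active_indices, bit_space_size)` pair; the function name `join_encodings` is hypothetical.

```python
import numpy as np

def join_encodings(encodings):
    """Hypothetical helper: join (active_indices, bit_space_size) pairs into one index-based SDR.

    Each sub-encoding's indices are offset by the sizes of the sub-encodings before it.
    Returns (joined active indices, total bit space size).
    """
    joined = []
    offset = 0
    for active_indices, size in encodings:
        joined.append(np.asarray(active_indices) + offset)
        offset += size
    return np.concatenate(joined), offset

indices, total_size = join_encodings([
    (np.arange(10, 18), 64),   # e.g. a contiguous encoding of 10 with n = 64, w = 8
    (np.array([3, 4, 5]), 6),  # e.g. the second block of a 2-category, 3-bit encoder
])
print(total_size)  # 70
print(indices)     # [10 11 12 13 14 15 16 17 67 68 69]
```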

This is a really powerful idea. We could think about multiple time series unfolding together, or the components of a date and time. Or we could think of symbolic music, tracking key changes.

In [None]:
def compute_union_and_overlap(SDR1_on_bits, SDR2_on_bits):
    union = list(set(SDR1_on_bits).union(SDR2_on_bits))
    overlap = list(set(SDR1_on_bits).intersection(SDR2_on_bits))
    
    return({"union": union, "overlap": overlap})
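A quick usage sketch: the overlap count is the similarity measure, and dividing it by the size of the union gives a score between 0 and 1 (that ratio is the Jaccard index, an addition here rather than something defined earlier in these notes). The function is repeated so the block runs on its own, and the two inputs are contiguous encodings (n = 64, w = 8) of the values 10 and 13.

```python
def compute_union_and_overlap(SDR1_on_bits, SDR2_on_bits):
    union = list(set(SDR1_on_bits).union(SDR2_on_bits))
    overlap = list(set(SDR1_on_bits).intersection(SDR2_on_bits))
    return {"union": union, "overlap": overlap}

sdr_10 = list(range(10, 18))   # value 10 -> bits 10..17
sdr_13 = list(range(13, 21))   # value 13 -> bits 13..20

result = compute_union_and_overlap(sdr_10, sdr_13)
overlap_count = len(result["overlap"])   # 5 shared bits: 13..17
union_count = len(result["union"])       # 11 bits in total: 10..20
print(overlap_count, union_count, overlap_count / union_count)
```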



This is starting to look more like what we saw in the early notebooks: we can see how semantic similarity is affected by noise and subsampling.
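A minimal sketch of both effects, using random SDRs from NumPy rather than any encoder in this notebook. Adding noise moves a fraction of the active bits to randomly chosen inactive positions; subsampling keeps only a random subset of the active bits. The helper names and the sizes (n = 2048, w = 40) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n, w = 2048, 40

def random_sdr():
    """Hypothetical helper: w distinct active-bit indices out of n."""
    return rng.choice(n, size=w, replace=False)

def add_noise(sdr, fraction):
    """Replace `fraction` of the active bits with bits drawn from the currently inactive ones."""
    k = int(round(fraction * len(sdr)))
    keep = rng.choice(sdr, size=len(sdr) - k, replace=False)
    inactive = np.setdiff1d(np.arange(n), sdr)
    new_bits = rng.choice(inactive, size=k, replace=False)
    return np.concatenate([keep, new_bits])

def subsample(sdr, fraction):
    """Keep only `fraction` of the active bits."""
    return rng.choice(sdr, size=int(round(fraction * len(sdr))), replace=False)

original = random_sdr()
for f in [0.0, 0.25, 0.5]:
    # overlap is exactly w minus the number of moved bits: 40, 30, 20
    print("noise", f, "overlap", len(np.intersect1d(original, add_noise(original, f))))
for f in [1.0, 0.5, 0.25]:
    # overlap equals the number of bits kept: 40, 20, 10
    print("subsample", f, "overlap", len(np.intersect1d(original, subsample(original, f))))

unrelated = random_sdr()
print("unrelated overlap", len(np.intersect1d(original, unrelated)))
```

Even with half its bits moved, the noisy SDR still shares 20 bits with the original, far above the roughly w * w / n, about 0.8 bits, expected between two unrelated random SDRs of this size.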

There is a lot more we can do with encoders, such as a delta encoder or a log encoder. A geospatial encoder is particularly interesting: it encodes values on a sphere. What other geometric shapes could we encode on? This opens us up to different types of geometry and topology as well.
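Rough sketches of the delta and log encoders, assuming both simply transform the input and then reuse a contiguous scalar encoder like the one from the last notebook. The helper names, the clipping behaviour, and the ranges are hypothetical; the encoders described in the BaMI chapter linked below may differ in their details.

```python
import math
import numpy as np

def scalar_sdr(value, min_val, max_val, n=64, w=8):
    """Contiguous scalar encoder: clip to the range, then pick one of n - w + 1 buckets."""
    value = min(max(value, min_val), max_val)
    buckets = n - w + 1
    start = int(round((value - min_val) / (max_val - min_val) * (buckets - 1)))
    return np.arange(start, start + w)

def log_sdr(value, min_val=1.0, max_val=1e7, **kwargs):
    """Log encoder sketch (value must be positive): equal ratios give equal bucket distances."""
    return scalar_sdr(math.log10(value), math.log10(min_val), math.log10(max_val), **kwargs)

def delta_sdrs(values, max_change=10.0, **kwargs):
    """Delta encoder sketch: encode the change from the previous value instead of the value itself."""
    return [scalar_sdr(b - a, -max_change, max_change, **kwargs) for a, b in zip(values, values[1:])]

def overlap(a, b):
    return len(np.intersect1d(a, b))

# Both pairs are a doubling, so the log encoder gives them the same overlap.
print(overlap(log_sdr(10), log_sdr(20)), overlap(log_sdr(1000), log_sdr(2000)))  # 6 6
# Same shape at different levels: the delta encodings are identical.
print([overlap(x, y) for x, y in zip(delta_sdrs([1, 3, 2, 6]), delta_sdrs([101, 103, 102, 106]))])  # [8, 8, 8]
```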

Let's look at more encoders: 
https://numenta.com/assets/pdf/biological-and-machine-intelligence/BaMI-Encoders.pdf

It is important to capture semantics properly.

Recall the 4 principles... e.g., consider which two numbers should be semantically similar.

Note for encoders, we can encode anything we can put in a relationship of 

We need encoders that can incorporate noise and subsampling.

Encoding Data for HTM Systems - Purdy


Date Time Encoder

 
