# Signal sources and Samplers

As we saw before, a `SignalSource` is simply a callable that takes no arguments and returns a data point. Examples of `SignalSource`s included in the library are the current mouse position, a camera snapshot, random noise, and keyboard state (is the user pressing a given key?).

In [None]:
from genki_signals.sources import MouseSource, CameraSource, RandomNoise, KeyboardSource

cam = CameraSource()
cam.start()
img = cam()

print(img.shape)

noise = RandomNoise()   
noise.start()
print(noise())

kb = KeyboardSource(keys=['enter'])
kb.start()
print(kb())

We also introduced the `Sampler`, which samples these sources at a given rate. However, some devices act as a `SignalSource` and also contain their own `Sampler`. One example is the microphone, which defines its own sample rate that we cannot control:

In [None]:
from genki_signals.sources import MicSource

mic = MicSource()
mic.sample_rate

This distinction is somewhat arbitrary: we could, in theory, read a single value from the microphone and then wrap it in a `Sampler`. But (a) a single audio sample is extremely unlikely to be useful, and (b) the `Sampler` could not reach such a high sample rate because of how it is implemented (it samples a single data point at a time and then tries to sleep for the right amount of time). Instead, `MicSource` has a `chunk_size` parameter with a default value of 1024: it delegates the sampling itself to lower-level software and only receives data in chunks of that size.
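
The sample-then-sleep approach described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the library's actual `Sampler` code; `sample_loop`, `n_samples`, and the returned list of `(timestamp, value)` pairs are hypothetical names chosen for this example.

```python
import time


def sample_loop(source, sample_rate, n_samples):
    """Call `source` roughly `sample_rate` times per second, `n_samples` times.

    Hypothetical helper for illustration only, not part of genki_signals.
    """
    period = 1.0 / sample_rate
    samples = []
    next_tick = time.monotonic()
    for _ in range(n_samples):
        # Take exactly one data point per iteration.
        samples.append((time.time(), source()))
        # Sleep until the next scheduled tick; skip sleeping if we are already late.
        next_tick += period
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return samples


counter = iter(range(1000))
samples = sample_loop(lambda: next(counter), sample_rate=100, n_samples=5)
print([value for _, value in samples])
```

Every iteration of such a loop costs a Python function call plus a `time.sleep`, so it is plausible that it cannot keep up with typical audio sample rates, which is one reason to read a microphone in chunks instead.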

On the other hand, we could also have called the camera a `Sampler`: when recording video there is a limit to how many frames per second we can reasonably capture. But for a camera it can be quite useful to grab a single frame on some other (slower) schedule.

The distinction between `SignalSource` and `Sampler` is mostly useful for combining separate sources into one. Suppose we are streaming data from some external device, e.g. a chip with an IMU sensor, and we want to create a labelled dataset of movements for machine learning. We can use the keyboard to label the data: we press a key whenever we perform the movement. So we want to combine a Bluetooth signal source with the keyboard source, and we need to synchronise their timestamps somehow. Clocks in a distributed system are notoriously hard to deal with, so in a situation like this it is best to mark one device as the master clock. We make the Bluetooth source the sampler, and each time we receive a data point from it we query the keyboard source for its current value.
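
The idea of letting one device act as the clock can be sketched in plain Python. This is only an illustration under assumptions: `combine`, `imu_points`, and the fake keyboard callable are hypothetical stand-ins, and the real library handles this inside its samplers rather than through a function like this.

```python
def combine(primary_points, secondary_sources):
    """Attach the current value of each secondary source to every primary point.

    Hypothetical helper: `primary_points` stands in for data arriving from the
    device that acts as the clock, `secondary_sources` maps names to callables.
    """
    combined = []
    for point in primary_points:
        row = dict(point)  # keep the primary timestamp and values
        for name, source in secondary_sources.items():
            row[name] = source()  # query the secondary source on arrival
        combined.append(row)
    return combined


# Simulated IMU points and a fake "keyboard" that reports a key state.
imu_points = [
    {"timestamp": 0.00, "gyro": (0.1, 0.0, 0.2)},
    {"timestamp": 0.01, "gyro": (0.3, 0.1, 0.2)},
]
key_states = iter([0, 1])
rows = combine(imu_points, {"label": lambda: next(key_states)})
print(rows[1])
```

Only the primary device's timestamps are used, so the two clocks never need to be reconciled; the price is that the label is only as fresh as the moment each primary data point arrives.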

In the following example we use a single `Sampler` object to combine data from the mouse and the keyboard.

In [None]:
from genki_signals.sources import Sampler
from genki_signals.system import System
from genki_signals.frontends import WidgetDashboard, Line

kb = KeyboardSource(keys=['shift_r'])
mouse = MouseSource()

sampler = Sampler({
    'mouse': mouse,
    'keyboard': kb
}, sample_rate=30)

system = System(sampler)
system.start()


WidgetDashboard(widgets=[
    Line(system, "timestamp", "mouse"),
    Line(system, "timestamp", "keyboard_pressing_shift_r")
])

In [None]:
system.stop()

## The `DataBuffer`

At this point it is worth introducing one of the key data structures underlying Genki Signals, the `DataBuffer`. The `DataBuffer` is similar to a pandas `DataFrame`: it acts as a mapping from names to sequences. There are two major differences between a `DataBuffer` and a `DataFrame`:
* `DataBuffer`s are implemented in NumPy and are generally much faster than `DataFrame`s.
* Series in `DataBuffer`s can be n-dimensional.

Having n-dimensional signals can be very useful. For example, if we have a 3D signal from a gyroscope we can stream it into a buffer under the single name `'gyro'`, whereas with a `DataFrame` we would have to split it into e.g. `'gyro_x'`, `'gyro_y'`, and `'gyro_z'`. An even better example is the camera: a video signal might have the shape `(height, width, n_channels, t)`, which we can store under a single name in a `DataBuffer`. Each entry in a `DataBuffer` is just a NumPy `ndarray`, and the entries are synced over the last dimension, which is assumed to be time.

A `DataBuffer` can be arbitrarily large, or it can have a maximum length, in which case it acts as a circular buffer.

In [None]:
from genki_signals.buffers import DataBuffer
import numpy as np

buffer = DataBuffer(maxlen=400)

buffer['3d_signal'] = np.random.rand(3, 200)
buffer

In [None]:
for i in range(300):
    buffer.append({'3d_signal': np.array([1,2,3])})

buffer

In [None]:
buffer.extend({'3d_signal': np.ones((3, 200))})
buffer
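
To make the circular-buffer behaviour concrete, here is a toy version in plain Python that mirrors the first two steps above (200 samples followed by 300 more, with a maximum length of 400). It is a sketch under assumptions, not how `DataBuffer` is implemented: `TinyBuffer` is a hypothetical class that stores one list per time step in a `collections.deque`, whereas the real buffer keeps NumPy arrays with time on the last axis.

```python
from collections import deque


class TinyBuffer:
    """Toy name -> sequence mapping with an optional maximum length.

    Hypothetical stand-in for illustration; the real DataBuffer stores NumPy
    arrays and is not implemented like this.
    """

    def __init__(self, maxlen=None):
        self.maxlen = maxlen
        self.data = {}

    def append(self, point):
        # One time step: each value may itself be multi-dimensional (e.g. a 3-vector).
        for name, value in point.items():
            self.data.setdefault(name, deque(maxlen=self.maxlen)).append(value)

    def extend(self, points):
        for point in points:
            self.append(point)

    def __len__(self):
        return max((len(seq) for seq in self.data.values()), default=0)


buf = TinyBuffer(maxlen=400)
buf.extend({"3d_signal": [0, 0, 0]} for _ in range(200))
buf.extend({"3d_signal": [1, 2, 3]} for _ in range(300))
print(len(buf))                  # 400: the 100 oldest samples were dropped
print(buf.data["3d_signal"][0])  # [0, 0, 0], the oldest sample still kept
```

Once the maximum length is reached, every new sample pushes out the oldest one, so the buffer always holds the most recent window of the signal.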

Data buffers may be useful in their own right, but we introduce them here because they are an important part of the inner workings of Genki Signals. For example, most signal sources run a separate thread for collecting data, and the main method of their API is `read()`, which returns all data points collected since the last call to `read()` in a `DataBuffer`:

In [None]:
import time

mouse = Sampler({'mouse': MouseSource()}, sample_rate=100)
mouse.start()

time.sleep(3)
collected_data = mouse.read()
collected_data

## Creating a custom SignalSource / Sampler

Creating your own `SignalSource` is easy: you just need some callable that takes no arguments and returns a data point. Then you can wrap it in the basic `Sampler`. 
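
As a minimal sketch, a custom source could be a small class with a `__call__` method. `SineSource` below is a hypothetical example, not part of the library. The built-in sources earlier in this notebook also have a `start()` method; whether a custom source needs one is not shown here, so treat the commented-out wrapping line as an assumption that follows the `Sampler({...}, sample_rate=...)` pattern used above.

```python
import math
import time


class SineSource:
    """A zero-argument callable that returns the current value of a sine wave.

    Hypothetical example source, not part of genki_signals.
    """

    def __init__(self, frequency=1.0):
        self.frequency = frequency
        self.t0 = time.monotonic()

    def __call__(self):
        # Evaluate the sine wave at the time elapsed since construction.
        elapsed = time.monotonic() - self.t0
        return math.sin(2 * math.pi * self.frequency * elapsed)


sine = SineSource(frequency=0.5)
value = sine()  # a float between -1 and 1

# Wrapping would follow the pattern used earlier (requires genki_signals):
# sampler = Sampler({'sine': sine}, sample_rate=30)
```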

Creating your own `Sampler` is slightly more complicated, but still quite easy. In this example we create a `Sampler` that checks the current exchange rate between the Icelandic króna and some other currency using the API at http://apis.is/currency/m5. We want to do this as fast as we can, so it acts as its own sampler, and the sample rate depends on factors such as network speed.

In [None]:
import requests
import json
from queue import Queue
import time
from threading import Thread

from genki_signals.sources.sampler import SamplerBase

class ExchangeRateSource(SamplerBase):
    def __init__(self, currency):
        self.is_active = False
        # reading / writing happen in separate threads so we 
        # need a thread-safe queue for intermediate results
        self.queue = Queue()
        self.currency = currency
    
    def _run(self):
        # Runs in a background thread: poll the API until stop() is called
        while self.is_active:
            # Use a timeout so a stalled request cannot block stop() indefinitely
            r = requests.get('http://apis.is/currency/m5', timeout=10)
            currency_data = json.loads(r.text).get('results') or []
            for c in currency_data:
                if c.get('shortName') == self.currency:
                    data_point = {
                        'timestamp': time.time(),
                        f'{self.currency}_exchange_rate': c['value']
                    }
                    self.queue.put(data_point)
    
    def read(self):
        data = DataBuffer()
        while not self.queue.empty():
            data.append(self.queue.get())
        return data
                    
    def start(self):
        self.is_active = True
        self.main_thread = Thread(target=self._run)
        self.main_thread.start()
        
    def stop(self):
        self.is_active = False
        self.main_thread.join() 

We inherit from `SamplerBase`, and need to implement `start()`, `stop()`, and `read()`.

Let's try using this to get a stream of exchange rates for the Japanese Yen (JPY):

In [None]:
jpy_source = ExchangeRateSource('JPY')
jpy_source.start()

time.sleep(1)
data = jpy_source.read()

In [None]:
data['JPY_exchange_rate']

It works! However, the exchange rate probably changes more slowly than we can query the API, and we are being impolite to the API provider, who might block our requests as spam if we keep this up.

In [None]:
jpy_source.stop()
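
One way to be more polite would be to wait between requests. The sketch below shows the pattern with a `threading.Event`, which also lets `stop()` interrupt the wait immediately. `Poller`, `fetch`, and `interval` are hypothetical names for this example, and `fetch` is a stand-in for the HTTP request so that the snippet runs without network access.

```python
import threading
import time


class Poller:
    """Call `fetch` at most once per `interval` seconds in a background thread.

    Hypothetical helper for illustration; `fetch` stands in for the HTTP request.
    """

    def __init__(self, fetch, interval):
        self.fetch = fetch
        self.interval = interval
        self.results = []
        self._stop_event = threading.Event()
        self._thread = None

    def _run(self):
        while not self._stop_event.is_set():
            self.results.append(self.fetch())
            # wait() returns early if stop() is called, so shutdown is prompt.
            self._stop_event.wait(self.interval)

    def start(self):
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()


poller = Poller(fetch=time.time, interval=0.05)
poller.start()
time.sleep(0.12)
poller.stop()
print(len(poller.results))  # a handful of samples rather than thousands
```

The same idea could be applied inside `_run` of `ExchangeRateSource`, with an interval chosen to match how often the exchange rate is actually expected to change.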