# What Is This Notebook?

This notebook is an exploration of using [zadd](https://redis.io/commands/zadd) to add data into redis without creating duplicates. Instead of querying the database to check whether a piece of data has already been added, we rely on the zadd command itself: a sorted set stores each member only once, so adding the same member again just updates its score. We'll serialize each record as json and save it with a timestamp as its score. The save format looks roughly like this:

```py
{
    b"{...relevant data goes here}": float(timestamp) 
}
```

We can then query all of the timeseries data by the timestamp we passed to zadd. Redis calls this value the `score`; it determines how the members of a sorted set are ordered.
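To make the no-duplicates property concrete, here is a minimal sketch that models a sorted set as a plain Python dict from member to score. `zadd_model` is a hypothetical stand-in written for this example, not a redis-py call, and it uses the standard `json` module rather than `orjson` so it runs without extra packages. It only mirrors the behaviour that adding an existing member updates its score instead of creating a second entry.

```python
import json


def zadd_model(zset: dict, mapping: dict) -> int:
    """Hypothetical in-memory stand-in for ZADD: returns how many members were new."""
    added = 0
    for member, score in mapping.items():
        if member not in zset:
            added += 1
        # An existing member keeps a single entry; only its score changes.
        zset[member] = float(score)
    return added


events = {}
payload = json.dumps({"name": "world"}).encode()

first = zadd_model(events, {payload: 100.0})   # new member
second = zadd_model(events, {payload: 250.0})  # same member, newer score

print(first, second, events)
```

The second call adds nothing new, which is the property the rest of the notebook relies on.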


We'll mostly query with the following commands:

* [ZRANGEBYSCORE](https://redis.io/commands/zrangebyscore) - Lets us query by time index.
* [ZREMRANGEBYSCORE](https://redis.io/commands/zremrangebyscore) - Lets us remove data by time index. We could use it to remove the first, last, or any section of data if need be.
* [ZRANGE](https://redis.io/commands/zrange) - Lets us query by position, for example to get the most recently added item.
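The three commands can be pictured with a small in-memory model. The functions below are hypothetical stand-ins written for this example (they are not redis-py calls). They assume a sorted set is a dict of member to score, ordered by score with ties broken by member, and they treat both range bounds as inclusive.

```python
def zrange_model(zset: dict, start: int, stop: int) -> list:
    """Members by position; negative indexes count from the end, stop is inclusive."""
    ordered = sorted(zset.items(), key=lambda pair: (pair[1], pair[0]))
    size = len(ordered)
    if start < 0:
        start = max(start + size, 0)
    if stop < 0:
        stop += size
    if stop < 0:
        return []
    return ordered[start:stop + 1]


def zrangebyscore_model(zset: dict, low: float, high: float) -> list:
    """Members whose score lies in [low, high], in score order."""
    return [(m, s) for m, s in zrange_model(zset, 0, -1) if low <= s <= high]


def zremrangebyscore_model(zset: dict, low: float, high: float) -> int:
    """Delete members whose score lies in [low, high]; returns how many were removed."""
    doomed = [m for m, s in zset.items() if low <= s <= high]
    for member in doomed:
        del zset[member]
    return len(doomed)


ticks = {b"a": 10.0, b"b": 20.0, b"c": 30.0, b"d": 40.0}

window = zrangebyscore_model(ticks, 15, 35)                   # query a time window
latest = zrange_model(ticks, -1, -1)                          # last item by score
removed = zremrangebyscore_model(ticks, float("-inf"), 20)    # trim the oldest data

print(window, latest, removed, sorted(ticks))
```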



Generally the data is added into redis the following way:


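```py
import time
from redis import Redis

redis = Redis()

timestamp = time.time()
data = {"...key": float(timestamp), "...key2": float(timestamp)}

# redis-py 3.0+ expects the {member: score} mapping as a single argument.
redis.zadd("event_key", data)
```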

In [33]:
import uuid
import maya
import orjson
from copy import copy
from redis import Redis
from typing import List, Set
from jamboree.utils.helper import Helpers

In [34]:
redis = Redis()
helper = Helpers()

# Helper Functions

---
These functions handle the dictionary and sorted-set manipulation. They're used heavily by the save and query-latest functions.


### Using Maya to handle `epoch` time

`maya` is a deliberately simple time-handling library. You might wonder why we use it instead of the native `datetime` and `time` modules. The reason is convenience: it parses, converts, and shifts times through one small API, which gives us a lot of flexibility when handling timeseries information.


Here are some examples of how it can be used:


```py
>>> now = maya.now()
<MayaDT epoch=1481850660.9>

>>> tomorrow = maya.when('tomorrow')
<MayaDT epoch=1481919067.23>

>>> tomorrow.slang_date()
'tomorrow'

>>> tomorrow.slang_time()
'23 hours from now'

# Also: MayaDT.from_iso8601(...)
>>> tomorrow.iso8601()
'2017-02-10T22:17:01.445418Z'

# Also: MayaDT.from_rfc2822(...)
>>> tomorrow.rfc2822()
'Fri, 10 Feb 2017 22:17:01 GMT'

# Also: MayaDT.from_rfc3339(...)
>>> tomorrow.rfc3339()
'2017-02-10T22:17:01.44Z'

>>> tomorrow.datetime()
datetime.datetime(2016, 12, 16, 15, 11, 30, 263350, tzinfo=<UTC>)

# Automatically parse datetime strings and generate naive datetimes.
>>> scraped = '2016-12-16 18:23:45.423992+00:00'
>>> maya.parse(scraped).datetime(to_timezone='US/Eastern', naive=True)
datetime.datetime(2016, 12, 16, 13, 23, 45, 423992)

>>> rand_day = maya.when('2011-02-07', timezone='US/Eastern')
<MayaDT epoch=1297036800.0>

# Maya speaks Python.
>>> m = maya.MayaDT.from_datetime(datetime.utcnow())
>>> print(m)
Wed, 20 Sep 2017 17:24:32 GMT

>>> m = maya.MayaDT.from_struct(time.gmtime())
>>> print(m)
Wed, 20 Sep 2017 17:24:32 GMT

>>> m = maya.MayaDT(time.time())
>>> print(m)
Wed, 20 Sep 2017 17:24:32 GMT

>>> rand_day.day
7

>>> rand_day.add(days=10).day
17

# Always.
>>> rand_day.timezone
UTC

# Range of hours in a day:
>>> maya.intervals(start=maya.now(), end=maya.now().add(days=1), interval=60*60)
<generator object intervals at 0x105ba5820>

# snap modifiers
>>> dt = maya.when('Mon, 21 Feb 1994 21:21:42 GMT')
>>> dt.snap('@d+3h').rfc2822()
'Mon, 21 Feb 1994 03:00:00 GMT'
```



As you can see, maya handles a wide range of time manipulation. That's useful when we move between different time formats, and it keeps our time handling robust.

For example, `ten_ago = maya.now().subtract(days=10)` gives us the point in time 10 days before now. We can then convert that value into the epoch needed to query our data.

`ten_ago._epoch` - This gives us the `epoch` as a float with sub-second precision.
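For readers without maya installed, the same calculation can be sketched with the standard library. This is an equivalent written for illustration, not maya's API, and the fixed `now` value is an arbitrary assumption so the result is reproducible. The two floats it produces are the kind of bounds you would hand to a score-range query.

```python
from datetime import datetime, timedelta, timezone

# Fixed reference point (an assumption for this example) instead of the real clock.
now = datetime(2020, 1, 17, 20, 0, 0, tzinfo=timezone.utc)
ten_ago = now - timedelta(days=10)

# Epoch seconds as floats, matching the scores stored in the sorted sets.
upper = now.timestamp()
lower = ten_ago.timestamp()

print(lower, upper)
```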

In [35]:
def add_time(item:dict, _time:float, rel_abs="absolute"):
    """ Adds time to the dictionaries we query. """
    if rel_abs == "absolute":
        item['timestamp'] = _time
    else:
        item['time'] = _time
    return item

In [36]:
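def generate_dicts(data, _time, timestamp):
    """ Builds the relative and absolute copies of a record. """
    relative = copy(data)
    absolute = copy(data)
    relative['time'] = _time
    # The absolute copy carries the save time, not the source time.
    absolute['timestamp'] = timestamp
    return {
        "relative": relative,
        "absolute": absolute
    }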

In [37]:
def sorted_z_to_dict(zset:List[Set], rel_abs="absolute"):
    """ Turns (member, score) pairs from a sorted set into dictionaries. """
    if len(zset) == 0 or rel_abs not in ["absolute", "relative"]:
        return []
    
    # Each entry is a (serialized_json, score) pair.
    dicts = [add_time(orjson.loads(member), score, rel_abs) for member, score in zset]
    return dicts

In [38]:
def dictify(azset:List[Set], rzset:List[Set]):
    """Creates a single dictionary that represents the information we intend to query. """
    if len(azset) == 0 or len(rzset) == 0:
        return {}
    adict = {}
    for azs in azset:
        item, time = azs
        if item == b'{"placeholder": "place"}':
            continue
        current_item = adict.get(item, {})
        current_item['timestamp'] = time
        adict[item] = current_item
    
    # Set the relative time
    for rzs in rzset:
        item, time = rzs
        if item == b'{"placeholder": "place"}':
            continue
        current_item = adict.get(item, {})
        current_item['time'] = time
        adict[item] = current_item
    
    return adict

In [39]:
def deserialize_dicts(dictified:dict):
    _deserialized = []
    for key, value in dictified.items():
        _key = orjson.loads(key)
        _key['time'] = value.get("time", maya.now()._epoch)
        _key['timestamp'] = value.get("timestamp", maya.now()._epoch)
        _deserialized.append(_key)
    return _deserialized

In [40]:
def check_time(_time:float, _timestamp:float, local_time:float, local_timestamp:float):
    current_time = maya.now()._epoch
    
    
    if local_time is not None:
        _time = local_time
    elif _time is None:
        _time = current_time
    if local_timestamp is not None:
        _timestamp = local_timestamp
    elif _timestamp is None:
        _timestamp = current_time
    
    return {
        "time": _time,
        "timestamp": _timestamp
    }

In [41]:
def separate_time_data(data:dict, _time:float=None, _timestamp:float=None):
    local_time = data.pop("time", None)
    local_timestamp = data.pop("timestamp", None)
    timing = check_time(_time, _timestamp, local_time, local_timestamp)
    return data, timing

## Save Functions

* **save** - Save a single record at a specified time. Each record is stored with both a **relative** and an **absolute** time.
    - **Relative time** - This is the time specified by the data source, like the index of a timeseries dataframe. We'll query these times by epoch time.
    - **Absolute time** - This is the time the record is saved into the database. The idea is that you can retrieve the data in the order it was entered as well as by the time the data itself refers to. We save the data in two sorted sets (`rlist` and `alist`), one for each way we want to query it.
* **save_many**
    * Exactly the same as above, only with lots of data at once. This is useful when we want to save many records from a single data source.
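The notebook does not define `save_many`, so the following is only a sketch of one way it could work. It assumes `client` is any object exposing `zadd(name, mapping)` in the redis-py 3.0+ style, takes an already computed `key_hash` instead of a query dict, and uses the standard `json` module in place of `orjson`. `RecordingClient` is a hypothetical stub that lets the example run without a Redis server.

```python
import json
import time


def save_many(client, key_hash: str, records: list, now: float = None) -> dict:
    """Write many records with one ZADD per sorted set."""
    now = time.time() if now is None else now
    relative, absolute = {}, {}
    for record in records:
        body = dict(record)                     # copy so the caller's dict is untouched
        rel_time = body.pop("time", now)        # source-provided time, else now
        abs_time = body.pop("timestamp", now)   # save time, else now
        member = json.dumps(body, sort_keys=True).encode()
        relative[member] = float(rel_time)
        absolute[member] = float(abs_time)
    if relative:  # skip the calls entirely when there is nothing to write
        client.zadd(f"{key_hash}:rlist", relative)
        client.zadd(f"{key_hash}:alist", absolute)
    return {"relative": relative, "absolute": absolute}


class RecordingClient:
    """Hypothetical stub that records zadd calls instead of talking to Redis."""

    def __init__(self):
        self.calls = []

    def zadd(self, name, mapping):
        self.calls.append((name, dict(mapping)))
        return len(mapping)


client = RecordingClient()
result = save_many(
    client,
    "demo",
    [{"name": "a", "time": 1.0}, {"name": "b", "time": 2.0}, {"name": "a", "time": 3.0}],
    now=10.0,
)

print(result["relative"])
```

Because the mapping is keyed by the serialized record, the repeated `{"name": "a"}` record collapses into one member that keeps the last time it was given.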

In [42]:
def save(query, data, _time=None, _timestamp=None):
    """ Saves one record under both the relative and absolute sorted sets. """
    if not helper.validate_query(query):
        return 
    _hash = helper.generate_hash(query)
    
    # Merge into a new dict so the caller's query is not mutated.
    record = {**query, **data}
    data, timing = separate_time_data(record, _time, _timestamp)
    
    relative_time_key = f"{_hash}:rlist"
    absolute_time_key = f"{_hash}:alist"

    # The serialized record is the member; the times are the scores.
    mono = orjson.dumps(data)
    relative_data = {
        mono: timing["time"]
    }
    absolute_data = {
        mono: timing["timestamp"]
    }    

    redis.zadd(relative_time_key, relative_data)
    redis.zadd(absolute_time_key, absolute_data)
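
Since the serialized record is the sorted-set member, two records only count as duplicates when their bytes are identical. The sketch below, using the standard `json` module as a stand-in for `orjson`, shows that key order alone can produce different bytes, and that sorting keys removes the difference. With `orjson`, passing `option=orjson.OPT_SORT_KEYS` to `orjson.dumps` should give the same effect; treat that as a suggestion rather than something the notebook already does.

```python
import json

a = {"type": "hello", "name": "world"}
b = {"name": "world", "type": "hello"}  # same content, different insertion order

# Default serialization follows insertion order, so the bytes differ.
plain_a = json.dumps(a).encode()
plain_b = json.dumps(b).encode()

# Sorting keys gives one canonical byte string per record.
canon_a = json.dumps(a, sort_keys=True).encode()
canon_b = json.dumps(b, sort_keys=True).encode()

print(plain_a == plain_b)  # False: these would be stored as two members
print(canon_a == canon_b)  # True: these collapse into one member
```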

# Query Functions

Query the data according to our parameters. The plain `query` function comes first. `query_latest` also takes an `abs_rel` parameter, which lets you query by either the relative or the absolute time. Try it out.


The query functions to work on are the following:

1. Query - Get every record that matches our query parameters.
2. Query Latest - Get the n latest records according to our query parameters.
3. Query Between - Query between two epoch times
4. Query Before - Query and get everything before an epoch time
5. Query After - Query and get everything after an epoch time
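
Only the first two functions are implemented in this notebook. The remaining three could be sketched on top of `zrangebyscore` as follows. This is a minimal sketch under stated assumptions: `client` is any object with a redis-py style `zrangebyscore(name, min, max, withscores=...)` method, the functions take a precomputed `key_hash` rather than a query dict, and they return raw `(member, score)` pairs. `InMemoryClient` is a hypothetical stub so the example runs without a Redis server.

```python
def query_between(client, key_hash, start, end, abs_rel="absolute"):
    """Return (member, score) pairs whose score lies in [start, end]."""
    if abs_rel not in ("absolute", "relative"):
        return []
    suffix = "alist" if abs_rel == "absolute" else "rlist"
    return client.zrangebyscore(f"{key_hash}:{suffix}", start, end, withscores=True)


def query_before(client, key_hash, end, abs_rel="absolute"):
    """Everything up to and including `end`."""
    return query_between(client, key_hash, "-inf", end, abs_rel)


def query_after(client, key_hash, start, abs_rel="absolute"):
    """Everything from `start` onwards."""
    return query_between(client, key_hash, start, "+inf", abs_rel)


class InMemoryClient:
    """Hypothetical stub mimicking zrangebyscore for this example only."""

    def __init__(self, zsets):
        self.zsets = zsets

    def zrangebyscore(self, name, min, max, withscores=False):
        low, high = float(min), float(max)  # also parses "-inf" and "+inf"
        items = sorted(self.zsets.get(name, {}).items(), key=lambda p: (p[1], p[0]))
        hits = [(m, s) for m, s in items if low <= s <= high]
        return hits if withscores else [m for m, _ in hits]


client = InMemoryClient({"demo:rlist": {b"a": 10.0, b"b": 20.0, b"c": 30.0}})

between = query_between(client, "demo", 15, 30, abs_rel="relative")
before = query_before(client, "demo", 20, abs_rel="relative")
after = query_after(client, "demo", 20, abs_rel="relative")

print(between, before, after)
```

The pairs returned here have the same shape as the `zrange(..., withscores=True)` results used elsewhere in the notebook, so they could be passed through `dictify` and `deserialize_dicts` in the same way.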

In [43]:
def query(_query):
    if not helper.validate_query(_query):
        return 
    _hash = helper.generate_hash(_query)
    relative_time_key = f"{_hash}:rlist"
    absolute_time_key = f"{_hash}:alist"
    keys = redis.zrange(relative_time_key, 0, -1, withscores=True)
    akeys = redis.zrange(absolute_time_key, 0, -1, withscores=True)

    dicts = dictify(akeys, keys)
    combined = deserialize_dicts(dicts)
    return combined

In [44]:
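def query_latest(_query, abs_rel="absolute"):
    """ Returns the most recent record from the absolute or relative sorted set. """
    if not helper.validate_query(_query) or abs_rel not in ["absolute", "relative"]:
        return 
    _hash = helper.generate_hash(_query)
    
    _current_key = ""
    if abs_rel == "absolute":
        _current_key = f"{_hash}:alist"
    else:
        _current_key = f"{_hash}:rlist"
    
    blank_keys = [(b'{"placeholder": "place"}', 0)]
    # Index -1 is the member with the highest score, i.e. the latest one.
    keys = redis.zrange(_current_key, -1, -1, withscores=True)
    # The score is always passed as the absolute set here, so it is reported
    # under 'timestamp' even in relative mode; query_latest_many swaps the
    # arguments so that a relative score lands under 'time'.
    dicts = dictify(keys, blank_keys)
    combined = deserialize_dicts(dicts)
    return combined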

In [45]:
def query_latest_many(_query, abs_rel="absolute", limit:int=10):
    if not helper.validate_query(_query) or abs_rel not in ["absolute", "relative"]:
        return 
    _hash = helper.generate_hash(_query)
    
    _current_key = ""
    if abs_rel == "absolute":
        _current_key = f"{_hash}:alist"
    else:
        _current_key = f"{_hash}:rlist"
    
    blank_keys = [(b'{"placeholder": "place"}', 0)]
    keys = redis.zrange(_current_key, -limit, -1, withscores=True)
    dicts = []
    if abs_rel == "absolute":
        dicts = dictify(keys, blank_keys)
    else:
        
        dicts = dictify(blank_keys, keys)
    
    combined = deserialize_dicts(dicts)
    return combined

In [46]:
episode = uuid.uuid1().hex

In [47]:
save({"type": "hello", "episode": episode}, {"name":"world"}, _time=(maya.now()._epoch + 3600))
save({"type": "hello", "episode": episode}, {"name":"world", "my": "world"}, _time=(maya.now()._epoch))

In [48]:
query_latest({"type": "hello", "episode": episode}, abs_rel="relative")

[{'type': 'hello',
  'episode': '9a9126dc396a11ea8afc80c5f21e8205',
  'name': 'world',
  'time': 1579294061.359215,
  'timestamp': 1579297661.2312005}]

In [49]:
query_latest({"type": "hello", "episode": episode})

[{'type': 'hello',
  'episode': '9a9126dc396a11ea8afc80c5f21e8205',
  'name': 'world',
  'my': 'world',
  'time': 1579294061.4644415,
  'timestamp': 1579294061.2422016}]

In [50]:
query_latest_many({"type": "hello", "episode": episode}, abs_rel="relative", limit=100)

[{'type': 'hello',
  'episode': '9a9126dc396a11ea8afc80c5f21e8205',
  'name': 'world',
  'my': 'world',
  'time': 1579294061.2421703,
  'timestamp': 1579294061.5966113},
 {'type': 'hello',
  'episode': '9a9126dc396a11ea8afc80c5f21e8205',
  'name': 'world',
  'time': 1579297661.2312005,
  'timestamp': 1579294061.5966148}]

In [51]:
all_items = [
    {
        "data": {"name":"world", "my": "world", "_id": uuid.uuid4().hex},
        "timestamp": maya.now()._epoch
    },
    {
        "data": {"name":"world", "my": "world", "_id": uuid.uuid4().hex},
        "timestamp": maya.now()._epoch
    },
    {
        "data": {"name":"world", "my": "world", "_id": uuid.uuid4().hex},
        "timestamp": maya.now()._epoch
    },
    {
        "data": {"name":"world", "my": "world", "_id": uuid.uuid4().hex},
        "timestamp": maya.now()._epoch
    },
    {
        "data": {"name":"world", "my": "world", "_id": uuid.uuid4().hex},
        "timestamp": maya.now()._epoch
    },
    {
        "data": {"name":"world", "my": "world", "_id": uuid.uuid4().hex},
        "timestamp": maya.now()._epoch
    },
    {
        "data": {"name":"world", "my": "world", "_id": uuid.uuid4().hex},
        "timestamp": maya.now()._epoch
    },
]

In [52]:
def convert_to_storable(items:list):
    savable = {}
    for item in items:
        item_json = orjson.dumps(item.get("data", {}))
        timestamp = item.get("timestamp", maya.now()._epoch)
        savable[item_json] = timestamp
    
    return savable

In [53]:
convert_to_storable(all_items)

{b'{"name":"world","my":"world","_id":"5960f469ec16421f907be9dc55fa708e"}': 1579294061.7396572,
 b'{"name":"world","my":"world","_id":"99ae96939bca4377b368b204cdeccfff"}': 1579294061.739671,
 b'{"name":"world","my":"world","_id":"d595b66b1ba04fee955025c1a84b25af"}': 1579294061.7397392,
 b'{"name":"world","my":"world","_id":"8ce14e00da5042778e9af0475da471d4"}': 1579294061.7399614,
 b'{"name":"world","my":"world","_id":"42ade183d2784c89894314a6c80662dc"}': 1579294061.7399752,
 b'{"name":"world","my":"world","_id":"a04a0eb797dd4fd3959bfdd95be3b639"}': 1579294061.7399848,
 b'{"name":"world","my":"world","_id":"24dcc0d5fabe443697d28055519a454e"}': 1579294061.7399924}