[New Feature] Seeking alternative to Redis as Online Storage. BigTable research #48

Closed
pyalex opened this issue Mar 22, 2021 · 4 comments · Fixed by #46
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments


pyalex commented Mar 22, 2021

At Gojek we extensively use Feast in production with Redis Cluster as our storage layer. The capacity of our cluster is currently 1 TB, which is not too much, right? But we have already started to bump into some limitations:

  1. With a huge chunk of memory under management, Redis's durability functionality breaks down. Due to its design (Redis is a single-threaded application), almost all operations are sequential & blocking. The fork operation, for example, which must be called every time a snapshot is stored, stops the world, and no client requests are handled while it runs. With 40-60 GB allocated by the Redis process, a fork can often take up to 1 second. This means we cannot actually use snapshotting on the same node that we read from for serving.
  2. Loading a snapshot into memory can take up to 1 hour. In the case of a network partition, whose probability increases with the number of machines, a replica may initiate a full synchronization with its master. This means generating a full snapshot of the DB on the master and then applying it on the replica. While applying it, the replica goes into an error state, always returning the same response "LOADING". This is not handled well even by the most advanced Redis clients (see the issue).
  3. TBD

That being said, we want to explore possible alternatives and try to preserve read-latency performance as much as possible while simplifying scaling & increasing durability.

BigTable (research)

There are two main factors that impact read performance in BigTable:

  1. Even distribution of requests across all nodes
  2. Size of individual rows

(This is under the assumption that 100% of Feast reads are random access, since we have no knowledge about the distribution of ingested entities, nor about the distribution of entities in get_online_features requests.)

Let's focus on the row size.

Some facts that we know about BigTable

  1. Each stored column includes its column qualifier (the full name + the name length), so the column name should itself be considered data.
  2. Each column is a list of cells, and it can grow without bound (by default) if GC is not configured otherwise.
  3. Each cell stores a timestamp (in micros), hence has at least 8 bytes of metadata.
  4. There are no data types; everything is stored as bytes, so the byte array length must additionally be stored (per cell).

(3 + 4) means that each cell costs approx 12 bytes of metadata.

We also conducted several experiments to better understand the underlying storage model and to find an approach that makes each individual stored row as compact as possible. We generated a test dataset consisting of 20 features (type float / 4 bytes each) and an entity key of length 20 bytes. Hence, the effective size of the raw dataset is 1 GB.

Experiment A. Every feature is stored as a separate column (within one column family). Qualifier length is 10 bytes. Table size after ingestion: 4.4 GB
Experiment B. Every feature is stored as a separate column (within one column family). Qualifier length is 30 bytes. Table size after ingestion: 5.4 GB
Experiment C. Same as Experiment A, but with a longer column family name. Table size after ingestion: 4.4 GB
Experiment D. All features are stored in a single byte array. The qualifier is empty. Table size after ingestion: 1.4 GB
Experiment E. The 20 float features are replaced with 10 double features, so the effective dataset size is the same and the qualifier length is still 10 bytes, but now there are only 10 columns. Table size after ingestion: 2.0 GB
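
A rough back-of-the-envelope check of these numbers (a sketch, assuming ~10M rows, i.e. 1 GB of effective data at ~100 bytes per row, and the ~12 bytes of per-cell metadata estimated above). The naive estimates overshoot the measured sizes wherever many small columns are involved, which fits the compaction observations in the conclusions below:

ROWS = 10_000_000   # 1 GB effective / ~100 B per row (80 B of feature values + 20 B key)
KEY = 20            # row key length, bytes
CELL_META = 12      # timestamp (8 B) + value length (~4 B) per cell

def naive_table_size_gb(n_columns, qualifier_len, value_len):
    # Naive upper bound: no qualifier/metadata compaction, no block compression.
    row = KEY + n_columns * (qualifier_len + CELL_META + value_len)
    return ROWS * row / 1e9

print(naive_table_size_gb(20, 10, 4))   # ~5.4 GB naive vs 4.4 GB measured (Experiment A)
print(naive_table_size_gb(20, 30, 4))   # ~9.4 GB naive vs 5.4 GB measured (Experiment B)
print(naive_table_size_gb(1, 0, 80))    # ~1.1 GB naive vs 1.4 GB measured (Experiment D)
print(naive_table_size_gb(10, 10, 8))   # ~3.2 GB naive vs 2.0 GB measured (Experiment E)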

Conclusions:

  1. Although qualifiers are stored as part of the row, the dependency between qualifier length and row size is not linear. Our guess is that qualifiers are being compacted somehow (most probably via a dictionary), but not at the table level - most probably at the partition level instead (see experiments A vs B).
  2. Meanwhile, compacting cell metadata (timestamp + byte array length) is much harder, and the difference between a row with 10 columns and a row with 20 columns is much bigger (see experiments A vs E).
  3. The column family name is apparently not stored in the row and doesn't impact row size (see experiments A vs C).
  4. The best compaction is achieved by storing all data as a single column (also recommended in the BigTable documentation; see experiment D).

Proposed design

Taking into account these factors:

  1. Feast feature values will in most cases occupy less space than their metadata (timestamp + qualifier + qualifier length) if stored one feature per column
  2. BigTable doesn't have any notion of a schema, so we must somehow store the feature type along with the value, which would increase the metadata part even more and also requires us to take care of serialization / deserialization

we propose the following design:

A. Use entities as table name
B. Use entity values as row key
C. Store all features (from single Feature Table) as single byte array (one column with empty qualifier)
D. Prepend byte array with schema reference (hash of schema of 4/8 bytes)
E. Store schema under separate key (schema#<hash>) in the same table

In our implementation (#46) we also plan to use Avro as the serializer (although we leave room for alternatives).
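
A minimal sketch of this write path in Python (assuming fastavro for schemaless encoding; the hash choice and helper names here are illustrative assumptions, not necessarily what #46 does):

import io
import json
import hashlib
import fastavro

# Avro schema corresponding to the customer_orders example below (illustrative).
SCHEMA = {
    "type": "record",
    "name": "customer_orders",
    "fields": [
        {"name": "money_spent", "type": "double"},
        {"name": "orders_amount", "type": "int"},
    ],
}
PARSED = fastavro.parse_schema(SCHEMA)

# 4-byte schema reference (design point D); the concrete hash function is an assumption.
SCHEMA_REF = hashlib.md5(json.dumps(SCHEMA).encode()).digest()[:4]

def encode_row_value(features: dict) -> bytes:
    # Avro-encode the feature values without embedding the schema (schemaless writer),
    # then prepend the 4-byte schema reference (design points C + D).
    buf = io.BytesIO()
    fastavro.schemaless_writer(buf, PARSED, features)
    return SCHEMA_REF + buf.getvalue()

# The value goes into one column with an empty qualifier; the schema itself is written
# once under row key b"schema#" + SCHEMA_REF (design point E).
value = encode_row_value({"money_spent": 42.5, "orders_amount": 3})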

Example

FeatureTable(
   name = "customer_orders",
   entities = ["customer_id", "merchant_id"],
   features = [
      Feature("money_spent", DOUBLE),
      Feature("orders_amount", INTEGER)
   ],
   max_age = Duration("300s")
)

Table: customer_id__merchant_id

Key | Column Family | Column Qualifier | Value
--- | --- | --- | ---
1111#2222 | customer_orders (TTL=300) | (empty) | 0x00 0x00 0xAA 0xAA 0x00 0x00 0x01 ... 0x01 (^ schema reference ^ followed by ^ avro-serialized features ^)
schema#\x00\x00\xAA\xAA | metadata | avro | {"type": "record", "name": "customer_orders", "fields": [{"name": "money_spent", "type": "double"}, {"name": "orders_amount", "type": "int"}]}
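
For completeness, a read-path sketch against this layout (a sketch only, assuming the google-cloud-bigtable Python client and fastavro; project and instance names are placeholders, not necessarily how #46 is implemented):

import io
import json
import fastavro
from google.cloud import bigtable

client = bigtable.Client(project="my-gcp-project")        # placeholder
instance = client.instance("feast-online")                 # placeholder
table = instance.table("customer_id__merchant_id")

row = table.read_row(b"1111#2222")
value = row.cells["customer_orders"][b""][0].value         # family -> empty qualifier -> cell

schema_ref, payload = value[:4], value[4:]                 # 4-byte schema reference prefix

# Resolve the schema from the same table (cached in practice, see the discussion below).
schema_row = table.read_row(b"schema#" + schema_ref)
schema_json = schema_row.cells["metadata"][b"avro"][0].value
schema = fastavro.parse_schema(json.loads(schema_json))

features = fastavro.schemaless_reader(io.BytesIO(payload), schema)
# -> {"money_spent": ..., "orders_amount": ...}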
pyalex changed the title from "[New Feature] Seeking alternative for Redis as Online Storage. BigTable research" to "[New Feature] Seeking alternative to Redis as Online Storage. BigTable research" on Mar 22, 2021
pyalex added the documentation and enhancement labels on Mar 22, 2021
@pradithya

A. Use entities as table name

Will it become a performance issue, considering some tables will be bigger than others?
Also considering the recommendation from BigTable (https://cloud.google.com/bigtable/docs/schema-design#tables):

Creating many small tables is a Cloud Bigtable anti-pattern for a few reasons:

- Sending requests to many different tables can increase backend connection overhead, resulting in increased tail latency.
- Having multiple tables of different sizes can disrupt the behind-the-scenes load balancing that makes Cloud Bigtable function well.

@pradithya

C. Store all features (from single Feature Table) as single byte array (one column with empty qualifier)
D. Prepend byte array with schema reference (hash of schema of 4/8 bytes)
E. Store schema under separate key (schema#<hash>) in the same table

If I understand correctly, there will be 2 queries to BigTable: one to retrieve the values and a second one to retrieve the schema. Only after that can the particular row be decoded. Isn't that going to be a slow process?


pyalex commented Mar 22, 2021

C. Store all features (from single Feature Table) as single byte array (one column with empty qualifier)
D. Prepend byte array with schema reference (hash of schema of 4/8 bytes)
E. Store schema under separate key (schema#<hash>) in the same table

If I understand correctly, there will be 2 queries to BigTable: one to retrieve the values and a second one to retrieve the schema. Only after that can the particular row be decoded. Isn't that going to be a slow process?

The assumption is that the schema is not updated very frequently and thus can be cached on the serving side. And the reference is a hash of the schema itself.
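
A tiny illustrative sketch of that caching, keyed by the 4-byte schema reference (hypothetical helper, not the actual #46 code):

import json
import fastavro

_schema_cache = {}  # schema reference (bytes) -> parsed Avro schema

def get_schema(table, schema_ref: bytes):
    # Fetch and parse the schema row only on a cache miss; since schemas change rarely,
    # steady-state lookups never touch BigTable.
    if schema_ref not in _schema_cache:
        row = table.read_row(b"schema#" + schema_ref)
        schema_json = row.cells["metadata"][b"avro"][0].value
        _schema_cache[schema_ref] = fastavro.parse_schema(json.loads(schema_json))
    return _schema_cache[schema_ref]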


pyalex commented Mar 22, 2021

A. Use entities as table name

Will it become a performance issue, considering some tables will be bigger than others?
Also considering the recommendation from BigTable (https://cloud.google.com/bigtable/docs/schema-design#tables):

Creating many small tables is a Cloud Bigtable anti-pattern for a few reasons:

- Having multiple tables of different sizes can disrupt the behind-the-scenes load balancing that makes Cloud Bigtable function well.

The question is what counts as many and what counts as small. The assumption is that there are not that many entities (at Gojek we have 30-40 entities with 300 Feature Tables) and some datasets per entity reach 100 GB.

Alternatively, we could add the entity key into the row key, but wouldn't that increase the row size?

> - Sending requests to many different tables can increase backend connection overhead, resulting in increased tail latency.

For one Feast request we're always gonna need only one table.

On the load-balancing note, I was thinking of separating tables into different BigTable instances, probably using Feast projects (one instance per project, e.g.). That would also help us scale them independently, since some projects are used more than others.

