From f0d574c5e67b967ea5ed828abc0d26b0f34bc580 Mon Sep 17 00:00:00 2001
From: Fabio Buso
Date: Tue, 2 Sep 2025 14:40:00 +0200
Subject: [PATCH] [FSTORE-1817] Online row size validation adds too much overhead for varbinary

---
 docs/user_guides/fs/feature_group/data_types.md | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/docs/user_guides/fs/feature_group/data_types.md b/docs/user_guides/fs/feature_group/data_types.md
index 6c1d03e72..e0796d16c 100644
--- a/docs/user_guides/fs/feature_group/data_types.md
+++ b/docs/user_guides/fs/feature_group/data_types.md
@@ -139,10 +139,22 @@ The byte size of each column is determined by its data type and calculated as fo
 | VARCHAR(LENGTH) | LENGTH * 4 |
 | VARCHAR(LENGTH) charset latin1; | LENGTH * 1 |
 | TEXT | 256 |
-| VARBINARY(LENGTH) | LENGTH / 1.4 |
+| VARBINARY(LENGTH) | LENGTH |
 | BLOB | 256 |
 | other | 8 |
 
+!!! note "VARCHAR / VARBINARY overhead"
+
+    For VARCHAR and VARBINARY data types, an additional 1 byte is required if the size is less than 256 bytes. If the size is 256 bytes or greater, 2 additional bytes are required.
+
+    Memory allocation is performed in groups of 4 bytes. For example, a VARBINARY(100) requires 104 bytes of memory:
+
+    - 100 bytes for the data itself
+    - 1 byte of overhead
+    - Total = 101 bytes
+
+    Since memory is allocated in 4-byte groups, storing 101 bytes requires 26 groups (26 × 4 = 104 bytes) of allocated memory.
+
 #### Pre-insert schema validation for online feature groups
 
 For online enabled feature groups, the dataframe to be ingested needs to adhere to the online schema definitions. The input dataframe is validated for schema checks accordingly.
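
The sizing rule introduced by the note in this patch can be sketched in a few lines of Python. This is only an illustration of the arithmetic as the note describes it, not the validation code that Hopsworks actually runs. The function name `varlen_column_bytes` and the `group_size` parameter are hypothetical, and `size_bytes` is assumed to be the per-column byte size taken from the table (for example `LENGTH` for VARBINARY, `LENGTH * 4` for VARCHAR).

```python
def varlen_column_bytes(size_bytes: int, group_size: int = 4) -> int:
    """Estimate the allocated bytes for a VARCHAR/VARBINARY column.

    Hypothetical helper that follows the rule stated in the note:
    length overhead first, then rounding up to whole 4-byte groups.
    """
    # 1 byte of overhead below 256 bytes, 2 bytes at 256 bytes or more.
    overhead = 1 if size_bytes < 256 else 2
    raw = size_bytes + overhead
    # Ceiling division: number of whole groups needed to hold `raw` bytes.
    groups = -(-raw // group_size)
    return groups * group_size


# VARBINARY(100): 100 data bytes + 1 overhead byte = 101, rounded up to 26 groups.
print(varlen_column_bytes(100))  # 104
```

With these assumptions, the VARBINARY(100) example from the note yields 104 bytes, and a 256-byte column yields 260 bytes (256 + 2 bytes of overhead, rounded up to 65 groups).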