Resampling MVP #1010

alexowens90 · 2023-10-30T09:59:00Z

The aim is to provide an API and a minimal feature set that can be extended to cover all of the (useful) functionality provided by the Pandas resample method.

Long term, we will need to support the functionality provided by the arguments rule, closed, label, origin, and offset.

For non-trivial bucket boundaries (e.g. last Thursday of every month) we should leverage Pandas to generate the actual boundaries of interest to pass to the C++ layer. For simpler boundaries (e.g. minute bars) we can have a more compact representation, although this is not required for the MVP.
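As a rough illustration of the "leverage Pandas" idea, the sketch below builds the "last Thursday of every month" boundaries with a pandas offset and converts them to integer nanoseconds since epoch. The helper name `last_thursday_boundaries` is hypothetical and only for illustration; how the result would actually be handed to the C++ layer is not specified here.

```python
import pandas as pd


def last_thursday_boundaries(start: str, end: str) -> list[int]:
    """Hypothetical helper: bucket edges at the last Thursday of each month.

    Returns UTC timestamps as integer nanoseconds since the Unix epoch.
    """
    # weekday=3 is Thursday (Monday=0) in pandas offsets.
    rule = pd.offsets.LastWeekOfMonth(weekday=3)
    edges = pd.date_range(start=start, end=end, freq=rule, tz="UTC")
    # Timestamp.value is nanoseconds since the Unix epoch.
    return [ts.value for ts in edges]


boundaries = last_thursday_boundaries("2023-01-01", "2023-03-31")
for b in boundaries:
    print(pd.Timestamp(b, tz="UTC"))
```

Consecutive entries of the returned list can then be paired up to form the buckets.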

Proposed MVP:

- Do not support upsampling, only downsampling.
- Present the resampled data back to the user, with no option to write the data back to another symbol directly.
- Leverage Pandas to convert rule, origin, and offset into a list of pairs of UTC timestamps, stored as int64_t nanoseconds since epoch, representing the bucket boundaries.
- Pass the closed and label arguments directly through to the clause constructor.
- Use the QueryBuilder directly rather than adding syntactic-sugar methods to the Library or NativeVersionStore classes.
- Use a "data-driven" approach to empty buckets, i.e. only include buckets in the output for which there was an index value in the appropriate range.
- Have a single clause to handle resampling (as opposed to the 2-stage process for hash-based groupings), since the repartition would always be trivial for resampling.
- Support static schema only.
- No "named agg" equivalent, so only one aggregation is possible per input column.
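Two of the points above (turning rule, origin, and offset into pairs of nanosecond timestamps, and the "data-driven" treatment of empty buckets) can be sketched in Python. This is only an assumption about how the Python layer might do it: `bucket_pairs` and `non_empty` are hypothetical names, the sketch fixes closed="left", and it uses the "min" alias, which pandas accepts for minute bars alongside the older "T".

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset


def bucket_pairs(start, end, rule, origin="epoch", offset=None):
    """Hypothetical helper: (left_ns, right_ns) pairs covering [start, end]."""
    # A two-point probe series lets pandas apply rule/origin/offset for us.
    probe = pd.Series(0, index=pd.DatetimeIndex([start, end], tz="UTC"))
    resampler = probe.resample(
        rule, closed="left", label="left", origin=origin, offset=offset
    )
    left = resampler.size().index       # left edge of every bucket
    right = left + to_offset(rule)      # right edge is one period later
    return [(l.value, r.value) for l, r in zip(left, right)]


def non_empty(pairs, index_ns):
    """Keep only buckets [left, right) that contain an index value."""
    idx = np.sort(np.asarray(index_ns, dtype=np.int64))
    kept = []
    for left, right in pairs:
        lo = np.searchsorted(idx, left, side="left")
        hi = np.searchsorted(idx, right, side="left")
        if hi > lo:  # at least one index value falls in [left, right)
            kept.append((left, right))
    return kept


pairs = bucket_pairs("2023-10-30 09:00:30", "2023-10-30 09:03:10", "min")
ticks = [
    pd.Timestamp(s, tz="UTC").value
    for s in ("2023-10-30 09:00:45", "2023-10-30 09:02:00")
]
kept = non_empty(pairs, ticks)
```

Here the 09:01 and 09:03 buckets contain no index values, so they would not appear in the output.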

e.g.

from arcticdb import QueryBuilder
# lib, sym, t1 and t2 are assumed to be defined elsewhere
q = QueryBuilder()
q = q.resample(rule="T", closed="left", label="left", origin="epoch", offset=None).agg({"open": "first", "close": "last"})
df = lib.read(sym, date_range=(t1, t2), query_builder=q).data
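For cross-checking, the semantics the query above is aiming for can be approximated in plain pandas. The sketch below is a reference for expected output, not the proposed implementation: `pandas_reference` is a hypothetical name, the sample data is made up, and empty buckets are dropped afterwards to mimic the "data-driven" rule.

```python
import pandas as pd


def pandas_reference(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reference: minute bars, first open and last close."""
    resampler = df.resample("min", closed="left", label="left", origin="epoch")
    out = resampler.agg({"open": "first", "close": "last"})
    # pandas emits a row for every bucket; keep only those that had data.
    counts = resampler.size()
    return out[counts > 0]


index = pd.to_datetime(
    ["2023-10-30 09:00:10", "2023-10-30 09:00:50", "2023-10-30 09:02:20"]
)
sample = pd.DataFrame(
    {"open": [1.0, 2.0, 3.0], "close": [1.5, 2.5, 3.5]}, index=index
)
bars = pandas_reference(sample)
print(bars)
```

With this input, the 09:01 bucket has no rows and is therefore absent from `bars`.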


DrNickClarke · 2023-11-16T16:00:33Z

Supported aggregations: sum, mean, min, max, count, first, last

- All are NaN-correct.
- first, last and count support strings.
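One plausible reading of "NaN-correct" is "matches what pandas does with missing values", though that is an assumption rather than something stated here. Under that reading, the sketch below shows the pandas behaviour for the listed aggregations on a made-up numeric column and a made-up string column.

```python
import numpy as np
import pandas as pd

index = pd.to_datetime(
    [
        "2023-11-16 16:00:05",
        "2023-11-16 16:00:20",
        "2023-11-16 16:00:40",
        "2023-11-16 16:01:10",
    ]
)
df = pd.DataFrame(
    {"px": [np.nan, 1.0, 3.0, np.nan], "venue": [None, "A", "B", "C"]},
    index=index,
)

resampler = df.resample("min")
# Numeric column: every listed aggregation skips NaN values.
px = resampler["px"].agg(["sum", "mean", "min", "max", "count", "first", "last"])
# String column: only count, first and last are meaningful.
venue = resampler["venue"].agg(["count", "first", "last"])

print(px)
print(venue)
```

In the 16:00 bucket the leading NaN is ignored, so first is 1.0 and count is 2. The 16:01 bucket holds only a NaN, so pandas reports count 0 and NaN for mean, min, max, first and last.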

mosaikme · 2023-12-20T01:43:23Z

WOW, this would be such an epic addition to this already awesome library (db), esp. with the newly (nearly) added first, last, count. I would be happy to help with testing. All the best.


alexowens90 added the enhancement (New feature or request) label Oct 30, 2023

alexowens90 self-assigned this Oct 30, 2023

alexowens90 mentioned this issue Oct 30, 2023

EPIC: Resampling #1009

Closed

1 task

alexowens90 mentioned this issue May 15, 2024

Resampling MVP #1495

Merged

alexowens90 closed this as completed in #1495 May 30, 2024

alexowens90 added a commit that referenced this issue May 30, 2024

Resampling MVP (#1495)

ac846a8

Closes #1010
