Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use case: counting the average number of consecutive days BTC closes below its SMA50D #27

Closed
s-leroux opened this issue Apr 7, 2024 · 8 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@s-leroux
Copy link
Owner

s-leroux commented Apr 7, 2024

Implement the necessary code and documentation to cover the following use case:

We want to know the average number of consecutive days BTC closes below its SMA50D.

@s-leroux s-leroux added documentation Improvements or additions to documentation enhancement New feature or request labels Apr 7, 2024
@s-leroux
Copy link
Owner Author

s-leroux commented Apr 8, 2024

A possible solution:

quote.select(
    (fc.named("SMA"), fc.sma(50), "CLOSE"),
    (fc.named("COMP"), fc.lt, "CLOSE", "SMA"),
).group_by(
    ("COMP",),
    (f.cnamed("COUNT"), ag.count, "COMP"),
).select(
    (fc.ne, fc.constant(0), "COMP"),
).group_by(
    (fc.named("AVG"), ag.avg, "COUNT"),
)

This requires:

  • A group_by function with a different semantic from SQL, where rows are grouped by a consecutive sequence of values matching a condition.
  • A set of aggregate functions (ag.count, ag.avg).

Interestingly, tables defined in seq v0.1.0 had a group_by method. We may look at that for inspiration.

@s-leroux
Copy link
Owner Author

s-leroux commented Apr 8, 2024

Interestingly, tables defined in seq v0.1.0 had a group_by method. We may look at that for inspiration.

fin/fin/seq/table.py

Lines 165 to 193 in 9781eeb

def group(self, expr, aggregate_functions = {}):
"""
Return a new table with consecutive rows producing the same value for expr()
grouped together.
"""
def default_mapping(values):
return values[0]
def aggregate(dest, src):
row = [aggregate_functions.get(name, default_mapping)(column) for name, *column in zip(names, *src)]
dest.append(row)
names = self.names()
rows = self.row_iterator()
keys, = self.reval(expr)
prev = object()
group = []
result = []
for key, row in zip(keys, rows):
if key != prev:
prev = key
if group:
aggregate(result, group)
group = []
group.append(row)
if group:
aggregate(result, group)
return table_from_data([*zip(*result)], self.names())

@s-leroux
Copy link
Owner Author

s-leroux commented Apr 9, 2024

Given the implementation in c3b851d a possible solution would be:

quote.select(
    "DATE",
    (fc.named("SMA"), fc.sma(50), "CLOSE"),
    (fc.named("COMP"), fc.lt, "CLOSE", "SMA"), # <--- WRONG: `select()` is not recursive. We can't reference columns created in the same statement
).group_by(
    "COMP",
    (ag.first, "DATE"),
    (ag.count, fc.named("COUNT"), "COMP"),
).where(
    (fc.ne, fc.constant(0), "COMP"),
).group_by(
    fc.constant(1),
    (ag.avg, fc.named("AVG"), "COUNT"),
)
  • I am not quite sure of the last group_by clause.
  • It is also worth noting that it is rather confusing to omit the DATE column in the select clause because it is implied, while it is mandatory in group_by. (fixed by a5aff7b)
  • The first select() statement is wrong since we can't reference a column created in the same statement (whereas it is possible in create())

@s-leroux
Copy link
Owner Author

  • The first select() statement is wrong since we can't reference a column created in the same statement (whereas it is possible in create())

Possible solution:
Introduce the extend() predicate that works like create() but starts with a copy of the receiver's columns instead of an empty list.

The above statement might be rewritten as:

quote["CLOSE"].extend(
    (fc.named("SMA"), fc.sma(50), "CLOSE"),
    (fc.named("COMP"), fc.lt, "CLOSE", "SMA"),
).group_by(
...

@s-leroux
Copy link
Owner Author

Possible solution: Introduce the extend() predicate that works like create() but starts with a copy of the receiver's columns instead of an empty list.

Implemented in 03db264.

@s-leroux
Copy link
Owner Author

s-leroux commented Apr 11, 2024

...
).where(
    "COMP",
).group_by(
...

We also need a where() predicate to select rows.

The where predicate returns a series similar to the receiver, but containing only the rows satisfying some condition(s).

@s-leroux
Copy link
Owner Author

We also need a where() predicate to select rows.

Implemented in 1999c03.

@s-leroux
Copy link
Owner Author

Full use case implemented in ae6efd4 as:

    quote["Close"].extend(
        (fc.named("Sma"), fc.sma(50), "Close"),
        (fc.named("Comp"), fc.lt, "Close", "Sma"),
    ).group_by(
        "Comp",
        (ag.first, "Date"),
        (ag.first, "Comp"),
        (ag.count, fc.named("Count"), "Comp"),
    ).where(
        "Comp",
    ).group_by(
        "Comp",
        (ag.first, "Date"),
        (ag.avg, fc.named("Avg"), "Count"),
    )

s-leroux added a commit that referenced this issue Apr 11, 2024
- create()
- select()
- extend()
- sort_by()
- group_by()
- where()

See docs/snippets/snippet_counting_001.py for an example.
Close #27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

1 participant