-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Open
Labels
Description
What would be the best way to canonically describe a dataset, which could be read by both humans and machines?
For example, frequently in our code we have docstrings which look something like:
def get_returns(security_ids):
"""
Retuns mega-dimensional dataset which gives recent returns for a set of
securities by:
- Date
- Return (raw / economic / smoothed / etc)
- Scaling (constant / risk_scaled)
- Span
- Hedged vs Unhedged
Dataset keys are security ids. All dimensions have coords.
"""
This helps when attempting to understand what code is doing while only reading it.
But this isn't consistent between docstrings and can't be read or checked by a machine.
Has anyone solved this problem / have any suggestions for resources out there?
Tangentially related to python/typing#513 (but our issues are less about the type, dimension sizes, and more about the arrays within a dataset, their dimensions, and their names)