Representing & checking Dataset schemas  

What would be the best way to canonically describe a dataset, which could be read by both humans and machines?

For example, frequently in our code we have docstrings which look something like:

```
def get_returns(security_ids):
    """
    Retuns mega-dimensional dataset which gives recent returns for a set of
        securities by:
    - Date
    - Return (raw / economic / smoothed / etc)
    - Scaling (constant / risk_scaled)
    - Span
    - Hedged vs Unhedged

    Dataset keys are security ids. All dimensions have coords.
    """
```

This helps when attempting to understand what code is doing while only reading it. 
But this isn't consistent between docstrings and can't be read or checked by a machine. 
Has anyone solved this problem / have any suggestions for resources out there?

Tangentially related to https://github.com/python/typing/issues/513 (but our issues are less about the type, dimension sizes, and more about the arrays within a dataset, their dimensions, and their names)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Representing & checking Dataset schemas #1900

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Representing & checking Dataset schemas #1900

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions