Static type check for DataFrame types #17935

@biosunsci

Feature

We know that mypy and the typing module now support TypedDict:

from typing import TypedDict, Literal
class OverlapsDict(TypedDict):
    id: int
    seq_id: int
    pr_order: int
    pos1: int
    seq: str    
    pos2: int
    seq_len: int
    repeat_info: float  # in practice always np.nan, which is a float
    repeat_type: float  # in practice always np.nan, which is a float
    item_type: Literal['overlap']
    devmode: str
    update_time: str

We can use OverlapsDict to constrain dict parameters, like this:

def myfunc(a: OverlapsDict):
    pass
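
As a minimal sketch of what this buys, assuming a trimmed-down, hypothetical OverlapRow in place of the full OverlapsDict, a static checker such as mypy can verify both the keys and the value types of a dict passed to a function:

```python
from typing import Literal, TypedDict


class OverlapRow(TypedDict):
    # Hypothetical, shortened stand-in for OverlapsDict (illustration only).
    id: int
    seq: str
    item_type: Literal["overlap"]


def describe(row: OverlapRow) -> str:
    # A static checker knows row["id"] is an int and row["seq"] is a str.
    return f"{row['id']}:{row['seq']}"


good: OverlapRow = {"id": 1, "seq": "ACGT", "item_type": "overlap"}
print(describe(good))

# A checker such as mypy is expected to reject the call below, because the
# "seq" key is missing and "gap" is not the literal "overlap":
# describe({"id": 2, "item_type": "gap"})
```

The check happens before the program runs; TypedDict itself adds no runtime validation.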

But in many data science scenarios, we need the parameter to be a DataFrame with specific columns of specific dtypes. Is it possible
to provide a new type class, TypedDataFrame, that can be used as in the following code?

class OverlapsDataFrame(TypedDataFrame):
    id: int
    seq_id: int
    pr_order: int
    pos1: int
    seq: str    
    pos2: int
    seq_len: int
    repeat_info: float  # in practice always np.nan, which is a float
    repeat_type: float  # in practice always np.nan, which is a float
    item_type: Literal['overlap']
    devmode: str
    update_time: str

We could then constrain DataFrame parameters with OverlapsDataFrame:

def myfunc2(a: OverlapsDataFrame):
    pass

The constraint is that a must be a DataFrame whose columns have the names and types defined by OverlapsDataFrame.
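
To make the intended constraint concrete, here is a rough runtime sketch. It is not the static check requested above, and it uses no existing mypy or pandas feature. The TypedDataFrame base class, the SmallOverlaps schema, the DTYPE_NAMES table, and the schema_errors helper are all hypothetical names introduced for illustration. The check works on a plain mapping of column name to dtype string so that it stays independent of pandas itself.

```python
from typing import Dict, List, Mapping, get_type_hints


class TypedDataFrame:
    """Hypothetical base class; nothing like this ships with mypy or pandas."""

    @classmethod
    def schema(cls) -> Dict[str, type]:
        # Column name -> annotated Python type, read from the subclass body.
        return get_type_hints(cls)


class SmallOverlaps(TypedDataFrame):
    # Hypothetical, shortened version of OverlapsDataFrame.
    id: int
    seq: str
    repeat_info: float


# Assumed mapping from annotation to acceptable dtype names (an assumption,
# not an official table).
DTYPE_NAMES = {
    int: {"int64", "int32"},
    float: {"float64", "float32"},
    str: {"object", "string"},
}


def schema_errors(schema_cls, dtypes: Mapping[str, str]) -> List[str]:
    """Return one message per missing column or mismatched dtype."""
    errors = []
    for column, py_type in schema_cls.schema().items():
        if column not in dtypes:
            errors.append(f"missing column: {column}")
        elif dtypes[column] not in DTYPE_NAMES.get(py_type, set()):
            errors.append(
                f"{column}: expected {py_type.__name__}, got {dtypes[column]}"
            )
    return errors


# For a real DataFrame `df`, the mapping could be built with
# {name: str(dtype) for name, dtype in df.dtypes.items()}.
observed = {"id": "int64", "seq": "object", "repeat_info": "int64"}
print(schema_errors(SmallOverlaps, observed))
```

A check like this only fires when the function is called. The request here is for the same column and dtype information to be visible to the type checker ahead of time.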

Pitch
