Wrong return type of `read_csv` if using IO-Objects #87

Barry1 · 2021-08-10T18:49:21Z

When I send an StringIO parameter to read_csv I get typing errors telling the result would be a TextFileReader.
But this feels completely wrong.
In all my work, it returns a DataFrame for example on an StringIO.
Thus I propose to change.
A corresponding pr would come in a few seconds.

The text was updated successfully, but these errors were encountered:

jakebailey · 2021-08-10T19:21:22Z

I think the overloads overall need to be different besides just that one change; reading the implementation and docs, it's dependant on chunksize or iterator being set that causes it to return a TextFileReader.

Barry1 · 2021-08-10T19:26:43Z

I think the overloads overall need to be different besides just that one change; reading the implementation and docs, it's dependant on chunksize or iterator being set that causes it to return a TextFileReader.

I think, you are completely right. I further think, that a first measure should be, to take care of the ReturnValue, which would help others most. As we only have stubs here, I think changing them will not harm.

Dr-Irv · 2021-10-12T20:49:12Z

I spent way too much time on this today, and then discovered this issue. I'm not sure how python type checking with overloads can do this right. Maybe @jakebailey or @erictraut have some ideas (see example below):

The design of the API changes the return type to TextFileReader under any of the following conditions:

chunksize is specified and integer and not None
iterator is specified and set to True

So you have to find a way in the overloads to handle all the possible combinations of chunksize and iterator, both of which are keyword arguments, and either or both may not be specified. That is reflected in the following table

`iterator`	`chunksize`	Return Type
Not specified	Not specified	`DataFrame`
`False`	Not specified	`DataFrame`
`False`	`None`	`DataFrame`
`False`	An integer	`TextFileReader`
`True`	Not specified	`TextFileReader`
`True`	`None`	`TextFileReader`
`True`	An integer	`TextFileReader`
Not specified	`None`	`DataFrame`
Not specified	An integer	`TextFileReader`

To help me figure this out, I wrote a simple test, where i1 is iterator and chunksize is cs, and a return type of int corresponds to DataFrame and a return type of str corresponds to TextFileReader :

@overload
def myfun(fake: str, i1: Literal[True], cs: Literal[None]) -> str:
    ...


@overload
def myfun(fake: str, i1: Literal[True], cs: int) -> str:
    ...


@overload
def myfun(fake: str, i1: Literal[True]) -> str:
    ...


@overload
def myfun(fake: str) -> int:
    ...


@overload
def myfun(fake: str, i1: Literal[False]) -> int:
    ...


@overload
def myfun(fake: str, i1: Literal[False], cs: Literal[None]) -> int:
    ...


@overload
def myfun(fake: str, i1: Literal[False], cs: int) -> str:
    ...


# @overload
# def myfun(fake: str, i1: bool = False, cs: None = None) -> int:
#     ...


# @overload
# def myfun(fake: str, i1: bool = ..., cs: int = ...) -> str:
#     ...


def myfun(fake: str, i1: bool = False, cs: Optional[int] = None) -> Union[str, int]:
    print(f"i1 is {i1} cs is {cs} result is ", end="")
    if i1:
        if cs is not None:
            return "TextFileReader"
        else:
            return "TextFileReader"
    else:
        if cs is not None:
            return "TextFileReader"
        else:
            return -1


res1: int = myfun("meh")
res2: int = myfun("meh", i1=False)
res3: int = myfun("meh", i1=False, cs=None)
res4: str = myfun("meh", i1=False, cs=23)
res5: str = myfun("meh", i1=True)
res6: str = myfun("meh", i1=True, cs=None)
res7: str = myfun("meh", i1=True, cs=23)
res8: int = myfun("meh", cs=None)
res9: str = myfun("meh", cs=23)

I could not figure out how to get the last 2 function calls (which correspond to the last 2 lines in the table) to have a proper overload, because in those cases, i1 (equivalently iterator) is not specified, but cs is specified. I've commented out some attempts but I'd get complaints about overlapping signatures.

If someone can show me how to get the above to work, then I could make the changes to read_csv() typing to correspond.

erictraut · 2021-10-12T21:47:53Z

I think this can be achieved by the following:

@overload
def myfun(fake: str, i1: Literal[True], cs: int | None = ...) -> str:
    ...

@overload
def myfun(fake: str, i1: Literal[False], cs: int) -> str:
    ...

@overload
def myfun(fake: str, i1: Literal[False] = ..., cs: None = ...) -> int:
    ...

@overload
def myfun(fake: str, i1: bool = ..., cs: int = ...) -> str:
    ...

Dr-Irv · 2021-10-12T22:02:08Z

Thanks @erictraut . That makes pyright happy, but mypy complains. I'll open up an issue there.

Barry1 mentioned this issue Aug 10, 2021

Update parsers.pyi #88

Closed

This was referenced Oct 12, 2021

false positive: "Overloaded function signatures ... overlap with incompatible return types" python/mypy#11165

Open

fix overloads for read_csv and read_table #109

Merged

jakebailey closed this as completed in #109 Oct 14, 2021

Dr-Irv mentioned this issue Feb 4, 2022

fix read_csv to work with different overloads #143

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong return type of `read_csv` if using IO-Objects #87

Wrong return type of `read_csv` if using IO-Objects #87

Barry1 commented Aug 10, 2021

jakebailey commented Aug 10, 2021

Barry1 commented Aug 10, 2021

Dr-Irv commented Oct 12, 2021

erictraut commented Oct 12, 2021

Dr-Irv commented Oct 12, 2021

Wrong return type of read_csv if using IO-Objects #87

Wrong return type of read_csv if using IO-Objects #87

Comments

Barry1 commented Aug 10, 2021

jakebailey commented Aug 10, 2021

Barry1 commented Aug 10, 2021

Dr-Irv commented Oct 12, 2021

erictraut commented Oct 12, 2021

Dr-Irv commented Oct 12, 2021

Wrong return type of `read_csv` if using IO-Objects #87

Wrong return type of `read_csv` if using IO-Objects #87