Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong return type of read_csv if using IO-Objects #87

Closed
Barry1 opened this issue Aug 10, 2021 · 5 comments · Fixed by #109
Closed

Wrong return type of read_csv if using IO-Objects #87

Barry1 opened this issue Aug 10, 2021 · 5 comments · Fixed by #109

Comments

@Barry1
Copy link

Barry1 commented Aug 10, 2021

When I send an StringIO parameter to read_csv I get typing errors telling the result would be a TextFileReader.
But this feels completely wrong.
In all my work, it returns a DataFrame for example on an StringIO.
Thus I propose to change.
A corresponding pr would come in a few seconds.

@jakebailey
Copy link
Member

I think the overloads overall need to be different besides just that one change; reading the implementation and docs, it's dependant on chunksize or iterator being set that causes it to return a TextFileReader.

@Barry1
Copy link
Author

Barry1 commented Aug 10, 2021

I think the overloads overall need to be different besides just that one change; reading the implementation and docs, it's dependant on chunksize or iterator being set that causes it to return a TextFileReader.

I think, you are completely right. I further think, that a first measure should be, to take care of the ReturnValue, which would help others most. As we only have stubs here, I think changing them will not harm.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Oct 12, 2021

I spent way too much time on this today, and then discovered this issue. I'm not sure how python type checking with overloads can do this right. Maybe @jakebailey or @erictraut have some ideas (see example below):

The design of the API changes the return type to TextFileReader under any of the following conditions:

  1. chunksize is specified and integer and not None
  2. iterator is specified and set to True

So you have to find a way in the overloads to handle all the possible combinations of chunksize and iterator, both of which are keyword arguments, and either or both may not be specified. That is reflected in the following table

iterator chunksize Return Type
Not specified Not specified DataFrame
False Not specified DataFrame
False None DataFrame
False An integer TextFileReader
True Not specified TextFileReader
True None TextFileReader
True An integer TextFileReader
Not specified None DataFrame
Not specified An integer TextFileReader

To help me figure this out, I wrote a simple test, where i1 is iterator and chunksize is cs, and a return type of int corresponds to DataFrame and a return type of str corresponds to TextFileReader :

@overload
def myfun(fake: str, i1: Literal[True], cs: Literal[None]) -> str:
    ...


@overload
def myfun(fake: str, i1: Literal[True], cs: int) -> str:
    ...


@overload
def myfun(fake: str, i1: Literal[True]) -> str:
    ...


@overload
def myfun(fake: str) -> int:
    ...


@overload
def myfun(fake: str, i1: Literal[False]) -> int:
    ...


@overload
def myfun(fake: str, i1: Literal[False], cs: Literal[None]) -> int:
    ...


@overload
def myfun(fake: str, i1: Literal[False], cs: int) -> str:
    ...


# @overload
# def myfun(fake: str, i1: bool = False, cs: None = None) -> int:
#     ...


# @overload
# def myfun(fake: str, i1: bool = ..., cs: int = ...) -> str:
#     ...


def myfun(fake: str, i1: bool = False, cs: Optional[int] = None) -> Union[str, int]:
    print(f"i1 is {i1} cs is {cs} result is ", end="")
    if i1:
        if cs is not None:
            return "TextFileReader"
        else:
            return "TextFileReader"
    else:
        if cs is not None:
            return "TextFileReader"
        else:
            return -1


res1: int = myfun("meh")
res2: int = myfun("meh", i1=False)
res3: int = myfun("meh", i1=False, cs=None)
res4: str = myfun("meh", i1=False, cs=23)
res5: str = myfun("meh", i1=True)
res6: str = myfun("meh", i1=True, cs=None)
res7: str = myfun("meh", i1=True, cs=23)
res8: int = myfun("meh", cs=None)
res9: str = myfun("meh", cs=23)

I could not figure out how to get the last 2 function calls (which correspond to the last 2 lines in the table) to have a proper overload, because in those cases, i1 (equivalently iterator) is not specified, but cs is specified. I've commented out some attempts but I'd get complaints about overlapping signatures.

If someone can show me how to get the above to work, then I could make the changes to read_csv() typing to correspond.

@erictraut
Copy link
Contributor

I think this can be achieved by the following:

@overload
def myfun(fake: str, i1: Literal[True], cs: int | None = ...) -> str:
    ...

@overload
def myfun(fake: str, i1: Literal[False], cs: int) -> str:
    ...

@overload
def myfun(fake: str, i1: Literal[False] = ..., cs: None = ...) -> int:
    ...

@overload
def myfun(fake: str, i1: bool = ..., cs: int = ...) -> str:
    ...

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Oct 12, 2021

Thanks @erictraut . That makes pyright happy, but mypy complains. I'll open up an issue there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants