bug: first returns different values across backends #4918

Closed
1 task done
Hoxbro opened this issue Nov 28, 2022 · 4 comments · Fixed by #6074
Labels
bug Incorrect behavior inside of ibis pandas The pandas backend

Comments

Hoxbro commented Nov 28, 2022

What happened?

I would expect first (or last) on a table column to return a single value, not a pd.Series containing that value repeated.

import os
import sqlite3

import ibis
import pandas as pd


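def create_sqlite(df, name):
    filename = "tmp.sqlite"
    # Remove a stale database from a previous run, if there is one.
    if os.path.exists(filename):
        os.remove(filename)
    con = sqlite3.connect(filename)
    df.to_sql(name, con, index=False)
    con.close()
    return ibis.sqlite.connect(filename).table(name)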


df = pd.DataFrame(
    {"string": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]},
)
sqlite_table = create_sqlite(df, "table")
pandas_table = ibis.pandas.connect({"df": df}).table("df")

pandas_table["string"], sqlite_table["string"]
pandas_table["string"].first().execute()
sqlite_table["string"].first().execute()

(image not preserved)

What version of ibis are you using?

3.2.0

What backend(s) are you using, if any?

Pandas and Sqlite

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Hoxbro added the bug (Incorrect behavior inside of ibis) label Nov 28, 2022

cpcloud (Member) commented Nov 28, 2022

Hi @Hoxbro, thanks for the issue!

The first and last APIs are based on the SQL window functions FIRST_VALUE and LAST_VALUE, which produce columns.
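
To make the "produce columns" point concrete, here is a minimal sketch using Python's built-in sqlite3 module directly rather than ibis. The `days` table and the `ORDER BY rowid` ordering are assumptions made for the example, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database; `days` is a hypothetical table used only for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE days (string TEXT)")
con.executemany(
    "INSERT INTO days VALUES (?)",
    [(d,) for d in ["Mon", "Tue", "Wed"]],
)

# Window function: one output row per input row, each holding the first value.
window_rows = con.execute(
    "SELECT first_value(string) OVER (ORDER BY rowid) FROM days"
).fetchall()

# Scalar alternative: a single row containing the first value.
scalar_row = con.execute(
    "SELECT string FROM days ORDER BY rowid LIMIT 1"
).fetchone()

print(window_rows)  # [('Mon',), ('Mon',), ('Mon',)]
print(scalar_row)   # ('Mon',)
con.close()
```

The window query returns as many rows as the table has, which is why a backend that follows FIRST_VALUE semantics hands back a column instead of a scalar.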

It sounds like you want first to handle both cases (single value in an aggregate setting, and window behavior in a window setting).
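
In plain pandas terms (not the ibis API), the two cases might look like the sketch below. The `week` grouping column is invented for the example; `groupby(...).first()` stands in for the aggregate setting and `transform("first")` for the window setting.

```python
import pandas as pd

# Hypothetical data: `week` exists only to give the example a grouping key.
df = pd.DataFrame(
    {
        "week": [1, 1, 2, 2],
        "day": ["Mon", "Tue", "Wed", "Thu"],
    }
)

# Aggregate setting: one value per group.
agg_first = df.groupby("week")["day"].first()

# Window setting: one value per input row, repeated within each group.
win_first = df.groupby("week")["day"].transform("first")

print(agg_first.tolist())  # ['Mon', 'Wed']
print(win_first.tolist())  # ['Mon', 'Mon', 'Wed', 'Wed']
```

The aggregate result has one entry per group, while the window result keeps the length of the original table.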

We have #4149 for that, so I'm going to close this out!

Thanks again for reporting!

cpcloud closed this as completed Nov 28, 2022
cpcloud added the question (Questions about the library) label Nov 28, 2022

cpcloud (Member) commented Nov 28, 2022

@jcrist pointed out to me that this is a slightly different issue: the backend behaviors don't match here. I think we can address this issue without addressing #4149, so I'll reopen this one.

cpcloud reopened this Nov 28, 2022
cpcloud added the pandas (The pandas backend) label and removed the question (Questions about the library) label Nov 28, 2022
cpcloud added this to the 5.0 milestone Jan 30, 2023
cpcloud changed the title from "bug: first return a pd.Series for sqlite backend" to "bug: first returns different values across backends" Jan 31, 2023
cpcloud removed this from the 5.0 milestone Mar 10, 2023
cpcloud added this to the 6.0 milestone Apr 4, 2023
cpcloud removed this from the 6.0 milestone Apr 11, 2023

mesejo (Contributor) commented Apr 24, 2023

Hey, I want to take this for a spin. Could you clarify which of the outputs is the correct one? I think it should be the pandas one, but I'm not sure. Thanks!

cpcloud (Member) commented Apr 24, 2023

@mesejo Great!

The SQLite behavior is the correct one, because first currently maps to SQL's first_value window function.
