
Weird behavior of .drop(fields) #2829

Closed
jcmincke opened this issue Jun 21, 2021 · 3 comments · Fixed by #2831
Labels
- bug: Incorrect behavior inside of ibis
- internals: Issues or PRs related to ibis's internal APIs
- onboarding: Issues that can be addressed by someone less familiar with ibis
- ux: User experience related issues

Comments

@jcmincke
Contributor

```python
import ibis
import ibis_bigquery
import pandas as pd
from google.cloud import bigquery

my_project = 'pj-becfr-eagle-ci-dev'
my_dataset = 'dm_express_assortment_V1'
my_table = 'issue_3_table'

pdf = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

# Load client
client = bigquery.Client(project=my_project)

# Load data to BQ (recreate the table and wait for the load job to finish)
table_id = my_dataset + "." + my_table
client.delete_table(table_id, not_found_ok=True)
job = client.load_table_from_dataframe(pdf, table_id)
job.result()

ibis.options.interactive = False

conn = ibis_bigquery.connect(
    project_id=my_project,
    dataset_id=my_dataset)

t = conn.table(my_table)

e = t.drop("ab")
e.schema()  # a and b are removed

e = t.mutate(ab=t.c)
e.drop("ab")  # a and b are removed

e = t.mutate(ab=t.c)
e.drop(["ab"])  # ab is removed

e = t.mutate(bz=t.c)
e.drop("bz")  # KeyError: "Fields not in table: frozenset({'z'})"
e.drop(["bz"])  # bz is removed

e.drop(e.a)  # ValueError: The truth value of an Ibis expression is not defined
e.drop([e.a])  # KeyError: "Fields not in table: frozenset({ref_0\nBigQueryTable[table]\n  name: ...
```

I think the problem is that a single string and a list of strings are handled the same way.

The `.drop()` method takes a sequence of fields as its argument, and from the examples above each field can be (at least) a string.

So `t.drop(["ab", "bc"])` removes the columns "ab" and "bc", and that works as expected.

However, when I call `t.drop("ab")`, ibis treats "a" and "b" as the two fields to drop. That isn't correct: the elements of a string are characters, not field names (unless we assume that field names can also be single characters...).

Either way, it's quite confusing.
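The behaviour above can be reproduced in plain Python, without BigQuery or ibis. The helper below, `naive_drop`, is hypothetical and is not ibis code; it only assumes that the field argument is turned into a set of names by iterating over it. That assumption alone explains both the silent removal of `a` and `b` and the `frozenset({'z'})` in the error message.

```python
def naive_drop(columns, fields):
    """Hypothetical stand-in: treat `fields` as an iterable of column names."""
    # Iterating a str yields its characters, so "ab" becomes {"a", "b"}.
    to_drop = frozenset(fields)
    missing = to_drop - frozenset(columns)
    if missing:
        raise KeyError(f"Fields not in table: {missing}")
    return [c for c in columns if c not in to_drop]


columns = ["a", "b", "c"]
print(naive_drop(columns, "ab"))              # ['c']: "a" and "b" were dropped
print(naive_drop(columns + ["ab"], ["ab"]))   # ['a', 'b', 'c']

try:
    # "bz" is split into "b" and "z"; only "z" is unknown.
    naive_drop(columns + ["bz"], "bz")
except KeyError as exc:
    print(exc)  # "Fields not in table: frozenset({'z'})"
```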

@datapythonista
Contributor

In Python you can iterate over a string:

```python
>>> for char in 'abc':
...     print(char)
...
a
b
c
```

I guess that's why dropping "ab" drops columns a and b.

I agree with you: this doesn't seem right, and it's confusing and a source of trouble. Did you check whether this happens only with BigQuery or also with other backends? If it's a BigQuery-only problem, the fix will have to go in the BigQuery repo, which is separate from this one.

@jcmincke
Contributor Author

Yes, that's what I suspected, and it's what the first line of my comments was getting at.

No, I did not check any other backend, because it seems to me that the problem occurs while the expression is being built, before the backend compiles it.

That's only an educated guess, though, as I'm not familiar enough with ibis's internals.

@datapythonista
Contributor

It looks like this also happens with other backends. This is the function responsible: https://github.com/ibis-project/ibis/blob/master/ibis/expr/api.py#L4421

@jcmincke, if you want to open a pull request, that would be great. It should be quite easy to implement: check whether the parameter is a string and, if it is, either raise an exception or simply convert it to a one-element list, whichever makes more sense. You'll have to include tests and a release note (docs/source/release/index.rst).
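A minimal sketch of the suggested fix, written against a plain list of column names rather than ibis's real `drop` (whose internals are not reproduced here). The helper `drop_columns` and the test names are hypothetical. The sketch takes the "convert to a one-element list" route and still rejects unknown names; the pytest-style test functions show the kind of regression tests such a PR might include.

```python
from typing import Iterable, List, Union


def drop_columns(columns: List[str], fields: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical helper: drop `fields` from `columns`.

    A bare string is treated as a single column name instead of being
    iterated character by character.
    """
    if isinstance(fields, str):
        fields = [fields]  # "ab" -> ["ab"], not ["a", "b"]
    to_drop = frozenset(fields)
    missing = to_drop - frozenset(columns)
    if missing:
        raise KeyError(f"Fields not in table: {sorted(missing)}")
    return [c for c in columns if c not in to_drop]


def test_drop_single_string_is_one_column():
    assert drop_columns(["a", "b", "c", "ab"], "ab") == ["a", "b", "c"]


def test_drop_list_of_strings():
    assert drop_columns(["a", "b", "c", "bz"], ["bz"]) == ["a", "b", "c"]


def test_drop_unknown_column_raises():
    # With the fix, "ab" on a table with only a, b, c is an error,
    # not a silent drop of a and b.
    try:
        drop_columns(["a", "b", "c"], "ab")
    except KeyError:
        pass
    else:
        raise AssertionError("expected KeyError")


if __name__ == "__main__":
    test_drop_single_string_is_one_column()
    test_drop_list_of_strings()
    test_drop_unknown_column_raises()
    print("all tests passed")
```

Raising an exception for a bare string would be the stricter of the two options; wrapping it keeps a call like `t.drop("ab")` working as a single-column drop.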

@datapythonista added the bug, internals, onboarding and ux labels on Jun 21, 2021
@jreback added this to the Next release milestone on Jun 23, 2021

3 participants