Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Consistent lower case for column and table names if case-insensitive #6772

Closed
1 task done
binste opened this issue Aug 4, 2023 · 5 comments
Closed
1 task done
Labels
feature Features or general enhancements

Comments

@binste
Copy link
Contributor

binste commented Aug 4, 2023

Is your feature request related to a problem?

Ibis uses upper case column and table names with the Snowflake backend. Although this is consistent with how Snowflake stores case-insensitive identifiers internally, see this FAQ, as a user I'd prefer to be able to always use lower case for the following reasons:

  • Better portability of queries between backends. DuckDB backend uses lower case but if I then want to execute the same code with the Snowflake backend I have to rewrite the query
  • What I'm used to from other libraries such as SQLAlchemy
  • Columns generated by Ibis are lower case and so I get a mix in an output, see this Voltron Data blog post, section"create the report":
    image
    • I haven't tested if this also affects how column names are stored in Snowflake when I'd create a table

Describe the solution you'd like

I'd prefer a consistent convention for column and table names across backends. I'm used to lower case but that might be due to the databases I have used in the past.

  • connection.table("TABLE_NAME") -> connection.table("table_name")
  • expression.filter("COLUMN_NAME") -> expression.filter("column_name")
  • ...

What version of ibis are you running?

6.1

What backend(s) are you using, if any?

Snowflake, DuckDB

Code of Conduct

  • I agree to follow this project's Code of Conduct
@binste binste added the feature Features or general enhancements label Aug 4, 2023
@lostmygithubaccount
Copy link
Member

hi @binste, thanks for the feedback!

it wouldn't fully address this, but are you aware of the .relabel("snake_case") functionality? this will at least allow you to uniformly mutate all column names to lower/snake case

[ins] In [1]: import ibis

[ins] In [2]: ibis.options.interactive = True

[ins] In [3]: t = ibis.examples.penguins.fetch()

[ins] In [4]: t = t.relabel("ALL_CAPS").limit(3)

[ins] In [5]: t
Out[5]:
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ SPECIESISLANDBILL_LENGTH_MMBILL_DEPTH_MMFLIPPER_LENGTH_MMBODY_MASS_GSEXYEAR  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ AdelieTorgersen39.118.71813750male2007 │
│ AdelieTorgersen39.517.41863800female2007 │
│ AdelieTorgersen40.318.01953250female2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

[ins] In [6]: t = t.relabel("snake_case").limit(3)

[ins] In [7]: t
Out[7]:
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ speciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsexyear  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ AdelieTorgersen39.118.71813750male2007 │
│ AdelieTorgersen39.517.41863800female2007 │
│ AdelieTorgersen40.318.01953250female2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

@cpcloud
Copy link
Member

cpcloud commented Aug 4, 2023

@binste I hear what you're saying. I also find it annoying that there's not consistency across backends regarding identifier case.

Such behavior would depend on the driver not trying to subvert the database, and snowflake-sqlalchemy does the complete wrong thing trying to achieve what you're asking for. We have a hack to workaround this.

Here's a concrete problem: if a Snowflake metadata query returns MY_TABLE, how do I know whether that was created with CREATE TABLE "MY_TABLE" (explicitly asking for all caps) or CREATE TABLE MY_TABLE (case insensitive)?

This also complicates quoting behavior, because a quoted lowercase name in snowflake cannot be used to query the unquoted version.

I think relabel("snake_case") is probably the right implementation complexity/automagic tradeoff.

@binste
Copy link
Contributor Author

binste commented Aug 5, 2023

Thank you both for the inputs! Indeed, relabel("snake_case") seems like a simple way to solve this. I now also found #5741 and all the referenced GitHub issues and went a bit down that rabbit hole myself...

Happy to close this if you think you have explored all options and snowflake-sqlalchemy is fundamentally broken. Regarding the problem you posed @cpcloud, I think for all uppercase identifiers we don't need to know if they were stored with quotes and hence all caps or as case insensitive as Snowflake will resolve any identifier casing correctly as long as we don't quote the identifier, see https://docs.snowflake.com/en/sql-reference/identifiers-syntax#label-identifier-casing. For example, I can do:

CREATE TABLE "MY_TABLE" as select 1 as col1;

-- All of these work:
select *
from my_table;

select *
from my_tABle;

select *
from MY_TABLE;

-- And they also work if I do CREATE TABLE MY_TABLE ... without quotes

which is why I think the snowflake-sqlalchemy package does this in their normalize_name function:

if name.upper() == name and not self.identifier_preparer._requires_quotes(
    name.lower()
):
    return name.lower()

I think that's the part you removed in #5741, right?

I have so far understood the behavior of snowflake-sqlalchemy as: Only quote identifiers in queries if they are explicitly lowercase or mixed case. As an example:

import sqlalchemy as sa
my_snowflake_engine.execute("""CREATE OR REPLACE TABLE "MY_TABLE" as select 1 as col1, 2 as "CoL2", 3 as "COL3", 4 as "col4" """)
my_table = sa.Table("my_table", sa.MetaData(my_snowflake_engine), autoload=True)
str(sa.select(my_table))

Gives

SELECT my_table.col1, my_table."CoL2", my_table.col3, my_table."col4" 
FROM my_table

where the table, col1, and col3 don't need quotes and only col2 and col4.

Does this help in any way or am I missing something?

@cpcloud
Copy link
Member

cpcloud commented Aug 6, 2023

@binste

Here's an example showing strange behavior resulting from snowflake-sqlalchemy's attempted solution:

import os

import sqlalchemy as sa

eng = sa.create_engine(os.environ["SNOWFLAKE_URL"])

with eng.begin() as c:
    c.exec_driver_sql("DROP TABLE IF EXISTS \"t\"")
    c.exec_driver_sql("DROP TABLE IF EXISTS t")
    c.exec_driver_sql("DROP TABLE IF EXISTS T")
    c.exec_driver_sql("DROP TABLE IF EXISTS \"T\"")

    c.exec_driver_sql('CREATE TEMP TABLE "t" (x INT)')
    c.exec_driver_sql('CREATE TEMP TABLE "T" (x INT)')

inspector = sa.inspect(eng)
names = inspector.get_table_names()
print("number of tables that can be referred to as `t`:", names.count("t"))

This code prints:

number of tables that can be referred to as `t`: 2

which means that if someone asks for the table named t, the table that is returned is ambiguous.

This alone is motivation enough to preserve case as it comes exactly from the database.

@binste
Copy link
Contributor Author

binste commented Aug 6, 2023

Oh wow... I now see what you mean. Thank you both for walking me through it, this is much appreciated! Preserving case + optionally using relabel("snake_case") indeed seems to be the most reliable way to deal with it and I'll do that from now on.

@binste binste closed this as completed Aug 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
None yet
Development

No branches or pull requests

3 participants