Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug(duckdb): re-registered view doesn't reflect changed schema #4729

Closed
gforsyth opened this issue Oct 26, 2022 · 1 comment · Fixed by #4735
Closed

bug(duckdb): re-registered view doesn't reflect changed schema #4729

gforsyth opened this issue Oct 26, 2022 · 1 comment · Fixed by #4735

Comments

@gforsyth
Copy link
Member

I initially register a csv file using con.register

[ins] In [81]: movielens = con.register("/home/gil/data/movielens/ratings.csv", table_name="movielens")

[ins] In [82]: movielens
Out[82]: 
AlchemyTable: movielens
  column0 int32
  imdb_id int32
  rating  int32

This looks correct but imdb_id is actually a string column and I haven't told DuckDB to sample enough rows to detect that. But if I use ALL_VARCHAR=True to force all columns to load as strings, the returned table has the same schema as before:

[ins] In [83]: movielens = con.register(
          ...:     "/home/gil/data/movielens/ratings.csv", table_name="movielens", ALL_VARCHAR=True
          ...: )

[ins] In [84]: movielens
Out[84]: 
AlchemyTable: movielens
  column0 int32
  imdb_id int32
  rating  int32

But if I register and change the underlying table name, the new schema is reflected:

[ins] In [85]: movielens = con.register(
          ...:     "/home/gil/data/movielens/ratings.csv", table_name="_movielens", ALL_VARCHAR=True
          ...: )

[ins] In [86]: movielens
Out[86]: 
AlchemyTable: _movielens
  column0 string
  imdb_id string
  rating  string

The underlying call is CREATE OR REPLACE VIEW -- the view is being replaced in DuckDB

[ins] In [93]: con.con.execute("DESCRIBE movielens").fetchall()
Out[93]: 
[('column0', 'VARCHAR', 'YES', None, None, None),
 ('imdb_id', 'VARCHAR', 'YES', None, None, None),
 ('rating', 'VARCHAR', 'YES', None, None, None)]

but something in duckdb-engine or ibis seems to be caching the alchemy table schema.

@jcrist
Copy link
Member

jcrist commented Oct 26, 2022

IIRC having traced this problem before, this caching happens at the SQLAlchemy level in the MetaData object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants