bug(format): converting from Arrow dictionary type is not working #8207
Thanks @buhrmann for the report! Confirmed. A smaller reproducer not involving the duckdb backend is:

    import ibis
    import pandas as pd
    import pyarrow as pa

    df = pd.Series(list("abc"), dtype="category").to_frame()
    tbl = pa.Table.from_pandas(df)
    ibis.memtable(tbl)

The issue is caused by not handling the arrow dictionary type in `PyArrowType.to_ibis()`.
cpcloud pushed a commit that referenced this issue on Feb 8, 2024
cpcloud pushed a commit to cpcloud/ibis that referenced this issue on Feb 12, 2024
cpcloud pushed a commit that referenced this issue on Feb 12, 2024
cpcloud pushed a commit to cpcloud/ibis that referenced this issue on Feb 12, 2024
cpcloud pushed a commit that referenced this issue on Feb 12, 2024
ncclementi pushed a commit to ncclementi/ibis that referenced this issue on Feb 21, 2024
What happened?
Ibis chokes when importing an Arrow table, doesn't seem to support categorical/dictionary data:
What version of ibis are you using?
7.2.0
What backend(s) are you using, if any?
DuckDB
Relevant log output
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[60], line 9
      6 print(tbl)
      8 icon = ibis.duckdb.connect()
----> 9 icon.create_table("dataset", obj=tbl)

File ~/micromamba/envs/grapy/lib/python3.9/site-packages/ibis/backends/base/sql/alchemy/__init__.py:292, in BaseAlchemyBackend.create_table(self, name, obj, schema, database, temp, overwrite)
    289 import pyarrow_hotfix  # noqa: F401
    291 if isinstance(obj, (pd.DataFrame, pa.Table)):
--> 292     obj = ibis.memtable(obj)
    294 if database == self.current_database:
    295     # avoid fully qualified name
    296     database = None

File ~/micromamba/envs/grapy/lib/python3.9/site-packages/ibis/expr/api.py:438, in memtable(data, columns, schema, name)
    433 if columns is not None and schema is not None:
    434     raise NotImplementedError(
    435         "passing `columns` and schema` is ambiguous; "
    436         "pass one or the other but not both"
    437     )
--> 438 return _memtable(data, name=name, schema=schema, columns=columns)

File ~/micromamba/envs/grapy/lib/python3.9/site-packages/ibis/common/dispatch.py:88, in lazy_singledispatch.<locals>.call(arg, *args, **kwargs)
     86 @functools.wraps(func)
     87 def call(arg, *args, **kwargs):
---> 88     return dispatch(type(arg))(arg, *args, **kwargs)

File ~/micromamba/envs/grapy/lib/python3.9/site-packages/ibis/expr/api.py:456, in _memtable_from_pyarrow_table(data, name, schema, columns)
    452     assert schema is None, "if `columns` is not `None` then `schema` must be `None`"
    453     schema = sch.Schema(dict(zip(columns, sch.infer(data).values())))
    454 return ops.InMemoryTable(
    455     name=name if name is not None else util.gen_name("pyarrow_memtable"),
--> 456     schema=sch.infer(data) if schema is None else schema,
    457     data=PyArrowTableProxy(data),
    458 ).to_expr()

File ~/micromamba/envs/grapy/lib/python3.9/site-packages/ibis/common/dispatch.py:88, in lazy_singledispatch.<locals>.call(arg, *args, **kwargs)
     86 @functools.wraps(func)
     87 def call(arg, *args, **kwargs):
---> 88     return dispatch(type(arg))(arg, *args, **kwargs)

File ~/micromamba/envs/grapy/lib/python3.9/site-packages/ibis/expr/schema.py:279, in infer_pyarrow_table(table, schema)
    276 from ibis.formats.pyarrow import PyArrowSchema
    278 schema = schema if schema is not None else table.schema
--> 279 return PyArrowSchema.to_ibis(schema)

File ~/micromamba/envs/grapy/lib/python3.9/site-packages/ibis/formats/pyarrow.py:216, in PyArrowSchema.to_ibis(cls, schema)
    213 @classmethod
    214 def to_ibis(cls, schema: pa.Schema) -> Schema:
    215     """Convert a pyarrow schema to a schema."""
--> 216     fields = [(f.name, PyArrowType.to_ibis(f.type, f.nullable)) for f in schema]
    217     return Schema.from_tuples(fields)

File ~/micromamba/envs/grapy/lib/python3.9/site-packages/ibis/formats/pyarrow.py:216, in <listcomp>(.0)
    213 @classmethod
    214 def to_ibis(cls, schema: pa.Schema) -> Schema:
    215     """Convert a pyarrow schema to a schema."""
--> 216     fields = [(f.name, PyArrowType.to_ibis(f.type, f.nullable)) for f in schema]
    217     return Schema.from_tuples(fields)

File ~/micromamba/envs/grapy/lib/python3.9/site-packages/ibis/formats/pyarrow.py:134, in PyArrowType.to_ibis(cls, typ, nullable)
    132     return dt.JSON()
    133 else:
--> 134     return _from_pyarrow_types[typ](nullable=nullable)

KeyError: DictionaryType(dictionary<values=string, indices=int8, ordered=0>)