New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unified ibis <-> pandas type conversions #1221

Closed
kszucs opened this Issue Nov 9, 2017 · 11 comments

Comments

Projects
4 participants
@kszucs
Member

kszucs commented Nov 9, 2017

I'd like to implement a method to insert pandas dataframes to clickhouse table.

Problem

Actually this implementation is quite straightforward, but I'd like a unified way to convert dtypes between pandas, ibis [and clickhouse].

Currently this sort of type mapping logic has been implemented at multiple places:

Proposal

AFAIK the only supported data container is pd.DataFrame:

  • *query.execute() -> DataFrame
  • *table.insert(obj: DataFrame)

Ideally there should be two type conversion paths:

  1. Insertion or external data see #1164

    pandas dtypes -> ibis schema -> backend schema [in this case clickhouse]

  2. Selection

    backend schema -> ibis schema -> pandas dtypes

I think the highlighted parts should be placed in a single module which is reachable from every backend package, this way we can factor out the redundant implementations.

What do You think @cpcloud @wesm ?

@wesm

This comment has been minimized.

Member

wesm commented Nov 10, 2017

This makes sense. This probably also should be considered together with Arrow types, which need to have a reasonable mapping to Ibis types. cc @jreback

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 10, 2017

#1194
maps arrow to ibis types ; could/should be in the same place

@kszucs

This comment has been minimized.

Member

kszucs commented Nov 13, 2017

@jreback what's the status of that PR?

If You agree with my proposal, where should we place that type conversion logic?

  • go to ibis.pandas_interop
  • or just make multipledispatch mandatory and implement in ibis.pandas?

Should I go with it, or perhaps You'll implement it?

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 14, 2017

@kszucs that will merge soon. yes I think we should make a ibis.types where we can move all types of type conversion code

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 14, 2017

yeah I think we should just make multipledispatch as required, as pandas already is

@kszucs

This comment has been minimized.

Member

kszucs commented Nov 14, 2017

@jreback Great! Would You mind creating the skeleton of it?

I could help then to implement the ibis schema <-> backend schema conversions per backend (e.g. to ibis.clickhouse.types)

Should we include infer_literal_types from expr package too?

@cpcloud cpcloud added this to To do in Refactoring Nov 24, 2017

@cpcloud cpcloud added this to the 0.13 milestone Nov 24, 2017

@kszucs

This comment has been minimized.

Member

kszucs commented Dec 27, 2017

@cpcloud could You give me a proper mapping for implementingto_pandas ?
A preliminary attempt was here 93bd201

@cpcloud

This comment has been minimized.

Member

cpcloud commented Dec 27, 2017

@kszucs I think if you just glom all of those types into a dict somewhere then you have most of what you need. Let me know if there's something that isn't clear.

@kszucs

This comment has been minimized.

Member

kszucs commented Dec 27, 2017

Ok!

cpcloud added a commit that referenced this issue Jan 8, 2018

Dtype, schema inference and implicit casting
Relates to #1221

Author: Krisztián Szűcs <szucs.krisztian@gmail.com>

Closes #1269 from kszucs/infer_schema and squashes the following commits:

93bd201 [Krisztián Szűcs] revert to_pandas
2a1578a [Krisztián Szűcs] schema.to_pandas test
3558d95 [Krisztián Szűcs] categorical
85beb27 [Krisztián Szűcs] to_pandas skeleton
4ccb42e [Krisztián Szűcs] missing contextmanager decorator for compat.suppress
b6369c9 [Krisztián Szűcs] resolve a cuple of review issues
bcc6d25 [Krisztián Szűcs] comment out test case
a461e99 [Krisztián Szűcs] resolve compat issue
8b3102e [Krisztián Szűcs] flake8
ef2439e [Krisztián Szűcs] support for pandas categorical
affff8b [Krisztián Szűcs] fix DatetimeTZDtype
acd27ec [Krisztián Szűcs] another attempt to fix pandas types namespace
337430d [Krisztián Szűcs] fix pandas types namespace
ee0a27c [Krisztián Szűcs] infer pandas timestamp
e7aecc5 [Krisztián Szűcs] subtype checking
4bd5c5b [Krisztián Szűcs] little rename; trigger builds
27cae31 [Krisztián Szűcs] comment out array casting
13a09f7 [Krisztián Szűcs] numpy binary dtype
0b646f6 [Krisztián Szűcs] pass literal value to implicit casts; fix failing udf test;
fb04c84 [Krisztián Szűcs] fix doctest errors
43a581e [Krisztián Szűcs] higher precedence only upcasts
6c49f56 [Krisztián Szűcs] forbid double to float implicit cast
774c317 [Krisztián Szűcs] simplify higher precedence dtype
c2d82a8 [Krisztián Szűcs] infer pandas schema
a54f662 [Krisztián Szűcs] tests for implicit dtype cast
191a476 [Krisztián Szűcs] integer to category cast
a0ab226 [Krisztián Szűcs] infer numpy generic types
7ea04f4 [Krisztián Szűcs] factor out ir types' _can_compare methods
d97432d [Krisztián Szűcs] factor out _can_implicit_cast and _implicit_cast calls for comparison operations
a4f7874 [Krisztián Szűcs] allow double->float because float literal is inferred as double
1aaa292 [Krisztián Szűcs] resolve builtins float name conflict
7b6faf4 [Krisztián Szűcs] fixed failed decimal to floating casts
11ecaf7 [Krisztián Szűcs] multipledispatch to conda build reqs
27471f9 [Krisztián Szűcs] refactored datatype implicit casting
d332d8a [Krisztián Szűcs] make multipledispatch mandatory dependency
926943b [Krisztián Szűcs] infer_dtype draft
@kszucs

This comment has been minimized.

Member

kszucs commented Jan 25, 2018

This should be resolved by #1292

@kszucs

This comment has been minimized.

Member

kszucs commented Feb 7, 2018

@cpcloud I think You can close this.

@cpcloud cpcloud closed this Feb 7, 2018

Refactoring automation moved this from To do to Done Feb 7, 2018

@cpcloud cpcloud added the enhancement label Feb 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment