Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dtype, schema inference and implicit casting #1269

Closed
wants to merge 36 commits into from

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Dec 15, 2017

Relates to #1221

@kszucs kszucs changed the title [WIP] Dtype, schema inference [WIP] Dtype, schema inference and implicit casting Dec 15, 2017

@castable.register(Interval, Interval)
def can_cast_intervals(source, target):
return castable(source.value_type, target.value_type)
Copy link
Member

@cpcloud cpcloud Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this is tricky because there are a few cases where casting between intervals becomes ambiguous, like casting anything longer than a week (that isn't a week itself) to weeks, e.g., casting 1 month to weeks. Some months may have more weeks than others.

Casting seconds to months is also fraught with annoyances.

I think we will need to check the units to make sure these are valid casts.

@cpcloud cpcloud added the feature Features or general enhancements label Dec 18, 2017
@cpcloud cpcloud added this to the 0.13 milestone Dec 18, 2017
@cpcloud
Copy link
Member

cpcloud commented Dec 22, 2017

@kszucs Looks like you need to rebase.

self._assert_can_compare()
return rules.shape_like_args(self.args, 'boolean')
if not rules.comparable(self.left, self.right):
raise TypeError('Arguments are not comparable')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show self.left.type() and self.right.type() in the error message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

([pd.Timedelta('1 days'),
pd.Timedelta('-1 days 2 min 3us'),
pd.Timedelta('-2 days +23:57:59.999997')], dt.Interval('ns')),
# (pd.Categorical(['a', 'b', 'c', 'a']), 'category')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's mark this as pytest.mark.xfail with a specific exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually It passes locally. How should I create a categorical column working in each build?

ibis/util.py Outdated
from ibis.config import options


@contextmanager
def ignoring(*exceptions):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exists in the standard library as contextlib.suppress since Python 3.4.

How about something inibis/compat.py like:

try:
    from contextlib import suppress
except ImportError:
    import contextlib


    @contextlib.contextmanager
    def suppress(...):
        ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great :) I wasn't aware.

('string', [], self.all_cols[:7] + self.all_cols[8:]),
('timestamp', [], self.all_cols[:-1]),
('decimal', [], self.all_cols[:4] + self.all_cols[7:])
('decimal', self.all_cols[:7], self.all_cols[7:])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should clean up this test, it's quite hard to understand what's being tested here. I'll make an issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should port these to the new test suite

def issubtype(self, parent):
return issubtype(self, parent)

def castable(self, target):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a docstring here indicating that castable here means implicitly castable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I might be more verbose.

return True


@castable.register(UnsignedInteger, UnsignedInteger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can chain decorators here so you don't have to repeat the code for signed and unsigned integers, like this:

@castable.register(UnsignedInteger, UnsignedInteger)
@castable.register(SignedInteger, SignedInteger)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, any of these implementations that take the same number of arguments and return True can all be chained into one function to avoid repetition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

return TypeParser(value).parse()


infer = Dispatcher('infer')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a docstring to this (using the doc= argument to Dispatcher)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: from reading the code it looks like dtype is for mapping types from other systems (e.g., pandas and numpy) into ibis's type system, and infer is for getting the ibis schema or ibis type of data values such as literals, lists and DataFrames. Is that correct?

Can you make sure to clarify the distinction between these two functions in their respective docstrings. Some examples in the docstrings will likely help here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would You please convert this to an issue?

})


dtype = Dispatcher('dtype')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring here

assert customers.schema() == expected
@pytest.mark.parametrize(('pandas_dtype', 'ibis_dtype'), [
(DatetimeTZDtype(tz='US/Eastern', unit='ns'), dt.Timestamp('US/Eastern')),
# (CategoricalDtype(['a', 'b']), dt.Category(2)) compat problem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make this pytest.mark.xfail with a specific exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing with the categorical column, how can I create a categorical dtype?

@kszucs kszucs changed the title [WIP] Dtype, schema inference and implicit casting Dtype, schema inference and implicit casting Dec 28, 2017
kszucs added a commit to kszucs/ibis that referenced this pull request Dec 29, 2017
kszucs added a commit to kszucs/ibis that referenced this pull request Dec 29, 2017
@cpcloud
Copy link
Member

cpcloud commented Jan 8, 2018

LGTM, merging.

@cpcloud cpcloud closed this in 71fc552 Jan 8, 2018
@cpcloud
Copy link
Member

cpcloud commented Jan 8, 2018

@kszucs Thank you!

@kszucs kszucs deleted the infer_schema branch May 30, 2022 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants