-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clickhouse: support group_concat operation
#2839
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lunaticus7 thanks can you add a release note
|
LGTM, though the linter fails. |
|
@Lunaticus7 can you run black: see https://github.com/ibis-project/ibis/pull/2839/checks?check_run_id=2942181027 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Lunaticus7, nice work. I added couple of comments, if you don't mind having a look.
ibis/backends/clickhouse/registry.py
Outdated
| arg, sep, where = expr.op().args | ||
| if where is not None: | ||
| arg = where.ifelse(arg, ibis.NA) | ||
| arg = translator.translate(arg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? You're already applying translator.translate to arg in the return.
Do you mind adding another test that hits the code in the if please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right,it will cause a bug 😣
| expr = t.string_col.group_concat(',') | ||
| expected = """arrayStringConcat(groupArray(`string_col`), ',')""" | ||
| assert translate(expr) == expected | ||
| assert len(con.execute(expr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be more specific on what this needs to return? Also, I guess the len is unnecessary, if len(...) != 0 I guess this will evaluate to true without the len. But would be better to test the value, or whatever possible, more than just saying that this should return something not empty, if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have add test for where param in group_concat .
emmmm,i'm new in ibis and haven't found the right way to load test data and run ci locally,so i have no ideal to specify the return value of the group_concat expr 😇.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Lunaticus7, looks good, just couple of suggestions, in case they make sense to you.
| expected = """arrayStringConcat(groupArray(\ | ||
| CASE WHEN `bool_col` = 0 THEN `string_col` ELSE Null END), '-')""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No big deal, but I think this approach is more readable (maybe just to me):
| expected = """arrayStringConcat(groupArray(\ | |
| CASE WHEN `bool_col` = 0 THEN `string_col` ELSE Null END), '-')""" | |
| expected = ("arrayStringConcat(groupArray(" | |
| "CASE WHEN `bool_col` = 0 THEN `string_col` ELSE Null END), '-')") |
Also, it'd probably be nice to have this as a separate test, so when it fails, we know which of the two asserts is failing faster.
|
@Lunaticus7 tests are failing, not sure if you've seen it. |
Sorry for the delay. I'm fighting with running CI locally 🤣(#2795 (comment)) and a little busy recently, maybe i could done this in the next week. |
|
I personally don't ever run the whole CI locally. I have the environment set up (the conda env), and then run the tests for specific backends. For example: PYTEST_BACKENDS="pandas sqlite" python -m pytest ibis/tests ibis/backends/testsFor clickhouse you can run the same, but you need to run the docker image, and call the scripts to download and load the test data to it. |
|
@Lunaticus7 do you have time to fix the conflicts and the broken tests? This shouldn't be far from ready. |
|
Hello @Lunaticus7! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-08-04 08:11:31 UTC |
Thanks for your tips of test, it works 🎉! About the broken test, i found all the column in testing data is
So i remove the test case about |
|
There are conflicts with master, can you merge master into your branch and fix them please. For the tests, if one can't pass right now, it's better to xfail it than to delete it. I'm not sure what's the best for the nulls. Can you check what other backends are doing, and try to do the same. |
|
@datapythonista And there are some details about the null error # for clickhouse backends and ibis-testing-data
import ibis
connection = ibis.clickhouse.connect(
host='127.0.0.1',
database='ibis_testing',
port=9000
)
table_alltypes = connection.table('functional_alltypes')
expr = table_alltypes.string_col.group_concat(',', table_alltypes.bool_col == 1)
expr.compile()
# will cause:
# > IbisTypeError: Cannot compute precedence for string[non-nullable] and null types
# the expected sql of `expr`
sql = """
SELECT arrayStringConcat(groupArray(CASE WHEN `bool_col` = 1 THEN `string_col` ELSE Null END), '-') AS `tmp`
FROM ibis_testing.`functional_alltypes`
"""
connection.raw_sql(sql)
# can get right resultI also tested the same case on mysql backend,and got the same result. It seems like a bug of ibis on computing precedence of dtypes, when target column(e.g. |
|
Sorry for the delay @Lunaticus7. Since the bug is general and unrelated to this change, I think we can move forward here, and open an issue for the general error. Does it make sense? |
|
@datapythonista Issue of the null error has moved to #2891 |
|
Lint fixed, I forgot pyproject.toml when black locally 🤕. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Lunaticus7 just couple of small things, but look great.
|
|
||
| @pytest.mark.parametrize( | ||
| # FIXME: The `where` param needs `Nullable` column | ||
| # but the all in testing data is not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to the xfail? You should be able to add a message to the xfail itself, and if for whatever reason that's not an option, don't make it a FIXME, which in general is for tasks that need to be done before it's merged.
In the xfail message, besides saying why the test is failing, you can add the issue number you created for it.
| if where_case is None: | ||
| where = None | ||
| else: | ||
| where = alltypes.bool_col == where_case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if where_case is None: | |
| where = None | |
| else: | |
| where = alltypes.bool_col == where_case | |
| where = None if where_case is None else alltypes.bool_col == where_case |
Thanks for your patience, it's all done 🥳. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Lunaticus7
|
thanks @Lunaticus7 |
Added
group_concatoperation to Clickhouse.