-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(clickhouse): make arrays non nullable #8501
fix(clickhouse): make arrays non nullable #8501
Conversation
|
ACTION NEEDED Ibis follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
2bb0f1f
to
f27f842
Compare
|
@mneedham Thanks for the PR! I can take care of fixing up the tests there. |
|
Seems like I can't push to PR, so I'll try to make suggestions |
|
@mneedham Can you apply this patch to your PR? diff --git a/ibis/backends/tests/test_array.py b/ibis/backends/tests/test_array.py
index 584c4ac3e..156c31bda 100644
--- a/ibis/backends/tests/test_array.py
+++ b/ibis/backends/tests/test_array.py
@@ -938,8 +938,8 @@ def flatten_data():
marks=[
pytest.mark.notyet(
["clickhouse"],
- reason="doesn't support nullable array elements",
- raises=ClickHouseDatabaseError,
+ reason="Arrays are never nullable",
+ raises=AssertionError,
)
],
),
@@ -950,8 +950,8 @@ def flatten_data():
marks=[
pytest.mark.notyet(
["clickhouse"],
- reason="doesn't support nullable array elements",
- raises=ClickHouseDatabaseError,
+ reason="Arrays are never nullable",
+ raises=AssertionError,
)
],
), |
f27f842
to
d40a8e5
Compare
|
@cpcloud have applied it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
@mneedham Just out of curiosity are |
As far as I know there aren't any plans to add that functionality in t he near future. I think people usually create an empty array for null use cases. I see someone did try to implement it, but that seems to have stalled - ClickHouse/ClickHouse#53443 |
|
@cpcloud hey - not sure where to ask this question, so gonna ask it here. I have this script: import os
import ibis
import ollama
from dotenv import load_dotenv
from pathlib import Path
import pandas as pd
import ibis.expr.datatypes as dt
rag_con = ibis.connect(f"clickhouse://localhost")
for table in rag_con.list_tables():
rag_con.drop_table(table)
rag_con.list_tables()
table_name = "docs5"
for filepath in Path("/Users/markhneedham/projects/clickhouse-docs/docs").glob("**/*.md"):
contents = filepath.read_text()
data = {
"filepath": [str(filepath).split("docs/")[-1]],
"contents": [contents],
}
t = pd.DataFrame(data)
schema = ibis.schema(
names=["filepath", "contents"],
types=[dt.String(nullable=False), dt.String(nullable=False)]
)
if table_name not in rag_con.list_tables():
rag_con.create_table(name=table_name, obj=t, schema=schema)
else:
rag_con.insert(name=table_name, obj=t)
rag_con.list_tables()
t = rag_con.table(table_name)
t = (
t.mutate(tokens_estimate=t["contents"].length() // 4)
.order_by(ibis._["tokens_estimate"].desc())
.relocate("filepath", "tokens_estimate")
)
t
def _embed(text: str) -> list[float]:
"""Text to fixed-length array embedding."""
text = text.replace("\n", " ")
try:
return (
ollama.embeddings(model='nomic-embed-text', prompt=text)['embedding']
)
except Exception as e:
print(e)
return [0.0] * 768
@ibis.udf.scalar.python
def embed(text: str, tokens_estimate: int) -> list[float]:
"""Text to fixed-length array embedding."""
if 0 < tokens_estimate < 8191:
return _embed(text)
return [0.0] * 768
t = t.mutate(
embedding=embed(t["contents"], t["tokens_estimate"])
).cache()
tSo I'm now running against How do I go about debugging that? Is there a way to see a list of the functions that have been registered? I can't tell if my function failed to register or if it's registered with an unexpected signature? |
|
@mneedham I will move these comments to an issue. That's the best place for bug reports. Alternatively, a GitHub Discussion would also be fine! |
Description of changes
This PR makes ClickHouse arrays non nullable. I ran into the issue when trying to convert text to an array of float values as part of a RAG pipeline and run into this error:
I think treating arrays the way that maps are treated should handle this.