LanceDB's only_if("col IN ('x', 'y')") fails if col is a dictionary-typed column, because safe_coerce_scalar is missing a Dictionary arm.
We have a LanceDB table with a dictionary column that we would like to use in a filter.
The call chain is
Query::only_if("etld IN ('com', 'de')").execute_query
-> Scanner::create_plan -> Scanner::create_filter_plan
-> ExprFilter::to_datafusion
-> Planner::parse_filter
-> resolve_expr
-> coerce_expr
-> resolve_value
-> safe_coerce_scalar
safe_coerce_scalar is lacking an arm for dictionaries.
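To illustrate what such an arm could look like, here is a minimal, self-contained sketch. It does not use Lance's or DataFusion's real types: DataType, ScalarValue, coerce_scalar and etld_column_type below are simplified, hypothetical stand-ins written for this example. The assumption is that a literal can be coerced to the dictionary's value type and then wrapped with the key type.

```rust
// Toy stand-ins for Arrow's DataType and DataFusion's ScalarValue.
// Hypothetical simplifications for illustration, NOT the real Lance code.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int16,
    Utf8,
    LargeUtf8,
    // (key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Utf8(String),
    LargeUtf8(String),
    // (key type, wrapped value)
    Dictionary(Box<DataType>, Box<ScalarValue>),
}

/// Coerce a literal to the column type; returns None when no rule applies.
#[allow(dead_code)]
fn coerce_scalar(value: &ScalarValue, ty: &DataType) -> Option<ScalarValue> {
    match (value, ty) {
        (ScalarValue::Utf8(s), DataType::Utf8) => Some(ScalarValue::Utf8(s.clone())),
        (ScalarValue::Utf8(s), DataType::LargeUtf8) => Some(ScalarValue::LargeUtf8(s.clone())),
        // The kind of arm this report asks for: coerce against the
        // dictionary's value type, then wrap the result with the key type.
        (_, DataType::Dictionary(key, val)) => coerce_scalar(value, val)
            .map(|v| ScalarValue::Dictionary(key.clone(), Box::new(v))),
        // Anything else is unsupported in this toy model.
        _ => None,
    }
}

/// The column type used in the test of this report: Dictionary(Int16, Utf8).
#[allow(dead_code)]
fn etld_column_type() -> DataType {
    DataType::Dictionary(Box::new(DataType::Int16), Box::new(DataType::Utf8))
}
```

In this toy model, coercing the literal 'com' against etld_column_type() yields a dictionary scalar wrapping Utf8("com"), while removing the Dictionary arm makes the same call return None. Whether the real fix should wrap the value or return the plain coerced value is a design choice for the maintainers.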
Test:
async fn dictionary_string_dataset() -> Dataset {
    use arrow_array::{Int16Array, Int16DictionaryArray};
    let schema = Arc::new(ArrowSchema::new(vec![ArrowField::new(
        "etld",
        DataType::Dictionary(Box::new(DataType::Int16), Box::new(DataType::Utf8)),
        false,
    )]));
    let dictionary = Arc::new(StringArray::from(vec!["a", "b", "c"]));
    let indices = Int16Array::from((0..30).map(|i| i % 3).collect::<Vec<_>>());
    let dict_array = Int16DictionaryArray::try_new(indices, dictionary).unwrap();
    let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(dict_array)]).unwrap();
    let reader = RecordBatchIterator::new(vec![Ok(batch)], schema.clone());
    Dataset::write(reader, "memory://test_dict_filter", None)
        .await
        .unwrap()
}

#[tokio::test]
async fn test_filter_on_dictionary_string_column() {
    let dataset = dictionary_string_dataset().await;

    // Equality predicate.
    let count = dataset
        .scan()
        .filter("etld = 'a'")
        .unwrap()
        .try_into_batch()
        .await
        .unwrap()
        .num_rows();
    assert_eq!(count, 10);

    // IN-list predicate.
    let count = dataset
        .scan()
        .filter("etld IN ('a', 'b')")
        .unwrap()
        .try_into_batch()
        .await
        .unwrap()
        .num_rows();
    assert_eq!(count, 20);
}