
Conversation

@akolov akolov commented Feb 4, 2025

Check List

  • Tests have been run in packages where changes were made, if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Issue Reference this PR resolves

[For example #12]

Description of Changes Made (if issue reference is not provided)

[Description goes here]

AvilaJulio and others added 30 commits November 15, 2024 20:00
I figured out that CI is broken for contributors. I am not certain, but I believe it is because environment variables are defined yet empty, due to security restrictions on CI variables.
…ubehll (#8964)

This got removed in more recent versions of Rust.
generic-pool will just silently throw those errors away, so we should at least log them

Some supporting changes: 
* Set logger to simple console wrapper in driver tests
* Remove unused import
* Actually call connection.close in pool's destroy
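The destroy change above can be sketched as follows. This is a hypothetical wrapper, not the driver's actual code: generic-pool swallows errors thrown from factory hooks, so the close failure is logged instead of being lost silently.

```typescript
// Hypothetical destroy helper: close the connection and log a failure
// rather than letting generic-pool silently discard the error.
function safeDestroy(
  connection: { close: () => void },
  log: (msg: string) => void,
): void {
  try {
    connection.close();
  } catch (e) {
    // generic-pool would discard this error; at least record it.
    log(`Error closing connection: ${e}`);
  }
}
```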
… supports both (legacy and modern urls) (#8968)

* chore(testing-drivers): update databricks export to azure configuration

* fix azure blobs filtering

* add support for azure 'dfs.core.windows.net' alongside with 'blob.core.windows.net'

* update test snapshots
…lient (#8928)

Switch client library used to talk to ClickHouse to upstream one. Lots of related changes:

* Streaming no longer uses a streaming JSON parser, so we can't rely on the `meta` field of the JSON format. Instead it relies on `JSONCompactEachRowWithNamesAndTypes`: the first two rows returned contain column names and types. https://clickhouse.com/docs/en/sql-reference/formats#jsoncompacteachrowwithnamesandtypes
* Streaming now uses async iterators instead of Node.js streams internally. The external API still returns a stream, as before
* Pooling moved completely to the client library. `generic-pool` is not used at all; `dbMaxPoolSize` is passed to the client library to limit open sockets. The new client maintains an `http.Agent` internally and has its own idle timers, which looks fine for us.
* Queries no longer send `session_id`, as we expect queries to be independent anyway and don't use session-bound features like temporary tables. The previous behaviour was kind of weird: session ids were attached to clients in the pool, but every query would acquire a new client from the pool, so nothing could actually utilize the same session.
* `KILL QUERY` on cancellation now uses a separate client instance, to avoid getting stuck on a busy pool
* The `query` method supports only `SELECT` queries, or other queries that return result sets. For DDL queries one has to use other methods of this client library. Because of that, more overrides were necessary, such as `dropTable`, `createSchemaIfNotExists`, and `createTable`.
* The driver now respects the per-datasource `dbQueryTimeout` config
* fix(backend-shared): Rename `convertTimeStrToMs` to `convertTimeStrToSeconds`. It returns a numeric input (which should be seconds) as-is, and 5 for '5s'
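The `JSONCompactEachRowWithNamesAndTypes` layout described above can be sketched minimally: the first returned row carries column names, the second carries ClickHouse types, and everything after is data. The function name and row shape here are illustrative, not the driver's actual API.

```typescript
// Hypothetical splitter for JSONCompactEachRowWithNamesAndTypes output:
// row 0 = column names, row 1 = ClickHouse types, rest = data rows.
type Row = unknown[];

function splitCompactRows(
  rows: Row[],
): { names: string[]; types: string[]; data: Row[] } {
  if (rows.length < 2) {
    throw new Error('Expected names and types rows first');
  }
  const [names, types, ...data] = rows;
  return { names: names as string[], types: types as string[], data };
}
```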
…tion (#8971)

cubesql can introduce `ScalarValue::TimestampNanosecond(..., Some("UTC"))` during constant folding, and `generate_sql_for_expr` trips on unsupported timezones.
This just allows `None` as well as `Some("UTC")` as timezones. Because DataFusion will always generate `UTC`, it should be fine for now.

See https://github.com/cube-js/arrow-datafusion/blob/dcf3e4aa26fd112043ef26fa4a78db5dbd443c86/datafusion/physical-expr/src/datetime_expressions.rs#L357-L367
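The relaxed check described above amounts to the following sketch: a timestamp timezone is supported when it is absent or exactly `UTC` (the only value DataFusion generates during constant folding). This is a hypothetical helper, not cubesql's actual Rust code.

```typescript
// Hypothetical timezone guard: only a missing timezone or an explicit
// "UTC" is accepted when generating SQL for a timestamp literal.
function timezoneAllowed(tz: string | null): boolean {
  return tz === null || tz === 'UTC';
}
```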
…8977)

Error cause is not properly propagated in the REST API, so use direct formatting for now
Data access policy conditions should be joined with the AND operator,
but the initial implementation used OR by mistake

Also ensured that RBAC smoke tests are run as part of the CI
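The fix above boils down to how policy conditions are combined: all of them must hold, so they are joined with AND. The sketch below is illustrative, not the actual RBAC implementation.

```typescript
// Hypothetical condition combiner: every policy condition must hold,
// so conditions are joined with AND (the bug was joining with OR).
function combineConditions(conditions: string[]): string {
  if (conditions.length === 0) {
    return 'TRUE'; // no conditions means no restriction
  }
  return conditions.map((c) => `(${c})`).join(' AND ');
}
```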
… extension (#8375)

* feat(duckdb-driver): Upgrade to DuckDB 1.0.0

* feat(duckdb-driver): remove installing and loading httpfs extension

httpfs is installed by default and is an auto-loadable extension

---------

Co-authored-by: Konstantin Burkalev <KSDaemon@gmail.com>
* docs: Update Staging Environments docs

* fix

* fix
Update export bucket instructions to use Unity Catalog. The previous recommendation (DBFS) is no longer supported by Databricks.
Bumps [dompurify](https://github.com/cure53/DOMPurify) from 3.0.7 to 3.1.7.
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](cure53/DOMPurify@3.0.7...3.1.7)

---
updated-dependencies:
- dependency-name: dompurify
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…eries referencing dimensions from joined cubes (#8946)

* fix typo

* fix(schema-compiler): fix undefined columns for pre lambda agg queries referencing dims from joined cubes

* add tests

* change order of the references lookup
…(#8936)

* fix(schema-compiler): use query timezone for time granularity origin

* fix tests

* small improvement

* fix dateBin implementation across all Queries (timeStampCast → dateTimeCast)

* Fix MS SQL timeStampCast

* fix ms sql test
…ry. (#8872)

* chore(cubestore): make trace_id and span_id suitable for open telemetry.

* update TracingHelper trait

* add teardown to Configurator trait

* fix typo

* instantiate parent span in WorkerProcessing

* add opentelemetry* crates

* cargo fmt

* simplify trace_id_and_span_id.map reinitialization
* fix(dev_env_setup): use relative packages everywhere

* fix: use rollup

* fix: remove link:dev

* fix: don’t show “base” driver

* feat: version 2

* fix: rename
…g failure (#8983)

* fix(clickhouse-driver): Remove SQL from ClickHouse error messages

* fix(clickhouse-driver): Actually handle ping result

* fix(clickhouse-driver): Handle AggregateError in ping result
paveltiunov and others added 28 commits January 23, 2025 14:54
…ery class (#9111)

* Remove moment-timezone from vertica driver

* Remove moment-timezone from druid driver

* Remove moment-timezone from dremio driver

* Remove moment-timezone from cubestore driver

* update moment-timezone in api-gateway

* update moment-timezone in backend-shared

* Remove moment-timezone from query-orchestrator

* update moment-timezone in schema-compiler

* remove unused

* remove moment.HTML5_FMT.DATETIME_LOCAL_MS in favor of just literal

* linter fix

* add mysqlUseNamedTimezones flag

* fix dremio test

* Tiny edits

---------

Co-authored-by: Igor Lukanin <igor@cube.dev>
* Fixing case statements in snowflake driver

* Fixes case sensitivity for snowflake to be default and env driven

Adds an env var for the Snowflake driver to enable or disable
case sensitivity; if not set, it defaults to case-insensitive

* Updating snowflake driver to respect case by default with added override.

* fix types for identIgnoreCase in snowflake driver

* fix

---------

Co-authored-by: Micheal Taylor <mike.taylor@nomihealth.com>
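One possible shape of the override described above, assuming a boolean derived from the new env var: with case sensitivity off, identifiers are upper-cased to match Snowflake's case-insensitive default handling; with it on, they are quoted verbatim. Names here are illustrative, not the driver's actual API.

```typescript
// Hypothetical identifier quoting for Snowflake: case-insensitive mode
// upper-cases the name (Snowflake's default folding), case-sensitive
// mode preserves it exactly as written.
function quoteIdentifier(name: string, caseSensitive: boolean): string {
  return caseSensitive ? `"${name}"` : `"${name.toUpperCase()}"`;
}
```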
* create a basic cubeorchestrator project structure

* wip

* move flatbuffer schema/code to separate crate

* implement parse_cubestore_ws_result

* add cubeorchestrator/parse_cubestore_ws_result_message export

* use native parseCubestoreResultMessage

* init hashmap with capacity

* cargo fmt

* some optimizations and improvements

* a bit optimized version

* use cx.execute_scoped for optimization

* a bit more rust idiomatic code

* put native parseCubestoreResultMessage behind the flag

* tiny improvement

* cargo fmt

* cargo fmt

* cargo clippy fix

* update cubestore Dockerfile

* cargo fmt

* update cubestore Docker builds

* introduce CubeStoreResult struct

* create CubeStoreResultWrapper class and switch to lazy evaluation of results set

* add resToRawResultFn in API GW

* cargo fmt

* update resToResultFn

* call prepareAnnotation later

* remove bytes

* add cached flag to CubeStoreResultWrapper

* convert core data types from api gateway

* implement transformData and related helpers

* cargo fmt

* downgrade chrono to the same version as in cubesql (0.4.31)

* fail fast in api gw load()

* update cargo.lock (syn crate)

* linter fix

* implement get_query_granularities & get_pivot_query

* prepare transformQueryData native wrapper

* small optimization: do not use native parsing for short messages

* refactor transformValue and related

* types restructure

* debug and fix native transform_data()

* lazy transformData evaluation

* implement get_final_cubestore_result & get_final_cubestore_result_multi in native

* cargo fmt

* cargo clippy fix

* refactor getVanillaRow

* implement get_final_cubestore_result_array() native

* fix native response flow for sqlApiLoad

* add postgres with native cubestore results driver tests

* Build native (without Python) in drivers tests

* workaround for native build in testings-drivers

* small improvements in CubeStoreResultWrapper

* make parse_cubestore_ws_result_message async

* make all native cubestore_result_transform functions async

* cargo fmt

* refactor results transformations

* yarn sync

* implement custom deserializer from JS Value to rust

and updated the cubestore result transformations with it

* refactor json_to_array_buffer()

* a bit of refactoring

* code rearrangement

* switch to use DBResponsePrimitive instead of just Strings

* refactoring

* always use transform data native for all results

* fix dtos for native query processing (thanks to unit tests)

* cache loadNative

* remove not needed anymore native.transformQueryData

* add Build native for unit tests in CI

* refactor serde annotations

* remove unused

* annotate native query results processing functions

* add Build native for unit tests in CI for Debian without pushing

* add few unit tests

* fix some tests

* attempt to fix native build/test in docker-dev CI

* fix empty result set issue

* another fix in datetime parsing

* another fix in deserialization

* another fix in datetime parsing

* update postgres-native-cubestore-response-full.test.ts.snap

* another fix in deserialization

* cargo fmt

* another fix in datetime parsing

* update postgres-native-cubestore-response-full.test.ts.snap

* attempt to fix native build/test in docker-dev CI

* add some comments

* edits in result processing when streaming

* fix native response processing via websocket

* fix yarn lock

* attempt to fix native build/test in cloud integration tests in CI

* recreated unit tests for transform data in native

* run rust unit tests on push

* commented out cargo fmt/build/test for cubesql, cubesqlplanner

* remove transformdata JS implementation

* refactor push CI jobs

* rename CubestoreResultWrapper → ResultWrapper

* add isNative flag to ResultWrapper

* rename getResultInternal → prepareResultTransformData

* use ResultWrapper for all results

* fix getArray in ResultWrapper

* lint fix

* encapsulate rootResultObject into ResultWrapper

* add DataResult interface + implement it in result wrappers

* add isWrapper property to wrappers

* transform wrapped result later before returning the response

* fix lint warn

* fix async request processing

* some node version fix

* weird yarn.lock change. Wtf?

* full transition of result native-js-native

* fix @cubejs-backend/native version ref

* refactor ResultWrapper constructor

* fix string deserialize in transport

* extend DataResult interface

* support iterative access in ResultWrapper

* refactor ResultWrapper classes

* pass ResultWrapper to native and offload transformData from the eventloop

* remove obsolete getFinalQueryResultArray

* fix ws api subscriptionServer results processing

* pass SchemaRef to load methods.

* trying to fix flaky ws failing tests

* remove cubeorchestrator dependency from cubesql

* rewrite all load api calls to return RecordBatch

* fix @cubejs-backend/native version ref after rebase

* set prototype of ResultWrapper proxy to class prototype

* fix linter warnings

* fix yarn.lock

* linter fix

* Set CUBEJS_TESSERACT_ORCHESTRATOR: true for all tests workflows

* fix old refs after rebase

* update integration-cubestore job with the build of backend-native

* remove postgres-native-cubestore-response from testing drivers as we turn on the new orchestrator for all tests
* fix post-release ci job

* fix cloud tests ci mb

* Add missing packages/cubejs-redshift-driver path for watching changes for driver tests
* Update monitoring.mdx

* Wording

---------

Co-authored-by: Igor Lukanin <igor@cube.dev>
…ontext (#9120)

It is used to track whether `WrappedSelect` actually wraps an ungrouped scan, as opposed to the old flag, which is used as a push-to-Cube enabler.

Changes in cost are necessary because, now that ungrouped scans are tracked, we get proper values in the `wrapped_select_ungrouped_scan` cost component: it counts the `WrappedSelect(ungrouped_scan=true)` nodes in the extracted plan. With the old cost, any ungrouped scan under a wrapper would turn out more expensive than a plan with the same number of wrappers (usually it's just 1 anyway) but more nodes outside the wrapper. Consider a consuming projection: `Projection(WrappedSelect(ungrouped_scan=true))` vs `WrappedSelect(from=WrappedSelect(ungrouped_scan=true), ungrouped_scan=true)`. The plan with `Projection` would have `ast_size_outside_wrapper=1 wrapped_select_ungrouped_scan=1`, the plan with `WrappedSelect` would have `ast_size_outside_wrapper=0 wrapped_select_ungrouped_scan=2`, and the second one is preferable.

Also couple of related fixes:
* Mark distinct WrappedSelect as grouped
* Depend on an ungrouped flag for the grouped join part: a wrapper can be ungrouped=true, push_to_cube=false, and this is unexpected in an ungrouped-grouped join.
Allow flattening a filter node into an internal WrappedSelect. This should allow executing plans like Aggregate(Filter(Join(...))) as a single grouped wrapper.
…eration (#9160)

* fix(databricks-jdbc-driver): Fix extract epoch from timestamp SQL Generation

* add tests
…for Angular 12+ compatibility (#9152) Thanks to @HaidarZ!
Add support for complex join conditions for grouped joins

DataFusion plans non-trivial joins (ones that are not `l.column = r.column`) as `Filter(CrossJoin(...))`.
To support ungrouped-grouped joins with queries like this, SQL API needs to rewrite such a logical plan to `WrappedSelect` with the join inside. To do that it needs to distinguish between a plan coming from a regular `JOIN` and an actual `CROSS JOIN` with `WHERE` on top. This is done with the new `JoinCheckStage`: it starts on `Filter(CrossJoin(wrapper, wrapper))`, traverses all `AND`s in the filter condition, checks that the "leaves" of the condition compare the two join sides, and pulls that fact up. After that, the regular join rewrite can start on the checked condition.
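The `JoinCheckStage` traversal described above can be sketched with a toy condition model: recurse through `AND`s and accept only if every leaf comparison references both join sides. The types and names below are illustrative, not cubesql's actual plan representation.

```typescript
// Toy model of a join-check traversal: a condition is a valid join
// condition only if every leaf under the ANDs compares left vs right.
type Side = 'left' | 'right';
type Cond =
  | { op: 'and'; args: [Cond, Cond] }
  | { op: 'eq'; a: Side; b: Side };

function isJoinCondition(c: Cond): boolean {
  if (c.op === 'and') {
    // Both branches of an AND must themselves be valid join conditions.
    return isJoinCondition(c.args[0]) && isJoinCondition(c.args[1]);
  }
  // A leaf qualifies only when it compares one column from each side.
  return c.a !== c.b;
}
```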

Supporting changes:

* Allow grouped join sides to have different in_projection flag

* Allow non-push_to_cube WrappedSelect in grouped subquery position in join

* Make zero members wrapper more expensive than filter member

* Replace alias to cube during wrapper pull up

* Wrap is_null expressions in parens, to avoid operator precedence issues
An expression like `foo IS NOT NULL = bar IS NOT NULL` would try to compare `foo IS NOT NULL` with `bar`, not with `bar IS NOT NULL`
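The parenthesization fix above can be sketched as follows: IS NOT NULL expressions are wrapped in parens before being compared, so the generated SQL parses as intended. The function names are illustrative, not the actual SQL generator's API.

```typescript
// Hypothetical SQL generation helpers: parenthesize IS NOT NULL before
// using it as an operand of another comparison.
function notNullExpr(column: string): string {
  return `(${column} IS NOT NULL)`;
}

function compareNullability(a: string, b: string): string {
  return `${notNullExpr(a)} = ${notNullExpr(b)}`;
}
```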
# Conflicts:
#	dev_env_setup.sh
#	package.json
#	tsconfig.json
#	yarn.lock
@akolov akolov merged commit 23858b4 into master Feb 4, 2025
@akolov akolov deleted the alex/dev branch February 4, 2025 13:02
