Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clickhouse offline store support #1224

Conversation

gingerwizard
Copy link
Contributor

@gingerwizard gingerwizard commented Dec 15, 2023

Description

Support for ClickHouse as offline store. Closes #1160

Type of change

This adds support for ClickHouse as an offline store. While CH could also be used as online store (and vector db) this is the main functional usecase.

I've made a few key design decisions/questions:

  • training sets, transformations and materializations are subject to update. We just rebuild these and use a standard table - materialized views in ClickHouse are a different concept. This means we can use a standard MergeTree table engine.
  • For offline tables there appears to be a need to support updates i.e. writeUpdate, which according to sql.go base provider is always done by entity and ts. For this table we therefore use a ReplacingMergeTree - inserts with the same entity and ts will be de-duplicated (latest preserved) at insert and query time (final is set on all queries).
  • For optimal query performance we have used the entity, ts as the ordering key on any offline tables. This key combination is also required to uniquely identify rows in our ReplacingMergeTree. The ts column can be null though - this means i need to use allow_nullable_key for the table - this is generally not recommended but probably ok for this usecase.

Its not clear to me what a materialization is vs an offline table i.e. OfflineTable. The former i assume is when a feature is represented and "materialized" into the store. OfflineTable looks the same structurally - entity, ts, value columns.

The above should be fine provided materializations cant recieve upserts like offline tables. If they are, we can make these tables a ReplacingMergeTree also.

A few other things:

  • The methods materializationCreate and transformationCreate have been modified to return []string vs string. We can't do these in a single statement and don't support multiple statements in a single query. I've updated other offline stores and discussed with @ahmadnazeri
  • the go client returns time.Time{} for the zero date. We coerce this to 1970. Unclear why the client does this but it seems intentional and i'm investigating.
  • We use DateTime64 with nanosecond precision. This might be unneccessary but its impossible to know what data we will get and complex to detect so we lean on the side of caution.
  • It might make sense to make entity LowCardinality(String) - without knowing common cardinalities its hard to say though and a premature optimization.
  • I've extended type support for uints, as well as int8 and int16. These are basic types and seem to have been ommitted but can significantly improve CH performance.

Does this correspond to an open issue?

#1160

Select type(s) of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have fixed any merge conflicts

aolfat and others added 30 commits November 28, 2023 15:04
Co-authored-by: sdreyer <sterling@featureform.com>
Co-authored-by: sdreyer <sterling@featureform.com>
…1193)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sdreyer <sterling@featureform.com>
… cli. (featureform#962)

Co-authored-by: sdreyer <sterling@featureform.com>
Copy link
Contributor

@ahmadnazeri ahmadnazeri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for the most part! Left some comments on the PR. There are two parts missing:

  1. End to End Behave Tests
  2. Having instant of Clickhouse available for CICD like we do for Postgres

client/mkdocs.yml Outdated Show resolved Hide resolved
health/health_test.go Outdated Show resolved Hide resolved
health/health_test.go Outdated Show resolved Hide resolved
health/health_test.go Outdated Show resolved Hide resolved
provider/offline_test.go Outdated Show resolved Hide resolved
provider/clickhouse.go Show resolved Hide resolved
provider/clickhouse.go Show resolved Hide resolved
provider/clickhouse.go Show resolved Hide resolved
provider/clickhouse.go Outdated Show resolved Hide resolved
provider/clickhouse.go Outdated Show resolved Hide resolved
@gingerwizard
Copy link
Contributor Author

@ahmadnazeri i believe ive resolved all issues and added ci/cd tests + support for batch features. This should be ready to test.

anothserfile.txt Outdated Show resolved Hide resolved
@gingerwizard
Copy link
Contributor Author

@epps addressed comments

@ahmadnazeri ahmadnazeri changed the base branch from main to feature/clickhouse_provider January 9, 2024 23:46
@ahmadnazeri ahmadnazeri merged commit fe50993 into featureform:feature/clickhouse_provider Jan 9, 2024
ahmadnazeri added a commit that referenced this pull request Jan 16, 2024
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Dale McDiarmid <dale@clickhouse.com>
Co-authored-by: Ali Olfat <ali@featureform.com>
Co-authored-by: sdreyer <sterling@featureform.com>
Co-authored-by: Erik Eppel <erik@featureform.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: anthonylasso <34227093+anthonylasso@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Jan 16, 2024
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Dale McDiarmid <dale@clickhouse.com>
Co-authored-by: Ali Olfat <ali@featureform.com>
Co-authored-by: sdreyer <sterling@featureform.com>
Co-authored-by: Erik Eppel <erik@featureform.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: anthonylasso <34227093+anthonylasso@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement ClickHouse as an Offline Store[Feature Request]:
6 participants