[HWORKS-2391] Dlthub docs #560

Merged
gibchikafa merged 2 commits into logicalclocks:main from gibchikafa:dlthub
Mar 26, 2026

@gibchikafa
Contributor

No description provided.

Contributor

Copilot AI left a comment

Pull request overview

Adds documentation for ingesting data into managed Feature Groups using dltHub, and documents the new CRM/Sales/Analytics and REST API Data Source connectors in the Hopsworks docs site.

Changes:

  • Extends mkdocs.yml navigation to include new connector pages and the dltHub ingestion guide.
  • Adds new how-to pages for CRM/Sales/Analytics and REST API Data Sources, plus a full dltHub ingestion workflow guide.
  • Updates existing Feature Group/Data Source index & usage pages to surface the new ingestion workflow.

Reviewed changes

Copilot reviewed 7 out of 23 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| mkdocs.yml | Adds nav entries for the new Data Source connector docs and dltHub ingestion guide. |
| docs/user_guides/fs/feature_group/ingest_with_dlthub.md | New end-to-end guide for creating a managed Feature Group via dltHub ingestion (UI + API examples). |
| docs/user_guides/fs/feature_group/index.md | Adds a link to the new dltHub ingestion guide. |
| docs/user_guides/fs/data_source/usage.md | Adds a section describing ingestion into managed Feature Groups and links to the new guide. |
| docs/user_guides/fs/data_source/index.md | Lists the new CRM/Sales/Analytics and REST API Data Source connectors. |
| docs/user_guides/fs/data_source/creation/rest_api.md | New how-to for creating a REST API Data Source in the UI. |
| docs/user_guides/fs/data_source/creation/crm_sales_analytics.md | New how-to for creating a CRM/Sales/Analytics Data Source in the UI. |
| docs/assets/images/guides/fs/feature_group/dlthub_rest_page_number_pagination.png | Adds a screenshot used in the new ingestion guide. |

Comment on lines +293 to +297
from hopsworks_common.core import sink_job_configuration

fs = project.get_feature_store()
data_source = fs.get_data_source("my_sql_source").get_tables()[0]
data = data_source.get_data(use_cached=False)

Copilot AI Mar 25, 2026

The Python example uses `project.get_feature_store()` but `project` isn't defined in this code block (and there is no preceding login snippet in this page). Since Python fences are linted with snakeoil/ruff in this repo, this will likely fail with an undefined-name error; consider adding the minimal `import hopsworks` + `project = hopsworks.login(...)` setup (or otherwise making the snippet self-contained) before using `project`.
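
To make the failure mode concrete, here is a minimal, stdlib-only sketch of an undefined-name check in the spirit of ruff's `F821` rule. It is a rough approximation, not ruff's implementation: the `undefined_names` helper is hypothetical, ignores statement order, and ignores function and class scopes. It only parses the snippets and never imports or runs them.

```python
import ast
import builtins

# The snippet under review (lines +293 to +297), copied as a string.
SNIPPET = '''
from hopsworks_common.core import sink_job_configuration

fs = project.get_feature_store()
data_source = fs.get_data_source("my_sql_source").get_tables()[0]
data = data_source.get_data(use_cached=False)
'''

# The same snippet with the setup lines the review asks for.
FIXED = "import hopsworks\nproject = hopsworks.login()\n" + SNIPPET


def undefined_names(source: str) -> set[str]:
    """Return names that are read somewhere but never bound in the snippet.

    Simplification (assumption): binding order and nested scopes are ignored.
    """
    bound = set(dir(builtins))
    loaded = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)
            else:
                loaded.add(node.id)
    return loaded - bound


print(undefined_names(SNIPPET))  # {'project'}
print(undefined_names(FIXED))    # set()
```

The sketch flags only `project` in the original snippet and nothing once the login lines are prepended, which matches the change the comment requests.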

Copilot uses AI. Check for mistakes.
Comment on lines +323 to +326
```python
from hopsworks_common.core import rest_endpoint, sink_job_configuration
from hsfs.core import data_source as ds
```


Copilot AI Mar 25, 2026

This REST ingestion example also relies on an undefined `project` variable (used to obtain `fs`). To keep the snippet ruff-clean under snakeoil, include the login/setup lines in the code block (or make it explicit and suppress the undefined-name linting in a way that passes CI).

Suggested change

Original:

```python
from hopsworks_common.core import rest_endpoint, sink_job_configuration
from hsfs.core import data_source as ds
```

Proposed:

```python
import hopsworks
from hopsworks_common.core import rest_endpoint, sink_job_configuration
from hsfs.core import data_source as ds
project = hopsworks.login()
```

Comment on lines +281 to +283
- Use the [Feature Group creation guide](create.md) to understand managed feature groups in more detail.
- Use the [External Feature Group guide](create_external.md) if you want to query the source in place without copying data into Hopsworks.
- Use the [Online Ingestion Observability guide](online_ingestion_observability.md) to monitor ingestion behavior for online-enabled feature groups.

Copilot AI Mar 25, 2026

The three "Next Steps" links use relative file paths (e.g. `create.md`). This repo's docs style guide warns these can break with mike versioning; prefer the mkdocs-autorefs style (`[text][heading-id]`) or another version-stable internal link format used in the project.

Suggested change

Original:

- Use the [Feature Group creation guide](create.md) to understand managed feature groups in more detail.
- Use the [External Feature Group guide](create_external.md) if you want to query the source in place without copying data into Hopsworks.
- Use the [Online Ingestion Observability guide](online_ingestion_observability.md) to monitor ingestion behavior for online-enabled feature groups.

Proposed:

- Use the [Feature Group creation guide][how-to-create-a-feature-group] to understand managed feature groups in more detail.
- Use the [External Feature Group guide][how-to-create-an-external-feature-group] if you want to query the source in place without copying data into Hopsworks.
- Use the [Online Ingestion Observability guide][online-ingestion-observability] to monitor ingestion behavior for online-enabled feature groups.
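
One way to find remaining links of this kind is a small scan over the Markdown source. The sketch below is a hypothetical helper, not part of the repo's tooling; its regular expression is an assumption that covers only inline links whose target is a relative `.md` path, and it does not handle reference-style link definitions or links inside code fences.

```python
import re

# Inline links whose target is a relative Markdown file, for example
# [text](create.md) or [text](../dir/page.md#anchor). Absolute URLs,
# site-absolute paths and pure anchors are excluded by the lookahead.
RELATIVE_MD_LINK = re.compile(
    r"\[([^\]]+)\]\((?!https?://|/|#)([^)\s]+\.md(?:#[^)\s]*)?)\)"
)

DOC = """\
- Use the [Feature Group creation guide](create.md) to learn more.
- Use the [External Feature Group guide][how-to-create-an-external-feature-group].
See [Ingest Data with dltHub](../feature_group/ingest_with_dlthub.md).
"""


def find_relative_md_links(markdown: str) -> list[tuple[int, str, str]]:
    """Return (line number, link text, target) for each relative .md link."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for match in RELATIVE_MD_LINK.finditer(line):
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


for lineno, text, target in find_relative_md_links(DOC):
    print(f"line {lineno}: [{text}] -> {target}")
```

The autorefs-style link on the second line of `DOC` is left alone, so only the two relative file links are reported.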

- Use full-load or incremental ingestion strategies.
- Build managed feature groups from SQL, CRM, or REST API sources.

For the full workflow, including schema selection, ingestion job configuration, loading strategies, and REST pagination, see [Ingest Data with dltHub](../feature_group/ingest_with_dlthub.md).

Copilot AI Mar 25, 2026

This new link to the dltHub ingestion guide is a relative path (`../feature_group/ingest_with_dlthub.md`). The docs style guide recommends avoiding relative file links because they can break across mike-versioned docs; consider switching to the version-stable internal link format (mkdocs-autorefs / heading-id references).

Suggested change

Original:

For the full workflow, including schema selection, ingestion job configuration, loading strategies, and REST pagination, see [Ingest Data with dltHub](../feature_group/ingest_with_dlthub.md).

Proposed:

For the full workflow, including schema selection, ingestion job configuration, loading strategies, and REST pagination, see [Ingest Data with dltHub][ingest-data-with-dlthub].
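
The rewrite itself can be sketched as a lookup from relative target to heading id. In the example below, the `to_autorefs` helper and the `HEADING_IDS` mapping are hypothetical; the heading ids are copied from the suggestions in this review and have not been checked against the built docs site.

```python
import re

# Relative link target -> autorefs heading id.
# Assumption: ids are taken from the review suggestions and are unverified.
HEADING_IDS = {
    "create.md": "how-to-create-a-feature-group",
    "create_external.md": "how-to-create-an-external-feature-group",
    "online_ingestion_observability.md": "online-ingestion-observability",
    "../feature_group/ingest_with_dlthub.md": "ingest-data-with-dlthub",
}

# Any inline Markdown link of the form [text](target).
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def to_autorefs(markdown: str, heading_ids: dict[str, str] = HEADING_IDS) -> str:
    """Rewrite [text](target) as [text][heading-id] for known targets only."""

    def swap(match: re.Match) -> str:
        text, target = match.group(1), match.group(2)
        heading_id = heading_ids.get(target)
        if heading_id is None:
            # Unknown or external target: leave the link exactly as written.
            return match.group(0)
        return f"[{text}][{heading_id}]"

    return INLINE_LINK.sub(swap, markdown)


line = "see [Ingest Data with dltHub](../feature_group/ingest_with_dlthub.md)."
print(to_autorefs(line))
# see [Ingest Data with dltHub][ingest-data-with-dlthub].
```

Keeping the mapping explicit means a link is only rewritten when someone has confirmed the heading id it should point to; everything else passes through unchanged.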

@gibchikafa merged commit 9dd8396 into logicalclocks:main on Mar 26, 2026
1 check passed