

doc: DuckDB native support for HuggingFace urls #2817

Closed
wants to merge 25 commits into from


@AndreaFrancis (Contributor) commented May 15, 2024

Following up on duckdb/duckdb#11831.
This is a draft in progress to document DuckDB CLI usage with native support for HF URLs.
I plan to add:

  • Authentication for private and gated datasets (using the DuckDB Secrets Manager)
  • Query datasets (some basic SELECT examples, plus DESCRIBE and SUMMARIZE for statistics)
  • Perform SQL operations (Text functions and aggregations)
  • Combine datasets, export and publish on the Hub
  • Perform vector similarity search?

Please let me know if you have any better ideas or if I can remove some of the above points.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Resolved (outdated) review thread: docs/source/duckdb_cli.md
@AndreaFrancis changed the title from "WIP doc: DuckDB native support for HuggingFace urls" to "doc: DuckDB native support for HuggingFace urls" on May 22, 2024
@AndreaFrancis marked this pull request as ready for review on May 22, 2024 21:34
@AndreaFrancis requested reviews from a team and @stevhliu on May 22, 2024 21:34
@AndreaFrancis (Contributor, Author) commented:

  • Implement full-text search? I will add it in another PR if needed (I would rather cover the basic cases).

@stevhliu (Member) left a comment:

Really cool and well-written! 👏

Resolved (outdated) review threads: docs/source/duckdb_cli.md (5), docs/source/duckdb_cli_sql.md (5)
AndreaFrancis and others added 3 commits May 23, 2024 08:33
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Resolved (outdated) review threads: docs/source/_toctree.yml (2)
Comment on lines 39 to 56
- local: duckdb
  title: DuckDB
  sections:
    - local: duckdb_cli
      title: DuckDB CLI
      sections:
        - local: duckdb_cli_auth
          title: Authentication for private and gated datasets
        - local: duckdb_cli_select
          title: Query datasets
        - local: duckdb_cli_sql
          title: Perform SQL operations
        - local: duckdb_cli_combine_and_export
          title: Combine datasets and export
        - local: duckdb_cli_vector_similarity_search
          title: Perform vector similarity search
        - local: duckdb_cli_fts
          title: Implement full-text search
Review comment from a member:

Maybe it's because we don't support this depth of sections? Trying with one less level.

Suggested change
- local: duckdb
  title: DuckDB
  sections:
    - local: duckdb_cli
      title: DuckDB CLI
    - local: duckdb_cli_auth
      title: Authentication for private and gated datasets
    - local: duckdb_cli_select
      title: Query datasets
    - local: duckdb_cli_sql
      title: Perform SQL operations
    - local: duckdb_cli_combine_and_export
      title: Combine datasets and export
    - local: duckdb_cli_vector_similarity_search
      title: Perform vector similarity search
    - local: duckdb_cli_fts
      title: Implement full-text search

@AndreaFrancis (Author) replied:

Added, but I'm not sure why it is still not working.

@lhoestq (Member) left a comment:

Looks amazing! I just added a few edit suggestions to remove the remaining @~parquet references, which are not needed.

I'm having trouble getting the docs preview to work, though.

Resolved (outdated) review threads: docs/source/duckdb_cli_sql.md (8)
@lhoestq (Member) commented May 23, 2024

Also, I feel like this should be in hub-docs, no? At https://huggingface.co/docs/hub/datasets-duckdb

In the Viewer's docs, we could redirect to hub-docs and maybe present the more advanced use case of using the Parquet export provided by the viewer.

AndreaFrancis and others added 4 commits May 23, 2024 10:15
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
@AndreaFrancis (Contributor, Author) commented:

Quoting @lhoestq: "Also I feel like this should be in hub-docs no ? at https://huggingface.co/docs/hub/datasets-duckdb"

I will close this PR in favor of https://github.com/huggingface/hub-docs/pull/1297 and will then refer to it in the Dataset viewer docs.

4 participants