Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve dataset load onboarding flow. #764

Merged
merged 6 commits into from
Oct 12, 2023
Merged

Improve dataset load onboarding flow. #764

merged 6 commits into from
Oct 12, 2023

Conversation

nsthorat
Copy link
Contributor

@nsthorat nsthorat commented Oct 12, 2023

When loading a dataset, show the field selection settings, improve the wording around it. Add a preview at the bottom.

Remove the datasetStore entirely. This is useless, and causes issues when using components from outside dataset pages.

dataset_onboard.mp4

https://huggingface.co/spaces/lilacai/nikhil_staging

@@ -55,6 +55,8 @@
SAMPLE_AVG_TEXT_LENGTH = 1000
MAX_TEXT_LEN_DISTINCT_COUNT = 250

DEFAULT_EMBEDDING = 'gte-small'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do EMBEDDING_SORT_PRIORITIES[0] ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -95,7 +95,8 @@ def test_config_compute_signal(make_test_data: TestDataMaker) -> None:
name='test_dataset',
source=TestSource(),
# 'text' is the longest path, so should be set as the default setting.
settings=DatasetSettings(ui=DatasetUISettings(media_paths=[('text',)])))
settings=DatasetSettings(
ui=DatasetUISettings(media_paths=[('text',)]), preferred_embedding=DEFAULT_EMBEDDING))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of having to define it explicitly, should we default it to DEFAULT_EMB in the basemodel definition of DatasetUISetting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@nsthorat nsthorat merged commit 0de7eb7 into main Oct 12, 2023
4 checks passed
@nsthorat nsthorat deleted the nik-onboard branch October 12, 2023 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants