Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dataset features and dataset preview for HuggingFaceDatasetSaver #5135

Merged
merged 6 commits into from Aug 9, 2023

Conversation

freddyaboulton
Copy link
Collaborator

@freddyaboulton freddyaboulton commented Aug 8, 2023

Description

Fixes:

  • If a media file (Image/Audio) was None, flagging would fail
  • Jsonl dataset format was not compatible with datasets library
  • Paths in json files now include the full path relative to the root of the dataset dir. This helps loading to HuggingFace datasets library.

Internal thread where this came up: https://huggingface.slack.com/archives/C02QZLG8GMN/p1691444822924499

To test:

Set up a demo with flagging

import gradio as gr

def echo(msg, history, img):
    return msg

with gr.Blocks() as demo:
    image = gr.Image(label='Image')
    chat = gr.ChatInterface(echo, additional_inputs=[image])
    flagger = gr.Button("Flag")

    flagging = gr.HuggingFaceDatasetSaver(
        dataset_name="chatinterface_with_image_csv",
        hf_token="<your token>",
        separate_dirs=True)
    flagging.setup([chat.chatbot, image], "chatinterface_with_image_csv_2")
    flagger.click(lambda *args: flagging.flag(flag_data=args), inputs=[chat.chatbot , image], outputs=None, preprocess=False)

demo.launch()

Flag a couple with and without the image as None, then check out the dataset:
https://huggingface.co/datasets/freddyaboulton/chatinterface_with_image_json

🎯 PRs Should Target Issues

Before your create a PR, please check to see if there is an existing issue for this change. If not, please create an issue before you create this PR, unless the fix is very small.

Not adhering to this guideline will result in the PR being closed.

Tests

  1. PRs will only be merged if tests pass on CI. To run the tests locally, please set up your Gradio environment locally and run the tests: bash scripts/run_all_tests.sh

  2. You may need to run the linters: bash scripts/format_backend.sh and bash scripts/format_frontend.sh

@vercel
Copy link

vercel bot commented Aug 8, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
gradio ✅ Ready (Inspect) Visit Preview Aug 9, 2023 4:54pm

@freddyaboulton freddyaboulton marked this pull request as ready for review August 8, 2023 18:31
@gradio-pr-bot
Copy link
Contributor

🦄 change detected

This Pull Request includes changes to the following packages.

Package Version
gradio minor
  • Maintainers can select this checkbox to manually select packages to update.

With the following changelog entry.

Add tests

Maintainers or the PR author can modify the PR title to modify this entry.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

@gradio-pr-bot
Copy link
Contributor

gradio-pr-bot commented Aug 8, 2023

All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-5135-all-demos


You can install the changes in this PR by running:

pip install https://gradio-builds.s3.amazonaws.com/ec1760b21a9a7c00fd9ead9364f4a1a6199905c2/gradio-3.39.0-py3-none-any.whl

@freddyaboulton freddyaboulton changed the title Add tests Fix dataset features and dataset preview for HuggingFaceDatasetSaver Aug 8, 2023
@freddyaboulton freddyaboulton added the t: fix A change that implements a fix label Aug 8, 2023
repo_id=self.dataset_id,
filename=path_in_repo,
repo_type="dataset",
if deserialized:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was causing errors if the image was None

@freddyaboulton freddyaboulton changed the title Fix dataset features and dataset preview for HuggingFaceDatasetSaver WIP: Fix dataset features and dataset preview for HuggingFaceDatasetSaver Aug 8, 2023
@freddyaboulton freddyaboulton force-pushed the hf-dataset-flagging-fixes branch 2 times, most recently from 60ee2c9 to 1dca866 Compare August 8, 2023 20:05
@gradio-pr-bot
Copy link
Contributor

gradio-pr-bot commented Aug 8, 2023

🎉 Chromatic build completed!

There are 0 visual changes to review.
There are 0 failed tests to fix.

@freddyaboulton freddyaboulton changed the title WIP: Fix dataset features and dataset preview for HuggingFaceDatasetSaver Fix dataset features and dataset preview for HuggingFaceDatasetSaver Aug 8, 2023
@abidlabs
Copy link
Member

abidlabs commented Aug 8, 2023

Taking a look!

@abidlabs
Copy link
Member

abidlabs commented Aug 8, 2023

Thanks for the fix @freddyaboulton! I can confirm that the flagging does not error if the Image is None. Generally the changes look good.

Tested with some old demos and they continue to work. I ran your demo as well. Two things I noticed in the flagged dataset: https://huggingface.co/datasets/abidlabs/chatinterface_with_image_csv

  1. When there is no image, the image string is "None" -- just "" would be better
  2. I don't get to see any image previews in the dataset viewer. Were you able to see image previews?

@severo
Copy link

severo commented Aug 9, 2023

I don't get to see any image previews in the dataset viewer. Were you able to see image previews?

to have images in the dataset viewer, the column has to use the Image type (currently, it's "string", so we display a string). More details here: https://huggingface.co/docs/datasets/image_dataset.

@freddyaboulton
Copy link
Collaborator Author

@abidlabs - I'm not sure if our flagging format has been compatible with the expected format by datasets/datasets-server. I spent some time looking into it this morning but I think it will take a bigger effort than is worth it now given our other priorities. I will file an issue to track my thoughts.

@abidlabs
Copy link
Member

abidlabs commented Aug 9, 2023

@abidlabs - I'm not sure if our flagging format has been compatible with the expected format by datasets/datasets-server. I spent some time looking into it this morning but I think it will take a bigger effort than is worth it now given our other priorities. I will file an issue to track my thoughts.

I agree, thanks @freddyaboulton -- LGTM

@freddyaboulton freddyaboulton merged commit 80727bb into main Aug 9, 2023
16 checks passed
@freddyaboulton freddyaboulton deleted the hf-dataset-flagging-fixes branch August 9, 2023 18:14
@pngwn pngwn mentioned this pull request Aug 9, 2023
@lhoestq
Copy link

lhoestq commented Aug 14, 2023

@abidlabs - I'm not sure if our flagging format has been compatible with the expected format by datasets/datasets-server. I spent some time looking into it this morning but I think it will take a bigger effort than is worth it now given our other priorities. I will file an issue to track my thoughts.

To have see the images in the viewer I think you can simply:

  • use null instead of "None" for none images
  • have a "file_name" field in metadata.jsonl with a relative path to the image
  • remove the configs field in the yaml in README.md

see the ImageFolder docs: https://huggingface.co/docs/datasets/image_dataset#imagefolder

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t: fix A change that implements a fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants