About the interface to transform dataframe to Huggingface dataset with a column typed with Image. #284

Closed
svjack opened this issue Mar 3, 2023 · 2 comments

@svjack
svjack commented Mar 3, 2023

I reviewed how this project's DataFrame handles a column of image URLs (image_url).
It seems that the fast image display works by rendering each cell as an img src tag, passing the result to "_repr_html_", and displaying it as HTML in the Jupyter notebook, rather than downloading the actual image
(the YAML config file defines the image formatters used to display images at different sizes).
As your documentation says, the "to_"-prefixed methods (such as to_csv and to_arrow) all drop
the image column, while the "write" and "read" methods save only the "config" and never trigger the actual download.
This design makes the conversion of images lossy, so it is inconvenient when I want to quickly initialize a Hugging Face dataset from your DataFrame (e.g. Dataset(df.to_arrow())).
I think you should add a way to trigger the actual download of the images in the image column, wrapped in a timeout
decorator. Since you already have a _write_empty_image definition, adding this function may be easy.
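The download step requested above could look roughly like the minimal sketch below. It is not Meerkat code: `fetch_image_bytes`, `materialize_image_column`, and `to_hf_image_records` are hypothetical helpers that use only the Python standard library. Instead of a timeout decorator, the sketch relies on the `timeout` argument of `urllib.request.urlopen`, which bounds each blocking socket operation rather than the total download time. A failed or timed-out download becomes `None` instead of raising, which plays a role similar to writing an empty image.

```python
import concurrent.futures
import http.client
import urllib.request
from typing import Dict, List, Optional


def fetch_image_bytes(url: str, timeout: float = 5.0) -> Optional[bytes]:
    """Hypothetical helper: download raw image bytes, or return None on failure.

    `timeout` is passed to urllib.request.urlopen, which applies it to each
    blocking socket operation (connect, read), not to the whole download.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError and socket timeouts;
        # ValueError covers malformed or unsupported URLs.
        return None


def materialize_image_column(
    urls: List[str], timeout: float = 5.0, max_workers: int = 8
) -> List[Optional[bytes]]:
    """Hypothetical helper: fetch every URL in a thread pool, preserving input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: fetch_image_bytes(u, timeout), urls))


def to_hf_image_records(
    images: List[Optional[bytes]],
) -> List[Optional[Dict[str, Optional[bytes]]]]:
    """Hypothetical helper: wrap bytes as {"bytes", "path"} dicts, keeping None for failures."""
    return [{"bytes": b, "path": None} if b is not None else None for b in images]
```

Assuming the Hugging Face `datasets` library is installed, the records could then be passed as `datasets.Dataset.from_dict({"image": records}, features=datasets.Features({"image": datasets.Image()}))`, so the dataset would hold the image bytes rather than only the URLs. Downloading in a thread pool keeps the wait roughly bounded by the slowest URL instead of the sum of all of them.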

@krandiash
Contributor

Generally, we avoid downloading / reading in images as far as possible, but it sounds like this might be a case where that isn't appropriate.

Would you mind pasting in a small example code snippet that shows what you would like to do from Meerkat -> HF Dataset? We can take a closer look.

@krandiash
Contributor

Hi @svjack, let us know if we can take a look; happy to help.

@krandiash krandiash added the stale label Mar 7, 2023
@krandiash krandiash closed this as not planned Won't fix, can't repro, duplicate, stale Mar 17, 2023