JobManagerCrashedError when trying to generate train split viewer #2799

Open
bghira opened this issue May 13, 2024 · 2 comments

Comments

bghira commented May 13, 2024

When loading this dataset onto the Hub, which contains an image field holding raw image bytes, I'm receiving a JobManagerCrashedError.

It's not clear exactly why this happens, or what the best way to encode the images in the dataset is. I looked exhaustively for examples of how to do this, but there wasn't much other than the dataset card spec.

I added the features section to the dataset card on the theory that the viewer simply didn't know how to decode that column, and the large size threw it off. That didn't change anything, though the JobManagerCrashedError seemed to take longer to occur; maybe that's just an artifact of job scheduling on the backend.
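For reference, a minimal sketch of the features block I mean in the dataset card YAML (the dtypes here are my assumption about how to tell the viewer to decode the columns):

```yaml
dataset_info:
  features:
    - name: filename
      dtype: string
    - name: image
      dtype: image
```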


bghira commented May 13, 2024

the code I've used to assemble the dataset:

import os

import numpy as np
import pandas as pd
from PIL import Image
from tqdm import tqdm

data = []
for root, _, files in os.walk(args.input_folder):
    for file in tqdm(files, desc="Processing images"):
        path = os.path.join(root, file)
        try:
            image = Image.open(path)
        except Exception:
            # skip files PIL can't open
            continue

        width, height = get_size(image)
        luminance = get_image_luminance(image)
        image_hash = get_image_hash(image)
        # Keep the smallest original compressed representation of the image
        with open(path, "rb") as f:
            image_data = np.frombuffer(f.read(), dtype=np.uint8)

        data.append((file, image_hash, width, height, luminance, image_data))

df = pd.DataFrame(data, columns=["filename", "image_hash", "width", "height", "luminance", "image"])
df.to_parquet(os.path.join(args.output_folder, "images.parquet"), index=False)

print("Done!")


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
