Allow image bytes type during preprocessing #3971

Merged: 1 commit, Mar 22, 2024


vijayi1 (Contributor) commented Mar 19, 2024

In image_feature.py, values of type `bytes` are handled by all of the read_image routines, but not during preprocessing.
This PR adds the same `bytes` instance check to the `_finalize_preprocessing` function.
Tested with the mnist example, using both image paths and image bytes objects.
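The change boils down to a type check on each image value. Below is a minimal sketch of that dispatch. It assumes each value is either already-encoded image bytes or a file path. `read_image_input` is a hypothetical name used only for illustration; this is not Ludwig's actual `_finalize_preprocessing` code.

```python
import os
import tempfile
from typing import Union

# Hypothetical input type: a file path or already-encoded image data.
ImageInput = Union[str, os.PathLike, bytes, bytearray]


def read_image_input(value: ImageInput) -> bytes:
    """Hypothetical helper: return the encoded image bytes for a column value.

    bytes/bytearray values are treated as already-encoded image data;
    str/PathLike values are treated as file paths and read in binary mode.
    """
    if isinstance(value, (bytes, bytearray)):
        # Image bytes: use the data directly, no file I/O needed.
        return bytes(value)
    if isinstance(value, (str, os.PathLike)):
        # Image path: read the raw file contents.
        with open(value, "rb") as f:
            return f.read()
    raise TypeError(f"unsupported image input type: {type(value).__name__}")


if __name__ == "__main__":
    # Usage: a path and the bytes stored at that path give the same data.
    png_signature = b"\x89PNG\r\n\x1a\n"
    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
        tmp.write(png_signature)
        path = tmp.name
    try:
        assert read_image_input(path) == read_image_input(png_signature)
    finally:
        os.remove(path)
```

The bytes check comes first so that encoded data is never mistaken for a path. Any other type fails loudly instead of being passed through silently.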

alexsherstinsky (Collaborator) left a comment

LGTM -- @vijayi1 -- thank you very much for this enhancement!


Unit Test Results

6 files ±0, 6 suites ±0, 13m 49s ⏱️ (-3m 14s)
12 tests ±0: 7 passed ✔️ (-2), 5 skipped 💤 (+2), 0 failed (±0)
60 runs ±0: 30 passed ✔️ (-12), 30 skipped 💤 (+12), 0 failed (±0)

Results for commit c98457c. ± Comparison against base commit c09d5dc.

This pull request skips 2 tests.
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[ames_housing.gbm.yaml]
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[mercedes_benz_greener.gbm.yaml]

arnavgarg1 (Contributor) left a comment

Thanks for the PR! Is it possible to add a simple test?

vijayi1 (Contributor, author) commented Mar 19, 2024

I used the following in examples/mnist/ and trained on the resulting df:

    from ludwig.datasets import mnist

    df = mnist.load()

    # Read each image file into memory as raw encoded bytes.
    image_bytes = []
    for img_path in df["image_path"]:
        with open(img_path, mode="rb") as f:
            image_bytes.append(f.read())

    # Replace the image file paths with the image bytes,
    # under the same "image_path" column name.
    df["image_path"] = image_bytes

    # print(df.head())
    # df.to_parquet("temp.parquet")
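
For a quick sanity check that needs neither pandas nor the MNIST download, the same path-to-bytes replacement can be sketched over plain row dictionaries. `paths_to_bytes` and the row layout are hypothetical. They were chosen only to mirror the script above and are not part of Ludwig.

```python
import os
import tempfile
from typing import Dict, List


def paths_to_bytes(
    rows: List[Dict[str, object]], column: str = "image_path"
) -> List[Dict[str, object]]:
    """Hypothetical helper: copy each row, replacing the file path in `column`
    with the raw bytes of that file (the same swap the pandas script makes)."""
    converted = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the caller's rows are not mutated
        with open(row[column], "rb") as f:
            new_row[column] = f.read()
        converted.append(new_row)
    return converted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Two stand-in "images"; real MNIST rows would point at image files.
        rows = []
        for label in (0, 1):
            path = os.path.join(tmp_dir, f"{label}.png")
            with open(path, "wb") as f:
                f.write(bytes([label]) * 4)
            rows.append({"image_path": path, "label": label})

        converted = paths_to_bytes(rows)
        # Labels are unchanged; the column now holds bytes instead of paths.
        print([(r["label"], r["image_path"]) for r in converted])
```

Copying each row keeps the original path-based rows available, so the same data can be trained once with paths and once with bytes and the results compared.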

arnavgarg1 (Contributor)

Sounds good!

arnavgarg1 merged commit 437732f into ludwig-ai:master on Mar 22, 2024
18 checks passed
vijayi1 deleted the image-bytes-preprocessing branch on April 8, 2024 19:59