refactor: support sourcing images from either file path or in-memory data frame #1174

jimthompson5802 · 2021-05-04T03:11:45Z

Code Pull Requests

This PR addresses Issue #484 and Issue #268.

Adds an option to source images either from a file path or from memory as numpy arrays.
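
As a rough illustration of the idea (not the code in this PR), an image entry can be dispatched on its type: a string is treated as a file path and read from disk, while an ndarray is used as-is. The helper name `resolve_image_entry` and the injected `read_fn` reader are hypothetical and exist only for this sketch; adding a channel axis to 2D arrays is likewise an assumption.

```python
from typing import Callable, Union

import numpy as np


def resolve_image_entry(
    img_entry: Union[str, np.ndarray],
    read_fn: Callable[[str], np.ndarray],
) -> np.ndarray:
    """Return pixel data for an entry that is either a file path or an ndarray.

    Hypothetical helper for illustration only; `read_fn` stands in for
    whatever image reader the caller actually uses.
    """
    if isinstance(img_entry, str):
        # File-path entry: delegate to the supplied reader.
        img = read_fn(img_entry)
    elif isinstance(img_entry, np.ndarray):
        # In-memory entry: the value is already pixel data.
        img = img_entry
    else:
        raise TypeError(
            f"unsupported image entry type: {type(img_entry).__name__}"
        )
    if img.ndim == 2:
        # Assumption: grayscale (2D) images get an explicit channel axis.
        img = img[:, :, np.newaxis]
    return img
```

With this shape, the same data frame column could hold either kind of entry, and the rest of the pipeline would only ever see arrays.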

Why: Support the new capability in the image feature. To support sending ndarray objects in REST API calls, added helper functions to ludwig.utils.data_utils.py for serializing and deserializing ndarray objects to and from a custom Ludwig string format.

Why: Support passing the skip_save_processed_input parameter to the add_feature_data() methods, so that each method can customize its setup based on that setting. The immediate need is to let the image feature setup handle ndarray input.

jimthompson5802 · 2021-05-10T01:02:43Z

Ready for review. Summary of changes:

Major changes in image_feature.py to support both file paths and ndarrays as image features in a pandas data frame.
Modified the API signature of a couple of core APIs to pass the skip_save_processed_input flag to the image feature's add_feature_data(), so it can handle different setup situations.
Added a unit test covering both file-path and ndarray sourcing of images.
Added 3 helper functions to utils.data_utils to support passing ndarrays in a REST request.
Modified the unit test for the Ludwig server to support passing ndarray images.
Updated the add_feature_data() methods of all other features to accept the skip_save_processed_input parameter.

Why: Remove the need for the custom Ludwig string format to handle ndarrays. Made file handling more robust. Added the capability to restore each ndarray's dtype so that it matches the original data source. Created ludwig.utils.server_utils to house the new serialize/deserialize functions.
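
The helpers in ludwig.utils.server_utils are not reproduced here. As a generic sketch of how an ndarray can be sent as bytes and restored with its original dtype and shape, NumPy's own `.npy` format can be written to an in-memory buffer. The function names `serialize_ndarray` and `deserialize_ndarray` are hypothetical, and this approach is an assumption for illustration, not necessarily what the PR implements.

```python
import io

import numpy as np


def serialize_ndarray(arr: np.ndarray) -> bytes:
    """Encode an array as .npy bytes; the .npy header records dtype and shape."""
    buffer = io.BytesIO()
    np.save(buffer, arr, allow_pickle=False)
    return buffer.getvalue()


def deserialize_ndarray(payload: bytes) -> np.ndarray:
    """Decode .npy bytes back into an array with the original dtype and shape."""
    return np.load(io.BytesIO(payload), allow_pickle=False)
```

Because dtype and shape travel inside the payload, the receiving side does not need a separate side channel to rebuild the array faithfully.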

jimthompson5802 · 2021-05-15T19:31:21Z

Summary of changes:

Deprecated the custom Ludwig string format for ndarrays and removed the related helper functions from ludwig.utils.data_utils.
Completely rewrote the functions that serialize/deserialize the data used in the REST API and removed the old ones. Created three new helper functions for serializing and deserializing the REST API data, housed in a new module, ludwig.utils.server_utils.py.
Addressed review comments on variable naming, i.e., img_store -> img_entry.

Work remaining:

Add a test for the case where the in_memory and file/ndarray settings differ between training and prediction.
Clean up now-obsolete code that is currently commented out.

jimthompson5802 · 2021-05-17T01:40:25Z

Converted back to draft mode. Clarification of requirements is forcing a rethink of current approach.

Why: The Dask backend and the hdf5 cache are incompatible. With the recent change to support ndarrays for images, two conditions now affect whether the hdf5 cache is used for images. This change moves the backend check for hdf5 cache use earlier in processing to avoid interactions with those two conditions.
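
The ordering described above can be sketched as an early guard. The function `should_use_hdf5_cache` and its `backend_supports_hdf5` parameter are hypothetical, and treating `in_memory` and `skip_save_processed_input` as the two conditions is an assumption; the comment does not say which conditions are meant.

```python
def should_use_hdf5_cache(
    backend_supports_hdf5: bool,
    in_memory: bool,
    skip_save_processed_input: bool,
) -> bool:
    """Decide whether images go through an hdf5 cache (illustrative only)."""
    # Check the backend first, so the later conditions can never
    # enable the cache on a backend that cannot use it.
    if not backend_supports_hdf5:
        return False
    # Assumed conditions: cache only when images are not held in memory
    # and the processed input is allowed to be saved.
    return not in_memory and not skip_save_processed_input
```

Putting the backend check first keeps the remaining logic independent of the backend, which is the interaction the change is meant to avoid.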

jimthompson5802 added 6 commits May 3, 2021 22:59

refactor: rename key image related variable names

2d2f3d0

Why: renaming variables to support either a string for file path to image or image stored as numpy array in input data set.

refactor: code to now accept numpy arrays

e82bf75

Why: Able to accept either file path string or numpy array for image feature.

test: add unit test for ndarray images feature

0d2a16c

Why: unit test test_basic_image_feature tests sourcing images both from files and as ndarrays in a dataframe.

test: add in-memory parameter for unit test

b5453cf

Why: Accounting for possible settings by the user.

test: add skip_save_processed_input option to test case

9d85020

Why: Accounting for possible settings by the user.

test: insert additional if test for file test

9b6d638

Why: To bypass a test combination that returns index values instead of image ndarrays for training.

w4nderlust reviewed May 7, 2021

ludwig/features/image_feature.py (outdated)

w4nderlust reviewed May 7, 2021

ludwig/features/image_feature.py (outdated)

w4nderlust reviewed May 7, 2021

ludwig/features/image_feature.py (outdated)

jimthompson5802 added 7 commits May 7, 2021 20:51

refactor: incorporated reviewer comments

d836e72

why: to comply with the design standard on use of the COLUMN key in the feature dictionary; also added a helper function to declutter a long statement.

Merge branch 'master' into support_sourcing_images_from_memory

1c6edbb

doc: update image test comments

1656dae

why: To explain rationale for special handling of unit test where images are sourced from the file system.

refactor: add assert on response status code

ce54d1c

why: explicit test to ensure the REST API call completed successfully for the /predict and /batch_predict endpoints.

refactor: prep for adding ndarray test

9ebb9b9

why: put in scaffolding to eventually support use of ndarray as image input for test.

jimthompson5802 marked this pull request as ready for review May 10, 2021 01:02

jimthompson5802 requested a review from w4nderlust May 10, 2021 01:04

w4nderlust reviewed May 11, 2021

ludwig/features/image_feature.py (outdated)

ludwig/features/image_feature.py (outdated)

ludwig/features/image_feature.py (outdated)

ludwig/features/image_feature.py (outdated)

jimthompson5802 added 2 commits May 15, 2021 14:48

refactor: incorporate reviewer comments & misc cleanup

f534409

why: renamed variables for consistency and corrected a few log messages.

jimthompson5802 requested a review from w4nderlust May 15, 2021 19:31

jimthompson5802 added 5 commits May 16, 2021 08:49

refactor: code clean up

e01f0a1

remove code made obsolete by PR ludwig-ai#1174 and address minor todos

Merge branch 'master' into support_sourcing_images_from_memory

d1aefd4

doc: update docstring

42317aa

provide additional details on how serialize helper function works.

doc: minor edits in docstring

15aa133

refactor: remove obsolete code

d6166e0

removed code for the deprecated helper functions for ludwig custom string format.

jimthompson5802 marked this pull request as draft May 17, 2021 01:39

jimthompson5802 added 6 commits May 17, 2021 08:05

refactor: model server unit test

4ef7051

Why: Replace function _write_file() with _read_image_buffer(); this eliminates the need to write temporary files and clean them up afterwards. Remove the helper functions for the custom Ludwig string format for ndarrays.
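
The body of _read_image_buffer() is not shown on this page. As a minimal sketch of the general technique (holding file contents in an in-memory, file-like object instead of writing a temporary copy), something along these lines would work; the name `read_image_buffer` and its exact behavior are assumptions for illustration.

```python
import io


def read_image_buffer(path: str) -> io.BytesIO:
    """Load a file's raw bytes into an in-memory, file-like buffer.

    Illustrative stand-in only; not the helper from this PR.
    """
    with open(path, "rb") as f:
        # The buffer owns a copy of the bytes, so the file can be closed
        # immediately and nothing needs to be cleaned up on disk later.
        return io.BytesIO(f.read())
```

The returned buffer can be handed to any code that expects an open binary file, and it is simply garbage-collected when no longer referenced.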

refactor: rename variable to be consistent

f73bc9b

Why: rename img_source to img_entry to be consistent with rest of code base.

refactor: re-introduce _write_file() function

3e2a2ce

Why: To support audio features, which have not been updated to support an ndarray representation. Add a unit test for the audio feature for model serving.

test: split out single vs multiple record audio unit test

7769afe

Why: Allow single-record and multiple-record batches to be tested as independent tests.

refactor: short-term fix for Issue ludwig-ai#1181

3174467

Why: To be addressed as part of long-term update to the audio feature.

jimthompson5802 marked this pull request as ready for review May 21, 2021 00:34

Merge remote-tracking branch 'upstream/master' into support_sourcing_…

a1d4a22

…images_from_memory

w4nderlust approved these changes May 23, 2021

w4nderlust merged commit 30d164e into ludwig-ai:master May 23, 2021

This was referenced May 23, 2021

Is it possible to pass 2D numpy arrays as input instead of png/jpg for image classification? #484

Closed

pass image in memory to predict API for image classification #268

Closed

jimthompson5802 deleted the support_sourcing_images_from_memory branch July 10, 2021 11:28
