ZeroDivisionError in audio feature prediction with only a single record #1181

jimthompson5802 · 2021-05-20T23:46:06Z

Describe the bug
When generating a model prediction that involves an audio feature and the data set contains only one record, a ZeroDivisionError is raised in the ludwig.utils.audio_utils.calculate_var() function. The error occurs because count = 1, which makes the denominator of this expression zero:

return (sum2 - ((sum1 * sum1) / float(count))) / float(count - 1)
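The failure can be reproduced outside Ludwig with a minimal sketch. The function body below is copied from the expression above; the standalone definition and the sample values are assumptions for illustration, not the actual contents of audio_utils.py.

```python
def calculate_var(sum1, sum2, count):
    # Sample variance from running sums: sum1 = sum(x), sum2 = sum(x * x).
    # The (count - 1) denominator is zero when there is exactly one record.
    return (sum2 - ((sum1 * sum1) / float(count))) / float(count - 1)


# Two records with values 1.0 and 2.0: sum1 = 3.0, sum2 = 5.0.
two_record_var = calculate_var(sum1=3.0, sum2=5.0, count=2)
print(two_record_var)

# One record with value 1.0: the denominator becomes float(0).
try:
    calculate_var(sum1=1.0, sum2=1.0, count=1)
    single_record_failed = False
except ZeroDivisionError as exc:
    single_record_failed = True
    print("single record:", exc)
```

With two or more records the expression behaves as expected; only the count = 1 case hits the zero denominator.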

To Reproduce
Steps to reproduce the behavior:
Run the test tests/integration_tests/test_server.py; the failing case is test_server_integration_with_audio[True].

Expected behavior
Successful prediction with only a single record.

Log file
Here are the log and error messages:

PASSED [ 66%]FAILED [100%]Failed to run predict: float division by zero
Traceback (most recent call last):
  File "/opt/project/ludwig/serve.py", line 92, in predict
    dataset=[entry], data_format=dict
  File "/opt/project/ludwig/api.py", line 683, in predict
    backend=self.backend,
  File "/opt/project/ludwig/data/preprocessing.py", line 1728, in preprocess_for_prediction
    backend
  File "/opt/project/ludwig/data/preprocessing.py", line 162, in preprocess_for_prediction
    backend=backend
  File "/opt/project/ludwig/data/preprocessing.py", line 1080, in build_dataset
    skip_save_processed_input
  File "/opt/project/ludwig/data/preprocessing.py", line 1225, in build_data
    skip_save_processed_input
  File "/opt/project/ludwig/features/audio_feature.py", line 343, in add_feature_data
    backend
  File "/opt/project/ludwig/features/audio_feature.py", line 174, in _process_in_memory
    merged_stats['var'] = calculate_var(merged_stats['sum'], merged_stats['sum2'], merged_stats['count'])
  File "/opt/project/ludwig/utils/audio_utils.py", line 240, in calculate_var
    return (sum2 - ((sum1 * sum1) / float(count))) / float(count - 1)
ZeroDivisionError: float division by zero

tests/integration_tests/test_server.py:193 (test_server_integration_with_audio[True])
500 != 200

Expected :200
Actual   :500
<Click to see difference>

single_record = True, csv_filename = 'EA8A7B56DE.csv'

    @pytest.mark.parametrize('single_record', [False, True])
    def test_server_integration_with_audio(single_record, csv_filename):
        # Audio Inputs
        audio_dest_folder = os.path.join(os.getcwd(), 'generated_audio')
    
        # Resnet encoder
        input_features = [
            audio_feature(
                folder=audio_dest_folder,
            ),
            text_feature(encoder='embed', min_len=1),
            numerical_feature(normalization='zscore')
        ]
        output_features = [
            category_feature(vocab_size=2),
            numerical_feature()
        ]
    
        rel_path = generate_data(input_features, output_features, csv_filename)
        model, output_dir = train_model(input_features, output_features,
                                        data_csv=rel_path)
    
        app = server(model)
        client = TestClient(app)
        response = client.get('/')
        assert response.status_code == 200
    
        response = client.post('/predict')
        # expect the HTTP 400 error code for this situation
        assert response.status_code == 400
        assert response.json() == ALL_FEATURES_PRESENT_ERROR
    
        data_df = read_csv(rel_path)
    
        if single_record:
            # Single record prediction
            first_entry = data_df.T.to_dict()[0]
            data, files = convert_to_form(first_entry)
            server_response = client.post('/predict', data=data, files=files)
>           assert server_response.status_code == 200
E           assert 500 == 200

../../tests/integration_tests/test_server.py:233: AssertionError

Environment (please complete the following information):

OS: Ludwig Docker container
Python version: 3.6.9
Ludwig version: 0.4-dev0

Additional context
As a short-term workaround, I modified the denominator to be max(1, count - 1).
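A minimal sketch of that workaround, assuming the same running-sum inputs as the original expression. The name calculate_var_guarded is hypothetical and used only for illustration; this is not the code that was eventually merged.

```python
def calculate_var_guarded(sum1, sum2, count):
    # Hypothetical variant of calculate_var: clamp the denominator to at
    # least 1 so that a single record yields a variance of 0.0 instead of
    # raising ZeroDivisionError.
    denominator = float(max(1, count - 1))
    return (sum2 - ((sum1 * sum1) / float(count))) / denominator


# Single record with value 1.5: sum1 = 1.5, sum2 = 2.25.
single_var = calculate_var_guarded(sum1=1.5, sum2=2.25, count=1)

# Two records with values 1.0 and 2.0: result is unchanged by the clamp.
pair_var = calculate_var_guarded(sum1=3.0, sum2=5.0, count=2)

print(single_var, pair_var)
```

The clamp only changes the result for count = 1; for count >= 2 the denominator is still count - 1. A count of 0 would still fail on the inner division by float(count), which this sketch does not address.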

jimthompson5802 added the bug label May 20, 2021

jimthompson5802 added a commit to jimthompson5802/ludwig that referenced this issue May 20, 2021

refactor: short-term fix for Issue ludwig-ai#1181

3174467

Why: To be addressed as part of long-term update to the audio feature.

jimthompson5802 mentioned this issue Sep 26, 2021

FIX: Issue 1181 resolves the ZeroDivisionError when calculating sample variance #1326

Merged

w4nderlust closed this as completed in #1326 Sep 26, 2021

w4nderlust pushed a commit that referenced this issue Sep 26, 2021

Fix #1181 audio feature preprocessing error when computing variance (#…

f344a26

…1326) resolves the zerodivisionerror when calculating sample variance in a batch of 1 audio input record.
