ZeroDivisionError in audio feature prediction with only a single record #1181

Closed
jimthompson5802 opened this issue May 20, 2021 · 0 comments · Fixed by #1326
Labels
bug Something isn't working

@jimthompson5802
Collaborator

Describe the bug
When generating a model prediction involving an audio feature on a data set that contains only one record, a ZeroDivisionError exception is raised in the ludwig.utils.audio_utils.calculate_var() function. The error occurs because count is 1, so the denominator count - 1 in this expression is zero:

return (sum2 - ((sum1 * sum1) / float(count))) / float(count - 1)
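The failure can be reproduced outside Ludwig with a standalone copy of the expression above. This is a minimal sketch, not the Ludwig module itself. Reading the formula, sum1 plays the role of the sum of the values, sum2 the sum of their squares, and count the number of values. With a single value, the final division by count - 1 is a division by zero.

```python
def calculate_var(sum1, sum2, count):
    # Standalone copy of the quoted expression: unbiased (sample) variance
    # computed from the sum of values (sum1), the sum of squared values (sum2)
    # and the number of values (count).
    return (sum2 - ((sum1 * sum1) / float(count))) / float(count - 1)


values = [2.0]  # a single record, as in the failing prediction
sum1 = sum(values)
sum2 = sum(v * v for v in values)

try:
    calculate_var(sum1, sum2, len(values))
except ZeroDivisionError as exc:
    # count - 1 == 0, so the final division fails
    print(f"ZeroDivisionError: {exc}")
```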

To Reproduce
Steps to reproduce the behavior:
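Run the integration test tests/integration_tests/test_server.py. The test_server_integration_with_audio case with single_record=True fails.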

Expected behavior
Successful prediction with only a single record.

Log file
Here are the log and error messages:

PASSED [ 66%]
FAILED [100%]
Failed to run predict: float division by zero
Traceback (most recent call last):
  File "/opt/project/ludwig/serve.py", line 92, in predict
    dataset=[entry], data_format=dict
  File "/opt/project/ludwig/api.py", line 683, in predict
    backend=self.backend,
  File "/opt/project/ludwig/data/preprocessing.py", line 1728, in preprocess_for_prediction
    backend
  File "/opt/project/ludwig/data/preprocessing.py", line 162, in preprocess_for_prediction
    backend=backend
  File "/opt/project/ludwig/data/preprocessing.py", line 1080, in build_dataset
    skip_save_processed_input
  File "/opt/project/ludwig/data/preprocessing.py", line 1225, in build_data
    skip_save_processed_input
  File "/opt/project/ludwig/features/audio_feature.py", line 343, in add_feature_data
    backend
  File "/opt/project/ludwig/features/audio_feature.py", line 174, in _process_in_memory
    merged_stats['var'] = calculate_var(merged_stats['sum'], merged_stats['sum2'], merged_stats['count'])
  File "/opt/project/ludwig/utils/audio_utils.py", line 240, in calculate_var
    return (sum2 - ((sum1 * sum1) / float(count))) / float(count - 1)
ZeroDivisionError: float division by zero

tests/integration_tests/test_server.py:193 (test_server_integration_with_audio[True])
500 != 200

Expected :200
Actual   :500

single_record = True, csv_filename = 'EA8A7B56DE.csv'

    @pytest.mark.parametrize('single_record', [False, True])
    def test_server_integration_with_audio(single_record, csv_filename):
        # Audio Inputs
        audio_dest_folder = os.path.join(os.getcwd(), 'generated_audio')
    
        # Resnet encoder
        input_features = [
            audio_feature(
                folder=audio_dest_folder,
            ),
            text_feature(encoder='embed', min_len=1),
            numerical_feature(normalization='zscore')
        ]
        output_features = [
            category_feature(vocab_size=2),
            numerical_feature()
        ]
    
        rel_path = generate_data(input_features, output_features, csv_filename)
        model, output_dir = train_model(input_features, output_features,
                                        data_csv=rel_path)
    
        app = server(model)
        client = TestClient(app)
        response = client.get('/')
        assert response.status_code == 200
    
        response = client.post('/predict')
        # expect the HTTP 400 error code for this situation
        assert response.status_code == 400
        assert response.json() == ALL_FEATURES_PRESENT_ERROR
    
        data_df = read_csv(rel_path)
    
        if single_record:
            # Single record prediction
            first_entry = data_df.T.to_dict()[0]
            data, files = convert_to_form(first_entry)
            server_response = client.post('/predict', data=data, files=files)
>           assert server_response.status_code == 200
E           assert 500 == 200

../../tests/integration_tests/test_server.py:233: AssertionError

Environment (please complete the following information):

  • OS: Ludwig Docker container
  • Python version: 3.6.9
  • Ludwig version: 0.4-dev0

Additional context
As a short-term workaround, I modified the denominator to be max(1, count - 1).
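A minimal sketch of that workaround, assuming the same sum1/sum2/count inputs as the expression quoted earlier. The names calculate_var_workaround and var_from_values are hypothetical helpers for illustration, not Ludwig functions. For a single value the numerator is x*x - x*x = 0, so the clamped version returns 0.0. For two or more values it still agrees with the standard sample variance.

```python
import statistics


def calculate_var_workaround(sum1, sum2, count):
    # Short-term workaround: clamp the denominator to at least 1 so that
    # a single value (count == 1) no longer divides by zero.
    return (sum2 - ((sum1 * sum1) / float(count))) / float(max(1, count - 1))


def var_from_values(values):
    # Hypothetical helper: build the running sums from raw values.
    sum1 = sum(values)
    sum2 = sum(v * v for v in values)
    return calculate_var_workaround(sum1, sum2, len(values))


# A single record now yields 0.0 instead of raising.
print(var_from_values([2.0]))

# For two or more values the result still matches the sample variance.
print(var_from_values([1.0, 2.0, 3.0]), statistics.variance([1.0, 2.0, 3.0]))
```

The clamp only covers count == 1. An empty input (count == 0) would still fail in the first division, sum1 * sum1 / count. Returning 0.0 for one value is also a convention, since the sample variance of a single observation is not defined.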

jimthompson5802 added the bug (Something isn't working) label on May 20, 2021
jimthompson5802 added a commit to jimthompson5802/ludwig that referenced this issue May 20, 2021
Why: To be addressed as part of long-term
update to the audio feature.
w4nderlust pushed a commit that referenced this issue Sep 26, 2021
…1326)

resolves the zerodivisionerror when calculating sample
variance in a batch of 1 audio input record.
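The root cause is that the unbiased sample variance (dividing by count - 1) is undefined for a single value. Python's standard statistics module draws the same distinction: statistics.variance needs at least two data points and raises StatisticsError otherwise, while statistics.pvariance (population variance, dividing by count) accepts one. The sketch below shows one possible guarded variant. safe_sample_var is a hypothetical name, and this is an illustration only: the exact change merged in #1326 is not shown in this issue.

```python
import statistics


def safe_sample_var(sum1, sum2, count):
    # Hypothetical guarded variant: report 0.0 for fewer than two values
    # instead of dividing by zero (also covers count == 0).
    if count < 2:
        return 0.0
    return (sum2 - ((sum1 * sum1) / float(count))) / float(count - 1)


single = [2.0]

# The standard library refuses sample variance for a single data point...
try:
    statistics.variance(single)
except statistics.StatisticsError as exc:
    print("variance:", exc)

# ...but population variance is defined for one data point.
print("pvariance:", statistics.pvariance(single))

print("safe_sample_var:", safe_sample_var(sum(single), sum(v * v for v in single), len(single)))
```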