Skip to content

Conversation

misrasaurabh1
Copy link
Contributor

Change Summary

📄 _display_error_loc() in pydantic/v1/error_wrappers.py

📈 Performance improved by 25% (0.25x faster)

⏱️ Runtime went down from 81.6 milliseconds to 65.2 milliseconds

Explanation and details

To optimize this code for better performance, we can avoid the generator expression in the join method by using a list comprehension directly. This small change reduces overhead.

string.join can't join generators because it does not know the length upfront, so it converts it to list internally and it ends up being slower than joining list comprehensions directly.

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 13 Passed − 🌀 Generated Regression Tests

(click to show generated tests)
# imports
import pytest  # used for our unit tests
from pydantic.v1.error_wrappers import _display_error_loc

# unit tests

def test_basic_single_element_location():
    # Single element location
    error = {'loc': ['field1']}
    assert _display_error_loc(error) == 'field1'

def test_basic_multiple_elements_location():
    # Multiple elements location
    error = {'loc': ['field1', 'subfield2']}
    assert _display_error_loc(error) == 'field1 -> subfield2'

    error = {'loc': ['field1', 'subfield2', 'element3']}
    assert _display_error_loc(error) == 'field1 -> subfield2 -> element3'

def test_edge_empty_location_list():
    # Empty location list
    error = {'loc': []}
    assert _display_error_loc(error) == ''

def test_edge_non_string_elements():
    # Non-string elements in location list
    error = {'loc': [1, 2, 3]}
    assert _display_error_loc(error) == '1 -> 2 -> 3'

    error = {'loc': [None, True, 42.0]}
    assert _display_error_loc(error) == 'None -> True -> 42.0'

def test_error_handling_missing_loc_key():
    # Missing 'loc' key
    error = {'not_loc': ['field1']}
    with pytest.raises(KeyError):
        _display_error_loc(error)

def test_error_handling_non_iterable_loc_value():
    # Non-iterable 'loc' value
    error = {'loc': 'field1'}
    with pytest.raises(TypeError):
        _display_error_loc(error)

def test_performance_large_location_list():
    # Large location list
    error = {'loc': ['field' + str(i) for i in range(1000)]}
    expected_output = ' -> '.join('field' + str(i) for i in range(1000))
    assert _display_error_loc(error) == expected_output

def test_mixed_data_types_in_location_list():
    # Mixed data types in location list
    error = {'loc': ['field1', 2, 'subfield3', None]}
    assert _display_error_loc(error) == 'field1 -> 2 -> subfield3 -> None'

def test_special_characters_in_elements():
    # Special characters in elements
    error = {'loc': ['field1, 'sub@field2', '#element3']}
    assert _display_error_loc(error) == 'field1$ -> sub@field2 -> #element3'

def test_nested_structures():
    # Nested lists
    error = {'loc': [['field1', 'subfield2'], 'element3']}
    with pytest.raises(TypeError):
        _display_error_loc(error)

def test_unicode_characters():
    # Unicode characters
    error = {'loc': ['字段1', '子字段2']}
    assert _display_error_loc(error) == '字段1 -> 子字段2'

def test_immutable_types():
    # Immutable types (tuple)
    error = {'loc': ('field1', 'subfield2')}
    assert _display_error_loc(error) == 'field1 -> subfield2'

def test_large_scale_very_large_location_list():
    # Very large location list for performance and scalability
    error = {'loc': ['field' + str(i) for i in range(1000000)]}
    expected_output = ' -> '.join('field' + str(i) for i in range(1000000))
    assert _display_error_loc(error) == expected_output

🔘 (none found) − ⏪ Replay Tests

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

codeflash-ai bot and others added 2 commits June 6, 2024 03:21
Certainly! To optimize this code for better performance, we can avoid the generator expression in the `join` method by using a list comprehension directly. This small change reduces overhead. Here is the revised version.



In this specific snippet, the optimization gain would be minimal since list comprehensions and generator expressions are fast in general. However, it’s typically good practice to use list comprehensions for concrete sequences, and it ensures better readability and potentially slight performance improvements.
@github-actions github-actions bot added the relnotes-fix Used for bugfixes. label Jun 12, 2024
Copy link

codspeed-hq bot commented Jun 12, 2024

CodSpeed Performance Report

Merging #9653 will not alter performance

Comparing misrasaurabh1:codeflash/optimize-_display_error_loc-2024-06-06T03.21.31 (00aff67) with main (9dac684)

Summary

✅ 13 untouched benchmarks

@Olegt0rr
Copy link

@misrasaurabh1, how 25% was measured? CodSpeed prints untouched benchmarks only

@misrasaurabh1
Copy link
Contributor Author

Hi @Olegt0rr , the 25% gain is the speedup for only the _display_error_loc() function. It was measured on the inputs/tests we generated that are attached in the description of the PR. I also manually verified the gains with this quick benchmark I created.
image

Copy link
Contributor

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense that a list comprehension is going to be faster here! Increased space usage doesn't really matter in this case.

@sydney-runkle sydney-runkle added relnotes-performance Used for performance improvements. and removed relnotes-fix Used for bugfixes. labels Jun 18, 2024
@sydney-runkle sydney-runkle enabled auto-merge (squash) June 18, 2024 17:51
@sydney-runkle sydney-runkle merged commit 478ed07 into pydantic:main Jun 18, 2024
@sydney-runkle
Copy link
Contributor

Oops, this should have gone in v1.10.X-fixes.

@misrasaurabh1, let's skip opening any PRs against v1 files - we're not looking to spend time on performance improvements there.

@sydney-runkle
Copy link
Contributor

I already released our v1.10.17 patch, so this won't be included. Happy to consider other PRs that focus on V2 stuff though :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes-performance Used for performance improvements.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants