misrasaurabh1
Contributor

Change Summary

📄 to_snake() in pydantic/alias_generators.py

📈 Performance improved by 27% (about a 1.27x speedup)

⏱️ Runtime went down from 472 microseconds to 371 microseconds
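
The 27% figure follows from the two runtimes if "performance" is read as throughput (old runtime divided by new runtime, minus one). A minimal sketch of that arithmetic, with hypothetical helper names that are not part of the PR or of any tool mentioned here:

```python
def speedup_percent(old_runtime: float, new_runtime: float) -> float:
    """Throughput gain in percent: (old / new - 1) * 100."""
    return (old_runtime / new_runtime - 1) * 100


def runtime_reduction_percent(old_runtime: float, new_runtime: float) -> float:
    """Share of the original runtime that was saved, in percent."""
    return (old_runtime - new_runtime) / old_runtime * 100


old_us, new_us = 472, 371  # microseconds, as reported above
print(round(speedup_percent(old_us, new_us)))            # 27
print(round(runtime_reduction_percent(old_us, new_us)))  # 21
```

The two numbers answer different questions: the function runs about 27% faster, while each call takes about 21% less time.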

Explanation and details

To optimize the function, this change reduces the number of calls to re.sub and string replacement operations by performing all transformations in a single pass.

Explanation of Optimizations.

  1. Single Regular Expression: re.sub now takes one compiled regular expression that covers every case the transformation handles, including hyphens.
  2. Single Pass Replacement: Because the conditions are combined into one pattern, the input string is scanned once instead of once per rule.

This approach reduces the overhead of multiple function calls and iterations over the input string, making the function more efficient.
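
The idea can be sketched as follows. This is an illustration written for this description, not the code merged in the PR: both function names, the pattern, and the multi-pass baseline are assumptions chosen to show the technique.

```python
import re


# Hypothetical multi-pass baseline: one re.sub call per rule, then a
# str.replace for hyphens. Written for illustration only.
def to_snake_multi_pass(name: str) -> str:
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    s = re.sub(r"([a-z])([A-Z])", r"\1_\2", s)
    s = re.sub(r"([0-9])([A-Z])", r"\1_\2", s)
    s = re.sub(r"([a-z])([0-9])", r"\1_\2", s)
    return s.replace("-", "_").lower()


# Hypothetical single-pass variant: zero-width lookarounds mark each word
# boundary, and a literal hyphen is matched by the same pattern.
_BOUNDARY = re.compile(
    r"(?<=[A-Z])(?=[A-Z][a-z])"  # acronym then capitalized word: HTTP|Response
    r"|(?<=[a-z])(?=[A-Z])"      # lowercase then uppercase: camel|Case
    r"|(?<=[0-9])(?=[A-Z])"      # digit then uppercase: 123|Case
    r"|(?<=[a-z])(?=[0-9])"      # lowercase then digit: case|123
    r"|-"                        # literal hyphen (kebab-case)
)


def to_snake_single_pass(name: str) -> str:
    # One substitution pass inserts or replaces every separator.
    return _BOUNDARY.sub("_", name).lower()


samples = ["camelCase", "HTTPResponse", "kebab-case-123", "Pascal123Case", ""]
for sample in samples:
    assert to_snake_single_pass(sample) == to_snake_multi_pass(sample)

print(to_snake_single_pass("HTTPResponse"))  # http_response
```

The two functions agree on the samples above; that is a spot check, not a proof of equivalence, so any real replacement would still need the full test suite.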

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

✅ 180 Passed − ⚙️ Existing Unit Tests

- test_utils.py

✅ 60 Passed − 🌀 Generated Regression Tests

# imports
import re

import pytest  # used for our unit tests
from pydantic.alias_generators import to_snake


# unit tests
def test_basic_conversions():
    # Simple camelCase
    assert to_snake("camelCase") == "camel_case"
    # Simple PascalCase
    assert to_snake("PascalCase") == "pascal_case"
    # Simple kebab-case
    assert to_snake("kebab-case") == "kebab_case"

def test_mixed_case_with_numbers():
    # camelCase with numbers
    assert to_snake("camelCase123") == "camel_case_123"
    # PascalCase with numbers
    assert to_snake("PascalCase123") == "pascal_case_123"
    # kebab-case with numbers
    assert to_snake("kebab-case-123") == "kebab_case_123"

def test_all_uppercase_acronyms():
    # PascalCase with acronyms
    assert to_snake("HTTPResponse") == "http_response"
    # camelCase with acronyms
    assert to_snake("httpResponse") == "http_response"

def test_multiple_consecutive_uppercase_letters():
    # PascalCase with multiple uppercase sequences
    assert to_snake("XMLHTTPRequest") == "xml_http_request"
    # camelCase with multiple uppercase sequences
    assert to_snake("xmlHTTPRequest") == "xml_http_request"

def test_edge_cases():
    # Single character input
    assert to_snake("a") == "a"
    assert to_snake("A") == "a"
    # All lowercase input
    assert to_snake("lowercase") == "lowercase"
    # All uppercase input
    assert to_snake("UPPERCASE") == "uppercase"
    # Empty string
    assert to_snake("") == ""
    # String with no letters
    assert to_snake("123") == "123"

def test_special_characters():
    # Input with special characters (should remain unchanged except for hyphens)
    assert to_snake("special!@#") == "special!@#"
    assert to_snake("kebab-case!@#") == "kebab_case!@#"

def test_hyphenated_words():
    # Multiple hyphens
    assert to_snake("multi-hyphen-case") == "multi_hyphen_case"
    # Leading and trailing hyphens
    assert to_snake("-leading-hyphen") == "_leading_hyphen"
    assert to_snake("trailing-hyphen-") == "trailing_hyphen_"

def test_mixed_case_with_special_characters():
    # PascalCase with special characters
    assert to_snake("PascalCase!@#") == "pascal_case!@#"
    # camelCase with special characters
    assert to_snake("camelCase!@#") == "camel_case!@#"
    # kebab-case with special characters
    assert to_snake("kebab-case!@#") == "kebab_case!@#"

def test_large_scale_test_cases():
    # Very long camelCase string
    assert to_snake("aVeryLongCamelCaseStringWith123Numbers") == "a_very_long_camel_case_string_with_123_numbers"
    # Very long PascalCase string
    assert to_snake("AVeryLongPascalCaseStringWith123Numbers") == "a_very_long_pascal_case_string_with_123_numbers"
    # Very long kebab-case string
    assert to_snake("a-very-long-kebab-case-string-with-123-numbers") == "a_very_long_kebab_case_string_with_123_numbers"

def test_complex_mixed_cases():
    # Complex mixed case with numbers and special characters
    assert to_snake("ComplexMixedCase123!@#") == "complex_mixed_case_123!@#"
    assert to_snake("complexMixedCase123!@#") == "complex_mixed_case_123!@#"
    assert to_snake("complex-mixed-case-123!@#") == "complex_mixed_case_123!@#"

def test_non_standard_characters():
    # Input with Unicode characters
    assert to_snake("PascalCaseÜnicode") == "pascal_case_ünicode"
    assert to_snake("camelCaseÜnicode") == "camel_case_ünicode"
    assert to_snake("kebab-case-ünicode") == "kebab_case_ünicode"

def test_mixed_separators():
    # Input with mixed separators (spaces, underscores, hyphens);
    # hyphens become underscores, spaces are left unchanged
    assert to_snake("Pascal_Case-With Spaces") == "pascal_case_with spaces"
    assert to_snake("camel_Case-With Spaces") == "camel_case_with spaces"
    assert to_snake("kebab-case_with Spaces") == "kebab_case_with spaces"

def test_leading_and_trailing_whitespace():
    # Input with leading and trailing whitespace
    assert to_snake(" PascalCase ") == " pascal_case "
    assert to_snake(" camelCase ") == " camel_case "
    assert to_snake(" kebab-case ") == " kebab_case "

def test_multiple_consecutive_separators():
    # Input with multiple consecutive separators
    assert to_snake("Pascal--Case") == "pascal__case"
    assert to_snake("camel--Case") == "camel__case"
    assert to_snake("kebab--case") == "kebab__case"

def test_empty_substrings():
    # Input with empty substrings between separators
    assert to_snake("Pascal--Case") == "pascal__case"
    assert to_snake("camel--Case") == "camel__case"
    assert to_snake("kebab--case") == "kebab__case"

def test_non_standard_casing():
    # Input with non-standard casing patterns
    assert to_snake("PascalCASE") == "pascal_case"
    assert to_snake("camelCASE") == "camel_case"
    assert to_snake("kebab-CASE") == "kebab_case"

def test_numbers_and_special_characters_mixed():
    # Input with numbers and special characters mixed within words
    assert to_snake("Pascal123Case!@#") == "pascal_123_case!@#"
    assert to_snake("camel123Case!@#") == "camel_123_case!@#"
    assert to_snake("kebab-123-case!@#") == "kebab_123_case!@#"

def test_single_character_separators():
    # Input with single character separators
    assert to_snake("Pascal-Case") == "pascal_case"
    assert to_snake("camel-Case") == "camel_case"
    assert to_snake("kebab-Case") == "kebab_case"

def test_repeated_characters():
    # Input with repeated characters
    assert to_snake("PascalCCase") == "pascal_c_case"
    assert to_snake("camelCCase") == "camel_c_case"
    assert to_snake("kebab-CCase") == "kebab_c_case"

def test_long_sequences_of_uppercase_letters():
    # Input with long sequences of uppercase letters
    assert to_snake("PascalCASEWithLongSequence") == "pascal_case_with_long_sequence"
    assert to_snake("camelCASEWithLongSequence") == "camel_case_with_long_sequence"
    assert to_snake("kebab-CASE-With-Long-Sequence") == "kebab_case_with_long_sequence"

🔘 (none found) − ⏪ Replay Tests

This optimization was automatically discovered by codeflash.ai

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

codeflash-ai bot and others added 3 commits June 22, 2024 00:37
@github-actions github-actions bot added the relnotes-fix Used for bugfixes. label Jun 25, 2024
codspeed-hq bot commented Jun 25, 2024

CodSpeed Performance Report

Merging #9747 will not alter performance

Comparing misrasaurabh1:codeflash/optimize-to_snake-2024-06-22T00.37.06 (cb8628e) with main (df7340d)

Summary

✅ 13 untouched benchmarks

@sydney-runkle sydney-runkle added relnotes-performance Used for performance improvements. and removed relnotes-fix Used for bugfixes. labels Jun 25, 2024
@sydney-runkle sydney-runkle enabled auto-merge (squash) June 25, 2024 13:10
@sydney-runkle sydney-runkle merged commit 5694da3 into pydantic:main Jun 25, 2024