Speed up generation of (some) high cardinality data #286

Merged
merged 3 commits into from
May 6, 2024

cavokz
Collaborator

@cavokz cavokz commented May 1, 2024

No description provided.

@cavokz cavokz changed the title Avoid superfluous recreation of ip_address values Speed up generation of high cardinality data May 1, 2024
@cavokz cavokz force-pushed the speed-up-high-cardinality-generation branch from 1f81774 to 1426c0f Compare May 1, 2024 16:15
cavokz added 3 commits May 3, 2024 15:25
Before:
         347454004 function calls in 160.129 seconds

   Ordered by: cumulative time
   List reduced from 79 to 16 due to restriction <0.2>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100001    0.140    0.000  160.129    0.002 events_emitter.py:201(<genexpr>)
   100000    0.232    0.000  159.730    0.002 events_emitter.py:114(events_from_branch)
   200000    0.078    0.000  159.242    0.001 constraints.py:165(<genexpr>)
   100000    0.182    0.000  159.164    0.002 constraints.py:131(solve)
   300000    0.283    0.000  158.919    0.001 __init__.py:153(solve)
   200000    0.293    0.000  158.595    0.001 __init__.py:231(solve_field)
   200000    3.165    0.000  156.982    0.001 __init__.py:204(__call__)
     5000    0.216    0.000  150.487    0.030 type_ip.py:111(solve)
     5000   10.088    0.002  150.127    0.030 type_ip.py:113(<setcomp>)
 12502500    6.865    0.000  128.083    0.000 ipaddress.py:28(ip_address)
 12502500   14.778    0.000  121.218    0.000 ipaddress.py:1282(__init__)
 12497500   13.032    0.000  102.872    0.000 ipaddress.py:1183(_ip_int_from_string)
 12497500   14.944    0.000   84.416    0.000 {built-in method from_bytes}
 49990000   54.192    0.000   69.471    0.000 ipaddress.py:1209(_parse_octet)
 12502500    7.979    0.000   11.975    0.000 ipaddress.py:612(__hash__)
 63587501    6.363    0.000    6.363    0.000 {built-in method builtins.len}

After:
         10021767 function calls in 10.091 seconds

   Ordered by: cumulative time
   List reduced from 74 to 15 due to restriction <0.2>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100001    0.128    0.000   10.090    0.000 events_emitter.py:201(<genexpr>)
   100000    0.218    0.000    9.715    0.000 events_emitter.py:114(events_from_branch)
   200000    0.077    0.000    9.258    0.000 constraints.py:165(<genexpr>)
   100000    0.181    0.000    9.181    0.000 constraints.py:131(solve)
   300000    0.267    0.000    8.942    0.000 __init__.py:153(solve)
   200000    0.280    0.000    8.634    0.000 __init__.py:231(solve_field)
   200000    2.651    0.000    7.079    0.000 __init__.py:204(__call__)
   100000    0.187    0.000    2.767    0.000 type_keyword.py:91(solve)
   100000    0.171    0.000    2.536    0.000 type_keyword.py:94(<listcomp>)
   100000    0.151    0.000    1.363    0.000 solution_space.py:225(generate)
   200000    0.331    0.000    1.275    0.000 __init__.py:53(emit_field)
     5000    0.155    0.000    1.240    0.000 type_ip.py:111(solve)
   100000    0.066    0.000    1.026    0.000 solution_space.py:228(<listcomp>)
   100000    0.108    0.000    1.003    0.000 solution_space.py:96(__sub__)
     5000    0.981    0.000    0.981    0.000 type_ip.py:113(<setcomp>)
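
The "Before" listing above is dominated by roughly 12.5 million calls to `ipaddress.ip_address` made from a set comprehension in `type_ip.py`. The actual change is not shown in this conversation, so what follows is only a minimal sketch of the general pattern hinted at by the original title ("Avoid superfluous recreation of ip_address values"): parse each address once and keep the parsed objects, instead of rebuilding them from strings on every call. The names `exclude_reparsing`, `ParsedHistory` and the sample data are hypothetical and do not come from the project.

```python
import ipaddress

# Hypothetical data: previously generated addresses, kept as strings.
history = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]


def exclude_reparsing(values):
    # Parses every string again on each call, so the cost of a single call
    # grows with the number of values generated so far.
    return {ipaddress.ip_address(v) for v in values}


class ParsedHistory:
    """Stores parsed address objects so each string is parsed only once."""

    def __init__(self):
        self._parsed = set()

    def add(self, value):
        # The only place where a string is turned into an address object.
        self._parsed.add(ipaddress.ip_address(value))

    def exclude(self):
        # No parsing here: the already-built objects are reused.
        return self._parsed


cached = ParsedHistory()
for value in history:
    cached.add(value)

# Both approaches describe the same set of addresses.
same = exclude_reparsing(history) == cached.exclude()
```

Under this assumption the per-call work no longer includes string parsing, which is where `_ip_int_from_string` and `_parse_octet` spend their time in the "Before" listing.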
Before:
         11415936 function calls in 264.498 seconds

   Ordered by: cumulative time
   List reduced from 74 to 15 due to restriction <0.2>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100001    0.203    0.000  264.498    0.003 events_emitter.py:201(<genexpr>)
   100000    0.332    0.000  263.981    0.003 events_emitter.py:114(events_from_branch)
   200000    0.108    0.000  263.266    0.001 constraints.py:165(<genexpr>)
   100000    0.265    0.000  263.157    0.003 constraints.py:131(solve)
   300000    0.396    0.000  262.814    0.001 __init__.py:153(solve)
   200000    0.416    0.000  262.364    0.001 __init__.py:231(solve_field)
   200000   57.007    0.000  260.136    0.001 __init__.py:204(__call__)
    50000   21.841    0.000  198.809    0.004 type_ip.py:111(solve)
    50000  175.281    0.004  175.281    0.004 type_ip.py:113(<setcomp>)
   100000    0.270    0.000    3.928    0.000 type_keyword.py:91(solve)
   100000    0.279    0.000    3.603    0.000 type_keyword.py:94(<listcomp>)
   100000    0.241    0.000    1.851    0.000 solution_space.py:225(generate)
   200000    0.508    0.000    1.813    0.000 __init__.py:53(emit_field)
   100000    0.169    0.000    1.472    0.000 solution_space.py:96(__sub__)
   100000    0.094    0.000    1.357    0.000 solution_space.py:228(<listcomp>)

After:
         11265990 function calls in 111.346 seconds

   Ordered by: cumulative time
   List reduced from 73 to 15 due to restriction <0.2>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100001    0.157    0.000  111.346    0.001 events_emitter.py:201(<genexpr>)
   100000    0.267    0.000  110.920    0.001 events_emitter.py:114(events_from_branch)
   200000    0.084    0.000  110.352    0.001 constraints.py:165(<genexpr>)
   100000    0.210    0.000  110.268    0.001 constraints.py:131(solve)
   300000    0.316    0.000  109.997    0.000 __init__.py:153(solve)
   200000    0.334    0.000  109.638    0.001 __init__.py:230(solve_field)
   200000   87.999    0.000  107.824    0.001 __init__.py:204(__call__)
    50000   15.159    0.000   16.250    0.000 type_ip.py:111(solve)
   100000    0.174    0.000    3.165    0.000 type_keyword.py:91(solve)
   100000    0.222    0.000    2.960    0.000 type_keyword.py:94(<listcomp>)
   100000    0.140    0.000    1.819    0.000 solution_space.py:96(__sub__)
   200000    0.401    0.000    1.480    0.000 __init__.py:53(emit_field)
   100000    0.210    0.000    1.227    0.000 solution_space.py:163(__isub__)
   100000    0.289    0.000    0.963    0.000 solution_space.py:33(__init__)
   200000    0.383    0.000    0.955    0.000 __init__.py:273(split_path)
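With this change generation gets a bit slower (138.156 s versus 111.346 s in
the profiles below), but the output is reproducible again: sets do not
guarantee an iteration order, whereas dictionaries preserve insertion order.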
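
The ordering difference mentioned above can be illustrated with a minimal sketch. It is not the project's code: `unique_stable` and the sample values are hypothetical. It relies only on the language guarantee (Python 3.7 and later) that dictionaries iterate in insertion order, while set iteration order is unspecified and, for strings, can change between interpreter runs because of hash randomization.

```python
def unique_stable(values):
    # dict keys are unique and keep insertion order, so a dict can act
    # as an ordered set: the first occurrence of each value wins.
    return list(dict.fromkeys(values))


values = ["10.0.0.3", "10.0.0.1", "10.0.0.3", "10.0.0.2"]

stable = unique_stable(values)
print(stable)  # ['10.0.0.3', '10.0.0.1', '10.0.0.2']

# A set holds the same members, but its iteration order is not guaranteed,
# so anything derived from that order may differ from run to run.
unordered = set(values)
```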

Before:
         11265990 function calls in 111.346 seconds

   Ordered by: cumulative time
   List reduced from 73 to 15 due to restriction <0.2>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100001    0.157    0.000  111.346    0.001 events_emitter.py:201(<genexpr>)
   100000    0.267    0.000  110.920    0.001 events_emitter.py:114(events_from_branch)
   200000    0.084    0.000  110.352    0.001 constraints.py:165(<genexpr>)
   100000    0.210    0.000  110.268    0.001 constraints.py:131(solve)
   300000    0.316    0.000  109.997    0.000 __init__.py:153(solve)
   200000    0.334    0.000  109.638    0.001 __init__.py:230(solve_field)
   200000   87.999    0.000  107.824    0.001 __init__.py:204(__call__)
    50000   15.159    0.000   16.250    0.000 type_ip.py:111(solve)
   100000    0.174    0.000    3.165    0.000 type_keyword.py:91(solve)
   100000    0.222    0.000    2.960    0.000 type_keyword.py:94(<listcomp>)
   100000    0.140    0.000    1.819    0.000 solution_space.py:96(__sub__)
   200000    0.401    0.000    1.480    0.000 __init__.py:53(emit_field)
   100000    0.210    0.000    1.227    0.000 solution_space.py:163(__isub__)
   100000    0.289    0.000    0.963    0.000 solution_space.py:33(__init__)
   200000    0.383    0.000    0.955    0.000 __init__.py:273(split_path)

After:
         11214853 function calls in 138.156 seconds

   Ordered by: cumulative time
   List reduced from 72 to 14 due to restriction <0.2>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100001    0.210    0.000  138.156    0.001 events_emitter.py:201(<genexpr>)
   100000    0.317    0.000  137.655    0.001 events_emitter.py:114(events_from_branch)
   200000    0.107    0.000  136.985    0.001 constraints.py:165(<genexpr>)
   100000    0.239    0.000  136.879    0.001 constraints.py:131(solve)
   300000    0.367    0.000  136.565    0.000 __init__.py:153(solve)
   200000    0.381    0.000  136.151    0.001 __init__.py:230(solve_field)
   200000   61.247    0.000  134.103    0.001 __init__.py:204(__call__)
    50000   67.569    0.001   68.958    0.001 type_ip.py:111(solve)
   100000    0.190    0.000    3.515    0.000 type_keyword.py:91(solve)
   100000    0.254    0.000    3.294    0.000 type_keyword.py:94(<listcomp>)
   200000    0.451    0.000    1.668    0.000 __init__.py:53(emit_field)
   100000    0.224    0.000    1.634    0.000 solution_space.py:225(generate)
   100000    0.155    0.000    1.406    0.000 solution_space.py:96(__sub__)
   100000    0.089    0.000    1.166    0.000 solution_space.py:228(<listcomp>)
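
Listings in the format shown above ("Ordered by: cumulative time", "List reduced ... due to restriction <0.2>") are what the standard library's `cProfile` and `pstats` modules print when the statistics are sorted by cumulative time and restricted to the top 20% of entries. The exact command used for this pull request is not given, so the following is only a sketch of one way to obtain comparable output; `workload` is a hypothetical stand-in for the event generation being measured.

```python
import cProfile
import io
import pstats


def workload():
    # Hypothetical stand-in for the code under measurement.
    return sum(len(str(i)) for i in range(10000))


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time and keep only the top 20% of the entries,
# which produces the "due to restriction <0.2>" line in the header.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(0.2)

report = stream.getvalue()
print(report)
```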
@cavokz cavokz force-pushed the speed-up-high-cardinality-generation branch 4 times, most recently from c5d0245 to f8b9023 Compare May 6, 2024 09:54
@cavokz cavokz changed the title Speed up generation of high cardinality data Speed up generation of (some) high cardinality data May 6, 2024
@cavokz cavokz merged commit 967f945 into elastic:main May 6, 2024
71 checks passed
@cavokz cavokz deleted the speed-up-high-cardinality-generation branch May 6, 2024 11:10