Speed up generation of (some) high cardinality data #286
Merged: cavokz merged 3 commits into elastic:main from cavokz:speed-up-high-cardinality-generation on May 6, 2024
cavokz changed the title from "Avoid superfluous recreation of ip_address values" to "Speed up generation of high cardinality data" on May 1, 2024
cavokz force-pushed the speed-up-high-cardinality-generation branch from 1f81774 to 1426c0f on May 1, 2024 16:15
Before:

    347454004 function calls in 160.129 seconds

    Ordered by: cumulative time
    List reduced from 79 to 16 due to restriction <0.2>

    ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
    100001      0.140    0.000  160.129    0.002  events_emitter.py:201(<genexpr>)
    100000      0.232    0.000  159.730    0.002  events_emitter.py:114(events_from_branch)
    200000      0.078    0.000  159.242    0.001  constraints.py:165(<genexpr>)
    100000      0.182    0.000  159.164    0.002  constraints.py:131(solve)
    300000      0.283    0.000  158.919    0.001  __init__.py:153(solve)
    200000      0.293    0.000  158.595    0.001  __init__.py:231(solve_field)
    200000      3.165    0.000  156.982    0.001  __init__.py:204(__call__)
    5000        0.216    0.000  150.487    0.030  type_ip.py:111(solve)
    5000       10.088    0.002  150.127    0.030  type_ip.py:113(<setcomp>)
    12502500    6.865    0.000  128.083    0.000  ipaddress.py:28(ip_address)
    12502500   14.778    0.000  121.218    0.000  ipaddress.py:1282(__init__)
    12497500   13.032    0.000  102.872    0.000  ipaddress.py:1183(_ip_int_from_string)
    12497500   14.944    0.000   84.416    0.000  {built-in method from_bytes}
    49990000   54.192    0.000   69.471    0.000  ipaddress.py:1209(_parse_octet)
    12502500    7.979    0.000   11.975    0.000  ipaddress.py:612(__hash__)
    63587501    6.363    0.000    6.363    0.000  {built-in method builtins.len}

After:

    10021767 function calls in 10.091 seconds

    Ordered by: cumulative time
    List reduced from 74 to 15 due to restriction <0.2>

    ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
    100001      0.128    0.000   10.090    0.000  events_emitter.py:201(<genexpr>)
    100000      0.218    0.000    9.715    0.000  events_emitter.py:114(events_from_branch)
    200000      0.077    0.000    9.258    0.000  constraints.py:165(<genexpr>)
    100000      0.181    0.000    9.181    0.000  constraints.py:131(solve)
    300000      0.267    0.000    8.942    0.000  __init__.py:153(solve)
    200000      0.280    0.000    8.634    0.000  __init__.py:231(solve_field)
    200000      2.651    0.000    7.079    0.000  __init__.py:204(__call__)
    100000      0.187    0.000    2.767    0.000  type_keyword.py:91(solve)
    100000      0.171    0.000    2.536    0.000  type_keyword.py:94(<listcomp>)
    100000      0.151    0.000    1.363    0.000  solution_space.py:225(generate)
    200000      0.331    0.000    1.275    0.000  __init__.py:53(emit_field)
    5000        0.155    0.000    1.240    0.000  type_ip.py:111(solve)
    100000      0.066    0.000    1.026    0.000  solution_space.py:228(<listcomp>)
    100000      0.108    0.000    1.003    0.000  solution_space.py:96(__sub__)
    5000        0.981    0.000    0.981    0.000  type_ip.py:113(<setcomp>)
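In the first Before profile, `ipaddress.ip_address` is called 12,502,500 times across 5,000 calls to `type_ip.py:111(solve)`. That equals 5000 × 5001 / 2, which is consistent with each call re-parsing every previously generated value. The sketch below illustrates that pattern and one way to avoid it by keeping the parsed address objects between calls. It is not the project's code: the class names, the network, and the seed are hypothetical, chosen only for illustration.

```python
import ipaddress
import random


def _pick(network, rng, used):
    """Draw random addresses from `network` until one is not in `used`."""
    while True:
        candidate = network[rng.randrange(network.num_addresses)]
        if candidate not in used:
            return candidate


class ReparsingSolver:
    """Hypothetical: re-parses every earlier value on each call."""

    def __init__(self, network, seed=0):
        self.network = ipaddress.ip_network(network)
        self.rng = random.Random(seed)
        self.history = []  # earlier values, kept as strings

    def solve(self):
        # One ip_address() call per earlier value: quadratic over a run.
        used = {ipaddress.ip_address(v) for v in self.history}
        value = _pick(self.network, self.rng, used)
        self.history.append(str(value))
        return str(value)


class CachingSolver:
    """Hypothetical: keeps the parsed address objects between calls."""

    def __init__(self, network, seed=0):
        self.network = ipaddress.ip_network(network)
        self.rng = random.Random(seed)
        self.used = set()  # parsed once, reused on every call

    def solve(self):
        value = _pick(self.network, self.rng, self.used)
        self.used.add(value)
        return str(value)
```

With the same seed both variants draw the same sequence of addresses; only the amount of parsing per call differs.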
Before:

    11415936 function calls in 264.498 seconds

    Ordered by: cumulative time
    List reduced from 74 to 15 due to restriction <0.2>

    ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
    100001      0.203    0.000  264.498    0.003  events_emitter.py:201(<genexpr>)
    100000      0.332    0.000  263.981    0.003  events_emitter.py:114(events_from_branch)
    200000      0.108    0.000  263.266    0.001  constraints.py:165(<genexpr>)
    100000      0.265    0.000  263.157    0.003  constraints.py:131(solve)
    300000      0.396    0.000  262.814    0.001  __init__.py:153(solve)
    200000      0.416    0.000  262.364    0.001  __init__.py:231(solve_field)
    200000     57.007    0.000  260.136    0.001  __init__.py:204(__call__)
    50000      21.841    0.000  198.809    0.004  type_ip.py:111(solve)
    50000     175.281    0.004  175.281    0.004  type_ip.py:113(<setcomp>)
    100000      0.270    0.000    3.928    0.000  type_keyword.py:91(solve)
    100000      0.279    0.000    3.603    0.000  type_keyword.py:94(<listcomp>)
    100000      0.241    0.000    1.851    0.000  solution_space.py:225(generate)
    200000      0.508    0.000    1.813    0.000  __init__.py:53(emit_field)
    100000      0.169    0.000    1.472    0.000  solution_space.py:96(__sub__)
    100000      0.094    0.000    1.357    0.000  solution_space.py:228(<listcomp>)

After:

    11265990 function calls in 111.346 seconds

    Ordered by: cumulative time
    List reduced from 73 to 15 due to restriction <0.2>

    ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
    100001      0.157    0.000  111.346    0.001  events_emitter.py:201(<genexpr>)
    100000      0.267    0.000  110.920    0.001  events_emitter.py:114(events_from_branch)
    200000      0.084    0.000  110.352    0.001  constraints.py:165(<genexpr>)
    100000      0.210    0.000  110.268    0.001  constraints.py:131(solve)
    300000      0.316    0.000  109.997    0.000  __init__.py:153(solve)
    200000      0.334    0.000  109.638    0.001  __init__.py:230(solve_field)
    200000     87.999    0.000  107.824    0.001  __init__.py:204(__call__)
    50000      15.159    0.000   16.250    0.000  type_ip.py:111(solve)
    100000      0.174    0.000    3.165    0.000  type_keyword.py:91(solve)
    100000      0.222    0.000    2.960    0.000  type_keyword.py:94(<listcomp>)
    100000      0.140    0.000    1.819    0.000  solution_space.py:96(__sub__)
    200000      0.401    0.000    1.480    0.000  __init__.py:53(emit_field)
    100000      0.210    0.000    1.227    0.000  solution_space.py:163(__isub__)
    100000      0.289    0.000    0.963    0.000  solution_space.py:33(__init__)
    200000      0.383    0.000    0.955    0.000  __init__.py:273(split_path)
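The listings in this conversation have the shape of reports printed by Python's `cProfile` and `pstats` modules, sorted by cumulative time and restricted to the top 20% of entries (the `<0.2>` restriction). A minimal sketch of how a comparable report can be produced follows. The helper name `profile_top` and the workload are hypothetical; the exact command used for this pull request is not shown in the conversation.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, fraction=0.2, **kwargs):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # A float restriction keeps that fraction of the sorted entries.
    stats.sort_stats("cumulative").print_stats(fraction)
    return out.getvalue()


def workload(n):
    # Hypothetical stand-in for the code being measured.
    return sorted(str(i) for i in range(n))


if __name__ == "__main__":
    print(profile_top(workload, 10000))
```

Comparing the `cumtime` column of the same function before and after a change, as done above for `type_ip.py:111(solve)`, shows where the time moved.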
With this change generation gets a bit slower, but the output is reproducible again (sets have no guaranteed iteration order, whereas dictionaries preserve insertion order).

Before:

    11265990 function calls in 111.346 seconds

    Ordered by: cumulative time
    List reduced from 73 to 15 due to restriction <0.2>

    ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
    100001      0.157    0.000  111.346    0.001  events_emitter.py:201(<genexpr>)
    100000      0.267    0.000  110.920    0.001  events_emitter.py:114(events_from_branch)
    200000      0.084    0.000  110.352    0.001  constraints.py:165(<genexpr>)
    100000      0.210    0.000  110.268    0.001  constraints.py:131(solve)
    300000      0.316    0.000  109.997    0.000  __init__.py:153(solve)
    200000      0.334    0.000  109.638    0.001  __init__.py:230(solve_field)
    200000     87.999    0.000  107.824    0.001  __init__.py:204(__call__)
    50000      15.159    0.000   16.250    0.000  type_ip.py:111(solve)
    100000      0.174    0.000    3.165    0.000  type_keyword.py:91(solve)
    100000      0.222    0.000    2.960    0.000  type_keyword.py:94(<listcomp>)
    100000      0.140    0.000    1.819    0.000  solution_space.py:96(__sub__)
    200000      0.401    0.000    1.480    0.000  __init__.py:53(emit_field)
    100000      0.210    0.000    1.227    0.000  solution_space.py:163(__isub__)
    100000      0.289    0.000    0.963    0.000  solution_space.py:33(__init__)
    200000      0.383    0.000    0.955    0.000  __init__.py:273(split_path)

After:

    11214853 function calls in 138.156 seconds

    Ordered by: cumulative time
    List reduced from 72 to 14 due to restriction <0.2>

    ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
    100001      0.210    0.000  138.156    0.001  events_emitter.py:201(<genexpr>)
    100000      0.317    0.000  137.655    0.001  events_emitter.py:114(events_from_branch)
    200000      0.107    0.000  136.985    0.001  constraints.py:165(<genexpr>)
    100000      0.239    0.000  136.879    0.001  constraints.py:131(solve)
    300000      0.367    0.000  136.565    0.000  __init__.py:153(solve)
    200000      0.381    0.000  136.151    0.001  __init__.py:230(solve_field)
    200000     61.247    0.000  134.103    0.001  __init__.py:204(__call__)
    50000      67.569    0.001   68.958    0.001  type_ip.py:111(solve)
    100000      0.190    0.000    3.515    0.000  type_keyword.py:91(solve)
    100000      0.254    0.000    3.294    0.000  type_keyword.py:94(<listcomp>)
    200000      0.451    0.000    1.668    0.000  __init__.py:53(emit_field)
    100000      0.224    0.000    1.634    0.000  solution_space.py:225(generate)
    100000      0.155    0.000    1.406    0.000  solution_space.py:96(__sub__)
    100000      0.089    0.000    1.166    0.000  solution_space.py:228(<listcomp>)
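The reproducibility point can be illustrated with a small sketch: a `dict` whose values are ignored behaves like a set that remembers insertion order, so iterating over it yields the same sequence on every run. The function names and sample values are hypothetical and not taken from the project.

```python
def ordered_unique(values):
    """Deduplicate while keeping first-seen order, using dict keys as a set."""
    return list(dict.fromkeys(values))


def remove_value(seen, value):
    """Drop a key if present; the remaining keys keep their relative order."""
    seen.pop(value, None)
    return seen


if __name__ == "__main__":
    values = ["10.0.0.3", "10.0.0.1", "10.0.0.3", "10.0.0.2"]
    seen = dict.fromkeys(values)      # keys act as the set members
    print(list(seen))                 # insertion order, duplicates dropped
    print("10.0.0.1" in seen)         # membership test, as with a set
    print(list(remove_value(seen, "10.0.0.1")))
```

Iterating over a plain `set` of the same strings may give a different order from one interpreter run to the next, because string hashing is randomized per process by default. The dictionary avoids that at the cost of some extra time, which matches the slowdown reported above.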
cavokz force-pushed the speed-up-high-cardinality-generation branch 4 times, most recently from c5d0245 to f8b9023 on May 6, 2024 09:54
cavokz changed the title from "Speed up generation of high cardinality data" to "Speed up generation of (some) high cardinality data" on May 6, 2024
No description provided.