rpmautospec is very slow: grpc %changelog generated in 3 minutes #22

hroncok · 2023-11-01T11:07:01Z

It seems that the changelog at https://src.fedoraproject.org/rpms/grpc (specifically this commit which is the current rawhide branch) takes more than 3 minutes to generate on my modern machine.

$ time rpmautospec generate-changelog
...
real	3m17,616s
user	1m34,083s
sys	0m23,546s

The git log is not particularly complex or long. Since the conversion to rpmautospec and the creation of the changelog file, it has been linear. I believe this should take seconds, not minutes.

cc @musicinmybrain

The text was updated successfully, but these errors were encountered:

hroncok · 2023-11-01T11:07:15Z

FTR rpmautospec-0.3.5-1.fc37.noarch

musicinmybrain · 2023-11-01T11:47:25Z

It takes 20-25 seconds for me. I needed to disable the pager for this to work nicely:

$ time rpmautospec --no-pager generate-changelog
real    0m20.066s
user    0m13.784s
sys     0m6.203s

rpmautospec-0.3.5-1.fc38.noarch

That still seems like a lot for generating a changelog on a modern desktop. I’ll try profiling the script and see if I find anything interesting.

musicinmybrain · 2023-11-01T11:53:00Z

$ python3 -m cProfile /usr/bin/rpmautospec --no-pager generate-changelog
         237699 function calls (232471 primitive calls) in 19.851 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    110/1    0.000    0.000   19.851   19.851 {built-in method builtins.exec}
        1    0.000    0.000   19.851   19.851 rpmautospec:1(<module>)
        1    0.000    0.000   19.744   19.744 contextlib.py:78(inner)
        1    0.000    0.000   19.744   19.744 cli.py:131(main)
        1    0.000    0.000   19.741   19.741 changelog.py:54(main)
        1    0.000    0.000   19.741   19.741 changelog.py:48(produce_changelog)
        1    0.000    0.000   19.726   19.726 pkg_history.py:638(run)
        1    0.009    0.009   19.725   19.725 pkg_history.py:428(_run_on_history)
  614/205    0.001    0.000   19.720    0.096 {built-in method builtins.next}
      101    0.000    0.000   19.709    0.195 pkg_history.py:545(<listcomp>)
      303    0.002    0.000   19.707    0.065 pkg_history.py:185(release_number_visitor)
      102    0.025    0.000   19.705    0.193 pkg_history.py:166(_get_rpmverflags_for_commit)
      102    0.002    0.000   19.641    0.193 pkg_history.py:81(_get_rpmverflags)
      102    0.001    0.000   19.634    0.192 subprocess.py:506(run)
      102    0.001    0.000   19.570    0.192 subprocess.py:1165(communicate)
      102    0.004    0.000   19.569    0.192 subprocess.py:2055(_communicate)
      233    0.002    0.000   19.556    0.084 selectors.py:402(select)
      233   19.554    0.084   19.554    0.084 {method 'poll' of 'select.poll' objects}
       11    0.000    0.000    0.144    0.013 __init__.py:1(<module>)
[…]

So all of the time is going to calling this rpm command 102 times:

rpmautospec/rpmautospec/pkg_history.py

Lines 105 to 125 in 1a8eb8b

    
           rpm_cmd = [ 
        
               "rpm", 
        
               "--define", 
        
               "_invalid_encoding_terminates_build 0", 
        
               "--define", 
        
               autorelease_definition, 
        
               "--define", 
        
               "autochangelog %nil", 
        
               "--define", 
        
               f"__python /usr/bin/python{python_version}", 
        
               "--define", 
        
               f"python_sitelib /usr/lib/python{python_version}/site-packages", 
        
               "--define", 
        
               f"_sourcedir {path}", 
        
               "--define", 
        
               f"_builddir {path}", 
        
               "--qf", 
        
               query, 
        
               "--specfile", 
        
               f"{name}.spec", 
        
           ]

hroncok · 2023-11-01T11:53:53Z

Same for me, I tried to profile the Python code, but it spends all the time in rpm subprocess.

hroncok · 2023-11-01T11:57:28Z

@musicinmybrain Do you have Python macro memoization? That is Fedora 38+

If so, that would explain why my thing is slower. If the spec file is expanded 102 times, it probably spawns dozens of Python subprocesses each time on Fedora 37 :(

musicinmybrain · 2023-11-01T11:58:13Z

Yes, I’m testing on F38.

musicinmybrain · 2023-11-01T12:04:08Z

I see the same thing with the current main, 68c5057:

gh repo clone fedora-infra/rpmautospec
cd rpmautospec
python3 -m venv _e
. _e/bin/activate
time rpmautospec --no-pager generate-changelog ~/fedora/mine/grpc/

That makes it easier to investigate.

A typical rpm command is this:

[
    'rpm',
    '--define',
    '_invalid_encoding_terminates_build 0',
    '--define', 'autorelease(e:s:pb:n) E%{?-e*}_S%{?-s*}_P%{?-p:1}%{!?-p:0}_B%{?-b*}',
    '--define',
    'autochangelog %nil',
    '--define',
    '__python /usr/bin/python3.11',
    '--define',
    'python_sitelib /usr/lib/python3.11/site-packages',
    '--define',
    '_sourcedir /tmp/rpmautospec-h1q16w7p',
    '--define',
    '_builddir /tmp/rpmautospec-h1q16w7p',
    '--qf', '%|epoch?{%{epoch}:}:{}|%{version}\n%{release}\n',
    '--specfile',
    'grpc.spec'
]

…and it looks like the only thing changing from call to call is the temporary directory path. Maybe this is something that can be cached.

musicinmybrain · 2023-11-01T12:05:24Z

Oh, but the temporary directory is probably being manipulated and is the real input to the function. Hmm.

hroncok · 2023-11-01T12:11:00Z

I believe this wants to get an EVR for each commit. That is reasonable. And querying it by actually using rpm is probably the only correct way to do it (I don't know why it defines %__python to do it, but whatever).

I can think of one optimization. Only query the specfile up until the %prep section. That might not work and if it doesn't, fall back to full spec parsing. But if it does, it is significantly faster.

$ time rpm -q --specfile --queryformat='%|epoch?{%{epoch}:}:{}|%{version}\n%{release}\n' grpc.spec
1.48.4
1.fc37
...
real	0m0,532s
user	0m0,387s
sys	0m0,144s

$ time rpm -q --specfile --queryformat='%|epoch?{%{epoch}:}:{}|%{version}\n%{release}\n' grpc-cut-before-prep.spec
1.48.4
1.fc37
...
real	0m0,084s
user	0m0,055s
sys	0m0,030s

musicinmybrain · 2023-11-01T12:49:30Z

Just out of curiosity, I compared runtime of the rpm command in seconds (X axis) to spec file length in lines (Y axis) for packages for which I’m primary maintainer.

While there is some weak evidence of a probably-linear relationship in this dataset, mostly we see a bimodal distribution consisting of packages that use Python and those that don’t. This supports the idea that Python interpreter startups are dominating the evaluation time.

nphilipp · 2023-11-02T12:55:13Z

Thanks for reporting and investigating!

(I don't know why it defines %__python to do it, but whatever).

If I recall correctly, we set these two python macros as a workaround for packages using them where parsing broke if they aren’t set, in environments where the packages defining the macros aren’t installed – the Koji builders outside of the buildroot for example (at least at the time).

I can think of one optimization. Only query the specfile up until the %prep section. That might not work and if it doesn't, fall back to full spec parsing. But if it does, it is significantly faster.

I have no idea how much overhead cutting of the remainder is, let me check that (I guess in the grand scheme of things it’s negligible, but that’s just a guess).

nphilipp · 2023-11-02T14:25:58Z

OK, this is cool: Even seemingly unaffected spec files seem to benefit from parsing an abridged version. Namely, I tested this on current Rawhide versions of the grpc and skopeo spec files, the latter doesn’t have any reference to python in it, so shouldn’t be affected by the multitude of Python subprocesses being spawned. For gprc, the speed up was tremendous on my laptop (6.1 s vs. 27.5 s, i.e. 78.5 %), for skopeo it still saved about 1.5 seconds (14.9 s vs. 16.5 s, i.e. 10%).

Thanks for reporting this issue, for the analysis and the suggested optimization!

hroncok · 2023-11-02T14:40:23Z

Technically you only need to preserve as much of the specfile as is required to parse and query it. That are some tags like Name, Version, License, Summary etc. And the main package %description. As long as you cut it where %prep starts, you have a very high chance it's parseable, however not 100%, as I belive putting %prep before %description is valid (despite being crazy).
I belive the speedup in a general case is worth the risk of parsing the spec twice in very strange cases.

nphilipp · 2023-11-02T14:54:39Z

Hmm, I guess there would be a way to ensure (at least better) that all necessary %description sections are included, too, but in this case just failing and parsing the whole spec file is acceptable.

Attempt to parse an abridged version of the spec file first, i.e. one cut off at the `%prep` line, which should still contain everything needed to parse epoch, version and autorelease flags. Only if that fails, attempt to parse the full spec file. This speeds up things tremendously in the case of spec files using macros which run subprocesses, such as certain Python macros. Even spec files not affected by this are parsed (slightly) faster if shortened accordingly. Thanks to Miro Hrončok and Ben Beasley for analyzing the issue and suggesting this optimization. Fixes: fedora-infra#22 Signed-off-by: Nils Philippsen <nils@redhat.com>

Attempt to parse an abridged version of the spec file first, i.e. one cut off at the `%prep` line, which should still contain everything needed to parse epoch, version and autorelease flags. Only if that fails, attempt to parse the full spec file. This speeds up things tremendously in the case of spec files using macros which run subprocesses, such as certain Python macros. Even spec files not affected by this are parsed (slightly) faster if shortened accordingly. Thanks to Miro Hrončok and Ben Beasley for analyzing the issue and suggesting this optimization. Fixes: #22 Signed-off-by: Nils Philippsen <nils@redhat.com>

nphilipp · 2023-11-02T15:24:02Z

I’ll cherry-pick #23 into the 0.3.x branch and once I’ve sorted out build issues related to setuptools ⇒ poetry, I’ll submit a package update.

Attempt to parse an abridged version of the spec file first, i.e. one cut off at the `%prep` line, which should still contain everything needed to parse epoch, version and autorelease flags. Only if that fails, attempt to parse the full spec file. This speeds up things tremendously in the case of spec files using macros which run subprocesses, such as certain Python macros. Even spec files not affected by this are parsed (slightly) faster if shortened accordingly. Thanks to Miro Hrončok and Ben Beasley for analyzing the issue and suggesting this optimization. Fixes: fedora-infra#22 Signed-off-by: Nils Philippsen <nils@redhat.com> (cherry picked from commit af654c4)

Attempt to parse an abridged version of the spec file first, i.e. one cut off at the `%prep` line, which should still contain everything needed to parse epoch, version and autorelease flags. Only if that fails, attempt to parse the full spec file. This speeds up things tremendously in the case of spec files using macros which run subprocesses, such as certain Python macros. Even spec files not affected by this are parsed (slightly) faster if shortened accordingly. Thanks to Miro Hrončok and Ben Beasley for analyzing the issue and suggesting this optimization. Fixes: #22 Signed-off-by: Nils Philippsen <nils@redhat.com> (cherry picked from commit af654c4)

Related: fedora-infra#22 Signed-off-by: Nils Philippsen <nils@redhat.com>

Related: #22 Signed-off-by: Nils Philippsen <nils@redhat.com>

Related: fedora-infra#22 Signed-off-by: Nils Philippsen <nils@redhat.com>

Related: fedora-infra#22 Signed-off-by: Nils Philippsen <nils@redhat.com> (cherry picked from commit d2282f8)

Related: #22 Signed-off-by: Nils Philippsen <nils@redhat.com> (cherry picked from commit d2282f8)

nphilipp added performance Something is slow buildsystem Something affecting build systems labels Nov 2, 2023

nphilipp self-assigned this Nov 2, 2023

nphilipp mentioned this issue Nov 2, 2023

Optimize how spec files are parsed #23

Merged

nphilipp closed this as completed in #23 Nov 2, 2023

nphilipp mentioned this issue Nov 2, 2023

Optimize how spec files are parsed #24

Merged

nphilipp added a commit to nphilipp/rpmautospec that referenced this issue Nov 2, 2023

Don’t leave temporary files around, d’oh!

8b2364e

Related: fedora-infra#22 Signed-off-by: Nils Philippsen <nils@redhat.com>

nphilipp mentioned this issue Nov 2, 2023

Don’t leave temporary files around, d’oh! #25

Merged

nphilipp added a commit that referenced this issue Nov 2, 2023

Don’t leave temporary files around, d’oh!

d2282f8

Related: #22 Signed-off-by: Nils Philippsen <nils@redhat.com>

nphilipp added a commit to nphilipp/rpmautospec that referenced this issue Nov 2, 2023

Don’t leave temporary files around, d’oh!

1dedcbe

Related: fedora-infra#22 Signed-off-by: Nils Philippsen <nils@redhat.com>

nphilipp mentioned this issue Nov 2, 2023

Don’t leave temporary files around, d’oh! #26

Merged

nphilipp added a commit to nphilipp/rpmautospec that referenced this issue Nov 2, 2023

Don’t leave temporary files around, d’oh!

01c1966

Related: fedora-infra#22 Signed-off-by: Nils Philippsen <nils@redhat.com> (cherry picked from commit d2282f8)

nphilipp added a commit that referenced this issue Nov 2, 2023

Don’t leave temporary files around, d’oh!

9ecfbb4

Related: #22 Signed-off-by: Nils Philippsen <nils@redhat.com> (cherry picked from commit d2282f8)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rpmautospec is very slow: grpc %changelog generated in 3 minutes #22

rpmautospec is very slow: grpc %changelog generated in 3 minutes #22

hroncok commented Nov 1, 2023

hroncok commented Nov 1, 2023

musicinmybrain commented Nov 1, 2023

musicinmybrain commented Nov 1, 2023

hroncok commented Nov 1, 2023

hroncok commented Nov 1, 2023 •

edited

Loading

musicinmybrain commented Nov 1, 2023

musicinmybrain commented Nov 1, 2023

musicinmybrain commented Nov 1, 2023 •

edited

Loading

hroncok commented Nov 1, 2023 •

edited

Loading

musicinmybrain commented Nov 1, 2023 •

edited

Loading

nphilipp commented Nov 2, 2023

nphilipp commented Nov 2, 2023

hroncok commented Nov 2, 2023 •

edited

Loading

nphilipp commented Nov 2, 2023

nphilipp commented Nov 2, 2023

rpmautospec is very slow: grpc %changelog generated in 3 minutes #22

rpmautospec is very slow: grpc %changelog generated in 3 minutes #22

Comments

hroncok commented Nov 1, 2023

hroncok commented Nov 1, 2023

musicinmybrain commented Nov 1, 2023

musicinmybrain commented Nov 1, 2023

hroncok commented Nov 1, 2023

hroncok commented Nov 1, 2023 • edited Loading

musicinmybrain commented Nov 1, 2023

musicinmybrain commented Nov 1, 2023

musicinmybrain commented Nov 1, 2023 • edited Loading

hroncok commented Nov 1, 2023 • edited Loading

musicinmybrain commented Nov 1, 2023 • edited Loading

nphilipp commented Nov 2, 2023

nphilipp commented Nov 2, 2023

hroncok commented Nov 2, 2023 • edited Loading

nphilipp commented Nov 2, 2023

nphilipp commented Nov 2, 2023

hroncok commented Nov 1, 2023 •

edited

Loading

musicinmybrain commented Nov 1, 2023 •

edited

Loading

hroncok commented Nov 1, 2023 •

edited

Loading

musicinmybrain commented Nov 1, 2023 •

edited

Loading

hroncok commented Nov 2, 2023 •

edited

Loading