Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance is slow for large installations #272

Closed
TomNewChao opened this issue Aug 3, 2023 · 18 comments
Closed

Performance is slow for large installations #272

TomNewChao opened this issue Aug 3, 2023 · 18 comments

Comments

@TomNewChao
Copy link

TomNewChao commented Aug 3, 2023

Hello,
We have more than 10000+ jobs in jenkins, The git trigger the job by using generic-webhook-trigger-plugin that will spend 2+ minutes. When I parse the code, I found that the time is consumed in the following code:
image
image

I'm wondering if it's possible to get the corresponding job based on the content of the request,I tried to modify the code for testing, using a job with a specified path to trigger, and found that the efficiency increased significantly,So,The JobFinder that Find the corresponding job according to the request content to trigger the task by request body is required.

@TomNewChao
Copy link
Author

I suggest:
1.It is recommended to use the radio button to select the jobfinder.
2.The jobfinder configure the path of the corresponding request content to obtain job related information to trigger.

I am very interested in contributing my pr with your permission.

@tomasbjerre
Copy link
Contributor

You are very welcome to fork the repo and open a pull-request.

@TomNewChao
Copy link
Author

You are very welcome to fork the repo and open a pull-request.

Thanks for your reply, I think it's feature development,Need a lot of PR to solve this issue,Would you help me to create a branch named feature/issue-272 in this repository?

@tomasbjerre
Copy link
Contributor

Only contributions from forks are allowed.

If you want to make a lot of changes it might be better if you keep your feature in your fork without merging to this main repo. To avoid causing problems for the 30k+ installations that are using it.

@TomNewChao
Copy link
Author

I see,ok,I will develop this issue on my forks repo.

@tomasbjerre
Copy link
Contributor

@TomNewChao
Copy link
Author

I pushed an alternative solution here. Adding a loading cache to JobFinder: https://github.com/jenkinsci/generic-webhook-trigger-plugin/pull/274/files#diff-6cc9e047aa25546ac7f7dcce5804eb64149b98bb21b7649d477dd05b99e611a1

Thanks for your commit, But Adding a loading cache to JobFinder may not work in 10000+ jobs, Because they may not repeatedly trigger the same work within the cache time, it may be better to find the corresponding job based on the content of the request body。

@tomasbjerre
Copy link
Contributor

If you use token in your jobs, all those jobs will invoke Jenkins.getInstance().getAllItems(ParameterizedJob.class); exactly the same way.

I made an alternative with the cache config on the global config page. I think that might be better so that users don't need to change their webhook url:s. #275

@tomasbjerre
Copy link
Contributor

I released 1.87.0 with the caching feature.

@TomNewChao
Copy link
Author

If you use token in your jobs, all those jobs will invoke Jenkins.getInstance().getAllItems(ParameterizedJob.class); exactly the same way.

I made an alternative with the cache config on the global config page. I think that might be better so that users don't need to change their webhook url:s. #275

I'm sorry, You didn't get me, I won't use token in my job, the specise job will invoke "Jenkins.getInstance().get().getItemByFullName(fullName, ParameterizedJobMixIn.ParameterizedJob.class);"
to get job, and The fullname is concatenated according to the content and configuration of the request body, Using cache will cause inconsistency between original data and cached data, when the original data is modified, So I still think my approach is the best, even though your code has been merged into。

@tomasbjerre
Copy link
Contributor

So a solution might be to start using a token and enable the cache? You can use the exact same token i all 10000 jobs.

@TomNewChao
Copy link
Author

no, I think it is good idea to use my idea that use the specise job, and Using the cache layer will cause waste of memory resources, data inconsistency, etc.

@TomNewChao
Copy link
Author

TomNewChao commented Aug 21, 2023

my idea is use the specise job, and more detail:

  1. use the config path express and the request body to generate a jenkins path that named fullname.
  2. use fullname and Jenkins.getInstance().get().getItemByFullName(fullName, ParameterizedJobMixIn.ParameterizedJob.class) to get job.
  3. filter by "Renderer.isMatching and Renderer.renderText" (available now)
  4. trigger there job

@tomasbjerre
Copy link
Contributor

How much memory does the cache allocate in your case?

@TomNewChao
Copy link
Author

How much memory does the cache allocate in your case?

In my case, A path expression find one ParameterizedJob at most, when i print the ParameterizedJob, it allocate 128 bytes.
image

@TomNewChao
Copy link
Author

TomNewChao commented Aug 22, 2023

When I reviewed your code again, I found that our solution may be different. You use the cache to store all the List ParameterizedJob, and then filter out the qualified ParameterizedJob according to the token. The time complexity of this is o(n); My solution is to use the path to find the corresponding job, and the time complexity is o(1); this is not mutually exclusive, it can completely coexist.

@tomasbjerre tomasbjerre changed the title Optional jobfinder is required Performance is slow for large installations Aug 22, 2023
@tomasbjerre
Copy link
Contributor

If one job needs 128 bytes, I do not se a problem with "waste of memory resources".

By "data inconsistency" you probably mean that any changes to the configuration will be delayed by the cache. I dont see a problem with that as the configurations are (in my experience) rarely changed.

As I stated in your first PR:

As I see it a PR is really just a way of asking for free maintenance. There is nothing stopping you from adjusting the code to your needs and use that in you Jenkins. But for me to maintain it, it has to be as simple as possible. If a complex feature only helps a few users I would rather not merge it.

So I dont want another solution here, the caching is the solution until I see some convincing motivation to why it is not.

@TomNewChao
Copy link
Author

If one job needs 128 bytes, I do not se a problem with "waste of memory resources".

By "data inconsistency" you probably mean that any changes to the configuration will be delayed by the cache. I dont see a problem with that as the configurations are (in my experience) rarely changed.

As I stated in your first PR:

As I see it a PR is really just a way of asking for free maintenance. There is nothing stopping you from adjusting the code to your needs and use that in you Jenkins. But for me to maintain it, it has to be as simple as possible. If a complex feature only helps a few users I would rather not merge it.

So I dont want another solution here, the caching is the solution until I see some convincing motivation to why it is not.

Yeah, You are right, But when I used the cache job finder for performance comparison in my development, I found that the cache finder had no effect on reducing the interface time-consuming time. I felt that the cache layer had no effect. i don't know why. The test scenario: jenkins version [Jenkins 2.361.4] and set Cache Get Jobs Minutes is 5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants