Update/improve get deployment elastic #276
Conversation
- The aggregations query will now use the search method of the client. The other queries will use the scan helper, which is better if we are receiving a lot of information (more than 10K hits).
- Add an explanation about chunk division in the get_deployment method of the elastic source.
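The search-vs-scan rule from the description can be sketched as a small helper. This is our own illustration, not code from the PR: the function name and constant are assumptions, and 10,000 is Elasticsearch's default `index.max_result_window`.

```python
# Illustrative sketch only: plain `client.search` for result sets within
# Elasticsearch's default 10,000-hit window, and `helpers.scan` (which
# pages through the scroll API) once we expect more than that.
SEARCH_WINDOW_LIMIT = 10_000  # Elasticsearch's default index.max_result_window

def pick_query_method(expected_hits: int) -> str:
    """Return which elasticsearch-py call a query should use."""
    if expected_hits > SEARCH_WINDOW_LIMIT:
        return "scan"    # elasticsearch.helpers.scan: scroll through all hits
    return "search"      # a single Elasticsearch.search call is enough
```

Aggregation queries always take the `search` path: they return one summarised response regardless of how many documents they cover.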
The difference I see in speed compared to the previous version is amazing. Well done.
One thing we need to address is inconsistency with other sources. I get completely different results running the same queries with Jenkins for example. But this is something that should be addressed in a separate future PR.
Maybe you and @cescgina can take a look at it together and identify any shared components/blocks in the code.
Thanks. Totally. I've tested it now.
We should take into account the following things:
About this, I worked under the assumption that if the user just passed
In the same way as get_deployment, we are doing one query for each job, so if we have 2000 jobs we make 2000 queries. Since we can't ask for all the jobs in one single query (it's too big and returns an error), I've used chunks as in RedHatCRE#276, and we have reduced the time of the queries:

From: INFO cibyl.orchestrator Took 551.80s to query system osp_jenkins using source: 'elasticsearch' of type: 'elasticsearch' using method get_builds

To: INFO cibyl.orchestrator Took 117.57s to query system osp_jenkins using source elasticsearch of type elasticsearch using method get_builds
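The chunking described above can be sketched as a plain list split: one query per chunk instead of one per job. The chunk size of 600 matches the 2000 / 600 arithmetic in the comment; the helper name is our own, not the PR's actual code.

```python
from math import ceil

def chunked(items, chunk_size=600):
    """Split items into consecutive sub-lists of at most chunk_size."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

jobs = [f"job-{n}" for n in range(2000)]
chunks = chunked(jobs)
# 2000 / 600 = 3.33, rounded up: 4 queries instead of 2000
assert len(chunks) == ceil(len(jobs) / 600) == 4
```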
Yes, we're showing all the jobs, even those with no builds. I've changed the behavior here too.
I added another change to show all jobs. Let's review it, and we can continue working on the other topics in a corresponding task and branch.
LGTM
We use aggregations, similar to a SQL `GROUP BY`, to take the information of the last build, so I've modified the query used.

The `chunk_size_for_search` quantity will be the size of every sub-list. E.g., if we have 2000 jobs we will have the following calls: 2000 / 600 = 3.33 -> 4 calls.

With `append_get_specific_field` we'll get just the specific fields we require from the `_source` field, equivalent to `SELECT field` in SQL instead of `SELECT *`.
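Putting the first and last comments above together, a query body along these lines would give the `GROUP BY`-style grouping (a terms aggregation with a top_hits sub-aggregation keeping only the latest build per job) plus `SELECT field`-style `_source` filtering. This is a hedged sketch: the field names (`job_name`, `build_number`, `build_result`) are illustrative assumptions, not the PR's actual mapping.

```python
def last_build_per_job_query():
    """Build an Elasticsearch query body: latest build per job, few fields."""
    return {
        "size": 0,  # we only want the aggregation buckets, not raw hits
        "aggs": {
            "jobs": {
                "terms": {"field": "job_name"},  # one bucket per job, like GROUP BY
                "aggs": {
                    "last_build": {
                        "top_hits": {
                            "size": 1,  # keep only the newest hit per bucket
                            "sort": [{"build_number": {"order": "desc"}}],
                            # SELECT build_number, build_result instead of SELECT *
                            "_source": ["build_number", "build_result"],
                        }
                    }
                },
            }
        },
    }
```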