Update/improve get deployment elastic #276

adrianfusco · 2022-05-25T14:07:47Z

I wanted to do the SQL equivalent of GROUP BY to get the information of the last build. The way to approach this is with aggregations, so I've modified the query accordingly.
We have been using the scan helper to get the documents from Elasticsearch through a query. The scan helper doesn't support aggregations, so we add a condition to use the search method of the Elasticsearch client for those queries instead.
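A minimal sketch of a GROUP BY-style query that keeps only the newest build per job, using the standard terms and top_hits aggregations. The field names job_name.keyword and build_num are assumptions for illustration, not cibyl's real index mapping, and the actual query in the PR may be shaped differently.

```python
from typing import Any, Dict, List


def last_build_per_job_query(jobs: List[str], size: int = 10000) -> Dict[str, Any]:
    """Build a search body that groups documents by job and keeps the newest build.

    "job_name.keyword" and "build_num" are hypothetical field names.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the raw hits
        "query": {"terms": {"job_name.keyword": jobs}},
        "aggs": {
            "group_by_job": {
                # one bucket per job name, like GROUP BY job_name in SQL
                "terms": {"field": "job_name.keyword", "size": size},
                "aggs": {
                    "last_build": {
                        # keep a single document per bucket: the highest build number
                        "top_hits": {
                            "size": 1,
                            "sort": [{"build_num": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


def extract_last_builds(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each job name to the _source of its newest build in a search response."""
    result = {}
    for bucket in response["aggregations"]["group_by_job"]["buckets"]:
        hit = bucket["last_build"]["hits"]["hits"][0]
        result[bucket["key"]] = hit["_source"]
    return result
```

The body would go to the client's search method (as body= in elasticsearch-py 7.x; 8.x prefers passing query, aggs and size as keyword arguments), since the scan helper only iterates over hits.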
We can't send a single giant query to Elasticsearch asking for the information of all the jobs, so until now we have been asking for the deployment information of each job one by one (1:1). Instead of doing one query per job, we now split the list of jobs into sub-lists and make one call per chunk. chunk_size_for_search sets the size of every sub-list. E.g., with 2000 jobs and chunk_size_for_search = 600 we make the following number of calls: 2000 / 600 = 3.33 -> 4 calls.
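The chunk division can be sketched as follows. chunk_size_for_search is the name used in the PR; the chunked helper and the job names are illustrative. Each sub-list would then feed one query (for example a terms query on the job name) instead of one query per job.

```python
import math
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: List[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive sub-lists of at most chunk_size items (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


jobs = [f"job-{i}" for i in range(2000)]
chunk_size_for_search = 600
chunks = list(chunked(jobs, chunk_size_for_search))

print(len(chunks))                                    # 4
print([len(c) for c in chunks])                       # [600, 600, 600, 200]
print(math.ceil(len(jobs) / chunk_size_for_search))   # 4, i.e. ceil(3.33)
```

The last chunk simply holds the remainder, so the number of calls is always the ceiling of jobs / chunk size.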
With the append_get_specific_field method we get only the specific fields we need from the source field, the equivalent of SELECT field in SQL instead of SELECT *.
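Source filtering in Elasticsearch is done by adding a "_source" list to the query body. The helper below is a hypothetical approximation of what append_get_specific_field does; its name, signature and the field names are assumptions, not the PR's code.

```python
from typing import Any, Dict, List


def with_source_fields(query: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Return a copy of a query body that only fetches the given _source fields.

    Hypothetical helper: "_source" filtering is standard Elasticsearch, but
    this is not cibyl's append_get_specific_field implementation.
    """
    new_query = dict(query)              # shallow copy, the caller's dict is untouched
    new_query["_source"] = list(fields)  # like SELECT f1, f2 instead of SELECT *
    return new_query


base = {"query": {"match_all": {}}}
print(with_source_fields(base, ["job_name", "topology"]))
# {'query': {'match_all': {}}, '_source': ['job_name', 'topology']}
```

Fetching fewer fields shrinks every response, which adds up when each chunk returns many documents.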

- The aggregations query now uses the search method of the client. The other queries keep using the scan helper, which is the better fit when we receive a lot of documents (more than 10K).
- Add an explanation of the chunk division in the get_deployment method of the Elasticsearch source.
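A sketch of the dispatch described in the first bullet, under the assumption that an aggregation query is recognized by an "aggs" (or "aggregations") key in its body. The scan function is injected (normally elasticsearch.helpers.scan) so the sketch runs without a live cluster; run_query is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterable


def run_query(
    client: Any,
    index: str,
    query: Dict[str, Any],
    scan: Callable[..., Iterable[Dict[str, Any]]],
) -> Any:
    """Send a query body through the retrieval method that supports it.

    Aggregation queries go through client.search, because the scan helper
    iterates over hits and drops aggregations. Plain queries go through
    scan, which pages with the scroll API past the 10,000-hit window.
    """
    if "aggs" in query or "aggregations" in query:
        return client.search(index=index, body=query)
    return list(scan(client, query=query, index=index))
```

With a real client this would be called as run_query(es, index, body, elasticsearch.helpers.scan); body= is accepted by elasticsearch-py 7.x and deprecated in 8.x.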

cibyl/plugins/openstack/sources/elasticsearch.py

bregman-arie

The difference I see in speed compared to the previous version is amazing. Well done.
One thing we need to address is inconsistency with other sources. I get completely different results running the same queries with Jenkins for example. But this is something that should be addressed in a separate future PR.

Maybe you and @cescgina can take a look at it together and identify any shared components/blocks in the code.

adrianfusco · 2022-05-26T10:06:27Z

Thanks. Totally agree. I've now tested it by running:

clear; cibyl -vvv -d --source elasticsearch --jobs --topology

and I see a big difference:

INFO     cibyl.orchestrator   Took 3.49s to query system osp_jenkins using source elasticsearch of type elasticsearch using method get_deployment

We should take the following into account:

1. We're not showing all the information from Elasticsearch because, unlike Jenkins, we don't store the information of all jobs and builds there.
2. Using the Jenkins source we show all the jobs. For the jobs that don't have associated information we display: No openstack information associated with this job. Using the Elasticsearch source I only show the jobs that have associated information. What do we prefer here?
3. Using Jenkins we display all the information related to each node and its role. I don't have this information in Elasticsearch, so I can't show it.

cescgina · 2022-05-26T10:49:44Z

2. Using the Jenkins source we show all the jobs. For the jobs that don't have associated information we display: No openstack information associated with this job. Using the Elasticsearch source I only show the jobs that have associated information. What do we prefer here?

About this, I worked under the assumption that if the user just passes --jobs --topology without specifying any value for them, then no filtering is done. How does it work for builds in Elasticsearch, @adrianfusco? For example, if cibyl is called with --jobs --builds, would it show the jobs with no builds?

As with get_deployment, we were doing one query per job, so with 2000 jobs we had 2000 queries. Since we can't send all the jobs in one single query (it's too big and returns an error), I've used chunks as in RedHatCRE#276, and this has reduced the query time:

From:
INFO cibyl.orchestrator Took 551.80s to query system osp_jenkins using source: 'elasticsearch' of type: 'elasticsearch' using method get_builds

To:
INFO cibyl.orchestrator Took 117.57s to query system osp_jenkins using source elasticsearch of type elasticsearch using method get_builds

adrianfusco · 2022-05-30T14:22:05Z

Yes, we're showing all the jobs, even those with no builds. I've changed the behavior here too.

adrianfusco · 2022-05-30T15:26:33Z

I added another change to show all jobs. Let's review it, and we can continue working on the other topics in a corresponding task and branch.
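The "show all jobs" behavior can be sketched like this, assuming the deployment data arrives as a mapping from job name to a topology string. The mapping, the output format and the describe_jobs name are illustrative, not cibyl's data model; only the fallback message comes from the discussion.

```python
from typing import Dict, List, Optional

NO_INFO_MESSAGE = "No openstack information associated with this job"


def describe_jobs(all_jobs: List[str], deployments: Dict[str, str]) -> List[str]:
    """List every job, using a fallback message when Elasticsearch has no data for it."""
    lines = []
    for job in all_jobs:
        topology: Optional[str] = deployments.get(job)
        lines.append(f"{job}: {topology if topology else NO_INFO_MESSAGE}")
    return lines


print(describe_jobs(["job-a", "job-b"], {"job-a": "controller:3,compute:2"}))
# ['job-a: controller:3,compute:2', 'job-b: No openstack information associated with this job']
```

Iterating over the full job list rather than over the query results is what keeps jobs without stored information visible.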

cescgina

LGTM

adrianfusco added 2 commits May 25, 2022 07:12

Add filtering field

6a96f61

adrianfusco requested a review from a team May 25, 2022 14:07

adrianfusco marked this pull request as draft May 25, 2022 14:07

cescgina reviewed May 25, 2022

cibyl/plugins/openstack/sources/elasticsearch.py (outdated)

bregman-arie reviewed May 25, 2022

Delete unnecesary dvr_argument variable

b86be80

adrianfusco mentioned this pull request May 30, 2022

Improve query time get_builds #290

Merged

Show all jobs even those that doesn't have information

4a49d82

adrianfusco and others added 3 commits May 30, 2022 16:27

Fix pep8 whitespace before

3f9fa92

Add aggregations + query result query in tests

37c6eb1

Merge branch 'main' into update/improve-get-deployment-elastic

04b82fe

adrianfusco marked this pull request as ready for review May 30, 2022 15:25

cescgina approved these changes May 30, 2022

adrianfusco merged commit 84b8f7f into RedHatCRE:main May 30, 2022

adrianfusco deleted the update/improve-get-deployment-elastic branch May 30, 2022 18:22
