Speed up upstream tests #490

Hritik14 · 2021-06-25T00:16:23Z

Fixes: #437

Earlier, one batch of advisories was requested from updated_advisories
method of the respective importers. This was inefficient as not all
importers respect batching internally. Eventually, we wish to eliminate
batches as well (#338).
Now, the updated_advisories method of each importer is expected to
create at least one Advisory object. If it does so, the importer is
marked working.
This brings a major performance improvement. Improving this test is
necessary because GitHub allows only 6 hours of workflow time.
Before: ~6 hours; now: ~9 minutes.


sbs2001 · 2021-06-26T15:51:29Z

@Hritik14

About the comment in the PR:

Earlier, one batch of advisories was requested from updated_advisories
method of the respective importers.

No, it fetches everything: https://github.com/nexB/vulnerablecode/blob/d94f4f6aefa24ad1e3cce869a59da92a8c6abb75/vulnerabilities/tests/test_upstream.py#L17

If you're talking about https://github.com/nexB/vulnerablecode/blob/d94f4f6aefa24ad1e3cce869a59da92a8c6abb75/vulnerabilities/tests/test_upstream.py#L15 , then the batch_size=1 here means "make me batches, I don't care how many, but I want each batch to contain only 1 advisory object". It is a separate (less important) problem that some (most?) importers don't respect that.
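To illustrate the batch_size=1 semantics described above, here is a minimal sketch. The make_batches helper and the sample advisory IDs are hypothetical and are not the project's actual batching code; the point is only that batch_size bounds the size of each batch, not the number of batches.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def make_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Made-up advisory identifiers, for illustration only.
advisories = ["VULN-A", "VULN-B", "VULN-C"]

# batch_size=1 still walks through every advisory; it only caps each batch at one item.
batches = list(make_batches(advisories, batch_size=1))
```

With batch_size=1 every advisory is still produced, one per batch, which is why the old test ended up fetching everything.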

About the PR

The refactoring of importers to eliminate added/updated_advisories -> LGTM
The tests -> Checking whether an importer gives at least 1 advisory -> good idea. But apart from that, there are some flaws:

a. These tests are for checking whether our code is able to parse and process upstream data. The upstream data can change any time, so we always need to check everything. Checking/processing just one advisory IMHO defeats the whole purpose.

b. The changed tests are super cryptic (I doubt I still understand it).

Hritik14 · 2021-06-26T16:49:54Z

@sbs2001

make me batches, I don't care how many, but I want each batch to contain only 1 advisory object

In that case, is there any rationale behind using batches of size 1 here?

No, it fetches everything

I seem to have misinterpreted the batch usage here. I will fix the commit message once the other problems are resolved. Regardless, as we are moving towards non-batched processing, the usage of batches should be minimized (if not eliminated) in future code.

a. These tests are for checking whether our code is able to parse and process upstream data. The upstream data can change any time, so we always need to check everything. Checking/processing just one advisory IMHO defeats the whole purpose.

The earlier implementation only made sure that the updated_advisories method completes successfully. Now, IMO, if the format of the upstream data ever changes, creating even one advisory would fail, and so the entire test would fail. The purpose of the updated_advisories method is to create Advisory objects and return them. If it does so in the usual way, the time requirement is huge. With the current number of importers it already takes over 6 hours; after adding others it will take a lot more.

The changed tests are super cryptic (I doubt I still understand it).

A bird's-eye view: the Advisory class in each of the importers is patched to keep a count of the advisories created by that importer. After a certain number of advisories have been created (here, 1), the mocked function raises an interrupt signaling that everything is OK. One possible improvement: before raising the OK interrupt, we could actually try to create a real Advisory object with the given parameters.
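The idea above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: Advisory, EnoughAdvisories, importer_module, DemoImporter, EmptyImporter, and importer_works are all hypothetical stand-ins, and a SimpleNamespace plays the role of an importer module that has Advisory in its namespace. The sketch also includes the suggested improvement of building a real Advisory before raising.

```python
from types import SimpleNamespace
from unittest.mock import patch


class Advisory:
    """Hypothetical stand-in for the project's Advisory class."""

    def __init__(self, summary, vulnerability_id):
        self.summary = summary
        self.vulnerability_id = vulnerability_id


class EnoughAdvisories(Exception):
    """Raised by the patched constructor once enough advisories exist."""


# Stand-in for an importer module that imported Advisory into its namespace.
importer_module = SimpleNamespace(Advisory=Advisory)


class DemoImporter:
    def updated_advisories(self):
        # Advisory is looked up on the module at call time, so the patch is visible.
        return [
            importer_module.Advisory(summary=f"demo {i}", vulnerability_id=f"VULN-{i}")
            for i in range(1000)
        ]


class EmptyImporter:
    def updated_advisories(self):
        return []


def importer_works(importer, module, limit=1):
    """Return True if the importer creates at least `limit` advisories."""
    real_cls = module.Advisory
    created = []

    def counting_advisory(*args, **kwargs):
        # Building the real object first validates the parameters.
        created.append(real_cls(*args, **kwargs))
        if len(created) >= limit:
            raise EnoughAdvisories
        return created[-1]

    with patch.object(module, "Advisory", counting_advisory):
        try:
            # list() also drains the result if the importer returns a generator.
            list(importer.updated_advisories())
        except EnoughAdvisories:
            return True
    return False
```

In this sketch importer_works(DemoImporter(), importer_module) stops after the first advisory instead of building all 1000, while an importer that produces nothing is reported as not working.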

pombredanne

I hate the monkey patching but this makes full sense in the current state of the codebase and the lack of consistency on the importers side. Therefore this is a go IMHO.

Hritik14 added 2 commits June 25, 2021 05:49

Remove added_advisories for updated_advisories

a530627

Internally, the difference between both has faded and updated_advisories is preferred. Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>

Hritik14 force-pushed the velocity-test_upstream branch from 899101c to 6b29388 on June 25, 2021 00:19

Merge branch 'main' into velocity-test_upstream

f6c1635

pombredanne approved these changes Jul 15, 2021


Hritik14 merged commit 225fcd9 into aboutcode-org:main Jul 15, 2021
