Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up upstream tests #490

Merged
merged 3 commits into from
Jul 15, 2021
Merged

Conversation

Hritik14
Copy link
Collaborator

Fixes: #437

Earlier, one batch of advisories was requested from updated_advisories
method of the respective importers. This was inefficient as not all
importers respect batching internally. Eventually, we wish to eliminate
batches as well ( #338 ).
Now, the updated_advisories method of each importer is expected to
create at least one Advisory object. If it does so, the importer is
marked working.
This brings major performance improvement. It is a necessity to improve
this test as GitHub only allows 6 hrs of workflow time.
Before: ~6hrs, now ~9 minutes

Internally, the difference between both has faded and updated_advisories
is preferred.

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Earlier, one batch of advisories was requested from updated_advisories
method of the respective importers. This was inefficient as not all
importers respect batching internally. Eventually, we wish to eliminate
batches as well ( # 338 ).
Now, the updated_advisories method of each importer is expected to
create at least one Advisory object. If it does so, the importer is
marked working.
This brings major performance improvement. It is a necessity to improve
this test as GitHub only allows 6 hrs of workflow time.
Before: ~6hrs, now ~9 minutes

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
@sbs2001
Copy link
Collaborator

sbs2001 commented Jun 26, 2021

@Hritik14

About the comment in PR :

Earlier, one batch of advisories was requested from updated_advisories
method of the respective importers. 

No, it fetches everything https://github.com/nexB/vulnerablecode/blob/d94f4f6aefa24ad1e3cce869a59da92a8c6abb75/vulnerabilities/tests/test_upstream.py#L17 .

If you're talking about https://github.com/nexB/vulnerablecode/blob/d94f4f6aefa24ad1e3cce869a59da92a8c6abb75/vulnerabilities/tests/test_upstream.py#L15 , then the batch_size=1 here means "make me batches , I don't care how many, but I want each batch to contain only 1 advisory object". Now it's another problem(less important) that some(most?) importers don't respect that.

About the PR

  1. The refactoring of importers of to eliminate added/updated_advisories -> LGTM
  2. The tests -> Checking if importer gives atleast 1 advisory -> good idea . But apart from that some flaws ->

a. These tests are for checking whether our code is able to parse and process upstream data. The upstream data can change any time, so we always need check everything. Checking/Processing just one advisory IMHO defeats the whole purpose.

b. The changed tests are super cryptic (I doubt I still understand it).

@Hritik14
Copy link
Collaborator Author

@sbs2001

make me batches , I don't care how many, but I want each batch to contain only 1 advisory object

Is there any rationale behind using batches of size 1 here in that case ?

No, it fetches everything

I seem to have misinterpreted the batch usage here. I will fix the commit message once other problems are resolved. Regardless, as we are moving towards non batched processing, the usage of batch should be minimized (if not eliminated) in future code.

a. These tests are for checking whether our code is able to parse and process upstream data. The upstream data can change any time, so we always need check everything. Checking/Processing just one advisory IMHO defeats the whole purpose.

The earlier implementation only made sure that the updated_advisory method completes successfully. Now, imo if the format of upstream data ever changes, creating even one advisory would fail resulting in entire test failure. The target of updated_advisories method is to create Advisory objects and return them. If it does so in the usual way, the time requirement is huge. With the number of importers right now, it already takes over 6 hours, after adding others it will take a lot more.

The changed tests are super cryptic (I doubt I still understand it).

A bird eye view would be that the Advisory class in each of the importers is patched to keep a count of number of advisories created by that importer. After a certain count of Advisories have been created (here 1), the mocked function raises an interrupt that everything is ok. Here there could be one improvement that before raising the ok interrupt, we actually try to create a real Advisory object with the given parameters.

Copy link
Collaborator

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate the monkey patching but this makes full sense in the current state of the codebase and the lack of consistency on the importers side. Therefore this is a go IMHO.

@Hritik14 Hritik14 merged commit 225fcd9 into aboutcode-org:main Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_upstream.py taking too long
3 participants