Apply Uniq filter to remove duplicate issues #649

Merged
merged 1 commit into from Dec 17, 2021


uberfuzzy
Contributor

Hello. This simple patch applies a uniq! call right before the problems printer to remove duplicate issue objects from the array.
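For reference, a minimal sketch of what the call does, using a hypothetical `Issue` class rather than the project's real one. `uniq!` compares elements with `hash` and `eql?`, so for plain objects it drops repeated references to the same instance. It also returns `nil` when nothing was removed, so its return value should not be chained.

```ruby
# Hypothetical stand-in for an issue object (not html-proofer's actual class).
class Issue
  attr_reader :desc

  def initialize(desc)
    @desc = desc
  end
end

a = Issue.new("broken link")
b = Issue.new("missing image")

# The same two objects referenced several times, as after repeated concat calls.
problems = [a, a, b, a, b]

problems.uniq!                # mutates in place, keeping the first occurrence of each
descs = problems.map(&:desc)  # ["broken link", "missing image"]

# A second call has nothing to remove, so it returns nil.
second_call = problems.uniq!
```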

This should mitigate the noise from duplicate issues, which stack onto themselves through the .concat calls.

I set up a stub repo showing this off: https://github.com/uberfuzzy/proofer-dupe-test

The cause lies in the places where new issues are collected and added to the global collectors using .concat.

This is roughly what is happening, from what I've been able to trace down with a lot of debug prints:

  • Issue A is found
  • Issue added to check object's issues array (issues now [A])
  • Issues array is concat'd onto the larger problems array (problems now [A])
  • Issue B is found.
  • Issue added to check object's issues array (issues now [A,B])
  • Issues array is concat'd onto the larger problems array (problems now [A,A,B])
  • Issue C is found.
  • Issue added to check object's issues array (issues now [A,B,C])
  • Issues array is concat'd onto the larger problems array (problems now [A,A,B,A,B,C])
  • ... You see where this is going and how it can quickly echo chamber amplify a handful of issues into an avalanche :)
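The trace above can be reproduced with a minimal sketch. The `Issue` class and the variable names here are hypothetical stand-ins, not the project's actual code; the only assumption is that the check's own array is never reset and that `.concat` runs once per issue found.

```ruby
# Hypothetical stand-in for an issue object (not html-proofer's actual class).
class Issue
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

problems = []      # the larger, global problems array
check_issues = []  # one check's own issues array, never reset between finds

%w[A B C].each do |name|
  check_issues << Issue.new(name)
  # concat runs after every find, so all earlier issues are appended again
  problems.concat(check_issues)
end

amplified = problems.map(&:name)  # ["A", "A", "B", "A", "B", "C"]
```

With n issues the problems array ends up holding n(n+1)/2 entries, which is why a handful of real issues grows so quickly.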

You can see this in the "before" output files in the test repo, compared to the patched output.

This is happening in any place where .concat is used to collect issues from one array into the larger array for later printing, both in the internal page file-exist checks and in the local hash checker. In most places, either the individual check's internal array isn't being reset between runs, or the concat is being run inside a loop rather than after it.
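As a rough illustration of those two patterns, here is a sketch with hypothetical method names (none of this is the project's code). The first method shows the amplifying shape; the other two show what moving the concat after the loop, or resetting the internal array, would look like in isolation.

```ruby
# All names below are hypothetical; this only illustrates the pattern.

# Amplifying shape: concat inside the loop, internal array never reset.
def collect_inside_loop(found)
  problems = []
  issues = []
  found.each do |f|
    issues << f
    problems.concat(issues)  # re-adds every earlier issue on each pass
  end
  problems
end

# Alternative 1: run the concat once, after the loop.
def collect_after_loop(found)
  problems = []
  issues = []
  found.each { |f| issues << f }
  problems.concat(issues)
  problems
end

# Alternative 2: keep the concat in the loop but reset the internal array.
def collect_with_reset(found)
  problems = []
  issues = []
  found.each do |f|
    issues << f
    problems.concat(issues)
    issues.clear  # the next concat only adds newly found issues
  end
  problems
end

inside = collect_inside_loop(%w[A B C])
after  = collect_after_loop(%w[A B C])
reset  = collect_with_reset(%w[A B C])
```

Whether either alternative fits the real checks depends on how those arrays are used elsewhere, which this sketch does not model.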

I will say that this is only a mitigation patch. It does not fix the core problem of the arrays being amplified onto themselves. I will leave that cleanup to someone more comfortable with the code to make larger structural and logic changes. I did try my hand at this, but ended up breaking more tests than I was fixing, and settled on this quick win to at least dampen the noise.

removes duplicate issue objects added from self stacking .concat calls
@gjtorikian
Owner

Thanks for this

@gjtorikian gjtorikian merged commit 593357c into gjtorikian:main Dec 17, 2021