Optimize extractLinksFromPage by splitting words instead of lines #414

Merged
NGTmeaty merged 4 commits into main from scanner-split-by-word on Aug 5, 2025
@yzqzss yzqzss (Collaborator) commented Aug 5, 2025

fix: #413


Also made it a little faster.

before: BenchmarkExtractLinksFromPageRelax-12  465  2701707 ns/op  58202 KiB/s  369388 B/op     1311 allocs/op
after:  BenchmarkExtractLinksFromPageRelax-12  632  1941306 ns/op  80999 KiB/s  390980 B/op    15188 allocs/op

@yzqzss yzqzss added the bug Something isn't working label Aug 5, 2025
@codecov-commenter codecov-commenter commented Aug 5, 2025

Codecov Report

❌ Patch coverage is 73.33333% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.21%. Comparing base (dc1cc5b) to head (cabfe3d).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/pkg/utils/gzip.go | 57.89% | 5 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #414      +/-   ##
==========================================
+ Coverage   55.17%   55.21%   +0.04%     
==========================================
  Files         117      118       +1     
  Lines        7257     7282      +25     
==========================================
+ Hits         4004     4021      +17     
- Misses       2935     2940       +5     
- Partials      318      321       +3     
| Flag | Coverage Δ |
|---|---|
| e2etests | 37.52% <26.66%> (-0.08%) ⬇️ |
| unittests | 31.26% <73.33%> (+0.12%) ⬆️ |

@yzqzss yzqzss requested a review from NGTmeaty August 5, 2025 08:44
@NGTmeaty NGTmeaty previously approved these changes Aug 5, 2025

@NGTmeaty NGTmeaty (Collaborator) left a comment

Looks great! Thank you for the quick changes! Do you think it would be worthwhile to add a test here?

@yzqzss yzqzss (Collaborator, Author) commented Aug 5, 2025

image

Give me a moment to gzip the HTML.

@yzqzss yzqzss force-pushed the scanner-split-by-word branch from c6d6f64 to 42c9d8b Compare August 5, 2025 11:24
@NGTmeaty NGTmeaty (Collaborator) left a comment

Thanks!!

@NGTmeaty NGTmeaty merged commit 66dd982 into main Aug 5, 2025
5 checks passed
@NGTmeaty NGTmeaty deleted the scanner-split-by-word branch August 5, 2025 17:58
@yzqzss yzqzss added the GSoC label Aug 30, 2025
Linked issue: bufio.Scanner does not have large enough buffer for outlinks parsing