
Optionally send outlinks to 2nd HQ project #465

Merged
NGTmeaty merged 5 commits into internetarchive:main from vbanos:outlinks-other-project on Sep 11, 2025

Conversation

Collaborator

@vbanos vbanos commented Sep 4, 2025


Add new options `--hq-outlinks-project` and `--hq-outlinks-hop-limit`.

Send outlinks with hops >= `hq-outlinks-hop-limit` to the
`hq-outlinks-project` via the special `hqOutlinksProduceChan` channel.
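The routing rule described above can be sketched in Go. This is a minimal illustration, not the PR's code: `Item`, `routeOutlinks` and the two channels are hypothetical stand-ins for `*models.Item`, the postprocessor logic and `hqOutlinksProduceChan`. The sketch also assumes that leaving `--hq-outlinks-project` empty disables the routing, since the option is described as optional.

```go
package main

import "fmt"

// Item is a hypothetical stand-in for Zeno's *models.Item, reduced to the
// fields this routing decision needs.
type Item struct {
	URL string
	Hop int
}

// routeOutlinks sends each outlink to outlinksChan when an outlinks project
// is configured and the outlink's hop count is >= hopLimit. Every other
// outlink goes to mainChan. Both channels need spare buffer or an active
// reader, otherwise the sends block.
func routeOutlinks(outlinks []*Item, outlinksProject string, hopLimit int,
	mainChan, outlinksChan chan<- *Item) {
	for _, o := range outlinks {
		if outlinksProject != "" && o.Hop >= hopLimit {
			outlinksChan <- o
			continue
		}
		mainChan <- o
	}
}

func main() {
	mainChan := make(chan *Item, 8)
	outlinksChan := make(chan *Item, 8)
	outlinks := []*Item{
		{URL: "https://example.com/a", Hop: 1},
		{URL: "https://example.com/b", Hop: 2},
		{URL: "https://example.com/c", Hop: 3},
	}
	// Example values: outlinks project "vangelis_test", hop limit 2.
	routeOutlinks(outlinks, "vangelis_test", 2, mainChan, outlinksChan)
	close(mainChan)
	close(outlinksChan)
	for o := range mainChan {
		fmt.Println("main project:", o.URL)
	}
	for o := range outlinksChan {
		fmt.Println("outlinks project:", o.URL)
	}
}
```

With these example values, `/a` (hop 1) stays in the main project, while `/b` and `/c` (hops 2 and 3) go to the outlinks project.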

codecov-commenter commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 33.33333% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.45%. Comparing base (24c2d0a) to head (ccf5dec).
⚠️ Report is 15 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/pkg/source/hq/hq.go | 0.00% | 20 Missing ⚠️ |
| internal/pkg/postprocessor/postprocessor.go | 52.17% | 10 Missing and 1 partial ⚠️ |
| internal/pkg/controler/pipeline.go | 36.36% | 5 Missing and 2 partials ⚠️ |
| internal/pkg/source/hq/producer.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #465      +/-   ##
==========================================
+ Coverage   55.33%   56.45%   +1.12%     
==========================================
  Files         128      130       +2     
  Lines        7939     8091     +152     
==========================================
+ Hits         4393     4568     +175     
+ Misses       3183     3157      -26     
- Partials      363      366       +3     
```
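The percentages in this report can be re-derived from the hit, miss and partial counts. Below is a minimal sketch. It assumes Codecov computes coverage as hits / (hits + misses + partials), rounded down to two decimals. That is our reading of Codecov's defaults and is not stated in this PR. The patch figures use 20 hits, 37 misses and 3 partials. These are inferred from the per-file table: 40 lines missing coverage (37 missing plus 3 partial) at 33.33% implies 60 patch lines.

```go
package main

import "fmt"

// coveragePct returns hits / (hits + misses + partials) as a percentage
// string with two decimals, rounded down. The ratio and rounding mode are
// assumptions about Codecov's defaults, not Codecov's own code.
func coveragePct(hits, misses, partials int) string {
	lines := hits + misses + partials
	if lines == 0 {
		return "0.00"
	}
	// Basis points (hundredths of a percent); integer division rounds down.
	bp := hits * 10000 / lines
	return fmt.Sprintf("%d.%02d", bp/100, bp%100)
}

func main() {
	fmt.Println("main: ", coveragePct(4393, 3183, 363)) // prints "main:  55.33"
	fmt.Println("#465: ", coveragePct(4568, 3157, 366)) // prints "#465:  56.45"
	fmt.Println("patch:", coveragePct(20, 37, 3))      // prints "patch: 33.33"
}
```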
| Flag | Coverage Δ |
| --- | --- |
| e2etests | 40.74% <33.33%> (+1.12%) ⬆️ |
| unittests | 29.33% <0.00%> (+0.95%) ⬆️ |

Flags with carried forward coverage won't be shown.


Collaborator Author

vbanos commented Sep 4, 2025

Identifying which outlinks to send to hq-outlinks-project works correctly, and p.sendToHQOutlinks works great.
The problem is that the second HQ client never delivers those outlinks to the HQ project; something is wrong there.

We init the 2nd HQ client like this:

```go
hqOutlinks = hq.New(config.Get().HQKey, config.Get().HQSecret, config.Get().HQOutlinksProject, config.Get().HQAddress)
hqOutlinks.Start(hqOutlinksFinishChan, hqOutlinksProduceChan)
```

postprocessor.sendToHQOutlinks writes the outlinks (*models.Item) to hqOutlinksProduceChan, but the hq component logs this error:

```
2025-09-04T20:24:32Z INFO  postprocessor.g:138 | sending outlink to HQ component=postprocessor project=vangelis_test hop=1
2025-09-04T20:24:32Z INFO  postprocessor.g:138 | sending outlink to HQ component=postprocessor project=vangelis_test hop=1
2025-09-04T20:24:45Z ERROR websocket.go:68     | error dispatching message by type component=hq msg_type= err=unexpected end of JSON input len=0
```

I don't understand why we get this error. We send *models.Item, but the payload the hq websocket code tries to send is empty.

I run Zeno with the following command, then add a URL to the vangelis HQ project and watch the activity.

```shell
./Zeno get hq --hq-project vangelis --hq-address http://hq.crawl1.archive.org --hq-key admin --hq-secret sss --hq-outlinks-project vangelis_test --hq-outlinks-hop-limit 1 --max-hops 1
```

Collaborator

NGTmeaty commented Sep 5, 2025

I believe I have seen that websocket error in production as well. I meant to create an issue, but it shouldn't be affecting anything in this PR. I will take another look today.

@NGTmeaty NGTmeaty marked this pull request as draft September 5, 2025 07:10
Collaborator Author

vbanos commented Sep 8, 2025

I know what the problem is.
The 2nd HQ client isn't initialized: hq.Start returns ErrHQAlreadyInitialized.

We need two instances: one for the main source and one (optional) for sending outlinks to
a different project.
@vbanos vbanos changed the title from "WIP: Optionally send outlinks to 2nd HQ project" to "Optionally send outlinks to 2nd HQ project" Sep 8, 2025
Collaborator Author

vbanos commented Sep 8, 2025

I have resolved the issues; this is ready for review / merge.

BTW, as far as I know, the following error has no noticeable effect; everything works despite it:

```
2025-09-04T20:24:45Z ERROR websocket.go:68 | error dispatching message by type component=hq msg_type= err=unexpected end of JSON input len=0
```
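The `len=0` and `err=unexpected end of JSON input` fields in that log line match the error Go's `encoding/json` returns when `json.Unmarshal` is given an empty byte slice. This suggests the dispatcher was handed an empty payload rather than a malformed one. Below is a minimal sketch of skipping such payloads, assuming the handler decodes each message with `json.Unmarshal`. `message` and `dispatch` are hypothetical names, not the code in websocket.go, and the `"example"` type value is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message is a hypothetical envelope; HQ's real websocket message format
// is not shown in this PR.
type message struct {
	Type string `json:"type"`
}

// dispatch decodes a raw payload and returns its message type. Empty
// payloads are skipped instead of being reported as decode errors.
func dispatch(payload []byte) (string, error) {
	if len(payload) == 0 {
		return "", nil // nothing to dispatch
	}
	var m message
	if err := json.Unmarshal(payload, &m); err != nil {
		return "", err
	}
	return m.Type, nil
}

func main() {
	var m message
	fmt.Println(json.Unmarshal([]byte{}, &m)) // prints "unexpected end of JSON input"

	t, err := dispatch(nil)
	fmt.Printf("%q %v\n", t, err) // prints `"" <nil>`

	t, err = dispatch([]byte(`{"type":"example"}`))
	fmt.Printf("%q %v\n", t, err) // prints `"example" <nil>`
}
```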

@vbanos vbanos self-assigned this Sep 8, 2025
@vbanos vbanos marked this pull request as ready for review September 8, 2025 19:40
@NGTmeaty NGTmeaty requested a review from CorentinB September 11, 2025 09:33
NGTmeaty previously approved these changes Sep 11, 2025
CorentinB previously approved these changes Sep 11, 2025
Collaborator

@CorentinB CorentinB left a comment


It all looks good to me, but if I may: I think it is a problem that --hq-outlinks-hop-limit is not explicitly described and explained. It is unclear what it does.

@vbanos vbanos dismissed stale reviews from CorentinB and NGTmeaty via 3d3c90e September 11, 2025 18:34
Collaborator Author

vbanos commented Sep 11, 2025

@CorentinB thank you for the suggestion, I added an extra comment for hq-outlinks-hop-limit.

Collaborator

> @CorentinB thank you for the suggestion, I added an extra comment for hq-outlinks-hop-limit.

Thanks. Can it be added to the command itself, so that it's explained when someone runs -h?

Collaborator Author

vbanos commented Sep 11, 2025

Yes, why not. Done!
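The request here is for the hop-limit behaviour to appear in the `-h` output. Below is a minimal sketch of what that looks like. It uses Go's standard `flag` package so that it is self-contained, but Zeno's CLI may use a different flag library. The help strings paraphrase the PR description, and the default of `0` is a placeholder, not Zeno's actual default.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// newFlagSet declares the two options from this PR with help text, so that
// -h explains them. Names match the PR; defaults are placeholders.
func newFlagSet() (*flag.FlagSet, *string, *int) {
	fs := flag.NewFlagSet("get hq", flag.ContinueOnError)
	project := fs.String("hq-outlinks-project", "",
		"optional HQ project that receives outlinks selected by --hq-outlinks-hop-limit")
	hopLimit := fs.Int("hq-outlinks-hop-limit", 0,
		"outlinks with hops >= this value are sent to --hq-outlinks-project instead of the main project")
	return fs, project, hopLimit
}

func main() {
	fs, project, hopLimit := newFlagSet()
	fs.SetOutput(os.Stdout)
	args := []string{"--hq-outlinks-project", "vangelis_test", "--hq-outlinks-hop-limit", "1"}
	if err := fs.Parse(args); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(*project, *hopLimit) // prints "vangelis_test 1"
	fs.PrintDefaults()               // the help text that -h would show
}
```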

Collaborator

@NGTmeaty NGTmeaty left a comment


Thank you both!

@NGTmeaty NGTmeaty merged commit 2d80002 into internetarchive:main Sep 11, 2025
2 checks passed
NGTmeaty added a commit that referenced this pull request Sep 28, 2025
@willmhowes willmhowes mentioned this pull request Sep 29, 2025
NGTmeaty added a commit that referenced this pull request Sep 29, 2025

Labels

None yet

Projects

None yet

4 participants