
Downloading the surrounding web pages #517

Answered by ato
krojtous asked this question in Q&A

My understanding is that the "trans" in maxTransHops stands for transclusion (what people often call "embedding"), so it applies to stylesheets, images, etc., not to normal links.
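To make the distinction concrete, here is a simplified Python model of how a transclusion-hop limit can be checked against a URI's hop path, which is the string of hop-type letters from the seed to the URI. It assumes that "L" marks an ordinary navigational link and that any other letter, such as "E" for an embed, counts as a transclusion hop. The function names and the default limit of 2 are illustrative assumptions. This is not Heritrix's actual implementation.

```python
def trailing_transclusion_hops(hop_path: str) -> int:
    """Count the hops at the end of a hop path that come after the last 'L'.

    Assumption: 'L' marks a navigational link; every other letter
    (e.g. 'E' for an embed) is treated as a transclusion hop.
    """
    count = 0
    for hop in reversed(hop_path):
        if hop == "L":
            break
        count += 1
    return count


def is_acceptable_transclusion(hop_path: str, max_trans_hops: int = 2) -> bool:
    """Hypothetical check: True only for URIs reached by 1..max_trans_hops
    trailing embed-style hops. A URI whose last hop is a plain link
    returns False, i.e. a limit like this never pulls it into scope."""
    trans = trailing_transclusion_hops(hop_path)
    return 0 < trans <= max_trans_hops


if __name__ == "__main__":
    print(is_acceptable_transclusion("LLE"))    # True: image embedded in a linked page
    print(is_acceptable_transclusion("LLEE"))   # True: two trailing embeds
    print(is_acceptable_transclusion("LLEEE"))  # False: three trailing embeds, limit is 2
    print(is_acceptable_transclusion("LLL"))    # False: ordinary navigational link
```

The last case is the point of the question. A page reached by an ordinary link has zero trailing transclusion hops, so raising a limit like this does not bring linked pages into scope.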

Is there a setting that makes the crawler also download all the web pages linked from the crawled website?

I believe setting the alsoCheckVia property to true on the acceptSurts (SurtPrefixedDecideRule) bean does this. The documentation describes it as:

Whether to also make the configured decision if a URI's 'via' URI (the
URI from which it was discovered) in SURT form begins with any of the
established prefixes. For example, can be used to ACCEPT URIs that are
'one hop off' URIs fitting the SURT prefixes. Default is false.
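As a rough illustration of that decision logic, here is a minimal Python model. It is not Heritrix's Java code. It assumes that the URI, its via URI, and the configured prefixes are already in SURT form. The names `surt_prefixed_decision` and `has_surt_prefix`, and the string return values, are hypothetical.

```python
from typing import Iterable, Optional


def has_surt_prefix(surt: str, prefixes: Iterable[str]) -> bool:
    """True if the SURT-form URI starts with any configured prefix."""
    return any(surt.startswith(prefix) for prefix in prefixes)


def surt_prefixed_decision(
    uri_surt: str,
    via_surt: Optional[str],
    prefixes: Iterable[str],
    also_check_via: bool = False,
) -> str:
    """Simplified model of a SURT-prefix ACCEPT rule.

    Returns "ACCEPT" if the URI itself matches a prefix, or, when
    also_check_via is set, if the URI it was discovered from matches.
    Otherwise returns "NONE" (the rule expresses no opinion).
    """
    prefixes = list(prefixes)  # allow any iterable to be checked twice
    if has_surt_prefix(uri_surt, prefixes):
        return "ACCEPT"
    if also_check_via and via_surt is not None and has_surt_prefix(via_surt, prefixes):
        return "ACCEPT"
    return "NONE"


if __name__ == "__main__":
    prefixes = ["http://(org,example,"]
    on_site = "http://(org,example,www,)/links.html"
    off_site = "http://(com,other,www,)/article.html"
    two_hops_off = "http://(net,third,)/page.html"

    # Off-site page linked from the seed site, flag off: not accepted by this rule.
    print(surt_prefixed_decision(off_site, on_site, prefixes))  # NONE
    # Same page with the flag on: accepted because its via URI is on-site.
    print(surt_prefixed_decision(off_site, on_site, prefixes, also_check_via=True))  # ACCEPT
    # A page linked from the off-site page is two hops off: still not accepted.
    print(surt_prefixed_decision(two_hops_off, off_site, prefixes, also_check_via=True))  # NONE
```

The third case shows why the documentation describes this as "one hop off." Only the via URI is checked, not the whole discovery chain, so the crawl does not spread further than one link beyond the prefixed sites.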

Replies: 2 comments

Answer selected by ato

This discussion was converted from issue #380 on September 30, 2022 00:45.