Add option to override spider configuration when starting a run #11
This PR adds the option of overriding some or all of a spider’s configuration at runtime.

To do so, we can now pass an optional `$overrides` parameter to the `Roach::startSpider` function. This parameter takes an instance of `RoachPHP\Spider\Configuration\Overrides`, which will get merged with the spider’s own configuration.

Rationale
While having all configuration defined inside the spider class itself is certainly convenient, we’re essentially hard-coding everything about a spider. For example, before this PR, it would not have been possible to dynamically pass different start URLs to a spider. With this PR, we could now accept the start URL through the UI, load it from the database or conjure it up some other way. We can then start a run of a spider using that URL.

Note that this example assumes that the parsing logic you have written works for all of these dynamic URLs. Validating the start URLs is still up to you.
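
As a rough sketch of what this could look like, the helper below starts a run against a URL that is only known at runtime. It assumes that `roach-php/core` is installed and that `Overrides` accepts a `startUrls` named argument. `startRunFor` is a hypothetical helper, not part of this PR, and the `filter_var` check is only a basic sanity test, not a substitute for real validation.

```php
<?php

declare(strict_types=1);

use RoachPHP\Roach;
use RoachPHP\Spider\Configuration\Overrides;

/**
 * Hypothetical helper: starts a run of the given spider against a start URL
 * that is only known at runtime (user input, database row, ...).
 *
 * @param class-string $spiderClass
 */
function startRunFor(string $spiderClass, string $startUrl): void
{
    // Validating the URL is up to the caller; reject obviously malformed input
    // before handing it to the spider.
    if (filter_var($startUrl, FILTER_VALIDATE_URL) === false) {
        throw new InvalidArgumentException("Invalid start URL: {$startUrl}");
    }

    // Assumption: Overrides exposes a `startUrls` named argument.
    Roach::startSpider($spiderClass, new Overrides(startUrls: [$startUrl]));
}
```

A call might then look like `startRunFor(MySpider::class, $urlFromRequest);`, where `MySpider` and `$urlFromRequest` are placeholders.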
The same holds true for all the other spider configuration values as well. Maybe we want to quickly fire off a test run against a specific URL without having to change the actual spider class itself. We could override the `concurrency` and `requestDelay` parameters accordingly so as not to hammer the server with requests. We could also register the `MaxRequestExtension` for this run to ensure that we stop the run after the first request.

Note that an overridden value replaces the spider’s corresponding configuration value entirely; it does not get merged with it. In the test run described above, we have to make sure to also register the `LoggerExtension` for the run, even if the spider’s own configuration already registers it.
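
The test run described above could be sketched as follows. This assumes that `Overrides` accepts `startUrls`, `extensions`, `concurrency` and `requestDelay` as named arguments, that extension options are passed as a `[class, options]` pair, and that `MaxRequestExtension` takes a `limit` option. The helpers `testRunExtensions` and `startTestRun` and the delay value are illustrative only.

```php
<?php

declare(strict_types=1);

use RoachPHP\Extensions\LoggerExtension;
use RoachPHP\Extensions\MaxRequestExtension;
use RoachPHP\Roach;
use RoachPHP\Spider\Configuration\Overrides;

/**
 * Hypothetical helper: the extensions for a one-request test run.
 *
 * Overridden values replace the spider's own, so LoggerExtension is listed
 * again here even if the spider class already registers it.
 */
function testRunExtensions(): array
{
    return [
        LoggerExtension::class,
        // Assumption: options are given as [class, options], with a `limit` option.
        [MaxRequestExtension::class, ['limit' => 1]],
    ];
}

/**
 * Hypothetical helper: starts a gentle, single-request test run.
 *
 * @param class-string $spiderClass
 */
function startTestRun(string $spiderClass, string $url): void
{
    Roach::startSpider(
        $spiderClass,
        new Overrides(
            startUrls: [$url],
            extensions: testRunExtensions(),
            concurrency: 1,   // one request at a time
            requestDelay: 2,  // assumed to be in seconds
        ),
    );
}
```

Any configuration value left out of `Overrides` should keep whatever the spider class defines; only the values passed explicitly are replaced.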