Skip to content
This repository has been archived by the owner on Mar 7, 2021. It is now read-only.

Find resources in <meta> redirects #216

Merged
merged 1 commit into from
Feb 14, 2016

Conversation

fredrikekelund
Copy link
Collaborator

We only discover those resources, we don't treat them as an actual
redirect (eg. emit a "fetchredirect" event). Fixes #190

// If the RegExp doesn't contain a global flag, we expect to find the
// resources in the capture groups, so we strip away the match as a
// whole (ie. the first entry in the matches array).
if (regex.toString().split("/").slice(-1)[0].indexOf("g") === -1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoah, this is meta. I wonder if there's a simpler way to do this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the entire split and slice part, there's https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/flags, but that's not supported in node.

But I just realized that to avoid this entire logic (which I admit might not be the most solid) I could actually just wrap the entire regexp in a non-capturing group! I'll fix that later today

We only discover those resources, we don't treat them as an actual
redirect (eg. emit a "fetchredirect" event). Fixes simplecrawler#190
@fredrikekelund
Copy link
Collaborator Author

I went for a slightly more generalized solution: allowing functions in the discoverRegex array. What do you think about that, @cgiffard?

@cgiffard
Copy link
Member

Looks good. In future I think a good philosophy might be to make a simpler but more general regex, which might hit false positives, but which should be validated and filtered out later. :)

cgiffard added a commit that referenced this pull request Feb 14, 2016
@cgiffard cgiffard merged commit f487afd into simplecrawler:master Feb 14, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants