Deny path *.<extension> blocking all indexing for a site #32

Closed
m-i-l opened this issue Jan 29, 2021 · 3 comments
Labels: bug (Something isn't working)

Comments

m-i-l commented Jan 29, 2021

A user entered a deny path of *.xml, which seems a reasonable way to express "don't index any XML files". However, the deny parameter on LinkExtractor interprets this in such a way that nothing on the site is indexed at all. I think this is because the regex is interpreted as *. which means everything, although escaping it with \*. doesn't appear to resolve the issue.
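
For reference, here is a minimal sketch of how Python's re module treats these patterns on its own. It is standalone and does not reproduce the crawler or LinkExtractor, so it doesn't establish what happened inside the indexer. The helper name `compiles` and the example URL are made up for illustration.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


url = "https://example.com/sitemap.xml"  # hypothetical URL, not from the issue

# A leading '*' is a quantifier with nothing to repeat, so the glob
# is not a valid regular expression on its own.
glob_is_valid = compiles("*.xml")

# Escaping the asterisk compiles, but the pattern then requires a
# literal '*' character somewhere in the URL.
escaped_matches = re.search(r"\*.xml", url) is not None

# An end-anchored extension pattern matches only URLs ending in .xml.
anchored_matches = re.search(r"\.xml$", url) is not None
```

The point is that a glob such as *.xml and a regex are different notations, so a deny path written as a glob has to be translated before it is used as a regex.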

m-i-l added the bug label Jan 29, 2021
m-i-l self-assigned this Jan 29, 2021

m-i-l commented Jan 29, 2021

For now I'm catching and replacing *.xml with .xml$ which works, although it might need a more robust solution.

m-i-l closed this as completed Jan 29, 2021
m-i-l changed the title from "Deny path *.xml blocking all indexing for a site" to "Deny path *.<extension> blocking all indexing for a site" Feb 17, 2021
m-i-l reopened this Feb 17, 2021

m-i-l commented Feb 17, 2021

Another user has set deny paths including '*.json' and '*.atom' which has blocked indexing again. Reopening. Going to need a more robust solution.

m-i-l added a commit that referenced this issue Feb 19, 2021

m-i-l commented Feb 19, 2021

Implemented a more robust solution using the regex exclusion_value = re.sub('^\*\.(\w+)$', r'.\1$', exclusion_value) to replace '*.xml' with '.xml$', '*.json' with '.json$', etc.
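
A minimal sketch of that substitution in isolation, assuming the deny path arrives as a plain string. The function name `normalise_deny_path` and the sample URLs are hypothetical, and the pattern is written as a raw string here to avoid invalid-escape warnings; the substitution itself is the one quoted above.

```python
import re


def normalise_deny_path(exclusion_value: str) -> str:
    """Rewrite a glob-style '*.ext' deny path as an end-anchored regex.

    Values that are not exactly '*.' followed by word characters are
    returned unchanged.
    """
    return re.sub(r'^\*\.(\w+)$', r'.\1$', exclusion_value)


xml_rule = normalise_deny_path('*.xml')        # becomes '.xml$'
json_rule = normalise_deny_path('*.json')      # becomes '.json$'
path_rule = normalise_deny_path('/private/')   # not a '*.ext' glob, left as-is

# The rewritten rule matches URLs that end in the extension and nothing else.
denied = re.search(xml_rule, 'https://example.com/sitemap.xml') is not None
allowed = re.search(xml_rule, 'https://example.com/about/') is None
```

Note that the leading dot in the rewritten rule is left unescaped, so it matches any single character before the extension. Multi-part extensions such as '*.tar.gz' do not match the pattern and pass through unchanged.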
