Skip to content
This repository has been archived by the owner on Aug 26, 2022. It is now read-only.

bug 1199166 - block_user_agents decorator for views #3452

Merged
merged 1 commit into from Sep 3, 2015

Conversation

groovecoder
Copy link
Contributor

Not sure this settings.BLOCKABLE_USER_AGENTS approach is the best one. Need some r? on this overall approach ...

def block_user_agents(view_func):
blockable_user_agents = []
if hasattr(settings, 'BLOCKABLE_USER_AGENTS'):
blockable_user_agents = settings.BLOCKABLE_USER_AGENTS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those three lines can be written as blockable_user_agents = getattr(settings, 'BLOCKABLE_USER_AGENTS', [])

@groovecoder groovecoder force-pushed the block-user-agents-1199166 branch 2 times, most recently from 45dc262 to 7910b54 Compare August 31, 2015 16:18
@groovecoder
Copy link
Contributor Author

Updated.

@groovecoder
Copy link
Contributor Author

Oh, still needs tests ...

@groovecoder
Copy link
Contributor Author

Adding tests with real user agent values showed me that we need to use pattern.search to catch the user agent strings. This is ready for final review & merge.

@groovecoder
Copy link
Contributor Author

@jwhitlock - can you take this one now?

@jwhitlock
Copy link
Contributor

Is kumascript blocked by this? I'm pretty sure document.get_children and document.as_json are used by KS macros.

@groovecoder
Copy link
Contributor Author

Not likely but it could potentially block KumaScript if it does something crazy like send 'curl' or 'get' as the user agent string. I'll verify before merging & deploying. (I'll also add YandexBot to the blockable agents)

@groovecoder groovecoder assigned groovecoder and unassigned jwhitlock Sep 2, 2015
@@ -12,6 +13,7 @@
from ..queries import MultiQuerySet


@block_user_agents
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -47,6 +49,7 @@ def documents(request, category=None, tag=None):
return render(request, 'wiki/list/documents.html', context)


@block_user_agents
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I didn't miss this one - it's in robots.txt already)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😛

@jwhitlock
Copy link
Contributor

+r on code, with possible additions to robots.txt

@groovecoder groovecoder force-pushed the block-user-agents-1199166 branch 2 times, most recently from 5562f4b to 0804f5d Compare September 3, 2015 13:56
@jwhitlock
Copy link
Contributor

+r on robots.txt additions. Remember to confirm kumascript before it goes to production.

@groovecoder
Copy link
Contributor Author

  • Added the Disallow entries to robots.txt.
  • Added YandexBot to BLOCKABLE_USER_AGENTS since it triggered a db-heavy transaction yesterday.

Yup; still need to verify the kumascript user agent sent to Apache/django.

@groovecoder
Copy link
Contributor Author

After looking thru access logs for other crawling user agents, I also added:

"Baiduspider",
"CCBot",
"ScoutJet",

@groovecoder
Copy link
Contributor Author

I verified locally that KumaScript requests are sent to Apache without a blocked user agent string:

developer-local.allizom.org:443 127.0.0.1 - - [03/Sep/2015:16:57:06 +0000] "GET /en-US/docs/en-US/Template:dekiscript:page?raw=1 HTTP/1.1" 404 3947 "-" "-"

And cyliang sees the internal requests to production Apache are sent similarly - i.e., with "-" user agent:

 - - [03/Sep/2015:09:30:04 -0700] "GET /en-US/docs/MDN/Kuma/API?raw=1 HTTP/1.1" 200 6722 "-" "-"

So, this is good for merge + deploy.

@groovecoder groovecoder assigned jwhitlock and unassigned groovecoder Sep 3, 2015
jwhitlock added a commit that referenced this pull request Sep 3, 2015
bug 1199166 - block_user_agents decorator for views
@jwhitlock jwhitlock merged commit f10381a into master Sep 3, 2015
@groovecoder groovecoder deleted the block-user-agents-1199166 branch September 14, 2015 23:58
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants