bug 1199166 - block_user_agents decorator for views #3452
```python
def block_user_agents(view_func):
    blockable_user_agents = []
    if hasattr(settings, 'BLOCKABLE_USER_AGENTS'):
        blockable_user_agents = settings.BLOCKABLE_USER_AGENTS
```
Those three lines can be written as `blockable_user_agents = getattr(settings, 'BLOCKABLE_USER_AGENTS', [])`.
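For reference, here's a minimal, self-contained sketch of the whole decorator with the `getattr` suggestion applied. To run without Django it uses stand-ins (`settings` as a `SimpleNamespace`, plus `FakeRequest` and `FakeResponse`) that are assumptions, not Kuma code. The case-insensitive substring match and the 403 response are also assumed behavior, since the PR's actual matching and response aren't shown here.

```python
import functools
from types import SimpleNamespace

# Stand-ins so the sketch runs without Django (assumptions, not Kuma code):
# `settings` mimics django.conf.settings; the entries are hypothetical.
settings = SimpleNamespace(BLOCKABLE_USER_AGENTS=['curl', 'wget'])


class FakeRequest:
    """Minimal request: Django keeps the header in META['HTTP_USER_AGENT']."""
    def __init__(self, user_agent=''):
        self.META = {'HTTP_USER_AGENT': user_agent}


class FakeResponse:
    """Minimal response carrying a body and an HTTP status code."""
    def __init__(self, body='', status_code=200):
        self.body = body
        self.status_code = status_code


def block_user_agents(view_func):
    # One getattr() with a default replaces the hasattr()/assignment pair.
    blockable_user_agents = getattr(settings, 'BLOCKABLE_USER_AGENTS', [])

    @functools.wraps(view_func)
    def wrapped(request, *args, **kwargs):
        user_agent = request.META.get('HTTP_USER_AGENT', '').lower()
        for blocked in blockable_user_agents:
            if blocked.lower() in user_agent:
                # Assumed behavior: refuse the request with 403 Forbidden.
                return FakeResponse('Forbidden', status_code=403)
        return view_func(request, *args, **kwargs)

    return wrapped


@block_user_agents
def documents(request, category=None, tag=None):
    # Same signature as the documents view in this PR; the body is made up.
    return FakeResponse('documents: %s %s' % (category, tag))
```

With this sketch, `documents(FakeRequest('curl/7.43.0')).status_code` is 403, while a browser UA reaches the view. Because the blocklist is read when the decorator runs (at import time), changing the setting later, for example with Django's `override_settings` in a test, won't affect already-decorated views unless the `getattr` moves inside `wrapped`.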
Branch updated from 45dc262 to 7910b54.
Updated.
Oh, this still needs tests...
Branch updated from 7910b54 to cf2cea5.
Adding tests with real user agent values showed me that we need to use
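The end of that comment is cut off, so exactly what had to change isn't recoverable here. One pitfall that real UA strings do expose is capitalization: curl identifies itself as `curl/7.43.0`, but Wget sends `Wget/1.16 (linux-gnu)`, so a lowercase blocklist entry won't match with a plain `in` check. A small sketch with a hypothetical blocklist and realistic UA strings:

```python
# Hypothetical blocklist entries, written in lowercase.
BLOCKLIST = ['curl', 'wget']

# Realistic UA strings in the formats curl, Wget and Firefox send.
CURL_UA = 'curl/7.43.0'
WGET_UA = 'Wget/1.16 (linux-gnu)'
FIREFOX_UA = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'


def naive_match(user_agent, blocklist):
    # Case-sensitive: 'wget' never occurs in 'Wget/1.16 (linux-gnu)'.
    return any(entry in user_agent for entry in blocklist)


def case_insensitive_match(user_agent, blocklist):
    # Lowercase both sides so capitalization in the UA string doesn't matter.
    user_agent = user_agent.lower()
    return any(entry.lower() in user_agent for entry in blocklist)
```

Here `naive_match(WGET_UA, BLOCKLIST)` is `False` even though Wget is on the list, while `case_insensitive_match` catches it and still lets the Firefox string through.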
Branch updated from cf2cea5 to 5ffcc01.
@jwhitlock, can you take this one now?
Is KumaScript blocked by this? I'm pretty sure
Not likely, but it could potentially block KumaScript if it does something crazy like sending 'curl' or 'get' as the user agent string. I'll verify before merging and deploying. (I'll also add
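A quick way to run that check is to list which blocklist entries a candidate UA string would trip. The entries below are hypothetical (the real `BLOCKABLE_USER_AGENTS` values aren't shown in this thread), and the match is assumed to be a case-insensitive substring test. Note how a short entry like `'get'` also catches `Wget/1.16 (linux-gnu)` and any product name that merely contains those letters, such as the made-up `ExampleWidget/1.0`.

```python
# Hypothetical entries; the real BLOCKABLE_USER_AGENTS values aren't shown here.
BLOCKLIST = ['curl', 'get']


def matching_entries(user_agent, blocklist):
    """Return the blocklist entries found in user_agent (case-insensitive)."""
    user_agent = user_agent.lower()
    return [entry for entry in blocklist if entry.lower() in user_agent]
```

For example, `matching_entries('Wget/1.16 (linux-gnu)', BLOCKLIST)` returns `['get']`, and an ordinary Firefox UA returns `[]`. Running an internal client's UA through it before deploying shows at a glance whether it would be blocked.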
```diff
@@ -12,6 +13,7 @@
 from ..queries import MultiQuerySet


+@block_user_agents
```
Should we add https://developer.mozilla.org/en-US/docs/all and https://developer.mozilla.org/en-US/docs/tag/* to robots.txt as well?
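Python's standard-library `urllib.robotparser` can sanity-check proposed rules before they're deployed. The rules below are a hypothetical encoding of the two suggested URLs, not the lines actually added in this PR. The parser does plain prefix matching (no `*` wildcards), so `Disallow: /en-US/docs/tag/` already covers every tag page under that prefix; under prefix matching, other locales would need their own lines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for the two suggested URLs (not the PR's actual lines).
ROBOTS_TXT = """\
User-agent: *
Disallow: /en-US/docs/all
Disallow: /en-US/docs/tag/
"""

BASE = 'https://developer.mozilla.org'

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())
```

After parsing, `parser.can_fetch('*', BASE + '/en-US/docs/all')` is `False`, while an ordinary article URL such as `/en-US/docs/Web/HTML` stays crawlable.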
```diff
@@ -47,6 +49,7 @@ def documents(request, category=None, tag=None):
     return render(request, 'wiki/list/documents.html', context)


+@block_user_agents
```
(I didn't miss this one; it's already in robots.txt.)
😛
+r on code, with possible additions to robots.txt.
Branch updated from 5562f4b to 0804f5d.
+r on the robots.txt additions. Remember to confirm KumaScript before it goes to production.
Yup; I still need to verify the user agent KumaScript sends to Apache/Django.
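One way to see exactly which User-Agent a client sends is to point it at a throwaway local server that records the header. In this sketch Python's `urllib` plays the client (its default UA starts with `Python-urllib/`); checking KumaScript would mean pointing it at the recorder's URL instead, which is left as an assumption since its configuration isn't shown here. Proxies are disabled so the request really goes to 127.0.0.1.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UARecorder(BaseHTTPRequestHandler):
    def do_GET(self):
        # Store the User-Agent header exactly as the client sent it.
        self.server.seen_agents.append(self.headers.get('User-Agent'))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'ok')

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def record_user_agent(headers=None):
    """Send one GET to a throwaway local server; return the UA it received."""
    server = HTTPServer(('127.0.0.1', 0), UARecorder)  # port 0 = any free port
    server.seen_agents = []
    server.timeout = 5  # don't wait forever if no request arrives
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        url = 'http://127.0.0.1:%d/' % server.server_address[1]
        request = urllib.request.Request(url, headers=headers or {})
        # An empty ProxyHandler ignores any http_proxy environment settings.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request) as response:
            response.read()
    finally:
        worker.join()
        server.server_close()
    return server.seen_agents[-1]
```

`record_user_agent()` returns urllib's default UA, and `record_user_agent({'User-Agent': 'curl/7.43.0'})` returns `'curl/7.43.0'`, which is the kind of string the blocklist would then be checked against.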
Branch updated from 0804f5d to 7ef8e07.
After looking through the access logs for other crawling user agents, I also added:
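A minimal sketch of that kind of log review, assuming Apache's combined log format (where the User-Agent is the last quoted field): tally the UA strings and look at the most frequent ones. Lines that don't fit the pattern are skipped rather than guessed at.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes
# "referer" "user-agent". Only the final quoted field is captured.
COMBINED_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$'
)


def top_user_agents(lines, n=10):
    """Return the n most common User-Agent strings as (agent, count) pairs."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if match:  # skip lines in other formats or with escaped quotes
            counts[match.group('agent')] += 1
    return counts.most_common(n)
```

Since it only iterates lines, an open log file works directly, e.g. `with open(path) as log: print(top_user_agents(log))`, where `path` is a placeholder for wherever the logs live.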
I verified locally that KumaScript requests are sent to Apache without a blocked user agent string:
And cyliang sees that internal requests to production Apache are sent the same way, i.e., with
So this is good to merge and deploy.
bug 1199166 - block_user_agents decorator for views
Not sure this `settings.BLOCKABLE_USER_AGENTS` approach is the best one. Need some r? on this overall approach...