Require all traffic to provide a User-Agent header.
We want to be able to differentiate crawlers from each other, so I've
nudged them towards using a unique user agent. (We probably won't ever
block generic UAs, since folks sometimes do use curl/wget from the
command line.)

Additionally, I've had a lot of cases lately where a crawler has been
outside of what we allow but wasn't causing a service impact. If I could
contact those people without having to block their traffic, I would. So
I've also worded the message to nudge folks towards including contact
info, which most commercial bots already do.
sgrif committed Oct 23, 2018
1 parent 7fc9d43 commit 68f6824
Showing 6 changed files with 74 additions and 4 deletions.
2 changes: 2 additions & 0 deletions src/middleware/mod.rs
@@ -20,6 +20,7 @@ mod ember_index_rewrite;
 mod ensure_well_formed_500;
 mod head;
 mod log_request;
+mod require_user_agent;
 mod security_headers;
 mod static_or_continue;

@@ -85,6 +86,7 @@ pub fn build_middleware(app: Arc<App>, endpoints: R404) -> MiddlewareBuilder {
         let ips = ip_list.split(',').map(String::from).collect();
         m.around(blacklist_ips::BlockIps::new(ips));
     }
+    m.around(require_user_agent::RequireUserAgent::default());

     if env != Env::Test {
         m.around(log_request::LogRequests::default());
13 changes: 13 additions & 0 deletions src/middleware/no_user_agent_message.txt
@@ -0,0 +1,13 @@
We require that all requests include a `User-Agent` header. To allow us to determine the impact your bot has on our service, we ask that your user agent actually identify your bot, and not just report the HTTP client library you're using. Including contact information will also reduce the chance that we will need to take action against your bot.

Bad:
User-Agent: reqwest/0.9.1

Better:
User-Agent: my_crawler

Best:
User-Agent: my_crawler (my_crawler.com/info)
User-Agent: my_crawler (help@my_crawler.com)

If you believe you've received this message in error, please email help@crates.io and include the request id {}.
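
As a rough illustration of the "Better" and "Best" forms above, a client could assemble its header value along these lines. The helper name `bot_user_agent` and the example values are hypothetical and not part of this commit; how the value is attached to a request depends on the HTTP client in use.

```rust
/// Builds a User-Agent value of the form `name (contact)`.
///
/// `bot_user_agent` is a hypothetical helper for illustration only; it is
/// not part of crates.io or of any HTTP client library.
fn bot_user_agent(name: &str, contact: Option<&str>) -> String {
    match contact {
        // "Best": the bot's name plus a URL or email address to reach its operator.
        Some(contact) => format!("{} ({})", name, contact),
        // "Better": at least a name that identifies the bot itself.
        None => name.to_string(),
    }
}
```

Calling `bot_user_agent("my_crawler", Some("help@my_crawler.com"))` yields `my_crawler (help@my_crawler.com)`, matching the last example in the message.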
41 changes: 41 additions & 0 deletions src/middleware/require_user_agent.rs
@@ -0,0 +1,41 @@
//! Middleware that blocks requests with no user-agent header

use super::prelude::*;

use std::collections::HashMap;
use std::io::Cursor;
use util::request_header;

// Can't derive debug because of Handler.
#[allow(missing_debug_implementations)]
#[derive(Default)]
pub struct RequireUserAgent {
    handler: Option<Box<dyn Handler>>,
}

impl AroundMiddleware for RequireUserAgent {
    fn with_handler(&mut self, handler: Box<dyn Handler>) {
        self.handler = Some(handler);
    }
}

impl Handler for RequireUserAgent {
    fn call(&self, req: &mut dyn Request) -> Result<Response, Box<dyn Error + Send>> {
        let has_user_agent = request_header(req, "User-Agent") != "";
        if !has_user_agent {
            let body = format!(
                include_str!("no_user_agent_message.txt"),
                request_header(req, "X-Request-Id"),
            );
            let mut headers = HashMap::new();
            headers.insert("Content-Length".to_string(), vec![body.len().to_string()]);
            Ok(Response {
                status: (403, "Forbidden"),
                headers,
                body: Box::new(Cursor::new(body.into_bytes())),
            })
        } else {
            self.handler.as_ref().unwrap().call(req)
        }
    }
}
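
The decision this middleware makes can be sketched without the conduit types. The following is a simplified model, not the code from this commit: `SimpleResponse`, `MESSAGE_TEMPLATE`, and `reject_without_user_agent` are hypothetical names, headers are a plain case-sensitive `HashMap`, and a missing header is assumed to be treated the same as an empty one (as the `!= ""` comparison above suggests).

```rust
use std::collections::HashMap;

/// Stand-in for `no_user_agent_message.txt`; `{}` marks where the request id goes.
/// (Hypothetical shortened text, not the real message.)
const MESSAGE_TEMPLATE: &str = "A `User-Agent` header is required. Request id: {}.";

/// Hypothetical, simplified response type: just a status code and a body.
#[derive(Debug, PartialEq)]
struct SimpleResponse {
    status: u16,
    body: String,
}

/// Returns `Some(403 response)` when the `User-Agent` header is missing or
/// empty, and `None` when the request should be passed on to the next handler.
fn reject_without_user_agent(headers: &HashMap<String, String>) -> Option<SimpleResponse> {
    // Assumption: a missing header and an empty header are handled identically.
    let user_agent = headers.get("User-Agent").map(String::as_str).unwrap_or("");
    if !user_agent.is_empty() {
        return None;
    }

    // Include the request id so the sender can quote it when asking for help.
    let request_id = headers.get("X-Request-Id").map(String::as_str).unwrap_or("");
    Some(SimpleResponse {
        status: 403,
        body: MESSAGE_TEMPLATE.replacen("{}", request_id, 1),
    })
}
```

The sketch substitutes the request id at run time with `replacen`, whereas the commit uses `format!` with `include_str!`, which embeds the text file as a string literal at compile time so that its `{}` placeholder is checked by the compiler.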
4 changes: 3 additions & 1 deletion src/tests/all.rs
@@ -187,7 +187,9 @@ fn env(var: &str) -> String {
 }

 fn req(method: conduit::Method, path: &str) -> MockRequest {
-    MockRequest::new(method, path)
+    let mut request = MockRequest::new(method, path);
+    request.header("User-Agent", "conduit-test");
+    request
 }

 fn ok_resp(r: &conduit::Response) -> bool {
12 changes: 12 additions & 0 deletions src/tests/server.rs
@@ -0,0 +1,12 @@
use conduit::Method;

use {app, req};

#[test]
fn user_agent_is_required() {
    let (_b, _app, middle) = app();

    let mut req = req(Method::Get, "/api/v1/crates");
    req.header("User-Agent", "");
    bad_resp!(middle.call(&mut req));
}
6 changes: 3 additions & 3 deletions src/tests/util.rs
@@ -158,7 +158,7 @@ pub struct MockAnonymousUser {

 impl RequestHelper for MockAnonymousUser {
     fn request_builder(&self, method: Method, path: &str) -> MockRequest {
-        MockRequest::new(method, path)
+        ::req(method, path)
     }

     fn app(&self) -> &TestApp {
@@ -177,7 +177,7 @@ pub struct MockCookieUser {

 impl RequestHelper for MockCookieUser {
     fn request_builder(&self, method: Method, path: &str) -> MockRequest {
-        let mut request = MockRequest::new(method, path);
+        let mut request = ::req(method, path);
         request.mut_extensions().insert(self.user.clone());
         request
             .mut_extensions()
@@ -218,7 +218,7 @@ pub struct MockTokenUser {

 impl RequestHelper for MockTokenUser {
     fn request_builder(&self, method: Method, path: &str) -> MockRequest {
-        let mut request = MockRequest::new(method, path);
+        let mut request = ::req(method, path);
         request.header("Authorization", &self.token.token);
         request
     }
