refactor: move all rewrite handling logic into the rewrites module (#353)

* Move all rewrite handling logic into the rewrites module

* Improve handling of query strings

* Fix display of rewrite/redirect errors, show underlying errors again

* Document rewrites to a different virtual host

---------

Co-authored-by: Jose Quintana <1700322+joseluisq@users.noreply.github.com>
palant and joseluisq committed Apr 21, 2024
1 parent c33e8b7 commit 207fa4a
Showing 4 changed files with 385 additions and 149 deletions.
37 changes: 34 additions & 3 deletions docs/content/features/url-rewrites.md
@@ -1,4 +1,4 @@
# URL Rewrites
# URL Rewrites

**SWS** provides the ability to rewrite request URLs (routes) with Glob pattern-matching support.

@@ -28,8 +28,7 @@ The glob pattern functionality is powered by the [globset](https://docs.rs/globs

### Destination

The value can be either a local file path that maps to an existing file on the system or an external URL (URLs only in case of redirection).
It could look like `/some/directory/file.html`. It is worth noting that the `/` at the beginning indicates the server's root directory.
The value should be a relative or absolute URL. A relative URL could look like `/some/directory/file.html`. An absolute URL can be `https://external.example.com/` for example.

#### Replacements

@@ -41,6 +40,38 @@ Replacements order start from `0` to `n` and are defined with a dollar sign foll
When using replacements, also group your Glob patterns by surrounding them with curly braces so that every group maps to its corresponding replacement.<br>
For example: `source = "**/{*}.{png,gif}"`
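
The substitution step can be illustrated with a small standard-library sketch. It is not the SWS implementation (which matches with a regex and replaces via Aho-Corasick); `fill_placeholders` is a hypothetical helper, and reading the digits after `$` greedily is an assumption of this sketch. Index `0` holds the whole match and each later index holds one curly-brace group.

```rust
/// Replaces `$0`..`$n` placeholders in `dest` with the matching entries of `caps`.
/// Hypothetical helper for illustration only; not part of SWS.
fn fill_placeholders(dest: &str, caps: &[&str]) -> String {
    let mut out = String::with_capacity(dest.len());
    let mut rest = dest;
    while let Some(pos) = rest.find('$') {
        // Copy everything before the dollar sign unchanged.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // Assumption: the placeholder index is the full run of ASCII digits.
        let digits_len = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        let index = after[..digits_len].parse::<usize>().ok();
        match index.and_then(|i| caps.get(i)) {
            Some(cap) => out.push_str(cap),
            // Unknown or malformed placeholder: keep the original text.
            None => out.push_str(&rest[pos..pos + 1 + digits_len]),
        }
        rest = &after[digits_len..];
    }
    out.push_str(rest);
    out
}
```

With captures `["/images/logo.png", "logo", "png"]`, a destination of `/assets/$1.$2` would become `/assets/logo.png` in this sketch.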

#### Destination processing

How the destination is processed depends on whether the `redirect` key (see below) is present. If it is, SWS performs an *external* redirect: it sends a redirect response to the client, and the browser will usually proceed to the destination. A relative URL leads to another page on the same server, while an absolute URL can lead to another server.

Without a `redirect` key, SWS performs an *internal* redirect: it attempts to retrieve the file denoted by the destination and sends it to the client. An absolute URL can be specified here as well, but it is always processed by the same SWS instance. If a matching [virtual host](virtual-hosting.md) is present, however, the request is mapped to that virtual host.
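
The two cases can be modeled roughly as follows. This is a simplified sketch rather than the code in the `rewrites` module: the `Outcome` type and `process_destination` function are hypothetical names, and the URL splitting is deliberately naive.

```rust
/// Hypothetical model of what happens to a processed destination.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// External redirect: respond with a status code and a `Location` header.
    External { status: u16, location: String },
    /// Internal redirect: the same instance serves the destination,
    /// optionally under a different virtual host.
    Internal { host: Option<String>, path: String },
}

/// `redirect` mirrors the optional `redirect` key of a rewrite rule.
fn process_destination(dest: &str, redirect: Option<u16>) -> Outcome {
    match redirect {
        Some(status) => Outcome::External {
            status,
            location: dest.to_string(),
        },
        None => match dest.split_once("://") {
            // Absolute URL: split into host and path (naive, assumes no userinfo).
            Some((_scheme, rest)) => {
                let (host, path) = match rest.find('/') {
                    Some(i) => (&rest[..i], &rest[i..]),
                    None => (rest, "/"),
                };
                Outcome::Internal {
                    host: Some(host.to_string()),
                    path: path.to_string(),
                }
            }
            // Relative URL: stay on the current virtual host.
            None => Outcome::Internal {
                host: None,
                path: dest.to_string(),
            },
        },
    }
}
```

In this model, `http://internal.local/test/hi.txt` without a `redirect` key yields an internal outcome with host `internal.local` and path `/test/hi.txt`; nothing is sent to another server.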

#### Different roots within the same virtual host

Normally, different root directories are only possible with different virtual hosts. Rewrites, however, allow exposing another root in a subdirectory, for example. To do so, add an internal virtual host that isn't normally visible from outside, e.g. `internal.local`, then rewrite requests for the subdirectory to that internal virtual host. For example:

```toml
[general]
root = "/usr/srv/www"

[advanced]

[[advanced.rewrites]]
source = "/test/{**}"
destination = "http://internal.local/test/$1"

[[advanced.virtual-hosts]]
host = "internal.local"
root = "/usr/srv/alternative-root"
```

A request to `/index.html` will be mapped to `/usr/srv/www/index.html`, yet `/test/hi.txt` will be mapped to the file `/usr/srv/alternative-root/test/hi.txt`.

This approach has two caveats:

1. When SWS produces redirects (e.g. redirecting `http://internal.local/test/subdir` to `http://internal.local/test/subdir/`), it isn't aware of rewrites. Unless the path part of the URL is identical before and after rewrite (like in the example above), this will result in broken redirects.
2. While the `internal.local` virtual host isn't normally accessed directly, someone who knows (or guesses) its name can still reach it. Consider all files under the virtual host's root as public, and don't put any secrets there even if they aren't accessible via rewrites.
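
The first caveat suggests a quick sanity check for a rewrite rule: compare the path part of the URL before and after the rewrite. The sketch below is one way to express that check; `url_path` and `rewrite_preserves_path` are hypothetical helpers, not SWS functions, and the URL parsing is intentionally minimal.

```rust
/// Returns the path part of a relative or absolute URL, without query or fragment.
/// Hypothetical helper; a real implementation would use a proper URL parser.
fn url_path(url: &str) -> &str {
    let after_authority = match url.split_once("://") {
        Some((_scheme, rest)) => match rest.find('/') {
            Some(i) => &rest[i..],
            None => "/",
        },
        None => url,
    };
    let end = after_authority
        .find(|c: char| c == '?' || c == '#')
        .unwrap_or(after_authority.len());
    &after_authority[..end]
}

/// True if server-generated redirects would still point at a valid public path.
fn rewrite_preserves_path(request_url: &str, rewritten_url: &str) -> bool {
    url_path(request_url) == url_path(rewritten_url)
}
```

For the configuration above, `/test/subdir` rewritten to `http://internal.local/test/subdir` passes the check, whereas a destination such as `http://internal.local/other/subdir` would not.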

### Redirect

An optional number that indicates the HTTP response code (redirect).
113 changes: 25 additions & 88 deletions src/handler.rs
@@ -32,7 +32,7 @@ use crate::{
control_headers, cors, custom_headers, error_page, health,
http_ext::MethodExt,
maintenance_mode, redirects, rewrites, security_headers,
settings::{file::RedirectsKind, Advanced},
settings::Advanced,
static_files::{self, HandleOpts},
virtual_hosts, Error, Result,
};
@@ -188,26 +188,19 @@ impl RequestHandler {
return result;
}

let method = req.method();
let headers = req.headers();
let uri = req.uri();

let mut uri_path = uri.path().to_owned();
let uri_query = uri.query();

// Health requests aren't logged here but in health module.
tracing::info!(
"incoming request: method={} uri={}{}",
method,
uri,
req.method(),
req.uri(),
remote_addr_str,
);

// Reject in case of incoming HTTP request method is not allowed
if !method.is_allowed() {
if !req.method().is_allowed() {
return error_page::error_response(
uri,
method,
req.uri(),
req.method(),
&StatusCode::METHOD_NOT_ALLOWED,
&self.opts.page404,
&self.opts.page50x,
@@ -241,86 +234,30 @@ impl RequestHandler {
return result;
}

// Rewrites
if let Some(result) = rewrites::pre_process(&self.opts, req) {
return result;
}

// Advanced options
if let Some(advanced) = &self.opts.advanced_opts {
// Rewrites
if let Some(rewrite) = rewrites::rewrite_uri_path(
uri_path.clone().as_str(),
advanced.rewrites.as_deref(),
) {
// Rewrites: Handle replacements (placeholders)
if let Some(regex_caps) = rewrite.source.captures(uri_path.as_str()) {
let caps_range = 0..regex_caps.len();
let caps = caps_range
.clone()
.filter_map(|i| regex_caps.get(i).map(|s| s.as_str()))
.collect::<Vec<&str>>();

let patterns = caps_range
.map(|i| format!("${}", i))
.collect::<Vec<String>>();

let dest = rewrite.destination.as_str();

tracing::debug!("url rewrites glob pattern: {:?}", patterns);
tracing::debug!("url rewrites regex equivalent: {}", rewrite.source);
tracing::debug!("url rewrites glob pattern captures: {:?}", caps);
tracing::debug!("url rewrites glob pattern destination: {:?}", dest);

if let Ok(ac) = aho_corasick::AhoCorasick::new(patterns) {
if let Ok(dest) = ac.try_replace_all(dest, &caps) {
tracing::debug!(
"url rewrites glob pattern destination replaced: {:?}",
dest
);
uri_path = dest;
}
}
}

// Rewrites: Handle redirections
if let Some(redirect_type) = &rewrite.redirect {
let loc = match HeaderValue::from_str(uri_path.as_str()) {
Ok(val) => val,
Err(err) => {
tracing::error!("invalid header value from current uri: {:?}", err);
return error_page::error_response(
uri,
method,
&StatusCode::INTERNAL_SERVER_ERROR,
&self.opts.page404,
&self.opts.page50x,
);
}
};
let mut resp = Response::new(Body::empty());
resp.headers_mut().insert(hyper::header::LOCATION, loc);
*resp.status_mut() = match redirect_type {
RedirectsKind::Permanent => StatusCode::MOVED_PERMANENTLY,
RedirectsKind::Temporary => StatusCode::FOUND,
};
return Ok(resp);
}
}

// If the "Host" header matches any virtual_host, change the root directory
if let Some(root) =
virtual_hosts::get_real_root(headers, advanced.virtual_hosts.as_deref())
virtual_hosts::get_real_root(req.headers(), advanced.virtual_hosts.as_deref())
{
base_path = root;
}
}

let uri_path = &uri_path;
let index_files = index_files.as_ref();

// Static files
match static_files::handle(&HandleOpts {
method,
headers,
method: req.method(),
headers: req.headers(),
base_path,
uri_path,
uri_query,
uri_path: req.uri().path(),
uri_query: req.uri().query(),
#[cfg(feature = "directory-listing")]
dir_listing,
#[cfg(feature = "directory-listing")]
@@ -365,13 +302,13 @@ impl RequestHandler {
feature = "compression-deflate"
))]
if self.opts.compression && !_is_precompressed {
resp = match compression::auto(method, headers, resp) {
resp = match compression::auto(req.method(), req.headers(), resp) {
Ok(res) => res,
Err(err) => {
tracing::error!("error during body compression: {:?}", err);
return error_page::error_response(
uri,
method,
req.uri(),
req.method(),
&StatusCode::INTERNAL_SERVER_ERROR,
&self.opts.page404,
&self.opts.page50x,
@@ -399,7 +336,7 @@ impl RequestHandler {
Err(status) => {
// Check for a fallback response
#[cfg(feature = "fallback-page")]
if method.is_get()
if req.method().is_get()
&& status == StatusCode::NOT_FOUND
&& !self.opts.page_fallback.is_empty()
{
@@ -433,13 +370,13 @@ impl RequestHandler {
feature = "compression-deflate"
))]
if self.opts.compression {
resp = match compression::auto(method, headers, resp) {
resp = match compression::auto(req.method(), req.headers(), resp) {
Ok(res) => res,
Err(err) => {
tracing::error!("error during body compression: {:?}", err);
return error_page::error_response(
uri,
method,
req.uri(),
req.method(),
&StatusCode::INTERNAL_SERVER_ERROR,
&self.opts.page404,
&self.opts.page50x,
@@ -462,8 +399,8 @@ impl RequestHandler {

// Otherwise return an error response
error_page::error_response(
uri,
method,
req.uri(),
req.method(),
&status,
&self.opts.page404,
&self.opts.page50x,
118 changes: 61 additions & 57 deletions src/redirects.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,11 @@

use headers::HeaderValue;
use hyper::{Body, Request, Response, StatusCode};
use regex::Regex;

use crate::{error_page, handler::RequestHandlerOpts, settings::Redirects, Error};

/// Applies redirect rules to a request if necessary.
pub(crate) fn pre_process(
opts: &RequestHandlerOpts,
req: &Request<Body>,
@@ -28,76 +30,78 @@ pub(crate) fn pre_process(
if let Some(uri_port) = uri.port_u16() {
uri_host.push_str(&format!(":{}", uri_port));
}
if let Some(matched) = get_redirection(&uri_host, uri_path, Some(redirects)) {
// Redirects: Handle replacements (placeholders)
let regex_caps = if let Some(regex_caps) = matched.source.captures(uri_path) {
regex_caps
} else {
return handle_error(
"unexpected regex failure",
"extracting captures failed",
opts,
req,
);
};
let matched = get_redirection(&uri_host, uri_path, Some(redirects))?;
let dest = match replace_placeholders(uri_path, &matched.source, &matched.destination) {
Ok(dest) => dest,
Err(err) => return handle_error(err, opts, req),
};

let caps_range = 0..regex_caps.len();
let caps = caps_range
.clone()
.map(|i| regex_caps.get(i).map(|s| s.as_str()).unwrap_or(""))
.collect::<Vec<&str>>();
match HeaderValue::from_str(&dest) {
Ok(loc) => {
let mut resp = Response::new(Body::empty());
resp.headers_mut().insert(hyper::header::LOCATION, loc);
*resp.status_mut() = matched.kind;
tracing::trace!(
"uri matches redirects glob pattern, redirecting with status '{}'",
matched.kind
);
Some(Ok(resp))
}
Err(err) => handle_error(
Error::new(err).context("invalid header value from current uri"),
opts,
req,
),
}
}

let patterns = caps_range
.map(|i| format!("${}", i))
.collect::<Vec<String>>();
/// Replaces placeholders in the destination URI by matching capture groups from the original URI.
pub(crate) fn replace_placeholders(
orig_uri: &str,
regex: &Regex,
dest_uri: &str,
) -> Result<String, Error> {
let regex_caps = if let Some(regex_caps) = regex.captures(orig_uri) {
regex_caps
} else {
return Err(Error::msg("regex didn't match, extracting captures failed"));
};

let dest = &matched.destination;
let caps_range = 0..regex_caps.len();
let caps = caps_range
.clone()
.map(|i| regex_caps.get(i).map(|s| s.as_str()).unwrap_or(""))
.collect::<Vec<&str>>();

tracing::debug!("url redirects glob pattern: {:?}", patterns);
tracing::debug!("url redirects regex equivalent: {}", matched.source);
tracing::debug!("url redirects glob pattern captures: {:?}", caps);
tracing::debug!("url redirects glob pattern destination: {:?}", dest);
let patterns = caps_range
.map(|i| format!("${}", i))
.collect::<Vec<String>>();

let ac = match aho_corasick::AhoCorasick::new(patterns) {
Ok(ac) => ac,
Err(err) => {
return handle_error("failed creating Aho-Corasick matcher", err, opts, req)
}
};
let dest = match ac.try_replace_all(dest, &caps) {
Ok(dest) => dest.to_string(),
Err(err) => return handle_error("failed replacing captures", err, opts, req),
};
tracing::debug!("url redirects/rewrites glob pattern: {patterns:?}");
tracing::debug!("url redirects/rewrites regex equivalent: {regex}");
tracing::debug!("url redirects/rewrites glob pattern captures: {caps:?}");
tracing::debug!("url redirects/rewrites glob pattern destination: {dest_uri:?}");

tracing::debug!(
"url redirects glob pattern destination replaced: {:?}",
dest
);
match HeaderValue::from_str(&dest) {
Ok(loc) => {
let mut resp = Response::new(Body::empty());
resp.headers_mut().insert(hyper::header::LOCATION, loc);
*resp.status_mut() = matched.kind;
tracing::trace!(
"uri matches redirects glob pattern, redirecting with status '{}'",
matched.kind
);
Some(Ok(resp))
}
Err(err) => handle_error("invalid header value from current uri", err, opts, req),
let ac = match aho_corasick::AhoCorasick::new(patterns) {
Ok(ac) => ac,
Err(err) => return Err(Error::new(err).context("failed creating Aho-Corasick matcher")),
};
match ac.try_replace_all(dest_uri, &caps) {
Ok(dest) => {
tracing::debug!("url redirects/rewrites glob pattern destination replaced: {dest:?}");
Ok(dest.to_string())
}
} else {
None
Err(err) => Err(Error::new(err).context("failed replacing captures")),
}
}

fn handle_error<E: std::fmt::Display>(
msg: &str,
err: E,
/// Logs error and produces an Internal Server Error response.
pub(crate) fn handle_error(
err: Error,
opts: &RequestHandlerOpts,
req: &Request<Body>,
) -> Option<Result<Response<Body>, Error>> {
tracing::error!("{msg}: {err}");
tracing::error!("{err:?}");
Some(error_page::error_response(
req.uri(),
req.method(),