Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace fallthrough routes with param validators #4291

Closed
Rich-Harris opened this issue Mar 11, 2022 · 15 comments
Closed

Replace fallthrough routes with param validators #4291

Rich-Harris opened this issue Mar 11, 2022 · 15 comments
Labels
feature request New feature or request
Milestone

Comments

@Rich-Harris
Copy link
Member

Describe the problem

SvelteKit has an unusual feature called fallthrough routes which allows ambiguity between routes to be resolved by the routes themselves.

It's a powerful feature, but a niche one, and it comes at a significant cost. It adds non-trivial complexity to the SvelteKit codebase, makes things conceptually less clear, leads to undefined behaviour (should event.locals persist between different routes that are candidates for the same request?), has undesirable API implications (load has to return something, even if it doesn't return anything), and terrible performance implications in the case where a route does fall through.

We could get 90% of the value of fallthrough routes with a fraction of the cost if we had some way of validating route parameters before attempting to render or navigate.

Describe the proposed solution

Credit for this idea goes to @mrkishi; any flaws were introduced by me.

Suppose you're building a calendar app, and for some reason you're forced to have routes like this:

src/routes/[date]
src/routes/[eventid]

When someone visits /2022-03-12, either route could match. Rather than adding fallthrough logic to src/routes/[date], we could add a validator that SvelteKit uses to check the format:

// src/params/date.js
export default param => /^\d{4}-\d{2}-\d{2}$/.test(param);

Seeing that 2022-03-12 passes the test, we can select the src/routes/[date] route immediately.

If we know that every eventid is a UUID, we could test that too. Since UUIDs are a generic concept, we need a more generic name than eventid:

src/routes/[eventid=uuid]

Now, we can add a UUID validator...

// src/params/uuid.js
export default param => /[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}/.test(param);

...and if someone visits /neither-a-date-nor-a-uuid we can 404 without doing anything else. Not only that, but the routes can have complete confidence that their parameters are valid.

(We could also do enum checks for things like src/routes/[year]/[month]/[day]...

// src/params/month.js
const months = new Set('jan feb mar apr may jun jul aug sep oct nov dec'.split(' '));
export default param => months.has(param);

...though I don't have any immediate thoughts on how that would internationalize.)

Validating vs parsing

One thing I'm not sure of: whether these functions should return booleans, or a parsed representation of the parameter. For example, date.js could return a Date object (or a Temporal, once that's possible):

// src/params/date.js
export default (param) => {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(param);
  if (!match) return;
  
  return new Date(+match[1], +match[2] - 1, +match[3]);
};

In that case, params.date would be the returned Date. This makes sense insofar as you're not applying the regex twice, and if you change the data format from yyyy-mm-dd to something else you only need to do it in one place. But it makes the types harder to reason about, and we then have to figure out what to return when a parameter is invalid.

Alternatives considered

Server-side async validators

We could make validators even more powerful by specifying that some can only run on the server (i.e. the client communicates via JSON), and allowing them to be asynchronous. For example, if you had src/routes/blog/[post]...

// src/params/post.server.js
import db from '$lib/db';

export default async (param) => {
  const post = await db.get('post', param);
  return post;
};

...params.post could be a fully hydrated post object, or the route could 404 if the database didn't contain the post in question.

This is a seductive idea (I was seduced by it) but I think it would be a mistake. It gives you a third way to load data (alongside page/section endpoints and load), leading to the question 'where am I supposed to put this data loading logic?', and it doesn't scale — as soon as you need to refer to event.locals, or make a database query that involves multiple parameters, you're back to endpoints. It's sometimes worth sacrificing a little bit of power to make things more coherent.

Nesting validators where they're used

In the samples above, I've put validators in src/params/*.js. It's tempting to suggest colocating them with the routes that use them...

src/routes/products/__params/id.js
src/routes/products/[id].svelte
src/routes/events/__params/id.js
src/routes/events/[id].svelte

...but this feels to me like it would lead to more confusion and less code reuse. I would advocate for doing this instead...

src/routes/products/[id=productid].svelte
src/routes/events/[id=eventid].svelte

...which has the side-effect of creating a kind of central ontology of the concepts in your app.

Putting validators in a single file

My examples show src/params/date.js and src/params/uuid.js rather than this:

// src/params.js
export const date = param => /^\d{4}-\d{2}-\d{2}$/.test(param);
export const uuid = param => /[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}/.test(param);

This is intentional, so that we retain the option to code-split and lazy-load validators.

Importance

would make my life easier

Additional Information

Escape hatch

There are some legitimate (if somewhat niche) use cases for fallthrough that aren't covered by this proposal, so it's important to note that there is an escape hatch — you can create an umbrella route and distinguish between routes in load:

<script context="module">
  export function load(page) {
    const module = await (page.params.foo = 'one'
      ? import('./One.svelte')
      : import('./Two.svelte');

    const loaded = await module.load(page);

    return {
      ...loaded,
      props: {
        ...loaded?.props,
        component: module.default
      }
    };
  }
</script>

<script>
  export let component;
</script>

<svelte:component this={component} {...$$restProps} />

Middleware

One of the reasons I'm motivated to make this change is that people have requested some form of scoped middleware that only runs for a subset of pages:

// src/routes/admin/__middleware.js
export function handle({ event, resolve }) {
  if (!event.locals.user?.is_admin) {
    return new Response('YOU SHALL NOT PASS', {
     status: 403
    });
  }

  return resolve(event);
}

We haven't designed it yet, but it's likely that we'll want to 'chain' handle functions — i.e. the root handle in src/hooks.js (which might become src/routes/__middleware.js) would, by calling resolve, invoke the admin middleware if someone was requesting a page under /admin.

That becomes nightmarishly hard to reason about if route matching happens during resolve rather than before handle is invoked. You can't know, in the general case, which handle functions to execute.

Route logging

#3907 proposes a new hook for logging which route ended up handling a given request. Most likely, this would involve exposing the internal route key (i.e. /blog/my-article has a key like blog/[slug]) on the event object. If the route is known before handle runs, we don't need to introduce another hook. This is a small example, but it shows how removing fallthrough routes would pay dividends that might not be immediately obvious.

@vwkd
Copy link

vwkd commented Mar 11, 2022

Might be worth considering to do the parameter verification in the page itself, so no yet additional folders or files are needed. Maybe it could be an export in the module script tag like a params function?

@rmunn
Copy link
Contributor

rmunn commented Mar 11, 2022

One thing I'm not sure of: whether these functions should return booleans, or a parsed representation of the parameter.

I'm leaning towards "parsed representation", and returning undefined when the parameter is invalid. Being able to know in route handlers that params.id will always be a number is quite convenient, because it means I don't have to write a logic branch in the handler that says "If id is not a number, return 400 Bad Request".

Though that's a thought. If I have that logic in the handler, I can return 400 Bad Request with a JSON object describing the problem: "{ code: 'invalid_url_param', message: 'id should be a number, got xyz instead' }". With param validators, though, /api/user/xyz (instead of /api/user/3) would return a 404 because it fell through all the routes and was unable to find one to handle the request.

Though I guess that can be handled by opting out of the validation system. If there's only one /api/user/[id].ts then there's no need for validators to check which route to handle. So in that case I can just let params.id be a string and parse it in the handler function (just as I have to do now), and return a 400 Bad Request when I want to.

The only drawback I can see is that in some rare use cases, you might want to return undefined as a valid value for a prop. In that rare case, though, you could have the route validator return Symbol.for('undefined') and then your handler would have to have the line if (params.id == Symbol.for('undefined')) params.id = undefined. So that use case is still possible, and will be rate enough I don't think there's any real drawback to using undefined as the return value that means "fall through to look at the next param validator, if any".

@Rich-Harris
Copy link
Member Author

@vwkd that would defeat the object, which is to match the route before loading the code for it.

@rmunn yep, the validator is only required for disambiguation, like if you had a situation like this:

/api/user/[id] // numeric
/api/user/[reserved] // a list of reserved user IDs like 'admin'

Absent that situation you could certainly do the validation in the route, though I would argue that a 400 is semantically incorrect anyway — a 400 is a bad request for an extant resource, and relates to problems with the request body or parameters rather than the resource identifier.

Being able to know in route handlers that params.id will always be a number is quite convenient

If you were okay with a 404 for /api/user/xyz then the validator would ensure that params.id was a numeric string, so you can at least coerce it with confidence in the route. Doing the transformation ahead of time would save even that step, but your Symbol.for('undefined') suggestion shows just how unwieldy and confusing the API could become as a result. Sticking with validation is probably better. One thing the maintainers discussed last night was maybe changing export default to export const validate so that we would, at least, reserve the ability to add export const transform in future.

@Crisfole
Copy link
Contributor

I think I lean toward truthy/falsey for the validators. It means you have to handle falsey values correctly and you don't bump into issues where the parsed representation is falsey or null:

/api/user/0

In that case the parsed representation is 0 which is falsey, but you still want it to match. I could come up with similar examples with null or something, but I think that's demonstrative enoguh?

@Crisfole
Copy link
Contributor

Crisfole commented Mar 11, 2022

I'd also gently lean toward this api:

type Predicate = (segment: string) => boolean;
interface Tester {
  test: Predicate;
}
type Validator = Predicate | Tester;

It allows a regex to be used directly as a validator.

@benmccann
Copy link
Member

+1 to removing fallthrough. I just did a search on GitHub and couldn't find a single project using it

I wonder if we even need validators / parsers. It's very rare that I have overloaded routes and when I do I need to hit the database most of the time anyway to see if there's content. The escape hatch seems like it would work well enough that we could use it even in simpler cases given how rare it is and don't need to introduce the extra complexity of validators / parsers at all. We can always remove fallthrough and see if many people miss it for doing things like regexes on parameters before deciding to add more features around that

@geoffrich
Copy link
Member

I just did a search on GitHub and couldn't find a single project using it

@benmccann I don't think that's true - I was able to find a few projects using it. Still, it's not a lot of usage.

@Rich-Harris
Copy link
Member Author

I looked through a few of those search results and they all seem to be using fallthrough to trigger a 404 rather than because of any route ambiguity.

I do have one good example of it being useful — disambiguating between these:

Am somewhat torn, as this could be done with validators much more pleasantly than with the escape hatch.

@georgecrawford
Copy link
Contributor

georgecrawford commented Mar 11, 2022

Just to drop another (albeit tenuous) use-case for fallthrough: #4282 (comment). My infrastructure requires some responses from URLs with __ prefixes, e.g. /__health. Double-underscore prefixes are 'reserved' in SvelteKit. The neatest way of serving these URLs that I've found so far is to name a file [__health].svelte (i.e. using a wildcard), and to 'test' that the wildcard param matches the desired URL pathname in the load() function:

<script context="module">
  export async function load({ url, props }) {
    if (url.pathname !== '/__health') {
      return { fallthrough: true };
    }

    return {props}
  }
</script>

If it doesn't match, we fallthrough to the next similar file. If we removed the concept of explicit fallthrough, then I assume I'd have to either use the proposed 'Escape Hatch', or write a validator for each such route, such as src/params/__health.js:

export default param => param === '__health';

I do fully understand that there are likely to be very negative performance implications of my current solution, as each file is evaluated and the load() run before we finally find a match, so perhaps the two alternatives are better, even if they make the filesystem and implementation a bit more clumsy.

Going back to this comment, with the response from @Rich-Harris that:

that would defeat the object, which is to match the route before loading the code for it.

...could we consider an approach whereby a page defines a validator:

<script context="module">
  export validatePath = pathname => pathname === '/__health';

  // or:
  export validateParams = params => (params.length === 1 && params[0] === '__health');
</script>

...but that the compiler somehow collects these into a separate location in the router (akin to the files in /src/params) where they can be used to tests URLs against page routes before requiring the code for them to be loaded and run. Could that work?

@Conduitry
Copy link
Member

Conduitry commented Mar 11, 2022

Hm. We let users customize the rules for which files should result in routes, should we also (😬) let users customize the special __ filenames? I don't know what that would look like exactly. That may be more of a question for #4274/#4275.

@Rich-Harris
Copy link
Member Author

I can't remember where we were discussing this recently but we talked about handling encoding in filenames, so you could have a route like src/routes/%5f%5fhealth and it would map to __health

@Rich-Harris
Copy link
Member Author

...but that the compiler somehow collects these into a separate location in the router (akin to the files in /src/params) where they can be used to tests URLs against page routes before requiring the code for them to be loaded and run. Could that work?

In theory yes, but in practice it's a real can of worms. Even if you can get it to work reliably (which means moving potentially side-effecty imports into the right place, and other sorts of edge cases that make me tired to think about), it masks the underlying mechanics in a way that makes the system as a whole more difficult to understand

@moritzebeling
Copy link

moritzebeling commented Mar 13, 2022

If I understand correctly, this could be really helpful to improve the i18n situation, where you want to have both a top-level lang-route but also some urls that are not translated:

[lang]/translated-page.svelte
/non/translated/file.jpg

because then I could validate if lang is included in my list of supported languages.

@Rich-Harris
Copy link
Member Author

In the longer term we might well want some special handling for [lang] params so that i18n is more of a first class concern, but in the meantime yes, that's a perfect example of how this could be useful

@cdcarson
Copy link
Contributor

"URL forgiveness" on the framework level is a bug not a feature. If a user clicks on a link that I haven't accounted for, then I expect an error to be thrown, not for SvelteKit to handle it according to some baroque file naming logic, a la WordPress. All the (perfectly valid) cases above can and should be handled in userland.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

No branches or pull requests

10 participants