Regex-validated string types (feedback reset) #41160

Open
RyanCavanaugh opened this issue Oct 19, 2020 · 100 comments
Labels
Awaiting More Feedback This means we'd like to hear from more people who would be helped by this feature Suggestion An idea for TypeScript

Comments

@RyanCavanaugh
Member

RyanCavanaugh commented Oct 19, 2020

This is a pickup of #6579. With the addition of #40336, a large number of those use cases have been addressed, but possibly some still remain.

Update 2023-04-11: Reviewed use cases and posted a write-up of our current evaluation

Search Terms

regex string types

Suggestion

Open question: For people who had upvoted #6579, what use cases still need addressing?

Note: Please keep discussion on-topic; moderation will be a bit heavier to avoid off-topic tangents

Examples

(please help)

Checklist

My suggestion meets these guidelines:

  • [?] This wouldn't be a breaking change in existing TypeScript/JavaScript code
  • [?] This wouldn't change the runtime behavior of existing JavaScript code
  • [?] This could be implemented without emitting different JS based on the types of the expressions
  • [?] This isn't a runtime feature (e.g. library functionality, non-ECMAScript syntax with JavaScript output, etc.)
  • [?] This feature would agree with the rest of TypeScript's Design Goals.
@RyanCavanaugh RyanCavanaugh added Awaiting More Feedback This means we'd like to hear from more people who would be helped by this feature Suggestion An idea for TypeScript labels Oct 19, 2020
@AnyhowStep
Contributor

AnyhowStep commented Oct 19, 2020

Use case 1, URL path building libraries,

/*snip*/
createTestCard : f.route()
    .append("/platform")
    .appendParam(s.platform.platformId, /\d+/)
    .append("/stripe")
    .append("/test-card")
/*snip*/

These are the constraints for .append(),

  • ✔️ Must start with leading forward slash (/)
  • ❌ Must not end with trailing forward slash (/)
  • ❌ Must not contain colon character (:); it is reserved for parameters
  • ❌ Must not contain two, or more, forward slashes consecutively (//)
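Of the four `.append()` constraints above, only the leading slash can be expressed with a template literal type today. As a rough sketch of the gap, the snippet below pairs that type with a runtime check for the other three. `RouteSegment`, `ROUTE_SEGMENT`, and `isRouteSegment` are hypothetical names made up for this illustration, not part of any library mentioned in the thread.

```typescript
// The one constraint a template literal type can express directly.
type LeadingSlash = `/${string}`;

// Hypothetical branded type: a string that has passed the runtime check below.
type RouteSegment = LeadingSlash & { readonly __brand: "RouteSegment" };

// One or more "/name" groups, where a name contains no "/" or ":".
// This rules out a trailing slash, colons, and consecutive slashes.
const ROUTE_SEGMENT = /^(?:\/[^\/:]+)+$/;

function isRouteSegment(s: string): s is RouteSegment {
  return ROUTE_SEGMENT.test(s);
}

const candidates = ["/platform", "/test-card", "/a/", "//a", "/a:b", "a"];
const accepted = candidates.filter(isRouteSegment);
console.log(accepted.join(", ")); // only the first two candidates pass
```

With a regex-validated string type, the same pattern could presumably be checked against literals at compile time, without the brand or the guard.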

Use case 2,

  • ❌ Hexadecimal/binary/decimal/etc. strings of non-trivial length (explosion of union types)

Use case 3, safer RegExp constructor (and similar functions?),

new(pattern: string, flags?: PatternOf</^[gimsuy]*$/>): RegExp
  • flags should only contain the characters g,i,m,s,u,y
  • ❌ Each character should only be used once (To be fair, this condition would be hard for regexes, too, requiring negative lookahead or many states)
  • ❌ Characters can be specified in any order
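For comparison, the flag constraints above (allowed characters, no repeats, any order) can be approximated today with a recursive conditional type, at the cost of a generic parameter. This is a sketch only; `ValidFlags` and `safeRegExp` are hypothetical names, and the check applies only to literal arguments.

```typescript
type Flag = "g" | "i" | "m" | "s" | "u" | "y";

// Walk the string one character at a time, tracking the flags seen so far.
type ValidFlags<S extends string, Seen extends string = never> =
  S extends `${infer C}${infer Rest}`
    ? C extends Flag
      ? C extends Seen
        ? false // duplicate flag
        : ValidFlags<Rest, Seen | C>
      : false // not a flag character
    : S extends "" ? true : false;

// The intersection collapses to `never` when the flags are invalid,
// so a bad literal is rejected at the call site.
function safeRegExp<F extends string = "">(
  pattern: string,
  flags?: F & (ValidFlags<F> extends true ? unknown : never),
): RegExp {
  return new RegExp(pattern, flags);
}

const re = safeRegExp("^ab+$", "ig"); // order does not matter
console.log(re.global, re.ignoreCase);

// safeRegExp("a", "gg"); // rejected: duplicate flag
// safeRegExp("a", "x");  // rejected: not a flag character
```

This works, but it is exactly the "conditional type plus generic parameter" shape that several commenters below consider too cumbersome for everyday use.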

@yume-chan
Contributor

Template string types can only be used in conditional types, so they are really "type validators", not "types" themselves. They also focus more on manipulating strings, which I think is a different design goal from regex-validated types.

It's doable to use conditional types to constrain parameters, for example taken from #6579 (comment)

declare function takesOnlyHex<StrT extends string> (
    hexString : Accepts<HexStringLen6, StrT> extends true ? StrT : {__err : `${StrT} is not a hex-string of length 6`}
) : void;

However, I think this pattern has several issues:

  1. It's not a common pattern, and cumbersome to repeat every time.
  2. The type parameter should be inferred, but was used in a condition before it "can" be inferred, which is unintuitive.
  3. TypeScript still doesn't support partial generic inference (Implement partial type argument inference using the _ sigil #26349), so it may be hard to use this pattern with more generic parameters.

@bmix

bmix commented Oct 21, 2020

Would this allow me to define type constraints for String to match the XML specification's Name constructs (short summary) and QNames by expressing them as regular expressions? If so, I am all for it :-)

@ksabry

ksabry commented Oct 21, 2020

@AnyhowStep It isn't the cleanest, but with conditional types now allowing recursion, it seems we can accomplish these cases with template literal types: playground link

@AnyhowStep
Contributor

AnyhowStep commented Oct 22, 2020

We can have compile-time regular expressions now.
But anything requiring conditional types and a generic type param to check is a non-feature to me.

(Well, non-feature when I'm trying to use TypeScript for work. All personal projects have --noEmit enabled because real TS programmers execute in compile-time)

@arcanis

arcanis commented Dec 12, 2020

Open question: For people who had upvoted #6579, what use cases still need addressing?

We have a strongly-typed filesystem library, where the user is expected to manipulate "clean types" like Filename or PortablePath versus literal strings (they currently obtain those types by using the as operator on literals, or calling a validator for user-provided strings):

export interface PathUtils {
  cwd(): PortablePath;

  normalize(p: PortablePath): PortablePath;
  join(...paths: Array<PortablePath | Filename>): PortablePath;
  resolve(...pathSegments: Array<PortablePath | Filename>): PortablePath;
  isAbsolute(path: PortablePath): boolean;
  relative(from: PortablePath, to: PortablePath): PortablePath;
  dirname(p: PortablePath): PortablePath;
  basename(p: PortablePath, ext?: string): Filename;
  extname(p: PortablePath): string;

  readonly sep: PortablePath;
  readonly delimiter: string;

  parse(pathString: PortablePath): ParsedPath<PortablePath>;
  format(pathObject: FormatInputPathObject<PortablePath>): PortablePath;

  contains(from: PortablePath, to: PortablePath): PortablePath | null;
}

I'm investigating template literals to remove the as syntax, but I'm not sure we'll be able to use them after all:

  • They don't raise errors very well
  • Interfaces are a pain to type (both declaration and implementation would have to be generics)
  • More generally, we would have to migrate all our existing functions to become generics, and our users would have to as well

The overhead sounds overwhelming, and makes it likely that there are side effects that would cause problems down the road - causing further pain if we need to revert. Ideally, the solution we're looking for would leave the code above intact, we'd just declare PortablePath differently.
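For readers unfamiliar with the "`as` or validator" approach described above, here is a minimal sketch of it. The brand shape and the validation rule are assumptions made for illustration; the real library's definitions may differ.

```typescript
// Hypothetical brand; the real library's definition may differ.
type PortablePath = string & { readonly __pathType: "portable" };

// Assumed rule for this sketch: portable paths are non-empty and never use "\".
function isPortablePath(p: string): p is PortablePath {
  return p.length > 0 && p.indexOf("\\") === -1;
}

// Validator for user-provided strings.
function toPortablePath(p: string): PortablePath {
  if (!isPortablePath(p)) throw new Error(`Not a portable path: ${p}`);
  return p;
}

// A consumer keeps a plain, non-generic signature.
function basenameOf(p: PortablePath): string {
  return p.slice(p.lastIndexOf("/") + 1);
}

// Literals still need a cast (or a validator call) to enter the typed world.
const fromLiteral = "/usr/lib/node" as PortablePath;
console.log(basenameOf(fromLiteral));
console.log(basenameOf(toPortablePath("/etc/hosts")));
```

The function signatures stay non-generic, which is the property the comment wants to preserve; the cast on the literal is the part a differently declared `PortablePath` would ideally make unnecessary.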

@RyanCavanaugh
Member Author

RyanCavanaugh commented Dec 14, 2020

@arcanis it really sounds like you want nominal types (#202), since even if regex types existed, you'd still want the library consumer to go through the validator functions?

@hanneswidrig

I have a strong use case for Regex-validated string types. AWS Lambda function names have a maximum length of 64 characters. This can be manually checked in a character counter but it's unnecessarily cumbersome given that the function name is usually composed with identifying substrings.

As an example, this function name can be partially composed with the new work done in 4.1/4.2. However, there is no easy way to make TypeScript raise a compiler error, even though the function name below is longer than 64 characters.

type LambdaServicePrefix = 'my-application-service';
type LambdaFunctionIdentifier = 'dark-matter-upgrader-super-duper-test-function';
type LambdaFunctionName = `${LambdaServicePrefix}-${LambdaFunctionIdentifier}`;
const lambdaFunctionName: LambdaFunctionName  = 'my-application-service-dark-matter-upgrader-super-duper-test-function';

This StackOverflow Post I created was asking this very same question.

With the continued rise of TypeScript in back-end code, statically defined data is likely to be a strong use case for validating a string's length or format.
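A length limit like this can be approximated today with a recursive conditional type that counts characters, though only through a generic helper function. The sketch below is one way to do it; `FitsIn` and `lambdaName` are hypothetical names.

```typescript
// Consume one character per step; fail as soon as more than `Max` are needed.
type FitsIn<S extends string, Max extends number, Seen extends unknown[] = []> =
  S extends `${infer _Head}${infer Rest}`
    ? Seen["length"] extends Max
      ? false
      : FitsIn<Rest, Max, [...Seen, unknown]>
    : true;

// The intersection collapses to `never` when the literal is too long.
function lambdaName<S extends string>(
  name: S & (FitsIn<S, 64> extends true ? unknown : never),
): S {
  return name;
}

const shortName = lambdaName("my-application-service-resize-image");
console.log(shortName.length);

// lambdaName("my-application-service-dark-matter-upgrader-super-duper-test-function");
// ^ rejected at compile time: the argument is 69 characters long.
```

Note that a value typed as plain `string` passes this check untouched, since there are no characters to count at the type level, and the constraint cannot be attached to a type alias such as `LambdaFunctionName` directly.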

@johnbillion

johnbillion commented Apr 29, 2021

TypeScript supports literal types, template literal types, and enums. I think a string pattern type is a natural extension that allows for non-finite value restrictions to be expressed.

I'm writing type definitions for an existing codebase. Many arguments and properties accept strings of a specific format:

  • ❌ Formatted representation of a date, eg "2021-04-29T12:34:56"
  • ❌ Comma-separated list of integers, eg "1,2,3,4,5000"
  • ❌ Valid MIME type, eg "image/jpeg"
  • ❌ Valid hex colour code, already mentioned several times
  • ❌ Valid IPv4 or IPv6 address

@fabiospampinato

fabiospampinato commented May 4, 2021

I'd like to argue against @RyanCavanaugh's claim in the first post saying that:

a large number of those use cases have been addressed, but possibly some still remain.

As it stands, TypeScript can't even handle the following template literal type:

type Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9;

type Just5Digits = `${Digit}${Digit}${Digit}${Digit}${Digit}`;

It fails with an "Expression produces a union type that is too complex to represent.(2590)" error.

That's the equivalent of the following regex:

/^\d{5}$/

Just 5 digits in a row.

Almost all useful regexes are more complicated than that, and TypeScript already gives up with that, hence I'd argue the opposite of that claim is true: a small number of use cases have been addressed and the progress with template literals has been mostly orthogonal really.
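To make the limitation concrete: `Just5Digits` as a union would need 10^5 = 100,000 members. A workaround that avoids building the union is to check one literal at a time with a recursive conditional type. The sketch below assumes hypothetical names `IsDigits` and `fiveDigits`; it is not a substitute for a real `/^\d{5}$/` type because it only works through a generic function.

```typescript
type Digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";

// Check each character against `Digit`, then compare the count with `N`.
type IsDigits<S extends string, N extends number, Seen extends unknown[] = []> =
  S extends `${infer C}${infer Rest}`
    ? C extends Digit
      ? IsDigits<Rest, N, [...Seen, C]>
      : false
    : S extends "" ? (Seen["length"] extends N ? true : false) : false;

function fiveDigits<S extends string>(
  code: S & (IsDigits<S, 5> extends true ? unknown : never),
): S {
  return code;
}

const zip = fiveDigits("90210");
console.log(zip);

// fiveDigits("9021");  // rejected: too short
// fiveDigits("9021x"); // rejected: not a digit
```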

@ghost

ghost commented May 29, 2021

What about validation of JSON schema's patternProperties regex in TypeScript interfaces for the parsed object? This is a PERFECT application of the regex-validated string feature.

Possible syntax using a matchof keyword:

import { IJSONSchema, IJSONSchemaMap } from 'vs/base/common/jsonSchema';

export const UnscopedKeyPtn: string = '^[^\\[\\]]*$';

export type UnscopedKey = string & matchof RegExp(UnscopedKeyPtn);

export const tokenColorSchema: IJSONSchema = {
    properties: {},
    patternProperties: { [UnscopedKeyPtn]: { type: 'object' } }
};

export interface ITokenColors {
    [colorId: UnscopedKey]: string;
}

@sushruth

sushruth commented Jun 1, 2021

I just want to add to the need for this, because template literals do not behave the way one might expect:

type UnionType = {
    kind: `type1_${string}`,
    one: boolean;
} | {
    kind: `type1_${string}_again`,
    two: string;
}

const union: UnionType = {
//     ~~~~~ > Error here -
/**
Type '{ kind: "type1_123"; }' is not assignable to type 'UnionType'.
  Property 'two' is missing in type '{ kind: "type1_123"; }' but required in type '{ kind: `type1_${string}_again`; two: string; }'.ts(2322)
*/
    kind: 'type1_123',
}

This shows that template literals are not mutually exclusive: one can be a subset of another even when that is not the intention. A regex would let us put a $ at the end to denote the end of the string, which would help discriminate clearly between the constituent types of this union.
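The overlap can be seen with a single literal that satisfies both patterns. The sketch below also shows how two anchored regular expressions could, at runtime, separate the cases; the pattern names are made up for this example, and the negative lookahead is just one possible way to exclude the suffix.

```typescript
type First = `type1_${string}`;
type Second = `type1_${string}_again`;

const value = "type1_x_again";
const asFirst: First = value;   // accepted
const asSecond: Second = value; // also accepted: the two patterns overlap

// Anchored patterns can be made disjoint, for example by excluding
// the "_again" suffix from the first one.
const FIRST = /^type1_(?!.*_again$).*$/;
const SECOND = /^type1_.*_again$/;

console.log(FIRST.test("type1_123"), SECOND.test("type1_123"));
console.log(FIRST.test(value), SECOND.test(value));
```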

@ghost

ghost commented Jun 2, 2021

(CC @Igmat) It occurs to me that there's a leaning towards using regex tests as type literals in #6579, i.e.

type CssColor = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: CssColor = '#000000'; // OK

It seems that regexes are usually interpreted as values by the TS compiler. When used as a type, this usually throws an error that keeps types and values as distinct as possible. What do you think of:

  • using a *of keyword to cast regex values into a regex-validated type (maybe matchof)
  • having a keyword check for conditional types (maybe matches)
type CssColor = matchof /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: CssColor = '#000000'; // OK

Editing this to note something - the RegExp.prototype.test method can accept numbers and other non-string primitives. I think that's a neat feature. If people want to strictly validate strings, they can use an intersection type with string. 😄

TL;DR: regex literal types aren't intuitively and visibly types without explicit regex->type casting, can we propose that?

@Etheryte

Etheryte commented Jun 2, 2021

I'm not sure what the benefit of a separate keyword is here. There doesn't seem to be a case where it could be ambiguous whether the regex is used as a type or as a value, unless I'm missing something? I think #6579 (comment) and the replies below it already sketch out a syntax that hits the sweet spot of being both succinct and addressing all the use cases.

Regarding the intersection, the input to RegExp.prototype.test is always turned into a string first, so that seems superfluous.

@ghost

ghost commented Jun 2, 2021

Good to know about RegExp.prototype.test.

The ambiguity seems straightforward to me. As we know, TypeScript is a JS superset & regex values can be used as variables.

To me, a regex literal is just not an intuitive type - it doesn't imply "string that matches this regexp restriction". It's common convention to camelcase regex literals and add a "Regex" suffix, but that variable name convention as a type looks really ugly:

export const cssColorRegex: RegExp = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: cssColorRegex = '#000000'; // OK
//           ^ lc 👎 ^ two options:
//                   - A. use Regex for value clarity but type confusion or 
//                   - B. ditch Regex for unclear value name but clear type name

The original proposal does suggest JSON schemas which would use the regex as a type and a value (if implemented).

@Etheryte

Etheryte commented Jun 2, 2021

Perhaps I wasn't very clear, there doesn't seem to be a case where it would be ambiguous for the compiler whether a regex is a type or a value. Just as you can use string literals both as values and as types:

const foo = "literal"; // Used as a value
const bar: "literal" = foo; // Used as a type

The exact same approach can be applied for regex types without ambiguity.

@ghost

ghost commented Jun 2, 2021

My concern is that the regex means two different things in the two contexts - literal vs "returns true from RegExp.test method". The latter seems like a type system feature exclusively - it wouldn't be intuitive unless there's syntax to cast the regex into a type

@ghost

ghost commented Jun 5, 2021

There is also the issue of regex literals and regex types possibly being used as superclasses:

If all regex literals and type variables are cast into validators implicitly without a keyword, how do we use RegExp interfaces and regex literals with optional methods as an object type?

To me, context loss in #41160 (comment) is enough reason to add a keyword, but this is another reason. I'm unsure of the name I suggested but I do prefer the use of an explicit type cast.

@edazpotato

edazpotato commented Jul 10, 2021

I would love this! I've had tons of issues that could be easily solved with RegEx types.

For example, a very basic IETF language tag type that accepts strings like "en-GB" or "en-US" but rejects strings that don't match the casing correctly.
Using template literals (doesn't work):
image
How it could be done easily with RegEx types:

export type CountryCode = /^[a-z]{2}-[A-Z]{2}$/;

(I know that technically you can represent this sort of type, but it's just a simple example)

@matthew-dean

matthew-dean commented May 2, 2023

Oh, just a thought about this:

what if you tested for /^\d\d\d$/ instead of /^\d+$/?

These are different types, but maybe you meant something like this:

type FirstRegex = string & Pattern<typeof /^\d\d\d$/>
type SecondRegex = string & Pattern<typeof /^\d{3}$/>

const myValue: FirstRegex = '123'

function doSomething(s: SecondRegex) {
  console.log(s)
}
doSomething(myValue) // should not be an error

The key is that it should not matter how the regex is defined. They're the same type because both have the same sub-type constraint. Parsing systems often manage this by compiling regex and static strings to a common set of constraints. In other words, the following 3 are all the same type:

type One = string & ('a' | 'b')
type Two = string & Pattern<typeof /^[ab]$/>
type Three = string & Pattern<typeof /^[a-b]$/>

The fact that it's a regex shouldn't matter. The fact that the regex is written differently shouldn't matter. Regexes can be statically analyzed.

(Note: It’s not clear to me if a ^ or $ character should be needed to indicate a string is a “full match”. Probably? It depends on how one reasons about the problem.)
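As a small illustration of "the same type regardless of how the regex is written", the sketch below compares patterns by brute force over a bounded set of inputs. This is only evidence, not a proof: a compiler would presumably need a proper language-inclusion check on the underlying automata. `allStrings` and `agreeOn` are hypothetical helpers.

```typescript
// Enumerate every string over `alphabet` with length 0..maxLen.
function allStrings(alphabet: readonly string[], maxLen: number): string[] {
  const out: string[] = [""];
  let current: string[] = [""];
  for (let len = 1; len <= maxLen; len++) {
    const next: string[] = [];
    for (const prefix of current) {
      for (const ch of alphabet) next.push(prefix + ch);
    }
    out.push(...next);
    current = next;
  }
  return out;
}

// True when both patterns accept exactly the same inputs from the sample.
function agreeOn(a: RegExp, b: RegExp, inputs: readonly string[]): boolean {
  return inputs.every((s) => a.test(s) === b.test(s));
}

const inputs = allStrings(["0", "7", "a"], 4);
console.log(agreeOn(/^\d\d\d$/, /^\d{3}$/, inputs)); // true: same language
console.log(agreeOn(/^\d\d\d$/, /^\d+$/, inputs));   // false: "0" separates them
```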

@pstovik

pstovik commented May 9, 2023

Our use case is about consistent definition for product/service version property

  • we restrict the string to "[0-9]+.[0-9]+"
  • we use "typescript-json-schema" to generate schema from TypeScript (as source of truth) and we use ajv for JSON input validations (via that schema)
    Having the "regex feature" would give us "out of the box" restriction in TypeScript while developers are coding, plus a consistent JSON schema for runtime validation.

@TehShrike

Right, but presumably you want nominal types for all kinds of nominality, not just the subset of nominality that can be expressed with a regular expression

I don't believe most (any?) of my use cases overlap with nominal types.

I have a bunch of functions that take/return strings in the format YYYY-MM-DD. I don't care that they are all working off of the same definition of an ISO 8601 date string, I just care that they all match that pattern.

Same goes for decimal numbers stored in strings, or strings that represent country codes. I very rarely run into a case where I want nominal type with TypeScript (though I think it has happened once or twice, I don't think it had to do with strings).

Nominal types might incidentally solve some use cases (I'm probably motivated to only have one definition of country codes around my codebase), but it would be wholly inappropriate for others:

  • I want to have a bunch of generic/independent utilities that take/return ISO 8601 dates as strings
  • I want to be able to type financial-number so that it accepts only strings that match a given regular expression, no matter where those strings come from.

Incidentally, being able to use generics in regex return types would be sweet, so that financial_number.toString(4) could return type /^\d+\.\d{4}$/.

@ljharb
Contributor

ljharb commented Jun 21, 2023

I'd hope you care about more than the pattern, otherwise 2023-99-00 would be considered a valid date.

@TehShrike

I'd hope you care about more than the pattern, otherwise 2023-99-00 would be considered a valid date.

it's true, my current type is actually

type Months = '01' | '02' | '03' | '04' | '05' | '06' | '07' | '08' | '09' | '10' | '11' | '12'
type Days = '01' | '02' | '03' | '04' | '05' | '06' | '07' | '08' | '09' | '10' | '11' | '12' | '13' | '14' | '15' | '16' | '17' | '18' | '19' | '20' | '21' | '22' | '23' | '24' | '25' | '26' | '27' | '28' | '29' | '30' | '31'

type IsoDate = `${ number }${ number }${ number }${ number }-${ Months }-${ Days }`

😅

which could obviously be improved on with regular expressions. Even with regular expressions it would take some effort to make the type fully reflect valid dates on the Gregorian calendar, but I'll take what I can get.
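Neither the template literal type nor a plain `\d{4}-\d{2}-\d{2}` pattern rules out dates like February 31st. For completeness, here is a sketch of a runtime guard that checks both the shape and calendar validity; `CalendarDate` and `isCalendarDate` are hypothetical names, and the branded type is an assumption of this example rather than anything proposed in the thread.

```typescript
// Hypothetical brand for strings that passed the guard below.
type CalendarDate = string & { readonly __brand: "CalendarDate" };

const ISO_DATE = /^(\d{4})-(\d{2})-(\d{2})$/;

function isCalendarDate(s: string): s is CalendarDate {
  const m = ISO_DATE.exec(s);
  if (m === null) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  // Let Date normalize the fields; an invalid date rolls over into
  // a different year, month, or day and fails the comparison.
  const d = new Date(0);
  d.setUTCFullYear(year, month - 1, day);
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

for (const candidate of ["2023-02-28", "2024-02-29", "2023-02-31", "2023-99-00"]) {
  console.log(candidate, isCalendarDate(candidate));
}
```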

@ljharb
Contributor

ljharb commented Aug 9, 2023

Whose favorite day isn’t February 31st, after all

@shaedrich

Not sure but maybe, someone has a use case that can already be solved by this workaround:
https://mastodon.online/@dylhunn@towns.gay/109479824045137188

@oliveryasuna

oliveryasuna commented Oct 19, 2023

To address the checklist, assuming this is a compile-time check only:

  • This wouldn't be a breaking change in existing TypeScript/JavaScript code

As long as there is an explicit regex notation. For example, we could use /.

type IntegerString = `${/\d+/}`
  • This wouldn't change the runtime behavior of existing JavaScript code
  • This could be implemented without emitting different JS based on the types of the expressions
  • This isn't a runtime feature (e.g. library functionality, non-ECMAScript syntax with JavaScript output, etc.)

As long as this is a compile-time check only.


This would be an absurdly useful feature. Imagine how smart and type-safe fluent SQL libraries would become.

@saltman424

Another thing to add, this isn't just helpful for validation, but also for extracting information. E.g.

type Id<
  TVersion extends Id.Version = Id.Version,
  TPartialId extends Id.PartialId = Id.PartialId,
  TContext extends Id.Context | undefined = Id.Context | undefined
> = TContext extends undefined ? `${TVersion}:${TPartialId}` : `${TVersion}:${TContext}:${TPartialId}`
namespace Id {
  export type Version = /v\d+/
  export namespace Version {
    export type Of<TId extends Id> = TId extends Id<infer TVersion> ? TVersion : never
  }

  export type PartialId = /\w+/
  export namespace PartialId {
    export type Of<TId extends Id> = TId extends Id<any, infer TPartialId> ? TPartialId : never
  }

  export type Context = /\w+/
  export namespace Context {
    export type Of<TId extends Id> = TId extends Id<any, any, infer TContext> ? TContext : never
  }
}

type MyId = Id<'v1', 'myPartialId', 'myContext'> // 'v1:myContext:myPartialId'
type MyPartialId = Id.PartialId.Of<MyId> // 'myPartialId'

This can be done with just string instead of a regular expression, but that leads to ambiguity. In the above example, 'myContext:myPartialId' could be interpreted as a single Id.PartialId.
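The ambiguity is easy to reproduce with today's template literal inference, where a `string` placeholder happily absorbs further colons. The sketch below uses hypothetical names (`Split`, `split`) and mirrors the type-level behavior with a runtime function.

```typescript
// Split at the first ":" using template literal inference.
type Split<S extends string> =
  S extends `${infer Version}:${infer Rest}`
    ? { version: Version; rest: Rest }
    : never;

// The first placeholder stops at the first ":", so everything after it,
// including further colons, lands in `rest`.
const parsed: Split<"v1:myContext:myPartialId"> = {
  version: "v1",
  rest: "myContext:myPartialId",
};

// Runtime counterpart of the same split.
function split(id: string): { version: string; rest: string } | undefined {
  const at = id.indexOf(":");
  return at === -1
    ? undefined
    : { version: id.slice(0, at), rest: id.slice(at + 1) };
}

console.log(parsed.rest, split("v1:myContext:myPartialId"));
```

Nothing in the type says whether `rest` is a single partial id or a context followed by a partial id, which is the distinction a colon-free pattern such as `/\w+/` would provide.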

@tsujp

tsujp commented Nov 5, 2023

This constructs a literal string type containing only the allowed characters. If you attempt to pass invalid characters you get back never. This is fine for my usecase (albeit a lot more TypeScript than I'd like for something simple), maybe it will help others until this becomes a smoother experience in TypeScript.

type HexDigit =
   | 0
   | 1
   | 2
   | 3
   | 4
   | 5
   | 6
   | 7
   | 8
   | 9
   | 'a'
   | 'b'
   | 'c'
   | 'd'
   | 'e'
   | 'f'

// Construct a string type with all characters not in union `HexDigit` removed.
export type OnlyHexDigits<Str, Acc extends string = ''> =
   Str extends `${infer D extends HexDigit}${infer Rest}`
      ? OnlyHexDigits<Rest, `${Acc}${D}`>
      : Acc

// Return given type `Hex` IFF it was unchanged (and thus valid) by `OnlyHexDigits`.
export type HexIntLiteral<
   Hex,
   FilteredHex = OnlyHexDigits<Hex>
> =
   Hex extends FilteredHex
      ? Hex
      : never

// Effectively an alias of `HexIntLiteral<'123'>`.
function hexInt<Hex extends string> (n: Hex & HexIntLiteral<Hex>) {
   return n as HexIntLiteral<Hex>
}

// Without the 'alias' form.
declare const t1: HexIntLiteral<'123'> // '123'
declare const t2: HexIntLiteral<'cafebabe'> // 'cafebabe'

// Using the 'alias' form.
const t3 = hexInt('zzzz') // never
const t4 = hexInt('a_b_c_d') // never
const t5 = hexInt('9287319283712ababababdefffababa12312') // <-- that

// Remember, the type is a string literal so `let` is still (as far as TypeScript
//   is concerned) immutable (not _really_).
let t6 = hexInt('cafe123')

t6 = '123' // We (humans) know '123' is valid, but `t6` is a string literal `cafe123`
           //   so this is an error (2322): type '123' not assignable to type 'cafe123'
           //   because we construct a _string literal_ type.

This can likely be simplified but I waste a lot of time code golfing TypeScript types so I abstain this time.

@mauriziocescon

mauriziocescon commented Apr 26, 2024

My case:

const obj = {
  _test1: '1', 
  test2: '2',
  _test3: '3',
  test4: '4',
};

function removeKeysStartingWith_(obj: Record<string, unknown>): Record<string, unknown> {
  const x: Record<string, unknown> = {};

  Object.keys(obj)
    .filter(key => !/^_/i.test(key))
    .forEach(key => x[key] = obj[key]);

    return x;
}

// {"test2":"2", "test4":"4"} 

I cannot express the fact that the return object of a function cannot have keys starting with "_". I cannot define the precise keyof set without a RegExp (to be used in combination with conditional types).

@RyanCavanaugh
Member Author

@mauriziocescon template literal strings work fine for this; you don't need regexes

const obj1 = {
  _test1: '1', 
  test2: '2',
  _test3: '3'
};
type RemoveUnderscore<K> = K extends `_${string}` ? never : K;
type NoUnderscores<T> = {
    [K in keyof T as RemoveUnderscore<K>]: T[K];
}
declare function removeKeysStartingWith_<T extends object>(obj: T): NoUnderscores<T>; 
const p1 = removeKeysStartingWith_(obj1);
p1.test2; // ok
p1._test1; // not ok

@mauriziocescon

Thanks a lot for the instantaneous feedback! I missed that part... 😅

@Peeja
Contributor

Peeja commented Apr 26, 2024

@mauriziocescon Be careful, though: that type means that you don't know whether there are any keys beginning with _, not that you know there aren't. Without exact types, TypeScript can't express the latter. But the former is usually good enough.
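Putting the last few comments together, here is a runnable sketch of the key-remapping approach with an actual implementation (the version above is only a `declare`), followed by a demonstration of the caveat. The implementation details are an assumption of this example.

```typescript
type RemoveUnderscore<K> = K extends `_${string}` ? never : K;
type NoUnderscores<T> = { [K in keyof T as RemoveUnderscore<K>]: T[K] };

function removeKeysStartingWith_<T extends object>(obj: T): NoUnderscores<T> {
  const out: Record<string, unknown> = {};
  for (const key in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, key) && !/^_/.test(key)) {
      out[key] = obj[key];
    }
  }
  // The checker cannot follow the runtime filter, so the result type is asserted.
  return out as unknown as NoUnderscores<T>;
}

const cleaned = removeKeysStartingWith_({ _test1: "1", test2: "2", _test3: "3" });
console.log(Object.keys(cleaned)); // only "test2" survives

// The caveat: the mapped type omits underscore keys from the *type*,
// but a value of that type may still carry them at runtime.
const withExtra = { test2: "2", _secret: "x" };
const looksClean: NoUnderscores<{ test2: string }> = withExtra; // accepted
console.log("_secret" in looksClean);
```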

@saltman424

saltman424 commented Apr 26, 2024

@RyanCavanaugh

Use case

I would like to use this type:

type Word = /^\w+$/

I use this as a building block for many template strings. E.g.:

// I mainly don't want `TPartialId` to contain ':',
// as that would interfere with my ability to parse this string
type Id<
  TType extends Type = Type,
  TPartialId extends Word = Word
> = `${TType}:${TPartialId}`

Answers to some of your questions

I use this in a mix of static and dynamic use cases. E.g.

const validId: Id = 'sometype:valid'
// this should not be allowed
const invalidId: Id = 'sometype:invalid:'

declare function createId<TType extends Type, TPartialId extends Word>(
  type: TType,
  partialId: TPartialId
):  Id<TType, TPartialId>
declare function getPartialId<TId extends Id>(
  id: TId
): TId extends Id<any, infer TPartialId> ? TPartialId : Word

declare function generateWord(): Word

I absolutely want to use regular expression types in template literals (as seen in above examples). However, while it would be nice to have, I don't need to be able to use anything within my regular expression types. (e.g. I don't really need type X = /${Y}+/; type Y = 'abc')

I would appreciate the ability to do something like this:

const WORD_REGEXP = /^\w+$/
export type Word = Regex<typeof WORD_REGEXP>
export function isWord(val: unknown): val is Word {
  return typeof val === 'string' && WORD_REGEXP.test(val)
}

However, if I had to write the same regular expression twice, it would still be better than the current state.

I don't think the above part approaches nominal typing. At a high level, regular expression is basically a structural type for a string. You can determine if a string matches the regular expression solely based on the string's contents, ignoring any metadata about the string. With that being said, I do acknowledge that it is harder to determine if a type for a string matches a regular expression, which is where things get kind of nominal. Specifically, to your point:

There's also a problem of the implicit subtyping behavior you'd want here -- what if you tested for /^\d\d\d$/ instead of /^\d+$/? Programmers are very particular about what they think the "right" way to write a regex are, so the feature implies either implementing regex subtyping so that the subset behavior can be validated, or enduring endless flamewars in places like DT as people argue about which regex is the correct one for a given problem.

If you are within one project, you should create one type with whatever the "right" regex for that project is and reference that everywhere. If you are working with a library, you should use the type from that library. Either way, you shouldn't have to recreate a regular expression type in the way that you think is "right." And if you want to add additional restrictions, just use intersection. Although, I do recognize that without subtyping, things do get pretty nominal when determining if types match a regular expression. However, we currently deal with that type of problem with deferred evaluation of type parameters in functions/classes. So semi-nominal types in certain contexts don't seem to be a deal-breaker, although I do acknowledge deferred type parameters are never fun to deal with.

Most functions with implicit data formats aren't also publishing a canonical regex for their data format.

To be fair, the canonical regex doesn't generally matter externally at the moment. If it did matter externally, e.g. it was used in a type, they would be more likely to publish it

Alternative: template string enhancements

I do agree that enhancements to template strings could work. In my use case, these would be sufficient:

  1. Some way to repeat 0+ or 1+ times (maybe circular references - see below)
  2. Preferably, built in utility types for \w, \d, \s, and other similar RegExp features. (e.g. type Digit = '0' | '1' | '2' | ...)

With these, I could do something like:

type WordCharacter = 'a' | 'b' | ... (preferably this is built into TypeScript)
type Word = `${WordCharacter}${Word | ''}` // === /^\w+$/
type WordOrEmpty = Word | '' // === /^\w*$/

However, these would not work if I wanted to do this through negation, which I had thought about. E.g.:

type PartialId = /^[^:]+$/

If you like these enhancements, I can put them in proposals in one or more separate issues

@samueldcorbin

samueldcorbin commented Jun 30, 2024

To add a very straightforward use case to this: custom element names.

Custom element names must begin with a lowercase letter, must have a dash, and are frequently defined as string literals, not dynamically. This seems like something that TypeScript should absolutely be able to handle, it's easy for people to carelessly forget that the elements have to have a dash or must be lowercased, and it's annoying to only get it at runtime.

Sometimes people define custom element names dynamically, but they define them as literals often too. It would be nice if we could at least check the literals, even if we can't check the dynamic ones.
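Part of this can be approximated with template literal types today: a leading lowercase letter and the presence of a dash are expressible, while "no uppercase letters anywhere" is not. The sketch below is an approximation only. `CustomElementName`, `CUSTOM_ELEMENT_NAME`, and `elementName` are hypothetical names, and the runtime pattern is deliberately simpler than the HTML specification's full grammar (which, among other things, reserves a few names).

```typescript
type LowerLetter =
  | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m"
  | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z";

// Expressible today: starts with a lowercase letter and contains a dash.
type CustomElementName = `${LowerLetter}${string}-${string}`;

// Simplified runtime check (an assumption of this sketch, not the full spec):
// lowercase start, at least one dash, no uppercase letters or whitespace.
const CUSTOM_ELEMENT_NAME = /^[a-z][^A-Z\s]*-[^A-Z\s]*$/;

function elementName<N extends CustomElementName>(name: N): N {
  if (!CUSTOM_ELEMENT_NAME.test(name)) {
    throw new Error(`Invalid custom element name: ${name}`);
  }
  return name;
}

const tag = elementName("my-widget");
console.log(tag);

// elementName("widget");    // rejected at compile time: no dash
// elementName("My-widget"); // rejected at compile time: uppercase first letter
```

A name such as "my-Widget" still slips past the type and is only caught by the runtime check, which is the gap a regex-validated literal type would close.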

On the whole, the discussion of this proposal is extremely frustrating to read. The evaluation begins with "Checking string literals is easy and straightforward". Great. So why is adding an easy and straightforward thing being held up for literal years by discussion about maybe adding much less easy and much less straightforward things?

I understand the general sentiment that you want to be careful about making a simple syntax for the easy case that accidentally blocks future extension of functionality when you get to the hard cases, but that doesn't look like an issue here. Maybe capture groups would be useful, maybe dynamic strings would be useful. But adding support for string literals and regex without capture groups is easy and doesn't block adding support for dynamic strings and capture groups later.

@Oblarg

Oblarg commented Aug 18, 2024

Another use-case: dynamically-typed reducers for event-based programming:

image

With current template literals, it's a bit cumbersome to do this even for a simple prefix search, and generally unreliable/impossible to do anything much more complicated than that. It turns out this is not so difficult to do for arbitrary-depth substitution of a single wildcard character (see above), thanks to recursive types - but regex-validated string types would make this way more powerful, especially when topic lists are old and not ideally systematic.

(an aside: the fact that typescript can infer the reduction of the declared union for the tooltip here is pretty darn impressive, though it falls back to the flat payload union if you change some of the intermediate types to use captures rather than explicit generic parameters)
