Regex-validated string types (feedback reset) #41160
Comments
Use case 1, URL path building libraries, /*snip*/
createTestCard : f.route()
.append("/platform")
.appendParam(s.platform.platformId, /\d+/)
.append("/stripe")
.append("/test-card")
/*snip*/ These are the constraints for
Use case 2,
Use case 3, safer new(pattern: string, flags?: PatternOf</^[gimsuy]*$/>): RegExp
|
Template string type can only be used in conditional type, so it's really a "type validator", not a "type" itself. It also focuses more on manipulating strings, I think it's a different design goal from Regex-validated types. It's doable to use conditional types to constrain parameters, for example taken from #6579 (comment) declare function takesOnlyHex<StrT extends string> (
hexString : Accepts<HexStringLen6, StrT> extends true ? StrT : {__err : `${StrT} is not a hex-string of length 6`}
) : void; However I think this pattern has several issues:
|
Would this allow me to define type constraints for String to match the XML specification's Name constructs (short summary) and QNames by expressing them as regular expressions? If so, I am all for it :-) |
@AnyhowStep It isn't the cleanest, but with conditional types now allowing recursion, it seems we can accomplish these cases with template literal types: playground link |
We can have compile-time regular expressions now. (Well, non-feature when I'm trying to use TypeScript for work. All personal projects have |
We have a strongly-typed filesystem library, where the user is expected to manipulate "clean types" like export interface PathUtils {
cwd(): PortablePath;
normalize(p: PortablePath): PortablePath;
join(...paths: Array<PortablePath | Filename>): PortablePath;
resolve(...pathSegments: Array<PortablePath | Filename>): PortablePath;
isAbsolute(path: PortablePath): boolean;
relative(from: PortablePath, to: PortablePath): PortablePath;
dirname(p: PortablePath): PortablePath;
basename(p: PortablePath, ext?: string): Filename;
extname(p: PortablePath): string;
readonly sep: PortablePath;
readonly delimiter: string;
parse(pathString: PortablePath): ParsedPath<PortablePath>;
format(pathObject: FormatInputPathObject<PortablePath>): PortablePath;
contains(from: PortablePath, to: PortablePath): PortablePath | null;
} I'm investigating template literals to remove the
The overhead sounds overwhelming, and makes it likely that there are side effects that would cause problems down the road - causing further pain if we need to revert. Ideally, the solution we're looking for would leave the code above intact, we'd just declare |
I have a strong use case for Regex-validated string types. AWS Lambda function names have a maximum length of 64 characters. This can be manually checked in a character counter but it's unnecessarily cumbersome given that the function name is usually composed with identifying substrings. As an example, this function name can be partially composed with the new work done in 4.1/4.2. However there is no way to easily create a compiler error in TypeScript since the below function name will be longer than 64 characters. type LambdaServicePrefix = 'my-application-service';
type LambdaFunctionIdentifier = 'dark-matter-upgrader-super-duper-test-function';
type LambdaFunctionName = `${LambdaServicePrefix}-${LambdaFunctionIdentifier}`;
const lambdaFunctionName: LambdaFunctionName = 'my-application-service-dark-matter-upgrader-super-duper-test-function'; This StackOverflow Post I created was asking this very same question. With the continued rise of TypeScript in back-end related code, statically defined data would be a likely strong use case for validating the string length or the format of the string. |
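A length cap like this can be approximated today with a recursive conditional type, though it is far more awkward than a pattern such as /^.{1,64}$/ would be. The following is only a sketch of one possible workaround, not something proposed in this thread: `DropChars`, `MaxLength`, `ShortName`, `LongName`, and `lambdaName` are hypothetical names, and it assumes TypeScript 4.5 or later so that the recursion is tail-call optimized.

```typescript
// Hypothetical helper: removes the first N characters of S, one per step.
// Acc is a tuple used as a counter; its length is the number of characters dropped.
type DropChars<S extends string, N extends number, Acc extends unknown[] = []> =
  Acc["length"] extends N
    ? S
    : S extends `${string}${infer Rest}`
      ? DropChars<Rest, N, [...Acc, unknown]>
      : "";

// Resolves to S when the literal S has at most N characters, otherwise to never.
type MaxLength<S extends string, N extends number> =
  DropChars<S, N> extends "" ? S : never;

type LambdaServicePrefix = "my-application-service";
type ShortName = `${LambdaServicePrefix}-upgrader`;
type LongName = `${LambdaServicePrefix}-dark-matter-upgrader-super-duper-test-function`;

// Compile-time checks: the short name passes through unchanged,
// while the 69-character name collapses to never.
const shortFits: MaxLength<ShortName, 64> = "my-application-service-upgrader";
const longRejected: [MaxLength<LongName, 64>] extends [never] ? true : false = true;

// Hypothetical wrapper: a literal argument longer than 64 characters
// would be a compile error because the parameter type becomes never.
function lambdaName<S extends string>(name: S & MaxLength<S, 64>): S {
  return name;
}

const fnName = lambdaName("my-application-service-upgrader");
```

Only string literal types are checked here; a value typed as plain `string` passes through unvalidated, which is one reason a first-class pattern type would still be preferable.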
TypeScript supports literal types, template literal types, and enums. I think a string pattern type is a natural extension that allows for non-finite value restrictions to be expressed. I'm writing type definitions for an existing codebase. Many arguments and properties accept strings of a specific format:
|
I'd like to argue against @RyanCavanaugh's claim in the first post saying that:
As it stands presently TypeScript can't even work with the following type literal: type Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9;
type Just5Digits = `${Digit}${Digit}${Digit}${Digit}${Digit}`; Throwing an "Expression produces a union type that is too complex to represent.(2590)" error. That's the equivalent of the following regex: /^\d{5}$/ Just 5 digits in a row. Almost all useful regexes are more complicated than that, and TypeScript already gives up with that, hence I'd argue the opposite of that claim is true: a small number of use cases have been addressed and the progress with template literals has been mostly orthogonal really. |
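For comparison, the same five-digit constraint can be written as a validating conditional type instead of an enumerated union, which sidesteps error 2590 at the cost of requiring a generic helper at every use site. This is a sketch under assumptions: `DigitCount` and `zipCode` are hypothetical names, `Digit` is redefined here with string members, and `Just5Digits` becomes a generic validator rather than a plain type.

```typescript
type Digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";

// Counts the characters of S when every character is a digit.
// Yields -1 as soon as a non-digit (or a non-literal string) is seen.
type DigitCount<S extends string, Acc extends unknown[] = []> =
  S extends ""
    ? Acc["length"]
    : S extends `${infer Head}${infer Rest}`
      ? Head extends Digit
        ? DigitCount<Rest, [...Acc, unknown]>
        : -1
      : -1;

// Resolves to S only when S consists of exactly five digits.
type Just5Digits<S extends string> = DigitCount<S> extends 5 ? S : never;

// Hypothetical consumer: the intersection lets S be inferred from the argument
// and then rejects it when validation produced never.
function zipCode<S extends string>(code: S & Just5Digits<S>): S {
  return code;
}

const ok = zipCode("12345");

// @ts-expect-error "1234" has only four digits
zipCode("1234");

// @ts-expect-error "12a45" contains a non-digit
zipCode("12a45");
```

Note that this only validates literals passed through the generic function; there is still no standalone type that a variable annotation could use, which is the gap a regex type would fill.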
What about validation of JSON schema's Possible syntax using a import { IJSONSchema, IJSONSchemaMap } from 'vs/base/common/jsonSchema';
export const UnscopedKeyPtn: string = '^[^\\[\\]]*$';
export type UnscopedKey = string & matchof RegExp(UnscopedKeyPtn);
export tokenColorSchema: IJSONSchema = {
properties: {},
patternProperties: { [UnscopedKeyPtn]: { type: 'object' } }
};
export interface ITokenColors {
[colorId: UnscopedKey]: string;
} |
I just want to add to the need for this because template literals do not behave the way we think explicitly - type UnionType = {
kind: `kind_${string}`,
one: boolean;
} | {
kind: `kind_${string}_again`,
two: string;
}
const union: UnionType = {
// ~~~~~ > Error here -
/**
Type '{ kind: "type1_123"; }' is not assignable to type 'UnionType'.
Property 'two' is missing in type '{ kind: "type1_123"; }' but required in type '{ kind: `type1_${string}_again`; two: string; }'.ts(2322)
*/
kind: 'type1_123',
} this shows template literals are not unique and one can be a subset of another while that is not the intention of use. Regex would let us have a |
(CC @Igmat) It occurs to me that there's a leaning towards using regex tests as type literals in #6579, i.e. type CssColor = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: CssColor = '#000000'; // OK It seems that regexes are usually interpreted as values by the TS compiler. When used as a type, this usually throws an error that keeps types and values as distinct as possible. What do you think of:
type CssColor = matchof /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: CssColor = '#000000'; // OK
TL;DR: regex literal types aren't intuitively and visibly types without explicit regex->type casting. Can we propose that? |
I'm not sure what the benefit of a separate keyword is here. There doesn't seem to be a case where it could be ambiguous whether the regex is used as a type or as a value, unless I'm missing something? I think #6579 (comment) and the replies below it already sketch out a syntax that hits the sweet spot of being both succinct and addressing all the use cases. Regarding the intersection, the input to |
Good to know about The ambiguity seems straightforward to me. As we know, TypeScript is a JS superset & regex values can be used as variables. To me, a regex literal is just not an intuitive type - it doesn't imply "string that matches this regexp restriction". It's common convention to camelcase regex literals and add a "Regex" suffix, but that variable name convention as a type looks really ugly: export cssColorRegex: RegExp = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: cssColorRegex = '#000000'; // OK
// ^ lc 👎 ^ two options:
// - A. use Regex for value clarity but type confusion or
// - B. ditch Regex for unclear value name but clear type name The original proposal does suggest JSON schemas, which would use the regex as both a type and a value (if implemented). |
Perhaps I wasn't very clear, there doesn't seem to be a case where it would be ambiguous for the compiler whether a regex is a type or a value. Just as you can use string literals both as values and as types: const foo = "literal"; // Used as a value
const bar: "literal" = foo; // Used as a type The exact same approach can be applied for regex types without ambiguity. |
My concern is that the regex means two different things in the two contexts - literal vs "returns true from RegExp.test method". The latter seems like a type system feature exclusively - it wouldn't be intuitive unless there's syntax to cast the regex into a type |
There is also the issue of regex literals and regex types possibly being used as superclasses:
If all regex literals and type variables are cast into validators implicitly without a keyword, how do we use To me, context loss in #41160 (comment) is enough reason to add a keyword, but this is another reason. I'm unsure of the name I suggested but I do prefer the use of an explicit type cast. |
Oh, just a thought about this:
These are different types, but maybe you meant something like this: type FirstRegex = string & Pattern<typeof /^\d\d\d$/>
type SecondRegex = string & Pattern<typeof /^\d{3}$/>
const myValue: FirstRegex = '123'
function doSomething(s: SecondRegex) {
console.log(s)
}
doSomething(myValue) // should not be an error The key is that it should not matter how the regex is defined. They're the same type because both have the same sub-type constraint. Parsing systems often manage this by compiling regex and static strings to a common set of constraints. In other words, the following 3 are all the same type: type One = string & ('a' | 'b')
type Two = string & Pattern<typeof /^[ab]$/>
type Three = string & Pattern<typeof /^[a-b]$/> The fact that it's a regex shouldn't matter. The fact that the regex is written differently shouldn't matter. Regexes can be statically analyzed. (Note: It’s not clear to me if a |
Our use case is about consistent definition for product/service version property
|
I don't believe most (any?) of my use cases overlap with nominal types. I have a bunch of functions that take/return strings in the format YYYY-MM-DD. I don't care that they are all working off of the same definition of an ISO 8601 date string, I just care that they all match that pattern. Same goes for decimal numbers stored in strings, or strings that represent country codes. I very rarely run into a case where I want nominal type with TypeScript (though I think it has happened once or twice, I don't think it had to do with strings). Nominal types might incidentally solve some use cases (I'm probably motivated to only have one definition of country codes around my codebase), but it would be wholly inappropriate for others:
Incidentally, being able to use generics in regex return types would be sweet, so that |
I'd hope you care about more than the pattern, otherwise |
it's true, my current type is actually type Months = '01' | '02' | '03' | '04' | '05' | '06' | '07' | '08' | '09' | '10' | '11' | '12'
type Days = '01' | '02' | '03' | '04' | '05' | '06' | '07' | '08' | '09' | '10' | '11' | '12' | '13' | '14' | '15' | '16' | '17' | '18' | '19' | '20' | '21' | '22' | '23' | '24' | '25' | '26' | '27' | '28' | '29' | '30' | '31'
type IsoDate = `${ number }${ number }${ number }${ number }-${ Months }-${ Days }` 😅 which could obviously be improved on with regular expressions. Even with regular expressions it would take some effort to make the type fully reflect valid dates on the gregorian calendar, but I'll take what I can get. |
Whose favorite day isn’t February 31st, after all |
Not sure but maybe, someone has a use case that can already be solved by this workaround: |
To address the checklist, assuming this is a compile-time check only:
As long as there is an explicit regex notation. For example, we could use type IntegerString = `${/\d+/}`.
As long as this is a compile-time check only. This would be an absurdly useful feature. Imagine how smart and type-safe fluent SQL libraries would become. |
Another thing to add, this isn't just helpful for validation, but also for extracting information. E.g. type Id<
TVersion extends Id.Version = Id.Version,
TPartialId extends Id.PartialId = Id.PartialId,
TContext extends Id.Context | undefined = Id.Context | undefined
> = TContext extends undefined ? `${TVersion}:${TPartialId}` : `${TVersion}:${TContext}:${TPartialId}`
namespace Id {
export type Version = /v\d+/
export namespace Version {
export type Of<TId extends Id> = TId extends Id<infer TVersion> ? TVersion : never
}
export type PartialId = /\w+/
export namespace PartialId {
export type Of<TId extends Id> = TId extends Id<any, infer TPartialId> ? TPartialId : never
}
export type Context = /\w+/
export namespace Context {
export type Of<TId extends Id> = TId extends Id<any, any, infer TContext> ? TContext : never
}
}
type MyId = Id<'v1', 'myPartialId', 'myContext'> // 'v1:myContext:myPartialId'
type MyPartialId = Id.PartialId.Of<MyId> // 'myPartialId' This can be done with just |
This constructs a literal string type containing only the allowed characters. If you attempt to pass invalid characters you get back type HexDigit =
| 0
| 1
| 2
| 3
| 4
| 5
| 6
| 7
| 8
| 9
| 'a'
| 'b'
| 'c'
| 'd'
| 'e'
| 'f'
// Construct a string type with all characters not in union `HexDigit` removed.
export type OnlyHexDigits<Str, Acc extends string = ''> =
Str extends `${infer D extends HexDigit}${infer Rest}`
? OnlyHexDigits<Rest, `${Acc}${D}`>
: Acc
// Return given type `Hex` IFF it was unchanged (and thus valid) by `OnlyHexDigits`.
export type HexIntLiteral<
Hex,
FilteredHex = OnlyHexDigits<Hex>
> =
Hex extends FilteredHex
? Hex
: never
// Effectively an alias of `HexIntLiteral<'123'>`.
function hexInt<Hex extends string> (n: Hex & HexIntLiteral<Hex>) {
return n as HexIntLiteral<Hex>
}
// Without the 'alias' form.
declare const t1: HexIntLiteral<'123'> // '123'
declare const t2: HexIntLiteral<'cafebabe'> // 'cafebabe'
// Using the 'alias' form.
const t3 = hexInt('zzzz') // never
const t4 = hexInt('a_b_c_d') // never
const t5 = hexInt('9287319283712ababababdefffababa12312') // <-- that
// Remember, the type is a string literal so `let` is still (as far as TypeScript
// is concerned) immutable (not _really_).
let t6 = hexInt('cafe123')
t6 = '123' // We (humans) know '123' is valid, but `t6` is a string literal `cafe123`
// so this is an error (2322): type '123' not assignable to type 'cafe123'
// because we construct a _string literal_ type. This can likely be simplified but I waste a lot of time code golfing TypeScript types so I abstain this time. |
My case: const obj = {
_test1: '1',
test2: '2',
_test3: '3',
test4: '4',
};
function removeKeysStartingWith_(obj: Record<string, unknown>): Record<string, unknown> {
const x: Record<string, unknown> = {};
Object.keys(obj)
.filter(key => !/^_/i.test(key))
.forEach(key => x[key] = obj[key]);
return x;
}
// {"test2":"2", "test4":"4"} I cannot express the fact that the return object of a function cannot have keys starting with "_". I cannot define the precise keyof set without a RegExp (to be used in combination with conditional types). |
@mauriziocescon template literal strings work fine for this; you don't need regexes const obj1 = {
_test1: '1',
test2: '2',
_test3: '3'
};
type RemoveUnderscore<K> = K extends `_${string}` ? never : K;
type NoUnderscores<T> = {
[K in keyof T as RemoveUnderscore<K>]: T[K];
}
declare function removeKeysStartingWith_<T extends object>(obj: T): NoUnderscores<T>;
const p1 = removeKeysStartingWith_(obj1);
p1.test2; // ok
p1._test1; // not ok |
Thanks a lot for the instantaneous feedback! I missed that part... 😅 |
@mauriziocescon Be careful, though: that type means that you definitely do not know whether any keys beginning with |
Use case: I would like to use this type: type Word = /^\w+$/ I use this as a building block for many template strings. E.g.: // I mainly don't want `TPartialId` to contain ':',
// as that would interfere with my ability to parse this string
type Id<
TType extends Type,
TPartialId extends Word
> = `${TType}:${TPartialId}` Answers to some of your questions: I use this in a mix of static and dynamic use cases. E.g. const validId: Id = 'sometype:valid'
// this should not be allowed
const invalidId: Id = 'sometype:invalid:'
declare function createId<TType extends Type, TPartialId extends Word>(
type: TType,
partialId: TPartialId
): Id<TType, TPartialId>
declare function getPartialId<TId extends Id>(
id: TId
): TId extends Id<any, infer TPartialId> ? TPartialId : Word
declare function generateWord(): Word I absolutely want to use regular expression types in template literals (as seen in above examples). However, while it would be nice to have, I don't need to be able to use anything within my regular expression types. (e.g. I don't really need I would appreciate the ability to do something like this: const WORD_REGEXP = /^\w+$/
export type Word = Regex<typeof WORD_REGEXP>
export function isWord(val: unknown): val is Word {
return typeof val === 'string' && WORD_REGEXP.test(val)
} However, if I had to write the same regular expression twice, it would still be better than the current state. I don't think the above part approaches nominal typing. At a high level, regular expression is basically a structural type for a string. You can determine if a string matches the regular expression solely based on the string's contents, ignoring any metadata about the string. With that being said, I do acknowledge that it is harder to determine if a type for a string matches a regular expression, which is where things get kind of nominal. Specifically, to your point:
If you are within one project, you should create one type with whatever the "right" regex for that project is and reference that everywhere. If you are working with a library, you should use the type from that library. Either way, you shouldn't have to recreate a regular expression type in the way that you think is "right." And if you want to add additional restrictions, just use an intersection. That said, I do recognize that without subtyping, things do get pretty nominal when determining if types match a regular expression. However, we currently deal with that type of problem with deferred evaluation of type parameters in functions/classes. So semi-nominal types in certain contexts don't seem to be a deal-breaker, although I do acknowledge deferred type parameters are never fun to deal with.
To be fair, the canonical regex doesn't generally matter externally at the moment. If it did matter externally, e.g. it was used in a type, they would be more likely to publish it. Alternative: template string enhancements. I do agree that enhancements to template strings could work. In my use case, these would be sufficient:
With these, I could do something like: type WordCharacter = 'a' | 'b' | ... (preferably this is built into TypeScript)
type Word = `${WordCharacter}${Word | ''}` // === /^\w+$/
type WordOrEmpty = Word | '' // === /^\w*$/ However, these would not work if I wanted to do this through negation, which I had thought about. E.g.: type PartialId = /^[^:]+$/ If you like these enhancements, I can put them in proposals in one or more separate issues |
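Until something like `Regex<typeof WORD_REGEXP>` exists, the closest thing that compiles today is probably a branded string narrowed by a runtime type guard. The sketch below is an assumption-laden workaround rather than the feature requested above: the `__brand` property, `toWord`, and `makeId` are hypothetical, and the brand is nominal, so the compiler trusts the guard instead of checking literals against the pattern.

```typescript
const WORD_REGEXP = /^\w+$/;

// Hypothetical brand: a string that has passed the WORD_REGEXP check at runtime.
// The __brand property exists only at the type level.
type Word = string & { readonly __brand: "Word" };

// Type guard: the regex is written once and drives the narrowing.
function isWord(val: unknown): val is Word {
  return typeof val === "string" && WORD_REGEXP.test(val);
}

// Hypothetical helper that validates eagerly and throws on failure.
function toWord(val: string): Word {
  if (!isWord(val)) {
    throw new TypeError(`"${val}" does not match ${WORD_REGEXP}`);
  }
  return val;
}

// Both parts are known not to contain ':', so the result can be split safely later.
function makeId(type: Word, partialId: Word): string {
  return `${type}:${partialId}`;
}

const id = makeId(toWord("sometype"), toWord("valid"));
```

The trade-off is that `const w: Word = 'valid'` is rejected even though the literal matches, because a brand cannot be checked structurally; that is the part a real regex type would improve.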
To add a very straightforward use case to this: custom element names. Custom element names must begin with a lowercase letter, must have a dash, and are frequently defined as string literals, not dynamically. This seems like something that TypeScript should absolutely be able to handle, it's easy for people to carelessly forget that the elements have to have a dash or must be lowercased, and it's annoying to only get it at runtime. Sometimes people define custom element names dynamically, but they define them as literals often too. It would be nice if we could at least check the literals, even if we can't check the dynamic ones. On the whole, the discussion of this proposal is extremely frustrating to read. The evaluation begins with "Checking string literals is easy and straightforward". Great. So why is adding an easy and straightforward thing being held up for literal years by discussion about maybe adding much less easy and much less straightforward things? I understand the general sentiment that you want to be careful about making a simple syntax for the easy case that accidentally blocks future extension of functionality when you get to the hard cases, but that doesn't look like an issue here. Maybe capture groups would be useful, maybe dynamic strings would be useful. But adding support for string literals and regex without capture groups is easy and doesn't block adding support for dynamic strings and capture groups later. |
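For the literal-only case described here, template literal types can already express a rough version of the rule. This is a sketch, not the full custom element name grammar from the HTML specification: it only checks the three properties mentioned above (starts with a lowercase letter, contains a dash, has no uppercase ASCII letters), and `LowerLetter`, `CustomElementName`, and `elementName` are hypothetical names.

```typescript
type LowerLetter =
  | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m"
  | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z";

// Resolves to S when S starts with a lowercase letter, contains a dash,
// and is unchanged by Lowercase (so it has no uppercase letters).
type CustomElementName<S extends string> =
  S extends `${LowerLetter}${string}-${string}`
    ? S extends Lowercase<S>
      ? S
      : never
    : never;

// Hypothetical wrapper that a call such as customElements.define could go through.
function elementName<S extends string>(name: S & CustomElementName<S>): S {
  return name;
}

const tag = elementName("my-widget");

// @ts-expect-error no dash in the name
elementName("mywidget");

// @ts-expect-error uppercase letter in the name
elementName("my-Widget");
```

Names computed at runtime have type `string` and are rejected by this helper, so dynamic definitions would still need a cast or a separate unchecked code path.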
Another use-case: dynamically-typed reducers for event-based programming:
(an aside: the fact that typescript can infer the reduction of the declared union for the tooltip here is pretty darn impressive, though it falls back to the flat payload union if you change some of the intermediate types to use captures rather than explicit generic parameters) |
This is a pickup of #6579. With the addition of #40336, a large number of those use cases have been addressed, but possibly some still remain.
Update 2023-04-11: Reviewed use cases and posted a write-up of our current evaluation
Search Terms
regex string types
Suggestion
Open question: For people who had upvoted #6579, what use cases still need addressing?
Note: Please keep discussion on-topic; moderation will be a bit heavier to avoid off-topic tangents
Examples
(please help)
Checklist
My suggestion meets these guidelines: