-
-
Notifications
You must be signed in to change notification settings - Fork 1
Add string format validation with ASCII and UTF-8 support #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: streamich <9773803+streamich@users.noreply.github.com>
src/util/stringFormats.ts
Outdated
| * - Unpaired surrogates (invalid UTF-16 sequences) | ||
| * - Characters that would produce invalid UTF-8 | ||
| */ | ||
| export const isValidUtf8 = (str: string): boolean => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to isUtf8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed isValidUtf8 to isUtf8 for consistency. Updated in commit edb4224.
src/util/stringFormats.ts
Outdated
| * This is highly optimized for performance. | ||
| */ | ||
| export const isAscii = (str: string): boolean => { | ||
| for (let i = 0; i < str.length; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract const length = str.length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted const length = str.length for performance optimization in both isAscii and isUtf8 functions. Updated in commit edb4224.
| expect(validator('abcde')).toBe(false); // Valid ASCII, correct length | ||
| expect(validator('a')).toBe(true); // Too short | ||
| expect(validator('abcdef')).toBe(true); // Too long | ||
| expect(validator('ñ')).toBe(true); // Non-ASCII (but would also be too short) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add test when it is good length, but not ASCII.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added test case expect(validator('ñoño')).toBe(true); // Good length, but not ASCII to verify non-ASCII strings are properly rejected even when length is valid. Updated in commit edb4224.
src/type/classes/StringType.ts
Outdated
| } | ||
| } | ||
|
|
||
| // Handle format validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trivial comments and whitespace inside functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed trivial comments and cleaned up whitespace in the format validation functions. Updated in commit edb4224.
src/type/classes/StringType.ts
Outdated
| } else if (ascii) { | ||
| // Backward compatibility: use ASCII validation if ascii=true and no format specified | ||
| const asciiErr = ctx.err(ValidationError.STR, path); | ||
| const validateFn = ctx.codegen.linkDependency(validateStringFormat); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link isAscii here and call it:
| const validateFn = ctx.codegen.linkDependency(validateStringFormat); | |
| const validateFn = ctx.codegen.linkDependency(isAscii); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to link isAscii dependency directly and call it without the format parameter, improving performance by avoiding the generic dispatcher function. Updated in commit edb4224.
src/type/classes/StringType.ts
Outdated
| // Handle format validation | ||
| if (format) { | ||
| const formatErr = ctx.err(ValidationError.STR, path); | ||
| const validateFn = ctx.codegen.linkDependency(validateStringFormat); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link isAscii and isUtf8 dependencies directly and generate the code calling the right dependency, you know (according to schema) the string format at codegen time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to link isAscii and isUtf8 dependencies directly and generate code that calls the specific validation function based on the known format at codegen time. This eliminates the runtime format dispatch and improves performance. Updated in commit edb4224.
|
When done, run |
…performance, link dependencies directly Co-authored-by: streamich <9773803+streamich@users.noreply.github.com>
Co-authored-by: streamich <9773803+streamich@users.noreply.github.com>
Applied code formatting with |
| expect(validator('a')).toBe(true); // Too short | ||
| expect(validator('abcdef')).toBe(true); // Too long | ||
| expect(validator('ñ')).toBe(true); // Non-ASCII (but would also be too short) | ||
| expect(validator('ñoño')).toBe(true); // Good length, but not ASCII |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you want the expectation to be false, but could you clarify the intended behavior? Currently, 'ñoño' contains non-ASCII characters (ñ) so it fails ASCII format validation and correctly returns true (invalid). Should non-ASCII strings be considered valid for ASCII format when length constraints are present?
|
🎉 This PR is included in version 1.3.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
This PR implements high-performance string format validation by introducing a new
formatproperty toStringSchemawith support forasciiandutf8formats.Changes Made
Core Implementation
formatproperty toStringSchemainterface supporting'ascii' | 'utf8'src/util/stringFormats.ts:isAscii(): Validates characters 0-127 only using direct character code checkingisValidUtf8(): Validates proper UTF-16 sequences, detecting unpaired surrogatesvalidateStringFormat(): Main validation dispatcherStringTypeclass to integrate format validation into the code generation pipelineUsage Examples
Backward Compatibility
The existing
ascii: booleanfield continues to work unchanged. When bothformatandasciiare specified,formattakes precedence.Performance Optimizations
linkDependencyJSON Schema Integration
ASCII format strings export with proper regex patterns for validation:
{ "type": "string", "pattern": "^[\\x00-\\x7F]*$" }UTF-8 format strings export as standard JSON Schema strings (UTF-8 is the default).
Testing
Fixes #7.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.