Add string format validation with ASCII and UTF-8 support #8

Copilot · 2025-07-25T16:54:28Z

This PR implements high-performance string format validation by introducing a new format property to StringSchema with support for ascii and utf8 formats.

Changes Made

Core Implementation

- Added a format property to the StringSchema interface, supporting 'ascii' | 'utf8'
- Created high-performance validation utilities in src/util/stringFormats.ts:
  - isAscii(): Accepts only characters with codes 0-127, using direct character-code checks
  - isValidUtf8(): Validates proper UTF-16 sequences, detecting unpaired surrogates
  - validateStringFormat(): Main validation dispatcher
- Updated the StringType class to integrate format validation into the code generation pipeline
- Enhanced JSON Schema export to include regex patterns for ASCII format validation
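
A minimal sketch of what the three utilities could look like, assuming the signatures listed above ((str: string) => boolean for the checks and a (str, format) dispatcher). This is an illustration, not the code merged in this PR; the StringFormat alias is hypothetical, and the name isUtf8 follows the rename requested later in review.

```typescript
// Hypothetical alias for the two supported formats.
type StringFormat = 'ascii' | 'utf8';

// True if every UTF-16 code unit is in the range 0-127.
const isAscii = (str: string): boolean => {
  const length = str.length;
  for (let i = 0; i < length; i++) {
    if (str.charCodeAt(i) > 127) return false;
  }
  return true;
};

// True if the string contains no unpaired surrogates, i.e. it can be
// encoded losslessly as UTF-8.
const isUtf8 = (str: string): boolean => {
  const length = str.length;
  for (let i = 0; i < length; i++) {
    const code = str.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: must be immediately followed by a low surrogate.
      const next = i + 1 < length ? str.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++; // Skip the low half of the valid pair.
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
};

// Generic dispatcher: picks the checker for the requested format at run time.
const validateStringFormat = (str: string, format: StringFormat): boolean =>
  format === 'ascii' ? isAscii(str) : isUtf8(str);
```

The loops avoid regexes and allocations: each check is a single pass over the string's code units.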

Usage Examples

import {t} from '@jsonjoy.com/json-type';

// ASCII-only strings (a 'boolean' validator returns true when the value is invalid)
const asciiString = t.String({format: 'ascii'});
const validator = asciiString.validator('boolean');

console.log(validator('hello'));  // false (valid)
console.log(validator('héllo'));  // true (invalid - contains non-ASCII)

// UTF-8 strings (validates against unpaired surrogates)
const utf8String = t.String({format: 'utf8'});
const utf8Validator = utf8String.validator('boolean');

console.log(utf8Validator('🚀'));     // false (valid)
console.log(utf8Validator('中文'));    // false (valid)
console.log(utf8Validator('\uD800')); // true (invalid - unpaired surrogate)

Backward Compatibility

The existing ascii: boolean field continues to work unchanged. When both format and ascii are specified, format takes precedence.
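
The precedence rule can be expressed as a small resolver. This sketch assumes a schema object with optional format and ascii fields; the StringSchemaLike interface and the resolveFormat helper are hypothetical, not part of the library.

```typescript
// Hypothetical subset of the schema fields relevant to format checks.
interface StringSchemaLike {
  format?: 'ascii' | 'utf8';
  ascii?: boolean;
}

// An explicit format wins; otherwise the legacy ascii flag maps to 'ascii';
// otherwise no format check is applied.
const resolveFormat = (schema: StringSchemaLike): 'ascii' | 'utf8' | undefined => {
  if (schema.format) return schema.format;
  if (schema.ascii) return 'ascii';
  return undefined;
};
```

Under this rule, {format: 'utf8', ascii: true} resolves to 'utf8', so a string such as 'héllo' would be accepted.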

Performance Optimizations

- Direct character code checking for ASCII validation (no regex)
- Efficient surrogate pair validation for UTF-8
- Minimal function calls and memory allocations
- Integration with existing code generation pipeline via linkDependency

JSON Schema Integration

ASCII format strings export with proper regex patterns for validation:

{
  "type": "string",
  "pattern": "^[\\x00-\\x7F]*$"
}

UTF-8 format strings export as standard JSON Schema strings (UTF-8 is the default).
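
A sketch of the export rule, assuming a hypothetical toJsonSchema helper that receives the already-resolved format. Only 'ascii' adds a pattern; 'utf8' produces a plain string schema.

```typescript
// Hypothetical minimal JSON Schema shape for strings.
interface JsonSchemaString {
  type: 'string';
  pattern?: string;
}

// Maps a resolved string format to a JSON Schema fragment.
const toJsonSchema = (format?: 'ascii' | 'utf8'): JsonSchemaString => {
  const schema: JsonSchemaString = {type: 'string'};
  // ASCII is expressed as a regex over code points 0x00-0x7F.
  if (format === 'ascii') schema.pattern = '^[\\x00-\\x7F]*$';
  return schema;
};
```

Serialized with JSON.stringify, the 'ascii' result matches the fragment above, and new RegExp(pattern) accepts the same strings as a char-code check.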

Testing

- Added 12 comprehensive tests for validation utilities
- Added 13 integration tests for StringType functionality
- Updated existing snapshot tests to reflect new JSON Schema patterns
- All 477 tests pass with 100% backward compatibility

Fixes #7.


Co-authored-by: streamich <9773803+streamich@users.noreply.github.com>

streamich · 2025-07-25T17:50:08Z

src/util/stringFormats.ts

+ * - Unpaired surrogates (invalid UTF-16 sequences)
+ * - Characters that would produce invalid UTF-8
+ */
+export const isValidUtf8 = (str: string): boolean => {


Rename to isUtf8.

Renamed isValidUtf8 to isUtf8 for consistency. Updated in commit edb4224.
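
One way to sanity-check isUtf8 is to compare it against a built-in that rejects the same inputs: encodeURIComponent must UTF-8-encode its argument and throws a URIError on an unpaired surrogate. The sketch re-declares a compact isUtf8 for self-containment (not the merged implementation) and uses the built-in only as a test oracle; the canEncodeUtf8 name is hypothetical.

```typescript
// Compact surrogate-pair check over UTF-16 code units.
const isUtf8 = (str: string): boolean => {
  const length = str.length;
  for (let i = 0; i < length; i++) {
    const code = str.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      const next = i + 1 < length ? str.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++;
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      return false;
    }
  }
  return true;
};

// Oracle: encodeURIComponent throws URIError on unpaired surrogates.
// Too slow for validation, but fine for tests.
const canEncodeUtf8 = (str: string): boolean => {
  try {
    encodeURIComponent(str);
    return true;
  } catch (e) {
    if (e instanceof URIError) return false;
    throw e;
  }
};

// Both functions should agree on every sample.
const samples = ['hello', '中文', '🚀', '\uD800', '\uDC00', 'a\uD83D', '\uDE80\uD83D'];
for (const s of samples) {
  if (isUtf8(s) !== canEncodeUtf8(s)) throw new Error(`mismatch for ${JSON.stringify(s)}`);
}
```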

streamich · 2025-07-25T17:50:35Z

src/util/stringFormats.ts

+ * This is highly optimized for performance.
+ */
+export const isAscii = (str: string): boolean => {
+  for (let i = 0; i < str.length; i++) {


Extract const length = str.length.

Extracted const length = str.length for performance optimization in both isAscii and isUtf8 functions. Updated in commit edb4224.

streamich · 2025-07-25T17:51:29Z

src/type/classes/__tests__/StringType.format.spec.ts

+      expect(validator('abcde')).toBe(false); // Valid ASCII, correct length
+      expect(validator('a')).toBe(true); // Too short
+      expect(validator('abcdef')).toBe(true); // Too long
+      expect(validator('ñ')).toBe(true); // Non-ASCII (but would also be too short)


Add test when it is good length, but not ASCII.

Added the test case expect(validator('ñoño')).toBe(true) (good length, but not ASCII) to verify that non-ASCII strings are rejected even when the length is valid. Updated in commit edb4224.
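
To see why 'ñoño' is expected to fail, here is a hypothetical hand-written equivalent of the validator under test. It assumes length bounds of 3 to 5 (the spec's actual bounds are not shown in the hunk) and follows the 'boolean' validator convention from the usage examples: it returns true when the value is invalid.

```typescript
// Hypothetical bounds; the real test's min/max are not shown in the hunk.
const MIN = 3;
const MAX = 5;

const isAscii = (str: string): boolean => {
  const length = str.length;
  for (let i = 0; i < length; i++) if (str.charCodeAt(i) > 127) return false;
  return true;
};

// Returns true on error, matching the 'boolean' validator convention.
const validator = (value: unknown): boolean => {
  if (typeof value !== 'string') return true;
  if (value.length < MIN || value.length > MAX) return true;
  if (!isAscii(value)) return true;
  return false;
};
```

'ñoño' passes the length check (4 UTF-16 code units) and is then rejected by the ASCII check, so the validator reports an error.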

streamich · 2025-07-25T17:52:45Z

src/type/classes/StringType.ts

      }
    }
+
+    // Handle format validation


Remove trivial comments and whitespace inside functions.

Removed trivial comments and cleaned up whitespace in the format validation functions. Updated in commit edb4224.

streamich · 2025-07-25T17:53:35Z

src/type/classes/StringType.ts

+    } else if (ascii) {
+      // Backward compatibility: use ASCII validation if ascii=true and no format specified
+      const asciiErr = ctx.err(ValidationError.STR, path);
+      const validateFn = ctx.codegen.linkDependency(validateStringFormat);


Link isAscii here and call it:

Suggested change:

- const validateFn = ctx.codegen.linkDependency(validateStringFormat);
+ const validateFn = ctx.codegen.linkDependency(isAscii);

Updated to link isAscii dependency directly and call it without the format parameter, improving performance by avoiding the generic dispatcher function. Updated in commit edb4224.

streamich · 2025-07-25T17:54:22Z

src/type/classes/StringType.ts

+    // Handle format validation
+    if (format) {
+      const formatErr = ctx.err(ValidationError.STR, path);
+      const validateFn = ctx.codegen.linkDependency(validateStringFormat);


Link the isAscii and isUtf8 dependencies directly and generate code that calls the right one; the string format is known from the schema at codegen time.

Refactored to link isAscii and isUtf8 dependencies directly and generate code that calls the specific validation function based on the known format at codegen time. This eliminates the runtime format dispatch and improves performance. Updated in commit edb4224.
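
The idea can be illustrated without the library's codegen: because the format is known when the validator is generated, the emitted code calls one checker directly instead of passing a format string to a dispatcher at run time. This sketch uses new Function, with the checker passed in as a parameter, as a stand-in for ctx.codegen.linkDependency; compileFormatCheck and the generated code shape are hypothetical.

```typescript
type Format = 'ascii' | 'utf8';

const isAscii = (str: string): boolean => {
  const length = str.length;
  for (let i = 0; i < length; i++) if (str.charCodeAt(i) > 127) return false;
  return true;
};

const isUtf8 = (str: string): boolean => {
  const length = str.length;
  for (let i = 0; i < length; i++) {
    const code = str.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      const next = i + 1 < length ? str.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++;
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      return false;
    }
  }
  return true;
};

// Picks the checker once, at "codegen" time. The generated body has no
// format switch; it calls the linked function and returns true on error.
const compileFormatCheck = (format: Format): ((value: string) => boolean) => {
  const dep = format === 'ascii' ? isAscii : isUtf8;
  const factory = new Function('check', 'return function (value) { return !check(value); };');
  return factory(dep) as (value: string) => boolean;
};
```

The run-time cost per call is a single direct function call, which is the point of linking the specific dependency.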

streamich · 2025-07-25T17:55:46Z

When done, run yarn format and yarn lint and fix issues. Re-format codebase with yarn format:fix.


Copilot · 2025-07-25T18:08:11Z

When done, run yarn format and yarn lint and fix issues. Re-format codebase with yarn format:fix.

Applied code formatting with yarn format:fix and verified linting passes with yarn lint. All 477 tests continue to pass. Updated in commit eee42dc.

streamich · 2025-07-25T18:11:12Z

src/type/classes/__tests__/StringType.format.spec.ts

+      expect(validator('a')).toBe(true); // Too short
+      expect(validator('abcdef')).toBe(true); // Too long
+      expect(validator('ñ')).toBe(true); // Non-ASCII (but would also be too short)
+      expect(validator('ñoño')).toBe(true); // Good length, but not ASCII


It should be false.

I understand you want the expectation to be false, but could you clarify the intended behavior? Currently, 'ñoño' contains non-ASCII characters (ñ) so it fails ASCII format validation and correctly returns true (invalid). Should non-ASCII strings be considered valid for ASCII format when length constraints are present?

github-actions · 2025-07-26T10:06:41Z

🎉 This PR is included in version 1.3.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Initial plan

3eb913a

Copilot AI assigned Copilot and streamich Jul 25, 2025

Copilot started work on behalf of streamich July 25, 2025 16:54

Implement string format validation with ASCII and UTF-8 support

5129ec3

Co-authored-by: streamich <9773803+streamich@users.noreply.github.com>

Copilot AI changed the title from [WIP] String validation to Add string format validation with ASCII and UTF-8 support Jul 25, 2025

Copilot AI requested a review from streamich July 25, 2025 17:13

Copilot finished work on behalf of streamich July 25, 2025 17:13

streamich marked this pull request as ready for review July 25, 2025 17:23

streamich requested changes Jul 25, 2025

Copilot started work on behalf of streamich July 25, 2025 17:54

Address code review feedback: rename isValidUtf8 to isUtf8, optimize performance, link dependencies directly

edb4224

Co-authored-by: streamich <9773803+streamich@users.noreply.github.com>

Copilot AI requested a review from streamich July 25, 2025 18:03

Copilot finished work on behalf of streamich July 25, 2025 18:03

Copilot started work on behalf of streamich July 25, 2025 18:03

Apply code formatting with yarn format:fix

eee42dc

Co-authored-by: streamich <9773803+streamich@users.noreply.github.com>

Copilot finished work on behalf of streamich July 25, 2025 18:08

streamich requested changes Jul 25, 2025

Copilot started work on behalf of streamich July 25, 2025 18:12

Copilot AI requested a review from streamich July 25, 2025 18:22

Copilot finished work on behalf of streamich July 25, 2025 18:22

streamich approved these changes Jul 25, 2025

streamich merged commit 465cc7d into master Jul 25, 2025
3 checks passed

streamich deleted the copilot/fix-7 branch July 25, 2025 18:45

github-actions bot added the released label Jul 26, 2025
