Contributor

Copilot AI commented Aug 1, 2025

This PR implements a complete Apache Avro binary encoder for the json-pack library, following the Apache Avro 1.12.0 specification.

Implementation

The implementation includes three main components:

1. AvroSchemaValidator

A schema validator that checks that Avro schemas are well-formed and that values conform to them:

const validator = new AvroSchemaValidator();
const schema = {
  type: 'record',
  name: 'User',
  fields: [
    {name: 'id', type: 'int'},
    {name: 'name', type: 'string'}
  ]
};

validator.validateSchema(schema); // true
validator.validateValue({id: 123, name: 'John'}, schema); // true
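To make the value check concrete, here is a minimal sketch of recursive value validation against a record schema. It is not the library's AvroSchemaValidator: the `Schema` type, the `primitiveChecks` table and `validateValue` are hypothetical and cover only a handful of Avro types (no unions, enums, arrays, maps, fixed, or named-type references). The `int` check enforces the signed 32-bit range that Avro's `int` requires.

```typescript
// Hypothetical, deliberately small schema model (not the library's types).
type Primitive = 'null' | 'boolean' | 'int' | 'long' | 'float' | 'double' | 'string';
type Schema = Primitive | { type: 'record'; name: string; fields: { name: string; type: Schema }[] };

const INT_MIN = -(2 ** 31);
const INT_MAX = 2 ** 31 - 1;

// One predicate per primitive type.
const primitiveChecks: Record<Primitive, (v: unknown) => boolean> = {
  null: (v) => v === null,
  boolean: (v) => typeof v === 'boolean',
  int: (v) => typeof v === 'number' && Number.isInteger(v) && v >= INT_MIN && v <= INT_MAX,
  long: (v) => (typeof v === 'number' && Number.isSafeInteger(v)) || typeof v === 'bigint',
  float: (v) => typeof v === 'number',
  double: (v) => typeof v === 'number',
  string: (v) => typeof v === 'string',
};

function validateValue(value: unknown, schema: Schema): boolean {
  if (typeof schema === 'string') return primitiveChecks[schema](value);
  // Record: the value must be a plain object and every declared field must validate.
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return false;
  const obj = value as Record<string, unknown>;
  const fields = schema.fields;
  for (let i = 0; i < fields.length; i++) {
    if (!validateValue(obj[fields[i].name], fields[i].type)) return false;
  }
  return true;
}
```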

2. AvroEncoder

A basic Avro binary encoder implementing the BinaryJsonEncoder interface for encoding values without schema validation:

const writer = new Writer();
const encoder = new AvroEncoder(writer);
const encoded = encoder.encode({name: 'test', value: 42});

3. AvroSchemaEncoder

A schema-aware encoder that validates values against schemas before encoding:

const schemaEncoder = new AvroSchemaEncoder(writer);
const encoded = schemaEncoder.encode(userData, userSchema);
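As a worked example of what a schema-aware encoder has to produce: Avro encodes a record as the concatenation of its field encodings in schema order, with no field names or tags. The sketch below is hypothetical (not the library's AvroSchemaEncoder), handles only `int` and `string` fields, and uses `TextEncoder` purely for brevity.

```typescript
type Field = { name: string; type: 'int' | 'string' };

// 32-bit zigzag (0,-1,1,-2 -> 0,1,2,3) followed by little-endian base-128 varint.
function zigzagVarint(n: number, out: number[]): void {
  let z = ((n << 1) ^ (n >> 31)) >>> 0;
  while (z > 0x7f) {
    out.push((z & 0x7f) | 0x80);
    z >>>= 7;
  }
  out.push(z);
}

function encodeRecord(value: Record<string, unknown>, fields: Field[]): Uint8Array {
  const out: number[] = [];
  const utf8 = new TextEncoder(); // for brevity only
  for (let i = 0; i < fields.length; i++) {
    const f = fields[i];
    const v = value[f.name];
    if (f.type === 'int') {
      zigzagVarint(v as number, out);
    } else {
      const bytes = utf8.encode(v as string);
      // The length is an Avro long; 32-bit zigzag gives identical bytes below 2^30.
      zigzagVarint(bytes.length, out);
      for (let j = 0; j < bytes.length; j++) out.push(bytes[j]);
    }
  }
  return Uint8Array.from(out);
}
```

For `{id: 123, name: 'John'}` this yields `F6 01 08 4A 6F 68 6E`: zigzag(123) = 246 needs two varint bytes, then 08 is the zigzagged length 4, followed by the UTF-8 bytes of "John".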

Key Features

  • Complete Avro binary encoding: Supports all Avro types including primitives, records, enums, arrays, maps, unions, and fixed-length data
  • Proper encoding semantics: Uses zigzag encoding for signed integers, IEEE 754 for floats, variable-length encoding for lengths, and UTF-8 for strings
  • Schema validation: Comprehensive validation of both schemas and values against schemas
  • Type safety: Full TypeScript support with detailed type definitions for all Avro schema types
  • Error handling: Clear error messages for invalid schemas and non-conforming values
  • Union type support: Automatic type detection or explicit index selection for union types
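The encoding rules listed above can be illustrated with a small sketch that follows the Avro specification directly rather than the library's code. `writeLong`, `writeDouble` and `writeNullableLong` are hypothetical helpers; `long` values are taken as `bigint` so the 64-bit zigzag is exact, which assumes an ES2020 or later target. Per the spec, string/bytes lengths and array/map block counts are themselves `long`s, so they go through the same zigzag varint, and a union is written as its zero-based branch index (a `long`) followed by the value.

```typescript
// 64-bit zigzag: (n << 1) ^ (n >> 63), read as unsigned, then base-128 varint.
function writeLong(n: bigint, out: number[]): void {
  let z = BigInt.asUintN(64, (n << 1n) ^ (n >> 63n));
  while (z > 0x7fn) {
    out.push(Number(z & 0x7fn) | 0x80);
    z >>= 7n;
  }
  out.push(Number(z));
}

// Avro double: 8 bytes, IEEE 754, little-endian.
function writeDouble(x: number, out: number[]): void {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x, true);
  for (let i = 0; i < 8; i++) out.push(view.getUint8(i));
}

// Hypothetical union ["null", "long"]: branch index first, then the branch's encoding.
function writeNullableLong(v: bigint | null, out: number[]): void {
  if (v === null) {
    writeLong(0n, out); // branch 0 ("null") has no payload
  } else {
    writeLong(1n, out); // branch 1 ("long")
    writeLong(v, out);
  }
}
```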

Testing

Added 112 comprehensive test cases covering:

  • All primitive and complex Avro types
  • Schema validation edge cases
  • Value validation against schemas
  • Binary encoding correctness
  • Error conditions and edge cases

All existing tests continue to pass, indicating no regressions.

Fixes #41.



Co-authored-by: streamich <9773803+streamich@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement Apache Avro encoder" to "feat: implement Apache Avro encoder with schema validation" Aug 1, 2025
Copilot AI requested a review from streamich August 1, 2025 14:26
Copilot finished work on behalf of streamich August 1, 2025 14:26
@streamich streamich marked this pull request as ready for review August 1, 2025 18:49
Comment on lines 93 to 95
this.writer.ensureCapacity(4);
this.writer.view.setFloat32(this.writer.x, float, true); // little-endian
this.writer.move(4);
Collaborator

De-reference this.writer only once:

const writer = this.writer;

Collaborator

Do it everywhere in the code: de-reference pointers only once, where possible. Performance is the number one feature of this library.

Contributor Author

Done. Optimized all performance-critical methods to cache this.writer in local variables, reducing property access overhead. Commit d6842c4

Contributor Author

Implemented throughout the codebase. All writer access patterns in hot paths now use local variable caching for better performance. Commit d6842c4
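The de-referencing pattern requested in this thread can be sketched with a small stand-in writer. `MiniWriter` is hypothetical and mirrors only the members visible in the reviewed snippet (`x`, `view`, `ensureCapacity()`, `move()`); the library's actual Writer is richer. `writer` is read from `this` once, but `writer.view` is read only after `ensureCapacity()`, because growing the buffer replaces the view in this sketch.

```typescript
// Hypothetical stand-in for the library's Writer.
class MiniWriter {
  uint8 = new Uint8Array(16);
  view = new DataView(this.uint8.buffer);
  x = 0; // write cursor

  ensureCapacity(size: number): void {
    if (this.x + size <= this.uint8.length) return;
    const next = new Uint8Array(Math.max(this.uint8.length * 2, this.x + size));
    next.set(this.uint8); // keep bytes already written
    this.uint8 = next;
    this.view = new DataView(next.buffer);
  }

  move(size: number): void {
    this.x += size;
  }
}

class FloatEncoder {
  constructor(public readonly writer: MiniWriter) {}

  writeFloat(float: number): void {
    const writer = this.writer; // de-reference this.writer once
    writer.ensureCapacity(4);
    writer.view.setFloat32(writer.x, float, true); // Avro float: IEEE 754, little-endian
    writer.move(4);
  }
}
```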

* Writes an Avro string value with UTF-8 encoding and length prefix.
*/
public writeStr(str: string): void {
const bytes = new TextEncoder().encode(str);
Collaborator

DO NOT use TextEncoder; it is very slow. The Writer has fast string encoding methods; see also the other codec implementations.

Contributor Author

Replaced TextEncoder with Writer's native utf8() method for much faster string encoding. The implementation now reserves space for the length prefix, writes the string, then fills in the actual byte count. Commit d6842c4
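The reserve-then-backfill approach described in this reply can be sketched as follows. Because the Avro length prefix is a zigzag varint whose size depends on the final byte count, this hypothetical `writeStr` reserves room for the worst case (3 UTF-8 bytes per UTF-16 code unit), encodes the string by hand without `TextEncoder`, writes the real prefix, and shifts the string bytes left if the prefix turned out shorter. It assumes the caller has already ensured `buf` has room for the reserve plus `3 * str.length` bytes; it illustrates the idea and is not the library's `utf8()` implementation.

```typescript
// Hand-rolled UTF-8 encoder; returns the number of bytes written at pos.
// Lone surrogates become 3-byte sequences (TextEncoder would emit U+FFFD instead).
function utf8Write(buf: Uint8Array, pos: number, str: string): number {
  const start = pos;
  const length = str.length;
  for (let i = 0; i < length; i++) {
    let code = str.charCodeAt(i);
    if (code < 0x80) {
      buf[pos++] = code;
    } else if (code < 0x800) {
      buf[pos++] = 0xc0 | (code >> 6);
      buf[pos++] = 0x80 | (code & 0x3f);
    } else {
      if (code >= 0xd800 && code <= 0xdbff && i + 1 < length) {
        const next = str.charCodeAt(i + 1);
        if (next >= 0xdc00 && next <= 0xdfff) {
          // Surrogate pair -> one 4-byte code point.
          code = 0x10000 + ((code - 0xd800) << 10) + (next - 0xdc00);
          i++;
          buf[pos++] = 0xf0 | (code >> 18);
          buf[pos++] = 0x80 | ((code >> 12) & 0x3f);
          buf[pos++] = 0x80 | ((code >> 6) & 0x3f);
          buf[pos++] = 0x80 | (code & 0x3f);
          continue;
        }
      }
      buf[pos++] = 0xe0 | (code >> 12);
      buf[pos++] = 0x80 | ((code >> 6) & 0x3f);
      buf[pos++] = 0x80 | (code & 0x3f);
    }
  }
  return pos - start;
}

// Number of bytes a base-128 varint of u occupies.
function varintSize(u: number): number {
  let size = 1;
  while (u >= 0x80) {
    u = Math.floor(u / 128);
    size++;
  }
  return size;
}

function writeVarint(buf: Uint8Array, pos: number, u: number): number {
  while (u >= 0x80) {
    buf[pos++] = (u % 128) | 0x80;
    u = Math.floor(u / 128);
  }
  buf[pos++] = u;
  return pos;
}

function writeStr(buf: Uint8Array, pos: number, str: string): number {
  const reserve = varintSize(str.length * 3 * 2); // zigzag(n) = 2n for n >= 0
  const start = pos + reserve;
  const byteLen = utf8Write(buf, start, str);
  const end = writeVarint(buf, pos, byteLen * 2); // actual prefix, never longer than reserve
  if (end < start) buf.copyWithin(end, start, start + byteLen); // close the gap
  return end + byteLen;
}
```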

public writeObj(obj: Record<string, unknown>): void {
const entries = Object.entries(obj);
this.writeVarIntUnsigned(entries.length);
for (const [key, value] of entries) {
Collaborator

Do not use for-of loops; use plain for loops, which are faster.

Contributor Author

Converted all for-of loops to plain for loops throughout both AvroEncoder and AvroSchemaEncoder for better performance. Commit d6842c4
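A sketch of map encoding with a plain indexed loop over `Object.keys()`, following the Avro spec: a map is written as blocks, each starting with an entry count (a `long`, hence zigzag varint), followed by key/value pairs, and the map ends with a zero-count block. `writeIntMap`, `writeAsciiKey` and `writeZigzag` are hypothetical helpers; keys are assumed ASCII, values 32-bit ints, and all entries go into a single block.

```typescript
// 32-bit zigzag varint; sufficient for small counts and int values.
function writeZigzag(n: number, out: number[]): void {
  let z = ((n << 1) ^ (n >> 31)) >>> 0;
  while (z > 0x7f) {
    out.push((z & 0x7f) | 0x80);
    z >>>= 7;
  }
  out.push(z);
}

// Assumes an ASCII key, so each UTF-16 code unit is exactly one byte.
function writeAsciiKey(key: string, out: number[]): void {
  const length = key.length;
  writeZigzag(length, out);
  for (let i = 0; i < length; i++) out.push(key.charCodeAt(i));
}

function writeIntMap(obj: Record<string, number>, out: number[]): void {
  const keys = Object.keys(obj);
  const length = keys.length;
  if (length > 0) {
    writeZigzag(length, out); // one block holding every entry
    for (let i = 0; i < length; i++) {
      const key = keys[i];
      writeAsciiKey(key, out);
      writeZigzag(obj[key], out);
    }
  }
  out.push(0); // zero-count block terminates the map
}
```

With these helpers, `{a: 1, b: -1}` encodes as `4, 2, 97, 2, 2, 98, 1, 0`, and an empty map as a single `0`.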

/**
* Writes an ASCII string (same as regular string in Avro)
*/
public writeAsciiStr(str: string): void {
Collaborator

The Writer instance should have a way to write ASCII strings (.ascii()); see other code for reference.

Contributor Author

Updated writeAsciiStr to use Writer's ascii() method for faster ASCII string encoding instead of the generic UTF-8 path. Commit d6842c4
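Why an ASCII path is cheaper: when every character is below 0x80, the UTF-8 byte count equals `str.length`, so the length prefix can be written up front with no reservation or shifting. `writeAsciiStr` below is a hypothetical stand-alone function, not the Writer's `ascii()` method, and it assumes the caller has already verified the string is ASCII and that `buf` has room.

```typescript
// Writes an Avro string known to be ASCII; returns the new position.
function writeAsciiStr(buf: Uint8Array, pos: number, str: string): number {
  const length = str.length; // ASCII: one byte per code unit
  let z = length * 2; // zigzag of a non-negative long
  while (z >= 0x80) {
    buf[pos++] = (z % 128) | 0x80;
    z = Math.floor(z / 128);
  }
  buf[pos++] = z;
  for (let i = 0; i < length; i++) buf[pos++] = str.charCodeAt(i);
  return pos;
}
```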

/**
* Writes a null value with schema validation.
*/
public writeNull(schema: AvroSchema): void {
Collaborator

In these primitive encoding methods, we should specify the specific schema type, no? In this case AvroNullSchema.

Contributor Author

Added specific AvroNullSchema type to the writeNull method signature and imported the type for better type safety. Commit d6842c4
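A sketch of what narrowing the schema parameter buys: with per-type schema aliases, passing the wrong schema kind becomes a compile-time error. `AvroNullSchema` and `AvroBooleanSchema` here are hypothetical definitions (Avro allows a primitive schema as either the bare type name or `{type: ...}`) and may not match the library's types. At runtime, Avro `null` encodes to zero bytes and `boolean` to a single byte.

```typescript
// Hypothetical per-type schema aliases.
type AvroNullSchema = 'null' | { type: 'null' };
type AvroBooleanSchema = 'boolean' | { type: 'boolean' };

class PrimitiveWriter {
  readonly out: number[] = [];

  // Avro null is encoded as zero bytes; the parameter only constrains callers.
  writeNull(_schema: AvroNullSchema): void {}

  // Avro boolean is one byte: 0 for false, 1 for true.
  writeBoolean(value: boolean, _schema: AvroBooleanSchema): void {
    this.out.push(value ? 1 : 0);
  }
}

const w = new PrimitiveWriter();
w.writeNull('null');
w.writeNull({ type: 'null' });
w.writeBoolean(true, 'boolean');
// w.writeNull('boolean'); // rejected by the type checker: not an AvroNullSchema
```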

Collaborator

@streamich streamich left a comment

When done, make sure tests pass, linter and formatter pass.

Co-authored-by: streamich <9773803+streamich@users.noreply.github.com>
Copilot AI requested a review from streamich August 1, 2025 20:11
Copilot finished work on behalf of streamich August 1, 2025 20:11
Co-authored-by: streamich <9773803+streamich@users.noreply.github.com>
Contributor Author

Copilot AI commented Aug 1, 2025

When done, make sure tests pass, linter and formatter pass.

All checks now pass:
✅ Tests: 2833 tests passing
✅ Linter: TSLint clean
✅ Formatter: Prettier clean

Fixed formatting issues in the Avro implementation files. Commit 6b9bd54

Copilot finished work on behalf of streamich August 1, 2025 20:16
@streamich streamich merged commit 9254afe into master Aug 1, 2025
2 checks passed
@streamich streamich deleted the copilot/fix-41 branch August 1, 2025 21:47

github-actions bot commented Aug 1, 2025

🎉 This PR is included in version 1.6.0 🎉

The release is available on:

Your semantic-release bot 📦🚀


Development: successfully merging this pull request may close the issue "Implement Apache Avro encoder".

2 participants