## Schema Evolution: Forward Compatibility

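Forward compatibility means that data produced with the new schema can be read by consumers still using the old schema. It lets you upgrade producers before consumers. For example, producers can start writing a new field while existing consumers keep working, because they ignore fields their schema does not declare.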

### Example: Adding a New Field to the User Schema

Let's evolve the existing User Avro schema by adding a new field, `age`. Adding a field is forward compatible on its own, because consumers on the old schema simply ignore it. Giving the field a default value (here `0`) also makes the change backward compatible: consumers on the new schema can read older records that lack `age`, and Schema Registry accepts the change under its default `BACKWARD` compatibility level.
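The effect of the default can be modeled with a short sketch. The code below is a deliberately simplified, pure-Python imitation of Avro's record resolution rules for flat records. It ignores type promotion, unions, nested records, and aliases, which the real Avro libraries handle. `resolve_record` is a hypothetical helper, not part of any library, and the schemas mirror the User schema used in this notebook.

```python
def resolve_record(datum, writer_schema, reader_schema):
    """Simplified Avro record resolution (flat records only).

    - Fields present in both schemas are copied from the written datum.
    - Fields only the writer knows about are ignored.
    - Fields only the reader knows about take the reader's default,
      or resolution fails if the reader declares no default.
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = datum[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing from the data and has no default")
    return result


base_fields = [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"},
]
old_schema = {"type": "record", "name": "User", "fields": base_fields}
new_schema = {"type": "record", "name": "User",
              "fields": base_fields + [{"name": "age", "type": "int", "default": 0}]}
new_schema_no_default = {"type": "record", "name": "User",
                         "fields": base_fields + [{"name": "age", "type": "int"}]}

new_record = {"id": 1, "name": "User 1", "email": "user1@example.com", "age": 26}
old_record = {"id": 2, "name": "User 2", "email": "user2@example.com"}

# Forward direction: old reader, new data -> 'age' is ignored
print(resolve_record(new_record, new_schema, old_schema))
# {'id': 1, 'name': 'User 1', 'email': 'user1@example.com'}

# Backward direction: new reader, old data -> 'age' takes the default
print(resolve_record(old_record, old_schema, new_schema))
# {'id': 2, 'name': 'User 2', 'email': 'user2@example.com', 'age': 0}

# Without a default, the backward direction fails
try:
    resolve_record(old_record, old_schema, new_schema_no_default)
except ValueError as exc:
    print(exc)  # field 'age' is missing from the data and has no default
```

The failing call corresponds to a consumer on the new schema reading an old record when the added field has no default. Schema Registry's `BACKWARD` check rejects such a schema at registration time.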

In [None]:
import requests
import json

# Schema Registry URL
schema_registry_url = 'http://schema-registry:8081'

# Evolved User Avro Schema
evolved_user_avro_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": "int", "default": 0}  # New field with default value
    ]
}

# Register Evolved Avro Schema
response = requests.post(
    f"{schema_registry_url}/subjects/user-avro-value/versions",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(evolved_user_avro_schema)})
)

if response.status_code == 200:
    print("Evolved Avro schema registered successfully!")
else:
    print(f"Failed to register evolved Avro schema: {response.text}")

In [None]:
from confluent_kafka import Producer
from confluent_kafka.serialization import StringSerializer, SerializationContext, MessageField
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema
from confluent_kafka.schema_registry.avro import AvroSerializer
import time

# Kafka Configuration
conf = {
    'bootstrap.servers': "kafka-broker-1:29094,kafka-broker-2:29094"
}

# Schema Registry Configuration
schema_registry_conf = {'url': 'http://schema-registry:8081'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

# Evolved User Avro Schema
evolved_user_avro_schema_str = """
{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": "int", "default": 0}
    ]
}
"""

# Create Schema object
evolved_user_schema = Schema(evolved_user_avro_schema_str, "AVRO")
# Create Avro Serializer
avro_serializer = AvroSerializer(schema_registry_client, evolved_user_schema)
# Create Producer Instance
producer = Producer(conf)
# Kafka Topic
topic = "user-avro"

# Produce User Messages with Evolved Schema
for i in range(10):
    user = {'id': i, 'name': f"User {i}", 'email': f"user{i}@example.com", 'age': 25 + i}
    producer.produce(
        topic=topic,
        key=StringSerializer('utf_8')(str(i), SerializationContext(topic, MessageField.KEY)),
        value=avro_serializer(user, SerializationContext(topic, MessageField.VALUE))
    )
    print(f"Produced: {user}")
    producer.flush()  # Ensure delivery
    time.sleep(1)  # Simulate delay between messages

print("All user messages with evolved schema produced successfully!")

In [None]:
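from confluent_kafka import Consumer
from confluent_kafka.serialization import SerializationContext, MessageField
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema
from confluent_kafka.schema_registry.avro import AvroDeserializer

# Schema Registry client (re-created so this cell does not depend on earlier cells)
schema_registry_client = SchemaRegistryClient({'url': 'http://schema-registry:8081'})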

# Consumer Configuration
consumer_conf = {
    'bootstrap.servers': "kafka-broker-1:29094,kafka-broker-2:29094",
    'group.id': 'user-avro-consumer-group',
    'auto.offset.reset': 'earliest'
}

# Create Consumer Instance
consumer = Consumer(consumer_conf)
# Subscribe to the topic
consumer.subscribe(["user-avro"])

# Old User Avro Schema (without 'age' field)
old_user_avro_schema_str = """
{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"}
    ]
}
"""

# Create Avro Deserializer using the old schema as the reader schema
old_user_schema = Schema(old_user_avro_schema_str, "AVRO")
avro_deserializer = AvroDeserializer(schema_registry_client, old_user_schema)

# Consume Messages
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        user = avro_deserializer(msg.value(), SerializationContext(msg.topic(), MessageField.VALUE))
        print(f"Consumed: {user}")
except KeyboardInterrupt:
    pass
finally:
    consumer.close()

print("All user messages with old schema consumed successfully!")
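Why can the deserializer read both old and new records with a single reader schema? By default, the Confluent serializers prefix every message with a 5-byte header: a magic byte `0` followed by the writer schema's ID in Schema Registry as a 4-byte big-endian integer. `AvroDeserializer` uses that ID to fetch the writer schema and resolves it against the reader schema passed to it. The sketch below parses only this header; `parse_confluent_header` is a hypothetical helper, and the example payload is made up.

```python
import struct

MAGIC_BYTE = 0  # first byte of every Confluent-framed message


def parse_confluent_header(payload: bytes):
    """Return (schema_id, body) for a message in the Confluent wire format.

    Layout: 1 magic byte (0), a 4-byte big-endian schema ID, then the
    Avro-encoded record body.
    """
    if len(payload) < 5:
        raise ValueError("payload is too short for the Confluent wire format")
    magic, schema_id = struct.unpack(">bI", payload[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, payload[5:]


# Made-up payload: magic byte, schema ID 7, then placeholder body bytes
example = bytes([MAGIC_BYTE]) + (7).to_bytes(4, "big") + b"<avro body>"
schema_id, body = parse_confluent_header(example)
print(schema_id, body)  # 7 b'<avro body>'
```

Applied to a real record, `parse_confluent_header(msg.value())` would return the ID of the schema version that produced it, which helps when debugging which version a message was written with.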

## Schema Evolution: Backward Compatibility

Backward compatibility means that consumers using the new schema can read data produced with the old schema. It lets you upgrade consumers before producers, and it is the compatibility level Schema Registry applies by default (`BACKWARD`).

### Example: Removing a Field from the User Schema

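Let's evolve the User Avro schema by removing the `age` field. Removing a field is always backward compatible: a consumer on the new schema simply skips `age` when it reads older records. Because `age` was declared with a default, the removal is forward compatible too, so consumers still using the older schema (which includes `age`) can process messages produced with the new schema and receive the default value `0`. The consumer in this example demonstrates that second case.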
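The default-value rules for adding and removing fields can be checked statically. The sketch below uses a hypothetical helper, `compatibility_issues`, that covers flat records only and ignores type changes; Schema Registry's real checker covers much more. It reports which compatibility direction each added or removed field would break.

```python
def compatibility_issues(old_schema, new_schema):
    """Report fields that break compatibility (flat records only).

    - A field added without a default breaks BACKWARD compatibility:
      consumers on the new schema cannot fill it in when reading old data.
    - A field removed without a default breaks FORWARD compatibility:
      consumers on the old schema cannot fill it in when reading new data.
    """
    old = {f["name"]: f for f in old_schema["fields"]}
    new = {f["name"]: f for f in new_schema["fields"]}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    return {
        "backward": [name for name in added if "default" not in new[name]],
        "forward": [name for name in removed if "default" not in old[name]],
    }


base_fields = [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"},
]
without_age = {"type": "record", "name": "User", "fields": base_fields}
with_age = {"type": "record", "name": "User",
            "fields": base_fields + [{"name": "age", "type": "int", "default": 0}]}
with_required_age = {"type": "record", "name": "User",
                     "fields": base_fields + [{"name": "age", "type": "int"}]}

# Removing 'age' (which has a default) breaks neither direction
print(compatibility_issues(with_age, without_age))
# {'backward': [], 'forward': []}

# Removing an 'age' field without a default breaks consumers still on the old schema
print(compatibility_issues(with_required_age, without_age))
# {'backward': [], 'forward': ['age']}
```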

In [None]:
import requests
import json

# Schema Registry URL
schema_registry_url = 'http://schema-registry:8081'

# Backward Compatible User Avro Schema (removing the 'age' field)
backward_user_avro_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"}
    ]
}

# Register Backward Compatible Avro Schema to the SAME topic subject
response = requests.post(
    f"{schema_registry_url}/subjects/user-avro-value/versions",  # Note: Using the same subject as before
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(backward_user_avro_schema)})
)

if response.status_code == 200:
    print("Backward compatible Avro schema registered successfully to the same topic!")
else:
    print(f"Failed to register backward compatible Avro schema: {response.text}")

In [None]:
from confluent_kafka import Producer
from confluent_kafka.serialization import StringSerializer, SerializationContext, MessageField
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema
from confluent_kafka.schema_registry.avro import AvroSerializer
import time

# Kafka Configuration
conf = {
    'bootstrap.servers': "kafka-broker-1:29094,kafka-broker-2:29094"
}

# Schema Registry Configuration
schema_registry_conf = {'url': 'http://schema-registry:8081'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

# Backward Compatible User Avro Schema
backward_user_avro_schema_str = """
{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"}
    ]
}
"""

# Create Schema object
backward_user_schema = Schema(backward_user_avro_schema_str, "AVRO")
# Create Avro Serializer
avro_serializer = AvroSerializer(schema_registry_client, backward_user_schema)
# Create Producer Instance
producer = Producer(conf)
# Kafka Topic - Use the SAME topic as before
topic = "user-avro"

# Produce User Messages with Backward Compatible Schema
for i in range(10):
    user = {'id': i + 100, 'name': f"User {i + 100}", 'email': f"user{i + 100}@example.com"}
    producer.produce(
        topic=topic,
        key=StringSerializer('utf_8')(str(i + 100), SerializationContext(topic, MessageField.KEY)),
        value=avro_serializer(user, SerializationContext(topic, MessageField.VALUE))
    )
    print(f"Produced (new schema without age): {user}")
    producer.flush()  # Ensure delivery
    time.sleep(1)  # Simulate delay between messages

print("All user messages with backward compatible schema produced successfully to the same topic!")

In [None]:
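from confluent_kafka import Consumer
from confluent_kafka.serialization import SerializationContext, MessageField
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema
from confluent_kafka.schema_registry.avro import AvroDeserializer

# Schema Registry client (re-created so this cell does not depend on earlier cells)
schema_registry_client = SchemaRegistryClient({'url': 'http://schema-registry:8081'})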

# Consumer Configuration
consumer_conf = {
    'bootstrap.servers': "kafka-broker-1:29094,kafka-broker-2:29094",
    'group.id': 'user-same-topic-consumer-group',
    'auto.offset.reset': 'earliest'
}

# Create Consumer Instance
consumer = Consumer(consumer_conf)
# Subscribe to the same topic as before
consumer.subscribe(["user-avro"])

# Old User Avro Schema (with 'age' field)
old_user_with_age_schema_str = """
{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": "int", "default": 0}
    ]
}
"""

# Create Avro Deserializer with old schema (including age)
old_schema_with_age = Schema(old_user_with_age_schema_str, "AVRO")
avro_deserializer = AvroDeserializer(schema_registry_client, old_schema_with_age)

# Consume messages until the topic stays idle for 10 consecutive polls (~10 s),
# so both the earlier messages and the new ones without 'age' are read
try:
    idle_polls = 0
    while idle_polls < 10:
        msg = consumer.poll(1.0)
        if msg is None:
            idle_polls += 1
            continue
        idle_polls = 0
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        user = avro_deserializer(msg.value(), SerializationContext(msg.topic(), MessageField.VALUE))
        if user['id'] < 100:
            print(f"Consumed earlier message (id < 100): {user}")
        else:
            print(f"Consumed new message with defaulted age (0): {user}")
except KeyboardInterrupt:
    pass
finally:
    consumer.close()

print("Successfully consumed messages with both schema versions from the same topic!")

## Schema Compatibility Summary

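- **Forward Compatibility**: Consumers on the old schema can read data written with the new schema (add fields; remove only fields that have defaults). Upgrade producers first.
- **Backward Compatibility**: Consumers on the new schema can read data written with the old schema (remove fields; add only fields that have defaults). Upgrade consumers first.
- **Full Compatibility**: The change is both forward and backward compatible (add or remove only fields that have defaults).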

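Schema Registry enforces the compatibility level configured for each subject (`BACKWARD` by default) and rejects any new schema version that violates it, so an incompatible change fails at registration time instead of breaking your data pipeline. The `_TRANSITIVE` variants of each level check the new schema against all previous versions rather than only the latest one.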
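The level Schema Registry enforces can be set per subject through its REST API (`PUT /config/{subject}`). The helper below is a sketch, not part of any client library: `set_compatibility` is a hypothetical name, and the `put` parameter (defaulting to `requests.put`) exists only so the function can be exercised without a running registry.

```python
import json

VALID_LEVELS = {
    "BACKWARD", "BACKWARD_TRANSITIVE",
    "FORWARD", "FORWARD_TRANSITIVE",
    "FULL", "FULL_TRANSITIVE", "NONE",
}


def set_compatibility(registry_url, subject, level, put=None):
    """Set the compatibility level Schema Registry enforces for one subject.

    Returns the level echoed back by the registry.
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown compatibility level: {level}")
    if put is None:
        import requests  # imported lazily so a stub can be injected for offline runs
        put = requests.put
    response = put(
        f"{registry_url}/config/{subject}",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        data=json.dumps({"compatibility": level}),
    )
    response.raise_for_status()
    return response.json()["compatibility"]
```

For example, `set_compatibility('http://schema-registry:8081', 'user-avro-value', 'FULL')` would require every future change to this subject to be both forward and backward compatible, a bar both `age` changes in this notebook clear because `age` has a default.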

## Important Considerations for Schema Compatibility

### Default Values are Critical

1. **Default Values for Backward Compatibility**:
   - When adding new fields, give them default values
   - Without defaults, consumers using the new schema fail when reading old data (where the new field doesn't exist), and Schema Registry rejects the change under `BACKWARD` or `FULL` compatibility
   - Example: `{"name": "age", "type": "int", "default": 0}`

2. **Default Values for Forward Compatibility**:
   - Fields that you later remove must already have defaults in the old schema, because consumers still on the old schema expect them
   - The default value is what those consumers receive when the field is absent from new data

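3. **Types and Default Value Consistency**:
   - Default values must match the field type
   - For complex types, ensure default values are valid according to the schema
   - For a union, the default must match the first type in the union, which is why optional fields are usually declared with `null` first, e.g. `{"name": "nickname", "type": ["null", "string"], "default": null}`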

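4. **Breaking Changes to Avoid**:
   - Changing field types (e.g., from `int` to `string`); Avro only permits specific promotions such as `int` to `long`, `float`, or `double`
   - Removing fields that have no default (breaks consumers still on the old schema)
   - Changing the record name or namespace (Avro matches records by full name unless the reader declares an alias)
   - Adding fields without defaults (breaks consumers on the new schema reading old data)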

5. **Schema Evolution Best Practices**:
   - Start with a minimal schema and evolve gradually
   - Always test compatibility before deploying schema changes to production
   - Use the Schema Registry's compatibility checking features
   - Document all schema changes and their compatibility implications
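Schema Registry can test a candidate schema without registering it via `POST /compatibility/subjects/{subject}/versions/latest`, which answers with a JSON body such as `{"is_compatible": true}`. A sketch of a wrapper follows; `is_compatible` is a hypothetical helper, and the injectable `post` parameter (defaulting to `requests.post`) is only there so it can run without a registry.

```python
import json


def is_compatible(registry_url, subject, schema, post=None, version="latest"):
    """Ask Schema Registry whether `schema` (an Avro schema as a dict) is
    compatible with `version` of `subject`, without registering it."""
    if post is None:
        import requests  # imported lazily so a stub can be injected for offline runs
        post = requests.post
    response = post(
        f"{registry_url}/compatibility/subjects/{subject}/versions/{version}",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        data=json.dumps({"schema": json.dumps(schema)}),
    )
    response.raise_for_status()  # e.g. raises if the subject does not exist yet
    return response.json()["is_compatible"]
```

In this notebook, calling `is_compatible('http://schema-registry:8081', 'user-avro-value', evolved_user_avro_schema)` before the registration cell would report whether adding `age` is allowed under the subject's current compatibility level.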

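Remember: without proper default values, Schema Registry rejects the schema change (so manual registration, or a producer's automatic registration, fails), and if compatibility checking is disabled (`NONE`), consumers fail when deserializing messages written with the other schema version.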