# Avro File Format (Apache Avro)

**Apache Avro** is a row-oriented binary serialization format designed for efficient data exchange and schema evolution in distributed systems. It is widely used in Kafka, Hadoop, and streaming pipelines.

**Schema-Based Serialization**

* Data is written with a schema defined in JSON.
* The schema can be embedded in the file (common) or managed externally (e.g., Schema Registry).

"""
{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": null}
    ]
}
"""
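
Because an Avro schema is plain JSON, it can be inspected with any JSON parser. The sketch below loads the `User` schema with Python's standard `json` module and lists which fields are nullable. The helper name `nullable_fields` is illustrative and not part of any Avro library.

```python
import json

SCHEMA_JSON = """
{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": null}
    ]
}
"""


def nullable_fields(schema: dict) -> list[str]:
    """Return the names of fields whose type is a union that includes "null"."""
    return [
        field["name"]
        for field in schema["fields"]
        if isinstance(field["type"], list) and "null" in field["type"]
    ]


schema = json.loads(SCHEMA_JSON)
field_names = [field["name"] for field in schema["fields"]]

print(field_names)              # all declared fields, in schema order
print(nullable_fields(schema))  # fields that may hold null
```

A union such as `["null", "int"]` with `"default": null` is the usual way to declare an optional field in Avro.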


**Compact & Fast**

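* Uses binary encoding, so data is smaller than the equivalent JSON or CSV: field names are not repeated in each record, and integers use a variable-length encoding.
* Optimized for fast reads and writes, especially in streaming workloads.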
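
To make the size difference concrete, here is a minimal hand-written sketch of Avro's binary encoding for a single `User` record, using only the standard library. It assumes the three-field `User` schema (`id`, `name`, nullable `age`); the function names are illustrative, and a real project would use an Avro library rather than this code.

```python
import json


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag the sign, then emit 7 bits per byte (varint)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: length as a long, followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def encode_user(user: dict) -> bytes:
    """Encode fields in schema order; no field names are written."""
    out = encode_long(user["id"]) + encode_string(user["name"])
    if user["age"] is None:
        out += encode_long(0)  # union branch 0: "null", no payload
    else:
        out += encode_long(1) + encode_long(user["age"])  # union branch 1: "int"
    return out


user = {"id": 1, "name": "Ada", "age": 36}
avro_bytes = encode_user(user)
json_bytes = json.dumps(user).encode("utf-8")

print(avro_bytes.hex())                  # 02064164610248
print(len(avro_bytes), len(json_bytes))  # Avro payload vs. JSON payload size
```

The Avro payload carries only values, which is why a reader always needs the writer's schema to decode it.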

**Schema Evolution (Key Strength)**

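Avro supports backward and forward compatibility because the reader's schema may differ from the writer's schema:

* Fields can be added or removed, as long as any field the reader expects but the writer did not write has a default value.
* Default values can be changed.
* Differences between the reader and writer schemas are resolved at read time.

This is critical in event-driven systems, where producers and consumers are upgraded independently.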
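
The core of that resolution step can be sketched on already-decoded records. This is a simplified illustration, not the full Avro resolution algorithm: it works on Python dicts rather than binary data, ignores aliases and type promotion, and the names `resolve_record`, `SCHEMA_V1`, and `SCHEMA_V2` are hypothetical.

```python
SCHEMA_V1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
    ],
}

SCHEMA_V2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": None},
    ],
}


def resolve_record(record: dict, writer_schema: dict, reader_schema: dict) -> dict:
    """Project a record written with writer_schema onto reader_schema."""
    written = {field["name"] for field in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            resolved[name] = record[name]      # present in both schemas
        elif "default" in field:
            resolved[name] = field["default"]  # reader-only field: use its default
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved  # writer-only fields are dropped


# Backward compatibility: a new reader (v2) reads old data (v1).
print(resolve_record({"id": 1, "name": "Ada"}, SCHEMA_V1, SCHEMA_V2))

# Forward compatibility: an old reader (v1) reads new data (v2).
print(resolve_record({"id": 1, "name": "Ada", "age": 36}, SCHEMA_V2, SCHEMA_V1))
```

Adding a field without a default would make the first call fail, which is why defaults matter for compatible evolution.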

**Row-Oriented Storage**

Stores data record by record.

**Ideal for:**

* Streaming
* Message queues
* Incremental ingestion

Not ideal for heavy analytical scans that read a few columns across many rows; columnar formats such as Parquet or ORC are better suited.
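
The trade-off can be illustrated with plain Python structures. This is only an analogy for the two layouts, under the assumption of a small in-memory dataset; it does not read or write real Avro or Parquet files.

```python
# Row-oriented layout: each record is stored whole, one after another.
rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Linus", "age": None},
    {"id": 3, "name": "Grace", "age": 45},
]

# Appending a new event is a single write of one complete record.
rows.append({"id": 4, "name": "Alan", "age": 41})

# Columnar layout: the same data pivoted so each field is stored contiguously.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An analytical query over one field must visit every full record in the row layout...
ages_from_rows = [row["age"] for row in rows if row["age"] is not None]

# ...but only touches a single column in the columnar layout.
ages_from_columns = [age for age in columns["age"] if age is not None]

average_age = sum(ages_from_columns) / len(ages_from_columns)
print(ages_from_rows == ages_from_columns, average_age)
```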

**Avro File Structure**

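An .avro file (an Avro object container file) contains:

**Header**

* Magic bytes (`Obj` followed by the byte `0x01`)
* Metadata (including the schema under the `avro.schema` key and, optionally, the compression codec under `avro.codec`)
* A 16-byte sync marker, randomly generated for the file

**Data Blocks**

* A count of the records in the block and the block's size in bytes
* The serialized records, optionally compressed

**Sync Marker**

* Written after every data block
* Enables file splitting in distributed systems, since a reader can seek to any offset and scan forward to the next marker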
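
The layout can be sketched by building a tiny container file in memory and parsing it back with the standard library. This is a minimal sketch assuming the uncompressed `null` codec and a single data block; `build_container` and `read_container` are hypothetical helpers, and the block payload is a hand-encoded `User` record (id 1, name "Ada", age 36).

```python
import io
import json

MAGIC = b"Obj\x01"
SYNC_SIZE = 16


def encode_long(n: int) -> bytes:
    """Zig-zag varint encoding used for Avro longs."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(stream) -> int:
    """Decode one zig-zag varint from the stream."""
    z, shift = 0, 0
    while True:
        byte = stream.read(1)[0]
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (z >> 1) ^ -(z & 1)
        shift += 7


def read_bytes(stream) -> bytes:
    """Avro bytes/string: a long length followed by that many bytes."""
    return stream.read(read_long(stream))


def build_container(schema: dict, payload: bytes, count: int, sync: bytes) -> bytes:
    meta = {"avro.schema": json.dumps(schema).encode(), "avro.codec": b"null"}
    out = bytearray(MAGIC)
    out += encode_long(len(meta))  # metadata is an Avro map: one block of entries
    for key, value in meta.items():
        raw_key = key.encode()
        out += encode_long(len(raw_key)) + raw_key + encode_long(len(value)) + value
    out += encode_long(0)          # a zero count ends the map
    out += sync                    # header sync marker
    out += encode_long(count) + encode_long(len(payload)) + payload + sync
    return bytes(out)


def read_container(data: bytes):
    stream = io.BytesIO(data)
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = {}
    while (count := read_long(stream)) != 0:
        if count < 0:              # a negative count is followed by a byte size
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode()
            meta[key] = read_bytes(stream)
    sync = stream.read(SYNC_SIZE)
    blocks = []
    while stream.tell() < len(data):
        count, size = read_long(stream), read_long(stream)
        payload = stream.read(size)
        if stream.read(SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch")
        blocks.append((count, payload))
    return meta, blocks


schema = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": None},
]}
payload = bytes.fromhex("02064164610248")
data = build_container(schema, payload, count=1, sync=bytes(range(16)))
meta, blocks = read_container(data)
print(json.loads(meta["avro.schema"])["name"], meta["avro.codec"], blocks)
```

Because every block ends with the same marker found in the header, a reader that starts mid-file can locate the next block boundary without parsing what came before.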

**Comparison with Parquet and ORC**

| Feature          | Avro                           | Parquet                       | ORC                     |
| ---------------- | ------------------------------ | ----------------------------- | ----------------------- |
| Storage layout   | Row-based                      | Columnar                      | Columnar                |
| Schema           | Embedded in file header (JSON) | Embedded in file footer       | Embedded in file footer |
| Best for         | Streaming, Kafka               | Analytics, OLAP               | Analytics               |
| Compression      | Yes                            | Yes (typically better ratios) | Yes                     |
| Schema evolution | Excellent                      | Limited                       | Limited                 |
