# YAML Parsing and Generation in Python

YAML (YAML Ain't Markup Language) is a human-readable data serialization format. Python's standard library does not include a YAML parser; the most widely used option is the third-party `PyYAML` library. This tutorial covers the core concepts of parsing and generating YAML with PyYAML.

## Installing PyYAML

First, you need to install the PyYAML library. You can do this using pip:

In [None]:
pip install pyyaml

## Parsing YAML

### Basic Parsing

To parse YAML in Python, use the `yaml.safe_load()` function. It accepts a string or an open file object containing YAML data and returns the corresponding Python object, typically a dictionary, a list, or a scalar.

In [None]:
import yaml

yaml_string = """
name: John Doe
age: 30
city: New York
"""

data = yaml.safe_load(yaml_string)
print(data)

This will output:

```python
{'name': 'John Doe', 'age': 30, 'city': 'New York'}
```
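
The mapping from YAML types to Python types can be seen with a slightly richer document. The following is a minimal sketch; the `server` settings are hypothetical and exist only to illustrate the conversion.

```python
import yaml

# Hypothetical document used only to illustrate type mapping.
yaml_text = """
server:
  host: localhost
  port: 8080
  debug: true
  timeout: 2.5
  tags: [web, internal]
  owner: null
"""

config = yaml.safe_load(yaml_text)
server = config["server"]

# Print each value alongside the Python type it was converted to.
for key, value in server.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

Nested mappings become dictionaries, sequences become lists, and `true` and `null` become `True` and `None`.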

### Parsing from a File

To parse YAML from a file, pass the open file object to `yaml.safe_load()`:

In [None]:
with open('config.yaml', 'r') as file:
    data = yaml.safe_load(file)
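
The snippet above assumes that `config.yaml` already exists. As a self-contained sketch, the example below first writes a small temporary file (its contents are made up for illustration) and then reads it back.

```python
import os
import tempfile

import yaml

# Create a throwaway file so the example does not depend on an existing config.yaml.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.yaml")
    with open(path, "w", encoding="utf-8") as file:
        file.write("name: John Doe\nage: 30\n")

    # Read it back; an explicit encoding avoids platform-dependent defaults.
    with open(path, "r", encoding="utf-8") as file:
        loaded = yaml.safe_load(file)

print(loaded)
```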

### Parsing Multiple YAML Documents

If your YAML file contains multiple documents separated by `---`, use `yaml.safe_load_all()`, which returns a generator that yields one Python object per document:

In [None]:
yaml_string = """
---
document: 1
---
document: 2
"""

for doc in yaml.safe_load_all(yaml_string):
    print(doc)

This will output:

```python
{'document': 1}
{'document': 2}
```
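
Because `yaml.safe_load_all()` returns a generator, the documents cannot be indexed directly. If you need random access, one option is to collect them into a list first, as in this small sketch:

```python
import yaml

multi_doc_text = """
---
document: 1
---
document: 2
"""

# list() consumes the generator and keeps every parsed document in memory.
documents = list(yaml.safe_load_all(multi_doc_text))

print(len(documents))
print(documents[1]["document"])
```

For very long streams, iterating over the generator directly keeps only one document in memory at a time.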

## Generating YAML

### Basic YAML Generation

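To generate YAML from Python objects, use the `yaml.dump()` function. When called without a stream argument, it returns the YAML as a string. By default, dictionary keys are sorted alphabetically: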

In [None]:
data = {
    'name': 'Jane Doe',
    'age': 28,
    'city': 'San Francisco'
}

yaml_string = yaml.dump(data)
print(yaml_string)

This will output:

```yaml
age: 28
city: San Francisco
name: Jane Doe
```

### Customizing YAML Output

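You can customize the YAML output with keyword arguments. Here, `default_flow_style=False` forces block style for every collection, and `sort_keys=False` (available in PyYAML 5.1 and later) keeps the dictionary's insertion order instead of sorting the keys: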

In [None]:
yaml_string = yaml.dump(data, default_flow_style=False, sort_keys=False)
print(yaml_string)

This will output:

```yaml
name: Jane Doe
age: 28
city: San Francisco
```
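
The effect of `default_flow_style` is easier to see with nested data. The sketch below dumps the same hypothetical `inventory` dictionary in block style and in flow style, then checks that both forms load back to the same object.

```python
import yaml

# Hypothetical nested data, chosen so the two styles differ visibly.
inventory = {"fruits": ["apple", "banana"], "count": 2}

# Block style: one item per line, structure expressed through line breaks.
block_style = yaml.dump(inventory, default_flow_style=False, sort_keys=False)

# Flow style: JSON-like inline notation with braces and brackets.
flow_style = yaml.dump(inventory, default_flow_style=True, sort_keys=False)

print(block_style)
print(flow_style)

# Both representations describe the same data.
same_data = yaml.safe_load(block_style) == yaml.safe_load(flow_style)
print(same_data)
```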

### Writing YAML to a File

To write YAML directly to a file:

In [None]:
with open('output.yaml', 'w') as file:
    yaml.dump(data, file)

### Generating Multiple YAML Documents

To write several documents to one stream, pass an iterable of objects to `yaml.dump_all()`; the documents are separated by `---`:

In [None]:
data1 = {'document': 1}
data2 = {'document': 2}

with open('multi_doc.yaml', 'w') as file:
    yaml.dump_all([data1, data2], file)
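
Like `yaml.dump()`, `yaml.dump_all()` returns a string when no stream is given. The following sketch uses that to round-trip two documents in memory, without touching the file system:

```python
import yaml

documents = [{"document": 1}, {"document": 2}]

# Without a stream argument, dump_all() returns the combined YAML as a string.
combined = yaml.dump_all(documents)
print(combined)

# Reading the stream back should yield the original objects in order.
restored = list(yaml.safe_load_all(combined))
print(restored)
```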

## Advanced Parsing and Generation

### Custom Tag Handling

PyYAML allows you to define custom tags for complex Python objects:

In [None]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

def person_representer(dumper, person):
    return dumper.represent_mapping('!person', {'name': person.name, 'age': person.age})

def person_constructor(loader, node):
    value = loader.construct_mapping(node)
    return Person(value['name'], value['age'])

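yaml.add_representer(Person, person_representer)
# safe_load() uses SafeLoader, so the constructor must be registered on it explicitly;
# without the Loader argument, add_constructor() does not touch SafeLoader.
yaml.add_constructor('!person', person_constructor, Loader=yaml.SafeLoader)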

# Usage
person = Person("Alice", 30)
yaml_string = yaml.dump(person)
print(yaml_string)

loaded_person = yaml.safe_load(yaml_string)
print(f"Name: {loaded_person.name}, Age: {loaded_person.age}")
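
Registering handlers with `yaml.add_representer()` and `yaml.add_constructor()` changes PyYAML's shared loader and dumper classes for the whole process. One alternative, sketched below, is to register the handlers on your own subclasses of `yaml.SafeLoader` and `yaml.SafeDumper`. The names `Point`, `PointLoader`, `PointDumper`, and the `!point` tag are hypothetical and chosen only for this illustration.

```python
import yaml


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Dedicated subclasses keep the custom tag out of the shared SafeLoader/SafeDumper.
class PointLoader(yaml.SafeLoader):
    pass


class PointDumper(yaml.SafeDumper):
    pass


def represent_point(dumper, point):
    # Emit the object as a mapping carrying the custom tag.
    return dumper.represent_mapping("!point", {"x": point.x, "y": point.y})


def construct_point(loader, node):
    # Rebuild the object from the tagged mapping.
    values = loader.construct_mapping(node)
    return Point(values["x"], values["y"])


PointDumper.add_representer(Point, represent_point)
PointLoader.add_constructor("!point", construct_point)

text = yaml.dump(Point(1, 2), Dumper=PointDumper)
print(text)

restored = yaml.load(text, Loader=PointLoader)
print(restored.x, restored.y)
```

With this approach, plain `yaml.safe_load()` calls elsewhere in the program still reject the `!point` tag.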

### Preserving Order of Keys

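Since Python 3.7, plain dictionaries keep their insertion order, so passing `sort_keys=False` to `yaml.dump()` is usually enough. If your code uses `OrderedDict`, register a representer so that it is dumped as an ordinary mapping with its keys in their original order: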

In [None]:
from collections import OrderedDict

yaml.add_representer(OrderedDict, lambda dumper, data: dumper.represent_mapping('tag:yaml.org,2002:map', data.items()))

data = OrderedDict([
    ('first', 1),
    ('second', 2),
    ('third', 3)
])

yaml_string = yaml.dump(data)
print(yaml_string)
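
For plain dictionaries, the difference between the default sorted output and `sort_keys=False` can be checked directly. This is a minimal sketch that assumes PyYAML 5.1 or later; the `settings` dictionary is made up for illustration.

```python
import yaml

# Keys are deliberately not in alphabetical order.
settings = {"zebra": 1, "apple": 2, "mango": 3}

# Default behavior: keys are sorted alphabetically.
sorted_output = yaml.dump(settings)

# sort_keys=False keeps the dictionary's insertion order.
ordered_output = yaml.dump(settings, sort_keys=False)

print(sorted_output)
print(ordered_output)
```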

### Error Handling

Malformed input makes PyYAML raise a subclass of `yaml.YAMLError`, so catch that exception when parsing YAML you do not control:

In [None]:
try:
    data = yaml.safe_load(yaml_string)
except yaml.YAMLError as e:
    print(f"Error parsing YAML: {e}")
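
Many PyYAML errors also record where the problem occurred. The sketch below parses deliberately malformed input and reads the `problem_mark` attribute when it is present; since not every error carries a mark, the code checks for it with `getattr`.

```python
import yaml

# Deliberately malformed input: the flow sequence is never closed.
broken_yaml = "items: [1, 2, 3"

error_location = None
try:
    yaml.safe_load(broken_yaml)
except yaml.YAMLError as exc:
    # Marked errors carry the position of the problem (zero-based line and column).
    mark = getattr(exc, "problem_mark", None)
    if mark is not None:
        error_location = (mark.line + 1, mark.column + 1)
        print(f"Invalid YAML at line {error_location[0]}, column {error_location[1]}")
    else:
        print(f"Invalid YAML: {exc}")
```

Reporting the line and column makes it much easier for users to find the mistake in a hand-edited configuration file.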

## Performance Considerations

### Using LibYAML

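For better performance, especially with large YAML files, you can use the C-accelerated loaders and dumpers backed by LibYAML. They are available only if your PyYAML installation was built with LibYAML support, which is typically the case for the prebuilt wheels on PyPI, so a regular install is usually enough (installing Cython does not add them on its own):

In [None]:
pip install pyyaml

Then in your Python code, fall back to the pure-Python classes when the C versions are missing: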

In [None]:
import yaml
try:
    from yaml import CSafeLoader as SafeLoader, CDumper as Dumper
except ImportError:
    from yaml import SafeLoader, Dumper

data = yaml.load(yaml_string, Loader=SafeLoader)
yaml_output = yaml.dump(data, Dumper=Dumper)
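
To find out whether the C bindings are actually present, you can inspect the `yaml.__with_libyaml__` flag that PyYAML sets at import time. A minimal sketch:

```python
import yaml

# True when the LibYAML-based C extension could be imported.
has_libyaml = yaml.__with_libyaml__

# Pick the fastest safe loader that is available in this installation.
loader_cls = yaml.CSafeLoader if has_libyaml else yaml.SafeLoader

result = yaml.load("answer: 42", Loader=loader_cls)
print(has_libyaml, loader_cls.__name__, result)
```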

### Streaming for Large Files

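For very large YAML files, you can process the low-level event stream with `yaml.parse()` instead of building the whole object in memory. Each event describes one structural element, such as a scalar or the start of a mapping: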

In [None]:
def parse_large_yaml(file_path):
    with open(file_path, 'r') as file:
        for event in yaml.parse(file):
            if isinstance(event, yaml.ScalarEvent):
                print(f"Scalar: {event.value}")
            elif isinstance(event, yaml.SequenceStartEvent):
                print("Sequence Start")
            elif isinstance(event, yaml.MappingStartEvent):
                print("Mapping Start")
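
As a quick way to see which events a document produces, the sketch below runs `yaml.parse()` on a small in-memory string (a made-up sample) and tallies the event types with `collections.Counter`.

```python
from collections import Counter

import yaml

sample = """
name: demo
values: [1, 2, 3]
"""

# yaml.parse() yields events one at a time without building Python objects.
event_counts = Counter(type(event).__name__ for event in yaml.parse(sample))

for event_name, count in sorted(event_counts.items()):
    print(f"{event_name}: {count}")
```

Keys and values are both reported as scalar events, so this sample produces six of them: `name`, `demo`, `values`, and the three list items.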

This tutorial covers the core concepts of YAML parsing and generation in Python, providing you with the tools to effectively work with YAML in your Python projects. Remember to always use `safe_load()` instead of `load()` when parsing untrusted YAML to prevent potential security issues.