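# A Comprehensive Tutorial on Asynchronous Programming and Generators

This tutorial covers the core concepts and practical uses of generators and asynchronous programming in Python.

## Table of Contents
1. [Generator Basics](#生成器基础)
2. [Advanced Generator Usage](#生成器高级应用)
3. [Asynchronous Programming Basics](#异步编程基础)
4. [Advanced Asynchronous Programming](#异步编程进阶)
5. [Asynchronous Generators](#异步生成器)
6. [Practical Examples](#实战案例)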

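## 1. Generator Basics {#生成器基础}

### 1.1 What Is a Generator?

A generator is a special kind of iterator, created by calling a function that contains `yield`. Each `yield` hands a value to the caller and suspends the function; the next `next()` call resumes it from exactly that point.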

In [None]:
# Basic generator example
def simple_generator():
    """A minimal generator."""
    print("Starting")
    yield 1
    print("Resuming")
    yield 2
    print("Final step")
    yield 3

# Create the generator object (none of the body runs yet)
gen = simple_generator()
print(f"Generator object: {gen}")
print(f"Type: {type(gen)}")

# Pull values one at a time
print("\n--- Start iterating ---")
print(f"First call: {next(gen)}")
print(f"Second call: {next(gen)}")
print(f"Third call: {next(gen)}")
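
The example above stops after the third value. As a minimal sketch of what happens next, the block below shows that a further `next()` raises `StopIteration`, that `next()` accepts a default to return instead, and that a `for` loop (or `list()`) handles the exception for you. The `countdown` helper is a hypothetical function introduced only for this illustration.

```python
def countdown(n):
    """Yield n, n-1, ..., 1 (hypothetical helper for this illustration)."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(2)
first = next(gen)
second = next(gen)

# The generator is now exhausted: one more next() raises StopIteration
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

# next() takes an optional default that is returned instead of raising
sentinel = next(gen, "done")

# for loops and list() catch StopIteration internally
collected = list(countdown(3))

print(first, second, exhausted, sentinel, collected)
```

In everyday code a `for` loop is usually the simplest way to consume a generator; explicit `next()` calls are mostly useful when only the first few values are needed.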

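### 1.2 Generator Expressions

A generator expression looks like a list comprehension but uses parentheses. It produces values lazily, so it uses far less memory than building the full list.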

In [None]:
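# List comprehension vs. generator expression
import sys
from itertools import islice

# List comprehension: builds every element immediately
list_comp = [x**2 for x in range(10000)]
# Note: sys.getsizeof measures the container only, not the elements it references
print(f"List comprehension size: {sys.getsizeof(list_comp)} bytes")

# Generator expression: evaluated lazily
gen_exp = (x**2 for x in range(10000))
print(f"Generator expression size: {sys.getsizeof(gen_exp)} bytes")

# Take only the first 5 values; islice stops early instead of draining the generator
print(f"\nFirst 5 squares: {list(islice(gen_exp, 5))}")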

### 1.3 A Fibonacci Generator

A classic use case for generators.

In [None]:
def fibonacci(n):
    """Yield the first n Fibonacci numbers."""
    a, b = 0, 1
    count = 0
    while count < n:
        yield a
        a, b = b, a + b
        count += 1

# Generate the first 10 Fibonacci numbers
print("First 10 Fibonacci numbers:")
for num in fibonacci(10):
    print(num, end=' ')

# Convert to a list
print(f"\n\nFirst 15 Fibonacci numbers as a list: {list(fibonacci(15))}")

First 10 Fibonacci numbers:
0 1 1 2 3 5 8 13 21 34 

First 15 Fibonacci numbers as a list: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]


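## 2. Advanced Generator Usage {#生成器高级应用}

### 2.1 The Generator send() Method

A generator can receive values as well as produce them: `gen.send(value)` resumes the generator and makes the paused `yield` expression evaluate to `value`.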

In [6]:
def echo_generator():
    """A generator that receives values."""
    print("Generator started")
    while True:
        received = yield
        if received is None:
            break
        print(f"Received: {received}")

# Usage
gen = echo_generator()
print(gen)
print(type(gen))
# next(gen) would prime the generator equally well
gen.send(None)  # Prime the generator: run it to the first yield
gen.send("Hello")
gen.send("World")
gen.send(42)
gen.close()  # Close the generator

<generator object echo_generator at 0x10d46d220>
<class 'generator'>
Generator started
Received: Hello
Received: World
Received: 42
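
`send()` also returns whatever the generator yields next, so a single generator can both consume input and report a result. The following is a minimal sketch of that two-way flow; `running_average` is a hypothetical example, not part of the cell above.

```python
def running_average():
    """Receive numbers via send() and yield the mean of everything seen so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # send(value) resumes here; average goes back to the caller
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)  # Prime the generator: advance to the first yield
results = [avg.send(x) for x in (10, 20, 60)]
avg.close()

print(results)
```

The priming call is required because a value can only be sent to a generator that is already paused at a `yield`.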


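### 2.2 The Producer-Consumer Pattern

Generators can implement the classic producer-consumer pattern: the producer pushes items with `send()` and reads the consumer's reply from the return value.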

In [None]:
def consumer():
    """Consumer generator."""
    r = ''
    while True:
        n = yield r
        if not n:
            return
        print(f'[Consumer] Consuming {n}...')
        r = '200 OK'

def produce(c):
    """Producer function."""
    c.send(None)  # Prime the generator
    n = 0
    while n < 5:
        n = n + 1
        print(f'[Producer] Producing {n}...')
        r = c.send(n)
        print(f'[Producer] Consumer returned: {r}')
    c.close()

# Run
print("=== Producer-consumer demo ===")
c = consumer()
produce(c)

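### 2.3 The yield from Syntax

`yield from` delegates to another generator: the sub-generator's values are passed straight through to the caller, and its `return` value becomes the value of the `yield from` expression.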

In [None]:
def sub_generator():
    """Sub-generator."""
    yield "Sub-generator: A"
    yield "Sub-generator: B"
    return "Sub-generator finished"

def delegating_generator():
    """Delegating generator."""
    yield "Delegating generator started"
    result = yield from sub_generator()
    print(f"Received return value: {result}")
    yield "Delegating generator finished"

# Usage
for value in delegating_generator():
    print(value)

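### 2.4 Stream Processing

Generators are well suited to large data streams, because only one item (or chunk) is held in memory at a time.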

In [None]:
def read_large_file(file_path, chunk_size=1024):
    """Read a large file chunk by chunk."""
    with open(file_path, 'r', encoding='utf-8') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

def filter_lines(lines, keyword):
    """Yield only the lines that contain the keyword."""
    for line in lines:
        if keyword in line:
            yield line

def process_numbers(numbers):
    """Process a stream of numbers: keep the even ones and double them."""
    for num in numbers:
        if num % 2 == 0:
            yield num * 2

# Demonstrate processing a stream of numbers
numbers = range(20)
processed = process_numbers(numbers)
print("Even numbers, doubled:", list(processed))
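
The file-oriented helpers above are defined but never run. As a rough sketch of how such a pipeline might be wired together, the block below writes a small temporary file and streams it through a line reader and a keyword filter. The `read_lines` helper and the sample log contents are assumptions made for this illustration; `filter_lines` is repeated so the block runs on its own.

```python
import os
import tempfile

def read_lines(file_path):
    """Yield lines lazily; a file object is itself a line-by-line iterator."""
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

def filter_lines(lines, keyword):
    """Yield only the lines that contain the keyword."""
    for line in lines:
        if keyword in line:
            yield line

# Create a small sample file (hypothetical contents, for illustration only)
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    path = tmp.name

try:
    # Nothing is read until list() starts pulling values through the chain
    errors = list(filter_lines(read_lines(path), "ERROR"))
finally:
    os.remove(path)

print(errors)
```

Reading by line rather than by fixed-size chunk keeps each record intact, which matters when the next stage filters on line contents.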

## 3. Asynchronous Programming Basics {#异步编程基础}

### 3.1 Coroutines and async/await

Python 3.5 introduced the `async`/`await` syntax to simplify asynchronous programming.

In [None]:
import asyncio
import time

async def hello_world():
    """A minimal coroutine."""
    print("Hello")
    await asyncio.sleep(1)
    print("World")

# Run the coroutine
print("=== Basic coroutine example ===")
await hello_world()  # Top-level await works in Jupyter
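
Top-level `await` is a notebook convenience. In a plain `.py` script there is no running event loop, so the coroutine has to be handed to one. A minimal sketch using `asyncio.run()`, which creates the loop, runs the coroutine to completion, and closes the loop; the return value and the shorter sleep are added here only to keep the illustration quick:

```python
import asyncio

async def hello_world():
    """Same coroutine as above, with a shorter sleep and a return value."""
    print("Hello")
    await asyncio.sleep(0.1)
    print("World")
    return "done"

# asyncio.run() is the usual entry point in a script.
# It cannot be called from inside an already running loop (such as a notebook cell).
result = asyncio.run(hello_world())
print(result)
```

The remaining cells in this tutorial use top-level `await`; to run them as scripts, wrap each cell body in an `async def main()` and call `asyncio.run(main())`.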

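### 3.2 Running Multiple Coroutines Concurrently

Use `asyncio.gather()` to run several coroutines concurrently. The total time is roughly that of the slowest task rather than the sum of all tasks.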

In [None]:
import asyncio
import time

async def task(name, delay):
    """Simulate an asynchronous task."""
    print(f"Task {name} started")
    await asyncio.sleep(delay)
    print(f"Task {name} finished (took {delay}s)")
    return f"result-{name}"

# Sequential execution (awaiting one task at a time)
print("=== Sequential execution ===")
start = time.time()
await task("A", 2)
await task("B", 1)
await task("C", 1)
print(f"Sequential total: {time.time() - start:.2f}s\n")

# Concurrent execution (all tasks in flight at once)
print("=== Concurrent execution ===")
start = time.time()
results = await asyncio.gather(
    task("A", 2),
    task("B", 1),
    task("C", 1)
)
print(f"Concurrent total: {time.time() - start:.2f}s")
print(f"Results: {results}")

### 3.3 Asynchronous Context Managers

Use `async with` to manage resources whose setup and teardown are themselves asynchronous.

In [None]:
import asyncio

class AsyncResource:
    """Asynchronous resource manager."""

    async def __aenter__(self):
        print("Acquiring resource...")
        await asyncio.sleep(0.5)
        print("Resource acquired")
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        print("Releasing resource...")
        await asyncio.sleep(0.5)
        print("Resource released")

    async def do_work(self):
        print("Working...")
        await asyncio.sleep(1)
        print("Work finished")

# Use the asynchronous context manager
async with AsyncResource() as resource:
    await resource.do_work()
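
Writing `__aenter__` and `__aexit__` by hand is not the only option. As an alternative sketch, `contextlib.asynccontextmanager` builds an asynchronous context manager from an async generator: the code before `yield` acts as the setup and the code after it as the teardown. The `managed_resource` name and the `events` list are hypothetical, used here only to record the order of operations.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # Records the order of operations for inspection

@asynccontextmanager
async def managed_resource(name):
    """Acquire a (simulated) resource, yield a handle, and always release it."""
    events.append(f"acquire {name}")
    await asyncio.sleep(0.01)  # Simulated asynchronous setup
    try:
        yield name.upper()     # The value bound by "as"
    finally:
        await asyncio.sleep(0.01)  # Simulated asynchronous teardown
        events.append(f"release {name}")

async def demo():
    async with managed_resource("db") as handle:
        events.append(f"use {handle}")

asyncio.run(demo())  # In a notebook, use: await demo()
print(events)
```

The `try`/`finally` ensures the release step runs even if the body of the `async with` block raises.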

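## 4. Advanced Asynchronous Programming {#异步编程进阶}

### 4.1 Asynchronous HTTP Requests

Use `aiohttp` (a third-party package) to issue HTTP requests concurrently over a shared session.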

In [None]:
import asyncio
import aiohttp
import time

async def fetch(session, url):
    """Fetch JSON data from a single URL."""
    print(f"Requesting: {url}")
    try:
        # aiohttp expects a ClientTimeout object rather than a bare number
        timeout = aiohttp.ClientTimeout(total=10)
        async with session.get(url, timeout=timeout) as response:
            data = await response.json()
            print(f"Finished: {url} (status: {response.status})")
            return {"url": url, "data": data, "status": response.status}
    except Exception as e:
        print(f"Request failed: {url} - {e}")
        return {"url": url, "error": str(e)}

async def batch_fetch(urls):
    """Fetch several URLs concurrently."""
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks, return_exceptions=True)

# Test the batch request
urls = [
    "https://api.github.com/users/github",
    "https://api.github.com/users/python",
    "https://api.github.com/users/microsoft"
]

print("=== Asynchronous HTTP request example ===")
start = time.time()
results = await batch_fetch(urls)
print(f"\nTotal time: {time.time() - start:.2f}s")
# With return_exceptions=True a result may be an exception object, so check the type first
print(f"\nSuccessful requests: {sum(1 for r in results if isinstance(r, dict) and 'data' in r)}")

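### 4.2 Task Control and Cancellation

Wrap a coroutine in an `asyncio.Task` (via `asyncio.create_task()`) to schedule it in the background and to cancel it later if needed.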

In [None]:
import asyncio

async def long_running_task(name, duration):
    """A long-running task."""
    try:
        print(f"Task {name} started (expected {duration}s)")
        for i in range(duration):
            await asyncio.sleep(1)
            print(f"Task {name}: {i+1}s")
        print(f"Task {name} finished")
        return f"result-{name}"
    except asyncio.CancelledError:
        print(f"Task {name} was cancelled")
        raise  # Re-raise so the task is actually marked as cancelled

# Create the tasks
task1 = asyncio.create_task(long_running_task("A", 3))
task2 = asyncio.create_task(long_running_task("B", 5))

# Cancel task B after 2 seconds
await asyncio.sleep(2)
print("\n--- Cancelling task B ---")
task2.cancel()

# Wait for all tasks (including the cancelled one)
results = await asyncio.gather(task1, task2, return_exceptions=True)
print(f"\nResults: {results}")
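
Manual `cancel()` calls are one way to stop a task. When the goal is simply "give up after N seconds", `asyncio.wait_for()` does the cancellation itself and raises `asyncio.TimeoutError`. A minimal sketch, with a hypothetical `slow_operation` coroutine and timings chosen only to keep the example fast:

```python
import asyncio

cancel_seen = []  # Records whether the inner coroutine observed the cancellation

async def slow_operation():
    """A coroutine that would take far longer than the caller is willing to wait."""
    try:
        await asyncio.sleep(10)
        return "finished"
    except asyncio.CancelledError:
        cancel_seen.append(True)  # A place for cleanup work
        raise                     # Always re-raise CancelledError

async def demo():
    try:
        # wait_for cancels slow_operation() once the timeout expires
        return await asyncio.wait_for(slow_operation(), timeout=0.1)
    except asyncio.TimeoutError:
        return "timed out"

outcome = asyncio.run(demo())  # In a notebook, use: await demo()
print(outcome, cancel_seen)
```

The inner coroutine still sees a `CancelledError`, so the cleanup pattern from the cell above applies unchanged.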

### 4.3 Semaphores and Rate Limiting

Use a semaphore to cap how many coroutines run a given section at the same time.

In [None]:
import asyncio
import time

async def download_file(semaphore, file_id):
    """Simulate a file download with limited concurrency."""
    async with semaphore:  # Acquire the semaphore; released on exit
        print(f"[{time.strftime('%H:%M:%S')}] Started downloading file {file_id}")
        await asyncio.sleep(2)  # Simulated download time
        print(f"[{time.strftime('%H:%M:%S')}] Finished downloading file {file_id}")
        return f"file-{file_id}"

async def download_all_files(file_count, max_concurrent):
    """Download all files while capping the number of concurrent downloads."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [download_file(semaphore, i) for i in range(file_count)]
    return await asyncio.gather(*tasks)

# Download 10 files, at most 3 at a time
print("=== Rate-limited download example (at most 3 concurrent) ===")
start = time.time()
results = await download_all_files(file_count=10, max_concurrent=3)
print(f"\nTotal time: {time.time() - start:.2f}s")
print(f"Downloaded: {len(results)} files")

### 4.4 Asynchronous Queues

Use `asyncio.Queue` to implement the producer-consumer pattern between coroutines.

In [None]:
import asyncio
import random

async def producer(queue, producer_id, item_count):
    """Producer."""
    for i in range(item_count):
        item = f"P{producer_id}-Item{i}"
        await queue.put(item)  # Waits while the queue is full
        print(f"[Producer {producer_id}] Produced: {item}")
        await asyncio.sleep(random.uniform(0.1, 0.5))
    print(f"[Producer {producer_id}] Done")

async def consumer(queue, consumer_id):
    """Consumer."""
    while True:
        item = await queue.get()
        if item is None:  # Shutdown signal
            queue.task_done()
            break
        print(f"[Consumer {consumer_id}] Consumed: {item}")
        await asyncio.sleep(random.uniform(0.2, 0.8))
        queue.task_done()
    print(f"[Consumer {consumer_id}] Done")

async def main():
    """Entry point."""
    queue = asyncio.Queue(maxsize=5)  # Bound the queue size

    # Start 2 producers and 3 consumers
    producers = [
        asyncio.create_task(producer(queue, i, 5))
        for i in range(2)
    ]
    consumers = [
        asyncio.create_task(consumer(queue, i))
        for i in range(3)
    ]

    # Wait for all producers to finish
    await asyncio.gather(*producers)

    # Wait until every queued item has been processed
    await queue.join()

    # Send one shutdown signal per consumer
    for _ in consumers:
        await queue.put(None)

    # Wait for all consumers to finish
    await asyncio.gather(*consumers)

print("=== Asynchronous queue example ===")
await main()

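## 5. Asynchronous Generators {#异步生成器}

### 5.1 A Basic Asynchronous Generator

An asynchronous generator combines both ideas: an `async def` function that contains `yield` can await between values, and it is consumed with `async for`.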

In [None]:
import asyncio

async def async_counter(n):
    """Asynchronous counter."""
    for i in range(n):
        await asyncio.sleep(0.5)
        yield i

# Use the asynchronous generator
print("=== Asynchronous generator example ===")
async for num in async_counter(5):
    print(f"Received: {num}")
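
An `async for` loop is not the only way to consume an asynchronous generator. As a small sketch, asynchronous comprehensions (`[... async for ...]`) collect or filter the values in a single expression. `async_counter` is repeated here with a shorter delay so the block runs on its own.

```python
import asyncio

async def async_counter(n):
    """Asynchronous counter (shorter delay than the cell above)."""
    for i in range(n):
        await asyncio.sleep(0.01)
        yield i

async def demo():
    # Asynchronous list comprehension: collect transformed values
    squares = [i * i async for i in async_counter(4)]
    # A condition can be added just as in a regular comprehension
    evens = [i async for i in async_counter(6) if i % 2 == 0]
    return squares, evens

squares, evens = asyncio.run(demo())  # In a notebook, use: await demo()
print(squares, evens)
```

Like `async for`, an asynchronous comprehension is only valid inside a coroutine (or in a notebook cell that supports top-level `await`).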

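### 5.2 Asynchronous Stream Processing

Use asynchronous generators to process a real-time data stream, chaining one generator onto another.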

In [None]:
import asyncio
import random
import time

async def fetch_data_stream(source_id, count):
    """Simulate a data stream arriving from a source."""
    for i in range(count):
        await asyncio.sleep(random.uniform(0.1, 0.5))
        data = {
            "source": source_id,
            "index": i,
            "value": random.randint(1, 100),
            "timestamp": time.time()
        }
        yield data

async def process_stream(stream):
    """Process the data stream."""
    async for item in stream:
        # Simulated processing step
        processed = {
            **item,
            "processed": item["value"] * 2,
            "status": "ok" if item["value"] > 50 else "low"
        }
        yield processed

# Usage
print("=== Asynchronous stream processing ===")
stream = fetch_data_stream("sensor-1", 5)
processed_stream = process_stream(stream)

async for data in processed_stream:
    print(f"Data: value={data['value']}, processed={data['processed']}, status={data['status']}")

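### 5.3 Merging Multiple Asynchronous Streams

Consume several asynchronous data sources at once, yielding each value as soon as any source produces it.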

In [None]:
import asyncio
import random

async def sensor_data(sensor_id, count):
    """Simulate a sensor data stream."""
    for i in range(count):
        await asyncio.sleep(random.uniform(0.2, 0.8))
        yield f"[Sensor {sensor_id}] data-{i}"

async def merge_streams(*streams):
    """Merge several asynchronous streams."""
    # Create an iterator for each stream
    iterators = [stream.__aiter__() for stream in streams]

    # Fetch the next value of one stream, tagged with the stream's index
    async def get_next(iterator, index):
        try:
            value = await iterator.__anext__()
            return (index, value, False)
        except StopAsyncIteration:
            return (index, None, True)

    # One in-flight task per stream
    pending = {
        asyncio.create_task(get_next(it, i))
        for i, it in enumerate(iterators)
    }

    active_count = len(iterators)

    while active_count > 0:
        done, pending = await asyncio.wait(
            pending,
            return_when=asyncio.FIRST_COMPLETED
        )

        for task in done:
            # The task has already finished, so result() returns immediately
            index, value, is_done = task.result()

            if not is_done:
                yield value
                # Schedule the next read from the same stream
                pending.add(
                    asyncio.create_task(get_next(iterators[index], index))
                )
            else:
                active_count -= 1

# Usage
print("=== Merging multiple asynchronous streams ===")
stream1 = sensor_data(1, 3)
stream2 = sensor_data(2, 3)
stream3 = sensor_data(3, 3)

merged = merge_streams(stream1, stream2, stream3)
async for data in merged:
    print(data)

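## 6. Practical Examples {#实战案例}

### 6.1 A Web Crawler

An efficient web crawler built on asynchronous requests, with a semaphore limiting concurrency.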

In [None]:
import asyncio
import aiohttp
from urllib.parse import urljoin, urlparse
import time

class AsyncWebCrawler:
    """Asynchronous web crawler."""

    def __init__(self, max_concurrent=5, timeout=10):
        self.max_concurrent = max_concurrent
        # aiohttp expects a ClientTimeout object rather than a bare number
        self.timeout = aiohttp.ClientTimeout(total=timeout)
        self.visited = set()
        self.results = []

    async def fetch_page(self, session, url):
        """Fetch the content of a page."""
        if url in self.visited:
            return None

        self.visited.add(url)
        print(f"Crawling: {url}")

        try:
            async with session.get(url, timeout=self.timeout) as response:
                content = await response.text()
                result = {
                    "url": url,
                    "status": response.status,
                    "length": len(content),
                    "content_type": response.headers.get("content-type", "")
                }
                self.results.append(result)
                # len() of a str counts characters, not bytes
                print(f"Done: {url} (status: {response.status}, size: {len(content)} characters)")
                return result
        except Exception as e:
            print(f"Error: {url} - {e}")
            return None

    async def crawl(self, urls):
        """Crawl a list of URLs."""
        semaphore = asyncio.Semaphore(self.max_concurrent)

        async def fetch_with_limit(url):
            # `session` is bound below, before any of these coroutines run
            async with semaphore:
                return await self.fetch_page(session, url)

        async with aiohttp.ClientSession() as session:
            tasks = [fetch_with_limit(url) for url in urls]
            await asyncio.gather(*tasks, return_exceptions=True)

        return self.results

# Usage
urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/2",
    "https://httpbin.org/status/200",
    "https://httpbin.org/html"
]

print("=== Asynchronous web crawler example ===")
crawler = AsyncWebCrawler(max_concurrent=3)
start = time.time()
results = await crawler.crawl(urls)
print(f"\nTotal time: {time.time() - start:.2f}s")
print(f"Pages crawled: {len(results)}")

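### 6.2 Real-Time Log Processing

Use asynchronous generators to process log entries as they arrive. The log source here is simulated rather than read from a file.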

In [None]:
import asyncio
import random
from datetime import datetime

async def log_generator(log_count):
    """Simulated log source."""
    levels = ["INFO", "WARNING", "ERROR", "DEBUG"]
    messages = [
        "User logged in",
        "Database query completed",
        "Cache hit",
        "API request timed out",
        "High memory usage"
    ]

    for i in range(log_count):
        await asyncio.sleep(random.uniform(0.1, 0.5))
        log = {
            "id": i,
            "timestamp": datetime.now().isoformat(),
            "level": random.choice(levels),
            "message": random.choice(messages)
        }
        yield log

async def filter_logs(log_stream, level="ERROR"):
    """Yield only the logs at the given level."""
    async for log in log_stream:
        if log["level"] == level:
            yield log

async def analyze_logs(log_stream):
    """Print each log and count logs per level."""
    stats = {"INFO": 0, "WARNING": 0, "ERROR": 0, "DEBUG": 0}

    async for log in log_stream:
        stats[log["level"]] += 1
        print(f"[{log['timestamp']}] {log['level']}: {log['message']}")

    return stats

# Usage
print("=== Real-time log processing ===")
log_stream = log_generator(10)
stats = await analyze_logs(log_stream)
print(f"\nLog statistics: {stats}")

# Show ERROR logs only
print("\n=== ERROR logs only ===")
log_stream = log_generator(20)
error_stream = filter_logs(log_stream, "ERROR")
async for log in error_stream:
    print(f"[ERROR] {log['timestamp']}: {log['message']}")

### 6.3 Batch Database Operations

Use asynchronous programming to overlap the I/O waits of many database operations.

In [None]:
import asyncio
import time

# Simulated database operations
class MockAsyncDB:
    """A mock asynchronous database."""

    async def insert(self, table, data):
        """Insert a record."""
        await asyncio.sleep(0.1)  # Simulated I/O latency
        return {"id": data.get("id"), "status": "inserted"}

    async def query(self, table, condition):
        """Query records."""
        await asyncio.sleep(0.15)  # Simulated I/O latency
        return {"result": [condition]}

    async def update(self, table, data):
        """Update a record."""
        await asyncio.sleep(0.12)  # Simulated I/O latency
        return {"id": data.get("id"), "status": "updated"}

async def batch_insert(db, records):
    """Insert records in a batch."""
    tasks = [db.insert("users", record) for record in records]
    results = await asyncio.gather(*tasks)
    return results

async def batch_query(db, ids):
    """Query records in a batch."""
    # record_id avoids shadowing the built-in id()
    tasks = [db.query("users", {"id": record_id}) for record_id in ids]
    results = await asyncio.gather(*tasks)
    return results

# Usage
db = MockAsyncDB()

# Batch insert
print("=== Batch insert ===")
records = [{"id": i, "name": f"user{i}", "email": f"user{i}@example.com"} for i in range(10)]
start = time.time()
insert_results = await batch_insert(db, records)
print(f"Inserted {len(insert_results)} records in {time.time() - start:.2f}s")

# Batch query
print("\n=== Batch query ===")
ids = list(range(10))
start = time.time()
query_results = await batch_query(db, ids)
print(f"Queried {len(query_results)} records in {time.time() - start:.2f}s")

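### 6.4 A File Processing Pipeline

Compose asynchronous generators into a data processing pipeline, with each stage consuming the previous one.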

In [None]:
import asyncio
import json

async def read_json_lines(data_list):
    """Simulate reading JSON lines."""
    for item in data_list:
        await asyncio.sleep(0.1)
        yield json.dumps(item)

async def parse_json(line_stream):
    """Parse each line as JSON, skipping lines that fail to parse."""
    async for line in line_stream:
        try:
            yield json.loads(line)
        except json.JSONDecodeError as e:
            print(f"Parse error: {e}")

async def filter_data(data_stream, min_value):
    """Keep items whose value is at least min_value."""
    async for item in data_stream:
        if item.get("value", 0) >= min_value:
            yield item

async def transform_data(data_stream):
    """Transform items: double the value and upper-case the category."""
    async for item in data_stream:
        transformed = {
            "id": item.get("id"),
            "value": item.get("value", 0) * 2,
            "category": item.get("category", "unknown").upper()
        }
        yield transformed

async def aggregate_data(data_stream):
    """Aggregate the stream into totals and per-category counts."""
    total = 0
    count = 0
    categories = {}

    async for item in data_stream:
        total += item["value"]
        count += 1
        category = item["category"]
        categories[category] = categories.get(category, 0) + 1

    return {
        "total": total,
        "count": count,
        "average": total / count if count > 0 else 0,
        "categories": categories
    }

# Usage
print("=== Data processing pipeline ===")
data = [
    {"id": 1, "value": 10, "category": "a"},
    {"id": 2, "value": 25, "category": "b"},
    {"id": 3, "value": 15, "category": "a"},
    {"id": 4, "value": 5, "category": "c"},
    {"id": 5, "value": 30, "category": "b"}
]

# Build the pipeline
pipeline = read_json_lines(data)
pipeline = parse_json(pipeline)
pipeline = filter_data(pipeline, min_value=10)  # Keep items with value >= 10
pipeline = transform_data(pipeline)

# Run the pipeline and aggregate
result = await aggregate_data(pipeline)
print(f"\nAggregated result:")
print(f"  Total: {result['total']}")
print(f"  Count: {result['count']}")
print(f"  Average: {result['average']:.2f}")
print(f"  Categories: {result['categories']}")

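## Summary

### Advantages of Generators
- Memory efficiency: values are computed lazily, so the whole data set never has to be loaded at once
- Concise code: `yield` simplifies complex iteration logic
- Pipelines: several generators can be chained into a processing pipeline

### Advantages of Asynchronous Programming
- High concurrency: a single thread can handle many I/O operations
- Resource efficiency: avoids per-thread overhead
- Readable code: `async`/`await` reads much like sequential code

### Best Practices
1. **Generators**: suited to large data sets, stream processing, and infinite sequences
2. **Asynchronous programming**: suited to I/O-bound work (network requests, file reads and writes, database operations)
3. **Asynchronous generators**: combine both to process asynchronous data streams
4. **Concurrency control**: use a semaphore to cap the number of concurrent operations
5. **Error handling**: use `try`/`except` and `return_exceptions` to handle failures

### Caveats
- Asynchronous programming is a poor fit for CPU-bound work (use multiprocessing instead)
- A generator can be iterated only once; create a new one to iterate again
- Calling an async function only creates a coroutine object; it must be awaited from another coroutine or handed to an event loop (for example with `asyncio.run()`)
- Avoid blocking calls inside coroutines (for example, use `asyncio.sleep` rather than `time.sleep`)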
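
When a blocking call cannot be avoided (a synchronous library, for instance), one option is to move it off the event loop. The sketch below assumes Python 3.9 or later, where `asyncio.to_thread()` is available; `blocking_io` and `heartbeat` are hypothetical helpers, and the timings are arbitrary.

```python
import asyncio
import time

def blocking_io(seconds):
    """Stand-in for a blocking call, e.g. a synchronous client library."""
    time.sleep(seconds)  # Would freeze the event loop if called directly in a coroutine
    return f"slept {seconds}s"

async def heartbeat(ticks, interval):
    """Count ticks; it keeps advancing only while the event loop stays responsive."""
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        count += 1
    return count

async def demo():
    # to_thread runs blocking_io in a worker thread, so heartbeat keeps ticking
    return await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.2),
        heartbeat(3, 0.05),
    )

result, ticks = asyncio.run(demo())  # In a notebook, use: await demo()
print(result, ticks)
```

This helps with blocking I/O; for CPU-bound work a process pool is usually the better tool, as noted in the list above.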

## References

- [Python asyncio official documentation](https://docs.python.org/3/library/asyncio.html)
- [PEP 492 - Coroutines with async and await syntax](https://www.python.org/dev/peps/pep-0492/)
- [PEP 525 - Asynchronous Generators](https://www.python.org/dev/peps/pep-0525/)
- [aiohttp documentation](https://docs.aiohttp.org/)