For complex logic that needs **several columns in one calculation** or **a call to an external function**, `apply` is the recommended tool.

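|Scenario|Usage|Example|
|---|---|---|
|Column-wise processing|`df.apply(func)`|Compute a statistic for each column|
|Row-wise processing|`df.apply(func, axis=1)`|Combine the values of several columns|
|Element-wise processing|`series.apply(func)`|Transform each element|
|Simple arithmetic|Avoid `apply`; use vectorized operations|`df['A'] * 2`|
|Complex logic|Use `apply`|Conditional calculations involving several columns|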
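
To make the "simple arithmetic" row concrete, here is a minimal sketch comparing a vectorized expression with the equivalent `apply` call. The DataFrame and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical data, used only for this comparison
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Vectorized: one operation over the whole column
doubled_vectorized = df['A'] * 2

# Same result with apply: the lambda is called once per element
doubled_apply = df['A'].apply(lambda x: x * 2)

print(doubled_vectorized.tolist())
print(doubled_apply.tolist())
```

Both lines print the same values. The vectorized form is shorter and usually faster on large columns, which is why the table reserves `apply` for logic that cannot be written this way.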


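`lambda row:` defines an anonymous function whose parameter `row` represents one row of the DataFrame  
`f"..."` is an f-string (formatted string literal)  
`{row['Name']}` inserts that row's `'Name'` value  
`{row['sypnopsis']}` inserts that row's `'sypnopsis'` value  
`{row['Genres']}` inserts that row's `'Genres'` value  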

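`axis=0`: the function is applied to each column (the default)  
`axis=1`: the function is applied to each row, so the `row` parameter receives one row at a time  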
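
As a small sketch of what the function actually receives under each `axis` value, the example below uses a made-up `scores` table (the names and numbers are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical data: two students, two subjects
scores = pd.DataFrame(
    {'math': [90, 70], 'art': [60, 80]},
    index=['amy', 'ben'],
)

# axis=0 (default): the function receives each column as a Series
col_range = scores.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series indexed by column name
row_best = scores.apply(lambda row: row.idxmax(), axis=1)

print(col_range)
print(row_best)
```

`col_range` has one entry per column, while `row_best` has one entry per row and holds the name of the column with the highest value in that row.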

In [8]:
import pandas as pd

# Sample data
anime = pd.DataFrame({
    'Name': ['Naruto', 'One Piece', 'Attack on Titan'],
    'sypnopsis': [
        'A young ninja dreams of becoming Hokage',
        'Pirate adventure to find the ultimate treasure', 
        'Humanity fights against giant humanoid creatures'
    ],
    'Genres': ['Action, Adventure, Comedy', 'Action, Adventure, Fantasy', 'Action, Drama, Fantasy']
})

# Apply the lambda to every row
anime['combined_info'] = anime.apply(
    lambda row: f"Title: {row['Name']}. Overview: {row['sypnopsis']} Genres: {row['Genres']}", 
    axis=1
)

print(anime['combined_info'])

0    Title: Naruto. Overview: A young ninja dreams ...
1    Title: One Piece. Overview: Pirate adventure t...
2    Title: Attack on Titan. Overview: Humanity fig...
Name: combined_info, dtype: object


# Equivalent forms

In [9]:
# Sample data
anime1 = pd.DataFrame({
    'Name': ['Naruto', 'One Piece', 'Attack on Titan'],
    'sypnopsis': [
        'A young ninja dreams of becoming Hokage',
        'Pirate adventure to find the ultimate treasure', 
        'Humanity fights against giant humanoid creatures'
    ],
    'Genres': ['Action, Adventure, Comedy', 'Action, Adventure, Fantasy', 'Action, Drama, Fantasy']
})
def create_combined_info(row):
    return f"Title: {row['Name']}. Overview: {row['sypnopsis']} Genres: {row['Genres']}"

anime1['combined_info'] = anime1.apply(create_combined_info, axis=1)
print(anime1['combined_info'])

0    Title: Naruto. Overview: A young ninja dreams ...
1    Title: One Piece. Overview: Pirate adventure t...
2    Title: Attack on Titan. Overview: Humanity fig...
Name: combined_info, dtype: object


In [11]:
anime2 = pd.DataFrame({
    'Name': ['Naruto', 'One Piece', 'Attack on Titan'],
    'sypnopsis': [
        'A young ninja dreams of becoming Hokage',
        'Pirate adventure to find the ultimate treasure', 
        'Humanity fights against giant humanoid creatures'
    ],
    'Genres': ['Action, Adventure, Comedy', 'Action, Adventure, Fantasy', 'Action, Drama, Fantasy']
})
# Vectorized string concatenation: no apply needed
anime2['combined_info'] = "Title: " + anime2['Name'] + ". Overview: " + anime2['sypnopsis'] + " Genres: " + anime2['Genres']
print(anime2['combined_info'])

0    Title: Naruto. Overview: A young ninja dreams ...
1    Title: One Piece. Overview: Pirate adventure t...
2    Title: Attack on Titan. Overview: Humanity fig...
Name: combined_info, dtype: object


## Column-wise operations

In [4]:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
print(df)
# Apply a function to each column (default axis=0)
result = df.apply(np.sum)  # sum of each column
print(result)
# Output:
# A     6
# B    15
# C    24
# dtype: int64

   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
A     6
B    15
C    24
dtype: int64


In [2]:
# Apply a function to each row
result = df.apply(np.sum, axis=1)
print(result)


# Use a lambda function
df['row_sum'] = df.apply(lambda row: row['A'] + row['B'] + row['C'], axis=1)
print(df)

0    12
1    15
2    18
dtype: int64
   A  B  C  row_sum
0  1  4  7       12
1  2  5  8       15
2  3  6  9       18
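
Row-wise `apply` becomes more useful when the calculation branches on several columns at once. The following is a minimal sketch of that case; the `orders` table, the `order_total` function, and the 10% member discount rule are all hypothetical and exist only to illustrate the pattern.

```python
import pandas as pd

# Hypothetical order data
orders = pd.DataFrame({
    'price': [100.0, 250.0, 40.0],
    'quantity': [1, 3, 10],
    'is_member': [True, False, True],
})

def order_total(row):
    """Assumed rule: members get 10% off when the total is at least 200."""
    total = row['price'] * row['quantity']
    if row['is_member'] and total >= 200:
        return total * 0.9
    return total

# axis=1 passes each row to order_total
orders['total'] = orders.apply(order_total, axis=1)
print(orders)
```

A named function is used instead of a lambda here because the logic has a branch, which is easier to read and test as a regular function.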


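### `x.title()` is a built-in method of Python string objects that converts a string to title case. It uppercases the first letter of each word and lowercases the remaining letters, returning a new string; the original string is not modified.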

In [6]:
my_string = "hello, world!"
new_string = my_string.title()
print(new_string)

Hello, World!
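
One caveat worth a quick sketch: `str.title()` treats any run of letters as a word, so an apostrophe starts a new "word". The standard-library function `string.capwords` splits on whitespace instead and may be a better fit for such text. The sample phrase is only an illustration.

```python
import string

phrase = "they're bill's friends"

# title() capitalizes the letter after each apostrophe as well
titled = phrase.title()

# capwords() splits on whitespace, so apostrophes are left alone
capped = string.capwords(phrase)

print(titled)  # They'Re Bill'S Friends
print(capped)  # They're Bill's Friends
```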


## Data cleaning

In [5]:
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'CHARLIE'],
    'age': ['25', '30', '35'],
    'salary': ['$50000', '$60000', '$70000']
})

# Normalize the name format
df['name'] = df['name'].apply(lambda x: x.title())

# Convert the data type from str to int
df['age'] = df['age'].apply(int)

# Strip the currency symbol and convert to int
df['salary'] = df['salary'].apply(lambda x: int(x.replace('$', '')))

print(df)

      name  age  salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
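
The same cleaning steps can also be written without `apply`, using the `.str` accessor and `astype`. This is a sketch of one possible alternative, not a replacement for the cell above; the names `raw` and `clean` are introduced here for illustration.

```python
import pandas as pd

# Same sample data as the cleaning example, under a different name
raw = pd.DataFrame({
    'name': ['Alice', 'Bob', 'CHARLIE'],
    'age': ['25', '30', '35'],
    'salary': ['$50000', '$60000', '$70000']
})

clean = pd.DataFrame({
    # Vectorized string method instead of apply(lambda x: x.title())
    'name': raw['name'].str.title(),
    # astype converts the whole column at once
    'age': raw['age'].astype(int),
    # regex=False treats '$' as a literal character, not a regex anchor
    'salary': raw['salary'].str.replace('$', '', regex=False).astype(int),
})

print(clean)
```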


## More complex data processing

In [7]:
df = pd.DataFrame({
    'email': ['alice@example.com', 'bob@gmail.com', 'charlie@yahoo.com'],
    'full_name': ['Alice Smith', 'Bob Johnson', 'Charlie Brown']
})

# Extract the domain (the part after '@')
df['domain'] = df['email'].apply(lambda x: x.split('@')[1])

# Extract the last name (the final whitespace-separated token)
df['last_name'] = df['full_name'].apply(lambda x: x.split()[-1])

print(df)

               email      full_name       domain last_name
0  alice@example.com    Alice Smith  example.com     Smith
1      bob@gmail.com    Bob Johnson    gmail.com   Johnson
2  charlie@yahoo.com  Charlie Brown    yahoo.com     Brown
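
For splitting tasks like these, the `.str` accessor offers a vectorized route as well. The sketch below reproduces the two derived columns that way; the variable name `contacts` is an assumption made for this example.

```python
import pandas as pd

# Same sample data as above, under a different name
contacts = pd.DataFrame({
    'email': ['alice@example.com', 'bob@gmail.com', 'charlie@yahoo.com'],
    'full_name': ['Alice Smith', 'Bob Johnson', 'Charlie Brown']
})

# .str.split returns a list per row; .str[i] picks one item from each list
contacts['domain'] = contacts['email'].str.split('@').str[1]
contacts['last_name'] = contacts['full_name'].str.split().str[-1]

print(contacts)
```

`apply` remains the more flexible choice once the per-row logic goes beyond what the string methods cover, for example when it needs validation or a call to an external function.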
