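# How to add new columns in Pandas
In data analysis you often need to create a new column from existing data, sometimes based on a condition, before moving on to the next step. This section covers four approaches:

1. Direct assignment <br>
2. The df.apply method<br>
3. The df.assign method<br>
4. Selecting rows by condition and assigning each group separately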

In [2]:
import pandas as pd

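## 0. Read the CSV data into a DataFrame
The column names stay in Chinese because the code refers to them: 日期 (date), 城市 (city), 行政区 (district), 观测站 (weather station), 气温(度) (temperature in degrees), 相对湿度(%) (relative humidity), 累积雨量(mm) (accumulated rainfall).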

In [3]:
fpath="datas/weather_20230115134249.csv"
df = pd.read_csv(fpath)

In [4]:
df.head()

Unnamed: 0,日期,城市,行政区,观测站,气温(度),相对湿度(%),累积雨量(mm)
0,2015-1-1,新北市,烏來區,福山,13.7℃,92,0.0
1,2015-1-2,臺南市,安平區,安平,23.5℃,70,0.0
2,2015-1-3,臺東縣,東河鄉,七塊厝,19.6℃,86,0.0
3,2015-1-4,新北市,貢寮區,福隆,14.2℃,96,-99.0
4,2015-1-5,南投縣,仁愛鄉,小奇萊,8.3℃,57,0.0


## 1. Direct assignment
Example: clean the temperature column and convert it to a numeric type

In [6]:
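# Strip the "℃" suffix and convert to float. Plain column assignment is used because,
# in recent pandas versions, df.loc[:, col] = ... may write in place and keep the object dtype.
df["气温(度)"] = df["气温(度)"].str.replace("℃", "", regex=False).astype("float")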

Example: compute a temperature difference

In [7]:
# Note: df["气温(度)"] is a Series, so the subtraction also returns a Series
# ("wencha" is pinyin for "temperature difference")
df.loc[:, "wencha"] = df["气温(度)"] - 2

In [15]:
df.head()

Unnamed: 0,日期,城市,行政区,观测站,气温(度),相对湿度(%),累积雨量(mm),wencha
0,2015-1-1,新北市,烏來區,福山,13.7,92,0.0,11.7
1,2015-1-2,臺南市,安平區,安平,23.5,70,0.0,21.5
2,2015-1-3,臺東縣,東河鄉,七塊厝,19.6,86,0.0,17.6
3,2015-1-4,新北市,貢寮區,福隆,14.2,96,-99.0,12.2
4,2015-1-5,南投縣,仁愛鄉,小奇萊,8.3,57,0.0,6.3


## 2. The df.apply method
Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).

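Example: add a temperature-type column:
* If the temperature is above 33 degrees, the type is 高温 (high)
* If it is below -10 degrees, the type is 低温 (low)
* Otherwise the type is 常温 (normal)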

In [19]:
def get_wendu_type(x):
    if x["气温(度)"] > 33:
        return '高温'
    if x["气温(度)"] < -10:
        return '低温'
    return '常温'

# Note: set axis=1 so that each row is passed as a Series whose index is the column names
df.loc[:, "wendu_type"] = df.apply(get_wendu_type, axis=1)

In [22]:
# Count the rows of each temperature type
df["wendu_type"].value_counts()

常温    473
低温     11
Name: wendu_type, dtype: int64
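
The row-wise pattern can be checked on a small frame where every branch is hit. This is a sketch under assumptions: the `demo` frame and its three temperatures are invented for illustration, and the function mirrors `get_wendu_type` above.

```python
import pandas as pd

# Hypothetical temperatures chosen so that every branch is exercised.
demo = pd.DataFrame({"气温(度)": [35.0, 20.0, -12.0]})


def get_wendu_type(row):
    # With axis=1, `row` is a Series indexed by column name.
    if row["气温(度)"] > 33:
        return "高温"
    if row["气温(度)"] < -10:
        return "低温"
    return "常温"


# apply calls the function once per row and collects the results into a Series.
demo["wendu_type"] = demo.apply(get_wendu_type, axis=1)
print(demo["wendu_type"].tolist())  # ['高温', '常温', '低温']
```

Because the function runs once per row in Python, this approach is convenient for multi-column logic but can be slow on large frames.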

## 3. The df.assign method
Assign new columns to a DataFrame.
Return a new object with all original columns in addition to new ones.

Example: convert the temperature from Celsius to Fahrenheit

In [24]:
# Several new columns can be added at the same time
df.assign(
    # Celsius to Fahrenheit
    tem_huashi = lambda x : x["气温(度)"] * 9 / 5 + 32
)

Unnamed: 0,日期,城市,行政区,观测站,气温(度),相对湿度(%),累积雨量(mm),wencha,wendu_type,tem_huashi
0,2015-1-1,新北市,烏來區,福山,13.7,92,0.0,11.7,常温,56.66
1,2015-1-2,臺南市,安平區,安平,23.5,70,0.0,21.5,常温,74.30
2,2015-1-3,臺東縣,東河鄉,七塊厝,19.6,86,0.0,17.6,常温,67.28
3,2015-1-4,新北市,貢寮區,福隆,14.2,96,-99.0,12.2,常温,57.56
4,2015-1-5,南投縣,仁愛鄉,小奇萊,8.3,57,0.0,6.3,常温,46.94
...,...,...,...,...,...,...,...,...,...,...
479,2016-4-24,新北市,板橋區,板橋,15.6,73,5.0,13.6,常温,60.08
480,2016-4-25,宜蘭縣,大同鄉,翠峰湖,7.8,100,0.0,5.8,常温,46.04
481,2016-4-26,花蓮縣,壽豐鄉,大坑,15.7,94,0.0,13.7,常温,60.26
482,2016-4-27,新竹縣,五峰鄉,雪霸,15.7,62,0.0,13.7,常温,60.26


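## 4. Select rows by condition and assign each group separately
First select the rows that match a condition, then assign the new column's value for those rows only.<br>
Example: if the temperature minus 10 degrees is below 15 degrees, label the row 温差大 (large temperature difference); otherwise label it 温差正常 (normal temperature difference)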

In [25]:
# First create an empty column (this is the first method, direct assignment)
df['wencha_type']=''
df.loc[df["气温(度)"]-10 < 15,"wencha_type"]="温差大"
df.loc[df["气温(度)"]-10 >= 15,"wencha_type"]="温差正常"

In [26]:
df["wencha_type"].value_counts()

温差大     410
温差正常     74
Name: wencha_type, dtype: int64
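
The conditional assignment above can be written with a single reusable boolean mask. The sketch below assumes a hypothetical `demo` frame with no missing temperatures, and it also shows `np.where` as one possible alternative for the two-label case; the column name `wencha_type_where` is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical temperatures; assumed to contain no missing values.
demo = pd.DataFrame({"气温(度)": [13.7, 23.5, 30.0]})

# Boolean mask: True where temperature minus 10 is below 15.
mask = demo["气温(度)"] - 10 < 15

# Method from this section: create the column, then fill each group via .loc.
demo["wencha_type"] = ""
demo.loc[mask, "wencha_type"] = "温差大"
demo.loc[~mask, "wencha_type"] = "温差正常"

# Alternative: np.where picks one of two labels per row in a single step.
demo["wencha_type_where"] = np.where(mask, "温差大", "温差正常")

print(demo)
```

With missing values the two forms differ from the original cell: a NaN temperature makes `mask` False, so `~mask` labels it 温差正常, whereas the explicit `>= 15` test leaves the empty string in place.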