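# How do you add new columns in Pandas?
In data analysis, you often need to create new columns according to some condition before continuing with the next step of the analysis.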

1. Direct assignment <br>
2. The df.apply method<br>
3. The df.assign method<br>
4. Selecting groups by condition and assigning to each separately

In [1]:
import pandas as pd

## 0. Read the CSV data into a DataFrame

In [2]:
fpath="../datas/weather_20230115134249.csv"
df = pd.read_csv(fpath)

In [3]:
df.head()

Unnamed: 0,日期,城市,行政区,观测站,气温(度),相对湿度(%),累积雨量(mm)
0,2015-01-01,新北市,烏來區,福山,13.7℃,92,0.0
1,2015-01-02,臺南市,安平區,安平,23.5℃,70,0.0
2,2015-01-03,臺東縣,東河鄉,七塊厝,19.6℃,86,0.0
3,2015-01-04,新北市,貢寮區,福隆,14.2℃,96,-99.0
4,2015-01-05,南投縣,仁愛鄉,小奇萊,8.3℃,57,0.0
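
If the CSV file is not at hand, a small stand-in DataFrame can be built from the five rows printed above. This is only a sketch for following along: `sample_df` is a name introduced here, and the leading counter field is left out on the assumption that it is just a saved index.

```python
import pandas as pd

# Stand-in for the CSV: the five rows shown by df.head() above.
# The leading counter field is omitted (assumed to be a saved index).
sample_df = pd.DataFrame({
    "日期": ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04", "2015-01-05"],
    "城市": ["新北市", "臺南市", "臺東縣", "新北市", "南投縣"],
    "行政区": ["烏來區", "安平區", "東河鄉", "貢寮區", "仁愛鄉"],
    "观测站": ["福山", "安平", "七塊厝", "福隆", "小奇萊"],
    "气温(度)": ["13.7℃", "23.5℃", "19.6℃", "14.2℃", "8.3℃"],
    "相对湿度(%)": [92, 70, 86, 96, 57],
    "累积雨量(mm)": [0.0, 0.0, 0.0, -99.0, 0.0],
})

# The temperature column is still text at this point because of the ℃ suffix.
print(sample_df.dtypes)
```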


## 1. Direct assignment
Example: clean the temperature column and convert it to a numeric type

In [4]:
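# Strip the ℃ suffix and convert to float.
# Assigning with df[...] replaces the whole column; on recent pandas versions,
# df.loc[:, col] = ... may write into the existing column and keep its original dtype.
df["气温(度)"] = df["气温(度)"].str.replace("℃", "", regex=False).astype("float")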
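
To see the clean-up in isolation, the sketch below applies it to a small hand-made Series (the name `raw` and its values are illustrative) and checks the resulting dtype. `pd.to_numeric` is shown as an alternative to `astype`, not as something the notebook itself uses.

```python
import pandas as pd

raw = pd.Series(["13.7℃", "23.5℃", "8.3℃"], name="气温(度)")

# Strip the literal suffix, then convert the strings to floats.
cleaned = raw.str.replace("℃", "", regex=False).astype("float")
print(cleaned.dtype)  # float64

# Alternative: errors="coerce" turns unparseable values into NaN instead of
# raising, which can help when the column contains stray text.
coerced = pd.to_numeric(raw.str.replace("℃", "", regex=False), errors="coerce")
```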

Example: compute a temperature difference

In [5]:
# Note: df["气温(度)"] is a Series, so the subtraction below also returns a Series
df.loc[:, "wencha"] = df["气温(度)"] - 2

In [6]:
df.head()

Unnamed: 0,日期,城市,行政区,观测站,气温(度),相对湿度(%),累积雨量(mm),wencha
0,2015-01-01,新北市,烏來區,福山,13.7,92,0.0,11.7
1,2015-01-02,臺南市,安平區,安平,23.5,70,0.0,21.5
2,2015-01-03,臺東縣,東河鄉,七塊厝,19.6,86,0.0,17.6
3,2015-01-04,新北市,貢寮區,福隆,14.2,96,-99.0,12.2
4,2015-01-05,南投縣,仁愛鄉,小奇萊,8.3,57,0.0,6.3


## 2. The df.apply method
Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
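
The difference between the two axes is easiest to see on a tiny example. A minimal sketch on a hypothetical two-column frame (`demo`, `a` and `b` are illustrative names):

```python
import pandas as pd

demo = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the function is called once per column;
# each Series is indexed by the DataFrame's row index.
col_sums = demo.apply(lambda s: s.sum(), axis=0)

# axis=1: the function is called once per row;
# each Series is indexed by the column labels, so s["a"] and s["b"] work.
row_sums = demo.apply(lambda s: s["a"] + s["b"], axis=1)

print(col_sums.tolist())  # [6, 60]
print(row_sums.tolist())  # [11, 22, 33]
```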

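Example: add a temperature-type column:
* If the temperature is above 33 degrees, it is 高温 (high)
* If it is below -10 degrees, it is 低温 (low)
* Otherwise it is 常温 (normal)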

In [7]:
def get_wendu_type(x):
    if x["气温(度)"] > 33:
        return '高温'
    if x["气温(度)"] < -10:
        return '低温'
    return '常温'

# Note: axis=1 is required; each row is then passed as a Series whose index is the column labels
df.loc[:,"wendu_type"] = df.apply(get_wendu_type,axis=1)

In [8]:
# Count the rows in each temperature type
df["wendu_type"].value_counts()

常温    473
低温     11
Name: wendu_type, dtype: int64
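
Row-wise `apply` calls a Python function once per row, which can be slow on large frames. As a sketch of one alternative (not the method used in this notebook), `numpy.select` builds the same labels from boolean masks. The small `temps` frame is made up for illustration; the thresholds match `get_wendu_type`.

```python
import numpy as np
import pandas as pd

temps = pd.DataFrame({"气温(度)": [35.0, 20.0, -12.0, 33.0]})

conditions = [temps["气温(度)"] > 33, temps["气温(度)"] < -10]
labels = ["高温", "低温"]

# The first matching condition wins; rows matching none get the default.
temps["wendu_type"] = np.select(conditions, labels, default="常温")
print(temps["wendu_type"].tolist())  # ['高温', '常温', '低温', '常温']
```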

## 3. The df.assign method
Assign new columns to a DataFrame.
Return a new object with all original columns in addition to new ones.

Example: convert the temperature from Celsius to Fahrenheit

In [9]:
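# Several new columns can be added in one call.
# assign returns a new DataFrame; df itself is not modified.
df.assign(
    # Celsius to Fahrenheit
    tem_huashi=lambda x: x["气温(度)"] * 9 / 5 + 32
)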

Unnamed: 0,日期,城市,行政区,观测站,气温(度),相对湿度(%),累积雨量(mm),wencha,wendu_type,tem_huashi
0,2015-01-01,新北市,烏來區,福山,13.7,92,0.0,11.7,常温,56.66
1,2015-01-02,臺南市,安平區,安平,23.5,70,0.0,21.5,常温,74.30
2,2015-01-03,臺東縣,東河鄉,七塊厝,19.6,86,0.0,17.6,常温,67.28
3,2015-01-04,新北市,貢寮區,福隆,14.2,96,-99.0,12.2,常温,57.56
4,2015-01-05,南投縣,仁愛鄉,小奇萊,8.3,57,0.0,6.3,常温,46.94
5,2015-01-06,嘉義縣,大林鎮,大林,23.2,63,0.0,21.2,常温,73.76
6,2015-01-07,花蓮縣,玉里鎮,玉里,18.5,85,0.0,16.5,常温,65.30
7,2015-01-08,嘉義市,東區,嘉義市東區,25.0,64,0.0,23.0,常温,77.00
8,2015-01-09,宜蘭縣,頭城鎮,龜山島,12.0,100,0.5,10.0,常温,53.60
9,2015-01-10,新北市,平溪區,火燒寮,12.3,100,2.5,10.3,常温,54.14
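
Two properties of `assign` are worth checking on a small example: it leaves the original DataFrame untouched, and later keyword arguments can refer to columns created earlier in the same call. The frame `demo` and the column `tem_kelvin` are illustrative names, not part of the notebook.

```python
import pandas as pd

demo = pd.DataFrame({"气温(度)": [0.0, 25.0, 100.0]})

converted = demo.assign(
    # Celsius to Fahrenheit
    tem_huashi=lambda x: x["气温(度)"] * 9 / 5 + 32,
    # Fahrenheit to Kelvin, using the column created just above
    tem_kelvin=lambda x: (x["tem_huashi"] - 32) * 5 / 9 + 273.15,
)

print(converted["tem_huashi"].tolist())  # [32.0, 77.0, 212.0]
print(list(demo.columns))                # ['气温(度)'] (original unchanged)
```

To keep the new columns, bind the result to a name, for example `df = df.assign(...)`.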


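## 4. Select groups by condition and assign to each separately
Select rows by a condition on the temperature, then assign the new column's value for those rows.<br>
Example: if the temperature minus 10 degrees is below 15 degrees, the temperature difference is considered large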

In [10]:
# Create an empty column first (this is the direct-assignment method from section 1)
df['wencha_type']=''
df.loc[df["气温(度)"]-10 < 15,"wencha_type"]="温差大"
df.loc[df["气温(度)"]-10 >= 15,"wencha_type"]="温差正常"

In [11]:
df["wencha_type"].value_counts()

温差大     410
温差正常     74
Name: wencha_type, dtype: int64
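
When there are exactly two outcomes, the two `loc` assignments can also be written as a single `numpy.where` call. This is an alternative sketch rather than the notebook's approach; `demo` is a made-up frame.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"气温(度)": [13.7, 25.0, 30.2]})

# Same rule as above: temperature minus 10 below 15 means a large difference.
is_large = demo["气温(度)"] - 10 < 15
demo["wencha_type"] = np.where(is_large, "温差大", "温差正常")
print(demo["wencha_type"].tolist())  # ['温差大', '温差正常', '温差正常']
```

One difference to keep in mind: a row with a missing temperature fails both comparisons, so the two-step `loc` version leaves it as the empty string, while this version gives it the second label.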