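### How to apply a function to each group in Pandas
Background:
+ pandas groupby follows the split, apply, combine pattern
**GroupBy.apply(function)**
+ The first argument of function is a DataFrame holding the rows of one group
+ function may return a DataFrame, a Series, or a single value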
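
To make the three return types concrete, here is a minimal sketch on a small made-up DataFrame (the `toy` data and the `key`/`val` column names are assumptions for illustration, not part of the datasets used below). It shows how the shape of the combined result depends on what the function returns.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the return types
toy = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 3, 10]})
grouped = toy.groupby("key")

# function returns a single value -> combined result is a Series indexed by group key
as_value = grouped.apply(lambda g: g["val"].max())

# function returns a Series -> combined result is a DataFrame with one row per group
as_series = grouped.apply(
    lambda g: pd.Series({"lo": g["val"].min(), "hi": g["val"].max()})
)

# function returns a DataFrame -> the per-group DataFrames are concatenated
as_frame = grouped.apply(lambda g: g[["val"]].head(1))

print(as_value)
print(as_series)
print(as_frame)
```

Recent pandas versions may emit a deprecation warning about apply operating on the grouping column; the functions here only read `val`, so the results are unaffected.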

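Goals for this section:
+ How to normalize a numeric column within each group
+ How to take the top N rows of each group

### Example 1: normalizing a numeric column within each group
Normalize each user's movie ratings.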

In [1]:
import pandas as pd
import numpy as np
ratings=pd.read_csv(
    "./datas/movielens-1m/ratings.dat",
    sep="::",
    engine="python",
    names="userId,movieId,rating,timestamp".split(",")
)
ratings.head()

,userId,movieId,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [2]:
# Look at each user's highest and lowest rating, the bounds used for normalization
df2=ratings.groupby("userId",as_index=True).agg({"rating":[np.max,np.min]})
df2.head()

,rating,rating
,amax,amin
userId,,
1,5,3
2,5,1
3,5,1
4,5,1
5,5,1


In [3]:
# Define the function that min-max normalizes the ratings of one group
def norm(df):
    min_value=df["rating"].min()
    max_value=df["rating"].max()
    df["norm_value"]=df["rating"].apply(
        lambda x:(x-min_value)/(max_value-min_value))  # run the computation on each value
    return df

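**Why use apply:**
+ Calling apply hands the contents of the Series or DataFrame to your function: Series.apply passes each value, DataFrame.apply passes each Series (one column at a time by default).
+ It therefore does two jobs in one call: **fetching the values, then computing and assigning the results**
**groupby + apply involves three steps:**
+ groupby performs the split, producing one sub-DataFrame per group
+ apply calls the function on each sub-DataFrame
+ the per-group results are combined into a single result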
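
The points above can be sketched on a small made-up DataFrame (the `toy` data is an assumption for illustration). The first part shows what Series.apply and DataFrame.apply pass to the function; the second part performs split, apply, combine by hand and compares the result with GroupBy.apply.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 3, 10]})

# Series.apply: the function receives one value at a time
doubled = toy["val"].apply(lambda v: v * 2)

# DataFrame.apply: the function receives one column (a Series) at a time
col_max = toy[["val"]].apply(lambda col: col.max())

# split: iterating over a GroupBy yields (key, sub-DataFrame) pairs
# apply: compute something for each sub-DataFrame
pieces = {key: group["val"].sum() for key, group in toy.groupby("key")}
# combine: assemble the per-group results into one Series
manual = pd.Series(pieces)

# GroupBy.apply performs the same three steps in a single call
auto = toy.groupby("key").apply(lambda g: g["val"].sum())

print(manual)
print(auto)
```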

In [4]:
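# group_keys=False keeps the original row index; in pandas 2.0 and later the default would add userId as an extra index level
ratings=ratings.groupby("userId", group_keys=False).apply(norm)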
ratings.head()

,userId,movieId,rating,timestamp,norm_value
0,1,1193,5,978300760,1.0
1,1,661,3,978302109,0.0
2,1,914,3,978301968,0.0
3,1,3408,4,978300275,0.5
4,1,2355,5,978824291,1.0
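
As an alternative to a custom function, the same per-group min-max normalization can be written with groupby and transform. This is a sketch on made-up ratings (the `toy` data is an assumption, not the MovieLens file), and it is a swapped-in technique rather than the approach used above.

```python
import pandas as pd

# Hypothetical ratings for two users, used only for illustration
toy = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "rating": [5, 3, 4, 1, 5],
})

by_user = toy.groupby("userId")["rating"]
# transform returns a result aligned with the original rows,
# so every row gets its own group's min and max
lo = by_user.transform("min")
hi = by_user.transform("max")

toy["norm_value"] = (toy["rating"] - lo) / (hi - lo)
print(toy)
```

For a user whose ratings are all identical, max equals min and the division is 0/0, which pandas reports as NaN; the apply-based version has the same edge case.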


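### Example 2: taking the top N rows of each group
+ Get the N days with the highest daytime temperature in each month (the last cell below uses N = 3)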

In [5]:
fpath = "./datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)
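# Strip the ℃ suffix from the temperature columns and convert them to integers
df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃", "").astype('int32')
df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃", "").astype('int32')
# Add a month column (the first 7 characters of the date, e.g. 2018-01)
df['yuefen'] = df['ymd'].str[:7]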
df.head()

,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel,yuefen
0,2018-01-01,3,-6,晴~多云,东北风,1-2级,59,良,2,2018-01
1,2018-01-02,2,-5,阴~多云,东北风,1-2级,49,优,1,2018-01
2,2018-01-03,2,-5,多云,北风,1-2级,28,优,1,2018-01
3,2018-01-04,0,-8,阴,东北风,1-2级,28,优,1,2018-01
4,2018-01-05,3,-6,多云~晴,西北风,1-2级,50,优,1,2018-01


In [6]:
df2=df.groupby("yuefen").apply(lambda x : x["bWendu"].max())
df2.head()

yuefen
2018-01     7
2018-02    12
2018-03    27
2018-04    30
2018-05    35
dtype: int64
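
When the function only computes a built-in aggregate, as in the cell above, selecting the column and calling the aggregate directly gives the same numbers. A small sketch on made-up temperatures (the `toy` values are assumptions, not the Beijing weather data):

```python
import pandas as pd

# Hypothetical monthly temperatures, used only for illustration
toy = pd.DataFrame({
    "yuefen": ["2018-01", "2018-01", "2018-02"],
    "bWendu": [3, 7, 12],
})

# apply with a function that returns one value per group
via_apply = toy.groupby("yuefen").apply(lambda x: x["bWendu"].max())

# the built-in aggregate on the selected column
via_max = toy.groupby("yuefen")["bWendu"].max()

print(via_apply)
print(via_max)
```

apply is most useful when the per-group logic cannot be expressed as a single built-in aggregate, such as the top N selection in this example.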

In [10]:
def get_topn(df,topn):
    return df.sort_values(by="bWendu")[["ymd","bWendu"]][-topn:]
df.groupby("yuefen").apply(get_topn,topn=3).head()

,,ymd,bWendu
yuefen,,,
2018-01,16,2018-01-17,6
2018-01,13,2018-01-14,6
2018-01,18,2018-01-19,7
2018-02,58,2018-02-28,9
2018-02,53,2018-02-23,10


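As the output shows, the DataFrame returned by the function passed to groupby's apply does not have to look like the original DataFrame: here it keeps only two columns and at most topn rows per group.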
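
If the original columns and a flat index are wanted instead, one possible alternative is to sort first and then take the head of each group. This is a sketch on made-up data (the `toy` values are assumptions, not the Beijing weather data), not the method used above.

```python
import pandas as pd

# Hypothetical daily temperatures, used only for illustration
toy = pd.DataFrame({
    "ymd": ["2018-01-01", "2018-01-02", "2018-01-03", "2018-02-01", "2018-02-02"],
    "yuefen": ["2018-01"] * 3 + ["2018-02"] * 2,
    "bWendu": [3, 7, 5, 12, 9],
})

top2 = (
    toy.sort_values("bWendu", ascending=False)  # hottest days first
       .groupby("yuefen")
       .head(2)                                 # first 2 rows of each month
       .sort_values(["yuefen", "bWendu"], ascending=[True, False])
)
print(top2)
```

Unlike the apply-based version, the result keeps every original column and the original row labels, with no extra index level for the month.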