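I never really understood the `transform` function until I came across the article "Understanding the Transform Function in Pandas". Suppose we have the sales data below, with three different **order ids** (10001, 10005, and 10006), each of which contains several products.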

In [15]:
import pandas as pd

df = pd.read_excel("sales_transactions.xlsx")
df

|  | account | name | order | sku | quantity | unit price | ext price |
|---|---|---|---|---|---|---|---|
| 0 | 383080 | Will LLC | 10001 | B1-20000 | 7 | 33.69 | 235.83 |
| 1 | 383080 | Will LLC | 10001 | S1-27722 | 11 | 21.12 | 232.32 |
| 2 | 383080 | Will LLC | 10001 | B1-86481 | 3 | 35.99 | 107.97 |
| 3 | 412290 | Jerde-Hilpert | 10005 | S1-06532 | 48 | 55.82 | 2679.36 |
| 4 | 412290 | Jerde-Hilpert | 10005 | S1-82801 | 21 | 13.62 | 286.02 |
| 5 | 412290 | Jerde-Hilpert | 10005 | S1-06532 | 9 | 92.55 | 832.95 |
| 6 | 412290 | Jerde-Hilpert | 10005 | S1-47412 | 44 | 78.91 | 3472.04 |
| 7 | 412290 | Jerde-Hilpert | 10005 | S1-27722 | 36 | 25.42 | 915.12 |
| 8 | 218895 | Kulas Inc | 10006 | S1-27722 | 32 | 95.66 | 3061.12 |
| 9 | 218895 | Kulas Inc | 10006 | B1-33087 | 23 | 22.55 | 518.65 |


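### The problem

**How is the cost of each order distributed across its items, and what share of the order does each item account for?**

For example, order 10001 totals $576.12, which breaks down as follows: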

```
B1-20000 = $235.83 or 40.9%
S1-27722 = $232.32 or 40.3%
B1-86481 = $107.97 or 18.7%
```
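
The arithmetic behind that breakdown is small enough to check by hand. Here is a minimal sketch in plain Python, with the three line totals for order 10001 typed in from the table above (the dictionary name `ext_prices` is just for illustration):

```python
# Line totals for order 10001, copied from the sales table above.
ext_prices = {"B1-20000": 235.83, "S1-27722": 232.32, "B1-86481": 107.97}

# The order total is simply the sum of its line totals.
order_total = sum(ext_prices.values())

# Share of the order contributed by each SKU, expressed as a percentage.
shares = {sku: price / order_total * 100 for sku, price in ext_prices.items()}

for sku, pct in shares.items():
    print(f"{sku} = ${ext_prices[sku]:.2f} or {pct:.1f}%")
```

Each share is just the line total divided by the order total. The rest of the post is about doing the same division for every row of the dataframe at once.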

If that still isn't clear, go straight to the code. Reading code will teach you more than listening to 大邓 ramble on.

**Talk is cheap, show me the code**

### Method 1 - merge
If you are familiar with pandas, your first instinct is probably to run a groupby on the dataframe:

In [4]:
df.groupby('order')['ext price'].sum()

order
10001     576.12
10005    8185.49
10006    3724.49
Name: ext price, dtype: float64

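The hard part now is combining these totals with the original data, because the two have different lengths: the aggregated result has one row per order, while the original dataframe has one row per line item.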
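
To make the mismatch concrete, here is a small sketch. Since the spreadsheet itself is not available here, it assumes a stand-in dataframe `df_demo` built from the rows of orders 10001 and 10005 shown in the table above (the name `df_demo` is mine, not part of the original notebook):

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: rows of orders 10001 and 10005
# copied from the table above.
df_demo = pd.DataFrame({
    "order": [10001, 10001, 10001, 10005, 10005, 10005, 10005, 10005],
    "ext price": [235.83, 232.32, 107.97, 2679.36, 286.02, 832.95, 3472.04, 915.12],
})

# Aggregating collapses each group to a single value, indexed by order id.
totals = df_demo.groupby("order")["ext price"].sum()

print(len(df_demo))  # 8, one row per line item
print(len(totals))   # 2, one row per order

# The two objects do not even share an index, so they cannot simply be
# placed side by side as columns.
print(totals.index.equals(df_demo.index))  # False
```

Dividing `df_demo["ext price"]` by `totals` directly would align on these mismatched indexes, which is why the totals first have to be brought back to one value per line item.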

The most direct approach is to build a new dataframe holding the totals and merge it back in:

In [12]:
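# Total per order, as a two-column dataframe: order, Order_Total
order_total = df.groupby('order')['ext price'].sum().rename('Order_Total').reset_index()
# Join the totals back onto every line item via the shared 'order' column
df_1 = df.merge(order_total, on='order')
df_1['Percent_of_Order'] = df_1['ext price'] / df_1['Order_Total']
df_1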

|  | account | name | order | sku | quantity | unit price | ext price | Order_Total | Percent_of_Order |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 383080 | Will LLC | 10001 | B1-20000 | 7 | 33.69 | 235.83 | 576.12 | 0.409342 |
| 1 | 383080 | Will LLC | 10001 | S1-27722 | 11 | 21.12 | 232.32 | 576.12 | 0.403249 |
| 2 | 383080 | Will LLC | 10001 | B1-86481 | 3 | 35.99 | 107.97 | 576.12 | 0.187409 |
| 3 | 412290 | Jerde-Hilpert | 10005 | S1-06532 | 48 | 55.82 | 2679.36 | 8185.49 | 0.32733 |
| 4 | 412290 | Jerde-Hilpert | 10005 | S1-82801 | 21 | 13.62 | 286.02 | 8185.49 | 0.034942 |
| 5 | 412290 | Jerde-Hilpert | 10005 | S1-06532 | 9 | 92.55 | 832.95 | 8185.49 | 0.101759 |
| 6 | 412290 | Jerde-Hilpert | 10005 | S1-47412 | 44 | 78.91 | 3472.04 | 8185.49 | 0.42417 |
| 7 | 412290 | Jerde-Hilpert | 10005 | S1-27722 | 36 | 25.42 | 915.12 | 8185.49 | 0.111798 |
| 8 | 218895 | Kulas Inc | 10006 | S1-27722 | 32 | 95.66 | 3061.12 | 3724.49 | 0.82189 |
| 9 | 218895 | Kulas Inc | 10006 | B1-33087 | 23 | 22.55 | 518.65 | 3724.49 | 0.139254 |
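
If you go the merge route, it can help to spell out what the join is supposed to do. The sketch below is one possible variant, not the original notebook code. It again assumes a stand-in dataframe `df_demo` built from the rows of orders 10001 and 10005 shown above, and uses the `on`, `how` and `validate` parameters of `DataFrame.merge`:

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: rows of orders 10001 and 10005.
df_demo = pd.DataFrame({
    "order": [10001, 10001, 10001, 10005, 10005, 10005, 10005, 10005],
    "sku": ["B1-20000", "S1-27722", "B1-86481", "S1-06532",
            "S1-82801", "S1-06532", "S1-47412", "S1-27722"],
    "ext price": [235.83, 232.32, 107.97, 2679.36, 286.02, 832.95, 3472.04, 915.12],
})

order_total = (
    df_demo.groupby("order")["ext price"]
    .sum()
    .rename("Order_Total")
    .reset_index()
)

# on="order" names the join key instead of relying on shared column names;
# how="left" keeps every line item in its original order;
# validate="many_to_one" raises if order_total ever has duplicate order ids.
merged = df_demo.merge(order_total, on="order", how="left", validate="many_to_one")
merged["Percent_of_Order"] = merged["ext price"] / merged["Order_Total"]
print(merged)
```

The `validate` check is cheap insurance: if the totals table accidentally contained the same order twice, a plain merge would silently duplicate line items, whereas this version fails loudly.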


### Method 2 - transform
Working with the original dataframe, let's see what we get when we combine groupby with transform:

In [13]:
df.groupby('order')['ext price'].transform('sum')

0      576.12
1      576.12
2      576.12
3     8185.49
4     8185.49
5     8185.49
6     8185.49
7     8185.49
8     3724.49
9     3724.49
10    3724.49
11    3724.49
Name: ext price, dtype: float64

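Notice that the result does not have the length of the groupby aggregate (which is 3, one value per order). It has the same length as the original df, because each row receives the total of the order it belongs to.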
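
One way to see what `transform` is doing is to compare it with broadcasting the aggregated totals by hand. This is only an illustrative sketch, assuming a stand-in dataframe `df_demo` built from the rows of orders 10001 and 10005 shown earlier, not a description of how pandas implements it internally:

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: rows of orders 10001 and 10005.
df_demo = pd.DataFrame({
    "order": [10001, 10001, 10001, 10005, 10005, 10005, 10005, 10005],
    "ext price": [235.83, 232.32, 107.97, 2679.36, 286.02, 832.95, 3472.04, 915.12],
})

grouped = df_demo.groupby("order")["ext price"]

# One value per order (indexed by order id).
totals = grouped.sum()

# One value per original row (indexed like df_demo).
broadcast = grouped.transform("sum")
print(broadcast.index.equals(df_demo.index))  # True

# Looking up each row's order id in the totals gives the same numbers.
via_map = df_demo["order"].map(totals)
print((broadcast - via_map).abs().max() < 1e-9)  # True
```

So `transform('sum')` is effectively the groupby plus the "put it back on every row" step rolled into one call, which is what makes the merge unnecessary.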

So we can solve the problem posed in this article with a single line of code:

In [16]:
# Divide each line total by the total of the order that line belongs to
df["Percent_of_Order"] = df["ext price"] / df.groupby('order')["ext price"].transform('sum')
df

|  | account | name | order | sku | quantity | unit price | ext price | Percent_of_Order |
|---|---|---|---|---|---|---|---|---|
| 0 | 383080 | Will LLC | 10001 | B1-20000 | 7 | 33.69 | 235.83 | 0.409342 |
| 1 | 383080 | Will LLC | 10001 | S1-27722 | 11 | 21.12 | 232.32 | 0.403249 |
| 2 | 383080 | Will LLC | 10001 | B1-86481 | 3 | 35.99 | 107.97 | 0.187409 |
| 3 | 412290 | Jerde-Hilpert | 10005 | S1-06532 | 48 | 55.82 | 2679.36 | 0.32733 |
| 4 | 412290 | Jerde-Hilpert | 10005 | S1-82801 | 21 | 13.62 | 286.02 | 0.034942 |
| 5 | 412290 | Jerde-Hilpert | 10005 | S1-06532 | 9 | 92.55 | 832.95 | 0.101759 |
| 6 | 412290 | Jerde-Hilpert | 10005 | S1-47412 | 44 | 78.91 | 3472.04 | 0.42417 |
| 7 | 412290 | Jerde-Hilpert | 10005 | S1-27722 | 36 | 25.42 | 915.12 | 0.111798 |
| 8 | 218895 | Kulas Inc | 10006 | S1-27722 | 32 | 95.66 | 3061.12 | 0.82189 |
| 9 | 218895 | Kulas Inc | 10006 | B1-33087 | 23 | 22.55 | 518.65 | 0.139254 |
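
A quick sanity check is worth adding after a calculation like this: within each order, the shares should add up to 1. The sketch below assumes a stand-in dataframe `df_demo` built from the rows of orders 10001 and 10005 shown earlier. The `Percent_Label` column is a hypothetical extra for display purposes, not something from the original article:

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: rows of orders 10001 and 10005.
df_demo = pd.DataFrame({
    "order": [10001, 10001, 10001, 10005, 10005, 10005, 10005, 10005],
    "sku": ["B1-20000", "S1-27722", "B1-86481", "S1-06532",
            "S1-82801", "S1-06532", "S1-47412", "S1-27722"],
    "ext price": [235.83, 232.32, 107.97, 2679.36, 286.02, 832.95, 3472.04, 915.12],
})

# Same one-liner as above, applied to the stand-in data.
df_demo["Percent_of_Order"] = (
    df_demo["ext price"] / df_demo.groupby("order")["ext price"].transform("sum")
)

# Sanity check: the shares within each order should sum to 1
# (up to floating-point error).
share_sums = df_demo.groupby("order")["Percent_of_Order"].sum()
print(((share_sums - 1.0).abs() < 1e-9).all())  # True

# Optional: a human-readable label such as "40.9%".
df_demo["Percent_Label"] = df_demo["Percent_of_Order"].map("{:.1%}".format)
print(df_demo[["order", "sku", "ext price", "Percent_Label"]])
```

For order 10001 the labels come out as 40.9%, 40.3% and 18.7%, matching the breakdown given at the start of the post.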
