<center><h1>Chapter 4 Grouping</h1></center>

In [1]:
import numpy as np
import pandas as pd

## 1. Grouping patterns and the groupby object
### 1. The general grouping pattern
Grouping operations are widely used in daily life, for example:

* Group by $\color{#FF0000}{sex}$ and compute the $\color{#0000FF}{average}$ $\color{#00FF00}{life expectancy}$ of the national population
* Group by $\color{#FF0000}{season}$ and apply $\color{#0000FF}{in-group standardization}$ to the $\color{#00FF00}{temperature}$ of each season
* Group by $\color{#FF0000}{class}$ and $\color{#0000FF}{keep only the classes whose average exceeds 80 points}$ for the $\color{#00FF00}{math score}$

These examples show that a grouping operation requires three elements: $\color{#FF0000}{grouping basis}$, $\color{#00FF00}{data source}$, and $\color{#0000FF}{operation and its return result}$. Conversely, once these three elements are specified, the grouping operation is fully determined, so the general pattern of grouping code is:
```
df.groupby(grouping_basis)[data_source].operation()
```
For example, the code in the first example should be as follows:
```
df.groupby('Gender')['Longevity'].mean()
```
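
As a minimal sketch of this pattern, the snippet below builds a tiny table and labels each of the three elements. The table `toy` and its values are invented for illustration and are not part of the chapter's datasets.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Female', 'Male'],
    'Longevity': [80.0, 75.0, 84.0, 77.0],
})

# grouping basis: 'Gender'; data source: 'Longevity'; operation: mean
result = toy.groupby('Gender')['Longevity'].mean()
print(result)
```

The result is a `Series` indexed by the distinct values of the grouping basis.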

Now let’s go back to the student physical test data set. If we want to calculate the median height by gender, we can write it as follows:

In [2]:
df = pd.read_csv('../data/learn_pandas.csv')
df.groupby('Gender')['Height'].median()

Gender
Female    159.6
Male      173.4
Name: Height, dtype: float64

### 2. The essence of grouping
The examples above all group by a single dimension, such as gender. What if we need to group by multiple dimensions? We only need to pass a list of the corresponding column names to `groupby`. For example, to group by school and gender and calculate the mean height, we can write:

In [3]:
df.groupby(['School', 'Gender'])['Height'].mean()

School                         Gender
Fudan University               Female    158.776923
                               Male      174.212500
Peking University              Female    158.666667
                               Male      172.030000
Shanghai Jiao Tong University  Female    159.122500
                               Male      176.760000
Tsinghua University            Female    159.753333
                               Male      171.638889
Name: Height, dtype: float64

So far, the grouping basis of `groupby` has been taken directly from columns by name. Grouping by more complex logic is also possible, for example, grouping by whether a student's weight exceeds the overall mean and then calculating the mean height.

First, you should write the grouping conditions:

In [4]:
condition = df.Weight > df.Weight.mean()

Then pass it into `groupby`:

In [5]:
df.groupby(condition)['Height'].mean()

Weight
False    159.034646
True     172.705357
Name: Height, dtype: float64

#### 【Practice】
Please divide the weight into three groups: high, normal, and low according to the upper and lower quartiles, and calculate the mean of the height.
#### 【END】
As the index shows, the result is actually grouped according to the values of the elements in the condition sequence (here `True` and `False`). The following uses a random letter sequence to verify this idea:

In [6]:
item = np.random.choice(list('abc'), df.shape[0])
df.groupby(item)['Height'].mean()

a    163.094828
b    163.874603
c    162.666129
Name: Height, dtype: float64

The index here consists of the elements of the original `item`. If multiple sequences are passed into `groupby`, the final grouping is based on the unique combinations of their corresponding rows:

In [7]:
df.groupby([condition, item])['Height'].mean()

Weight   
False   a    159.334146
        b    159.257143
        c    158.543182
True    a    172.164706
        b    173.109524
        c    172.744444
Name: Height, dtype: float64

From this we can see that passing column names, as before, is just shorthand: it is equivalent to passing the corresponding column or columns themselves, and the final grouping is based on the unique combinations of values in the grouping data. The specific group categories can be found with `drop_duplicates`:

In [8]:
df[['School', 'Gender']].drop_duplicates()

|  | School | Gender |
|---|---|---|
| 0 | Shanghai Jiao Tong University | Female |
| 1 | Peking University | Male |
| 2 | Shanghai Jiao Tong University | Male |
| 3 | Fudan University | Female |
| 4 | Fudan University | Male |
| 5 | Tsinghua University | Female |
| 9 | Peking University | Female |
| 16 | Tsinghua University | Male |


In [9]:
df.groupby([df['School'], df['Gender']])['Height'].mean()

School                         Gender
Fudan University               Female    158.776923
                               Male      174.212500
Peking University              Female    158.666667
                               Male      172.030000
Shanghai Jiao Tong University  Female    159.122500
                               Male      176.760000
Tsinghua University            Female    159.753333
                               Male      171.638889
Name: Height, dtype: float64
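
The equivalence described in this section can be checked on a small table. The sketch below uses invented data (`toy` and `labels` are hypothetical names) and compares grouping by column names, by the columns themselves, and by an external array of the same length.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B', 'B'],
    'Gender': ['F', 'M', 'F', 'F', 'M'],
    'Height': [160.0, 175.0, 158.0, 162.0, 180.0],
})

# Passing names is shorthand for passing the columns themselves
by_name = toy.groupby(['School', 'Gender'])['Height'].mean()
by_series = toy.groupby([toy['School'], toy['Gender']])['Height'].mean()
same = by_name.equals(by_series)

# Any sequence with one label per row can serve as a grouping basis
labels = np.array(['x', 'y', 'x', 'y', 'x'])
by_array = toy.groupby(labels)['Height'].mean()
print(same)
print(by_array)
```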

### 3. Groupby object
Notice that the methods called in grouping operations all come from the `groupby` object in `pandas`, which defines many methods and also has some convenient attributes.

In [10]:
gb = df.groupby(['School', 'Grade'])
gb

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001C1E7AB1408>

Through the `ngroups` attribute, you can get the number of groups:

In [11]:
gb.ngroups

16

Through the `groups` attribute, you can return a dictionary mapping from $\color{#FF0000}{group name}$ to $\color{#FF0000}{group index list}$:

In [12]:
res = gb.groups
res.keys() # the dictionary values are index lists with too many elements, so only the keys are shown here

dict_keys([('Fudan University', 'Freshman'), ('Fudan University', 'Junior'), ('Fudan University', 'Senior'), ('Fudan University', 'Sophomore'), ('Peking University', 'Freshman'), ('Peking University', 'Junior'), ('Peking University', 'Senior'), ('Peking University', 'Sophomore'), ('Shanghai Jiao Tong University', 'Freshman'), ('Shanghai Jiao Tong University', 'Junior'), ('Shanghai Jiao Tong University', 'Senior'), ('Shanghai Jiao Tong University', 'Sophomore'), ('Tsinghua University', 'Freshman'), ('Tsinghua University', 'Junior'), ('Tsinghua University', 'Senior'), ('Tsinghua University', 'Sophomore')])

#### 【Practice】
The previous section introduced how to get specific group categories through `drop_duplicates`. Now please use the `groups` attribute to complete a similar function.
#### 【END】
When `size` is used as a `DataFrame` attribute, the returned value is the table length multiplied by the table width, but on the `groupby` object, it means counting the number of elements in each group:

In [13]:
gb.size()

School                         Grade    
Fudan University               Freshman      9
                               Junior       12
                               Senior       11
                               Sophomore     8
Peking University              Freshman     13
                               Junior        8
                               Senior        8
                               Sophomore     5
Shanghai Jiao Tong University  Freshman     13
                               Junior       17
                               Senior       22
                               Sophomore     5
Tsinghua University            Freshman     17
                               Junior       22
                               Senior       14
                               Sophomore    16
dtype: int64

The `get_group` method directly returns the rows belonging to a group. The exact name of the group must be known:

In [14]:
gb.get_group(('Fudan University', 'Freshman'))

|  | School | Grade | Name | Gender | Height | Weight | Transfer | Test_Number | Test_Date | Time_Record |
|---|---|---|---|---|---|---|---|---|---|---|
| 15 | Fudan University | Freshman | Changqiang Yang | Female | 156.0 | 49.0 | N | 3 | 2020/1/1 | 0:05:25 |
| 28 | Fudan University | Freshman | Gaoqiang Qin | Female | 170.2 | 63.0 | N | 2 | 2020/1/7 | 0:05:24 |
| 63 | Fudan University | Freshman | Gaofeng Zhao | Female | 152.2 | 43.0 | N | 2 | 2019/10/31 | 0:04:00 |
| 70 | Fudan University | Freshman | Yanquan Wang | Female | 163.5 | 55.0 | N | 1 | 2019/11/19 | 0:04:07 |
| 73 | Fudan University | Freshman | Feng Wang | Male | 176.3 | 74.0 | N | 1 | 2019/9/26 | 0:03:31 |
| 105 | Fudan University | Freshman | Qiang Shi | Female | 164.5 | 52.0 | N | 1 | 2019/12/11 | 0:04:23 |
| 108 | Fudan University | Freshman | Yanqiang Xu | Female | 152.4 | 38.0 | N | 1 | 2019/12/8 | 0:05:03 |
| 157 | Fudan University | Freshman | Xiaoli Lv | Female | 152.5 | 45.0 | N | 2 | 2019/9/11 | 0:04:17 |
| 186 | Fudan University | Freshman | Yanjuan Zhao | Female | NaN | 53.0 | N | 2 | 2019/10/9 | 0:04:21 |
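
The two attributes and two methods above can be tried together on a small table. This is a sketch on invented data (`toy` and `gb_toy` are hypothetical names), small enough that every group can be checked by eye.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B', 'B'],
    'Grade': ['Freshman', 'Senior', 'Freshman', 'Freshman', 'Senior'],
    'Height': [160.0, 175.0, 158.0, 162.0, 180.0],
})
gb_toy = toy.groupby(['School', 'Grade'])

n_groups = gb_toy.ngroups                   # number of distinct (School, Grade) pairs
group_names = sorted(gb_toy.groups.keys())  # group names as tuples
sizes = gb_toy.size()                       # number of rows in each group
b_fresh = gb_toy.get_group(('B', 'Freshman'))  # rows of one named group

print(n_groups)
print(group_names)
print(sizes)
print(b_fresh)
```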


So far we have seen 2 attributes and 2 methods, and the `mean` and `median` used earlier are also methods of the `groupby` object. These and many other functions work in highly similar ways and will be introduced in the following sections.
### 4. Three major operations of grouping
With this basic knowledge of grouping in hand, return to the three examples at the beginning. You may notice that the three grouping operations return different types of data:

* In the first example, each group returns a scalar value, which can be the average, median, group capacity `size`, etc.

* In the second example, the original sequence is standardized, that is, each group returns a `Series` type

* In the third example, the result is neither a scalar nor a sequence: the rows of the entire group are returned, that is, a `DataFrame` is returned

From this, the three major operations of grouping are derived: aggregation, transformation, and filtering, which correspond to the three examples respectively. The following sections introduce the corresponding `agg`, `transform`, and `filter` methods.
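
The three return types can be seen side by side in a short sketch. The table `toy` below is invented for illustration; the point is only the shape of each result.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'Class': ['c1', 'c1', 'c2', 'c2'],
    'Math': [90.0, 80.0, 70.0, 60.0],
})
gb_toy = toy.groupby('Class')['Math']

# Aggregation: one scalar per group
agg_res = gb_toy.agg('mean')

# Transformation: one value per original row, aligned with the input index
trans_res = gb_toy.transform(lambda s: s - s.mean())

# Filtering: all rows of every group that passes the test
filt_res = toy.groupby('Class').filter(lambda g: g['Math'].mean() > 80)

print(agg_res)
print(trans_res)
print(filt_res)
```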
## 2. Aggregate functions
### 1. Built-in aggregate functions
Before introducing `agg`, we should first get to know the aggregate functions defined directly on the `groupby` object. Their implementations are optimized internally, so they should be preferred whenever they fit the task. Each returns a scalar per group, and they include: `max/min/mean/median/count/all/any/idxmax/idxmin/mad/nunique/skew/quantile/sum/std/var/sem/size/prod`. Note that `mad` was deprecated in pandas 1.5 and removed in pandas 2.0, so it is only available in older versions.

In [15]:
gb = df.groupby('Gender')['Height']
gb.idxmin()

Gender
Female    143
Male      199
Name: Height, dtype: int64

In [16]:
gb.quantile(0.95)

Gender
Female    166.8
Male      185.9
Name: Height, dtype: float64

#### 【Practice】
Please refer to the document to clarify the meaning of `all/any/mad/skew/sem/prod` functions.
#### 【END】
When the data source contains multiple columns, these aggregate functions are computed column by column:

In [17]:
gb = df.groupby('Gender')[['Height', 'Weight']]
gb.max()

| Gender | Height | Weight |
|---|---|---|
| Female | 170.2 | 63.0 |
| Male | 193.9 | 89.0 |


### 2. agg method
Although many convenient functions are defined on the `groupby` object, there are still the following inconveniences:

* Unable to use multiple functions at the same time
* Unable to use a specific aggregate function for a specific column
* Unable to use a custom aggregate function
* Unable to directly customize the result column name before aggregation

Here is how to solve these four types of problems with the `agg` function:

[a] Use multiple functions

When using multiple aggregate functions, you need to pass in the strings corresponding to the built-in aggregate functions in the form of a list. All the strings mentioned previously are legal.

In [18]:
gb.agg(['sum', 'idxmax', 'skew'])

| Gender | (Height, sum) | (Height, idxmax) | (Height, skew) | (Weight, sum) | (Weight, idxmax) | (Weight, skew) |
|---|---|---|---|---|---|---|
| Female | 21014.0 | 28 | -0.219253 | 6469.0 | 28 | -0.268482 |
| Male | 8854.9 | 193 | 0.437535 | 3929.0 | 2 | -0.332393 |


As the result shows, the column index is a multi-level index: the first level is the data source and the second level is the aggregation method. Each of the 3 functions is applied to each of the 2 columns, so the result has 6 columns.
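
To see how the two-level column index is addressed, here is a small sketch on invented data (`toy` is a hypothetical table). Each column of the result is identified by a `(data source, function)` tuple.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'Gender': ['F', 'F', 'M', 'M'],
    'Height': [160.0, 164.0, 175.0, 181.0],
    'Weight': [50.0, 54.0, 70.0, 78.0],
})
res = toy.groupby('Gender')[['Height', 'Weight']].agg(['sum', 'max'])

# 2 source columns x 2 functions -> 4 result columns
print(res.columns.tolist())

# A single cell is addressed by row label and (column, function) tuple
f_height_sum = res.loc['F', ('Height', 'sum')]
print(f_height_sum)
```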

[b] Use specific aggregation functions for specific columns

For the special correspondence between methods and columns, it can be achieved by constructing a dictionary and passing it into `agg`, where the dictionary uses the column name as the key and the aggregate string or string list as the value.

In [19]:
gb.agg({'Height':['mean','max'], 'Weight':'count'})

| Gender | (Height, mean) | (Height, max) | (Weight, count) |
|---|---|---|---|
| Female | 159.19697 | 170.2 | 135 |
| Male | 173.62549 | 193.9 | 54 |


#### 【Practice】
Please use the dictionary input method in [b] to complete the equivalent aggregation task in [a].
#### 【END】
[c] Use custom functions

Custom functions can be used in `agg`. $\color{#FF0000}{Note that the argument of the function is a column of the data source, and the calculation is performed column by column}$. The following computes, for each group, the difference between the mean and the minimum of height and weight:

In [20]:
gb.agg(lambda x: x.mean()-x.min())

| Gender | Height | Weight |
|---|---|---|
| Female | 13.79697 | 13.918519 |
| Male | 17.92549 | 21.759259 |


#### 【Practice】
In the `groupby` object, you can use the `describe` method to summarize statistical information. Please use multiple aggregation functions at the same time to complete the same function as this method.
#### 【END】
Because a sequence is passed in, the methods and attributes of a sequence can be used inside the function, as long as the return value is a scalar. In the following example, `'High'` is returned if the group mean of a column exceeds its overall mean, and `'Low'` otherwise.

In [21]:
def my_func(s):
    res = 'High'
    if s.mean() <= df[s.name].mean():
        res = 'Low'
    return res
gb.agg(my_func)

| Gender | Height | Weight |
|---|---|---|
| Female | Low | Low |
| Male | High | High |


[d] Rename the aggregation result

To rename the columns of the aggregation result, replace the function in the positions above with a tuple whose first element is the new name and whose second element is the original function, which may be either an aggregation string or a custom function. Here are some examples:

In [22]:
gb.agg([('range', lambda x: x.max()-x.min()), ('my_sum', 'sum')])

| Gender | (Height, range) | (Height, my_sum) | (Weight, range) | (Weight, my_sum) |
|---|---|---|---|---|
| Female | 24.8 | 21014.0 | 29.0 | 6469.0 |
| Male | 38.2 | 8854.9 | 38.0 | 3929.0 |


In [23]:
gb.agg({'Height': [('my_func', my_func), 'sum'], 'Weight': lambda x:x.max()})

| Gender | (Height, my_func) | (Height, sum) | (Weight, &lt;lambda&gt;) |
|---|---|---|---|
| Female | Low | 21014.0 | 63.0 |
| Male | High | 8854.9 | 89.0 |


Also note that when a single aggregation is applied to one or more columns, the renaming tuple must still be wrapped in square brackets. Otherwise it is impossible to tell whether the first string is a new name or a mistyped built-in function string:

In [24]:
gb.agg([('my_sum', 'sum')])

| Gender | (Height, my_sum) | (Weight, my_sum) |
|---|---|---|
| Female | 21014.0 | 6469.0 |
| Male | 8854.9 | 3929.0 |


In [25]:
gb.agg({'Height': [('my_func', my_func), 'sum'], 'Weight': [('range', lambda x:x.max())]})

| Gender | (Height, my_func) | (Height, sum) | (Weight, range) |
|---|---|---|---|
| Female | Low | 21014.0 | 63.0 |
| Male | High | 8854.9 | 89.0 |
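
The renaming rules can be verified on data small enough to compute by hand. The sketch below uses an invented table (`toy`, `gb_toy`, `renamed`, and `per_column` are hypothetical names) and applies `(new_name, function)` tuples both in a list and inside a dictionary.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'Gender': ['F', 'F', 'M', 'M'],
    'Height': [160.0, 164.0, 175.0, 181.0],
    'Weight': [50.0, 54.0, 70.0, 78.0],
})
gb_toy = toy.groupby('Gender')[['Height', 'Weight']]

# List of (new_name, function) tuples: applied to every column
renamed = gb_toy.agg([('range', lambda x: x.max() - x.min()), ('my_sum', 'sum')])

# Dictionary form: the single renamed aggregation is wrapped in square brackets
per_column = gb_toy.agg({
    'Height': [('range', lambda x: x.max() - x.min())],
    'Weight': 'mean',
})

print(renamed)
print(per_column)
```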


## 3. Transformation and filtering
### 1. Transformation functions and the transform method
A transformation function returns a sequence of the same length as its input. The most commonly used built-in transformation functions are the cumulative functions `cumcount/cumsum/cumprod/cummax/cummin`. They are used like aggregation functions, except that they accumulate within each group. The `groupby` object also defines fill-type and sliding-window-type transformation functions; their general forms are discussed in Chapter 7 and Chapter 10 respectively and are omitted here.

In [26]:
gb.cummax().head()

|  | Height | Weight |
|---|---|---|
| 0 | 158.9 | 46.0 |
| 1 | 166.5 | 70.0 |
| 2 | 188.9 | 89.0 |
| 3 | NaN | 46.0 |
| 4 | 188.9 | 89.0 |
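
Intra-group accumulation is easiest to follow when the groups are interleaved. In this sketch on invented data (`toy` is a hypothetical table), each row's running maximum only looks at earlier rows of the same group.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'Gender': ['F', 'M', 'F', 'M', 'F'],
    'Height': [158.0, 170.0, 155.0, 175.0, 162.0],
})
gb_toy = toy.groupby('Gender')['Height']

running_max = gb_toy.cummax()  # running maximum within each gender
position = gb_toy.cumcount()   # 0-based position of the row inside its group

print(pd.DataFrame({'Gender': toy['Gender'],
                    'Height': toy['Height'],
                    'running_max': running_max,
                    'position': position}))
```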


#### 【Practice】
In the `groupby` object, the `rank` method is also a useful transformation function. Please refer to its function and give an example of its use.
#### 【END】
A custom transformation requires the `transform` method. $\color{#FF0000}{The input of the custom function is a column of the data source}$, the same input type as in `agg`, and the final result is a `DataFrame` whose row and column indexes match those of the data source.

Now we standardize the height and weight by subtracting the group mean and dividing by the group standard deviation:

In [27]:
gb.transform(lambda x: (x-x.mean())/x.std()).head()

|  | Height | Weight |
|---|---|---|
| 0 | -0.05876 | -0.354888 |
| 1 | -1.010925 | -0.355 |
| 2 | 2.167063 | 2.089498 |
| 3 | NaN | -1.279789 |
| 4 | 0.053133 | 0.159631 |
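
A quick way to check an in-group standardization is to confirm that every group ends up with mean 0 and standard deviation 1. The sketch below does this on invented data (`toy`, `z`, and `check` are hypothetical names).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'Gender': ['F', 'F', 'F', 'M', 'M', 'M'],
    'Height': [156.0, 160.0, 164.0, 170.0, 176.0, 182.0],
})

# Standardize within each group: subtract the group mean, divide by the group std
z = toy.groupby('Gender')['Height'].transform(lambda x: (x - x.mean()) / x.std())

# Regroup the standardized values and inspect their per-group mean and std
check = z.groupby(toy['Gender']).agg(['mean', 'std'])
print(z)
print(check)
```

Note that `std` uses the sample standard deviation (`ddof=1`) by default.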


#### 【Practice】
For the `transform` method, it is not possible to use a specific transformation on a specified column by passing in a dictionary like `agg`. If you need to implement this function in a `transform` call, please provide a solution.
#### 【END】
It was stated above that a transformation returns a sequence of the same length, but in fact `transform` also accepts a function that returns a scalar, in which case the result is broadcast to every row of the group. This $\color{#FF0000}{scalar broadcast}$ technique is very common in feature engineering. For example, the following constructs two new feature columns holding the mean height and mean weight of each sample's gender group:

In [28]:
gb.transform('mean').head() # passing a function that returns a scalar also works

|  | Height | Weight |
|---|---|---|
| 0 | 159.19697 | 47.918519 |
| 1 | 173.62549 | 72.759259 |
| 2 | 173.62549 | 72.759259 |
| 3 | 159.19697 | 47.918519 |
| 4 | 173.62549 | 72.759259 |
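
One way the broadcast result might be attached to the table as new features is sketched below. The data and the `_group_mean` suffix are invented for illustration; because the broadcast output keeps the original row index, it can be joined back directly.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'Gender': ['F', 'F', 'M', 'M'],
    'Height': [160.0, 164.0, 175.0, 181.0],
    'Weight': [50.0, 54.0, 70.0, 78.0],
})
gb_toy = toy.groupby('Gender')[['Height', 'Weight']]

# Each group's mean is broadcast to every row of that group
group_means = gb_toy.transform('mean')

# The row index is unchanged, so the features align with the original rows
features = toy.join(group_means.add_suffix('_group_mean'))

# A custom function that returns a scalar is broadcast in the same way
via_lambda = toy.groupby('Gender')['Height'].transform(lambda x: x.mean())
print(features)
```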


### 2. Group indexing and filtering

The previous chapter introduced indexing, so what is the difference between indexing and filtering?

Filtering in grouping filters groups, while indexing filters rows. In the previous chapter, the selector, whether a Boolean list, an element list, or a position list, essentially filtered rows: a row that meets the condition is selected into the result table, and otherwise it is not.

Group filtering is a generalization of row filtering: if evaluating a condition on all rows of a group returns `True`, the group is retained, and if it returns `False`, the group is dropped. Finally, the rows of all retained groups are concatenated and returned as a `DataFrame`.

The `groupby` object defines the `filter` method for filtering groups. The argument of the custom function is the sub-`DataFrame` of the data source for one group; for the `groupby` object defined in the previous example, that is the rows of `df[['Height', 'Weight']]` belonging to the group. All table methods and properties can therefore be used in the custom function, as long as its return value is a Boolean.

For example, keep all groups of the original table that contain more than 100 rows:

In [29]:
gb.filter(lambda x: x.shape[0] > 100).head()

|  | Height | Weight |
|---|---|---|
| 0 | 158.9 | 46.0 |
| 3 | NaN | 41.0 |
| 5 | 158.0 | 51.0 |
| 6 | 162.5 | 52.0 |
| 7 | 161.9 | 50.0 |
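
The contrast between group filtering and row filtering can be made concrete with a small sketch. The table `toy` is invented for illustration: `filter` keeps or drops whole groups, while a Boolean index keeps or drops individual rows.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'Gender': ['F', 'F', 'F', 'M', 'M'],
    'Height': [156.0, 160.0, 164.0, 170.0, 176.0],
})

# Group filtering: keep every row of each group that has more than 2 rows
kept_groups = toy.groupby('Gender')[['Height']].filter(lambda g: g.shape[0] > 2)

# Row filtering: keep each row that individually meets the condition
kept_rows = toy[toy['Height'] > 160]

print(kept_groups)
print(kept_rows)
```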


#### 【Practice】
Conceptually, indexing is a special case of group filtering. Please use the `filter` function to reproduce the behavior of `loc[...]`, assuming that "`...`" is an element list.
#### 【END】
## 4. Cross-column grouping
### 1. Introduction of apply
The previous sections introduced three major grouping operations, but in fact there is a common grouping scenario that cannot be handled by any of the methods introduced above. For example, the body mass index BMI is defined as follows:
$${\rm BMI} = {\rm\frac{Weight}{Height^2}}$$
where weight and height are measured in kilograms and meters respectively. The task is to calculate the mean BMI of each group.

First, this is clearly not a filtering operation, so `filter` does not apply. Second, the returned mean is a scalar rather than a sequence, so `transform` does not apply. Finally, `agg` may seem suitable, but as emphasized earlier, aggregation functions are processed column by column and cannot $\color{#FF0000}{process multiple columns of data at the same time}$. The `apply` function is introduced to solve this problem.

### 2. Use of apply
By design, the custom function passed to `apply` receives exactly the same argument as the one passed to `filter`; the difference is that `filter` only allows a Boolean return value. The calculation above can be solved as follows:

In [30]:
def BMI(x):
    Height = x['Height']/100
    Weight = x['Weight']
    BMI_value = Weight/Height**2
    return BMI_value.mean()
gb.apply(BMI)

Gender
Female    18.860930
Male      24.318654
dtype: float64
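
The same cross-column calculation can be checked by hand on a small table. In the sketch below the data and the names `toy`, `bmi_mean`, and `alt` are invented for illustration; the heights are in centimeters, as in the cell above. It also shows one possible alternative that avoids `apply` by computing the per-row BMI first.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'Gender': ['F', 'F', 'M', 'M'],
    'Height': [160.0, 150.0, 180.0, 170.0],  # centimeters
    'Weight': [51.2, 45.0, 81.0, 57.8],      # kilograms
})

def bmi_mean(g):
    # g is the sub-DataFrame of one group, so both columns are available
    height_m = g['Height'] / 100
    return (g['Weight'] / height_m ** 2).mean()

bmi = toy.groupby('Gender')[['Height', 'Weight']].apply(bmi_mean)

# Alternative: build the per-row BMI first, then use a built-in aggregation
row_bmi = toy['Weight'] / (toy['Height'] / 100) ** 2
alt = row_bmi.groupby(toy['Gender']).mean()

print(bmi)
print(alt)
```

When the cross-column quantity can be computed row by row, as here, the second form lets the built-in `mean` do the grouping work.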

In addition to scalars, the function passed to `apply` can also return a one-dimensional `Series` or a two-dimensional `DataFrame`. How, then, do the dimensions of the result and the number of levels of its multi-level index change? The following three sets of examples show how the results are assembled:

[a] Scalar case: The result is a `Series`, and the index is the same as the result of `agg`

In [31]:
gb = df.groupby(['Gender','Test_Number'])[['Height','Weight']]
gb.apply(lambda x: 0)

Gender  Test_Number
Female  1              0
        2              0
        3              0
Male    1              0
        2              0
        3              0
dtype: int64

In [32]:
gb.apply(lambda x: [0, 0]) # although it is a list, it is still treated as a scalar return value

Gender  Test_Number
Female  1              [0, 0]
        2              [0, 0]
        3              [0, 0]
Male    1              [0, 0]
        2              [0, 0]
        3              [0, 0]
dtype: object

[b] `Series` case: The result is a `DataFrame`, the row index is the same as the scalar case, and the column index is the index of the `Series`

In [33]:
gb.apply(lambda x: pd.Series([0,0],index=['a','b']))

| Gender | Test_Number | a | b |
|---|---|---|---|
| Female | 1 | 0 | 0 |
| Female | 2 | 0 | 0 |
| Female | 3 | 0 | 0 |
| Male | 1 | 0 | 0 |
| Male | 2 | 0 | 0 |
| Male | 3 | 0 | 0 |


#### 【Practice】
Please try to return a `Series` of the same length but different indexes based on some characteristics of the group in the custom function passed in `apply`. Will there be an error?
#### 【END】
[c] `DataFrame` case: The result is a `DataFrame` whose row index consists of the group index from the `agg` result as the outer levels, with the row index of the returned `DataFrame` appended as the innermost level. The column index of the result is the same as the column index of the returned `DataFrame`.

In [34]:
gb.apply(lambda x: pd.DataFrame(np.ones((2,2)), index = ['a','b'], columns=pd.Index([('w','x'),('y','z')])))

| Gender | Test_Number |  | (w, x) | (y, z) |
|---|---|---|---|---|
| Female | 1 | a | 1.0 | 1.0 |
| Female | 1 | b | 1.0 | 1.0 |
| Female | 2 | a | 1.0 | 1.0 |
| Female | 2 | b | 1.0 | 1.0 |
| Female | 3 | a | 1.0 | 1.0 |
| Female | 3 | b | 1.0 | 1.0 |
| Male | 1 | a | 1.0 | 1.0 |
| Male | 1 | b | 1.0 | 1.0 |
| Male | 2 | a | 1.0 | 1.0 |
| Male | 2 | b | 1.0 | 1.0 |
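
The three cases can be compared by inspecting only the shape and index depth of each result. The sketch below uses an invented table (`toy`) with three groups; the column labels `u` and `v` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented purely for illustration: 3 distinct groups
toy = pd.DataFrame({
    'Gender': ['F', 'F', 'M', 'M'],
    'Test_Number': [1, 2, 1, 1],
    'Height': [160.0, 164.0, 175.0, 181.0],
    'Weight': [50.0, 54.0, 70.0, 78.0],
})
gb_toy = toy.groupby(['Gender', 'Test_Number'])[['Height', 'Weight']]

# [a] scalar per group -> Series indexed by the group keys
scalar_res = gb_toy.apply(lambda x: 0)

# [b] Series per group -> DataFrame, the Series index becomes the columns
series_res = gb_toy.apply(lambda x: pd.Series([0, 0], index=['a', 'b']))

# [c] DataFrame per group -> its row index becomes an extra innermost level
frame_res = gb_toy.apply(
    lambda x: pd.DataFrame(np.ones((2, 2)), index=['a', 'b'], columns=['u', 'v'])
)

print(scalar_res.index.nlevels, series_res.shape, frame_res.index.nlevels)
```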


#### 【Practice】
Please try to return a `DataFrame` of the same size but different column indexes based on some characteristics of the group in the custom function passed in by `apply`. Will there be an error? If only the row index is different, will there be an error?
#### 【END】
Finally, it should be emphasized that the flexibility of `apply` comes at the cost of performance. Unless cross-column group processing is required, prefer the other purpose-built `groupby` methods, since the performance gap can be large. Likewise, when aggregating or transforming, prefer the built-in functions, which are highly optimized and generally faster than custom functions.
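
A rough way to observe this gap is to time a built-in aggregation against the equivalent `apply` call. The sketch below uses randomly generated data (the sizes and names are arbitrary assumptions); the measured times depend on the machine and the pandas version, so no particular ratio should be expected.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical random data, generated purely for illustration
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'key': rng.integers(0, 500, size=20_000),
    'value': rng.normal(size=20_000),
})
gb_toy = toy.groupby('key')['value']

# Both calls compute the same per-group mean
builtin = gb_toy.mean()
custom = gb_toy.apply(lambda x: x.mean())

# Time a few repetitions of each variant
t_builtin = timeit.timeit(lambda: gb_toy.mean(), number=5)
t_custom = timeit.timeit(lambda: gb_toy.apply(lambda x: x.mean()), number=5)
print(f'built-in: {t_builtin:.4f}s, apply: {t_custom:.4f}s')
```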
#### 【Practice】
The `cov` and `corr` functions are also defined on the `groupby` object and conceptually belong to cross-column group processing as well. Using the `gb` object defined previously, please implement the same functionality as `gb.cov()` with the `apply` function and compare their performance.
#### 【END】
## 5. Exercises
### Ex1: Car Dataset
There is a car dataset in which `Brand`, `Disp.`, and `HP` represent the car brand, engine displacement, and engine output respectively.

In [35]:
df = pd.read_csv('../data/car.csv')
df.head(3)

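|  | Brand | Price | Country | Reliability | Mileage | Type | Weight | Disp. | HP |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Eagle Summit 4 | 8895 | USA | 4.0 | 33 | Small | 2560 | 97 | 113 |
| 1 | Ford Escort 4 | 7402 | USA | 2.0 | 33 | Small | 2345 | 114 | 90 |
| 2 | Ford Festiva 4 | 6319 | Korea | 4.0 | 37 | Small | 1845 | 81 | 63 |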


1. First keep only the cars whose `Country` appears more than 2 times in the whole dataset, that is, remove a car if its `Country` appears no more than 2 times. Then group by `Country` and calculate the mean price, the coefficient of variation of the price, and the number of cars in each `Country`. The coefficient of variation is the standard deviation divided by the mean, and it should be named `CoV` in the result.
2. Group the rows by position in the table into the first third, middle third, and last third, and calculate the mean of `Price`.
3. Group by `Type` and calculate the maximum and minimum values of `Price` and `HP` respectively. The result has a multi-level column index; please merge it into a single-level index joined by underscores.
4. Group by `Type` and apply `min-max` normalization to `HP` within each group.
5. Group by `Type` and calculate the correlation coefficient between `Disp.` and `HP`.

### Ex2: Implement the transform function
* The constructor of the custom `groupby` object is `my_groupby(df, group_cols)`
* Support single-column and multi-column grouping
* Support `my_groupby(df)[col].transform(my_func)` with scalar broadcasting
* `transform` in `pandas` cannot compute across columns. Please support this, that is, still return a `Series` when `col` is a list of multiple columns
* Performance and exception handling need not be considered. Just implement the features above, provide test cases, and check whether the results are consistent with `transform` in `pandas`