# Chapter 5: Reshaping

In [141]:
import numpy as np
import pandas as pd

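## I. Reshaping Long and Wide Tables
If a table stores gender in one of its columns, it is a long table with respect to gender. If the gender values instead appear in the column names, and each column holds the values of some other related feature, it is a wide table with respect to gender.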

In [142]:
pd.DataFrame({'Gender':['F','F','M','M'],'Height':[163, 160, 175, 180]})  # long table with respect to gender

Unnamed: 0,Gender,Height
0,F,163
1,F,160
2,M,175
3,M,180


In [143]:
pd.DataFrame({'Height: F':[163, 160],'Height: M':[175, 180]})

Unnamed: 0,Height: F,Height: M
0,163,175
1,160,180


### 1.pivot
pivot is a typical function for converting a long table into a wide table.

In [144]:
df = pd.DataFrame({'Class':[1,1,2,2],'Name':['San Zhang','San Zhang','Si Li','Si Li'], 'Subject':['Chinese','Math','Chinese','Math'],'Grade':[80,75,90,85]})
df

Unnamed: 0,Class,Name,Subject,Grade
0,1,San Zhang,Chinese,80
1,1,San Zhang,Math,75
2,2,Si Li,Chinese,90
3,2,Si Li,Math,85


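A pivot needs three things: the column that becomes the new row index, the column whose values become the new column labels, and the column that supplies the cell values. These correspond to the index, columns, and values parameters of the pivot method. The column index of the new table consists of the unique values of the columns column, the row index consists of the unique values of the index column, and values names the column whose numbers are displayed.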

In [145]:
df.pivot(index='Name', columns='Subject', values='Grade')

Subject,Chinese,Math
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
San Zhang,80,75
Si Li,90,85


Note: each combination of values in the index and columns columns must be unique across the rows of the original table.

In [146]:
df.loc[1, 'Subject'] = 'Chinese'
try:
    df.pivot(index='Name', columns='Subject', values='Grade')
except Exception as e:
    Err_Msg = e
    
Err_Msg

ValueError('Index contains duplicate entries, cannot reshape')
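
One way to avoid this error is to look for repeated combinations before calling pivot. The following is a minimal sketch on a small table modeled on the one above; the names dup_mask, conflicts, and can_pivot are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['San Zhang', 'San Zhang', 'Si Li', 'Si Li'],
    'Subject': ['Chinese', 'Chinese', 'Chinese', 'Math'],
    'Grade': [80, 75, 90, 85],
})

# keep=False marks every row that takes part in a repeated (Name, Subject) pair
dup_mask = df.duplicated(subset=['Name', 'Subject'], keep=False)
conflicts = df[dup_mask]

# pivot is only safe when no (index, columns) combination repeats
can_pivot = not dup_mask.any()
```

Here conflicts holds the two San Zhang rows that share the Chinese subject, which are exactly the rows that make pivot fail.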

The three pivot parameters can also be given as lists, in which case the result has a multi-level index.

In [147]:
df = pd.DataFrame({'Class':[1, 1, 2, 2, 1, 1, 2, 2],'Name':['San Zhang', 'San Zhang', 'Si Li', 'Si Li','San Zhang', 'San Zhang', 'Si Li', 'Si Li'], 'Examination': ['Mid', 'Final', 'Mid', 'Final','Mid', 'Final', 'Mid', 'Final'],'Subject':['Chinese', 'Chinese', 'Chinese', 'Chinese','Math', 'Math', 'Math', 'Math'],'Grade':[80, 75, 85, 65, 90, 85, 92, 88],'rank':[10, 15, 21, 15, 20, 7, 6, 2]})
df

Unnamed: 0,Class,Name,Examination,Subject,Grade,rank
0,1,San Zhang,Mid,Chinese,80,10
1,1,San Zhang,Final,Chinese,75,15
2,2,Si Li,Mid,Chinese,85,21
3,2,Si Li,Final,Chinese,65,15
4,1,San Zhang,Mid,Math,90,20
5,1,San Zhang,Final,Math,85,7
6,2,Si Li,Mid,Math,92,6
7,2,Si Li,Final,Math,88,2


In [148]:
pivot_multi = df.pivot(index = ['Class', 'Name'],columns = ['Subject','Examination'],values = ['Grade','rank'])
pivot_multi

Unnamed: 0_level_0,Unnamed: 1_level_0,Grade,Grade,Grade,Grade,rank,rank,rank,rank
Unnamed: 0_level_1,Subject,Chinese,Chinese,Math,Math,Chinese,Chinese,Math,Math
Unnamed: 0_level_2,Examination,Mid,Final,Mid,Final,Mid,Final,Mid,Final
Class,Name,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3
1,San Zhang,80,75,90,85,10,15,20,7
2,Si Li,85,65,92,88,21,15,6,2


### 2.pivot_table
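pivot depends on the uniqueness condition. When that condition does not hold, the multiple values that share a row-column combination must be reduced to a single value by aggregation. For example, San Zhang and Si Li each took the Chinese exam and the Math exam twice, and according to school policy the final grade is the average of the two scores. pivot cannot handle this case.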

In [149]:
df = pd.DataFrame({'Name':['San Zhang', 'San Zhang','San Zhang', 'San Zhang','Si Li', 'Si Li', 'Si Li', 'Si Li'],'Subject':['Chinese', 'Chinese', 'Math', 'Math','Chinese', 'Chinese', 'Math', 'Math'],'Grade':[80, 90, 100, 90, 70, 80, 85, 95]})
df

Unnamed: 0,Name,Subject,Grade
0,San Zhang,Chinese,80
1,San Zhang,Chinese,90
2,San Zhang,Math,100
3,San Zhang,Math,90
4,Si Li,Chinese,70
5,Si Li,Chinese,80
6,Si Li,Math,85
7,Si Li,Math,95


pandas provides pivot_table for this purpose; its aggfunc parameter specifies the aggregation function to apply.

In [150]:
df.pivot_table(index = 'Name',columns = 'Subject',values = 'Grade',aggfunc = 'mean')

Subject,Chinese,Math
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
San Zhang,85,95
Si Li,75,90


In [151]:
df.pivot_table(index = 'Name',columns = 'Subject',values = 'Grade',aggfunc = lambda x:x.mean())

Subject,Chinese,Math
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
San Zhang,85,95
Si Li,75,90


pivot_table can also compute marginal summaries when margins=True is set.

In [152]:
df.pivot_table(index = 'Name',columns = 'Subject',values = 'Grade',aggfunc = 'mean',margins=True)

Subject,Chinese,Math,All
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
San Zhang,85,95.0,90.0
Si Li,75,90.0,82.5
All,80,92.5,86.25


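#### [Practice]
In the marginal summary example above, each row or column summary equals the mean of the row or column elements in the new table, and the overall summary equals the mean of its four elements. Does this relationship always hold? If not, give a counterexample.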


In [153]:
df = pd.DataFrame({'Class':[1, 1,1, 2, 2, 1, 1, 2, 2],'Name':['San Zhang', 'San Zhang', 'San Zhang','Si Li', 'Si Li','San Zhang', 'San Zhang', 'Si Li', 'Si Li'],'Subject':['Chinese', 'Chinese','Chinese', 'Chinese', 'Chinese','Math', 'Math', 'Math', 'Math'],'Grade':[80, 75,70, 85, 65, 90, 85, 92, 88],'rank':[10,10, 15, 21, 15, 20, 7, 6, 2]})
df

Unnamed: 0,Class,Name,Subject,Grade,rank
0,1,San Zhang,Chinese,80,10
1,1,San Zhang,Chinese,75,10
2,1,San Zhang,Chinese,70,15
3,2,Si Li,Chinese,85,21
4,2,Si Li,Chinese,65,15
5,1,San Zhang,Math,90,20
6,1,San Zhang,Math,85,7
7,2,Si Li,Math,92,6
8,2,Si Li,Math,88,2


In [154]:
df.pivot_table(index = 'Name',columns = 'Subject',values = 'Grade',aggfunc = 'mean',margins=True)

Subject,Chinese,Math,All
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
San Zhang,75.0,87.5,80.0
Si Li,75.0,90.0,82.5
All,75.0,88.75,81.111111


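In this example the row summary is not the mean of the row elements in the new table, because the cells aggregate different numbers of original records: San Zhang has three Chinese grades but only two Math grades. The margin is the mean over all of the corresponding rows in the original table. San Zhang's All value is (80 + 75 + 70 + 90 + 85) / 5 = 80.0, whereas the mean of the two cells 75.0 and 87.5 is 81.25.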
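
The difference can be checked numerically. The sketch below rebuilds only the Name, Subject, and Grade columns of the table above and compares three quantities; the variable names margin, raw_mean, and cell_mean are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['San Zhang'] * 5 + ['Si Li'] * 4,
    'Subject': ['Chinese'] * 3 + ['Math'] * 2 + ['Chinese'] * 2 + ['Math'] * 2,
    'Grade': [80, 75, 70, 90, 85, 85, 65, 92, 88],
})

table = df.pivot_table(index='Name', columns='Subject', values='Grade',
                       aggfunc='mean', margins=True)

# Margin reported by pivot_table for San Zhang
margin = table.loc['San Zhang', 'All']
# Mean over San Zhang's five rows in the original table
raw_mean = df.loc[df['Name'] == 'San Zhang', 'Grade'].mean()
# Mean of the two aggregated cells in the new table
cell_mean = table.loc['San Zhang', ['Chinese', 'Math']].mean()
```

margin agrees with raw_mean and differs from cell_mean, which shows that the margin is computed from the original rows rather than from the aggregated cells.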

### 3.melt
The melt function converts a wide table into a long table.

In [155]:
df = pd.DataFrame({'Class':[1,2],'Name':['San Zhang', 'Si Li'],'Chinese':[80, 90],'Math':[80, 75]})
df

Unnamed: 0,Class,Name,Chinese,Math
0,1,San Zhang,80,80
1,2,Si Li,90,75


In [156]:
df_melted = df.melt(id_vars = ['Class', 'Name'],value_vars = ['Chinese', 'Math'],var_name = 'Subject',value_name = 'Grade')
df_melted

Unnamed: 0,Class,Name,Subject,Grade
0,1,San Zhang,Chinese,80
1,2,Si Li,Chinese,90
2,1,San Zhang,Math,80
3,2,Si Li,Math,75


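The columns to be collapsed are listed in value_vars. var_name names the new column that holds the former column names, and value_name names the new column that holds their values.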
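
As a small illustration of these parameters, the sketch below calls melt with only id_vars. In that case melt collapses every remaining column and falls back to the default column names variable and value.

```python
import pandas as pd

df = pd.DataFrame({'Class': [1, 2], 'Name': ['San Zhang', 'Si Li'],
                   'Chinese': [80, 90], 'Math': [80, 75]})

# With value_vars omitted, every column not listed in id_vars is melted
melted_default = df.melt(id_vars=['Class', 'Name'])
```

Passing var_name and value_name explicitly, as in the cell above, is usually clearer than relying on these defaults.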

melt and pivot are inverse operations.

In [157]:
df_unmelted = df_melted.pivot(index = ['Class', 'Name'],columns='Subject',values='Grade')
df_unmelted

Unnamed: 0_level_0,Subject,Chinese,Math
Class,Name,Unnamed: 2_level_1,Unnamed: 3_level_1
1,San Zhang,80,80
2,Si Li,90,75


In [158]:
df_unmelted = df_unmelted.reset_index().rename_axis(columns={'Subject':''})
df_unmelted

Unnamed: 0,Class,Name,Chinese,Math
0,1,San Zhang,80,80
1,2,Si Li,90,75


In [159]:
df_unmelted.equals(df)

True

### 4.wide_to_long

In [160]:
df = pd.DataFrame({'Class':[1,2],'Name':['San Zhang', 'Si Li'],'Chinese_Mid':[80, 75], 'Math_Mid':[90, 85],'Chinese_Final':[80, 75], 'Math_Final':[90, 85]})
df

Unnamed: 0,Class,Name,Chinese_Mid,Math_Mid,Chinese_Final,Math_Final
0,1,San Zhang,80,90,80,90
1,2,Si Li,75,85,75,85


In [161]:
pd.wide_to_long(df,stubnames=['Chinese', 'Math'],i = ['Class', 'Name'],j='Examination',sep='_',suffix='.+')

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Chinese,Math
Class,Name,Examination,Unnamed: 3_level_1,Unnamed: 4_level_1
1,San Zhang,Mid,80,90
1,San Zhang,Final,80,90
2,Si Li,Mid,75,85
2,Si Li,Final,75,85
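
The call above passes suffix='.+' because the suffixes Mid and Final are not numeric. A minimal sketch of the default behavior follows, on a hypothetical table whose column suffixes are attempt numbers; the column names and the level name Attempt are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['San Zhang', 'Si Li'],
                   'Chinese_1': [80, 75], 'Chinese_2': [82, 78],
                   'Math_1': [90, 85], 'Math_2': [91, 88]})

# The default suffix pattern matches digits only, so suffix can be omitted here
long_df = pd.wide_to_long(df, stubnames=['Chinese', 'Math'],
                          i='Name', j='Attempt', sep='_')
```

The result is indexed by (Name, Attempt) and has one column per stubname, with the numeric suffixes stored as integers in the Attempt level.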


In [162]:
res = pivot_multi.copy()
res.columns = res.columns.map(lambda x:'_'.join(x))  # join the multi-level column names with '_'
res = res.reset_index()  # move all index levels back into columns
res = pd.wide_to_long(res, stubnames=['Grade', 'rank'],    i = ['Class', 'Name'],j = 'Subject_Examination',sep = '_',suffix = '.+')
res = res.reset_index()
res[['Subject', 'Examination']] = res['Subject_Examination'].str.split('_', expand=True)
res = res[['Class', 'Name', 'Examination','Subject', 'Grade', 'rank']].sort_values('Subject')
res = res.reset_index(drop=True)
res

Unnamed: 0,Class,Name,Examination,Subject,Grade,rank
0,1,San Zhang,Mid,Chinese,80,10
1,1,San Zhang,Final,Chinese,75,15
2,2,Si Li,Mid,Chinese,85,21
3,2,Si Li,Final,Chinese,65,15
4,1,San Zhang,Mid,Math,90,20
5,1,San Zhang,Final,Math,85,7
6,2,Si Li,Mid,Math,92,6
7,2,Si Li,Final,Math,88,2


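## II. Reshaping Indexes
### 1.stack and unstack
Chapter 2 introduced swaplevel and reorder_levels for exchanging levels within an index. This section discusses exchanges between the row index and the column index. Because such an exchange changes the shape of the DataFrame, it counts as a reshaping operation.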

- unstack: moves row index levels into the column index

In [163]:
df = pd.DataFrame(np.ones((4,2)),index = pd.Index([('A', 'cat', 'big'),('A', 'dog', 'small'),('B', 'cat', 'big'),('B', 'dog', 'small')]),columns=['col_1', 'col_2'])
df

Unnamed: 0,Unnamed: 1,Unnamed: 2,col_1,col_2
A,cat,big,1.0,1.0
A,dog,small,1.0,1.0
B,cat,big,1.0,1.0
B,dog,small,1.0,1.0


In [164]:
df.unstack()

Unnamed: 0_level_0,Unnamed: 1_level_0,col_1,col_1,col_2,col_2
Unnamed: 0_level_1,Unnamed: 1_level_1,big,small,big,small
A,cat,1.0,,1.0,
A,dog,,1.0,,1.0
B,cat,1.0,,1.0,
B,dog,,1.0,,1.0


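The main parameter of unstack is the level to move. By default the innermost row level is moved, and it becomes the innermost level of the column index. Several levels can be moved at once.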

In [165]:
df.unstack(2)

Unnamed: 0_level_0,Unnamed: 1_level_0,col_1,col_1,col_2,col_2
Unnamed: 0_level_1,Unnamed: 1_level_1,big,small,big,small
A,cat,1.0,,1.0,
A,dog,,1.0,,1.0
B,cat,1.0,,1.0,
B,dog,,1.0,,1.0


In [166]:
df.unstack([0,2])

Unnamed: 0_level_0,col_1,col_1,col_1,col_1,col_2,col_2,col_2,col_2
Unnamed: 0_level_1,A,A,B,B,A,A,B,B
Unnamed: 0_level_2,big,small,big,small,big,small,big,small
cat,1.0,,1.0,,1.0,,1.0,
dog,,1.0,,1.0,,1.0,,1.0
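
The outputs above contain missing values wherever a combination does not exist in the original index. The sketch below shows two optional conveniences: referring to a level by name instead of by number, and filling the missing combinations through the fill_value argument. The level names group, animal, and size are assumptions added for this example; the table in the text has unnamed levels.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 'cat', 'big'), ('A', 'dog', 'small'),
     ('B', 'cat', 'big'), ('B', 'dog', 'small')],
    names=['group', 'animal', 'size'])  # hypothetical level names
df = pd.DataFrame(np.ones((4, 2)), index=index, columns=['col_1', 'col_2'])

# Move the 'size' level to the columns, referring to it by name,
# and fill the combinations that do not exist with 0 instead of NaN
wide = df.unstack('size', fill_value=0)
```

Whether 0 is a sensible filler depends on the data; for measurements, leaving NaN in place is often the safer choice.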


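Similar to the uniqueness requirement of pivot, unstack requires that the combinations formed by the row index levels being moved and the row index levels being kept are unique.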

In [167]:
my_index = df.index.to_list()
my_index[1] = my_index[0]
df.index = pd.Index(my_index)
df

Unnamed: 0,Unnamed: 1,Unnamed: 2,col_1,col_2
A,cat,big,1.0,1.0
A,cat,big,1.0,1.0
B,cat,big,1.0,1.0
B,dog,small,1.0,1.0


In [168]:
try:
    df.unstack()
except Exception as e:
     Err_Msg = e

Err_Msg

ValueError('Index contains duplicate entries, cannot reshape')

- stack: moves column index levels into the row index

In [169]:
df = pd.DataFrame(np.ones((4,2)),index = pd.Index([('A', 'cat', 'big'),('A', 'dog', 'small'),('B', 'cat', 'big'),('B', 'dog', 'small')]),columns=['index_1', 'index_2']).T
df

Unnamed: 0_level_0,A,A,B,B
Unnamed: 0_level_1,cat,dog,cat,dog
Unnamed: 0_level_2,big,small,big,small
index_1,1.0,1.0,1.0,1.0
index_2,1.0,1.0,1.0,1.0


In [170]:
df.stack()

Unnamed: 0_level_0,Unnamed: 1_level_0,A,A,B,B
Unnamed: 0_level_1,Unnamed: 1_level_1,cat,dog,cat,dog
index_1,big,1.0,,1.0,
index_1,small,,1.0,,1.0
index_2,big,1.0,,1.0,
index_2,small,,1.0,,1.0


In [171]:
df.stack([1, 2])

Unnamed: 0,Unnamed: 1,Unnamed: 2,A,B
index_1,cat,big,1.0,1.0
index_1,dog,small,1.0,1.0
index_2,cat,big,1.0,1.0
index_2,dog,small,1.0,1.0
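
Since stack and unstack move levels in opposite directions, one can undo the other when every index combination is present. The following is a minimal sketch on a complete 2 x 2 index with assumed level names group and animal. Both sides are sorted before the comparison so that the check does not depend on the row order produced by a particular pandas version.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['A', 'B'], ['cat', 'dog']],
                                   names=['group', 'animal'])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index,
                  columns=['col_1', 'col_2'])

wide = df.unstack('animal')       # 'animal' becomes the inner column level
restored = wide.stack('animal')   # and is pushed back into the row index

same = restored.sort_index().equals(df.sort_index())
```

When some combinations are missing, as in the tables above, the round trip introduces NaN values and may change dtypes, so the equality no longer holds in general.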


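## III. Other Reshaping Functions
### 1.crosstab
crosstab is not a particularly recommended function, because everything it can do can also be done with pivot_table, which also tends to be faster.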

In [172]:
df = pd.read_csv(r'D:\datawhale\joyful-pandas\data\learn_pandas.csv')  # raw string so the backslashes are not read as escapes
pd.crosstab(index = df.School, columns = df.Transfer)

Transfer,N,Y
School,Unnamed: 1_level_1,Unnamed: 2_level_1
Fudan University,38,1
Peking University,28,2
Shanghai Jiao Tong University,53,0
Tsinghua University,62,4


In [173]:
pd.crosstab(index = df.School, columns = df.Transfer,values = [0]*df.shape[0], aggfunc = 'count')

Transfer,N,Y
School,Unnamed: 1_level_1,Unnamed: 2_level_1
Fudan University,38.0,1.0
Peking University,28.0,2.0
Shanghai Jiao Tong University,53.0,
Tsinghua University,62.0,4.0


In [174]:
df.pivot_table(index = 'School',columns = 'Transfer',values = 'Name',aggfunc = 'count')

Transfer,N,Y
School,Unnamed: 1_level_1,Unnamed: 2_level_1
Fudan University,38.0,1.0
Peking University,28.0,2.0
Shanghai Jiao Tong University,53.0,
Tsinghua University,62.0,4.0


In [175]:
pd.crosstab(index = df.School, columns = df.Transfer,values = df.Height, aggfunc = 'mean')

Transfer,N,Y
School,Unnamed: 1_level_1,Unnamed: 2_level_1
Fudan University,162.04375,177.2
Peking University,163.42963,162.4
Shanghai Jiao Tong University,163.953846,
Tsinghua University,163.253571,164.55
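
The count tables above differ in one detail: crosstab shows 0 for an empty combination, while pivot_table shows a missing value. The sketch below uses a small made-up table in place of the School, Transfer, and Name columns and suggests one way to make the two results agree, namely the fill_value argument of pivot_table.

```python
import pandas as pd

# Toy data standing in for the School, Transfer and Name columns
df = pd.DataFrame({
    'School': ['Fudan', 'Fudan', 'Peking', 'Peking', 'Peking'],
    'Transfer': ['N', 'Y', 'N', 'N', 'N'],
    'Name': ['a', 'b', 'c', 'd', 'e'],
})

ct = pd.crosstab(index=df['School'], columns=df['Transfer'])

# fill_value=0 replaces the NaN that pivot_table leaves for empty combinations
pt = df.pivot_table(index='School', columns='Transfer', values='Name',
                    aggfunc='count', fill_value=0)

same_counts = (ct.to_numpy() == pt.to_numpy()).all()
```

The comparison is done on the underlying arrays so that it does not depend on the exact integer or float dtype each function returns.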


#### [Practice]
It was mentioned earlier that crosstab performs worse than pivot_table. Verify this with several aggregation methods.

In [176]:
%timeit -n 30 pd.crosstab(index = df.School, columns = df.Transfer,values = [0]*df.shape[0], aggfunc = 'count')

4.87 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 30 loops each)


In [177]:
%timeit -n 30 df.pivot_table(index = 'School',columns = 'Transfer',values = 'Name',aggfunc = 'count')

4.57 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 30 loops each)


In [178]:
%timeit -n 30 pd.crosstab(index = df.School, columns = df.Transfer,values = [0]*df.shape[0], aggfunc = 'mean')

5.22 ms ± 90.6 µs per loop (mean ± std. dev. of 7 runs, 30 loops each)


In [179]:
%timeit -n 30 df.pivot_table(index = 'School',columns = 'Transfer',values = 'Height',aggfunc = 'mean')

4.57 ms ± 89.9 µs per loop (mean ± std. dev. of 7 runs, 30 loops each)


### 2.explode
The explode method expands the elements of a column vertically. A cell is expanded only if it holds one of the types list, tuple, Series, or np.ndarray.

In [180]:
df_ex = pd.DataFrame({'A': [[1, 2],'my_str',{1, 2},pd.Series([3, 4])],'B': 1})
df_ex

Unnamed: 0,A,B
0,"[1, 2]",1
1,my_str,1
2,"{1, 2}",1
3,0 3 1 4 dtype: int64,1


In [181]:
df_ex.explode('A')  # note that the original index labels are kept

Unnamed: 0,A,B
0,1,1
0,2,1
1,my_str,1
2,"{1, 2}",1
3,3,1
3,4,1
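
Because explode keeps the original index labels, the expanded rows share labels with the row they came from. When a fresh 0-based index is preferred, the ignore_index argument can be used, as in this short sketch on a made-up table.

```python
import pandas as pd

df_ex = pd.DataFrame({'A': [[1, 2], [3], [4, 5, 6]], 'B': ['x', 'y', 'z']})

# Default: each expanded row keeps the label of its source row
kept = df_ex.explode('A')
# ignore_index=True renumbers the rows from 0
renumbered = df_ex.explode('A', ignore_index=True)
```

Keeping the repeated labels is useful when the exploded rows need to be joined back to the source table; renumbering is convenient when they do not.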


### 3.get_dummies
get_dummies is one of the important functions for feature construction; it converts a categorical feature into indicator variables.

In [182]:
pd.get_dummies(df.Grade).head()

Unnamed: 0,Freshman,Junior,Senior,Sophomore
0,1,0,0,0
1,1,0,0,0
2,0,0,1,0
3,0,0,0,1
4,0,0,0,1
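
The sketch below applies get_dummies to a short made-up Series instead of the dataset column. It also passes two optional arguments: prefix, which labels the new columns, and dtype, which requests 0/1 integers explicitly because the default output dtype is not the same in every pandas version.

```python
import pandas as pd

grade = pd.Series(['Freshman', 'Senior', 'Sophomore', 'Freshman'], name='Grade')

# dtype=int asks for 0/1 integers regardless of the library's default dtype
dummies = pd.get_dummies(grade, prefix='Grade', dtype=int)
```

Each row of dummies contains exactly one 1, in the column that matches the original category.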


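## IV. Exercises
### EX1: US Illegal Drug Dataset
Given a dataset on illegal drugs in the United States, where SubstanceName and DrugReports are the drug name and the number of reports, respectively: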

In [183]:
df = pd.read_csv(r'D:\datawhale\joyful-pandas\data\drugs.csv').sort_values(['State','COUNTY','SubstanceName'],ignore_index=True)
df.head()

Unnamed: 0,YYYY,State,COUNTY,SubstanceName,DrugReports
0,2011,KY,ADAIR,Buprenorphine,3
1,2012,KY,ADAIR,Buprenorphine,5
2,2013,KY,ADAIR,Buprenorphine,4
3,2014,KY,ADAIR,Buprenorphine,27
4,2015,KY,ADAIR,Buprenorphine,5


1. Convert the data into the following form:

In [184]:
res=df.pivot(index=['State','COUNTY','SubstanceName'], columns='YYYY', values='DrugReports').reset_index().rename_axis(columns={'YYYY':''})
res.head()

Unnamed: 0,State,COUNTY,SubstanceName,2010,2011,2012,2013,2014,2015,2016,2017
0,KY,ADAIR,Buprenorphine,,3.0,5.0,4.0,27.0,5.0,7.0,10.0
1,KY,ADAIR,Codeine,,,1.0,,,,,1.0
2,KY,ADAIR,Fentanyl,,,1.0,,,,,
3,KY,ADAIR,Heroin,,,1.0,2.0,,1.0,,2.0
4,KY,ADAIR,Hydrocodone,6.0,9.0,10.0,10.0,9.0,7.0,11.0,3.0


2. Restore the result of question 1 to the original table.

In [185]:
# melt the year columns back into rows, then use dropna(subset=['DrugReports']) to drop the rows whose DrugReports is NaN
res_melted = res.melt(id_vars = ['State','COUNTY','SubstanceName'],value_vars = list(range(2010,2018)),var_name = 'YYYY',value_name = 'DrugReports').dropna(subset=['DrugReports'])
# res_melted[df.columns] puts the columns in the same order as the original DataFrame
# sort_values sorts by the listed columns, and ignore_index=True renumbers the row index from 0
# astype converts the listed columns to the given dtypes
res_melted = res_melted[df.columns].sort_values(['State','COUNTY','SubstanceName'],ignore_index=True).astype({'YYYY':'int64','DrugReports':'int64'})
res_melted.head()

Unnamed: 0,YYYY,State,COUNTY,SubstanceName,DrugReports
0,2011,KY,ADAIR,Buprenorphine,3
1,2012,KY,ADAIR,Buprenorphine,5
2,2013,KY,ADAIR,Buprenorphine,4
3,2014,KY,ADAIR,Buprenorphine,27
4,2015,KY,ADAIR,Buprenorphine,5


In [186]:
res_melted.equals(df)

True

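3. Compute the total number of reports per year for each State, with State as the column index and YYYY as the row index. Implement this with two different strategies, pivot_table and groupby+unstack, and consider how they are related.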



In [187]:
df.pivot_table(index = 'YYYY',columns = 'State',values = 'DrugReports',aggfunc = 'sum')

State,KY,OH,PA,VA,WV
YYYY,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2010,10453,19707,19814,8685,2890
2011,10289,20330,19987,6749,3271
2012,10722,23145,19959,7831,3376
2013,11148,26846,20409,11675,4046
2014,11081,30860,24904,9037,3280
2015,9865,37127,25651,8810,2571
2016,9093,42470,26164,10195,2548
2017,9394,46104,27894,10448,1614


In [188]:
df.groupby(['YYYY','State'])['DrugReports'].sum().unstack(1)

State,KY,OH,PA,VA,WV
YYYY,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2010,10453,19707,19814,8685,2890
2011,10289,20330,19987,6749,3271
2012,10722,23145,19959,7831,3376
2013,11148,26846,20409,11675,4046
2014,11081,30860,24904,9037,3280
2015,9865,37127,25651,8810,2571
2016,9093,42470,26164,10195,2548
2017,9394,46104,27894,10448,1614


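### EX2: A Special Case of wide_to_long
Functionally, melt can be regarded as a special case of wide_to_long in which there is only one stubname. Use wide_to_long to produce the df_melted from the melt section. (Hint: add a suitable prefix to the column names.)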

In [189]:
df = pd.DataFrame({'Class':[1,2],'Name':['San Zhang', 'Si Li'],'Chinese':[80, 90],'Math':[80, 75]})
df

Unnamed: 0,Class,Name,Chinese,Math
0,1,San Zhang,80,80
1,2,Si Li,90,75


In [190]:
df_melted = df.melt(id_vars = ['Class', 'Name'],value_vars = ['Chinese', 'Math'],var_name = 'Subject',value_name = 'Grade')
df_melted

Unnamed: 0,Class,Name,Subject,Grade
0,1,San Zhang,Chinese,80
1,2,Si Li,Chinese,90
2,1,San Zhang,Math,80
3,2,Si Li,Math,75


In [191]:
df = df.rename(columns={'Chinese':'Grade_Chinese', 'Math':'Grade_Math'})

In [192]:
pd.wide_to_long(df,stubnames='Grade',i = ['Class', 'Name'],j='Subject',sep='_',suffix='.+').reset_index()

Unnamed: 0,Class,Name,Subject,Grade
0,1,San Zhang,Chinese,80
1,1,San Zhang,Math,80
2,2,Si Li,Chinese,90
3,2,Si Li,Math,75
