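## Problem
How do I sum a field across all rows that share the same stock code and the same year, so that each (code, year) pair collapses into a single row?

I want to add up a company's `var` values for the same year into one total.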


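## Approach
This can be done with the pandas library:

1. Get the list of company stock codes.
2. Get the list of years for a given company.
3. Sum `var` for that company within the same year (`var` stands for any field or variable).
4. Use a for loop to repeat steps 2-3 for every company.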

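The four steps above spell out the logic explicitly. pandas' `groupby` can do the same grouping and summing in one call. This is an alternative technique, not the one used in this post. The sketch below is minimal and assumes a small hypothetical DataFrame built from the first five `000001` sample rows, with only the `code`, `year` and `baladded` columns:

```python
import pandas as pd

# Hypothetical toy data: the first five 000001 sample rows from this post
df = pd.DataFrame({
    "code": ["000001"] * 5,
    "year": [2000, 2000, 2002, 2002, 2007],
    "baladded": [-65856130.0, -65856130.0, -486919258.0, -486919258.0, 1288000.0],
})

# Group rows by (code, year) and sum baladded within each group;
# as_index=False keeps code and year as regular columns instead of an index
result_df = (
    df.groupby(["code", "year"], as_index=False)["baladded"]
    .sum()
    .rename(columns={"baladded": "baladded_sum"})
)
print(result_df)
```

The year loop in this post iterates over a `set`. By contrast, `groupby` sorts the group keys by default (`sort=True`), so the row order of the result is predictable.
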
## Preparing the data

```python
import numpy as np
import pandas as pd

# Force the stock code to be read as str, so codes such as 000001 keep their leading zeros
df = pd.read_excel('data.xlsx', converters={"code": str})
df.head()
```

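| | code | year | baladded | valadded |
|---|---|---|---|---|
| 0 | 000001 | 2000 | -65856130.0 | -65856130.0 |
| 1 | 000001 | 2000 | -65856130.0 | -65856130.0 |
| 2 | 000001 | 2002 | -486919258.0 | -439683780.0 |
| 3 | 000001 | 2002 | -486919258.0 | -439683780.0 |
| 4 | 000001 | 2007 | 1288000.0 | 1288000.0 |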




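The `converters` argument matters because stock codes like `000001` look numeric. Without it, pandas parses them as integers and the leading zeros are lost. The sketch below shows the difference. It is minimal and uses `read_csv` on a hypothetical in-memory CSV as a stand-in for `data.xlsx`; `read_excel` accepts the same `converters` argument.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for data.xlsx (first two sample rows)
csv_text = "code,year,baladded\n000001,2000,-65856130.0\n000001,2000,-65856130.0\n"

# Default parsing infers an integer column, so 000001 becomes 1
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["code"].tolist())  # [1, 1]

# Converting the column with str keeps the original text, zeros included
str_df = pd.read_csv(io.StringIO(csv_text), converters={"code": str})
print(str_df["code"].tolist())  # ['000001', '000001']
```

Passing `dtype={"code": str}` is a common alternative with the same effect on this column.
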
## Code
### 1. Get the list of company stock codes

```python
codes = df.code.unique()
codes
```

```
array(['000001', '000002', '000004', '000005', '000006'], dtype=object)
```

### 2. Get the list of years for one company
Using 000001 as an example:

```python
years = set(df[df['code']=='000001']['year'].values)
years
```

```
{2000, 2002, 2007, 2008, 2010, 2013, 2019}
```

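A Python `set` has no guaranteed iteration order. That is likely why the final result in step 4 lists 2019 before 2000, even though the notebook displays the set above in sorted order. If you want the years in ascending order, the sketch below uses `unique()` plus `sorted()`. It is minimal, and the toy DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data for one company, years deliberately out of order
df = pd.DataFrame({
    "code": ["000001"] * 4,
    "year": [2019, 2000, 2002, 2000],
})

# unique() drops duplicate years, tolist() gives plain Python ints, sorted() orders them
years = sorted(df.loc[df["code"] == "000001", "year"].unique().tolist())
print(years)  # [2000, 2002, 2019]
```
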
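### 3. Sum baladded for one company within one year

Using company 000001 in 2000 as an example, first select the company's rows, then that year's rows, then the `baladded` column, and finally sum it: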

```python
ndf = df[df['code']=='000001']
ndf.head()
```

| | code | year | baladded | valadded |
|---|---|---|---|---|
| 0 | 000001 | 2000 | -65856130.0 | -65856130.0 |
| 1 | 000001 | 2000 | -65856130.0 | -65856130.0 |
| 2 | 000001 | 2002 | -486919258.0 | -439683780.0 |
| 3 | 000001 | 2002 | -486919258.0 | -439683780.0 |
| 4 | 000001 | 2007 | 1288000.0 | 1288000.0 |

```python
ndf[ndf['year']==2000]
```

| | code | year | baladded | valadded |
|---|---|---|---|---|
| 0 | 000001 | 2000 | -65856130.0 | -65856130.0 |
| 1 | 000001 | 2000 | -65856130.0 | -65856130.0 |

```python
ndf[ndf['year']==2000]['baladded']
```

```
0   -65856130.0
1   -65856130.0
Name: baladded, dtype: float64
```

```python
ndf[ndf['year']==2000]['baladded'].sum()
```

```
-131712260.0
```

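The chained filters above can also be written as a single boolean mask. The mask combines both conditions with `&` and picks rows and column in one `.loc` call. The sketch below is minimal and runs on hypothetical toy data taken from the sample rows.

```python
import pandas as pd

# Hypothetical toy data: sample rows for company 000001
df = pd.DataFrame({
    "code": ["000001"] * 5,
    "year": [2000, 2000, 2002, 2002, 2007],
    "baladded": [-65856130.0, -65856130.0, -486919258.0, -486919258.0, 1288000.0],
})

# Parentheses are required: & binds more tightly than == in Python
mask = (df["code"] == "000001") & (df["year"] == 2000)
total = df.loc[mask, "baladded"].sum()
print(total)  # -131712260.0
```
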
### 4. Repeat steps 2-3 for every company with a for loop
The complete code:

```python
results = []
codes = df.code.unique()
for code in codes:
    # Select this company's rows once, instead of re-filtering df for every year
    ndf = df[df['code'] == code]
    years = set(ndf['year'].values)
    for year in years:
        # Sum baladded over all rows of this company in this year
        baladded_sum = ndf[ndf['year'] == year]['baladded'].sum()
        results.append((code, year, baladded_sum))

result_df = pd.DataFrame(results, columns=['code', 'year', 'baladded_sum'])
result_df
```

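| | code | year | baladded_sum |
|---|---|---|---|
| 0 | 000001 | 2019 | 274000000.0 |
| 1 | 000001 | 2000 | -131712300.0 |
| 2 | 000001 | 2002 | -973838500.0 |
| 3 | 000001 | 2007 | 2576000.0 |
| 4 | 000001 | 2008 | 492462000.0 |
| 5 | 000001 | 2010 | -958002000.0 |
| 6 | 000001 | 2013 | 890000000.0 |
| 7 | 000002 | 2016 | 333907300.0 |
| 8 | 000002 | 2017 | 514196700.0 |
| 9 | 000002 | 2018 | 1781193000.0 |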


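Since `var` can be any field, you may want to sum several fields at once. One option, not used in this post, is `groupby(...).agg` with named aggregation, which also produces the same `baladded_sum` column name as the loop. The sketch below is minimal, runs on hypothetical toy data, and assumes pandas 0.25 or later, where named aggregation is available.

```python
import pandas as pd

# Hypothetical toy data: sample rows for company 000001
df = pd.DataFrame({
    "code": ["000001"] * 5,
    "year": [2000, 2000, 2002, 2002, 2007],
    "baladded": [-65856130.0, -65856130.0, -486919258.0, -486919258.0, 1288000.0],
    "valadded": [-65856130.0, -65856130.0, -439683780.0, -439683780.0, 1288000.0],
})

# Named aggregation: output_column=(input_column, aggregation function)
result_df = df.groupby(["code", "year"], as_index=False).agg(
    baladded_sum=("baladded", "sum"),
    valadded_sum=("valadded", "sum"),
)
print(result_df)
```
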
```python
# Save the result
result_df.to_csv('result.csv', index=False)
```
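
If `result.csv` is loaded again with `pd.read_csv`, the `code` column is re-parsed as integers unless told otherwise, so `000001` comes back as `1`. Below is a minimal round-trip sketch. The one-row `result_df` is hypothetical, and `to_csv()` without a path returns the CSV text as a string, so no file is written.

```python
import io

import pandas as pd

# Hypothetical one-row result in the same shape as result_df above
result_df = pd.DataFrame(
    {"code": ["000001"], "year": [2000], "baladded_sum": [-131712260.0]}
)
csv_text = result_df.to_csv(index=False)  # no path given: returns a str

naive = pd.read_csv(io.StringIO(csv_text))
print(naive["code"].tolist())  # [1]

# dtype={"code": str} keeps the codes as text, leading zeros included
restored = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})
print(restored["code"].tolist())  # ['000001']
```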