# Drawing Conclusions Quiz
Explore `store_data.csv` in the space below to answer the quiz questions.

In [1]:
import pandas as pd

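**Questions to explore**  

* Which store has the highest total sales for the final month?  
* Which store has the highest average sales?  
* Which store sold the most during the week of March 13, 2016?  
* In which week did store C have its worst sales?  
* Which store has the highest sales over the most recent 3 months?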


In [2]:
# Import and load the data
df=pd.read_csv('store_data.csv')

In [3]:
df.head(3)

Unnamed: 0,week,storeA,storeB,storeC,storeD,storeE
0,2014-05-04,2643,8257,3893,6231,1294
1,2014-05-11,6444,5736,5634,7092,2907
2,2014-05-18,9646,2552,4253,5447,4736


In [5]:
df.tail()

Unnamed: 0,week,storeA,storeB,storeC,storeD,storeE
195,2018-01-28,282,6351,7759,5558,1028
196,2018-02-04,4853,6503,4187,5956,1458
197,2018-02-11,9202,3677,4540,6186,243
198,2018-02-18,3512,7511,4151,5596,3501
199,2018-02-25,7560,6904,3569,5045,2585


In [6]:
# Explore the data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 6 columns):
week      200 non-null object
storeA    200 non-null int64
storeB    200 non-null int64
storeC    200 non-null int64
storeD    200 non-null int64
storeE    200 non-null int64
dtypes: int64(5), object(1)
memory usage: 9.5+ KB


The exploration shows that the data has no null values.

There are 200 rows in total.

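### Question 1: Total sales for the final month  
Overall approach:  
* Take the rows for the final month
* Set `week` as the index (its values are unique)
* Sum the remaining columns of the re-indexed df

#### 1-1 Get the data for the final month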

In [24]:
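# Total sales for the final month
# The data is weekly, so the last four rows cover the final month (February 2018)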
new_df=df.tail(4)
new_df

Unnamed: 0,week,storeA,storeB,storeC,storeD,storeE
196,2018-02-04,4853,6503,4187,5956,1458
197,2018-02-11,9202,3677,4540,6186,243
198,2018-02-18,3512,7511,4151,5596,3501
199,2018-02-25,7560,6904,3569,5045,2585


#### 1-2 Set week as the index

Set `week` as the index. Note that `inplace` must be set to `True` so that the DataFrame is modified in place.

In [25]:
new_df.set_index(['week'],inplace=True)
new_df

week,storeA,storeB,storeC,storeD,storeE
2018-02-04,4853,6503,4187,5956,1458
2018-02-11,9202,3677,4540,6186,243
2018-02-18,3512,7511,4151,5596,3501
2018-02-25,7560,6904,3569,5045,2585


#### 1-3 Sum the remaining columns after resetting the index

In [26]:
new_df.sum()

storeA    25127
storeB    24595
storeC    16447
storeD    22783
storeE     7787
dtype: int64

In [28]:
total=0
for temp in new_df.sum():
    total+=temp

In [29]:
total

96739

**Conclusion: the store with the highest sales in the final month**  
A

### Question 2: Average sales
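
The three steps above can also be written without `inplace=True` or a manual loop. This is a minimal sketch, not the notebook's original code: the rows are copied by hand from the `df.tail(4)` output, and the names `last_month`, `store_totals`, `best_store` and `grand_total` are chosen here for illustration.

```python
import pandas as pd

# Rows copied from the df.tail(4) output above (the last four weeks).
last_month = pd.DataFrame({
    "week": ["2018-02-04", "2018-02-11", "2018-02-18", "2018-02-25"],
    "storeA": [4853, 9202, 3512, 7560],
    "storeB": [6503, 3677, 7511, 6904],
    "storeC": [4187, 4540, 4151, 3569],
    "storeD": [5956, 6186, 5596, 5045],
    "storeE": [1458, 243, 3501, 2585],
})

# set_index returns a new frame, so inplace=True is not required here.
store_totals = last_month.set_index("week").sum()

best_store = store_totals.idxmax()  # label of the store with the largest total
grand_total = store_totals.sum()    # same value the for-loop accumulates

print(best_store, store_totals[best_store], grand_total)
```

`idxmax` reads the winning store straight from the per-store totals, so the answer does not have to be picked out by eye.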

In [11]:
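# Average sales
# numeric_only=True skips the string-typed week column, which recent pandas versions refuse to average
df.mean(numeric_only=True)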

storeA    5865.480
storeB    6756.710
storeC    4942.105
storeD    5431.405
storeE    2580.025
dtype: float64

**Which store has the highest average sales**  
B
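
The store can also be picked programmatically. The sketch below makes two assumptions: `sample` holds only the three rows shown in the `df.head(3)` output, so its means are not the full-data means, and `reported_means` re-enters by hand the averages printed above.

```python
import pandas as pd

# Three rows copied from the df.head(3) output; NOT the full dataset.
sample = pd.DataFrame({
    "week": ["2014-05-04", "2014-05-11", "2014-05-18"],
    "storeA": [2643, 6444, 9646],
    "storeB": [8257, 5736, 2552],
    "storeC": [3893, 5634, 4253],
    "storeD": [6231, 7092, 5447],
    "storeE": [1294, 2907, 4736],
})

# numeric_only=True leaves the string-typed week column out of the average.
sample_means = sample.mean(numeric_only=True)
print(sample_means)

# Full-data averages as printed in the output above, re-entered by hand.
reported_means = pd.Series({
    "storeA": 5865.480,
    "storeB": 6756.710,
    "storeC": 4942.105,
    "storeD": 5431.405,
    "storeE": 2580.025,
})
print(reported_means.idxmax())
```

On the full dataset the same result should come from `df.mean(numeric_only=True).idxmax()`.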

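### Question 3: Which store sold the most during the week of March 13, 2016?  
To select the rows of df that satisfy a condition, use boolean indexing of the form `df[df['column'] == value]`.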

In [34]:
# Sales for the week of March 13, 2016
target_df=df[df['week']=='2016-03-13']
target_df

Unnamed: 0,week,storeA,storeB,storeC,storeD,storeE
97,2016-03-13,2054,1390,5112,5513,2536


In [41]:
sale = df.loc[df['week'] == '2016-03-13']
sale

Unnamed: 0,week,storeA,storeB,storeC,storeD,storeE
97,2016-03-13,2054,1390,5112,5513,2536


In [36]:
target_df.set_index(['week'],inplace=True)
target_df

week,storeA,storeB,storeC,storeD,storeE
2016-03-13,2054,1390,5112,5513,2536


In [46]:
target_total=0
for val in target_df.iloc[0]:
#     print(val)
    target_total+=int(val)
print(target_total)

16605
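
The total of 16605 is the combined sales of all five stores for that week; it does not say which store sold the most. A minimal sketch of that last step follows, using the single row copied from the output above. The names `week_row`, `sales` and `top_store` are illustrative only.

```python
import pandas as pd

# The single row for 2016-03-13, copied from the output above.
week_row = pd.DataFrame(
    {
        "week": ["2016-03-13"],
        "storeA": [2054],
        "storeB": [1390],
        "storeC": [5112],
        "storeD": [5513],
        "storeE": [2536],
    },
    index=[97],
)

# Index by week, then pull that week out as a Series keyed by store name.
sales = week_row.set_index("week").loc["2016-03-13"]

top_store = sales.idxmax()  # store with the largest value in the row
print(top_store, sales[top_store], sales.sum())
```

For this row the highest value is 5513, in the storeD column.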


### Question 4: The week with store C's lowest sales

Use a conditional query to select the rows of df where `df['storeC'] == min(df['storeC'])`.

In [17]:
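# The week with store C's lowest sales
# Note: iloc is indexed with square brackets, i.e. df.iloc[df['storeC'].argmin()];
# calling it with parentheses, as here, only returns the indexer object shown below.
df.iloc(df['storeC'].argmin())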

<pandas.core.indexing._iLocIndexer at 0x7fadda6654e0>

In [5]:
min(df['storeC'])

927

Index position at which the minimum of storeC occurs

In [6]:
df['storeC'].argmin()

9

**Note:** both of the following approaches work

In [9]:
df.loc[df['storeC']==min(df['storeC'])]

Unnamed: 0,week,storeA,storeB,storeC,storeD,storeE
9,2014-07-06,8567,3228,927,3277,168


In [10]:
df[df['storeC']==min(df['storeC'])]

Unnamed: 0,week,storeA,storeB,storeC,storeD,storeE
9,2014-07-06,8567,3228,927,3277,168


**Conclusion:**  
2014-07-06
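
A shorter route to the same week is `idxmin`, which returns the index label of the minimum rather than its position. The sketch below is an illustration on four rows copied from the outputs above (rows 0, 1, 2 and 9), not the full dataset; `sample`, `worst_label` and `worst_week` are names chosen here.

```python
import pandas as pd

# Rows 0-2 from df.head(3) plus row 9 from the storeC minimum lookup above.
sample = pd.DataFrame(
    {
        "week": ["2014-05-04", "2014-05-11", "2014-05-18", "2014-07-06"],
        "storeA": [2643, 6444, 9646, 8567],
        "storeB": [8257, 5736, 2552, 3228],
        "storeC": [3893, 5634, 4253, 927],
        "storeD": [6231, 7092, 5447, 3277],
        "storeE": [1294, 2907, 4736, 168],
    },
    index=[0, 1, 2, 9],
)

# idxmin gives the index label of the smallest storeC value.
worst_label = sample["storeC"].idxmin()

# Use the label with .loc to read the matching week.
worst_week = sample.loc[worst_label, "week"]
print(worst_label, worst_week)
```

Because `idxmin` returns a label, it pairs with `.loc`; a position from `argmin` pairs with `.iloc[...]` instead.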

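### Question 5: Total sales for the most recent 3 months  
Steps:
* Check the maximum and minimum of `df['week']`; the maximum is the `most recent date`.  
  Count back three months from that date to get the date range.
* Set `week` as the index  
* Sum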

In [47]:
# Total sales for the most recent 3 months
dates_ranges=df['week']

In [49]:
min(dates_ranges)

'2014-05-04'

In [50]:
max(dates_ranges)

'2018-02-25'

In [53]:
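# Note: this filter keeps 2017-12-31 through 2018-02-18 only, i.e. about two months,
# and the strict '<' excludes the final week (2018-02-25).
df=df[(df['week']>'2017-12-25')&(df['week']<'2018-02-25')]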
df

Unnamed: 0,week,storeA,storeB,storeC,storeD,storeE
191,2017-12-31,11875,1527,6711,5265,1701
192,2018-01-07,8978,11312,4158,5019,3842
193,2018-01-14,6963,4014,4215,7153,3097
194,2018-01-21,5553,3971,3761,6255,3071
195,2018-01-28,282,6351,7759,5558,1028
196,2018-02-04,4853,6503,4187,5956,1458
197,2018-02-11,9202,3677,4540,6186,243
198,2018-02-18,3512,7511,4151,5596,3501


In [54]:
# Set week as the index
df.set_index(['week'],inplace=True)
df

week,storeA,storeB,storeC,storeD,storeE
2017-12-31,11875,1527,6711,5265,1701
2018-01-07,8978,11312,4158,5019,3842
2018-01-14,6963,4014,4215,7153,3097
2018-01-21,5553,3971,3761,6255,3071
2018-01-28,282,6351,7759,5558,1028
2018-02-04,4853,6503,4187,5956,1458
2018-02-11,9202,3677,4540,6186,243
2018-02-18,3512,7511,4151,5596,3501


In [56]:
df.sum()

storeA    51218
storeB    44866
storeC    39482
storeD    46988
storeE    17941
dtype: int64

In [57]:
total=0
for val in df.sum():
    total+=val
print(total)

200495


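**Conclusion: the store with the highest sales over the most recent three months (within the date window selected above)**  
A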
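
The three-month window can also be derived from the data instead of being typed in as string bounds. The sketch below is hedged: it uses only the nine rows visible in the outputs above (2017-12-31 through 2018-02-25), so it lacks the earlier weeks that a true three-month window would include, and its totals are not the full-data answer. The names `recent`, `latest`, `cutoff`, `window` and `totals` are illustrative.

```python
import pandas as pd

# Nine rows copied from the outputs above; earlier weeks are NOT included.
recent = pd.DataFrame({
    "week": ["2017-12-31", "2018-01-07", "2018-01-14", "2018-01-21", "2018-01-28",
             "2018-02-04", "2018-02-11", "2018-02-18", "2018-02-25"],
    "storeA": [11875, 8978, 6963, 5553, 282, 4853, 9202, 3512, 7560],
    "storeB": [1527, 11312, 4014, 3971, 6351, 6503, 3677, 7511, 6904],
    "storeC": [6711, 4158, 4215, 3761, 7759, 4187, 4540, 4151, 3569],
    "storeD": [5265, 5019, 7153, 6255, 5558, 5956, 6186, 5596, 5045],
    "storeE": [1701, 3842, 3097, 3071, 1028, 1458, 243, 3501, 2585],
})

# Parse the strings so that date arithmetic is possible.
recent["week"] = pd.to_datetime(recent["week"])

latest = recent["week"].max()             # most recent week in the data
cutoff = latest - pd.DateOffset(months=3) # three calendar months earlier

# Keep weeks after the cutoff, up to and including the latest week.
window = recent[(recent["week"] > cutoff) & (recent["week"] <= latest)]

totals = window.set_index("week").sum()
print(cutoff.date(), len(window), totals.idxmax())
```

Using `<=` on the upper bound keeps the final week, and assigning the result to a new name leaves the original `df` intact for later cells.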

Reference site:  
https://github.com/nirupamaprv/Data-Analysis/blob/master/conclusions_quiz.ipynb