### Pivot_Table - Quick Data Analysis with Pandas

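Why - Load sales data, run a pivot analysis on it, and write the pivot result back to an Excel file. The goal of this article is to give you a basic understanding of some interactive Python tools and of how you can use them to do fairly complex analysis in a fast and repeatable way. I plan to spend more time on examples like this one to show how useful this toolset is, and to keep reminding people that when it comes to complex data analysis there are better options than Excel.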

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load the sample sales data from CSV
dt = pd.read_csv("data/df-sample-sales.csv")
dt.head()

Unnamed: 0,Account Number,Account Name,sku,category,quantity,unit price,ext price,date
0,803666,Fritsch-Glover,HX-24728,Belt,1,98.98,98.98,2014-09-28 11:56:02
1,64898,O'Conner Inc,LK-02338,Shirt,9,34.8,313.2,2014-04-24 16:51:22
2,423621,Beatty and Sons,ZC-07383,Shirt,12,60.24,722.88,2014-09-17 17:26:22
3,137865,"Gleason, Bogisich and Franecki",QS-76400,Shirt,5,15.25,76.25,2014-01-30 07:34:02
4,435433,Morissette-Heathcote,RU-25060,Shirt,19,51.83,984.77,2014-08-24 06:18:12


In [8]:
dt.describe()

Unnamed: 0,Account Number,quantity,unit price,ext price
count,1000.0,1000.0,1000.0,1000.0
mean,480941.809,10.565,54.06643,570.17994
std,291330.331287,5.887311,26.068011,443.949007
min,510.0,1.0,10.01,11.13
25%,217002.75,5.0,31.1875,203.765
50%,461305.0,11.0,53.24,456.34
75%,734587.0,16.0,75.1,849.1075
max,998940.0,20.0,100.0,1958.6


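The describe command alone already tells us some quite useful things:

- Customers bought about 10.57 items per transaction on average (the mean of quantity).
- The average cost of a transaction was about 570.18 US dollars (the mean of ext price).
- The min and max rows show the range of each column, for example 1 to 20 items and 11.13 to 1958.60 dollars per transaction.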
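
As a minimal sketch of how individual figures can be read out of the describe result, the snippet below rebuilds the five rows shown by head() above as a small frame (the name toy is made up for this illustration) and looks up single statistics with .loc.

```python
import pandas as pd

# Toy frame holding the five rows printed by dt.head() above;
# the variable name `toy` is hypothetical.
toy = pd.DataFrame({
    "quantity": [1, 9, 12, 5, 19],
    "ext price": [98.98, 313.20, 722.88, 76.25, 984.77],
})

stats = toy.describe()

# describe() returns a DataFrame indexed by statistic name,
# so a single value is addressed as .loc[statistic, column].
mean_quantity = stats.loc["mean", "quantity"]
price_range = stats.loc["max", "ext price"] - stats.loc["min", "ext price"]

print(f"mean quantity: {mean_quantity:.2f}")
print(f"ext price range: {price_range:.2f}")
```

The same lookups work on dt.describe() for the full sales file.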

In [3]:
report = dt.pivot_table(index=['Account Name'],  # rows
                        columns=['category'],    # columns
                        values=['quantity'],     # values to aggregate
                        fill_value=0,            # show 0 instead of NaN
                        aggfunc='sum')           # aggregate by sum (np.sum triggers a FutureWarning in recent pandas)
report.head(10)

Unnamed: 0_level_0,quantity,quantity,quantity
category,Belt,Shirt,Shoes
Account Name,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
Abbott PLC,0,0,19
"Abbott, Rogahn and Bednar",0,18,0
Abshire LLC,0,18,2
"Altenwerth, Stokes and Paucek",0,13,0
Ankunding-McCullough,0,2,0
"Armstrong, Champlin and Ratke",7,36,0
"Armstrong, McKenzie and Greenholt",0,0,4
Armstrong-Williamson,19,0,0
Aufderhar and Sons,0,0,2
Aufderhar-O'Hara,0,0,11


Passing fill_value=0 puts a 0 in every cell that has no sales instead of NaN, which looks much cleaner. We will do one more thing with this example to show some of the power of pivot_table. Let’s see how much in sales we did as well:

In [5]:
report = dt.pivot_table(index=['Account Name'],
                        columns=['category'],
                        values=['ext price', 'quantity'],
                        fill_value=0,
                        aggfunc='sum')
report.head()

Unnamed: 0_level_0,ext price,ext price,ext price,quantity,quantity,quantity
category,Belt,Shirt,Shoes,Belt,Shirt,Shoes
Account Name,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
Abbott PLC,0.0,0.0,755.44,0,0,19
"Abbott, Rogahn and Bednar",0.0,615.6,0.0,0,18,0
Abshire LLC,0.0,720.18,90.34,0,18,2
"Altenwerth, Stokes and Paucek",0.0,843.31,0.0,0,13,0
Ankunding-McCullough,0.0,132.3,0.0,0,2,0
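
To make it clearer what pivot_table does in the cells above, here is a small sketch on made-up data (the account names A and B and all numbers are hypothetical). It builds the same table twice: once with pivot_table, and once by hand with groupby, sum and unstack.

```python
import pandas as pd

# Hypothetical toy data; names and numbers are made up.
toy = pd.DataFrame({
    "Account Name": ["A", "A", "B", "B", "B"],
    "category": ["Belt", "Shirt", "Shirt", "Shirt", "Shoes"],
    "quantity": [1, 2, 3, 4, 5],
})

pivoted = toy.pivot_table(index="Account Name",
                          columns="category",
                          values="quantity",
                          fill_value=0,
                          aggfunc="sum")

# The same table step by step: group by both keys, sum,
# then move the category level from the rows to the columns.
manual = (toy.groupby(["Account Name", "category"])["quantity"]
             .sum()
             .unstack(fill_value=0))

print(pivoted)
print(manual)

# Compare cell values only; dtypes may differ between pandas versions.
same_values = bool((pivoted.to_numpy() == manual.to_numpy()).all())
print(same_values)
```

Account A has no Shoes rows, so that cell is filled with 0 in both tables.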


If we want, we can even output this to Excel. pivot_table already returns a DataFrame, so we can write it out directly with to_excel:

In [6]:
report.to_excel('data/df-pivot_table.xlsx', sheet_name='Sheet1')
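
One thing to be aware of: the report has a two-level column header, which to_excel writes as more than one header row. If a single header row is preferred, one option is to flatten the columns before exporting. This is an assumption about what a reader might want, not something the notebook above does. The sketch below uses made-up data, and the output path in the last comment is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; names and numbers are made up.
toy = pd.DataFrame({
    "Account Name": ["A", "A", "B"],
    "category": ["Belt", "Shirt", "Shirt"],
    "quantity": [1, 2, 3],
    "ext price": [10.0, 40.0, 90.0],
})

report = toy.pivot_table(index=["Account Name"],
                         columns=["category"],
                         values=["ext price", "quantity"],
                         fill_value=0,
                         aggfunc="sum")

# Each column label is a tuple such as ("quantity", "Shirt");
# join the two parts into a single string label.
flat = report.copy()
flat.columns = [f"{value} - {category}" for value, category in report.columns]

print(flat.columns.tolist())

# Writing .xlsx needs an Excel engine such as openpyxl to be installed.
# flat.to_excel("flat-report.xlsx", sheet_name="Sheet1")  # hypothetical path
```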