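## Pandas
Pandas is an open-source Python library for data analysis and manipulation.  
The name comes from “Panel Data”.  
It is built on top of NumPy and provides higher-level data structures and tools for working with them.  
Typical uses: data cleaning, data analysis, and preparing data for visualization.  
Official [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/)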

__Series__: one-dimensional data, similar to a single column (an array with an index).

In [84]:
import pandas as pd
groceries = pd.Series(data = [30, 6, 'Yes', 'No'], index = ['eggs', 'apples', 'milk', 'bread'])
print(groceries)
print('Groceries has shape:', groceries.shape)
print('Groceries has dimension:', groceries.ndim)
print('Groceries has a total of', groceries.size, 'elements')
print('The data in Groceries is:', groceries.values)
print('The index of Groceries is:', groceries.index)
x = 'bananas' in groceries
y = 'bread' in groceries
print('Is bananas an index label in Groceries:', x)
print('Is bread an index label in Groceries:', y)
print('How many eggs do we need to buy:', groceries['eggs'])
print('How many eggs and apples do we need to buy:\n', groceries.loc[['eggs', 'apples']]) # label-based indexing
print('Do we need milk and bread:\n', groceries.iloc[[2, 3]]) # integer position-based indexing

eggs       30
apples      6
milk      Yes
bread      No
dtype: object
Groceries has shape: (4,)
Groceries has dimension: 1
Groceries has a total of 4 elements
The data in Groceries is: [30 6 'Yes' 'No']
The index of Groceries is: Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')
Is bananas an index label in Groceries: False
Is bread an index label in Groceries: True
How many eggs do we need to buy: 30
How many eggs and apples do we need to buy:
 eggs      30
apples     6
dtype: object
Do we need milk and bread:
 milk     Yes
bread     No
dtype: object
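
The cell above uses `.loc` with labels and `.iloc` with positions. One difference that is easy to miss shows up when slicing: a minimal sketch, reusing the same `groceries` Series (the variable names `by_label` and `by_position` are introduced here only for illustration).

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# Label-based slice: both endpoints are included.
by_label = groceries.loc['eggs':'milk']

# Position-based slice: the stop position is excluded, as with a Python list.
by_position = groceries.iloc[0:2]

print(list(by_label.index))     # ['eggs', 'apples', 'milk']
print(list(by_position.index))  # ['eggs', 'apples']
```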


In [85]:
print('Original Grocery List:\n', groceries)
groceries['eggs'] = 2
print('Modified Grocery List:\n', groceries)
print('We remove apples (out of place):\n', groceries.drop('apples'))

Original Grocery List:
 eggs       30
apples      6
milk      Yes
bread      No
dtype: object
Modified Grocery List:
 eggs        2
apples      6
milk      Yes
bread      No
dtype: object
We remove apples (out of place):
 eggs       2
milk     Yes
bread     No
dtype: object
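
"Out of place" in the cell above means that `drop` returns a new Series and leaves the original untouched. A short sketch of that behavior, assuming the modified grocery list from the cell (the name `without_apples` is illustrative):

```python
import pandas as pd

groceries = pd.Series(data=[2, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

without_apples = groceries.drop('apples')  # returns a new Series
print('apples' in groceries)       # True: the original still has the label
print('apples' in without_apples)  # False

# To keep the change, assign the result back (or pass inplace=True).
groceries = groceries.drop('apples')
print(list(groceries.index))       # ['eggs', 'milk', 'bread']
```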


In [86]:
import numpy as np
fruits = pd.Series(data = [10, 6, 3], index = ['apples', 'oranges', 'bananas'])
print('Original grocery list of fruits:\n ', fruits)
print('fruits + 2:\n', fruits + 2) # We add 2 to each item in fruits
print('fruits - 2:\n', fruits - 2) # We subtract 2 from each item in fruits
print('fruits  *2:\n', fruits * 2) # We multiply each item in fruits by 2
print('fruits / 2:\n', fruits / 2) # We divide each item in fruits by 2


Original grocery list of fruits:
  apples     10
oranges     6
bananas     3
dtype: int64
fruits + 2:
 apples     12
oranges     8
bananas     5
dtype: int64
fruits - 2:
 apples     8
oranges    4
bananas    1
dtype: int64
fruits  *2:
 apples     20
oranges    12
bananas     6
dtype: int64
fruits / 2:
 apples     5.0
oranges    3.0
bananas    1.5
dtype: float64


In [87]:
print('EXP(X) = \n', np.exp(fruits))
print('SQRT(X) =\n', np.sqrt(fruits))
print('POW(X,2) =\n',np.power(fruits,2))

EXP(X) = 
 apples     22026.465795
oranges      403.428793
bananas       20.085537
dtype: float64
SQRT(X) =
 apples     3.162278
oranges    2.449490
bananas    1.732051
dtype: float64
POW(X,2) =
 apples     100
oranges     36
bananas      9
dtype: int64
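
The cells above combine a Series with a scalar. When two Series are combined, pandas matches elements by index label rather than by position. A minimal sketch, assuming a second, hypothetical Series named `extra` that is not part of the original notebook:

```python
import pandas as pd

fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])
extra = pd.Series(data=[1, 2], index=['bananas', 'apples'])  # hypothetical second list

# Labels are aligned before adding; 'oranges' has no partner, so it becomes NaN.
total = fruits + extra

# Series.add with fill_value treats a label missing on one side as 0.
total_filled = fruits.add(extra, fill_value=0)

print(total)
print(total_filled)
```

The same alignment rule explains the NaN entries that appear when a DataFrame is built from Series with different indexes.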


In [88]:
items = {'Alice': pd.Series(data = [40, 110, 500, 45], index = ['book', 'glasses', 'bike', 'pants']),
         'Bob': pd.Series(data = [245, 25, 55], index = ['bike', 'pants', 'watch'])}
shopping_carts = pd.DataFrame(items) # two-dimensional tabular data
print('Shopping Carts:\n', shopping_carts)

Shopping Carts:
          Alice    Bob
bike     500.0  245.0
book      40.0    NaN
glasses  110.0    NaN
pants     45.0   25.0
watch      NaN   55.0


In [89]:
data = {'Alice': pd.Series([40, 110, 500, 45]),
        'Bob': pd.Series([245, 25, 55])}
df = pd.DataFrame(data)
print(df)
print('shopping_carts has shape:', shopping_carts.shape)
print('shopping_carts has dimension:', shopping_carts.ndim)
print('shopping_carts has a total of:', shopping_carts.size, 'elements')
print('The data in shopping_carts is:\n', shopping_carts.values)
print('The row index in shopping_carts is:', shopping_carts.index)
print('The column index in shopping_carts is:', shopping_carts.columns)

   Alice    Bob
0     40  245.0
1    110   25.0
2    500   55.0
3     45    NaN
shopping_carts has shape: (5, 2)
shopping_carts has dimension: 2
shopping_carts has a total of: 10 elements
The data in shopping_carts is:
 [[500. 245.]
 [ 40.  nan]
 [110.  nan]
 [ 45.  25.]
 [ nan  55.]]
The row index in shopping_carts is: Index(['bike', 'book', 'glasses', 'pants', 'watch'], dtype='object')
The column index in shopping_carts is: Index(['Alice', 'Bob'], dtype='object')


In [90]:
bob_shopping_cart = pd.DataFrame(items, columns=['Bob'])
print(bob_shopping_cart)
sel_shopping_cart = pd.DataFrame(items, index = ['pants', 'book'])
print(sel_shopping_cart)

       Bob
bike   245
pants   25
watch   55
       Alice   Bob
pants     45  25.0
book      40   NaN


In [91]:
data = {'Floats': [4.5, 8.2, 9.6],
        'Integers': [1, 2, 3]}
df = pd.DataFrame(data, index = ['label 1', 'label 2', 'label 3'])
print(df)

items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]
store_items = pd.DataFrame(items2, index = ['store 1', 'store 2'])
print(store_items)


         Floats  Integers
label 1     4.5         1
label 2     8.2         2
label 3     9.6         3
         bikes  pants  watches  glasses
store 1     20     30       35      NaN
store 2     15      5       10     50.0


In [92]:
print('How many bikes are in each store:\n', store_items[['bikes']])
print('How many bikes and pants are in each store:\n', store_items[['bikes', 'pants']])
print('What items are in Store 1:\n', store_items.loc[['store 1']])
print('How many bikes are in Store 2:', store_items['bikes']['store 2'])

How many bikes are in each store:
          bikes
store 1     20
store 2     15
How many bikes and pants are in each store:
          bikes  pants
store 1     20     30
store 2     15      5
What items are in Store 1:
          bikes  pants  watches  glasses
store 1     20     30       35      NaN
How many bikes are in Store 2: 15
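
The last lookup in the cell above, `store_items['bikes']['store 2']`, selects a column and then a row in two separate steps. A single `.loc[row, column]` call does the same lookup in one step and is also the form that works reliably for assignment. A minimal sketch on the same data (the new value 16 is made up for illustration):

```python
import pandas as pd

items2 = [{'bikes': 20, 'pants': 30, 'watches': 35},
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}]
store_items = pd.DataFrame(items2, index=['store 1', 'store 2'])

bikes_store_2 = store_items.loc['store 2', 'bikes']  # row label, then column label
row_as_series = store_items.loc['store 1']           # single label -> Series
row_as_frame = store_items.loc[['store 1']]          # list of labels -> DataFrame

store_items.loc['store 2', 'bikes'] = 16             # assignment through one .loc call
print(bikes_store_2, type(row_as_series).__name__, type(row_as_frame).__name__)
```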


In [93]:
store_items['shirts'] = [15, 2]
print(store_items)
store_items['suits'] = store_items['pants'] + store_items['shirts']
print(store_items)
new_items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4}]
new_store = pd.DataFrame(new_items, index = ['store 3'])
print(new_store)
store_items = pd.concat([store_items, new_store])
print(store_items)
store_items['new watches'] = store_items['watches'][1:]
print(store_items)
store_items.insert(4, 'shoes', [8, 5, 0])
print(store_items)


         bikes  pants  watches  glasses  shirts
store 1     20     30       35      NaN      15
store 2     15      5       10     50.0       2
         bikes  pants  watches  glasses  shirts  suits
store 1     20     30       35      NaN      15     45
store 2     15      5       10     50.0       2      7
         bikes  pants  watches  glasses
store 3     20     30       35        4
         bikes  pants  watches  glasses  shirts  suits
store 1     20     30       35      NaN    15.0   45.0
store 2     15      5       10     50.0     2.0    7.0
store 3     20     30       35      4.0     NaN    NaN
         bikes  pants  watches  glasses  shirts  suits  new watches
store 1     20     30       35      NaN    15.0   45.0          NaN
store 2     15      5       10     50.0     2.0    7.0         10.0
store 3     20     30       35      4.0     NaN    NaN         35.0
         bikes  pants  watches  glasses  shoes  shirts  suits  new watches
store 1     20     30       35      NaN      8    15.0   45.0          NaN
store 2     15      5       10     50.0      5     2.0    7.0         10.0
store 3     20     30       35      4.0      0     NaN    NaN         35.0

In [94]:
store_items.pop('new watches') # remove a single column (in place; returns the removed column)
print(store_items)
store_items = store_items.drop(['watches', 'shoes'], axis=1) # remove several columns
print(store_items)
store_items = store_items.drop(['store 2', 'store 1'], axis=0) # remove several rows
print(store_items)
store_items = store_items.rename(columns = {'bikes': 'hats'}) # rename a column
print(store_items)
store_items = store_items.rename(index = {'store 3': 'last store'}) # rename a row
print(store_items)
store_items = store_items.set_index('pants') # use the 'pants' column as the new index
print(store_items)

         bikes  pants  watches  glasses  shoes  shirts  suits
store 1     20     30       35      NaN      8    15.0   45.0
store 2     15      5       10     50.0      5     2.0    7.0
store 3     20     30       35      4.0      0     NaN    NaN
         bikes  pants  glasses  shirts  suits
store 1     20     30      NaN    15.0   45.0
store 2     15      5     50.0     2.0    7.0
store 3     20     30      4.0     NaN    NaN
         bikes  pants  glasses  shirts  suits
store 3     20     30      4.0     NaN    NaN
         hats  pants  glasses  shirts  suits
store 3    20     30      4.0     NaN    NaN
            hats  pants  glasses  shirts  suits
last store    20     30      4.0     NaN    NaN
       hats  glasses  shirts  suits
pants                              
30       20      4.0     NaN    NaN
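
In the final output above, `set_index('pants')` moved the `pants` column into the index and the previous row label (`last store`) is gone. A short sketch of that effect and of `reset_index`, which turns the index back into an ordinary column; the one-row frame below is a simplified stand-in for the notebook's data.

```python
import pandas as pd

store_items = pd.DataFrame({'hats': [20], 'pants': [30], 'glasses': [4.0]},
                           index=['last store'])

indexed = store_items.set_index('pants')  # 'pants' becomes the index
restored = indexed.reset_index()          # 'pants' is a column again

print(indexed)
print(restored)  # the old label 'last store' is not recovered
```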


In [95]:
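# Handling NaN values
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 15, 'shoes': 8, 'suits': 45},
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
          {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]

store_items = pd.DataFrame(items2, index = ['store 1', 'store 2', 'store 3'])
print(store_items)
x = store_items.isnull().sum().sum()
print('Number of NaN values in our DataFrame:', x)
store_items.isnull() # Boolean DataFrame marking which entries are NaN
store_items.isnull().sum() # number of NaN values in each column
store_items.isnull().sum().sum() # total number of NaN values
store_items = store_items.fillna(0) # replace NaN with 0
print(store_items)
# store_items no longer contains NaN after fillna(0), so the calls below return it unchanged here;
# they show the other options for a DataFrame that still has NaN values.
store_items.dropna(axis = 0) # drop rows that contain NaN
store_items.dropna(axis = 1) # drop columns that contain NaN
store_items.ffill(axis=0) # fill NaN with the previous value in the column
store_items.bfill(axis=0) # fill NaN with the next value in the column
store_items.interpolate(method='linear', axis=0) # fill NaN by linear interpolation along each column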

         bikes  pants  watches  shirts  shoes  suits  glasses
store 1     20     30       35    15.0      8   45.0      NaN
store 2     15      5       10     2.0      5    7.0     50.0
store 3     20     30       35     NaN     10    NaN      4.0
Number of NaN values in our DataFrame: 3
         bikes  pants  watches  shirts  shoes  suits  glasses
store 1     20     30       35    15.0      8   45.0      0.0
store 2     15      5       10     2.0      5    7.0     50.0
store 3     20     30       35     0.0     10    0.0      4.0


         bikes  pants  watches  shirts  shoes  suits  glasses
store 1     20     30       35    15.0      8   45.0      0.0
store 2     15      5       10     2.0      5    7.0     50.0
store 3     20     30       35     0.0     10    0.0      4.0
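
In the cell above, the result of `fillna(0)` is assigned back before the other methods are called, so `dropna`, `ffill`, `bfill` and `interpolate` have no NaN left to act on. The sketch below applies each of them to the unfilled data instead, so that their effects can be compared. The variable names (`raw`, `rows_kept`, and so on) are introduced here for illustration.

```python
import pandas as pd

items2 = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 15, 'shoes': 8, 'suits': 45},
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
          {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
raw = pd.DataFrame(items2, index=['store 1', 'store 2', 'store 3'])

rows_kept = raw.dropna(axis=0)   # only rows without any NaN remain
cols_kept = raw.dropna(axis=1)   # only columns without any NaN remain
forward = raw.ffill(axis=0)      # copy the previous row's value downward
backward = raw.bfill(axis=0)     # copy the next row's value upward
linear = raw.interpolate(method='linear', axis=0)  # interpolate along each column

for name, frame in [('dropna rows', rows_kept), ('dropna cols', cols_kept),
                    ('ffill', forward), ('bfill', backward), ('interpolate', linear)]:
    print(name, '\n', frame)
```

Note that `ffill` cannot fill a NaN in the first row and `bfill` cannot fill one in the last row, because there is no neighboring value in that direction.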


In [None]:
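google_stock = pd.read_csv('./GOOG.csv')
google_stock.head() # show the first 5 rows
google_stock.tail() # show the last 5 rows
google_stock.describe() # summary statistics for the numeric columns
google_stock.isnull().any() # Boolean Series: whether each column contains any NaN
google_stock.max() # maximum of each column
google_stock.min() # minimum of each column
google_stock.mean(numeric_only=True) # mean of each numeric column (skips text columns such as 'Date')
google_stock.corr(numeric_only=True) # pairwise correlation between the numeric columns
google_stock.groupby('Date').mean() # group by date and compute the mean of each group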
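
The cell above depends on a local `GOOG.csv` file. The sketch below runs the same kind of calls on a tiny in-memory CSV so it can be tried without that file. The column names and all values are made up for illustration and are not taken from the real data set; `parse_dates` is used here so that `Date` becomes a datetime column that can be grouped by year.

```python
import io
import pandas as pd

# Hypothetical stand-in for GOOG.csv; values are invented for the example.
csv_text = """Date,Open,Close
2020-01-01,10.0,11.0
2020-01-02,12.0,13.0
2021-01-01,20.0,21.0
"""

stock = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

numeric_means = stock.mean(numeric_only=True)  # Date is skipped
correlations = stock.corr(numeric_only=True)   # numeric columns only
yearly_mean = stock.groupby(stock['Date'].dt.year)['Close'].mean()

print(stock.dtypes)
print(numeric_means)
print(correlations)
print(yearly_mean)
```

Grouping by year rather than by the raw `Date` value is a design choice for this sketch: if every date appears only once, grouping by `Date` returns each row unchanged.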