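# Pandas features
1. Lets you assign labels to rows and columns;
2. Can compute rolling statistics on time-series data;
3. Handles NaN values easily;
4. Can load data in different formats into a DataFrame;
5. Can merge different datasets together;
6. Integrates with NumPy and Matplotlib.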

In [1]:
import pandas as pd  # import pandas

In [2]:
# Create a Pandas Series
groceries = pd.Series(data = [30,6,'Yes','No'], index = ['eggs','apples','milk','bread'])
groceries


eggs       30
apples      6
milk      Yes
bread      No
dtype: object

In [3]:
print('Groceries has shape:', groceries.shape)
print('Groceries has dimension:', groceries.ndim)
print('Groceries has a total of', groceries.size, 'elements')

Groceries has shape: (4,)
Groceries has dimension: 1
Groceries has a total of 4 elements


In [4]:
# Print the data and the index labels of the Pandas Series separately
print('The data in Groceries is:', groceries.values)
print('The index of Groceries is:', groceries.index)

The data in Groceries is: [30 6 'Yes' 'No']
The index of Groceries is: Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')


### If you are working with a very large Pandas Series and are not sure whether a given index label exists, you can check for it with the in operator:

In [5]:
# We check whether bananas is a food item (an index) in Groceries
x = 'bananas' in groceries
# We check whether bread is a food item (an index) in Groceries
y = 'bread' in groceries
# We print the results
print('Is bananas an index label in Groceries:', x)
print('Is bread an index label in Groceries:', y)

Is bananas an index label in Groceries: False
Is bread an index label in Groceries: True
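
Note that in tests the index labels, not the stored values. A minimal sketch, reusing the groceries data from above; the variable names `label_found`, `value_as_label`, and `value_found` are only illustrative, and `isin` is one of several ways to search the values:

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# `in` looks at the index labels only
label_found = 'milk' in groceries       # 'milk' is an index label
value_as_label = 'Yes' in groceries     # 'Yes' is a value, not a label

# To search the stored values instead, use isin() and reduce with any()
value_found = bool(groceries.isin(['Yes']).any())

print('milk is a label:', label_found)
print('Yes is a label:', value_as_label)
print('Yes is a value:', value_found)
```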


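### A Pandas Series can be indexed either by label or by integer position, and the attributes .loc and .iloc make explicit which one is meant. .loc stands for location and signals that we are using a label index; .iloc stands for integer location and signals that we are using a numeric (positional) index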

In [6]:
print('How many eggs and apples do we need to buy:\n',groceries.loc[['eggs','apples']])

How many eggs and apples do we need to buy:
 eggs      30
apples     6
dtype: object


In [7]:
# we use iloc to access multiple numerical indices
print('Do we need milk and bread:\n', groceries.iloc[[2,3]]) 

Do we need milk and bread:
 milk     Yes
bread     No
dtype: object
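
The distinction matters most when the index labels are themselves integers. The sketch below uses a hypothetical Series `s` (not part of the notebook above) whose labels are deliberately out of positional order, so that .loc and .iloc return different elements for the same number:

```python
import pandas as pd

# Hypothetical Series whose index labels are integers in shuffled order
s = pd.Series(data=['a', 'b', 'c'], index=[2, 0, 1])

by_label = s.loc[0]      # the element whose label is 0
by_position = s.iloc[0]  # the element stored in the first position

print('s.loc[0]  =', by_label)
print('s.iloc[0] =', by_position)
```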


In [8]:
# Change an element of the Pandas Series
groceries['eggs'] = 2  # modifies the Series in place
groceries


eggs        2
apples      6
milk      Yes
bread      No
dtype: object

### You can remove entries from a Pandas Series with the .drop() method

In [9]:
# drop() returns a new Series and does not modify the original
groceries.drop('apples')
groceries

eggs        2
apples      6
milk      Yes
bread      No
dtype: object

In [10]:
# With inplace=True, the element is removed from the original Series
groceries.drop('apples',inplace=True)
groceries

eggs       2
milk     Yes
bread     No
dtype: object
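
Because .drop() returns a new Series by default, another way to keep the result is to bind it to a name instead of passing inplace=True. A minimal sketch, assuming the groceries data as it stood before the drop; the name `without_apples` is only illustrative:

```python
import pandas as pd

groceries = pd.Series(data=[2, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# drop() leaves groceries untouched and returns a new Series
without_apples = groceries.drop('apples')

print('Labels after drop:', list(without_apples.index))
print('Original still has apples:', 'apples' in groceries)
```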

### Pandas arithmetic operations

In [11]:
fruits = pd.Series(data = [10,6,3], index = ['apples','oranges','bananas'])
fruits

apples     10
oranges     6
bananas     3
dtype: int64

In [12]:
print('fruits + 2:\n', fruits + 2) # We add 2 to each item in fruits

fruits + 2:
 apples     12
oranges     8
bananas     5
dtype: int64


In [13]:
print('fruits * 2:\n', fruits * 2) # We multiply each item in fruits by 2 

fruits * 2:
 apples     20
oranges    12
bananas     6
dtype: int64


In [14]:
import numpy as np
# print('EXP(X) = \n', np.exp(fruits))
# print() 
# print('SQRT(X) =\n', np.sqrt(fruits))
# print()
# print('POW(X,2) =\n',np.power(fruits,2)) # We raise all elements of fruits to the power of 2
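
The commented-out lines above apply NumPy functions to the Series. A minimal sketch of what they do when enabled, assuming the same fruits data; the names `exps`, `roots`, and `squared` are only illustrative. Each function is applied element by element and the result keeps the index labels:

```python
import numpy as np
import pandas as pd

fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

exps = np.exp(fruits)          # e raised to each element
roots = np.sqrt(fruits)        # square root of each element
squared = np.power(fruits, 2)  # each element raised to the power of 2

print('EXP(X) =\n', exps)
print('SQRT(X) =\n', roots)
print('POW(X,2) =\n', squared)
```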

In [15]:
# We add 2 only to the bananas (operating on a single element)
print('Amount of bananas + 2 = ', fruits['bananas'] + 2)

Amount of bananas + 2 =  5


In [16]:
print('Amount of apples - 2 = ', fruits.iloc[0] - 2)

Amount of apples - 2 =  8


In [17]:
print('We double the amount of apples and oranges:\n', fruits[['apples', 'oranges']] * 2)

We double the amount of apples and oranges:
 apples     20
oranges    12
dtype: int64


In [18]:
print('We halve the amount of apples and oranges:\n', fruits.loc[['apples', 'oranges']] / 2)

We halve the amount of apples and oranges:
 apples     5.0
oranges    3.0
dtype: float64
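
These operations work because fruits holds only numbers. On a Series with mixed types, such as the groceries Series created earlier, an operation is applied to every element and succeeds only if it is defined for all of them. A minimal sketch under that assumption; the names `doubled` and `division_failed` are only illustrative:

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# Multiplication is defined for both int and str, so it succeeds,
# but strings are repeated rather than doubled numerically
doubled = groceries * 2
print(doubled)

# Division is not defined for str, so the whole operation fails
try:
    groceries / 2
    division_failed = False
except TypeError:
    division_failed = True
print('Division raised TypeError:', division_failed)
```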


### Pandas DataFrames: two-dimensional data structures with labeled rows and columns that can hold many types of data

In [20]:
# We create a dictionary of Pandas Series 
items = {'Bob':pd.Series(data=[245,25,55],index=['bike','pants','watch']),
        'Alice':pd.Series(data=[10,110,500,45],index=['book','glasses','bike','pants'])}
print(type(items))

<class 'dict'>


In [22]:
shopping_carts = pd.DataFrame(items)
shopping_carts

         Alice    Bob
bike     500.0  245.0
book      10.0    NaN
glasses  110.0    NaN
pants     45.0   25.0
watch      NaN   55.0


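### The row labels of the DataFrame are built from the union of the index labels of the two Pandas Series used in the dictionary; the column labels come from the dictionary keys. Where a Series has no entry for a row label, the DataFrame holds NaN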
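
The labels can be inspected directly through the .index and .columns attributes. A minimal sketch that rebuilds the same shopping_carts DataFrame; the name `missing_per_column` is only illustrative, and the order of the columns may differ between pandas versions:

```python
import pandas as pd

items = {'Bob': pd.Series(data=[245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series(data=[10, 110, 500, 45],
                            index=['book', 'glasses', 'bike', 'pants'])}
shopping_carts = pd.DataFrame(items)

# Row labels: every index label that appears in either Series
print('Row labels:', list(shopping_carts.index))

# Column labels: the dictionary keys
print('Column labels:', list(shopping_carts.columns))

# Count the NaN entries in each column (items that person did not buy)
missing_per_column = shopping_carts.isna().sum()
print(missing_per_column)
```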