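| Dimensions | Name      | Description                                                                   |
|------------|-----------|-------------------------------------------------------------------------------|
| 1          | Series    | One-dimensional, labeled, homogeneously typed array                           |
| 2          | DataFrame | Two-dimensional, labeled, size-mutable table with heterogeneously typed columns |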

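All pandas data structures are value-mutable, but not all of them are size-mutable. For example, the length of a Series cannot be changed, whereas columns can be inserted into a DataFrame.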
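A minimal sketch of the two kinds of mutability, using small illustrative objects that are not part of the original notebook: a value is overwritten in place, and a column is inserted into a DataFrame.

```python
import pandas as pd

s = pd.Series([10, 20, 30])
s[1] = 99                  # value-mutable: overwrite the element labeled 1
print(s.tolist())          # [10, 99, 30]

df = pd.DataFrame({"A": [1, 2, 3]})
df["B"] = df["A"] * 2      # size-mutable: inserting a column changes the shape
print(df.shape)            # (3, 2)
```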

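In pandas, the vast majority of methods do not modify their input; they copy the data and return a new object. As a general rule, leaving the original input **unchanged** is the safer choice.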
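As a small illustration, `sort_values()` is one such method: it returns a new Series and leaves the one it was called on untouched. The variable names here are only illustrative.

```python
import pandas as pd

s = pd.Series([3, 1, 2])
sorted_s = s.sort_values()     # returns a new, sorted Series

print(s.tolist())              # [3, 1, 2]  (the original is unchanged)
print(sorted_s.tolist())       # [1, 2, 3]
print(sorted_s is s)           # False
```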

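 ***The essential difference between pandas and NumPy***:
 - A NumPy array has a single data type for the whole array
 - A DataFrame has one data type **per column**, so different columns **can hold different types**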
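The difference can be checked directly. In this sketch (the array and column names are made up for illustration), NumPy upcasts mixed numeric input to one dtype, while the DataFrame reports a separate dtype for each column.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2.5, 3])    # mixed int/float input is upcast to a single dtype
print(arr.dtype)               # float64

df = pd.DataFrame({"n": [1, 2], "x": [0.5, 1.5], "s": ["a", "b"]})
print(df.dtypes)               # one dtype per column
```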

# I. Creating objects

In [9]:
# When a Series is created from a list of values, pandas generates a default integer index:
import numpy as np
import pandas as pd
s = pd.Series([1,2,3,np.nan,4,5,6])
s

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    5.0
6    6.0
dtype: float64
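The output above is `float64` even though most inputs are integers, because `np.nan` is a floating-point value and the whole Series is upcast to hold it. A short sketch comparing the two cases, with illustrative names:

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
with_nan = pd.Series([1, 2, 3, np.nan])

print(ints.dtype)              # an integer dtype
print(with_nan.dtype)          # float64
print(with_nan.isna().sum())   # 1
```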

In [35]:
# Create a DataFrame from a NumPy array, with a datetime index and labeled columns:
dates = pd.date_range('20210827', periods=6)
print(dates)

# randn draws samples from the standard normal distribution
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df

DatetimeIndex(['2021-08-27', '2021-08-28', '2021-08-29', '2021-08-30',
               '2021-08-31', '2021-09-01'],
              dtype='datetime64[ns]', freq='D')


|            | A         | B         | C         | D         |
|------------|-----------|-----------|-----------|-----------|
| 2021-08-27 | 1.030453  | -0.53543  | -0.791386 | 0.325403  |
| 2021-08-28 | 0.736789  | -1.47653  | 0.21508   | -0.534774 |
| 2021-08-29 | 0.84948   | -2.095524 | -0.316863 | 0.72336   |
| 2021-08-30 | 1.396196  | -1.118973 | 0.56767   | -0.08471  |
| 2021-08-31 | -0.141668 | 2.476473  | 0.710798  | -1.412875 |
| 2021-09-01 | -0.011199 | 1.489545  | 0.735018  | 1.027801  |


In [26]:
# Create a DataFrame from a dict of Series-like objects:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
print(df2)
df2.dtypes

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo


A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

# II. Viewing data

## 1. head and tail
head() and tail() give a quick preview of a Series or DataFrame. They show 5 rows by default, and you can pass a number to show a different count.

In [44]:
df.head(2)

|            | A        | B        | C         | D         |
|------------|----------|----------|-----------|-----------|
| 2021-08-27 | 1.030453 | -0.53543 | -0.791386 | 0.325403  |
| 2021-08-28 | 0.736789 | -1.47653 | 0.21508   | -0.534774 |


In [37]:
df.tail()

|            | A         | B         | C         | D         |
|------------|-----------|-----------|-----------|-----------|
| 2021-08-28 | 0.736789  | -1.47653  | 0.21508   | -0.534774 |
| 2021-08-29 | 0.84948   | -2.095524 | -0.316863 | 0.72336   |
| 2021-08-30 | 1.396196  | -1.118973 | 0.56767   | -0.08471  |
| 2021-08-31 | -0.141668 | 2.476473  | 0.710798  | -1.412875 |
| 2021-09-01 | -0.011199 | 1.489545  | 0.735018  | 1.027801  |
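The same two methods work on a Series. A minimal sketch with a throwaway Series of ten integers (not part of the original notebook) shows the default of 5 rows and an explicit count:

```python
import pandas as pd

s = pd.Series(range(10))

print(s.head().tolist())     # [0, 1, 2, 3, 4]  (default: first 5)
print(s.tail(3).tolist())    # [7, 8, 9]        (explicit count: last 3)
```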


In [39]:
df.index

DatetimeIndex(['2021-08-27', '2021-08-28', '2021-08-29', '2021-08-30',
               '2021-08-31', '2021-09-01'],
              dtype='datetime64[ns]', freq='D')

In [42]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

**DataFrame.to_numpy() returns the underlying data as a NumPy array**
- The output of DataFrame.to_numpy() does not include the row index or the column labels

In [54]:
df2.to_numpy()

array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)
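The `dtype=object` in the output above follows from `df2` having columns of different types: a NumPy array has one dtype, so pandas falls back to a type that can hold every value. A small sketch with two illustrative frames contrasts this with the all-float case:

```python
import numpy as np
import pandas as pd

homogeneous = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
mixed = pd.DataFrame({"A": [1.0, 2.0], "B": ["x", "y"]})

print(homogeneous.to_numpy().dtype)   # float64
print(mixed.to_numpy().dtype)         # object
print(homogeneous.to_numpy().shape)   # (2, 2): values only, no index or labels
```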

In [57]:
df.describe()

|       | A         | B         | C         | D         |
|-------|-----------|-----------|-----------|-----------|
| count | 6.0       | 6.0       | 6.0       | 6.0       |
| mean  | 0.643342  | -0.210073 | 0.18672   | 0.007368  |
| std   | 0.602085  | 1.799835  | 0.621531  | 0.891578  |
| min   | -0.141668 | -2.095524 | -0.791386 | -1.412875 |
| 25%   | 0.175798  | -1.38714  | -0.183877 | -0.422258 |
| 50%   | 0.793134  | -0.827201 | 0.391375  | 0.120347  |
| 75%   | 0.98521   | 0.983301  | 0.675016  | 0.623871  |
| max   | 1.396196  | 2.476473  | 0.735018  | 1.027801  |
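For numeric columns, `describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum. The sketch below uses a small made-up column so that each row of the summary can be compared with the corresponding individual method.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]})
summary = df.describe()

print(summary.loc["mean", "A"])    # 2.5
print(summary.loc["50%", "A"])     # 2.5, the median
# The "std" row is the sample standard deviation, as returned by Series.std()
print(abs(summary.loc["std", "A"] - df["A"].std()) < 1e-12)   # True
```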
