# Viewing Data

First, create a 6x4 DataFrame to work with:

In [41]:
import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=6, freq='D')
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [42]:
df = pd.DataFrame(np.random.rand(6, 4), index=dates, columns=list('ABCD'))
df

|  | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-01 | 0.059858 | 0.988169 | 0.671358 | 0.757114 |
| 2013-01-02 | 0.235234 | 0.393781 | 0.616279 | 0.572503 |
| 2013-01-03 | 0.733239 | 0.147542 | 0.418538 | 0.0213 |
| 2013-01-04 | 0.272862 | 0.988362 | 0.889549 | 0.001054 |
| 2013-01-05 | 0.742841 | 0.853958 | 0.758803 | 0.896411 |
| 2013-01-06 | 0.09574 | 0.021197 | 0.790819 | 0.866529 |

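Because `np.random.rand` draws fresh values on every run, the numbers shown here will differ from what you get locally. A minimal sketch of a reproducible variant follows, assuming NumPy's `np.random.default_rng` generator and an arbitrary seed of 0; neither is part of the original example, and the names `df_seeded` and `df_again` are hypothetical.

```python
import numpy as np
import pandas as pd

# Seeded generator; the seed value 0 is an arbitrary choice for illustration.
rng = np.random.default_rng(0)

dates = pd.date_range('20130101', periods=6, freq='D')

# rng.random((6, 4)) draws uniform floats in [0, 1), like np.random.rand(6, 4).
df_seeded = pd.DataFrame(rng.random((6, 4)), index=dates, columns=list('ABCD'))

# Re-creating the generator with the same seed reproduces the same values.
df_again = pd.DataFrame(np.random.default_rng(0).random((6, 4)),
                        index=dates, columns=list('ABCD'))
print(df_seeded.equals(df_again))
```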

Use head and tail to view the first and last rows of the data (5 rows by default):

In [43]:
df.head()

|  | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-01 | 0.059858 | 0.988169 | 0.671358 | 0.757114 |
| 2013-01-02 | 0.235234 | 0.393781 | 0.616279 | 0.572503 |
| 2013-01-03 | 0.733239 | 0.147542 | 0.418538 | 0.0213 |
| 2013-01-04 | 0.272862 | 0.988362 | 0.889549 | 0.001054 |
| 2013-01-05 | 0.742841 | 0.853958 | 0.758803 | 0.896411 |


In [44]:
df.tail(3)

|  | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-04 | 0.272862 | 0.988362 | 0.889549 | 0.001054 |
| 2013-01-05 | 0.742841 | 0.853958 | 0.758803 | 0.896411 |
| 2013-01-06 | 0.09574 | 0.021197 | 0.790819 | 0.866529 |

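The row count passed to `head` and `tail` can be seen more easily on a small frame with known values. The sketch below uses a hypothetical ten-row frame named `small`; it also shows that a negative count keeps everything except the last rows.

```python
import pandas as pd

# Hypothetical frame with the values 0..9 in a single column.
small = pd.DataFrame({'x': range(10)})

first_five = small.head()          # no argument: the first 5 rows
last_three = small.tail(3)         # the last 3 rows
all_but_last_two = small.head(-2)  # negative n: all rows except the last 2

print(first_five['x'].tolist())
print(last_three['x'].tolist())
print(all_but_last_two['x'].tolist())
```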

Display the index and the columns:

In [45]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [46]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

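to_numpy returns the data as a NumPy array, without the index or column labels. This can be expensive when the columns have different dtypes: a NumPy array has a single dtype for the whole array, whereas a DataFrame has one dtype per column. to_numpy must therefore find a dtype that can hold every column, which may end up being object and forces every value to be cast to a Python object. For a DataFrame whose columns are all floating-point, such as df here, no such cast is needed and the conversion is fast.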

In [47]:
df.to_numpy()

array([[0.05985765, 0.98816874, 0.67135756, 0.75711403],
       [0.23523382, 0.39378121, 0.61627853, 0.57250335],
       [0.73323856, 0.1475421 , 0.41853843, 0.02130019],
       [0.27286184, 0.98836238, 0.88954877, 0.00105375],
       [0.74284053, 0.85395797, 0.75880332, 0.89641117],
       [0.09574047, 0.02119682, 0.79081875, 0.86652943]])

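To illustrate the dtype point, the sketch below compares the result dtype for an all-float frame and for a frame that mixes floats with strings. The frames `floats` and `mixed` are hypothetical examples, not part of the original data.

```python
import numpy as np
import pandas as pd

# All columns share one dtype, so the array keeps it.
floats = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})

# Floats and strings have no common numeric dtype, so the array falls back to object.
mixed = pd.DataFrame({'a': [1.0, 2.0], 'b': ['x', 'y']})

print(floats.to_numpy().dtype)  # float64
print(mixed.to_numpy().dtype)   # object
```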
describe() shows a quick statistical summary of each column:

In [48]:
df.describe()

|  | A | B | C | D |
| --- | --- | --- | --- | --- |
| count | 6.0 | 6.0 | 6.0 | 6.0 |
| mean | 0.356629 | 0.565502 | 0.690891 | 0.519152 |
| std | 0.30623 | 0.433846 | 0.163821 | 0.409573 |
| min | 0.059858 | 0.021197 | 0.418538 | 0.001054 |
| 25% | 0.130614 | 0.209102 | 0.630048 | 0.159101 |
| 50% | 0.254048 | 0.62387 | 0.71508 | 0.664809 |
| 75% | 0.618144 | 0.954616 | 0.782815 | 0.839176 |
| max | 0.742841 | 0.988362 | 0.889549 | 0.896411 |

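Each row of the `describe()` output corresponds to an ordinary column statistic. The sketch below checks this on a hypothetical one-column frame named `values`, and requests extra percentiles through the `percentiles` parameter.

```python
import pandas as pd

# Hypothetical data chosen so the statistics are easy to verify by hand.
values = pd.DataFrame({'v': [1.0, 2.0, 3.0, 4.0]})
summary = values.describe()

# Each summary row matches the corresponding Series method.
print(summary.loc['mean', 'v'] == values['v'].mean())
print(summary.loc['std', 'v'] == values['v'].std())          # sample std (ddof=1)
print(summary.loc['25%', 'v'] == values['v'].quantile(0.25))

# Ask for the 10th and 90th percentiles as well.
wide_summary = values.describe(percentiles=[0.1, 0.9])
print(wide_summary.index.tolist())
```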

Use df.T to transpose the data, swapping rows and columns:

In [49]:
df.T

|  | 2013-01-01 | 2013-01-02 | 2013-01-03 | 2013-01-04 | 2013-01-05 | 2013-01-06 |
| --- | --- | --- | --- | --- | --- | --- |
| A | 0.059858 | 0.235234 | 0.733239 | 0.272862 | 0.742841 | 0.09574 |
| B | 0.988169 | 0.393781 | 0.147542 | 0.988362 | 0.853958 | 0.021197 |
| C | 0.671358 | 0.616279 | 0.418538 | 0.889549 | 0.758803 | 0.790819 |
| D | 0.757114 | 0.572503 | 0.0213 | 0.001054 | 0.896411 | 0.866529 |

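Transposing swaps the index and the column labels. When the original columns have different dtypes, each transposed column has to hold a mix of values, so it becomes object. A small sketch with a hypothetical frame named `mixed`:

```python
import pandas as pd

# Hypothetical frame with one integer column and one string column.
mixed = pd.DataFrame({'n': [1, 2], 's': ['a', 'b']}, index=['r1', 'r2'])
transposed = mixed.T

print(transposed.index.tolist())    # former column labels
print(transposed.columns.tolist())  # former index labels
print(transposed.dtypes.tolist())   # every column now mixes ints and strings
```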

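Sort by axis labels with sort_index: axis=0 sorts the rows by their index labels, and axis=1 sorts the columns by their column names: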

In [50]:
df.sort_index(axis=1, ascending=False)

|  | D | C | B | A |
| --- | --- | --- | --- | --- |
| 2013-01-01 | 0.757114 | 0.671358 | 0.988169 | 0.059858 |
| 2013-01-02 | 0.572503 | 0.616279 | 0.393781 | 0.235234 |
| 2013-01-03 | 0.0213 | 0.418538 | 0.147542 | 0.733239 |
| 2013-01-04 | 0.001054 | 0.889549 | 0.988362 | 0.272862 |
| 2013-01-05 | 0.896411 | 0.758803 | 0.853958 | 0.742841 |
| 2013-01-06 | 0.866529 | 0.790819 | 0.021197 | 0.09574 |

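The two axis values are easier to tell apart on a frame whose rows and columns both start out unsorted. The frame `frame` below is a hypothetical example; note that `sort_index` returns a new object and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical frame: index labels and column names are both out of order.
frame = pd.DataFrame({'b': [1, 2, 3], 'a': [4, 5, 6]}, index=[2, 0, 1])

by_rows = frame.sort_index(axis=0)  # reorder rows by index label
by_cols = frame.sort_index(axis=1)  # reorder columns by column name

print(by_rows.index.tolist())
print(by_cols.columns.tolist())
print(frame.index.tolist())         # the original is not modified
```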

Sort by values with sort_values, choosing the column or columns to sort by:

In [51]:
df.sort_values(by=['C'])

|  | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-03 | 0.733239 | 0.147542 | 0.418538 | 0.0213 |
| 2013-01-02 | 0.235234 | 0.393781 | 0.616279 | 0.572503 |
| 2013-01-01 | 0.059858 | 0.988169 | 0.671358 | 0.757114 |
| 2013-01-05 | 0.742841 | 0.853958 | 0.758803 | 0.896411 |
| 2013-01-06 | 0.09574 | 0.021197 | 0.790819 | 0.866529 |
| 2013-01-04 | 0.272862 | 0.988362 | 0.889549 | 0.001054 |

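Since `by` accepts a list, several columns can be used as sort keys, each with its own direction. A minimal sketch on a hypothetical frame named `scores`:

```python
import pandas as pd

# Hypothetical data: two groups with two scores each.
scores = pd.DataFrame({'group': ['b', 'a', 'b', 'a'], 'score': [1, 3, 2, 4]})

# Sort by group ascending, then by score descending within each group.
ordered = scores.sort_values(by=['group', 'score'], ascending=[True, False])

print(ordered)
```

The original row labels travel with the rows, so the index of `ordered` records where each row came from.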

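Use query() to filter rows with a boolean expression written as a string, which is often more concise than the equivalent boolean indexing: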

In [52]:
df.query('B < 0.5 and A > 0.5')

|  | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-03 | 0.733239 | 0.147542 | 0.418538 | 0.0213 |

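The same filter can be written with a boolean mask, and a query string can refer to a local Python variable by prefixing its name with `@`. The sketch below uses a hypothetical frame `data` and a hypothetical variable `threshold` to show that all three forms select the same rows.

```python
import pandas as pd

# Hypothetical data: only the row labelled 1 has B < 0.5 and A > 0.5.
data = pd.DataFrame({'A': [0.1, 0.7, 0.9], 'B': [0.2, 0.1, 0.8]})

via_query = data.query('B < 0.5 and A > 0.5')

# Equivalent boolean indexing; each comparison needs its own parentheses.
via_mask = data[(data['B'] < 0.5) & (data['A'] > 0.5)]

# Refer to a local variable inside the query string with the @ prefix.
threshold = 0.5
via_variable = data.query('B < @threshold and A > @threshold')

print(via_query.equals(via_mask))
print(via_query.equals(via_variable))
```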