Practicing some common pandas usage

In [3]:
import datetime


datetime.datetime.now()

datetime.datetime(2020, 6, 9, 11, 53, 26, 816779)

Create a Series object

In [5]:
import pandas as pd
import numpy as np


s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
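
The output above has dtype float64 even though most of the inputs are integers. A minimal sketch of why, using the illustrative names `ints` and `with_nan` (not part of the notebook): NaN is a floating-point value, so including it makes pandas store the whole Series as floats.

```python
import numpy as np
import pandas as pd

# A list of plain integers is stored with an integer dtype.
ints = pd.Series([1, 3, 5])

# Including np.nan forces float64, because NaN is a float value.
with_nan = pd.Series([1, 3, 5, np.nan, 6, 8])

print(pd.api.types.is_integer_dtype(ints))  # True
print(with_nan.dtype)                       # float64
print(with_nan.isna().sum())                # 1 missing entry
```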

Create a DataFrame object

In [6]:
dates = pd.date_range('20130101', periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [7]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df

                   A         B         C         D
2013-01-01  0.001221  1.310115  0.576224 -1.198675
2013-01-02  0.108909 -0.981721 -1.204032  1.135592
2013-01-03  0.042580  0.860404 -1.298753  2.044060
2013-01-04 -0.845737 -0.204526  2.862116  0.651524
2013-01-05  1.023640  0.063364  0.355199  0.267634
2013-01-06 -0.773822  0.546912 -1.819508  1.764621


Create a DataFrame by passing a dict of objects that can be converted to a series-like structure

In [8]:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo


In [9]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
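
In the dict passed to the constructor, scalar entries such as 'A' and 'F' are repeated for every row, while the array-like entries determine the number of rows. A small sketch of that broadcasting, with a hypothetical frame `small` and made-up column names:

```python
import pandas as pd

# 'x' and 'z' are scalars; 'y' is a Series with three entries.
# The scalars are broadcast to match the three-row index taken from 'y'.
small = pd.DataFrame({'x': 1.0,
                      'y': pd.Series([10, 20, 30]),
                      'z': 'foo'})

print(small.shape)           # (3, 3)
print(small['x'].tolist())   # [1.0, 1.0, 1.0]
print(small['z'].tolist())   # ['foo', 'foo', 'foo']
```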

In [10]:
df2.A

0    1.0
1    1.0
2    1.0
3    1.0
Name: A, dtype: float64

In [12]:
df2.abs

<bound method NDFrame.abs of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [13]:
df2.add

<bound method _arith_method_FRAME.<locals>.f of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [14]:
df2.add_prefix

<bound method NDFrame.add_prefix of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [16]:
df2.add_suffix

<bound method NDFrame.add_suffix of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [17]:
df2.align

<bound method DataFrame.align of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [18]:
df2.all

<bound method DataFrame.all of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [19]:
df2.any

<bound method DataFrame.any of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [20]:
df2.append

<bound method DataFrame.append of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [21]:
df2.apply

<bound method DataFrame.apply of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [22]:
df2.applymap

<bound method DataFrame.applymap of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [23]:
df2.D

0    3
1    3
2    3
3    3
Name: D, dtype: int32

In [24]:
df2.bool

<bound method NDFrame.bool of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [25]:
df2.boxplot

<bound method boxplot_frame of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [26]:
df2.C

0    1.0
1    1.0
2    1.0
3    1.0
Name: C, dtype: float32

In [27]:
df2.clip

<bound method NDFrame.clip of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [28]:
df2.clip_lower

<bound method NDFrame.clip_lower of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [29]:
df2.clip_upper

<bound method NDFrame.clip_upper of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [30]:
df2.columns

Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')

In [31]:
df2.combine

<bound method DataFrame.combine of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

In [32]:
df2.combine_first

<bound method DataFrame.combine_first of      A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo>

View the top and bottom rows of the frame

In [35]:
df.head()

                   A         B         C         D
2013-01-01  0.001221  1.310115  0.576224 -1.198675
2013-01-02  0.108909 -0.981721 -1.204032  1.135592
2013-01-03  0.042580  0.860404 -1.298753  2.044060
2013-01-04 -0.845737 -0.204526  2.862116  0.651524
2013-01-05  1.023640  0.063364  0.355199  0.267634


In [36]:
df.tail(3)

                   A         B         C         D
2013-01-04 -0.845737 -0.204526  2.862116  0.651524
2013-01-05  1.023640  0.063364  0.355199  0.267634
2013-01-06 -0.773822  0.546912 -1.819508  1.764621


Display the index and columns

In [37]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [38]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [39]:
df.to_numpy()

array([[ 1.22114506e-03,  1.31011492e+00,  5.76223684e-01,
        -1.19867506e+00],
       [ 1.08909477e-01, -9.81720550e-01, -1.20403203e+00,
         1.13559245e+00],
       [ 4.25795827e-02,  8.60403535e-01, -1.29875262e+00,
         2.04405993e+00],
       [-8.45736616e-01, -2.04526499e-01,  2.86211576e+00,
         6.51524402e-01],
       [ 1.02363973e+00,  6.33638649e-02,  3.55199107e-01,
         2.67633667e-01],
       [-7.73821939e-01,  5.46911942e-01, -1.81950837e+00,
         1.76462087e+00]])

In [40]:
df2.to_numpy()

array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)
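
The two to_numpy() results above differ in dtype: df holds only floats, so the array is float64, while df2 mixes column types, so the array falls back to object. A minimal sketch of the same effect on two small stand-in frames (`homogeneous` and `mixed` are illustrative names):

```python
import pandas as pd

# All columns are floats, so one numeric dtype fits the whole array.
homogeneous = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})

# Floats and strings share no common numeric dtype, so NumPy uses object.
mixed = pd.DataFrame({'a': [1.0, 2.0], 'b': ['x', 'y']})

print(homogeneous.to_numpy().dtype)  # float64
print(mixed.to_numpy().dtype)        # object
```

Converting a mixed frame this way can be comparatively expensive, since every element has to be boxed as a Python object.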

In [41]:
df.describe()

              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean  -0.073868  0.265758 -0.088126  0.777459
std    0.684521  0.817516  1.734957  1.173705
min   -0.845737 -0.981721 -1.819508 -1.198675
25%   -0.580061 -0.137554 -1.275072  0.363606
50%    0.021900  0.305138 -0.424416  0.893558
75%    0.092327  0.782031  0.520968  1.607364
max    1.023640  1.310115  2.862116  2.044060


In [42]:
df.T

   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
A    0.001221    0.108909    0.042580   -0.845737    1.023640   -0.773822
B    1.310115   -0.981721    0.860404   -0.204526    0.063364    0.546912
C    0.576224   -1.204032   -1.298753    2.862116    0.355199   -1.819508
D   -1.198675    1.135592    2.044060    0.651524    0.267634    1.764621


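##### Sort by axis  
sort_index(axis=1, ascending=False)  
axis defaults to 0, which sorts the rows by their index labels; axis=1 sorts the columns by their column labels  
ascending defaults to True (ascending order); False sorts in descending order  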

In [43]:
df.sort_index(axis=1, ascending=False)

                   D         C         B         A
2013-01-01 -1.198675  0.576224  1.310115  0.001221
2013-01-02  1.135592 -1.204032 -0.981721  0.108909
2013-01-03  2.044060 -1.298753  0.860404  0.042580
2013-01-04  0.651524  2.862116 -0.204526 -0.845737
2013-01-05  0.267634  0.355199  0.063364  1.023640
2013-01-06  1.764621 -1.819508  0.546912 -0.773822


In [44]:
df.sort_index(axis=0, ascending=True)

                   A         B         C         D
2013-01-01  0.001221  1.310115  0.576224 -1.198675
2013-01-02  0.108909 -0.981721 -1.204032  1.135592
2013-01-03  0.042580  0.860404 -1.298753  2.044060
2013-01-04 -0.845737 -0.204526  2.862116  0.651524
2013-01-05  1.023640  0.063364  0.355199  0.267634
2013-01-06 -0.773822  0.546912 -1.819508  1.764621


In [45]:
df.sort_index(axis=0, ascending=False)

                   A         B         C         D
2013-01-06 -0.773822  0.546912 -1.819508  1.764621
2013-01-05  1.023640  0.063364  0.355199  0.267634
2013-01-04 -0.845737 -0.204526  2.862116  0.651524
2013-01-03  0.042580  0.860404 -1.298753  2.044060
2013-01-02  0.108909 -0.981721 -1.204032  1.135592
2013-01-01  0.001221  1.310115  0.576224 -1.198675
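
Because df already has its rows and columns in sorted order, the effect of axis is easier to see on a frame whose labels start out unsorted. A minimal sketch, assuming a small hypothetical frame named `frame`:

```python
import pandas as pd

# Row labels ('y', 'x') and column labels ('b', 'a') are both out of order.
frame = pd.DataFrame({'b': [1, 2], 'a': [3, 4]}, index=['y', 'x'])

# axis=0 (the default) reorders rows by index label; columns stay put.
by_rows = frame.sort_index()

# axis=1 reorders columns by column label; rows stay put.
by_cols = frame.sort_index(axis=1)

print(by_rows.index.tolist(), by_rows.columns.tolist())  # ['x', 'y'] ['b', 'a']
print(by_cols.index.tolist(), by_cols.columns.tolist())  # ['y', 'x'] ['a', 'b']
```

In both cases only the labels drive the ordering; the cell values are never compared.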


##### Sort by values  
Sort by the values in column 'B'; the default order is ascending

In [47]:
df.sort_values(by='B')

                   A         B         C         D
2013-01-02  0.108909 -0.981721 -1.204032  1.135592
2013-01-04 -0.845737 -0.204526  2.862116  0.651524
2013-01-05  1.023640  0.063364  0.355199  0.267634
2013-01-06 -0.773822  0.546912 -1.819508  1.764621
2013-01-03  0.042580  0.860404 -1.298753  2.044060
2013-01-01  0.001221  1.310115  0.576224 -1.198675
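
sort_values also accepts ascending=False and a list of column names for tie-breaking. A short sketch on a hypothetical frame (`frame`, with made-up values) rather than the random df above:

```python
import pandas as pd

frame = pd.DataFrame({'B': [2, 1, 2], 'C': [0.5, 0.1, 0.3]})

# Descending order on a single column.
desc = frame.sort_values(by='B', ascending=False)

# Sort by 'B' first, then break ties using 'C'.
multi = frame.sort_values(by=['B', 'C'])

print(desc['B'].tolist())    # [2, 2, 1]
print(multi.index.tolist())  # [1, 2, 0]
```

Like sort_index, sort_values returns a new frame and leaves the original unchanged unless inplace=True is passed.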


In [48]:
# Select a single column; the result is a Series
df['A']  # equivalent to df.A

2013-01-01    0.001221
2013-01-02    0.108909
2013-01-03    0.042580
2013-01-04   -0.845737
2013-01-05    1.023640
2013-01-06   -0.773822
Freq: D, Name: A, dtype: float64

In [49]:
df.A

2013-01-01    0.001221
2013-01-02    0.108909
2013-01-03    0.042580
2013-01-04   -0.845737
2013-01-05    1.023640
2013-01-06   -0.773822
Freq: D, Name: A, dtype: float64

In [50]:
type(df['A'])

pandas.core.series.Series

In [51]:
type(df.A)

pandas.core.series.Series
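
df['A'] and df.A return the same Series here, but the attribute form only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. A minimal sketch of where the two diverge, using a hypothetical frame with deliberately awkward column names:

```python
import pandas as pd

frame = pd.DataFrame({'A': [1, 2], 'count': [3, 4], 'my col': [5, 6]})

# For an ordinary name, both forms return the same Series.
same = frame['A'].equals(frame.A)

# 'count' is also a DataFrame method, so attribute access returns the method.
attr_is_method = callable(frame.count)

# Bracket access always returns the column, including names with spaces.
count_col = frame['count']
spaced_col = frame['my col']

print(same)                # True
print(attr_is_method)      # True
print(count_col.tolist())  # [3, 4]
print(spaced_col.tolist()) # [5, 6]
```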

In [52]:
df[0:3]

                   A         B         C         D
2013-01-01  0.001221  1.310115  0.576224 -1.198675
2013-01-02  0.108909 -0.981721 -1.204032  1.135592
2013-01-03  0.042580  0.860404 -1.298753  2.044060


In [53]:
df['20130102':'20130104']

                   A         B         C         D
2013-01-02  0.108909 -0.981721 -1.204032  1.135592
2013-01-03  0.042580  0.860404 -1.298753  2.044060
2013-01-04 -0.845737 -0.204526  2.862116  0.651524
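
The two slices above treat their end point differently: the positional slice df[0:3] stops before position 3, while the label slice includes '20130104'. The sketch below shows the same distinction with the explicit .iloc and .loc indexers on a small stand-in frame (`frame` and its column 'A' are illustrative, not the notebook's df):

```python
import pandas as pd

dates = pd.date_range('20130101', periods=6)
frame = pd.DataFrame({'A': range(6)}, index=dates)

# Positional slice: end position is excluded, as with Python lists.
by_position = frame.iloc[0:3]

# Label slice: both end labels are included.
by_label = frame.loc['20130102':'20130104']

print(by_position['A'].tolist())  # [0, 1, 2]
print(by_label['A'].tolist())     # [1, 2, 3]
```

Spelling out .iloc or .loc makes it clear to a reader which of the two rules applies.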
