# Viewing Data from a DataFrame

To see the top and bottom rows of the frame, we use the head and tail methods, respectively. If we leave the argument blank, the output is 5 rows; passing an integer changes the number of rows returned:

In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

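# Six consecutive daily timestamps starting 2013-01-01, used as the row index
dates = pd.date_range('20130101', periods=6)
# 6x4 frame of standard-normal random values with columns A through D
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))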

df.head()

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 1.814453 | -0.434965 | -1.135232 | -0.206673 |
| 2013-01-02 | 0.914778 | 0.072761 | -0.623462 | -0.101702 |
| 2013-01-03 | 1.5294 | -1.081315 | -0.765334 | 0.719386 |
| 2013-01-04 | -1.280894 | 0.277744 | 0.307192 | -0.74057 |
| 2013-01-05 | -1.73593 | 0.230301 | 0.184035 | 0.752999 |


In [12]:
df.tail(3)

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-04 | -1.280894 | 0.277744 | 0.307192 | -0.74057 |
| 2013-01-05 | -1.73593 | 0.230301 | 0.184035 | 0.752999 |
| 2013-01-06 | -1.004233 | 0.568325 | 0.97846 | -0.498527 |
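
As a small sketch of how the row-count argument behaves, the following builds a separate frame named `demo` (a hypothetical name, seeded for reproducibility, which the original example is not) and compares the default call with explicit and negative values of `n`:

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (an assumption; the
# original cell uses unseeded np.random.randn).
rng = np.random.default_rng(0)
dates = pd.date_range("20130101", periods=6)
demo = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

first_default = demo.head()    # no argument: first 5 rows
first_two = demo.head(2)       # explicit n: first 2 rows
all_but_last = demo.head(-1)   # negative n: every row except the last 1
last_three = demo.tail(3)      # last 3 rows

print(len(first_default), len(first_two), len(all_but_last), len(last_three))
# prints: 5 2 5 3
```

Both methods return a new DataFrame and leave the original untouched.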


To display the index, the columns, and the underlying NumPy data, we use the following:

In [13]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [14]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [15]:
df.values

array([[ 1.81445268, -0.4349651 , -1.13523201, -0.20667256],
       [ 0.91477816,  0.07276144, -0.62346226, -0.10170241],
       [ 1.52939961, -1.08131471, -0.76533361,  0.71938645],
       [-1.28089417,  0.27774436,  0.30719202, -0.74057   ],
       [-1.73593027,  0.23030077,  0.18403522,  0.75299871],
       [-1.00423295,  0.56832485,  0.97845979, -0.49852666]])
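
The pandas documentation recommends `DataFrame.to_numpy()` over `.values` for new code. A minimal sketch, using a hypothetical frame `demo` filled with consecutive numbers rather than the random data above, suggests that both return the same array and that the row and column labels are not part of it:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
# Deterministic stand-in data (an assumption, chosen so results are predictable).
demo = pd.DataFrame(
    np.arange(24, dtype=float).reshape(6, 4), index=dates, columns=list("ABCD")
)

arr = demo.to_numpy()          # plain ndarray; index and column labels are dropped
same_as_values = np.array_equal(arr, demo.values)

print(type(arr).__name__, arr.shape)   # prints: ndarray (6, 4)
print(same_as_values)                  # prints: True
```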

To show a quick statistical summary of the data, we use the describe method:

In [16]:
df.describe()

| | A | B | C | D |
|---|---|---|---|---|
| count | 6.0 | 6.0 | 6.0 | 6.0 |
| mean | 0.039596 | -0.061191 | -0.175723 | -0.012514 |
| std | 1.557006 | 0.599072 | 0.795429 | 0.62187 |
| min | -1.73593 | -1.081315 | -1.135232 | -0.74057 |
| 25% | -1.211729 | -0.308033 | -0.729866 | -0.425563 |
| 50% | -0.044727 | 0.151531 | -0.219714 | -0.154187 |
| 75% | 1.375744 | 0.265883 | 0.276403 | 0.514114 |
| max | 1.814453 | 0.568325 | 0.97846 | 0.752999 |
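
The percentile rows in the summary can be changed through the `percentiles` parameter of `describe`. The sketch below uses a hypothetical frame `demo` with deterministic values (not the random data above) and also shows that a single statistic can be computed directly on a column:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in data (an assumption): column A holds 0, 4, 8, 12, 16, 20.
demo = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4), columns=list("ABCD"))

# Request the 10th, 50th, and 90th percentiles instead of the default quartiles.
summary = demo.describe(percentiles=[0.1, 0.5, 0.9])
print(summary.index.tolist())

# Individual statistics are also available as methods on a column.
col_a_mean = demo["A"].mean()
col_a_median = demo["A"].median()
print(col_a_mean, col_a_median)   # prints: 10.0 10.0
```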


To transpose the data, swapping rows and columns, we use the following command:

In [17]:
df.T

| | 2013-01-01 00:00:00 | 2013-01-02 00:00:00 | 2013-01-03 00:00:00 | 2013-01-04 00:00:00 | 2013-01-05 00:00:00 | 2013-01-06 00:00:00 |
|---|---|---|---|---|---|---|
| A | 1.814453 | 0.914778 | 1.5294 | -1.280894 | -1.73593 | -1.004233 |
| B | -0.434965 | 0.072761 | -1.081315 | 0.277744 | 0.230301 | 0.568325 |
| C | -1.135232 | -0.623462 | -0.765334 | 0.307192 | 0.184035 | 0.97846 |
| D | -0.206673 | -0.101702 | 0.719386 | -0.74057 | 0.752999 | -0.498527 |
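
To make the effect of transposition concrete, this sketch checks on a hypothetical frame `demo` (deterministic values, an assumption made for predictability) that the shape flips, the old column labels become the new index, and each value moves to the mirrored position:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
demo = pd.DataFrame(
    np.arange(24, dtype=float).reshape(6, 4), index=dates, columns=list("ABCD")
)

flipped = demo.T   # attribute shorthand for demo.transpose()

print(demo.shape, flipped.shape)   # prints: (6, 4) (4, 6)
print(list(flipped.index))         # prints: ['A', 'B', 'C', 'D']

# The value at (row=date, column='B') now sits at (row='B', column=date).
same_value = flipped.loc["B", dates[2]] == demo.loc[dates[2], "B"]
print(same_value)                  # prints: True
```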


The sort_index method sorts a DataFrame by its labels along a given axis. With axis=1, it sorts by the column labels. Leaving ascending at its default of True does not change the way this DataFrame looks, because its columns are already in alphabetical order. To reverse the column order, we set ascending to False:

In [18]:
df.sort_index(axis=1, ascending=False)

| | D | C | B | A |
|---|---|---|---|---|
| 2013-01-01 | -0.206673 | -1.135232 | -0.434965 | 1.814453 |
| 2013-01-02 | -0.101702 | -0.623462 | 0.072761 | 0.914778 |
| 2013-01-03 | 0.719386 | -0.765334 | -1.081315 | 1.5294 |
| 2013-01-04 | -0.74057 | 0.307192 | 0.277744 | -1.280894 |
| 2013-01-05 | 0.752999 | 0.184035 | 0.230301 | -1.73593 |
| 2013-01-06 | -0.498527 | 0.97846 | 0.568325 | -1.004233 |
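
The same method also sorts by row labels when `axis=0` is used. The following sketch, on a small hypothetical frame `demo`, contrasts the two axes and shows that `sort_index` returns a new object rather than modifying the original:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=pd.date_range("20130101", periods=3),
    columns=list("ABCD"),
)

by_cols_desc = demo.sort_index(axis=1, ascending=False)   # reorder column labels
by_rows_desc = demo.sort_index(axis=0, ascending=False)   # reorder row labels

print(list(by_cols_desc.columns))        # prints: ['D', 'C', 'B', 'A']
print(by_rows_desc.index[0].date())      # prints: 2013-01-03
print(list(demo.columns))                # prints: ['A', 'B', 'C', 'D'] (original unchanged)
```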


To sort by the values in a specific column, we use the following command:

In [19]:
df.sort_values(by='B')

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-03 | 1.5294 | -1.081315 | -0.765334 | 0.719386 |
| 2013-01-01 | 1.814453 | -0.434965 | -1.135232 | -0.206673 |
| 2013-01-02 | 0.914778 | 0.072761 | -0.623462 | -0.101702 |
| 2013-01-05 | -1.73593 | 0.230301 | 0.184035 | 0.752999 |
| 2013-01-04 | -1.280894 | 0.277744 | 0.307192 | -0.74057 |
| 2013-01-06 | -1.004233 | 0.568325 | 0.97846 | -0.498527 |
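
`sort_values` also accepts `ascending=False` and a list of columns. As a minimal sketch, the hand-made frame `demo` below (hypothetical data with repeated values in column A, so the tie-breaking is visible) sorts once in descending order and once by two keys:

```python
import pandas as pd

demo = pd.DataFrame(
    {"A": [2, 1, 2, 1], "B": [0.5, -1.0, -0.3, 0.2]},
    index=list("wxyz"),
)

# Largest B first.
desc_b = demo.sort_values(by="B", ascending=False)
print(list(desc_b.index))   # prints: ['w', 'z', 'y', 'x']

# Sort by A ascending, then break ties by B descending.
multi = demo.sort_values(by=["A", "B"], ascending=[True, False])
print(list(multi.index))    # prints: ['z', 'x', 'w', 'y']
```

As with `sort_index`, the result is a new DataFrame and `demo` keeps its original row order.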
