Here we discuss much of the essential functionality common to the pandas data structures. Here are some of the objects used in the examples from the previous section.

In [1]:
import pandas as pd
import numpy as np

In [2]:
index = pd.date_range('1/1/2000', periods=8)
s = pd.Series(np.random.randn(5), index=['a','b','c','d','e'])
df = pd.DataFrame(np.random.randn(8,3), index=index, columns=['A','B','C',])
wp = pd.Panel(np.random.randn(2, 5, 4), items=['item1', 'item2'],
                  major_axis=pd.date_range('1/1/2000', periods=5),
                  minor_axis=['A','B','C','D']
              )

### Head and Tail

To view a small sample of a Series or DataFrame, use the head() and tail() methods. The default number of elements to display is five, but you may pass a custom number.

In [3]:
long_series = pd.Series(np.random.randn(1000))
print(long_series.head())
print(long_series.tail(3))

0   -0.972873
1   -1.625180
2   -0.270693
3    0.141191
4    0.261722
dtype: float64
997   -0.772221
998    1.955846
999   -0.982712
dtype: float64


### Attributes and the raw ndarray(s)

pandas objects have a number of attributes enabling you to access the metadata:

- shape: gives the axis dimensions of the object, consistent with ndarray
- Axis labels
  - Series: index (only axis)
  - DataFrame: index (rows) and columns
  - Panel: items, major_axis, and minor_axis

Note, these attributes can be safely assigned to.

In [4]:
print(df[:2])
print(list(df.columns))
df.columns = [x.lower() for x in df.columns]
print(df)

                   A         B         C
2000-01-01  1.280789  1.748411 -0.152509
2000-01-02 -0.235524  1.470467  1.045986
['A', 'B', 'C']
                   a         b         c
2000-01-01  1.280789  1.748411 -0.152509
2000-01-02 -0.235524  1.470467  1.045986
2000-01-03  0.593322 -0.622625 -0.879623
2000-01-04  0.379363 -1.412958  0.650302
2000-01-05  0.554830  1.091477 -0.106444
2000-01-06 -1.109539 -0.736802  1.047411
2000-01-07 -0.747737 -1.734697  0.519764
2000-01-08 -1.838872  1.773919  0.983656
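
The attributes listed above can be inspected, and the axis labels reassigned, along the following lines. This is a minimal sketch: the name `frame` and its labels are made up for illustration and are not objects from the earlier cells.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the name `frame` and its labels are illustrative only.
frame = pd.DataFrame(np.zeros((4, 2)), index=list('wxyz'), columns=['A', 'B'])

print(frame.shape)          # (rows, columns), same convention as ndarray.shape
print(list(frame.index))    # row labels
print(list(frame.columns))  # column labels

# Axis labels can be reassigned as long as the new length matches the axis.
frame.index = [label.upper() for label in frame.index]
print(list(frame.index))
```

Assigning a list of a different length to `index` or `columns` raises an error, so the replacement labels should be built from the existing ones where possible.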


To get the actual data inside a data structure, one need only access the values property.

In [5]:
print(s.values)
print(df.values)
print(wp.values)

[ 1.32072342 -0.65913561  0.82362356  1.71852105 -0.83308781]
[[ 1.28078884  1.74841074 -0.15250945]
 [-0.23552368  1.47046691  1.04598602]
 [ 0.5933221  -0.62262493 -0.87962302]
 [ 0.37936278 -1.41295761  0.65030245]
 [ 0.55483025  1.09147715 -0.10644368]
 [-1.1095388  -0.73680224  1.04741091]
 [-0.7477374  -1.73469672  0.51976411]
 [-1.83887165  1.77391887  0.98365632]]
[[[-0.36940706 -2.39910193 -1.87432179  1.28067816]
  [-1.92597436  0.08598417  1.65846546 -0.63415452]
  [-1.40604521 -0.61467081  0.08724799  1.85663859]
  [-0.62727826  0.33385138  0.09331673 -0.48008362]
  [-0.63404146  0.73937405 -0.13263934 -0.17330037]]

 [[-0.4479705   0.57889191  1.38640209  0.57836847]
  [ 0.14754414  0.10331269 -0.67314599  0.34404298]
  [-0.75826778 -1.35185946  0.59261197  2.15333173]
  [-0.10361517  1.31539243  0.38497224  0.78309511]
  [ 0.30407175 -0.81649013 -1.74633981  0.16627246]]]


If a DataFrame or Panel contains homogeneously-typed data, the ndarray can actually be modified in place, and the changes will be reflected in the data structure. For heterogeneous data (e.g. some of the DataFrame's columns are not all the same dtype), this will not be the case. The values attribute itself, unlike the axis labels, cannot be assigned to.

> *Note* When working with heterogeneous data, the dtype of the resulting ndarray will be chosen to accommodate all of the data involved.
For example, if strings are involved, the result will be of object dtype. If there are only floats and integers, the resulting array will be of float dtype.
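
The dtype rule in the note can be checked with a small sketch. The frames `numeric` and `mixed` are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frames used only for illustration.
numeric = pd.DataFrame({'ints': [1, 2, 3], 'floats': [0.5, 1.5, 2.5]})
mixed = pd.DataFrame({'ints': [1, 2, 3], 'labels': ['x', 'y', 'z']})

# Integers and floats together: the integers are upcast to a float dtype.
print(numeric.values.dtype)

# Numbers and strings together: only the generic object dtype fits both.
print(mixed.values.dtype)
```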

## Flexible binary operations
With binary operations between pandas data structures, there are two key points of interest:

- broadcasting behavior between higher- (e.g. DataFrame) and lower-dimensional (e.g. Series) objects
- missing data in computations



### Matching / broadcasting behavior

DataFrame has the methods add, sub, mul, div and the related functions radd, rsub, ... for carrying out binary operations. For broadcasting behavior, Series input is of primary interest. Using these functions, you can match on either the index or the columns via the axis keyword.

In [14]:
df = pd.DataFrame({
        'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
        'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
        'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])
    })

print(df)
row = df.iloc[1]  # second row, selected by position
print(row)
column = df['two']
print(column)

print(df.sub(row, axis='columns'))
print(df.sub(row, axis=1))
print(df.sub(column, axis='index'))
print(df.sub(column, axis=0))


        one     three       two
a  0.159327       NaN  0.911939
b  0.129871 -0.467140 -1.409027
c -0.210639 -0.531240 -2.085780
d       NaN -0.393023  0.278122
one      0.129871
three   -0.467140
two     -1.409027
Name: b, dtype: float64
a    0.911939
b   -1.409027
c   -2.085780
d    0.278122
Name: two, dtype: float64
        one     three       two
a  0.029456       NaN  2.320966
b  0.000000  0.000000  0.000000
c -0.340510 -0.064099 -0.676753
d       NaN  0.074117  1.687148
        one     three       two
a  0.029456       NaN  2.320966
b  0.000000  0.000000  0.000000
c -0.340510 -0.064099 -0.676753
d       NaN  0.074117  1.687148
        one     three  two
a -0.752612       NaN  0.0
b  1.538898  0.941886  0.0
c  1.875141  1.554540  0.0
d       NaN -0.671145  0.0
        one     three  two
a -0.752612       NaN  0.0
b  1.538898  0.941886  0.0
c  1.875141  1.554540  0.0
d       NaN -0.671145  0.0
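
The two matching modes are easier to follow with round numbers. The sketch below uses a hypothetical `frame` (not the random `df` above) and also checks that matching on columns is what the plain `-` operator does for a DataFrame and a Series.

```python
import pandas as pd

# Hypothetical frame with round values so the results are easy to verify.
frame = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [10.0, 20.0, 30.0]},
                     index=['a', 'b', 'c'])
row = frame.iloc[0]   # Series labelled by the column names x, y
col = frame['x']      # Series labelled by the row index a, b, c

# Match the Series labels against the column names: subtract `row` from every row.
by_columns = frame.sub(row, axis='columns')

# Match the Series labels against the row index: subtract `col` from every column.
by_index = frame.sub(col, axis='index')

print(by_columns)
print(by_index)
print(by_columns.equals(frame - row))
```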


Furthermore, you can align a level of a multi-indexed DataFrame with a Series.

In [22]:
dfmi = df.copy()
dfmi.index = pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a')], names=['first', 'second'])
print(dfmi.sub(column, axis=0, level='second'))
print(dfmi)

                   one     three       two
first second                              
1     a      -0.752612       NaN  0.000000
      b       1.538898  0.941886  0.000000
      c       1.875141  1.554540  0.000000
2     a            NaN -1.304962 -0.633817
                   one     three       two
first second                              
1     a       0.159327       NaN  0.911939
      b       0.129871 -0.467140 -1.409027
      c      -0.210639 -0.531240 -2.085780
2     a            NaN -0.393023  0.278122
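
Level alignment can also be sketched with round numbers. The names `frame` and `offsets` below are hypothetical, and the example assumes the `level` keyword of `DataFrame.sub` behaves as described in this section.

```python
import pandas as pd

# Hypothetical two-level index; 'second' carries the labels to align on.
index = pd.MultiIndex.from_tuples(
    [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')], names=['first', 'second'])
frame = pd.DataFrame({'x': [10.0, 20.0, 30.0, 40.0]}, index=index)

# Flat Series whose labels correspond to the 'second' level only.
offsets = pd.Series([1.0, 2.0], index=['a', 'b'])

# Each row is matched to `offsets` through its 'second' label,
# regardless of the value in 'first'.
result = frame.sub(offsets, axis=0, level='second')
print(result)
```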


### Missing data / operations with fill values

In Series and DataFrame (though not yet in Panel), the arithmetic functions have the option of inputting a fill_value, namely a value to substitute when at most one of the values at a location is missing.
For example, when adding two DataFrame objects, you may wish to treat NaN as 0 unless both DataFrames are missing that value, in which case the result will be NaN (you can later replace NaN with some other value using fillna if you wish).

In [30]:
print(df)
df2 = df.copy()
# Set one cell by label; chained indexing such as df2.iloc[0][1] may write to a copy.
df2.loc['a', 'three'] = 12
print(df2)
print(df + df2)
print(df.add(df2, fill_value=0))

        one     three       two
a  0.159327       NaN  0.911939
b  0.129871 -0.467140 -1.409027
c -0.210639 -0.531240 -2.085780
d       NaN -0.393023  0.278122
        one      three       two
a  0.159327  12.000000  0.911939
b  0.129871  -0.467140 -1.409027
c -0.210639  -0.531240 -2.085780
d       NaN  -0.393023  0.278122
        one     three       two
a  0.318654       NaN  1.823878
b  0.259742 -0.934281 -2.818053
c -0.421279 -1.062479 -4.171560
d       NaN -0.786046  0.556243
        one      three       two
a  0.318654  12.000000  1.823878
b  0.259742  -0.934281 -2.818053
c -0.421279  -1.062479 -4.171560
d       NaN  -0.786046  0.556243
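
The three cases (neither value missing, one missing, both missing) can be isolated in a small sketch. The frames `left` and `right` are hypothetical and chosen so that each row covers one case.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: row a has both values, row b is missing on the left only,
# row c is missing on both sides.
left = pd.DataFrame({'x': [1.0, np.nan, np.nan]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'x': [10.0, 20.0, np.nan]}, index=['a', 'b', 'c'])

plain = left + right                     # any NaN operand yields NaN
filled = left.add(right, fill_value=0)   # a single missing operand is treated as 0

print(plain)
print(filled)

# fillna can replace the NaN that remains where both inputs were missing.
print(filled.fillna(-1))
```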
