# Pandas

http://pandas.pydata.org/

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Excerpt of Pandas' features:

- A set of labeled array data structures, the primary of which are Series and DataFrame
- Index objects enabling both simple axis indexing and multi-level / hierarchical axis indexing
- An integrated group by engine for aggregating and transforming data sets
- **Input/Output tools: loading tabular data from flat files (CSV, delimited, Excel 2003), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format.**
- ...

So internally, *pandas* uses *PyTables* for storing data in HDF5.




In [8]:
import pandas_datareader.data as web
import datetime

In [9]:
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2013, 1, 27)


In [10]:
df = web.DataReader("F", 'yahoo', start, end)

In [12]:
df.head()

| Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2010-01-04 | 10.17 | 10.28 | 10.05 | 10.28 | 60855800 | 8.418735 |
| 2010-01-05 | 10.45 | 11.24 | 10.4 | 10.96 | 215620200 | 8.975616 |
| 2010-01-06 | 11.21 | 11.46 | 11.13 | 11.37 | 200070600 | 9.311383 |
| 2010-01-07 | 11.46 | 11.69 | 11.32 | 11.66 | 130201700 | 9.548876 |
| 2010-01-08 | 11.67 | 11.74 | 11.46 | 11.69 | 130463000 | 9.573444 |


In [13]:
import pandas as pd

In [15]:
store = pd.HDFStore('stock.h5', 'a')

In [17]:
store['yahoo'] = df

In [18]:
store.close()
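The open, assign, close sequence above can also be written with a context manager, which closes the file even if an error occurs. This is a minimal sketch, not the notebook's own code: it assumes PyTables is installed, and the synthetic frame, the `demo.h5` file name, and the `prices` key are made up so that it runs without a network download.

```python
import os
import tempfile

import pandas as pd

# Synthetic stand-in for the downloaded prices (values are made up).
dates = pd.date_range("2010-01-04", periods=5, freq="B")
df = pd.DataFrame(
    {
        "Open": [10.0, 10.5, 11.0, 11.5, 12.0],
        "Close": [10.25, 10.5, 11.25, 11.5, 12.25],
    },
    index=dates,
)

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# HDFStore is a context manager: the file is closed when the block exits.
with pd.HDFStore(path, mode="w") as store:
    store["prices"] = df  # dict-style assignment writes the "fixed" format
    keys = store.keys()   # keys are reported with a leading slash

# Read the whole object back without keeping a store open.
restored = pd.read_hdf(path, "prices")
```

Dict-style assignment is convenient for whole-object round trips like this one; it does not produce a store that can be queried row by row.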

Open `stock.h5` in HDFView: pandas stores the DataFrame in its own layout on top of HDF5.
The pandas HDF5 layer also supports indexed search.

# Indexed search

In [20]:
store = pd.HDFStore('stock.h5', 'a')

There is no index yet, and `yahoo` was stored in the "fixed" format, so it cannot be queried:

In [25]:
store.select("yahoo", "Open>10 & Open<11")

TypeError: cannot pass a where specification when reading from a Fixed format store. this store must be selected in its entirety
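The difference between the two formats can be reproduced without the downloaded data. The sketch below assumes PyTables is installed and uses a small synthetic frame in a temporary file; the file name `demo.h5`, the keys `fixed` and `table`, and all values are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Five business days of made-up prices.
dates = pd.date_range("2010-01-04", periods=5, freq="B")
df = pd.DataFrame({"Open": [10.0, 10.5, 11.0, 11.5, 12.0]}, index=dates)

path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with pd.HDFStore(path, mode="w") as store:
    store.put("fixed", df)                  # default format is "fixed"
    store.put("table", df, format="table")  # queryable "table" format

    # A where clause on a fixed-format object raises TypeError.
    try:
        store.select("fixed", "index > Timestamp('2010-01-05')")
        fixed_query_failed = False
    except TypeError:
        fixed_query_failed = True

    # The same where clause works on the table-format object.
    subset = store.select("table", "index > Timestamp('2010-01-05')")
```

The fixed format trades queryability for simpler, faster whole-object reads and writes, which is why a where clause is rejected outright rather than evaluated slowly.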

In [22]:
store.append('yahoo_tableformat', df)

In HDFView, `yahoo_tableformat` shows a different layout: `append` writes the "table" format, which can be queried.

In [30]:
store.append('yahoo_indexed', df, data_columns=['Open', 'Close'])

In [36]:
store.flush()

In [34]:
store.select("yahoo_indexed", "index<=Timestamp('2011-01-01') & Open>10 & Open<11")

| Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2010-01-04 | 10.17 | 10.28 | 10.05 | 10.28 | 60855800 | 8.418735 |
| 2010-01-05 | 10.45 | 11.24 | 10.4 | 10.96 | 215620200 | 8.975616 |
| 2010-01-25 | 10.73 | 11.1 | 10.61 | 11.03 | 121621500 | 9.032942 |
| 2010-02-05 | 10.97 | 11.11 | 10.49 | 10.91 | 181535200 | 8.934669 |
| 2010-02-12 | 10.92 | 11.18 | 10.85 | 11.12 | 69465400 | 9.106647 |
| 2010-05-21 | 10.25 | 11.3 | 10.17 | 11.26 | 174455100 | 9.221299 |
| 2010-05-25 | 10.47 | 11.05 | 10.42 | 11.02 | 137858600 | 9.024753 |
| 2010-06-24 | 10.99 | 11.03 | 10.64 | 10.78 | 74449200 | 8.828206 |
| 2010-06-25 | 10.75 | 10.77 | 10.42 | 10.75 | 148168400 | 8.803638 |
| 2010-06-28 | 10.72 | 10.76 | 10.43 | 10.43 | 57994700 | 8.541577 |


Now "Open" and "Close" are indexed as well, in addition to the date index, so they can be used in queries.
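Only the index and the columns listed in `data_columns` can appear in a where clause. The following sketch illustrates this on synthetic data; it assumes PyTables is installed, and the file name, the `prices` key, and the values are made up. `Volume` is deliberately left out of `data_columns` to show what happens when it is used in a query.

```python
import os
import tempfile

import pandas as pd

# Five business days of made-up prices.
dates = pd.date_range("2010-01-04", periods=5, freq="B")
df = pd.DataFrame(
    {
        "Open": [9.8, 10.2, 10.7, 11.3, 10.9],
        "Close": [10.0, 10.5, 11.0, 11.0, 10.8],
        "Volume": [100, 200, 300, 400, 500],
    },
    index=dates,
)

path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with pd.HDFStore(path, mode="w") as store:
    # Only "Open" and "Close" become queryable data columns.
    store.append("prices", df, data_columns=["Open", "Close"])

    # Index and data-column conditions can be combined in one query.
    hits = store.select(
        "prices",
        "index<=Timestamp('2010-01-07') & Open>10 & Open<11",
    )

    # "Volume" is not a data column, so it cannot be used in a query.
    try:
        store.select("prices", "Volume>150")
        volume_query_failed = False
    except ValueError:
        volume_query_failed = True
```

Listing a column in `data_columns` stores it separately so it can be indexed, which costs some write time and file size, so it is usually worth listing only the columns that will actually be filtered on.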