# Financial and Economic Data Applications

The use of Python in the financial industry has been increasing rapidly since 2005, led
largely by the maturation of libraries (like NumPy and pandas) and the availability of
skilled Python programmers. Institutions have found that Python is well suited both
as an interactive analysis environment and as a language for building robust systems,
often in a fraction of the time it would have taken in Java or C++. Python is also
an ideal glue layer; it is easy to build Python interfaces to legacy libraries built in C or
C++.

While the field of financial analysis is broad enough to fill an entire book, I hope to
show you how the tools in this book can be applied to a number of specific problems
in finance. As with other research and analysis domains, too much programming effort
is often spent wrangling data rather than solving the core modeling and research
problems. I personally got started building pandas in 2008 while grappling with inadequate
data tools.

In these examples, I’ll use the term cross-section to refer to data at a fixed point in time.
For example, the closing prices of all the stocks in the S&P 500 index on a particular
date form a cross-section. Cross-sectional data at multiple points in time over multiple
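data items (for example, prices together with volume) form a panel. Panel data can
be represented either as a hierarchically indexed DataFrame or with the three-dimensional
pandas Panel object. Note that Panel was deprecated in pandas 0.20 and removed in
0.25, so only the hierarchically indexed form works in current versions.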
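
As a minimal sketch of these two terms, the snippet below builds a tiny panel as a hierarchically indexed DataFrame and then pulls one cross-section out of it. The tickers, dates, and numbers are invented for illustration and do not come from any dataset used in this chapter; the names `panel` and `cross_section` are likewise just placeholders.

```python
import pandas as pd

# Hypothetical prices and volumes for two tickers on three dates
dates = pd.to_datetime(["2011-09-06", "2011-09-07", "2011-09-08"])
prices = pd.DataFrame({"AAPL": [379.74, 383.93, 384.14],
                       "MSFT": [25.51, 26.00, 26.22]}, index=dates)
volume = pd.DataFrame({"AAPL": [18173500, 12492000, 14839800],
                       "MSFT": [54929300, 41961000, 65811900]}, index=dates)

# Put the items side by side (columns become item/ticker pairs), then move
# the ticker level into the row index: rows are (date, ticker) pairs and
# columns are the data items
panel = pd.concat({"price": prices, "volume": volume}, axis=1)
panel = panel.stack(level=1)
panel.index.names = ["date", "ticker"]

# A cross-section: every ticker and item at one fixed point in time
cross_section = panel.xs(pd.Timestamp("2011-09-07"), level="date")
```

Here `cross_section` is an ordinary DataFrame indexed by ticker, with one column per data item. Keeping the date and ticker together in the row index is one possible layout; swapping the levels or keeping tickers in the columns may suit other analyses better.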

# Data Munging Topics

## Time Series and Cross-Section Alignment

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("stock_px.csv")
df

|   | Unnamed: 0 | AA | AAPL | GE | IBM | JNJ | MSFT | PEP | SPX | XOM |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990-02-01 00:00:00 | 4.98 | 7.86 | 2.87 | 16.79 | 4.27 | 0.51 | 6.04 | 328.79 | 6.12 |
| 1 | 1990-02-02 00:00:00 | 5.04 | 8.00 | 2.87 | 16.89 | 4.37 | 0.51 | 6.09 | 330.92 | 6.24 |
| 2 | 1990-02-05 00:00:00 | 5.07 | 8.18 | 2.87 | 17.32 | 4.34 | 0.51 | 6.05 | 331.85 | 6.25 |
| 3 | 1990-02-06 00:00:00 | 5.01 | 8.12 | 2.88 | 17.56 | 4.32 | 0.51 | 6.15 | 329.66 | 6.23 |
| 4 | 1990-02-07 00:00:00 | 5.04 | 7.77 | 2.91 | 17.93 | 4.38 | 0.51 | 6.17 | 333.75 | 6.33 |
| 5 | 1990-02-08 00:00:00 | 5.04 | 7.71 | 2.92 | 17.86 | 4.46 | 0.51 | 6.22 | 332.96 | 6.35 |
| 6 | 1990-02-09 00:00:00 | 5.06 | 8.00 | 2.94 | 17.82 | 4.49 | 0.52 | 6.24 | 333.62 | 6.37 |
| 7 | 1990-02-12 00:00:00 | 4.96 | 7.94 | 2.89 | 17.58 | 4.46 | 0.52 | 6.23 | 330.08 | 6.22 |
| 8 | 1990-02-13 00:00:00 | 4.91 | 8.06 | 2.88 | 17.95 | 4.43 | 0.52 | 6.09 | 331.02 | 6.23 |
| 9 | 1990-02-14 00:00:00 | 4.94 | 8.00 | 2.89 | 18.04 | 4.47 | 0.52 | 6.10 | 332.01 | 6.20 |


In [None]:
prices = df
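
Because `read_csv` was called without any date handling, the dates sit in an ordinary column named `Unnamed: 0` rather than in the index. For alignment work it is usually more convenient to index the rows by date. The sketch below shows one way to do that and what label alignment then looks like. It assumes the first column of the file holds the dates, and it substitutes a small inline CSV (a few of the values shown above) for `stock_px.csv` so that it runs on its own; `raw`, `prices_by_date`, and `other` are hypothetical names.

```python
import io
import pandas as pd

# Stand-in for stock_px.csv: three columns and three rows of the data above
raw = io.StringIO(
    ",AA,AAPL,GE\n"
    "1990-02-01 00:00:00,4.98,7.86,2.87\n"
    "1990-02-02 00:00:00,5.04,8.00,2.87\n"
    "1990-02-05 00:00:00,5.07,8.18,2.87\n"
)

# Use the unnamed first column as the index and parse it as dates
prices_by_date = pd.read_csv(raw, index_col=0, parse_dates=True)

# A second, made-up frame that covers different dates and tickers
other = pd.DataFrame({"AAPL": [1.0, 1.0], "IBM": [1.0, 1.0]},
                     index=pd.to_datetime(["1990-02-02", "1990-02-06"]))

# Arithmetic aligns on both row and column labels; any label present in
# only one operand yields NaN in the result
aligned = prices_by_date * other
```

The result spans the union of the dates and the union of the tickers, and only the cell where both frames have data (AAPL on 1990-02-02) holds a number. With the real file, the same `index_col=0, parse_dates=True` arguments could be passed to the original `read_csv` call.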