# Introduction to Pandas

In the previous chapter, we dove into detail on NumPy and its ``ndarray`` object, which provides efficient storage and manipulation of dense typed arrays in Python.
Here we'll build on this knowledge by looking in detail at the data structures provided by the Pandas library.
Pandas is a newer package built on top of NumPy that provides an efficient implementation of a ``DataFrame``.
``DataFrame``s are essentially multidimensional arrays with attached row and column labels, often with heterogeneous types and/or missing data.
As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.
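
To make the description above concrete, here is a minimal sketch of a ``DataFrame`` with row and column labels, columns of different types, and one missing value. The column names and values are hypothetical, invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values, made up for illustration only
df = pd.DataFrame(
    {
        "label": ["low", "high", "mid"],    # text column
        "reading": [1.5, np.nan, 3.0],      # float column with one missing value
        "valid": [True, False, True],       # boolean column
    },
    index=["a", "b", "c"],                  # explicit row labels
)

print(df.dtypes)               # each column keeps its own type
print(df.loc["c"])             # select a row by its label
print(df["reading"].isna())    # flag the missing entry
```

Each column carries its own type, and the missing entry is stored as ``NaN`` rather than forcing the whole table into a single common type.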

As we saw, NumPy's ``ndarray`` data structure provides essential features for the type of clean, well-organized data typically seen in numerical computing tasks.
While it serves this purpose very well, its limitations become clear when we need more flexibility (e.g., attaching labels to data or working with missing data) and when we attempt operations that do not map well to element-wise broadcasting (e.g., groupings and pivots).
Each of these is an important part of analyzing the less structured data found in many forms in the world around us.
Pandas, and in particular its ``Series`` and ``DataFrame`` objects, builds on the NumPy array structure and provides efficient access to the sorts of "data munging" tasks that occupy much of a data scientist's time.
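
As a rough sketch of the kind of operation that does not map onto element-wise broadcasting, the following groups and pivots a small table by its labels. The records are made up for illustration; ``groupby`` and ``pivot`` are standard ``DataFrame`` methods.

```python
import pandas as pd

# Made-up records, for illustration only
records = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "kind": ["x", "x", "y", "y"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Grouping: sum the values that share a group label
totals = records.groupby("group")["value"].sum()
print(totals)

# Pivot: the labels in "kind" become the new column headers
wide = records.pivot(index="group", columns="kind", values="value")
print(wide)
```

Both operations are driven by the labels in the data rather than by array positions, which is what makes them awkward to express with a plain ``ndarray``.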

In this chapter, we will focus on the mechanics of using the ``Series``, ``DataFrame``, and related structures effectively.
We will use examples drawn from real datasets where appropriate, but these examples are not necessarily the focus.
As with the previous chapter on NumPy, the primary purpose of this chapter is to serve as a comprehensive introduction to the Python tools that will enable the more in-depth analyses and discussions of the second half of the book.

## Installing and Using Pandas

Installing Pandas on your system requires NumPy to be installed and, if you build the library from source, the appropriate tools to compile the C and Cython sources on which Pandas is built.
Details on this installation can be found in the Pandas documentation: http://pandas.pydata.org/.
If you followed the advice in the introduction and used the Anaconda stack, you will already have Pandas installed.
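
If you are unsure whether the packages are present, one way to check without importing them is sketched below. The helper ``is_installed`` is a hypothetical name introduced here for illustration; it relies only on the standard library's ``importlib.util.find_spec``.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


for name in ("numpy", "pandas"):
    print(name, "installed:", is_installed(name))
```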

Once Pandas is installed, you can import it and check the version:

In [1]:
import pandas
pandas.__version__

'0.16.1'

Just as we generally import NumPy under the alias ``np``, we will generally import Pandas under the alias ``pd``:

In [2]:
import pandas as pd

This import convention will be used throughout the remainder of this book.

## Reminder about Built-in Documentation

As you read through this chapter, don't forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature), as well as the documentation of various functions (using the "``?``" character).

For example, you can type
```
In [3]: pd.<TAB>
```
to display all the contents of the ``pandas`` namespace, and
```
In [4]: pd?
```
to display the built-in Pandas documentation.
More detailed documentation, along with tutorials and other resources, can be found at http://pandas.pydata.org/.
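
Outside of IPython, a rough equivalent of the two commands above can be sketched in plain Python with the built-in ``dir`` function and the standard library's ``inspect.getdoc``. This is an approximation for illustration, not how IPython implements these features.

```python
import inspect

import pandas as pd

# Rough equivalent of pd.<TAB>: list the public names in the namespace
public_names = [name for name in dir(pd) if not name.startswith("_")]
print(len(public_names), "public names, for example:", public_names[:5])

# Rough equivalent of the ? character: fetch a docstring as plain text
doc = inspect.getdoc(pd.DataFrame) or ""
print("\n".join(doc.splitlines()[:5]))  # show only the first few lines
```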