# 1 - Introduction to Pandas

## What is Pandas?

According to the project's GitHub page:

- Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. 
- It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.
- It has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

Pandas is well suited for handling:

- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with or without row and column labels.
- Ordered and unordered (not necessarily fixed-frequency) time series data.
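A minimal sketch of these three cases, using made-up column names, values and dates for illustration. A `DataFrame` holds columns of different types under explicit row labels, a plain NumPy array can be wrapped with labels on both axes, and a `Series` can be indexed by timestamps that are not evenly spaced:

```python
import numpy as np
import pandas as pd

# Tabular data: each column keeps its own dtype, and rows carry explicit labels
people = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "age": [34, 28, 45],
        "score": [88.5, 92.0, 79.25],
    },
    index=["p1", "p2", "p3"],
)
print(people.dtypes)

# Matrix data: a homogeneous 2-D NumPy array, with labels added on both axes
matrix = pd.DataFrame(
    np.arange(6).reshape(2, 3), index=["r1", "r2"], columns=["c1", "c2", "c3"]
)

# Time series: timestamps as the index, not necessarily at a fixed frequency
ts = pd.Series(
    [1.0, 2.5, 3.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-10"]),
)
print(ts)
```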

## Why use pandas?

It is a great tool for:

- exploratory data analysis
- creating ETL pipelines
- building data science products

It integrates nicely with:

- interactive environments like Jupyter Notebook
- popular Python packages like statsmodels, scikit-learn, Matplotlib and Seaborn
- big data tooling like Apache Spark


## Key features (1/2)

* Easy handling of __missing data__
* __Size mutability__: columns can be inserted into and deleted from DataFrame objects
* Automatic and explicit __data alignment__: objects can be explicitly aligned to a set of labels, or the data can be aligned automatically
* Powerful, flexible __group by functionality__ to perform split-apply-combine operations on data sets
* Intelligent label-based __slicing, fancy indexing, and subsetting__ of large data sets
* Intuitive __merging and joining__ of data sets
* Flexible __reshaping and pivoting__ of data sets
* __Hierarchical labeling__ of axes
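A compact, illustrative tour of several of these features on toy data (all variable names and values are hypothetical). It sticks to core pandas calls; the exact printed layout may differ slightly between pandas versions.

```python
import numpy as np
import pandas as pd

# Missing data: NaN marks a missing value, and reductions skip it by default
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])
n_missing = s.isna().sum()   # 1
filled = s.fillna(0.0)       # NaN replaced with 0.0
print(n_missing, s.mean())   # the mean ignores the NaN

# Data alignment: arithmetic matches values by index label, not by position
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
aligned = s1 + s2            # labels "a" and "d" have no partner, so they become NaN
print(aligned)

# Size mutability: insert a column, then delete it again
df = pd.DataFrame({"key": ["x", "y", "x", "y"], "val": [1, 2, 3, 4]})
df["val_sq"] = df["val"] ** 2
del df["val_sq"]

# Group by: split on "key", sum "val" within each group, combine the results
totals = df.groupby("key")["val"].sum()
print(totals)

# Label-based slicing: with .loc the end label is included
first_two = s1.loc["a":"b"]

# Merging: attach a label to each row via a lookup table
lookup = pd.DataFrame({"key": ["x", "y"], "label": ["first", "second"]})
merged = df.merge(lookup, on="key", how="left")

# Reshaping: long format -> wide format with one column per city
long = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "city": ["A", "B", "A", "B"],
    "temp": [20, 25, 21, 24],
})
wide = long.pivot(index="date", columns="city", values="temp")
print(wide)

# Hierarchical labeling: a two-level MultiIndex on the rows
temps = long.set_index(["date", "city"])["temp"]
print(temps.loc[("d2", "A")])
```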


## Key features (2/2)

* Robust __IO tools__ for loading data from:
    * flat files 
    * Excel files 
    * databases
    * HDF5
* __Time series functionality__: date range generation and frequency conversion, moving window statistics, date shifting and lagging, etc.
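A small sketch of the IO and time series features on made-up data. An in-memory `io.StringIO` buffer stands in for a CSV file so the example needs nothing on disk; the readers for Excel files, databases and HDF5 (`pd.read_excel`, `pd.read_sql`, `pd.read_hdf`) follow the same pattern but may need optional dependencies (for example openpyxl for Excel or PyTables for HDF5).

```python
import io

import pandas as pd

# IO: round-trip a DataFrame through CSV (the buffer stands in for a flat file)
buf = io.StringIO()
pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}).to_csv(buf, index=False)
buf.seek(0)
loaded = pd.read_csv(buf)

# Date range generation: 10 consecutive calendar days
idx = pd.date_range("2024-01-01", periods=10, freq="D")
ts = pd.Series(range(10), index=idx, dtype="float64")

# Frequency conversion: downsample daily values to 5-day means
five_day = ts.resample("5D").mean()

# Moving window statistics: 3-day rolling mean (the first two windows are incomplete, so NaN)
rolling = ts.rolling(window=3).mean()

# Shifting / lagging: align each day with the previous day's value
lagged = ts.shift(1)
print(pd.DataFrame({"value": ts, "rolling_3d": rolling, "lag_1d": lagged}))
```

Here `resample` groups the daily observations into 5-day bins starting on the first date, so the two bins average the values 0 to 4 and 5 to 9 respectively.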

This lecture is meant to provide you with a broad overview of the functionality.

Remember: to fully wield the power of pandas takes time and practice! 

## Check out [pandas.pydata.org](http://pandas.pydata.org/)

It has plenty of good resources; especially valuable are:

- [API reference](http://pandas.pydata.org/pandas-docs/stable/api.html) 
- [Cookbook](http://pandas.pydata.org/pandas-docs/stable/cookbook.html)
- [Tutorials](http://pandas.pydata.org/pandas-docs/stable/tutorials.html)

Also on [GitHub](https://github.com/pydata/pandas)!