# Pandas

Pandas is the data manipulation package of choice for many Python users. It is built on top of NumPy and provides an efficient implementation of the DataFrame: essentially a two-dimensional array with attached row and column labels, often holding heterogeneous types and missing data. Besides offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.
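
To make the idea of labeled, heterogeneous data with missing values concrete, here is a minimal sketch. The `people` frame and its values are hypothetical and are not taken from the Titanic file used later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, made up for illustration only.
people = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy"],    # strings
        "age": [29.0, np.nan, 41.0],     # floats, one value missing
        "member": [True, False, True],   # booleans
    },
    index=["a", "b", "c"],               # explicit row labels
)

print(people.dtypes)            # each column keeps its own type
print(people.loc["b", "age"])   # label-based lookup of the missing entry
print(people["age"].mean())     # missing values are skipped by default
```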

Let's work through a sample DataFrame using the Pandas library. We'll start by importing the library and creating a DataFrame from the Titanic passenger list.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
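# Read the 'titanic3' sheet, treating 'NA' cells as missing (.xls files need the xlrd package)
titanic_df = pd.read_excel('titanic3.xls', sheet_name='titanic3', index_col=None, na_values=['NA'])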

In [None]:
titanic_df.head()

In [None]:
titanic_df.describe()

In [None]:
# drop() returns a new DataFrame; titanic_df itself is unchanged
titanic_df.drop(columns=['ticket', 'cabin', 'boat', 'body']).head()

In [None]:
titanic_df.isnull()

In [None]:
titanic_df.isnull().sum()

In [None]:
# Bar chart of passenger counts per survival flag (0 = died, 1 = survived)
titanic_df['survived'].value_counts().plot.bar()

In [None]:
# survived is coded 0/1, so its mean is the overall survival rate
titanic_df['survived'].mean()

In [None]:
titanic_df.groupby(['sex','pclass'])['survived'].mean()
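
The grouped mean above follows the split-apply-combine pattern. The sketch below reproduces it on a hypothetical six-row table (`toy`, with made-up values) so the result can be checked by hand. The `unstack` step is an optional extra for readability, not part of the original cell.

```python
import pandas as pd

# Hypothetical miniature passenger table; the values are made up for illustration.
toy = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "pclass": [1, 3, 1, 3, 3, 1],
    "survived": [1, 0, 0, 0, 1, 1],
})

# Split rows by (sex, pclass), then average the 0/1 survival flag within each group.
rates = toy.groupby(["sex", "pclass"])["survived"].mean()

# The result is a Series with a two-level index; unstack() pivots pclass into columns.
table = rates.unstack("pclass")
print(table)
```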

In [None]:
# count() tallies non-null entries only
titanic_df['sex'].count()

In [None]:
titanic_df[titanic_df['age'] < 18]
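
The filter above works through a boolean mask. One detail matters when a column has missing values: a comparison against `NaN` evaluates to `False`, so rows with an unknown age are left out of the result. A minimal sketch on hypothetical data (the `toy` frame and its values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical ages, one of them missing.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [4.0, np.nan, 17.0, 35.0],
})

mask = toy["age"] < 18    # boolean Series; the NaN row compares as False
minors = toy[mask]        # keeps only the rows where the mask is True
adults = toy[toy["age"] >= 18]

print(minors["name"].tolist())
print(len(minors) + len(adults))  # fewer than len(toy): the NaN row is in neither subset
```

If rows with unknown ages need to be tracked separately, `toy[toy["age"].isnull()]` selects them.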