# Introduction to Pandas

**Pandas** is a package for data manipulation and analysis in Python. The name Pandas is derived from **Pan**el **Da**ta.

Pandas adds two data structures to Python:
* Pandas Series
* Pandas DataFrame

These data structures allow us to work with labelled and relational data in an easy and intuitive manner.

## Why use Pandas?
The recent success of machine learning algorithms is partly due to the huge amounts of data available to train them.
However, when it comes to data, quantity is not the only thing that matters; the quality of the data is just as important. Large datasets often don't come ready to be fed into learning algorithms. They frequently contain incorrect values, missing values, outliers, and so on.
An important step in machine learning is to look at your data first and make sure it is well suited to your training algorithm by doing some basic data analysis.
This is where Pandas comes in.
Pandas Series and Pandas DataFrames are designed for fast data analysis and manipulation. They are also flexible and easy to use.
Below are some features that make Pandas an excellent package for data analysis.
* Allows the use of labels for rows and columns
* Can calculate rolling statistics on time series data
* Handles NaN values easily
* Can load data of different formats into DataFrames
* Can join and merge different datasets
* Integrates with NumPy and Matplotlib
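
As a small illustration of a few of these features (labels, NaN handling, and rolling statistics), here is a minimal sketch. The temperature readings and the variable names are made up for this example and are not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperature readings with one missing value
temps = pd.Series(
    [20.0, 22.0, np.nan, 24.0, 26.0],
    index=["mon", "tue", "wed", "thu", "fri"],
)

# Count the missing values
n_missing = temps.isna().sum()

# Fill the gap with the mean of the observed values
filled = temps.fillna(temps.mean())

# Two-day rolling mean over the filled data
rolling = filled.rolling(window=2).mean()

print("Missing values:", n_missing)
print("Filled value for wed:", filled["wed"])
print("Rolling mean on fri:", rolling["fri"])
```

Filling with the mean is only one possible choice; whether it is appropriate depends on the data.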

## Creating a Pandas Series

A Pandas Series is a one-dimensional array-like object that can hold many data types.

One main difference between a Pandas Series and a NumPy ndarray is that you can assign an index label to each element of the Series (that is, you can name the indices of your Pandas Series).

Another big difference between the two is that a Pandas Series can contain elements of different data types.
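
The difference in how mixed data is stored can be seen in a short sketch. It assumes NumPy's default behaviour when no `dtype` is given (an ndarray can also hold mixed values if `dtype=object` is requested explicitly); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

mixed = [30, 6, "Yes", "No"]

# By default NumPy picks a single dtype for the whole array,
# so the integers are converted to strings
arr = np.array(mixed)
print(arr.dtype.kind)  # 'U' means a Unicode string dtype
print(repr(arr[0]))

# A Series stores these values as Python objects, so each keeps its own type
ser = pd.Series(mixed, index=["eggs", "apples", "milk", "bread"])
print(ser.dtype)
print(type(ser["eggs"]), type(ser["milk"]))
```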

Let's create a Pandas Series.

The command `pd.Series(data, index)` creates a Pandas Series, where `data` holds the values and `index` is the list of index labels.

We can create a Pandas Series to store a grocery list.

In [3]:
import pandas as pd

groceries = pd.Series(data=[30, 6, "Yes", "No"], index=["eggs", "apples", "milk", "bread"])

groceries

eggs       30
apples      6
milk      Yes
bread      No
dtype: object

> Pandas Series are displayed with indices in the first column and the data in the second column.
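
The `index` argument is optional. As a brief sketch with illustrative variable names, the code below shows what happens when it is left out, and how a dictionary can supply the labels and the data together.

```python
import pandas as pd

# No index given: pandas assigns the default integer labels 0, 1, 2, ...
default_idx = pd.Series([30, 6, "Yes", "No"])
print(list(default_idx.index))

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"eggs": 30, "apples": 6})
print(list(from_dict.index))
print(from_dict["eggs"])
```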

Just like NumPy ndarrays, Pandas Series have attributes that allow us to get information from the series in an easy way.

In [4]:
print("Groceries has shape {}".format(groceries.shape))
print("Groceries has dimension {}".format(groceries.ndim))
print("Groceries has a total of {} elements".format(groceries.size))

Groceries has shape (4,)
Groceries has dimension 1
Groceries has a total of 4 elements


We can also print the index labels and the data separately. This is helpful if you don't happen to know what the index labels of the Pandas Series are.

In [5]:
print("The data in groceries is {}.".format(groceries.values))
print("The index of groceries is {}.".format(groceries.index))

The data in groceries is [30 6 'Yes' 'No'].
The index of groceries is Index(['eggs', 'apples', 'milk', 'bread'], dtype='object').


If you are dealing with a very large Pandas Series and you're not sure whether an index label exists, you can check using the `in` operator.

In [6]:
print("Is bananas an index label in groceries? {}.".format("bananas" in groceries))
print("Is bread an index label in groceries? {}.".format("bread" in groceries))

Is bananas an index label in groceries? False.
Is bread an index label in groceries? True.
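
Note that `in` tests the index labels, not the stored values. The sketch below recreates the grocery Series so that it runs on its own, and uses `isin()` as one possible way to test the values instead.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, "Yes", "No"], index=["eggs", "apples", "milk", "bread"])

# `in` looks at the index labels only
label_found = "milk" in groceries
value_as_label = "Yes" in groceries

# To test the stored values, compare against the data itself
value_found = groceries.isin(["Yes"]).any()

print("Is milk a label? {}".format(label_found))
print("Is Yes a label? {}".format(value_as_label))
print("Is Yes a value? {}".format(value_found))
```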


## Accessing and deleting elements in a Pandas Series

One great advantage of a Pandas Series is that it allows us to access data in many different ways.

Elements can be accessed using index labels or numerical indices inside square brackets. Positive indices count from the beginning of the Series and negative indices count from the end.

Because there are different ways to access elements, Pandas Series provide a way to remove any ambiguity. Two attributes allow us to state explicitly what we mean:
* `.loc` - stands for location. It explicitly states that we are using a label index.
* `.iloc` - stands for integer location. It explicitly states that we are using a numerical index.

Let's see some examples.

In [10]:
print("How many eggs do we need to buy? {}".format(groceries["eggs"]))
print()
print("Do we need milk and bread? \n {}".format(groceries[["milk", "bread"]]))
print()
print("How many eggs and apples do we need to buy? \n {}".format(groceries[["eggs", "apples"]]))
print()
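# Positional access uses .iloc; plain integer keys on a label-indexed Series are deprecated in recent pandas versions
print("How many eggs and apples do we need to buy? \n {}".format(groceries.iloc[[0, 1]]))
print()
print("Do we need bread? {}".format(groceries.iloc[-1]))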
print()

How many eggs do we need to buy? 30

Do we need milk and bread? 
 milk     Yes
bread     No
dtype: object

How many eggs and apples do we need to buy? 
 eggs      30
apples     6
dtype: object

How many eggs and apples do we need to buy? 
 eggs      30
apples     6
dtype: object

Do we need bread? No
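
The `.loc` and `.iloc` attributes described above can be sketched as follows. The grocery Series is recreated so the example runs on its own, and the second Series, `shuffled`, is a made-up example showing why the distinction matters when the labels are themselves integers.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, "Yes", "No"], index=["eggs", "apples", "milk", "bread"])

# .loc selects by index label
print(groceries.loc["eggs"])
print(groceries.loc[["eggs", "apples"]])

# .iloc selects by integer position, counting from 0
# (or from the end with negative numbers)
print(groceries.iloc[0])
print(groceries.iloc[[2, 3]])
print(groceries.iloc[-1])

# With integer labels, label and position can disagree
shuffled = pd.Series(["a", "b", "c"], index=[2, 0, 1])
print(shuffled.loc[0])   # the element whose label is 0
print(shuffled.iloc[0])  # the element in the first position
```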



Pandas Series are mutable. We can change elements after the Series has already been created.

In [13]:
print("Original groceries list: \n {}".format(groceries))

groceries["eggs"] = 82
print()
print("Modified groceries list: \n {}".format(groceries))

Original groceries list: 
 eggs       30
apples      6
milk      Yes
bread      No
dtype: object

Modified groceries list: 
 eggs       82
apples      6
milk      Yes
bread      No
dtype: object


We can delete items from Pandas Series using the `drop()` method. By default, this method drops elements **out of place**, meaning that the original Series will not be altered.

In [14]:
print("Original groceries list: \n {}".format(groceries))

print()
print("remove apples (out of place): \n {}".format(groceries.drop("apples")))
print()
print("groceries list after the drop: \n {}".format(groceries))

Original groceries list: 
 eggs       82
apples      6
milk      Yes
bread      No
dtype: object

remove apples (out of place): 
 eggs      82
milk     Yes
bread     No
dtype: object

groceries list after the drop: 
 eggs       82
apples      6
milk      Yes
bread      No
dtype: object


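To delete items in place, we set the `inplace` keyword to `True`. In that case `drop()` modifies the Series directly and returns `None`, which is why the second print statement below shows `None`.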

In [15]:
print("Original groceries list: \n {}".format(groceries))

print()
print("remove apples (in place): \n {}".format(groceries.drop("apples", inplace=True)))
print()
print("groceries list after the drop: \n {}".format(groceries))

Original groceries list: 
 eggs       82
apples      6
milk      Yes
bread      No
dtype: object

remove apples (in place): 
 None

groceries list after the drop: 
 eggs      82
milk     Yes
bread     No
dtype: object
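
An alternative to `inplace=True` is to assign the result of `drop()` back to a name. The sketch below is one way to do this, not the only one; it recreates the Series so that it runs on its own and also shows that `drop()` accepts a list of labels. The name `trimmed` is illustrative.

```python
import pandas as pd

groceries = pd.Series(data=[82, 6, "Yes", "No"], index=["eggs", "apples", "milk", "bread"])

# drop() accepts a list of labels and returns a new Series
trimmed = groceries.drop(["apples", "bread"])
print(list(trimmed.index))

# The original still has all four elements at this point
original_size = groceries.size
print(original_size)

# Rebinding the name has the same visible effect as dropping in place
groceries = groceries.drop("apples")
print(list(groceries.index))
```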


## Arithmetic Operations on Pandas Series