# Pandas



In this section we will talk about the pandas library for Python. Pandas is built largely on top of the NumPy library, but it offers more functionality and additional data types that make it convenient to manipulate large datasets.

In short, you can think of it as a more powerful version of Excel.

## Installation

For those who are not using Jupyter Notebook, installation from the command line is necessary.

### Anaconda distribution:
    conda install pandas
### Python-pip:
    pip install pandas

#### Importing NumPy as np and pandas as pd (by convention), and also importing the Series class from pandas:

In [1]:
import numpy as np
import pandas as pd

from pandas import Series

## Series
 

A Series is very similar to a NumPy array. However, it uses labels rather than only integer positions to specify the location of an item. It can also hold any type of Python object as a value.
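
To make the last point concrete, here is a minimal sketch of a Series that mixes several Python types. The variable name `mixed` and the sample values are made up for illustration; they are not part of the examples that follow.

```python
import pandas as pd

# A Series whose values are of different Python types (int, str, list).
# pandas falls back to the generic "object" dtype to store them.
mixed = pd.Series(data=[1, "two", [3, 4]], index=["a", "b", "c"])

print(mixed.dtype)   # object
print(mixed["b"])    # two  (looked up by label, not by position)
print(mixed["c"])    # [3, 4]
```

Mixed-type Series are convenient, but numeric operations are generally faster on a Series with a single numeric dtype such as `float64`.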

Let's jump into the examples to explain the concept!

Creating a Series from a list:

In [2]:
list_1 = [3.0,2.5,3.25,3.8]
list_1

[3.0, 2.5, 3.25, 3.8]

In [3]:
my_series = Series(data=list_1,index=['A','B','C','D'])
my_series

A    3.00
B    2.50
C    3.25
D    3.80
dtype: float64

Creating a Series from an array:

In [4]:
array = np.array([3.0,2.5,3.25,3.8])

my_series = Series(array)
my_series

0    3.00
1    2.50
2    3.25
3    3.80
dtype: float64

In [5]:
my_series = Series(array,index=['A','B','C','D'])
my_series

A    3.00
B    2.50
C    3.25
D    3.80
dtype: float64

Creating a Series from a dictionary (the keys become the labels):

In [6]:
dict_1 = {'A':3.0,'B':2.50,'C':3.25,'D':3.8}


my_series = Series(dict_1)
my_series

A    3.00
B    2.50
C    3.25
D    3.80
dtype: float64
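
When a dictionary is combined with an explicit `index`, pandas looks up each requested label in the dictionary. The sketch below illustrates this; the names `grades` and `subset` and the extra label `'E'` are assumptions chosen for the example.

```python
import pandas as pd

grades = {"A": 3.0, "B": 2.5, "C": 3.25, "D": 3.8}

# Only the labels listed in `index` are kept, in the order given.
# 'E' is not a key of the dictionary, so its value becomes NaN.
subset = pd.Series(grades, index=["D", "A", "E"])
print(subset)
```

This is a handy way to select and reorder entries at construction time, at the cost of getting `NaN` for any label that is missing from the dictionary.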

In [7]:
my_series['A']

3.0

In [8]:
my_series[['A','B']]

A    3.0
B    2.5
dtype: float64

In [9]:
my_series>3.0

A    False
B    False
C     True
D     True
dtype: bool

In [10]:
my_series[my_series>3.0]  

C    3.25
D    3.80
dtype: float64
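
Boolean masks like the one above can also be combined. The following sketch assumes the same four values; the names `grades`, `mid_range` and `outside` are illustrative. Note that pandas uses `&`, `|` and `~` here instead of Python's `and`, `or` and `not`, and that each condition needs its own parentheses.

```python
import pandas as pd

grades = pd.Series([3.0, 2.5, 3.25, 3.8], index=["A", "B", "C", "D"])

# Keep values strictly between 2.75 and 3.5.
in_range = (grades > 2.75) & (grades < 3.5)
mid_range = grades[in_range]

# `~` negates the mask, selecting everything else.
outside = grades[~in_range]

print(mid_range)  # labels A and C
print(outside)    # labels B and D
```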

In [11]:
'A' in my_series

True

In [12]:
my_series.values

array([3.  , 2.5 , 3.25, 3.8 ])

In [13]:
my_series.index

Index(['A', 'B', 'C', 'D'], dtype='object')
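
Since a Series has both labels and positions, it can help to state explicitly which one is meant. A minimal sketch using the `.loc` (label-based) and `.iloc` (position-based) accessors, with an illustrative Series named `grades`:

```python
import pandas as pd

grades = pd.Series([3.0, 2.5, 3.25, 3.8], index=["A", "B", "C", "D"])

by_label = grades.loc["B"]       # 2.5, looked up by label
by_position = grades.iloc[1]     # 2.5, looked up by integer position

# Label slices include the end label; position slices exclude the end.
label_slice = grades.loc["A":"C"]    # A, B, C
position_slice = grades.iloc[0:2]    # A, B

print(by_label, by_position)
print(label_slice)
print(position_slice)
```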

### How to modify the index?

In [14]:
names = ['Berk','John','Tarik','Theresa']

In [15]:
my_series = Series(my_series.values,index=names,
                  name='Grades')
my_series

Berk       3.00
John       2.50
Tarik      3.25
Theresa    3.80
Name: Grades, dtype: float64

We can name the index (labels) too:

In [16]:
my_series.index.name = 'Names'
my_series

Names
Berk       3.00
John       2.50
Tarik      3.25
Theresa    3.80
Name: Grades, dtype: float64
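
Building a new Series is not the only way to change the labels. As a sketch of two alternatives, the example below renames a few labels with `rename` and then replaces the whole index by assignment; the Series `grades` and the variable `renamed` are illustrative names.

```python
import pandas as pd

grades = pd.Series([3.0, 2.5, 3.25, 3.8],
                   index=["A", "B", "C", "D"], name="Grades")

# Option 1: rename selected labels with a mapping. This returns a new
# Series and leaves `grades` unchanged.
renamed = grades.rename({"A": "Berk", "B": "John"})

# Option 2: replace the whole index in place. The new list must have
# the same length as the Series.
grades.index = ["Berk", "John", "Tarik", "Theresa"]
grades.index.name = "Names"

print(renamed)
print(grades)
```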

### How to use the index to select a value?

In [17]:
my_series['Berk']

3.0

Arithmetic operations are aligned on the labels:

In [18]:
series_2 = Series(data=[2.0, 3.0, 2.5, 2.0], index=names)
series_2

Berk       2.0
John       3.0
Tarik      2.5
Theresa    2.0
dtype: float64

In [19]:
new_series = my_series + series_2
new_series

Names
Berk       5.00
John       5.50
Tarik      5.75
Theresa    5.80
dtype: float64
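
To see that the matching really happens by label and not by position, consider two Series whose labels are the same but stored in a different order. This is a small sketch with made-up names (`a`, `b`, `result`) and values:

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0], index=["z", "y", "x"])

# Values are paired by label: 'x' with 'x', 'y' with 'y', 'z' with 'z'.
result = a + b

print(result["x"])  # 31.0  (1.0 + 30.0)
print(result["y"])  # 22.0  (2.0 + 20.0)
print(result["z"])  # 13.0  (3.0 + 10.0)
```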

What if the labels of the two Series do not match completely? Any label that appears in only one of them gets NaN in the result:

In [20]:
names_2 = ['Berk','John','Tarik','Barbara']

series_2 = Series(data=[2.0, 3.0, 2.5, 2.0], index=names_2)

new_series = my_series + series_2
new_series

Barbara     NaN
Berk       5.00
John       5.50
Tarik      5.75
Theresa     NaN
dtype: float64
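
If NaN is not the desired result for unmatched labels, one option is the `add` method with its `fill_value` parameter, which substitutes a default for a label that is missing on one side. The sketch below rebuilds the two Series under the illustrative names `s1` and `s2`; choosing `0` as the fill value is an assumption that a missing grade should count as zero.

```python
import pandas as pd

s1 = pd.Series([3.0, 2.5, 3.25, 3.8],
               index=["Berk", "John", "Tarik", "Theresa"])
s2 = pd.Series([2.0, 3.0, 2.5, 2.0],
               index=["Berk", "John", "Tarik", "Barbara"])

# A label missing from one Series is treated as 0 before adding.
total = s1.add(s2, fill_value=0)

print(total["Barbara"])  # 2.0  (0 + 2.0)
print(total["Theresa"])  # 3.8  (3.8 + 0)
print(total["Berk"])     # 5.0
```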

### How to find NaN values?

In [24]:
new_series.isnull()  # find null values

Barbara     True
Berk       False
John       False
Tarik      False
Theresa     True
dtype: bool

In [22]:
new_series[new_series.isnull()]

Barbara   NaN
Theresa   NaN
dtype: float64

In [23]:
new_series

Barbara     NaN
Berk       5.00
John       5.50
Tarik      5.75
Theresa     NaN
dtype: float64
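
Once the NaN values are located, they are usually counted, dropped, or replaced. The sketch below shows one way to do each on a Series with the same shape as `new_series`; the name `totals` and the replacement value `0.0` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

totals = pd.Series([np.nan, 5.0, 5.5, 5.75, np.nan],
                   index=["Barbara", "Berk", "John", "Tarik", "Theresa"])

n_missing = int(totals.isnull().sum())   # count the NaN entries
present = totals[totals.notnull()]       # keep only the non-null entries
dropped = totals.dropna()                # same result, more direct
filled = totals.fillna(0.0)              # replace NaN with a default

print(n_missing)  # 2
print(dropped)
print(filled)
```

None of these calls modifies `totals`; each one returns a new object.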