

<img src="https://pandas.pydata.org/_static/pandas_logo.png"/>


_pandas_ is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

pandas is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics. In this tutorial, we will learn the main features of pandas and how to use them in practice.

## Pandas Key Features

- Fast and efficient __DataFrame__ object with default and customized indexing.
- Tools for loading data into __in-memory__ data objects from different file formats.
- Data alignment and integrated handling of missing data.
- __Reshaping__ and pivoting of data sets.
- Label-based slicing, __indexing__ and subsetting of large data sets.
- __Columns__ from a data structure can be deleted or inserted.
- Group by data for __aggregation__ and transformations.
- High performance __merging and joining__ of data.
- __Time Series__ functionality.
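
Two of the features listed above, data alignment and missing-data handling, can be seen in a few lines. This is only a minimal sketch: the Series `a` and `b` and their values are made up for illustration.

```python
import pandas as pd

# Two Series whose indexes only partly overlap (hypothetical data)
a = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0], index=['y', 'z'])

# Arithmetic aligns on index labels, not on position;
# a label present in only one operand yields NaN
total = a + b
print(total)

# Missing values can then be filled (or dropped) explicitly
filled = total.fillna(0.0)
print(filled)
```

Filling with `0.0` is just one possible choice here; `total.dropna()` would discard the unmatched label instead.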


### Pandas Data Structures

The main data structures used by pandas are:


| Data Structure | Dimensions | Description                                                     |
|----------------|------------|-----------------------------------------------------------------|
| Series         | 1          | 1D labeled, homogeneous array                                   |
| DataFrame      | 2          | General 2D labeled tabular structure                            |
| Panel          | 3          | General 3D labeled, size-mutable array (removed in pandas 0.25) |

The most common data structure in analytical use is the __DataFrame__.

A __Series__ is basically a list of objects of the same data type.

A __DataFrame__ is a group of Series that are not necessarily of the same data type.
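
The relationship between the two structures can be checked directly. The following is a small sketch with made-up sample data: selecting a single column of a DataFrame gives back a Series, and each column keeps its own data type.

```python
import pandas as pd

# Hypothetical sample data: one text column and one integer column
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [20, 32]})

# Selecting one column returns a Series
ages = df['Age']
print(type(ages).__name__)

# Each column carries its own dtype
print(df.dtypes)
```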



### pandas.Series

A Series is an _indexed_ list of values of the same type and of fixed length.

Let's see some ways to create a series in pandas.


In [2]:
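# creating an empty series

# import NumPy and pandas under their conventional aliases
import numpy as np
import pandas as pd

# pass the dtype explicitly: the default dtype of an empty Series differs between pandas versions
s = pd.Series(dtype='float64')
print(s)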

Series([], dtype: float64)


In [3]:
# creating a series from a NumPy array. Remember that a series has an index!
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data, index=[100, 101, 102, 103])
print(s)

100    a
101    b
102    c
103    d
dtype: object


In [4]:
# creating a series from a Python dictionary, using the keys as the index
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data)
print(s)

a    0.0
b    1.0
c    2.0
dtype: float64
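
If an explicit `index` is passed together with a dictionary, the values are looked up by key in the order given, and a label with no matching key becomes `NaN`. A short sketch reusing the dictionary from the cell above; the label `'d'` is deliberately absent from it:

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}

# The index controls both the order and which keys are used;
# 'd' is not a key of data, so its value is NaN
s = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s)
```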


In [7]:
# create a series that repeats a scalar value across the index
s = pd.Series(5, index=range(10))
print(s)

0    5
1    5
2    5
3    5
4    5
5    5
6    5
7    5
8    5
9    5
dtype: int64


#### Accessing Series Data

In [11]:
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# positional access with .iloc, using Python's list-slice notation
print(s.iloc[0], '\n')
print(s.iloc[:3])

1 

a    1
b    2
c    3
dtype: int64


In [15]:
# get elements using their index labels:
print(s['b'], '\n')
print(s[['a', 'b', 'd']])

2 

a    1
b    2
d    4
dtype: int64
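
Positional and label-based access can also be written explicitly with `.iloc` and `.loc`. The sketch below uses the same Series as the cells above and highlights one difference: a positional slice excludes its end point, while a label slice includes it.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# .iloc slices by position; the end position is excluded
by_position = s.iloc[1:3]

# .loc slices by label; the end label is included
by_label = s.loc['b':'d']

print(by_position)
print(by_label)
```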


### The DataFrame

A pandas DataFrame is a tabular data structure that combines indexed rows and named columns.

A DataFrame can be created from lists, dictionaries, Series, NumPy ndarrays, another DataFrame, or directly from files and databases.


__Examples:__

In [17]:
# create a data frame from a list of lists, naming the columns explicitly
import pandas as pd
data = [['Alice', 20], ['Bob', 32], ['Charlie', 25]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

      Name  Age
0    Alice   20
1      Bob   32
2  Charlie   25


In [18]:
# create a data frame from a dictionary; each key becomes a column
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'],
        'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42


In [20]:
# create a data frame from a list of dictionaries (e.g. a JSON list);
# keys missing from a dictionary are filled with NaN
data = [{'name': 'Felix', 'Age': 22},
        {'name': 'Joe',   'Age': 19},
        {'name': 'Alexa', 'Age': 28, 'Title': 'CEO'},
       ]
df = pd.DataFrame(data)
print(df)

    name  Age Title
0  Felix   22   NaN
1    Joe   19   NaN
2  Alexa   28   CEO


In [21]:
# another data frame built from a list of lists
import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13
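
The examples above build data frames from lists and dictionaries. Series and NumPy ndarrays, which are also named as possible sources, work in a similar way. This is a minimal sketch with made-up values; the row labels `r1` to `r3` and the column names are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# From a dictionary of Series: rows are aligned on the index labels,
# so the label 'r3', which is missing from names, gets NaN in 'Name'
names = pd.Series(['Alice', 'Bob'], index=['r1', 'r2'])
ages = pd.Series([20, 32, 25], index=['r1', 'r2', 'r3'])
df_from_series = pd.DataFrame({'Name': names, 'Age': ages})
print(df_from_series)

# From a 2D NumPy array: column names are supplied separately
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=['x', 'y'])
print(df_from_array)
```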
