# Pandas
---

> **Pandas** is an open-source Python library that provides high-performance data manipulation and analysis tools built around its data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets.  
Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data:
1. load
2. prepare 
3. manipulate
4. model
5. analyze


## Key Features of Pandas
---
- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing and subsetting of large data sets.
- Columns from a data structure can be deleted or inserted.
- Group by data for aggregation and transformations.
- High performance merging and joining of data.
- Time Series functionality.
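
A few of the features listed above can be sketched in a handful of lines. This is a minimal illustration, not a complete tour: the `city` and `sales` columns and their values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and values are invented for illustration.
df = pd.DataFrame(
    {
        "city": ["Pune", "Pune", "Delhi", "Delhi"],
        "sales": [10.0, np.nan, 7.0, 5.0],
    }
)

# Integrated handling of missing data: replace NaN with 0.
df["sales"] = df["sales"].fillna(0.0)

# Columns can be inserted and deleted.
df["sales_doubled"] = df["sales"] * 2
df = df.drop(columns=["sales_doubled"])

# Group by a column and aggregate each group.
totals = df.groupby("city")["sales"].sum()
print(totals)
```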

## Installation
---
Using pip (the Python package installer):  
```pip install pandas```  
*If you use WinPython, pandas is already bundled and does not need to be installed separately.*


## Data Structures in Pandas
---
Pandas has historically offered the following three data structures, all built on top of NumPy arrays:  

| SN | Data Structure | Dimension | Description |
|:----:|:----------------:|:-----------:|:-------------|
| 1  | Series         |     1     | 1D, homogeneous data, size immutable, values mutable |
| 2  | DataFrame      |     2     | 2D, heterogeneous data, size mutable, values mutable |
| 3  | Panel          |     3     | 3D, heterogeneous data, size mutable, values mutable. Deprecated in pandas 0.20 and removed in 0.25 |
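
As a rough check of the table above, the `ndim` attribute reports the number of dimensions of each structure. The sketch below uses made-up values and covers only Series and DataFrame, since Panel is no longer available in current pandas releases.

```python
import pandas as pd

series = pd.Series([1, 2, 3])
frame = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.5]})

print(series.ndim)   # number of dimensions of the Series
print(frame.ndim)    # number of dimensions of the DataFrame
print(frame.dtypes)  # each DataFrame column keeps its own dtype

# Values are mutable: overwrite the element labelled 0.
series[0] = 10
print(series)
```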



### Series 
---
- One-dimensional labeled array
- Values can be of any type: integers, strings, floats, Python objects, etc.
- Every value is associated with an index label

#### Creating a series
--- 
``` pandas.Series(data, index, dtype, copy) ```  
where
- data : the values, given as an ndarray, list, dictionary, or scalar constant
- index : Index labels must be hashable and the same length as data; they do not have to be unique. Defaults to np.arange(n) if no index is passed.
- dtype : the data type. If None, the data type is inferred
- copy : whether to copy the input data. Default False
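
The parameters above can be tried out as follows. This is a small sketch with arbitrary values; the variable names are chosen only for the example.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3])

# No index passed: labels default to 0, 1, 2.
s_default = pd.Series(values)

# Explicit index and dtype: labels a, b, c and float64 values.
s_custom = pd.Series(values, index=["a", "b", "c"], dtype="float64")

# copy=True makes the Series independent of the source array.
s_copy = pd.Series(values, copy=True)
values[0] = 99  # does not affect s_copy

print(s_default.index.tolist())
print(s_custom)
print(s_copy)
```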

In [25]:
## Hashable example
## Built-in immutable scalars (str, int, float, ...) are hashable.
## Mutable containers such as lists and dictionaries are not hashable; a tuple is hashable only if all of its elements are hashable.
print("String".__hash__())
print("String".__hash__())
lst = ["String",1]
type(lst)
lst.__hash__?

4308745190999692981
4308745190999692981
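
The same idea can be checked outside IPython with the built-in `hash()` function. This is a plain-Python sketch; the actual hash values differ between interpreter sessions, so none are shown here.

```python
# Equal immutable values hash to the same number within one session.
same_hash = hash("String") == hash("String")

# A tuple of hashable items is itself hashable.
tuple_hash = hash((1, "a"))

# A list is mutable, so hash() raises TypeError.
try:
    hash(["String", 1])
    list_is_hashable = True
except TypeError as err:
    list_is_hashable = False
    print("list is not hashable:", err)

print(same_hash, list_is_hashable)
```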


In [26]:
#Import 
import pandas as pd
import numpy as np

### Create an Empty Series

In [27]:
s = pd.Series(dtype="float64")  # an explicit dtype avoids the empty-Series warning raised by some pandas versions
print(s)

Series([], dtype: float64)




### Create a Series from ndarray 
If data is an ndarray, then any index passed must be of the same length. If no index is passed, the default index is range(n), where n is the array length, i.e., [0, 1, 2, ..., len(array) - 1].

In [38]:
#create series 
data = np.array(list("PANDAS"))
print(data, end = "\n\n--------------\n")
s = pd.Series(data)
print(s)
type(s)

['P' 'A' 'N' 'D' 'A' 'S']

--------------
0    P
1    A
2    N
3    D
4    A
5    S
dtype: object


pandas.core.series.Series

In [39]:
# create series by explicitly specifying index
si = pd.Series(data, index=data)
print(si)

P    P
A    A
N    N
D    D
A    A
S    S
dtype: object
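
The length rule for an explicit index can be checked directly. In the sketch below, the labels `a` to `f` are arbitrary; the point is only that an index shorter than the data is rejected with a `ValueError`.

```python
import numpy as np
import pandas as pd

data = np.array(list("PANDAS"))

# Index of the same length as the data: accepted.
ok = pd.Series(data, index=list("abcdef"))

# Index shorter than the data: pandas raises ValueError.
try:
    pd.Series(data, index=["a", "b"])
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("rejected:", err)
```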


### Create a series from dictionary

In [41]:
data_dict = {'param1' : "param1 value", 'param2' : "param2 Value", 'param3' : "param3 Value"}
sd = pd.Series(data_dict)
print(sd)
# Note: the dictionary keys become the index when no index is specified

param1    param1 value
param2    param2 Value
param3    param3 Value
dtype: object
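
When an index is passed together with a dictionary, the values are pulled out by key in the order of the index, and any label without a matching key becomes `NaN`. A short sketch, where `missing_key` is a made-up label that is deliberately absent from the dictionary:

```python
import pandas as pd

data_dict = {"param1": "param1 value", "param2": "param2 Value"}

# Reorder by label; "missing_key" has no entry in the dictionary.
sd_reindexed = pd.Series(data_dict, index=["param2", "param1", "missing_key"])
print(sd_reindexed)
```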


### Create a Series from Scalar

In [42]:
ss = pd.Series(5, index=[0, 1, 2, 3])
print(ss)

0    5
1    5
2    5
3    5
dtype: int64


### Accessing Data from Series with Position

In [53]:
print(s)
print("Get First Item : s[0] => \n",s[0])
print("Get First 3 Items : s[:3] => \n",s[:3])
print("Get Last 3 Items : s[-3:] => \n",s[-3:])


0    P
1    A
2    N
3    D
4    A
5    S
dtype: object
Get First Item : s[0] => 
 P
Get First 3 Items : s[:3] => 
 0    P
1    A
2    N
dtype: object
Get Last 3 Items : s[-3:] => 
 3    D
4    A
5    S
dtype: object
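
The bracket lookups above work by position only because the default index labels happen to be 0 to 5. For access that is positional regardless of the labels, `.iloc` is the explicit option. A minimal sketch; the `shifted` Series with labels 10 to 60 is invented to show the difference:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array(list("PANDAS")))

first = s.iloc[0]          # first item
first_three = s.iloc[:3]   # first three items
last_three = s.iloc[-3:]   # last three items

# With integer labels that do not start at 0, .iloc still counts positions.
shifted = pd.Series(list("PANDAS"), index=[10, 20, 30, 40, 50, 60])
shifted_first = shifted.iloc[0]

print(first, first_three.tolist(), last_three.tolist(), shifted_first)
```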


### Retrieve Data Using Label (Index)

In [54]:
print(si)

P    P
A    A
N    N
D    D
A    A
S    S
dtype: object


In [57]:
si[['A','P']]

A    A
A    A
P    P
dtype: object
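
Label lookups can also be written with `.loc`, which makes the intent explicit. Because the label `A` appears twice in this index, looking it up returns both matching rows. A small sketch that rebuilds the same Series so it runs on its own:

```python
import pandas as pd

letters = list("PANDAS")
si = pd.Series(letters, index=letters)

single = si.loc["P"]          # unique label: returns the scalar value
dupes = si.loc["A"]           # duplicated label: returns a Series with both rows
subset = si.loc[["A", "P"]]   # list of labels: returns rows in the requested order

print(single)
print(dupes)
print(subset)
```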