# Pandas

Pandas is an open-source Python library that provides high-performance data manipulation
and analysis tools built around its powerful data structures. The name Pandas is derived from
"panel data", an econometrics term for multidimensional data sets.

## Key Features of Pandas

1. Fast and efficient DataFrame object with default and customized indexing.
2. Tools for loading data into in-memory data objects from different file formats.
3. Data alignment and integrated handling of missing data.
4. Reshaping and pivoting of data sets.
5. Label-based slicing, indexing and subsetting of large data sets.
6. Columns from a data structure can be deleted or inserted.
7. Group by data for aggregation and transformations.
8. High performance merging and joining of data.
9. Time Series functionality.
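A minimal sketch of feature 3 (data alignment and integrated handling of missing data). The Series `a` and `b` are made-up examples: adding them aligns values by label, and a label present in only one Series produces `NaN`, which the missing-data helpers can then count, fill, or drop.

```python
import pandas as pd

# Two Series whose labels only partially overlap (hypothetical example data)
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Addition aligns on labels; "x" has no partner in b, so its result is NaN
total = a + b

# Integrated missing-data helpers
n_missing = total.isna().sum()   # count the NaN entries
filled = total.fillna(0.0)       # replace NaN with 0.0
dropped = total.dropna()         # keep only the non-missing entries
```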

## Installing Pandas

### 1. Anaconda

If you install the Anaconda Python distribution, pandas is installed by default.

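### 2. Ubuntu

Install with pip:

```shell
pip install pandas
```

or install the distribution packages with apt (these are the Python 2 package names used by older Ubuntu releases; newer releases ship Python 3 packages such as `python3-pandas`):

```shell
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
```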
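Whichever method you use, a quick way to check the installation is to import pandas and print its version; this is a minimal sketch, and the version string you see will depend on your environment.

```python
# Confirm that pandas can be imported and report its version
import pandas as pd

print(pd.__version__)
```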

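## Data Structures

Pandas has historically provided the following three data structures:

1. Series
2. DataFrame
3. Panel

Panel has since been removed from pandas, so current versions work with Series and DataFrame.
These data structures are built on top of NumPy arrays, so many operations on them run as fast,
vectorized array code.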

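### Mutability

All pandas data structures are value-mutable (the values they contain can be changed). All
except Series are also size-mutable: for example, columns can be inserted into or deleted from
a DataFrame. A Series is treated as size-immutable.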
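A minimal sketch of the distinction, using made-up objects `s` and `df`: element values can be overwritten in both structures, and a DataFrame can additionally grow a new column.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
s.iloc[0] = 100            # value-mutable: change an element in place

df = pd.DataFrame({"a": [1, 2, 3]})
df["b"] = [4, 5, 6]        # size-mutable: insert a new column
df.loc[0, "a"] = 0         # value-mutable: overwrite a single cell
```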

### 1. Series

`pandas.Series()`

One-dimensional ndarray with axis labels (including time series).

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5], index=['row1', 'row2', 'row3', 'row4', 'row5'])
my_series
```

```
row1    1
row2    2
row3    3
row4    4
row5    5
dtype: int64
```

#### 1.1 Show Values

```python
my_series.values
```

```
array([1, 2, 3, 4, 5])
```

#### 1.2 Show index

```python
my_series.index
```

```
Index(['row1', 'row2', 'row3', 'row4', 'row5'], dtype='object')
```

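#### 1.3 Select by label

A label can be used as an attribute or as a key:

```python
my_series.row2
```

```
2
```

```python
my_series['row2']
```

```
2
```

Attribute access only works when the label is a valid Python identifier that does not clash
with an existing attribute or method name, so the bracket form is the safer choice.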

#### 1.4 Boolean indexing

```python
my_series[my_series > 3]
```

```
row4    4
row5    5
dtype: int64
```
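Boolean conditions can be combined. A minimal sketch on the same `my_series`: use `&` (and), `|` (or) and `~` (not) rather than Python's `and`/`or`/`not`, and wrap each comparison in parentheses because these operators bind more tightly than `>` or `==`.

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5], index=["row1", "row2", "row3", "row4", "row5"])

# Combine element-wise conditions; each comparison is parenthesized
middle = my_series[(my_series > 1) & (my_series < 5)]
outer = my_series[(my_series == 1) | (my_series == 5)]
not_three = my_series[~(my_series == 3)]
```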

#### Example: set letters as the new index

```python
my_series.index = ['A', 'B', 'C', 'D', 'E']
my_series
```

```
A    1
B    2
C    3
D    4
E    5
dtype: int64
```
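The new labels must have the same length as the Series. A minimal sketch (the Series here is a fresh, hypothetical one with a default integer index): assigning too few labels raises a `ValueError` and leaves the index unchanged, while `set_axis` returns a relabelled copy without touching the original.

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5])

# Assigning a label list of the wrong length raises ValueError
try:
    my_series.index = ["A", "B", "C"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Non-destructive alternative: set_axis returns a relabelled copy
relabelled = my_series.set_axis(["A", "B", "C", "D", "E"])
```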

### 2. DataFrame

`pandas.DataFrame()`

A two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled
axes (rows and columns). Arithmetic operations align on both row and column labels. A DataFrame
can be thought of as a dict-like container for Series objects, and it is the primary pandas
data structure.
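A minimal sketch of label alignment in DataFrame arithmetic, with made-up frames `left` and `right`: the result covers the union of both row and column labels, and only cells present in both operands get a value. Selecting a column with `[]` returns a Series, which is the dict-like behaviour described above.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])
right = pd.DataFrame({"b": [10, 20], "c": [5, 6]}, index=["r2", "r3"])

# Aligns on rows (r1, r2, r3) and columns (a, b, c); only (r2, b) exists in both
combined = left + right

# Dict-like access: each column is a Series
col_b = left["b"]
```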

#### 2.1 Create a DataFrame from an array

```python
import numpy as np

my_array = np.array([[1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15], [4, 8, 12, 16]])
my_df = pd.DataFrame(my_array,
                     index=['row1', 'row2', 'row3', 'row4'],
                     columns=['col1', 'col2', 'col3', 'col4'])
my_df
```

|      | col1 | col2 | col3 | col4 |
|------|------|------|------|------|
| row1 | 1    | 5    | 9    | 13   |
| row2 | 2    | 6    | 10   | 14   |
| row3 | 3    | 7    | 11   | 15   |
| row4 | 4    | 8    | 12   | 16   |


#### 2.2 Create a DataFrame from a dictionary

```python
my_dict = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8], 'col3': [9, 10, 11, 12], 'col4': [13, 14, 15, 19]}
my_df = pd.DataFrame(my_dict, index=['row1', 'row2', 'row3', 'row4'])
my_df
```

|      | col1 | col2 | col3 | col4 |
|------|------|------|------|------|
| row1 | 1    | 5    | 9    | 13   |
| row2 | 2    | 6    | 10   | 14   |
| row3 | 3    | 7    | 11   | 15   |
| row4 | 4    | 8    | 12   | 19   |
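A DataFrame can also be built row by row from a list of dictionaries (records). A minimal sketch with hypothetical values: each dict becomes one row, its keys become column names, and a key missing from a record becomes `NaN` in that row.

```python
import pandas as pd

# Each dict is one row; keys become column names
records = [
    {"col1": 1, "col2": 5},
    {"col1": 2, "col2": 6},
]
df_records = pd.DataFrame(records, index=["row1", "row2"])

# The second record has no "col2", so that cell is NaN
df_partial = pd.DataFrame([{"col1": 1, "col2": 5}, {"col1": 2}])
```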


#### 2.3 Show index

```python
my_df.index
```

```
Index(['row1', 'row2', 'row3', 'row4'], dtype='object')
```

#### 2.4 Show Columns

```python
my_df.columns
```

```
Index(['col1', 'col2', 'col3', 'col4'], dtype='object')
```

#### 2.5 Show values

```python
my_df.values
```

```
array([[ 1,  5,  9, 13],
       [ 2,  6, 10, 14],
       [ 3,  7, 11, 15],
       [ 4,  8, 12, 19]])
```
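Recent pandas documentation recommends `to_numpy()` over `.values` for getting the underlying array, and it also accepts a target `dtype`. A minimal sketch on a small, hypothetical frame:

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({"col1": [1, 2], "col2": [5, 6]}, index=["row1", "row2"])

arr = my_df.to_numpy()                        # same data as my_df.values
arr_float = my_df.to_numpy(dtype=np.float64)  # request a specific dtype
```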

#### 2.6 Selecting

```python
my_df
```

|      | col1 | col2 | col3 | col4 |
|------|------|------|------|------|
| row1 | 1    | 5    | 9    | 13   |
| row2 | 2    | 6    | 10   | 14   |
| row3 | 3    | 7    | 11   | 15   |
| row4 | 4    | 8    | 12   | 19   |


Select a row by label with `loc`:

```python
my_df.loc['row1']
```

```
col1     1
col2     5
col3     9
col4    13
Name: row1, dtype: int64
```

or by integer position with `iloc`:

```python
my_df.iloc[0]
```

```
col1     1
col2     5
col3     9
col4    13
Name: row1, dtype: int64
```
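Both `loc` and `iloc` also take a column selector after the comma. A minimal sketch on a smaller, hypothetical copy of the frame; note that a label slice such as `"row1":"row3"` includes its end label, while a position slice such as `0:2` excludes its end position.

```python
import pandas as pd

my_df = pd.DataFrame(
    {"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8], "col3": [9, 10, 11, 12]},
    index=["row1", "row2", "row3", "row4"],
)

cell = my_df.loc["row2", "col3"]                    # single value by labels
block = my_df.loc["row1":"row3", ["col1", "col3"]]  # label slice includes "row3"
by_pos = my_df.iloc[0:2, 1]                         # position slice excludes position 2
```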

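#### 2.7 Edit a DataFrame

Add a new column:

```python
my_df['col5'] = [20, 21, 22, 23]
my_df
```

|      | col1 | col2 | col3 | col4 | col5 |
|------|------|------|------|------|------|
| row1 | 1    | 5    | 9    | 13   | 20   |
| row2 | 2    | 6    | 10   | 14   | 21   |
| row3 | 3    | 7    | 11   | 15   | 22   |
| row4 | 4    | 8    | 12   | 19   | 23   |

Overwrite selected cells:

```python
my_df.loc[['row1', 'row2'], 'col1'] = 0
my_df
```

|      | col1 | col2 | col3 | col4 | col5 |
|------|------|------|------|------|------|
| row1 | 0    | 5    | 9    | 13   | 20   |
| row2 | 0    | 6    | 10   | 14   | 21   |
| row3 | 3    | 7    | 11   | 15   | 22   |
| row4 | 4    | 8    | 12   | 19   | 23   |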
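Two other ways to add a column, sketched on a small, hypothetical frame: `insert` places the column at a chosen position and modifies the frame in place, while `assign` returns a new DataFrame and leaves the original alone. The column names `col1_sq` and `total` are made up for illustration.

```python
import pandas as pd

my_df = pd.DataFrame({"col1": [1, 2], "col2": [5, 6]}, index=["row1", "row2"])

# insert() puts a new column at position 1, modifying my_df in place
my_df.insert(1, "col1_sq", my_df["col1"] ** 2)

# assign() returns a new DataFrame with an extra column; my_df is not changed
with_total = my_df.assign(total=my_df["col1"] + my_df["col2"])
```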


#### 2.8 Reset index

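```python
my_df.reset_index(drop=True)
```

|   | col1 | col2 | col3 | col4 | col5 |
|---|------|------|------|------|------|
| 0 | 0    | 5    | 9    | 13   | 20   |
| 1 | 0    | 6    | 10   | 14   | 21   |
| 2 | 3    | 7    | 11   | 15   | 22   |
| 3 | 4    | 8    | 12   | 19   | 23   |

With `drop=True` the old labels are discarded instead of being added as a column.
`reset_index` returns a new DataFrame, so `my_df` itself keeps its `row1` to `row4` labels.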
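A minimal sketch of the default behaviour without `drop=True`, on a hypothetical two-row frame: the old labels move into a column called `index`, and naming the index first (here with the made-up name `row_id`) controls that column's name.

```python
import pandas as pd

my_df = pd.DataFrame({"col1": [1, 2]}, index=["row1", "row2"])

# Without drop=True the old labels are kept as a new column named "index"
kept = my_df.reset_index()

# Naming the index first controls the name of that column
named = my_df.rename_axis("row_id").reset_index()
```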


#### 2.9 Deleting

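```python
my_df.drop('col5', axis=1)
```

|      | col1 | col2 | col3 | col4 |
|------|------|------|------|------|
| row1 | 0    | 5    | 9    | 13   |
| row2 | 0    | 6    | 10   | 14   |
| row3 | 3    | 7    | 11   | 15   |
| row4 | 4    | 8    | 12   | 19   |

`drop` returns a new DataFrame and leaves `my_df` unchanged; reassign the result (for example
`my_df = my_df.drop('col5', axis=1)`) to keep the change.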
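A minimal sketch of other deletion forms on a small, hypothetical frame: the `columns=` and `index=` keywords of `drop` avoid having to remember which `axis` number means what, and `del` removes a column from the frame in place.

```python
import pandas as pd

my_df = pd.DataFrame(
    {"col1": [1, 2, 3], "col5": [20, 21, 22]},
    index=["row1", "row2", "row3"],
)

no_col5 = my_df.drop(columns="col5")   # same as my_df.drop('col5', axis=1)
no_row1 = my_df.drop(index="row1")     # drop a row by label

# del removes a column from my_df itself
del my_df["col5"]
```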


#### 2.10 Renaming

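```python
my_df.rename(columns={'col4': 'col_four'})
```

|      | col1 | col2 | col3 | col_four | col5 |
|------|------|------|------|----------|------|
| row1 | 0    | 5    | 9    | 13       | 20   |
| row2 | 0    | 6    | 10   | 14       | 21   |
| row3 | 3    | 7    | 11   | 15       | 22   |
| row4 | 4    | 8    | 12   | 19       | 23   |

Like `drop`, `rename` returns a new DataFrame and does not modify `my_df`.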
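`rename` can relabel rows and columns in the same call, and it also accepts a function that is applied to every label. A minimal sketch on a hypothetical frame (the label `first` is made up):

```python
import pandas as pd

my_df = pd.DataFrame({"col1": [1, 2], "col4": [13, 14]}, index=["row1", "row2"])

# Rename index labels and columns in one call
renamed = my_df.rename(index={"row1": "first"}, columns={"col4": "col_four"})

# A function used as the mapper is applied to every column label
upper = my_df.rename(columns=str.upper)
```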


#### 2.11 Replacing

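Replace every `0` with `1`:

```python
my_df.replace({0: 1})
```

|      | col1 | col2 | col3 | col4 | col5 |
|------|------|------|------|------|------|
| row1 | 1    | 5    | 9    | 13   | 20   |
| row2 | 1    | 6    | 10   | 14   | 21   |
| row3 | 3    | 7    | 11   | 15   | 22   |
| row4 | 4    | 8    | 12   | 19   | 23   |

`replace` returns a new DataFrame, so `my_df` still contains the zeros.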
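A nested dictionary limits the replacement to particular columns: `{'col1': {0: 1}}` means "in `col1`, replace `0` with `1`". A minimal sketch on a small, hypothetical frame where `col2` also contains a zero:

```python
import pandas as pd

my_df = pd.DataFrame({"col1": [0, 0, 3], "col2": [0, 6, 7]})

# Replace 0 with 1 in every column
everywhere = my_df.replace({0: 1})

# Nested dict: replace 0 with 1 only in col1
only_col1 = my_df.replace({"col1": {0: 1}})
```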


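#### 2.12 Apply a function to a column

Format the values of `col1` as strings with two decimal places:

```python
my_df['col1'] = ['{:3.2f}'.format(x) for x in my_df.iloc[:, 0]]
my_df
```

|      | col1 | col2 | col3 | col4 | col5 |
|------|------|------|------|------|------|
| row1 | 0.00 | 5    | 9    | 13   | 20   |
| row2 | 0.00 | 6    | 10   | 14   | 21   |
| row3 | 3.00 | 7    | 11   | 15   | 22   |
| row4 | 4.00 | 8    | 12   | 19   | 23   |

The same transformation with `apply`:

```python
my_df['col2'] = my_df['col2'].apply(lambda x: '{0:3.2f}'.format(x))
my_df
```

|      | col1 | col2 | col3 | col4 | col5 |
|------|------|------|------|------|------|
| row1 | 0.00 | 5.00 | 9    | 13   | 20   |
| row2 | 0.00 | 6.00 | 10   | 14   | 21   |
| row3 | 3.00 | 7.00 | 11   | 15   | 22   |
| row4 | 4.00 | 8.00 | 12   | 19   | 23   |

Both columns now hold strings, not numbers.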
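Formatting numbers as strings has a side effect worth knowing: strings sort character by character, not by numeric value. A minimal sketch with made-up values; if you only want fewer decimals, `round` keeps the column numeric.

```python
import pandas as pd

prices = pd.Series([9.5, 10.25, 100.0])

# Formatting produces strings, not numbers
as_text = prices.map("{:.2f}".format)

# Strings compare character by character, so "10.25" sorts before "9.50"
text_order = as_text.sort_values().tolist()
number_order = prices.sort_values().tolist()

# round() keeps a numeric dtype
rounded = prices.round(1)
```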


#### 2.13 Sorting

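Sort by index (here, the column labels, because `axis=1`):

```python
my_df.sort_index(axis=1, ascending=False)
```

|      | col5 | col4 | col3 | col2 | col1 |
|------|------|------|------|------|------|
| row1 | 20   | 13   | 9    | 5.00 | 0.00 |
| row2 | 21   | 14   | 10   | 6.00 | 0.00 |
| row3 | 22   | 15   | 11   | 7.00 | 3.00 |
| row4 | 23   | 19   | 12   | 8.00 | 4.00 |

Sort by values:

```python
my_df.sort_values(by='col1', ascending=False)
```

|      | col1 | col2 | col3 | col4 | col5 |
|------|------|------|------|------|------|
| row4 | 4.00 | 8.00 | 12   | 19   | 23   |
| row3 | 3.00 | 7.00 | 11   | 15   | 22   |
| row1 | 0.00 | 5.00 | 9    | 13   | 20   |
| row2 | 0.00 | 6.00 | 10   | 14   | 21   |

Because `col1` now holds strings, this sort compares text; here it happens to match the numeric order.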
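`sort_values` also accepts a list of columns, with one `ascending` flag per column, which is how ties are broken. A minimal sketch on a hypothetical numeric frame: rows are ordered by `col1` descending, and the two rows with `col1 == 0` are then ordered by `col5` descending.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [0, 0, 3, 4], "col5": [20, 21, 22, 23]},
    index=["row1", "row2", "row3", "row4"],
)

# Sort by col1 descending, then break ties by col5 descending
ordered = df.sort_values(by=["col1", "col5"], ascending=[False, False])
```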


#### 2.14 Methods

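`head()` returns the first 5 rows by default, so all 4 rows are shown here:

```python
my_df.head()
```

|      | col1 | col2 | col3 | col4 | col5 |
|------|------|------|------|------|------|
| row1 | 0.00 | 5.00 | 9    | 13   | 20   |
| row2 | 0.00 | 6.00 | 10   | 14   | 21   |
| row3 | 3.00 | 7.00 | 11   | 15   | 22   |
| row4 | 4.00 | 8.00 | 12   | 19   | 23   |

```python
my_df.head(2)
```

|      | col1 | col2 | col3 | col4 | col5 |
|------|------|------|------|------|------|
| row1 | 0.00 | 5.00 | 9    | 13   | 20   |
| row2 | 0.00 | 6.00 | 10   | 14   | 21   |

```python
my_df.tail(2)
```

|      | col1 | col2 | col3 | col4 | col5 |
|------|------|------|------|------|------|
| row3 | 3.00 | 7.00 | 11   | 15   | 22   |
| row4 | 4.00 | 8.00 | 12   | 19   | 23   |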
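A few other inspection tools are often used alongside `head` and `tail`. A minimal sketch on a hypothetical numeric frame: `shape` gives the number of rows and columns, `dtypes` lists each column's type, and `describe()` computes summary statistics for numeric columns.

```python
import pandas as pd

df = pd.DataFrame({"col3": [9, 10, 11, 12], "col4": [13, 14, 15, 19]})

n_rows, n_cols = df.shape   # (4, 2)
col_types = df.dtypes       # one dtype per column
summary = df.describe()     # count, mean, std, min, quartiles, max
```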


### 3. Panel

Panel was deprecated in pandas 0.20 and removed in pandas 0.25. The 3-D structure of a Panel
is much less common for many types of data analysis than the 1-D Series or the 2-D DataFrame.
For 3-D data, the pandas deprecation notice recommends a DataFrame with a MultiIndex, or the
xarray package.
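A minimal sketch of the MultiIndex alternative: a hypothetical "panel" of 2 items by 2 dates by 2 measurements stored as a 2-D DataFrame whose rows are indexed by (item, date). Selecting one item gives a 2-D slice, and `unstack` pivots a level into columns. The item names, dates and values are made up.

```python
import pandas as pd

# Rows indexed by (item, date); columns are the measurements
index = pd.MultiIndex.from_product(
    [["item_a", "item_b"], ["2020-01-01", "2020-01-02"]],
    names=["item", "date"],
)
panel_like = pd.DataFrame(
    {"price": [1.0, 1.1, 2.0, 2.2], "volume": [10, 11, 20, 22]},
    index=index,
)

item_b = panel_like.loc["item_b"]             # 2-D slice for one item
prices = panel_like["price"].unstack("date")  # items as rows, dates as columns
```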

## Import Data

```python
data = pd.read_csv('~/Academics/Softwares/DeepLearning-Tutorials/Pandas/SmartPhones.csv')
data
```

|   | Name      | OS      | Capacity | Ram | Weight | Company   | Inch |
|---|-----------|---------|----------|-----|--------|-----------|------|
| 0 | Galaxy S8 | Android | 64       | 4   | 149.0  | Samsung   | 5.8  |
| 1 | Lumia 950 | Windows | 32       | 3   | 150.0  | Microsoft | 5.2  |
| 2 | Xperia L1 | Android | 16       | 2   | 180.0  | Sony      | 5.5  |
| 3 | iphone 7  | ios     | 128      | 2   | 138.0  | Sony      | 5.5  |
| 4 | U Ultra   | Android | 64       | 4   | 170.0  | HTC       | 5.7  |
| 5 | Galaxy S5 | Android | 16       | 2   | 145.0  | Samsung   | 5.1  |
| 6 | iphone 5s | ios     | 32       | 1   | 112.0  | Apple     | 4.0  |
| 7 | Moto G5   | Android | 16       | 3   | 144.5  | Motorola  | 5.0  |
| 8 | Pixel     | Android | 128      | 4   | 143.0  | Google    | 5.0  |
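Once loaded, the data can be grouped and aggregated (feature 7 in the list at the top). Since `SmartPhones.csv` may not be available to you, this minimal sketch assumes the file has exactly the named columns shown above (the leftmost numbers are the default index, not a CSV column) and reads the same rows from an in-memory string with `io.StringIO`.

```python
import io

import pandas as pd

# The rows shown above, read from a string instead of SmartPhones.csv
csv_text = """Name,OS,Capacity,Ram,Weight,Company,Inch
Galaxy S8,Android,64,4,149.0,Samsung,5.8
Lumia 950,Windows,32,3,150.0,Microsoft,5.2
Xperia L1,Android,16,2,180.0,Sony,5.5
iphone 7,ios,128,2,138.0,Sony,5.5
U Ultra,Android,64,4,170.0,HTC,5.7
Galaxy S5,Android,16,2,145.0,Samsung,5.1
iphone 5s,ios,32,1,112.0,Apple,4.0
Moto G5,Android,16,3,144.5,Motorola,5.0
Pixel,Android,128,4,143.0,Google,5.0
"""
data = pd.read_csv(io.StringIO(csv_text))

os_counts = data["OS"].value_counts()             # number of phones per OS
mean_weight = data.groupby("OS")["Weight"].mean() # average weight per OS
```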
