# Pandas Tutorials

> - Pandas is used for processing data: loading, manipulating, preparing, modeling, and analyzing it.
> - Pandas is built on top of the NumPy package, so NumPy is required to work with Pandas.
> - Pandas has two main data structures for processing data:
> > 1. Series    --> a one-dimensional labeled array that can store values of various data types.
> > 2. DataFrame --> a two-dimensional structure with labeled axes (rows and columns).

# Import required modules

In [15]:
import numpy as np
import pandas as pd

# A) Series in Pandas

In [21]:
# A Series is a one-dimensional labeled array that can store various data types.
# The row labels of a Series are called the index.
# A Series can hold integers, strings, floating-point numbers, and arbitrary Python objects.
# It holds a single column of values, not multiple columns.

print("Series Example")
s = pd.Series(np.arange(15,20))
s

Series Example


0    15
1    16
2    17
3    18
4    19
dtype: int32
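
The comments above say that a Series carries an index and can hold different kinds of values. The following is a minimal sketch of both points; the variable name `mixed` and its sample values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# A Series pairs a one-dimensional array of values with an index of row labels.
# Mixing integers, strings, floats, and None forces the generic "object" dtype.
mixed = pd.Series([15, "sixteen", 17.5, None], index=["a", "b", "c", "d"])

print(mixed.index)   # the row labels
print(mixed.dtype)   # object, because the values have no common numeric type
print(mixed["b"])    # look up a single value by its label
```

When all values share one type, as in `np.arange(15, 20)` above, pandas keeps a compact numeric dtype instead. The exact integer width (`int32` or `int64`) depends on the platform and NumPy version.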

### A1) Series from ndarray

In [39]:
print("Series From ndarray. \n\nSeries:")

s = pd.Series(np.random.randn(5), index=['a','b','c','d','e'])
print(s)

Series From ndarray. 

Series:
a   -1.140473
b    0.327533
c   -1.009269
d   -0.430283
e   -0.355748
dtype: float64


### A2) Series from Dictionary

In [38]:
print("Series From Dictionary. \n\nSeries:")

# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
dict1 = {'p':111, 'q':222, 'r':333, 's':np.nan, 't':555}
s = pd.Series(dict1)
print(s)

Series From Dictionary. 

Series:
p    111.0
q    222.0
r    333.0
s      NaN
t    555.0
dtype: float64
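
Notice that the integer values in the dictionary came out as floats: NaN is a floating-point value, so one missing entry makes the whole Series `float64`. The sketch below shows a few common ways to inspect and handle that missing entry, and how an explicit `index=` argument selects and orders dictionary keys. The names `prices`, `cleaned`, `filled`, and `subset` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

prices = pd.Series({"p": 111, "q": 222, "r": 333, "s": np.nan, "t": 555})

print(prices.dtype)    # float64, because NaN is a float
print(prices.isna())   # True only for label 's'

cleaned = prices.dropna()   # drop the missing entry
filled = prices.fillna(0)   # or replace it with a default value

# With index=, only the listed keys are taken, in the given order.
# A label that is not in the dictionary ('z') becomes NaN.
subset = pd.Series({"p": 111, "q": 222}, index=["q", "p", "z"])
print(subset)
```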


### A3) Series from Scalar value

In [41]:
print("Series From Scalar value. The same value is repeated for every index label. \n\nSeries:")

s = pd.Series(125, index=['i','j','k','l'])
print(s)

Series From Scalar value. The same value is repeated for every index label. 

Series:
i    125
j    125
k    125
l    125
dtype: int64


### A4) Series functionalities

In [45]:
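print("A Series works like an ndarray: indexing, filtering, and arithmetic. \n\nSeries:")

# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
dict1 = {'p':111, 'q':222, 'r':333, 's':np.nan, 't':555}
s = pd.Series(dict1)

print(s)

A Series works like an ndarray: indexing, filtering, and arithmetic. 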

Series:
p    111.0
q    222.0
r    333.0
s      NaN
t    555.0
dtype: float64


In [61]:
# Use .iloc for positional access; a plain s[1] on a labeled index is deprecated in recent pandas versions.
print("Indexing:\ns.iloc[1]:", s.iloc[1])
print("s['r']:", s['r'])

print("\n##############################")
print("Filter: s[s > 200]:\n", s[s > 200])

print("\n##############################")
print("Select multiple positions: s.iloc[[0, 2, 4]]:\n", s.iloc[[0, 2, 4]])

print("\n##############################")
print("Check DType:", s.dtype)



Indexing:
s.iloc[1]: 222.0
s['r']: 333.0

##############################
Filter: s[s > 200]:
 q    222.0
r    333.0
t    555.0
dtype: float64

##############################
Select multiple positions: s.iloc[[0, 2, 4]]:
 p    111.0
r    333.0
t    555.0
dtype: float64

##############################
Check DType: float64
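
The cell above looks up single elements and lists of elements. Range slicing follows the same split between positions and labels, with one difference worth knowing: a positional slice excludes its end, while a label slice includes it. A small sketch, where the names `by_position` and `by_label` are illustrative assumptions:

```python
import numpy as np
import pandas as pd

s = pd.Series({"p": 111, "q": 222, "r": 333, "s": np.nan, "t": 555})

# Positional slice: positions 1 and 2 (the end position 3 is excluded).
by_position = s.iloc[1:3]

# Label slice: labels 'q' through 's' (the end label is included).
by_label = s.loc["q":"s"]

print(by_position)
print(by_label)
```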


In [64]:
print("\n##############################")
print("Sum 2 Series: s + s :\n", s + s)

print("\n##############################")
print("Multiply by 5 : s*5 :\n", s*5)


##############################
Sum 2 Series: s + s :
 p     222.0
q     444.0
r     666.0
s       NaN
t    1110.0
dtype: float64

##############################
Multiply by 5 : s*5 :
 p     555.0
q    1110.0
r    1665.0
s       NaN
t    2775.0
dtype: float64
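
Adding a Series to itself hides an important detail: arithmetic between two Series aligns values by index label, not by position. The sketch below uses two hypothetical Series, `left` and `right`, whose labels only partly overlap.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Labels present on only one side ('a' and 'd') produce NaN.
total = left + right
print(total)

# add() with fill_value treats a label missing on one side as 0.
total_filled = left.add(right, fill_value=0)
print(total_filled)

# Reductions such as sum() skip NaN by default.
print(total.sum())
```

This is also why the `s` row stays NaN in the outputs above: any arithmetic involving NaN yields NaN.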


# B) DataFrame in Pandas

In [68]:
# A DataFrame is a two-dimensional structure with labeled axes (rows and columns).
# A DataFrame is like a structured table or an Excel sheet.
df = pd.DataFrame()
df

### B1) DataFrame from List

In [74]:
print("DataFrame Example using List:\n DataFrame:")

l1 = [2,3,4,5,6,7]
df = pd.DataFrame(l1, index = ['a','b','c','d','e','f'], columns = ['ID_NUM'])
df

DataFrame Example using List:
 DataFrame:


|   | ID_NUM |
|---|--------|
| a | 2 |
| b | 3 |
| c | 4 |
| d | 5 |
| e | 6 |
| f | 7 |


### B2) DataFrame from Dict

In [76]:
print("DataFrame Example using Dict:\n DataFrame:")

dict1 = {"ID":[101,102,103,104,105], "Name":['AAA','BBB','CCC','DDD','EEE']}
df = pd.DataFrame(dict1)
df

DataFrame Example using Dict:
 DataFrame:


|   | ID  | Name |
|---|-----|------|
| 0 | 101 | AAA |
| 1 | 102 | BBB |
| 2 | 103 | CCC |
| 3 | 104 | DDD |
| 4 | 105 | EEE |
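
Each dictionary key becomes a column, and each column of a DataFrame is itself a Series. The following sketch shows how columns and rows can be pulled back out. The derived column `NameLength` is a hypothetical addition for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"ID": [101, 102, 103, 104, 105],
                   "Name": ["AAA", "BBB", "CCC", "DDD", "EEE"]})

names = df["Name"]    # a single column label returns a Series
subset = df[["ID"]]   # a list of column labels returns a DataFrame

# Assigning to a new label adds a column (hypothetical derived column).
df["NameLength"] = df["Name"].str.len()

row = df.loc[2]       # one row, selected by its index label, as a Series
print(type(names).__name__, type(subset).__name__)
print(row)
```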


### B3) DataFrame from ndarray

In [100]:
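print("DataFrame Example using ndarray:\n DataFrame:")

a = np.random.rand(10,5)  # rand() already returns an ndarray
# Note: wrapping the index array in a list makes pandas build a one-level MultiIndex.
# Pass np.arange(2000,2010) directly to get a regular flat index.
df = pd.DataFrame(a, index = [np.arange(2000,2010)], columns = ['India', 'USA', 'China', 'Japan', 'Italy'])
df

DataFrame Example using ndarray:
 DataFrame: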


|      | India | USA | China | Japan | Italy |
|------|-------|-----|-------|-------|-------|
| 2000 | 0.787045 | 0.749275 | 0.097031 | 0.881438 | 0.821116 |
| 2001 | 0.887407 | 0.108695 | 0.748217 | 0.011144 | 0.547425 |
| 2002 | 0.384851 | 0.374839 | 0.026165 | 0.439566 | 0.41668 |
| 2003 | 0.713637 | 0.76816 | 0.218996 | 0.040896 | 0.847797 |
| 2004 | 0.145452 | 0.156977 | 0.442098 | 0.047159 | 0.024486 |
| 2005 | 0.368619 | 0.332193 | 0.758114 | 0.422307 | 0.360263 |
| 2006 | 0.765961 | 0.833094 | 0.405134 | 0.029272 | 0.557857 |
| 2007 | 0.013015 | 0.335582 | 0.655592 | 0.658877 | 0.278767 |
| 2008 | 0.49666 | 0.570448 | 0.309379 | 0.170474 | 0.361058 |
| 2009 | 0.226666 | 0.604811 | 0.348438 | 0.876197 | 0.684754 |
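
The index in this example is passed as a list containing one array, `[np.arange(2000,2010)]`, which pandas interprets as the levels of a MultiIndex. The sketch below contrasts that with passing the array directly. The names `nested` and `flat` are illustrative assumptions, and the values are random, so they will differ from the table above.

```python
import numpy as np
import pandas as pd

values = np.random.rand(10, 5)
countries = ["India", "USA", "China", "Japan", "Italy"]
years = np.arange(2000, 2010)

# A list of arrays is treated as MultiIndex levels, even with a single array.
nested = pd.DataFrame(values, index=[years], columns=countries)

# Passing the array itself gives a regular flat index.
flat = pd.DataFrame(values, index=years, columns=countries)

print(type(nested.index).__name__)
print(type(flat.index).__name__)

# With a flat index, one cell is addressed by (row label, column label).
print(flat.loc[2005, "Japan"])
```

A flat index is usually the simpler choice when there is only one level of row labels.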


### B4) DataFrame Basic functions

In [102]:
print("DataFrame count: \n\n", df.count())

DataFrame count: 

 India    10
USA      10
China    10
Japan    10
Italy    10
dtype: int64
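
Every column reports 10 here only because the data has no missing values: `count()` returns the number of non-null entries per column, not the number of rows. A minimal sketch with a small hypothetical frame, `gaps`, that contains one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in the 'India' column.
gaps = pd.DataFrame({"India": [0.1, np.nan, 0.3],
                     "USA": [0.4, 0.5, 0.6]})

print(gaps.count())        # non-null entries per column
print(len(gaps))           # number of rows, including rows with NaN
print(gaps.isna().sum())   # missing entries per column
```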


In [103]:
print("DataFrame Columns list: \n\n", df.columns)

DataFrame Columns list: 

 Index(['India', 'USA', 'China', 'Japan', 'Italy'], dtype='object')


In [104]:
print("DataFrame index list: \n\n", df.index)

DataFrame index list: 

 MultiIndex(levels=[[2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]],
           labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])


In [108]:
print("DataFrame Shape(Rows X Columns): \n\n", df.shape)

DataFrame Shape(Rows X Columns): 

 (10, 5)


In [112]:
print("DataFrame Information: \n")
df.info()

DataFrame Information: 

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 10 entries, (2000,) to (2009,)
Data columns (total 5 columns):
India    10 non-null float64
USA      10 non-null float64
China    10 non-null float64
Japan    10 non-null float64
Italy    10 non-null float64
dtypes: float64(5)
memory usage: 506.0 bytes


In [113]:
print("DataFrame Description: \n")
df.describe()

DataFrame Description: 



|       | India | USA | China | Japan | Italy |
|-------|-------|-----|-------|-------|-------|
| count | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
| mean  | 0.478931 | 0.483407 | 0.400916 | 0.357733 | 0.49002 |
| std   | 0.300466 | 0.258164 | 0.256427 | 0.350324 | 0.254655 |
| min   | 0.013015 | 0.108695 | 0.026165 | 0.011144 | 0.024486 |
| 25%   | 0.262154 | 0.33304 | 0.241592 | 0.042462 | 0.360462 |
| 50%   | 0.440756 | 0.472644 | 0.376786 | 0.29639 | 0.482053 |
| 75%   | 0.75288 | 0.713159 | 0.602218 | 0.60405 | 0.653029 |
| max   | 0.887407 | 0.833094 | 0.758114 | 0.881438 | 0.847797 |


In [124]:
print("DataFrame Sample Data: 3 randomly selected rows.\n")
df.sample(3) 

DataFrame Sample Data: 3 randomly selected rows.



|      | India | USA | China | Japan | Italy |
|------|-------|-----|-------|-------|-------|
| 2005 | 0.368619 | 0.332193 | 0.758114 | 0.422307 | 0.360263 |
| 2002 | 0.384851 | 0.374839 | 0.026165 | 0.439566 | 0.41668 |
| 2006 | 0.765961 | 0.833094 | 0.405134 | 0.029272 | 0.557857 |
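
`sample()` picks rows at random, so rerunning the cell normally returns different rows. The sketch below shows one way to make the selection repeatable with `random_state`, alongside `head()` and `tail()` for inspecting the first and last rows. The seed values and the rebuilt frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild a frame of the same shape with a seeded generator (seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 5)),
                  index=np.arange(2000, 2010),
                  columns=["India", "USA", "China", "Japan", "Italy"])

# The same random_state returns the same rows on every call.
first = df.sample(3, random_state=42)
second = df.sample(3, random_state=42)
print(first.equals(second))

# head() and tail() return rows from the start and end, in index order.
print(df.head(2))
print(df.tail(2))
```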
