# 2. Data Structures in Pandas

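Pandas deals with the following two main data structures:
- Series - 1D labeled homogeneous array (all values share one dtype), size-immutable.
- DataFrame - General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.

Older versions of pandas also had a 3D Panel structure, which was removed in pandas 0.25.

These data structures are built on top of NumPy arrays, so operations on them run as fast, vectorized array operations.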
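Because a Series wraps a NumPy array, you can get that array back and apply arithmetic to the whole Series at once without a Python loop. A minimal sketch (the values are arbitrary):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# The underlying values are exposed as a NumPy ndarray
arr = s.to_numpy()
print(type(arr))  # <class 'numpy.ndarray'>

# Arithmetic is applied element-wise to the whole Series
doubled = s * 2
print(doubled.tolist())  # [2, 4, 6]
```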

**In this course we will mainly focus on Series and DataFrame.**

#### Just like NumPy, before using Pandas we first need to install it (if necessary) and import the Pandas package

In [None]:
!pip install pandas

In [1]:
import pandas as pd

## A. Series

A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

**Syntax:**
    pandas.Series(data, index, dtype, name, copy)
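A short sketch that passes `data`, `index`, `dtype` and `name` explicitly (`copy` is left at its default). The values and the name `"scores"` are illustrative only:

```python
import pandas as pd

# dtype forces the integer inputs to be stored as floats; name labels the Series itself
s = pd.Series([1, 2, 3], index=["x", "y", "z"], dtype="float64", name="scores")

print(s.index.tolist())  # ['x', 'y', 'z']
print(s.dtype)           # float64
print(s.name)            # scores
```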

In [2]:
# Creating a very basic Series, i.e. an empty Series
s = pd.Series()
print(s)

Series([], dtype: object)


In [5]:
# Creating a Series from a Python list
s = pd.Series(data=[101,102,103,104],index=["a","b","c","d"])
s

a    101
b    102
c    103
d    104
dtype: int64

In [6]:
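s['a']  # access by index label

101

In [7]:
s.iloc[0]  # access by position; plain s[0] on a label index is deprecated in pandas 2.x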

101
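Label-based and position-based access are easiest to keep apart with the explicit `.loc` (labels) and `.iloc` (positions) accessors. A minimal sketch using the same Series as above; note that label slices include the end label while positional slices do not:

```python
import pandas as pd

s = pd.Series(data=[101, 102, 103, 104], index=["a", "b", "c", "d"])

# .loc selects by label, .iloc by integer position
print(s.loc["a"])   # 101
print(s.iloc[0])    # 101

# Label slices include the end label; positional slices exclude the end position
print(s.loc["a":"c"].tolist())  # [101, 102, 103]
print(s.iloc[0:2].tolist())     # [101, 102]
```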

A dictionary can be passed as input. If no index is specified, the dictionary keys become the index, in the dictionary's insertion order (very old pandas versions sorted them instead). If an index is passed, the values in data corresponding to the labels in the index are pulled out; any index label that has no corresponding key gets NaN (a missing value).
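The two cases described above can be checked directly. This sketch assumes pandas 0.23 or later on Python 3.6+, where dictionary insertion order is preserved; the keys and the extra label `"x"` are illustrative:

```python
import pandas as pd

d = {"b": 1.0, "a": 0.0, "c": 2.0}

# No index given: keys are used in insertion order, not sorted
s = pd.Series(d)
print(s.index.tolist())  # ['b', 'a', 'c']

# Index given: only the listed labels are kept; a label with no key gets NaN
s2 = pd.Series(d, index=["a", "x"])
print(s2.isna().tolist())  # [False, True]
```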

In [13]:
#Creating a Series with a Python Dictionary
d = {'a' : 99.1, 'b' : 11.0, 'c' : 2.1}
s = pd.Series(data=d,index=['b','a','c','x'])
s

b    11.0
a    99.1
c     2.1
x     NaN
dtype: float64

In [None]:
# We can create a Series with a specified index order
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data, index=['b','c','d','a'])
# There is no 'd' key in the dictionary, so s['d'] is NaN ("Not a Number"), pandas' marker for a missing value
s
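Once a Series contains NaN, you will usually want to detect or replace the missing entries. A small sketch with the same data; the fill value `-1` is an arbitrary choice for illustration:

```python
import pandas as pd

data = {"a": 0.0, "b": 1.0, "c": 2.0}
s = pd.Series(data, index=["b", "c", "d", "a"])

print(s.isna().tolist())      # [False, False, True, False]
print(s.count())              # 3  (count() skips NaN)
print(s.fillna(-1).tolist())  # [1.0, 2.0, -1.0, 0.0]
```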

In [14]:
# Creating a Series from NumPy arrays. The data= and index= keywords are optional:
# passed positionally, the 1st array is treated as data and the 2nd as index
import numpy as np
d = np.array([0,1,2])
i = np.array(['a','b','c'])

s = pd.Series(data=d,index=i)
s

a    0
b    1
c    2
dtype: int32
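The same Series can be built without keywords, because `data` and `index` are the first two positional parameters of `pd.Series`. The `int32` shown above comes from NumPy's default integer type, which is platform dependent (int32 on Windows before NumPy 2.0, int64 on most other setups), so you may see `int64` instead:

```python
import numpy as np
import pandas as pd

d = np.array([0, 1, 2])
i = np.array(["a", "b", "c"])

# Positional form: first argument is data, second is index
s = pd.Series(d, i)
print(s.index.tolist())  # ['a', 'b', 'c']
print(s.tolist())        # [0, 1, 2]
```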

In [14]:
# Creating a Series from a scalar value: the value is repeated for every label in the index
s = pd.Series(5, index=range(1,6))
s

1    5
2    5
3    5
4    5
5    5
dtype: int64

In [15]:
s = pd.Series(data=[100,11,2,21],index=['a','b','c','d'])
s

a    100
b     11
c      2
d     21
dtype: int64

In [17]:
s.mean()

33.5

In [18]:
s.min()

2

In [19]:
s.max()

100

In [21]:
s.median()

16.0

In [22]:
s.quantile(0.75)

40.75
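The individual statistics above can also be collected in one call with `describe()`, which returns count, mean, std, min, the 25%/50%/75% quantiles and max as a Series. A sketch on the same data (std is printed but not hard-coded here):

```python
import pandas as pd

s = pd.Series(data=[100, 11, 2, 21], index=["a", "b", "c", "d"])

# The "50%" entry equals s.median() and "75%" equals s.quantile(0.75)
summary = s.describe()
print(summary)
```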

## B. DataFrame

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

**Features of DataFrame:**
1. Columns can be of different types
2. Size-mutable
3. Labeled axes (rows and columns)
4. Arithmetic operations can be performed on rows and columns
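All four features can be seen in one small frame. A sketch with made-up names and numbers; the column `age_next_year` is a hypothetical example column:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jay", "Kumar"], "age": [16, 22], "score": [80, 90]})

# 1. Columns of different types: text in "name", integers in "age" and "score"
print(df.dtypes)

# 2 + 4. Size-mutable, and arithmetic on a column: derive a new column from "age"
df["age_next_year"] = df["age"] + 1
print(df.shape)  # (2, 4)

# 3. Labeled axes: row labels and column labels
print(df.index.tolist())    # [0, 1]
print(df.columns.tolist())  # ['name', 'age', 'score', 'age_next_year']

# 4. Arithmetic across each row (axis=1 sums over the columns of a row)
print(df[["age", "score"]].sum(axis=1).tolist())  # [96, 112]
```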

**Syntax:**
    pandas.DataFrame(data, index, columns, dtype, copy)
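A sketch that passes `data`, `index`, `columns` and `dtype` together (`copy` left at its default). The labels `r1`, `r2`, `c1`, `c2` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4]],        # data: one inner list per row
    index=["r1", "r2"],      # row labels
    columns=["c1", "c2"],    # column labels
    dtype="float64",         # store every column as float64
)
print(df)
print(df.loc["r2", "c1"])  # 3.0
```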

In [23]:
# Creating an empty DataFrame
df = pd.DataFrame()
df
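The cell above shows no output; one way to confirm that a DataFrame has no data is the `empty` attribute together with `shape`:

```python
import pandas as pd

df = pd.DataFrame()
print(df.empty)  # True
print(df.shape)  # (0, 0)
```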

In [30]:
# Creating a DataFrame from a nested Python list (each inner list becomes a row)
data = [[1,2,3,4,5],[44,55,66,77,88]]
df = pd.DataFrame(data,index=["a","b"],columns=["col1","col2","col3","col4","col5"])
df

| | col1 | col2 | col3 | col4 | col5 |
|---|---|---|---|---|---|
| a | 1 | 2 | 3 | 4 | 5 |
| b | 44 | 55 | 66 | 77 | 88 |


In [31]:
# Creating a DataFrame from a Python list of lists, naming the columns explicitly
data = [['i','ii','iii'],['a','b',"c"],[1,2,3]]
df = pd.DataFrame(data,columns=['A','B','C'])
df

| | A | B | C |
|---|---|---|---|
| 0 | i | ii | iii |
| 1 | a | b | c |
| 2 | 1 | 2 | 3 |
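Because each inner list becomes one row, column `A` ends up mixing strings and an integer, so pandas stores it with the generic object dtype. A sketch that checks the row/column layout of the frame above:

```python
import pandas as pd

rows = [["i", "ii", "iii"], ["a", "b", "c"], [1, 2, 3]]
df = pd.DataFrame(rows, columns=["A", "B", "C"])

print(df.shape)             # (3, 3): one row per inner list, one column per position
print(df.iloc[0].tolist())  # ['i', 'ii', 'iii']
print(df["A"].tolist())     # ['i', 'a', 1]
```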


In [32]:
# Creating a DataFrame from a Python dictionary (keys become column names)
data = {'Name':['Jay', 'Kumar', 'Mickey', 'Ricky'],'Age':[16,22,18,15]}
df = pd.DataFrame(data)
df

| | Name | Age |
|---|---|---|
| 0 | Jay | 16 |
| 1 | Kumar | 22 |
| 2 | Mickey | 18 |
| 3 | Ricky | 15 |


In [33]:
# Creating a DataFrame from a Python dictionary with a custom row index
data = {'Name':['Jay', 'Kumar', 'Mickey', 'Ricky'],'Age':[16,22,18,15]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])

df

| | Name | Age |
|---|---|---|
| rank1 | Jay | 16 |
| rank2 | Kumar | 22 |
| rank3 | Mickey | 18 |
| rank4 | Ricky | 15 |
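With a custom index, rows can be addressed by those labels. A brief sketch on the same frame showing column selection by name and single-cell access with `.loc[row_label, column_label]`:

```python
import pandas as pd

data = {"Name": ["Jay", "Kumar", "Mickey", "Ricky"], "Age": [16, 22, 18, 15]}
df = pd.DataFrame(data, index=["rank1", "rank2", "rank3", "rank4"])

print(df["Age"].tolist())       # [16, 22, 18, 15]  (a column by name)
print(df.loc["rank2", "Name"])  # Kumar  (one cell by row and column label)
print(df["Age"].mean())         # 17.75
```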


In [34]:
# Creating a DataFrame from a NumPy array
df = pd.DataFrame(np.arange(1,5).reshape(2,2),columns=['A','B'],index=["I","II"])
df

| | A | B |
|---|---|---|
| I | 1 | 2 |
| II | 3 | 4 |
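The array's shape fixes the frame's shape, so `columns` needs one label per array column and `index` one label per array row. A sketch that checks this and converts back to a NumPy array:

```python
import numpy as np
import pandas as pd

arr = np.arange(1, 5).reshape(2, 2)  # [[1, 2], [3, 4]]
df = pd.DataFrame(arr, columns=["A", "B"], index=["I", "II"])

print(df.shape)                # (2, 2), same as arr.shape
print(df.loc["II", "B"])       # 4
print(df.to_numpy().tolist())  # [[1, 2], [3, 4]]
```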



In [35]:
#Creating a DataFrame with Series
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
df

# Column 'one' has no value for index label 'd', so that cell is NaN and the column becomes float64

| | one | two |
|---|---|---|
| a | 1.0 | 1 |
| b | 2.0 | 2 |
| c | 3.0 | 3 |
| d | NaN | 4 |
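When a DataFrame is built from Series, pandas aligns them on their indexes: the row index is the union of all Series indexes, and missing positions are filled with NaN. A sketch that checks this on the example above:

```python
import pandas as pd

d = {
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)

print(df.index.tolist())          # ['a', 'b', 'c', 'd']  (union of both indexes)
print(df["one"].isna().tolist())  # [False, False, False, True]
print(df["one"].dtype)            # float64 (NaN forces the integers to float)
```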


In [36]:
# Creating a DataFrame from a random NumPy array (standard normal values, different on every run)

df= pd.DataFrame(np.random.randn(10,4),columns=['A','B','C','D'])
df

| | A | B | C | D |
|---|---|---|---|---|
| 0 | -0.073485 | 0.382885 | 1.204282 | -1.326645 |
| 1 | -0.919471 | 0.5999 | -1.213217 | 0.244031 |
| 2 | -2.12709 | 0.715087 | 0.70486 | -0.807133 |
| 3 | 2.136646 | 1.236557 | 0.246899 | 1.67694 |
| 4 | -1.278123 | 0.672029 | 0.34043 | 1.329324 |
| 5 | -0.564359 | 0.264535 | -0.632129 | 1.05716 |
| 6 | -0.002385 | 0.776468 | 2.431869 | 0.226073 |
| 7 | 0.144607 | 0.954733 | -1.969429 | -0.961017 |
| 8 | -1.0029 | 0.577808 | -2.232624 | 0.586632 |
| 9 | 1.402208 | 0.612725 | 0.444315 | -0.307574 |
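Because the values change on every run, results like the table above are not reproducible as written. One option, sketched here with NumPy's Generator API (`np.random.default_rng`, available since NumPy 1.17) instead of the legacy `np.random.randn`, is to fix a seed; the seed `42` is arbitrary:

```python
import numpy as np
import pandas as pd

# Two generators created with the same seed produce the same numbers
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

df1 = pd.DataFrame(rng1.standard_normal((10, 4)), columns=["A", "B", "C", "D"])
df2 = pd.DataFrame(rng2.standard_normal((10, 4)), columns=["A", "B", "C", "D"])

print(df1.equals(df2))  # True
```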


In [37]:
# Random integers from 10 up to, but not including, 15 (values differ on every run)
df= pd.DataFrame(np.random.randint(low=10,high=15,size=(10,4)),columns=['A','B','C','D'])
df

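| | A | B | C | D |
|---|---|---|---|---|
| 0 | 14 | 11 | 11 | 12 |
| 1 | 11 | 13 | 11 | 12 |
| 2 | 12 | 13 | 12 | 14 |
| 3 | 14 | 13 | 14 | 13 |
| 4 | 10 | 10 | 12 | 10 |
| 5 | 12 | 13 | 14 | 10 |
| 6 | 10 | 10 | 11 | 13 |
| 7 | 10 | 12 | 14 | 14 |
| 8 | 10 | 14 | 13 | 14 |
| 9 | 12 | 10 | 14 | 10 |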
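`high` is exclusive in `np.random.randint`, which is why no 15 appears in the table above. A quick sketch that checks the range on a freshly generated frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(low=10, high=15, size=(10, 4)), columns=["A", "B", "C", "D"])

# Every value lies in 10..14 because high=15 is excluded
print(df.min().min() >= 10)  # True
print(df.max().max() <= 14)  # True
```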
