# 2. Data Structures in Pandas:

Pandas provides two main data structures:
- Series     - 1D labeled homogeneous array, size-immutable.
- DataFrame  - General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.


These data structures are built on top of NumPy arrays, which is why most operations on them are fast.

**In this course we will focus mainly on Series and DataFrame.**

#### Just like NumPy, the Pandas package must be imported before it can be used

In [1]:
import pandas as pd

## A. Series

A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

**Syntax:**
    pandas.Series( data, index, dtype, copy)
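To make the parameters concrete, here is a minimal sketch that passes `data`, `index` and `dtype` by keyword. The values and labels are made up for illustration only.

```python
import pandas as pd

# data: the values, index: one label per value, dtype: forces the element type
s = pd.Series(data=[1, 2, 3], index=['x', 'y', 'z'], dtype='float64')

print(s.dtype)          # float64
print(list(s.index))    # ['x', 'y', 'z']
print(s['y'])           # 2.0 (the integer 2 was converted to float)
```

Leaving out `dtype` lets pandas infer the type from the data, which is what the cells in this section do.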

In [2]:
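# Creating a very basic Series, i.e. an empty Series.
# dtype is passed explicitly because the default dtype of an empty Series depends on the pandas version.
s = pd.Series(dtype='float64')
print(s)

Series([], dtype: float64)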

In [3]:
#Creating a Series with a Python list
s = pd.Series(data=[0,1,2],index = ['a','b','c'])
s

a    0
b    1
c    2
dtype: int64

A dictionary can be passed as input. If no index is specified, the dictionary keys become the index; recent pandas versions (0.23+ on Python 3.6+) keep the keys in insertion order, while older versions sorted them. If an index is passed, the values in data corresponding to the labels in the index are pulled out.

In [4]:
#Creating a Series with a Python Dictionary
data = {'a' : 0.0, 'b' : 1.0, 'c' : 2.0}
s = pd.Series(data=data)
s

a    0.0
b    1.0
c    2.0
dtype: float64
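The keys in the cell above are already alphabetical, so it cannot show how the index is ordered. The small sketch below uses made-up keys inserted out of order; it assumes pandas 0.23+ on Python 3.6+, where the index should follow the dictionary's insertion order.

```python
import pandas as pd

# Keys deliberately inserted out of alphabetical order
scores = {'c': 2.0, 'a': 0.0, 'b': 1.0}

s = pd.Series(scores)
print(list(s.index))                # ['c', 'a', 'b']

# sort_index() returns a new Series ordered by label
print(list(s.sort_index().index))   # ['a', 'b', 'c']
```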

In [5]:
#We can create a Series with a specified index order
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
s

#Since there is no 'd' key in the dictionary, that label gets NaN (Not a Number), which marks a missing value.

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
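Once a Series contains NaN, it is often useful to locate or drop those entries. A short sketch on the same data, using `isna()` and `dropna()`:

```python
import pandas as pd

data = {'a': 0.0, 'b': 1.0, 'c': 2.0}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])

# isna() marks the labels that had no matching key in the dictionary
print(s.isna().tolist())        # [False, False, True, False]

# dropna() returns a new Series without the missing entries
print(list(s.dropna().index))   # ['b', 'c', 'a']
```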

In [6]:
#Creating a Series from NumPy arrays. The data= and index= keywords can be omitted:
#pandas treats the 1st positional argument as data and the 2nd as index
import numpy as np

i = np.array(['a','b','c'])
d= np.array([0,1,2])

s = pd.Series(d,i)
s

a    0
b    1
c    2
dtype: int32

In [7]:
#Creating a Series from a scalar value: the scalar is repeated for every index label
s = pd.Series(5, index=[0, 1, 2, 3])
s

0    5
1    5
2    5
3    5
dtype: int64

## B. DataFrame

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

**Features of DataFrame:**
1. Columns can be of different types
2. Size-mutable
3. Labeled axes (rows and columns)
4. Arithmetic operations can be performed on rows and columns
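The four features can be seen in a few lines. This is only a sketch: the sample values and the extra column names (`Height`, `AgeNextYear`) are hypothetical and chosen just for the demonstration.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({'Name': ['Jay', 'Kumar'], 'Age': [16, 22]})

# 1. Columns can hold different types
print(df['Name'].dtype != df['Age'].dtype)   # True

# 2. Size-mutable: a column can be added after creation
df['Height'] = [1.62, 1.75]
print(df.shape)                              # (2, 3)

# 3. Both axes carry labels
print(list(df.index))                        # [0, 1]
print(list(df.columns))                      # ['Name', 'Age', 'Height']

# 4. Arithmetic works on whole columns at once
df['AgeNextYear'] = df['Age'] + 1
print(df['AgeNextYear'].tolist())            # [17, 23]
print(df['Age'].sum())                       # 38
```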

**Syntax:**
    pandas.DataFrame( data, index, columns, dtype, copy)
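As a rough illustration of how these parameters fit together, the sketch below passes `data`, `index`, `columns` and `dtype` by keyword. The row and column labels are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2], [3, 4]],   # one inner list per row
    index=['r1', 'r2'],      # row labels
    columns=['A', 'B'],      # column labels
    dtype='float64',         # force every column to float
)

print(df.shape)              # (2, 2)
print(df.loc['r2', 'B'])     # 4.0
```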

In [8]:
#Creating an empty DataFrame
df = pd.DataFrame()
df

In [9]:
#Creating a DataFrame with a Python List
data = [1,2,3,4,5]
df = pd.DataFrame(data)

df

|   | 0 |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |


In [10]:
#Creating a DataFrame with a Python list with multiple columns
data = [['i','ii','iii'],['a','b','c'],[1,2,3]]
df = pd.DataFrame(data,columns=['A','B','C'])
df

|   | A | B | C |
|---|---|---|---|
| 0 | i | ii | iii |
| 1 | a | b | c |
| 2 | 1 | 2 | 3 |
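Note that each inner list becomes a row, not a column, so every column here ends up mixing strings and integers. The sketch below checks the consequence: pandas stores such mixed columns with the generic `object` dtype.

```python
import pandas as pd

rows = [['i', 'ii', 'iii'], ['a', 'b', 'c'], [1, 2, 3]]
df = pd.DataFrame(rows, columns=['A', 'B', 'C'])

# Each column mixes str and int, so it is stored as generic Python objects
print((df.dtypes == object).all())   # True

# Column 'A' holds the first element of every row
print(df['A'].tolist())              # ['i', 'a', 1]
```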


In [11]:
#Creating a DataFrame with a Python Dictionary
data = {'Name':['Jay', 'Kumar', 'Mickey', 'Ricky'],'Age':[16,22,18,15]}
df = pd.DataFrame(data)
df

|   | Name | Age |
|---|---|---|
| 0 | Jay | 16 |
| 1 | Kumar | 22 |
| 2 | Mickey | 18 |
| 3 | Ricky | 15 |


In [12]:
#Creating a DataFrame with a Python dictionary and custom row labels (index)
data = {'Name':['Jay', 'Kumar', 'Mickey', 'Ricky'],'Age':[16,22,18,15]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])

df

|   | Name | Age |
|---|---|---|
| rank1 | Jay | 16 |
| rank2 | Kumar | 22 |
| rank3 | Mickey | 18 |
| rank4 | Ricky | 15 |


In [13]:
#Creating a DataFrame with numpy array
df = pd.DataFrame(np.arange(1,5).reshape(2,2),columns=['A','B'])
df

|   | A | B |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 3 | 4 |


In [14]:
#Creating a DataFrame with Series
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
df

#Since column 'one' has no value for index label 'd', that cell is NaN and the column is converted to float

|   | one | two |
|---|---|---|
| a | 1.0 | 1 |
| b | 2.0 | 2 |
| c | 3.0 | 3 |
| d | NaN | 4 |
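The alignment behaviour can be checked directly. In this sketch, built on the same two Series, the row index is the union of both sets of labels, and the column with the missing label is stored as float so that it can hold NaN.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# The row index is the union of the labels of both Series
print(list(df.index))                               # ['a', 'b', 'c', 'd']

# 'one' has no entry for 'd', so that cell is missing
print(df['one'].isna().tolist())                    # [False, False, False, True]

# NaN is a float, so column 'one' is upcast; 'two' stays an integer column
print(df['one'].dtype)                              # float64
print(pd.api.types.is_integer_dtype(df['two']))     # True
```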


In [15]:
#Creating a DataFrame with Random numpy array

df= pd.DataFrame(np.random.randn(10,4),columns=['A','B','C','D'])
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 1.141268 | 0.034799 | 0.684357 | 0.758854 |
| 1 | -1.891255 | -0.460355 | 1.556351 | -1.383098 |
| 2 | 1.392269 | 0.0221 | 0.476869 | 0.699545 |
| 3 | 0.521123 | 0.476854 | 1.156822 | 0.811398 |
| 4 | -0.065799 | -0.114678 | -0.461951 | 0.511564 |
| 5 | -2.189384 | -0.178886 | 0.22279 | -1.75273 |
| 6 | 1.91215 | 0.685901 | 1.624141 | 0.189384 |
| 7 | -2.570097 | 0.289148 | 0.147938 | -0.55458 |
| 8 | 1.022555 | -0.019027 | -0.827531 | 0.57534 |
| 9 | -1.782256 | 1.40501 | 0.479412 | 1.003899 |
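Because the values are random, running the cell again gives different numbers each time. If reproducible output is wanted, one option is a seeded generator. The sketch below uses NumPy's `default_rng` instead of `np.random.randn`; the seed value 42 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

cols = ['A', 'B', 'C', 'D']

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=cols)
print(df.shape)                # (10, 4)

# Re-creating the generator with the same seed reproduces the same values
rng_again = np.random.default_rng(seed=42)
df_again = pd.DataFrame(rng_again.standard_normal((10, 4)), columns=cols)
print(df.equals(df_again))     # True
```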
