## Library for Pandas

In [29]:
import pandas as pd
import numpy as np

Pandas is an open-source Python library built on top of the NumPy package. Its key data structure is the DataFrame, which lets you store and manipulate tabular data in rows of observations and columns of variables.

The other data structures are the Series and, in older versions of pandas, the Panel.

## What are Data Frames?
A DataFrame is a table of rows and columns, so it is a two-dimensional (2D) structure.
Both the data and the size of a DataFrame are mutable, and the data can be heterogeneous: each column can hold a different type.
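
As a small illustration of these properties, the sketch below builds a DataFrame whose columns hold different types and then changes both a value and the shape. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical data: each column has its own dtype (heterogeneous data)
people = pd.DataFrame({
    'name': ['Tom', 'Jack', 'Steve'],
    'age': [28, 34, 29],
    'height_m': [1.80, 1.75, 1.68],
})

print(people.dtypes)   # one dtype per column
print(people.shape)    # (rows, columns), i.e. a 2-D structure

# Data is mutable: overwrite a single value
people.loc[0, 'age'] = 30

# Size is mutable: add a column
people['is_adult'] = people['age'] >= 18
print(people.shape)
```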

In [None]:
## Creating a DataFrame

In [38]:
df=pd.DataFrame(np.arange(1,21).reshape(4,5)) #pd.DataFrame is the constructor used for creating a Data Frame
df

Unnamed: 0,0,1,2,3,4
0,1,2,3,4,5
1,6,7,8,9,10
2,11,12,13,14,15
3,16,17,18,19,20


In [41]:
type(df) #here we can see that the type is DataFrame

pandas.core.frame.DataFrame

## What are Data Series?

A Series is a single row or a single column taken on its own:
1. one row with many columns
2. one column with many rows.

It is a one-dimensional (1D) labelled structure where the data is mutable but the size is immutable. A Series has a single dtype, so its data is homogeneous (mixed values are stored under the general `object` dtype).


So if we take just the first row, with all of its columns, from the df created above, we get a Series.
#### Let's check this:

In [53]:
type(df.loc[0]) #it returns a Series

pandas.core.series.Series
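
The same check works for a single column. A minimal sketch, rebuilding the 4x5 frame from above so the block runs on its own; the names `row` and `col` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 21).reshape(4, 5))

row = df.loc[0]   # first row: labelled by the column names 0..4
col = df[0]       # first column: labelled by the row index 0..3

print(type(row), type(col))
print(row.tolist())
print(col.tolist())
```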

In [73]:
anylist=[1,2,3,66,555,4,5]
Q=pd.Series(anylist)
#pd.Series is the constructor used to create a Series
Q

0      1
1      2
2      3
3     66
4    555
5      4
6      5
dtype: int64

In [74]:
type(Q)

pandas.core.series.Series

## What is a Panel?

A Panel is a three-dimensional data structure with heterogeneous data. It is hard to draw, but it can be pictured as a container of DataFrames. Panel was deprecated in pandas 0.20 and removed in 0.25, so it is not available in current versions.
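
In pandas versions without `Panel`, the "container of DataFrames" idea is commonly expressed with a MultiIndex instead. A minimal sketch, assuming two small made-up frames named `first` and `second`: passing a dict to `pd.concat` stacks them under an outer index level.

```python
import pandas as pd

first = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
second = pd.DataFrame({'a': [5, 6], 'b': [7, 8]})

# Passing a dict to pd.concat uses the keys as an outer index level
stacked = pd.concat({'first': first, 'second': second})

print(stacked)
print(stacked.loc['second'])   # recover the rows of one original frame
```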

# DATA SERIES

Series can be created using an array, a constant value or a dictionary.

In [84]:
#creating an empty series (dtype given explicitly; pandas 2.0 and later default to object instead of float64)
series=pd.Series(dtype='float64')
series

Series([], dtype: float64)

In [85]:
#creating series from an array
array=np.array([1,3,4,56,88])
series0=pd.Series(array)
series0

0     1
1     3
2     4
3    56
4    88
dtype: int64


In [86]:
#creating series from a dict (named mapping to avoid shadowing the built-in dict)
mapping={1:'a','a' : 0, 'b' : 1, 'c' : 2}
series1=pd.Series(mapping)
series1

1    a
a    0
b    1
c    2
dtype: object

In [106]:
#creating series from a constant value
a=1000
series2=pd.Series(a,index=[1,3,20,4])
series2

1     1000
3     1000
20    1000
4     1000
dtype: int64

## Indexing:
The values of a Series are accessed just as we do in NumPy.

In [94]:
series1

1    a
a    0
b    1
c    2
dtype: object

In [93]:
series1[:2] #accessing the 1st two rows

1    a
a    0
dtype: object

In [101]:
series1[2:] #last two rows

b    1
c    2
dtype: object

In [105]:
series1[-3:] #last 3 elements

a    0
b    1
c    2
dtype: object

In [116]:
series1[['a','b','c']] #several labels are passed as a list; a single label also works, e.g. series1['a']

a    0
b    1
c    2
dtype: object
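
Plain `[]` mixes two behaviours in the cells above: integer slices are positional, while a list of labels is label-based. A minimal sketch of making the intent explicit with `.iloc` (position) and `.loc` (label), using a small made-up Series with string labels:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(s.iloc[:2])          # first two elements, by position
print(s.iloc[-1])          # last element, by position
print(s.loc['b'])          # single element, by label
print(s.loc[['a', 'c']])   # several elements, by a list of labels
print(s.loc['b':'c'])      # label slice: both end points are included
```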

# DATA FRAME

A Data Frame can be created empty, or constructed from:
    1. list
    2. array
    3. dict
    4. another dataframe
    5. series

In [122]:
#creating an empty DataFrame
df1=pd.DataFrame()
df1

In [125]:
#creating df using list
anylist=[1,2,3,66,555,4,5]
df2=pd.DataFrame(anylist)
df2

Unnamed: 0,0
0,1
1,2
2,3
3,66
4,555
5,4
6,5


In [135]:
#Creating df using a nested list
nested=[[1,333],[3,555],[5,999]]
df4=pd.DataFrame(nested,columns=['NUMBER','AGE'])
df4

Unnamed: 0,NUMBER,AGE
0,1,333
1,3,555
2,5,999


In [129]:
#creating df using array
array=np.random.rand(4,4)
df3=pd.DataFrame(array)
df3

Unnamed: 0,0,1,2,3
0,0.449136,2.7e-05,0.331692,0.118628
1,0.196461,0.389761,0.444628,0.122099
2,0.581371,0.180961,0.879333,0.32856
3,0.162281,0.204467,0.224289,0.746294


In [138]:
#creating df using Dict
#1st example (named people to avoid shadowing the built-in dict):
people = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df5=pd.DataFrame(people)
df5

Unnamed: 0,Name,Age
0,Tom,28
1,Jack,34
2,Steve,29
3,Ricky,42


In [139]:
#2nd example: a list of dicts, one dict per row
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
data

[{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

In [165]:
df5=pd.DataFrame(data)
df5

Unnamed: 0,a,b,c
0,1,2,
1,5,10,20.0


In [166]:
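#creating df using another df
#(depending on the pandas version the two may share data; df5.copy() always gives an independent copy)
df6=pd.DataFrame(df5)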
df6

Unnamed: 0,a,b,c
0,1,2,
1,5,10,20.0
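
The list of constructors above also mentions a Series. A minimal sketch with made-up names and values: a named Series becomes a one-column DataFrame, and `Series.to_frame()` gives the same result.

```python
import pandas as pd

ages = pd.Series([28, 34, 29], index=['Tom', 'Jack', 'Steve'], name='Age')

# A named Series becomes a one-column DataFrame; the name is the column label
df_from_series = pd.DataFrame(ages)
print(df_from_series)

# Series.to_frame() builds the same one-column DataFrame
print(ages.to_frame())
```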


### Customized Index and Columns 

In [56]:
anylist=[1,2,3,66,555,4,5]
s=pd.DataFrame(anylist)

In [57]:
s.columns=['Random']  #dataframe.columns is the attribute that holds the column labels

In [58]:
s.index=['a','b','c','d','e','f','i'] #dataframe.index is the attribute that holds the row labels

In [59]:
s

Unnamed: 0,Random
a,1
b,2
c,3
d,66
e,555
f,4
i,5


All of the above can be combined into the following code


In [60]:
D=pd.DataFrame(anylist,index=['a','b','c','d','e','f','i'],columns=['Random'])
#NOTE THAT INDEX AND COLUMNS SHOULD ALWAYS BE LIST-LIKE (A COLLECTION OF LABELS), NOT A SINGLE VALUE

In [61]:
D

Unnamed: 0,Random
a,1
b,2
c,3
d,66
e,555
f,4
i,5


## Accessing the Values in the DataFrame

In [62]:
df

Unnamed: 0,0,1,2,3,4
0,1,2,3,4,5
1,6,7,8,9,10
2,11,12,13,14,15
3,16,17,18,19,20


In [64]:
df[:]

Unnamed: 0,0,1,2,3,4
0,1,2,3,4,5
1,6,7,8,9,10
2,11,12,13,14,15
3,16,17,18,19,20


In [65]:
df.iloc[:]

Unnamed: 0,0,1,2,3,4
0,1,2,3,4,5
1,6,7,8,9,10
2,11,12,13,14,15
3,16,17,18,19,20
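
Both `df[:]` and `df.iloc[:]` return every row. A minimal sketch of narrowing the selection with `iloc`, rebuilding the same 4x5 frame so the block runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 21).reshape(4, 5))

print(df.iloc[0])         # first row as a Series
print(df.iloc[:2, :3])    # first two rows, first three columns
print(df.iloc[-1, -1])    # bottom-right cell
```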


# Going Deep in DataFrames: Column Wise

### selection of a column

In [167]:
df6['c']

0     NaN
1    20.0
Name: c, dtype: float64

### addition of a new column

In [168]:
df6['d']=[4,6]
df6

Unnamed: 0,a,b,c,d
0,1,2,,4
1,5,10,20.0,6


### deletion of an existing column

Two ways to delete columns are shown here:
    1. del
    2. pop

In [169]:
del df6['d']

In [170]:
df6.pop('c')

0     NaN
1    20.0
Name: c, dtype: float64

In [171]:
df6

Unnamed: 0,a,b
0,1,2
1,5,10
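
`del` and `pop` both modify the DataFrame in place. Another option, not used in this notebook, is `DataFrame.drop`, which returns a new DataFrame and leaves the original untouched. A minimal sketch with made-up data:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 5], 'b': [2, 10], 'c': [3, 20]})

# drop returns a new DataFrame; `frame` itself keeps all three columns
trimmed = frame.drop(columns=['c'])

print(list(trimmed.columns))
print(list(frame.columns))
```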


# Going Deep in DataFrames: Row Wise

### Selection / Indexing 

Rows can be selected using 3 ways:
    1. iloc 
    2. loc
    3. slicing
    

LOC:
It is used to select rows by their labels.
A label is the name of a row, i.e. its value in the index.

In [178]:
df5

Unnamed: 0,a,b
0,1,2
1,5,10
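
A minimal sketch of label-based selection with `loc`, using a made-up frame with string row labels so that labels and positions are easy to tell apart:

```python
import pandas as pd

scores = pd.DataFrame(
    {'math': [90, 75, 60], 'art': [70, 85, 95]},
    index=['tom', 'jack', 'steve'],
)

print(scores.loc['jack'])                  # one row, returned as a Series
print(scores.loc[['tom', 'steve']])        # several rows, returned as a DataFrame
print(scores.loc['jack', 'art'])           # a single cell: row label, column label
print(scores.loc['tom':'jack', 'math'])    # label slices include the end label
```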


ILOC:
It is used to select elements by their integer position.

In [179]:
df5.iloc[1,1] #selection of 10

10

Slicing

In [184]:
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df7=pd.DataFrame(d)
df7

Unnamed: 0,one,two
a,1.0,1
b,2.0,2
c,3.0,3
d,,4


In [189]:
df7[1:3] #note that slicing is done for selecting rows only

Unnamed: 0,one,two
b,2.0,2
c,3.0,3


### Addition of row

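The append function can be used in older versions of pandas. It was deprecated in pandas 1.4 and removed in 2.0, where pd.concat does the same job.

In [193]:
pd.concat([df7,df6],sort=True) #same result as df7.append(df6) in older pandas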



Unnamed: 0,a,b,one,two
a,,,1.0,1.0
b,,,2.0,2.0
c,,,3.0,3.0
d,,,,4.0
0,1.0,2.0,,
1,5.0,10.0,,
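
A minimal sketch of adding rows without `append`, which is not available in pandas 2.0 and later. The frames `base` and `extra` are made up for the example: `pd.concat` stacks frames, and assigning to `loc` with a new label adds a single row in place.

```python
import pandas as pd

base = pd.DataFrame({'a': [1, 5], 'b': [2, 10]})
extra = pd.DataFrame({'a': [7], 'b': [14]})

# Stack the rows of `extra` under `base`; ignore_index renumbers the rows from 0
combined = pd.concat([base, extra], ignore_index=True)
print(combined)

# Add one row in place by assigning to a label that does not exist yet
base.loc[2] = [9, 18]
print(base)
```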
