## What is Pandas?

*   It's an open-source Python package for data manipulation and analysis, often used to prepare data for ML tasks
*   It's built on top of another package named NumPy, which provides support for mathematical computations and multi-dimensional arrays



## Importing Pandas Library
First, make sure the Pandas library is installed on your system, then import it in your Jupyter notebook with the statement below.

In [1]:
#importing Pandas Library as pd (an alias name given to Pandas Library)
import pandas as pd

## Pandas Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). Displaying a Series shows both its values and the index labels associated with them.
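
As a minimal sketch of the values/index pairing described above, the snippet below reads the two parts separately. The name `temps` and its labels are made up for illustration.

```python
import pandas as pd

# Illustrative Series with explicit string labels (not from the notebook)
temps = pd.Series([21.5, 19.0, 23.2], index=["Mon", "Tue", "Wed"])

values = temps.values.tolist()   # the stored data
labels = temps.index.tolist()    # the label attached to each value
tuesday = temps["Tue"]           # look up one value by its label

print(values, labels, tuesday)
```

Looking up by label rather than by position is what distinguishes a Series from a plain list or array.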

In [4]:
# create new series
s = pd.Series([100,290,40,199,76])
s

0    100
1    290
2     40
3    199
4     76
dtype: int64

In [3]:
# check the type of Series
type(s)

pandas.core.series.Series

In [5]:
# row axis labels
s.axes

[RangeIndex(start=0, stop=5, step=1)]

In [6]:
# check the data type of Series
s.dtype

dtype('int64')

In [7]:
# number of elements
s.size

5

In [8]:
# number of dimensions
s.ndim

1

In [9]:
# underlying values as a NumPy ndarray (its dtype matches the Series dtype)
s.values

array([100, 290,  40, 199,  76])

In [11]:
# specify index labels as strings instead of the default integers
s1 = pd.Series([1, 2, 4, 5, 6], index = ["First", "Zero", "Second", "Third", "Fourth"])
s1

First     1
Zero      2
Second    4
Third     5
Fourth    6
dtype: int64

In [12]:
# sort by index alphabetically
s1.sort_index()

First     1
Fourth    6
Second    4
Third     5
Zero      2
dtype: int64

### Creating Series with Dictionaries

In [13]:
ages = {'Andrew': 31, "Kate": 45, "Matthew": 26, "Helen": 19}
new_ages = pd.Series(ages)
new_ages

Andrew     31
Kate       45
Matthew    26
Helen      19
dtype: int64

In [14]:
# select particular elements from the dict by passing their keys as the index
pd.Series(ages,index =["Andrew","Helen"])

Andrew    31
Helen     19
dtype: int64
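
One behaviour worth knowing when passing `index` together with a dictionary: a label that is not a key in the dictionary is kept and filled with `NaN`. The sketch below assumes a hypothetical label `"Zoe"` that does not appear in `ages`.

```python
import pandas as pd

ages = {"Andrew": 31, "Kate": 45, "Matthew": 26, "Helen": 19}

# "Zoe" is a hypothetical label with no matching key in the dictionary
subset = pd.Series(ages, index=["Andrew", "Zoe"])

print(subset)
print(subset.isna().tolist())  # marks which entries are missing
```

Because `NaN` is a floating-point value, the resulting Series is stored as floats rather than integers.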

### Creating Pandas Series from NumPy Arrays

In [15]:
import numpy as np

n_one = np.array([1,2,3,4])
pd.Series(n_one)

0    1
1    2
2    3
3    4
dtype: int64

### Merging Two Series (Concat)

In [16]:
s1 = pd.Series([2,3,55,2,6,44]) 
s2 = pd.Series([42,32,34,2,1,4,42])
pd.concat([s1,s2])

0     2
1     3
2    55
3     2
4     6
5    44
0    42
1    32
2    34
3     2
4     1
5     4
6    42
dtype: int64
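
Note that the output above repeats the index labels 0 to 5, because `pd.concat` keeps each Series' original index. A small sketch with shortened inputs shows how `ignore_index=True` produces a fresh index instead:

```python
import pandas as pd

s1 = pd.Series([2, 3, 55])
s2 = pd.Series([42, 32])

kept = pd.concat([s1, s2])                      # original labels are kept
fresh = pd.concat([s1, s2], ignore_index=True)  # labels are renumbered from 0

print(kept.index.tolist())
print(fresh.index.tolist())
```

With duplicate labels, `kept.loc[0]` returns two values rather than one, which is usually a reason to renumber.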

## Pandas DataFrame
A Pandas DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure with labeled axes (rows and columns). Data is arranged in rows and columns, and each column can hold a different data type.
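
To make "heterogeneous" concrete, here is a minimal sketch in which each column has its own data type. The column `Height` and all values are made up for illustration.

```python
import pandas as pd

# Illustrative data: a text column, an integer column and a float column
people = pd.DataFrame(
    {"Name": ["Allen", "Rob"], "Age": [21, 11], "Height": [1.80, 1.45]}
)

# dtypes lists the data type of every column
print(people.dtypes)
```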

### Creating DataFrame

In [17]:
names = {"Names": ["Allen", "Rob", "Harold", "Amy"], "Age": [21, 11, 13, 15]} 

# Creating a DataFrame using a Dictionary.
new_dic = pd.DataFrame(names)
new_dic["Age"]

0    21
1    11
2    13
3    15
Name: Age, dtype: int64

In [19]:
# assign column name
var = [10, 30, 20, 89, 48, 40]
df = pd.DataFrame(var, columns = ["Variables"])
df

   Variables
0         10
1         30
2         20
3         89
4         48
5         40


In [23]:
# create from numpy
arr = np.random.randint(10, size = (5, 2))

new_arr= pd.DataFrame(arr,columns = ["Var1","Var2"])
new_arr

   Var1  Var2
0     3     5
1     4     3
2     8     4
3     9     2
4     9     2


In [24]:
# determine shape
new_arr.shape

(5, 2)

In [25]:
# dimension of DataFrame
new_arr.ndim

2

In [26]:
# number of elements
new_arr.size

10

In [27]:
# getting column names
new_arr.columns

Index(['Var1', 'Var2'], dtype='object')

### Accessing the rows of the DataFrame

In [37]:
dfc = pd.DataFrame(
    {
        "Name": ["Josh", "Rachel", "Tim", "Kate", "Zach", "Andrew"],
        "Age": [11, 13, 16, 12, 14, 18],
        "Salary": [10000, 23000, 18000, 3900000, 19000, 24000]
     })
dfc

     Name  Age   Salary
0    Josh   11    10000
1  Rachel   13    23000
2     Tim   16    18000
3    Kate   12  3900000
4    Zach   14    19000
5  Andrew   18    24000


In [38]:
dfc.Age

0    11
1    13
2    16
3    12
4    14
5    18
Name: Age, dtype: int64

In [39]:
dfc["Age"][3]

12

### Assigning a Value to a Specific Row
We can access a DataFrame with `iloc` (position-based) and `loc` (label-based), and use either of them to change its values.

In [40]:
dfc.iloc[2] = ["Ron", 15, 185]
dfc

     Name  Age   Salary
0    Josh   11    10000
1  Rachel   13    23000
2     Ron   15      185
3    Kate   12  3900000
4    Zach   14    19000
5  Andrew   18    24000


In [41]:
roll_no = [112890, 39080, 18878, 38788, 9070, 50830]

# adding new column : "Roll Number"
dfc["Roll Number"] = roll_no
dfc

     Name  Age   Salary  Roll Number
0    Josh   11    10000       112890
1  Rachel   13    23000        39080
2     Ron   15      185        18878
3    Kate   12  3900000        38788
4    Zach   14    19000         9070
5  Andrew   18    24000        50830


In [42]:
# set index on the basis of "Roll Number"
dfc.set_index("Roll Number", inplace = True)
dfc

               Name  Age   Salary
Roll Number
112890         Josh   11    10000
39080        Rachel   13    23000
18878           Ron   15      185
38788          Kate   12  3900000
9070           Zach   14    19000
50830        Andrew   18    24000


In [43]:
dfc.loc[9070]

Name       Zach
Age          14
Salary    19000
Name: 9070, dtype: object
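
The difference between `loc` and `iloc` is easiest to see when the index is not 0, 1, 2, and so on. The following is a minimal sketch on a small, illustrative DataFrame named `students`, not the notebook's `dfc`.

```python
import pandas as pd

# Illustrative data: the index holds roll numbers, not positions
students = pd.DataFrame(
    {"Name": ["Josh", "Zach", "Andrew"], "Age": [11, 14, 18]},
    index=[112890, 9070, 50830],
)

by_label = students.loc[9070, "Name"]   # row whose index label is 9070
by_position = students.iloc[1, 0]       # second row, first column

# loc can also assign to a single cell
students.loc[9070, "Age"] = 15

print(by_label, by_position)
print(students)
```

Both lookups reach the same cell here: `loc` finds it by label, `iloc` by integer position.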

### Sorting Indexes

In [46]:
dfc = pd.DataFrame(
    {
        "Name": ["Josh", "Rachel", "Tim", "Kate", "Zach", "Andrew"],
        "Age": [11, 13, 16, 12, 14, 18],
        "Salary": [10000, 23000, 18000, 3900000, 19000, 24000]
     }, index = [1, 89, 39, 36, 78, 54])

dfc.sort_index(inplace=True)
dfc

      Name  Age   Salary
1     Josh   11    10000
36    Kate   12  3900000
39     Tim   16    18000
54  Andrew   18    24000
78    Zach   14    19000
89  Rachel   13    23000
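
Without `inplace=True`, `sort_index` leaves the original untouched and returns a sorted copy. It also accepts `ascending=False` for descending order. A short sketch on illustrative data (`scores` is a made-up name):

```python
import pandas as pd

scores = pd.DataFrame({"Score": [70, 85, 90]}, index=[39, 1, 89])

ascending = scores.sort_index()                  # new DataFrame, sorted by index
descending = scores.sort_index(ascending=False)  # largest index label first

print(ascending)
print(descending)
```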


### Filtering in DataFrame

In [47]:
employees = pd.DataFrame(
    {
        "Name": ["Josh", "Mike", "Julia", "Sergio"],
        "Department": ["IT", "Human Resources", "Finance", "Supply Chain"],
        "Income": [4800, 5200, 6600, 5700],
        "Age": [24, 28, 33, 41]
     })

employees

     Name       Department  Income  Age
0    Josh               IT    4800   24
1    Mike  Human Resources    5200   28
2   Julia          Finance    6600   33
3  Sergio     Supply Chain    5700   41


In [48]:
employees["Department"] == "IT"

0     True
1    False
2    False
3    False
Name: Department, dtype: bool

In [49]:
employees.loc[employees["Department"] == "IT", "Name"]

0    Josh
Name: Name, dtype: object

In [50]:
employees[employees["Income"] > 5500]

     Name    Department  Income  Age
2   Julia       Finance    6600   33
3  Sergio  Supply Chain    5700   41


In [52]:
# OR condition: no Department value is literally "HR", so only Age > 30 matches here
employees[(employees["Age"] > 30) | (employees["Department"] == "HR")]

     Name    Department  Income  Age
2   Julia       Finance    6600   33
3  Sergio  Supply Chain    5700   41


In [53]:
employees[~(employees["Age"]<35)]

     Name    Department  Income  Age
3  Sergio  Supply Chain    5700   41


In [54]:
employees.filter(items = ["Department", "Name", "Income"])

        Department    Name  Income
0               IT    Josh    4800
1  Human Resources    Mike    5200
2          Finance   Julia    6600
3     Supply Chain  Sergio    5700
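
The filters above use `|` (or) and `~` (not). As a complementary sketch, conditions can also be combined with `&` (and), and `isin` matches any of several values. The DataFrame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

employees = pd.DataFrame(
    {
        "Name": ["Josh", "Mike", "Julia", "Sergio"],
        "Department": ["IT", "Human Resources", "Finance", "Supply Chain"],
        "Income": [4800, 5200, 6600, 5700],
        "Age": [24, 28, 33, 41],
    }
)

# & keeps rows where both conditions hold; each condition needs parentheses
young_high = employees[(employees["Age"] < 35) & (employees["Income"] > 5000)]

# isin keeps rows whose value appears in the given list
selected = employees[employees["Department"].isin(["IT", "Finance"])]

print(young_high)
print(selected)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.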


## Add and Remove Rows

In [55]:
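# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([employees, pd.DataFrame([{"Name": "Romeo"}])], ignore_index=True)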

     Name       Department  Income   Age
0    Josh               IT  4800.0  24.0
1    Mike  Human Resources  5200.0  28.0
2   Julia          Finance  6600.0  33.0
3  Sergio     Supply Chain  5700.0  41.0
4   Romeo              NaN     NaN   NaN


In [56]:
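# add a complete row (pd.concat replaces DataFrame.append, removed in pandas 2.0)
pd.concat([employees, pd.DataFrame([{"Name": "Romeo", "Age": 26, "Department": "IT", "Income": 5500}])], ignore_index=True)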


     Name       Department  Income  Age
0    Josh               IT    4800   24
1    Mike  Human Resources    5200   28
2   Julia          Finance    6600   33
3  Sergio     Supply Chain    5700   41
4   Romeo               IT    5500   26


In [59]:
employees.drop(employees[employees["Age"] > 30].index)

   Name       Department  Income  Age
0  Josh               IT    4800   24
1  Mike  Human Resources    5200   28
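
Like the other calls in this section, `drop` returns a new DataFrame and leaves `employees` unchanged unless the result is assigned back. A minimal sketch, using a shortened copy of the data and dropping rows by index label:

```python
import pandas as pd

employees = pd.DataFrame(
    {"Name": ["Josh", "Mike", "Julia", "Sergio"], "Age": [24, 28, 33, 41]}
)

# Drop the rows labelled 0 and 3; the result is a new DataFrame
remaining = employees.drop(index=[0, 3])

print(remaining)
print(len(employees))  # the original still has all of its rows
```

To keep the change, assign the result to a name, for example `employees = employees.drop(index=[0, 3])`.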
