
# Pandas

* Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its data structures.

* Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze.



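* Pandas historically offered the following three data structures. Series and DataFrame are the ones in current use; Panel was deprecated in pandas 0.20 and removed in pandas 0.25.

    1) Series

    2) DataFrame

    3) Panel (removed in pandas 0.25)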
    
## 1) Series
A Series is a one-dimensional, labeled, array-like structure that holds homogeneous data (all values share one dtype).
    
### Creating Series
A Series can be created from various inputs, such as:

1) Array

2) Dict

3) Scalar value or constant

##### Installing pandas

In [None]:
!pip install pandas



#### 1) Array

In [None]:
#import the pandas library under the alias pd, and NumPy under the alias np
import pandas as pd
import numpy as np

In [None]:
#Without passing index
#numpy array
arr = np.array(['a','b','c','d'])
print(arr)

['a' 'b' 'c' 'd']


In [None]:
#creating the series using numpy array
s = pd.Series(arr)
print(s)

0    a
1    b
2    c
3    d
dtype: object


In [None]:
#check the type
type(s)

pandas.core.series.Series

In [None]:
#by passing index.
#numpy array
arr = np.array(['a','b','c','d'])

#creating series
s = pd.Series(arr,index=[100,101,102,103])
s

100    a
101    b
102    c
103    d
dtype: object

#### 2) Dictionary
Here the dictionary keys become the index labels when no index is passed.

In [None]:
# Ex1)
data = {'a' : 0, 'b' : 1, 'c' : 2}
s = pd.Series(data)
s

a    0
b    1
c    2
dtype: int64
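
If an explicit index is passed together with a dictionary, pandas looks each label up among the keys. The sketch below is a minimal illustration (the label `'d'` is a made-up key that is deliberately absent from the dictionary): values are reordered to follow the index, and a label with no matching key gets `NaN`.

```python
import pandas as pd

data = {'a': 0, 'b': 1, 'c': 2}

# Request the labels in a custom order; 'd' is not a key in data
s = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s)

# 'd' has no matching key, so its value is NaN and the dtype becomes float64
print(s.isna())
```

Because `NaN` is a floating-point value, the integer data is upcast to `float64` in this case.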

#### 3) Scalar

In [None]:
# Create a Series from a scalar; the value is repeated for every index label
s = pd.Series(5, index=[0, 1, 2, 3])
s

0    5
1    5
2    5
3    5
dtype: int64

### Accessing Data from Series

#### 1) By using the position

In [None]:
# Data in the series can be accessed similar to that in an ndarray.
s = pd.Series([1,2,3,4,5])
s

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [None]:
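#retrieve the first element
#with the default integer index, label 0 is also position 0 (s.iloc[0] is the explicit positional form)
s[0]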

1

In [None]:
# first 3 elements
s[0:3]   # slice syntax is [start:stop:step]; remember stop is exclusive

0    1
1    2
2    3
dtype: int64

In [None]:
# last 3 elements
s[-3:]

2    3
3    4
4    5
dtype: int64
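
The examples above rely on the default index, where labels and positions coincide. As a small sketch of the difference, the hypothetical Series below uses integer labels that do not match the positions; `.iloc` selects by position and `.loc` selects by label.

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions
s_lab = pd.Series([10, 20, 30], index=[100, 101, 102])

first_by_position = s_lab.iloc[0]   # position 0
first_by_label = s_lab.loc[100]     # label 100
last_two = s_lab.iloc[-2:]          # positional slice, stop is exclusive

print(first_by_position, first_by_label)
print(last_two)
```

Using `.iloc` and `.loc` explicitly avoids ambiguity when the index itself contains integers.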

#### 2) By using the keys

In [None]:
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
s

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [None]:
#retrieve a single element
s['a']

1

In [None]:
#retrieve multiple elements by passing a list of labels
s[['a','b','c']]

a    1
b    2
c    3
dtype: int64

In [None]:
#if the key is not present, a KeyError is raised
# s['f']
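
One way to deal with a missing label, sketched here under the assumption that a default value is acceptable, is to catch the `KeyError` or to use `Series.get`, which returns a default instead of raising.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Direct lookup of a missing label raises KeyError
try:
    s['f']
except KeyError as err:
    print('KeyError:', err)

# Series.get returns a default (None unless specified) instead of raising
missing = s.get('f')
fallback = s.get('f', 0)

# The in operator checks the index labels, not the values
print('f' in s, 'a' in s)
```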

#### To get the index

In [None]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

#### To get the values

In [None]:
s.values

array([1, 2, 3, 4, 5], dtype=int64)

## 2) DataFrame
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

### Create an Empty DataFrame

In [None]:
# The most basic DataFrame that can be created is an empty one.
import pandas as pd
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


### Create a DataFrame from a Dict of Lists/ndarrays
* All the lists/ndarrays must be of the same length.
* If an index is passed, its length must equal the length of the arrays.
* If no index is passed, the index defaults to range(n), where n is the array length.

In [None]:
# Ex1) without index
name = ['Ashay', 'Himanshu', 'Padam', 'Ravi']
age = [22,23,24,25]

# creating dictionary
data = {'Name':name,'Age':age}
print(data)
# passing dictionary to create df
df = pd.DataFrame(data)
df

{'Name': ['Ashay', 'Himanshu', 'Padam', 'Ravi'], 'Age': [22, 23, 24, 25]}


| | Name | Age |
|---|---|---|
| 0 | Ashay | 22 |
| 1 | Himanshu | 23 |
| 2 | Padam | 24 |
| 3 | Ravi | 25 |


In [None]:
#Ex2) with index
name = ['Ashay', 'Himanshu', 'Padam', 'Ravi']
age = [22,23,24,25]

# my new indexes
indexes = ['rank1','rank2','rank3','rank4']

#this is my dictionary
data = {'Name':name,'Age':age}

#passing dictionary to create dataframe
df = pd.DataFrame(data, index=indexes)
df

| | Name | Age |
|---|---|---|
| rank1 | Ashay | 22 |
| rank2 | Himanshu | 23 |
| rank3 | Padam | 24 |
| rank4 | Ravi | 25 |
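
The length rules listed at the start of this section can be checked with a short sketch. The names, index labels, and flag variables below are illustrative only; in both failing cases pandas raises a `ValueError`.

```python
import pandas as pd

# Arrays of different lengths (3 names, 2 ages) are rejected
try:
    pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [22, 23]})
    unequal_ok = True
except ValueError as err:
    unequal_ok = False
    print('ValueError:', err)

# An index whose length differs from the arrays is rejected as well
try:
    pd.DataFrame({'Name': ['A', 'B'], 'Age': [22, 23]}, index=['r1', 'r2', 'r3'])
    index_ok = True
except ValueError as err:
    index_ok = False
    print('ValueError:', err)

# With no index, the labels default to 0..n-1
df_default = pd.DataFrame({'Name': ['A', 'B'], 'Age': [22, 23]})
print(list(df_default.index))
```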


### Column Selection

In [None]:
df['Age']

rank1    22
rank2    23
rank3    24
rank4    25
Name: Age, dtype: int64

In [None]:
df.Age

rank1    22
rank2    23
rank3    24
rank4    25
Name: Age, dtype: int64
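
Bracket and attribute access return the same Series here, but they are not always interchangeable. The sketch below uses a hypothetical column named `'Home City'` to show a case where only bracket access works, and also shows that a list of labels returns a DataFrame rather than a Series.

```python
import pandas as pd

# 'Home City' is a made-up column name containing a space
df_sel = pd.DataFrame({
    'Name': ['Ashay', 'Himanshu'],
    'Age': [22, 23],
    'Home City': ['Pune', 'Mumbai'],
})

age = df_sel['Age']               # single label -> Series
subset = df_sel[['Name', 'Age']]  # list of labels -> DataFrame
home = df_sel['Home City']        # bracket access works for any column name

# Attribute access (df_sel.Age) only works when the column name is a valid
# Python identifier that does not clash with an existing DataFrame attribute.
print(type(age).__name__, type(subset).__name__)
```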

### Column Addition

In [None]:
# suppose you want to add a city column
city = ['Bangalore','Pune','Hyderabad','Mumbai']

#adding new column named 'City'
df["City"] = city
df

| | Name | Age | City |
|---|---|---|---|
| rank1 | Ashay | 22 | Bangalore |
| rank2 | Himanshu | 23 | Pune |
| rank3 | Padam | 24 | Hyderabad |
| rank4 | Ravi | 25 | Mumbai |
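
A list is assigned to a new column by position, so its length must match the number of rows. When the new column is a Series instead, pandas aligns it on the index labels. The following is a minimal sketch with illustrative data: the Series is given in a different order and omits one row.

```python
import pandas as pd

df_add = pd.DataFrame(
    {'Name': ['Ashay', 'Himanshu', 'Padam']},
    index=['rank1', 'rank2', 'rank3'],
)

# The Series lists rank2 before rank1 and has no entry for rank3
city_series = pd.Series(['Pune', 'Bangalore'], index=['rank2', 'rank1'])

# Assignment aligns on labels: each value goes to the row with the same label,
# and rows without a matching label get a missing value
df_add['City'] = city_series
print(df_add)
```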


### Column Deletion

In [None]:
# 1) Using del keyword
del df["City"]
df

| | Name | Age |
|---|---|---|
| rank1 | Ashay | 22 |
| rank2 | Himanshu | 23 |
| rank3 | Padam | 24 |
| rank4 | Ravi | 25 |


In [None]:
# 2) Using the pop method (removes the column in place and returns it)
df.pop('Age')
df

| | Name |
|---|---|
| rank1 | Ashay |
| rank2 | Himanshu |
| rank3 | Padam |
| rank4 | Ravi |
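
Both `del` and `pop` modify the DataFrame in place. As a possible alternative when the original should be preserved, `DataFrame.drop` returns a new DataFrame. The sketch below uses illustrative data and also captures the Series that `pop` returns.

```python
import pandas as pd

df_del = pd.DataFrame({
    'Name': ['Ashay', 'Himanshu'],
    'Age': [22, 23],
    'City': ['Bangalore', 'Pune'],
})

# drop returns a new DataFrame and leaves df_del unchanged
without_city = df_del.drop(columns=['City'])

# pop removes the column from df_del and returns it as a Series
ages = df_del.pop('Age')

print(list(without_city.columns))
print(list(df_del.columns))
print(list(ages))
```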
