### Pandas

- Pandas is a Python library used for **data analysis**
- Its core structure, the **DataFrame**, is built on NumPy and supports efficient data analysis, manipulation and processing
- Reads and writes a wide variety of data formats such as `csv`, `xlsx`, `json`, `sql`, `parquet`, `feather` and so on
- Selects data efficiently using **indexing, slicing** and so on
- Adjusts and restructures data
- Helps to handle **missing** data

### Pandas - Section Overview

- Series and DataFrame
- Reading data from files
- Handling missing data
- Conditional Filtering
- Group By operations
- Combining dataframes
- Text methods and time methods
- Input and output of data using Pandas

In [1]:
import pandas as pd
import numpy as np

#### Series
- A Series is a Pandas data structure that holds an array of values along with a labelled (named) index
- Formal definition: a **one-dimensional** ndarray with a labelled index

In [2]:
# Creating Pandas Series
narr = np.arange(8)

In [3]:
narr

array([0, 1, 2, 3, 4, 5, 6, 7])

In [4]:
pd.Series(data=narr)

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
dtype: int64
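
To connect the output above with the definition of a Series, here is a minimal sketch that pulls the two parts apart: the values (a NumPy array) and the index (a default integer range when none is supplied). The variable names `s`, `values` and `index` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same Series as above, built from a NumPy array with no explicit index
s = pd.Series(data=np.arange(8))

# The data part of the Series, as a NumPy array
values = s.to_numpy()

# The label part of the Series; with no index given, pandas uses a RangeIndex
index = s.index

print(type(values))
print(index)
print(s.ndim)  # a Series always has exactly one dimension
```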

In [7]:
cities = ['Varanasi', 'Pune', 'Mumbai', 'Delhi']
temperature = [18, 24, 28, 16]

In [12]:
city_se = pd.Series(data=temperature, index=cities)

In [13]:
city_se

Varanasi    18
Pune        24
Mumbai      28
Delhi       16
dtype: int64

In [9]:
# we can also use Python dictionary to create Series object
city_temp = {"Jaipur":28, "Pune":23, "Mumbai": 29, "Delhi": 14}

In [10]:
city_seone = pd.Series(data=city_temp)

In [11]:
city_seone

Jaipur    28
Pune      23
Mumbai    29
Delhi     14
dtype: int64

In [14]:
# fetching a value by its index label
city_se['Pune']

np.int64(24)

In [15]:
city_se['Pune':"Delhi"]

Pune      24
Mumbai    28
Delhi     16
dtype: int64
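
Note that the label slice above includes its end label (`'Delhi'`), unlike ordinary Python slicing. The sketch below contrasts label-based slicing (`.loc`) with position-based slicing (`.iloc`). It rebuilds `city_se` so that it runs on its own; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

city_se = pd.Series(data=[18, 24, 28, 16],
                    index=['Varanasi', 'Pune', 'Mumbai', 'Delhi'])

# Label-based slice: both end points are included
by_label = city_se.loc['Pune':'Delhi']

# Position-based slice: the stop position (3) is excluded, as in plain Python
by_position = city_se.iloc[1:3]

print(by_label)
print(by_position)
```

Using `.loc` and `.iloc` explicitly makes it clear to a reader which kind of slicing is intended.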

In [16]:
city_se.shape

(4,)

In [17]:
# adding a scalar: the number is added to every value in the Series
city_se + 2

Varanasi    20
Pune        26
Mumbai      30
Delhi       18
dtype: int64

In [19]:
city_se + city_seone

Delhi       30.0
Jaipur       NaN
Mumbai      57.0
Pune        47.0
Varanasi     NaN
dtype: float64

In [21]:
# Varanasi and Jaipur come out as NaN because each label exists in only one of the two Series.
# Series arithmetic aligns values on their index labels, so unmatched labels produce NaN.
# To avoid NaN, use .add() with fill_value=0, which treats a missing value as 0.
city_se.add(other=city_seone, fill_value=0)

Delhi       30.0
Jaipur      28.0
Mumbai      57.0
Pune        47.0
Varanasi    18.0
dtype: float64
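
The results above come back as `float64` because NaN is a floating-point value, so pandas upcasts the integers during alignment. As a hedged sketch, here are two possible follow-ups: casting back to integers once no NaN remains, or restricting the sum to labels present in both Series. The names `total`, `total_int`, `common` and `common_only` are illustrative.

```python
import pandas as pd

city_se = pd.Series([18, 24, 28, 16],
                    index=['Varanasi', 'Pune', 'Mumbai', 'Delhi'])
city_seone = pd.Series({"Jaipur": 28, "Pune": 23, "Mumbai": 29, "Delhi": 14})

# Aligned addition, treating a label missing from one Series as 0
total = city_se.add(city_seone, fill_value=0)

# No NaN is left, so the result can safely be cast back to integers
total_int = total.astype('int64')

# Alternative: keep only the labels that appear in both Series
common = city_se.index.intersection(city_seone.index)
common_only = city_se[common] + city_seone[common]

print(total_int)
print(common_only)
```

Which option fits depends on whether a missing city should count as 0 or be left out of the result altogether.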

In [23]:
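# np.concat (NumPy >= 2.0, an alias of np.concatenate) returns a plain ndarray, so the index labels are dropped
np.concat((city_se, city_seone))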

array([18, 24, 28, 16, 28, 23, 29, 14])
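
If the city labels need to survive the concatenation, one option is `pd.concat`, which returns a Series and keeps the index. This is a sketch rather than part of the original notebook; it rebuilds both Series so that it runs standalone, and `combined` is an illustrative name.

```python
import pandas as pd

city_se = pd.Series([18, 24, 28, 16],
                    index=['Varanasi', 'Pune', 'Mumbai', 'Delhi'])
city_seone = pd.Series({"Jaipur": 28, "Pune": 23, "Mumbai": 29, "Delhi": 14})

# Stack the two Series end to end; index labels are preserved
combined = pd.concat([city_se, city_seone])

print(combined)

# Labels found in both inputs now appear twice, so a lookup returns a Series
print(combined['Pune'])
```

Unlike addition, concatenation does not align on labels, so duplicate labels such as `'Pune'` are kept as separate entries.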