# ***Pandas***

## *What is it?*
> pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way towards this goal.

## *Main Features*
Here are just a few of the things that pandas does well:

> - Easy handling of **<u>missing data</u>** (represented as NaN, NA, or NaT) in floating point as well as non-floating point data
> - Size mutability: columns can be **<u>inserted and deleted</u>** from DataFrame and higher dimensional objects
> - Automatic and explicit **<u>data alignment</u>**: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
> - Powerful, flexible **<u>group by</u>** functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
> - Make it **<u>easy to convert</u>** ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
> - Intelligent label-based **<u>slicing</u>**, **<u>fancy indexing</u>**, and **<u>subsetting</u>** of large data sets
> - Intuitive **<u>merging</u>** and **<u>joining</u>** data sets
> - Flexible **<u>reshaping</u>** and **<u>pivoting</u>** of data sets
> - **<u>Hierarchical</u>** labeling of axes (possible to have multiple labels per tick)
> - Robust IO tools for loading data from **<u>flat files</u>** (CSV and delimited), **<u>Excel files</u>**, **<u>databases</u>**, and saving/loading data from the ultrafast **<u>HDF5</u>** format
> - **<u>Time series</u>**-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging
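A few of the features listed above can be sketched in a handful of lines. The data here is made up purely for illustration, and the column names `team` and `score` are assumptions rather than anything from the pandas documentation:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one score is missing (NaN).
scores = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [1.0, np.nan, 3.0, 5.0],
})

# Missing data: reductions such as sum() skip NaN by default.
total = scores["score"].sum()

# Group by: split the rows by team, then aggregate each group.
team_means = scores.groupby("team")["score"].mean()

# Automatic alignment: values are matched by label, not by position.
left = pd.Series([1, 2, 3], index=["x", "y", "z"])
right = pd.Series([10, 20, 30], index=["z", "y", "w"])
aligned = left + right  # labels present on only one side become NaN

print(total)
print(team_means)
print(aligned)
```

Labels that appear in only one of the two Series (`x` and `w` here) come out as NaN in the sum, which is how alignment and missing-data handling work together.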

## *Mission*
> pandas aims to be the fundamental high-level building block for doing practical, real world **<u>data analysis</u>** in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source **<u>data analysis / manipulation tool</u>** available in any language.

## *Timeline*
> - **2008**: Development of pandas started
> - **2009**: pandas becomes open source
> - **2012**: First edition of Python for Data Analysis is published
> - **2015**: pandas becomes a NumFOCUS sponsored project
> - **2018**: First in-person core developer sprint

## *How to Install?*
> - `conda install pandas`
> - `pip install pandas`
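After either command finishes, one quick way to check the installation is to import the package and print its version. This is only a sanity check, and the version string you see depends on your environment:

```python
import pandas as pd

# If the import succeeds, pandas is installed; the version string varies by environment.
print(pd.__version__)
```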

## *Intro to data structures*

We’ll start with a quick, non-comprehensive overview of the fundamental data structures in pandas. The fundamental behavior of data types, indexing, axis labeling, and alignment applies across all of the objects. To get started, import NumPy and load pandas into your namespace:

`import numpy as np`

`import pandas as pd`

## *Series*
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

`s = pd.Series(data, index=index)`

Here, data can be many different things:

1. A Python dict

2. An ndarray

3. A scalar value (like 5)
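The three kinds of input listed above can be sketched as follows. The variable names and values are illustrative assumptions, not part of the original notebook:

```python
import numpy as np
import pandas as pd

# 1. From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# 2. From an ndarray: pass an index of the same length
#    (if omitted, pandas uses 0..n-1).
from_array = pd.Series(np.array([0.5, 1.5, 2.5]), index=["x", "y", "z"])

# 3. From a scalar: the value is repeated for every label in the index.
from_scalar = pd.Series(5, index=["p", "q", "r"])

print(from_dict)
print(from_array)
print(from_scalar)
```

With a scalar, an index has to be supplied for the value to be repeated across several labels, since the index determines the length of the result.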

In [28]:
import pandas as pd
import numpy as np

In [29]:
values = ["Ranjith", "Sumit", "Anurag", "Shubhya", "Nishchal"]

In [30]:
values

['Ranjith', 'Sumit', 'Anurag', 'Shubhya', 'Nishchal']

In [31]:
type(values)

list

In [32]:
sh = pd.Series(data=values, index=[1,2,3,4,5], name='nalla_ppl')

In [33]:
sh

1     Ranjith
2       Sumit
3      Anurag
4     Shubhya
5    Nishchal
Name: nalla_ppl, dtype: object

In [34]:
ht = pd.Series(np.array([5.10, 5.8, 5.4, 3.1, 5.10]), index=[1,2,3,4,5], name='nallas_height')

In [38]:
ht

1    5.1
2    5.8
3    5.4
4    3.1
5    5.1
Name: nallas_height, dtype: float64

In [43]:
ng = pd.Series({'ranjith': 5.8, 'sumit': 2.9, 'anurag': 3.1, 'shubham': 9.99999999, 'nischal': 5.11}, name='nalla_gang')

In [45]:
ng.values

array([5.8       , 2.9       , 3.1       , 9.99999999, 5.11      ])

In [47]:
ng.index

Index(['ranjith', 'sumit', 'anurag', 'shubham', 'nischal'], dtype='object')

In [48]:
ng.name

'nalla_gang'
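Beyond `values`, `index`, and `name`, the index is what lets you look values up by label. A minimal sketch, using a small made-up Series (the names and heights are hypothetical):

```python
import pandas as pd

# Hypothetical data, built from a dict so the keys become labels.
heights = pd.Series({"ann": 5.8, "bob": 5.4, "cy": 6.1}, name="height")

by_label = heights["bob"]        # look up by index label
by_position = heights.iloc[0]    # look up by integer position
tall = heights[heights > 5.5]    # a boolean mask keeps the matching labels
as_array = heights.to_numpy()    # the underlying values as a NumPy array

print(by_label, by_position)
print(tall)
```

`to_numpy()` returns the same data that the `values` attribute exposes here; it is the accessor the pandas documentation recommends in newer versions.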

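## *DataFrame*
DataFrame is a two-dimensional labeled data structure with columns of potentially different types. The cells below build one from the iris dataset that ships with scikit-learn.

In [49]: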
from sklearn.datasets import load_iris

In [52]:
load_iris().keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [56]:
iris = load_iris()  # load the dataset once and reuse it
df = pd.DataFrame(iris['data'], columns=iris['feature_names'])

In [57]:
df

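| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |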
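A DataFrame does not have to come from a dataset loader. As a rough sketch, one can also be built from a dict of equal-length lists; the column names, row labels, and measurements below are invented for illustration:

```python
import pandas as pd

# Hypothetical measurements; each dict key becomes a column.
flowers = pd.DataFrame(
    {
        "length_cm": [5.1, 4.9, 6.3],
        "width_cm": [3.5, 3.0, 2.5],
    },
    index=["f1", "f2", "f3"],
)

# Selecting one column returns a Series that shares the DataFrame's index.
lengths = flowers["length_cm"]

# head(n) returns the first n rows, handy for previewing large frames.
preview = flowers.head(2)

print(flowers)
print(lengths)
print(preview)
```

Each column of a DataFrame is itself a Series, so everything shown earlier about Series applies column by column.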


In [58]:
df.values

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       ...

In [59]:
df.columns

Index(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
       'petal width (cm)'],
      dtype='object')

In [60]:
df.index

RangeIndex(start=0, stop=150, step=1)

In [62]:
df.shape

(150, 4)

In [63]:
df.size

600

In [64]:
df.dtypes

sepal length (cm)    float64
sepal width (cm)     float64
petal length (cm)    float64
petal width (cm)     float64
dtype: object
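The relationship between these attributes is easier to see on a tiny frame. The sketch below uses made-up columns `a`, `b`, and `c`; it assumes nothing beyond the attributes already shown:

```python
import pandas as pd

# Hypothetical frame with three rows and three columns of different types.
frame = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [0.1, 0.2, 0.3],
    "c": ["x", "y", "z"],
})

n_rows, n_cols = frame.shape   # shape is a (rows, columns) tuple
n_cells = frame.size           # size is the total number of cells

# size is always rows * columns, which is why iris gives 150 * 4 = 600.
print(n_rows, n_cols, n_cells)

# dtypes reports one dtype per column, since columns may differ in type.
print(frame.dtypes)
```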

In [74]:
df.max(axis=0)

sepal length (cm)    7.9
sepal width (cm)     4.4
petal length (cm)    6.9
petal width (cm)     2.5
dtype: float64
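The `axis` argument decides which direction the reduction runs in. A small sketch with a made-up two-column frame shows the difference:

```python
import pandas as pd

# Hypothetical frame: two columns, three rows.
frame = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})

# axis=0 reduces down the rows, giving one value per column.
col_max = frame.max(axis=0)

# axis=1 reduces across the columns, giving one value per row.
row_max = frame.max(axis=1)

print(col_max)
print(row_max)
```

Here `col_max` is indexed by column name, while `row_max` is indexed by the original row labels.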

In [75]:
df

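| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |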


# **The References**

- https://pandas.pydata.org/
- https://pandas.pydata.org/docs/user_guide/index.html
- https://pypi.org/project/pandas/