# Introduction to Pandas Library
- Pandas is an open-source Python library known for its rich set of utilities for mathematical, financial, and statistical work
- It is useful for data manipulation and analysis
- It provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data easy and intuitive



#### Installing pandas

In [None]:
!pip install pandas

#### Importing pandas

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Dataframe

A DataFrame is a two-dimensional data structure in which data is arranged in a tabular format of rows and columns

#### DataFrame features:

- Columns can be of different data types
- The size of a dataframe can be changed
- Axes (rows and columns) are labeled
- Arithmetic operations can be performed on rows and columns
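
The features listed above can be seen on a tiny dataframe. This is only a sketch: the product names and numbers are made up for illustration and do not come from any dataset used in this notebook.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame(
    {"Product": ["Espresso", "Latte", "Mocha"], "Sales": [120.0, 95.5, 80.0]},
    index=["a", "b", "c"],  # labeled row axis
)

print(df.dtypes)  # each column has its own data type

# The size can change: add a new column
df["Profit"] = [30.0, 20.5, 18.0]

# Arithmetic works column-wise, element by element
df["Margin"] = df["Profit"] / df["Sales"]

print(df.shape)
```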

### Reading data from External Data Sources

#### Excel

#### Sharepoint

In [None]:
!pip install Office365-REST-Python-Client pandas

In [None]:
from office365.sharepoint.client_context import ClientContext
from office365.runtime.auth.authentication_context import AuthenticationContext
import pandas as pd
import io

# SharePoint site URL
sharepoint_url = "https://knowledgecornerin.sharepoint.com/sites/mylearnings/"
file_relative_url = "/sites/mylearnings/Documents/Invoices.xlsx"


# Authentication
username = "vaidehi.nair@knowledgecorner.in"
password = "password"

ctx_auth = AuthenticationContext(sharepoint_url)
if ctx_auth.acquire_token_for_user(username, password):
    ctx = ClientContext(sharepoint_url, ctx_auth)
    with io.BytesIO() as buffer:
        # Download the file contents into the in-memory buffer
        ctx.web.get_file_by_server_relative_url(file_relative_url).download(buffer).execute_query()
        buffer.seek(0)  # Rewind so pandas reads from the start
        df = pd.read_excel(buffer)  # Read into pandas dataframe
        print(df.head())  # Display first few rows
        print("success")
else:
    print("Authentication failed!")

### Examples using Coffee Shop Dataset

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

###### Ex. Read data from `coffee_sales.csv`
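
A minimal sketch of reading the file, assuming it is a comma-separated file with a header row. Because the real `coffee_sales.csv` is not available here, an in-memory string with made-up rows stands in for it; only some of the column names used later in this notebook are shown.

```python
import io

import pandas as pd

# Stand-in for coffee_sales.csv: made-up rows, for illustration only
csv_text = """ShopID,Year/Month,Product,Sales
1,2012-01-01,Espresso,120
2,2012-01-01,Latte,95
"""

# With the real file this would be: pd.read_csv("coffee_sales.csv")
df_coffee = pd.read_csv(io.StringIO(csv_text))
print(df_coffee.head())
```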

#### Drop a column or row from dataframe
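
One way to do this is `df.drop`, which returns a new dataframe and leaves the original unchanged unless `inplace=True` is passed. The sample data and the `Notes` column below are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame(
    {
        "ShopID": [1, 2, 3],
        "Product": ["Espresso", "Latte", "Mocha"],
        "Notes": ["x", "y", "z"],
    }
)

# Drop a column by name
without_column = df.drop(columns=["Notes"])

# Drop a row by its index label
without_row = df.drop(index=[1])

print(without_column)
print(without_row)
```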

#### Working with **null** values

`df.isna()` - Detect missing values. Return a boolean same-sized object indicating if the values are NA.

`df.fillna(value=None, inplace=False)` - Fill NA/NaN values with the given `value`.

#### Drop null rows
`df.dropna(axis=0, how="any", inplace=False)`
- `axis` - 0 to drop rows, 1 to drop columns
- `how` - `"any"` drops a row or column when at least one value is null, `"all"` only when every value is null
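
The three null-handling calls above can be sketched together on a small dataframe. The data is made up, and filling missing `Sales` with `0.0` is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values, for illustration only
df = pd.DataFrame(
    {"Product": ["Espresso", "Latte", None], "Sales": [120.0, np.nan, 80.0]}
)

# Count the missing values in each column
null_counts = df.isna().sum()

# Fill missing Sales with 0.0 (a dict maps column name to fill value)
filled = df.fillna({"Sales": 0.0})

# Drop every row that contains at least one null
rows_dropped = df.dropna(axis=0, how="any")

print(null_counts)
print(filled)
print(rows_dropped)
```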

#### Renaming Columns

###### Rename Columns (column 5 - 8 are not accessible)

In [None]:
headers = ["ShopID", "Year/Month", "Product", "Product Type", "State", "Target Profit", "Target Sales", "Profit", "Sales"]
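
A sketch of applying such a list, assuming the dataframe has exactly as many columns as there are names. The single row of values below is made up, and `df_coffee` here is a stand-in for the real dataframe.

```python
import pandas as pd

headers = ["ShopID", "Year/Month", "Product", "Product Type", "State",
           "Target Profit", "Target Sales", "Profit", "Sales"]

# Stand-in dataframe with default column labels 0..8 (made-up values)
df_coffee = pd.DataFrame(
    [[1, "2012-01-01", "Espresso", "Coffee", "Ohio", 30, 100, 35, 120]]
)

# Assigning to .columns renames every column at once;
# the list length must match the number of columns
df_coffee.columns = headers
print(df_coffee.columns.tolist())
```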


#### Rename Single Column
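
A single column can be renamed with `df.rename`, which takes a mapping from old name to new name. The new name `Date` below is a hypothetical choice, and the data is made up.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame({"Year/Month": ["2012-01-01"], "Sales": [120]})

# Map old name -> new name; columns not mentioned are left alone
df = df.rename(columns={"Year/Month": "Date"})
print(df.columns.tolist())
```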

#### Understanding Data in Dataframe

- `df.shape` - gives the size of the dataframe as a tuple `(row_count, column_count)`
- `df.dtypes` - returns a Series with the data type of each column
- `df.info()` - prints information about a DataFrame including the index dtype and columns, non-null values and memory usage
- `df.head()` - returns the first 5 rows of your dataset, including the column headers and the content of each row
- `df.tail()` - returns the last 5 rows of your dataset, including the column headers and the content of each row

In [None]:
df_coffee.shape

In [None]:
df_coffee.dtypes

In [None]:
df_coffee.info()

In [None]:
df_coffee.head()

In [None]:
df_coffee.head(3)

In [None]:
df_coffee.tail()

In [None]:
df_coffee.tail(3)

###### Ex. Converting Sales and Profits columns to float types
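
A minimal sketch, assuming the two columns were read in as strings that contain plain numbers. If the real data holds thousands separators or currency symbols, those would need to be cleaned first. The values are made up.

```python
import pandas as pd

# Hypothetical sample data where numbers arrived as text
df = pd.DataFrame({"Sales": ["120", "95", "80"], "Profit": ["35", "20", "18"]})

# astype accepts a mapping of column name -> target type
df = df.astype({"Sales": float, "Profit": float})
print(df.dtypes)
```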

#### Removing Duplicate Data
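
Duplicates can be found with `df.duplicated()` and removed with `df.drop_duplicates()`. By default both compare entire rows and keep the first occurrence. The data below is made up.

```python
import pandas as pd

# Hypothetical sample data with one repeated row
df = pd.DataFrame(
    {"ShopID": [1, 1, 2], "Product": ["Espresso", "Espresso", "Latte"]}
)

# True for each row that repeats an earlier row
dup_mask = df.duplicated()

# Keep only the first occurrence of each row
deduped = df.drop_duplicates()

print(dup_mask.tolist())
print(deduped)
```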

#### Replacing values

`df.replace(old_value, new_value, inplace=True)`
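
As a sketch, a misspelled category can be corrected like this. The misspelling `Coffe` is invented for the example. Reassigning the result is used here instead of `inplace=True`; both leave `df` holding the corrected values.

```python
import pandas as pd

# Hypothetical sample data with a made-up misspelling
df = pd.DataFrame({"Product Type": ["Coffe", "Tea", "Coffe"]})

# Replace every cell equal to "Coffe" with "Coffee"
df = df.replace("Coffe", "Coffee")
print(df)
```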

#### Adding a new Column by calculation

###### Ex. Create columns showing `Sales` and `Profit` targets achieved
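
One possible approach, assuming a target counts as achieved when the actual value is greater than or equal to the target. The new column names and the sample numbers are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame(
    {
        "Target Sales": [100, 100, 90],
        "Sales": [120, 95, 90],
        "Target Profit": [30, 25, 20],
        "Profit": [35, 20, 18],
    }
)

# Comparing two columns yields a boolean column, row by row
df["Sales Target Achieved"] = df["Sales"] >= df["Target Sales"]
df["Profit Target Achieved"] = df["Profit"] >= df["Target Profit"]
print(df)
```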

###### Ex. Count the number of times Targets are achieved
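
A sketch using `value_counts`, assuming a status column is first derived from sales and targets. The column name `Target Status`, its two labels, and the data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame({"Target Sales": [100, 100, 90, 80], "Sales": [120, 95, 90, 85]})

# Label each row according to whether the sales target was met
df["Target Status"] = np.where(
    df["Sales"] >= df["Target Sales"], "Achieved", "Not Achieved"
)

# Count how often each label occurs
counts = df["Target Status"].value_counts()
print(counts)
```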

###### Ex. Create a bar chart to view Target Status

#### Insert a column in between
`df.insert(index, column_name, default_value)`
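
A small sketch of inserting a column at a chosen position. `df.insert` modifies the dataframe in place and returns `None`. The column name `State` and the placeholder value are illustrative.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame({"ShopID": [1, 2], "Sales": [120, 95]})

# Insert "State" at position 1 (between ShopID and Sales);
# a scalar value is repeated for every row
df.insert(1, "State", "Unknown")
print(df)
```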

###### Create columns Year and Month - extract data using pd.DatetimeIndex
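
A sketch of the extraction, assuming the `Year/Month` column holds date strings that pandas can parse. The actual format in the dataset may differ. The dates below are made up.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame({"Year/Month": ["2012-01-01", "2012-02-01", "2013-01-01"]})

# Parse the strings once, then read the date components
dates = pd.DatetimeIndex(df["Year/Month"])
df["Year"] = dates.year
df["Month"] = dates.month
print(df)
```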

#### Grouping Dataframes

##### `df.groupby(by=None, as_index=True, sort=True, dropna=True)`

###### Ex. Find product wise total Sales - bar chart
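
A minimal sketch of the grouping step, using made-up data. Each distinct `Product` becomes one group, and `sum()` adds up the `Sales` within it.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame(
    {
        "Product": ["Espresso", "Latte", "Espresso", "Latte"],
        "Sales": [120, 95, 80, 100],
    }
)

# Total sales per product; the result is a Series indexed by Product
product_sales = df.groupby("Product")["Sales"].sum()
print(product_sales)
```

With matplotlib installed, calling `product_sales.plot(kind="bar")` on this result draws the bar chart.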

###### Ex. Extract Monthly Sales and Profit
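
One way to do this, assuming `Year` and `Month` columns already exist, is to group by both and sum the two measures. The numbers below are made up.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame(
    {
        "Year": [2012, 2012, 2012, 2013],
        "Month": [1, 1, 2, 1],
        "Sales": [120, 95, 80, 105],
        "Profit": [35, 20, 18, 25],
    }
)

# Grouping by two keys gives one row per (Year, Month) pair
monthly = df.groupby(["Year", "Month"])[["Sales", "Profit"]].sum()
print(monthly)
```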

###### Ex. Trend vs Seasonality

###### Ex. Analyse Growth over years
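
A sketch of one possible reading of this exercise, assuming growth means the year-over-year percentage change in total sales. The data is made up.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame({"Year": [2012, 2012, 2013, 2013], "Sales": [100, 100, 150, 100]})

# Total sales per year
yearly = df.groupby("Year")["Sales"].sum()

# Percentage change from the previous year; the first year has no
# previous value, so its growth is NaN
growth = yearly.pct_change() * 100
print(growth)
```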