# Introduction to Pandas

Pandas is a package built on top of NumPy that provides an implementation of the DataFrame. DataFrames are essentially two-dimensional arrays with attached row and column labels, in which each column can hold a different data type. Besides offering a convenient storage interface for labeled data, Pandas implements many powerful operations for data wrangling, such as summarizing, grouping, pivoting, and filtering.
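To make the "labeled two-dimensional array" description concrete, here is a minimal sketch with made-up numbers. The values live in a NumPy array, while `index` and `columns` hold the row and column labels that let you select data by name rather than by position.

```python
import numpy as np
import pandas as pd

# Made-up values, for illustration only
values = np.array([[8000, 500],
                   [500, 2000],
                   [2000, 1000]])

# Wrap the 2-D NumPy array with row labels (index) and column labels
frame = pd.DataFrame(values,
                     index=[1001, 1002, 1003],
                     columns=["PaidAmount", "CaseReserve"])

print(frame)
print(frame.to_numpy())      # the raw 2-D NumPy array, labels stripped
print(list(frame.index))     # row labels: [1001, 1002, 1003]
print(list(frame.columns))   # column labels: ['PaidAmount', 'CaseReserve']

# Labels let you select by name instead of by position
print(frame.loc[1002, "CaseReserve"])  # 2000
```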

In [None]:
import pandas as pd

## The DataFrame Object

In [None]:
df = pd.DataFrame({
    'ClaimNumber': [1001, 1002, 1003, 1004, 1005],
    'PaidAmount': [8000, 500, 2000, 1000, 0],
    'CaseReserve': [500, 2000, 1000, 0, 22000],
    'ClaimType' : ['PIP', 'Liab', 'Liab', 'Liab','PIP']
})

# Show the first rows (head() returns the first 5 by default)
df.head()
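Once the DataFrame exists, each column can be pulled out by name and used in vectorized arithmetic. The sketch below rebuilds the same example frame; the `Incurred` column (paid amount plus case reserve) is an illustrative addition, not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'ClaimNumber': [1001, 1002, 1003, 1004, 1005],
    'PaidAmount': [8000, 500, 2000, 1000, 0],
    'CaseReserve': [500, 2000, 1000, 0, 22000],
    'ClaimType': ['PIP', 'Liab', 'Liab', 'Liab', 'PIP']
})

# Each column is a Series; selecting one by name returns that Series
paid = df['PaidAmount']
print(type(paid).__name__)   # Series

# Each column keeps its own dtype (integers vs. strings)
print(df.dtypes)

# Arithmetic on columns is vectorized, element by element.
# 'Incurred' is an illustrative derived column: paid plus case reserve.
df['Incurred'] = df['PaidAmount'] + df['CaseReserve']

# Use the claim number as the row label for label-based lookups
by_claim = df.set_index('ClaimNumber')
print(by_claim.loc[1005, 'Incurred'])  # 22000
```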

## Loading Data into Python

Pandas makes it easy and fast to load data into Python from a variety of formats.

In [None]:
file_url = "https://raw.githubusercontent.com/gpwa-com/WSIA2019/master/inputs/comauto_pos.csv"
claims = pd.read_csv(file_url)
claims.head()
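`read_csv` accepts a local path, a URL, or any file-like object. The sketch below parses a few made-up rows from an in-memory string so it runs without network access; the column names match ones used later in this notebook, but the values are invented.

```python
import io
import pandas as pd

# A few made-up rows, using column names that appear later in this notebook
csv_text = """GRNAME,AccidentYear,DevelopmentYear,DevelopmentLag,CumPaidLoss_C
Company A,1988,1988,1,100
Company A,1988,1989,2,180
Company B,1988,1988,1,50
"""

# read_csv accepts a path, a URL, or any file-like object such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))

# usecols limits parsing to the columns you actually need
slim = pd.read_csv(io.StringIO(csv_text), usecols=["GRNAME", "CumPaidLoss_C"])

print(sample.shape)         # (3, 5)
print(list(slim.columns))   # ['GRNAME', 'CumPaidLoss_C']
```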

Pandas also includes functions for loading data from other popular sources, such as `read_excel` for Excel workbooks and `read_sql` for SQL databases.
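As a sketch of `read_sql`, the example below builds a throwaway in-memory SQLite database with a hypothetical `claims` table and pulls a query result straight into a DataFrame. A real project would connect to its own database instead.

```python
import sqlite3
import pandas as pd

# An in-memory SQLite database with a hypothetical 'claims' table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE claims (ClaimNumber INTEGER, PaidAmount REAL, ClaimType TEXT)")
con.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(1001, 8000, "PIP"), (1002, 500, "Liab"), (1003, 2000, "Liab")],
)

# read_sql runs the query and returns the result set as a DataFrame
liab = pd.read_sql(
    "SELECT * FROM claims WHERE ClaimType = 'Liab' ORDER BY ClaimNumber", con
)
con.close()

print(liab)
```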

## Data Summary and Statistics

In [None]:
# Summary statistics (count, mean, std, min, quartiles, max) for each numeric column
claims.describe()
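By default `describe()` only summarizes numeric columns. The sketch below, on the small example claims data, shows the default output next to `include='all'`, which adds the count of unique values, the most frequent value (`top`), and its frequency for text columns.

```python
import pandas as pd

df = pd.DataFrame({
    'PaidAmount': [8000, 500, 2000, 1000, 0],
    'CaseReserve': [500, 2000, 1000, 0, 22000],
    'ClaimType': ['PIP', 'Liab', 'Liab', 'Liab', 'PIP']
})

# Default: numeric columns only
numeric_summary = df.describe()
print(numeric_summary)

# include='all' also summarizes the text column
# (count, number of unique values, most frequent value and its frequency)
full_summary = df.describe(include='all')
print(full_summary)
```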

In [None]:
# Count the number of records for each distinct value of GRNAME
claims["GRNAME"].value_counts()
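`value_counts()` returns one count per distinct value, sorted from most to least frequent. A short sketch on the example `ClaimType` values, including `normalize=True` to get proportions instead of raw counts:

```python
import pandas as pd

claim_type = pd.Series(['PIP', 'Liab', 'Liab', 'Liab', 'PIP'], name='ClaimType')

# Counts per distinct value, sorted from most to least frequent
counts = claim_type.value_counts()
print(counts)

# normalize=True returns proportions instead of raw counts
shares = claim_type.value_counts(normalize=True)
print(shares)
```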

## Groupby Operations

In [None]:
# Total cumulative paid loss for each development lag, summed over all rows in that lag
claims.groupby("DevelopmentLag")["CumPaidLoss_C"].sum()
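`groupby` follows a split-apply-combine pattern: rows are split by key, a function is applied to each group, and the results are combined into one object. The sketch below uses the small example claims data; the output names in the named aggregation (`claims`, `total_paid`, `total_reserve`) are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame({
    'PaidAmount': [8000, 500, 2000, 1000, 0],
    'CaseReserve': [500, 2000, 1000, 0, 22000],
    'ClaimType': ['PIP', 'Liab', 'Liab', 'Liab', 'PIP']
})

# Split rows by ClaimType, sum PaidAmount within each group, combine results
paid_by_type = df.groupby('ClaimType')['PaidAmount'].sum()
print(paid_by_type)

# Named aggregation computes several statistics per group in a single call
summary = df.groupby('ClaimType').agg(
    claims=('PaidAmount', 'count'),
    total_paid=('PaidAmount', 'sum'),
    total_reserve=('CaseReserve', 'sum'),
)
print(summary)
```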

## Pivot Tables

In [None]:
# pivot_table averages by default (aggfunc="mean"); pass aggfunc="sum" to total instead
claims.pivot_table(values="CumPaidLoss_C",
                   index="AccidentYear",
                   columns="DevelopmentYear")
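Because `pivot_table` averages by default, the choice of `aggfunc` matters when several rows (for example, several companies) land in the same cell. The sketch below uses made-up data for two hypothetical companies and lays it out as accident year by development lag, a common loss-triangle shape; combinations with no data come back as `NaN`.

```python
import pandas as pd

# Made-up cumulative paid losses for two hypothetical companies
data = pd.DataFrame({
    'GRNAME': ['A', 'A', 'A', 'B', 'B', 'B'],
    'AccidentYear': [1988, 1988, 1989, 1988, 1988, 1989],
    'DevelopmentLag': [1, 2, 1, 1, 2, 1],
    'CumPaidLoss_C': [100, 180, 120, 50, 90, 60],
})

# Default aggfunc is 'mean': each cell averages across companies
avg_triangle = data.pivot_table(values='CumPaidLoss_C',
                                index='AccidentYear',
                                columns='DevelopmentLag')

# aggfunc='sum' totals across companies instead
total_triangle = data.pivot_table(values='CumPaidLoss_C',
                                  index='AccidentYear',
                                  columns='DevelopmentLag',
                                  aggfunc='sum')
print(avg_triangle)
print(total_triangle)
```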

## Filtering

In [None]:
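# Keep rows whose GRNAME contains "Farmers" (na=False treats missing names as non-matches),
# then list the distinct matching names
claims.loc[claims["GRNAME"].str.contains("Farmers", na=False), "GRNAME"].unique()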
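Filtering works by building a boolean Series (a mask) with one `True`/`False` per row and using it to select rows. The sketch below, on the small example claims data, combines two conditions and shows the equivalent `query()` form. Each comparison is wrapped in parentheses because `&` binds more tightly than `>=`.

```python
import pandas as pd

df = pd.DataFrame({
    'ClaimNumber': [1001, 1002, 1003, 1004, 1005],
    'PaidAmount': [8000, 500, 2000, 1000, 0],
    'ClaimType': ['PIP', 'Liab', 'Liab', 'Liab', 'PIP']
})

# A comparison produces a boolean mask, one value per row
is_liab = df['ClaimType'] == 'Liab'

# Combine masks with & (and) or | (or); parenthesize each comparison
big_liab = df[is_liab & (df['PaidAmount'] >= 1000)]

# .loc takes a row mask and a column selection at the same time
big_liab_ids = df.loc[is_liab & (df['PaidAmount'] >= 1000), 'ClaimNumber']

# query() expresses the same filter as a string expression
same = df.query("ClaimType == 'Liab' and PaidAmount >= 1000")

print(big_liab)
```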

## Visualization

In [None]:
%matplotlib inline
# Bar chart of total cumulative paid loss by development lag
claims.groupby("DevelopmentLag")["CumPaidLoss_C"].sum().plot(kind="bar")
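For totals per category, `plot(kind="bar")` draws one bar per index label, whereas `hist()` bins the values themselves and shows how many totals fall into each range. The sketch below uses made-up totals and selects the non-interactive `Agg` backend so it also runs outside a notebook; it renders the chart to an in-memory PNG, though passing a filename to `savefig` works the same way.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs outside a notebook
import pandas as pd

# Made-up totals of cumulative paid loss by development lag
paid_by_lag = pd.Series([150, 270, 330], index=[1, 2, 3], name='CumPaidLoss_C')
paid_by_lag.index.name = 'DevelopmentLag'

# One bar per development lag, with the total as the bar height
ax = paid_by_lag.plot(kind='bar', title='Cumulative paid loss by development lag')
ax.set_ylabel('CumPaidLoss_C')

# Render to an in-memory PNG (a filename such as 'paid_by_lag.png' also works)
buf = io.BytesIO()
ax.figure.savefig(buf, format='png')
```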