# Introduction to Pandas

Pandas is a package built on top of NumPy that provides an implementation of a DataFrame. A DataFrame is essentially a two-dimensional array with attached row and column labels, whose columns can hold different data types. Besides offering a convenient storage interface for labeled data, Pandas implements a number of powerful operations for data wrangling.

In [1]:
import pandas as pd
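
To make the "array with labels" idea concrete, here is a minimal sketch that wraps a small NumPy array in a DataFrame. The row labels `claim_a` and `claim_b` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A plain 2-D NumPy array: values only, no labels
values = np.array([[8000, 500],
                   [500, 2000]])

# Wrapping it in a DataFrame attaches row and column labels
# (the row labels here are hypothetical)
df = pd.DataFrame(values,
                  index=['claim_a', 'claim_b'],
                  columns=['PaidAmount', 'CaseReserve'])

# Look up a value by label instead of by position
paid_a = df.loc['claim_a', 'PaidAmount']

# The underlying values can still be retrieved as a NumPy array
as_array = df.to_numpy()
```

Here `paid_a` holds the same value as `values[0, 0]`; the labels only change how the data is addressed, not what is stored.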

### Creating a DataFrame

In [2]:
claims = pd.DataFrame({
    'ClaimNumber': [1001, 1002, 1003, 1004, 1005],
    'PaidAmount': [8000, 500, 2000, 1000, 0],
    'CaseReserve': [500, 2000, 1000, 0, 22000],
    'ClaimType': ['PIP', 'Liab', 'Liab', 'Liab', 'PIP']
})

# show the first rows (head() returns five by default)
claims.head()

   ClaimNumber  PaidAmount  CaseReserve  ClaimType
0         1001        8000          500        PIP
1         1002         500         2000       Liab
2         1003        2000         1000       Liab
3         1004        1000            0       Liab
4         1005           0        22000        PIP


### Loading Data

In [ ]:
# read a CSV file into a DataFrame
pd.read_csv("../inputs/claims.csv")
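
Since the CSV file itself is not shown here, the following sketch uses an in-memory string as a stand-in for a file on disk. The two rows and the column layout are assumptions that mirror the claims table built earlier, not the real contents of `claims.csv`.

```python
import io
import pandas as pd

# Stand-in for a file on disk (assumed layout, not the real claims.csv)
csv_text = """ClaimNumber,PaidAmount,CaseReserve,ClaimType
1001,8000,500,PIP
1002,500,2000,Liab
"""

# read_csv accepts a file path or any file-like object
loaded = pd.read_csv(io.StringIO(csv_text))

# Optionally use one of the columns as the row index
indexed = pd.read_csv(io.StringIO(csv_text), index_col='ClaimNumber')
```

In a notebook, assigning the result to a variable (as with `loaded` above) keeps the data available for later cells instead of only displaying it.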

### Data Selection

In [3]:
# select a column by name
claims['PaidAmount']

0    8000
1     500
2    2000
3    1000
4       0
Name: PaidAmount, dtype: int64

In [4]:
# select the last column by integer position
claims.iloc[:, -1]

0     PIP
1    Liab
2    Liab
3    Liab
4     PIP
Name: ClaimType, dtype: object

In [5]:
# select rows with a boolean mask
claims[claims["ClaimType"] == "PIP"]

   ClaimNumber  PaidAmount  CaseReserve  ClaimType
0         1001        8000          500        PIP
4         1005           0        22000        PIP
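
The selection techniques above can also be combined. The sketch below rebuilds the same claims table so that it runs on its own, then filters on two conditions and uses `.loc` to pick rows and columns in one step. The variable names `open_liab` and `pip_paid` are illustrative.

```python
import pandas as pd

claims = pd.DataFrame({
    'ClaimNumber': [1001, 1002, 1003, 1004, 1005],
    'PaidAmount': [8000, 500, 2000, 1000, 0],
    'CaseReserve': [500, 2000, 1000, 0, 22000],
    'ClaimType': ['PIP', 'Liab', 'Liab', 'Liab', 'PIP']
})

# Combine conditions with & (and) or | (or);
# each condition needs its own parentheses
open_liab = claims[(claims['ClaimType'] == 'Liab') & (claims['CaseReserve'] > 0)]

# .loc takes a row selector and a column selector, both label-based
pip_paid = claims.loc[claims['ClaimType'] == 'PIP', ['ClaimNumber', 'PaidAmount']]
```

Note that Python's `and` and `or` keywords do not work element-wise on Series, which is why the bitwise operators are used here.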


### Creating New Data

In [6]:
claims['IncurredAmount'] = claims['PaidAmount'] + claims['CaseReserve']
claims.head()

   ClaimNumber  PaidAmount  CaseReserve  ClaimType  IncurredAmount
0         1001        8000          500        PIP            8500
1         1002         500         2000       Liab            2500
2         1003        2000         1000       Liab            3000
3         1004        1000            0       Liab            1000
4         1005           0        22000        PIP           22000
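
Column assignment as shown above modifies the DataFrame in place. As an alternative sketch, `assign` returns a new DataFrame and leaves the original untouched. The `IsOpen` flag is a hypothetical column added only to illustrate a derived boolean column.

```python
import pandas as pd

claims = pd.DataFrame({
    'ClaimNumber': [1001, 1002, 1003, 1004, 1005],
    'PaidAmount': [8000, 500, 2000, 1000, 0],
    'CaseReserve': [500, 2000, 1000, 0, 22000],
    'ClaimType': ['PIP', 'Liab', 'Liab', 'Liab', 'PIP']
})

# assign returns a copy with the extra column; claims itself is unchanged
with_incurred = claims.assign(
    IncurredAmount=claims['PaidAmount'] + claims['CaseReserve']
)

# A comparison yields a boolean column
# (IsOpen is a hypothetical name: True where a reserve is still held)
with_incurred['IsOpen'] = with_incurred['CaseReserve'] > 0
```

Which style to use is largely a matter of preference; `assign` is convenient when chaining several operations together.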


### Aggregation

In [7]:
claims['PaidAmount'].sum()

11500

In [8]:
claims['PaidAmount'].count()

5

In [9]:
claims['PaidAmount'].mean()

2300.0
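
The aggregations above summarize the whole column. A common next step is to aggregate per group, which the following sketch does with `groupby` on the same claims data (rebuilt here so the block runs on its own). The variable names are illustrative.

```python
import pandas as pd

claims = pd.DataFrame({
    'ClaimNumber': [1001, 1002, 1003, 1004, 1005],
    'PaidAmount': [8000, 500, 2000, 1000, 0],
    'CaseReserve': [500, 2000, 1000, 0, 22000],
    'ClaimType': ['PIP', 'Liab', 'Liab', 'Liab', 'PIP']
})

# Total paid amount per claim type; the result is a Series indexed by ClaimType
paid_by_type = claims.groupby('ClaimType')['PaidAmount'].sum()

# Several aggregations at once; the result is a DataFrame
# with one column per aggregation
summary = claims.groupby('ClaimType')['PaidAmount'].agg(['sum', 'count', 'mean'])
```

For this data, `paid_by_type` gives 3500 for `Liab` and 8000 for `PIP`, which together add up to the overall sum of 11500 shown earlier.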