[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gdsaxton/GDAN5400/blob/main/Week%206%20Notebooks/GDAN%205400%20-%20Week%206%20Notebooks%20%28VII%29%20-%20Pie%20Graphs.ipynb)

This notebook provides a mini-tutorial on understanding and using pie charts in Python.

In [None]:
%%time
import datetime
print("Current date and time:", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')

# Load Packages and Set Working Directory
Import several necessary Python packages. We will be using the <a href="http://pandas.pydata.org/">Python Data Analysis Library,</a> or <i>PANDAS</i>, extensively for our data manipulations in this and future tutorials.

In [None]:
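import numpy as np
import pandas as pd
from pandas import DataFrame
from pandas import Series

# matplotlib is needed for the plt.* plotting calls in the examples below
import matplotlib.pyplot as plt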

<br>
PANDAS allows you to set various options for, among other things, inspecting the data. I like to be able to see all of the columns. Therefore, I typically include these lines at the top of all my notebooks.

In [None]:
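# http://pandas.pydata.org/pandas-docs/stable/options.html
pd.set_option('display.max_columns', None)        # show every column
pd.set_option('display.max_colwidth', 250)        # show up to 250 characters per cell
pd.set_option('display.max_info_columns', 500)    # let df.info() list up to 500 columns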

# Read in Data

In [None]:
import requests

# NOTE: replace `https://github.com/` with `https://raw.githubusercontent.com/` and drop `/blob` from the path
# https://github.com/gdsaxton/GDAN5400/blob/main/Coding%20Assignment%201/final_insurance_fraud.xlsx
url = 'https://raw.githubusercontent.com/gdsaxton/GDAN5400/main/Coding%20Assignment%203/final_insurance_fraud.xlsx'

# Download the file and stop early if the request failed
response = requests.get(url)
response.raise_for_status()
with open('final_insurance_fraud.xlsx', 'wb') as f:
    f.write(response.content)

# Load the Excel file
df = pd.read_excel('final_insurance_fraud.xlsx', engine='openpyxl')

print('# of rows:', len(df), '\n')

df.head(2)

# **Mini-Tutorial: Understanding Pie Charts**

## **What is a Pie Chart?**
A **pie chart** is a circular statistical graphic divided into slices, where each slice represents a **proportion** of a whole. It is often used to show **percentage distributions** across different categories.

## **Why Use Pie Charts in Data Analytics?**
- **Easily communicate proportions** – Helps visualize how different parts contribute to a whole.
- **Compare category sizes at a glance** – Shows which categories dominate and which are smaller.
- **Ideal for percentage-based insights** – Useful when working with data where the sum of all categories equals 100%.
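
As a minimal sketch of the "sums to 100%" idea, the snippet below turns raw category counts into percentage shares with `value_counts(normalize=True)`. The `claim_types` Series holds made-up values for illustration only; it is not taken from the course dataset.

```python
import pandas as pd

# Hypothetical claim types, for illustration only (not from the course dataset)
claim_types = pd.Series(
    ["Hail", "Wind", "Hail", "Water", "Hail", "Wind", "Hail", "Water", "Hail", "Wind"]
)

# normalize=True returns each category's share of the total instead of raw counts
shares = claim_types.value_counts(normalize=True) * 100

print(shares.round(1))
print("Total:", round(shares.sum(), 1))
```

These shares are what a pie chart draws as slice sizes, so checking them first is a quick way to confirm the categories really do add up to a whole.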

## **When to Use Pie Charts in Insurance & Accounting Analytics**
Pie charts are particularly useful in **insurance and financial analytics** to:
- **Visualize the distribution of claim types** – e.g., hail damage vs. wind damage vs. water damage.
- **Show the proportion of paid vs. unpaid invoices** – Helps identify outstanding receivables.
- **Analyze customer segmentation** – See what percentage of policies belong to different risk levels.
- **Break down expenses or revenue sources** – e.g., distribution of different cost categories in an insurance claim.

## **Key Features of Pie Charts**
1. **Each slice represents a proportion**  
   - The total sum of the slices equals **100%**.
   - Each category's size is determined by its relative frequency or total value.

2. **Best for Small Category Counts**  
   - Works well when **there are only a few categories** (e.g., 3-6).
   - Not ideal for **datasets with too many categories**, as slices can become difficult to distinguish.

3. **Useful for Quick Comparisons**  
   - Helps identify **dominant categories** at a glance.
   - Makes it easy to communicate **relative importance** of different groups.

By using **pie charts correctly**, analysts can quickly understand category proportions and communicate key insights visually.
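
One common way to keep the slice count small is to fold the smallest categories into a single "Other" slice before plotting. The sketch below assumes a hypothetical helper, `lump_small_slices`, and made-up counts; neither comes from the course dataset, and the `max_slices` threshold is an arbitrary choice.

```python
import pandas as pd

# Hypothetical claim counts per category (illustrative values only)
counts = pd.Series(
    {"Hail": 120, "Wind": 80, "Water": 45, "Fire": 9, "Theft": 4, "Vandalism": 2}
)


def lump_small_slices(counts, max_slices=4):
    """Keep the largest (max_slices - 1) categories and sum the rest into 'Other'."""
    ordered = counts.sort_values(ascending=False)
    if len(ordered) <= max_slices:
        return ordered
    top = ordered.iloc[: max_slices - 1]
    other_total = ordered.iloc[max_slices - 1 :].sum()
    return pd.concat([top, pd.Series({"Other": other_total})])


lumped = lump_small_slices(counts, max_slices=4)
print(lumped)
```

The resulting Series can be passed to `plt.pie` in the same way as any `value_counts()` result. The total is unchanged, so the slices still represent 100% of the claims.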


## **Example 1: Pie Chart of Claims by Adjuster**
#### **Purpose**
This pie chart visualizes the **distribution of claims assigned to different adjusters**, helping to analyze workload distribution.

In [None]:
# Count the number of claims per adjuster
claims_by_adjuster = df['Adjuster'].value_counts()

# Create a pie chart
plt.figure(figsize=(8, 8))
plt.pie(
    claims_by_adjuster,
    labels=claims_by_adjuster.index,
    autopct='%1.1f%%',  # label each slice with its percentage
    colors=['skyblue', 'lightcoral', 'gold', 'lightgreen'],
)
plt.title("Proportion of Claims by Adjuster")
plt.show()

#### **Key Insight**
- Adjusters with significantly higher claim volumes may be handling more complex cases or working in regions that generate more insurance claims.

## **Example 2: Pie Chart of Claim Types**
#### **Purpose**
This pie chart visualizes **the proportion of different claim types in the dataset**, allowing insurers to see which types of damage occur most frequently.

In [None]:
# Count the number of claims by type
claim_counts = df['How did the loss or damage happen?'].value_counts()

# Create the pie chart
plt.figure(figsize=(8,8))
plt.pie(claim_counts, labels=claim_counts.index, autopct='%1.1f%%', colors=['skyblue', 'lightcoral', 'gold', 'lightgreen', 'orange'], startangle=140)
plt.title("Distribution of Claim Types")
plt.show()

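<br>There is **no variation!** Every claim has the same value in this column, so the pie chart collapses into a single slice at 100%. This is a good example of the importance of getting to know your data before you chart it.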
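
A quick check before plotting can catch this kind of column. The sketch below is one possible approach: `has_variation` is a hypothetical helper, and `sample` is a made-up stand-in for the course DataFrame whose values (including "Hail") are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the course DataFrame; the real column values may differ
sample = pd.DataFrame({
    "How did the loss or damage happen?": ["Hail"] * 5,
    "Adjuster": ["A", "B", "A", "C", "B"],
})


def has_variation(frame, column):
    """Return True when the column holds more than one distinct non-missing value."""
    return frame[column].nunique(dropna=True) > 1


for col in sample.columns:
    verdict = "OK to chart" if has_variation(sample, col) else "single value, skip the pie chart"
    print(col, "->", verdict)
```

Running `value_counts()` on a column and reading the result before calling `plt.pie` serves the same purpose.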

## **Conclusions on Pie Charts**
Pie charts are a simple yet effective way to visualize **proportions** in a dataset. They help in **quickly understanding category distributions** and are especially useful in **insurance and accounting analytics** to:
- **Highlight dominant categories** – Easily see which claim types, payment statuses, or financial categories take up the largest share.
- **Compare relative proportions** – Useful for understanding how different elements contribute to the whole (e.g., paid vs. unpaid claims).
- **Communicate insights effectively** – Provides an intuitive way for stakeholders to grasp key data points at a glance.

### **When to Use Pie Charts**
- Best for datasets with **a few distinct categories** (3-6).
- Works well when analyzing **percentage-based distributions**.
- Useful when **visualizing parts of a whole**, such as revenue sources, expense categories, or claim types.

### **Limitations of Pie Charts**
- **Not effective for large numbers of categories** – Too many slices make it difficult to interpret.
- **Difficult to compare precise values** – A bar chart may be better for detailed comparisons.
- **Cannot show trends over time** – A line or bar chart would be more appropriate.
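
To illustrate the first two limitations, the sketch below draws a sorted horizontal bar chart as an alternative when there are many categories. The counts are made up for illustration and do not come from the course dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts across ten categories (illustrative values only)
counts = pd.Series({
    "Hail": 120, "Wind": 80, "Water": 45, "Fire": 30, "Theft": 22,
    "Vandalism": 15, "Freezing": 12, "Lightning": 9, "Vehicle impact": 6, "Other": 4,
})

# Sort ascending so the largest category ends up at the top of the chart
ordered = counts.sort_values()

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(ordered.index, ordered.values, color="skyblue")
ax.set_xlabel("Number of claims")
ax.set_title("Claims by Category (hypothetical data)")
plt.tight_layout()
plt.show()
```

With ten slices a pie chart would be hard to read, while the bar lengths here can be compared directly against the shared axis.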

By using pie charts appropriately, analysts can effectively **communicate key distributions** and **help stakeholders make informed decisions** based on visualized proportions.
