# TSA Claims Analysis

## by Justin Sierchio

In this analysis, we look at claims filed against the US Transportation Security Administration (TSA) from 2002 to 2015. We aim to answer the following questions:

<ul>
    <li>Which airports have the most claims?</li>
    <li>What types of claims are most common?</li>
    <li>Are there certain times of the year when incidents are more likely to occur?</li>
</ul>
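
Each of these questions maps onto a simple pandas aggregation. The sketch below is a minimal illustration, not the notebook's final method: it runs on a few hypothetical rows (made-up values) that reuse the Kaggle CSV's column names `Airport Code`, `Claim Type`, and `Incident Date`. In the notebook, `claims` would be replaced by `df_TSA`. Note that grouping by incident month counts *claims* per month, which is only a proxy for when incidents actually occur.

```python
import pandas as pd

# Hypothetical toy rows (made-up values); in the notebook, use df_TSA instead of `claims`.
claims = pd.DataFrame({
    "Airport Code": ["EWR", "SEA", "EWR", "MIA", "EWR"],
    "Claim Type": ["Property Damage", "Property Damage", "Passenger Property Loss",
                   "Property Damage", "Passenger Property Loss"],
    "Incident Date": ["12/12/2002 0:00", "1/16/2004 0:00", "12/3/2003 0:00",
                      "1/6/2003 0:00", "7/4/2004 0:00"],
})

# Q1: airports ranked by number of claims (value_counts sorts in descending order)
claims_by_airport = claims["Airport Code"].value_counts()

# Q2: claim types ranked by frequency
claims_by_type = claims["Claim Type"].value_counts()

# Q3: claims per calendar month (1 = January); unparseable dates become NaT,
# and value_counts drops them by default
incident_dates = pd.to_datetime(claims["Incident Date"], format="%m/%d/%Y %H:%M", errors="coerce")
claims_by_month = incident_dates.dt.month.value_counts().sort_index()

print(claims_by_airport)
print(claims_by_month)
```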

The data is a CSV file from Kaggle: https://www.kaggle.com/terminal-security-agency/tsa-claims-database/download. More information about the dataset is available at: https://www.kaggle.com/terminal-security-agency/tsa-claims-database.

## Notebook Initialization

```python
# Import relevant libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

print('Initial libraries loaded into workspace!')
```

```
Initial libraries loaded into workspace!
```


```python
# Load the TSA claims dataset (CSV downloaded from Kaggle)
df_TSA = pd.read_csv("tsa_claims.csv")

print('Dataset loaded!')
```

```
Dataset loaded!
```


```python
# Display the first 5 rows of the TSA claims dataset
df_TSA.head()
```

|   | Claim Number | Date Received | Incident Date | Airport Code | Airport Name | Airline Name | Claim Type | Claim Site | Item | Claim Amount | Status | Close Amount | Disposition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0909802M | 4-Jan-02 | 12/12/2002 0:00 | EWR | Newark International Airport | Continental Airlines | Property Damage | Checkpoint | Other | $350.00 | Approved | $350.00 | Approve in Full |
| 1 | 0202417M | 2-Feb-02 | 1/16/2004 0:00 | SEA | Seattle-Tacoma International | NaN | Property Damage | Checked Baggage | Luggage (all types including footlockers) | $100.00 | Settled | $50.00 | Settle |
| 2 | 0202445M | 4-Feb-02 | 11/26/2003 0:00 | STL | Lambert St. Louis International | American Airlines | Property Damage | Checked Baggage | Cell Phones | $278.88 | Settled | $227.92 | Settle |
| 3 | 0909816M | 7-Feb-02 | 1/6/2003 0:00 | MIA | Miami International Airport | American Airlines | Property Damage | Checkpoint | Luggage (all types including footlockers) | $50.00 | Approved | $50.00 | Approve in Full |
| 4 | 2005032379513 | 18-Feb-02 | 2/5/2005 0:00 | MCO | Orlando International Airport | Delta (Song) | Property Damage | Checkpoint | Baby - Strollers; car seats; playpen; etc. | $84.79 | Approved | $84.79 | Approve in Full |
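
The `Claim Amount` and `Close Amount` values above are text such as `$350.00`, so they cannot be summed or averaged directly. A minimal sketch of one way to convert them, assuming amounts only ever contain a dollar sign, thousands separators, and whitespace around the number (the helper name `parse_dollars` is hypothetical, not part of pandas):

```python
import pandas as pd

def parse_dollars(series: pd.Series) -> pd.Series:
    """Hypothetical helper: turn strings like '$1,234.56' into floats.

    Assumes only '$', ',' and whitespace need stripping; anything that still
    fails to parse (or is missing) becomes NaN via errors="coerce".
    """
    cleaned = series.str.replace(r"[$,\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")

# Toy values: two from the table format above, plus a missing and a malformed entry
amounts = parse_dollars(pd.Series(["$350.00", "$1,200.50", None, "-"]))
print(amounts)
```

In the notebook this would be applied column by column, e.g. `df_TSA["Claim Amount"] = parse_dollars(df_TSA["Claim Amount"])`. Coercing bad values to NaN keeps the conversion from failing on a single odd entry, at the cost of silently hiding it, so it is worth counting the new NaNs afterwards.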


Let's describe what each of the columns in this dataset means.

<ul>
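    <li>Claim Number: Identifier assigned to the claim</li>
    <li>Date Received: Date TSA received the claim, formatted D-Mon-YY (e.g., 4-Jan-02)</li>
    <li>Incident Date: Date and time of the incident, formatted M/D/YYYY H:MM (e.g., 12/12/2002 0:00)</li>
    <li>Airport Code: Three-letter FAA code for the US domestic airport</li>
    <li>Airport Name: Full name of the US domestic airport</li>
    <li>Airline Name: Name of the airline (may be blank)</li>
    <li>Claim Type: Category of the claim (e.g., Property Damage)</li>
    <li>Claim Site: Where the incident occurred (e.g., Checkpoint, Checked Baggage)</li>
    <li>Item: Type of item for which the claim was made</li>
    <li>Claim Amount: Dollar amount claimed, stored as text (e.g., $350.00)</li>
    <li>Status: Current adjudication status of the claim (e.g., Approved, Settled)</li>
    <li>Close Amount: Dollar amount paid to close the claim</li>
    <li>Disposition: Final outcome of the claim (e.g., Approve in Full, Settle)</li>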
</ul>
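
Because the two date columns use different formats, each needs its own format string when parsed. The sketch below assumes the formats listed above hold for every row and that month abbreviations are in English; rows that do not match become `NaT`. It also adds a hypothetical `received_before_incident` flag: in the first rows shown, several claims appear to be received *before* the incident date (e.g., received 4-Jan-02 for an incident on 12/12/2002), which may point to a data-entry issue worth checking.

```python
import pandas as pd

RECEIVED_FMT = "%d-%b-%y"        # e.g. "4-Jan-02"
INCIDENT_FMT = "%m/%d/%Y %H:%M"  # e.g. "12/12/2002 0:00"

def parse_claim_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with both date columns parsed; mismatched values become NaT."""
    out = df.copy()
    out["Date Received"] = pd.to_datetime(out["Date Received"], format=RECEIVED_FMT, errors="coerce")
    out["Incident Date"] = pd.to_datetime(out["Incident Date"], format=INCIDENT_FMT, errors="coerce")
    # Hypothetical data-quality flag; comparisons involving NaT evaluate to False
    out["received_before_incident"] = out["Date Received"] < out["Incident Date"]
    return out

# First two rows copied from df_TSA.head(); the last two rows are hypothetical
sample = pd.DataFrame({
    "Date Received": ["4-Jan-02", "2-Feb-02", "15-Mar-03", "not a date"],
    "Incident Date": ["12/12/2002 0:00", "1/16/2004 0:00", "3/1/2003 0:00", "n/a"],
})
parsed = parse_claim_dates(sample)
print(parsed)
```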

## Data Cleaning

First, let's get a sense of the quality of the dataset.

```python
# Characteristics of the TSA claims dataset
df_TSA.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 204267 entries, 0 to 204266
Data columns (total 13 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   Claim Number   204267 non-null  object
 1   Date Received  204004 non-null  object
 2   Incident Date  202084 non-null  object
 3   Airport Code   195743 non-null  object
 4   Airport Name   195743 non-null  object
 5   Airline Name   169893 non-null  object
 6   Claim Type     196354 non-null  object
 7   Claim Site     203527 non-null  object
 8   Item           200301 non-null  object
 9   Claim Amount   200224 non-null  object
 10  Status         204262 non-null  object
 11  Close Amount   135315 non-null  object
 12  Disposition    131359 non-null  object
dtypes: object(13)
memory usage: 20.3+ MB
```
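
The non-null counts above translate directly into missing-value rates: for example, `Close Amount` has 204267 - 135315 = 68952 missing values (about 33.8%), `Disposition` about 35.7%, and `Airline Name` about 16.8%. A minimal sketch of a helper that tabulates this for every column (the name `missing_summary` is hypothetical), demonstrated on made-up toy rows rather than the real data:

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing count and percentage per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "pct_missing": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)

# Made-up toy rows using three of the dataset's column names
toy = pd.DataFrame({
    "Status": ["Approved", "Settled", "Denied", "Approved"],
    "Close Amount": ["$350.00", None, None, "$50.00"],
    "Airline Name": [None, "Delta (Song)", "American Airlines", "Continental Airlines"],
})
summary = missing_summary(toy)
print(summary)
```

On the full dataset this would be called as `missing_summary(df_TSA)`. Sorting by the missing count puts the columns that most limit later analysis (here, anything built on `Close Amount` or `Disposition`) at the top.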
