# Food Claims Process
## Data Analyst Associate Practical Exam Submission by Jessica Bohannon



![ ](food_cover.png)

## Company Background

Vivendo is a fast food chain in Brazil with over 200 outlets. Customers often file claims for compensation from the company due to food poisoning. These claims are processed by the legal team, which has offices in four locations. The legal team wants to improve how long it takes to reply to customers and close claims. The head of the legal department has requested a report on how each location differs in the time it takes to close claims.

## Customer Question

The legal team wants to understand how the number of claims and average time to close claims differ across locations. Specifically, they would like to know:
- How does the number of claims differ across locations?
- What is the distribution of time to close claims?
- How does the average time to close claims differ by location?

## Dataset

The dataset contains one row for each claim. The dataset can be downloaded from [here](https://s3.amazonaws.com/talent-assets.datacamp.com/food_claims_2212.csv). The dataset needs to be validated and cleaned based on the description in the table below:

![ ](foodclaimstable.png)

## Data Validation and Cleaning



The original dataset has 2000 rows and 8 columns. While there are no duplicates, there are some missing values and inconsistencies that need to be cleaned.

I inspected each column against the description in the table above and cleaned it where needed, as follows:

> `claim_id`: Values are as expected, integers with a range from 1 to 2000. There are no missing values and no cleaning was needed.
>
> `time_to_close`: Values are as expected, integers with a range from 76 to 518 indicating number of days. There are no missing values and no cleaning was needed.
>
> `claim_amount`: Unwanted currency characters “R$ ” appear before the amount. I removed these characters and converted the values to numeric with 2 decimal places. I then checked the range, 1637.94 to 76106.80, and confirmed there were no missing values. 
>
> `amount_paid`: Values are numeric and range from 1516.72 to 52498.75, but decimal places vary from 0 to 2, which I corrected to 2 decimal places for consistency. There were 36 missing values in the column, which I replaced with the overall median amount for the field.
>
> `location`: Values are as expected, characters indicating the location of the claim, one of “RECIFE”, “SAO LUIS”, “FORTALEZA”, or “NATAL”. There are no missing values and no cleaning was needed.
>
> `individuals_on_claim`: Values are as expected, integers with a range from 1 to 15. There are no missing values and no cleaning was needed.
>
> `linked_cases`: Values are as expected, “TRUE” or “FALSE”, other than 26 missing values, which I replaced with “FALSE”.
>
> `cause`: There were 16 rows with the value “VEGETABLES”, which I corrected to “vegetable”. All other cells correctly recorded “vegetable”, “meat”, or “unknown”. There were no missing values.

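The cleaning steps listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not necessarily the code used for this report. The function name `clean_claims` and the `MISSING` set are hypothetical, and it assumes that missing values arrive as empty strings or `NA`.

```python
import statistics

MISSING = {"", "NA"}  # assumed encodings of a missing value in the raw CSV


def clean_claims(rows):
    """Return cleaned copies of claim rows (dicts of raw strings keyed by column name)."""
    # Median of the amounts actually present; used to fill the gaps.
    present = [float(r["amount_paid"]) for r in rows if r["amount_paid"] not in MISSING]
    median_paid = round(statistics.median(present), 2)

    cleaned = []
    for r in rows:
        row = dict(r)
        # "R$ 1637.94" -> 1637.94
        row["claim_amount"] = round(float(r["claim_amount"].replace("R$", "").strip()), 2)
        # Missing amount_paid -> overall median; present values rounded to 2 decimals.
        raw_paid = r["amount_paid"]
        row["amount_paid"] = median_paid if raw_paid in MISSING else round(float(raw_paid), 2)
        # Missing linked_cases -> "FALSE".
        row["linked_cases"] = "FALSE" if r["linked_cases"] in MISSING else r["linked_cases"]
        # "VEGETABLES" -> "vegetable"; other values are left untouched.
        row["cause"] = "vegetable" if r["cause"] == "VEGETABLES" else r["cause"]
        cleaned.append(row)
    return cleaned
```

Rows in this shape can be read with `csv.DictReader`, which yields one dictionary of strings per CSV row.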

## How does the number of claims differ across locations?

The legal team responsible for processing claims has offices in Recife, Sao Luis, Fortaleza, and Natal. Recife has processed the most claims by a significant margin, making it the most productive office by volume. Sao Luis processed slightly more than the average of 500 claims per office (2,000 claims across four offices), whereas Fortaleza and Natal processed the fewest claims.

![ ](claims_per_location.png)
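
The comparison of each office against the per-office average can be reproduced with a simple count. The sketch below is illustrative only. `claims_per_location` is a hypothetical helper, and it assumes each row is a dictionary with a `location` key.

```python
from collections import Counter


def claims_per_location(rows):
    """Return (location, claim count, difference from the per-office mean), largest first."""
    counts = Counter(r["location"] for r in rows)
    mean_per_office = sum(counts.values()) / len(counts)
    # most_common() orders offices from most to fewest claims.
    return [(loc, n, n - mean_per_office) for loc, n in counts.most_common()]
```

With 2000 claims and four offices, the mean used here is the 500 claims per office referred to above.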

## What is the distribution of time to close claims?

Most claims are resolved within 175-199 days, with an average resolution time of 111 days. A small number of outlier claims take considerably longer. Additional analysis of these claims could identify what extends their closure time and support targeted strategies to streamline the claims process and reduce resolution times overall.

![ ](distribution.png)
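
The summary figures behind a histogram like this one can be checked directly. The following is a rough sketch, assuming 25-day bins aligned to multiples of 25, which is what a 175-199 bucket suggests. The actual bin width of the chart is not stated, and `summarize_time_to_close` is a hypothetical name.

```python
import statistics
from collections import Counter


def summarize_time_to_close(days, bin_width=25):
    """Return the mean, median, and most populated bin of a list of day counts."""
    # Map each value to the start of its bin, e.g. 185 -> 175 for 25-day bins.
    bins = Counter((d // bin_width) * bin_width for d in days)
    start, count = max(bins.items(), key=lambda kv: kv[1])
    return {
        "mean": statistics.mean(days),
        "median": statistics.median(days),
        "modal_bin": (start, start + bin_width - 1),
        "modal_count": count,
    }
```

Comparing the mean with the median and the modal bin is a quick way to see how strongly the long right tail of outliers pulls the average.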

## How does the average time to close claims differ by location?

Recife had the fastest average time to close claims, followed by Fortaleza, then Natal and Sao Luis. The averages differed by fewer than three days, however, so average performance was similar across locations.

![ ](close_bar.png)
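
A per-location average of this kind can be computed as follows. This is a minimal sketch, assuming each row is a dictionary with `location` and `time_to_close` keys. `mean_time_by_location` is a hypothetical helper.

```python
import statistics
from collections import defaultdict


def mean_time_by_location(rows):
    """Return {location: mean days to close}, ordered from fastest to slowest."""
    by_location = defaultdict(list)
    for r in rows:
        by_location[r["location"]].append(int(r["time_to_close"]))
    means = {loc: statistics.mean(days) for loc, days in by_location.items()}
    return dict(sorted(means.items(), key=lambda kv: kv[1]))
```

The gap between the largest and smallest value in the result gives the spread between offices.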

Analyzing the spread of time to close claims for each location revealed that Recife's median, upper, and lower quartiles of time to close were all less than or equal to those of the other locations.

Also note that Sao Luis had the widest range of time to close. Two extreme outliers, claims 827 and 378, had high claim amounts with many individuals on the claims, which likely contributed to the longer times to close.


![ ](close_box.png)


When focusing solely on cases that were closed within a year to exclude rare cases, Recife continues to exhibit superior performance. Recife's median, upper, and lower quartiles remain less than or equal to those of the other locations.

![ ](close_box_year.png)
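
The quartiles shown in the box plots can be tabulated per location, with or without the one-year cutoff. The sketch below makes two assumptions: "within a year" is taken to mean `time_to_close <= 365`, and quartiles use the inclusive method of `statistics.quantiles`, which may differ slightly from the convention of the plotting tool. `quartiles_by_location` is a hypothetical helper.

```python
import statistics
from collections import defaultdict


def quartiles_by_location(rows, max_days=None):
    """Return {location: (lower quartile, median, upper quartile)} of time to close."""
    by_location = defaultdict(list)
    for r in rows:
        days = int(r["time_to_close"])
        # Optionally drop rare long-running claims, e.g. max_days=365.
        if max_days is None or days <= max_days:
            by_location[r["location"]].append(days)
    result = {}
    for loc, days in by_location.items():
        if len(days) >= 2:  # skip groups too small to compute quartiles for
            q1, q2, q3 = statistics.quantiles(days, n=4, method="inclusive")
            result[loc] = (q1, q2, q3)
    return result
```

Calling it once with `max_days=None` and once with `max_days=365` gives the two views compared above.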

## Recommendations

The Recife legal office location warrants further analysis based on its exceptional performance in resolving the highest number of cases and achieving the fastest time to close compared to other locations. Examining the practices and team structure at this location could provide valuable insights for enhancing the productivity and efficiency of other locations. 

Additionally, it would be worthwhile to investigate cases with longer times to close in order to identify any potential correlations with factors such as the number of claimants or claim amount. By investigating these outliers, targeted strategies can be developed to optimize the claims process and decrease overall resolution times.
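
As a starting point for that investigation, the correlation between time to close and each candidate factor could be computed as sketched below. This is only an illustration of the suggested follow-up, not an analysis that was carried out for this report. `pearson` and `correlations_with_time` are hypothetical helpers, and the rows are assumed to be already cleaned.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlations_with_time(rows, factors=("individuals_on_claim", "claim_amount")):
    """Return {factor: correlation with time_to_close} for each numeric factor."""
    days = [float(r["time_to_close"]) for r in rows]
    return {f: pearson([float(r[f]) for r in rows], days) for f in factors}
```

A coefficient near zero would suggest looking at other explanations for slow claims, while a strong positive value would support prioritising large or multi-claimant cases.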