## EDA Group 10

## Table of Contents
1. [Introduction](#Introduction)
2. [Description of the Data](#Description-of-the-Data)
3. [Discussion of Missing Data](#Discussion-of-Missing-Data)
4. [Exploratory Visualizations and/or Summary Tables](#Exploratory-Visualizations-and-Summary-Tables)
5. [Results](#Results)
6. [Professional Use of Notebook](#Professional-Use-of-Notebook)


## Introduction <a id="Introduction"></a>

Our goal for this project is to give Swire Coca-Cola a solution that reduces unplanned maintenance events and improves productivity. At this stage, our target variable is whether a maintenance event is planned or unplanned. Once we understand the data better, we will develop a model that helps Swire Coca-Cola predict upcoming events and put protocols in place to avoid unplanned maintenance.

The purpose of this notebook is to document the exploratory phase of the analytics process. In this initial analysis, we aim to better understand the relationships between variables, which will guide our model selection and address the objectives outlined in the business problem. During this EDA, we will also examine the structure of the data, identify outliers, and visualize our findings.

The first question is whether certain equipment is more likely to fail unexpectedly. We will examine the relationship between equipment ID and maintenance activity type, which will let us focus further analysis on the equipment IDs with the highest likelihood of unplanned failures.
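A minimal sketch of how the equipment question could be checked, using a small toy table rather than the real extract. The column name `EQUIPMENT_ID`, the labels `'Planned'` and `'Unplanned'`, and the helper `unplanned_share_by` are assumptions made for illustration; only `MAINTENANCE_ACTIVITY_TYPE` is taken from the notebook code.

```python
import pandas as pd

# Toy stand-in for the work-order extract; values are illustrative only.
# EQUIPMENT_ID and the 'Planned'/'Unplanned' labels are assumptions.
orders = pd.DataFrame({
    "EQUIPMENT_ID": ["E1", "E1", "E1", "E2", "E2", "E3"],
    "MAINTENANCE_ACTIVITY_TYPE": [
        "Unplanned", "Unplanned", "Planned",
        "Planned", "Planned", "Unplanned",
    ],
})


def unplanned_share_by(df, key, min_orders=1):
    """Hypothetical helper: order count and unplanned share per value of `key`."""
    is_unplanned = df["MAINTENANCE_ACTIVITY_TYPE"].eq("Unplanned")
    summary = (
        df.assign(is_unplanned=is_unplanned)
          .groupby(key)["is_unplanned"]
          .agg(n_orders="size", unplanned_share="mean")
    )
    # Drop groups with too few orders for the share to be meaningful.
    summary = summary[summary["n_orders"] >= min_orders]
    return summary.sort_values("unplanned_share", ascending=False)


equipment_summary = unplanned_share_by(orders, "EQUIPMENT_ID", min_orders=2)
print(equipment_summary)
```

The `min_orders` filter is a design choice: a piece of equipment with a single unplanned order would otherwise show a 100% unplanned share and crowd out equipment with a long history.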

The second question is whether older machines are more likely to experience unplanned maintenance. Using the EQUIP_START_UP_DATE field, we can check whether a machine’s age is correlated with the frequency of unplanned maintenance events. If older equipment does break down more often, scheduling more frequent planned maintenance for older machines could reduce downtime and ultimately save the company money.
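One way the age question might be approached is to compute equipment age at the time of each work order and compare the unplanned share across age bands. This is a sketch on toy rows: `EQUIP_START_UP_DATE` comes from the text above, while `EXECUTION_START_DATE`, the activity labels, and the band edges are assumptions.

```python
import pandas as pd

# Toy rows for illustration; EXECUTION_START_DATE and the labels are assumed.
orders = pd.DataFrame({
    "EQUIP_START_UP_DATE": [
        "2005-03-01", "2005-03-01", "2020-01-01", "2020-01-01", None,
    ],
    "EXECUTION_START_DATE": [
        "2023-03-01", "2024-03-01", "2023-01-01", "2024-01-01", "2024-01-01",
    ],
    "MAINTENANCE_ACTIVITY_TYPE": [
        "Unplanned", "Unplanned", "Planned", "Unplanned", "Planned",
    ],
})

# Unparseable or missing dates become NaT instead of raising.
start_up = pd.to_datetime(orders["EQUIP_START_UP_DATE"], errors="coerce")
executed = pd.to_datetime(orders["EXECUTION_START_DATE"], errors="coerce")

# Age of the equipment when the work order was executed, in years.
orders["AGE_YEARS"] = (executed - start_up).dt.days / 365.25

# Band edges are arbitrary here and should be tuned to the real data.
orders["AGE_BAND"] = pd.cut(orders["AGE_YEARS"], bins=[0, 5, 10, 20, 50])

unplanned_by_age = (
    orders.dropna(subset=["AGE_BAND"])  # rows without a start-up date drop out
          .assign(is_unplanned=lambda d: d["MAINTENANCE_ACTIVITY_TYPE"].eq("Unplanned"))
          .groupby("AGE_BAND", observed=True)["is_unplanned"]
          .mean()
)
print(unplanned_by_age)
```

Rows with a missing start-up date are excluded from the age comparison rather than imputed, so the share of such rows should be reported alongside the result.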

The third question is whether certain plant IDs are more prone to unplanned maintenance. If so, we may be able to identify problem plants and tailor our model to the issues of a specific plant, rather than making it broad enough to handle problems across all plants.
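Because plants can differ greatly in how many work orders they log, comparing shares rather than raw counts may make problem plants easier to spot. The sketch below uses `pd.crosstab` with row normalization on toy data; the plant IDs and the `'Planned'`/`'Unplanned'` labels are assumptions.

```python
import pandas as pd

# Toy data for illustration; plant IDs and activity labels are assumed.
orders = pd.DataFrame({
    "PLANT_ID": ["P1", "P1", "P1", "P1", "P2", "P2"],
    "MAINTENANCE_ACTIVITY_TYPE": [
        "Unplanned", "Unplanned", "Unplanned", "Planned",
        "Planned", "Unplanned",
    ],
})

# normalize="index" turns each plant's row into shares that sum to 1.
plant_mix = pd.crosstab(
    orders["PLANT_ID"],
    orders["MAINTENANCE_ACTIVITY_TYPE"],
    normalize="index",
)
print(plant_mix.round(2))
```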

## Description of the Data <a id="Description-of-the-Data"></a>

The data provided for this project comes from Swire Coca-Cola's internal system, IWC, which tracks equipment breakdowns. The dataset contains over 1.4 million maintenance records with key features such as plant ID, equipment ID, execution date, work duration, and type of maintenance. The case description notes that the "actual working minutes" field is the most reliable variable for determining the time spent on repairs, because the "actual start time" and "actual end time" fields can be inaccurate in certain situations.
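To see how often the timestamps disagree with the recorded working minutes, one could compare the elapsed time between start and end against the minutes field. This is a sketch under assumptions: the column names `ACTUAL_START_TIME`, `ACTUAL_END_TIME`, and `ACTUAL_WORK_IN_MINUTES` are hypothetical stand-ins for the three fields named above, and the 5-minute tolerance is arbitrary.

```python
import pandas as pd

# Toy rows; the column names are hypothetical stand-ins for the real fields.
orders = pd.DataFrame({
    "ACTUAL_START_TIME": ["2024-01-05 08:00", "2024-01-05 09:00", "2024-01-06 07:00"],
    "ACTUAL_END_TIME": ["2024-01-05 09:30", "2024-01-05 09:00", "2024-01-06 06:00"],
    "ACTUAL_WORK_IN_MINUTES": [90.0, 45.0, 30.0],
})

start = pd.to_datetime(orders["ACTUAL_START_TIME"])
end = pd.to_datetime(orders["ACTUAL_END_TIME"])

# Minutes implied by the two timestamps.
orders["ELAPSED_MINUTES"] = (end - start).dt.total_seconds() / 60

# Flag rows where the timestamps are non-positive or far from the recorded minutes.
tolerance_minutes = 5  # arbitrary threshold, chosen for illustration
gap = (orders["ELAPSED_MINUTES"] - orders["ACTUAL_WORK_IN_MINUTES"]).abs()
orders["TIMESTAMPS_SUSPECT"] = (orders["ELAPSED_MINUTES"] <= 0) | (gap > tolerance_minutes)

print(orders[["ELAPSED_MINUTES", "ACTUAL_WORK_IN_MINUTES", "TIMESTAMPS_SUSPECT"]])
```

A large gap is not always an error, since a job can span breaks or shifts, so the flag is best read as a prompt for inspection rather than a data-quality verdict.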

## Discussion of Missing Data <a id="Discussion-of-Missing-Data"></a>
* Scope of missing data and proposed solutions.
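As a starting point for sizing the missing data, a per-column summary of missing counts and shares could be built as follows. The toy table and the `EQUIPMENT_ID` column name are assumptions; on the real extract the same lines would run against `df`.

```python
import pandas as pd

# Toy data for illustration; EQUIPMENT_ID and the labels are assumed names.
orders = pd.DataFrame({
    "PLANT_ID": ["P1", "P2", "P1", "P2"],
    "EQUIPMENT_ID": ["E1", None, None, None],
    "MAINTENANCE_ACTIVITY_TYPE": ["Planned", "Unplanned", "Unplanned", "Planned"],
})

# Count and share of missing values per column, worst columns first.
missing_summary = pd.DataFrame({
    "n_missing": orders.isna().sum(),
    "share_missing": orders.isna().mean(),
}).sort_values("share_missing", ascending=False)
print(missing_summary)

# Check whether missingness differs between planned and unplanned orders.
missing_by_type = (
    orders["EQUIPMENT_ID"].isna()
          .groupby(orders["MAINTENANCE_ACTIVITY_TYPE"])
          .mean()
)
print(missing_by_type)
```

If a field is missing far more often for one activity type than the other, dropping those rows would bias the planned-versus-unplanned comparison, which is worth knowing before choosing a treatment.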

## Exploratory Visualizations and/or Summary Tables <a id="Exploratory-Visualizations-and-Summary-Tables"></a>
* Visualizations and summary tables (label axes, titles, etc.).

In [8]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

# Load data (the relative path resolves against the notebook's working directory)
df = pd.read_csv('IWC_Work_Orders_Extract.csv')

# Drop rows with missing values in relevant columns, if any
df = df.dropna(subset=['MAINTENANCE_ACTIVITY_TYPE', 'PLANT_ID'])

# Group by PLANT_ID and MAINTENANCE_ACTIVITY_TYPE and count occurrences;
# fill_value=0 keeps plant/type combinations with no records as 0 instead of NaN
maintenance_count = df.groupby(['PLANT_ID', 'MAINTENANCE_ACTIVITY_TYPE']).size().unstack(fill_value=0)

# Plot the grouped data
maintenance_count.plot(kind='bar', stacked=True, figsize=(10, 7))

# Labeling the axes and chart
plt.title('Maintenance Activity Type by Plant')
plt.xlabel('Plant ID')
plt.ylabel('Number of Maintenance Activities')
plt.xticks(rotation=45)
plt.legend(title='Maintenance Activity Type')
plt.tight_layout()

# Show the plot
plt.show()


FileNotFoundError: [Errno 2] No such file or directory: 'IWC_Work_Orders_Extract.csv'

## Results <a id="Results"></a>
* Summarize findings and insights.