Copper wire production line root cause analysis

Context
This dataset comes from a real copper wire production line and was collected in November 2020 for root cause analysis. The goal of the job was to find the root cause of an increasing number of defects on the line.

Content
The data includes the dates and shifts for each inspection, the type of defect, and the production downtime in minutes per type of defect.

License CC BY-SA 4.0

Link to the dataset: https://www.kaggle.com/osroru/copper-wire-production-line-dataset

In [None]:
# import libraries
import numpy as np
import pandas as pd

In [None]:
# import the dataset
df = pd.read_csv("../input/copper-wire-production-line-dataset/Cable-Production-Line-Dataset.csv")

Step 1: Learning the dataset and feature engineering

In [None]:
# show the first five rows of the dataset
df.head()

In [None]:
# show column names, dtypes and non-null counts of the dataset
df.info()

In [None]:
# import label encoder for integer encoding
from sklearn.preprocessing import LabelEncoder

In [None]:
# create a dictionary with the mapping of categories to numbers
ordinal_mapping = { k: i for i, k in enumerate(df["Shift"].unique(), 0) }
ordinal_mapping
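
The mapping above assigns integers in order of first appearance, because `unique()` preserves that order. A minimal sketch on made-up data (the label "A" and the `toy_*` names are assumptions for illustration, not taken from the dataset) shows the mapping and how to invert it to recover the original labels later:

```python
import pandas as pd

# Toy stand-in for the "Shift" column; these values are made up.
toy = pd.DataFrame({"Shift": ["A", "B", "B", "A", "B"]})

# unique() keeps first-appearance order, so "A" -> 0 and "B" -> 1 here
toy_mapping = {k: i for i, k in enumerate(toy["Shift"].unique())}

# the inverse mapping turns the integer codes back into the original labels
toy_inverse = {i: k for k, i in toy_mapping.items()}

toy["Shift code"] = toy["Shift"].map(toy_mapping)
toy["Shift label"] = toy["Shift code"].map(toy_inverse)
print(toy)
```

Keeping the inverse mapping around makes it easier to label plots and conclusions with the shift names rather than the integer codes.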

In [None]:
# convert the Date column to datetime format
df["Date"] = pd.to_datetime(df["Date"])

In [None]:
# replace the shift labels with the integers from the mapping
df["Shift"] = df["Shift"].map(ordinal_mapping)

In [None]:
# sum cable failures and other failures to all failures, same for downtime
df["All Failures"] = df["Cable Failures"] + df["Other Failures"]
df["All Failure Downtime"] = df["Cable Failure Downtime"] + df["Other Failure Downtime"]

In [None]:
# show the first five rows of the dataset
df.head()

In [None]:
# show summary statistics of the numerical variables
df.describe()

In [None]:
# plot Machine variable
ax = df["Machine"].value_counts().sort_index().plot.bar(xlabel="Machine", ylabel="Frequency", figsize=(10,6), rot=0);

In [None]:
# plot Machine variable in descending order
ax = df["Machine"].value_counts().plot.bar(xlabel="Machine", ylabel="Frequency", figsize=(10,6), rot=0);

In [None]:
# plot Shift variable
df["Shift"].value_counts().sort_index().plot.bar(xlabel="Shift", ylabel="Frequency", figsize=(3,5));

In [None]:
# plot Operator variable
ax = df["Operator"].value_counts().sort_index().plot.bar(xlabel="Operator", ylabel="Frequency", figsize=(10,6), rot=0);

In [None]:
# plot Operator variable in descending order
ax = df["Operator"].value_counts().plot.bar(xlabel="Operator", ylabel="Frequency", figsize=(10,6), rot=0);

In [None]:
df["Date"].min()

In [None]:
df["Date"].max()

In [None]:
# plot Date variable
ax = df["Date"].value_counts().sort_index().plot.bar(xlabel="Date", ylabel="Frequency", figsize=(10,6), rot=45);

In [None]:
# plot Date variable in descending order
ax = df["Date"].value_counts().plot.bar(xlabel="Date", ylabel="Frequency", figsize=(10,6), rot=45);

In [None]:
# calculate overall cable failures
df["Cable Failures"].sum()

In [None]:
# plot Cable Failures variable
ax = df["Cable Failures"].value_counts().sort_index().plot.bar(
    xlabel="Cable Failures", ylabel="Frequency", figsize=(10,6), rot=0);

In [None]:
# plot Cable Failures variable in descending order
ax = df["Cable Failures"].value_counts().plot.bar(xlabel="Cable Failures", ylabel="Frequency", figsize=(10,6), rot=0);

In [None]:
# calculate overall cable failure downtime
df["Cable Failure Downtime"].sum()

In [None]:
# plot Cable Failure Downtime variable
ax = df["Cable Failure Downtime"].value_counts().sort_index().plot.bar(
    xlabel="Cable Failure Downtime", ylabel="Frequency", figsize=(15,6), rot=0);

In [None]:
# plot Cable Failure Downtime variable in descending order
ax = df["Cable Failure Downtime"].value_counts().plot.bar(
    xlabel="Cable Failure Downtime", ylabel="Frequency", figsize=(15,6), rot=0);

In [None]:
# calculate overall other failures
df["Other Failures"].sum()

In [None]:
# plot Other Failures variable
ax = df["Other Failures"].value_counts().sort_index().plot.bar(
    xlabel="Other Failures", ylabel="Frequency", figsize=(4,6), rot=0);

In [None]:
# calculate overall other failure downtime
df["Other Failure Downtime"].sum()

In [None]:
# plot Other Failure Downtime variable
ax = df["Other Failure Downtime"].value_counts().sort_index().plot.bar(
    xlabel="Other Failure Downtime", ylabel="Frequency", figsize=(15,6), rot=0);

In [None]:
# plot Other Failure Downtime variable in descending order
ax = df["Other Failure Downtime"].value_counts().plot.bar(
    xlabel="Other Failure Downtime", ylabel="Frequency", figsize=(15,6), rot=0);

In [None]:
# plot All Failures variable
ax = df["All Failures"].value_counts().sort_index().plot.bar(
    xlabel="All Failures", ylabel="Frequency", figsize=(10,6), rot=0);

In [None]:
# plot All Failures variable in descending order
ax = df["All Failures"].value_counts().plot.bar(xlabel="All Failures", ylabel="Frequency", figsize=(10,6), rot=0);

In [None]:
# plot All Failure Downtime variable
ax = df["All Failure Downtime"].value_counts().sort_index().plot.bar(
    xlabel="All Failure Downtime", ylabel="Frequency", figsize=(15,6), rot=0);

In [None]:
# plot All Failure Downtime variable in descending order
ax = df["All Failure Downtime"].value_counts().plot.bar(
    xlabel="All Failure Downtime", ylabel="Frequency", figsize=(15,6), rot=0);

Understanding correlation between variables with heatmap

In [None]:
# calculate a correlation matrix
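# restrict to numeric columns so the datetime "Date" column is left out (needs pandas >= 1.5)
corr_matrix = df.corr(numeric_only=True)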
print(corr_matrix)
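
A Pearson correlation matrix is only defined for numeric columns. The sketch below uses made-up numbers (not values from the dataset) and assumes pandas 1.5 or later, where `DataFrame.corr` accepts `numeric_only`. It shows that a datetime column is dropped and that perfectly linear columns correlate at +1 or -1:

```python
import pandas as pd

# Toy data: made-up values with one datetime column and three numeric ones.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-11-02", "2020-11-03", "2020-11-04", "2020-11-05"]),
    "Cable Failures": [1, 2, 3, 4],
    "Cable Failure Downtime": [10, 20, 30, 40],  # rises with Cable Failures
    "Other Failures": [4, 3, 2, 1],              # falls as Cable Failures rise
})

# numeric_only=True keeps only the numeric columns in the result
toy_corr = toy.corr(numeric_only=True)
print(toy_corr.round(2))
```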

In [None]:
# import libraries for heatmap building
import seaborn as sn
import matplotlib.pyplot as plt

In [None]:
# plot a heatmap
sn.heatmap(corr_matrix, cmap='Blues', annot=True)
plt.show()

Step 2: Data analysis

Explore the dependence of cable failures and their downtime on the machine

In [None]:
# make dataframe for the machine exploration from a dictionary
list_machine = np.arange(1,18).tolist() # all the machines listed
list_zeros = [0]*17 # list of zeroes
dict_machine = {"Machine":list_machine,
                "Cable Failures Sum": list_zeros, 
                "Cable Failure Downtime Sum": list_zeros} # the dictionary
df_machine = pd.DataFrame(data=dict_machine).set_index("Machine") # make a dataframe from the dictionary 
df_machine.head(3)

In [None]:
# calculate and fill up total cable failures and cable failure downtime for all the machines
for index, row in df.iterrows():
    #print(row["Machine"], row["Cable Failures"], row["Cable Failure Downtime"])
    df_machine.iat[row["Machine"]-1,0] += row["Cable Failures"]
    df_machine.iat[row["Machine"]-1,1] += row["Cable Failure Downtime"]
df_machine
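
The row-by-row loop above can also be written as a grouped sum. The sketch below runs on made-up data (the `toy_*` names and all values are assumptions for illustration) and uses `reindex` so that machines without any rows still appear with zeros, as they do in the pre-filled dataframe:

```python
import pandas as pd

# Toy inspection records; all values are made up.
toy = pd.DataFrame({
    "Machine": [1, 2, 2, 3, 1, 2],
    "Cable Failures": [1, 0, 2, 1, 3, 1],
    "Cable Failure Downtime": [10, 0, 25, 5, 30, 15],
})

toy_machine = (
    toy.groupby("Machine")[["Cable Failures", "Cable Failure Downtime"]]
    .sum()
    .reindex(range(1, 5), fill_value=0)  # machine 4 has no rows and gets zeros
)
toy_machine.columns = ["Cable Failures Sum", "Cable Failure Downtime Sum"]
print(toy_machine)
```

The same pattern would apply to the operator and shift totals by grouping on a different column.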

In [None]:
# plot Cable Failures Sum variable in descending order
ax = df_machine["Cable Failures Sum"].sort_values(ascending=False).plot.pie(
    xlabel="Machine", ylabel="Cable Failures Sum vs Machine", figsize=(12,12), rot=0);

In [None]:
# plot Cable Failure Downtime Sum variable in descending order
ax = df_machine["Cable Failure Downtime Sum"].sort_values(ascending=False).plot.pie(
    xlabel="Machine", ylabel="Cable Failure Downtime Sum vs Machine", figsize=(12,12), rot=0);
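
Pie charts make it hard to read off which machines together account for a given share of the total. One way to get that number is a cumulative share over the sorted sums. The sketch below uses made-up totals (not the dataset's values), and the 50% threshold is chosen only for illustration:

```python
import pandas as pd

# Toy per-machine totals; values are made up and sum to 100 for easy reading.
toy_sums = pd.Series({1: 10, 2: 40, 3: 5, 4: 25, 5: 20}, name="Cable Failures Sum")
toy_sums.index.name = "Machine"

# sort from largest to smallest and accumulate each machine's share of the total
sorted_sums = toy_sums.sort_values(ascending=False)
cumulative_share = sorted_sums.cumsum() / sorted_sums.sum()

# smallest group of top machines whose combined share reaches 50%
n_needed = int((cumulative_share >= 0.5).to_numpy().argmax()) + 1
top_machines = sorted_sums.index[:n_needed].tolist()

print(cumulative_share.round(2))
print(top_machines)
```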

In [None]:
# make dataframe for the operator exploration from a dictionary
list_operator = np.arange(1,33).tolist() # all the operators listed
list_zeros = [0]*32 # list of zeroes
dict_operator = {"Operator":list_operator,
                "Cable Failures Sum": list_zeros, 
                "Cable Failure Downtime Sum": list_zeros} # the dictionary
df_operator = pd.DataFrame(data=dict_operator).set_index("Operator") # make a dataframe from the dictionary 
df_operator.head(3)

In [None]:
# calculate and fill up total cable failures and cable failure downtime for all the operators
for index, row in df.iterrows():
    df_operator.iat[row["Operator"]-1,0] += row["Cable Failures"]
    df_operator.iat[row["Operator"]-1,1] += row["Cable Failure Downtime"]
df_operator

In [None]:
# plot Cable Failures Sum variable in descending order
ax = df_operator["Cable Failures Sum"].sort_values(ascending=False).plot.pie(
    xlabel="Operator", ylabel="Cable Failures Sum vs Operator", figsize=(12,12), rot=0);

In [None]:
# plot Cable Failure Downtime Sum variable in descending order
ax = df_operator["Cable Failure Downtime Sum"].sort_values(ascending=False).plot.pie(
    xlabel="Operator", ylabel="Cable Failure Downtime Sum vs Operator", figsize=(12,12), rot=0);

In [None]:
# make dataframe for the shift exploration from a dictionary
list_shift = np.arange(0,2).tolist() # all the shifts listed
list_zeros = [0]*2 # list of zeroes
dict_shift = {"Shift":list_shift,
                "Cable Failures Sum": list_zeros, 
                "Cable Failure Downtime Sum": list_zeros} # the dictionary
df_shift = pd.DataFrame(data=dict_shift).set_index("Shift") # make a dataframe from the dictionary 
df_shift

In [None]:
# calculate and fill up total cable failures and cable failure downtime for both shifts
for index, row in df.iterrows():
    # Shift is already encoded as 0/1, so it is used directly as the row position
    df_shift.iat[row["Shift"],0] += row["Cable Failures"]
    df_shift.iat[row["Shift"],1] += row["Cable Failure Downtime"]
df_shift

In [None]:
# plot Cable Failures Sum variable in descending order
ax = df_shift["Cable Failures Sum"].sort_values(ascending=False).plot.pie(
    xlabel="Shift", ylabel="Cable Failures Sum vs Shift", figsize=(5,5), rot=0);

In [None]:
# plot Cable Failure Downtime Sum variable in descending order
ax = df_shift["Cable Failure Downtime Sum"].sort_values(ascending=False).plot.pie(
    xlabel="Shift", ylabel="Cable Failure Downtime Sum vs Shift", figsize=(5,5), rot=0);

In [None]:
df.head()

Conclusions:

Machines:
- 50% of cable failures happened on machines 2, 8 & 7.
- 50% of cable failure downtime happened on machines 2, 8 & 1.

Operators:
- 50% of cable failures happened with operators 3, 14, 1, 7, 9 & 13.
- 50% of cable failure downtime happened with operators 14, 13, 1, 9, 15 & 31.

Shifts:
- 33% more cable failures happened during shift B.
- 18% more cable failure downtime happened during shift B.
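
A "percent more" figure like the ones for the shifts can be computed as a relative difference. The sketch below assumes the comparison is made relative to the other shift (labelled "A" here as an assumption) and uses made-up totals, not the dataset's values:

```python
import pandas as pd

# Toy per-shift totals; values are made up.
toy_shift = pd.Series({"A": 50, "B": 60}, name="Cable Failures Sum")

# relative difference of shift B with respect to shift A, in percent
pct_more = (toy_shift["B"] - toy_shift["A"]) / toy_shift["A"] * 100
print(f"{pct_more:.0f}% more cable failures in shift B than in shift A")
```

Note that the result depends on which shift is taken as the baseline: with the same toy numbers, shift A has about 17% fewer failures than shift B.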

Recommendation:

Review the schedules of operators 3, 14, 1, 7, 9 & 13 and check whether they mostly worked on machines 2, 8 & 7.
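
If the inspection records themselves are used as a proxy for the schedules, this check could be sketched with a cross-tabulation of operators against machines. The data below is made up, and `flagged_machines` and `share_on_flagged` are hypothetical names introduced for illustration:

```python
import pandas as pd

# Toy inspection records; operator and machine assignments are made up.
toy = pd.DataFrame({
    "Operator": [3, 3, 3, 14, 14, 1, 1, 1],
    "Machine":  [2, 2, 8, 7, 5, 2, 8, 8],
})
flagged_machines = [2, 8, 7]

# rows = operators, columns = machines, cells = number of records
counts = pd.crosstab(toy["Operator"], toy["Machine"])

# share of each operator's records that fall on the flagged machines
on_flagged = counts.reindex(columns=flagged_machines, fill_value=0).sum(axis=1)
share_on_flagged = on_flagged / counts.sum(axis=1)
print(share_on_flagged)
```

A share close to 1 for an operator would suggest that the operator's failures cannot be separated from those of the flagged machines without further data.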