### **Problem 2: Production Line Efficiency and Outlier Detection**

Management suspects that a few production lines have had underperforming streaks. Clean the data and investigate the following:

**Your tasks:**

1. Group data by `Production Line` and calculate total weight and average scrap rate.
2. Detect lines with outlier scrap rates: more than 2 standard deviations from the mean.
3. Extract the lot number prefix (first 3 characters) and use it as a batch series indicator.
4. Create a column `Normalized Scrap (%)` by standardizing values (z-score).
5. Flag all batches whose normalized scrap is > 2.
6. Identify the line with the most flagged batches.
7. Output a summary table of lines with average weight, scrap, and failure count.

*Hint: Use `std()` and `mean()` to standardize scrap rate.*
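
As a minimal sketch of the hint, the z-score of a value is `(value - mean) / std`. The example below uses only Python's standard `statistics` module and ten made-up scrap rates (hypothetical values, not taken from the batch log) to show how a single high reading ends up above the threshold of 2.

```python
from statistics import mean, stdev

# Hypothetical scrap rates (%), for illustration only.
scrap_rates = [1.5, 2.0, 2.5, 2.0, 1.5, 2.5, 2.0, 2.0, 2.0, 8.0]

mu = mean(scrap_rates)
# stdev() is the sample standard deviation (divides by n - 1),
# which matches the default of pandas' Series.std().
sigma = stdev(scrap_rates)

z_scores = [(x - mu) / sigma for x in scrap_rates]

# Keep only the readings more than 2 standard deviations above the mean.
flagged = [x for x, z in zip(scrap_rates, z_scores) if z > 2]
print(flagged)
```

With very small samples no value can reach a z-score of 2, which is why the toy list has ten entries rather than three or four.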


In [118]:
import pandas as pd
import numpy as np
import re

In [119]:
data = pd.read_csv('Spool_Manufacturing_Batch_Log.csv')

In [120]:
# read_csv already returns a DataFrame; work on a copy so the raw data stays untouched
df = data.copy()

In [121]:
df.head(3)

,Batch ID,Date Produced,Material Type,Color,Production Line,Weight (g),Scrap Rate (%),Pass/Fail,Operator,Phone,Email,Shift,Machine Barcode,Lot Number
0,eb6221c8-f45a-49f6-8c0c-ee28f5a29fc0,2025-05-01,PLA,Black,Line 2,1024.84,1.79,Pass,Jacqueline Bass,001-988-061-3911x7775,haynesdavid@yahoo.com,Shift C,MCH-001,L9935
1,9748d109-45e1-4bb0-98af-53396946b791,2025-05-01,PLA,Red,Line 1,1032.38,4.28,Pass,Kristen Cole,300-905-2906x4997,theodore63@yahoo.com,Shift A,MCH-001,L4257
2,35de154c-67d6-4144-a9e2-8afe65353fb2,2025-05-01,ABS,Blue,Line 4,988.29,1.65,Pass,Sherry Bryant,001-741-699-1830x254,timothy04@knox.net,Shift C,MCH-001,L3615


In [122]:
# 1. Group data by `Production Line` and calculate total weight and average scrap rate.
weight_total_by_line = df.groupby('Production Line')['Weight (g)'].sum()

In [123]:
scrap_rate_by_line = df.groupby('Production Line')['Scrap Rate (%)'].mean()

In [124]:
summary = pd.DataFrame(
    {
        'Total Weight (g)': weight_total_by_line,
        'Average Scrap Rate (%)': scrap_rate_by_line
    }
).sort_values('Average Scrap Rate (%)', ascending=False)

In [125]:
summary

Production Line,Total Weight (g),Average Scrap Rate (%)
Line 3,90280.09,2.217889
Line 2,89078.17,2.189551
Line 4,88877.98,2.175455
Line 6,83513.08,2.125238
Line 5,98509.98,2.085152
Line 1,90586.37,2.052889
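
The two `groupby` calls above can also be collapsed into a single pass with named aggregation. The following is a sketch on a tiny made-up frame: the `toy` data is hypothetical, and only the column names mirror the batch log.

```python
import pandas as pd

# Hypothetical miniature of the batch log; values are illustrative only.
toy = pd.DataFrame({
    'Production Line': ['Line 1', 'Line 1', 'Line 2', 'Line 2'],
    'Weight (g)': [1000.0, 1010.0, 990.0, 1000.0],
    'Scrap Rate (%)': [2.0, 4.0, 1.0, 2.0],
})

# Named aggregation: output column name -> (input column, function).
# The dict is unpacked because the output names contain spaces.
toy_summary = (
    toy.groupby('Production Line')
    .agg(**{
        'Total Weight (g)': ('Weight (g)', 'sum'),
        'Average Scrap Rate (%)': ('Scrap Rate (%)', 'mean'),
    })
    .sort_values('Average Scrap Rate (%)', ascending=False)
)
print(toy_summary)
```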


In [126]:
# 2. Detect lines with outlier scrap rates: more than 2 standard deviations from the mean.
scrap_rate_mean = df['Scrap Rate (%)'].mean()

In [127]:
scrap_rate_mean

np.float64(2.1400555555555556)

In [128]:
scrap_rate_std = df['Scrap Rate (%)'].std()

In [129]:
scrap_rate_std

np.float64(1.3478179288945513)

In [130]:
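# Upper bound only: the lower bound (mean - 2 * std) is negative here, and a scrap rate cannot fall below 0
threshold = scrap_rate_mean + 2 * scrap_rate_std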

In [131]:
# Each row is an individual batch above the threshold; the per-line counts follow below
lines_with_outliers = df[df['Scrap Rate (%)'] > threshold]

In [132]:
lines_with_outliers.head(3)

,Batch ID,Date Produced,Material Type,Color,Production Line,Weight (g),Scrap Rate (%),Pass/Fail,Operator,Phone,Email,Shift,Machine Barcode,Lot Number
56,fb4c18db-c9b1-4c9a-ad29-15f37d7076b1,2025-05-04,ABS,Red,Line 3,1003.01,5.69,Pass,Mary Anderson,247-382-6261x52923,jesse59@stewart-johnson.info,Shift C,MCH-004,L6279
62,4d34f112-a00f-4561-9b21-44029a58c303,2025-05-04,ABS,Red,Line 5,1029.34,5.29,Pass,Sherry Bryant,001-741-699-1830x254,timothy04@knox.net,Shift C,MCH-001,L8478
83,8ee4dbf6-b468-493d-b11d-8c4206490c82,2025-05-05,ABS,Blue,Line 1,1041.1,4.85,Pass,Mary Anderson,247-382-6261x52923,jesse59@stewart-johnson.info,Shift B,MCH-004,L4995


In [133]:
lines_with_outliers['Production Line'].value_counts()

Production Line
Line 3    5
Line 5    2
Line 1    2
Line 6    2
Line 2    2
Line 4    1
Name: count, dtype: int64

In [134]:
lines_with_outliers.groupby('Production Line')['Scrap Rate (%)'].mean()

Production Line
Line 1    5.400
Line 2    5.460
Line 3    5.358
Line 4    5.210
Line 5    6.535
Line 6    5.260
Name: Scrap Rate (%), dtype: float64
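
Task 2 says "from the mean", which reads as two-sided, while the filter above only checks the upper side. A quick check with the mean and standard deviation printed earlier suggests why that is enough here, assuming scrap rates are percentages that cannot be negative:

```python
# Values printed by cells In [127] and In [129].
scrap_rate_mean = 2.1400555555555556
scrap_rate_std = 1.3478179288945513

upper = scrap_rate_mean + 2 * scrap_rate_std
lower = scrap_rate_mean - 2 * scrap_rate_std

print(round(upper, 4))  # 4.8357
print(round(lower, 4))  # -0.5556

# Assumption: a scrap rate is never below 0 %, so nothing can fall under `lower`
# and the one-sided filter finds the same batches as a two-sided one.
```

On a dataset where the lower bound is positive, a two-sided filter such as `(df['Scrap Rate (%)'] - scrap_rate_mean).abs() > 2 * scrap_rate_std` would be the safer choice.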

In [135]:
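# 3. Extract lot number prefix (first 3 characters) and use it as a batch series indicator.
# First attempt: the three digits that follow the leading letter(s).
# `{3}` replaces `{3}+`, which raises an error on Python versions before 3.11.
df['Batch Series Indicator'] = df['Lot Number'].str.extract(r'^[A-Z]+([0-9]{3})', expand=False)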

In [136]:
df[['Lot Number', 'Batch Series Indicator']].head(3)

,Lot Number,Batch Series Indicator
0,L9935,993
1,L4257,425
2,L3615,361


In [137]:
# Final version: the first 3 characters, including the leading letter (overwrites the column above)
df['Batch Series Indicator'] = df['Lot Number'].str[:3]

In [138]:
df[['Lot Number', 'Batch Series Indicator']].head(3)

,Lot Number,Batch Series Indicator
0,L9935,L99
1,L4257,L42
2,L3615,L36
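
The two readings of "prefix" tried above can be compared side by side. This sketch reuses the three lot numbers shown in the output; the variable names `lots`, `digits`, and `prefix` are illustrative.

```python
import pandas as pd

# Lot numbers taken from the first three rows shown above.
lots = pd.Series(['L9935', 'L4257', 'L3615'])

# Reading 1: the three digits after the leading letter(s).
# expand=False returns a Series instead of a one-column DataFrame.
digits = lots.str.extract(r'^[A-Z]+(\d{3})', expand=False)

# Reading 2: literally the first three characters, letter included.
prefix = lots.str[:3]

print(digits.tolist())  # ['993', '425', '361']
print(prefix.tolist())  # ['L99', 'L42', 'L36']
```

Plain slicing matches the task wording ("first 3 characters") and needs no regular expression, so it is the simpler choice when every lot number has the same layout.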


In [139]:
# 4. Create a column `Normalized Scrap (%)` by standardizing values (z-score).
scrap_mean = df['Scrap Rate (%)'].mean()

In [140]:
scrap_std = df['Scrap Rate (%)'].std()

In [141]:
# Use the mean and std computed in the two cells above; round only for display
df['Normalized Scrap (%)'] = ((df['Scrap Rate (%)'] - scrap_mean) / scrap_std).round(2)

In [142]:
df[['Scrap Rate (%)', 'Normalized Scrap (%)']].head(3)

,Scrap Rate (%),Normalized Scrap (%)
0,1.79,-0.26
1,4.28,1.59
2,1.65,-0.36
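
One detail worth knowing when standardizing: pandas' `std()` defaults to the sample standard deviation (`ddof=1`), while NumPy's `np.std` defaults to the population version (`ddof=0`). The sketch below uses a made-up series to show the difference; on a log of several hundred batches the gap is small, but it can move a borderline z-score across the threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen so the arithmetic is easy to follow.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

sample_std = s.std()                     # ddof=1 -> sqrt(10 / 4)
population_std = np.std(s.to_numpy())    # ddof=0 -> sqrt(10 / 5)

# Passing ddof explicitly makes the two libraries agree.
matched_std = np.std(s.to_numpy(), ddof=1)

print(sample_std, population_std, matched_std)
```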


In [143]:
# 5. Flag all batches whose normalized scrap is > 2.
# Vectorized flagging; equivalent to applying a lambda row by row
df['Outlier Indicator (Norm > 2)'] = np.where(df['Normalized Scrap (%)'] > 2, 'FLAGGED', '')

In [144]:
df[['Scrap Rate (%)', 'Normalized Scrap (%)', 'Outlier Indicator (Norm > 2)']].head(3)

,Scrap Rate (%),Normalized Scrap (%),Outlier Indicator (Norm > 2)
0,1.79,-0.26,
1,4.28,1.59,
2,1.65,-0.36,


In [145]:
df['Outlier Indicator (Norm > 2)'].value_counts()

Outlier Indicator (Norm > 2)
           526
FLAGGED     14
Name: count, dtype: int64

In [146]:
df[df['Outlier Indicator (Norm > 2)'] == 'FLAGGED'].head(3)

,Batch ID,Date Produced,Material Type,Color,Production Line,Weight (g),Scrap Rate (%),Pass/Fail,Operator,Phone,Email,Shift,Machine Barcode,Lot Number,Batch Series Indicator,Normalized Scrap (%),Outlier Indicator (Norm > 2)
56,fb4c18db-c9b1-4c9a-ad29-15f37d7076b1,2025-05-04,ABS,Red,Line 3,1003.01,5.69,Pass,Mary Anderson,247-382-6261x52923,jesse59@stewart-johnson.info,Shift C,MCH-004,L6279,L62,2.63,FLAGGED
62,4d34f112-a00f-4561-9b21-44029a58c303,2025-05-04,ABS,Red,Line 5,1029.34,5.29,Pass,Sherry Bryant,001-741-699-1830x254,timothy04@knox.net,Shift C,MCH-001,L8478,L84,2.34,FLAGGED
83,8ee4dbf6-b468-493d-b11d-8c4206490c82,2025-05-05,ABS,Blue,Line 1,1041.1,4.85,Pass,Mary Anderson,247-382-6261x52923,jesse59@stewart-johnson.info,Shift B,MCH-004,L4995,L49,2.01,FLAGGED


In [147]:
# 6. Identify the line with most flagged batches.
# Convert to boolean before summing
flagged_count = (
    df['Outlier Indicator (Norm > 2)']
    .eq('FLAGGED')
    .groupby(df['Production Line'])
    .sum()
    .sort_values(ascending=False)
)

In [148]:
flagged_count

Production Line
Line 3    5
Line 1    2
Line 2    2
Line 5    2
Line 6    2
Line 4    1
Name: Outlier Indicator (Norm > 2), dtype: int64
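
Reading the top row of the sorted output answers task 6, but the line can also be pulled out programmatically. A small sketch, with the counts copied from the output above into a stand-alone Series:

```python
import pandas as pd

# Counts copied from the flagged_count output above.
flagged_count = pd.Series(
    {'Line 3': 5, 'Line 1': 2, 'Line 2': 2, 'Line 5': 2, 'Line 6': 2, 'Line 4': 1},
    name='Flagged Batches',
)

worst_line = flagged_count.idxmax()      # index label of the largest count
worst_count = int(flagged_count.max())

print(worst_line, worst_count)  # Line 3 5
```

`idxmax()` returns only the first label when several lines tie for the maximum, so a tie would need an explicit check such as `flagged_count[flagged_count == flagged_count.max()]`.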

In [149]:
# 7. Output a summary table of lines with average weight, scrap, and failure count.
average_weight_per_line = round(df.groupby('Production Line')['Weight (g)'].mean(), 2)

In [150]:
# Number of batches per line, plus the average scrap rate the task asks for
batch_count_per_line = df.groupby('Production Line')['Scrap Rate (%)'].count()
average_scrap_per_line = df.groupby('Production Line')['Scrap Rate (%)'].mean().round(2)

In [151]:
failure_count_per_line = df['Pass/Fail'].eq('Fail').groupby(df['Production Line']).sum()

In [152]:
summary = pd.DataFrame(
    {
        'Avg. Weight per Line (g)': average_weight_per_line,
        'Avg. Scrap Rate per Line (%)': average_scrap_per_line,
        'Batch Count per Line': batch_count_per_line,
        'Failure Count per Line': failure_count_per_line
    }
)

In [153]:
summary

Production Line,Avg. Weight per Line (g),Avg. Scrap Rate per Line (%),Batch Count per Line,Failure Count per Line
Line 1,1006.52,2.05,90,7
Line 2,1000.88,2.19,89,9
Line 3,1003.11,2.22,90,12
Line 4,1009.98,2.18,88,13
Line 5,995.05,2.09,99,9
Line 6,994.2,2.13,84,11
