# Polymers Classifier

EM 538
Instructor:  
Student: Mike Keating


## Preprocessing

Data was queried by polymer symbol (PP, PMMA, etc.) and saved to five separate CSV files. Because the search matches on symbol, each dataset likely includes blends and specialty materials, so we will want to pare the data down to a reasonable number of classes.
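
One way to decide which classes to keep is to count rows per symbol and look at what falls below a cutoff. The sketch below runs on made-up rows; the toy values and the `min_rows` cutoff are assumptions for illustration, not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the combined query results (values are illustrative only)
toy = pd.DataFrame(
    {
        "Generic Polymer Symbol": [
            "PP Homopolymer",
            "PP Homopolymer",
            "PP Copolymer",
            "PP+EPDM",
            "ABS",
            "ABS",
            "ABS",
        ]
    }
)

# Count rows per symbol, most frequent first
counts = toy["Generic Polymer Symbol"].value_counts()

# Keep symbols with at least `min_rows` rows (hypothetical cutoff)
min_rows = 2
candidates = counts[counts >= min_rows].index.tolist()
print(candidates)
```

Blends and rare grades then show up as the low-count tail of `counts`, which makes it easier to choose the list of symbols by hand.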


In [12]:
## Dependencies
import pandas as pd


In [28]:
## Combine
import os

# Read every CSV in the data directory, then concatenate once
# (avoids growing the DataFrame inside the loop)
csv_files = [f for f in os.listdir("data") if f.endswith(".csv")]
df = pd.concat(
    [pd.read_csv(os.path.join("data", f)) for f in csv_files],
    ignore_index=True,
)
# Overview
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36382 entries, 0 to 36381
Data columns (total 13 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Product Name                  36382 non-null  object 
 1   Grade                         36382 non-null  object 
 2   Generic Polymer Type          36382 non-null  object 
 3   Generic Polymer Symbol        36382 non-null  object 
 4   Density                       25865 non-null  float64
 5   Tensile Strength at Yield     18942 non-null  float64
 6   Flexural Modulus              26080 non-null  float64
 7   Flexural Strength             12903 non-null  float64
 8   Tensile Modulus               12543 non-null  float64
 9   Glass Transition Temperature  515 non-null    float64
 10  Melt Mass-Flow Rate (MFR)     25821 non-null  float64
 11  Polymer Code                  7891 non-null   object 
 12  Melting Temperature           1264 non-null   float64
dtypes: float64(8), object(5)

In [29]:
# Discard Glass Transition Temperature, Polymer Code, and Melting Temperature:
# each is null for most rows
df = df.drop(
    columns=["Glass Transition Temperature", "Polymer Code", "Melting Temperature"]
)
# df.dropna(inplace=True)
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36382 entries, 0 to 36381
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Product Name               36382 non-null  object 
 1   Grade                      36382 non-null  object 
 2   Generic Polymer Type       36382 non-null  object 
 3   Generic Polymer Symbol     36382 non-null  object 
 4   Density                    25865 non-null  float64
 5   Tensile Strength at Yield  18942 non-null  float64
 6   Flexural Modulus           26080 non-null  float64
 7   Flexural Strength          12903 non-null  float64
 8   Tensile Modulus            12543 non-null  float64
 9   Melt Mass-Flow Rate (MFR)  25821 non-null  float64
dtypes: float64(6), object(4)
memory usage: 2.8+ MB
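
The three dropped columns are each non-null in fewer than a quarter of the 36382 rows (515, 7891, and 1264 values). The same choice can be made from the null fraction per column, as in this minimal sketch; the toy data and the `max_null_frac` cutoff are assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame; the NaN pattern is illustrative, not the real data
toy = pd.DataFrame(
    {
        "Density": [0.90, 1.05, np.nan, 1.20],
        "Glass Transition Temperature": [np.nan, np.nan, np.nan, 105.0],
        "Melting Temperature": [165.0, np.nan, np.nan, np.nan],
    }
)

# Fraction of missing values per column
null_frac = toy.isna().mean()

# Drop columns missing more than `max_null_frac` of their values (assumed cutoff)
max_null_frac = 0.5
sparse_cols = null_frac[null_frac > max_null_frac].index.tolist()
reduced = toy.drop(columns=sparse_cols)
print(sparse_cols)
```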


In [34]:
# Restrict the data to the polymer symbols of interest


symbols = [
    "ABS",
    "PET",
    "MABS",
    "PP Homopolymer",
    "Acrylic (PMMA)",
    "PC+ABS",
    "PP Copolymer",
    "PS (GPPS)",
    "PEEK",
    "PS (HIPS)",
    "PVC, Rigid",  # label used in the data (see the unique symbols printed below)
    "HDPE",
    "LDPE",
]

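# Keep only rows whose symbol is in the list above
df = df[df["Generic Polymer Symbol"].isin(symbols)]
# df
# Missing values per column (not displayed: only the last expression in a cell is shown)
df.isna().sum()
# Drop every row that has at least one missing value
df_clean = df.dropna()
# Per-symbol feature means (not displayed, see above)
df_clean.groupby("Generic Polymer Symbol").mean(numeric_only=True)
# Rows remaining per symbol
df_clean.groupby("Generic Polymer Symbol").count()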


| Generic Polymer Symbol | Product Name | Grade | Generic Polymer Type | Density | Tensile Strength at Yield | Flexural Modulus | Flexural Strength | Tensile Modulus | Melt Mass-Flow Rate (MFR) |
|---|---|---|---|---|---|---|---|---|---|
| ABS | 329 | 329 | 329 | 329 | 329 | 329 | 329 | 329 | 329 |
| Acrylic (PMMA) | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| MABS | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
| PC+ABS | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 |
| PEEK | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
| PET | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| PP Copolymer | 182 | 182 | 182 | 182 | 182 | 182 | 182 | 182 | 182 |
| PP Homopolymer | 228 | 228 | 228 | 228 | 228 | 228 | 228 | 228 | 228 |
| PS (GPPS) | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| PS (HIPS) | 39 | 39 | 39 | 39 | 39 | 39 | 39 | 39 | 39 |


In [31]:
# Debugging: Check where data is lost
print("Unique symbols in data:", df["Generic Polymer Symbol"].unique())
print("Shape after filtering symbols:", df.shape)
print("Number of NaNs per column after filtering:")
print(df.isna().sum())
df_clean = df.dropna()
print("Shape after dropping NaNs:", df_clean.shape)
print("Remaining symbols:", df_clean["Generic Polymer Symbol"].unique())
df_clean.groupby("Generic Polymer Symbol").count()

Unique symbols in data: ['PC+ABS' 'ABS' 'MABS' 'HDPE' 'LDPE' 'PEEK' 'PET' 'Acrylic (PMMA)'
 'PP Copolymer' 'PP Homopolymer' 'PS (HIPS)' 'PS (GPPS)' 'PVC, Rigid']
Shape after filtering symbols: (21352, 10)
Number of NaNs per column after filtering:
Product Name                     0
Grade                            0
Generic Polymer Type             0
Generic Polymer Symbol           0
Density                       6799
Tensile Strength at Yield     9412
Flexural Modulus              5922
Flexural Strength            13572
Tensile Modulus              13659
Melt Mass-Flow Rate (MFR)     5318
dtype: int64
Shape after dropping NaNs: (1048, 10)
Remaining symbols: ['ABS' 'PC+ABS' 'MABS' 'PEEK' 'PET' 'Acrylic (PMMA)' 'PP Copolymer'
 'PP Homopolymer' 'PS (HIPS)' 'PS (GPPS)']


| Generic Polymer Symbol | Product Name | Grade | Generic Polymer Type | Density | Tensile Strength at Yield | Flexural Modulus | Flexural Strength | Tensile Modulus | Melt Mass-Flow Rate (MFR) |
|---|---|---|---|---|---|---|---|---|---|
| ABS | 329 | 329 | 329 | 329 | 329 | 329 | 329 | 329 | 329 |
| Acrylic (PMMA) | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| MABS | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
| PC+ABS | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 |
| PEEK | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
| PET | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| PP Copolymer | 182 | 182 | 182 | 182 | 182 | 182 | 182 | 182 | 182 |
| PP Homopolymer | 228 | 228 | 228 | 228 | 228 | 228 | 228 | 228 | 228 |
| PS (GPPS) | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| PS (HIPS) | 39 | 39 | 39 | 39 | 39 | 39 | 39 | 39 | 39 |
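
The debugging output shows that requiring every feature cuts 21352 rows to 1048 and removes HDPE, LDPE, and PVC entirely, with Flexural Strength and Tensile Modulus missing most often. A possible alternative is to require only a subset of features via `dropna(subset=...)`. The sketch below uses toy values, and the choice of subset is an assumption rather than a recommendation from the data.

```python
import numpy as np
import pandas as pd

# Toy rows; values and NaN pattern are illustrative only
toy = pd.DataFrame(
    {
        "Generic Polymer Symbol": ["HDPE", "HDPE", "ABS", "ABS", "LDPE"],
        "Density": [0.95, 0.96, 1.04, 1.05, 0.92],
        "Melt Mass-Flow Rate (MFR)": [0.3, 8.0, 20.0, np.nan, 2.0],
        "Tensile Modulus": [np.nan, np.nan, 2300.0, 2200.0, np.nan],
    }
)

# Requiring every column removes all HDPE and LDPE rows
all_cols = toy.dropna()

# Requiring only a subset of features keeps more rows and more classes
subset = ["Density", "Melt Mass-Flow Rate (MFR)"]  # hypothetical feature choice
some_cols = toy.dropna(subset=subset)

print(len(all_cols), sorted(all_cols["Generic Polymer Symbol"].unique()))
print(len(some_cols), sorted(some_cols["Generic Polymer Symbol"].unique()))
```

The trade-off is fewer features per sample in exchange for more samples and classes; which side matters more depends on the classifier used later.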


In [None]:
# Create meaningful groupings for class balancing
def map_polymer_group(symbol):
    if symbol in ["PP Homopolymer", "PP Copolymer"]:
        return "PP"
    elif symbol in ["PS (GPPS)", "PS (HIPS)"]:
        return "PS"
    elif symbol in ["Acrylic (PMMA)", "MABS"]:
        return "Acrylics"
    elif symbol in ["PC+ABS", "ABS"]:
        return "ABS/Blends"
    elif symbol == "HDPE":
        return "HDPE"
    elif symbol == "LDPE":
        return "LDPE"
    elif symbol == "PET":
        return "PET"
    elif symbol == "PEEK":
        return "PEEK"
    elif symbol in ["PVC", "PVC, Rigid"]:  # the data labels PVC as "PVC, Rigid"
        return "PVC"
    else:
        return "Other"


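# Work on an explicit copy so the new column is not written to a slice of df
# (this avoids a possible SettingWithCopyWarning)
df_clean = df_clean.copy()
df_clean["Polymer Group"] = df_clean["Generic Polymer Symbol"].apply(map_polymer_group)
print(df_clean["Polymer Group"].value_counts())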
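
The same grouping can also be expressed as a lookup table, which keeps the symbol-to-group pairs in one place. This is a minimal sketch and not the notebook's own method: `GROUPS` and the toy series are hypothetical names, and the `"PVC, Rigid"` key is included on the assumption that it is the label the data uses, as the unique-symbols output suggests.

```python
import pandas as pd

# Symbol -> group lookup mirroring the branches of map_polymer_group
GROUPS = {
    "PP Homopolymer": "PP",
    "PP Copolymer": "PP",
    "PS (GPPS)": "PS",
    "PS (HIPS)": "PS",
    "Acrylic (PMMA)": "Acrylics",
    "MABS": "Acrylics",
    "PC+ABS": "ABS/Blends",
    "ABS": "ABS/Blends",
    "HDPE": "HDPE",
    "LDPE": "LDPE",
    "PET": "PET",
    "PEEK": "PEEK",
    "PVC": "PVC",
    "PVC, Rigid": "PVC",  # assumed data label
}

# Toy symbols; "Nylon 6" stands in for anything outside the table
toy = pd.Series(
    ["PP Copolymer", "ABS", "PC+ABS", "PS (HIPS)", "Nylon 6"],
    name="Generic Polymer Symbol",
)

# Unmapped symbols come back as NaN from map(), so fill them with "Other"
grouped = toy.map(GROUPS).fillna("Other")
print(grouped.value_counts())
```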