# Task 3

The file “Parts.csv” contains descriptions of some fictitious parts. Your goal is to
find 5 alternative parts to each provided fictitious part in the dataset based on their
similarity. First provide descriptive analysis of the data and highlight 2-3 findings
and difficulties of the data that we provided and describe how you would handle this.
Continue to implement a solution that is finding the similar fictitious parts based on
the column “DESCRIPTION”. Please give details of your solution and why you choose it.
Once you finished your implementation of your solution, please think about how you
would integrate your code into the chatbot from task 1.

## Goal: Analyze the Parts.csv dataset of fictitious parts and find alternative parts that are similar to each given part.

1. Descriptive Analysis: Examine the data to understand its structure, identify patterns, and note any challenges or limitations.

2. Similarity Matching: Develop a method to identify 5 alternative parts for each part based on the "DESCRIPTION" column, which contains detailed specifications (e.g., current rating, voltage, blow type, material, etc.).

3. Integration: Suggest how this solution could be integrated into a chatbot for real-time use.

### This is useful in scenarios like inventory management, product recommendation, or customer support, where a user might ask, "What are alternatives to part A1?" The chatbot would analyze the data and provide similar parts based on their descriptions.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import os
import re

In [None]:
# Load the datasets
df_original = pd.read_csv('/content/drive/MyDrive/BMW_csv_files/Parts.csv', sep=';')

In [None]:
# Adjust display options for single-line output
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

# confirm the data loaded: Print the dataframe in a single line
print(df_original.head().to_string(index=False))

ID                                                                                                                                                                     DESCRIPTION Attribut1 Additional Feature                     Application Characteristic Temp Height Length in mm Rating Material     Size Code Joule-integral-Nom (J) LC Risk Maximum AC Voltage Rating Maximum DC Voltage Rating Maximum Power Dissipation Mounting Mounting Feature  Number of Terminals Operating Temperature-Max (Cel) Operating Temperature-Min (Cel) Physical Dimension Pre-arcing time-Min (ms) Product Diameter Product Length Rated Breaking Capacity (A) Rated Current (A) Rated Voltage (V) Rated Voltage(AC) (V) Rated Voltage(DC) (V)
A1 Indicator Red Fast Movement 1.6A 250V Holder Plastic 5 X 20mm Ceramic Box CCC/PSE/VDE/cULus Electric Indicator, Very Fast Blow, 1.6A, 250VAC, 1500A (IR), Inline/holder, 5x20mm      Fast                NaN Primary Protection In Equipment      VERY FAST  NaN   20mm        5.2mm   1.6A

In [None]:
num_rows, num_columns = df_original.shape
print(f"There are {num_rows} rows (records) and {num_columns} columns (fields) in the dataset.")

There are 998 rows (records) and 32 columns (fields) in the dataset.


## 1. Descriptive Analysis of the Data.
The "DESCRIPTION" column is the primary focus. Descriptions use natural language (e.g., "Indicator Red Fast Movement" vs. "Non Resettable Indicators Electric Indicator"), which may lead to parsing challenges.

In [None]:
df_original.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 998 entries, 0 to 997
Data columns (total 32 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   ID                               998 non-null    object 
 1   DESCRIPTION                      663 non-null    object 
 2   Attribut1                        770 non-null    object 
 3   Additional Feature               324 non-null    object 
 4   Application                      821 non-null    object 
 5   Characteristic                   672 non-null    object 
 6   Temp                             417 non-null    object 
 7   Height                           705 non-null    object 
 8   Length in mm                     705 non-null    object 
 9   Rating                           834 non-null    object 
 10  Material                         771 non-null    object 
 11  Size                             880 non-null    object 
 12  Code                  

In [None]:
df_original.describe(include='all').T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
ID,998.0,998.0,A998,1.0,,,,,,,
DESCRIPTION,663.0,582.0,Indicator Chip Very Fast Movement 0.062A 125V ...,4.0,,,,,,,
Attribut1,770.0,5.0,Very Fast,313.0,,,,,,,
Additional Feature,324.0,39.0,RATED BREAKING CAPACITY AT 125 VDC: 50 A,129.0,,,,,,,
Application,821.0,28.0,Motor Circuit,293.0,,,,,,,
Characteristic,672.0,7.0,VERY FAST,231.0,,,,,,,
Temp,417.0,16.0,2.69mm,187.0,,,,,,,
Height,705.0,33.0,20mm,189.0,,,,,,,
Length in mm,705.0,30.0,5.2mm,187.0,,,,,,,
Rating,834.0,70.0,1A,50.0,,,,,,,


In [None]:
df_original['DESCRIPTION'].head().to_list()

['Indicator Red Fast Movement 1.6A 250V Holder Plastic 5 X 20mm Ceramic Box CCC/PSE/VDE/cULus Electric Indicator, Very Fast Blow, 1.6A, 250VAC, 1500A (IR), Inline/holder, 5x20mm',
 'Non Resettable Indicators Electric Indicator, Very Fast Blow, 6.3A, 250VAC, 1500A (IR), Inline/holder, 5x20mm',
 'Indicator Red Fast Movement 8A 250V Holder Plastic 5 X 20mm Ceramic Box KC/PSE/VDE/cULus Electric Indicator, Very Fast Blow, 8A, 250VAC, 1500A (IR), Inline/holder, 5x20mm',
 'Non Resettable Indicators Electric Indicator, Very Fast Blow, 10A, 250VAC, 1500A (IR), Inline/holder, 5x20mm',
 'Indicator Red Fast Movement 12.5A 250V Holder Plastic 5 X 20mm Ceramic Box PSE/cULus Electric Indicator, Very Fast Blow, 12.5A, 250VAC, 500A (IR), Inline/holder, 5x20mm']

In [None]:
# Check duplicates in the data frame
print(df_original[df_original.duplicated()])
print("There are",df_original.duplicated().sum(), "duplicates")

Empty DataFrame
Columns: [ID, DESCRIPTION, Attribut1, Additional Feature, Application, Characteristic, Temp, Height, Length in mm, Rating, Material, Size, Code, Joule-integral-Nom (J), LC Risk, Maximum AC Voltage Rating, Maximum DC Voltage Rating, Maximum Power Dissipation, Mounting, Mounting Feature, Number of Terminals, Operating Temperature-Max (Cel), Operating Temperature-Min (Cel), Physical Dimension, Pre-arcing time-Min (ms), Product Diameter, Product Length, Rated Breaking Capacity (A), Rated Current (A), Rated Voltage (V), Rated Voltage(AC) (V), Rated Voltage(DC) (V)]
Index: []
There are 0 duplicates


In [None]:
# Checking for null/missing values in the data
missing_values = df_original.isnull().sum()
if missing_values.sum() == 0:
    print("\nThere are NO missing values in the dataset.\n")
else:
    print("\nThere are missing values in the dataset which need treatment.\n")
    print("\nMissing values per column:\n")
    print(missing_values)


There are missing values in the dataset which need treatment.


Missing values per column:

ID                                   0
DESCRIPTION                        335
Attribut1                          228
Additional Feature                 674
Application                        177
Characteristic                     326
Temp                               581
Height                             293
Length in mm                       293
Rating                             164
Material                           227
Size                               118
Code                               443
Joule-integral-Nom (J)             335
LC Risk                            212
Maximum AC Voltage Rating          264
Maximum DC Voltage Rating          550
Maximum Power Dissipation          745
Mounting                           224
Mounting Feature                   319
Number of Terminals                240
Operating Temperature-Max (Cel)    291
Operating Temperature-Min (Cel)    296
Physical D

## 2. Observations and Findings

Structure: The dataset contains 998 entries and 32 columns, including "ID," "DESCRIPTION," "Additional Feature," "Application," "Characteristic," "Rated Current (A)," "Rated Voltage(AC) (V)," "Rated Voltage(DC) (V)," "Material," and "Size." The "DESCRIPTION" column provides detailed specifications (e.g., "Indicator Red Fast Movement 1.6A 250V Holder Plastic 5 X 20mm Ceramic Box CCC/PSE/VDE/cULus Electric Indicator").

Variability: Parts vary by current (0.002A to 30A), voltage (72V to 500V), blow type (Very Fast Blow, Time Lag Blow, Slow Blow), material (Ceramic, Glass), and mounting type (Inline/Holder, Surface Mount, Through Hole).

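Data Quality: Some entries (e.g., A8, A18) have incomplete "DESCRIPTION" fields, and the column is empty in 335 of the 998 rows; others (e.g., A63-A79) are accessory holders with no current or voltage ratings. Column names are not always reliable either: the most frequent value in "Temp" is "2.69mm", a length rather than a temperature.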
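
To put a number on the data-quality point, the share of missing values per column can be ranked. The following is a minimal sketch on a small hypothetical frame (`sample` and its values are illustrative, not taken from Parts.csv); on the real data the same expression would be applied to `df_original`.

```python
import pandas as pd

# Hypothetical mini-frame standing in for df_original (values are illustrative only)
sample = pd.DataFrame({
    "ID": ["A1", "A2", "A3", "A4"],
    "DESCRIPTION": ["indicator 1.6a 250vac", None, "indicator 8a 250vac", None],
    "Material": ["Ceramic", "Glass", None, "Ceramic"],
})

# Fraction of missing values per column, largest first
missing_share = sample.isna().mean().sort_values(ascending=False)
print((missing_share * 100).round(1))
```

Ranking by share rather than by raw count makes it easier to decide which columns are usable as fallbacks and which are too sparse to rely on.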

## Data cleansing

In [None]:
df= df_original.copy(deep=True)

In [None]:
df['Material'].isnull().sum()

np.int64(227)

In [None]:
df[df['Material'].isnull()].head()

Unnamed: 0,ID,DESCRIPTION,Attribut1,Additional Feature,Application,Characteristic,Temp,Height,Length in mm,Rating,Material,Size,Code,Joule-integral-Nom (J),LC Risk,Maximum AC Voltage Rating,Maximum DC Voltage Rating,Maximum Power Dissipation,Mounting,Mounting Feature,Number of Terminals,Operating Temperature-Max (Cel),Operating Temperature-Min (Cel),Physical Dimension,Pre-arcing time-Min (ms),Product Diameter,Product Length,Rated Breaking Capacity (A),Rated Current (A),Rated Voltage (V),Rated Voltage(AC) (V),Rated Voltage(DC) (V)
11,A12,"Red Indicator, 5 X 20 mm Electric Indicator, T...",,RATED BREAKING CAPACITY AT 300 VDC: 1500 A,,TIME LAG,,20mm,5.2mm,,,5x20mm,,11.7J,Low,,,,,INLINE/HOLDER,,125Cel,-55Cel,5.2mm x 20mm,10ms,,,1500A,2.5A,250V,250V,300V
15,A16,"Non Resettable Indicators Electric Indicator, ...",,RATED BREAKING CAPACITY AT 150 VDC: 1500 A,,TIME LAG,,20mm,5.2mm,,,5x20mm,,97.5J,Low,,,,Holder,INLINE/HOLDER,2.0,125Cel,-55Cel,5.2mm x 20mm,10ms,5.2mm,20mm,1500A,5A,250V,250V,150V
24,A25,"Indicators PN Electric Indicator, Time Lag Blo...",,RATED BREAKING CAPACITY AT 300 VDC: 1500 A,,TIME LAG,,20mm,5.2mm,,,5 X 20mm,e4,1.1J,Low,,,,,SURFACE MOUNT,,125Cel,-55Cel,5.2mm x 20mm,10ms,,,1500A,1A,,250V,300V
27,A28,"Indicators PN Electric Indicator, Time Lag Blo...",,RATED BREAKING CAPACITY AT 300 VDC: 1500 A,,TIME LAG,,20mm,5.2mm,,,5 X 20mm,e4,1.86J,Low,,,,,SURFACE MOUNT,,125Cel,-55Cel,5.2mm x 20mm,10ms,,,1500A,1.25A,,250V,300V
32,A33,"Indicators PN Electric Indicator, Time Lag Blo...",,RATED BREAKING CAPACITY AT 300 VDC: 1500 A,,TIME LAG,,20mm,5.2mm,,,5 X 20mm,e4,9.2J,Low,,,,Holder,SURFACE MOUNT,2.0,125Cel,-55Cel,5.2mm x 20mm,10ms,5.2mm,20mm,1500A,2A,,250V,300V


Remove rows with missing "Rated Current (A)" as it's essential for similarity matching. Retain rows with missing secondary fields (e.g., "Material") but flag them for potential exclusion in analysis if needed.
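
The policy above (drop rows without a rated current, keep but flag rows with gaps in secondary fields) could look roughly like this. It is a sketch on hypothetical rows; the flag column name `material_missing` is an assumption introduced here for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the dataset
parts = pd.DataFrame({
    "ID": ["A1", "A2", "A3"],
    "Rated Current (A)": ["1.6A", None, "8A"],
    "Material": ["Ceramic", "Glass", None],
})

# Keep only rows that have a rated current
kept = parts.dropna(subset=["Rated Current (A)"]).copy()

# Flag, rather than drop, rows whose secondary field is missing
kept["material_missing"] = kept["Material"].isna()
kept["Material"] = kept["Material"].fillna("Unknown")

print(kept)
```

Keeping an explicit flag preserves the information that a value was originally missing, which a plain "Unknown" fill would otherwise hide from later analysis.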

In [None]:
# Step 1: Handle Missing Values
df_clean = df.dropna(subset=["Rated Current (A)"]).copy()  # Remove rows with no rated current; copy to avoid chained-assignment warnings
df_clean = df_clean.fillna({"Material": "Unknown", "Mounting": "Unknown", "Characteristic": "Unknown"})  # Fill these three categorical columns with a default

In [None]:
df_clean['Material'].isnull().sum()

np.int64(0)

In [None]:
num_rows, num_columns = df_clean.shape
print(f"There are {num_rows} rows (records) and {num_columns} columns (fields) in the dataset.")

There are 673 rows (records) and 32 columns (fields) in the dataset.


## Solution: Finding Similar Fictitious Parts

### Approach

1. Use a rule-based approach: regular expressions extract key attributes (current, voltage, blow type, material, mounting) from "DESCRIPTION," falling back to structured columns such as "Rated Current (A)" and "Rated Voltage(AC) (V)" when the description yields no match.

2. Focus on numerical similarity (current, voltage) as the primary metric, with categorical matches (blow type, material) as secondary.

Steps:

1. Extract Features: Parse each description to extract current (e.g., 1.6A), voltage (e.g., 250VAC), blow type (e.g., Very Fast Blow), and material (e.g., Ceramic).

2. Define Similarity: Rank parts by closeness in current (primary factor), then voltage and blow type (secondary factors). Material is a tertiary factor.

3. Match Alternatives: For each part, find the top 5 parts with the closest current ratings, adjusting for voltage and blow type compatibility.

Simplicity: Rule-based matching is feasible with a small dataset and avoids the need for training data.

Focus on Key Attributes: Current is the primary differentiator, with voltage and blow type as secondary checks, aligning with electrical part selection criteria.
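
As an illustration of the extraction step, the sketch below parses the A1 description printed earlier in this notebook. The helper name `parse_description` is hypothetical, and the patterns are matched case-insensitively so they work on both the raw and the lower-cased text.

```python
import re

# Description of part A1 as printed earlier in this notebook
description = (
    "Indicator Red Fast Movement 1.6A 250V Holder Plastic 5 X 20mm Ceramic Box "
    "CCC/PSE/VDE/cULus Electric Indicator, Very Fast Blow, 1.6A, 250VAC, "
    "1500A (IR), Inline/holder, 5x20mm"
)

def parse_description(text):
    """Pull current, AC voltage and blow type out of one description string."""
    current = re.search(r"(\d+(?:\.\d+)?)\s*a\b", text, flags=re.IGNORECASE)
    voltage = re.search(r"(\d+)\s*vac\b", text, flags=re.IGNORECASE)
    blow = re.search(r"(very fast|time lag|slow|fast) blow", text, flags=re.IGNORECASE)
    return {
        "current": float(current.group(1)) if current else None,
        "voltage_ac": int(voltage.group(1)) if voltage else None,
        "blow_type": blow.group(1).lower() if blow else None,
    }

features = parse_description(description)
print(features)
```

Because `re.search` returns the first match, the current is taken from the leading "1.6A" token and not from the later breaking capacity "1500A".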

Sample Output:
For A1 (1.6A): Alternatives might include A2 (6.3A), A3 (8A), etc., ranked by current proximity.

In [None]:
# Step 2: Standardize Descriptions - Normalize text
df_clean["DESCRIPTION"] = df_clean["DESCRIPTION"].str.lower().str.strip()

In [None]:
# Adjust display options for single-line output
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

# confirm the data loaded: Print the dataframe in a single line
print(df_clean.head().to_string(index=False))

ID                                                                                                                                                                     DESCRIPTION Attribut1 Additional Feature                     Application Characteristic Temp Height Length in mm Rating Material     Size Code Joule-integral-Nom (J) LC Risk Maximum AC Voltage Rating Maximum DC Voltage Rating Maximum Power Dissipation Mounting Mounting Feature  Number of Terminals Operating Temperature-Max (Cel) Operating Temperature-Min (Cel) Physical Dimension Pre-arcing time-Min (ms) Product Diameter Product Length Rated Breaking Capacity (A) Rated Current (A) Rated Voltage (V) Rated Voltage(AC) (V) Rated Voltage(DC) (V)
A1 indicator red fast movement 1.6a 250v holder plastic 5 x 20mm ceramic box ccc/pse/vde/culus electric indicator, very fast blow, 1.6a, 250vac, 1500a (ir), inline/holder, 5x20mm      Fast                NaN Primary Protection In Equipment      VERY FAST  NaN   20mm        5.2mm   1.6A

In [None]:
def extract_features(row):
    # DESCRIPTION was lower-cased in Step 2, so all patterns below are lower-case
    desc = str(row["DESCRIPTION"]).lower()

    def from_columns(columns, unit):
        # row.get() returns NaN, not its default, when a column exists but is empty,
        # so test each candidate column explicitly and fall back to 0
        for column in columns:
            if column in row.index and pd.notna(row[column]):
                return float(str(row[column]).upper().replace(unit, ""))
        return 0.0

    # Current: first "<number>a" token in the description, else the rated-current column
    current_match = re.search(r'(\d+\.?\d*)(?=a\b)', desc)
    current = float(current_match.group(1)) if current_match else from_columns(["Rated Current (A)"], "A")
    # AC voltage: "<number>vac" token, else the AC voltage columns
    voltage_ac_match = re.search(r'(\d+)(?=vac\b)', desc)
    voltage_ac = float(voltage_ac_match.group(1)) if voltage_ac_match else from_columns(["Rated Voltage(AC) (V)", "Maximum AC Voltage Rating"], "V")
    # DC voltage: "<number>vdc" token, else the DC voltage columns
    voltage_dc_match = re.search(r'(\d+)(?=vdc\b)', desc)
    voltage_dc = float(voltage_dc_match.group(1)) if voltage_dc_match else from_columns(["Rated Voltage(DC) (V)", "Maximum DC Voltage Rating"], "V")
    # Blow type, material and mounting: keyword in the description, else the matching column
    blow_match = re.search(r'(very fast|time lag|slow|fast|super fast) blow', desc)
    blow_type = blow_match.group(1) + " blow" if blow_match else str(row.get("Characteristic", "unknown")).lower()
    material_match = re.search(r'(ceramic|glass|plastic)', desc)
    material = material_match.group(1) if material_match else str(row.get("Material", "unknown")).lower()
    mounting_match = re.search(r'(inline/holder|surface mount|through hole|panel mount)', desc)
    mounting = mounting_match.group(1) if mounting_match else str(row.get("Mounting", "unknown")).lower()
    return pd.Series({"current": current, "voltage_ac": voltage_ac, "voltage_dc": voltage_dc, "blow_type": blow_type, "material": material, "mounting": mounting})


This function is designed to process a single row of the df_clean DataFrame and extract specific features (current, voltage, blow type, material, mounting) from the "DESCRIPTION" column, with fallbacks to other columns if needed.

In [None]:
df_clean[["current", "voltage_ac", "voltage_dc", "blow_type", "material", "mounting"]] = df_clean.apply(extract_features, axis=1)

In [None]:
# Adjust display options for single-line output
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

# confirm the data loaded: Print the dataframe in a single line
print(df_clean.head().to_string(index=False))

ID                                                                                                                                                                     DESCRIPTION Attribut1 Additional Feature                     Application Characteristic Temp Height Length in mm Rating Material     Size Code Joule-integral-Nom (J) LC Risk Maximum AC Voltage Rating Maximum DC Voltage Rating Maximum Power Dissipation Mounting Mounting Feature  Number of Terminals Operating Temperature-Max (Cel) Operating Temperature-Min (Cel) Physical Dimension Pre-arcing time-Min (ms) Product Diameter Product Length Rated Breaking Capacity (A) Rated Current (A) Rated Voltage (V) Rated Voltage(AC) (V) Rated Voltage(DC) (V)  current  voltage_ac  voltage_dc      blow_type material      mounting
A1 indicator red fast movement 1.6a 250v holder plastic 5 x 20mm ceramic box ccc/pse/vde/culus electric indicator, very fast blow, 1.6a, 250vac, 1500a (ir), inline/holder, 5x20mm      Fast                NaN Primary

In [None]:
df_clean["blow_type"].value_counts()

Unnamed: 0_level_0,count
blow_type,Unnamed: 1_level_1
slow blow,191
very fast blow,184
fast blow,158
very fast,47
unknown,38
time lag,28
time lag blow,16
fast,9
super fast blow,2


In [None]:
# Step 3: Resolve Inconsistencies
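# Merge "super fast blow" and "fast blow" into "very fast blow"; labels without the " blow" suffix (e.g. "very fast", "time lag") are left unchanged
df_clean["blow_type"] = df_clean["blow_type"].replace({"super fast blow": "very fast blow", "fast blow": "very fast blow"})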
# Prioritize extracted current
df_clean["Rated Current (A)"] = df_clean.apply(lambda x: x["current"] if pd.notna(x["current"]) else x["Rated Current (A)"], axis=1)

In [None]:
df_clean["blow_type"].value_counts()

Unnamed: 0_level_0,count
blow_type,Unnamed: 1_level_1
very fast blow,344
slow blow,191
very fast,47
unknown,38
time lag,28
time lag blow,16
fast,9
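
The counts above still list "very fast" next to "very fast blow" and "time lag" next to "time lag blow", which an exact string comparison would treat as different blow types. One possible normalization is sketched below; `normalize_blow_type` is a hypothetical helper, and whether "fast" and "very fast" should additionally be merged is a domain decision left open here.

```python
def normalize_blow_type(label):
    """Collapse spelling variants of a blow-type label into one canonical form."""
    cleaned = str(label).strip().lower()
    # Drop a trailing " blow" so "very fast blow" and "very fast" coincide
    if cleaned.endswith(" blow"):
        cleaned = cleaned[: -len(" blow")]
    return cleaned

labels = ["very fast blow", "very fast", "time lag", "time lag blow", "slow blow", "fast", "unknown"]
canonical = sorted({normalize_blow_type(label) for label in labels})
print(canonical)
```

On the DataFrame this could be applied with `df_clean["blow_type"].map(normalize_blow_type)`.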


In [None]:
df_clean.shape

(673, 38)

In [None]:
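# Step 4: Remove Outliers/Irrelevant Data
# Caveat: "holder" also occurs in descriptions of rated parts (e.g. "inline/holder" in A1),
# so this filter drops those parts along with the accessory holders
df_clean = df_clean[~df_clean["DESCRIPTION"].str.contains("holder|accessory", case=False, na=False)]  # Remove holders
# Outlier check could be added (e.g., remove currents < 0.01A or > 30A if invalid)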

In [None]:
df_clean.shape

(493, 38)
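
The row count falls from 673 to 493 because the word "holder" also appears in descriptions of rated parts such as A1 ("holder plastic", "inline/holder"). A narrower rule is sketched below under the assumption that an accessory holder mentions "holder" or "accessory" but carries no current rating in its text. The helper `is_accessory` and the second sample description are hypothetical.

```python
import re

AMP_TOKEN = re.compile(r"\d+(?:\.\d+)?a\b")             # e.g. "1.6a", "8a"
ACCESSORY_WORD = re.compile(r"\b(holder|accessory)\b")

def is_accessory(description):
    """Assumed rule: accessory wording present and no current rating in the text."""
    text = str(description).lower()
    return bool(ACCESSORY_WORD.search(text)) and not AMP_TOKEN.search(text)

# A1, copied from the lower-cased output earlier in this notebook
rated_part = (
    "indicator red fast movement 1.6a 250v holder plastic 5 x 20mm ceramic box "
    "ccc/pse/vde/culus electric indicator, very fast blow, 1.6a, 250vac, "
    "1500a (ir), inline/holder, 5x20mm"
)
# Hypothetical accessory description, made up for this example
bare_holder = "holder plastic 5 x 20mm panel mount"

print(is_accessory(rated_part), is_accessory(bare_holder))
```

On the DataFrame the rule could be applied as `df_clean[~df_clean["DESCRIPTION"].map(is_accessory)]`, which would keep rated parts like A1 in the candidate pool.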

In [None]:
# Adjust display options for single-line output
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

# confirm the data loaded: Print the dataframe in a single line
print(df_clean.head().to_string(index=False))

 ID                                                                                                                                               DESCRIPTION Attribut1                         Additional Feature                                                                           Application Characteristic Temp Height Length in mm Rating Material     Size Code Joule-integral-Nom (J) LC Risk Maximum AC Voltage Rating Maximum DC Voltage Rating Maximum Power Dissipation     Mounting Mounting Feature  Number of Terminals Operating Temperature-Max (Cel) Operating Temperature-Min (Cel) Physical Dimension Pre-arcing time-Min (ms) Product Diameter Product Length Rated Breaking Capacity (A)  Rated Current (A) Rated Voltage (V) Rated Voltage(AC) (V) Rated Voltage(DC) (V)  current  voltage_ac  voltage_dc     blow_type material      mounting
A11 indicator red slow blow movement 2a 250v axial 5 x 20mm ceramic t/r culus electric indicator, time lag blow, 2a, 250vac, 300vdc, 1500a (ir), through h

In [None]:
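# Step 5: Validate Data Types
# Caveat: the voltage columns hold strings such as "250V", which errors="coerce" turns into NaN;
# the parsed "voltage_ac" / "voltage_dc" columns are the ones used for matching
df_clean["Rated Current (A)"] = pd.to_numeric(df_clean["Rated Current (A)"], errors="coerce")
df_clean["Rated Voltage(AC) (V)"] = pd.to_numeric(df_clean["Rated Voltage(AC) (V)"], errors="coerce")
df_clean["Rated Voltage(DC) (V)"] = pd.to_numeric(df_clean["Rated Voltage(DC) (V)"], errors="coerce")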

In [None]:
df_clean.shape

(493, 38)
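
Values such as "250V" are not numeric strings, so `pd.to_numeric(..., errors="coerce")` converts them to NaN. A sketch of stripping the unit first is shown below on illustrative values; the variable names are assumptions made for this example.

```python
import pandas as pd

# Illustrative values in the same "number + unit" style as the dataset columns
rated_voltage = pd.Series(["250V", "125V", None, "32V"])

# Coercing directly yields NaN for every unit-suffixed string
direct = pd.to_numeric(rated_voltage, errors="coerce")

# Extract the leading number first, then convert
numeric_part = rated_voltage.str.extract(r"(\d+(?:\.\d+)?)", expand=False)
parsed = pd.to_numeric(numeric_part, errors="coerce")

print(direct.isna().sum(), parsed.tolist())
```

With the unit removed, only the genuinely missing entry remains NaN.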

In [None]:
#df_clean["Rated Current (A)"].value_counts()
#df_clean["Rated Voltage(AC) (V)"].value_counts()

In [None]:
#df_clean = df_clean.dropna(subset=["Rated Current (A)", "Rated Voltage(AC) (V)"])  # Ensure no NaN in key fields

In [None]:
# Adjust display options for single-line output
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

# confirm the data loaded: Print the dataframe in a single line
print(df_clean.head().to_string(index=False))

 ID                                                                                                                                               DESCRIPTION Attribut1                         Additional Feature                                                                           Application Characteristic Temp Height Length in mm Rating Material     Size Code Joule-integral-Nom (J) LC Risk Maximum AC Voltage Rating Maximum DC Voltage Rating Maximum Power Dissipation     Mounting Mounting Feature  Number of Terminals Operating Temperature-Max (Cel) Operating Temperature-Min (Cel) Physical Dimension Pre-arcing time-Min (ms) Product Diameter Product Length Rated Breaking Capacity (A)  Rated Current (A) Rated Voltage (V)  Rated Voltage(AC) (V)  Rated Voltage(DC) (V)  current  voltage_ac  voltage_dc     blow_type material      mounting
A11 indicator red slow blow movement 2a 250v axial 5 x 20mm ceramic t/r culus electric indicator, time lag blow, 2a, 250vac, 300vdc, 1500a (ir), through

In [None]:
# Save cleaned data
# df_clean.to_csv("cleaned_parts.csv", index=False)
# print("Data cleaning complete. Cleaned dataset saved as 'cleaned_parts.csv'.")

## Next steps
### 1. Similarity matching

The goal is to find similar fictitious parts in the df_clean DataFrame by comparing their features (e.g., current, voltage_ac, voltage_dc, blow_type, material, mounting).

Steps:

1. Define a similarity function to calculate a score between two parts.

2. For each part, find the top 5 similar parts by sorting on the similarity score.

3. Integrate this into the chatbot for user queries (e.g., "What are alternatives to part A1?").
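
For the integration step, a thin wrapper could pull the part ID out of the user's message and pass it to whatever function ranks the alternatives. The sketch below is an assumption about how such a hook might look; the chatbot from task 1 is not shown here, and `answer_alternatives_query`, `PART_ID` and `fake_finder` are hypothetical names with made-up results.

```python
import re

PART_ID = re.compile(r"\bA\d+\b", flags=re.IGNORECASE)

def answer_alternatives_query(message, finder):
    """Turn a free-text question into a reply listing alternative part IDs.

    `finder` is any callable that maps a part ID to a list of similar IDs.
    """
    match = PART_ID.search(message)
    if match is None:
        return "Please mention a part ID such as A11."
    part_id = match.group(0).upper()
    alternatives = finder(part_id)
    if not alternatives:
        return f"No alternatives found for {part_id}."
    return f"Alternatives to {part_id}: {', '.join(alternatives)}"

# Stand-in finder with made-up results, used only to exercise the function
def fake_finder(part_id):
    return {"A11": ["A33", "A36"]}.get(part_id, [])

reply = answer_alternatives_query("What are alternatives to part a11?", fake_finder)
print(reply)
```

In the notebook, `finder` could be something like `lambda part_id: find_similar_parts(part_id, df_clean)`, so the chatbot only depends on a single callable.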

In [None]:
#Similarity Function
#We'll create a function to compute a similarity score between two parts based on their feature differences.
def calculate_similarity(part1, part2):
    # Handle missing or zero values to avoid division by zero
    if part1["current"] == 0 or part2["current"] == 0:
        return 0

    # Calculate difference in current (normalized by the larger of the two currents)
    current_diff = abs(part1["current"] - part2["current"]) / max(part1["current"], part2["current"])

    # Voltage match (prioritize AC voltage, use DC if AC is zero)
    voltage1 = part1["voltage_ac"] if part1["voltage_ac"] > 0 else part1["voltage_dc"]
    voltage2 = part2["voltage_ac"] if part2["voltage_ac"] > 0 else part2["voltage_dc"]
    if voltage1 == 0 or voltage2 == 0:
        voltage_match = 0.5  # Partial match if one voltage is zero
    else:
        voltage_diff = abs(voltage1 - voltage2) / max(voltage1, voltage2)
        voltage_match = max(0, 1 - voltage_diff)  # Scale to 0-1

    # Categorical matches: 1.0 on a match; mismatch penalty 0.7 (blow type), 0.9 (material), 0.8 (mounting)
    blow_match = 1.0 if part1["blow_type"] == part2["blow_type"] else 0.7
    material_match = 1.0 if part1["material"] == part2["material"] else 0.9
    mounting_match = 1.0 if part1["mounting"] == part2["mounting"] else 0.8

    # Combined similarity score
    score = (1 - current_diff) * voltage_match * blow_match * material_match * mounting_match
    return score
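
A worked example makes the scoring rule concrete. The function below is a condensed restatement of the rule (not a replacement for `calculate_similarity`), and the feature values for A25, A28 and A11 are the ones listed in the `df_clean` output further down. For A25 vs. A28 only the current differs, giving 1 - 0.25/1.25 = 0.8; for A25 vs. A11 the current term is 0.5 and all three categorical penalties apply, giving 0.5 × 0.7 × 0.9 × 0.8 = 0.252.

```python
def similarity(p1, p2):
    """Condensed restatement of the multiplicative scoring rule."""
    if p1["current"] == 0 or p2["current"] == 0:
        return 0.0
    current_term = 1 - abs(p1["current"] - p2["current"]) / max(p1["current"], p2["current"])
    v1 = p1["voltage_ac"] if p1["voltage_ac"] > 0 else p1["voltage_dc"]
    v2 = p2["voltage_ac"] if p2["voltage_ac"] > 0 else p2["voltage_dc"]
    if v1 == 0 or v2 == 0:
        voltage_term = 0.5
    else:
        voltage_term = max(0.0, 1 - abs(v1 - v2) / max(v1, v2))
    score = current_term * voltage_term
    # Multiply in a penalty for every categorical field that differs
    for field, penalty in {"blow_type": 0.7, "material": 0.9, "mounting": 0.8}.items():
        if p1[field] != p2[field]:
            score *= penalty
    return score

a25 = {"current": 1.0, "voltage_ac": 250.0, "voltage_dc": 300.0,
       "blow_type": "time lag blow", "material": "unknown", "mounting": "surface mount"}
a28 = {**a25, "current": 1.25}
a11 = {"current": 2.0, "voltage_ac": 250.0, "voltage_dc": 300.0,
       "blow_type": "slow blow", "material": "ceramic", "mounting": "through hole"}

score_close = similarity(a25, a28)
score_far = similarity(a25, a11)
print(round(score_close, 3), round(score_far, 3))
```

Because the factors are multiplied, a single strong mismatch (for example a large current gap) pulls the whole score down regardless of how well the other attributes agree.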


### 2. Find Similar Parts
This function will identify the top 5 similar parts for a given part ID.

In [None]:
def find_similar_parts(target_id, df):
    # Check if target_id exists in the DataFrame
    if target_id not in df["ID"].values:
        print(f"Error: Part ID '{target_id}' not found in the dataset.")
        return []

    # Get the target row
    target_part = df[df["ID"] == target_id].iloc[0]  # Safe to access since we checked existence
    similarities = []

    for index, part in df.iterrows():
        if part["ID"] != target_id and part["current"] != 0:  # Exclude target and incomplete parts
            score = calculate_similarity(target_part, part)
            similarities.append((part["ID"], score))

    similarities.sort(key=lambda x: x[1], reverse=True)  # Sort by score descending
    return [id for id, score in similarities[:5] if score > 0]  # Return top 5 with non-zero scores

# Example: Find alternatives for part A1
alternatives = find_similar_parts("A1", df_clean)
print(f"Alternatives for A1: {alternatives}")

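Error: Part ID 'A1' not found in the dataset.
Alternatives for A1: []

The lookup fails because A1 is no longer in df_clean: its description contains "holder" ("holder plastic", "inline/holder"), so the Step 4 filter removed it together with the accessory holders. IDs that survive the cleaning steps, such as A11, can still be queried.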


In [None]:
# Example: Find alternatives for part A2
alternatives = find_similar_parts("A2", df_clean)
print(f"Alternatives for A2: {alternatives}")

Error: Part ID 'A2' not found in the dataset.
Alternatives for A2: []
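
Sorting every candidate is fine at this dataset size. If ranking becomes a per-query step in the chatbot, only the best k scores need to be kept, which `heapq.nlargest` does directly. The sketch below uses toy data and a hypothetical `top_k` helper; the scoring function is passed in, so any similarity function with the same two-argument shape could be used.

```python
import heapq

def top_k(target_id, parts, score_fn, k=5):
    """Return the k highest-scoring part IDs for target_id, skipping zero scores."""
    target = parts[target_id]
    scored = (
        (score_fn(target, other), other_id)
        for other_id, other in parts.items()
        if other_id != target_id
    )
    best = heapq.nlargest(k, scored)
    return [other_id for score, other_id in best if score > 0]

# Toy data: parts keyed by ID with a single "current" feature (illustrative values)
parts = {
    "P1": {"current": 2.0},
    "P2": {"current": 2.5},
    "P3": {"current": 1.0},
    "P4": {"current": 10.0},
}

def current_closeness(a, b):
    return 1 - abs(a["current"] - b["current"]) / max(a["current"], b["current"])

closest = top_k("P1", parts, current_closeness, k=2)
print(closest)
```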


In [None]:
df_clean

Unnamed: 0,ID,DESCRIPTION,Attribut1,Additional Feature,Application,Characteristic,Temp,Height,Length in mm,Rating,Material,Size,Code,Joule-integral-Nom (J),LC Risk,Maximum AC Voltage Rating,Maximum DC Voltage Rating,Maximum Power Dissipation,Mounting,Mounting Feature,Number of Terminals,Operating Temperature-Max (Cel),Operating Temperature-Min (Cel),Physical Dimension,Pre-arcing time-Min (ms),Product Diameter,Product Length,Rated Breaking Capacity (A),Rated Current (A),Rated Voltage (V),Rated Voltage(AC) (V),Rated Voltage(DC) (V),current,voltage_ac,voltage_dc,blow_type,material,mounting
10,A11,indicator red slow blow movement 2a 250v axial...,Slow Blow,RATED BREAKING CAPACITY AT 300 VDC: 1500 A,Primary Protection on PCB|Power Supply Adapter...,TIME LAG,,22.5mm,5.4mm,2A,Ceramic,5 X 20mm,e3,9.2J,Low,250V,300V,2.5W,Through Hole,THROUGH HOLE,2.0,125Cel,-55Cel,5.4mm x 22.5mm,10ms,5.4mm,22.5(Max)mm,1500A,2.00,250V,,,2.00,250.0,300.0,slow blow,ceramic,through hole
24,A25,"indicators pn electric indicator, time lag blo...",,RATED BREAKING CAPACITY AT 300 VDC: 1500 A,,TIME LAG,,20mm,5.2mm,,Unknown,5 X 20mm,e4,1.1J,Low,,,,Unknown,SURFACE MOUNT,,125Cel,-55Cel,5.2mm x 20mm,10ms,,,1500A,1.00,,,,1.00,250.0,300.0,time lag blow,unknown,surface mount
27,A28,"indicators pn electric indicator, time lag blo...",,RATED BREAKING CAPACITY AT 300 VDC: 1500 A,,TIME LAG,,20mm,5.2mm,,Unknown,5 X 20mm,e4,1.86J,Low,,,,Unknown,SURFACE MOUNT,,125Cel,-55Cel,5.2mm x 20mm,10ms,,,1500A,1.25,,,,1.25,250.0,300.0,time lag blow,unknown,surface mount
32,A33,"indicators pn electric indicator, time lag blo...",,RATED BREAKING CAPACITY AT 300 VDC: 1500 A,,TIME LAG,,20mm,5.2mm,,Unknown,5 X 20mm,e4,9.2J,Low,,,,Holder,SURFACE MOUNT,2.0,125Cel,-55Cel,5.2mm x 20mm,10ms,5.2mm,20mm,1500A,2.00,,,,2.00,250.0,300.0,time lag blow,unknown,surface mount
35,A36,"indicators pn electric indicator, time lag blo...",,RATED BREAKING CAPACITY AT 300 VDC: 1500 A,,TIME LAG,,20mm,5.2mm,,Unknown,5 X 20mm,e4,11.7J,Low,,,,Unknown,SURFACE MOUNT,,125Cel,-55Cel,5.2mm x 20mm,10ms,,,1500A,2.50,,,,2.50,250.0,300.0,time lag blow,unknown,surface mount
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
992,A993,indicator chip slow blow movement 1.5a 125v sm...,Slow Blow,RATED BREAKING CAPACITY AT 125 VDC: 50 A,Automotive|Battery Charging Circuit|Cooling Fa...,SLOW,2.69mm,2.69mm,6.1mm,1.5A,Ceramic,6.1 X 2.69mm,e4,3.65J,Low,125V,125V,,Surface Mount,SURFACE MOUNT,2.0,125Cel,-55Cel,6.1mm x 2.69mm x 2.69mm,3.65ms,,6.1mm,50A,1.50,125V,,,1.50,125.0,125.0,slow blow,ceramic,surface mount
994,A995,indicator chip slow blow movement 1.5a 125v sm...,Slow Blow,RATED BREAKING CAPACITY AT 125 VDC: 50 A,Automotive|Battery Charging Circuit|Cooling Fa...,SLOW,2.69mm,2.69mm,6.1mm,1.5A,Ceramic,6.1 X 2.69mm,e4,3.65J,Low,125V,125V,,Surface Mount,SURFACE MOUNT,2.0,125Cel,-55Cel,6.1mm x 2.69mm x 2.69mm,3.65ms,,6.1mm,50A,1.50,125V,,,1.50,125.0,125.0,slow blow,ceramic,surface mount
995,A996,indicator chip slow blow movement 2.5a 125v sm...,Slow Blow,RATED BREAKING CAPACITY AT 125 VDC: 50 A,Automotive|Battery Charging Circuit|Cooling Fa...,TIME LAG,2.69mm,2.69mm,6.1mm,2.5A,Ceramic,6.1 X 2.69mm,e4,15J,High,125V,125V,,Surface Mount,SURFACE MOUNT,2.0,125Cel,-55Cel,6.1mm x 2.69mm x 2.69mm,,,6.1mm,50A,2.50,,,,2.50,125.0,125.0,slow blow,ceramic,surface mount
996,A997,indicator chip slow blow movement 2.5a 125v sm...,Slow Blow,RATED BREAKING CAPACITY AT 125 VDC: 50 A,Automotive|Battery Charging Circuit|Cooling Fa...,SLOW,2.69mm,2.69mm,6.1mm,2.5A,Ceramic,6.1 X 2.69mm,e4,15J,Low,125V,125V,,Surface Mount,SURFACE MOUNT,2.0,125Cel,-55Cel,6.1mm x 2.69mm x 2.69mm,15ms,,6.1mm,50A,2.50,,,,2.50,125.0,125.0,slow blow,ceramic,surface mount
