# Part Linking
## Problem
We have three data sources that describe the underlying system in aircraft maintenance:
1. Maintenance logs
2. Parts catalog
3. Troubleshooting guide

These data sources are largely separate. The maintenance logs are completely unstructured and consist only of free text. The parts catalog is a structured data source that contains information about the parts used in the aircraft. Many of the parts mentioned in the maintenance logs can be found in the parts catalog, so linking part mentions in the logs to catalog entries would be useful for further analysis.

## 1. Data Extraction
See `gpt.ipynb`.
The part mentions come from the problem extractions that GPT-4 produced from the maintenance logs.

## 2. Exploration

In [43]:
import pandas as pd

# load in problem extractions and parts catalog
problem_extractions = pd.read_csv('problem_extractions_chatgpt_4o.csv')
action_extractions = pd.read_csv('action_extractions_chatgpt_4o.csv')
parts_catalog = pd.read_csv('pdf-extracted/parts-catalog.csv')

In [44]:
# create a set with all the unique parts in the parts catalog
part_set = set()
for index, row in parts_catalog.iterrows():
    part_set.add(str(row['Type']))
    
print(part_set)

# calculate the % of parts exactly mentioned in the problem extractions (==) 
# that are in the parts catalog
count = 0
for index, row in problem_extractions.iterrows():
    if row['part'] in part_set:
        count += 1


print(f"Exact matches: {count*100/len(problem_extractions):.2f}%")
# from analyzing the process, it seems that most parts
# mentioned in the problem extractions that aren't in the parts catalog
# seem to be collections, like 'ENGINE' or 'INTAKE',
# abbreviations, like 'CYL'
# or still have location identifiers attached to them, like in 'ROCKER COVER GASKETS'


# idea: split up the parts in problem extractions by spaces
count = 0
for index, row in problem_extractions.iterrows():
    parts = str(row['part']).split(' ')
    for part in parts:
        if part in part_set:
            count += 1
            break
        
print(f"Split matches: {count*100/len(problem_extractions):.2f}%")
# much higher percentage of parts mentioned in the problem extractions
# however there is a lot of noise as well
# like part: INTAKE TUBE GASKET match: TUBE
# this is where a more structured approach could be useful


{'SHAFT ASSEMBLY', 'VALVE', 'SPARK PLUG', 'FITTING', 'TUBE', 'HOSE', 'IMPELLER', 'INSERT', 'ROD ASSEMBLY', 'SHROUD TUBE ASSEMBLY', 'OIL FILTER BASE ASSEMBLY', 'BODY ASSEMBLY', 'CONNECTOR', 'MAGNETO', 'KEY', 'TUBE ASSEMBLY', 'BUSHING', 'SPACER', 'SCREEN ASSEMBLY', 'CLIP', 'CYLINDER ASSEMBLY', 'SCREEN', 'PLUNGER ASSEMBLY', 'OIL FILTER', 'PLUG', 'COVER', 'CRANKSHAFT ASSEMBLY', 'PIN', 'CONNECTING ROD ASSEMBLY', 'FUEL PUMP', 'BAFFLE ASSEMBLY', 'SOCKET', 'SHIM', 'HOUSING ASSEMBLY', 'SEAT', 'GUIDE', 'CRANKCASE ASSEMBLY (Roller tappet engines)', 'PLATE', 'SPRING', 'ZIP STRAP', 'GAGE ASSEMBLY', 'GEAR ASSEMBLY', 'CRANKCASE ASSEMBLY KIT (Roller tappet engines)', 'THRUST BUTTON', 'CONNECTION', 'HOUSING', 'RETAINER', 'SHAFT', 'GEAR', 'IMPELLER KIT', 'CAMSHAFT ASSEMBLY', 'ADAPTER ASSEMBLY', 'VALVE ASSEMBLY', 'LINK', 'RING', 'BRACKET', 'NIPPLE', 'GASKET', 'STARTER ASSEMBLY', 'CARBURETOR', 'CAP', 'CRANKCASE ASSEMBLY', 'ELBOW', 'PIPE', 'CRANKCASE ASSEMBLY KIT', 'CLAMP', 'COTTER PIN', 'HARNESS ASSEMBLY'

## 3. Lexing & Parsing
Using the problem extractions, we want to identify a few things:
- The main part (like "GASKET" in "ROCKER COVER GASKET")
- which words are useful for linking

I chose to use a parser to extract the main part and the useful words. In hindsight this is probably overkill for the current use, and something simpler could replace it, but the current setup works fine.
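As an illustration of what a simpler replacement could look like, the sketch below encodes each token type as one letter and matches the resulting string against a regular expression. It is a hedged sketch and not the method used in this notebook; `TYPE_CODES`, `PATTERN`, and `main_part_by_regex` are hypothetical names, and the input is assumed to be a list of `(word, type)` tuples as produced by the lexer.

```python
import re

# One letter per token type (assumed to be CONTEXT, ATTRIBUTE, or PART).
TYPE_CODES = {"CONTEXT": "C", "ATTRIBUTE": "A", "PART": "P"}

# Leading context, then a run of attributes and parts that contains
# at least one PART, then trailing context.
PATTERN = re.compile(r"^(C*)([AP]*P[AP]*)(C*)$")

def main_part_by_regex(tokens):
    """Return the last PART word of the part span, or None if nothing matches."""
    signature = "".join(TYPE_CODES[kind] for _, kind in tokens)
    match = PATTERN.match(signature)
    if match is None:
        return None
    start, end = match.span(2)
    part_words = [word for word, kind in tokens[start:end] if kind == "PART"]
    return part_words[-1]

tokens = [("INTAKE", "PART"), ("TUBE", "PART"), ("GASKET", "PART")]
print(main_part_by_regex(tokens))                   # GASKET
print(main_part_by_regex([("ENGINE", "CONTEXT")]))  # None
```

Because one character stands for one token, the span of the second regex group can be used directly as a slice into the token list.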

### 3.1 Lexer
The lexer is responsible for tokenizing the input text. It splits the text into words and categorizes them into different types. 

In [45]:
# TOKENS: PART, ATTRIBUTE, CONTEXT
# all words are classified as either a part, an attribute, or context
# CONTEXT is a word that doesn't fit as either a PART or ATTRIBUTE

# The token classification is done by a simple lookup in a wordmap
# which is populated with all the parts in the parts catalog
# and other words that are known to be parts or attributes

TOKENS = ['PART', 'ATTRIBUTE', 'CONTEXT']
ATTRIBUTES = [
    "BLACK",
    "BOX",
    "ANGLE",
    "INTERCONNECT",
    "MOUNTING",
    "BLAST",
    "INNER",
    "SNAP",
    "PUSH",
    "BOTTOM",
    "WIRE",
    "BACK",
    "INDUCTION"
]
wordmap = {
    #"ENGINE": 'PART', # ENGINE would match with all parts, practically useless
    "INTAKE": 'PART', # a subset of parts, but we treat it as a part, even though we can't find it in the parts catalog
}

# make each part in the parts catalog a PART
for part in part_set:
    wordmap[part] = 'PART'
    wordmap[part + "S"] = 'PART'
    
    # if the part is an assembly, remove the last word
    splt = part.split(' ')
    if len(splt) > 1 and splt[-1] == 'ASSEMBLY':
        wordmap[' '.join(splt[:-1])] = 'PART'

# add all attributes as Tokens
for attribute in ATTRIBUTES:
    if attribute in wordmap:
        print("WARNING: attribute already in wordmap")
    wordmap[attribute] = 'ATTRIBUTE'

def lex(sentence):
    """
    Tokenize a sentence into PART, ATTRIBUTE, and CONTEXT tokens
    """
    tokens = []
    words = str(sentence).split(' ')
    for word in words:
        if word in wordmap:
            tokens.append((word, wordmap[word]))
        else:
            tokens.append((word, 'CONTEXT'))
    return tokens

print(lex("INTAKE TUBE GASKET"))
print(lex("BLACK BOX SEAL"))

[('INTAKE', 'PART'), ('TUBE', 'PART'), ('GASKET', 'PART')]
[('BLACK', 'ATTRIBUTE'), ('BOX', 'ATTRIBUTE'), ('SEAL', 'PART')]
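Because `lex` looks up one word at a time, multi-word keys in `wordmap` such as 'SPARK PLUG' are never matched as a single token. A possible workaround is a greedy longest-match lexer, sketched below under stated assumptions: `lex_longest_match`, `max_words`, and `toy_wordmap` are hypothetical and the word map is a small made-up stand-in for the real one.

```python
def lex_longest_match(sentence, wordmap, max_words=4):
    """Greedy lexer: at each position, try the longest phrase that is a wordmap key."""
    words = str(sentence).split()
    tokens = []
    i = 0
    while i < len(words):
        for size in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + size])
            if phrase in wordmap:
                tokens.append((phrase, wordmap[phrase]))
                i += size
                break
        else:
            # No known phrase starts here, so the single word becomes CONTEXT.
            tokens.append((words[i], "CONTEXT"))
            i += 1
    return tokens

# Hypothetical toy word map, for illustration only.
toy_wordmap = {
    "SPARK PLUG": "PART",
    "PLUG": "PART",
    "GASKET": "PART",
    "INNER": "ATTRIBUTE",
}
print(lex_longest_match("INNER SPARK PLUG GASKET", toy_wordmap))
# [('INNER', 'ATTRIBUTE'), ('SPARK PLUG', 'PART'), ('GASKET', 'PART')]
```

Any later step that works on single words would need to split such multi-word tokens again.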


### 3.2 Parser
The parser creates a tree structure from the token stream. The tree structure is used to extract the main part and useful words.

In [46]:
# tree structure:
# main: ctx part ctx
# part: ATTRIBUTE part
#     | PART part
#     | -
# ctx : CONTEXT ctx
#     | -
# we don't care about ambiguity in the parsing,
# as examples are small enough to brute force the parsing

class ParseNode:
    def __init__(self, type, value, children):
        self.type = type
        self.value = value
        self.children = children
        
    def __str__(self):
        s = self.type
        if self.value != '':
            s += ' ' + self.value
        s += ' ('
        for child in self.children:
            s += str(child) + ', '
        s += ')'
        return s
    
    def __repr__(self) -> str:
        return self.__str__()
    
class UnparsableException(Exception):
    pass

def parse(tokens):
    """
    Parse a list of tokens into a tree structure
    """
    try:
        ctx1 = parse_ctx(tokens)
        part = parse_part(tokens, True)
        ctx2 = parse_ctx(tokens)
        if tokens != []:
            raise UnparsableException
        return ParseNode('main', '', [ctx1, part, ctx2])
    except UnparsableException:
        return None

def parse_part(tokens, force=False):
    if tokens == []:
        if force:
            raise UnparsableException
        return ParseNode("", "", [])
    
    next = tokens.pop(0)
    if next[1] == 'CONTEXT':
        tokens.insert(0, next)
        if force:
            raise UnparsableException
        return ParseNode("part", "", [])
    if next[1] == 'ATTRIBUTE':
        leaf = ParseNode('ATTRIBUTE', next[0], [])
        return ParseNode('part', "", [leaf, parse_part(tokens, force)])
    elif next[1] == 'PART':
        leaf = ParseNode('PART', next[0], [])
        return ParseNode('part', "", [leaf, parse_part(tokens, False)])
    else:
        raise UnparsableException

def parse_ctx(tokens):
    if tokens == []:
        return ParseNode("ctx", "", [])
    
    next = tokens.pop(0)
    if next[1] == 'CONTEXT':
        leaf = ParseNode('CONTEXT', next[0], [])
        return ParseNode('ctx', "", [leaf, parse_ctx(tokens)])
    else:
        tokens.insert(0, next)
        return ParseNode("ctx", "", [])
    
print(parse(lex("INTAKE TUBE GASKET")))
print(parse(lex("BLACK BOX SEAL")))

main (ctx (), part (PART INTAKE (), part (PART TUBE (), part (PART GASKET (),  (), ), ), ), ctx (), )
main (ctx (), part (ATTRIBUTE BLACK (), part (ATTRIBUTE BOX (), part (PART SEAL (),  (), ), ), ), ctx (), )


### 3.3 Main Part Extraction
The main part is usually the last part of the part name. For example, in "ROCKER COVER GASKET", "GASKET" is the main part. The parser extracts the main part by looking at the deepest PART node in the tree.

In [47]:
def find_main_part(node):
    """
    Find the main PART node in the tree structure
    """
    # Usually this is the last part in the sentence,
    # which corresponds to the deepest part in the tree
    
    # we use a depth first search to find the deepest part
    if node.type == 'main':
        return find_main_part(node.children[1])
    elif node.type == 'part':
        for child in node.children[::-1]: # iterate in reverse order
            if child.type == 'part':
                d = find_main_part(child)
                if d is not None:
                    return d
            elif child.type == 'PART':
                return child
    return None

find_main_part(parse(lex("INTAKE TUBE GASKET")))

PART GASKET ()

### 3.4 Useful Words Extraction
The useful words are almost all the words in the part name. Parts may be pluralized, however; in that case the singular form is added alongside the plural.

In [48]:
# find all buzzwords,
# which are basically all words in the sentence.
# for parts, we also add the singular form
def find_buzzwords(node):
    buzzwords = []
    if node.type == 'main':
        for child in node.children:
            buzzwords.extend(find_buzzwords(child))
    elif node.type == 'ctx':
        for child in node.children:
            buzzwords.extend(find_buzzwords(child))
    elif node.type == 'part':
        for child in node.children:
            buzzwords.extend(find_buzzwords(child))
    elif node.type == 'ATTRIBUTE':
        buzzwords.append(node.value)
    elif node.type == 'PART':
        buzzwords.append(node.value)
        if node.value.endswith('S'):
            buzzwords.append(node.value[:-1]) # remove the S
    elif node.type == 'CONTEXT':
        buzzwords.append(node.value)
    
    return buzzwords

print(find_buzzwords(parse(lex("INTAKE TUBE GASKETS"))))

['INTAKE', 'TUBE', 'GASKETS', 'GASKET']
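Dropping a trailing 'S' from every PART word can produce non-words for names that merely end in 'S'. A more conservative variant, sketched here as an assumption rather than the notebook's behaviour, only adds the singular form when it is a known part; `singular_candidates` and `known_parts` are hypothetical names.

```python
def singular_candidates(word, known_parts):
    """Return the word itself, plus its singular form if that form is a known part."""
    forms = [word]
    if word.endswith("S") and word[:-1] in known_parts:
        forms.append(word[:-1])
    return forms

known_parts = {"GASKET", "TUBE"}  # hypothetical subset of catalog types
print(singular_candidates("GASKETS", known_parts))  # ['GASKETS', 'GASKET']
print(singular_candidates("BOSS", known_parts))     # ['BOSS']
```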


## 4. Part Graph
Parts from the parts catalog are loaded into a graph. This graph also records related words for each part, and which parts are related to each other. 

In [49]:
#==============================================================================
#   STRUCTURING THE PARTS CATALOG
#==============================================================================
# Idea: create a graph with all parts being nodes, these parts have relations
# to other parts, like being a subpart of an assembly, or being a part of the same assembly
# with this graph structure, we may be able to identify parts based on mentions of other parts

class Part:
    def __init__(self, part_number, part_type, specifics):
        self.part_number = part_number
        self.part_type = part_type
        self.specifics = specifics
        self.connections = set()

    def __str__(self):
        return self.part_number + ' (' + self.part_type + ')'
    
    def __repr__(self) -> str:
        return self.__str__()

class Connection:
    def __init__(self, part1, part2, relation):
        self.part1 = part1
        self.part2 = part2
        self.relation = relation

    def __str__(self):
        return self.part1.part_number + ' ' + self.relation + ' ' + self.part2.part_number
    
    def __repr__(self) -> str:
        return self.__str__()
    
class Section:
    def __init__(self, name):
        self.name = name
        self.assemblies = {}
        
    def __str__(self):
        return self.name + ' (' + str(len(self.assemblies)) + ' assemblies)'
    
    def __repr__(self) -> str:
        return self.__str__()
    
class Assembly:
    def __init__(self, name):
        self.name = name
        self.parts = {}
        
    def __str__(self):
        return self.name + ' (' + str(len(self.parts)) + ' parts)'
    
    def __repr__(self) -> str:
        return self.__str__()
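When relating all parts of one assembly to each other, `itertools.combinations` yields every unordered pair exactly once, which keeps the edge count predictable. The following is a minimal sketch on made-up part labels, using a plain adjacency map instead of the classes above; `toy_assembly`, `edges`, and `neighbours` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical part labels that share one assembly.
toy_assembly = ["P1", "P2", "P3"]

edges = []
neighbours = defaultdict(set)
for a, b in combinations(toy_assembly, 2):
    # One edge per unordered pair, recorded on both endpoints.
    edges.append((a, "ASSEMBLY", b))
    neighbours[a].add(b)
    neighbours[b].add(a)

print(len(edges))                # 3
print(sorted(neighbours["P1"]))  # ['P2', 'P3']
```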


## 5. Part Linking
When populating the part graph, we record which words are associated with each part. From this information we build the `word_hints` map, which maps each word to the parts whose description contains that word.

When linking parts from the maintenance logs, we can use the `word_hints` map to find the parts that contain the words in the part name. 

In [50]:
word_hints = {}
sections = {}

# populate the graph with parts from the parts catalog
for index, row in parts_catalog.iterrows():
    section = row['Section']
    assembly = row['Figure']
    
    # if the section or assembly doesn't exist yet, create it
    if section not in sections:
        sections[section] = Section(section)
    if assembly not in sections[section].assemblies:
        sections[section].assemblies[assembly] = Assembly(assembly)
    
    part_name = str(row['Part Number'])
    
    # some parts occur multiple times in the parts catalog
    # skip them (for now), if assembly words are added,
    # this needs to be changed
    if part_name in sections[section].assemblies[assembly].parts:
        #print("WARNING: part already in assembly:", part_name)
        continue
    
    part = Part(part_name, row['Type'], row['Specifics'])
    sections[section].assemblies[assembly].parts[part.part_number] = part
    
    # for the related words, take the specifics and type of the part
    related_words = []
    related_words.extend(str(row['Specifics']).split(' '))
    related_words.extend(str(row['Type']).split(' '))
    related_words = [str(w).upper() for w in set(related_words)]
    for w in related_words:
        if w in word_hints:
            word_hints[w].append(part)
        else:
            word_hints[w] = [part]
        
    part.words = related_words

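# build relations between parts
# for now only Assembly relations are added
# but others can easily be added as well
# NOTE: the nested loop visits every ordered pair, so each pair of parts
# ends up with two Connection objects, and related-part scores are
# therefore counted twice in rank_candidates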
for section in sections.values():
    for assembly in section.assemblies.values():
        for part in assembly.parts.values():
            for other in assembly.parts.values():
                if part != other:
                    c = Connection(part, other, 'ASSEMBLY')
                    part.connections.add(c)
                    other.connections.add(c)

# we can now search for all parts related to a word, such as 'ROCKER'
word_hints["ROCKER"]

[17F19357 (ROCKER ASSEMBLY),
 74637 (BUSHING),
 LW-13790 (SHAFT),
 LW-12892 (THRUST BUTTON),
 75906 (GASKET),
 61247 (COVER),
 66610 (BUSHING)]
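The word-to-parts map is an inverted index. A compact way to build one is a `defaultdict` of sets, which also avoids listing the same part twice under one word. The sketch below uses made-up part labels and descriptions; `toy_parts` and `toy_hints` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical part labels mapped to made-up descriptions.
toy_parts = {
    "P1": "ROCKER COVER GASKET",
    "P2": "ROCKER SHAFT BUSHING",
    "P3": "OIL FILTER GASKET",
}

toy_hints = defaultdict(set)
for part_number, description in toy_parts.items():
    for word in description.upper().split():
        toy_hints[word].add(part_number)

print(sorted(toy_hints["ROCKER"]))  # ['P1', 'P2']
print(sorted(toy_hints["GASKET"]))  # ['P1', 'P3']
```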

Now, for each part in the maintenance logs, we have a list of words that are related to the part in question, and we have a map of words to possible parts. 

This means that for each entry in the maintenance logs, we can find the catalog parts similar to the part in the maintenance logs. However, to find a good match, we need some way to score the similarity between the parts.

To do this, we devise a very simple scoring system that scores the similarity between a part in the catalog and the entry. It basically counts how many words overlap between the entry and the catalog part.

$$ \text{score}(c, e) = |\text{words}(c) \cap \text{words}(e)| $$
where $c$ is a part in the catalog, $e$ is an entry in the maintenance logs, and $\text{words}(x)$ is the set of words in $x$.
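As a worked instance of this formula, the score is the size of a set intersection. The sketch below uses made-up word lists; `overlap_score` is a hypothetical helper, not a function from this notebook.

```python
def overlap_score(catalog_words, entry_words):
    """Number of distinct words shared by a catalog part and a log entry."""
    return len(set(catalog_words) & set(entry_words))

entry = ["ROCKER", "COVER", "GASKETS", "GASKET"]
print(overlap_score(["ROCKER", "COVER", "GASKET"], entry))  # 3
print(overlap_score(["OIL", "FILTER", "GASKET"], entry))    # 1
```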

In [52]:
def make_buzz_ranking(buzzwords):
    # rank the parts based on the number of times a word
    # overlaps with the words of a part
    ranking = {}
    for buzzword in buzzwords:
        if buzzword in word_hints:
            for part in word_hints[buzzword]:
                if part in ranking:
                    ranking[part] += 1
                else:
                    ranking[part] = 1
    return ranking

example = parse(lex("ROCKER COVER GASKETS"))
words = find_buzzwords(example)
ranking = make_buzz_ranking(words)
print(ranking)

{17F19357 (ROCKER ASSEMBLY): 1, 74637 (BUSHING): 1, LW-13790 (SHAFT): 1, LW-12892 (THRUST BUTTON): 1, 75906 (GASKET): 3, 61247 (COVER): 2, 66610 (BUSHING): 1, 69106 (COVER): 1, 60430 (COVER): 1, 03D23350 (COVER): 1, 06E19769-0.63 (GASKET): 1, 61173 (GASKET): 1, 69551 (GASKET): 1, 61183 (GASKET): 1, 8313 (GASKET): 1, 06B23862 (GASKET): 1, 76510 (GASKET): 1, 60096 (GASKET): 1, 73818 (GASKET): 1, 62224 (GASKET): 1, 06E19769-1.25 (GASKET): 1, 66224 (GASKET): 1, 71973 (GASKET): 1, 77611 (GASKET): 1, LW-13353 (GASKET): 1, 06E19769-1.00 (GASKET): 1, 72059 (GASKET): 1}


### 5.1 Candidate Selection
In addition to the useful words per entry, we also have the main part, which we use to filter the catalog parts. A part is only a candidate if its catalog type equals the main part, allowing for a trailing `ASSEMBLY` in the catalog type or a plural `S` on the main part.

In [53]:
# filter out the parts that don't match the type
def get_candidates(ranking, part_type):
    candidates = []
    for part, score in ranking.items():
        if (part.part_type == part_type 
        or part.part_type == part_type + ' ASSEMBLY'
        or part.part_type + 'S' == part_type):
            candidates.append((part, score))
    return candidates

main_part = find_main_part(example).value
get_candidates(ranking, main_part)

[(75906 (GASKET), 3),
 (06E19769-0.63 (GASKET), 1),
 (61173 (GASKET), 1),
 (69551 (GASKET), 1),
 (61183 (GASKET), 1),
 (8313 (GASKET), 1),
 (06B23862 (GASKET), 1),
 (76510 (GASKET), 1),
 (60096 (GASKET), 1),
 (73818 (GASKET), 1),
 (62224 (GASKET), 1),
 (06E19769-1.25 (GASKET), 1),
 (66224 (GASKET), 1),
 (71973 (GASKET), 1),
 (77611 (GASKET), 1),
 (LW-13353 (GASKET), 1),
 (06E19769-1.00 (GASKET), 1),
 (72059 (GASKET), 1)]

### 5.2 Scoring Refinement
Earlier, we scored every part in the catalog. For the candidate scoring we also want to account for parts related to the candidate. 

$$ \text{ranking}(c, e) = B \cdot \text{score}(c, e) + R \cdot \sum_{r \in \text{assembly}(c)} \text{score}(r, e) $$
where $c$ is the candidate, $e$ is the entry, $B$ is the base score factor, and $R$ is the relation score factor. $\text{assembly}(c)$ is the set of parts in the same assembly as $c$, excluding parts of the same type as $c$.
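To make the weighting concrete, here is a small arithmetic sketch of the formula with $B = 5$ and $R = 1$. The input scores are invented for illustration and `refined_score` is a hypothetical helper.

```python
B = 5  # base score factor
R = 1  # relation score factor

def refined_score(base, related_scores, b=B, r=R):
    """Weighted candidate score plus the weighted scores of its related parts."""
    return base * b + sum(score * r for score in related_scores)

# Hypothetical candidate with 3 overlapping words and two related parts
# that score 2 and 1 against the same entry.
print(refined_score(3, [2, 1]))  # 18
# A candidate with a single overlapping word and no scoring related parts.
print(refined_score(1, []))      # 5
```

With these factors, one extra overlapping word on the candidate itself outweighs up to four overlapping words spread over related parts.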

In [54]:
B = 5 # multiplier for the base score of a candidate
R = 1 # multiplier for the score of a related part
def rank_candidates(ranking, candidates):
    scores = {}
    for candidate in candidates:
        score = 0
        for connection in candidate[0].connections:
            subjectpart = connection.part1 if connection.part1 != candidate[0] else connection.part2
            
            # ignore if the part is of the same type as the candidate
            # this is to prevent a specific kind of bias where
            # parts of the same type and figure boost each other.
            # even though there is nothing from the entry that suggests this
            if subjectpart.part_type == candidate[0].part_type:
                continue
            if subjectpart in ranking:
                score += ranking[subjectpart] * R
        
        # apply base score of the candidate
        score += candidate[1] * B
        scores[candidate[0]] = score
    return scores

In [66]:
# This function combines the previous functions to get the ranked candidates
def get_ranked_candidates(node, secondary=None):
    # secondary is currently unused
    main = find_main_part(node)
    if main is None: # no part found
        #print("No part found for", node)
        return None
    dpart = main.value
    bz = find_buzzwords(node)
    
    ranking = make_buzz_ranking(bz)
    
    candidates = get_candidates(ranking, dpart)
    #if len(candidates) == 0:
        #print("No candidates found for", dpart)
    return rank_candidates(ranking, candidates)

## 6. Results

#### Parsing success rate

In [67]:
# check parsability of all problem extractions
parsable = 0
parsed_extractions = []
unparsable = []
for index, row in problem_extractions.iterrows():
    result = parse(lex(row['part']))
    # parse returns None if the extraction is unparsable
    if result is not None:
        parsable += 1
        parsed_extractions.append(result)
    else:
        unparsable.append(row['part'])
        
print(f"Parsable: {parsable*100/len(problem_extractions):.2f}%")
print(f"Unparsable extractions: {unparsable}")

Parsable: 72.89%
Unparsable extractions: ['ENGINE', 'ENGINE', 'ENGINE', 'ENGINE', 'ENGINE', 'ENGINE', 'ENGINE', 'OIL DIPSTICK', 'BRAKE', 'COWL MOUNTS', 'OIL PRESS', 'OIL PRESS', nan, nan, 'FUEL PRESS', 'ENGINE', nan, 'ENGINE', 'ENGINE', 'BRAKE LININGS', 'ENGINE', 'ENGINE', nan, nan, nan, nan, nan, nan, nan, nan, 'ENGINE', 'ENGINE', 'ENGINE', 'ENGINE', 'ENGINE', 'ENGINE', 'LACING CORD', 'INTAKES', 'ENGINE', 'ENGINE', 'RIVET', 'INTAKES', nan, 'INSPECTION PANEL', 'INTAKES', 'INTAKES', 'INTAKES', 'INTAKES', 'INTAKES', 'INTAKES', 'INTAKES', 'FUEL INJ LINES STANDOFF', 'ENGINE', 'ENGINE', 'ENGINE', 'ENGINE', 'INNER CYL BAFFLE TIE ROD', 'INTAKES', 'INTAKES', 'BACK TOP BAFFLE', 'ENGINE IDLE', 'INTAKES', 'AFT BAFFLE ATTACH PLATE', 'OIL COOLER BAFFLING', nan, 'OIL RETURN LINE', 'COWL SHOCK MOUNTS', nan, nan, nan, nan, 'INTAKES', 'OIL RETURN LINE', 'INTAKES', 'THROTTLE LINKAGE DOG BONE', 'INTAKES', 'ENG BAFFLING', 'FIREWALL', 'ENG BAFFLING', 'INDUCTION', 'DIPSTICK', 'INTAKES', 'INTAKES', 'INTAKES'

#### Linking success rates


In [71]:
def inv(d):
    """
    Inverts a dictionary
    """ 
    r = {}
    for k, v in d.items():
        if v in r:
            r[v].append(k)
        else:
            r[v] = [k]
    return r


# run the function on all parsed extractions
ranked_candidates = []
for parsed in parsed_extractions:
    res = get_ranked_candidates(parsed)
    ranked_candidates.append(res)

extraction_successes = 0
identified_parts = []
MIN_SCORE = 4
for c in ranked_candidates:
    if c is not None and len(c) > 0:
        extraction_successes += 1
        # get the part with the highest score
        # and only add it if it is the only part with that score
        m = max(c.values())
        invc = inv(c)
        if len(invc[m]) == 1 and m > MIN_SCORE:
            identified_parts.append(invc[m][0])
        #else:
            #print("Ambiguity in ranking:", invc[m],"with score", m)
        
print(f"successful extractions: {extraction_successes * 100 / len(problem_extractions):.2f}%")
print(f"successful identifications: {len(identified_parts) * 100 / len(problem_extractions):.2f}%")

# scrolling through the output, it seems that the parser is able to identify
# clearly mentioned parts, like seals and gaskets
# but struggles a lot with more general mentions, like 'ENGINE' or 'INTAKE'
# this makes sense: there is no single part called 'ENGINE' or 'INTAKE',
# but rather a collection of parts that make up the engine or intake system

# it is still unknown whether the identified part is actually the correct one.
# it does seem to be correct in most cases, but this is hard to evaluate

successful extractions: 62.34%
successful identifications: 33.06%
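One way to check whether the identified parts are correct would be to label a small sample of entries by hand and measure how often the top-ranked part agrees with the label. The sketch below assumes such labels exist; `precision_at_1`, the entry ids, and the part labels are all hypothetical, and no such labelled set is part of this notebook.

```python
def precision_at_1(predictions, gold):
    """Share of labelled entries whose predicted part equals the gold label.

    Only entries that have both a prediction and a gold label are counted.
    """
    labelled = [entry for entry in gold if entry in predictions]
    if not labelled:
        return 0.0
    hits = sum(1 for entry in labelled if predictions[entry] == gold[entry])
    return hits / len(labelled)

# Hypothetical entry ids and part labels, for illustration only.
predictions = {"log-1": "P1", "log-2": "P7", "log-3": "P3"}
gold = {"log-1": "P1", "log-2": "P2", "log-4": "P9"}
print(precision_at_1(predictions, gold))  # 0.5
```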


## 7. Possible Extensions
#### Scoring
The current scoring method is very simple and has a few biases: it rewards catalog parts that have more words, and it rewards parts that have more related parts. A more advanced scoring system could reduce these biases, for example by normalising the initial score:
$$ \text{score}(c, e) = \frac{|\text{words}(c) \cap \text{words}(e)|}{|\text{words}(c)|} $$
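A worked instance of the normalised score on made-up word lists is sketched below; `normalised_score` is a hypothetical helper.

```python
def normalised_score(catalog_words, entry_words):
    """Fraction of the catalog part's distinct words that occur in the entry."""
    catalog = set(catalog_words)
    if not catalog:
        return 0.0
    return len(catalog & set(entry_words)) / len(catalog)

entry = ["ROCKER", "COVER", "GASKET"]
print(normalised_score(["GASKET"], entry))                            # 1.0
print(normalised_score(["ROCKER", "COVER", "GASKET", "KIT"], entry))  # 0.75
```

The example also shows the trade-off: a part described by a single word reaches the maximum score with one match, so this normalisation shifts the bias towards short descriptions instead of removing it.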



In [None]:
def make_buzz_ranking(buzzwords):
    # normalised variant of the formula above: every overlapping word
    # contributes 1 / |words(c)|, so the total is the fraction of the
    # part's words that occur in the entry.
    # untested; MIN_SCORE and B would need retuning for scores in [0, 1]
    ranking = {}
    for buzzword in set(buzzwords):
        if buzzword in word_hints:
            for part in set(word_hints[buzzword]):
                ranking[part] = ranking.get(part, 0.0) + 1.0 / len(set(part.words))
    return ranking

#### Integration of action extractions
Currently we only use the problem extractions to identify parts. Every entry also has an action extraction. This could be used to further refine the linking, especially in cases where the symptoms were very broad, but a specific part was replaced.

The action extractions are very similar to the problem extractions, and could be used in the same way, or even together with the problem extractions.
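The simplest way to use both extractions together would be to merge their word lists before ranking. This is a sketch under the assumption that both extractions yield plain word lists; `combined_words` and the example words are hypothetical.

```python
def combined_words(problem_words, action_words):
    """Union of both word lists, keeping first-seen order and dropping repeats."""
    return list(dict.fromkeys([*problem_words, *action_words]))

# Hypothetical entry: a broad problem mention and a specific action mention.
print(combined_words(["ENGINE"], ["ROCKER", "COVER", "GASKET"]))
# ['ENGINE', 'ROCKER', 'COVER', 'GASKET']
print(combined_words(["INTAKE", "GASKET"], ["GASKET"]))
# ['INTAKE', 'GASKET']
```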

#### Integrate Assembly and Section names into related words
Currently only the `Type` and `Specifics` columns are used to find related words. The assembly and section names could also be used for this. Some words, such as Intake, appear clearly in the assembly names.
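A sketch of how the extra names could be folded into the related words is shown below. It assumes the assembly and section names are available as plain strings; `related_words_for`, its parameters, and the example values are hypothetical.

```python
def related_words_for(specifics, part_type, assembly_name="", section_name=""):
    """Upper-cased set of words taken from all four text fields."""
    words = set()
    for field in (specifics, part_type, assembly_name, section_name):
        words.update(str(field).upper().split())
    return words

# Made-up field values, for illustration only.
print(sorted(related_words_for("Rocker box", "GASKET", "Intake System")))
# ['BOX', 'GASKET', 'INTAKE', 'ROCKER', 'SYSTEM']
```

Words that occur in every assembly or section name would match almost every part, so they may need to be filtered out before scoring.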