# Keyword Deletion

Extract keywords using [Yake!](https://doi.org/10.1016/j.ins.2019.09.013)

Gap keywords in the Cloze passsage. If a keyword appears multiple times, gap all instances of the keyword.

In [1]:
import yake

In [17]:
args = {
    "lan": "en",  # language
    "n": 1,  # max ngram size
    "dedupLim": 0.9,  # deduplication threshold theta
    "dedupFunc": "seqm",
    "windowsSize": 1,
    "top": 6,  # Num keywords to return
}

extractor = yake.KeywordExtractor(**args)

In [31]:
def gap(text, min_distance=30):
    """
    Args:
      text (str): input passage to be gapped
      min_distance (int): minimum number of characters between gap start indexes
    """
    extractor = yake.KeywordExtractor(n=1, top=10)

    # Get keywords with scores
    kws = extractor.extract_keywords(text)

    # Sort keywords by score (higher score = more important)
    sorted_kws = sorted(kws, key=lambda x: x[1], reverse=True)

    # Keep track of positions where we've already gapped words
    gapped_positions = []
    answers = []

    # Process each keyword
    for kw, score in sorted_kws:
        # Find all occurrences of the keyword
        start_idx = 0
        while start_idx < len(text):
            idx = text.find(kw, start_idx)
            if idx == -1:
                break

            # Check if this position is too close to any existing gap
            too_close = False
            for pos, length in gapped_positions:
                if abs(idx - pos) < min_distance:
                    too_close = True
                    break

            # If not too close, gap this occurrence
            if not too_close:
                text = text[:idx] + "_" * len(kw) + text[idx + len(kw) :]
                gapped_positions.append((idx, len(kw)))
                answers.append(kw)

            # Move to next potential occurrence
            start_idx = idx + 1

    return text, answers

In [32]:
text = """The Cloze procedure, first introduced by Taylor, is a widely used method for creating reading 
comprehension tests inspired by the Gestalt principle of closure. Though many variations have been 
introduced and studied, the core concept is to mask words in prose and task the subject with providing 
the missing words."""

text = """Embarking on an international assignment, whether for work or study, entails navigating a complex 
landscape of emotional and cultural challenges. Initially marked by intrigue and excitement, expatriates often 
face culture shock and a period of adjustment before embracing their host culture. This journey necessitates 
meticulous preparation akin to other significant life changes, emphasizing the importance of adaptability, 
language proficiency, and cultural understanding. Successful expatriates are those who, rather than succumbing 
to frustration, leverage these experiences to enhance their personal and professional growth. The process of 
acculturation involves various emotional stages, including initial elation, culture shock, and eventual acceptance, 
followed by the challenges of reentry into one's native culture. Despite the potential for early termination of 
assignments due to family or personal issues, careful consideration and preparation can mitigate these risks, 
making international experience a valuable asset both personally and professionally."""

cloze_text, answers = gap(text)
print("Cloze text:")
print(cloze_text)

[('culture', 0.09336050545993559), ('Embarking', 0.13410709245603725), ('study', 0.13410709245603725), ('entails', 0.13410709245603725), ('cultural', 0.14330654539367615), ('work', 0.16051495746922928)]
Cloze text:
_________ on an international assignment, whether for ____ or study, entails navigating a complex 
landscape of emotional and ________ challenges. Initially marked by intrigue and excitement, expatriates often 
face _______ shock and a period of adjustment before embracing their host _______. This journey necessitates 
meticulous preparation akin to other significant life changes, emphasizing the importance of adaptability, 
language proficiency, and ________ understanding. Successful expatriates are those who, rather than succumbing 
to frustration, leverage these experiences to enhance their personal and professional growth. The process of 
acculturation involves various emotional stages, including initial elation, _______ shock, and eventual acceptance, 
followed by the c