# Word Segmentation and Association

![](../figs/intro_nlp/words/entelecheia_associaltion_vs_segmentation.png)

## Word Segmentation

- **Word segmentation** is the task of splitting a string of characters into words.
- Word segmentation is important for a machine to understand the meaning of a sentence.
- In English, we can split a string of characters into words by spaces.
- However, in languages like Chinese and Japanese, there are no spaces between words.
- Even in English, there are some cases where no space is used between words.
- Humans can easily segment a string of characters into words, even when there are no spaces between them.
- For example, we can easily segment the string of characters `Ilikechocolate` into words `I like chocolate`.

## Why should we segment words?

There are many applications that require word segmentation, even in English.

- Normalizing English compound nouns that are written in several ways, for search engines.
  - For example, `ice cream` and `ice-cream` should be normalized to the same form, such as `icecream`.
- Word segmentation for compounds: both the original word and its split parts should be in the dictionary.
- Typing errors may be corrected by word segmentation.
- Conversion errors: During conversion, some spaces may be lost.
- OCR errors: OCRed text may contain errors.
- Keyword extraction from URL addresses, domain names, table column description or programming variables that are written without spaces.
- For password analysis, the extraction of terms from passwords can be required.
- Automatic CamelCasing of programming variables.
- Speech recognition: Speech recognition systems may not properly recognize spaces between words.


## Generating segment variants

We can generate all possible segment variants of a string of characters. Each distinct segment variant is called a **composition**.

- In a string of length $n$, there are $n-1$ possible positions to split the string.
- Each of the $n-1$ positions can be used as word boundary.
- Therefore, there are $2^{n-1}$ possible compositions.
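The counting argument can be checked directly. The following is a minimal sketch (the helper name `compositions` is only illustrative) that treats each of the $n-1$ gaps as an on/off switch and builds one composition per switch setting:

```python
from itertools import product


def compositions(string):
    """Yield every way to split `string` into contiguous non-empty parts."""
    n = len(string)
    if n == 0:
        return
    # Each of the n-1 gaps between characters is either a boundary or not.
    for gaps in product([False, True], repeat=n - 1):
        parts, start = [], 0
        for position, is_boundary in enumerate(gaps, start=1):
            if is_boundary:
                parts.append(string[start:position])
                start = position
        parts.append(string[start:])
        yield parts


print(len(list(compositions("isit"))))  # 2 ** (4 - 1) = 8
```

Because the number of compositions doubles with every extra character, enumerating them is only practical for short strings.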

The compositions have to be evaluated to find the best segmentation.

- The best segmentation is the one that has the highest probability.


### Naive Recursive Algorithm

- The naive recursive algorithm is to generate all possible compositions and evaluate them.
- The time complexity of the naive recursive algorithm is $O(2^n)$.
- The naive recursive algorithm is not efficient for long strings.

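```python
from pprint import pprint


def segment_naive(string):
    """Return every composition of `string` as a list of word lists."""
    if not string:
        return []
    else:
        # Keep the whole string as one word, or cut after position i
        # and combine the first part with every composition of the rest.
        return [[string]] + [
            [string[:i]] + rest
            for i in range(1, len(string))
            for rest in segment_naive(string[i:])
        ]
```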

```python
pprint(segment_naive("isit"))
```

```
[['isit'],
 ['i', 'sit'],
 ['i', 's', 'it'],
 ['i', 's', 'i', 't'],
 ['i', 'si', 't'],
 ['is', 'it'],
 ['is', 'i', 't'],
 ['isi', 't']]
```


```python
pprint(segment_naive("가방에"))
```

```
[['가방에'], ['가', '방에'], ['가', '방', '에'], ['가방', '에']]
```


```python
text = "thisislongtext"
print(len(text), len(segment_naive(text)))
```

```
14 8192
```


```python
text = "아버지가방에들어가신다"  # Father goes into the bag or Father enters the room
print(len(text), len(segment_naive(text)))
```

```
11 1024
```


### Dynamic Programming

- Dynamic programming solves a problem by breaking it into subproblems and storing their results so that no subproblem is computed twice.
- For segmentation, the subproblem is the best segmentation of each remaining suffix; if the maximum word length is bounded by a constant, the running time grows linearly with the string length, $O(n)$.
- For long strings, dynamic programming is much more efficient than the naive recursive algorithm.

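```python
def segment(string, dictionary):
    """Split `string` by repeatedly taking the shortest dictionary prefix."""
    if not string:
        return []
    # Try prefixes from shortest to longest and commit to the first match.
    for end in range(1, len(string) + 1):
        first, rest = string[:end], string[end:]
        if first in dictionary:
            return [first] + segment(rest, dictionary)
    # No prefix is a known word: return the remainder as a single piece.
    return [string]
```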
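The function above commits to the first matching prefix and does not store subproblem results. A memoized variant might look like the following sketch. It assumes, purely for illustration, that the best segmentation is the one with the fewest dictionary words; `segment_dp` and the toy word set are hypothetical names, not part of any library.

```python
from functools import lru_cache


def segment_dp(string, dictionary):
    """Split `string` into the fewest dictionary words; None if impossible."""

    @lru_cache(maxsize=None)
    def best(start):
        # Best segmentation of string[start:], computed once per start index.
        if start == len(string):
            return ()
        candidates = []
        for end in range(start + 1, len(string) + 1):
            word = string[start:end]
            if word in dictionary:
                rest = best(end)
                if rest is not None:
                    candidates.append((word,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return None if result is None else list(result)


words = {"i", "like", "chocolate", "choco", "late"}
print(segment_dp("ilikechocolate", words))  # ['i', 'like', 'chocolate']
```

Each start index is solved once and examines at most $n$ end positions. If the inner loop is capped at a fixed maximum word length, the work per start index becomes constant and the total is linear in $n$.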

### Triangular Matrix

- The dynamic programming algorithm can be implemented using a triangular matrix.
- The triangular matrix algorithm uses nested loops and a circular buffer to store the results of subproblems.
- A triangular matrix of parts with increasing length is generated and organized in a circular buffer.
- This allows a constant amount of memory to be used for the algorithm.
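One way to picture the circular buffer is a bottom-up loop that only keeps the best result for the most recent positions. The sketch below is an illustration under stated assumptions, not the referenced implementation: `segment_bounded`, `word_cost`, and `unknown_cost` are hypothetical names, costs are assumed to be additive (for example negative log probabilities), and unknown parts are charged a made-up penalty per character.

```python
import math


def segment_bounded(text, word_cost, max_word_len, unknown_cost=10.0):
    """Lowest-cost segmentation of `text` using a circular buffer.

    `word_cost` maps known words to a cost; any other part costs
    `unknown_cost` per character (an assumption of this sketch).
    """
    size = max_word_len
    # Slot k % size holds (cost, words) for the best split of text[:k].
    buffer = [None] * size
    buffer[0] = (0.0, [])
    best = buffer[0]
    for end in range(1, len(text) + 1):
        best = (math.inf, [])
        # Parts of increasing length that end at position `end`.
        for length in range(1, min(max_word_len, end) + 1):
            prev_cost, prev_words = buffer[(end - length) % size]
            part = text[end - length:end]
            cost = prev_cost + word_cost.get(part, unknown_cost * length)
            if cost < best[0]:
                best = (cost, prev_words + [part])
        # Overwrites the slot of position end - size, which is no longer needed.
        buffer[end % size] = best
    return best[1]


costs = {"i": 1.0, "like": 2.0, "chocolate": 3.0, "choco": 3.0, "late": 3.0}
print(segment_bounded("ilikechocolate", costs, max_word_len=9))
```

The number of buffer slots depends only on `max_word_len`, not on the length of the input. In this sketch each slot still carries its list of words, so memory is bounded in slots rather than strictly constant in bytes.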


### Unknown Words

- We cannot rely on the dictionary to segment all words.
- There are uncommon words, new words, misspelled words, foreign words, proper nouns, slang words, etc.
- Even in these cases, we want to segment the words into meaningful parts.
- Therefore, we have to estimate the probability of any possible segmentation.

## Evaluation of Compositions

- Generally, we can evaluate a composition by calculating the probability of the composition.
- Word probabilities can be estimated from a corpus:

    $$
    P(w_i) = \frac{c(w_i)}{N}
    $$

    where $c(w_i)$ is the count of word $w_i$ and $N$ is the total number of words in the corpus.

- However, for unknown words, we have to use other criteria to evaluate the composition.
- At a word boundary, the uncertainty of the segmentation increases.
- By measuring the uncertainty, we can evaluate the composition.
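As a small illustration of the word-probability estimate, the sketch below scores a composition by summing the log probabilities of its words. Treating the words as independent (a unigram model) is an assumption of this sketch, and the toy corpus and the name `log_probability` are made up for the example.

```python
import math
from collections import Counter

corpus = "i like chocolate and i like ice cream".split()  # toy corpus
counts = Counter(corpus)
total = sum(counts.values())  # N in the formula


def log_probability(composition):
    """Sum of log P(w) over the words; -inf if any word is unseen."""
    score = 0.0
    for word in composition:
        if counts[word] == 0:
            return -math.inf
        score += math.log(counts[word] / total)
    return score


candidates = [["i", "like", "chocolate"], ["il", "ike", "chocolate"]]
best = max(candidates, key=log_probability)
print(best)  # ['i', 'like', 'chocolate']
```

The second candidate contains words that never occur in the corpus, so its score is $-\infty$. This is exactly the situation in which a dictionary-free criterion is needed.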

### Uncertainty of word boundaries

- The uncertainty of word boundaries can be measured by the entropy of the word boundary.
- Harris (1970) observed that a position where the uncertainty of the successive tokens increases is likely to be a word boundary.
- Feng et al. (2004) proposed a statistical criterion called accessor variety (AV) to measure how likely a sub-sequence is to be a word, and then searched for the segmentation that maximizes a target function of the accessor variety and the length of each sub-sequence.
- Jin and Tanaka-Ishii (2006) proposed branch entropy as another criterion for unsupervised segmentation.
- Both criteria share the assumption of the fundamental work by Harris (1970) that the uncertainty of successive tokens increases at word boundaries.
- Branch entropy is the continuous version of accessor variety.


![](../figs/intro_nlp/words/branching_entropy_uncertainty.png)


### Accessor Variety

- The accessor variety (AV) of a sub-sequence is the number of distinct characters (accessors) that appear immediately next to it in the corpus. The more distinct neighbours a sub-sequence has, the more likely it is to end at a word boundary on that side.
- The right-side accessor variety counts the distinct continuations that follow the sub-sequence.
- Given the following words, the right-side accessor variety of `hope` is 2, because `hope` can be followed by `ful` or `less`.

    ```
    "hopeful": 100
    "hopeless": 80
    ```
- The left-side accessor variety counts the distinct characters that immediately precede the sub-sequence.
- For example, the left-side accessor variety of `less` is 3, because `less` is preceded by `hope`, `un`, and `point` in the words below.

    ```
    "hopeless": 80
    "unless": 160
    "pointless": 70
    ```
- Depending on the language, the left-side accessor variety or the right-side accessor variety may be more suitable for segmentation.
- Threshold values can be used to determine the word boundaries.
- The threshold values can be determined by the corpus.
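The two examples can be reproduced with a short sketch. It counts the distinct characters that appear immediately before and after a fragment in a word list, and it ignores sentence boundaries, which is a simplification. The function name `accessor_variety` is only illustrative.

```python
def accessor_variety(word_counts, fragment):
    """Return (left AV, right AV) of `fragment` over the given words."""
    left, right = set(), set()
    for word in word_counts:
        start = word.find(fragment)
        while start != -1:
            end = start + len(fragment)
            if start > 0:
                left.add(word[start - 1])  # character just before the fragment
            if end < len(word):
                right.add(word[end])  # character just after the fragment
            start = word.find(fragment, start + 1)
    return len(left), len(right)


word_counts = {"hopeful": 100, "hopeless": 80, "unless": 160, "pointless": 70}
print(accessor_variety(word_counts, "hope"))  # (0, 2)
print(accessor_variety(word_counts, "less"))  # (3, 0)
```

Accessor variety only looks at how many distinct neighbours there are, so the frequencies in `word_counts` do not influence the result.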

### Branch Entropy

- The branch entropy of a sub-sequence $w$ is the entropy of the distribution of the character that immediately follows $w$.

    $$
    \text{BE}(w) = -\sum_{i=1}^n p_i \log p_i
    $$

    where $n$ is the number of distinct characters that can follow $w$ and $p_i$ is the probability that the $i$-th of these characters follows $w$.

- As in the case of accessor variety, the branch entropy can be calculated for the left-side and the right-side.
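A minimal sketch of the right-side branch entropy follows. It assumes that the probability of each next character is estimated from word frequencies; `branch_entropy` and the toy counts are illustrative names and data.

```python
import math
from collections import Counter


def branch_entropy(word_counts, prefix):
    """Entropy of the character that follows `prefix`, weighted by frequency."""
    next_chars = Counter()
    for word, freq in word_counts.items():
        if word.startswith(prefix) and len(word) > len(prefix):
            next_chars[word[len(prefix)]] += freq
    total = sum(next_chars.values())
    if total == 0:
        return 0.0
    return -sum((f / total) * math.log(f / total) for f in next_chars.values())


word_counts = {"hopeful": 100, "hopeless": 80}
print(branch_entropy(word_counts, "hope"))   # two possible next characters
print(branch_entropy(word_counts, "hopef"))  # only one continuation
```

After `hope` the next character is uncertain (`f` or `l`), so the entropy is positive. After `hopef` only `u` can follow, so the entropy drops to zero. Unlike accessor variety, the frequencies matter here.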


### Cohesion Probability

- The accessor variety and the branch entropy determine the boundary of a word by measuring the uncertainty of the word boundary, that is, the exterior boundary of a word.
- Unlike the accessor variety and the branch entropy, the cohesion probability determines the boundary of a word by measuring the association among characters inside a word, that is, the interior boundary of a word.
- The cohesion probability of a sequence of $n$ characters is the geometric mean of the conditional probabilities of extending each prefix by one more character.

    $$
    cohesion(c_1, c_2, \cdots, c_n) = \sqrt[n-1]{\prod_{i=1}^{n-1} P(c_1, c_2, \cdots, c_{i+1}|c_1, c_2, \cdots, c_i)}
    $$

    where $P(c_1, c_2 | c_1) = \frac{count(c_1, c_2)}{count(c_1)}$.

    Therefore, the formula can be simplified as:

    $$
    \begin{aligned}
    cohesion(c_1, c_2, \cdots, c_n) &= \sqrt[n-1]{\prod_{i=1}^{n-1} \frac{count(c_1, c_2, \cdots, c_{i+1})}{count(c_1, c_2, \cdots, c_i)}} \\
    &= \sqrt[n-1]{\frac{count(c_1, c_2, \cdots, c_n)}{count(c_1)}}
    \end{aligned}
    $$

- The above formula assumes that characters are associated in the forward direction.
- For the backward direction, the formula is:

    $$
    \begin{aligned}
    cohesion(c_1, c_2, \cdots, c_n) &= \sqrt[n-1]{\prod_{i=1}^{n-1} \frac{count(c_{i}, c_{i+1}, \cdots, c_n)}{count(c_{i+1}, c_{i+2}, \cdots, c_{n})}} \\
    &= \sqrt[n-1]{\frac{count(c_1, c_2, \cdots, c_n)}{count(c_n)}}
    \end{aligned}
    $$

    where $count(c_{i+1}, c_{i+2}, \cdots, c_n)$ is the count of the sequence of characters from $c_{i+1}$ to $c_n$.
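The simplified formulas can be tried on a toy word list. The sketch below assumes that forward counts are taken over word prefixes and backward counts over word suffixes; `count_affixes` and `cohesion` are hypothetical helper names.

```python
from collections import Counter


def count_affixes(words):
    """Count every prefix and every suffix of every word occurrence."""
    prefixes, suffixes = Counter(), Counter()
    for word in words:
        for i in range(1, len(word) + 1):
            prefixes[word[:i]] += 1
            suffixes[word[-i:]] += 1
    return prefixes, suffixes


def cohesion(counts, sequence, anchor):
    """(count(sequence) / count(anchor)) ** (1 / (n - 1)), or 0.0 if undefined."""
    n = len(sequence)
    if n < 2 or counts[sequence] == 0:
        return 0.0
    return (counts[sequence] / counts[anchor]) ** (1 / (n - 1))


words = ["the", "the", "then", "this", "to"]  # toy corpus
prefixes, suffixes = count_affixes(words)
forward = cohesion(prefixes, "the", anchor="t")   # anchored on the first character
backward = cohesion(suffixes, "the", anchor="e")  # anchored on the last character
print(forward, backward)
```

Here three of the five words that start with `t` continue as `the`, so the forward cohesion is $\sqrt{3/5}$. Every word that ends in `e` ends in `the`, so the backward cohesion is 1.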

### Maximum Matching Algorithm

- If we have all known words in a dictionary, we can use the maximum matching algorithm to segment a sentence.
- The maximum matching algorithm is a greedy algorithm that finds the longest matching word from the dictionary.
- The algorithm is as follows:

    1. Find the longest dictionary word that matches the beginning of the input.
    2. If the word is found, add the word to the result and remove the word from the input.
    3. If the word is not found, add the first character to the result and remove the first character from the input.
    4. Repeat 1-3 until the input is empty.
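The four steps translate almost line by line into code. This is a minimal sketch; the name `max_match` and the example dictionary are made up for the illustration.

```python
def max_match(text, dictionary):
    """Greedy longest-match segmentation of `text`."""
    max_len = max(map(len, dictionary), default=1)
    result, start = [], 0
    while start < len(text):
        # Step 1: try the longest candidate first, then shorter ones.
        for end in range(min(len(text), start + max_len), start, -1):
            if text[start:end] in dictionary:
                break  # Step 2: a dictionary word was found.
        else:
            end = start + 1  # Step 3: fall back to a single character.
        result.append(text[start:end])
        start = end  # Step 4: continue with the rest of the input.
    return result


dictionary = {"the", "theta", "table", "bled", "down", "own", "there"}
print(max_match("thetabledownthere", dictionary))
# ['theta', 'bled', 'own', 'there']
```

The example shows the usual weakness of a greedy strategy: taking `theta` first rules out the reading `the table down there`, even though every word of that reading is in the dictionary.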

## Word Segmentation in Practice

```python
from ekorpkit import eKonf

eKonf.setLogger("WARNING")

cfg = eKonf.compose("corpus")
cfg.name = "fomc"
cfg.path.cache.uri = (
    "https://github.com/entelecheia/ekorpkit-book/raw/main/assets/data/fomc.zip"
)
cfg.data_dir = cfg.path.cached_path
cfg.auto.merge = True
fomc_corpus = eKonf.instantiate(cfg)
print(fomc_corpus)
texts = fomc_corpus.data[fomc_corpus.data.content_type == "fomc_statement"].text.tolist()
```

```
Corpus : fomc
```


```python
import re
import collections


def pre_tokenize(text, lowercase=True, whitespace_token=" "):
    if lowercase:
        text = text.lower()
    text = re.sub(r"\s+", whitespace_token, text)
    return text


print(pre_tokenize(texts[0]))
text = pre_tokenize(" ".join(texts))
word_freqs = collections.Counter(text.split())
```

```
chairman alan greenspan announced today that the federal open market committee decided to increase slightly the degree of pressure on reserve positions. the action is expected to be associated with a small increase in short-term money market interest rates. the decision was taken to move toward a less accommodative stance in monetary policy in order to sustain and enhance the economic expansion. chairman greenspan decided to announce this action immediately so as to avoid any misunderstanding of the committee's purposes, given the fact that this is the first firming of reserve market conditions by the committee since early 1989.
```


```python
def initialize_subwords(word_freqs, verbose=True):
    character_freqs = collections.defaultdict(int)
    subwords_freqs = collections.defaultdict(int)
    for word, freq in word_freqs.items():
        for i in range(len(word)):
            character_freqs[word[i]] += freq
            # Loop through the subwords of length at least 2
            for j in range(i + 2, len(word) + 1):
                subwords_freqs[word[i:j]] += freq

    # Sort subwords by frequency
    sorted_subwords = sorted(subwords_freqs.items(), key=lambda x: x[1], reverse=True)
    if verbose:
        print("Top 10 subwords: {}".format(sorted_subwords[:10]))
    return sorted_subwords, character_freqs


sorted_subwords, characters = initialize_subwords(word_freqs)
```

```
Top 10 subwords: [('in', 19653), ('th', 17728), ('on', 17156), ('an', 15227), ('er', 13990), ('at', 13073), ('re', 12861), ('ti', 12781), ('he', 12611), ('the', 11984)]
```


```python
tokens = list(characters.items()) + sorted_subwords
tokens = {token: freq for token, freq in tokens}
tokens = collections.Counter(tokens)
```


```python
class Trie:
    def __init__(self, end_symbol="<END>"):
        self.root = {}
        self.end_symbol = end_symbol

    def add(self, word, value):
        node = self.root
        for ch in word:
            if ch not in node:
                node[ch] = {}
            node = node[ch]
        node[self.end_symbol] = value

    def get_value(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return 0
            node = node[ch]
        if self.end_symbol not in node:
            return 0
        return node[self.end_symbol]

    def set_value(self, word, value):
        node = self.root
        for ch in word:
            if ch not in node:
                raise ValueError("word not in trie")
            node = node[ch]
        if self.end_symbol not in node:
            raise ValueError("word not in trie")
        node[self.end_symbol] = value
```

```python
from scipy.special import digamma

def initialize_trie(tokens):
    trie = Trie()
    norm = sum(list(tokens.values()))
    logsum = digamma(norm)

    maxlen = 0
    for tok, val in tokens.items():
        trie.add(tok, digamma(val) - logsum)
        maxlen = max(maxlen, len(tok))

    return trie, maxlen
```


```python
trie, maxlen = initialize_trie(tokens)
```


```python
import re

L = collections.defaultdict(int)
R = collections.defaultdict(int)
_aL = collections.defaultdict(int)
_aR = collections.defaultdict(int)

def normalize_word(word):
    # replace all non-alphanumeric characters at the end of the word with a space
    word = re.sub(r"[^a-zA-Z0-9]+$", " ", word)
    # replace all non-alphanumeric characters at the beginning of the word with a space
    word = re.sub(r"^[^a-zA-Z0-9]+", " ", word)
    return word.strip()

def pre_tokenize(text, lowercase=True, whitespace_token=" "):
    if lowercase:
        text = text.lower()
    text = re.sub(r"\s+", whitespace_token, text)
    return [normalize_word(word) for word in text.split() if len(normalize_word(word)) > 0]

# words = pre_tokenize(text).split()
all_words = pre_tokenize(" ".join(texts[:10]))

max_left_length = maxlen
max_right_length = maxlen

for word in all_words:

    if (not word) or (len(word) <= 1):
        continue
    word_len = len(word)
    for i in range(1, min(max_left_length + 1, word_len) + 1):
        L[word[:i]] += 1
    for i in range(1, min(max_right_length + 1, word_len)):
        R[word[-i:]] += 1

    for left_word, word, right_word in zip(
        [all_words[-1]] + all_words[:-1], all_words, all_words[1:] + [all_words[0]]
    ):
        # print(left_word, word, right_word)

        l_word = word[-i:]
        r_word = word[:i]
        word_len = len(word)
        _aL["%s %s" % (l_word, "▁")] += 1
        _aR["%s %s" % ("▁", r_word)] += 1
        for i in range(1, min(max_right_length + 1, word_len)):
            _aL["%s %s" % (l_word, right_word[0])] += 1
        for i in range(1, min(max_left_length + 1, word_len)):
            _aR["%s %s" % (left_word[-1], r_word)] += 1
```


```python
print(len(_aL), len(_aR), len(L), len(R))
```

```
753 677 1125 897
```


```python
import math


def entropy(dic):
    if not dic:
        return 0.0
    sum_ = sum(dic.values())
    entropy = 0
    if sum_ == 0:
        return 0.0
    for freq in dic.values():
        prob = float(freq) / sum_
        if prob > 0:
            entropy -= prob * math.log(prob)
    # The accumulated sum is the Shannon entropy (>= 0). Multiplying by -1
    # returns its negation, so the branching entropies reported below are <= 0.
    return -1 * entropy


def all_branching_entropy(
    get_score=entropy, verbose=True
):
    def parse_left(extension):
        return extension[:-1]

    def parse_right(extension):
        return extension[1:]

    def sort_by_length(counter):
        sorted_by_length = collections.defaultdict(lambda: [])
        for w in counter.keys():
            sorted_by_length[len(w)].append(w)
        return sorted_by_length

    def get_entropy_table(
        parse, sorted_by_length, sorted_by_length_a, max_length, counter, counter_a
    ):
        num_sum = sum((len(words) for length, words in sorted_by_length.items()))
        be = {}
        for word_len in range(2, max_length):
            words = sorted_by_length.get(word_len, [])
            extensions = collections.defaultdict(lambda: [])
            for word in words:
                extensions[parse(word)].append(word)
            words_ = sorted_by_length_a.get(word_len + 1, [])
            for word in words_:
                extensions[parse(word.replace(" ", ""))].append(word)
            for root_word, extension_words in extensions.items():
                extension_frequency = {
                    ext: counter_a.get(ext, 0) if " " in ext else counter.get(ext)
                    for ext in extension_words
                }
                be[root_word] = get_score(extension_frequency)
        return be

    def merge(be_l, be_r):
        be = {word: (v, be_r.get(word, 0)) for word, v in be_l.items()}
        for word, v in be_r.items():
            if word in be_l:
                continue
            be[word] = (0, v)
        return be

    be_l = get_entropy_table(
        parse_right,
        sort_by_length(R),
        sort_by_length(_aR),
        max_right_length + 1,
        R,
        _aR,
    )
    be_r = get_entropy_table(
        parse_left,
        sort_by_length(L),
        sort_by_length(_aL),
        max_left_length + 1,
        L,
        _aL,
    )
    be = merge(be_l, be_r)
    if verbose > 0:
        print_head = (
            "branching entropies" if get_score == entropy else "accessor variety"
        )
        print("\rall %s was computed # words = %d" % (print_head, len(be)))
    return be
```


```python
import numpy as np

def get_words():
    words = {word for word in L.keys() if len(word) <= max_left_length}
    words.update({word for word in R.keys() if len(word) <= max_right_length})
    return words

def frequency(word):
    return (L.get(word, 0), R.get(word, 0))

def cohesion_score(word):
    word_len = len(word)
    if (not word) or (word_len <= 1):
        return (0, 0)
    l_freq, r_freq = map(float, frequency(word))
    l_cohesion = 0 if l_freq == 0 else np.power( (l_freq / L[word[0]]), (1 / (word_len - 1)) )
    r_cohesion = 0 if r_freq == 0 else np.power( (r_freq / R[word[-1]]), (1 / (word_len - 1)) )
    return (l_cohesion, r_cohesion)

def all_cohesion_scores(verbose=True):
    cps = {}
    words = get_words()
    for i, word in enumerate(words):
        # if (verbose > 0) and (i % verbose == 0):
        #     print('\r cohesion probabilities ... (%d in %d)' % (i+1, len(words)))
        cp = cohesion_score(word)
        if (cp[0] == 0) and (cp[1] == 0):
            continue
        cps[word] = cp
    if (verbose > 0):
        print('\rall cohesion probabilities was computed. # words = %d' % len(cps))
    return cps
```

```python
cps = all_cohesion_scores()
bes = all_branching_entropy()
avs = all_branching_entropy(get_score=lambda x: len(x))
```

```
all cohesion probabilities was computed. # words = 1929
all branching entropies was computed # words = 1615
all accessor variety was computed # words = 1615
```


```python
Scores = collections.namedtuple('Scores', 'cohesion_forward cohesion_backward left_branching_entropy right_branching_entropy left_accessor_variety right_accessor_variety leftside_frequency rightside_frequency')


def word_scores(cps={}, bes={}, avs={}):
    scores = {}
    for word in get_words():
        cp = cps.get(word, (0, 0))
        be = bes.get(word, (0, 0))
        av = avs.get(word, (0, 0))
        scores[word] = Scores(cp[0], cp[1], be[0], be[1], av[0], av[1], L.get(word, 0), R.get(word, 0))
    return scores
```

```python
def extract(
    scores=None,
    min_frequency=3,
    min_cohesion_forward=0.05,
    min_cohesion_backward=0.0,
    max_droprate_cohesion=0.98,
    max_droprate_leftside_frequency=0.98,
    min_left_branching_entropy=0.5,
    min_right_branching_entropy=0.5,
    min_left_accessor_variety=0,
    min_right_accessor_variety=0,
    min_word_length=2,
    remove_subwords=True,
):
    if not scores:
        scores = word_scores()
    scores_ = {}
    for word, score in sorted(scores.items(), key=lambda x: len(x[0])):
        if (
            (score.left_branching_entropy > min_left_branching_entropy)
            and (score.right_branching_entropy > min_right_branching_entropy)
            # or (score.left_accessor_variety < min_left_accessor_variety)
            # or (score.right_accessor_variety < min_right_accessor_variety)
            or (
                max(score.leftside_frequency, score.rightside_frequency) < min_frequency
            )
        ):
            continue
        if (len(word) >= 2) and (
            (score.cohesion_forward < min_cohesion_forward)
            or (score.cohesion_backward < min_cohesion_backward)
        ):
            continue
        if len(word) < min_word_length:
            continue
        scores_[word] = score
        if not remove_subwords:
            continue
        subword = word[:-1]
        droprate_leftside_frequency = (
            0 if not (subword in L) else score.leftside_frequency / L[subword]
        )
        if (droprate_leftside_frequency > max_droprate_leftside_frequency) and (
            subword in scores_
        ):
            del scores_[subword]
    return scores_
```


```python
words = extract(word_scores(cps=cps, bes=bes, avs=avs))
```

```python
def word_score(score):
    return score.cohesion_forward * math.exp(score.right_branching_entropy)

for word, score in sorted(words.items(), key=lambda x: word_score(x[1]), reverse=True):
    if word in all_words:
        continue
    print(word, score)
```

```
5- Scores(cohesion_forward=1.0, cohesion_backward=0, left_branching_entropy=0, right_branching_entropy=-0.3250829733914482, left_accessor_variety=0, right_accessor_variety=2, leftside_frequency=10, rightside_frequency=0)
ne Scores(cohesion_forward=0.9090909090909091, cohesion_backward=0.006666666666666667, left_branching_entropy=-0.0, right_branching_entropy=-0.5004024235381879, left_accessor_variety=1, right_accessor_variety=2, leftside_frequency=10, rightside_frequency=2)
ha Scores(cohesion_forward=0.7777777777777778, cohesion_backward=0, left_branching_entropy=0, right_branching_entropy=-0.410116318288409, left_accessor_variety=0, right_accessor_variety=2, leftside_frequency=7, rightside_frequency=0)
5-1/ Scores(cohesion_forward=0.9654893846056297, cohesion_backward=0, left_branching_entropy=-0.5004024235381879, right_branching_entropy=-0.6365141682948128, left_accessor_variety=2, right_accessor_variety=2, leftside_frequency=9, rightside_frequency=0)
expect Scores(cohesion_forward=0
```

```python
def word_score(score):
    return score.cohesion_backward * math.exp(
        score.left_branching_entropy
    ) + score.cohesion_forward * math.exp(score.right_branching_entropy)


for word, score in sorted(words.items(), key=lambda x: word_score(x[1]), reverse=True):
    if word in all_words:
        print(f" - {word}")
        continue
    print(word, score)
```

```
 - inflationary
 - utilization
 - york
 - 1/4
 - 5-1/4
 - greenspan
 - discount
 - 25
 - committee
 - percent
 - reserve
 - sustainable
 - philadelphia
 - submitted
 - governors
 - expected
 - directors
 - minneapolis
 - effective
 - expansion
 - louis
 - pressure
 - contained
 - keep
 - inflation
 - immediately
 - 5-1/2
 - immediate
 - depository
 - institutions
 - conditions
 - consistent
5- Scores(cohesion_forward=1.0, cohesion_backward=0, left_branching_entropy=0, right_branching_entropy=-0.3250829733914482, left_accessor_variety=0, right_accessor_variety=2, leftside_frequency=10, rightside_frequency=0)
 - reduction
 - short-term
 - announced
 - positions
 - interest
 - moderating
 - slightly
 - increase
 - reflected
 - announce
 - requests
 - following
 - district
 - potential
 - associated
 - markets
 - 1/2
 - continue
 - charged
 - actions
 - boards
 - easing
th Scores(cohesion_forward=0.6974789915966386, cohesion_backward=0.7777777777777778, left_branching_entropy=-1.0235570898
```

```python
def word_score(score):
    return score.cohesion_backward * math.exp(score.left_branching_entropy)


for word, score in sorted(words.items(), key=lambda x: word_score(x[1]), reverse=True):
    if word in all_words:
        continue
    print(word, score)
```

```
al Scores(cohesion_forward=0.07971014492753623, cohesion_backward=0.8653846153846154, left_branching_entropy=-0.39267446722755217, right_branching_entropy=-1.4010394588692539, left_accessor_variety=2, right_accessor_variety=10, leftside_frequency=11, rightside_frequency=45)
tent Scores(cohesion_forward=0.1613643827228219, cohesion_backward=0.30792608099883834, left_branching_entropy=-0.0, right_branching_entropy=-0.0, left_accessor_variety=1, right_accessor_variety=1, leftside_frequency=1, rightside_frequency=4)
th Scores(cohesion_forward=0.6974789915966386, cohesion_backward=0.7777777777777778, left_branching_entropy=-1.0235570898907984, right_branching_entropy=-0.6686752780804099, left_accessor_variety=7, right_accessor_variety=4, leftside_frequency=166, rightside_frequency=21)
all Scores(cohesion_forward=0.08512565307587486, cohesion_backward=0.2401922307076307, left_branching_entropy=-0.0, right_branching_entropy=-0.5009153717361616, left_accessor_variety=1, right_accessor_variety=
```

## References

- [Uncertainty to word boundary; Accessor Variety & Branching Entropy](https://lovit.github.io/nlp/2018/04/09/branching_entropy_accessor_variety/)
- [Fast Word Segmentation of Noisy Text](https://medium.com/towards-data-science/fast-word-segmentation-for-noisy-text-2c2c41f9e8da)