Define and implement an algorithm to find a translation, given a sentence in the source language. That is, you should try to find

$$E^* = \arg\max_{E} P(E \mid F)$$

In plain words, for a given source-language sentence F, we want to find the English-language sentence E that has the highest probability according to the probabilistic model we have discussed. Using machine translation jargon, we call this algorithm the "decoder." In practice, you can't solve this problem exactly and you'll have to come up with some sort of approximation.

Exemplify how this algorithm works by showing the result of applying your translation system to a short sentence from the source language.

As mentioned, it is expected that you will need to introduce a number of assumptions to make this at all feasible. Please explain all simplifying assumptions that you have made, and the impact you think that they will have on the quality of translations. But why is it an algorithmically difficult problem to find the English sentence that has the highest probability in our model?

In [6]:
def greedy_search(source_sentence, translation_probs):
    # Initialize an empty list to store the translated words
    translated_sentence = []

    # Iterate through each word in the source sentence
    for source_word in source_sentence:
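        # Candidate English words (with probabilities) for this source word
        candidates = translation_probs.get(source_word)
        if not candidates:
            # Assumption: an unknown source word is copied through unchanged
            # instead of raising a KeyError
            english_word = source_word
        else:
            # Pick the English word with the highest translation probability
            english_word = max(candidates, key=candidates.get)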

        # Add the selected English word to the translated sentence
        translated_sentence.append(english_word)

    return translated_sentence

# Example usage:
# Assuming translation_probs is a dictionary with translation probabilities
# where translation_probs[source_word][english_word] represents the probability of translating source_word to english_word
translation_probs = {
    'jag': {'i': 0.8, 'me': 0.2},
    'förklarar': {'explain': 0.9, 'declare': 0.1}
    # ... other source-target word pairs
}

source_sentence = ['jag', 'förklarar']  # Replace with your actual source sentence

result = greedy_search(source_sentence, translation_probs)
print("Source Sentence:", source_sentence)
print("Translated Sentence:", result)


Source Sentence: ['jag', 'förklarar']
Translated Sentence: ['i', 'explain']
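
For a table as small as the one in this example, the argmax can still be found by brute force, which gives a feel for why exact search does not scale. The sketch below is only an illustration under strong assumptions that are not part of the assignment text: each source word is translated independently, there is no reordering, and no language model is involved. The names `exhaustive_search` and `toy_probs` are hypothetical.

```python
from itertools import product
from math import prod

def exhaustive_search(source_sentence, translation_probs):
    """Score every word-for-word candidate and return the best one."""
    # One list of (english_word, probability) options per source word
    options = [list(translation_probs[word].items()) for word in source_sentence]

    best_candidate, best_score, n_candidates = None, 0.0, 0
    for combination in product(*options):
        n_candidates += 1
        # Under the independence assumption, the sentence score is the
        # product of the per-word translation probabilities
        score = prod(p for _, p in combination)
        if score > best_score:
            best_candidate = [word for word, _ in combination]
            best_score = score
    return best_candidate, best_score, n_candidates

# Same toy probabilities as in the cell above
toy_probs = {
    'jag': {'i': 0.8, 'me': 0.2},
    'förklarar': {'explain': 0.9, 'declare': 0.1},
}

best, score, n = exhaustive_search(['jag', 'förklarar'], toy_probs)
print("Best candidate:", best)
print("Candidates scored:", n)
```

With k options per word and n source words, this loop scores k^n candidates, so it is only practical for very short sentences. Under the independence assumption it picks the same words as the greedy search, which would no longer be guaranteed once reordering or a language model is added.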


The probability of a long sentence is much smaller than that of a short sentence, because longer sentences admit many more distinct combinations of words. The model may not have encountered many of these combinations during training, so it assigns them a low probability. Moreover, since the probabilities of the individual words are multiplied together and each factor is at most 1, the product shrinks as the sequence gets longer.
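
The effect of multiplying per-word probabilities can be made concrete with a small sketch. The per-word probability of 0.5 and the sentence lengths are made-up numbers chosen only for illustration, and `sentence_log_prob` is a hypothetical helper. Working in log space avoids numerical underflow, and dividing by the length is one common way to compare sentences of different lengths.

```python
import math

def sentence_log_prob(word_probs):
    """Sum of log-probabilities, equal to the log of the product."""
    return sum(math.log(p) for p in word_probs)

# Hypothetical sentences in which every word has probability 0.5
short_probs = [0.5] * 5
long_probs = [0.5] * 50

lp_short = sentence_log_prob(short_probs)
lp_long = sentence_log_prob(long_probs)

# The raw probability of the long sentence is many orders of magnitude smaller
print("short:", math.exp(lp_short))
print("long: ", math.exp(lp_long))

# Per-word average log-probability is the same for both sentences
print("per-word:", lp_short / len(short_probs), lp_long / len(long_probs))
```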

A possible modification to the code would be to build a character-level model in the style of Makemore, which treats each character as a token and predicts the next one. Because there are far fewer distinct characters than distinct words, the same training data exposes the model to each token in many more contexts, which helps against data sparsity.
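
A minimal sketch of the character-level idea is a count-based bigram model. This is not Makemore's actual code: the function names, the `<s>` and `</s>` boundary markers, and the three training words are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_char_bigram(words):
    """Count how often each character follows another."""
    counts = defaultdict(Counter)
    for word in words:
        # Hypothetical boundary markers for start and end of a word
        chars = ['<s>'] + list(word) + ['</s>']
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts

def next_char_probs(counts, prev):
    """Relative-frequency estimate of P(next character | previous character)."""
    total = sum(counts[prev].values())
    return {ch: c / total for ch, c in counts[prev].items()}

# Made-up training words, for illustration only
training_words = ['explain', 'declare', 'example']
counts = train_char_bigram(training_words)
probs = next_char_probs(counts, 'e')
print(probs)
```

Even with three training words, the character `e` is observed in five contexts, whereas a word-level model would have seen each word only once.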
