# Markov Chains

The idea is to build a table that maps each word to the words that follow it in the text, together with the probability of each follower. As a first solution we simply store every follower, repeats included: a word that follows more often appears more times in the list, so it is proportionally more likely to be picked when we choose uniformly from the list. This is space inefficient, but it is fine for a first approach.
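The repeated-list trick works because choosing uniformly from a list is the same as choosing each distinct word with probability proportional to how often it appears. A more space-efficient variant, sketched below under the assumption that the text is already split into words, stores one count per distinct follower and samples with weights instead. The helper names `build_count_model` and `sample_next` are hypothetical and not part of the original code.

```python
import random
from collections import Counter


def build_count_model(words):
    """Map each word to a Counter of the words that follow it (hypothetical helper)."""
    model = {}
    for current, following in zip(words, words[1:]):
        model.setdefault(current, Counter())[following] += 1
    return model


def sample_next(model, word, rng=random):
    """Pick a follower of `word` with probability proportional to its count."""
    followers = model.get(word)
    if not followers:
        return None  # dead end: the word was only seen at the end of the text
    candidates = list(followers)
    weights = [followers[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


words = "the cat sat on the mat and the cat slept".split()
model = build_count_model(words)
print(model["the"])  # Counter({'cat': 2, 'mat': 1})
print(sample_next(model, "the", random.Random(0)))
```

Sampling from the counts gives the same distribution as `random.choice` over the repeated list, but memory grows with the number of distinct word pairs rather than with the total number of words.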

Inspired by: http://www.cyber-omelette.com/2017/01/markov.html

Steps:
1. Read the text and remove unnecessary characters.
2. Build the chain: a dictionary that maps each word to the words that follow it.
3. Generate new text by starting from a specific word and repeatedly picking a random follower of the current word.
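On a toy sentence the three steps look like this. It is an illustrative sketch that uses an inline string instead of a file; the seed (`42`), the start word and the output length are arbitrary choices.

```python
import random

text = "the cat sat on the mat, and the cat slept"

# Step 1: drop characters we don't want glued to words, then split on whitespace
words = text.replace(",", "").split()

# Step 2: map every word to the list of words that follow it (repeats kept)
chain = {}
for current, following in zip(words, words[1:]):
    chain.setdefault(current, []).append(following)

print(chain["the"])  # ['cat', 'mat', 'cat']

# Step 3: walk the chain from a start word, stopping early at a dead end
rng = random.Random(42)
state = "the"
output = [state]
for _ in range(5):
    followers = chain.get(state)
    if not followers:
        break  # "slept" never has a follower, so generation stops there
    state = rng.choice(followers)
    output.append(state)
print(" ".join(output))
```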

```python
import random


# Build the chain: map every word to the list of words that follow it.
# Followers are stored with repeats, so a word that follows more often
# is proportionally more likely to be picked later.
def generate_model(text):
    chain = {}
    words = text.split()
    for current, following in zip(words, words[1:]):
        chain.setdefault(current, []).append(following)
    # The last word may have no follower; give it an empty list (without
    # overwriting followers collected earlier) so that generation can detect
    # the dead end instead of raising a KeyError.
    if words:
        chain.setdefault(words[-1], [])
    return chain


# Pick the word the generated text starts with.
# 1st approach: any random word.
# 2nd approach (used here): a random word that starts with a capital letter.
def get_initial_state(model):
    # Random:
    # return random.choice(list(model))

    # Capital letter (fall back to any word if the text has none):
    candidates = [word for word in model if word[0].isupper()]
    return random.choice(candidates or list(model))


# Generate up to `length` words, starting from get_initial_state() and
# repeatedly choosing a random follower of the current word. Stops early
# if it reaches a word with no followers (the last word of the text).
def generate_text(model, length):
    current_state = get_initial_state(model)
    text = [current_state]
    for _ in range(length - 1):
        followers = model[current_state]
        if not followers:
            break
        current_state = random.choice(followers)
        text.append(current_state)
    return text


# Read the corpus and strip characters we don't want glued to words
# (commas and « » quotation marks). Newlines and repeated spaces need no
# special handling because str.split() splits on any run of whitespace.
def read_file(file):
    with open(file, "r", encoding="utf-8") as myfile:
        data = myfile.read()
    for char in (",", "«", "»"):
        data = data.replace(char, "")
    return data
```
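The list model keeps the probabilities implicit. A minimal sketch that makes them explicit, assuming the `{word: [followers]}` shape built above (with an empty list for a word that has no follower); `transition_probabilities` is a hypothetical helper name.

```python
from collections import Counter


def transition_probabilities(model):
    """Turn {word: [followers, with repeats]} into {word: {follower: probability}}."""
    table = {}
    for word, followers in model.items():
        if not followers:
            continue  # nothing follows this word, so it has no outgoing probabilities
        counts = Counter(followers)
        total = len(followers)
        table[word] = {f: n / total for f, n in counts.items()}
    return table


model = {"the": ["cat", "mat", "cat"], "cat": ["sat", "slept"], "slept": []}
print(transition_probabilities(model)["cat"])  # {'sat': 0.5, 'slept': 0.5}
```

Each row sums to 1, which is exactly the distribution that `random.choice` samples from when it picks an element of the repeated list.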


```python
# Example with Silvia Federici: Caliban Y La Bruja (Spanish)
if __name__ == "__main__":
    text = read_file("texts/federici.txt")
    desired_length = 100
    model = generate_model(text)
    generated_text = generate_text(model, desired_length)
    print(" ".join(generated_text))
```

Pobres de su independencia. Así si en la población caía y la persona. Su promesa de tres de las calles así destinado a la procreación y el siglo XI fueron puestas en 1280 los movimientos milenaristas de mestizos debilitaba el hecho en el nuevo discurso filosófico y confinada a pecar con la bestia. Grabado de esta función social (Simmel 1978). Pero más como guías prácticas durante una recompensa sustancial. La Jacquerie La Revolución Industrial. Tal y psicológica reaparece en África Asia y cada vez más tarde en el acceso a la escena esta carta sexual siguiendo a nivel de herencia


```python
# Fabrizio Moro: Italian singer
if __name__ == "__main__":
    text = read_file("texts/fabrizio_moro.txt")
    desired_length = 50
    model = generate_model(text)
    generated_text = generate_text(model, desired_length)
    print(" ".join(generated_text))
```

Viva viva un’emozione ormai che è normale Ora basta devi lavorare Vorrei ci passo... sotto il sole Dai che ora è? E ora so controllare non dici niente nel cinquanta e io ne sa che ti sposi e adesso hai settant'anni e i miei sogni nel cinquanta e non dici
