#### Assignment 9
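To implement a Markov chain that works on whole words, we can store the transition probabilities from each k-word sequence to the word that follows it in a sparse matrix. In this example, I'll use scipy.sparse.dok_matrix, a dictionary-of-keys sparse matrix. Because a sparse matrix is indexed by integers, each k-word sequence has to be mapped to a row number and each word to a column number.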
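As a minimal sketch of the integer-index idea, the snippet below counts first-order (k=1) transitions in a toy sentence. The sentence and the names `word_index` and `counts` are illustrative assumptions, not part of the assignment code.

```python
from scipy.sparse import dok_matrix

# Toy input, chosen only for illustration
words = "the cat sat on the mat".split()

# dok_matrix cannot be indexed by strings, so number the words first
vocab = sorted(set(words))
word_index = {word: i for i, word in enumerate(vocab)}

# Rows are the current word, columns are the word that follows it
counts = dok_matrix((len(vocab), len(vocab)), dtype=float)
for current, following in zip(words, words[1:]):
    counts[word_index[current], word_index[following]] += 1

the_to_cat = counts[word_index["the"], word_index["cat"]]  # 1.0
stored_entries = counts.nnz  # 5 of the 25 possible cells are stored
```

Only the transitions that actually occur take up memory, which is the reason for choosing a sparse format when the vocabulary is large.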

In [6]:
import random
from collections import defaultdict

from scipy.sparse import dok_matrix


def generate_markov_chain(text, k=2):
    words = text.split()
    if len(words) <= k:
        raise ValueError("text must contain more than k words")

    # dok_matrix is indexed by integers, so give every k-word state a row
    # number and every word a column number
    states = sorted({tuple(words[i:i+k]) for i in range(len(words) - k)})
    vocab = sorted(set(words))
    state_index = {state: row for row, state in enumerate(states)}
    word_index = {word: col for col, word in enumerate(vocab)}

    # Sparse matrix of transitions: rows are states, columns are next words
    transition_matrix = dok_matrix((len(states), len(vocab)), dtype=float)

    # Number of times each state (row) was observed
    state_count = defaultdict(int)

    # Count the transitions
    for i in range(len(words) - k):
        row = state_index[tuple(words[i:i+k])]  # Use tuple for multi-word keys
        col = word_index[words[i+k]]

        transition_matrix[row, col] += 1
        state_count[row] += 1

    # Normalize each row so that its probabilities sum to 1
    for (row, col), count in list(transition_matrix.items()):
        transition_matrix[row, col] = count / state_count[row]

    return transition_matrix, states, vocab

def generate_sequence(markov_chain, seed, length=10):
    transition_matrix, states, vocab = markov_chain
    state_index = {state: row for row, state in enumerate(states)}
    rows = transition_matrix.tocsr()  # CSR gives direct access to a single row
    k = len(seed)

    sequence = list(seed)
    current_state = tuple(seed)

    # length is the total number of words, including the seed
    for _ in range(length - len(sequence)):
        row = state_index.get(current_state)
        if row is None:
            break  # This state was never followed by a word in the text

        next_probs = rows.getrow(row)
        candidates = [vocab[col] for col in next_probs.indices]
        next_word = random.choices(candidates, weights=next_probs.data.tolist())[0]
        sequence.append(next_word)
        current_state = tuple(sequence[-k:])

    return ' '.join(sequence)

# Example usage:
input_text = "This is a sample text for testing the Markov chain implementation. You can replace it with your own text."
k_value = 2

# Generate Markov chain
markov_chain = generate_markov_chain(input_text, k=k_value)
transition_matrix, states, vocab = markov_chain

# Generate a sequence, starting from a randomly chosen state
seed_sequence = random.choice(states)
generated_sequence = generate_sequence(markov_chain, seed_sequence, length=10)

# Print the results
print(f"Generated Markov Chain (k={k_value}):\n{transition_matrix}\n")
print(f"Generated Sequence starting with '{seed_sequence}':\n{generated_sequence}")

In this implementation:

- generate_markov_chain takes a text and a parameter k (the number of words in each state) and builds a Markov chain whose transition probabilities are stored in a sparse matrix.
- generate_sequence generates a sequence of words from the Markov chain, starting from a given seed sequence and sampling each next word in proportion to its transition probability.
- k_value sets how many preceding words determine the next word, and the length parameter of generate_sequence controls how long the generated sequence is.
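For comparison, the same build-then-sample logic can be sketched with plain dictionaries and no SciPy. This is an alternative representation, not the sparse-matrix approach used above, and the names `build_chain` and `walk` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_chain(text, k=2):
    """Map each k-word state to {next_word: probability}."""
    words = text.split()
    counts = defaultdict(Counter)
    for i in range(len(words) - k):
        counts[tuple(words[i:i + k])][words[i + k]] += 1
    # Divide each count by the total for its state
    return {
        state: {word: n / sum(followers.values()) for word, n in followers.items()}
        for state, followers in counts.items()
    }


def walk(chain, seed, length=10):
    """Generate up to `length` words in total, starting from the seed."""
    k = len(seed)
    sequence = list(seed)
    for _ in range(length - len(sequence)):
        followers = chain.get(tuple(sequence[-k:]))
        if not followers:
            break  # Dead end: this state was never followed by a word
        next_word = random.choices(list(followers), weights=list(followers.values()))[0]
        sequence.append(next_word)
    return " ".join(sequence)


chain = build_chain("a b c a b d a b c", k=2)
# chain[("a", "b")] is {"c": 2/3, "d": 1/3}: "a b" is followed by "c" twice and "d" once
sample = walk(chain, ("a", "b"), length=6)
```

The dictionary version is shorter, while the sparse matrix keeps the probabilities in a form that can be passed to numerical routines.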