# Text Generation Using First-Order Markov Chains

## Step 1: Importing Necessary Libraries

To build our Markov chain model, we need a couple of tools from Python's standard library.

* `random`: This module drives the text generation phase. When a word has several possible followers, `random.choice()` picks one of them at random.
* `collections.defaultdict`: This is a specialized dictionary. It simplifies our code by automatically creating a default value (in our case, an empty list) for any key that we look up but that doesn't exist yet. This is perfect for adding new words to our chain.
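
As a quick illustration of the two tools above, here is a minimal sketch. The word pairs and the variable name `followers` are made up for this example and are not part of the model built later.

```python
import random
from collections import defaultdict

# Illustrative data only: a few word pairs entered by hand.
followers = defaultdict(list)
followers["sun"].append("dipped")    # "sun" did not exist yet; an empty list is created first
followers["the"].append("horizon,")
followers["the"].append("tranquil")

# random.choice returns one element of a non-empty sequence, uniformly at random.
picked = random.choice(followers["the"])
print(picked)

# A membership test does not create a key, but indexing a missing key does.
before = "moon" in followers
_ = followers["moon"]
after = "moon" in followers
print(before, after)  # False True
```

The last lines show a detail worth keeping in mind: checking `key in d` leaves a `defaultdict` unchanged, whereas `d[key]` silently inserts an empty list.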

In [None]:
!pip install -r requirements.txt

In [22]:
import random
from collections import defaultdict

## Step 2: Defining the Corpus

Every language model needs data to learn from. The text below, `building_text`, serves as our **corpus**. The model analyzes this paragraph to record which words follow which. The structure, tone, and vocabulary of the corpus directly shape the generated output. A larger and more diverse corpus generally yields more varied output, although a first-order chain only ever looks one word back, so long-range coherence stays limited.

In [23]:
building_text = """
The sun dipped below the horizon,
casting a warm golden glow over the tranquil lake.
The water shimmered like a thousand diamonds,
reflecting the hues of orange and pink that painted the sky.
A gentle breeze rustled the leaves of the nearby trees,
creating a soothing melody that echoed through the quiet evening.
In that moment, time seemed to stand still,
inviting anyone who witnessed it to pause and appreciate the beauty of nature.
"""

## Step 3: Building the Markov Chain Model

This is the core of our model. The `build_markov_chain` function takes the corpus text and transforms it into a structured data format that represents the Markov chain.

Here's how it works:
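1.  The input text is split on whitespace into a list of individual words (tokens). Punctuation stays attached and case is preserved, so `The` and `the`, or `evening.` and `evening`, count as different tokens.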
2.  A `defaultdict` is initialized to store the chain.
3.  The function iterates through the list of words, looking at each word (`curr_word`) and the word that immediately follows it (`next_word`).
4.  It then populates the dictionary, using the current word as a key and appending the next word to the list of its possible followers.

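For example, given the phrase "The sun dipped...", the model records `"sun"` as a follower of `"The"` and `"dipped"` as a follower of `"sun"`, producing entries such as `{"The": ["sun"], "sun": ["dipped"]}`.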

In [24]:
def build_markov_chain(text):
    """Map each word in text to the list of words that directly follow it."""
    # Split on whitespace; punctuation stays attached to its word
    words = text.split()
    markov_chain = defaultdict(list)

    # Loop through the words and map each word to the word that follows it
    for i in range(len(words) - 1):
        curr_word = words[i]
        next_word = words[i + 1]
        markov_chain[curr_word].append(next_word)

    return markov_chain
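
Because followers are appended to a list, a word that follows another several times appears several times in that list, which makes it proportionally more likely to be picked later. The sketch below makes this visible on a short made-up sentence (`sample_text` is hypothetical, not the notebook's corpus). It pairs words with `zip`, which is equivalent to the index loop above, and uses `collections.Counter` to turn a follower list into transition probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical sample sentence, chosen so that "the" has a repeated follower.
sample_text = "the cat sat on the mat and the cat ran"

words = sample_text.split()
chain = defaultdict(list)
# zip(words, words[1:]) yields every (current word, next word) pair in order.
for curr_word, next_word in zip(words, words[1:]):
    chain[curr_word].append(next_word)

print(chain["the"])  # ['cat', 'mat', 'cat']

# Duplicates in the list act as weights: count them and normalize.
counts = Counter(chain["the"])
total = sum(counts.values())
probabilities = {word: count / total for word, count in counts.items()}
print(probabilities)
```

Here `cat` follows `the` twice and `mat` once, so a uniform pick from the list selects `cat` with probability 2/3.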

## Step 4: Defining the Text Generation Function

Once the Markov chain is built, we need a function to generate new text from it. The `generate_text` function handles this process.

Its logic is as follows:
1.  It starts with a given `start_word`.
2.  It enters a loop that runs at most `num_words - 1` times, because the start word already counts as the first word of the output.
3.  In each iteration, it looks up the `current_word` in our Markov chain dictionary to get the list of possible next words.
4.  If no followers exist (i.e., the word occurs only as the final token of the corpus), the generation stops early.
5.  Otherwise, it uses `random.choice()` to select one of the next words. This selected word becomes the `current_word` for the next loop iteration.
6.  Each chosen word is added to an output list, which is finally joined into a single string.

In [25]:
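def generate_text(chain, start_word, num_words=100):
    """Generate up to num_words words by walking the chain from start_word."""
    # The start word must be a key in the chain, i.e. it needs at least one follower
    if start_word not in chain:
        print(f"Start word '{start_word}' not found in chain.")
        return ""

    current_word = start_word
    output = [current_word]

    # The start word already counts as one word, so pick at most num_words - 1 more
    for _ in range(num_words - 1):
        next_words = chain.get(current_word, [])
        # Dead end: the current word has no recorded followers
        if not next_words:
            break
        current_word = random.choice(next_words)
        output.append(current_word)

    return ' '.join(output)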
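
The early-stop rule can be seen in isolation with a hand-written chain. In this sketch, `toy_chain` is hypothetical data in which every word has exactly one follower, so the walk is deterministic, and `still` never appears as a key, so it is a dead end. The loop repeats the same steps as `generate_text` rather than calling it, so the snippet runs on its own.

```python
import random

# Hypothetical toy chain: "still" has no entry, so generation must stop there.
toy_chain = {
    "time": ["seemed"],
    "seemed": ["to"],
    "to": ["stand"],
    "stand": ["still"],
}

current_word = "time"
output = [current_word]

# Ask for up to 100 words; the dead end ends the walk long before that.
for _ in range(100 - 1):
    next_words = toy_chain.get(current_word, [])
    if not next_words:
        break
    current_word = random.choice(next_words)
    output.append(current_word)

result = " ".join(output)
print(result)  # time seemed to stand still
```

Only five words come out even though 100 were requested, so `num_words` is best read as an upper bound.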

## Step 5: Generating the Final Output

Now it's time to put everything together and see the result.

1.  **Build the Model**: We first call `build_markov_chain()` with our corpus (`building_text`) to create the `markov_model`.
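2.  **Set Parameters**: We choose a `start_word` and set the maximum length of the output with `num_words`. The start word must be a key in the chain, so it has to match a token of the original text exactly (including case and attached punctuation) and cannot be the corpus's final token, `nature.`, which has no follower.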
3.  **Generate Text**: We call the `generate_text()` function with the model and our parameters.
4.  **Print the Result**: The final generated sentence is printed.

Because the process involves random choices, you will likely get a different (and often nonsensical) sentence each time you run the final cell!

In [26]:
# Build the Markov chain from the corpus (building_text)
markov_model = build_markov_chain(building_text)

# Generate text starting from a chosen word (any token in building_text that has a follower)
generated_sentence = generate_text(markov_model, start_word="sun", num_words=15)

print("Generated Sentence:\n", generated_sentence)

Generated Sentence:
 sun dipped below the hues of orange and appreciate the quiet evening. In that moment,
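
Since generation relies on `random.choice()`, a run can be made repeatable by seeding Python's `random` module first. The sketch below shows the idea on the followers recorded for the lowercase token `the` in the corpus. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import random

# Followers of the lowercase token "the" in the corpus above.
followers_of_the = ["horizon,", "tranquil", "hues", "sky.",
                    "leaves", "nearby", "quiet", "beauty"]

random.seed(42)  # arbitrary seed value, chosen only for illustration
first_run = [random.choice(followers_of_the) for _ in range(5)]

random.seed(42)  # resetting the same seed replays the same sequence of choices
second_run = [random.choice(followers_of_the) for _ in range(5)]

print(first_run == second_run)  # True
```

Calling `random.seed(...)` once before `generate_text()` should likewise make that cell print the same sentence on every run.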
