## $ \color{blue}{\text {Welcome to AI: Critical Principles & Strategy!} } $

### $\color{purple}{\text{This course is a subset of a}}$ [3 credit graduate level course on Artificial Intelligence Strategy at Rutgers University](https://bloustein.rutgers.edu/graduate/public-informatics/mpi/)

Connect to Faculty: [@ Jim Samuel](https://twitter.com/jimsamuel/)

---

[Please see the copyright statement below at the end of the notebook.](#ethics)

## $ \color{purple}{\text {Text Generation} } $

In this notebook, we will use Python along with libraries such as NLTK (Natural Language Toolkit) to generate text based on existing textual data. We will start by cleaning the text data and then explore methods to generate sentences from the given data.



In [10]:
!pip install nltk



#### Importing Libraries

The cell below imports the libraries used in this notebook and downloads the 'punkt' tokenizer from the NLTK library. Tokenization is a crucial step in NLP that breaks text down into smaller units (tokens) for further analysis.

In [11]:
import requests
import nltk
nltk.download('punkt')
import pandas as pd
from numpy.random import choice

[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


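#### Reading the input file used to generate text

The file read here, `11-0.txt`, is the Project Gutenberg eBook of Alice’s Adventures in Wonderland. You can substitute your own text file and use it instead.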

In [12]:
with open("11-0.txt", encoding="utf-8") as f:
    txt = f.read()

#### Cleaning and tokenizing text
This cell processes the text data. The `clean_data` function converts the text to lowercase and tokenizes it with the NLTK tokenizer, then stores the resulting tokens in a Pandas DataFrame (`raw_data`). The script then creates a second DataFrame, `data`, containing the unique tokens from `raw_data`, that is, the same tokens with duplicates removed.


In [13]:
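def clean_data(text):
    # Lowercase so that 'Alice' and 'alice' count as the same token
    text = text.lower()
    # Split the text into word and punctuation tokens
    data = nltk.word_tokenize(text)
    data = pd.DataFrame(data, columns=['tokens'])
    # Character-level alternative (one token per character):
    #data = pd.DataFrame(list(text), columns = ['tokens'])
    return data

# Every token in reading order, duplicates included
raw_data = clean_data(txt)
# Unique tokens only
data = pd.DataFrame(raw_data['tokens'].unique())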
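The difference between the full token sequence and the unique tokens can be seen on a single sentence. The sketch below is a simplified stand-in, not the notebook's method: `simple_tokenize` is a hypothetical helper that uses a regular expression in place of `nltk.word_tokenize`, so its tokens will differ from NLTK's on contractions and other edge cases. The sample sentence is also made up for illustration.

```python
import re

def simple_tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    # Assumption: a regex stand-in for nltk.word_tokenize. It matches runs of
    # word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

sample = "Alice was beginning to get very tired. Alice was bored!"

# Every token in reading order, duplicates included (like raw_data)
tokens = simple_tokenize(sample)

# Unique tokens in first-seen order (like data)
unique_tokens = list(dict.fromkeys(tokens))

print(len(tokens), tokens)
print(len(unique_tokens), unique_tokens)
```

The full sequence is what the generation step needs, because word order determines which token follows which; the unique list only describes the vocabulary.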

In [14]:
txt[1:100]

'The Project Gutenberg eBook of Alice’s Adventures in Wonderland, by Lewis Carroll\n\nThis eBook is fo'

In [15]:
txt[1:5000]

'The Project Gutenberg eBook of Alice’s Adventures in Wonderland, by Lewis Carroll\n\nThis eBook is for the use of anyone anywhere in the United States and\nmost other parts of the world at no cost and with almost no restrictions\nwhatsoever. You may copy it, give it away or re-use it under the terms\nof the Project Gutenberg License included with this eBook or online at\nwww.gutenberg.org. If you are not located in the United States, you\nwill have to check the laws of the country where you are located before\nusing this eBook.\n\nTitle: Alice’s Adventures in Wonderland\n\nAuthor: Lewis Carroll\n\nRelease Date: January, 1991 [eBook #11]\n[Most recently updated: October 12, 2020]\n\nLanguage: English\n\nCharacter set encoding: UTF-8\n\nProduced by: Arthur DiBianca and David Widger\n\n*** START OF THE PROJECT GUTENBERG EBOOK ALICE’S ADVENTURES IN WONDERLAND ***\n\n[Illustration]\n\n\n\n\nAlice’s Adventures in Wonderland\n\nby Lewis Carroll\n\nTHE MILLENNIUM FULCRUM EDITION 3.0\n\nConten

#### Text generation functions

The notebook defines three functions:
* get_probabilities: This function takes a word as input and calculates the probability of each token that appears immediately after it in the text.
* pick: Given a starting word, this function randomly picks the next word, weighted by how often each candidate follows the starting word.
* make_sentence: This function generates a sequence of 500 tokens by calling pick repeatedly, starting from a given seed word.
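To make the probability step concrete, here is a minimal sketch on a toy token list, using only the standard library rather than the Pandas approach in the notebook. The name `next_word_probabilities` and the toy sentence are assumptions made for illustration.

```python
from collections import Counter

def next_word_probabilities(tokens, word):
    """Return {next_token: probability} for tokens that directly follow `word`."""
    # Count each token that appears right after an occurrence of `word`
    followers = Counter(
        nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == word
    )
    total = sum(followers.values())
    # Divide each count by the total so the probabilities sum to 1;
    # the result is an empty dict when `word` has no followers
    return {tok: count / total for tok, count in followers.items()}

toy_tokens = "the cat sat on the mat and the cat slept".split()
probs = next_word_probabilities(toy_tokens, "the")
print(probs)
```

In the toy sentence, "the" is followed by "cat" twice and "mat" once, so "cat" gets probability 2/3 and "mat" gets 1/3. Because only the single preceding word is considered, the generated text is locally plausible but loses coherence over longer spans.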

In [16]:
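def get_probabilities(word):
    # Mark every row where `word` occurs, then shift the mask down one row
    # so that it selects the token immediately following each occurrence.
    mask = raw_data['tokens'] == word
    probabilities = raw_data[mask.shift(1, fill_value=False)]['tokens'].value_counts()

    # Normalize the counts so that they sum to 1
    return probabilities / probabilities.sum()

def pick(start_word):
    x = get_probabilities(start_word)

    # Sample one follower, weighted by its probability
    return choice(x.index, p=x.values)

def make_sentence(seed):
    # Print 500 tokens, each sampled from the followers of the previous one
    for _ in range(500):
        print(seed, end=' ')
        seed = pick(seed)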
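The functions above print a fixed number of tokens and assume that every word has at least one follower; a seed that never appears in the text leaves no candidates to sample from. The following is a hedged variant, not the notebook's implementation: it uses only the standard library, returns a string instead of printing, accepts an optional random generator for reproducibility, and stops early at a dead end. The names `build_followers` and `generate` and the toy sentence are hypothetical.

```python
import random
from collections import Counter, defaultdict

def build_followers(tokens):
    """Map each token to a Counter of the tokens that directly follow it."""
    followers = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        followers[cur][nxt] += 1
    return followers

def generate(tokens, seed_word, length=10, rng=None):
    """Return a string of up to `length` tokens, starting from `seed_word`."""
    rng = rng or random.Random()
    followers = build_followers(tokens)
    out = [seed_word]
    while len(out) < length:
        options = followers.get(out[-1])
        if not options:
            # Dead end: the word is unseen or has no recorded follower
            break
        words, counts = zip(*options.items())
        # Sample the next token, weighted by how often it follows the last one
        out.append(rng.choices(words, weights=counts, k=1)[0])
    return " ".join(out)

toy_tokens = "the cat sat on the mat and the cat slept".split()
print(generate(toy_tokens, "the", length=8, rng=random.Random(0)))
```

Building the follower counts once, instead of filtering the whole token table for every generated word, is a design choice that trades a little memory for much less repeated work on a long text.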

In [17]:
make_sentence('happy')

happy summer day made a feather flock together. ” “ but little ! ” she had not noticed before that her hedgehog just at once crowded round to see it fitted ! ” the king , “ why , resting their shoulders , “ oh , ” “ to get ready to keep moving about for a footman seemed to the mouse did not venture to do practically anything but to death. ’ d let us , ” “ _she_ , that there was snorting like , if my tea when you ’ s the right , my tea ; yet , the conversation . do you may convert to ask the rest waited patiently . “ you ? ” continued , that was going , and came in another moment he said alice in the individual works that one for the other bit . however , in a set the house in an oyster ! ” this time she let the king ; there ’ re a very curious dream ! ” said to herself “ that ? ” alice ; and the gryphon hastily , for some mischief , ” said to the queen . alice thought alice replied in at last concert ! ” “ and the caterpillar . first thought alice . “ there was in a serpent ? _ i must 

In [18]:
make_sentence('sad')

sad and saw maps and picking them a large birds complained that is— ‘ let the hedgehog to turn a house , has lasted the multiplication table as she soon began picking the hatter , and peeped into the little anxiously about it ; and the only bowed and yet i _never_ get what ’ s more , look of rome , you can , and the next , very much under this piece of present of that for it , “ it ’ s a friend . “ here the wretched height to a little sharp hiss made no pleasing them , judging by u.s. laws of that perhaps as the only yesterday , dear little shrieks , leaning over me like a large eyes , which way with , ” cried out the large crowd collected at once in a failure . project gutenberg literary archive foundation 's ein or unenforceability of bread-and-butter in existence ; it a knife , and beasts and you know . “ we were me like : they were , and shoes done that it is made of that , it was busily stirring the queen was , and some day , “ it very civil of his eyes , and now and low-spirited 