# Case Study (Word Suggestion)

### Predict the next word! Can you build a word suggestion system using only built-in Python that analyzes a text and suggests the most likely next word based on word co-occurrence?


Dear students, in our upcoming class session we're diving into an exciting micro project: building a Word Suggestion System with Python's built-in features (the notebook below also uses NumPy to store the co-occurrence matrix). The project starts with text preprocessing and then examines how words relate to each other through co-occurrence analysis, that is, counting how often each word is immediately followed by each other word. As we prompt users for a word and suggest the most likely next word, we'll practice string manipulation, lists, loops, and conditionals. The hands-on format shows how these basic Python tools combine into a working word suggestion system. Your active participation, questions, and collaboration will be crucial for a comprehensive learning experience.
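To make the "built-in only" goal concrete, here is a minimal sketch of the same idea without NumPy, assuming a nested dictionary (`{word: {next_word: count}}`) in place of the matrix. The function names `build_next_word_counts` and `suggest_next_word` are hypothetical, chosen for this illustration.

```python
import string


def build_next_word_counts(text):
    """Map each word to a dict of {following_word: count} using built-ins only."""
    words = [token.strip(string.punctuation).lower() for token in text.split()]
    words = [word for word in words if word]  # drop tokens that were only punctuation
    counts = {}
    # zip(words, words[1:]) yields every pair of adjacent words
    for current_word, next_word in zip(words, words[1:]):
        followers = counts.setdefault(current_word, {})
        followers[next_word] = followers.get(next_word, 0) + 1
    return counts


def suggest_next_word(counts, word):
    """Return the most frequent follower of `word`, or None if it has none."""
    followers = counts.get(word.lower(), {})
    if not followers:
        return None
    # On a tie, max() returns the first follower seen in the text
    return max(followers, key=followers.get)


text = "This is a sample text. You can replace this with your own data."
counts = build_next_word_counts(text)
print(counts["this"])                     # {'is': 1, 'with': 1}
print(suggest_next_word(counts, "this"))  # is
```

Because ties are broken by order of appearance in the text rather than by position in the vocabulary, this sketch may pick a different word than the matrix version when several followers share the top count.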

## Text Preprocessing:


In [1]:
# Text Preprocessing
text = "This is a sample text. You can replace this with your own data."

# Tokenize: split the text into words at every space
text_tokens = text.split(" ")
print(text_tokens)

['This', 'is', 'a', 'sample', 'text.', 'You', 'can', 'replace', 'this', 'with', 'your', 'own', 'data.']
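`split(" ")` breaks only on single space characters. That is fine for this one-line sample, but text read from a file (see the Task below) usually contains newlines and repeated spaces. A small comparison with `str.split()` called without an argument, which splits on any run of whitespace:

```python
sample = "first line\nsecond  line"  # contains a newline and a double space

# Splitting on " " keeps the newline inside a token and yields an empty string
print(sample.split(" "))  # ['first', 'line\nsecond', '', 'line']

# Splitting on any whitespace gives clean tokens
print(sample.split())     # ['first', 'line', 'second', 'line']
```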


In [2]:
# Remove punctuation and convert to lowercase
import string
text_tokens = [token.strip(string.punctuation).lower() for token in text_tokens]
print(text_tokens)

['this', 'is', 'a', 'sample', 'text', 'you', 'can', 'replace', 'this', 'with', 'your', 'own', 'data']
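Note that `strip(string.punctuation)` removes punctuation only at the start and end of a token, and a token made entirely of punctuation becomes an empty string. A short illustration with made-up tokens, plus one way to filter out the empty ones:

```python
import string

tokens = ["don't", "(hello)", "e.g.", "--"]
cleaned = [token.strip(string.punctuation).lower() for token in tokens]
print(cleaned)  # ["don't", 'hello', 'e.g', '']

# Drop tokens that consisted only of punctuation
cleaned = [token for token in cleaned if token]
print(cleaned)  # ["don't", 'hello', 'e.g']
```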


In [3]:
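# Create a list of all unique words (the vocabulary)
# Note: set() has no fixed order, so the word order below (and the matrix layout) may differ between runs
text_tokens = list(set(text_tokens))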
print(text_tokens)

['text', 'data', 'you', 'your', 'this', 'a', 'with', 'own', 'is', 'sample', 'replace', 'can']
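Because string hashing is randomized per interpreter run in Python 3, the order produced by `set()` can change each time the notebook restarts. If a stable vocabulary order is wanted, two built-in options are sketched below: `dict.fromkeys` keeps the order of first appearance (dicts preserve insertion order in Python 3.7+), and `sorted` gives alphabetical order.

```python
tokens = ['this', 'is', 'a', 'sample', 'text', 'you', 'can', 'replace',
          'this', 'with', 'your', 'own', 'data']

# dict keys are unique and keep insertion order, so this de-duplicates in text order
vocabulary = list(dict.fromkeys(tokens))
print(vocabulary)
# ['this', 'is', 'a', 'sample', 'text', 'you', 'can', 'replace', 'with', 'your', 'own', 'data']

# Alphabetical order is another deterministic choice
print(sorted(set(tokens)))
# ['a', 'can', 'data', 'is', 'own', 'replace', 'sample', 'text', 'this', 'with', 'you', 'your']
```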


## Building Co-occurrence Matrix:


In [4]:
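# Initialize the co-occurrence matrix: one row and one column per unique word
# (NumPy is the only library used here that is not part of Python itself)
import numpy as np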
dist_matrix = np.zeros((len(text_tokens), len(text_tokens)))
print(dist_matrix)

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
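NumPy is not built into Python. To stay within built-ins, the same zero matrix can be a list of lists. The sketch below uses a small illustrative vocabulary and also shows a common pitfall: `[[0] * n] * n` repeats one row object, so changing one cell changes every row.

```python
vocabulary = ['text', 'data', 'you']  # small illustrative vocabulary
n = len(vocabulary)

# Correct: the comprehension builds a separate list for each row
matrix = [[0] * n for _ in range(n)]
matrix[0][1] += 1
print(matrix)   # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]

# Pitfall: multiplying the outer list repeats the SAME row object n times
aliased = [[0] * n] * n
aliased[0][1] += 1
print(aliased)  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```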


In [5]:
# Building Co-occurrence Matrix
# text_tokens now holds only the unique words, so re-tokenize the text in its original order
words = [token.strip(string.punctuation).lower() for token in text.split(" ")]
# Map each word to its row/column number in the matrix
position = {word: k for k, word in enumerate(text_tokens)}
# Count how often each word is immediately followed by each other word
for current_word, next_word in zip(words, words[1:]):
    if current_word != next_word:  # skip a word followed by itself (keeps the diagonal at 0)
        dist_matrix[position[current_word]][position[next_word]] += 1
for i in range(len(text_tokens)):
    print(dist_matrix[i], "\t", text_tokens[i])

[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 	 text
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 	 data
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.] 	 you
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] 	 your
[0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0.] 	 this
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.] 	 a
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.] 	 with
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 	 own
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] 	 is
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 	 sample
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] 	 replace
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.] 	 can


In [6]:
# Print the co-occurrence matrix as a table (rows: current word, columns: next word)
for k in range(len(text_tokens)):
    print("\t", text_tokens[k], end="")
print()
for i in range(len(text_tokens)):
    # Words of 7+ letters already reach the next tab stop, so skip the tab to keep columns aligned
    if len(text_tokens[i]) >= 7:
        print(text_tokens[i], "", end="")
    else:
        print(text_tokens[i], "\t", end="")
    for j in range(len(text_tokens)):
        print(int(dist_matrix[i][j]), "\t", end="")
    print()

	 text	 data	 you	 your	 this	 a	 with	 own	 is	 sample	 replace	 can
text 	0 	0 	1 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
data 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
you 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	1 	
your 	0 	0 	0 	0 	0 	0 	0 	1 	0 	0 	0 	0 	
this 	0 	0 	0 	0 	0 	0 	1 	0 	1 	0 	0 	0 	
a 	0 	0 	0 	0 	0 	0 	0 	0 	0 	1 	0 	0 	
with 	0 	0 	0 	1 	0 	0 	0 	0 	0 	0 	0 	0 	
own 	0 	1 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
is 	0 	0 	0 	0 	0 	1 	0 	0 	0 	0 	0 	0 	
sample 	1 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
replace 0 	0 	0 	0 	1 	0 	0 	0 	0 	0 	0 	0 	
can 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	1 	0 	
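Searching the raw string with `str.count` is a tempting way to fill the matrix, but it matches substrings anywhere in the text and is case-sensitive, so it can both over-count and under-count word pairs. The sketch below compares it with counting adjacent pairs of cleaned tokens on the same sample text.

```python
import string

text = "This is a sample text. You can replace this with your own data."

# str.count matches substrings anywhere in the raw text
print(text.count("is with"))   # 1  (found inside "this with")
print(text.count("with you"))  # 1  (found inside "with your")
print(text.count("this is"))   # 0  (the text says "This is", capital T)

# Counting adjacent pairs of cleaned tokens avoids both problems
words = [token.strip(string.punctuation).lower() for token in text.split()]
pairs = list(zip(words, words[1:]))
print(pairs.count(("is", "with")))   # 0
print(pairs.count(("with", "you")))  # 0
print(pairs.count(("this", "is")))   # 1
```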


## Top Suggestion:


In [16]:
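# Ask user to input a word, normalized the same way as the text tokens
user_input = input("Enter a word: ").strip().strip(string.punctuation).lower()

# Find top suggestion: the word that most often follows the input
if user_input in text_tokens:
    index_of_user_input = text_tokens.index(user_input)
    row = dist_matrix[index_of_user_input]
    max_value = max(row)
    if max_value > 0:
        # np.where returns a tuple of index arrays; on a tie, take the first matching column
        index_of_max_value = np.where(row == max_value)
        # Print Top Suggestion
        print("Top suggestion: ", text_tokens[index_of_max_value[0][0]])
    else:
        print("No word follows", user_input, "in the text.")
else:
    print("Unknown word:", user_input)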



Top suggestion:  with
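The raw counts already pick the top suggestion, but "most likely next word" can be made explicit by dividing a row by its sum, which gives the estimated probability of each follower. This sketch uses a hypothetical helper, `next_word_probabilities`, and an illustrative row for "this" in which "with" and "is" each follow once; it works on a plain list or a NumPy row.

```python
def next_word_probabilities(row, vocabulary):
    """Turn one row of follow counts into estimated P(next word | current word)."""
    total = sum(row)
    if total == 0:
        return {}  # the word is never followed by anything in the text
    return {word: count / total for word, count in zip(vocabulary, row) if count > 0}


vocabulary = ['text', 'data', 'you', 'your', 'this', 'a', 'with', 'own', 'is', 'sample', 'replace', 'can']
row_for_this = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]  # illustrative counts: "this" -> "with", "is"
print(next_word_probabilities(row_for_this, vocabulary))  # {'with': 0.5, 'is': 0.5}
```

The ranking is the same as with raw counts; the probabilities simply make ties (here 0.5 and 0.5) visible to the user.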


## Task
### Modify the existing code so that:
1. It suggests 3 words instead of 1.
2. The code is split into separate functions.
3. It reads a text file as the source instead of a string.