In [43]:
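import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
# Both tokenizers need the Punkt data; if it is missing, run nltk.download('punkt') once
# (newer NLTK releases name the resource 'punkt_tab').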

In [62]:
import numpy as np

In [44]:
text_data = '''Steve Jobs was a visionary entrepreneur who co-founded and served as the CEO of Apple Inc., one of the world's largest technology companies. He was born on February 24, 1955, in San Francisco, California, and passed away on October 5, 2011. Jobs was widely recognized for his innovative approach to technology, design, and marketing, which revolutionized the computer and mobile phone industries.
Jobs co-founded Apple Computer in 1976 with Steve Wozniak, and the company quickly gained popularity with the release of its first computer, the Apple I. Jobs had a passion for design and user experience, and he worked tirelessly to ensure that Apple products were not only functional, but also visually appealing. Under his leadership, Apple released several groundbreaking products, including the Macintosh computer, the iMac, and the iPod, which changed the way people interact with technology.
In addition to his work at Apple, Jobs was also a pioneer in the computer graphics industry. He co-founded and served as the CEO of Pixar Animation Studios, which produced several critically acclaimed animated films, including "Toy Story," "Finding Nemo," and "The Incredibles." Jobs' approach to storytelling and animation revolutionized the industry and set a new standard for animation and filmmaking.
Despite his success, Jobs faced several challenges throughout his career. In 1985, he was forced to resign from Apple due to disagreements with the board of directors. During this time, he founded NeXT Computer, a company that developed cutting-edge computer software. In 1996, Apple acquired NeXT, and Jobs returned to the company as CEO. He later presided over a period of significant growth and innovation at Apple, including the development of the iPhone and iPad.
Jobs' impact on the technology industry cannot be overstated. He was a visionary who saw the potential of technology to change the world, and he worked tirelessly to turn his vision into reality. He was a master of design and user experience, and his products changed the way people interact with technology. Jobs was also a great leader who inspired others to pursue their dreams and to think differently about the world.
In conclusion, Steve Jobs was a visionary entrepreneur who left a lasting impact on the technology and animation industries. His innovative approach to design, technology, and marketing revolutionized the way people interact with technology, and his legacy continues to inspire entrepreneurs and innovators today. Jobs will always be remembered as a brilliant entrepreneur and a master of design, who pushed the boundaries of what was possible and made a lasting impact on the world'''

### Preprocessing

#### Lowercasing and Removing Special Characters and Extra Spaces

In [45]:
text_data = text_data.lower()
text_data

'steve jobs was a visionary entrepreneur who co-founded and served as the ceo of apple inc., one of the world\'s largest technology companies. he was born on february 24, 1955, in san francisco, california, and passed away on october 5, 2011. jobs was widely recognized for his innovative approach to technology, design, and marketing, which revolutionized the computer and mobile phone industries.\njobs co-founded apple computer in 1976 with steve wozniak, and the company quickly gained popularity with the release of its first computer, the apple i. jobs had a passion for design and user experience, and he worked tirelessly to ensure that apple products were not only functional, but also visually appealing. under his leadership, apple released several groundbreaking products, including the macintosh computer, the imac, and the ipod, which changed the way people interact with technology.\nin addition to his work at apple, jobs was also a pioneer in the computer graphics industry. he co-fo

In [46]:
for ch in '~!@#$%^&*()_+-=<>?,/:;"{}[]\n':
    text_data = text_data.replace(ch, ' ')  # replace each special character (and newline) with a space

text_data = text_data.replace("'", " ")    # replace single quotes with a space

text_data = text_data.replace('  ', ' ')   # collapse double spaces into single spaces

text_data = text_data.replace('  ', ' ')   # second pass: collapse double spaces left over from longer runs

text_data

'steve jobs was a visionary entrepreneur who co founded and served as the ceo of apple inc. one of the world s largest technology companies. he was born on february 24 1955 in san francisco california and passed away on october 5 2011. jobs was widely recognized for his innovative approach to technology design and marketing which revolutionized the computer and mobile phone industries. jobs co founded apple computer in 1976 with steve wozniak and the company quickly gained popularity with the release of its first computer the apple i. jobs had a passion for design and user experience and he worked tirelessly to ensure that apple products were not only functional but also visually appealing. under his leadership apple released several groundbreaking products including the macintosh computer the imac and the ipod which changed the way people interact with technology. in addition to his work at apple jobs was also a pioneer in the computer graphics industry. he co founded and served as th
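
The chained `replace` calls above handle only short runs of spaces. The same cleanup can be done in one pass with regular expressions. The following is a minimal sketch, assuming the same set of special characters; `clean_text` and `SPECIAL_CHARS` are hypothetical names that do not appear in the notebook, and unlike the cell above it also trims leading and trailing spaces. Periods are kept so that sentence tokenization can still find sentence boundaries.

```python
import re

# Hypothetical constant: the same characters removed by the loop in the notebook,
# plus the single quote that the notebook removes in a separate step.
SPECIAL_CHARS = '~!@#$%^&*()_+-=<>?,/:;"{}[]\n' + "'"


def clean_text(text):
    """Lowercase the text, replace special characters with spaces, collapse space runs."""
    text = text.lower()
    # re.escape makes every character literal inside the character class
    text = re.sub('[' + re.escape(SPECIAL_CHARS) + ']', ' ', text)
    # collapse a run of spaces of any length into one space and trim the ends
    return re.sub(r' +', ' ', text).strip()


sample = "Jobs co-founded Apple Inc., the world's largest company."
print(clean_text(sample))
```

Because `' +'` matches a run of any length, no second pass is needed.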

#### Tokenizing

In [95]:
tokenized_words = word_tokenize(text_data)       # split the whole text into word tokens
tokenized_sentences = sent_tokenize(text_data)   # split the whole text into sentences

In [49]:
tokenized_words = [word for word in tokenized_words if len(word) != 0]        # drop empty tokens
word_index = {word: i for i, word in enumerate(set(tokenized_words))}         # map each unique word to an integer index
index_word = {word_index[word]: word for word in word_index}                  # reverse mapping: index to word

word_sent_tokenize = [word_tokenize(sent) for sent in tokenized_sentences]    # tokenize each sentence into words, giving a list of token lists
word_sent_tokenize

[['steve',
  'jobs',
  'was',
  'a',
  'visionary',
  'entrepreneur',
  'who',
  'co',
  'founded',
  'and',
  'served',
  'as',
  'the',
  'ceo',
  'of',
  'apple',
  'inc.',
  'one',
  'of',
  'the',
  'world',
  's',
  'largest',
  'technology',
  'companies',
  '.'],
 ['he',
  'was',
  'born',
  'on',
  'february',
  '24',
  '1955',
  'in',
  'san',
  'francisco',
  'california',
  'and',
  'passed',
  'away',
  'on',
  'october',
  '5',
  '2011.',
  'jobs',
  'was',
  'widely',
  'recognized',
  'for',
  'his',
  'innovative',
  'approach',
  'to',
  'technology',
  'design',
  'and',
  'marketing',
  'which',
  'revolutionized',
  'the',
  'computer',
  'and',
  'mobile',
  'phone',
  'industries',
  '.'],
 ['jobs',
  'co',
  'founded',
  'apple',
  'computer',
  'in',
  '1976',
  'with',
  'steve',
  'wozniak',
  'and',
  'the',
  'company',
  'quickly',
  'gained',
  'popularity',
  'with',
  'the',
  'release',
  'of',
  'its',
  'first',
  'computer',
  'the',
  'apple',
  'i

In [50]:
index_word

{0: 'that',
 1: 'i.',
 2: 'was',
 3: 'under',
 4: 'presided',
 5: 'innovation',
 6: 'leader',
 7: 'differently',
 8: 'potential',
 9: 'change',
 10: 'marketing',
 11: 'gained',
 12: 'developed',
 13: 'legacy',
 14: 'experience',
 15: 'story',
 16: 'addition',
 17: 'disagreements',
 18: 'new',
 19: 'later',
 20: 'board',
 21: 'filmmaking',
 22: 'time',
 23: 'directors',
 24: 'pushed',
 25: '1985',
 26: 'innovative',
 27: 'period',
 28: 'revolutionized',
 29: 'pioneer',
 30: 'industries',
 31: 'nemo',
 32: 'appealing',
 33: 'california',
 34: 'entrepreneur',
 35: 'due',
 36: 'and',
 37: 'he',
 38: 'for',
 39: 'its',
 40: 'ensure',
 41: 'this',
 42: 'will',
 43: 'during',
 44: 'graphics',
 45: 'worked',
 46: 'tirelessly',
 47: 'one',
 48: 'from',
 49: 'finding',
 50: 'popularity',
 51: 'also',
 52: 'inspired',
 53: 'reality',
 54: 'interact',
 55: 'visually',
 56: 'saw',
 57: 'people',
 58: 'of',
 59: 'can',
 60: 'vision',
 61: 'but',
 62: 'changed',
 63: 'brilliant',
 64: 'on',
 65: '195
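
Iteration order over a `set` of strings can differ between Python sessions, so the indices shown above may change from run to run. If reproducible indices are wanted, one option is to sort the unique tokens before numbering them. This is a sketch under that assumption; `build_vocab` is a hypothetical helper and the token list is a toy example, not the notebook's data.

```python
def build_vocab(tokens):
    """Return (word_index, index_word) with indices assigned in sorted order."""
    unique_words = sorted(set(tokens))  # sorting makes the numbering deterministic
    word_index = {word: i for i, word in enumerate(unique_words)}
    index_word = {i: word for word, i in word_index.items()}
    return word_index, index_word


# toy token list for illustration only
tokens = ['steve', 'jobs', 'was', 'a', 'visionary', 'jobs', 'was']
word_index, index_word = build_vocab(tokens)
print(word_index)
print(index_word)
```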

### Creating Word Vectors

In [51]:
window_size = 2   # number of context words taken on each side of the target word

features    = []  # context words (left and right neighbours) for each target word
labels      = []  # target words, aligned with features by position

In [56]:
# Build (context, target) pairs: slide a window over each sentence and take
# window_size words on each side of the target word

for sent in word_sent_tokenize:
    for i in range(len(sent) - (window_size * 2)):
        features.append(sent[i : i + window_size] + sent[i + window_size + 1 : i + window_size * 2 + 1])
        labels.append(sent[i + window_size])

In [60]:
for feature, label in zip(features, labels):
    print(feature, label)

['steve', 'jobs', 'a', 'visionary'] was
['jobs', 'was', 'visionary', 'entrepreneur'] a
['was', 'a', 'entrepreneur', 'who'] visionary
['a', 'visionary', 'who', 'co'] entrepreneur
['visionary', 'entrepreneur', 'co', 'founded'] who
['entrepreneur', 'who', 'founded', 'and'] co
['who', 'co', 'and', 'served'] founded
['co', 'founded', 'served', 'as'] and
['founded', 'and', 'as', 'the'] served
['and', 'served', 'the', 'ceo'] as
['served', 'as', 'ceo', 'of'] the
['as', 'the', 'of', 'apple'] ceo
['the', 'ceo', 'apple', 'inc.'] of
['ceo', 'of', 'inc.', 'one'] apple
['of', 'apple', 'one', 'of'] inc.
['apple', 'inc.', 'of', 'the'] one
['inc.', 'one', 'the', 'world'] of
['one', 'of', 'world', 's'] the
['of', 'the', 's', 'largest'] world
['the', 'world', 'largest', 'technology'] s
['world', 's', 'technology', 'companies'] largest
['s', 'largest', 'companies', '.'] technology
['he', 'was', 'on', 'february'] born
['was', 'born', 'february', '24'] on
['born', 'on', '24', '1955'] february
['on', 'februa
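
The loop above emits a pair only when a full window exists on both sides, so the first and last `window_size` tokens of each sentence never appear as labels. One possible variant keeps those edge tokens and gives them a shorter context. This is a sketch of that alternative, not the notebook's method; `context_pairs` is a hypothetical helper.

```python
def context_pairs(sentence, window_size=2):
    """Yield (context, target) pairs, truncating the window at sentence edges."""
    for i, target in enumerate(sentence):
        left = sentence[max(0, i - window_size):i]    # up to window_size words before the target
        right = sentence[i + 1:i + 1 + window_size]   # up to window_size words after the target
        yield left + right, target


# toy sentence for illustration only
sent = ['he', 'was', 'born', 'on', 'february']
for context, target in context_pairs(sent):
    print(context, target)
```

For targets with a full window, such as `born` here, the context matches what the notebook's loop produces. Edge targets get between `window_size` and `2 * window_size - 1` context words.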

#### Creating Vectors: One-Hot Encoding

In [89]:
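context_words = []
for feature in features:
    enc = np.zeros(len(word_index))   # one slot per vocabulary word
    for word in feature:
        enc[word_index[word]] = 1     # mark every context word (a multi-hot vector)

    context_words.append(enc)

target_words = []

for label in labels:
    enc = np.zeros(len(word_index))
    enc[word_index[label]] = 1        # one-hot vector for the target word
    target_words.append(enc)


# convert the lists of vectors into 2D arrays, one row per training pair
context_words = np.array(context_words)
target_words = np.array(target_words)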

In [94]:
context_words[0]

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
       0., 1., 0., 0., 0., 0., 0.])
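
A long vector of zeros and ones is hard to check by eye. One way to verify an encoding is to map the non-zero positions back to words through an index-to-word dictionary like `index_word`. The sketch below assumes such a dictionary; `decode` is a hypothetical helper and the five-word vocabulary is a toy example. It works on plain lists and on NumPy arrays alike.

```python
def decode(vector, index_word):
    """Return the words whose positions are set (non-zero) in an encoded vector."""
    return [index_word[i] for i, value in enumerate(vector) if value]


# toy vocabulary and vectors for illustration only
index_word = {0: 'a', 1: 'jobs', 2: 'steve', 3: 'visionary', 4: 'was'}
context_vector = [1.0, 1.0, 1.0, 1.0, 0.0]   # multi-hot: four context words
target_vector = [0.0, 0.0, 0.0, 0.0, 1.0]    # one-hot: the target word

print(decode(context_vector, index_word))
print(decode(target_vector, index_word))
```

The decoded words come back in index order, not in their original sentence order, because the encoding does not keep word positions.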