## Computers work strictly with numbers, so I first need a model that maps words to numerical representations (called vectors).

To do that, I have to install an additional package (run the cell below by clicking in it and pressing Shift+Enter).

In [3]:
# Run this cell
!pip install gensim



Now I'm going to import the libraries my code needs (Shift+Enter to run).

In [4]:
# Run this cell
import gensim.downloader as api
import numpy as np

Download and load a pre-trained model (GloVe vectors trained on Twitter data, 25 numbers per word) and "vectorize" two words, king and queen (may take a minute or two to run).

In [6]:
# Run this cell

# Download and load the pre-trained GloVe model trained on Twitter data
# (the variable keeps the name word2vec_model even though the vectors are GloVe)
word2vec_model = api.load("glove-twitter-25")

# Get word vectors
word_vector1 = word2vec_model['king']
word_vector2 = word2vec_model['queen']

Let's see what king and queen look like as numerical representations.

In [7]:
# Run this cell
print("Word Vector 1 (king):", word_vector1)
print("Word Vector 2 (queen):", word_vector2)

Word Vector 1 (king): [-0.74501  -0.11992   0.37329   0.36847  -0.4472   -0.2288    0.70118
  0.82872   0.39486  -0.58347   0.41488   0.37074  -3.6906   -0.20101
  0.11472  -0.34661   0.36208   0.095679 -0.01765   0.68498  -0.049013
  0.54049  -0.21005  -0.65397   0.64556 ]
Word Vector 2 (queen): [-1.1266   -0.52064   0.45565   0.21079  -0.05081  -0.65158   1.1395
  0.69897  -0.20612  -0.71803  -0.02811   0.10977  -3.3089   -0.49299
 -0.51375   0.10363  -0.11764  -0.084972  0.02558   0.6859   -0.29196
  0.4594   -0.39955  -0.40371   0.31828 ]


We can apply different numerical operations to these vectors, e.g. scaling or addition. I think operations like these are what you would use when training a model so that it identifies patterns in text!
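Scaling, as mentioned above, just means multiplying every number in a vector by the same factor. Here is a minimal sketch in plain Python (so it runs without the model download), using only the first four numbers of the king vector printed earlier. The helper names `scale` and `length` are my own for illustration and are not part of gensim.

```python
import math

# First four components of the "king" vector printed above
# (truncated purely for illustration; the real vector has 25 numbers).
king_part = [-0.74501, -0.11992, 0.37329, 0.36847]


def scale(vector, factor):
    """Multiply every component of the vector by the same factor."""
    return [factor * x for x in vector]


def length(vector):
    """Euclidean length (norm) of the vector."""
    return math.sqrt(sum(x * x for x in vector))


doubled = scale(king_part, 2.0)                   # twice as long, same direction
unit = scale(king_part, 1.0 / length(king_part))  # rescaled to length 1

print("original length:", length(king_part))
print("doubled length: ", length(doubled))
print("unit length:    ", length(unit))
```

Scaling changes how long a vector is but not which way it points, which is why rescaling to length 1 is a common preparation step before comparing vectors.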

## Working With Vectors

Let's add these vectors together and see what we get

In [8]:
# Run this cell
# Add the two vectors
result_vector = word_vector1 + word_vector2

# Print the results
print("Result Vector:", result_vector)

Result Vector: [-1.87161    -0.64056003  0.82894003  0.57926    -0.49801    -0.88038
  1.84068     1.5276899   0.18874    -1.3015      0.38677     0.48051
 -6.9995003  -0.694      -0.39903003 -0.24298     0.24444     0.010707
  0.00793     1.3708799  -0.340973    0.99988997 -0.6096     -1.05768
  0.96384   ]


By itself this vector doesn't mean much, but let's look at the ten most similar words and see if they make sense.

In [9]:
# Run this cell
# Find the most similar words to the result vector (remember adding king and queen should give us various royalty)
most_similar_words = word2vec_model.similar_by_vector(result_vector, topn=10)

# Print the most similar words
print("Most Similar Words to Result Vector:")
for word, similarity in most_similar_words:
    print(f"{word}: {similarity}")

Most Similar Words to Result Vector:
king: 0.9806498885154724
queen: 0.9790497422218323
prince: 0.9494946002960205
lady: 0.9438114166259766
aka: 0.9265403747558594
hero: 0.9069766402244568
's: 0.9044355154037476
jack: 0.9005230665206909
princess: 0.9002885818481445
star: 0.8977693915367126


Not a 100% match, but as you can see, several of these are related to royalty (prince, lady, princess), while others (aka, 's, star) are less obviously connected.
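The scores printed next to each word are, as far as I understand gensim's `similar_by_vector`, cosine similarities: a number that compares the direction of two vectors, where 1.0 means they point the same way. Below is a rough sketch of that calculation in plain Python. The three-number vectors are made up for illustration and do not come from the model, and `cosine_similarity` is my own helper, not a gensim function.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for two word vectors.
a = [1.0, 2.0, 0.5]
b = [0.8, 1.5, 1.0]

total = [x + y for x, y in zip(a, b)]  # what the notebook does: add the vectors
average = [t / 2 for t in total]       # the same vector scaled by one half

print("sum vs a:    ", cosine_similarity(total, a))
print("average vs a:", cosine_similarity(average, a))
```

Because cosine similarity ignores length, adding two word vectors and averaging them should give the same ranking of similar words.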

## Word2Vec Playground - Feel free to mix and match words or expand the list

In [12]:
# Edit and run these cells
# Get word vectors
student_word_vector1 = word2vec_model['student'] # Feel free to change this word or add additional words
student_word_vector2 = word2vec_model['teacher']

# Add the two vectors
student_result_vector = student_word_vector1 + student_word_vector2

# Print the results
print("Word Vector 1 (student):", student_word_vector1)
print("Word Vector 2 (teacher):", student_word_vector2)
print("Result Vector:", student_result_vector)

Word Vector 1 (student): [ 0.21425   0.76572  -0.047929 -1.4637    0.99642  -0.46683   0.56335
 -0.58812   1.0889    0.24792  -0.47263  -0.17876  -3.5099    0.82986
  0.97985  -0.46292   0.74787  -0.29517  -0.14853  -0.23281   0.20884
 -0.042099 -1.3691   -0.88508  -0.77564 ]
Word Vector 2 (teacher): [ 0.19484   0.33447   0.93546  -0.48329   1.1514   -1.2814    1.5305
 -0.3708    0.57756   0.24608  -0.78535  -0.32367  -3.6943    0.12604
  1.253     0.014313  1.0713    0.12095   0.07319  -0.47222   0.99402
  0.29617  -0.61875   0.097479 -0.90367 ]
Result Vector: [ 0.40908998  1.10019     0.887531   -1.94699     2.14782    -1.74823
  2.0938501  -0.95892     1.66646     0.49400002 -1.25798    -0.50243
 -7.2042      0.95589995  2.23285    -0.448607    1.81917    -0.17422001
 -0.07534    -0.70503     1.20286     0.254071   -1.98785    -0.787601
 -1.6793101 ]


Now let's find the ten words most similar to this result vector and print them.

In [13]:
# Run this cell
student_most_similar_words = word2vec_model.similar_by_vector(student_result_vector, topn=10)

# Print the most similar words
print("Most Similar Words to Result Vector:")
for word, similarity in student_most_similar_words:
    print(f"{word}: {similarity}")

Most Similar Words to Result Vector:
teacher: 0.963818371295929
student: 0.9602292776107788
teachers: 0.9031844735145569
primary: 0.899797260761261
class: 0.8975030779838562
group: 0.8948927521705627
law: 0.889959454536438
private: 0.8884251117706299
term: 0.8844797611236572
office: 0.8840461373329163
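When you swap in your own words in the playground, a word the model has never seen raises a `KeyError` at the `word2vec_model['...']` lookup. A small sketch of checking first is below. `split_by_vocabulary` is a hypothetical helper of my own, and the toy set stands in for the model so the example runs without the download. I am assuming the loaded model supports the `in` test (`'king' in word2vec_model`), which gensim's keyed vectors do as far as I know.

```python
def split_by_vocabulary(vocabulary, words):
    """Separate words into those present in the vocabulary and those missing."""
    known = [w for w in words if w in vocabulary]
    missing = [w for w in words if w not in vocabulary]
    return known, missing


# Stand-in vocabulary for illustration only; in the notebook you would
# pass word2vec_model itself instead of this set (assumption, see above).
toy_vocabulary = {"student", "teacher", "king", "queen"}

known, missing = split_by_vocabulary(toy_vocabulary, ["student", "Teacher", "zzzxqy"])

print("known:  ", known)
print("missing:", missing)
```

Note that the check is case-sensitive, so "Teacher" and "teacher" count as different words; trying the lowercase form is a reasonable first step when a lookup fails.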
