In [1]:
text_data = """Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning.
Learning can be supervised, semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks, convolutional neural networks and Transformers have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, climate science, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.
Artificial neural networks  were inspired by information processing and distributed communication nodes in biological systems. 
ANNs have various differences from biological brains. Specifically, artificial neural networks tend to be static and symbolic, 
while the biological brain of most living organisms is dynamic (plastic) and analog.
The adjective "deep" in deep learning refers to the use of multiple layers in the network. 
Early work showed that a linear perceptron cannot be a universal classifier, but that a network with a nonpolynomial activation function with one hidden layer of unbounded width can.
Deep learning is a modern variation which is concerned with an unbounded number of layers of bounded size, which permits practical application and optimized implementation, while retaining theoretical universality under mild conditions.
In deep learning the layers are also permitted to be heterogeneous and to deviate widely from biologically informed connectionist models, for the sake of efficiency, trainability and understandability, hence the "structured" part.
Deep neural networks are generally interpreted in terms of the universal approximation theorem or probabilistic inference.
The classic universal approximation theorem concerns the capacity of feedforward neural networks with a single hidden layer of finite size to approximate continuous functions.
In 1989, the first proof was published by George Cybenko for sigmoid activation functions and was generalised to feed-forward multi-layer architectures in 1991 by Kurt Hornik.
Recent work also showed that universal approximation also holds for non-bounded activation functions such as the rectified linear unit.
The universal approximation theorem for deep neural networks concerns the capacity of networks with bounded width but the depth is allowed to grow. 
Lu et al. proved that if the width of a deep neural network with ReLU activation is strictly larger than the input dimension,
then the network can approximate any Lebesgue integrable function; If the width is smaller or equal to the input dimension, then a deep neural network is not a universal approximator.
The probabilistic interpretation derives from the field of machine learning. 
It features inference, as well as the optimization concepts of training and testing, related to fitting and generalization, respectively. 
More specifically, the probabilistic interpretation considers the activation nonlinearity as a cumulative distribution function.
The probabilistic interpretation led to the introduction of dropout as regularizer in neural networks. 
The probabilistic interpretation was introduced by researchers including Hopfield, Widrow and Narendra and popularized in surveys such as the one by Bishop."""

In [2]:
## word2vec model using gensim

In [3]:

import re

from gensim.models import Word2Vec
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

In [4]:
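## Preprocessing
def preprocess(text):
    """Lowercase the text and reduce it to space-separated word characters."""
    text = text.lower()
    # Replace every non-word character (punctuation included) with a space.
    text = re.sub(r'\W', ' ', text)
    # Collapse runs of whitespace, including newlines, into a single space.
    text = re.sub(r'\s+', ' ', text)
    return text

text_data = preprocess(text_data)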
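
Because `\W` also matches periods, the cleaned string no longer carries sentence boundaries, so a sentence tokenizer applied afterwards sees one long sentence. A minimal sketch of one alternative, splitting before cleaning, follows. It assumes a naive regex splitter rather than NLTK's `sent_tokenize`; `clean` and `split_then_clean` are hypothetical helpers, and the sample string is made up for illustration.

```python
import re

def clean(text):
    """Lowercase, replace non-word characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r'\W', ' ', text)
    return re.sub(r'\s+', ' ', text).strip()

def split_then_clean(text):
    """Hypothetical helper: split after ., ! or ? followed by whitespace, then clean each piece."""
    pieces = re.split(r'(?<=[.!?])\s+', text.strip())
    cleaned = [clean(piece) for piece in pieces]
    # Drop pieces that are empty after cleaning.
    return [piece for piece in cleaned if piece]

# Made-up sample text, not the notebook's corpus.
sample = "Deep learning uses layers. Learning can be supervised, or unsupervised!"

print(clean(sample))             # one string, sentence boundary lost
print(split_then_clean(sample))  # one cleaned string per sentence
```

Splitting first keeps one token list per sentence, which is the input shape `Word2Vec` expects.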


In [5]:
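## Tokenization
def create_tokens(text):
    """Split text into sentences, then each sentence into word tokens."""
    # text_data has already had its punctuation stripped by preprocess(),
    # so sent_tokenize is expected to return a single long sentence here.
    sentences = sent_tokenize(text)
    return [word_tokenize(sentence) for sentence in sentences]

word_tokens = create_tokens(text_data)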

In [6]:
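import nltk
# One-time download. sent_tokenize and WordNetLemmatizer likewise need the
# 'punkt' ('punkt_tab' on newer NLTK releases) and 'wordnet' resources.
nltk.download('stopwords')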

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/rishabhsharma/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [7]:
## Remove Stopwords
def remove_stopwords(word_tokens):
    """Drop English stopwords from every tokenized sentence."""
    stop_words = set(stopwords.words('english'))
    return [[w for w in sentence if w not in stop_words] for sentence in word_tokens]

word_tokens = remove_stopwords(word_tokens)

In [10]:
## Lemmatization
def lemmatize_words(filtered_tokens):
    """Lemmatize every token; without a POS tag, lemmatize() treats words as nouns."""
    lemmatizer = WordNetLemmatizer()
    return [[lemmatizer.lemmatize(w) for w in sentence] for sentence in filtered_tokens]

In [11]:
lemmatized_output = lemmatize_words(word_tokens)

In [12]:
## Word2Vec Model
# gensim's default architecture is CBOW (sg=0); min_count=1 keeps words that occur only once.
model = Word2Vec(lemmatized_output, min_count=1)
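
`min_count` tells Word2Vec to ignore words whose total frequency falls below the threshold, which is why it is lowered to 1 for a corpus this small. The sketch below illustrates the idea with `collections.Counter` on made-up token lists. `filter_vocab` is a hypothetical helper for illustration, not gensim's implementation.

```python
from collections import Counter

def filter_vocab(sentences, min_count):
    """Return the words that occur at least min_count times across all sentences."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}

# Made-up token lists, not the notebook's corpus.
toy_sentences = [
    ["deep", "learning", "network"],
    ["deep", "network", "layer"],
    ["probabilistic", "interpretation"],
]

print(sorted(filter_vocab(toy_sentences, min_count=1)))  # every word survives
print(sorted(filter_vocab(toy_sentences, min_count=2)))  # only repeated words survive
```

With a higher threshold most of a short text would be dropped from the vocabulary, and `most_similar` would raise a `KeyError` for any word that was filtered out.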


In [14]:
## Similarity
sim_words = model.wv.most_similar('deep')
print(sim_words)

[('considers', 0.21648749709129333), ('capacity', 0.18907596170902252), ('one', 0.1884191483259201), ('science', 0.1849723756313324), ('living', 0.18418720364570618), ('allowed', 0.18141457438468933), ('produced', 0.18017159402370453), ('widely', 0.17434096336364746), ('trainability', 0.1719604879617691), ('survey', 0.1692296266555786)]
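
The scores returned by `most_similar` are cosine similarities between word vectors. A minimal sketch of that calculation in plain Python, using made-up 3-dimensional vectors rather than the model's learned embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors for illustration only.
vec_a = [1.0, 0.0, 0.0]
vec_b = [1.0, 1.0, 0.0]
vec_c = [0.0, 0.0, 2.0]

print(cosine_similarity(vec_a, vec_b))  # about 0.707 (45 degrees apart)
print(cosine_similarity(vec_a, vec_c))  # 0.0 (orthogonal)
```

Scores around 0.2, as in the output above, indicate only weak alignment. With a corpus of a few hundred tokens, the ranking is likely dominated by noise and should not be over-interpreted.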


In [16]:

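## Word2Vec Model (skip-gram)
# sg=1 selects skip-gram instead of the default CBOW; window=5 is the maximum context distance.
model = Word2Vec(lemmatized_output, min_count=1, window=5, sg=1)

## Similarity
sim_words = model.wv.most_similar('deep')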
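
With `sg=1` the model is trained as skip-gram, which learns to predict context words from a center word, and `window` bounds how far away a context word may be. The sketch below enumerates the (center, context) pairs for a fixed window on a toy sentence. `skipgram_pairs` is a hypothetical helper; actual training involves details this sketch ignores, such as negative sampling.

```python
def skipgram_pairs(tokens, window):
    """List (center, context) pairs for every token within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# Made-up sentence for illustration only.
toy = ["deep", "neural", "network", "layer"]

for pair in skipgram_pairs(toy, window=1):
    print(pair)
```

A window of 5 on this four-token sentence pairs every word with every other word, so on very short inputs the window size makes little difference.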

In [17]:
sim_words

[('considers', 0.2331845611333847),
 ('capacity', 0.19747082889080048),
 ('produced', 0.19468210637569427),
 ('one', 0.19266347587108612),
 ('science', 0.19240933656692505),
 ('allowed', 0.19074752926826477),
 ('widely', 0.18849675357341766),
 ('living', 0.18813900649547577),
 ('trainability', 0.1873253434896469),
 ('analysis', 0.17168308794498444)]

In [20]:
sim_words = model.wv.most_similar('living')
print(sim_words)

[('interpretation', 0.3232540786266327), ('translation', 0.2383796125650406), ('work', 0.23218710720539093), ('distribution', 0.2225135862827301), ('led', 0.20174400508403778), ('machine', 0.19200050830841064), ('deep', 0.18813902139663696), ('result', 0.1761387288570404), ('hence', 0.17364022135734558), ('comparable', 0.17268885672092438)]
