# philo2vec

A TensorFlow implementation of word2vec applied to the [Stanford Encyclopedia of Philosophy](http://plato.stanford.edu/). The implementation supports both CBOW and skip-gram models.

For more background, have a look at these papers:

* [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/abs/1301.3781) (Mikolov et al., 2013)
* [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/abs/1310.4546) (Mikolov et al., 2013)

After training, the model returns some interesting results; see the *Some interesting results* section below. For example:

Evaluating hume - empiricist + rationalist:

descartes
malebranche
spinoza
hobbes
herder

*(plot of the trained word embeddings)*

## Some interesting results

### Similarities

Similar words to death:

untimely
ravages
grief
torment

Similar words to god:

divine
De Providentia
christ
Hesiod

Similar words to love:

friendship
affection
christ
reverence

Similar words to life:

career
live
lifetime
community
society

Similar words to brain:

neurological
senile
nerve
nervous

### Operations

Evaluating hume - empiricist + rationalist:

descartes
malebranche
spinoza
hobbes
herder

Evaluating ethics - rational:

hiroshima

Evaluating ethic - reason:

inegalitarian
anti-naturalist
austere

Evaluating moral - rational:

commonsense

Evaluating life - death + love:

self-positing
friendship
care
harmony

Evaluating death + choice:

regret
agony
misfortune
impending

Evaluating god + human:

divine
inviolable
yahweh
god-like
man

Evaluating god + religion:

amida
torah
scripture
buddha
sokushinbutsu

Evaluating politic + moral:

rights-oriented
normative
ethics
integrity

The repo contains:

* `PlatoData`: an object to crawl data from the philosophy encyclopedia.
* `VocabBuilder`: an object to build the vocabulary based on the crawled data.
* `Philo2Vec`: the model that computes the continuous distributed representations of words.

## Installation

The dependencies used for this module can be easily installed with pip:

```
pip install -r requirements.txt
```

The params for the `VocabBuilder`:

* `min_frequency`: the minimum frequency a word must have to be used in the model.
* `size`: the size of the vocabulary; the model then uses the `size` most frequent words (see the sketch after this list).
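A minimal sketch of what that filtering amounts to, assuming only the two params above (the actual `VocabBuilder` in this repo may track more state):

```python
from collections import Counter

def build_vocab(tokens, min_frequency=5, size=10000):
    # keep the `size` most frequent words that occur at least `min_frequency`
    # times; everything else maps to a catch-all UNK id (a common word2vec
    # convention, assumed here for illustration)
    counts = Counter(tokens)
    kept = [w for w, c in counts.most_common(size) if c >= min_frequency]
    word_to_id = {'UNK': 0}
    for w in kept:
        word_to_id[w] = len(word_to_id)
    return word_to_id
```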

The hyperparams of the model:

* `optimizer`: an instance of a TensorFlow `Optimizer`, such as `GradientDescentOptimizer`, `AdagradOptimizer`, or `MomentumOptimizer`.
* `model`: the model used to create the vectorized representation; possible values: `CBOW`, `SKIP_GRAM`.
* `loss_fct`: the loss function used to calculate the error; possible values: `SOFTMAX`, `NCE`.
* `embedding_size`: the dimensionality of the word embeddings.
* `neg_sample_size`: the number of negative samples drawn for each positive sample.
* `num_skips`: the number of skips for a `SKIP_GRAM` model, i.e. how many context words are predicted per target (see the sketch after this list).
* `context_window`: the window size used to create the context for calculating the vector representations: `[context_window target context_window]`.
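To make `context_window` and `num_skips` concrete, here is a hedged sketch of how training pairs could be derived from a token stream (illustrative only; the actual batching logic lives in `models.py` and may differ):

```python
import random

def cbow_pairs(tokens, context_window=2):
    # CBOW: predict the target word from its surrounding window
    # [context_window words, target, context_window words]
    pairs = []
    for i in range(context_window, len(tokens) - context_window):
        context = tokens[i - context_window:i] + tokens[i + 1:i + 1 + context_window]
        pairs.append((context, tokens[i]))
    return pairs

def skip_gram_pairs(tokens, context_window=2, num_skips=4):
    # skip-gram: predict up to `num_skips` sampled context words from each target
    pairs = []
    for i in range(context_window, len(tokens) - context_window):
        context = tokens[i - context_window:i] + tokens[i + 1:i + 1 + context_window]
        for c in random.sample(context, min(num_skips, len(context))):
            pairs.append((tokens[i], c))
    return pairs
```

With `context_window=2` and `num_skips=4`, each target word yields four (target, context) pairs drawn from the four words around it.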

## Quick usage

`Philo2Vec`, `VocabBuilder`, `get_data`, and `StemmingLookup` all come from this repo's modules. Training a CBOW model with the NCE loss:

```python
params = {
    'model': Philo2Vec.CBOW,
    'loss_fct': Philo2Vec.NCE,
    'context_window': 5,
}
x_train = get_data()
validation_words = ['kant', 'descartes', 'human', 'natural']
x_validation = [StemmingLookup.stem(w) for w in validation_words]
vb = VocabBuilder(x_train, min_frequency=5)
pv = Philo2Vec(vb, **params)
pv.fit(epochs=30, validation_data=x_validation)
```

Training a skip-gram model with the softmax loss:

```python
params = {
    'model': Philo2Vec.SKIP_GRAM,
    'loss_fct': Philo2Vec.SOFTMAX,
    'context_window': 2,
    'num_skips': 4,
    'neg_sample_size': 2,
}
x_train = get_data()
validation_words = ['kant', 'descartes', 'human', 'natural']
x_validation = [StemmingLookup.stem(w) for w in validation_words]
vb = VocabBuilder(x_train, min_frequency=5)
pv = Philo2Vec(vb, **params)
pv.fit(epochs=30, validation_data=x_validation)
```

## About stemming

Since the words are stemmed as part of the preprocessing, some operations are sometimes necessary:

```python
StemmingLookup.stem('religious')  # returns "religi"

StemmingLookup.original_form('religi')  # returns "religion"
```
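For illustration, a plausible sketch of the idea behind `StemmingLookup`, assuming a Porter-style stemmer plus a reverse map from each stem to the original forms that produced it (the real class in `preprocessors.py` may resolve originals differently):

```python
from collections import Counter, defaultdict
from nltk.stem import PorterStemmer

class StemmingLookupSketch:
    """Stem words and remember which original forms produced each stem."""
    _stemmer = PorterStemmer()
    _originals = defaultdict(Counter)  # stem -> Counter of original forms

    @classmethod
    def stem(cls, word):
        stemmed = cls._stemmer.stem(word)
        cls._originals[stemmed][word] += 1
        return stemmed

    @classmethod
    def original_form(cls, stemmed):
        # return the most frequently seen original form for this stem,
        # falling back to the stem itself if it was never recorded
        candidates = cls._originals.get(stemmed)
        return candidates.most_common(1)[0][0] if candidates else stemmed
```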

## Getting similarities

```python
pv.get_similar_words(['rationalist', 'empirist'])
```

## Evaluating operations

```python
pv.evaluate_operation('moral - rational')
```
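Both `get_similar_words` and `evaluate_operation` boil down to nearest-neighbour search over the learned embedding matrix. A minimal sketch of that computation, assuming a numpy matrix `embeddings` of shape `(vocab_size, embedding_size)` and `word_to_id` / `id_to_word` lookup dicts (all hypothetical names, not the repo's actual internals):

```python
import numpy as np

def nearest_words(query_vec, embeddings, id_to_word, top_k=5):
    # cosine similarity between the query and every row-normalized embedding
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    scores = normed.dot(query)
    return [id_to_word[i] for i in np.argsort(-scores)[:top_k]]

def evaluate_difference(word_a, word_b, embeddings, word_to_id, id_to_word):
    # 'moral - rational' reduces to plain vector arithmetic on the embeddings,
    # followed by a nearest-neighbour lookup of the resulting vector
    vec = embeddings[word_to_id[word_a]] - embeddings[word_to_id[word_b]]
    return nearest_words(vec, embeddings, id_to_word)
```

A real implementation would typically also exclude the query words themselves from the returned neighbours.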

## Plotting vectorized words

```python
pv.plot(['hume', 'empiricist', 'descart', 'rationalist'])
```
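`plot` presumably projects the selected word vectors down to two dimensions before drawing them. A hedged sketch of one way to do that with scikit-learn's t-SNE and matplotlib (the actual implementation in `models.py` may use a different projection):

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_words(words, embeddings, word_to_id):
    # project the chosen embeddings to 2-D and label each point with its word
    vectors = embeddings[[word_to_id[w] for w in words]]
    points = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(vectors)
    plt.scatter(points[:, 0], points[:, 1])
    for (x, y), word in zip(points, words):
        plt.annotate(word, xy=(x, y))
    plt.show()
```

(`perplexity` must stay below the number of plotted words for t-SNE to run.)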

## Training details

skip_gram:

*(plots of the skip_gram loss, embeddings, weights, and biases)*

cbow:

*(plots of the cbow loss, embedding, weights, and biases)*
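For context, the plotted `loss`, `embeddings`, `w`, and `b` correspond to the variables of a word2vec graph. A minimal TensorFlow 1.x-style sketch of the NCE variant (shapes and names are illustrative, not the repo's actual code):

```python
import tensorflow as tf

# hypothetical sizes mirroring the hyperparams above
vocab_size, embedding_size, neg_sample_size = 20000, 128, 64

train_inputs = tf.placeholder(tf.int32, shape=[None])
train_labels = tf.placeholder(tf.int32, shape=[None, 1])

# the embedding matrix is what gets visualized after training
embeddings = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# w and b are the weights and biases of the output (NCE) layer
nce_w = tf.Variable(tf.truncated_normal([vocab_size, embedding_size],
                                        stddev=1.0 / embedding_size ** 0.5))
nce_b = tf.Variable(tf.zeros([vocab_size]))

loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_w, biases=nce_b,
                                     labels=train_labels, inputs=embed,
                                     num_sampled=neg_sample_size,
                                     num_classes=vocab_size))
```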