Skip to content
This repository has been archived by the owner on Sep 11, 2020. It is now read-only.

ValueError: Variable topic_embedding already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at: #40

Open
dbl001 opened this issue Mar 25, 2019 · 9 comments
Labels
bug Something isn't working

Comments

@dbl001
Copy link
Collaborator

dbl001 commented Mar 25, 2019

I get this error when pretrained_embeddings=None

m = model(num_docs,
          vocab_size,
          num_topics=num_topics,
          #embedding_size=embed_size,
          restore=False,
          #logdir="/data/",
          pretrained_embeddings=None,
          freqs=freqs)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-2552ac324704> in <module>()
     25           #logdir="/data/",
     26           pretrained_embeddings=None,
---> 27           freqs=freqs)
     28 
     29 m.train(pivot_ids,target_ids,doc_ids, len(pivot_ids), num_epochs, idx_to_word=idx_to_word,  switch_loss_epoch=5)

~/Lda2vec-Tensorflow/lda2vec/Lda2vec.py in __init__(self, num_unique_documents, vocab_size, num_topics, freqs, save_graph_def, embedding_size, num_sampled, learning_rate, lmbda, alpha, power, batch_size, logdir, restore, fixed_words, factors_in, pretrained_embeddings)
     76                                             power=self.power)
     77             # Initialize the Topic-Document Mixture
---> 78             self.mixture = M.EmbedMixture(self.num_unique_documents, self.num_topics, self.embedding_size)
     79 
     80 

~/Lda2vec-Tensorflow/lda2vec/embedding_mixture.py in __init__(self, n_documents, n_topics, n_dim, temperature, W_in, factors_in, name)
     27         self.topic_embedding = tf.get_variable('topic_embedding', shape=[n_topics, n_dim],
     28                                                dtype=tf.float32,
---> 29                                                initializer=tf.orthogonal_initializer(gain=scalar)) if factors_in is None else factors_in
     30 
     31 

~/anaconda/envs/ai/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
   1485       constraint=constraint,
   1486       synchronization=synchronization,
-> 1487       aggregation=aggregation)
   1488 
   1489 

~/anaconda/envs/ai/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
   1235           constraint=constraint,
   1236           synchronization=synchronization,
-> 1237           aggregation=aggregation)
   1238 
   1239   def _get_partitioned_variable(self,

~/anaconda/envs/ai/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
    538           constraint=constraint,
    539           synchronization=synchronization,
--> 540           aggregation=aggregation)
    541 
    542   def _get_partitioned_variable(self,

~/anaconda/envs/ai/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, constraint, synchronization, aggregation)
    490           constraint=constraint,
    491           synchronization=synchronization,
--> 492           aggregation=aggregation)
    493 
    494     # Set trainable value based on synchronization value.

~/anaconda/envs/ai/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
    859                          "reuse=tf.AUTO_REUSE in VarScope? "
    860                          "Originally defined at:\n\n%s" % (
--> 861                              name, "".join(traceback.format_list(tb))))
    862       found_var = self._vars[name]
    863       if not shape.is_compatible_with(found_var.get_shape()):

ValueError: Variable topic_embedding already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

  File "/home/ubuntu/Lda2vec-Tensorflow/lda2vec/embedding_mixture.py", line 29, in __init__
    initializer=tf.orthogonal_initializer(gain=scalar)) if factors_in is None else factors_in
  File "/home/ubuntu/Lda2vec-Tensorflow/lda2vec/Lda2vec.py", line 78, in __init__
    self.mixture = M.EmbedMixture(self.num_unique_documents, self.num_topics, self.embedding_size)
  File "<ipython-input-8-6f2c3ffe8774>", line 27, in <module>
    freqs=freqs
@nateraw
Copy link
Owner

nateraw commented Mar 25, 2019

Does the twenty newsgroups example work when you set pretrained_embeddings=False right now? There weren't any errors for me yesterday.

I'm assuming you pulled since my commit last night?

@nateraw nateraw added the bug Something isn't working label Mar 25, 2019
@dbl001
Copy link
Collaborator Author

dbl001 commented Mar 25, 2019 via email

@dbl001
Copy link
Collaborator Author

dbl001 commented Mar 25, 2019 via email

@nateraw
Copy link
Owner

nateraw commented Mar 25, 2019

For my own sanity, recommenting for you so I can format with markdown. Can't reformat email replies

import pandas as pd
from lda2vec.nlppipe import Preprocessor

# Data directory
data_dir ="data"
# Where to save preprocessed data
clean_data_dir = "data/clean_data_twenty_newsgroups"
# Name of input file. Should be inside of data_dir
input_file = "20_newsgroups.txt"
# Should we load pretrained embeddings from file
load_embeds = True

# Read in data file
df = pd.read_csv(data_dir+"/"+input_file, sep="\t")

# Initialize a preprocessor
P = Preprocessor(df, "texts", max_features=30000, maxlen=10000, min_count=30, nlp="en_core_web_lg")

# Run the preprocessing on your dataframe
P.preprocess()

# Load embeddings from file if we choose to do so
if load_embeds:
    # Load embedding matrix from file path - change path to where you saved them
    embedding_matrix = P.load_glove("glove.6B.300d.txt")
else:
    embedding_matrix = None

# Save data to data_dir
P.save_data(clean_data_dir, embedding_matrix=embedding_matrix)

from lda2vec import utils, model

# Path to preprocessed data
data_path  = "data/clean_data_twenty_newsgroups"
# Whether or not to load saved embeddings file
load_embeds = True

# Load data from files
(idx_to_word, word_to_idx, freqs, pivot_ids,
 target_ids, doc_ids, embed_matrix) = utils.load_preprocessed_data(data_path, load_embed_matrix=load_embeds)

# Number of unique documents
num_docs = doc_ids.max() + 1
# Number of unique words in vocabulary (int)
vocab_size = len(freqs)
# Embed layer dimension size
# If not loading embeds, change 128 to whatever size you want.
embed_size = embed_matrix.shape[1] if load_embeds else 128
# Number of topics to cluster into
num_topics = 20
# Amount of iterations over entire dataset
num_epochs = 200
# Batch size - Increase/decrease depending on memory usage
batch_size = 500
# Epoch that we want to "switch on" LDA loss
switch_loss_epoch = 0
# Pretrained embeddings value
pretrained_embeddings = embed_matrix if load_embeds else None
# If True, save logdir, otherwise don't
save_graph = True


# Initialize the model
m = model(num_docs,
          vocab_size,
          num_topics,
          embedding_size=embed_size,
          pretrained_embeddings=pretrained_embeddings,
          freqs=freqs,
          batch_size = batch_size,
          save_graph_def=save_graph)

# Train the model
m.train(pivot_ids,
        target_ids,
        doc_ids,
        len(pivot_ids),
        num_epochs,
        idx_to_word=idx_to_word,
        switch_loss_epoch=switch_loss_epoch)

# Visualize topics with pyldavis
utils.generate_ldavis_data(data_path, m, idx_to_word, freqs, vocab_size)

@dbl001
Copy link
Collaborator Author

dbl001 commented Mar 25, 2019 via email

@nateraw
Copy link
Owner

nateraw commented Mar 25, 2019

No worries 😄

@nateraw
Copy link
Owner

nateraw commented Mar 25, 2019

At work right now so I can't really help. Just adding this to my todo list tonight. Will get back to you. Sorry that this stuff is always breaking 🙁 . Sooo many improvements lately but they came with many more bugs.

@nateraw
Copy link
Owner

nateraw commented Apr 1, 2019

Also, just to be clear, this is all supposed to be run on Tensorflow 1.5.0.

@klausrossmann
Copy link

Was running into the same error just now. The TF Version was the problem, with 1.5.0 it's working!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants