Model cannot be output if I raise the dimension of the word vectors #17

Closed
chuangys opened this issue Oct 1, 2017 · 10 comments

@chuangys

chuangys commented Oct 1, 2017

Everything works fine with the default parameter settings. But when I raise the dimension of the word vectors to 200 or 300, model training is still fast but it hangs at model output. Could you help me check it?

@pommedeterresautee
Copy link
Owner

pommedeterresautee commented Oct 3, 2017

Hi, can you provide some code (something I can test, with some data)?

I just tried

library(fastrtext)

data("train_sentences")
data("test_sentences")

# prepare data
tmp_file_model <- tempfile()

train_labels <- paste0("__label__", train_sentences[,"class.text"])
train_texts <- tolower(train_sentences[,"text"])
train_to_write <- paste(train_labels, train_texts)
train_tmp_file_txt <- tempfile()
writeLines(text = train_to_write, con = train_tmp_file_txt)

test_labels <- paste0("__label__", test_sentences[,"class.text"])
test_texts <- tolower(test_sentences[,"text"])
test_to_write <- paste(test_labels, test_texts)

# learn model
execute(commands = c("supervised", "-input", train_tmp_file_txt,
                     "-output", tmp_file_model, "-dim", 200, "-lr", 1,
                     "-epoch", 20, "-wordNgrams", 2, "-verbose", 1))

model <- load_model(tmp_file_model)
# predict on the same lowercased text used for training
predict(model, sentences = test_texts[1])

And had no issue...

Can you try -verbose 1 in your command line?

@chuangys
Copy link
Author

chuangys commented Oct 26, 2017

@pommedeterresautee
Your code runs well in my environment, so I need to correct my problem description.
Using the same example data together with pretrained vectors, I can reproduce the hang at model output.

Source code below:

library(fastrtext)
data("train_sentences")
data("test_sentences")

tmp_file_model <- tempfile()
print(tmp_file_model)

# fastText expects each label to carry the "__label__" prefix
train_labels <- paste0("__label__", train_sentences[,"class.text"])
train_texts <- tolower(train_sentences[,"text"])
train_to_write <- paste(train_labels, train_texts)
train_tmp_file_txt <- tempfile()
print(train_tmp_file_txt)
writeLines(text = train_to_write, con = train_tmp_file_txt)

# -dim must match the dimension of the pretrained vectors (300 here)
execute(commands = c("supervised", "-input", train_tmp_file_txt,
                     "-output", tmp_file_model, "-dim", 300, "-lr", 1,
                     "-epoch", 300, "-wordNgrams", 2, "-verbose", 1,
                     "-pretrainedVectors", "e:/baproject/data/pretrainedword2vec/wiki-news-300d-1M.vec"))

The wiki-news-300d-1M.vec file was downloaded from the Facebook Research pretrained vectors page:
https://fasttext.cc/docs/en/english-vectors.html
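Since fastText, as far as I know, refuses pretrained vectors whose dimension differs from `-dim`, a cheap sanity check before a long run is to read only the header of the .vec file. A minimal sketch, assuming the usual .vec layout where the first line holds the number of words and the dimension; `vec_header()` is a hypothetical helper, not part of fastrtext:

```r
# Hypothetical helper (not part of fastrtext): read only the header line of
# a .vec file, assumed to be "<number of words> <dimension>".
vec_header <- function(path) {
  first_line <- readLines(path, n = 1L, warn = FALSE)
  fields <- suppressWarnings(as.integer(strsplit(trimws(first_line), "\\s+")[[1]]))
  if (length(fields) != 2L || anyNA(fields)) {
    stop("first line does not look like a .vec header: ", first_line)
  }
  list(n_words = fields[1], dim = fields[2])
}

# Example usage with the file from the command above:
# hdr <- vec_header("e:/baproject/data/pretrainedword2vec/wiki-news-300d-1M.vec")
# stopifnot(hdr$dim == 300)  # must equal the "-dim" value passed to execute()
```

Reading a single line keeps the check fast even for a file with a million rows.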

@pommedeterresautee
Owner

It may be related to a RAM issue. Did you fix it?

@dockstreet

Hi, I'm having the same issue as @chuangys: it seems to hang with the larger .vec file. I have 16 GB of RAM.

@pommedeterresautee
Owner

Do you have some test code? Did you check the RAM usage? (Models trained by Facebook are quite big.)
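To put a rough number on "quite big", here is a back-of-the-envelope estimate of fastText's input matrix alone. It is a sketch resting on assumptions, not a measurement: 4-byte floats, all ~1M words of wiki-news-300d-1M.vec ending up in the vocabulary, and fastText's default of 2,000,000 hash buckets for word n-grams (relevant here because of `-wordNgrams 2`). `input_matrix_gb()` is a hypothetical helper:

```r
# Hypothetical helper: rough size of fastText's input matrix, assumed to
# hold (vocabulary size + n-gram buckets) rows of `dim` 4-byte floats.
input_matrix_gb <- function(n_words, bucket, dim, bytes_per_value = 4) {
  (n_words + bucket) * dim * bytes_per_value / 1e9
}

# ~1M pretrained words, 2,000,000 buckets, 300 dimensions
input_matrix_gb(n_words = 1e6, bucket = 2e6, dim = 300)  # 3.6 (GB)
```

Under the same assumptions, the saved .bin model stores this matrix too, so it would be of a similar size, and the training words and a temporary copy of the pretrained vectors come on top while loading.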

@dockstreet

dockstreet commented Jan 9, 2018

I do.

execute(commands = c("supervised", "-input", "C:/Users/xxx/R/fasttext_test/train.txt",
                     # fastText appends ".bin" to -output itself, so pass the prefix only
                     "-output", "C:/Users/xxx/R/fasttext_test/train", "-lr", 1,
                     "-epoch", 50, "-wordNgrams", 2, "-verbose", 1))

This worked (while the Facebook one would not); however, I'm using pretrained vectors:
https://github.com/jazzyarchitects/fasttext-node/raw/master/train.txt

Here is the RAM size

memory.limit()
[1] 16204

Do you know of a larger example, using pretrained vectors from an external source, that you know works with fastrtext? It may help clarify whether the problem is my environment or not.

@datalee

datalee commented Jan 22, 2018

Hi, I have a question: does the pretrainedVectors argument not support .vec files produced by gensim? Thanks.

@pommedeterresautee
Owner

@datalee what is the feature you are referring to?

@datalee

datalee commented Jan 22, 2018

@pommedeterresautee classification.

@pommedeterresautee
Owner

pretrainedVectors expects the text file produced by fastText when you learn a model, whatever the model is. I don't know gensim's format, but it should not be hard to convert (word\tvector, where each value is separated by a space).
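A minimal sketch of such a conversion in R, assuming the vectors are already available as a numeric matrix with the words as row names (getting them out of gensim is not covered here). It writes the plain-text layout that, as far as I know, fastText itself uses for .vec files: a header line with the number of words and the dimension, then one word per line followed by its values. `write_vec()` is a hypothetical helper, not a fastrtext function:

```r
# Hypothetical helper (not part of fastrtext): write a numeric matrix whose
# row names are the words to a plain-text .vec file:
#   line 1:  "<number of words> <dimension>"
#   then:    "<word> <v1> <v2> ... <vd>", values separated by single spaces.
# Words must not contain whitespace, since fastText splits on it.
write_vec <- function(vectors, path) {
  stopifnot(is.matrix(vectors), is.numeric(vectors), !is.null(rownames(vectors)))
  header <- paste(nrow(vectors), ncol(vectors))
  values <- apply(vectors, 1, function(v) paste(sprintf("%.6g", v), collapse = " "))
  writeLines(c(header, paste(rownames(vectors), values)), con = path)
  invisible(path)
}

# Usage: write_vec(m, "my_vectors.vec"), then pass
# "-pretrainedVectors", "my_vectors.vec" and "-dim", ncol(m) to execute().
```

`sprintf("%.6g", ...)` keeps six significant digits and avoids the padding `format()` would add; increase the precision if that matters for your vectors.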
