Different problems to test "Text classification with Keras" example #2209

Closed
lfdharo opened this issue Apr 12, 2018 · 2 comments
Labels
examples Code examples in /examples

Comments


lfdharo commented Apr 12, 2018

Hello,

I found three main problems when training and evaluating the "Text classification with Keras" example.

  1. When saving 'config.json' after training (line 222), the file should be opened with 'w' instead of 'wb'. Otherwise the write call on the next line fails, because it cannot handle the 'str' returned by lstm.to_json().

  2. For testing (with the is_runtime flag set to True), the program crashes in line 33 when setting the weights of the embeddings and the LSTM:

ValueError: Dimension 0 in both shapes must be equal, but are 1070971 and 0. Shapes are [1070971,300] and [0,0]. for 'Assign' (op: 'Assign') with input shapes: [1070971,300], [0,0].

The reason is that training loads the embeddings from 'en_vectors_web_lg', but testing does not load them because it uses the 'en' model instead. In addition, the model is saved with pickle.dump(weights[1:], file_) in line 221, which leaves the embeddings out.

My partial solution is to load 'en_vectors_web_lg' inside the load function of the SentimentAnalyser class, get the embeddings, and set the weights. Saving the embeddings directly with pickle would probably achieve the same thing.
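Both problems above concern how the model is written to disk. Here is a minimal sketch of a save/load round trip, not the example script's actual code: `save_model` and `load_model` are hypothetical helper names, the `model` file name is an assumption, and plain nested lists stand in for the real weight arrays. It opens the JSON file in text mode and pickles the full weight list rather than `weights[1:]`, so the embedding table is kept.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_model(model_dir, config_json, weights):
    """Write the architecture as text and ALL weight arrays as a pickle."""
    model_dir = Path(model_dir)
    # to_json() returns a str, so the file must be opened in text mode ('w').
    with (model_dir / "config.json").open("w") as file_:
        file_.write(config_json)
    # Pickle output is bytes, so this file stays in binary mode ('wb').
    # Dumping `weights` instead of `weights[1:]` keeps the embedding table.
    with (model_dir / "model").open("wb") as file_:
        pickle.dump(weights, file_)


def load_model(model_dir):
    """Read back the config and the complete weight list."""
    model_dir = Path(model_dir)
    with (model_dir / "config.json").open("r") as file_:
        config = json.load(file_)
    with (model_dir / "model").open("rb") as file_:
        weights = pickle.load(file_)
    return config, weights


# Stand-ins (assumptions): a JSON string like the one to_json() returns,
# and nested lists in place of the real [embeddings, lstm] weight arrays.
demo_config = json.dumps({"class_name": "Sequential"})
demo_weights = [[[0.1, 0.2], [0.3, 0.4]], [[1.0], [2.0]]]

with tempfile.TemporaryDirectory() as tmp:
    save_model(tmp, demo_config, demo_weights)
    loaded_config, loaded_weights = load_model(tmp)
```

The trade-off is file size: a 1070971 x 300 embedding table takes more than a gigabyte at 32-bit precision, so reloading the vectors from 'en_vectors_web_lg' at test time may be the lighter option.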

  3. After that, the system crashes in line 153 when iterating over the data in parallel batches:
    File "/usr/local/lib/python3.5/dist-packages/spacy/language.py", line 558, in pipe
    for name, proc in self.pipeline:
    TypeError: 'Tagger' object is not iterable

I solved this by not using the create_pipeline function and instead using:
nlp = spacy.load('en')
nlp.add_pipe(SentimentAnalyser.load(model_dir, nlp, max_length=max_length))

I don't add the tagger and parser, since they are already included in 'en'.
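The traceback shows the loop in `Language.pipe` unpacking each pipeline entry as a `(name, proc)` pair, so a pipeline list that holds bare components fails at that line. A minimal pure-Python sketch of this failure mode follows; `FakeTagger` and `run_pipeline` are hypothetical stand-ins for illustration, not spaCy code.

```python
class FakeTagger:
    """Hypothetical stand-in for a pipeline component such as the tagger."""

    def __call__(self, doc):
        return doc


def run_pipeline(pipeline, doc):
    # Same loop shape as the line in the traceback: each entry is
    # unpacked into a (name, component) pair before being applied.
    for name, proc in pipeline:
        doc = proc(doc)
    return doc


bare_components = [FakeTagger()]               # list of bare callables
named_components = [("tagger", FakeTagger())]  # list of (name, component)

try:
    run_pipeline(bare_components, "some text")
    unpack_failed = False
except TypeError:
    # A single component object cannot be unpacked into two names.
    unpack_failed = True

result = run_pipeline(named_components, "some text")
```

In spaCy 2.x, `nlp.add_pipe` stores the component together with a name, which is consistent with the replacement above avoiding the error.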

However, after all these changes the system prints an accuracy of 0.5, and it takes more than 10 minutes to produce this result even though I have a GPU.

Please let me know what could be wrong, or whether this accuracy is expected for this example.

Info about spaCy

  • spaCy version: 2.0.11
  • Platform: Linux-4.4.0-119-generic-x86_64-with-Ubuntu-16.04-xenial
  • Models: en, en_vectors_web_lg, en_core_web_lg
  • Python version: 3.5.3

Info about models

Installed models (spaCy v2.0.11)
/usr/local/lib/python3.5/dist-packages/spacy

TYPE        NAME                  MODEL                 VERSION                                   
package     en-core-web-lg        en_core_web_lg        2.0.0    ✔      
package     en-core-web-sm        en_core_web_sm        2.0.0    ✔      
package     en-vectors-web-lg     en_vectors_web_lg     2.0.0    ✔      
link        en                    en_core_web_sm        2.0.0    ✔      
link        en_core_web_lg        en_core_web_lg        2.0.0    ✔      
link        en_vectors_web_lg     en_vectors_web_lg     2.0.0    ✔  
@ines ines added the examples Code examples in /examples label Apr 18, 2018
@ines ines mentioned this issue Sep 12, 2018
Member

ines commented Sep 12, 2018

Merging this with #2758.


lock bot commented Oct 12, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Oct 12, 2018