Reproduce spaCy training results #3182

Closed
ykeuter opened this issue Jan 21, 2019 · 4 comments
Labels
bug (Bugs and behaviour differing from documentation), gpu (Using spaCy on GPU), training (Training and updating models)

Comments

@ykeuter

ykeuter commented Jan 21, 2019

It would be nice if there were a way to reproduce training results in spaCy. The snippet below trains a trivial NER component, but it gives different results in separate runs, even when using fix_random_seed. Is this expected behaviour?

import spacy

spacy.util.fix_random_seed()

nlp = spacy.blank("en")

ner = nlp.create_pipe("ner")
ner.add_label("TEST")
nlp.add_pipe(ner)

losses = {}
nlp.begin_training()
nlp.update(
    ["test"],  # batch of texts
    [{"entities": [(0, 4, "TEST")]}],  # batch of annotations
    losses=losses,
)
print(losses)

Environment:

* **spaCy version:** 2.0.18
* **Platform:** Linux-4.9.125-linuxkit-x86_64-with-debian-buster-sid
* **Python version:** 3.6.7
* **Models:** nl_core_news_sm
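"Separate runs" usually means separate Python processes, so a reproducibility check is most convincing when it compares fresh interpreters rather than two calls in one session. A minimal sketch using only the standard library: it runs a tiny seeded script twice via `subprocess` and checks that both runs print the same thing. `SCRIPT`, `run_once` and `same_across_processes` are hypothetical names; in practice the script would be the training snippet above, printing `losses`.

```python
import subprocess
import sys

# Hypothetical stand-in for the training snippet: seed, draw, print the result.
SCRIPT = """
import random
random.seed(0)
print(random.random())
"""


def run_once(script):
    """Run `script` in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def same_across_processes(script, runs=2):
    """True if every fresh run prints exactly the same output."""
    outputs = {run_once(script) for _ in range(runs)}
    return len(outputs) == 1


print(same_across_processes(SCRIPT))  # True
```

Swapping the real training code into `SCRIPT` turns this into a quick regression check for the behaviour reported here.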
@ellenir

ellenir commented Jan 29, 2019

I am having the same issue with spacy.util.fix_random_seed(), although I am training a text categorization model (using textcat).

import spacy
spacy.util.fix_random_seed()
nlp = spacy.load('en_core_web_sm')
textclass = nlp.create_pipe('textcat')
nlp.add_pipe(textclass, last=True)
textclass.add_label('NotAtFault')

optimizer = nlp.begin_training()

I have experimented with where the seed is set, but to no avail.

For me:

spaCy version: 2.0.18
Platform: Windows 10
Python version: 3.6.8
Models: en_core_web_sm
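On seed placement: the seed has to be set before the first random draw. In spaCy v2 that presumably includes the weight initialisation done by `begin_training()` and any shuffling of the training examples. The sketch below is a pure-Python stand-in, not spaCy code: the hypothetical `toy_train` fits y = 2x with SGD, takes its initial weight and its shuffle order from `random`, and is reproducible because `random.seed` runs first.

```python
import random


def toy_train(seed, epochs=5):
    """Hypothetical stand-in for an nlp.update() loop: fit y = 2x with SGD."""
    random.seed(seed)                      # seed *before* any random draw
    weight = random.uniform(-1.0, 1.0)     # random initialisation (cf. begin_training())
    data = [(x, 2.0 * x) for x in range(1, 6)]
    loss = 0.0
    for _ in range(epochs):
        random.shuffle(data)               # example order also depends on the RNG
        loss = 0.0
        for x, y in data:
            error = weight * x - y
            loss += error ** 2
            weight -= 0.01 * 2 * error * x  # gradient step on the squared error
    return loss


def is_reproducible(train_fn, seed=0):
    """Run the same training twice with one seed and compare the losses."""
    return train_fn(seed) == train_fn(seed)


print(is_reproducible(toy_train))  # True
```

Moving `random.seed(seed)` below the `random.uniform` line would make the initial weight depend on whatever state the generator happened to be in, so the two runs would no longer be guaranteed to match.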

@honnibal
Member

honnibal commented Feb 7, 2019

The three sources of randomization should be:

random.seed(i)
numpy.random.seed(i)
cupy.random.seed(i)

I think cupy.random.seed currently isn't being set, so if you're using a GPU, that would be the problem.
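A minimal sketch of a helper that seeds all three generators listed above. It is not spaCy's own `fix_random_seed`; `seed_everything` is a hypothetical name, and NumPy and CuPy are treated as optional imports so the snippet also runs where they are missing. Seeding CuPy assumes a working CUDA setup.

```python
import random

try:
    import numpy
except ImportError:  # NumPy is optional in this sketch
    numpy = None

try:
    import cupy
except ImportError:  # CuPy is typically only installed for GPU use
    cupy = None


def seed_everything(seed=0):
    """Hypothetical helper: seed Python's, NumPy's and (if present) CuPy's RNGs."""
    random.seed(seed)
    if numpy is not None:
        numpy.random.seed(seed)
    if cupy is not None:
        cupy.random.seed(seed)


# Usage: re-seeding restores the same sequence of draws.
seed_everything(0)
first = [random.random() for _ in range(3)]
seed_everything(0)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Leaving out any one of the three calls means whatever draws from that generator (for example GPU-side weight initialisation) can still differ between runs.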

ines added the bug, training and gpu labels on Feb 8, 2019
@ines
Member

ines commented Feb 8, 2019

Closing this, since this is already fixed on develop!

ines closed this as completed on Feb 8, 2019
@lock

lock bot commented Mar 10, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked this as resolved and limited conversation to collaborators on Mar 10, 2019

4 participants