This issue was moved to a discussion.

spaCy training stopping automatically in Google Colab #13312

Closed
Daremitsu1 opened this issue Feb 8, 2024 · 0 comments

Comments

@Daremitsu1

Hello,

I am training spaCy's NER on a custom dataset.

I have converted the dataset to the format spaCy requires:
data[0]['text']

 RECEIVED REGISTER OF DEEDS KENT COUNTY, MI 2022 MAY 02 4:26 PM GU 51 202205030036938 Total Pages: 2 05/03/2022 11:00 AM Fees: $30.00 Lisa Posthumus Lyons, County Clerk/Register Kent County, MI SEAL QUIT CLAIM DEED 41-13-23-104-009 rc Debra Kathleen Hoek, as trustee of the Jeanette (Ma Janet) Hoek Living Trust u/a/d April 17, 2019, of 1058 Patton Avenue NW, Grand Rapids, Michigan 49504, QUIT CLAIMS to Janet Hoek,' individually, of 1058 Patton Avenue NW, Grand Rapids, Michigan 49504, the premises located in Kent County, Michigan, described as on the attached Exhibit A, subject to all easements and restrictions of record, for One Dollar ($1.00). This transfer is exempt from real estate transfer tax under MCLA 207.526(a), MSA 7.456(26) and MCLA 207.505(a), MSA 7.456(5). This conveyance does not create a division of any parcel of real property and no divisions have been made since March 31, 1997. This property may be located within the vicinity of 'farmland or a farm operation. Generally a..

data[0]['entities']

[[70, 85, 'Recording Number'],
 [101, 111, 'Recording Date'],
 [199, 214, 'Doc Type'],
 [235, 311, 'Seller'],
 [405, 416, 'Buyer']]
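
A quick way to sanity-check annotations in this format is to slice the text with each offset pair and read what comes out. The sketch below assumes the `[start, end, label]` layout shown above, with an exclusive end offset. The helper names `preview_entities` and `find_bad_offsets` and the sample record are hypothetical and not part of the original dataset.

```python
def preview_entities(record):
    """Return (label, substring) pairs for one record with [start, end, label] entities."""
    text = record["text"]
    previews = []
    for start, end, label in record["entities"]:
        # Offsets are treated as character indices with an exclusive end.
        previews.append((label, text[start:end]))
    return previews


def find_bad_offsets(record):
    """Return entities whose offsets are out of range, empty, or reversed."""
    n = len(record["text"])
    return [e for e in record["entities"] if not (0 <= e[0] < e[1] <= n)]


# Hypothetical record, made up for illustration only.
sample = {
    "text": "QUIT CLAIM DEED recorded 05/03/2022",
    "entities": [[0, 15, "Doc Type"], [25, 35, "Recording Date"]],
}

for label, snippet in preview_entities(sample):
    print(f"{label!r}: {snippet!r}")
print("bad offsets:", find_bad_offsets(sample))
```

If a printed snippet starts or ends in the middle of a word, `doc.char_span` may return `None` for that entity or, with `alignment_mode="contract"`, shrink it to the tokens that lie fully inside the offsets.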

How to reproduce the behaviour

Created train.spacy

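import spacy
from spacy.tokens import DocBin
from spacy.util import filter_spans
from tqdm import tqdm

# Assumption: the original does not show how nlp and doc_bin were created;
# a blank English pipeline and an empty DocBin are used here.
nlp = spacy.blank("en")
doc_bin = DocBin()

for training_example in tqdm(data):
  text = training_example['text']
  labels = training_example['entities']
  doc = nlp.make_doc(text)
  ents = []
  for start, end, label in labels:
    span = doc.char_span(start, end, label=label, alignment_mode="contract")
    if span is None:
      print("Skipping entity")
    else:
      ents.append(span)
  # Set the entities and add the doc once, after all of its spans are collected
  # (adding inside the inner loop stores the same doc once per entity).
  doc.ents = filter_spans(ents)
  doc_bin.add(doc)

# Write the file once, after every example has been processed.
doc_bin.to_disk("train.spacy")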
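
The training command later in this report passes train.spacy for both `--paths.train` and `--paths.dev`, so the reported scores would be measured on data the model has already seen. Below is a minimal sketch of holding out a dev portion before the `.spacy` files are built. The helper name `split_train_dev`, the 20% fraction, and the placeholder records are assumptions for illustration.

```python
import random


def split_train_dev(records, dev_fraction=0.2, seed=0):
    """Shuffle a copy of records and split it into (train, dev) lists."""
    if not 0 < dev_fraction < 1:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded Random instance keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Placeholder records standing in for the real annotated data.
records = [{"text": f"doc {i}", "entities": []} for i in range(10)]
train_records, dev_records = split_train_dev(records)
print(len(train_records), len(dev_records))
```

Each list can then go through the same conversion loop and be saved under its own file name, for example train.spacy and dev.spacy.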

Created config.cfg

!python -m spacy init fill-config base_config.cfg config.cfg

Output:

✔ Auto-filled config with all values
✔ Saved config
config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy

When I try to train the model, training stops with the following output:
!python -m spacy train config.cfg --output ./ --paths.train ./train.spacy --paths.dev ./train.spacy
Output:

ℹ Saving to output directory: .
ℹ Using CPU
ℹ To switch to GPU 0, use the option: --gpu-id 0

=========================== Initializing pipeline ===========================
✔ Initialized pipeline

============================= Training pipeline =============================
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
^C

The ^C appears on its own, without any input from me, and stops the training.
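
The log above contains no traceback, so it does not say why the process stopped. One assumption worth testing, which this report does not confirm, is that the runtime ran short of memory. The sketch below reads `/proc/meminfo`, which exists on Linux systems such as the one listed under the environment details. The helper names `parse_meminfo` and `available_gib` are hypothetical.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(meminfo):
    """Convert the MemAvailable field (reported in kB) to GiB, or None if missing."""
    kib = meminfo.get("MemAvailable")
    return None if kib is None else kib / (1024 ** 2)


# The existence check keeps the sketch harmless on non-Linux systems.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("MemAvailable (GiB):", available_gib(info))
```

Running this in a cell shortly before training starts gives a rough idea of how much headroom the runtime has. A very low figure would be consistent with the memory assumption but would not prove it.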

Your Environment

Info about spaCy

  • spaCy version: 3.7.3
  • Platform: Linux-6.1.58+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Pipelines: en_core_web_lg (3.7.1), en_core_web_sm (3.7.1)
@explosion explosion locked and limited conversation to collaborators Feb 11, 2024
@danieldk danieldk converted this issue into discussion #13320 Feb 11, 2024
