
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference #2

Closed
raphael10-collab opened this issue May 17, 2021 · 4 comments

Comments

@raphael10-collab

When trying to train the model with python ./jerex_train.py --config-path configs/docred_joint, I get the message "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference."
What does this message mean? How should I train the model instead?

OS: Ubuntu 18.04.4

/jerex$ python ./jerex_train.py --config-path configs/docred_joint
datasets:
  train_path: ./data/datasets/docred_joint/train_joint.json
  valid_path: ./data/datasets/docred_joint/dev_joint.json
  test_path: null
  types_path: ./data/datasets/docred_joint/types.json
model:
  model_type: joint_multi_instance
  encoder_path: bert-base-cased
  tokenizer_path: bert-base-cased
  mention_threshold: 0.85
  coref_threshold: 0.85
  rel_threshold: 0.6
  prop_drop: 0.1
  meta_embedding_size: 25
  size_embeddings_count: 30
  ed_embeddings_count: 300
  token_dist_embeddings_count: 700
  sentence_dist_embeddings_count: 50
  position_embeddings_count: 700
sampling:
  neg_mention_count: 200
  neg_coref_count: 200
  neg_relation_count: 200
  max_span_size: 10
  sampling_processes: 8
  neg_mention_overlap_ratio: 0.5
  lowercase: false
loss:
  mention_weight: 1.0
  coref_weight: 1.0
  entity_weight: 0.25
  relation_weight: 1.0
inference:
  valid_batch_size: 1
  test_batch_size: 1
  max_spans: null
  max_coref_pairs: null
  max_rel_pairs: null
training:
  batch_size: 1
  min_epochs: 20
  max_epochs: 20
  lr: 5.0e-05
  lr_warmup: 0.1
  weight_decay: 0.01
  max_grad_norm: 1.0
  accumulate_grad_batches: 1
  max_spans: null
  max_coref_pairs: null
  max_rel_pairs: null
distribution:
  gpus: []
  accelerator: ''
  prepare_data_per_node: false
misc:
  store_predictions: true
  store_examples: true
  flush_logs_every_n_steps: 1000
  log_every_n_steps: 1000
  deterministic: false
  seed: null
  cache_path: null
  precision: 32
  profiler: null
  final_valid_evaluate: true

Parse dataset '/home/marco/PyTorchMatters/EntitiesRelationsExtraction/jerex/data/datasets/docred_joint/train_joint.json': 100%|██████| 3008/3008 [00:41<00:00, 71.72it/s]
Parse dataset '/home/marco/PyTorchMatters/EntitiesRelationsExtraction/jerex/data/datasets/docred_joint/dev_joint.json': 100%|██████████| 300/300 [00:03<00:00, 75.22it/s]
Some weights of the model checkpoint at bert-base-cased were not used when initializing JointMultiInstanceModel: ['cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'bert.pooler.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'bert.pooler.dense.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing JointMultiInstanceModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing JointMultiInstanceModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of JointMultiInstanceModel were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['entity_classification.entity_classifier.weight', 'relation_classification.pair_linear.weight', 'coreference_resolution.coref_classifier.weight', 'relation_classification.rel_classifier.weight', 'mention_localization.size_embeddings.weight', 'relation_classification.rel_linear.weight', 'mention_localization.linear.weight', 'relation_classification.sentence_distance_embeddings.weight', 'relation_classification.token_distance_embeddings.weight', 'coreference_resolution.coref_linear.bias', 'coreference_resolution.coref_linear.weight', 'mention_localization.mention_classifier.weight', 'entity_classification.linear.bias', 'mention_localization.linear.bias', 'relation_classification.entity_type_embeddings.weight', 'entity_classification.linear.weight', 'relation_classification.rel_linear.bias', 'relation_classification.rel_classifier.bias', 'coreference_resolution.coref_ed_embeddings.weight', 'coreference_resolution.coref_classifier.bias', 'entity_classification.entity_classifier.bias', 'mention_localization.mention_classifier.bias', 'relation_classification.pair_linear.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type                    | Params
--------------------------------------------------
0 | model | JointMultiInstanceModel | 113 M 
--------------------------------------------------
113 M     Trainable params
0         Non-trainable params
113 M     Total params
455.954   Total estimated model params size (MB)
@markus-eberts
Member

markus-eberts commented May 17, 2021

Hi,

this is just a remark from the Hugging Face library - no need to worry. We use Hugging Face's BERT implementation internally. You are doing everything correctly here. When executing the training code (as you do), you train JEREX (and fine-tune BERT) on a down-stream task (end-to-end relation extraction), and you can then use the model for prediction.
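
For reference, the same message can be reproduced outside JEREX whenever a pre-trained checkpoint is loaded into an architecture that adds task-specific layers. A minimal sketch with plain Hugging Face Transformers (not JEREX code):

```python
# Minimal sketch, not JEREX code: loading a pre-trained encoder checkpoint into
# a model with an untrained task head triggers the same Transformers message,
# because the head weights are missing from the checkpoint and are newly
# initialized.
from transformers import BertForSequenceClassification

# 'bert-base-cased' ships only the pre-trained encoder weights; the
# classification head is randomly initialized, so Transformers prints
# "You should probably TRAIN this model on a down-stream task ...".
model = BertForSequenceClassification.from_pretrained("bert-base-cased")
```

Once the model has been fine-tuned on the down-stream task, the saved checkpoint contains those head weights and the message no longer applies.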

@raphael10-collab
Author

Thank you @markus-eberts .

Now I've got this memory issue: #3

@aurelien-m

Does anyone know if there's a way to hide this message? :)

@kantholtz
Member

Hi, you should be able to suppress messages by decreasing the logging verbosity as described in their documentation: https://huggingface.co/docs/transformers/main_classes/logging
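
For example (a sketch assuming the Transformers logging utilities from the linked documentation), adding the following near the top of the training script should hide informational messages such as this one:

```python
# Sketch: lower the Transformers logging verbosity so that only errors are
# printed; informational remarks such as "You should probably TRAIN this
# model on a down-stream task ..." are then suppressed.
from transformers import logging

logging.set_verbosity_error()
```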
