
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference #2

Closed
raphael10-collab opened this issue May 17, 2021 · 4 comments

Comments

@raphael10-collab

When trying to train the model with python ./jerex_train.py --config-path configs/docred_joint, I get the message "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference."
What does this message mean? How should I train the model instead?

OS: Ubuntu 18.04.4

/jerex$ python ./jerex_train.py --config-path configs/docred_joint
datasets:
  train_path: ./data/datasets/docred_joint/train_joint.json
  valid_path: ./data/datasets/docred_joint/dev_joint.json
  test_path: null
  types_path: ./data/datasets/docred_joint/types.json
model:
  model_type: joint_multi_instance
  encoder_path: bert-base-cased
  tokenizer_path: bert-base-cased
  mention_threshold: 0.85
  coref_threshold: 0.85
  rel_threshold: 0.6
  prop_drop: 0.1
  meta_embedding_size: 25
  size_embeddings_count: 30
  ed_embeddings_count: 300
  token_dist_embeddings_count: 700
  sentence_dist_embeddings_count: 50
  position_embeddings_count: 700
sampling:
  neg_mention_count: 200
  neg_coref_count: 200
  neg_relation_count: 200
  max_span_size: 10
  sampling_processes: 8
  neg_mention_overlap_ratio: 0.5
  lowercase: false
loss:
  mention_weight: 1.0
  coref_weight: 1.0
  entity_weight: 0.25
  relation_weight: 1.0
inference:
  valid_batch_size: 1
  test_batch_size: 1
  max_spans: null
  max_coref_pairs: null
  max_rel_pairs: null
training:
  batch_size: 1
  min_epochs: 20
  max_epochs: 20
  lr: 5.0e-05
  lr_warmup: 0.1
  weight_decay: 0.01
  max_grad_norm: 1.0
  accumulate_grad_batches: 1
  max_spans: null
  max_coref_pairs: null
  max_rel_pairs: null
distribution:
  gpus: []
  accelerator: ''
  prepare_data_per_node: false
misc:
  store_predictions: true
  store_examples: true
  flush_logs_every_n_steps: 1000
  log_every_n_steps: 1000
  deterministic: false
  seed: null
  cache_path: null
  precision: 32
  profiler: null
  final_valid_evaluate: true

Parse dataset '/home/marco/PyTorchMatters/EntitiesRelationsExtraction/jerex/data/datasets/docred_joint/train_joint.json': 100%|██████| 3008/3008 [00:41<00:00, 71.72it/s]
Parse dataset '/home/marco/PyTorchMatters/EntitiesRelationsExtraction/jerex/data/datasets/docred_joint/dev_joint.json': 100%|██████████| 300/300 [00:03<00:00, 75.22it/s]
Some weights of the model checkpoint at bert-base-cased were not used when initializing JointMultiInstanceModel: ['cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'bert.pooler.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'bert.pooler.dense.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing JointMultiInstanceModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing JointMultiInstanceModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of JointMultiInstanceModel were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['entity_classification.entity_classifier.weight', 'relation_classification.pair_linear.weight', 'coreference_resolution.coref_classifier.weight', 'relation_classification.rel_classifier.weight', 'mention_localization.size_embeddings.weight', 'relation_classification.rel_linear.weight', 'mention_localization.linear.weight', 'relation_classification.sentence_distance_embeddings.weight', 'relation_classification.token_distance_embeddings.weight', 'coreference_resolution.coref_linear.bias', 'coreference_resolution.coref_linear.weight', 'mention_localization.mention_classifier.weight', 'entity_classification.linear.bias', 'mention_localization.linear.bias', 'relation_classification.entity_type_embeddings.weight', 'entity_classification.linear.weight', 'relation_classification.rel_linear.bias', 'relation_classification.rel_classifier.bias', 'coreference_resolution.coref_ed_embeddings.weight', 'coreference_resolution.coref_classifier.bias', 'entity_classification.entity_classifier.bias', 'mention_localization.mention_classifier.bias', 'relation_classification.pair_linear.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type                    | Params
--------------------------------------------------
0 | model | JointMultiInstanceModel | 113 M 
--------------------------------------------------
113 M     Trainable params
0         Non-trainable params
113 M     Total params
455.954   Total estimated model params size (MB)
@markus-eberts
Member

markus-eberts commented May 17, 2021

Hi,

this is just a remark from the Hugging Face library - no need to worry. We use Hugging Face's BERT implementation internally. You are doing everything correctly here. When executing the training code (as you do), you train JEREX (and fine-tune BERT) on a down-stream task (end-to-end relation extraction), and you can then use the model for prediction.
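
For reference, the same message can be reproduced outside JEREX whenever a pre-trained checkpoint is loaded into an architecture that adds task-specific layers. A minimal sketch with plain Hugging Face Transformers (not JEREX code):

```python
# Minimal sketch, not JEREX code: loading a pre-trained encoder checkpoint into
# a model with an untrained task head triggers the same Transformers message,
# because the head weights are missing from the checkpoint and are newly
# initialized.
from transformers import BertForSequenceClassification

# 'bert-base-cased' ships only the pre-trained encoder weights; the
# classification head is randomly initialized, so Transformers prints
# "You should probably TRAIN this model on a down-stream task ...".
model = BertForSequenceClassification.from_pretrained("bert-base-cased")
```

Once the model has been fine-tuned on the down-stream task, the saved checkpoint contains those head weights and the message no longer applies.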

@raphael10-collab
Author

Thank you @markus-eberts .

Now I've got this memory issue: #3

@aurelien-m

Does anyone know if there's a way to hide this message? :)

@kantholtz
Member

Hi, you should be able to suppress messages by decreasing the logging verbosity as described in their documentation: https://huggingface.co/docs/transformers/main_classes/logging
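
For example (a sketch assuming the Transformers logging utilities from the linked documentation), adding the following near the top of the training script should hide informational messages such as this one:

```python
# Sketch: lower the Transformers logging verbosity so that only errors are
# printed; informational remarks such as "You should probably TRAIN this
# model on a down-stream task ..." are then suppressed.
from transformers import logging

logging.set_verbosity_error()
```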
