Question: very poor results with the AUC and LogLoss metrics #44

Open
AML-CityU opened this issue Dec 19, 2022 · 1 comment

AML-CityU commented Dec 19, 2022

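Hello, I am a user who wants to use recbole-cdr for a cross-domain CTR task. I need AUC and LogLoss as output metrics, but both come out very poor. I would appreciate advice on tuning the parameters or the model.
I tested with the two datasets under recbole_cdr/dataset_example (source: ml-1m, target: ml-100k), using threshold=4 to binarize the labels. No matter which base model I use, the AUC stays around 0.6. Yet on the same target dataset, single-domain model code from elsewhere (I tested DeepFM) reaches AUC > 0.75.
I have adjusted several hyperparameters (such as xx_xx_num_interval, the learning rate, valid_metric, and even threshold=3), but saw no clear improvement.
Below are the recbole-cdr parameters I used, for reference: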

1. Config file sample.yaml:

# dataset config
gpu_id: 0
state: INFO
field_separator: "\t"
use_gpu: True
seed: 2000
reproducibility: True
data_path: 'dataset/'
checkpoint_dir: 'saved'
show_progress: True
save_dataset: False
dataset_save_path: ~
save_dataloaders: False
dataloaders_save_path: ~
log_wandb: False
wandb_project: 'recbole_cdr'
normalize_all: True

# training settings
train_epochs: ["BOTH:300"]
train_batch_size: 2048
learner: adam
neg_sampling:
  uniform: 1
eval_step: 1
stopping_step: 10
clip_grad_norm: ~
weight_decay: 1e-3
loss_decimal_place: 6
require_pow: False

# evaluation settings
eval_args: 
  split: {'RS':[0.8,0.1,0.1]}
  group_by: None
  mode: labeled
repeatable: False
metrics: ['AUC', 'LogLoss']
valid_metric: AUC
valid_metric_bigger: True
eval_batch_size: 2048
metric_decimal_place: 6

source_domain:
  dataset: ml-1m
  data_path: 'dataset/'
  seq_separator: " "
  USER_ID_FIELD: user_id
  ITEM_ID_FIELD: item_id
  RATING_FIELD: rating
  TIME_FIELD: timestamp
  NEG_PREFIX: neg_
  LABEL_FIELD: label
  threshold:
    rating: 4
  load_col:
    inter: [user_id, item_id, rating]
  user_inter_num_interval: "[5,inf)"
  item_inter_num_interval: "[5,inf)"
  val_interval:
    rating: "[3,inf)"
  drop_filter_field: True

target_domain:
  dataset: ml-100k
  data_path: 'dataset/'
  seq_separator: ","
  USER_ID_FIELD: user_id
  ITEM_ID_FIELD: item_id
  RATING_FIELD: rating
  TIME_FIELD: timestamp
  NEG_PREFIX: neg_
  LABEL_FIELD: label
  threshold:
    rating: 4
  load_col:
    inter: [user_id, item_id, rating]
  user_inter_num_interval: "[5,inf)"
  item_inter_num_interval: "[5,inf)"
  val_interval:
    rating: "[3,inf)"
  drop_filter_field: True
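
One thing worth checking in the config above is how `val_interval` and `threshold` interact. The sketch below assumes, as an unverified reading of the data pipeline, that value filtering runs before label generation and that rows with `rating >= threshold` become positives. Under that assumption, ratings 1 and 2 never reach the labeling step, so every negative example is a rating of 3. The function `label_interactions` and the toy ratings are hypothetical and are not RecBole-CDR code.

```python
from collections import Counter


def label_interactions(ratings, min_rating=3.0, threshold=4.0):
    """Mimic the assumed order: filter by val_interval, then binarize by threshold.

    min_rating stands in for val_interval "[3,inf)" and threshold for
    `threshold: rating: 4` in the config (assumed semantics, not verified).
    """
    # Step 1 (assumed): drop every interaction outside the value interval.
    kept = [r for r in ratings if r >= min_rating]
    # Step 2 (assumed): label 1 if rating >= threshold, otherwise 0.
    return [(r, 1 if r >= threshold else 0) for r in kept]


# Toy ratings, made up for illustration only.
ratings = [1, 2, 3, 3, 4, 5, 5, 2, 4, 3]
labeled = label_interactions(ratings)
counts = Counter(label for _, label in labeled)

# Which ratings ended up as negatives after filtering?
negative_ratings = sorted({r for r, label in labeled if label == 0})
print(counts, negative_ratings)
```

If a single-domain baseline was trained without the `val_interval` filter, its negatives would also include ratings 1 and 2, which is an easier classification problem, so the two AUC values may not be directly comparable.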

2. Python file:

import argparse
from recbole_cdr.quick_start import run_recbole_cdr


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', '-m', type=str, default='DTCDR', help='name of models')
    parser.add_argument('--config_files', type=str, default='sample.yaml', help='config files')

    args, _ = parser.parse_known_args()

    config_file_list = args.config_files.strip().split(' ') if args.config_files else None
    print(config_file_list)
    run_recbole_cdr(model=args.model, config_file_list=config_file_list)
3. YAML parameters for DTCDR, one of the base models:
embedding_size: 64
base_model: NeuMF
learning_rate: 0.0005
mlp_hidden_size: [64, 64]
dropout_prob: 0.3
alpha: 0.3

Thank you for your help!

AML-CityU added the bug label Dec 19, 2022
Wicknight self-assigned this Dec 22, 2022
Wicknight (Collaborator) commented Dec 22, 2022

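@AML-CityU Hello, and thank you for your interest in RecBole-CDR!
We need to confirm a few points with you:

  1. Did you use additional features when testing the single-domain CTR model? The models in RecBole-CDR do not use any content information.
  2. When testing the single-domain CTR model, did you also test some top-n recommendation models? These cross-domain models are in fact designed for top-n recommendation, and they do not handle features as effectively as CTR models such as DeepFM. We recommend comparing them against models such as BPR, which would likely be a fairer comparison.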
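
For such a comparison to be fair, both models should also be scored by the same metric code on the same labeled test split. The following is a minimal, framework-independent sketch of AUC (via the rank-sum formula) and LogLoss that could be applied to predictions exported from either codebase. The function names and the toy labels and scores are hypothetical; this is not the RecBole-CDR evaluator.

```python
import math


def auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


# Toy example, made up for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auc(labels, scores))  # 0.75: three of the four positive/negative pairs are ordered correctly
print(log_loss(labels, scores))
```

Note that LogLoss is only meaningful if the scores are probabilities in (0, 1). A top-n model trained with a ranking loss may output unbounded scores, in which case AUC remains comparable but LogLoss does not.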
