# Tune a model via grid search

FuxiCTR version: v1.0

This tutorial shows how to tune model hyper-parameters via grid search over the specified tuning space.


The script `run_param_tuner.py` tunes FuxiCTR models based on YAML config files. It accepts the following options:

+ `--config`: Path to the tuner config file that defines the tuning space.
+ `--gpu`: The GPU indices available for tuning; multiple GPUs can be listed (e.g., `--gpu 0 1` for two GPUs).
+ `--tag`: (optional) A tag that selects which expid to run (e.g., `001` for the first expid). This is useful for rerunning a specific experiment whose expid contains the tag.
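
To make the shape of these options concrete, here is a minimal `argparse` sketch of a command line with the same three flags. It is not the actual code of `run_param_tuner.py`; the argument types, the `required` setting, and the defaults are assumptions chosen for illustration.

```python
import argparse


def build_parser():
    # Illustrative parser only; the real script may define its options differently.
    parser = argparse.ArgumentParser(description="Grid-search tuner (sketch)")
    # Path to the tuner config file that defines the tuning space.
    parser.add_argument("--config", type=str, required=True)
    # One or more GPU indices, e.g. "--gpu 0 1" (int type and None default are assumptions).
    parser.add_argument("--gpu", type=int, nargs="+", default=None)
    # Optional tag used to select a single expid, e.g. "001".
    parser.add_argument("--tag", type=str, default=None)
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--config", "./tuner_config/FM_tuner_config.yaml", "--gpu", "0", "1"]
)
print(args.config, args.gpu, args.tag)
```

With `nargs="+"`, the values after `--gpu` are collected into a list, which matches the `--gpu 0 1` usage described above.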

In the following example, we use the hyper-parameters of `FM_test` in [./config](https://github.com/xue-pai/FuxiCTR/tree/main/config) as the base setting and create a tuner config file, `FM_tuner_config.yaml`, in [benchmarks/tuner_config](https://github.com/xue-pai/FuxiCTR/tree/main/benchmarks/tuner_config) that defines the tuning space.

In [None]:
# FM_tuner_config.yaml
base_config: ../config/ # the location of base config
base_expid: FM_test # the expid of default hyper-parameters
dataset_id: taobao_tiny_data # the dataset_id used

tuner_space:
    model_root: './tuner_config/' # the value will override the default value in FM_test
    embedding_dim: [16, 32] # the values in the list will be grid-searched
    regularizer: [0, 1.e-6, 1.e-5] # the values in the list will be grid-searched
    learning_rate: 1.e-3 # it is equivalent to [1.e-3]
    batch_size: 128 # the value will override the default value in FM_test

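Specifically, if a key in `tuner_space` has its values stored in a list, those values will be grid-searched. A single value is equivalent to a one-element list and overrides the default value in `FM_test`. Any hyper-parameter that does not appear in `tuner_space` keeps its default value from `FM_test`.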
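
The expansion rule can be sketched in a few lines of Python. This is an illustration of grid search over the tuning space above, not FuxiCTR's own implementation; the helper name `expand_tuner_space` and the `base_params` dictionary (a small subset of the `FM_test` defaults) are assumptions made for the example.

```python
import itertools


def expand_tuner_space(tuner_space):
    """Return one dict per combination in the grid (hypothetical helper)."""
    keys = list(tuner_space)
    # Wrap single values in a one-element list so every key yields a list of candidates.
    value_lists = [v if isinstance(v, list) else [v] for v in tuner_space.values()]
    # The Cartesian product of all candidate lists enumerates the full grid.
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


tuner_space = {
    "model_root": "./tuner_config/",
    "embedding_dim": [16, 32],
    "regularizer": [0, 1e-6, 1e-5],
    "learning_rate": 1e-3,
    "batch_size": 128,
}

# Assumed subset of the FM_test defaults, used only to show the override behavior.
base_params = {"embedding_dim": 4, "regularizer": 1e-8, "epochs": 1}

combos = expand_tuner_space(tuner_space)
# Values from the tuning space override the base defaults; other keys are kept.
merged = [{**base_params, **combo} for combo in combos]
print(len(combos))  # 2 embedding_dim values * 3 regularizer values = 6
```

Each of the six merged dictionaries corresponds to one experiment. Note that this sketch treats every list as a set of candidates, so it does not handle a hyper-parameter whose own value is a list.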

Run the following command to start:

In [None]:
# `!cd` runs in its own subshell and does not persist, so chain the commands on one line
!cd benchmarks && python run_param_tuner.py --config ./tuner_config/FM_tuner_config.yaml --gpu 0 1

After the run finishes, all the search results can be found in `FM_tuner_config.csv` in the `./benchmarks` folder.

To run only one group of hyper-parameters from the search space, use `--tag` to select it. In the following example, `001` refers to the expid (i.e., `FM_test_001_7f7f3b34`) of the first group of hyper-parameters. This is useful when you need to rerun an expid for reproduction.

In [None]:
# `!cd` runs in its own subshell and does not persist, so chain the commands on one line
!cd benchmarks && python run_param_tuner.py --config ./tuner_config/FM_tuner_config.yaml --tag 001 --gpu 0 1
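
The tag selection described above can be pictured as a substring filter over the generated expids. The sketch below is an assumption about the behavior, not the script's actual code; `select_expids` is a hypothetical helper, and every expid in the list except `FM_test_001_7f7f3b34` is a made-up placeholder.

```python
def select_expids(expids, tag=None):
    """Keep only the expids that contain the tag (hypothetical helper)."""
    if tag is None:
        # Without a tag, every expid in the search space is run.
        return list(expids)
    return [expid for expid in expids if tag in expid]


# Only the first expid comes from the tutorial; the others are placeholders.
expids = [
    "FM_test_001_7f7f3b34",
    "FM_test_002_aaaaaaaa",
    "FM_test_003_bbbbbbbb",
]

selected = select_expids(expids, tag="001")
print(selected)
```

Because this sketch matches the tag anywhere in the expid, a tag could in principle also match part of the trailing hash.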

The example config above loads the settings for `base_expid` and `dataset_id` from the `base_config` folder. Alternatively, you can define the base settings directly in the tuner config file, as shown below. The two configurations are equivalent.

In [None]:
# This example defines the settings for base_expid and dataset_id in the same file
base_expid: FM_test # the expid of default hyper-parameters
dataset_id: taobao_tiny_data # the dataset_id used

model_config:
    FM_test:
        model_root: '../checkpoints/'
        workers: 3
        verbose: 1
        patience: 2
        pickle_feature_encoder: True
        use_hdf5: True
        save_best_only: True
        every_x_epochs: 1
        debug: False
        model: FM
        dataset_id: taobao_tiny_data
        loss: binary_crossentropy
        metrics: ['logloss', 'AUC']
        task: binary_classification
        optimizer: adam
        learning_rate: 1.0e-3
        regularizer: 1.e-8
        batch_size: 128
        embedding_dim: 4
        epochs: 1
        shuffle: True
        seed: 2019
        monitor: 'AUC'
        monitor_mode: 'max'
    
dataset_config:
    taobao_tiny_data:
        data_root: ../data/
        data_format: csv
        train_data: ../data/tiny_data/train_sample.csv
        valid_data: ../data/tiny_data/valid_sample.csv
        test_data: ../data/tiny_data/test_sample.csv
        min_categr_count: 1
        feature_cols:
            - {name: ["userid","adgroup_id","pid","cate_id","campaign_id","customer","brand","cms_segid",
                      "cms_group_id","final_gender_code","age_level","pvalue_level","shopping_level","occupation"], 
                      active: True, dtype: str, type: categorical}
        label_col: {name: clk, dtype: float}

tuner_space:
    model_root: './tuner_config/' # the value will override the default value in FM_test
    embedding_dim: [16, 32] # the values in the list will be grid-searched
    regularizer: [0, 1.e-6, 1.e-5] # the values in the list will be grid-searched
    learning_rate: 1.e-3 # it is equivalent to [1.e-3]
    batch_size: 128 # the value will override the default value in FM_test


For more running examples, please refer to the benchmarking results in the [BARS-CTR-Prediction](https://openbenchmark.github.io/ctr-prediction) benchmark.