machine-intelligence-lab/raytune_example

Contents

  • utils.py : Common code shared by both training scripts
  • train.py : A standard training script without Ray Tune
  • train_tune.py : The same training script with Ray Tune

Required libraries

  • PyTorch
  • Ray Tune
    - Installation:
$ pip install 'ray[tune]'

Run

1. Clone the repository

$ git clone https://github.com/machine-intelligence-lab/raytune_example.git

2. Run the training scripts

<train.py>

$ python3 train.py

<train_tune.py>

$ CUDA_VISIBLE_DEVICES=x,x python3 train_tune.py
  • It runs 5 trials in total, and each trial runs for at most 30 epochs
  • It assigns 2 CPUs and 0.25 GPUs to each trial (that is, 4 trials share one GPU)
  • It saves logs under ".ray_result/expr_name"
    (expr_name looks like 'DEFAULT_yyyy_mm_dd_hh_mm_ss'); a sketch of this setup follows below
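For reference, the following is a minimal sketch of how such a setup might look with the classic tune.run / tune.report API (newer Ray versions use the Tuner API instead). The function train_fn, the 'lr' search space, and the metric name are illustrative placeholders, not the repository's actual train_tune.py:

# Illustrative sketch only; train_fn, the "lr" search space, and the
# metric name are placeholders, not the repository's actual train_tune.py.
from ray import tune

def train_fn(config):
    # A real trainable would build a PyTorch model/optimizer from config.
    for epoch in range(30):
        loss = config["lr"] / (epoch + 1)  # dummy metric
        tune.report(loss=loss)             # reported metrics drive logging and stopping

analysis = tune.run(
    train_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1)},   # example search space
    num_samples=5,                                # 5 trials in total
    stop={"training_iteration": 30},              # at most 30 epochs per trial
    resources_per_trial={"cpu": 2, "gpu": 0.25},  # 4 trials share one GPU
    local_dir="./.ray_result",                    # logs under .ray_result/<expr_name>
)
print(analysis.get_best_config(metric="loss", mode="min"))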

Tensorboard

If TensorBoard is installed, you can visualize the trial results with:

$ tensorboard --logdir=.ray_result/expr_name

If you run trials on a remote server, it may be more convenient to copy the experiment directory to your local computer, where a web browser is available, and run TensorBoard there.