
hao

configurations, logging, and other utilities.

install

pip install hao

precondition

A folder containing any of the following files (checked in this exact order) is treated as the project root path:

  • pyproject.toml
  • requirements.txt
  • setup.py
  • LICENSE
  • .idea
  • .git
  • .vscode

If your project structure does NOT contain any of these markers, root detection will not work as expected.
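Once a root is detected, paths can be resolved relative to it via `hao.paths.get_path` (used later in this README); the file path below is illustrative only:

import hao

# resolves 'data/input.txt' relative to the detected project root
# (the path itself is hypothetical, for illustration)
data_path = hao.paths.get_path('data/input.txt')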

features

config

It tries to load YAML config files from the conf folder:

.                               # project root
├── conf
│   ├── config-{env}.yml        # if `export env=abc`, loads config-abc.yml, raising an error if missing
│   ├── config-{hostname}.yml   # otherwise tried first, falling back to the default `config.yml`
│   └── config.yml              # the default config file that should always exist
├── pyproject.toml              # or requirements.txt
├── .git

Loading follows this order:

env = os.environ.get('env')
if env is not None:
    try_to_load(f'config-{env}.yml', fallback='config.yml')                   # echo $env
else:
    try_to_load(f'config-{socket.gethostname()}.yml', fallback='config.yml')  # echo $(hostname)
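For example, forcing a specific config file via the environment variable (the module name is a placeholder):

export env=prod       # hao will look for conf/config-prod.yml
python -m your_module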

Say you have the following content in your config file:

# config.yml
es:
  default:
    host: 172.23.3.3
    port: 9200
    indices:
      - news
      - papers

Then get the configured values in your code:

import hao
es_host = hao.config.get('es.default.host')          # str
es_port = hao.config.get('es.default.port')          # int
indices = hao.config.get('es.default.indices')       # list
...
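Dotted keys walk the nested YAML structure. A minimal sketch of how such a lookup might work over the parsed YAML (illustrative only, not hao's actual implementation):

def get_by_dotted_key(data: dict, dotted_key: str, default=None):
    """Walk a nested dict with a dotted key, e.g. 'es.default.host'."""
    node = data
    for key in dotted_key.split('.'):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

# with the config above: get_by_dotted_key(conf, 'es.default.port') -> 9200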

logs

Set logger levels to filter logs, e.g.:

# config.yml
logging:
  __main__: DEBUG
  transformers: WARNING
  lightning: INFO
  pytorch_lightning: INFO
  elasticsearch: WARNING
  tests: DEBUG
  root: INFO                        # root level
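With the config above in effect, the `transformers` logger is capped at WARNING while `__main__` emits from DEBUG up:

import hao

LOGGER = hao.logs.get_logger('transformers')  # level WARNING per the config above
LOGGER.info('suppressed')                     # below WARNING, filtered out
LOGGER.warning('emitted')                     # at WARNING, shown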

Settings for logger:

# config.yml
logger:
  format: "%(asctime)s %(levelname)-7s %(name)s:%(lineno)-4d - %(message)s"   # override to use a different format
  handlers:
    TimedRotatingFileHandler:    # any handler class from `logging` / `logging.handlers`, with its config
      when: d
      backupCount: 3

Example

logger:
  format: "%(asctime)s %(levelname)-7s %(name)s:%(lineno)-4d - %(message)s"   # default format for all handlers
  handlers:
    stdout:
      format: "%(asctime)s %(levelname)-7s %(name)s:%(lineno)-4d - %(message)s"   # per-handler format override
    file:
      format: "%(message)s"   # per-handler format override
      handler: TimedRotatingFileHandler
      args:
        when: d
        backupCount: 3
        filename: test.log
    rolling-file:
      handler: TimedRotatingFileHandler
      args:
        when: d
        backupCount: 3
        filename: hello.log

logging:
  root: INFO
  torch.models: INFO
  __main__: DEBUG
  access:
    level: INFO
    handlers:
      - stdout
      - rolling-file
  test:
    level: INFO
    handlers:
      - file
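Assuming the example config above is active, each logger picks up its named handlers:

import hao

access_logger = hao.logs.get_logger('access')
access_logger.info('written to stdout and hello.log')   # handlers: stdout + rolling-file

test_logger = hao.logs.get_logger('test')
test_logger.info('written to test.log')                 # handlers: file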

Declare and use the logger:

import hao
LOGGER = hao.logs.get_logger(__name__)

LOGGER.debug('message')
LOGGER.info('message')
LOGGER.warning('message')
LOGGER.error('message')
LOGGER.exception(err)    # call inside an `except ... as err:` block

namespaces

import hao
from hao.namespaces import from_args, attr

@from_args
class ProcessConf(object):
    file_in = attr(str, required=True, help="file path to process")
    file_out = attr(str, required=True, help="file path to save")
    tokenizer = attr(str, required=True, choices=('wordpiece', 'bpe'))


from argparse import Namespace
from pytorch_lightning import Trainer
@from_args(adds=Trainer.add_argparse_args)
class TrainConf(Namespace):
    root_path_checkpoints = attr(str, default=hao.paths.get_path('data/checkpoints/'))
    dataset_train = attr(str, default='train.txt')
    dataset_val = attr(str, default='val.txt')
    dataset_test = attr(str, default='test.txt')
    batch_size = attr(int, default=128, key='train.batch_size')                          # `key`: try to load the value from config.yml by this key
    task = attr(str, choices=('ner', 'nmt'), default='ner')
    seed = attr(int)
    epochs = attr(int, default=5)

Here attr is a wrapper around argparse's add_argument().

Usage 1: override the default value from the command line

python -m your_module --task=nmt

Usage 2: override the default value from the constructor

train_conf = TrainConf(task='nmt')

Value lookup order (first match wins):

  • command line
  • constructor
  • config yml (if `key` is specified in the attr)
  • default (if specified in the attr)
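For example, with the TrainConf above (the flag spelling is an assumption; it depends on how hao registers the arguments):

# config.yml contains `train.batch_size: 64`, the attr default is 128
# launched with: python -m your_module --batch_size=32
train_conf = TrainConf(batch_size=16)
train_conf.batch_size   # -> 32: command line wins over constructor, config yml and default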