Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --target_dir option #83

Merged
merged 10 commits into from
Apr 17, 2020
Merged

Add --target_dir option #83

merged 10 commits into from
Apr 17, 2020

Conversation

moraval
Copy link
Contributor

@moraval moraval commented Apr 1, 2020

I need to add tests.

@coveralls
Copy link

coveralls commented Apr 1, 2020

Pull Request Test Coverage Report for Build 4300

  • 28 of 47 (59.57%) changed or added relevant lines in 8 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.04%) to 88.911%

Changes Missing Coverage Covered Lines Changed/Added Lines %
emloop/cli/resume.py 1 2 50.0%
emloop/cli/eval.py 0 2 0.0%
emloop/cli/train.py 1 3 33.33%
emloop/entry_point.py 0 3 0.0%
emloop/cli/args.py 0 11 0.0%
Files with Coverage Reduction New Missed Lines %
emloop/cli/args.py 1 8.33%
emloop/cli/eval.py 1 38.89%
Totals Coverage Status
Change from base Build 4154: -0.04%
Covered Lines: 4017
Relevant Lines: 4518

💛 - Coveralls

Copy link
Contributor

@blazekadam blazekadam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • please do not reformat the code randomly (or at least do it in a separate commit) it is quite hard to decipher the actual substance
  • the MainLoop shall be left out of this entirely
  • I think output_dir would be much better name for the argument since it would be in-line with output_root (and current in-code naming convention)
  • this should be tested

@@ -32,6 +32,7 @@ class MainLoop(CaughtInterrupts): # pylint: disable=too-many-instance-attribut
def __init__(self, # pylint: disable=too-many-arguments
model: AbstractModel, dataset: AbstractDataset,
hooks: Iterable[AbstractHook]=(),
output_dir: str='',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused?

@@ -184,7 +186,8 @@ def _run_epoch(self, stream: StreamWrapper, train: bool) -> None:
continue
elif self._fixed_batch_size:
if batch_sizes != {self._fixed_batch_size}:
var, len_ = [(k, len(v)) for k, v in batch_input.items() if len(v) != self._fixed_batch_size][0]
var, len_ = [(k, len(v)) for k, v in batch_input.items()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not randomly reformat the code without changing its semantics

Copy link
Contributor Author

@moraval moraval Apr 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blazekadam I would love to, but the problem is that I have an automatic pep8 style checker that updates the whole file if I save it..
But I'll try to avoid it as much as possible

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea well you need to find some other way to check the code. Just see the update on line 198/199... :(

emloop/api.py Outdated
return output_dir


def config_log_to_output_dir(config: dict, output_dir: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I truly can not comprehend this function name.
  • is the config arg even used?
  • what about
os.makedirs(x, exist_ok=True)

emloop/api.py Outdated
logging.info('\tOutput dir is: %s', output_dir)

if not os.path.exists(output_dir):
logging.info('\tOutput dir folder "%s" does not exist and will be created', output_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logging.info('\tOutput dir folder "%s" does not exist and will be created', output_dir)
logging.info('\tOutput dir `%s` does not exist and will be created', output_dir)

emloop/api.py Outdated
:param output_root: dir where output_dir shall be created
:param restore_from: if not None, from whence the model should be restored (backend-specific information)

:return: main loop object
"""
output_dir = dataset = model = hooks = main_loop = None
dataset = model = hooks = main_loop = None
output_dir = target_dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should warn you that something is fishy with the naming...

emloop/api.py Outdated

output_dir = create_output_dir(config=config, output_root=output_root)
if not output_dir:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the only call of the create_output_dir. In any case, putting the whole use the given name or figure out one and make sure it exists logic to the already-existing create_output_dir function would be imo better and more compact. (And you would not have to think about the first sentence)

@blazekadam
Copy link
Contributor

Hmmm... it looks like we do not test most of the cli sub-module. I would appreciate at least unit-testing the create_ouput_dir fn (a] generated name, b] specific name)

@blazekadam
Copy link
Contributor

Ha, I ignored the WIP status of this PR... sorry. The remarks are quite valid though... :)

@moraval moraval changed the title WIP: Add --target_dir option Add --target_dir option Apr 6, 2020
emloop/api.py Outdated
@@ -16,17 +16,24 @@
from .main_loop import MainLoop


def create_output_dir(config: dict, output_root: str, default_model_name: str='Unnamed') -> str:
def create_output_dir(config: dict, output_root: str, default_model_name: str='Unnamed', output_dir: str='') -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets have a None as the default.

emloop/api.py Outdated Show resolved Hide resolved
emloop/api.py Outdated
:return: path to the created output_dir
"""
if output_dir:
logging.info('\tOutput dir is: %s', output_dir)
os.makedirs(output_dir, exist_ok=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
os.makedirs(output_dir, exist_ok=True)
os.makedirs(path.join(output_root, output_dir), exist_ok=True)

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I expect the user to give the whole path to the --output_dir argument.. Not just the name of the output_dir within the output_root.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously, but that is a bit confusing behavior. The --output_root argument is required (but have a default). I do not expect it to be entirely ignored if and only if --output_dir is present.

emloop/api.py Outdated
if not os.path.exists(output_dir):
logging.info('\tOutput dir folder "%s" does not exist and will be created', output_dir)
os.makedirs(output_dir)
def create_config_log(config: dict, output_dir: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still dislike this function (and its name). It does two quite different things none of which is what I would expect from the name (creating a log for logging the configuration?).

lets just rewrite the create_output_dir as

if output_dir is not None:
    # log that the dir was specified
    # join the output_root and outptu_dir to get output_path
else:
    # log that the dir name will be generated
    # generate the name and join it to get the output_path
# proceed with the dir creation, config dumping, logger set up etc.

and this fn is suddenly not needed

emloop/cli/train.py Outdated Show resolved Hide resolved
emloop/api.py Outdated
@@ -16,7 +16,7 @@
from .main_loop import MainLoop


def create_output_dir(config: dict, output_root: str, default_model_name: str='Unnamed', output_dir: str='') -> str:
def create_output_dir(config: dict, output_root: str, default_model_name: str='Unnamed', output_dir: str=None) -> str:
Copy link
Contributor

@blazekadam blazekadam Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is Optional[str] now

emloop/api.py Outdated
:return: path to the created output_dir
"""
if output_dir:
logging.info('\tOutput dir is: %s', output_dir)
os.makedirs(output_dir, exist_ok=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously, but that is a bit confusing behavior. The --output_root argument is required (but have a default). I do not expect it to be entirely ignored if and only if --output_dir is present.

emloop/api.py Outdated Show resolved Hide resolved
emloop/api.py Outdated
yaml_to_file(data=config, output_dir=output_dir, name=EL_CONFIG_FILE)

# create file logger
file_handler = logging.FileHandler(path.join(output_dir, EL_LOG_FILE))
file_handler.setFormatter(logging.Formatter(EL_LOG_FORMAT, datefmt=EL_LOG_DATE_FORMAT))
logging.getLogger().addHandler(file_handler)

logging.info(f'Output directory has name {output_dir}')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logging.info(f'Output directory has name {output_dir}')
logging.info(f'Created output directory with name {output_dir}')

emloop/api.py Outdated
@@ -224,7 +213,7 @@ def create_hooks(config: dict, model: Optional[AbstractModel]=None, dataset: Opt
return hooks


def create_main_loop(config: dict, output_root: str, restore_from: str=None, output_dir: str='') -> MainLoop:
def create_main_loop(config: dict, output_root: str, restore_from: str=None, output_dir: str=None) -> MainLoop:
Copy link
Contributor

@blazekadam blazekadam Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the arg is optional str now

@blazekadam
Copy link
Contributor

pls resolve the conflicts and merge it, I ll re-review it afterwards

@moraval
Copy link
Contributor Author

moraval commented Apr 16, 2020

I need to push --force it, so that the commits won't be duplicate here - are you okay with it?

Copy link
Contributor

@blazekadam blazekadam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code and tests look fine but I truly hate the whitespace changes :(

@@ -184,7 +186,8 @@ def _run_epoch(self, stream: StreamWrapper, train: bool) -> None:
continue
elif self._fixed_batch_size:
if batch_sizes != {self._fixed_batch_size}:
var, len_ = [(k, len(v)) for k, v in batch_input.items() if len(v) != self._fixed_batch_size][0]
var, len_ = [(k, len(v)) for k, v in batch_input.items()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea well you need to find some other way to check the code. Just see the update on line 198/199... :(

@moraval moraval merged commit 7596b05 into dev Apr 17, 2020
@moraval moraval deleted the target-dir-option branch April 17, 2020 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants