WIP: Progress logger #1171

Closed · wants to merge 33 commits

Conversation

@GaelVaroquaux (Member) commented Sep 20, 2012

Sketch of a small logging framework for scikit-learn, to avoid using 'print'.

This is still very much a work in progress, but I am opening a pull request to enable comments.

TODOS:

  • Convert all prints to logger.progress
  • Write some documentation, and an example of fancy logging
  • Implement a progress_context to better monitor progress in for loops (may be done in a later PR)

As a side note, since this branch touches many files, it is going to lead to merge nightmares. I am going to try to rebase it often (which will probably mess up GitHub :( )

Also, the 'internal/object_log.py' example will probably be removed at some point. It is there only to enable a demo and discussion of the API of the ProgressLogger.

@amueller (Member) commented Sep 20, 2012

This might be a stupid question, but could you list the benefits of using a logger object?

@GaelVaroquaux (Member) commented Sep 20, 2012

This might be a stupid question, but could you list the benefits of using a logger object?

Mainly to make the web developers happy, because they are used to such a framework.

One of the side aspects is that you get overridable/configurable streams of logging. As a result, it makes it much easier to set up computation e.g. on a cluster, and get good log files to figure out what has been going on. Note that there is probably a bit more work to do on the logger object to get good log files.
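
As a rough illustration of the "configurable streams" point, and assuming the framework registers its loggers under the 'sklearn' name (an assumption here, not something confirmed by this PR), the caller could route all messages to a per-node log file with the standard library alone:

import logging

# Assumption: scikit-learn's loggers would live under the 'sklearn' namespace.
sklearn_logger = logging.getLogger('sklearn')
sklearn_logger.setLevel(logging.DEBUG)

# Route everything to a per-run log file, e.g. one file per cluster node.
handler = logging.FileHandler('sklearn_run.log')
handler.setFormatter(logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s'))
sklearn_logger.addHandler(handler)

sklearn_logger.info("run started")  # ends up in sklearn_run.log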

@amueller (Member) commented Sep 20, 2012

On 09/20/2012 09:55 AM, Gael Varoquaux wrote:

This might be a stupid question, but could you list the benefits of using a logger object?

Mainly to make the web developers happy, because they are used to such a framework.

That doesn't seem that good a reason :-/

One of the side aspects is that you get overridable/configurable streams of logging. As a result, it makes it much easier to set up computation e.g. on a cluster, and get good log files to figure out what has been going on. Note that there is probably a bit more work to do on the logger object to get good log files.

Good log files are always good. But this seems like a lot of added complexity. Do you have a use-case currently?

@GaelVaroquaux (Member) commented Sep 20, 2012

Mainly to make the web developers happy, because they are used to such a framework.

That doesn't seem that good a reason :-/

Talk to @ogrisel about that.

Good log files are always good. But this seems like a lot of added complexity.

Agreed.

Do you have a use-case currently?

No, not really. We do have plans, in a project that I manage, to have systematic logging, but that's a couple of years down the line.

The more I work on this, the more I get annoyed with the whole Python logging framework.

When I meet Python programmers outside the scientific Python community, as this weekend at PyConFr, they tell me that I should be using the logging framework. I do worry about the added complexity.

The only way to tell if it brings benefits is to give it a good try and drink all the Kool-Aid, i.e. go all the way to using this in production and analysing log files. That's a lot of work, and won't be done too soon, I fear. @ogrisel seems to think that in real production scenarios it is necessary.

@ogrisel (Member) commented Sep 20, 2012

This might be a stupid question, but could you list the benefits of using a logger object?

Mainly to make the web developers happy, because they are used to such a framework.

Not just web developers: anybody writing a long-running application on a server.

Using the standard library logging module makes it easier to use scikit-learn as yet another component of a larger Python application.

Do you have a use-case currently?

Being able to log long-running processes such as grid search and model selection among many different kinds of models, possibly on a distributed setup, is very useful to debug strange behaviors (caused by wrong assumptions on the data and / or bad parameters).

sklearn/__init__.py

      # It could have been provided in the environment
      _random_seed = os.environ.get('SKLEARN_SEED', None)
      if _random_seed is None:
          _random_seed = np.random.uniform() * (2 ** 31 - 1)
      _random_seed = int(_random_seed)
-     print "I: Seeding RNGs with %r" % _random_seed
+     get_logger(verbosity=1000).progress("I: Seeding RNGs with %r",
+                                         _random_seed)

@ogrisel (Member) commented Sep 20, 2012

I would use a simple info instead of progress here. The zoomable progress report does not make sense in that context.
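
For reference, with the plain standard-library logger (leaving aside this PR's get_logger helper, whose exact API is not shown here), such an info-level message could look roughly like this:

import logging

logging.basicConfig(level=logging.INFO)
_random_seed = 42  # placeholder value for illustration only
logging.getLogger('sklearn').info("Seeding RNGs with %r", _random_seed)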

@ogrisel (Member) commented Sep 20, 2012

@GaelVaroquaux you should explain in the summary why you introduced the progress method with nested verbosity instead of just using static info calls. This is what adds a lot of the complexity of this logging integration, and it addresses a valid use case: managing zoomable verbosity in progress reports for the convergence of nested estimators:

GridSearch > DictionaryLearning > Lasso

We don't want Lasso to be as verbose when used within 2 nested levels as it would be when called directly by the user code / script.

This kind of contextual zoomable verbosity is not provided by the default logging module, hence the need to extend it here.
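
To make the "zoomable verbosity" idea concrete, here is a minimal, hypothetical sketch (not this PR's actual ProgressLogger implementation) of a verbosity budget that shrinks each time a logger is handed down to a nested estimator:

import logging

logging.basicConfig(level=logging.INFO)


class NestedVerbosityLogger(object):
    """Hypothetical sketch: verbosity budget that shrinks with estimator nesting depth."""

    def __init__(self, verbosity=2, name='sklearn'):
        self.verbosity = verbosity
        self._logger = logging.getLogger(name)

    def progress(self, msg, *args):
        # Emit the progress message only if there is verbosity budget left at this depth.
        if self.verbosity > 0:
            self._logger.info(msg, *args)

    def child(self):
        # A nested estimator receives a quieter logger than its parent.
        return NestedVerbosityLogger(self.verbosity - 1, self._logger.name)


# Usage sketch for GridSearch > DictionaryLearning > Lasso:
grid_log = NestedVerbosityLogger(verbosity=2)
dict_log = grid_log.child()    # verbosity 1: still reports coarse progress
lasso_log = dict_log.child()   # verbosity 0: its messages are dropped
grid_log.progress("grid point %d/%d done", 3, 10)
lasso_log.progress("Lasso iteration %d, cost %.3f", 42, 0.123)  # filtered out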

ENH: Logger in lfw
The difficulty here was to be able to turn off verbosity during the tests
@GaelVaroquaux (Member) commented Sep 20, 2012

@GaelVaroquaux you should explain in the summary why you introduced the progress method with nested verbosity instead of just using static info calls.

Actually, I worry much more about the added complexity that gets distilled into the code base, as in e25073c, than about complexity localized in a specific file.

MISC: Address @ogrisel's comment on info
Too bad that we lose the nice display of the context
@ogrisel (Member) commented Sep 20, 2012

Why not configure the tests to have the equivalent of logging.basicConfig(level=logging.WARN) at the beginning, once and for all?

@ogrisel (Member) replied Sep 20, 2012

This could be done in a per-test-module fixture (in a setup_module function) to make it possible to have some tests with progress logging enabled (e.g. to test the progress logger itself or to debug a test temporarily).

@GaelVaroquaux (Member) replied Sep 20, 2012

This raises red flags for me: touching a global state in tests. Some of the messages might actually be useful.

@ogrisel (Member) replied Sep 20, 2012

This raises red flags for me: touching a global state in tests. Some of the messages might actually be useful.

Yes, better to have a per-module logging fixture. I think that's the right granularity for tests.

@ogrisel (Member) replied Sep 20, 2012

We could use:

import logging


def setup_module():
    logging.getLogger('sklearn').level = logging.WARN

for instance.

@GaelVaroquaux (Member) replied Sep 20, 2012

@ogrisel (Member) replied Sep 20, 2012

Why the "I:"? I would rather avoid introducing our own microsyntax inside the log message.

@GaelVaroquaux (Owner) replied Sep 20, 2012

@amueller (Member) commented Sep 20, 2012

It is not quite clear how this helps me in distributed systems. As far as I know, there is no support for distributed systems in sklearn atm, so I don't really know what the setup might look like.

I see the case of integrating sklearn into a bigger system. It surely seems useful there.

Does this enable me to, say, write warnings to stdout and debugging messages to a file?

@GaelVaroquaux (Member) commented Sep 20, 2012

so I don't really know what the setup might look like.

I think that you should simply be able to specify a log_file to the setup_logger function. All of this will need to be documented.

Does this enable me to, say, write warnings to stdout and debugging messages to a file?

I believe so, using the standard Python logging features. For this to work, however, all warnings need to be channelled through the logging framework. This is planned.
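
For the "warnings to stdout, debug to a file" split, a rough sketch using the standard library alone could look like the following (independent of this PR's setup_logger helper, and assuming the 'sklearn' logger name):

import logging
import sys

logger = logging.getLogger('sklearn')  # assumed logger name
logger.setLevel(logging.DEBUG)

# WARNING and above go to stdout for live monitoring...
console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.WARNING)
logger.addHandler(console)

# ...while the full DEBUG trace goes to a file for later inspection.
debug_file = logging.FileHandler('sklearn_debug.log')
debug_file.setLevel(logging.DEBUG)
logger.addHandler(debug_file)

logger.debug("only in the file")
logger.warning("on stdout and in the file")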

@ogrisel (Member) commented Sep 20, 2012

It is not quite clear how this helps me in distributed systems. As far as I know, there is no support for distributed systems in sklearn atm, so I don't really know what the setup might look like.

On distributed systems you might have node-specific failures. Having the ability to log to files on each node makes it easier to perform error analysis afterwards, possibly using tools that collect and gather all the node logs back to the client machine if necessary (but that kind of tooling need not be part of the logging framework).

Note that some frameworks such as IPython.parallel make it trivial to gather the stdout of each engine back to the controller (using a zmq channel in the background), so logging to a file might not be that important in that case.

Still, being able to have DEBUG-level trace information logged to a node-local file (+ log rotation and optionally log compression) for post-failure debugging, while having INFO-level messages on stdout for live monitoring, can be a nice pattern too.

Using the Python logging module keeps classical STDOUT live monitoring possible when using scikit-learn as an interactive data exploration tool in IPython sessions or short data processing scripts. But at the same time it allows application builders to use scikit-learn as a well-behaved library in a larger framework that has more complex logging requirements.

Here is an example of a custom logging config for a project. It's not that complicated to set up if you need to:
https://github.com/nuxeo/nuxeo-drive/blob/master/nuxeo-drive-client/nxdrive/logging_config.py#L13

If you don't need it, just use the STDOUT StreamHandler configured by default in sklearn. The point is that this configuration can be overridden externally by the caller application, without changing the sklearn source, by using:

import logging

# Grab scikit-learn's parent logger to reconfigure it from the caller application:
sklearn_parent_logger = logging.getLogger('sklearn')

# e.g. change the global level, introspect the sub-logger hierarchy, or swap handlers:
sklearn_parent_logger.setLevel(logging.WARNING)

@amueller (Member) commented Sep 20, 2012

Thanks for the explanation @ogrisel :) I think I have a much clearer picture now.

@GaelVaroquaux (Member) commented Sep 20, 2012

I think I have a much clearer picture now.

So, what's your take? I am definitely interested in criticism. Of course, this is very much a WIP, and still needs documentation and an example. I think that the hard part, from the design point of view, is making sure that we can easily plug the logger in everywhere in the scikit.

@amueller (Member) commented Sep 20, 2012

I haven't looked at your implementation yet, I just think I got the general idea. It seems good to have. Usually I had the feeling that we added something when we found a reasonable use-case. Scaling up sklearn applications is definitely an important topic and Olivier is working on that a lot. So if he wants it, sure ;)

From a user perspective, I'd prefer it if we could keep the current interface. For example, I might be very interested in the debug output of my classifier, but not in the PCA and feature selection I do beforehand. To me it seems that is possible with your current implementation, which is great.

I'd have to read the logging module docs to really understand, I guess. For example, is there one global logger instance?

@GaelVaroquaux (Member) commented Sep 20, 2012

From a user perspective, I'd prefer it if we could keep the current interface. For example, I might be very interested in the debug output of my classifier, but not in the PCA and feature selection I do beforehand.

Absolutely. This is also central to me.

I'd have to read the logging module docs to really understand, I guess. For example, is there one global logger instance?

Somewhat (global to sklearn), but:

  1. You can inject a specific logger in each object via the 'verbose' attribute, which can now be an instance of ProgressLog (or a subclass of it). This is a similar pattern to the one used for 'random_state'. It will need to be documented.
  2. The verbosity of each object is controlled separately.
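
As a purely hypothetical sketch of that injection pattern (ProgressLog here is a stand-in class and SomeEstimator a toy estimator; neither is this PR's final API), mirroring how random_state accepts either an int seed or a RandomState instance:

class ProgressLog(object):
    """Stand-in for the PR's logger object, holding its own verbosity."""

    def __init__(self, verbosity=1):
        self.verbosity = verbosity


class SomeEstimator(object):
    """Toy estimator: 'verbose' may be an int or a ProgressLog instance."""

    def __init__(self, verbose=0):
        self.verbose = verbose

    def fit(self):
        # Normalize: ints keep the old behaviour, ProgressLog instances are used as-is.
        log = self.verbose if isinstance(self.verbose, ProgressLog) else ProgressLog(self.verbose)
        if log.verbosity > 0:
            print("fitting with verbosity %d" % log.verbosity)
        return self


SomeEstimator(verbose=ProgressLog(verbosity=5)).fit()  # per-object logger injection
SomeEstimator(verbose=0).fit()                         # stays quiet
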
@ogrisel (Member) commented Sep 20, 2012

For example I might be very interested in the debug output of my classifier, but not in the PCA and feature selection I do beforehand.

For this use case, the default logging module from Python would have been enough.

Let's do a basic config with a StreamHandler at level DEBUG:

>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)

By default, any module logging at the debug level will appear in the output:

>>> logging.getLogger('some_package.some_module').debug('Some debug message')
DEBUG:some_package.some_module:Some debug message
>>> logging.getLogger('some_package.other_module').debug('Some debug message')
DEBUG:some_package.other_module:Some debug message

To filter the DEBUG output of every module but one, you can do:

>>> logging.getLogger('some_package').level = logging.WARN
>>> logging.getLogger('some_package.some_module').level = logging.DEBUG

Everything under some_package will have its debug output filtered out, except some_package.some_module:

>>> logging.getLogger('some_package.other_module').debug('Some debug message')
>>> # nothing has been passed to the handler
>>> logging.getLogger('some_package.some_module').debug('Some debug message')
DEBUG:some_package.some_module:Some debug message

Note that it's good not to cut WARN, ERROR and CRITICAL messages from other modules, as they might help you debug too in case something unexpected happens in some_package.other_module in parallel:

>>> logging.getLogger('some_package.other_module').warn('Some warning message')
WARNING:some_package.other_module:Some warning message

What Gael is adding is the ability to cut the verbosity of nested instances using the progress method and the verbosity counter.

@amueller (Member) commented Sep 21, 2012

I just saw a connection to doing randomized hyperparameter optimization: if we have a log file (and maybe some method to parse it), we can more easily judge the status of a grid search and see if we should stop or not.

@ogrisel (Member) commented Sep 21, 2012

This would be possible, but it is better addressed with a real asynchronous job scheduling + structured progress reporting runtime (not just text log lines), for instance with IPython. I started this experimental work at PyCon last year:

https://github.com/ogrisel/pycon-pydata-sprint/blob/master/grid_search.py
https://github.com/ogrisel/pycon-pydata-sprint/blob/master/grid_search_digits.py

But for this PR I think we should focus on standard convergence monitoring.

@amueller (Member) commented Sep 21, 2012

I'm always for keeping things simple at first. I just became aware of the connection. Thanks for sharing your blob, I might have a look later.

@mblondel (Member) commented Sep 25, 2012

For iterative algorithms, it would be nice to have a callback / monitoring API to be able to plot the algorithm's progress, among other things. I was thinking that an event-based API could be nice. For example, there could be an "iteration-completed" event that the user could choose to listen to or not. A logger can be seen as an object that listens to some events and displays messages on stdout. So, if at some point we plan to add a monitoring API, it would make sense to think of a uniform API...
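
A minimal sketch of the kind of event-based hook being suggested (the "iteration-completed" event name and the dispatcher API are purely illustrative, not an existing scikit-learn interface):

from collections import defaultdict


class EventDispatcher(object):
    """Toy event bus: estimators emit events, listeners (loggers, plotters) subscribe."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def on(self, event, callback):
        self._listeners[event].append(callback)

    def emit(self, event, **payload):
        for callback in self._listeners[event]:
            callback(**payload)


events = EventDispatcher()

# A logger is just one possible listener; a plotting callback could subscribe too.
events.on('iteration-completed',
          lambda iteration, cost: print("iter %d, cost %.3f" % (iteration, cost)))

# Inside a hypothetical iterative fit loop:
for i in range(3):
    cost = 1.0 / (i + 1)
    events.emit('iteration-completed', iteration=i, cost=cost)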

@ogrisel (Member) commented Sep 25, 2012

The monitoring API is more impactful than simple logging, as it will also be useful for controlling the fit iterations / checkpointing, e.g. using early stopping on a validation set or snapshotting the intermediate model to a database / filesystem for offline / post-run analysis.

Logging is a more generic need, but it could also be used by a callback-based monitoring / checkpointing API. I would rather finish the work on the pure logging use case and then move on to the monitoring API, which seems more complex to me.

We can always refactor the logger later if we need to make it evolve for special needs identified while working on the checkpointing stuff.

@mblondel (Member) commented Sep 25, 2012

It would still be nice, when designing the logging system, to have a rough idea of how the monitoring API would fit in the big picture...

@GaelVaroquaux (Member) commented Sep 25, 2012

It would still be nice, when designing the logging system, to have a rough idea of how the monitoring API would fit in the big picture...

I agree completely. My initial idea was to have a 'progress_context' method on the logger object that returns a context manager suited for such use:

with logger.progress_context(n_iter=10) as progress:
    for i in range(10):
        progress(step=i, message='Cost %.2f', msg_vars=(i, ))

But that's just a rough sketch.

My plan of action is to get the simple logger that I have sketched integrated in the scikit first, and then revisit the design, including the potential progress bar, later.

@agramfort (Member) commented Jun 6, 2017

I think I can close this :)

@GaelVaroquaux (Member) commented Jun 6, 2017

@amueller (Member) commented Jun 6, 2017

Axel? 👎

I still want progress bars btw :P
