
Pipeline logging interferences when using datamodel in data analysis #6344

Closed

stscijgbot-jp opened this issue Sep 13, 2021 · 25 comments

stscijgbot-jp (Collaborator) commented Sep 13, 2021

Issue JP-2280 was created on JIRA by Christophe Cossou:

[Description reworked for clarity]

Hello,

I've found a "bug" related to logging when I try to create a CDP using datamodels. The problem is a conflict between the logging configuration the pipeline (or one of its subpackages) tries to enforce and what I want to do.

Here are the two main points about this issue:

  1. My logging output is duplicated (each line appears twice, whether in a file or on stdout).
  2. This behavior starts only AFTER the first call to a datamodel. Before that, I get only one line per logging call.

Please find attached a Complete Minimal Example to showcase the issue (https://jira.stsci.edu/secure/attachment/66957/CME_double_logging.py)
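
For readers without JIRA access, here is a minimal sketch of the kind of script the CME contains (reconstructed from the log output quoted below; the function and file names are illustrative, not the attached file itself):

import logging

from jwst import datamodels

# Script-side logging setup; per the report, the duplication starts
# after the first datamodel call below, not at import time.
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)-8s %(message)s",
                    datefmt="%H:%M:%S")
log = logging.getLogger(__name__)

def create_cdp(filename):
    log.info("Process %s", filename)
    model = datamodels.ImageModel()  # first datamodel use
    log.info("Model created")
    model.save(filename)
    log.info("CDP written to %s", filename)

for i in range(3):
    log.info("i = %d", i)
    create_cdp(f"fake_cdp_{i:05d}.fits")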

ccossou commented Sep 14, 2021

As I already flagged on JIRA, this issue is closely related to #6294.


ccossou commented Nov 5, 2021

JIRA seems to be offline and I can't access it, hence why I'm posting my comment here. I've dug a bit more and found a way to disable some handlers (jwst and stdatamodels), but some handlers are created after the import procedure (stpipe).

At the beginning of my script, this is the state of all handlers for all loggers (the format is "logger name: list of its handlers")
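
For reference, a dump in that format can be produced from the standard library's logger registry; a minimal sketch:

import logging

# Print "logger name: list of its handlers" for every registered logger.
for name, logger in logging.Logger.manager.loggerDict.items():
    if isinstance(logger, logging.Logger):  # skip logging.PlaceHolder entries
        print(f"{name}: {logger.handlers}")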

Just after import I have:

concurrent.futures: []
concurrent: []
asyncio: []
astropy: [<StreamHandler <stderr> (NOTSET)>]
packaging.tags: []
packaging: []
stdatamodels.util: [<NullHandler (NOTSET)>]
stdatamodels: []
stdatamodels.properties: [<NullHandler (NOTSET)>]
stdatamodels.fits_support: [<NullHandler (NOTSET)>]
stcal.dynamicdq: [<NullHandler (NOTSET)>]
stcal: []
jwst.datamodels.util: [<NullHandler (NOTSET)>]
jwst.datamodels: []
jwst: []
jwst.datamodels.container: [<NullHandler (NOTSET)>]
__main__: []

After initialising logging for my own script I have:

concurrent.futures: []
concurrent: []
asyncio: []
astropy: [<StreamHandler <stderr> (NOTSET)>]
packaging.tags: []
packaging: []
stdatamodels.util: []
stdatamodels: []
stdatamodels.properties: []
stdatamodels.fits_support: []
stcal.dynamicdq: [<NullHandler (NOTSET)>]
stcal: []
jwst.datamodels.util: []
jwst.datamodels: []
jwst: []
jwst.datamodels.container: []
__main__: []

But after my first use of a jwst.datamodels function, some new handlers have appeared, and I suspect they are what causes the issue:

concurrent.futures: []
concurrent: []
asyncio: []
astropy: [<StreamHandler <stderr> (NOTSET)>]
packaging.tags: []
packaging: []
stdatamodels.util: []
stdatamodels: []
stdatamodels.properties: []
stdatamodels.fits_support: []
stcal.dynamicdq: [<NullHandler (NOTSET)>]
stcal: []
jwst.datamodels.util: []
jwst.datamodels: []
jwst: []
jwst.datamodels.container: []
__main__: []
stpipe.config_parser: [<NullHandler (NOTSET)>, <StreamHandler <stderr> (INFO)>]
stpipe: [<StreamHandler <stderr> (INFO)>]
CRDS: [<StreamHandler <stderr> (DEBUG)>]
filelock: []
urllib3.util.retry: []
urllib3.util: []
urllib3: [<NullHandler (NOTSET)>]
urllib3.connection: []
urllib3.response: []
urllib3.connectionpool: []
urllib3.poolmanager: []
charset_normalizer: [<StreamHandler <stderr> (NOTSET)>]
requests: [<NullHandler (NOTSET)>]

I also noted that the root logger has a third handler added:

r = logging.getLogger()
r.handlers
[<StreamHandler <stdout> (INFO)>, <FileHandler /local/home/ccossou/.config/JetBrains/PyCharmCE2021.2/scratches/miricap.log (DEBUG)>, <DelegationHandler (NOTSET)>]

When I delete it, the double-logging issue disappears:

r.removeHandler(r.handlers[2])
create_cdp("toto.fits")
11:08:30 INFO     Process toto.fits
11:08:30 INFO     Model created
11:08:30 INFO     CDP written to toto.fits

So I guess this comes from the DelegationHandler, but I don't know exactly where it comes from; I just know it appears when a jwst.datamodels class is used. (The handler in question seems to be created by the pipeline itself; its type is reported as stpipe.log.DelegationHandler.)
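
For anyone applying this workaround, a slightly more defensive variant that matches the handler by class name instead of relying on its position in root.handlers (a sketch):

import logging

root = logging.getLogger()
# Remove stpipe's DelegationHandler wherever it sits in the handler list.
for handler in list(root.handlers):
    if type(handler).__name__ == "DelegationHandler":
        root.removeHandler(handler)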

Hope it helps.

nden added this to the Build 7.9 milestone Nov 15, 2021
hbushouse (Collaborator) commented:

Problem ultimately traced to an issue with logging in the Glue package.


Comment by Christophe Cossou on JIRA:

I created a Complete Minimal Example to show the problem. Here it is, attached: [^CME_double_logging.py]

Here is the output I get:

16:23:08 INFO     i = 0
16:23:08 INFO     Process fake_cdp_00000.fits
2021-09-14 16:23:09,463 - stpipe - INFO - Model created
16:23:09 INFO     Model created
16:23:09 INFO     Model created
16:23:09 INFO     CDP written to fake_cdp_00000.fits
16:23:09 INFO     CDP written to fake_cdp_00000.fits
16:23:09 INFO     i = 1
16:23:09 INFO     i = 1
16:23:09 INFO     Process fake_cdp_00001.fits
16:23:09 INFO     Process fake_cdp_00001.fits
16:23:09 INFO     Model created
16:23:09 INFO     Model created
2021-09-14 16:23:09,487 - stpipe - INFO - CDP written to fake_cdp_00000.fits
2021-09-14 16:23:09,487 - stpipe - INFO - i = 1
2021-09-14 16:23:09,487 - stpipe - INFO - Process fake_cdp_00001.fits
2021-09-14 16:23:09,495 - stpipe - INFO - Model created
2021-09-14 16:23:09,511 - stpipe - INFO - CDP written to fake_cdp_00001.fits
2021-09-14 16:23:09,511 - stpipe - INFO - i = 2
2021-09-14 16:23:09,511 - stpipe - INFO - Process fake_cdp_00002.fits
2021-09-14 16:23:09,518 - stpipe - INFO - Model created
16:23:09 INFO     CDP written to fake_cdp_00001.fits
16:23:09 INFO     CDP written to fake_cdp_00001.fits
16:23:09 INFO     i = 2
16:23:09 INFO     i = 2
16:23:09 INFO     Process fake_cdp_00002.fits
16:23:09 INFO     Process fake_cdp_00002.fits
16:23:09 INFO     Model created
16:23:09 INFO     Model created
2021-09-14 16:23:09,534 - stpipe - INFO - CDP written to fake_cdp_00002.fits
16:23:09 INFO     CDP written to fake_cdp_00002.fits
16:23:09 INFO     CDP written to fake_cdp_00002.fits

The important part here is that the first two lines are not duplicated (everything is fine before the first call to the CDP class).


Comment by Howard Bushouse on JIRA:

This may have been fixed by JP-2258. Please retest.


Comment by Christophe Cossou on JIRA:

I've tried with the latest master version: 1.3.4.dev27+gd6f7bf90

I experience the same issue. The output is a bit different from before, but in short, I still see double logs.

16:22:48 INFO     i = 0
16:22:48 INFO     Process fake_cdp_00000.fits
16:22:48 INFO     Model created
16:22:48 INFO     Model created
2021-11-04 16:22:48,583 - stpipe - INFO - Model created
2021-11-04 16:22:48,627 - stpipe - INFO - CDP written to fake_cdp_00000.fits
2021-11-04 16:22:48,627 - stpipe - INFO - i = 1
2021-11-04 16:22:48,627 - stpipe - INFO - Process fake_cdp_00001.fits
2021-11-04 16:22:48,637 - stpipe - INFO - Model created
16:22:48 INFO     CDP written to fake_cdp_00000.fits
16:22:48 INFO     CDP written to fake_cdp_00000.fits
16:22:48 INFO     i = 1
16:22:48 INFO     i = 1
16:22:48 INFO     Process fake_cdp_00001.fits
16:22:48 INFO     Process fake_cdp_00001.fits
16:22:48 INFO     Model created
16:22:48 INFO     Model created
2021-11-04 16:22:48,670 - stpipe - INFO - CDP written to fake_cdp_00001.fits
2021-11-04 16:22:48,670 - stpipe - INFO - i = 2
2021-11-04 16:22:48,670 - stpipe - INFO - Process fake_cdp_00002.fits
16:22:48 INFO     CDP written to fake_cdp_00001.fits
16:22:48 INFO     CDP written to fake_cdp_00001.fits
16:22:48 INFO     i = 2
16:22:48 INFO     i = 2
16:22:48 INFO     Process fake_cdp_00002.fits
16:22:48 INFO     Process fake_cdp_00002.fits
2021-11-04 16:22:48,681 - stpipe - INFO - Model created
16:22:48 INFO     Model created
16:22:48 INFO     Model created
2021-11-04 16:22:48,713 - stpipe - INFO - CDP written to fake_cdp_00002.fits
16:22:48 INFO     CDP written to fake_cdp_00002.fits
16:22:48 INFO     CDP written to fake_cdp_00002.fits


Comment by Christophe Cossou on JIRA:

JIRA was offline while I added a new comment on GitHub, so the daemon that is supposed to synchronise the comments did not copy it to JIRA. This comment is just a copy of what I wrote on GitHub in the Nov 5, 2021 comment above (expect a double comment on GitHub).



Comment by Jane Morrison on JIRA:

Howard Bushouse Jonathan Eisenhamer Ori Fox Tyler Pauly

I am running JWST 1.4.3.

When the pipeline is run from a Jupyter notebook AND jdaviz is imported, many more logging statements are written to the output (and they have a different format from the standard logging information: the line starts with INFO or DEBUG). If I do not import jdaviz then the logging information is the same as when running the pipeline from the command line, and these INFO and DEBUG lines are not printed. Any pointers to where I should look for what is being set when jdaviz is imported?

Here is a sample of the extra type of logging information:
2022-02-15 15:42:42,420 - stpipe.Detector1Pipeline - INFO - Detector1Pipeline instance created.
INFO:stpipe.Detector1Pipeline:Detector1Pipeline instance created.
2022-02-15 15:42:42,423 - stpipe.Detector1Pipeline.group_scale - INFO - GroupScaleStep instance created.
INFO:stpipe.Detector1Pipeline.group_scale:GroupScaleStep instance created.

Also, DEBUG logging information now pops up:

2022-02-15 15:42:45,462 - stpipe.Detector1Pipeline.dq_init - INFO - Using MASK reference file /Users/morrison/crds_cache/references/jwst/miri/jwst_miri_mask_0027.fits
INFO:stpipe.Detector1Pipeline.dq_init:Using MASK reference file /Users/morrison/crds_cache/references/jwst/miri/jwst_miri_mask_0027.fits
DEBUG:jwst.lib.reffile_utils:ref substrt1=1, subsize1=1032, substrt2=1, subsize2=1024
DEBUG:stpipe.Detector1Pipeline.dq_init:ref substrt1=1, subsize1=1032, substrt2=1, subsize2=1024
DEBUG:jwst.lib.reffile_utils:sci substrt1=1, subsize1=1032, substrt2=1, subsize2=1024
DEBUG:stpipe.Detector1Pipeline.dq_init:sci substrt1=1, subsize1=1032, substrt2=1, subsize2=1024
2022-02-15 15:42:45,748 - stpipe.Detector1Pipeline.dq_init - INFO - Step dq_init done


Comment by Jane Morrison on JIRA:

Tyler Pauly Jonathan Eisenhamer Pey-Lian Lim Ori Fox

Looking back at the comments in this ticket I found a key bit of information. It is related to the handlers of the logging. This logging information is a little beyond my knowledge, but here is what I have found.
If I do not import the jdaviz package and I get this information:
r = logging.getLogger()
r.handlers
then r.handlers contains only <DelegationHandler (NOTSET)>.
If I import the jdaviz package then we get an extra handler: <StreamHandler stderr (NOTSET)>.
Christophe found something similar in the above comments.
So I removed this extra handler before running the pipeline:
r.removeHandler(r.handlers[0])

And now when the pipeline runs, no extra logging is printed.

So my question is: do we need to fix Jdaviz so that this handler is not created, or do we need to make sure it is not set in the JWST package, or maybe both?


Comment by Jonathan Eisenhamer on JIRA:

Not to be too authoritarian, but the general philosophy of Python logging is that the top-level module controls what it needs to output or not. So, technically, if jdaviz does not want the extra handling, then jdaviz should turn it off.

However, I know jwst violates logging conventions, and I'm guessing so does the other package, specutils? Fixing jwst is a bit of a project in this respect; turning it off in jwst may create too much collateral damage. I suggest trying it in jwst and seeing what happens. There are documented ways of making this so; maybe that is what jdaviz should do.

Relevant documentation for jwst is as follows:

I note there isn't anything about completely turning it off. I'm guessing using /dev/null as output would do it.
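
For reference, the library-side convention alluded to here (from the Python logging HOWTO) is that a library attaches only a NullHandler to its own top-level logger and never configures the root logger, leaving output decisions to the application. A sketch, using jdaviz as the example package name:

import logging

# Library-side setup: records are still emitted, but no real handler is
# attached; the application decides whether and where they are displayed.
logging.getLogger("jdaviz").addHandler(logging.NullHandler())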


Comment by Pey-Lian Lim on JIRA:

There is really no good reason for either specutils or Jdaviz to be using a logger. specutils is too low level and Jdaviz is too high level for such things.

You might need one or all of these:

Please report back at spacetelescope/jdaviz#1084 . Thanks!


Comment by Pey-Lian Lim on JIRA:

Jane Morrison and I looked at the problem together today using all three patches above from specutils 1.7.0 and both unmerged PRs from jdaviz (installed together). Alas, they didn't fix the problem. Turns out, there is also logging in glue. Jane was able to reproduce the exact same logging problem by simply adding "import glue" instead of "import jdaviz".

Good news: We found the real problem, which is the glue logger.

Bad news: Glue isn't ours and I don't see Tom Robitaille getting rid of it.

My recommendation: the jwst pipeline should provide users a way to control the verbosity of its logger. Even the ability to set it to display just "critical" stuff would greatly reduce the amount of unwanted logging in the offending notebook.
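
Until such a knob exists, the closest application-side approximation is probably to raise the threshold on the stpipe logger (the name that appears in the duplicated records above). A sketch; note this only reduces verbosity and does not remove the duplicate handlers themselves:

import logging

# Show only CRITICAL records from the pipeline's "stpipe" logger.
logging.getLogger("stpipe").setLevel(logging.CRITICAL)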


Comment by Pey-Lian Lim on JIRA:

p.s. Doing this didn't change anything either:

from glue.logger import logger
logger.setLevel('ERROR')

https://docs.python.org/3/library/logging.html#levels


Comment by Jane Morrison on JIRA:

Here is a summary of what I have learned. Jonathan Eisenhamer Tyler Pauly Pey-Lian Lim Ori Fox

As Pey-Lian Lim reported, the real problem is the glue logger. It should be fixed so it does not think it is the top logger. But until that is done, I have two possible workarounds. If one of these workarounds is acceptable then I will close this ticket.

In the Jupyter notebook, check the loggers and remove the StreamHandler (the one created by glue).

This is the code I put in the Jupyter notebook, and it gets rid of the double logging:

from logging import getLogger, StreamHandler

root_logger = getLogger('')
num = len(root_logger.handlers)
print('number of handlers', num)
for i in range(num):
    if isinstance(root_logger.handlers[i], StreamHandler):
        print('Found the StreamHandler and going to remove it')
        root_logger.removeHandler(root_logger.handlers[i])
        break

If that is too clumsy, then you can direct what the jwst pipeline logs to a file. I followed the instructions at https://jwst-pipeline.readthedocs.io/en/stable/jwst/introduction.html?highlight=logging#logging-configuration

In the working directory I created this file: my-logging-config.cfg
The contents of this file are:

[*]
handler = file:pipeline.log
level = CRITICAL

Then in the Jupyter notebook run the pipeline like this:
run_output = detector1.call(uncal_file, logcfg='my-logging-config.cfg')

If you do that then you do not get the double logging, but you also do not get the timestamped log.
So, for example, this is what the first part of the log looks like (this format is caused by the glue logger trying to control what is being logged):

INFO:stpipe.Detector1Pipeline:Detector1Pipeline instance created.
INFO:stpipe.Detector1Pipeline.group_scale:GroupScaleStep instance created.
INFO:stpipe.Detector1Pipeline.dq_init:DQInitStep instance created.
INFO:stpipe.Detector1Pipeline.saturation:SaturationStep instance created.
INFO:stpipe.Detector1Pipeline.ipc:IPCStep instance created.
INFO:stpipe.Detector1Pipeline.superbias:SuperBiasStep instance created.


Comment by Pey-Lian Lim on JIRA:

logcfg seems like a more elegant solution, but what is its behavior if the log file already exists? Does it overwrite, append, or throw an error?


Comment by Pey-Lian Lim on JIRA:

FWIW, I opened glue-viz/glue#2278 . Feel free to add details there. Thanks!


Comment by Jane Morrison on JIRA:

run_output = detector1.call(uncal_file, logcfg='my-logging-config.cfg')

Since my-logging-config.cfg only records CRITICAL messages and no CRITICAL messages are reported by the pipeline, no log file is created. If I change the cfg file to report INFO messages, then the log file is created and it is overwritten for each new run.


Comment by Pey-Lian Lim on JIRA:

Silent overwrite should be fine as far as Jdaviz is concerned. Ori Fox is out on vacation in case you are waiting to hear back from him.


Comment by Jonathan Eisenhamer on JIRA:

If the log file is not desired, is it possible to set it to /dev/null?


Comment by Jane Morrison on JIRA:

Not sure how to set the log file to /dev/null.
I tried this: run_output = detector1.call(uncal_file, logcfg='/dev/null')
and it wrote the log messages to the notebook, resulting in double logging. Should I do this differently?
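
A possible source of the confusion: logcfg expects the path to a configuration file, not the log destination, so passing '/dev/null' effectively hands the pipeline an empty config. If the goal is to discard the output itself, a sketch of a config that points the file handler at /dev/null instead (untested; assumes the file: handler accepts arbitrary paths):

[*]
handler = file:/dev/null
level = DEBUG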

 


Comment by Jonathan Eisenhamer on JIRA:

That was just a guess as to whether it would work or not. It may be worth creating a JP issue to implement the ability to "turn off".


Comment by Jane Morrison on JIRA:

I tried: run_output = detector1.call(uncal_file, logcfg=os.devnull)

and it made no difference. The logging information was still printed to the notebook.

I will create a JP ticket to implement the ability to "turn off" logging.


Comment by Jane Morrison on JIRA:

Created JP-2529: Turn off logging for the JWST pipeline by redirecting logging to /dev/null


Comment by Pey-Lian Lim on JIRA:

Tom proposed a fix in Glue as well: glue-viz/glue#2281
