Pipeline logging interferences when using datamodel in data analysis #6344
As I already flagged on JIRA, this issue is close to #6294.

JIRA seems to be offline and I can't access it, which is why I'm posting my comment here. I've dug a bit more. I found a way to disable some handlers (jwst and stdatamodels), but some handlers are created after the import procedure (stpipe). Below is the state of all handlers for all loggers at various points in my script (the format is "logger name: list of its handlers").

Just after import I have:

concurrent.futures: []
concurrent: []
asyncio: []
astropy: [<StreamHandler <stderr> (NOTSET)>]
packaging.tags: []
packaging: []
stdatamodels.util: [<NullHandler (NOTSET)>]
stdatamodels: []
stdatamodels.properties: [<NullHandler (NOTSET)>]
stdatamodels.fits_support: [<NullHandler (NOTSET)>]
stcal.dynamicdq: [<NullHandler (NOTSET)>]
stcal: []
jwst.datamodels.util: [<NullHandler (NOTSET)>]
jwst.datamodels: []
jwst: []
jwst.datamodels.container: [<NullHandler (NOTSET)>]
__main__: []

After initialising logging for my own script I have:

concurrent.futures: []
concurrent: []
asyncio: []
astropy: [<StreamHandler <stderr> (NOTSET)>]
packaging.tags: []
packaging: []
stdatamodels.util: []
stdatamodels: []
stdatamodels.properties: []
stdatamodels.fits_support: []
stcal.dynamicdq: [<NullHandler (NOTSET)>]
stcal: []
jwst.datamodels.util: []
jwst.datamodels: []
jwst: []
jwst.datamodels.container: []
__main__: []

But after my first use of a jwst.datamodels function, some new handlers have appeared, and I suspect they are what causes the issue:

concurrent.futures: []
concurrent: []
asyncio: []
astropy: [<StreamHandler <stderr> (NOTSET)>]
packaging.tags: []
packaging: []
stdatamodels.util: []
stdatamodels: []
stdatamodels.properties: []
stdatamodels.fits_support: []
stcal.dynamicdq: [<NullHandler (NOTSET)>]
stcal: []
jwst.datamodels.util: []
jwst.datamodels: []
jwst: []
jwst.datamodels.container: []
__main__: []
stpipe.config_parser: [<NullHandler (NOTSET)>, <StreamHandler <stderr> (INFO)>]
stpipe: [<StreamHandler <stderr> (INFO)>]
CRDS: [<StreamHandler <stderr> (DEBUG)>]
filelock: []
urllib3.util.retry: []
urllib3.util: []
urllib3: [<NullHandler (NOTSET)>]
urllib3.connection: []
urllib3.response: []
urllib3.connectionpool: []
urllib3.poolmanager: []
charset_normalizer: [<StreamHandler <stderr> (NOTSET)>]
requests: [<NullHandler (NOTSET)>]

I also noted that the root logger has a third handler added:

r = logging.getLogger()
r.handlers
[<StreamHandler <stdout> (INFO)>, <FileHandler /local/home/ccossou/.config/JetBrains/PyCharmCE2021.2/scratches/miricap.log (DEBUG)>, <DelegationHandler (NOTSET)>]

When deleting it, the double logging issue disappears:

r.removeHandler(r.handlers[2])
create_cdp("toto.fits")
create_cdp("toto.fits")
11:08:30 INFO Process toto.fits
11:08:30 INFO Model created
11:08:30 INFO CDP written to toto.fits

So I guess this comes from the DelegationHandler, but I don't know where it comes from exactly; I just know it appears when a jwst.datamodels class is used. (The handler in question seems to be created by the pipeline itself; its type is reported as stpipe.log.DelegationHandler.) Hope it helps.
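For reference, the per-logger handler dumps shown above can be produced with a short stdlib helper like the following (a minimal sketch; `dump_handlers` is a hypothetical name, not part of jwst or stpipe):

```python
import logging

def dump_handlers():
    """Print every known logger with the list of its handlers,
    in the same "logger name: [handlers]" format used above."""
    # The root logger is not listed in loggerDict, so print it first.
    print("root:", logging.getLogger().handlers)
    for name, logger in logging.Logger.manager.loggerDict.items():
        # loggerDict also contains PlaceHolder objects for intermediate
        # dotted names that were never requested directly; skip those.
        if isinstance(logger, logging.Logger):
            print(f"{name}: {logger.handlers}")

dump_handlers()
```

Calling this before and after an import (or after the first datamodels call) makes it easy to diff which package added which handler.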
Problem ultimately traced to an issue with logging in the Glue package.
Comment by Christophe Cossou on JIRA: I created a Complete Minimal Example to show the problem. Here it is, attached: [^CME_double_logging.py] Here is the output I get:

The important part here is that the first two lines are not duplicated (everything is OK before the first call to the CDP class).
Comment by Howard Bushouse on JIRA: This may have been fixed by JP-2258. Please retest.
Comment by Christophe Cossou on JIRA: I've tried with the latest master version, 1.3.4.dev27+gd6f7bf90, and I experience the same issue. The output is a bit different than before, but in short, I still see double logs:

16:22:48 INFO i = 0
16:22:48 INFO Process fake_cdp_00000.fits
16:22:48 INFO Model created
16:22:48 INFO Model created
2021-11-04 16:22:48,583 - stpipe - INFO - Model created
2021-11-04 16:22:48,627 - stpipe - INFO - CDP written to fake_cdp_00000.fits
2021-11-04 16:22:48,627 - stpipe - INFO - i = 1
2021-11-04 16:22:48,627 - stpipe - INFO - Process fake_cdp_00001.fits
2021-11-04 16:22:48,637 - stpipe - INFO - Model created
16:22:48 INFO CDP written to fake_cdp_00000.fits
16:22:48 INFO CDP written to fake_cdp_00000.fits
16:22:48 INFO i = 1
16:22:48 INFO i = 1
16:22:48 INFO Process fake_cdp_00001.fits
16:22:48 INFO Process fake_cdp_00001.fits
16:22:48 INFO Model created
16:22:48 INFO Model created
2021-11-04 16:22:48,670 - stpipe - INFO - CDP written to fake_cdp_00001.fits
2021-11-04 16:22:48,670 - stpipe - INFO - i = 2
2021-11-04 16:22:48,670 - stpipe - INFO - Process fake_cdp_00002.fits
16:22:48 INFO CDP written to fake_cdp_00001.fits
16:22:48 INFO CDP written to fake_cdp_00001.fits
16:22:48 INFO i = 2
16:22:48 INFO i = 2
16:22:48 INFO Process fake_cdp_00002.fits
16:22:48 INFO Process fake_cdp_00002.fits
2021-11-04 16:22:48,681 - stpipe - INFO - Model created
16:22:48 INFO Model created
16:22:48 INFO Model created
2021-11-04 16:22:48,713 - stpipe - INFO - CDP written to fake_cdp_00002.fits
16:22:48 INFO CDP written to fake_cdp_00002.fits
16:22:48 INFO CDP written to fake_cdp_00002.fits
Comment by Christophe Cossou on JIRA: JIRA was offline while I added a new comment on GitHub, so the daemon supposed to synchronise the comments did not copy it to JIRA. This comment is just a copy of what I wrote on GitHub above (expect a double comment on GitHub).
Comment by Jane Morrison on JIRA: Howard Bushouse Jonathan Eisenhamer Ori Fox Tyler Pauly: I am running JWST 1.4.3. When the pipeline is run from a Jupyter notebook AND 'import jdaviz' is executed, many more logging statements are written to the output, and they have a different format from the standard logging information (the line starts with INFO or DEBUG). If I do not 'import jdaviz', the logging information is the same as when running the pipeline from the command line, and these INFO and DEBUG lines are not printed. Any pointers to where I should look for what is being set when 'jdaviz' is imported?

Here is a sample of the extra type of logging information. Also, DEBUG logging information now pops up:

2022-02-15 15:42:45,462 - stpipe.Detector1Pipeline.dq_init - INFO - Using MASK reference file /Users/morrison/crds_cache/references/jwst/miri/jwst_miri_mask_0027.fits
Comment by Jane Morrison on JIRA: Tyler Pauly Jonathan Eisenhamer Pey-Lian Lim Ori Fox: Looking back at the comments in this ticket, I found a key bit of information. It is related to the handlers of the logging. This logging behaviour is a little beyond my knowledge, but here is what I have found.

And now when the pipeline runs, no extra logging is printed. So my question is: do we need to fix Jdaviz so that this handler is not created, or do we need to make sure it is not set in the JWST package, or maybe both?
Comment by Jonathan Eisenhamer on JIRA: Not to be too authoritarian, but the general philosophy of Python logging is that the top-level module controls what it needs to output or not. So, technically, if ... However, I know ... Relevant documentation for jwst is as follows:

I note there isn't anything about completely turning it off. I'm guessing using ...
Comment by Pey-Lian Lim on JIRA: There is really no good reason for specutils or Jdaviz to be using a logger. specutils is too low level and Jdaviz is too high level for such things. You might need one or all of these:

Please report back at spacetelescope/jdaviz#1084. Thanks!
Comment by Pey-Lian Lim on JIRA: Jane Morrison and I looked at the problem together today using all three patches above from specutils 1.7.0 and both unmerged PRs from jdaviz (installed together). Alas, they didn't fix the problem. It turns out there is also logging in glue. Jane was able to reproduce the exact same logging problem by simply adding "import glue" instead of "import jdaviz". Good news: we found the real problem, which is the glue logger. Bad news: glue isn't ours, and I don't see Tom Robitaille getting rid of it. My recommendation: the jwst pipeline should provide users a way to control the verbosity of its logger. Even the ability to set it to display only "critical" messages would greatly reduce the amount of unwanted logging in the offending notebook.
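Pending such a built-in option, the standard-library way to raise the threshold on the pipeline's logger hierarchy from user code would be something like the sketch below. (This is an assumption on my part: whether it fully silences the pipeline depends on how stpipe configures its own handlers at runtime.)

```python
import logging

# Raise the threshold on the pipeline's top-level logger so that
# only CRITICAL records propagate through its hierarchy.
logging.getLogger("stpipe").setLevel(logging.CRITICAL)

# INFO is now below the effective level, so this record is dropped.
logging.getLogger("stpipe").info("suppressed")
```

Because child loggers such as stpipe.Detector1Pipeline inherit the effective level from "stpipe" unless they set their own, one call covers the whole hierarchy.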
Comment by Pey-Lian Lim on JIRA: p.s. Doing this didn't change anything either:

from glue.logger import logger
logger.setLevel('ERROR')
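That outcome is consistent with how stdlib logging works: setLevel on the glue logger only filters records emitted *through* that logger, while the duplication comes from an extra handler attached to the root logger, which re-emits everyone else's records. A minimal sketch of that failure mode, using plain stdlib logging with no glue involved (the "rogue" handler stands in for the one glue installs):

```python
import logging

# My own logging setup: one handler on the root logger.
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(logging.StreamHandler())

# Simulate a library that wrongly attaches a second handler to root.
rogue = logging.StreamHandler()
root.addHandler(rogue)

log = logging.getLogger("myscript")
log.info("hello")  # emitted twice: once per root handler

# Lowering the level of some unrelated library logger changes nothing,
# because the duplication happens at the root handlers.
logging.getLogger("somelib").setLevel(logging.ERROR)
log.info("still doubled")  # still emitted twice

# The actual fix is to remove the rogue handler from root.
root.removeHandler(rogue)
log.info("emitted once")
```

This is why removing the handler (as shown earlier in the thread with r.removeHandler) works while setLevel on the library's own logger does not.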
Comment by Jane Morrison on JIRA: Here is a summary of what I have learned. Jonathan Eisenhamer Tyler Pauly Pey-Lian Lim Ori Fox: As Pey-Lian Lim reported, the real problem is the glue logger. It should be fixed so it does not think it is the top logger. But until that is done, I have two possible workarounds. If one of these workarounds is acceptable, then I will close this ticket.

Workaround 1: in the Jupyter notebook, check the loggers and remove the stream handler (the one created by glue). This is code I put in the Jupyter notebook, and it gets rid of the double logging:

from logging import getLogger, StreamHandler

Workaround 2: if that is too clumsy, you can direct what the jwst pipeline logs to a file. I followed the instructions at https://jwst-pipeline.readthedocs.io/en/stable/jwst/introduction.html?highlight=logging#logging-configuration. In the working directory I created this file, my-logging-config.cfg:

[*]

Then in the Jupyter notebook, run the pipeline like this:

If you do that, you do not get the double logging. You do not get the timestamped log:

INFO:stpipe.Detector1Pipeline:Detector1Pipeline instance created.
Comment by Pey-Lian Lim on JIRA: logcfg seems like a more elegant solution, but what is its behavior if the log file already exists? Does it overwrite, append, or throw an error?
Comment by Pey-Lian Lim on JIRA: FWIW, I opened glue-viz/glue#2278. Feel free to add details there. Thanks!
Comment by Jane Morrison on JIRA: run_output = detector1.call(uncal_file,logcfg='my-logging-config.cfg')
|
Comment by Pey-Lian Lim on JIRA: Silent overwrite should be fine as far as Jdaviz is concerned. Ori Fox is out on vacation, in case you are waiting to hear back from him.
Comment by Jonathan Eisenhamer on JIRA: If the log file is not desired, is it possible to set it to /dev/null?
Comment by Jane Morrison on JIRA: Not sure how to set the log file to /dev/null.
Comment by Jonathan Eisenhamer on JIRA: That was just a guess as to whether that would work or not. It may be worth creating a JP issue to implement the ability to "turn off".
Comment by Pey-Lian Lim on JIRA: "os.devnull"? https://stackoverflow.com/questions/2929899/cross-platform-dev-null-in-python
Comment by Jane Morrison on JIRA: I tried run_output = detector1.call(uncal_file, logcfg=os.devnull) and it made no difference; the logging information was still printed to the notebook. I will create a JP ticket to implement the ability to "turn off" logging.
Comment by Jane Morrison on JIRA: Created JP-2529: Turn off logging for the JWST pipeline by redirecting logging to /dev/null.
Comment by Pey-Lian Lim on JIRA: Tom also proposed a fix in Glue: glue-viz/glue#2281
Issue JP-2280 was created on JIRA by Christophe Cossou:
[Description reworked for clarity]
Hello,
I've found a "bug" related to logging when I try to create a CDP using datamodels. The problem is a conflict between the logging configuration the pipeline (or a subpackage) tries to enforce and what I want to do.

Here are the two main takeaways about this issue:

Please find attached a Complete Minimal Example to showcase the issue (https://jira.stsci.edu/secure/attachment/66957/CME_double_logging.py).