Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: How to handle / customize log records emitted by pypdf #2020

Open
stefan6419846 opened this issue Jul 26, 2023 · 8 comments
Open

DOC: How to handle / customize log records emitted by pypdf #2020

stefan6419846 opened this issue Jul 26, 2023 · 8 comments
Labels
Easy This issue is a good starting point for first-time contributors help wanted We appreciate help everywhere - this one might be an easy start! nf-documentation Non-functional change: Documentation

Comments

@stefan6419846
Copy link
Collaborator

Explanation

According to

pypdf/pypdf/_utils.py

Lines 433 to 449 in 11ee648

def logger_warning(msg: str, src: str) -> None:
"""
Use this instead of logger.warning directly.
That allows people to overwrite it more easily.
## Exception, warnings.warn, logger_warning
- Exceptions should be used if the user should write code that deals with
an error case, e.g. the PDF being completely broken.
- warnings.warn should be used if the user needs to fix their code, e.g.
DeprecationWarnings
- logger_warning should be used if the user needs to know that an issue was
handled by pypdf, e.g. a non-compliant PDF being read in a way that
pypdf could apply a robustness fix to still read it. This applies mainly
to strict=False mode.
"""
logging.getLogger(src).warning(msg)

logger_warning is advertised to allow overwriting, but doing so proves to be more complicated than expected.

There basically are two reasons for this:

  1. The implementation is located at pypdf._utils.logger_warning, which usually marks an internal implementation and thus I would not assume this to be future-proof.
  2. The method usually is imported directly, as can be seen in
    logger_warning,
    for example. This requires patching each import separately.
@MartinThoma
Copy link
Member

The overwriting was meant for testing, so rather for contributors to pypdf than pure users, if I remember correctly.

From a users perspective, this should be good:

import logging
from pypdf import PdfReader


logger = logging.getLogger("pypdf")
logger.setLevel(logging.ERROR)  # <--- if you comment this line out, you will see a warning
    
pdf_path = "sample-files/017-unreadable-meta-data/unreadablemetadata.pdf"
reader = PdfReader(pdf_path)

The warning you will see:

Invalid parent xref., rebuild xref

Does that solve the issue?

@stefan6419846
Copy link
Collaborator Author

The overwriting was meant for testing, so rather for contributors to pypdf than pure users, if I remember correctly.

Thanks for the explanations. For me it sounds like this indeed might be a user-callable configuration.

From a users perspective, this should be good:

While this might be a solution, logger_warning has the advantage that I can more easily change the actual log level. In this specific use case, I do not consider the warnings I have seen from PyPDF as actual warnings to be analyzed, so I am currently using logger_warning to reduce the log level to info. With this, I do not skip any debugging information on my side, but reduce the noise in monitoring where everything below the warning level gets ignored.

@MartinThoma
Copy link
Member

MartinThoma commented Jul 29, 2023

I don't understand the issue here. What are you doing that the following doesn't do?

import logging
logger = logging.getLogger("pypdf")
logger.setLevel(logging.WARNING)

@stefan6419846
Copy link
Collaborator Author

I am doing the following for the different logger_warning occurrences:

import logging
from contextlib import contextmanager

from pypdf import _reader, PdfReader


def custom_logger_warning(msg, src):
    logging.getLogger(src).info(msg)


@contextmanager
def reduce_log_level():
    old_logger_warning = _reader.logger_warning
    _reader.logger_warning = custom_logger_warning
    yield
    _reader.logger_warning = old_logger_warning


with reduce_log_level():
    reader = PdfReader('file.pdf')

This basically allows me to temporarily reduce the log level for the PyPDF warnings. Why? Logging is enabled including the info level, while monitoring filters everything to start at the warning level. Info messages can be analyzed in the original logfiles if required for some reasons, while keeping the focus on the important stuff in monitoring.

For some use cases (background workers which receive lots of broken/strange files) the warning level is too high as the PDF can apparently be processed in some way. Real exceptions are more urgent in these cases, as they might indicate an actual bug.

@MartinThoma
Copy link
Member

MartinThoma commented Jul 31, 2023

I'm sorry, I still don't understand. Your code does not reduce the log-level threshold of the system, but reduce the log level being used.

Couldn't you just do the following to reduce the log level threshold?

import logging
from contextlib import contextmanager

from pypdf import PdfReader


def custom_logger_warning(msg, src):
    logging.getLogger(src).info(msg)


@contextmanager
def reduce_log_level():
    logger = logging.getLogger("pypdf")
    old_level = logger.level
    logger.setLevel(logging.INFO)
    yield
    logger.setLevel(old_level)


with reduce_log_level():
    reader = PdfReader('file.pdf')

Logging is enabled including the info level, while monitoring filters everything to start at the warning level. Info messages can be analyzed in the original logfiles if required for some reasons, while keeping the focus on the important stuff in monitoring.

That part makes me think that you want to manipulate the log records themselves, e.g:

import logging
from pypdf import PdfReader

class NewLogger(logging.Logger):
    def makeRecord(self, *args, **kwargs):
        if len(args) >= 2:
            args = list(args)
            args[1] = logging.INFO
        if 'level' in kwargs:
            kwargs['level'] = logging.INFO
        return super(NewLogger, self).makeRecord(*args, **kwargs)

logging.basicConfig()
logging.setLoggerClass(NewLogger)

pdf_logger = logging.getLogger('pypdf')

# Now, all log messages from pypdf will be converted to the "INFO" level.

# Test
pdf_logger.info("This will be captured in the log.")
pdf_logger.warning("This will also be captured in the log.")
pdf_logger.error("This will also be captured in the log.")


# Especially
pdf_path = "sample-files/017-unreadable-meta-data/unreadablemetadata.pdf"
reader = PdfReader(pdf_path)

you can apply those two patterns at the same time, of course, to make a decorator.

@stefan6419846
Copy link
Collaborator Author

Your code does not reduce the log-level threshold of the system, but reduce the log level being used.

This is indeed correct and desired. As logger_warning promotes such a customization behavior inside its docstring, I assumed that this would indeed be a supported feature out-of-the-box. If I understand it correctly, the docs should be adjusted instead to make it more clear who is intended to monkey-patch logger_warning.

That part makes me think that you want to manipulate the log records themselves

This is true and the proposed plain logging.Logger solution would be a general-purpose alternative, yes.

@MartinThoma
Copy link
Member

If I understand it correctly, the docs should be adjusted instead to make it more clear who is intended to monkey-patch logger_warning

Yes, that would be good :-) And I think we should add more examples (like the ones I gave) to our docs. We have a lot of first-time developers who struggle with this and think it indicates a bug.

@MartinThoma MartinThoma added nf-documentation Non-functional change: Documentation help wanted We appreciate help everywhere - this one might be an easy start! Easy This issue is a good starting point for first-time contributors labels Aug 3, 2023
@MartinThoma MartinThoma changed the title logger_warning is documented to allow overwriting, but actually doing this is hard DOC: How to handle / customize log records emitted by pypdf Aug 3, 2023
@stefan6419846
Copy link
Collaborator Author

@2HR3Y It does not really sound like you have read the previous discussions and maybe just threw this into some ML model. (1) does not work as logger_warning is not part of the class, (2) will not work as due to using from pypdf._utils import logger_warning basically requires monkey-patching all final import paths.

The actual underlying issue is that the docs tend to be unclear and should be adjusted to better indicate these things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Easy This issue is a good starting point for first-time contributors help wanted We appreciate help everywhere - this one might be an easy start! nf-documentation Non-functional change: Documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants