Cover arbitrary filtering in the Scrapy logging documentation #4216

Closed
Gallaecio opened this issue Dec 5, 2019 · 12 comments · Fixed by #4965

Comments

@Gallaecio
Member

It should be clear, from reading the documentation, how to filter out a specific log message that we wish to ignore.

This is especially important for warnings that depend on input, like the one introduced in #4214. Because such warnings are caused by the content or behavior of the website you are scraping, you can seldom fix what triggers them, so you may simply need to ignore them.

Exposing a setting or a LogFormatter method for each of those warnings does not seem scalable to me, especially when such warnings can come from third-party Scrapy extensions.

@akamanzi

@Gallaecio, I would like to try contributing to this issue. This is my first contribution; would this be a good fit for me? If so, are there any pointers on where to start? I have read the Scrapy contributing documentation. Any other pointers are welcome.

Thank you

@Gallaecio
Member Author

For this specific issue, I would recommend that you:

  1. Have a look at the Scrapy logging documentation
  2. Find out how to filter out messages based on the message contents (not just the log level). We probably want to cover how to filter out based on a substring or a regular expression. There are probably many resources out there to learn how to do this in Python; the official documentation is quite complete here, although I’m not sure if it’s the most straightforward documentation if you are not already familiar to some extent with Python logging
  3. Extend https://docs.scrapy.org/en/latest/topics/logging.html#advanced-customization to cover additional details or examples
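A minimal sketch of what step 2 could look like, using only the standard `logging` module. The class name `SubstringFilter`, the logger name `example.spider`, and the message text are hypothetical, chosen for illustration; they are not taken from Scrapy or from the pull requests discussed here.

```python
import logging


class SubstringFilter(logging.Filter):
    """Drop log records whose formatted message contains a given substring."""

    def __init__(self, substring):
        super().__init__()
        self.substring = substring

    def filter(self, record):
        # Returning False discards the record; returning True lets it through.
        return self.substring not in record.getMessage()


# "example.spider" is a hypothetical logger name; use the name of the logger
# that actually emits the message you want to silence.
logger = logging.getLogger("example.spider")
logger.addFilter(SubstringFilter("ignoring invalid cookie"))
```

Note that a filter attached to a logger only applies to records logged directly through that logger, not to records coming from its descendant loggers. To cover several loggers at once, the same filter can be attached to a handler instead.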

@akamanzi

@Gallaecio, Thank you for getting back to me.
Let me look into how to filter based on the log message.
Do I need to cover both regular expressions and message content, or may I cover just one of the two?

@Gallaecio
Member Author

Do I need to cover both regular expressions and message content, or may I cover just one of the two?

I think substrings should be fine; it should be trivial for users to go from that to regular expressions if needed. Alternatively, you could mention that something other than substrings may be used, and mention regular expressions, linking to https://docs.python.org/3/library/re.html
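For illustration, a regular-expression variant might look like the sketch below. It relies only on the standard `logging` and `re` modules; the class name `RegexFilter`, the logger name `example.spider`, and the pattern are assumptions made up for this example.

```python
import logging
import re


class RegexFilter(logging.Filter):
    """Drop log records whose formatted message matches a regular expression."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)

    def filter(self, record):
        # search() scans the whole message; a match means "discard the record".
        return self.pattern.search(record.getMessage()) is None


# Hypothetical logger name and pattern, for illustration only.
logging.getLogger("example.spider").addFilter(
    RegexFilter(r"\d{3} [Ee]rror, retrying")
)
```

Going from a substring check to a pattern only changes the test inside `filter()`; the way the filter is attached stays the same.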

@akamanzi

@Gallaecio, I created a pull request (#4257) for this. Could you review it and get back to me with your feedback?

Thank you

@gigatesseract

@Gallaecio
I see that the issue is still open. Are there any additional features to work on in this issue? I am going through the links in this thread.

@Gallaecio
Member Author

There are no additional things, although @akamanzi may be out of time to complete his proposal. If so, you could see if you can address the issue yourself, maybe build on top of his work so far.

@yash-sethia

Is this issue still open? If so, I would like to contribute to it. I am starting my journey as an open source contributor; I hope that's fine.

@akamanzi

akamanzi commented Aug 5, 2020

@yash-sethia, I haven't looked at this issue for a while; I am currently busy with my school dissertation. You can give it a try. I suggest looking at the recommendations @Gallaecio made in the pull request I initially created (#4257), reviewing them, and building on top of that.

@bikash1317

Is this still open? Can I take it up?

@Gallaecio
Member Author

@akamanzi started at #4257, but may be too busy to continue at the moment. Maybe you can resume that work?

@anay2103
Contributor

anay2103 commented Jan 28, 2021

@Gallaecio, could you please have a look at #4965?
I tried to follow the recommendations you gave in #4257.

Thank you.
