bpo-21475: Support the Sitemap extension in robotparser #6883

Merged: 13 commits, May 16, 2018

mcscope
Contributor

@mcscope mcscope commented May 15, 2018

This ticket has been open for 3 years just because it was awaiting tests. I took the existing patch and added a test.

https://bugs.python.org/issue21475

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

When your account is ready, please add a comment in this pull request
and a Python core developer will remove the CLA not signed label
to make the bot check again.

Thanks again for your contribution and we look forward to looking at it!

@mcscope
Contributor Author

mcscope commented May 15, 2018

I signed the CLA

Member

@Mariatta Mariatta left a comment

Thanks for the PR. I have several comments (quite nitpicky) but the rest looks good.
In addition, I would suggest adding both your name and Peter's to the Misc/ACKS file.


.. versionadded:: 3.8


Member

Sorry for being so picky, but we only need two extra spaces between .. versionadded and The following example... So please remove the extra lines.

@@ -0,0 +1,2 @@
Added support for optional Site Map extension to urllib robotparser. Patch
by Lady Red
Member

Please end the sentence with a period. In addition, since it was based on another person's patch, it would be good to also mention "Based on patch by Peter Wirtz."

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@Mariatta
Member

If I don't end up merging this, I'd suggest that the core dev who merges it remember to add "Co-authored-by: Peter Wirtz" to the commit message, since this appears to be based on Peter's patch.

@mcscope
Contributor Author

mcscope commented May 16, 2018

I have made the requested changes; please review again.

@bedevere-bot

Thanks for making the requested changes!

@Mariatta: please review the changes made to this pull request.

@mcscope
Contributor Author

mcscope commented May 16, 2018

Yes, please give credit to Peter in the commit message. P.S. This is my first contribution to CPython! \o/

"""
good = ['/', '/test.html']
bad = ['/cyberworld/map/index.html']
site_maps = ["http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml",
Member

Style nit: Please use single quotes.

@@ -292,7 +313,7 @@ def setUp(self):
# Short poll interval to make the test finish quickly.
# Time between requests is short enough that we won't wake
# up spuriously too many times.
-kwargs={'poll_interval':0.01})
+kwargs={'poll_interval': 0.01})
Member

Please don't make unrelated cosmetic changes.

Contributor Author

oh drat, it's my pep8 autoformatter doing that automatically. will remove

@@ -353,5 +374,6 @@ def test_read_404(self):
self.assertIsNone(parser.crawl_delay('*'))
self.assertIsNone(parser.request_rate('*'))

-if __name__=='__main__':
+
+if __name__ == '__main__':
Member

Please don't make unrelated cosmetic changes.

@@ -189,6 +196,11 @@ def request_rate(self, useragent):
return entry.req_rate
return self.default_entry.req_rate

def site_maps(self):
if not self.sitemaps:
Member

Could you also add a test for this branch?

Contributor Author

I believe this branch is already covered by test_site_maps, which runs in all the other robotparser test classes: each of them asserts that the result is None, except for my single class that tests the positive case.

Member

Oops, you're correct. I didn't click the expand button and didn't notice that the test_site_maps method is part of BaseRobotTest.
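
For readers following the exchange above, here is a minimal sketch of how a shared test_site_maps method on a base class can cover both branches of site_maps(). It is a simplified, hypothetical stand-in and not the actual code from CPython's test suite: the class bodies, the robots.txt contents, the example.com URLs, and the run_suite helper are all assumptions made for illustration. It relies only on RobotFileParser.parse() and RobotFileParser.site_maps(), so it needs Python 3.8 or later.

```python
import io
import unittest
import urllib.robotparser


class BaseRobotTest:
    """Simplified, hypothetical mixin: every subclass inherits test_site_maps."""
    robots_txt = ''
    site_maps = None  # default expectation: no Sitemap lines, so None

    def setUp(self):
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(self.robots_txt.splitlines())

    def test_site_maps(self):
        # Runs once per subclass, so the "no sitemaps" branch is exercised
        # by every class that leaves site_maps at its default of None.
        self.assertEqual(self.parser.site_maps(), self.site_maps)


class NoSitemapTest(BaseRobotTest, unittest.TestCase):
    # Negative case: site_maps() should return None.
    robots_txt = """\
User-agent: *
Disallow: /cyberworld/map/
"""


class SitemapTest(BaseRobotTest, unittest.TestCase):
    # Positive case: overrides the expected value (illustrative URLs).
    robots_txt = """\
User-agent: *
Sitemap: http://example.com/sitemap-a.xml
Sitemap: http://example.com/sitemap-b.xml
Disallow: /cyberworld/map/
"""
    site_maps = ['http://example.com/sitemap-a.xml',
                 'http://example.com/sitemap-b.xml']


def run_suite():
    """Run both test classes quietly and return the unittest result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (NoSitemapTest, SitemapTest):
        suite.addTests(loader.loadTestsFromTestCase(case))
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Because the mixin does not itself derive from unittest.TestCase, the test runner collects only the concrete subclasses, and each one inherits the shared check with its own expected value.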

@@ -0,0 +1,2 @@
Added support for optional Site Map extension to urllib robotparser. Patch by
Member

It would be nice to add some links to the new site_maps (:meth:`RobotFileParser.site_maps() <urllib.robotparser.RobotFileParser.site_maps>` -- untested, you'll need to try it locally :)) method or to the urllib.robotparser (:mod:`urllib.robotparser`) module.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@mcscope
Contributor Author

mcscope commented May 16, 2018

@berkerpeksag I added the link to the news entry as you suggested, but I can't find any documentation in the dev guide that explains which build step I need to run to render the news and check that link. Mind pointing me to the right place?

@mcscope
Contributor Author

mcscope commented May 16, 2018

I have made the requested changes; please review again.

@bedevere-bot

Thanks for making the requested changes!

@Mariatta, @berkerpeksag: please review the changes made to this pull request.

@mcscope
Contributor Author

mcscope commented May 16, 2018

Oh, I think I figured out how to build the news: I just have to build the documentation, right?

@mcscope
Contributor Author

mcscope commented May 16, 2018

Successfully tested! The news link works.

@berkerpeksag
Member

@mcscope I assume you've found https://devguide.python.org/committing/#what-s-new-and-news-entries but I will share it anyway in case someone else wonders how to build the docs :)

Member

@berkerpeksag berkerpeksag left a comment

LGTM, thank you for helping to finish bpo-21475. This was on my TODO list along with other urllib.robotparser issues, but I couldn't find the time to work on them.

@ned-deily
Member

Congratulations on your first cpython PR, @mcscope!

@mcscope
Contributor Author

mcscope commented May 17, 2018

@berkerpeksag Do you have any other TODO-list items I could take care of? I'm looking at bpo but having a hard time finding issues that require code rather than developer consensus.
