Refactoring to prepare for gevent #9

Open · wants to merge 6 commits into master
Conversation

hidde-jan (Collaborator)

This is a WIP branch where I'm refactoring a bit. We currently perform all HTTP requests sequentially, which takes a lot of time per submission and limits our ability to add more subreddits to monitor. By using gevent (or some other solution) we can perform the requests (mostly) in parallel, or at least make them non-blocking. This hopefully speeds up the bot a huge deal.

@hidde-jan hidde-jan self-assigned this Jun 9, 2017
@hidde-jan hidde-jan changed the title WIP Refactoring to prepare for gevent Refactoring to prepare for gevent Jun 18, 2017
hidde-jan (Collaborator, Author) commented Jun 18, 2017

@justcool393 ok, so I think I got everything worked out. I'm performing all archiving actions in parallel (it's wicked fast 🐎💨), but applying rate limiting to the creation of archives for links pointing to reddit (I know this sounds weird).

I also noticed we got banned from 4 subs, some of which are quite active, so I decided to automatically unsubscribe from subs we have been banned from.

If you're ok with this, I'm going to test this new setup on the server.
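The "parallel everywhere, but throttled for reddit-bound links" approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: it uses stdlib threads in place of gevent greenlets, and `RateLimiter`, `archive`, and the example URLs are all hypothetical names invented for the sketch.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class RateLimiter:
    """Allow at most one call per `interval` seconds across all workers (hypothetical helper)."""
    def __init__(self, interval):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next slot under a lock, then sleep outside the lock
        # so other workers can queue up their own slots meanwhile.
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            self._next_allowed = max(now, self._next_allowed) + self.interval
        time.sleep(delay)

# The PR uses a 2-second window; shortened here so the sketch runs quickly.
reddit_limiter = RateLimiter(interval=0.1)

def archive(url):
    # Only archive requests for links pointing back to reddit are throttled.
    if "reddit.com" in url:
        reddit_limiter.wait()
    # ... a real implementation would POST `url` to the archiving service here ...
    return url

urls = [
    "https://example.com/a",
    "https://www.reddit.com/r/test/1",
    "https://example.com/b",
    "https://www.reddit.com/r/test/2",
]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(archive, urls))  # non-reddit links run fully in parallel
```

With gevent, the thread pool would be replaced by a greenlet pool, but the shape of the rate limiter stays the same.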

justcool393 (Owner) commented Jun 20, 2017

@hidde-jan I don't know how I missed this message. This is excellent.

Small thing: I believe Archive.is automagically rate-limits its own requests to reddit (ceddit is based off a quirk in how reddit works, so those requests come from the users who are browsing the archive), so it may only be necessary for sites that don't rate-limit themselves. I do think this is a great idea. The reason I had it at five initially was because reddit was weird about requests when they came from archive.org.

Further, I sent a message to the admins about maybe getting one of the services (that isn't currently in place) un-spamfiltered for more redundancy. Also, I'll ask again about archive.org when I get a response.

justcool393 (Owner) left a comment


Looks very good. The only things I saw were minor (mentioned in other comments below), and they're more suggestions than issues.

hidde-jan (Collaborator, Author)

I'm still not completely satisfied. It currently only handles one submission at a time. I want to set up some queue-based system that can handle multiple things at once. The xkcd transcriber bot has something like this. I've already started looking at it.
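The queue-based system mentioned here could look something like the following sketch: a shared work queue with a small pool of workers pulling submissions off it. This is an assumption about the intended design, not the bot's actual code; the worker count, the fake submission ids, and the `None` sentinel shutdown convention are all illustrative choices.

```python
import queue
import threading

submissions = queue.Queue()
processed = []

def worker():
    """Pull submissions off the shared queue until a None sentinel arrives."""
    while True:
        sub = submissions.get()
        if sub is None:
            submissions.task_done()
            break
        processed.append(sub)  # a real worker would archive the submission here
        submissions.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

# Feed the queue; multiple submissions are now in flight at once.
for sub in ["t3_aaa", "t3_bbb", "t3_ccc", "t3_ddd"]:  # fake submission ids
    submissions.put(sub)
submissions.join()  # block until every queued submission is handled

for _ in workers:   # one sentinel per worker to shut the pool down
    submissions.put(None)
for w in workers:
    w.join()
```

The same producer/consumer shape carries over to gevent by swapping threads for greenlets and `queue.Queue` for gevent's queue.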

For non-reddit links, we create all archives in parallel. For reddit
links, we rate limit other sites by only creating archives once every
two seconds.
hidde-jan (Collaborator, Author)

I'm gonna test this in our production setting this week :)

Standard PRAW rate limiting doesn’t work with gevent
hidde-jan (Collaborator, Author) commented Mar 19, 2018

Ok, I think I know why I abandoned this last year. PRAW is not thread safe. Its rate-limit function depends on time.sleep, and even monkey patching that out means that there is no actual way to get this working. I'm still pretty happy with the refactorings in this PR, so I'm going to port them to the master branch and close this PR.
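The failure mode described above can be demonstrated without PRAW or gevent. The sketch below uses a deliberately naive sleep-based limiter (a hypothetical stand-in, not PRAW's actual code) and plain threads: because every concurrent caller reads the "last request" timestamp before anyone updates it, all of them sleep through the same single window and then proceed together, so the rate limit never serializes the requests. Under gevent's monkey patching, `time.sleep` becomes a cooperative yield and the same race appears between greenlets.

```python
import threading
import time

class NaiveLimiter:
    """Sleep-based limiter with no locking (illustrative only, not PRAW's code)."""
    def __init__(self, interval):
        self.interval = interval
        self.last = time.monotonic()

    def wait(self):
        # Every concurrent caller reads `last` before any other caller
        # updates it, so they all compute the same delay and pass together.
        remaining = self.interval - (time.monotonic() - self.last)
        if remaining > 0:
            time.sleep(remaining)
        self.last = time.monotonic()

limiter = NaiveLimiter(interval=0.2)
timestamps = []

def request():
    limiter.wait()
    timestamps.append(time.monotonic())  # list.append is atomic under the GIL

threads = [threading.Thread(target=request) for _ in range(4)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
# A correctly serialized limiter would need roughly 0.8s for four calls at
# 0.2s apart; here all four finish after a single shared ~0.2s sleep.
```

Fixing this requires holding a lock while reserving the next slot, which is exactly the coordination that a plain `time.sleep`-based limiter lacks.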
