XMPP pubsub #121
Conversation
```diff
@@ -62,7 +62,8 @@
 with open('db/initial/tvshows.json', encoding='utf-8', errors='ignore') as f:
     data = json.load(f)
     try:
-        engine.execute(TvShow.__table__.insert(), data)
+        for row in data:
```
I made this tweak because the single transaction was taking too much time (and perhaps memory) to complete.
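For context, a minimal sketch of the pattern in question - splitting one huge insert into smaller transactions. The table definition, chunk size, and engine URL below are illustrative assumptions, not pynab's actual code:

```python
import json
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

# Stand-ins for pynab's engine and TvShow model (assumed, simplified).
engine = create_engine('sqlite://')
metadata = MetaData()
tvshows = Table('tvshows', metadata,
                Column('id', Integer, primary_key=True),
                Column('name', String))
metadata.create_all(engine)

with open('db/initial/tvshows.json', encoding='utf-8', errors='ignore') as f:
    data = json.load(f)

# One giant executemany is slow and memory-hungry; one INSERT per row means
# thousands of round-trips. Chunking keeps each transaction small.
CHUNK = 1000
for start in range(0, len(data), CHUNK):
    engine.execute(tvshows.insert(), data[start:start + CHUNK])
```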
Can you give me a use case for this so I can test it? Are you using this to publish new releases to an xmpp channel, an app for notifications, or what? The only modifications I'll probably make are to expand how it interacts with release processing (just to clean it up a bit) and the process spawning (just to use the existing concurrent.futures stuff). I'll need to work it into the new branch, which changes a lot of startup stuff, too (see branch anustart).
The idea is to reduce the time from release to download. I have a 'client' (https://github.com/bettse/xmppnzb) that runs as a local xmpp client and subscribes to a list of pubsub nodes. When any new pubsub event arrives, the client checks the release name against the list of regexes it got from SABnzbd (the same list used for rss feeds). If the item matches any regex, the URL is sent to SABnzbd to download. I had the system running quite well for a number of months, but then made the mistake of attempting to switch to CoreOS, dockerize pynab, and switch to its postgres version all at the same time. Though I did have it working off and on, it wasn't stable. If you'd like to see it working before merging it in, I fully understand and would like to help, but it may be some time. I still have a box with an xmpp server running, and I just need some time to get things up on it again.
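To make that flow concrete, here's a minimal sketch of the client-side dispatch described above. The callback, regex list, and endpoint details are illustrative assumptions, not code from xmppnzb (though SABnzbd's api does accept mode=addurl):

```python
import re
import requests

# Illustrative: in xmppnzb these come from SABnzbd's rss feed configuration.
WATCH_REGEXES = [re.compile(r'Some\.Show\.S\d+E\d+', re.I)]
SABNZBD_API = 'http://localhost:8080/sabnzbd/api'  # assumed local instance
SABNZBD_KEY = 'changeme'

def on_pubsub_event(release_name, nzb_url):
    """Called for each incoming pubsub item; if the release name matches any
    watched regex, hand the NZB URL to SABnzbd to download."""
    if any(rx.search(release_name) for rx in WATCH_REGEXES):
        requests.get(SABNZBD_API, params={
            'mode': 'addurl',   # SABnzbd's add-by-URL api mode
            'name': nzb_url,
            'apikey': SABNZBD_KEY,
        })
```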
Just checking - you know software like sickbeard and sonarr exist, right? :) It's no problem merging it in, I just need a bit more info so I can merge it properly - to do that, I need to be able to test it. Now that I know what you're doing, I can do it :)
I've heard of sickbeard, but not sonarr. My understanding is that it's aimed at filling in episodes of a show, as well as watching rss for new ones. If anything, I'd love it if they would integrate the client part of xmppnzb, so new shows would start downloading faster. The heart of my idea is to replace polling-based rss with push-based xmpp. Yesterday I got pynab back up and running on a host; if you can contact me at my email (bettse@fastmail.fm) with a preferred username and password, I can set up an account to demonstrate how it all works.
Ahh, I see what you mean. Currently, they don't grab the download until the rss feed is polled again - 5 minutes to an hour, or so. That's quite good. I'll email you.
Ok, here's how it works: to get the xmpp bot to work nicely with the new init scripts and cli, it needed to be functionally disconnected from pynab itself. What it does is open a tiny wsgi server that takes json events and passes them to the xmpp bot. This also means there's support for any other service that wants to accept new release data as json and do stuff with it, and you can have as many of those hosts as you want - they're handled asynchronously, so they won't interfere with release processing. To test this (I'm not merging it to the primary branch yet), grab the branch and start the process.
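As an illustration of that shape - not the actual pynab code - a tiny wsgi endpoint that accepts json release events and queues them for an xmpp publisher might look like this (port and field names assumed):

```python
import json
import queue
import threading
from wsgiref.simple_server import make_server

# Releases POSTed by pynab land here; the xmpp bot drains the queue.
release_queue = queue.Queue()

def app(environ, start_response):
    """Accept a json release event and queue it for publishing."""
    length = int(environ.get('CONTENT_LENGTH') or 0)
    release_queue.put(json.loads(environ['wsgi.input'].read(length)))
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']

def publisher():
    """Stand-in for the xmpp bot: publish each queued release to pubsub."""
    while True:
        release = release_queue.get()
        print('would publish to pubsub:', release.get('name'))

threading.Thread(target=publisher, daemon=True).start()
make_server('localhost', 8999, app).serve_forever()
```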
I got the branch and started things up. It may take a bit to validate, but I do see the xmpp bot start message. I used to just watch stdout for logs, but it looks like things are backgrounded now. I changed to specifying a log file and saw the various logs created, although the 'scan' log is looking a little... confused:
Which logfile is that? pynab.scan.log?
It happens on all of them, but my example was from a livelier one, pynab_scan.log.
Here is the snippet of my config about logging:

```python
log = {
    # logging settings
    # ----------------
    # logging_file: a filepath or None to go to stdout
    'logging_file': 'logs/pynab.log',

    # logging.x where DEBUG, INFO, WARNING, ERROR, etc
    # generally, debug if something goes wrong, info for normal usage
    'logging_level': logging.INFO,

    # max_log_size: maximum size of logfiles before they get rotated
    # number, in bytes (this is 50mb)
    'max_log_size': 50*1024*1024,
}
```
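For reference, settings like these are typically wired up with the standard library's rotating handler; this is a sketch under that assumption, not pynab's actual logging module (the function name and backup count are made up):

```python
import logging
from logging.handlers import RotatingFileHandler

def init_logging(cfg):
    # cfg mirrors the dict above: file (or None for stdout), level, max size.
    logger = logging.getLogger('pynab')
    logger.setLevel(cfg['logging_level'])
    if cfg['logging_file']:
        # Rotate once the file hits max_log_size; keep a few old copies.
        handler = RotatingFileHandler(cfg['logging_file'],
                                      maxBytes=cfg['max_log_size'],
                                      backupCount=5)
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    return logger
```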
I'll take a look at this when I get home, but I suspect each greenlet is loading the log module and that's causing the spam.
Turns out this is probably user-ish error. I think the scan process is dying repeatedly (I get an exception when trying to run it un-daemonized), and each run was adding that line. I'm still working on the exception, and I'll let you know how it goes when I've got things running.
What's the exception?
Yeah, it's not the greenlets doing it. That line gets output to the log each time the logging is loaded - once for each daemon, effectively. The greenlets don't access the logger, so they're not triggering it - but if something's preventing the pubsub stuff from running, it'll spam it out. That said, those messages should only appear in the log associated with the process importing them, so if it's happening in every logfile then every daemon is crashing.
There are some github issues and the like about gevent not supporting python 3, but since that's a prereq of pynab, I'm leaning towards thinking it has to do with my using pyenv, since the host system only has python 3.2.3. When I was demonstrating pynab running, it was using pyenv with python 3.3.5, but that was also on the postgres branch. The only log file with a lot of those "started pynab logger" lines was _scan; the rest would have at least one, but not more. When I was watching ps, it was scan that seemed to fail to stay up.
Yeah, that's a 2.x->3.x error. Let me check the package in use and I'll force it to use the 3.0 version. No exceptions in the scan log?
None when it was daemonized; perhaps stderr was going elsewhere? It was when I tried to run it manually (
Whoops, just realised that was grequests and not eventlets throwing that error. Grequests depends on gevent and gevent doesn't work on python 3.x. I'll use another package.
grequests doesn't work on 3.x
Try that.
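The thread doesn't name the replacement package, but for illustration, the same fire-and-forget POST can be done on python 3 with requests plus the stdlib's concurrent.futures (which is already in use elsewhere in pynab, per the comments above):

```python
import concurrent.futures
import requests

# Small pool so posting release json to notify hosts never blocks
# release processing.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def post_release(url, release):
    """Fire-and-forget POST of a release dict to one notify host."""
    _pool.submit(requests.post, url, json=release, timeout=10)
```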
I just quickly did a reinstall of requirements using pip and tried it, and this is what I got:

I haven't dug in, so this may well be something wrong on my end.
(I also switched from python 3.3.5 in pyenv to 3.4.2; a misguided attempt at fixing the earlier exception.)
I deleted line 5, the import of pymongo in pynab/imdb.py, and scan is now running :)
Yeah, I think this'll be a problem with your python environment rather than anything from here. I'm not seeing the same errors. Try

Actually, just run
Yup! Now I don't need to modify imdb.py. I'll start testing pubsub and let you know how it goes.
Looks like I need to not daemonize if I'm going to get good results. I noticed a repeat of that starting log message and reran scan.py without daemonization, and it died on the first release it processed:
It's just for debugging mostly, and it's primarily because I forgot an important piece of code :stare:

Obviously once there are no exceptions and the script doesn't crash, daemonising won't be a problem. Keep running it by console for the moment. Also, try now.
Updated and restarted; so far so good.
The modification sends the entire release set over json as well, so (if desired) you have a lot more data to play with - group, category, poster, size, etc.
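As an illustration of what that payload might contain - the field names here are guesses based on the fields mentioned, not the exact pynab schema:

```python
# A hypothetical release event as a notify host might receive it.
release = {
    'name': 'Some.Show.S01E01.720p.HDTV.x264-GROUP',
    'group': 'alt.binaries.teevee',
    'category': 'TV > HD',
    'poster': 'poster@example.com',
    'size': 1234567890,  # bytes
}
```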
Still getting a variation of the deletion exception:
Oops. Try now.
I woke up to this little guy:
Hopefully that should do it :v
Scan ran stably during the day, and this evening I realized the daemonized pubsub script I'd started the day before meant I wasn't seeing any potential exceptions. I was seeing log output saying that POSTs were being made. I re-ran and saw an exception in the parsing of the json it was receiving. I added a log message and just got
Hmm, all the release information seems to be hitting the JSONPub handler properly and being put into the queue. I can't really test it well past that point.
Ah! You're correct, I was misinterpreting the log. I'll continue to monitor it.
So I had to uncomment https://github.com/Murodese/pynab/blob/notify/pynab/xmpp.py#L68, but I got it working :) I'm seeing releases show up in the xmpp client connected from my desktop.
Whoops, forgot about that. Is it grabbing nzbs properly?
Yup. I've had at least one release go the whole route from indexing to SABnzbd downloading, implicitly testing the regex matching and api retrieval. I'm currently experiencing that form of excitement that programmers get when something works, but if they explain it to a lay person, they get looked at like they're nuts. I'm also really excited that this is getting merged into the official project. I thought this was an esoteric feature, but having seen the IRC bot PR as well, I can see how the new release POST architecture opens up a whole world of near-realtime add-ons.
That's the idea :) When I have a bit more time (not currently), I'm going to work at forking a couple of the content managers (sonarr, sickbeard, couchpotato) and having them accept pushed release notifications.
Ok, I'm going to merge this into the main branch.
This adds a bot that will post new releases using xmpp pubsub. It was written for my very specific use, so I think it'll need some work to be mergeable. This PR is just the start of that discussion.