This repository has been archived by the owner. It is now read-only.

XMPP pubsub #121

Merged
merged 2 commits into from Jan 18, 2015

Conversation

bettse
Contributor

@bettse bettse commented Jan 11, 2015

This adds a bot that will post new releases using xmpp pubsub. It was written for my very specific use, so I think it'll need some work to be mergeable. This PR is just the start of that discussion.

@@ -62,7 +62,8 @@
with open('db/initial/tvshows.json', encoding='utf-8', errors='ignore') as f:
    data = json.load(f)
    try:
        engine.execute(TvShow.__table__.insert(), data)
        for row in data:
Contributor Author

@bettse bettse Jan 11, 2015


I made this tweak because the single transaction was taking too much time (and perhaps memory) to complete.
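
A middle ground between one huge transaction and one statement per row is to insert in fixed-size batches and commit after each. The sketch below is only an illustration of that idea using the standard-library sqlite3 module. The table layout, the `insert_in_batches` name, and the batch size are assumptions, not pynab code.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=500):
    """Insert (id, name) rows, committing after every batch_size rows.

    Hypothetical helper: smaller transactions keep memory and lock time
    bounded, at the cost of losing all-or-nothing behaviour.
    """
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO tvshows (id, name) VALUES (?, ?)", batch)
        conn.commit()
        inserted += len(batch)
    return inserted


def make_example_db():
    """Create an in-memory database with an assumed two-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tvshows (id INTEGER PRIMARY KEY, name TEXT)")
    return conn
```

Whether batching beats per-row inserts here depends on the database and driver, so it is worth timing both.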

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 12, 2015

Can you give me a use case for this so I can test it? Are you using this to publish new releases to an xmpp channel, an app for notifications, or what?

The only modifications I'll probably make to this are to expand how it interacts with release processing (just to clean it up a bit) and the process spawning (just to use the existing concurrent.futures stuff). I'll need to work it into the new branch which changes a lot of startup stuff, too (see branch anustart).

@bettse
Contributor Author

@bettse bettse commented Jan 13, 2015

The idea is to reduce the time from release to download. I have a 'client' (https://github.com/bettse/xmppnzb) that runs as a local xmpp client and subscribes to a list of pubsub nodes. Then when any new pubsub event happens, the client receives it and checks the name against the list of regexes the client got from SABnzbd++ (same list used for rss feeds). If the item matches any regex, the URL is sent to SABnzbd to download.
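
The matching step described above can be sketched roughly as follows. This is not the xmppnzb code: the `first_match` name, the case-insensitive search, and the example patterns are assumptions made for illustration.

```python
import re


def first_match(name, patterns):
    """Return the first pattern that matches the release name, else None.

    Hypothetical helper: a client would call this for each pubsub event
    and hand the release URL to the downloader only when it returns a
    pattern.
    """
    for pattern in patterns:
        if re.search(pattern, name, re.IGNORECASE):
            return pattern
    return None


# Example patterns, invented for this sketch.
wanted = [r"other\.show", r"some\.show\.s\d+e\d+"]
hit = first_match("Some.Show.S01E02.720p", wanted)
```

Because the event is pushed, the check runs as soon as the release is published rather than at the next RSS poll.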

I had the system running quite well for a number of months, but then made the mistake of attempting to switch to CoreOS, dockerize pynab, and switch to its postgres version all at the same time. Though I did have it working off and on, it wasn't stable.

If you'd like to see it working before merging it in, I fully understand, and would like to help, but it may be some time. I still have a box with an xmpp server running, and I just need some time to get things up on it again.

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 13, 2015

Just checking - you know software like sickbeard and sonarr exist, right? :)

It's no problem merging it in, I just need a bit more info so I can merge it properly - to do that, I need to be able to test it. Now that I know what you're doing, I can do it :)

@bettse
Contributor Author

@bettse bettse commented Jan 13, 2015

> Just checking - you know software like sickbeard and sonarr exist, right? :)

I've heard of sickbeard, but not sonarr. My understanding is that it is aimed at filling in episodes of a show, as well as watching rss for new ones. If anything, I'd love it if they would integrate the client part of xmppnzb, so new shows would start downloading faster. The heart of my idea is to replace polling-based rss with push-based xmpp.

Yesterday I got pynab back up and running on a host. If you can contact me at my email (bettse@fastmail.fm) with a preferred username and password, I can set up an account to demonstrate how it all works.

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 13, 2015

Ahh, I see what you mean. Currently, they don't grab the download until the rss feed is grabbed again - 5 minutes to an hour, or so. That's quite good.

I'll email you.

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 14, 2015

Ok, here's how it works:

To get the xmpp bot to work nicely with the new init scripts and cli, it needed to be functionally disconnected from pynab itself. What it does is open a tiny wsgi server that takes json events and passes them to the xmpp bot.

This also means that there's support for any other service that wants to accept new release data as json and do stuff with it, and you can have as many of those hosts as you want - they're handled asynchronously, so they won't interfere with release processing.
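
As a rough picture of that architecture, here is a minimal WSGI application that accepts POSTed JSON and puts it on a queue for some other consumer (such as a bot) to drain. It is a sketch under assumptions, not the code on the notify branch: the `json_pub_app` and `fake_post` names, the response bodies, and the use of `queue.Queue` are all invented for illustration.

```python
import io
import json
import queue
from wsgiref.util import setup_testing_defaults

# Hypothetical hand-off point between the HTTP side and the bot side.
events = queue.Queue()


def json_pub_app(environ, start_response):
    """Accept a POSTed JSON document and queue it for another consumer."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST only\n"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length)
        payload = json.loads(body.decode("utf-8"))
    except ValueError:
        # Covers a bad length, undecodable bytes, and malformed JSON.
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"invalid JSON\n"]
    events.put(payload)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"queued\n"]


def fake_post(body):
    """Call the app in-process with a synthetic POST, for quick checks."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = "POST"
    environ["CONTENT_LENGTH"] = str(len(body))
    environ["wsgi.input"] = io.BytesIO(body)
    statuses = []
    result = json_pub_app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(result)
```

To serve it for real, something like `wsgiref.simple_server.make_server("127.0.0.1", 8080, json_pub_app).serve_forever()` would do for local testing. The host and port there are placeholders.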

To test this (I'm not merging it to the primary branch yet), run git fetch and git checkout notify to switch branches, then recopy the config from the sample and edit as appropriate. There's a new bot section as well as some notes at the end of the scan section (for publishing).

To start the process: ./pynab.py pubsub

Let me know how it goes - if it works, I'll merge it.

@bettse
Contributor Author

@bettse bettse commented Jan 15, 2015

I got the branch and started things up. It may take a bit to validate, but I do see the xmpp bot start message.

I used to just watch stdout for logs, but it looks like things are backgrounded now. I changed to specifying a log file and saw the various logs created, although the 'scan' log is looking a little...confused:

untitled

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 15, 2015

Which logfile is that? pynab.scan.log?

@bettse
Contributor Author

@bettse bettse commented Jan 15, 2015

It happens on all of them, but my example was from a livelier one, pynab_scan.log

bettse@kobol ~/Projects/pynab (notify*) $ tail -f logs/pynab_scan.log

here is the snippet of my config about logging

log = {
    # logging settings 
    # ----------------
    # logging_file: a filepath or None to go to stdout 
    'logging_file': 'logs/pynab.log',  

    # logging.x where DEBUG, INFO, WARNING, ERROR, etc
    # generally, debug if something goes wrong, info for normal usage 
    'logging_level': logging.INFO,  

    # max_log_size: maximum size of logfiles before they get rotated 
    # number, in bytes (this is 50mb)  
    'max_log_size': 50*1024*1024, 
}                
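
For readers unfamiliar with these settings, the sketch below shows one way a config dict of this shape could be turned into a rotating file logger with the standard library. How pynab actually wires this up is not shown in the thread. The `build_logger` name, the log format, and `backupCount=5` are assumptions.

```python
import logging
import logging.handlers
import sys


def build_logger(name, log_config):
    """Create a logger from a dict shaped like the config snippet above."""
    logger = logging.getLogger(name)
    logger.setLevel(log_config["logging_level"])
    # Drop handlers from any earlier call so messages are not duplicated.
    logger.handlers.clear()

    path = log_config.get("logging_file")
    if path:
        # Roll over once the file reaches max_log_size bytes.
        handler = logging.handlers.RotatingFileHandler(
            path, maxBytes=log_config["max_log_size"], backupCount=5
        )
    else:
        # None means log to stdout, as the config comment describes.
        handler = logging.StreamHandler(sys.stdout)

    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With a setup like this, a start-up message is written once each time a process configures its logger, so many copies of it in a single file can point to a process that keeps restarting.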

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 15, 2015

I'll take a look at this when I get home, but I suspect each greenlet is loading the log module and that's causing the spam.

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

Turns out this is probably user-ish error. I think the scan process is dying repeatedly (I get an exception when trying to run it un-daemonized), and each run was adding that line. I'm still working on the exception, and I'll let you know how it goes when I've got things running.

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

What's the exception?

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

Yeah, it's not the greenlets doing it. That line gets output to the log each time the logging is loaded - once for each daemon, effectively. The greenlets don't access the logger, so they're not triggering it - but if something's preventing the pubsub stuff from running, it'll spam it out. That said, those messages should only appear in the log associated with the process importing them, so if it's happening in every logfile then every daemon is crashing.

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

bettse@kobol ~/Projects/pynab (notify) $ python scan.py 
Traceback (most recent call last):
  File "scan.py", line 14, in <module>
    import pynab.releases
  File "/home/bettse/Projects/pynab/pynab/releases.py", line 5, in <module>
    import grequests
  File "/home/bettse/.pyenv/versions/3.3.5/lib/python3.3/site-packages/grequests.py", line 14, in <module>
    import gevent
  File "/home/bettse/.pyenv/versions/3.3.5/lib/python3.3/site-packages/gevent/__init__.py", line 36, in <module>
    from gevent.hub import get_hub, iwait, wait
  File "/home/bettse/.pyenv/versions/3.3.5/lib/python3.3/site-packages/gevent/hub.py", line 282
    except Exception, ex:
                    ^
SyntaxError: invalid syntax

There are some github issues and the like about gevent not supporting python 3.0, but since that's a prereq of pynab, I'm leaning towards thinking it has to do with my using pyenv since the host system only has python 3.2.3. When I was demonstrating pynab running, it was using pyenv with python 3.3.5, but that was also on the postgres branch.

The only log file with a lot of those "started pynab logger" lines was _scan; the rest would have at least one, but not more. When I was watching ps, it was scan that seemed to fail to stay up; pubsub.py seems good (at least in terms of this).

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

Yeah, that's a 2.x->3.x error. Let me check the package in use and I'll force it to use the 3.0 version.

No exceptions in the scan log?

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

None when it was daemonized; perhaps stderr was going elsewhere? It was when I tried to run it manually (python scan.py) that I saw the error and started to put things together.

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

Whoops, just realised that was grequests and not eventlets throwing that error. Grequests depends on gevent and gevent doesn't work on python 3.x. I'll use another package.

jamesmeneghello added a commit that referenced this issue Jan 16, 2015
@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

Try that.

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

I just quickly did a reinstall of requirements using pip and tried it, and this is what I got:

bettse@kobol ~/Projects/pynab (notify) $ python scan.py                 
Traceback (most recent call last):
  File "scan.py", line 18, in <module>
    import pynab.imdb
  File "/home/bettse/Projects/pynab/pynab/imdb.py", line 5, in <module>
    import pymongo
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/pymongo/__init__.py", line 92, in <module>
    from pymongo.connection import Connection
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/pymongo/connection.py", line 39, in <module>
    from pymongo.mongo_client import MongoClient
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/pymongo/mongo_client.py", line 46, in <module>
    from pymongo import (auth,
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/pymongo/pool.py", line 22, in <module>
    from pymongo import thread_util
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/pymongo/thread_util.py", line 31, in <module>
    from gevent.lock import BoundedSemaphore as GeventBoundedSemaphore
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/gevent/__init__.py", line 36, in <module>
    from gevent.hub import get_hub, iwait, wait
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/gevent/hub.py", line 282
    except Exception, ex:
                    ^
SyntaxError: invalid syntax

I haven't dug in, so this may well be something wrong on my end

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

(I also switched from python 3.3.5 in pyenv to 3.4.2; a misguided attempt at fixing the earlier exception)

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

I deleted line 5, the import of pymongo in pynab/imdb.py and scan is now running :)

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

Yeah, I think this'll be a problem with your python environment rather than anything from here. I'm not seeing the same errors. Try pip install --force-reinstall pymongo.

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

Actually, just run pip uninstall gevent - maybe pymongo is trying to import it because it exists (even though it's a 2.x lib and not 3.x), and that forces that exception. I assume if gevent isn't installed it falls back onto concurrent.futures or something.
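
The guess above fits how optional imports usually behave. A guard written as `try: import x / except ImportError` handles a missing package, but a package that is installed and fails to parse raises SyntaxError, which passes straight through the guard. The sketch below demonstrates this with a throwaway module. It assumes pymongo uses a guard of this general shape, which the thread does not confirm. The `optional_import`, `probe`, and `broken_dep_example` names are invented.

```python
import importlib
import os
import sys
import tempfile


def optional_import(name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Create a module whose source cannot be parsed, standing in for a
# Python 2 only package installed into a Python 3 environment.
_tmpdir = tempfile.mkdtemp()
with open(os.path.join(_tmpdir, "broken_dep_example.py"), "w") as f:
    f.write("def broken(:\n    pass\n")
sys.path.insert(0, _tmpdir)
importlib.invalidate_caches()


def probe(name):
    """Report what happens when the optional import is attempted."""
    try:
        module = optional_import(name)
    except SyntaxError:
        return "syntax error escaped the guard"
    return "missing" if module is None else "imported"
```

Here `probe("broken_dep_example")` reports the escaped SyntaxError, while a name that is not installed at all is reported as missing. That matches the observation that uninstalling the unparseable package lets the fallback path run.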

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

yup! now I don't need to modify imdb.py. I'll start testing pubsub and let you know how it goes

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

Looks like I need to not daemonize if I'm going to get good results. I noticed a repeat of that starting log message and reran scan.py without daemonization, and it died on the first release it processed:

bettse@kobol ~/Projects/pynab (notify) $ python scan.py 
Traceback (most recent call last):
  File "scan.py", line 182, in <module>
    main(mode=mode, group=args.group, date=args.date)
  File "scan.py", line 118, in main
    process()
  File "scan.py", line 55, in process
    pynab.releases.process()
  File "/home/bettse/Projects/pynab/pynab/releases.py", line 349, in process
    futures = [request_session.post(host, data=to_json(release)) for host in config.scan.get('publish_hosts')]
  File "/home/bettse/Projects/pynab/pynab/releases.py", line 349, in <listcomp>
    futures = [request_session.post(host, data=to_json(release)) for host in config.scan.get('publish_hosts')]
  File "/home/bettse/Projects/pynab/pynab/db.py", line 130, in to_json
    del obj['_sa_instance_state']
TypeError: 'Release' object does not support item deletion

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

It's just for debugging mostly, and it's primarily because I forgot an important piece of code :stare:

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

Obviously once there are no exceptions and the script doesn't crash, daemonising won't be a problem. Keep running it by console for the moment.

Also, try now.

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

updated and restarted, so far so good

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

The modification sends the entire release set over json as well, so (if desired) you have a lot more data to play with - group, category, poster, size, etc.
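
For illustration, posting one release to several hosts without letting one bad host break the others might look like the sketch below. It is not the code in pynab/releases.py: `post_json`, `publish`, the timeout, and the worker count are assumptions, and only the standard library is used so the example stays self-contained.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request


def post_json(url, payload, timeout=5):
    """POST a JSON document and return the HTTP status code."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def publish(release, hosts, post=post_json, max_workers=4):
    """Send the release to every host in parallel.

    Returns a dict mapping each host to its status code, or to the
    exception raised for that host, so a failure stays isolated.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(post, host, release): host for host in hosts}
        for future, host in futures.items():
            try:
                results[host] = future.result()
            except Exception as exc:  # keep going when one host fails
                results[host] = exc
    return results
```

This version waits for every host before returning. A fire-and-forget variant would keep a long-lived executor and skip collecting the results.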

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

Still getting a variation of the deletion exception:

bettse@kobol ~/Projects/pynab (notify) $ python scan.py
Traceback (most recent call last):
  File "scan.py", line 182, in <module>
    main(mode=mode, group=args.group, date=args.date)
  File "scan.py", line 118, in main
    process()
  File "scan.py", line 55, in process
    pynab.releases.process()
  File "/home/bettse/Projects/pynab/pynab/releases.py", line 349, in process
    futures = [request_session.post(host, data=to_json(release)) for host in config.scan.get('publish_hosts')]
  File "/home/bettse/Projects/pynab/pynab/releases.py", line 349, in <listcomp>
    futures = [request_session.post(host, data=to_json(release)) for host in config.scan.get('publish_hosts')]
  File "/home/bettse/Projects/pynab/pynab/db.py", line 131, in to_json
    del obj['_sa_instance_state']
TypeError: 'str' object does not support item deletion

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

Oops. Try now.

@bettse
Contributor Author

@bettse bettse commented Jan 16, 2015

I woke up to this little guy:

bettse@kobol ~/Projects/pynab (notify) $ python scan.py
Connection to github.com closed by remote host.
Traceback (most recent call last):
  File "scan.py", line 182, in <module>
    main(mode=mode, group=args.group, date=args.date)
  File "scan.py", line 118, in main
    process()
  File "scan.py", line 55, in process
    pynab.releases.process()
  File "/home/bettse/Projects/pynab/pynab/releases.py", line 197, in process
    ).first()
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/sqlalchemy/orm/query.py", line 2367, in first
    ret = list(self[0:1])
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/sqlalchemy/orm/query.py", line 2228, in __getitem__
    return list(res)
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/sqlalchemy/orm/loading.py", line 73, in instances
    rows = [process[0](row, None) for row in fetch]
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/sqlalchemy/orm/loading.py", line 73, in <listcomp>
    rows = [process[0](row, None) for row in fetch]
  File "/home/bettse/.pyenv/versions/3.4.2/lib/python3.4/site-packages/sqlalchemy/orm/loading.py", line 369, in _instance
    state = attributes.instance_state(instance)
AttributeError: 'Release' object has no attribute '_sa_instance_state'
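
The three tracebacks in this exchange are consistent with a serializer that first tried item deletion on the object itself, then on a string, and finally removed `_sa_instance_state` from the live instance, which the ORM still needs. A common way around that is to copy the attribute dict and strip the key from the copy. The sketch below shows the idea on a plain class. The `Release` stand-in and this `to_json` are hypothetical and are not the code in pynab/db.py.

```python
import json


class Release:
    """Hypothetical stand-in for a mapped object, not pynab's model."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        # Placeholder for the bookkeeping attribute an ORM attaches.
        self._sa_instance_state = object()


def to_json(obj):
    """Serialize an object's public attributes without mutating it."""
    # Shallow copy, so removing the key never touches the instance.
    data = dict(vars(obj))
    data.pop("_sa_instance_state", None)
    return json.dumps(data, sort_keys=True)


release = Release("Example.Release", 1024)
encoded = to_json(release)
```

After the call, `release` still has its `_sa_instance_state` attribute, so later ORM operations on the same instance are unaffected. Real models with dates or relationships would need extra handling that this sketch omits.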

jamesmeneghello added a commit that referenced this issue Jan 16, 2015
@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 16, 2015

Hopefully that should do it :v

@bettse
Contributor Author

@bettse bettse commented Jan 17, 2015

scan ran stably during the day, and this evening I realized the daemonized pubsub script I'd started the day before meant I wasn't seeing any potential exceptions. I was seeing log output saying that POSTs were being made. I re-ran and saw an exception in the parsing of the json it was receiving. I added a log message and just got JSONPub: received EvhoqoEUXhEvhoqoEUXhEvhoqoEUXh. It looks a bit like a release name. I'm wondering if the whole json isn't being POSTed?

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 17, 2015

Hmm, all the release information seems to be hitting the JSONPub handler properly and put into the queue. I can't really test it well past that point.

@bettse
Contributor Author

@bettse bettse commented Jan 17, 2015

ah! You're correct, I was misinterpreting the log. I'll continue to monitor it

@bettse
Contributor Author

@bettse bettse commented Jan 17, 2015

So I had to uncomment https://github.com/Murodese/pynab/blob/notify/pynab/xmpp.py#L68 but I got it working :) I'm seeing releases show up in the xmpp client connected from my desktop.

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 17, 2015

Whoops, forgot about that. Is it grabbing nzbs properly?

@bettse
Contributor Author

@bettse bettse commented Jan 17, 2015

Yup. I've had at least one go the whole route from indexing to SABnzbd++ downloading, implicitly testing the regex matching and api retrieval. I'm currently experiencing that form of excitement that programmers get when something works, but if they explain it to a lay person, they get looked at like they're nuts.

I'm also really excited that this is getting merged into the official project. I thought this was an esoteric feature, but having seen the IRC bot PR as well, I can see how the new release POST architecture opens up a whole world of near-realtime add-ons.

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 18, 2015

That's the idea :) When I have a bit more time (not currently), I'm going to work at forking a couple of the content managers (sonarr, sickbeard, couchpotato) and having them accept pushed release notifications.

@jamesmeneghello
Owner

@jamesmeneghello jamesmeneghello commented Jan 18, 2015

Ok, I'm going to merge this into the main branch.

@jamesmeneghello jamesmeneghello merged commit 0025a9b into jamesmeneghello:development-postgres Jan 18, 2015
@bettse bettse deleted the development-postgres branch Jan 19, 2015
jamesmeneghello added a commit that referenced this issue Jan 19, 2015