RSS regex no longer works as it did before #6367

Open
asturel opened this Issue Feb 10, 2017 · 4 comments


@asturel
asturel commented Feb 10, 2017

Please provide the following information

qBittorrent version and Operating System:

qbittorrent compiled from master 73f7622

If on linux, libtorrent and Qt version:

What is the problem:

The regex that previously worked now gives me an error.

What is the expected behavior:

It should work just as it did before :)

Steps to reproduce:

Enter this as the "Must Contain" regex:
(^sherlock|^Lethal.Weapon|westworld|^the.strain|^mr.{1,2}robot|^lucifer|^better.call.saul|^daredevil|^gotham|^colony|x.?files|^workaholics|^the.100|^The.Blacklist|^heroes.reborn|^teen.wolf|^The.Magicians|^supergirl|^limitless|^the.flash|^into.the.badlands|^the.expanse|^The.Shannara.Chronicles|Legends.of.Tomorrow|^supergirl|^the.vampire.diaries|^the.originals|^limitless|^grimm|Marvels.Agents.of.S.H.I.E.L.D|^hannibal|^suits|^banshee|^elementary|^homeland|^falling.skies|^under.the.dome|^futurama|^defiance|^arrow|^supernatural|^doctor.who|game.of.thrones|the.big.bang.theory|^family.guy|walking.dead).*?(S\d+E\d+|\d+x\d+).*720p(?!.*web[\- _\.]?(dl|rip).*)

Extra info(if any):

perl test script:

$s = "The.Blacklist.S04E13.720p.HDTV.x264-KILLERS";
$s =~ /(^sherlock|^Lethal.Weapon|westworld|^the.strain|^mr.{1,2}robot|^lucifer|^better.call.saul|^daredevil|^gotham|^colony|x.?files|^workaholics|^the.100|^The.Blacklist|^heroes.reborn|^teen.wolf|^The.Magicians|^supergirl|^limitless|^the.flash|^into.the.badlands|^the.expanse|^The.Shannara.Chronicles|Legends.of.Tomorrow|^supergirl|^the.vampire.diaries|^the.originals|^limitless|^grimm|Marvels.Agents.of.S.H.I.E.L.D|^hannibal|^suits|^banshee|^elementary|^homeland|^falling.skies|^under.the.dome|^futurama|^defiance|^arrow|^supernatural|^doctor.who|game.of.thrones|the.big.bang.theory|^family.guy|walking.dead).*?(S\d+E\d+|\d+x\d+).*720p(?!.*web[\- _\.]?(dl|rip).*)/;

print "Before: $`\n";
print "Matched: $&\n";
print "After: $'\n";

======
result:

Before:
Matched: The.Blacklist.S04E13.720p
After: .HDTV.x264-KILLERS

Previously this regex worked just fine; now qBittorrent gives me an error :(

@asturel
asturel commented Feb 10, 2017

It seems to have a problem with
(?!.*web[\- _\.]?(dl|rip).*)
especially with the ?!. part.
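
For reference, here is a minimal sketch of what the trailing negative lookahead is meant to do, using Python's `re` rather than Qt. The pattern is shortened to a single show name, the WEB-DL and WEBRip titles are made up for illustration, and case-insensitive matching is an assumption about how the rule is applied. This only exercises Python's engine, so it does not show which construct qBittorrent rejects; the Qt documentation for QRegExp lists lazy quantifiers such as `.*?` as unsupported, so that part may be worth checking as well.

```python
import re

# Shortened version of the rule: one show name instead of the full list.
PATTERN = r'(^The.Blacklist).*?(S\d+E\d+|\d+x\d+).*720p(?!.*web[\- _\.]?(dl|rip).*)'

# Assumption: the rule is matched case-insensitively.
rule = re.compile(PATTERN, re.IGNORECASE)


def accepts(title):
    """Return True if the shortened rule matches the release title."""
    return rule.search(title) is not None


# Title from the report: nothing after "720p" looks like WEB-DL or WEBRip.
print(accepts('The.Blacklist.S04E13.720p.HDTV.x264-KILLERS'))

# Hypothetical titles that the negative lookahead is meant to reject.
print(accepts('The.Blacklist.S04E13.720p.WEB-DL.x264-GROUP'))
print(accepts('The.Blacklist.S04E13.720p.WEBRip.x264-GROUP'))
```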

@magao
Contributor
magao commented Feb 10, 2017 edited

@asturel It's showing as an invalid regex. I rolled back my build to 88b2b26 before any of my changes were committed and it still shows as invalid. It also shows as invalid in v3.3.10.

Can you identify the commit where the behaviour changed, and what exactly changed?

Note that because the configuration files were split in commit 077ad65, you may need to execute the following (on Linux) to get your feeds working properly prior to that commit:

tail -n 1 ~/.config/qBittorrent/qBittorrent-rss-feeds.conf >> ~/.config/qBittorrent/qBittorrent-rss.conf

(You will probably want to back up both files first.)
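
If you would rather script the backup and the append together, something like the following might do. It is only a sketch: the function name `append_last_line` and the `.bak` suffix are made up here, and it assumes both files are plain text.

```python
import shutil
from pathlib import Path


def append_last_line(src, dst):
    """Back up both files, then append the last line of src to dst."""
    src, dst = Path(src), Path(dst)
    for path in (src, dst):
        if path.exists():
            # Keep a copy next to the original, e.g. qBittorrent-rss.conf.bak
            shutil.copy2(path, path.with_name(path.name + '.bak'))
    lines = src.read_text().splitlines()
    if not lines:
        return  # nothing to append
    with dst.open('a') as out:
        out.write(lines[-1] + '\n')


# Usage with the paths from the command above (uncomment to run):
# config = Path.home() / '.config' / 'qBittorrent'
# append_last_line(config / 'qBittorrent-rss-feeds.conf',
#                  config / 'qBittorrent-rss.conf')
```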

@magao
Contributor
magao commented Feb 11, 2017 edited

Qt has 2 regular expression classes - QRegExp which is Perl-like but not 100% Perl-re compatible; and QRegularExpression, which is supposed to be 100% Perl-re compatible. The code base has (as far as I'm aware) always used QRegExp and this pattern is not a valid QRegExp pattern and would never have been accepted. I'm guessing the pattern used to be less complicated ...

To fix this would require refactoring the RSS matching to use QRegularExpression, which is only available with Qt 5.0+. See PR #6369.

Note that I would strongly recommend splitting this regexp into multiple rules to simplify maintenance.
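
As a rough illustration of the split, here is a sketch in Python's `re` (not Qt) that builds one rule per show from a shared tail. Only a few of the show prefixes are included, the rule names and the helper `matching_rule` are made up for this example, and case-insensitive matching is an assumption.

```python
import re

# Shared tail from the original pattern: episode marker, 720p, no WEB-DL/WEBRip.
EPISODE_720P = r'.*?(S\d+E\d+|\d+x\d+).*720p(?!.*web[\- _\.]?(dl|rip).*)'

# A few of the show prefixes from the original pattern, one per rule.
# The rule names (dictionary keys) are hypothetical labels.
SHOWS = {
    'Sherlock': r'^sherlock',
    'The Blacklist': r'^The.Blacklist',
    'The X-Files': r'x.?files',
    'Game of Thrones': r'game.of.thrones',
}

# Assumption: rules are matched case-insensitively.
RULES = {
    name: re.compile('(' + prefix + ')' + EPISODE_720P, re.IGNORECASE)
    for name, prefix in SHOWS.items()
}


def matching_rule(title):
    """Return the name of the first rule that matches title, or None."""
    for name, rule in RULES.items():
        if rule.search(title):
            return name
    return None


print(matching_rule('The.Blacklist.S04E13.720p.HDTV.x264-KILLERS'))
```

Each rule stays short enough to read and test on its own, and adding or removing a show touches a single entry rather than one very long alternation.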

A quick Python program demonstrating the differences:

import re
import PyQt5.QtCore

PATTERN = r'(^sherlock|^Lethal.Weapon|westworld|^the.strain|^mr.{1,2}robot|^lucifer|^better.call.saul|^daredevil|^gotham|^colony|x.?files|^workaholics|^the.100|^The.Blacklist|^heroes.reborn|^teen.wolf|^The.Magicians|^supergirl|^limitless|^the.flash|^into.the.badlands|^the.expanse|^The.Shannara.Chronicles|Legends.of.Tomorrow|^supergirl|^the.vampire.diaries|^the.originals|^limitless|^grimm|Marvels.Agents.of.S.H.I.E.L.D|^hannibal|^suits|^banshee|^elementary|^homeland|^falling.skies|^under.the.dome|^futurama|^defiance|^arrow|^supernatural|^doctor.who|game.of.thrones|the.big.bang.theory|^family.guy|walking.dead).*?(S\d+E\d+|\d+x\d+).*720p(?!.*web[\- _\.]?(dl|rip).*)'

py_re = re.compile(PATTERN)
qt_re = PyQt5.QtCore.QRegExp(PATTERN)
qt_regex = PyQt5.QtCore.QRegularExpression(PATTERN)

print('Valid QRegExp:', qt_re.isValid())
print('Valid QRegularExpression:', qt_regex.isValid())

for s in (r'The.Blacklist.S04E13.720p.HDTV.x264-KILLERS',):
    m = py_re.search(s)
    print('Python re match:', list(m.groups()) if m else None)

    qtm = qt_re.indexIn(s)
    print('QRegExp index:', qtm)

    qtm = qt_regex.match(s)
    print('QRegularExpression match:', [qtm.captured(i + 1) for i in range(qt_regex.captureCount())] if qtm.hasMatch() else None)

Output:

Valid QRegExp: False
Valid QRegularExpression: True
Python re match: ['The.Blacklist', 'S04E13', None]
QRegExp index: -1
QRegularExpression match: ['The.Blacklist', 'S04E13', '']
magao added a commit to magao/qBittorrent that referenced this issue Feb 11, 2017
Use Perl-compatible regexes for RSS rules. Closes #6367.
--HG--
branch : magao-dev
f9abd25