This repository has been archived by the owner on Jan 2, 2021. It is now read-only.

Experimental not matching (revisited-titles with special chars) #1938

Closed
pipieye opened this issue Apr 29, 2018 · 7 comments

@pipieye

pipieye commented Apr 29, 2018

Hi,
It's me again... Pipi.
Thanks for the quick fix on the '0day' issue.
Everything is working now except for series with special characters in the title. Some examples are below; hoping this will be a quick fix as well :)

Batman & the Signal (2018) should have found issue 03
0Day 04.25.18 pt1 [09/76] - "Batman & the Signal 03 (of 03) (2018) (Digital) (Zone-Empire).cbr" yEnc
The Demon: Hell Is Earth (2017) should have found issue 06
0Day 04.25.18 pt1 [65/76] - "The Demon - Hell is Earth 06 (of 06) (2018) (digital) (Son of Ultron-Empire).cbr
Doctor Strange: Damnation (2018) should have found issue 04
0Day 04.25.18 pt1 [21/76] - "Doctor Strange - Damnation 004 (2018) (Digital) (Zone-Empire).cbr

PS: I did some experimenting and found the following.
If we use
http://nzbindex.nl/rss/alt.binaries.comics.dcp/?sort=agedesc&minsize=10&age=1500&dq=Demon%3A+Hell+Is+Earth+6&max=50&more=1
there is no hit for the above,
but if we use the following (without %3A, and with +06 instead of +6)
http://nzbindex.nl/rss/alt.binaries.comics.dcp/?sort=agedesc&minsize=10&age=1500&dq=Demon+Hell+Is+Earth+06&max=50&more=1
then we get a hit.

Cheers,
Pipi
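For anyone following along, the two URLs above differ only in the `dq` value. Here is a minimal Python 3 sketch (not Mylar's code) that reproduces both with the standard library; the helper name `build_rss_url` is made up for illustration, and the parameter list is simply copied from the URLs in this comment.

```python
from urllib.parse import urlencode

# Base feed URL copied from the comment above.
BASE_URL = "http://nzbindex.nl/rss/alt.binaries.comics.dcp/"


def build_rss_url(query):
    """Hypothetical helper: build the RSS search URL for a free-text query."""
    params = [
        ("sort", "agedesc"),
        ("minsize", "10"),
        ("age", "1500"),
        ("dq", query),
        ("max", "50"),
        ("more", "1"),
    ]
    # urlencode() turns spaces into '+' and ':' into '%3A'.
    return BASE_URL + "?" + urlencode(params)


# The colon is what produces %3A in the first URL.
failing = build_rss_url("Demon: Hell Is Earth 6")
# Dropping the colon and padding the issue number gives the second URL.
working = build_rss_url("Demon Hell Is Earth 06")
```

So the `%3A` is just the percent-encoded colon from the series title; removing the colon before encoding is what makes it disappear from the query.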

@pipieye
Author

pipieye commented Apr 30, 2018

I did more testing and found that issues with single-digit numbers are being skipped, i.e. no match.
It looks like nzbindex.nl rejects searches that contain a single digit.
The loop over 1, 01, 001 passes +1 in the URL every time, instead of +1, +01 and +001 in turn.
Hope that helps.

As for the special characters, a workaround is to use the alternate search name option:
Doctor Strange: Damnation with an alternate search name of Doctor Strange Damnation will give us a match (assuming the issue number is higher than 9).
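To make the 1, 01, 001 point concrete, here is a small Python 3 sketch of generating a distinct padded string per pass, so that each search really sends a different value. The function name `issue_variants` is hypothetical and not taken from Mylar, and it assumes whole-number issues only (no decimals or suffixes).

```python
def issue_variants(issue, max_width=3):
    """Return the issue number padded to widths 1..max_width, without duplicates.

    Hypothetical helper for illustration; assumes a whole-number issue.
    """
    digits = str(int(issue))  # normalise input such as "006" to "6"
    variants = []
    for width in range(1, max_width + 1):
        padded = digits.zfill(width)
        if padded not in variants:  # "33" padded to width 1 and 2 is the same
            variants.append(padded)
    return variants
```

With this, issue 1 gives `1`, `01` and `001`, while a two-digit issue only gives two variants because the first two widths collapse into one.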

@evilhero
Owner

evilhero commented May 1, 2018

Good work on all the detective stuff ;)

It's all actually linked together. The 1, 01, 001 was problematic yes - at one point searching for just '1' would work for 01, 001 regardless, but that's changed now. Also the alternate search name isn't needed for the Damnation one - Mylar will automatically determine if there's a non-alphanumeric character in the string and do a subsequent search against the given title with the characters removed. It's just that you probably hit the right combo of the padded issue number and the alternate title ;)
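As a rough illustration of the "search again with the characters removed" idea described above, here is a Python 3 sketch. It is not Mylar's implementation; `title_variants` is a made-up name, and treating only ASCII letters, digits and whitespace as "kept" characters is an assumption.

```python
import re


def title_variants(title):
    """Return the title as given, plus a stripped variant when it differs.

    Hypothetical helper: non-alphanumeric characters are replaced with a
    space and runs of whitespace are collapsed to a single space.
    """
    stripped = " ".join(re.sub(r"[^A-Za-z0-9\s]", " ", title).split())
    if stripped == title:
        return [title]
    return [title, stripped]
```

For example, `Doctor Strange: Damnation` would yield itself plus `Doctor Strange Damnation`, while a plain title such as `Deathstroke` yields only one entry and so needs no second search.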

I have this fixed and working in my local copy now - going to push it soon but am still testing some other things out with it (right now it will loop through all issue number sequences, and alternate names and THEN parse for results, but I was hoping to let it do the parsing WHILE it's sequencing through so that it doesn't have to iterate over all the feeds and take more time).
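The "parse WHILE it's sequencing" idea can be sketched as a loop that checks each feed as soon as it is fetched and stops at the first hit. This is only an illustration in Python 3 under stated assumptions: `fetch` and `is_match` are hypothetical callables supplied by the caller, not functions from Mylar.

```python
def search_lazily(titles, issues, fetch, is_match):
    """Try each title/issue combination and stop at the first matching entry.

    fetch(title, issue) is assumed to return an iterable of feed entries;
    is_match(entry) is assumed to return True for an acceptable result.
    Both are placeholders for illustration.
    """
    for title in titles:
        for issue in issues:
            # Parse this feed right away instead of collecting all feeds first.
            for entry in fetch(title, issue):
                if is_match(entry):
                    return entry
    return None
```

The trade-off is that the first acceptable result wins, so any ranking across feeds would need to happen inside `is_match` or be given up.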

@evilhero evilhero self-assigned this May 1, 2018
evilhero added a commit that referenced this issue May 1, 2018
…s that are padded & search titles having common words removed for more accurate/related hits, FIX: When Oneoff has been downloaded on weekly pull, would point to incorrect local link instead of pointing to CV
@pipieye
Author

pipieye commented May 4, 2018

Looks like fix is working :)
Thank you sooo sooo much. ! ! !
Cheers
Pipi

@pipieye
Author

pipieye commented May 22, 2018

Certain series are causing an "index out of range" error. Not sure what is causing this. :(

@evilhero
Owner

Can you provide some of the series that are giving the index out of range? Odds are it's specific just to those titles and probably due to something in the nzb header that's causing the problem.

@pipieye
Author

pipieye commented Jul 12, 2018

Updated to the latest experimental and noticed I'm getting fewer and fewer matches. As an example, a manual search shows:
0.day.18.07.04 pt1 [16/71] - "Deathstroke 033 (2018) (2 covers) (Digital) (Zone-Empire).cbr" yEnc
and Mylar is doing this:
12-Jul-2018 09:45:36 - DEBUG :: mylar.Startit.75 : SEARCH-QUEUE : Now searching experimental for issue number: 033 to try and ensure all the bases are covered
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.789 : SEARCH-QUEUE : checking search result: 0.day.18.07.04 pt1 [16/71] - (2018)
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.799 : SEARCH-QUEUE : sub:0.day.18.07.04 pt1 [16/71] - (2018)
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.846 : SEARCH-QUEUE : comsize_b: 35636204
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.883 : SEARCH-QUEUE : size given as: 34.0 MB
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.990 : SEARCH-QUEUE : Wed, 04 Jul 2018 16:01:26 +0200 is after store date of 2018-07-04
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1009 : SEARCH-QUEUE : Entry: 0.day.18.07.04 pt1 [16/71] - (2018)
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1041 : SEARCH-QUEUE : Cleantitle: 0 day 18 07 04 pt1 16 71 - (2018)
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1156 : SEARCH-QUEUE : Cleantitle: 0 day 18 07 04 pt1 16 71 - (2018)
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1183 : SEARCH-QUEUE : 0. Bracket Word: 0 day 18 07 04 pt1 16 71 -
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1189 : SEARCH-QUEUE : Comic: 0 day 18 07 04 pt1 16 71 -
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1190 : SEARCH-QUEUE : UseFuzzy is : None
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1191 : SEARCH-QUEUE : ComVersChk : 4
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1183 : SEARCH-QUEUE : 1. Bracket Word: 2018
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1200 : SEARCH-QUEUE : year detected: 2018
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1202 : SEARCH-QUEUE : year looking for: 2018
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1204 : SEARCH-QUEUE : 2018 - right years match baby!
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1408 : SEARCH-QUEUE : original nzb comic and issue: 0 day 18 07 04 pt1 16 71 -
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1419 : SEARCH-QUEUE : [Deathstroke] I have found a - within the nzbname @ position: 27
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1424 : SEARCH-QUEUE : There is no hyphen present in the series title.
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1425 : SEARCH-QUEUE : Assuming position start is : 27
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1469 : SEARCH-QUEUE : adjusted nzb comic and issue: 0 day 18 07 04 pt1 16 71
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1490 : SEARCH-QUEUE : chg_comic:0 day 18 07 04 pt1 16
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1497 : SEARCH-QUEUE : chg_comic: 0DAY180704PT116
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1498 : SEARCH-QUEUE : findcomic_chksplit: DEATHSTROKE
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1503 : SEARCH-QUEUE : changeup to decimal: .71
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1550 : SEARCH-QUEUE : adjusting from: 0 day 18 07 04 pt1 16 71 to: 16.71
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1559 : SEARCH-QUEUE : ['0', 'day', '18', '07', '04', 'pt1', '16', '71'] nzb series word count: 6
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1560 : SEARCH-QUEUE : ['deathstroke'] watchlist word count: 1
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1626 : SEARCH-QUEUE : splitst : 6
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1627 : SEARCH-QUEUE : len-watchcomic : 1
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1629 : SEARCH-QUEUE : incorrect comic lengths after removal...not a match.
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.789 : SEARCH-QUEUE : checking search result: 0.day.18.07.04 pt1 [16/71] - (2018)
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.799 : SEARCH-QUEUE : sub:0.day.18.07.04 pt1 [16/71] - (2018)
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.846 : SEARCH-QUEUE : comsize_b: 35636204
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.883 : SEARCH-QUEUE : size given as: 34.0 MB
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.990 : SEARCH-QUEUE : Wed, 04 Jul 2018 16:01:26 +0200 is after store date of 2018-07-04
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1009 : SEARCH-QUEUE : Entry: 0.day.18.07.04 pt1 [16/71] - (2018)
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1041 : SEARCH-QUEUE : Cleantitle: 0 day 18 07 04 pt1 16 71 - (2018)
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1156 : SEARCH-QUEUE : Cleantitle: 0 day 18 07 04 pt1 16 71 - (2018)
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1183 : SEARCH-QUEUE : 0. Bracket Word: 0 day 18 07 04 pt1 16 71 -
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1189 : SEARCH-QUEUE : Comic: 0 day 18 07 04 pt1 16 71 -
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1190 : SEARCH-QUEUE : UseFuzzy is : None
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1191 : SEARCH-QUEUE : ComVersChk : 4
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1183 : SEARCH-QUEUE : 1. Bracket Word: 2018
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1200 : SEARCH-QUEUE : year detected: 2018
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1202 : SEARCH-QUEUE : year looking for: 2018
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1204 : SEARCH-QUEUE : 2018 - right years match baby!
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1408 : SEARCH-QUEUE : original nzb comic and issue: 0 day 18 07 04 pt1 16 71 -
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1419 : SEARCH-QUEUE : [Deathstroke] I have found a - within the nzbname @ position: 27
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1424 : SEARCH-QUEUE : There is no hyphen present in the series title.
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1425 : SEARCH-QUEUE : Assuming position start is : 27
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1469 : SEARCH-QUEUE : adjusted nzb comic and issue: 0 day 18 07 04 pt1 16 71
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1490 : SEARCH-QUEUE : chg_comic:0 day 18 07 04 pt1 16
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1497 : SEARCH-QUEUE : chg_comic: 0DAY180704PT116
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1498 : SEARCH-QUEUE : findcomic_chksplit: DEATHSTROKE
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1503 : SEARCH-QUEUE : changeup to decimal: .71
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1550 : SEARCH-QUEUE : adjusting from: 0 day 18 07 04 pt1 16 71 to: 16.71
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1559 : SEARCH-QUEUE : ['0', 'day', '18', '07', '04', 'pt1', '16', '71'] nzb series word count: 6
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1560 : SEARCH-QUEUE : ['deathstroke'] watchlist word count: 1
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1626 : SEARCH-QUEUE : splitst : 6
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1627 : SEARCH-QUEUE : len-watchcomic : 1
12-Jul-2018 09:45:43 - DEBUG :: mylar.NZB_SEARCH.1629 : SEARCH-QUEUE : incorrect comic lengths after removal...not a match.
12-Jul-2018 09:45:43 - INFO :: mylar.search_init.366 : SEARCH-QUEUE : Could not find Issue 33 of Deathstroke (2016) using experimental [api]

With the hyphen found at position 27, it should parse from there onward, but it looks like it is parsing the words before position 27 instead?
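One thing that stands out in the log above: the entry being checked is `0.day.18.07.04 pt1 [16/71] - (2018)`, so the quoted filename from the manual search result appears to be missing before the hyphen logic even runs. Below is a Python 3 sketch of pulling the filename out of the raw subject first. It is an illustration only; `filename_from_subject` is a hypothetical name and the fallback rule is an assumption, not what Mylar does.

```python
import re


def filename_from_subject(subject):
    """Return the quoted filename from a Usenet subject line, if present.

    Hypothetical helper: when no quoted part exists, fall back to the text
    after the first " - " separator, or to the whole subject.
    """
    quoted = re.search(r'"([^"]+)"', subject)
    if quoted:
        return quoted.group(1)
    head, sep, tail = subject.partition(" - ")
    return tail.strip() if sep else subject.strip()


subject = ('0.day.18.07.04 pt1 [16/71] - "Deathstroke 033 (2018) '
           '(2 covers) (Digital) (Zone-Empire).cbr" yEnc')
entry = "0.day.18.07.04 pt1 [16/71] - (2018)"
```

On the full subject this returns the `Deathstroke 033 ...cbr` filename; on the logged entry only `(2018)` is left after the hyphen, which would explain why nothing on either side of position 27 can match the series name.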

@pipieye
Author

pipieye commented Jul 22, 2018

21-Jul-2018 22:42:48 - INFO :: mylar.NZB_SEARCH.477 : SEARCH-QUEUE : Shhh be very quiet...I'm looking for Batman: Sins of the Father issue: 12 (2018) using experimental.
21-Jul-2018 22:42:48 - ERROR :: mylar.excepthook.314 : SEARCH-QUEUE : Uncaught exception: Traceback (most recent call last):
File "/Volumes/Samsung840EVO/Users/Myapps/Mylar/mylar/mylar/logger.py", line 336, in new_run
old_run(*args, **kwargs)
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/Volumes/Samsung840EVO/Users/Myapps/Mylar/mylar/mylar/helpers.py", line 3007, in search_queue
ss_queue = mylar.search.searchforissue(item['issueid'])
File "/Volumes/Samsung840EVO/Users/Myapps/Mylar/mylar/mylar/search.py", line 2118, in searchforissue
foundNZB, prov = search_init(ComicName, IssueNumber, str(IssueYear), SeriesYear, Publisher, IssueDate, StoreDate, actissueid, AlternateSearch, UseFuzzy, ComicVersion, SARC=SARC, IssueArcID=IssueArcID, mode=mode, rsscheck=rsscheck, ComicID=ComicID, filesafe=Comicname_filesafe, allow_packs=allow_packs, oneoff=oneoff, manual=manual, torrentid_32p=TorrentID_32p)
File "/Volumes/Samsung840EVO/Users/Myapps/Mylar/mylar/mylar/search.py", line 336, in search_init
findit = NZB_SEARCH(ComicName, IssueNumber, ComicYear, SeriesYear, Publisher, IssueDate, StoreDate, searchprov, send_prov_count, IssDateFix, IssueID, UseFuzzy, newznab_host, ComicVersion=ComicVersion, SARC=SARC, IssueArcID=IssueArcID, RSS="no", ComicID=ComicID, issuetitle=issuetitle, unaltered_ComicName=unaltered_ComicName, allow_packs=allow_packs, oneoff=oneoff, cmloopit=cmloopit, manual=manual, torznab_host=torznab_host, torrentid_32p=torrentid_32p)
File "/Volumes/Samsung840EVO/Users/Myapps/Mylar/mylar/mylar/search.py", line 759, in NZB_SEARCH
bb = findcomicfeed.Startit(u_ComicName, isssearch, comyear, ComicVersion, IssDateFix)
File "/Volumes/Samsung840EVO/Users/Myapps/Mylar/mylar/mylar/findcomicfeed.py", line 27, in Startit
if all([tehstart == 0, searchName[tehend] == ' ']) or all([tehstart != 0, searchName[tehstart-1] == ' ', searchName[tehend] == ' ']):
IndexError: string index out of range

Finally got one of those "index out of range" errors again. Not sure what to make of this.
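One plausible reading of the traceback (not confirmed in this thread) is that `tehend` can equal the length of `searchName`, for instance when the matched word is the last thing in the string, and `searchName[tehend]` then raises. A bounds-checked version of that kind of whole-word test might look like the Python 3 sketch below; `is_whole_word` is a hypothetical name, and treating the end of the string as a word boundary is an assumption about the intended behaviour.

```python
def is_whole_word(text, start, end):
    """Check that text[start:end] is delimited by spaces or the string edges.

    Hypothetical helper: indexes are only read after confirming they are
    inside the string, so a match at the very end cannot raise IndexError.
    """
    before_ok = start == 0 or text[start - 1] == " "
    after_ok = end >= len(text) or text[end] == " "
    return before_ok and after_ok


name = "Batman Sins of the Father"
start = name.find("Father")
end = start + len("Father")  # equals len(name), so name[end] would raise
```

Here `name[end]` is out of range, while `is_whole_word(name, start, end)` simply reports a whole-word match.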

@evilhero evilhero closed this as completed Jan 1, 2021