Language in movie title #28

robmcmullen · 2013-03-26T12:20:56Z

A movie like "The Italian Job" or "The Spanish Prisoner" returns a guess where the word is identified as the movie's language rather than part of the title:

$ guessit The_Italian_Job.mkv
GuessIt found: {
    [1.00] "type": "movie", 
    [1.00] "container": "mkv", 
    [0.30] "language": [
        "Italian"
    ], 
    [0.60] "title": "The"
}

Seems like a tough problem to solve. Any way around this, maybe a way to craft the filename so this doesn't happen?

Diaoul · 2013-03-26T13:01:53Z

I think the language position should be after or before a potential title. I'm confident that @wackou will find a way to fix this :)

wackou · 2013-03-26T20:49:49Z

Indeed, this is a tricky one... We can't rule out "italian" appearing as a language, so one solution would be to have a hardcoded list of movie titles that take precedence over language detection, but that's obviously not the best solution...

Another idea: when a language written out in English (i.e. not a language code such as "en" or "fr") is surrounded by spaces or underscores, it should be treated as part of the title, but if it is surrounded by [] or (), it is probably an audio or subtitle language. I can see how this could fail, too, though.
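To make the delimiter idea concrete, here is a minimal sketch. It is not guessit code: `LANGUAGE_WORDS` and `classify_language_words` are hypothetical names invented for this illustration, and the vocabulary is deliberately tiny.

```python
import re

# Hypothetical, tiny vocabulary for illustration; not guessit's language list.
LANGUAGE_WORDS = {"italian", "spanish", "french", "english"}


def classify_language_words(filename):
    """Return (word, role) pairs for each language word found in filename.

    role is "language" when the word is wrapped directly in [] or (),
    and "title" otherwise (e.g. delimited by spaces, underscores or dots).
    """
    results = []
    for match in re.finditer(r"[A-Za-z]+", filename):
        word = match.group(0)
        if word.lower() not in LANGUAGE_WORDS:
            continue
        # Look at the characters immediately around the word.
        before = filename[match.start() - 1] if match.start() > 0 else ""
        after = filename[match.end()] if match.end() < len(filename) else ""
        bracketed = (before, after) in {("[", "]"), ("(", ")")}
        results.append((word, "language" if bracketed else "title"))
    return results


print(classify_language_words("The_Italian_Job.mkv"))
print(classify_language_words("The_Job_[Italian].mkv"))
```

As noted, this can fail in both directions: a release named `Movie.Italian.mkv` would be read as a title word, and a bracketed title fragment would be read as a language.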

Any better idea?

wackou · 2013-03-26T20:59:44Z

@Diaoul the way it works at the moment is that the title is always guessed last, as that's the thing that we don't know anything about. So the language detection always comes first, and at this moment, we have no information yet about a potential title so we can't reason using that...

Hmmm, actually, writing this just gave me an idea! If we run guessit once normally and then a second time with language detection disabled, and compare the titles, we should get the same title most of the time. In your case, though, the title from the first run would be cut by the language, so we would know something went wrong and could return the title from the "no language" run instead. That should work! I'll try to see if I can hack something up.
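A rough sketch of the two-pass idea, using a toy guesser rather than guessit's real internals. Everything here (`toy_guess`, `two_pass_guess`, `LANGUAGE_WORDS`, and the `detect_language` flag) is hypothetical. The "cut" test is also an assumption: the title counts as cut only when the no-language title continues past the language word, so that a trailing language such as `Some_Movie_French` is still reported as a language.

```python
import re

# Hypothetical vocabulary, for illustration only.
LANGUAGE_WORDS = {"italian", "spanish", "french"}


def toy_guess(filename, detect_language=True):
    """Toy guesser: the title is the leading run of unclaimed words."""
    stem = filename.rsplit(".", 1)[0]  # drop the container extension
    words = [w for w in re.split(r"[ _.]+", stem) if w]
    languages, title_words = [], []
    title_open = True
    for word in words:
        if detect_language and word.lower() in LANGUAGE_WORDS:
            languages.append(word)
            title_open = False  # a recognized property ends the title
        elif title_open:
            title_words.append(word)
    guess = {"title": " ".join(title_words)}
    if languages:
        guess["language"] = languages
    return guess


def two_pass_guess(filename):
    """Run the guesser with and without language detection and compare titles."""
    with_lang = toy_guess(filename, detect_language=True)
    without_lang = toy_guess(filename, detect_language=False)
    short = with_lang["title"].split()
    full = without_lang["title"].split()
    # Assumed criterion: the title was cut if words remain after the
    # language word that ended the first-pass title.
    cut = len(full) > len(short) + 1
    return without_lang if cut else with_lang


print(toy_guess("The_Italian_Job.mkv"))
print(two_pass_guess("The_Italian_Job.mkv"))
print(two_pass_guess("Some_Movie_French.mkv"))
```

The first call reproduces the bad guess from the report (title "The", language "Italian"); the two-pass version falls back to the full title only when the language word sits in the middle of it.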

Diaoul · 2013-03-26T21:32:45Z

Inception 😉

robmcmullen · 2013-03-26T21:34:45Z

For me, just being able to turn off language detection would work since I don't have multi-lingual stuff. I couldn't see how to do that immediately, but I didn't get too far looking...

Delimiting the language string would be more general, but perhaps the delimiter characters could be an input to guessit so they could be customized. And maybe the default would be as it is currently so existing users wouldn't break.

The two-pass solution sounds better than anything I was able to think of. It would probably handle other corner cases as well (e.g. OSS_117--Cairo,_Nest_of_Spies.mkv), so some of the hardcoded entries in lng_common_words in language.py could be removed.

Diaoul · 2013-03-26T22:07:14Z

Only do the two-pass run when the group isn't explicit. By explicit I mean surrounded by [] or () or anything of that kind.

wackou · 2013-03-30T17:51:10Z

I just pushed a solution that I'm quite happy with :-) Let me know if it works for you.

robmcmullen · 2013-03-30T20:53:31Z

Nice! It works for all my test cases. Thanks.

wackou · 2013-03-31T10:33:01Z

Awesome! If you have other test cases that fail, don't hesitate to report them. I'll try to tag a release soon, so let's get in as much as possible!

wackou closed this as completed Mar 31, 2013

heroclix mentioned this issue Apr 30, 2013

Language code (abbreviation) in movie Title ( follow up on issue #28) #32

Closed
