This repository has been archived by the owner on Mar 18, 2023. It is now read-only.
I have an srt file which begins like this:
1
00:00:03,003 --> 00:00:06,131
<i>at the beginning of media
number one, volume one.</i>
Having the <i> tag sometimes (but not always!) makes DeepL fail to translate this whole "portion" (the first 4500 characters!) of the srt file. Sometimes it translates part of the text but not all of it. Other times nothing gets translated at all when there are HTML tags in the input text.
Going to the DeepL web site and copy-pasting the text manually gives the same unpredictable results. Pressing Ctrl+F5 on the site sometimes changes the translation, but it is never perfect when there are HTML tags in the input text! The results are especially erratic when more than one subtitle has text between <i> and </i> tags.
After spending several hours on it, I could only solve the issue by removing all HTML tags from the input srt file first. I never liked italic, bold or even colored subtitles anyway.
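For anyone who wants to try the same workaround on a file before translating it, here is a minimal sketch of stripping the tags from raw srt text. The helper name `strip_srt_tags` and the `TAG_RE` pattern are only illustrative and are not part of the project. The pattern is an assumption too: it only matches things that start like a tag (`<i>`, `</i>`, `<font color=...>`) and never crosses a line break, so the `-->` in the timestamp lines is left alone.

```python
import re

# Hypothetical pattern: "<", optional "/", a letter, then anything up to ">"
# on the same line. Chosen so that "a < b" or the "-->" in timestamps survive.
TAG_RE = re.compile(r"</?[A-Za-z][^>\n]*>")


def strip_srt_tags(srt_text: str) -> str:
    """Return the srt text with HTML-like tags removed, everything else untouched."""
    return TAG_RE.sub("", srt_text)


sample = (
    "1\n"
    "00:00:03,003 --> 00:00:06,131\n"
    "<i>at the beginning of media\n"
    "number one, volume one.</i>\n"
)
print(strip_srt_tags(sample))
```

The cleaned text can then be written back to a new file and fed to the translator as usual.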
I also created a patch which removes all HTML tags from the input on the fly during processing. I may create a PR for this change later, but not today. Until then, here is the fix. I changed the beginning of the srt_parser.py file to look like this:
import srt
import logging
import re

# Non-greedy match for anything that looks like an HTML tag, e.g. <i> or </i>
CLEANR = re.compile('<.*?>')


def open_srt(file_path):
    logging.info(f"Reading {file_path}")
    with open(file_path, "r", encoding="utf-8", errors="ignore") as srt_file:
        srt_file = srt.parse(srt_file)
        subs = list(srt_file)
        subs = list(srt.sort_and_reindex(subs))
        for sub in subs:
            # Strip the HTML tags before the content is handed on for translation
            sub.content = srt.make_legal_content(CLEANR.sub('', sub.content))
            # Join multi-line subtitles into a single line
            sub.content = sub.content.strip().replace("\n", " ")
    return subs
Please note how the CLEANR regular expression is now used inside the for loop (line 16 of my modified file). The rest of the file is unchanged.
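To see what the loop does to a single subtitle, here is a small standalone sketch that applies the same two steps to the content of the first subtitle from my file. The `clean_content` helper is only for illustration, and it leaves out `srt.make_legal_content` so that it runs without the srt package installed.

```python
import re

# Same pattern as in the patch above
CLEANR = re.compile('<.*?>')


def clean_content(content):
    """Illustrative helper: strip tags, then join the lines with spaces."""
    without_tags = CLEANR.sub('', content)
    return without_tags.strip().replace("\n", " ")


sample = "<i>at the beginning of media\nnumber one, volume one.</i>"
print(clean_content(sample))
# prints: at the beginning of media number one, volume one.
```

Keep in mind that `.` does not match a newline here, so a tag that is itself broken across two lines would not be removed. I have not seen that in my files.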