Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"SubripDecoder" class throws an Exception when an .srt file omits HOURS from its timestamps (ex: 00:00,000) #7122

Closed
warren-bank opened this issue Mar 22, 2020 · 4 comments
Assignees
Labels

Comments

@warren-bank
Copy link

warren-bank commented Mar 22, 2020

  • in SubripDecoder
    • correctly: the regex SUBRIP_TIMECODE makes this field optional
    • incorrectly: the function parseTimecode blindly assumes this field contains a non-null String value that can be parsed to a Long

minimal code to reproduces the problem:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
  public static void main(String[] args) {
    run_test("00:00,000 --> 00:01,000");
  }

  private static final String SUBRIP_TIMECODE = "(?:(\\d+):)?(\\d+):(\\d+),(\\d+)";
  private static final Pattern SUBRIP_TIMING_LINE = Pattern.compile("\\s*(" + SUBRIP_TIMECODE + ")\\s*-->\\s*(" + SUBRIP_TIMECODE + ")\\s*");

  private static void run_test(String currentLine) {
    Matcher matcher = SUBRIP_TIMING_LINE.matcher(currentLine);
    if (matcher.matches()) {
      long start = parseTimecode(matcher, /* groupOffset= */ 1);
      long end   = parseTimecode(matcher, /* groupOffset= */ 6);

      System.out.println("start: " + start);
      System.out.println("end:   " + end);
    }
    else {
      System.out.println("no match");
    }
  }

  private static long parseTimecode(Matcher matcher, int groupOffset) {
    long timestampMs = Long.parseLong(matcher.group(groupOffset + 1)) * 60 * 60 * 1000;
    timestampMs += Long.parseLong(matcher.group(groupOffset + 2)) * 60 * 1000;
    timestampMs += Long.parseLong(matcher.group(groupOffset + 3)) * 1000;
    timestampMs += Long.parseLong(matcher.group(groupOffset + 4));
    return timestampMs * 1000;
  }
}

output (stderr):

java.lang.NumberFormatException: null
  at java.base/java.lang.Long.parseLong

fixed:

  private static long parseTimecode(Matcher matcher, int groupOffset) {
    long timestampMs = 0;
    String groupVal;

    // HOURS field is optional
    groupVal = matcher.group(groupOffset + 1);
    if (groupVal != null)
      timestampMs += Long.parseLong(groupVal) * 60 * 60 * 1000;

    // MINUTES field is required
    groupVal = matcher.group(groupOffset + 2);
    if (groupVal != null)
      timestampMs += Long.parseLong(groupVal) * 60 * 1000;

    // SECONDS field is required
    groupVal = matcher.group(groupOffset + 3);
    if (groupVal != null)
      timestampMs += Long.parseLong(groupVal) * 1000;

    // MILLISECONDS field is required
    groupVal = matcher.group(groupOffset + 4);
    if (groupVal != null)
      timestampMs += Long.parseLong(groupVal);

    // convert timecode from MILLISECONDS to MICROSECONDS
    return timestampMs * 1000;
  }

output (stdout):

start: 0
end:   1000000
warren-bank added a commit to warren-bank/Android-ExoPlayer-AirPlay-Receiver that referenced this issue Mar 22, 2020
workaround for a problem with the ExoPlayer .srt file parser:
  google/ExoPlayer#7122
@icbaker icbaker self-assigned this Mar 23, 2020
@icbaker
Copy link
Collaborator

icbaker commented Mar 23, 2020

Thanks for the report!

Do you have a reference to the SRT spec that states the hours are optional, but the other time parameters are all required?

As far as I can tell it's not really formally specified anywhere, but it does seem like a bug that we have an optional group in the regex that is then assumed to be non-null. Since it's kind of a loose format, I'll make the hours properly optional.

@warren-bank
Copy link
Author

I agree; I couldn't find any formal spec anywhere either.

Since it is loosely specified, .srt files in the wild appear to feel free to ignore unwanted fields. For example, here is a similar discussion about whether or not the MILLISECONDS field could be made optional.. and that project concluded: "why not?".

I would propose that while you're touching this decoder:

  • optional fields: hours, milliseconds
  • required fields: minutes, seconds
  • updated regex:
      private static final String SUBRIP_TIMECODE = "(?:(\\d+):)?(\\d+):(\\d+)(?:,(\\d+))?";

@warren-bank
Copy link
Author

warren-bank commented Mar 24, 2020

full example:

// https://repl.it/@WarrenBank/ExoPlayer-SubripDecoder

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
  public static void main(String[] args) {
    run_test("00:00:00,000 --> 00:00:01,000");
    run_test("   00:00,000 -->    00:01,000");
    run_test("   00:00     -->    00:01    ");
  }

  private static final String SUBRIP_TIMECODE = "(?:(\\d+):)?(\\d+):(\\d+)(?:,(\\d+))?";
  private static final Pattern SUBRIP_TIMING_LINE = Pattern.compile("\\s*(" + SUBRIP_TIMECODE + ")\\s*-->\\s*(" + SUBRIP_TIMECODE + ")\\s*");

  private static void run_test(String currentLine) {
    Matcher matcher = SUBRIP_TIMING_LINE.matcher(currentLine);
    if (matcher.matches()) {
      long start = parseTimecode(matcher, /* groupOffset= */ 1);
      long end   = parseTimecode(matcher, /* groupOffset= */ 6);

      System.out.println("start: " + start);
      System.out.println("end:   " + end);
    }
    else {
      System.out.println("no match");
    }
  }

  private static long parseTimecode(Matcher matcher, int groupOffset) {
    long timestampMs = 0;
    String groupVal;

    // HOURS field is optional
    groupVal = matcher.group(groupOffset + 1);
    if (groupVal != null)
      timestampMs += Long.parseLong(groupVal) * 60 * 60 * 1000;

    // MINUTES field is required
    groupVal = matcher.group(groupOffset + 2);
    if (groupVal != null)
      timestampMs += Long.parseLong(groupVal) * 60 * 1000;

    // SECONDS field is required
    groupVal = matcher.group(groupOffset + 3);
    if (groupVal != null)
      timestampMs += Long.parseLong(groupVal) * 1000;

    // MILLISECONDS field is optional
    groupVal = matcher.group(groupOffset + 4);
    if (groupVal != null)
      timestampMs += Long.parseLong(groupVal);

    // convert timecode from MILLISECONDS to MICROSECONDS
    return timestampMs * 1000;
  }
}

output (stdout):

start: 0
end:   1000000
start: 0
end:   1000000
start: 0
end:   1000000

ojw28 pushed a commit that referenced this issue Mar 25, 2020
Add a test for this case, and extend the existing tests to ensure the
hour is parsed when it's present.

issue:#7122
PiperOrigin-RevId: 302472213
@icbaker
Copy link
Collaborator

icbaker commented Mar 26, 2020

Making milliseconds optional makes sense - I'll send a follow-up change.

ojw28 pushed a commit that referenced this issue Mar 27, 2020
@ojw28 ojw28 closed this as completed Mar 27, 2020
ojw28 pushed a commit that referenced this issue Mar 30, 2020
Add a test for this case, and extend the existing tests to ensure the
hour is parsed when it's present.

issue:#7122
PiperOrigin-RevId: 302472213
ojw28 pushed a commit that referenced this issue Mar 30, 2020
@google google locked and limited conversation to collaborators May 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants