Write TerminationBytes for id3.org spec compliance #33

ericmurmur · 2018-04-16T14:04:55Z

According to the spec http://id3.org/id3v2.4.0-structure,
TerminationBytes has to be in the text frame.
Some ID3 reader like Windows file explorer media file properties cannot handle such unicode text frames without proper TerminationBytes.

n10v · 2018-04-16T16:09:14Z

Hi! Thank you for your contribution. Can you please quote a text, where it's written? I can't find it.
The only information I found is here:

     <Header for 'Text information frame', ID: "T000" - "TZZZ",
     excluding "TXXX" described in 4.2.6.>
     Text encoding                $xx
     Information                  <text string(s) according to encoding>

And there is nothing written about termination bytes.

n10v · 2018-04-16T16:12:11Z

And also this (http://id3.org/id3v2.4.0-frames):

All text information frames supports multiple strings, 
stored as a null separated list, where null is 
reperesented by the termination code for the character encoding.

ericmurmur · 2018-04-16T17:17:30Z

There are some argues from google search about this.
However,

1. from 4 in http://id3.org/id3v2.4.0-structure, it states text string in frame with encoding byte prefixed and terminated with its corresponding termination bytes. It also show an empty string example.
1. From 4.2 in http://id3.org/id3v2.4.0-frames, it says multiple strings are stored as null separated list where null is represented by the termination code. It does not clearly state the status of the final termination bytes. ( so shall follow 1.?)
1. iTunes generate tags with termination bytes.
1. Windows File Explorer interprets wrong information if no such termination bytes in Unicode strings
1. A well known free mp3tag utility https://www.mp3tag.de/en/ since year 2003 generates tag with termination bytes

4.   ID3v2 frame overview
... ...

Frames that allow different types of text encoding contains a text
encoding description byte. Possible encodings:

$00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
$01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
strings in the same frame SHALL have the same byteorder.
Terminated with $00 00.
$02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
Terminated with $00 00.
$03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

Strings dependent on encoding are represented in frame descriptions
as <text string according to encoding>, or <full text string
according to encoding> if newlines are allowed. Any empty strings of
type $01 which are NULL-terminated may have the Unicode BOM followed
by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).

From 4.2 in http://id3.org/id3v2.4.0-frames

All text information frames supports multiple strings, 
stored as a null separated list, where null is 
represented by **the termination code** for the character encoding.

n10v · 2018-04-16T18:57:27Z

"it states text string in frame with encoding byte prefixed and terminated with its corresponding termination bytes"

It doesn't state it actually. It just shows the termination bytes for corresponding encodings, but nothing says that every encoded text should be terminated.

"It also show an empty string example"

It only shows an example for Unicode-16 with BOM and only which are NULL-terminated:

Any empty strings of type $01 which are NULL-terminated 
may have the Unicode BOM followed by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).

From 4.2 in http://id3.org/id3v2.4.0-frames,
it says multiple strings are stored as null separated list where null 
is represented by the termination code. 
It does not clearly state the status of the final termination bytes.

Yes, there is nothing said about using one string in text frame, so if you use only one string in text frame with no termination, you don't violate the standard.

4 - I would like to add the writing the termination bytes for text frames, just for compatibility with Windows File Explorer. But please open the new issue for it with detailed description.

3, 5 - We should not rely on how other programs work, but on standards.

n10v · 2018-04-16T19:00:29Z

It's really strange situation about the termination bytes of text frames, but just take a look on other frames like USLT or APIC at http://id3.org/id3v2.4.0-frames. You can see the explicit indication about the termination bytes at the end:

... Description is a short description of the picture, represented as a terminated text string ...

     <Header for 'Attached picture', ID: "APIC">
     Text encoding      $xx
     MIME type          <text string> $00
     Picture type       $xx
     Description        <text string according to encoding> $00 (00) <----
     Picture data       <binary data>

... The 'Content descriptor' is a terminated string ...

     <Header for 'Unsynchronised lyrics/text transcription', ID: "USLT">
     Text encoding        $xx
     Language             $xx xx xx
     Content descriptor   <text string according to encoding> $00 (00) <----
     Lyrics/text          <full text string according to encoding>

In docs about text frames I don't see it:

     <Header for 'Text information frame', ID: "T000" - "TZZZ",
     excluding "TXXX" described in 4.2.6.>
     Text encoding                $xx
     Information                  <text string(s) according to encoding> <----

So I don't think WFE works according to standards.

ericmurmur · 2018-04-17T03:51:46Z

It's really not clear in spec. Hence compatibility will be good.

In addition, in sec 4, frame overview from http://id3.org/id3v2.4.0-structure, it lists

Terminated with $00

or

Terminated with $00 00

for possible encodings of text.

Doesn't it mean to prefix encoding byte and terminate the text with termination bytes?

Frames that allow different types of text encoding contains a text encoding description byte. Possible encodings:
$00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
$01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
strings in the same frame SHALL have the same byteorder.
Terminated with $00 00.
$02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
Terminated with $00 00.
$03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

n10v · 2018-04-17T06:03:27Z

Doesn't it mean to prefix encoding byte and terminate the text with termination bytes?
I think, it doesn't mean so. It just means that if it should be terminated, it terminated with corresponding termination bytes.

n10v · 2018-04-17T06:04:37Z

@ericmurmur Please open a new issue with detailed description that WFE doesn't correctly read text frames, which are set with id3v2 library.

ericmurmur · 2018-04-17T08:03:42Z

It just means that if it should be terminated, it terminated with corresponding termination bytes.

Yes. So it means termination bytes shall be there for text string, doesn't it? And since Golang does not include termination bytes for string like c/c++, it shall be added, isn't it? In c/c++ library, termination bytes are already in their string buffer.

n10v · 2018-04-17T08:22:41Z

Yes. So it means termination bytes shall be there for text string, doesn't it? 
And since Golang does not include termination bytes for string like c/c++, 
it shall be added, isn't it? In c/c++ library, termination bytes are already in their string buffer.

I don't think so. There is no explicit information in standard that I must terminate strings in every text frame.

ericmurmur · 2018-04-17T11:37:12Z

Is the encoding byte added? Yes, right? In the standard, it lists all 4 encoding bytes. In every case, it has stated [Terminated with $00] or [Terminated with $00 00]. It is NOT optional.
So at least, it shall be interpreted as for those started with encoding byte, it shall be terminated with $00 or $00 00. Or where do you suggest to implement the sentence [Terminated with $00] or [Terminated with $00 00]?
In addition, according to 4.2 in http://id3.org/id3v2.4.0-frames,

Information <text string(s) according to encoding>

Doesn't it mean to follow the encoding descriptions in http://id3.org/id3v2.4.0-structure, which has stated [[Terminated with $00] or [Terminated with $00 00] in all encoding cases?

n10v · 2020-02-12T20:06:16Z

Hi @ericmurmur! I'm very sorry for so late answer, I had a lot of to do.

I think, you were right in this discussion. Many sites and apps terminate text frames with $00 (00), although it's not clearly stated in spec. If you'll update tests, I would like to merge this PR :)

Just replace following tests:
parse_test.go

// https://github.com/bogem/id3v2/issues/13.
// https://github.com/bogem/id3v2/commit/3845103da5b1698289b82a90f5d2559b770bd996
func TestParseV3UnsafeSize(t *testing.T) {
	t.Parallel()

	buf := new(bytes.Buffer)
	title := strings.Repeat("A", 254)

	tag := NewEmptyTag()
	tag.SetVersion(3)
	tag.SetTitle(title)
	if _, err := tag.WriteTo(buf); err != nil {
		t.Fatalf("Error while writing tag: %v", err)
	}

	titleFrameHeader := buf.Bytes()[tagHeaderSize : tagHeaderSize+frameHeaderSize]
	expected := []byte{0, 0, 1, 0}
	got := titleFrameHeader[4:8]
	if !bytes.Equal(got, expected) {
		t.Fatalf("Expected %v, got %v", expected, got)
	}

	parsedTag, err := ParseReader(buf, Options{Parse: true})
	if err != nil {
		t.Fatalf("Error while parsing tag: %v", err)
	}
	if parsedTag.Title() != title {
		t.Fatalf("Titles are not equal: len(parsedTag.Title()) == %v, len(title) == %v", len(parsedTag.Title()), len(title))
	}
}

tag_test.go

	framesSize    = 211948

n10v · 2020-02-12T20:07:01Z

And also please push new test.mp3 :)

n10v · 2020-03-03T06:55:57Z

Closing in favour of #52

Write TerminationBytes for id3.org spec compliance

256eb2a

n10v mentioned this pull request Feb 12, 2020

null byte in tags #41

Closed

n10v mentioned this pull request Mar 3, 2020

Write termination bytes for text frames #52

Merged

n10v closed this Mar 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Write TerminationBytes for id3.org spec compliance #33

Write TerminationBytes for id3.org spec compliance #33

ericmurmur commented Apr 16, 2018

n10v commented Apr 16, 2018

n10v commented Apr 16, 2018

ericmurmur commented Apr 16, 2018

n10v commented Apr 16, 2018 •

edited

n10v commented Apr 16, 2018 •

edited

ericmurmur commented Apr 17, 2018

n10v commented Apr 17, 2018 •

edited

n10v commented Apr 17, 2018

ericmurmur commented Apr 17, 2018 •

edited

n10v commented Apr 17, 2018 •

edited

ericmurmur commented Apr 17, 2018 •

edited

n10v commented Feb 12, 2020 •

edited

n10v commented Feb 12, 2020 •

edited

n10v commented Mar 3, 2020 •

edited

Write TerminationBytes for id3.org spec compliance #33

Write TerminationBytes for id3.org spec compliance #33

Conversation

ericmurmur commented Apr 16, 2018

n10v commented Apr 16, 2018

n10v commented Apr 16, 2018

ericmurmur commented Apr 16, 2018

n10v commented Apr 16, 2018 • edited

n10v commented Apr 16, 2018 • edited

ericmurmur commented Apr 17, 2018

n10v commented Apr 17, 2018 • edited

n10v commented Apr 17, 2018

ericmurmur commented Apr 17, 2018 • edited

n10v commented Apr 17, 2018 • edited

ericmurmur commented Apr 17, 2018 • edited

n10v commented Feb 12, 2020 • edited

n10v commented Feb 12, 2020 • edited

n10v commented Mar 3, 2020 • edited

n10v commented Apr 16, 2018 •

edited

n10v commented Apr 16, 2018 •

edited

n10v commented Apr 17, 2018 •

edited

ericmurmur commented Apr 17, 2018 •

edited

n10v commented Apr 17, 2018 •

edited

ericmurmur commented Apr 17, 2018 •

edited

n10v commented Feb 12, 2020 •

edited

n10v commented Feb 12, 2020 •

edited

n10v commented Mar 3, 2020 •

edited