0.9.x recording date fix #623

Open
wants to merge 2 commits into
base: 0.9.x


@TJ-59 TJ-59 commented Mar 10, 2024

Reason: This bug affects recording_date in ID3_V2_3, which many people still use for compatibility with existing multimedia hardware and software, whereas Windows 10/11 and the eyed3 lib tend to default to ID3_V2_4.
Changing the default version the lib uses is easy, but problems may arise because some date tags are not handled correctly by the lib, or at least not the way people expect after years of compatibility and usage of "X---" (experimental) tags.

Bug description :

import eyed3
filetag = eyed3.load(r"C:\somefolder\SomeNotSoRecentmp3.mp3")
filetag.tag.version
#(2, 3, 0)
filetag.tag.recording_date = "2017-05-06"
#Invalid date text: 0605
filetag.tag.recording_date
#Invalid v2.3 TYER, TDAT, or TIME frame: Invalid date string: 2017--
#<eyed3.core.Date object at 0x000001F4560E8490>
filetag.tag.recording_date = "2017-11-06"
#Invalid date text: 0611
filetag.tag.recording_date = "2017-05-16"
#No error
filetag.tag.recording_date = "2017-05-16T07:08"
#Invalid date text: 0708
filetag.tag.recording_date = "2017-05-16T17:08"
#No error
filetag.tag.recording_date = "2017-05-16T07:18"
#Invalid date text: 0718

Line 8 of the snippet, "Invalid date string: 2017--": the dashes are supposed to separate year-month-day, but the month and day are missing.
Lines 10 and 12 show that the problem occurs in the MM-DD part when the day starts with a zero (as we'll see, it is stored as DDMM).
Lines 14, 16 and 18 show that the same problem affects the time part, stored as HHMM, when the hour starts with a zero.

When setting the recording date, the actual date is stored differently depending on the available frames, that is, depending on the ID3 tag version.
The latest, ID3 v2.4, uses the 'TDRC' frame, which holds the whole timestamp (as in 'YYYY-MM-DDTHH:MM:SS'), whereas ID3 v2.3 holds the values in separate frames:
'TYER' for the recording year,
'TDAT' for the day and month of recording,
and 'TIME' for the hours and minutes of recording.
Each of these 3 frames holds only a 4-character value, unlike 'TDRC'.
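To make the three-frame layout concrete, here is a minimal standalone sketch. The helper names split_v23 and join_v23 are hypothetical and are not part of eyeD3; they only mimic the DDMM / HHMM packing described above, using plain strings instead of real frames.

```python
def split_v23(year, month=None, day=None, hour=None, minute=None):
    """Return the text each v2.3 frame would carry for a recording date."""
    frames = {"TYER": f"{year:04d}"}
    if month is not None and day is not None:
        frames["TDAT"] = f"{day:02d}{month:02d}"    # DDMM
    if hour is not None and minute is not None:
        frames["TIME"] = f"{hour:02d}{minute:02d}"  # HHMM
    return frames


def join_v23(frames):
    """Rebuild an ISO-like string from the three frame texts."""
    text = frames.get("TYER", "")
    if "TDAT" in frames:
        ddmm = frames["TDAT"]
        text += f"-{ddmm[2:]}-{ddmm[:2]}"   # month first, then day
    if "TIME" in frames:
        hhmm = frames["TIME"]
        text += f"T{hhmm[:2]}:{hhmm[2:]}"
    return text


frames = split_v23(2017, 5, 6, 7, 8)
print(frames)            # {'TYER': '2017', 'TDAT': '0605', 'TIME': '0708'}
print(join_v23(frames))  # 2017-05-06T07:08
```

Note that both 'TDAT' and 'TIME' come out as bare 4-digit strings, which is exactly what makes them look like a year later on.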
This information matters to people using the ID3 tag system for archiving purposes, and to anybody who prefers their metadata to be accurate.
A workaround was given in issue #517 by user iarp, mainly to avoid the errors,
but with the limitation that recording_date is stored incomplete, either by dropping the 'TDAT'/'TIME' part(s) or by using the year value as a placeholder (which "works" because the year does NOT start with a zero).

Why does this happen ? (Explanation) :

Three files are involved in understanding what happens:

.\eyed3\id3\tag.py
.\eyed3\id3\frames.py
.\eyed3\core.py

I) eyed3\id3\tag.py

class Tag(core.Tag):
    (...)
    def _getRecordingDate(self):
        if self.version == ID3_V2_3:
            return self._getV23RecordingDate()
        else:
            return self._getDate(b"TDRC")

    def _setRecordingDate(self, date):
        if date in (None, ""):
            for fid in (b"TDRC", b"TYER", b"TDAT", b"TIME"):
                self._setDate(fid, None)
        elif self.version == ID3_V2_4:
            self._setDate(b"TDRC", date)
        else:
            if not isinstance(date, core.Date):
                date = core.Date.parse(date)
            self._setDate(b"TYER", str(date.year))
            if None not in (date.month, date.day):
                date_str = "%s%s" % (str(date.day).rjust(2, "0"),
                                     str(date.month).rjust(2, "0"))
                self._setDate(b"TDAT", date_str)
            if None not in (date.hour, date.minute):
                date_str = "%s%s" % (str(date.hour).rjust(2, "0"),
                                     str(date.minute).rjust(2, "0"))
                self._setDate(b"TIME", date_str)

    recording_date = property(_getRecordingDate, _setRecordingDate)
    """The date of the recording. Many applications use this for release date
    regardless of the fact that this value is rarely known, and release dates
    are more correct."""

    def _getV23RecordingDate(self):
        # v2.3 TYER (yyyy), TDAT (DDMM), TIME (HHmm)
        date = None
        try:
            date_str = b""
            if b"TYER" in self.frame_set:
                date_str = self.frame_set[b"TYER"][0].text.encode("latin1")
                date = core.Date.parse(date_str)
            if b"TDAT" in self.frame_set:
                text = self.frame_set[b"TDAT"][0].text.encode("latin1")
                date_str += b"-%s-%s" % (text[2:], text[:2])
                date = core.Date.parse(date_str)
            if b"TIME" in self.frame_set:
                text = self.frame_set[b"TIME"][0].text.encode("latin1")
                date_str += b"T%s:%s" % (text[:2], text[2:])
                date = core.Date.parse(date_str)
        except ValueError as ex:
            log.warning("Invalid v2.3 TYER, TDAT, or TIME frame: %s" % ex)

        return date

    (...)
    def _setDate(self, fid, date):
        def removeFrame(frame_id):
            (...)
        def setFrame(frame_id, date_val):
            (...)
        # Special casing the conversion to DATE objects cuz TDAT and TIME won't
        if fid not in (b"TDAT", b"TIME"):
            # Convert to ISO format which is what FrameSet wants.
            date_type = type(date)
            if date_type is int:
                # The integer year
                date = core.Date(date)
            elif date_type is str:
                date = core.Date.parse(date)
            elif not isinstance(date, core.Date):
                raise TypeError(f"Invalid type: {date_type}")

        if fid == b"TORY":
            setFrame(fid, date.year)
            if date.month:
                setFrame(b"XDOR", date)
            else:
                removeFrame(b"XDOR")
        else:
            setFrame(fid, date)

First, we see that recording_date is a property, using _getRecordingDate and _setRecordingDate as getter and setter.
Second, _getRecordingDate() has 2 branches: for v2.3 it uses _getV23RecordingDate(),
and for everything else it calls _getDate(b"TDRC"), which has nothing to do with our problem.
Third, _setRecordingDate() has 3 branches:

  1. Erase the frames' contents when None or an empty string is supplied.
  2. Use _setDate() with 'TDRC' if we're using v2.4.
  3. For every other case, which means v2.3 and lower with a non-empty value, make sure we have a Date object, then split it: the year goes to 'TYER';
    the day then the month, each zero-padded to a length of 2, are formatted into the "%s%s" string and fed to _setDate() for 'TDAT' (for the example ISO date 2017-05-06T07:08:09 this gives "0605"; you might have a hunch of where this is going...); the hours and minutes get the same treatment for 'TIME'.

The last 'else' of _setDate() in the code above is what sends the given date into setFrame() along with the frame ID concerned.
NOTE that 'TDAT' and 'TIME' are treated differently: the date value (in our example, "0605") is not converted to a Date object at that point, but is left as a 4-character string.

Let's look at setFrame's definition :

    def setFrame(frame_id, date_val):
        if frame_id in self.frame_set:
            self.frame_set[frame_id][0].date = date_val
        else:
            self.frame_set[frame_id] = frames.DateFrame(frame_id, str(date_val))

Either way, the string or Date object is assigned to the date property of an existing frame, or a new frame is created if needed.
self.frame_set is a dictionary that is first declared during __init__() and actually created during self.clear(),
as seen below:

class Tag(core.Tag):
    def __init__(self, version=ID3_DEFAULT_VERSION, **kwargs):
        (...)
        self.frame_set = None
        (...)
        self.clear(version=version)
        super().__init__(**kwargs)

    def clear(self, *, version=ID3_DEFAULT_VERSION):
        (...)
        self.frame_set = frames.FrameSet()
        (...)

Depending on whether you start from scratch or load a file, frame_set either already contains the required frame, or the frame is added by creating the specific frame type for that ID from the eyed3\id3\frames module
(here, a DateFrame, which is a subclass of frames.TextFrame).

II) eyed3\id3\frames.py

(...)
DEPRECATED_DATE_FIDS = [b"TDAT", b"TYER", b"TIME", b"TORY", b"TRDA",
                        # Nonstandard v2.3 only
                        b"XDOR",
                       ]
(...)
DATE_FIDS = [b"TDEN", b"TDOR", b"TDRC", b"TDRL", b"TDTG"]

Here we see the accepted date frame IDs, which DateFrame checks in its constructor:

class DateFrame(TextFrame):
    def __init__(self, id, date=""):
        if id not in DATE_FIDS and id not in DEPRECATED_DATE_FIDS:
            raise ValueError(f"Invalid date frame ID: {id}")
        super().__init__(id, text=str(date))
        self.date = self.text
        self.encoding = LATIN1_ENCODING

The "if" statement above used to be an assert not so long ago...
self.date is assigned the value of self.text (inherited from TextFrame),
a property backed by the private _text variable, which holds the parameter provided after the frame ID, aka "str(date_val)" earlier.

    @property
    def date(self):
        return core.Date.parse(self.text.encode("latin1")) if self.text else None

The getter uses core.Date.parse() to build the Date object, or returns None if the text is empty.
But in our case the date is being SET, so the following is used:

    @date.setter
    def date(self, date):
        """Set value with a either an ISO 8601 date string or a eyed3.core.Date object."""
        if not date:
            self.text = ""
            return

        try:
            if type(date) is str:
                date = core.Date.parse(date)
            elif type(date) is int:
                # Date is year
                date = core.Date(date)
            elif not isinstance(date, core.Date):
                raise TypeError("str, int, or eyed3.core.Date type expected")
        except ValueError:
            log.warning(f"Invalid date text: {date}")
            self.text = ""
            return

        self.text = str(date)

From this, we can see that the "getter" ALWAYS parses the string to form the Date object,
and conversely, the "setter" always checks validity by first creating a Date object and then obtaining the string from it,
which means Date objects must define __str__() for the second part, which is saving that string to self.text
(not shown here, but it is a pretty direct get/set on self._text, only decorated with @requireUnicode(1), my guess being "to avoid decoding errors").

If you remember the part about 'TDAT' and 'TIME' being treated differently, you saw that those strings (like "0605") were sent STRAIGHT in there.
Well, too bad, they are parsed into a Date object anyway... Anyone wondering how a string with a leading zero will be handled?

DO NOTE that just after the except block above, and before the "self.text = str(date)" line, is where code could be added to recognize extra date formats, and to make sure the string complies with what that frame expects (for example, length-wise, or to match the available formats).

Now, we need to have a good look at the core part...

III) eyed3\core.py

Something that is important later, the list of timestamp formats :

class Date:
    (...)
    TIME_STAMP_FORMATS = ["%Y",
                          "%Y-%m",
                          "%Y-%m-%d",
                          "%Y-%m-%dT%H",
                          "%Y-%m-%dT%H:%M",
                          "%Y-%m-%dT%H:%M:%S",
                          # The following end with 'Z' signally time is UTC
                          "%Y-%m-%dT%HZ",
                          "%Y-%m-%dT%H:%MZ",
                          "%Y-%m-%dT%H:%M:%SZ",
                          # The following are wrong per the specs, but ...
                          "%Y-%m-%d %H:%M:%S",
                          "%Y-00-00",
                          "%Y%m%d",
                          ]
    """Valid time stamp formats per ISO 8601 and used by `strptime`."""

A brief look shows that this Date class doesn't define its own "__new__()", which might be useful later if we want to check something before actually creating the object.

    def __init__(self, year, month=None, day=None,
                 hour=None, minute=None, second=None):
        # Validate with datetime
        from datetime import datetime
        _ = datetime(year, month if month is not None else 1,
                     day if day is not None else 1,
                     hour if hour is not None else 0,
                     minute if minute is not None else 0,
                     second if second is not None else 0)

        self._year = year
        self._month = month
        self._day = day
        self._hour = hour
        self._minute = minute
        self._second = second

        # Python's date classes do a lot more date validation than does not
        # need to be duplicated here.  Validate it
        _ = Date._validateFormat(str(self))

So, we see that "year" is a mandatory parameter (which is only half a solution, given that ISO 8601 allows other cases, such as omitting the year but keeping the month-day or hour:minute parts, depending on which version is used), and the parameters are stored in private variables, each with its own property accessor.
Before that, the parameters are passed to datetime.datetime just to check that the date is valid (month and day in range, leap years respected). Note that datetime uses the proleptic Gregorian calendar, so the 10 days skipped when switching from the Julian calendar in 1582 are not treated specially. Not that we'll ever find centuries-old MP3 files, but some people might use the date tags on another format, maybe for some scientific classification involving dating things...
After the private variables are set, _validateFormat() is called.
Note that datetime.datetime REQUIRES a year (actually year, month and day are required and the rest is optional; the code above supplies 1 for a missing month or day and 0 for the time fields).

    @staticmethod
    def _validateFormat(s):
        pdate, fmt = None, None
        for fmt in Date.TIME_STAMP_FORMATS:
            try:
                pdate = time.strptime(s, fmt)
                break
            except ValueError:
                # date string did not match format.
                continue

        if pdate is None:
            raise ValueError(f"Invalid date string: {s}")

        assert pdate
        return pdate, fmt

So, we go through each date format in turn to try to obtain "pdate", which is a time.struct_time, breaking out of the loop as soon as a format matches, or catching the ValueError and continuing to the next format.
If none of the formats gives us a pdate, we raise a ValueError; otherwise we return the pdate and the format that validated it.
NOTE that time.strptime works according to the format given (fmt) and as such can perfectly well give you a time.struct_time without you providing a year, as long as the format allows it; in that case, the year defaults to 1900.
If you've followed along, you can see that time.strptime() receives "0605" as its input string, and with the list of timestamp formats being looped through, you might have an idea of which format is going to validate that string:
"%Y", the very first, expects 4 digits, and we just gave it 4 digits.
Even more problematic, this is interpreted as "year 605", and the leading zero is obviously not kept in the 'tm_year' integer value, which leads to more problems down the road: Date.__str__() formats the year with "%d", giving "605", and that 3-character string no longer matches "%Y" when Date.__init__() re-validates it.
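The failing round trip can be reproduced with the standard library alone. This is a minimal sketch, not eyeD3 code: year_roundtrip is a hypothetical helper that parses a string with "%Y", renders the year with "%d" (the formatting Date.__str__() uses), and tries to parse the result again.

```python
import time


def year_roundtrip(text):
    """Parse text with "%Y", format the year with "%d", then parse again.

    Returns (parsed year, rendered string, whether the second parse worked).
    """
    parsed = time.strptime(text, "%Y")
    rendered = "%d" % parsed.tm_year   # leading zeros are lost here
    try:
        time.strptime(rendered, "%Y")
    except ValueError:
        return parsed.tm_year, rendered, False
    return parsed.tm_year, rendered, True


for sample in ("0605", "1605", "0708", "1708"):
    print(sample, year_roundtrip(sample))
```

Under that assumption, "0605" and "0708" fail the second parse while "1605" and "1708" pass, which matches the pattern of errors in the bug description.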

Remember that in frames.py, Date.parse() was called; as you can see below, it ends up running Date.__init__() anyway, using the parameters obtained from the time.struct_time named pdate returned by _validateFormat():

    @staticmethod
    def parse(s):
        """Parses date strings that conform to ISO-8601."""
        if not isinstance(s, str):
            s = s.decode("ascii")
        s = s.strip('\x00')

        pdate, fmt = Date._validateFormat(s)

        # Here is the difference with Python date/datetime objects, some
        # of the members can be None
        kwargs = {}
        if "%m" in fmt:
            kwargs["month"] = pdate.tm_mon
        if "%d" in fmt:
            kwargs["day"] = pdate.tm_mday
        if "%H" in fmt:
            kwargs["hour"] = pdate.tm_hour
        if "%M" in fmt:
            kwargs["minute"] = pdate.tm_min
        if "%S" in fmt:
            kwargs["second"] = pdate.tm_sec

        return Date(pdate.tm_year, **kwargs)

which is somewhat the opposite of the __str__() method :

    def __str__(self):
        """Returns date strings that conform to ISO-8601.
        The returned string will be no larger than 17 characters."""
        s = "%d" % self.year
        if self.month:
            s += "-%s" % str(self.month).rjust(2, '0')
            if self.day:
                s += "-%s" % str(self.day).rjust(2, '0')
                if self.hour is not None:
                    s += "T%s" % str(self.hour).rjust(2, '0')
                    if self.minute is not None:
                        s += ":%s" % str(self.minute).rjust(2, '0')
                        if self.second is not None:
                            s += ":%s" % str(self.second).rjust(2, '0')
        return s

NOTE that parse() always passes the year to Date() on its return line.
The __str__() method also assumes the year is mandatory, but we DO know that some ID3 frames are specifically about the "date" (as in day and month) or the "time" (hour and minute), namely 'TDAT' and 'TIME'.

Reminder, the recording_date is saved using up to 3 frames (excerpt from ID3 docs):

TYER
The 'Year' frame is a numeric string with a year of the recording.
This frames is always four characters long (until the year 10000).

TDAT
The 'Date' frame is a numeric string in the DDMM format containing
the date for the recording. This field is always four characters
long.

TIME
The 'Time' frame is a numeric string in the HHMM format containing
the time for the recording. This field is always four characters
long.

Also, we can see the above code only forms standard ISO 8601 strings, supposedly up to 17 chars, which would mean it is meant to go up to "YYYY-mm-ddTHH:MMZ" at best, Z being "Zulu time" aka UTC.
This is reinforced by the above excerpt of the ID3 docs stating that 'TIME' only holds HHMM in its 4 chars, with no "SS" for seconds.

So, a bunch of small modifications are needed to get it all working: adding new formats, modifying the Date methods so they work with those formats, modifying the id3.tag functions to use those formats when dealing with the special parts ('TDAT' and 'TIME'), and modifying the DateFrame setter to pick only the relevant 4 chars out of the new formats before they are saved to .date -> .text -> _text.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Modifications to fix the bug :

First, we need to agree on 2 new formats to add to TIME_STAMP_FORMATS in the Date class of eyed3\core.py.
We saw earlier that a bare group of 4 digits cannot be used, because the very first format, "%Y", would claim it,
so we need to differentiate somehow. Considering this is for a date ('TDAT') and a time ('TIME'),
I suggest "D%d-%m" and "T%H:%M", which should be self-explanatory.
So let's add them to the end of TIME_STAMP_FORMATS :

class Date:
    (...)
    TIME_STAMP_FORMATS = ["%Y",
                          "%Y-%m",
                          "%Y-%m-%d",
                          "%Y-%m-%dT%H",
                          "%Y-%m-%dT%H:%M",
                          "%Y-%m-%dT%H:%M:%S",
                          # The following end with 'Z' signally time is UTC
                          "%Y-%m-%dT%HZ",
                          "%Y-%m-%dT%H:%MZ",
                          "%Y-%m-%dT%H:%M:%SZ",
                          # The following are wrong per the specs, but ...
                          "%Y-%m-%d %H:%M:%S",
                          "%Y-00-00",
                          "%Y%m%d",
                          "D%d-%m",
                          "T%H:%M",
                          ]
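As a quick sanity check of the two proposed formats, the sketch below walks the format list the same way _validateFormat() does and reports which format claims a given string. first_match is a hypothetical helper, not part of eyeD3. Warnings are silenced because recent Python versions may emit a DeprecationWarning for formats that contain a day of month but no year.

```python
import time
import warnings

EXISTING = ["%Y", "%Y-%m", "%Y-%m-%d", "%Y-%m-%dT%H", "%Y-%m-%dT%H:%M",
            "%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%HZ", "%Y-%m-%dT%H:%MZ",
            "%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%S", "%Y-00-00", "%Y%m%d"]
PROPOSED = ["D%d-%m", "T%H:%M"]


def first_match(text, formats):
    """Return the first format that parses text, or None if none does."""
    for fmt in formats:
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                time.strptime(text, fmt)
        except ValueError:
            continue
        return fmt
    return None


print(first_match("0605", EXISTING + PROPOSED))    # claimed by "%Y"
print(first_match("D06-05", EXISTING + PROPOSED))  # claimed by "D%d-%m"
print(first_match("T07:08", EXISTING + PROPOSED))  # claimed by "T%H:%M"
print(first_match("D06-05", EXISTING))             # None
```

The prefixed strings are rejected by every existing format, so appending the new formats at the end of the list should not change how existing inputs are parsed.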

This means we might get some "dates" that do not include a year, which calls for several tweaks.
Second, to make sure __init__() is not called when every parameter is empty, we can use __new__(),
which this class does not define yet, so let's add it in between TIME_STAMP_FORMATS and __init__() :

    @classmethod
    def __new__(cls, *args, **kwargs):
        if ([arg for arg in args[1:] if arg is not None]) or ([kwarg for kwarg in kwargs.values() if kwarg is not None]):
            return super().__new__(cls)
        else :
            return

Yeah, it's not exactly refined but it makes good use of the list comprehension. Feel free to adjust as needed.
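One possible adjustment, offered only as a sketch: Python already treats __new__ as a static method, so the @classmethod decorator can be dropped, and then there is no extra leading argument to skip with args[1:]. PartialDate below is a hypothetical stand-in for the Date class, reduced to its constructor.

```python
class PartialDate:
    """Hypothetical stand-in for eyed3.core.Date, reduced to the constructor."""

    def __new__(cls, *args, **kwargs):
        values = list(args) + list(kwargs.values())
        if all(value is None for value in values):
            # Nothing to build. Returning a non-instance skips __init__.
            return None
        return super().__new__(cls)

    def __init__(self, year=None, month=None, day=None,
                 hour=None, minute=None, second=None):
        self.year, self.month, self.day = year, month, day
        self.hour, self.minute, self.second = hour, minute, second


print(PartialDate())                 # None
print(PartialDate(2017).year)        # 2017
print(PartialDate(month=5, day=6))   # a PartialDate instance without a year
```

Whether returning None from a constructor is desirable is a design choice for the maintainer; raising a ValueError would be the more conventional alternative.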

Third, __init__() needs to acknowledge that the year parameter is no longer mandatory,
which starts with year=None in the arguments, plus a fix for the datetime() call :

    def __init__(self, year=None, month=None, day=None,
                 hour=None, minute=None, second=None):
        # Validate with datetime
        from datetime import datetime
        _ = datetime(year if year is not None else 1899,
                     month if month is not None else 1,
                     day if day is not None else 1,
                     hour if hour is not None else 0,
                     minute if minute is not None else 0,
                     second if second is not None else 0)

        self._year = year
        self._month = month
        self._day = day
        self._hour = hour
        self._minute = minute
        self._second = second

        # Python's date classes do a lot more date validation than does not
        # need to be duplicated here.  Validate it
        _ = Date._validateFormat(str(self))                           # noqa

I arbitrarily defaulted the year to 1899, close to the "1900" default of time.struct_time
so the two can be told apart in case of errors, and a non-leap year so that datetime.datetime reacts exactly as it would for 1900
(1900, while even and a multiple of 4, is NOT a leap year, since it is divisible by 100 but not by 400).
This century rule is part of the Gregorian reform that also caused the 10-day shift in 1582: https://en.wikipedia.org/wiki/Gregorian_calendar .

This means time.strptime (in _validateFormat) gives you a "ValueError: day is out of range for month" if you
call time.strptime("1899-02-29", "%Y-%m-%d"), but not with time.strptime("D29-02", "D%d-%m"), since its own "default year" (1900) is not held against you.
datetime.datetime, on the other hand, gives you the very same "ValueError: day is out of range for month" whenever the year
provided is not a leap year.
You can test it in python easily :

import time
import datetime
dd1 = datetime.datetime(1899,2,29)
#Traceback (most recent call last):
#  File "<pyshell#72>", line 1, in <module>
#    dd1 = datetime.datetime(1899,2,29)
#ValueError: day is out of range for month
dd2 = datetime.datetime(1900,2,29)
#Traceback (most recent call last):
#  File "<pyshell#73>", line 1, in <module>
#    dd2 = datetime.datetime(1900,2,29)
#ValueError: day is out of range for month
dd3 = datetime.datetime(1896,2,29)
#No error
dd3
#datetime.datetime(1896, 2, 29, 0, 0)
ts1 = time.strptime("D29-02", "D%d-%m")
#No error
ts1
#time.struct_time(tm_year=1900, tm_mon=2, tm_mday=29, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=60, tm_isdst=-1)
ts2 = time.strptime("1899-02-29", "%Y-%m-%d")
#Traceback (most recent call last):
#  File "<pyshell#70>", line 1, in <module>
#    ts2 = time.strptime("1899-02-29", "%Y-%m-%d")
#  File "C:\Python311\Lib\_strptime.py", line 561, in _strptime_time
#    tt = _strptime(data_string, format)[0]
#  File "C:\Python311\Lib\_strptime.py", line 533, in _strptime
#    julian = datetime_date(year, month, day).toordinal() - \
#ValueError: day is out of range for month
ts3 = time.strptime("1896-02-29", "%Y-%m-%d")
#No error
ts3
#time.struct_time(tm_year=1896, tm_mon=2, tm_mday=29, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=5, tm_yday=60, tm_isdst=-1)

This error was already there, ready to pounce on unsuspecting people with a recording_date set to February 29th...
The easy fix would be to use a leap year as a default, like 1896, or "anywhen" that is both a leap year AND unmistakable.
Keep in mind that modern sound recording started in the 1800s, but there is a story of scientists finding, by serendipity, a roughly 6000-year-old piece of pottery (about 4000 BCE) on which the potter had cut a spiral groove around the whole item for decoration, with a setup pretty similar to what a phonautograph would do in the mid-1800s. As the story goes, laser scans and sound reconstruction let them reproduce the sound without altering the item, and they were floored to hear a child's voice and their father's response.

I'll leave it to Travis Shirk to decide on a correct default date, since I don't know what other "date" properties could be
having a similar problem with datetime.datetime in eyeD3.

Then, we need to modify the parse to make the year optional :

    @staticmethod
    def parse(s):
        """Parses date strings that conform to ISO-8601."""
        if not isinstance(s, str):
            s = s.decode("ascii")
        s = s.strip('\x00')

        pdate, fmt = Date._validateFormat(s)

        # Here is the difference with Python date/datetime objects, some
        # of the members can be None
        kwargs = {}
        if "%Y" in fmt:
            kwargs["year"] = pdate.tm_year
        if "%m" in fmt:
            kwargs["month"] = pdate.tm_mon
        if "%d" in fmt:
            kwargs["day"] = pdate.tm_mday
        if "%H" in fmt:
            kwargs["hour"] = pdate.tm_hour
        if "%M" in fmt:
            kwargs["minute"] = pdate.tm_min
        if "%S" in fmt:
            kwargs["second"] = pdate.tm_sec

        return Date(**kwargs)

Of course, we need to deal with __str__() too, since it's parse's opposite :

    def __str__(self):
        """Returns date strings that conform to ISO-8601.
        The returned string will be no larger than 17 characters."""
        s = "" #the string
        c = "" #the separator character
        if self.year is not None : #branch 1, aka "there is a year, maybe more"
            s += "%d" % self.year
            c = "-"
            if self.month is not None : #there is a month
                s += c + "%s" % str(self.month).rjust(2, '0')
                if self.day is not None: #there is a day
                    s += c + "%s" % str(self.day).rjust(2, '0')
        else : #branch 2, aka "we start without a year" aka "D%d-%m" format
            c = "D"
            if (self.day is not None) and (self.month is not None) : #checking both
                s += c + "%s" % str(self.day).rjust(2, '0')
                c = "-"
                s += c + "%s" % str(self.month).rjust(2, '0')
                return s #We send a "Ddd-mm" string for 'TDAT'
        #Here is the 'TIME' part, which starts or continues the string from branch 1
        c = "T"
        if self.hour is not None:
            s += c + "%s" % str(self.hour).rjust(2, '0')
        c = ":"
        if self.minute is not None:
            s += c + "%s" % str(self.minute).rjust(2, '0')
##        if self.second is not None:  #Are seconds really needed, or at least used ?
##            s += c + "%s" % str(self.second).rjust(2, '0')
        return s #We send either a YYYY-mm-ddTHH:MM, or just a THH:MM (for 'TIME') 

Again, for both parse() and __str__(), I left the seconds in the code just to be sure, but the max length is supposedly 17 chars:
YYYY-mm-ddTHH:MMZ is the maximum info that will be saved if that docstring is right.
So I commented out the part about seconds in __str__(), in order to avoid mismatches with the new "T%H:%M" format (that, or we need to include its variants, "T%H" and "T%H:%M:%S").
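To check the three string shapes without installing anything, here is a standalone sketch of the same formatting logic. format_partial is a hypothetical free function that mirrors the branches of the modified __str__() above, with seconds left out as in the proposal.

```python
def format_partial(year=None, month=None, day=None, hour=None, minute=None):
    """Mimic the modified __str__(): full date, 'Ddd-mm', or 'THH:MM'."""
    # Year-less date: only emitted when both day and month are known.
    if year is None and day is not None and month is not None:
        return f"D{day:02d}-{month:02d}"

    text = ""
    if year is not None:
        text += f"{year:d}"
        if month is not None:
            text += f"-{month:02d}"
            if day is not None:
                text += f"-{day:02d}"
    # Time part: continues a dated string, or stands alone for 'TIME'.
    if hour is not None:
        text += f"T{hour:02d}"
    if minute is not None:
        text += f":{minute:02d}"
    return text


print(format_partial(day=6, month=5))       # D06-05
print(format_partial(hour=7, minute=8))     # T07:08
print(format_partial(2017, 5, 6, 7, 8))     # 2017-05-06T07:08
```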

We're done with eyed3\core.py, let's look at the next one.

In eyed3\id3\frames.py :
Remember the spot at the end of explanation part II), where I said code could be added to recognize extra date formats.
We'll add a little bit of code to the date setter and use an intermediate variable (basically putting the new code between "str(date)" and the final assignment to self.text, to avoid going through the property every time the string is modified) :

(...)
class DateFrame(TextFrame):
    def __init__(...)
    def parse(...)
    @property
    def date(...)

    @date.setter
    def date(self, date):
        """Set value with a either an ISO 8601 date string or a eyed3.core.Date object."""
        if not date:
            self.text = ""
            return

        try:
            if type(date) is str:
                date = core.Date.parse(date)
            elif type(date) is int:
                # Date is year
                date = core.Date(date)
            elif not isinstance(date, core.Date):
                raise TypeError("str, int, or eyed3.core.Date type expected")
        except ValueError:
            log.warning(f"Invalid date text: {date}")
            self.text = ""
            return

        str_date = str(date)
        if ((DorT := str_date[0]) in ("D","T")) and (len(str_date) <= 6): #maxlength 6, not using seconds, modify if needed
            str_date = str_date.replace(DorT, "", 1)
            str_date = str_date.replace("-" if DorT =="D" else ":", "")
        self.text = str_date

So, we define DorT inline as the 1st character of the string, and require it to be either a "D" or a "T" (beware, this is case sensitive; "d" and "t" could be added to the list)
AND we check that the length is no more than 6 (assuming we're not getting strings with seconds in them, given the previous modifications and the 17 chars limit; TWEAK IT IF NEEDED).
We then remove the first (and presumably the ONLY) occurrence of that "D" or "T" character that brought us here,
and then remove the dash (if it was a "D") or else the colon (because "else" means "T").
This gives us a 4-character string for the corresponding v2.3 frames.
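The stripping step can be tried in isolation. This is a minimal sketch with a hypothetical helper name, to_v23_text, that applies the same rule as the setter code above to a plain string.

```python
def to_v23_text(date_text):
    """Turn 'Ddd-mm' or 'THH:MM' into the bare 4-digit frame text.

    Any other string (for example a full ISO date) is returned unchanged.
    """
    if date_text and date_text[0] in ("D", "T") and len(date_text) <= 6:
        separator = "-" if date_text[0] == "D" else ":"
        # Drop the one-letter prefix, then the single separator.
        return date_text[1:].replace(separator, "")
    return date_text


print(to_v23_text("D06-05"))      # 0605
print(to_v23_text("T07:08"))      # 0708
print(to_v23_text("2017-05-06"))  # 2017-05-06
```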

We now only have eyed3\id3\tag.py to modify :
We have to make sure that when setting the recording date for "v2.3 or lower", each of the frame IDs 'TYER', 'TDAT' and 'TIME' receives the appropriate format, since this is the one operation dealing with those 3 frames, each requiring its own format.

class Tag(core.Tag):
    (...)
    def _setRecordingDate(self, date):
        if date in (None, ""):
            for fid in (b"TDRC", b"TYER", b"TDAT", b"TIME"):
                self._setDate(fid, None)
        elif self.version == ID3_V2_4:
            self._setDate(b"TDRC", date)
        else:
            if not isinstance(date, core.Date):
                date = core.Date.parse(date)
            self._setDate(b"TYER", str(date.year))
            if None not in (date.month, date.day):
                date_str = "D%s-%s" % (str(date.day).rjust(2, "0"),
                                     str(date.month).rjust(2, "0"))
                self._setDate(b"TDAT", date_str)
            if None not in (date.hour, date.minute):
                date_str = "T%s:%s" % (str(date.hour).rjust(2, "0"),
                                     str(date.minute).rjust(2, "0"))
                self._setDate(b"TIME", date_str)

As you can see, the modification here is minimal, switching the "%s%s" to "D%s-%s" or "T%s:%s" depending on the frame ID.
This should be enough to fix issue #517 .

As a side note, the _getV23RecordingDate() function does "date = core.Date.parse(date_str)" 3 times in a single call,
each time after date_str is updated from the next frame. Since it's a getter, and any of these operations failing stops the rest,
wouldn't it be the same, just faster, to wait until the 3 "ifs" are done and parse once?
Basically turning this (original code) :

    def _getV23RecordingDate(self):
        # v2.3 TYER (yyyy), TDAT (DDMM), TIME (HHmm)
        date = None
        try:
            date_str = b""
            if b"TYER" in self.frame_set:
                date_str = self.frame_set[b"TYER"][0].text.encode("latin1")
                date = core.Date.parse(date_str)
            if b"TDAT" in self.frame_set:
                text = self.frame_set[b"TDAT"][0].text.encode("latin1")
                date_str += b"-%s-%s" % (text[2:], text[:2])
                date = core.Date.parse(date_str)
            if b"TIME" in self.frame_set:
                text = self.frame_set[b"TIME"][0].text.encode("latin1")
                date_str += b"T%s:%s" % (text[:2], text[2:])
                date = core.Date.parse(date_str)
        except ValueError as ex:
            log.warning("Invalid v2.3 TYER, TDAT, or TIME frame: %s" % ex)

        return date

To this :

    def _getV23RecordingDate(self):
        # v2.3 TYER (yyyy), TDAT (DDMM), TIME (HHmm)
        date = None
        try:
            date_str = b""
            if b"TYER" in self.frame_set:
                date_str = self.frame_set[b"TYER"][0].text.encode("latin1")
            if b"TDAT" in self.frame_set:
                text = self.frame_set[b"TDAT"][0].text.encode("latin1")
                date_str += b"-%s-%s" % (text[2:], text[:2])
            if b"TIME" in self.frame_set:
                text = self.frame_set[b"TIME"][0].text.encode("latin1")
                date_str += b"T%s:%s" % (text[:2], text[2:])
            date = core.Date.parse(date_str)
        except ValueError as ex:
            log.warning("Invalid v2.3 TYER, TDAT, or TIME frame: %s" % ex)

        return date

Or is there a specific merit to having these 3 parses done independently ?
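Going only by the two listings above, one observable difference is the fallback when a later frame is bad: the original keeps the result of the last successful parse (the bug description shows a Date object being returned right after the "2017--" warning), while the single-parse variant would return None. The single-parse variant would also hit the except branch when none of the three frames exist, since it parses an empty string. The sketch below illustrates this with a simplified stand-in parser; parse, incremental and single are hypothetical names, and only three formats are modelled.

```python
import time

FORMATS = ["%Y", "%Y-%m-%d", "%Y-%m-%dT%H:%M"]


def parse(text):
    """Stand-in for core.Date.parse(): return text if any format matches."""
    for fmt in FORMATS:
        try:
            time.strptime(text, fmt)
        except ValueError:
            continue
        return text
    raise ValueError(f"Invalid date string: {text}")


def build_steps(tyer, tdat, hhmm):
    """Yield the growing date string after each available frame."""
    text = ""
    if tyer is not None:
        text = tyer
        yield text
    if tdat is not None:
        text += f"-{tdat[2:]}-{tdat[:2]}"
        yield text
    if hhmm is not None:
        text += f"T{hhmm[:2]}:{hhmm[2:]}"
        yield text


def incremental(tyer=None, tdat=None, hhmm=None):
    """Parse after every frame, as in the original getter."""
    result = None
    try:
        for text in build_steps(tyer, tdat, hhmm):
            result = parse(text)
    except ValueError:
        pass  # keep whatever parsed last
    return result


def single(tyer=None, tdat=None, hhmm=None):
    """Parse once at the end, as in the proposed getter."""
    steps = list(build_steps(tyer, tdat, hhmm))
    try:
        return parse(steps[-1] if steps else "")
    except ValueError:
        return None


print(incremental("2017", ""))  # 2017 (year survives the empty TDAT)
print(single("2017", ""))       # None
```

If that partial result is considered useful, the three separate parses have a purpose beyond validation; if not, the single parse is the simpler choice, with a guard for the no-frames case.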

core.py for 0.9.x-recording_date-fix branch.
id3/frames.py and id3/tag.py for 0.9.x-recording_date-fix branch.