Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Reason : This bug is happening on the recording_date in ID3_V2_3, which is still used by many people, due to compatibility with existing multimedia hardware/software, whereas windows 10/11 and the eyed3 lib tend to default to ID3_V2_4.
Changing the default version the lib uses is easy, but some problem may arise with some date tags not being treated correctly by the lib, or at least, not the way people expect it after years of compatibility and useage of "X---" (experimental) tags.
Bug description :
L.8 :
Invalid date string: 2017--
: the dashes are supposed to separate year-month-day, but there is missing contentL.10 and L.12 illustrate that the problems happens in the MM-DD part when DAYS start with a zero (as we'll see, it is stored as DDMM)
L.14,16,18 show that the same problem happens to the HOURS part, stored as HHMM, if the hours characters start with a zero.
When setting the recording date, the actual date is stored differently depending on the available frames, that is, it depends on the ID3 tag version.
The latest, ID3 v2.4, uses the 'TDRC' frame which seems to hold the whole timestamp (as in 'YYYY-MM-DD-hh-mm-ss' ), whereas the ID3 v2.3 holds the values in different frames :
'TYER' for the recording year,
'TDAT' for the day and month of recording,
and 'TIME' for the hours and minutes of recording.
These 3 frames hold only 4 bytes of value each, unlike 'TDRC'.
These informations hold significant values for people using the ID3 tag system for archiving purpose, or anybody who prefers infos to be accurate.
Some solution to this problem was given as a workaround in issue #517 by user iarp, mainly to avoid errors,
but with the limitation that recording_date would be stored incomplete, either by getting rid of the 'TDAT'/'TIME' part(s) or using the year value as a placeholder (which "works" because the year does NOT start with a zero).
Why does this happen ? (Explanation) :
So, 3 files are concerned to understand what happens :
.\eyed3\id3\tag.py
.\eyed3\id3\frames.py
.\eyed3\core.py
I) eyed3\id3\tag.py
First, we see that recording_date is made as a property, using
_getRecordingDate
and_setRecordingDate
as getter and setter.Second,
_getRecordingDate()
is split in 2 : for v2.3 or older it is using_getV23RecordingDate()
,and for the rest (v2.4+) it does its own thing (
_getDate()
) which has nothing to do with our problem.Third,
_setRecordingDate()
has 3 branches :None
or an empty string is supplied,_setDate()
with 'TDRC' if we're using v2.4,picking day then month, both right-justified with a leading 0 if needed to match a length of 2, formatted into the
%s%s
string, (which, for the example iso date of 2017-05-06T07:08:09, would give0605
. You might have a hunch of where this is going...) and feeding it to_setDate()
for 'TDAT', and then doing about the same thing with hours and minutes for 'TIME'.This last '
else
' in the code above is what sends the designated date intosetFrame
along with the concerned FrameID;NOTE that 'TDAT' and 'TIME' are treated differently, in that the date value (in our example, "0605") is not replaced by a date object, but is left as a 4 characters string.
Let's look at setFrame's definition :
Whatever happens just above, the string or date object is assigned to the date property of the frame, or the frame itself is created if needed.
self.frame_set
is a dictionnary whose name is 1st created during init and loaded during the self.clear(),as seen below:
and is either already containing the required frame, or the frame is added by requiring a specific type of frame for the corresponding ID, (depending on if you start from scratch or if you're loading a file), from the eyed3\id3\frames module
(here, a
DateFrame
, which is a subclass offrames.TextFrame
).II) eyed3\id3\frames.py
Here, we see the approved frame types,
The "if" statement hereabove used to be an assert not so long ago...
self.date
is assigned to the value ofself.text
(inherited fromTextFrame
)(a property, which refers to the private
_text
variable, itself being the parameter provided after the frame ID, aka "str(date_val)
" earlier).which uses core.Date.parse() to get (as in "getter") the date object, or
None
if empty;But the date in our case is being SET, hence the following being used :
From this, we can see the date object "getter" ALWAYS parse the string to form the date object,
and conversely, the "setter" always check it is valid by 1st creating a date object and then obtaining the string from it,
which means date object probably have a str() function, to do the second part which is saving that string to self.text.
(not shown here, but it is a pretty direct get/set to
self._text
, only decorated with@requireUnicode(1)
, my guess is "to avoid decoding errors")If you remember the part about 'TDAT' and 'TIME' being treated differently, you saw that those strings (like "0605") were sent STRAIGHT in there.
Welp, too bad, they are parsed into a date object anyway...Anyone wondering how a string with a leading zero will be met ?
DO NOTE that just after the except block above, and before the "
self.text = str(date)
" line, is where anything else could be added to recognize extra date formats, and modification could be done to make sure the string complies with what is expected of that frame (for example, length-wise, or to comply with available formats).Now, we need to have a good look at the core part...
III) eyed3\core.py
Something that is important later, the list of timestamp formats :
A brief look shows that this Date class doesn't use it's own "
__new__()
", which might be useful later if we want to check something before actually creating the object.So, we see that "
year
" is a mandatory parameter for the treatment (which is only half a solution, given ISO8601 is allowing some other cases like "not giving the year but keeping the month-day or hour:minute parts", depending on which version is used), and the parameters are passed to private variables, each having their own property accessors.Also, the parameters are previously passed to a
datetime.datetime
just to check if the date is valid (remember, there are like 10 days that didn't happen when switching to the current calendar in the 1580's to correct an offset in leap years from the Julian calendar... Not that we'll ever find some centuries old MP3 files, but some people might use the date tags by using ID3 tags on another format, maybe some scientific classification involving dating things...),and after the private variables are set, the
_validateFormat()
is called.Note that
datetime.datetime
REQUIRES a year to be given (actually, year, month, day is required, the rest is optional, but provided with default 0 by the above code).So, we're going through every and each date formats to try to obtain "
pdate
" which is atime.struct_time
, breaking out of the loop if we find a correct format, or getting aValueError
and continuing the loop to the next format.If none of the formats gives us a pdate, we raise a ValueError, else, we return the pdate and the actual format that validated it.
NOTE that
time.strptime
works depending on the format given (fmt
) and as such, could perfectly give you a time.struct_time without you providing a year, as long as the format acknowledge it ; in such case, the year is defaulted to "1900
".If you've followed through, you should see that this
time.strptime()
would use "0605
" as input string, and with the list of timestamp formats being looped through, you might have an idea of which format is going to be validating that string :"
%Y
", the very first, is expecting up to 4 numbers, and we just gave it 4 numbers.Even more problematic, this is being interpreted as "year 605", and that leading zero isn't found in the '
tm_year
' integer value, obviously, which leads to even more problems down the road.Remember that in frames,
Date.parse
was called (and as you can see below, it ends up running aDate.__init__()
anyway, using the parameters obtained from the time.struct_time namedpdate
, resulting of_validateFormats()
):which is somewhat the opposite of the
__str__()
method :NOTE that the
parse
returns a date with the year parameter already filled on thereturn
line.The
__str__
method also assume the year is mandatory, but we DO know that some of the ID3 frames are specific about "date" (as in month and day) and "time" (hour and minute), like "TDAT" and "TIME".Reminder, the
recording_date
is saved using up to 3 frames (excerpt from ID3 docs):TYER
The 'Year' frame is a numeric string with a year of the recording.
This frames is always four characters long (until the year 10000).
TDAT
The 'Date' frame is a numeric string in the DDMM format containing
the date for the recording. This field is always four characters
long.
TIME
The 'Time' frame is a numeric string in the HHMM format containing
the time for the recording. This field is always four characters
long.
Also, we can see the above code is only able to form standard ISO8601 strings, up to 17 chars supposedly, which means it is actually supposed to go up to "YYYY-mm-ddTHH:MMZ" at best, Z being "Zulu time" aka UTC.
Which is also reinforced by the above excerpt of ID3 docs stating that 'TIME' only consider HHMM for his 4 chars, no "SS" for seconds.
So, there is a bunch of small modifications to do to get it all working, which includes adding new formats, modified Date objects functions that work with such formats, modified id3.tag functions to use those specific formats when dealing with those special parts ('TDAT' and 'TIME'), and modified DateFrame functions to only pick the relevant 4 chars out of these new formats before it can be saved to .date -> .text ->_text .
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Modifications to fix the bug :
First, we need to agree on 2 new formats we'll add to eyed3\core.py's class
Date
'sTIME_STAMP_FORMATS
.We saw earlier that a group of up to 4 numbers is impossible, due to the very 1st format being "
%Y
",So we need to differentiate somehow; considering this is for a date ('TDAT') and a time ('TIME'),
I suggest "
D%d-%m
" and "T%H:%M
", which should be self explanatory.So let's add them to the end of
TIME_STAMP_FORMATS
:This means we might get some "dates" that do not include a year, and this asks for several tweaks.
Second, to make sure the init is not called on empty parameters, we can use the
__new__()
classmethodthat isn't used in this class, so let's add, in-between
TIME_STAMP_FORMATS
and__init__()
:Yeah, it's not exactly refined but it makes good use of the list comprehension. Feel free to adjust as needed.
Third, we need the
__init__()
to acknowledge the non-mandatory nature of the year parameter,which starts with a
year=None
in the arguments, and a fix for thedatetime()
call :I arbitrarily defaulted the year to
1899
, similar to the "1900" default oftime.struct_time
,allowing to differentiate in case of errors, and using an odd year to keep the exact same reaction from
datetime.datetime
(1900, while "even" and "multiple of 4", is NOT a leap year, since "not divisible by 400"),
this exact problem is what caused the 10 days shift in 1582 : https://en.wikipedia.org/wiki/Gregorian_calendar .
This means
time.strptime
(in_validateFormat
) could give you a "ValueError: day is out of range for month
" if you wereto specify
time.strptime("1899-02-29", "%Y-%m-%d")
, but not withtime.strptime("D29-02", "D%d-%m")
, since it's own "default year" (1900) is not held against you for error.datetime.datetime
, on the other hand will get you the very same "ValueError: day is out of range for month
" if the yearprovided is not a leap year.
You can test it in python easily :
This error was already there, ready to pounce on unsuspecting people with a recording_date set to February 29th...
The easy fix would be to use a leap year as a default, like 1896, or "anywhen" that is both a leap year AND unmistakable.
Keep in mind the modern "sound" recording started in the 1800's, but scientists found a clay record by serendipity on some 4000BCE, about 6000 years old, pottery on which the potter made a grove spiral around the whole item (for decoration purpose) with a setup pretty similar to what a phonautograph would do in the mid-1800's. Laser scans and sound reconstruction allowed the scientist to reproduce the sound without altering the item, and they were floored to hear a child's voice and their father's response.
I'll leave it to Travis Shirk to decide on a correct default date, since I don't know what other "date" properties could be
having a similar problem with
datetime.datetime
in eyeD3.Then, we need to modify the
parse
to make the year optional :Of course, we need to deal with
__str__()
too, since it'sparse
's opposite :Again, For both
parse
and__str__
, I left the seconds in the code just to be sure, but the max length is supposedly 17 chars :YYYY-mm-ddTHH:MMZ is the maximum info that will be saved if that docstring is right.
So I commented out the part about seconds, in order to avoid the new "
T%H:%M
" format mismatching. (that, or we need to include it's variants, "T%H
" and "T%H:%M:%S
" ).We're done with eyed3\core.py, let's look at the next one.
In eyed3\id3\frames.py :
Remember the spot at the end of explanation part II), where I said this was where "anything else could be added to recognize extra date formats".
We'll add a little bit of code to the
date.setter
and use an intermediate variable (basically splitting the final "self.text = str(date)
" above and below the code, to avoid modifying it through property everytime) :So, we inline define
DorT
, which is the 1st character of the string, and use the condition of it being either a "D" or a "T" (beware, case sensitive, "d" and "t" could be added to the list)AND we check that the length is no more than 6 (assuming we're not getting strings with seconds in it, from the previous modifications and the 17 chars limit; TWEAK IT IF NEEDED).
We then manipulate the string to remove the first (and presumably the ONLY) occurence of that "either a D or a T" character that brought us here,
and then manipulate it again to remove the dash (if it was a "D") or else remove the colon (because "else" means "T").
This giving us a 4 characters string for the corresponding V2.3 frames.
We now only have eyed3\id3\tag.py to modify :
We have to make sure that when setting the recording date for "v2.3 or lesser", every frame ID in 'TYER','TDAT','TIME' receives the appropriate format, since this is the one operation dealing with those 3 frames, with each requiring it's own format.
As you can see, the modification here is minimal, switching the "
%s%s
" to "D%s-%s
" or "T%s:%s
" depending on the frame ID.This should be enough to fix issue #517 .
As a side note, the
_getV23RecordingDate()
function does "date = core.Date.parse(date_str)
" 3 times in a single call,based on the updating of
date_str
after each frame is read; since it's a getter, and any of these operations failing would be causing a halt,wouldn't it be the same, just faster, to wait after the 3 "ifs" are done to do a single operation ?
Basically turning this (original code) :
To this :
Or is there a specific merit to having these 3 parses done independently ?