Enhancement: create the jauthchk tool - check on the contents of an .author.json file #69
Comments
The following is a guide as to what needs to be checked in an To recap an important point made above:
See www.json.org for details of the JSON format.
All files must start with a "IOCCC_info_version" : "1.7 2022-02-04", The "ioccc_contest" : "IOCCC28", The "ioccc_year" : 2022, The "mkiocccentry_version" : "0.35 2022-02-07", The "iocccsize_version" : "28.7 2022-02-01", The "IOCCC_contest_id" : "test", The "entry_num" : 0, The "authors" : [
{
"name" : "author name",
"location_code" : "CC",
"location_name" : "Cocos (Keeling) Islands (the)",
"email" : "test@example.com",
"url" : "https:\/\/example.com\/index.html",
"twitter" : "@twitter",
"github" : "@github",
"affiliation" : "an affiliation",
"winner_handle" : "author-last",
"author_number" : 0
}
], The
The values conform with the restrictions imposed by the NOTE: When the user declined to enter a value (where permitted by the NOTE: Due to a design flaw of the JSON spec, the last value of the "formed_timestamp" : 1644618833, The "formed_timestamp_usec" : 631668, The "timestamp_epoch" : "Thr Jan 1 00:00:00 1970 UTC", The "min_timestamp" : 1643987926 The "formed_UTC" : "Fri Feb 11 23:09:42 2022 UTC" The date "+%a %b %d %H:%M:%S %Y UTC"
All files must end with a To be valid, the |
Comments, suggestions, corrections and clarifications for the above long comment are welcome. We recommend that you copy only the relevant parts of the long comment when you do. :-) If/where needed, we will attempt to modify the long comments in place above, where and when possible. |
This will be helpful to me. Thanks.
That's easy enough to check.
For the above two we can just use
In this case "ioccc_year" : "2022", I.e. add quotes or something else. That is a general question for all the other fields and the other file as well: if the user changes it so that it's not the right type (for example - int to string) should this be an error in validation OR should the quotes be ignored? Since you're only after if the right fields (with the right values) are there is it okay if the correct fields are there at the same time as the format being wrong (by format being wrong I mean not validly JSON)?
As for this one: how do you propose this is tested? What might be helpful here: can you provide an actual UUID string that might be valid in the contest so that testing this tool can be easier?
Here this makes me think that the format of the JSON file does in fact matter - since you say it must be an array and the fields must be exactly the below. Does this hold everywhere else too?
That should be helpful.
Good to know.
That's an interesting bit of information. Thanks for clarifying this.
Which integer type do you suggest?
Do you have a recommended way to go about this? The first thing that popped into my head is -- These comments apply to this tool as well as what I'll say about the next one (if any comments/questions): Hopefully these questions can start a good conversation on it and help bring clarity. Although I started this I'm not sure if I can finish this one. I might be able to but I think I'll have to take a break for a few days and maybe return to it in the middle of next week. Really what it comes down to is learning a bit about JSON (after your replies). That being said I might start writing a parser that simply separates the field name and value into a list (I don't mean a linked list necessarily). I don't think that'll be today though: I'd rather get your feedback first and then process it. It's possible tomorrow morning I can work on this a bit but if not there's no other time I can tomorrow. Monday I should be able to do a bit of it but I might be away a while since it is after all my 40th birthday (that's also - as I said - why I'll be gone most of tomorrow). |
Something occurred to me that you did not address: should certain characters be verified that they're escaped? For example should the URL have the Also should the |
The value 2022 without quotes is correct. It is valid JSON (see the spec). The value "2022", a string, would be an error for that numeric value. We will reply later to your other comments. |
Does that mean that with quotes it should be considered invalid in the context of the tool? I get that impression but want to be sure.
Thank you. I'll probably look at it tomorrow - or else later on today if I get a chance. |
Yes. If the value is numeric, then there MUST be no quotes around the value.
Only strings and names appear in double quotes. JSON values such as these are not double quoted:
In the JSON used by IOCCC, we do not have use for numeric values that are non-integers. So the last 2 value forms may be safely ignored. |
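The rule above (numeric values carry no quotes, and only integers matter here) might be sketched as a small check on a value token. This is only an illustration: is_json_integer is a hypothetical name, not something from the repo, and it assumes the value has already been split off from its name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * is_json_integer - hypothetical helper: is str an unquoted JSON integer?
 *
 * Accepts an optional leading '-' followed by one or more decimal digits.
 * Per the JSON grammar, a leading 0 is only allowed when the integer part
 * is just 0.  A quoted value such as "2022" is rejected because its first
 * character is a double quote, not a digit.
 */
bool
is_json_integer(char const *str)
{
    size_t i = 0;

    if (str == NULL) {
        return false;
    }
    if (str[i] == '-') {
        ++i;                    /* skip the optional sign */
    }
    if (!isdigit((unsigned char)str[i])) {
        return false;           /* empty, quoted, or otherwise non-numeric */
    }
    if (str[i] == '0' && str[i + 1] != '\0') {
        return false;           /* leading zeros are not valid JSON */
    }
    for (; str[i] != '\0'; ++i) {
        if (!isdigit((unsigned char)str[i])) {
            return false;       /* fractions, exponents, trailing junk */
        }
    }
    return true;
}
```

With this, the token 2022 would pass and the token "2022" (with quotes) would be flagged, which matches the answer given above.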
We are sure that you remember that the JSON elements may come in any order, and that whitespace can change without impacting the JSON validity, and that strings such as |
Here is a sample UUID:
For more info see this other comment. |
Probably, yes. |
JSON numbers can be of any length. JSON numbers are typeless. JSON integers are just a string of decimal digits of any length. See the You need not support huge multi-precision numbers. Instead try to form a You might want to look at the length of the characters of a JSON number. Now LLONG_MAX is:
And in decimal, 9223372036854775807 has 19 digits. So define: #define LLONG_MAX_BASE10_DIGITS (19) Then if the length (not counting any leading - sign) of the JSON number exceeds Then use While checking the length (ignoring any leading - sign) of the JSON number is optional, it might still be a good idea in case |
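A minimal sketch of the digit-count idea, under stated assumptions: the function name json_str_to_llong is hypothetical, and the use of strtoll() is an assumption, since the text above does not show which conversion function was meant. The length test is a cheap early rejection; the errno check after strtoll() catches 19-digit values that still overflow.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LLONG_MAX_BASE10_DIGITS (19)

/*
 * json_str_to_llong - hypothetical helper: convert a JSON integer string
 *
 * Returns true and stores the value in *result if str is an optional '-'
 * followed only by decimal digits and the value fits in a long long.
 */
bool
json_str_to_llong(char const *str, long long *result)
{
    char const *digits;
    char *end = NULL;
    long long value;
    size_t len;

    if (str == NULL || result == NULL) {
        return false;
    }
    digits = (str[0] == '-') ? str + 1 : str;   /* ignore a leading - sign */
    len = strlen(digits);
    if (len == 0 || len > LLONG_MAX_BASE10_DIGITS) {
        return false;           /* empty, or too long to fit */
    }
    if (strspn(digits, "0123456789") != len) {
        return false;           /* something other than decimal digits */
    }
    errno = 0;
    value = strtoll(str, &end, 10);
    if (errno == ERANGE || end == str || *end != '\0') {
        return false;           /* out of range or not fully consumed */
    }
    *result = value;
    return true;
}
```

The length check is not strictly needed when errno is tested, but it keeps absurdly long digit strings from reaching the conversion at all.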
The use of |
It is hard for us to determine what parts of a long reply need to be responded to and what are just comments. Perhaps single issue messages might make that easier? Anyway if we missed a question, please ask it again, perhaps as 1 question (or 1 question set) per post? |
This makes sense.
This is helpful, thanks. I'll also take a look at the JSON documentation you provided - but tomorrow. Just quickly going through this with any comments and then turning in for the night. |
Yes I do but it's good that you made it clear. Appreciate that. One possibility is stripping the spaces out but I'll worry about the technicalities when working on it. Edit: Ah but you're saying it because of my reference to |
Thanks. A short bit ago I remembered the tool |
Thanks for confirming this. |
In that case (this is just a quick thought): since the values are of limited range in C it might be possible to just use
Will do.
Okay.
Yep. And I actually wondered about this. At least if you mean for each JSON number count the number of digits. Is this what you mean?
Good idea. I actually wrote a function years ago that counts the number of digits in an int - but since this will be parsed as a string (initially) I can just use
That sounds good and reasonable.
And any The way I read this is:
Is that what you're saying? That seems reasonable to me at a quick glance. |
Okay I'll consider that then. Thanks. |
No need to be sorry. Actually I'm sorry: I thought just quoting the specific parts would be enough. I'll make sure to do a single question in a single comment in the future. I should know this but I tend to write a lot - apologies!
I'll ask anything again - in a single message - if anything else comes up (or I should say when something else comes up). But no worries if you missed anything. Perhaps you have some things you can add to my above comments and I'll read them in the morning. I'm going to check the other thread and if nothing else was posted there I'll turn in for the night. Take good care and enjoy the rest of your day! More tomorrow. I'm sure I'll make a pull request in the morning but almost certainly after that nothing until Monday. |
Ah, there's this question you did miss. Should the JSON validator detect |
We think so. The JSON spec seems to suggest that, for some reason, /'s need to be \-escaped. See /*
* json_putc - print a UTF-8 character with JSON encoding
*
* JSON string encoding.
*
* These escape characters are required by JSON:
*
* old new
* --------------------
* " \"
* / \/
* \ \\
* <backspace> \b (\x08)
* <tab> \t (\x09)
* <newline> \n (\x0a)
* <form feed> \f (\x0c)
* <carriage return> \r (\x0d)
*
* These escape characters are implied by JSON due to
* HTML and XML encoding, although not strictly required:
*
* old new
* --------------------
* < \u003C
* > \u003E
* & \u0026
*
* These escape characters are implied by JSON to let humans
* view JSON without worrying about characters that might
* not display / might not be printable:
*
* old new
* --------------------
* \x00-\x07 \u0000 - \u0007
* \x0e-\x1f \u000e - \u001f
* \x7f-\xff \u007f - \u00ff
*
* See:
*
* https://developpaper.com/escape-and-unicode-encoding-in-json-serialization/
*
* NOTE: We chose to not escape '%' as was suggested by the above URL
* because it is neither required by JSON nor implied by JSON. |
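The tables in that comment could be turned into code roughly as follows. This is only a sketch and not the repo's json_putc(): the name json_escape_byte is hypothetical, it writes into a caller-supplied buffer rather than a stream, and it applies the tables byte by byte exactly as written above. Control bytes not named in the first table (including \x0b) are sent to the \u00xx form.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * json_escape_byte - hypothetical helper: JSON-encode a single byte
 *
 * Writes the encoded form of c into buf as a NUL terminated string and
 * returns its length.  A buffer of 7 bytes is always large enough
 * (6 characters for a \uXXXX form plus the NUL).
 */
int
json_escape_byte(unsigned char c, char *buf, size_t len)
{
    switch (c) {
    /* escapes listed as required in the first table */
    case '"':  return snprintf(buf, len, "\\\"");
    case '/':  return snprintf(buf, len, "\\/");
    case '\\': return snprintf(buf, len, "\\\\");
    case '\b': return snprintf(buf, len, "\\b");
    case '\t': return snprintf(buf, len, "\\t");
    case '\n': return snprintf(buf, len, "\\n");
    case '\f': return snprintf(buf, len, "\\f");
    case '\r': return snprintf(buf, len, "\\r");
    /* escapes listed as implied due to HTML and XML encoding */
    case '<':
    case '>':
    case '&':
        return snprintf(buf, len, "\\u%04X", (unsigned)c);
    default:
        break;
    }
    /* remaining control bytes and \x7f-\xff, per the third table */
    if (c < 0x20 || c >= 0x7f) {
        return snprintf(buf, len, "\\u%04x", (unsigned)c);
    }
    /* everything else is copied unchanged */
    return snprintf(buf, len, "%c", c);
}
```

For example, passing '/' would produce the two characters \/ and passing '<' would produce \u003C.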
That's very helpful, thanks. I guess that means when parsing the fields one will have to keep track of the previous character too so that when they encounter a character that has to be escaped and the previous character was not Perhaps there could be a function that does the checking: it would take the previous character and the current character and if the current character is one of the above and the previous character is not then it's an error. That would make it more modular and cleaner. Well as you'll see I did a pull request with two checks added to The latter test can be modified a bit so that it can be used in the json checker (but since this is only basename perhaps the code in |
On the subject of escaping: I checked the What to do about characters > I haven't started working on any of the parsing yet; for now I'm wanting to get these things clarified. I might work on some of the parsing in a little bit but I'm not sure if I'll have the time and energy (time yes for now but not sure if I have both). |
We can assume UTF-8 throughout the tool chain. As the encoding article recommends, JSON tools should insist that the
The
Moreover, JSON tools should flag as an error, if any of the following UTF-8 characters are found when NOT preceded by a
The following encodings are encouraged (and implied) but not required by the JSON definition:
We suggest that you may wish to create a utility function that converts JSON encoded strings into a malloced un-encoded string. That is: char * json_decode(char const *json_string) The And while you are at it, the following reverse utility function should be written: char * json_encode(char const *utf8_string) The The For testing and "general tool usefulness", two utilities are in order: jstrencode [-h] [-v level] [-V] [string ...]
jstrdecode [-h] [-v level] [-V] [string ...] These utility tools write to As with Unix tools, the output of one should be able to be fed into the other, so: jstrencode < foo | jstrdecode > bar
if ! cmp foo bar; then
echo "foo and bar differ"
fi
Then you can add to the test rule, testing for JSON string encoding/decoding as well as tests for detecting improper JSON string encoding.
jstrdecode '\error' >/dev/null
status="$?"
if [[ $status -eq 0 ]]; then
echo "Improper decoding not detected" 1>&2
exit 1
fi
Etc. |
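A decoder along the lines suggested above might look like the following. It is a sketch under assumptions: the name json_decode_sketch is hypothetical (chosen so it is not mistaken for the real json_decode()), it only accepts the escapes from the json_putc comment, and for \u it handles just \u0001 through \u00ff, rejecting \u0000 because a NUL byte would cut the returned C string short.

```c
#include <stdlib.h>
#include <string.h>

/* hexval - value of one hex digit, or -1 if c is not a hex digit */
static int
hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * json_decode_sketch - hypothetical sketch of a JSON string decoder
 *
 * Returns a malloced decoded copy of json_string, or NULL on a malloc
 * failure or an improper encoding.  The caller frees the result.
 */
char *
json_decode_sketch(char const *json_string)
{
    size_t len, i;
    char *out, *p;

    if (json_string == NULL) return NULL;
    len = strlen(json_string);
    out = malloc(len + 1);      /* decoding never lengthens the string */
    if (out == NULL) return NULL;
    p = out;
    for (i = 0; i < len; ++i) {
        char c = json_string[i];
        int hi, lo;

        if (c != '\\') { *p++ = c; continue; }
        c = json_string[++i];   /* at worst this reads the final NUL */
        switch (c) {
        case '"': case '/': case '\\': *p++ = c; break;
        case 'b': *p++ = '\b'; break;
        case 't': *p++ = '\t'; break;
        case 'n': *p++ = '\n'; break;
        case 'f': *p++ = '\f'; break;
        case 'r': *p++ = '\r'; break;
        case 'u':
            /* each test stops at the first bad character, so the
             * reads never run past the terminating NUL */
            if (json_string[i + 1] != '0' || json_string[i + 2] != '0' ||
                (hi = hexval(json_string[i + 3])) < 0 ||
                (lo = hexval(json_string[i + 4])) < 0 ||
                (hi == 0 && lo == 0)) {
                free(out);
                return NULL;
            }
            *p++ = (char)(hi * 16 + lo);
            i += 4;
            break;
        default:                /* unknown escape or trailing backslash */
            free(out);
            return NULL;
        }
    }
    *p = '\0';
    return out;
}
```

With this sketch the string \error is rejected because \e is not a known escape, which is the case the shell test above is looking for.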
Please let me know if this reply is okay or if it should be split off into other messages. I wasn't quite sure of this since it's all one reply I'm replying to. If necessary I'll rewrite it at another time. Just let me know and I'll be happy to do that - then you can disregard the below (I just am about to head off for the day - well I'll be at the computer a bit longer but I'm afraid I'm done with this for the day). I was thinking one of the next things I'll do is add to the processing of lines identifying which field it is so that the only thing left to be done is to parse it (the arrays will of course be different but I'll worry about that later). As you know I also already detect whether the file starts with a Anyway reply below. Have a great rest of your day!
In other words: if any other character has a
But not when validating the file, right? I mean if it has for example
I guess that the
Makes sense.
In other words the above functions can be put in
Of course. Just like my Enigma machine does. I made sure to design it that way!
I guess you're referring to |
As for running it with
and there were no errors and the only memory leaks that I could see were from |
Update: I'm playing with a function I wrote years ago that extracts expressions (or fields) delimited by It's re-entrant as well and this is the customised |
This has also been done and pushed. |
We can turn off -O during development. |
Probably a good idea. Want me to do that? Do you want |
We thought those were already added. |
They are indeed. If I didn't you did. I only saw that later on and I forgot to update this. Sorry! |
Change:
COPT= -O3 -g3
#COPT= -O0 -g
to this:
#COPT= -O3 -g3
COPT= -O0 -g |
Done and pushed. |
The next commit will be the 500th commit. Thank you @xexyl for helping with a number of those 500! |
Thank you! That means a great deal to me! I'm proud and happy to participate! I'm having a lot of fun even if I have to take a lot of breaks and many of the days I can't do much. |
I'm hoping to make at least one additional commit today but I'm not sure of that. In Well I think I will do that because it's good to clean it up and it's a source of pride: something I have a difficult time with. Expect a pull request soon! |
There. Now I'll work a bit more on the extraction of the array but I'm guessing I won't work at it much longer. I did get some important insight today at least which will probably help with the actual task at hand when I'm working on it next. |
Just had an idea with the array... going to play with it and if it works I'll commit it. If not I'll think it over when not working on it. Update: Will return to this later: either later today or tomorrow morning. Not sure which yet. |
I took another moment and I think I solved this problem! Both Looking a bit more and if so I'll commit and push. That'll probably be all I do for the day. |
Pushed. Now I'm done for now. More tomorrow if not later today. Enjoy your time with Leo and stay safe! Cheers. |
Thanks, the demo with Leo went well. Leo suggested a change in a mkiocccentry message that will be done in the next fix. |
Great! Looking forward to it. I will be going to sleep soon so that will be tomorrow. I do have a thought that might help solve the comma issue but that’s also for tomorrow. Hope you have a great night! |
This issue is pending the completion of the JSON parser and closing of #156 "Enhancement: finish the C-based general JSON parser". |
See comment 1155885090 for changes to the chk code / chk warning and error facilities. |
Closing this request in favor of issue #259. |
We need to create the jauthchk tool in order to help verify the contents of an .author.json file found within an entry directory.
This tool will primarily be used by other tools (not humans). As such it should behave like fnamchk in that if all is well, it should not print anything and simply exit 0. If there are problems found with the .author.json file, then warning messages should be printed to stderr AND the jauthchk tool should exit with a non-zero status. The -v level option may be used to assist in debugging.
The jauthchk tool is primarily a stand alone tool. As a sanity check, the mkiocccentry program should execute the jauthchk code AFTER the .author.json file has been created and before the compressed tarball is formed. If the mkiocccentry program sees a 0 exit status, then all is well. For a non-zero exit code, the tool probably should abort because any problems detected by jauthchk based on what mkiocccentry wrote into .author.json indicate there is a serious mismatch between what mkiocccentry is doing and what jauthchk expects.
The following might be how mkiocccentry output is changed with the use of this tool (and the other tool):
As a stand alone tool, the jauthchk tool will be invoked by other tools as part of the IOCCC submission process. That process is beyond the scope of this repo. Suffice it to say that the IOCCC judges will use this tool as part of their submission workflow.
Here is a possible command line usage message:
NOTE: We mention file above even though the canonical filename will be .author.json. The tool should NOT check, nor object to using a different filename.
The mkiocccentry tool will need to invoke this tool. As such a similar method used to find and specify the location of txzchk should be used. As this tool is one of 2 tools being considered, we recommend the following be added to the mkiocccentry command line:
IMPORTANT: While it might be tempting to consider depending on some general JSON checker, we do NOT need nor want that. It is important that the mkiocccentry GitHub repo remain stand alone. I.e., all the code needed by someone wishing to enter the IOCCC (beside a C compiler, make, tar, cp, ls) should be found in this GitHub repo alone. As there is NO standard JSON tool in widespread distribution, all of the code for this tool needs to reside in this repo only.
IMPORTANT: We do not need a general JSON format checker. We only need to verify that the file contains the JSON needed and only the JSON needed for the judges to process IOCCC entries.
While it is NOT recommended, if someone wishes to edit their .author.json and re-create the compressed tarball we cannot stop them. As such mkiocccentry should be STRICT on what it writes into .author.json AND jauthchk should be permissive (but not to a fault) in what it considers as OK.
This tool should neither generate an error, nor warn if someone were to reformat the JSON. And as JSON is not order dependent, if someone wishes to reorder the JSON elements, that is fine. As long as all the required JSON elements are present, and no new JSON elements are found, and the version string matches, all is OK.
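The "all required elements present, no new elements, any order" rule could be checked with something like the sketch below. Everything here is illustrative: check_names is a hypothetical function, the required[] list is only a small subset of names taken from the sample shown in the followup comment, and the names are assumed to have been extracted from the file already. The text does not say how repeated names should be treated, so this sketch ignores duplicates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* illustrative subset of element names, not the full required list */
static char const * const required[] = {
    "ioccc_contest", "ioccc_year", "entry_num", "authors"
};
#define REQUIRED_COUNT (sizeof(required) / sizeof(required[0]))

/*
 * check_names - hypothetical order-independent element name check
 *
 * Returns true if every name in found[] is a required name (no new
 * elements) and every required name occurs somewhere in found[].
 * The order of found[] does not matter.
 */
bool
check_names(char const * const *found, size_t found_count)
{
    bool seen[REQUIRED_COUNT] = { false };
    size_t i, j;

    for (i = 0; i < found_count; ++i) {
        for (j = 0; j < REQUIRED_COUNT; ++j) {
            if (strcmp(found[i], required[j]) == 0) {
                seen[j] = true;
                break;
            }
        }
        if (j == REQUIRED_COUNT) {
            return false;       /* an element that is not expected */
        }
    }
    for (j = 0; j < REQUIRED_COUNT; ++j) {
        if (!seen[j]) {
            return false;       /* a required element is missing */
        }
    }
    return true;
}
```

A reordered or reformatted file yields the same set of names and so passes, while an added or dropped element fails, which is the permissive-but-not-to-a-fault behaviour described above.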
See the followup comment for details on the checks needed against an .author.json file.