Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: Replace binary data blobs under test_JSON with uuencoded blobs #239

Closed
lcn2 opened this issue Jun 4, 2022 · 35 comments
Closed
Assignees
Labels
enhancement New feature or request

Comments

@lcn2
Copy link
Contributor

lcn2 commented Jun 4, 2022

To avoid the annoyance of carrying around binary blobs in the repo, binary blobs under test_JSON should be replaced with uuencoded ASCII files. The json_test.sh should be modified to uudecode such files during the test.

Consider using a .uu file extension for such unencoded binary blobs under test_JSON. Consider how to clean up such uudecoded tests: within the json_test.sh itself and/or the make clean (or make clobber) rule.

@xexyl
Copy link
Contributor

xexyl commented Jun 4, 2022

Good call on these tools. I tend to forget them as I rarely need them. I am not sure if I will do this as I should focus on dbg but maybe I will look at it if I don’t have the energy and motivation for the other but do for this.

A question to consider: the option letters to specify the paths to the tools. My thinking is:

-u for uuencode and -U for the decode counterpart.

But maybe you think it should be the other way round? What do you think?

(Disclaimer: typing this on the phone: but if I made a typo on the tools and their names you know what I mean!)

@lcn2 lcn2 changed the title Enhancement: Replace binary data blobs under test_JSON with c blobs Enhancement: Replace binary data blobs under test_JSON with uuencoded blobs Jun 12, 2022
@xexyl
Copy link
Contributor

xexyl commented Jul 4, 2022

I just remembered this issue and I know nothing has happened with it so I thought I'd reply to this to get some conversation going (besides what I suggested already).

To avoid the annoyance of carrying around binary blobs in the repo, binary blobs under test_JSON should be replaced with uuencoded ASCII files. The json_test.sh should be modified to uudecode such files during the test.

I guess the script that you refer to is now: old.chk_test.sh. Is that correct?

I still think this script will be necessary so I think this issue is a valid issue still but it might be that this modification will not be so useful to worry about until after chkentry is written as this script is to help test it.

Consider using a .uu file extension for such unencoded binary blobs under test_JSON. Consider how to clean up such uudecoded tests: within the json_test.sh itself and/or the make clean (or make clobber) rule.

But with that in mind I suppose the script could be modified so that first it finds *.json (using find) and then as you suggested find *.json.uu. But the question is if these files are removed by make clobber won't this rather defeat the purpose of using the tool? I mean the point is to not have binary data in text files, right?

EDIT 0:

Ah. You must mean it would remove the uudecoded files. In that case maybe it shouldn't be done with make clobber but once the script is finished. Maybe make clobber could also do it just to be sure but the script should probably do it as well?

@lcn2
Copy link
Contributor Author

lcn2 commented Jul 4, 2022

On finding a *.json.uu file, the script could use uudecode(1) and pipe the output into the test. There is no need to form decoded files and then test them.

@xexyl
Copy link
Contributor

xexyl commented Jul 4, 2022

On finding a *.json.uu file, the script could use uudecode(1) and pipe the output into the test. There is no need to form decoded files and then test them.

That’s true. I was still waking up and trying to fit it into the make clobber idea. This means that we don’t even need to modify that rule I guess.

@xexyl
Copy link
Contributor

xexyl commented Jul 5, 2022

I started working on this but it occurred to me that much more has to be done here as now the semantics of the checks also have to change as it's not two tools but one. This requires too many changes in my tired state (truthfully I'm exhausted and I was thinking all I'd have to do is parse a -u option and then add a loop for each info and auth check for uuencoded files and then .. well you know - but there's more to it).

So for now I'm not going to do that and I don't think I'll end up doing anything else today either I'm afraid.

Hopefully tomorrow I can do something or some things but we'll see. It's still possible I do something but I don't know if I will and I rather doubt it as I need to just rest and I have a zoom meeting in the afternoon.

I do know one thing I could do but I'm realising that I'm too tired for that even even though it's very simple.

Hope you're having a nice sleep my friend!

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

Something occurred to me on this issue. If we want this as part of the test procedure - which we will want with bug_report.sh testing - then there's a problem with this idea. Well not the idea itself but more so that these two tools are not always installed. So how to handle this?

If the idea is to get rid of NUL files from the repo it forces the user to have these tools. But what if they don't have them? Then we cannot test the NUL bytes. If they do it's not a problem. Now we could maybe have two sets of tests: one is the regular files and one is the binary tests which instead of aborting if the tools are not found it simply skips the tests.

What do you think?

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

... strange problem ...

$ /usr/bin/uudecode -o /dev/stdout ./test_JSON/bad/nul0.json.uu |./jparse -v 0 -J 0 -- -
ERROR[33]: parse_json_stream: stream is not open
ERROR[1]: ./jparse: invalid JSON

Except for that this is done.

EDIT 0

Using - in other cases is fine btw. The NUL uu file looks like:

$ cat test_JSON/bad/nul0.json.uu  
begin 644 nul0.json
$>P!]"@``
`
end

and it's possible I have done something wrong as well of course.

UPDATE 0

It must be because of:

    /*
     * determine stream position
     *
     * The ftell() function will fail if stream is not open.
     */
    pos = ftell(stream);
    if (pos < 0) {
        return false;
    }

Should we not use this if the file is stdin ?

UPDATE 1:

It's because of this. Modifying that function a bit I get:

$ /usr/bin/uudecode -o /dev/stdout ./test_JSON/bad/nul0.json.uu |./jparse -v 0 -J 0 -- -
Warning: is_open_stream: ftell() returned < 0: errno[29]: Illegal seek

EDIT 1

Can you seek on binary data ? I'm not sure. It seems like one should be able to since the script works fine on the file with NUL itself but not when it's on stdin.

EDIT 2

Another test: redirecting the uudecode of the file to some other file and then running jparse on that file and it has no problem.

UPDATE 1

I fixed this which you'll see below.

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

.. and with commit 550ab6a I believe this issue is resolved. You can let me know what you think. The log details the change to the function that was causing a problem.

EDIT 0

Oh wait. I forgot. This might not be strictly true that all binary data is removed. There are a few places I just remembered. The info.json/bad and author.json/bad subdirectories also have binary data. But the question is: what actually makes use of those files ? Do we still need them ? If we do what tests it ? I'm not sure but at least the one part is fixed.

EDIT 1

Okay it's actually the old check script. Can we safely get rid of the test_JSON/{info,author}.json subdirectories ? We would keep test_JSON/{good,bad} but remove the others.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

Can you seek on binary data ?

yes

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

Something occurred to me on this issue. If we want this as part of the test procedure - which we will want with bug_report.sh testing - then there's a problem with this idea. Well not the idea itself but more so that these two tools are not always installed. So how to handle this?

If the idea is to get rid of NUL files from the repo it forces the user to have these tools. But what if they don't have them? Then we cannot test the NUL bytes. If they do it's not a problem. Now we could maybe have two sets of tests: one is the regular files and one is the binary tests which instead of aborting if the tools are not found it simply skips the tests.

What do you think?

By NUL files you mean files with NUL bytes in them?

We don't think we need to that. It is not a problem for people using the mkioccentry(1) tool. If someone wants to maintain it, they will need to required tools including things like an up to date bison(1) and flex(1) as well as the skills needed to handle data. :-)

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

Can you seek on binary data ?

yes

Yes which I tested. I was just trying to think of 'what could it be'. I never did get quite the right answer except that maybe it was a fifo. Not sure now .. that was too many hours ago now.

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

Something occurred to me on this issue. If we want this as part of the test procedure - which we will want with bug_report.sh testing - then there's a problem with this idea. Well not the idea itself but more so that these two tools are not always installed. So how to handle this?
If the idea is to get rid of NUL files from the repo it forces the user to have these tools. But what if they don't have them? Then we cannot test the NUL bytes. If they do it's not a problem. Now we could maybe have two sets of tests: one is the regular files and one is the binary tests which instead of aborting if the tools are not found it simply skips the tests.
What do you think?

By NUL files you mean files with NUL bytes in them?

Of course.

We don't think we need to that. It is not a problem for people using the mkioccentry(1) tool. If someone wants to maintain it, they will need to required tools including things like an up to date bison(1) and flex(1) as well as the skills needed to handle data. :-)

Yeah. Well I just made it skip the test if they don't have the right tool. I presume that's okay.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

We are considering closing this issue with a Close as not planned as this is neither required nor a priority.

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

We are considering closing this issue with a Close as not planned as this is neither required nor a priority.

Sure though it doesn't hurt to have the uudecode in there I guess. If you need me to change it I can roll that back though.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

Sure though it doesn't hurt to have the uudecode in there I guess. If you need me to change it I can roll that back though.

Well on 2nd thought, perhaps the use of uudecode(1) is a good idea (and thus keep this issue open until it is finished) makes things easier to maintain?

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

Sure though it doesn't hurt to have the uudecode in there I guess. If you need me to change it I can roll that back though.

Well on 2nd thought, perhaps the use of uudecode(1) is a good idea (and thus keep this issue open until it is finished) makes things easier to maintain?

Well as I said the issue is resolved unless we need to do something with the old scripts that use test_JSON/{info,auth}.json/bad subdirectories. If that's the case I can adapt it maybe but I'll need to know more about what needs to happen. Right now since the scripts were removed from the repo I presume this might not be necessary? In which case we could probably delete those subdirectories of test_JSON (just not the good/bad as those are for json in general) and close this?

@lcn2 lcn2 closed this as completed in 550ab6a Oct 30, 2022
@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

Should we delete those two subdirectories then ? Or should we hold on to them in case they're necessary ?

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

Sure though it doesn't hurt to have the uudecode in there I guess. If you need me to change it I can roll that back though.

Well on 2nd thought, perhaps the use of uudecode(1) is a good idea (and thus keep this issue open until it is finished) makes things easier to maintain?

Well as I said the issue is resolved unless we need to do something with the old scripts that use test_JSON/{info,auth}.json/bad subdirectories. If that's the case I can adapt it maybe but I'll need to know more about what needs to happen. Right now since the scripts were removed from the repo I presume this might not be necessary? In which case we could probably delete those subdirectories of test_JSON (just not the good/bad as those are for json in general) and close this?

The idea, of course, was that the bad directories would contain invalid JSON files in order to test that the parser will reject invalid files.

Under test_JSON we see:

  • author.json
  • bad
  • good
  • info.json

We do like the idea that the generic good and bad sub-directories (those that are independent of the .author.json and .info.json file concepts) exist, BTW. This way, bogus JSON files that have nothing to do with .author.json and .info.json can be considered.

We suspect that tests for chkentry(1) will focus on files under test_JSON/{info,author).json/{good,bad}/ and ignore the top level test_JSON/bad/ because in order for semantic tests to start, the JSON parser must first declare the code valid.

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

Sure though it doesn't hurt to have the uudecode in there I guess. If you need me to change it I can roll that back though.

Well on 2nd thought, perhaps the use of uudecode(1) is a good idea (and thus keep this issue open until it is finished) makes things easier to maintain?

Well as I said the issue is resolved unless we need to do something with the old scripts that use test_JSON/{info,auth}.json/bad subdirectories. If that's the case I can adapt it maybe but I'll need to know more about what needs to happen. Right now since the scripts were removed from the repo I presume this might not be necessary? In which case we could probably delete those subdirectories of test_JSON (just not the good/bad as those are for json in general) and close this?

The idea, of course, was that the bad directories would contain invalid JSON files in order to test that the parser will reject invalid files.

Of course.

Under test_JSON we see:

  • author.json
  • bad
  • good
  • info.json

We do like the idea that the generic good and bad sub-directories (those that are independent of the .author.json and .info.json file concepts) exist, BTW. This way, bogus JSON files that have nothing to do with .author.json and .info.json can be considered.

That's indeed why I added it. More specifically it was for when you do a mass test of the parser that you referred to not too long ago.

We suspect that tests for chkentry(1) will focus on files under test_JSON/{info,author).json/{good,bad}/ and ignore the top level test_JSON/bad/ because in order for semantic tests to start, the JSON parser must first declare the code valid.

I'd think that too. So are you saying the files should stay there? If that's the case will there need to be a new script that tests chkentry(1)? Now I think on it that's probably a good idea. It could also use uudecode and it'd be added to ioccc_test.sh I guess.

What do you think ?

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

Should we delete those two subdirectories then ? Or should we hold on to them in case they're necessary ?

As for the test_JSON/{info,author).json/bad/ directories. Those should be reserved for VALID JSON files that are otherwise improper (semantically invalid) as .author.json and .info.json files, perhaps?

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

Should we delete those two subdirectories then ? Or should we hold on to them in case they're necessary ?

As for the test_JSON/{info,author).json/bad/ directories. Those should be reserved for VALID JSON files that are otherwise improper (semantically invalid) as .author.json and .info.json files, perhaps?

That would be one way of dealing with I suppose. It would remove the need for uudecode. The only possible thing to consider is what if someone wants to screw with the contest by submitting binary data in the json file? Okay the entry would be rejected but should chkentry check it? Or would the fact the json parser would first declare it as invalid be enough? But then consider if someone comes up with a clever way to circumvent that: if we had files that are invalid json we could then test it? Just a few random thoughts on that.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

I'd think that too. So are you saying the files should stay there?

Yes, the test_JSON/{info,author).json/{good,bad}/ should remain.

In particular, test_JSON/good/ should be for valid JSON test cases that have nothing to do with the IOCCC AND test_JSON//bad/ should be for invalid JSON files.

If you prefer, you can rename them:

  • test_JSON/good ==> test_JSON/generic/good
  • test_JSON/bad ==> test_JSON/generic/bad

As that makes the purpose of those files more clear.

If that's the case will there need to be a new script that tests chkentry(1)?

Yes, we working on such script as part of our chkentry(1) testing/

Now I think on it that's probably a good idea. It could also use uudecode and it'd be added to ioccc_test.sh I guess.

Yes, please. If you want to both add uudecode(1) to the set of optional tools AND even complete issue #239, and do the above directory rename,g that would be helpful.

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

I'd think that too. So are you saying the files should stay there?

Yes, the test_JSON/{info,author).json/{good,bad}/ should remain.

Right.

In particular, test_JSON/good/ should be for valid JSON test cases that have nothing to do with the IOCCC AND test_JSON//bad/ should be for invalid JSON files.

Yes. That's what I had in mind.

If you prefer, you can rename them:

  • test_JSON/good ==> test_JSON/generic/good
  • test_JSON/bad ==> test_JSON/generic/bad

As that makes the purpose of those files more clear.

Or maybe general or just json? The latter might be more clearer? Read it as it's a json file that isn't specially an info.json or auth.json file?

If that's the case will there need to be a new script that tests chkentry(1)?

Yes, we working on such script as part of our chkentry(1) testing/

Makes sense.

Now I think on it that's probably a good idea. It could also use uudecode and it'd be added to ioccc_test.sh I guess.

Yes, please. If you want to both add uudecode(1) to the set of optional tools AND even complete issue #239, and do the above directory rename,g that would be helpful.

Well I can do the renaming but until the chkentry script is made I of course cannot add uudecode to it. However jparse_test.sh was updated to use it with commit 550ab6a.

So as far as I can tell right now the only thing I can do is rename the directories. Is json a good idea or do you prefer one of the others?

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

Thanks for closing this issue, @xexyl. We think it is for the best.

Please undo the need for uudecode(1) as well.

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

Thanks for closing this issue, @xexyl. We think it is for the best.

Please undo the need for uudecode(1) as well.

How do you mean? Undo the changes to jparse_test.sh and remove the NUL byte files? To be clear it's not actually a need - it's only if the user has it. Else the test is just disabled.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

Should we delete those two subdirectories then ? Or should we hold on to them in case they're necessary ?

As for the test_JSON/{info,author).json/bad/ directories. Those should be reserved for VALID JSON files that are otherwise improper (semantically invalid) as .author.json and .info.json files, perhaps?

That would be one way of dealing with I suppose. It would remove the need for uudecode.

We realize that we are changing our minds on the uudecode(1) issue.

Changing is what minds are for! :-)

Perhaps we should NOT fear binary blobs after all. :-)

Let us remove the need for uudecode(1) and simplify the test code.

The only possible thing to consider is what if someone wants to screw with the contest by submitting binary data in the json file? Okay the entry would be rejected but should chkentry check it? Or would the fact the json parser would first declare it as invalid be enough?

If the JSON parser rejects the file as invalid JSON, then chkentry(1) won't even apply any of the semantical tests.

No need to special case binary stuff. Let such files pass or fail by the JSON parser.

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

Should we delete those two subdirectories then ? Or should we hold on to them in case they're necessary ?

As for the test_JSON/{info,author).json/bad/ directories. Those should be reserved for VALID JSON files that are otherwise improper (semantically invalid) as .author.json and .info.json files, perhaps?

That would be one way of dealing with I suppose. It would remove the need for uudecode.

We realize that we are changing our minds on the uudecode(1) issue.

Changing is what minds are for! :-)

Funny! :-)

Perhaps we should NOT fear binary blobs after all. :-)

Let us remove the need for uudecode(1) and simplify the test code.

Sure. So rename the files back and change the script to what it was before. Also change the man page.

The only possible thing to consider is what if someone wants to screw with the contest by submitting binary data in the json file? Okay the entry would be rejected but should chkentry check it? Or would the fact the json parser would first declare it as invalid be enough?

If the JSON parser rejects the file as invalid JSON, then chkentry(1) won't even apply any of the semantical tests.

So I thought.

No need to special case binary stuff. Let such files pass or fail by the JSON parser.

Sure.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

Thanks for closing this issue, @xexyl. We think it is for the best.
Please undo the need for uudecode(1) as well.

How do you mean? Undo the changes to jparse_test.sh and remove the NUL byte files? To be clear it's not actually a need - it's only if the user has it. Else the test is just disabled.

Remove the need for the uudecode(1), remove it from an option tool list, do not make use of uudecode(1) in the test tools.

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

Thanks for closing this issue, @xexyl. We think it is for the best.
Please undo the need for uudecode(1) as well.

How do you mean? Undo the changes to jparse_test.sh and remove the NUL byte files? To be clear it's not actually a need - it's only if the user has it. Else the test is just disabled.

Remove the need for the uudecode(1), remove it from an option tool list, do not make use of uudecode(1) in the test tools.

Yes doing that now. Shouldn't take long.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

If you prefer, you can rename them:

  • test_JSON/good ==> test_JSON/generic/good
  • test_JSON/bad ==> test_JSON/generic/bad

As that makes the purpose of those files more clear.

Or maybe general or just json? The latter might be more clearer? Read it as it's a json file that isn't specially an info.json or auth.json file?

We like the general and the json names, but it seems that general.json is the best of both.

Why don't you rename the directories and update the test tools accordingly?

In particular, test_JSON/{author.info}.json/ should be reserved for VALID JSON files.

The test_JSON/{author.info}.json/good/ for VALID JSON files that are semantically valid .{author.info}.json files.

The test_JSON/{author.info}.json/bad/ for VALID JSON files that are semantically invalid .{author.info}.json` files.

The test_JSON/general.json/{good,bad}/ should focus on general JSON testing.

UPDATE 0

Minor edits to the above.

@xexyl
Copy link
Contributor

xexyl commented Oct 30, 2022

If you prefer, you can rename them:

  • test_JSON/good ==> test_JSON/generic/good
  • test_JSON/bad ==> test_JSON/generic/bad

As that makes the purpose of those files more clear.

Or maybe general or just json? The latter might be more clearer? Read it as it's a json file that isn't specially an info.json or auth.json file?

We like the general name. Why don't you rename the directories and update the test tools accordingly?

In particular, test_JSON/{author.info}.json/ should be reserved for VALID JSON files. The test_JSON/{author.info}.json/good/ for VALID JSON files that are semantically valid .{author.info}.json files AND test_JSON/{author.info}.json/bad/ for VALID JSON files that are semantically invalid .{author.info}.json` files.

Whereas test_JSON/general.json/{good,bad}/ should focus on general JSON testing.

Ah. Well this will slow me down slightly. I was just writing a commit message. I'll save it, make changes and then do a git commit amend. But as for the only good files in the {author,info}.json directories that will take more time to figure out which files should be removed and which should not so I'll not worry about that now. I don't want to accidentally remove a test file that is actually needed!

I'll be making a pull request soon.

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 30, 2022

I'll be making a pull request soon.

Thanks .. we will wait for it.

@xexyl
Copy link
Contributor

xexyl commented Oct 31, 2022

I'll be making a pull request soon.

Thanks .. we will wait for it.

Well although you already saw I wanted to say you're welcome.

Also with commit 66622c8 the invalid json files were removed from test_JSON/{info,auth}.json/bad. A special note is there is still one file (or at least one?) that has a NUL byte but that's because it's in a string which is valid.

I don't know if this file will fail for some other reason but if not and it's not checked in our functions this might need to be added? I'm not sure: what do you think ?

@lcn2
Copy link
Contributor Author

lcn2 commented Oct 31, 2022

I'll be making a pull request soon.

Thanks .. we will wait for it.

Well although you already saw I wanted to say you're welcome.

Also with commit 66622c8 the invalid json files were removed from test_JSON/{info,auth}.json/bad. A special note is there is still one file (or at least one?) that has a NUL byte but that's because it's in a string which is valid.

I don't know if this file will fail for some other reason but if not and it's not checked in our functions this might need to be added? I'm not sure: what do you think ?

There will be a bunch of additions to test_JSON soon.

@xexyl
Copy link
Contributor

xexyl commented Oct 31, 2022

I'll be making a pull request soon.

Thanks .. we will wait for it.

Well although you already saw I wanted to say you're welcome.
Also with commit 66622c8 the invalid json files were removed from test_JSON/{info,auth}.json/bad. A special note is there is still one file (or at least one?) that has a NUL byte but that's because it's in a string which is valid.
I don't know if this file will fail for some other reason but if not and it's not checked in our functions this might need to be added? I'm not sure: what do you think ?

There will be a bunch of additions to test_JSON soon.

Sounds good. I'll be leaving soon so I hope what I did today will work for you. If not I'll have to fix it tomorrow.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants