
Enhancement: write JSON utilities for jfmt, jval and jnamval #523

Open
7 of 29 tasks
lcn2 opened this issue May 24, 2023 · 890 comments
Assignees
Labels
enhancement New feature or request

Comments

@lcn2
Contributor

lcn2 commented May 24, 2023

TODO

other TODOs

Pre-implementation

  • Resolve issue Bug report: jparse -J 3 does not correctly print JSON number values #785
  • Decide how jfmt should format JSON
  • Usage messages produced by -h
  • Complete command line parsing without implementation
  • Finish man pages, essentially the examples (perhaps the useful ones from jparse/json_util_README.md)
  • Send stdout & stderr to a log file for print_test during make test

TODOs not in the json_util_README.md writeup

  • Add -I stuff to jnamval

    See comment 1664716491.

  • Change -t type in jnamval to match the list of types used by jval

    See comment 1664729434.

  • Add -P type to jnamval

    See comment 1664733695.

  • Add -I stuff (identical to what jfmt uses) to jnamval

    See comment 1664799913.

  • Add a few JSON format styles, via -F fmt, for both jfmt and jnamval

    See comment 1664830588.

  • Update the json_util_README.md file for the -F option.

  • Finalize jparse/json_util_README.md examples and fix typos

  • Update jfmt(1), jval(1) and jnamval(1) man pages to reference the GitHub URL for json_util_README.md

    See comment 1666669582.

  • Update -p usage message for jnamval

    See comment 1667145128.

  • Update exit codes and usage message exit code text

    See comment 1667176965.

  • Update json_util_README.md to match the -p updates

  • Add "ne" to the set of ops for -n ne=num and -S ne-str

    See comment 1670365375

  • Add -a to both jval and jnamval

    See comment 1670409906.

Finally

  • Update jparse/json_util_README.md as needed and as per the above TODOs in this section.

  • Add functions to turn on ANSI colour, turn off ANSI colour and to print out text with said colour. How to let users configure colours is not yet determined.

  • Agree the specifications are complete

  • Update struct json

    See the Implementation notes section of comment 1664179993.

  • Write JSON parse tree search and match facility.

  • Write jfmt

  • Write jval

  • Write jnamval

  • Check that tmp/TODO.md is complete

See json_util_README.md for information about the JSON utilities for jparse.
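The ANSI colour helpers in the "Finally" list above could look something like this minimal Python sketch (the function names and the colour table are invented for illustration; the real helpers would presumably be written in C alongside the rest of jparse, and how users configure colours is still undecided):

```python
# Hypothetical sketch of the proposed ANSI colour helpers.
# The names and the colour table below are invented, not part of the repo.

ANSI = {'red': '31', 'green': '32', 'cyan': '36'}

def color_on(name):
    """Return the SGR escape sequence that turns the given colour on."""
    return '\033[%sm' % ANSI.get(name, '0')

def color_off():
    """Return the SGR reset sequence that turns colour off."""
    return '\033[0m'

def print_color(text, name):
    """Print text in the given colour, then reset."""
    print(color_on(name) + text + color_off())

assert color_on('cyan') == '\033[36m'
assert color_off() == '\033[0m'
```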

@xexyl
Contributor

xexyl commented May 24, 2023

Please assign this to me and I will hopefully be able to look at it much more thoroughly tomorrow.

I am afraid that right now it is beyond my faculties. Been awake too long with too little sleep.

But hopefully tomorrow I will be clearer. I suspect that we will have to discuss some of this but we shall see.

@xexyl
Contributor

xexyl commented May 25, 2023

There's a lot to process with this issue and right now I'm not awake enough yet but I do hope to be later on. I do want to reply to one part though:

We believe that once the "temp-test-ioccc" has put the JSON parser thru its paces via tools that will use the JSON parser code, then it will be a good time to form the jparse repo.

I like this a lot! Sounds good.

I'm going to take a break now (I think) and later on I'll look at reading this and then replying. I'm not sure how to best reply yet: it might be it's necessary to reply in parts just to help us gather our thoughts as it is a long comment.

@xexyl
Contributor

xexyl commented May 25, 2023

With commit 2f9c0fb I created the initial C and header files and updated the Makefile to compile it. All it does is exit(0) for now but this way we don't have to worry about creating the files and updating the Makefile when it's time to start writing code.

I'm taking a break now and I hope to return to this issue later today ... hope you're sleeping well! I'm doubting that I'll be able to go back to sleep but I'm going to at least rest a bit.

Oh and please assign this to me when you get a chance.

@xexyl
Contributor

xexyl commented May 25, 2023

I started to write the command line parser - or rather part of it - but I noticed we should come up with a usage message first. I actually did think of this but I felt like it might be developed whilst working on the command line options. Then we could worry about the next step (or is that NeXTSTEP ? :-) ) and only after that we could write the actual code.

I also think this will be a joint effort so I changed the comment to say this is being co-developed by both of us.

However since I see the part about coming up with the usage message first I have not committed. I might do just to get it out of the way but it would be incomplete.

At this point though I am too tired to fully process the possible options and some are not clear. But it might be that it's more clear once I've read the OP in full. To start out though (I'll make a new comment for additions) am I right in saying that:

  • -f specifies which type of data to print as a comma separated list
  • -j prints the json name:value pair (?)
  • after the file is specified the field to print (or is it fields?) should be specified
  • -J is json debug level
  • -q is quiet mode (what would that mean?)
  • -v is debug level
  • -V is version info
  • -h is help/usage

I'm not sure what the key option -k is for exactly versus the arg or args after the file name. I haven't looked at anything else but I hope this is a good start.

I'm afraid I probably won't get much more done here today but it depends on when you get back and also if I'm even awake .. finding myself having to sleep in the day lately and I leave pretty early too nowadays (as in shut the laptop down .. go to bed several hours later). Still I do hope to at least read the OP here and I also hope that we can get the discussion rolling!

@xexyl
Contributor

xexyl commented May 25, 2023

With commit 1f36048 the standard option parsing is out of the way so that WE CAN NOW FOCUS ON THE TOOL ITSELF. Once we have discussed this thoroughly then we can start adding code for the tool!

@xexyl
Contributor

xexyl commented May 25, 2023

I've not forgotten this but I think I will have to return to it tomorrow. I do hope that the above comments and the two commits are a good start. I agree that we should come up with a usage string and doing that might give us more ideas too. It was a pretty difficult night and an early morning and I need to do other things now.

There's a lot of good material in your comment and I want to give it the attention it deserves and I don't think I can do that today. Sorry but I think it'll be better that way and I guess you'd agree too. Even so I hope we can start the conversation on usage string.

@xexyl
Contributor

xexyl commented May 25, 2023

Okay I had a chance to look at it quickly but not thoroughly. I believe that the below is enough to start the conversation but I'll have to continue tomorrow as I'll be leaving soon. I hope this is of help!

usage: jprint [-j] [-t width] [-k] [-p] [-Q] [-e] [-v] [-J] [-f type[,type,...]] [-c count_spec] [-C] [-V]
              [-L] [-l level]
    -j      print JSON syntax values (what is a syntax value as such ?)
    -t      specify tab width
    -k      print key or value (?? how is it both? What is the idea here?)
    -p      print contents of file if valid
    -Q      print surrounding '"' in strings
    -e      print string as a JSON encoded string
    -q      quiet mode
    -v      verbosity level
    -J      JSON verbosity level
    -f      comma separated list of JSON data types to print
    -c      see man page for more details (for now)
    -C      compound (?? is this correct? What do you have in mind here?)
    -L      print depth/levels of data
    -l      only print data if at specified level (or levels ?)
    -V      print version and exit
    

-J conflicts with your idea but it might be better to choose something else for yours since this way the -J is consistent with what we have elsewhere?

You addressed my immediate concern when the same field is repeated but what about arrays? How will arrays be dealt with?

I put questions in the usage string too that might want to be thought about. Some options not listed by you were added but they might or might not be of value. If not they can be removed or modified or whatever is needed.

What do you think ?

@lcn2
Contributor Author

lcn2 commented May 27, 2023

Mini holiday here .. we hope to return today sometime to answer questions.

@xexyl
Contributor

xexyl commented May 27, 2023

Mini holiday here .. we hope to return today sometime to answer questions.

I should hope you'll be back! Hope the holiday is going well ... have fun and stay safe!

@lcn2
Contributor Author

lcn2 commented May 27, 2023

Mini holiday here .. we hope to return today sometime to answer questions.

I should hope you'll be back! Hope the holiday is going well ... have fun and stay safe!

Sorry, we have extended the holiday weekend: traveled early. Will get back to these topics.

@xexyl
Contributor

xexyl commented May 27, 2023

Mini holiday here .. we hope to return today sometime to answer questions.

I should hope you'll be back! Hope the holiday is going well ... have fun and stay safe!

Sorry, we have extended the holiday weekend: traveled early. Will get back to these topics.

No worries. I am dealing with some other problems and I am also very tired. I won't be able to do more here today anyway.

Best wishes.

@lcn2
Contributor Author

lcn2 commented May 29, 2023

We are "clearing the decks" in preparation for an important temp-test-ioccc pull request. See temp-test-ioccc comment 1567576879 for details.

We do plan to address comment 1563367357 afterwards.

@xexyl
Contributor

xexyl commented May 30, 2023

We are "clearing the decks" in preparation for an important temp-test-ioccc pull request. See temp-test-ioccc comment 1567576879 for details.

We do plan to address comment 1563367357 afterwards.

Good idea to do the merge conflict first! Having it done will help relieve some stress I am sure. I think it's a good idea how you're documenting it too so that we know how to do it in case it happens again. With that in mind maybe it should have been an issue as well, so that we could more easily access it? But I'll make a link to the pull request in the other comment in issue #5 so that we have an easier time finding it. I hope to reply to that comment now, though it might take some time and I feel like I might have to take a break before I finish it. I'm going to try reading it now and then replying, though I might do both at the same time as I often (usually maybe?) do.

@lcn2
Contributor Author

lcn2 commented May 31, 2023

Now that the temp-test-ioccc pull request 811 has been resolved, we will address comment 1563367357 and related jprint command line use cases.

... after we go to sleep and wake up and do chores and .. perhaps Thursday or Friday.

@xexyl
Contributor

xexyl commented May 31, 2023

Now that the temp-test-ioccc pull request 811 has been resolved, we will address comment 1563367357 and related jprint command line use cases.

... after we go to sleep and wake up and do chores and .. perhaps Thursday or Friday.

I'm glad you said: 'after we go to sleep ...' because it was my immediate thought when I saw the time!

I think Thursday or Friday is good too because I should be able to get the task done that I mentioned a number of times the past few days by then and with the exhaustion it's going to be slow times :( but we'll see.

Hope you're sleeping well my friend! I went to sleep a bit later and woke up earlier so it's probably going to be a difficult time in the sleepiness department later on but maybe I'll be lucky .. as Strider says to the hobbits: 'who can tell?'

@lcn2
Contributor Author

lcn2 commented Jun 2, 2023

Regarding comment 1563367357: The top level comment has an updated command line as per a number of your suggestions.

Comments, questions and corrections are, of course, welcome.

@xexyl
Contributor

xexyl commented Jun 2, 2023

Regarding comment 1563367357: The top level comment has an updated command line as per a number of your suggestions.

Comments, questions and corrections are, of course, welcome.

Looks great! My suggestions (which I started doing but am too tired to finish .. been awake since half past 1 and I hope to sleep after this or at least rest):

  • The list of options and explanations is so long that maybe it should just give a summary of each and refer to the to-be-written man page? I'm not sure: some commands do have longer help output. ls on Linux is a great example. Not sure: what do you think here? Certainly examples will be needed in the man page though...
  • Since the list will already be so long I suggest the exit codes are not in the help string but rather in the man page once it's been written.

I'm sure I have other comments but I'm too tired to think right now. That being said see commit 757b14e.

Now time for a rest whether I can truly rest or not I need to try.

UPDATE 0

Quickly .. I think that maybe we can match ls in the sense of longer help string. I'll do that: later on either today or tomorrow. I'm afraid it might be tomorrow. I do have another issue to bring up in the other repo but that will be today or tomorrow. I hope to make it today actually as it should be pretty easy to bring up .. once I have a bit more energy and remember the full details: I'm vague at this time which is why it's time to close the laptop and turn the lights off and try and rest.

Do you agree that the long usage string is the way to go? Should the mkiocccentry tool also have the full synopsis in the usage string? It now shows something like [options] but lists the options below in full: just because it's so long. The man page does have the full synopsis. Well we can talk about that later ... rest and hopefully sleep time.

UPDATE 1

Making progress on this but I have to try and rest again .. if I can't rest I'll do other things so that I am still on a break here. I do think and hope to finish this part today. Hopefully I manage.

Before I do go I want to make another comment I think is useful ...

@xexyl
Contributor

xexyl commented Jun 2, 2023

See the TODO list in the top comment.

@xexyl
Contributor

xexyl commented Jun 2, 2023

I'm kind of running into a problem with the usage string and the code. Not that the code is difficult but rather the formatting and the length of it. I think that because of how hard last night was I will delay working on the rest of the usage string for now. I expect that if I don't return to it today I will get to it tomorrow.

Perhaps I can get to other things here or the other repo today too but we'll see. I would like to believe this is likely but I am afraid it probably isn't. Hope you have a good day and I'll either get back to this today or if not then tomorrow.

@xexyl
Contributor

xexyl commented Jun 2, 2023

As for ...

-g	grep-like extended regular expressions are used to match (def: name args are not regex)
		    To match from the name beginning, start name_arg with ^
		    To match to the name end, end name_arg with $
		    To match the entire name, enclose name_arg between ^ and $
		    Use of -g conflicts with -S

Do we want to make use of the regular expression library ? I mean if we're going to do the line boundaries should we go a bit further?

I am afraid that yes I am just too tired to do more today but I hope tomorrow I can finish the usage string and possibly come up with any additional thoughts in regards to it. I did notice some things in it that might not be entirely clear but I'm not sure if it was unclear to me or unclear full stop. I also fixed a typo (well something like it) but that's not in yet - it's in the pending changes too.

Good day!

UPDATE 0

Quickly .. did you write about the -S option listed in -g and I missed it or is it one you forgot to bring up? (I mean in the usage string.)

Leaving now.

UPDATE 1, 03 June 2023

You can ignore the question about the -S: I did indeed just miss it.

Also as for the grep idea see comment #523 (comment) for more thoughts about this.

@xexyl
Contributor

xexyl commented Jun 3, 2023

With commit d2a42b6 the usage string is for now complete:

--

New jprint version - 'complete' help string

After many, many, many (many, many, many!) :-) new inodes[0], the usage string is at this time complete: for certain values of complete :-) There are some things that are unclear to me but these will be ironed out with discussion on GitHub (not that it's physically on or IN GitHub :-) ) and then the usage string can be fixed.

Why a new version? Because it is significant even if there is no functional addition (usage string was always printed).

Note that not even the new options are accepted and none of the options accepted are actually parsed. This can come later once details are ironed out though I possibly will add them to the getopt() call and switch() cases - but not right now.

Required number of args is 2.

[0] To those who do not know every time you save a file it creates a new inode for the same file. If you don't understand why this is I suggest you look it up (as I'm not about to explain since I've been awake since stupid o'clock and it's still stupid o'clock and I want to try and rest soon).

--

I will discuss the issues that are unclear to me and any other thoughts I had later on .. today or tomorrow if not today: or so I hope.

Hope you're sleeping well. I'll be trying to rest soon .. I hope. Been awake since 1.

@xexyl
Contributor

xexyl commented Jun 3, 2023

[0] To those who do not know every time you save a file it creates a new inode for the same file. If you don't understand why this is I suggest you look it up (as I'm not about to explain since I've been awake since stupid o'clock and it's still stupid o'clock and I want to try and rest soon).

Hope you're sleeping well. I'll be trying to rest soon .. I hope. Been awake since 1.

BTW if you don't know what stupid o'clock is: it's a funny Britishism for stupid (as in extremely :-) ) early or stupid late. Unfortunately in my case extremely early but at least I did go to sleep earlier (for this reason I feared). But it was a pretty awful night too (though not as awful as the previous night) so it'll probably be a difficult day though I hope I'm wrong there. Might be okay once I wake up though obviously it'll be long.

UPDATE 0

On the other hand it's been pretty productive even in the wee hours of the morning!

@xexyl
Contributor

xexyl commented Jun 3, 2023

Current usage string btw (with several typo fixes though being so tired it's possible I have missed some):

$ ./jprint -h
usage: ./jprint [-h] [-V] [-v level] [-J level] [-e] [-Q] [-t type] [-q] [-l lvl] [-i count]
		[-N num] [-p {n,v,b}] [-b {t,number}] [-L] [-c] [-C] [-B] [-I {t,number}] [-j] [-E]
		[-I] [-S] [-g] file.json [name_arg ...]

	-h		Print help and exit
	-V		Print version and exit
	-v level	Verbosity level (def: 0)
	-J level	JSON verbosity level (def: 0)
	-e		Print JSON strings as encoded strings (def: decode JSON strings)
	-Q		Print JSON strings surrounded by double quotes (def: do not)
	-t type		Print only if JSON value matches one of the comma-separated
			types (def: simple):

				int		integer values
				float		floating point values
				exp		exponential notation values
				num		alias for int,float,exp
				bool		boolean values
				str		string values
				null		null values
				simple		alias for 'num,bool,str,null' (the default)
				object		JSON objects
				array		JSON array
				compound	alias for object,array
				any		any type of value


	-q		Quiet mode (def: print stuff to stdout)

	-l lvl		Print values at specific JSON levels (def: any level, '0:')
			If lvl is a number (e.g. '-l 3'), level must == number
			If lvl is a number followed by : (e.g. '-l 3:'), level must be >= number
			If lvl is a : followed by a number (e.g. '-l :3'), level must be <= number
			If lvl is num:num (e.g. '-l 3:5'), level must be inclusively in the range

	-i count	Print up to count matches (def: print all matches)
			If count is a number (e.g. '-i 3'), the matches must == number
			If count is a number followed by : (e.g. '-i 3:'), matches must be >= number
			If count is a : followed by a number (e.g. '-i :3'), matches must be <= number
			If count is num:num (e.g. '-i 3:5'), matches must be inclusively in the range
			NOTE: when number < 0 it refers to the instance from last: -1 is last, -2 second to last ...

	-N num		Print only if there are only a given number of matches (def: do not limit)
			If num is only a number (e.g. '-N 1'), there must be only that many matches
			If num is a number followed by : (e.g. '-N 3:'), there must be >= num matches
			If num is a : followed by a number (e.g. '-N :3'), there must be <= num matches
			If num is num:num (e.g. '-N 3:5'), the number of matches must be inclusively in the range

	-p {n,v,b}	print JSON key, value or both (def: print JSON values)
			if the type of value does not match the -t type specification,
			then the key, value or both are not printed.
	-p name		Alias for '-p n'.
	-p value	Alias for '-p v'.
	-p both		Alias for '-p b'.

	-b {t,number}	print between name and value (def: 1)
			print a tab or spaces (i.e. '-b 4') between the name and value.
			Use of -b {t,number} without -j or -p b has no effect.
	-b tab		Alias for '-b t'.

	-L		Print JSON levels, followed by tab (def: do not print levels).
			The root (top) of the JSON document is defined as level 0.

	-c		When printing -j both, separate name/value by a : (colon) (def: do not)
			NOTE: When -C is used with -b {t,number}, the same number of spaces or tabs
			separate the name from the : (colon) AND a number of spaces or tabs
			and separate : (colon) from the value by the same.

	-C		When printing JSON syntax, always print a comma after final line (def: do not).
			Use of -C without -j has no effect.

	-B		When printing JSON syntax, start with a { line and end with a } line
			Use of -B without -j has no effect.

	-I {t,number}	When printing JSON syntax, indent levels (i.e. '-I 4') (def: do not indent i.e. '-I 0')
			Indent levels by tab or spaces (i.e. '-I 4').
			Use of -I {t,number} without -j has no effect.
	-I tab		Alias for '-I t'.

	-j		Print using JSON syntax (def: do not).
			Implies '-p b -b 1 -c -e -Q -I 4 -t any'.
			Subsequent use of -b {t,number} changes the printing between JSON tokens.
			Subsequent use of -I {t,number} changes how JSON is indented.
			Subsequent use of -t type will change which JSON values are printed.
			Use of -j conflicts with use of '-p {n,v}'.

	-E		Match the JSON encoded name (def: match the JSON decoded name).
	-I		Ignore case of name (def: case matters).
	-S		Substrings are used to match (def: the full name must match).
	-g		grep-like extended regular expressions are used to match (def: name args are not regexps).
			To match from the name beginning, start name_arg with '^'.
			To match to the name end, end name_arg with '$'.
			To match the entire name, enclose name_arg between '^' and '$'.
			Use of -g conflicts with -S.

	file.json	JSON file to parse
	name_arg	JSON element to print
jprint version: 0.0.1 2023-06-03

but as I said there are some things I'm unclear about. I'll get back to you later on that. I'm going to see about resting very soon or so that's my current plan. I don't know if I can do the getopt() update today or tomorrow but I hope if not today then indeed tomorrow. However it'll only be the getopt() and switch() update as the details of the options have to be ironed out for sure. Well maybe some don't and those might be added. I might also check that the file exists and is a regular file but that's another matter entirely. I won't yet open it: just check for it being a regular file. But that can be done at a later point today or in the coming days.
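As an aside on the -l, -i and -N notation above, the 'num', 'num:', ':num' and 'num:num' forms could be parsed along these lines (a hypothetical Python sketch, not the actual jprint code; parse_range and in_range are invented names, and the negative-index case noted for -i is not handled):

```python
# Hypothetical sketch (invented names, not jprint code) of the
# 'num', 'num:', ':num' and 'num:num' range notation shared by
# -l, -i and -N.  The -i negative-index case is not handled here.

def parse_range(spec):
    """Return an inclusive (low, high) pair; None means unbounded."""
    if ':' not in spec:
        n = int(spec)
        return (n, n)                        # '3'   -> exactly 3
    low, _, high = spec.partition(':')
    return (int(low) if low else None,       # '3:'  -> >= 3
            int(high) if high else None)     # ':3'  -> <= 3

def in_range(value, bounds):
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)

assert parse_range('3') == (3, 3)
assert in_range(4, parse_range('3:'))
assert not in_range(2, parse_range('3:'))
assert in_range(2, parse_range(':3'))
assert in_range(4, parse_range('3:5')) and not in_range(6, parse_range('3:5'))
```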

@xexyl
Contributor

xexyl commented Jun 3, 2023

With commit 79734f1 in pull request #531 I have made what I feel to be tremendous progress and what will allow us to work on the parsing of options and then immediately add code (once the details are discussed) to search for matches! I forgot in the log that I updated the jparse/README.md and the CHANGES.md file but the commit log is:

--

New jprint version - test and report valid JSON

New number of required args is 1: the file to check. It was an error to
require a pattern to match according to the spec and usage string as
well.

The jprint tool now checks that the first arg exists as a file, is a
regular file and that it can be opened for reading. If any of these are
not true it is an error. Otherwise the file is checked for valid JSON.

Once file is parsed (or we attempt to parse) as JSON the file is closed
and the file pointer is set to NULL. If invalid JSON print invalid and
exit. If valid it depends on whether another arg is specified or not. If
another arg is specified we will show the pattern requested (this will
not be done later on) and exit 0; otherwise we will print that no
pattern is requested (this will also not be done later) and exit 1.

Besides the initial parsing of options (to the extent that we have
details) we can do no more here until more details are ironed out.

TODO discuss option details and parsing, add code to parse and then
refine anything necessary. After this we can write the rest of the code,
the pattern matching. We also need to decide if more than one name_arg
can be specified. I'm mixed on this one: with grep -E one can do this so
I think it would be wise to do this but this will be determined at a
later date.

--

We can add more code once the details are discussed but now the only thing to be done is parsing of options and then processing the options and the tree if valid!

@xexyl
Contributor

xexyl commented Jun 3, 2023

QUESTION about name_arg: do we want to allow more than one ?

I personally am in favour of this as it is kind of like grep and with grep -E this is possible. We could have a loop (for example) that does the same checks on each pattern.

Do we want way to specify that ALL patterns have to be matched in the requested restrictions ?

Although the latter would complicate the tool it would make it more useful still I think.

What do you think ?

I'm going to do other things now and in particular I'm going to try and rest. I hope to later today discuss the usage string more but if not I feel great about what I did! Thus if I don't do anything else I think that's more than fine.

UPDATE 0

I see that it is supposed to allow more than one pattern but what about the additional feature I mentioned, an AND operator?
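The OR-versus-AND question could be sketched like this (hypothetical Python purely to illustrate the semantics; select is an invented name and substring matching stands in for whatever name matching the tool ends up doing):

```python
# Hypothetical sketch (invented names, not jprint code) of OR versus a
# possible AND mode when more than one name_arg is given.

def select(name, patterns, require_all=False):
    # substring matching as a stand-in for the tool's name matching
    hits = [p in name for p in patterns]
    return all(hits) if require_all else any(hits)

assert select('foo_bar', ['foo', 'bar'])                      # OR: any pattern
assert select('foo_bar', ['foo', 'bar'], require_all=True)    # AND: all patterns
assert not select('foo_baz', ['foo', 'bar'], require_all=True)
```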

@xexyl
Contributor

xexyl commented Jun 3, 2023

As for ...

-g	grep-like extended regular expressions are used to match (def: name args are not regex)
		    To match from the name beginning, start name_arg with ^
		    To match to the name end, end name_arg with $
		    To match the entire name, enclose name_arg between ^ and $
		    Use of -g conflicts with -S

Do we want to make use of the regular expression library ? I mean if we're going to do the line boundaries should we go a bit further?

An argument I thought for doing this is that it won't require (I think) any external libs and it might simplify the pattern matching code as well as allowing for the word boundaries that we are going to support and without any special code.

We also could get rid of both -g and -S.

Thoughts ?
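To illustrate what -g could mean in practice, here is a hypothetical sketch using Python's re module (Python regexps are not identical to POSIX EREs, but the anchoring behaviour is the same; a C implementation would presumably use regcomp()/regexec() from <regex.h> with REG_EXTENDED):

```python
import re

# Hypothetical sketch of -g semantics: the name_arg is treated as an
# extended regular expression and matched anywhere in the JSON name
# unless anchored with '^' and/or '$'.

def g_match(name_arg, name):
    return re.search(name_arg, name) is not None

assert g_match('foo', 'my_foo_name')      # unanchored: match anywhere
assert g_match('^foo', 'foo_name')        # '^' matches from the name beginning
assert not g_match('^foo', 'my_foo')
assert g_match('name$', 'foo_name')       # '$' matches to the name end
assert g_match('^foo_name$', 'foo_name')  # '^...$' must match the entire name
```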

@lcn2
Contributor Author

lcn2 commented Aug 8, 2023

Environmental variables

macOS ls(1) has an environmental variable $LSCOLORS. It looks something like this (a copy of mine): DxFxBxDxCxegedabagacad where each position is a colour for a different thing. This is not only not descriptive but it puts a real burden on the user. I think that's a good reason to not use this.

Make use of environment variables:

  • $JFMT_COLOR
  • $JVAL_COLOR
  • $JNAMVAL_COLOR

However, have the content of those environment variables be a complete JSON document. For example:

[
    {
        "mode" : "dark" ,
        "boolean" : "cyan",
        "string" : "red"
    }
]

The JSON document in the environment variable would consist of a JSON array containing a single JSON object containing zero or more JSON members of the form "name" : "value".

In absence of such an environment variable, a default color scheme would be used.

If the environment variable contained invalid JSON, then the contents of the environment variable would be silently ignored, unless the -v level mode was ">=1" in which case a debug message about the invalid JSON would be printed.

If a certain JSON member were missing from the environment variable, such as if it were missing { "number" : "green" }, then the default for the given item would be used.

If the JSON document in the environment variable was valid, and were to contain an unexpected JSON tree structure, then that other JSON information would be silently ignored, unless if the -v level mode was ">=1" in which case a debug message about what is being ignored would be printed.

If the JSON document in the environment variable was valid, and was in the proper JSON tree structure (a JSON array containing a single JSON object containing zero or more JSON members of the form "name" : "value") and one of the "name" or "value" contained an unexpected JSON string contents, then that unexpected JSON string contents would be silently ignored, unless if the -v level mode was ">=1" in which case a debug message about what is being ignored would be printed.

If all is well with the JSON document in the environment variable and if the -v level was ">=1", then an informative debug message would be printed.

If the -v level was ">=3", then informative messages about what color scheme values are used would be printed.
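A hypothetical sketch of the $JVAL_COLOR lookup described above (Python for brevity; load_color_scheme and DEFAULT_COLORS are invented names, and the default values are made up for illustration):

```python
import json
import os

# Hypothetical sketch of the proposed environment-variable colour scheme.
# Invented names throughout; the defaults below are made up for illustration.
DEFAULT_COLORS = {'mode': 'dark', 'boolean': 'cyan',
                  'string': 'red', 'number': 'green'}

def load_color_scheme(env_name='JVAL_COLOR', verbose=0):
    colors = dict(DEFAULT_COLORS)
    raw = os.environ.get(env_name)
    if raw is None:
        return colors                      # no env var: use defaults
    try:
        doc = json.loads(raw)
    except ValueError:
        if verbose >= 1:
            print('debug: invalid JSON in $%s, ignored' % env_name)
        return colors                      # invalid JSON: silently ignored
    # expected shape: a JSON array containing a single JSON object
    if not (isinstance(doc, list) and len(doc) == 1 and isinstance(doc[0], dict)):
        if verbose >= 1:
            print('debug: unexpected JSON tree in $%s, ignored' % env_name)
        return colors
    for name, value in doc[0].items():
        if name in colors and isinstance(value, str):
            colors[name] = value           # known member: override default
        elif verbose >= 1:
            print('debug: ignoring unexpected member %r' % name)
    return colors

os.environ['JVAL_COLOR'] = '[{"string": "blue"}]'
scheme = load_color_scheme()
assert scheme['string'] == 'blue'      # member present: overridden
assert scheme['boolean'] == 'cyan'     # member missing: default kept
```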

@xexyl
Contributor

xexyl commented Aug 8, 2023

Environmental variables
macOS ls(1) has an environmental variable $LSCOLORS. It looks something like this (a copy of mine): DxFxBxDxCxegedabagacad where each position is a colour for a different thing. This is not only not descriptive but it puts a real burden on the user. I think that's a good reason to not use this.

Make use of environment variables:

  • $JFMT_COLOR
  • $JVAL_COLOR
  • $JNAMVAL_COLOR

However, have the content of those environment variables be a complete JSON document. For example:

[
    {
        "mode" : "dark" ,
        "boolean" : "cyan",
        "string" : "red"
    }
]

The JSON document in the environment variable would consist of a JSON array containing a single JSON object containing zero or more JSON members of the form "name" : "value".

In absence of such an environment variable, a default color scheme would be used.

If the environment variable contained invalid JSON, then the contents of the environment variable would be silently ignored, unless the -v level mode was ">=1" in which case a debug message about the invalid JSON would be printed.

If a certain JSON member were missing from the environment variable, such as if it were missing { "number" : "green" }, then the default for the given item would be used.

If the JSON document in the environment variable was valid and were to contain unexpected JSON content, then that other JSON information would be silently ignored, unless the -v level mode was ">=1" in which case a debug message about what is being ignored would be printed.

If the JSON document in the environment variable was valid and was in the proper structure (a JSON array containing a single JSON object containing zero or more JSON members of the form "name" : "value"), but one of the "name" or "value" parts contained unexpected JSON string contents, then that unexpected content would be silently ignored, unless the -v level was ">=1", in which case a debug message about what is being ignored would be printed.

If all is well with the JSON document in the environment variable and the -v level was ">=1", then an informative debug message would be printed.

If the -v level was ">=3", then informative messages about what color scheme values are used would be printed.

I will address this comment either in a bit or later on today. I have an important change (or so I believe) to commit; it is prepared, I just have to make a new branch and make the commit.

I might rest after that. The change is that the json_number and some other struct json_s now have a struct json *node which allows for freeing the node if necessary. This is in particular used for the -S and -n options of jval and jnamval. There might be something I'm missing though? I'll let you decide and I can make corrections. I'll open a pull request soon and then I will indeed try and rest. When I'm back I hope to reply to the comment about colours.

@xexyl
Copy link
Contributor

xexyl commented Aug 8, 2023

See commit 521dfe0 for an important change that allows one to use the parse_json_ functions outside of a document. This is important because the -S and -n options of jval and jnamval will need this (and already do parse it) to verify that the number and string can be converted properly; also, depending on other options, different variables will be needed. An example is if we're matching encoded or decoded strings.

Did I miss a struct that needs it? Can you think of a better way to go about it? I'm going to rest now.

@lcn2
Copy link
Contributor Author

lcn2 commented Aug 8, 2023

The change is that the json_number and some other struct json_s now have a struct json *node which allows for freeing the node if necessary. This is in particular used for the -S and -n options of jval and jnamval. There might be something I'm missing though?

But why does the parsing of -S and -n options of jval and jnamval care about json_number and some other struct json_s, let alone need to create one of those structures that needs to be freed later?

The parsing of -S op=str should save op as an enum and save str as a string, only. The parsing of -n op=num should save op as an enum, then save num as a string (it will need num as a string for diagnostic purposes), then use utility functions such as:

extern bool string_to_intmax(char const *str, intmax_t *ret);
extern bool string_to_uintmax(char const *str, uintmax_t *ret);
extern bool is_decimal(char const *ptr, size_t len);
extern bool is_decimal_str(char const *str, size_t *retlen);
extern bool is_floating_notation(char const *ptr, size_t len);
extern bool is_floating_notation_str(char const *str, size_t *retlen);
extern bool is_e_notation(char const *str, size_t len);
extern bool is_e_notation_str(char const *str, size_t *retlen);

To determine the type of number contained in num and convert it into a machine type such as "intmax" or "uintmax" (in the case of decimal) or "long double" in the other case.
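As a hedged sketch of that classify-and-convert step, using the standard strtoimax(3) and strtold(3) in place of the project helpers listed above (struct num_val and convert_num() are hypothetical names, not part of the codebase):

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdlib.h>

/* hypothetical result of converting a -n num argument into maximal sized machine types */
struct num_val {
    bool is_int;          /* true ==> value fits cleanly in intmax_t */
    intmax_t as_intmax;   /* valid when is_int is true */
    long double as_ldbl;  /* set whenever the conversion succeeds */
};

/*
 * convert_num - classify and convert a num string
 *
 * Returns true if num converts into at least one machine type,
 * false otherwise (e.g. "1e1000000000" overflows even long double).
 */
static bool convert_num(char const *num, struct num_val *ret)
{
    char *end = NULL;

    if (num == NULL || *num == '\0' || ret == NULL) {
        return false;
    }

    /* try maximal sized signed integer first */
    errno = 0;
    ret->as_intmax = strtoimax(num, &end, 10);
    ret->is_int = (errno == 0 && end != NULL && *end == '\0');

    /* then maximal sized floating point */
    errno = 0;
    ret->as_ldbl = strtold(num, &end);
    if (errno != 0 || end == NULL || *end != '\0') {
        return ret->is_int;
    }
    return true;
}
```

The real utilities would be the string_to_intmax(), is_decimal(), is_e_notation() family listed above; this sketch only shows the shape of the "validate then convert" step.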

@xexyl
Copy link
Contributor

xexyl commented Aug 8, 2023

The change is that the json_number and some other struct json_s now have a struct json *node which allows for freeing the node if necessary. This is in particular used for the -S and -n options of jval and jnamval. There might be something I'm missing though?

But why does the parsing of -S and -n options of jval and jnamval care about json_number and some other struct json_s, let alone need to create one of those structures that needs to be freed later?

So that it can verify that it can be converted properly and to set up the correct strings etc. (think of matching against decoded or encoded strings).

The parsing of -S op=str should save op as an enum and save str as a string, only. The parsing of -n op=num should save op as an enum, then save num as a string (it will need num as a string for diagnostic purposes), then use utility functions such as:

This happens already but the function allows for verifying that the number can be converted properly. Remember that there is even an exit code if this fails.

extern bool string_to_intmax(char const *str, intmax_t *ret);
extern bool string_to_uintmax(char const *str, uintmax_t *ret);
extern bool is_decimal(char const *ptr, size_t len);
extern bool is_decimal_str(char const *str, size_t *retlen);
extern bool is_floating_notation(char const *ptr, size_t len);
extern bool is_floating_notation_str(char const *str, size_t *retlen);
extern bool is_e_notation(char const *str, size_t len);
extern bool is_e_notation_str(char const *str, size_t *retlen);

To determine the type of number contained in num and convert it into a machine type such as "intmax" or "uintmax" (in the case of decimal) or "long double" in the other case.

The single parse_json_number() makes it much easier and sets up the proper booleans and other variables to use for later on. It also will make it easier to print later on, possibly.

Doing it this way simplifies parsing greatly. I also think having the node will be useful anyway though maybe you can think of a better way to go about it. If so I'll fix it.

After all these are supposed to be proper JSON values, right?

@lcn2
Copy link
Contributor Author

lcn2 commented Aug 8, 2023

After all these are supposed to be proper JSON values, right?

As far as the command line parser is concerned, initially with -S op=str, the str is just a string.

As far as the command line parser is concerned, initially with -n op=num, the num is just a string.

Later on, there will be a phase, after command line parsing, where the search will be set up. The "search setup phase" will have to handle pairs of ops that form range searches. During that search setup phase, the num as a string can be validated as a convertible number and turned into a maximal sized machine value that will later be used by search functions for numeric value comparisons.

UPDATE 0a

BTW: This is why there is a special exit value for when someone does -n eq=1e1000000000. This is not treated as a command line error by the command line parser. It is valid command line syntax even though the num as a string cannot be converted into a machine value. Instead, later, during the "search setup phase", if the num as a string cannot be converted into a maximal sized machine value, the special exit code 7 is used.

The details of this next phase are beyond the command line parsing scope. For the curious, there will be a simple structure containing a maximum signed integer, a maximum unsigned integer and a maximum floating point value: all maximal sized machine values.

Again, this is well into the JSON search implementation, and we are still trying to finalize what the command line should look like; then the man pages need to be written before implementation starts. We can say with certainty that when it comes to doing numerical comparisons, a maximal sized machine value will be compared to the appropriate element in struct json_number.

In the "search setup phase", we won't be comparing two struct json_numbers, nor will be be creating JSON nodes, nor calling parse_json_number().

UPDATE 1

Sorry that you went into what you thought was part of the command line parser with that rejected pull request.

Create a dynamic array of structures for the -S op=str and the -n op=num found by the command line parser such as:

struct search_op {
    enum search_type type;  /* STRING_SEARCH or NUMBER_SEARCH */
    enum op_type op;  /* OP_EQ, OP_LT, OP_LE, OP_GT, OP_GE */
    char *val;     /* str or num as a string */
};

As the command line parser finds a -S op=str or -n op=num, validate the syntax (op is one of the 5, followed by "=", followed by a string that is possibly an empty string), create the struct search_op and add it to the dynamic array.

Once the command line parser is finished, the resulting dynamic array will contain, in command line order, the information needed by the "search setup phase" to turn that dynamic array into what the JSON search will need.

UPDATE 2

Sorry that you went into an area that you thought was part of the command line parser and we rejected the pull request.

The command line parser just is to fill out a structure that indicates what was found on the command line. Some minimal syntax checking can be done, just do not get too far into code that performs actions.

Start with some command line structure such as, just as an example so you can see the direction we are suggesting:

struct cmdline {
   ... stuff ...
   char *filename;  /* the json filename arg */
   int arg_count;   /* number of arg values found */
   char **args;     /* argv[X] pointer where args start */
};

Have that structure contain booleans indicating when a given option was seen, such as:

    bool have_d;   /* true ==> -d was seen */
    bool have_cap_D;  /* true ==> -D was seen */

For more complex things such as the -S op=str and -n op=num, have:

    bool have_cap_S;  /* true ==> -S op=str was seen */
    bool have_n;     /* true ==> -n op=num was seen */
    struct dyn_array *search_op_set;  /* -S op=str and -n op=num info */

Or for example, for -t type you could do:

    bool have_t;  /* true ==> -t type was seen */
    struct dyn_array *t_type; /* comma separated -t types in dynamic array form */

For the similar -r type, you could use:

    bool have_r;  /* true ==> -r type seen */
    struct dyn_array *r_type; /* comma separated -r types in dynamic array form */

Where t_type is a pointer to a dynamic array of:

enum jcmdtype {
   J_UNSET=0,  /* not a valid -t type */
   J_INT,      /* int type requested */
   J_FLOAT,    /* float type requested */
   J_EXP,      /* exp type requested */
   J_NUM,      /* num type requested */
   J_STR,      /* str type requested */
   J_NULL,     /* null type requested */
   J_MEMBER,   /* member type requested */
   J_OBJECT,   /* object type requested */
   J_ARRAY,    /* array type requested */
   J_COMPOUND, /* compound type requested */
   J_SIMPLE,   /* simple type requested */
   J_ANY,      /* any type requested */
};
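As a sketch, splitting a comma separated -t argument into jcmdtype values in command line order might look like this; parse_t_types() and word_to_jcmdtype() are hypothetical names, the enum is trimmed to a few entries, and a plain array stands in for the dynamic array:

```c
#include <assert.h>
#include <string.h>

/* mirrors the enum jcmdtype above, trimmed to a few entries for this sketch */
enum jcmdtype { J_UNSET = 0, J_INT, J_FLOAT, J_STR, J_ANY };

/* map one comma separated word to an enum jcmdtype, J_UNSET if invalid */
static enum jcmdtype word_to_jcmdtype(char const *word, size_t len)
{
    static const struct { char const *name; enum jcmdtype type; } tbl[] = {
        { "int", J_INT }, { "float", J_FLOAT }, { "str", J_STR }, { "any", J_ANY },
    };
    size_t i;

    for (i = 0; i < sizeof(tbl)/sizeof(tbl[0]); ++i) {
        if (strlen(tbl[i].name) == len && strncmp(tbl[i].name, word, len) == 0) {
            return tbl[i].type;
        }
    }
    return J_UNSET;
}

/*
 * parse_t_types - split "-t type[,type...]" into types[]
 *
 * Returns -1 if any word is invalid, else the number of types found.
 * No alias interpretation is done here; later functions handle that.
 */
static int parse_t_types(char const *arg, enum jcmdtype *types, int max)
{
    int count = 0;
    char const *p = arg;

    while (*p != '\0' && count < max) {
        char const *comma = strchr(p, ',');
        size_t len = (comma != NULL) ? (size_t)(comma - p) : strlen(p);
        enum jcmdtype t = word_to_jcmdtype(p, len);

        if (t == J_UNSET) {
            return -1;
        }
        types[count++] = t;
        if (comma == NULL) {
            break;
        }
        p = comma + 1;
    }
    return count;
}
```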

We do not recommend reusing the "JTYPE_STUFF" here. Let the functions that implement search actions do the mapping between the enum jcmdtype and the "JTYPE_STUFF".

We don't even recommend converting aliases here. Let there be a J_NUM, J_SIMPLE, J_COMPOUND. Let the functions that implement search actions do the alias interpretations.

You could have a single enum jcmdtype if you wish, but you don't have to do so. You could have one smaller enum ttype and a larger enum rtype:

enum ttype {
   J_UNSET=0,  /* not a valid -t type */
   J_INT,      /* int type requested */
   J_FLOAT,    /* float type requested */
   J_EXP,      /* exp type requested */
   J_NUM,      /* num type requested */
   J_STR,      /* str type requested */
   J_NULL,     /* null type requested */
   J_SIMPLE,   /* simple type requested */
};
enum rtype {
   J_UNSET=0,  /* not a valid -r type */
   J_INT,      /* int type requested */
   J_FLOAT,    /* float type requested */
   J_EXP,      /* exp type requested */
   J_NUM,      /* num type requested */
   J_STR,      /* str type requested */
   J_NULL,     /* null type requested */
   J_MEMBER,   /* member type requested */
   J_OBJECT,   /* object type requested */
   J_ARRAY,    /* array type requested */
   J_COMPOUND, /* compound type requested */
   J_SIMPLE,   /* simple type requested */
   J_ANY,      /* any type requested */
};

Before going into the getopt loop, initialize the struct cmdline to defaults: have the have_foo booleans set to false, and the search_op_set initialized to an empty dynamic array.

Then as the getopt loop parses the command line, set the appropriate values in struct cmdline.

After the getopt loop, set the filename, arg_count, and args elements according to what is left on the command line. Check that the required file.json was set. Check that the arg_count is OK.

Next, look for option violations where some true have_x conflicts with some true have_y (we are making up -x and -y as fake conflicting options just to serve as an example).

You can now check the dynamic arrays of jcmdtype for improper values set. For example, if you find a "J_ARRAY" in the t_type dynamic array, raise a command line error because -t array is not allowed.

The list of have_foo booleans in struct cmdline should be limited to what getopt(3) looks for.

At this point the command line has been parsed. At this point all checks for conflicting options have been made. At this point your struct cmdline contains everything that later functions need to perform actions.

A pointer to the struct cmdline may then be passed to functions that implement stuff, such as "search function setup", the function to invoke the JSON parser, and the "search JSON tree", and "print stuff".

UPDATE 3

You can use a single struct cmdline for all 3 utilities if you wish.

The 3 utilities will initialize the struct cmdline differently, but the same getopt parsing loop can be used for all 3 utilities if you wish.

After the getopt loop is done, a given utility can look for options that do not apply. For example, if jfmt finds that have_n is true (there is no -n for jfmt), a command line error can be raised and the command can exit.

Again, the list of have_foo booleans in struct cmdline should be limited to what getopt(3) looks for. They can be the union of all the options used by all 3 commands if you wish.

You can have the command line parser check the basename of argv[0] and set an enum according to which utility is being executed. For example, struct cmdline could have:

    enum cmd jcmd;  /* CMD_JFMT, CMD_JVAL, CMD_NAMVAL */

This way much of the command line parsing can be common. Moreover, it is easy to add or remove a given option from a given command by adjusting the post getopt checking of the struct cmdline for the given utility. Moreover, the command lines for the 3 utilities can be kept in "sync" while allowing for the individual differences to be expressed.
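A sketch of that basename check, using the enum cmd values above plus an assumed CMD_UNKNOWN fallback; which_cmd() is a hypothetical helper name:

```c
#include <assert.h>
#include <string.h>

/* CMD_UNKNOWN is an assumed fallback; the other values are from the comment above */
enum cmd { CMD_UNKNOWN = 0, CMD_JFMT, CMD_JVAL, CMD_NAMVAL };

/* determine which utility is running from the basename of argv[0] */
static enum cmd which_cmd(char const *argv0)
{
    char const *base;

    if (argv0 == NULL) {
        return CMD_UNKNOWN;
    }
    base = strrchr(argv0, '/');
    base = (base != NULL) ? base + 1 : argv0;
    if (strcmp(base, "jfmt") == 0) {
        return CMD_JFMT;
    } else if (strcmp(base, "jval") == 0) {
        return CMD_JVAL;
    } else if (strcmp(base, "jnamval") == 0) {
        return CMD_NAMVAL;
    }
    return CMD_UNKNOWN;
}
```

The shared getopt loop would then consult the returned enum when deciding which options apply to the current utility.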

UPDATE 4

Have the command line parser focus on just parsing the command line, checking for unexpected options (via a getopt(3) error or for some have_g being true when the -g does not apply to the given command), and checking for required args and option args.

This will allow the command line parser to pass the pointer to the struct cmdline to functions that will do the work of implementing what was on the command line.

It is important that the command line parser just parse the command line, fill out a structure, and leave the implementation details to other functions.

@xexyl
Copy link
Contributor

xexyl commented Aug 8, 2023

After all these are supposed to be proper JSON values, right?

As far as the command line parser is concerned, initially with -S op=str, the str is just a string.

As far as the command line parser is concerned, initially with -n op=num, the num is just a string.

So do you want me to just strdup() the optarg into a simple list?

Later on, there will be a phase, after command line parsing, where the search will be set up. The "search setup phase" will have to handle pairs of ops that form range searches. During that search setup phase, the num as a string can be validated as a convertible number and turned into a maximal sized machine value that will later be used by search functions for numeric value comparisons.

I would have thought that validating the command line was part of parsing. I guess I misunderstood.

UPDATE 0a

BTW: This is why there is a special exit value for when someone does -n eq=1e1000000000. This is not treated as a command line error by the command line parser. It is valid command line syntax even though the num as a string cannot be converted into a machine value. Instead, later, during the "search setup phase", if the num as a string cannot be converted into a maximal sized machine value, the special exit code 7 is used.

As above I thought this was part of command line parsing.

The details of this next phase are beyond the command line parsing scope. For the curious, there will be a simple structure containing a maximum signed integer, a maximum unsigned integer and a maximum floating point value: all maximal sized machine values.

Okay.

Again, this is well into the JSON search implementation, and we are still trying to finalize what the command line should look like; then the man pages need to be written before implementation starts. We can say with certainty that when it comes to doing numerical comparisons, a maximal sized machine value will be compared to the appropriate element in struct json_number.

Sure.

In the "search setup phase", we won't be comparing two struct json_numbers, nor will be be creating JSON nodes, nor calling parse_json_number().

Okay.

UPDATE 1

Sorry that you went into what you thought was part of the command line parser with that rejected pull request.

It was yes.

Create a dynamic array of structures for the -S op=str and the -n op=num found by the command line parser such as:

struct search_op {
    enum search_type type;  /* STRING_SEARCH or NUMBER_SEARCH */
    enum op_type op;  /* OP_EQ, OP_LT, OP_LE, OP_GT, OP_GE */
    char *val;     /* str or num as a string */
};

Well as I said before I'm not yet truly familiar with dynamic arrays, not with the facility we have, anyway. I figured I would become more familiar with it later on.

As the command line parser finds a -S op=str or -n op=num, validate the syntax (op is one of the 5, followed by "=", followed by a string that is possibly an empty string), create the struct search_op and add it to the dynamic array.

This is done except that it tried to parse the number or string as that sounds like command line parsing to me. I can change it to not do that.

Once the command line parser is finished, the resulting dynamic array will contain, in command line order, the information needed by the "search setup phase" to turn that dynamic array into what the JSON search will need.

I'm not sure what order I put it in but I can change it to append to the list if necessary.

UPDATE 2

Sorry that you went into an area that you thought was part of the command line parser and we rejected the pull request.

Yes that's what I thought indeed.

The command line parser just is to fill out a structure that indicates what was found on the command line. Some minimal syntax checking can be done, just do not get too far into code that performs actions.

Okay.

Start with some command line structure such as, just as an example so you can see the direction we are suggesting:

struct cmdline {
   ... stuff ...
   char *filename;  /* the json filename arg */
   int arg_count;   /* number of arg values found */
   char **args;     /* argv[X] pointer where args start */
};

I already have something like that except that it sets up the booleans, parses args etc.

Have that structure contain booleans indicating when a given option was seen, such as:

    bool have_d;   /* true ==> -d was seen */
    bool have_cap_D;  /* true ==> -D was seen */

Already done.

For more complex things such as the -S op=str and -n op=num, have:

    bool have_cap_S;  /* true ==> -S op=str was seen */
    bool have_n;     /* true ==> -n op=num was seen */
    struct dyn_array *search_op_set;  /* -S op=str and -n op=num info */

See above. There is more than one struct. See below.

Or for example, for -t type you could do:

    bool have_t;  /* true ==> -t type was seen */
    enum jcmdtype jtype; /* J_INT, J_FLOAT, J_EXP, J_STR, J_BOOL, ... */

Done.

We do not recommend reusing the "JTYPE_STUFF" here. Let the functions that implement search actions do the mapping between the enum jcmdtype and the "JTYPE_STUFF".

I don't reuse the JTYPE_.

We don't even recommend converting aliases here. Let there be a J_NUM, J_SIMPLE, J_COMPOUND. Let the functions that implement search actions do the alias interpretations.

I don't understand what you mean by converting aliases here.

Before going into the getopt loop, initialize the struct cmdline to defaults: have the have_foo booleans set to false, and the search_op_set initialized to an empty dynamic array.

Already done. But each tool has its own struct. There is a common struct holding the things shared by all three tools; for one like jfmt, its struct contains just that. jval and jnamval additionally share a second common struct for the options they share with each other but not with jfmt. Then the things that only jnamval has go in jnamval's own struct.

This way everything is encapsulated well and is modular.

Then as the getopt loop parses the command line, set the appropriate values in struct cmdline.

This is done.

After the getopt loop, set the filename, arg_count, and args elements according to what is left on the command line. Check that the required file.json was set. Check that the arg_count is OK.

This is done except that I also validate that the file can be opened and that it's valid JSON. Why? Because that already existed in jprint and it seemed like a waste not to just put it in. Certainly it won't just print it, but it's already in a buffer and tree.

Next, look for option violations where a some true have_x conflicts with some have_y (we are making up -x and -y as fake conflicting options just to serve as an example).

Already done.

The list of have_foo booleans in struct cmdline should be limited to what getopt(3) looks for.

I think it does. But again it's not called cmdline. I think that would be a mistake as it doesn't indicate what it's a command line of. It might even be for jparse.

At this point the command line has been parsed. At this point all checks for conflicting options have been made. At this point your struct cmdline contains everything that later functions need to perform actions.

It seems like the only thing I have to do is change the -S and -n to for now have the booleans (already done) and the string. That can easily be done.

A pointer to the struct cmdline may then be passed to functions that implement stuff, such as "search function setup", the function to invoke the JSON parser, and the "search JSON tree", and "print stuff".

Already done.

UPDATE 3

You can use a single struct cmdline for all 3 utilities if you wish.

I don't think using a single one for all three is a good idea because there are some booleans that are not in all three tools. See above for how I did it.

The 3 utilities will initialize the struct cmdline differently, but the same getopt parsing loop can be used for all 3 utilities if you wish.

See above.

After the getopt loop is done, a given utility can look for options that do not apply. For example, if jfmt finds that have_n is true (there is no -n for jfmt), a command line error can be raised and the command can exit.

This isn't necessary because they don't exist.

Again, the list of have_foo booleans in struct cmdline should be limited to what getopt(3) looks for. They can be the union of all the options used by all 3 commands if you wish.

I don't as I'd rather only common options be shared.

You can have the command line parser check the basename of argv[0] and set a enum according to which utility is being executed. For example, struct cmdline could have:

    enum cmd jcmd;  /* CMD_JFMT, CMD_JVAL, CMD_NAMVAL */

This way much of the command line parsing can be common. Moreover, it is easy to add or remove a given option from a given command by adjusting the post getopt checking of the struct cmdline for the given utility. Moreover, the command lines for the 3 utilities can be kept in "sync" while allowing for the individual differences to be expressed.

Much of it is common in so far as booleans. But the rest maybe not so much. I couldn't think of a file name to use in some cases so I used json_util and tool_util and tool_test.

UPDATE 4

Have the command line parser focus on just parsing the command line, checking for unexpected options (via a getopt(3) error or for some have_g being true when the -g does not apply to the given command), and checking for required args and option args.

This is done. Well, sanity checks are done after parsing is done, which, as far as I was aware, should have been the case. But there is no need to remove it even if one could call it implementation; some of it would be done the same way anyway. In my head validating options is part of the command line parsing, not part of implementation. To me, implementation refers to the tool itself. If that was in error I am sorry, but that's how I interpreted it. I don't see the need to remove it though; checking booleans that cannot be used together can't be done many ways after all.

This will allow the command line parser to pass the pointer to the struct cmdline to functions that will do the work of implementing what was on the command line.

This is done.

It is important that the command line parser just parse the command line, fill out a structure, and leave the implementation details to other functions.

Well it does this but as I said I interpreted 'implementation' as of the tool so the validation of the command line is part of the parsing in my head. Maybe that was a misinterpretation.

Now can it be cleaned up? Maybe. Some messages can be changed too. But the options are parsed and the validation of the options and the existence of the file and valid JSON (see above why this is and see below too) is done.

To me, parsing the command line is validating it too. The implementation is the features of the tools; that's not done. The file being validated is also not implementation, but it was already there anyway, and it can only be done one way anyway, so why not have it there?

Now if some validation steps have to be changed that's fine but again I interpreted parsing the command line as parsing it and making sure it's sane.

Sorry for the misunderstanding.

@xexyl
Copy link
Contributor

xexyl commented Aug 8, 2023

Per the above comments I think I might have to redo a lot of things, reading it again. The -t as far as I am concerned does not need to be an array. It's a bitvector, so one can just do a quick test, with a function, of whether or not to search for or print something, rather than go through an array and see if it's valid to search for or print. I would think that going through an array would be slower and less efficient.

Maybe you had other ideas but I wish I had known that prior to starting to parse the command line. Maybe some of the details were not finished but I thought that the document was in a state where I was supposed to start parsing, and as noted above I took the word implementation to mean implementation of the tool.

It seems that this was a mistake and I might have to start over for some things. That's very unfortunate. Maybe I don't. I don't know. Anyway I can't deal with it today. Sorry. I have to go for now. Maybe I can discuss some things later on but I've been very exhausted lately and I need a break. Almost every single day I'm falling asleep later on in the day. I just can't stay awake. This is unfortunate but I think that today is a good day for a break since there clearly was a miscommunication that might necessitate doing some things over. I don't want to start until there's a clearer consensus and I also am too tired to work on the above comment of yours. I'm also not sure what needs to change. I'll have to look at what I have.

Maybe I should wipe everything except the structs and the getopt() loop, perhaps saving the other parts (somewhere else) that will be the same like parsing the json file. I don't know. As you can see I'm unsure about things here due in part to the miscommunication and due in part to being wiped out lately so it's better that I don't bother with it today. Off to do other things. Sorry for the misunderstanding and delays.

@xexyl
Copy link
Contributor

xexyl commented Aug 8, 2023

Although I doubt I will get anything done today here I wanted you to know that I have been giving it some thought too.

I don't think as much will have to change as I might fear but certainly some things will have to change.

It will be helpful when I look at your comment again with a clearer head but that's not going to happen today. I hope it will be tomorrow.

I think a mixture of what I have and what you have said might be useful but we shall see I suppose.

I believe that I will start out by reading what you wrote and then I will make sure that there is no misunderstanding again and only after that I can do things. I do not want to have the problem again.

I am afraid that this might take some days and I am sorry for the original misunderstanding and the delay caused by it.

To me when I hear parse options I also think of validation. Of course since some of it was already there anyway and since checking for invalid option combinations is done only one way I didn't see it as a problem either. Same with the verifying the file. It was already there after all.

Anyway I hope you have a great day and I should think that I will be able to read and reply tomorrow.

Good day.

@xexyl
Copy link
Contributor

xexyl commented Aug 8, 2023

One thing that I can ask now that I thought of that I don't want to forget.

You suggested that the -t option add to a list. Is there a reason it has to be ordered? Is it that it's supposed to search more than once printing them in the order specified?

If so I can see why a list might be useful but if not a bitvector is a simple way to go about it. Just loop through the comma separated words and add to the bits. Return the bits at the end if all words are valid. Then when traversing the tree check the bits with the helper functions. Very simple.
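A sketch of the bitvector idea described above; the T_ bit values and helper names are hypothetical, and a caller would loop over the comma separated words in optarg calling add_t_bit() once per word:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* one bit per -t type word; these bit assignments are hypothetical */
#define T_INT   (1u << 0)
#define T_FLOAT (1u << 1)
#define T_EXP   (1u << 2)
#define T_STR   (1u << 3)
#define T_NULL  (1u << 4)
#define T_NUM   (T_INT | T_FLOAT | T_EXP) /* num as an alias for all numbers */

/* add_t_bit - OR in the bit(s) for one word; false ==> invalid -t type */
static bool add_t_bit(char const *word, unsigned *bits)
{
    static const struct { char const *name; unsigned bit; } tbl[] = {
        { "int", T_INT }, { "float", T_FLOAT }, { "exp", T_EXP },
        { "str", T_STR }, { "null", T_NULL }, { "num", T_NUM },
    };
    size_t i;

    for (i = 0; i < sizeof(tbl)/sizeof(tbl[0]); ++i) {
        if (strcmp(tbl[i].name, word) == 0) {
            *bits |= tbl[i].bit;
            return true;
        }
    }
    return false;
}

/* helper used while traversing the tree: was this node type requested? */
static bool t_match(unsigned bits, unsigned type_bit)
{
    return (bits & type_bit) != 0;
}
```

This is the quick-test approach: the tree walk does one AND per node instead of scanning an array. It does lose the command line order of the types, which is the trade-off being discussed here.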

But is there a reason to have it ordered like you said? It would seem to me to be more complicated than it has to be but if there's some reason that I am unaware of please advise and perhaps update the readme file.

Thanks. Will look back tomorrow.

@lcn2
Contributor Author

lcn2 commented Aug 8, 2023

One thing that I can ask now that I thought of that I don't want to forget.

You suggested that the -t option add to a list. Is there a reason it has to be ordered? Is it that it's supposed to search more than once printing them in the order specified?

If so I can see why a list might be useful but if not a bitvector is a simple way to go about it. Just loop through the comma separated words and add to the bits. Return the bits at the end if all words are valid. Then when traversing the tree check the bits with the helper functions. Very simple.

But is there a reason to have it ordered like you said? It would seem to me to be more complicated than it has to be but if there's some reason that I am unaware of please advise and perhaps update the readme file.

Thanks. Will look back tomorrow.

We have been working on the -S op=str and -n op=num meaning and are about to make an addition to the set.

And yes, the list will be ordered as you will see later.

@lcn2
Contributor Author

lcn2 commented Aug 8, 2023

Add missing ne op

We should add a 6th op of "ne" to the list of 5 ("eq", "lt", "le", "gt", and "ge"). So these will be valid:

... -n ne=num
... -S ne=str

UPDATE 0

Updated the TODO list.

p.s. There is another change pending that will show up in a new comment soon.

@lcn2
Copy link
Contributor Author

lcn2 commented Aug 8, 2023

OR-ing of op's by default

By default, multiple -S op=str and -n op=num are logically OR-ed together. For example:

jval -t num ... -n ge=3 -n le=1 ... foo.json

Would search for numeric JSON values that are ">= 3" OR "<= 1".

Or for example:

jnamval ... -S eq=foo -S eq=bar ... foo.json

This would search for JSON member name strings that are "== foo" OR "== bar".

The OR-ing is done, by default, for subsequent multiple -S op=str and -n op=num options even if there are other options in between them. For example, this has the same effect:

jnamval ... -S eq=foo -v 3 -Q -S eq=bar ... foo.json

Even though there is a -v 3 -Q in between the two -S op=str on the command line.

Of course this doesn't prevent someone from doing something silly such as:

jnamval ... -t num -n lt=3 -n eq=2 ... foo.json
jnamval ... -t num -n eq=42 -n ne=42 ... foo.json

There will be no attempt to point out that such command lines are silly.

UPDATE 0

There is a new option that is needed and will appear in a new comment soon.

@lcn2
Contributor Author

lcn2 commented Aug 8, 2023

Add -a option

We need to add -a to both jval and jnamval.

As mentioned in comment 1670377096, by default consecutive -S op=str and -n op=num are OR-ed together.

We add a new -a option that will change a pair of consecutive -S op=str and -n op=num to be AND-ed together.

The -a MUST come after some -S op=str or -n op=num. The -a MUST come before some other -S op=str or -n op=num.

As with the OR case as mentioned in comment 1670377096, one may have other options in the mix.

For example:

jval -t num -n ge=1 -v 3 -a -o foo.out -n le=10 bar.json

This will match numeric JSON values that are ">= 1 AND <= 10".

There will be no attempt to warn the user if they do something silly such as:

jval ... -S eq=curds -a -v 3 -J 3 -S eq=whey bar.json

Because no JSON value can be "== curds AND == whey", the silly command will match nothing.

We will certainly have a verbosity mode that outputs the match actions and their failures, so if the user is confused, they might get a clue from the verbosity debugging stuff.

If there is no -S op=str or -n op=num anywhere before an -a, then a command line error should be thrown. If there is no -S op=str or -n op=num anywhere after an -a, then a command line error should be thrown.

UPDATE 0a

We do not need to go too far with matching options. Things like xor-ing, parentheses, could be added later if there was enough demand. We don't see much of a realistic use case for extending these matching options beyond what we have proposed now. The 6 ops OR-ed by default with -a to change a consecutive pair into an AND is good enough for the "99% case".

UPDATE 1

It is OK to have more than one -a in a command line. For example:

jval -t num -n ge=2 -a -n lt=4 -n ge=6 -a -n lt=8 foo.json

This would attempt to match JSON numbers that were ">= 2 AND < 4" or JSON numbers that were ">= 6 AND < 8".

We don't want to put more than one -a in-between -S op=str or -n op=num. For example:

jval -t num -n ge=2 -a -v 3 -a -n ge=6 foo.json    # invalid command line

The use of -a following another -a should raise a command line error. There must be a -S op=str or -n op=num AND no other -a before a -a. There must be a -S op=str or -n op=num AND no other -a after a -a.

UPDATE 2

It is OK to mix -S op=str and -n op=num. For example:

jval -t int,float -n eq=2 -a -S ne=2 -a -S ne=2.00 jparse/test_jparse/test_JSON/good/two-array.json

This would only print:

2.0

Here we have two different -a options, which is OK because there is a -S op=str or -n op=num both before and after each -a. Moreover, mixing -S op=str and -n op=num is OK.

UPDATE 3

This is why you need to build a dynamic array of the -S op=str, -n op=num and -a options. We need to see the list of these, in command line order, without all of the other options.

After the getopt(3) loop finishes, a scan of the dynamic array (of the -S op=str, -n op=num and -a options) needs to look for problematic -a options and raise a command line error if found.

UPDATE 4a

We updated the TODO list accordingly.

@xexyl
Contributor

xexyl commented Aug 9, 2023

Environmental variables
macOS ls(1) has an environment variable $LSCOLORS. It looks something like this (a copy of mine): DxFxBxDxCxegedabagacad, where each position is a colour for a different thing. This is not only undescriptive but it also puts a real burden on the user. I think that's a good reason not to use this approach.

Make use of environment variables:

  • $JFMT_COLOR
  • $JVAL_COLOR
  • $JNAMVAL_COLOR

The names sound reasonable though I might have the _COLOR be the secondary name for each.

However, have the content of those environment variables be a complete JSON document. For example:

[
    {
        "mode" : "dark" ,
        "boolean" : "cyan",
        "string" : "red"
    }
]

The JSON document in the environment variable would consist of a JSON array containing a single JSON object containing zero or more JSON members of the form "name" : "value".

This is an intriguing thought but I would think this would put a big burden on the user to construct such a document when they might not be familiar with the syntax (of what we expect), plus the fact that most people would expect an environment variable to be a single line. Wouldn't it be hard to set up in say a .bashrc file? I would think so, unless it's one long line, which might be a burden too.

In absence of such an environment variable, a default color scheme would be used.

If the environment variable contained invalid JSON, then the contents of the environment variable would be silently ignored, unless the -v level was ">=1", in which case a debug message about the invalid JSON would be printed.

I would think that it could always be ignored, at least as far as making it an error goes.

If a certain JSON member were missing from the environment variable, such as if it were missing { "number" : "green" }, then the default for the given item would be used.

That makes sense yes.

If the JSON document in the environment variable was valid, and were to contain an unexpected JSON tree structure, then that other JSON information would be silently ignored, unless the -v level was ">=1", in which case a debug message about what is being ignored would be printed.

Debug message but not fatal error, right?

If the JSON document in the environment variable was valid, and was in the proper JSON tree structure (a JSON array containing a single JSON object containing zero or more JSON members of the form "name" : "value"), and one of the "name" or "value" contained unexpected JSON string contents, then those unexpected JSON string contents would be silently ignored, unless the -v level was ">=1", in which case a debug message about what is being ignored would be printed.

Okay.

If all is well with the JSON document in the environment variable and if the -v level was ">=1", then an informative debug message would be printed.

Okay.

If the -v level was ">=3", then informative messages about which color scheme values were used would be printed.

Okay.

Overall I think it's an intriguing idea to have the environment variable be a JSON document but I am still concerned it would be a burden on the user ... I don't know.

@xexyl
Contributor

xexyl commented Aug 9, 2023

With commit fd963b7 the -S and -n options are in an ordered list. Not a dynamic array, but that can come later if necessary. The list is in the order specified. However, as noted, there is a separate list for each of -S and -n. I suspect that this has to be a single list. If that is so I can fix it.

I also added the ne operator. I forgot to note this and I forgot the man pages. Time to rest .. hope to reply to the other comments later. Not feeling very well right now though and very tired. I hope that later on I feel more capable to reply.

Hope you're sleeping well.

UPDATE 0

I think that dyn_test.c will provide what I need to make it a dynamic array but we'll see ... maybe later today, maybe tomorrow, but either way the operators are at least in the order specified. I'm correct that it has to be a single list though, right? If so, that seems like the most important thing to fix first.

Well maybe I can reply to other things later on today. Resting now.

@xexyl
Contributor

xexyl commented Sep 3, 2023

Note that I am NOT ready to start discussing the details yet. I hope that if my lungs keep improving at this rate (or better yet continue to get better), and assuming I can feel a bit more rested, I will be able to start in a week or two. That's not a guarantee but it's what I hope. What this comment is about is trying to figure out where we left off, as I got sick weeks ago now (and again I feel badly about it even if it's not my fault). More so, I am hoping that this comment can get you to start thinking about what you feel are the next steps of what we have to do - and that includes, most importantly (because of where we left off), discussions.

I will start with the one thing that I saw at the bottom. I made some things a list but it might be better to look at it as a dynamic array. I can maybe play with that when 'I'm back' (as Sam says) but more likely I think it'll be better to discuss things - very possibly having to re-discuss some things - to get back into it.

I know we were talking about ANSI / colour mode for some things like syntax highlighting but what else I have no idea and I don't know what would be the best way to go about getting back to discussion.

Again I'm not ready for it but I felt like it might be good if you had some time to think about how we should proceed when it is time. Hopefully that way things will go smoother and we can be more productive.

Just some thoughts. I'm pretty off still and not up to thinking much more than this but it was on my mind so I wanted to say these things.

Enjoy the rest of your weekend! I'm not sure if there is need to reply to this comment but if there is I will decide to reply when I get it or later on. I think it would depend on what you reply with.

@lcn2
Contributor Author

lcn2 commented Sep 11, 2023

Status question

Where are we on the current TODO list? In that TODO list, are there items that should be checked complete? Are there items that should be added to the list? Are there completed items that should be unchecked?

The JSON files issue, once the inventory is ready, will need the utilities in this issue ready in order for it to be fully complete.

@xexyl
Contributor

xexyl commented Sep 11, 2023

Status question

Where are we on the current TODO list? In that TODO list, are there items that should be checked complete? Are there items that should be added to the list? Are there completed items that should be unchecked?

The JSON files issue, once the inventory is ready, will need the utilities in this issue ready in order for it to be fully complete.

I'll see if I can address this tomorrow morning. I believe I can at least do this. Hopefully I can start looking at this one again sometime soon.

Unless maybe I have a chance to tell you the report of my doctor appointment, I'll be gone until tomorrow, so I'll bid you a good day!

I hope I've answered everything but do please feel free to remind me about something if I missed or end up missing something.

@xexyl
Contributor

xexyl commented Sep 12, 2023

TODO statuses

Okay, as far as todo statuses go, I'll go through the list you have in the first comment and make comments. For those that seem in good standing I'll remove them unless I have a comment or question.

TODO

other TODOs

This does indeed still need to be done. This was needed shortly before I got sick and I never got a chance to do it. It might be one I can do more easily to ease back into this but we'll see about that I guess.

Pre-implementation

Unless something else comes up this does indeed seem good.

  • Decide how jfmt should format JSON

I believe this will be a difficult one but I also believe it can't be resolved until more is done. That might not be the case though if we can show examples here in comments. I might suggest something, and I will suggest this: when we show example formatting output we should first show the JSON file itself in the comment and then show the formatted output. This way we don't have to go to the terminal to check the JSON file and then go back and forth trying to see how it will be changed.

Another thought is that we might start by looking at other formatters to see if they give us any ideas.

A question is when should this part be resolved?

Another question is which tool should be completed first? And should other tools be worked on in the meantime? What lower level things have to be done before the formatter can be completed?

Sorry for all the questions here but they felt important and I didn't want to forget them!

  • Usage messages produced by -h

This is definitely not done because you have added some options that are not yet implemented. See next item.

  • Complete command line parsing without implementation

This is no longer complete because you have added at least one option, -a. I have concerns about the implementation of this one but they will have to be dealt with after I have the energy to read that comment thoroughly, and I'm afraid that won't be today. Sorry!

.. I also wonder about how some things are done, as you prefer a dynamic array, but whether that can be part of this todo item or not I don't know. That doesn't matter much, though, because -a is not added yet and it might need to be discussed before it can even be added.

  • Finish man pages, essentially the examples (perhaps the useful ones from jparse/json_util_README.md)

I guess this will be easier to do once the tools are complete, or do you think this should be done before that? As you say, some are in the readme file. It also depends, of course, on whether we include output. Do we want to do that? I'm not sure, but if we do it might be better to first finish the respective tool? I really have no idea: up to your opinion.

  • Send stdout & stderr to a log file for print_test during make test

Although this is done, we might want to add a todo item or keep the following at the back of our minds:

  • once all tools are complete, make sure that make test checks everything and looks okay. For instance, it does not yet run the test scripts of the three tools.

I did start those scripts but they're not useful yet. They only (if I recall) run the -K (test) option.

Now as for the todo sections:

TODOs not in the json_util_README.md writeup

Finally

I cannot help right now: it involves checking things that I'm too tired at this moment to do. I hope that the above helps though!

@xexyl
Contributor

xexyl commented Sep 13, 2023

Not ready yet, but: given the comment above that addresses the todo list, how do you want to continue this issue when it is time?

It might be useful to know that much.

Today I am quite tired. Unsure on lungs right now. I will say more in the other thread later on today if I have any updates or else tomorrow.

Hope all is well and good day!

@xexyl
Contributor

xexyl commented Sep 13, 2023

I might try adding the ne option today. Just parsing of course.

I am not sure but I hope to. That doesn't mean that I am all better. Sadly I am definitely not. But it should be very simple.

I have other things I have to do but if there's a bit of luck it will be added today.

We shall see! For now I will be afk for at least an hour but possibly longer. Not that that actually matters as such but I won't be looking at this for at least that long.

@xexyl
Contributor

xexyl commented Sep 13, 2023

I might try adding the ne option today. Just parsing of course.

I am not sure but I hope to. That doesn't mean that I am all better. Sadly I am definitely not. But it should be very simple.

I have other things I have to do but if there's a bit of luck it will be added today.

We shall see! For now I will be afk for at least an hour but possibly longer. Not that that actually matters as such but I won't be looking at this for at least that long.

Interestingly I already added this: I just forgot to show it in the help and the syntax errors.

I'll do that and then a pull request .. that will probably be all I'll do today. I'm not sure if I have it in the man pages. If not I'll try doing that too.

@lcn2
Contributor Author

lcn2 commented Sep 22, 2023

  • Decide how jfmt should format JSON

I believe this will be a difficult one but I also believe it can't be resolved until more is done. That might not be the case though if we can show examples here in comments. I might suggest something, and I will suggest this: when we show example formatting output we should first show the JSON file itself in the comment and then show the formatted output. This way we don't have to go to the terminal to check the JSON file and then go back and forth trying to see how it will be changed.

Another thought is that we might start by looking at other formatters to see if they give us any ideas.

A question is when should this part be resolved?

We should choose to use the simplified idea for levels AND put spaces between most tokens (with a few exceptions: for example, "foo" would not have spaces inside the "'s).

Something basic is fine.
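To make that concrete, here is one guess at what the basic style could look like (this is only an illustration of the simplified-levels idea, not a committed jfmt output format). Given a compact input like:

```json
{"curds":[1,2,{"whey":true}]}
```

jfmt might print:

```json
{
    "curds" : [
        1,
        2,
        {
            "whey" : true
        }
    ]
}
```

with one indent level per JSON depth, spaces around tokens such as ":", and no spaces inside the "'s of "curds" or "whey".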

Another question is which tool should be completed first? And should other tools be worked on in the meantime? What lower level things have to be done before the formatter can be completed?

Probably jval and jnamval should be completed before jfmt.

@lcn2
Contributor Author

lcn2 commented Sep 22, 2023

  • Finish man pages, essentially the examples (perhaps the useful ones from jparse/json_util_README.md)

I guess this will be easier to do once the tools are complete, or do you think this should be done before that? As you say, some are in the readme file. It also depends, of course, on whether we include output. Do we want to do that?

Yes.

I'm not sure but if we do it might be better to first finish the respective tool? I really have no idea: up to your opinion.

Useful to get the documentation finalized before lots of coding.

@lcn2
Contributor Author

lcn2 commented Oct 27, 2023

We believe we have addressed all of the current questions that still need answering at this time. If we've missed something or something else needs to be clarified, please ask again.
