-
-
Notifications
You must be signed in to change notification settings - Fork 25
Multiple Improvements to portindex2json.tcl #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
You need to rebase your branch. https://trac.macports.org/wiki/WorkingWithGit#upstream-fetch |
Done, thanks. |
portindex2json/portindex2json.tcl
Outdated
|
|
||
| package require json::write | ||
|
|
||
| proc format_string {string} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this takes a list of strings, so the names of the proc and arg are misleading. Using strings rather than string would be better.
portindex2json/portindex2json.tcl
Outdated
| proc format_string {string} { | ||
| if {[llength $string] > 1} { | ||
| foreach element $string { | ||
| lappend new_string "\"$element\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use ::json::write::string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
::json::write string is being already used, it would give a single string as output. But we need a list of strings which would be possible only with ::json::write array and it takes a list as an argument.
portindex2json/portindex2json.tcl
Outdated
| lappend new_string "\"$element\"" | ||
| } | ||
| } else { | ||
| set new_string [list "\"$string\""] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what situation does the above foreach loop not work on a list of length 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was getting error in case of "depends_lib" and "depends_build". But now I realise it was because in several cases the values of "depends_lib" and "depends_build" are empty. There sure is a better way to go through this. What do you suggest?
|
As a first step don't worry about commits and just try to get the contents right. Commits can be rewritten at the end. |
|
Initially the script was using spaces and I had made use of tab with four spaces. But the indentation is itself getting converted to 8 spaces as soon as I commit the changes (although it is 4 spaces in my text editor.) In second commit I converted all spaces to tab, and set to use 4 spaces. After the commit, irregular spacing is gone but the indentation again got converted to 8 spaces. |
portindex2json/portindex2json.tcl
Outdated
| foreach key [array names portinfo] { | ||
| if { $key == "categories" || $key=="variants" || $key=="depends_lib" || $key=="depends_build" } { | ||
| set json_portinfo($key) [::json::write array {*}[format_strings $portinfo($key)]] | ||
| } elseif { $key == "maintainers"} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably try to put the elseif into a function that parses and writes maintainers, so that the main function stays easy enough to comprehend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be a really good improvement, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it to use a procedure. Thanks
|
I need to go to bed, so I didn't investigate yet, but the script throws an error for me when I try to run it: |
The Script is working fine for me. Sample Output: |
portindex2json/portindex2json.tcl
Outdated
| package require json::write | ||
|
|
||
| proc format_strings {strings} { | ||
| if {[llength $strings] > 1} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @jmroot already said, there is no reason to use the if clause here. The only problem you get is that in case the list is empty, new_strings would be left undefined. You can avoid that by first defining an empty list:
set new_strings {}and then running the loop, without any need to check for the length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, applied the changes and it is working fine.
portindex2json/portindex2json.tcl
Outdated
| proc format_strings {strings} { | ||
| if {[llength $strings] > 1} { | ||
| foreach element $strings { | ||
| lappend new_strings "\"$element\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it makes sense to use [::json::write string $element] here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can be used with lappend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was what my previous comment on this line was getting at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was what my previous comment on this line was getting at.
Sorry, I had misinterpreted it that time.
mojca
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry that it took me so long for a proper review, I probably needed a Saturday to get slightly more free time as my Tcl skills are not something to brag about either.
There's something wrong with quoting of macfile-gimp.
Another problem is that at some point (after the first maintainer is found) the alignment of the file changes, probably because the defaults are set to a different value when you tried to fit the maintainer's info into a single line.
portindex2json/portindex2json.tcl
Outdated
| set json_portinfo($key) [::json::write array {*}$portinfo($key)] | ||
| if { $key == "categories" || $key=="variants" || $key=="depends_lib" || $key=="depends_build" || $key=="depends_fetch" } { | ||
| set json_portinfo($key) [::json::write array {*}[format_strings $portinfo($key)]] | ||
| } elseif { $key == "maintainers"} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use consistent whitespacing inside braces.
I don't know why this highlighter is marking this one red, but according to most other sources it seems that we don't use a space after { and before }.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made it consistent (without the use of spaces)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The red highlighting is still there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have absolutely no idea why the strange highlighting here, but I assume we can ignore it. But there are other locations where there's still whitespace. Sorry for nitpicking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No that is perfectly fine. We need the final code in perfect shape, and I am ready for any number of changes. I will give it a more detailed check. Thanks
| } | ||
|
|
||
| proc remove_extra_braces {text} { | ||
| return [regsub -all {[\{\}]} $text ""] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to do the job, but I'm not a huge fan of this approach which blindly replaces all braces. I hope that @jmroot or @MarcusCalhoun-Lopez are slightly more skilled in Tcl and could help with some nice command to properly "unquote" text without "hard-code" replacements?
As an explanation for anyone not following why this was needed: Whenever the Portfile uses
long_description ${description}the long description ended up with an additional pair of undesired braces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably just stop fixing that after the fact. The right thing to do in the Portfile is write long_description {*}${description}.
That said, if $long_description is length 1 then it's safe to use [lindex $long_description 0].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I saw the source of the problem, I agree that fixing it here is kind of the wrong place to perform any fixes.
However, we would need to fix nearly 4000 portfiles if we follow your suggestion. Definitely off-topic for this PR, but my naive question is: Tcl is eventually able to print the string to the screen without all the wrong quoting. Is there really no way to write the properly expanded string to PortIndex automatically if we add some fixes to the base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tcl is eventually able to print the string to the screen without all the wrong quoting.
It's not "wrong" quoting, it's precisely the quoting needed to define the list as it was given. If you want to join all the list elements together into one string, you use join. Tcl doesn't just decide to do this on its own; have a look at what action_info does.
Is there really no way to write the properly expanded string to PortIndex automatically if we add some fixes to the base?
It would be very strange to have the value of $long_description in the Portfile be different to the value you get from reading the PortIndex. Think about what this would mean: you would get one answer from mportinfo and another from mportlookup (both return the same PortInfo array, but the former does it by parsing the Portfile and the latter gets it from the PortIndex.)
portindex2json/portindex2json.tcl
Outdated
| if { $key == "categories"} { | ||
| set json_portinfo($key) [::json::write array {*}$portinfo($key)] | ||
| if { $key == "categories" || $key=="variants" || $key=="depends_lib" || $key=="depends_build" || $key=="depends_fetch" } { | ||
| set json_portinfo($key) [::json::write array {*}[format_strings $portinfo($key)]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of personal opinion, but I would rewrite this in such a way that ::json::write array would already be called inside the format_strings function. Then you would only call something like
set json_portinfo($key) [strings2json_array $portinfo($key)](I changed the function name from format_strings to strings2json_array. Not sure if that's the best name either.)
portindex2json/portindex2json.tcl
Outdated
| foreach key [array names portinfo] { | ||
| if { $key == "categories"} { | ||
| set json_portinfo($key) [::json::write array {*}$portinfo($key)] | ||
| if { $key == "categories" || $key=="variants" || $key=="depends_lib" || $key=="depends_build" || $key=="depends_fetch" } { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are using inconsistent whitespacing here. My preference and generally the style used elsewhere is to use whitespace around ==.
Another way of writing this would be to check whether $key is in a list of strings:
if {[lsearch -exact {categories variants depends_lib dependes_build dependes_fetch} ${key}] != -1} {even though I'm not sure if that makes the comparison any clearer (in python it would have been nicer, but here the syntax is somewhat strange). You can probably leave it as it is, just fix the spaces.
Well, the "simplification" and maybe better said "generalisation" could have been if you compared $key to depends_*. There is also depends_test and I almost definitely forgot one of the depend types. So: check if $key equals categories or variants or depends_*.
(I would prefer if one of the more experienced developers commented about that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect compromise.
portindex2json/portindex2json.tcl
Outdated
| return [regsub -all {[\{\}]} $text ""] | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably no need for two newlines.
In the PortIndex, the description is: |
The portfile contains |
|
You do need to |
portindex2json/portindex2json.tcl
Outdated
| foreach maintainer $maintainers { | ||
| if {$maintainer == "openmaintainer"} { | ||
| continue | ||
| } elseif {$maintainer == "nomaintainer"} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use "if $maintainer is not openmaintainer and is not nomaintainer, then append to the list"? And remove continue altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will be very neat. Thank You
portindex2json/portindex2json.tcl
Outdated
| unset maintainer_subobjects | ||
| return $maintainers_list | ||
| } else { | ||
| return "\[\]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch to the same logic as in format_strings and initialize with an empty list first, then add maintainers. You can then get rid of the $ismaintainer variable altogether, and you won't need the else part either.
This whole function can probably be rewritten in five lines at most (not counting the function signature and lines containing just the closing brace): initialize an empty list, loop through maintainers, make sure it's a valid maintainer line, add maintainer to the list, return the json array with maintainers.
portindex2json/portindex2json.tcl
Outdated
| set closedmaintainer true | ||
| foreach maintainer $maintainers { | ||
| if {$maintainer == "openmaintainer" || $maintainer == "nomaintainer"} { | ||
| set closedmaintainer false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have one million maintainers and the first one is openmaintainer, you would keep iterating through all of them, just to return false at the end anyway. Don't set the variable, just return false here.
(I would somewhat prefer if the two functions (is_closedmaintainer and parse_maintainers) would be combined into one to avoid doing the same work twice, but it's true that it's then a tiny bit more convoluted code if one needs to return two results, fetch the first one, then the second one etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have one million maintainers and the first one is
openmaintainer, you would keep iterating through all of them, just to returnfalseat the end anyway. Don't set the variable, just return false here.
Thanks
(I would somewhat prefer if the two functions (
is_closedmaintainerandparse_maintainers) would be combined into one to avoid doing the same work twice, but it's true that it's then a tiny bit more convoluted code if one needs to return two results, fetch the first one, then the second one etc.)
We have to set values for two keys, so I could think of this approach only.
portindex2json/portindex2json.tcl
Outdated
| set closedmaintainer false | ||
| } | ||
| } | ||
| return $closedmaintainer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have returned earlier for all the cases where this was false, you can safely return true here.
portindex2json/portindex2json.tcl
Outdated
| foreach element $strings { | ||
| lappend new_strings [::json::write string $element] | ||
| } | ||
| return $new_strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just return the json array directly as suggested earlier.
portindex2json/portindex2json.tcl
Outdated
| set name [string trimright [lindex $email_list 1]] | ||
| set email(name) "\"$name\"" | ||
| set email(domain) "\"[lindex $email_list 0]\"" | ||
| ::json::write indented false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These intermediate identations mess up the global intentation of the file and make the spacing inconsistent in general. I'm kind of inclined to remove them altogether.
What we could (and probably should) do is provide a global option to switch between the readable and the compact form of the output. Any software would probably want the compact form for everything, not just emails. And any human being who wants to read the long form would not mind having the person split into 6 lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the part that was writing maintainers object into one line.
For the global part we just need to add ::json::write intended false.
portindex2json/portindex2json.tcl
Outdated
| if {[string first : [lindex $detail_list 0]] != -1} { | ||
| set email_list [split [lindex $detail_list 0] :] | ||
| set name [string trimright [lindex $email_list 1]] | ||
| set email(name) "\"$name\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using json function to quote the name? Here and at a few other places.
portindex2json/portindex2json.tcl
Outdated
| ::json::write indented true | ||
| } | ||
|
|
||
| set maintainer_array(github) "\"[lindex $detail_list 1]\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be kind of inclined to either leave this out completely or maybe set it to null if the github handle doesn't exist. That said, we should probably get a warning whenever a github handle is missing (probably in application rather than here), so in a way making it explicit that it doesn't exist (either with an empty string or with null) is still informative if one wants to search through the json file itself. Second opinion welcome.
And please use json function for quoting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have set it use null if the GitHub handle is not provided.
|
Thanks for helping in making the code really neat and efficient @mojca. I have made another commit. Please let me know what further improvements can be done. And what should we finally do with 'removing extra braces' from |
|
@jmroot: What do you think? It looks mostly ok to me by now (I didn't test the results yet). Regarding the quoting, I'm indifferent whether we make an ugly workaround or not if the problem needs to be fixed elsewhere anyway. If we don't work around it here, it will at least serve as a reminder for fixing this properly at the source. |
portindex2json/portindex2json.tcl
Outdated
| set name [string trimright [lindex $detail_list 0]] | ||
| set email(name) [::json::write string $name] | ||
| set email(domain) [::json::write string "macports.org"] | ||
| set email_subobject [::json::write object {*}[array get email]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
email(name) and email_subobject seem to be the same, so they could probably go out of the if-else loop?
I still want to test this (it would probably be nice to have some unit tests) to make sure that different scenarios work ok (if there is no email, if there is no github handle, if github comes before email, if it comes after email etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
email(name)andemail_subobjectseem to be the same, so they could probably go out of the if-else loop?
They are not same, email(name) is used to set the key name. And using different keys, the object email_subobject is compiled.
Am I understanding your message correctly?
I still want to test this (it would probably be nice to have some unit tests) to make sure that different scenarios work ok (if there is no email, if there is no github handle, if github comes before email, if it comes after email etc.)
I am afraid if won't work if the formats {domain:name @github} or {name @github} are altered. Because it totally relies on splitting the string at @.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was the following:
if {[string first : [lindex $detail_list 0]] != -1} {
set email_list [split [lindex $detail_list 0] :]
set name [string trimright [lindex $email_list 1]]
- set email(name) [::json::write string $name]
set email(domain) [::json::write string [lindex $email_list 0]]
- set email_subobject [::json::write object {*}[array get email]]
} else {
set name [string trimright [lindex $detail_list 0]]
- set email(name) [::json::write string $name]
set email(domain) [::json::write string "macports.org"]
- set email_subobject [::json::write object {*}[array get email]]
-}
+set email(name) [::json::write string $name]
+set email_subobject [::json::write object {*}[array get email]]I didn't test so maybe some variables need to be defined outside of the loop in order not to loose the scope.
My question is: does the code still work if you change {domain:name @github} into {@github domain:name}?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was the following:
if {[string first : [lindex $detail_list 0]] != -1} { set email_list [split [lindex $detail_list 0] :] set name [string trimright [lindex $email_list 1]] - set email(name) [::json::write string $name] set email(domain) [::json::write string [lindex $email_list 0]] - set email_subobject [::json::write object {*}[array get email]] } else { set name [string trimright [lindex $detail_list 0]] - set email(name) [::json::write string $name] set email(domain) [::json::write string "macports.org"] - set email_subobject [::json::write object {*}[array get email]] -} +set email(name) [::json::write string $name] +set email_subobject [::json::write object {*}[array get email]]
Got it, thanks! It seems that it can be done. I will give it a try.
My question is: does the code still work if you change
{domain:name @github}into{@github domain:name}?
No, it won't. After splitting, it relies on github username being at index 1 and email at index 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is: does the code still work if you change
{domain:name @github}into{@github domain:name}?No, it won't. After splitting, it relies on
github usernamebeing atindex 1andindex 0.
This doesn't sound OK then. I'm not aware of any rule that dictates the github handle to come after the email address. You should iterate through entries in the list and detect github handles based on whether the first character is @, and for entries without @ distinguish emails from macports usernames based on the existence of the colon. Putting email after github handle is a perfectly valid setup; it's also valid (though discouraged) to use just email or just github handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank You for the information @mojca. I thought it was always going to be in the same format. I shall do the changes shortly.
How about this: split at space and assign GitHub username if @ is present in the entry. The other entry would then automatically be email, following the presence of : to determine the domain. And keeping in mind that entries might be missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need an explicit splitting at space?
Just treat entries inside maintainers the same way as variants, depends_x etc. and loop over them. Then yes, you do an if-else to check for presence of '@' and ':'.
|
Here are some tests I ran:
I am now dropping the keys which are not available. |
This would usually read but it need to work in both cases.
That's OK. |
I ran this : and this : This would work in both cases. |
mojca
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my point of view this is now complete. The only remaining question is about keeping or dropping remove_extra_braces (leaning towards dropping for now, but no strong opinion) and the final squash of these commits.
I would be grateful for approval from another member, but if nobody complains, I'll eventually merge.
|
Have we verified that the changes will still work with our submission to https://repology.org, which is the main user of the PortIndex in JSON format? |
|
Sorry, you can mostly ignore my previous comment. I should have checked before. In fact, repology maintains a separate copy of the script. |
It won't work without changes to their python code. They have expressed a desire not to maintain that script in their repo BTW. |
Yes, it won't work. But I just saw the parser being used for Macports, they are manipulating the values of only two keys: |
|
I would currently not worry about incompatibility with repology. It uses an independent source code, so their system won't break, and the maintainer will be much happier anyway if we simply add a few lines to our buildbot configuration to generate the json file immediately after PortIndex, and also mirror that one. The fixes on the repology side should be minimal. I would suggest squashing the commits, ask again for opinion about the ugly workaround with braces, and then merge it, unless you have some further suggestions for code improvements. |
|
I have squashed them all into a single commit. Is that fine? |
|
Yes. The only super minor thing is that first line of the commit messages should be somewhat shorter (github will already not show the full commit message; it cuts at 70 characters; https://chris.beams.io/posts/git-commit/ suggests to limit yourself to 50). You can write more in the third row and further; the second row should be empty). (In this case you could mention closedmaintainer in the third row.) |
Add another key 'closedmaintainer' and keep details of maintainers in the 'maintainers' key. Output 'depends_*' and 'variants' as a list of strings.
|
Thank you @mojca |
|
Thank you very much for the contribution. I suggest we clean up the long descriptions later. I searched for any remaining braces and there are some in licence fields: but that's a different issue. We should switch to SPDX one day. |
Uh oh!
There was an error while loading. Please reload this page.