Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taginfo listing of tags added or modified by StreetComplete #4225

Closed
matkoniecz opened this issue Jul 20, 2022 · 32 comments
Closed

Taginfo listing of tags added or modified by StreetComplete #4225

matkoniecz opened this issue Jul 20, 2022 · 32 comments
Labels
enhancement feedback required more info is needed, issue will be likely closed if it is not provided

Comments

@matkoniecz
Copy link
Member

matkoniecz commented Jul 20, 2022

It is deployed to taginfo as it is superior to what was deployed already - see https://taginfo.openstreetmap.org/projects/streetcomplete#tags and for example https://taginfo.openstreetmap.org/keys/addr:place#projects

Overall I will close this issue, as I have no idea how to provide non-nightmarish solution that can implement it. I am planning to support what I created, so feel free to ping me here or create an issue at https://github.com/matkoniecz/Zazolc/issues if you spot anything wrong.

See https://github.com/matkoniecz/Zazolc/tree/taginfo and more specifically https://github.com/matkoniecz/Zazolc/blob/taginfo/buildSrc/src/main/java/UpdateTaginfoListingTask.kt and https://github.com/matkoniecz/Zazolc/blob/taginfo/res/documentation/taginfo_listing_of_tags_added_or_edited_by_StreetComplete.json


original issue:

I actually have an idea how it can be done. Creating an issue to prevent (really unlikely) conflict with someone else also working on it.

The previous attempt was #2754 and there was also an earlier attempt at manual handling.

@westnordost

replaceShop - do you think that taginfo listing should also list all that tags that may be added on "shop has changed" and on specifying more detail on shop=yes?

(My opinion is that yes, it should be listed - but as it requires extra work I prefer to ask before even designing this)

@westnordost

Would you be OK with partially manual/hardcoded answers and handling for some tricky cases and quests? I think it is not avoidable without spending outrageous effort on this.

@matkoniecz matkoniecz self-assigned this Jul 20, 2022
@westnordost
Copy link
Member

replaceShop - do you think that taginfo listing should also list all that tags that may be added on "shop has changed" and on specifying more detail on shop=yes?

Not necessary.

Would you be OK with partially manual/hardcoded answers and handling for some tricky cases and quests? I think it is not avoidable without spending outrageous effort on this.

Depends on how frequently this would need to be done / for how many we are talking about. No promises until I have seen how much manual work this is amounts to for current and future quests and overlays.

@peternewman
Copy link
Collaborator

I actually have an idea how it can be done.

Care to give some detail @matkoniecz . Obviously I spent quite a bit of time looking at it before in various iterations of the code.

replaceShop - do you think that taginfo listing should also list all that tags that may be added on "shop has changed" and on specifying more detail on shop=yes?

FWIW that bit feels like it ought to be pretty trivial compared to the rest of it, you'd just need to parse the JSON for the presets and dump it out.

Would you be OK with partially manual/hardcoded answers and handling for some tricky cases and quests? I think it is not avoidable without spending outrageous effort on this.

I'd agree with this and proposed similar in the past, there was some stuff like I think the cycleway left/right/both where there were just too many levels of indirection to do it in a sane way.

It would also act as a stepping stone; what I'd generated before automatically picked up most new quests, and was already a lot better than the current very stale manual effort. If we covered 90% automatically and just hardcoded some, we could get something out there and then potentially one or more people could look at automating more of the 10% in future, but I suspect as is often the case the last 5/10% will be the hardest (e.g. looking up the name:en etc suffixes is a special case for one or two quests and very different to everything else).

@matkoniecz
Copy link
Member Author

matkoniecz commented Jul 21, 2022

Care to give some detail @matkoniecz

Parsing AST generated with https://github.com/kotlinx/ast

you'd just need to parse the JSON for the presets and dump it out.

not really, as SC is using only part of them - you need to also filter them

@peternewman
Copy link
Collaborator

Care to give some detail @matkoniecz

Parsing AST generated with https://github.com/kotlinx/ast

I think I said similar in my old PR, while it may deal with some of the whitespace issues the regexes had, you're still going to have to interpret the AST from even something relatively simple like:

PICNIC_TABLE -> {
tags["leisure"] = "picnic_table"
tags.remove("amenity")
}
YES -> tags["backrest"] = "yes"
NO -> tags["backrest"] = "no"

or
override fun applyAnswerTo(answer: StileTypeAnswer, tags: Tags, timestampEdited: Long) {
when (answer) {
is StileType -> {
val newType = answer.osmValue
val newMaterial = answer.osmMaterialValue
val oldType = tags["stile"]
val oldMaterial = tags["material"]
val stileWasRebuilt =
oldType != null && oldType != newType
|| newMaterial != null && oldMaterial != null && oldMaterial != newMaterial
// => properties that refer to the old replaced stile should be removed
if (stileWasRebuilt) {
STILE_PROPERTIES.forEach { tags.remove(it) }
}
if (newMaterial != null) {
tags["material"] = newMaterial
}
tags["stile"] = newType
}
is ConvertedStile -> {
STILE_PROPERTIES.forEach { tags.remove(it) }
tags.remove("stile")
tags["barrier"] = answer.newBarrier
}
}
// policy is to not remove a check date if one is already there but update it instead
if (!tags.hasChanges || tags.hasCheckDate()) {
tags.updateCheckDate()
}
}

Maybe the former isn't too bad, but for the latter, you need to know that answer is of type StileTypeAnswer then iterate through all the values that has. I guess it might simplify finding the object definition at least. In the case of newType you've got another layer of indirection in the way with the variable.

Maybe it's worth outputting the AST from those two quests as a starter if you haven't already?

you'd just need to parse the JSON for the presets and dump it out.

not really, as SC is using only part of them - you need to also filter them

Still easier than the above, even if you have to code the filters in parallel! Hell you could probably just write some Kotlin that runs a for loop on osmfeatures and uses the exact same filter(s).

@HolgerJeromin
Copy link
Contributor

Can we extract the information from the test infrastructure?
There we have running code and not kotlin itself.

@matkoniecz
Copy link
Member Author

Can we extract the information from the test infrastructure?

How we would do this?

@HolgerJeromin
Copy link
Contributor

Hack/hook into verifyAnswer and see/log what tags flows through it in all tests.

@matkoniecz
Copy link
Member Author

matkoniecz commented Jul 21, 2022

That would require manual creation of tests exercising use of every single value which would be both pointless. And would be both manual and requiring tricky code.

@peternewman
Copy link
Collaborator

Another "fun" corner case is semi-colon separated lists see #4230 !

@matkoniecz
Copy link
Member Author

The parsing code is filled with special cases and hardcoding, to the point that I am unsure about making a pull request. And more complex than I expected. If I would guess the complexity correctly at start, then I would skip it and definitely not treat it as a stepping stone to some noncritical improvements.

My plan right now is to wait say half year and see how much effort is caused by StreetComplete development - is it requiring new complex tweaks all the time or is it stable.

See https://github.com/matkoniecz/Zazolc/tree/taginfo and more specifically https://github.com/matkoniecz/Zazolc/blob/taginfo/buildSrc/src/main/java/UpdateTaginfoListingTask.kt and https://github.com/matkoniecz/Zazolc/blob/taginfo/res/documentation/taginfo_listing_of_tags_added_or_edited_by_StreetComplete.json

It is deployed to taginfo as it is superior to what was deployed already - see https://taginfo.openstreetmap.org/projects/streetcomplete#tags and for example https://taginfo.openstreetmap.org/keys/addr:place#projects

Feedback about code or about mistakes in produced listing is welcome.

@matkoniecz
Copy link
Member Author

matkoniecz commented Jul 30, 2022

Another "fun" corner case is semi-colon separated lists

This one was relatively easy to handle

https://github.com/matkoniecz/Zazolc/blob/31f1971bfc3a73b0ada60e7a3c33da9e2776390b/buildSrc/src/main/java/UpdateTaginfoListingTask.kt#L1350-L1357

AddStileType.kt

I gave up here and switched to partially hardcoding things https://github.com/matkoniecz/Zazolc/blob/31f1971bfc3a73b0ada60e7a3c33da9e2776390b/buildSrc/src/main/java/UpdateTaginfoListingTask.kt#L607-L632

AddBenchBackrest.kt

works fine, there was no need for doing anything really special. Code in this case just notices that tags["leisure"] = "picnic_table" is accessing tags variable via index, with key and value being strings.

https://github.com/matkoniecz/Zazolc/blob/31f1971bfc3a73b0ada60e7a3c33da9e2776390b/buildSrc/src/main/java/UpdateTaginfoListingTask.kt#L1143

It makes few assumptions, for example that tags will not be ever named differently or that there is no dead code and so on. But this works fine.

@matkoniecz
Copy link
Member Author

matkoniecz commented Jul 30, 2022

Overall it was fun to implement, but likely not worth invested time if you treat taginfo listing as the only beneficial output.

@westnordost
Copy link
Member

westnordost commented Aug 3, 2022

Feedback about code [...] is welcome.

Hmm, I have to admit that I do not particularly fancy reviewing a 2000 lines of code that are(after reading it vertically) full of debug logs, TODO comments, commented out code and 200+ lines of hardcoded/semi-hardcoded rules for quests.

I can't tell how much of that code is just for generally traversing the AST and finding what one needs to know and what of it is workarounds, hardcoding, semi-hardcoding, debugging stuff etc, as it is all mashed in one script, so I cannot tell if it is worth maintaining in the future.
If (you think) it is worth maintaining in the future, i.e. the amount of code that would need to be changed for future quests or changes to quests is small, then it can be included upstream (into streetcomplete master). But then, then what is required for it to be merged is to

  • separate the code into meaningful modules/files/classes. General AST parsing should be in another file, data model and generating the JSON from that should be in another, hardcoded logic for certain quests etc.
  • remove all that debug code, commented out code etc; solve the open TODOs (one way or another)
  • document what exactly is missing / not regarded

Feedback about [...] mistakes in produced listing is welcome.

This is also not something I would like to review. It would mean going through the taginfo pages one by one and comparing wiht the quests, one by one, if the information is correct. This is something that should happen by the person who developed it, not by (code) reviewers

@westnordost westnordost added the feedback required more info is needed, issue will be likely closed if it is not provided label Aug 3, 2022
@1ec5
Copy link
Contributor

1ec5 commented Aug 3, 2022

For what it’s worth, I took a different approach when putting together a taginfo project file in osm-americana/openstreetmap-americana#179. That map renderer project’s interactions with raw OSM tags are largely isolated into a method that can be run “headlessly” as part of a mocked-up environment. Instead of outputting a map style or unit test results, it spits out the major portion of a project file. A separate “template” file contains anything else that needs to be mentioned in the template that isn’t as easy to isolate or that is obfuscated by OpenMapTiles.

StreetComplete is a very different kind of application, but I would suggest finding a way to execute the code instead of scraping it. It looks like some quests already have automated tests that would be easier to turn into taginfo documentation. If a code coverage tool says test coverage of the quest code is good enough, then maybe figure out how to log something from methods like StringMapEntryAdd so that you can run the test and collect the added tags from the test logs. One advantage is that, if a quest looks like it adds a certain tag, but that code is unreachable for some reason, it won’t end up in the project file.

@matkoniecz
Copy link
Member Author

matkoniecz commented Aug 4, 2022

Overall I will close this issue, as I have no idea how to provide non-nightmarish solution that can implement it. I am planning to support what I created, so feel free to ping me here or at https://github.com/matkoniecz/Zazolc/issues if you spot anything wrong.


Hmm, I have to admit that I do not particularly fancy reviewing a 2000 lines of code that are(after reading it vertically) full of debug logs, TODO comments, commented out code and 200+ lines of hardcoded/semi-hardcoded rules for quests.

Yes, at this point I am really dubious about including it as a part of the official SC code. Maybe it would be better for it to live in a unofficial state like right now?

project’s interactions with raw OSM tags are largely isolated

That is not true at all in SC

if a code coverage tool says test coverage of the quest code is good enough, then maybe figure out how to log something from methods like StringMapEntryAdd so that you can run the test and collect the added tags from the test logs.

That would not work at all. For example https://github.com/streetcomplete/StreetComplete/blob/master/app/src/main/java/de/westnordost/streetcomplete/quests/building_type/AddBuildingType.kt#L58 may have many different values passed to it.

And to trigger all values for recording - that would require manual creation of tests exercising use of every single value which would be both pointless. And would be both manual and requiring tricky code. And still would not deal with a freeform values.

This is something that should happen by the person who developed it, not by (code) reviewers

Oh, I did it and found a bug (in hardcoded answer listing). Still, if anyone spots problem - lets me know.

@1ec5
Copy link
Contributor

1ec5 commented Aug 4, 2022

project’s interactions with raw OSM tags are largely isolated

That is not true at all in SC

I understand, but even though the two projects greatly differ, my point was that piggybacking on running code is the only way to have certainty that the project file is consistent with the running code. Even with an AST, trying to interpret source code independently of the running application will inevitably lead to unmanageable edge cases in a nontrivial codebase.

if a code coverage tool says test coverage of the quest code is good enough, then maybe figure out how to log something from methods like StringMapEntryAdd so that you can run the test and collect the added tags from the test logs.

That would not work at all. For example https://github.com/streetcomplete/StreetComplete/blob/master/app/src/main/java/de/westnordost/streetcomplete/quests/building_type/AddBuildingType.kt#L58 may have many different values passed to it.

If this code can set building to any arbitrary value that the user enters, then the taginfo project file should only list building=*, not every possible value that may be entered. A test of the freeform editing functionality would presumably set building to some bogus value that the test would be able to look out for. The Projects tab for a specific tag value does list entries associated with the key after entries associated with the specific value: taginfo/taginfo#356. On the other hand, if the building key is supposed to be set to a fixed set of values, this project must have some way of verifying that those values are set correctly.

@matkoniecz
Copy link
Member Author

Possible values are listed in https://github.com/streetcomplete/StreetComplete/blob/master/app/src/main/java/de/westnordost/streetcomplete/quests/building_type/BuildingType.kt#L61 - then this list is used to build interface and selected value is passed in https://github.com/streetcomplete/StreetComplete/blob/master/app/src/main/java/de/westnordost/streetcomplete/quests/building_type/AddBuildingType.kt#L58

I see no way of piggybacking on running code that will work, but maybe someone will notice a way to do this.

It is possible to write tests to trigger manually use of each value, but that is equivalent of making this tag list by hand.

If this code can set building to any arbitrary value that the user enters, then the taginfo project file should only list building=*

In such cases I see no way to detect it by piggybacking on running code. While AST analysis can actually sort-of detect this.

trying to interpret source code independently of the running application will inevitably lead to unmanageable edge cases in a nontrivial codebase.

Yes, part of my parser degenerated into hardcoding answers.

@1ec5
Copy link
Contributor

1ec5 commented Aug 4, 2022

I see no way of piggybacking on running code that will work, but maybe someone will notice a way to do this.

It is possible to write tests to trigger manually use of each value, but that is equivalent of making this tag list by hand.

Does Kotlin support iterating over the values of an enumeration? If so, a test could loop over it, logging out the values as being supported by the application. I realize this would be quite contrived as a test of the application’s actual functionality, so it’s a question of to what degree the project would accommodate test changes needed only for the taginfo project file.

If this code can set building to any arbitrary value that the user enters, then the taginfo project file should only list building=*

In such cases I see no way to detect it by piggybacking on running code. While AST analysis can actually sort-of detect this.

Earlier, I suggested mocking the user input: a test could pass in an easily detectable bogus string as the tag value, which the test could then check for. Techniques like this have the advantage of improving test coverage at the same time as they’d make the taginfo project file more maintainable. Even though AST analysis is resilient to differences in formatting, it constrains the code structure so that developers have more rules they have to follow when contributing a quest, just for taginfo’s sake. But relying on code coverage is easier because there are tools for automatically measuring it.

(Unfortunately, I can only casually suggest these strategies, since I’m not as familiar with Kotlin or this codebase, but hopefully a maintainable taginfo file doesn’t seem so hopeless.)

@matkoniecz
Copy link
Member Author

OK, good point about such iteration. I am starting to see how one may get it to work.

Though I am not planning on doing this and bet that it would likely also degenerate into mess of special cases. But maybe less ugly mess?

But I suspect that it could be actually a better strategy.

@matkoniecz matkoniecz removed their assignment Jul 28, 2024
@matkoniecz
Copy link
Member Author

people who told me that this way to maintain it (by parsing AST of StreetComplete code) was not going to work were right

in taginfo/taginfo-projects#195 I requested removal of current taginfo listing as it is not maintained and outdated

overlays were never supported, supporting new quests required adding growing list of exceptions and special handling

not sure what is the best way to handle it and is there workable one

if someone is interested I can share with them my code and maybe help them a bit (at least with testing)

right now I am not planning to work on this

as it is not criticial to implement and huge amount of work I will not reopen this issue

@matkoniecz matkoniecz closed this as not planned Won't fix, can't repro, duplicate, stale Jul 28, 2024
@rugk
Copy link
Contributor

rugk commented Jul 28, 2024

not sure what is the best way to handle it and is there workable one

Theoretically you may reverse the behaviour: Let the StreetComplete code "push" it's modified tags to some service.

Less abstract my suggestion would be to do the following:

  1. Adjust the SC code to somehow "export" all these into a common function/have a function/CLI (kotlin) tool that can access the required data in a structured way.
  2. Then create a GitHub action like this one that is already used (source obviously) that calls such a CLI tool and exports the data in the required format, like CSV or whatever (JSON apparently).
  3. This action can then push it to some location or even create a GitHub pull request if needed etc. etc.

The advantages would be:

  • This is integrated into SC itself as being (Kotline) source code and a GitHub Action.
  • The thing is fully testable, unit tests are possible, but also the "integration test" aka the GitHub Actions CI that actually runs it can also be configured run on PRs e.g. and ensure the data can be generated properly. (It just needs to have a flag/exclude the last "upload" step in the GitHub Actions that uploads such stuff. I.e. it just generates the JSON and uploads it as an artifact without further handling it, so it can be inspected.)
  • Adjustment would need to be made in SC code itself and ensure the data is properly exportable. This could increase the code quality itself, as that should IMHO always be possible in an object-oriented language.
  • Further fun stuff could be made with such a setup being there. You could compare and access the data that a PR modifies and comment that as a PR or whatever. (Though yeah not needed if you see the source, but well… if you need such a complex thing for whatever reason. it could give some insights. Also possibly for conflicts arising though a new quest aka two quests editing the same tags or whatever quality control you may think of.)

If you ned help, I could help with some GitHub Actions thing, though that is likely the easier part of it. 😅

@matkoniecz
Copy link
Member Author

Adjust the SC code to somehow "export" all these into a common function/have a function/CLI (kotlin) tool that can access the required data in a structured way.

That would be effort-prohibitive and would noticeably complicate code for relatively small gain.

@rugk
Copy link
Contributor

rugk commented Jul 28, 2024

Really? Could not you just have a CLI calling into the quest functions and extracting the required stuff as said…

@mnalis
Copy link
Member

mnalis commented Jul 28, 2024

Theoretically you may reverse the behaviour: Let the StreetComplete code "push" it's modified tags to some service.

@rugk For modified tags, you could also have an external service that just looks at minutely diffs (or occasionally parse Full history dump) to see all changes that StreetComplete produced, and build database of modification tags from that. (hey, even I might be able to write one...)

However, Taginfo projects are supposed to show all tags the project uses, and not only those that they modify (i.e. it should include tags that are used in read-only ways too).

E.g. AddShoulder / AddCycleway / AddSidewalk use the tag expressway=* to make decisions, even if none of the quests will actually ever modify expressway=* tag.

@rugk
Copy link
Contributor

rugk commented Jul 28, 2024

This is also not what I meant. I meant a CLI Kotlin tool (wrapped in Gradle or whatever, also whatever language actually, Kotlin just makes sense) just like generateQuestList e.g., that generates data into a JSON. As it has full access to the source code and can parse it (not requiring an outside parser that was mentioned – "parsing AST of StreetComplete code" sounds like it), it can just use native Kotlin, so hmm? What more could be made to have direct access to what is needed. AFAIK the app also needs access to the modified tags, so any Kotlin script likely can, too.

I also found the PR for the other mentioned example: #2523
Maybe I am missing sth. here, but depending on how easy it is to import the source code of SC in such a CLI tool, it may be possible? I saw the other script read the files from disk, which obviously is not ideal, as it may break stuff and is a fault-prone way. Though, usable for that use case there, very likely.

@rugk
Copy link
Contributor

rugk commented Jul 28, 2024

Ah sorry should read til the end…

Taginfo projects are supposed to show all tags the project uses, and not only those that they modify (i.e. it should include tags that are used in read-only ways too).

Yeah okay, but would not "modify" be a good first step? I guess r/o access is not that often in there, and IIRC one can also differentiate in taginfo whether a tag is added or how it is used (at least I see an auto-generated description)? So we would not include wrong data, just maybe not complete one… until such cases of "r/o accessed-tags" are also implemented?

@mnalis
Copy link
Member

mnalis commented Jul 29, 2024

Yeah okay, but would not "modify" be a good first step?

Perhaps; AFAICT that it is how it was generated in the past, so one may argue that something is better than nothing, especially as I can put a disclaimer there.

Than again, it might be argued that the partial data (especially one skewed far to one side) paints the wrong picture about usage of the tags, so that it is more harmful then useful. 🤷‍♂️
I guess it all depends what is one's purpose on consulting Taginfo Projects usage in the first place...

I guess r/o access is not that often in there

Well, I wouldn't even dare to guess. There is certainly much more reading of tags then writing of them. And even the few ways I've seen seem complex enough to try to handle, and I'm sure there are plenty more I don't even know about, so I feel it might be some work to even make an attempt at educated guess 🤔

if someone is interested I can share with them my code and maybe help them a bit (at least with testing)

@matkoniecz if you can link the code, it would be interesting to see how complex / kludgy it is currently, and how more might need to be done for it to be more useful...

@1ec5
Copy link
Contributor

1ec5 commented Jul 29, 2024

As a starting point, what about listing out the tags and keys that StreetComplete quests use to identify candidate questions? The filter DSL seems to be much more formulaic, possibly amenable to a scraping technique. A mapper consulting taginfo would still find it helpful to know that StreetComplete cares about a given key, even if they don’t know whether it writes out that key.

@matkoniecz
Copy link
Member Author

matkoniecz commented Jul 29, 2024

@matkoniecz if you can link the code, it would be interesting to see how complex / kludgy it is currently, and how more might need to be done for it to be more useful...

https://github.com/matkoniecz/Zazolc/blob/taginfo/buildSrc/src/main/java/UpdateTaginfoListingTask.kt

copy in case I will delete this branch: https://gist.github.com/matkoniecz/03c812162ba43654188c2ad537e4956a

@rugk
Copy link
Contributor

rugk commented Aug 2, 2024

Would not a kinda easy refactoring be that, if tags are used not in that SC overpass-like DSL, all quests should somehow register such tags as constants/list or in a function or so (or whatever data structure), where they query it. This function can then be accessed from another function which collects all these tags.

Again, I would advise against scraping and adjust the SC quests source code to make it easier to implement.

@matkoniecz
Copy link
Member Author

#4225 (comment)

@westnordost
Copy link
Member

westnordost commented Aug 2, 2024

By the way, if anyone is going to look into this again, I recommend NOT including the tags in the quest filters in the list of "tags used by StreetComplete" because these tags are just used for filtering (to e.g. find ambiguous data) and may also include deprecated tags etc. It would be quite misleading to display them as "used by StreetComplete".

Rather, a script that runs through the codebase should rather look for tags that are actually tagged by StreetComplete, i.e. added or modified. Even tags deleted by StreetComplete should not be included, as these, again, might be deprecated tags that are just cleaned up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement feedback required more info is needed, issue will be likely closed if it is not provided
Projects
None yet
Development

No branches or pull requests

7 participants