Profanity tag ... where? (Whom did the mapper offend?) #209
Comments
@Piskvor Thanks for the ticket. Currently the compare function compares words to a set list of bad words. These are forked from https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words. The compare function uses the specific language list when a suffix is used. As you said, it would be good to have a description of which exact word got flagged for profanity. This needs support added to the OSMCha database and API, and the osm-compare compare functions would then need to return descriptions. Tagging in @willemarcel to write his thoughts.
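A rough sketch of what "return a description" could look like, written in Python for brevity rather than as osm-compare code. The word lists, the function name `find_profanity`, and the shape of the returned dictionary are all assumptions made for illustration; the placeholder words stand in for the LDNOOBW lists.

```python
from typing import Optional

# Hypothetical word lists keyed by language code (placeholders, not the real lists).
WORD_LISTS = {
    "en": {"darn", "heck"},
    "es": {"caramba"},
}


def find_profanity(tags: dict[str, str]) -> Optional[dict[str, str]]:
    """Return which tag, word and language triggered the flag, or None."""
    for key, value in tags.items():
        # Only look at name tags: "name" or "name:<language suffix>".
        if key != "name" and not key.startswith("name:"):
            continue
        suffix = key.partition(":")[2]
        # With a known suffix, check only that language's list;
        # otherwise fall back to every available list.
        languages = [suffix] if suffix in WORD_LISTS else list(WORD_LISTS)
        for token in value.lower().split():
            for lang in languages:
                if token in WORD_LISTS[lang]:
                    return {"tag": key, "word": token, "language": lang}
    return None
```

With a result like this stored alongside the flag, a reviewer would see the offending word and the list it came from instead of a bare "Profanity tag".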
Thanks for the explanation. Specifically in OSM, this looks easier than a general (contextless, text-only) matching: The method for comparing against main (sic!) languages is probably a useful heuristic even for the OSM name corpus, but since we do have geographical data both for the feature at hand, and for countries, it would IMNSHO make sense to check for the local language as well. Usually, the In pseudocode: which country bounding box(es) contain (any nodes of) the feature (changeset?)? If any, add their language(s), if any, to the (front of?) checkable list. I'm aware that this only looks easy as pseudocode, but "let's check for Spanish and Russian obscenities" is not very useful in locations where Spanish or Russian
I think this is an excellent point and an excellent suggestion - making the assumption that To expand on the above pseudo-code a little bit:
My guess is there may be points on the earth where you cannot get the country, or where you do not have a valid language mapping, and there it's probably fine to default to English. I guess the problem would be egregious insults in English or another language showing up in a country that has a different local language, so that the profanity is not detected. Here one suggestion might be to have a smaller list of "really bad words across languages" that one may want to flag for review regardless of local language. To another point here: it would be helpful to have better descriptions of the various detectors, and a way for users on the Thank you again for the ticket @Piskvor +cc @willemarcel
Sure, that's why I suggested "add the local language(s), if any, with the list of languages that are checked by default."
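The language selection discussed above could be sketched roughly as follows. This is only an illustration in Python: the `COUNTRIES` table, its approximate bounding boxes, the language mapping, and the name `languages_to_check` are all assumptions, and a real implementation would need proper country polygons rather than boxes.

```python
# Hypothetical country table: approximate (min_lon, min_lat, max_lon, max_lat)
# bounding box plus the local language codes. Values are illustrative only.
COUNTRIES = {
    "CZ": ((12.09, 48.55, 18.86, 51.06), ["cs"]),
    "CH": ((5.96, 45.82, 10.49, 47.81), ["de", "fr", "it"]),
}

# Languages that are always checked, whatever the location.
DEFAULT_LANGUAGES = ["en"]


def languages_to_check(nodes):
    """Return local languages first, followed by the default languages.

    `nodes` is a list of (lon, lat) pairs belonging to the feature.
    """
    local = []
    for bbox, langs in COUNTRIES.values():
        min_lon, min_lat, max_lon, max_lat = bbox
        # A country counts if any node of the feature falls inside its box.
        inside = any(
            min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
            for lon, lat in nodes
        )
        if inside:
            for lang in langs:
                if lang not in local:
                    local.append(lang)
    # No country found or no mapping: only the defaults remain.
    return local + [lang for lang in DEFAULT_LANGUAGES if lang not in local]
```

A node in Prague, `languages_to_check([(14.42, 50.09)])`, yields Czech ahead of the default list, while a point outside every box falls back to the defaults alone. The suggested short list of words that are bad across languages could be treated as one more entry in `DEFAULT_LANGUAGES`.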
To vastly reduce the false positive rate of the profanity comparator, only the first entry from the This entry causes every¹ changeset to be flagged where the number I stumbled upon this problem via MapRoulette and the "Profanity OSMCha detections" challenge (more info also in my other comment on a related issue). This single-line fix would probably reduce the changesets tagged as profanity by ~70%. ¹ To be precise, tags where the
Fix merged, but apparently the bug persists: another changeset was flagged as "profanity" even though it is completely clean. A way matches the "13." entry if that entry is interpreted as a regex: https://osmcha.org/changesets/102995578?aoi=e76ef7d4-5ae9-4c96-ad14-98ab138d19d2
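To illustrate why a list entry such as "13." misfires if it is treated as a regular expression: the unescaped dot matches any character, so any text containing "13" followed by one more character is a hit. The snippet below is a Python demonstration of that effect, not the osm-compare code; the helper names and the sample strings are made up.

```python
import re


def matches_as_regex(entry: str, text: str) -> bool:
    # The entry is used as a pattern directly, so "." means "any character".
    return re.search(entry, text) is not None


def matches_literally(entry: str, text: str) -> bool:
    # re.escape turns the entry into a pattern that matches it verbatim.
    return re.search(re.escape(entry), text) is not None


entry = "13."
print(matches_as_regex(entry, "Route 135"))   # True: "." matched the "5"
print(matches_literally(entry, "Route 135"))  # False: no literal "13." present
```

Escaping list entries before building patterns (or comparing plain strings) would avoid this class of false positive.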
If a feature is flagged as indecent, it would be helpful to see what exactly triggered the flag and in what language - otherwise the users are left scratching their heads over an edit that looks completely benign in the language(s) they know.
Example:
https://osmcha.mapbox.com/changesets/60306121/ , specifically https://www.openstreetmap.org/relation/1293828 is tagged "Profanity tag", why? All the texts are official stop names - not rude, not naughty, not even a double entendre or a pun - yet there it is. It is quite possible that "U Waltrovky" is considered the rudest possible curse in some language (infinite monkeys and birthday paradox and whatnot), but there is no indication which.