Public Open Space validation #415

Open
ryn-trnr opened this issue Apr 17, 2024 · 6 comments

@ryn-trnr
Collaborator

ryn-trnr commented Apr 17, 2024

Feedback from the Public Urban Green Space validation exercise with international collaborators suggests it is worth considering adding the OSM tag natural=wood to the current list of tags used to generate public open space. Below is a list of areas tagged natural=wood that are not identified as public open space, and therefore not identified as public urban green space either.

  1. Mexico City's most significant urban green space, Bosque de Chapultepec.
    https://www.openstreetmap.org/relation/2514869

  2. Melbourne suburb of Heathmont, Wombolano Park.
    https://www.openstreetmap.org/way/27926584

  3. Melbourne suburb of Wantirna, Bateman Street Bushland.
    https://www.openstreetmap.org/way/49751126

  4. Identified by Rossano, various green spaces along the river Sangone in Turin.
    https://www.openstreetmap.org/way/613826844
    https://www.openstreetmap.org/way/613826852
    https://www.openstreetmap.org/way/613835108

  5. Identified by Vuokko Heikinheimo, many urban forests in Helsinki such as the one below:
    https://www.openstreetmap.org/way/24368409

The challenge is how the natural=wood tag plays out in different geographic contexts. The list above gives a few examples of missed spaces that clearly should be included, but there is still a risk of misclassification if natural=wood is included.

In future, developing a more comprehensive and diverse list of natural=wood examples would be beneficial.

@ryn-trnr ryn-trnr added the enhancement New feature or request label Apr 17, 2024
@ryn-trnr ryn-trnr self-assigned this Apr 17, 2024
@carlhiggs
Member

Thanks for opening this issue @ryn-trnr ; this will be important to explore and find solutions for.

It is possible for users to customise the configuration file that defines the various aspects used in identifying public open space. This file is provided as a template that is copied to the process/configuration folder when users first run the software. We could add an option to provide overrides on a per-city basis within a region's configuration file; I think that would be a good approach and easy to implement. A user could then conduct sensitivity analysis pre- and post-change (as described on our website here).

So, there are some existing references to 'natural=wood' that are good to be aware of in that configuration file:

explanation: These tags are to be joined in a comma seperated list, once they have been enclosed in single quotation marks. I omitted 'pitch' (which are private basketball courts in some places, e.g. Sao Paulo), and blue space tags (e.g. river) which are not relevant for this project. Note that while wood is included here, it is marked as being not public below (as there are many places tagged as natural=wood on OSM which would not be thought of as public open space)

On the above line that identifies the kinds of land uses that may indicate open space (in conjunction with other checks), there is this caveat re natural=wood:

Note that while wood is included here, it is marked as being not public below (as there are many places tagged as natural=wood on OSM which would not be thought of as public open space)

So, clearly this tag has been explored before (I should search my offline/online notes).

This is the part where natural=wood is explicitly flagged as an exclusion criterion for an open space being 'public':

public_not_in:
type: json
comprehension: ("{0}" IS NULL OR "{0}" NOT IN {1})
join: ' AND '
explanation: Where the keys in this json snippet are found to have values in their associated lists, these are used to indicate areas which are not flagged as public. Incorporating Olomouc feedback regarding garden_type. Also areas which are not necessarily public, except if located within the bounds of a broader public area were excluded (added pitch as an excluded form of leisure, and 'wood' as an excluded form of natural). Also added 'building=yes'. These modifications are now serving for this to more specifically signal public open space.
criteria: |-
("amenity" IS NULL OR "amenity" NOT IN ('aged_care', 'animal_boarding', 'allotments', 'animal_boarding', 'bank', 'bar', 'biergarten', 'boatyard', 'carpark', 'childcare', 'casino', 'church', 'club', 'club_house', 'college', 'conference_centre', 'embassy', 'fast_food', 'garden_centre', 'grave_yard', 'hospital', 'gym', 'kindergarten', 'monastery', 'motel', 'nursing_home', 'parking', 'parking_space', 'prison', 'retirement', 'retirement_home', 'retirement_village', 'school', 'scout_hut', 'university')) AND ("leisure" IS NULL OR "leisure" NOT IN ('garden', 'golf_course', 'horse_riding', 'pitch', 'racetrack', 'summer_camp', 'sports_club', 'stadium', 'sports_centre')) AND ("building" IS NULL OR "building" NOT IN ('yes',)) AND ("area" IS NULL OR "area" NOT IN ('school',)) AND ("natural" IS NULL OR "natural" NOT IN ('fell', 'bay', 'bog', 'cliff', 'geyser', 'reef', 'scrub', 'sinkhole', 'strait', 'volcano', 'wetland', 'wood', 'water')) AND ("recreation_ground" IS NULL OR "recreation_ground" NOT IN ('showground', 'school_playing_field', 'horse_racing', 'show_grounds', 'school_playing_fields')) AND ("sport" IS NULL OR "sport" NOT IN ('archery', 'badminton', 'bocce', 'boules', 'bowls', 'croquet', 'dog_racing', 'equestrian', 'futsal', 'gokarts', 'golf', 'greyhound_racing', 'horse_racing', 'karting', 'lacross', 'lacrosse', 'lawn_bowls', 'motocross', 'motor', 'motorcycle', 'polo', 'shooting', 'snooker', 'trugo')) AND ("access" IS NULL OR "access" NOT IN ('customers', 'private', 'no')) AND ("tourism" IS NULL OR "tourism" NOT IN ('alpine_hut', 'apartment', 'aquarium', 'bed_and_breakfast', 'caravan_site', 'chalet', 'gallery', 'guest_house', 'hostel', 'hotel', 'information', 'motel', 'museum', 'theme_park', 'zoo')) AND ("garden:type" IS NULL OR "garden:type" NOT IN ('residential', 'residental', 'private', 'commercial', 'pub', 'school', 'roof_garden'))

The above contains this explanation about wood,

Also areas which are not necessarily public, except if located within the bounds of a broader public area were excluded (added pitch as an excluded form of leisure, and 'wood' as an excluded form of natural). Also added 'building=yes'.
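To make the relationship between `comprehension`, `join` and `criteria` concrete, here is a minimal Python sketch of how such a string could be assembled. It assumes the template is filled using `str.format` with the key and a Python tuple of values (which would explain the `('yes',)` form seen above); the function name `build_criteria` and the shortened exclusion dictionary are illustrative only and are not taken from the software.

```python
# Template and separator as quoted from the configuration file above.
COMPREHENSION = '("{0}" IS NULL OR "{0}" NOT IN {1})'
JOIN = " AND "


def build_criteria(exclusions, comprehension=COMPREHENSION, join=JOIN):
    """Return an SQL-style filter built from a {key: [values]} mapping.

    Hypothetical helper: each key becomes one clause, and a one-item
    list renders as a one-item tuple such as ('yes',).
    """
    clauses = (
        comprehension.format(key, tuple(values))
        for key, values in exclusions.items()
    )
    return join.join(clauses)


# A shortened, illustrative subset of the exclusions quoted above.
example_exclusions = {
    "building": ["yes"],
    "natural": ["scrub", "wood", "water"],
}
example_criteria = build_criteria(example_exclusions)
print(example_criteria)
```

Read this way, removing a value from a key's list in the json snippet is what removes it from the corresponding `NOT IN (...)` clause.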

This method of identifying public areas was initially developed based on case studies of Australian cities as part of the Australian National Liveability Study (https://doi.org/10.1038/s41597-023-02013-5), and subsequently adapted to identify public open space across international cities working with local collaborators as part of the 25 cities global indicators study (described in https://doi.org/10.1111/gean.12290 and https://doi.org/10.1016/S2214-109X(22)00072-9).

It is not surprising that, as we expand to more cities, we identify additional modifications needed to meet validation checks; this is why configuration files are provided as templates but copied to a separate folder for users to modify, while the original template is retained.

I think useful things will be:

  • make it easier to modify specific configuration settings (e.g. for modifying open space tags) on a per-city basis through optional additions to region configuration file
  • providing additional guidance for users on how to do this and how to conduct sensitivity analyses
  • conducting sensitivity analyses ourselves to evaluate the impact of the current implementation (natural=wood excluded from being public unless located within a broader public area) versus not excluding it
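As a rough illustration of what a pre- and post-change comparison could report, the sketch below compares two sets of feature identifiers: those identified as public open space under the default criteria and those identified under modified criteria. The function name, the input format and the use of a Jaccard index are assumptions made for illustration; they do not describe the project's own sensitivity analysis workflow.

```python
def compare_classifications(default_ids, modified_ids):
    """Summarise how a criteria change alters which features are public.

    Both arguments are iterables of feature identifiers (for example
    OSM way or relation IDs); this input format is an assumption.
    """
    default_ids, modified_ids = set(default_ids), set(modified_ids)
    union = default_ids | modified_ids
    shared = default_ids & modified_ids
    return {
        # Features newly identified as public after the change.
        "gained": sorted(modified_ids - default_ids),
        # Features no longer identified as public after the change.
        "lost": sorted(default_ids - modified_ids),
        # Overlap between the two results (1.0 means identical).
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Made-up identifiers, for illustration only.
summary = compare_classifications({1, 2, 3}, {2, 3, 4, 5})
print(summary)
```

The gained features are the ones worth checking by hand during validation, since they are where new misclassifications would appear.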

Cities where this is an issue can already have their OpenStreetMap configuration file modified to evaluate whether the changes make identification of public open spaces more accurate. I am happy to provide guidance on how to do this. What we can do is make this easier and provide explicit guidance. I think we should approach changing the overall default setting cautiously, as it was implemented for a reason; it's a trade-off between sensitivity and specificity. Let's make sure we get the balance right, and provide guidance for specific contexts where the exact definition needs extra calibration to be right for that setting.

For now, if a user amends line 81 of their process/configuration/osm_open_space.yml file to be the following, areas that have natural=wood but aren't otherwise identified as being public will no longer be excluded from being considered public because of that tag:

        ("amenity" IS NULL OR "amenity" NOT IN ('aged_care', 'animal_boarding', 'allotments', 'animal_boarding', 'bank', 'bar', 'biergarten', 'boatyard', 'carpark', 'childcare', 'casino', 'church', 'club', 'club_house', 'college', 'conference_centre', 'embassy', 'fast_food', 'garden_centre', 'grave_yard', 'hospital', 'gym', 'kindergarten', 'monastery', 'motel', 'nursing_home', 'parking', 'parking_space', 'prison', 'retirement', 'retirement_home', 'retirement_village', 'school', 'scout_hut', 'university')) AND ("leisure" IS NULL OR "leisure" NOT IN ('garden', 'golf_course', 'horse_riding', 'pitch', 'racetrack', 'summer_camp', 'sports_club', 'stadium', 'sports_centre')) AND ("building" IS NULL OR "building" NOT IN ('yes',)) AND ("area" IS NULL OR "area" NOT IN ('school',)) AND ("natural" IS NULL OR "natural" NOT IN ('fell', 'bay', 'bog', 'cliff', 'geyser', 'reef', 'scrub', 'sinkhole', 'strait', 'volcano', 'wetland', 'water')) AND ("recreation_ground" IS NULL OR "recreation_ground" NOT IN ('showground', 'school_playing_field', 'horse_racing', 'show_grounds', 'school_playing_fields')) AND ("sport" IS NULL OR "sport" NOT IN ('archery', 'badminton', 'bocce', 'boules', 'bowls', 'croquet', 'dog_racing', 'equestrian', 'futsal', 'gokarts', 'golf', 'greyhound_racing', 'horse_racing', 'karting', 'lacross', 'lacrosse', 'lawn_bowls', 'motocross', 'motor', 'motorcycle', 'polo', 'shooting', 'snooker', 'trugo')) AND ("access" IS NULL OR "access" NOT IN ('customers', 'private', 'no')) AND ("tourism" IS NULL OR "tourism" NOT IN ('alpine_hut', 'apartment', 'aquarium', 'bed_and_breakfast', 'caravan_site', 'chalet', 'gallery', 'guest_house', 'hostel', 'hotel', 'information', 'motel', 'museum', 'theme_park', 'zoo')) AND ("garden:type" IS NULL OR "garden:type" NOT IN ('residential', 'residental', 'private', 'commercial', 'pub', 'school', 'roof_garden'))
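For anyone who would rather make this change programmatically than edit the long line by hand, the following is a minimal sketch that removes one value from one key's `NOT IN (...)` list in a criteria string. The helper name `drop_excluded_value` is hypothetical, and the sketch assumes the clause layout shown above (values in single quotes, no parentheses inside the list).

```python
import re


def drop_excluded_value(criteria, key, value):
    """Remove `value` from the NOT IN list for `key` in a criteria string."""
    pattern = re.compile(r'("%s" NOT IN \()([^)]*)(\))' % re.escape(key))

    def _rewrite(match):
        # Split the list body into quoted values, ignoring empty pieces
        # left by a trailing comma such as in ('yes',).
        values = [v.strip() for v in match.group(2).split(",") if v.strip()]
        kept = [v for v in values if v != "'%s'" % value]
        body = ", ".join(kept)
        if len(kept) == 1:
            body += ","  # keep the one-item form used elsewhere in the file
        return match.group(1) + body + match.group(3)

    return pattern.sub(_rewrite, criteria)


# Shortened, illustrative criteria string.
original = (
    "(\"natural\" IS NULL OR \"natural\" NOT IN ('scrub', 'wood', 'water')) AND "
    "(\"access\" IS NULL OR \"access\" NOT IN ('private', 'no'))"
)
amended = drop_excluded_value(original, "natural", "wood")
print(amended)
```

Removing the last remaining value for a key would leave an empty list, so in that case the whole clause should be deleted by hand instead.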

@carlhiggs
Member

I wrote the following comment sequentially; it's a bit long:

  1. I provide guidance to @eugenrb that could be useful for her analysis of Mexican cities with Cesar.
  2. I confirm that the issue with Chapultepec park wasn't an issue when we did our 25-city study; someone changed the tagging a couple of years ago (perhaps misleadingly/incorrectly; if you think that is the case, @eugenrb, with your local knowledge you could amend the tag).

Assistance to re-do analysis with an amended open space definition
@eugenrb the above describes a recently observed issue, where Chapultepec park was not identified as public open space, which is clearly an oversight. I think the solution I describe further below, based on editing OpenStreetMap, is better, but I wanted to provide this as an option on our side.

To make things easier for those wanting to explore the effect of the change above, I prepared an alternative osm_open_space.yml file containing the edit on line 81 described above (i.e. removing , 'wood'). The zipped file is linked below; once extracted, the file can be saved in the process/configuration folder. This will also remove that potential exclusion for any other cities analysed. However, it may cause other problems in correctly identifying public open spaces (the exclusion was added for that reason). These could potentially be picked up when validating the locations identified as accessible public open space. But it should ensure Chapultepec park isn't excluded based on that tagging, if that was the case.

osm_open_space.yml.zip

Chapultepec park was a 'park' when we analysed it previously
Having said the above, I just had a look at results from our 25 city study and confirmed that Chapultepec park was included, as visible in below image where hexagons surrounding it had access within 500m (which is great; I was concerned when this issue was raised):
image

It seems that the area that appears park-like at face value southwest of there was not included in that analysis because it was only incorporated as part of Chapultepec Park in 2021: https://en.wikipedia.org/wiki/Chapultepec#Fourth_section. I am not sure how it was tagged previously, and could check, but it makes sense to me that it was excluded because, at that point in time, it wasn't a park.

However, the change history for Chapultepec park indicates that prior to 2022 it was tagged as leisure=park
image

In 2022 that tag was removed and replaced with natural=wood. Tags aren't mutually exclusive; @eugenrb if you think there is a more appropriate way to tag Bosque de Chapultepec (for example, if you thought leisure=park were appropriate), you could add that tag or any others.

This might be a constructive solution, correcting OpenStreetMap, rather than tweaking things on our side, that could have unintended consequences. It might be that similar solutions could be appropriate for other cities.

Love to hear your thoughts @ryn-trnr and @eugenrb

Let me know how you go with this, or if it doesn't make sense; happy to work together to make an interim solution as required, and great that @ryn-trnr is interested in exploring sensitivity analyses for the impact of this modification, within-cities and between.

@gboeing
Collaborator

gboeing commented May 1, 2024

Moving a recent email thread here for public discussion.

From @rychennn on 12 Apr:

The validation process is finished. Quick comments on issues that we think you might want to check:

  1. In the LACO-wiki validation, some public urban green spaces in the LA data require admission fees (e.g., LA County Arboretum Garden for 15 USD), and we chose CORRECT in the validation process. Those parks are marked "open access" in the government data.

  2. In the LACO-wiki validation, many tracks and fields (for schools) are marked as public urban green spaces. We marked them INCORRECT.

From @dapugacheva on 1 May:

I am writing to follow up on the comments that Ruoyu pointed out in the previous email. There are some parks and green spaces that are marked open access in the government data yet charge an entrance fee. In addition, some school green fields identified as green public spaces in the dataset have restricted access during school hours yet are open to the public outside of school time. All these comments are noted in the validation spreadsheet.

These conditions may influence the definition of 'public' of the green space in the study. We assume that a part of the validation process was to identify such potential issues mentioned above. We are open to further discussion.

From @carlhiggs on 1 May:

Identifying public spaces using OpenStreetMap is challenging, and it's great that you've identified some false positives: places that charge fees. Our next steps will be to look at OpenStreetMap and our criteria for identifying public open space and see if there are additional clues from tags that we could use to filter these out.

With regards to schools after hours; that is challenging --- as a rule, I don't think we can assume areas tagged as schools are public access. Perhaps for specific study regions we could find a way for customisation of public open space queries, to overcome exceptions.

Broadly there are four approaches we can take to modifications for identifying public open space:

  1. Change the default criteria used in the software (this affects all cities studied, and is a big change, considering we have previously validated identification of public open space in diverse cities globally; our methods can be improved, but we need to be cautious that improving analysis for one city doesn't make it worse for others. This is where the sensitivity analysis that I believe Ryan is conducting comes in)

  2. Allow custom exceptions for specific study regions (this will be a useful addition; we'll have to code it, and find a way to make it accessible/flexible for people to use, but it's doable)

  3. Specific problems identified through validation (that should always be conducted after any analysis) may sometimes be best addressed through direct contribution to OpenStreetMap. For example, in the issue linked above I describe how a user recently removed the "leisure=park" tag from an important public open space in Mexico City, meaning it was not located for Ryan's analysis. I believe that is best addressed in OpenStreetMap by correcting the tagging, rather than in our software.

  4. We should add the option for users to add their own polygon layers of public open space. The risk is that their definition of public open space may vary from those used in other cities (ie our default), but arguably the most important thing is local relevance/acceptability/validity of results and if that's best achieved by using local data, we should support that.

The above are points we should add to our FAQ, as they apply in general to all features we identify, not just public open space. I'll do this later today.

We can all discuss this thread further here in this issue as needed.

@carlhiggs carlhiggs changed the title from "Public Open Space additional OSM tag natural=wood" to "Public Open Space validation" May 1, 2024
@carlhiggs
Member

I was discussing the above with @MelanieLowe and @ryn-trnr yesterday, and Melanie suggested an additional fifth approach that we should support/suggest when validation of a study region's results identifies specific issues with data representations of features of interest. So, I'll reiterate this list here:

  1. Exceptions: Allow exceptions for specific regions for criteria within their configuration file
  2. Contributions: Encourage contributions to OpenStreetMap where more comprehensive tagging would be appropriate
  3. Customisation: Allow configuration of custom data
  4. Methodological: Modify the default ways of identifying features of interest, following sensitivity analyses to understand implications for a range of urban contexts to be sure that this provides a net improvement
  5. Acknowledgement: Acknowledge specific issues as limitations when reporting data

Regarding the first approach (Exceptions), I think that may be the best option for the Finnish cities regarding the 'natural=wood' issue described at the top of this post. We should consult with @VuokkoH and Rossano. Perhaps, we could even allow case by case exceptions, e.g. using specific OSM IDs that have been identified through validation that should be included (that may be a good approach for the Melbourne and Turin examples).
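To illustrate how the Exceptions approach might behave, here is a minimal Python sketch. It is entirely hypothetical: the `RegionExceptions` structure, its field names and the `is_excluded` function do not exist in the software, and the sketch covers only the tag-based "not public" exclusion (not the check for whether an area lies within a broader public area).

```python
from dataclasses import dataclass, field


@dataclass
class RegionExceptions:
    """Hypothetical per-region overrides to the default exclusions."""

    # Exclusion values to drop for this region, e.g. {"natural": {"wood"}}.
    drop_exclusions: dict = field(default_factory=dict)
    # OSM IDs confirmed as public through local validation.
    always_public_osm_ids: set = field(default_factory=set)


def is_excluded(tags, osm_id, default_exclusions, exceptions):
    """Return True if a feature is excluded from being considered public."""
    if osm_id in exceptions.always_public_osm_ids:
        return False
    for key, excluded_values in default_exclusions.items():
        dropped = exceptions.drop_exclusions.get(key, set())
        if tags.get(key) in set(excluded_values) - set(dropped):
            return True
    return False


# Shortened, illustrative defaults based on the criteria quoted earlier.
defaults = {"natural": {"wood", "water"}, "access": {"private", "no"}}

# Region-wide exception, as suggested for the Finnish cities.
helsinki = RegionExceptions(drop_exclusions={"natural": {"wood"}})
# Case-by-case exception using the Wombolano Park way listed above.
melbourne = RegionExceptions(always_public_osm_ids={27926584})

print(is_excluded({"natural": "wood"}, 24368409, defaults, helsinki))
print(is_excluded({"natural": "wood"}, 27926584, defaults, melbourne))
```

Keeping the overrides separate from the defaults means the default criteria stay comparable across cities, while each region's deviations are explicit and easy to report.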

Regarding the second approach (Contributions), I believe that may be the best option in many cases like the 'natural=wood' issue described for Mexico City, where a user removed the tag 'leisure=park' from an important park, leaving only 'natural=wood'. I think that was an error and have suggested to Eugen that, if she agrees, it could be best fixed in OpenStreetMap.

Regarding the third approach (Customisation), we can currently do this for point features, but it's awkward and could be improved (e.g. by allowing generic spatial data formats such as layers in a geopackage; the current CSV approach is quite esoteric). We should improve the implementation of custom data configuration, and extend it to allow for polygon data such as areas of public open space.

The remaining two approaches are where this is a bigger issue that the first three can't solve (methodological change, that must be undertaken with care), or the issue is small enough that it doesn't meaningfully impact inference for the research question being considered (acknowledgement).

Do you think the above typology of approaches is good? If so, we can plan to add implementation of Exceptions and Customisation as priorities. I'll add a version of the above as a new 'feature request' for generic support of responses to validation.

@gboeing
Collaborator

gboeing commented May 3, 2024

I think this looks like a good menu of approaches.

@eugenrb
Member

eugenrb commented May 3, 2024

Thanks for this very thorough review of the issue, Carl!

I have been considering working on the parks label in OSM for a while. This issue with Chapultepec was relevant since it is the most important park in the city. However, I think the tag may have been changed to "wood" because the park's name is "Bosque de Chapultepec" or "Chapultepec Forest," even though it is a large metropolitan urban park with some wooded areas. To me, the most relevant issue we found through the validation was the inclusion of hilltops as parks. This is because most of those green areas are either private land or conservation land that is not accessible for recreational purposes.

Mexico City's public park database is quite thorough and an excellent resource. Using that layer would be the preferred data source for parks versus the OSM data. However, we do not have such good data or even any official data for some of the other cities. For the summer, we will be working on the policy analysis for 5 of the 10 Mexican cities we will analyze. Thus, I will have time to review the OSM park labels for these cities, and if I can, I will correct the tags and add some other parks that may need to be added.

I am very excited about @ryn-trnr doing this validation/sensitivity analysis since this highlights the need for people to do a quick review of the OSM data before using it and if possible, to correct some of these errors.
