10m Populated Place: Puducherry, India #454

Closed
elliotap opened this issue Dec 28, 2020 · 14 comments

@elliotap
Collaborator

Pondicherry, ID: 1159150163, was renamed to Puducherry in 2006. English-language Wikipedia and Wikidata (Q639421) still use the well-known former name as the page title.

https://en.wikipedia.org/wiki/Pondicherry
https://www.india.com/news/india/cabinet-approves-constitutional-amendment-to-change-name-of-the-union-territory-from-pondicherry-to-puducherry-1801316/
https://www.openstreetmap.org/node/245600130

@mizmay
Collaborator

mizmay commented Dec 29, 2020

OpenStreetMap
Google Maps

@ImreSamu
Collaborator

It is so complicated; Wikipedia voting:

@nvkelso
Owner

nvkelso commented Dec 29, 2020

Hi @ImreSamu, long time no see! Thanks for the deep dive into Wiki Talk.

I'm in favor of updating the default name to Puducherry with a name_alt of Pondicherry, since it's a big town with a long-standing conventional name in the earlier form and the official name has been in circulation for 14 years (!!!).

For all the localizations the script follows https://www.wikidata.org/wiki/Q639421, which would result in the older name being used for name_en... and hilariously the alternative name is noted as the new name (but wouldn't be picked up by the script). @mizmay Note that while we audit name versus name_en we'd still want to preserve some deltas like this one. If Wikidata were updated then the script would pick up the new values.
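
To illustrate why the older name lands in name_en, here is a minimal sketch, assuming the localization script reads only the per-language `labels` entry and never looks at `aliases`. The `entity` dict is a hand-written stand-in shaped like Wikidata's entity JSON, filled with the values reported in this thread rather than fetched live, and `pick_localized_name` is a hypothetical helper, not a function from the actual script.

```python
# Hand-written stand-in shaped like Wikidata entity JSON (labels / aliases).
# Values are the ones reported in this thread for Q639421; nothing is fetched.
entity = {
    "id": "Q639421",
    "labels": {
        "en": {"language": "en", "value": "Pondicherry"},
        "de": {"language": "de", "value": "Puducherry"},
        "fr": {"language": "fr", "value": "Pondichéry"},
    },
    "aliases": {
        "en": [{"language": "en", "value": "Puducherry"}],
    },
}


def pick_localized_name(entity, lang):
    """Hypothetical helper: return the label for `lang`, or None if absent.

    Aliases are deliberately ignored, which is the assumed script behavior.
    """
    label = entity.get("labels", {}).get(lang)
    return label["value"] if label else None


name_en = pick_localized_name(entity, "en")  # the label wins over the alias
name_pt = pick_localized_name(entity, "pt")  # no Portuguese label in the stand-in
print(name_en, name_pt)
```

Under that assumption the new name only reaches name_en once the English label itself is changed on Wikidata.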


@mizmay
Collaborator

mizmay commented Dec 29, 2020

We can put anything in NAME, NAMEALT and NAMEPAR as these are not currently populated. It's likely that @elliotap is flagging the name_en value, which, as the two of you point out, comes from Wikidata and will not change.

> ne_10m_populated_places[ne_10m_populated_places$ne_id == 1159150163,]
     SCALERANK NATSCALE LABELRANK      FEATURECLA NAME NAMEPAR NAMEALT DIFFASCII   NAMEASCII ADM0CAP
6638         4       50         1 Admin-1 capital <NA>    <NA>    <NA>         0 Pondicherry       0
     CAPIN WORLDCITY MEGACITY SOV0NAME SOV_A3 ADM0NAME ADM0_A3   ADM1NAME ISO_A2 NOTE LATITUDE
6638  <NA>         0        0    India    IND    India     IND Puducherry     IN <NA> 11.93499
     LONGITUDE CHANGED NAMEDIFF DIFFNOTE POP_MAX POP_MIN POP_OTHER RANK_MAX RANK_MIN GEONAMEID
6638     79.83       0        0     <NA>  227411  227411   1518183       10       10   1259425
     MEGANAME     LS_NAME LS_MATCH CHECKME MAX_POP10 MAX_POP20 MAX_POP50 MAX_POP300 MAX_POP310
6638     <NA> Pondicherry        1       5   1518412   3438854   5677101          0          0
     MAX_NATSCA MIN_AREAKM MAX_AREAKM MIN_AREAMI MAX_AREAMI MIN_PERKM MAX_PERKM MIN_PERMI MAX_PERMI
6638         50       1148       6352        443       2453      1040      6495       646      4036
     MIN_BBXMIN MAX_BBXMIN MIN_BBXMAX MAX_BBXMAX MIN_BBYMIN MAX_BBYMIN MIN_BBYMAX MAX_BBYMAX
6638   78.63333   79.41786   79.95833   80.14167   11.16306   11.82518   12.38964   12.88783
     MEAN_BBXC MEAN_BBYC COMPARE    GN_ASCII FEATURE_CL FEATURE_CO ADMIN1_COD GN_POP ELEVATION
6638  79.56258   11.9822       0 Pondicherry          P       PPLA         22 227411         0
     GTOPO30     TIMEZONE              GEONAMESNO UN_FID UN_ADM0 UN_LAT UN_LONG POP1950 POP1955
6638       1 Asia/Kolkata GeoNames match general.      0    <NA>      0       0       0       0
     POP1960 POP1965 POP1970 POP1975 POP1980 POP1985 POP1990 POP1995 POP2000 POP2005 POP2010 POP2015
6638       0       0       0       0       0       0       0       0       0       0       0       0
     POP2020 POP2025 POP2050 CITYALT min_zoom wikidataid    wof_id CAPALT     name_en    name_de
6638       0       0       0    <NA>      5.1    Q639421 102028985     NA Pondicherry Puducherry
        name_es    name_fr name_pt    name_ru  name_zh label name_ar name_bn name_el name_hi name_hu
6638 Puducherry Pondichéry    <NA> Пондичерри 本地治里  <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
     name_id name_it name_ja name_ko name_nl name_pl name_sv name_tr name_vi wdid_score      ne_id
6638    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>          4 1159150163
               geometry
6638 79.83000, 11.93499

@elliotap
Collaborator Author

@mizmay yes, flagging for name_en. I've been checking issues against a copy of the 10m shapefile downloaded 5/21/2020, where I am also seeing NAME populated with "Pondicherry".

@mizmay
Collaborator

mizmay commented Dec 29, 2020

@nvkelso is it appropriate to use NAMEDIFF or some other existing field to denote where NAME is not expected/required to match name_en?

@nvkelso
Owner

nvkelso commented Dec 29, 2020

In the past, namediff has been used (combined with the note field for explanation) to indicate when a change has been made from version to version. It hasn't been maintained for several major versions.

In most cases it may be better to simply edit the Wikidata value upstream? It's only loosely associated with Wikipedia and its controversies.

@mizmay
Collaborator

mizmay commented Dec 29, 2020 via email

@nvkelso
Owner

nvkelso commented Dec 29, 2020

There is a limit to the number of columns in a DBF (it is an old but reliable format) and we're running up against that. I could see a new column with a JSON object to store such a mask for the ~2 dozen localizations? The crux is how often we update name vis-a-vis name_en? I think not very often, and human reviewed? What about noting deliberate deltas in the notes column and including that column in the Wikidata QA generated tables?
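
The notes-column idea could look roughly like the sketch below. Everything about the convention is an assumption made for illustration: the `DELIBERATE_PREFIX` marker, the `audit_name_deltas` helper, and the second record, which is a made-up placeholder and not a real feature. The audit flags a row only when NAME and name_en differ and no note marks the difference as intended.

```python
# Hypothetical convention: a NOTE starting with this prefix marks a deliberate delta.
DELIBERATE_PREFIX = "name_en delta:"

rows = [
    # Names from this thread; the NOTE text is written here for illustration only.
    {
        "ne_id": 1159150163,
        "NAME": "Puducherry",
        "name_en": "Pondicherry",
        "NOTE": "name_en delta: Wikidata label keeps the former name",
    },
    # Made-up placeholder record, not a real Natural Earth feature.
    {"ne_id": 0, "NAME": "Example New", "name_en": "Example Old", "NOTE": None},
]


def audit_name_deltas(rows):
    """Return ne_ids where NAME differs from name_en and no note explains it."""
    flagged = []
    for row in rows:
        if row["NAME"] == row["name_en"]:
            continue  # names agree, nothing to review
        note = row.get("NOTE") or ""
        if note.startswith(DELIBERATE_PREFIX):
            continue  # difference is documented as intentional
        flagged.append(row["ne_id"])
    return flagged


flagged = audit_name_deltas(rows)
print(flagged)
```

This keeps the column count unchanged, at the cost of relying on a free-text prefix that reviewers would have to apply consistently.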

nvkelso added this to the v5.1.0 milestone Feb 11, 2021
mizmay self-assigned this Mar 7, 2021
@mizmay
Collaborator

mizmay commented Mar 7, 2021

Suggested resolution: We'll use Puducherry in NAME and NAMEASCII but continue to rely on consensus from Wikidata editors for the localizations.

@nvkelso
Owner

nvkelso commented Mar 9, 2021

Agree with these changes for ne_id = 1159150163:

  • name = Puducherry
  • nameascii = Puducherry
  • namepar = Pondicherry
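
As a rough sketch of applying these three changes by ne_id, assuming the attributes have been loaded as a list of dicts: `apply_changes` is a hypothetical helper, and the starting values are the ones reported in this thread (NAME as seen in @elliotap's copy of the shapefile). Localized columns such as name_en are left untouched.

```python
# Attribute subset for ne_id 1159150163, starting values as reported in this thread.
rows = [
    {
        "ne_id": 1159150163,
        "NAME": "Pondicherry",
        "NAMEPAR": None,
        "NAMEALT": None,
        "NAMEASCII": "Pondicherry",
        "name_en": "Pondicherry",
    }
]

# The changes agreed above.
agreed = {"NAME": "Puducherry", "NAMEASCII": "Puducherry", "NAMEPAR": "Pondicherry"}


def apply_changes(rows, ne_id, changes):
    """Hypothetical helper: update matching rows in place, return how many matched."""
    matched = 0
    for row in rows:
        if row["ne_id"] != ne_id:
            continue
        for field, value in changes.items():
            if field not in row:
                # Guard against typos in field names instead of adding new columns.
                raise KeyError(f"unknown field: {field}")
            row[field] = value
        matched += 1
    return matched


count = apply_changes(rows, 1159150163, agreed)
assert count == 1  # exactly one feature should carry this ne_id
assert rows[0]["NAMEASCII"].isascii()  # NAMEASCII must stay plain ASCII
print(rows[0])
```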

@mizmay
Collaborator

mizmay commented Apr 5, 2021

Here we go!

BEFORE:

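| NE_ID | NAME | NAMEPAR | NAMEALT | NAMEASCII | SOV0NAME | ADM0NAME | ADM1NAME | MEGANAME | LS_NAME | NAME_EN | NAME_DE | NAME_ES | NAME_FR | NAME_PT | NAME_RU | NAME_ZH | NAME_AR | NAME_BN | NAME_EL | NAME_HI | NAME_HU | NAME_ID | NAME_IT | NAME_JA | NAME_KO | NAME_NL | NAME_PL | NAME_SV | NAME_TR | NAME_VI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1159150163 | Pondicherry | NA | NA | Pondicherry | India | India | Puducherry | NA | Pondicherry | Pondicherry | Puducherry | Puducherry | Pondichéry | NA | Пондичерри | 本地治里 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |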

mizmay added a commit to mizmay/natural-earth-vector that referenced this issue Apr 5, 2021
@mizmay
Collaborator

mizmay commented Apr 5, 2021

AFTER:

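| NE_ID | NAME | NAMEPAR | NAMEALT | NAMEASCII | SOV0NAME | ADM0NAME | ADM1NAME | MEGANAME | LS_NAME | NAME_EN | NAME_DE | NAME_ES | NAME_FR | NAME_PT | NAME_RU | NAME_ZH | NAME_AR | NAME_BN | NAME_EL | NAME_HI | NAME_HU | NAME_ID | NAME_IT | NAME_JA | NAME_KO | NAME_NL | NAME_PL | NAME_SV | NAME_TR | NAME_VI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1159150163 | Puducherry | Pondicherry | | Puducherry | India | India | Puducherry | NA | Pondicherry | Pondicherry | Puducherry | Puducherry | Pondichéry | NA | Пондичерри | 本地治里 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |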
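
One way to confirm that only the intended fields moved is to diff the BEFORE and AFTER rows. The sketch below is illustrative only: `changed_fields` is a hypothetical helper, the values are copied as printed in the two rows posted in this thread, and NAMEALT is left out of the comparison because its AFTER value is not visible in the flattened row.

```python
# Selected columns from the BEFORE and AFTER rows, copied as printed in this thread.
before = {
    "NE_ID": "1159150163",
    "NAME": "Pondicherry",
    "NAMEPAR": "NA",
    "NAMEASCII": "Pondicherry",
    "ADM1NAME": "Puducherry",
    "LS_NAME": "Pondicherry",
    "NAME_EN": "Pondicherry",
}
after = {
    "NE_ID": "1159150163",
    "NAME": "Puducherry",
    "NAMEPAR": "Pondicherry",
    "NAMEASCII": "Puducherry",
    "ADM1NAME": "Puducherry",
    "LS_NAME": "Pondicherry",
    "NAME_EN": "Pondicherry",
}


def changed_fields(before, after):
    """Hypothetical helper: map each changed field to its (old, new) pair."""
    return {
        field: (old, after.get(field))
        for field, old in before.items()
        if after.get(field) != old
    }


changes = changed_fields(before, after)
for field, (old, new) in changes.items():
    print(f"{field}: {old} -> {new}")
```

With these inputs the diff reports NAME, NAMEPAR and NAMEASCII, while NAME_EN still follows Wikidata.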

mizmay added a commit to mizmay/natural-earth-vector that referenced this issue Apr 5, 2021
mizmay added a commit to mizmay/natural-earth-vector that referenced this issue Apr 5, 2021
mizmay added a commit to mizmay/natural-earth-vector that referenced this issue Apr 17, 2021
@nvkelso
Owner

nvkelso commented Apr 22, 2021

Fixed via #508 and 54a6619 to reapply #5.

nvkelso closed this as completed Apr 22, 2021