Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add withOptional flag that will trigger region when needed #42

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Joxit
Copy link
Member

@Joxit Joxit commented Feb 5, 2021

I added a commit above #41 that implement my comment #41 (comment)

What do you think @missinglink ? The PR for the API is in progress 😉

@missinglink
Copy link
Member

missinglink commented Feb 10, 2021

Heya, sorry I didn't reply earlier, I wanted to think about this a bit more before commenting.

I think that generally this is a great approach, namely:

  • have multiple label generators per country
  • let the API layer decide if it should re-generate the labels using a different generator

One change I would suggest is around code organisation.
The withOptional and meta.optional syntax don't feel very natural to me.

Just throwing out an idea, but could we change that so that each 'generator' (such as fr for instance) could have multiple sub-generators, so fr.verbose or fr.succinct for example.

I think that has a few advantages:

  • Ability to add more than one 'optional' generator
  • Cleaner syntax / module API
  • Easier to understand the logic and reason about the outcome of each generator
  • Easier to test each generator independently of each other
  • Potentially configurable which sub-generator is selected via pelias.json
  • Extensible in the future if we decided that in some cases we needed to re-label more than once

Not sure, does that make sense?

@Joxit
Copy link
Member Author

Joxit commented Feb 11, 2021

If I understand correctly, the schema should look something like this ?

  'FRA': {
    'valueFunctions': {
      'local': getFirstProperty(['locality', 'localadmin']),
      'country': getFRACountryValue()
    },
    'verbose': {
      'valueFunctions': {
        'local': getFirstProperty(['locality', 'localadmin']),
        'regional': getFRARegionValue(),
        'country': getFRACountryValue()
      },
      'meta': {
        'separator': ',' // This is useless for FRA
      }
    },
    'meta': {
      'separator': ',' // This is useless for FRA
    }
  }

I like the term verbose, but I don't think it will be enough for what I have in mind.

I want to expand the best part of the label based on the results. That means:

  1. For Bagneux, France, I will expand the regional part
Bagneux, Hauts-de-Seine, France
Bagneux, Deux-Sèvres, France
...
  1. For Starbucks, New York, NY, USA, I will expand the street part
Starbucks, 5th Avenue, New York, NY, USA
Starbucks, 6th Avenue, New York, NY, USA
...
  1. For Police nationale, Marseille, France, I will expand the borough (maybe with street & only in France)
Police nationale, 12e Arr., Marseille, France
Police nationale, 16e Arr., Marseille, France
...

With an option like withOptional and verbose, the results for Starbucks an Police nationale will include the regional part which is not relevant because it is not discriminatory.


Maybe I will need metadata from the API to identify similar parts... Or pelias-label should take the result list...

⚠️ MIDNIGHT DRAFT

Metadata example for Bagneux, France:

{
        "country": 2, // same name and same ID
        "macroregion": 0, // no match
        "region": 0, // no match
        "macrocounty": 0, // no match
        "county": 0, // no match
        "localadmin": 1, // same name
        "locality": 1, // same name
        "continent": 2, //same name and same ID
}

Metadata example for Starbucks, New York, NY, USA:

{
        "housenumber": 0, // no match
        "street": 0, // no match
        "postalcode": 0, // no match
        "country": 2,  //same name and same ID
        "region": 2,  //same name and same ID
        "county": 2,  //same name and same ID
        "locality": 2,  //same name and same ID
        "borough": 2,  //same name and same ID
        "neighbourhood": 0, // no match
        "continent": 2,  //same name and same ID
}

@missinglink
Copy link
Member

missinglink commented Feb 12, 2021

hmm yeah sorry if I'm actually making this more complicated 😆
I just look at this code (even prior to this PR) and think there must be a better way to do it 🤔

Just throwing out an idea here but if we returned an array of objects from the generator, we could then pass it up to the API layer and let it make the decision about how to .join() it.

So off the top of my head, something like this:

var doc = thePeliasDocForBagneux()
var parts = generator.fr(doc)

/**
  parts would be equal to:
  
  [
    { label: 'Bagneux', layer: 'locality', role: 'required' },
    { label: 'Hauts-de-Seine', layer: 'region', role: 'optional' },
    { label: 'France', layer: 'country', role: 'required' },
  ]
  
**/

// Get the existing 'simple' label
var simple = (p) => ['required'].includes(p.role)
var str = parts.filter(simple).join(', ')

// Get the label with 'optional' parts
var extended = (p) => ['required','optional'].includes(p.role)
var str = parts.filter(extended).join(', ')

The generator would just be responsible for specifying the order of the array items, what is included or discarded from the array and what roles are assigned based on the document layer.

The stringification could be performed either within this repo or in the pelias/api codebase, whichever makes the most sense.

🤷‍♂️

@Joxit
Copy link
Member Author

Joxit commented Feb 12, 2021

Hum... Yes and in this case, if we return an array to the API:

  1. We do the simple join
  2. We check duplicates
  3. On duplicates, we looking for the left most discriminatory optional label on the first two documents
  4. We recreate the label with required ones and the discriminatory label

We should not forget to return the separator character (space in Korea)

This is of course a breaking change

@Joxit
Copy link
Member Author

Joxit commented Feb 15, 2021

Hi there, I updated this branch with pelias/api#1507 to apply your idea @missinglink.

This is backward compatible, I added a new function partsGenerator that will return the new object:

{
  parts:   [
    { label: 'Bagneux', layer: 'locality', role: 'required' },
    { label: 'Hauts-de-Seine', layer: 'region', role: 'optional' },
    { label: 'France', layer: 'country', role: 'required' },
  ],
  separator: ', '
}

role has only two values, I don't know if one day we will use it with other values there or if we could simplify with a boolean required: true 🤷

Anyway, the result looks very good !

➡️ Police nationale, Marseille, France

Police nationale, 12th Arr., Marseille, France
Police nationale, 16th Arr., Marseille, France
Police nationale, 2nd Arr., Marseille, France
Police nationale, 3rd Arr., Marseille, France
Police nationale, 14th Arr., Marseille, France
Police nationale, 13th Arr., Marseille, France
Police nationale, 8th Arr., Marseille, France
Police nationale, 11th Arr., Marseille, France
Police nationale, 1st Arr., Marseille, France
Police nationale, 10th Arr., Marseille, France

➡️ Bagneux, France

Bagneux, Hauts-de-Seine, France
Bagneux, Deux-Sèvres, France
Bagneux, Indre, France
Bagneux, Marne, France
Bagneux, Allier, France
Bagneux, Meurthe-et-Moselle, France
Bagneux, Aisne, France

➡️ Starbucks, New York

Starbucks, 5th Avenue, New York, NY, USA
Starbucks, 6th Avenue, New York, NY, USA
Starbucks, 3rd Avenue, New York, NY, USA
Starbucks, West 181st Street, New York, NY, USA
Starbucks, 7th Avenue, New York, NY, USA
Starbucks, 1st Avenue, New York, NY, USA
Starbucks, East 93rd Street, New York, NY, USA
Starbucks, East 90th Street, New York, NY, USA
Starbucks, West 145th Street, New York, NY, USA
Starbucks, West 23rd Street, New York, NY, USA

@Joxit Joxit force-pushed the joxit/feat/with-optional branch 2 times, most recently from 12ab8d0 to 73ef845 Compare May 16, 2022 12:11
@Joxit
Copy link
Member Author

Joxit commented May 16, 2022

Hello there, I did an update of this PR

  • I added some tests (173 passed => 310 passed)
  • Some layer values were missing
  • I've updated the output of partsGenerator now it's { labelParts: [], separator: '' } instead of { parts: [], separator: '' }
  • It's now sync with master

Now I have two questions.
Do you think this PR is still interesting for the project? Otherwise I will stop updating it and close it.

I saw that there were spaces in Japanese labels. Maybe this PR is a perfect use case to handle this?

// combine components, inserting a space between the postalcode, admin, and name sections
// when written on an envelope these would correspond to new lines, a space should help with some clarity
let labelParts = _.concat(postalcode, ' ', admin, distric, block, ' ', venue);

let labelParts = _.concat(postalcode, { label: ' ', role: 'newline' }, admin, distric, block, { label: ' ', role: 'newline' }, venue); 
// OR
let labelParts = _.concat(postalcode, { label: ' ', role: 'separator' }, admin, distric, block, { label: ' ', role: 'separator' }, venue); 

Also, the space seems to be ignored when the label is built:

const expected = '' + // no postalcode for this record
'北海' + // prefecture (region), Hokkaido Prefecture
'札幌' + // city (county), Sapporo
'札幌市' + // designated city (locality), Sapporo-shi
'北二十四条西四丁目' + // chome/ward (street)
'2番12号'; // "bango" (block and building number)

@Joxit Joxit force-pushed the joxit/feat/with-optional branch 3 times, most recently from f5b56ed to 37056b8 Compare May 17, 2022 10:13
This will help for label deduplication in the API.

Sample of the response
```json
{
  labelParts:   [
    { label: 'Bagneux', layer: 'name', role: 'required' },
    { label: 'Hauts-de-Seine', layer: 'region', role: 'optional' },
    { label: 'France', layer: 'country', role: 'required' },
  ],
  separator: ', '
}
```
@Joxit Joxit force-pushed the joxit/feat/with-optional branch from 37056b8 to 762f22b Compare May 17, 2022 11:11
@Joxit
Copy link
Member Author

Joxit commented May 19, 2022

I added some minor fixes, now this is ready to review/merge

@orangejulius
Copy link
Member

We really need to test this out, you've done a lot of work here. Stand by :)

@Joxit Joxit marked this pull request as draft January 17, 2023 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants