Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[do not merge] do not throw on name/address regex matching errors #122

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

missinglink
Copy link
Member

this is one solution to the verbose regex matching errors.

we've had a lot of users worried and confused about the verbose errors we return when the source data for a name or address erroneously contains a URL.

there are a few ways of dealing with it, I considered returning a PeliasModelWarning error but solutions like that require that each importer is updated and it's not nice code because throwing will terminate the current scope, necessitating that each setter is wrapped in its own catch block 💩

this solution is fairly clean, I'm still outputting the error message via console.warn and considered using pelias-logger although I'd prefer to reduce the coupling to other modules to this one to avoid the 'bump cascades'.

@missinglink
Copy link
Member Author

also worth adding that the existing workflow is error-prone, in the case where a record contains multiple names in multiple languages, if one of those names is a URL then an Error is thrown and the whole record is discarded.

there are solutions for this which importers can elect to implement (such as the per-setter try/catch 💩 I mentioned above) although to-date none of the importers have adopted this approach and it's unlikely custom importer authors will have the insight to do so.

@orangejulius
Copy link
Member

Cool, it will be good to finally have this fixed. I have two suggestions:

1.) Definitely do use pelias-logger. It's super helpful to have all the logging formats controlled in a single place, and to keep the log output consistent.
2.) Lets try to think of a warning message that reassures users what is going on. We have learned that even the most "gentle" messages that are explicitly logged as warnings will still lead to some users thinking they are doing something wrong or that the entire import is invalid. invalid regex test, ${val} should not match ${regex} is pretty terse and absolute-sounding. How about something along the lines of "skipping record #{record_gid} because its name contains the url ${val}"? Having the GID in there would also be very helpful as we could then parse the logs and create a list of all records that did not pass the validation.

@missinglink
Copy link
Member Author

Yeah that sounds good, although there is an issue with the chain-of-command now, since the error is not being passed back to the caller we can't say things like "skipping record" as this library has no control over that behaviour ☹️

@orangejulius
Copy link
Member

Oh, hm. So what will the behavior be with this PR for a record that has a name that contains a URL?

@missinglink
Copy link
Member Author

If the record has a name with a URL it will console.warn and not throw an Error.
Prior to this PR it would throw an Error and it would be up to the importer to handle that, most would just reject the whole record.

@orangejulius
Copy link
Member

Got it, and with this PR, what document would end up being stored in Elasticsearch? Would it have a property containing the URL?

@missinglink
Copy link
Member Author

Hmm yes, I just had the same thought, and the tests don't properly cover checking that.

I think we need to have a rethink about how the validation works in general.

It makes sense to throw on mandatory properties in the constructor but in reality there are also additional properties which are mandatory such as a name.default and center_point.

One option is that we stop throwing on all setter methods (and ensure that the property is not set) along with an appropriate log message and log level.

I think this would be simpler for importers to handle due to less catch statements. The 'chaining' API complicated things as setters can't return values.

In order to ensure that the minimum viable set of properties is set on the object we can introduce a new method called isValid() or isValidOrThrow() which gets called just before the document is pushed downstream.

@missinglink missinglink changed the title do not throw on name/address regex matching errors [do not merge] do not throw on name/address regex matching errors Nov 20, 2019
@missinglink
Copy link
Member Author

Closes pelias/openstreetmap#534

@missinglink
Copy link
Member Author

related #116
related pelias/polylines#216
related pelias/polylines#225

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants