Skip to content
This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Optionally parse list of tags to filter indexing by #25

Merged
merged 4 commits into from Mar 21, 2018
Merged

Conversation

karenzshea
Copy link
Contributor

@karenzshea karenzshea commented Jan 24, 2018

Closes #24

First pass implementation of filtering on OSM file by certain tags.

Remaining tasks

  • Figure out segfault
  • Write tests
  • Fix TODOs around optional handling
  • Figure out handling when no ways are returned
  • Allow loadOSMExtract to accept an array of files as well as one file string

@karenzshea karenzshea force-pushed the tags-filter branch 3 times, most recently from 54c0e7b to 14d72b1 Compare February 5, 2018 21:57
{
tags_filter.add_rule(true, osmium::TagMatcher(line));
}
tagfile.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove; the ifstream gets closed automatically

throw std::runtime_error(strerror(errno));
}
ParseTags(tagfile);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ending scope closes the ifstream (in its destructor)

std::strcmp(highway, "living_street") == 0 ||
std::strcmp(highway, "unclassified") == 0 || std::strcmp(highway, "service") == 0 ||
std::strcmp(highway, "ferry") == 0 || std::strcmp(highway, "movable") == 0 ||
std::strcmp(highway, "shuttle_train") == 0 || std::strcmp(highway, "default") == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about creating a function taking a vector of tags and checking all of them?

Or better put all tags here into a hashset and do constant time hset.count(highway) > 1 checks

Copy link
Collaborator

@danpat danpat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs:

  • Changelog
  • package.json bump
  • Maybe some docs on the format for the tagfilter file (in the README?)

async.each(wayIds, (way_id, next) => {
if (way_id)
{
annotator.getAllTagsForWayId(way_id, (err, tags) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably handle err here?

while (std::getline(tagfile, line))
{
tags_filter.add_rule(true, osmium::TagMatcher(line));
std::cout << "tag added: " << line << std::endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some logging is done with cout and some with cerr - should probably make this consistent so that upstream usage doesn't have to jump through hoops to capture everything (cout is usually stdout and cerr is usually stderr).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, does it make sense then to switch to using cout for info type log lines and cerr for error specific messages?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good writeup on the philosophy of stderr vs stdout:
https://www.jstorimer.com/blogs/workingwithcode/7766119-when-to-use-stderr-instead-of-stdout

TL;DR - normal messages from the program, and things explicitly requested by adding parameters should go to stdout, out-of-the-ordinary messages and errors should go to stderr.

// add tags to tag filter object for use in way parsing
if (!tagfilename.empty())
{
std::cerr << "Parsing " << tagfilename << " ... " << std::flush;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, should be consistent with which stream we log to.

}
}

Extractor::Extractor(const std::string &osmfilename, Database &db, const std::string &tagfilename)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably time to refactor the constructors so the code isn't repeated multiple times - I already felt bad about the duplication in the existing constructors. Want to have a go at rejiggering these so we don't have 3 copies of almost-identical logic?

@@ -22,6 +25,7 @@ struct Extractor final : osmium::handler::Handler
* @param d the Database object where everything will end up
*/
Extractor(const std::string &osmfilename, Database &d);
Extractor(const std::string &osmfilename, Database &d, const std::string &tagfilename);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of an additional constructor, maybe just have a single:

Extractor(const std::string &osmfilename, Database &d, const boost::optional<std::string> &tagfilename = boost::none);

I think we've already got all the needed Boost libraries available.

void way(const osmium::Way &way);
void ParseTags(std::ifstream &tagfile);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two new methods might be better off being private (or at least protected). Not sure we'd ever want to call them from outside the class constructor, they're mostly just internal helpers.

@karenzshea karenzshea force-pushed the tags-filter branch 2 times, most recently from 9e20470 to a8cae2a Compare February 20, 2018 14:35
@karenzshea karenzshea force-pushed the tags-filter branch 2 times, most recently from 9717d38 to 840df60 Compare March 9, 2018 15:41
@karenzshea karenzshea removed the request for review from TheMarex March 9, 2018 16:53
@danpat
Copy link
Collaborator

danpat commented Mar 21, 2018

@karenzshea I've rebased this on master, which has pulled in the changes from #30.

I've tagged and published 0.1.0-rc3 from this branch.

There's only one outstanding question: should all tags from a matching way be indexed, or only the tags indicated by the filter?

Currently, master separates the "tags to use to filter ways" from the "tags to store in the index". I've duplicated that logic on this branch, but it caused one test to fail:

https://github.com/mapbox/route-annotator/pull/25/files#diff-879eb6ab79a85f13ac84c620132435e1R100

I've commented it out so that tests pass.

If you're OK with this behaviour (only store the tags that we mention in the filter), then I think this PR is ready to merge to master and do a release of 0.1.0 on.

If you think we should index all tags on matching ways, then this if statement:

https://github.com/mapbox/route-annotator/pull/25/files#diff-3cb06dc074df26ca8f412b291b6902ccR171

should be removed, and we should index all tags. For the 0.1.0 release, I don't think it matters if we only index tags listed by the filter, we can certainly make this more flexible in a future release.

@karenzshea
Copy link
Contributor Author

@danpat I'm OK with the behavior of only storing tags that are in the filter. I only wrote the test that way to check that the way I anticipated was being detected.

Thanks for rebasing this after the other PR was merged!

@karenzshea karenzshea merged commit 913af6e into master Mar 21, 2018
@karenzshea karenzshea deleted the tags-filter branch March 21, 2018 13:02
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants