Optionally parse list of tags to filter indexing by #25
Conversation
54c0e7b
to
14d72b1
Compare
src/extractor.cpp
Outdated
{
    tags_filter.add_rule(true, osmium::TagMatcher(line));
}
tagfile.close();
Remove; the ifstream gets closed automatically
src/extractor.cpp
Outdated
        throw std::runtime_error(strerror(errno));
    }
    ParseTags(tagfile);
}
This ending scope closes the ifstream (in its destructor)
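A minimal sketch of the point made in the two comments above: a std::ifstream declared inside a scope is closed by its destructor when that scope ends, so no explicit close() is needed. ReadTagLines is a hypothetical helper written for this illustration, not a function from the PR; it only assumes the one-entry-per-line file layout implied by the std::getline loop in the diff.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): returns every line of the tag file.
std::vector<std::string> ReadTagLines(const std::string &path)
{
    std::vector<std::string> tags;
    std::ifstream tagfile(path);
    if (!tagfile)
    {
        throw std::runtime_error("could not open " + path);
    }
    std::string line;
    while (std::getline(tagfile, line))
    {
        tags.push_back(line);
    }
    return tags;
} // tagfile goes out of scope here; its destructor closes the file
```

The same holds if the function exits early through the throw: stack unwinding destroys the stream, so the file handle is not leaked.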
src/extractor.cpp
Outdated
std::strcmp(highway, "living_street") == 0 ||
std::strcmp(highway, "unclassified") == 0 || std::strcmp(highway, "service") == 0 ||
std::strcmp(highway, "ferry") == 0 || std::strcmp(highway, "movable") == 0 ||
std::strcmp(highway, "shuttle_train") == 0 || std::strcmp(highway, "default") == 0);
What about creating a function that takes a vector of tags and checks all of them?
Or better, put all the tags here into a hash set and do constant-time hset.count(highway) > 0 checks.
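A rough sketch of the hash-set idea, assuming only the highway values visible in this diff hunk (the hunk is truncated, so the real list is probably longer). IsRoutableHighway is a hypothetical name. Because std::unordered_set::count returns 0 or 1, the membership test is count(...) > 0.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: true if `highway` is one of the accepted values.
// Only the values visible in the diff hunk are listed here.
bool IsRoutableHighway(const char *highway)
{
    static const std::unordered_set<std::string> routable = {
        "living_street", "unclassified", "service", "ferry",
        "movable",       "shuttle_train", "default"};
    // count() is 0 or 1 for a set, so compare against 0.
    return highway != nullptr && routable.count(highway) > 0;
}
```

The lookup is constant time on average, and adding a value becomes a one-line change to the initializer list instead of another strcmp clause.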
465ba7d
to
9fe9371
Compare
Still needs:
- Changelog
- package.json bump
- Maybe some docs on the format for the tagfilter file (in the README?)
example-server.js
Outdated
async.each(wayIds, (way_id, next) => {
    if (way_id)
    {
        annotator.getAllTagsForWayId(way_id, (err, tags) => {
Should probably handle err here?
src/extractor.cpp
Outdated
while (std::getline(tagfile, line))
{
    tags_filter.add_rule(true, osmium::TagMatcher(line));
    std::cout << "tag added: " << line << std::endl;
Some logging is done with cout and some with cerr. We should probably make this consistent so that upstream usage doesn't have to jump through hoops to capture everything (cout is usually stdout and cerr is usually stderr).
OK, does it make sense then to switch to using cout for info-type log lines and cerr for error-specific messages?
This is a good writeup on the philosophy of stderr vs stdout:
https://www.jstorimer.com/blogs/workingwithcode/7766119-when-to-use-stderr-instead-of-stdout
TL;DR: normal messages from the program, and things explicitly requested by adding parameters, should go to stdout; out-of-the-ordinary messages and errors should go to stderr.
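One possible way to keep the stream choice consistent is to route every message through two small helpers, sketched below. LogInfo and LogError are hypothetical names, not existing functions in this codebase. The stream parameters default to std::cout and std::cerr and exist only so the helpers can be exercised against an in-memory stream.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: routine progress output goes to stdout by default.
void LogInfo(const std::string &message, std::ostream &out = std::cout)
{
    out << message << '\n';
}

// Hypothetical helper: failures and unusual conditions go to stderr by default.
void LogError(const std::string &message, std::ostream &err = std::cerr)
{
    err << "error: " << message << std::endl;
}
```

With this in place, a line such as "tag added: ..." would call LogInfo and a failed file open would call LogError, so a caller capturing only one stream sees a predictable subset of messages.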
src/extractor.cpp
Outdated
// add tags to tag filter object for use in way parsing
if (!tagfilename.empty())
{
    std::cerr << "Parsing " << tagfilename << " ... " << std::flush;
Same as above, should be consistent with which stream we log to.
src/extractor.cpp
Outdated
    }
}

Extractor::Extractor(const std::string &osmfilename, Database &db, const std::string &tagfilename)
I think it's probably time to refactor the constructors so the code isn't repeated multiple times - I already felt bad about the duplication in the existing constructors. Want to have a go at rejiggering these so we don't have 3 copies of almost-identical logic?
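One way to remove that kind of duplication is a C++11 delegating constructor: the shared setup lives in one constructor and the other overloads forward to it. The sketch below uses a stand-in type, ExtractorSketch, with made-up members; the real Extractor takes a Database reference and parses OSM data, neither of which is modelled here.

```cpp
#include <string>
#include <utility>

// Stand-in type for illustration only; not the real Extractor.
struct ExtractorSketch
{
    std::string osmfile;
    std::string tagfile;
    int setup_runs = 0; // counts how often the shared setup executed

    // Full constructor: the only place the shared setup logic lives.
    ExtractorSketch(std::string osm, std::string tags)
        : osmfile(std::move(osm)), tagfile(std::move(tags))
    {
        ++setup_runs; // placeholder for the common parsing work
    }

    // Convenience overload: delegates instead of repeating the body.
    explicit ExtractorSketch(std::string osm)
        : ExtractorSketch(std::move(osm), std::string())
    {
    }
};
```

Whichever overload a caller picks, the setup body runs exactly once, and a later change to it only has to be made in one place.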
src/extractor.hpp
Outdated
@@ -22,6 +25,7 @@ struct Extractor final : osmium::handler::Handler
 * @param d the Database object where everything will end up
 */
Extractor(const std::string &osmfilename, Database &d);
Extractor(const std::string &osmfilename, Database &d, const std::string &tagfilename);
Instead of an additional constructor, maybe just have a single:
Extractor(const std::string &osmfilename, Database &d, const boost::optional<std::string> &tagfilename = boost::none);
I think we've already got all the needed Boost libraries available.
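A small sketch of the single-constructor shape suggested above. To keep it compilable without Boost it swaps in std::optional and std::nullopt (C++17) for boost::optional and boost::none; the idea is the same. FilterConfig and its members are hypothetical stand-ins, not the real Extractor.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in: one constructor covers both the filtered
// and the unfiltered case through a defaulted optional argument.
struct FilterConfig
{
    std::string osmfile;
    bool filtering;      // true when a tag file was supplied
    std::string tagfile; // empty when no tag file was supplied

    explicit FilterConfig(const std::string &osm,
                          const std::optional<std::string> &tags = std::nullopt)
        : osmfile(osm), filtering(tags.has_value()), tagfile(tags.value_or(""))
    {
    }
};
```

Compared with treating an empty string as "no filter", the optional makes the absent case explicit in the signature, at the cost of requiring C++17 (or Boost, as in the original suggestion).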
src/extractor.hpp
Outdated
void way(const osmium::Way &way);
void ParseTags(std::ifstream &tagfile);
These two new methods might be better off being private (or at least protected). I'm not sure we'd ever want to call them from outside the class constructor; they're mostly just internal helpers.
9e20470
to
a8cae2a
Compare
9717d38
to
840df60
Compare
@karenzshea I've rebased this on I've tagged and published 0.1.0-rc3 from this branch. There's only one outstanding question: should all tags from a matching way be indexed, or only the tags indicated by the filter? Currently, https://github.com/mapbox/route-annotator/pull/25/files#diff-879eb6ab79a85f13ac84c620132435e1R100 I've commented it out so that tests pass. If you're OK with this behaviour (only store the tags that we mention in the filter), then I think this PR is ready to merge to If you think we should index all tags on matching ways, then this https://github.com/mapbox/route-annotator/pull/25/files#diff-3cb06dc074df26ca8f412b291b6902ccR171 should be removed, and we should index all tags. For the 0.1.0 release, I don't think it matters if we only index tags listed by the filter; we can certainly make this more flexible in a future release.
@danpat I'm OK with the behavior of only storing tags that are in the filter. I only wrote the test that way to check that the way I anticipated was being detected. Thanks for rebasing this after the other PR was merged!
Closes #24
First-pass implementation of filtering an OSM file by certain tags.
Remaining tasks:
- loadOSMExtract to accept an array of files as well as a single file string