What does an MVP for an OA geocoder look like? #12

Closed
waldoj opened this Issue Apr 4, 2016 · 50 comments

Member

waldoj commented Apr 4, 2016

I envision a bespoke Pelias instance creator, where somebody can indicate what physical area they're interested in and get a geocoder preloaded with that data from OpenAddresses. I think these are the basic components of that:

  1. A system to ingest OA data and, in response to a geoquery, return address geodata for that area.
  2. A generator of machine images in common formats (e.g., Docker, Vagrant, Heroku, AMI) that can package the requested geodata with Pelias to be deployed by the end user.
  3. The (eventual) capacity for those machine images to request updated data automatically and periodically.

The idea is to close the loop on the publication and consumption of address data. Right now, governments publish address data, which we aggregate within OpenAddresses, and the private sector uses address data published on OpenAddresses. That fails to provide incentives for governments to continue to publish that data. (This is unrelated to those governments who publish address data via ArcGIS, in which case we're getting the data where they happen to store it. They already have existing, internal incentives.) This model will allow governments to run local geocoders (much faster than an API) powered by their own data, that improve as they improve their own data, and that are only updated as often as they update their public data. This creates a better incentive for them to publish that data.

I propose that the MVP for this consists of step 1 in the above list. The 2 subsequent steps depend on step 1, so it can't be either of those. And step 1, on its own, is useful—people can use that as-is, or build atop it.

What's the consensus here? Is this a good MVP? Are the subsequent steps the correct ones? Bonus questions: Do existing project volunteers have the capacity to make step 1 happen, or is this something that should be bid out? (Is it even plausible to bid this out?)
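
For a sense of scale, the core of step 1 could start as small as a bounding-box filter over a processed OpenAddresses CSV. This is only a sketch, assuming the file has a header row, LON and LAT as its first two columns, and no quoted commas ahead of them; the function name oa_bbox is made up for illustration.

```shell
# Hypothetical helper: print the header plus every row of an OpenAddresses-style
# CSV whose point falls inside a bounding box.
# Usage: oa_bbox FILE MIN_LON MIN_LAT MAX_LON MAX_LAT
oa_bbox() {
  awk -F, -v w="$2" -v s="$3" -v e="$4" -v n="$5" '
    # Always keep the header row.
    NR == 1 { print; next }
    # Print rows whose LON ($1) and LAT ($2) fall inside the box.
    $1 + 0 >= w + 0 && $1 + 0 <= e + 0 &&
    $2 + 0 >= s + 0 && $2 + 0 <= n + 0
  ' "$1"
}
```

Something like `oa_bbox statewide.csv -78.6 37.9 -78.3 38.2 > extract.csv` (file names are placeholders) would then produce the extract to hand to an importer. A real implementation would want a proper CSV parser and probably a spatial index.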

Member

migurski commented Apr 4, 2016

This is a great idea, and thank you for pushing it forward.

For the data ingestion, we’ve talked internally about what an ElasticSearch data prep process would look like for small extracts of OA data. @orangejulius or @dianashk are most up-to-date on this topic.

For the machine images, I think it would make sense to tighten the list and support a smaller range of possibilities. When an offered format stops working, it’s a debugging and support bummer for us. I think Heroku is an interesting direction, and I’ve built “app builders” before that anyone with an account should be able to point-and-click their way through. High effort, maximum reach, strong dependency on single vendor. I think that Docker or Vagrant approaches are a weak compromise: easy for the kinds of nerds who don’t need it, but still too difficult for mortals. AMI is up there somewhere, and could be scripted using Amazon’s API and a builder-style approach with some effort.

I have a weak bias toward a trash-and-replace model for the data updates. If it’s easy to set one of these up, it should be easy to use rapid replacement instead of updating.

Contributor

NelsonMinar commented Apr 4, 2016

I like the idea! Could you say more about who might use it? I'm trying to figure out who isn't served by just using a public geocoder, either free or paid.

dianashk commented Apr 4, 2016

This is a very cool use of Pelias, so we're excited to see it come to fruition... even if we don't have the bandwidth to do it ourselves. Hooray for open-source!

As it stands today, Pelias is already set up to ingest all or any subset of OA data that you point it at. Setting this up isn't elegant at the moment, and this is where the majority of the work needs to be done. We're working on something to make it a bit simpler to install and build the whole system. Users would still need to install Elasticsearch on their own. So this effectively covers step 1.

I personally like the idea of supporting something simple and accessible, like Heroku, for the first attempt at a builder. If that is all successful, we can always branch out to support other platforms. But no need to rush there.

As for automated updates, we can set it up to rebuild on a schedule, like we currently do with our hosted Mapzen Search instance of Pelias. We rebuild weekly, because we do the whole world and it takes a few days. But with a small dataset you can rebuild daily or even hourly to keep the data fresh. We don't currently support real-time updates, so getting that implemented would require some significant effort.
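
A scheduled rebuild along these lines could pair with the trash-and-replace idea mentioned earlier in the thread: import into a fresh index, then repoint an Elasticsearch alias at it in one step. The sketch below only builds the request body for Elasticsearch's _aliases endpoint. The alias and index names are placeholders, and it assumes the API is configured to query the alias, which nobody here has confirmed.

```shell
# Build the JSON body for POST /_aliases that moves ALIAS from OLD_INDEX to NEW_INDEX.
# Elasticsearch applies both actions in a single atomic operation.
# Usage: alias_swap_body ALIAS OLD_INDEX NEW_INDEX
alias_swap_body() {
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$2" "$1" "$3" "$1"
}

# Example (not run here): send it to a local Elasticsearch node.
#   alias_swap_body pelias pelias_old pelias_new |
#     curl -s -XPOST 'http://localhost:9200/_aliases' \
#       -H 'Content-Type: application/json' -d @-
#
# Example crontab entry for a nightly rebuild at 03:00; rebuild.sh is hypothetical.
#   0 3 * * * /usr/local/bin/rebuild.sh
```

The old index can be deleted once the swap succeeds, so queries never see a half-built dataset.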

Member

waldoj commented Apr 4, 2016

> Could you say more about who might use it? I'm trying to figure out who isn't served by just using a public geocoder, either free or paid.

There are no free, public geocoders that aren't license-restricted (e.g., Google) or query-restricted (e.g., TAMU). So there's a big obstacle for a lot of people. Paid geocoders have a price tag that's a real burden on good work. (For example, I wanted to geocode every business in Virginia, as a public service. That was going to cost $1,200. Nope. Turned out, Virginia has a geocoder that is open to the world, and I used that, which took care of the ~75% of addresses that are within Virginia.) The next obstacle is speed. Making a call to a remote API takes time. Making a million calls to a remote API takes a million times longer. Being able to run a geocoder locally is vastly faster.

I appreciate that, from your perspective, geocoding seems like a highly-available service. But that's true for vanishingly few people.
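
To put rough numbers on the speed point, here is the arithmetic for a million sequential lookups. Both latencies are assumptions picked for illustration, not measurements of any particular service.

```shell
# Total wall-clock time, in whole minutes, for COUNT sequential calls
# that each take MS_PER_CALL milliseconds.
# Usage: batch_minutes COUNT MS_PER_CALL
batch_minutes() {
  echo $(( $1 * $2 / 1000 / 60 ))
}

batch_minutes 1000000 100   # assumed 100 ms per remote call: prints 1666 (about 28 hours)
batch_minutes 1000000 2     # assumed 2 ms per local call: prints 33
```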

Member

waldoj commented Apr 4, 2016

> As it stands today, Pelias is already set up to ingest all or any subset of OA data that you point it at. Setting this up isn't elegant at the moment, and this is where the majority of the work needs to be done.

Would you please explain this process further? If I wanted to stand up a Pelias instance for the greater Charlottesville, VA area, what steps would that entail?

Contributor

NelsonMinar commented Apr 4, 2016

Thanks @waldoj! My apologies, I didn't mean to question whether an install-your-own geocoder was a good idea. I was just trying to understand who might be users of it. I think you've identified three reasons for running your own: more useful than a free service, cheaper than a paid service, and faster if you run it locally. I think in all cases it's a user who is motivated to do a bit more work to get things running for themselves rather than just paying a service provider.

To that last point, faster if you run locally, that would argue for a self-hosted option. I.e., not Heroku or EC2 or some other remote server, but something you can run on your local network as well.

One old-school proposal for a deliverable: an Ubuntu PPA that lets you run apt-get install openaddresses-geocoder, built on top of Ubuntu 16.04 LTS. It would require several packages: the stuff required to run Pelias, Pelias itself, and then an OpenAddresses-specific package that contains the scripts necessary to download and install the OA data. You could package the data itself as Ubuntu packages too, but that only makes sense for a few well-defined geographic regions, not customized data dumps.

Another old-school proposal is just good documentation. Work with Pelias to make it really easy for someone who knows some command line to install it, then write those download + import scripts. That requires more work on the part of the user than Ubuntu packages, but is (in theory) usable in many Unix environments.

For modern new stuff everyone seems to love Docker. A Docker container that just served geocoding data would be pretty neat. I agree with @migurski that it's more realistic to only support one or a small set of possibilities.

Member

migurski commented Apr 4, 2016

I am convinced about the self-hosted option. I know that Mapnik has a ton of experience with Ubuntu releases and later with a PPA, so I'd like to see if @springmeyer has any wisdom or advice to share.

Member

waldoj commented Apr 4, 2016

> I didn't mean to question whether an install-your-own geocoder was a good idea.

That's too bad, because you should. I often convince myself that my terrible ideas are brilliant! :)

> To that last point, faster if you run locally, that would argue for a self-hosted option. I.e., not Heroku or EC2 or some other remote server, but something you can run on your local network as well.

I don't think self-hosted is the only use case, I just think it's a good one. But I am persuaded that, in terms of prioritization for deployment methods, it's worth favoring deployment methods that work locally ahead of those that only work remotely. Docker works well for both—you can run it locally, or you can deploy it to AWS/Heroku/DigitalOcean. Seems like the way to start!

Member

migurski commented Apr 4, 2016

I'll research the PPA path. For various reasons I'm really bullish on that and not on Docker these days, mostly due to some experience with Docker oddities biting me.

riordan commented Apr 4, 2016

We've been talking about npm-ifying pelias so that you can npm install pelias and then pelias install a full setup. But a PPA could take care of the Node dependencies and the Elasticsearch installations. Could be a solid start.

Then our efforts would be in building a really lovely configuration & build wizard to help folks pick the datasets/regions they're most interested in.

Member

migurski commented Apr 5, 2016

Yeah, that is my thinking as well. npm would be the developer / tester installation method of choice, while an apt package might be accessible more broadly and would allow for simple usage like RUN apt-get in a CI config, Dockerfile, Vagrantfile, or other Productfile.

@migurski migurski closed this Apr 5, 2016

Member

migurski commented Apr 5, 2016

Oops, did not mean to hit the mic drop button.

@migurski migurski reopened this Apr 5, 2016

Member

migurski commented Apr 8, 2016

I have done a bit of work on getting .debs and PPAs set up. I’ve successfully installed a package of my own from a non-PPA URL added to sources.list, and now I’m waiting for some key-signing step in Launchpad that’s supposed to take a few hours. Baby steps, so far so good, seems to work.

Mostly cross-referencing suggestions from these articles:

My goal is to get to approximately where Dane and @rcoup succeeded with https://launchpad.net/~mapnik

Member

migurski commented Apr 19, 2016

After a bit of back-and-forth with a helpful Ubuntu Launchpad person, I’ve gotten… someplace.

It’s a surprisingly fiddly process but I’m liking the progress. Feeling like it’s a thing that’s possible to understand.

Contributor

NelsonMinar commented Apr 19, 2016

Do you have a feeling for whether a Debian/Ubuntu package is a reasonable deliverable? I threw that out there as an idea but I'm not confident it's the right thing.

Member

migurski commented Apr 19, 2016

I don’t have a feeling for it yet. I believe this is a one-time pain and so far it’s been about the same level of b.s. as I’ve experienced with Docker and Vagrant. It still looks worthwhile.

Member

waldoj commented Apr 19, 2016

I've really dived into Docker in the past week, and I feel good about using a .deb as a deliverable. That's a single line in a Dockerfile, and of course just as easy to use outside of Docker. I like it.

Contributor

NelsonMinar commented Apr 19, 2016

I guess the question is requiring Ubuntu. Is that OK for our target users? I think it's the best guess of the Linux distros, but I see a lot of CentOS/Red Hat variants in use too.

Member

migurski commented Apr 19, 2016

I’m not as familiar with the Red Hat environment, so I wonder whether it’s possible or advisable to skip the PPA route, and self-host .deb files and RPMs in one place?

Having spent some time with PPAs, it’s attractive to just put a .deb at a URL someplace and be done with it. I haven't yet successfully installed my test package at https://launchpad.net/~migurski/+archive/ubuntu/hello.

Contributor

NelsonMinar commented Apr 19, 2016

PPAs offer a lot of advantages, though: an apt repository is what makes apt-get upgrade and other apt stuff work. If you get frustrated I could take a look, or maybe someone from Ubuntu will help us?

The drawback of supporting RPMs too isn't so much building the RPM, it's sorting out the operating system compatibilities, library versions, etc. That's why I suggested just supporting Ubuntu LTS 16.04; the M in MVP.

Member

waldoj commented Apr 19, 2016

> I guess the question is requiring Ubuntu. Is that OK for our target users?

It's fine for Docker, at least (because I don't think many people care which distro their Docker instance runs). Personally, I look forward to the problem of people saying "gosh, I'd love to use this, but I use CentOS." That seems like a bridge worth crossing when we come to it. :)

Member

migurski commented Apr 20, 2016

Spoke with Nelson offline, and he offered to help with two things I'm stuck on: PPAs with multiple owners (since we’ll likely want one called openaddresses or openaddr), and getting my hellodeb package actually installed.

Member

migurski commented Apr 25, 2016

Followup to the last note:

Member

migurski commented Apr 28, 2016

I got Pelias API published and installed to my PPA sandbox.

Member

migurski commented May 1, 2016

Based on my tests, I think this would be the bones of an installation process for Ubuntu 16.04, and ought to work manually or in a container-type context:

  1. Install Oracle JDK, using instructions from Pelias install docs.
    • add-apt-repository ppa:webupd8team/java -y
    • apt-get update && apt-get install oracle-java7-installer -y

      This throws up a license acceptance form; I’m not sure how it will behave under Docker or Vagrant.
  2. Install ElasticSearch, using instructions from Elastic.co.
    • wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | apt-key add -
    • echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list
    • apt-get update && apt-get install elasticsearch
  3. Install Pelias from OpenAddresses Ubuntu PPA.
    • add-apt-repository ppa:openaddresses/geocoder -y
    • apt-get update && apt-get install pelias-api
  4. Work with Pelias team to document import of sample or extract data into ElasticSearch index.
  5. Make Pelias API available on public port 80 with an HTTP proxy, and possibly packaged documentation.
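
Once steps 4 and 5 are in place, a quick smoke test might look like the following. It assumes the API is listening on port 3100 on the same machine (adjust to whatever the package actually configures) and uses Pelias's /v1/search endpoint; the helper name is invented for this sketch.

```shell
# Build a Pelias /v1/search URL for a free-text address query.
# Usage: pelias_search_url BASE_URL QUERY
pelias_search_url() {
  # Percent-encode spaces only; a real client should encode the full query.
  query=$(printf '%s' "$2" | sed 's/ /%20/g')
  printf '%s/v1/search?text=%s\n' "$1" "$query"
}

# Example (not run here): query the local API directly, before any proxy.
#   curl -s "$(pelias_search_url http://localhost:3100 '100 Main St')"
```

Running the same query through the port 80 proxy afterwards would confirm step 5 separately from the import.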

Contributor

NelsonMinar commented May 1, 2016

That's a pretty straightforward set of instructions! Shame it's all third party repos, but perhaps that's unavoidable.

Member

migurski commented May 1, 2016

Yeah. ElasticSearch suggests that the open JDK might work, but @baldur reports having seen problems using it with ES. Only Oracle’s is officially supported. Getting https://github.com/pelias/schema and sample data in there is a next step.

A possible Dockerfile:

FROM ubuntu:16.04
RUN add-apt-repository ppa:webupd8team/java -y
RUN add-apt-repository ppa:openaddresses/geocoder -y
RUN wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | apt-key add -
RUN echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list
RUN apt-get update
RUN apt-get install oracle-java7-installer elasticsearch pelias-api -y

Contributor

NelsonMinar commented May 2, 2016

I was wondering if this was complicated enough it should be encapsulated in a script, or a Dockerfile, or an image. The nice thing is the Ubuntu packaging is worth the effort since it makes that script simpler too.

@waldoj

This comment has been minimized.

Show comment
Hide comment
@waldoj

waldoj May 2, 2016

Member

I had to add, after the first line:

RUN apt-get update -y
RUN apt-get install python-software-properties -y
RUN apt-get install software-properties-common -y
RUN apt-get install wget -y

I know that apt-get update is frowned upon in a Dockerfile, but I couldn't install add-apt-repository without it.
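
On the apt-get update point, here is a minimal sketch, assuming the concern is Docker's layer cache: a standalone RUN apt-get update can be cached and go stale, so the usual advice is to chain it with the install in a single RUN. The package names are the ones listed above; the write_dockerfile helper and its output path are hypothetical.

```shell
# Hypothetical helper: write a Dockerfile whose prerequisites are installed
# in the same RUN layer that refreshes the package index.
write_dockerfile() {
  cat > "$1" <<'EOF'
FROM ubuntu:16.04

# Update and install together so a cached index is never reused on its own.
RUN apt-get update && apt-get install -y \
    python-software-properties software-properties-common wget
EOF
}
```

Something like `write_dockerfile Dockerfile && docker build .` would then exercise it.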

@waldoj

waldoj May 2, 2016

Member

It finally died with this:

Errors were encountered while processing:
 /var/cache/apt/archives/oracle-java7-installer_7u80+7u60arm-0~webupd8~1_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
The command '/bin/sh -c apt-get install oracle-java7-installer elasticsearch pelias-api -y' returned a non-zero code: 100

I'm not sure why (other than Java ¯_(ツ)_/¯), but I'll see if I can figure out what's up.

@migurski

migurski May 2, 2016

Member

Damn, I bet that's the part where it asks for a license click-through.
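
If the license prompt really is the cause (that is a guess, not something the log confirms), the usual workaround is to pre-answer it through debconf before installing. A sketch, where the helper name is hypothetical and the selection key is the one the webupd8team installer package is known to use:

```shell
# Print the debconf answer that marks the Oracle license as accepted.
# Helper name is hypothetical; the selection key belongs to the
# webupd8team oracle-java7-installer package.
oracle_license_preseed() {
  echo "oracle-java7-installer shared/accepted-oracle-license-v1-1 select true"
}

# Usage (as root, before the apt-get install step):
#   oracle_license_preseed | debconf-set-selections
```

In a Dockerfile this would be one more RUN line ahead of the install, piping the same text into debconf-set-selections.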

@migurski

migurski May 3, 2016

Member

I would be curious to learn more about why Oracle’s Java is necessary for ES. Maybe for smaller uses, it’d be sufficient to use Open JRE?

@waldoj

waldoj May 3, 2016

Member

Here is a product matrix of which JVMs work with which Elasticsearch versions. I don't see Open JRE on there, but I know very little about Java, so that may or may not mean anything.

@migurski

migurski May 3, 2016

Member

The OpenJDK in 16.04 says this about itself:

Package: openjdk-8-jdk
Priority: optional
Section: java
Installed-Size: 458
Maintainer: OpenJDK Team <openjdk@lists.launchpad.net>
Architecture: amd64
Source: openjdk-8
Version: 8u77-b03-3ubuntu3
Provides: java-compiler, java-sdk, java2-sdk, java5-sdk, java6-sdk, java7-sdk, java8-sdk
…
Description-en: OpenJDK Development Kit (JDK)
 OpenJDK is a development environment for building applications,
 applets, and components using the Java programming language.
 .
 The packages are built using the IcedTea build support and patches
 from the IcedTea project.
…

So it’s using IcedTea. I believe Java 8 reports itself internally as 1.8, so it also satisfies the supported 1.7.0.55+ version requirement. Waldo, what happens if you replace the oracle-java7-installer installation with openjdk-8-jdk? For me, ES seemed to work.
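
The version reasoning above can be checked mechanically. A rough sketch, assuming version strings of the form 1.8.0_77 (dots or an underscore as separators); the version_at_least helper is hypothetical and not part of any package mentioned here:

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Fields are split on "." and "_" and compared numerically, left to right;
# missing fields count as 0.
version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    n = split(have, h, /[._]/)
    m = split(want, w, /[._]/)
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      a = h[i] + 0
      b = w[i] + 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0
  }'
}

# Usage: extract the quoted version from `java -version` (written to stderr)
# and compare it with the minimum from the support matrix:
#   v="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
#   version_at_least "$v" 1.7.0_55 && echo "JVM is new enough"
```

This only compares version numbers; it says nothing about whether a given JVM vendor is supported.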

@migurski

migurski May 3, 2016

Member

Simpler possible Dockerfile:

FROM ubuntu:16.04
RUN add-apt-repository ppa:openaddresses/geocoder -y
RUN wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | apt-key add -
RUN echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list
RUN apt-get update
RUN apt-get install openjdk-8-jdk elasticsearch pelias-api -y

Waldo, for me it was not necessary to install python-software-properties, software-properties-common, or wget to get add-apt-repository; it just worked. Curious why.

@waldoj

waldoj May 4, 2016

Member

Running that Dockerfile yields this:

$ docker build .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:16.04
 ---> 44776f55294a
Step 2 : RUN add-apt-repository ppa:openaddresses/geocoder -y
 ---> Running in bac8df07b705
/bin/sh: 1: add-apt-repository: not found
The command '/bin/sh -c add-apt-repository ppa:openaddresses/geocoder -y' returned a non-zero code: 127

I needed to add these to get this to run:

RUN apt-get update -y
RUN apt-get install python-software-properties -y
RUN apt-get install software-properties-common -y

When I did that, this was the outcome:

Step 6 : RUN wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | apt-key add -
 ---> Running in fa96df30947b
/bin/sh: 1: wget: not found
gpg: no valid OpenPGP data found.
The command '/bin/sh -c wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | apt-key add -' returned a non-zero code: 2
@migurski

migurski May 4, 2016

Member

I guess the 16.04 Docker image is much slimmer than the server distribution, which I suppose makes sense. So:

FROM ubuntu:16.04

RUN apt-get update -y
RUN apt-get install python-software-properties software-properties-common wget -y

RUN add-apt-repository ppa:openaddresses/geocoder -y
RUN wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | apt-key add -
RUN echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list

RUN apt-get update -y
RUN apt-get install openjdk-8-jdk elasticsearch pelias-api -y
@migurski

migurski May 5, 2016

Member

Trying to create the Pelias index failed for me with this message:

[mapper_parsing_exception] analyzer on field [borough_id] must be set when search_analyzer is set

@orangejulius pointed out that Pelias wants ElasticSearch 1.7, so the process should look like this with 1.7 instead of 2.x:

FROM ubuntu:16.04

RUN apt-get update -y
RUN apt-get install python-software-properties software-properties-common wget -y

RUN add-apt-repository ppa:openaddresses/geocoder -y
RUN wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | apt-key add -
RUN echo "deb http://packages.elastic.co/elasticsearch/1.7/debian stable main" | tee -a /etc/apt/sources.list.d/elasticsearch-1.7.list

RUN apt-get update -y
RUN apt-get install openjdk-8-jdk elasticsearch pelias-api -y

That works:

% node scripts/create_index.js;
[put mapping]    pelias      { acknowledged: true }
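
Since the 2.x versus 1.7 mismatch only surfaced as a mapping error, it may help to check which ElasticSearch is actually running before creating the index. A sketch, assuming ES answers on its default port 9200 and returns its usual JSON banner with a version.number field; both helper names are hypothetical:

```shell
# Read the JSON that ElasticSearch serves at its root URL on stdin and
# print the value of version.number.
es_version_number() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if the given version string is in the 1.7 series.
is_es_1_7() {
  case "$1" in
    1.7|1.7.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   v="$(curl -s http://localhost:9200/ | es_version_number)"
#   is_es_1_7 "$v" || echo "Pelias wants ElasticSearch 1.7, found $v"
```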
@migurski

migurski May 7, 2016

Member

Aw yeah, getting some results from a single-county import: http://dpaste.com/0612XXJ
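
For anyone wanting to reproduce a query like that, a rough sketch follows. It assumes the Pelias API is listening on localhost port 3100 and exposes /v1/search with a text parameter; those are assumptions about a default setup rather than anything stated in this thread, and the pelias_search helper and the CURL and PELIAS_URL overrides are hypothetical.

```shell
# Hypothetical helper: send a free-text address query to a local Pelias API.
# PELIAS_URL overrides the assumed base URL; CURL can be set to "echo" to
# print the request arguments instead of sending them.
pelias_search() {
  ${CURL:-curl} -s -G --data-urlencode "text=$1" \
    "${PELIAS_URL:-http://localhost:3100}/v1/search"
}

# Usage:
#   pelias_search "123 Main St"
```

Using -G with --data-urlencode lets curl handle the escaping of spaces and punctuation in the address.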

@waldoj

waldoj May 7, 2016

Member

!

@riordan

riordan May 7, 2016

Mazel!

migurski added a commit to openaddresses/pelias-ubuntu that referenced this issue May 8, 2016

@migurski

migurski May 8, 2016

Member

This is basically current: https://github.com/openaddresses/pelias-ubuntu-xenial#readme

There’s still some documentation to do around database setup, address import, and why @#$% elasticsearch doesn’t want to start on boot. Also, Amazon are taking their time making an Ubuntu 16.04 image available and there’s not yet a supported upgrade path, so maybe we should build these for 14.04 as well?

@migurski

migurski May 15, 2016

Member

Progress report: I’ve run the setup above on a few machines, and I’m slowly working through the foibles of ElasticSearch. It’s pretty greedy for RAM; even running an import on a 4GB machine had trouble, and @missinglink suggests 8GB. I still don’t have an idea for getting it to start at boot.

I did build Ubuntu 14.04 versions of all the packages, though. This is getting close to blog post or tutorial state, but I still think there are going to be some bad ops surprises for users.
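
On the RAM and start-at-boot points, a hedged sketch of what might be tried. It assumes the official ElasticSearch 1.x Debian package, which reads ES_HEAP_SIZE from /etc/default/elasticsearch and documents update-rc.d and systemctl for registering the service; the half-of-RAM heap figure is the common rule of thumb, not something measured in this thread, and both helper names are hypothetical.

```shell
# Suggest an ES_HEAP_SIZE value: half of the machine's RAM.
# $1 is total RAM in megabytes; output is in the "<n>m" form ES accepts.
es_heap_size() {
  echo "$(( $1 / 2 ))m"
}

# Register the packaged service to start at boot (run as root).
# Uses systemd where systemctl is available, SysV-style links otherwise.
enable_es_at_boot() {
  if command -v systemctl >/dev/null 2>&1; then
    systemctl daemon-reload
    systemctl enable elasticsearch.service
  else
    update-rc.d elasticsearch defaults 95 10
  fi
}

# Usage on an 8GB machine (as root):
#   echo "ES_HEAP_SIZE=$(es_heap_size 8192)" >> /etc/default/elasticsearch
#   enable_es_at_boot
```

Whether this cures the boot problem described above is untested here; it only covers the case where the service was never registered to start.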

@migurski

migurski May 27, 2016

Member

I blogged the process for getting this set up, here: http://mike.teczno.com/notes/openaddr/5min-geocoder.html

@NelsonMinar

NelsonMinar May 27, 2016

Contributor

That's amazing @migurski.

@migurski

migurski May 28, 2016

Member

I… think it’s possible to close this issue?

@iandees

iandees May 28, 2016

Member

I agree. It might be good to find a place to put your blog post in our repo as a document for people to follow.

@migurski

migurski May 28, 2016

Member

Good call, I’ll do that.

@migurski

migurski May 28, 2016

Member

Added a link to the bottom of the post, http://mike.teczno.com/notes/openaddr/5min-geocoder.html.

@migurski migurski closed this May 28, 2016

@waldoj

waldoj May 29, 2016

Member

👍
