[RFC] GeoIP database auto update - API design #5860
Comments
Couple of Comments:
Should the endpoint variable value be a valid URI?
Will there be automated retries? Or can a user retry without changing policy names?
Correct. Updated the post.
There won't be automated retries. It will try to update in the next interval. A user can change the interval to update earlier than previously scheduled.
@dagneyb, @dbwiddis, @saratvemulapalli any comments?
Other than regular updates, how is this a "policy"? Is "geoip data" (aka |
I didn't use the term "data" because the endpoint and update interval do not represent "geoip data" by themselves. The actual data will be stored in an index. I think it is a policy about where and how to get geoip data.
Other names could be "datasource". But maybe that's confusing with actual data sources. Policy implies some kind of principle, so that was an odd name. I don't feel strongly about it, maybe others have comments.
"datasource" sounds fair to me. JDBC uses the term datasource when it accesses a database.
The design in #5856 talks about the possibility of storing all data in an OpenSearch index. Is that an option? These files are large and now we're introducing additional storage requirements. In the future I'd like to be able to have remote storage (e.g. S3) entirely, including for this GeoIP data.
Yes. We are going to store the GeoIP database file in an index, which will take about 500MB on each node.
I am considering implementing the feature as a new processor type called ip2geo in the geospatial repository, with the following advantages.
I like this plan better, assuming the new geoip processor can fully replace the existing one in 3.0. We should mark the latter deprecated when we release the first version of the new one.
The purpose of this RFC (request for comments) is to gather community feedback on a proposed API design for #5856.
Manifest file in a database distribution server
The free database distribution server will contain the following manifest file, which the OpenSearch cluster uses to learn where the actual database file is located, along with all other metadata for the database file.
Example
API to trigger an auto update of a database
A user calls a new API to create a GeoIP datasource. Once the API is called, the OpenSearch cluster starts downloading the file from the given endpoint at the given interval. Both parameters are optional, and default values will be provided. A user can update the values of a datasource. If a user deletes the datasource, OpenSearch removes its GeoIP data from the cluster. If update_interval is larger than valid_for in the manifest file, the API will throw an error.
Example
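A minimal sketch of the interval check described above. This is an illustration only: the function name and the dictionary shape are hypothetical, and it assumes both values are expressed in the same unit (days here), which the proposal does not state.

```python
def validate_update_interval(update_interval: int, manifest: dict) -> None:
    """Reject an update interval that exceeds the manifest's validity period.

    Assumption: update_interval and valid_for use the same unit (days here).
    """
    valid_for = manifest["valid_for"]
    if update_interval > valid_for:
        raise ValueError(
            f"update_interval ({update_interval}) must not be larger than "
            f"valid_for ({valid_for}) in the manifest file"
        )


# Hypothetical manifest content; only valid_for is named in the proposal.
manifest = {"valid_for": 30}

validate_update_interval(7, manifest)  # accepted: 7 <= 30
try:
    validate_update_interval(45, manifest)  # rejected: 45 > 30
except ValueError as err:
    print(err)
```

An interval equal to valid_for is accepted in this sketch, since the text only rules out a larger value.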
API to get a status of the datasource
After an OpenSearch cluster reads a manifest file without problems, it stores all relevant metadata of the datasource, and a user can query that metadata. A datasource starts in the preparing state and moves to available once a GeoIP database is ready to be used. If there was an issue during the first database preparation, it is marked as failed. When a user deletes the datasource, it is marked as deleting, and the actual deletion takes place afterward.
Example
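The datasource lifecycle described above can be pictured as a small state machine. The sketch below is not the proposed implementation: the class and function names are hypothetical, and it assumes that deletion may be requested from any state other than deleting, which the text does not spell out.

```python
from enum import Enum


class DatasourceState(Enum):
    PREPARING = "preparing"
    AVAILABLE = "available"
    FAILED = "failed"
    DELETING = "deleting"


# Transitions named in the text, plus the assumption that a user may
# delete a datasource from any state other than deleting.
ALLOWED_TRANSITIONS = {
    DatasourceState.PREPARING: {
        DatasourceState.AVAILABLE,
        DatasourceState.FAILED,
        DatasourceState.DELETING,
    },
    DatasourceState.AVAILABLE: {DatasourceState.DELETING},
    DatasourceState.FAILED: {DatasourceState.DELETING},
    DatasourceState.DELETING: set(),
}


def transition(current: DatasourceState, target: DatasourceState) -> DatasourceState:
    """Return the target state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = DatasourceState.PREPARING
state = transition(state, DatasourceState.AVAILABLE)  # database is ready
state = transition(state, DatasourceState.DELETING)   # user deletes the datasource
print(state.value)
```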
GeoIP processor using the datasource
Once a GeoIP datasource is available, a user can create a GeoIP processor that uses it by providing the datasource name in a new datasource field. The field is optional. If it is not provided, the processor falls back to the current behavior, which uses a static GeoIP database.
Example
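The fallback rule for the optional datasource field could look roughly like this. The function name, the return values, and the datasource name my-datasource are hypothetical and serve only to illustrate the decision; they are not part of the proposed API.

```python
def resolve_database(processor_config: dict) -> str:
    """Pick the GeoIP database a processor should read from.

    Returns a label for illustration: "static" for the bundled database,
    or "datasource:<name>" when a datasource name is configured.
    """
    datasource = processor_config.get("datasource")
    if datasource is None:
        # Field omitted: keep the current behavior (static database).
        return "static"
    return f"datasource:{datasource}"


# Processor without the new field keeps the current behavior.
print(resolve_database({"field": "ip"}))
# Processor that names a datasource reads from it instead.
print(resolve_database({"field": "ip", "datasource": "my-datasource"}))
```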
API to get metrics of the datasource
The GeoIP datasource will also contain metrics for update activity.
Example