Background
What is the problem/pain point?
Customers want to be able to identify the (approximate) geolocations of all segments/pieces on the network for a given "dataset."
Who is impacted?
All customers/users
What is the impact?
Today, it is not possible to easily see the geolocation of object data (with the exception of files that are shared via Linksharing). With this feature, customers will be able to easily see and showcase where their data resides. Customers who use geofencing will be able to confirm that their data resides within their selected region.
Requirements
User Story
As a Storj user, I want to be able to retrieve the geolocation data for all the pieces of a specified dataset stored on Storj DCS so I can use them in a web application or data visualization tool.
Acceptance Criteria [WIP]
- Geolocation data can be retrieved for a given dataset
- A dataset can be one or many objects: a whole bucket, a prefix, or a single object
- Geolocation data can be retrieved by a frontend application (such as a data visualization tool or web app) without the use of a server
- If we go the route of dumping the JSON into a DCS bucket, linksharing can be used to enable this
- Geolocation data should be refreshed at least once per week (may not be relevant if generated in real time)
- Users should be able to make geolocation data either publicly accessible or private (limited access)
- Enabling public view of Geolocation data should not expose the underlying objects
Success Metrics
- Geolocation data can be easily retrieved for a given dataset
- Performance isn't degraded
Key Considerations
- The dataset may not be at the "root" of the bucket, so the feature should be able to handle a whole bucket, a prefix, or a single object
- The dataset can comprise many objects
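To make the three dataset scopes concrete, here is a minimal sketch of how a scope could be matched against object keys. The `DatasetScope` class and the convention that a trailing "/" marks a prefix are assumptions for illustration only; nothing here reflects an existing Storj API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetScope:
    """Hypothetical description of a dataset: whole bucket, prefix, or single object."""

    bucket: str
    # Assumed convention: "" means the whole bucket, a value ending in "/"
    # means a prefix, and anything else names exactly one object key.
    path: str = ""

    def contains(self, bucket: str, key: str) -> bool:
        """Return True if the object (bucket, key) belongs to this dataset."""
        if bucket != self.bucket:
            return False
        if self.path == "":
            return True  # whole bucket
        if self.path.endswith("/"):
            return key.startswith(self.path)  # prefix
        return key == self.path  # single object


whole_bucket = DatasetScope("photos")
by_prefix = DatasetScope("photos", "2023/")
single_object = DatasetScope("photos", "2023/trip.jpg")
```

Under this convention the same lookup code serves all three cases, which would keep the geolocation query independent of where the dataset sits in the bucket.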
Open Discussion/Questions
- How is "dataset" defined? Will it be a single file? Multiple files?
- The answer to this will impact the design (bucket level, object level, object tag level, etc)
- How frequently does the data need to be refreshed? Does it need to be refreshed at all?
- Does this functionality need to live in uplink? S3 gateway? both? Doesn't matter?
- Are we able to share the generated coordinates? Are there any licensing restrictions?
- Our use of geolocation database requires attribution unless we purchase a commercial license ($456/year)
Possible Design
- Add new endpoint/method to Linksharing that returns GeoJSON (similar to the map endpoint: https://link.storjshare.io/s/accesshere/bucketname%2Ffilename?map=1)
- Dump JSON into Buckets:
- Use object tags (PutObjectTagging) or other metadata when uploading a dataset to DCS
- If we need to perform this query on a whole bucket, object tags might not be the right solution (note that we do not currently support PutBucketTagging)
- Periodically "dump" a GeoJSON file into a specified bucket for a specified object tag - say each week
- Or, simpler, generate the GeoJSON per request, but don't generate a new one if the existing one is younger than, say, a week
- Or even simpler -- a one-time operation that generates the GeoJSON once and dumps it into a bucket
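As a rough illustration of the "dump GeoJSON" options above, the sketch below builds a GeoJSON FeatureCollection from piece locations and decides whether an existing dump is stale. The function names, the input shape (a list of (latitude, longitude) pairs), and the default one-week threshold are assumptions for illustration; how locations are actually obtained from the network is out of scope here. Note that GeoJSON orders coordinates as [longitude, latitude].

```python
import json
from datetime import datetime, timedelta, timezone


def to_geojson(locations):
    """Build a GeoJSON FeatureCollection from (latitude, longitude) pairs.

    The input shape is an assumption; GeoJSON itself stores each point
    as [longitude, latitude].
    """
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {},
        }
        for lat, lon in locations
    ]
    return {"type": "FeatureCollection", "features": features}


def needs_refresh(generated_at, now, max_age=timedelta(days=7)):
    """Return True if no dump exists yet or the existing one is too old."""
    return generated_at is None or now - generated_at >= max_age


# Example: two approximate piece locations, serialized for upload to a bucket.
document = json.dumps(to_geojson([(40.7, -74.0), (52.5, 13.4)]))

now = datetime(2023, 1, 15, tzinfo=timezone.utc)
stale = needs_refresh(now - timedelta(days=8), now)
fresh = needs_refresh(now - timedelta(days=1), now)
```

The per-request option would call `needs_refresh` before regenerating, while the one-time option would only ever call `to_geojson` once. The resulting file contains coordinates only, so sharing it publicly would not by itself expose the underlying objects.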
Milestone(s)