
Provide a federated street level imagery substitute for OpenStreetMap contributors #885

Open
bkil opened this issue Nov 7, 2022 · 6 comments



bkil commented Nov 7, 2022

Similar to KartaView, Mapillary, Bing StreetSide, Google StreetView, etc. See:

In a past use case, we asked a person, as a favor, to act as a "remote controlled drone" and scan all streets of a remote town while taking geolocated photographs, either automatically or manually. The helper then uploaded the set of photos to folders. Finally, we organized "remote mapping parties" where everyone synced the whole folder to their machine so they could load the photos into the JOSM editor as an overlay.

This is already quite wasteful, as I don't really want to store the vast amount of images: I usually go through them once and delete them. It would be more desirable to implement a new JOSM plugin where we could just add the NextCloud or folder URL, and it would stream the photos in real time with caching.

Another inefficiency: when we split the work by area based on attendance, it would be sufficient for participants to transfer only the photos of the area they are working on, not the whole town.

For this, little additional development would be required on the part of NextCloud. However, if we needed to implement a JOSM plugin like this anyway, it might as well be a full-featured alternative to Mapillary & KartaView. I.e., based on a globally federated set of NextCloud instances (or just a CSV with links to NextCloud instances managed on GitHub, etc.), we could visualize everyone's photos on the map. To make this efficient, it would be desirable to implement some geo-indexing (sharding, quadtrees or similar).

The basic idea would be that the client keeps a cached index of which NextCloud instances have photos within a given zoomed-in area (e.g., a graticule cell), and then asks each of those instances for the photos they have within the specific bounding box.
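A minimal sketch of that client-side lookup, assuming 1x1 degree graticule cells keyed by the floor of latitude and longitude, and a cached index that maps each cell to the set of instance URLs known to hold photos there. The function names, the index layout and the example URLs are hypothetical; NextCloud does not provide such an index today, and the follow-up bbox query sent to each instance is not shown.

```python
import math
from typing import Dict, Iterator, List, Set, Tuple

Cell = Tuple[int, int]  # (floor(lat), floor(lon)) of a 1x1 degree graticule cell

def cell_of(lat: float, lon: float) -> Cell:
    """Return the 1-degree graticule cell that contains the point."""
    return (math.floor(lat), math.floor(lon))

def cells_in_bbox(south: float, west: float, north: float, east: float) -> Iterator[Cell]:
    """Yield every graticule cell overlapping the bbox (antimeridian wrap not handled)."""
    s, w = cell_of(south, west)
    n, e = cell_of(north, east)
    for lat in range(s, n + 1):
        for lon in range(w, e + 1):
            yield (lat, lon)

def instances_for_bbox(index: Dict[Cell, Set[str]], south: float, west: float,
                       north: float, east: float) -> List[str]:
    """Look up, in the client's cached index, which instances to ask about this bbox."""
    found: Set[str] = set()
    for cell in cells_in_bbox(south, west, north, east):
        found |= index.get(cell, set())
    return sorted(found)

# Hypothetical cached index; the instance URLs are placeholders.
index = {
    (47, 19): {"https://cloud.example.org"},
    (47, 20): {"https://photos.example.net"},
}
print(instances_for_bbox(index, 47.4, 19.9, 47.6, 20.1))
```

Keeping only this coarse index on the client means the detailed per-bbox query goes only to instances that can actually answer it; a bbox crossing the antimeridian would have to be split in two first.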

Collaborator

tacruc commented Mar 2, 2023

@bkil Sorry for the late reply. This sounds quite cool, although I'm not quite sure I fully understood it. Would you like to build one big open distributed database, or multiple distributed semi-private databases? E.g., visible to OSM contributors, but not public?

Author

bkil commented Mar 2, 2023

@tacruc Technically, neither. Nobody (well, no non-monopolistic entity) can commit sufficient resources to host all photographs all over the world indefinitely.

Instead, consider how the Fediverse or PeerTube works: everyone only needs to host a tiny part of the whole "puzzle", while anyone else may contribute. Paying for a 10-100GB static web host or VPS is a trivial cost for an individual or a circle of friends & family.

If you publish your own pictures under a copyleft licence, they can be made public and mirrored by others as well. I don't see how sharing private imagery would be in scope for this proposal, unless perhaps you live in a country with severe limitations relating to copyright and freedom of panorama.

The question of whether one wants to host a given photo indefinitely, until a fixed deadline (e.g., for 6 months), or until it has been marked as processed & verified in OSM (as in the Tasking Manager) is a separate decision to make.

Collaborator

tacruc commented Mar 2, 2023

> Technically, neither. Nobody (well, non-monopolistic entity) can commit sufficient resources to host all photographs all over the world indefinitely.
>
> Instead, consider how The Fediverse works or PeerTube: everyone needs to only host a tiny part of the whole "puzzle", while anyone else may contribute. Paying for a 10-100GB static web host or VPS is a trivial cost for an individual or a circle of friends & family.

I think this is the only part that I understood previously.
I was rather wondering who maintains the list of contributing servers, which leads to the questions of how spam protection works and who is responsible for the content.

So two possible cases:

  • one centralized list maintained by some person or organisation
  • users or admins subscribing to sources they consider trustworthy.

Further questions:

  • How many photos in total should we expect to handle, both at the start and once it is running?
  • Are the contributions of individual servers localized, or are all servers queried for all regions?
  • What should the client display when the entire world is shown? Some kind of heat map would be cool, wouldn't it?
  • To what extent can NextCloud federated shares be used? Which extensions are missing?

Author

bkil commented Mar 2, 2023

I think it would be the easiest to scale and maintain if the local chapter or equivalent local community in each country would maintain the list of trusted photo donors. For bigger countries, this might even be subdivided & delegated to states I guess.

Extrapolating from our own example, the number of both active OSM editors and Mapillary contributors here is below a hundred, and most of them have been active for years. Even a single person per country could keep reviewing them for spam floods and obvious abuse, while the usual report button could help mitigate occasional trolling. A rotating pager duty would be better, though, and moderators could surely run some existing computer vision algorithms on their own clients, but that's a different issue.

In our case, hosting & maintaining our own planet & rendering pipeline would already be challenging our limits (we don't accept donations), and the resources for tens of TBs of potential photos are absolutely out of the question. But just hosting URLs to a couple hundred profiles (or even to the graticule lists) is absolutely doable, probably consuming much less than a gigabyte, and can be arranged to be hosted statically anywhere.

On Mapillary, the top contributors around here have produced a few hundred GB of photos over the years (I think they encode at about 0.5MB/photo, from when I did the math), and the average within the top hundred (or whatever stats they shared) was a few dozen GB per user.
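As a quick check of the per-instance figures that come up later in the thread, the ~0.5 MB/photo estimate can be combined with the 10-100 GB host size mentioned earlier. This assumes decimal units and ignores index and metadata overhead.

```python
PHOTO_SIZE_MB = 0.5  # approximate encoded size per photo, per the estimate above

def photos_per_host(storage_gb: float, photo_size_mb: float = PHOTO_SIZE_MB) -> int:
    """Number of photos that fit in a host of the given size (1 GB = 1000 MB)."""
    return int(storage_gb * 1000 / photo_size_mb)

# The 10-100 GB budget comes from the earlier comment about cheap hosting.
for gb in (10, 100):
    print(f"{gb} GB -> about {photos_per_host(gb):,} photos")
# 10 GB -> about 20,000 photos
# 100 GB -> about 200,000 photos
```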

Worldwide, an end user should (transparently) be presented with an aggregate of each local community (i.e., each country would be responsible for serving the URLs for its graticule cells). Each country could also share its heat map in a compact format, and the assembled picture could be mirrored everywhere and served from the nearest country via GeoIP.
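One possible shape for such a compact heat map, sketched under the assumption that it is simply a photo count per fixed-size grid cell serialized as JSON; assembling the worldwide picture then amounts to summing the per-country counts. The cell size, the key format and the function names are illustrative choices, not an agreed format.

```python
import json
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

def heat_map(points: Iterable[Tuple[float, float]], cell_deg: float = 1.0) -> Dict[str, int]:
    """Count photos per grid cell, keyed by the cell's south-west corner as "lat,lon"."""
    counts: Counter = Counter()
    for lat, lon in points:
        counts[(math.floor(lat / cell_deg), math.floor(lon / cell_deg))] += 1
    return {f"{la * cell_deg:g},{lo * cell_deg:g}": n for (la, lo), n in sorted(counts.items())}

def merge_heat_maps(*maps: Dict[str, int]) -> Dict[str, int]:
    """Assemble per-country heat maps into one picture by summing cell counts."""
    total: Counter = Counter()
    for m in maps:
        total.update(m)  # Counter.update adds counts instead of replacing them
    return dict(total)

country_a = heat_map([(47.5, 19.05), (47.2, 19.9), (46.1, 20.1)])
country_b = heat_map([(47.9, 19.5)])
print(json.dumps(merge_heat_maps(country_a, country_b), sort_keys=True))
# {"46,20": 1, "47,19": 3}
```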

For our specialized, OSM-mapping campaigns, we usually only need access to the domestic imagery.

Computing and serving the heat map from NC would be useful. All served data should be signed in a widespread format (e.g., PGP) by the donor (and any moderators who have seen it) and be accompanied by a license declaration, to facilitate mirroring. Each instance should probably have an endpoint returning a kind of sitemap: some geo-XML or GeoJSON listing all images and their IDs (ideally their hashes/signatures), perhaps optimized by clustering by lat/lon fractions within the file.
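A sketch of what such a sitemap endpoint might return, assuming GeoJSON, SHA-256 content hashes as image IDs, and clustering by rounding coordinates down to 0.01°. The input field names, the `CC-BY-SA-4.0` license placeholder and the example URLs are hypothetical, and the PGP signatures mentioned above are left out because they need a crypto library beyond Python's standard one.

```python
import hashlib
import json
import math
from typing import Dict, Iterable, List

def image_id(data: bytes) -> str:
    """Image ID derived from the content hash (SHA-256 hex, an assumed choice)."""
    return hashlib.sha256(data).hexdigest()

def sitemap(images: Iterable[Dict], license_id: str = "CC-BY-SA-4.0",
            cluster_deg: float = 0.01) -> Dict:
    """Build a GeoJSON FeatureCollection listing every image an instance hosts.

    Each image dict uses hypothetical keys: "url", "lat", "lon", "data" (raw bytes).
    Features are sorted by a coarse lat/lon cluster key, so nearby images end up
    next to each other in the file.
    """
    features: List[Dict] = []
    for img in images:
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [img["lon"], img["lat"]]},
            "properties": {"id": image_id(img["data"]), "url": img["url"],
                           "license": license_id},
        })

    def cluster_key(feature: Dict):
        lon, lat = feature["geometry"]["coordinates"]
        return (math.floor(lat / cluster_deg), math.floor(lon / cluster_deg))

    features.sort(key=cluster_key)
    return {"type": "FeatureCollection", "features": features}

photos = [
    {"url": "https://cloud.example.org/s/abc/2.jpg", "lat": 48.5, "lon": 19.1, "data": b"two"},
    {"url": "https://cloud.example.org/s/abc/1.jpg", "lat": 47.5, "lon": 19.0, "data": b"one"},
]
print(json.dumps(sitemap(photos), indent=2))
```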

Rough math gives an estimated target of roughly 20k-200k images per NC instance, which is small enough to be hosted as a single file (to be processed by the local chapter). If we are thinking of larger deployments, or if we want to skip the middlemen and serve users directly from these edges, we might want to serve the index through another level of indirection that splits it up (based on quadtrees?) into smaller, more manageable chunks of 10-64kB each, as users would typically request 4-6 of these per bbox.
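That extra level of indirection could look like the following sketch: the index is split recursively into quadtree tiles until each tile, serialized as compact JSON, fits under a size limit (64 kB by default, taken from the upper end of the range above). The quadkey naming, the entry layout and the function names are assumptions for illustration.

```python
import json
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (south, west, north, east)
MAX_CHUNK_BYTES = 64 * 1024  # upper end of the 10-64 kB range suggested above

def encoded_size(features: List[Dict]) -> int:
    """Size of a chunk when served as compact JSON."""
    return len(json.dumps(features, separators=(",", ":")).encode("utf-8"))

def split_quadtree(features: List[Dict], bbox: BBox = (-90.0, -180.0, 90.0, 180.0),
                   key: str = "", max_bytes: int = MAX_CHUNK_BYTES,
                   max_depth: int = 24) -> Dict[str, List[Dict]]:
    """Recursively split index entries into quadtree tiles of at most max_bytes.

    Returns {quadkey: entries}. Each quadkey digit names a child quadrant:
    0 = SW, 1 = SE, 2 = NW, 3 = NE. Empty quadrants produce no tile.
    """
    if not features:
        return {}
    if encoded_size(features) <= max_bytes or len(key) >= max_depth:
        return {key: features}
    south, west, north, east = bbox
    mid_lat, mid_lon = (south + north) / 2, (west + east) / 2
    quadrants: Dict[str, BBox] = {
        "0": (south, west, mid_lat, mid_lon),
        "1": (south, mid_lon, mid_lat, east),
        "2": (mid_lat, west, north, mid_lon),
        "3": (mid_lat, mid_lon, north, east),
    }
    buckets: Dict[str, List[Dict]] = {d: [] for d in quadrants}
    for f in features:
        row = 1 if f["lat"] >= mid_lat else 0
        col = 1 if f["lon"] >= mid_lon else 0
        buckets[str(row * 2 + col)].append(f)
    tiles: Dict[str, List[Dict]] = {}
    for digit, bucket in buckets.items():
        tiles.update(split_quadtree(bucket, quadrants[digit], key + digit, max_bytes, max_depth))
    return tiles

# Hypothetical index: 2000 entries on a 0.01-degree grid.
entries = [{"lat": 47 + (i % 50) * 0.01, "lon": 19 + (i // 50) * 0.01, "id": f"img{i}"}
           for i in range(2000)]
tiles = split_quadtree(entries, max_bytes=10 * 1024)
print(len(tiles), "tiles, largest", max(encoded_size(t) for t in tiles.values()), "bytes")
```

A client would then fetch only the tiles whose quadrants overlap its bbox; dense areas end up with deeper, smaller tiles, while sparse areas stay in a few large ones.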

blaueente commented

Content-addressable technologies such as IPFS or similar (not sure what else exists in that area?) could be very beneficial here.

Each image file would have an ID derived from its hash, and would be hosted by the original uploader regardless of whether they are trusted or not. Other contributors could then just pin (=mirror) those images, and even if the original contributor shut down their host, those images would stay available.
Important: every image file should then contain all the data necessary to rebuild the index. Mostly the EXIF data, including location, compass heading and time, plus some info about the author's username/ID and a sequence ID, more or less similar to what Mapillary did. A cryptographic signature, be it PGP or otherwise, as an integral part of the image file might be very helpful at this level, enabling the rebuild or reuse of single files.
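A sketch of the self-describing-file idea: given each file's bytes plus the metadata it carries, the index can be rebuilt without any external database, and the content ID depends on the bytes alone, so a mirrored copy gets the same ID as the original. It assumes the EXIF fields were already parsed by some other tool, uses plain SHA-256 in place of a real IPFS CID, and leaves out signatures; `ImageMeta` and the other names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class ImageMeta:
    """Metadata each file should carry (assumed to be already parsed from EXIF)."""
    lat: float
    lon: float
    heading: float   # compass direction, degrees
    taken_at: str    # ISO 8601 timestamp, e.g. "2023-05-01T09:30:00Z"
    author: str
    sequence_id: str

def content_id(data: bytes) -> str:
    """Simplified content ID: SHA-256 hex of the file bytes.

    Real IPFS CIDs wrap the digest in multihash/CID encoding; this only shows the idea.
    """
    return hashlib.sha256(data).hexdigest()

def rebuild_index(files: Iterable[Tuple[bytes, ImageMeta]]) -> Dict[str, List[Tuple[str, str]]]:
    """Rebuild a per-sequence index using nothing but the files themselves.

    Returns {sequence_id: [(taken_at, content_id), ...]} in capture order.
    """
    index: Dict[str, List[Tuple[str, str]]] = {}
    for data, meta in files:
        index.setdefault(meta.sequence_id, []).append((meta.taken_at, content_id(data)))
    for entries in index.values():
        entries.sort()  # ISO 8601 strings in one fixed format sort chronologically
    return index

files = [
    (b"jpeg-2", ImageMeta(47.5, 19.0, 90.0, "2023-05-01T09:30:05Z", "alice", "seq-1")),
    (b"jpeg-1", ImageMeta(47.5, 19.0, 90.0, "2023-05-01T09:30:00Z", "alice", "seq-1")),
]
idx = rebuild_index(files)
print(idx)
```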

One advantage would be that individuals could mirror and host areas to their liking, e.g. I could run a Raspi on my residential DSL with a cheap 5TB USB hard drive holding all of my images, plus all of the images of my areas of interest, e.g. my hometown. A local OSM chapter could then pin all of the images of their area on a cheap VPS or colocated server, and so on. The more people are interested in specific imagery, the more redundantly it would be stored, and the better the performance would be. And if I decided to shut down my Raspi, my images would still be available on the server of each OSM chapter that decided to pin them.

As an index structure, for each "sequence" and geo-chunk there would be the geo-XML/GeoJSON file you mentioned, containing links to the images and their metadata.
Those chunks would be immutable and stored alongside each image collection, with their own hash as content-ID.
A cryptographic signature could be interesting here as well, so that the contributor can be verified at this level (for spam protection) without downloading all the images; this would also help ensure that images are not lost.
This would form the base data format for the whole system.
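A possible sketch of such an immutable chunk, assuming it is encoded as JSON with sorted keys and no whitespace, so that the same content always yields the same bytes and therefore the same SHA-256 content-ID. The field names are hypothetical, and signing the chunk is not shown.

```python
import hashlib
import json
from typing import Dict, List, Tuple

def canonical_bytes(obj) -> bytes:
    """Deterministic JSON encoding, so identical chunks always hash identically.

    Simplification: this is not a full canonicalization scheme such as RFC 8785 (JCS).
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def make_chunk(sequence_id: str, bbox: Tuple[float, float, float, float],
               images: List[Dict]) -> Tuple[str, bytes]:
    """Build an immutable sequence/geo-chunk; return (chunk content-ID, chunk bytes).

    `images` holds per-image records such as {"cid", "lat", "lon"} (hypothetical
    field names), kept in capture order because order is part of the content.
    """
    chunk = {
        "type": "sequence-chunk",
        "sequence": sequence_id,
        "bbox": list(bbox),
        "images": images,
    }
    data = canonical_bytes(chunk)
    return hashlib.sha256(data).hexdigest(), data

def verify_chunk(chunk_id: str, data: bytes) -> bool:
    """Anyone mirroring the chunk can check that it was not altered."""
    return hashlib.sha256(data).hexdigest() == chunk_id

chunk_id, chunk_bytes = make_chunk("seq-1", (47.0, 19.0, 48.0, 20.0),
                                   [{"cid": "a1", "lat": 47.5, "lon": 19.5}])
print(chunk_id)
```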

The "heat map" would then be made of combinations and slices of those sequence chunks, in the same data format.
They could contain the image metadata themselves, or links/content-IDs to other chunks, maybe in a quadtree/bounding-box style.
For example, I as a photographer could host an index chunk "1st of May 2023 sequence of this long bike road", whereas the local OSM chapter would host an index called "area with bounding box xxyy of city abc with all images known to us up to June 2023" that would contain a slice of my sequence (including a link to the content-ID of my original sequence), as well as slices of everyone else's sequences for the same area.
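The area index from this example could be built by slicing existing sequence chunks by bounding box and keeping, in each slice, a link to the content-ID of the source chunk, roughly as sketched below. The chunk layout, the placeholder chunk IDs (used instead of real hashes) and the function names are assumptions.

```python
import hashlib
import json
from typing import Dict, Tuple

BBox = Tuple[float, float, float, float]  # (south, west, north, east)

def in_bbox(lat: float, lon: float, bbox: BBox) -> bool:
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east

def area_index(name: str, bbox: BBox, source_chunks: Dict[str, Dict]) -> Tuple[str, Dict]:
    """Combine slices of existing sequence chunks into one area index.

    `source_chunks` maps a chunk's content-ID to its decoded JSON (a dict with
    "sequence" and "images"; hypothetical layout). Each slice keeps a link to the
    content-ID of the chunk it was cut from. Returns (area index content-ID, index).
    """
    slices = []
    for chunk_id in sorted(source_chunks):
        chunk = source_chunks[chunk_id]
        images = [im for im in chunk["images"] if in_bbox(im["lat"], im["lon"], bbox)]
        if images:
            slices.append({"source": chunk_id, "sequence": chunk["sequence"], "images": images})
    index = {"type": "area-index", "name": name, "bbox": list(bbox), "slices": slices}
    data = json.dumps(index, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest(), index

# Placeholder chunk IDs stand in for real content hashes.
chunks = {
    "chunk-bike-road": {"sequence": "2023-05-01 bike road",
                        "images": [{"cid": "a1", "lat": 47.50, "lon": 19.05},
                                   {"cid": "a2", "lat": 47.80, "lon": 19.40}]},
    "chunk-other": {"sequence": "other walk",
                    "images": [{"cid": "b1", "lat": 47.52, "lon": 19.06}]},
}
area_id, area = area_index("city abc up to June 2023", (47.4, 19.0, 47.6, 19.1), chunks)
print(area_id, [s["source"] for s in area["slices"]])
```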

Author

bkil commented Mar 26, 2023

The problem I see with all such Torrent-like alternatives is that they aren't as scalable, being vulnerable to abuse and leeches.

Without a web of trust, the local chapter (or anyone, for that matter) can't pin other people's data, as anyone could fake the date and location of images, or even stuff them with encrypted data hidden through blatant steganography, just to enjoy free storage or to exchange data for malicious purposes.
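The pinning policy this implies is simple to state in code: a chapter mirrors only entries that carry a valid signature from a donor in its own web of trust. This sketch assumes signature verification happens elsewhere and only filters on its result; `IndexEntry` and the donor identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

@dataclass(frozen=True)
class IndexEntry:
    cid: str            # content-ID of the image
    donor: str          # donor identity, e.g. a key fingerprint (hypothetical)
    signature_ok: bool  # outcome of verifying the donor's signature, done elsewhere

def select_for_pinning(entries: Iterable[IndexEntry], trusted_donors: Set[str]) -> List[str]:
    """Content-IDs a chapter would mirror: validly signed AND from a donor it trusts.

    Everything else is skipped, since unknown donors could upload fake locations,
    steganographic payloads or other abuse.
    """
    return [e.cid for e in entries if e.signature_ok and e.donor in trusted_donors]

entries = [
    IndexEntry("c1", "alice", True),
    IndexEntry("c2", "mallory", True),   # valid signature, but not in the web of trust
    IndexEntry("c3", "alice", False),    # trusted donor, but the signature failed
]
print(select_for_pinning(entries, {"alice"}))
# ['c1']
```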

A much more sophisticated attacker could simply troll this system, or the editors relying on it, by placing phony image sequences in mostly empty areas or by editing the images so that false data would end up in OpenStreetMap. I think the hyped IPFS & friends are geeky technologies desperately looking for a real-world problem to solve, and they have yet to succeed.

I would be more than happy to contribute any kind of imagery the community needs wherever I go, but as I am low on space, I don't bother gathering and sharing the data in the first place. The only way to cope with this is web-of-trust-based specialization within the community: some layperson heroes go camping at random interesting places to gather map data, while other heroes don't have time for this but are tech-minded and can afford to host 100x as much storage for others as the average.

It is much more desirable, both in terms of energy and of hazardous waste production, to use shared hosting among, let's say, 100 volunteers than to expect every volunteer to purchase and operate new hardware:

https://github.com/bkil/freedom-fighters/blob/master/en/article/shared-hosting.md

Residential ADSL is inherently asymmetrical in the wrong direction, and such an image hosting service is bound to saturate egress, so there would be a great impedance mismatch there even if we discounted the fact that you are not legally allowed to host a public service at home.


3 participants