It's like Google for all your digital shit (information) that's spread across multiple disjoint systems with no common data format.
Documents everywhere -> Distillation -> Document blobs -> Storage
Query to find documents -> Storage -> Results (URLs)
- Retrieve a document from a URL
- Run it through a distiller to produce a document blob
- Put the key (URL) and value (document blob) relation into storage
- Queries
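
The two flows above can be sketched in a few lines. This is only a rough illustration, not the actual design: `distill`, `Storage`, `index`, and the `fetch` callback are hypothetical names, the "document blob" is assumed to be nothing more than a set of lowercase words, and storage is assumed to be an in-memory dict. The example URLs and document texts are made up.

```python
import re
from typing import Callable, Dict, List, Set


def distill(raw: str) -> Set[str]:
    """Reduce raw text to a document blob (assumption: a set of lowercase words)."""
    return set(re.findall(r"[a-z0-9]+", raw.lower()))


class Storage:
    """Key/value store mapping URL -> document blob (assumption: in-memory dict)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, Set[str]] = {}

    def put(self, url: str, blob: Set[str]) -> None:
        self.blobs[url] = blob

    def query(self, text: str) -> List[str]:
        """Return URLs whose blob contains every term of the query."""
        terms = distill(text)
        if not terms:
            return []
        return sorted(url for url, blob in self.blobs.items() if terms <= blob)


def index(url: str, fetch: Callable[[str], str], storage: Storage) -> None:
    raw = fetch(url)        # retrieve a document from a URL
    blob = distill(raw)     # run it through a distiller
    storage.put(url, blob)  # store the URL -> blob relation


# Stand-in for "documents everywhere": a dict plays the role of the disjoint systems.
documents = {
    "file:///notes/todo.txt": "Buy milk and renew passport",
    "imap://mail/42": "Your passport renewal is approved",
}

storage = Storage()
for url in documents:
    index(url, documents.__getitem__, storage)

print(storage.query("passport"))
print(storage.query("milk passport"))
```

Passing `fetch` in as a callback keeps the retrieval step separate from distillation, so each source system could in principle get its own fetcher while sharing the same distiller and storage.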