
🏛️ 22120 - An archivist browser controller that caches everything you browse, a library server with full text search to serve your archive.


nkgfirecream/22120

 
 


22120

🏛️ - An archivist browser controller that caches everything you browse, a library server with full text search to serve your archive.

Save your browsing, then switch off the net, go to http://localhost:22120, switch the mode to serve, and browse what you browsed before. It all still works.

Downloading the binary for your OS

Get one from the releases page.

Installing using npm

npm i -g archivist1

Running as a Node.js app

npm i && npm start

Chrome Extension

Coming soon.

Using

Pick save mode or serve mode

Go to http://localhost:22120 in your browser, and follow the instructions.

Old Skool, DIY, CUST0M way

Clone the repo, comment out the ChromeLaunch line in app.js, run npm i, and open your browser with --remote-debugging-port=9222. Then:

Save your stuff

npm run save

Serve your stuff

npm run serve


Initial goal

A proof of concept of the ability to browse and transparently save everything, then switch off the internet and browse it later as if you were still online.

Inspired by people talking about enriching bookmarks and browser history with the ability to save all your browsing data and search it, even independent of you being online or the site being online.

How it works

Uses the DevTools protocol to intercept all requests, and caches each response against a key made of the request METHOD and URL in an in-memory map, which it saves to disk every 10 seconds.
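The caching idea above can be sketched roughly as follows. This is a minimal illustration, not the code in this repo: the names `cacheKey` and `ResponseCache`, the shape of a stored response, and the on-disk JSON layout are all assumptions made for the example.

```javascript
const fs = require('fs');

// Hypothetical helper: build the cache key from the request method and URL.
function cacheKey(method, url) {
  return `${method.toUpperCase()} ${url}`;
}

// Hypothetical in-memory cache that can be written to and read from a JSON file.
class ResponseCache {
  constructor(file) {
    this.file = file;
    this.map = new Map();
  }

  // Store a response (assumed to be a JSON-serializable object) under METHOD + URL.
  set(method, url, response) {
    this.map.set(cacheKey(method, url), response);
  }

  // Look up a previously stored response, or undefined if there is none.
  get(method, url) {
    return this.map.get(cacheKey(method, url));
  }

  // Write the whole map to disk as an array of [key, response] pairs.
  save() {
    fs.writeFileSync(this.file, JSON.stringify([...this.map]));
  }

  // Read the map back from disk if the file exists.
  load() {
    if (fs.existsSync(this.file)) {
      this.map = new Map(JSON.parse(fs.readFileSync(this.file, 'utf8')));
    }
  }

  // Save every `ms` milliseconds (10 seconds by default).
  // unref() lets the process exit even while the timer is active.
  startAutosave(ms = 10000) {
    const timer = setInterval(() => this.save(), ms);
    timer.unref();
    return timer;
  }
}
```

In save mode you would call `set` for each intercepted response and `startAutosave()` once at startup. In serve mode you would call `load()` and answer each intercepted request from `get`.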

So far

  • The library server hasn't been implemented.
  • Only saving and serving with the archivist works.
  • You can use it by opening your browser with --remote-debugging-port=9222 and then running npm run save. Everything you browse will be saved to cache.json.
  • You can switch off your internet and run npm run serve (also with your browser on remote debugging) and browse everything you just saved as normal.
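Before running npm run save or npm run serve, it can help to check that the browser really is listening on the debugging port. The sketch below asks Chrome's `/json/version` HTTP endpoint for the `webSocketDebuggerUrl` it reports. The function names are made up for this example, and it uses 127.0.0.1 rather than localhost on the assumption that the browser binds the debugging port to the IPv4 loopback address.

```javascript
const http = require('http');

// Hypothetical helper: URL of the browser's version endpoint for a debugging port.
function versionUrl(port = 9222) {
  return `http://127.0.0.1:${port}/json/version`;
}

// Hypothetical helper: pull the WebSocket debugger URL out of the endpoint's JSON body.
function parseDebuggerUrl(body) {
  const info = JSON.parse(body);
  if (!info.webSocketDebuggerUrl) {
    throw new Error('No webSocketDebuggerUrl in response');
  }
  return info.webSocketDebuggerUrl;
}

// Fetch the endpoint and resolve with the WebSocket URL a DevTools client would connect to.
function getDebuggerUrl(port = 9222) {
  return new Promise((resolve, reject) => {
    http.get(versionUrl(port), (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try {
          resolve(parseDebuggerUrl(body));
        } catch (err) {
          reject(err);
        }
      });
    }).on('error', reject);
  });
}
```

Calling `getDebuggerUrl().then(console.log, console.error)` prints the WebSocket URL when the browser is up, or a connection error if it was not started with --remote-debugging-port=9222.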

Future

  • Implement the library server so we can actually save the responses to disk in the "file tree structure" of the site you browse (the new, lighter memory archive structure is already done).
  • Then serve that archive, and also index and search it.
  • The idea is that you can browse a site and end up with a static directory structure of assets that you can then serve on a local static server and browse it basically as normal.
  • Generally improve code and efficiency.
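The "file tree structure" idea from the list above might look something like this. It is only a sketch of one possible mapping from a URL to a path on disk, since the library server does not exist yet. The function name `urlToFilePath`, the use of index.html for directory-like URLs, and the choice to ignore query strings are all assumptions.

```javascript
const path = require('path');

// Hypothetical mapping from a browsed URL to a file path under rootDir.
// Assumptions: one directory per host, query strings and fragments are ignored,
// and URLs ending in "/" are stored as index.html.
function urlToFilePath(rootDir, rawUrl) {
  const u = new URL(rawUrl);

  let pathname = u.pathname;
  if (pathname.endsWith('/')) {
    pathname += 'index.html';
  }

  // Drop empty and dot segments so the result can never escape rootDir.
  const segments = pathname
    .split('/')
    .filter((s) => s !== '' && s !== '.' && s !== '..');

  // ":" is not allowed in file names on some systems, so replace it in host:port.
  const hostDir = u.host.replace(':', '_');

  return path.join(rootDir, hostDir, ...segments);
}
```

With this mapping, `urlToFilePath('archive', 'https://example.com/blog/')` points at `archive/example.com/blog/index.html` (with the platform's path separator), which is a layout a plain static file server could serve directly.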

The goal

To build a personal archive that you can search and use that does not depend on the continued existence of those sites, or on having internet, but that works just like you are browsing them.

Stuff that will probably be hard (and I haven't thought much about)

  • Streaming content (audio, video)
  • "Impure" request-response pairs (for example, the first call to GET /endpoint returns "A" and the second call returns "AA").
  • WebSockets (how to capture and replay that faithfully?)

There are probably "good enough" solutions to all these, and likely some or all of them already exist and have been thought up by other smart people.
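To make the "impure" case concrete: a cache with one slot per (METHOD and URL) key only remembers the last response, so "A" is lost once "AA" arrives. One possible "good enough" approach, which this project does not implement, is to record every response per key in order and replay them in the same order. The class name `SequenceCache` and its methods are hypothetical.

```javascript
// Hypothetical cache that keeps every response seen for a key, in order.
class SequenceCache {
  constructor() {
    this.responses = new Map(); // key -> array of recorded responses
    this.cursors = new Map();   // key -> index of the next response to replay
  }

  key(method, url) {
    return `${method.toUpperCase()} ${url}`;
  }

  // Save mode: append the response instead of overwriting the previous one.
  record(method, url, response) {
    const k = this.key(method, url);
    if (!this.responses.has(k)) {
      this.responses.set(k, []);
    }
    this.responses.get(k).push(response);
  }

  // Serve mode: return responses in recorded order.
  // Once the recording runs out, keep returning the last one.
  replay(method, url) {
    const k = this.key(method, url);
    const list = this.responses.get(k);
    if (!list) {
      return undefined;
    }
    const i = this.cursors.get(k) || 0;
    this.cursors.set(k, i + 1);
    return list[Math.min(i, list.length - 1)];
  }
}

// Usage: record "A" then "AA" for the same endpoint, then replay them in order.
const seq = new SequenceCache();
seq.record('GET', '/endpoint', 'A');
seq.record('GET', '/endpoint', 'AA');
const replayed = [
  seq.replay('GET', '/endpoint'),
  seq.replay('GET', '/endpoint'),
  seq.replay('GET', '/endpoint'),
];
console.log(replayed);
```

This only helps when requests are replayed in the same order they were recorded. It does nothing for responses that depend on time, on request bodies, or on state held elsewhere.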

More Instructions

Can I use this with a browser that's not Chrome-based?

Probably not. At least not yet.

Higher level description

Basically this is like a "full spectrum record" of your browsing history, with all assets and their content saved. It's like going on holiday and taking a GoPro that saves everything you look at, except that the quality is such that when you replay it, it's actually the same as experiencing it the first time.

FAQ

How does this interact with Ad blockers?

Interacts just fine. The things ad blockers stop will not be archived.

How secure is running Chrome with the remote debugging port open?

Seems pretty secure. It's not exposed to the public internet, and pages you load that try to use it cannot use the protocol for anything (except to open a new tab, which they can do anyway).

Is this free?

Yes, this is totally free to download and use. It's also open source, so do what you want with it.
