Wikipedia Integrations #46

Open
jbenet opened this Issue Sep 15, 2015 · 9 comments

6 participants
@jbenet
Member

jbenet commented Sep 15, 2015

We've been planning to "put Wikipedia on IPFS" for a long, long time. This issue will track possible integration points and their progress. These may lead to independent repos, etc.

In short, the way i see it, we have multiple layers of "integration" with wikipedia. these are discussed below in more detail.

  1. Archive: archive all of wikipedia on IPFS -- as in https://github.com/ipfs/archives
  2. Media: assist wikipedia.org with serving wikipedia media via IPFS ("the big stuff")
  3. Rehost: serve all of wikipedia over IPFS (falling back to ipfs http gateway)
  4. Restructure: rethink wikipedia's datastructures as CRDTs (or even basic git commits), to create new wiki software that leverages IPFS.

(4) is the most exciting to me, but won't happen for a while. (1-3) we can already do. Let's start with (1) and (2).

1. Archive: archive all of wikipedia on IPFS -- as in https://github.com/ipfs/archives

This is a matter of regularly downloading data dumps and adding them. We need to construct "help archive X" pages to publish the newest heads and guide people to help get an archive set up. (We may need ipfs-cluster for this to work well.)

We can do this on our own and do not need to ask for permission, as everything is CC. (correct me if i'm wrong pls).

Steps:

  • open an issue in https://github.com/ipfs/archives
  • plan out there how to ingest all of it
  • ingest all of it
  • figure out how to keep up to date
  • make the "help archive X" pages
  • make ipfs-cluster
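
A minimal sketch of the "download a dump and add it" step, in Python. It assumes the Wikimedia dump URL pattern shown in `DUMP_URL`, a local `ipfs` binary on the PATH, and network access. The function names (`dump_url`, `ipfs_add_command`, `archive`) are hypothetical helpers for illustration, not part of any existing tool.

```python
import subprocess
import urllib.request
from pathlib import Path

# Assumption: Wikimedia publishes the latest dumps under this URL pattern.
DUMP_URL = "https://dumps.wikimedia.org/{wiki}/latest/{wiki}-latest-pages-articles.xml.bz2"


def dump_url(wiki: str) -> str:
    """Build the URL of the latest articles dump for a wiki such as 'enwiki'."""
    return DUMP_URL.format(wiki=wiki)


def ipfs_add_command(path: Path) -> list[str]:
    """Command line that adds one file to IPFS and prints only the final hash."""
    return ["ipfs", "add", "--quieter", str(path)]


def archive(wiki: str, workdir: Path) -> str:
    """Download the latest dump into workdir and add it to the local IPFS node.

    Returns the hash printed by `ipfs add`. Needs network access and a
    working `ipfs` binary; neither is checked here.
    """
    target = workdir / f"{wiki}-latest-pages-articles.xml.bz2"
    urllib.request.urlretrieve(dump_url(wiki), target)
    result = subprocess.run(
        ipfs_add_command(target), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

Running something like `archive("enwiki", Path("/data/dumps"))` on a schedule would produce a new head per dump. Publishing those heads and keeping peers pinned to them is the part that still needs the "help archive X" pages and ipfs-cluster.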

2. Media: assist wikipedia.org with serving wikipedia media via IPFS ("the big stuff")

This means hosting all of the big files that wikipedia has to serve. It's perhaps where we can contribute the most, but then again our poor gateway may not be able to deal with the massive bandwidth usage.

What we need, then, is

3. Rehost: serve all of wikipedia over IPFS (falling back to ipfs http gateway)

After 2 is done, we can proceed with a full mirror. (It may be easier to skip 2 and go straight to 3; this is to be discussed, but 3 seems harder given the difficulty of integrating with their backend and so on.)

4. Restructure: rethink wikipedia's datastructures

This means restructuring how wikipedia's internal datastructures work to provide an editing model based on CRDTs (or basic git commits). We could then put these directly on top of IPFS and allow people to edit + create "wikipedia commits" and "wikipedia PRs" all over IPFS.
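
To make the "basic git commits" variant concrete, here is a rough Python sketch of content-addressed page revisions. Everything in it is an assumption for illustration: the `Commit` fields, the `put` and `history` helpers, and the use of a SHA-256 hex digest over JSON as a stand-in for a real IPFS hash. It is not how mediawiki or IPFS actually store data.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One revision of a wiki page, linked to its parent revisions by hash."""
    page: str
    text: str
    parents: tuple[str, ...]
    author: str

    def id(self) -> str:
        # Hash a canonical serialization, so equal content gives an equal id.
        payload = json.dumps(
            {
                "page": self.page,
                "text": self.text,
                "parents": list(self.parents),
                "author": self.author,
            },
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def put(store: dict[str, Commit], commit: Commit) -> str:
    """Store a commit under its own hash and return that hash."""
    commit_id = commit.id()
    store[commit_id] = commit
    return commit_id


def history(store: dict[str, Commit], head: str) -> list[str]:
    """Page texts from newest to oldest, following first parents only."""
    texts = []
    current = head
    while current is not None:
        commit = store[current]
        texts.append(commit.text)
        current = commit.parents[0] if commit.parents else None
    return texts
```

Because each commit names its parents by hash, a "wikipedia PR" in this sketch would just be a head hash that someone else can fetch and merge. A commit with two parents would represent the merge, which `history` ignores beyond the first parent.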

This is a large undertaking, so perhaps step 1 is to rethink the mediawiki data storage layer over ipfs first, and try making a demo. It's also worth thinking about federated wiki in this context and seeing where "upgrading wikipedia with fedwiki" might lead. I think in general, it may be safest to just replace the storage layer first, and go from there.

To me, this is the most interesting part. But it's the biggest and the one which will take the longest to do.

jbenet commented Sep 15, 2015

moved by @jbenet to #47 (comment)

Member

jbenet commented Sep 15, 2015

@domschiener please move this discussion to another issue; I moved it to #47

Member

davidar commented Sep 15, 2015

👍

davidar referenced this issue in ipfs/archives on Sep 16, 2015: Wikipedia #20 (Open)

rht commented Sep 17, 2015

For layer 4, there are already several implementations of git-based wikis today (i.e. they can be distributed, but lack a built-in way to preserve a canonical dag chain).

almereyda commented Jun 12, 2016

@opn and @WardCunningham have been working on a so-called transformerporter to load Wikipedia pages into Federated Wiki.

Entrance points to this could be


Hey, what's this?

flyingzumwalt referenced this issue in ipfs/distributed-wikipedia-mirror on May 1, 2017: Gather background info from other repositories and add to this one #6 (Closed)

ldct commented Feb 22, 2018

hey @jbenet I saw the blog post about the Turkish wikipedia dump on IPFS. Are goals 2-4 still being worked on?
