@domschiener's project #47

Open
jbenet opened this Issue Sep 16, 2015 · 21 comments

Member

jbenet commented Sep 16, 2015

In #46 (comment) @domschiener had said:

Great overview of some of the options @jbenet. What I would propose is option 5), which is a new decentralized application that builds on the foundation Wikipedia has built (which is a huge centralized knowledge database) and utilizes IPFS combined with smart contracts, microtransactions and new reputation systems to create an awesome new use case for users and Wikipedia itself.

What I mean specifically by that is something @DavidSoens and I have been discussing for a while: a compressed version of Wikipedia that focuses on delivering short, concise and precise explanations of specific terms. This is very similar to UrbanDictionary, except that it delivers useful and serious answers (UrbanDictionary, after all, is a self-proclaimed anti-Wikipedia platform). This would allow users to search for a specific term (let's say "a priori") and get back the shortest possible explanation they need to understand the basic definition (which in this case would be "knowledge formed by deductive reasoning rather than empirical observation").

So far this is nothing unique, and in fact Google started creating these short definitions a long time ago, but that is where IPFS, smart contracts, microtransactions and reputation come in:

  1. We are going to use IPFS as the backbone of this application and "eternify" the entries of the short and concise knowledge application.
  2. We can use a smart contracting system (e.g. Ethereum) where we create an "honest" system which pays contributors that create, modify and approve content certain amounts of the application's generated income. (As a side note, this system would of course only pay out if the contribution is appreciated by the community, but more on that later.) Additionally, we can use smart oracles and multisig wallets to create a democratic voting system on content.
  3. Microtransactions and tipping as an incentivization mechanism that encourages people to contribute, modify and approve content. This can be set up in two ways. For one, tipping can be used by content consumers (i.e. visitors) to show appreciation to content creators. As an example, check out ChangeTip (full disclosure: I too am working on a social tipping platform called TipTap). And for another, we can set up a "contribution fund", which basically consists of donations to the project or generated revenues that go towards the contributors of the website. This goes back to the smart contracting system: based on reputation and contribution, these funds are dispensed among the most active contributors of the platform.
  4. Reputation systems for collaborative communities. I specifically like what Primavera and the team at http://backfeed.cc/ are doing and I absolutely think that they would love to collaborate on this project. The way it basically works is that reputation is the sum of one's appreciated contributions to a specific community. This means if my content gets appreciated by people, my reputation goes up and I gain more abilities (and respect) in the community. A specific use case in our example is that people with higher reputation get their content automatically approved, or they get more privileges on the platform and can edit articles.
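The reputation rule in point 4 can be sketched in a few lines of Python. Everything here (class names, the threshold, the weighting) is an illustrative assumption, not Backfeed's actual model:

```python
# Hypothetical sketch: reputation is the sum of appreciated contributions,
# and contributors above a threshold get their edits auto-approved.
AUTO_APPROVE_THRESHOLD = 100  # assumed cutoff for trusted contributors

class Contributor:
    def __init__(self, name):
        self.name = name
        self.reputation = 0  # sum of appreciated contributions

    def receive_appreciation(self, weight=1):
        """A tip/upvote from the community raises reputation."""
        self.reputation += weight

def needs_review(contributor):
    """Edits from high-reputation users skip the community voting queue."""
    return contributor.reputation < AUTO_APPROVE_THRESHOLD

alice = Contributor("alice")
for _ in range(120):
    alice.receive_appreciation()

bob = Contributor("bob")
bob.receive_appreciation(5)

print(needs_review(alice))  # False: auto-approved
print(needs_review(bob))    # True: goes to a community vote
```

In a real deployment the appreciation events would come from tips/votes recorded on-chain, with the payout logic living in the smart contract rather than in application code.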

This is just a quick summary/overview of what we want to potentially build on top of IPFS. The main reason to focus on something like this instead of going directly for the big target (which is getting Wikipedia on board) is that this is an MVP that is an amazing use case for IPFS and shows the people at Wikipedia and anyone else the true potential that IPFS (and the other technologies used) has. Through this we gain more leverage in convincing them to move over.

This project can of course grow out of its "concise mode" and we can expand into more elaborate articles that provide greater detail and insight about a subject (and thus either compete or collaborate with Wikipedia). But that is of course a to-do for the future, so let's focus on this for now ;)

What do you guys think?

I moved this to keep it as its own discussion.

Member

jbenet commented Sep 16, 2015

In #46 (comment) @jbenet said:

@domschiener that all sounds great -- and yes, it's a more advanced version of (4) above. we'd love to support you however we can. i certainly want to move the web in some of those directions, and have done some work on it before. but this is different from integrating with wikipedia itself, and should be a separate project (feel free to open another issue for it, etc). (((Also, be wary: all the things you mention are going to take a very long time to make and get right)))

Member

jbenet commented Sep 16, 2015

In #46 (comment) @domschiener said:

Yeah, I absolutely agree on that @jbenet. I think the only way to successfully develop something like this is to involve more people and make it a community effort by creating "focus groups" that each focus on solving specific problems.

What do you think is the best way to involve more people into this? Or do you know someone who would want to work on this? I will try and work on a prototype over the coming weeks.

@jbenet jbenet referenced this issue Sep 16, 2015

Open

Wikipedia Integrations #46

0 of 10 tasks complete
Member

jbenet commented Sep 16, 2015

@domschiener what's the title of your project?

domschiener commented Sep 16, 2015

I don't have a name for it yet. I'll reach out to some people and see if we can get some interest from other communities as well to work on this. Will keep you updated.

rht commented Sep 17, 2015

For the consensus engine, it doesn't necessarily have to be fully based on a reputation system.
If an edit can prove its own correctness, soundness (and tests), then this alone should be sufficient.
http://www-formal.stanford.edu/jmc/future/objectivity.html contains an example of this kind of distributed governance.

rht commented Sep 17, 2015

(hashed just in case there is an apocalypse on the server: https://ipfs.io/ipfs/QmeA6i4taf1ufjqsA7BSnECBSA5KKBFxgBTJJkz7M4AFav)

Member

jbenet commented Sep 17, 2015

@rht very nice find.

domschiener commented Sep 17, 2015

Progress Update

I wanted to give everyone a quick progress update on the project. I've worked on a simple prototype (https://github.com/domschiener/instant-wikipedia) that makes it possible to instantly search Wikipedia entries. To describe where I'm going with this:

  • The purpose of this project is to provide instant access to the knowledge database
  • Right now this is a simple extract of Wikipedia pages, but essentially I want to turn this into a complete semantic knowledge search which includes: Summary (2-4 sentences + essential info), Factoids, Full Page Description, Recent News
  • As previously mentioned above, this should evolve into a Decentralized Collaborative Organization/Community where the community is there to improve the knowledge base. Essentially, this will be a social experiment.

Right now I'm continuously making API calls, but to take this product live I will need to download the Wikipedia dump (which is around 30 GB, I think) and utilise that. That means the next step will be to fork Wikipedia. But for this I absolutely need to discuss a few things with the IPFS community.

Essentially, referring back to #46, we could fork Wikipedia and put it on IPFS so that applications like the one I'm trying to build can utilize the content and perform these operations with it. But I'm wondering: is this doable right now? Would love to get some input from you guys.

Member

jbenet commented Sep 17, 2015

i do not want a fork of wikipedia. maintaining wikipedia is a gargantuan amount of work. Please do not do this-- just provide a different view based on different storage for now. we can ingest all of the data, and then run periodic scripts that update all data from wikipedia servers as it is created. thus you do not fork, merely show how it would work distributed.
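The "ingest once, then run periodic update scripts" approach could be sketched roughly like this, using MediaWiki's public recentchanges API (the endpoint and query parameters follow the standard MediaWiki API; the importer hook is a hypothetical placeholder):

```python
# Sketch of a periodic sync pass: ask Wikipedia which pages changed since
# the last run, then hand each title to a (hypothetical) IPFS importer.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def recent_changes_url(since_iso, limit=500):
    """Build a recentchanges query for edits back to the last sync timestamp."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcend": since_iso,          # enumeration stops at the last sync time
        "rcprop": "title|timestamp",
        "rclimit": str(limit),
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

def sync_once(since_iso):
    """Fetch changed titles and re-import them (importer is a placeholder)."""
    with urllib.request.urlopen(recent_changes_url(since_iso)) as resp:
        data = json.load(resp)
    for change in data["query"]["recentchanges"]:
        print("would re-import:", change["title"])

print(recent_changes_url("2015-09-17T00:00:00Z"))
```

Run from cron (or similar), this keeps the IPFS copy a mirror that tracks upstream rather than a fork.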

Member

jbenet commented Sep 17, 2015

anyway, the website looks really cool and i'm happy to help get it all set up! let's just make sure it's the real wikipedia and not a fork.

domschiener commented Sep 17, 2015

Well, the point is that I can't take the current system into production since I'm making API calls on each keyboard entry, which means that with many concurrent clients making requests, this would just put unnecessary load on Wikipedia's servers. Which is why I would much rather download Wikipedia's dump and use it to provide the current features.

This is why I was thinking of potentially "forking" Wikipedia and uploading the content to IPFS. But I agree that this is a pretty useless way to move forward. I like the idea of simply mirroring Wikipedia as is. This way we can "prefill" the website and, on top of that, let users create entries for Factoids and for the summary of content - which is our unique way of useful content creation.

Of course the question then becomes: what if users want to edit the main content mirrored from Wikipedia? Do we fork these individual pages? But I suppose we can wait to answer that question until after we have contributors to Factoids/Summaries.

Member

jbenet commented Sep 18, 2015

yes, i'm not suggesting "don't download wikipedia". i'm saying there's a big difference between "cloning and staying up to date with upstream" and "forking". "forking" implies changes in your clone that are not in upstream.

domschiener commented Sep 18, 2015

I will write a more serious Concept Paper about the overall idea of this Decentralized Collaborative Community and hopefully more people will join it then. Will keep you posted.

domschiener commented Sep 20, 2015

@jbenet What do you think is the best strategy for regularly uploading the Wikipedia dumps to IPFS? According to DBpedia.org, "1.4 Wikipedia articles are modified per second which results in 84 articles per minute"; this means that around 120,000 articles (probably fewer) are edited per day.

Wikipedia creates new dumps roughly once a month, so we would have to catch up with that updated content and create the updates for the IPFS backups. We could of course write a program that compares the old articles with the new articles (from the dump), determines what has been changed and only uploads the changes to IPFS.
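A minimal sketch of that compare-and-upload step, assuming both dumps have been parsed into `{title: wikitext}` dicts. (Since IPFS is content-addressed, unchanged articles would hash to the same object anyway; this just avoids re-processing them.)

```python
# Hash every article in the new dump and compare against the previous
# dump's hashes; only new or modified articles need to be (re-)added.
import hashlib

def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed_articles(old_dump, new_dump):
    """old_dump/new_dump: {title: wikitext}. Yields titles needing upload."""
    old_hashes = {title: digest(body) for title, body in old_dump.items()}
    for title, body in new_dump.items():
        if old_hashes.get(title) != digest(body):
            yield title  # new or modified since the last dump

old = {"A priori": "v1", "FC Bayern": "v1"}
new = {"A priori": "v1", "FC Bayern": "v2", "IPFS": "v1"}
print(sorted(changed_articles(old, new)))  # ['FC Bayern', 'IPFS']
```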

But would love to get your input on this. Btw I will have the concept paper ready by tonight so you can all take a look.

domschiener commented Sep 20, 2015

Also, it should be noted that the dumps consist of roughly 45 GB worth of material in total.

cryptix commented Sep 20, 2015

Hrm, interesting... I thought the foundation would also publish changesets. Maybe they became too massive for the English version? I found Special:Export, which can return the history of a specific article.
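For reference, a per-article history request via Special:Export can be built like this (a sketch; the `history` parameter follows MediaWiki's documented Special:Export interface, though wikis may cap how many revisions they actually return):

```python
# Build a Special:Export URL that asks for an article's revision history
# as XML, rather than just the latest revision.
import urllib.parse

def export_url(title, full_history=True):
    base = "https://en.wikipedia.org/wiki/Special:Export/"
    url = base + urllib.parse.quote(title.replace(" ", "_"))
    if full_history:
        url += "?history=1"  # request all revisions, not just the latest
    return url

print(export_url("A priori"))
# https://en.wikipedia.org/wiki/Special:Export/A_priori?history=1
```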

This might give us a good chance to get more feedback on ipfs/notes#23, as it might be necessary to reconstruct the history from revision data without access to the actual VCS?

Tangential question for @domschiener: Have you considered wikidata.org as a source for your project as well?

domschiener commented Sep 20, 2015

@cryptix I think all the dumps include metadata, such as the article history, as well. But I haven't downloaded the dumps yet to confirm this. If they contain this information, it will definitely make our job easier. Here are the dumps I'm referring to, btw: https://dumps.wikimedia.org/backup-index.html (enwiki to be precise).

re: wikidata.org. I really like what they are doing; they are basically taking the opposite approach to DBpedia.org. But after some research, I think that utilizing DBpedia is still the better approach, considering that they are more established and have more data entries. But perhaps we can find a way to use both.

The way we could use DBpedia is as a sort of "semantic overlay" for the platform, offering a richer and more informative user experience when a user searches for a specific subject. We can, for example, change the way people get to their desired information by extending the way search queries can be performed (http://dbpedia.org/use-cases/revolutionize-wikipedia-search-0), and we can also create "portable knowledge". What I mean by that is, for example, that as the owner of a website about your favorite football club, let's say FC Bayern, you could utilize our API to construct a detailed profile of each player. Instead of having to find out all the information yourself, you make a simple API call which returns the required information, ready to be displayed on your website. Or we can even go a step further and allow users to create predefined profiles that can be embedded on websites (similar to how onename.com does it with their identity profiles).
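As a sketch of that API idea: DBpedia already serves each Wikipedia resource as structured JSON at dbpedia.org/data/&lt;name&gt;.json, so a "portable knowledge" endpoint could start from something like this (the `extract_abstracts` helper and its chosen property are illustrative assumptions, not a finished API):

```python
# Fetch a DBpedia resource as JSON and pull out its English abstract(s).
import json
import urllib.request

def resource_url(name):
    """Map a resource name to DBpedia's JSON data endpoint."""
    return "http://dbpedia.org/data/" + name.replace(" ", "_") + ".json"

def extract_abstracts(name):
    """Hypothetical profile builder: return the English abstracts for a resource."""
    with urllib.request.urlopen(resource_url(name)) as resp:
        data = json.load(resp)
    entity = data.get("http://dbpedia.org/resource/" + name.replace(" ", "_"), {})
    abstracts = entity.get("http://dbpedia.org/ontology/abstract", [])
    return [v["value"] for v in abstracts if v.get("lang") == "en"]

print(resource_url("FC Bayern"))
```

A richer profile would query more ontology properties (birth date, team, position, ...) the same way, or use DBpedia's SPARQL endpoint instead.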

The goal is to make knowledge even more accessible and make it easier for people to get the information they want in as short a time as possible. But that is what the future of this project hopefully is. In the present we need to create an active community that is incentivized to actively contribute to the platform.

taoeffect commented Sep 25, 2015

If anyone is interested in improving on Wikipedia, I'd love to chat with you. I think today's Wikipedia is fundamentally broken and there's room for at least an order of magnitude improvement (same as from Britannica -> Wikipedia). Get in touch.

@domschiener: got your slack message btw, am working on a reply right now.

Lapin0t commented Jul 18, 2016

I am totally in for such a project and I think this is the kind of project that IPFS could enable. @taoeffect I may get in touch with you, but what I have in mind right now is the semantic web: just storing the content of Wikipedia is one thing, but IPFS could enable a better view of that data. I know there was a lot of hype around the semantic web some years ago and it fell through, but I think the idea of organizing and structuring information is crucial for the future of the web.

An IPFS data node has some links that can be used to create relations between data, and right now I don't think these links can be tagged (to have an annotated DAG), but this may be emulated (or changed). It may be interesting to think about a webfs standard, a bit like unixfs; it could provide some ideas for crawling and searching an IPFS web.
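One way to emulate tagged links, sketched here in plain Python rather than the IPFS API, is to route each relation through a small intermediate node that records the tag together with the target's hash:

```python
# Toy content-addressed store standing in for the IPFS blockstore,
# used to model "annotated" DAG edges via intermediate edge nodes.
import hashlib
import json

store = {}  # hash -> node

def put(node):
    """Store a node under the hash of its canonical serialization."""
    data = json.dumps(node, sort_keys=True)
    h = hashlib.sha256(data.encode("utf-8")).hexdigest()[:16]
    store[h] = node
    return h

# Article node for "A priori" with a tagged relation to "Epistemology":
target = put({"data": "Epistemology article"})
edge = put({"tag": "subfield-of", "target": target})  # annotated edge node
article = put({"data": "A priori article", "links": [edge]})

print(store[store[article]["links"][0]]["tag"])  # subfield-of
```

The cost is one extra node per relation; the benefit is that edge labels become content-addressed data themselves, which a crawler for a hypothetical "webfs" could index like any other node.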

fdietze commented Jul 18, 2016

Hi, I just stumbled upon this thread. I am very interested in such projects and would like to help. I also have to read more of this thread.

Very brief, because I'm traveling right now:

@cornerman and I developed a hypergraph-based discussion system as our master's theses in computer science. The goal was to build a discussion system that scales in the number of people. We also did our own take on community moderation. I will write about more concepts soon.

Prototype: http://lanzarote.informatik.rwth-aachen.de:9001/dashboard (please play around and do the tutorial to understand the most important concepts)

Source: https://github.com/woost/wust

@flyingzumwalt flyingzumwalt referenced this issue in ipfs/distributed-wikipedia-mirror May 1, 2017

Closed

Gather background info from other repositories and add to this one #6
