What.cd metadata preservation #11

Open
denisnazarov opened this issue Nov 21, 2016 · 3 comments
Comments

@denisnazarov
Contributor

denisnazarov commented Nov 21, 2016

As you may have heard, what.cd, a private torrent tracker that is often called "the greatest music collection in the history of the world," was permanently shut down last week.

By many accounts, it was one of the most thorough music catalogs ever assembled, and its shutdown represents a huge loss of cultural heritage data.

For example:

“Collages” were one of What’s best features. Users arranged lists of albums on the site into useful categories like “Intro to free jazz” or “Bands with a male and female singer.” These were indispensable sources of musical discovery.

The goal of this issue is to coordinate an effort to preserve the what.cd metadata (albums, artists, tracks, collages) in Mediachain for the purposes of cultural preservation. The torrents and the media itself are out of scope.

Why Mediachain is the appropriate solution:

  • Everything is immutable and cryptographically signed, so the provenance and authorship of the data are permanently established.
  • Content-addressed links are location-independent and permanent: an organization such as the Internet Archive can act as the primary host of the assets, while other organizations or users can automatically mirror all or part of the dataset.
  • Mediachain provides decentralized resolution from "canonical" IDs such as ISRC and ISWC to the content-addressed hashes, so the links keep working no matter where the data lives.
  • The dataset remains richly structured and accessible, instead of being a multi-terabyte dump on a server somewhere that only specialized researchers ever use (as often happens with these kinds of exports).
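
To make the content-addressing and canonical-ID points above concrete, here is a rough sketch in Python. It is not Mediachain's actual API or wire format: the `content_address` function, the `Resolver` class, the canonical-JSON-plus-SHA-256 encoding, and the record fields are all assumptions chosen for illustration. The idea it shows is that a record's address is derived from its content alone, so any mirror can serve it, and a separate index maps a canonical ID to that address.

```python
import hashlib
import json


def content_address(record: dict) -> str:
    """Hex SHA-256 of a canonical JSON encoding of the record.

    Assumption for illustration: sorted keys and compact separators make
    the encoding deterministic, so equal records get equal addresses.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Resolver:
    """Hypothetical in-memory index: canonical ID -> address -> hosts."""

    def __init__(self) -> None:
        self._by_canonical_id: dict[str, str] = {}
        self._mirrors: dict[str, set[str]] = {}

    def publish(self, canonical_id: str, record: dict, host: str) -> str:
        """Register a record under a canonical ID and note its first host."""
        address = content_address(record)
        self._by_canonical_id[canonical_id] = address
        self._mirrors.setdefault(address, set()).add(host)
        return address

    def mirror(self, address: str, host: str) -> None:
        """Record that another host also serves the same content."""
        self._mirrors.setdefault(address, set()).add(host)

    def resolve(self, canonical_id: str) -> tuple[str, list[str]]:
        """Return the content address and every host known to serve it."""
        address = self._by_canonical_id[canonical_id]
        return address, sorted(self._mirrors[address])


if __name__ == "__main__":
    # Made-up record and placeholder ID, not real catalog data.
    album = {"type": "album", "title": "Example Album", "artist": "Example Artist"}
    resolver = Resolver()
    address = resolver.publish("isrc:EXAMPLE", album, host="primary-archive")
    resolver.mirror(address, host="community-mirror")
    print(resolver.resolve("isrc:EXAMPLE"))
```

Because the address depends only on the record's content, the link stays valid if the primary host disappears and only a mirror remains. A real deployment would also sign each record, which this sketch leaves out.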

Please comment below if you have access to a recent What.cd metadata dump or would like to help in some other way.

Related to #10

@superphly

I know a guy who has a lot of that data compiled and sorta archived privately.

@decadent

An (alleged) dump of What.cd's database is available on open trackers, apparently prepared by the staff before the shutdown. The list of collages is in there, as well as the necessary related tables.

@superphly

I've got a fairly large dump of .torrent files and metadata that I've been holding onto that I'd be glad to package up.

3 participants