endor-daniel edited this page Oct 15, 2018 · 4 revisions

MIT Open Music Legal Hackathon, Sunday, October 28, 2018 at 9:00am EST (www.legalhackathon.org)

Background

Endor is offering participants in the MIT Open Music Legal Hackathon a taste of applied Social Physics. The hackathon is dedicated to analyzing data from the music industry. Participants will be able to derive fresh insights into music consumption by creating new predictions from music consumer data.

Last.fm has kindly provided the hackathon with consumer data from its platform. Participants may use data on artists, songs, albums, transactions and users with the aim of deriving new insights from the data.

Please read the instructions carefully as you form teams and start discussing. Whether you are located at the MIT Media Lab or at any of our 9 satellite locations, all rules apply equally. Good luck and have fun!

Instructions

  1. Verify that you are running a supported operating system:

Any UNIX-based OS (for example, Linux or macOS).

  2. Install Docker CE on your local device:

https://docs.docker.com/install/

  3. Run the following bash script on your local device:

bash <(curl -s https://raw.githubusercontent.com/AthenaWisdom/standalone_scorer/master/bin/run.sh)

Running the script downloads a Docker image along with a folder containing a number of files. One of these files is a CSV file, which you will need to “hack” (i.e., modify) before re-running the script. After modifying the CSV file, re-run the script to create a new prediction and view its results.

  4. Retrieve data from the following public s3 bucket:

https://s3.amazonaws.com/endor-hackathon/

This s3 bucket houses the Last.fm data from which participants can gain real insights about music consumption. Participants will have to come up with questions that are suitable as a basis for new predictions.
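To see which files the bucket offers before downloading anything, the listing can be read programmatically. The following is a minimal sketch, not part of the official tooling. It assumes the bucket permits anonymous listing, in which case a plain GET on the bucket URL returns an S3 `ListBucketResult` XML document. The function names `parse_listing` and `list_bucket` are made up for this example.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Bucket URL taken from the instructions above.
BUCKET_URL = "https://s3.amazonaws.com/endor-hackathon/"
# XML namespace used by S3 ListBucketResult documents.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def parse_listing(xml_data):
    """Return the object keys named in an S3 ListBucketResult document."""
    root = ET.fromstring(xml_data)
    return [element.text for element in root.iter(S3_NS + "Key")]


def list_bucket(url=BUCKET_URL):
    """Fetch and parse the bucket listing.

    Assumption: anonymous listing is allowed on this bucket. If it is
    not, the request fails with an HTTP error instead of returning XML.
    """
    with urllib.request.urlopen(url) as response:
        return parse_listing(response.read())
```

Calling `list_bucket()` should return a list of object keys, each of which can be appended to the bucket URL to download that file. A single listing response holds at most 1,000 keys, so a larger bucket would need paging, which this sketch does not handle.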

  5. Open the CSV file located in the folder from the bash script (see step 3):

./input/input/kernel.csv

Hack (i.e., modify) this file according to the question you have chosen to ask, so that it can be used to create a new prediction. You may modify this file using any appropriate tool such as Excel, SQL, R, etc. After modifying this file, make sure to re-run the bash script in order to create a new prediction and view its results.
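Before editing the file, it helps to look at its header and first few rows. The sketch below uses only the Python standard library and makes no assumption about the column layout. The helper name `preview_csv` is made up for this example.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return (header, rows): the first line and the next n_rows lines of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])            # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # at most n_rows data rows
    return header, rows
```

For example, `header, rows = preview_csv("./input/input/kernel.csv")` shows which columns the file actually contains, so that later edits match its real layout.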

Additional Guidelines

Please follow the guidelines below in order to successfully read Last.fm’s consumer data, ask a data-driven question, modify the CSV file, re-run the bash script and view your new prediction’s results.

Make sure to study Last.fm’s consumer data (located in the public s3 bucket) in great detail. The better you understand the data, the more interesting the questions you will be able to ask.

Once you have read the data in detail, take the time to write down a couple of future-oriented questions about Last.fm’s music consumers. Phrase each question in the format “Who is likely to X?”

Pick a question to start with. Identify which data from Last.fm’s consumer data is relevant to that question. Note down the data so that you can easily trace it in the CSV file.

Specifically, you will want to focus on identifying the following kinds of consumer data: (a) A subset of the population which already displays behavior X. (Recall: “Who is likely to X?”) (b) A subset of the population for which you want to know how likely they are to display behavior X.
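The two subsets can be sketched as a simple set computation. This is an illustration only: the record format, a sequence of `(user_id, event)` pairs, is hypothetical and does not describe the schema of the files in the bucket. The sketch also assumes that subset (b) is everyone in the data who does not yet display behavior X, which is one possible choice among several.

```python
def split_population(records, displays_behavior):
    """Split user IDs into those who already display behavior X and the rest.

    records: iterable of (user_id, event) pairs (hypothetical format).
    displays_behavior: function taking an event and returning True if it
        counts as behavior X.
    Returns (already, candidates), both sets of user IDs.
    """
    already = set()
    everyone = set()
    for user_id, event in records:
        everyone.add(user_id)
        if displays_behavior(event):
            already.add(user_id)
    # Assumption: candidates are all remaining users seen in the data.
    return already, everyone - already
```

With the question “Who is likely to listen to artist A?”, `split_population(records, lambda event: event == "artist A")` would return the users who have already listened to artist A, followed by the users who have not.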

Modify the CSV file in the following three columns: Universe, White, Ground. Modify these columns for the population IDs that you have identified from Last.fm’s consumer data.
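As one way to apply this step without a spreadsheet, the sketch below writes three ID collections into the Universe, White and Ground columns. It rests on a strong assumption that should be checked against the real kernel.csv first: that the file consists of exactly these three columns, each holding one ID per row, with shorter columns padded by empty cells. If the real file has additional columns or a different layout, adapt the code accordingly. The function name `write_kernel` is made up for this example.

```python
import csv
from itertools import zip_longest


def write_kernel(path, universe, white, ground):
    """Write three ID collections as the Universe, White and Ground columns.

    Assumed layout (verify against the real file): one header row, then
    one ID per cell, with shorter columns padded by empty cells.
    """
    columns = [sorted(universe), sorted(white), sorted(ground)]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Universe", "White", "Ground"])
        # zip_longest pads the shorter columns so every row has three cells.
        writer.writerows(zip_longest(*columns, fillvalue=""))
```

Since this overwrites the target file, it is worth keeping a copy of the original kernel.csv before calling it.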

After all modifications have been applied to the CSV file, re-run the bash script on your local device.

Open the folder and look for the file ./output/[DATE]/all_scores.csv to view your prediction’s results! You will see a probability attached to each ID, indicating how likely that ID is to exhibit the desired behavior X.
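To find the most likely IDs quickly, the scores can be sorted in descending order. The sketch below assumes that all_scores.csv has a header row, with the ID in the first column and the probability in the second. These positions are assumptions, exposed as parameters, and should be adjusted after inspecting the real file. The function name `top_scores` is made up for this example.

```python
import csv


def top_scores(path, n=10, id_col=0, score_col=1, has_header=True):
    """Return the n (id, score) pairs with the highest scores.

    Column positions and the presence of a header row are assumptions.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        scored = [(row[id_col], float(row[score_col])) for row in reader if row]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]
```

Pass the full path of the generated file, with the dated folder name filled in as it appears on your machine.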

Glossary of Terms

Social Sphere: All data. In our case, this is all of Last.fm’s data available in the public s3 bucket.*

Universe: A subset of the population for which you are predicting the likelihood of certain behavior.*

White: Another subset of the population which already displays that same behavior.*

Ground: The final population subset which is (most) likely to display that same behavior.

Kernel: A table which contains Universe, White and Ground data. In our case, this is the CSV file.

*Every Social Sphere entails Universes, and each Universe entails Whites.

Notes

Results from the hackathon will be publicly released in Endor’s GitHub repository at a later date.
