My purpose is to learn data analysis and data mining using data from Onda Rock, an Italian music portal.
Each Ondarock review page is connected to other review pages by hyperlinks. I would like to build this network by parsing the pages with:
- Requests
- BeautifulSoup
- html5lib
- nltk
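As a minimal sketch of the parsing step, the snippet below pulls review links out of one page with BeautifulSoup. The site root, the `/recensioni/` path fragment, the sample HTML and the function name `extract_review_links` are assumptions made for illustration, not something confirmed against the real site. It uses the built-in `html.parser` so it runs without extra dependencies; `"html5lib"` can be passed as the parser instead.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

BASE_URL = "https://www.ondarock.it/"  # assumed site root
REVIEW_MARKER = "/recensioni/"         # assumed path fragment of review pages


def extract_review_links(html, page_url, parser="html.parser"):
    """Return the sorted, absolute URLs of same-site review pages linked from html."""
    soup = BeautifulSoup(html, parser)
    site = urlparse(BASE_URL).netloc
    links = set()
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links and drop the #fragment part.
        target = urljoin(page_url, anchor["href"]).split("#")[0]
        parsed = urlparse(target)
        if parsed.netloc == site and REVIEW_MARKER in parsed.path and target != page_url:
            links.add(target)
    return sorted(links)


# Made-up page content, only to exercise the function.
SAMPLE_HTML = """
<html><body>
<a href="/recensioni/album_a.htm">Album A</a>
<a href="album_b.htm#top">Album B</a>
<a href="https://example.com/recensioni/x.htm">external site</a>
<a href="/speciali/intro.htm">not a review</a>
</body></html>
"""
page = "https://www.ondarock.it/recensioni/album_c.htm"
review_links = extract_review_links(SAMPLE_HTML, page)
print(review_links)
```

For a real crawl, the HTML would come from something like `requests.get(url, timeout=10).text`, ideally with a pause between requests to avoid overloading the site.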
To store and analyse the network:
- NetworkX
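One possible way to hold the link network in NetworkX is a directed graph, where an edge goes from a page to each page it links to. The page names and the `links_by_page` dictionary below are hypothetical stand-ins for real crawl output.

```python
import networkx as nx

# Hypothetical crawl result: page -> pages it links to.
links_by_page = {
    "album_a": ["album_b", "album_c"],
    "album_b": ["album_c"],
    "album_c": ["album_a"],
    "album_d": ["album_c"],
}


def build_review_graph(links_by_page):
    """Build a directed graph with one node per page and one edge per hyperlink."""
    graph = nx.DiGraph()
    for source, targets in links_by_page.items():
        graph.add_node(source)  # keep pages that have no outgoing links
        for target in targets:
            graph.add_edge(source, target)
    return graph


graph = build_review_graph(links_by_page)
in_degree = dict(graph.in_degree())             # number of incoming links per page
most_linked = max(in_degree, key=in_degree.get)
print(graph.number_of_nodes(), graph.number_of_edges(), most_linked)
```

Extra information such as the vote or the reviewer could be attached as node attributes, for example `graph.add_node(page, vote=7.5)`.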
To plot the data:
- D3
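D3 force-layout examples commonly read a JSON object with a `nodes` list and a `links` list. The sketch below, which assumes that layout and uses made-up page names and genres, shows how an edge list could be turned into such a file from Python. The helper name `to_d3_graph` is hypothetical.

```python
import json


def to_d3_graph(edges, genre_by_page):
    """Convert (source, target) pairs into a nodes/links dictionary."""
    page_ids = sorted({page for edge in edges for page in edge})
    nodes = [
        {"id": page, "group": genre_by_page.get(page, "unknown")}
        for page in page_ids
    ]
    links = [{"source": source, "target": target} for source, target in edges]
    return {"nodes": nodes, "links": links}


# Made-up data for illustration.
edges = [("album_a", "album_b"), ("album_b", "album_c")]
genre_by_page = {"album_a": "rock", "album_b": "rock"}

d3_graph = to_d3_graph(edges, genre_by_page)
print(json.dumps(d3_graph, indent=2))
```

The printed JSON can be saved to a file and loaded on the D3 side, where `group` could drive the node colour.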
Are there clusters? And do they follow the division by music genre?
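One way to approach this question, sketched here under assumptions, is to run a community detection algorithm and check how "pure" each community is with respect to genre. The example uses NetworkX's `greedy_modularity_communities` on a toy undirected graph; the node names, the genres and the `community_purity` helper are invented for illustration, and other algorithms may suit the real data better.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy graph: two tightly linked groups joined by a single edge.
graph = nx.Graph()
graph.add_edges_from([
    ("rock_1", "rock_2"), ("rock_2", "rock_3"), ("rock_1", "rock_3"),
    ("jazz_1", "jazz_2"), ("jazz_2", "jazz_3"), ("jazz_1", "jazz_3"),
    ("rock_3", "jazz_1"),
])
# Made-up genre labels, derived from the node names.
genre = {name: name.split("_")[0] for name in graph.nodes}


def community_purity(graph, genre):
    """For each detected community, return (members, majority genre, purity)."""
    results = []
    for members in greedy_modularity_communities(graph):
        counts = Counter(genre[node] for node in members)
        top_genre, top_count = counts.most_common(1)[0]
        results.append((sorted(members), top_genre, top_count / len(members)))
    return results


purity_report = community_purity(graph, genre)
for members, top_genre, purity in purity_report:
    print(top_genre, round(purity, 2), members)
```

A purity close to 1 for most communities would suggest that clusters follow genres; values near the overall genre proportions would suggest they do not. A directed hyperlink graph can be converted with `graph.to_undirected()` before running this.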
I will choose a way to store and organize the data, for example a database such as MongoDB or CouchDB. Any information is valuable, such as the votes or the reviewer of each page.
- I would also use Pandas to load and analyse the data.
- Afterwards, I could look for correlations in the data, or develop a method to suggest music that I don't know yet but would probably like.
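As a rough sketch of the Pandas step, the snippet below loads a few review records into a DataFrame and computes a Pearson correlation between the vote and the number of incoming links, plus the mean vote per reviewer. All records, column names and reviewer names are made up, and the values are deliberately perfectly linear so the result is easy to check; real data will be much noisier.

```python
import pandas as pd

# Made-up records standing in for documents read from the database.
records = [
    {"page": "album_a", "reviewer": "reviewer_1", "vote": 6.0, "in_links": 1},
    {"page": "album_b", "reviewer": "reviewer_1", "vote": 7.0, "in_links": 3},
    {"page": "album_c", "reviewer": "reviewer_2", "vote": 8.0, "in_links": 5},
    {"page": "album_d", "reviewer": "reviewer_2", "vote": 9.0, "in_links": 7},
]

reviews = pd.DataFrame.from_records(records)

# Pearson correlation between the vote and how often a page is linked.
vote_link_corr = reviews["vote"].corr(reviews["in_links"])

# Average vote given by each reviewer.
mean_vote_by_reviewer = reviews.groupby("reviewer")["vote"].mean()

print(vote_link_corr)
print(mean_vote_by_reviewer)
```

The same DataFrame could later feed a simple suggestion method, for example ranking unseen albums that are linked from highly voted ones.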