chartInfoSoup contents - album info index off by one #2

brycematsuda · 2014-07-06T11:03:34Z

First off, thanks so much for doing this. As both a charts geek and a CS major, I've always wanted to put the Hot 100 into a simpler, quick, easy-to-read layout. I was disappointed to learn that the Billboard API had been shut down for a while now... and then I found this a few days ago and was immediately overjoyed.

Anyway, some background on the issue: I've built the basic API functionality into a personal web app project of mine (which can be found here), which right now displays the info for the top 10 entries. I also store the info in a SQLite database so the app doesn't have to re-download the same data every time I navigate to the page.

A couple of hours ago though, while I was making some adjustments, the album for every entry suddenly became null, and the compiler obviously wasn't happy about it. I thought it had something to do with my program, but just to check, I ran another unaltered copy of the API script in a separate folder, and all of the albums turned up null there as well.

I had a feeling that the page's code had changed somehow and the script was now grabbing the wrong thing, so I printed out the contents of chartInfoSoup and here's what I got.

If you count the indexes, you can see the album info got pushed over by one by the <br> tag. I changed the index the album string is read from, from 3 to 4, so lines 77-80 look like this:

if chartInfoSoup.contents[4].string:
    album = chartInfoSoup.contents[4].string.strip()
else:
    album = None

And it seemed to grab the album info normally again. I'm not opening a PR for now, since I'm kind of skeptical that it will stay this way, but if it stays the same after a few days then I'll likely do so. Just keeping it as an issue as something to monitor.

guoguo12 · 2014-07-06T13:01:49Z

Thanks for your input! I'm really glad this project has been useful!

I wasn't able to duplicate the problem above on my computer.

As you might expect, this is very troubling. About a month ago, there was a pull request targeting a different issue that I also couldn't reproduce. It's like other people are getting different HTML pages from Billboard's servers than me, which shouldn't be happening.

Do me a favor, please—run this script and tell me what you get.

import json

import requests

url = 'http://www.billboard.com/charts/hot-100'
headers_current = {'User-Agent': 'billboard.py (https://github.com/guoguo12/billboard-charts)'}

req = requests.get(url, headers=headers_current)
# Response headers sent back by Billboard's servers
print(json.dumps(dict(req.headers), sort_keys=True, indent=4, separators=(',', ': ')))
# Request headers that were actually sent
print(json.dumps(dict(req.request.headers), sort_keys=True, indent=4, separators=(',', ': ')))

This script sends an HTTP GET request to Billboard's servers and prints the response and request headers to stdout in JSON format. What I got was this.

Not sure if this will help, but it's worth a shot. Let me know if you have any other ideas as to why this might be happening.

brycematsuda · 2014-07-06T20:11:29Z

Here's my output: http://pastebin.com/4Gy49tbi

The only notable differences I see are the server ("server": "ECS (cpm/F9B6)") and cache hits ("x-cache-hits": "HIT (5)"), but other than that, it's pretty much the same. I'm not very familiar with how HTTP requests work at the moment, so I'm not certain what might be happening. I tried fooling with user agents on this site, but both my browser and the Windows FF/Chrome browsers came back with pretty much the same info.

Also ran the script again this morning, still getting all null albums.
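
For anyone comparing two header dumps by hand, a small helper can list only the headers that differ. This is a minimal sketch rather than part of billboard.py: `diff_headers` is a hypothetical name, and in the sample data only the "server" and "x-cache-hits" values in the first dict come from the output quoted above. Everything else is invented for illustration.

```python
def diff_headers(mine, theirs):
    """Return {name: (mine_value, theirs_value)} for headers that differ."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    a = {k.lower(): v for k, v in mine.items()}
    b = {k.lower(): v for k, v in theirs.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Only the first two values of `mine` are quoted from the thread;
# the rest of the sample data is made up for this example.
mine = {
    "Server": "ECS (cpm/F9B6)",
    "X-Cache-Hits": "HIT (5)",
    "Content-Type": "text/html",
}
theirs = {
    "server": "ECS (abc/0000)",
    "x-cache-hits": "HIT (1)",
    "content-type": "text/html",
}

for name, (left, right) in diff_headers(mine, theirs).items():
    print("%s: %r != %r" % (name, left, right))
```

A header present on only one side shows up with `None` for the missing value.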

guoguo12 · 2014-07-07T05:51:42Z

Hmm. Well, I'm not sure where to go from here. I'm not familiar with the intricacies of HTTP either, but I'm guessing content might be varied based on the client IP address.

I can think of two possible options. We can put something like this in:

if chartInfoSoup.contents[3].string:
    album = chartInfoSoup.contents[3].string.strip()
elif chartInfoSoup.contents[4].string:
    album = chartInfoSoup.contents[4].string.strip()
else:
    album = None
# This might not work for songs without album names on my end.
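
To make the caveat in that last comment concrete, here is a rough sketch of the same fallback run against three made-up layouts. The `node` helper is a stand-in that models only the `.string` attribute, `pick_album` is a hypothetical name, and the node values are invented. The real page's contents may look different.

```python
from types import SimpleNamespace


def pick_album(contents):
    """Mirror the fallback above: try index 3 first, then index 4."""
    for index in (3, 4):
        text = contents[index].string
        if text:
            return text.strip()
    return None


def node(text):
    # Stand-in for a parsed node; only the .string attribute is modeled.
    return SimpleNamespace(string=text)


# Hypothetical layouts, not taken from the real Billboard page.
without_br = [node("a"), node("b"), node("c"), node(" Album X "), node("other")]
with_br = [node("a"), node("b"), node("c"), node(None), node(" Album X ")]
no_album = [node("a"), node("b"), node("c"), node(None), node("other")]

print(pick_album(without_br))
print(pick_album(with_br))
print(pick_album(no_album))  # falls through to index 4, which is not an album
```

The third case is the risky one: when index 3 is empty because the song has no album, the fallback returns whatever happens to sit at index 4.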

Alternatively, we can rewrite the code to ignore the line breaks, maybe using regex. Let me know what you think is best.
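
As a rough illustration of the second option without regex, the sketch below collects only the non-blank text nodes, so a stray <br> no longer shifts the index. It uses the standard library's html.parser instead of BeautifulSoup to stay self-contained, the `TextNodes` and `text_nodes` names are hypothetical, and the markup is invented. It is not the real Billboard HTML.

```python
from html.parser import HTMLParser


class TextNodes(HTMLParser):
    """Collect non-blank text nodes, skipping tags such as <br>."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.texts.append(stripped)


def text_nodes(markup):
    parser = TextNodes()
    parser.feed(markup)
    parser.close()
    return parser.texts


# Hypothetical markup: the real page structure is not reproduced here.
plain = "<p>Song Title\n<a>Artist</a>\nAlbum X</p>"
shifted = "<p>Song Title\n<a>Artist</a><br>\nAlbum X</p>"

print(text_nodes(plain))
print(text_nodes(shifted))
```

Both calls give the same list, so the album sits at the same position either way. A song with no album text would still shorten the list, so that case needs its own check.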

brycematsuda · 2014-07-07T07:36:35Z

I was leaning toward the first option to keep things simple for now. Also, a lot of people seem to think that parsing HTML with regex is asking for trouble, so maybe we should hold off on that for now, since the Billboard HTML is pretty big.

It's been about 24 hours or so and I haven't run into any problems, so I'll put up a PR. If anything comes up we can reopen this.

Fix #2

guoguo12 · 2014-07-07T16:11:28Z

Merged. Thank you for your help!

I've given you full access to the repository. If there are any fixes or improvements you want to make in the future, feel free to do so.

brycematsuda · 2014-07-08T07:12:51Z

Oh wow, I wasn't expecting that, thanks again!

To be honest, I think you've got the main stuff nailed down at the moment. The only other feature I was thinking about implementing with the data we can get is determining whether an entry rose or fell in the ranks from the previous week, or whether it's a new entry or re-entry, which should be pretty easy to do since we already have the necessary info to determine it.

guoguo12 · 2014-07-11T18:09:38Z

Actually, that's already sort of included. Each song has attributes lastPos and peakPos for last position and peak position on the chart. There's also a weeks attribute for number of weeks on chart.
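
For what it's worth, here is a minimal sketch of how the rose/fell/new/re-entry label could be derived from those values. It rests on assumptions not confirmed in this thread: that the current position is available (called `rank` here), that lastPos is 0 when the song was not on last week's chart, and that weeks includes the current week. The `movement` function name is hypothetical.

```python
def movement(rank, last_pos, weeks):
    """Classify an entry's week-over-week movement.

    Assumes last_pos is 0 when the song was absent from last week's
    chart, and that weeks counts the current week.
    """
    if last_pos == 0:
        # Absent last week: first week means new, otherwise a re-entry.
        return "new" if weeks <= 1 else "re-entry"
    if rank < last_pos:
        return "rose"  # a smaller number is a higher chart position
    if rank > last_pos:
        return "fell"
    return "steady"


print(movement(rank=3, last_pos=7, weeks=12))
print(movement(rank=40, last_pos=0, weeks=1))
```

If the library signals "not on last week's chart" some other way, only the first branch would need to change.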

brycematsuda added a commit to brycematsuda/billboard-charts that referenced this issue Jul 7, 2014

Fix guoguo12#2 (for now)

4140b9e

brycematsuda mentioned this issue Jul 7, 2014

Fix #2 #3

Merged

guoguo12 closed this as completed in #3 Jul 7, 2014

guoguo12 added a commit that referenced this issue Jul 7, 2014

Merge pull request #3 from brycematsuda/issue_2

8b53562

Fix #2
