chartInfoSoup contents - album info index off by one #2
Comments
Thanks for your input! I'm really glad this project has been useful! I wasn't able to reproduce the problem above on my computer. As you might expect, this is very troubling. About a month ago, there was a pull request targeting a different issue that I also couldn't reproduce. It's like other people are getting different HTML pages from Billboard's servers than I am, which shouldn't be happening. Do me a favor, please: run this script and tell me what you get.
import json, requests
url = 'http://www.billboard.com/charts/hot-100'
headers_current = {'User-Agent': 'billboard.py (https://github.com/guoguo12/billboard-charts)'}
req = requests.get(url, headers=headers_current)
# Response headers, then the request headers that were actually sent
print(json.dumps(dict(req.headers), sort_keys=True, indent=4, separators=(',', ': ')))
print(json.dumps(dict(req.request.headers), sort_keys=True, indent=4, separators=(',', ': ')))
This script sends an HTTP GET request to Billboard's servers and prints the response and request headers to stdout in JSON format. What I got was this. Not sure if this will help, but it's worth a shot. Let me know if you have any other ideas as to why this might be happening.
Here's my output: http://pastebin.com/4Gy49tbi The only notable differences I see are the server ( Also ran the script again this morning, still getting all-null albums.
Hmm. Well, I'm not sure where to go from here. I'm not familiar with the intricacies of HTTP either, but I'm guessing the content might vary based on the client IP address. I can think of two possible options. We can put something like this in:
if chartInfoSoup.contents[3].string:
    album = chartInfoSoup.contents[3].string.strip()
elif chartInfoSoup.contents[4].string:
    album = chartInfoSoup.contents[4].string.strip()
else:
    album = None
# This might not work for songs without album names on my end.
Alternatively, we can rewrite the code to ignore the line breaks, maybe using regex. Let me know what you think is best.
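For reference, the second option (ignoring the line breaks) might look roughly like the sketch below. It is only an illustration, not the project's code: it uses the standard library's html.parser instead of regex or chartInfoSoup, it is written for Python 3, and the sample markup and the names TextChunks and text_chunks are made up, since the real Billboard HTML isn't shown in this thread.

```python
from html.parser import HTMLParser


class TextChunks(HTMLParser):
    """Collect non-blank text chunks; tags such as <br> are simply skipped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # drop whitespace-only nodes
            self.chunks.append(text)


def text_chunks(html):
    """Return the visible text pieces of an HTML fragment, in order."""
    parser = TextChunks()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical markup: the same entry with and without the extra <br>.
without_br = "<div><h2>Song</h2><a>Artist</a><span>Album</span></div>"
with_br = "<div><h2>Song</h2><a>Artist</a><br><span>Album</span></div>"

print(text_chunks(without_br))
print(text_chunks(with_br))
```

Because only text is collected, an extra <br> does not shift any positions. An entry with no album at all would still produce a shorter list, so that case would need its own check.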
I was leaning towards the first option to keep things simple for now. A lot of people also seem to think that parsing HTML with regex is asking for trouble, so maybe we'll hold off on that for now, since the Billboard HTML is pretty big. It's been about 24 hours and I haven't run into any problems, so I'll put up a PR. If anything comes up, we can reopen this.
Merged. Thank you for your help! I've given you full access to the repository. If there are any fixes or improvements you want to make in the future, feel free to do so.
Oh wow, I wasn't expecting that, thanks again! To be honest, I think you've got the main stuff nailed down at the moment. The only other feature I was thinking about implementing with the data we can get is determining whether an entry rose or fell in the ranks from the previous week, or whether it's a new entry or re-entry, which should be pretty easy to do since we already have the necessary info to determine it.
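A minimal sketch of that rose/fell/new/re-entry idea, assuming each entry exposes its current rank, last week's position, and weeks on the chart. The function name, the parameter names, and the convention that a last position of 0 means "not on last week's chart" are assumptions for illustration, not the library's actual API.

```python
def classify_movement(rank, last_pos, weeks):
    """Describe how an entry moved compared with the previous week.

    Assumption: last_pos == 0 means the entry was not on last week's chart.
    """
    if last_pos == 0:
        # First week on the chart is a new entry; otherwise it came back.
        return "new" if weeks <= 1 else "re-entry"
    if rank < last_pos:  # a smaller rank number is a better position
        return "rose"
    if rank > last_pos:
        return "fell"
    return "unchanged"


print(classify_movement(rank=3, last_pos=7, weeks=5))
print(classify_movement(rank=40, last_pos=0, weeks=1))
```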
First off, thanks so much for doing this. As both a charts geek and CS major, I've always wanted to put the Hot 100 into a simpler, quick, easy-to-read layout. I was disappointed to learn that the Billboard API had been shut down for a while now... and then I found this a few days ago and was immediately overjoyed.
Anyway, some background on the issue: I've implemented the basic API functionality in a personal web app of mine (which can be found here), where right now it displays the info for the top 10 entries. I also put the info in a SQLite database so the app doesn't have to spend time re-downloading the same info over and over again when navigating to the page.
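A caching setup like the one described could be sketched as follows. This is not the app's actual code: the table name, the get_chart helper, and the fetch callback are all hypothetical, and the entries are stored as a JSON string purely to keep the example short.

```python
import json
import sqlite3


def get_chart(conn, chart_name, date, fetch):
    """Return chart entries from the cache, calling fetch() only on a miss."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chart_cache ("
        "chart TEXT, date TEXT, entries TEXT, PRIMARY KEY (chart, date))"
    )
    row = conn.execute(
        "SELECT entries FROM chart_cache WHERE chart = ? AND date = ?",
        (chart_name, date),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # cache hit: no download needed

    entries = fetch(chart_name, date)  # cache miss: download once
    conn.execute(
        "INSERT INTO chart_cache (chart, date, entries) VALUES (?, ?, ?)",
        (chart_name, date, json.dumps(entries)),
    )
    conn.commit()
    return entries


# Usage with a stand-in downloader (hypothetical data, in-memory database).
def fake_fetch(chart_name, date):
    return [{"rank": 1, "title": "Song", "album": None}]


conn = sqlite3.connect(":memory:")
print(get_chart(conn, "hot-100", "some-date", fake_fetch))
```

Keying the cache on chart name and date means a second visit to the same page reads from SQLite instead of hitting Billboard again.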
A couple of hours ago though, while making some adjustments, all the albums for every entry suddenly became null and the compiler obviously wasn't happy about it. I thought it had something to do with my program, but just to check, I ran another unaltered copy of the API script in a separate folder, and all of the albums turned up null as well.
I had a feeling that the page's code had changed somehow and the script was now grabbing the wrong thing, so I printed out the contents of chartInfoSoup and here's what I got.
If you count the indexes, you can see the album info got pushed one over by the <br> tag. I shifted the index the album string gets its info from, from 3 to 4, so lines 77-80 look like this:
And it seemed to grab the album info normally again. I'm not putting up a PR for now, since I'm kinda skeptical that it will stay this way, but if it stays the same after a few days then I'll likely do so. Just keeping it as an issue as maybe something to monitor.