Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The AllMusic tagsource was broken #677

Merged
merged 1 commit into from Jan 22, 2022

Conversation

bernd-wechner
Copy link
Contributor

The format of the AllMusic page has clearly changed a little since it was written:

  • album and artist are now in h1 and h2 tags not in div tags
  • the class of the sidebar album container is no longer "album-contain" but "album-contain gallery-launch". For which the HTML parser needed a patch to support CSS class matching a la:

https://tanzu.vmware.com/content/blog/xpath-css-class-matching

  • A clear bug remained in which album_page was overwrritten by an encoding
  • Another bug existed in which a string was sought (.find) in a byte-string.

The format of the AllMusic page has clearly changed a little since it was written:

- album  and artist are now in h1 and h2 tags not in div tags
- the class of the sidebar album container is no longer "album-contain" but "album-contain gallery-launch". For which the HTML parser needed a patch to support CSS class matching a la:

https://tanzu.vmware.com/content/blog/xpath-css-class-matching

- A clear bug remained in which album_page was overwrritten by an encoding
- Another bug existed in which a string was sought (.find) in a byte-string.
@bernd-wechner
Copy link
Contributor Author

There are some small whitespace changes that crept in, I suspect because of my IDE tidying up formatting. If a problem, let me know.

This one took a little while to crack, as the AllMusic tagsource was still crashing when expanding one of the returned search results. The crux of it is that the AllMusic pages have changed a little since it was written, and a few old Python2 carry over bugs I suspect remained.

The biggest puzzle to me is why we're using lxml xpath searches (quite kludgy for doing CSS matches requiring a bit of non lucid code - I would comment that in the code base actually and will if you agree) rather than BeautifulSoup say. But I worked on a minimalist patch to make it work. Now I can use AllMusic ;-).

@sandrotosi sandrotosi merged commit c8bceff into puddletag:master Jan 22, 2022
@bernd-wechner bernd-wechner deleted the fix-Allmusic-tagsource branch January 22, 2022 22:31
@audiomuze
Copy link

audiomuze commented May 29, 2022

@bernd-wechner could you please look into diacritic marks in the tag source - they don't pull through correctly: #721

Whilst your fix gets the tag source working it doesn't handle diacritic marks correctly.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants