New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More robust last updated date check, more strict sanitize function, two new placeholders. #14
Conversation
…ue if found 2.in sanitize(): convert unicode to ascii to prevent accented letters in filename causing problems 3.added 'entrysummary' and 'subtitle' to placeholders. These are descriptions of the entry and the feed, useful for adding to mp3 comment fields to describe the podcast
Received, thanks. Will review it shortly. One thing that we will need to do is turn every non-standard-library module into an optional dependency (like we do with Stagger). Anyway, as I say, in a day or so I'll give more detailed comments. |
Sure, it should be easy to do. The only two places BeautifulSoup is used is in the sanitize() function where I left your original code commented out, and in the Placeholders class where I used it to sanitize the subtitle field. They could probably just be consolidated into the sanitize function and a check for BeautifulSoup availability could decide which route the sanitize function should take. I'll try to look at it tonight. |
Added htmltotext function to check if beautifulsoup module is available and use it to convert html to text for putting into mp3 tags. Added python-beautifulsoup4 as an optional dependency to git PKGBUILD
I added a new function htmltotext() which checks if the beautifulsoup module is available and uses it to convert html to text if it is. If not, it just passes the html through unchanged. I also added it as an optional dependency in the git pkgbuild file, let me know how that works for you, Filip |
More robust last updated date check, more strict sanitize function, two new placeholders.
This works. In particular, the more robust updated-time check is a clear, obvious improvement. Thanks a lot for taking the time to improve the code. I've changed a couple of things here and there -- check the commits after yours. |
Great, I just found something else which might be a bug: when using 'greg download' after 'greg check' it was saving all the files using localtime. I noticed because I use the {date} placeholder to save filenames. sync() uses the published date, so I looked into the download() function and it looks like it just sets entry.linkdate to the localtime without checking for entry.published_parsed or entry.updated_parsed. sync() does check for those, so I basically just copied the snippet from sync() to download() and it seems to be setting {date} correctly now. If there was another reason for setting entry.linkdate to localtime that I'm overlooking, please let me know, thanks, -Filip |
That seems indeed to be a bug. Would you mind opening a new issue, so that I can keep track of that? The ideal thing to do would be to take the snippet out of sync(), into its own function, and then call it both from sync() and download() |
1.add check for entries[0].published_parsed and set sync_by_date() True if found
2.in sanitize(): convert unicode to ascii to prevent accented letters in filename causing problems
3.added 'entrysummary' and 'subtitle' to placeholders. These are descriptions of the entry and the feed, useful for adding to mp3 comment fields to describe the podcast