After testing this out with a few different websites, I've noticed some small things in the text output that I'd like to tweak, because they make the audio a bit harder to listen to.
One of these is being able to manually edit the text content before it becomes audio, but I'll cover that in a separate issue. Here, I'll focus on setting 'rules' or filters for content.
In the app source code, I've got some hard-coded tweaks for Wikipedia in particular, in src/utils/extract-article.js.
I want to make it easier to add rules/filters for specific domains. For example:
On the Mixmag website, full image URLs were read out in the middle of the audio. I think I would filter out the 'figure' tag.
On The Conversation, many "Read more:" links are interspersed within the actual prose. I would filter out paragraphs that begin with the text "Read more: ", which needs a text match rather than just a 'selector' to filter out.
Rules could be either 'global' (applying to all sites) or domain-specific.
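A text-based rule like the "Read more: " one can be sketched as a regular expression tested against each paragraph. This is a minimal sketch that assumes the extracted article has already been split into an array of paragraph strings; that representation and the `dropMatchingParagraphs` helper are hypothetical, and the real extractor may work on a DOM instead.

```javascript
// Hypothetical text rule: drop paragraphs that start with "Read more: ".
// No "g" flag, so RegExp.prototype.test() stays stateless between calls.
const readMoreRule = /^\s*Read more: /;

// Keep only the paragraphs that do NOT match the pattern.
function dropMatchingParagraphs(paragraphs, pattern) {
  return paragraphs.filter((text) => !pattern.test(text));
}

// Example input, assuming paragraphs are plain strings.
const paragraphs = [
  "The study followed 2,000 participants.",
  "Read more: Why sleep matters more than you think",
  "Results were published in March.",
];

const kept = dropMatchingParagraphs(paragraphs, readMoreRule);
console.log(kept);
```

Here `kept` holds only the first and third paragraphs. A 'selector' rule such as the 'figure' one can't be expressed this way, because it needs the HTML structure rather than plain text.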
How we would implement this
Add a database table called 'extraction rules'. Its columns would be: label, active (boolean), domain (if null, the rule is global), rule type (selector or regex?), and rule content. [The exact implementation might need a bit more experimentation.]
Move the hard-coded Wikipedia tweaks into database rules.
When extracting content, fetch the matching rules from the database (the global rules plus any for the article's domain) and apply them to the content.
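A rough sketch of what the table and the row-to-rule conversion could look like, assuming SQLite. The table name `extraction_rules`, the column names, and the `rowToRule` helper are hypothetical. Because SQLite has no separate boolean storage class, `active` is stored as an integer (0 or 1) and converted when the row is read.

```javascript
// Hypothetical schema for the proposed 'extraction rules' table (SQLite).
const CREATE_EXTRACTION_RULES = `
CREATE TABLE IF NOT EXISTS extraction_rules (
  id INTEGER PRIMARY KEY,
  label TEXT NOT NULL,
  active INTEGER NOT NULL DEFAULT 1,  -- SQLite stores booleans as 0 or 1
  domain TEXT,                        -- NULL means the rule is global
  rule_type TEXT NOT NULL CHECK (rule_type IN ('selector', 'regex')),
  rule_content TEXT NOT NULL
);`;

const RULE_TYPES = new Set(["selector", "regex"]);

// Convert a raw database row into a rule object the extractor can use.
function rowToRule(row) {
  if (!RULE_TYPES.has(row.rule_type)) {
    throw new Error(`Unknown rule type: ${row.rule_type}`);
  }
  return {
    label: row.label,
    active: Boolean(row.active),   // 0/1 from SQLite becomes false/true
    domain: row.domain,            // null = global, per the proposal above
    type: row.rule_type,
    // Compile regex rules once; selector rules stay as plain CSS selector strings.
    content: row.rule_type === "regex" ? new RegExp(row.rule_content) : row.rule_content,
  };
}

// Example row, as a hypothetical query might return it.
const exampleRule = rowToRule({
  label: "Drop 'Read more' links",
  active: 1,
  domain: "theconversation.com",
  rule_type: "regex",
  rule_content: "^\\s*Read more: ",
});
```

The CHECK constraint is one way to settle the "selector or regex (?)" question at the database level; adding another rule type later would mean changing both the constraint and `RULE_TYPES`.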
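The matching step could look roughly like this: keep the active rules that are either global (null domain) or match the article's hostname, then apply the regex rules to the paragraphs. This is a sketch under assumptions: rules are shaped like `{ label, active, domain, type, content }` with `content` already compiled to a RegExp for regex rules, `rulesForUrl` and `applyRegexRules` are hypothetical helpers, and letting a rule for `theconversation.com` also match `www.theconversation.com` is a design choice, not something the issue specifies. Selector rules are left out here because they need a parsed DOM.

```javascript
// Return the active rules that apply to a given article URL.
function rulesForUrl(rules, articleUrl) {
  const hostname = new URL(articleUrl).hostname; // the URL parser lower-cases the host
  return rules.filter((rule) => {
    if (!rule.active) return false;
    if (rule.domain === null) return true; // global rule
    // Exact match, or a subdomain of the rule's domain (assumption).
    return hostname === rule.domain || hostname.endsWith("." + rule.domain);
  });
}

// Drop every paragraph that matches any of the regex rules.
function applyRegexRules(paragraphs, rules) {
  const patterns = rules.filter((r) => r.type === "regex").map((r) => r.content);
  return paragraphs.filter((text) => !patterns.some((p) => p.test(text)));
}

// Example rules; labels, domains, and patterns are illustrative only.
const rules = [
  { label: "Hypothetical selector rule", active: true, domain: "example.com", type: "selector", content: "figure" },
  { label: "Drop Read more", active: true, domain: "theconversation.com", type: "regex", content: /^\s*Read more: / },
  { label: "Drop empty lines", active: true, domain: null, type: "regex", content: /^\s*$/ },
  { label: "Disabled rule", active: false, domain: null, type: "regex", content: /./ },
];

const matched = rulesForUrl(rules, "https://theconversation.com/some-article-123");
const cleaned = applyRegexRules(["Intro.", "Read more: Other story", "  ", "Conclusion."], matched);
```

The `"." + rule.domain` suffix check keeps a rule for `example.com` from accidentally matching a different site such as `notexample.com`.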
I've almost got this working. One thing causing a bit of weirdness is my decision to treat extraction rules with a null 'domain' as 'global' rules that always apply: when you submit a form with an empty input, the database field isn't null; it's an empty string.
So maybe I should rethink that decision and find a different way to distinguish global rules from domain-specific ones.
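One way to keep the null-means-global convention is to normalize the submitted value before saving it, so an empty or whitespace-only input is stored as null. A minimal sketch, assuming the domain arrives as a string (or null) from the form; `normalizeDomain` is a hypothetical helper, and lower-casing is an extra assumption so that stored domains line up with hostnames from the URL parser.

```javascript
// Normalize the 'domain' form field before writing it to the database.
// Empty or whitespace-only input becomes null, i.e. a global rule.
function normalizeDomain(input) {
  if (input === null || input === undefined) return null;
  const trimmed = String(input).trim().toLowerCase();
  return trimmed === "" ? null : trimmed;
}
```

The alternative mentioned above, an explicit scope column (for example 'global' or 'domain'), would avoid relying on null at all, at the cost of having to keep the two fields consistent.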
Referenced code: astro-sqlite-tts-feed/src/utils/extract-article.js, lines 42 to 71 in f037b31