{-# LANGUAGE OverloadedStrings #-}
module Y2018.M04.D20.Solution where
{--
So, YESTERDAY's Haskell problem...
Well, we didn't have a Haskell problem yesterday, because the one I did write,
"Upload the World Policy Journal archive from the REST endpoint to a SQL RDS
you created," turned out to be bigger than I anticipated, even after the
groundwork I laid out in previous exercises.
So, let's break this elephant-sized problem into bite-sized pieces, shall we?
Problem one: when I uploaded the archive to the RDS and inspected the results,
I found this:
   "title": {
      "rendered": "Russia&#8217;s Alternative Political Life"
   },
Yeah, you saw that right: there are HTML entities encoded in the titles of
these articles. And I'm like: REALLY? And, yes, really.
So, given a set of titles, return their plain-text equivalents. I won't tell
you how to do this; do your own research. Maybe look at the HTML libraries
available for Haskell.
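
For the curious, here is a minimal, base-only sketch of what "decoding" means
here, assuming the titles contain only numeric character references (&#8217;,
&#x201C;) and a handful of named ones (&amp;, &rsquo;, ...). The names
decodeEntities, lookupRef, numeric and namedRefs are hypothetical, invented
for this illustration. This is not how the solution below works (that uses
TagSoup, which knows the full HTML entity table), and it ignores HTML5 special
cases such as references written without a trailing semicolon.

```haskell
import Data.Char (chr, digitToInt, isDigit, isHexDigit)
import Data.List (foldl')

-- Replace the character references in a String with the characters they
-- name. Anything that is not a recognized reference is copied through as-is.
decodeEntities :: String -> String
decodeEntities [] = []
decodeEntities ('&' : rest) =
  case break (== ';') rest of
    -- "&ref;" where ref is something we know how to decode
    (ref, ';' : after) | Just c <- lookupRef ref -> c : decodeEntities after
    -- a bare '&', or an unknown/unterminated reference: keep the '&'
    _ -> '&' : decodeEntities rest
decodeEntities (c : rest) = c : decodeEntities rest

-- Resolve the text between '&' and ';'.
lookupRef :: String -> Maybe Char
lookupRef ('#' : x : ds) | x == 'x' || x == 'X' = numeric 16 ds -- hex, e.g. #x201C
lookupRef ('#' : ds) = numeric 10 ds                           -- decimal, e.g. #8217
lookupRef name = lookup name namedRefs                         -- named, e.g. amp

-- Parse the digits of a numeric reference in the given base; reject empty
-- input, bad digits, and values beyond the last Unicode code point.
numeric :: Integer -> String -> Maybe Char
numeric base ds
  | not (null ds), all valid ds, n <= 0x10FFFF = Just (chr (fromInteger n))
  | otherwise = Nothing
  where
    valid = if base == 16 then isHexDigit else isDigit
    n = foldl' (\acc d -> acc * base + toInteger (digitToInt d)) 0 ds

-- A small, deliberately incomplete table of named references.
namedRefs :: [(String, Char)]
namedRefs =
  [ ("amp", '&'), ("lt", '<'), ("gt", '>'), ("quot", '"'), ("apos", '\'')
  , ("nbsp", '\x00A0'), ("ndash", '\x2013'), ("mdash", '\x2014')
  , ("lsquo", '\x2018'), ("rsquo", '\x2019')
  , ("ldquo", '\x201C'), ("rdquo", '\x201D') ]
```

For example, decodeEntities "AT&amp;T" gives "AT&T", while an unrecognized
reference such as "&bogus;" is passed through unchanged rather than dropped.
A hand-rolled table like this goes stale quickly, which is a good argument
for reaching for a real HTML library instead.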
--}
-- from https://stackoverflow.com/questions/4218205/haskell-remove-html-character-entities-in-a-string
import qualified Text.HTML.TagSoup as TS
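-- parseTags decodes character references (&#8217;, &amp;, ...) in text nodes.
-- innerText concatenates every text node, so this is total: it returns ""
-- for empty input instead of calling head on an empty tag list, and it
-- drops any embedded markup (e.g. <em>) instead of failing in fromTagText.
deHTMLize :: String -> String
deHTMLize = TS.innerText . TS.parseTags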
-- deHTMLize the following titles:
titles :: [String]
titles = [ "Trajectory of the U.S.-China Trade Impasse",
           "Russia&#8217;s Alternative Political Life",
           "Africa&#8217;s Defense Against Non-Communicable Disease",
           "In Print: &#8220;Dissident Cinema&#8221;",
           "In Print: &#8220;Only a Shadow&#8221;",
           "In Print: &#8220;Life on Mars&#8221;",
           "The Contentious U.S. Presence in Okinawa, Japan",
           "Displaced in Darfur"]
{--
>>> mapM_ (putStrLn . deHTMLize) titles
Trajectory of the U.S.-China Trade Impasse
Russia’s Alternative Political Life
Africa’s Defense Against Non-Communicable Disease
In Print: “Dissident Cinema”
In Print: “Only a Shadow”
In Print: “Life on Mars”
The Contentious U.S. Presence in Okinawa, Japan
Displaced in Darfur
--}