{-# LANGUAGE OverloadedStrings #-}
module Y2018.M07.D24.Solution where
{--
There are a couple of things to do with today's Haskell exercise.
There's a JSON file associated with this exercise. I'd show you the first
few lines, but I can't because the file:
$ wc Y2018/M07/D24/disambiguation_full_25.json
0 178958 1403669 Y2018/M07/D24/disambiguation_full_25.json
apparently has '0' lines (one line, no line-feed).
But I will show you the first few characters so you can get an idea of the
structure.
--}
import Data.Aeson
import Data.Aeson.Encode.Pretty
import Data.ByteString.Lazy.Char8 (ByteString)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (fromJust)
-- or, better yet, you do that.
exDir, dict :: FilePath
exDir = "Y2018/M07/D24/"
dict = "disambiguation_full_25.json"
-- Write a function that returns the first n bytes of a file:
firstNBytes :: Int -> FilePath -> IO ByteString
firstNBytes n = fmap (BL.take (fromIntegral n)) . BL.readFile
{--
What are the first 250 bytes of this file?
>>> firstNBytes 250 (exDir ++ dict)
"{\"Freshwater mangrove\": {}, \"Mango-pine\": {}, \"Heather Smith\": {\"Heath
er Smith (author)\": \"Heather Smith author Australian author\", \"Heather
Smith (curler)\": \"Heather Smith curler born 1972 Canadian Curler\", \"H
eather Smith (public servant)\": "
Okay, from the first few bytes of this file, we see we have entries with no
values... that's ... "useful."
But, also, I found that not every element is a map. Separate this dictionary
into ones that have mappings and ones that don't. What is the structure of the
elements that are not mappings?
--}
type Dictionary = Map String Value
loadDictionary :: FilePath -> IO Dictionary
loadDictionary = fmap (fromJust . decode) . BL.readFile
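
{--
loadDictionary above throws an uninformative fromJust exception if the file
is not valid JSON. A minimal sketch of a more forgiving loader follows; the
name loadDictionaryEither is hypothetical and not part of the original
exercise. It relies on aeson's eitherDecode, so a malformed file should yield
a Left carrying the parser's error message rather than a crash.
--}

loadDictionaryEither :: FilePath -> IO (Either String Dictionary)
loadDictionaryEither = fmap eitherDecode . BL.readFile
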
partitionDictionary :: Dictionary -> (Dictionary, Dictionary)
partitionDictionary = Map.partition isMapping
   where isMapping (Object _) = True
         isMapping _          = False
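
{--
One way to approach the question "what is the structure of the elements that
are not mappings?" is to tally the values by their aeson constructor. This is
only a sketch: valueKinds is a hypothetical helper, not part of the original
solution. It can be applied to the whole dictionary or to either half of the
partition.
--}

valueKinds :: Dictionary -> Map String Int
valueKinds = Map.fromListWith (+) . map (\v -> (kind v, 1)) . Map.elems
   where kind :: Value -> String
         kind (Object _) = "Object"
         kind (Array _)  = "Array"
         kind (String _) = "String"
         kind (Number _) = "Number"
         kind (Bool _)   = "Bool"
         kind Null       = "Null"
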
{--
>>> dic <- loadDictionary (exDir ++ dict)
>>> length dic
4614
>>> (dics,nondics) = partitionDictionary dic
>>> length nondics
343
That's quite a few entries!
What are these nondics?
>>> take 10 (Map.elems nondics)
[Null,Null,Null,Null,Null,Null,Null,Null,Null,Null]
Huh. Are there any that are not null?
--}
nonNulls :: Dictionary -> Dictionary
nonNulls = Map.filter notNull
   where notNull Null = False
         notNull _    = True
{--
>>> length (nonNulls nondics)
0
Nope, all the nondics are Nulls. So we can get rid of them, and for the dics
we can get rid of the entries that have empty maps. Let's do that.
--}
pruneDictionary :: Dictionary -> Dictionary
pruneDictionary = Map.filter (not . Map.null . toMap)
   where toMap = successful . fromJSON
         successful :: Result (Map String String) -> Map String String
         successful (Success m) = m
         successful (Error _)   = Map.empty  -- values that are not String maps (e.g. Null) count as empty
-- Of the original 4614 entries you loaded in, how many entries have data?
{--
>>> length (pruneDictionary dics)
4128
--}
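
{--
The counts above are consistent with one another: 4614 entries less 343 Nulls
leaves 4271 mappings, and 4128 of those have data, so 4271 - 4128 = 143 of
them should be empty. A sketch that counts the empty mappings directly
(emptyMappings is a hypothetical helper, not part of the original solution):
--}

emptyMappings :: Dictionary -> Int
emptyMappings dic = length dics - length (pruneDictionary dics)
   where (dics, _) = partitionDictionary dic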