Permalink
Browse files

The story of un-mergesort

  • Loading branch information...
1 parent fcb762f commit 1b4f275a572efcf57a83e5cfb54b749a8e48efbc @DanBurton DanBurton committed Jan 30, 2012
Showing with 223 additions and 0 deletions.
  1. +223 −0 Literate Haskell/recovery.lhs
View
223 Literate Haskell/recovery.lhs
@@ -0,0 +1,223 @@
+ One of the Facebook Hacker Cup (2012) challenges in Round 1
+was, given a list's length "n", and a series of "choices" taken by
+a typical merge sort algorithm, discover the original list.
+You additionally make the assumption that the original list
+was a permutation of [1 .. n].
+
+> {-# LANGUAGE BangPatterns #-}
+
+> import Control.Applicative
+> import Control.Monad
+> import Control.Monad.Writer.Lazy
+> import Control.Monad.State.Lazy
+> import System.IO
+> import Text.Printf (printf)
+
+ I started out by playing around with the idea of a "monadic"
+merge sort, a la sortBy. The difference is that the comparator
+you pass in is allowed to perform an arbitrary monadic effect.
+I gave this concept a type synonym, for clarity. An MComparator
+is a function that takes in two things of the same type,
+and then produces a monadic action that yields an Ordering.
+
+> type MComparator m a = a -> a -> m Ordering
+
+ Writing sortByM was trivial. In order to get the correct answer,
+I had to mimick the merge sort pseudocode provided by Facebook,
+which was pretty standard. The only thing that makes this Haskell
+code strange is that it is necessarily monadic, since it permits
+the comparator to execute arbitrary monadic effects.
+
+> sortByM :: (Monad m, Functor m) => MComparator m a -> [a] -> m [a]
+> sortByM cmp [] = return []
+> sortByM cmp [x] = return [x]
+> sortByM cmp xs = do
+> let (ys, zs) = partition xs
+> ys' <- sortByM cmp ys
+> zs' <- sortByM cmp zs
+> merge ys' zs'
+> where merge [] bs = return bs
+> merge as [] = return as
+> merge (a:as) (b:bs) = do
+> comparison <- cmp a b
+> case comparison of
+> LT -> (a:) <$> merge as (b:bs)
+> _ -> (b:) <$> merge (a:as) bs
+> partition xs = splitAt (length xs `quot` 2) xs
+
+ Now then, we have a lovely higher-order function. Let's play!
+We need some MComparators to feed to this baby, and I immediately
+had two ideas.
+
+1. Mimick the Facebook code, and record the choices made by
+ a regular comparison sort. This simply *screams* Writer monad.
+
+> spewCmp :: (a -> a -> Ordering) -> MComparator (Writer [Choice]) a
+> spewCmp cmp x y = case cmp x y of
+> LT -> tell [LeftFirst] >> return LT
+> x -> tell [RightFirst] >> return x
+
+2. Completely ignore the inputs, and perform the "sort"
+ based on a given list of choices.
+
+> givenCmp :: MComparator (State [Choice]) a
+> givenCmp x y = do
+> (c:cs) <- get
+> put cs
+> return $ toOrdering c
+
+ The State monad makes this really elegant.
+Notice how, with this approach, I don't have to tell the sort algorithm
+how many choices to take from the list when it enters a sub-branch.
+Since it all runs in the same state monad, it Just Works.
+
+ givenCmp isn't the safest function (what happens when you run out
+of choices?) but hey, I'm just playing around. As a side note,
+I made a simple data structure to represent a "choice"
+made by the merge sort algorithm, which you may have noticed
+in the code above. Nothing too special.
+
+> data Choice = LeftFirst | RightFirst deriving (Eq)
+
+> instance Show Choice where
+> show c = [fromChoice c]
+
+> fromChoice LeftFirst = '1'
+> fromChoice RightFirst = '2'
+
+> toChoice '1' = LeftFirst
+> toChoice '2' = RightFirst
+
+> toOrdering LeftFirst = LT
+> toOrdering RightFirst = GT
+
+ Still not too serious yet, I figured I'd try asserting things
+about how these two different monadic sorts behave.
+
+ If you record the steps of a regular compare sort,
+and then perform a "sort" immitating those steps,
+you will get the same result both times.
+
+ Also, you will consume exactly the same number of choices,
+so the end state of the latter computation will be
+an empty list of choices.
+
+> prop_sorty :: [Int] -> Bool
+> prop_sorty xs = null s && xs' == xs''
+> where (xs', log) = runWriter $ sortByM (spewCmp compare) xs
+> (xs'', s) = flip runState log $ sortByM givenCmp xs
+
+ Well this was all good fun, but it's about time I started
+actually solving the problem. At first, I tried to read the
+specified list of choices backwards, and reconstruct the original
+list by "unsorting" the sorted list. This lead to much pain
+and agony.
+
+ Then, I had an epiphany. I already had all the tools I needed
+to solve this problem in a very elegant way. Let's define some
+helpers to illustrate more clearly: these are just convenience
+wrappers around the cruft of running Writer and State.
+
+> -- given a list of merge choices, runs those mergesort steps on a list
+> runChoices :: [Choice] -> [a] -> [a]
+> runChoices cs = flip evalState cs . sortByM givenCmp
+
+> -- sorts a list the usual way, making note of the choices made at each step
+> recordChoices :: Ord a => [a] -> [Choice]
+> recordChoices = execWriter . sortByM (spewCmp compare)
+
+ Note that runChoices doesn't even require type "a" to have an Ord instance!
+Also note that we discard the uninteresting parts of running State
+and Writer by using evalState (keep result, discard state)
+and execWriter (keep log, discard result) respectively.
+
+ Now consider one particular example given by Facebook:
+n = 4, choices = 12212, solution = [2,4,3,1]. Let's play around
+in ghci a little bit. Recall that in ghci,
+"it" refers to the result of the previous query.
+
+ghci> recordChoices [2,4,3,1]
+[1,2,2,1,2]
+ghci> runChoices it [1,2,3,4]
+[4,1,3,2]
+
+ There are a couple things to notice about this.
+First of all, the mapping. Compare [2,4,3,1] to [1,2,3,4].
+2 maps to 1, 4 maps to 2, 3 maps to 3, and 1 maps to 4.
+Now compare [1,2,3,4] with [4,1,3,2].
+1 maps to 4, 2 maps to 1, 3 maps to 3, 4 maps to 2.
+Notice anything? It's the *same mapping*.
+Given the mapping from [1,2,3,4] to [4,1,3,2],
+we should be able to reverse it, and then apply the
+reverse map on [1,2,3,4] in order to recover the original!
+
+ But I actually solved it slightly differently.
+You see, [4,1,3,2] can be transformed back into [1,2,3,4]
+by sorting it, right? We arrived at [4,1,3,2] by applying
+the choices of sorting [2,4,3,1] onto [1,2,3,4].
+So moving from [1,2,3,4] to [2,4,3,1] *represents*
+moving from the original list to [1,2,3,4],
+since we made the *same choices* in both cases. So now
+what about moving back?
+
+ If we record the choices made moving back from
+[4,1,3,2] to [1,2,3,4] (which can be done by sorting!)
+then the *same choices* can be used to "move back"
+from [1,2,3,4] to the original list!
+
+> solvePerm :: Int -> [Choice] -> [Int]
+> solvePerm n cs = runChoices (recordChoices (runChoices cs xs)) xs
+> where xs = [1 .. n]
+
+ Of course, we *must* write a quickCheck property to make it legit.
+If you record the steps of a regular compare sort,
+then you can use solvePerm to get the "correct answer".
+To prove it is correct, you record the choices of compare sort on *that*
+and observe that they are the same.
+
+> prop_solved :: [Int] -> Bool
+> prop_solved xs = cs == cs'
+> where cs = recordChoices xs
+> cs' = recordChoices $ solvePerm (length xs) cs
+
+ The particular way I wrote this skirts around an issue
+with using quickCheck: we should actually be generating [Int]
+such that it is a permutation of [1 .. n]. I avoid caring
+about this detail by taking a list of choices to be a sort of
+"signature" for a given permutation. Thus, any random list
+"represents" a given permutation, because it produces
+a particular list of choices.
+
+ Well, there you have it. The rest of the problem is just boilerplate,
+and if any of you are using Haskell for the Facebook Hacker Cup,
+then feel free to steal this main method boilerplate, which I have used
+with only light modification on pretty much every problem:
+
+> inFileName = "recovery-in.txt"
+> outFileName = "recovery-out.txt"
+
+> readInt :: String -> Int
+> readInt = read
+
+> zipSelf :: [a] -> [(a,a)]
+> zipSelf (x:y:xs) = (x,y) : zipSelf xs
+> zipSelf _ = []
+
+> main = do
+> contents <- words <$> readFile inFileName
+> outFile <- openFile outFileName WriteMode
+> let n = readInt (head contents)
+> inputs = zipSelf $ tail contents
+> forM_ ([1 .. n] `zip` inputs) $ \(c, (num, choices)) -> do
+> let ans = checksum . solvePerm (readInt num) . map toChoice $ choices
+> hPutStrLn outFile $ printf "Case #%d: %d" c ans
+> hClose outFile
+
+ Minor detail, the checksum, as specified by Facebook for this problem.
+Can be written more succinctly with foldl'.
+
+> checksum :: [Int] -> Int
+> checksum = go 1
+> where go !res [] = res
+> go !res (x:xs) = go ((31 * res + x) `rem` 1000003) xs
+

0 comments on commit 1b4f275

Please sign in to comment.