You are planning a big programming conference and have received many proposals which have passed the initial screen process but you're having trouble fitting them into the time constraints of the day - there are so many possibilities! So you write a program to do it for you.
- The conference has multiple tracks each of which has a morning and afternoon session.
- Each session contains multiple talks.
- Morning sessions begin at 9am and must finish by 12 noon, for lunch.
- Afternoon sessions begin at 1pm and must finish in time for the networking event.
- The networking event can start no earlier than 4:00 and no later than 5:00.
- No talk title has numbers in it.
- All talk lengths are either in minutes (not hours) or lightning (5 minutes).
- Presenters will be very punctual; there needs to be no gap between sessions.
Note that depending on how you choose to complete this problem, your solution may give a different ordering or combination of talks into tracks. This is acceptable; you don't need to exactly duplicate the sample output given here.
Writing Fast Tests Against Enterprise Rails 60min
Overdoing it in Python 45min
Lua for the Masses 30min
Ruby Errors from Mismatched Gem Versions 45min
Common Ruby Errors 45min
Rails for Python Developers lightning
Communicating Over Distance 60min
Accounting-Driven Development 45min
Woah 30min
Sit Down and Write 30min
Pair Programming vs Noise 45min
Rails Magic 60min
Ruby on Rails: Why We Should Move On 60min
Clojure Ate Scala (on my project) 45min
Programming in the Boondocks of Seattle 30min
Ruby vs. Clojure for Back-End Development 30min
Ruby on Rails Legacy App Maintenance 60min
A World Without HackerNews 30min
User Interface CSS in Rails Apps 30min
Track 1:
09:00AM Writing Fast Tests Against Enterprise Rails 60min
10:00AM Communicating Over Distance 60min
11:00AM Rails Magic 60min
12:00PM Lunch
01:00PM Ruby on Rails: Why We Should Move On 60min
02:00PM Common Ruby Errors 45min
02:45PM Accounting-Driven Development 45min
03:30PM Pair Programming vs Noise 45min
04:15PM User Interface CSS in Rails Apps 30min
04:45PM Rails for Python Developers lightning
04:50PM Networking Event
Track 2:
09:00AM Ruby on Rails Legacy App Maintenance 60min
10:00AM Overdoing it in Python 45min
10:45AM Ruby Errors from Mismatched Gem Versions 45min
11:30AM Lua for the Masses 30min
12:00PM Lunch
01:00PM Clojure Ate Scala (on my project) 45min
01:45PM Woah 30min
02:15PM Sit Down and Write 30min
02:45PM Programming in the Boondocks of Seattle 30min
03:15PM Ruby vs. Clojure for Back-End Development 30min
03:45PM A World Without HackerNews 30min
04:15PM Networking Event
My first thought was that the solution is similar to the change making problem where typically we have distinct coins and we would like to change money with the least number of coins. I learnt Data Structures and Algorithm at the University so I revised my materials learnt there.
The Greedy method is when we would like to achieve a goal with the presumption that for finding a global minimum or maximum it is enough to find the local minimums or maximums respectively. More specifically: If you feel that making the local optimal choices will yield the global optimal then you don't have to solve all the sub-problem at all. Greedy solutions don't usually give you the correct answer.
For instance:
Value of banknote to change: 6
Available coins (unlimited number we have from each): 1,3,4
Greedy solution start with 4 and then choose 1 twice so it thinks: 4+1+1 is the best answer however the truth is 3+3
For the Conference Tracking problem we can choose first the event which is the longest in time (not longer than the reamining free time) and then do this until we fill up the morning session. With the remaining elements we can do the same for the afternoon session.
My concerns regarding to this approach are:
- Only find one solution at a time (if it finds at all) so we need to do backsteps and cover a coin or a subset of coins in order to use again the Greedy method and find another solution
- For many solutions we need to change the algorithm and play it with the removal of elements so we need to count with the non repeatable permutations problem which requires N! operations, where N=count of events.
A dynamic programming algorithm examines the previously solved subproblems and combine their solutions to give the best solution for the given problem. If we can find the optimal substructure for the problem (literally this means we can solve it with recursion) then we can easily reach the proper result(s). For the coin change problem described above:
Value of banknote to change: 6
Available coins (unlimited number): 1,3,4
T[1]=1 or 3 or 4
T[2]=T[1] + (1 or 3 or 4) | (1 or 3 or 4)
...
T[n]=T[n - 1] + (1 or 3 or 4) | T[n - 2] + (1 or 3 or 4)
Which will give the 3+3 optimal solution
For the Conference Tracking problem here is the proof of optimal substructure:
- For one event it is either part of the solution (one session) or not
- N - 1 events followed by E[N] event or not where N=count of events, E is a list where the Nth element is the Nth Event we can choose
According to the fact that Haskell is purely functional language and it has persistent, recursive data collections I turned to solve the problem with an implementation of Convolution Probabilistic Tree. The root start with the initial case when there is no Event which we use. Then we insert into the tree the first event which should be either part of the solution (left child of root) or not (right child of root). With each newly introduced Event we need to insert two times more element that we have on the deepest level in worst case, which means that the space complexity could be really high: O(2n).
The space complexity can be reduced by the following inclusion of dead-end cases:
- No need to insert any new Node after a Node that contains a solution
- No need to insert any new Node after a Node that already reach overtime (so not fulfills the criteria)
Here is a tree for the following case:
Event list (length):
* Event 1 (30 mins)
* Event 2 (10 mins)
* Event 3 (20 mins)
* Event 4 (10 mins)
Target Session length: 40 mins
After inserting all to the event tree we get:
- As the space complexity (thus the memory usage) could be high we can do the insertion of Events step by step finding the first result only:
findFirst :: Range -> EventTree -> [Event] -> Result
where
type Range = (minimum length of session, maximum length of session) :: (Int, Int)
type Event = (title, length) :: (String, Int)
type Result = (Tree after the first solution found, events for solution) :: (EventTree, [Event])
- After the first N events we can check the solution list with:
findNStep :: Range -> EventTree -> [Event] -> Int -> Result
- Anf last bit not least we can get all the solutions:
findAll :: Range -> EventTree -> [Event] -> Result
See EventTree.hs for the implementations.
- It can be clearly seen that we have a lot of redundancy, so after 4 events inserted the tree contains the No event ([]) Node 5 times (!!), the first Event ([1]) 4 times, the second Event 3 times, the third event and the [2,3] twice. This is called Overlapping subproblems in Dynamic programming and usually solved by reducing the space complexity of an array or a vector of elements which - most of the time - unforunately means the elimination of recursion.
- Owing to the redundancy the insertion of the subsequent steps requires more and more time on the right side which results inbalance in the tree and therefore results difficulty when parallel computation comes into picture as different subtrees requires different work resources.
- with running
./Main ../test-input.txt 19 +RTS -s
123,780,200 bytes allocated in the heap
23,771,296 bytes copied during GC
2,688,928 bytes maximum residency (11 sample(s))
53,968 bytes maximum slop
8 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 228 colls, 0 par 0.02s 0.02s 0.0001s 0.0007s
Gen 1 11 colls, 0 par 0.02s 0.02s 0.0014s 0.0033s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.12s ( 0.16s elapsed)
GC time 0.04s ( 0.04s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.16s ( 0.20s elapsed)
%GC time 23.1% (19.2% elapsed)
Alloc rate 1,021,138,354 bytes per MUT second
Productivity 76.7% of total user, 63.6% of total elapsed
- with running
./Main ../test-input-big.txt 190 +RTS -s
42,423,838,592 bytes allocated in the heap
32,638,120,368 bytes copied during GC
649,109,736 bytes maximum residency (131 sample(s))
7,129,408 bytes maximum slop
1837 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 81291 colls, 0 par 14.50s 14.51s 0.0002s 0.0008s
Gen 1 131 colls, 0 par 17.05s 17.06s 0.1302s 0.6633s
INIT time 0.00s ( 0.00s elapsed)
MUT time 10.53s ( 11.83s elapsed)
GC time 31.55s ( 31.56s elapsed)
EXIT time 0.08s ( 0.08s elapsed)
Total time 42.16s ( 43.47s elapsed)
%GC time 74.8% (72.6% elapsed)
Alloc rate 4,028,626,629 bytes per MUT second
Productivity 25.2% of total user, 24.4% of total elapsed
Improvements and further considerations:
- Eliminate Overlapping subproblems with making a list of solutions - this should mean O(N2) time complexity - and creating a simple result of the event lists. We can specify the solutions can be found in the following links: https://wiki.haskell.org/Dynamic_programming_example and http://jelv.is/blog/Lazy-Dynamic-Programming/
- As the event length is not surely unique therefore it is a good idea to do a classification (groupBy is your friend) based on length and after that based on these classes the dynamic programming solution is easier to apply
- The performance of the EventTree solution can be improved by parallel processing. As we start to build up the tree with two independent subtrees, one of which contains the first event (E1) and the other which does not therefore we can build up them through two independent but parallel processes. We can do this simplification on a lower level which requires 2n cores on the nth level. Here is a code snippet on the first tree level parallelization:
import Control.Parallel
import EventTree
EventTree = left `par` right `pseq` (aggregate left right)
where
left = findFirst (180, 180) (Node [("Event 1", 30)] 30 EmptyNode EmptyNode) [("Event 2", 10), ...]
right = findFirst (180, 180) (Node [] 0 EmptyNode EmptyNode) [("Event 2", 10), ...]
aggregate (lt, lres) (rt, rres) = (Node [] 0 lt rt, lres ++ rres)