You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What steps will reproduce the problem?
1. Run best_first_graph_search() which implements frontier as a PriorityQueue
as defined in utils.py
uniform_cost_search() also uses best_first_graph_search() so it demonstrates
the same problem.
What is the expected output? What do you see instead?
I expect the "in" operator of PriorityQueue to operate in O(n) time, as
explained on AIMA page 84: "The data structure for frontier needs to support
efficient membership testing, so it should combine the capabilities of a
priority queue and a hash table."
It is used e.g. in best_first_graph_search: "child not in frontier".
Instead, when the frontier gets big (thousands of Nodes) it runs like a dog.
ipython's %prun is good for testing this, pointing out over ten million
executions of three functions.
ncalls tottime percall cumtime percall filename:lineno(function)
14715773 19.519 0.000 38.611 0.000 utils.py:747(<lambda>)
17112459 18.401 0.000 21.727 0.000 search.py:105(__eq__)
13780 9.214 0.001 47.825 0.003 utils.py:338(some)
17121581 3.334 0.000 3.334 0.000 {isinstance}
4660 2.314 0.000 4.942 0.001 utils.py:748(__getitem__)
The reason is clear when we look at the code and note this member function of
class PriorityQueue:
def __contains__(self, item):
return some(lambda (_, x): x == item, self.A)
What version of the product are you using? On what operating system?
svn revision r201
Please provide any additional information below.
The Python Queue.PriorityQueue class might be a good thing to build on.
Original issue reported on code.google.com by neal...@gmail.com on 19 Jul 2012 at 5:07
The text was updated successfully, but these errors were encountered:
the Queue.PriorityQueue class is appropriate when you have to handle parallel
access to the queue (in multi-threads programs). For our purpose, I think the
heapq module is preferable.
Original comment by glewe...@gmail.com on 20 Jul 2012 at 3:40
I've written a fix to make the function PriorityQueue.__contains__() run in
constant time, rather than being linear in the number of items on the queue.
It simply adds a hash table for membership testing. I've pushed it to a
branch on github:
https://github.com/nealmcb/aima-python/tree/fasterPriorityQueue
It sped up my solution for the second problem in ai-class.org unit 22 "Optional
NLP Programming" by a huge factor.
Note that __getitem__() and __delitem__() are still linear time, but could
easily be improved also if necessary. But in my current usage they aren't
hotspots.
I didn't switch to use the standard hashq or Queue.PriorityQueue modules since
they don't provide a constant-time membership test, but it still would seem
better to use one of them.
My fix also includes another commit which improves the doctests a bit. They
now test membership after pops.
A patch is attached.
Original comment by neal...@gmail.com on 30 Jul 2012 at 4:25
Looking up the hash of a key is incorrect because different keys may have the
same hash, right? I think this could be fixed to store the actual keys, though.
Original comment by wit...@gmail.com on 20 May 2013 at 11:07
Original issue reported on code.google.com by
neal...@gmail.com
on 19 Jul 2012 at 5:07The text was updated successfully, but these errors were encountered: