Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PriorityQueue implementation is quadratic for our purposes - __contains__ does linear search #31

Open
GoogleCodeExporter opened this issue Mar 16, 2015 · 3 comments

Comments

@GoogleCodeExporter
Copy link

What steps will reproduce the problem?

1. Run best_first_graph_search() which implements frontier as a  PriorityQueue 
as defined in utils.py

uniform_cost_search() also uses best_first_graph_search() so it demonstrates 
the same problem.

What is the expected output? What do you see instead?

I expect the "in" operator of PriorityQueue to operate in O(n) time, as 
explained on AIMA page 84: "The data structure for frontier needs to support 
efficient membership testing, so it should combine the capabilities of a 
priority queue and a hash table."

It is used e.g. in best_first_graph_search: "child not in frontier".

Instead, when the frontier gets big (thousands of Nodes) it runs like a dog.  
ipython's %prun is good for testing this, pointing out over ten million 
executions of three functions.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 14715773   19.519    0.000   38.611    0.000 utils.py:747(<lambda>)
 17112459   18.401    0.000   21.727    0.000 search.py:105(__eq__)
    13780    9.214    0.001   47.825    0.003 utils.py:338(some)
 17121581    3.334    0.000    3.334    0.000 {isinstance}
     4660    2.314    0.000    4.942    0.001 utils.py:748(__getitem__)

The reason is clear when we look at the code and note this member function of 
class PriorityQueue:

    def __contains__(self, item):
        return some(lambda (_, x): x == item, self.A)

What version of the product are you using? On what operating system?
svn revision r201

Please provide any additional information below.

The Python Queue.PriorityQueue class might be a good thing to build on.

Original issue reported on code.google.com by neal...@gmail.com on 19 Jul 2012 at 5:07

@GoogleCodeExporter
Copy link
Author

the Queue.PriorityQueue class is appropriate when you have to handle parallel 
access to the queue (in multi-threads programs). For our purpose, I think the 
heapq module is preferable.

Original comment by glewe...@gmail.com on 20 Jul 2012 at 3:40

@GoogleCodeExporter
Copy link
Author

I've written a fix to make the function PriorityQueue.__contains__() run in 
constant time, rather than being linear in the number of items on the queue.  
It simply adds a hash table for membership testing.   I've pushed it to a 
branch on github:

 https://github.com/nealmcb/aima-python/tree/fasterPriorityQueue

It sped up my solution for the second problem in ai-class.org unit 22 "Optional 
NLP Programming" by a huge factor.

Note that __getitem__() and __delitem__() are still linear time, but could 
easily be improved also if necessary.  But in my current usage they aren't 
hotspots.

I didn't switch to use the standard hashq or Queue.PriorityQueue modules since 
they don't provide a constant-time membership test, but it still would seem 
better to use one of them.

My fix also includes another commit which improves the doctests a bit.  They 
now test membership after pops.

A patch is attached.

Original comment by neal...@gmail.com on 30 Jul 2012 at 4:25

Attachments:

@GoogleCodeExporter
Copy link
Author

Looking up the hash of a key is incorrect because different keys may have the 
same hash, right? I think this could be fixed to store the actual keys, though.

Original comment by wit...@gmail.com on 20 May 2013 at 11:07

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant