Clean DB #126

Closed
imidoriya opened this issue Mar 7, 2020 · 10 comments · Fixed by #153

Comments

@imidoriya
Collaborator

As I queue jobs and complete them, the database doesn't decrease in size as far as I can tell in my tests. It just appears to grow and grow, saving the history. Once a job is complete, I want it gone, as my jobs are large and I don't want it to store anything. How do I keep the DB clean so it only contains pending jobs, not historical ones?

peter-wangxu added commits that referenced this issue Mar 9, 2020
@peter-wangxu
Owner

@imidoriya This feature is now provided in the master branch; give it a try and kindly let me know the result.

Thanks
Peter
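
For reference, a minimal sketch of the cleanup flow being discussed, using the method names that appear later in this thread (assuming the master-branch persist-queue API at the time):

from persistqueue import SQLiteAckQueue

q = SQLiteAckQueue(path='data')

item = q.get()
# ... process the job ...
q.ack(item)            # mark the job as done

q.clear_acked_data()   # drop rows for already-acked jobs
q.shrink_disk_usage()  # reclaim the freed pages on disk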

@imidoriya
Collaborator Author

I wasn't able to see any shrinkage in my test. I submitted several jobs, during which the db-wal increased in size up to 2.2MB, and it just stayed there after all jobs were completed/acked and clear_acked_data and shrink_disk_usage were run. So while I was able to run the test fine, it didn't appear to do anything. I didn't see any errors reported.

Opening up data.db, I still see 10 jobs which appear to have an ack flag with a status of 5, so it looks like clear_acked_data may not be working as I expect. Looking at the code, might it be the LIMIT 1000 OFFSET {max_acked_length}? Would this mean it's actually querying for rows 1001-2001? I tried setting q._MAX_ACKED_LENGTH = 0, but that didn't appear to work.
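
(For anyone puzzling over the OFFSET question above: the snippet below is a generic sqlite3 illustration, not persist-queue's actual query. LIMIT 1000 OFFSET n skips the first n matching rows, so with only 10 acked rows and an offset of 1000 nothing is selected, which would explain why acked rows survive until _MAX_ACKED_LENGTH is set to 0.)

import sqlite3

# Generic demo of LIMIT/OFFSET semantics; table and column names are made up.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE jobs (id INTEGER PRIMARY KEY, status INTEGER)')
conn.executemany('INSERT INTO jobs (status) VALUES (?)', [(5,)] * 10)  # 10 "acked" rows

print(len(conn.execute(
    'SELECT id FROM jobs WHERE status = 5 LIMIT 1000 OFFSET 1000').fetchall()))  # 0 rows selected
print(len(conn.execute(
    'SELECT id FROM jobs WHERE status = 5 LIMIT 1000 OFFSET 0').fetchall()))     # all 10 rows selected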

@imidoriya
Collaborator Author

Actually, setting q._MAX_ACKED_LENGTH = 0 did appear to fix clear_acked_data, but shrink_disk_usage didn't decrease the db size, even though the table is now empty.

@imidoriya
Collaborator Author

The database is currently at 167MB for data.db and 50MB for data.db-wal. After clear_acked_data, shrink_disk_usage didn't change the size.
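
(A manual workaround at the SQLite level, outside persist-queue's API, is to checkpoint the WAL and VACUUM the file directly. The path below assumes data.db sits inside the 'data' directory used elsewhere in this thread, and no other connection should have the database open while this runs.)

import sqlite3

conn = sqlite3.connect('data/data.db')  # assumed location of the queue's database
conn.isolation_level = None             # autocommit: VACUUM cannot run inside a transaction
conn.execute('PRAGMA wal_checkpoint(TRUNCATE);')  # fold data.db-wal back into data.db
conn.execute('VACUUM;')                 # rewrite the file, reclaiming free pages
conn.close()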

@peter-wangxu
Owner

@imidoriya sorry for the late response...
The implementation of clear_acked_data is somewhat odd, as it only clears 1000 items each time, so the shrink_disk_usage that follows may not shrink the file size dramatically.

I will follow up with a new PR to change the behavior of clear_acked_data.

@peter-wangxu peter-wangxu reopened this Apr 11, 2020
@peter-wangxu
Owner

With the new change on my local machine, I can see the db size change. Here is the sample code:

In [1]: from persistqueue import SQLiteAckQueue

In [2]: q = SQLiteAckQueue(path='data')

In [3]: s = "Database is currently at 167MB for the data.db and 50MB for the data.db-wal. The after clear_acked _data, the shrink_disk_usage didn't change the size. %d"

In [4]: for x in range(200000):
   ...:     q.put(s % x)
   ...:     if x % 5000 == 0:
   ...:         print("done %d" % x)
   ...:
   
In [5]: for x in range(100000):
   ...:     item = q.get()
   ...:     q.ack(item)
   ...:     if x % 5000 == 0:
   ...:         print("ack %d" % x)

In [6]: ls -lh data

In [7]: q.clear_acked_data()

In [8]: q.shrink_disk_usage()

In [9]: del q
In [10]: ls -lh data/

@imidoriya
Collaborator Author

Excellent - thank you :)

@imidoriya
Collaborator Author

So this is odd: I had shrink_disk_usage disabled for a while and just turned it back on. I have the job running in a thread as documented here. After it runs shrink_disk_usage, the jobs no longer run. It appears I can still add new items to the queue, but maybe the loop gets closed, as the jobs themselves no longer run.
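
(One arrangement that avoids running the vacuum from a second thread, sketched under the assumption that the SQLiteAckQueue constructor accepts a multithreading flag and with handle() standing in for the real job handler: do the periodic cleanup from the same worker thread that processes the jobs.)

import threading
from persistqueue import SQLiteAckQueue

q = SQLiteAckQueue(path='data', multithreading=True)  # flag assumed available in this version

def worker():
    processed = 0
    while True:
        item = q.get()                    # blocks until a job is available
        handle(item)                      # placeholder for the real job handler
        q.ack(item)
        processed += 1
        if processed % 1000 == 0:         # clean up between jobs, on the same thread
            q.clear_acked_data()
            q.shrink_disk_usage()

threading.Thread(target=worker, daemon=True).start()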

@imidoriya
Collaborator Author

imidoriya commented Jun 12, 2020

I wonder if having multiple threads is preventing the vacuum from working properly and it just sort of hangs. Do you have any examples of, say, an asyncio loop that would run multiple jobs asynchronously in a single thread? I'm curious whether that would work better than trying to use SQLite in a thread-safe way.

Ultimately, my goal is to have a few queued tasks running at any one moment, be it with multiple threads or some async method, whatever works best.
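
(A generic asyncio pattern, not something persist-queue ships: funnel every blocking queue call through a single-thread executor so SQLite is only ever touched from one thread, while the job bodies run concurrently on the event loop. process() is a hypothetical async job handler.)

import asyncio
from concurrent.futures import ThreadPoolExecutor
from persistqueue import SQLiteAckQueue

q = SQLiteAckQueue(path='data')
db_executor = ThreadPoolExecutor(max_workers=1)   # one thread owns all DB access

async def main(batch_size=3):
    loop = asyncio.get_running_loop()
    while True:
        # pull a small batch (each q.get blocks on the executor thread until a job exists)
        items = [await loop.run_in_executor(db_executor, q.get)
                 for _ in range(batch_size)]
        # run the job bodies concurrently on the event loop
        await asyncio.gather(*(process(item) for item in items))
        for item in items:
            await loop.run_in_executor(db_executor, q.ack, item)
        await loop.run_in_executor(db_executor, q.clear_acked_data)
        await loop.run_in_executor(db_executor, q.shrink_disk_usage)

# asyncio.run(main())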

@cnrmck

cnrmck commented Dec 31, 2020

Any update on this? I am also running into a persistently growing db, and I need to run async.
