Multiple (>60) python processes at "runtask run_digest_update" #620

Closed
zeehio opened this Issue Sep 28, 2012 · 9 comments

zeehio commented Sep 28, 2012

I found an issue with the META-SHARE 2.9-beta installation:
After the node has been running for some days, I find 70 or more processes like this one:
meta1 16298 2.2 0.0 27336 688 ? Ds 00:00 23:29 /home/meta1/MetaNode/META-SHARE-v2.9-beta/opt/bin/python /home/meta1/MetaNode/META-SHARE-v2.9-beta/metashare/manage.py runtask run_digest_update --settings=metashare.settings

Here is the output of "ps aux | grep python":
http://pastebin.com/8dCUC1vz

I have no idea what to do to solve this problem, apart from restarting the META-SHARE service every few days.

Any hint would be appreciated.

zeehio commented Sep 28, 2012

Also:
[meta1@d5meta metashare]$ crontab -l
12 4 * * 1 /home/meta1/MetaNode/META-SHARE-v2.9-beta/opt/bin/python /home/meta1/MetaNode/META-SHARE-v2.9-beta/metashare/manage.py runtask run_session_cleanup --settings=metashare.settings
12 5 * * * /home/meta1/MetaNode/META-SHARE-v2.9-beta/opt/bin/python /home/meta1/MetaNode/META-SHARE-v2.9-beta/metashare/manage.py runtask run_account_registration_request_cleanup --settings=metashare.settings
25 * * * * /home/meta1/MetaNode/META-SHARE-v2.9-beta/opt/bin/python /home/meta1/MetaNode/META-SHARE-v2.9-beta/metashare/manage.py runtask run_synchronization --settings=metashare.settings

* */12 * * * /home/meta1/MetaNode/META-SHARE-v2.9-beta/opt/bin/python /home/meta1/MetaNode/META-SHARE-v2.9-beta/metashare/manage.py runtask run_digest_update --settings=metashare.settings

cspurk commented Nov 8, 2012

@zeehio Thank you for this report and sorry for the long silence. Do you still face this problem with the current META-SHARE version 3.0? We had faced the same problem with an early v2.9-beta here at DFKI, but with the current version I no longer seem to be able to reproduce it.

zeehio commented Nov 9, 2012

Thanks for replying. I updated to 3.0 this week. I have not run into this issue again, so I suggest closing it for now; I will reopen it if it comes back.

zeehio closed this Nov 9, 2012

ghost assigned cspurk Nov 12, 2012

zeehio commented Nov 13, 2012

It happened again with META-SHARE 3.0. It turns out that the META-SHARE cron jobs spam my email account, which at least lets me see some logs and errors:

From root@metashare.talp.cat  Sat Nov 10 05:12:02 2012
Return-Path: <root@metashare.talp.cat>
X-Original-To: meta1
Delivered-To: meta1@metashare.talp.cat
From: root@metashare.talp.cat (Cron Daemon)
To: meta1@metashare.talp.cat
Subject: Cron <meta1@d5meta> /home/meta1/MetaNode/META-SHARE-v3.0/opt/bin/python /home/meta1/MetaNode/META-SHARE-v3.0/metashare/manage.py runtask run_account_registration_request_cleanup --settings=metashare.settings
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/home/meta1>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=meta1>
X-Cron-Env: <USER=meta1>
Date: Sat, 10 Nov 2012 05:12:02 +0100 (CET)
Status: R

[11/10/2012 @ 05:12:02] metashare.cron::INFO Will now clean up the account registration request database.
Traceback (most recent call last):
  File "/home/meta1/MetaNode/META-SHARE-v3.0/metashare/manage.py", line 40, in <module>
    execute_manager(settings)
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/django/core/management/__init__.py", line 438, in execute_manager
    utility.execute()
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/django/core/management/__init__.py", line 379, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/django/core/management/base.py", line 191, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/django/core/management/base.py", line 220, in execute
    output = self.handle(*args, **options)
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/kronos/management/commands/runtask.py", line 14, in handle
    return task()
  File "/home/meta1/MetaNode/META-SHARE-v3.0/metashare/cron.py", line 29, in run_account_registration_request_cleanup
    created_lt=(datetime.now() - timedelta(days=3))).delete()
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/django/db/models/manager.py", line 141, in filter
    return self.get_query_set().filter(*args, **kwargs)
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/django/db/models/query.py", line 550, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/django/db/models/query.py", line 568, in _filter_or_exclude
    clone.query.add_q(Q(*args, **kwargs))
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1194, in add_q
    can_reuse=used_aliases, force_having=force_having)
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1069, in add_filter
    negate=negate, process_extras=process_extras)
  File "/home/meta1/MetaNode/META-SHARE-v3.0/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1260, in setup_joins
    "Choices are: %s" % (name, ", ".join(names)))
django.core.exceptions.FieldError: Cannot resolve keyword 'created_lt' into field. Choices are: created, id, user, uuid

Anyway, I am not sure that this is the error that causes all the problems. We have a big resource (150 GB), and if we edit its metadata through the web interface and click "Save", it seems to take a long time processing something... I don't know... For the time being I will restart the Python processes from time to time...

I am open to trying things and providing more information.

zeehio reopened this Nov 13, 2012

cspurk commented Nov 13, 2012

django.core.exceptions.FieldError: Cannot resolve keyword 'created_lt' into field. Choices are: created, id, user, uuid

That’s issue #675, which is already fixed in the development version.
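
For context, the traceback shows a Django field lookup written as created_lt instead of created__lt (Django separates the field name from the lt operator with a double underscore), which matches the FieldError listing created, id, user and uuid as the only valid fields. The fix in #675 is presumably along these lines; the import path and model name below are assumed from the cron.py context, not copied from the actual patch:

from datetime import datetime, timedelta

from metashare.accounts.models import RegistrationRequest  # assumed import path

# Sketch of the likely fix: use the 'created__lt' lookup (double underscore)
# to delete registration requests older than three days.
RegistrationRequest.objects.filter(
    created__lt=datetime.now() - timedelta(days=3)).delete()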

We have a big resource (150GB) and if we edit its metadata through the web interface, when we click "save", it seems to take a long time processing something... I don't know... for the time being I will restart the python process from time to time...

The long processing you are seeing is probably a checksum (re)computation for the big resource; I was already suspecting something like that. Most likely the periodic run_digest_update task is started multiple times in parallel because the previous runs have not completed yet (due to the pending checksum computation), and that then appears to confuse the system. While this can certainly be fixed, I unfortunately don’t see any quickly applicable and minimally invasive solution at the moment. Suggestions are welcome, though.
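
As an illustration only (this is not a META-SHARE feature, and the lock file path and the task call are placeholders): a generic way to keep a periodic task from piling up is to take a non-blocking lock at the start of each run and exit if a previous run still holds it. A minimal Python sketch:

import fcntl
import sys

# Minimal sketch: acquire a non-blocking exclusive lock before doing any work.
# If another invocation of the task still holds the lock, exit immediately
# instead of starting yet another long-running process.
lock_file = open('/tmp/run_digest_update.lock', 'w')
try:
    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    sys.exit(0)  # a previous run is still in progress; skip this one

# ... run the actual digest update here ...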

zeehio commented Nov 13, 2012

I've been thinking a bit about it... If I may ask... What is the checksum used for?

If the checksum is used to see whether the file has changed, computing the checksum over just the first and the last 16 MB of the file would probably be enough.

According to Wikipedia, zip [1], gz [2] and bz2 [3] files include checksums of the compressed files in the header and footer of the file. Given that only compressed files are allowed, I believe that we can (quite safely) assume that a change in the contents of a file will necessarily change the header or the footer.
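
A rough sketch of what such a partial checksum could look like (the function name, the use of MD5 and the 16 MB chunk size are only illustrative):

import hashlib
import os

def partial_checksum(path, chunk_size=16 * 1024 * 1024):
    """Hash only the first and last chunk_size bytes of a (compressed) file.

    Sketch only: it relies on the assumption that a content change in a
    zip/gz/bz2 archive also changes its header or footer.
    """
    digest = hashlib.md5()
    size = os.path.getsize(path)
    with open(path, 'rb') as f:
        digest.update(f.read(chunk_size))        # first chunk (or the whole file)
        if size > chunk_size:
            f.seek(max(size - chunk_size, chunk_size))
            digest.update(f.read(chunk_size))    # last chunk, without re-reading
    return digest.hexdigest()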

I believe this suggestion would allow MetaShare to scale properly with larger resources.

Feel free to criticise this suggestion :-)

References:
[1] http://en.wikipedia.org/wiki/Zip_%28file_format%29#File_headers
[2] http://en.wikipedia.org/wiki/Gzip
[3] http://en.wikipedia.org/wiki/Bzip2#File_format

cspurk closed this in 54cf1ce Nov 15, 2012

ghost assigned cspurk Nov 15, 2012

cspurk commented Nov 15, 2012

I've been thinking a bit about it... If I may ask... What is the checksum used for?

I have thought about the same question in the meantime: it appears that the checksum has not really been used anywhere; the whole checksum code is basically just historical. Therefore I have now removed almost all checksum computations. Thank you for the suggestion to review this code.

@zeehio Can you perhaps test if the latest development version still has the cron problem? Hopefully this will also solve the problem that ELDA currently cannot harvest from your node.

zeehio commented Nov 15, 2012

@cspurk Currently I am testing the 3.0 version with two patches applied.

I have tested "Edit Resource": editing the resource description and then clicking "Save" (which I believe caused a checksum recalculation in the past), and the checksum is no longer recalculated. :-)

I have manually run python manage.py runtask run_digest_update --settings=metashare.settings (I believe this was the cron task that took a long time), and it has been really fast.

[11/15/2012 @ 11:50:36] metashare.storage.models::INFO Starting to update digests.
[11/15/2012 @ 11:50:36] metashare.storage.models::INFO 3c1e558eacc311e19349000e0c4ad2267dc86179a3294707ab3791b06e45df1e is up to date
[11/15/2012 @ 11:50:36] metashare.storage.models::INFO 7f7c2c78acc011e19d89000e0c4ad22601456efdee7046ff95eb5610bb14486b is up to date
[11/15/2012 @ 11:50:36] metashare.storage.models::INFO 14ebffa4a45111e1975c000e0c4ad226d1e09a60a0954acaad14ff56f24b18a7 is up to date
[11/15/2012 @ 11:50:36] metashare.storage.models::INFO updating 89c8adbcacbd11e19349000e0c4ad2268249f28bb5bb4c02a3c251375d1f6b62
[11/15/2012 @ 11:50:37] metashare.repository.search_indexes::INFO Published resource #5 scheduled for reindexing.
[11/15/2012 @ 11:50:38] pysolr::INFO Finished 'http://127.0.0.1:8983/solr/main/update/?commit=true' (POST) with body '<add><doc>' in 0.158 seconds.
[11/15/2012 @ 11:50:38] metashare.storage.models::INFO updating 91113c84c99a11e18127000e0c4ad2269376f0554bc74d9787c526db63f83c4b
[11/15/2012 @ 11:50:38] metashare.repository.search_indexes::INFO Published resource #4 scheduled for reindexing.
[11/15/2012 @ 11:50:39] pysolr::INFO Finished 'http://127.0.0.1:8983/solr/main/update/?commit=true' (POST) with body '<add><doc>' in 0.152 seconds.
[11/15/2012 @ 11:50:39] metashare.storage.models::INFO updating 3ccd9728c75911e1ae69000e0c4ad2262c2dadbb610b4c7a85dbaf4bb172e8b4
[11/15/2012 @ 11:50:40] metashare.repository.search_indexes::INFO Published resource #8 scheduled for reindexing.
[11/15/2012 @ 11:50:41] pysolr::INFO Finished 'http://127.0.0.1:8983/solr/main/update/?commit=true' (POST) with body '<add><doc>' in 0.172 seconds.
[11/15/2012 @ 11:50:41] metashare.storage.models::INFO updating b189e4a6a2c211e19ae9000e0c4ad2262bb2a705e742487295c2c0965a0b77f8
[11/15/2012 @ 11:50:41] metashare.repository.search_indexes::INFO Published resource #7 scheduled for reindexing.
[11/15/2012 @ 11:50:42] pysolr::INFO Finished 'http://127.0.0.1:8983/solr/main/update/?commit=true' (POST) with body '<add><doc>' in 0.165 seconds.
[11/15/2012 @ 11:50:42] metashare.storage.models::INFO updating 091eb260acae11e1bcc8000e0c4ad226ba64b1c22bcf4000a3b55843ca3ab532
[11/15/2012 @ 11:50:42] metashare.repository.search_indexes::INFO Published resource #6 scheduled for reindexing.
[11/15/2012 @ 11:50:44] pysolr::INFO Finished 'http://127.0.0.1:8983/solr/main/update/?commit=true' (POST) with body '<add><doc>' in 0.152 seconds.
[11/15/2012 @ 11:50:44] metashare.storage.models::INFO Finished updating digests 

And running it again:

[11/15/2012 @ 11:52:52] metashare.storage.models::INFO Starting to update digests.
[11/15/2012 @ 11:52:52] metashare.storage.models::INFO 3c1e558eacc311e19349000e0c4ad2267dc86179a3294707ab3791b06e45df1e is up to date
[11/15/2012 @ 11:52:52] metashare.storage.models::INFO 7f7c2c78acc011e19d89000e0c4ad22601456efdee7046ff95eb5610bb14486b is up to date
[11/15/2012 @ 11:52:52] metashare.storage.models::INFO 14ebffa4a45111e1975c000e0c4ad226d1e09a60a0954acaad14ff56f24b18a7 is up to date
[11/15/2012 @ 11:52:52] metashare.storage.models::INFO 89c8adbcacbd11e19349000e0c4ad2268249f28bb5bb4c02a3c251375d1f6b62 is up to date
[11/15/2012 @ 11:52:52] metashare.storage.models::INFO 91113c84c99a11e18127000e0c4ad2269376f0554bc74d9787c526db63f83c4b is up to date
[11/15/2012 @ 11:52:52] metashare.storage.models::INFO 3ccd9728c75911e1ae69000e0c4ad2262c2dadbb610b4c7a85dbaf4bb172e8b4 is up to date
[11/15/2012 @ 11:52:52] metashare.storage.models::INFO b189e4a6a2c211e19ae9000e0c4ad2262bb2a705e742487295c2c0965a0b77f8 is up to date
[11/15/2012 @ 11:52:52] metashare.storage.models::INFO 091eb260acae11e1bcc8000e0c4ad226ba64b1c22bcf4000a3b55843ca3ab532 is up to date
[11/15/2012 @ 11:52:52] metashare.storage.models::INFO Finished updating digests.

Therefore I believe it is fixed. If the same problem appears again (I don't think it will), I will reopen the issue.

If everything is still working fine by tomorrow, I will contact Olivier Hamon from ELDA for syncing (unless someone suggests that I wait for the 3.0.1 release).

cspurk commented Nov 15, 2012

@zeehio Thank you for the quick tests! I assume we can leave the issue closed and consider it fixed, then.

A note to other readers: harvesting from the ELDA node has worked successfully in the meantime.
