982 error this iri is already being processed #1697

Merged
eugenk merged 18 commits into staging from 982-error_this_iri_is_already_being_processed on Jul 1, 2016

@eugenk
Member

eugenk commented May 30, 2016

This is mostly a refactoring. It should finally fix #982. It:

  • introduces a more sophisticated Semaphore that can also lock and unlock in separate methods.
  • splits the Redis namespaces between the environments (production, development, test).
  • shares the Redis connection between Sidekiq and the Semaphore. Other services can easily be set up to use the same connection as well.
  • implements all Workers as intended by Sidekiq.
  • completely changes how multiple ontologies are parsed in parallel: we no longer have many complex workers that handle parallelism themselves. Instead, we use the Semaphore on the ontology Loc/Id.

I will not merge this before #1618 is merged, because both change the Worker classes and it should be much easier to rebase this on #1618 + staging.

Things to do:

  • merge Improve locId #1618
  • rebase on staging after that merge
  • merge this into develop and deploy on develop.ontohub.org.

@@ -0,0 +1,21 @@
class SequentialOntologyParsingWorker < OntologyParsingWorker
  MUTEX_KEY = "#{self}-evaluating"

Freeze mutable objects assigned to constants.

@eugenk eugenk force-pushed the 982-error_this_iri_is_already_being_processed branch 2 times, most recently from d2b96c2 to c5c2332 Compare May 30, 2016 14:01
yield
else
SEMAPHORES_NAMESPACE =
"#{Settings.redis.namespace}:#{self.to_s.downcase}".freeze

Redundant self detected.

Member Author

I'd rather leave this one in there to make sure one understands that this is the class name

@ebolloff
Member

I tested it like this: shut down Sidekiq, create a fork from a repo, add a URL map, and start Sidekiq again. With your test repo test-import-dol everything is fine, but with test-import-clif all ontologies fail and I still get this error: target ontology not found: /iri2/o1//o1.

@eugenk
Member Author

eugenk commented Jun 27, 2016

Then this is an error in the ontology and/or the ontology parsing. It's not related to this pull request. Please try mirroring a repository with some imports on develop.ontohub.org instead of doing it locally. There, we have true parallelism.

@ebolloff
Member

👍

eugenk added 15 commits July 1, 2016 12:13
@eugenk eugenk force-pushed the 982-error_this_iri_is_already_being_processed branch from c5c2332 to 4b24157 Compare July 1, 2016 10:13
@eugenk
Member Author

eugenk commented Jul 1, 2016

The commits are all "new" because I rebased on staging after merging #1618. There were conflicts, because two worker files that were corrected here were deleted on staging. They also should be deleted because the whole parsing worker structure has changed.

These were the files:

  • spec/lib/ontology_batch_parse_worker_spec.rb
  • spec/lib/worker_spec.rb.

@eugenk
Member Author

eugenk commented Jul 1, 2016

About failing tests: I think we can ignore Jenkins for now because it is being restructured anyway. Travis and local tests on the development machine should be green, though.

@eugenk eugenk merged commit e8ec59d into staging Jul 1, 2016
@eugenk eugenk deleted the 982-error_this_iri_is_already_being_processed branch July 1, 2016 11:31
tillmo pushed a commit that referenced this pull request Jan 27, 2017
* Add expiration parameter to Semaphore

After the expiration, the semaphore is unlocked. This allows dead
processes (e.g. by kill -9) to release the lock again.
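
The expiration behaviour described above can be illustrated with a small in-memory sketch. It is not the project's Semaphore (which is Redis-backed); `ExpiringLock`, its method signatures, and the injectable `clock` are hypothetical names chosen for illustration, and the class is deliberately not thread-safe.

```ruby
# Hypothetical in-memory stand-in for a lock with expiration.
# Not thread-safe; it only illustrates the expiry rule.
class ExpiringLock
  # clock: callable returning the current time in seconds (assumption,
  # injected so that expiry can be demonstrated without sleeping).
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @expires_at = {} # key => time at which the lock stops counting as held
  end

  # Returns true if the lock was acquired, false if it is currently held.
  def lock(key, expiration:)
    return false if locked?(key)
    @expires_at[key] = @clock.call + expiration
    true
  end

  def unlock(key)
    @expires_at.delete(key)
    nil
  end

  # A lock whose deadline has passed is treated as open again.
  def locked?(key)
    deadline = @expires_at[key]
    !deadline.nil? && deadline > @clock.call
  end
end
```

A process that dies without calling `unlock` leaves its entry behind, but `locked?` returns false once the deadline has passed, so a later `lock` call for the same key succeeds.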

* Namespace the redis semaphore at ontohub:semaphore.

* Code style.

* Code style: use class << self block.

* Add concurrency testing gem.

* Completely revamp Semaphore

* Put it into the same redis-namespace as sidekiq.
* Add method to check if a lock is closed.
* Namespace semaphores even deeper.

Checking if a lock is closed is not an atomic operation. To make it
atomic, we put the check into a mutexed part. In order to really exclude
the locking and unlocking from this mutexed check, these two operations
must be mutexed as well with the same key (LOCK_ACTION_KEY).
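
The idea of guarding the check, the locking, and the unlocking with one shared key can be sketched in plain Ruby as follows. This is only an in-process analogy under stated assumptions: a `Mutex` stands in for the Redis mutex behind LOCK_ACTION_KEY, `GuardedSemaphore` is a hypothetical name, and `perform_exclusively` here returns false instead of waiting when the lock is taken, which may differ from the real method.

```ruby
# Hypothetical sketch: one guard mutex plays the role of LOCK_ACTION_KEY.
class GuardedSemaphore
  def initialize
    @action_guard = Mutex.new # serializes lock, unlock and locked?
    @locked = {}              # key => true while the lock is closed
  end

  # Check-and-set happens inside the guard, so two callers cannot both win.
  def lock(key)
    @action_guard.synchronize do
      return false if @locked[key]
      @locked[key] = true
    end
  end

  def unlock(key)
    @action_guard.synchronize { @locked.delete(key) }
    nil
  end

  # Because lock and unlock use the same guard, this check never observes
  # a half-finished lock or unlock.
  def locked?(key)
    @action_guard.synchronize { @locked.key?(key) }
  end

  # Runs the block while holding the lock; returns false if it was taken.
  def perform_exclusively(key)
    return false unless lock(key)
    begin
      yield
    ensure
      unlock(key)
    end
  end
end
```

If `locked?` used a different guard than `lock` and `unlock`, the check could interleave with a lock operation in progress, which is the situation the commit message describes.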

Unfortunately, using many threads/processes in an example makes the test
suite hang. This is because of Celluloid. Maybe we can unskip the
example after upgrading to Sidekiq 4 and dropping sidetiq.

* Add settings keys for Redis.

* Use redis namespace setting for Semaphore.

* Share redis connections between Sidekiq and Semaphore

When all connections are in use, any Redis call will block until a
connection is freed.

Sidekiq is now namespaced deeper in order to prevent conflicts with the
semaphore.
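
The blocking behaviour of a shared, fixed-size pool can be sketched with the standard library alone. `TinyPool` is a hypothetical class written for this illustration; it is not the pooling code used by the project or by Sidekiq, and the "connections" are whatever objects the block passed to `new` creates.

```ruby
# Hypothetical minimal pool built on a thread-safe Queue.
class TinyPool
  # size: number of connections; the block creates one connection per call.
  def initialize(size)
    @available = Queue.new
    size.times { @available << yield }
  end

  # Checks out a connection, yields it, and always returns it to the pool.
  def with
    conn = @available.pop # blocks while every connection is checked out
    yield conn
  ensure
    @available << conn if conn
  end
end
```

With a pool of size one, a second caller of `with` waits until the first caller's block has finished, which matches the statement that any call blocks until a connection is freed.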

* Fix locked? and perform_exclusively signatures.

* Add lock and unlock methods.

* Use separate redis namespaces for our environments.

* Use Semaphore only if not testing

* Use Semaphore when parsing an Ontology

* Remove not needed method perform_async_on_queue.

* Refactor all the Worker classes for parsing

Now everything comes down to the new OntologyParsingWorker. It works on a
queue of triples containing the version id, options for the version, and
the try_count (number of the next parsing attempt).

If parsing a version fails, it is put back on the queue (at the end)
with an incremented try_count. If this happens too often for the same
version, it is handed to the SequentialOntologyParsingWorker, which
parses only one OntologyVersion at a time.
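
The requeue-then-fall-back flow can be sketched without Sidekiq. Everything here is an assumption made for illustration: the `Job` struct, the `process_queue` helper, and the `max_tries` default of 3 (the source does not state the threshold). Parsing is simulated by a block that returns true on success.

```ruby
# Hypothetical triple: version id, options, number of the next attempt.
Job = Struct.new(:version_id, :options, :try_count)

# Processes jobs in FIFO order. Failed jobs go to the end of the queue with
# an incremented try_count; after max_tries failures the version is handed
# over to the (simulated) sequential worker instead.
def process_queue(jobs, max_tries: 3)
  queue = jobs.dup
  done = []
  sequential = []
  until queue.empty?
    job = queue.shift
    if yield(job)
      done << job.version_id
    elsif job.try_count >= max_tries
      sequential << job.version_id
    else
      queue.push(Job.new(job.version_id, job.options, job.try_count + 1))
    end
  end
  [done, sequential]
end
```

In this sketch the first attempt has `try_count` 1, in line with "number of the next parsing attempt"; the returned `sequential` list stands for the versions that would be parsed one at a time.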

The OntologyParsingPriorityWorker is a more elegant way to use the
prioritized queue.

All the background parsing is now performed via the
OntologyParsingWorker or its subclasses.

* Be explicit about the namespace.

* Obey Hound.
Development

Successfully merging this pull request may close these issues.

Error "This iri <...> is already being processed"
4 participants