Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate metadata - idempotent #66

Open
atarkowska opened this issue Jan 4, 2017 · 9 comments
Open

Populate metadata - idempotent #66

atarkowska opened this issue Jan 4, 2017 · 9 comments

Comments

@atarkowska
Copy link
Member

atarkowska commented Jan 4, 2017

Populate_metadata plugin allows populate map annotations based on a bulk file with the given context:

# step 1. create bulk file
bin/omero metadata populate --file /path/to/annotation.csv Screen:ID
# step 2. populate maps
bin/omero metadata populate --context bulkmap --cfg /path/to/bulkmap-config.yml Screen:ID

Second step could be idempotent, rerunning the same command bin/omero metadata populate --context bulkmap --cfg /path/to/bulkmap-config.yml Screen:ID shouldn't thrown any exceptions.

cc @manics @joshmoore @eleanorwilliams

@manics
Copy link
Member

manics commented Jan 4, 2017

Issues to consider:

  • If entries in bulkmap-config.yml are changed or deleted (but not added) should the annotations be modified/deleted/recreated?
  • What happens if the primary key definition in one config file is changed? Ideally the primary keys should be defined in a global file but is there a short-term workaround? If we do implement global primary keys should it be possible to change the definition at all?

@atarkowska
Copy link
Member Author

I am not sure at the moment about updates, this is much more complicated. Keep in mind that annotations are shared and if PK is updated then this will be completely new annotation.

@atarkowska
Copy link
Member Author

atarkowska commented Jan 4, 2017

One possible solution for update could be that if namespace is considered unique per Image/Well then script could proceed as normal, deleting/unlinking (see delete issues in #67) and creating new one

@manics
Copy link
Member

manics commented Jan 4, 2017

If the requirement is to only create an annotation for namespace 'xxxx' if an image/well doesn't already have an annotation in xxxx regardless of whether it's up to date or not that should be easier to implement.

@atarkowska atarkowska changed the title Populate metadata idempotent Populate metadata - idempotent Jan 4, 2017
@atarkowska
Copy link
Member Author

atarkowska commented Jan 4, 2017

So going more to the details

Use case 1: general o.org/omero/bulk_annotations

Modifying config as follow:

---
columns:
  - name: Channels
     include: yes
+ - name: Comment
+    include: yes

all of the existing openmicroscopy.org/omero/bulk_annotation would need to be reviewed and updated with the respect of omitempty flag. And the same if column is removed. That is simple usecase as there is always single AnnotationLink

The same use case but PK:

   - group:
       namespace: openmicroscopy.org/mapr/sirna
       columns:
       - name: siRNA Identifier
         include: yes
+        omitempty: no
+      - name: siRNA Identifier
+        clientname: siRNA Pool Identifier
+        clientvalue: ""
+        include: yes
+        omitempty: no

There are 2 cases here:

Use case 2:

Modifying config as follow:

   - group:
       namespace: openmicroscopy.org/mapr/gene
       columns:
       - name: Gene Identifier
         include: yes
+        omitempty: no
       - name: Gene Identifier
         clientname: Gene Identifier URL
         clientvalue: http://www.ensembl.org/id/{{ value|urlencode }}
         include: yes
       - name: Gene Symbol
         include: yes
+        omitempty: no

I think this is simple as it would need to add whatever was missing.

Use case 3:

Modifying config as follow:

---
columns:
  - name: Channels
     include: yes
  - group:
      namespace: openmicroscopy.org/mapr/sirna
      columns:
      - name: siRNA Identifier
        include: yes
+       omitempty: no
+       split: ;

script would need to review and decide if map already exists delete or update? Perhaps depends if this is PK and Map is linked to multiple images or not

 advanced:
     ignore_missing_primary_key: yes
     primary_group_keys:
     - namespace: openmicroscopy.org/mapr/sirna
       keys:
       - siRNA Identifier

The same use case can be analyze in opposite way where split: ; is removed from the config.

Use case 4:

Updating cvs -> bulk_annotations, no changes to the config.

In general I think each map could be reviewed if PK is changing or not:

  • if PK value change, consider unlinking and creating brand new map or update value in the existing one if linked only to one entity
  • if non-PK value, consider unlinking and creating new one?
  • if one AnnotationLink then always delete?

@atarkowska
Copy link
Member Author

atarkowska commented Jan 5, 2017

What happens if the primary key definition in one config file is changed? Ideally the primary keys should be defined in a global file but is there a short-term workaround? If we do implement global primary keys should it be possible to change the definition at all?

Existing implementation doesn't allow to define different PK for the same NS, so it looks like it should be global. It is very likely to get an exception https://github.com/openmicroscopy/openmicroscopy/blob/metadata52/components/tools/OmeroPy/src/omero/util/metadata_mapannotations.py#L218.

Would be sufficient to extend default config by adding primary_group_keys and then inherit it in each individual study?

- default config - minimum predefiend by the plugin
  - global config - any attributes that a common including columns, 
    - study config

We have to have a way of updating PKs otherwise fixing bugs as above won't be possible. 

@eleanorwilliams
Copy link

Two concrete examples of usecase 4 that I have:

1. updating some gene symbols
In this case 5 genes, corresponding to 5 wells need their gene symbols updated. Changed in CSV but no bulkmap config change. PR is here IDR/idr-metadata#198. Only 5 values out of about 22000 for gene symbol will change.

2. changing channel information annotations
I'm about to go through all the channel information annotations to look at consistency in naming etc. Its possible that I might update a csv changing the 'Channel' information values for all wells.

I am happy with any of the following but

i. deleting all annotations for the whole screen and reloading (obviously this will take time in some screens)

ii. allowing deletion of a certain annotation type e.g. Channel info or Gene Identifier/Gene Symbol and reloading just that

iii. allowing deletion of a certain annotation type at a more precise level e.g. on certain wells for the case of Gene Symbol changing in a small % of wells.

@atarkowska
Copy link
Member Author

atarkowska commented Jan 5, 2017

I am happy with any of the following but
i. deleting all annotations for the whole screen and reloading (obviously this will take time in some screens)
ii. allowing deletion of a certain annotation type e.g. Channel info or Gene Identifier/Gene Symbol and reloading just that
iii. allowing deletion of a certain annotation type at a more precise level e.g. on certain wells for the case of Gene Symbol changing in a small % of wells.

As we don't operate on diffs but entire bulk file I think entire screen has to be scanned everytime to detect what data has changed. Do we have any way to detect the diff?

@atarkowska
Copy link
Member Author

Just giving more feedback. running populate metadata against already annotated data result in

Traceback (most recent call last):
  File "/home/omero/workspace/OMERO-server/OMERO.server/bin/omero", line 125, in <module>
    rv = omero.cli.argv()
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1489, in argv
    cli.invoke(args[1:])
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 974, in invoke
    stop = self.onecmd(line, previous_args)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1051, in onecmd
    self.execute(line, previous_args)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1133, in execute
    args.func(args)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/plugins/metadata.py", line 439, in populate
    ctx.write_to_omero(batch_size=args.batch, loops=loops, ms=ms)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/util/populate_metadata.py", line 1406, in write_to_omero
    sz = self._save_annotation_and_links(batch, ma, batch_size)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/util/populate_metadata.py", line 1287, in _save_annotation_and_links
    batch, {'omero.group': group})
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero_api_IUpdate_ice.py", line 138, in saveArray
    return _M_omero.api.IUpdate._op_saveArray.invoke(self, ((graph, ), _ctx))
omero.ValidationException: exception ::omero::ValidationException
{
    serverStackTrace = ome.conditions.ValidationException: could not insert: [ome.model.annotations.ImageAnnotationLink]; SQL [insert into imageannotationlink (child, creation_id, external_id, group_id, owner_id, permissions, update_id, parent, version, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; constraint [imageannotationlink_parent_child_owner_id_key]; nested exception is org.hibernate.exception.ConstraintViolationException: could not insert: [ome.model.annotations.ImageAnnotationLink]
	at org.springframework.orm.hibernate3.SessionFactoryUtils.convertHibernateAccessException(SessionFactoryUtils.java:637)
	at org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:412)
	at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:117)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:108)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at ome.tools.hibernate.ProxyCleanupFilter$Interceptor.invoke(ProxyCleanupFilter.java:249)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:121)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
	at com.sun.proxy.$Proxy91.saveArray(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor1084.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
	at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:93)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:43)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
	at com.sun.proxy.$Proxy91.saveArray(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor1086.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:172)
	at ome.services.throttling.Callback.run(Callback.java:56)
	at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:56)
	at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:140)
	at ome.services.blitz.impl.UpdateI.saveArray_async(UpdateI.java:68)
	at sun.reflect.GeneratedMethodAccessor1085.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
	at omero.cmd.CallContext.invoke(CallContext.java:78)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
	at com.sun.proxy.$Proxy92.saveArray_async(Unknown Source)
	at omero.api._IUpdateTie.saveArray_async(_IUpdateTie.java:98)
	at omero.api._IUpdateDisp.___saveArray(_IUpdateDisp.java:212)
	at omero.api._IUpdateDisp.__dispatch(_IUpdateDisp.java:387)
	at IceInternal.Incoming.invoke(Incoming.java:221)
	at Ice.ConnectionI.invokeAll(ConnectionI.java:2536)
	at Ice.ConnectionI.dispatch(ConnectionI.java:1145)
	at Ice.ConnectionI.message(ConnectionI.java:1056)
	at IceInternal.ThreadPool.run(ThreadPool.java:395)
	at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
	at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:832)
	at java.lang.Thread.run(Thread.java:745)

    serverExceptionClass = ome.conditions.ValidationException
    message = could not insert: [ome.model.annotations.ImageAnnotationLink]; SQL [insert into imageannotationlink (child, creation_id, external_id, group_id, owner_id, permissions, update_id, parent, version, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; constraint [imageannotationlink_parent_child_owner_id_key]; nested exception is org.hibernate.exception.ConstraintViolationException: could not insert: [ome.model.annotations.ImageAnnotationLink]
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants