all DevelopmentalStage ontology elements point to the same parents #81

Closed
jkobject opened this issue Nov 27, 2023 · 16 comments · Fixed by #87
@jkobject
Contributor

jkobject commented Nov 27, 2023

Hello!

I hope you had a nice weekend, and sorry to come up with yet another question 😅 :

There seems to be an issue (which I have also seen in some versions of the HsapDv ontology) where all elements of the ontology point to the same parent, the root term HsapDv:0000000.

However, this is not the case for the version on OLS http://www.ebi.ac.uk/ols4/ontologies/hsapdv/classes?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FHsapDv_0000206.

I guess you should be able to get the corrected version there(?)

Best,

@sunnyosun
Member

Thank you so much for raising this, @jkobject! I'll look into getting the updated version for HsapDV!

@sunnyosun
Member

As far as I understand now, the version of HsapDv we have in bionty is up to date. The parents differ from those shown on the EBI site because of the relationship types involved. Only the root term HsapDv:0000000 (life cycle stage) is labeled as a "superclass" of every term, and superclass relationships are what we parsed from the OWL files. The remaining relationships are part_of, which we previously didn't take into account when determining parents.

I'm going to update the DataFrame in the next Bionty release to account for this for HsapDv.
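To illustrate the difference described above, here is a minimal sketch in plain Python. It is not bionty's parser: the `EX:` term IDs, the `EDGES` list, and the `collect_parents` helper are all hypothetical, and only the root ID HsapDv:0000000 comes from this thread. It shows how collecting parents from superclass (`is_a`) edges alone yields a flat tree, while also reading `part_of` edges recovers the nesting.

```python
ROOT = "HsapDv:0000000"  # life cycle stage (root term named in this thread)

# Hypothetical (child, relation, parent) triples, standing in for what might be
# read from an OWL file. The EX: identifiers are placeholders, not real terms.
EDGES = [
    ("EX:0000001", "is_a", ROOT),
    ("EX:0000002", "is_a", ROOT),
    ("EX:0000003", "is_a", ROOT),
    ("EX:0000002", "part_of", "EX:0000001"),
    ("EX:0000003", "part_of", "EX:0000002"),
]


def collect_parents(edges, relations):
    """Map each child term to its sorted parents, using only the given relation types."""
    parents = {}
    for child, relation, parent in edges:
        if relation in relations:
            parents.setdefault(child, set()).add(parent)
    return {child: sorted(found) for child, found in parents.items()}


# Superclass edges only: every term points at the root.
is_a_only = collect_parents(EDGES, {"is_a"})

# Superclass and part_of edges: the nested structure appears.
with_part_of = collect_parents(EDGES, {"is_a", "part_of"})
```

With `is_a` alone, every term has the root as its single parent, which matches the symptom reported in this issue. Adding `part_of` gives each nested term an additional, more specific parent.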

@jkobject
Contributor Author

jkobject commented Nov 28, 2023

thanks!
It may or may not be related, but I also saw that some cell type relationships differ from EBI's.
For example, retinal rod cells are a type of neuron, which I would expect to see as a parent relationship.
Screenshot 2023-11-28 at 10 45 22

However, right now the only parent I get is camera-type eye photoreceptor, after running this:

import lamindb as ln
import lnschema_bionty as lb

records = lb.CellType.from_values(names, field=lb.CellType.ontology_id)
ln.save(records)

(names is actually the list of all cell types in cellxgene)
Screenshot 2023-11-28 at 10 47 34
But when I look for the "camera-type eye photoreceptor" element in lb, it doesn't have any parents.

@jkobject
Contributor Author

Maybe my issue is not very clear...
Let me know if this is similar to the dev-stage issue or something different.

@sunnyosun
Member

sunnyosun commented Nov 29, 2023

Sorry for the delayed response. The missing parents of cell types were due to a previous cleanup of the laminlabs/cellxgene instance; they are now all restored:

Screenshot 2023-11-29 at 14 58 00

You can pass a larger number to distance= (default is 5) to see more upstream parents.
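As a rough illustration of what a distance limit means when walking upstream, here is a small sketch in plain Python. It is not lamindb's implementation: the `PARENTS` mapping is a simplified, hypothetical hierarchy built from names mentioned in this thread, and `ancestors_within` is a helper invented for this example.

```python
from collections import deque

# Hypothetical child -> parents mapping (simplified, not the real Cell Ontology).
PARENTS = {
    "retinal rod cell": ["camera-type eye photoreceptor cell"],
    "camera-type eye photoreceptor cell": ["photoreceptor cell"],
    "photoreceptor cell": ["neuron"],
    "neuron": ["cell"],
}


def ancestors_within(term, parents, distance=5):
    """Walk upstream breadth-first and return {ancestor: steps} for at most `distance` steps."""
    found = {}
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == distance:
            continue  # do not expand terms that already sit at the limit
        for parent in parents.get(current, []):
            if parent not in found:
                found[parent] = depth + 1
                queue.append((parent, depth + 1))
    return found


near = ancestors_within("retinal rod cell", PARENTS, distance=2)
far = ancestors_within("retinal rod cell", PARENTS, distance=5)
```

In this toy hierarchy, "neuron" is three steps above "retinal rod cell", so it is absent from `near` but present in `far`.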

@jkobject
Copy link
Contributor Author

I am not sure how to access this. I still have the issue.
Should I install lamin 0.63?
Deleting and then reloading all cell types by doing:

records = lb.CellType.from_values(names, field=lb.CellType.ontology_id)
ln.save(records)

does not seem to solve it, at least.

@sunnyosun
Member

sunnyosun commented Nov 30, 2023

Which instance are you working on? The screenshot was from the laminlabs/cellxgene instance (note the renaming of laminlabs/cellxgene-census to laminlabs/cellxgene).

If you are working on your own local instance, could you make sure you didn't set auto_save_parents to False? Cleaning up the registry and re-adding the terms should automatically populate parents. (Yes, updating to the latest lamindb is always a good idea:))

Could you try the following and see if you get parents?

# clean up registry
lb.CellType.filter().delete()

# add a new cell type
record = lb.CellType.from_bionty(name="T cell")
record.save() # you should see a warning saying parents are being saved

# check
record.view_parents()

Here is a screenshot from me trying to run this snippet on a local instance with lamindb 0.63.1:
Screenshot 2023-11-30 at 12 56 18

This is running from_values, which should add parents as well:
Screenshot 2023-11-30 at 12 58 25

@jkobject
Contributor Author

jkobject commented Nov 30, 2023

Hi, when I said "celltypes in cellxgene" I actually meant downloading all unique cell types from the cellxgene census API.
I am working on my own private local instance.

I have tried multiple times to remove and then re-add the elements but I still see the same thing.

Even for B-cell it is different:
Screenshot 2023-11-30 at 13 50 56

It seems it populates my relationships, but maybe not using everything?

Here are my tool versions:
lamin_cli 0.2.3
lamin_utils 0.11.7
lamindb 0.63.0
lamindb_setup 0.60.0
lnschema_bionty 0.35.0
lnschema_core 0.57.3

@sunnyosun
Member

sunnyosun commented Nov 30, 2023

Hi @jkobject, it seems that you are on an older version of the "Cell Ontology"; I can reproduce your hierarchy using the earlier version "2023-04-20":
Screenshot 2023-11-30 at 14 06 22

The cause is that lnschema_bionty wasn't upgraded when you upgraded lamindb: lnschema_bionty 0.35.2 should be installed with lamindb 0.63.0, see https://github.com/laminlabs/lamindb/blob/9543995309abfe456af5223fe86f13f2fea5c031/pyproject.toml#L39C21-L39C21

I'd suggest you run pip install 'lamindb[bionty]' -U to install the latest lamindb and its dependencies.

Once you have upgraded lnschema_bionty, run:

import bionty as bt
bt.reset_sources()

# Run via CLI: lamin load <your instance>

import lnschema_bionty as lb
lb.dev.sync_bionty_source_to_latest()

# Run via CLI: lamin load <your instance>

The above adds the latest ontology sources to your local instance. (Make sure you run lamin load on the CLI as noted in the comments.)

You can check which version of the ontology is being used by printing lb.CellType.bionty(); the latest version should be "2023-08-24", which gives the hierarchies I pasted previously.

Let me know if this works!
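A version mismatch like the one described above (lnschema_bionty 0.35.0 installed where 0.35.2 was expected) can be spotted with the Python standard library. The sketch below is only an illustration: `installed_version`, `as_tuple`, and `is_older` are helper names made up for this example, and the comparison assumes plain dotted numeric versions such as "0.35.2" (it does not handle pre-release or local version suffixes).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def as_tuple(version_string):
    """Turn a plain dotted version such as '0.35.2' into (0, 35, 2)."""
    return tuple(int(part) for part in version_string.split("."))


def is_older(installed, required):
    """True if `installed` is strictly older than `required` (numeric versions only)."""
    return as_tuple(installed) < as_tuple(required)


if __name__ == "__main__":
    # Print what is installed in the current environment (None if missing).
    for name in ("lamindb", "lnschema_bionty"):
        print(name, installed_version(name))
```

For the versions quoted in this thread, `is_older("0.35.0", "0.35.2")` evaluates to `True`, which is the situation that left the older Cell Ontology source in use.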

@jkobject
Contributor Author

it worked! 🚀

@jkobject
Contributor Author

Hello @sunnyosun, I am reopening the thread here as we went off on a tangent, but for the developmental stage and tissue ontologies I still see a broken parental tree with the latest version of bionty. Let me know if it has something to do with my setup or with the way the ontology has been constructed :)

Also, let me know how I can help; if you tell me where the code is, I can try to make a PR.

@sunnyosun
Member

sunnyosun commented Aug 13, 2024

@Zethson Could you include part_of when parsing parents and update the most recent version of DevelopmentalStage ontologies?

@sunnyosun sunnyosun transferred this issue from laminlabs/lamindb Aug 13, 2024
@Zethson Zethson self-assigned this Aug 13, 2024
@Zethson
Member

Zethson commented Aug 13, 2024

@jkobject thank you for reporting the issue again and offering to help! I'll have a look ASAP. I might use your existing PR (laminlabs/bionty-base#540) as a base and then go from there.

I'll keep you posted...

@jkobject
Contributor Author

jkobject commented Aug 14, 2024

Hello @Zethson,

Thanks for the help; let me know if you need anything from me or if you have questions about the PR I opened. (I might have a look at it as well.)
You mentioned an issue with the data file size if we were to include the part_of relationship. Do you think it will be a problem? (I think only the DevStage and Tissue ontologies need to use the part_of relationship.)

@Zethson
Member

Zethson commented Aug 15, 2024

@jkobject I've updated Bionty to also include other relationships (part_of in this case) for DevelopmentalStage and Tissue. With this, I have added new versions of the two ontologies that should include the desired relationships.

In practice, this means that you can use https://docs.lamin.ai/bionty.core.biorecord#bionty.core.BioRecord.import_from_source to get the latest versions into your lamin instance.

Assuming that you have loaded your instance, it should look something along the lines of:

import bionty as bt

new_human_ds = bt.Source.filter(entity="bionty.DevelopmentalStage", organism="human", version="2024-05-28").one()
new_mouse_ds = bt.Source.filter(entity="bionty.DevelopmentalStage", organism="mouse", version="2024-05-28").one()

bt.DevelopmentalStage.import_from_source(new_human_ds)
bt.DevelopmentalStage.import_from_source(new_mouse_ds)

new_tissue = bt.Source.filter(entity="bionty.Tissue", version="2024-08-07").one()
bt.Tissue.import_from_source(new_tissue)

If you are using standalone bionty (bionty.base), you can simply specify the new versions after we have made a new release. Technically, you could also adapt your local sources to include the new versions, but if it's not urgent, it might be easier to just wait.

The implementation is based on your earlier PR and I added you as a co-author to it. We cannot transfer PRs from archived repositories to this one, sorry.

Please ask away if something is unclear or doesn't work as expected!

@jkobject
Contributor Author

Awesome thanks! I will check it as soon as I can.
