## Analyze Synonym Types in Mondo Edit

### Overview
Analyze the synonym types in `mondo-edit.obo` to understand which synonym types are used in Mondo and how they are applied.


### Source data
Use ROBOT to convert the current `mondo-edit.obo` to OWL:\
`robot convert -i mondo-edit.obo -o TEST-mondo-edit.owl`

Use ROBOT and a SPARQL query (see the `sparql` directory) to extract all classes, labels, synonyms, synonym scopes, synonym types, and synonym sources from the OWL file:\
`robot query -i TEST-mondo-edit.owl -q mondo_get-synonyms.sparql mondo-classes-and-synonyms.tsv`

In [1]:
```python
# Imports
import pandas as pd
import numpy as np
```

In [2]:
```python
# Load the ROBOT query results (tab-separated) and preview the first rows
df = pd.read_csv('../data/mondo-classes-and-synonyms.tsv', sep='\t')
df.head(10)
```

| | ?curie | ?label | ?synonym | ?synonymScopeName | ?synonymType | ?synonymSourceFixed |
|---|---|---|---|---|---|---|
| 0 | MONDO:0000001 | disease | condition | hasExactSynonym | | NCIT:C2991 |
| 1 | MONDO:0000001 | disease | disease | hasExactSynonym | | NCIT:C2991 |
| 2 | MONDO:0000001 | disease | disease or disorder | hasExactSynonym | | NCIT:C2991 |
| 3 | MONDO:0000001 | disease | disease or disorder, non-neoplastic | hasExactSynonym | | NCIT:C2991 |
| 4 | MONDO:0000001 | disease | diseases | hasExactSynonym | | NCIT:C2991 |
| 5 | MONDO:0000001 | disease | diseases and disorders | hasExactSynonym | | NCIT:C2991 |
| 6 | MONDO:0000001 | disease | disorder | hasExactSynonym | | NCIT:C2991 |
| 7 | MONDO:0000001 | disease | disorders | hasExactSynonym | | NCIT:C2991 |
| 8 | MONDO:0000001 | disease | medical condition | hasExactSynonym | | No Source |
| 9 | MONDO:0000001 | disease | other disease | hasExactSynonym | | NCIT:C2991 |
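
One caveat when loading synonym tables with pandas: by default `read_csv` converts strings such as `NA`, `N/A`, `null`, and `nan` into missing values, so a synonym (for example an abbreviation) that is literally `NA` would silently become `NaN`. Whether the Mondo synonym list contains such strings was not checked here. The sketch below uses a small made-up TSV in place of the real file and shows how `keep_default_na=False` together with `na_values=['']` keeps those strings while still treating empty cells (such as a missing synonym type) as missing.

```python
import io

import pandas as pd

# Made-up TSV standing in for mondo-classes-and-synonyms.tsv (hypothetical rows).
tsv = (
    "?curie\t?label\t?synonym\t?synonymScopeName\t?synonymType\t?synonymSourceFixed\n"
    "MONDO:0000001\tdisease\tNA\thasExactSynonym\t\tNo Source\n"
    "MONDO:0000001\tdisease\tdisorder\thasExactSynonym\t\tNCIT:C2991\n"
)

# Default parsing: the literal synonym "NA" is turned into NaN.
default_df = pd.read_csv(io.StringIO(tsv), sep='\t')

# Only empty cells are treated as missing; "NA" stays a string.
safe_df = pd.read_csv(io.StringIO(tsv), sep='\t', keep_default_na=False, na_values=[''])

print(default_df['?synonym'].tolist())
print(safe_df['?synonym'].tolist())
print(safe_df['?synonymType'].isna().all())
```

With this setting the empty `?synonymType` cells still come through as `NaN`, so `isna()`-based filters keep working.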


In [3]:
```python
# Count the distinct non-null values in each column
df.nunique()
```

```
?curie                  20721
?label                  20718
?synonym               109315
?synonymScopeName           4
?synonymType                9
?synonymSourceFixed     35646
dtype: int64
```
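
`nunique()` ignores missing values by default, so the 9 reported for `?synonymType` counts only the declared types and says nothing about how many synonyms have no type at all. A minimal sketch of how that split could be counted, shown on a hypothetical four-row frame rather than the real data:

```python
import numpy as np
import pandas as pd

ABBREVIATION = '<http://purl.obolibrary.org/obo/mondo#ABBREVIATION>'

# Hypothetical rows; in the notebook, df is read from mondo-classes-and-synonyms.tsv.
df = pd.DataFrame({
    '?curie': ['MONDO:0000001', 'MONDO:0000001', 'MONDO:0000022', 'MONDO:0000022'],
    '?synonymType': [np.nan, ABBREVIATION, np.nan, np.nan],
})

# nunique() skips NaN unless dropna=False is passed.
n_types = df['?synonymType'].nunique()                       # 1
n_types_with_na = df['?synonymType'].nunique(dropna=False)   # 2

# Synonym rows with and without a declared synonym type.
untyped = df['?synonymType'].isna()
print(n_types, n_types_with_na)
print(f"untyped: {untyped.sum()}, typed: {(~untyped).sum()}, untyped share: {untyped.mean():.0%}")
```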

In [4]:
```python
# Get a list of all synonym types in Mondo
sorted([x for x in df['?synonymType'].unique() if pd.notna(x)])
```

```
['<http://purl.obolibrary.org/obo/OMO_0003005>',
 '<http://purl.obolibrary.org/obo/mondo#ABBREVIATION>',
 '<http://purl.obolibrary.org/obo/mondo#AMBIGUOUS>',
 '<http://purl.obolibrary.org/obo/mondo#CLINGEN_LABEL>',
 '<http://purl.obolibrary.org/obo/mondo#DEPRECATED>',
 '<http://purl.obolibrary.org/obo/mondo#DUBIOUS>',
 '<http://purl.obolibrary.org/obo/mondo#EXCLUDE>',
 '<http://purl.obolibrary.org/obo/mondo#MISSPELLING>',
 '<http://purl.obolibrary.org/obo/mondo#NON_HUMAN>']
```
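
The full IRIs are awkward to read in summary tables. One way to get short names and per-type counts is sketched below. `short_type_name` is a hypothetical helper that keeps the text after the last `#` (or, failing that, the last `/`), so `mondo#ABBREVIATION` becomes `ABBREVIATION` and the OMO term becomes `OMO_0003005`; the sample values are made up.

```python
import numpy as np
import pandas as pd


def short_type_name(iri):
    """Hypothetical helper: '<http://.../mondo#ABBREVIATION>' -> 'ABBREVIATION'."""
    if pd.isna(iri):
        return iri  # leave missing synonym types as NaN
    iri = iri.strip('<>')
    # Prefer the fragment after '#', otherwise take the last path segment.
    return iri.rsplit('#', 1)[-1] if '#' in iri else iri.rsplit('/', 1)[-1]


# Made-up sample; in the notebook this would be df['?synonymType'].
types = pd.Series([
    '<http://purl.obolibrary.org/obo/mondo#ABBREVIATION>',
    '<http://purl.obolibrary.org/obo/mondo#ABBREVIATION>',
    '<http://purl.obolibrary.org/obo/OMO_0003005>',
    np.nan,
])

short = types.map(short_type_name)
# Synonym rows per type; dropna=False keeps a count for untyped synonyms.
counts = short.value_counts(dropna=False)
print(counts)
```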

In [5]:
```python
# Count ABBREVIATION-typed synonyms by synonym scope.
# Note: value_counts() counts rows of the query result, not unique classes.
filtered_df = df[df['?synonymType'] == '<http://purl.obolibrary.org/obo/mondo#ABBREVIATION>']

synonym_scope_counts = filtered_df['?synonymScopeName'].value_counts()

synonym_scope_counts
```

```
?synonymScopeName
hasExactSynonym      9809
hasRelatedSynonym    5981
hasNarrowSynonym      603
hasBroadSynonym       104
Name: count, dtype: int64
```
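
The counts above are synonym rows, not classes. If the question is how many distinct classes carry an ABBREVIATION synonym in each scope, a `groupby` with `nunique` on `?curie` answers it, and `pd.crosstab` extends the row counts to every synonym type at once. A sketch on a small invented frame (the IDs `MONDO:A` and `MONDO:B` are placeholders):

```python
import pandas as pd

ABBREV = '<http://purl.obolibrary.org/obo/mondo#ABBREVIATION>'
AMBIG = '<http://purl.obolibrary.org/obo/mondo#AMBIGUOUS>'

# Invented rows for illustration only.
df = pd.DataFrame({
    '?curie': ['MONDO:A', 'MONDO:A', 'MONDO:B', 'MONDO:B'],
    '?synonymScopeName': ['hasExactSynonym', 'hasExactSynonym',
                          'hasExactSynonym', 'hasRelatedSynonym'],
    '?synonymType': [ABBREV, ABBREV, ABBREV, AMBIG],
})

abbrev = df[df['?synonymType'] == ABBREV]

# Rows (synonyms) per scope versus distinct classes per scope.
rows_per_scope = abbrev['?synonymScopeName'].value_counts()
classes_per_scope = abbrev.groupby('?synonymScopeName')['?curie'].nunique()

# Synonym rows for every type x scope combination (missing combinations are 0).
type_by_scope = pd.crosstab(df['?synonymType'], df['?synonymScopeName'])

print(rows_per_scope)
print(classes_per_scope)
print(type_by_scope)
```

Here class `MONDO:A` has two exact abbreviations, so the row count (3) is higher than the class count (2); the same gap can appear in the real data whenever a class has several abbreviations in one scope.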

In [6]:
```python
# Which synonyms have neither a source xref ('No Source') nor a synonym type?
filtered_df = df[(df['?synonymSourceFixed'] == 'No Source') & (df['?synonymType'].isna() | (df['?synonymType'] == ''))]

filtered_df
```

| | ?curie | ?label | ?synonym | ?synonymScopeName | ?synonymType | ?synonymSourceFixed |
|---|---|---|---|---|---|---|
| 8 | MONDO:0000001 | disease | medical condition | hasExactSynonym | | No Source |
| 28 | MONDO:0000022 | nocturnal enuresis | bedwetting | hasExactSynonym | | No Source |
| 29 | MONDO:0000022 | nocturnal enuresis | enuresis, nocturnal | hasExactSynonym | | No Source |
| 111 | MONDO:0000155 | triglyceride storage disease | inborn sequestering of triglyceride disorder | hasExactSynonym | | No Source |
| 163 | MONDO:0000193 | cortisone reductase deficiency | hyperandrogenism due to cortisone reductase de... | hasExactSynonym | | No Source |
| ... | ... | ... | ... | ... | ... | ... |
| 141324 | MONDO:0700226 | food allergy | allergic disease from food material | hasExactSynonym | | No Source |
| 141470 | MONDO:0800043 | Stüve-Wiedemann syndrome 1 | Stuve-Wiedemann syndrome | hasBroadSynonym | | No Source |
| 141698 | MONDO:0800341 | congenital myopathy 4A, autosomal dominant | cap myopathy 1 | hasExactSynonym | | No Source |
| 141892 | MONDO:0850302 | intracranial meningioma | meningioma (disease) of brain | hasExactSynonym | | No Source |
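
To see how these unsourced, untyped synonyms are spread across synonym scopes, the filtered rows can be summarized with `value_counts`, with `normalize=True` giving proportions. The sketch below uses only four rows copied from the output above, so the resulting numbers describe the sample, not Mondo.

```python
import pandas as pd

# Four rows copied from the filtered output above; the real filtered_df has many more.
filtered_df = pd.DataFrame({
    '?curie': ['MONDO:0000001', 'MONDO:0000022', 'MONDO:0000022', 'MONDO:0800043'],
    '?synonym': ['medical condition', 'bedwetting', 'enuresis, nocturnal',
                 'Stuve-Wiedemann syndrome'],
    '?synonymScopeName': ['hasExactSynonym', 'hasExactSynonym',
                          'hasExactSynonym', 'hasBroadSynonym'],
})

# Synonym rows per scope, as counts and as shares of all filtered rows.
scope_counts = filtered_df['?synonymScopeName'].value_counts()
scope_shares = filtered_df['?synonymScopeName'].value_counts(normalize=True)

print(scope_counts)
print(scope_shares.round(2))
```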


In [7]:
```python
# How many unique Mondo IDs are in filtered_df?
# (filtered_df holds synonyms that have neither a source xref nor a synonym type)

# ?curie holds the Mondo class ID, so count its distinct values
filtered_df['?curie'].nunique()
```

```
2194
```
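
For scale, 2,194 is about 10.6% of the 20,721 distinct `?curie` values that `nunique()` reports for the full table. The sketch below computes that share directly and ranks classes by how many unsourced, untyped synonyms they carry, which could serve as a curation worklist. `df` and `filtered_df` are replaced by tiny hypothetical frames so the code runs on its own.

```python
import pandas as pd

# Hypothetical stand-ins for the full table (df) and the filtered rows (filtered_df).
df = pd.DataFrame({'?curie': ['MONDO:0000001', 'MONDO:0000001', 'MONDO:0000022',
                              'MONDO:0000155', 'MONDO:0000193']})
filtered_df = pd.DataFrame({'?curie': ['MONDO:0000001', 'MONDO:0000022', 'MONDO:0000022']})

# Fraction of classes with at least one unsourced, untyped synonym.
share = filtered_df['?curie'].nunique() / df['?curie'].nunique()

# Classes ranked by the number of such synonyms, largest first.
per_class = filtered_df.groupby('?curie').size().sort_values(ascending=False)

print(f"{share:.1%}")
print(per_class.head(10))
```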