## A one-off script to fix the parent classes

Due to restrictions in older versions of the DDE, only classes from schema.org or from a whitelist of registered namespaces could be extended. As a result, the recorded parent class of a profile had to be changed to the parent class of its type. This misrepresents the hierarchy, because profiles are subclasses of types.

For example, the CourseMaterial profile is meant to be a more tailored specification based on schema.org's LearningResource type, so its true parent is LearningResource rather than LearningResource's own parent class.

For this reason, the tables were originally generated using only schema.org classes as parents to ensure compatibility with the DDE, even though this is not the most accurate representation.

The DDE has since gained many features that improve its compatibility with the needs of the NIAID systems biology DDWG and Bioschemas communities. The parent classes for all the schemas in the DDE therefore need to be updated to ensure accurate representation.

In [2]:
import os
import json
import pandas as pd

In [3]:
def convert_to_raw(githuburl):
    """Convert a GitHub page URL into its raw.githubusercontent.com equivalent."""
    githubrawurl = githuburl.replace('github.com','raw.githubusercontent.com').replace('blob/','').replace('tree/','')
    return githubrawurl

def load_parent_source():
    """Load the profile-to-parent mapping table from the Bioschemas repository."""
    parent_source_url = 'https://github.com/BioSchemas/bioschemas.github.io/blob/profile-auto-generation/_data/metadata_mapping.csv'
    # Only the class name and the two candidate parent columns are needed
    parent_source_df = pd.read_csv(convert_to_raw(parent_source_url), header=0, usecols=['profile','TypeParent','ProfileParent'])
    return parent_source_df

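def lookup_parent(parent_source_options, classname, spectype):
    # Rows of the mapping table whose 'profile' column matches this class name
    parent_choices = parent_source_options.loc[parent_source_options['profile'] == classname]
    if parent_choices.empty:
        # Fail with a clear message instead of an opaque IndexError from iloc[0]
        raise KeyError(f"No parent mapping found for '{classname}'")
    # Profiles take the profile's parent; everything else takes the type's parent
    if spectype == 'Profile':
        parent = parent_choices.iloc[0]['ProfileParent']
    else:
        parent = parent_choices.iloc[0]['TypeParent']
    return parent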
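
To make the two helpers concrete, the sketch below restates them compactly and runs them against a tiny in-memory mapping instead of the downloaded CSV. The single mapping row and its `schema:`-prefixed values are illustrative assumptions; the real `metadata_mapping.csv` may format its values differently.

```python
import pandas as pd


def convert_to_raw(githuburl):
    # Same string rewrite as the helper above
    return (githuburl.replace('github.com', 'raw.githubusercontent.com')
                     .replace('blob/', '')
                     .replace('tree/', ''))


def lookup_parent(parent_source_options, classname, spectype):
    # Same selection logic as the helper above, written more compactly
    parent_choices = parent_source_options.loc[parent_source_options['profile'] == classname]
    column = 'ProfileParent' if spectype == 'Profile' else 'TypeParent'
    return parent_choices.iloc[0][column]


# Illustrative mapping row (assumed values, not taken from the real CSV)
example_mapping = pd.DataFrame({
    'profile': ['CourseMaterial'],
    'TypeParent': ['schema:CreativeWork'],
    'ProfileParent': ['schema:LearningResource'],
})

raw_url = convert_to_raw(
    'https://github.com/BioSchemas/bioschemas.github.io/blob/'
    'profile-auto-generation/_data/metadata_mapping.csv'
)
profile_parent = lookup_parent(example_mapping, 'CourseMaterial', 'Profile')
type_parent = lookup_parent(example_mapping, 'CourseMaterial', 'Type')

print(raw_url)
print(profile_parent, type_parent)
```

The same class name resolves to a different parent depending on whether the row is treated as a profile or as a type, which is the distinction the corrected tables are meant to capture.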

In [5]:
# Directory holding the list files; an empty string means the current working directory
script_path = ''
filenames = ['deprecated.txt','profile_list.txt','draft_profile_list.txt','type_list.txt','draft_type_list.txt']
parent_source_df = load_parent_source()

for eachfile in filenames:
    # Each list file is tab-separated with a header row
    originaldf = pd.read_csv(os.path.join(script_path,eachfile),delimiter='\t',header=0)
    newdf = originaldf[['namespace','name','type','version','url']].copy()
    # Look up the corrected parent for every row from its name and spec type
    newdf['subClassOf'] = originaldf.apply(lambda row: lookup_parent(parent_source_df,row['name'],row['type']),axis=1)
    ordereddf = newdf[['namespace','name','subClassOf','type','version','url']].copy()
    # Overwrite the original file in place
    ordereddf.to_csv(os.path.join(script_path,eachfile),sep='\t',header=True,index=False)
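
Because the loop overwrites each list file in place, it can help to check beforehand that every name has a row in the mapping table. The following is a minimal sketch of such a check, not part of the original script. It uses in-memory stand-ins: the TSV text, the `NotInMapping` entry, the example.org URLs, and the mapping values are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for one of the tab-separated list files
tsv_text = (
    "namespace\tname\ttype\tversion\turl\n"
    "bioschemas\tCourseMaterial\tProfile\t0.1\thttps://example.org/CourseMaterial\n"
    "bioschemas\tNotInMapping\tType\t0.1\thttps://example.org/NotInMapping\n"
)

# Hypothetical stand-in for the parent mapping table
mapping = pd.DataFrame({
    'profile': ['CourseMaterial'],
    'TypeParent': ['schema:CreativeWork'],
    'ProfileParent': ['schema:LearningResource'],
})

# Read the listing the same way the loop does, but from memory
listing = pd.read_csv(io.StringIO(tsv_text), delimiter='\t', header=0)

# Names with no row in the mapping; the parent lookup cannot resolve these
missing = sorted(set(listing['name']) - set(mapping['profile']))

if missing:
    print('No parent mapping for:', ', '.join(missing))
else:
    print('All names resolve; safe to rewrite the file.')
```

Running a check like this against each file before the loop would show which names still need a row in the mapping table.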