# Create GeoNames Feature Codes Ontology File

# Description

The purpose is to create the geonames-featureCodes.ttl file from the featureCodes_en.txt file, which was downloaded from http://download.geonames.org/export/dump/.

Before processing, the featureCodes_en.txt file was modified to:
  * Change all class-dot-code ids to class-underscore-code
  * Add super-class definition(s) 
    * This could be changed to create a default (flat) mapping based only on the feature class, but a deeper level of hypernymy was desired
  * Remove codes whose semantics were identical to those of the feature class, or were only the plural of an existing (singular) code
    * As an example of the first, P.PPL is equivalent to the feature class 'P' ('Populated Place') and was removed
    * As an example of the second, S.HUTS is the plural of S.HUT and can be expressed in the Narrative ontology as S_HUT with the 'plural' predicate set to True
      * Unfortunately, most feature types do not have separate singular and plural codes
  * Modify descriptions to begin with a capital letter and to be singular
    * Where a description was not provided, the label was copied

After completion, the geonames-featureCodes.ttl file was split into separate files, one per GeoNames feature class (to be more modular), and synonyms were added. The resulting files were moved to the ../Ontologies directory.

# Convert the feature codes in the modified GeoNames file, featureCodes_en.txt, to Turtle
with open('geonames_featureCodes.ttl', 'w') as ttlFile:
    # Write the prefix details
    ttlFile.write('@prefix : <urn:ontoinsights:ontology:dna:> . \n'\
                  '@prefix dna: <urn:ontoinsights:ontology:dna:> . \n'\
                  '@prefix owl: <http://www.w3.org/2002/07/owl#> . \n'\
                  '@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . \n'\
                  '@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . \n'\
                  '@prefix xsd: <http://www.w3.org/2001/XMLSchema#> . \n\n')
    with open('featureCodes_en.txt', 'r') as featuresFile:
        fcodes = featuresFile.read()
        
    fcodes = fcodes.split('\n')
    for fcode in fcodes:
        fc = fcode.split('\t')  # Tab-separated data
        if len(fc) == 4:        # Blank lines were inserted for readability and should be ignored
            # Create the Turtle for the feature definitions
            ttlLine1 = ':{} a owl:Class ;'.format(fc[0])
            # End the predicate-object list with ';' so that the Turtle stays valid
            ttlLine2 = '  rdfs:subClassOf :{} ;'.format(fc[3].replace(',', ', :'))
            # Some class names may include a ':' already, which should be removed
            ttlLine2 = ttlLine2.replace('::', ':')   
            ttlLine3 = '  rdfs:label "{}"@en ;'.format(fc[1])
            ttlLine4 = '  :definition "{}"@en .\n\n'.format(fc[2])
            ttlFile.write('\n'.join([ttlLine1, ttlLine2, ttlLine3, ttlLine4]))
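
The split into per-class files mentioned in the description could be done along the following lines. This is a minimal sketch, not the procedure that was actually used: it assumes the generated file keeps the layout written above (a prefix block, then class definitions separated by blank lines, each starting with an id of the form `:CLASS_CODE`), and the function name `split_by_feature_class` is hypothetical. Adding synonyms is not covered.

```python
from collections import defaultdict


def split_by_feature_class(ttl_text):
    """Group Turtle class definitions by GeoNames feature class (hypothetical helper).

    Returns a dict that maps a feature class letter to a complete Turtle document.
    """
    # Definitions are separated by blank lines in the generated file
    blocks = [b.strip() for b in ttl_text.split('\n\n') if b.strip()]
    prefixes = [b for b in blocks if b.startswith('@prefix')]
    grouped = defaultdict(list)
    for block in blocks:
        if block.startswith(':'):
            # ':S_HUT a owl:Class ;' -> subject 'S_HUT' -> feature class 'S'
            subject = block[1:].split()[0]
            grouped[subject.split('_', 1)[0]].append(block)
    header = '\n'.join(prefixes)
    # Repeat the prefix block at the top of every per-class document
    return {fclass: header + '\n\n' + '\n\n'.join(defs) + '\n'
            for fclass, defs in grouped.items()}


sample = ('@prefix : <urn:ontoinsights:ontology:dna:> . \n\n'
          ':S_HUT a owl:Class ;\n  rdfs:label "hut"@en .\n\n'
          ':T_MT a owl:Class ;\n  rdfs:label "mountain"@en .\n\n')
print(sorted(split_by_feature_class(sample)))
```

Each value in the returned dictionary can then be written to its own file; the naming of those files is left open here, since the description does not state it.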