# Purpose

This is a collection of general-purpose functions, mostly used to build Pandoc documents from YAML and Markdown files stored in S3.

# Discussion

This assumes that:
* AWS credentials have already been set up
* Boto3 has been installed: `pip3 install boto3`
* ruamel.yaml has been installed (it is used by `yamlDump()`): `pip3 install ruamel.yaml`

# Functions

## yamlDump()

Convert a Python data structure to YAML and return it as a string.

`results = yamlDump(data, noVer=False)`

* data: Python dict or OrderedDict containing the YAML data
* noVer: if `True`, omit the `%YAML 1.2` version directive
* results: YAML as a string

```python
#Convert a Python data structure to YAML and return it as a string
#results = yamlDump(data, noVer=False)
#data: Python dict or OrderedDict containing YAML data
#noVer: don't emit the `%YAML 1.2` version directive
#results: YAML as a string

from io import StringIO

from ruamel.yaml import YAML #[ruamel.yaml documentation](https://yaml.readthedocs.io/en/latest/index.html)

def yamlDump (data, noVer=False):
    yaml = YAML()

    #define the output format
    if not noVer:
        yaml.version = (1, 2) #https://yaml.readthedocs.io/en/latest/detail.html#document-version-support
    yaml.default_flow_style = False #https://yaml.readthedocs.io/en/latest/basicuse.html#basic-usage
    yaml.indent(mapping=2, sequence=4, offset=2) #https://yaml.readthedocs.io/en/latest/detail.html#indentation-of-block-sequences
    #yaml.top_level_colon_align = True #https://yaml.readthedocs.io/en/latest/detail.html#positioning-in-top-level-mappings-prefixing
    yaml.explicit_start = True #guessed from: https://pyyaml.org/wiki/PyYAMLDocumentation
    yaml.explicit_end = True #guessed from: https://pyyaml.org/wiki/PyYAMLDocumentation
    yaml.sort_keys = False #guessed from: [sort_keys=False](https://stackoverflow.com/a/55171433/12400492)

    #dump into an in-memory text stream instead of temporarily redirecting sys.stdout
    stream = StringIO()
    yaml.dump(data, stream)

    return stream.getvalue() #return the YAML structure
```

## increaseHeaderLevel()

Look for Markdown header lines and increase their level by the indicated amount. Levels are counted from H1 = `#`, so increasing an H1 by 2 levels results in an H3 (`###`).

`results = increaseHeaderLevel (mdText, addlLevels)`

* mdText: string containing MarkDown text
* addlLevels: int for how many levels to increase (0 or less leaves the text unchanged)
* results: resulting string of MarkDown text

```python
import re

def increaseHeaderLevel (mdText, addlLevels):
    if addlLevels > 0:
        #replace the leading `#` of every line that starts with `#` by addlLevels + 1 `#`s,
        #i.e. add addlLevels levels (H1 '#' + 2 levels -> H3 '###')
        mdText = re.sub ('^#', '#' * (addlLevels + 1), mdText, flags=re.MULTILINE)
    return (mdText)
```
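`increaseHeaderLevel()` rewrites every line that starts with `#`, which also catches comment lines inside fenced code blocks embedded in the Markdown (Python or shell snippets, for example) and `#hashtag`-style text. The sketch below is a hypothetical variant, `shift_md_headings()`, that is not part of this notebook: it only touches ATX headings (1 to 6 `#` followed by whitespace), skips lines inside fenced code blocks, and caps the result at H6, the deepest level Markdown defines. It assumes fences are always written as three backticks at the very start of a line.

```python
import re

FENCE = '`' * 3  # three backticks: the Markdown code-fence marker

# An ATX heading: 1-6 '#' characters followed by whitespace or end of line
HEADING = re.compile(r'^(#{1,6})(?=\s|$)')

def shift_md_headings(md_text, addl_levels):
    """Hypothetical variant of increaseHeaderLevel() that leaves fenced code alone."""
    out = []
    in_fence = False
    for line in md_text.split('\n'):
        if line.startswith(FENCE):
            in_fence = not in_fence  # each fence line opens or closes a code block
        elif not in_fence and addl_levels > 0:
            # add addl_levels '#'s to the heading, but never go deeper than H6
            line = HEADING.sub(lambda m: '#' * min(len(m.group(1)) + addl_levels, 6), line)
        out.append(line)
    return '\n'.join(out)

md = '\n'.join(['# Title', FENCE + 'python', '# a comment', FENCE, '## Section', '#hashtag'])
print(shift_md_headings(md, 2))
# '# Title' becomes '### Title' and '## Section' becomes '#### Section';
# the fenced '# a comment' and '#hashtag' are left unchanged
```

Tracking fences line by line keeps the sketch short; it does not handle tilde fences or fences indented inside list items.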

## fileNamePart()

Generate all the file name permutations we may need.

`results = fileNamePart (fileID)`

* fileID: The document's ID (UUIDv4)
* results: Dictionary with the following
  * results['base_name']: the base name (document UUID)
  * results['file_prefix']: the prefix (the path derived from the document ID: the first two characters of each dash-separated group, each followed by `/`)
  * results['dir']: the prefix + base name (the subdirectory used for the document's other files)
  * results['file_suffix']: the file suffix (extension), `.yaml`
  * results['file_name']: the full filename (document ID + extension)
  * results['key_name']: the full key (prefix + full filename)

```python
def fileNamePart (fileID):
    document = {}

    #the base name (document UUID)
    document['base_name'] = fileID

    #generate the prefix (the path based upon the documentID):
    #the first 2 characters of each dash-separated group, each followed by '/'
    document['file_prefix'] = ''.join(part[:2] + '/' for part in fileID.split('-'))

    #generate the prefix+base name (this is the subdir we will use for other files)
    document['dir'] = document['file_prefix'] + document['base_name']

    #set the file suffix (extension)
    document['file_suffix'] = '.yaml'

    #generate the full filename (documentID + extension)
    document['file_name'] = document['base_name'] + document['file_suffix']

    #generate the full key (path + full filename)
    document['key_name'] = document['file_prefix'] + document['file_name']

    return (document)
```
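To make the key layout concrete: the prefix is built from the first two characters of each dash-separated group of the UUID, each used as one path segment. The hypothetical helper `key_for()` below repeats only that rule (it is not part of this notebook, and it assumes the `.yaml` suffix) so the result can be checked without S3. `uuid.uuid4()` is the standard-library way to generate a fresh random UUIDv4 for a new document ID.

```python
import uuid

def key_for(doc_id, suffix='.yaml'):
    """Hypothetical helper: the key_name rule from fileNamePart(), in isolation."""
    # first 2 characters of every dash-separated group become one path segment each
    prefix = ''.join(part[:2] + '/' for part in doc_id.split('-'))
    return prefix + doc_id + suffix

print(key_for('c9bf9e57-1685-4c89-bafb-ff5af830be8a'))
# c9/16/4c/ba/ff/c9bf9e57-1685-4c89-bafb-ff5af830be8a.yaml

new_id = str(uuid.uuid4())  # a fresh random UUIDv4 for a new document
print(key_for(new_id))
```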

## is_valid_uuid()

Determine whether a string is a valid UUID (UUIDv4 by default).

`results = is_valid_uuid(uuid_to_test, version=4)`

* uuid_to_test: text string
* version: version number to test for
* results: boolean

```python
#From:  https://stackoverflow.com/questions/19989481/how-to-determine-if-a-string-is-a-valid-v4-uuid
from uuid import UUID

def is_valid_uuid(uuid_to_test, version=4):
    """
    Check if uuid_to_test is a valid UUID.

    Parameters
    ----------
    uuid_to_test : str
    version : {1, 2, 3, 4}

    Returns
    -------
    `True` if uuid_to_test is a valid UUID, otherwise `False`.

    Examples
    --------
    >>> is_valid_uuid('c9bf9e57-1685-4c89-bafb-ff5af830be8a')
    True
    >>> is_valid_uuid('c9bf9e58')
    False
    """
    try:
        uuid_obj = UUID(uuid_to_test, version=version)
    except ValueError:
        return False

    return str(uuid_obj) == uuid_to_test
```
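`is_valid_uuid()` checks more than syntax: it round-trips the string through `uuid.UUID` and only accepts it if the canonical lower-case, hyphenated form comes back unchanged. The checks below illustrate what that means in practice; the function is repeated so the snippet runs on its own, and the version-1 value is just a sample UUID.

```python
from uuid import UUID

def is_valid_uuid(uuid_to_test, version=4):
    # same logic as the function above, repeated so this snippet runs on its own
    try:
        uuid_obj = UUID(uuid_to_test, version=version)
    except ValueError:
        return False
    return str(uuid_obj) == uuid_to_test

v4 = 'c9bf9e57-1685-4c89-bafb-ff5af830be8a'
v1 = 'a8098c1a-f86e-11da-bd1a-00112444be1e'  # a version-1 UUID

print(is_valid_uuid(v4))                   # True
print(is_valid_uuid(v4.upper()))           # False: str(UUID) is always lower-case
print(is_valid_uuid(v4.replace('-', '')))  # False: the hyphens are required
print(is_valid_uuid(v1))                   # False: version=4 overwrites the version digit, so the round trip differs
print(is_valid_uuid(v1, version=1))        # True
```

Because of the round trip, IDs written to the YAML files need to be stored in canonical lower-case form to be recognised as references.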

## createPandocHeader()

Generate a Pandoc metadata header in YAML format from the document's `title`, `author`, `abstract`, and `lang` fields

`results = createPandocHeader (srcData)`

* srcData: OrderedDict containing the YAML document
* results: YAML data as a string

```python
def createPandocHeader (srcData):

    #these are the header fields that we want to use
    keys = [
        "title",
        "author",
        "abstract",
        "lang"
    ]

    header = {} #temp structure to use
    for key in keys:
        if srcData.get(key) is not None: #from [here](https://thispointer.com/python-how-to-check-if-a-key-exists-in-dictionary/)
            header[key] = srcData[key]

    return yamlDump(header, noVer=True) #Pandoc doesn't correctly handle the `%YAML <ver>` header
```
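The header only needs a handful of scalar fields, so where ruamel.yaml is not available a standard-library sketch can emit the same `---` ... `...` block. This is a hypothetical alternative, `pandoc_header_stdlib()`, not part of the notebook. It quotes each value with `json.dumps()`, relying on the fact that, for ordinary text, a JSON string is also a valid YAML 1.2 double-quoted scalar. It stringifies anything that is not a string, so a list of authors would not come out as a YAML list.

```python
import json

PANDOC_KEYS = ["title", "author", "abstract", "lang"]

def pandoc_header_stdlib(srcData):
    """Hypothetical stdlib-only variant of createPandocHeader() for flat string fields."""
    lines = ['---']
    for key in PANDOC_KEYS:
        value = srcData.get(key)
        if value is not None:
            # a JSON string literal doubles as a YAML double-quoted scalar
            lines.append(f'{key}: {json.dumps(str(value), ensure_ascii=False)}')
    lines.append('...')
    return '\n'.join(lines) + '\n'

print(pandoc_header_stdlib({'title': 'Notes: part 1', 'lang': 'en', 'body': 'ignored'}))
# ---
# title: "Notes: part 1"
# lang: "en"
# ...
```

Quoting every value avoids surprises with characters such as `:` that have meaning in plain YAML scalars.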

## createMarkDownDocument()

Recursively walk the YAML structure and generate a Markdown document.

`results = createMarkDownDocument (bucket, directory, srcData, depth)`

* bucket: the S3 bucket
* directory: the directory to look in for the non-UUIDv4 references (not used yet)
* srcData: OrderedDict containing the YAML document
* depth: offset from the listed Markdown header level
* results: Markdown-formatted data as a string

```python
import re

%run "Functions_S3.ipynb" # `downloadTextFile()` is located here

def createMarkDownDocument (bucket, directory, srcData, depth):
    document = ''

    if isinstance(srcData['body'], list):
        print ('List detected')

        for element in srcData['body']: #walk the body entries, not the top-level keys

            if isinstance(element, (list, dict)):
                print ('Structure detected - recursing')
                #assumption: wrap the nested structure so it looks like a document with a 'body'
                child = {'id': srcData.get('id'), 'body': element}
                document += createMarkDownDocument (bucket, directory, child, depth+1) #walk the next level down and increase the header depth

            elif re.search(r'\.md$', element, flags=re.IGNORECASE):
                print ('MarkDown: ' + element)

                #read in the content and shift its headers down by `depth` levels
                mdText = downloadTextFile (bucket, fileNamePart(srcData['id'])['dir'] + '/' + element)
                document += increaseHeaderLevel (mdText, depth)

            elif re.search(r'\.yaml$', element, flags=re.IGNORECASE):
                print ('YAML: ' + element)
                #TODO: process the file

            elif is_valid_uuid(element):
                print ('UUIDv4: ' + element)
                #TODO: go find the file and then process the file

            else: #No idea what this is. Assuming it's MD text
                print ('unknown: ' + element)
                document += element #assuming that if you are manually entering MD then you can control the formatting too

            document += '\n\n'

    elif isinstance(srcData['body'], dict):
        print ('Dict detected')
        #TODO: dictionaries are not handled yet

    else:
        #print ('markdown:  no structure')
        document += srcData['body']

    return (document)
```
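The order of the branches in `createMarkDownDocument()` decides how each `body` entry is treated. The hypothetical `classify()` helper below (not part of the notebook) mirrors that order without touching S3, which makes the routing easy to check. It also shows why the extension patterns should escape the dot: in a regular expression an unescaped `.` matches any character, so `'.md$'` also matches a name such as `readme_md`.

```python
import re
from uuid import UUID

def classify(element):
    """Hypothetical helper: report which branch of createMarkDownDocument() an entry takes."""
    if isinstance(element, (list, dict)):
        return 'structure'        # recursed into
    if re.search(r'\.md$', element, flags=re.IGNORECASE):
        return 'markdown file'    # downloaded from S3
    if re.search(r'\.yaml$', element, flags=re.IGNORECASE):
        return 'yaml file'        # not processed yet
    try:
        if str(UUID(element, version=4)) == element:
            return 'uuid'         # reference to another document
    except ValueError:
        pass
    return 'inline markdown'      # copied into the output as-is

body = ['intro.md', 'c9bf9e57-1685-4c89-bafb-ff5af830be8a', ['nested.md'], 'Some *inline* text', 'part2.YAML']
for element in body:
    print(classify(element))
# markdown file, uuid, structure, inline markdown, yaml file (one per line)

print(bool(re.search('.md$', 'readme_md')))  # True: the unescaped '.' matches '_'
print(classify('readme_md'))                 # inline markdown
```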

## compilePandocDocument()

Convenience wrapper that builds the full Pandoc file contents: the YAML header from `createPandocHeader()` followed by the Markdown body from `createMarkDownDocument()`.

`results = compilePandocDocument (bucketName, directory, srcData)`

* bucketName: the S3 bucket
* directory: the directory to look in for the non-UUIDv4 references
* srcData: OrderedDict containing the YAML document
* results: the full Pandoc-formatted document as a string

```python
def compilePandocDocument (bucketName, directory, srcData):
    document = createPandocHeader (srcData) #YAML metadata block
    document += createMarkDownDocument (bucketName, directory, srcData, 0) #Markdown body, headers not shifted
    return (document)
```
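`compilePandocDocument()` only returns a string. A minimal sketch of the last step, assuming the `pandoc` command-line tool is installed and on `PATH`, is to write that string to a `.md` file and let Pandoc convert it; Pandoc picks the input and output formats from the file extensions. `render_with_pandoc()` and the file names are hypothetical, not part of this notebook.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def render_with_pandoc(document, md_path, out_path):
    """Hypothetical helper: save the compiled document, then convert it with pandoc if available."""
    md_path, out_path = Path(md_path), Path(out_path)
    md_path.write_text(document, encoding='utf-8')
    if shutil.which('pandoc') is None:
        return None  # pandoc not on PATH; the .md file has still been written
    # --standalone asks pandoc for a complete document rather than a fragment
    subprocess.run(['pandoc', '--standalone', str(md_path), '-o', str(out_path)], check=True)
    return out_path

# usage with a tiny stand-in for the output of compilePandocDocument(...)
demo_dir = Path(tempfile.mkdtemp())
result = render_with_pandoc('---\ntitle: "Demo"\n...\n# Hello\n', demo_dir / 'demo.md', demo_dir / 'demo.html')
print(result)  # None when pandoc is not installed
```

Keeping the `.md` file on disk, even when the conversion runs, makes it easy to inspect exactly what was handed to Pandoc.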