# Amend SQL To Work In SAS Scheduler


SAS scheduler has its own Olive schema, but this has some limitations, the main one being:
- It has no space allocated, i.e. it cannot create tables in its own schema. It can, however, create hash tables.

Any existing code therefore needs to be modified before it can run through SAS scheduler. Typically you will use hash tables wherever you can, and any data that has to be saved must be stored in tables in a different schema (e.g. your own). This script helps you to:
1. produce a list of procedures and all tables within each procedure
2. change or add a schema for each proc/table
3. add a prefix (e.g. #) for each proc/table


#### The process is:
1. Save the SQL script into a text file.
2. Update the parameter settings below (e.g. file names and locations).
3. Run Step 1: Get a list of procs and tables.
4. Look at the resulting Excel spreadsheet.
5. Create another sheet in the same spreadsheet with the list of procs and tables (its name must match sheet_name_for_settings in the parameters). The sheet needs 4 columns with headers, in this order: proc/table name, schema, prefix, replace whole (see below for example).
   - Schema: the schema name that you want, if any. Note that if you want to keep the existing schema then you still need to enter it.
   - Prefix: any prefix to add (e.g. #).
   - Replace whole: text that replaces the existing table/proc entirely. If this is filled in, the schema and prefix fields are ignored.
6. Run Step 2: Process SQL script with changes.

#### Example spreadsheet format for proc/table changes required:

|Proc_Table Name                                        |    Schema         |   Prefix    | Replace Whole |
|:------------------------------------------------------|:-----------------:|:-----------:|:-------------:|
|cp2_accounts                                           |    thompsonja     |   SAS_      |               |
|cp2_box_lookup                                         |                   |   #         |               |
|vespa_analysts.channel_map_prod_service_key_attributes |    vespa_analysts |             |               |
|('waterfall_base')                                     |                   |             | ('thompsonja.waterfall_base') |
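
The way one row of this sheet turns into a new name can be sketched as below. This is only an illustration of the rules described above, not the notebook's own code: `build_replacement` is a hypothetical helper, and empty cells are represented here as empty strings.

```python
def build_replacement(name, schema='', prefix='', replace_whole=''):
    """Return the new name for one settings row (illustrative helper)."""
    if replace_whole:
        # a whole replacement wins: schema and prefix are ignored
        return replace_whole
    # drop any schema already attached to the name
    table_proc = name.split('.', 1)[-1]
    new_schema = schema + '.' if schema else ''
    return new_schema + prefix + table_proc


# the four rows of the example sheet
rows = [
    ('cp2_accounts', 'thompsonja', 'SAS_', ''),
    ('cp2_box_lookup', '', '#', ''),
    ('vespa_analysts.channel_map_prod_service_key_attributes', 'vespa_analysts', '', ''),
    ("('waterfall_base')", '', '', "('thompsonja.waterfall_base')"),
]
for row in rows:
    print(row[0], '->', build_replacement(*row))
```

Note that the third row keeps its schema only because the schema is entered again in the Schema column.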


#### Be Aware Of:

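1. "create index i1 on tablename (xxxx)" - the table will not be found in Step 1 because neither "index" nor "on" is a search keyword. It will still be replaced in Step 2 if the same table is found elsewhere through a keyword in Step 1.
2. "execute('call procname.." - the proc will be missed in Step 1 because the word after "execute" is "('call", which is not recognised as the keyword "call". It will still be replaced in Step 2 if the same proc is found elsewhere through a keyword in Step 1.
3. Dynamic SQL may not get matched or replaced in either step and will need to be changed manually.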
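
The first two cases can be reproduced with a small sketch of the keyword search. It is a simplified stand-in for Step 1, written with plain lists rather than the notebook's NumPy masks; `names_after_keywords` and the sample statements are made up for illustration.

```python
KEYWORDS = ['procedure', 'execute', 'call', 'table', 'into',
            'update', 'from', 'join', 'object_id']


def names_after_keywords(sql, keywords=KEYWORDS):
    """Return (keyword, next word) pairs, using exact keyword matches only."""
    # put a space before '(' so that names are split from their brackets
    words = sql.replace('(', ' (').lower().split()
    found = []
    pending = None
    for word in words:
        if word in keywords:
            pending = word
        elif pending is not None:
            found.append((pending, word))
            pending = None
    return found


# found: 'from' is a keyword
print(names_after_keywords('select * from cp2_accounts'))
# not found: neither 'index' nor 'on' is a keyword
print(names_after_keywords('create index i1 on cp2_accounts (account_number)'))
# 'execute' is found, but the word after it is "('call", so my_proc is missed
print(names_after_keywords("execute('call my_proc()')"))
```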


## Parameters

In [1]:
# folder name for all files
folder_name = 'C:/Users/thompja/OneDrive - Sky/002 Measurement General/000 BAU Panel Balancing/02 POC SAS Scheduler/'  

# text files for original and processed SQL
sql_input_filename = 'PanBal_M02_WaterfallSQL.txt'
sql_output_filename = 'PanBal_M02_WaterfallSQL_SAS.txt'

# Spreadsheet to use for proc/table lists and schema/prefixes
in_out_excel = 'PanBal_M02_WaterfallSQL Procs Tables.xlsx'
# sheet that contains proc/table schema/prefixes
sheet_name_for_settings = 'Settings'

# SQL keywords to search for that are immediately before proc/table name - should be lower case
keyword_search_list = ['procedure', 'execute', 'call', 'table', 'into', 'update', 'from', 'join', 'object_id']


### Import Libraries

In [2]:
import string
import numpy as np
import pandas as pd


### Functions

In [3]:
def star_out_star_comments_from_sql(all_text):
    # Replace the body of every /* ... */ comment with '*' characters so that
    # keywords inside comments are not picked up by the search.
    # The /* and */ delimiters are kept and the text length is unchanged.

    chars = list(all_text)
    in_comment = False
    i = 0

    while i < len(chars):
        check_chars = all_text[i : i+2]

        if not in_comment and check_chars == '/*':
            # start of a star comment: keep the delimiter and skip past it
            in_comment = True
            i += 2
        elif in_comment and check_chars == '*/':
            # end of a star comment: keep the delimiter and reset the state
            # so that later comments are handled as well
            in_comment = False
            i += 2
        else:
            if in_comment:
                # inside a comment, so overwrite the character with a star
                chars[i] = '*'
            i += 1

    return ''.join(chars)
            

def remove_dash_comments_from_sql(text_list):
    # Drop everything from the first '--' to the end of each line
    # (a '--' inside a quoted string is treated as a comment too)

    return [line.split('--', 1)[0] for line in text_list]



    
def match_string_list_to_string_list(word_list, search_list):
    # Search through the word_list to find matches to search_list
    # A word only counts as a match if it is exactly equal to a search word
    # (e.g. 'from' matches 'from' but not 'fromdate' or '(from')
    
    search_mask_list = [] 

    # Add a row for each search word, set to True where that search word matches the word list
    for search_word in search_list:    
        search_mask_list.append([x == search_word for x in word_list])

    # Transpose so that rows = each word of word_list and cols = boolean matches to the search_list
    search_mask_array = np.asarray(search_mask_list).T
    
    return search_mask_array
    
def combine_list_into_single_string(string_list):
    # Join the lines back together, ending every line (including the last) with '\n'
    
    return ''.join(line + '\n' for line in string_list)
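
The comment handling above can also be expressed with regular expressions. The sketch below is an alternative for illustration only, not the functions used in this notebook; `mask_comments` is a hypothetical name. Like the functions above, it treats a `--` inside a quoted string as a comment.

```python
import re


def mask_comments(sql):
    """Star out /* ... */ comment bodies and drop -- comments (regex sketch)."""
    def star_out(match):
        # keep the delimiters, replace everything between them with '*'
        return '/*' + '*' * (len(match.group(0)) - 4) + '*/'

    # DOTALL lets a star comment run over several lines
    sql = re.sub(r'/\*.*?\*/', star_out, sql, flags=re.DOTALL)
    # remove everything from '--' to the end of each line
    return re.sub(r'--[^\n]*', '', sql)


sample = "select 1 /* from x */ -- into y\nfrom t"
print(mask_comments(sample))
```

Star comments are masked first so that a `--` inside a star comment does not cut the line short.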


### Step 1: Get a list of procs and tables

An Excel file will be created with the list of procs/tables. 

In [4]:
# convert keyword search list to an array
keyword_search_array = np.asarray(keyword_search_list)

# Read in SQL script
with open(folder_name + sql_input_filename, 'r') as sql_file:
    sql_all_text = sql_file.read() # read whole text into a single variable

# turn /* comments into *  
uncommented_sql = star_out_star_comments_from_sql(sql_all_text)

# drop -- comments
# first split by end line so can deal with '--' dash comments
uncommented_sql = uncommented_sql.split('\n')
uncommented_sql = remove_dash_comments_from_sql(uncommented_sql)

# recombined uncommented split list into long text
sql_all_text = combine_list_into_single_string(uncommented_sql)


# split by space so that we can search for key commands
# replace tabs and spaces by end of line so text split into words

# ensures there is a space before the ( to make sure the proc/table name is split out 
sql_all_text = sql_all_text.replace('(', ' (')  
sql_all_text = sql_all_text.replace('\t', '\n')
sql_all_text = sql_all_text.replace(' ', '\n')
sql_word_list = [x.lower() for x in sql_all_text.split('\n')]


# search for key words in text and set up a mask
keyword_mask_array = match_string_list_to_string_list(sql_word_list, keyword_search_list)

# if any search_list words are found then set all_keyword_mask_array to true else false (i.e. > 0 is true)
all_keyword_mask_array = np.sum(keyword_mask_array, axis = 1)


# loop through each word to see if any match to any key word
# generate list of procs/tables
keyword_find = False
tables_and_procs = []
latest_procedure = 'n/a'

for i in range(len(all_keyword_mask_array)):
    if all_keyword_mask_array[i] > 0:
        keyword_find = True
        # mask should match 1 keyword, so element 0 of the result will be the keyword
        keyword = keyword_search_array[keyword_mask_array[i]][0]        
    elif keyword_find and sql_word_list[i] != '':
        if keyword == 'procedure':
            # keep the name of the last procedure
            latest_procedure = sql_word_list[i]
        append_data = [latest_procedure, keyword, sql_word_list[i]]
        tables_and_procs.append(append_data)
        keyword_find = False

  


In [5]:
# create dataframe of proc/table list
tables_and_procs_df = pd.DataFrame(tables_and_procs, columns = ('Current Proc', 'Keyword', 'Table/Proc'))     

# Export to Excel
# note: this overwrites the whole workbook, so an existing settings sheet is lost if Step 1 is rerun
file_name_string = folder_name + in_out_excel

# the context manager saves and closes the workbook (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter(file_name_string) as writer:
    tables_and_procs_df.to_excel(writer, sheet_name='List_Tables_Procs')

# save processed script as a text file (words separated by spaces). Will read back in for Step 2.
file_name_string = folder_name + sql_output_filename
# Turn list into single string
final_sql_string = ' '.join(sql_word_list)
with open(file_name_string, 'w') as text_file:
    text_file.write(final_sql_string)


### Step 2: Process SQL script with changes

Before running this step you will need to enter the proc/table settings in the spreadsheet, using the format shown in the example above.



In [6]:
# import Excel which has proc/tables settings
file_name_string = folder_name + in_out_excel
table_proc_amendments_df = pd.read_excel(file_name_string, sheet_name=sheet_name_for_settings, index_col=None, na_values=['NA'])

# import sql text processed in Step 1
file_name_string = folder_name + sql_output_filename
with open(file_name_string, 'r') as sql_file:
    sql_all_text = sql_file.read() # read whole text into a single variable

sql_word_list = sql_all_text.split(' ')

In [8]:
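# turn into an array; dtype=object so that replacements longer than the longest
# existing word are not truncated to a fixed string width
sql_word_array = np.asarray(sql_word_list, dtype=object)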

# loop through the proc/table list and make the changes
for i in range(len(table_proc_amendments_df)):    
    
    # get the proc/table name
    search_table_proc = table_proc_amendments_df.iloc[i,0]
    
    # check if there is a complete replace (NaN if not)
    if isinstance(table_proc_amendments_df.iloc[i,3], str):
        replace_string = table_proc_amendments_df.iloc[i,3]
    else:
        # not replacing whole string, so check schema and prefix        

        # check if already a schema for proc/table and ignore it
        if search_table_proc.find('.') < 0:
            table_proc = search_table_proc
        else:
            table_proc = search_table_proc[search_table_proc.find('.') + 1 : ]

        # check if a schema has been entered for proc/table (NaN if not)
        if isinstance(table_proc_amendments_df.iloc[i,1], str):
            schema = table_proc_amendments_df.iloc[i,1] + '.'
        else:
            schema = ''

        # check if a prefix has been entered for proc/table (NaN if not)
        if isinstance(table_proc_amendments_df.iloc[i,2], str):
            prefix = table_proc_amendments_df.iloc[i,2]
        else:
            prefix = ''        
    
        # build replacement string
        replace_string = schema + prefix + table_proc
    
    
    # find all exact matches to the proc/table name
    word_mask_array = np.asarray([x == search_table_proc for x in sql_word_list])
    
    # replace all matches with the replace string
    sql_word_array[word_mask_array] = replace_string
    



In [9]:
# Save new SQL as a text file
final_sql_string =  ' '.join(sql_word_array)  

file_name_string = folder_name + sql_output_filename
with open(file_name_string, 'w') as text_file:
    text_file.write(final_sql_string)
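
Step 2 only replaces words that are exactly equal to a name in the settings sheet, so a name followed directly by punctuation (for example `cp2_accounts;`) is left unchanged. A quick check for names that never matched could look like the sketch below; `unmatched_names` and the sample SQL are hypothetical and not part of the notebook.

```python
def unmatched_names(sql_words, names):
    """Return the settings names that have no exact match among the SQL words."""
    word_set = set(sql_words)
    return [name for name in names if name not in word_set]


# split on spaces, as Step 2 does
words = 'select * from cp2_accounts; insert into cp2_box_lookup select 1'.split(' ')

# cp2_accounts is reported because the word in the SQL is 'cp2_accounts;'
print(unmatched_names(words, ['cp2_accounts', 'cp2_box_lookup']))
```

Any names reported this way need to be changed manually in the output file.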

