# Sample Document Generator 

This is the first program of my project. It generates the sample set of documents that the main "Retention Checker" program will check. Because confidentiality protocols prevent me from using real document information, I need dummy documents to run the code on.

The goal of this program is to produce an outfile containing a large number of dummy documents. The bank I work for has over 15,000 loans, each with dozens of documents, but for the purposes of this project I'd like the main program to check a sample set of 500 documents. Because the retention parameters are so specific, I need a lot of samples to see the code really working.

## Setting Up the Program 

In order to generate the sample docs I need to do some set-up. First, I need to import a few modules I will be using later.

In [None]:
import random
from random import randint
import datetime
import csv

Next, I need to set up my reference dates. 

In [3]:
current_date = datetime.datetime.today()
open_date = datetime.datetime(2001,1,1)

print("current date is", current_date)
print("open date is", open_date)

current date is 2020-05-09 21:11:04.455519
open date is 2001-01-01 00:00:00


Using the datetime module I set up the current date. In reality, new documents are produced every day, so to simulate that I need the program to operate using the current date.

I also set up an "Open Date", which represents the date my hypothetical bank opened. No documents can be created before that date, which gets the dummy documents as close to reality as possible.

Finally, I import my "Retention Schedule Guide."

In [None]:
reference = open("Retention Schedule Guide", 'r') 
doc_type_list = reference.readlines()
reference.close()

This step isn't entirely necessary. I created the guide for my personal reference when setting up the retention parameters. However, I figured I could easily extract the Document Type Codes from the guide to save some typing time. 

In [None]:
doc_series_codes = [] 
for line in doc_type_list[1:]: 
    doc_line = line.split()
    doc_type = doc_line[0] 
    doc_series_codes.append(doc_type)

I use a for loop and an accumulator, starting at position [1] of the guide in order to skip the header text. For each line, I split it into fields, take the Doc Code in position [0], and add it to the accumulator.
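The same extraction can be tried without the guide file on disk. This is only a minimal sketch, assuming the guide is whitespace-delimited with a single header line. The `extract_doc_codes` name and the `guide_text` contents are made up for the example and are not the real guide.

```python
def extract_doc_codes(lines):
    """Return the first whitespace-delimited token of every line after the header."""
    codes = []
    for line in lines[1:]:          # skip the header line
        parts = line.split()
        if parts:                   # ignore blank lines, where parts[0] would fail
            codes.append(parts[0])
    return codes


# Hypothetical guide contents, for illustration only
guide_text = """Code Description Retention
ACC100 Placeholder-description 7
ACC180 Placeholder-description 3

BNK100 Placeholder-description 5
"""

sample_codes = extract_doc_codes(guide_text.splitlines())
print(sample_codes)
```

The blank-line check is the one difference from the loop above: a stray empty line in the guide would otherwise raise an `IndexError` on position [0].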

In [None]:
print(doc_series_codes)
['ACC100', 'ACC180', 'ACC160', 'ACC300', 'ACC340', 'ACC400', 'ACC500', 'ACC520', 'ADM440', 'BNK100', 'BNK120', 'BNK130', 'BNK140', 'BNK160', 'BNK170', 'BNK180', 'BNK300', 'BNK400', 'GEN200', 'GEN210']


I now have a list of the potential Document Type Codes to generate sample documents from. Next, I'll set up the csv outfile to get that out of the way.

In [None]:
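# newline='' keeps the csv module from writing blank rows between records on Windows
outfile = open('sample_docs.csv', 'w', newline='')
doc_csv = csv.writer(outfile)
# Seven headers, one for each field the generator appends to a sample doc
doc_csv.writerow(["Doc Code", "Doc Creation Date", "Doc Expiration Date (if applicable)", "Accnt #", "Accnt Open Date", "Accnt Closed?", "Accnt Close Date"])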

This creates the csv outfile named `sample_docs.csv` and writes the column headers. For the purposes of the retention checker, the following fields will be defined by the Sample Doc Generator:

* Document Type Code
* Date Doc is Created
* Doc Expiration Date (if applicable)
* Account Number (if applicable)
* Account Open Date (if applicable)
* Is the account open or closed?
* If closed, the close date

Now it's time to generate some docs!

## Random Doc Generator 

First, I add a little `status_ref` tuple.

In [None]:
status_ref = ("Y","N")

I'm going to use this in the main function to randomly determine if an account is open or closed. More on that later.

The big problem with the retention parameters is that they involve a great deal of logic, which makes the program pretty complicated. The main function needed a lot of layers and nested conditions to get the right simulated docs.

The first portion of the function sets up the accumulator and the random doc code generator. 

In [None]:
def main():
    sample_doc = []
    doc_series = (random.choice(doc_series_codes))
    sample_doc.append(doc_series)
    start = open_date.toordinal()
    end = current_date.toordinal()
    doc_date = current_date.fromordinal(random.randint(start, end))
    sample_doc.append(str(doc_date))

The function generates a single document, and `sample_doc` serves as the accumulator that holds all of that doc's information. Using the random module, the function selects a random doc code from the list, then assigns it a creation date between the bank open date and the current date.

Next, the function fills in the remaining fields by checking the doc code against each possible doc type. This is where it starts getting complicated.
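The ordinal trick is easier to see on its own. Here is a small sketch that pulls it into a helper. The `random_date_between` name is mine, not part of the notebook, and the current date is fixed so the example is repeatable.

```python
import datetime
import random


def random_date_between(start, end, rng=random):
    """Pick a random datetime (at midnight) from start's date through end's date, inclusive."""
    # toordinal() turns a date into a day count, so randint can pick a day in the range
    ordinal = rng.randint(start.toordinal(), end.toordinal())
    return datetime.datetime.fromordinal(ordinal)


open_date = datetime.datetime(2001, 1, 1)
current_date = datetime.datetime(2020, 5, 9)   # fixed here instead of today()

doc_date = random_date_between(open_date, current_date)
print(doc_date)
```

Because ordinals only count whole days, every generated date lands at midnight, which is why the sample docs all show `00:00:00` as the time.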

In [None]:
    if doc_series == "ACC160":
        start_exp = doc_date.toordinal()
        end_exp = current_date.toordinal()
        exp_date = current_date.fromordinal(random.randint(start_exp, end_exp))
        sample_doc.append(str(exp_date))
        sample_doc.append("0000000")
        sample_doc.append("0000-00-00 00:00:00")
        sample_doc.append("NA")
        sample_doc.append("0000-00-00 00:00:00")

I have the function look at the random Doc Code and assign details from there. Doc Type ACC160, for example, is a document that can expire. The expiration date is set by the bank and depends on various factors, so I used `random.randint` to pick a date on or after the doc creation date and no later than the current date.

Then that's it. ACC160 is not account related, so I don't need an account number, closing date, etc. I just append placeholder values for the rest of the fields.

In [None]:
    elif doc_series == "BNK100":
        sample_doc.append("0000-00-00 00:00:00")
        account_no = str(random_account_no(8)) 
        sample_doc.append(account_no)
        start_open = open_date.toordinal()
        end_open = doc_date.toordinal()
        account_open_date = current_date.fromordinal(random.randint(start_open, end_open))
        sample_doc.append(str(account_open_date))
        account_status = (random.choice(status_ref))

Next I look at BNK100. This doc does not have a set expiration date, so I enter zeros for that. But it is account related, so I have the program assign it a random 8-digit account number and a random account open date that falls no later than the doc creation date.

Accounts can close at any time, so this is where the `status_ref` tuple comes into play. The function randomly chooses Y or N to tell me whether the account is closed (Y) or still open (N). Depending on the result, the function filters the data even further.
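The `random_account_no` helper is called above but its definition isn't shown in this walkthrough. The sketch below is only one way it could work, assuming the argument is the number of digits and that account numbers never start with zero. Both are my assumptions for the example.

```python
import random


def random_account_no(digits):
    """Return a random integer with exactly `digits` digits (no leading zero)."""
    low = 10 ** (digits - 1)        # smallest number with that many digits
    high = 10 ** digits - 1         # largest number with that many digits
    return random.randint(low, high)


account_no = str(random_account_no(8))
print(account_no)
```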

In [None]:
        # Nested one level inside the BNK100 branch above
        if account_status == "Y":
            start_close = doc_date.toordinal()
            end_close = current_date.toordinal()
            account_close_date = current_date.fromordinal(random.randint(start_close, end_close))
            sample_doc.append("YES")
            sample_doc.append(str(account_close_date))

If the account is closed (Y), we need a close date. The program assigns a close date on or after the doc creation date (and therefore after the account open date) and no later than the current date.

In [None]:
        if account_status == "N":
            sample_doc.append("NO")
            sample_doc.append("0000-00-00 00:00:00")

If the account is still open (N) we just fill in zeros for the close date. 
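The open/closed branch can also be looked at in isolation. This is a rough sketch rather than the code from the main function: `account_close_fields` and `NO_DATE` are names I made up, and the dates are fixed for the example.

```python
import datetime
import random

NO_DATE = "0000-00-00 00:00:00"     # placeholder the notebook uses for "no date"


def account_close_fields(doc_date, today, rng=random):
    """Return [closed flag, close date] for one account-related document."""
    if rng.choice(("Y", "N")) == "Y":
        # Closed account: pick a close date from the doc date through today
        ordinal = rng.randint(doc_date.toordinal(), today.toordinal())
        close_date = datetime.datetime.fromordinal(ordinal)
        return ["YES", str(close_date)]
    # Open account: no close date, so fill in the placeholder
    return ["NO", NO_DATE]


doc_date = datetime.datetime(2015, 6, 1)
today = datetime.datetime(2020, 5, 9)
print(account_close_fields(doc_date, today))
```

Returning the two fields as a pair keeps the flag and the date from getting out of step, since a "NO" can never come back with a real close date.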

And that's the overall gist of the doc generator function. It continues through each of the doc types in much the same manner, making small tweaks depending on the parameters of each doc type. 

The function finishes up by writing the complete doc data held in the accumulator to the csv outfile as a new row.

The last step is to execute the program. 

In [None]:
for d in range(500):
    main()

The loop calls `main()` 500 times. Each call generates one doc and writes it to the outfile, so the finished file holds 500 sample documents.
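One thing the cells above don't show is the outfile being closed, which matters because rows can sit in a buffer until then. The sketch below is one possible shape for the run loop, not the notebook's actual code. It assumes the generator returns its row instead of writing it, and `write_docs`, `fake_row`, and the demo file name are all hypothetical.

```python
import csv
import os
import tempfile


def write_docs(path, header, make_row, count):
    """Write a header plus `count` generated rows, closing the file when done."""
    # newline="" is what the csv module recommends when opening a file for a writer
    with open(path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(header)
        for _ in range(count):
            writer.writerow(make_row())


def fake_row():
    """Stand-in for the real generator: always returns the same two fields."""
    return ["ACC100", "2010-01-01 00:00:00"]


# Written to a temporary folder so the demo does not touch sample_docs.csv
path = os.path.join(tempfile.mkdtemp(), "sample_docs_demo.csv")
write_docs(path, ["Doc Code", "Doc Creation Date"], fake_row, 500)

with open(path, newline="") as infile:
    rows = list(csv.reader(infile))
print(len(rows))  # 501: one header row plus 500 documents
```

The `with` block closes the file even if a row fails partway through, so whatever was generated up to that point still ends up on disk.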