# Line-by-line Test of the Batch Report Sync
This notebook tests the main script `2_batch_report_sync` line by line, printing intermediate outputs for debugging. The main script follows these steps:

1. import installed packages and supporting modules
2. set up directories and logging
3. sync the S3 bucket with Stanford's Sherlock OAK folder
4. read the batch report and upload it to RegLab's AWS database
5. save the log file and sync it to the S3 bucket

This notebook runs each code block and validates its output.

## Prerequisites for replication:
1. must have Sherlock OAK and GROUP_SCRATCH mounted on your local machine (see guide).
2. must have saved OAK and GROUP_SCRATCH as environment variables in your .bash_profile or .zshrc file. For example:
```
# sherlock directories
export OAK="$HOME/sherlock_oak"
export GROUP_SCRATCH="$HOME/sherlock_group_scratch"
```
3. must have the esnc_risk_notif git repo cloned to your local machine
4. must have set up Amazon Web Services (AWS) credentials and saved your access key ID and secret access key. See [this guide](https://realpython.com/python-boto3-aws-s3/).
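
The prerequisites above can be checked from Python before running anything. The sketch below is a minimal example, assuming only that OAK and GROUP_SCRATCH should point at existing folders and that the `aws` CLI is on your PATH; the function name `missing_prerequisites` is hypothetical. It calls `expanduser()` so that a literal `~` (which the shell does not expand inside double quotes) still resolves. It does not verify AWS credentials.

```python
import os
import shutil
from pathlib import Path


def missing_prerequisites(env=None):
    """Return a list of problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []

    # OAK and GROUP_SCRATCH must be set and point at mounted folders
    for name in ("OAK", "GROUP_SCRATCH"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).expanduser().is_dir():
            problems.append(f"{name}={value} is not a mounted directory")

    # Steps 3 and 5 shell out to the AWS CLI, so it must be on PATH
    if shutil.which("aws") is None:
        problems.append("aws CLI not found on PATH")

    return problems


print(missing_prerequisites() or "All prerequisites found")
```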

```python
# first set the working directory to the parent folder, where 0_email_maker, the other main scripts and configs sit
import os

os.chdir('..')
os.getcwd()
```
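
Because `os.chdir('..')` depends on where the notebook was launched, running the cell twice moves one folder too far up. A more robust sketch walks up from the current folder until it finds a marker file. Using `configs.py` as the marker is an assumption based on the `import configs` in Step 1 (adjust `marker` if `configs` is a package), and `find_repo_root` is a hypothetical helper:

```python
from pathlib import Path


def find_repo_root(start=None, marker="configs.py"):
    """Walk up from `start` until a folder containing `marker` is found."""
    current = (Path.cwd() if start is None else Path(start)).resolve()
    # check the starting folder first, then each parent in turn
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No parent of {current} contains {marker}")
```

In the notebook, the cell would then become `os.chdir(find_repo_root())`, which gives the same result no matter how many times it runs.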

### Step 1: Import packages

In this step, we check that all the required modules are installed in the environment.

```python
# import installed packages
import os
import pandas as pd
import datetime as dt
import logging
from io import StringIO

## for s3 connection
import boto3
import subprocess

# import supporting modules
import configs
from utilities import sql_save
```
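
If one import fails, Python stops at that line and hides any later missing modules. The sketch below reports every missing module at once using `importlib.util.find_spec`, which locates a module without importing it. The module list is taken from the imports above; the local modules `configs` and `utilities` are only found when the working directory is the repo folder.

```python
import importlib.util

# third-party packages and local modules imported in Step 1
REQUIRED = ["pandas", "boto3", "configs", "utilities"]


def missing_modules(names):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(REQUIRED) or "All required modules found")
```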

```python
# set the arguments that the main script would otherwise parse from the command line
mode = 'test'
run_id = '2021Q4_2021-08-03_170618_610692'
```
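
In the main script these two values come from the command line through `get_args()` (see the commented-out lines in Step 2). The notebook does not show that function, so the following `argparse` sketch is hypothetical: the flag names `--mode` and `--run_id` and the `test`/`prod` choices are inferred from the variable names and the two modes handled in Step 4.

```python
import argparse


def get_args(argv=None):
    """Hypothetical stand-in for the main script's get_args(); flag names are assumptions."""
    parser = argparse.ArgumentParser(description="Sync and upload the batch report for one run.")
    parser.add_argument("--mode", choices=["test", "prod"], required=True,
                        help="test uploads to the sandbox table, prod to the production table")
    parser.add_argument("--run_id", required=True,
                        help="run folder name, e.g. 2021Q4_2021-08-03_170618_610692")
    return parser.parse_args(argv)


# in a notebook, pass the arguments explicitly instead of reading sys.argv
args = get_args(["--mode", "test", "--run_id", "2021Q4_2021-08-03_170618_610692"])
mode, run_id = args.mode, args.run_id
```

Passing an explicit list matters in Jupyter, where `sys.argv` holds the kernel's own arguments rather than the script's.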

### Step 2: set up directories and logging

In this step, we configure the directories and the logger. We should expect the global variables from `configs` to be read correctly and the captured log to contain the relevant lines.

#### code block

```python
print(configs.HELPER_TEXT_BATCH_REPORT_SYNC)
print("===== Start running batch report sync =====")

# ## get parsed variables
# args = get_args()
# mode = args.mode
# run_id = args.run_id

## get global variables
engine = configs.ENGINE
bucket = configs.BUCKET
s3_project_dir = configs.S3_PROJECT_DIR
oak_project_dir = configs.OAK_PROJECT_DIR

## set directories
s3_run_dir = os.path.join(mode, run_id)
oak_run_dir = os.path.join(oak_project_dir, mode, run_id)
oak_log_dir = os.path.join(oak_run_dir, 'logs')
```
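
The S3 paths above are object-key prefixes, not filesystem paths. `os.path.join` produces `/` on macOS and Linux, but on Windows it would insert `\`, which S3 treats as an ordinary character in the key. A minimal sketch, using a hypothetical helper `build_run_paths`, that keeps the two kinds of path apart:

```python
import os
import posixpath


def build_run_paths(oak_project_dir, mode, run_id):
    """Return the S3 key prefixes and the OAK folders for one run."""
    # S3 keys always use "/", so posixpath keeps them valid on any OS
    s3_run_dir = posixpath.join(mode, run_id)
    # OAK is a mounted local folder, so os.path follows the host's conventions
    oak_run_dir = os.path.join(oak_project_dir, mode, run_id)
    return {
        "s3_run_dir": s3_run_dir,
        "s3_log_dir": posixpath.join(s3_run_dir, "logs"),
        "oak_run_dir": oak_run_dir,
        "oak_log_dir": os.path.join(oak_run_dir, "logs"),
    }
```

The returned `s3_log_dir` is the same prefix that Steps 3 and 5 rebuild by hand.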

```python
## configure logging
logger, log_capture_string = configs.configure_logging(logger_name = 'batch_report_sync')
logger.info(configs.HELPER_TEXT_BATCH_REPORT_SYNC)
logger.info("Configured logger")
logger.info(f"Log file to be saved in {oak_log_dir}")
logger.info("----- Parsed variables: mode = {}, run_id = {}".format(mode, run_id))
logger.info("----- S3 bucket: s3_project_dir = {}, s3_run_dir = {}".format(s3_project_dir, s3_run_dir))
logger.info("----- Sherlock OAK folders: oak_run_dir = {}".format(oak_run_dir))
```
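
The notebook only shows that `configs.configure_logging` returns a logger and a `log_capture_string` whose `getvalue()` holds the log text. One plausible way to build that pair is a second `StreamHandler` writing to an in-memory `StringIO`; this is a sketch of the idea, not the actual `configs` code:

```python
import logging
from io import StringIO


def configure_logging(logger_name):
    """Sketch: log to the console and to an in-memory buffer returned to the caller."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # do not repeat lines through the root logger
    logger.handlers.clear()    # drop handlers left over from an earlier run of the cell

    log_capture_string = StringIO()
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(), logging.StreamHandler(log_capture_string)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)

    return logger, log_capture_string
```

Clearing existing handlers matters in a notebook: re-running the cell would otherwise attach another handler each time and duplicate every line in the captured log that Step 5 writes to disk.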

```python
## print out variables and let the user confirm if they are correct and wish to proceed.
print("----- Parsed variables: mode = {}, run_id = {}".format(mode, run_id))
print("----- S3 bucket: s3_project_dir = {}, s3_run_dir = {}".format(s3_project_dir, s3_run_dir))
print("----- Sherlock OAK folders: oak_run_dir = {}".format(oak_run_dir))

proceed = input('Please verify the above variables. Do you wish to proceed with the run? [y/n]')
# stop here unless the user confirmed the variables
if proceed.strip().lower() != 'y':
    raise SystemExit('Run aborted: variables not confirmed.')
```

#### validate output

```python
print(log_capture_string.getvalue())
```

### Step 3: sync S3 bucket with Stanford's Sherlock OAK folder

In this step, we sync the S3 bucket to the Sherlock OAK project folder, so the batch reports and the whippet_sender log file are copied from S3 into the OAK folder. A quick check is to compare the whippet_sender log file in the S3 bucket with the copy in the Sherlock folder; the two should be identical.

#### code block

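```python
logger.info("========= 1/3 Sync s3 bucket with Stanford's Sherlock OAK folder ==========")
# check=True raises CalledProcessError if the aws CLI fails, instead of continuing silently
subprocess.run(['aws', 's3', 'sync', s3_project_dir, oak_project_dir], check=True)
```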
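
`aws s3 sync` copies only new or changed files, and its `--dryrun` option lists what would be copied without copying anything, which is handy when testing against the production folders. A sketch of a wrapper with the hypothetical name `s3_sync`; the `run` parameter exists only so the command can be inspected without calling AWS. It assumes `configs.S3_PROJECT_DIR` is an `s3://` URI, since the CLI needs at least one side of the sync to be an S3 path.

```python
import subprocess


def s3_sync(source, destination, dryrun=False, run=subprocess.run):
    """Run `aws s3 sync`, raising CalledProcessError if the CLI exits non-zero."""
    cmd = ["aws", "s3", "sync", source, destination]
    if dryrun:
        cmd.append("--dryrun")  # list what would be copied without copying it
    return run(cmd, check=True)
```

In the notebook, `s3_sync(s3_project_dir, oak_project_dir, dryrun=True)` would preview Step 3 before the real sync.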

#### validate output

```python
# check the whippet_sender log file in the S3 bucket and the Sherlock folder; they should be the same
s3_log_dir = os.path.join(mode, run_id, 'logs')
s3_content = bucket.Object(os.path.join(s3_log_dir,'whippet_sender.log')).get()['Body'].read().decode('utf-8')
with open(os.path.join(oak_log_dir, 'whippet_sender.log'), 'r') as file:
    oak_content = file.read()
s3_content == oak_content
```
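
Comparing the full text requires downloading the object. For small files uploaded in a single part, the S3 ETag is the MD5 of the content, so a local hash can be compared with `bucket.Object(key).e_tag` instead. This does not hold for multipart uploads (the ETag then contains a `-`; the AWS CLI switches to multipart above 8 MB by default) or for objects encrypted with SSE-KMS, so treat the sketch below, with its hypothetical helper `etag_matches`, as a shortcut with those caveats:

```python
import hashlib


def etag_matches(local_bytes, etag):
    """Compare a local file's MD5 with an S3 ETag (single-part, non-KMS uploads only)."""
    etag = etag.strip('"')  # boto3 returns the ETag wrapped in double quotes
    if "-" in etag:
        raise ValueError("multipart ETag; download and compare the content instead")
    return hashlib.md5(local_bytes).hexdigest() == etag
```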

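### Step 4: read batch report and upload it to RegLab's AWS database

In this step, we read the batch report from the OAK run folder, add a `run_id` and a `file_timestamp` column, and upload it to RegLab's AWS database. We should then be able to read the uploaded rows back from the database.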

#### code block

```python
logger.info("======== 2/3 read batch report and upload it to reglab's aws database ========")
batch_report = pd.read_csv(os.path.join(oak_run_dir, 'batch_report.csv'))
batch_report['run_id'] = run_id
batch_report['file_timestamp'] = dt.datetime.now()
sql_save.save_batch_report(mode = mode, batch_report = batch_report, engine = engine)
```
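
Adding the two columns with `DataFrame.assign` returns a new frame and leaves the DataFrame read from the CSV unchanged, which keeps the original available for comparison. A sketch using a hypothetical helper `tag_batch_report`; passing `now` explicitly is a design choice that makes the timestamp reproducible:

```python
import datetime as dt

import pandas as pd


def tag_batch_report(batch_report: pd.DataFrame, run_id: str, now=None) -> pd.DataFrame:
    """Return a copy of the report with run_id and file_timestamp columns added."""
    now = dt.datetime.now() if now is None else now
    # assign() broadcasts the scalars to every row and leaves the input untouched
    return batch_report.assign(run_id=run_id, file_timestamp=now)
```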

#### validate output

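```python
from sqlalchemy import text

# table for each mode (configs.ENGINE is assumed to be a SQLAlchemy engine)
if mode == 'prod':
    table = 'esnc_risk_notif.batch_report'
elif mode == 'test':
    table = 'sandbox.esnc_notif_batch_report'
else:
    raise ValueError(f"Unknown mode: {mode}")

# read back only the rows uploaded for this run
with engine.begin() as conn:
    df = pd.read_sql(text(f"SELECT * FROM {table} WHERE run_id = :run_id"), conn, params={'run_id': run_id})
df.head()
```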
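
If `save_batch_report` appends each run to the same table (likely, since a `run_id` column is added before saving), `SELECT *` also returns earlier runs, and the comparison in the next cell can only succeed when the read is limited to the current `run_id`. A self-contained sketch of that filter, with an in-memory SQLite table standing in for RegLab's database; the table layout, the `n_records` column and the `rows_for_run` helper are hypothetical:

```python
import sqlite3


def rows_for_run(conn, run_id):
    """Return only the rows uploaded for one run, using a bound parameter."""
    cur = conn.execute(
        "SELECT run_id, n_records FROM batch_report WHERE run_id = ?", (run_id,)
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_report (run_id TEXT, n_records INTEGER)")
conn.executemany(
    "INSERT INTO batch_report VALUES (?, ?)",
    [("2021Q3_old_run", 10), ("2021Q4_2021-08-03_170618_610692", 25)],
)
print(rows_for_run(conn, "2021Q4_2021-08-03_170618_610692"))
```

The bound parameter (`?` here, `:run_id` with SQLAlchemy's `text()`) also avoids building SQL from string concatenation.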

```python
# check if the table from the database is equal to the batch report from the folder
df.equals(batch_report)
```
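
`DataFrame.equals` is strict: it requires matching shape, values and column dtypes, so it returns `False` if, for example, an integer column comes back from the database as a float, or the database adds columns the CSV lacks. The small example below reproduces the dtype case with made-up data and shows `pandas.testing.assert_frame_equal` with `check_dtype=False` as a more tolerant check:

```python
import pandas as pd

from_csv = pd.DataFrame({"count": [1, 2]})      # int64
from_db = pd.DataFrame({"count": [1.0, 2.0]})   # float64: same values, different dtype

print(from_csv.equals(from_db))  # False, because the dtypes differ

# compare values while tolerating dtype differences; raises AssertionError on a real mismatch
pd.testing.assert_frame_equal(from_csv, from_db, check_dtype=False)
```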

### Step 5: save log file and sync to S3 bucket

In this step, we save the log file to the Sherlock OAK project folder and sync it to the S3 bucket. We should expect the log file in the Sherlock folder and the one in the S3 bucket to be identical.

#### code block

```python
logger.info('======= 3/3 save log file and sync to s3 bucket =======')
logger.info(f'Script FINISHED. Log file saved in {oak_log_dir} and synced with S3 bucket.')
with open(os.path.join(oak_log_dir, 'batch_report_sync.log'), 'w') as file:
    file.write(log_capture_string.getvalue())

subprocess.run(['aws', 's3', 'sync', oak_project_dir, s3_project_dir], check=True)
```
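
Step 5 writes into `oak_log_dir`, which already exists once Step 3 has synced `whippet_sender.log`. On a fresh run folder it might not, so a sketch that creates the folder first is safer. Writing with `encoding="utf-8"` matches the `decode('utf-8')` used in the validation cell; `save_log` is a hypothetical helper:

```python
import os


def save_log(log_text, log_dir, file_name="batch_report_sync.log"):
    """Write the captured log to log_dir, creating the folder if needed, and return the path."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, file_name)
    with open(path, "w", encoding="utf-8") as file:
        file.write(log_text)
    return path
```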

#### validate output

```python
# check the log file in s3 bucket and sherlock folder. they should be the same
file_name = 'batch_report_sync.log'
s3_log_dir = os.path.join(mode, run_id, 'logs')
s3_content = bucket.Object(os.path.join(s3_log_dir, file_name)).get()['Body'].read().decode('utf-8')

with open(os.path.join(oak_log_dir, file_name), 'r') as file:
    oak_content = file.read()

s3_content == oak_content
```
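
When the comparison returns `False`, it does not say why. A unified diff from the standard-library `difflib` module shows the differing lines (often a trailing newline or `\r\n` line endings). A sketch with a hypothetical helper `describe_diff`:

```python
import difflib


def describe_diff(s3_text, oak_text, n=3):
    """Return a unified diff of the two log versions; an empty string means they match."""
    diff = difflib.unified_diff(
        s3_text.splitlines(keepends=True),
        oak_text.splitlines(keepends=True),
        fromfile="s3",
        tofile="oak",
        n=n,
    )
    return "".join(diff)
```

In the notebook, `print(describe_diff(s3_content, oak_content))` would show exactly which lines differ.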

```python
print(oak_content)
```

---
End of notebook