# Line-by-line Test of the Whippet Sender

This notebook tests the main script `1_whippet_sender` line by line and shows the output of each block for debugging. The main script follows these steps:

1. import installed packages and supporting modules
2. set up directories and logging
3. read email files from the S3 bucket
4. send emails from whippet
5. generate the batch report
6. save the batch report to the S3 bucket
7. save the logging file

This notebook tests each code block and validates its outputs.

## Prerequisites for replication:
1. must have Sherlock OAK and GROUP_SCRATCH mounted on your local machine (see guide).
2. must have saved OAK and GROUP_SCRATCH as environment variables in your `.bash_profile` or `.zshrc` file. For example:
```
# sherlock directories
export OAK="$HOME/sherlock_oak"
export GROUP_SCRATCH="$HOME/sherlock_group_scratch"
```
3. must have the `esnc_risk_notif` git repo cloned to your local machine.
4. must have access to reglab's testing Gmail account (`reglabtest@gmail.com`) and have saved `REGLAB_TEST_GMAIL_ADDR` and `REGLAB_TEST_GMAIL_PWD` as environment variables in your local `.bash_profile` or `.zshrc` file. Reach out to Nicole Lin (nlin@law.stanford.edu) for access.
5. must have set up AWS access and saved your access key ID and secret access key. See [this guide](https://realpython.com/python-boto3-aws-s3/).
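
Before running the notebook, it can help to confirm that the environment variables named above are actually set. The following is a minimal sketch: the helper name `missing_env_vars` is hypothetical and not part of the repo, and AWS credentials are left out because they may be stored in a credentials file rather than in environment variables.

```python
import os

# Variable names taken from the prerequisites above.
REQUIRED_VARS = [
    "OAK",
    "GROUP_SCRATCH",
    "REGLAB_TEST_GMAIL_ADDR",
    "REGLAB_TEST_GMAIL_PWD",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```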


In [None]:
# first, set the working directory to the folder where 0_email_maker will sit
import os 

os.chdir('..')
os.getcwd()

### Step 1: import packages

In this step, we are checking whether all the required modules have been installed in the environment. 

In [None]:
# import installed packages
import os
import pandas as pd
import datetime as dt
import logging

## for s3 connection
import boto3
import subprocess

## for emailer
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import smtplib, ssl

# import supporting modules
import configs
from utilities import json_functions

In [None]:
# set parsed arguments
mode = 'test'
run_id = '2021Q4_2021-08-03_170618_610692'
system = 'sherlock'

### Step 2: set up directories and logging

In this step, we configure the directories and the logging file. We should expect the global variables from `configs` to be read correctly and the log to contain the relevant lines.
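
`configs.configure_logging` is not shown in this notebook. As a rough illustration of how a helper can return both a logger and an in-memory capture string, here is a minimal sketch using only the standard library. The function name `make_capturing_logger` and the format string are assumptions, not the repo's implementation.

```python
import io
import logging


def make_capturing_logger(logger_name):
    """Return (logger, buffer); records at INFO or above are written to buffer."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate lines when a cell is re-run
    logger.propagate = False  # keep records out of the root logger
    logger.addHandler(handler)
    return logger, buffer


logger_demo, capture_demo = make_capturing_logger("whippet_sender_demo")
logger_demo.info("Configured logger")
print(capture_demo.getvalue())
```

Because the buffer lives in memory, its contents can later be uploaded in one call, which matches how `log_capture_string.getvalue()` is used at the end of this notebook.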

#### code block

In [None]:
print(configs.HELPER_TEXT_WHIPPET_SENDER)
print("===== Start running whippet sender =====")

# ## get parsed variables
# args = get_args()
# mode = args.mode
# run_id = args.run_id
# system = args.system
# assert mode in ['test', 'prod'], 'Expect mode to be in test or prod. Aborting.'
# assert system in ['sherlock', 'whippet'], 'Expect system to be in sherlock or whippet. Aborting.'

## get global variables
bucket = configs.BUCKET
s3_project_dir = configs.S3_PROJECT_DIR
prod_from_addr = configs.PROD_FROM_ADDR
test_from_addr = configs.TEST_FROM_ADDR
test_to_addr = configs.TEST_TO_ADDR
test_addr_pwd = configs.TEST_ADDR_PWD
test_bcc_addr = configs.TEST_BCC_ADDR  

## set directories based on mode and run_id 
s3_run_dir = os.path.join(mode, run_id)
s3_emails_dir = os.path.join(s3_run_dir, 'emails')
s3_log_dir = os.path.join(s3_run_dir, 'logs')

In [None]:
## configure logging
logger, log_capture_string = configs.configure_logging(logger_name = 'whippet_sender')
logger.info(configs.HELPER_TEXT_WHIPPET_SENDER)
logger.info("Configured logger")
logger.info("----- Parsed variables: mode = {}, run_id = {}".format(mode, run_id))
logger.info("----- S3 bucket: s3_project_dir = {}, s3_run_dir = {}".format(s3_project_dir, s3_run_dir))
logger.info("----- From email address: {}".format(test_from_addr if mode == 'test' else prod_from_addr))
logger.info("----- To email address: {}".format(test_to_addr if mode == 'test' else 'facility addresses'))

In [None]:
## print out variables and let the user confirm if they are correct and wish to proceed. 
print("----- Parsed variables: mode = {}, run_id = {}".format(mode, run_id))
print("----- S3 bucket: s3_project_dir = {}, s3_run_dir = {}".format(s3_project_dir, s3_run_dir))
print("----- From email address: {}".format(test_from_addr if mode == 'test' else prod_from_addr))
print("----- To email address: {}".format(test_to_addr if mode == 'test' else 'facility addresses'))

proceed = input('Please verify the above variables. Do you wish to proceed with the run? [y/n]')

#### validate outputs

In [None]:
print(log_capture_string.getvalue())

### Step 3: read email files from S3 bucket

In this step, we read the emails, stored as JSON files, from the S3 bucket. We should expect each email file to be read into the program as a dictionary.
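
`json_functions.read_emails_from_json` is not shown here. The sketch below illustrates one way to load every JSON object under a prefix into a list of dictionaries. It assumes `bucket` behaves like a boto3 `Bucket` resource (`bucket.objects.filter(Prefix=...)` yielding objects with `.key` and `.get()`). The function name `load_json_emails` and the in-memory stand-in classes are hypothetical and exist only so the example runs without AWS access.

```python
import io
import json


def load_json_emails(bucket, prefix):
    """Read every .json object under prefix and return a list of dicts."""
    emails = []
    for obj in bucket.objects.filter(Prefix=prefix):
        if not obj.key.endswith(".json"):
            continue  # skip folder placeholders and non-JSON files
        raw = obj.get()["Body"].read().decode("utf-8")
        emails.append(json.loads(raw))
    return emails


# --- in-memory stand-ins for a boto3 Bucket, for illustration only ---
class FakeObject:
    def __init__(self, key, text):
        self.key = key
        self._text = text

    def get(self):
        return {"Body": io.BytesIO(self._text.encode("utf-8"))}


class FakeObjects:
    def __init__(self, objects):
        self._objects = objects

    def filter(self, Prefix):
        return [o for o in self._objects if o.key.startswith(Prefix)]


class FakeBucket:
    def __init__(self, objects):
        self.objects = FakeObjects(objects)


fake_bucket = FakeBucket([
    FakeObject("test/run1/emails/KY0001.json", '{"npdes_permit_id": "KY0001"}'),
    FakeObject("test/run1/logs/run.log", "not json"),
])
demo_emails = load_json_emails(fake_bucket, "test/run1/emails")
print(demo_emails)
```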

#### code block

In [None]:
logger.info('====== 1/7 Reading email files from s3 bucket =======')
email_dicts = json_functions.read_emails_from_json(s3_emails_dir, s3=True, bucket=bucket)

#### validate outputs

In [None]:
email_dicts[0]

### Step 4: send email from whippet

In this step, we send the emails. In production, they go out from whippet; in test mode on Sherlock, they go out through a test Gmail account instead. We should expect to see sample emails sent to `reglabtest@gmail.com` and `nlin@law.stanford.edu`.
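
The header logic in this step can be separated from the actual sending, which makes it easier to check without an SMTP server. This is a minimal sketch, assuming the email dictionaries carry `subject`, `header`, `body`, `to_addrs`, and `bcc_addrs` keys. The function name `build_message` and the example addresses are hypothetical.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(email, mode, test_to_addr, test_bcc_addr):
    """Build the MIME message; test mode overrides the real recipients."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = email["subject"]
    msg.attach(MIMEText(email["header"] + email["body"], "html"))
    if mode == "test":
        msg["To"] = test_to_addr
        msg["Bcc"] = test_bcc_addr
    elif mode == "prod":
        msg["To"] = email["to_addrs"]
        msg["Bcc"] = email["bcc_addrs"]
    else:
        raise ValueError(f"Unexpected mode: {mode!r}")
    return msg


demo_email = {
    "subject": "Demo subject",
    "header": "<p>Header</p>",
    "body": "<p>Body</p>",
    "to_addrs": "facility@example.com",
    "bcc_addrs": "state@example.com",
}
demo_msg = build_message(demo_email, "test", "tester@example.com", "bcc@example.com")
print(demo_msg["To"], demo_msg["Bcc"])
```

Note that `smtplib.SMTP.send_message` uses the `Bcc` header to work out the recipients but does not transmit that header in the delivered message.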

#### code block

In [None]:
logger.info('====== 2/7 Generating email objects with dictionaries =======')
notif_permits = [e['npdes_permit_id'] for e in email_dicts]
logger.info(f'for the following permit ids (totalling {len(notif_permits)} permits): {notif_permits}')

In [None]:
# compile emails
for email in email_dicts:
    logger.info(f"Compiling email for {email['npdes_permit_id']}")
    msg = MIMEMultipart('alternative')
    msg['Subject'] = email['subject']
    msg.attach(MIMEText(email['header'] + email['body'], 'html'))
    email['whippet_sender_mode'] = mode
    email['sender_system'] = system

    # send emails
    log_to_addr = test_to_addr if mode == 'test' else email['to_addrs']
    logger.info(f'Sending email to {log_to_addr}')
    if mode == 'test':
        msg['To'] = test_to_addr
        msg['BCC'] = test_bcc_addr

        # sending out from sherlock: using a test gmail account
        if system == 'sherlock': 
            msg['From'] = test_from_addr
            password = test_addr_pwd
            port = 465  # For SSL
            smtp_server = "smtp.gmail.com"
            context = ssl.create_default_context()
            with smtplib.SMTP_SSL(smtp_server, port, context=context) as server:
                server.login(msg['From'], password)
                server.send_message(msg)

        # sending out from whippet: using epa's production email 
        ## to be tested on whippet
        if system == 'whippet':
            msg['From'] = prod_from_addr
            with smtplib.SMTP('localhost', port=25) as server: 
                server.send_message(msg)

    elif mode == 'prod':
        msg['From'] = prod_from_addr
        msg['To'] = email['to_addrs']
        msg['BCC'] = email['bcc_addrs']

        with smtplib.SMTP('localhost', port=25) as server: 
            server.send_message(msg)

    email['email_sent_timestamp'] = dt.datetime.now()
    email['email_sent_from_addr'] = msg['From']
    email['email_sent_to_addr'] = msg['To']
    email['bcc_addrs'] = msg['BCC']

#### validate outputs

Check that the emails were delivered to `nlin@law.stanford.edu` and `reglabtest@gmail.com`.

### Step 5: generate batch report

In this step, we generate the batch report from the emails that were sent. We should expect the dataframe to contain one row per email with the expected columns.
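
Selecting columns with `pd.DataFrame(email_dicts)[cols]` raises a `KeyError` if a requested column is absent from every dictionary. A small pre-check can report which keys are missing before the dataframe is built. This is a sketch; `missing_report_columns` is a hypothetical helper, not part of the repo.

```python
def missing_report_columns(records, cols):
    """Return the columns in cols that appear in none of the record dicts."""
    present = set()
    for record in records:
        present.update(record.keys())
    return [col for col in cols if col not in present]


demo_records = [
    {"npdes_permit_id": "KY0001", "subject": "Demo"},
    {"npdes_permit_id": "TX0002", "subject": "Demo", "body": "<p>Hi</p>"},
]
print(missing_report_columns(demo_records, ["npdes_permit_id", "body", "header"]))
```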

#### code block

In [None]:
logger.info('====== 4/7 Generating batch report as dataframe =======')
cols = ['npdes_permit_id', 
        'fiscal_quarter', 
        'to_addrs', 
        'bcc_addrs',
        'whippet_sender_mode',
        'sender_system',
        'email_sent_timestamp',
        'email_sent_from_addr',
        'email_sent_to_addr',
        'email_template',
        'subject',
        'header',
        'body'
       ]
batch_report = pd.DataFrame(email_dicts)[cols]

#### validate outputs

In [None]:
batch_report.head()

### Step 6: save batch report to S3 bucket

In this step, we save the batch report to the S3 bucket as CSV files. Note that KY has requested that the batch report be sent to their state representatives each quarter, so we generate a batch report specific to KY and save it as a separate file. We should expect to be able to retrieve the batch reports saved to the S3 bucket.
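
The state-specific subset is a filter on the permit ID prefix. The sketch below shows the same idea on plain dictionaries with the standard library, so it runs without pandas or S3 access. The helper names `rows_for_state` and `rows_to_csv` and the sample rows are hypothetical.

```python
import csv
import io


def rows_for_state(rows, state_code):
    """Keep rows whose permit ID starts with the given state code."""
    return [r for r in rows if r["npdes_permit_id"].startswith(state_code)]


def rows_to_csv(rows, fieldnames):
    """Serialize rows to CSV text with a header row and no index column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


demo_rows = [
    {"npdes_permit_id": "KY0001", "subject": "A"},
    {"npdes_permit_id": "TX0002", "subject": "B"},
]
ky_csv = rows_to_csv(rows_for_state(demo_rows, "KY"), ["npdes_permit_id", "subject"])
print(ky_csv)
```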

#### code block

In [None]:
logger.info('======= 5/7 Saving batch report to s3 bucket ========')
batch_report.to_csv(os.path.join(s3_project_dir, s3_run_dir,'batch_report.csv'), index = False)

logger.info('======= 6/7 Subset and save KY batch report to s3 bucket ======')
ky_batch_report = batch_report[batch_report.npdes_permit_id.str.startswith('KY')]
ky_batch_report.to_csv(os.path.join(s3_project_dir, s3_run_dir, 'batch_report_ky.csv'), index = False)

#### validate outputs

In [None]:
from io import StringIO
content = bucket.Object(os.path.join(s3_run_dir,'batch_report.csv')).get()['Body'].read().decode('utf-8')
df = pd.read_csv(StringIO(content))
df.head()

In [None]:
# check that the email body renders legibly
from IPython.display import display, HTML
display(HTML(df.body[0]))

### Step 7: save logging file to S3 bucket

In this step, we save the logging file to the S3 bucket. We should expect to be able to retrieve the file from the bucket and print it out.
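
S3 keys always use forward slashes. `os.path.join` produces them on macOS and Linux but uses backslashes on Windows, whereas `posixpath.join` gives the same result on every platform. The following is a minimal sketch using the directory names from this notebook; the helper name `s3_key` is hypothetical.

```python
import posixpath


def s3_key(*parts):
    """Join key parts with forward slashes regardless of operating system."""
    return posixpath.join(*parts)


demo_log_key = s3_key("test", "2021Q4_2021-08-03_170618_610692", "logs", "whippet_sender.log")
print(demo_log_key)
```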

#### code block

In [None]:
logger.info('========= 7/7 Saving logging file to s3 bucket =========')
logger.info(f'Script FINISHED. Log file saved in S3 bucket {s3_log_dir}. Note: not yet synced with Sherlock oak folder.')
logger_obj = bucket.Object(os.path.join(s3_log_dir, 'whippet_sender.log'))
logger_obj.put(Body=log_capture_string.getvalue())

print("===== Finish running whippet sender =====")

#### validate outputs

In [None]:
s3_content = bucket.Object(os.path.join(s3_log_dir,'whippet_sender.log')).get()['Body'].read().decode('utf-8')
print(s3_content)
