# SAT Bootcamp Analysis 2
### Goal:
Divide new Ckodon students evenly into random groups of about 30.
Assign group labels from I to Z.

### Steps
1. Import Google Form data into Pandas DataFrame.
2. Clean the data:
    A. Clean leading and trailing white space from all string columns.
    B. Search for and print out duplicate rows.
    C. Search for and print out invalid email addresses.
3. Group the students:
    A. Define function for randomly assigning groups.
    B. Create `SAT Group` column and assign groups in column.
    C. Sort students by group.
4. Export resulting DataFrame to Excel Workbook.

### Future
1. Replace group naming with dictionary comprehension.
2. Remove duplicates after stripping whitespace.
3. Move code for sending email to different module.
4. Remove hardcoded WhatsApp links plus app password.

## Import Google Form data

In [None]:
# import required modules
import pandas as pd
from random import randint

# import students
students_bf = pd.read_excel("../data/Ckodon Activity Review Form - Before (Responses).xlsx")
students_ng = pd.read_excel("../data/Ckodon Activity Review Form - No Group (Responses).xlsx")


# test import success
print(students_bf.head())
print(students_ng.head())

## Clean the data

### Remove leading and trailing white space from all string columns.
Strip leading and trailing whitespace from the `Full Name` and `Email` columns.
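
As a hedged sketch of this step: applying `str.strip` directly raises a `TypeError` when a cell is blank (missing), whereas the pandas `.str` accessor strips text and leaves missing values as they are. The sample responses below are invented for illustration and are not taken from the form data.

```python
import pandas as pd

# Invented sample responses; the second row has a blank email cell.
responses = pd.DataFrame({
    "Full Name": ["  Ama Mensah ", "Kofi Boateng  "],
    "Email": [" ama@example.com", None],
})

# Strip both columns; missing values stay missing instead of raising.
for column in ["Full Name", "Email"]:
    responses[column] = responses[column].str.strip()

print(responses)
```

Whether blank cells can occur depends on which questions the form marks as required, so this is only a precaution.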


In [2]:
students_bf["Full Name"] = students_bf["Full Name"].apply(str.strip)
students_bf["Email"] = students_bf["Email"].apply(str.strip)
students_ng["Full Name"] = students_ng["Full Name"].apply(str.strip)
students_ng["Email"] = students_ng["Email"].apply(str.strip)

### Search and print out duplicate rows
Identify all rows with the same Full Name and Email or Full Name and WhatsApp Number.
Duplicate rows will be manually inspected in MS Excel and removed from the data set after identification.
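
The rule above can also be sketched with `DataFrame.duplicated`, which marks rows that repeat on a chosen subset of columns. This is an illustrative alternative, not the notebook's own function: `find_duplicates` is a hypothetical helper name, and the sample rows are invented.

```python
import pandas as pd


def find_duplicates(dataframe):
    """Return rows sharing a name and email, or a name and WhatsApp number."""
    # Compare on normalised copies so the original casing is preserved.
    normalised = pd.DataFrame({
        "name": dataframe["Full Name"].str.strip().str.lower(),
        "email": dataframe["Email"].str.strip().str.lower(),
        "whatsapp": dataframe["WhatsApp Number"],
    })
    # keep=False flags every member of a duplicate set, not just the repeats.
    same_email = normalised.duplicated(subset=["name", "email"], keep=False)
    same_number = normalised.duplicated(subset=["name", "whatsapp"], keep=False)
    return dataframe[same_email | same_number]


# Invented sample data for illustration only.
sample = pd.DataFrame({
    "Full Name": ["Ama Mensah", "ama mensah ", "Kofi Boateng", "Kofi Boateng", "Esi Owusu"],
    "Email": ["ama@example.com", "AMA@example.com", "kofi@example.com",
              "kofi2@example.com", "esi@example.com"],
    "WhatsApp Number": [111, 222, 333, 333, 444],
})
duplicates = find_duplicates(sample)
print(duplicates)
```

Returning the rows rather than printing them keeps the manual inspection step possible while also allowing the result to be exported.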


In [3]:
# define function for finding duplicates
def print_duplicates(dataframe):
    """Prints all duplicate rows in `dataframe`.

    Parameters
    ==========
    dataframe : pandas.core.frame.DataFrame
        A pandas DataFrame containing each student's information.

    Returns
    =======
    None
    """
    # create copy of `dataframe`
    dataframe_cp = dataframe.copy()

    # lowercase names and emails so that comparisons ignore casing
    dataframe_cp["Full Name"] = dataframe_cp["Full Name"].apply(str.lower)
    dataframe_cp["Email"] = dataframe_cp["Email"].apply(str.lower)

    # extract relevant columns as lists for easy searching
    names = list(dataframe_cp["Full Name"])
    emails = list(dataframe_cp["Email"])
    whatsapp_nos = list(dataframe_cp["WhatsApp Number"])

    for row in dataframe_cp.index:
        name = dataframe_cp.loc[row]["Full Name"]
        email = dataframe_cp.loc[row]["Email"]
        whatsapp_no = dataframe_cp.loc[row]["WhatsApp Number"]
        row_excel = row + 2  # row number in MS Excel

        if names.count(name) > 1:
            if emails.count(email) > 1 or whatsapp_nos.count(whatsapp_no) > 1:
                no_of_duplicates = names.count(name)
                print(
                    f"DUPLICATE FOUND ({no_of_duplicates}): {row_excel} {dataframe['Full Name'][row]} {whatsapp_no} {dataframe['Email'][row]}")

In [None]:
# search for duplicate rows in `students_bf`
print("----------Searching-STUDENTS_BF--------------")
print_duplicates(students_bf)

# search for duplicate rows in `students_ng`
print("\n\n----------Searching-STUDENTS_NG--------------")
print_duplicates(students_ng)

In [None]:
# merge and search duplicates across sheets (after removing duplicates manually)
students = pd.concat([students_bf, students_ng], ignore_index=True)

print("----------Searching-STUDENTS_BF-and-STUDENTS_NG--------------")
print_duplicates(students)

### Search and print out invalid email addresses
This check is crucial because each student's group assignment will be sent to them by email.
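
A minimal sketch of whole-string validation, assuming the same character classes as the notebook's pattern. `re.fullmatch` only succeeds when the entire string matches, so a cell holding two addresses or trailing text is rejected. `is_valid_address` is a hypothetical name, and the sample addresses are invented.

```python
import re

# Raw string so that the backslashes reach the regex engine unchanged.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9\-.+_]+@[a-zA-Z0-9\-.+_]+\.[a-zA-Z]{2,}")


def is_valid_address(email):
    """Return True only if the whole of `email` looks like one address."""
    return isinstance(email, str) and EMAIL_PATTERN.fullmatch(email) is not None


# Invented sample values for illustration only.
samples = [
    "ama@example.com",
    "ama.mensah+sat@mail.example.org",
    "ama@example",
    "ama@example.com, kofi@example.com",
]
invalid = [address for address in samples if not is_valid_address(address)]
print(invalid)
```

A pattern like this only checks the shape of an address; it cannot confirm that the mailbox exists.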

In [6]:
# define function for validating email addresses
from re import search


def validate_address(email):
    """Return True if `email` is valid or False otherwise.

    Parameters
    ==========
    email: str
        The email address to be validated.

    Returns
    =======
    bool
        True means email address is valid. False means email address is invalid.
    """

    # raw string avoids invalid-escape warnings; ^ and $ require the whole string to match
    pattern = r"^[a-zA-Z0-9\-.+_]+@[a-zA-Z0-9\-.+_]+\.[a-zA-Z]{2,}$"
    match = search(pattern, email)
    return bool(match)

In [None]:
# call function on all addresses in email column

# students_bf
print("-------Invalid addresses in `students_bf`--------")
for row in students_bf.index:
    email = students_bf.loc[row]["Email"]
    if not validate_address(email):
        print(students_bf.loc[row])

# students_ng
print("-------Invalid addresses in `students_ng`--------")
for row in students_ng.index:
    email = students_ng.loc[row]["Email"]
    if not validate_address(email):
        print(students_ng.loc[row])

## Group the students

### Define function for randomly assigning groups

In [12]:
# recursive random number generator without repetition
previous = []


def generate(a, b):
    """Returns pseudorandom integer between a and b if not already contained in a global list with name `previous`.

    Parameters
    ==========
    a : int
        Lower bound of the range within which generated number may lie (inclusive).
    b : int
        Upper bound of the range within which generated number may lie (inclusive).

    Returns
    =======
    int
        A pseudorandom number between a and b inclusive whose value is not already contained in `previous` list.
    """
    number = randint(a, b)
    if number not in previous:
        previous.append(number)
        return number
    else:
        return generate(a, b)
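
The retry approach above recurses once for every number that has already been drawn, so the call chain can grow long when only a few unused numbers remain. A hedged alternative is to shuffle the row positions once and walk through them in order. `assign_groups` and its `seed` parameter are hypothetical names introduced for this sketch.

```python
import random
from collections import Counter


def assign_groups(n_students, labels, seed=None):
    """Return a list where entry i is the group label for student i."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    order = list(range(n_students))
    rng.shuffle(order)  # every student appears exactly once, in random order

    assignment = [None] * n_students
    for position, student in enumerate(order):
        # cycle through the labels so that group sizes differ by at most one
        assignment[student] = labels[position % len(labels)]
    return assignment


assignment = assign_groups(65, ["I", "J"], seed=1)
sizes = Counter(assignment)
print(sizes)
```

The returned list could be assigned to a DataFrame column directly, which would also remove the need for a global `previous` list.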

### Create `SAT Group` column and assign groups in column

In [9]:
# Create SAT Group column
students_bf["SAT Group"] = None
students_ng["SAT Group"] = None

In [None]:
# Assign groups for `students_bf`
no_of_groups_a = len(students_bf) // 30  # no of groups for students_bf
groups = [chr(i + 73) for i in range(no_of_groups_a)]

# for the no. of students there are, pick a random student and assign them a group.
for i in range(len(students_bf)):
    student = generate(0, len(students_bf) - 1)
    group = groups[i % len(groups)]
    students_bf.loc[student, "SAT Group"] = group  # .loc avoids chained assignment

print("---------No-of-Students-per-Group--------")
for group in groups:
    print(group, list(students_bf["SAT Group"]).count(group))
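
The label arithmetic above can be sketched as a small helper, under two assumptions that the cell does not state: a sheet with fewer than 30 students should still get one group (otherwise `groups` is empty and the modulo fails), and labels should never run past `Z`. `group_labels` and its `skip` parameter are hypothetical names.

```python
def group_labels(n_students, group_size=30, first_label="I", skip=0):
    """Return consecutive letter labels for the groups of one sheet.

    `skip` is the number of labels already used by an earlier sheet.
    """
    # floor division means each group holds at least `group_size` students
    n_groups = max(1, n_students // group_size)
    start = ord(first_label) + skip  # ord("I") is 73, as used in the cell above
    if start + n_groups - 1 > ord("Z"):
        raise ValueError("not enough letters left before Z")
    return [chr(start + i) for i in range(n_groups)]


# Invented student counts for illustration only.
first_sheet = group_labels(155)
second_sheet = group_labels(185, skip=len(first_sheet))
print(first_sheet, second_sheet)
```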

In [None]:
# Assign groups for `students_ng`
previous = []  # reset `previous` list for random number generator
students_ng["SAT Group"] = None  # create new column
no_of_groups_b = len(students_ng) // 30  # no of groups for students_ng
groups = [chr(i + 73 + no_of_groups_a) for i in range(no_of_groups_b)]

# for the no. of students there are, pick a random student and assign them a group.
for i in range(len(students_ng)):
    student = generate(0, len(students_ng) - 1)
    group = groups[i % len(groups)]
    students_ng.loc[student, "SAT Group"] = group  # .loc avoids chained assignment

print("---------No-of-Students-per-Group--------")
for group in groups:
    print(group, list(students_ng["SAT Group"]).count(group))

### Sort students by group

In [None]:
students_bf.sort_values("SAT Group", inplace=True)
students_ng.sort_values("SAT Group", inplace=True)

print(students_bf)
print(students_ng)

## Export to Excel Workbook

In [13]:
students_bf.to_excel("../data/output/Ckodon Activity Review Groups(BEFORE).xlsx", index=False)
students_ng.to_excel("../data/output/Ckodon Activity Review Groups(NO GROUP).xlsx", index=False)

## Email group assignments to students
Email each student their assigned group and the matching WhatsApp group link.
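
One way to keep the message text in a single place is a small builder function, sketched here with a shortened body. `build_message` is a hypothetical helper, and the link in the example is a placeholder rather than a real group link.

```python
from email.message import EmailMessage

SENDER = "ckodontech@gmail.com"  # sender address used elsewhere in this notebook


def build_message(full_name, email, group, whatsapp_links):
    """Return the group-assignment email for one student."""
    message = EmailMessage()
    message["From"] = SENDER
    message["To"] = email
    message["Subject"] = "Your Ckodon Activity Review Group"
    message.set_content(
        f"Dear {full_name.title()},\n\n"
        f"You have been assigned to Ckodon Activity Review Group {group}.\n\n"
        "You may use the link below to join your assigned group on WhatsApp:\n"
        f"{whatsapp_links[group]}\n"
    )
    return message


# Placeholder data for illustration only.
example = build_message(
    "ama mensah", "ama@example.com", "I", {"I": "https://example.com/placeholder-link"}
)
print(example["To"], example["Subject"])
```

Because the links are passed in as an argument, the same function would serve both sheets.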

In [None]:
from smtplib import SMTP
from email.message import EmailMessage

# set WhatsApp links for each group
whatsapp_link_bf = {
                    "I": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                    "J": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                    "K": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                    "L": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                    "M": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
                    }

whatsapp_link_ng = {
                    "N": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                    "O": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                    "P": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                    "Q": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                    "R": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                    "S": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
                    }

# instantiate SMTP client
server = "smtp.gmail.com"
port = 587
username = "ckodontech@gmail.com"
password = input("Enter the App password you received from Ckodon:")

smtp = SMTP(server, port)
smtp.ehlo()
smtp.starttls()
smtp.login(username, password)
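
As a sketch of how the account details could be kept out of the notebook, credentials might be read from environment variables, with a hidden prompt as a fallback. The variable names `CKODON_SMTP_USER` and `CKODON_SMTP_PASSWORD` and the helper `load_smtp_credentials` are assumptions made for this example.

```python
import os
from getpass import getpass


def load_smtp_credentials(environ=None, prompt=getpass):
    """Return (username, password) without hardcoding either value."""
    environ = os.environ if environ is None else environ
    # hypothetical variable names; a KeyError signals a missing username
    username = environ["CKODON_SMTP_USER"]
    # getpass hides the typed password, unlike input()
    password = environ.get("CKODON_SMTP_PASSWORD") or prompt("App password: ")
    return username, password
```

The returned pair could then be passed to `smtp.login` in place of the literal values.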

In [None]:
from time import sleep
emails_sent = 0  # no. of emails sent since the last pause
delay_s = 60  # pause length in seconds

# iterate over all students, send emails.
for row in students_bf.index:
    student_bf = students_bf.loc[row]
    msg = EmailMessage()
    msg["From"] = "ckodontech@gmail.com"
    msg["To"] = student_bf["Email"]
    msg["Subject"] = "Your Ckodon Activity Review Group"
    body = f"""Dear {student_bf["Full Name"].title()},

You have been assigned to Ckodon Activity Review Group {student_bf["SAT Group"]}.

You may use the link below to join your assigned group on WhatsApp:
{whatsapp_link_bf[student_bf["SAT Group"]]}

Join the group as soon as possible so that we can begin with the review. This link is meant to be used by you alone. Do not share with any other person.

Please do not reply to this email.

Best,
The Ckodon Foundation Team."""
    msg.set_content(body)

    # send email message
    try:
        smtp.send_message(msg)
        print("SENT")
        emails_sent += 1

        # pause for `delay_s` seconds after every 60 emails to avoid Gmail SMTP Error 421
        if emails_sent == 60:
            emails_sent = 0
            sleep(delay_s)
    except Exception as exception:
        print("Exception:", exception)
        print("Sending terminated at row number:", row)
        break  # stop here so that sending can resume from this row


In [None]:
emails_sent = 0  # no. of emails sent so far
delay_s = 60  # delay in seconds

# iterate over all students, send emails.
for row in students_ng.index:
    student_ng = students_ng.loc[row]
    msg = EmailMessage()
    msg["From"] = "ckodontech@gmail.com"
    msg["To"] = student_ng["Email"]
    msg["Subject"] = "Your Ckodon Activity Review Group"
    body = f"""Dear {student_ng["Full Name"].title()},

You have been assigned to Ckodon Activity Review Group {student_ng["SAT Group"]}.

You may use the link below to join your assigned group on WhatsApp:
{whatsapp_link_ng[student_ng["SAT Group"]]}

Join the group as soon as possible so that we can begin with the review. This link is meant to be used by you alone. Do not share with any other person.

Please do not reply to this email.

Best,
The Ckodon Foundation Team."""
    msg.set_content(body)

    # send email message
    try:
        smtp.send_message(msg)
        print("SENT")
        emails_sent += 1

        # pause for `delay_s` seconds after every 60 emails to avoid Gmail SMTP Error 421
        if emails_sent == 60:
            emails_sent = 0
            sleep(delay_s)
    except Exception as exception:
        print("Exception:", exception)
        print("Sending terminated at row number:", row)
        break  # stop here so that sending can resume from this row
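
The two sending loops share the same pause-and-stop logic, which could be factored into one function. The sketch below takes the send and sleep functions as arguments so that it can be dry-run without a mail server; `send_in_batches` is a hypothetical helper, and the batch size and delay simply mirror the values used above.

```python
import time


def send_in_batches(messages, send, batch_size=60, delay_s=60, sleep=time.sleep):
    """Send each message, pausing after every full batch.

    Returns the number of messages sent before finishing or failing.
    """
    sent = 0
    for message in messages:
        try:
            send(message)
        except Exception as exception:
            print("Exception:", exception)
            print("Stopped after", sent, "messages")
            break  # stop at the first failure so sending can resume from here
        sent += 1
        # pause between batches, but not after the final message
        if sent % batch_size == 0 and sent < len(messages):
            sleep(delay_s)
    return sent


# Dry run with a list standing in for the SMTP connection.
outbox = []
total = send_in_batches(["a", "b", "c"], outbox.append, batch_size=2, delay_s=0)
print(total, outbox)
```

With a live connection, `smtp.send_message` could be passed as `send`.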
