# Connecting to the Gmail Server and Obtaining Inbox Index Information

Step 1: Importing the email connection, date/time, collections, and csv modules

    - email = package for parsing and managing email messages
    - imaplib = defines three classes, IMAP4, IMAP4_SSL and IMAP4_stream, which encapsulate a connection to an IMAP4 server
    - datetime = supplies classes for manipulating dates and times (datetime and timedelta are used here)
    - collections = implements specialized container datatypes (the Counter class is used here to tally senders)
    - csv = reads and writes CSV files, the most common import and export format for spreadsheets and databases

In [None]:
import imaplib
import email
from datetime import datetime, timedelta
from collections import Counter
import csv

Step 2: Defining basic variables for the domain, username, password, server, and port

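The domain is statically specified as "@gmail.com" and appended to the username the user enters. The FROM_EMAIL and FROM_PWD variables capture the user's Gmail address and password; for IMAP sign-in, Gmail generally requires an app password (created after enabling 2-Step Verification) rather than the regular account password. Despite their SMTP prefix, the SMTP_SERVER and SMTP_PORT variables hold the host name and port of Gmail's IMAP server: imap.gmail.com, which accepts SSL connections on port 993.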

In [None]:
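import getpass

ORG_EMAIL   = "@gmail.com"
FROM_EMAIL  = input("What is your Gmail username? ") + ORG_EMAIL
# getpass hides the password while it is typed; Gmail expects an app password here
FROM_PWD    = getpass.getpass("What is your Gmail app password? ")
# Despite the SMTP_ prefix, these are Gmail's IMAP-over-SSL host and port
SMTP_SERVER = "imap.gmail.com"
SMTP_PORT   = 993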
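As a quick check on the values above, the sketch below shows how the full address is assembled from a username, and that imaplib exposes the default IMAP-over-SSL port as a module constant, which IMAP4_SSL uses when no port is passed. The username "jane.doe" is a made-up placeholder.

```python
import imaplib

ORG_EMAIL = "@gmail.com"
username = "jane.doe"  # hypothetical placeholder username

# The full address is the username with the domain suffix appended
address = username + ORG_EMAIL
print(address)  # jane.doe@gmail.com

# imaplib.IMAP4_SSL connects on this port when no port argument is given
print(imaplib.IMAP4_SSL_PORT)  # 993
```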

Step 3: Connecting to the server, logging in, and selecting the "inbox"

The variable mail is an imaplib.IMAP4_SSL connection object opened to the server defined above. Its login() method authenticates with the supplied address and password, and select('inbox') makes the inbox the current mailbox for the search and fetch calls that follow.

In [None]:
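# Open an SSL-encrypted IMAP connection to the host and port defined above
mail = imaplib.IMAP4_SSL(SMTP_SERVER, SMTP_PORT)
# Authenticate; imaplib raises imaplib.IMAP4.error if the credentials are rejected
mail.login(FROM_EMAIL, FROM_PWD)
# Open the inbox read-only so that fetching messages does not mark them as read
mail.select('inbox', readonly=True)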

Step 4: Searching the inbox and extracting the email IDs

Once the inbox is selected, mail.search(None, 'ALL') returns a status string and a one-element list holding a bytes object of space-separated message sequence numbers, such as [b'1 2 3']. Splitting that bytes object yields a list of individual email IDs, and the first and last IDs are converted to integers.

In [None]:
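# search() returns a status string and a one-element list such as [b'1 2 3']
status, data = mail.search(None, 'ALL')
mail_ids = data[0]
# Split the space-separated bytes into individual message IDs
id_list = mail_ids.split()
first_email_id = int(id_list[0])
latest_email_id = int(id_list[-1])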
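To see what the splitting does without a live connection, the sketch below feeds in a hard-coded sample shaped like the value search() returns: a status string plus a one-element list holding a bytes object of space-separated sequence numbers. The sample values are invented for illustration.

```python
# Hypothetical response of the form returned by mail.search(None, 'ALL')
status, data = "OK", [b"1 2 3 4 5"]

mail_ids = data[0]            # b'1 2 3 4 5'
id_list = mail_ids.split()    # [b'1', b'2', b'3', b'4', b'5']
first_email_id = int(id_list[0])
latest_email_id = int(id_list[-1])

print(id_list)                          # [b'1', b'2', b'3', b'4', b'5']
print(first_email_id, latest_email_id)  # 1 5
```

Because IMAP sequence numbers run contiguously from 1 to the number of messages in the mailbox, latest_email_id also equals the inbox's message count.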

# Analyzing Inbox Index Information 

Step 1: Defining variables for the number of emails and days, list of senders, and csv row population

Two variables capture user input: how many of the latest emails to analyze and how many days back to look. Two empty lists are also created, one to store email senders and one to collect the rows that will be written to the CSV file. A final variable subtracts the requested number of days from the current time to mark the start of the analysis timeframe.

In [None]:
num_of_emails = int(input("How many of the latest emails should be analyzed?"))
num_of_days = int(input("How many days back should be analyzed?"))
List_of_Senders = []
allrows = []
date_N_days_ago = datetime.now(tz=None) - timedelta(days=num_of_days)
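The two inputs translate into a cutoff datetime and a slice of the ID list. The sketch below uses a fixed "now" and a made-up list of ten IDs so the results are reproducible; in the notebook, datetime.now() and the real id_list take their place. The max(..., 0) guard is an addition that keeps the slice valid when the user asks for more emails than the inbox holds.

```python
from datetime import datetime, timedelta

now = datetime(2024, 3, 10, 12, 0)   # fixed stand-in for datetime.now()
num_of_days = 7
num_of_emails = 3
id_list = [str(n).encode() for n in range(1, 11)]  # hypothetical IDs b'1'..b'10'

# Start of the analysis timeframe: num_of_days before "now"
date_N_days_ago = now - timedelta(days=num_of_days)
print(date_N_days_ago)  # 2024-03-03 12:00:00

# IDs of the latest num_of_emails messages (guarded against asking for too many)
start = max(len(id_list) - num_of_emails, 0)
latest_ids = id_list[start:]
print(latest_ids)  # [b'8', b'9', b'10']
```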

Step 2: Creating a "for" loop to iterate over emails according to number and days to be analyzed

A for loop iterates over the IDs of the latest emails, as many as the user asked to analyze. For each ID, the connection object's fetch() method retrieves the full message (RFC822), which is then parsed into an email.message.Message object whose headers can be read like dictionary keys.

The Date header is then converted to a datetime object. Date headers may or may not start with a weekday abbreviation and usually end with a UTC offset or time zone name, so the parsing has to handle both forms and yield a value that can be compared with date_N_days_ago. If the email falls within the timeframe, its sender is appended to the sender list, and a row holding its Date, Sender, and Subject is appended to the list of rows used to build the CSV file.

In [None]:
import email.utils  # provides parsedate_to_datetime()

# IDs of the latest num_of_emails messages (the whole inbox if it holds fewer)
start = max(len(id_list) - num_of_emails, 0)
for msg_id in id_list[start:]:
    result, mail_data = mail.fetch(msg_id, '(RFC822)')
    # Parse the raw bytes directly so non-UTF-8 messages do not raise decode errors
    email_message = email.message_from_bytes(mail_data[0][1])
    try:
        # Parses RFC 2822 dates with or without a weekday prefix and keeps the UTC offset
        Formatted_Date = email.utils.parsedate_to_datetime(str(email_message['Date']))
    except (TypeError, ValueError):
        continue  # skip messages whose Date header is missing or unparseable
    if Formatted_Date.tzinfo is not None:
        # Convert to local time, then drop tzinfo so it compares with the naive date_N_days_ago
        Formatted_Date = Formatted_Date.astimezone().replace(tzinfo=None)
    if Formatted_Date >= date_N_days_ago:
        # str() guards against Header objects returned for non-ASCII header bytes
        Sender = str(email_message['From'])
        Subject = str(email_message.get('Subject', ''))
        # One CSV row per message: Date, Sender, Subject
        allrows.append([Formatted_Date, Sender, Subject])
        List_of_Senders.append(Sender)
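The parsing steps can be tried offline on hand-written messages. The sketch below assumes two invented raw messages, one whose Date header has a weekday prefix and one without, and parses both with the standard library's email.utils.parsedate_to_datetime (a swap for manual slicing plus strptime). It then applies the same kind of cutoff test as the loop, using a timezone-aware cutoff.

```python
import email
import email.utils
from datetime import datetime, timezone

# Two hypothetical raw messages; the second Date header has no weekday prefix
raw_messages = [
    b"From: Alice <alice@example.com>\r\nSubject: Hello\r\n"
    b"Date: Mon, 04 Mar 2024 09:15:00 +0000\r\n\r\nBody\r\n",
    b"From: Bob <bob@example.com>\r\nSubject: Old news\r\n"
    b"Date: 25 Feb 2024 17:40:00 -0500\r\n\r\nBody\r\n",
]
# Cutoff expressed in UTC so it compares directly with the aware parsed dates
cutoff = datetime(2024, 3, 1, tzinfo=timezone.utc)

rows = []
for raw in raw_messages:
    msg = email.message_from_bytes(raw)
    # parsedate_to_datetime keeps the UTC offset, returning an aware datetime
    sent = email.utils.parsedate_to_datetime(msg["Date"])
    if sent >= cutoff:
        rows.append([sent, msg["From"], msg["Subject"]])

print(rows)
```

Only Alice's message survives the filter. Comparing aware datetimes accounts for each message's UTC offset: Bob's message, sent at 17:40 at -0500, is 22:40 UTC on 25 February and so falls before the cutoff.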

# Returning Analysis Results in a Summary Display and a CSV File

Step 1: Counting senders and displaying summary results

Counter tallies how often each sender appears in the list built by the loop, and most_common(10) stores the ten most frequent senders, with their counts, in Top10Senders. Print statements title the summary, report the total number of emails in the inbox (the highest message ID) and the analysis timeframe, state how many emails fell within that timeframe, and introduce the top-ten list. A for loop then prints each sender with its rank and count, using str.replace() to strip parentheses and quotes from the sender string.

In [None]:
counted = Counter(List_of_Senders)
Top10Senders = counted.most_common(10)

print("\n","\n","Analysis Results:","\n")
print("The total number of emails in your inbox is:",latest_email_id,"\n")
print("The timeframe specified is:", date_N_days_ago,"-",datetime.now(tz=None),"\n")
print("The number of emails analyzed within the specified timeframe is:",len(List_of_Senders),"\n")
print("The top ten senders followed by volumes of emails sent are as follows:","\n")

for rank, (sender, count) in enumerate(Top10Senders, start=1):
    # Strip parentheses and quotes from the sender string before printing
    clean_sender = sender.replace("(", "").replace(")", "").replace("'", "").replace('"', "")
    print(f"{rank}. {clean_sender} : {count}")
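The counting step can be reproduced with a short, made-up sender list. As a variant on the str.replace() cleanup, the sketch below normalizes each From value to its bare address with email.utils.parseaddr before counting, so the same sender written with and without a display name is tallied once. This normalization is an addition, not part of the original notebook.

```python
from collections import Counter
from email.utils import parseaddr

# Hypothetical From header values collected by the analysis loop
List_of_Senders = [
    "Alice <alice@example.com>",
    "alice@example.com",
    '"Bob Smith" <bob@example.com>',
    "Alice <alice@example.com>",
]

# parseaddr returns a (display name, address) pair; keep the lowercased address
addresses = [parseaddr(sender)[1].lower() for sender in List_of_Senders]
counted = Counter(addresses)

for rank, (sender, count) in enumerate(counted.most_common(10), start=1):
    print(f"{rank}. {sender} : {count}")
# 1. alice@example.com : 3
# 2. bob@example.com : 1
```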

Step 2: Creating a CSV file with date, sender, and subject details

A headers list supplies the first row of the CSV file. The with statement opens EmailAnalysisResults.csv for writing and closes it automatically afterward; inside it, csv.writer writes the header row, and writerows() writes one row for each email collected in the analysis loop. Passing newline='' to open() follows the csv module documentation's recommendation, so that the writer controls line endings itself.

In [None]:
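headers = ['Date', 'Sender', 'Subject']
# newline='' lets csv.writer control line endings; utf-8 handles non-ASCII subjects
with open('EmailAnalysisResults.csv', 'w', newline='', encoding='utf-8') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(headers)      # header row
    csvout.writerows(allrows)     # one row per analyzed email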
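To preview the file layout without touching the disk, the sketch below writes the same header row and two invented data rows to an in-memory io.StringIO buffer instead of EmailAnalysisResults.csv. It shows that csv.writer writes datetime values in their str() form and quotes a field that contains a comma.

```python
import csv
import io
from datetime import datetime

headers = ['Date', 'Sender', 'Subject']
# Hypothetical rows in the same [Date, Sender, Subject] shape as allrows
allrows = [
    [datetime(2024, 3, 4, 9, 15), "Alice <alice@example.com>", "Hello"],
    [datetime(2024, 3, 5, 14, 0), "bob@example.com", "Lunch, tomorrow?"],
]

# In-memory stand-in for the CSV file
buffer = io.StringIO(newline='')
csvout = csv.writer(buffer)
csvout.writerow(headers)
csvout.writerows(allrows)
print(buffer.getvalue())
```

The second data row comes out as `2024-03-05 14:00:00,bob@example.com,"Lunch, tomorrow?"`, with the subject quoted because of its comma.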