<a href="https://colab.research.google.com/github/jonasmzsouza/colab-notebooks/blob/main/filtering-suppressed-contact-emails.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Filtering - Contact Lists vs. Suppression Lists**

This notebook automates the cleanup of contact databases (CSV files) by removing email contacts that the email delivery platform (SendGrid or similar) has suppressed. Removing them improves campaign statistics and reduces contact storage costs.


#### **Author: Jonas Souza**

## Creating folders in the runtime environment
The following cells import the libraries and create the folders in the runtime environment.
To check that the folders were created, click the folder icon in the side menu and then click refresh.

In [None]:
# importing the libraries
import os
import pandas as pd
import shutil

### Folder names and paths

In [None]:
# folder names
contacts = "contact_lists"
suppressions = "suppression_lists"
filtering = "filtering_lists"

# paths
path_root = "./"
path_contacts = path_root + contacts
path_suppressions = path_root + suppressions
path_filtering = path_root + filtering

folders = [path_suppressions, path_contacts, path_filtering]

### Creating folders

In [None]:
# creating folders
for folder in folders:
  # create the folder only if it does not exist yet
  os.makedirs(folder, exist_ok=True)

### **File uploads**
### **WARNING!!!**
Before running the cells below, click the folder icon ("Files") in the side menu to expand the file browser.

*   Inside the **"suppression_lists"** folder, upload the CSV files downloaded from SendGrid's Suppressions section, such as:
  *   "Global Unsubscribes";
  *   "Bounces";
  *   "Spam Reports";
  *   "Blocks";
  *   "Invalid";
  *   and "Unsubscribe Groups"
*   Inside the **"contact_lists"** folder, upload all the contact list backup files.
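
Before running the filtering cells, it can help to confirm that the uploads landed in the expected folders. The following is a minimal sketch and not part of the original notebook: the helper name `list_csv_files` is hypothetical, and the folder names are assumed to match the ones created earlier.

```python
import os

def list_csv_files(folder):
    """Return the sorted names of the .csv files found directly in `folder`."""
    if not os.path.isdir(folder):
        return []
    return sorted(name for name in os.listdir(folder) if name.endswith(".csv"))

# Folder names assumed to match the ones created earlier in this notebook.
for folder in ["contact_lists", "suppression_lists"]:
    csv_files = list_csv_files(folder)
    print(f"{folder}: {len(csv_files)} csv file(s) found", csv_files)
```

If either folder reports zero files, the filtering loop further down has nothing to process for that folder.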

## Filtering of contact lists

### Function that filters the database by comparing the files in contact_lists against those in suppression_lists

In [None]:
def filter_database(data, data_suppr, file_name):
  # read the contacts file into a pandas DataFrame,
  # keeping only the 'email' and 'nome' (name) columns when present
  db_ctc = pd.read_csv(data)
  db_ctc = db_ctc[db_ctc.columns.intersection(['email', 'nome'])]

  # read the suppression file into a pandas DataFrame, keeping the same columns
  db_suppression = pd.read_csv(data_suppr)
  db_suppression = db_suppression[db_suppression.columns.intersection(['email', 'nome'])]

  # inner merge: keep only the rows whose "email" is present in both files
  same_email_existing_in_the_files = db_suppression.merge(db_ctc, on='email')

  # concatenate the contacts with the merged rows, so every suppressed email appears more than once
  data_concat = pd.concat([db_ctc, same_email_existing_in_the_files])

  # drop every email that appears more than once, leaving the contacts without the suppressed emails
  data_without_suppressions = data_concat.drop_duplicates(subset='email', keep=False)

  # export the filtered contacts as a CSV file to the filtering folder in the runtime environment
  data_name = path_filtering + "/" + file_name
  data_without_suppressions.to_csv(data_name, index=False, encoding='utf8')
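
The merge, concat, and drop-duplicates sequence above amounts to an anti-join: keep the contacts whose email is not in the suppression file. One side effect worth knowing is that `drop_duplicates(keep=False)` drops every row whose email occurs more than once, so an email duplicated inside the contacts file alone is removed as well. The sketch below shows an alternative based on `Series.isin`, which leaves such duplicates in place. It is an illustration rather than the notebook's method: the helper name `remove_suppressed` is hypothetical and the sample data is made up.

```python
import pandas as pd

def remove_suppressed(contacts, suppressions):
    """Return the rows of `contacts` whose email is absent from `suppressions`."""
    suppressed = set(suppressions["email"].dropna())
    return contacts[~contacts["email"].isin(suppressed)]

# Made-up sample data, for illustration only.
contacts = pd.DataFrame({
    "email": ["ana@example.com", "bob@example.com", "ana@example.com", "eve@example.com"],
    "nome": ["Ana", "Bob", "Ana", "Eve"],
})
suppressions = pd.DataFrame({"email": ["bob@example.com"]})

filtered = remove_suppressed(contacts, suppressions)
print(filtered["email"].tolist())
```

Here the suppressed address is removed while both rows for the duplicated, non-suppressed address are kept. Whether duplicates should survive is a separate decision that can be made with an explicit `drop_duplicates` call.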

### Looping over the files in contact_lists and suppression_lists

In [None]:
# loop to go through the files in the contact_lists folder
for file_ctc in os.listdir(path_contacts):

  # condition to check if it is a csv file
  if file_ctc.endswith(".csv"):

    # loop to go through the files in the suppression_lists folder
    for file_suppr in os.listdir(path_suppressions):

      # condition to check if it is a csv file
      if file_suppr.endswith(".csv"):

        # paths
        data_filter = path_filtering + "/" + file_ctc
        data_ctc = path_contacts + "/" + file_ctc
        data_suppr = path_suppressions + "/" + file_suppr

        # check whether file_ctc has already been filtered.
        # If not, this is the first pass for this contact file, so start from the original file
        data = data_filter if os.path.exists(data_filter) else data_ctc

        # call the filter function
        filter_database(data, data_suppr, file_ctc)
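
The nested loop above reads and rewrites each contact file once per suppression file. A possible alternative, sketched below, is to gather the emails from all suppression files into a single set first and then filter each contact file once. This is a sketch under stated assumptions, not the notebook's approach: the helper name `load_suppressed_emails` is hypothetical, and files without an `email` column are simply skipped.

```python
import os
import pandas as pd

def load_suppressed_emails(folder):
    """Collect the values of the 'email' column from every .csv file in `folder`."""
    emails = set()
    for name in os.listdir(folder):
        if not name.endswith(".csv"):
            continue
        frame = pd.read_csv(os.path.join(folder, name))
        # Files without an 'email' column are skipped (an assumption of this sketch).
        if "email" in frame.columns:
            emails.update(frame["email"].dropna())
    return emails
```

With that set in hand, a contact DataFrame could be filtered in one pass with `db_ctc[~db_ctc['email'].isin(suppressed)]`, where `suppressed` is the set returned by the helper.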

### Generate zip of the folder containing the filtered files

In [None]:
# generating a zip file from the filtering_lists folder
shutil.make_archive(filtering, 'zip', path_filtering)
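
`shutil.make_archive` returns the path of the archive it creates, which makes it easy to confirm what was packed. The sketch below is illustrative only: it works in a temporary directory with a made-up file so that it does not touch the notebook's folders.

```python
import os
import shutil
import tempfile
import zipfile

# Work in a temporary directory so the sketch does not touch the notebook's folders.
work_dir = tempfile.mkdtemp()
source_dir = os.path.join(work_dir, "filtering_lists")
os.makedirs(source_dir)

# A made-up file standing in for a filtered contact list.
with open(os.path.join(source_dir, "example.csv"), "w", encoding="utf8") as handle:
    handle.write("email\nana@example.com\n")

# make_archive appends the ".zip" extension and returns the archive path.
archive_path = shutil.make_archive(os.path.join(work_dir, "filtering_lists"), "zip", source_dir)

# List the archive members to confirm the filtered file was included.
with zipfile.ZipFile(archive_path) as archive:
    archive_names = archive.namelist()
print(archive_path, archive_names)
```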