# Stanford Pride Database Matching System
System to alleviate member attrition

Authors:

- Saad Saeed [Github](https://github.com/ssaeed85) | [LinkedIn](https://www.linkedin.com/in/saadsaeed85/)
- Zach Rauch [Github](https://github.com/ZachRauch) | [LinkedIn](https://www.linkedin.com/in/zach-rauch/)
- Hanis Zulmuthi [Github](https://github.com/hanis-z) | [LinkedIn](https://www.linkedin.com/in/hanis-zulmuthi/)
- Xiaohua Su [Github](https://github.com/xiaohua-su) | [LinkedIn](https://www.linkedin.com/in/xiaohua-su/)

# Overview

Nonprofit organizations want to attract new members and retain them. It is vital for an organization to keep in touch with its members, who are the foundation of its network, through communications about events and news. Without a working channel of communication, members lose touch with the organization and its activities and are considered 'lost'. A common problem is that the email address a member gave as their main means of contact stops working, or starts bouncing, once the member graduates from an institution such as a college or bootcamp. Members often forget to update the address before they move on, so updating contact information is critical to keeping them in the network. Over time, this 'lost' member problem keeps growing for the organization.

The purpose of this project is to help Stanford Pride address this issue. Stanford Pride currently has ~5000 members in its database. Unfortunately, Stanford Pride has lost contact with a small portion of its members. One way Stanford Pride recognizes that it has lost contact with a member who has not opted out of the newsletter is that the newsletter bounces. According to Stanford Pride, its members do not all use the same platform: some subscribe only to the emails, others are only in the Facebook or LinkedIn group, and a small minority interact with Stanford Pride on multiple platforms. Stanford Pride therefore hopes to rectify the issue of lost members by updating each individual's contact information, in order to bring them back into the network or keep them in it.

Our goal for this project is to help Stanford Pride update this information more efficiently. We did this by using a cosine similarity model to produce, for each individual in the Mailchimp database, a list of potential matches from the Stanford Pride database. This way, the chair in charge of updating the database does not need to look up multiple potential people in the Stanford database before deciding whether they are the same individual. They now have a list of potential matches, with information about each one, to compare against.
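
The matching code itself is not part of this section. Purely as an illustration of the idea, the sketch below ranks database records against one Mailchimp record using cosine similarity over character trigram counts. The trigram features, the way fields are joined into one string, and the names `trigram_counts`, `cosine_similarity`, and `rank_candidates` are assumptions made for this example, not the project's actual model.

```python
import math
from collections import Counter


def trigram_counts(text):
    """Count overlapping character trigrams in a lowercased, padded string."""
    padded = f"  {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates, top_n=3):
    """Return the top_n (score, candidate) pairs most similar to query, best first."""
    query_vec = trigram_counts(query)
    scored = [(cosine_similarity(query_vec, trigram_counts(c)), c) for c in candidates]
    return sorted(scored, reverse=True)[:top_n]


# Hypothetical records: name and email handle joined into one string.
database = ["wei wu wei.wu9512", "li chen lic3732", "juan wang juanwang5755"]
matches = rank_candidates("wei wu wei.wu9512", database, top_n=2)
```

Each entry in `matches` pairs a score between 0 and 1 with a candidate record, so a reviewer can scan the few highest-scoring candidates instead of searching the whole database.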

From Stanford Pride:
> A nonprofit organization, such as Stanford Pride, strives by attracting and retaining members.
> It is vital for the organization to stay in touch with its members.
> The main means to achieve this is the sending of newsletters via e-mail.
> Members are not likely to keep informed of the organization’s activity on their own. We only stay in their minds by regularly pushing news out to them.
> Members do not always subscribe to other sources of information about the organization’s activities.
> For example, Stanford Pride has approximately 4,400 members in its database, out of which about 3,700 currently have valid e-mail addresses.
> Only 1,600 are part of our Facebook group, and 400 in our LinkedIn group.
> Therefore, our monthly e-mail newsletter is our sole means to reach about 2,100 members – almost half of our total membership.


# Running the Notebooks and app

The GitHub repository includes a copy of the environment used to run this notebook and the fake dataset notebook. We have provided [Windows environment](./Environment_windows.yml) and [Mac environment](./Environment_mac.yml) versions.

# Imports

In [1]:
import pandas as pd
import numpy as np
import random 
import pycountry
np.random.seed(42)  # call the function; assigning to it would not seed NumPy
random.seed(42)

pd.set_option('display.max_columns', None)
pd.set_option('display.min_rows', 100)

# Fake Dataset Creation

Because the real member data is sensitive, we created a fake dataset to work with for this project.

### SAA

In [2]:
def createRandomPhoneNumber_SAA():
    range_start = 10**9
    range_end = 10**10 -1
    num = str(random.randint(range_start,range_end))
    num_str = "{0} {1}-{2}".format(num[:3], num[3:7],num[7:])
    return num_str

In [16]:
def createRandomEmail(fName,lName,domainList=['@yahoo.com','@gmail.com','@stanfordalumni.org','@alumni.stanford.edu']):
    domain = random.choice(domainList)
    fName = fName.strip().lower()
    LName = lName.strip().lower()
    formats = []
    
    #Example: John Doe 123
    formats.append(fName[0]+LName+str(random.randint(10,9999))) #jdoe123
    formats.append(fName+LName+str(random.randint(10,9999))) #johndoe123
    formats.append(fName+LName[0]+str(random.randint(10,9999))) #johnd123
    formats.append(fName[0]+'.'+LName+str(random.randint(10,9999))) #j.doe123
    formats.append(fName+'.'+LName+str(random.randint(10,9999))) #john.doe123
    
    return random.choice(formats)+domain

In [6]:
def createDegreeString_SAA(numPotentialDegrees=3):
    # Creates a short degree string of zero or more "<degree>'<two-digit year>" entries
    # Example outputs: "MS'87", "'82, MA'90", "PhD'90, BS'91, MS'81", ""
    degree_list = ['MS','MA','MBA','MD','PhD','BA','BS','JD','']
    degree=''
    k=0
    while k < np.random.randint(0,numPotentialDegrees+1):
        s = random.choice(degree_list) + "'" + str(np.random.randint(80,99))
        if k == 0:
            degree = s
        else:
            degree = degree +', '+ s
        k+=1
    return degree.strip()   
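
A string in this format can be split back into (degree, two-digit year) pairs, which may help when comparing degrees across sources. This is a minimal sketch; `parse_degree_string` and `DEGREE_PATTERN` are hypothetical names, not part of the notebook.

```python
import re

# Optional letters (the degree), an apostrophe, then a two-digit year,
# e.g. "MBA'90" or "'88".
DEGREE_PATTERN = re.compile(r"([A-Za-z]*)\s*'(\d{2})")


def parse_degree_string(degree_str):
    """Split a short degree string into a list of (degree, year) tuples."""
    return [(deg, int(year)) for deg, year in DEGREE_PATTERN.findall(degree_str)]


parsed = parse_degree_string("PhD'90, BS'91, '81")
```

An entry with no degree letters, such as `'81`, comes back with an empty string as its degree.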

In [147]:
def createPrefMailName(fName,lName,degreeStr):
    pre = ['Mr.','Ms.','Mrs.']
    suff = ['Jr.', 'II', 'III','']
    
    if 'phd' in degreeStr.lower():
        pre.extend(['Dr.'])
    
    return f'{random.choice(pre)} {fName} {lName} {random.choices(suff, weights = [1,1,1,30])[0]}'.strip()

In [9]:
city = ['Chicago', 'Boston',  'Madrid', 'Tokyo', 'Seoul', 'London','Beijing','Shanghai','Dubai','*',np.nan,'']
country = ['Japan', 'United States', 'USA', 'China', 'Kuwait','*',np.nan,'']

# State codes for some US cities; a city without an entry gets an empty state code
state = {'Chicago': 'IL',
         'Boston': 'MA',
         'New York': 'NY',
         'San Francisco': 'CA',
         'Los Angeles': 'CA',
         'Austin': 'TX',
         'Dallas': 'TX',
         'Denver': 'CO',
         '': '',
         '*': '*'}

In [10]:
common_FNames = [
"Maria",
"Nushi",
"Mohammed",
"Jose",
"Muhammad",
"Mohamed",
"Wei",
"Mohammad",
"Ahmed",
"Yan",
"Ali",
"John",
"David",
"Li",
"Abdul",
"Ana",
"Ying",
"Michael",
"Juan",
"Anna"
]

In [11]:
common_LNames = [
"Wang",
"Li",
"Zhang",
"Chen",
"Liu",
"Devi",
"Yang",
"Huang",
"Singh",
"Wu",
"Kumar",
"Xu",
"Ali",
"Zhao",
"Zhou",
"Nguyen",
"Khan",
"Ma",
"Lu",
"Zhu"
]

In [173]:
# Initialize a dictionary to store fake data. These are the fields we will generate values for
fake_record_data = {
    'pref_mail_name' : [],
    'first_name' : [],
    'last_name' : [],
    'home_phone_number' : [],
    'home_email_address' : [],
    'email_switch' : [],
    'saa_email_address' : [],
    'gsb_email_address' : [],
    'bus_phone_number' : [],    
    'bus_email_address' : [],
    'home_city' : [],
    'home_state' : [],
    'home_country' : [],
    'bus_country' : [],
    'home_state_code' : [],
    'pref_class_year' : [],
    'short_degree_string' : []
}

num_ofDesiredRecords = 1000

for _ in range(0,num_ofDesiredRecords):
    
    curr_first_name = random.choice(common_FNames)
    curr_last_name = random.choice(common_LNames)
    
    # Choose a random first and last name
    fake_record_data['first_name'].append(curr_first_name)
    fake_record_data['last_name'].append(curr_last_name)
    
    # Create a random phone number
    # 1 in 11 odds of a properly formatted phone number
    # 5 in 11 odds of a '*'
    # 5 in 11 odds of a null
    fake_record_data['home_phone_number'].append(random.choices([createRandomPhoneNumber_SAA(),'*',np.nan],
                                                                weights=(1,5,5))[0])
    
    
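    # 1 in 16 odds of a properly formatted phone number
    # 5 in 16 odds of a '*'
    # 10 in 16 odds of a null
    fake_record_data['bus_phone_number'].append(random.choices([createRandomPhoneNumber_SAA(),'*',np.nan],
                                                                weights=(1,5,10))[0])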

    # Create random emails for the email fields
    # Unless otherwise specified, the domain is one of:
    # '@yahoo.com','@gmail.com','@stanfordalumni.org','@alumni.stanford.edu'
    
    # equal chance of null or email in email_switch
    fake_record_data['email_switch'].append(random.choice([createRandomEmail(curr_first_name,curr_last_name),
                                                           np.nan]))
    # equal chance of '*',null or email in home_email_address
    fake_record_data['home_email_address'].append(random.choice(['*',
                                                                 np.nan,
                                                                 createRandomEmail(curr_first_name,curr_last_name)]))    
    # equal chance of '*',null or email in saa_email_address
    fake_record_data['saa_email_address'].append(random.choice(['*',
                                                                np.nan,
                                                                createRandomEmail(curr_first_name,curr_last_name,
                                                                                  ['@stanfordalumni.org','@alumni.stanford.edu'])]))
    # 1:50 odds of email:null gsb_email_address
    fake_record_data['gsb_email_address'].append(random.choices([createRandomEmail(curr_first_name,curr_last_name),np.nan],
                                                                 weights=(1, 50))[0])
    # 1:10 odds of email:null bus_email_address
    fake_record_data['bus_email_address'].append(random.choices([createRandomEmail(curr_first_name,curr_last_name),np.nan],
                                                                weights=(1, 10))[0])
    
    # Select a random city, country (bus and home)
    fake_record_data['home_city'].append(random.choice(city))
    fake_record_data['home_country'].append(random.choice(country))
    fake_record_data['bus_country'].append(random.choice(country))
    
    
    # Equal chance of '', null, or a random pref_class_year from 1990 to 2018
    fake_record_data['pref_class_year'].append(random.choice(['',np.nan,random.randint(1990, 2018)]))
    
    # Create a set of degrees for record
    curr_deg_str = createDegreeString_SAA(numPotentialDegrees=3)
    fake_record_data['short_degree_string'].append(curr_deg_str)
    
    fake_record_data['pref_mail_name'].append(createPrefMailName(curr_first_name,curr_last_name,curr_deg_str))
    
    
fake_record_data['home_state_code'] = [state[k] if k in state else '' for k in fake_record_data['home_city'] ]
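
The odds quoted in the comments above follow from the `weights` passed to `random.choices`: each option's probability is its weight divided by the sum of all weights. A small check of that arithmetic, with `weights_to_odds` as a hypothetical helper name:

```python
from fractions import Fraction


def weights_to_odds(options, weights):
    """Map each option to its selection probability under random.choices."""
    total = sum(weights)
    return {opt: Fraction(w, total) for opt, w in zip(options, weights)}


# Same weights as home_phone_number: number, '*', null.
home_odds = weights_to_odds(["number", "*", "null"], (1, 5, 5))
# Same weights as bus_phone_number.
bus_odds = weights_to_odds(["number", "*", "null"], (1, 5, 10))
```

With these weights, a formatted home phone number is expected in 1 of every 11 records and a business phone number in 1 of every 16.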
    
    

In [174]:
# Map to df_SAA
df_SAA = pd.read_excel("../data/SAA Pride member reports headings.xlsx")

df_SAA['pref_mail_name'] = fake_record_data['pref_mail_name'] 
df_SAA['first_name'] = fake_record_data['first_name'] 
df_SAA['last_name'] = fake_record_data['last_name'] 
df_SAA['home_phone_number'] = fake_record_data['home_phone_number'] 
df_SAA['home_email_address'] = fake_record_data['home_email_address'] 
df_SAA['email_switch'] = fake_record_data['email_switch'] 
df_SAA['saa_email_address'] = fake_record_data['saa_email_address'] 
df_SAA['gsb_email_address'] = fake_record_data['gsb_email_address'] 
df_SAA['bus_phone_number'] = fake_record_data['bus_phone_number'] 
df_SAA['bus_email_address'] = fake_record_data['bus_email_address'] 
df_SAA['home_city'] = fake_record_data['home_city'] 
df_SAA['home_country'] = fake_record_data['home_country'] 
df_SAA['bus_country'] = fake_record_data['bus_country'] 
df_SAA['pref_class_year'] = fake_record_data['pref_class_year'] 
df_SAA['short_degree_string'] = fake_record_data['short_degree_string'] 


df_SAA['home_state_code'] = fake_record_data['home_state_code'] 

In [175]:
df_SAA

Unnamed: 0,pref_mail_name,pref_class_year,home_city,home_state_code,home_country,home_phone_area_code,home_phone_number,home_email_address,bus_city,bus_state_code,bus_country,bus_phone_area_code,bus_phone_number,bus_email_address,first_name,last_name,pref_name_sort,email_switch,saa_email_address,gsb_email_address,other_email_address,pref_phone_area_code,pref_phone_number,pref_phone_addr_type,memb_status_desc,short_degree_string,parent_degree_string,short_degree_string_spouse,parent_degree_string_spouse,primary_sort_name,plan_name,primary_ind
0,Ms. David Singh,2000,,,*,,,,,,Kuwait,,*,,David,Singh,,,davidsingh8572@stanfordalumni.org,,,,,,,"'82, MA'90",,,,,,
1,Mrs. Ying Zhou,,Seoul,,Japan,,*,yzhou4490@stanfordalumni.org,,,USA,,,,Ying,Zhou,,,*,,,,,,,"PhD'90, BS'91, MS'81",,,,,,
2,Ms. Mohammed Xu,,Dubai,,*,,*,*,,,Japan,,,,Mohammed,Xu,,mohammedxu6547@alumni.stanford.edu,*,,,,,,,MS'87,,,,,,
3,Mr. Muhammad Zhao,,,,USA,,,,,,USA,,*,,Muhammad,Zhao,,muhammad.zhao4231@alumni.stanford.edu,m.zhao7557@stanfordalumni.org,,,,,,,PhD'96,,,,,,
4,Mr. Juan Wu,2000,,,,,,*,,,*,,,juanw3482@yahoo.com,Juan,Wu,,,,,,,,,,BA'97,,,,,,
5,Ms. Maria Lu,2014,London,,China,,907 1518-225,maria.lu8399@yahoo.com,,,USA,,*,,Maria,Lu,,m.lu5254@alumni.stanford.edu,mlu2580@stanfordalumni.org,,,,,,,"BA'85, MA'92, MA'81",,,,,,
6,Mr. Maria Kumar,,Beijing,,,,*,*,,,*,,,,Maria,Kumar,,,,,,,,,,"BA'93, BA'81",,,,,,
7,Ms. Wei Wu,,Shanghai,,USA,,*,,,,Japan,,873 7408-167,,Wei,Wu,,,weiwu3505@alumni.stanford.edu,,,,,,,"BS'89, PhD'83",,,,,,
8,Ms. Mohammad Yang,1993,,,Kuwait,,*,*,,,*,,,,Mohammad,Yang,,,mohammady4034@stanfordalumni.org,,,,,,,JD'80,,,,,,
9,Mrs. Li Chen,,,,China,,*,*,,,Kuwait,,,,Li,Chen,,lic3732@alumni.stanford.edu,lic6035@alumni.stanford.edu,,,,,,,,,,,,,


In [178]:
df_SAA.to_excel('../data/SAA_FinalDB.xlsx',index = False)
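
The generated data represents a missing value in three ways (`'*'`, an empty string, and `NaN`), so any later comparison of fields presumably needs to treat them alike. A minimal sketch, where `is_missing` is a hypothetical helper and not part of the notebook:

```python
import math


def is_missing(value):
    """Return True for None and for the placeholders '*', '', and NaN."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() in ("", "*")


flags = [is_missing(v) for v in ["*", "", float("nan"), "Boston", 2000]]
```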

In [None]:
# df_SAA.drop(columns=[0]).to_excel('data/SAA_Pokemon_FakeDB.xlsx',index = False)

In [None]:
# df_SAA.loc[74]

### Mailchimp

Create Mailchimp records that match some of the Stanford records to varying degrees.

In [226]:
df_mc = pd.read_csv('../data/MailChimp cleaned records headers.csv')
df_mc

Unnamed: 0,Email Address,First Name,Last Name,Board Member,Gender,Chapter,Reunion Year,Country,Degree,MEMBER_RATING,OPTIN_TIME,OPTIN_IP,CONFIRM_TIME,CONFIRM_IP,LATITUDE,LONGITUDE,GMTOFF,DSTOFF,TIMEZONE,CC,REGION,CLEAN_TIME,CLEAN_CAMPAIGN_TITLE,CLEAN_CAMPAIGN_ID,LEID,EUID,NOTES,TAGS


In [227]:
# Add 5 empty rows (index 0-4) to fill in below.
# reindex is used because DataFrame.append is no longer available in pandas 2.0+
df_mc = df_mc.reindex(range(5))


In [228]:
df_SAA.loc[74]

pref_mail_name                                   Mrs. Wei Wu
pref_class_year                                          NaN
home_city                                            Beijing
home_state_code                                             
home_country                                             USA
home_phone_area_code                                     NaN
home_phone_number                                          *
home_email_address             wei.wu9512@stanfordalumni.org
bus_city                                                 NaN
bus_state_code                                           NaN
bus_country                                            Japan
bus_phone_area_code                                      NaN
bus_phone_number                                           *
bus_email_address                                        NaN
first_name                                               Wei
last_name                                                 Wu
pref_name_sort          

In [229]:
# First record. No major difference other than email:
# the email handle matches but the domain does not
# (@stanfordalumni.org becomes @gmail.com)
rec = df_SAA.loc[74]
i = 0
df_mc.loc[i,'First Name'] = rec.first_name
df_mc.loc[i,'Last Name'] = rec.last_name
df_mc.loc[i,'Email Address'] = rec.home_email_address.split('@')[0]+'@gmail.com'
df_mc.loc[i,'Country'] = rec.home_country
df_mc.loc[i,'Degree'] = random.choice([np.nan,''])

df_mc.loc[i,'Board Member'] = random.choice([True,False])
df_mc.loc[i,'Gender'] = random.choice(['F','M',np.nan])
df_mc.loc[i,'Chapter'] = random.choice(['Other US','Texas','Bay Area','DC Area','New England'])
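
This record is built so that only the part of the address before the `@` agrees between the two sources. A check along the following lines would detect that kind of match. `email_handle` and `same_handle` are hypothetical helpers, and the addresses are taken from the example outputs in this notebook.

```python
def email_handle(address):
    """Return the lowercased part of an email address before the '@'."""
    return address.strip().lower().split("@")[0]


def same_handle(address_a, address_b):
    """True when two addresses share a handle, regardless of domain."""
    return email_handle(address_a) == email_handle(address_b)


# Same handle, different domain.
handle_match = same_handle("wei.wu9512@stanfordalumni.org", "wei.wu9512@gmail.com")
# Same domain, different handle.
domain_only = same_handle("lic3732@alumni.stanford.edu", "lic6035@alumni.stanford.edu")
```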

In [230]:
df_SAA.loc[32]

pref_mail_name                           Mrs. Abdul Chen II
pref_class_year                                         NaN
home_city                                            Boston
home_state_code                                          MA
home_country                                          Japan
home_phone_area_code                                    NaN
home_phone_number                                         *
home_email_address             achen3568@stanfordalumni.org
bus_city                                                NaN
bus_state_code                                          NaN
bus_country                                               *
bus_phone_area_code                                     NaN
bus_phone_number                                        NaN
bus_email_address                                       NaN
first_name                                            Abdul
last_name                                              Chen
pref_name_sort                          

In [231]:
# Second record. Missing 1 degree
rec = df_SAA.loc[32]
i=1
df_mc.loc[i,'First Name'] = rec.first_name
df_mc.loc[i,'Last Name'] = rec.last_name
df_mc.loc[i,'Email Address'] = rec.home_email_address             
df_mc.loc[i,'Country'] = 'USA'
df_mc.loc[i,'Degree'] = "JD"

df_mc.loc[i,'Board Member'] = random.choice([True,False])
df_mc.loc[i,'Gender'] = random.choice(['F','M',np.nan])
df_mc.loc[i,'Chapter'] = random.choice(['Other US','Texas','Bay Area','DC Area','New England'])

In [232]:
df_SAA.loc[37]

pref_mail_name                       Mrs. John Lu
pref_class_year                              1999
home_city                                  London
home_state_code                                  
home_country                                  NaN
home_phone_area_code                          NaN
home_phone_number                    708 8040-774
home_email_address                              *
bus_city                                      NaN
bus_state_code                                NaN
bus_country                                   NaN
bus_phone_area_code                           NaN
bus_phone_number                              NaN
bus_email_address                             NaN
first_name                                   John
last_name                                      Lu
pref_name_sort                                NaN
email_switch                   j.lu6087@gmail.com
saa_email_address                               *
gsb_email_address                             NaN


In [233]:
# Third record. Same name and degree
rec = df_SAA.loc[37]
i=2
df_mc.loc[i,'First Name'] = rec.first_name
df_mc.loc[i,'Last Name'] = rec.last_name
df_mc.loc[i,'Email Address'] = createRandomEmail(rec.first_name,rec.last_name)
df_mc.loc[i,'Country'] = 'USA'
df_mc.loc[i,'Degree'] = "MBA"

df_mc.loc[i,'Board Member'] = random.choice([True,False])
df_mc.loc[i,'Gender'] = random.choice(['F','M',np.nan])
df_mc.loc[i,'Chapter'] = random.choice(['Other US','Texas','Bay Area','DC Area','New England'])

In [234]:
df_SAA.loc[9]

pref_mail_name                                Mrs. Li Chen
pref_class_year                                           
home_city                                              NaN
home_state_code                                           
home_country                                         China
home_phone_area_code                                   NaN
home_phone_number                                        *
home_email_address                                       *
bus_city                                               NaN
bus_state_code                                         NaN
bus_country                                         Kuwait
bus_phone_area_code                                    NaN
bus_phone_number                                       NaN
bus_email_address                                      NaN
first_name                                              Li
last_name                                             Chen
pref_name_sort                                         N

In [235]:
# 4th record. Same name. Different email. Has a degree on the Mailchimp side
rec = df_SAA.loc[9]
i=3
df_mc.loc[i,'First Name'] = rec.first_name
df_mc.loc[i,'Last Name'] = rec.last_name
df_mc.loc[i,'Email Address'] = createRandomEmail(rec.first_name,rec.last_name)
df_mc.loc[i,'Country'] = 'Japan'
df_mc.loc[i,'Degree'] = 'MS'

df_mc.loc[i,'Board Member'] = random.choice([True,False])
df_mc.loc[i,'Gender'] = random.choice(['F','M',np.nan])
df_mc.loc[i,'Chapter'] = random.choice(['Other US','Texas','Bay Area','DC Area','New England'])

In [236]:
df_SAA.loc[100]

pref_mail_name                                     Mr. Ahmed Ali
pref_class_year                                                 
home_city                                                Beijing
home_state_code                                                 
home_country                                                 USA
home_phone_area_code                                         NaN
home_phone_number                                              *
home_email_address                                             *
bus_city                                                     NaN
bus_state_code                                               NaN
bus_country                                                  USA
bus_phone_area_code                                          NaN
bus_phone_number                                             NaN
bus_email_address                 ahmeda1343@alumni.stanford.edu
first_name                                                 Ahmed
last_name                

In [237]:
# 5th record. Missing all degrees. Different email
rec = df_SAA.loc[100]
i=4

df_mc.loc[i,'First Name'] = rec.first_name
df_mc.loc[i,'Last Name'] = rec.last_name
df_mc.loc[i,'Email Address'] = createRandomEmail(rec.first_name,rec.last_name)
df_mc.loc[i,'Country'] = 'United States'
df_mc.loc[i,'Degree'] = random.choice([np.nan,''])

df_mc.loc[i,'Board Member'] = random.choice([True,False])
df_mc.loc[i,'Gender'] = random.choice(['F','M',np.nan])
df_mc.loc[i,'Chapter'] = random.choice(['Other US','Texas','Bay Area','DC Area','New England'])
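
Here the Mailchimp record says 'United States' while the database record says 'USA', so an exact string comparison on country would miss the match. One way to handle this is an alias table. The table and `normalize_country` below are illustrative assumptions, not taken from the project.

```python
# Hypothetical alias table; a real one would need to cover many more spellings.
COUNTRY_ALIASES = {
    "usa": "United States",
    "us": "United States",
    "united states": "United States",
    "united states of america": "United States",
}


def normalize_country(name):
    """Map known aliases to one canonical spelling; leave other names unchanged."""
    cleaned = name.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


same_country = normalize_country("USA") == normalize_country("United States")
```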

In [238]:
df_SAA.loc[11]

pref_mail_name                                    Mrs. Juan Wang
pref_class_year                                              NaN
home_city                                                 Boston
home_state_code                                               MA
home_country                                                 USA
home_phone_area_code                                         NaN
home_phone_number                                            NaN
home_email_address                        juanwang5755@yahoo.com
bus_city                                                     NaN
bus_state_code                                               NaN
bus_country                                                     
bus_phone_area_code                                          NaN
bus_phone_number                                             NaN
bus_email_address                                            NaN
first_name                                                  Juan
last_name                

In [239]:
# 6th record. Changed last name. Still has same email handle
rec = df_SAA.loc[11]
i=5
df_mc.loc[i,'First Name'] = rec.first_name
df_mc.loc[i,'Last Name'] = random.choice(common_LNames)
df_mc.loc[i,'Email Address'] = rec.saa_email_address.split('@')[0]+'@gmail.com'
df_mc.loc[i,'Country'] = 'USA'
df_mc.loc[i,'Degree'] = random.choice([np.nan,''])

df_mc.loc[i,'Board Member'] = random.choice([True,False])
df_mc.loc[i,'Gender'] = random.choice(['F','M',np.nan])
df_mc.loc[i,'Chapter'] = random.choice(['Other US','Texas','Bay Area','DC Area','New England'])

In [241]:
# 7th record. Mostly empty Mailchimp record
random_country = list(pycountry.countries)
random.shuffle(random_country)

rec = df_SAA.loc[600]
i=6
df_mc.loc[i,'First Name'] = rec.first_name
df_mc.loc[i,'Last Name'] = rec.last_name
df_mc.loc[i,'Email Address'] = rec.email_switch.split('@')[0]+'@gmail.com'
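# Not every pycountry entry has an official_name, so fall back to name
df_mc.loc[i,'Country'] = getattr(random_country[0], 'official_name', random_country[0].name)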
df_mc.loc[i,'Degree'] = random.choice([np.nan,''])

df_mc.loc[i,'Board Member'] = random.choice([True,False])
df_mc.loc[i,'Gender'] = random.choice(['F','M',np.nan])
df_mc.loc[i,'Chapter'] = random.choice(['Other US','Texas','Bay Area','DC Area','New England'])

In [242]:
df_mc

Unnamed: 0,Email Address,First Name,Last Name,Board Member,Gender,Chapter,Reunion Year,Country,Degree,MEMBER_RATING,OPTIN_TIME,OPTIN_IP,CONFIRM_TIME,CONFIRM_IP,LATITUDE,LONGITUDE,GMTOFF,DSTOFF,TIMEZONE,CC,REGION,CLEAN_TIME,CLEAN_CAMPAIGN_TITLE,CLEAN_CAMPAIGN_ID,LEID,EUID,NOTES,TAGS
0,wei.wu9512@gmail.com,Wei,Wu,False,F,Bay Area,,USA,,,,,,,,,,,,,,,,,,,,
1,achen3568@stanfordalumni.org,Abdul,Chen,False,M,Texas,,USA,JD,,,,,,,,,,,,,,,,,,,
2,johnlu6589@gmail.com,John,Lu,True,F,DC Area,,USA,MBA,,,,,,,,,,,,,,,,,,,
3,lic8759@yahoo.com,Li,Chen,False,M,Other US,,Japan,MS,,,,,,,,,,,,,,,,,,,
4,aali9872@yahoo.com,Ahmed,Ali,True,M,Other US,,United States,,,,,,,,,,,,,,,,,,,,
5,juan.wang5585@gmail.com,Juan,Zhu,False,F,New England,,USA,,,,,,,,,,,,,,,,,,,,
6,a.wu8158@gmail.com,Abdul,Wu,True,F,Other US,,Arab Republic of Egypt,,,,,,,,,,,,,,,,,,,,


In [243]:
df_mc.to_csv('../data/MailChimp_Final.csv',index=False)