# So Your Email Has Been Stolen...

We hear a lot about data breaches in which a company exposes our email addresses or IP addresses. We heave a sigh of relief that it was "just" email addresses or IP addresses; our passwords and financial data weren't in danger. This project starts with a very basic form of OSINT: what kind of information can attackers find starting from nothing but the humble email address?

In the previous notebook, I obtained a list of stolen email addresses, IP addresses, and their geographic locations (latitude and longitude). In this notebook, I use the Email Reputation API to find out what kind of social media footprint each user has.


In [15]:
#import all here

import numpy as np
import pandas as pd
import requests
import time

In [16]:
#read in master file
df = pd.read_csv("maintable.csv")
df.head(5)

Unnamed: 0,email,ip,latitude,longitude
0,abadisagustin@gmail.com,181.168.120.106,-38.949261,-68.059479
1,Carloselmandril@gmail.com,186.48.125.117,-34.883999,-56.162998
2,cacacasdfa@hotmail.com,186.130.6.73,-34.504719,-58.67952
3,ignaciomortega95@gmail.com,186.18.108.185,-34.687401,-58.563301
4,yerkhomolina1990@gmail.com,181.74.174.95,-33.465,-70.655998


In [17]:
# a list of dictionaries is much easier to loop over than a dataframe
rows = df.to_dict('records')
rows

#for testing purposes, just use 5 rows
#rows = df.head(5).to_dict('records')
#rows

[{'email': 'abadisagustin@gmail.com',
  'ip': '181.168.120.106',
  'latitude': -38.94926071166992,
  'longitude': -68.05947875976561},
 {'email': 'Carloselmandril@gmail.com',
  'ip': '186.48.125.117',
  'latitude': -34.88399887084961,
  'longitude': -56.16299819946289},
 {'email': 'cacacasdfa@hotmail.com',
  'ip': '186.130.6.73',
  'latitude': -34.50471878051758,
  'longitude': -58.67951965332031},
 {'email': 'ignaciomortega95@gmail.com',
  'ip': '186.18.108.185',
  'latitude': -34.68740081787109,
  'longitude': -58.56330108642578},
 {'email': 'yerkhomolina1990@gmail.com',
  'ip': '181.74.174.95',
  'latitude': -33.46500015258789,
  'longitude': -70.65599822998047},
 {'email': 'sebas12341992@hotmail.com',
  'ip': '181.49.73.230',
  'latitude': 4.599999904632568,
  'longitude': -74.08332824707031},
 {'email': 'troll@hotmail.com',
  'ip': '190.18.78.110',
  'latitude': -34.66109848022461,
  'longitude': -58.36700057983398},
 {'email': 'j.angulo.valdivia@gmail.com',
  'ip': '190.209.55.16

## Combining with Reputation Data

[Emailrep.io](https://emailrep.io) scrapes the Internet for information about email addresses. Its [API](https://github.com/sublime-security/emailrep.io/blob/master/README.md) takes an email address and returns information about whether the address is associated with spam or malicious activity, as well as its overall reputation. I am interested in the social media lookup, which returns all social media accounts associated with the email address. I created a reputation table that lists all social media accounts per email address.

List of DataFrames:

* df = main table
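* rep = data from the Email Reputation API, one row per email address with one column per social media site
* prof = the same Email Reputation API data in a different format, one row per email address and social media account

The common field across tables is the email address.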
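Because the email address is the shared key, the tables can be joined back together later. The sketch below is one way to do that with `pandas.DataFrame.merge`, using two tiny hypothetical tables in place of the real ones. The `email_key` column is an assumption added for illustration: the API output in this notebook returns addresses in lowercase (`Carloselmandril@gmail.com` comes back as `carloselmandril@gmail.com`), so joining on the raw column would miss some rows.

```python
import pandas as pd

# Hypothetical miniature versions of the main table and the reputation table
main = pd.DataFrame({
    "email": ["Carloselmandril@gmail.com", "troll@hotmail.com"],
    "ip": ["186.48.125.117", "190.18.78.110"],
})
reputation = pd.DataFrame({
    "email": ["carloselmandril@gmail.com", "troll@hotmail.com"],
    "twitter": [None, "Y"],
})

# Normalize case so differently capitalized addresses still match
main["email_key"] = main["email"].str.lower()
reputation["email_key"] = reputation["email"].str.lower()

# Left join keeps every row of the main table, even without reputation data
merged = main.merge(
    reputation.drop(columns="email"),
    on="email_key",
    how="left",
)
print(merged)
```

A left join is used here so that addresses with no reputation data still appear in the result.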

In [18]:
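# call the Email Reputation API for a single address

def call_emailrep(address):
    # identify the client to the API
    header = {'User-agent':'github.com/fr48 Columbia University Graduate School of Journalism'}
    # pause between requests to avoid hammering the API
    time.sleep(2)
    # the timeout keeps a stalled request from hanging the whole loop
    response = requests.get(f'https://emailrep.io/{address}', headers=header, timeout=30)
    # parse the JSON body into a dictionary
    rep = response.json()
    return rep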

I build both tables in a single for loop to minimize the number of times I hit the API, since the API locks up after 150 queries per IP address.
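One way to stretch a limited query allowance further is to skip addresses that have already been looked up and to stop once the allowance is spent. The following is a minimal sketch of that idea, not the code used in this notebook: `fetch_within_budget` and `fake_fetch` are hypothetical names, and the stand-in fetch function replaces the real API call so the example runs offline. The default of 150 is taken from the limit mentioned above.

```python
def fetch_within_budget(emails, fetch, budget=150):
    """Call fetch once per unique address, stopping after `budget` calls."""
    results = {}
    for email in emails:
        key = email.lower()
        if key in results:
            continue  # already fetched; do not spend another query
        if len(results) >= budget:
            break  # budget exhausted; remaining addresses are skipped
        results[key] = fetch(email)
    return results


# Stand-in for the real API call (hypothetical, for illustration only)
calls = []

def fake_fetch(email):
    calls.append(email)
    return {"email": email.lower()}


emails = [
    "vpoblete7@hotmail.com",
    "shuryken23@hotmail.com",
    "vpoblete7@hotmail.com",  # duplicate, as in the run shown in this notebook
]
results = fetch_within_budget(emails, fake_fetch)
print(len(calls), "queries used for", len(emails), "addresses")
```

In the notebook, `call_emailrep` could be passed in place of `fake_fetch`; the duplicate address then costs one query instead of two.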

In [19]:
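# logic to create the email reputation table and email profiles table
rep = []   # one dictionary per email, with one key per social media site
prof = []  # one dictionary per (email, social media site) pair
for row in rows:
    try:
        records = call_emailrep(row['email'])
        print(records['email'])
        info = {'email': records['email']}
        for record in records['details']['profiles']:
            # flag the site in the wide table and add a row to the long table
            info[record] = 'Y'
            prof.append({'email': records['email'], 'social': record})
        rep.append(info)
    except Exception:
        # skip addresses whose lookup failed or returned an unexpected response
        continue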

abadisagustin@gmail.com
carloselmandril@gmail.com
cacacasdfa@hotmail.com
ignaciomortega95@gmail.com
yerkhomolina1990@gmail.com
sebas12341992@hotmail.com
troll@hotmail.com
j.angulo.valdivia@gmail.com
akantoresg@gmail.com
thiagojosso@hotmail.com
darimarinroth@hotmail.com
estebanbuenahonda@hotmail.com
drumberghostyt@hotmail.com
vichoaraya09@gmail.com
vpoblete7@hotmail.com
supertanker814@gmail.com
vpobletepersi20@gmail.com
vpobleteibarra7@gmail.com
shuryken441@gmail.com
bpobleteibarra@gmail.com
shuryken23@hotmail.com
vpoblete7@hotmail.com
ignaciogonzalez1983@gmail.com
sebaveloo15.sv@gmail.com
solcatylove@yahoo.com
jesusamira.arenascarmi@gmail.com
patatoiderisimo@hotmail.com
bryam-1905@hotmail.com
kheiedozuljevic@live.cl
florenciapiaduarteriosf@outlook.com
kheiedkele@gmail.cl
alexcarvajalpos@hotmail.com
kheiedozuljevic@hotmail.com
alexcarvajalpos6@gmail.com
keko2101@hotmail.com
alexcarvajalpos@gmail.com
keko2101@live.cl
kheied21kele@live.cl
kheiedozuljevic@gmail.cl
kheied21kele@hotmail.com


In [21]:
rep

[{'email': 'abadisagustin@gmail.com',
  'pinterest': 'Y',
  'instagram': 'Y',
  'spotify': 'Y'},
 {'email': 'carloselmandril@gmail.com'},
 {'email': 'cacacasdfa@hotmail.com'},
 {'email': 'ignaciomortega95@gmail.com', 'pinterest': 'Y'},
 {'email': 'yerkhomolina1990@gmail.com', 'twitter': 'Y', 'instagram': 'Y'},
 {'email': 'sebas12341992@hotmail.com',
  'twitter': 'Y',
  'pinterest': 'Y',
  'instagram': 'Y'},
 {'email': 'troll@hotmail.com',
  'spotify': 'Y',
  'pinterest': 'Y',
  'instagram': 'Y',
  'vimeo': 'Y',
  'gravatar': 'Y',
  'twitter': 'Y',
  'flickr': 'Y',
  'myspace': 'Y',
  'aboutme': 'Y'},
 {'email': 'j.angulo.valdivia@gmail.com',
  'twitter': 'Y',
  'pinterest': 'Y',
  'spotify': 'Y'},
 {'email': 'akantoresg@gmail.com', 'pinterest': 'Y'},
 {'email': 'thiagojosso@hotmail.com',
  'pinterest': 'Y',
  'instagram': 'Y',
  'spotify': 'Y',
  'twitter': 'Y'},
 {'email': 'darimarinroth@hotmail.com',
  'pinterest': 'Y',
  'instagram': 'Y',
  'twitter': 'Y',
  'spotify': 'Y'},
 {'emai

In [22]:
prof

[{'email': 'abadisagustin@gmail.com', 'social': 'pinterest'},
 {'email': 'abadisagustin@gmail.com', 'social': 'instagram'},
 {'email': 'abadisagustin@gmail.com', 'social': 'spotify'},
 {'email': 'ignaciomortega95@gmail.com', 'social': 'pinterest'},
 {'email': 'yerkhomolina1990@gmail.com', 'social': 'twitter'},
 {'email': 'yerkhomolina1990@gmail.com', 'social': 'instagram'},
 {'email': 'sebas12341992@hotmail.com', 'social': 'twitter'},
 {'email': 'sebas12341992@hotmail.com', 'social': 'pinterest'},
 {'email': 'sebas12341992@hotmail.com', 'social': 'instagram'},
 {'email': 'troll@hotmail.com', 'social': 'spotify'},
 {'email': 'troll@hotmail.com', 'social': 'pinterest'},
 {'email': 'troll@hotmail.com', 'social': 'instagram'},
 {'email': 'troll@hotmail.com', 'social': 'vimeo'},
 {'email': 'troll@hotmail.com', 'social': 'gravatar'},
 {'email': 'troll@hotmail.com', 'social': 'twitter'},
 {'email': 'troll@hotmail.com', 'social': 'flickr'},
 {'email': 'troll@hotmail.com', 'social': 'myspace'},

### Reputations Table

The for loop builds the data in pivot-table form, so the resulting table has one row per email address and one column for every social media site that appears.
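The reason this works is that `pd.DataFrame` lines up dictionary keys as columns and leaves a missing value wherever a dictionary lacks a key. Here is a small sketch using the first few entries of `rep` shown earlier; `rep_sample` and `wide` are names chosen for this illustration.

```python
import pandas as pd

# The first entries of the `rep` list, copied from the output above
rep_sample = [
    {"email": "abadisagustin@gmail.com", "pinterest": "Y", "instagram": "Y", "spotify": "Y"},
    {"email": "carloselmandril@gmail.com"},
    {"email": "ignaciomortega95@gmail.com", "pinterest": "Y"},
]

# Each distinct key becomes a column; absent keys become NaN
wide = pd.DataFrame(rep_sample)
print(wide)

# Number of missing values per column
print(wide.isna().sum())
```

An address with no profiles, such as the second one, still gets its own row, with every site column empty.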

In [23]:
#move to dataframe
reputations = pd.DataFrame(rep)
reputations

Unnamed: 0,aboutme,email,facebook,flickr,foursquare,google,gravatar,instagram,lastfm,myspace,pastebin,pinterest,spotify,twitter,vimeo
0,,abadisagustin@gmail.com,,,,,,Y,,,,Y,Y,,
1,,carloselmandril@gmail.com,,,,,,,,,,,,,
2,,cacacasdfa@hotmail.com,,,,,,,,,,,,,
3,,ignaciomortega95@gmail.com,,,,,,,,,,Y,,,
4,,yerkhomolina1990@gmail.com,,,,,,Y,,,,,,Y,
5,,sebas12341992@hotmail.com,,,,,,Y,,,,Y,,Y,
6,Y,troll@hotmail.com,,Y,,,Y,Y,,Y,,Y,Y,Y,Y
7,,j.angulo.valdivia@gmail.com,,,,,,,,,,Y,Y,Y,
8,,akantoresg@gmail.com,,,,,,,,,,Y,,,
9,,thiagojosso@hotmail.com,,,,,,Y,,,,Y,Y,Y,


### Profile Table

This table contains just two columns: the email address and a social media profile. There are multiple rows per email address, since many people have more than one social media account.
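This one-row-per-account layout makes per-address summaries straightforward. As a rough sketch, the snippet below counts accounts per address with `groupby`, using a few rows copied from the `prof` output above; `prof_sample`, `long_table`, and `accounts_per_email` are illustrative names, not part of the notebook.

```python
import pandas as pd

# A few rows of the `prof` list, copied from the output above
prof_sample = [
    {"email": "abadisagustin@gmail.com", "social": "pinterest"},
    {"email": "abadisagustin@gmail.com", "social": "instagram"},
    {"email": "abadisagustin@gmail.com", "social": "spotify"},
    {"email": "ignaciomortega95@gmail.com", "social": "pinterest"},
]
long_table = pd.DataFrame(prof_sample)

# Count how many social media accounts were found for each address
accounts_per_email = long_table.groupby("email")["social"].count()
print(accounts_per_email)
```

Note that addresses with no profiles at all never appear in this table, so they are absent from the counts rather than listed with zero.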

In [None]:
profiles = pd.DataFrame(prof)
profiles

## Write to files

The profile and reputation tables are complete.

In [None]:
#write the reputation table to file
reputations.to_csv('reputation.csv', index=False)

In [None]:
#write the profile table to file
profiles.to_csv('profiles.csv', index=False)