# Imports and Configuration

This cell does two pieces of setup. `nest_asyncio` patches the event loop so that async code can run inside Jupyter, which already runs an event loop of its own. `dotenv` loads sensitive values such as our login information from a `.env` file into the OS environment variables, which keeps us from storing secrets in this repository by accident; the `.gitignore` file keeps the `.env` file itself out of version control. I like to keep the imports and configuration in their own cell so we can quickly rerun it when we need to change anything or add new libraries.

In [None]:
import os

import domaintools
import dotenv
import nest_asyncio
import pandas
import pyasn

from tqdm import tqdm

nest_asyncio.apply()
dotenv.load_dotenv()
dt_api_user = os.getenv('DT_API_USER')
dt_api_key = os.getenv('DT_API_KEY')
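
If the `.env` file is missing or incomplete, `os.getenv` quietly returns `None` and the problem only shows up later at the first API call. A minimal sketch of failing fast instead; `require_env` is a hypothetical helper, not part of `dotenv` or of this notebook:

```python
import os


def require_env(names, environ=None):
    """Return the requested environment variables, raising if any is unset or empty."""
    # `environ` is injectable so the helper can be exercised without touching os.environ.
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}
```

Called right after `dotenv.load_dotenv()`, something like `require_env(['DT_API_USER', 'DT_API_KEY'])` would stop the notebook at the configuration cell rather than partway through the investigation.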

In [None]:
# Add Iris hash here.
iris_hash = ''

dt = domaintools.API(dt_api_user, dt_api_key)
results = dt.iris_investigate(search_hash=iris_hash)

hosting_history = dict()
for result in tqdm(results):
    domain = result['domain']
    # Fetch the hosting history once per domain instead of once per field, which saves two API calls each time.
    history = dt.hosting_history(domain)
    hosting_history[domain] = { 'ip_history': history.setdefault('ip_history', list()),
                                'nameserver_history': history.setdefault('nameserver_history', list()),
                                'registrar_history': history.setdefault('registrar_history', list()) }

In [None]:
# Get the unique nameservers, IPs, and registrars per domain. They are broken out in weird, verbose ways here so we can manipulate
# them in later cells if we decide to modify or futz with the data. Just feeling the need to let you know my non-notebook code is not this bad...

hosting_history_unique = dict()
for domain, histories in hosting_history.items():
    for history_type, history_list in histories.items():
        for history in history_list:
            # An empty history can come back as the string '[]' instead of an empty list. Iterating that
            # string yields single characters, so skip anything that is a string.
            if isinstance(history, str):
                continue

            for key, value in history.items():
                # Keep the pre_*/post_* values of each change, plus the registrar name.
                if key.startswith(('pre_', 'post_')) or key == 'registrar':
                    if value is not None:
                        hosting_history_unique.setdefault(domain, dict()).setdefault(f'{history_type}_unique', set()).add(value)

# Now count how many domains each unique value appears on so we can work with the totals later.

total_history_counts = dict()
for domain in hosting_history_unique.keys():
    for history_type in hosting_history_unique[domain].keys():
        for history in hosting_history_unique[domain][history_type]:
            counts = total_history_counts.setdefault(history_type, dict())
            counts[history] = counts.get(history, 0) + 1
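
To make the two steps above concrete, here is a toy version on made-up data: first each domain is reduced to its set of unique values, then we count how many domains share each value. The domains and IPs are illustrative placeholders, and `collections.Counter` is swapped in for the nested `setdefault` calls.

```python
from collections import Counter

# Hypothetical per-domain unique IPs, shaped like hosting_history_unique[domain]['ip_history_unique'].
unique_ips = {
    'example.com': {'192.0.2.1', '192.0.2.2'},
    'example.net': {'192.0.2.1'},
    'example.org': {'192.0.2.1', '198.51.100.7'},
}

# Each domain contributes at most one count per IP because its values are a set.
ip_counts = Counter(ip for ips in unique_ips.values() for ip in ips)

# most_common() lists the highest counts first, like sorting the table by counts descending.
top_ip, top_count = ip_counts.most_common(1)[0]
```

An IP that shows up in the history of many domains in the same Iris search is a candidate for shared infrastructure, which is what the counts tables are meant to surface.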

In [None]:
# Show counts tables for each of IPs, registrars, and nameservers.

from IPython.display import display

for history_type in total_history_counts.keys():

    table = list()

    for key, value in total_history_counts[history_type].items():
        table.append([key, value])

    # This is the kind of clever thing that would annoy the shit out of me in code, but it will generate us a nice column name in the dataframe
    # from what we already have and does it in a way so I don't have to write it three times. Basically just takes the string, finds the first
    # occurrence of the _ and dumps the rest by slicing the string like a list as you can do in Python. Don't blame me. Blame Guido.
    # Jupyter does not render a bare expression inside a loop, so display() each table explicitly.
    display(pandas.DataFrame(table, columns=[history_type[:history_type.find('_')], 'counts']))

In [None]:
table = list()
for key,value in total_history_counts['ip_history_unique'].items():
    table.append([key, value])
    
pandas.DataFrame(table, columns=["ip", "counts"]).sort_values(by='counts', ascending=False)

In [None]:
table = list()
for key,value in total_history_counts['registrar_history_unique'].items():
    table.append([key, value])
    
pandas.DataFrame(table, columns=["registrar", "counts"]).sort_values(by='counts', ascending=False)

In [None]:
table = list()
for key,value in total_history_counts['nameserver_history_unique'].items():
    table.append([key, value])
    
pandas.DataFrame(table, columns=["nameserver", "counts"]).sort_values(by='counts', ascending=False)

# Passive DNS

In this section we take all of the passive DNS A records associated with these domains and compile a table for each domain with the following columns:

DNS Name, First Seen, ASN at time of First Seen, Last Seen, ASN at time of Last Seen

This uses the `pyasn` library with collections of historical ASN data from multiple datasets. The data has to be loaded in a specific way for this to work; see the README.md of this dockerized Jupyter instance for how to populate it.
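
One way the date-aware lookup could be structured, as a sketch only: it assumes one ASN database per snapshot date, where each database exposes `lookup(ip)` returning an `(asn, prefix)` pair, which is the shape of `pyasn.pyasn(path).lookup`. `SnapshotAsnResolver` and `pdns_row` are hypothetical names, and the README remains the authority on how the data is actually loaded.

```python
import bisect


class SnapshotAsnResolver:
    """Resolve an IP to the ASN recorded in the snapshot in effect on a given date."""

    def __init__(self, snapshots):
        # `snapshots` maps a snapshot date to an object with lookup(ip) -> (asn, prefix),
        # for example a pyasn.pyasn instance built from that day's data file.
        self._snapshots = snapshots
        self._dates = sorted(snapshots)

    def asn_at(self, ip, when):
        # Pick the latest snapshot taken on or before `when`.
        index = bisect.bisect_right(self._dates, when) - 1
        if index < 0:
            return None  # no snapshot is old enough to cover this date
        asn, _prefix = self._snapshots[self._dates[index]].lookup(ip)
        return asn


def pdns_row(resolver, name, ip, first_seen, last_seen):
    """Build one table row with the columns listed above."""
    return {
        'dns_name': name,
        'first_seen': first_seen,
        'first_seen_asn': resolver.asn_at(ip, first_seen),
        'last_seen': last_seen,
        'last_seen_asn': resolver.asn_at(ip, last_seen),
    }
```

A list of these rows can be handed straight to `pandas.DataFrame` to get the table. Comparing the two ASN columns shows whether an IP changed hands between the first and last sighting.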

# URLScan

Simply query URLScan for all of these domains, pull the images, and load them up in Jupyter using the interact widget as described at https://stackoverflow.com/questions/51546983/embedding-slideshow-in-jupyter-notebook
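
A rough sketch of the query and image-collection steps, with no network calls so it can be read in isolation. The endpoint and the `screenshot` field on each search hit are assumptions based on urlscan.io's public search API and should be checked against the current documentation; `search_url` and `screenshot_urls` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Assumed search endpoint; verify against the urlscan.io API documentation.
SEARCH_ENDPOINT = 'https://urlscan.io/api/v1/search/'


def search_url(domain, size=10):
    """Build the search URL for existing scans of `domain`."""
    query = urlencode({'q': f'domain:{domain}', 'size': size})
    return f'{SEARCH_ENDPOINT}?{query}'


def screenshot_urls(search_response):
    """Collect screenshot URLs from a decoded search response.

    Assumes each entry in `results` may carry a `screenshot` URL; entries without one are skipped.
    """
    hits = search_response.get('results', [])
    return [hit['screenshot'] for hit in hits if hit.get('screenshot')]
```

The resulting list is what the slideshow would iterate over, for instance with an integer slider from `ipywidgets.interact` selecting which URL to pass to `IPython.display.Image`.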

# VirusTotal

Hit VT with all of these domains and pull back any interesting information on them. Perhaps in the future we can interactively build a VTGraph of everything we've compiled here through the new v3 API.
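
As a starting point, here is a sketch of a per-domain lookup against the v3 API using only the standard library. It builds the request without sending it and summarizes an already-decoded report. The helper names and the idea of treating any malicious or suspicious verdict as "interesting" are assumptions for illustration, not an established rule.

```python
import urllib.request

VT_DOMAIN_ENDPOINT = 'https://www.virustotal.com/api/v3/domains/{domain}'


def build_domain_request(domain, api_key):
    """Build (but do not send) a v3 domain report request."""
    return urllib.request.Request(
        VT_DOMAIN_ENDPOINT.format(domain=domain),
        headers={'x-apikey': api_key},  # v3 authenticates with this header
    )


def is_interesting(report, threshold=1):
    """Flag a decoded domain report when enough engines call it malicious or suspicious."""
    stats = report.get('data', {}).get('attributes', {}).get('last_analysis_stats', {})
    return stats.get('malicious', 0) + stats.get('suspicious', 0) >= threshold
```

The API key would come from the same `.env` file as the DomainTools credentials (under a variable name of your choosing), and the request can be sent with `urllib.request.urlopen` and decoded with `json.load`.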