# Working with PassiveTotal Data in DataFrames
This example assumes you have the PassiveTotal Python API module installed and your credentials configured.

In [1]:
from passivetotal.libs.dns import DnsRequest
import pandas as pd
from pandas import Timestamp

Create a PassiveTotal client object:

In [2]:
client = DnsRequest.from_config()

Get the results for a recent GitHub-targeted phishing domain:

In [3]:
r = client.get_passive_dns(query='glthubs.com')

Creating a DataFrame from all the data returned is simple:

In [4]:
df = pd.DataFrame(r.get('results'))

In [5]:
df

Unnamed: 0,collected,firstSeen,lastSeen,recordHash,recordType,resolve,resolveType,source,value
0,2020-04-11 21:47:28,2020-04-03 12:42:30,2020-04-04 23:45:06,33f09dc22d89b91429d630eccb991243ce55161c22df70...,NS,2-can.njalla.in,domain,[riskiq],glthubs.com
1,2020-04-11 21:47:29,2020-04-03 17:00:00,2020-04-04 23:44:47,f0eaa8f0d6b4ca80e55fc385c27c4568896b9a036d6748...,A,46.17.96.88,ip,"[riskiq, emerging_threats, kaspersky]",glthubs.com
2,2020-04-11 21:47:28,2020-04-03 12:42:30,2020-04-04 23:45:06,47b9f9b1e68a34ea922738ef35f4bf1601b31d4806a1bc...,NS,3-get.njalla.fo,domain,[riskiq],glthubs.com
3,2020-04-11 21:47:29,2020-04-03 12:42:30,2020-04-03 22:42:29,f99029cfbb09f0728dac56b2d7659c552889c0a88dadb8...,A,185.163.47.164,ip,"[riskiq, kaspersky]",glthubs.com
4,2020-04-11 21:47:28,2020-04-03 12:42:30,2020-04-04 23:45:06,febe023b5f2df659ba33f1ba0728dffec560fc16e0543c...,NS,1-you.njalla.no,domain,[riskiq],glthubs.com
5,2020-04-11 21:47:28,2020-04-03 12:42:30,2020-04-04 23:44:55,e7446ef4647ecbd7e291dd33934265567c7edcbb466936...,SOA,you@can-get-no.info,email,[riskiq],glthubs.com
6,2020-04-11 21:47:28,2020-04-03 12:42:30,2020-04-04 23:44:55,cbfc9de9aba2d112d7ebcfa3bddfc5fc4a1b9133d706f7...,SOA,1-you.njalla.no.,domain,[riskiq],glthubs.com
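To see what the DataFrame constructor does with the results without calling the API, here is a minimal offline sketch. It assumes that `r.get('results')` is a list of dicts, one per record, which is consistent with the output above but is not confirmed here. The two records are copied by hand from rows 1 and 3 with the `collected` and `recordHash` fields left out, and `sample_results` and `sample_df` are names made up for illustration.

```python
import pandas as pd

# Hand-copied stand-in for r.get('results'); assumed to be a list of dicts.
sample_results = [
    {"firstSeen": "2020-04-03 17:00:00", "lastSeen": "2020-04-04 23:44:47",
     "recordType": "A", "resolve": "46.17.96.88", "resolveType": "ip",
     "source": ["riskiq", "emerging_threats", "kaspersky"],
     "value": "glthubs.com"},
    {"firstSeen": "2020-04-03 12:42:30", "lastSeen": "2020-04-03 22:42:29",
     "recordType": "A", "resolve": "185.163.47.164", "resolveType": "ip",
     "source": ["riskiq", "kaspersky"],
     "value": "glthubs.com"},
]

# Each dict becomes a row and each key becomes a column.
sample_df = pd.DataFrame(sample_results)

print(sample_df.shape)
print(sample_df.dtypes)
```

Note that at this stage the date columns hold plain strings rather than datetimes, so they are not yet suitable for date comparisons.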


Not all of the columns are interesting, however. Let's create a function that:

- returns the most interesting columns
- converts the firstSeen and lastSeen columns to Timestamps that we can use for filtering
- optionally filters the results to one DNS record type

In [6]:
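def pt2df(value, rt=None):
    """Return passive DNS records for value as a DataFrame, optionally limited to record type rt."""
    client = DnsRequest.from_config()
    # Keep only the columns we care about
    columns = ['firstSeen', 'lastSeen', 'recordType', 'resolve', 'resolveType', 'value']
    result = client.get_passive_dns(query=value).get('results')
    df = pd.DataFrame(result)[columns]
    # Convert the date strings to pandas Timestamps so they can be compared
    df[['firstSeen', 'lastSeen']] = df[['firstSeen', 'lastSeen']].apply(pd.to_datetime)
    if rt is None:
        return df
    # Keep only rows of a single DNS record type, e.g. 'A' or 'NS'
    return df[df.recordType == rt]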
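The same transformation can be tried without credentials by separating it from the API call. The sketch below is a hypothetical offline counterpart of pt2df: `records_to_df`, `COLUMNS` and `sample` are illustrative names, not part of the PassiveTotal library, and the records are assumed to be a list of dicts. Passing `columns=` to the constructor also keeps the function working when the list is empty, a case in which selecting columns from an empty frame would fail.

```python
import pandas as pd

COLUMNS = ["firstSeen", "lastSeen", "recordType", "resolve", "resolveType", "value"]


def records_to_df(records, rt=None):
    """Hypothetical offline variant of pt2df: takes records instead of querying."""
    # columns= selects the fields we want and tolerates an empty record list
    df = pd.DataFrame(records, columns=COLUMNS)
    # Parse the date strings so the columns can be compared with Timestamps
    for col in ("firstSeen", "lastSeen"):
        df[col] = pd.to_datetime(df[col])
    if rt is not None:
        df = df[df["recordType"] == rt]
    return df


# Three records copied by hand from the first output table (rows 0, 1 and 3).
sample = [
    {"firstSeen": "2020-04-03 12:42:30", "lastSeen": "2020-04-04 23:45:06",
     "recordType": "NS", "resolve": "2-can.njalla.in", "resolveType": "domain",
     "value": "glthubs.com"},
    {"firstSeen": "2020-04-03 17:00:00", "lastSeen": "2020-04-04 23:44:47",
     "recordType": "A", "resolve": "46.17.96.88", "resolveType": "ip",
     "value": "glthubs.com"},
    {"firstSeen": "2020-04-03 12:42:30", "lastSeen": "2020-04-03 22:42:29",
     "recordType": "A", "resolve": "185.163.47.164", "resolveType": "ip",
     "value": "glthubs.com"},
]

a_records = records_to_df(sample, rt="A")
print(a_records["resolve"].tolist())

empty = records_to_df([])
print(empty.empty)
```

Keeping the parsing separate from the network call is a design choice made for this sketch only; it makes the column handling easy to check against known records.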

Here's our trimmed DataFrame, with only the selected columns:

In [7]:
df = pt2df('glthubs.com')
df

Unnamed: 0,firstSeen,lastSeen,recordType,resolve,resolveType,value
0,2020-04-03 12:42:30,2020-04-04 23:45:06,NS,2-can.njalla.in,domain,glthubs.com
1,2020-04-03 17:00:00,2020-04-04 23:44:47,A,46.17.96.88,ip,glthubs.com
2,2020-04-03 12:42:30,2020-04-04 23:45:06,NS,3-get.njalla.fo,domain,glthubs.com
3,2020-04-03 12:42:30,2020-04-03 22:42:29,A,185.163.47.164,ip,glthubs.com
4,2020-04-03 12:42:30,2020-04-04 23:45:06,NS,1-you.njalla.no,domain,glthubs.com
5,2020-04-03 12:42:30,2020-04-04 23:44:55,SOA,you@can-get-no.info,email,glthubs.com
6,2020-04-03 12:42:30,2020-04-04 23:44:55,SOA,1-you.njalla.no.,domain,glthubs.com


Note that the values in the date columns are now pandas Timestamp objects:

In [8]:
type(df.firstSeen.iloc[0])

pandas._libs.tslibs.timestamps.Timestamp

This means we can trivially filter based on those columns:

In [9]:
df[df.lastSeen >= Timestamp('2020-04-04')]

Unnamed: 0,firstSeen,lastSeen,recordType,resolve,resolveType,value
0,2020-04-03 12:42:30,2020-04-04 23:45:06,NS,2-can.njalla.in,domain,glthubs.com
1,2020-04-03 17:00:00,2020-04-04 23:44:47,A,46.17.96.88,ip,glthubs.com
2,2020-04-03 12:42:30,2020-04-04 23:45:06,NS,3-get.njalla.fo,domain,glthubs.com
4,2020-04-03 12:42:30,2020-04-04 23:45:06,NS,1-you.njalla.no,domain,glthubs.com
5,2020-04-03 12:42:30,2020-04-04 23:44:55,SOA,you@can-get-no.info,email,glthubs.com
6,2020-04-03 12:42:30,2020-04-04 23:44:55,SOA,1-you.njalla.no.,domain,glthubs.com
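Because the columns hold Timestamps, comparisons can be combined and subtracted as well. The following is a small sketch on three rows copied by hand from the table above; the names `cutoff`, `gone_before`, `window` and the `observed` column are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "firstSeen": pd.to_datetime(["2020-04-03 12:42:30",
                                 "2020-04-03 17:00:00",
                                 "2020-04-03 12:42:30"]),
    "lastSeen": pd.to_datetime(["2020-04-04 23:45:06",
                                "2020-04-04 23:44:47",
                                "2020-04-03 22:42:29"]),
    "resolve": ["2-can.njalla.in", "46.17.96.88", "185.163.47.164"],
})

cutoff = pd.Timestamp("2020-04-04")

# Records that were no longer observed by the cutoff date
gone_before = df[df["lastSeen"] < cutoff]

# Combine conditions with & and wrap each one in parentheses
window = df[(df["firstSeen"] >= pd.Timestamp("2020-04-03 13:00"))
            & (df["lastSeen"] >= cutoff)]

# Subtracting two Timestamp columns gives a Timedelta column
df["observed"] = df["lastSeen"] - df["firstSeen"]

print(gone_before["resolve"].tolist())
print(window["resolve"].tolist())
print(df[["resolve", "observed"]])
```

The parentheses around each condition are required because `&` binds more tightly than the comparison operators in Python.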


Here's how we would filter for a record type if we didn't do it using the function we wrote above:

In [10]:
df[df.recordType=='A']

Unnamed: 0,firstSeen,lastSeen,recordType,resolve,resolveType,value
1,2020-04-03 17:00:00,2020-04-04 23:44:47,A,46.17.96.88,ip,glthubs.com
3,2020-04-03 12:42:30,2020-04-03 22:42:29,A,185.163.47.164,ip,glthubs.com


We already anticipated that need, though. Passing rt to pt2df applies the same filter inside the function, after the query returns:

In [11]:
df = pt2df('glthubs.com',rt='A')
df

Unnamed: 0,firstSeen,lastSeen,recordType,resolve,resolveType,value
1,2020-04-03 17:00:00,2020-04-04 23:44:47,A,46.17.96.88,ip,glthubs.com
3,2020-04-03 12:42:30,2020-04-03 22:42:29,A,185.163.47.164,ip,glthubs.com
