## Malicious Use of Domain Names
* Bots: locate command-and-control (C&C) servers
* Spam/Phishing: URLs that link to scam servers

## Detecting Malicious Domains via DNS

Reference: EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis

* Goal: detect malicious domains
* Build features from passively observed DNS traffic sent by authoritative DNS servers to recursive DNS servers
    * Per response: queried domain name, query time, TTL, and the list of IP addresses associated with the domain

### Features
* F1: Time-based Features
    * Short life
    * Daily similarity
    * Repeating patterns
    * Access ratio
* F2: DNS answer-based features
    * Number of distinct IP addresses
    * Number of distinct countries
    * Number of domains that share the same IP addresses
    * Reverse DNS query results
* F3: TTL value-based features
    * Average TTL
    * Standard deviation of TTL
    * Number of distinct TTL values
    * Number of TTL changes
    * % usage of specific TTL ranges
* F4: Domain name-based features
    * % of numerical characters
    * Length of the longest meaningful substring (LMS) as a % of the name length

### Time-based features
* Global scope: short-lived
* Local scope:
    * Daily similarity
        * The request count increases or decreases at the same intervals every day
    * Regularly repeating patterns
        * Treated as a change point detection (CPD) problem
    * Access ratio
        * Whether the domain is mostly idle or queried continuously (popular)
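
As a rough illustration of the local-scope features above, the sketch below buckets query timestamps into hourly counts and derives a daily-similarity score and an access ratio. The function names, the hourly bucket size, the use of mean pairwise Euclidean distance between normalized daily series, and the definition of access ratio as the fraction of non-empty buckets are all assumptions made for this example, not details taken from the notes. Change point detection is not covered here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def hourly_counts(timestamps, window_start, num_days):
    """Bucket query timestamps (epoch seconds) into a num_days x 24 grid."""
    grid = [[0] * HOURS_PER_DAY for _ in range(num_days)]
    for ts in timestamps:
        hour_index = int((ts - window_start) // SECONDS_PER_HOUR)
        day, hour = divmod(hour_index, HOURS_PER_DAY)
        if 0 <= day < num_days:  # ignore queries outside the window
            grid[day][hour] += 1
    return grid


def z_normalize(series):
    """Shift to zero mean and unit variance; a flat series maps to zeros."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def daily_similarity(grid):
    """Mean pairwise Euclidean distance between normalized daily series.

    Lower values mean the days look more alike (assumed scoring choice).
    """
    days = [z_normalize(day) for day in grid]
    dists = [
        sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))
        for d1, d2 in combinations(days, 2)
    ]
    return mean(dists) if dists else 0.0


def access_ratio(grid):
    """Fraction of hourly buckets with at least one query (assumed definition)."""
    slots = [count for day in grid for count in day]
    return sum(1 for count in slots if count > 0) / len(slots)
```

A domain queried at the same hours every day yields a daily-similarity score near zero, while a domain that is idle most of the time has an access ratio close to zero.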

### DNS answer-based features
* Number of distinct IPs
    * Resolved for a domain during the experiment
* Number of different countries for those IPs
* Reverse DNS query results of those IPs
* Number of domains that share those IPs
    * Can be large for web hosting providers as well
    * Reduce false positives by checking whether the reverse DNS query results appear in the top 3 Google search results
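
The counting features above can be sketched as follows. This is a minimal example under stated assumptions: `records` is a hypothetical list of `(domain, ip)` pairs extracted from DNS answers, and `ip_to_country` is a hypothetical lookup table standing in for a real GeoIP database. The reverse DNS and search-engine checks are left out.

```python
from collections import defaultdict


def answer_features(records, ip_to_country):
    """Compute per-domain answer-based counts.

    records: iterable of (domain, ip) pairs seen in DNS answers.
    ip_to_country: dict mapping IP -> country code (stand-in for GeoIP).
    """
    ips_by_domain = defaultdict(set)
    domains_by_ip = defaultdict(set)
    for domain, ip in records:
        ips_by_domain[domain].add(ip)
        domains_by_ip[ip].add(domain)

    features = {}
    for domain, ips in ips_by_domain.items():
        # IPs missing from the lookup table are simply skipped.
        countries = {ip_to_country[ip] for ip in ips if ip in ip_to_country}
        # Every other domain that resolved to at least one of this domain's IPs.
        co_hosted = set().union(*(domains_by_ip[ip] for ip in ips)) - {domain}
        features[domain] = {
            "distinct_ips": len(ips),
            "distinct_countries": len(countries),
            "shared_domains": len(co_hosted),
        }
    return features


# Example with documentation-only addresses and made-up country codes.
records = [
    ("a.example", "192.0.2.1"),
    ("a.example", "198.51.100.7"),
    ("b.example", "192.0.2.1"),
    ("c.example", "203.0.113.9"),
]
geo = {"192.0.2.1": "US", "198.51.100.7": "DE", "203.0.113.9": "US"}
print(answer_features(records, geo)["a.example"])
```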

### TTL value-based features
* TTL: length of time to cache a DNS response
    * Recommended value: between 1 and 5 days
* Average TTL value
    * High-availability systems combine low TTL values with round-robin DNS
        * Examples: CDNs, fast-flux botnets
* Standard deviation of TTL
    * Compromised home computers (dynamic IP) are assigned much shorter TTLs than compromised servers (static IP)
* Number of TTL changes and total number of distinct TTL values
    * Higher in malicious domains
* % usage of specific TTL ranges
    * Ranges considered (in seconds): [0, 1), [1, 10), [10, 100), [100, 300), [300, 900), [900, ∞)
    * Malicious domains peak at [0,100) ranges
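
The TTL features listed above are simple statistics over the TTL values observed for one domain. A minimal sketch, assuming `ttls` is a chronologically ordered list of TTLs in seconds (the function name and the returned dictionary keys are made up for this example):

```python
from statistics import mean, pstdev

# Half-open ranges [lo, hi) in seconds, matching the list above.
TTL_RANGES = [(0, 1), (1, 10), (10, 100), (100, 300), (300, 900), (900, float("inf"))]


def ttl_features(ttls):
    """Summarize the TTLs observed for one domain, in order of observation."""
    if not ttls:
        raise ValueError("need at least one TTL observation")
    # A change is counted whenever two consecutive observations differ.
    changes = sum(1 for prev, cur in zip(ttls, ttls[1:]) if prev != cur)
    range_usage = [
        sum(1 for t in ttls if lo <= t < hi) / len(ttls) for lo, hi in TTL_RANGES
    ]
    return {
        "average": mean(ttls),
        "std_dev": pstdev(ttls),
        "distinct_values": len(set(ttls)),
        "changes": changes,
        "range_usage": range_usage,  # fractions, one per entry in TTL_RANGES
    }


features = ttl_features([300, 300, 60, 60, 5, 300])
print(features["distinct_values"], features["changes"])
```

Population standard deviation (`pstdev`) is used here so that a single observation gives 0 instead of an error; a sample standard deviation would be an equally reasonable choice.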

### Domain name-based features
* Easy-to-remember names
    * Important for benign services
        * Providing memorable names is the main purpose of DNS
* Features:
    * Ratio of numerical characters to name length
    * Ratio of the length of the longest meaningful substring (i.e., a dictionary word) to the length of the domain name
        * Query the name on Google and check the number of hits against a threshold
* Features are applied only to second-level domains
    * Example: server.com for x.y.server.com
* Other possible feature: entropy of the domain name
    * Names produced by domain generation algorithms (DGAs) are more random than human-chosen names
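
A minimal sketch of the name-based features follows. Several simplifications are assumptions of this example and not part of the notes: a small local word set replaces the Google hit-count check, the second-level domain is taken naively as the last two labels (so suffixes like `co.uk` are handled incorrectly), the features are computed on the label to the left of the TLD, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def second_level_domain(fqdn):
    """Naive extraction: 'x.y.server.com' -> 'server.com'."""
    labels = fqdn.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def numeric_ratio(name):
    """Fraction of characters in the name that are digits."""
    return sum(ch.isdigit() for ch in name) / len(name)


def longest_meaningful_substring(name, words, min_len=3):
    """Longest substring of name found in the word set (stand-in for a search query)."""
    best = ""
    for i in range(len(name)):
        # Only try candidates longer than the best match found so far.
        for j in range(i + max(min_len, len(best) + 1), len(name) + 1):
            if name[i:j] in words:
                best = name[i:j]
    return best


def shannon_entropy(name):
    """Entropy in bits per character of the name's character distribution."""
    counts = Counter(name)
    return -sum((c / len(name)) * log2(c / len(name)) for c in counts.values())


def name_features(fqdn, words):
    name = second_level_domain(fqdn).split(".")[0]  # label left of the TLD
    lms = longest_meaningful_substring(name, words)
    return {
        "numeric_ratio": numeric_ratio(name),
        "lms_ratio": len(lms) / len(name),
        "entropy": shannon_entropy(name),
    }


words = {"mail", "serve", "server"}  # toy dictionary for illustration
print(name_features("x.y.mailserver42.com", words))
```

A name such as `mailserver42` scores a high LMS ratio and a low numeric ratio, whereas a random-looking DGA name would typically contain no dictionary word and show higher entropy.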

Exercise idea: find a dataset, go over the algorithm, offer some features to choose from (with the reasoning behind each), and let participants submit features of their own.

https://ant.isi.edu/datasets/readmes/DoS_DNS_amplification-20130617.README.txt

### Evasion

* Assign uniform TTL values across all compromised machines
    * Reduces the reliability of the attacker's infrastructure
* Reduce the number of DNS lookups of the malicious domain
    * Not trivial to implement
    * Reduces attacker's impact
    * Requires high degree of coordination