# Domains

## Task 1: Creating domain variations using a list of Top Level Domains (TLDs)

We have learned the value of finding email patterns, but what about domain names that are registered?

#### 💜 Activity: Create a list of domain names for `Apple` using different TLDs to further investigate.

Sometimes companies register domains in multiple countries. We can find this information in a variety of sources, but we want to be extra thorough and check against all TLDs.



### Download a list of TLDs from IANA

To start, we will get a list of all TLDs directly from IANA, the Internet Assigned Numbers Authority:
- https://iana.org

You can run the code below, or open the link directly and copy and paste the list into a text file named "domains.txt":
- https://data.iana.org/TLD/tlds-alpha-by-domain.txt

In [None]:
import os

command = 'curl https://data.iana.org/TLD/tlds-alpha-by-domain.txt > domains.txt'

os.system(command)

In [None]:
# Create a variable assigned to the name of the file we just created

domain = 'google'
domain_file   = 'domains.txt'
domains_array = []


with open(domain_file, 'r') as file:
    print('Reading domains from file, skipping first line...')
    next(file)  # skip over the first line, which is a comment header
    for line in file:
        tld = line.strip().lower()  # this should result in an all lower case TLD
        if not tld:
            continue  # ignore blank lines
        domains_array.append(domain + '.' + tld)  # add to the list

# print(domains_array)

# create a new file and write the domain names to it
# (the loop variable is named new_domain so it does not overwrite `domain`)
with open(f'{domain}_tlds.txt', 'w') as outfile:
    for new_domain in domains_array:
        outfile.write(new_domain + '\n')
        
print('Done!')
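
To repeat these steps for another name, such as `Apple` from the activity, the same logic can be wrapped in a function. This is only a minimal sketch: `build_domain_variations` is a hypothetical helper name, and it assumes that header lines in the TLD list start with `#`.

```python
def build_domain_variations(name, tld_lines):
    """Return '<name>.<tld>' for each TLD line, skipping comments and blank lines."""
    variations = []
    for line in tld_lines:
        tld = line.strip().lower()
        if not tld or tld.startswith('#'):
            continue  # header/comment line or empty line
        variations.append(f'{name.lower()}.{tld}')
    return variations


# Small in-memory sample standing in for the contents of domains.txt
sample_lines = ['# example header line', 'COM', 'NET', '', 'BABY']
print(build_domain_variations('Apple', sample_lines))
# → ['apple.com', 'apple.net', 'apple.baby']
```

With the real file, the lines could be passed in directly, for example `build_domain_variations('apple', open('domains.txt'))`.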

## Safe domains

Okay cool, now we should have a list of domains. If you don't, check the dataset folder within this directory for a file called `google_tlds.txt`.

### So, are they safe?

Good question! We should check that... To follow the next step, you will need a VirusTotal API key, which you can get here:

- https://developers.virustotal.com/reference/overview

Note that the free API key has request limits, so to avoid using up all our requests while testing, we should start with one or two domains. Let's try with a domain we know exists, and then with one we aren't sure exists.
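
One way to stay within a request limit is to pause between lookups. The sketch below is an illustration only: `throttled` is a hypothetical helper, and the value of 4 requests per minute is an assumed example, so check the quota that applies to your own key.

```python
import time


def throttled(items, per_minute, sleep=time.sleep):
    """Yield items one at a time, pausing between them to stay under per_minute."""
    delay = 60.0 / per_minute
    for position, item in enumerate(items):
        if position > 0:
            sleep(delay)  # wait before every item except the first
        yield item


# Demo with a no-op sleep so it finishes instantly; drop the `sleep` argument for real use
for name in throttled(['google.com', 'google.baby'], per_minute=4, sleep=lambda seconds: None):
    print('would check', name)
```

Passing the sleep function in as a parameter keeps the helper easy to try out without actually waiting.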

Create a new file: `test_domains.txt`

Add the following:

```
google.com
google.baby
```

Set your API key as an environment variable:

_One option... which I wouldn't say is the safest option, since the key ends up saved in the notebook_

In a Jupyter notebook:
1. Create a new code cell
1. Use a magic command: `%env VT_API_KEY your_api_key_here`
1. Access the environment variable in your Python code (see the snippet below)
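
Reading the key back can be sketched as follows. `get_api_key` is a hypothetical helper, not part of any library; it fails early with a clear message when the variable is missing, rather than sending a request without a key.

```python
import os


def get_api_key(name='VT_API_KEY', environ=os.environ):
    """Return the API key stored in the given environment mapping, or raise if it is missing."""
    key = environ.get(name, '').strip()
    if not key:
        raise RuntimeError(f'{name} is not set; define it before running the lookups')
    return key


# Demo with a stand-in mapping; call get_api_key() with no arguments to read the real environment
key = get_api_key(environ={'VT_API_KEY': 'example-key'})
print('Key loaded, length', len(key))
```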

In [2]:
# may help with async methods
import nest_asyncio
nest_asyncio.apply()

In [3]:
# Reference:
# https://developers.virustotal.com/reference/scan-url - Select Python

import os

from dotenv import load_dotenv
import vt

load_dotenv()

client = vt.Client(os.getenv('VT_API_KEY'))

url_id = vt.url_id("http://www.google.com")
url = client.get_object("/urls/{}", url_id)
url.last_analysis_stats # rough assumption: a 'harmless' count of 40 - 100 suggests the URL is likely not malicious



{'harmless': 75,
 'malicious': 1,
 'suspicious': 0,
 'undetected': 14,
 'timeout': 0}
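
The counts above can be turned into a rough verdict. This is a sketch under stated assumptions: `summarize_stats` is a hypothetical helper, and the threshold of 40 harmless results comes from the rule of thumb used in this notebook, not from VirusTotal.

```python
def summarize_stats(stats, min_harmless=40):
    """Turn a last_analysis_stats-style dict into a rough, non-authoritative verdict."""
    flagged = stats.get('malicious', 0) + stats.get('suspicious', 0)
    harmless = stats.get('harmless', 0)
    if flagged > harmless:
        return 'likely malicious'
    if harmless >= min_harmless:
        return 'likely not malicious'
    return 'inconclusive'


# The stats shown in the output above
stats = {'harmless': 75, 'malicious': 1, 'suspicious': 0, 'undetected': 14, 'timeout': 0}
print(summarize_stats(stats))
# → likely not malicious
```

A single `malicious` result, as in the output above, does not change the verdict here because far more engines rated the URL harmless.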

In [None]:
analysis = client.scan_url('https://google.com')

In [None]:
print(analysis)

In [None]:
# TODO: look at dotenv and load_dotenv()

In [None]:
import base64

url_id = base64.urlsafe_b64encode("http://google.com".encode()).decode().strip("=")
print(url_id)

In [4]:
import os

import requests

url = "https://www.virustotal.com/api/v3/urls/aHR0cDovL2dvb2dsZS5jb20"

headers = {
    "accept": "application/json",
    # read the key from the environment instead of hard-coding it in the notebook
    "x-apikey": os.getenv("VT_API_KEY")
}

response = requests.get(url, headers=headers)

print(response.text)

{
    "data": {
        "attributes": {
            "last_modification_date": 1683201575,
            "times_submitted": 178122,
            "total_votes": {
                "harmless": 1800,
                "malicious": 600
            },
            "threat_names": [],
            "redirection_chain": [
                "http://google.com/",
                "http://www.google.com/"
            ],
            "last_submission_date": 1683201571,
            "last_http_response_content_length": 143246,
            "last_http_response_headers": {
                "X-XSS-Protection": "0",
                "Permissions-Policy": "unload=()",
                "Transfer-Encoding": "chunked",
                "Set-Cookie": "1P_JAR=2023-05-04-11; expires=Sat, 03-Jun-2023 11:59:33 GMT; path=/; domain=.google.com; Secure; SameSite=none, NID=511=qy_5hTkenEydCTEh0UBylb4bt_CIitxyokmWJ0aGGx6l8LU-Q3TFJqaZ_yTSmwSSKbZVOr7lHBHlC7NtASEyQXqXPvQoXBgGYQIP0JXBnjKnYrvBeNnJ_Xdn3jRLS2PWS0WEOMwzsMXIaTX6cesKtFTl68uxgDI