# DNSSEC Crypto Support in Domains on the Internet
Now with Tranco Top1M

Hi Elias (if Nils or Peter did that, then the Q is also for you), please
send me % for the following (the questions apply to KSK as well as to ZSK):
1. How many zones out of those we measured are signed.
2. How many (%) are signed with one cipher.
3. How many (% and an absolute number) are signed with more than one cipher
(e.g., maybe KSK is signed with one cipher but the zonefile is signed by
multiple ZSK - so then use the number of multiple ZSK - we want to show
here that there are zones that have multiple ciphers. In this stats you can
consider RSA 1024 and RSA 2048 as different ciphers for the purpose of this
question).
4. What is the maximum number of ciphers that you saw being used?

How many resolvers validate (open resolvers, and ad net resolvers). Here
please % out of the entire population and also what is the entire
population you measured.

In addition to sending me the % here, please also add text into a dataset
section that I added in file dataset.tex

Best, Haya


In [1]:
import json
import re
import pandas as pd
import numpy as np
# from dns import message
# from dns import rdata
import matplotlib as plt
import seaborn as sns
# import logging

REPO_DIR = '../../dnssec-downgrade-data/'
DATA_DIR = REPO_DIR + '/2021-10-07_ns-crypto/'  # location of input/raw and processed data
STATS_DIR = DATA_DIR + '/stats/' # output location fo tables and plots 
TOP1M_FILENAME = DATA_DIR + '/dnssec-misconfiguration-prevalence-tranco-top-1m.pickle.gz'
TLDS_FILENAME = DATA_DIR + '/dnssec-misconfiguration-prevalence-tld.pickle.gz'
TOP1M_FILENAME_SLIM = DATA_DIR + '/dnssec-algos-top1m.pickle.gz'
TLDS_FILENAME_SLIM = DATA_DIR + '/dnssec-algos-tlds.pickle.gz'


### Load Data

In [2]:
df_top1m = pd.read_pickle(TOP1M_FILENAME_SLIM, compression='gzip')
df_tlds = pd.read_pickle(TLDS_FILENAME_SLIM, compression='gzip')
df_tlds

Unnamed: 0,domain,zone,ds,dnskey
0,clinic.,clinic.,(41947 8 2 b0f663276812153021d47e9cd2ff811528b...,(257 3 8 AwEAAcvTHmPn6v1yXm/FQhByRDSkM90A 8eX2...
1,chat.,chat.,(26920 8 1 b173a1003fa332b416e206ab573f01f37ca...,(256 3 8 AwEAAcbmk/5OptWsFOhli3ZKoPH/T/08 J6gS...
2,ua.,ua.,(48349 13 2 d8456df0eab0db7d2422b4110722f5772d...,(257 3 13 C2bE7DeaYbO2Am+P1gdNZfkPEyxILzG1 cB7...
3,fo.,fo.,(41527 8 2 6e7925d8d6f243ef35381231b955528f250...,(256 3 8 AwEAAZlw9SbFTz+s5YAkSppDFY7+NZYT k14U...
4,dog.,dog.,(28987 8 1 5b5bcf475937dfd841abd4b12cd7211790d...,(257 3 8 AwEAAbP82mWF474QWUW1gOoVUaAaiBMj C22s...
...,...,...,...,...
1493,akdn.,akdn.,(25057 8 2 47e24ce61e8604c9a8fc169d5e073b280f2...,(256 3 8 AwEAAd/UjKedeGUguqbs9rvSJJksSZ1H WYV9...
1494,xn--3bst00m.,xn--3bst00m.,(11479 8 2 7881a662f41e0d4b2133e66229672677147...,(256 3 8 AwEAAb8C1mRkjTEojs7aHs5i4kw5GA0f vtSU...
1495,tw.,tw.,(40792 8 2 a05db4b0deb971031361bb621e8bb1b8d73...,(256 3 8 AwEAAcw3wLGi203A+Wb0POo36BMFtf8v PeyN...
1496,xn--ses554g.,xn--ses554g.,(57266 8 2 a3c057a22744eb0ff0518d51e55b1271dab...,(257 3 8 AwEAAch4dO5zg0S2UnTLkn2ugjEK/oQ2 HaaC...


### Slim Down Data

In [3]:
## Done once and plugged into initial reading
# df_top1m = df_top1m[['domain', 'zone', 'ds', 'dnskey']]
# df_tlds = df_tlds[['domain', 'zone', 'ds', 'dnskey']]
# df_tlds.to_pickle(TLDS_FILENAME_SLIM, compression='gzip')
# df_top1m.to_pickle(TOP1M_FILENAME_SLIM, compression='gzip')
df_tlds

Unnamed: 0,domain,zone,ds,dnskey
0,clinic.,clinic.,(41947 8 2 b0f663276812153021d47e9cd2ff811528b...,(257 3 8 AwEAAcvTHmPn6v1yXm/FQhByRDSkM90A 8eX2...
1,chat.,chat.,(26920 8 1 b173a1003fa332b416e206ab573f01f37ca...,(256 3 8 AwEAAcbmk/5OptWsFOhli3ZKoPH/T/08 J6gS...
2,ua.,ua.,(48349 13 2 d8456df0eab0db7d2422b4110722f5772d...,(257 3 13 C2bE7DeaYbO2Am+P1gdNZfkPEyxILzG1 cB7...
3,fo.,fo.,(41527 8 2 6e7925d8d6f243ef35381231b955528f250...,(256 3 8 AwEAAZlw9SbFTz+s5YAkSppDFY7+NZYT k14U...
4,dog.,dog.,(28987 8 1 5b5bcf475937dfd841abd4b12cd7211790d...,(257 3 8 AwEAAbP82mWF474QWUW1gOoVUaAaiBMj C22s...
...,...,...,...,...
1493,akdn.,akdn.,(25057 8 2 47e24ce61e8604c9a8fc169d5e073b280f2...,(256 3 8 AwEAAd/UjKedeGUguqbs9rvSJJksSZ1H WYV9...
1494,xn--3bst00m.,xn--3bst00m.,(11479 8 2 7881a662f41e0d4b2133e66229672677147...,(256 3 8 AwEAAb8C1mRkjTEojs7aHs5i4kw5GA0f vtSU...
1495,tw.,tw.,(40792 8 2 a05db4b0deb971031361bb621e8bb1b8d73...,(256 3 8 AwEAAcw3wLGi203A+Wb0POo36BMFtf8v PeyN...
1496,xn--ses554g.,xn--ses554g.,(57266 8 2 a3c057a22744eb0ff0518d51e55b1271dab...,(257 3 8 AwEAAch4dO5zg0S2UnTLkn2ugjEK/oQ2 HaaC...


### Expand Key Info

In [4]:
for df in (df_tlds, df_top1m):
    df['ds_algos'] = df.apply(lambda row: {rr.algorithm for rr in row['ds']}, axis=1)
    df['dnskey_algos'] = df.apply(lambda row: {rr.algorithm for rr in row['dnskey']}, axis=1)
    df['dnskey_algos_kl'] = df.apply(lambda row: {(rr.algorithm, (len(rr.key)-4)*8) for rr in row['dnskey']}, axis=1)
    df['num_dnskey_algos'] = df.apply(lambda row: len(row['dnskey_algos']), axis=1)
    df['num_dnskey_algos_kl'] = df.apply(lambda row: len(row['dnskey_algos_kl']), axis=1)


print(f"{df_tlds.iloc[0].loc['dnskey_algos_kl']}")
df_tlds

{(<Algorithm.RSASHA256: 8>, 1280), (<Algorithm.RSASHA256: 8>, 1024), (<Algorithm.RSASHA256: 8>, 2048)}


Unnamed: 0,domain,zone,ds,dnskey,ds_algos,dnskey_algos,dnskey_algos_kl,num_dnskey_algos,num_dnskey_algos_kl
0,clinic.,clinic.,(41947 8 2 b0f663276812153021d47e9cd2ff811528b...,(257 3 8 AwEAAcvTHmPn6v1yXm/FQhByRDSkM90A 8eX2...,{Algorithm.RSASHA256},{Algorithm.RSASHA256},"{(Algorithm.RSASHA256, 1280), (Algorithm.RSASH...",1,3
1,chat.,chat.,(26920 8 1 b173a1003fa332b416e206ab573f01f37ca...,(256 3 8 AwEAAcbmk/5OptWsFOhli3ZKoPH/T/08 J6gS...,{Algorithm.RSASHA256},{Algorithm.RSASHA256},"{(Algorithm.RSASHA256, 1280), (Algorithm.RSASH...",1,3
2,ua.,ua.,(48349 13 2 d8456df0eab0db7d2422b4110722f5772d...,(257 3 13 C2bE7DeaYbO2Am+P1gdNZfkPEyxILzG1 cB7...,{Algorithm.ECDSAP256SHA256},{Algorithm.ECDSAP256SHA256},"{(Algorithm.ECDSAP256SHA256, 480)}",1,1
3,fo.,fo.,(41527 8 2 6e7925d8d6f243ef35381231b955528f250...,(256 3 8 AwEAAZlw9SbFTz+s5YAkSppDFY7+NZYT k14U...,{Algorithm.RSASHA256},{Algorithm.RSASHA256},"{(Algorithm.RSASHA256, 1024), (Algorithm.RSASH...",1,2
4,dog.,dog.,(28987 8 1 5b5bcf475937dfd841abd4b12cd7211790d...,(257 3 8 AwEAAbP82mWF474QWUW1gOoVUaAaiBMj C22s...,{Algorithm.RSASHA256},{Algorithm.RSASHA256},"{(Algorithm.RSASHA256, 1280), (Algorithm.RSASH...",1,3
...,...,...,...,...,...,...,...,...,...
1493,akdn.,akdn.,(25057 8 2 47e24ce61e8604c9a8fc169d5e073b280f2...,(256 3 8 AwEAAd/UjKedeGUguqbs9rvSJJksSZ1H WYV9...,{Algorithm.RSASHA256},{Algorithm.RSASHA256},"{(Algorithm.RSASHA256, 1024), (Algorithm.RSASH...",1,2
1494,xn--3bst00m.,xn--3bst00m.,(11479 8 2 7881a662f41e0d4b2133e66229672677147...,(256 3 8 AwEAAb8C1mRkjTEojs7aHs5i4kw5GA0f vtSU...,{Algorithm.RSASHA256},{Algorithm.RSASHA256},"{(Algorithm.RSASHA256, 1024), (Algorithm.RSASH...",1,2
1495,tw.,tw.,(40792 8 2 a05db4b0deb971031361bb621e8bb1b8d73...,(256 3 8 AwEAAcw3wLGi203A+Wb0POo36BMFtf8v PeyN...,{Algorithm.RSASHA256},{Algorithm.RSASHA256},"{(Algorithm.RSASHA256, 1024), (Algorithm.RSASH...",1,2
1496,xn--ses554g.,xn--ses554g.,(57266 8 2 a3c057a22744eb0ff0518d51e55b1271dab...,(257 3 8 AwEAAch4dO5zg0S2UnTLkn2ugjEK/oQ2 HaaC...,{Algorithm.RSASHA256},{Algorithm.RSASHA256},"{(Algorithm.RSASHA256, 1024), (Algorithm.RSASH...",1,2


## Determine Statistics of Interest
1. How many zones out of those we measured are signed.
2. How many (%) are signed with one cipher.
3. How many (% and an absolute number) are signed with more than one cipher
(e.g., maybe KSK is signed with one cipher but the zonefile is signed by
multiple ZSK - so then use the number of multiple ZSK - we want to show
here that there are zones that have multiple ciphers. In this stats you can
consider RSA 1024 and RSA 2048 as different ciphers for the purpose of this
question).
4. What is the maximum number of ciphers that you saw being used?


### 1. Signed Zones

In [5]:
df_signed_tlds = df_tlds[df_tlds['dnskey'].map(any)]
num_tlds_all = df_tlds['domain'].nunique()
num_tlds_signed = df_signed_tlds['domain'].nunique()
share_tlds_signed = num_tlds_signed/num_tlds_all


df_signed_top1m = df_top1m[df_top1m['dnskey'].map(any)]
num_top1m_all = df_top1m['domain'].nunique()
num_top1m_signed = df_signed_top1m['domain'].nunique()
share_top1m_signed = num_top1m_signed/num_top1m_all

print(f"Number of TLDs: {num_tlds_all}")
print(f"Number of signed TLDs: {num_tlds_signed} ({share_tlds_signed:.4%})")
print()
print(f"Number of Top1M: {num_top1m_all}")
print(f"Number of signed Top1M: {num_top1m_signed} ({share_top1m_signed:.4%})")


Number of TLDs: 1498
Number of signed TLDs: 1372 (91.5888%)

Number of Top1M: 967374
Number of signed Top1M: 43181 (4.4637%)


### 2. Signed with exactly one Cipher

In [6]:
df_singlekey_kl_tlds = df_tlds[df_tlds['num_dnskey_algos_kl'] == 1]
num_tlds_singlekey_kl = df_singlekey_kl_tlds['domain'].nunique()
share_tlds_singlekey_kl = num_tlds_singlekey_kl/num_tlds_signed

df_singlekey_tlds = df_tlds[df_tlds['num_dnskey_algos'] == 1]
num_tlds_singlekey = df_singlekey_tlds['domain'].nunique()
share_tlds_singlekey = num_tlds_singlekey/num_tlds_signed

print(f"Number of TLDs with exactly one cipher (considering different lengths): {num_tlds_singlekey_kl} ({share_tlds_singlekey_kl:.4%} of signed)")
print(f"Number of TLDs with exactly one cipher (not considering lengths): {num_tlds_singlekey} ({share_tlds_singlekey:.4%} of signed)")

print()

df_singlekey_kl_top1m = df_top1m[df_top1m['num_dnskey_algos_kl'] == 1]
num_top1m_singlekey_kl = df_singlekey_kl_top1m['domain'].nunique()
share_top1m_singlekey_kl = num_top1m_singlekey_kl/num_top1m_signed

df_singlekey_top1m = df_top1m[df_top1m['num_dnskey_algos'] == 1]
num_top1m_singlekey = df_singlekey_top1m['domain'].nunique()
share_top1m_singlekey = num_top1m_singlekey/num_top1m_signed

print(f"Number of Top1M domains with exactly one cipher (considering different lengths): {num_top1m_singlekey_kl} ({share_top1m_singlekey_kl:.4%} of signed)")
print(f"Number of Top1M domains with exactly one cipher (not considering lengths): {num_top1m_singlekey} ({share_top1m_singlekey:.4%} of signed)")


Number of TLDs with exactly one cipher (considering different lengths): 185 (13.4840% of signed)
Number of TLDs with exactly one cipher (not considering lengths): 1368 (99.7085% of signed)

Number of Top1M domains with exactly one cipher (considering different lengths): 21122 (48.9150% of signed)
Number of Top1M domains with exactly one cipher (not considering lengths): 42967 (99.5044% of signed)


### 3. Signed with more than one Cipher

In [7]:
df_multikey_kl_tlds = df_tlds[df_tlds['num_dnskey_algos_kl'] > 1]
num_tlds_multikey_kl = df_multikey_kl_tlds['domain'].nunique()
share_tlds_multikey_kl = num_tlds_multikey_kl/num_tlds_signed

df_multikey_tlds = df_tlds[df_tlds['num_dnskey_algos'] > 1]
num_tlds_multikey = df_multikey_tlds['domain'].nunique()
share_tlds_multikey = num_tlds_multikey/num_tlds_signed

print(f"Number of TLDs with more than one cipher (considering different lengths): {num_tlds_multikey_kl} ({share_tlds_multikey_kl:.4%} of signed)")
print(f"Number of TLDs with more than one cipher (not considering lengths): {num_tlds_multikey} ({share_tlds_multikey:.4%} of signed)")

print()

df_multikey_kl_top1m = df_top1m[df_top1m['num_dnskey_algos_kl'] > 1]
num_top1m_multikey_kl = df_multikey_kl_top1m['domain'].nunique()
share_top1m_multikey_kl = num_top1m_multikey_kl/num_top1m_signed

df_multikey_top1m = df_top1m[df_top1m['num_dnskey_algos'] > 1]
num_top1m_multikey = df_multikey_top1m['domain'].nunique()
share_top1m_multikey = num_top1m_multikey/num_top1m_signed

print(f"Number of Top1M domains with more than one cipher (considering different lengths): {num_top1m_multikey_kl} ({share_top1m_multikey_kl:.4%} of signed)")
print(f"Number of Top1M domains with more than one cipher (not considering lengths): {num_top1m_multikey} ({share_top1m_multikey:.4%} of signed)")


Number of TLDs with more than one cipher (considering different lengths): 1187 (86.5160% of signed)
Number of TLDs with more than one cipher (not considering lengths): 4 (0.2915% of signed)

Number of Top1M domains with more than one cipher (considering different lengths): 22059 (51.0850% of signed)
Number of Top1M domains with more than one cipher (not considering lengths): 214 (0.4956% of signed)


### 4. Maximum Number of Ciphers used

In [8]:
max_num_ciphers_kl_tlds = df_tlds['num_dnskey_algos_kl'].max()
max_num_ciphers_tlds = df_tlds['num_dnskey_algos'].max()

print(f"Maximum number of ciphers in TLDs (considering different lengths): {max_num_ciphers_kl_tlds}")
print(f"Maximum number of ciphers in TLDs (not considering different lengths): {max_num_ciphers_tlds}")

print()
max_num_ciphers_kl_top1m = df_top1m['num_dnskey_algos_kl'].max()
max_num_ciphers_top1m = df_top1m['num_dnskey_algos'].max()

print(f"Maximum number of ciphers in Top1M domains (considering different lengths): {max_num_ciphers_kl_top1m}")
print(f"Maximum number of ciphers in Top1M domains (not considering different lengths): {max_num_ciphers_top1m}")


Maximum number of ciphers in TLDs (considering different lengths): 4
Maximum number of ciphers in TLDs (not considering different lengths): 2

Maximum number of ciphers in Top1M domains (considering different lengths): 7
Maximum number of ciphers in Top1M domains (not considering different lengths): 4


## Plot Statistics
- frequencies of algorithms in CDF
- distinct algorithm counts CDF

### Algorithm Frequencies
1. determine base set of algorithms
2. find numbers of domains per algorithm **skewed for multi-algorithm domain??**

In [17]:
df_tlds
all_algos_tlds = set()
all_algos_top1m = set()
df_tlds.apply(lambda row: all_algos_tlds.update(row['dnskey_algos']), axis=1)
df_tlds.apply(lambda row: all_algos_top1m.update(row['dnskey_algos']), axis=1)


all_algos_tlds
all_algos_top1m
all_algos = all_algos_tlds | all_algos_top1m
all_algos

{<Algorithm.RSASHA1: 5>,
 <Algorithm.RSASHA1NSEC3SHA1: 7>,
 <Algorithm.RSASHA256: 8>,
 <Algorithm.RSASHA512: 10>,
 <Algorithm.ECDSAP256SHA256: 13>}

### Number of Distinct Algorithms per Domain
as found in DNSKEY records

In [10]:
NUM_SPECIFIER = 'num_dnskey_algos_kl'
# NUM_SPECIFIER = 'num_dnskey_algos'
pass
# TODO implement me