# Check if CDN setting hijacks for Subprefix are actually Hidden Hijacks

The data was collected with the following settings:

```bash
export PYTHONHASHSEED=0
python __main__.py --percentages 0.01 0.05 0.1 0.2 \
                   --relay_asns cloudflare \
                   --num_trials 5 \
                   --policy v4k5 \
                   --cpus 1 \
                   --python_hash_seed $PYTHONHASHSEED \
                   --rov_adoption none \
                   --num_attackers 5 \
                   --scenario V4SubprefixHijackScenario

```

Graphs and a complete JSON of the settings can be found in 
`V4SubprefixHijackScenario_scenario_none_type_none_rov_0_hash_cloudflare_relay_False_attackRelay_5_attacker_5_trials_[0.01,0.05,0.1,0.2]_percentages.zip`

## Imports and Load Trial Metadata

In [2]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Jun 19 17:45:00 2023

@author: uconn
"""


################################
# Imports
################################

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


################################
# Load Data
################################
#%%
data_file_path = '../../data/as_metadata/subpefix_metadata.csv'
data = pd.read_csv(data_file_path, delimiter='\t')


Show a sample of the data. Each row is one LocalRIB entry of an AS. The outcome is determined by `prefix_for_outcome`, since that is the prefix that resulted in the minimum number of successful connections.
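Since each row is a single LocalRIB entry, the LocalRIB of one AS is spread over several rows. A minimal sketch of how the rows could be regrouped per AS is shown here. It uses a toy frame built from the values in the sample output rather than the real CSV, and the names `toy` and `local_ribs` are illustrative only.

```python
import pandas as pd

# Toy rows mirroring the sample output (only the columns needed here).
toy = pd.DataFrame(
    {
        "trial": [0, 0, 0, 0, 0],
        "percentage": [0.01] * 5,
        "asn": [206536, 206536, 206536, 25408, 25408],
        "local_rib_prefix": [
            "1.2.3.0/24", "7.7.7.0/24", "1.2.0.0/16",
            "1.2.3.0/24", "7.7.7.0/24",
        ],
    }
)

# One LocalRIB per (trial, percentage, asn): collect the prefixes it holds.
local_ribs = (
    toy.groupby(["trial", "percentage", "asn"])["local_rib_prefix"]
    .apply(sorted)
)
print(local_ribs)
```

Grouping on `trial` and `percentage` as well as `asn` matters on the full data set, because the same ASN appears once per trial and adoption percentage.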

In [3]:
data.head()

Unnamed: 0,trial,percentage,propagation_round,asn,adoption_setting,outcome,prefix_for_outcome,local_rib_prefix,as_path,relationship,blackhole,avoid_list
0,0,0.01,0,206536,BGP Simple,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,1.2.3.0/24,[206536],Relationships.ORIGIN,False,"[20485, 32787, 7195, 8220, 4637, 7713, 29226, ..."
1,0,0.01,0,206536,BGP Simple,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,7.7.7.0/24,"[206536, 8359, 18403, 13335]",Relationships.PROVIDERS,False,"[20485, 32787, 7195, 8220, 4637, 7713, 29226, ..."
2,0,0.01,0,206536,BGP Simple,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,1.2.0.0/16,"[206536, 8359, 9498, 133275, 133718]",Relationships.PROVIDERS,False,"[20485, 32787, 7195, 8220, 4637, 7713, 29226, ..."
3,0,0.01,0,25408,BGP Simple,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,1.2.3.0/24,"[25408, 206536]",Relationships.CUSTOMERS,False,"[20485, 32787, 7195, 8220, 4637, 7713, 29226, ..."
4,0,0.01,0,25408,BGP Simple,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,7.7.7.0/24,"[25408, 1299, 13335]",Relationships.PROVIDERS,False,"[20485, 32787, 7195, 8220, 4637, 7713, 29226, ..."


In [4]:
len(data)

4434316

In [5]:
################################
# Analyze
################################
#%%

trial = 0
percentage = 0.1

# Select all adopting ASes that are hijacked
adopting_ases_trials = data.loc[data['adoption_setting'] == 'ROV V4 Lite K5', :]
hijacked_adopting_ases_trials = adopting_ases_trials.loc[adopting_ases_trials['outcome'] == 'Outcomes.ATTACKER_SUCCESS', :]
sinlge_trial_data = hijacked_adopting_ases_trials.loc[(hijacked_adopting_ases_trials['trial'] == trial) & (hijacked_adopting_ases_trials['percentage'] == percentage), :]

sinlge_trial_data.head()

Unnamed: 0,trial,percentage,propagation_round,asn,adoption_setting,outcome,prefix_for_outcome,local_rib_prefix,as_path,relationship,blackhole,avoid_list
2237648,0,0.1,0,32934,ROV V4 Lite K5,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,7.7.7.0/24,"[32934, 1221, 13335]",Relationships.PEERS,False,"[57344, 20485, 36873, 16397, 12301, 32787, 123..."
2237649,0,0.1,0,32934,ROV V4 Lite K5,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,1.2.0.0/16,"[32934, 4657, 4761, 23954]",Relationships.PEERS,False,"[57344, 20485, 36873, 16397, 12301, 32787, 123..."
2237650,0,0.1,0,32934,ROV V4 Lite K5,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,1.2.3.0/24,"[32934, 4657, 45474, 7195, 13786, 264517, 2635...",Relationships.PEERS,True,"[57344, 20485, 36873, 16397, 12301, 32787, 123..."
2237689,0,0.1,0,58879,ROV V4 Lite K5,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,7.7.7.0/24,"[58879, 3257, 13335]",Relationships.PEERS,False,"[57344, 20485, 36873, 16397, 12301, 32787, 123..."
2237690,0,0.1,0,58879,ROV V4 Lite K5,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,1.2.0.0/16,"[58879, 4826, 17451, 23954]",Relationships.PEERS,False,"[57344, 20485, 36873, 16397, 12301, 32787, 123..."


In [7]:
len(sinlge_trial_data)

22289

Now let's check how many of these adopting ASes are hidden hijacks.
An AS is classified as a hidden hijack if:

* The attacker announcement is not in its LocalRIB
* It has a path to the origin
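The two conditions above can be expressed as set operations over ASNs. The following is a minimal sketch on hypothetical toy rows, not the notebook's data. The helper name `hidden_hijack_asns` and the toy ASNs 1 to 3 are made up for illustration, and `1.2.0.0/16` is taken to be the origin prefix as in the cells below.

```python
import pandas as pd

ORIGIN_PREFIX = "1.2.0.0/16"  # origin prefix used in this scenario

# Hypothetical LocalRIB rows for three hijacked adopting ASes.
rows = pd.DataFrame(
    {
        "asn": [1, 1, 1, 2, 2, 3],
        "prefix_for_outcome": ["1.2.3.0/24"] * 6,
        "local_rib_prefix": [
            "1.2.3.0/24", "1.2.0.0/16", "7.7.7.0/24",  # AS 1: holds the attacker prefix
            "1.2.0.0/16", "7.7.7.0/24",                # AS 2: no attacker prefix, has origin path
            "7.7.7.0/24",                              # AS 3: neither
        ],
    }
)


def hidden_hijack_asns(df, origin_prefix=ORIGIN_PREFIX):
    """Return ASNs without the attacker prefix in LocalRIB but with a route to the origin."""
    has_attacker = set(df.loc[df["local_rib_prefix"] == df["prefix_for_outcome"], "asn"])
    has_origin = set(df.loc[df["local_rib_prefix"] == origin_prefix, "asn"])
    return (set(df["asn"]) - has_attacker) & has_origin


print(hidden_hijack_asns(rows))
```

Only AS 2 satisfies both conditions in the toy data: AS 1 holds the attacker prefix, and AS 3 has no route for the origin prefix.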

In [23]:
# Quick check: does any adopter have the attacker ann in its LocalRIB without a blackhole ann?
has_attacker_ann_in_local_rib = sinlge_trial_data.loc[(sinlge_trial_data['prefix_for_outcome'] == sinlge_trial_data['local_rib_prefix']) , :]
and_does_not_have_blackhole = has_attacker_ann_in_local_rib.loc[has_attacker_ann_in_local_rib['blackhole'] == False, :]
len(and_does_not_have_blackhole)

0

That's reassuring! It should be 0. This means that every hijacked adopter holding the attacker prefix in its LocalRIB has it installed as a blackhole, which indicates that these adopters were DISCONNECTED but then chose to connect via the relay.

Now let's count the adopters that have the attacker announcement in their LocalRIB.

In [8]:
# Check if attacking announcement in RIB
has_attacker_ann_in_local_rib = sinlge_trial_data.loc[sinlge_trial_data['prefix_for_outcome'] == sinlge_trial_data['local_rib_prefix'], 'asn'].unique()
len(has_attacker_ann_in_local_rib)

7257

In [9]:
# Check has path to origin
has_path_to_origin = sinlge_trial_data.loc[sinlge_trial_data['local_rib_prefix'] == '1.2.0.0/16', 'asn'].unique()
len(has_path_to_origin)

7516

In [10]:
# Get all ASNs
all_asns = sinlge_trial_data['asn'].unique()
print(len(all_asns))

7516


In [11]:
# ASes that don't have attacker in LocalRIB
ases_that_do_not_have_attacker_ann = set(all_asns) - set(has_attacker_ann_in_local_rib)
len(ases_that_do_not_have_attacker_ann)

259

In [12]:
# and have a path to the origin
and_have_path_to_origin = ases_that_do_not_have_attacker_ann.intersection(set(has_path_to_origin))
len(and_have_path_to_origin)

259

In [13]:
len(and_have_path_to_origin)/len(all_asns)
# sinlge_trial_data.query('asn in @and_have_path_to_origin').head()

0.0344598190526876

## Why is there such a huge variance in results?

In [6]:
# What fraction of the adopting ASes' LocalRIB rows belong to hijacked ASes?
all_sinlge_trial_data = adopting_ases_trials.loc[(adopting_ases_trials['trial'] == trial) & (adopting_ases_trials['percentage'] == percentage), :]
len(sinlge_trial_data)/len(all_sinlge_trial_data)

0.9999102776905477

The observation above is actually the most insightful. Looking at different trials, most results are either almost 0 or almost 100 percent, with essentially nothing in between. The goal now is to understand why this is the case.
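To compare trials side by side, the hijacked fraction can be computed per trial and adoption percentage. The sketch below uses hypothetical toy data in place of the real CSV. It counts ASes rather than LocalRIB rows, and the `Outcomes.DISCONNECTED` string is an assumed spelling of the non-hijacked outcome.

```python
import pandas as pd

# Hypothetical toy data: one row per LocalRIB entry, two ASes, two trials.
toy = pd.DataFrame(
    {
        "trial": [0, 0, 0, 0, 1, 1, 1, 1],
        "percentage": [0.1] * 8,
        "asn": [1, 1, 2, 2, 1, 1, 2, 2],
        "outcome": ["Outcomes.ATTACKER_SUCCESS"] * 4
        + ["Outcomes.DISCONNECTED"] * 4,  # assumed label for non-hijacked ASes
    }
)

# Collapse to one row per AS so that ASes with larger LocalRIBs do not weigh more.
per_as = toy.drop_duplicates(["trial", "percentage", "asn"])

# Fraction of ASes with ATTACKER_SUCCESS in each (trial, percentage) group.
fraction = (
    per_as.assign(hijacked=per_as["outcome"].eq("Outcomes.ATTACKER_SUCCESS"))
    .groupby(["trial", "percentage"])["hijacked"]
    .mean()
)
print(fraction)
```

On the real data the same grouping would be applied after filtering to the adopting ASes, which makes the all-or-nothing pattern across trials visible in a single table.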

Let's see how many hijacks are caused by the relay.

In [17]:
hijacked_via_relay = sinlge_trial_data.loc[(sinlge_trial_data['local_rib_prefix'] != sinlge_trial_data['prefix_for_outcome']) & (sinlge_trial_data['local_rib_prefix'] != '1.2.0.0/16'), :]
hijacked_via_relay.head()

Unnamed: 0,trial,percentage,propagation_round,asn,adoption_setting,outcome,prefix_for_outcome,local_rib_prefix,as_path,relationship,blackhole,avoid_list
2237648,0,0.1,0,32934,ROV V4 Lite K5,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,7.7.7.0/24,"[32934, 1221, 13335]",Relationships.PEERS,False,"[57344, 20485, 36873, 16397, 12301, 32787, 123..."
2237689,0,0.1,0,58879,ROV V4 Lite K5,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,7.7.7.0/24,"[58879, 3257, 13335]",Relationships.PEERS,False,"[57344, 20485, 36873, 16397, 12301, 32787, 123..."
2237704,0,0.1,0,13335,ROV V4 Lite K5,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,7.7.7.0/24,[13335],Relationships.ORIGIN,False,"[57344, 20485, 36873, 16397, 12301, 32787, 123..."
2237706,0,0.1,0,28186,ROV V4 Lite K5,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,7.7.7.0/24,"[28186, 13335]",Relationships.PEERS,False,"[57344, 20485, 36873, 16397, 12301, 32787, 123..."
2237727,0,0.1,0,39764,ROV V4 Lite K5,Outcomes.ATTACKER_SUCCESS,1.2.3.0/24,7.7.7.0/24,"[39764, 6939, 557, 13335]",Relationships.PEERS,False,"[57344, 20485, 36873, 16397, 12301, 32787, 123..."


In [24]:
len(hijacked_via_relay)

7516

The row count (7516) matches the number of hijacked adopting ASes, which suggests that all the disconnections were converted to ATTACKER_SUCCESS via the relay prefix. Let's check whether the relay itself is an AS with a hidden hijack. The data being analyzed used `cloudflare` as the CDN, which has a single ASN, `13335`.

In [25]:
13335 in and_have_path_to_origin

True

That's it: the disconnections are being converted to hijacks via the CDN's hidden hijack.
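One further sanity check that could support this conclusion is to confirm that every route for the relay prefix actually terminates at the relay ASN. The sketch below runs on toy rows copied from the output above. It assumes that `as_path` is stored in the CSV as the string form of a Python list and that `7.7.7.0/24` is the relay prefix; the variable names are illustrative.

```python
import ast

import pandas as pd

RELAY_PREFIX = "7.7.7.0/24"  # assumed relay prefix, as seen in the rows above
RELAY_ASN = 13335            # cloudflare

# Toy rows copied from the hijacked_via_relay sample.
toy = pd.DataFrame(
    {
        "asn": [32934, 58879, 13335],
        "local_rib_prefix": [RELAY_PREFIX] * 3,
        "as_path": ["[32934, 1221, 13335]", "[58879, 3257, 13335]", "[13335]"],
    }
)

relay_rows = toy.loc[toy["local_rib_prefix"] == RELAY_PREFIX]

# Parse each path string and take its last hop, i.e. the AS that originated the route.
path_origins = relay_rows["as_path"].map(lambda s: ast.literal_eval(s)[-1])

all_end_at_relay = bool((path_origins == RELAY_ASN).all())
print(all_end_at_relay)
```

If this holds on the full data set, every adopter that falls back to the relay prefix is sending its traffic to AS 13335, which is itself in `and_have_path_to_origin`.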