# ipv6scan

In a dataset of IPv6 scan results we collected, we noticed that many network devices share identical lower 64 bits in their IPv6 addresses. Considering the most widely used IPv6 assignment mechanisms, such address collisions should be significantly less frequent than what we have observed. We want to find out the reason behind this.

## Dataset collection

The IPv6 address space is astronomically large: it holds 2^128 (about 3.4 × 10^38) addresses, so scanning all of them at 100 million probes per second would take on the order of 10^23 years. Scanning the entire IPv4 address space, by contrast, takes about five minutes with tools such as [ZMap](https://zmap.io/).
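
The scale can be checked with a few lines of arithmetic. The sketch below only assumes the scan rate used above (100 million probes per second) and ignores everything else about real scanning, such as bandwidth and packet loss.

```python
# Back-of-the-envelope estimate; the scan rate is the figure assumed in the text.
IPV6_ADDRESSES = 2 ** 128
IPV4_ADDRESSES = 2 ** 32
SCANS_PER_SECOND = 100_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_to_scan(address_count, rate=SCANS_PER_SECOND):
    """Time in seconds to probe every address once at a constant rate."""
    return address_count / rate


def years_to_scan(address_count, rate=SCANS_PER_SECOND):
    """Same estimate, expressed in years."""
    return seconds_to_scan(address_count, rate) / SECONDS_PER_YEAR


print(f"IPv6: {years_to_scan(IPV6_ADDRESSES):.2e} years")
print(f"IPv4: {seconds_to_scan(IPV4_ADDRESSES):.1f} seconds")
```

At this idealized rate the IPv4 space is covered in well under a minute, while the IPv6 figure is on the order of 10^23 years.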

Another important characteristic of IPv6 address space is that it's extremely sparsely occupied -- only a tiny fraction of addresses have active devices behind them and the vast majority remain unused. Notably, the addresses are assigned following certain patterns. Therefore, while brute-forcing through the entire IPv6 address space is impractical, it is fortunately unnecessary. Instead, we can focus on narrowing down the search space to target those likely occupied regions.

## MAC address assignment and semantics

MAC (Media Access Control) addresses are link-layer addresses that identify network devices (more specifically, the NIC on a device), and each one is intended to be globally unique. They are 48 bits long, typically written as 12 hexadecimal digits (e.g., **02:04:7A**:*BB:28:FC*). The first six hexadecimal digits identify the manufacturer of the NIC; the last six are a unique number assigned by that manufacturer.
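
As a small illustration of this layout, the sketch below splits a MAC address string into its manufacturer part and its device-specific part. The helper name `split_mac` is made up for this example, and the input is the example address from the text.

```python
def split_mac(mac: str):
    """Split a MAC address into (manufacturer part, device part) as hex strings."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(octet) != 2 for octet in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    if any(ch not in "0123456789abcdef" for octet in octets for ch in octet):
        raise ValueError(f"non-hexadecimal digit in MAC address: {mac!r}")
    # First three octets: manufacturer; last three: assigned by the manufacturer.
    return ":".join(octets[:3]), ":".join(octets[3:])


manufacturer, device = split_mac("02:04:7A:BB:28:FC")
print(manufacturer, device)
```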

## IPv6 address semantics and assignment

To see why this matters, it helps to step back and look at how IPv6 addresses are formed. IPv6 addresses are 128-bit network-layer addresses, typically written as eight groups of four hexadecimal digits (two bytes per group), e.g., 2001:0DB8:AC10:FE01:0000:0000:10B2:0301. They are commonly divided as follows:
- First 64 bits: Network and subnet identification
  - First 48 bits: Network prefix (identifies the overall network)
  - Next 16 bits: Subnet ID (identifies a specific subnet within that network)
- Last 64 bits: Host identification (identifies a specific device on that network)
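
The breakdown above can be sketched with Python's standard `ipaddress` module. The 48/16/64 split simply follows the layout listed here; real allocations may draw the boundary between network prefix and subnet ID elsewhere, and the helper name `split_ipv6` is only illustrative.

```python
import ipaddress


def split_ipv6(address: str):
    """Return (network prefix, subnet ID, interface ID) as integers."""
    value = int(ipaddress.IPv6Address(address))
    network_prefix = value >> 80          # first 48 bits
    subnet_id = (value >> 64) & 0xFFFF    # next 16 bits
    interface_id = value & (2**64 - 1)    # last 64 bits
    return network_prefix, subnet_id, interface_id


prefix, subnet, iid = split_ipv6("2001:0DB8:AC10:FE01:0000:0000:10B2:0301")
print(f"{prefix:012x} {subnet:04x} {iid:016x}")
```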

There are several ways to assign the last 64 bits:
- Static
- SLAAC
- DHCPv6

## DHCPv6

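DHCPv6 (Dynamic Host Configuration Protocol for IPv6) is the stateful option: a server on the network hands out addresses to hosts and keeps track of what it has assigned.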

## SLAAC

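SLAAC (Stateless Address Autoconfiguration) lets a host configure its own addresses without a server. The host first forms a link-local address (the prefix fe80::/64 plus a 64-bit interface identifier) and runs duplicate address detection to make sure that address is unique on its local segment; it then combines an interface identifier with the prefix advertised by its router to form a global address. The classic way to build the interface identifier is to derive it from the NIC's MAC address (the modified EUI-64 format), which is a good place to start since MAC addresses are already meant to be globally unique.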
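
A minimal sketch of the MAC-based derivation, assuming the modified EUI-64 scheme: flip the universal/local bit of the first octet and insert `ff:fe` between the two halves of the MAC address. The function names are made up for this example, and hosts that use randomized or otherwise opaque interface identifiers will not follow this pattern.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Derive a 64-bit interface identifier from a 48-bit MAC (modified EUI-64)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return int.from_bytes(eui64, "big")


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the EUI-64 interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch expects a /64 prefix")
    return ipaddress.IPv6Address(int(network.network_address) | eui64_interface_id(mac))


print(slaac_address("fe80::/64", "02:04:7A:BB:28:FC"))
print(slaac_address("2001:db8:ac10:fe01::/64", "02:04:7A:BB:28:FC"))
```

Because the interface identifier depends only on the MAC address, the same device keeps the same lower 64 bits under every prefix it joins.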

## Routers and subnets?

router sitting in front of a subnet. contrast its role in IPv4 vs. IPv6.

IPv6 subnet address assignment: the upper 64 bits are assigned to the network, and the lower 64 bits are given to the devices in the network. i think there are three options (manual assignment, DHCPv6, SLAAC). todo: find out why erik only mentioned SLAAC and whether that's the main/only thing people use nowadays.

IoT device address configuration: [SLAAC](https://www.networkacademy.io/ccna/ipv6/stateless-address-autoconfiguration-slaac) pads the MAC address (which is 48 bits and conveniently already globally unique) out to 64 bits. for this reason the lower 64 bits should be unique and never seen twice in the network. however, many identical lower 64 bits are seen repeatedly in the network.
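
one way to surface these repeats in a list of scanned addresses is to group them by their lower 64 bits and keep the groups that span more than one upper-64-bit prefix. this is only a sketch: the function names are made up, and the sample addresses are placeholders from the documentation prefix, not real scan results.

```python
import ipaddress
from collections import defaultdict

LOWER_64_MASK = 2**64 - 1


def group_by_interface_id(addresses):
    """Map each lower-64-bit value to the set of upper-64-bit prefixes it appears under."""
    groups = defaultdict(set)
    for text in addresses:
        value = int(ipaddress.IPv6Address(text))
        groups[value & LOWER_64_MASK].add(value >> 64)
    return groups


def shared_interface_ids(addresses):
    """Keep only the lower-64-bit values seen under more than one prefix."""
    groups = group_by_interface_id(addresses)
    return {iid: prefixes for iid, prefixes in groups.items() if len(prefixes) > 1}


# Placeholder addresses for illustration only.
sample = [
    "2001:db8:1:1::1234",
    "2001:db8:2:2::1234",
    "2001:db8:3:3::abcd",
]
for iid, prefixes in shared_interface_ids(sample).items():
    print(f"{iid:016x} appears under {len(prefixes)} prefixes")
```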

## NTPs

network time servers. when a router is first configured, it asks a network time server for the current time.

todo: does the router ship with a preconfigured list of time server addresses? it seems unlikely that the router updates this list (if it exists!) on its own. where else could the router obtain a list of time servers? is it allowed to be online if its time is really off? (also want to check how the DNS process works again.)

the hypothesis is that none of the time servers on the router's list are active anymore, so the router might be defaulting to the start of the Unix epoch (midnight UTC on January 1, 1970), which it in turn uses as a seed to configure the lower 64 bits of its address. (todo: can't remember why we need to seed anything exactly. figure that out.) but again, how are these routers even allowed online if their time is so far off? maybe that's only a hard requirement for end hosts? find out about this.
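
to make the hypothesis concrete, here is a toy sketch of what "seeding with the clock" would imply. everything in it is an assumption: the function `interface_id_from_clock` is hypothetical, and Python's `random` module merely stands in for whatever generator a real device might use. it is not the algorithm of any actual router.

```python
import random


def interface_id_from_clock(clock_seconds: int) -> int:
    """Hypothetical: seed a PRNG with the device clock and draw 64 bits."""
    return random.Random(clock_seconds).getrandbits(64)


# Two devices that both fell back to the Unix epoch (clock == 0) ...
device_a = interface_id_from_clock(0)
device_b = interface_id_from_clock(0)
# ... versus a device whose clock reads something else.
device_c = interface_id_from_clock(1)

print(device_a == device_b)  # identical seeds give identical lower 64 bits
print(device_a == device_c)
```

if this is roughly what happens, every device that boots with the same default clock would end up with the same lower 64 bits, regardless of which network it sits in.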

## Questions

we can check for IPv6 duplicates within a certain range, right? beyond that range we just rely on the address hierarchy, since nobody is going to keep a list at a global level.