# <mark>Internet</mark>

## What is the internet?

- The internet is a network of networks or a vast number of networks connected together.
- In between all of the sub-networks are systems of routers that direct network traffic.
- The internet can be thought of as the infrastructure that enables inter-network communication, both in terms of the physical network and the lower-level protocols that control its use.<br><br>
- The World Wide Web is a **service** that can be accessed via the internet. 
- It is a vast information system of resources which are navigable by means of a URL (Uniform Resource Locator).
- HTTP is the primary means by which applications interact with the resources that make up the web.<br><br>
- **HTML** is the means by which the resources on the Web should be uniformly structured.
- **URI** is part of a system by which resources should be uniformly addressed on the Web.
- **HTTP** is the set of rules which provide uniformity to the way resources on the web are transferred between applications.<br><br>
**The web is comprised of the resources that are being transported. The internet is the infrastructure that enables the transferring.**
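
To make the idea of a uniformly addressed resource concrete, here is a minimal sketch that splits a URL into its components with Python's standard `urllib.parse` module. The URL itself is a made-up example, not one taken from these notes.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used purely for illustration.
url = "http://www.example.com:80/docs/index.html?lang=en"

parts = urlsplit(url)

print(parts.scheme)    # the protocol used to transfer the resource
print(parts.hostname)  # the host that serves the resource
print(parts.port)      # the port on that host
print(parts.path)      # which resource on the host is wanted
print(parts.query)     # extra parameters sent along with the request
```

The scheme names the protocol (HTTP here) used to transfer the resource, while the host and path together identify which resource is meant.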

## What are protocols, and why are they necessary?

- Protocols are a system or set of rules that govern the exchange or transmission of data.
    - **Application**: HTTP
    - **Session**: TLS or DTLS
    - **Transport**: TCP or UDP
    - **Network**: IP
    - **Link / Data Link**: Ethernet
- Main reasons why there are so many different protocols for network communication:
    - Different protocols were developed to address different aspects of network communication.
        - TCP and HTTP are examples of two protocols that address different aspects of communication; TCP provides for the transfer of messages between applications, while HTTP defines the structure of those messages.
    - Different protocols were developed to address the same aspect of network communication but differently for a specific use case.
        - TCP and UDP would be examples of two protocols that address the same fundamental aspect of communication, the transfer of messages between applications, but do so differently.
- We need them because otherwise the many and varied devices on the network would not have a cohesive and uniform communication method.
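
As a rough illustration of "HTTP defines the structure of those messages", the sketch below builds the text of a simple HTTP/1.1 request and reads its first line back. The helper names `build_http_request` and `parse_request_line` are hypothetical, and nothing is actually sent over a network; handing these bytes to a transport protocol such as TCP would be a separate step.

```python
def build_http_request(method: str, path: str, host: str) -> bytes:
    """Lay out a request the way HTTP/1.1 expects: request line, headers, blank line."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_request_line(raw: bytes) -> tuple:
    """Because the structure is agreed on, any receiver can find the same fields."""
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, path, version = request_line.split(" ")
    return method, path, version


request = build_http_request("GET", "/", "example.com")
print(parse_request_line(request))
```

Both sides can only interpret the message because they follow the same rules about where each piece of information goes.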

## Explain how data encapsulation works in the context of a network model

- **Data Encapsulation** in the context of network communication models means that we are essentially hiding data from one layer by encapsulating it within a data unit of the layer below.
    - It is the process of packaging data of a PDU at a higher layer with metadata of a protocol at the current layer, forming a new PDU. In other words, a PDU of a protocol at a higher layer is encapsulated in a PDU of a protocol at the current layer.
- A **Protocol Data Unit (PDU)** is an amount or block of data transferred over a network. 
- Different protocols or protocol layers refer to PDUs by different names:
    - <mark>**Application**: request or response</mark> (Do PDUs exist at the Application layer?)
    - **Transport**: segment (TCP) or datagram (UDP)
    - **Network**: packet
    - **Link / Data Link**: frame
- In all cases, the basic concept is effectively the same; the PDU consists of a header, a data payload, and in some cases a trailer or footer.
- The header and trailer provide protocol-specific metadata about the PDU. This metadata, attached to the data payload, tells the protocol what to do with the PDU.
    - For example, an Internet Protocol (IP) packet header would include fields for the Source IP Address and the Destination IP Address, which would be used to correctly route the packet.
- The data payload portion of a PDU is simply the data that we want to transport over the network using a specific protocol at a particular network layer.<br><br>
- The data payload is the key to the way encapsulation is implemented.  **The entire PDU from a protocol at one layer is set as the data payload for a protocol at the layer below**.  
    - For example, an HTTP Request at the Application layer could be set as the payload for a TCP segment at the transport layer.
- The major benefit of this approach is the separation (abstraction) it creates between the protocols at different layers.
- This means that a protocol at one layer doesn't need to know anything about how a protocol at another layer is implemented in order for those protocols to interact.
- It can independently complete its specific communication task without information from other layers.
- It doesn't really matter what the data payload is as long as the header information is complete and the layer can perform its intended function.
- It creates a system whereby **a lower layer effectively provides a 'service' to the layer above it**.
- This is particularly pertinent when there are many different protocols used at one network layer:
    - For example, a TCP segment isn't really concerned whether its data payload is an HTTP request, an SMTP command, FTP or some other sort of Application layer data. **It just knows it needs to encapsulate some data from the layer above and provide the result of this encapsulation to the layer below**.
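
The nesting described above can be sketched with a toy model. This is only an illustration: the `PDU` class, the `encapsulate` and `decapsulate` helpers, and every header value (ports, IP addresses, the second MAC address) are assumptions made for the example, and real protocols use binary headers rather than dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDU:
    layer: str
    header: dict     # protocol-specific metadata for this layer
    payload: object  # raw bytes at the top, otherwise the PDU from the layer above


def encapsulate(layer: str, header: dict, payload: object) -> PDU:
    """Wrap the PDU from the layer above without inspecting it."""
    return PDU(layer, header, payload)


def decapsulate(pdu: object) -> object:
    """Peel off one layer at a time until the original data is reached."""
    while isinstance(pdu, PDU):
        pdu = pdu.payload
    return pdu


# Application layer data (illustrative values throughout).
http_request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"

segment = encapsulate("transport", {"src_port": 49152, "dst_port": 80}, http_request)
packet = encapsulate("network", {"src_ip": "192.0.2.10", "dst_ip": "198.51.100.7"}, segment)
frame = encapsulate(
    "link",
    {"src_mac": "00:40:96:9d:68:0a", "dst_mac": "00:40:96:9d:68:0b"},
    packet,
)

print(frame.layer, "->", frame.payload.layer, "->", frame.payload.payload.layer)
print(decapsulate(frame) == http_request)
```

Note that `encapsulate` never looks inside its payload, which mirrors the point that a lower layer does not care what it is carrying.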

## What is a Protocol Data Unit (PDU)? What is its purpose in the context of network communication?

- Protocol Data Unit
- A block of data that gets transported over the network by the current "governing" protocol
- The unit itself depends on the layer in which we are currently functioning
- A PDU consists of a header which contains meta-data specific to the current protocol's responsibility/service
- A PDU has a data payload which contains the entire PDU from the layer above the current layer
- It might also have a trailer/footer
- It facilitates encapsulation of data, allowing each protocol to operate a modularized process, and perform the service that it is allocated in conjunction with the other protocols that make up the network.

## How do the different parts of a PDU interact?

- A PDU consists of a header, a data payload and an optional footer/trailer.
- The header contains metadata concerning the current protocol, and this metadata facilitates the service the protocol is performing for the data payload.
- The data payload is the data that we want to transport over the network using a specific protocol at a particular network layer.  It is the PDU of the layer above.
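
One way to picture how a header, payload, and trailer work together is the toy format below. It is an invented layout, not any real protocol: the header is assumed to be a 2-byte payload length and the trailer a 4-byte CRC-32 checksum, and the names `build_pdu` and `parse_pdu` are hypothetical.

```python
import struct
import zlib


def build_pdu(payload: bytes) -> bytes:
    """Header (payload length) + payload + trailer (CRC-32 of the payload)."""
    header = struct.pack("!H", len(payload))
    trailer = struct.pack("!I", zlib.crc32(payload))
    return header + payload + trailer


def parse_pdu(pdu: bytes) -> bytes:
    """Use the header to locate the payload and the trailer to verify it."""
    (length,) = struct.unpack("!H", pdu[:2])
    payload = pdu[2:2 + length]
    (checksum,) = struct.unpack("!I", pdu[2 + length:2 + length + 4])
    if zlib.crc32(payload) != checksum:
        raise ValueError("payload was corrupted in transit")
    return payload


pdu = build_pdu(b"data from the layer above")
print(parse_pdu(pdu))
```

The receiver never needs to understand the payload itself; the metadata around it is enough to extract it and check that it arrived intact.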

## What is the physical network? What are the characteristics of the physical network?

- The physical layer is the tangible infrastructure (network devices such as switches and routers, cables, wires) that transmits the encapsulated data from the layers above as bits, in the form of the electrical signals, light, and radio waves which carry network communications.
- The functionality at this level is essentially concerned with the transfer of bits (binary data) across a physical medium.
- The physical limitations of networked communication, **latency** and **bandwidth**, all come as a result of unavoidable physical laws that govern this layer.
- These limitations influence how developers use protocols in higher layers when building applications.
- Because these physical limitations are inevitable, they must be mitigated by the choices of the developer, who should always seek to optimize by limiting their effects as much as possible.

## Describe the different elements of latency and what each is caused by

- Latency is a measure of the time it takes for some data to get from one point in a network to another point in a network
    - It is a measure of delay, which is the difference between the start and end.
    - It is determined by real physical laws, such as the distance traveled and the speed of the signal traveling (e.g. the speed of light in optical fiber, or of an electrical signal in copper wire).
- Latency has four main aspects that occur during each network "hop" that data takes during its overall journey through the network:
    - **Propagation delay**: this is the amount of time it takes for a message (the first bit) to travel from the sender to the receiver, and can be calculated as the ratio between distance and speed.
    - **Transmission delay**: the amount of time it takes to push the data (all the packet's bits) onto the "link", i.e. the connection between two "nodes" (switches, routers, and other network devices) in the overall network.
    - **Processing delay**: Data travelling across the physical network doesn't directly cross from one link to another, but is processed in various ways; this is the amount of time it takes to process the data within one of the "nodes" (e.g. for a router to process the packet) in the overall network.
    - **Queuing delay**: Network devices such as routers can only process a certain amount of data at one time. If there is more data than the device can handle, then it queues, or buffers, the data; the amount of time the data (packet) is waiting in the queue or "buffer" to be processed is the queuing delay.
- The total latency between two points, such as a client and a server, is the sum of all these delays across every hop (usually given in milliseconds (ms)). Two related terms are:
    - **Last-mile latency**: a "slowing down" that takes place at the network edge, as smaller and more frequent hops take place as data moves lower in the network hierarchy. The most delay is introduced here.
    - **Round-trip Time (RTT)**: the length of time for a signal to be sent, added to the length of time for an acknowledgement or response to be received.
        - This could refer to exchanges between 'nodes' on a P2P network, or exchanges between client and server.
        - Latency overhead associated with additional round trips is often a trade off to consider when dealing with the implementation of network reliability in TCP at the transport layer.
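
The arithmetic behind these delays can be sketched as follows. All figures (distances, bandwidths, processing and queuing times) are assumed values chosen for illustration, the function names are hypothetical, and the round-trip estimate assumes the return path has the same latency as the outbound path, which is not guaranteed on a real network.

```python
def propagation_delay(distance_m: float, signal_speed_m_s: float) -> float:
    """Time for the first bit to travel the link: distance / speed."""
    return distance_m / signal_speed_m_s


def transmission_delay(packet_bits: int, bandwidth_bps: float) -> float:
    """Time to push every bit of the packet onto the link."""
    return packet_bits / bandwidth_bps


def hop_latency(distance_m, signal_speed_m_s, packet_bits, bandwidth_bps,
                processing_s, queuing_s):
    """Sum of the four delay components for a single hop."""
    return (propagation_delay(distance_m, signal_speed_m_s)
            + transmission_delay(packet_bits, bandwidth_bps)
            + processing_s
            + queuing_s)


FIBER_SPEED = 2e8  # m/s, a rough figure for light in optical fiber (assumption)

# Two hypothetical hops: a long backbone link and a slower last-mile link.
hops = [
    dict(distance_m=1_000_000, signal_speed_m_s=FIBER_SPEED, packet_bits=12_000,
         bandwidth_bps=1e9, processing_s=0.0001, queuing_s=0.0005),
    dict(distance_m=10_000, signal_speed_m_s=FIBER_SPEED, packet_bits=12_000,
         bandwidth_bps=100e6, processing_s=0.0002, queuing_s=0.002),
]

one_way_latency = sum(hop_latency(**hop) for hop in hops)
round_trip_time = 2 * one_way_latency  # assumes a symmetric return path

print(f"one-way latency: {one_way_latency * 1000:.3f} ms")
print(f"round-trip time: {round_trip_time * 1000:.3f} ms")
```

Even in this toy scenario the short last-mile hop contributes a large share of the total, mainly through queuing and the lower bandwidth.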

## What is the Ethernet protocol? What is its purpose in the context of network communication?

- The Ethernet Protocol is a set of standards and protocols that enables/governs communication between devices on a local network. It is the most commonly used protocol at the link/data link layer.
- The Ethernet protocol operating at the link/data link layer is primarily concerned with the identification of the next network "node" to which data should be sent and moving data over the physical network between the devices that comprise it, such as hosts (e.g. computers), switches, and routers.
- Ethernet governs communication between devices in a local network, and is responsible for navigating to the correct physical address, rather than the logical one (this is left to IP). For this reason, it acts as an interface between the physical infrastructure below it and the more logical layers above (network, transport, application etc.).
- Its PDU is called an Ethernet Frame.<br><br>
- The Ethernet protocol provides two main functions:
    - **Framing**, which provides logical structure to the streams of bits traveling through the physical infrastructure/layer of the network by categorizing data into 'fields' that have specific lengths and orders.
        - **Ethernet Frames**: a Protocol Data Unit (PDU) that encapsulates data from the Internet/ Network layer above.
        - The Link/ Data Link layer is the lowest layer at which encapsulation takes place.
        - At the physical layer, the data is essentially a stream of bits in one form or another without any logical structure.
        - Adds logical structure to this binary data.  The data in the frame is still in the form of bits, but the structure defines which bits are actually the data payload, and which are metadata (in the header) to be used in the process of transporting the frame.
        - The "fields" of a frame include:
            - **Source and Destination MAC address**: The source address is the physical address of the device which created the frame. The destination MAC address is the physical address of the device on the local network for which the frame is intended, i.e. the next "node" on the data's journey.
            - **Data Payload**: Contains the data for the entire Protocol Data Unit (PDU) from the layer above, (commonly) an IP Packet for example.
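            - The MAC address fields sit in the frame's header, ahead of the data payload. The frame also ends with a trailer, the Frame Check Sequence, which is used for error detection.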
    - **Addressing** which identifies the next network "node" to which data should be sent with the use of MAC addressing.  Identifies the intended recipient device.
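
Framing can be illustrated by reading the fixed-position fields out of a frame's bytes. The sketch below assumes the common Ethernet II layout (6-byte destination MAC, 6-byte source MAC, 2-byte EtherType, then the payload) and ignores the trailer. The function names, the second MAC address, and the payload bytes are made up for the example.

```python
import struct


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as colon-separated two-digit hexadecimal numbers."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet_frame(frame: bytes) -> dict:
    """Split a frame into header fields and payload (trailer not handled)."""
    destination, source, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "destination": format_mac(destination),
        "source": format_mac(source),
        "ethertype": ethertype,  # identifies the protocol of the payload
        "payload": frame[14:],   # the PDU from the layer above
    }


# Hand-built frame for illustration; 0x0800 is the EtherType for IPv4.
frame = (
    bytes.fromhex("0040969d680b")      # destination MAC
    + bytes.fromhex("0040969d680a")    # source MAC
    + struct.pack("!H", 0x0800)        # EtherType
    + b"ip packet bytes"               # placeholder payload
)

parsed = parse_ethernet_frame(frame)
print(parsed["source"], "->", parsed["destination"])
```

The bits themselves carry no markers; it is only the agreed field lengths and order that let the receiver tell metadata from payload.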

## What is a MAC address and what are its characteristics?

- Ethernet uses **MAC addressing** to identify devices (rather than location) connected to the local network. This is how Ethernet implements addressing.
- Since this address is linked to the specific physical device, and (usually) doesn't change, it is sometimes referred to as the **physical address** or **burned-in address**.
- MAC Addresses are formatted as a sequence of six two-digit hexadecimal numbers, e.g. `00:40:96:9d:68:0a`, with different ranges of addresses being assigned to different network hardware manufacturers.
- MAC addresses work well in LANs, where devices are connected to a central hub, which passes every frame on to every connected device, or a switch, which keeps a record of each device's address.
    - When using a hub to connect devices to create a network, each receiving device would check its MAC Address against the Destination MAC Address in the Frame to check if it was the intended recipient.
    - Sending every frame to every device on the network isn't very efficient, especially for large networks.
    - Most modern networks instead use switches. Like a hub, a switch is a piece of hardware to which you connect devices to create a network. Unlike a hub however, a switch uses the destination address in order to direct a frame ***only*** to the device for which it is intended.
- They do not work well in large decentralized systems, nor are they scalable:
    - They are physical, not logical, i.e. they do not change based on location. Each MAC Address is tied (burned in) to a specific physical device
    - They are flat, and do not possess a hierarchical structure that allows us to categorize them into searchable subdivisions. The entire address is a single sequence of values and can't be broken down into sub-divisions.
- If we want to solve these problems, we need a different system of rules that doesn't have these limitations and that can scale in such a way that we can build a network of networks which spans the entire globe. The Internet Protocol provides just such a set of rules.
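
The difference between a hub and a switch can be sketched with two small classes. This is a simplified model rather than how any particular device is implemented: the class names, port numbers, and shortened MAC addresses are hypothetical, and the switch is assumed to learn addresses by noting which port each source address arrives on.

```python
class Hub:
    """Repeats every frame out of every port except the one it arrived on."""

    def __init__(self, ports):
        self.ports = set(ports)

    def forward(self, in_port, src_mac, dst_mac):
        return self.ports - {in_port}


class Switch:
    """Keeps a record of which port each MAC address was last seen on."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}

    def forward(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port      # learn where the sender lives
        if dst_mac in self.mac_table:
            return {self.mac_table[dst_mac]}   # send only to the known port
        return self.ports - {in_port}          # unknown destination: send to all


hub = Hub([1, 2, 3])
switch = Switch([1, 2, 3])

print(hub.forward(1, "aa", "bb"))     # every other port, every time
print(switch.forward(1, "aa", "bb"))  # "bb" not yet known, so all other ports
print(switch.forward(2, "bb", "aa"))  # "aa" was learned on port 1
print(switch.forward(1, "aa", "bb"))  # "bb" is now known on port 2
```

The lookup table is flat: the switch needs one entry per device, which is part of why this approach does not scale beyond a local network.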

## What is the primary function of the Internet / Network Layer? What Protocols govern this function?

- Whereas the Ethernet protocol provides communication between devices on the same local network, the Internet Protocol enables communication between two networked devices anywhere in the world.
- The primary function of protocols at this layer is to facilitate communication between hosts (e.g. computers) on different networks (i.e. inter-network communication).
- The **Internet Protocol (IP)** is the predominant protocol used at this layer for inter-network communication. 
- The primary features of IP are:
    - Routing capability via IP addressing
    - Encapsulation of data into packets
- A **Packet** is the Protocol Data Unit (PDU) within the IP Protocol
    - Just as with Ethernet Frames, the data payload of an IP Packet is the PDU from the layer above (generally a TCP segment or a UDP datagram from the transport layer)
    - A packet consists of a header and a data payload
    - The IP packet is responsible for routing all the encapsulated data on its journey, which consists of a series of network "hops", or jumps between various nodes (routers) on the overall network.
    - The header is split into logical fields which provide metadata used in transporting the packet.
    - The header fields include:
        - **Source Address**: the 32-bit IP address of the source (sender) of the packet. Allows for IP addressing.
        - **Destination Address**: the 32-bit IP address of the destination (intended recipient) of the packet. Allows for IP addressing.<br><br>
- An **IP Address** is a unique address that we can use to identify a device or host on the internet.
- IP addresses have two main features that allow for inter-network communication across a large distributed system:
    - They are logical: they are assigned as required when devices join a network
    - They are hierarchical: the structure of the address allows us to categorize them into searchable subdivisions (subnets). The overall network is divided into logical sub-networks and numbers are allocated according to this hierarchy.
        - A range of IP addresses is defined by network hierarchy, and each subnetwork is assigned a given range of addresses.
        - The first address in the range is used as the network address, and the last address in the range is used as the broadcast address.
- There are two types of IP addresses in two different versions of IP:
    - IPv4 = 32-bit addresses, which provide roughly 4.3 billion possible addresses; this is not enough for all the devices on the network
    - IPv6 = 128-bit addresses, which provide roughly 340 undecillion addresses; this will hopefully be enough for a long time to come<br><br>
- MAC addresses, due to their nature (physical (*not logical*) and flat (*not hierarchical*)), are not scalable. IP addresses fill this gap. Because they are logical and hierarchical, they work well in large distributed systems.
- Unlike MAC Addresses, IP Addresses are logical in nature. This means that they are not tied to a specific device, but can be assigned as required to devices as they join a network.
- The IP address only gets us in communication with the intended device. It does not allow us to isolate any particular application or process running on that device. For that we need the Port numbers provided by the Transport Layer protocol.
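
The hierarchical structure of IP addresses can be explored with Python's standard `ipaddress` module. The subnet `192.0.2.0/24` and the host address below are arbitrary choices from a range reserved for documentation, not values taken from these notes.

```python
import ipaddress

# A hypothetical subnet: the /24 means the first 24 bits identify the network.
network = ipaddress.ip_network("192.0.2.0/24")

print(network.network_address)    # first address in the range
print(network.broadcast_address)  # last address in the range
print(network.num_addresses)      # how many addresses the range contains

# Checking whether a host belongs to the subnet only needs the shared prefix.
host = ipaddress.ip_address("192.0.2.57")
print(host in network)

# Size of each address space, from the number of bits in an address.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128
print(f"{ipv4_total:,}")
print(f"{ipv6_total:,}")
```

Because membership can be decided from the prefix alone, a router does not need a separate record for every individual device, unlike the flat MAC address scheme.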

## How does IP structure data and implement its functionality?

- A **Packet** is the Protocol Data Unit (PDU) within the IP Protocol
    - A packet consists of a header and a data payload
    - Just as with Ethernet Frames, the data payload of an IP Packet is the PDU from the layer above (generally a TCP segment or a UDP datagram from the transport layer)
    - The IP packet is responsible for routing all the encapsulated data on its journey, which consists of a series of network "hops", or jumps between various nodes (routers) on the overall network.
    - The header is split into logical fields which provide metadata used in transporting the packet.
    - The header fields include:
        - **Source Address**: the 32-bit IP address of the source (sender) of the packet. 
        - **Destination Address**: the 32-bit IP address of the destination (intended recipient) of the packet.
    - These addresses are what make IP addressing, and therefore routing between networks, possible.
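
As a sketch of how the header fields might be laid out and read back, the code below packs and unpacks a minimal 20-byte IPv4 header with the standard `struct` module. The addresses, TTL, and payload are illustrative assumptions, the function names are hypothetical, and the header checksum is left at zero rather than computed, so this is not a packet a real network stack would accept.

```python
import ipaddress
import struct

# version/IHL, DSCP/ECN, total length, identification, flags/fragment offset,
# TTL, protocol, header checksum, source address, destination address
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_packet(src: str, dst: str, payload: bytes) -> bytes:
    version_ihl = (4 << 4) | 5  # IPv4, header length of 5 x 32-bit words
    header = IPV4_HEADER.pack(
        version_ihl, 0, IPV4_HEADER.size + len(payload), 0, 0,
        64,  # TTL (assumed value)
        6,   # protocol number 6 = TCP
        0,   # checksum deliberately not computed in this sketch
        ipaddress.ip_address(src).packed,
        ipaddress.ip_address(dst).packed,
    )
    return header + payload


def parse_packet(packet: bytes) -> dict:
    fields = IPV4_HEADER.unpack(packet[:IPV4_HEADER.size])
    header_length = (fields[0] & 0x0F) * 4
    return {
        "version": fields[0] >> 4,
        "ttl": fields[5],
        "protocol": fields[6],
        "source": str(ipaddress.ip_address(fields[8])),
        "destination": str(ipaddress.ip_address(fields[9])),
        "payload": packet[header_length:],  # the PDU from the layer above
    }


packet = build_packet("192.0.2.10", "198.51.100.7", b"tcp segment")
parsed = parse_packet(packet)
print(parsed["source"], "->", parsed["destination"])
```

A router only needs the destination address field to decide where to send the packet next; the payload is carried along untouched.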