**MAIN POINTS**

- What types of connection each layer / protocol provides

-------------

At the computer/device level, it starts with the HTTP request, which gets encapsulated into the TCP segment, then into the IP packet, then into the Ethernet Frame, and then it is sent via wire/fiber-optic line (or whatever) to the host.

How does this translate to the TCP/IP chart? Is it like this:

Application => HTTP request

Transport => TCP (segment)

Internet => IP packet

Link/Data Link => Ethernet Frame => then sent via physical network

When the host receives the request, does it just unpack all the PDUs at each level in reverse until it reaches the topmost HTTP data from the application? And then send a response the same way as outlined above?


##############

**Physical**: physical point-to-point communication<br>
**Data Link**: logical point-to-point communication<br>
**Internet**: end-to-end communication between devices<br>
**Transport**: end-to-end communication between applications<br>

###############

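1. We type the URL into the browser's address bar and hit "Enter". The browser creates an HTTP request. This is the top layer of the model (Application).
2. TCP adds a header containing metadata (such as the source and destination port numbers) to the HTTP request, which becomes the data payload of a TCP segment.
3. The TCP segment is encapsulated in an IP packet. The packet's header contains the IP address of the requesting device (e.g. 109.156.106.254) and the IP address of the server hosting the website. The website's domain name is resolved to an address such as 109.156.106.57.
4. The IP packet is encapsulated in an Ethernet Frame. The frame adds local-network (LAN) metadata: the source and destination MAC addresses, plus a trailer used for error checking.
5. The frame is sent as bits over the physical network. At each hop, a router unwraps the frame, reads the destination IP address, and forwards the packet in a new frame. This continues until the packet reaches the server, which decapsulates it layer by layer and handles the HTTP request.
6. The server's response travels back to the LAN of the requesting device the same way. It is decapsulated on that device and loaded (rendered) by the browser.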
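The steps above can be sketched in Python as nested encapsulation, followed by decapsulation in reverse on the receiving host. This is a simplified model, not how a real network stack stores PDUs. The headers are plain dictionaries holding only a few illustrative fields. The port numbers and MAC addresses are hypothetical, and the FCS is not computed.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PDU:
    """A simplified Protocol Data Unit: header metadata, a payload, an optional trailer."""
    name: str
    header: dict
    payload: Any  # the entire PDU from the layer above (or raw application data)
    trailer: Optional[dict] = None


def encapsulate(http_request: str) -> PDU:
    # Transport layer: the HTTP request becomes the payload of a TCP segment.
    segment = PDU("TCP segment", {"src_port": 51000, "dst_port": 80}, http_request)
    # Internet layer: the whole segment becomes the payload of an IP packet.
    packet = PDU("IP packet", {"src_ip": "109.156.106.254", "dst_ip": "109.156.106.57"}, segment)
    # Link layer: the whole packet becomes the payload of an Ethernet frame.
    return PDU(
        "Ethernet frame",
        {"src_mac": "00:40:96:9d:68:0a", "dst_mac": "00:40:96:9d:68:01"},  # hypothetical MACs
        packet,
        trailer={"fcs": None},  # checksum not computed in this sketch
    )


def decapsulate(frame: PDU) -> str:
    # The receiving host unwraps each PDU in reverse order until only
    # the application data (the HTTP request) is left.
    data: Any = frame
    while isinstance(data, PDU):
        print(f"unwrapping {data.name}: header={data.header}")
        data = data.payload
    return data


frame = encapsulate("GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
print(decapsulate(frame))
```

A response goes through the same wrapping and unwrapping, with the source and destination fields swapped.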

###############



# <mark>Internet</mark>

### What is the internet?

**Network**
- A network is formed when two or more devices are connected so that they can communicate or exchange data
    - Multiple computers and other devices connected via a network bridging device such as a hub or, more likely, a switch. The computers are all connected to this device via network cables, and this forms the network.
- Our home’s LAN (Local Area Network) allows multiple devices, or clients, to connect either via an **Ethernet** cable or wirelessly to a **switch** (usually built into the device most of us colloquially call a “router”). The switch is itself connected to a **modem**.
    - Most modern networks use switches rather than hubs.
    - A **switch** is a networking device that sends data to the appropriate device within the LAN. It uses a MAC address table that maps each device's MAC address to the switch port that device is connected to.
    - The scope of communications is limited to devices that are connected (either wired or wirelessly) to the network switch or hub, which imposes some geographic limitations. (That's the 'local' in Local Area Network.)
        - Like a hub, a switch is a piece of hardware to which you connect devices to create a network. 
        - Unlike a hub however, a switch uses the destination address in order to direct a frame only to the device for which it is intended.
    - A **hub** is a basic piece of network hardware that replicates a message and forwards it to all of the devices on the network. 
        - In our hub scenario, each receiving device would check its MAC Address against the Destination MAC Address in the Frame to check if it was the intended recipient.  If it wasn't, then it would just ignore the frame.
    - **Ethernet** is the protocol that allows for communication within a LAN.
- Each device that can connect to a network does so through its Network Interface Card (NIC)
- Each NIC has a unique **Media Access Control (MAC) address**, which is a physical, hardcoded identifier that distinguishes that device on a network.
- Each device on a network is assigned an **IP address**, either statically (configured manually) or dynamically (typically by a DHCP server, which on a home network usually runs on the router)
- **Unlike a MAC address, an IP Address is not hardcoded into a device, but is instead logical and hierarchical, which allows a device to be more easily located based on its IP address.**
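The hub/switch difference described above can be sketched as two Python classes. The class names, the four-port setup, and frames modeled as dictionaries are hypothetical simplifications. One detail goes beyond these notes: a real switch also floods a frame when it has not yet learned the destination address.

```python
class Hub:
    """Repeats every incoming frame out of every other port."""

    def __init__(self, ports: int) -> None:
        self.ports = range(ports)

    def forward(self, in_port: int, frame: dict) -> list[int]:
        return [p for p in self.ports if p != in_port]


class Switch:
    """Learns which MAC address is on which port and forwards frames only there."""

    def __init__(self, ports: int) -> None:
        self.ports = range(ports)
        self.mac_table: dict[str, int] = {}  # MAC address -> port

    def forward(self, in_port: int, frame: dict) -> list[int]:
        # Learn: the source MAC address is reachable through the incoming port.
        self.mac_table[frame["src_mac"]] = in_port
        out = self.mac_table.get(frame["dst_mac"])
        if out is None:
            # Unknown destination: flood like a hub until the address is learned.
            return [p for p in self.ports if p != in_port]
        return [out] if out != in_port else []


hub, switch = Hub(4), Switch(4)
a_to_b = {"src_mac": "aa:aa:aa:aa:aa:aa", "dst_mac": "bb:bb:bb:bb:bb:bb"}
b_to_a = {"src_mac": "bb:bb:bb:bb:bb:bb", "dst_mac": "aa:aa:aa:aa:aa:aa"}
print(hub.forward(0, a_to_b))     # [1, 2, 3]
print(switch.forward(0, a_to_b))  # [1, 2, 3]  (B not learned yet, so flood)
print(switch.forward(1, b_to_a))  # [0]        (A was learned on port 0)
```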

**Internet**

- The internet is a network of networks or a vast number of networks connected together.
- In between all of the sub-networks are systems of routers that direct network traffic.
- The internet can be thought of as the infrastructure that enables inter-network communication, both in terms of the physical network and the lower-level protocols that control its use.
- The internet is a vast network of networks. It consists of both the network infrastructure itself (devices, routers, switches, cables, etc.) and the protocols that enable that infrastructure to function.
    - **Routers** are network devices that can route network traffic to other networks. Within a Local Area Network, they effectively act as gateways into and out of the network.
- It’s useful to think of the Internet as just a network of routers spanning across the entire world, connecting networks together.
- Each router redirects data across a hop to another router, getting the data progressively closer to its final destination IP address.<br><br>
- The **World Wide Web** is a **service** that can be accessed via the internet. 
- It is a vast information system of resources which are navigable by means of a URL (Uniform Resource Locator).
- HTTP is the primary means by which applications interact with the resources that make up the web.<br><br>
- **HTML** is the means by which the resources on the Web should be uniformly structured.
- **URI** is part of a system by which resources should be uniformly addressed on the Web.
- **HTTP** is the set of rules which provide uniformity to the way resources on the web are transferred between applications.<br><br>
**The web consists of the resources that are being transported. The internet is the infrastructure that enables the transferring.**
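The hop-by-hop forwarding described above can be sketched with Python's standard `ipaddress` module. Each router picks the most specific route in its table that contains the destination address and hands the packet to that next hop. The table entries and next-hop names are hypothetical. Real routers build their tables with routing protocols, which these notes do not cover.

```python
import ipaddress

# Hypothetical routing table: destination network -> next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream ISP router",  # default route
    ipaddress.ip_network("109.156.0.0/16"): "regional router",
    ipaddress.ip_network("109.156.106.0/24"): "local network router",
}


def next_hop(destination: str) -> str:
    """Return the next hop for the most specific route containing the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return ROUTES[best]


print(next_hop("109.156.106.57"))  # local network router
print(next_hop("109.156.7.1"))     # regional router
print(next_hop("8.8.8.8"))         # upstream ISP router
```

The hierarchical addresses are what make this work. A handful of prefixes covers every possible destination, so no router needs an entry for each individual device.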

### What are protocols, and why are they necessary?

- Network Protocols are a system or set of rules that govern the exchange or transmission of data.
    - **Application**: HTTP
    - **Session**: TLS or DTLS
    - **Transport**: TCP or UDP
    - **Internet / Network**: IP
    - **Link / Data Link**: Ethernet
- Main reasons why there are so many different protocols for network communication:
    - Different protocols were developed to address different aspects of network communication.
        - TCP and HTTP are examples of two protocols that address different aspects of communication; TCP provides for the transfer of messages between applications, while HTTP defines the structure of those messages.
    - Different protocols were developed to address the same aspect of network communication, but in different ways suited to specific use cases.
        - TCP and UDP would be examples of two protocols that address the same fundamental aspect of communication, the transfer of messages between applications, but do so differently.
- A single network communication typically uses multiple different protocols. The different protocols each provide different services or functionality, and can operate at different network 'layers'.
- We need them because otherwise the many and varied devices on the network would not have a cohesive and uniform communication method
- Different types of protocols are concerned with different aspects of network communication. It can be useful to think of these different protocols as operating at particular 'layers' of the network.

### Explain how data encapsulation works in the context of a network model

**Data Encapsulation**

- **Data Encapsulation** in the context of a network communication model means that we are essentially hiding data from one layer by encapsulating it within a data unit of the layer below.
    - With encapsulation, the entire PDU from one layer forms the data payload for the PDU at the layer below.
    - It is the process of packaging data of a PDU at a higher layer with metadata of a protocol at the current layer, forming a new PDU. In other words, a PDU of a protocol at a higher layer is encapsulated in a PDU of a protocol at the current layer.
    - Encapsulation is a means by which protocols at different network layers can work together.
- The OSI and TCP/IP models standardize a layered approach to telecommunications, where each layer provides a certain level of abstraction and functionality. Each layer also contains some encapsulated data to be opened and used by the layer directly above it.
- A **Protocol Data Unit (PDU)** is an amount or block of data transferred over a network.
- Encapsulation is implemented through the use of Protocol Data Units (PDUs). The PDU of a protocol at one layer becomes the data payload of the PDU of a protocol at a lower layer.
- Different protocols or protocol layers refer to PDUs by different names:
    - <mark>**TLS**: record</mark>
    - **Transport**: segment (TCP) or datagram (UDP)
    - **Internet / Network**: packet
    - **Link / Data Link**: frame
- In all cases, the basic concept is effectively the same; the PDU consists of a header, a data payload, and in some cases a trailer or footer.
- The header and trailer provide protocol-specific metadata about the PDU. This metadata tells the protocol how to handle the data payload.
    - For example, an Internet Protocol (IP) packet header would include fields for the Source IP Address and the Destination IP Address, which would be used to correctly route the packet.
- The data payload portion of a PDU is simply the data that we want to transport over the network using a specific protocol at a particular network layer.
- The data payload is the key to the way encapsulation is implemented.  **The entire PDU from a protocol at one layer is set as the data payload for a protocol at the layer below**.  
    - For example, an HTTP request at the Application layer could be set as the payload for a TCP segment at the Transport layer.
    
**Benefits**

- The major benefit of this approach is the separation (abstraction) it creates between the protocols at different layers.
- This means that a protocol at one layer doesn't need to know anything about how a protocol at another layer implemented in order for those protocols to interact.  
- It can independently complete its specific communication task without information from other layers.
- It doesn't really matter what the data payload is as long as the header information is complete and the layer can perform its intended function.
- It creates a system whereby **a lower layer effectively provides a 'service' to the layer above it**.
- This is particularly pertinent when there are many different protocols used at one network layer:
    - For example, a TCP segment isn't really concerned whether its data payload is an HTTP request, an SMTP command, FTP or some other sort of Application layer data. **It just knows it needs to encapsulate some data from the layer above and provide the result of this encapsulation to the layer below**.<br><br>
- The concept of **encapsulating each layer’s PDU** provides a basic foundation for how various Internet security measures function. 
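- Most fundamentally, encapsulation is a way to enclose certain bits of data so that other layers do not need to access or interpret them.
- Each layer therefore hides its contents from the layer beneath it. On its own, this hiding is an abstraction rather than protection: a lower layer could still read an unencrypted payload. That is why security protocols such as TLS encrypt the data before it is encapsulated further.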
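The idea that a TCP segment doesn't care what its payload is can be sketched with Python's `struct` module. The "segment" here is heavily simplified and hypothetical. Its header holds only the source and destination ports, which do match the first two 16-bit fields of a real TCP header. The function names `make_segment` and `read_segment` are illustrative, not part of any library.

```python
import struct


def make_segment(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend a 4-byte header (two ports) to any payload, without inspecting it."""
    header = struct.pack("!HH", src_port, dst_port)  # two unsigned 16-bit ints, network byte order
    return header + payload


def read_segment(segment: bytes) -> tuple[int, int, bytes]:
    """Split the header back off; the payload is returned untouched."""
    src_port, dst_port = struct.unpack("!HH", segment[:4])
    return src_port, dst_port, segment[4:]


# The same function carries an HTTP request and an SMTP command alike.
http = make_segment(51000, 80, b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
smtp = make_segment(51001, 25, b"HELO client.example.com\r\n")
print(read_segment(smtp))  # (51001, 25, b'HELO client.example.com\r\n')
```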

### What is a Protocol Data Unit (PDU)? What is its purpose in the context of network communication?

- Protocol Data Unit
- A block of data that gets transported over the network by the current "governing" protocol
- The unit itself depends on the layer in which we are currently functioning
- A PDU consists of a header which contains metadata specific to the current protocol's responsibility/service
- A PDU has a data payload which contains the entire PDU from the layer above the current layer
- It might also have a trailer/footer
- It facilitates encapsulation of data, allowing each protocol to operate a modularized process, and perform the service that it is allocated in conjunction with the other protocols that make up the network.

### How do the different parts of a PDU interact?

- A PDU consists of a header, a data payload and an optional footer/trailer.
- The header contains metadata concerning the current protocol, and this metadata facilitates the service the protocol is performing for the data payload.
- The data payload is the data that we want to transport over the network using a specific protocol at a particular network layer.  It is the PDU of the layer above.

## Physical Network 

### What is the physical network? What are the characteristics of the physical network?

- The Physical layer is the most rudimentary and foundational level, on top of which all the other layers rest.
- The physical layer is the tangible infrastructure (network devices such as switches and routers, cables, wires) that transmits all the encapsulated data from the layers above as bits. Those bits take the form of electrical signals, light, or radio waves, carried over copper (e.g. coaxial or twisted-pair) cable, fiber-optic cable, or a wireless medium such as Wi-Fi.
- The functionality at this level is essentially concerned with the transfer of bits (binary data) across a physical medium.
- Consisting of bits transmitted either across a wire or wirelessly, **the physical layer contains a payload that will be progressively decoded by higher layers**.
- The physical limitations of networked communication, **latency** and **bandwidth**, both result from unavoidable physical laws that govern this layer.
- These limitations influence how developers use protocols in higher layers when building applications.
- Because these physical limitations are inevitable, they must be mitigated by the choices of the developer, who should always seek to optimize by limiting their effects as much as possible.<br><br>
- **Network Hops**: The journey of a piece of data on the network isn't direct from the start point to the end point but will consist of several 'hops', or journeys, between nodes on the network. You can think of the nodes as routers that process the data and forward it to the next node on the path.<br><br>
- Although the Physical layer doesn’t technically contain anything that can be classified as data ***yet***, and therefore does not have a PDU, there is nevertheless a concept worth addressing – the **Interframe Gap**.
- The **Interframe Gap (IFG)** is a required pause in the signal transmission, which lets a NIC operating at layer 2 know that a frame has been completed and another may begin.
- The physical layer’s IFG therefore contributes to reliability by ensuring that the signals of one frame don’t get accidentally mixed up with the signals of another frame during the initial transmission.
- The IFG is specified by the Ethernet protocol: it is a brief pause between the transmission of each frame, which permits the receiver to prepare to receive the next frame. The gap is defined as a fixed number of bit times, so how long it lasts depends on the speed of the Ethernet connection.
- This Interframe gap contributes to the Transmission Delay element of latency.
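As a worked example: the standard minimum interframe gap for Ethernet is 96 bit times. The time it adds per frame is therefore 96 divided by the link's bit rate. The link speeds below are just common examples.

```python
IFG_BIT_TIMES = 96  # minimum Ethernet interframe gap, measured in bit times


def ifg_duration_seconds(link_bits_per_second: float) -> float:
    """Time the line must stay idle between frames at a given link speed."""
    return IFG_BIT_TIMES / link_bits_per_second


for name, rate in [("10 Mbps", 10e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
    print(f"{name}: {ifg_duration_seconds(rate) * 1e9:.0f} ns")
# 10 Mbps: 9600 ns, 100 Mbps: 960 ns, 1 Gbps: 96 ns
```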

### Describe the different elements of latency and what each is caused by

**Latency**

- **Latency** is a measure of the time it takes for some data to get from one point in a network to another point in a network
    - It is a measure of delay: the difference between the time data is sent and the time it arrives.
    - It is determined by real physical laws, such as the distance traveled and the speed at which the signal travels through the medium (for electrical and light signals, a large fraction of the speed of light).
    - Whereas bandwidth measures the **maximum amount** of data that can be sent per unit of time (e.g. per second), latency measures how long it takes a set, **minimal amount** of data to get from one point to another.
- Latency has four main aspects that occur during each network "hop" that data takes during its overall journey through the network:
    - **Propagation delay**: this is the amount of time it takes for a message (the first bit) to travel from the sender to the receiver, and can be calculated as the ratio between distance and speed.
        - We can think of propagation delay as the physical limits imposed by nature and geographic distance.
    - **Transmission delay**: the amount of time it takes to push all of the data's bits (e.g. all of a packet's bits) onto the link between two nodes (switches, routers, and other network devices) in the overall network. It depends on the amount of data and the bandwidth of the link.
    - **Processing delay**: data travelling across the physical network doesn't cross directly from one link to another, but is processed along the way. The processing delay is the amount of time it takes a node (e.g. a router) to process the data (the packet) before forwarding it.
    - **Queuing delay**: Network devices such as routers can only process a certain amount of data at one time. If there is more data than the device can handle, then it queues, or buffers, the data; the amount of time the data (packet) is waiting in the queue or "buffer" to be processed is the queuing delay.
        - This can be caused by insufficient bandwidth, a particularly high-traffic time of day, or an inefficient router.
        - If data is arriving at a router at a faster rate than the router is able to redirect it appropriately, this gives rise to queuing delay. 
- The total latency between two points, such as a client and a server, is the sum of all these delays across every hop (usually given in milliseconds (ms)). Two related concepts are also worth knowing:
    - **Last-mile latency**: a "slowing down" that takes place at the network edge. This is the last stretch of links closest to the customer, often called the "last mile", where smaller and more frequent hops take place as data moves lower in the network hierarchy. A large share of the overall delay is often introduced here.
    - **Round-trip Time (RTT)**: the length of time for a signal to be sent, added to the length of time for an acknowledgement or response to be received.  
        - Essentially the sum of the latency of going from A to B, plus the latency of coming back from B to A.
        - This could refer to exchanges between 'nodes' on a P2P network, or exchanges between client and server.
        - Latency overhead associated with additional round trips is often a trade off to consider when dealing with the implementation of network reliability in TCP at the transport layer.
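The four components above can be turned into a small calculation. Propagation delay is distance divided by signal speed, and transmission delay is data size divided by link bandwidth. Processing and queuing delays are simply given as inputs here. The signal speed of about 2 × 10^8 m/s in fiber is an approximation. The three-hop path, its distances, and its processing and queuing delays are all made up for illustration.

```python
SPEED_IN_FIBER_M_PER_S = 2e8  # approximation: roughly two-thirds of the speed of light in a vacuum


def propagation_delay(distance_m: float, speed_m_per_s: float = SPEED_IN_FIBER_M_PER_S) -> float:
    return distance_m / speed_m_per_s


def transmission_delay(size_bits: int, bandwidth_bps: float) -> float:
    return size_bits / bandwidth_bps


def hop_latency(distance_m: float, size_bits: int, bandwidth_bps: float,
                processing_s: float, queuing_s: float) -> float:
    """One-way delay for a single hop: the sum of the four components."""
    return (propagation_delay(distance_m)
            + transmission_delay(size_bits, bandwidth_bps)
            + processing_s
            + queuing_s)


# Hypothetical path: three hops, a 1500-byte packet, 100 Mbps links.
hops = [
    (1_000_000, 1500 * 8, 100e6, 0.0001, 0.0005),  # 1000 km long-haul link
    (50_000, 1500 * 8, 100e6, 0.0001, 0.0),
    (2_000, 1500 * 8, 100e6, 0.0001, 0.002),       # busy last-mile link
]
one_way = sum(hop_latency(*hop) for hop in hops)
rtt = 2 * one_way  # assumes the return path has the same latency
print(f"one-way: {one_way * 1000:.2f} ms, RTT: {rtt * 1000:.2f} ms")
```

Note how the 2 km last-mile hop contributes more delay than the 50 km hop because of its queuing delay, even though its propagation delay is tiny.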
        
**Bandwidth**

- **Bandwidth** is the amount of data that can be sent along the physical structure of the network / over a network connection in a particular unit of time (typically, a second).
- It is a measure of capacity.
- It is also determined by real physical laws, such as the capacity of the medium down which data is being transported.
- Because bandwidth is almost never the same across every link in a connection, we consider the bandwidth of a connection to be the lowest bandwidth of any link along it (the bottleneck).
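The bottleneck rule is just a minimum over the links of a connection. The link names and bandwidth figures below are hypothetical.

```python
# Hypothetical bandwidths (bits per second) of each link along one connection.
link_bandwidths = {
    "home Wi-Fi": 300e6,
    "last-mile link": 50e6,
    "ISP backbone": 10e9,
    "server uplink": 1e9,
}

bottleneck = min(link_bandwidths, key=link_bandwidths.get)
print(f"effective bandwidth: {link_bandwidths[bottleneck] / 1e6:.0f} Mbps ({bottleneck})")
# effective bandwidth: 50 Mbps (last-mile link)
```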

## Data Link / Link Layer

### What is the Ethernet protocol? What is its purpose in the context of network communication?

**Ethernet Protocol**

- The second layer is the Data Link layer, which interprets the physical transmissions of Layer 1, and converts them into a Frame.
- The Ethernet Protocol is a set of standards and protocols that enables/governs communication between devices on a local network.  It is the most commonly used protocol at the link / data link layer.
- The Ethernet protocol operating at the link/data link layer is primarily concerned with the identification of the next network "node" to which data should be sent and moving data over the physical network between the devices that comprise it, such as hosts (e.g. computers), switches, and routers.
- Ethernet governs communication between devices in a local network, and is responsible for navigating to the correct physical address rather than the logical one (this is left to IP). For this reason, it acts as an interface between the physical infrastructure below it and the more logical layers above (network, transport, application, etc.).
- Because this layer delivers data from one point (or node) on a network to another, it is referred to as providing a **node to node connection**.

**Ethernet Frame**

- Its PDU is called an Ethernet **Frame**.
- The Ethernet protocol provides two main functions:
    - **Framing**, which provides logical structure to the streams of bits traveling through the physical infrastructure/layer of the network by categorizing data into 'fields' that have specific lengths and orders.
        - **Ethernet Frames**: a Protocol Data Unit (PDU) that encapsulates data from the Internet/ Network layer above.
        - The Link/ Data Link layer is the lowest layer at which encapsulation takes place.
        - At the physical layer, the data is essentially a stream of bits in one form or another without any logical structure.
        - Adds logical structure to this binary data.  The data in the frame is still in the form of bits, but the structure defines which bits are actually the data payload, and which are metadata (in the header) to be used in the process of transporting the frame.
        - The "fields" of a frame include:
            - **Source and Destination MAC address (header)**: The source address is the physical address of the device which created the frame. The destination MAC address is the physical address of the device on the local network that should receive the frame next. This is the final destination if it is on the same LAN; otherwise it is the next hop, such as a router.
            - **Data Payload**: Contains the data for the entire Protocol Data Unit (PDU) from the layer above, commonly an IP packet.
            - **Frame Check Sequence (FCS) (footer/trailer)**: what the Data Link layer uses for error detection (a CRC-32 checksum). It makes sure that the frame has neither lost any bits along the way nor been corrupted due to signal interference.
    - **Addressing** which identifies the next network "node" to which data should be sent on the local network with the use of MAC addressing.  Identifies the intended recipient device.
        - LANs, NIC Cards, and Switches all function at this layer. Switches read and process each frame, and send them to the appropriate device on the network that has the frame’s destination MAC Address.
- With Ethernet there's decapsulation and re-encapsulation at every point on the journey. So when a device such as a router receives a frame that has an IP packet as its payload, it decapsulates the packet and re-encapsulates it into a new frame for the next 'hop' on its journey.
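The FCS check can be sketched with `zlib.crc32` from the Python standard library, which uses the same CRC-32 polynomial as Ethernet. This is a simplified sketch. It ignores the preamble, padding to the minimum frame size, and exact on-the-wire bit ordering. The MAC addresses and payload are hypothetical. The 2-byte EtherType field (0x0800 for an IPv4 payload) is included only so that the header has the real Ethernet layout.

```python
import struct
import zlib


def add_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer computed over the header and payload."""
    return frame_without_fcs + struct.pack("<I", zlib.crc32(frame_without_fcs))


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 on receipt and compare it with the trailer."""
    body, trailer = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == trailer


dst_mac = bytes.fromhex("0040969d6801")  # hypothetical next-hop MAC address
src_mac = bytes.fromhex("0040969d680a")  # hypothetical sender MAC address
ethertype = struct.pack("!H", 0x0800)    # 0x0800 = the payload is an IPv4 packet
payload = b"placeholder IP packet bytes"
frame = add_fcs(dst_mac + src_mac + ethertype + payload)

print(fcs_ok(frame))             # True
corrupted = bytearray(frame)
corrupted[20] ^= 0x01            # flip one bit, as signal interference might
print(fcs_ok(bytes(corrupted)))  # False
```

The receiver does not try to repair the frame. A frame that fails the check is simply discarded.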

### What is a MAC address and what are its characteristics?

- Ethernet uses **MAC addressing** to identify devices (rather than location) connected to the local network.  This is how Ethernet implements addressing
- Since this address is linked to the specific physical device, and (usually) doesn't change, it is sometimes referred to as the **physical address** or **burned-in address**.
- MAC addresses are assigned by the manufacturer and burned into the device's NIC.
- MAC Addresses are formatted as a sequence of six two-digit hexadecimal numbers, e.g. `00:40:96:9d:68:0a`, with different ranges of addresses being assigned to different network hardware manufacturers.
- MAC addresses work well in LANs, where devices are connected either to a central hub (which simply repeats every frame to all devices) or to a switch that keeps a record of which MAC address is reachable through which of its ports.
    - When using a hub to connect devices to create a network, each receiving device would check its MAC Address against the Destination MAC Address in the Frame to check if it was the intended recipient.
    - Sending every frame to every device on the network isn't very efficient, especially for large networks.
    - Most modern networks instead use switches. Like a hub, a switch is a piece of hardware to which you connect devices to create a network. Unlike a hub however, a switch uses the destination address in order to direct a frame ***only*** to the device for which it is intended.
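The address format can be sketched in Python. The first three bytes of a MAC address are the Organizationally Unique Identifier (OUI), which is the range assigned to the manufacturer. Note what the address does *not* contain: nothing in it says where the device is, which is the limitation discussed next. The helper names are hypothetical.

```python
def parse_mac(mac: str) -> bytes:
    """Convert 'xx:xx:xx:xx:xx:xx' notation into the 6 raw bytes (48 bits)."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"expected six two-digit hex groups, got {mac!r}")
    return bytes(int(octet, 16) for octet in octets)


def oui(mac: str) -> str:
    """The first three bytes (the OUI) identify the manufacturer's assigned range."""
    return ":".join(f"{b:02x}" for b in parse_mac(mac)[:3])


mac = "00:40:96:9d:68:0a"
print(len(parse_mac(mac)) * 8)  # 48
print(oui(mac))                 # 00:40:96
```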
    
**Disadvantages**

- They do not work well in large decentralized systems, nor are they scalable:
    - They are physical, not logical, i.e. they do not change based on location. Each MAC Address is tied (burned in) to a specific physical device
    - They are flat and do not possess a hierarchical structure that allows us to categorize them into searchable subdivisions. The entire address is a single sequence of values and can't be broken down into sub-divisions.
- If we want to solve these problems, we need a different system of rules that doesn't have these limitations and that can scale in such a way that we can build a network of networks which spans the entire globe. The Internet Protocol provides just such a set of rules.

## Internet / Network Layer

### What is the primary function of the Internet / Network Layer? What Protocols govern this function?

- The Internet layer is where the majority of the action takes place, because that’s where the data travels across routers, from **network to network**.
- **Whereas the Ethernet protocol provides communication between devices on the same local network, the Internet Protocol enables communication between two networked devices anywhere in the world.**
- The primary function of protocols at this layer is to facilitate communication between hosts (e.g. computers) on different networks (i.e. inter-network communication). 
- The **Internet Protocol (IP)** is the predominant protocol used at this layer for inter-network communication
    - There are two versions of IP currently in use: ***Internet Protocol version 4 (IPv4)*** and ***Internet Protocol version 6 (IPv6)***
- The primary features of IP are:
    - Routing capability via IP addressing
    - Encapsulation of data into packets

### How does IP structure data and implement its functionality?

**Packet**

- A **Packet** is the Protocol Data Unit (PDU) of the Internet Protocol (IP)
    - A packet consists of a header and a data payload
    - Just as with Ethernet Frames, the data payload of an IP Packet is the PDU from the layer above (generally a TCP segment or a UDP datagram from the transport layer)
    - The IP packet carries all the encapsulated data on its journey, which consists of a series of network "hops", or jumps between various nodes (routers) on the overall network.
    - The header is split into logical fields which provide metadata used in transporting the packet.
    - The header fields include:
        - **Source Address**: the IP address of the source (sender) of the packet (32 bits in IPv4, 128 bits in IPv6).
        - **Destination Address**: the IP address of the destination (intended recipient) of the packet (32 bits in IPv4, 128 bits in IPv6).
    - These addresses allow for IP addressing<br><br>
    - At this level, routers open up the frame, and read the encapsulated data, which is known as a **Packet**.
    - Routers then progressively forward the packet to the router that will get it closer to its destination, until it finally arrives at the destination network containing the destination IP address.
    - For this reason, the Internet layer is known as providing a **network to network connection**.
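The source and destination address fields can be seen by building and reading a fixed 20-byte IPv4 header with Python's `struct` and `ipaddress` modules. This is a sketch for illustration only. Fields the notes don't discuss (TTL, protocol, etc.) are filled with plausible values so that the layout is correct, and the header checksum is deliberately left at 0. The addresses are the hypothetical examples used earlier in these notes.

```python
import ipaddress
import struct

# Fixed 20-byte IPv4 header, network (big-endian) byte order:
# version/IHL, DSCP/ECN, total length, identification, flags/fragment offset,
# TTL, protocol, header checksum, source address, destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_header(src: str, dst: str, payload_len: int) -> bytes:
    """Build a simplified header (checksum left as 0) for illustration only."""
    return IPV4_HEADER.pack(
        0x45,                 # version 4, header length 5 x 32-bit words
        0, 20 + payload_len,  # DSCP/ECN, total length
        0, 0,                 # identification, flags/fragment offset
        64, 6,                # TTL, protocol 6 = TCP
        0,                    # header checksum (not computed here)
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )


def read_addresses(header: bytes) -> tuple[str, str]:
    """Return the (source, destination) addresses a router would read."""
    fields = IPV4_HEADER.unpack(header[:IPV4_HEADER.size])
    src, dst = fields[-2], fields[-1]
    return str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst))


header = build_header("109.156.106.254", "109.156.106.57", payload_len=100)
print(len(header))             # 20
print(read_addresses(header))  # ('109.156.106.254', '109.156.106.57')
```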
    
**IP Addressing** 

- An **IP Address** is a unique address that we can use to identify a device or host on the internet.
- The Internet Protocol uses a system of addressing (IP Addressing) to direct data between one device and another across networks.
- IP is **end to end communication between devices** (i.e. it only cares about the two end points in the communication, such as the client and server, not particularly about how the packets are routed through the network).
- IP addresses have two main features that allow for inter-network communication across a large distributed system:
    - They are logical: they are assigned as required when devices join a network
        - Unlike MAC Addresses, IP Addresses are logical in nature. This means that they are not tied to a specific device, but can be assigned as required to devices as they join a network.
    - They are hierarchical: the structure of the address allows us to categorize them into searchable subdivisions (***subnetting***). The overall network is divided into logical sub-networks and numbers are allocated according to this hierarchy.
        - A range of IP addresses is defined by network hierarchy, and each subnetwork is assigned a given range of addresses.
        - The first address in the range is reserved as the network address, and the last address in the range is reserved as the broadcast address.
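Python's standard `ipaddress` module can show the range, network address, and broadcast address of a subnet. The `/24` prefix length means the first 24 bits identify the network, leaving 8 bits for hosts. The subnet itself is a hypothetical example.

```python
import ipaddress

subnet = ipaddress.ip_network("109.156.106.0/24")  # hypothetical subnet

print(subnet.network_address)    # 109.156.106.0   (first address in the range)
print(subnet.broadcast_address)  # 109.156.106.255 (last address in the range)
print(subnet.num_addresses)      # 256
print(ipaddress.ip_address("109.156.106.57") in subnet)  # True
print(ipaddress.ip_address("109.156.107.57") in subnet)  # False

# The hierarchy can be subdivided further, e.g. into four /26 sub-networks.
print([str(s) for s in subnet.subnets(new_prefix=26)])
# ['109.156.106.0/26', '109.156.106.64/26', '109.156.106.128/26', '109.156.106.192/26']
```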
        
**MAC addresses v.s. IP Addresses**

- MAC addresses, due to their nature (physical (*not logical*), flat (*not hierarchical*)), are not scalable. IP addresses fill this gap. Because they are logical and hierarchical, they work well in large distributed systems.
- If we did not have IP addresses and had to rely simply on MAC addresses to connect a client and a server, the process would be profoundly inefficient and time consuming, because every device would need to somehow:
    - keep track of where all the most-used destination servers are
    - keep track of the MAC addresses of all the routers along the way
    - update these routes every single time a router or server is replaced, since each device has a unique MAC address.
-  The fact that the addresses are non-hierarchical means that routing devices would need a record of each single address that existed somewhere in the world; that would mean storing impossibly large tables.
<br><br>
- The IP address only gets us in communication with the intended device. It does not allow us to isolate any particular application or process running on that device. For that we need the Port numbers provided by the Transport Layer protocol.

### Why are there two versions of IP?

- The main difference between packets encoded in IPv4 and those encoded in IPv6 is the **size of the space reserved on the packet’s header for the source and destination IP addresses**.
- There are two types of IP addresses in two different versions of IP:
    - IPv4 = 32-bit addresses provide about 4.3 billion possible addresses, which is not enough for all the devices on the network
        - `109.156.106.132`
        - 4 sets of decimal numbers (octets), each containing 8 bits of information
        - The address is split into a network portion and a host portion. Where the split falls is set by the subnet mask (prefix length), not by a fixed octet. With a `/24` prefix, for example, the first three numbers identify the network and the last one identifies the device (host) within it.
    - IPv6 = 128-bit addresses provide 340 undecillion addresses, hopefully will be enough for a long time to come
        - `3002:0bd6:0000:0000:0000:ee00:0033:6778`
        - 8 sets of hexadecimal characters, each containing 16 bits of information. 
        - The first 4 sets are used to locate a specific network on the internet. The last 4 sets are typically used to identify a particular interface or device within that network.
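The difference in address space, and the two notations, can be checked with Python's standard `ipaddress` module. The addresses are the examples above. `compressed` shows the shortened IPv6 form, in which leading zeros are dropped and the longest run of all-zero groups becomes `::`.

```python
import ipaddress

print(f"{2**32:,}")     # 4,294,967,296 possible IPv4 addresses
print(f"{2**128:.2e}")  # 3.40e+38 possible IPv6 addresses (340 undecillion)

v4 = ipaddress.ip_address("109.156.106.132")
v6 = ipaddress.ip_address("3002:0bd6:0000:0000:0000:ee00:0033:6778")
print(v4.version, v4.max_prefixlen)  # 4 32
print(v6.version, v6.max_prefixlen)  # 6 128
print(v6.compressed)                 # 3002:bd6::ee00:33:6778
```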

### What gaps in MAC addressing does IP addressing fill?

- MAC addresses, due to their nature (physical (not logical), flat (not hierarchical)), are not scalable. IP addresses fill this gap.
- Because they are logical and hierarchical, they work well in large distributed systems.

### What does IP addressing allow us to do, and what does it not allow us to do?

- The IP address only gets us in communication with the intended device. It does not allow us to isolate any particular application or process running on that device. For that we need the Port numbers provided by the Transport Layer protocol.

# <mark>TCP and UDP</mark>

### What is the Transport Layer and what is it concerned with?

- TCP (Transmission Control Protocol) ensures reliable data transfer between applications on top of the unreliable channel provided by the lower-level protocols.
- The Transport layer enables end-to-end communication between specific processes (applications) running on two different devices.
- Transport layer protocols (both TCP and UDP) provide multiplexing services. This means they enable the transmission of multiple signals (data inputs) over a single channel. For example, a single device can run a browser, an e-mail client, and Spotify streaming, all through the same network connection.
- This is important because often there are multiple applications running on a single device, and yet IP addresses only provide a ***single channel***.

### How is multiplexing enabled?

- The process of sending the contents of received datagrams and segments to the correct destination ports (using the destination port number in the segment/datagram header) is known as **demultiplexing**. 
- The reverse process, gathering data from multiple ports and combining it into a single channel, is known as **multiplexing**.
- In the context of a communication network, multiplexing is the idea of transmitting multiple signals over a single channel, such as a single device running a browser, an e-mail client, and Spotify streaming all over the same network connection.
- Multiplexing is enabled through the use of network ports (port numbers) alongside IP addresses
- Each process that communicates over the network is assigned a port number, which identifies that process on its host so that data arriving from another device can be delivered to it.
- An IP address and port number combined define a communication end-point known as a **network socket**.
- That’s why the Transport layer is known as providing a **port to port connection**, or a **socket to socket connection**<br><br>
- Essentially, the source and destination IP addresses in the packet header are used to create a single communication channel between hosts, while the source and destination port numbers (present in both TCP Segments and UDP Datagrams) are used to transmit multiple data inputs across that single channel, and to separate them out on the other side. This is the core of multiplexing and demultiplexing.<br><br>
- Multiplexing and demultiplexing is a general concept in networking, but the application of it that we discuss in this course is with regards to the service provided by the transport layer protocols TCP and UDP. In that context, this **multiplexing is conducted using the channel provided by the protocol at the layer below, namely the Internet Protocol (IP)**.
- We can think of **IP as a logical end-to-end connection or channel between devices** (in the sense that packets will be routed to the correct device using the IP address), rather than a 'physical' channel (though clearly the underlying physical network is required in order to enable this).
- What I mean by this idea of a logical channel is that the actual routes taken by the packets as they travel from one device to another are largely irrelevant: all we care about is that the data gets from one device to the other. Think about this like making a phone call. When you call someone, you don't care how your call is routed through to them; all you care about is that when you dial a certain number, the person you are calling picks up at the other end. This is the idea of a logical channel.
- However, IP only provides this channel between devices. **If we want to have multiple processes use this channel at the same time, we need to use multiplexing/ demultiplexing**.
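Demultiplexing by destination port can be illustrated with a toy simulation (no real network involved). The port numbers, process names, and payloads below are hypothetical; the point is that all segments arrive over the same IP channel and the destination port alone decides which process receives each payload.

```python
from collections import defaultdict

# Hypothetical segments that all arrived over one IP channel: (destination port, payload)
incoming = [
    (443, "HTTPS response chunk 1"),
    (993, "new e-mail"),
    (443, "HTTPS response chunk 2"),
    (5353, "mDNS announcement"),
]

# Hypothetical processes on this host, each bound to a port
listening = {443: "browser", 993: "email client"}

def demultiplex(segments, listeners):
    """Hand each payload to the process bound to its destination port."""
    delivered = defaultdict(list)
    dropped = []
    for dst_port, payload in segments:
        process = listeners.get(dst_port)
        if process is None:
            dropped.append((dst_port, payload))  # no process is bound to that port
        else:
            delivered[process].append(payload)
    return dict(delivered), dropped

delivered, dropped = demultiplex(incoming, listening)
print(delivered)
print(dropped)
```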

### What is a socket? What is its purpose in the context of network communication?

- A port is an identifier for a specific process or application running on a host.
- A socket refers to the communication end-point that consists of the port number and IP address together.
- The IP address gets us the correct device on the network, and the port number gets us to the correct application on that device.
- Sockets facilitate multiplexing.<br><br>
- Other things to consider:
    - The ability to programmatically instantiate socket objects specifically defined to listen for particular communications (i.e. for a certain application from a certain host) allows for us to implement both connection-oriented and connectionless communication systems.
    - Conceptually, a socket facilitates multiplexing. On a practical level, instantiation of a socket object in code can implement a TCP or UDP connection specifically.
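Python's standard `socket` module shows the "IP address + port = socket" idea directly. This sketch assumes the loopback address `127.0.0.1` and lets the OS pick free ports (port `0`) so it runs on a single machine; two sockets share the IP but are told apart by their ports.

```python
import socket

# Two UDP sockets on the same host. Binding to port 0 asks the OS for any free port.
sock_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock_a.bind(("127.0.0.1", 0))
sock_b.bind(("127.0.0.1", 0))
sock_b.settimeout(2.0)  # don't block forever if something goes wrong

# Each socket's communication end-point is an (IP address, port) pair
addr_a = sock_a.getsockname()
addr_b = sock_b.getsockname()
print("socket A:", addr_a)  # same IP as B...
print("socket B:", addr_b)  # ...but a different port

# Data addressed to addr_b reaches only the socket bound to that IP/port pair
sock_a.sendto(b"hello", addr_b)
data, sender = sock_b.recvfrom(1024)
print(data, "from", sender)  # sender is socket A's (IP, port) pair

sock_a.close()
sock_b.close()
```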

### What is the difference between a connectionless system and a connection-oriented system?

**Connectionless system**

- A **connectionless system** relies on a single socket for all communication, does not establish dedicated communication channels, and responds to all communication individually as they arrive.
    - There is one socket object defined by the IP address of the host machine and the port assigned to a particular process running on that machine.
    - That object could call a `listen()` method which would allow it to wait for incoming messages directed to that particular IP/port pair.
    - It would simply process any incoming messages as they arrived and send any responses as necessary.
    - It does not matter from what process transmissions come, a single socket listens to all messages regardless and responds to each as it arrives.
    - This is useful because it is 
        - a) a simpler and more flexible process than a connection-oriented system and 
        - b) it reduces latency overhead because a connection does not have to be established.<br><br>
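A connectionless system can be sketched with UDP sockets on the loopback interface (an assumption so the example is self-contained). One server socket handles datagrams from two different clients as they arrive, with no handshake, and replies to whichever address each datagram came from.

```python
import socket

# A single server socket bound to one (IP, port) pair handles every datagram
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)
server_addr = server.getsockname()

# Two independent clients; neither performs any handshake before sending
clients = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]
for i, client in enumerate(clients):
    client.settimeout(2.0)
    client.sendto(f"msg from client {i}".encode(), server_addr)

# The server just processes each datagram as it arrives and replies to its sender
for _ in clients:
    data, sender = server.recvfrom(1024)
    server.sendto(data.upper(), sender)

replies = [client.recvfrom(1024)[0] for client in clients]
for client in clients:
    client.close()
server.close()
print(replies)  # each client gets the reply to its own message
```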
        
**Connection-oriented system**

- A **connection-oriented system** instantiates a new socket object for each connection, establishing a dedicated virtual connection channel between two processes running on separate devices.
    - It doesn't start sending application data until a connection has been established between application processes
    - This is done by having a socket object defined by the host IP and process port use a `listen()` method to wait for incoming messages.
    - When new communication comes into the first listening socket, a new socket is created.  This new socket object is defined by both the local IP and port number and the IP and port of the host/process which sent the message.
    - This new socket listens specifically for messages that match its four-tuple, i.e. the IP and port of sender along with the IP and port of the receiver.
    - Implementing communication in this way effectively creates a dedicated virtual connection for communication between a specific process running on one host and a specific process running on another host. 
    - The advantage of having a dedicated connection like this is that it more easily allows you to put in place rules for managing the communication such as the order of messages, acknowledgements that messages had been received, retransmission of messages that weren't received, and so on.<br><br>
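With TCP, Python's `accept()` on a listening socket returns a brand-new socket for each incoming connection, which matches the description above. A minimal sketch on the loopback interface (an assumption for a self-contained run) that prints the new socket's four-tuple:

```python
import socket

# Listening socket: defined only by the local IP and port
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
listener.settimeout(2.0)
listener_addr = listener.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(2.0)
client.connect(listener_addr)        # the OS performs the three-way handshake here
client_addr = client.getsockname()   # OS-assigned source IP and port

# accept() returns a NEW socket dedicated to this one client
conn, peer = listener.accept()
four_tuple = (*conn.getpeername(), *conn.getsockname())  # (src IP, src port, dst IP, dst port)
print("listening socket:", listener_addr)
print("connection socket four-tuple:", four_tuple)

client.sendall(b"ping")
received = conn.recv(1024)   # data on this connection arrives at the dedicated socket

for s in (client, conn, listener):
    s.close()
```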
    
***
- Essentially, **the source and destination IP addresses in the packet header are used to create a single communication channel between hosts**, while **the source and destination port numbers (present in both TCP Segments and UDP Datagrams) are used to transmit multiple data inputs across that single channel, and to separate them out on the other side. This is the core of multiplexing and demultiplexing**.
- The difference in how this is executed between connectionless and connection-oriented protocols comes down to how the host machine handles the PDU that it receives. On a connectionless system, the host will have a single socket object "listening" for messages sent to that specific destination IP address / destination port number pair. It doesn't care where the messages come from or the order they're in -- it simply receives the messages as they arrive and sends responses as appropriate.
- On a connection-oriented system, the host machine also has a socket object "listening" for messages sent to that specific IP address/ port pair. However, when it receives a message, it looks at the source IP address and source port number, and creates a new socket dedicated to listening to messages containing that specific "four-tuple" of information (source and destination IP address and port number). Further messages with a matching four-tuple are sent to and handled by that socket object.
- So, while both protocols provide multiplexing and demultiplexing by including source and destination port numbers in their PDUs, **the difference between them comes down to how that information is utilized by the host machine** -- particularly with regards to the creation of new socket objects.
- Despite being connectionless, UDP is still able to perform demultiplexing of incoming data. This is because each UDP datagram contains a source port number and a destination port number. The source port number is used to identify the sending application on the source host, while the destination port number is used to identify the receiving application on the destination host.
***

### How are connections in a connection-oriented system recognized?

- Via a four-tuple:
    - Source IP address and port number
    - Destination IP address and port number

### What is the TCP protocol? What services does it provide?

**TCP**

- TCP (Transmission Control Protocol) ensures reliable data transfer between applications on top of the unreliable channel of the lower-level protocols.
- Enables end-to-end communication between specific processes running on two different devices
- TCP provides reliability through message acknowledgement and retransmission, and in-order delivery.
- This reliability rests on four main features, which depend on the sequence numbers that TCP synchronizes during the three-way handshake:
    - data integrity
    - de-duplication
    - in-order delivery
    - retransmission of lost data.
    
**Segments**

- **Segments** are the Protocol Data Unit (PDU) that operates with the TCP protocol. Like the PDUs of protocols we've looked at for other network layers, it uses a combination of headers and payload to provide encapsulation of data from the layer above.
    - Data from the application layer is encapsulated as the data payload in this PDU, and the source and destination port numbers within the PDU can be used to direct that data to specific processes on a host. 
    - The **Source and Destination port numbers** are fields in the segment header, while data such as an HTTP request (or another type of Data/Message at the Application level) is part of the payload.
    - TCP provides five main services:
        - **Multiplexing** through **source and destination port numbers**
        - **Error detection**: corrupt data is identified using a **checksum**
        - **Data reliability (in-order delivery, handling data loss, and handling data duplication)** through **sequence and acknowledgment numbers**, message acknowledgement, and retransmission
            - **In-order delivery**: data is received in the order that it was sent (**sequence numbers**)
            - **Handling data loss**: missing data is retransmitted based on **acknowledgements** and timeouts
            - **Handling duplication**: duplicate data is eliminated through the use of **sequence numbers**
        - **Flow control** through **window size** data
        - **Congestion avoidance** through dynamic adjustment of flow according to data loss
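To see where the ports, sequence/acknowledgment numbers, flags, window size, and checksum live, here is a minimal sketch that packs and unpacks a 20-byte TCP header (no options) with Python's `struct` module. The port numbers, sequence numbers, and window size are hypothetical, and the checksum is left as 0 because the real value is computed over a pseudo-header that includes the IP addresses.

```python
import struct

# Flag bits in the low end of the 16-bit "data offset + flags" field
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}
# src port, dst port, seq, ack, offset/flags, window, checksum, urgent pointer (network byte order)
HEADER = struct.Struct("!HHIIHHHH")

def build_header(src_port, dst_port, seq, ack, flags, window, checksum=0):
    offset_flags = (5 << 12) | sum(FLAGS[f] for f in flags)  # data offset = 5 x 32-bit words
    return HEADER.pack(src_port, dst_port, seq, ack, offset_flags, window, checksum, 0)

def parse_header(raw):
    src, dst, seq, ack, offset_flags, window, checksum, _urgent = HEADER.unpack(raw[:20])
    flags = [name for name, bit in FLAGS.items() if offset_flags & bit]
    return {"src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
            "flags": flags, "window": window, "checksum": checksum}

# Hypothetical SYN-ACK from a web server (port 443) back to a client's ephemeral port
raw = build_header(443, 51514, seq=3000, ack=1001, flags=["SYN", "ACK"], window=65535)
parsed = parse_header(raw)
print(len(raw), parsed)
```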

### What are the steps for the three-way handshake? What is its purpose?

- The three-way handshake is what TCP uses to establish a dedicated and reliable connection between two different processes over the network.
    - The three-way handshake is used for establishing connections, though there is another process known as the **four-way handshake** which can be used to close connections.
- The TCP Handshake establishes this initial connection, using the **Flags** component of the segment header.
- It uses a three-step process that takes one and a half RTTs (round-trip times).
    - 1. The client sends an empty (i.e. with no payload, or bodyless) SYN segment (i.e. where the segment header’s SYN flag is set to true, or on)
    - 2. The server receives the segment and replies with another bodyless segment with the SYN and ACK flags turned on
    - 3. The client receives the server’s segment and sends a final acknowledgment back to the server, which is a bodyless segment containing only the ACK flag turned on.
    - Immediately after that’s complete, the client begins sending the actual HTTP request (or whatever is being sent in the Message).<br><br>
- This not only ensures a reliable connection between both devices, but synchronizes sequence numbers that will be used during the connection.
- It is this aspect of TCP that enables network reliability, that is, handling data loss through message acknowledgement, and ensuring in-order delivery and de-duplication via the synchronized sequence numbers.
- A key characteristic of the process is that the sender cannot send any application data until after it has sent the ACK Segment.
- What this means in practical terms, is that there is an entire round-trip of latency before any application data can be exchanged. Since this hand-shake process occurs every time a TCP connection is made, this clearly has an impact on any application which uses TCP at the transport layer.
- This extra round trip, along with the complexity of managing the connection, contributes to the overall latency of TCP-based communication.<br><br>
- Sender sends one message at a time, with a sequence number, and sets a timeout
- If the message is received, the receiver sends an acknowledgement which uses the sequence number of the message to indicate which message was received
- When acknowledgement is received, sender sends next message in the sequence
- If acknowledgement is not received before the timeout expires, sender assumes either the message or the acknowledgement went missing and sends the same message again with the same sequence number
- If the recipient receives a message with a duplicate sequence number it assumes the sender never received the acknowledgement and so sends another acknowledgement for that sequence number and discards the duplicate
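The stop-and-wait rules above can be simulated in a few lines. This is a simplified sketch, not real TCP: sequence numbers are message indices rather than byte offsets, and which transmission attempts (or their ACKs) get lost is supplied up front as hypothetical input so the run is deterministic.

```python
def stop_and_wait(messages, lost_msgs=(), lost_acks=()):
    """Simulate stop-and-wait. lost_msgs / lost_acks hold 1-based transmission attempt
    numbers whose message (or whose ACK) the channel drops."""
    delivered = []          # what the receiving application gets
    received_seqs = set()   # sequence numbers the receiver has already seen
    attempt = 0
    log = []
    for seq, msg in enumerate(messages):
        while True:
            attempt += 1
            if attempt in lost_msgs:
                log.append(f"#{attempt} seq={seq} lost -> timeout, resend")
                continue
            if seq in received_seqs:
                log.append(f"#{attempt} seq={seq} duplicate discarded, re-ACK")
            else:
                received_seqs.add(seq)
                delivered.append(msg)
            if attempt in lost_acks:
                log.append(f"#{attempt} ACK {seq} lost -> timeout, resend")
                continue
            log.append(f"#{attempt} ACK {seq} received")
            break   # move on to the next message only after its ACK arrives
    return delivered, attempt, log

delivered, attempts, log = stop_and_wait(["a", "b", "c"], lost_msgs={2}, lost_acks={4})
print("\n".join(log))
print(delivered, attempts)
```

Even with one lost message and one lost ACK, the application receives each message exactly once and in order; the cost is extra transmissions and waiting.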

### What are the disadvantages of TCP?

- The **main downsides of TCP** are the latency overhead of establishing a connection, and the potential Head-of-line blocking as a result of in-order delivery.
    - TCP provides reliability at the cost of speed (that is, its reliability functions can contribute greatly to latency)
    - The added overhead of establishing a connection with the three-way handshake, which adds a full round trip of latency before any application data can be sent.
    - **Head-of-Line (HOL) blocking** relates to how issues in delivering or processing one message in a sequence of messages can delay or 'block' the delivery or processing of the subsequent messages in the sequence.
        - HOL blocking can occur as a result of the fact that TCP provides for in-order delivery of segments. If one of the segments goes missing and needs to be retransmitted, the segments that come after it in the sequence can't be processed, and need to be buffered until the retransmission has occurred.
        - This can lead to increased **queuing delay** which is one of the elements of latency.
- It's not as flexible as protocols like UDP
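Head-of-line blocking falls out of in-order delivery, as this small receiver-side sketch shows. The arrival order is hypothetical: segment 1 is lost and only arrives (retransmitted) at the last step, so segments 2 to 4 sit in the buffer until then.

```python
def deliver_in_order(arrivals):
    """Return (arrival step, seq) pairs showing when each segment reaches the application."""
    next_seq = 0     # next sequence number the application is waiting for
    buffer = {}      # out-of-order segments waiting for the gap to be filled
    delivered = []
    for step, seq in enumerate(arrivals):
        buffer[seq] = step
        while next_seq in buffer:        # release every contiguous segment now available
            buffer.pop(next_seq)
            delivered.append((step, next_seq))
            next_seq += 1
    return delivered

timeline = deliver_in_order([0, 2, 3, 4, 1])   # segment 1 is retransmitted last
print(timeline)   # [(0, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
```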

### What is flow control?

- Flow control is a mechanism to prevent the sender from overwhelming the receiver with too much data at once 
- Provided by TCP, flow control helps to ensure that data is transmitted as efficiently as possible.
- This, in turn, helps to mitigate the increased latency inherent in TCP connections.
- It is implemented via the **Window Size** field of the TCP segment header.
    - Each side of a connection can let the other side know the amount of data that it is willing to accept
    - The window header field contains data sent by the receiver letting the sender know the maximum amount of data it can accept at any given time.  
    - This number is dynamically generated, and therefore the receiver can lower the amount if the buffer is getting full, and the sender will respond accordingly.
    - Data awaiting processing is stored in a **'buffer'**. The buffer size will depend on the amount of memory allocated according to the configuration of the OS and the physical resources available.
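A toy simulation of flow control, assuming a fixed receive buffer and an application that drains a fixed number of bytes per round (both hypothetical). The receiver advertises its free buffer space as the window, and the sender never sends more than that.

```python
def send_with_flow_control(total_bytes, receiver_buffer, app_reads_per_round):
    """Simulate a sender that never sends more than the receiver's advertised window.

    Sizes are in bytes. One 'round' = send what the window allows, then the receiving
    application drains up to app_reads_per_round bytes from its buffer.
    """
    if app_reads_per_round <= 0:
        raise ValueError("the receiving application must read some data each round")
    buffered = 0      # bytes sitting in the receiver's buffer, awaiting processing
    sent = 0
    rounds = []       # (advertised window, bytes sent) per round
    while sent < total_bytes:
        window = receiver_buffer - buffered        # free space = advertised Window Size
        chunk = min(window, total_bytes - sent)    # never exceed the window
        sent += chunk
        buffered += chunk
        rounds.append((window, chunk))
        buffered -= min(buffered, app_reads_per_round)  # application processes some data
    return rounds

rounds = send_with_flow_control(total_bytes=10_000, receiver_buffer=4_000, app_reads_per_round=1_500)
print(rounds)  # [(4000, 4000), (1500, 1500), (1500, 1500), (1500, 1500), (1500, 1500)]
```

After the first burst fills the buffer, the window shrinks to whatever the application has freed up, so the sender slows to match the receiver.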

### What is congestion avoidance?

- Although flow control helps the sender determine how busy the receiver is, it doesn’t really say anything about how busy **the network traffic is**.
- Congestion avoidance is a service provided by TCP that attempts to prevent network congestion, a situation in which more data is being transmitted than there is capacity.
- To implement this, TCP uses data loss as a feedback mechanism to determine how "congested" the network is, by tracking how many retransmissions are required.
- A lot of data loss, or a lot of retransmissions, indicates there is more data on the network than there is capacity to process that data.
- TCP will take this as a sign to reduce the size of the transmission window, that is, it will send less data along the given channel.
- This is done to make data transmission as efficient as possible to mitigate the latency overhead inherent in TCP connections.
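TCP's real congestion control (slow start, fast retransmit, and so on) is more involved. The sketch below is a simplified additive-increase/multiplicative-decrease (AIMD) loop, measured in segments, with hypothetical loss rounds, just to show the "grow while things go well, shrink on data loss" idea described above.

```python
def aimd(loss_rounds, rounds, initial=1, increase=1):
    """Return the congestion window (in segments) used in each round."""
    cwnd = initial
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:
            cwnd = max(1, cwnd // 2)   # data loss detected: halve the window
        else:
            cwnd += increase           # no loss: probe for a little more capacity
    return history

history = aimd(loss_rounds={5}, rounds=8)
print(history)   # [1, 2, 3, 4, 5, 6, 3, 4]
```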

### What is UDP? What services does it provide?

- User Datagram Protocol (UDP) is a very simple protocol compared to TCP. It provides multiplexing (through source and destination port numbers) and ***optional*** error detection in IPv4 (through checksum), but no reliability, no in-order delivery, and no congestion or flow control.
- It provides end-to-end communication between processes at the Transport Layer, without establishing a connection.
- Unlike TCP, it doesn't provide very many reliability features (other than checksum), which it makes up for with its **speed and flexibility**
- UDP is connectionless and so doesn't need to establish a connection before it starts sending data
- Specifically, UDP is fast because it doesn't take the time to establish a dedicated connection; its lack of in-order delivery means no latency due to Head-of-Line blocking; its one-way data flow (there are no acknowledgments) avoids latency from extra round trips; and, as a connectionless protocol, it has no connection state to track.
- UDP is a base that programmers can build upon. They can add features as desired at the Application layer.
- Specifics of what type of reliability functions to include are left up to the developer to implement at the Application level.

### What is the PDU for UDP and how is it structured?

- **Datagram**
    - headers:
        - source port and destination port which provides for multiplexing and socket routing
        - length
        - checksum
    - data payload:
        - encapsulated HTTP request/response (or other type of request/response)
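The UDP header is just four 16-bit fields, which makes it easy to build and parse with Python's `struct` module. A minimal sketch with hypothetical port numbers and payload; the checksum is set to 0, which in IPv4 means "no checksum computed".

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")   # source port, destination port, length, checksum

def build_datagram(src_port, dst_port, payload, checksum=0):
    length = UDP_HEADER.size + len(payload)   # 8-byte header + data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_datagram(datagram):
    src, dst, length, checksum = UDP_HEADER.unpack(datagram[:8])
    return {"src_port": src, "dst_port": dst, "length": length,
            "checksum": checksum, "payload": datagram[8:length]}

# Hypothetical datagram from an ephemeral port to a DNS server (port 53)
dgram = build_datagram(50000, 53, b"example DNS query bytes")
parsed = parse_datagram(dgram)
print(parsed)
```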

### What are some use cases for UDP and TCP?

**TCP**
- Any form of communication that requires reliability and integrity, like processing credit card payments, loading web pages, or transferring files, is done through TCP segments.

**UDP**

- Video calling (Zoom) applications and online games (first-person shooter games) that prioritize speed and low latency/lag over the potential for small amounts of lost data, can utilize UDP.
- Any form of communication that requires receiving continuous updates or streams of data with no need to worry about the occasional dropped bits, such as videoconferencing (e.g. Zoom) or voice-over-ip (VOIP) is accomplished through UDP datagrams.
- DNS queries are also usually done through UDP because of its faster speed.

### What is pipelining?

- Sending multiple messages at once without waiting for an acknowledgment, to maximize bandwidth use. 
- The receiver still sends acknowledgements, and retransmission can still occur, so our system is still reliable.
- This is necessary because if each transmission must stop and wait for an acknowledgement, too much time is spent waiting for the receiver's ACK, which contributes greatly to latency overhead.
- The sender implements a 'window' (limited by the **Window Size** field in the receiver's segment headers) representing the maximum number of messages that can be in the 'pipeline' at any one time. Once it has received the appropriate acknowledgements for the messages in the window, it moves the window on.
- The sender doesn’t move on to the next `n` segments until it has received an acknowledgment from the receiver for those `n` segments
- If sufficient time has passed and the sender has still not received an acknowledgment back (**Acknowledgement #**) on any segment, the sender actually resends that segment (**retransmission of lost data**)
- Any segment already received (i.e. with the same **Sequence #** as an already-received segment) is simply dropped (known as **de-duplication**).
- The advantage of this pipelined approach is its more efficient use of available bandwidth. Instead of wasting lots of time just waiting for acknowledgements, more time is spent actually transmitting data.
- Pipelining transmissions can mitigate the latency added by additional waiting time.
- Ensures TCP is reliable but also as efficient as possible
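A back-of-the-envelope calculation shows why pipelining helps. It assumes no loss and that a full window of segments can be sent back to back within one round trip (transmission delay ignored); the segment count and window size are hypothetical.

```python
import math

def round_trips(num_segments, window):
    """RTTs needed to get all segments acknowledged, assuming no loss and that a full
    window of segments can be sent back to back within one round trip."""
    return math.ceil(num_segments / window)

stop_and_wait_rtts = round_trips(10, window=1)  # one segment in flight at a time
pipelined_rtts = round_trips(10, window=4)      # up to 4 segments in flight at once
print(stop_and_wait_rtts, pipelined_rtts)       # 10 3
```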

### What are the 4 key elements needed to guarantee network reliability?

- **Network Reliability** ensures that a reliable communication channel is established between processes.
    - That is, that all transmitted data is received at the communication end-point in the correct order.
    - Consists of 4 key elements:
        - **In-order delivery**: data is received in the order that it was sent (sequence numbers)
        - **Error detection**: corrupt data is identified using a checksum
        - **Handling data loss**: missing data is retransmitted based on acknowledgements and timeouts -> ACK
        - **Handling duplication**: duplicate data is eliminated/deleted through the use of sequence numbers
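For the error-detection element, IP, TCP, and UDP use the 16-bit one's-complement "Internet checksum" (RFC 1071). A minimal sketch, checked against the worked example in RFC 1071; the sample payload used to show corruption is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used by IP, TCP and UDP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # add 16-bit big-endian words
        total = (total & 0xFFFF) + (total >> 16)     # fold any carry back into the low 16 bits
    return ~total & 0xFFFF

rfc_example = internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))
print(hex(rfc_example))  # 0x220d

# The receiver recomputes the checksum; a single changed byte makes it disagree
sent = b"hello, world!"
checksum = internet_checksum(sent)
corrupted = b"hellp, world!"
print(internet_checksum(corrupted) == checksum)  # False -> corruption detected
```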

# <mark>DNS</mark>

## What is DNS?

- DNS or the Domain Name System is a distributed database which translates/maps domain names like `google.com` to an IP address (like `216.58.213.14`), so that the IP address can then be used to make a request to the server.
- It exists for convenience: people find domain names easier to remember than sequences of numbers.

## How does it work?

- DNS databases are stored on special devices called DNS servers.
- Each of these is a member of a huge hierarchical system and contains only a part of the database that maps domain names to IP addresses.
- Whenever you use your browser to go to a specific website, your browser application needs to first figure out which server would have the correct website information.
- Every time you go to a website you’ve never been to before, your browser first has to look up the IP address associated with that domain.
- When a domain name is entered into a browser address bar, if your device already has a record of the IP address for the domain name in its DNS cache, it will use this cached address. 
- If the IP address isn't cached, a DNS request will be made to the Domain Name System to obtain the IP address for the domain.
- If the DNS server that receives the request does not have a record for the domain name, the request is passed up the hierarchical system to other DNS servers until one that has the record is found.
- Your browser will then cache (i.e. save) that domain-to-IP address key-value pair so that it won’t have to look it up again the next time you enter that domain.
- DNS then hands that IP address to the lower-level protocols that are responsible for routing the HTTP request to the proper location. The packaged-up HTTP request then goes over the Internet, where it is directed to the server with the matching IP address.
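The "check the cache first, only ask DNS on a miss" behaviour can be sketched as a small wrapper. By default it calls Python's `socket.gethostbyname` (which asks the OS resolver and needs network access for real domains); the class name is hypothetical, and real DNS caches also expire entries according to each record's TTL, which this sketch omits.

```python
import socket

class CachingResolver:
    """Hypothetical helper: answer from a local cache, query the resolver only on a miss."""

    def __init__(self, lookup=socket.gethostbyname):
        self.lookup = lookup   # function mapping a domain name to an IP address string
        self.cache = {}
        self.misses = 0

    def resolve(self, domain):
        if domain in self.cache:
            return self.cache[domain]    # cached: no DNS request needed
        self.misses += 1
        ip = self.lookup(domain)         # with the default, this triggers a real DNS lookup
        self.cache[domain] = ip          # remember the domain-to-IP pair for next time
        return ip

# Offline usage with a fake lookup table instead of real DNS
fake_dns = {"google.com": "216.58.213.14"}
resolver = CachingResolver(lookup=fake_dns.__getitem__)
first = resolver.resolve("google.com")
second = resolver.resolve("google.com")
print(first, second, resolver.misses)   # 216.58.213.14 216.58.213.14 1
```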

## What is it concerned with?

- It is a service that allows us to utilize user friendly domain names rather than hard to recall strings of numbers like IP addresses.

# <mark>URLs</mark>

## What is an URL? What is its purpose in the context of network communication?

- A URL (Uniform Resource Locator) is a consistently formatted string that allows us to locate a certain resource on the web.
- It provides us with a systematic means of locating resources that we are requesting (via an HTTP request).
- It consists of the scheme, host, path, port number, and any query strings that we wish to include.

## What is URL a subset of?

- A URI or Uniform Resource Identifier is an identifier for a particular resource within an information space.
    - For example, ISBN for a publication
- URL refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").
- A URL is a specific type of URI that provides the complete address of a resource on the internet. It includes:
    - the protocol used to access the resource (such as `HTTP`, `FTP`, or `HTTPS`)
    - the domain name or IP address of the server hosting the resource
    - the path to the resource on that server.
- A URL, ***unlike*** a generic URI, must include some piece of data that allows us to locate the resource in question; a URI does not have this requirement.
- A URI, on the other hand, is a broader term that refers to any string of characters that identifies a resource on the internet. This includes URLs as well as other types of identifiers, such as URNs (Uniform Resource Names).

## What are the components of an URL, and what purpose do they serve?

`Example URL: https://app.coderpad.io:1234/KAFWN7FJ?hello=world&coder=pad`

- URL components include the:
    - **scheme**: tells the web client how/which protocol to use to access the resource.
        - `https`
        - The first part of the URL
        - A scheme is different from a protocol, although these terms are sometimes used interchangeably
        - The scheme identifies which protocol should be used to access the resource, but not the specific version
        - Schemes and protocols can be differentiated by their case; the convention is to refer to scheme names in lowercase, e.g. http, and protocol names in uppercase, e.g. HTTP.
        - It is a mandatory component of the URL
    - **host (or hostname)**: It tells the client where the resource is hosted or located.
        - `app.coderpad.io`
        - This is written in the format of a domain name.
        - DNS takes this human readable domain and finds the equivalent IP so the request can be routed.
        - It is a mandatory component of the URL
    - **port**: an identifier for the specific process to which the communication should be routed.
        - `1234`
        - It is only required if you want to use a port other than the default.
        - The default port is 80 for HTTP and 443 for HTTPS.
    - **path**: It shows what local resource is being requested from the host.
        - `/KAFWN7FJ`
        - This part of the URL is optional.
        - If the resource in question is a home page, the path might consist of a single forward slash (`/`).
        - This is one of the ways you can pass information to the application server via the URL.
        - Historically, the path has indicated specifically where the resource was located on the server, but with the proliferation of dynamically generated content, this no longer always follows the absolute file path of the server.
    - **query string/parameters**: passes additional information in the form of specially formatted query parameters to the server.
        - `?hello=world&coder=pad`
        - made up of query parameters. It is used to send data to the server. This part of the URL is also optional.
        - Query strings are used to pass additional data to the server during an HTTP Request. They take the form of name/value pairs separated by an `=` sign. Multiple name/value pairs are separated by an `&` sign. The start of the query string is indicated by a `?`.
        - Because query strings are passed in through the URL, they are mostly used with HTTP `GET` requests.
        - Query strings are limited in use in that they have a maximum length, are not suitable for sensitive information as they are plainly visible in the URL, and spaces and special characters like `&` cannot be used literally in them: they must be URL encoded.
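Python's `urllib.parse.urlsplit` breaks the example URL into the same components described above. The default-port table at the end is an assumption added for illustration, reflecting the 80/443 defaults mentioned for HTTP and HTTPS.

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("https://app.coderpad.io:1234/KAFWN7FJ?hello=world&coder=pad")
print(parts.scheme)            # https
print(parts.hostname)          # app.coderpad.io
print(parts.port)              # 1234
print(parts.path)              # /KAFWN7FJ
print(parts.query)             # hello=world&coder=pad
print(parse_qs(parts.query))   # {'hello': ['world'], 'coder': ['pad']}

# When the port is omitted, the client falls back to the scheme's default port
DEFAULT_PORTS = {"http": 80, "https": 443}
no_port = urlsplit("https://app.coderpad.io/KAFWN7FJ")
effective_port = no_port.port or DEFAULT_PORTS[no_port.scheme]
print(effective_port)          # 443
```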

## What is URL encoding, and why is it necessary?

- URL encoding is a special technique that replaces characters that aren't allowed in a URL with an ASCII code.
- URLs are designed to accept only certain characters in the standard 128-character ASCII character set.
- URL encoding is used if 
    - a character has no corresponding character in the original ASCII set
    - is unsafe because it can be misinterpreted or modified by some systems (i.e. `%`, spaces, quotation marks, the `#` character, `<` and `>`, `{` and `}`, `and`, and `~`)
    - or the character is reserved for special use within the URL (such as `?`, which indicates the beginning of the query string, or `&`, which separates query parameters; also `/`, `:`, `@`)
- URL encoding replaces these non-conforming characters with a `%` symbol followed by two hexadecimal digits for each byte of the character's UTF-8 encoding, so a multi-byte character becomes several `%XX` sequences.
- Only alphanumeric characters, the special characters `$-_.+!*'(),`, and reserved characters used for their reserved purposes can be used unencoded within a URL.
- If a reserved character is not being used for its reserved purpose, it has to be encoded.
- We need a safe way to represent these characters in a URL because using them literally can "break" the URL, in that it will no longer be able to locate the resource in question.

**Examples**

- Space -> `%20`
- `$` -> `%24`
- `£` -> `%C2%A3`
- `€` -> `%E2%82%AC`
- `𐍈` -> `%F0%90%8D%88`<br><br>
- There are two encodings that are commonly used to represent spaces within the query string of a URL: `%20` and `+`. You should generally follow this convention and convert all spaces in a query to one of these two encodings.
    - Some sites may accept `_` and `&` as a space, but it's non-standard behavior and isn't commonly used.
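Python's `urllib.parse` reproduces the examples above: `quote` percent-encodes each UTF-8 byte, while `quote_plus` and `urlencode` use the `+` convention for spaces in query strings. The sample query value is hypothetical.

```python
from urllib.parse import quote, quote_plus, urlencode, unquote

examples = [" ", "$", "£", "€", "𐍈"]
encoded = [quote(ch) for ch in examples]
print(encoded)   # ['%20', '%24', '%C2%A3', '%E2%82%AC', '%F0%90%8D%88']

print(quote_plus("hello world"))           # hello+world  (space as '+')
query = urlencode({"q": "fish & chips"})   # '&' inside a value must be encoded
print(query)                               # q=fish+%26+chips
print(unquote("%E2%82%AC"))                # €
```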

## Construct a valid URL

- Request a resource using HTTP from the domain ginni.com that is called 'my_resource.md' and limit the responses to those that contain only notes items.
    - `http://www.ginni.com/my_resource.md?item_category=notes`<br><br>
- Request a resource using HTTP from the localhost with port number 88. The resource is called hello.md.
    - `http://localhost:88/my_folder/hello.md`

# <mark>HTTP and the Request/Response Cycle</mark>

## HTTP

### What is HTTP? What is its purpose?

- HTTP or Hypertext Transfer Protocol is a system of rules, a protocol, that serves as a link between applications and the transfer of hypertext documents.
- HTTP operates at the application layer of networked communication and is concerned with structuring the messages that are exchanged between applications.
- HTTP is a Request-Response protocol; it determines how requests for resources on the web are made, as well as how those requests should be responded to.
- It provides uniformity to the way resources are transferred. In other words, it is an agreed-upon format on how to communicate.
- It determines how requests are made and how they are responded to with regards to resources on the web.
- HTTP is based on the client-server paradigm, in which a client (usually some kind of browser) makes a request through the network for a particular web resource stored on a server.
- In order for the protocol to work, the Request and Response must be structured in such a way that both the client and the server can understand them.<br><br>
- Once the content of segments or datagrams is sent to the correct port, the Application layer is finally ready to read the encapsulated data, which is a **Message** (sometimes also simply called **Data**).
- A **Message** is an Application-layer PDU that can contain many different things, depending on what Application process runs on that port, and/or what protocol was used to encapsulate the message in the first place.
- A Message can be, among other things: 
    - an HTTP request
    - an HTTP response
    - a DNS query
    - a DNS response
    - an SMTP message
    - a TLS record (used for HTTPS).
- Each Application process runs on a dedicated port. That’s why this layer is known as providing a **process to process connection**.

### High level explanation as to what HTTP consists of?

- A single HTTP message exchange consists of a Request and a Response. The exchange generally takes place between a Client and a Server. The client sends a Request to the server and the server sends back a Response.<br><br>
- HTTP request
    - aimed to access a resource on the server
    - client to the server
- HTTP response
    - response to the client's request
    - server to the client
- Together they form a request/response cycle<br><br>
- HTTP is based on the client-server paradigm, in which a client (usually some kind of browser) makes a request through the network for some kind of resource that's stored on a server.
- The server, then, sends a response to this request that ideally contains the requested resource, or if not, some kind of messaging that explains what happened.
    - It either provides the client with the requested resource, informs the client that the requested action has been carried out, or informs the client that an error occurred in the process.
- HTTP governs the syntax of these messages, which together make up the request/response cycle.

### What is the client-server model? How does this tie in with an HTTP request/response cycle?

- The client-server model is one in which the two devices transmitting data over the network each have a certain role
- The client is generally some kind of web browser. It is responsible for issuing HTTP requests and processing the responses so that they are readable for humans (such as rendering the HTML of a webpage).
- The server is a remote computer capable of handling inbound requests. Its job is to issue an HTTP response, which ideally will contain the resource requested by the client, or if not, some kind of messaging that explains what happened.
- The server in this model is not limited to a single device; in reality it refers to all the server-side infrastructure that processes the requests and provides the responses.

### What are the different components of server-side infrastructure?

- **Web server**: A web server is typically a server that responds to requests for static assets: files, images, CSS, JavaScript, etc. These requests don't require any data processing, so they can be handled by a simple web server.
    - They can merely deliver prepackaged content without having to handle business logic.
    - Most landing pages, basic company websites, news sites, personal pages, and blogs are hosted by a web server. Web servers probably account for the largest chunk of all server types.
- **Application server**: An application server, on the other hand, is typically where application or business logic resides, and is where more complicated requests are handled. This is where your server-side code lives when deployed.
    - It is the software program that runs your server-side code.
    - Although a personal blog might be hosted by a basic web server that can easily deliver read-only content, there might be an underlying application (e.g. Wordpress) that needs to not just read, but also actually generate, edit, and delete new static content.
- **Data store / database server**: The application server will often consult a persistent data store, like a relational database, to retrieve or create data. Data stores can also be simple files, key/value stores, document stores and many other variations, as long as they can save data in some format for later retrieval and processing.
    - There are two main types of database structures used in database design: either a relational model (usually SQL), or a non-relational model (usually the document model)

### What is a resource? What is its role in the general scheme of networked communication?

- Resource is a generic term for any number of things that a user interacts with on the internet that can be retrieved with a URL.
- This can include:
    - images
    - videos
    - web pages
    - files
    - software
    - games
- Resources are what make up the web
- There are no limits to the number of resources
- A resource is the thing that we, the user, interact with via the client (browser)
- It is that to which all the layers of the networked communication model provide their various services, so that we can see/hear/click/interact with the remote resource from any location via the internet.

### What are the differences when using a web browser versus an HTTP tool?

- When using an HTTP tool, a new request for a redirected resource is typically not issued automatically, unlike with a browser (for example, `curl` only follows redirects when asked to with `-L`)
- When using an HTTP tool, the body of the response is not rendered in a user-friendly fashion as it is in a browser; instead, the raw contents (such as HTML) are displayed
    - browser will request all referenced resources
    - HTTP tool will not
- A browser is like Bundler: it manages all the dependencies (the resources a page references) for you, and an HTTP tool does not
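To make the difference concrete, here is a minimal Python sketch that behaves like a bare HTTP tool. It starts a throwaway local server (the `RedirectingHandler` class, the `/old` and `/new` paths, and the `lion.png` reference are all hypothetical) and sends requests with the standard library's `http.client`, which, unlike a browser (or `urllib.request`), does not follow redirects or fetch referenced resources.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /old redirects to /new, any other path returns HTML."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # The page references another resource, which a browser would also fetch.
            body = b"<html><body><img src='/lion.png'></body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def raw_get(port, path):
    """Act like a bare HTTP tool: send one request and report exactly what came back."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", path)
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Location"), resp.read())
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# The tool reports the 302 and its Location header but does NOT follow it.
status, location, body = raw_get(port, "/old")
# Requesting /new by hand returns raw HTML; /lion.png is never requested.
status2, _, body2 = raw_get(port, "/new")

server.shutdown()
server.server_close()
```

A browser given `/old` would have issued a second request to `/new`, then a third for `/lion.png`, and rendered the result.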

### Describe statelessness and its tradeoffs

**State**

- A "stateful" web application is one that maintains knowledge of past interactions
- This might include keeping track of individual user accounts and maintaining a "logged in" status across multiple resource requests and refreshes.
- When your e-mail client identifies you by name and displays some kind of customized greeting, this is also an aspect of "state"
- When you go to Facebook, for example, and log in, you expect to see the internal Facebook page. That was one complete request/response cycle. You then click on a picture (another request/response cycle), but you do not expect to be logged out after that action.
- Session IDs can be generated even if a user is not logged into a website. Consider the example of a website with a shopping cart whose state persists even if you don't have an account on the website. With the session ID handy, a server can retrieve stored session data, which allows it to identify a client and provide a stateful web experience for the user.
- Statefulness can be simulated through techniques which use **session IDs**, **cookies**, and **AJAX**.
    - There's also a 4th approach: **sending stateful data as query parameters** when making a request. This approach used to be nearly universal, but is mostly gone from all modern web sites.

**Stateless**

- HTTP is a stateless protocol. This means that each request/response cycle is independent of the requests and responses that came before it or that come after it.
    - Each cycle has no effect on the preceding or subsequent cycles
    - Each request should contain all the information necessary for the request to be fulfilled.
- No information is kept on the server between request/response cycles
- Stateless protocols are resilient, fast, and flexible 
    - The server doesn't have to retain any information between each request/response cycle (resilient)
    - Nor does any part of the system have to perform any clean up (speed)
- However, because HTTP is stateless, it can be very difficult to simulate a stateful experience and make it seem, as many modern web apps do, that a persistent connection exists.
- This statelessness is what makes HTTP and the internet so distributed and difficult to control, but it's also the same ephemeral attribute that makes it difficult for web developers to build stateful web applications.

### What are mechanisms that are used to simulate state?

**Sessions**

- One way to maintain a sense of statefulness is to have the server send some form of a unique token to the client. 
- Whenever a client makes a request to that server, the client appends this token as part of the request, allowing the server to identify clients.
- We call this unique token that gets passed back and forth the **session identifier (id)**.
- This mechanism of passing a `session id` back and forth between the client and server creates a sense of persistent connection between requests.
- This sort of faux statefulness has several consequences:
    - This can be difficult to maintain because each HTTP request must be analyzed for a `session id`.
    - Furthermore, each `session id` must be validated, and the server must establish procedures for handling invalid ids; the server also needs to retrieve the session data based on the `session id`
    - If a `session id` is valid, the server needs to store and retrieve data associated with each `session id`, as well as recreate the application state from that data when sending back a response
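The session mechanism described above can be sketched in a few lines of Python. This is a simplified, in-memory illustration rather than how any particular framework does it; the `sessions` dictionary, the `handle_request` function, and the shopping-cart data are hypothetical.

```python
import secrets

# Hypothetical in-memory session store: session id -> session data.
sessions = {}


def handle_request(session_id):
    """Return (session_id, session_data) for an incoming request.

    A missing or unknown id is treated as invalid, so a brand-new session is created.
    """
    if session_id in sessions:
        # Valid id: retrieve the stored state so the response can reflect it.
        return session_id, sessions[session_id]
    # Invalid or missing id: issue a new, hard-to-guess token.
    new_id = secrets.token_urlsafe(16)
    sessions[new_id] = {"cart": []}
    return new_id, sessions[new_id]


# First request: no session id yet, so the server creates one.
sid, data = handle_request(None)
data["cart"].append("book")

# Later request carrying the same id: the server recovers the cart.
same_sid, same_data = handle_request(sid)
```

Every request still stands alone at the HTTP level; the "memory" lives entirely in the server-side store, reached through the id the client sends.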
    
**Cookies or HTTP cookies**

- Cookies are a way for the browser to store data sent from the server that helps maintain the appearance of persistent application state. They work in conjunction with `session id`s.
- Cookies are a way we can optimize sessions since we can conveniently store cookies in a browser.
- A piece of data that's sent from the server and stored in the client (browser) during a request/response cycle.  It contains information about the `session id`
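- They are small pieces of data stored by the browser, and they contain the `session id` rather than the session data itself.
- Cookies that are given an expiration date (persistent cookies) are kept even after the browser is closed or shut down, which enables a longer and more consistent appearance of state.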
- Session data is generated and stored on the server-side and the `session id` is sent to the client in the form of a cookie.
- The information stored in cookies is sent with each request to the server, then used to "unlock" the correct stored session data
- The client side cookie is compared with the server-side session data on each request to identify the current session.
- This allows the server to recreate the correct state of the application, and the `session id` to be recognized each time a website is visited, even if some time has passed.  When you visit the same website again, your session will be recognized because of the stored cookie with its associated information.
- The `session id` is stored on the client, and it is used as a "key" to the session data stored server side.<br><br>
- In fact, this is what many web applications with authentication systems do. When a user's username and password match, the `session id` is stored on their browser so that on the next request they won't have to re-authenticate.<br><br>
- ***The session token is something that gets sent back to the client's browser as part of the cookie. The app uses the token to store and track session-relevant information on the server. On subsequent requests, the client browser sends the cookie (which includes the session token) back to the app. The app uses it to determine whatever stateful information it is tracking***.
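Here is a rough sketch, using Python's standard `http.cookies` module, of how a session id might travel in cookie headers. The cookie name `session_id` and the value `abc123` are hypothetical placeholders; a real server would generate an unguessable id.

```python
from http.cookies import SimpleCookie

# Server side: send the session id to the browser in a Set-Cookie header.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"             # hypothetical id
outgoing["session_id"]["httponly"] = True     # page JavaScript cannot read it
outgoing["session_id"]["max-age"] = 3600      # persistent: kept for an hour
set_cookie_header = outgoing.output(header="Set-Cookie:")

# Browser side (simulated): later requests echo the cookie back.
cookie_header = "session_id=abc123"

# Server side: parse the Cookie header to find which session to load.
incoming = SimpleCookie()
incoming.load(cookie_header)
session_id = incoming["session_id"].value
```

Only the id crosses the network; the server then uses `session_id` as the key into its own session store.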

**AJAX or Asynchronous JavaScript and XML**

- Its main feature is that it allows browsers to issue requests and process responses without a full page refresh.
- AJAX requests are performed asynchronously: the page stays usable while the browser waits for the response, and the page doesn't refresh.
- Modern web pages tend to be fairly complex, including dynamically generated content as well as many resource dependencies.
- Therefore, it behooves us to have a means of responding to both server data and user actions without having to refresh and reload the whole page.
- AJAX enabled this functionality, allowing the client to send and retrieve information in small pieces that can be used to update the state of an application without refreshing/reloading, making it much easier to maintain state.
- AJAX requests are sent like normal HTTP requests, and the server responds to them with a normal HTTP response.
- Instead of the browser refreshing to process the HTTP response, it will process the response with a **callback** function (which is usually some client-side JavaScript code), which can update the state of the web app.

## HTTP Requests

### What is an HTTP request? What does it consist of?

- An HTTP request is a text-based message sent from the client to the server with the aim of accessing a resource on the server.
- Entering something into the browser address bar, clicking a link, submitting a form, or any number of other user interactions with resources on the web can trigger the sending of an HTTP request.
- It consists of a **request line**, **headers**, and an **optional body**.
- The HTTP **request line** contains the **HTTP method**, **path** <mark>(usually just the path and any query string; the full URL is mainly used when the request is sent to a proxy)</mark>, and **HTTP version**
    - The **method** indicates what kind of action the request is performing (for example, `GET`, `POST`, `PATCH`, or `DELETE`). This is required.
    - The **path** indicates where to find the particular resource locally within the server.  This is required.
        - Any query parameters used in a `GET` request are included in the Path of the Request Line. 
        - For `POST` requests submitted from an HTML form, however, the form's parameters are separated out and sent as part of the Body.
        - Technically speaking the 'path' portion of the request-line is known as the 'request-URI', and incorporates the actual path to the resource and the optional parameters if present. In practice, most people simply refer to this part of the request-line as the 'path'.
        - If a path isn't provided as part of a URL, a client (such as a web browser) will automatically set the path to `/`
    - The **version** tells us which version of HTTP is being used (e.g. 1.0, 1.1, 2). As of HTTP 1.0, the HTTP version also forms part of the request-line.
    - The three items in the Request Line are separated by a **space**.
- **HTTP Headers**: allow the client and the server to send additional information during the HTTP request/response cycle.
    - a way to give more information about both the client and the resource that is being requested
    - Supplemental information about the request that provides useful details to the server.
    -  They act as metadata that provides supplemental information about the request to aid the server in processing the request.
    - Headers are colon-separated name-value pairs that are sent in plain text.
    - The **host** header:
        - has been required since HTTP 1.1
        - indicates which host (domain name, and optionally port) the request is for, since a single server may serve many hosts
    - All other headers are optional
    - Other headers might include:
        - `Accept-Language` fields about what languages are accepted by the client
        - `User-Agent` a specially formatted string that identifies the client software, such as the browser name and version and the operating system
        - `Cookie` information about cookies that help applications maintain the appearance of state
        - `Connection` what type of connection the client prefers (such as `keep-alive`)
    - With HTTP/1.1, an **empty line** separates and delineates the Header from the Body.
- **HTTP body** - the body contains the data that is being transmitted in an HTTP message and is optional. 
    - In other words, an HTTP message can be sent with an empty body. When used, the body can contain HTML, images, audio and so on.
    - What the body looks like depends on the request method used
    - The body is mainly used with a `POST` request, which is used to send data to the server
    - It would contain the query parameters of a POST request (if there are any).<br>
```
URL: https://twitter.com/home

HTTP Request:

GET /home HTTP/1.1
Host: twitter.com
[Other headers]
{empty line}
{optional body}
```

OR

```
GET /lion.html HTTP/1.1
Host: localhost:2345
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
{empty line}
{optional body}
```
OR

```
POST /login HTTP/1.1
Host: example.com
Content-Type: application/json

{
  "username": "johndoe",
  "password": "pa$$w0rd"
}
```

- When specifying HTTP/1.1 as the version, we need to include an empty line after the headers (or directly after the request line if there are no headers) in order to indicate where the initial part of the message (request line and headers) ends!
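The request structure described above (request line, headers, empty line, optional body) can be illustrated with a small Python parser. This is a simplified sketch for learning purposes: it assumes a well-formed HTTP/1.1 message using `\r\n` line endings, which is what HTTP/1.1 actually puts on the wire, and it ignores details such as repeated headers.

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request into its request line, headers, and body."""
    # The first empty line (\r\n\r\n) marks the end of the request line and headers.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: three items separated by spaces.
    method, path, version = lines[0].split(" ")

    # Headers: colon-separated name/value pairs. Header names are
    # case-insensitive, so they are stored in lowercase here.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}


raw_request = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"username": "johndoe", "password": "pa$$w0rd"}'
)
request = parse_request(raw_request)
```

Splitting each header only at the first colon matters for values like `Host: localhost:2345`, whose value itself contains a colon.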

### What are GET and POST requests? What are their use cases?

**GET Requests**

- **GET requests**: Used to retrieve a resource from the server or load a web page
- initiated by clicking a link or via the address bar of a browser to load a web page.
- The response from a GET request can be anything, but if it's HTML and that HTML references other resources, your browser will automatically request those referenced resources. A pure HTTP tool will not.

**POST requests**

- **POST request**: Used when you want to initiate some action on the server (server side action), send data to a server, or for logging in.
    - or create/update a resource on the server: Sending a username/password is creating a session (which is a resource)
- Typically from within a browser, you use POST when submitting a form or other information, such as user credentials when logging in
- Without `POST` requests, we are limited to sending data to the server via query strings
- Using a `POST` request in a form fixes the problem of exposing credentials in the URL query string. With a `POST` request, we can send more sensitive data such as a username or password (although, without HTTPS, the body is still sent as plain text).
- `POST` requests also help sidestep the query string size limitation that you have with GET requests. With POST requests, we can send significantly larger forms of information (such as images or videos) to the server.
- Search forms are a notable exception to this rule: they often use `GET` since they are not changing any data on the server, only viewing it.
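The difference in where the data goes can be sketched by building the two raw requests in Python with `urllib.parse.urlencode`. The host, paths, and parameter names are hypothetical; `application/x-www-form-urlencoded` is the encoding HTML forms use by default.

```python
from urllib.parse import urlencode


def build_get(host, path, params):
    """GET: parameters travel in the query string of the path (visible in the URL)."""
    return f"GET {path}?{urlencode(params)} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def build_post(host, path, params):
    """POST: parameters travel in the body; Content-Length gives its size in bytes."""
    body = urlencode(params)
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body.encode())}\r\n"
        "\r\n"
        f"{body}"
    )


search = build_get("example.com", "/search", {"q": "lion", "page": "2"})
login = build_post("example.com", "/login",
                   {"username": "johndoe", "password": "pa$$w0rd"})
```

The password stays out of the request line in the POST version, but it is still readable text in the body, which is why HTTPS is needed as well.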

## HTTP Responses

### What is an HTTP response? What does it consist of?

- HTTP Responses are text-based messages sent from the server to the client with the aim of responding to the client's request.
- Raw data returned by the server is called a response.
- They either:
    - Provide the client with the resource requested
    - Inform the client that the action it requested was carried out
    - Inform the client that an error occurred in the process
- It consists of a **status line**, **optional headers**, and an **optional body**.
- The **status line** contains the **HTTP version**, **status code**, and **status text**, in that order
    - The status code is a three digit number indicating the specific status of the response, i.e. whether or not it was successful
    - It is accompanied by the status text that tells the status of the response
    - The three items in the Status Line are separated by a **space**.
- HTTP response **headers** contain additional information about the response.  These are optional.
    - Supplemental information about the response provided by the server that provides useful details to the client.
    - They act as metadata that provides supplemental information about the response to aid the client in processing the response.
    - Headers are colon-separated name-value pairs that are sent in plain text.
        - `Content-Encoding` information about the type of encoding used on the data
        - `Server` the name of the server
        - `Location` the new location of a resource, if applicable, which helps the client redirect to the requested resource if it has been moved
        - `Content-Type` the media type of the resource being sent in the message (e.g. `text/html; charset=utf-8`), which helps the client correctly render the data in a user-friendly way.
        - `Content-Length` 
            - One issue with processing a response is knowing where the response message ends. 
            - There are various ways in which this can be determined, depending on the type of message.
            - Certain responses, such as `1xx` level responses, `204`, and `304` responses and any response to a `HEAD` request, must NOT include a body. For these types of responses, the end of the response message is indicated by the first empty line after the header fields.
            - For messages with a body, the size of the message body can be used to determine the end of the message.
            - One way to indicate the size of the body is to include a `Content-Length` header.
            - The `Content-Length` entity header indicates the size of the entity-body, in bytes, sent to the recipient.  This can help determine where the HTTP message should end.
    - With HTTP/1.1, an **empty line** separates and delineates the Header from the Body.
- The HTTP response **body** consists of the raw data for the requested resource.  This is optional.
    - This might be the HTML of the webpage, or the raw data of any files being requested, such as images, videos, or audio files<br>
```
HTTP/1.0 200 OK
Date: Mon, 02 Nov 2015 04:12:19 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
{other headers}

{body}
```

- Other status line examples:
    - `HTTP/1.1 400 Bad Request`
    - `HTTP/1.1 404 Not Found`
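The role of `Content-Length` in finding where a response ends can be sketched in Python. This simplified example assumes two responses arrive back to back on one kept-alive connection, and it treats a missing `Content-Length` as meaning no body; real HTTP/1.1 clients must also handle chunked transfer encoding and connection close, which this sketch ignores.

```python
def split_response(stream):
    """Split the first HTTP response off the front of a byte stream.

    Returns (status_line, headers, body, leftover_bytes).
    """
    # The status line and headers end at the first empty line.
    head, _, rest = stream.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Content-Length says exactly how many body bytes belong to this message.
    # Simplification: no Content-Length is treated as an empty body.
    length = int(headers.get("content-length", "0"))
    return status_line, headers, rest[:length], rest[length:]


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
)
first_status, first_headers, first_body, remaining = split_response(stream)
second_status, _, second_body, _ = split_response(remaining)
```

Without the length, the client could not tell whether `HTTP/1.1 404 ...` was part of the first body or the start of the next message.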


### Give me some examples of status codes and what they mean?

- **Status Codes**: three-digit numbers that are part of the status line in an HTTP Response. They indicate the status of the request. There are various categories of status code:
    - **200 OK**: the request was successfully handled, and the resource has been transmitted.  All 200 level response codes indicate success
    - **302 Found**: When your browser sees a response status code of 302, it knows that the resource has been moved, and will automatically follow the new re-routed URL in the `Location` response header.
        - All 300 level status codes indicate some kind of redirect status
        - When the browser receives the 302 response, it will automatically issue an HTTP request to the updated URL provided in the `Location` header.
        - This, ideally, will result in the HTTP 200 OK response so that the browser can render the resource for the user.
    - **404 Not Found**: The server returns this status code when the requested resource cannot be found due to a client error with the request
        - All 400 level status codes indicate various client errors
    - **500 Internal Server Error**: A 500 status code says "there's something wrong on the server side".
        - Indicates a generic server-side error took place while trying to retrieve the requested resource.
        - All 500 level status codes indicate server side errors.
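Because the first digit of a status code determines its category, classifying a code is simple arithmetic. A small Python sketch follows; it uses the standard library's `http.HTTPStatus` enum to look up the status text, while the `describe` function and category labels are hypothetical.

```python
from http import HTTPStatus

# The first digit of the code determines its category.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return a summary like '404 Not Found: client error' for a status code."""
    category = CATEGORIES.get(code // 100, "unknown")
    try:
        text = HTTPStatus(code).phrase  # standard status text for registered codes
    except ValueError:
        text = "(unregistered code)"
    return f"{code} {text}: {category}"
```

For example, `describe(302)` combines the status text `Found` with the `redirection` category.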

### How are 300 level responses handled by browsers?

- When an HTTP response has a 300 level redirect status code (such as `301` or `302`), a `Location` header is included in the response headers
- This indicates the new location of the resource in question with the updated URL
- When the browser receives the 300 level response, it will automatically issue an HTTP request to the updated URL provided by the `Location` header.
- This, ideally, will result in the HTTP 200 OK response so that the browser can render the resource for the user
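A browser's redirect handling can be sketched as a loop. To keep the example self-contained, the server is replaced by a hypothetical in-memory table (`FAKE_SERVER`), and `browser_get` is an illustrative name, not a real browser API. `urllib.parse.urljoin` resolves a relative `Location` value against the current URL, and the redirect limit is an assumed safeguard against redirect loops.

```python
from urllib.parse import urljoin

# Hypothetical responses: url -> (status, headers, body)
FAKE_SERVER = {
    "https://example.com/old": (302, {"Location": "/new"}, ""),
    "https://example.com/new": (200, {"Content-Type": "text/html"}, "<h1>Moved here</h1>"),
}


def fetch(url):
    """Stand-in for sending one HTTP request."""
    return FAKE_SERVER.get(url, (404, {}, "Not Found"))


def browser_get(url, max_redirects=5):
    """Follow 3xx responses that carry a Location header, like a browser does."""
    for _ in range(max_redirects + 1):
        status, headers, body = fetch(url)
        if 300 <= status < 400 and "Location" in headers:
            url = urljoin(url, headers["Location"])  # Location may be relative
            continue
        return url, status, body
    raise RuntimeError("too many redirects")


final_url, final_status, final_body = browser_get("https://example.com/old")
```

The user only ever sees the final 200 response; the intermediate 302 is handled silently, which is exactly what a bare HTTP tool does not do.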

**Example**

- Say you want to access the account profile at GitHub, you'll have to go to the address `https://github.com/settings/profile`. 
- However, in order to have access to the profile page, you must first be signed in. 
- If you're not already signed in, the server will redirect your browser to a page where you can do that.
- After you enter your credentials, you'll be redirected to the original page you were trying to access. 
- In the `Location` response header, there is a url with a `return_to` parameter: `Location: https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fsettings%2Fprofile`
- This URL contains two distinct elements: an address telling the browser where to go and a query string telling the server where to return to after the log-in.
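The two elements of that `Location` URL can be pulled apart with Python's `urllib.parse` module. This only illustrates the URL's structure; it is not how GitHub or a browser processes it internally.

```python
from urllib.parse import parse_qs, urlsplit

location = (
    "https://github.com/login"
    "?return_to=https%3A%2F%2Fgithub.com%2Fsettings%2Fprofile"
)
parts = urlsplit(location)

# Where the browser is being sent right now.
go_to = f"{parts.scheme}://{parts.netloc}{parts.path}"

# The percent-encoded query parameter, decoded: where to return after logging in.
return_to = parse_qs(parts.query)["return_to"][0]
```

The `%3A` and `%2F` escapes let a whole URL sit inside another URL's query string without its `:` and `/` characters being mistaken for part of the outer address.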

# <mark>Security</mark>

## HTTP

### Is HTTP secure? Why or why not?

**HTTP risks**

- HTTP is a text-based protocol, and all its requests and responses consist of plain text. As such, HTTP is inherently insecure.
- As the client and server send requests and responses to each other, all information in both requests and responses is sent as plain text. If a malicious hacker was attached to the same network, they could employ ***packet sniffing*** techniques to read the messages being sent back and forth.
- As we learned previously, requests can contain the session id, which uniquely identifies you to the server, so if someone else copied this session id, they could craft a request to the server and pose as your client, thereby being logged in without even having access to your username or password.
- Using HTTPS, which utilizes TLS to ensure a secure connection helps with this
- We can use the Transport Layer Security (TLS) Protocol to add security to HTTP communications.

### What is session hijacking? What measures can be taken to prevent this?

- **Session Hijacking** refers to a malicious action in which a hacker uses a stolen `session id` to pose as an authenticated user and share the victim's session
- A `session id` serves as that unique token used to identify each session. 
- When an attacker gets a hold of the session id, both the attacker and the user now share the same session and both can access the web application.
- Because a `session id` is used to identify a user to the server, it can also be used by hackers to pose as the user and get logged in without needing to authenticate with a username and password.
- Countermeasures for Session Hijacking include:
    - **Resetting sessions**. 
        - With authentication systems, this means a successful login must render an old `session id` invalid and create a new one.
        - With this in place, on the next request, the victim will be required to authenticate. At this point, the `session id` will change, and the attacker will no longer have access.
        - For example, when the victim's client makes its next request, such as a GET request to display the page with the account number, the user will be required to authenticate by signing in again. This makes the old `session id` invalid, and the attacker won't have access to the user's session unless they also manage to obtain the new `session id`.
    - **Setting an expiration time on sessions** gives attackers a narrower window for access to the `session id`.
    - **Use HTTPS across the entire app** to minimize the chance that an attacker can get to the `session id`
        - A resource that's accessed by HTTPS will start with `https://` instead of `http://`, and is usually displayed with a lock icon in most browsers.
        - With HTTPS every request/response is encrypted before being transported on the network. This means if a malicious hacker sniffed out the HTTP traffic, the information would be encrypted and useless.
        - HTTPS sends messages through a cryptographic protocol called **TLS** for encryption.
            - These cryptographic protocols use certificates to communicate with remote servers and exchange security keys before data encryption happens. 
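The first two countermeasures (resetting sessions and expiring them) can be sketched in Python. The 30-minute lifetime, the `login` and `current_user` functions, and the in-memory `sessions` dictionary are hypothetical; the optional `now` argument exists only so the behaviour can be checked without waiting.

```python
import secrets
import time

SESSION_TTL = 30 * 60  # hypothetical 30-minute lifetime, in seconds

sessions = {}  # session id -> (user, created_at)


def login(user, old_session_id=None, now=None):
    """Reset the session on login: invalidate any old id and issue a fresh one."""
    now = time.time() if now is None else now
    sessions.pop(old_session_id, None)  # a stolen old id stops working
    new_id = secrets.token_urlsafe(16)
    sessions[new_id] = (user, now)
    return new_id


def current_user(session_id, now=None):
    """Return the user for a valid, unexpired session id, else None."""
    now = time.time() if now is None else now
    entry = sessions.get(session_id)
    if entry is None:
        return None
    user, created_at = entry
    if now - created_at > SESSION_TTL:  # expired: force re-authentication
        del sessions[session_id]
        return None
    return user
```

An attacker holding the old id loses access as soon as the victim logs in again, and any stolen id stops working once its lifetime runs out.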

### What are some other ways to mitigate the lack of security in HTTP?

**HTTPS**

- TLS is mainly used to **encrypt HTTP messages**, thereby creating **Secure HTTP, or HTTPS** (aka. HTTP/TLS, or “HTTP over TLS”).
- HTTPS is not a separate protocol: it still uses the HTTP protocol, but the HTTP messages are sent over a TLS connection (TLS sits between HTTP and TCP).

**Same-origin policy**

- The same-origin policy permits unrestricted interaction between resources originating from the same origin, but restricts certain interactions between resources originating from different origins.
- By **origin**, we mean the combination of the **scheme**, **host**, and **port**. Only resources that share all three aspects are allowed to interact without restriction.
    - So `http://mysite.com/doc1`:
        - has the same origin as `http://mysite.com/doc2`
        - but has a different origin from:
            - `https://mysite.com/doc1` (different scheme)
            - `http://mysite.com:4000/doc1` (different port)
            - `http://anothersite.com/doc1` (different host)
- This prevents a script loaded from one origin (such as an attacker's site) from reading responses, `session id`s, or other session information that belong to another origin.
- Designing for the same-origin policy can help to mitigate the lack of security in HTTP by restricting interactions between resources.
- The same-origin policy is an important guard against **session hijacking** attacks and serves as a cornerstone of web application security.
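The origin comparison itself can be sketched in Python with `urllib.parse.urlsplit`. This is a simplified model, not a browser's full algorithm; it assumes that an omitted port means the scheme's default port (80 for `http`, 443 for `https`), so `http://mysite.com` and `http://mysite.com:80` compare as the same origin.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a, b):
    """Two URLs share an origin only if scheme, host, and port all match."""
    return origin(a) == origin(b)
```

The path never enters the comparison, which is why `/doc1` and `/doc2` on the same site share an origin.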

**Cross-Site Scripting (XSS)**

- This type of attack happens when you allow users to input HTML or JavaScript that ends up being displayed by the site directly.
- Websites that allow some kind of input, such as allowing users to enter a comment that will be displayed, must protect against cross site scripting or XSS.
- Because a comment box is usually just a normal HTML `<textarea>`, users are free to input anything into the form. This means users can add raw HTML and JavaScript into the text area and submit it to the server as well.
- If the server side code doesn't do any sanitization of input, the user input will be injected into the page contents, and the browser will interpret the HTML and JavaScript and execute it.
- Potential solutions for cross-site scripting include:
    - making sure to always **sanitize user input**. This is done by eliminating problematic input, such as `<script>` tags, or by disallowing HTML and JavaScript input altogether.
    - **Escape all user input** data when displaying it so that the browser does not interpret it as code. (To escape a character means to replace an HTML character with a combination of ASCII characters, such as replacing `<` with `&lt;`, which tells the client to display that character as is and not process it.)
    - Sites can also choose to only **accept a safer form of input**, such as Markdown.
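Escaping on output can be illustrated with Python's built-in `html.escape`. The comment text and the `render_comment` helper are hypothetical; the point is that the escaped string is displayed as text instead of being executed as a script.

```python
import html


def render_comment(comment):
    """Insert user input into HTML only after escaping it."""
    # html.escape replaces &, <, > and (by default) quote characters with HTML entities.
    return f"<p>{html.escape(comment)}</p>"


malicious = '<script>alert("hacked")</script>'
page_fragment = render_comment(malicious)
```

The browser shows the visitor the literal text `<script>alert("hacked")</script>`, but no script element is ever created.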

## TLS

### What is the TLS protocol, and what is its purpose?

- TLS (Transport Layer Security) is a protocol that uses cryptography to provide more secure communications between networked applications.
- Because HTTP is a text based protocol, it is inherently insecure.
- Any intercepted requests/responses are easy to read.
- Furthermore, HTTP is a fairly simple protocol, concerned only with basic message structure.
- It provides no check for whether or not the source of an HTTP response is trustworthy, nor does it provide a means of determining if the messages are being tampered with in transit.
- When thinking about TLS, it can be useful to think of it as operating between HTTP and TCP (that is, between the protocols of the Application layer and the Transport layer).
- TLS adds security to HTTP communications.
-  TLS can be used to secure HTTP traffic, but it doesn't replace HTTP. HTTP is still used at the Application layer for structuring messages between applications.<br><br>
- Purpose of TLS:
    - TLS enables us to provide encryption for the inherently insecure plain text of the HTTP protocol. Encryption is a process of encoding a message so that it can only be read by those with an authorized means of decoding the message
    - It provides authentication services,  a process to verify the identity of a particular party in the message exchange / checking to see if the source of an HTTP response is trustworthy.
    - It also provides a means of ensuring data integrity, that is, determining whether or not HTTP messages have been tampered with /  detect whether a message has been interfered with or faked

### What are the three services that TLS provides? Describe each

**TLS Encryption**

- Allows us to encode messages so that they can only be read by those with an authorized means of decoding the message
- TLS encryption uses a combination of Symmetric Key Encryption and Asymmetric Key Encryption. Encryption of the initial key exchange is performed asymmetrically, and subsequent communications are symmetrically encrypted.
- This secure channel is established with the TLS handshake, which uses both symmetric and asymmetric key encryption
- The handshake increases security, but it adds extra round trips of latency before any HTTP data can be sent (two round trips in TLS 1.2, one in TLS 1.3), which impacts performance

**TLS authentication**

- Provides a means of verifying the identity of a participant in a message exchange,  to ensure that party is trustworthy.
- This ensures that the source of an HTTP response is trustworthy, and so the provided resource can be safely processed.
- TLS Authentication is implemented through the use of Digital Certificates, which are signed by a chain of Certificate Authorities.  
- Certificates are exchanged during the TLS handshake

**TLS Integrity** 

- Provides a means of checking whether a message has been altered or interfered with in transit.
- Data that is being exchanged via HTTP is encapsulated within the TLS payload, and metadata is attached in the form of header and trailer fields.
- Metadata fields such as the Message Authentication Code (MAC), in the TLS record's trailer, allow us to check whether the message has been interfered with.
- This is different from a regular checksum, which is only concerned with error detection (accidental data corruption). Because the MAC is computed with a secret key, a third party who alters the message cannot produce a matching MAC.
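The difference between a MAC and a plain checksum can be sketched with Python's standard `hmac`, `hashlib`, and `zlib` modules. This illustrates the idea only: it assumes HMAC-SHA256 as the MAC and CRC-32 as the checksum, does not reproduce the TLS record format, and the key and messages are hypothetical.

```python
import hashlib
import hmac
import zlib

# Hypothetical secret key shared by client and server. In TLS, MAC keys are
# derived during the handshake; this value is made up for illustration.
SHARED_KEY = b"hypothetical-session-mac-key"

def make_mac(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_mac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_mac(key, message), tag)

original = b"GET /account HTTP/1.1"
tag = make_mac(SHARED_KEY, original)

# An attacker alters the message in transit.
tampered = b"GET /delete-account HTTP/1.1"

# Checksum: the attacker simply sends the checksum of the new message,
# so the receiver's check passes and the tampering goes unnoticed.
forged_checksum = zlib.crc32(tampered)
checksum_passes = zlib.crc32(tampered) == forged_checksum

# MAC: without SHARED_KEY the attacker can only guess a key,
# and a tag made with the wrong key does not verify.
forged_tag = make_mac(b"attacker-guess", tampered)
mac_passes = verify_mac(SHARED_KEY, tampered, forged_tag)

print(checksum_passes)                        # True
print(mac_passes)                             # False
print(verify_mac(SHARED_KEY, original, tag))  # True
```

The checksum only answers "did these bytes get corrupted?", while the MAC answers "were these bytes produced by someone holding the shared key?".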

### Give a general overview of the TLS handshake

- **The TLS Handshake**: a special process that takes place ***after the TCP Handshake***, in which the client and the server exchange encryption keys. This is how TLS sets up an encrypted connection.
    - The TCP connection must be established first, because the TLS handshake messages, including the server's certificate and public key, are sent over it.
    - Right after the client sends the third part of the TCP handshake, i.e. the `ACK` segment, the client also initiates the first part of the TLS handshake, which is the `ClientHello` step.
- This exchange allows both parties to communicate via encrypted messages, thus giving a security advantage over the inherently insecure messages of HTTP.
- TLS uses a combination of symmetric and asymmetric cryptography.
    - The bulk of the message exchange is conducted via symmetric key encryption but the initial symmetric key exchange is conducted using asymmetric key encryption.
    - Here, asymmetric key encryption is used to encrypt the symmetric key itself, so that even if the encrypted key is intercepted, it can't be read or used.
    - The symmetric key cannot be sent in a readable format to the other party in our message exchange; if the key was intercepted by a third-party, they could then use it to decrypt any subsequent messages between the sender and receiver. Exchanging the symmetric key through asymmetric key encryption solves that problem.
- The key points to remember about the TLS Handshake process are that it is used to:
    - Agree which version of TLS to be used in establishing a secure connection.
    - Agree on the various algorithms that will be included in the cipher suite.
    - Enable the exchange of symmetric keys that will be used for message encryption.
- The TLS Handshake must be performed before secure data exchange can begin; it involves several round-trips of latency and therefore has an impact on performance.<br><br>
- How the TLS Handshake is implemented:
    - The client sends a message to the server in the form of a `ClientHello`, which includes the maximum version of TLS protocol it supports and a list of available cipher suites.
    - The server responds with a `ServerHello`, which contains its decision on which TLS version and cipher suite will be used. The server also sends its certificate, which contains its public key, and ends this step with a `ServerHelloDone` marker.
    - Next the client initiates the symmetric key exchange process, using the server's public key for asymmetric key encryption.
    - Once the keys have been exchanged, the server sends a ready-to-go message using the symmetric key and secure message exchange commences.<br><br>
- Trade offs:
    - Allows us to implement secure message exchange over the inherently insecure, text-based protocol of HTTP.
    - Because the TLS handshake is a complex process, it can add two round trips (RTT) of latency, which has an impact on speed and performance.<br><br>
- ***The TCP handshake establishes a reliable connection between processes. At this stage we aren't sending any application data, just 'empty' TCP Segments (i.e. with no payload), whose sole purpose is to establish a connection.***
- ***Once the TCP connection is established, the TLS handshake occurs in order to create a secure connection. We're still not sending any application data at this point, just establishing our secure connection.***
- ***Once both of the handshake processes are complete, and we have our connection established, we can send application data in a reliable, secure way.***
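The ordering above (TCP handshake, then TLS handshake, then application data) and the two round trips mentioned under trade-offs can be tallied with a small model. This is a sketch under stated assumptions: one round trip per request/response exchange, a full TLS 1.2-style handshake costing two round trips, no processing time, and a hypothetical 50 ms RTT.

```python
# Simplified connection setup for an HTTPS request, following the order above.
# Each entry: (step, round trips it costs, part of the TLS handshake?)
# The round-trip counts are simplifying assumptions, not measurements.
CONNECTION_STEPS = [
    ("TCP: SYN -> SYN-ACK (the final ACK travels with the client's next message)", 1, False),
    ("TLS: ClientHello -> ServerHello, certificate, ServerHelloDone", 1, True),
    ("TLS: client key exchange -> server ready-to-go message", 1, True),
    ("HTTP: request -> response", 1, False),
]

def time_to_response(rtt_ms: float, use_tls: bool = True) -> float:
    """Return the milliseconds until the HTTP response arrives at the client."""
    total_round_trips = 0
    for _step, round_trips, is_tls_step in CONNECTION_STEPS:
        if is_tls_step and not use_tls:
            continue  # plain HTTP skips the TLS handshake entirely
        total_round_trips += round_trips
    return total_round_trips * rtt_ms

rtt = 50.0  # hypothetical round-trip time in milliseconds
print(time_to_response(rtt, use_tls=False))  # 100.0 (TCP + HTTP)
print(time_to_response(rtt))                 # 200.0 (TCP + TLS + HTTP)
```

Under these assumptions the TLS handshake doubles the time before the first response arrives, which is the performance cost described above.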

### What is Symmetric Key Encryption? What is Asymmetric Key Encryption? What are the advantages and drawbacks of each?

- **Symmetric Key Encryption**: an encrypted communication system in which both the sender and receiver possess a shared encryption key.
- The advantage of this is that it facilitates two-way communication. Both parties can use the shared key to encode, send, and decode messages to and from the other.
- The disadvantage is that a symmetric system relies on the fact that no one else has access to the key in order for it to remain secure.
- This means that it requires a secure way for both parties to exchange keys before symmetric encryption can be established, and this is difficult to do on the web (can't exchange keys in-person when communicating over a network).
- For this reason, it is used in conjunction with asymmetric key encryption, which facilitates a secure exchange of a shared key<br><br>
- **Asymmetric Key Encryption**: also known as public key encryption, an encrypted communications system which uses two distinct keys: a public key and a private key.
- The public key is used to encrypt and send a secure message to the recipient, who holds the private key, which is used to decode the encrypted message.
- This only facilitates one-way communication, in which only the party who holds the private key can receive and decode secure communications.
- Encryption is primarily intended to work in one direction. Bob can send Alice messages encrypted with the public key, which she can then decrypt with the private one, but the key pair can't be used to send secret messages back from Alice to Bob: anything encrypted with the private key could be decrypted by anyone holding the public key.
- However, one direction is all that is needed to deliver a key, so we can use asymmetric key encryption as a means for hosts to exchange symmetric encryption keys during the TLS handshake process.
- Unlike the symmetric system where the same key is used to encrypt and decrypt messages, in the asymmetric system the keys in the pair are non-identical: the public key is used to encrypt and the private key to decrypt.
- Advantage: securely exchange symmetric keys
- Drawback: only facilitates one-way communication
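How the two systems are combined can be sketched with deliberately tiny, insecure toy algorithms: textbook RSA with small primes stands in for asymmetric encryption, and XOR with a SHA-256-derived keystream stands in for a symmetric cipher. Neither is what TLS actually uses, and every key and value below is hypothetical; the sketch only shows the shape of the exchange. The public key carries the symmetric key across in one direction, and the shared symmetric key then works in both directions. It assumes Python 3.9+.

```python
import hashlib

# --- Toy asymmetric key pair: textbook RSA with tiny primes (NOT secure) ---
P, Q = 61, 53
N = P * Q                          # 3233, part of both keys
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (2753)

PUBLIC_KEY = (E, N)   # the server can hand this to anyone
PRIVATE_KEY = (D, N)  # the server keeps this secret

def rsa_apply(value: int, key: tuple[int, int]) -> int:
    """Textbook RSA: raise value to the key's exponent modulo N."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

# --- Toy symmetric cipher: XOR with a SHA-256-derived keystream (NOT secure) ---
def keystream(key: int, length: int) -> bytes:
    """Stretch the shared key into `length` pseudo-random bytes."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: int, data: bytes) -> bytes:
    """Encrypts and decrypts: XOR-ing twice with the same keystream undoes itself."""
    return bytes(b ^ k for b, k in zip(data, keystream(key, len(data))))

# 1. The client picks a symmetric key and encrypts it with the server's PUBLIC key.
client_key = 1234                                # hypothetical; must be below N
wrapped_key = rsa_apply(client_key, PUBLIC_KEY)  # safe to send over the network

# 2. Only the holder of the PRIVATE key can unwrap it.
server_key = rsa_apply(wrapped_key, PRIVATE_KEY)

# 3. Both sides now share one key and can encrypt in BOTH directions.
request_ct = xor_cipher(client_key, b"GET / HTTP/1.1")
response_ct = xor_cipher(server_key, b"HTTP/1.1 200 OK")
print(server_key == client_key)             # True
print(xor_cipher(server_key, request_ct))   # b'GET / HTTP/1.1'
print(xor_cipher(client_key, response_ct))  # b'HTTP/1.1 200 OK'
```

The asymmetric step is used once, only to deliver the key; everything after it uses the faster symmetric cipher, which mirrors the split described above.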

### What is a cipher suite?

- A cipher is a cryptographic algorithm; in other words, a set of steps for performing encryption, decryption, and other related tasks.
- A cipher suite is a suite, or set, of ciphers.
- In general, we want to have a distinct algorithm (cipher) for each task during secure communication
- TLS uses different ciphers for different aspects of establishing and maintaining a secure connection. 
- The algorithms for performing each of these tasks, when combined, form the cipher suite.  
- A cipher suite is the agreed set of algorithms used by the client and server during the secure message exchange.
- The suite to be used is agreed as part of the TLS Handshake.
- As part of the `ClientHello` message, the client sends a list of the cipher suites it supports (each suite names an algorithm for each required task), and the server chooses one that it also supports.
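A TLS 1.2 cipher suite name spells out one algorithm per task, and the negotiation described above amounts to the server picking a suite from the client's list that it also supports. The sketch below assumes the TLS 1.2 naming pattern `TLS_<key exchange>_<authentication>_WITH_<bulk cipher>_<hash>` (TLS 1.3 names differ). The offered lists are hypothetical, and the rule "first client-listed suite the server supports" is a simplifying assumption, since real servers may apply their own preference order. It assumes Python 3.9+.

```python
from typing import NamedTuple, Optional

class CipherSuite(NamedTuple):
    key_exchange: str    # how the symmetric key material is exchanged
    authentication: str  # how the server's identity is verified
    bulk_cipher: str     # symmetric algorithm for the message exchange
    hash_alg: str        # hash used for message authentication and/or key derivation

def parse_suite(name: str) -> CipherSuite:
    """Split a TLS 1.2-style suite name into its per-task algorithms."""
    handshake_part, record_part = name.removeprefix("TLS_").split("_WITH_")
    handshake_algs = handshake_part.split("_")
    key_exchange = handshake_algs[0]
    # e.g. TLS_RSA_WITH_...: RSA handles both key exchange and authentication
    authentication = handshake_algs[-1]
    bulk_cipher, hash_alg = record_part.rsplit("_", 1)
    return CipherSuite(key_exchange, authentication, bulk_cipher, hash_alg)

def choose_suite(client_offer: list[str], server_supported: set[str]) -> Optional[str]:
    """Pick the first suite in the client's list that the server also supports."""
    for suite in client_offer:
        if suite in server_supported:
            return suite
    return None  # no overlap: the handshake would fail

# Hypothetical lists for the ClientHello and the server's configuration.
client_hello_suites = [
    "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
]
server_suites = {
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
}

chosen = choose_suite(client_hello_suites, server_suites)
print(chosen)  # TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
print(parse_suite(chosen))
# CipherSuite(key_exchange='ECDHE', authentication='RSA', bulk_cipher='AES_128_GCM', hash_alg='SHA256')
```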

### How does TLS Authentication work?

- TLS authentication is a means of verifying the identity of the sender of a message.
- It uses digital certificates, which are provided by the server during the TLS handshake.
- The certificate includes a public key, a signature (which consists of data encrypted with a private key), and the original data that was used to create the signature.
- Upon receipt, the receiver decrypts the signature with the matching public key and checks that it matches the original data. This tells the receiver that the sender is who it says it is, because only the holder of the private key could have created that signature.
    - In practice, the signature on a server's certificate is created with the issuing CA's private key and checked with that CA's public key; the server separately proves during the handshake that it holds the private key matching the public key in its certificate.
- The digital certificate the server provides is considered to be trustworthy on the basis of the issuing certificate authority and the chain of trust.
- Certificates are signed by a **Certificate Authority**, and work on the basis of a Chain of Trust which leads to one of a small group of highly trusted Root CAs.
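The signature check described above can be sketched with textbook RSA, signing a SHA-256 digest reduced modulo N. This is an insecure toy: the key pair is built from two known Mersenne primes so the demo behaves reliably, but it is far too small for real use. The certificate data string is hypothetical, and real certificates use the X.509 format and are checked by TLS libraries, not by hand. It assumes Python 3.8+.

```python
import hashlib

# Toy RSA key pair from two known Mersenne primes (NOT secure).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, kept secret by the signer

def digest(data: bytes) -> int:
    """Hash the data with SHA-256 and reduce it to fit below the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data: bytes) -> int:
    """'Encrypt' the digest with the private key to produce the signature."""
    return pow(digest(data), D, N)

def verify(data: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public key and compare with the data."""
    return pow(signature, E, N) == digest(data)

# Hypothetical certificate contents.
certificate_data = b"subject=www.example.com; issuer=Hypothetical CA"
signature = sign(certificate_data)

print(verify(certificate_data, signature))  # True
print(verify(b"subject=www.attacker.example; issuer=Hypothetical CA", signature))  # False
```

Anyone can run `verify` with the public key, but producing a signature that passes requires `D`, which is why a valid signature ties the data to the private-key holder.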

### What are Certificate Authorities and the Chain of Trust?

- Certificate Authorities are trustworthy sources that issue certificates used by servers to establish authentication.
- We use certificates provided by these authorities to ensure that the certificate in question is not being faked.
- Certificate authorities exist in a hierarchy known as the "chain of trust"
- Within this hierarchy, the certificate for lower level authorities is signed by the CA one level above it
- At the top of the chain is a Root CA, whose certificate is "self-signed".
- Root CAs are a small group of organizations that have proven their high level of trustworthiness through prominence and longevity.<br><br>
- Each site’s certificate (which the server sends in step 2 of the TLS handshake) is signed by an Intermediate Certificate Authority (CA) that certifies the certificate’s authenticity. 
- Each CA in turn derives its authority from the CA above it, which may be another intermediate CA, all the way up to the Root CA.
- The hierarchical structure of certificate authorities establishes the **Chain of Trust** from the original site all the way to the Root CA.
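Walking the chain of trust can be sketched as a loop. It checks each certificate's signature with the public key of the certificate above it, and it succeeds only when it reaches a self-signed root that the client already trusts. The certificate names, the trust store, and the toy textbook-RSA keys (built from known Mersenne primes, far too small to be secure) are all hypothetical. Real clients also check expiry dates, host names, and revocation, which this sketch omits. It assumes Python 3.9+.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by all toy keys

def make_keypair(p: int, q: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Toy textbook-RSA key pair from two primes (NOT secure): (public, private)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (E, n), (d, n)

def digest(data: str, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data.encode()).digest(), "big") % n

def sign(data: str, private_key: tuple[int, int]) -> int:
    d, n = private_key
    return pow(digest(data, n), d, n)

def verify(data: str, signature: int, public_key: tuple[int, int]) -> bool:
    e, n = public_key
    return pow(signature, e, n) == digest(data, n)

@dataclass
class Certificate:
    subject: str
    issuer: str
    public_key: tuple[int, int]
    signature: int = 0

    def signed_data(self) -> str:
        """The fields covered by the issuer's signature."""
        return f"{self.subject}|{self.issuer}|{self.public_key}"

def issue(subject, subject_public_key, issuer_name, issuer_private_key) -> Certificate:
    """The issuer signs the subject's certificate with the issuer's private key."""
    cert = Certificate(subject, issuer_name, subject_public_key)
    cert.signature = sign(cert.signed_data(), issuer_private_key)
    return cert

# Hypothetical keys from known Mersenne primes (far too small for real use).
root_pub, root_priv = make_keypair(2**31 - 1, 2**61 - 1)
inter_pub, inter_priv = make_keypair(2**19 - 1, 2**89 - 1)
site_pub, _ = make_keypair(2**17 - 1, 2**107 - 1)

ROOT = "Hypothetical Root CA"
INTER = "Hypothetical Intermediate CA"
root_cert = issue(ROOT, root_pub, ROOT, root_priv)     # self-signed
inter_cert = issue(INTER, inter_pub, ROOT, root_priv)  # signed by the root
site_cert = issue("www.example.com", site_pub, INTER, inter_priv)

TRUST_STORE = {ROOT: root_cert}  # roots the client trusts in advance

def verify_chain(chain: list[Certificate]) -> bool:
    """chain[0] is the site's certificate; each later entry issued the one before it."""
    for cert, issuer in zip(chain, chain[1:]):
        if cert.issuer != issuer.subject:
            return False
        if not verify(cert.signed_data(), cert.signature, issuer.public_key):
            return False
    top = chain[-1]
    if TRUST_STORE.get(top.subject) != top:
        return False  # the chain does not end at a root the client already trusts
    return verify(top.signed_data(), top.signature, top.public_key)  # self-signed

# A fake intermediate claims the root as issuer but can only sign with its own key.
rogue_pub, rogue_priv = make_keypair(2**13 - 1, 2**127 - 1)
rogue_inter = issue(INTER, rogue_pub, ROOT, rogue_priv)
fake_site = issue("www.example.com", rogue_pub, INTER, rogue_priv)

print(verify_chain([site_cert, inter_cert, root_cert]))   # True
print(verify_chain([fake_site, rogue_inter, root_cert]))  # False
print(verify_chain([site_cert, inter_cert]))              # False (no trusted root)
```

The forged chain fails at the link to the root: the rogue intermediate's signature does not verify with the Root CA's public key, so trust cannot flow down to the fake site certificate.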

### What is DTLS and why do we need it?

- Datagram Transport Layer Security
- A separate protocol based on TLS that is used with network connections that utilize UDP instead of TCP
- Because TLS is interlinked with TCP and the TCP handshake, separate protocols are needed to meet the security requirements of UDP