## LS170 Networking foundations & Bash basics

## Basics

### What is the internet?
- LAN (Local Area Network): 
    - multiple computers and other devices connected via a network bridging device such as a hub or switch. 
    - The computers are all connected to this device via network cables, and this forms the network. 
    - The scope of communications is limited to devices that are connected (either wired or wirelessly) to the network switch or hub, which imposes some geographic limitations. 
    - That's the 'local' in Local Area Network.
- Inter-network Communication: 
    - In order to enable communication between networks, we need to add routers into the picture. 
    - Routers are network devices that can route network traffic to other networks. 
    - Within a Local Area Network, they effectively act as gateways into and out of the network.
- Network of networks: 
    - We can imagine the internet as a vast number of these networks connected together. 
    - In between all of the sub-networks are systems of routers that direct the network traffic.

### What is a protocol?
- In general, a protocol is a "system of rules".
- In terms of networks, a protocol is a "set of rules governing the exchange or transmission of data".
- Examples of the most common protocols: IP, SMTP, TCP, HTTP, Ethernet, FTP, DNS, UDP, TLS
- Why so many protocols? 
    1. Different protocols were developed to address different aspects of network communication. (TCP and HTTP)
    2. Different protocols were developed to address the same aspect of network communication, but in a different way or for a specific use-case. (TCP vs UDP)

### Network models
- OSI model: divides the layers in terms of the functions that each layer provides (physical addressing, logical addressing and routing, encryption, compression, etc).
- Internet Protocol Suite (TCP/IP) model: divides the layers in terms of the scope of communications within each layer (within a local network, between networks, etc).
![title](img/network_models.png)
- Both models have utility, but no single model will perfectly fit a real-world implementation. 
- Such models are useful for gaining a broad-brush view of how a system works as a whole, and for modularizing different levels of responsibility within that system.
- However, attempting to strictly adhere to the model when drilling into the detail of how a specific protocol works can be counter-productive. 

### Protocol Data Units

- A Protocol Data Unit (PDU) is an amount or block of data transferred over a network.
- Different protocols or protocol layers refer to PDUs by different names. 
    - Data Link layer: a PDU is known as a frame. 
    - Internet/ Network layer: it is known as a packet. 
    - Transport layer: it is known as a segment (TCP) or datagram (UDP).
- A PDU consists of a header, a data payload, and in some cases a trailer or footer.
- Header and Trailer: 
    - Exact structure of the header and trailer varies from protocol to protocol.
    - Purpose: to provide protocol-specific metadata about the PDU.
- Data Payload: the data that we want to transport over the network using a specific protocol at a particular network layer.
- Encapsulation between layers: 
    - The entire PDU from a protocol at one layer is set as the data payload for a protocol at the layer below.
    - This means that a protocol at one layer doesn't need to know anything about how a protocol at another layer is implemented in order for those protocols to interact. 
    - It creates a system whereby a lower layer effectively provides a 'service' to the layer above. 
    - Each protocol just knows that it needs to encapsulate some data from the layer above and provide the result of this encapsulation to the layer below. 
    - This separation of layers provides a certain level of abstraction, which allows us to use different protocols at a certain layer without having to worry about the layers below.
![title](img/data_payload.png)

### The physical network
- The functionality of the physical layer is concerned with the transfer of bits (binary data). 
- In order to be transported, these bits are converted into signals. 
- Depending on the transportation medium used, bits are converted to electrical signals, light signals, or radio waves.
- **Latency is a measure of delay. It indicates the amount of time it takes for data to travel from one point to another.**
    - Propagation delay: this is the amount of time it takes for a message to travel from the sender to the receiver, and can be calculated as the ratio between distance and speed.
    - Transmission delay: the journey of data from point A to point B on a network typically won't be made over one single cable. Instead, the data will travel across many different wires and cables that are all inter-connected by switches, routers, and other network devices. Each of these elements within the network can be thought of as an individual 'link' within the overall system. Transmission delay is the amount of time it takes to push the data onto the link.
    - Queuing delay: Network devices such as routers can only process a certain amount of data at one time. If there is more data than the device can handle, then it queues, or buffers, the data. The amount of time the data is waiting in the queue to be processed is the queuing delay.
    - Processing delay: Data travelling across the physical network doesn't directly cross from one link to another, but is processed in various ways at each network device (for example, a router checks a packet's destination address to decide where to forward it). The amount of time this processing takes is the processing delay.
    - The total latency between two points, such as a client and a server, is the sum of all the delays above. 
- **Bandwidth is a measure of capacity. It indicates the amount of data that can be transmitted in a set period of time.**
    - Low bandwidth can be an issue when dealing with large amounts of data. 
    - In many situations latency can be a much more serious limitation on the performance of a networked application.
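To make the delay components concrete, here is a minimal Python sketch that adds up the four delays for a single link. The distance, signal speed, message size, bandwidth, and the queuing and processing figures are all made-up assumptions for illustration, not measurements.

```python
def propagation_delay(distance_m: float, speed_m_per_s: float) -> float:
    """Time for the signal to travel the distance: distance / speed."""
    return distance_m / speed_m_per_s


def transmission_delay(size_bits: float, bandwidth_bps: float) -> float:
    """Time to push all the bits of the message onto the link."""
    return size_bits / bandwidth_bps


def total_latency(distance_m, speed_m_per_s, size_bits, bandwidth_bps,
                  queuing_s, processing_s):
    """Total latency is the sum of the four delay components."""
    return (propagation_delay(distance_m, speed_m_per_s)
            + transmission_delay(size_bits, bandwidth_bps)
            + queuing_s
            + processing_s)


# Hypothetical link: 2,000 km of cable with the signal at roughly 2e8 m/s,
# a 1,500-byte message on a 100 Mbit/s link, plus assumed queuing/processing.
latency = total_latency(
    distance_m=2_000_000,
    speed_m_per_s=2e8,
    size_bits=1500 * 8,
    bandwidth_bps=100e6,
    queuing_s=0.002,
    processing_s=0.0005,
)
print(f"{latency * 1000:.2f} ms")
```

With these numbers the propagation delay (10 ms) dominates, while the transmission delay is only 0.12 ms. Doubling the bandwidth would save just 0.06 ms, which is one way to see why latency, not bandwidth, is often the bigger limitation.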

## The Link/ Data Link Layer
-  The protocols operating at this layer are primarily concerned with the identification of devices on the physical network and moving data over the physical network between the devices that comprise it, such as hosts (e.g. computers), switches, and routers.
- The most commonly used protocol at this layer is the Ethernet protocol. 
- Two of the most important aspects of Ethernet are framing and addressing.

### Ethernet Frames:
- Ethernet Frames are a Protocol Data Unit, and encapsulate data from the Internet/ Network layer above. 
- The Link/ Data Link layer is the lowest layer at which encapsulation takes place.
- An Ethernet Frame adds logical structure to the binary data that the physical layer transmits.

---

1. Preamble and Start of Frame Delimiter (SFD/ SOF):
    - Generally aren't considered part of the actual frame.
    - Are sent just prior to the frame as a synchronization measure which notifies the receiving device to expect frame data and then identify the start point of that data. 
    - Preamble: seven bytes long.
    - SFD: one byte.
    - Both use a repeated pattern that can be recognised by the receiving device, which then knows that the data following after is the frame data.
2. Source and Destination MAC address: 
    - Two fields: each six bytes (48 bits) long.
    - The source address: address of the device which created the frame. 
    - The destination MAC address: address of the device for which the data is ultimately intended.
3. Length: 
    - two bytes.
    - Used to indicate the size of the Data Payload.
4. DSAP, SSAP, Control: 
     - Three fields, each one byte.
     - DSAP and SSAP fields: identify the Network Protocol used for the Data Payload. 
     - The Control field provides information about the specific communication mode for the frame, which helps facilitate flow control.
5. Data Payload: 
     - Between 42 and 1497 bytes.
     - Contains the data for the entire Protocol Data Unit (PDU) from the layer above, an IP Packet for example.
6. Frame Check Sequence (FCS): 
     - Four bytes.
     - A checksum generated by the device which creates the frame. 
     - The receiving device generates its own checksum from the frame data using the same algorithm and compares it with the FCS. If the two don't match, the frame is dropped. Ethernet doesn't implement any kind of retransmission functionality for dropped frames; it is the responsibility of higher level protocols to manage retransmission of lost data if this is a requirement of the protocol.
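The FCS check can be sketched in Python. Ethernet's FCS is a CRC-32 value, and `zlib.crc32` implements the standard CRC-32, so it is used here. The frame contents are made up, and storing the checksum in little-endian byte order is an assumption of this sketch; real network cards do all of this in hardware.

```python
import zlib


def add_fcs(frame: bytes) -> bytes:
    """Sender side: append a 4-byte CRC-32 checksum to the frame.

    Assumption: the FCS is stored little-endian after the frame data.
    """
    return frame + zlib.crc32(frame).to_bytes(4, "little")


def check_fcs(frame_with_fcs: bytes) -> bool:
    """Receiver side: recompute the CRC-32 and compare it with the FCS."""
    frame, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return zlib.crc32(frame).to_bytes(4, "little") == fcs


def receive(frame_with_fcs: bytes):
    """Return the frame data if the checksum matches, otherwise drop it (None).

    Note there is no retransmission: a dropped frame is simply gone.
    """
    return frame_with_fcs[:-4] if check_fcs(frame_with_fcs) else None


sent = add_fcs(b"hypothetical frame bytes")
corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]  # flip one bit "in transit"

print(receive(sent))       # b'hypothetical frame bytes'
print(receive(corrupted))  # None: the frame is dropped
```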

---

Interframe gap: As well as using the Preamble and SFD to prepare a receiving device to process the frame data, Ethernet also specifies an interframe gap (IFG). This gap is a brief pause between the transmission of each frame, which permits the receiver to prepare to receive the next frame.

### MAC Addresses
- Every network-enabled device, e.g. a Network Interface Card (NIC) that you would find in a PC or laptop, is assigned a unique MAC Address when it is manufactured. 
- MAC Addresses are formatted as a sequence of six two-digit hexadecimal numbers, e.g. 00:40:96:9d:68:0a, with different ranges of addresses being assigned to different network hardware manufacturers.
- A switch directs the frames to the correct device by keeping and updating a record of the MAC addresses of the devices connected to it, and associating each address with the Ethernet port to which the device is connected on the switch. It keeps this data in a MAC Address Table.
- The MAC Addressing system works well for local networks, where all the devices are connected to a switch that can keep a record of each device's address.
- Limitations of MAC addresses:
    - They are physical rather than logical. Each MAC Address is tied (burned in) to a specific physical device.
    - They are flat rather than hierarchical. The entire address is a single sequence of values and can't be broken down into sub-divisions.
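The way a switch builds its MAC Address Table can be modelled as a toy "learning switch". This is a minimal sketch: the class, its methods, and the MAC addresses are hypothetical, and the flooding of frames for unknown destinations is standard switch behaviour that these notes don't cover. The `oui` helper pulls out the first three bytes, which is the part of the address that identifies the manufacturer's assigned range.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy model of how a switch fills in and uses its MAC Address Table."""

    def __init__(self, num_ports: int):
        self.ports = list(range(1, num_ports + 1))
        self.mac_table: Dict[str, int] = {}  # MAC address -> switch port

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the port(s) that the frame is sent out of."""
        # Learn: the sender is reachable via the port the frame arrived on.
        self.mac_table[src_mac.lower()] = in_port
        out_port = self.mac_table.get(dst_mac.lower())
        if out_port is not None:
            return [out_port]  # known destination: forward to one port only
        # Unknown destination: flood to every port except the incoming one.
        return [p for p in self.ports if p != in_port]


def oui(mac: str) -> str:
    """First three bytes of a MAC address: the manufacturer's range."""
    return ":".join(mac.lower().split(":")[:3])


switch = LearningSwitch(num_ports=4)
a, b = "00:40:96:9d:68:0a", "00:1b:44:11:3a:b7"  # hypothetical devices
print(switch.receive(1, a, b))  # [2, 3, 4]: b not known yet, so flood
print(switch.receive(3, b, a))  # [1]: a was learned on port 1
print(oui(a))                   # 00:40:96
```

Because the table only maps flat addresses to local ports, this approach can't scale beyond one local network, which is the limitation described above.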

## The Internet/ Network Layer
- Primary function of protocols at this layer is to facilitate communication between hosts (e.g. computers) on different networks.
- The Internet Protocol (IP) is the predominant protocol used at this layer for inter-network communication. There are two versions of IP currently in use: IPv4 and IPv6. 
- Primary features of IP:
    - Encapsulation of data into packets
    - Routing capability via IP addressing
    
### IP Packets:
- The Protocol Data Unit (PDU) within the Internet Protocol is referred to as a packet.
- A packet is comprised of a Data Payload and a Header.
- Just as with Ethernet Frames, the Data Payload of an IP Packet is the PDU from the layer above (the Transport layer). 
- Packet header:
    - Version: the version of the Internet Protocol used, e.g. IPv4.
    - ID, Flags, Fragment Offset: these fields are related to fragmentation. If the Transport layer PDU is too large to be sent as a single packet, it can be fragmented, sent as multiple packets, and then reassembled by the recipient.
    - TTL: every packet has a Time to Live (TTL) value. This is to ensure that any packets which don't reach their destination for some reason aren't left to endlessly bounce around the network. The TTL indicates the maximum number of network 'hops' a packet can take before being dropped. At each hop, the router which processes and forwards the packet will decrement the TTL value by one.
    - Protocol: this indicates the protocol used for the Data Payload, e.g. TCP, UDP, etc.
    - Checksum: this is an error checking value generated via an algorithm over the packet header. Each device that receives the packet (every router along the path, as well as the destination) generates a value using the same algorithm and, if it doesn't match, drops the packet. IP doesn't manage retransmission of dropped packets. This is left to the layers above to implement.
    - Source Address: the 32-bit IP address of the source (sender) of the packet
    - Destination Address: the 32-bit IP address of the destination (intended recipient) of the packet
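The Checksum and TTL fields can be sketched together in Python. The checksum below is the standard "Internet checksum" (RFC 1071) that IPv4 uses for its header; the header builder ignores options and fragmentation, and the ID value and addresses are hypothetical. `forward` models a single router hop: verify the checksum, decrement the TTL, drop the packet if the TTL reaches zero, otherwise recompute the checksum (it must change, because the TTL changed).

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of the 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def build_header(ttl: int, protocol: int, src: str, dst: str,
                 payload_len: int = 0) -> bytes:
    """Build a minimal 20-byte IPv4 header with a valid checksum."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                  # Version 4 (high 4 bits), header length 5 x 32-bit words
        0,                     # Type of service
        20 + payload_len,      # Total length
        0x1234,                # ID (hypothetical)
        0,                     # Flags + Fragment Offset (no fragmentation)
        ttl,
        protocol,              # e.g. 6 = TCP, 17 = UDP
        0,                     # Checksum placeholder while computing
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def forward(header: bytes):
    """One router hop. Returns the updated header, or None if dropped."""
    if ipv4_checksum(header) != 0:  # a valid header sums to zero
        return None                 # corrupted: drop, no retransmission
    ttl = header[8] - 1
    if ttl == 0:
        return None                 # TTL expired: drop the packet
    new = header[:8] + bytes([ttl]) + header[9:10] + b"\x00\x00" + header[12:]
    return new[:10] + struct.pack("!H", ipv4_checksum(new)) + new[12:]


hdr = build_header(ttl=2, protocol=6, src="192.0.2.1", dst="198.51.100.7")
hop1 = forward(hdr)
print(hop1[8])        # 1: the TTL was decremented
print(forward(hop1))  # None: the TTL would hit 0, so the packet is dropped
```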

### IP Addresses
- Unlike MAC Addresses, IP Addresses are logical in nature.
- They are not tied to a specific device, but can be assigned as required to devices as they join a network.
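- IP addresses are also hierarchical: a network is assigned a range of addresses, and that range can be split into smaller ranges for different parts of the network. This splitting of a network into parts is referred to as sub-netting. By dividing IP address ranges further, subnets can be split into smaller subnets to create even more tiers within the hierarchy.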
- All routers on the network store a local routing table. When an IP packet is received by a router, the router examines the destination IP address and matches it against a list of network addresses in its routing table.
- As well as a difference in address structure, IPv6 has some other differences with IPv4 such as a different header structure for the packet and a lack of error checking (it leaves this to the Link Layer checksum).
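Sub-netting and routing table lookups can both be tried out with Python's standard `ipaddress` module. The address range, next hops, and routing table below are hypothetical. The lookup picks the most specific (longest-prefix) matching network, which is the usual rule routers apply when several entries match; that rule is an addition for this sketch rather than something stated in these notes.

```python
import ipaddress

# Sub-netting: split a hypothetical /24 range into two /25s, then four /26s.
network = ipaddress.ip_network("109.156.106.0/24")
halves = list(network.subnets(new_prefix=25))
quarters = list(network.subnets(new_prefix=26))
print([str(n) for n in halves])  # ['109.156.106.0/25', '109.156.106.128/25']
print(quarters[1].network_address, quarters[1].broadcast_address)
# 109.156.106.64 109.156.106.127

# Hypothetical routing table: network -> where to send matching packets.
ROUTING_TABLE = {
    ipaddress.ip_network("109.156.106.0/24"): "router B",
    ipaddress.ip_network("109.156.106.128/25"): "router C",
    ipaddress.ip_network("0.0.0.0/0"): "default gateway",  # matches anything
}


def next_hop(destination: str) -> str:
    """Match the destination against the table; most specific entry wins."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTING_TABLE if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTING_TABLE[best]


print(next_hop("109.156.106.200"))  # router C (the /25 is more specific)
print(next_hop("109.156.106.20"))   # router B
print(next_hop("8.8.8.8"))          # default gateway
```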

### Summary 
The internet is a vast network of networks. It is comprised of both the network infrastructure itself (devices, routers, switches, cables, etc) and the protocols that enable that infrastructure to function.

Protocols are systems of rules. Network protocols are systems of rules governing the exchange or transmission of data over a network.

Different types of protocol are concerned with different aspects of network communication. It can be useful to think of these different protocols as operating at particular 'layers' of the network.

Encapsulation is a means by which protocols at different network layers can work together.

Encapsulation is implemented through the use of Protocol Data Units (PDUs). The PDU of a protocol at one layer becomes the data payload of the PDU of a protocol at a lower layer.

The physical network is the tangible infrastructure that transmits the electrical signals, light, and radio waves which carry network communications.

Latency is a measure of delay. It indicates the amount of time it takes for data to travel from one point to another.

Bandwidth is a measure of capacity. It indicates the amount of data that can be transmitted in a set period of time.

Ethernet is a set of standards and protocols that enables communication between devices on a local network.

Ethernet uses a Protocol Data Unit called a Frame.

Ethernet uses MAC addressing to identify devices connected to the local network.

The Internet Protocol (IP) is the predominant protocol used at the Internet/ Network layer for inter-network communication.

There are two versions of IP currently in use: IPv4 and IPv6.

The Internet Protocol uses a system of addressing (IP Addressing) to direct data between one device and another across networks.

IP uses a Protocol Data Unit called a Packet.

### How do the Data Link (Ethernet) and Network (IP) layers relate to each other? 
- MAC addresses are used to transfer frames within the local network. 
- IP addresses are used to transfer packets between networks. 

--- 
**Example: Send data from a host at 192.168.0.10 to a host at 203.0.113.123 (hypothetical addresses)**
1. The data is encapsulated in a packet with the source and destination IP addresses.
2. The packet is handed to the device's network card, which looks at the destination IP address and checks whether it is on the local network. To find the MAC address that belongs to an IP address, it uses an IP <-> MAC lookup table.
    - If the destination is a device on the local network, it creates a frame and sets the destination MAC address to the MAC address of that device. The frame is delivered directly. --> DONE
    - If the destination is not on the local network, it creates a frame and sets the destination MAC address to the MAC address of your gateway (i.e. router).
3. The router receives the frame with its MAC address on it and opens the frame.
4. The router then looks at the destination IP address and decides how to send the packet there.
5. The router then creates a new frame and sets the destination MAC address to the MAC address of the next device (probably your ISP's router).
6. This unwrapping and re-wrapping continues until the packet arrives at the destination device.


- Read https://superuser.com/a/623648
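The decision in step 2 of the example can be sketched in Python. Everything here is hypothetical: the local network range, the gateway address, and the MAC addresses. The IP <-> MAC table is a hard-coded dict; on real hosts it is populated automatically (via ARP). Note that the packet's destination IP address never changes, only the destination MAC address of the frame around it.

```python
import ipaddress

# Hypothetical host configuration.
LOCAL_NETWORK = ipaddress.ip_network("192.168.0.0/24")
GATEWAY_IP = "192.168.0.1"

# Hypothetical IP -> MAC lookup table.
IP_TO_MAC = {
    "192.168.0.1": "aa:aa:aa:aa:aa:01",   # the gateway router
    "192.168.0.20": "aa:aa:aa:aa:aa:20",  # another host on the local network
}


def frame_destination_mac(dst_ip: str) -> str:
    """Choose the destination MAC address for the frame carrying a packet."""
    if ipaddress.ip_address(dst_ip) in LOCAL_NETWORK:
        return IP_TO_MAC[dst_ip]      # local: send straight to that device
    return IP_TO_MAC[GATEWAY_IP]      # remote: hand the frame to the router


print(frame_destination_mac("192.168.0.20"))   # aa:aa:aa:aa:aa:20
print(frame_destination_mac("203.0.113.123"))  # aa:aa:aa:aa:aa:01 (the router)
```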

## The transport layer
- Although a host can run many processes that communicate over the network, IP addresses alone only give us a single channel between two hosts. What we need is a way to transmit these multiple data inputs over this single host-to-host channel and then somehow separate them out at the other end.
- In the context of a communication network, this idea of transmitting multiple signals over a single channel is known as multiplexing, with demultiplexing being the reverse process. It is a general concept that can be applied in lots of contexts within communications networks.
- Ports:
    - A port is an identifier for a specific process running on a host. 
    - Integer in the range 0-65535. 
    - Sections of this range are reserved for specific purposes:
        - 0-1023 are well-known ports: These are assigned to processes that provide commonly used network services. For example, HTTP uses port 80, FTP uses ports 20 and 21, SMTP uses port 25, and so on.
        - 1024-49151 are registered ports: They are assigned as requested by private entities.
        - 49152-65535 are dynamic ports (sometimes known as private ports). Ports in this range cannot be registered for a specific use. They can be used for customized services or for allocation as ephemeral ports.
- The source and destination port numbers are included in the Protocol Data Units (PDU) for the transport layer.
- Transport layer PDUs:
    - Data from the application layer is encapsulated as the data payload in this PDU
    - The source and destination port numbers within the PDU can be used to direct that data to specific processes on a host. 
    - The entire PDU is then encapsulated as the data payload in an IP packet.
    - The IP addresses in the packet header can be used to direct data from one host to another. 
    - The IP address and the port number together are what enables end-to-end communication between specific applications on different machines. 
    - The combination of IP address and port number information can be thought of as defining a communication end-point, referred to as a socket.
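Port ranges and demultiplexing can be sketched in a few lines of Python. The `handlers` dict stands in for processes listening on a host; the handlers, the payloads, and the choice to silently return `None` when nothing is listening are all hypothetical simplifications.

```python
from typing import Callable, Dict, Optional


def port_range(port: int) -> str:
    """Classify a port number into the ranges described above."""
    if port not in range(0, 65536):
        raise ValueError("port must be in the range 0-65535")
    if port in range(0, 1024):
        return "well-known"
    if port in range(1024, 49152):
        return "registered"
    return "dynamic"


# Hypothetical processes on one host, keyed by the port each listens on.
handlers: Dict[int, Callable[[bytes], str]] = {
    80: lambda data: f"web server got {data!r}",
    25: lambda data: f"mail server got {data!r}",
}


def demultiplex(dst_port: int, payload: bytes) -> Optional[str]:
    """Hand a transport-layer payload to the process bound to dst_port."""
    handler = handlers.get(dst_port)
    if handler is None:
        return None  # nothing listening on this port in this toy model
    return handler(payload)


print(port_range(80), port_range(8080), port_range(50000))
# well-known registered dynamic
print(demultiplex(80, b"GET / HTTP/1.1"))
# web server got b'GET / HTTP/1.1'
```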
    
### Sockets:
- Conceptual level: An abstraction for an endpoint used for inter-process communication.
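- Implementation level: the term can refer to different specific things, such as:
    - A) a mechanism for inter-process communication between local processes running on the same machine (e.g. UNIX sockets)
    - B) a mechanism for inter-process communication between networked processes, usually on different machines (e.g. Internet sockets such as TCP/IP sockets)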

### Connectionless and connection-oriented networks 
- In a connectionless system:
    - we could have one socket object defined by the IP address of the host machine and the port assigned to a particular process running on that machine.
    - The socket would wait for incoming messages directed to that particular ip/port pair. Such messages could potentially come from any source, at any time, and in any order, but that isn't a concern in a connectionless system -- it would simply process any incoming messages as they arrived and send any responses as necessary.
- In a connection-oriented system:
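    - a listening socket (defined by the host's IP address and the port of the process) waits for incoming messages; when a message arrives from a new sender, a new socket object is instantiated for that sender's IP address and port.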
    - This new socket object would then listen specifically for messages where all four pieces of information matched (source port, source IP, destination port, destination IP). 
    - Any messages not matching this four-tuple would still be picked up by the original socket, which would then instantiate another socket object for the new connection.
    - Implementing communication in this way effectively creates a dedicated virtual connection for communication between a specific process running on one host and a specific process running on another host. The advantage of having a dedicated connection like this is that it more easily allows you to put in place rules for managing the communication such as the order of messages, acknowledgements that messages had been received, retransmission of messages that weren't received, and so on. The purpose of these types of additional communication rules is to add more reliability to the communication. 
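Python's `socket` module shows this connection-oriented pattern directly: a listening TCP socket's `accept()` returns a brand new socket object for each connection, and each new socket is identified by its own four-tuple. This sketch runs entirely on the loopback address 127.0.0.1 with an OS-chosen port; both choices are just for the demo.

```python
import socket

# Listening socket bound to one IP/port pair (port 0 lets the OS pick one).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
server_addr = listener.getsockname()

# Two separate clients connect to the same listening IP/port.
clients = [socket.create_connection(server_addr) for _ in range(2)]
client_ports = [c.getsockname()[1] for c in clients]

# accept() hands back a NEW socket object for each connection;
# the listening socket stays open for further connections.
conns = [listener.accept()[0] for _ in range(2)]

four_tuples = []
for conn in conns:
    src_ip, src_port = conn.getpeername()  # the client's end
    dst_ip, dst_port = conn.getsockname()  # this host's end
    four_tuples.append((src_ip, src_port, dst_ip, dst_port))
    print(four_tuples[-1])

for s in clients + conns + [listener]:
    s.close()
```

Both tuples share the same destination IP and port (the listening socket's), and differ only in the source port, which is exactly what lets each connection get its own socket object.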

### Network reliability
- A major characteristic of the communication protocols that are primarily used to provide the functionality for the lower layers in our network system is that they are inherently unreliable. 
- Protocols at these layers, such as Ethernet and IP, do detect corrupted data using checksums. If the data is corrupt, however, they simply discard it (dropping the frame or packet); there is no provision within these protocols for enabling the replacement of lost data. 
- The possibility of losing data and it not being replaced means that the network up to and including the Internet Protocol is effectively an unreliable communication channel.
- How can we transfer data reliably over an unreliable channel?
- Fundamental elements required for reliable data transfer:
    - In order delivery: data is received in the order that it was sent
    - Error detection: corrupt data is identified using a checksum
    - Handling data loss: missing data is retransmitted based on acknowledgements and timeouts
    - Handling duplication: duplicate data is eliminated through the use of sequence numbers
- A protocol that implements these elements by sending one message and then waiting for its acknowledgement before sending the next is known as a "Stop and Wait" protocol. It is very slow; for better performance, use "pipelining".
    - The sender implements a 'window' representing the maximum number of messages (n) that can be in the 'pipeline' at any one time. Once it has received the appropriate acknowledgements for the messages in the window, it moves the window on.
    - That way n messages can be in transit at the same time.
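Several of these elements can be sketched in Python: a receiver that uses sequence numbers to deliver messages in order, buffer out-of-order arrivals, and discard duplicates, plus a helper showing which sequence numbers a pipelining sender may have in flight. Using cumulative acknowledgements (an ack number meaning "everything before this has arrived") is an assumption of the sketch; the class and function names are hypothetical.

```python
from typing import Dict, List


class InOrderReceiver:
    """Deliver messages in sequence order, buffering gaps, dropping duplicates."""

    def __init__(self):
        self.next_seq = 0                 # next sequence number we expect
        self.buffer: Dict[int, str] = {}  # messages that arrived early
        self.delivered: List[str] = []    # what the application has received

    def receive(self, seq: int, data: str) -> int:
        """Accept one message and return the (cumulative) ack number."""
        if seq >= self.next_seq and seq not in self.buffer:
            self.buffer[seq] = data       # older or repeated seq = duplicate
        # Deliver every consecutive message we now hold.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return self.next_seq


def sendable(base: int, window: int, total: int) -> range:
    """Sequence numbers a sender may have in flight: base .. base+window-1."""
    return range(base, min(base + window, total))


rx = InOrderReceiver()
# Messages arrive out of order, and message 1 is duplicated.
for seq, data in [(0, "a"), (2, "c"), (1, "b"), (1, "b"), (3, "d")]:
    ack = rx.receive(seq, data)
print(rx.delivered)  # ['a', 'b', 'c', 'd']
print(ack)           # 4

# With a window of 3, the sender can have 3 messages in flight at once.
print(list(sendable(base=0, window=3, total=10)))    # [0, 1, 2]
print(list(sendable(base=ack, window=3, total=10)))  # [4, 5, 6]
```

Message "c" sat in the buffer until "b" arrived. That waiting is the same effect described later under Head-of-Line blocking.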

### Transmission Control Protocol (TCP)
- What TCP essentially provides is the abstraction of reliable network communication on top of an unreliable channel. What this abstraction does is to hide much of the complexity of reliable network communication from the application layer: data integrity, de-duplication, in-order delivery, and retransmission of lost data.
- Reliability isn't the only thing that TCP provides, however; it also provides data encapsulation and multiplexing. It achieves this through the use of TCP Segments.

#### TCP Segments
- Segments are the Protocol Data Unit (PDU) of TCP.
- Segment header:
    - Source and destination port: provide the multiplexing capability of the protocol
    - Sequence and acknowledgement number: these two fields are used together to provide for the other elements of TCP reliability such as In-order Delivery, Handling Data Loss, and Handling Duplication.
    - Checksum: for error detection
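    - Window size: used for flow control
    - Flags: control bits such as SYN, ACK, and FIN, used to establish and terminate connections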
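The fixed 20-byte part of a TCP segment header can be unpacked with Python's `struct` module, which makes the fields listed above concrete. The segment built for the demo is hypothetical (its ports, sequence numbers and window are made up, and the checksum is left as a 0 placeholder rather than computed), and TCP options are ignored.

```python
import struct

# Flag bits in the low bits of the 16-bit "data offset + flags" field.
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte part of a TCP segment header."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, _urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,                      # multiplexing
        "dst_port": dst_port,                      # multiplexing
        "seq": seq,                                # reliability
        "ack": ack,                                # reliability
        "header_len": (offset_flags >> 12) * 4,    # data offset, in bytes
        "flags": [name for name, bit in FLAGS.items() if offset_flags & bit],
        "window": window,                          # flow control
        "checksum": checksum,                      # error detection
    }


# A hypothetical SYN ACK segment: header length 5 x 32-bit words, SYN + ACK set.
syn_ack = struct.pack("!HHIIHHHH", 80, 51000, 5000, 1001,
                      0x5000 | FLAGS["SYN"] | FLAGS["ACK"], 65535, 0, 0)
header = parse_tcp_header(syn_ack)
print(header["flags"])                 # ['SYN', 'ACK']
print(header["seq"], header["ack"])    # 5000 1001
```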
    
#### TCP Connections
- TCP is a connection-oriented protocol.
- TCP establishes a connection using the Three-way Handshake. This is where the SYN and ACK flags come into play; the FIN flag is used in a different process, the Four-way Handshake, which terminates connections.
- Why TCP handshake? https://networkengineering.stackexchange.com/a/24072

--- 

1. The sender sends a SYN message (a TCP Segment with the SYN flag set to 1)
2. Upon receiving this SYN message, the receiver sends back a SYN ACK message (a TCP Segment with the SYN and ACK flags set to 1)
3. Upon receiving the SYN ACK, the sender then sends an ACK (a TCP Segment with the ACK flag set to 1)

--- 

- Upon sending the ACK, the sender can immediately start sending application data. The receiver must wait until it has received the ACK before it can send any data back to the sender. One of the main reasons for this process is to synchronise (SYN) the sequence numbers that will be used during the connection.
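The synchronisation of sequence numbers during the handshake can be sketched in Python. Each side picks an initial sequence number (chosen at random here for simplicity), and each side acknowledges the other's number plus one, because a SYN counts as one sequence number. The function and the tuple layout are hypothetical.

```python
import random


def three_way_handshake():
    """Return the three handshake segments as (flags, seq, ack) tuples."""
    client_isn = random.randrange(2**32)  # client's initial sequence number
    server_isn = random.randrange(2**32)  # server's initial sequence number
    syn = ("SYN", client_isn, None)
    syn_ack = ("SYN ACK", server_isn, (client_isn + 1) % 2**32)
    ack = ("ACK", (client_isn + 1) % 2**32, (server_isn + 1) % 2**32)
    return [syn, syn_ack, ack]


for flags, seq, ack_num in three_way_handshake():
    print(f"{flags:8} seq={seq} ack={ack_num}")
```

After the third segment, both sides know the sequence number the other will start from, so later acknowledgements and retransmissions can refer to specific bytes.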

![title](img/tcp.png)

- A key characteristic of TCP is that the sender cannot send any application data until it has sent the ACK Segment. 
- What this means in practical terms, is that there is an entire round-trip of latency before any application data can be exchanged. Since this hand-shake process occurs every time a TCP connection is made, this clearly has an impact on any application which uses TCP at the transport layer.
- TCP involves a lot of overhead in terms of establishing connections, and providing reliability through the retransmission of lost data. In order to mitigate against this additional overhead, it is important that the actual functioning of data transfer when using the protocol occurs as efficiently as possible. In order to help facilitate efficient data transfer once a connection is established, TCP provides mechanisms for flow control and congestion avoidance.

#### Flow Control
- Mechanism to prevent the sender from overwhelming the receiver with too much data at once. 
- Each side of a connection can let the other side know the amount of data that it is willing to accept via the WINDOW field of the TCP header.
- This number is dynamic, and can change during the course of a connection. If the receiver's buffer is getting full it can set a lower amount in the WINDOW field of a Segment it sends to the sender, the sender can then reduce the amount of data it sends accordingly.
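A minimal Python sketch of flow control from the sender's point of view: in each round it sends no more than the WINDOW value the receiver most recently advertised. The sequence of advertised windows is hypothetical; a real receiver derives it from how much free buffer space it has.

```python
from typing import Iterable, List


def send_with_flow_control(total_bytes: int, advertised_windows: Iterable[int]) -> List[int]:
    """Return how many bytes the sender sends in each round.

    advertised_windows: the WINDOW value from each of the receiver's
    segments, in order (hypothetical values).
    """
    sent_per_round = []
    remaining = total_bytes
    for window in advertised_windows:
        if remaining == 0:
            break
        amount = min(window, remaining)  # never exceed the receiver's window
        sent_per_round.append(amount)
        remaining -= amount
    return sent_per_round


# The receiver's buffer fills up (windows shrink, even to 0), then recovers.
print(send_with_flow_control(10_000, [4000, 2000, 0, 1000, 4000]))
# [4000, 2000, 0, 1000, 3000]
```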

#### Congestion Avoidance
- Network congestion occurs when there is more data being transmitted on the network than there is network capacity to process and transmit the data. 
- IP packets move across the networks in a series of 'hops'. 
- At each hop, the packet needs to be processed: the router at that hop runs a checksum on the packet header; it also needs to check the destination address and work out how to route the packet to the next hop on its journey to that destination. All of this processing takes time, and a router can only process so much data at once. 
- Routers use a 'buffer' to store data that is awaiting processing, but if there is more data to be processed than can fit in the buffer, the buffer over-flows and those data packets are dropped.
- TCP retransmits lost data. If lots of data is lost that means lots of retransmitted data, which is inefficient. 
    - Ideally we want to keep retransmission to a minimum.
    - TCP actually uses data loss as a feedback mechanism to detect, and avoid, network congestion; if lots of retransmissions are occurring, TCP takes this as a sign that the network is congested and reduces the size of the transmission window.
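The idea of using data loss as feedback can be sketched with a simplified additive-increase / multiplicative-decrease (AIMD) model, which is the basic pattern behind classic TCP congestion avoidance. Real TCP congestion control (slow start, fast recovery, and newer algorithms) is considerably more involved; this toy version, including which rounds see loss, is an assumption for illustration.

```python
from typing import List, Set


def aimd(rounds_with_loss: Set[int], rounds: int, initial_window: int = 1) -> List[int]:
    """Simplified AIMD model of a sender's transmission window.

    The window grows by 1 segment per round trip, and is halved
    (minimum 1) in any round assumed to have suffered packet loss.
    """
    window = initial_window
    history = []
    for r in range(rounds):
        history.append(window)
        if r in rounds_with_loss:
            window = max(1, window // 2)  # loss: treat as congestion, back off
        else:
            window += 1                   # no loss: probe for more capacity
    return history


print(aimd(rounds_with_loss={5}, rounds=9))
# [1, 2, 3, 4, 5, 6, 3, 4, 5]
```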

#### Disadvantages of TCP
- Latency overhead in establishing a TCP connection due to the handshake process.
- Head-of-Line (HOL) blocking: relates to how issues in delivering or processing one message in a sequence of messages can delay or 'block' the delivery or processing of the subsequent messages in the sequence.
    - If one of the segments goes missing and needs to be retransmitted, the segments that come after it in the sequence can't be processed, and need to be buffered until the retransmission has occurred. This can lead to increased queuing delay.

### User Datagram Protocol (UDP)
- The Protocol Data Unit (PDU) of UDP is known as a Datagram.
- Header:
    - Source Port
    - Destination Port
    - UDP Length (the length, in bytes, of the Datagram, including the header and any encapsulated data)
    - Checksum field to provide for error detection. (optional when using IPv4)
- Unlike TCP, UDP doesn't do anything to resolve the inherent unreliability of the layers below it.
    - It provides no guarantee of message delivery
    - It provides no guarantee of message delivery order
    - It provides no built-in congestion avoidance or flow-control mechanisms
    - It provides no connection state tracking, since it is a connectionless protocol
- Why UDP? 
    - The advantage that UDP has over TCP is its simplicity.
    - This simplicity provides two things to a software engineer: speed and flexibility.
    - UDP is a connectionless protocol:  Applications using UDP at the Transport layer can just start sending data without having to wait for a connection to be established with the application process of the receiver. 
    - Latency is less of an issue since without acknowledgements data essentially just flows one way: from sender to receiver.
    - The lack of in-order delivery also removes the issue of Head-of-line blocking (at least at the Transport layer).
- With UDP it's up to the developer to implement the services that the application needs at the application level.
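The connectionless nature of UDP is easy to see with Python's `socket` module: the sender just addresses a datagram and sends it, with no handshake first. This sketch uses the loopback address 127.0.0.1 and an OS-chosen port purely for the demo; the timeout is there because UDP gives no guarantee that anything will arrive.

```python
import socket

# Receiver: bind a UDP socket to a local IP/port pair (port 0 = OS picks one).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2)  # no delivery guarantee, so don't wait forever
addr = receiver.getsockname()

# Sender: no connection setup at all; just address the datagram and send it.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", addr)

data, from_addr = receiver.recvfrom(1024)
print(data, from_addr)  # b'hello' and the sender's (IP, port)

sender.close()
receiver.close()
```

Anything beyond this, such as acknowledgements, ordering, or retries, would have to be built by the application itself.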

### Summary
Multiplexing and demultiplexing provide for the transmission of multiple signals over a single channel.

Multiplexing is enabled through the use of network ports.

Network sockets can be thought of as a combination of IP address and port number.

At the implementation level, sockets can also be socket objects.

The underlying network is inherently unreliable. If we want reliable data transport we need to implement a system of rules to enable it.

TCP is a connection-oriented protocol. It establishes a connection using the Three-way Handshake.

TCP provides reliability through message acknowledgement and retransmission, and in-order delivery.

TCP also provides Flow Control and Congestion Avoidance.

The main downsides of TCP are the latency overhead of establishing a connection, and the potential Head-of-line blocking as a result of in-order delivery.

UDP is a very simple protocol compared to TCP. It provides multiplexing, but no reliability, no in-order delivery, and no congestion or flow control.

UDP is connectionless, and so doesn't need to establish a connection before it starts sending data.

Although it is unreliable, the advantage of UDP is speed and flexibility.

## Application Layer Protocols
- We can perhaps think of Application layer protocols as being the rules for how applications talk to each other at a syntactical level. 
- Different types of applications have different requirements with regards to how they communicate at a syntactical level, and so as a result there are many different protocols which exist at the application layer.
- HTTP operates at the application layer and is concerned with structuring the messages that are exchanged between applications; it's actually TCP/IP that's doing all the heavy lifting and ensuring that the request/response cycle gets completed between your browser and the server.

### HTTP and the web
- The World Wide Web, or web for short, is a service that can be accessed via the internet. In simple terms it is a vast information system comprised of resources which are navigable by means of a URL (Uniform Resource Locator). HTTP is closely tied, both historically and functionally, to the web as we know it. It is the primary means by which applications interact with the resources which make up the web.
- Required components:
    - Hypertext Markup Language (HTML) is the means by which the resources in this system are uniformly structured.
    - Uniform Resource Identifier (URI) is a string of characters which identifies a particular resource.
    - Hypertext Transfer Protocol (HTTP) is the set of rules which provides uniformity to the way resources on the web are transferred between applications.

### HTTP history
- HTTP 0.9:
    - A request was a single line (no headers or request body) consisting of:
        - The method (GET was the only method supported at this stage)
        - The path (this was simply the path to the resource on the server, e.g. /index.html, rather than a complete URI/ URL, since connection to the server would already have been established by some other means such as telnet).
    - A response under this protocol would be a single hypertext document, with no headers or other meta-data such as status codes or version numbers. The end of the response was signalled by the server closing the connection.
- HTTP 1.0:
    - The addition of two HTTP methods: HEAD and POST.
    - The request line now became more flexible and varied.
        - The Request-URI portion of the request line could now either be a path or an absolute URI. 
        - The HTTP version was added to the end of the request line.
        - Request header with metadata.
    - Responses now included a status line.
        - Status code with accompanying status text.
        - HTTP version being used for the response.
        - Response header with metadata.
- HTTP 1.1:
    - The first standard version of HTTP.
    - Resolved various ambiguities and interoperability issues.
    - Provided much needed performance improvements to better serve the type of content that was now being produced for the web.
    - Before v1.1 HTTP still operated on the basis of using a separate TCP connection for each request-response cycle.
        - The client would open a connection to the server and make the request, the server would then provide the response and close the connection. If the client needed to make another request it would open a new connection.
    - One of the major advances that HTTP/1.1 provided was connection re-use, where the same TCP connection could be used for making multiple requests.
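        - Reduced the latency overhead of performing a new TCP handshake for every request.
        - Allowed pipelining of requests: the client doesn't need to wait for a response to one request before sending more requests (although the server still has to return the responses in the order the requests were sent).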
    - Addition of more HTTP methods: PUT, DELETE, TRACE, and OPTIONS.
- HTTP/2:
    - HTTP/2 provides multiplexing instead of pipelining: multiple requests and responses can be in flight at the same time over a single connection, interleaved in parallel, so responses no longer have to be returned in the order the requests were sent.

### Statelessness

- A protocol is said to be stateless when it's designed in such a way that each request/response cycle is completely independent of the previous one.
- Each request made to a resource is treated as a brand new entity, and different requests are not aware of each other. 

### URL (Uniform Resource Locator)

- http://www.example.com:88/home?item=book
- `http`: The scheme. It always comes before the colon and two forward slashes and tells the web client how to access the resource. In this case it tells the web client to use the Hypertext Transfer Protocol or HTTP to make a request. Other popular URL schemes are ftp, mailto or git. 
- `www.example.com`: The host. It tells the client where the resource is hosted or located.
- `:88` : The port or port number. It is only required if you want to use a port other than the default.
- `/home`: The path. It shows what local resource is being requested. This part of the URL is optional.
- `?item=book`: The query string, which is made up of query parameters. It is used to send data to the server. This part of the URL is also optional.
- Unless a different port number is specified, port 80 will be used by default in normal HTTP requests. 
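To make the components concrete, here is a minimal bash sketch that splits a URL into these parts using parameter expansion. `parse_url` is a hypothetical helper; it assumes a simple `scheme://host[:port][/path][?query]` URL (no user info or fragments) and only knows the default ports for `http` and `https`.

```bash
parse_url() {
  local url="$1" scheme rest authority host port path query
  scheme=${url%%://*}              # everything before "://"
  rest=${url#*://}                 # everything after "://"
  authority=${rest%%[/?]*}         # host[:port], up to the first "/" or "?"
  rest=${rest#"$authority"}        # what remains: [/path][?query]
  host=${authority%%:*}
  if [[ "$authority" == *:* ]]; then
    port=${authority##*:}          # explicit port
  else
    case "$scheme" in              # assumption: only http/https defaults are known
      http)  port=80 ;;
      https) port=443 ;;
      *)     port="" ;;
    esac
  fi
  path=${rest%%\?*}                # drop everything from the first "?"
  if [[ "$rest" == *\?* ]]; then
    query=${rest#*\?}              # everything after the first "?"
  else
    query=""
  fi
  printf 'scheme=%s\nhost=%s\nport=%s\npath=%s\nquery=%s\n' \
    "$scheme" "$host" "$port" "$path" "$query"
}

parse_url "http://www.example.com:88/home?item=book"
```

For the example URL this prints `scheme=http`, `host=www.example.com`, `port=88`, `path=/home` and `query=item=book` on separate lines. Without `:88` the port falls back to 80, matching the default described above.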

#### Query strings / parameters

- http://www.example.com?search=ruby&results=10
- `?` marks the start of the query string
- `search=ruby` is a name/value pair
- `&` means that another name/value pair will follow
- `results=10` is a name/value pair

- Because query strings are passed in through the URL, they are most commonly used with `HTTP GET` requests (a `POST` request usually sends its data in the request body instead).
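The structure of a query string can be sketched in bash by splitting on `&` and then on the first `=` of each pair. `parse_query` is a hypothetical helper, and this sketch does not URL-decode the values.

```bash
parse_query() {
  local query pair
  local -a pairs
  query=${1#\?}                              # drop the leading "?" if present
  IFS='&' read -r -a pairs <<< "$query"      # "&" separates the name/value pairs
  for pair in "${pairs[@]}"; do
    echo "name=${pair%%=*} value=${pair#*=}" # "=" separates name from value
  done
}

parse_query "?search=ruby&results=10"
```

This prints `name=search value=ruby` followed by `name=results value=10`.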

#### URL encoding

URLs are designed to accept only certain characters in the standard 128-character ASCII character set. Reserved or unsafe ASCII characters which are not being used for their intended purpose, as well as characters not in this set, have to be encoded. URL encoding serves the purpose of replacing these non-conforming characters with a % symbol followed by two hexadecimal digits that represent the ASCII code of the character.
- Allowed characters: only alphanumeric characters, the special characters `$-_.+!*'(),`, and reserved characters when used for their reserved purposes can be used unencoded.
- A reserved character (such as `/`, `?`, `:`, `=` or `&`) that is not being used for its reserved purpose has to be encoded.
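Percent-encoding of a single URL component can be sketched in bash as below, assuming ASCII input. `urlencode` is a hypothetical helper: it keeps alphanumerics and the special characters listed above, and replaces everything else (including reserved characters, since a single component is treated as data) with `%` plus two hex digits of the character's ASCII code.

```bash
urlencode() {
  local LC_ALL=C                  # work byte by byte, independent of the user's locale
  local string="$1" encoded="" char i
  for (( i = 0; i < ${#string}; i++ )); do
    char=${string:i:1}
    case "$char" in
      [a-zA-Z0-9\$_.+!*\'\(\),-]) encoded+="$char" ;;             # safe as-is
      *) encoded+=$(printf '%%%02X' "'$char") ;;                 # e.g. " " -> %20
    esac
  done
  printf '%s\n' "$encoded"
}

urlencode "ruby on rails"
urlencode "a&b=c/d"
```

The two calls print `ruby%20on%20rails` and `a%26b%3Dc%2Fd`: the space, `&`, `=` and `/` are all replaced by their hex ASCII codes.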

### Request methods

- `GET`: retrieve a resource from the server
- `POST`: send data to the server

### HTTP headers

- HTTP headers allow the client and the server to send additional information during the request/response HTTP cycle.
- Headers are colon-separated name-value pairs that are sent in plain text.
- Request headers give more information about the client and the resource to be fetched.
- Response headers contain additional meta-information about the response data being returned.
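A raw response is plain text: a status line, then one `Name: value` header per line, then an empty line before the (optional) body, with lines ending in CRLF. A minimal bash sketch that reads the status line and headers from a hard-coded, hypothetical response; `parse_response` is an illustrative helper name:

```bash
raw_response=$'HTTP/1.1 302 Found\r\nLocation: http://www.example.com/home\r\nContent-Length: 0\r\n\r\n'

parse_response() {
  local version code text line name value
  {
    read -r version code text            # status line: version, code, status text
    text=${text%$'\r'}                   # HTTP lines end in CRLF; drop the CR
    echo "version=$version code=$code text=$text"
    while IFS= read -r line; do
      line=${line%$'\r'}
      if [[ -z "$line" ]]; then          # an empty line ends the headers
        break
      fi
      name=${line%%:*}                   # text before the first colon
      value=${line#*: }                  # text after the first ": "
      echo "header $name=$value"
    done
  } <<< "$1"
}

parse_response "$raw_response"
```

This prints the status line fields, then `header Location=http://www.example.com/home` and `header Content-Length=0`.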

### Status codes

- `200`: OK
- `302`: Redirect (resource moved)
    - `3XX`: generally used in relation to redirection, and indicate to the client that it must take some additional action in order to complete the request.
- `404`: Resource not found
    - `4XX` codes indicate an error on the client side, e.g. the request is malformed (`400`) or asks for a resource that doesn't exist (`404`).
- `500`: Internal Server Error
    - `5XX` error codes indicate an error or issue on the server side.
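Since the first digit carries the category, a client can branch on it. A small bash sketch with a hypothetical `status_category` helper, using `case` glob patterns:

```bash
status_category() {
  case "$1" in
    2??) echo "success" ;;
    3??) echo "redirection" ;;
    4??) echo "client error" ;;
    5??) echo "server error" ;;
    *)   echo "other" ;;
  esac
}

for code in 200 302 404 500; do
  echo "$code: $(status_category "$code")"
done
```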

### Sessions

- With some help from the client (i.e., the browser), HTTP can be made to act as if it were maintaining a stateful connection with the server, even though it's not. 
- Server sends some form of a unique token to the client. 
- When the client makes a request to that server, it includes this token as part of the request, allowing the server to identify the client.
- This token is called a `session identifier` (session id).
- Creates a sense of persistent connection between requests. 
- This sort of faux statefulness has several consequences:
  - First, every request must be inspected to see if it contains a session identifier. 
  - Second, if it does contain a session id, the server must check to ensure that this session id is still valid. The server needs to maintain some rules with regards to how to handle session expiration and also decide how to store its session data. 
  - Third, the server needs to retrieve the session data based on the session id.
  - And finally, the server needs to recreate the application state (e.g., the HTML for a web request) from the session data and send it back to the client as the response.
- The server has to work very hard to simulate a stateful experience, and every request still gets its own response, even if most of that response is identical to the previous one. That is, unless something like AJAX is used, the whole page needs to be recreated.
- The most common way to store session id information is via a browser cookie. 
- A cookie is a piece of data that's sent from the server and stored in the client during a request/response cycle.
- Cookies, or HTTP cookies, are small files stored in the browser that contain the session information.
- The client side cookie is compared with the server-side session data on each request to identify the current session.
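The server-side steps listed above (inspect the request, check the session id is valid and not expired, retrieve the session data, rebuild the response) can be sketched in bash. Everything here is hypothetical: the in-memory `sessions` table, the `session_id` cookie name and the helper names. A real server would use its framework's session store.

```bash
# Hypothetical session store: session id -> "username:expiry (unix time)"
declare -A sessions=(
  [a1b2c3]="sara:$(( $(date +%s) + 3600 ))"   # expires in one hour
  [d4e5f6]="peter:$(( $(date +%s) - 60 ))"    # expired a minute ago
)

# Extract the session id from a Cookie request header.
session_id_from_cookie() {
  local header=${1#Cookie: } pair
  local -a pairs
  IFS=';' read -r -a pairs <<< "$header"
  for pair in "${pairs[@]}"; do
    pair=${pair# }                            # trim the space after each ";"
    if [[ "${pair%%=*}" == "session_id" ]]; then
      echo "${pair#*=}"
      return 0
    fi
  done
  return 1
}

handle_request() {
  local id record user expiry
  # 1. Inspect the request for a session id.
  if ! id=$(session_id_from_cookie "$1"); then
    echo "no session: treat as a new visitor"; return
  fi
  # 2. Check that the session id is known and has not expired.
  record=${sessions[$id]-}
  if [[ -z "$record" ]]; then
    echo "unknown session id"; return
  fi
  user=${record%%:*}
  expiry=${record#*:}
  if (( expiry < $(date +%s) )); then
    echo "session expired"; return
  fi
  # 3-4. Retrieve the session data and use it to build the response.
  echo "Welcome back, $user"
}

handle_request "Cookie: theme=dark; session_id=a1b2c3"   # Welcome back, sara
handle_request "Cookie: session_id=d4e5f6"               # session expired
handle_request "Cookie: theme=dark"                      # no session: treat as a new visitor
```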

### AJAX

- Asynchronous JavaScript and XML. 
- Allows browsers to issue requests and process responses without a full page refresh. 
- AJAX requests are just like normal requests: they are sent to the server with all the normal components of an HTTP request, and the server handles them like any other request. 
- The only difference is that instead of the browser refreshing and processing the response, the response is processed by a callback function, which is usually some client-side JavaScript code to re-render a part of the page.

## Security

### Secure HTTP (HTTPS)

- Requests and responses are strings containing information. 
- If a malicious hacker was attached to the same network, they could employ packet sniffing techniques to read the messages being sent back and forth. 
- With HTTPS every request/response is encrypted before being transported on the network. 
- This means if a malicious hacker sniffed out the HTTP traffic, the information would be encrypted and useless.
- HTTPS sends messages through a cryptographic protocol called TLS for encryption. Earlier versions of HTTPS used SSL or Secure Sockets Layer until TLS was developed. 

### Same-origin policy

- Same-origin policy permits unrestricted interaction between resources originating from the same origin, but restricts certain interactions between resources originating from different origins.
- Same-origin policy doesn't restrict all cross-origin requests. Requests such as linking, redirects, or form submissions to different origins are typically allowed. Also typically allowed is the embedding of resources from other origins, such as scripts, css stylesheets, images and other media, fonts, and iframes. What is typically restricted are cross-origin requests where resources are being accessed programmatically using APIs such as XMLHttpRequest or fetch.
- CORS (Cross-origin resource sharing) is a mechanism that allows interactions that would normally be restricted cross-origin to take place. It works by adding new HTTP headers, which allow servers to serve resources cross-origin to certain specified origins.
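Two URLs have the same origin when their scheme, host and port all match. A minimal bash sketch with hypothetical `origin_of` and `same_origin` helpers; it assumes simple `scheme://host[:port]...` URLs, fills in only the `http`/`https` default ports, and ignores case differences in host names.

```bash
origin_of() {
  local url="$1" scheme rest authority host port
  scheme=${url%%://*}
  rest=${url#*://}
  authority=${rest%%[/?#]*}       # host[:port]
  host=${authority%%:*}
  if [[ "$authority" == *:* ]]; then
    port=${authority##*:}
  else
    case "$scheme" in             # assumption: only http/https defaults are known
      http)  port=80 ;;
      https) port=443 ;;
    esac
  fi
  echo "$scheme://$host:$port"
}

same_origin() {
  [[ "$(origin_of "$1")" == "$(origin_of "$2")" ]]
}

if same_origin "http://www.example.com/home" "http://www.example.com:80/about"; then
  echo "same origin"
fi
if ! same_origin "http://www.example.com" "https://www.example.com"; then
  echo "different origin (scheme differs)"
fi
```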

### Session Hijacking

- If an attacker gets a hold of the session id, both the attacker and the user now share the same session and both can access the web application.
- Countermeasures for Session Hijacking:
  - Resetting sessions: A new login renders old session id invalid.
  - Expiration time on sessions
  - HTTPS across the entire app

### Cross-Site Scripting (XSS)

- This type of attack happens when you allow users to input HTML or JavaScript that ends up being displayed by the site directly.
- If the server side code doesn't do any sanitization of input, the user input will be injected into the page contents, and the browser will interpret the HTML and JavaScript and execute it.
- Countermeasures: 
  - Sanitize user input
  - Escape all user input data when displaying it.
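Escaping means replacing the characters HTML treats specially with entities, so the browser displays them instead of interpreting them. A minimal bash sketch using `sed`; `html_escape` is a hypothetical helper, and a real application would use its framework's escaping function.

```bash
html_escape() {
  # "&" must be replaced first, so the "&" in the other entities isn't escaped again
  printf '%s\n' "$1" | sed \
    -e 's/&/\&amp;/g' \
    -e 's/</\&lt;/g' \
    -e 's/>/\&gt;/g' \
    -e 's/"/\&quot;/g' \
    -e "s/'/\&#39;/g"
}

html_escape '<script>alert("hi")</script>'
```

This prints `&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;`, which a browser shows as text rather than running as JavaScript.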

### Basic server infrastructure
- Web server: is typically a server that responds to requests for static assets: files, images, css, javascript, etc. These requests don't require any data processing, so can be handled by a simple web server.

- Application server: is typically where application or business logic resides, and is where more complicated requests are handled. This is where your server-side code lives when deployed.

- Persistent data store: like a relational database, used to retrieve or create data. Data stores can also be simple files, key/value stores, document stores and many other variations, as long as it can save data in some format for later retrieval and processing.

### Summary 
The Domain Name System (DNS) is a distributed database which translates a domain name such as google.com to an IP Address such as 216.58.213.14.

A URI is an identifier for a particular resource within an information space.

A URL is a subset of URI, but the two terms are often used interchangeably.

URL components include the scheme, host (or hostname), port, path, and query string.

Query strings are used to pass additional data to the server during an HTTP Request. They take the form of name/value pairs, with each name and value separated by an = sign. Multiple name/value pairs are separated by an & sign. The start of the query string is indicated by a ?.

URL encoding is a technique whereby certain characters in a URL are replaced with a % symbol followed by two hexadecimal digits representing the character's ASCII code.

URL encoding is used if a character has no corresponding character in the ASCII set, is unsafe because it is used for encoding other characters, or is reserved for special use within the URL.

A single HTTP message exchange consists of a Request and a Response. The exchange generally takes place between a Client and a Server. The client sends a Request to the server and the server sends back a Response.

An HTTP Request consists of a request line, headers, and an optional body.

An HTTP Response consists of a status line, optional headers, and an optional body.

Status codes are part of the status line in a Response. They indicate the status of the request. There are various categories of status code.

HTTP is a stateless protocol. This means that each Request/Response cycle is independent of the Requests and Responses that came before or those that come after.

Statefulness can be simulated through techniques which use session IDs, cookies, and AJAX.

HTTP is inherently insecure. Security can be increased by using HTTPS, enforcing Same-origin policy, and using techniques to prevent Session Hijacking and Cross-site Scripting.

## The Transport Layer Security (TLS) Protocol
There are three important security services that are provided by TLS:
- Encryption: a process of encoding a message so that it can only be read by those with an authorized means of decoding the message
- Authentication: a process to verify the identity of a particular party in the message exchange
- Integrity: a process to detect whether a message has been interfered with or faked

Combined, these three services provide very secure message exchange over what is essentially an insecure channel (HTTP).

### Cryptography basics
- Symmetric encryption: 
    - same key used for encryption and decryption, e.g. the Caesar cipher and the Vigenère cipher
    - works in both directions
- Asymmetric encryption: 
    - uses keypair: public and private key
    - public key: used to encrypt
    - private key: used to decrypt
    - works in one direction
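The symmetric case can be illustrated with a Caesar cipher in bash: the same key (a shift of 3) both encrypts and, applied in reverse, decrypts. This is a toy for illustration only; `caesar` is a hypothetical helper, and the cipher offers no real security.

```bash
caesar() {
  local key=$(( ($1 % 26 + 26) % 26 ))            # normalise the shift to 0..25
  local upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
  local lower=abcdefghijklmnopqrstuvwxyz
  local shifted_upper="${upper:key}${upper:0:key}"   # alphabet rotated by key
  local shifted_lower="${lower:key}${lower:0:key}"
  printf '%s\n' "$2" | tr "$upper$lower" "$shifted_upper$shifted_lower"
}

secret=$(caesar 3 "Hello, World")   # encrypt with key 3
echo "$secret"                       # Khoor, Zruog
caesar -3 "$secret"                  # decrypt with the same key: Hello, World
```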
    

### TLS Encryption
- To set up an encrypted connection, TLS performs a TLS Handshake.
- Uses a combination of symmetric and asymmetric cryptography
- The bulk of the message exchange is conducted via symmetric key encryption, but the initial symmetric key exchange is conducted using asymmetric key encryption in order to exchange the key for the subsequent symmetric encryption.
- TLS assumes TCP is being used at the Transport layer, and the TLS Handshake takes place after the TCP Handshake

#### Process:
1. Immediately after the client sends the `TCP ACK` (completing the TCP handshake), it sends a `ClientHello` message that contains, among other things, the maximum TLS version it supports and a list of the Cipher Suites it supports.
2. The server receives the `ClientHello` and responds with a `ServerHello` message, which sets the Cipher Suite and TLS protocol version, followed by the server certificate (which contains the server's public key) and a `ServerHelloDone` marker.
3. The client receives the `ServerHelloDone` marker and initiates the key exchange process (in order to create the key for the symmetric encryption). The exact procedure depends on which Cipher Suite was chosen (e.g. RSA, Diffie-Hellman, etc.).

Example (RSA key exchange):

4. The client generates a `pre-master secret`, encrypts it using the server's public key, and sends a `ClientKeyExchange` message containing the encrypted `pre-master secret` to the server, along with a `ChangeCipherSpec` flag and a `Finished` flag.
5. The server receives the encrypted `pre-master secret` and decrypts it using its private key.
6. Both client and server now use the `pre-master secret`, along with some other pre-agreed parameters, to generate the same symmetric key.
7. The server sends a message with `ChangeCipherSpec` and `Finished` flags. The client and server can now begin secure communication using the symmetric key.
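Steps 4 to 6 can be sketched in bash with textbook RSA on tiny numbers (p = 61, q = 53, so n = 3233, e = 17, d = 2753). This is a toy under loud assumptions: real TLS uses keys of thousands of bits with padding, and derives the session key with a dedicated function. Here `derive_key` (hashing the secret with two made-up "random" values) is hypothetical, and `sha256sum` assumes GNU coreutils.

```bash
n=3233   # modulus, part of both keys
e=17     # public exponent: (e, n) is the public key in the server's certificate
d=2753   # private exponent: (d, n) is known only to the server

# Square-and-multiply modular exponentiation: prints base^exp mod m
modpow() {
  local base=$(( $1 % $3 )) exp=$2 m=$3 result=1
  while (( exp > 0 )); do
    if (( exp % 2 == 1 )); then
      result=$(( result * base % m ))
    fi
    base=$(( base * base % m ))
    exp=$(( exp / 2 ))
  done
  echo "$result"
}

# Hypothetical key derivation: hash the secret with values both sides know.
derive_key() {
  printf '%s:%s:%s' "$1" "$2" "$3" | sha256sum | cut -c1-32
}

client_random=1111   # hypothetical value sent in the ClientHello
server_random=2222   # hypothetical value sent in the ServerHello

# Step 4: client picks a pre-master secret (< n) and encrypts it with the public key
pre_master=65
encrypted=$(modpow "$pre_master" "$e" "$n")

# Step 5: server decrypts it with its private key
decrypted=$(modpow "$encrypted" "$d" "$n")

# Step 6: each side derives the symmetric key on its own
client_key=$(derive_key "$pre_master" "$client_random" "$server_random")
server_key=$(derive_key "$decrypted" "$client_random" "$server_random")

echo "encrypted pre-master secret: $encrypted"
echo "server recovered: $decrypted"
echo "keys match: $([[ "$client_key" == "$server_key" ]] && echo yes || echo no)"
```

The encrypted value is 2790, the server recovers 65, and both sides end up with the same key without it ever being sent over the network.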


### TLS Authentication
- During the TLS Handshake, as part of its response to the `ClientHello` message, the server provides its certificate.
- Part of the function of this certificate is so that the client can use the Public Key contained within it during the key exchange process.
- Another function of this certificate is to provide a means of identification for the party providing it.

#### Authentication process:
1. Server sends its certificate, which includes its public key.
2. Server creates a 'signature' in the form of some data signed with the server's private key.
3. The signature is sent to the client in a message along with the original data from which the signature was created.
4. Client decrypts the signature using the server's public key and compares the decrypted data to the original version.

If the two versions match then the encrypted version could only have been created by a party in possession of the private key. This proves that the server has access to the private key and therefore is the owner of the certificate.
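The signature check is asymmetric encryption run "backwards": the private key signs, the public key verifies. A toy bash sketch with textbook RSA on tiny numbers (n = 3233, e = 17, d = 2753), for illustration only; real signatures use large keys and sign a hash of the data with padding. The value of `data` is hypothetical.

```bash
n=3233   # modulus, shared by both keys
e=17     # public exponent, published in the certificate
d=2753   # private exponent, known only to the server

# Square-and-multiply modular exponentiation: prints base^exp mod m
modpow() {
  local base=$(( $1 % $3 )) exp=$2 m=$3 result=1
  while (( exp > 0 )); do
    if (( exp % 2 == 1 )); then
      result=$(( result * base % m ))
    fi
    base=$(( base * base % m ))
    exp=$(( exp / 2 ))
  done
  echo "$result"
}

data=123                                 # hypothetical data to sign (must be < n)
signature=$(modpow "$data" "$d" "$n")    # step 2: sign with the private key

# Step 4: client decrypts the signature with the public key and compares
recovered=$(modpow "$signature" "$e" "$n")
if [[ "$recovered" == "$data" ]]; then
  echo "signature valid: the sender holds the private key"
else
  echo "signature invalid"
fi
```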

However, what's to stop a malicious third-party creating their own key pair and certificate identifying them as, say, a well-known bank? 
#### Certificate Authorities and the Chain of Trust
When a Certificate Authority issues a certificate, it does a couple of important things:
1. Verifies that the party requesting the certificate is who they say they are. The way that this is done is up to the CA and will depend to an extent on the type of certificate being issued.

2. Digitally signs the certificate being issued. This is often done by encrypting some data with the CA's own private key and using this encrypted data as a 'signature'. The unencrypted version of the data is also added to the certificate. In order to verify that the certificate was issued by the CA, the signature can be decrypted using the CA's public key and checked for a match against the unencrypted version.

- There are different 'levels' of CA. An 'Intermediate CA' can be any company or body authorised by a 'Root CA' to issue certificates on its behalf.
    - If the private key of an Intermediate CA somehow became compromised, the Root CA can revoke the certificate for that Intermediate CA, thereby invalidating all of the certificates down the chain from it, and simply issue a new one.
- Root Certificates are 'self-signed', and are essentially the end-point of the chain of trust.
- Client software, such as browsers, store a list of these authorities along with their Root Certificates (which includes their public key).
- When receiving a certificate for checking, the browser can go up the chain to the Root Certificate stored in its list.
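Walking up the chain can be sketched in bash as a lookup table from each certificate to its issuer. All certificate names are hypothetical, and this sketch only follows the chain and checks that it ends at a trusted, self-signed root; it does not verify any signatures, which a real browser does at every step.

```bash
# Hypothetical certificates: subject -> issuer
declare -A issuer_of=(
  ["www.example-bank.com"]="Example Intermediate CA"
  ["Example Intermediate CA"]="Example Root CA"
  ["Example Root CA"]="Example Root CA"        # root certificates are self-signed
  ["evil-bank.example"]="evil-bank.example"    # self-signed, but not a trusted root
)

# Root certificates stored by the (hypothetical) browser
trusted_roots=("Example Root CA")

verify_chain() {
  local cert="$1" issuer root hops=0
  while (( hops < 10 )); do                  # guard against overly long chains
    issuer=${issuer_of[$cert]-}
    if [[ -z "$issuer" ]]; then
      echo "untrusted: no certificate found for '$cert'"
      return 1
    fi
    if [[ "$issuer" == "$cert" ]]; then      # self-signed: end of the chain
      for root in "${trusted_roots[@]}"; do
        if [[ "$root" == "$cert" ]]; then
          echo "trusted: chain ends at root '$cert'"
          return 0
        fi
      done
      echo "untrusted: self-signed '$cert' is not a trusted root"
      return 1
    fi
    cert="$issuer"                           # move one step up the chain
    hops=$(( hops + 1 ))
  done
  echo "untrusted: chain too long"
  return 1
}

verify_chain "www.example-bank.com"
verify_chain "evil-bank.example" || echo "rejecting connection"
```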

### TLS Integrity
- In the TLS record (the TLS PDU) there is a MAC (Message Authentication Code) field.
- The intention of the MAC field in a TLS record is to add a layer of security by providing a means of checking that the message hasn't been altered or tampered with in transit.

#### Process:
1. The sender creates what's called a digest of the data payload by hashing the payload with a pre-agreed value. The hashing algorithm to be used and the hash value will have been agreed as part of the TLS Handshake process, when the Cipher Suite is negotiated.
2. The sender encrypts the data payload and encapsulates it, together with the digest in the MAC field, into a TLS record, then passes this record down to the Transport layer to be sent to the other party.
3. The receiver decrypts the data payload, creates its own digest of the payload using the same algorithm and hash value, and compares it with the digest received in the MAC field.

If the two digests match, this confirms the integrity of the message.
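The digest-and-compare idea can be sketched in bash. This is a simplification under stated assumptions: the "digest" here is plain SHA-256 over a hypothetical shared secret followed by the payload (real TLS uses HMAC or an AEAD cipher), the encryption step is left out, and `sha256sum` assumes GNU coreutils.

```bash
mac_key="pre-agreed-secret"   # hypothetical value agreed during the handshake

# Simplified keyed digest (NOT HMAC): SHA-256 over key + payload
digest() {
  printf '%s%s' "$mac_key" "$1" | sha256sum | cut -d' ' -f1
}

# Receiver side: recompute the digest and compare it with the received one
verify() {
  [[ "$(digest "$1")" == "$2" ]]
}

payload="transfer 100 to alice"
mac=$(digest "$payload")      # the sender puts this in the MAC field

if verify "$payload" "$mac"; then echo "intact"; else echo "tampered"; fi
if verify "transfer 900 to mallory" "$mac"; then echo "intact"; else echo "tampered"; fi
```

The first check prints `intact`; the second prints `tampered`, because an altered payload no longer matches the digest that travelled with it.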

### TLS Summary
By default HTTP Requests and Responses are transferred in plain text; as such they are essentially insecure.

We can use the Transport Layer Security (TLS) Protocol to add security to HTTP communications.

TLS encryption allows us to encode messages so that they can only be read by those with an authorized means of decoding the message.

TLS encryption uses a combination of Symmetric Key Encryption and Asymmetric Key Encryption. Encryption of the initial key exchange is performed asymmetrically, and subsequent communications are symmetrically encrypted.

The TLS Handshake is the process by which a client and a server exchange encryption keys.

The TLS Handshake must be performed before secure data exchange can begin; it involves several round-trips of latency and therefore has an impact on performance.

A cipher suite is the agreed set of algorithms used by the client and server during the secure message exchange.

TLS authentication is a means of verifying the identity of a participant in a message exchange.

TLS Authentication is implemented through the use of Digital Certificates.

Certificates are signed by a Certificate Authority, and work on the basis of a Chain of Trust which leads to one of a small group of highly trusted Root CAs.

Certificates are exchanged during the TLS Handshake process.

TLS Integrity provides a means of checking whether a message has been altered or interfered with in transit.

TLS Integrity is implemented through the use of a Message Authentication Code (MAC).

## Browser Optimizations
Browsers have their own optimizations in order to overcome the performance challenges of today's web. While every browser has specific optimizations, there are two broad types: Document-Aware Optimizations and Speculative Optimizations.
- Document-aware optimizations are when the browser leverages networking integrated with parsing techniques to identify and prioritize fetching resources. The goal is to more efficiently load a web page by prioritizing certain resources such as CSS layouts and JS which can take the longest amount of time.

- Speculative optimizations are when the browser learns the navigation patterns of the user over time and attempts to predict user actions. This can involve pre-resolving DNS names, or even pre-rendering pages to frequently visited sites. Or the browser can open a TCP connection in anticipation of an HTTP request when a user hovers over a link.

### Other optimisations
- Remove unnecessary resources to reduce number of requests
- Compress files to reduce size
- Re-use TCP connections, make sure `keep-alive` settings are set properly on server
- Self-host external resources to reduce the number of DNS resolutions

### HTTP and Real-Time Data Synchronization
The HTTP request-response model requires a request to be sent by the client before a response can be returned. So how would we update page content without the client issuing a request, e.g. to show a notification?
#### XHR (XMLHttpRequest)
- XHR enables clients to manage requests and responses programmatically and asynchronously
- XHR is a key component of Asynchronous JavaScript and XML (or AJAX for short)
- While XHR is popular for “real-time” delivery of data updates via the use of polling or long-polling, it may not be the most performant solution when compared to the SSE and WebSocket APIs.

#### SSE (Server-Sent Events)
- SSE is a networking API that enables efficient server-to-client streaming of text-based event data. 
- It enables the server to send real-time notifications or updates created by the server to the client, without requiring the client to send a request for the updates.
- The way that SSE achieves this is through the delivery of messages over a single, long-lived TCP connection. 
- SSE enables efficient, low-latency server-to-client streaming of text-based data in which the client initiates a connection and the server streams updates to the client. 
- After the initial handshake to establish the connection, the client can no longer send any data to the server using that particular connection, instead the server uses it to provide real-time data updates to the client.
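The "text-based event data" is a simple line format: the server responds with `Content-Type: text/event-stream`, keeps the connection open, and writes each event as `field: value` lines followed by a blank line. A bash sketch of the formatting only; `sse_event` and the event contents are hypothetical, and serving it over a real long-lived connection is out of scope here.

```bash
sse_event() {
  local event="$1" data="$2" line
  if [[ -n "$event" ]]; then
    printf 'event: %s\n' "$event"        # optional event name
  fi
  while IFS= read -r line; do
    printf 'data: %s\n' "$line"          # one "data:" line per line of payload
  done <<< "$data"
  printf '\n'                            # a blank line terminates the event
}

sse_event "notification" "You have a new message"
```

This prints `event: notification`, then `data: You have a new message`, then the blank line that tells the browser the event is complete.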

#### WebSocket
- The WebSocket API is a simple and minimal API that enables us to deliver arbitrary application protocols between the client and server such that either side can send data to the other at any time.
- Allows either side to independently send messages to each other. This form of communication is known as bidirectional communication.
- Communication over WebSocket consists of messages and application code that does not need to worry about buffering, parsing, and reconstructing received data. 
- It provides low latency delivery of text and binary application data in both directions and overall is a perfect fit for delivering custom application protocols in the browser.
- However, the simplicity and minimal nature of this API comes with a trade-off as the application must account for missing state management, compression, caching, and other services otherwise provided by the browser. 

## Peer to Peer Networking
With a P2P network, there isn't the same clear separation of roles; instead of a 'client' and a 'server', each computer within the network acts as a 'node'. Within this network architecture, each node is capable of performing the functions that both a client and a server would in the client-server model.
- One of the main difficulties in setting up a P2P network involves discovery, i.e. finding other nodes on the network
    - One approach is to use flooding. This is where a message is sent out to the network and each node forwards it until a specified number of network 'hops' have elapsed. 
    - A more structured approach is to use a Distributed Hash Table (DHT). We won't go into the details of exactly how this is implemented, but it is essentially a table of key-value pairs. In a file-sharing context, the key could be a particular filename and the value could be the id of the network node which has the file. The table is split into parts in such a way as to logically map to the underlying structure of nodes within the network, with responsibility for maintaining different parts of the table distributed among different nodes.
- Historically, browsers have undertaken the role of the client in a client-server architecture. WebRTC however, provides real-time communication functionality within the browser, effectively allowing the browser to act as a node within a P2P communication network.
- WebRTC is a collection of standards, protocols, and APIs available in most modern web browsers. 
    - It abstracts away the complexities of establishing P2P communication between nodes.
    - WebRTC uses UDP at the Transport layer. 
    - Since UDP is connectionless and inherently unreliable, WebRTC combines various different protocols for session establishment and maintenance (STUN, TURN, ICE), for security (DTLS), and for congestion and flow control and a certain level of reliability (SRTP, SCTP).

## Bash basics
### Variables: 
- To declare a variable: `name="Sara"`
- Note the lack of spaces on either side of the = operator.
- To reference the variable: `$name`
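A quick illustration of declaring and referencing variables, including the difference between double quotes (which expand variables) and single quotes (which don't):

```bash
name="Sara"                 # no spaces around =
greeting="Hello, $name"     # variables expand inside double quotes
echo "$greeting"            # prints: Hello, Sara
echo '$name'                # single quotes prevent expansion; prints: $name
```

Writing `name = "Sara"` instead would make bash try to run a command called `name` with the arguments `=` and `Sara`.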

### shebang e.g. `#!/bin/bash`
- at top of file
- The `#!` character sequence is known as the shebang, and it is followed by the path to the program that should be used to run the rest of the file (here, bash).
    
### Make file executable: 
- `chmod +x filename.sh`
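Putting the shebang and `chmod +x` together: the sketch below writes a tiny script, makes it executable, runs it, and cleans up. It works in a temporary directory so nothing is left behind; `hello.sh` is a hypothetical file name.

```bash
tmpdir=$(mktemp -d)

# Write a two-line script whose first line is the shebang
cat > "$tmpdir/hello.sh" <<'EOF'
#!/bin/bash
echo "Hello from a script"
EOF

chmod +x "$tmpdir/hello.sh"          # make the file executable
output=$("$tmpdir/hello.sh")         # run it directly, no "bash" prefix needed
echo "$output"                       # Hello from a script

rm -r "$tmpdir"                      # clean up
```

In your own working directory the usual form is `chmod +x hello.sh` followed by `./hello.sh`.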

### Conditionals:
- If statements: 
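
```bash
name="Sara"

# `[ ... ]` is the test command; the `then` branch runs when it succeeds
if [ -n "$name" ]
then
  echo "true"
else
  echo "false"
fi
```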

### Operations
#### Strings
- `-n string`: Length of string is greater than 0
- `-z string`: Length of string is 0 (string is an empty string)
- `string_1 = string_2`: string_1 is equal to string_2
- `string_1 != string_2`: string_1 is not equal to string_2

#### Integers
- `integer_1 -eq integer_2`: integer_1 is equal to integer_2
- `integer_1 -ne integer_2`: integer_1 is not equal to integer_2
- `integer_1 -gt integer_2`: integer_1 is greater than integer_2
- `integer_1 -ge integer_2`: integer_1 is greater than or equal to integer_2
- `integer_1 -lt integer_2`: integer_1 is less than integer_2
- `integer_1 -le integer_2`: integer_1 is less than or equal to integer_2

#### Files
- `-e path/to/file`: file exists
- `-f path/to/file`: file exists and is a regular file (not a directory)
- `-d path/to/file`: file exists and is a directory
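A short example combining string, integer and file tests. The temporary file created with `mktemp` is just something to test against.

```bash
name="Sara"
count=7
file=$(mktemp)    # create a temporary regular file to test against

# String tests: -n (non-empty) and = (equal strings)
if [ -n "$name" ] && [ "$name" = "Sara" ]; then
  echo "name is set to Sara"
fi

# Integer tests: -ge and -lt
if [ "$count" -ge 5 ] && [ "$count" -lt 10 ]; then
  echo "count is between 5 and 9"
fi

# File tests: -e, -f and -d
if [ -e "$file" ] && [ -f "$file" ] && [ ! -d "$file" ]; then
  echo "$file is a regular file"
fi

rm -f "$file"     # clean up the temporary file
```

Note the spaces inside `[ ]`: `[` is itself a command, so the condition's parts must be separate arguments. Also, `=` compares strings while `-eq` compares integers.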


### Loops
#### while loop 
- executes while condition is true

```bash 
counter=0
max=10

while [ $counter -le $max ]
do
  echo $counter
  ((counter++))
done
```

#### until loop 
- executes commands until condition becomes true

```bash 
counter=0
max=10

until [ $counter -gt $max ]
do
  echo $counter
  ((counter++))
done
```

#### for loop 

```bash 
numbers='1 2 3 4 5 6 7 8 9 10'

for number in $numbers
do
  echo $number
done
```


### Functions
```bash 
greeting () {
  echo "Hello $1"
  echo "Hello $2"
}

greeting 'Peter' 'Paul' # outputs 'Hello Peter' 'Hello Paul' on separate lines
```
- Bash functions refer to the arguments passed in by their position: `$1` is the first argument, `$2` the second, and so on.

### Arrays
- To declare an empty array: `array=()`
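A few common array operations: declaring with values, indexing, getting the length, appending, and looping. The array name `fruits` is just an example.

```bash
fruits=("apple" "banana" "cherry")   # declare an array with three elements

echo "${fruits[0]}"       # first element (indexes start at 0): apple
echo "${#fruits[@]}"      # number of elements: 3
fruits+=("date")          # append an element

# Quote "${fruits[@]}" so each element stays a single word
for fruit in "${fruits[@]}"; do
  echo "$fruit"
done
```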

## TO-DO:

- Read: https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol
- Read: https://jsinibardy.com/how-internet-works