## Local Area Networks (LAN)

- Our home’s LAN allows multiple devices, or clients, to connect either via an **Ethernet** cable or wirelessly to a **switch** (usually built into the box most of us colloquially call a **“router”**), which is itself connected to a **modem**.
- Each device that can connect to a network does so through its **Network Interface Card (NIC)**.
- Each NIC has a unique **Media Access Control (MAC) address**, which is a physical, hardcoded identifier that distinguishes that device on a network.
- Each device on a network is assigned an **IP address**, either statically (configured by hand) or dynamically (typically by a DHCP server running on the home router).
- **Unlike a MAC address, an IP Address is not hardcoded into a device, but is instead logical and hierarchical, which allows a device to be more easily located based on its IP address.**
- A **switch** is a networking device that sends data to the appropriate device within the LAN, using a MAC address table that records which port each MAC address is reachable on. (Matching an IP address to a MAC address is the job of the ARP table kept by hosts and routers.)
- **Ethernet** is the protocol that allows for communication within a LAN.

## Internet Service Providers (ISP)


- Our LAN connects to an ISP’s infrastructure via our modem.
- Through a series of routers, our ISP’s infrastructure directs traffic between the many households it serves and the wider Internet. 
- Traveling through all these hops across many routers means there’s always going to be some latency in our connection, regardless of our connection’s bandwidth.

**Key Terms**

- **Bandwidth**: The amount of data that can be sent during a specific unit of time (typically a second) across a network. 
    - We can think of the number of lanes on a highway as being analogous to the concept of bandwidth.
- **Latency**: The time it takes for one piece of data to go from point A to point B in a network. Whereas bandwidth measures the maximum size of data per a set unit of time (i.e. a second), latency measures the time it takes for a set, minimal size of data.
    - To use our highway analogy, a single car traveling at the same speed limit across the same geographic span would take the same amount of time, no matter if you increased or decreased the number of lanes on the highway. The only situation where increasing bandwidth might make a difference in the latency is when there’s so much traffic that data is stuck in a bandwidth bottleneck somewhere.
- **Last-mile latency**: The latency specific to the final stretch of the connection closest to the customer, often called the “last mile.” This last-mile latency is what ISPs tend to focus on, since that’s the main segment they have control over.
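
To make the difference between the two terms concrete, here is a minimal sketch that estimates delivery time under a simplified model: total time is the one-way latency plus the payload size divided by the bandwidth. The function name `transfer_time` and all sample numbers are illustrative assumptions, not measurements.

```python
def transfer_time(size_bits: float, bandwidth_bps: float, latency_s: float) -> float:
    """Rough one-way delivery time: fixed latency plus time to push the bits through."""
    return latency_s + size_bits / bandwidth_bps


LATENCY_S = 0.05  # assumed 50 ms one-way latency

# A tiny 1,000-bit message: latency dominates, so 10x the bandwidth barely helps.
small_slow = transfer_time(1_000, 10_000_000, LATENCY_S)
small_fast = transfer_time(1_000, 100_000_000, LATENCY_S)

# A large 1-gigabit download: bandwidth dominates.
big_slow = transfer_time(1_000_000_000, 10_000_000, LATENCY_S)
big_fast = transfer_time(1_000_000_000, 100_000_000, LATENCY_S)

print(f"small message: {small_slow:.5f}s vs {small_fast:.5f}s")
print(f"large download: {big_slow:.2f}s vs {big_fast:.2f}s")
```

The small message is the "single car" of the highway analogy: adding lanes does almost nothing for it.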

## The Domain Name System (DNS)

- Whenever you use your browser to go to a specific website, your browser application needs to first figure out which server would have the correct website information.
- Every time you go to a website you’ve never been to before, your browser first has to look up the IP address associated with that domain.
- The query first goes to the DNS servers run by your ISP. If those servers don’t have a record of that domain either, the request is passed up to the root name servers and then travels down the DNS hierarchy until it finds the IP address of that domain name.
- Your browser will then cache (i.e. save) that domain-to-IP address key-value pair so that it won’t have to look it up again the next time you enter that domain.
- Domain names exist simply for convenience, because it’s much easier to remember, for example, “google.com” than it is to remember an IP address.
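
The lookup-then-cache behavior described above can be sketched with a toy resolver. This is only an illustration: the `AUTHORITATIVE` dictionary stands in for the whole DNS hierarchy, the class and attribute names are hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
# Hypothetical data standing in for the real DNS hierarchy.
AUTHORITATIVE = {
    "example.com": "192.0.2.10",
    "example.org": "198.51.100.7",
}


class CachingResolver:
    """Toy resolver: ask upstream once per domain, then answer from the cache."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}            # domain -> IP address (key-value pairs)
        self.upstream_lookups = 0  # how many times we had to ask upstream

    def resolve(self, domain):
        if domain in self.cache:
            return self.cache[domain]
        self.upstream_lookups += 1
        ip = self.upstream[domain]  # raises KeyError for an unknown domain
        self.cache[domain] = ip
        return ip


resolver = CachingResolver(AUTHORITATIVE)
first = resolver.resolve("example.com")   # goes upstream
second = resolver.resolve("example.com")  # served from the cache
print(first, second, resolver.upstream_lookups)
```

Real caches also expire entries after a time limit, which this sketch leaves out.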

**Key Terms**
- **Domain name**: A domain name is a unique identifier that points to an IP address, or several IP addresses. A domain name consists of a combination of at least two parts: a top-level domain (TLD) to the right of the (rightmost) dot, and a second-level domain (SLD) to the left of the (rightmost) dot.
    - For the domain “google.com” for example, `com` is the TLD and `google` is the SLD.
- **DNS Servers**: When we say that the DNS is hierarchical in nature, we mean that there are logical steps for identifying the IP address of a particular domain name. This hierarchical process is handled by various levels of DNS Servers, routed and structured by domain name level.
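
As a small illustration of the parts of a domain name, the sketch below splits a name on its dots. The helper name `split_domain` is an assumption, and the approach is deliberately naive: it does not handle multi-part suffixes such as `co.uk`.

```python
def split_domain(domain: str) -> dict:
    """Split a domain name into TLD, SLD, and any further subdomain labels."""
    labels = domain.rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("a domain name needs at least a TLD and an SLD")
    return {
        "tld": labels[-1],          # right of the rightmost dot
        "sld": labels[-2],          # left of the rightmost dot
        "subdomains": labels[:-2],  # anything further to the left
    }


print(split_domain("google.com"))
print(split_domain("www.example.org"))
```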

## Cloud Services

**Key Terms**

- **Web Servers**: Servers that host static content (including HTML, JavaScript, and CSS files) are called web servers.
    - These are sites that don’t require much backend processing power, and can merely deliver prepackaged content without having to handle business logic. 
    - Most landing pages, basic company websites, news sites, personal pages, and blogs, such as this one, are hosted by a web server. Web servers probably account for the largest chunk of all server types.
- **Application Servers**: Although a personal blog might be hosted by a basic web server that can easily deliver read-only content, there might be an underlying application (e.g. WordPress) that needs to not just read, but also actually generate, edit, and delete static content.
    - Application Servers require much more processing power than web servers because they need to be able to handle the more elaborate, business-logic requests coming from millions (or billions) of users.
- **Database Servers**: Often used in collaboration with application servers (but can also be independent), database servers (also called **data stores**) are designed to manage databases. 
    - There are two main types of database structures used in database design: either a relational model (usually SQL), or a non-relational model (usually the document model)
- **Certificate Authorities**: These companies, also organized in a hierarchy, provide digital certificates that ensure content is coming from a trusted source. 

## The Internet

- **The Internet** connects ISPs, the DNS, and Cloud Services together, and allows them to communicate. 
- It’s useful to think of the Internet as just a network of routers spanning across the entire world, connecting networks together.
- Each router redirects data across a hop to another router, getting the data progressively closer to its final destination IP address.
- Along the way, a signal will encounter four different types of **latency**, causing a delay to its arrival time:
    - **propagation delay**: The natural limits of a signal traveling from sender to receiver, which can be calculated as the ratio between distance and speed, with speed usually being the speed of light. We can think of propagation delay as the physical limits imposed by nature and geographic distance.
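    - **transmission delay**: The time it takes to push all the bits of the data onto a link, which depends on the size of the data and the bandwidth of that link. Because the signal travels down multiple hops or links, interconnected by routers and switches, this delay is incurred at every hop, so reducing the number of hops required would reduce the total transmission delay.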
    - **processing delay**: The delay incurred by having to process data at any point along the transmission route. We could potentially reduce processing delay by upgrading to more efficient routers that can redirect traffic faster.
    - **queuing delay**: The delay that occurs when the data is in line waiting to be processed. This can be caused either because of insufficient bandwidth, a particularly high traffic time of the day, or an inefficient router. If data is arriving at a router at a faster rate than the router is able to redirect it appropriately, this gives rise to queuing delay. Breaking up a hop into several hops, with more routers, may help to reduce queuing delay, but it may add to transmission delay.
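
As a rough worked example, the one-way latency can be treated as the sum of the four delays, with propagation delay computed as distance divided by speed. The function names and the sample figures below are assumptions for illustration; real signals in fiber or copper also travel somewhat slower than light in a vacuum.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def propagation_delay(distance_m: float, speed_m_per_s: float = SPEED_OF_LIGHT_M_PER_S) -> float:
    """Propagation delay as the ratio between distance and signal speed."""
    return distance_m / speed_m_per_s


def total_latency(propagation_s: float, transmission_s: float,
                  processing_s: float, queuing_s: float) -> float:
    """One-way latency as the sum of the four delay types."""
    return propagation_s + transmission_s + processing_s + queuing_s


prop = propagation_delay(3_000_000)  # assumed 3,000 km between sender and receiver
one_way = total_latency(prop, transmission_s=0.002, processing_s=0.001, queuing_s=0.004)
print(f"propagation: {prop * 1000:.2f} ms, one-way total: {one_way * 1000:.2f} ms")
```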
    
**Key Terms**

- **TCP/IP protocol suite**: A **protocol** is a set of rules that governs how communication is handled. The TCP/IP protocol suite, also known as the **Internet protocol suite** or the Department of Defense (DoD) model, is the set of protocols that has become dominant across the globe.
- **RTT or Round-trip time**: Essentially the sum of the latency of going from A to B, plus the latency of coming back from B to A.

## OSI Model vs. TCP/IP Model

- The **OSI model** is a conceptual, theoretical model. 
- The model standardized a layered approach to telecommunications, where each layer provides a certain level of abstraction and functionality. Each layer also contains some encapsulated data to be opened and used by the layer directly above it.
- The OSI Model identified seven layers, numbering them from lowest to highest: 
    1) Physical<br>
    2) Data Link<br>
    3) Network<br>
    4) Transport<br>
    5) Session<br>
    6) Presentation<br>
    7) Application<br>
- The TCP/IP model, in its common five-layer form, identifies five layers:
    1) Physical<br>
    2) Data Link<br>
    3) Network<br>
    4) Transport<br>
    5) Application

## Physical Layer

- The Physical layer is the most rudimentary and foundational level, on top of which all the other layers rest.
- This layer concerns itself with transmitting signals, or bits, either over a coaxial or fiber-optic cable or over a wireless medium, like Wi-Fi.

## Data Link Layer

- The second layer is the Data Link layer, which interprets the physical transmissions of Layer 1, and converts them into a Frame.
- A **Frame** is a protocol data unit (PDU) that contains a **source MAC Address**, a **destination MAC Address**, and some encapsulated data to be read later by Layer 3.
- LANs, NIC Cards, and Switches all function at this layer. Switches read and process each frame, and send them to the appropriate device on the network that has the frame’s destination MAC Address.
- Because this layer delivers data from one point (or node) on a network to another, it is referred to as providing a **node to node** connection.
- The important protocols at work here are the **Ethernet (wired)** and the **802.11 (Wi-Fi)** protocols.
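
The way a switch delivers a frame by its destination MAC address can be sketched as a table lookup. This is a simplified model, not how any particular device is implemented: the `Switch` class, the dictionary-based frames, and the learn-then-flood behavior are assumptions added for illustration.

```python
class Switch:
    """Toy switch that learns which port each MAC address lives on."""

    def __init__(self):
        self.mac_table = {}  # MAC address -> port number

    def receive(self, in_port: int, frame: dict):
        # Learn: the source MAC is reachable through the port the frame came in on.
        self.mac_table[frame["src_mac"]] = in_port
        # Forward: use the table if the destination is known, otherwise flood.
        return self.mac_table.get(frame["dst_mac"], "flood")


switch = Switch()
a_to_b = {"src_mac": "aa:aa:aa:aa:aa:aa", "dst_mac": "bb:bb:bb:bb:bb:bb", "payload": "hi"}
b_to_a = {"src_mac": "bb:bb:bb:bb:bb:bb", "dst_mac": "aa:aa:aa:aa:aa:aa", "payload": "hello"}

first = switch.receive(1, a_to_b)   # destination unknown so far
second = switch.receive(2, b_to_a)  # the switch has learned where A is
third = switch.receive(1, a_to_b)   # and now where B is
print(first, second, third)
```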

## Internet Layer

- The Internet layer is where the majority of the action takes place, because that’s where the data travels across routers, from **network to network**.
- The main protocols at work here are the **Internet Protocol version 4 (IPv4)** and the **Internet Protocol version 6 (IPv6)**.
- At this level, routers open up the frame, and read the encapsulated data, which is known as a **Packet**.
- A **Packet** is an Internet-layer PDU that contains a **source IP Address** and **destination IP Address**, and some encapsulated data to be read later by Layer 4.
- Routers then progressively forward the packet to the router that will get it closer to its destination, until it finally arrives at the destination network containing the destination IP address.
- For this reason, the Internet layer is known as providing a **network to network connection**.
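
One way to picture how a router chooses the next hop is a routing table keyed by network prefix, where the most specific matching prefix wins. The notes above do not describe the selection rule, so longest-prefix matching is an assumption here, as are the route entries and hop names. The sketch uses Python’s standard `ipaddress` module and documentation-only address ranges.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "isp-gateway"),      # default route
    (ipaddress.ip_network("203.0.113.0/24"), "router-b"),
    (ipaddress.ip_network("203.0.113.128/25"), "router-c"),  # more specific
]


def next_hop(dest_ip: str) -> str:
    """Pick the next hop whose network matches dest_ip with the longest prefix."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    _, best_hop = max(matches, key=lambda match: match[0].prefixlen)
    return best_hop


print(next_hop("203.0.113.200"))
print(next_hop("203.0.113.5"))
print(next_hop("198.51.100.1"))
```

This is also where the hierarchical nature of IP addresses pays off: the router only needs one entry per network, not one per device.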

## Transport Layer

- The Transport Layer is the first layer that takes place on the actual host device, whether that’s the client or the server.
- This layer opens the encapsulated data from layer 3, which is either a **segment (TCP)** or a **datagram (UDP)**, depending on the protocol used.
- A **Segment** is a Transport-layer PDU that operates with the **TCP protocol**, which is a connection-oriented protocol that ensures **reliability**. 
    - Any form of communication that requires reliability and integrity, like processing credit card payments, loading web pages, or transferring files, is done through TCP segments.
- A **Datagram** is a Transport-layer PDU that operates with the **User Datagram Protocol (UDP)**, which is a **connection-less protocol** that favors **speed**.
    - Any form of communication that requires receiving continuous updates or streams of data with no need to worry about the occasional dropped bits, such as videoconferencing (e.g. Zoom) or voice-over-ip (VOIP) is accomplished through UDP datagrams.
    - DNS queries are also usually done through UDP because of its faster speed.
- Both Segments and Datagrams contain a **source port** and a **destination port**, as well as an encapsulated payload to be read by Layer 5.
- A **port** is a specific channel on a device. 
    - Each tab you open on a browser opens on a new port on your computer. Each application type typically runs on specific port ranges.
    - On the server side, for example, web pages are served from port 80 for HTTP (8080 is a common alternative) or port 443 for HTTPS, and DNS responses go out from port 53. Mail submission now typically runs on port 587, which is used for Simple Mail Transfer Protocol (SMTP) secured with TLS.
- The combination of an IP address and a port number makes up a unique **Socket**.
- That’s why the Transport layer is known as providing a **port to port connection**, or a **socket to socket connection**.
- The process of sending the contents of received datagrams and segments to the correct destination ports is known as **demultiplexing**. 
- The reverse process, which is gathering content from all the ports, and blending them into a single channel is known as **multiplexing**.
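
Multiplexing and demultiplexing can be sketched as grouping payloads by port number. The dictionary-based PDUs and both function names below are assumptions for illustration; a real operating system does this inside its network stack.

```python
from collections import defaultdict


def multiplex(outgoing_by_port: dict) -> list:
    """Blend the payloads waiting on every port into a single outgoing stream."""
    return [
        {"src_port": port, "payload": payload}
        for port, payloads in outgoing_by_port.items()
        for payload in payloads
    ]


def demultiplex(incoming: list) -> dict:
    """Hand each received payload to the port named in its header."""
    by_port = defaultdict(list)
    for pdu in incoming:
        by_port[pdu["dst_port"]].append(pdu["payload"])
    return dict(by_port)


stream = multiplex({7124: ["GET /"], 7125: ["GET /about"]})
delivered = demultiplex([
    {"dst_port": 80, "payload": "GET /"},
    {"dst_port": 53, "payload": "DNS query"},
    {"dst_port": 80, "payload": "GET /about"},
])
print(stream)
print(delivered)
```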

## Application Layer

- Once the content of segments or datagrams is sent to the correct port, the Application layer is finally ready to read the encapsulated data, which is a **Message** (sometimes also simply called **Data**).
- A **Message** is an Application-layer PDU that can contain many different things, depending on what Application process runs on that port, and/or what protocol was used to encapsulate the message in the first place.
- A Message can be, among other things: 
    - an HTTP request
    - an HTTP response
    - a DNS query
    - a DNS response
    - an SMTP message
    - a TLS record (used for HTTPS).
- Each Application process runs on a dedicated port. That’s why this layer is known as providing a **process to process connection**.

## Layered Trip Through an HTTP Request/Response Cycle

- This section breaks down all the steps that happen from the time we enter a URL in our browser to the time the page loads.
- In reality the HTTP request/response cycle (and any communication) actually **goes from the Application layer down, and then back up again**.

![layered trip](layered_trip_graph.jpeg)

**Request Phase - Client Side**

1. User enters a URL in browser tab - Let’s say we entered `google.com`
2. **DNS Lookup** to get destination IP 
    - If our browser had never been to this site before, it would have to first send a DNS query in order to find the destination IP address of the `google.com` domain. 
    - That would of course require a separate DNS query/response cycle. Here, we are assuming we’ve been to the site before and therefore have the IP address saved in our browser’s cache.
3. **HTTP Request** sent to Port 7124 of User’s device.
    - Your port number might be different, of course. Each opened browser tab you have uses a unique port.
4. **Multiplexing** - The client processes signals from all ports, including the one sent to 7124, and prepares them for transmission over a single channel.
5. **Segmentation** divides HTTP Request into Segments 
    - This breaks up the HTTP Request message into multiple pieces, each to be contained in a separate segment. 
    - Henceforth we’ll just focus on the journey of one segment, but the same is happening for all the other segments of our HTTP Request.
6. Segment is wrapped in a **Packet**
7. **Packet travels across Internet routers**
    - The packet keeps being wrapped in a new **frame** (i.e. Layer 2) so it can travel to the next hop, then opened again as a **packet** (back to Layer 3), and wrapped again in a new frame with an updated **destination MAC address**, etc.
    - After many hops, however, it finally arrives at the destination network, which in this case is the network that the server with the destination IP address is on.
    
**Request Phase - Server Side**

8. **Packet is wrapped in a Frame** - This is the final frame on this leg of the trip, containing the server’s MAC address as the destination MAC address.
9. Frame travels across network to the destination server
10. Server receives frame through its NIC card
    - Note that this step also involves Layer 1, since the NIC Card is interpreting the signals it receives, either via a cable or wirelessly. The same could be said about the Routers on Layer 3, since they also have to process physical signals.
11. Server opens frame
12. Server opens packet
13. **Reassembly** - Server receives all the segments and reassembles them into the HTTP Request.
14. **Demultiplexing** - This process sends each compiled message to the appropriate port–which is 80 in the case of a server processing an HTTP request.
15. Server opens Request on Port 80

**Response Phase - Server Side**

16. **Server processes Request and generates a Response** - Occasionally, a server may need to generate multiple responses. 
    - An HTTP response will include a status code indicating whether the server found the information or not.
17. Server sends Response to Port 80
18. **Multiplexing** The server processes signals from all ports, including the one sent to 80, and prepares them for transmission over a single channel.
19. **Segmentation** - The response is divided up and wrapped in separate segments. We’ll focus on one segment for the rest of this leg, but the same thing is happening to all segments.
20. Segment is wrapped in a Packet
21. Packet travels across Internet routers

**Response Phase - Client Side**

22. **Packet is wrapped in a Frame** - This is the final frame for this leg of the trip, containing our client’s MAC address as the destination MAC address.
23. Frame travels across network to the destination client 
    - For most people working from home, this would be the home LAN, but it could also be the coffee shop’s WLAN or the office intranet.
24. Client receives frame through its NIC card - Same comment here as step 10.
25. Client opens frame
26. Client opens packet
27. **Reassembly** - Client assembles all the segments back together to recreate the HTTP response
28. **Demultiplexing** - The response is sent to the appropriate port, which is 7124 in the case of our specific tab that sent the original request.
29. Client opens Response on Port 7124 - This may involve the browser interpreting the HTML page, applying CSS, and running the JavaScript.
30. Client’s browser tab displays the requested URL page - We enjoy our page.<br><br>
**This entire process constitutes one round-trip time (RTT), which is the time it takes for data to go all the way to a server and then come back to the client.**
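
The wrapping and unwrapping in the steps above can be sketched with nested data classes, where each PDU carries the next layer’s PDU as its payload. The class layout, the MAC addresses, and the IP addresses are illustrative assumptions; the ports 7124 and 80 follow the example in the walkthrough.

```python
from dataclasses import dataclass


@dataclass
class Segment:   # Transport-layer PDU
    src_port: int
    dst_port: int
    payload: str  # the Application-layer message


@dataclass
class Packet:    # Internet-layer PDU
    src_ip: str
    dst_ip: str
    payload: Segment


@dataclass
class Frame:     # Data Link-layer PDU
    src_mac: str
    dst_mac: str
    payload: Packet


def send_down(message: str) -> Frame:
    """Client side: wrap the message layer by layer, from Application down."""
    segment = Segment(7124, 80, message)
    packet = Packet("192.0.2.10", "203.0.113.5", segment)
    return Frame("aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb", packet)


def receive_up(frame: Frame) -> str:
    """Server side: open the frame, then the packet, then the segment."""
    packet = frame.payload
    segment = packet.payload
    return segment.payload


frame = send_down("GET / HTTP/1.1")
print(receive_up(frame))
```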

## Layered Reliability and Security

- The concept of **encapsulating each layer’s PDU** provides a basic foundation for how various Internet security measures function. 
- Most fundamentally, encapsulation is a way to enclose certain bits of data to render them inaccessible from external contexts.
- Each layer therefore secures its contents from the layer beneath it.

## Physical Layer bits

- The Physical layer allows the entire system to function.
- Consisting of bits transmitted either across a wire or wirelessly, the physical layer contains a payload that will be progressively decoded by higher layers.
- Although the Physical layer doesn’t technically contain anything that can be classified as data ***yet***, and therefore does not have a PDU, there is nevertheless a concept worth addressing – the **Interframe Gap**.
- The **Interframe Gap (IFG)** is a required pause in the signal transmission, which lets a NIC card operating on layer 2 know that a frame was completed and another may begin.
- The physical layer’s IFG therefore contributes to reliability by ensuring that the signals of one frame don’t get accidentally mixed up with the signals of another frame during the initial transmission.

## Data Link Layer PDU - Frames

- The Data Link layer converts the physical transmissions of Layer 1 into a **Frame**, which is the first formal PDU.
- In addition to containing the **source and destination MAC addresses** of where this frame originated from and is going to, a frame contains an **encapsulated payload**, and a **Frame Check Sequence** at the end.
- This **Frame Check Sequence (FCS)** is what the Data Link layer uses for error detection (using a checksum, which in Ethernet is a cyclic redundancy check), to make sure that the frame has neither lost any bits along the way nor been corrupted due to signal interference.
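
A minimal sketch of the idea, using `zlib.crc32` from Python’s standard library as a stand-in for the frame’s check value. The helper names and the 4-byte trailer layout are assumptions, and the bit-ordering details of real Ethernet hardware are ignored.

```python
import zlib


def add_fcs(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload, like an FCS at the end of a frame."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs


frame = add_fcs(b"hello, LAN")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip a single bit in transit

print(fcs_ok(frame), fcs_ok(corrupted))
```

A frame that fails the check is simply dropped; detecting the error is all this layer does.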

## Internet Layer PDU - Packet

- The Internet Protocols used in layer 3 truly comprise the cornerstone of the Internet protocol suite that make the modern Internet possible.

**MAC addresses vs. IP Addresses**

- If we did not have IP addresses and had to rely simply on MAC addresses to connect a client and a server, the process would be profoundly inefficient and time consuming. Every device would need to somehow keep track of where all the most-used destination servers are, along with the MAC addresses of all the routers along the way. We’d also need to keep updating these routes every single time a router or server is replaced, since each device has a unique MAC address.
- Thanks to the **logical** and **dynamic** nature of IP addresses, however, that’s not a concern.

**IPv4 vs. IPv6**

- The main difference between packets encoded in IPv4 and those encoded in IPv6 is the **size of the space reserved on the packet’s header for the source and destination IP addresses**.
    - In IPv4, 32 bits are reserved for each address, which means that there can be up to about 4.3 billion unique addresses.
    - IPv6 reserves 128 bits for each of the source and destination IP address fields. This can accommodate 2^128 addresses, which is about 340 trillion trillion trillion IP addresses.
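
The arithmetic behind those figures can be checked directly. The sample addresses below come from ranges reserved for documentation, and `max_prefixlen` is the attribute Python’s `ipaddress` module uses for the address width in bits.

```python
import ipaddress

ipv4_bits = ipaddress.ip_address("192.0.2.1").max_prefixlen    # width of an IPv4 address
ipv6_bits = ipaddress.ip_address("2001:db8::1").max_prefixlen  # width of an IPv6 address

ipv4_addresses = 2 ** ipv4_bits
ipv6_addresses = 2 ** ipv6_bits

print(f"IPv4: {ipv4_bits} bits -> {ipv4_addresses:,} addresses")
print(f"IPv6: {ipv6_bits} bits -> {ipv6_addresses:.3e} addresses")
```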
    
**Time To Live**

- The Time-To-Live (TTL) field is a number stored in a packet’s header.
- The purpose of this number is to make sure that packets don’t get somehow caught in limbo or otherwise run around continuously across routers, never finding their destination yet taking up bandwidth unnecessarily.
- They may originally start out with, for example, a TTL of 64. At every hop, the router decrements that number by one. Once the TTL reaches 0, the packet is discarded and not allowed to be forwarded along anymore.
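
The decrement-and-discard rule can be sketched as a loop over the routers on a path. The function name and the returned dictionary are assumptions for illustration.

```python
def route_packet(ttl: int, routers_on_path: int) -> dict:
    """Walk a packet across routers, decrementing its TTL by one at every hop."""
    for router_index in range(1, routers_on_path + 1):
        ttl -= 1
        if ttl <= 0:
            # TTL ran out: this router discards the packet instead of forwarding it.
            return {"delivered": False, "dropped_at_router": router_index}
    return {"delivered": True, "remaining_ttl": ttl}


print(route_packet(ttl=64, routers_on_path=10))
print(route_packet(ttl=3, routers_on_path=10))
```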

## Transport Layer PDU - Datagrams and Segments

**Datagrams**

- The main reliability feature of Datagrams is the checksum, which datagrams actually have in common with segments.
- A **checksum** is an error detecting mechanism ensuring the PDU received contains the same content as the PDU sent.
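
As an illustration, here is a sketch of the 16-bit ones’ complement sum known as the Internet checksum, the style of checksum carried in UDP and TCP headers. It runs over raw bytes only; real UDP and TCP also include a pseudo-header in the calculation, which this sketch leaves out. The function names are assumptions.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold any carry bits back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def is_intact(data: bytes, checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare it with the one sent."""
    return internet_checksum(data) == checksum


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(hex(checksum), is_intact(payload, checksum))
```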

**Segments**

- The TCP protocol’s primary advantage compared against UDP is its **reliability**.
- TCP is a **connection-oriented protocol**, which means that before any data can even be exchanged, the TCP protocol requires that a connection first be established between the client and the server.

- **TCP Handshake and Flags**
    - The TCP Handshake establishes this initial connection, using the **Flags** component of the segment header.
    - It uses a three-step process that takes one and a half RTTs (round-trip times).
        - 1. The client sends an empty (i.e. with no payload, or bodyless) SYN segment (i.e. where the segment header’s SYN flag is set to true, or on)
        - 2. The server receives the segment and replies with another bodyless segment with the SYN and ACK flags turned on
        - 3. The client receives the server’s segment and sends a final acknowledgment back to the server, which is a bodyless segment containing only the ACK flag turned on.
    - Immediately after that’s complete, the client begins sending the actual HTTP request (or whatever is being sent in the Message).<br><br>
    
- **Segmentation and In-Order Delivery**
    - TCP enables segmentation, which is the breaking up of the Message into multiple segments so that these can travel in separate packets.
    - This enables **in-order delivery**.
    - With **datagrams**, there’s no concept of order. Each datagram is simply processed in the order in which it is received.
    - With segments, however, sequence is preserved using the **Sequence #** field in the header. This field is filled in during the segmentation process. 
    - As soon as a segment is received, a parallel, bodyless reply is sent with the corresponding **Acknowledgment #** filled in.
    - This sequence number is also how the Message is reconstructed in-order during the **Reassembly** phase.<br><br>
    
- **Pipelining and Window Size**
    - **Pipelining** is the concept of sending multiple signals at the same time, to maximize bandwidth use. 
        - The analogy, for those who’ve ever moved before, is dividing up and sending your belongings into multiple trucks (or cars) instead of having to wait for the one truck to go back and forth several trips.
    - Exactly how many “trucks” are working simultaneously is determined by the **Window** size on the segment header.
    - For example, with a window size of 5, the sender doesn’t move on to the next 5 segments until it has received an acknowledgment from the receiver for the current 5.
    - If sufficient time has passed and the sender has still not received an acknowledgment back (**Acknowledgment #**) for a segment, the sender resends that segment (**retransmission of lost data**).
    - Any segment already received (i.e. with the same **Sequence #** as an already-received segment) is simply dropped (known as **de-duplication**).<br><br>

- **Flow Control**
    - With every acknowledgment that it sends back, the receiver uses the **Window Size** to indicate how many segments at a time it is currently capable of receiving. 
    - This helps the sender determine how busy the receiver is, and thereby adjust its sending window size accordingly. 
    - **That’s how TCP provides Flow Control**.<br><br>
    
- **Congestion Avoidance**
    - Although flow control helps the sender determine how busy the receiver is, it doesn’t really say anything about how busy **the traffic is**.
    - Using an algorithm that keeps track of how long it takes to receive acknowledgments back, the sender gets a sense of how congested the network is, and adjusts its sending behavior accordingly. 
    - This helps to reduce network congestion, and ensures transmission is only occurring when there’s some bandwidth capacity.
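
The handshake and the sequence-number mechanics described above can be sketched together in a few lines. This is a toy model rather than real TCP: segments are plain dictionaries, the function names are hypothetical, and sequence numbers are simply byte offsets into the message. Retransmission, windows, and congestion control are left out.

```python
def server_reply(segment):
    """Handshake step 2: answer a bodyless SYN with a bodyless SYN + ACK."""
    if segment["flags"] == {"SYN"} and segment["payload"] == b"":
        return {"flags": {"SYN", "ACK"}, "payload": b""}
    return None


def client_reply(segment):
    """Handshake step 3: answer the SYN + ACK with a bodyless ACK."""
    if segment is not None and segment["flags"] == {"SYN", "ACK"}:
        return {"flags": {"ACK"}, "payload": b""}
    return None


def segment_message(message: bytes, size: int) -> list:
    """Segmentation: split the message and stamp each piece with a sequence number."""
    return [
        {"seq": offset, "payload": message[offset:offset + size]}
        for offset in range(0, len(message), size)
    ]


def reassemble(segments: list) -> bytes:
    """Reassembly: drop duplicates, then join the payloads in sequence order."""
    received = {}
    for seg in segments:
        if seg["seq"] in received:
            continue  # de-duplication: this sequence number already arrived
        received[seg["seq"]] = seg["payload"]
    return b"".join(received[seq] for seq in sorted(received))


# Three-step handshake.
syn = {"flags": {"SYN"}, "payload": b""}
syn_ack = server_reply(syn)
ack = client_reply(syn_ack)
connection_established = ack is not None

# Segments arrive out of order, with one duplicate.
message = b"GET /index.html HTTP/1.1"
segments = segment_message(message, 5)
arrived = list(reversed(segments)) + [segments[0]]
rebuilt = reassemble(arrived)
print(connection_established, rebuilt)
```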
    
<img src="tcp_reliability.png" width=500>

## Application Layer PDU - Messages

- There are many different types of messages at the Application layer (DNS query/response, HTTP request/response, TLS record for HTTPS, etc.).

**HTTP Request**

- The HTTP Request message contains a:
    - **Header**, which can be further divided into two sub-components: 
        - 1) a Request line
            - HTTP Method
            - Path (including any query parameters for `GET`)
            - HTTP Version
        - 2) a set of Headers
            - Key-value pairs, e.g. Host, User-Agent, Accept-Language, Cookie(s), etc.
    - The **Body** of a request is typically empty, but would contain the query parameters of a POST request (if there are any).
- The request line contains the most important components and is **required**, but headers are mostly **optional** (HTTP/1.1 does require the `Host` header).
- The three items in the Request Line are separated by a **space**.
- An **empty line** separates and delineates the Header from the Body.
- Any query parameters used in a `GET` request are included in the Path of the Request Line. 
- For `POST` requests, however, query parameters are separated out and part of the Body.
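
The layout described above (a request line with three space-separated items, then headers, then an empty line, then the body) can be sketched with a small parser. The function name and the sample request are assumptions for illustration, and the parser skips edge cases that a real HTTP library would handle.

```python
def parse_request(raw: str) -> dict:
    """Split a raw HTTP/1.1 request into request line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")      # the empty line ends the header
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")  # three items, space-separated
    headers = dict(line.split(": ", 1) for line in header_lines)
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}


raw_request = (
    "GET /search?q=networking HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Accept-Language: en\r\n"
    "\r\n"
)
parsed_request = parse_request(raw_request)
print(parsed_request)
```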

**HTTP Response**

- The HTTP Response message contains a:
    - **Header**, which can be further divided into two sub-components: 
        - 1) a Status line
            - HTTP Version
            - Status Code
            - Status Text
        - 2) a set of Headers.
            - Key-value pairs, e.g. Server, Content-Type, Set-Cookie, etc.
    - The **Body** of a response contains the actual contents returned by the server, e.g. the HTML page.
- The Status line contains the most important components and is **required**, but headers are **optional**.
- The three items in the Status Line are separated by a **space**.
- An **empty line** separates and delineates the Header from the Body.<br><br>
- **Both HTTP Requests and Responses happen over TCP, which allows the message to be broken up into segments, and provides additional reliability.**
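
A response can be taken apart the same way: a status line, then headers, then an empty line, then the body. The function name and the sample response below are assumptions. Because the status text can itself contain spaces, the status line is split at most twice.

```python
def parse_response(raw: str) -> dict:
    """Split a raw HTTP/1.1 response into status line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, status_code, status_text = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return {"version": version, "status_code": int(status_code),
            "status_text": status_text, "headers": headers, "body": body}


raw_response = (
    "HTTP/1.1 404 Not Found\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<h1>Not Found</h1>"
)
parsed_response = parse_response(raw_response)
print(parsed_response)
```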

## TLS and HTTPS

- Although **TLS** stands for **Transport Layer Security**, it is currently technically part of the TCP/IP’s Application layer, and provides encryption to any other Application-layer message type.
- TLS is mainly used to **encrypt HTTP messages**, thereby creating **Secure HTTP, or HTTPS** (a.k.a. HTTP/TLS, or “HTTP over TLS”).
- It might be helpful to conceptually think of TLS as being an **imaginary Security Layer in between the Transport Layer and the Application Layer**.
- Although **TLS only works on top of TCP**, there is another protocol called **DTLS which is the equivalent of TLS, but for UDP**.<br>

**TLS Record**

- A TLS Record’s header contains: 
    - 1) the Content Type that is in the payload
    - 2) the TLS version used by this record
    - 3) the Length of the record.
- In the first stages of the TLS process, which **happens immediately after the TCP handshake is completed**, the TLS handshake takes place.
- The **Message Authentication Code (MAC)**, which is in the record’s footer, is essentially a form of a checksum, but instead of serving as a check against corruption, it serves as a check against **data tampering**.
    - It uses the same symmetric key exchanged in the TLS Handshake to generate the MAC, thereby ensuring that the contents came from the same source.
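
The tamper check can be sketched with Python’s standard `hmac` module. This only illustrates the idea of a keyed check value: the key below is a made-up stand-in for the symmetric key agreed in the handshake, the helper names are hypothetical, and real TLS computes its MAC over more than just the payload.

```python
import hashlib
import hmac

SESSION_KEY = b"hypothetical-symmetric-key"  # stand-in for the key from the TLS handshake


def make_mac(key: bytes, payload: bytes) -> bytes:
    """Compute a keyed check value over the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, mac: bytes) -> bool:
    """Recompute the MAC and compare it in constant time."""
    return hmac.compare_digest(make_mac(key, payload), mac)


payload = b"GET /account HTTP/1.1"
mac = make_mac(SESSION_KEY, payload)

print(verify(SESSION_KEY, payload, mac))                    # untouched payload
print(verify(SESSION_KEY, b"GET /admin HTTP/1.1", mac))     # tampered payload
print(verify(b"attacker-key", payload, mac))                # wrong key
```

Unlike a plain checksum, an attacker who changes the payload cannot recompute a matching value without the key.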
    
**TLS Handshake**

- Right after the client sends the third part of the TCP handshake, i.e. the `ACK` segment, the client also initiates the first part of the TLS handshake, which is the `ClientHello` step.

<img src="tls_handshake.png" width=500>

- After the TLS Handshake is successfully completed, all the subsequent messages sent and received are TLS records containing the encrypted piece of the HTTP request or HTTP response.
- The headers of these records would be marked with the Application Data content type. The receiver would need to decode them back to the original HTTP request or response before being able to process or display the contents.

**Certificate Authorities**

- Each site’s certificate (which the server sends in step 2 of the TLS handshake) is signed by an Intermediate Certificate Authority (CA) that certifies the certificate’s authenticity. 
- Each CA in turn derives its authority from another intermediate CA higher up the chain, all the way up to the Root CA.
- The hierarchical structure of certificate authorities establishes the **Chain of Trust** from the original site all the way to the Root CA.
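
The chain can be pictured as following "issued by" links until a root is reached. The sketch below is a toy: the names in `ISSUED_BY` and `TRUSTED_ROOTS` are hypothetical, and it only walks the links, whereas a real client also verifies the cryptographic signature on every certificate in the chain.

```python
# Hypothetical issuer records: certificate subject -> who signed it.
ISSUED_BY = {
    "example.com": "Intermediate CA",
    "Intermediate CA": "Root CA",
    "Root CA": "Root CA",  # a root certificate is self-signed
}
TRUSTED_ROOTS = {"Root CA"}  # stand-in for the roots a browser or OS ships with


def chain_of_trust(subject: str) -> list:
    """Follow issuer links from a site's certificate up to a self-signed root."""
    chain = [subject]
    while ISSUED_BY[chain[-1]] != chain[-1]:
        chain.append(ISSUED_BY[chain[-1]])
    return chain


def is_trusted(subject: str) -> bool:
    """A certificate is trusted here if its chain ends at a known root."""
    return chain_of_trust(subject)[-1] in TRUSTED_ROOTS


print(chain_of_trust("example.com"), is_trusted("example.com"))
```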