## LS170 Networking foundations

## Basics

### What is the internet?
- LAN (Local Area Network): multiple computers and other devices connected via a network bridging device such as a hub or, more likely, a switch. The computers are all connected to this device via network cables, and this forms the network. The scope of communications is limited to devices that are connected (either wired or wirelessly) to the network switch or hub, which imposes some geographic limitations. That's the 'local' in Local Area Network.
- Inter-network Communication: In order to enable communication between networks, we need to add routers into the picture. Routers are network devices that can route network traffic to other networks. Within a Local Area Network, they effectively act as gateways into and out of the network.
- Network of networks: We can imagine the internet as a vast number of these networks connected together. In between all of the sub-networks are systems of routers that direct the network traffic.

### What is a protocol?
- In general, a protocol is a "system of rules".
- In terms of networks: A protocol is a "set of rules governing the exchange or transmission of data".
- Examples of the most common protocols: IP, SMTP, TCP, HTTP, Ethernet, FTP, DNS, UDP, TLS
- Why so many protocols? 
    1. Different protocols were developed to address different aspects of network communication. (TCP and HTTP)
    2. Different protocols were developed to address the same aspect of network communication, but in a different way or for a specific use-case. (TCP vs UDP)

### A layered system
#### Network models
- OSI model: divides the layers in terms of the functions that each layer provides (physical addressing, logical addressing and routing, encryption, compression, etc).
- Internet Protocol Suite (TCP/IP) model: divides the layers in terms of the scope of communications within each layer (within a local network, between networks, etc).
![title](img/network_models.png)
- Both models have utility, but neither will perfectly fit a real-world implementation.
- Such models are useful for gaining a broad-brush view of how a system works as a whole, and for modularizing different levels of responsibility within that system.
- However, attempting to strictly adhere to the model when drilling into the detail of how a specific protocol works can be counter-productive. 

### Protocol Data Unit (PDU)
- A Protocol Data Unit (PDU) is an amount or block of data transferred over a network.

### Statelessness

- A protocol is said to be stateless when it's designed in such a way that each request/response cycle is completely independent of the previous one.
- Each request made to a resource is treated as a brand new entity, and different requests are not aware of each other. 

### URL (Uniform Resource Locator)

- http://www.example.com:88/home?item=book
- `http`: The scheme. It always comes before the colon and two forward slashes and tells the web client how to access the resource. In this case it tells the web client to use the Hypertext Transfer Protocol or HTTP to make a request. Other popular URL schemes are ftp, mailto or git. 
- `www.example.com`: The host. It tells the client where the resource is hosted or located.
- `:88`: The port or port number. It is only required if you want to use a port other than the default.
- `/home`: The path. It shows what local resource is being requested. This part of the URL is optional.
- `?item=book`: The query string, which is made up of query parameters. It is used to send data to the server. This part of the URL is also optional.
- Unless a different port number is specified, port 80 will be used by default in normal HTTP requests. 
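
The breakdown above can be checked with a short sketch using Python's standard `urllib.parse` module. Python is used here only for illustration; the notes themselves do not prescribe a language.

```python
from urllib.parse import urlsplit

# The example URL from the notes
url = "http://www.example.com:88/home?item=book"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 88
print(parts.path)      # /home
print(parts.query)     # item=book

# With no explicit port, urlsplit reports None; an HTTP client
# would then fall back to the default port for the scheme.
no_port = urlsplit("http://www.example.com/home")
print(no_port.port)    # None
```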

### Query strings / parameters

- http://www.example.com?search=ruby&results=10
- `?` marks the start of the query string
- `search=ruby` is a name/value pair
- `&` means that another name/value pair will follow
- `results=10` is a name/value pair

- Because query strings are passed in through the URL, they are most often used in HTTP `GET` requests; a `POST` request usually carries its data in the request body instead.
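
As a rough illustration, the name/value pairs can be pulled out of the example URL, and a query string can be built from a dictionary, with Python's standard `urllib.parse` module:

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = "http://www.example.com?search=ruby&results=10"

# Everything after the "?" is the query string
query = urlsplit(url).query
print(query)   # search=ruby&results=10

# parse_qs splits on "&" and "="; each value is a list because
# the same name may appear more than once in a query string
params = parse_qs(query)
print(params)  # {'search': ['ruby'], 'results': ['10']}

# Going the other way: build a query string from name/value pairs
rebuilt = urlencode({"search": "ruby", "results": 10})
print(rebuilt)  # search=ruby&results=10
```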

### URL encoding

URLs are designed to accept only certain characters in the standard 128-character ASCII character set. Reserved or unsafe ASCII characters which are not being used for their intended purpose, as well as characters not in this set, have to be encoded. URL encoding serves the purpose of replacing these non-conforming characters with a % symbol followed by two hexadecimal digits that represent the ASCII code of the character.
- Allowed characters: Only alphanumeric and special characters `$-_.+!'()",` and reserved characters when used for their reserved purposes 
- A reserved character that is not being used for its reserved purpose has to be encoded.
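
A minimal sketch of percent-encoding with Python's `urllib.parse.quote` and `unquote`. Note that Python's idea of which characters are safe by default may differ slightly from the list in these notes, so treat the output as an illustration of the `%` plus two hex digits scheme rather than a definitive rule set.

```python
from urllib.parse import quote, unquote

# A space is ASCII 0x20, so it becomes %20
print(quote("a b"))  # a%20b

# "&" is reserved (it separates query parameters). When it is part of
# the data rather than a separator, it has to be encoded as %26.
# safe="" tells quote() not to exempt any reserved characters.
encoded = quote("tom & jerry", safe="")
print(encoded)  # tom%20%26%20jerry

# Decoding reverses the process
print(unquote(encoded))  # tom & jerry
```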

## Requests

### Request methods

- `GET` retrieve resources from server
- `POST` send data to server

### Request headers

- HTTP headers allow the client and the server to send additional information during the request/response HTTP cycle.
- Headers are colon-separated name-value pairs that are sent in plain text.
- Request headers give more information about the client and the resource to be fetched.
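
To make the plain-text nature of a request concrete, here is a small sketch that assembles the text of an HTTP/1.1 request. `build_request` is a hypothetical helper written for these notes, not part of any library, and the paths and header values are made up for illustration.

```python
def build_request(method, path, host, headers=None, body=""):
    """Return the plain-text form of a simple HTTP/1.1 request (illustrative only)."""
    # Request line: method, path, and protocol version
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    # Each header is a colon-separated name/value pair on its own line
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # A blank line separates the headers from the (optional) body
    return "\r\n".join(lines) + "\r\n\r\n" + body


# GET: retrieve a resource; data travels in the URL
get_request = build_request(
    "GET", "/home?item=book", "www.example.com", {"Accept": "text/html"}
)

# POST: send data to the server; data travels in the body
post_request = build_request(
    "POST",
    "/login",
    "www.example.com",
    {"Content-Type": "application/x-www-form-urlencoded"},
    "user=alice",
)

print(get_request)
print(post_request)
```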

## Responses

### Status codes

- `200`: OK
- `302`: Found (redirect; the resource has temporarily moved)
- `404`: Not Found
- `500`: Internal Server Error
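
One quick way to double-check what a status code means is Python's standard `http.HTTPStatus` enum, which carries the registered reason phrase for each code. The sketch below also includes `403` and `404` side by side, since they are easy to mix up.

```python
from http import HTTPStatus

# Look up the standard reason phrase for a handful of codes
for code in (200, 302, 403, 404, 500):
    status = HTTPStatus(code)
    print(code, status.phrase)

# 200 OK
# 302 Found
# 403 Forbidden
# 404 Not Found
# 500 Internal Server Error
```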

### Response headers

- Response headers contain additional meta-information about the response data being returned.
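
As a sketch of where response headers sit, the snippet below splits a hand-written raw response into its status line, headers, and body. The response text is invented for illustration; real clients do this parsing for you.

```python
# A made-up raw HTTP response, as it might travel over the wire
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<h1>Home</h1>"
)

# A blank line separates the head (status line + headers) from the body
head, _, body = raw_response.partition("\r\n\r\n")
status_line, *header_lines = head.split("\r\n")

# Each header line is a colon-separated name/value pair
headers = dict(line.split(": ", 1) for line in header_lines)

print(status_line)              # HTTP/1.1 200 OK
print(headers["Content-Type"])  # text/html; charset=utf-8
print(body)                     # <h1>Home</h1>
```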

## Stateful web applications

### Sessions

- With some help from the client (i.e., the browser), HTTP can be made to act as if it were maintaining a stateful connection with the server, even though it's not. 
- Server sends some form of a unique token to the client. 
- When the client makes a request to that server, it sends this token as part of the request, allowing the server to identify clients.
- Called `session identifier`.
- Creates a sense of persistent connection between requests. 
- This sort of faux statefulness has several consequences:
  - First, every request must be inspected to see if it contains a session identifier. 
  - Second, if it does contain a session id, the server must check to ensure that this session id is still valid. The server needs to maintain some rules with regards to how to handle session expiration and also decide how to store its session data. 
  - Third, the server needs to retrieve the session data based on the session id.
  - Finally, the server needs to recreate the application state (e.g., the HTML for a web request) from the session data and send it back to the client as the response.
- Server has to work very hard to simulate a stateful experience, and every request still gets its own response, even if most of that response is identical to the previous response. That is, unless something like AJAX is used, the whole page needs to be recreated.
- The most common way to store session id information is via a browser cookie. 
- A cookie is a piece of data that's sent from the server and stored in the client during a request/response cycle.
- Cookies, or HTTP cookies, are small files stored in the browser that contain the session information.
- The client side cookie is compared with the server-side session data on each request to identify the current session.
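
The session flow described above can be sketched with an in-memory store. Everything here is an assumption made for illustration: the cookie name `session_id`, the 30-minute expiry, the `create_session` and `load_session` helpers, and the dictionary used as storage. A real application would normally rely on its web framework's session support.

```python
import secrets
import time

SESSION_TTL_SECONDS = 30 * 60  # assumed expiry window
sessions = {}  # session id -> {"data": ..., "expires_at": ...}


def create_session(data, now=None):
    """Store session data server-side and return a new unique token."""
    now = time.time() if now is None else now
    session_id = secrets.token_hex(16)  # hard-to-guess random token
    sessions[session_id] = {"data": data, "expires_at": now + SESSION_TTL_SECONDS}
    return session_id


def load_session(cookie_header, now=None):
    """Return the session data for a Cookie header value, or None."""
    now = time.time() if now is None else now
    # 1. Inspect the request for a session identifier
    cookies = dict(
        pair.strip().split("=", 1) for pair in cookie_header.split(";") if "=" in pair
    )
    record = sessions.get(cookies.get("session_id"))
    if record is None:
        return None
    # 2. Check that the session id is still valid
    if record["expires_at"] <= now:
        del sessions[cookies["session_id"]]
        return None
    # 3. Retrieve the session data
    return record["data"]


sid = create_session({"user": "alice"}, now=1000)
print(f"Set-Cookie: session_id={sid}; HttpOnly")  # sent by the server once

# The browser sends the cookie back on later requests
print(load_session(f"session_id={sid}", now=1001))  # {'user': 'alice'}
print(load_session(f"session_id={sid}", now=1000 + SESSION_TTL_SECONDS))  # None
print(load_session("theme=dark"))  # None
```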

## AJAX

- Asynchronous JavaScript and XML. 
- Allows browsers to issue requests and process responses without a full page refresh. 
- AJAX requests are just like normal requests: they are sent to the server with all the normal components of an HTTP request, and the server handles them like any other request. 
- The only difference is that instead of the browser refreshing and processing the response, the response is processed by a callback function, which is usually some client-side JavaScript code to re-render a part of the page.

## Security

### Secure HTTP (HTTPS)

- Requests and responses are strings containing information. 
- If a malicious hacker was attached to the same network, they could employ packet sniffing techniques to read the messages being sent back and forth. 
- With HTTPS every request/response is encrypted before being transported on the network. 
- This means if a malicious hacker sniffed out the HTTP traffic, the information would be encrypted and useless.
- HTTPS sends messages through a cryptographic protocol called TLS for encryption. Earlier versions of HTTPS used SSL or Secure Sockets Layer until TLS was developed. 

### Same-origin policy

- Same-origin policy permits unrestricted interaction between resources originating from the same origin, but restricts certain interactions between resources originating from different origins.
- Same-origin policy doesn't restrict all cross-origin requests. Requests such as linking, redirects, or form submissions to different origins are typically allowed. Also typically allowed is the embedding of resources from other origins, such as scripts, css stylesheets, images and other media, fonts, and iframes. What is typically restricted are cross-origin requests where resources are being accessed programmatically using APIs such as XMLHttpRequest or fetch.
- CORS (Cross-origin resource sharing) is a mechanism that allows interactions that would normally be restricted cross-origin to take place. It works by adding new HTTP headers, which allow servers to serve resources cross-origin to certain specified origins.
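
The notes above do not spell out what makes two origins "the same". The usual definition is the combination of scheme, host, and port, and the sketch below compares URLs on that basis. `origin` and `same_origin` are hypothetical helpers written for these notes; browsers implement this check internally.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when none is given
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)


page = "http://www.example.com/home"

print(same_origin(page, "http://www.example.com:80/about"))  # True (path differs only)
print(same_origin(page, "https://www.example.com/home"))     # False (scheme differs)
print(same_origin(page, "http://api.example.com/home"))      # False (host differs)
print(same_origin(page, "http://www.example.com:4567/home")) # False (port differs)
```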

### Session Hijacking

- Countermeasures for Session Hijacking:
  - Resetting sessions: A new login renders old session id invalid.
  - Expiration time on sessions
  - HTTPS across the entire app

### Cross-Site Scripting (XSS)

- This type of attack happens when you allow users to input HTML or JavaScript that ends up being displayed by the site directly.
- If the server side code doesn't do any sanitization of input, the user input will be injected into the page contents, and the browser will interpret the HTML and JavaScript and execute it.
- Countermeasures: 
  - Sanitize user input
  - Escape all user input data when displaying it.
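
A minimal sketch of the escaping countermeasure, using Python's standard `html.escape`. The comment text and the surrounding `<p>` markup are made up for illustration; most template engines apply this kind of escaping for you.

```python
from html import escape

# Untrusted input that tries to inject a script into the page
user_input = '<script>alert("XSS")</script>'

# escape() replaces the characters HTML treats as markup with entities,
# so the browser displays the text instead of executing it
safe = escape(user_input)
print(safe)  # &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;

page_fragment = f"<p>{safe}</p>"
print(page_fragment)
```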

## TO-DO:

- Read: https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol