# Intro to HTTP

## What to Focus On

- **Develop a clear understanding of the role of HTTP**: The focus of this lesson is on HTTP; however, it doesn't function in isolation. As you work through the lesson, try to build a picture of the functioning of the web as a combination of multiple different technologies, within which HTTP has a specific role.
- **Break things down into individual components**: Ensure clarity within your mental models by breaking concepts like HTTP and URLs into specific components, and understanding the purpose of each of those components.

## The Application Layer

- Both the TCP/IP model and the OSI model define an Application layer as the topmost layer in their respective layered systems. 
- Something to be clear about here is that **the application layer is not the application itself, but rather a set of protocols which provide communication services to applications**.
- In both models, the protocols which exist at the Application layer are the ones with which the application most directly interacts.
- Application layer protocols rely on the protocols at the layers below them to ensure that a message gets to where it is supposed to, and focus instead on the structure of that message and the data that it should contain.

### Application Layer Protocols

- We can perhaps think of Application layer protocols as being the rules for how applications talk to each other at a syntactical level.
- Different types of applications have different requirements with regard to how they communicate at a syntactical level, and as a result there are many different protocols which exist at the application layer. 
- HTTP is the primary protocol used for communication on the Web.

## HTTP and the Web

- The **internet** is essentially a network of networks. It can be thought of as the infrastructure that enables inter-network communication, both in terms of the physical network and the lower-level protocols that control its use.
- The **World Wide Web**, or web for short, is a **service** that can be accessed via the internet.
- It is a vast information system comprised of resources which are navigable by means of a URL (Uniform Resource Locator).
- HTTP is closely tied, both historically and functionally, to the web as we know it. It is the primary means by which applications interact with the resources which make up the web.

### A Brief History of the Web

- **Hypertext Markup Language (HTML)** was the means by which the resources in this system should be **uniformly structured**.
- A **Uniform Resource Identifier** (URI), is a string of characters which identifies a particular resource. It is part of a system by which resources should be **uniformly addressed** on the Web.
- **Hypertext Transfer Protocol** (HTTP) is the set of rules which provide **uniformity to the way resources on the web are transferred between applications**.

## Introduction to HTTP

<h2> <mark>Background</mark></h2>

- Under your browser's hood lies a collection of files -- CSS, HTML, Javascript, videos, images, etc. -- that makes displaying the page possible. 
- All these files were sent from a **server** to your browser, the **client**, by an application protocol called HTTP (yes, this is why URLs in your browser address bar start with "http://").
- HTTP, or Hypertext Transfer Protocol, is a system of rules, a protocol, that serves as a link between applications and the transfer of hypertext documents. 
- It's an agreement, or message format, of how machines communicate with each other. 
- HTTP follows a simple model where a **client** makes a **request** to a server and waits for a **response**. 
- Hence, it's referred to as a **request-response protocol**.

#### How the Internet Works

- All devices that participate in a network are provided unique labels. The general term for this type of label is an **IP Address** (Internet Protocol Address).
- **Port numbers** add more detail about how to communicate with a device.
- `192.168.0.1:1234`
    - the IP Address is `192.168.0.1` and
    - the port number is `1234`
- An IP Address acts as the identifier for a device or server, which can contain hundreds or thousands of ports, each used for a different communication purpose to that device or server.
- Effective communication begins when each device has a public IP address provided by an Internet Service Provider.
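
As a minimal sketch of the address/port split described above, the snippet below separates the `192.168.0.1:1234` example into its two parts. The helper name `split_endpoint` is made up for illustration, and it assumes an IPv4 address written in the `address:port` form.

```python
import ipaddress


def split_endpoint(endpoint: str) -> tuple[ipaddress.IPv4Address, int]:
    """Split an 'IPv4:port' string into a validated address and a port number."""
    host_text, _, port_text = endpoint.rpartition(":")
    port = int(port_text)
    # Port numbers are 16-bit values, so they range from 0 to 65535.
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return ipaddress.IPv4Address(host_text), port


address, port = split_endpoint("192.168.0.1:1234")
print(address, port)
```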

**DNS**

- The mapping from a domain name to an IP address is handled by the **Domain Name System** or **DNS**.
- DNS is a distributed database which translates domain names like www.google.com to an IP address, so that the IP address can then be used to make a request to the server. 
- DNS databases are stored on computers called **DNS servers**.
-  There is a very large world-wide network of hierarchically organized DNS servers, and no single DNS server contains the complete database.<br><br>
- Your typical interaction with the Internet starts with a web browser when you:
    1. Enter a URL like http://www.google.com into your web browser's address bar.
    2. The browser creates an HTTP request, which is packaged up and sent to your device's network interface.
    3. If your device already has a record of the IP address for the domain name in its DNS cache, it will use this cached address. If the IP address isn't cached, a DNS request will be made to the Domain Name System to obtain the IP address for the domain.
    4. The packaged-up HTTP request then goes over the Internet where it is directed to the server with the matching IP address.
    5. The remote server accepts the request and sends a response over the Internet back to your network interface which hands it to your browser.
    6. Finally, the browser displays the response in the form of a web page.<br><br>
- The main thing to understand though is that when your browser issues a request, it's simply sending some text to an IP address. 
- Because the client (web browser) and the server (recipient of the request) have an agreement, or protocol, in the form of HTTP, the server can take apart the request, understand its components and send a response back to the web browser. 
- The web browser will then process the response strings into content that you can understand. 
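
To make the "it's simply sending some text" point concrete, here is a rough sketch of the text a client might assemble for a `GET` request. The function name `build_get_request` and the `Connection: close` header are illustrative assumptions; a real browser sends many more headers, and nothing is sent over the network here.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Assemble the plain text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # header naming the site being requested
        "Connection: close",     # assumed header, included only for illustration
        "",                      # blank line marks the end of the headers
        "",
    ]
    # HTTP separates lines with a carriage return plus a line feed.
    return "\r\n".join(lines)


request_text = build_get_request("www.google.com")
print(request_text)
```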

#### Clients and Servers

- The most common client is an application you interact with on a daily basis called a **Web Browser**.
- Web browsers are responsible for issuing HTTP requests and processing the HTTP response in a user-friendly manner onto your screen. 
- The content you're requesting is located on a remote computer called a server. 
- Servers are nothing more than machines or devices capable of handling inbound requests, and their job is to issue a response back. 
- Often, the response they send back contains relevant data as specified in the request.

#### Resources

- "Resource" is a generic term for things you interact with on the Internet via a URL.
- This includes images, videos, web pages and other files.
- Resources can also be in the form of software that lets you trade stock or play a video game.

#### Statelessness

- A protocol is said to be **stateless** when it's designed in such a way that each request/response pair is completely independent of the previous one. 
- HTTP is a stateless protocol, which means that the server does not need to hang on to information, or state, between requests.
- As a result, when a request breaks en route to the server, no part of the system has to do any cleanup. 
- Both these reasons make HTTP a resilient protocol, as well as a difficult protocol for building stateful applications.
- Since HTTP, the protocol of the web, is inherently stateless, web developers have to work hard to simulate a stateful experience in web applications.

<h2> <mark>What is a URL?</mark></h2>

#### Introduction

- When you want to check Facebook's games page, you start by launching your web browser and navigating to http://www.facebook.com/games. 
- The web browser makes an HTTP request to this address resulting in the resource being returned to your browser. 
- The address you entered, http://www.facebook.com/games, is known as a **Uniform Resource Locator** or **URL**.
- A URL is like that address or phone number you need in order to visit or communicate with your friend. 
- A URL is the most frequently used part of the general concept of a **Uniform Resource Identifier** or **URI**, which specifies how resources are located. 

#### URL Components
- `"http://www.example.com:88/home?item=book"`
- We can break this URL into 5 parts:
    - `http`: The **scheme**. It always comes before the colon and two forward slashes and tells the web client how to access the resource. In this case it tells the web client to use the **Hypertext Transfer Protocol** or **HTTP** to make a request. Other popular URL schemes are ftp, mailto or git. You may sometimes see this part of the URL referred to as the "protocol". There is a connection between the scheme and the protocol, as the scheme can indicate which protocol (or system of rules) should be used to access the resource. However, the correct term to use in this context is "scheme".
    - `www.example.com`: The **host**. It tells the client where the resource is hosted or located.
    - `:88` : The **port** or **port number**. It is only required if you want to use a port other than the default.
    - `/home`: The **path**. It shows what local resource is being requested. This part of the URL is optional.
    - `?item=book` : The **query string**, which is made up of **query parameters**. It is used to send data to the server. This part of the URL is also optional.<br><br>
- Sometimes, the path can point to a specific resource on the host. 
- For instance, `www.example.com/home/index.html` points to an HTML file located on the example.com server.
- The default port number for HTTP is port 80.
- **Unless a different port number is specified, port 80 will be used by default in normal HTTP requests**. 
- To use anything other than the default, one has to specify it in the URL.
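
The breakdown above can be reproduced with Python's standard `urllib.parse` module. This is only a sketch: the helper `describe_url` and the `DEFAULT_PORTS` table are names invented for this example.

```python
from urllib.parse import urlsplit

# Default ports assumed when a URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url: str) -> dict:
    """Break a URL into the five components discussed above."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "query": parts.query,
    }


components = describe_url("http://www.example.com:88/home?item=book")
print(components)
```

Calling `describe_url("http://www.example.com/home")` falls back to port 80, matching the default described above.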

#### Query Strings/Parameters
- `http://www.example.com?search=ruby&results=10`

<img src="query-strings-parameters.png" width=750>

<img src="query_string_components.png" width=750>

- In the above example, name/value pairs in the form of `product=iphone`, `size=32gb` and `color=white` are passed to the server from the URL.
- **Because query strings are passed in through the URL, they are most commonly used with HTTP GET requests.**
- Whenever you type a URL into the address bar of your browser, you're issuing an HTTP GET request. 

<img src="query_strings.jpeg" width=750>

- Query strings are great for passing additional information to the server; however, there are some limits to their use:
    - Query strings have a maximum length in practice, because browsers and servers limit how long a URL can be. Therefore, if you have a lot of data to pass on, you will not be able to do so with query strings.
    - The name/value pairs used in query strings are visible in the URL. For this reason, passing sensitive information like a username or password to the server in this manner is not recommended.
    - Spaces and special characters like `&` cannot be used as-is in query strings. They must be URL encoded, which we'll talk about next.
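
As a small illustration of name/value pairs, the standard library can split the query string of the `search=ruby&results=10` URL into its parameters, and build the same string back from a dictionary. The variable names are arbitrary.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "http://www.example.com?search=ruby&results=10"

# Everything after the "?" is the query string.
query_string = urlsplit(url).query

# parse_qs maps each name to a list of values, since a name may repeat.
params = parse_qs(query_string)
print(params)

# Going the other way: build a query string from name/value pairs.
rebuilt = urlencode({"search": "ruby", "results": 10})
print(rebuilt)
```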
    
#### URL Encoding

- URLs are designed to accept only certain characters in the standard 128-character ASCII character set. 
- Reserved or unsafe ASCII characters which are not being used for their intended purpose, as well as characters not in this set, have to be encoded. 
- URL encoding replaces these non-conforming characters with a `%` symbol followed by two hexadecimal digits that represent the character's byte value. A character that takes several bytes in UTF-8 becomes several such `%XX` sequences.

<img src="url_encoding.png" width=600>

- Characters must be encoded if:
    - They have no corresponding character within the standard ASCII character set. Note that this means all extended ASCII characters as well as 2-, 3-, and 4-byte UTF-8 characters.
    - The use of the character is unsafe since it may be misinterpreted or modified by some systems. For example, `%` is unsafe since it is used to encode other characters. Other unsafe characters include spaces, quotation marks, the `#` character, `<` and `>`, `{` and `}`, `[` and `]`, and `~`, among others.
    - The character is reserved for special use within the URL scheme. Some characters are reserved for a special meaning; their presence in a URL serves a specific purpose. Characters such as `/`, `?`, `:`, `@`, and `&` are all reserved and must be encoded when they are not being used for that reserved purpose. 
    - For example `&` is reserved for use as a query string delimiter. `:` is also reserved to delimit host/port components and user/password.<br><br>
- So what characters can be used safely within a URL? **Only alphanumeric and special characters `$-_.+!'()"`, and reserved characters when used for their reserved purposes can be used unencoded within a URL.**
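
A short sketch of URL encoding using `urllib.parse.quote` and `unquote`. The sample strings are illustrative. Passing `safe=""` tells `quote` to encode reserved characters as well, which is what you want for a value that is not using them for their reserved purpose.

```python
from urllib.parse import quote, unquote

# A space is unsafe, so it becomes %20.
encoded_name = quote("Michael Jackson")

# "é" is outside ASCII; its two UTF-8 bytes become two %XX sequences.
encoded_accent = quote("café")

# "&" is reserved as the query string delimiter, so a literal "&" inside
# a value has to be encoded.
encoded_value = quote("rock & roll", safe="")

print(encoded_name, encoded_accent, encoded_value)

# Decoding reverses the process.
print(unquote(encoded_name))
```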

<h2> <mark>Making HTTP Requests</mark></h2>

#### HTTP Request with a Browser

- Launch your browser and enter the address https://www.reddit.com to make an HTTP request
- The server that hosts the main Reddit website handles your request and issues a response back to your browser. 
- Your browser is smart enough to process the response that is sent back and display the site, with all its colors, images, text and presentation.

#### HTTP Request with an HTTP Tool

- `curl https://www.reddit.com` will return the raw HTTP response data

#### Using the Inspector

- Every modern browser has a way to view HTTP requests and responses, and it's usually called the **inspector**. 
- With the inspector still open click on the *Network* tab:
- By visiting the URL, your browser is making multiple requests, one for every resource (image, file, etc.). 
- In Network, you'll be able to see the specific request headers, cookies as well as the raw response data:
- What's happening is that the resource we requested, the initial www.reddit.com entry, returned some HTML. That HTML body contains references to other resources like images, CSS stylesheets, JavaScript files and more. Your browser, being smart and helpful, understands that in order to produce a visually appealing presentation, it has to go and grab all these referenced resources. 

#### Request Methods

- Information displayed in the **Method** column is known as the **HTTP Request Method**.
- You can think of this as the verb that tells the server what action to perform on a resource.
- When you think about retrieving information, think `GET`, which is the most used HTTP request method.
- The **Status** column shows the response status for each request.
- **Every request gets a response**, even if the response is an error -- that's still a response.

#### GET Requests

- `GET` requests are initiated by clicking a link or via the address bar of a browser.
- When you type an address like `https://www.reddit.com` into the address bar of your browser, you're making a `GET` request. You're asking the web browser to go retrieve the resource at that address.
- The same goes for interacting with links on web applications. The default behavior of a link is to issue a `GET` request to a URL.
- `$ curl -X GET "https://www.reddit.com/" -m 30 -v`
- `$ curl -X GET "https://itunes.apple.com/search?term=Michael%20Jackson" -m 30 -v`
- **GET requests are used to retrieve a resource, and most links are GETs.**
- **The response from a GET request can be anything, but if it's HTML and that HTML references other resources, your browser will automatically request those referenced resources. A pure HTTP tool will not.**

#### POST Requests

- `POST` is used when you want to initiate some action on the server, or send data to a server. 
- `curl -X POST "https://echo.epa.gov" -m 30 -v` : a POST request to https://echo.epa.gov and the response from the server.<br><br>
- Typically from within a browser, you use `POST` when submitting a form.<br><br>
- `POST` requests allow us to send much larger payloads to the server, such as images or videos, and to keep sensitive data out of the URL.
- Say we need to send our username and password to the server for authentication. We could use a `GET` request and send it through query strings. 
- The flaw with this approach is obvious: our credentials become exposed instantly in the URL; that isn't what we want. Using a `POST` request in a form fixes this problem. 
- `POST` requests also help sidestep the query string size limitation that you have with `GET` requests. With `POST` requests, we can send significantly larger forms of information to the server.<br><br>
- `$ curl -X POST "http://al-blackjack.herokuapp.com/new_player" -d "player_name=Albert" -m 30 -v`
- Notice that in the curl command we're supplying the additional parameter of `player_name=Albert`. 
- It has the same effect as inputting the name into the first "What's your name?" form and submitting it.
- HTTP **body** - the body contains the data that is being transmitted in an HTTP message and is optional. In other words, an HTTP message can be sent with an empty body. When used, the body can contain HTML, images, audio and so on.
- The `Location` header is an HTTP **response header**.
- Your browser sees the Location header and automatically issues a brand new request to the specified URL, thereby initiating a new, unrelated request.
- Your browser issued the initial `POST` request, got a response with a `Location` header, then issued another request without any action from you, then displayed the response from that second request.
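
The curl `-d` example above puts the form data in the request body rather than in the URL. As a hedged sketch, the same request can be described with Python's `urllib.request.Request`. The request object is only constructed here, never sent, so nothing depends on that server being reachable.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The form field from the curl example, encoded as name=value pairs.
form_data = {"player_name": "Albert"}
body = urlencode(form_data).encode("ascii")

# Building the Request does not open a connection; it only describes one.
request = Request(
    "http://al-blackjack.herokuapp.com/new_player",
    data=body,
    method="POST",
)

print(request.get_method(), request.full_url)
print(request.data)
```

Note that the data travels in the body, so it does not appear in `request.full_url`.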

#### HTTP Headers

- HTTP headers allow the client and the server to send additional information during the HTTP request/response cycle. 
- Headers are colon-separated name-value pairs that are sent in plain text.
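
Since headers are plain-text name/value pairs, a few lines of code are enough to read them. This is a sketch only: `parse_headers` is a hypothetical helper and the header values are made-up sample data. Header names are case-insensitive, so the sketch lowercases them.

```python
def parse_headers(raw: str) -> dict[str, str]:
    """Turn 'Name: value' lines into a dictionary keyed by lowercase name."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only; values may contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


# Sample header text; the values are invented for illustration.
raw_headers = (
    "Host: www.reddit.com\r\n"
    "Accept-Language: en-US,en;q=0.8\r\n"
    "User-Agent: ExampleClient/1.0\r\n"
)
headers = parse_headers(raw_headers)
print(headers)
```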

#### Request Headers

- Request headers give more information about the client and the resource to be fetched. 

<img src="request-headers.png" width=500>

<h2> <mark>Processing Responses</mark></h2>

- Raw data returned by the server is called a response. 

#### Status Code

- The first component we'll look at is the **HTTP Status Code**. 
- The `status code` is a three-digit number that the server sends back after receiving a request signifying the status of the request. 
- The `status text` displayed next to `status code` provides the description of the code. It is listed under the Status column of the Inspector.
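
Python's standard library ships the registered status codes together with their status text, which makes for a quick lookup sketch. `describe_status` is a name chosen for this example.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the status code followed by its standard status text."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


for code in (200, 302, 404, 500):
    print(describe_status(code))
```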

<img src="response-codes.png" width=500>

- **302 Found**: 
    - What happens when a resource is moved? The most common strategy is to re-route the request from the original URL to a new URL. 
    - The general term for this kind of re-routing is a `redirect`.
    - When your browser sees a response status code of 302, it knows that the resource has been moved, and will automatically follow the new re-routed URL in the `Location` response header. 
    - Say you want to access the account profile at GitHub, you'll have to go to the address https://github.com/settings/profile. 
    - However, in order to have access to the profile page, you must first be signed in. 
    - If you're not already signed in, GitHub will redirect you to a page where you can do that. 
    - After you enter your credentials, you'll be redirected to the original page you were trying to access.
    - Take note of the `Location` response header. You should see `Location: https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fsettings%2Fprofile`, which contains a **return_to** parameter with a value of the URL where the client should be redirected to after signing in.
- **404 Not Found**
    - The server returns this status code when the requested resource cannot be found. 
    - `curl -X GET "https://www.dropbox.com/awesome_file.jpg" -m 30 -v`
    - Because the resource we want does not exist, the browser shows us nicely formatted text while the HTTP tool shows us the raw response with the status code.
- **500 Internal Server Error**
    - A 500 status code says "there's something wrong on the server side". 
- **Response Headers**
    - Response headers offer more information about the resource being sent back. Some common response headers are:
    
<img src="response-headers.png" width=500>
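
To tie the status code and the `Location` header together, here is a sketch that reads a hand-written 302 response, modeled on the GitHub example above, and decodes the `return_to` parameter. The response text is constructed for illustration rather than captured from a real server, and `parse_response_head` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit

# Hand-written response text modeled on the GitHub redirect example.
RAW_RESPONSE = (
    "HTTP/1.1 302 Found\r\n"
    "Location: https://github.com/login"
    "?return_to=https%3A%2F%2Fgithub.com%2Fsettings%2Fprofile\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_response_head(raw: str) -> tuple[int, str, dict[str, str]]:
    """Extract the status code, status text and headers from a raw response."""
    head, _, _body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


code, reason, headers = parse_response_head(RAW_RESPONSE)

# A 302 tells the client to request the URL in the Location header next.
redirect_target = headers["location"] if code == 302 else None

# parse_qs also URL-decodes the value of return_to.
return_to = parse_qs(urlsplit(redirect_target).query)["return_to"][0]
print(code, reason, redirect_target, return_to)
```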

<h2> <mark>Stateful Web Applications</mark></h2>

- HTTP is stateless: the server does not hang on to information between each request/response cycle.
- Each request made to a resource is treated as a brand new entity, and different requests are not aware of each other. 
- This statelessness is what makes HTTP and the internet so distributed and difficult to control, but it's also the same ephemeral attribute that makes it difficult for web developers to build stateful web applications.

#### Sessions

- It's obvious the stateless HTTP protocol is somehow being augmented to maintain a sense of statefulness. 
- One way to accomplish this is by having the server send some form of a unique token to the client. 
- Whenever a client makes a request to that server, the client appends this token as part of the request, allowing the server to identify clients. 
- In web development, we call this unique token that gets passed back and forth the **session identifier**.<br><br>
- This mechanism of passing a `session id` back and forth between the client and server creates a sense of persistent connection between requests.
- Each request, however, is technically stateless and unaware of the previous or the next one.
- This sort of faux statefulness has several consequences. 
     - First, every request must be inspected to see if it contains a session identifier. 
     - Second, if this request does, in fact, contain a session id, the server must check to ensure that this session id is still valid.
     - Third, the server needs to retrieve the session data based on the session id. 
     - And finally, the server needs to recreate the application state (e.g., the HTML for a web request) from the session data and send it back to the client as the response.<br><br>
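
The four consequences listed above can be sketched as a tiny in-memory session store. Everything here is an assumption made for illustration: the `SessionStore` class, the `handle_request` function, the 30-minute lifetime and the `username` field. A real application would typically rely on its web framework's session support.

```python
import secrets
import time


class SessionStore:
    """Keeps session data on the server, keyed by a random session id."""

    def __init__(self, ttl_seconds: float = 1800):
        self._ttl = ttl_seconds
        self._sessions = {}  # session id -> (expiry time, session data)

    def create(self, data: dict) -> str:
        session_id = secrets.token_hex(16)  # unpredictable random token
        self._sessions[session_id] = (time.monotonic() + self._ttl, data)
        return session_id

    def load(self, session_id: str):
        """Return the session data, or None if the id is unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if time.monotonic() >= expires_at:
            del self._sessions[session_id]  # expired sessions are discarded
            return None
        return data


def handle_request(store: SessionStore, session_id):
    # 1. Inspect the request for a session id. 2-3. Validate it and load the data.
    data = store.load(session_id) if session_id else None
    if data is None:
        return "Please log in"
    # 4. Recreate the application state from the session data.
    return f"Welcome back, {data['username']}"


store = SessionStore()
sid = store.create({"username": "albert"})
print(handle_request(store, sid))
print(handle_request(store, None))
```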

#### Cookies

- A **cookie** is a piece of data that's sent from the server and stored in the client during a request/response cycle.
- **Cookies** or **HTTP cookies**, are small files stored in the browser and contain the session information.
- Session data is generated and stored on the server-side and the session id is sent to the client in the form of a cookie. 
- When you access any website for the first time, the server sends session information and sets it in your browser cookie on your local computer. 
- The client side cookie is compared with the server-side session data on each request to identify the current session.<br><br>
- Navigate to `google.com` and inspect the request headers. Note that they contain no reference to cookies. 
- You'll notice the response headers include **set-cookie** headers that add cookie data to the response. This cookie data got set on the first visit to the website. 
- Make a request to the same address and then look at the request headers.
- You'll see the cookie header set (note that this is the request header, which implies it's being sent by your client to the server). 
- It contains the cookie data sent previously by the **set-cookie** response header. 
- This piece of data will be sent to the server each time you make a request and uniquely identifies you -- or more precisely, it identifies your client, which is your browser.<br><br>
- Click the Application tab and navigate to `https://www.reddit.com`
- Expand the cookies section and click on `www.reddit.com` where you'll see the cookies that came with our initial request under the value column
- After logging in, you should notice a unique session in the second to last row. That session id is saved into a cookie in your browser, and is attached along with every future request that you make to reddit.com
- With the session id now being sent with every request, the server can now uniquely identify this client. 
- When the server receives a request with a session id, the server will look for the associated data based on that id, and in that associated session data is where the server "remembers" the state for that client, or more precisely, for that session id.
- It is important to be aware that the id sent with a session is unique and expires in a relatively short time. In this context, it means you'll be required to log in again after the session expires. If we log out, the session id information is gone.<br><br>
- **Summary**
    - The session plays an important role in simulating statefulness on top of stateless HTTP. 
    - A session id serves as that unique token used to identify each session. 
    - Usually, the session id is implemented as a random string and comes in the form of a cookie stored on the computer. 
    - With the session id in place on the client side now every time a request is sent to the server, this data is added and used to identify the session. 
    - This is what many web applications with authentication systems do. When a user's username and password match, the session id is stored on their browser so that on the next request they won't have to re-authenticate.
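
The round trip described above, a `Set-Cookie` response header followed by a `Cookie` request header, can be sketched with the standard `http.cookies` module. The cookie names and values (`session_id`, `abc123`, `theme`) are invented for the example.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header that carries the session id.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"
response_cookie["session_id"]["path"] = "/"
response_cookie["session_id"]["httponly"] = True  # hide it from page scripts
set_cookie_header = response_cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# Server side again, on a later request: read the Cookie header the client sent.
request_cookie = SimpleCookie()
request_cookie.load("session_id=abc123; theme=dark")
session_id = request_cookie["session_id"].value
print(session_id)
```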

#### AJAX

- AJAX is short for Asynchronous JavaScript and XML. 
- Its main feature is that it **allows browsers to issue requests and process responses without a full page refresh**.
- When AJAX is used, all requests sent from the client are performed asynchronously, which just means that the page doesn't refresh.<br><br>
- For example, in a search box that suggests results as you type, every letter you type issues a new request, which means that an AJAX request is triggered with every key-press.
- The responses from these requests are being processed by some **callback**. 
- You can think of a callback as a piece of logic you pass on to some function to be executed after a certain event has happened.

<h2> <mark>Security</mark></h2>

- The same attributes that make HTTP so difficult to control, also make it so difficult to secure.

#### Secure HTTP (HTTPS)

- As the client and server send requests and responses to each other, all information in both requests and responses is sent as plain text. 
- If a malicious hacker was attached to the same network, they could employ **packet sniffing** techniques to read the messages being sent back and forth. 
- Requests can contain the session id, which uniquely identifies you to the server, so if someone else copied this session id, they could craft a request to the server and pose as your client, and thereby be logged in automatically without even having access to your username or password.
- **Secure HTTP**, or **HTTPS**: A resource that's accessed by HTTPS will start with `https://` instead of `http://`, and usually be displayed with a lock icon in most browsers.<br><br>
- With HTTPS every request/response is encrypted before being transported on the network. 
- This means if a malicious hacker sniffed out the HTTP traffic, the information would be encrypted and useless.
- HTTPS sends messages through a cryptographic protocol called TLS for encryption.

#### Same-origin policy

- The same-origin policy permits unrestricted interaction between resources originating from the same origin, but restricts certain interactions between resources originating from different origins. 
- By **origin**, we mean the combination of the **scheme**, **host**, and **port**.
- So `http://mysite.com/doc1` has:
    - the same origin as `http://mysite.com/doc2`
    - a different origin from `https://mysite.com/doc1` (different scheme)
    - a different origin from `http://mysite.com:4000/doc1` (different port)
    - a different origin from `http://anothersite.com/doc1` (different host)
-  The same-origin policy is an issue for web developers who have a legitimate need for making these restricted kinds of cross-origin requests.
- **Cross-origin resource sharing**, or **CORS**, was developed to deal with this issue. 
- CORS is a mechanism that allows interactions that would normally be restricted cross-origin to take place. 
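
The origin comparison above can be sketched as a small function. `origin`, `same_origin` and `DEFAULT_PORTS` are names chosen for this example. Browsers implement the actual policy, so this only mirrors the scheme/host/port comparison.

```python
from urllib.parse import urlsplit

# Ports implied by the scheme when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that makes up a URL's origin."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(first_url: str, second_url: str) -> bool:
    return origin(first_url) == origin(second_url)


base = "http://mysite.com/doc1"
for other in (
    "http://mysite.com/doc2",
    "https://mysite.com/doc1",
    "http://mysite.com:4000/doc1",
    "http://anothersite.com/doc1",
):
    print(other, same_origin(base, other))
```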

#### Session Hijacking

- If an attacker gets a hold of the session id, both the attacker and the user now share the same session and both can access the web application. 
- In session hijacking, the attacker accesses the user's session without ever knowing the username or password, and the user won't even know it is happening.
- **Countermeasures for Session Hijacking**
    - Reset sessions. With authentication systems, this means a successful login must render an old session id invalid and create a new one. 
    - Set an expiration time on sessions.
    - Use HTTPS across the entire app to minimize the chance that an attacker can get to the session id.
    
#### Cross-Site Scripting (XSS)

- **Cross-site scripting**, or **XSS**: This type of attack happens when you allow users to input HTML or JavaScript that ends up being displayed by the site directly.
- Because it's just a normal HTML `<textarea>`, users are free to input anything into the form. This means users can add raw HTML and JavaScript into the text area and submit it to the server as well
- If the server side code doesn't do any sanitization of input, the user input will be injected into the page contents, and the browser will **interpret the HTML and JavaScript and execute it**. 
- **Potential solutions for cross-site scripting**
    - Always sanitize user input. This is done by eliminating problematic input, such as `<script>` tags, or by disallowing HTML and JavaScript input altogether.
    - Escape all user input data when displaying it so that the browser does not interpret it as code.<br>
    (To escape a character means to replace an HTML character with a combination of ASCII characters, which tells the client to display that character as is, and to not process it)
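
A minimal sketch of escaping on output, using `html.escape` from the standard library. The `render_comment` function and the sample input are invented for illustration. Template engines in most web frameworks can do this automatically.

```python
import html


def render_comment(user_input: str) -> str:
    """Wrap user-supplied text in a paragraph, escaping it first."""
    # html.escape replaces &, <, > and quotes with character references,
    # so the browser displays them instead of interpreting them as markup.
    return f"<p>{html.escape(user_input)}</p>"


rendered = render_comment("<script>alert('hi')</script>")
print(rendered)
```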

## Some Background and Diagrams

### Client-Server

- **HTTP**: the canonical definition is a stateless protocol for how clients communicate with servers.
- A client issues an HTTP request to the server. The server then processes the request, and sends a response back to the client. 

<img src="server-zoom-web-app-data.png" width=750>

- The 3 primary server-side infrastructure pieces represented in the above zoomed-in diagram are:
    - A **web server** is typically a server that responds to requests for static assets: files, images, CSS, JavaScript, etc. These requests don't require any data processing, so they can be handled by a simple web server.
    - An **application server**, on the other hand, is typically where application or business logic resides, and is where more complicated requests are handled. This is where your server-side code lives when deployed.
    - The application server will often consult a persistent **data store**, like a relational database, to retrieve or create data. Data stores can also be simple files, key/value stores, document stores and many other variations, as long as they can save data in some format for later retrieval and processing.
- Regardless of how the persistent data store is implemented, **it can be used to persist our data between stateless request/response cycles**. That is, the data doesn't go away after each cycle, but persists inside the data store.

### HTTP over TCP/IP

<img src="http-zoom-tcpip.png" width=750>

- HTTP is actually relying on a TCP/IP connection (most of the time).
- HTTP, TCP, and IP are all separate protocols operating at different 'layers' of a conceptual model of the network.
- HTTP operates at the application layer and is concerned with structuring the messages that are exchanged between applications.
- It is TCP/IP that's doing all the heavy lifting and ensuring that the request/response cycle gets completed between your browser and the server.

## URL

- URLs are "the most frequently used part of the general concept of a Uniform Resource Identifier or URI."
- **URI** is a "sequence of characters that identifies an abstract or physical resource."
- **URL** refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").
- We can take from this that a URL is a subset of URI that includes the network location of the resource. 
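
One way to see the difference is to split a URL and a non-URL URI with Python's `urllib.parse.urlsplit`. This is only a sketch: the first string is the example URL used later in these notes, and the second is a URN of the kind listed as an example in the URI specification.

```python
from urllib.parse import urlsplit

# A URL: identifies a resource and also says where to find it on the network.
url = urlsplit("http://todos.com/tasks?due-today")

# A URI that is not a URL: it names a resource but gives no network location.
urn = urlsplit("urn:oasis:names:specification:docbook:dtd:xml:4.1.2")

print(url.scheme, url.netloc, url.path, url.query)
print(urn.scheme, repr(urn.netloc), urn.path)
```

Both strings have a scheme, but only the URL has a network location (`netloc`) that a client could connect to.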

### Schemes and Protocols

- The component that precedes the colon and two forward slashes at the start of a URL is the **scheme**.
- The scheme identifies which protocol should be used to access the resource. However, referring to the scheme as the protocol is **incorrect**.
- It should be noted that 'protocol' in this sense refers to a 'family' of protocols, rather than a specific protocol version, e.g. `HTTP` rather than `HTTP 1.0` or `HTTP 1.1`.
- In the more general context of a URI, a **scheme** name "refers to a specification for assigning identifiers within that scheme".
- The convention is to refer to scheme names in lowercase, e.g. `http`, and protocol names in uppercase, e.g. HTTP.
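
The distinction between a scheme name and a protocol family can be sketched as a simple lookup. The `PROTOCOL_FAMILY` table and the `protocol_for` helper are assumptions made up for this example, not part of any library; `urlsplit` itself returns scheme names in lowercase.

```python
from urllib.parse import urlsplit

# Assumed lookup table for this sketch: scheme name -> protocol family.
PROTOCOL_FAMILY = {"http": "HTTP", "ftp": "FTP"}


def protocol_for(url):
    """Return (scheme, protocol family); the family is None for unknown schemes."""
    scheme = urlsplit(url).scheme  # urlsplit returns the scheme in lowercase
    return scheme, PROTOCOL_FAMILY.get(scheme)


result = protocol_for("HTTP://todos.com/tasks")
print(result)
```

The scheme is a piece of the URL's text, while the protocol family is what a client decides to speak after reading it.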

### URLs and Filepaths

- The way the path portion of the URL is used is determined by the application logic, and doesn't necessarily bear any relationship to an underlying file structure on the server.
- The way that the path is used often involves URL pattern-matching to match the path to a pre-defined 'route' which then executes some specific logic. 
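
A minimal sketch of this kind of pattern-matching is shown below. The route table, the handler functions and the paths are hypothetical; real frameworks provide their own routing syntax, but the underlying idea is similar.

```python
import re


# Hypothetical handlers; in a real application these would run application logic.
def list_tasks():
    return "all tasks"


def show_task(task_id):
    return f"task {task_id}"


# Route table: each URL path pattern is paired with the logic it triggers.
ROUTES = [
    (re.compile(r"/tasks"), list_tasks),
    (re.compile(r"/tasks/(?P<task_id>\d+)"), show_task),
]


def dispatch(path):
    """Run the handler of the first route whose pattern matches the whole path."""
    for pattern, handler in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return handler(**match.groupdict())
    return "404 Not Found"


print(dispatch("/tasks"))
print(dispatch("/tasks/42"))
print(dispatch("/missing"))
```

Note that no file called `tasks` needs to exist on the server: the path is only a key that selects which code runs.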

## The Request Response Cycle

- The client will be a web browser, and the server will be whatever software is running on the server machine (NGINX, JavaScript, Ruby, Python ...)<br><br>

**Example** 

- We have a web server that is located at `todos.com` and it runs Ruby
- We connect to that server using a Web Browser and get a list of all of our tasks on the todo system.
- **Step 1**: Create the request  -> `http://todos.com/tasks?due-today`
    - The browser creates a request and sends that request across the network to the server.
    - Components of an HTTP request (only the method and the path are required): 
        - The method (GET, POST ...)
        - The path (`/tasks`)
        - Parameters (optional)
        - Headers (optional)
        - Body (optional)
    - **The domain name is used to determine which server to send the request to; it is not part of the request line itself.** (Since HTTP 1.1 it is also sent in the required `Host` header.)
    - Once a connection has been created between the client and the server, the domain name is not used again to locate the server.
    - `GET` is used to fetch or retrieve data from the server.
    - `POST` is used to push data back to the server (sending data from client to server)
- **Step 2**: After the server receives the request, it will perform some type of work (ex. verify user session, load tasks from database, render HTML)
- **Step 3**: Once the server has performed the work, it sends a response to the client.
    - Status: A numeric code and a short string of text (ex. 200 OK).  Used to signify if the request was successful or not.
    - Headers: A collection of metadata about the contents of the response.  (ex. Content-Type: text/html). 
        - This value tells the browser what format the body is in, so that it knows how to display the response once it has been received.
    - Body: the bulk of the actual data being sent.  
        - In the case of a web page, the body will contain all the HTML code that the browser will use to display the result to the user.
- **Step 4**: When the client receives the response, it looks at the Content-Type header and acts accordingly (ex. HTML -> display web page).
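
The four steps can be sketched with plain strings. The helpers `build_request` and `parse_response` are hypothetical names for this example, and the response text is written by hand rather than captured from a real server.

```python
def build_request(method, path, host, headers=None, body=""):
    """Assemble an HTTP/1.1 request message as text."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line separates the headers from the (possibly empty) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw):
    """Split a raw HTTP response into (status code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), reason, headers, body


# Step 1: the browser builds the request for http://todos.com/tasks?due-today
request = build_request("GET", "/tasks?due-today", "todos.com")

# Steps 2 and 3: hypothetical response text, written by hand for this sketch.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<h1>Tasks due today</h1>"
)

# Step 4: the client inspects the status and the Content-Type header.
status, reason, headers, body = parse_response(raw_response)
print(request.splitlines()[0])
print(status, reason, headers["Content-Type"])
```

The request line carries the method and the path, while the domain name appears only in the `Host` header.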

### Practice Problems

**1. What are the required components of an HTTP request? What are the additional optional components?**

- The method and path are required, and form part of what is known as the **start-line** or **request-line**.
- As of HTTP 1.0, the HTTP version also forms part of the request-line. The Host header has been a required component since HTTP 1.1.
- The parameters, all other headers and body are optional.

**2. What are the required components of an HTTP response? What are the additional optional components?**

- A status line with a status code is required. 
- Headers and body are optional.

**3. What determines whether a request should use GET or POST as its HTTP method?**

- **GET requests should only retrieve content from the server**. They can generally be thought of as "read only" operations, however, there are some subtle exceptions to this rule. For example, consider a webpage that tracks how many times it is viewed. GET is still appropriate since the main content of the page doesn't change.

- **POST requests involve changing values that are stored on the server**. Most HTML forms that submit their values to the server will use POST. Search forms are a notable exception to this rule: they often use GET since they are not changing any data on the server, only viewing it.
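
As a small illustration, Python's `urllib.request.Request` picks its default method along similar lines: a request built without a body defaults to GET, and one built with a body defaults to POST. The `todos.com` endpoints and field names below are assumptions carried over from the earlier example, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Searching only reads data, so the parameters go in the URL of a GET request.
search = Request("http://todos.com/tasks?" + urlencode({"q": "milk"}))

# Creating a task changes data on the server, so the values go in a POST body.
create = Request(
    "http://todos.com/tasks",
    data=urlencode({"title": "buy milk"}).encode(),
)

print(search.get_method(), search.full_url)
print(create.get_method(), create.data)
```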