# Preface
---
| Acronym | Definition |
|:----:|:-----|
|_WWW_  |World Wide Web|
|_IETF_ |Internet Engineering Task Force|
|_HTTP_ |Hypertext Transfer Protocol|
|_TCP_  |Transmission Control Protocol|
|_FTP_  |File Transfer Protocol|
|_SMTP_ |Simple Mail Transfer Protocol|
|_RTSP_ |Real Time Streaming Protocol|
|_IP_   |Internet Protocol|
|_DNS_  |Domain Name Service|
|_SSL_  |Secure Sockets Layer|
|_MIME_ |Multipurpose Internet Mail Extensions (type tags)|
|_URI_  |Uniform Resource Identifier|
|_URL_  |Uniform Resource Locator|
|_URN_  |Uniform Resource Name|
|_PURL_ |Persistent Uniform Resource Locator|

---

## Linux man pages
# Bash Commands
* nc localhost 8080
* nc -lvvnp 8888
	> -l: Listen for incoming connections.
	>
	> -vv: Operate in highly verbose mode (because -v is repeated twice).
	>
	> -n: Use only numeric addresses (skip DNS lookups).
	>
	> -p: Bind to the specified local port.
* curl `http://localhost:8888`
* curl `http://localhost:8080` -vv --output -
* netstat -tulpn

# C Functions
* socket
* setsockopt
	- htons
* getaddrinfo
* bind
* listen
* accept
* read
* send
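
The calls listed above have close counterparts in Python's `socket` module, so the server side can be sketched without writing C. This is a minimal illustration, not a full web server: the names `RESPONSE`, `handle_connection`, and `serve_once` are hypothetical, the response is hard-coded, and port 8080 is simply borrowed from the `nc`/`curl` commands earlier in these notes.

```python
import socket

# A fixed response; the body is 19 bytes including the trailing newline.
RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 19\r\n"
    b"\r\n"
    b"Hi! I'm a message!\n"
)


def handle_connection(conn: socket.socket) -> bytes:
    """Read one request (up to 4096 bytes) and answer with RESPONSE."""
    request = conn.recv(4096)   # counterpart of read()
    conn.sendall(RESPONSE)      # counterpart of send()
    return request


def serve_once(host: str = "127.0.0.1", port: int = 8080) -> bytes:
    """Accept a single connection, answer it, and return the raw request."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:   # socket()
        # Allow quick restarts on the same port.
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)    # setsockopt()
        server.bind((host, port))                                        # bind()
        server.listen(1)                                                 # listen()
        conn, _addr = server.accept()                                    # accept()
        with conn:
            return handle_connection(conn)
```

Calling `serve_once()` in one terminal and running `curl http://localhost:8080 -vv` in another should exercise the whole sequence. Python converts the port to network byte order internally, which is why there is no explicit `htons` step here.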

# (_HTTP_) Hypertext Transfer Protocol
---
The Hypertext Transfer Protocol (_HTTP_) is the protocol programs use to communicate over the World Wide Web (WWW).
* Is an application layer protocol
* Two-way conversation between web browsers and web servers.
* Runs on top of a reliable data-transmission protocol (_TCP_)

**HTTP Request**

_Client:_ "Get me the document called _/index.html_"

**HTTP Response**

_Server:_ "Okay, here it is, it’s in HTML format and is 3,150 characters long"

## _HTTP_ Network Protocol Stack

|Protocol|Layer|
|:---:|:---:|
|_HTTP_|Application layer|
|_TCP_ |Transport layer  |
|_IP_  |Network layer    |
|_Network-specific link interface_|Data link layer|
|_Physical network hardware_|Physical layer|

# Web 
---
## Web Servers
* Web servers speak the HTTP protocol
* Store the Internet’s data and provide it when requested by HTTP clients
* Send a response result to client (_HTTP_ transaction)

## Web Resources (Content Source)
* Static File
* Software Programs (dynamic content)
* Web Gateway

## Web Client (e.g., Web Browser)
* Send request command to server (_HTTP_ transaction)
* Look at the associated MIME type to see if they know how to handle the object

## (_MIME_) Multipurpose Internet Mail Extensions (examples in Appendix D)
* HTML text document: text/html
* Plain ASCII text document: text/plain
* JPEG image: image/jpeg
* GIF image: image/gif
* Apple QuickTime movie: video/quicktime
* Microsoft PowerPoint: application/vnd.ms-powerpoint


## (_URI_) Uniform Resource Identifier
* Is the server resource name (like postal addresses)
* Come in two flavors, called _URLs_ and _URNs_

### (_URLs_) Uniform Resource Locators (```http://www.joes-hardware.com/specials/saw-blade.gif```)
* Describe the specific location of a resource on a particular server
* Are permitted to contain only characters from a relatively small, universally safe alphabet
* First part of the URL is called the _scheme_, and it describes the protocol used to access the resource. This is usually the _HTTP_ protocol (```http://```)
* Second part gives the server Internet address (e.g., ```www.joes-hardware.com```)
* Third part is the resource path on the web server (e.g., ```/specials/saw-blade.gif```)
* Today, almost every _URI_ is a _URL_

### (_URNs_) Uniform Resource Name
* Serves as a unique name for a particular piece of content, independent of where the resource currently resides
* Allow resources to move from place to place
* Allow resources to be accessed by multiple network access protocols while maintaining the same name
* **Note:** _URNs_ are still experimental and not yet widely adopted. To work effectively, they need a supporting infrastructure to resolve resource locations

# HTTP’s messages
---
## Transactions
* Each transaction consists of a request command (sent from the client to the server) and a response result (sent from the server back to the client)

## _HTTP_ Methods
* Tells the server what action to perform
* Every request message has a method

|**_HTTP_ method**| Description|
|:---------------:|:-----------|
|**GET**          |Send named resource from the server to the client|
|**PUT**          |Store data from client into a named server resource|
|**DELETE**       |Delete the named resource from a server|
|**POST**         |Send client data into a server gateway application|
|**HEAD**         |Send just the HTTP headers from the response for the named resource|

## Status Codes
* Every response message comes back with a status code
* Is a three-digit numeric code that tells the client if the request succeeded, or if other actions are required
* HTTP also sends an explanatory textual “reason phrase” with each numeric status code. The textual phrase is included only for
descriptive purposes; the numeric code is used for all processing

|**_HTTP_ status code**|**Description**|
|:--------------------:|:--------------|
|**200**|**OK** Document returned correctly|
|**302**|**Redirect** Go someplace else to get the resource|
|**404**|**Not Found** Can’t find this resource|
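
The split between the numeric code and the reason phrase can be sketched in a few lines. The function names `parse_status_line` and `succeeded` are hypothetical, and treating every 2xx code as success is a simplifying assumption.

```python
from http import HTTPStatus


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split 'HTTP/1.0 200 OK' into (version, code, reason phrase)."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""   # the phrase may be missing
    return version, code, reason


def succeeded(code: int) -> bool:
    """Decide on the numeric code alone; the reason phrase is ignored."""
    return 200 <= code < 300


version, code, reason = parse_status_line("HTTP/1.0 404 Not Found")
print(version, code, repr(reason), succeeded(code))

# The standard library's phrase for 302 is "Found", not "Redirect":
print(HTTPStatus(302).phrase)
```

That servers can word the phrase differently for the same code is exactly why clients should process only the number.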

## Web Pages
* An application often issues multiple HTTP transactions
* A web browser issues a cascade of _HTTP_ transactions to fetch and display a graphics-rich web page

## Messages
* Plain text, simple, line-oriented sequences of characters

**Request Message**
>**_Start line_**
>>```http
>>GET /test/hi-there.txt HTTP/1.0
>>```
>
>**_Headers_**
>>```http
>>Accept: text/*
>>Accept-Language: en,fr
>><!-- blank line -->
>>```

**Response message**
>**_Start line_**
>>```http
>>HTTP/1.0 200 OK
>>```
>
>**_Headers_**
>>```http
>>Content-type: text/plain
>>Content-length: 19
>><!-- blank line -->
>>```
>
>**_Body_**
>>```http
>>Hi! I’m a message!
>>```

**_Start line_**

Indicates what to do for a request or what happened for a response.

**_Header fields_**

Zero or more header fields follow the start line. Each header field consists of a
name and a value, separated by a colon (:) for easy parsing. The headers end
with a blank line.

**_Body_**

After the blank line is an optional message body containing any kind of data.
Request bodies carry data to the web server; response bodies carry data back to
the client. Unlike the start lines and headers, which are textual and structured,
the body can contain arbitrary binary data.
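
The three parts described above can be pulled apart with a short parser. This is a sketch under simplifying assumptions: lines end in CRLF, no header name repeats, and no header value is continued across lines. The function name `parse_message` is hypothetical; the sample bytes mirror the response message shown earlier.

```python
def parse_message(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw HTTP message into (start line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body   # the body stays as raw bytes


raw_response = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/plain\r\n"
    b"Content-length: 19\r\n"
    b"\r\n"
    b"Hi! I'm a message!\n"
)
start_line, headers, body = parse_message(raw_response)
print(start_line)
print(headers)
print(len(body), body)
```

Only the start line and headers are decoded as text; the body is left as bytes because it may carry binary data.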

# Connections
---
## (_TCP/IP_) Transmission Control Protocol / Internet Protocol
* Error-free data transportation
* In-order delivery (data will always arrive in the order in which it was sent)
* Unsegmented data stream (can dribble out data in any size at any time)

## _IP_ Addresses
* The hostname in a _URL_ is just a human-friendly alias for an _IP_ address
* Hostnames can easily be converted into _IP_ addresses through a facility called the Domain Name Service (_DNS_)
* When the port number is missing from an _HTTP URL_, you can assume the default value of port 80

**Basic browser connection process**
1. The browser extracts the server’s hostname from the _URL_
2. The browser converts the server’s hostname into the server’s _IP_ address
3. The browser extracts the port number (if any) from the _URL_
4. The browser establishes a _TCP_ connection with the web server
5. The browser sends an _HTTP_ request message to the server
6. The server sends an _HTTP_ response back to the browser
7. The connection is closed, and the browser displays the document
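
The steps above can be sketched with Python's `socket` and `urllib.parse` modules. This is a simplified client, not what a real browser does: it speaks HTTP/1.0, reads until the server closes the connection, and ignores redirects and errors. The function names `host_and_port`, `build_request`, and `fetch` are hypothetical, and the `Host` header is an addition that HTTP/1.0 does not require.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url: str) -> tuple[str, int]:
    """Steps 1 and 3: extract the hostname and the port (default 80)."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 80


def build_request(url: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request message for the URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path} HTTP/1.0\r\nHost: {parts.hostname}\r\n\r\n".encode("ascii")


def fetch(url: str, timeout: float = 5.0) -> bytes:
    """Steps 2 and 4 to 7: resolve, connect, send, and read the response."""
    host, port = host_and_port(url)
    # create_connection resolves the hostname (step 2) and opens TCP (step 4).
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(url))    # step 5: send the request
        chunks = []
        while chunk := conn.recv(4096):     # step 6: read until the server closes
            chunks.append(chunk)
    return b"".join(chunks)                 # step 7: connection is now closed
```

With a server listening locally, `fetch("http://localhost:8080/")` would return the raw response bytes, start line and headers included.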

# Architectural Components of the Web

* **Proxies:** HTTP intermediaries that sit between clients and servers
* **Caches:** HTTP storehouses that keep copies of popular web pages close to clients
* **Gateways:** Special web servers that connect to other applications
* **Tunnels:** Special proxies that blindly forward HTTP communications
* **Agents:** Semi-intelligent web clients that make automated HTTP requests



# URLs and Resources
---
## URL Syntax
```html
<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>

# examples:
http://www.joes-hardware.com:80/index.html
<scheme> http
<host>   www.joes-hardware.com
<port>   80
<path>   /index.html

http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true
<path>   /hammers
<param>  sale=false
<path>   /index.html
<param>  graphics=true

http://www.joes-hardware.com/inventory-check.cgi?item=12731&color=blue&size=large
<query>  item=12731&color=blue&size=large

http://www.joes-hardware.com/tools.html#drills
<frag>   #drills
```

|**Component**|**Description**|**Default value**|
|:------------|:--------------|:----------------|
|scheme       |Which protocol to use when accessing a server to get a resource (case-insensitive).|None|
|user         |The username some schemes require to access a resource.|anonymous|
|password     |The password that may be included after the username, separated by a colon (:).|`<Email_address>`|
|host         |The hostname or dotted _IP_ address of the server hosting the resource.|None|
|port         |The port number on which the server hosting the resource is listening. Many schemes have default port numbers (the default port number for _HTTP_ is 80).|Scheme-specific|
|path         |The local name for the resource on the server, separated from the previous _URL_ components by a slash (/). The syntax of the path component is server- and scheme-specific. A _URL’s_ path can be divided into segments, and each segment can have its own components specific to that segment.|None|
|params       |Used by some schemes to specify input parameters. Params are name/value pairs. A _URL_ can contain multiple params fields, separated from each other and from the rest of the path by semicolons (;).|None|
|query        |Used by some schemes to pass parameters to active applications (such as databases, bulletin boards, search engines, and other Internet gateways). There is no common format for the contents of the query component. It is separated from the rest of the _URL_ by the “?” character.|None|
|frag         |A name for a piece or part of the resource. The frag field is not passed to the server when referencing the object; it is used internally by the client. It is separated from the rest of the _URL_ by the “#” character.|None|
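
To see these components on real strings, Python's `urllib.parse.urlparse` can split the example URLs used in these notes. The helper name `components` and its dictionary keys are hypothetical, chosen to match the table's component names.

```python
from urllib.parse import parse_qs, urlparse


def components(url: str) -> dict:
    """Return the URL components named in the table above."""
    p = urlparse(url)
    return {
        "scheme": p.scheme,
        "user": p.username,
        "password": p.password,   # left percent-encoded by urlparse
        "host": p.hostname,
        "port": p.port,           # None when the URL has no explicit port
        "path": p.path,
        "params": p.params,
        "query": p.query,
        "frag": p.fragment,
    }


ftp = components("ftp://anonymous:joe%40joes-hardware.com@prep.ai.mit.edu:21/pub/gnu/")
cgi = components("http://www.joes-hardware.com/inventory-check.cgi?item=12731&color=blue&size=large")
frag = components("http://www.joes-hardware.com/tools.html#drills")

print(ftp)
print(parse_qs(cgi["query"]))   # query string split into name/value pairs
print(frag["frag"])
```

One caveat: `urlparse` only separates params attached to the last path segment, so per-segment params such as `/hammers;sale=false/` stay inside the returned path.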

## URL Shortcuts

### Relative _URLs_
* Two flavors: _absolute_ and _relative_
* With absolute you have all the information you need to access a resource
* With relative, the _URLs_ are incomplete. To get all the information needed to access a resource from a relative _URL_, you must interpret it relative to another _URL_, called its _base_

**Example of relative _URLs_**

_HTML_ document for the resource:
```
http://www.joes-hardware.com/tools.html
```

```html
<HTML>
<HEAD><TITLE>Joe's Tools</TITLE></HEAD>
<BODY>
<H1> Tools Page </H1>
<H2> Hammers </H2>
<P> Joe's Hardware Online has the largest selection of <A HREF="./hammers.html">hammers
</A> on earth.
</BODY>
</HTML>
```

Resolved against the base _URL_ of the document, the relative _URL_ `./hammers.html` becomes:
```
http://www.joes-hardware.com/hammers.html
```
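
The same resolution can be reproduced with `urllib.parse.urljoin`, which combines a base _URL_ with a relative reference. The base and `./hammers.html` come from the example above; the other relative references are extra illustrations built from URLs used elsewhere in these notes.

```python
from urllib.parse import urljoin

base = "http://www.joes-hardware.com/tools.html"

# Resolve several relative references against the same base URL.
for relative in ("./hammers.html", "hammers.html", "/specials/saw-blade.gif", "#drills"):
    print(f"{relative!r:28} -> {urljoin(base, relative)}")
```

A reference starting with `/` replaces the whole path of the base, while a bare fragment keeps the base path and only appends the fragment.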

### Base _URLs_
* The base _URL_ serves as a point of reference for the relative _URL_
    * Explicitly provided in the resource (`<BASE>` _HTML_ tag)
    * Base _URL_ of the encapsulating resource
    * No base _URL_ often means that you have an absolute _URL_; however, sometimes you may just have an incomplete or broken _URL_

### Resolving relative references (parsing or _decomposing_ the URL)

![image.png](attachment:64fa713a-d964-4fe2-91ca-fae72d985205.png)

## Expandomatic URLs

Some browsers try to expand URLs automatically, either after you submit the URL
or while you’re typing.
1. Hostname expansion
2. History expansion
   

## Shady Characters

### Encoding Mechanisms

### Character Restrictions

## A Sea of Schemes

|Scheme|Description|
|:----:|:----------|
|_http_|The Hypertext Transfer Protocol scheme conforms to the general URL format, except that there is no username or password. The port defaults to 80 if omitted.<br>Basic form:<br> `http://<host>:<port>/<path>?<query>#<frag>`<br> Examples:<br> `http://www.joes-hardware.com/index.html`<br> `http://www.joes-hardware.com:80/index.html`|
|_https_|The _https_ scheme is a twin to the _http_ scheme. The only difference is that the _https_ scheme uses Netscape’s Secure Sockets Layer (_SSL_), which provides end-to-end encryption of _HTTP_ connections. Its syntax is identical to that of _HTTP_, with a default port of 443.<br> Basic form:<br> `https://<host>:<port>/<path>?<query>#<frag>`<br> Example:<br> `https://www.joes-hardware.com/secure.html`|
|_mailto_|Mailto _URLs_ refer to email addresses. Because email behaves differently from other schemes (it does not refer to objects that can be accessed directly), the format of a mailto _URL_ differs from that of the standard _URL_. The syntax for Internet email addresses is documented in Internet **RFC 822**.<br> Basic form:<br> `mailto:<RFC-822-addr-spec>`<br> Example:<br> `mailto:joe@joes-hardware.com`|
|_ftp_|File Transfer Protocol URLs can be used to download and upload files on an FTP server and to obtain listings of the contents of a directory structure on an FTP server. FTP has been around since before the advent of the Web and URLs. Web applications have assimilated FTP as a data-access scheme. The URL syntax follows the general form.<br> Basic form:<br> `ftp://<user>:<password>@<host>:<port>/<path>;<params>`<br> Example:<br> `ftp://anonymous:joe%40joes-hardware.com@prep.ai.mit.edu:21/pub/gnu/`|
|_rtsp_, _rtspu_|RTSP URLs are identifiers for audio and video media resources that can be retrieved through the Real Time Streaming Protocol. The “u” in the rtspu scheme denotes that the UDP protocol is used to retrieve the resource.<br> Basic forms:<br> `rtsp://<user>:<password>@<host>:<port>/<path>`<br> `rtspu://<user>:<password>@<host>:<port>/<path>`<br> Example:<br> `rtsp://www.joes-hardware.com:554/interview/cto_video`|
|_file_|The file scheme denotes files directly accessible on a given host machine (by local disk, a network filesystem, or some other file-sharing system). The fields follow the general form. If the host is omitted, it defaults to the local host from which the URL is being used.<br> Basic form:<br> `file://<host>/<path>`<br> Example:<br> `file://OFFICE-FS/policies/casual-fridays.doc`|
|_news_|The news scheme is used to access specific articles or newsgroups, as defined by RFC 1036. It has the unusual property that a news URL in itself does not contain sufficient information to locate the resource. The news URL is missing information about where to acquire the resource: no hostname or machine name is supplied. It is the interpreting application’s job to acquire this information from the user. For example, in your Netscape browser, under the Options menu, you can specify your NNTP (news) server. This tells your browser what server to use when it has a news URL. News resources can be accessed from multiple servers. They are said to be location-independent, as they are not dependent on any one source for access. The “@” character is reserved within a news URL and is used to distinguish between news URLs that refer to newsgroups and news URLs that refer to specific news articles.<br> Basic forms:<br> `news:<newsgroup>`<br> `news:<news-article-id>`<br> Example:<br> `news:rec.arts.startrek`|


# HTTP Messages
---
## 

