## Sockets

Since TCP (and Python) gives us a reliable socket, what do we want to do with that socket?

- Application Protocols
    - Mail
    - World wide web

#### What is a Protocol and Why should we use it?

- A set of rules that all parties follow so we can predict each other's behaviour.

- And not bump into each other
    - On two-way roads in some countries, drive on the left-hand side of the road.
    - On two-way roads in other countries, drive on the right-hand side of the road.

- It helps us maintain a standard way to transfer data.
 
    
### HyperText Transfer Protocol

The HyperText Transfer Protocol (HTTP) is the set of rules that allows browsers to retrieve web documents from servers over the Internet.


- The dominant application layer protocol on the Internet.

- Invented for the Web to retrieve HTML, images, documents, etc.

- Extended to carry data in addition to documents: RSS, web services, etc.

- Basic concept
    - Make a connection
    - Request a document
    - Retrieve the document
    - Close the connection
        

Generally, the web browser is the one that deals with all of this. If you are building one, make sure it complies with the HTTP protocol so that it can talk to web servers.



### URLs

One of the things that HTTP standardized is the Uniform Resource Locator, or URL.

![](./images/URL.png)
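A URL breaks into a protocol, a host, and a document. As a minimal sketch, Python's standard `urllib.parse.urlsplit` can pull those parts out of the URL that these notes request later on; mapping `scheme`, `hostname`, and `path` onto "protocol", "host", and "document" is an assumption about how the parts are labeled.

```python
from urllib.parse import urlsplit

# The URL requested later in these notes.
url = 'http://data.pr4e.org/intro-short.txt'

parts = urlsplit(url)
print(parts.scheme)    # the protocol
print(parts.hostname)  # the host (web server) to connect to
print(parts.path)      # the document on that host
```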

#### Getting Data From the Server

Each time the user clicks on an anchor tag with an ``href=`` value to switch to a new page, the browser makes a connection to the web server and issues a "GET" request to get the content of the page at the specified URL.


The server returns the HTML document to the browser, which formats and displays the document to the user.

#### Request Response Cycle

![](./images/request-response.png)

### REQUEST

- From the HTTP RFC

![](./images/Request.png)



#### Making an HTTP Request

```
<GET/POST..>   <URL>  <PROTOCOL>
```


![](./images/making_a_http_request.png)
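The request format above can be sketched as a small helper. `build_request` is a hypothetical name used only for illustration; it joins the method, URL, and protocol with spaces and appends the blank line (`\r\n\r\n`) that ends the request.

```python
def build_request(method, url, protocol='HTTP/1.0'):
    """Return the request as bytes: the request line plus the blank line that ends it."""
    request_line = f'{method} {url} {protocol}'
    return (request_line + '\r\n\r\n').encode()


# Build the same GET request that is sent over a socket later in these notes.
cmd = build_request('GET', 'http://data.pr4e.org/intro-short.txt')
print(cmd)
```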


#### Using Telnet we can do the same thing.

##### Hacking Networks

Make connections, send stuff on those connections 

Reference: [Matrix movie](http://nmap.org/movies.html)

### HTTP Request in Python

```python
import socket

# mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# mysock.connect(('127.0.0.1', 8888))
```

The code above (once uncommented) creates a socket and connects it, as the diagram further down illustrates.

```python
# Importing the socket library gives us the tools to create a porthole;
# the socket itself is created with socket.socket() in the next step.
import socket

# A socket is like a doorway out of your computer,
# but at first the doorway is not open and nothing is connected to it yet.
```
- A socket is an object (i.e. an instance of a class).
- The cloud in the image represents the website.


```python
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('127.0.0.1', 8888))
# The socket object is now connected, so it can send and receive.
```

![](./images/socket.png)

```python
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/intro-short.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')

mysock.close()
```

```
HTTP/1.1 200 OK
Date: Sat, 04 Jul 2020 12:18:03 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "1d3-54f6609240717"
Accept-Ranges: bytes
Content-Length: 467
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain

Why should you learn to write programs?

Writing programs (or programming) is a very creative
and rewarding activity.  You can write programs for
many reasons, ranging from making your living to solving
a difficult data analysis problem to having fun to helping
someone else solve a problem.  This book assumes that
everyone needs to know how to program, and that once
you know how to program you will figure out what you want
to do with your newfound skills.
```
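The response above has two parts: a status line and headers, then a blank line, then the document itself. The following is a rough sketch of separating them. The `raw` bytes are a shortened, hand-typed copy of the response (only two headers and the first line of the body are kept), so treat them as illustrative rather than as the exact bytes the server sent.

```python
# A shortened copy of the response shown above (illustrative only).
raw = (b'HTTP/1.1 200 OK\r\n'
       b'Connection: close\r\n'
       b'Content-Type: text/plain\r\n'
       b'\r\n'
       b'Why should you learn to write programs?\n')

# Headers and body are separated by the first blank line (\r\n\r\n).
head, _, body = raw.partition(b'\r\n\r\n')

# The first line of the head is the status line; the rest are "Name: value" headers.
status_line, *header_lines = head.decode().split('\r\n')

headers = {}
for line in header_lines:
    name, _, value = line.partition(':')
    headers[name.strip()] = value.strip()

print(status_line)
print(headers['Content-Type'])
print(body.decode(), end='')
```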


```python
.encode()
```
We encode the request because strings inside Python are Unicode, and we have to send them out as bytes, in an encoding called UTF-8. `encode()` converts from Python's internal Unicode representation to UTF-8 bytes.

### Unicode

Since computers only understand numbers, we have to find a way to map each letter, symbol, and digit to a standard number that the computer understands.

One of the earliest such "encodings" was ASCII.


### Representing Simple Strings

- Each character is represented by a number between 0 and 255, stored in 8 bits of memory.

- We refer to 8 bits of memory as a "byte" of memory (e.g. my disk drive contains 3 terabytes of storage).

- The `ord()` function tells us the numeric value of a simple ASCII character.

```python
print(str(ord('H'))+' , new line:'+str(ord('\n')))
```

```
72 , new line:10
```
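As a small sketch of why one byte covers exactly the values 0 through 255: 8 bits give `2 ** 8` distinct values, and Python's `bytes` type refuses anything outside that range. The variable names here are only for illustration.

```python
# 8 bits can hold 2**8 distinct values, numbered 0 through 255.
values_per_byte = 2 ** 8
print(values_per_byte)

# chr() is the inverse of ord(): it turns a number back into a character.
print(chr(72), ord('H'))

# A bytes object is a sequence of such small numbers.
greeting = bytes([72, 105])
print(greeting)

# A value that does not fit in 8 bits is rejected.
try:
    bytes([256])
    fits_in_a_byte = True
except ValueError as err:
    fits_in_a_byte = False
    print('ValueError:', err)
```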


![](./images/kinds-of-string.png)

### Multi-Byte Characters

A byte is 8 bits.


To represent the wide range of characters computers must handle, we represent characters with more than one byte.

- UTF-16 - Variable length - Takes two bytes for most characters and four bytes for the rest.
- UTF-32 - Fixed length - Takes four bytes to represent each character.
- UTF-8 - Variable length - Takes one to four bytes per character; the leading bits of each byte mark where a character starts.
    - Upwards compatible with ASCII
    - Automatic detection between ASCII and UTF-8

UTF-8 is recommended practice for encoding data to be exchanged between systems.
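To make the size differences concrete, here is a short sketch that counts the bytes each encoding needs for a few sample characters. The sample characters are arbitrary choices, and the `-le` codec names are used so that Python does not add a byte-order mark to the count. Note that a character outside the Basic Multilingual Plane, such as an emoji, needs four bytes even in UTF-16.

```python
# Sample characters, written as escapes so the source stays ASCII-only.
samples = {
    'A': 'Latin capital A',
    '\u00e9': 'e with acute accent',
    '\u20ac': 'euro sign',
    '\U0001F600': 'grinning face emoji',
}

# The "-le" variants fix the byte order, so no byte-order mark is counted.
encodings = ('utf-8', 'utf-16-le', 'utf-32-le')

sizes = {}
for ch, name in samples.items():
    sizes[ch] = tuple(len(ch.encode(enc)) for enc in encodings)
    print(f'U+{ord(ch):04X} {name}: {sizes[ch]}')

# UTF-8 is upwards compatible with ASCII: plain ASCII text gives identical bytes.
same_as_ascii = 'Hello'.encode('utf-8') == 'Hello'.encode('ascii')
print(same_as_ascii)
```

Each tuple lists the byte counts in the order UTF-8, UTF-16, UTF-32.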

### Why?

Suppose you have a 16 GB USB stick. That means there are about 16 billion bytes of storage on it, so it can hold 16 billion ASCII characters.

If UTF-32 is used to encode the characters, it can hold only 4 billion characters.

With UTF-16, it can hold at most 8 billion characters.

But with UTF-8, which uses one to four bytes depending on the character, it can hold somewhere between 4 and 16 billion characters, depending on the characters we are storing.
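The arithmetic behind these figures is just the capacity divided by the bytes per character. A quick sketch, assuming 1 GB means one billion bytes and using two bytes per character as the best case for UTF-16:

```python
capacity_bytes = 16 * 10**9  # a 16 GB stick, taking 1 GB as one billion bytes

ascii_chars = capacity_bytes // 1      # one byte per character
utf32_chars = capacity_bytes // 4      # always four bytes per character
utf16_max_chars = capacity_bytes // 2  # best case: two bytes per character
utf8_min_chars = capacity_bytes // 4   # worst case: four bytes per character
utf8_max_chars = capacity_bytes // 1   # best case: one byte per character

print(f'ASCII : {ascii_chars:,}')
print(f'UTF-32: {utf32_chars:,}')
print(f'UTF-16: up to {utf16_max_chars:,}')
print(f'UTF-8 : between {utf8_min_chars:,} and {utf8_max_chars:,}')
```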

### Byte string vs Unicode in Python

The following picture shows how a byte string is represented in Python 2 and Python 3.

![](./images/byte.png)


However, inside Python 3 every string is Unicode.

#### Byte String

A byte string is raw and undecoded: it might be UTF-8, it might be UTF-16, it might be ASCII.
From the bytes alone, we don't know which encoding was used.


In Python 3, all strings are internally Unicode.

Working with string variables in Python programs and reading data from files usually "just works".

When we talk to a network resource using sockets, or talk to a database, we have to encode and decode data (usually as UTF-8).
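The difference between the two types can be seen with a short sketch. The sample word is an arbitrary choice; it contains one non-ASCII character so that the character count and the byte count differ.

```python
text = 'caf\u00e9'   # a str of four characters; the last one is e with an acute accent
raw = text.encode()  # a bytes object, UTF-8 by default

print(type(text).__name__, len(text))  # counts characters
print(type(raw).__name__, len(raw))    # counts bytes
print(raw)

# Decoding with the same encoding gives back the original string.
restored = raw.decode()
print(restored == text)
```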


### Decoding

We have to know which encoding the data we are pulling in uses; most websites use UTF-8.


### Python Strings to Bytes

- When we talk to an external resource like a network socket we send bytes, so we need to encode Python 3 strings into a given character encoding.

- When we read data from an external resource, we must decode it based on its character set so that it is properly represented in Python 3 as a string.


```python
while True:
    data = mysock.recv(512)  # Here type(data) is bytes
    if len(data) < 1:
        break
    mystring = data.decode()
    print(mystring)

```

`decode()` assumes UTF-8 by default, but we can pass a different encoding as an argument.
```
.decode()  # -> Goes from bytes (data) to unicode (mystring)
```

Similarly, `encode()` takes a string and turns it into bytes, using UTF-8 by default.


![](./images/encode-decode.png)
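One caveat with decoding inside the receive loop: a multi-byte UTF-8 character can be split across two `recv()` chunks, and decoding half a character fails. The sketch below is one possible way around this, not the only one: it collects all the bytes first and decodes once at the end. The `chunks` list is a hypothetical stand-in for successive `mysock.recv(512)` results, so the example runs without a network connection.

```python
# Hypothetical chunks standing in for successive mysock.recv() results.
# The two bytes of the accented e (0xC3 0xA9) are split across the chunk boundary.
chunks = [b'caf\xc3', b'\xa9 au lait', b'']

# Decoding the first chunk on its own fails: it ends in the middle of a character.
try:
    chunks[0].decode()
    first_chunk_decodes = True
except UnicodeDecodeError:
    first_chunk_decodes = False

# Collect the raw bytes first, then decode once when all the data has arrived.
received = bytearray()
for data in chunks:
    if len(data) < 1:
        break
    received.extend(data)

mystring = received.decode()
print(first_chunk_decodes)
print(mystring)
```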


#### Sending and Receiving data


![](./images/send-recieve.png)



We assume that the data we receive from the network is encoded in UTF-8 (if we do not pass an encoding argument to our `encode()` and `decode()` calls).
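When that assumption does not hold, the encoding can be named explicitly. A brief sketch, using Latin-1 as an arbitrary example of a non-UTF-8 encoding, shows what happens when sender and receiver disagree:

```python
text = 'caf\u00e9'

utf8_bytes = text.encode('utf-8')      # same result as text.encode()
latin1_bytes = text.encode('latin-1')  # a different byte sequence for the same text
print(utf8_bytes)
print(latin1_bytes)

# Decoding with the encoding that was used to encode gives the text back.
print(latin1_bytes.decode('latin-1') == text)

# Decoding UTF-8 bytes as Latin-1 does not fail, but produces the wrong characters.
wrong = utf8_bytes.decode('latin-1')
print(ascii(wrong))

# Decoding Latin-1 bytes with the UTF-8 default raises an error.
try:
    latin1_bytes.decode()
    default_decode_ok = True
except UnicodeDecodeError:
    default_decode_ok = False
print(default_decode_ok)
```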
