# Networks and sockets

This is a short introduction to networks and sockets. If we want to learn more, we can take Dr. Chuck's course "Introduction to networking".

## Transmission Control Protocol (TCP)

* Built on top of IP (Internet Protocol).
* Assumes IP might lose some data - stores and retransmits data if it seems to be lost.
* Handles "flow control" using a transmit window.
* Provides a nice reliable pipe.

### TCP connections / Sockets

"An internet **socket** or network socket is an endpoint of a bidirectional inter-process communication flow across an Internet Protocol-based computer network, such as the *Internet*." 

### TCP port numbers

* A port is an application-specific or process-specific software communications endpoint. 
* It allows multiple networked applications to coexist on the same server.
* There is a list of well-known TCP port numbers.
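As a rough illustration of how ports let several applications share one host, the sketch below opens two listening TCP sockets on the loopback address `127.0.0.1` and asks the operating system to pick a free port for each (binding to port `0` means "any free port"). The names `open_listener`, `app_a` and `app_b` are made up for this example.

```python
import socket

def open_listener():
    """Return a TCP socket listening on a free port of the local machine."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    return listener

# Two independent "applications" on the same host
app_a = open_listener()
app_b = open_listener()

# getsockname() returns (address, port) for an AF_INET socket
port_a = app_a.getsockname()[1]
port_b = app_b.getsockname()[1]
print("app_a listens on port", port_a)
print("app_b listens on port", port_b)

app_a.close()
app_b.close()
```

Both sockets use the same IP address, and the port number is what tells them apart.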

## Sockets in Python

Python has built-in support for TCP sockets, because Python is great.

We will usually create a socket and communicate through it as follows:

```
import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
```

Here the second line basically says "create a socket". The parameter `socket.AF_INET` says "I'm going to make an internet socket" and `socket.SOCK_STREAM` says "I'm going to make a stream socket" (a stream socket means that I send data and get data back as a sequence of bytes, delivered in the order it was sent). We probably won't change these parameters for the purposes of this course. The third line says "please establish a connection between me and the host `www.py4inf.com`, on port 80". If there is a web server listening on that port, we can then send data back and forth.
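To try the "send data back and forth" part without depending on a remote host, here is a minimal sketch that plays both roles on the local machine. The listening `server` socket, the loopback address `127.0.0.1` and the uppercase reply are assumptions of this example, not part of the course code.

```python
import socket

# A throwaway "server" on the local machine, so no internet access is needed
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))       # port 0: let the OS choose a free port
server.listen(1)
host, port = server.getsockname()

# The client side is created and connected exactly like mysock above
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((host, port))

# The server accepts the pending connection and gets its own socket for it
conn, client_address = server.accept()

mysock.sendall(b"hello")            # client -> server
received = conn.recv(1024)          # server reads the bytes
conn.sendall(received.upper())      # server -> client
reply = mysock.recv(1024)           # client reads the answer
print(reply)

for sock in (mysock, conn, server):
    sock.close()
```

Note that sockets carry bytes, which is why the message is written as `b"hello"` rather than a plain string.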

# From sockets to applications

To get a visual idea of the stack connection, go to this [link](https://en.wikipedia.org/wiki/Internet_protocol_suite) and look at the image in 
the section called *abstraction layers.*

## Application Protocol

* TCP (and Python) gives us a reliable **socket**, but what do we want to do with it?

    We are not just interested in creating sockets; we want to do something more interesting, like *browsing* pages on a web server.

* Application Protocols

    - Mail
    - World Wide Web
    
### HTTP - Hypertext Transfer Protocol

*Definition*: A **Protocol** is a set of rules that all parties follow so we can predict each other's behaviour and basically not bump into each other.

*Definition:* The **H**yper**T**ext **T**ransfer **P**rotocol is a set of rules that allows browsers to retrieve web documents from servers over the internet.

* It's the dominant application layer protocol on the internet.
* Invented for the Web to retrieve HTML, images, documents, etc.
* Extended to carry data in addition to documents - Web services, etc.

**Basic concept:** 

Make the connection - Request a document - Retrieve the document - Close the connection.


## The uniform resource locator (URL)

If we take a look at a typical **URL** we can identify three components:

<span style="color:green"> http:// </span> <span style="color:blue"> www.dr-chuck.com </span> <span style="color:red"> /page1.htm </span> 

<span style="color:green"> protocol </span> <span style="color:blue"> host </span> <span style="color:red"> document </span> 

So we are going to connect to the <span style="color:blue"> host </span> and use the <span style="color:green"> protocol </span> to request the <span style="color:red"> document </span>.
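The same three components can be pulled out of a URL in code. This is a small sketch that assumes Python 3, where the standard library provides `urllib.parse.urlsplit`; the variable names `protocol`, `host` and `document` simply mirror the labels used above.

```python
from urllib.parse import urlsplit

url = "http://www.dr-chuck.com/page1.htm"
parts = urlsplit(url)

protocol = parts.scheme   # the part before "://"
host = parts.netloc       # the server to connect to
document = parts.path     # the document to request from that server

print(protocol, host, document)
```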

## Request - response cycle

### Getting data from the server

* Each time the user clicks on an anchor tag with an **href=** value to switch to a new page, the browser makes a connection to the web server and issues a **"GET"** request - to **GET** the content of the page at the specified **URL**.

* The server returns the **HTML** document to the browser, which formats and displays the document to the user.
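The cycle described above can be reproduced on a single machine. The sketch below is only an illustration, assuming Python 3: it starts a tiny web server in a background thread with the standard `http.server` module, then plays the browser's part with `http.client`. The handler class `PageHandler` and the HTML it returns are invented for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: answer every GET with a small HTML page."""

    def do_GET(self):
        body = b"<html><body><h1>You asked for " + self.path.encode() + b"</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output free of server log lines

# Start the server on a free local port
server = HTTPServer(("127.0.0.1", 0), PageHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Browser" side: connect, issue a GET request, read the response
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/page1.htm")
response = conn.getresponse()
status = response.status
html = response.read().decode()
conn.close()

server.shutdown()
server.server_close()

print(status)
print(html)
```

A real browser would go one step further and render the returned HTML instead of printing it.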


## How to do an HTTP request in Python

In [2]:
import socket 

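# Open a socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Connect the socket to the web server on port 80
mysock.connect(('www.py4inf.com', 80))
# Send a request for the data I want. Sockets send bytes, so the text is encoded;
# HTTP lines end with \r\n and an empty line ends the request.
mysock.send('GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\r\n\r\n'.encode())

# Receive the data and do something with it.
while True:
    data = mysock.recv(512)  # receive at most 512 bytes at a time
    if len(data) < 1:        # an empty result means the server closed the connection
        break
    print(data.decode())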

mysock.close()

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 167
Connection: close
Date: Wed, 12 Oct 2016 16:21:37 GMT
Server: Apache
Last-Modified: Fri, 04 Dec 2015 19:05:04 GMT
ETag: "a7-526172f5b5d89"
Accept-Ranges: bytes
Cache-Control: max-age=604800, public
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: origin, x-requested-with, content-type
Access-Control-Allow-Methods: GET

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and
 kill the envious moon
Who is already sick and pale with grief



The code above will give us back an HTTP header, a blank line, and then the document we requested. Even though the previous code is easy, we can make things even easier and shorter by using the **urllib** Python library.
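Because the header and the document are separated by a blank line, the two parts can be split apart by hand. The sketch below works on a shortened, hand-written copy of the response shown above (the `raw_response` bytes are an assumption of this example, not captured output) and relies on HTTP ending header lines with `\r\n`.

```python
# A shortened, hand-written stand-in for the bytes received from the server
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"But soft what light through yonder window breaks\n"
    b"It is the east and Juliet is the sun\n"
)

# The first blank line (\r\n\r\n) separates the header from the document
header_bytes, _, body_bytes = raw_response.partition(b"\r\n\r\n")

header_lines = header_bytes.decode().split("\r\n")
status_line = header_lines[0]

# Every remaining header line has the form "Name: value"
headers = {}
for line in header_lines[1:]:
    name, _, value = line.partition(":")
    headers[name.strip()] = value.strip()

body = body_bytes.decode()

print(status_line)
print(headers)
print(body)
```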

Since HTTP is so common, we have a library that does all the socket work for us and makes web pages look like a file. 

The same request using **urllib** looks like this:

In [3]:
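import urllib.request

# urlopen returns a file-like object; in Python 3 each line arrives as bytes
fhand = urllib.request.urlopen('http://www.py4inf.com/code/romeo.txt')

for line in fhand:
    print(line.decode().strip())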

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief


Yes! Only 4 lines. Notice that the output doesn't include the HTTP header, because Python assumes you might not need it. If you need the header, there are ways to tell `urllib` you want it.
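One way to get at the header, sketched here under the assumption of Python 3's `urllib.request`, is to read the `status` and `headers` attributes of the object returned when the URL is opened. To keep the example runnable offline it talks to a small local server instead of `www.py4inf.com`; the handler class `TextHandler` and the one-line document it serves are made up for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class TextHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for the remote web server."""

    def do_GET(self):
        body = b"But soft what light through yonder window breaks\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # no server log lines in the output

server = HTTPServer(("127.0.0.1", 0), TextHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/romeo.txt" % server.server_address[1]

# An opener with an empty ProxyHandler ignores any system proxy settings,
# which matters only because this example connects to the local machine.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url, timeout=5) as fhand:
    status = fhand.status                         # numeric status code
    content_type = fhand.headers["Content-Type"]  # one header field
    lines = [line.decode().strip() for line in fhand]

server.shutdown()
server.server_close()

print(status, content_type)
print(lines)
```

For an ordinary remote URL, `urllib.request.urlopen(url)` returns the same kind of object, so the `status` and `headers` attributes are available there as well.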

The great thing about **urllib** is that it makes web pages look like files, so we can do all the things we used to do with files in the previous courses. For example:


In [4]:
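import urllib.request

fhand = urllib.request.urlopen('http://www.py4inf.com/code/romeo.txt')

# Count how many times each word appears in the document
counts = dict()
for line in fhand:
    words = line.decode().split()  # decode the bytes, then split on whitespace
    for word in words:
        counts[word] = counts.get(word, 0) + 1

print(counts)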

{'and': 3, 'envious': 1, 'already': 1, 'fair': 1, 'is': 3, 'through': 1, 'pale': 1, 'yonder': 1, 'what': 1, 'sun': 2, 'Who': 1, 'But': 1, 'moon': 1, 'window': 1, 'sick': 1, 'east': 1, 'breaks': 1, 'grief': 1, 'with': 1, 'light': 1, 'It': 1, 'Arise': 1, 'kill': 1, 'the': 3, 'soft': 1, 'Juliet': 1}
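The same word count can also be written with `collections.Counter` from the standard library. This is just an alternative sketch: to keep it runnable without a network connection, the four lines of the document are pasted into the string `text` rather than downloaded.

```python
from collections import Counter

# The document from the example above, stored locally for this sketch
text = """But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief"""

counts = Counter()
for line in text.splitlines():
    counts.update(line.split())  # add one for every word on the line

print(counts["the"], counts["sun"], counts["Juliet"])
```

A `Counter` behaves like the dictionary built by hand above, so `counts.get(word, 0) + 1` is no longer needed.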

