## Fetching an object from a web server using a socket
### Non-persistent HTTP case
The HTTP client builds a GET request message that includes a ```Connection: close``` header, asking the HTTP server (a part of the web server) to close the connection after it sends the response message.

In [8]:
from urllib.parse import urlparse    # urlparse is defined in urllib.parse
import socket

template = "GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
url = "http://mclab.hufs.ac.kr/test/index.html"
r = urlparse(url)
host, port = r.hostname, r.port if r.port else 80
path = r.path + '?' + r.query if r.query else r.path
request = template.format(path=path, host=host)
print(request.encode())

b'GET /test/index.html HTTP/1.1\r\nHost: mclab.hufs.ac.kr\r\nConnection: close\r\n\r\n'
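
The same steps can be wrapped in a small helper. This is only a sketch: the name `build_get_request` is hypothetical and not part of the original notebook. It assumes that a URL without a path should be requested as `/`, and that the `Host` header should carry the port only when the URL specifies one.

```python
from urllib.parse import urlparse

def build_get_request(url):
    """Return (host, port, request_bytes) for a non-persistent GET request."""
    r = urlparse(url)
    host = r.hostname
    port = r.port or 80                  # default HTTP port
    path = r.path or "/"                 # an empty path is requested as "/"
    if r.query:
        path += "?" + r.query
    # Include the port in the Host header only if the URL gave one explicitly.
    host_header = host if r.port is None else f"{host}:{r.port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"          # ask the server to close after responding
        "\r\n"                           # blank line ends the header block
    )
    return host, port, request.encode()

host, port, request = build_get_request("http://mclab.hufs.ac.kr/test/index.html")
print(request)
```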


Send the request message.

Call ```recv``` repeatedly to receive the response message from the server.
The end of the message is detected when ```recv``` returns an empty bytes object (```b''```), which marks that the server has closed the connection.

In [9]:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
sock.sendall(request.encode('utf-8'))    # send() may transmit only part of the data

chunks = []
while True:
    chunk = sock.recv(4096)
    if chunk == b'': break       # server closes
    chunks.append(chunk)
sock.close()

response = b''.join(chunks)
print(response)

b'HTTP/1.1 200 OK\r\nDate: Fri, 10 May 2019 05:23:49 GMT\r\nServer: Apache/2.2.22 (Ubuntu)\r\nLast-Modified: Tue, 16 Oct 2018 06:13:58 GMT\r\nETag: "1e982f-51e-578527589c88c"\r\nAccept-Ranges: bytes\r\nContent-Length: 1310\r\nVary: Accept-Encoding\r\nConnection: close\r\nContent-Type: text/html\r\n\r\n<html>\n<head>\n<title>Test Page</title>\n<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n</head>\n\n<body>\n<h1>Information and Communications Engineering</h1>\n<p><img src="http://ice.hufs.ac.kr/hufs-image01.jpg" border="0"></p>\n<p>Welcome to Dept. of Information and Communications Engineering</p>\n<p></p>\n<p>\xed\x95\x9c\xea\xb5\xad\xec\x99\xb8\xea\xb5\xad\xec\x96\xb4\xeb\x8c\x80\xed\x95\x99\xea\xb5\x90 \xec\xa0\x95\xeb\xb3\xb4\xed\x86\xb5\xec\x8b\xa0\xea\xb3\xb5\xed\x95\x99\xea\xb3\xbc</p>\n\n<h2>Blue Sky</h2>\n<h3>SKY 2</h3>\n<p><img src="s3test2.gif" border="0"></p>\n\n<h3>SKY 3</h3>\n<p><img src="s3test3.jpg" border="0"></p>\n<h3>SKY 4</h3>\n<p><img src="s3te
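
The receive-until-close loop above can be tried without a network connection. The sketch below uses `socket.socketpair()` as a stand-in for a real server; the helper name `recv_until_close` and the sample response bytes are assumptions made for illustration only.

```python
import socket

def recv_until_close(sock, bufsize=4096):
    """Collect bytes from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:            # b'' means the peer closed its side
            break
        chunks.append(chunk)
    return b"".join(chunks)

# Stand-in for a real server: a connected pair of local sockets.
server_side, client_side = socket.socketpair()
server_side.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello")
server_side.close()              # the "server" closes after responding

# A small buffer forces several recv() calls before the empty read.
data = recv_until_close(client_side, bufsize=8)
client_side.close()
print(data)
```

A single `recv` call returns at most `bufsize` bytes and may return fewer, so the loop is needed even when the whole response would fit in the buffer.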

> The standard line terminator on the Internet is `b'\r\n'`. However, many web servers also accept a bare `b'\n'`.
>
> A response message consists of:
> - status line: `HTTP/1.1 200 OK` (200 means the request was processed successfully)
> - header lines
> - blank line: the header block ends with `b'\r\n\r\n'`
>
> followed by the actual downloaded web contents.
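
The structure described above can be checked on a small hand-written response. This is a minimal sketch; the bytes in `sample` are made up for illustration and are not output from the server.

```python
# A made-up response with the three parts: status line, header lines, contents.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

# Everything before the first blank line is the head; the rest is the contents.
head, sep, body = sample.partition(b"\r\n\r\n")

# The first line of the head is the status line; the others are header lines.
status_line, *header_lines = head.split(b"\r\n")
version, code, reason = status_line.decode().split(" ", 2)

print(version, code, reason)
print(header_lines)
print(body)
```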

To get the contents, take everything from the byte following `b'\r\n\r\n'` to the end.

In [10]:
eoh = response.find(b'\r\n\r\n')
headers, contents = response[:eoh], response[eoh+4:]
print(contents.decode())

<html>
<head>
<title>Test Page</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>

<body>
<h1>Information and Communications Engineering</h1>
<p><img src="http://ice.hufs.ac.kr/hufs-image01.jpg" border="0"></p>
<p>Welcome to Dept. of Information and Communications Engineering</p>
<p></p>
<p>한국외국어대학교 정보통신공학과</p>

<h2>Blue Sky</h2>
<h3>SKY 2</h3>
<p><img src="s3test2.gif" border="0"></p>

<h3>SKY 3</h3>
<p><img src="s3test3.jpg" border="0"></p>
<h3>SKY 4</h3>
<p><img src="s3test4.jpg"  height="100" width="100" border="0"></p>
<h3>SKY 5</h3>
<p><img src="s3test5.jpg" height="100" width="100"></p>

<h3>TCP/IP Protocol Suits</h3>
<p><img src="tcp_ip.png" height="300" width="500"></p>
<h3>HTTP Protocol</h3>
<p>HTTP Request/Response Messages</p>
<p><img src="HTTP_RequestResponseMessages.png"></p>
<h3>Web Server Architecture</h3>
<p>Single Threaded Web Server</p>
<p><img src="single_threaded_web_server.gif" height="400" width="500"></p>
<p>Thread Pool Web Serve

### Rewriting the code above with a file-like stream
To separate the contents from the response message or to extract the headers, it is convenient to treat the socket like a file stream, with methods such as ```readline()```. The ```makefile()``` method returns a file object associated with the socket. It buffers internally and lets you use the usual file methods.
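
How a socket behaves once it is wrapped by `makefile()` can be tried without a network connection. The sketch below again uses `socket.socketpair()` as a stand-in for a server, with made-up response bytes; it is an illustration only.

```python
import socket

# Stand-in for a real server: a connected pair of local sockets.
server_side, client_side = socket.socketpair()
server_side.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello")
server_side.close()

# Both the file object and the socket are closed when the block exits.
with client_side, client_side.makefile("rb") as infile:
    status_line = infile.readline()    # reads up to and including b'\r\n'
    header_line = infile.readline()
    blank_line = infile.readline()     # b'\r\n' marks the end of the headers
    body = infile.read(5)              # read exactly Content-Length bytes

print(status_line, header_line, blank_line, body)
```

Because the file object buffers data it has already read from the socket, it is safer not to mix `infile.readline()` with direct `recv()` calls on the same socket.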

In [11]:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
infile = sock.makefile('rb')    # convert incoming socket to file-like object

sock.sendall(request.encode())
print(request.encode())

# status line
status = infile.readline().decode().split()[1]
print(status)

# extract headers, representing them as a dict
def parse_headers(file):
    headers = {}
    for line in file:
        if line == b'\r\n':    # end of headers
            break
        header = line.decode().strip()    # remove leading and trailing white spaces
        key, value = header.split(':', maxsplit=1)
        headers[key] = value.strip()
    return headers

headers = parse_headers(infile)
print(headers)

# Now, we are on the contents
# Read until server close.
# contents = infile.read()

# Or, read 'Content-Length' bytes.
contents = infile.read(int(headers['Content-Length']))
print(contents.decode())
infile.close()    # close the file object as well as the socket
sock.close()

b'GET /test/index.html HTTP/1.1\r\nHost: mclab.hufs.ac.kr\r\nConnection: close\r\n\r\n'
200
{'Date': 'Fri, 10 May 2019 05:23:49 GMT', 'Server': 'Apache/2.2.22 (Ubuntu)', 'Last-Modified': 'Tue, 16 Oct 2018 06:13:58 GMT', 'ETag': '"1e982f-51e-578527589c88c"', 'Accept-Ranges': 'bytes', 'Content-Length': '1310', 'Vary': 'Accept-Encoding', 'Connection': 'close', 'Content-Type': 'text/html'}
<html>
<head>
<title>Test Page</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>

<body>
<h1>Information and Communications Engineering</h1>
<p><img src="http://ice.hufs.ac.kr/hufs-image01.jpg" border="0"></p>
<p>Welcome to Dept. of Information and Communications Engineering</p>
<p></p>
<p>한국외국어대학교 정보통신공학과</p>

<h2>Blue Sky</h2>
<h3>SKY 2</h3>
<p><img src="s3test2.gif" border="0"></p>

<h3>SKY 3</h3>
<p><img src="s3test3.jpg" border="0"></p>
<h3>SKY 4</h3>
<p><img src="s3test4.jpg"  height="100" width="100" border="0"></p>
<h3>SKY 5</h3>
<p><img src="s3test5.jpg" height=