# First Socket Programming

## Bytes and Bytearray Objects
`bytes`: immutable sequences of single bytes

`bytearray`: mutable counterpart to `bytes` objects
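
The following sketch shows the difference using illustrative values (`data` and `buf` are only example names). Indexing either type returns an `int`, but only a `bytearray` allows item assignment and in-place growth.

```python
data = bytes([104, 105])        # b'hi' built from integer byte values
print(data, data[0])            # indexing a bytes object yields an int (104)

try:
    data[0] = 72                # bytes is immutable: item assignment fails
except TypeError as exc:
    print("TypeError:", exc)

buf = bytearray(data)           # mutable copy of the same bytes
buf[0] = 72                     # in-place change is allowed
buf.extend(b'!')                # append more bytes in place
print(buf, bytes(buf))
```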

In [1]:
```python
a = 'hello, world'
b = bytes(a, encoding='utf-8')
print(b)
print(len(a), len(b))
a == b    # str vs bytes: never equal in Python 3
```

```
b'hello, world'
12 12

False
```

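> UTF-8 encodes English letters (7-bit ASCII, U+0000 to U+007F) unchanged, as one byte per character. Characters beyond 7-bit ASCII, such as the upper half of 8-bit extended ASCII (Latin-1) or Korean Hangul, take 2 to 4 bytes each: Latin-1 letters take 2 bytes and each Hangul syllable takes 3. That is why the 8-character string below encodes to 20 bytes. (Legacy Korean encodings such as wansung/EUC-KR and MS949 use 2 bytes per Hangul syllable instead.)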
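
To check these sizes, the sketch below encodes a few sample characters (chosen only for illustration) and prints each code point with its UTF-8 length. It then encodes one Hangul syllable with CP949, Python's codec for MS949 (`'ms949'` is an accepted alias).

```python
# Sample characters: 'A', 'é' (U+00E9), '한' (U+D55C), and an emoji (U+1F600)
samples = ['A', '\u00e9', '\ud55c', '\U0001F600']
for ch in samples:
    encoded = ch.encode('utf-8')
    print(repr(ch), f'U+{ord(ch):04X}', len(encoded), encoded)

# The same Hangul syllable in CP949, a legacy 2-byte Korean encoding
print(len('\ud55c'.encode('cp949')), '\ud55c'.encode('cp949'))
```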

In [2]:
```python
h = '안녕, 대한민국'
bh = h.encode('utf-8')
print(bh)
print(len(h), len(bh))
bh.decode('utf-8')
```

```
b'\xec\x95\x88\xeb\x85\x95, \xeb\x8c\x80\xed\x95\x9c\xeb\xaf\xbc\xea\xb5\xad'
8 20

'안녕, 대한민국'
```

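In [3]:
```python
ba = bytearray(bh)
ba.extend(b'korea')    # bytearray is mutable: extend() appends in place
bh.hex()               # hex dump of the original (unchanged) bytes object
```

```
'ec9588eb85952c20eb8c80ed959cebafbceab5ad'
```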
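
`hex()` and `bytes.fromhex()` are inverses, so a hex dump like the one above can be converted back into bytes. Here is a sketch with a shorter string (the names `word`, `hex_text`, and `restored` are only illustrative):

```python
word = '안녕'.encode('utf-8')        # example value: 2 Hangul syllables, 6 bytes
hex_text = word.hex()                # two hex digits per byte
print(hex_text)
restored = bytes.fromhex(hex_text)   # parse the hex string back into bytes
print(restored.decode('utf-8'))
```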

|         | Python 2   | Python 3 |
|---------|------------|----------|
| `str`   | byte string (e.g. UTF-8 or MS949 encoded) | Unicode text |
| `print` | statement  | function |
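
In Python 3, text (`str`) and binary data (`bytes`) never mix implicitly, which is why `a == b` in cell [1] evaluates to `False`. Here is a small sketch with illustrative values:

```python
text = 'hello'
raw = b'hello'
print(text == raw)                    # False: a str never equals a bytes object
print(text == raw.decode('ascii'))    # True once the bytes are decoded to str

try:
    text + raw                        # concatenating str and bytes is an error
except TypeError as exc:
    print('TypeError:', exc)
```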

## First socket program - a client

In [4]:
```python
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create a TCP socket object
print(s)
type(s)
```

```
<socket.socket fd=45, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 0)>

socket.socket
```

In [5]:
```python
s.connect(('np.hufs.ac.kr', 7))    # connect to echo server
print(s)
```

```
<socket.socket fd=45, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('203.253.70.30', 42232), raddr=('203.253.70.30', 7)>
```

In [6]:
```python
# msg = 'Hello, np!'
msg = '안녕 대한민국 hollo Korea'
s.send(msg.encode('utf-8'))        # send() takes bytes and returns the number of bytes sent
```

```
31
```

In [7]:
```python
reply = s.recv(1024)               # recv() returns bytes (at most 1024 here)
print(reply.decode('utf-8'))       # decode bytes -> str
```

```
안녕 대한민국 hollo Korea
```
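
TCP delivers a byte stream, not messages, so a single `recv(1024)` may return fewer bytes than the peer sent. The one call above works for a short echo, but that is not guaranteed. When the expected length is known, a common remedy is to loop until that many bytes have arrived. `recv_exact` is a hypothetical helper, and the demo uses `socket.socketpair()` as a local stand-in for the remote echo server.

```python
import socket

def recv_exact(sock, n):
    """Receive exactly n bytes from sock; raise if the peer closes early (hypothetical helper)."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:                     # empty bytes: peer closed the connection
            raise ConnectionError(f'connection closed after {n - remaining} of {n} bytes')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

# Demo with a connected pair of local sockets instead of np.hufs.ac.kr:7
a, b = socket.socketpair()
with a, b:
    msg = '안녕 대한민국 hollo Korea'.encode('utf-8')
    a.sendall(msg)                        # sendall() keeps sending until every byte is out
    reply = recv_exact(b, len(msg))
    print(len(reply), reply.decode('utf-8'))
```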


In [8]:
```python
s.close()    # terminate the TCP connection
```

In [10]:
```python
type(reply)
```

```
bytes
```
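
If port 7 on `np.hufs.ac.kr` is not reachable, you can run the same client steps against a throwaway echo server on the local machine. This is a minimal sketch that assumes a single client and an OS-chosen port on 127.0.0.1 (`echo_once` is a hypothetical name). The client calls `shutdown(SHUT_WR)` so the server sees end-of-stream and closes, which lets the client read until `recv()` returns `b''`.

```python
import socket
import threading

def echo_once(server_sock):
    """Accept one client and echo back everything it sends until it closes."""
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:              # client finished sending
                break
            conn.sendall(data)        # echo the bytes unchanged

# Listen on an ephemeral localhost port (instead of port 7 on a remote host)
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
server_addr = server.getsockname()
worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect(server_addr)
    s.sendall('Hello, np!'.encode('utf-8'))
    s.shutdown(socket.SHUT_WR)        # tell the server we are done sending
    reply = b''
    while True:
        chunk = s.recv(1024)
        if not chunk:                 # server closed its side
            break
        reply += chunk

worker.join()
server.close()
print(reply.decode('utf-8'))
```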

## Fetching a page from a web server - using `urlopen`

In [18]:
```python
from urllib.request import urlopen

url = "http://mclab.hufs.ac.kr/wiki/Lectures/IA/2018"
with urlopen(url) as f:
    contents = f.read().decode()    # decode() assumes UTF-8 by default
print(contents[:500])
```

```
<!DOCTYPE html>
<html lang="en" dir="ltr" class="client-nojs">
<head>
<title>Lectures/IA/2018 - MCLab</title>
<meta charset="UTF-8" />
<meta name="generator" content="MediaWiki 1.21.2" />
<link rel="shortcut icon" href="/favicon.ico" />
<link rel="search" type="application/opensearchdescription+xml" href="/mediawiki/opensearch_desc.php" title="MCLab (en)" />
<link rel="EditURI" type="application/rsd+xml" href="http://mclab.hufs.ac.kr/mediawiki/api.php?action=rsd" />
<link rel="alternate" type="a
```
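
`f.read().decode()` assumes UTF-8. The response object also carries the status code and headers, so you can decode with the charset declared in `Content-Type` instead. The sketch below runs offline against a small local `http.server`; the handler `HelloHandler` and its page are hypothetical. It opens the URL through an opener with an empty `ProxyHandler` so that proxy environment variables cannot redirect the local request.

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.request import build_opener, ProxyHandler

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: always answers with a small UTF-8 HTML page."""
    def do_GET(self):
        body = '<p>한국외국어대학교</p>'.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=UTF-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):    # silence per-request logging
        pass

server = HTTPServer(('127.0.0.1', 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_address[1]

opener = build_opener(ProxyHandler({}))    # ignore any proxy settings for this local test
with opener.open(url) as f:
    status = f.status                                       # e.g. 200
    ctype = f.headers['Content-Type']
    charset = f.headers.get_content_charset() or 'utf-8'    # fall back to UTF-8 if undeclared
    contents = f.read().decode(charset)

server.shutdown()
server.server_close()
print(status, ctype)
print(contents)
```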


## Fetching a page from a web server - by socket programming

In [15]:
```python
from urllib.parse import urlparse
import socket

template = "GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
url = "http://mclab.hufs.ac.kr/test/index.html"
r = urlparse(url)
host, port = r.hostname, r.port or 80                        # default HTTP port is 80
path = (r.path or '/') + ('?' + r.query if r.query else '')   # keep the query string, if any
request = template.format(path=path, host=host)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
print(request)
sock.sendall(request.encode('utf-8'))    # sendall() retries until every byte is sent

chunks = []
while True:
    chunk = sock.recv(4096)
    if not chunk:             # empty bytes: the server closed the connection
        break
    print(len(chunk))
    chunks.append(chunk)
sock.close()

response = b''.join(chunks)
print(response)
```

```
GET /test/index.html HTTP/1.1
Host: mclab.hufs.ac.kr
Connection: close


1448
216
b'HTTP/1.1 200 OK\r\nDate: Thu, 27 Sep 2018 04:25:01 GMT\r\nServer: Apache/2.2.22 (Ubuntu)\r\nLast-Modified: Tue, 19 Sep 2017 06:13:15 GMT\r\nETag: "1e982f-569-55984c1337a5f"\r\nAccept-Ranges: bytes\r\nContent-Length: 1385\r\nVary: Accept-Encoding\r\nConnection: close\r\nContent-Type: text/html\r\n\r\n<html>\n<head>\n<title>Test Page</title>\n<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n</head>\n\n<body>\n<h1>Information and Communications Engineering</h1>\n<p><img src="http://ice.hufs.ac.kr/hufs-image01.jpg" border="0"></p>\n<p>Welcome to Dept. of Information and Communications Engineering</p>\n<p></p>\n<p>\xed\x95\x9c\xea\xb5\xad\xec\x99\xb8\xea\xb5\xad\xec\x96\xb4\xeb\x8c\x80\xed\x95\x99\xea\xb5\x90 \xec\xa0\x95\xeb\xb3\xb4\xed\x86\xb5\xec\x8b\xa0\xea\xb3\xb5\xed\x95\x99\xea\xb3\xbc</p>\n\n<h2>Blue Sky</h2>\n<h3>SKY 2</h3>\n<p><img src="s3test2.gif" border="0"></p>\n\n<h3>SK
```
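
The URL-to-request steps in cell [15] can be wrapped in a function. This sketch (`build_request` is a hypothetical name) also covers two cases that cell does not handle: an empty path must be sent as `/`, and a non-default port belongs in the `Host` header. It assumes the URL is already ASCII (percent-encoded).

```python
from urllib.parse import urlparse

TEMPLATE = "GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"

def build_request(url):
    """Return (host, port, request_bytes) for a plain-HTTP GET of url (hypothetical helper)."""
    r = urlparse(url)
    host = r.hostname
    port = r.port or 80                          # default HTTP port
    path = r.path or '/'                         # an empty path is sent as '/'
    if r.query:
        path += '?' + r.query                    # keep the query string
    host_header = host if port == 80 else f'{host}:{port}'   # non-default port goes in Host
    request = TEMPLATE.format(path=path, host=host_header)
    return host, port, request.encode('ascii')   # assumes an ASCII (percent-encoded) URL

print(build_request("http://mclab.hufs.ac.kr/test/index.html"))
print(build_request("http://example.com:8080?q=1"))
```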

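> The standard line terminator on the Internet is `b'\r\n'`. However, many web servers also accept a bare `b'\n'`.
>
> A response message contains:
> - the status line: `HTTP/1.1 200 OK` (200 means the request was processed successfully)
> - the header lines
> - a blank line, so the header section ends with `b'\r\n\r\n'`
> - the actual downloaded web contents (the message body)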

To extract the contents, take everything from the byte right after `b'\r\n\r\n'` to the end.

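In [23]:
```python
eol = response.find(b'\r\n') + 2             # end of the status line (including its CRLF)
status, remainder = response[:eol], response[eol:]
eoh = remainder.find(b'\r\n\r\n') + 2        # end of the last header line (including its CRLF)
headers, contents = remainder[:eoh], remainder[eoh+2:]   # skip the blank line's CRLF
print(status.decode())
print(headers.decode())
print(contents.decode())
```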

```
HTTP/1.1 200 OK

Date: Thu, 27 Sep 2018 04:25:01 GMT
Server: Apache/2.2.22 (Ubuntu)
Last-Modified: Tue, 19 Sep 2017 06:13:15 GMT
ETag: "1e982f-569-55984c1337a5f"
Accept-Ranges: bytes
Content-Length: 1385
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

<html>
<head>
<title>Test Page</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>

<body>
<h1>Information and Communications Engineering</h1>
<p><img src="http://ice.hufs.ac.kr/hufs-image01.jpg" border="0"></p>
<p>Welcome to Dept. of Information and Communications Engineering</p>
<p></p>
<p>한국외국어대학교 정보통신공학과</p>

<h2>Blue Sky</h2>
<h3>SKY 2</h3>
<p><img src="s3test2.gif" border="0"></p>

<h3>SKY 3</h3>
<p><img src="s3test3.jpg" border="0"></p>
    <tr>
        <td width="0"><font size="1">&nbsp;</font></td>
        <td width="0"><font size="1">&nbsp;</font></td>
            <p><span style="font-size: 22pt;"><font color="#17365d" face="Impact">SKY 4</font></span></p>
            <p
```
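
The same split can be written with `bytes.partition()`, which finds the first `b'\r\n\r\n'` and returns the parts before and after it. Here is a sketch on a made-up response. `split_response` is a hypothetical helper, and it decodes header bytes as ISO-8859-1, as `http.client` does.

```python
def split_response(response):
    """Split raw HTTP response bytes into (status_line, header_lines, body)."""
    head, sep, body = response.partition(b'\r\n\r\n')   # first blank line ends the headers
    if not sep:
        raise ValueError('end of headers not found')
    first, *rest = head.split(b'\r\n')                   # status line, then one header per line
    return (first.decode('ascii'),
            [line.decode('iso-8859-1') for line in rest],
            body)

sample = (b'HTTP/1.1 200 OK\r\n'
          b'Content-Type: text/html\r\n'
          b'Content-Length: 5\r\n'
          b'\r\n'
          b'hello')
status_line, header_lines, body = split_response(sample)
print(status_line)
print(header_lines)
print(body)
```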

## Fetching a page from a web server by converting the socket to a file-like object

In [41]:
```python
import socket
from urllib.parse import urlparse

template = "GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
url = "http://mclab.hufs.ac.kr/test/index.html"

r = urlparse(url)
host, port = r.hostname, r.port or 80
path = (r.path or '/') + ('?' + r.query if r.query else '')

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.connect((host, port))
    infile = sock.makefile('rb')    # convert to a binary file-like object
    request = template.format(path=path, host=host)
    sock.sendall(request.encode())
    status_line = infile.readline().decode()
    protocol, code, phrase = status_line[:-2].split(maxsplit=2)   # drop CRLF, then split
    headers = {}
    for line in infile:
        if line == b'\r\n':       # empty line reached: end of headers
            break
        header = line[:-2].decode()
        key, value = header.split(':', maxsplit=1)
        headers[key] = value.lstrip()   # remove leading white spaces
    contents = infile.read().decode()

print(code, phrase); print()
print(headers); print()
print(contents)
```

```
200 OK

{'Date': 'Thu, 27 Sep 2018 05:52:37 GMT', 'Server': 'Apache/2.2.22 (Ubuntu)', 'Last-Modified': 'Tue, 19 Sep 2017 06:13:15 GMT', 'ETag': '"1e982f-569-55984c1337a5f"', 'Accept-Ranges': 'bytes', 'Content-Length': '1385', 'Vary': 'Accept-Encoding', 'Connection': 'close', 'Content-Type': 'text/html'}

<html>
<head>
<title>Test Page</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>

<body>
<h1>Information and Communications Engineering</h1>
<p><img src="http://ice.hufs.ac.kr/hufs-image01.jpg" border="0"></p>
<p>Welcome to Dept. of Information and Communications Engineering</p>
<p></p>
<p>한국외국어대학교 정보통신공학과</p>

<h2>Blue Sky</h2>
<h3>SKY 2</h3>
<p><img src="s3test2.gif" border="0"></p>

<h3>SKY 3</h3>
<p><img src="s3test3.jpg" border="0"></p>
    <tr>
        <td width="0"><font size="1">&nbsp;</font></td>
        <td width="0"><font size="1">&nbsp;</font></td>
            <p><span style="font-size: 22pt;"><font color="#17365d" face="Impact">SKY 4</font
```
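
The dictionary above is keyed by each header name's exact spelling, but HTTP header names are case-insensitive. The standard library's `http.client.parse_headers()` reads header lines from a binary file object up to and including the blank line. It returns a message object that supports case-insensitive lookup. The sketch below uses `io.BytesIO` with a made-up response in place of `sock.makefile('rb')`.

```python
import io
import http.client

raw = (b'HTTP/1.1 200 OK\r\n'
       b'Content-Type: text/html\r\n'
       b'Content-Length: 1385\r\n'
       b'\r\n'
       b'<html></html>')

infile = io.BytesIO(raw)                          # stands in for sock.makefile('rb')
status_line = infile.readline().decode('ascii').rstrip('\r\n')
headers = http.client.parse_headers(infile)       # consumes headers and the blank line
body = infile.read()                              # whatever follows is the body

print(status_line)
print(headers['content-length'], headers.get('CONTENT-TYPE'))   # names are case-insensitive
print(body)
```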

## An OO HTTP client implementation

In [24]:
```python
import re
import socket
from urllib.parse import urlparse

class HTTPcli:
    template = "GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"

    def __init__(self, url):
        r = urlparse(url)
        self.host, self.port = r.hostname, r.port or 80
        self.path = (r.path or '/') + ('?' + r.query if r.query else '')
        self.request = HTTPcli.template.format(path=self.path, host=self.host)

    def read(self):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.connect((self.host, self.port))       # use this instance's host/port
            self.infile = sock.makefile('rb')          # convert to file-like object
            sock.sendall(self.request.encode())        # request built in __init__

            # status line, e.g. "HTTP/1.1 200 OK": keep only the status code
            self.status = self.infile.readline().decode().split()[1]
            self._proc_headers()
            contents = self.infile.read()              # raw body bytes
        return contents

    def _proc_headers(self):
        self.headers = {}
        for line in self.infile:
            if line == b'\r\n':                        # blank line ends the headers
                break
            matched = re.search(r'([\w-]+)\s*:\s*(.*)\s*\r\n$', line.decode())
            if matched:
                key, value = matched.groups()
                self.headers[key] = value

    def get_headers(self):
        return self.headers

    def get_header(self, key):
        return self.headers.get(key)

cli = HTTPcli("http://mclab.hufs.ac.kr/wiki/Lectures/IA/2018")
contents = cli.read()
print(contents[:500].decode())
cli.get_headers()
```

```
aaea
<!DOCTYPE html>
<html lang="en" dir="ltr" class="client-nojs">
<head>
<title>Lectures/IA/2018 - MCLab</title>
<meta charset="UTF-8" />
<meta name="generator" content="MediaWiki 1.21.2" />
<link rel="shortcut icon" href="/favicon.ico" />
<link rel="search" type="application/opensearchdescription+xml" href="/mediawiki/opensearch_desc.php" title="MCLab (en)" />
<link rel="EditURI" type="application/rsd+xml" href="http://mclab.hufs.ac.kr/mediawiki/api.php?action=rsd" />
<link rel="alternate" t

{'Cache-Control': 'private, must-revalidate, max-age=0',
 'Connection': 'close',
 'Content-Type': 'text/html; charset=UTF-8',
 'Content-language': 'en',
 'Date': 'Mon, 14 May 2018 08:10:19 GMT',
 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT',
 'Last-Modified': 'Mon, 14 May 2018 04:04:45 GMT',
 'Server': 'Apache/2.2.22 (Ubuntu)',
 'Transfer-Encoding': 'chunked',
 'Vary': 'Accept-Encoding,Cookie',
 'X-Content-Type-Options': 'nosniff',
 'X-Powered-By': 'PHP/5.3.10-1ubuntu3.10'}
```
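
The first line of the output, `aaea`, is not part of the page. The headers show `Transfer-Encoding: chunked`, which means the server sends the body as a series of chunks. Each chunk is preceded by its size in hexadecimal (`0xaaea` is 43754 bytes), and a zero-size chunk ends the body. `HTTPcli.read()` returns that framing untouched. Below is a minimal decoder sketch that assumes a well-formed stream. `read_chunked` is a hypothetical helper, and it skips chunk extensions and trailers rather than parsing them.

```python
import io

def read_chunked(infile):
    """Decode an HTTP/1.1 chunked body from a binary file-like object (hypothetical helper)."""
    body = bytearray()
    while True:
        size_line = infile.readline()
        if not size_line:
            raise ValueError('stream ended before the final chunk')
        size = int(size_line.split(b';', 1)[0].strip(), 16)   # hex size; ignore extensions
        if size == 0:                     # zero-size chunk marks the end of the body
            break
        body += infile.read(size)
        infile.readline()                 # CRLF that follows each chunk's data
    for line in infile:                   # skip optional trailer headers up to the blank line
        if line in (b'\r\n', b'\n'):
            break
    return bytes(body)

stream = io.BytesIO(b'5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n')
print(read_chunked(stream))
```

In `HTTPcli.read()`, one could call it on `self.infile` when `self.headers.get('Transfer-Encoding') == 'chunked'`, and fall back to `self.infile.read()` otherwise.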