# Answers to Chapter 12: Networked programs

## Exercise 1
Change the socket program socket1.py to prompt the user for the
URL so it can read any web page. You can use `split('/')` to break the URL into
its component parts so you can extract the host name for the socket connect
call. Add error checking using try and except to handle the condition where
the user enters an improperly formatted or non-existent URL.

### Answer

In [1]:
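import socket

# url = 'http://data.pr4e.org/romeo.txt'
url = input("Enter url ")
try:
    # 'http://host/path'.split('/') gives ['http:', '', 'host', 'path']
    hostname = url.split("/")[2]
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect((hostname, 80))
except (IndexError, OSError):
    # IndexError: the URL has no host part
    # OSError: the host name cannot be resolved or reached
    print("Improperly formatted or non-existent URL:", url)
else:
    cmd = ('GET %s HTTP/1.0\r\n\r\n' % url).encode()
    mysock.send(cmd)

    while True:
        data = mysock.recv(512)
        if len(data) < 1:
            break
        print(data.decode(), end='')

    mysock.close()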

Enter url  http://data.pr4e.org/romeo.txt


HTTP/1.1 200 OK
Date: Tue, 22 Nov 2022 11:59:20 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "a7-54f6609245537"
Accept-Ranges: bytes
Content-Length: 167
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
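
The host-name step can also be tried on its own, without opening a socket. The following is a minimal sketch, assuming URLs of the form `http://host/path`; the helper name `extract_host` is hypothetical and not part of socket1.py.

```python
def extract_host(url):
    """Return the host part of an 'http://host/path' URL, or None if it is missing."""
    parts = url.split("/")
    # 'http://data.pr4e.org/romeo.txt' -> ['http:', '', 'data.pr4e.org', 'romeo.txt']
    if len(parts) < 3 or not parts[0].endswith(":") or parts[1] != "" or parts[2] == "":
        return None
    return parts[2]


print(extract_host("http://data.pr4e.org/romeo.txt"))  # data.pr4e.org
print(extract_host("romeo.txt"))                       # None
```

Returning `None` for a malformed URL lets the caller report the problem before it attempts a connection.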


## Exercise 2
Change your socket program so that it counts the number of
characters it has received and stops displaying any text after it has shown
3000 characters. The program should retrieve the entire document and count
the total number of characters and display the count of the number of
characters at the end of the document.

### Answer

In [2]:
import socket

# url = 'http://data.pr4e.org/romeo.txt'
url = input("Enter url ")
parts = url.split("/")
hostname = parts[2]
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((hostname, 80))
cmd = ('GET %s HTTP/1.0\r\n\r\n' % url).encode()
mysock.send(cmd)

accumulated = ""
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    accumulated += data.decode()

print("Total number of characters:", len(accumulated))
print("First 3000 bytes:", accumulated[:3000])
mysock.close()

Enter url  http://data.pr4e.org/romeo.txt


Total number of characters: 536
First 3000 bytes: HTTP/1.1 200 OK
Date: Tue, 22 Nov 2022 12:00:42 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "a7-54f6609245537"
Accept-Ranges: bytes
Content-Length: 167
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
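
The answer above stores the whole document before printing. As an alternative, the display limit can be applied while the data arrives, so only a counter is kept. This is a sketch under assumptions: the names `decoded` and `show_limited` are hypothetical, the data is taken to be UTF-8, and a list of byte strings stands in for successive `recv` results.

```python
import codecs


def decoded(chunks, encoding="utf-8"):
    """Decode byte chunks incrementally so a character split across chunks survives."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        yield decoder.decode(chunk)
    # Flush whatever is still buffered once the input ends
    yield decoder.decode(b"", final=True)


def show_limited(chunks, limit=3000):
    """Print at most `limit` characters; return (characters shown, total characters)."""
    shown = 0
    total = 0
    for text in decoded(chunks):
        total += len(text)
        if shown < limit:
            piece = text[:limit - shown]
            print(piece, end="")
            shown += len(piece)
    return shown, total


# Simulated recv() results; the limit is lowered to 4 to make the cut-off visible
shown, total = show_limited([b"abc", b"def", b"ghi"], limit=4)
print()
print("Shown:", shown, "Total:", total)
```

With a real socket, the chunks could come from `iter(lambda: mysock.recv(512), b"")`, which keeps calling `recv` until it returns an empty byte string.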



## Exercise 3
Use `urllib` to replicate the previous exercise of (1) retrieving the
document from a URL, (2) displaying up to 3000 characters, and (3) counting
the overall number of characters in the document. Don’t worry about the
headers for this exercise, simply show the first 3000 characters of the
document contents.

### Answer

In [3]:
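import urllib.request

# url = 'http://data.pr4e.org/romeo.txt'
url = input("Enter url ")

# read() returns bytes, so len() counts bytes; for plain ASCII text
# such as romeo.txt this equals the number of characters
webpage = urllib.request.urlopen(url).read()

print("Total number of characters:", len(webpage))
print("First 3000 bytes:", webpage[:3000])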

Enter url  http://data.pr4e.org/romeo.txt


Total number of characters: 167
First 3000 bytes: b'But soft what light through yonder window breaks\nIt is the east and Juliet is the sun\nArise fair sun and kill the envious moon\nWho is already sick and pale with grief\n'
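
Because `read()` returns bytes, the count above is a byte count. For documents with non-ASCII text the two numbers differ, so decoding first gives a true character count. The following is a minimal sketch, assuming the document is UTF-8; the helper name `summarize` is hypothetical.

```python
def summarize(raw, limit=3000, encoding="utf-8"):
    """Decode `raw` bytes and return (character count, first `limit` characters)."""
    text = raw.decode(encoding, errors="replace")
    return len(text), text[:limit]


raw = "caf\u00e9".encode("utf-8")   # the accented e takes two bytes in UTF-8
total, preview = summarize(raw)
print(len(raw), "bytes,", total, "characters")  # 5 bytes, 4 characters
```

The bytes returned by `urllib.request.urlopen(url).read()` could be passed to `summarize` in the same way.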


## Exercise 4
Change the urllinks.py program to extract and count paragraph (p)
tags from the retrieved HTML document and display the count of the
paragraphs as the output of your program. Do not display the paragraph text,
only count them. Test your program on several small web pages as well as
some larger web pages.

### Answer

In [5]:
# To run this, download the BeautifulSoup zip file
# http://www.py4e.com/code3/bs4.zip
# and unzip it in the same directory as this file

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
paragraphs = soup("p")
count = len(paragraphs)
print(count, "paragraphs found")

Enter -  https://docs.python.org


27 paragraphs found
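
To check the counting logic without a network connection or the BeautifulSoup download, the same count can be sketched with the standard library's `html.parser` module. This swaps in a different parser than the one urllinks.py uses; the names `ParagraphCounter` and `count_paragraphs` are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Count opening <p> tags; HTMLParser reports tag names in lower case."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.count += 1


def count_paragraphs(html):
    """Return the number of <p> start tags in an HTML string."""
    parser = ParagraphCounter()
    parser.feed(html)
    parser.close()
    return parser.count


sample = "<html><body><p>One</p><P class='x'>Two</P><pre>not a paragraph</pre></body></html>"
print(count_paragraphs(sample), "paragraphs found")  # 2 paragraphs found
```

`feed` expects a string, so bytes read from a URL would need to be decoded first.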


## Exercise 5
(Advanced) Change the socket program so that it only shows data
after the headers and a blank line have been received. Remember that recv
receives characters (newlines and all), not lines.

### Answer

In [6]:
import socket

# url = 'http://data.pr4e.org/romeo.txt'
url = input("Enter url ")
parts = url.split("/")
hostname = parts[2]
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((hostname, 80))
cmd = ('GET %s HTTP/1.0\r\n\r\n' % url).encode()
mysock.send(cmd)

accumulated = ""
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    accumulated += data.decode()

# The headers end at the first blank line: CRLF CRLF in standard HTTP,
# with LF LF as a fallback for servers that send bare line feeds
separator = "\r\n\r\n"
start = accumulated.find(separator)
if start == -1:
    separator = "\n\n"
    start = accumulated.find(separator)
# Skip the whole separator so the body does not begin with a stray "\r\n"
content = accumulated[start + len(separator):] if start != -1 else ""

print("Total number of characters:", len(content))
print("First 3000 bytes:", content[:3000])
mysock.close()

Enter url  http://data.pr4e.org/romeo.txt


Total number of characters: 167
First 3000 bytes: But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
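
Since `recv` returns arbitrary pieces rather than lines, the blank line that ends the headers can be split across two calls. The following sketch skips the headers while the data is still arriving. It assumes the CRLF CRLF separator; the generator name `body_chunks` is hypothetical, and a list of byte strings stands in for successive `recv` results.

```python
def body_chunks(chunks, separator=b"\r\n\r\n"):
    """Yield only the bytes that follow the first header/body separator."""
    buffer = b""
    in_body = False
    for chunk in chunks:
        if in_body:
            yield chunk
            continue
        # Keep buffering until the separator has been seen in full,
        # even if it straddles two chunks
        buffer += chunk
        head, sep, rest = buffer.partition(separator)
        if sep:
            in_body = True
            if rest:
                yield rest


# The separator is deliberately split between the first two chunks
fake = [b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r", b"\n\r\nBut soft ", b"what light"]
body = b"".join(body_chunks(fake))
print(body.decode())  # But soft what light
```

If the separator never arrives, the generator yields nothing, so a response without a blank line produces no output.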

