# Network Programming

### Text Processing (Characters & Strings)

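ASCII: `ord()` returns the numeric code of a single text character. It takes exactly one argument, a one-character string (not a number). For ASCII characters the result is the ASCII code; more generally it is the Unicode code point.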

In [None]:
ord('H')

72

In [None]:
ord("\n")

10

In [None]:
ord('n')

110
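As a small sketch of the same idea, `chr()` is the inverse of `ord()`, and `ord()` also works on non-ASCII characters. The helper name `describe` is made up for this illustration.

```python
def describe(ch):
    """Return (character, code point) and check that chr() reverses ord()."""
    code = ord(ch)
    assert chr(code) == ch  # chr() maps the number back to the character
    return ch, code

for ch in ['H', '\n', 'n', 'こ']:
    print(describe(ch))

# ord() rejects anything that is not a one-character string
for bad in ['Hi', 72]:
    try:
        ord(bad)
    except TypeError as err:
        print("TypeError:", err)
```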

### UTF-8 is now recommended practice

In Python 3, every string is a Unicode string of type `str`. The `u` prefix (for example `u'text'`) is still accepted for backward compatibility but has no effect.

In Python 2, Unicode strings had their own type, `unicode`, separate from `str`.

<br>

__NOTE: When talking to a network resource, protocol, or database, text must be encoded to bytes (usually UTF-8) before sending and decoded back to `str` after receiving__

In [None]:
x = u'こんにちは世界'
print(x)
print(type(x))

こんにちは世界
<class 'str'>
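The encode/decode step mentioned in the note can be sketched as a round trip. This is only an illustration using the same sample string; the variable names are arbitrary.

```python
text = 'こんにちは世界'

# str -> bytes: this is what would travel over a socket
data = text.encode('utf-8')
print(type(data), len(text), "characters ->", len(data), "bytes")

# bytes -> str: what the receiving side does
assert data.decode('utf-8') == text

# Decoding with the wrong codec fails instead of silently guessing
try:
    data.decode('ascii')
except UnicodeDecodeError as err:
    print("Cannot decode as ASCII:", err.reason)
```

Each of these seven characters takes three bytes in UTF-8, which is why the byte length differs from the character count.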


### TCP Sockets Library

Connect to a host address via a specified port

__Socket__: One endpoint of a two-way communication link between two programs on a network (e.g. a browser ["user agent"] and a web server each hold one socket of the connection)

`socket.AF_INET` -- IPv4 Internet protocols <br>
`socket.SOCK_STREAM` -- creates a TCP socket <br>
* *Note: TCP sets up a connection with a three-way handshake and uses sequence numbers, acknowledgments, and checksums so that data arrives complete and in the appropriate order (reliable)* <br>

`socket.SOCK_DGRAM` -- creates a UDP socket <br>
* *Note: UDP is a connectionless protocol with no handshake, unlike TCP. As a result, some data packets might be lost and/or received out of order (not reliable, but low overhead, which suits uses such as live video streaming)* <br>

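`socket.recv([number of bytes])` -- maximum number of bytes to receive in one call (buffer size); it may return fewer, and returns empty bytes once the other side has closed the connection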
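To see the buffer size at work without touching the network, here is a minimal sketch that uses `socket.socketpair()` to create two already-connected stream sockets inside one process. The helper name `recv_all` and the 4-byte buffer are choices made for this illustration only.

```python
import socket

def recv_all(sock, bufsize):
    """Collect chunks from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)  # returns at most bufsize bytes
        if not chunk:               # b'' means the other end has closed
            break
        chunks.append(chunk)
    return chunks

# Two connected stream sockets in the same process (no real network needed)
sender, receiver = socket.socketpair()
sender.sendall(b'0123456789')
sender.close()  # closing lets the receive loop terminate

chunks = recv_all(receiver, 4)
receiver.close()

print(chunks)            # several small chunks, none longer than 4 bytes
print(b''.join(chunks))  # joined together they give back the original data
```

The exact chunk boundaries are up to the operating system, so code should only rely on the joined result.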

In [1]:
import socket

### Get IP Address of Host Address

In [2]:
# pass the host name only, without the "http://" part of the URL
ip = socket.gethostbyname('birdsarentreal.com')

print(ip)

23.227.38.32
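If the starting point is a full URL rather than a bare host name, the host part can be extracted with `urllib.parse` before the lookup. The helpers `host_from_url` and `resolve` below are hypothetical names for this sketch; `resolve` is defined but not called here because it needs network access.

```python
import socket
from urllib.parse import urlparse

def host_from_url(url):
    """Return just the host name from a URL or a bare host string."""
    # urlparse only fills in the host when the string has a '//' network part
    parsed = urlparse(url if '://' in url else '//' + url)
    return parsed.hostname

def resolve(url):
    """Look up the IPv4 address of the URL's host (requires network access)."""
    return socket.gethostbyname(host_from_url(url))

print(host_from_url('http://data.pr4e.org/romeo.txt'))
print(host_from_url('birdsarentreal.com'))
```

Calling `resolve('http://data.pr4e.org/romeo.txt')` would then perform the same lookup as the cell above.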


### Build Web Browser (HTTP Request; Insecure HTTP websites ONLY)

Possible causes for the following HTTP statuses:
* __200 OK__: Request was successful
* __403 Forbidden__: Server understood the request but refuses to serve the page (here, over port 80); such sites cannot be used with the code below
* __301 Moved Permanently__: Website redirects from http:// to https://; it is easier to use the `requests` library instead (or wrap the socket connection with the `ssl` library)
    * __NOTE: Most websites will NOT work with the code below because they redirect to https; use the `requests` library instead__
* __400 Bad Request__: Malformed request, such as a syntax error or an incorrect host name
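The status codes listed above arrive on the first line of the server's reply. As a rough sketch, that line can be split into its parts, and the standard `http.HTTPStatus` enum can supply the usual phrase for a code. The function name `parse_status_line` is an assumption made for this example.

```python
from http import HTTPStatus

def parse_status_line(line):
    """Split e.g. 'HTTP/1.1 200 OK' into (version, code, reason)."""
    parts = line.strip().split(' ', 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ''  # the reason phrase is optional
    return version, code, reason

print(parse_status_line('HTTP/1.1 301 Moved Permanently'))

# Standard phrases for the codes discussed above
for code in (200, 301, 400, 403):
    print(code, HTTPStatus(code).phrase)
```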

In [3]:
## Establish a connection with the host
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('data.pr4e.org', 80))

## Retrieve data from the website
request = 'GET /romeo.txt HTTP/1.1\r\nHost: data.pr4e.org\r\n\r\n'.encode() # GET request; str.encode() defaults to UTF-8
client.sendall(request) # send the whole GET request to the web server over the TCP socket

while True:
    data = client.recv(512) # receive up to 512 bytes (not characters) per call
    if not data: # empty bytes means the server closed the connection
        break
    # print() adds a newline after every chunk, so a word that straddles two chunks is split in the output
    print(data.decode()) # pass 'iso-8859-1' to decode() if the data is not valid UTF-8


## Close connection
client.close()

HTTP/1.1 200 OK
Date: Mon, 04 Jul 2022 03:19:46 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "a7-54f6609245537"
Accept-Ranges: bytes
Content-Length: 167
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Content-Type: text/plain

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with g
rief
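The split word in the output above comes from printing each 512-byte chunk separately. One possible variation, sketched below under a few assumptions, collects all chunks first, adds a `Connection: close` header so the server ends the connection after one reply, and separates the headers from the body. The helper names (`build_request`, `split_response`, `fetch`) are hypothetical, and `fetch` is defined but not called because it needs network access.

```python
import socket
import ssl

def build_request(host, path='/'):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f'GET {path} HTTP/1.1',
        f'Host: {host}',
        'Connection: close',  # ask the server to close the socket when done
        '',
        '',                   # blank line that ends the header section
    ]
    return '\r\n'.join(lines).encode('ascii')

def split_response(raw):
    """Split raw response bytes into (header text, body bytes)."""
    head, _, body = raw.partition(b'\r\n\r\n')
    return head.decode('iso-8859-1'), body

def fetch(host, path='/', use_tls=False):
    """Send one GET request and return the raw reply (needs network access)."""
    sock = socket.create_connection((host, 443 if use_tls else 80), timeout=10)
    if use_tls:
        # The default context verifies the certificate and the host name
        context = ssl.create_default_context()
        sock = context.wrap_socket(sock, server_hostname=host)
    with sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(512)
            if not chunk:
                break
            chunks.append(chunk)
    return b''.join(chunks)  # join before decoding so no character is cut in half

print(build_request('data.pr4e.org', '/romeo.txt'))
```

With network access, `split_response(fetch('data.pr4e.org', '/romeo.txt'))` would give the header text and the body bytes, and `use_tls=True` is one way to approach the `ssl` wrapping mentioned in the status list.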



### Build Web Browser (Easier method with `urllib`)

In [16]:
import urllib.request, urllib.parse, urllib.error

In [11]:
file_handle = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')

for line in file_handle:
    print(line.decode().strip()) # can treat it like a file & manipulate it (see next cell)

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
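The `urllib.error` module imported above is not used in these cells. A hedged sketch of how it might be used: wrap `urlopen` in a `with` block and catch the two main exception types. The helper `fetch_lines` is a made-up name, and the demo reads a temporary local file through a `file://` URL so it runs without network access.

```python
import tempfile
import urllib.error
import urllib.request
from pathlib import Path

def fetch_lines(url):
    """Return the decoded, stripped lines at url, or [] if the request fails."""
    try:
        with urllib.request.urlopen(url) as response:  # closed automatically
            return [line.decode().strip() for line in response]
    except urllib.error.HTTPError as err:  # server answered with an error status
        print("Server returned an error status:", err.code)
    except urllib.error.URLError as err:   # host unreachable, file missing, etc.
        print("Could not reach the resource:", err.reason)
    return []

# Offline demo: urlopen() also understands file:// URLs
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'romeo.txt'
    sample.write_text('But soft what light\nIt is the east\n', encoding='utf-8')
    lines = fetch_lines(sample.as_uri())
    missing = fetch_lines((Path(tmp) / 'missing.txt').as_uri())

print(lines)
print(missing)
```

`HTTPError` is listed first because it is a subclass of `URLError`.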


In [22]:
# Count how often each word appears
file_handle = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')

counts = dict()

for line in file_handle:
    words = line.decode().split()
    
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    
print(counts)

{'But': 1, 'soft': 1, 'what': 1, 'light': 1, 'through': 1, 'yonder': 1, 'window': 1, 'breaks': 1, 'It': 1, 'is': 3, 'the': 3, 'east': 1, 'and': 3, 'Juliet': 1, 'sun': 2, 'Arise': 1, 'fair': 1, 'kill': 1, 'envious': 1, 'moon': 1, 'Who': 1, 'already': 1, 'sick': 1, 'pale': 1, 'with': 1, 'grief': 1}
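The same tally can also be written with `collections.Counter` from the standard library. This sketch uses the four lines of text directly instead of downloading them, so it runs offline.

```python
from collections import Counter

text = """But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief"""

# Counter does the counts.get(word, 0) + 1 bookkeeping internally
counts = Counter(text.split())

# Words that never appear simply count as 0
print(counts['the'], counts['sun'], counts['Romeo'])
```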


In [29]:
# finding top 3 most frequent words

counts_list = list()

for k, v in counts.items():
    counts_list.append( (v, k) )

counts_list = sorted(counts_list, reverse=True)

for v, k in counts_list[:3]:
    print(k, "---", v)


the --- 3
is --- 3
and --- 3
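An alternative to building a list of `(value, key)` tuples is to sort the dictionary items with a `key` function. In this sketch, ties are broken alphabetically, which is a different tie order from the cell above; the small `counts` dictionary is a shortened stand-in for the full result.

```python
# Shortened stand-in for the word counts computed earlier
counts = {'is': 3, 'the': 3, 'and': 3, 'sun': 2, 'But': 1, 'soft': 1}

# Sort by count (highest first), then alphabetically for equal counts
top3 = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]

for word, count in top3:
    print(word, "---", count)
```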


Reading webpages:

In [15]:
html_doc = urllib.request.urlopen('https://www.theroot.com/florida-man-steals-car-with-baby-still-inside-then-dro-1823957884')

for line in html_doc:
    print(line.decode().strip())

<!DOCTYPE html><html lang="en-us" data-reactroot=""><head><meta name="google-site-verification" content="P13oAn8o8LB6FVCJWKsHWXvxfbR-SJrmvox6EgULpUs"/><meta name="google-site-verification" content="QDPLbDJXTQNT0n69mvNADCeRmwnbkYyL20OKJAVCKq8"/><meta name="ir-site-verification-token" value="-1270174611"/><meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0,maximum-scale=10.0"/><meta charSet="utf-8"/><meta name="ROBOTS" content="INDEX, FOLLOW"/><title>Florida Man Steals Car With Baby Still Inside, Then Drops Baby Off at Nearby Gas Station</title><link rel="shortcut icon" type="image/png" href="https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_80,q_80,w_80/f5zr3vuc90hrpnmx0nme.png"/><link rel="apple-touch-icon" type="image/png" href="https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_200,q_80,w_200/f5zr3vuc90hrpnmx0nme.png"/><meta name="msapplication-square70x70logo" content="

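For websites whose certificates fail verification (for example, expired certificates), certificate checks can be turned off with the `ssl` library. This removes protection against impersonation, so use it only for experiments: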

In [19]:
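import ssl

# WARNING: this context skips certificate and host name checks (for this request only)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

file_handle = urllib.request.urlopen('https://birdsarentreal.com/pages/faq', context=ctx)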

for line in file_handle:
    print(line.decode().strip())

<!doctype html>
<html class="no-js" lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width,initial-scale=1">
<meta name="theme-color" content="">
<link rel="canonical" href="https://birdsarentreal.com/pages/faq">
<link rel="preconnect" href="https://cdn.shopify.com" crossorigin><link rel="preconnect" href="https://fonts.shopifycdn.com" crossorigin><title>
Frequently Asked Questions - Birds Aren&#39;t Real
</title>


<meta name="description" content="What is this movement&#39;s purpose? When did it start? Why are birds not real? Learn answers to these questions plus news from the Birds Aren&#39;t Real Movement here.">




<meta property="og:site_name" content="Birds Aren&#39;t Real">
<meta property="og:url" content="https://birdsarentreal.com/pages/faq">
<meta property="og:title" content="Frequently Asked Questions - Birds Aren&#39;t Real">
<meta property="og:type" content="website">
<meta property