# Networking

- A socket is one endpoint of a two-way communication channel between two processes, which may run on different machines
- A port is a numbered software endpoint that identifies a specific application or process on a host
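
To make the two definitions concrete, here is a minimal sketch that opens two sockets on the local machine and sends data in both directions. The function name `loopback_demo` is hypothetical, and using the loopback address with an OS-assigned port is an assumption made so the example needs no external server.

```python
import socket

def loopback_demo():
    # Server socket: bind to the loopback address; port 0 asks the OS for any free port
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))
    server.listen(1)
    host, port = server.getsockname()

    # Client socket: connect to the (address, port) pair the server listens on
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((host, port))
    conn, _ = server.accept()

    # Data can flow in both directions over the same connection
    client.sendall(b'ping')
    request = conn.recv(1024)
    conn.sendall(b'pong')
    reply = client.recv(1024)

    for s in (conn, client, server):
        s.close()
    return port, request, reply

port, request, reply = loopback_demo()
print(port, request, reply)
```

The port number printed will differ from run to run, because the operating system picks it.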

### Common TCP Ports

- Telnet (23) - Login
- SSH (22) - Secure Login
- HTTP (80)
- HTTPS (443) - Secure
- SMTP (25) - Mail
- IMAP (143/220/993) - Mail Retrieval
- POP (109/110) - Mail Retrieval
- DNS (53) - Domain Name
- FTP (21) - File Transfer
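
As a rough illustration of how these numbers are used, a client that is given a URL without an explicit port falls back to the well-known port for the scheme. The table and the helper `port_for` below are hypothetical names for this sketch; only a few entries from the list above are included.

```python
from urllib.parse import urlsplit

# A small subset of the list above, keyed by URL scheme (illustrative only)
WELL_KNOWN_TCP_PORTS = {'ftp': 21, 'ssh': 22, 'telnet': 23, 'http': 80, 'https': 443}

def port_for(url):
    """Return the explicit port in the URL, or the well-known port for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return WELL_KNOWN_TCP_PORTS[parts.scheme]

print(port_for('http://data.pr4e.org/romeo.txt'))
print(port_for('http://localhost:8080/index.html'))
```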

In [1]:
import socket

In [18]:
# Define socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

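# Connect to the web server on port 80 (HTTP)
mysock.connect(('data.pr4e.org',80))

# Send an HTTP GET request.
# Note: HTTP expects every line to end with '\r\n'. The bare '\n\n' used here is the
# most likely reason the server answers 400 Bad Request in the output.
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\n\n'.encode()
mysock.send(cmd)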

# Read the response in chunks of up to 512 bytes until the server closes the connection
while True:
    data = mysock.recv(512)
    if len(data)<1:
        break
    print(data.decode())


# Close connection
mysock.close()



HTTP/1.1 400 Bad Request
Date: Wed, 07 Dec 2022 16:44:17 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Length: 308
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache/2.4.18 (Ubuntu) Server at do1.dr-chuck.com Port 80</address>
</body></html>
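
The 400 Bad Request response above is most likely caused by the request ending in `\n\n`: HTTP terminates each line with `\r\n`, and many servers reject bare line feeds. The following is a minimal sketch of a corrected request, not a definitive client. The helper names `build_get_request`, `fetch`, and `split_response` are hypothetical, and the `Host` header, the timeout, and the use of `sendall` are additions that the original cell does not have.

```python
import socket

def build_get_request(host, path):
    # Request line, one header, then an empty line; every line ends with CRLF
    lines = [f'GET {path} HTTP/1.0', f'Host: {host}', '', '']
    return '\r\n'.join(lines).encode()

def fetch(host, path, port=80):
    # Open a TCP connection, send the request, and read until the server closes
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(512)
            if not data:
                break
            chunks.append(data)
    return b''.join(chunks)

def split_response(raw):
    # Headers and body are separated by the first empty line (CRLF CRLF)
    head, _, body = raw.partition(b'\r\n\r\n')
    return head.decode('iso-8859-1'), body
```

Under these assumptions, `head, body = split_response(fetch('data.pr4e.org', '/romeo.txt'))` would separate the status line and headers from the text of the file.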



In [15]:
import urllib.request

In [21]:
fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fhand:
    # strip() is used because print() already appends a newline after each call, and
    # every line read from the response also ends with its own '\n'. strip() removes that extra one.
    print(line.decode().strip())

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
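
The decoding and newline handling in the loop above can be tried without a network connection. This sketch assumes an in-memory `io.BytesIO` object as a stand-in for the response returned by `urlopen`, since both yield `bytes` lines when iterated; `decoded_lines` is a hypothetical helper name.

```python
import io

def decoded_lines(fhand):
    """Yield each line of a bytes stream as text, without its trailing newline."""
    for raw in fhand:
        yield raw.decode().rstrip('\r\n')

# Stand-in for the HTTP response: two lines of the file, each ending in '\n'
sample = io.BytesIO(
    b'But soft what light through yonder window breaks\n'
    b'It is the east and Juliet is the sun\n'
)
lines = list(decoded_lines(sample))
for line in lines:
    print(line)
```

Here `rstrip('\r\n')` removes only the line ending, whereas the `strip()` used in the cell above also removes leading and trailing spaces.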


### Web Scraping

- We use BeautifulSoup (from www.crummy.com) to extract data from a web page

In [23]:
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup

In [26]:
url = 'http://www.dr-chuck.com/page1.htm'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
print(soup.prettify())
for tag in tags:
    print(tag.get('href',None))

<h1>
 The First Page
</h1>
<p>
 If you like, you can switch to the
 <a href="http://www.dr-chuck.com/page2.htm">
  Second Page
 </a>
 .
</p>

http://www.dr-chuck.com/page2.htm
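
For comparison, the same anchor extraction can be sketched with only the standard library's `html.parser` module instead of BeautifulSoup. This is a swapped-in technique rather than what the cell above does; `LinkCollector` and `extract_links` are hypothetical names, and the HTML string is taken from the output shown above.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs; href is None if absent
            self.links.append(dict(attrs).get('href'))

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

page = ('<h1>The First Page</h1>'
        '<p>If you like, you can switch to the '
        '<a href="http://www.dr-chuck.com/page2.htm">Second Page</a>.</p>')
print(extract_links(page))
```

BeautifulSoup remains the more convenient choice for pages with malformed markup or when navigating the document tree is needed.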
