## This is the Notebook for Lecture 17

In this lecture, we will learn techniques for accessing the internet. This will include:

<ol>
    <li>Opening a browser using Python's <code>webbrowser</code> module</li>
    <li>Combining concepts by modifying strings to generate a Google Maps search</li>
    <li>Creating a socket</li>
    <li>Measuring latency</li>
</ol>

### Now we will work with a web browser through Jupyter Notebooks

In [12]:
# First, you will learn how to open a URL in a web browser
import webbrowser

In [13]:
# Define the URLs we will open: the Notre Dame home page and the course page
nd_url_string = 'https://nd.edu'
course_url_string = 'https://canvas.nd.edu/courses/53612/pages/lecture-notes-and-schedule'

In [14]:
# Now, we will write a simple function to open a URL
def open_url( url_string ):
    webbrowser.open( url_string )

### Now we can open several Uniform Resource Locators (URLs) using our function

In [None]:
open_url( nd_url_string )

In [None]:
open_url( course_url_string )

In [None]:
# This command will also work if you input an IP address as a string
open_url( '142.250.190.78' )

### In-Class Coding Opportunity
<p> </p>
You will write three functions:
<ol>
    <li>A function <code>update_input</code> that returns a string where every space is replaced with a <code>+</code>. This will ensure our input addresses match the format that Google Maps uses</li>
    <li>A function <code>open_google_maps_url</code> that starts with the provided <code>base_search_str</code>, and appends the origin and destination addresses, putting a <code>/</code> between each of them. Call <code>update_input</code> on both the origin and destination addresses</li>
    <li>A function <code>get_directions</code> that prompts the user for an origin and destination address, then passes them to <code>open_google_maps_url</code></li>
</ol>

In [None]:
def update_input( addr_string ):
    
    # update_input code goes here
    final_string = ""
    
    for index in range(0, len(addr_string) ):
        
        if addr_string[index] == ' ':
            final_string += '+'
            
        else:
            final_string += addr_string[index]
            
    return final_string


def open_google_maps_url( origin_address, destination_address ):
    
    # open_google_maps_url code goes here
    base_search_str = "https://www.google.com/maps/dir"
    
    # Join the base URL and both formatted addresses with "/"
    final_search_str = base_search_str + "/" + update_input(origin_address) + "/" + update_input(destination_address)
    
    open_url( final_search_str )
    

def get_directions():
    
    # get_directions code goes here
    your_origin_address = input( "Type in the origin address: " )
    your_destination_address = input( "Type in the destination address: " )
    
    open_google_maps_url( your_origin_address, your_destination_address )
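
The character-by-character loop in <code>update_input</code> can also be written with built-in string tools. The sketch below is one possible alternative, not part of the original exercise: the names <code>update_input_replace</code> and <code>build_directions_url</code> are hypothetical, and the URL is returned rather than opened so that it can be inspected. It also shows <code>urllib.parse.quote_plus</code> from the standard library, which replaces spaces with <code>+</code> and additionally percent-encodes characters such as commas.

```python
from urllib.parse import quote_plus

BASE_SEARCH_STR = "https://www.google.com/maps/dir"

def update_input_replace(addr_string):
    # str.replace swaps every space for a plus sign in a single call
    return addr_string.replace(' ', '+')

def build_directions_url(origin_address, destination_address):
    # Build the same URL as open_google_maps_url, but return it instead of opening it
    return "/".join([BASE_SEARCH_STR,
                     update_input_replace(origin_address),
                     update_input_replace(destination_address)])

example_url = build_directions_url("416 McKenna Hall, Notre Dame, IN 46556",
                                   "1600 Pennsylvania Avenue NW, Washington, DC 20500")
print(example_url)

# quote_plus also turns spaces into '+', and percent-encodes the comma as %2C
print(quote_plus("Notre Dame, IN"))
```

Returning the URL from a helper makes it easy to check the string before handing it to <code>open_url</code>.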

#### To simplify the testing of <code>get_directions</code>, you can copy and paste these two addresses (although you can use different addresses if you wish):
<ol>
    <li>Origin address: <code>416 McKenna Hall, Notre Dame, IN 46556</code></li>
    <li>Destination address: <code>1600 Pennsylvania Avenue NW, Washington, DC 20500</code></li>
</ol>

In [None]:
get_directions()

## How do I retrieve something from the web?

In [15]:
import requests

main_building_tour_url = 'https://tour.nd.edu/locations/main-building/'

# Make the request
url_response = requests.get( main_building_tour_url )

In [16]:
url_response

<Response [200]>

You most likely got <code>Response [200]</code> when running the last command, which means the request succeeded. Check out what that code means at https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/200
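
If you want to look up a status code without leaving the notebook, the standard library's <code>http.HTTPStatus</code> enumeration maps each numeric code to its reason phrase. This is a small sketch; <code>describe_status</code> is a hypothetical helper name, not part of <code>requests</code>.

```python
from http import HTTPStatus

def describe_status(code):
    # HTTPStatus maps a numeric code to its standard reason phrase
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"

for code in (200, 404, 500):
    print(describe_status(code))
```

With a real response, you could pass <code>url_response.status_code</code> to the helper.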

In [17]:
# Combining concepts: What is the type?
type(url_response)

requests.models.Response

In [18]:
# How big is the HTML file?
len(url_response.text)

27892

In [None]:
# View the HTML by printing the url_response text
print(url_response.text)

### You can view the HTML code on your own by opening the page and then right-clicking the page and selecting View Source

In [None]:
open_url( main_building_tour_url )

### Now let's see what happens if there is a bad URL!

In [None]:
bad_request_result = requests.get( 'https://www.nd.edu/Pupfessor' )

bad_request_result.raise_for_status()

### Now that we know a potential issue, let's use try/except to handle bad URLs, such as those that return 404 errors!

In [None]:
# Importing again so you can start here
import requests
import webbrowser

def open_url_fixed( url_string ):

    try:
        
        request_result = requests.get( url_string )
        
        request_result.raise_for_status()
        
        webbrowser.open( url_string )
        
        print( f'Successful opening of {url_string}')
        
    except requests.exceptions.RequestException:
        
        print( f'{url_string} is not a valid URL')

In [None]:
# Working URL
open_url_fixed( nd_url_string )

In [None]:
# Bad URL
open_url_fixed( 'https://pupfessor.nd.edu' )

## Saving Downloaded Files to the Hard Drive

In [20]:
# Importing again so you can start here
import requests

large_file_example_url = 'https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt'

large_file_res = requests.get(large_file_example_url)

# Check that it works
large_file_res.raise_for_status()

# In-Class Coding: open the output file in binary write mode
playFile = open('Shakespeare.txt', 'wb')

# Instead of using the default 100000 the text presents, let's get the actual size in bytes!
play_file_len = len(large_file_res.content)

# Why? Because the works of Shakespeare exceed 100000 bytes!
print(play_file_len)

# We will iterate using the iter_content function, which yields chunks of at most play_file_len bytes
for chunk in large_file_res.iter_content(play_file_len):
    playFile.write(chunk)


# If we open a file, what must we do?
playFile.close()

5458199
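
Writing in fixed-size chunks keeps memory use bounded when a file is very large. The sketch below illustrates the idea with the standard library only. The in-memory <code>io.BytesIO</code> object is a stand-in for a download (an assumption for this demo), and <code>save_in_chunks</code> and <code>demo_path</code> are hypothetical names. With <code>requests</code>, the loop would instead iterate over <code>iter_content(chunk_size)</code>.

```python
import io
import os
import tempfile

def save_in_chunks(source, file_path, chunk_size=100000):
    # The with statement closes the file automatically, even if an error occurs
    chunks_written = 0
    with open(file_path, 'wb') as out_file:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break          # an empty read means the source is exhausted
            out_file.write(chunk)
            chunks_written += 1
    return chunks_written

# Stand-in for a download: 250,000 bytes held in memory (an assumption for this demo)
fake_download = io.BytesIO(b'x' * 250000)
demo_path = os.path.join(tempfile.gettempdir(), 'chunk_demo.bin')

num_chunks = save_in_chunks(fake_download, demo_path)
print(num_chunks, os.path.getsize(demo_path))
```

Using <code>with</code> answers the question in the cell above: the file is closed for you.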


## What is Notre Dame's IP Address using Sockets?

In [4]:
import socket

In [None]:
socket.gethostbyname('nd.edu')

In [None]:
socket.gethostbyname('google.com')

## What is my IP address using Sockets?

In [None]:
# Here you will see how to print your hostname and your IP Address
hostname = socket.gethostname()
hostname

### Look familiar? You found this name on the first lab! Now let's get your IP Address

In [None]:
socket.gethostbyname(hostname)

### localhost is the default hostname for your own machine's loopback IP address (usually 127.0.0.1)

In [None]:
socket.gethostbyname('localhost')
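
To check whether an address refers back to your own machine, the standard library's <code>ipaddress</code> module can classify it. This is a minimal sketch; <code>is_loopback</code> is a hypothetical helper name, and the second address is simply the example IP used earlier in this notebook.

```python
import ipaddress

def is_loopback(ip_string):
    # ip_address parses the text; is_loopback is True for 127.0.0.0/8 and ::1
    return ipaddress.ip_address(ip_string).is_loopback

print(is_loopback('127.0.0.1'))
print(is_loopback('142.250.190.78'))
```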

## What services are running on my machine?

### We will try every possible port and build a list of the ports that are already in use on the machine

In [None]:
# Python code for simple port scanning

# Get and print the local IP address
local_ip = socket.gethostbyname(hostname)

print( local_ip )

# Create an empty list of ports
port_list = []

for port in range(1, 65536):      # valid TCP ports are 1 through 65535

    # create a new socket
    serv = socket.socket( socket.AF_INET, socket.SOCK_STREAM )

    try:

        # try to bind the socket to the address and port
        serv.bind( (local_ip, port) )

    except OSError:

        # If the bind fails, the port is already in use by the computer
        # (on some systems, low-numbered ports also fail without elevated privileges)
        print('[OPEN] Port open :', port)

        # Append the port number to the list
        port_list.append(port)

    finally:

        # Always release the socket, whether or not the bind succeeded
        serv.close()
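
The scan relies on one fact: binding to a port that another socket already holds raises an error. The sketch below demonstrates that in isolation by occupying a port itself and then probing it. The names <code>port_in_use</code>, <code>listener</code>, and <code>busy_port</code> are hypothetical, and the demo uses the loopback address so it does not depend on your network setup.

```python
import socket

def port_in_use(ip, port):
    # Try to bind; an OSError means the bind was refused
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((ip, port))
        return False
    except OSError:
        return True
    finally:
        probe.close()

# Demo: occupy a port ourselves, then probe it
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))      # port 0 asks the OS for any free port
listener.listen(1)
busy_port = listener.getsockname()[1]

in_use_while_listening = port_in_use('127.0.0.1', busy_port)
listener.close()
print(busy_port, in_use_while_listening)
```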
  

In [None]:
for current_port in port_list:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    
    try:
        # Connect to the same address that the scan above used
        s.connect((local_ip, current_port))
        
        print(f'Successfully connected to {local_ip}:{current_port}...')
        
    except OSError as err:
        # Print the actual exception, not the socket.error class
        print(f'Unable to connect to {local_ip}:{current_port}, {err}')
        
    finally:
        s.close()

### Compare and Contrast
Look at the result of the <code>!netstat</code> command from the previous lecture. Compare its "Listening" entries with the ports found above. What have we determined?

### Get a large file from the internet

In [24]:
import requests 

# For this to work, the URL must include the scheme (http:// or https://)
response = requests.get( main_building_tour_url )

In [25]:
html_string = response.text
print(html_string)


<!doctype html>
<html lang="en" class="no-js">
<head>
<meta charset="utf-8">
<script type="text/javascript">window.NREUM||(NREUM={});NREUM.info={"beacon":"bam.nr-data.net","errorBeacon":"bam.nr-data.net","licenseKey":"db51011748","applicationID":"9339","transactionName":"Jw4IFxdXCQgHExslVwoFEwARVxcnDQ9AFFcIDQMRSlwMFxIAQAVQOwUHFwRaBBcHPkYDWwsTAg==","queueTime":0,"applicationTime":275,"agent":"","atts":"H0MTQV9DRwwNEkBEAkYVCRYXFgsATARQExoZHA=="}</script>
<script type="text/javascript">(window.NREUM||(NREUM={})).init={ajax:{deny_list:["bam.nr-data.net"]}};(window.NREUM||(NREUM={})).loader_config={licenseKey:"db51011748",applicationID:"9339"};window.NREUM||(NREUM={}),__nr_require=function(t,e,n){function r(n){if(!e[n]){var i=e[n]={exports:{}};t[n][0].call(i.exports,function(e){var i=t[n][1][e];return r(i||e)},i,i.exports)}return e[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var i=0;i<n.length;i++)r(n[i]);return r}({1:[function(t,e,n){function r(){}function i(

In [26]:
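# connect() expects a hostname, not a full URL. Passing main_building_tour_url here
# fails with: gaierror: [Errno 11001] getaddrinfo failed
host = 'tour.nd.edu'

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

s.connect((host, 80))

s.sendall(b'GET / HTTP/1.0\r\n')           # Client sends request to server
s.sendall(f'Host: {host}\r\n'.encode())     # A bytes literal does not fill in {names}, so encode an f-string
s.sendall(b'\r\n')                          # A blank line ends the request headers

data = s.recv(4096)                         # Client reads response from server
while data:
    print(data.decode(errors='replace'))
    data = s.recv(4096)

s.close()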
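
A socket's <code>connect</code> call expects a bare hostname, so a full URL has to be split into its parts first. The sketch below shows one way to do that with <code>urllib.parse.urlsplit</code> and to build the request bytes with the <code>\r\n</code> line endings that HTTP uses. <code>build_http_get</code> is a hypothetical helper name, and the sketch only constructs the request without sending it.

```python
from urllib.parse import urlsplit

def build_http_get(url):
    # Split the URL; connect() needs only the hostname, not the scheme or path
    parts = urlsplit(url)
    host = parts.hostname
    path = parts.path or '/'
    # HTTP header lines end with \r\n; a blank line ends the header block
    request = f'GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n'.encode('ascii')
    return host, request

host, request = build_http_get('https://tour.nd.edu/locations/main-building/')
print(host)
print(request)
```

The returned <code>host</code> is what you would pass to <code>s.connect((host, 80))</code>, and <code>request</code> is what you would send.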

## How do we measure latency?

In [27]:
import socket
import time

def measure_latency(domain):
    # Measure latency by doing the following:

    # 1. Create streaming internet socket.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    
    # 2. Record start time (perf_counter is designed for timing short intervals).
    start_time = time.perf_counter()
    
    # 3. Connect to specified domain at port 80 (this includes the DNS lookup).
    s.connect((domain, 80))
    
    # 4. Record end time.
    end_time = time.perf_counter()
    
    # 5. Close the socket now that the measurement is done.
    s.close()
    
    # 6. Compute latency in milliseconds: latency = elapsed seconds * 1000
    latency = (end_time - start_time) * 1000
    
    return latency

In [28]:
measure_latency( 'www.nd.edu' )

34.658193588256836

In [30]:
measure_latency( 'www.google.com' )

15.355587005615234
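
A single connection time can vary from run to run, so one common refinement is to take several samples and report the median. The sketch below is one way to do that, not part of the original lecture: <code>median_latency_ms</code> is a hypothetical name, and the demo measures a listener on the loopback address so that it runs without internet access. To measure a real site, you would pass a domain and port 80 instead.

```python
import socket
import statistics
import time

def median_latency_ms(host, port, samples=5):
    # Time several TCP connects and return the median, in milliseconds
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection opens the socket; the with block closes it afterwards
        with socket.create_connection((host, port), timeout=5):
            elapsed = time.perf_counter() - start
        timings.append(elapsed * 1000)
    return statistics.median(timings)

# Demo against a local listener so the sketch runs without internet access
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))      # port 0 asks the OS for any free port
listener.listen(5)
local_port = listener.getsockname()[1]

local_latency = median_latency_ms('127.0.0.1', local_port, samples=3)
listener.close()
print(f'{local_latency:.3f} ms')
```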