# Introduction

Effective programming combines problem-solving skills, domain knowledge, and programming skills. The goal of these notebooks is not necessarily to teach you Python, but rather to show how you can solve problems and perform tasks with computer programs. These notebooks target each of these areas.

As you learn to program, realize that learning to program is not a passive activity. You can't simply read documentation and these notebooks. It takes targeted practice, and that practice takes time. These notebooks present fundamental concepts, show how those concepts are implemented in Python, and show how you can solve real problems with them. They contain a large amount of sample code. Not only should you run this code, you should also make changes to it and see what happens. Don't be afraid to make mistakes. Try things! The computer doesn't care and won't make fun of you. You should complete the exercises at the end of each notebook. Yes, we have provided the answers. However, you will really learn by attempting the exercises on your own. The exercises have also been written to reinforce the concepts presented.

In the next notebook, we'll present an approach to solving problems with programming. The key principle of this approach is to understand what occurs and then how we can translate that into a series of steps. Most importantly, plan how to implement that approach before you start writing code.

To start, this notebook contains three separate programs that demonstrate some of the potential of computer programs as well as some fundamental concepts within computer science. As these may be some of the first Python programs you have seen, we'll provide a detailed explanation of each step.


## Sample Program: The Wayback Machine

The following program allows the user to enter a particular website and date. The program will then query the "Wayback Machine" hosted at https://archive.org/ to find a copy of that website closest to the entered date and open the results in a new browser window.

```python
import webbrowser
import json
from urllib.request import urlopen

print("Let's find an old website.")
site = input("Type a website URL: ")
era = input("Type a year, month, and day, like 20140410: ")
url = "http://archive.org/wayback/available?url=%s&timestamp=%s" % (site, era)
response = urlopen(url)
contents = response.read()
data = json.loads(contents)
try:
    old_site_url = data["archived_snapshots"]["closest"]["url"]
    print("Found this copy: ", old_site_url)
    print("It should appear in your browser now.")
    webbrowser.open(old_site_url)
except Exception:
    print("Sorry, no luck accessing", site)
```

Source: _Introducing Python: Modern Computing in Simple Packages, 2nd Ed_, Bill Lubanovic [O'Reilly](https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/) [Amazon](https://www.amazon.com/Introducing-Python-Modern-Computing-Packages-dp-1492051365/dp/1492051365)

While you may be new to programming, hopefully you can see roughly how the previous code block works. In command mode for the previous cell, type <kbd class="key-l">l</kbd> to see line numbers.

* lines 1 and 2 let the program use the `webbrowser` and `json` libraries. The `webbrowser` library lets us open a new browser window with a given URL (line 16). [JSON](https://www.json.org/) has become one of the standards for exchanging data through internet-based APIs. We'll cover JSON in more detail as the course progresses.
* line 3 allows us to use the `urlopen()` function from the `urllib.request` module.
* line 5 prints a message to the console telling the user what the program will do.
* line 6 allows the user to type in the URL for a particular website. For this example, I used `http://irishwildcat.com`, an old blog that's no longer available on the Internet.
* line 7 gets a date in a particular format: the year, then the month, and then the day. This representation is based on an international standard, [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601).
* line 8 builds the query URL for the Wayback Machine at https://archive.org/, inserting the website and date into the string with `%` formatting, and stores it in the variable `url`.
* line 9 opens a connection to that URL on archive.org and places the result into the variable `response`.
* line 10 then reads the raw output (as bytes) from that connection, placing it into the variable `contents`.
* line 11 converts `contents` (which contains a JSON object) into a Python dictionary. A dictionary stores data in key-value pairs; we'll cover this in much greater detail a few notebooks from now.
* lines 12-18 form a `try`/`except` block. Python runs lines 13-16; if any error occurs there (for example, when no archived copy exists), the error is caught and the user is shown the message from line 18.
* line 13 grabs a specific URL from the results of the Wayback Machine.
* line 14 prints that URL.
* line 15 tells the user that we will be opening that URL in a new browser window.
* line 16 then opens a browser window with that URL. NOTE: this will not work in Google Colaboratory.

Note: if you want to see some of these intermediate values, you can edit the source code above and insert a line such as the following just after the variable is assigned (before line 12):
```
print(contents)
```
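Two of the steps above can be tried without a network connection. The sketch below is not part of the original program; it assumes a made-up site (`example.com`) and a hand-written sample response shaped like what we understand the availability API to return (it is not real output). It validates the date with `datetime.strptime`, builds the query with `urllib.parse.urlencode` (which escapes characters such as `&` or `?` that the `%` formatting on line 8 would pass through unchanged), and walks the nested dictionary the way line 13 does.

```python
import json
from datetime import datetime
from urllib.parse import urlencode

site = "example.com"   # hypothetical input
era = "20140410"

# strptime raises ValueError if era is not a valid YYYYMMDD date
datetime.strptime(era, "%Y%m%d")

# urlencode escapes special characters so they can't break the query string
url = "http://archive.org/wayback/available?" + urlencode({"url": site, "timestamp": era})
print(url)

# A made-up response shaped like the availability API's JSON (not real output)
contents = '''{"url": "example.com",
  "archived_snapshots": {"closest": {"available": true,
    "url": "http://web.archive.org/web/20140410000000/http://example.com/",
    "timestamp": "20140410000000", "status": "200"}}}'''
data = json.loads(contents)   # JSON object -> Python dict
print(type(data))

# Each [...] looks up one key, one level deeper in the nested dictionaries
print(data["archived_snapshots"]["closest"]["url"])

# Assumed shape when nothing is archived: an empty "archived_snapshots" object
empty = json.loads('{"url": "example.com", "archived_snapshots": {}}')
closest = empty["archived_snapshots"].get("closest")
print(closest)
```

Using `.get()` instead of `[...]` returns `None` rather than raising a `KeyError`, which is the error the original program's `try`/`except` ends up catching when no copy is found.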

## Sample Program: A Google Search
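The following series of Python commands performs a Google search. It uses the third-party `googlesearch` module, which is not part of Python's standard library and must be installed separately.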

```python
from googlesearch import search
```

```python
help(search)
```

The previous cell displays help information about the `search()` function from the `googlesearch` module.

Now, in the following cell, the program searches Google for the query "financial technology" and converts the results into a list, a built-in data structure that holds a sequence of items in a particular order.

As we call `search`, we pass 4 arguments:
* query terms
* num - number of search results per page
* stop - last result to retrieve
* pause - a time in seconds to wait between making requests to Google.
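To see what wrapping the call in `list(...)` does without contacting Google, here is a sketch built around a hypothetical stand-in generator, `fake_search`, which is not part of `googlesearch`. It assumes, as we understand it, that `search` hands back its results one at a time rather than as a ready-made list; the placeholder URLs are invented.

```python
def fake_search(query, stop):
    """Hypothetical stand-in for a search generator: yields placeholder URLs."""
    for i in range(stop):
        # yield hands back one result at a time instead of building a list
        yield f"https://example.com/{query.replace(' ', '-')}/{i}"

results = fake_search("financial technology", stop=3)
print(results)                   # a generator object, not the URLs themselves
search_results = list(results)   # list() pulls every item out, in order
print(search_results)
print(len(search_results))
```

Once `list()` has consumed a generator, it is empty; calling `list(results)` a second time gives `[]`. Storing the results in a list lets us count them and look them up by position as often as we like.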


```python
search_results = list(search("financial technology", num=10, stop=30, pause=2))
```

This next cell displays the documentation associated with the type of `search_results`, which is a `list`.

```python
help(search_results)
```

The next code block prints the number of entries in the list `search_results`, followed by the entry at index 3 (the fourth entry). For largely historical reasons tied to what was once a performance optimization, most programming languages, including Python, start counting at 0, so the last item is at index length - 1 (see [Zero Based Numbering](https://en.wikipedia.org/wiki/Zero-based_numbering)).

Try running the following block with different values in place of `3`. What happens if `[3]` is removed?

```python
print(len(search_results))
print(search_results[3])
```
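Zero-based numbering is easy to explore on a small list you control. This sketch uses a hypothetical list of city names in place of the search results, so it runs without a network connection.

```python
cities = ["Boston", "Chicago", "Denver", "Seattle"]

print(len(cities))              # 4
print(cities[0])                # Boston  (the first item is at index 0)
print(cities[3])                # Seattle (the fourth item)
print(cities[len(cities) - 1])  # Seattle (the last item is at index length - 1)

try:
    print(cities[4])            # one past the last valid index
except IndexError as err:
    print("IndexError:", err)   # IndexError: list index out of range
```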

This next code block takes that particular URL, opens a network connection to it, reads the raw bytes of the page, and decodes them into the string `mystr`.

Then the code uses the BeautifulSoup library to parse the returned HTML document, remove elements such as `style` and `script` that don't hold visible text, and extract the remaining text content, placing the results in `text`.

```python
import urllib.request
from bs4 import BeautifulSoup

url = search_results[3]

# open a network connection to the URL and read the raw bytes of the page
with urllib.request.urlopen(url) as fp:
    mybytes = fp.read()

# convert from bytes into a string representation (undecodable bytes are replaced)
mystr = mybytes.decode("utf-8", errors="replace")

soup = BeautifulSoup(mystr, "html.parser")
# remove elements whose contents are not visible page text
for s in soup(['style', 'script', 'head', 'title']):
    s.extract()
text = soup.get_text(separator="\n")
# break into lines and remove leading and trailing space on each
lines = (line.strip() for line in text.splitlines())
# break multi-headlines into a line each
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
# drop blank lines
text = '\n'.join(chunk for chunk in chunks if chunk)

print(text)
```

Yes, the output is rather messy. Part of the challenge in scraping websites like this is figuring out what to keep and how to combine the results effectively. Researchers then look for ways to find meaning in that text.
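To see what the three cleanup steps at the end of the scraping code actually do, here they are wrapped in a small function and applied to a hand-written sample string (invented for illustration, not scraped from any site). The comment on the double-space split reflects our reading of its intent: runs of two spaces often separate items that were laid out side by side on the page.

```python
def clean_text(text):
    """Apply the same three cleanup steps used in the scraping code to a block of text."""
    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # split on double spaces so side-by-side items become separate lines
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    return "\n".join(chunk for chunk in chunks if chunk)

raw = "   Fintech News  \n\n\n  Markets  Banking  Crypto\n   \n  Read more...  "
print(clean_text(raw))
```

Because `lines` and `chunks` are generator expressions, each step processes one item at a time as `join` asks for it, rather than building a full intermediate list.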

## Sample Program: Traveling Salesman and Solvability
The lecture slides presented computer science as being concerned with whether a particular problem can be solved and, if it is solvable, how expensive it is to solve.

One of the most well-studied problems in computer science is the [travelling salesperson problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem). Suppose a salesperson has to visit $n$ cities and return to the starting point. What is the shortest route that visits each city exactly once?

The brute-force method would be to try all possible [permutations](https://en.wikipedia.org/wiki/Permutation) of the city ordering. With 3 cities, the following possibilities exist:
- City 1, City 2, City 3
- City 1, City 3, City 2
- City 2, City 1, City 3
- City 2, City 3, City 1
- City 3, City 1, City 2
- City 3, City 2, City 1

But how long would an exhaustive search for the optimal answer take?

For $n$ cities, $n!$ possible permutations exist. For 3 cities, that is $3! = 3 \times 2 \times 1 = 6$, matching the list above.
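The brute-force method can be sketched for a tiny case. This is an illustration, not an efficient solver: it assumes 4 cities with a made-up table of distances, uses `itertools.permutations` to generate every ordering (confirming there are $4! = 24$ of them), and keeps the ordering whose round trip is shortest.

```python
import math
from itertools import permutations

# Hypothetical symmetric distances between 4 cities (made-up numbers)
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def tour_length(order):
    """Total distance visiting cities in `order` and returning to the start."""
    total = 0
    for i in range(len(order)):
        # (i + 1) % len(order) wraps around so the last leg returns to the first city
        total += dist[order[i]][order[(i + 1) % len(order)]]
    return total

n = len(dist)
all_orders = list(permutations(range(n)))   # every possible ordering of the cities
print(len(all_orders), math.factorial(n))   # 24 24

best = min(all_orders, key=tour_length)     # brute force: check every ordering
print(best, tour_length(best))              # (0, 1, 3, 2) 18
```

Checking 24 orderings is instant, but the list of orderings grows as $n!$, which is exactly the problem the next calculation explores.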

Assuming we can check 1 billion possibilities per second, how much time is required to solve the problem for 20 cities? What about 100?


```python
import math

num_sec_in_day = 24 * 60 * 60
num_tries_per_second = 1_000_000_000
num_cities = 20
num_route_permutations = math.factorial(num_cities)
num_days = num_route_permutations // num_tries_per_second // num_sec_in_day
print(f"{num_days:,}")
```

Try running the above code block with different values for the number of cities. If you use a small value, you'll end up with zero because the code uses integer (floor) division `//`, which discards any fractional part. For small values, take out `// num_sec_in_day` to see the number of seconds instead, or change the assumption of 1 billion tries per second. How could you convert the result to the number of years?
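The difference between `//` and `/` shows up clearly with 10 cities, keeping the same assumption of 1 billion tries per second. Floor division throws the fractional part away, while true division keeps it as a floating-point number.

```python
import math

num_tries_per_second = 1_000_000_000
num_route_permutations = math.factorial(10)   # 3,628,800 routes for 10 cities

print(num_route_permutations // num_tries_per_second)   # 0 (floor division drops the fraction)
print(num_route_permutations / num_tries_per_second)    # 0.0036288 (seconds, true division)
```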

This problem has direct real-life applications:
- Creating routes on maps
- Delivery schedules for companies such as Amazon, FedEx, and UPS.

## Python Documentation
Python offers a substantial amount of documentation - both from within the interpreter with the `help()` function as well as online.
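As a small sketch of the in-interpreter documentation, `help()` works on any object, including built-in functions such as `len`. The same text is usually also available as a plain string through the object's `__doc__` attribute, which can be handy when you just want to print or search it.

```python
# help() prints the documentation for an object inside the interpreter
help(len)

# the docstring is also stored as a string in the __doc__ attribute
print(len.__doc__)
```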

The homepage for the Python documentation is at https://docs.python.org/3/   
Visit that site and see what's available.