<div class="pagebreak"></div>

# Introduction

Effective programming combines problem-solving skills, domain knowledge, and programming skills; these notebooks will target each of these different areas.  The overall goal, though, is not necessarily to teach you Python but rather how you can solve problems and perform tasks with computer programs.

A program is a series of instructions to carry out a specific task.  These tasks span a wide range of possibilities:
- solving mathematical problems
- processing images and text
- playing video games
- analyzing financial data
- making decisions (which covers another range from answering simple yes/no questions to driving a car).

Learning to program is not a passive activity.  You cannot simply read documentation and these notebooks.  Becoming an effective programmer takes targeted practice, and that practice takes time.  These notebooks will present fundamental concepts, how Python implements those concepts, and then how you can solve real problems with those concepts.

These notebooks contain a large amount of Python code. Not only should you run this code, but you should also make changes to the code and see what happens. Do not be afraid to make mistakes - try things! The computer does not care and will not make fun of you. You should also complete the exercises at the end of each notebook. Yes, we provide many of the answers. However, you will learn more by attempting the exercises on your own first. We have written these exercises to reinforce the concepts presented.

The following notebook presents an approach to solving problems with programming.  The fundamental principle is to understand the problem and how we can translate it into a series of steps.  Most importantly, plan how to implement that approach before you write any code; once we have those steps, we can then write the code.  As you work through this process, keep paper handy to take notes and record your thoughts.

To start our journey to learn computer programming through Python, this notebook contains three separate programs to demonstrate some of the capabilities of computer programs and some fundamental concepts within computer science. As these are some of the first Python programs you might have seen, we provide detailed explanations. We do not expect that you will now be able to write equivalent programs.

## Sample Program: The Wayback Machine

The following program allows users to enter a particular web address (URL) and date. The program will then query the "Wayback Machine" hosted at https://archive.org/ to find a copy of that web address closest to the entered date and open the results in a new browser window.

In [2]:
import webbrowser
import json
from urllib import request 
    
print("Let's find an old website.")
site = input("Type a website URL: ")
era = input("Type a year, month, and day, like 20140410: ")
url = "http://archive.org/wayback/available?url=%s&timestamp=%s" % (site, era)
response = request.urlopen(url)
contents = response.read()
data = json.loads(contents)
try:
    old_site_url = data["archived_snapshots"]["closest"]["url"]
    print("Found this copy: ", old_site_url)
    print("It should appear in your browser now.")
    webbrowser.open(old_site_url)
except Exception:  # any runtime error, e.g. no archived copy was found
    print("Sorry, no luck accessing", site)

Let's find an old website.
Type a website URL: http://irishwildcat.com
Type a year, month, and day, like 20140410: 20140410
Found this copy:  http://web.archive.org/web/20140317220451/http://irishwildcat.com:80/
It should appear in your browser now.


Source: _Introducing Python: Modern Computing in Simple Packages, 2nd Ed_, Bill Lubanovic [O'Reilly](https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/) [Amazon](https://www.amazon.com/Introducing-Python-Modern-Computing-Packages-dp-1492051365/dp/1492051365)

While you may be new to programming, hopefully you can see how the previous code block worked.

A few general notes:
1. This code makes substantial use of existing modules and functions.  `webbrowser`, `json`, and `urllib` are modules - collections of code that others have written and that we can use in our own code. One of the benefits of most programming languages is their libraries - both those delivered as part of the programming platform (the "standard library") and those that others have written.  These libraries abstract away many of the details needed to perform specific tasks, making those tasks much simpler than if we had to write everything ourselves.
2. Statements that look like _name_(_value_) are function calls.  These function calls allow us to access code that has previously been written to perform a certain task.
3. Statements that look like _name_._name_(_value_) are also function calls, but these functions belong to modules.
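
As a small illustration of notes 2 and 3, the following sketch calls one built-in function and one function that belongs to a module. The values used here are arbitrary examples and do not come from the Wayback Machine program.

```python
import math  # a module from Python's standard library

# name(value): calling the built-in function len()
word_length = len("Python")

# name.name(value): calling the sqrt() function that belongs to the math module
root = math.sqrt(16)

print(word_length)  # 6
print(root)         # 4.0
```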

<br>Note: If you need to see line numbers, refer to the previous notebook for instructions to enable them.

* Lines 1 and 2 let the program use the `webbrowser` and `json` libraries.  The `webbrowser` library allows the program to open a new browser window with a URL (line 16).  [JSON](https://www.json.org/) has become one of the standards for exchanging data through internet-based APIs. The course will cover JSON in more detail in later notebooks. The `json` library provides code to parse the JSON data format.
* Line 3 allows us to use the `request` module from the `urllib` package.  A package is just a group of related modules. As you can see from these first three lines, one of Python's advantages is the large number of included standard libraries and available third-party open-source libraries.
* Line 5 prints a message to the console telling the user what the program will do.
* Line 6 allows the user to type in the URL for a particular website.  For this example, I used `http://irishwildcat.com`, an old blog no longer available on the Internet.
* Line 7 gets a date in a particular format: the year, then the month, then the day.  This representation is based upon an international standard - [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) - for dates and times.
* Line 8 creates a variable called `url` that holds the web address used to query the Wayback Machine (https://archive.org/) for the entered site and date.
* Line 9 opens a connection to that url on archive.org and places the result into the variable `response`.  It performs this by calling the `urlopen()` function within the `request` module.
* Line 10 then reads the text output from that connection, setting the output into the variable `contents`.
* Line 11 converts the contents of the text variable (which contains a JSON object) to a Python dictionary.  A dictionary stores data in key-value pairs - we discuss this in much greater detail in a later notebook.
* Lines 12-18 form a `try`/`except` statement, which runs a code block in a protected region.  If a Python error occurs while lines 13-16 execute, the interpreter skips the remaining lines of that block and runs line 18, which shows the user a message.
* Line 13 grabs a specific web address (URL) from the results of the Wayback Machine.
* Line 14 prints that URL.
* Line 15 tells the user that we will open that URL in a new browser window.
* Line 16 then opens a browser window with that URL. NOTE: this will not work in Google Colaboratory.

Note: if you want to see the values of some of these intermediate variables, you can edit the above source code and insert a line such as the following just after the variable is assigned (before line 12):
```
print(contents)
```
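
To make the JSON-to-dictionary step and the error handling more concrete, here is a minimal sketch that runs without a network connection. The text in `sample_contents` is hand-written for illustration and only mimics the keys the program reads; a real response may contain more fields. The empty `archived_snapshots` case is likewise an assumption about what a "not found" response looks like.

```python
import json

# Hand-written sample text (an assumption), shaped like the keys the program reads.
sample_contents = """
{
  "archived_snapshots": {
    "closest": {
      "url": "http://web.archive.org/web/20140317220451/http://irishwildcat.com:80/"
    }
  }
}
"""

data = json.loads(sample_contents)   # JSON text -> Python dictionary
old_site_url = data["archived_snapshots"]["closest"]["url"]
print(old_site_url)

# Assumed "not found" shape: the "closest" key is missing, so the lookup fails.
empty = json.loads('{"archived_snapshots": {}}')
try:
    empty["archived_snapshots"]["closest"]["url"]
    found = True
except KeyError:
    found = False
    print("Sorry, no luck")
```

Looking up a key that does not exist raises a `KeyError`, which is the kind of error the `try`/`except` region in the program is there to handle.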

## Sample Program: A Google Search
In this section, the following Python statements will perform a Google search, access one of the pages, and print the extracted text from that page.

First, we need to make available the functionality to perform the search:

In [None]:
from googlesearch import search

To see help information about the `search()` function from the `googlesearch` module, you can call the built-in function `help()`:

In [None]:
help(search)

Within a Jupyter notebook, we can also use a `?` after an item to see the help.

In [None]:
search?

In the following cell, the program searches Google for the terms "financial technology". The program then converts that result into a built-in data structure called a list that holds a sequence of "things/data" in a particular order.

As we call `search`, we pass four arguments:
* query terms
* num - number of search results per page
* stop - last result number to retrieve
* pause - a time in seconds to wait between making requests to Google.


In [None]:
search_results = list(search("financial technology", num=10, stop=30, pause=2))
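
The pattern of passing named arguments and wrapping the call in `list()` can be tried without contacting Google. The sketch below uses `fake_search`, a hypothetical stand-in written only for this illustration; it is not part of the `googlesearch` module, it ignores `num` and `pause`, and it assumes that results are produced one at a time.

```python
def fake_search(query, num=10, stop=5, pause=2.0):
    """Hypothetical stand-in for a search function; yields made-up URLs."""
    # num and pause are accepted only to mirror the call above; they are unused here
    for i in range(stop):
        yield f"https://example.com/{query.replace(' ', '-')}/{i}"

# list() collects every produced item, in order, into a list
results = list(fake_search("financial technology", num=10, stop=3, pause=2))
print(results)
```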

Note: if you receive the following error message, you need to make Python aware of the root certificates such that the Python interpreter can validate the secure connection to the web server that provides the search results:
```
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)>
```
On macOS, open a terminal window and execute the following commands (this assumes Python 3.10 is installed):
```
cd  /Applications/Python\ 3.10
./Install\ Certificates.command
```

This next cell displays the documentation associated with the type of `search_results`.

In [None]:
help(search_results)

The following code block prints the number of entries in the list `search_results` followed by the entry in the first position.  For largely historical reasons tied to what was once a performance optimization, most programming languages start counting at 0 and end at length-1 for the last item (see [Zero-Based Numbering](https://en.wikipedia.org/wiki/Zero-based_numbering)).

Try running the following block with different values in place of `0`.  What happens if `[0]` is removed?

In [None]:
print(len(search_results))
print(search_results[0])
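
Zero-based indexing can also be explored on a small list that does not depend on search results. The list below is made-up sample data.

```python
cities = ["Chicago", "Denver", "Boston"]   # made-up sample data

print(len(cities))               # 3
print(cities[0])                 # first item: Chicago
print(cities[len(cities) - 1])   # last item: Boston

# An index equal to the length is one past the end and raises an IndexError
try:
    cities[len(cities)]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)              # True
```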

This next code block takes one of the search results (the fourth, at index 3), opens a network connection to that URL, reads the response as bytes, and then decodes those bytes into the string variable `mystr`.

Then the code uses the BeautifulSoup library to parse the returned HTML document. The program then extracts all text content, placing the results in `text`.

In [None]:
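import urllib.request
from bs4 import BeautifulSoup

url = search_results[3]   # the fourth search result (index 3)

# open the URL and read the raw bytes; the connection closes automatically
with urllib.request.urlopen(url) as fp:
    mybytes = fp.read()

mystr = mybytes.decode("utf-8")   # convert from bytes into a string representation

# parse the HTML, naming the parser explicitly to avoid a warning
soup = BeautifulSoup(mystr, "html.parser")
# remove elements whose content is not visible page text
for s in soup(['style', 'script', '[document]', 'head', 'title']):
    s.extract()
text = soup.get_text(separator="\n")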
# break into lines and remove leading and trailing space on each
lines = (line.strip() for line in text.splitlines())
# break multi-headlines into a line each
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
# drop blank lines
text = '\n'.join(chunk for chunk in chunks if chunk)

print(text)

Yes, the output is rather messy.  Part of the challenge with scraping websites is to figure out what to keep and how to combine the results effectively. With a text analysis task such as this, researchers look to see how we can effectively extract the relevant text and then find meaning in that text. 
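
The clean-up steps at the end of the previous cell can be studied on their own. The sketch below applies the same three steps to a short, made-up string instead of a downloaded page.

```python
# Made-up text standing in for the text extracted from a web page
raw_text = "  Fintech news  \n\n  Payments    Lending  \n   \nContact us"

# break into lines and remove leading and trailing space on each
lines = (line.strip() for line in raw_text.splitlines())
# split each line wherever two consecutive spaces appear
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
# drop blank pieces and join what remains, one piece per line
cleaned = "\n".join(chunk for chunk in chunks if chunk)

print(cleaned)
```

The four spaces between "Payments" and "Lending" cause that line to be split into separate pieces, and the empty pieces produced along the way are dropped by the final filter.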

Again, many different things occur in the previous code block.  Do not worry if it does not all make sense.  Part of the purpose of these two code samples is to show you possible destinations on the coding journey that you have just started.

## Sample Program: Traveling Salesperson and Solvability
Fundamentally, computer science is concerned with asking if a particular problem can be solved and, if it is solvable, how expensive is a specific solution.

One of the most well-studied problems in computer science is the [traveling salesperson problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem).  Suppose a salesperson has to visit $n$ cities and return to the starting city.  What is the shortest route that visits each city exactly once?

The brute-force method would be to try all possible [permutations](https://en.wikipedia.org/wiki/Permutation) of the city routes. With three cities, the following possibilities exist:
- City 1, City 2, City 3
- City 1, City 3, City 2
- City 2, City 1, City 3
- City 2, City 3, City 1
- City 3, City 1, City 2
- City 3, City 2, City 1
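
The list above can be generated rather than written by hand. This sketch uses `permutations` from the standard library's `itertools` module; the city names are the placeholders from the list.

```python
from itertools import permutations

cities = ["City 1", "City 2", "City 3"]

# every possible ordering of the three cities
routes = list(permutations(cities))
for route in routes:
    print(", ".join(route))

print(len(routes))   # 6
```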

How long would it take to search all the possibilities to find the optimal answer?

For $n$ cities, $n!$ possible permutations exist.

Assuming we can search one billion possibilities in a second, how much time is required to solve the problem for 20 cities? 100 cities?

In [None]:
import math

num_sec_in_day = 24 * 60 * 60
num_tries_per_second = 1_000_000_000
num_cities = 20
num_route_permutations = math.factorial(num_cities)
num_days = num_route_permutations // num_tries_per_second // num_sec_in_day
print(f"{num_days:,}")

Try running the above code block with different values for the number of cities. If you use a small value, the program prints zero because the code uses integer division (`//`), which discards the remainder (e.g., `5 // 2` has the result of `2`, not `2.5`).  For small values, take out `// num_sec_in_day` to see the number of seconds, or alter the assumption of one billion tries per second. How could you convert the result to the number of years?  Try making these code changes in the above cell and re-running the cell. Part of being a computer scientist is to explore different possibilities.
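
The difference between the two division operators is easy to check directly. The sketch below also shows why a small number of cities (10 is used here as an arbitrary example) produces zero.

```python
import math

true_division = 5 / 2      # keeps the fractional part
floor_division = 5 // 2    # discards the remainder
remainder = 5 % 2          # the part that // discards

print(true_division)    # 2.5
print(floor_division)   # 2
print(remainder)        # 1

# 10! is 3,628,800, which is less than one billion, so // gives 0 seconds
seconds_for_10_cities = math.factorial(10) // 1_000_000_000
print(seconds_for_10_cities)   # 0
```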

[You can also step through the code to see what occurs on each program step](https://pythontutor.com/render.html#code=import%20math%0A%0Anum_sec_in_day%20%3D%2024%20*%2060%20*%2060%0Anum_tries_per_second%20%3D%201_000_000_000%0Anum_cities%20%3D%2020%0Anum_route_permutations%20%3D%20math.factorial%28num_cities%29%0Anum_days%20%3D%20num_route_permutations%20//%20num_tries_per_second%20//%20num_sec_in_day%0Aprint%28f%22%7Bnum_days%3A,%7D%22%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false).

The traveling salesperson problem applies directly to real life:
- Creating routes on maps
- Delivery schedules for companies such as Amazon, FedEx, and UPS

## What is a Program?
As mentioned at the start of this notebook, a program is a series of instructions to carry out a specific task.
As you ran through these different examples, you may have noticed a few commonalities.

1. Each of these programs had some sort of input.  The first example asked for the input from the user. The other two had "hard-coded" inputs through the search terms and the number of cities.
2. Each of the programs had some sort of output (result). 
3. All three had some form of sequence of commands with assignments, mathematical operations, or other statements.
4. Each program used variables to hold information containing the current state of the program.
5. Each program used existing libraries and functions to produce the desired functionality. One aspect of modern programming is not only learning the syntax (the rules) of a programming language, but learning how to find and use existing libraries. 

Additionally, as these notebooks will demonstrate, programs typically contain some form of conditional expression that determines if different blocks of code should execute. Finally, the last fundamental commonality is that programs will regularly have some form of repetition where an action (or sequence of actions) is repeated.
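
As a preview of those last two ideas, the following sketch reuses the traveling salesperson assumptions from earlier in this notebook. The `for` loop repeats the same steps for several city counts (chosen arbitrarily here), and the `if`/`else` chooses which block runs each time.

```python
import math

tries_per_second = 1_000_000_000
seconds_per_day = 24 * 60 * 60

verdicts = {}
for num_cities in [5, 10, 15, 20]:          # repetition
    seconds = math.factorial(num_cities) / tries_per_second
    if seconds < seconds_per_day:           # conditional
        verdicts[num_cities] = "under a day"
    else:
        verdicts[num_cities] = "a day or more"
    print(num_cities, verdicts[num_cities])
```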

## Abstraction and Encapsulation

One of the keys to successful programming is to apply fundamental programming principles.  In this notebook, we have relied heavily upon two that frequently appear together: abstraction and encapsulation. Abstraction distills a concept into its fundamental parts, giving us a simplified outside view of an object or function, while encapsulation hides the necessary and sometimes complex details of the implementation. For example, to query the Wayback Machine, many details were encapsulated by the `urllib` module. We did not have to concern ourselves with opening a network connection, following the HTTP protocol, or parsing the raw HTTP response. Instead, the `urllib` module handled those tasks and offered an abstracted (simplified) view: open a URL and read the response - the fundamental operations.

We take advantage of abstraction, encapsulation, and other programming principles when we use existing classes, modules, and functions. As a result, we can solve real-life problems by focusing on the essential parts without understanding the precise implementation details. Throughout these notebooks, we will apply programming principles to model real-life systems, problems, and tasks.
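
We can apply the same principles to our own code. In this sketch, the traveling salesperson calculation from earlier is wrapped in a function; `days_to_search` is a hypothetical name chosen for this illustration. The caller only needs to know what to pass in and what comes back, not how the arithmetic is done.

```python
import math

def days_to_search(num_cities, tries_per_second=1_000_000_000):
    """Return the whole number of days needed to try every route permutation."""
    seconds_per_day = 24 * 60 * 60
    return math.factorial(num_cities) // tries_per_second // seconds_per_day

# The caller sees a simple view: a number of cities in, a number of days out
print(f"{days_to_search(20):,}")   # 28,158
```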

## Python Documentation
Python offers substantial documentation - both from within the interpreter with the `help()` function and online.

The homepage for the Python documentation is at https://docs.python.org/3/   
Visit that site and see what is available.