# Introduction

Congratulations on making it this far: you can now view, edit, and run [Jupyter Notebooks](https://jupyter.org/).

We'll use Jupyter Notebooks for lectures, code demonstrations, and exercises.  You should run these notebooks yourself. Don't be afraid to make changes, try code snippets other than the ones provided, and experiment.

The Jupyter environment follows a very familiar [user interface](https://en.wikipedia.org/wiki/WIMP_(computing)) pattern; the main difference is the ability to execute code within individual cells.  As you scroll through this notebook, you'll see that it is composed of a sequence of cells.  Each cell contains either code or [markdown](https://en.wikipedia.org/wiki/Markdown).  You will need to execute all of the code cells, generally in order, but you can modify and re-run cells.  We use the markdown cells to present content (e.g., this cell) or to give instructions and exercises.

Additionally, a particular cell can be in one of two modes:
1. Command - will have a blue border around the active/current cell.
2. Edit - in this mode, you can change the content of the cells.  A green border is present.

In either of these modes:
 * <span class="keys"><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-enter">Enter</kbd></span>  Run the current cell, and select the cell below 
 * <span class="keys"><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-enter">Enter</kbd></span>  Run the current cell, and insert a cell below 
 * <span class="keys"><kbd class="key-shift">Ctrl</kbd><span>+</span><kbd class="key-s">s</kbd></span> Save the current notebook (<span class="keys"><kbd class="key-shift">Command(⌘)</kbd><span>+</span><kbd class="key-s">s</kbd></span> on a Mac)

In the command mode:
* To see all shortcuts, type <kbd class="key-h">H</kbd>
* Switch to the edit mode by clicking the mouse on the cell or pressing <kbd class="key-enter">Enter</kbd>
* Use the up and down arrow keys to move among the cells
* Press <kbd class="key-a">a</kbd> to insert a cell above the current one
* Press <kbd class="key-b">b</kbd> to insert a cell below the current one
* To change the cell type to markdown, press <kbd class="key-m">m</kbd>
* To change the cell type to code, press <kbd class="key-y">y</kbd>

In the edit mode:
* Press <kbd class="key-esc">Esc</kbd> to go into the command mode
* Press <kbd class="key-tab">Tab</kbd> for code completion or indent


## Sample Program: The Wayback Machine
This notebook demonstrates a couple of Python scripts that retrieve results from a web service and then show you those results. As this may be one of the first Python programs you have seen, we'll provide a detailed explanation of each step.


In [None]:
import sys
!{sys.executable} -m pip install  google

The previous cell installs a third-party package that lets us perform Google searches and retrieve the results so that a program can process them.  [Package Source on GitHub](https://github.com/MarioVilas/googlesearch)

The previous cell's output depends on whether or not you already have the "google" package installed.  At the bottom of the cell, you will see either
```
Successfully installed google-3.0.0
```
or 
```
Requirement already satisfied: google in ......./site-packages (3.0.0)
```

The first line lets the script (program) use the `sys` module, which provides access to various parameters and settings used by the Python interpreter.  The corresponding Python documentation is available at <https://docs.python.org/3/library/sys.html>. Don't worry if the contents of that page don't make much sense yet; we will explain the usage of APIs as we go through this course.

The second line is a shell command: the leading `!` tells Jupyter to run the rest of the line in the system shell, and `{sys.executable}` is replaced with the path to the Python interpreter for the specific environment in which this notebook executes.  That interpreter then runs its "pip" module to install the latest version of the google package.  ("pip" can also be executed as a command-line program.)  Any dependencies required by the google package will also be installed, which is why you may see output related to beautifulsoup4 and soupsieve.

If you are already familiar with Python, you may be asking why not just execute `!pip install google`.  The short answer is that if the `pip` command found by your shell belongs to a different environment (for example, a different `venv`) than the Python interpreter running this notebook, the package will be installed into the wrong location.  Jake VanderPlas describes this on [his blog in much greater detail](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/).  Jake VanderPlas has also written an excellent book, _Python Data Science Handbook, 2nd Ed_ (O'Reilly, [Amazon](https://www.amazon.com/Python-Data-Science-Handbook-Essential-ebook/dp/B01N2JT3ST)), which you may want to examine.
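If you want to confirm which interpreter this notebook is using and whether a package is visible to it, a check along these lines may help. This is a sketch, not part of the original material: `installed_version` is a hypothetical helper name, and it relies on the standard library's `importlib.metadata` module (Python 3.8 or later). Note that the name pip installs (`google`) differs from the module name you import (`googlesearch`).

```python
import sys
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; it only looks in the environment
    of the interpreter that is currently running.
    """
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# The interpreter that `!{sys.executable} -m pip install ...` targets
print("Interpreter:", sys.executable)
# A version string if the package is installed here, otherwise None
print("google package:", installed_version("google"))
```

If `installed_version("google")` returns `None` even though the install cell reported success, that is a hint the package went into a different environment than the one running this notebook.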

In [None]:
import webbrowser
import json
from urllib.request import urlopen
    
print("Let's find an old website.")
site = input("Type a website URL: ")
era = input("Type a year, month, and day, like 20150613: ")
url = "http://archive.org/wayback/available?url=%s&timestamp=%s" % (site, era)
response = urlopen(url)
contents = response.read()
text = contents.decode("utf-8")
data = json.loads(text)
try:
    old_site = data["archived_snapshots"]["closest"]["url"]
    print("Found this copy: ", old_site)
    print("It should appear in your browser now.")
    webbrowser.open(old_site)
except:
    print("Sorry, no luck finding", site)
    


Source: _Introducing Python: Modern Computing in Simple Packages, 2nd Ed_, Bill Lubanovic [O'Reilly](https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/) [Amazon](https://www.amazon.com/Introducing-Python-Modern-Computing-Packages-dp-1492051365/dp/1492051365)

Even if you are new to programming, you can probably see roughly how the previous code block works.

* lines 1 and 2 let the program use the `webbrowser` and `json` modules.  The `webbrowser` module lets us open a URL in a browser window (line 17).  [JSON](https://www.json.org/) has become one of the standard formats for exchanging data through internet-based APIs. We'll cover this in more detail as the course progresses.
* line 3 allows us to use the `urlopen()` function from the `urllib.request` module.
* line 5 prints a message to the console telling the user what the program will do.
* line 6 allows the user to type in the URL for a particular website.  For this example, I used `http://irishwildcat.com`, an old blog that's no longer available on the internet.
* line 7 gets a date in a particular format: the year, then the month, then the day.  This representation is based on an international standard, [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601).
* line 8 creates a variable called `url` that holds the address of the Wayback Machine's availability service at https://archive.org/, with the website and date filled in.
* line 9 opens a connection to that URL on archive.org and places the result into the variable `response`.
* line 10 reads the raw bytes returned by that connection, placing them into the variable `contents`.
* line 11 decodes those bytes as UTF-8 text and assigns the resulting string to the variable `text`.
* line 12 converts the contents of the `text` variable (which contains a JSON object) to a Python dictionary.  A dictionary stores data in key-value pairs; we'll cover this in much greater detail a few notebooks from now.
* In lines 13-19, we execute a block of code inside a `try`/`except` statement.  If any Python errors occur in lines 14-17 (for example, because no archived copy exists), they will be caught and the user will be shown the message from line 19.
* line 14 grabs the URL of the closest archived copy from the Wayback Machine's results.
* line 15 prints that URL.
* line 16 tells the user that the archived page should now appear in their browser.
* line 17 then opens that URL in a browser window.

Note: if you want to see some of these intermediate values, you can edit the source code above and insert a line such as the following just after the variable is assigned:
```
print(text)
```
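You can also try the JSON and dictionary steps without touching the network. The sketch below is an illustration, not the book's code: `build_wayback_url` and `closest_snapshot` are hypothetical helper names, and the sample response is made up, shaped only to match the keys the program above reads (`archived_snapshots`, `closest`, `url`). It uses `urllib.parse.urlencode` to escape characters such as `:` and `/` in the website address, and `dict.get()` so that a missing snapshot produces `None` instead of an error.

```python
import json
from urllib.parse import urlencode


def build_wayback_url(site, era):
    """Build a Wayback Machine availability query (hypothetical helper)."""
    # urlencode escapes special characters, e.g. ':' -> %3A and '/' -> %2F
    query = urlencode({"url": site, "timestamp": era})
    return "https://archive.org/wayback/available?" + query


def closest_snapshot(data):
    """Return the closest snapshot URL, or None if there is none (hypothetical helper)."""
    # .get() returns a default instead of raising KeyError when a key is missing
    return data.get("archived_snapshots", {}).get("closest", {}).get("url")


print(build_wayback_url("http://irishwildcat.com", "20150613"))

# A made-up response, as raw bytes, in the shape the program above expects
contents = b'{"archived_snapshots": {"closest": {"url": "http://web.archive.org/web/20150613000000/http://irishwildcat.com/"}}}'
text = contents.decode("utf-8")  # bytes -> str
data = json.loads(text)          # JSON text -> Python dictionary
print(closest_snapshot(data))
print(closest_snapshot({"archived_snapshots": {}}))  # no snapshot -> None
```

Returning `None` lets the caller decide what to print, instead of relying on a bare `except:` that would also hide unrelated mistakes such as a misspelled variable name.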

## Sample Program: A Google Search
The following series of Python commands will perform a Google search.

In [None]:
from googlesearch import search

In [None]:
help(search)

The first of the two previous cells allows us to use the `search()` function from the `googlesearch` module that we installed earlier in this notebook; the second displays that function's documentation.

Now, in the following cell, the program searches Google for the query "financial technology". It then converts the results into a built-in data structure called a list, which holds a sequence of items in a particular order.

As we call `search`, we pass 4 arguments:
* query terms
* num - number of search results per page
* stop - last result to retrieve
* pause - a time in seconds to wait between making requests to Google.


In [None]:
search_results = list(search("financial technology", num=10, stop=30, pause=2))
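
As far as we can tell from the `google` package, `search()` is a generator: it produces result URLs one at a time, as they are requested, instead of returning a finished list. That is why the cell above wraps it in `list()`. The sketch below shows the difference using `fake_search`, a hypothetical stand-in that yields made-up URLs without contacting Google.

```python
from itertools import islice


def fake_search(query, stop):
    """Hypothetical stand-in for search(): yields made-up URLs one at a time."""
    for i in range(stop):
        yield f"https://example.com/{query.replace(' ', '-')}/{i}"


results = fake_search("financial technology", stop=30)
print(type(results))  # a generator, not a list

# list() runs the generator to completion and stores every result
all_results = list(fake_search("financial technology", stop=30))
print(len(all_results))  # 30

# islice() takes only the first few items, without building the whole list
first_three = list(islice(fake_search("financial technology", stop=30), 3))
print(first_three)
```

Taking only the items you need with `itertools.islice()` can be useful with a real search, since each additional page of results means another request to Google.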

This next cell displays the documentation associated with the type of `search_results`.

In [None]:
help(search_results)
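
When you pass a value to `help()`, Python shows the documentation for the value's type, which is why the output describes `list` rather than your search results. The built-in `type()` function tells you which type that is. The sketch uses `sample_results`, a hypothetical stand-in for `search_results`, so it runs without a network connection.

```python
# Hypothetical stand-in for search_results, so this runs without a network
sample_results = ["https://example.com/a", "https://example.com/b"]

print(type(sample_results))          # <class 'list'>
print(type(sample_results) is list)  # True

# help(list.append) documents a single method of that type;
# this prints the first line of the same documentation
print(list.append.__doc__.splitlines()[0])
```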

The next code block prints the number of entries in the list `search_results`, followed by the entry in the first position.  For largely historical reasons tied to what was once a performance optimization, most programming languages start counting at 0 and end at length-1 for the last item ([Zero-based numbering](https://en.wikipedia.org/wiki/Zero-based_numbering)).

Try running the following block with different values in place of `0`.  What happens if `[0]` is removed?

In [None]:
print(len(search_results))
print(search_results[0])
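
A small, self-contained list makes the counting rule easy to check. `urls` below is a hypothetical stand-in for `search_results`: the first item is at index 0, the last is at `len(urls) - 1`, and asking for index `len(urls)` raises an `IndexError`. Python also accepts negative indexes, which count back from the end.

```python
# Hypothetical list standing in for search_results
urls = ["first", "second", "third"]

print(urls[0])              # first: index 0 is the first item
print(urls[len(urls) - 1])  # third: the last item is at length - 1
print(urls[-1])             # third: negative indexes count from the end

try:
    print(urls[len(urls)])  # one position past the end
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```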

This next code block takes the first URL from the results, opens a network connection to it, reads the response, and decodes it into the string variable `mystr`.

Then the code uses the BeautifulSoup library to parse the HTML document that's returned and then extract all of the text content, placing the results in `text`.

In [None]:
import urllib.request
from bs4 import BeautifulSoup

url = search_results[0]

# open a network connection to the URL; "with" closes it automatically
with urllib.request.urlopen(url) as fp:
    mybytes = fp.read()

# decode the bytes as UTF-8, replacing any bytes that are not valid UTF-8
mystr = mybytes.decode("utf8", errors="replace")

# parse the HTML with Python's built-in parser
soup = BeautifulSoup(mystr, "html.parser")
# remove elements whose content is not visible page text
for s in soup(["style", "script", "head", "title"]):
    s.extract()
text = soup.get_text(separator="\n")
# break into lines and remove leading and trailing space on each
lines = (line.strip() for line in text.splitlines())
# break multi-headlines into a line each
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
# drop blank lines
text = '\n'.join(chunk for chunk in chunks if chunk)

print(text)
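
The last three cleanup steps in that cell are plain string operations, so you can experiment with them without fetching a web page. The sketch below repeats them inside `clean_text`, a hypothetical helper name, and runs them on a made-up string.

```python
def clean_text(text):
    """Apply the same whitespace cleanup as the cell above (hypothetical helper name)."""
    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # split each line on double spaces, which often separate headlines
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    return "\n".join(chunk for chunk in chunks if chunk)


raw = "  Title  \n\n  Headline one  Headline two\n   \nFooter  "
print(clean_text(raw))  # Title, Headline one, Headline two, Footer on separate lines
```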

## Sample Program: Traveling Salesman and Solvability
The lecture slides presented computer science as being concerned with whether a particular problem can be solved and, if it is solvable, how expensive it is to solve.

One of the most well-studied problems in computer science is the [travelling salesperson problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem).  Suppose a salesperson has to visit $n$ cities and return to the starting point.  What is the shortest route that visits each city exactly once?

The brute-force method is to try all possible permutations of the city ordering. Suppose we had 3 cities; the following possibilities exist:
- City 1, City 2, City 3
- City 1, City 3, City 2
- City 2, City 1, City 3
- City 2, City 3, City 1
- City 3, City 1, City 2
- City 3, City 2, City 1

But how long would an exhaustive search for the optimal answer take?

For $n$ cities, $n!$ possible permutations exist.
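For a handful of cities, the brute-force method is short enough to write out. This sketch is an illustration, not an efficient solver: the distances are made up (four cities on the corners of a square, with neighbouring corners 1 apart and opposite corners treated as 2 apart), and `tour_length` and `brute_force_tsp` are hypothetical helper names. `itertools.permutations` generates all $n!$ orderings, and `min()` keeps the one with the shortest round trip.

```python
import itertools
import math

# Made-up distances between 4 cities on the corners of a square:
# neighbouring corners are 1 apart, opposite corners are treated as 2 apart
dist = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]


def tour_length(route, dist):
    """Total distance of a round trip that visits the cities in `route` order."""
    # (i + 1) % len(route) wraps around so the last leg returns to the start
    return sum(dist[route[i]][route[(i + 1) % len(route)]] for i in range(len(route)))


def brute_force_tsp(dist):
    """Try every ordering of the cities and keep the shortest round trip."""
    routes = itertools.permutations(range(len(dist)))  # all n! orderings
    best = min(routes, key=lambda route: tour_length(route, dist))
    return best, tour_length(best, dist)


print(math.factorial(len(dist)), "orderings to check")  # 24 orderings to check
print(brute_force_tsp(dist))  # ((0, 1, 2, 3), 4)
```

Because a round trip is the same no matter which city you start from, fixing the first city would cut the work to $(n-1)!$ orderings, but that still grows far too quickly for large $n$.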

Assuming we can check 1 billion possibilities per second, how much time is required to solve the problem for 20 cities?  What about 100?


In [None]:
import math

num_sec_in_day = 24 * 60 * 60
num_tries_per_second = 1_000_000_000
num_cities = 20
num_route_permutations = math.factorial(num_cities)
num_days = num_route_permutations // num_tries_per_second // num_sec_in_day
print(f"{num_days:,}")

Try running the above code block with different numbers of cities.  As you will see, small values produce zero because the code uses integer (floor) division `//`.  For small values, remove `// num_sec_in_day` to see the number of seconds instead, or reduce the assumption of 1 billion tries per second. How could you convert the result to the number of years?
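One way to answer that question is to use true division `/`, which keeps the fractional part that `//` throws away, and to divide by the number of seconds in a year. The sketch assumes an average year of 365.25 days; `years_to_solve` and `SECONDS_PER_YEAR` are hypothetical names, and the default of 1 billion tries per second is the same assumption as in the cell above.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumes an average year of 365.25 days


def years_to_solve(num_cities, tries_per_second=1_000_000_000):
    """Estimate the years needed to check every ordering by brute force (hypothetical helper)."""
    # true division (/) keeps the fractional part that // would discard
    seconds = math.factorial(num_cities) / tries_per_second
    return seconds / SECONDS_PER_YEAR


for n in (5, 10, 15, 20):
    print(f"{n} cities: {years_to_solve(n):.3g} years")
```

Under these assumptions, 20 cities work out to roughly 77 years of computation.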

This problem has direct real-world applications:
- Creating routes on maps
- Planning delivery schedules for companies such as Amazon, FedEx, and UPS

## Python Documentation
Python offers a substantial amount of documentation, both within the interpreter through the `help()` function and online.
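`help()` builds its output largely from docstrings: the string literal written as the first statement of a function, class, or module. The standard library's `inspect.getdoc()` returns that text as a string, which can be handy inside a program. `miles_to_km` below is a hypothetical example function.

```python
import inspect


def miles_to_km(miles):
    """Convert a distance in miles to kilometers."""
    return miles * 1.609344  # 1 international mile = 1.609344 km


# help(miles_to_km) would display this docstring along with the signature
print(inspect.getdoc(miles_to_km))

# built-in functions carry docstrings too
print(inspect.getdoc(len))
```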

The homepage for the Python documentation is at https://docs.python.org/3/   
Visit that site and see what's available.