<a href="https://colab.research.google.com/github/kbreit/mastery19/blob/master/Mastery_Intermediate_Programming_with_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><img src="https://www.insight.com/content/dam/insight-web/logos/global-nav.svg" width='400'></center>
<br>
<font color='#544640'>
<center><i>Mastery 2019</i></center>
<center><i>Scottsdale, Arizona</i></center></font>
  
  <br>
<center><i><font color='#544640' size='1'>Authors: <br>
  
Kevin Breit</font></i></center>

<center><i><font color='#B81590' size='1'>kevin.breit@insight.com</font></i></center><br>
    
<center><i><font color='#544640' size='1'>Victor Aranda</font></i></center>

<center><i><font color='#B81590' size='1'>victor.aranda@insight.com</font></i></center>




---





# Objective of Course



* Apply Python concepts to real world application development



---



# Application Level Components



* User interface
* Data storage
* Connectivity
* Main logic
* Testing frameworks

## User Interface



An application doesn't necessarily need a graphical user interface (GUI). It can be run automatically at certain times or under certain conditions. For example, the following command could be scheduled to run at 3 AM.

`cleanup_tmp_files.py --location /tmp`
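A script like this usually reads its options with the built-in `argparse` module. The sketch below is only an illustration: `cleanup_tmp_files.py` is not part of this notebook, so `parse_args()` and `list_files()` are hypothetical helpers, and nothing is actually deleted.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command line options; argv=None means 'read sys.argv'."""
    parser = argparse.ArgumentParser(
        description="List the files a cleanup job would look at.")
    parser.add_argument("--location", type=Path, default=Path("/tmp"),
                        help="directory to clean up")
    return parser.parse_args(argv)


def list_files(location):
    """Return the regular files directly inside location, sorted by name."""
    return sorted(p for p in location.iterdir() if p.is_file())


# Passing the options as a list makes the parser easy to try in a notebook.
args = parse_args(["--location", "/tmp"])
print(args.location)
```

Because `parse_args()` accepts an explicit list, the same function works from a scheduler, from a notebook cell, and from a unit test.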

Alternatively, it can be a command line interface similar to the Python interpreter.

```
todo.py
Enter your task: Clean the dishes
Task created
```

For graphical interfaces, Python can use `tkinter`, `wxPython`, or many other graphical interface frameworks.

This presentation will be based on the first example, which may be run automatically.

## Data Storage


Data storage can be in a database, flat files, or even an Excel file.


### Excel Files




<font color='#544640'>We'll open an Excel (.xlsx) file containing some stock ticker symbols with prices.

We're going to use `pandas`, a very powerful data science and data manipulation library that can handle large amounts of multidimensional "panel data" (hence the name) efficiently. It is generally used for data science and computing applications.

It's also convenient for accessing and handling tabular data. There are *many* libraries that can handle Excel files; `pandas` is only one. `pandas` comes with some caveats (and limitations) that we won't go into here, related to its original intended use; i.e. it is definitely not just an 'Excel reader' library!

Side note: the main `pandas` object is the `DataFrame`, which is very similar to R's native data frame structure.

For our purposes, here is the relevant doc:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html</font>

In [0]:
# environment setup
import os
import pandas as pd

First we will import the data from the spreadsheet into Python and Pandas.

In [0]:
data = pd.read_excel('https://github.com/kbreit/mastery19/raw/master/Ticker%20Symbols.xlsx', sheet_name = None)

<font color='#544640'>That was easy. Because we passed `sheet_name=None`, `pd.read_excel` returns a `dict` whose keys are the names of the sheets in the `xlsx` document.</font>


In [0]:
list(data.keys())

['Stocks']

Notice the list above shows the worksheet from the spreadsheet. Now we can list the stocks we want to pull the price for.

In [0]:
dataframe = data['Stocks']
dataframe['Ticker']

0    NSIT
1    MSFT
2    CSCO
3    AAPL
Name: Ticker, dtype: object

Very cool. We now have a list of ticker symbols we want to retrieve stock prices for.

## Screen Scraping



In this simple example we're going to do some very basic web scraping.

Whenever you request pages in a loop, add a random pause between requests. It's important not to get yourself into trouble by sending too many requests too frequently to the site you are accessing. Your computer will basically try to (mini-)DoS a target host if you aren't careful.
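Here is a minimal sketch of what such a pause could look like. The names `polite_pause()` and `fetch_all()` are hypothetical helpers, not something the notebook defines, and the 1 to 3 second range is an arbitrary assumption you should tune for the site you are accessing.

```python
import random
import time


def polite_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random interval and return the delay that was used."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def fetch_all(urls, fetch, pause=polite_pause):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for position, url in enumerate(urls):
        if position > 0:
            pause()  # wait before every request except the first
        results.append(fetch(url))
    return results
```

With `requests`, the loop could be driven as `fetch_all(list_of_urls, requests.get)`. The `sleep` and `pause` parameters exist so the delay can be swapped out in tests.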

### Libraries Used:



* `BeautifulSoup4`: https://readthedocs.org/projects/beautiful-soup-4/
* `requests`: http://docs.python-requests.org/en/master/
* `re`: https://docs.python.org/3/howto/regex.html

In [0]:
# set up environment
import bs4 as bs
from bs4 import BeautifulSoup
from bs4.element import Comment
import re
from json import loads
from pprint import pprint as pp


# these two packages do almost the same thing
# used one for one example and one for another
import urllib.request
import requests

### Get All Links on a Page:

You can easily write a custom web-crawler/scraper by traversing links one by one through a domain. Use at your own risk - don't say I didn't warn you. :)

In [0]:
target_page = 'https://finance.yahoo.com/quote/AAPL/profile?p=AAPL'

page_data = requests.get(target_page)

# use this regular expression to strip out HTML tags, if needed
# re.sub('<[^<]+?>', '', page_data.text)

soup = bs.BeautifulSoup(page_data.text, 'html.parser')

links = soup.find_all('a', attrs={'href': re.compile('^http://')})

for link in links[:10]:
    print(link.get('href'))

http://www.apple.com
http://info.yahoo.com/privacy/us/yahoo/
http://info.yahoo.com/relevantads/
http://info.yahoo.com/legal/us/yahoo/utos/utos-173.html
http://twitter.com/YahooFinance
http://facebook.com/yahoofinance
http://yahoofinance.tumblr.com


### Find Specific Text on a Page:

In [0]:
# wonderful example from https://stackoverflow.com/questions/1936466/beautifulsoup-grab-visible-webpage-text

def tag_visible(element):
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    if isinstance(element, Comment):
        return False
    return True

def get_ceo(ticker):
    url = 'http://finance.yahoo.com/quote/{0}/profile?p={0}'.format(ticker)
    # name the parser explicitly so BeautifulSoup doesn't have to guess one
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    # the page embeds its data as JSON assigned to root.App.main inside a <script> tag
    script = soup.find('script', text=re.compile(r'root\.App\.main')).text
    data = loads(re.search(r'root\.App\.main\s+=\s+(\{.*\})', script).group(1))
#     pp(data["context"]["dispatcher"]["stores"]['QuoteSummaryStore']['assetProfile']['companyOfficers'])
    executive_json = data["context"]["dispatcher"]["stores"]['QuoteSummaryStore']['assetProfile']['companyOfficers']
    # 'officer' avoids shadowing the built-in exec()
    for officer in executive_json:
        if 'CEO' in officer['title']:
            return officer['name']

ceo = get_ceo('AAPL')
print(ceo)

Mr. Timothy D. Cook
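To see what `get_ceo()` is doing without hitting the network, here is a small offline sketch of the same regex-plus-JSON idea. The script text and the officer names are made up for illustration; the real page is far larger and its layout can change at any time.

```python
import json
import re

# Made-up stand-in for the data the page embeds in a <script> tag.
payload = {"context": {"dispatcher": {"stores": {"QuoteSummaryStore": {
    "assetProfile": {"companyOfficers": [
        {"name": "Ms. Example Person", "title": "Chief Financial Officer"},
        {"name": "Mr. Sample Name", "title": "CEO & Director"},
    ]}}}}}}
sample_script = ("(function (root) {\nroot.App.main = "
                 + json.dumps(payload) + ";\n}(this));")


def extract_ceo(script_text):
    """Pull the JSON assigned to root.App.main and return the first CEO name."""
    match = re.search(r"root\.App\.main\s+=\s+(\{.*\})", script_text)
    if match is None:
        return None  # the page layout is not what we expected
    data = json.loads(match.group(1))
    stores = data["context"]["dispatcher"]["stores"]
    officers = stores["QuoteSummaryStore"]["assetProfile"]["companyOfficers"]
    return next((o["name"] for o in officers if "CEO" in o["title"]), None)


print(extract_ceo(sample_script))
```

Returning `None` when the pattern is missing keeps a layout change on the site from surfacing as a confusing `AttributeError`.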


## Connectivity



Connectivity can mean a lot of things. It can be connecting to a database for your data storage retrieval. Today's example will require us to retrieve the stock prices for the tickers in the Excel spreadsheet.

`requests` is a very common library for HTTP requests in Python. `requests` will communicate with Alpha Vantage's API (https://www.alphavantage.co/) for stock price lookups.

First, we need to import `requests`. Then we construct our URL based on the API documentation (https://www.alphavantage.co/documentation/).

In [0]:
import requests

API_KEY=""
params = {'function': 'TIME_SERIES_DAILY',
          'symbol': 'MSFT',
          'apikey': API_KEY}
response = requests.get('https://www.alphavantage.co/query', params=params)
response.url

'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=MSFT&apikey=NWL4TGYBVPSU0QHW'
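If you're curious how a dictionary turns into that query string, the standard library can do roughly the same encoding. This is a sketch for illustration only, and `demo` is a placeholder rather than a working API key.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {"function": "TIME_SERIES_DAILY",
          "symbol": "MSFT",
          "apikey": "demo"}  # placeholder key, assumed for this sketch

# Encode the dictionary as key=value pairs joined with '&'.
url = "https://www.alphavantage.co/query?" + urlencode(params)
print(url)

# Going the other way: split the URL and decode the query string again.
query = parse_qs(urlsplit(url).query)
print(query["symbol"])
```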

Notice the URL has the keys and values from the dictionary I specified. What did Alpha Vantage return to us? Let's see in JSON format.

In [0]:
response.json()

{'Meta Data': {'1. Information': 'Daily Prices (open, high, low, close) and Volumes',
  '2. Symbol': 'MSFT',
  '3. Last Refreshed': '2019-08-09 09:35:27',
  '4. Output Size': 'Compact',
  '5. Time Zone': 'US/Eastern'},
 'Time Series (Daily)': {'2019-03-20': {'1. open': '117.3900',
   '2. high': '118.7500',
   '3. low': '116.7100',
   '4. close': '117.5200',
   '5. volume': '28113300'},
  '2019-03-21': {'1. open': '117.1400',
   '2. high': '120.8200',
   '3. low': '117.0900',
   '4. close': '120.2200',
   '5. volume': '29854400'},
  '2019-03-22': {'1. open': '119.5000',
   '2. high': '119.5900',
   '3. low': '117.0400',
   '4. close': '117.0500',
   '5. volume': '33624500'},
  '2019-03-25': {'1. open': '116.5600',
   '2. high': '118.0100',
   '3. low': '116.3200',
   '4. close': '117.6600',
   '5. volume': '27067100'},
  '2019-03-26': {'1. open': '118.6200',
   '2. high': '118.7100',
   '3. low': '116.8500',
   '4. close': '117.9100',
   '5. volume': '26097700'},
  '2019-03-27': {'1. op

That's a lot. 100 days of data to be precise. What about today's opening price?

In [0]:
from datetime import datetime

current = datetime.today().strftime('%Y-%m-%d')
prices = response.json()
prices['Time Series (Daily)'][current]['1. open']

'136.6000'
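One caveat: looking up today's date raises a `KeyError` when the response has no entry for today, for example on a weekend. A possible workaround, sketched here with a hypothetical `latest_open()` helper and a two-day sample in the same shape as the response above, is to take the most recent date that is present.

```python
from datetime import date

# Tiny sample in the same shape as the API response shown earlier.
sample = {"Time Series (Daily)": {
    "2019-03-25": {"1. open": "116.5600"},
    "2019-03-26": {"1. open": "118.6200"},
}}


def latest_open(payload, on_or_before=None):
    """Return (day, opening price) for the newest day up to the cutoff."""
    series = payload["Time Series (Daily)"]
    cutoff = (on_or_before or date.today()).isoformat()
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    candidates = [day for day in series if day <= cutoff]
    if not candidates:
        raise KeyError("no trading day on or before " + cutoff)
    latest = max(candidates)
    return latest, series[latest]["1. open"]


print(latest_open(sample, date(2019, 3, 30)))
```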

We have the proper components ready to go for this example. Data storage via Excel. Connectivity over a RESTful API using `requests`. Let's tie it together.

## Main Logic



The main logic of the program is really what controls everything. It's the glue that brings your Lego pieces together. Don't be that guy. Don't use glue on your Lego pieces.

First, we should move the components into functions.

In [0]:
def get_stock_opening_price(symbol):
    params = {'function': 'TIME_SERIES_DAILY',
              'symbol': symbol,
              'apikey': API_KEY}
    response = requests.get('https://www.alphavantage.co/query', params=params)
    response_json = response.json()
    return response_json['Time Series (Daily)'][current]['1. open']

def read_from_excel(filename):
    return pd.read_excel(filename, sheet_name=None)

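Next, we will call `read_from_excel()` to load the worksheet of stock symbols, then look up the opening price and CEO for each ticker. The cell assumes the workbook has been saved locally as `/Ticker Symbols.xlsx`; when that file is missing, `pandas` raises a `FileNotFoundError`.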


In [0]:
data = read_from_excel('/Ticker Symbols.xlsx')
dataframe = data['Stocks']
for index, row in dataframe.iterrows():
    dataframe.loc[index, "Price"] = get_stock_opening_price(row['Ticker'])
    dataframe.loc[index, "CEO"] = get_ceo(row['Ticker'])
print(dataframe)


FileNotFoundError: ignored

Finally, we should write the information back to the Excel file.

In [0]:
with pd.ExcelWriter('/Ticker Symbols.xlsx') as writer:
    dataframe.to_excel(writer, sheet_name='Stocks', index=False)
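One way to keep the main logic easy to test is to pass the lookup functions in as parameters instead of calling them directly. The sketch below is an assumption about how that could look, not how the notebook does it: `add_quotes()` is a hypothetical helper that works on plain dictionaries, and the prices and names are made up so it runs without network access.

```python
def add_quotes(rows, price_lookup, ceo_lookup):
    """Return new row dicts with 'Price' and 'CEO' filled in per ticker."""
    updated = []
    for row in rows:
        ticker = row["Ticker"]
        updated.append({**row,
                        "Price": price_lookup(ticker),
                        "CEO": ceo_lookup(ticker)})
    return updated


# Stand-in lookups with made-up values, used instead of the real API calls.
fake_prices = {"MSFT": "100.00", "AAPL": "200.00"}
fake_ceos = {"MSFT": "Example CEO A", "AAPL": "Example CEO B"}

rows = [{"Ticker": "MSFT"}, {"Ticker": "AAPL"}]
for row in add_quotes(rows, fake_prices.get, fake_ceos.get):
    print(row)
```

In the notebook, the real functions could be passed in the same way, for example `add_quotes(dataframe.to_dict('records'), get_stock_opening_price, get_ceo)`.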

## Testing Frameworks



Any application that is more than just a simple script should have some automated testing associated with it.

There are three main types of tests - unit, integration, and validation.

> In computer programming, **unit testing** is a software testing method by which individual units of source code, sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures, are tested to determine whether they are fit for use. - https://en.wikipedia.org/wiki/Unit_testing

> **Integration testing** (sometimes called integration and testing, abbreviated I&T) is the phase in software testing in which individual software modules are combined and tested as a group. Integration testing is conducted to evaluate the compliance of a system or component with specified functional requirements. It occurs after unit testing and before validation testing. Integration testing takes as its input modules that have been unit tested, groups them in larger aggregates, applies tests defined in an integration test plan to those aggregates, and delivers as its output the integrated system ready for system testing. - https://en.wikipedia.org/wiki/Integration_testing

> In software project management, software testing, and software engineering, **verification and validation** (V&V) is the process of checking that a software system meets specifications and that it fulfills its intended purpose. It may also be referred to as software quality control. It is normally the responsibility of software testers as part of the software development lifecycle. In simple terms, software verification is: "Assuming we should build X, does our software achieve its goals without any bugs or gaps?" On the other hand, software validation is: "Was X what we should have built? Does X meet the high level requirements?"- https://en.wikipedia.org/wiki/Software_verification_and_validation

Today we will mostly discuss unit testing and touch on integration tests.

### Unit Testing



Unit tests perform a test against a single piece of code. Each test case should be tested independently from other test cases.

In [0]:
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def multiply(a, b):
    return a * b

def divide(a, b):
    return a / b

The following class is using the built-in Python `unittest` module to perform unit tests against the functions.

In [0]:
import unittest

class MasteryNotebook(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(1,2), 3)
        self.assertEqual(add(0,1), 1)
        self.assertEqual(add(-1,-1), -2)
    def test_subtract(self):
        self.assertEqual(subtract(2, 1), 1)
        self.assertEqual(subtract(0, 1), -1)
    def test_multiply(self):
        self.assertEqual(multiply(2, 1), 2)
        self.assertEqual(multiply(-1, 1), -1)        
    def test_divide(self):
        with self.assertRaises(ZeroDivisionError):
            divide(1, 0)

unittest.main(argv=[''], verbosity=1, exit=False)

....
----------------------------------------------------------------------
Ran 4 tests in 0.006s

OK


<unittest.main.TestProgram at 0x7f69daa4b160>

#### Mocks



Remember, a unit test is meant to test an isolated piece of code. What if your unit test requires another source, such as a local database or network connection?

> In object-oriented programming, **mock objects** are simulated objects that mimic the behavior of real objects in controlled ways, most often as part of a software testing initiative. - https://en.wikipedia.org/wiki/Mock_object

Mocks go beyond the content of this course, but they allow you to simulate an external response in a controlled manner. In other words, the mock pretends to be what you want it to be (e.g. a database call).

*Note:* [Some people](http://arlobelshee.com/tag/no-mocks/) don't like mocks and think it means there is room for improvement with code structure. I'm not opinionated here. Do what accomplishes your task.
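For the curious, here is a minimal sketch of a mock using the built-in `unittest.mock` module. `fetch_symbol()` and the URL are hypothetical and exist only for this example; the test replaces `urllib.request.urlopen` with a fake, so no network connection is made.

```python
import json
import unittest
import urllib.request
from unittest import mock


def fetch_symbol(url):
    """Download JSON from url and return the value of its 'symbol' field."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())["symbol"]


class FetchSymbolTest(unittest.TestCase):
    def test_fetch_symbol_without_network(self):
        # The fake response works as a context manager and returns canned bytes.
        fake_response = mock.MagicMock()
        fake_response.__enter__.return_value.read.return_value = b'{"symbol": "MSFT"}'
        with mock.patch("urllib.request.urlopen",
                        return_value=fake_response) as fake_open:
            self.assertEqual(fetch_symbol("https://example.invalid/quote"), "MSFT")
        # The mock also records how it was called.
        fake_open.assert_called_once_with("https://example.invalid/quote")
```

In a notebook, this can be run with the same `unittest.main(argv=[''], exit=False)` call used in the unit testing example.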

### Integration Tests



Integration testing is frequently accomplished using `tox`, which is a Python testing tool. You can also use a full Continuous Integration (CI) system such as TravisCI or Jenkins to run your tests. This is outside the scope of this presentation.

### Coverage



Code coverage measures how many lines of code are executed by your tests. But does that mean that as long as a line of code is executed by a test it is properly tested? No. In the unit test example above, the `test_divide()` test method only tests for the exception. I'd argue it doesn't cover all the cases it should test for. Design your unit tests as well as you can and build them out over time. They won't be perfect on day one.
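As a sketch of what fuller tests for `divide()` could look like, the class below adds ordinary and fractional results alongside the exception case. The specific cases are only suggestions, and `divide()` is repeated so the example is self-contained.

```python
import unittest


def divide(a, b):
    return a / b


class DivideTests(unittest.TestCase):
    def test_exact_results(self):
        self.assertEqual(divide(6, 3), 2)
        self.assertEqual(divide(-6, 3), -2)
        self.assertEqual(divide(0, 5), 0)

    def test_fractional_result(self):
        # Compare floats with a tolerance rather than exact equality.
        self.assertAlmostEqual(divide(1, 3), 0.3333333, places=6)

    def test_divide_by_zero(self):
        with self.assertRaises(ZeroDivisionError):
            divide(1, 0)
```

All three methods execute the same single line of `divide()`, so the line coverage number is identical to the original test, yet far more behavior is checked.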





---




# Resources





[wxPython](https://www.wxpython.org/)

[Pandas](https://pandas.pydata.org/)

[Python Context Managers and the "with" Statement](https://realpython.com/courses/python-context-managers-and-with-statement/)

[Requests Library](https://2.python-requests.org/en/master/)

[Getting Started With Testing in Python - Real Python](https://realpython.com/python-testing/)

[Demystifying the Patch Function - Video](https://www.youtube.com/watch?v=ww1UsGZV8fQ)

[Reading and Writing Files in Python](https://realpython.com/read-write-files-python/)

[Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)