# Python Fundamentals

Prepared by: Gregory J. Bott, Ph.D.
(Sources: Dr. Nick Freeman, Python for Everyone, Charles Severance; A Whirlwind Tour of Python, Jake Vanderplas)

## Why should a business student learn Python?

Information is the lifeblood of <del>nearly</del> every organization. The purpose of this notebook is to help business students master the fundamental concepts and skills required to use Python effectively. Python skills are in high demand. One reason for this demand is Python's ability to efficiently acquire, manipulate, analyze, and visualize data. However, before performing data analytics tasks, business students must learn the fundamentals.

## Data Analysis is Part of *Every* Job
It's not just data scientists or data analysts who need analysis skills. Nearly every job intersects with data. Even if your job doesn't have "analyst" or "scientist" in the title, you'll very likely still benefit from understanding how to acquire, handle, manipulate, and report data.

> ### Deloitte: "...skills that were highly appreciated in Deloitte and projects were Java, Python/R..."


## Python skills are in high demand

2018 Developer Survey by StackOverflow

![](\images\2018MostWantedLanguages.jpg)

TODO: Add Github stats showing Python as /#3 overall.

[] task

## Programming Teaches Problem solving

The ability to think critically and solve problems is a general life skill. Problem solving applies to all facets of life. In this course you'll learn Python syntax and structures, but more importantly you'll learn to abstract a problem and code a solution.

## Python is the new Excel
Businesses rightly assume that you have solid Excel skills. However, the new expectation is that you already possess the skills necessary to handle data acquisition, analysis, and visualization. And although this can arguably still be done in Excel, Python's tools and libraries are often far more efficient.

Python is the new Excel. (see https://www.fincad.com/blog/python-new-excel)

## About the Python language
(Sources: Wikipedia, Dr. Nickolas K. Freeman)

>Python is an interpreted high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, and a syntax that allows programmers to express concepts in fewer lines of code, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales.

>Python is a multi-paradigm programming language. Object-oriented programming and structured programming are fully supported, and many of its features support functional programming and aspect-oriented programming (including by metaprogramming and metaobjects (magic methods)). Many other paradigms are supported via extensions, including design by contract and logic programming.

>The language's core philosophy is summarized in the document The Zen of Python (PEP 20), which includes aphorisms such as:

> - Beautiful is better than ugly
> - Explicit is better than implicit
> - Simple is better than complex
> - Complex is better than complicated
> - Readability counts

> Rather than having all of its functionality built into its core, Python was designed to be highly extensible. This compact modularity has made it particularly popular as a means of adding programmable interfaces to existing applications. Van Rossum's vision of a small core language with a large standard library and easily extensible interpreter stemmed from his frustrations with ABC, another programming language that espoused the opposite approach.

> While offering choice in coding methodology, the Python philosophy rejects exuberant syntax (such as that of Perl) in favor of a simpler, less-cluttered grammar. As Alex Martelli put it: "To describe something as 'clever' is not considered a compliment in the Python culture." Python's philosophy rejects the Perl "there is more than one way to do it" approach to language design in favor of "there should be one—and preferably only one—obvious way to do it".

>Python's developers strive to avoid premature optimization, and reject patches to non-critical parts of CPython that would offer marginal increases in speed at the cost of clarity. When speed is important, a Python programmer can move time-critical functions to extension modules written in languages such as C, or use PyPy, a just-in-time compiler. Cython is also available, which translates a Python script into C and makes direct C-level API calls into the Python interpreter.

>An important goal of Python's developers is keeping it fun to use. This is reflected in the language's name—a tribute to the British comedy group Monty Python—and in occasionally playful approaches to tutorials and reference materials, such as examples that refer to spam and eggs (from a famous Monty Python sketch) instead of the standard foo and bar.

>A common neologism in the Python community is *pythonic*, which can have a wide range of meanings related to program style. To say that code is pythonic is to say that it uses Python idioms well, that it is natural or shows fluency in the language, that it conforms with Python's minimalist philosophy and emphasis on readability. In contrast, code that is difficult to understand or reads like a rough transcription from another programming language is called unpythonic.

>Users and admirers of Python, especially those considered knowledgeable or experienced, are often referred to as Pythonists, Pythonistas, and Pythoneers.

# What is a Program?

> ## A program is a sequence of instructions that specifies how to perform a computation.

## Building Blocks of Nearly Every Language
* **input** - get data--from user via keyboard, from sensors, from other programs, from databases, etc.
* **output** - display results in the console on the screen, on paper, to another program, a web page, etc.
* **math** - perform mathematical operations (addition, multiplication, etc.)
* **conditional execution** - check for certain values or states and run the appropriate code
* **repetition** - repeatedly perform some action a certain number of times

### Python is interpreted, not compiled.
Programming languages generally fall into one of two categories: Compiled or Interpreted. With a compiled language, code you enter is reduced to a set of machine-specific instructions before being saved as an executable file. With interpreted languages, the code is saved in the same format that you entered. Compiled programs generally run faster than interpreted ones because interpreted programs must be reduced to machine instructions at runtime. However, with an interpreted language you can do things that cannot be done in a compiled language. For example, interpreted programs can modify themselves by adding or changing functions at runtime. It is also usually easier to develop applications in an interpreted environment because you don't have to recompile your application each time you want to test a small section. (source: http://www.vanguardsw.com/dphelp4/dph00296.htm)
<br>

![image](images/python_interpreted.png)


# Setting Up Your Environment and Getting Started
<a id="Setting_up_your_environment"> </a>



## Installing Anaconda
Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. Package versions are managed by the package management system conda. (source: Wikipedia) The Spyder IDE is one of the packages included in Anaconda.

**Miniconda** provides the Python interpreter and the conda package manager. If you don't want the preinstalled packages that come with Anaconda, you can install Miniconda and install the packages you want yourself.


## Starting Jupyter Lab

If you wish to control the starting folder (home folder) of Jupyter Lab, follow these instructions.

1. Open Anaconda Prompt
2. Navigate to the starting folder (e.g., cd G:).
3. Type ```jupyter lab``` and press ENTER
    * The Home Folder is the starting folder of the Anaconda prompt.

## Loading the TOC plugin for Jupyter Lab

### Install dependencies
1. Right-click the Anaconda prompt icon and then click Run as Administrator.
2. In the console window, type the following commands:
  * conda install nodejs
  * pip install npm
  * jupyter labextension install @jupyterlab/toc
1. Then to start Jupyter Lab, type:
  * jupyter lab --watch
  
  
  (Source: https://github.com/jupyterlab/jupyterlab-toc)

## Choosing a Python IDE

### Jupyter Notebook and Jupyter Lab
Jupyter Notebook and Lab are the best IDEs for learning and documenting Python code. They are also a solid method to provide Python code and results to non-Python programmers. We will make extensive use of Jupyter Lab in this course.

### Spyder
Spyder is a simple IDE that is useful for learning Python. It is included with the install of Anaconda and supported in this course. You may find Spyder the best choice for completing assignments. If an automatic grading environment is being used, you may find it faster to write, debug, and test code in Spyder and then paste it into the automatic grading environment.

### VS Code
Visual Studio Code is also available with Anaconda, though it is not installed by default. VS Code is perhaps the best overall choice for an IDE; however, Spyder offers a more straightforward initial user experience for the beginner.

### Sublime Text
A highly functional and extensible text editor often used by Pythonistas.

### PyCharm
If you would like an IDE with a deep feature set, PyCharm is a good choice. Special licensing is available for academia.

You are welcome to use any of these IDEs for this class. You are also welcome to use a different IDE, but we cannot support all IDEs. If you choose an IDE not listed here, we'll give you best-effort support, but you may be on your own in making it work.



## Python Coding basics


### Indentation
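Whitespace has little or no meaning in many languages. In Python, however, indentation defines code blocks, and improper indentation generates a syntax error (an IndentationError):

In [None]:
# Statements that belong to an if statement must be indented.
# Removing the indentation from the print line below raises an IndentationError.
if 4 > 1:
    print("4 is greater than 1")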

### Comments

Make a habit of clearly commenting your code...even if the purpose of the code seems obvious. Even for code you have written yourself, it is often difficult to remember why you chose to implement something in a specific way. 

Start a comment with the hash (#) symbol. Python ignores everything from the # to the end of the line.
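
A short sketch of full-line and inline comments follows. The variable names and pay figures are made up for illustration and are not part of the course material.

```python
# A full-line comment explains the reasoning behind the code that follows.
# Hypothetical example: compute gross pay for one week.
hourly_rate = 18.50   # an inline comment follows code on the same line
hours_worked = 40

# Python ignores commented-out code such as the next line:
# gross_pay = hourly_rate * hours_worked * 1.5
gross_pay = hourly_rate * hours_worked

print(gross_pay)
```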

# Variables, Expressions, and Types

Python is a *dynamically typed* language. A programming language is said to be dynamically typed, or just 'dynamic', when the majority of its type checking is performed at run-time as opposed to at compile-time.

Python is also *strongly typed*: the interpreter keeps track of the type of every value. It is nonetheless very dynamic, as it rarely uses what it knows to limit variable usage. In Python, it is the program's responsibility to use built-in functions like isinstance() and issubclass() to test variable types and correct usage. Python tries to stay out of your way while giving you all you need to implement strong type checking.

The difference between a strongly typed language and a weakly typed language is basically that a strongly typed language checks the type of a variable before performing an operation on it, whereas a weakly typed language does not.
[Source: Wiki.Python.org](https://wiki.python.org/moin/Why%20is%20Python%20a%20dynamic%20language%20and%20also%20a%20strongly%20typed%20language)
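
To make the distinction concrete, the sketch below (with a hypothetical variable named quantity) shows strong typing at work: Python refuses to add a string to an integer rather than guessing what was intended, and the check happens only when the line actually runs.

```python
quantity = "5"               # hypothetical value stored as a string

try:
    result = quantity + 5    # str + int is not allowed
except TypeError as err:
    result = None
    print("TypeError:", err)

# An explicit conversion makes the intent clear
converted = int(quantity) + 5
print(converted)
```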

In [None]:
#Store a float in the variable a
a = 3.141
print(a, type(a))

#Store a list in a
a = ["apple", 3.141, "banana"]
print(a, type(a))

In [None]:
#Store a string in a
a = "apple"
print(a, type(a))

## Variables in Python
Variables store values. Variable names can be as long as you want and can contain letters, numbers, and underscores, but they must not begin with a number or be a Python keyword (e.g., True, for, from, lambda).

> **Python is case-sensitive.** unit_cost is not the same as Unit_cost. 

By convention, variable names are lowercase and use the underscore character to separate words.

earnings_after_tax <br>
default_gateway

In [None]:
# Variables are case-sensitive --> Error
Fruit = "apple"
print(fruit)

In [None]:
# illegal -- must not start with a number
76trombones = 0

In [None]:
although_difficult_to_use_this_is_a_valid_variable = 1

## Using Variables to Store User Input

Use variables to store input from users. To get input from the user Python provides a built-in function **input** that captures input from keyboard as a string.

In [None]:
# Get the card total from the user and store it in card_total
card_total = input("Card total?")
print("card_total type is: {}".format(type(card_total)))

In [None]:
# Avoid naming a variable "list": that would hide the built-in list type
my_list = ["hi", 6, 7]
print(type(my_list))

### input() function
Used to get data from the user. Always returns a string, so you must cast if you are working with numbers.

In [None]:
type(card_total)

In [None]:
# Error -- card_total is a str, and a str cannot be compared to an int (TypeError)
if card_total > 21:
    print("Busted")
else:
    print("Hit me")

### Comparison operators

You are probably already familiar with the comparison operators.
* x and y are variable names
```Python
x > y
x >= y
x < y
x <= y
x == y # Use double-equal to test for equality (single equal sign is for assignment), True if x is same as y
x != y # test for inequality, True if x is different than y
```

In [None]:
# Must cast to appropriate value type (int)
if int(card_total) > 21:
    print("Busted")
else:
    print("Hit me")

In [None]:
print("Hello", "how are you?", card_total, sep=" ---") 

## Basic Data Types in Python

### Integers
---
A number with no fractional part. 

![image](\images\int-number-line.svg)

#### Includes: 
* the counting numbers {1, 2, 3, ...}, 
* zero {0}, 
* and the negative of the counting numbers {-1, -2, -3, ...}

We can write them all down like this: {..., -3, -2, -1, 0, 1, 2, 3, ...}

Examples of integers: -16, -3, 0, 1, 198

Integer size is limited only by your machine's available memory.

In [None]:
bigInt = 1234568901234568901234568901234568901234564568901234568901234568901234568901234568901234568901234568900

print(bigInt + 1)

print(type(bigInt))

### Float type
* Platform dependent
* Typically equivalent to IEEE754 64-bit C double
* Smallest positive normalized float is approximately 2.225 x 10^-308
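
The limits above can be inspected on your own machine through the standard library's sys.float_info. The sketch below also illustrates a common consequence of binary floating point: some decimal fractions cannot be stored exactly, so math.isclose() is usually a safer comparison than ==. The variable names are illustrative only.

```python
import math
import sys

smallest = sys.float_info.min   # smallest positive normalized float
largest = sys.float_info.max    # largest representable float
print(smallest, largest)

# 0.1 and 0.2 have no exact binary representation
exactly_equal = (0.1 + 0.2 == 0.3)
close_enough = math.isclose(0.1 + 0.2, 0.3)
print(exactly_equal, close_enough)
```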

In [None]:
type(1.0)

In [None]:
b = 2

print("b = {} and is type {}".format(b, type(b)))

In [None]:
b = 2 * 1.1

print("b = {} and is type {}".format(b, type(b)))

### Boolean type
---

In [None]:
is_fte = 1

# Boolean values indicate True or False and must be title-case
is_fte == True


# Reminder: do comparisons with double = sign (x == 5)
#    single = is assignment, let x = 5.
is_fte == False


In [None]:
# Error = the boolean value must be capitalized (True, not true or TRUE)
if is_fte == true:
    print("Full-time")

Non-zero numbers and non-empty strings evaluate to ```True```, while 0 (zero), empty strings, and ```None``` evaluate to ```False```.

In [None]:
hours_worked = 0
#hours_worked = 37
#hours_worked = 'yes'

if hours_worked:
    print("Need to calculate payroll.")

if not hours_worked:
    print("No hours worked or invalid value.")

#### Logic Operators on boolean values
a and b are boolean values

```Python
not a # True if a is False, False if a is True
a and b # Both a and b must be True for statement to be True
a or b # At least one of a or b must be True for statement to be True
```
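
As a quick check of the three operators, the sketch below builds a small truth table by trying every combination of two boolean values. The rows list is only there to collect the results.

```python
rows = []
for a in (True, False):
    for b in (True, False):
        # Each row holds: a, b, not a, a and b, a or b
        rows.append((a, b, not a, a and b, a or b))

for row in rows:
    print(row)
```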

### String Type
A string is a sequence of characters. 

Values enclosed within quotation marks (single, double, or triple quotes of either kind) are strings.

In [None]:
s = 'Monty Python'

In [None]:
# use len() function to get the length of a string
print(len(s))

# Print part of a string, a slice
print(s[0:5])

print(s[6])

In [None]:
# Strings are immutable (Can't change 'Python' to 'Jython')
s[6] = "J"
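
Because a string cannot be changed in place, the usual approach is to build a new string. A minimal sketch using slicing and the built-in str.replace() method:

```python
s = 'Monty Python'

# Combine slices of the original with the new character
jython = s[:6] + "J" + s[7:]
print(jython)

# str.replace() also returns a new string and leaves s untouched
replaced = s.replace("Python", "Jython")
print(replaced)
print(s)
```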

### None type

Many languages provide a null value (for example, null in Java and nullptr in C++). Null means empty. It is not equivalent to a zero-length string, nor is it equivalent to zero (0). In Python, the keyword None is the equivalent of null. None is not equivalent to the string "None". In my humble opinion, None is more logical than null: None means the object is nothing, non-existent.

#### Why use None?

When instantiating (creating) an object, you may need to check whether the instantiation was successful. A common pattern is to set a variable to None first and then attempt to create the object; if the creation fails, the variable still holds None and you can test for it.

---

In [None]:
# None is not the "None" string. It is the only value of the NoneType class.
print(None == "None")
print(type(None))

In [None]:
#Source: https://www.pythoncentral.io/python-null-equivalent-none/
database_connection = None
 
# Try to connect. MyDatabase and the db_* variables are not defined here,
#   so the attempt raises a NameError, which is caught below.
try:
    database = MyDatabase(db_host, db_user, db_password, db_database)
    database_connection = database.connect()
except Exception:
    pass
 
if database_connection is None:
    print('The database could not connect')
else:
    print('The database could connect')

### Complex numbers
---
A Complex Number is a combination of a Real Number and an Imaginary Number. [1]

![image](\images\complex-example.svg)

   

In [None]:
print("3j is of type: " + str(type(3j)))
print(7 + 3j)

   
The "unit" imaginary number (like 1 for Real Numbers) is i, which is the square root of −1.   

![image](\images\imaginary-square-root.svg)

> **Except in Python, "j" is used instead of "i".**

In [None]:
1j * 1j == -1

## Converting values between types

Often you may need to convert values from one type to another. For example, you may need to convert the values received from the input() function from a string to an int or a float.

In [None]:
user_number = input("Enter a number and I'll tell you if it is even or odd:\n")
print(type(int(user_number)))


In [None]:
# Source: https://gist.github.com/jfpuget/60e07a82dece69b011bb -- Jean-François Puget

import numpy as np
from matplotlib import pyplot as plt
from matplotlib import colors
%matplotlib inline 


def mandelbrot_image(xmin,xmax,ymin,ymax,width=12,height=12,maxiter=80,cmap='hot'):
    dpi = 72
    img_width = dpi * width
    img_height = dpi * height
    x,y,z = mandelbrot_set(xmin,xmax,ymin,ymax,img_width,img_height,maxiter)
    
    fig, ax = plt.subplots(figsize=(width, height),dpi=72)
    ticks = np.arange(0,img_width,3*dpi)
    x_ticks = xmin + (xmax-xmin)*ticks/img_width
    plt.xticks(ticks, x_ticks)
    y_ticks = ymin + (ymax-ymin)*ticks/img_width
    plt.yticks(ticks, y_ticks)
    
    norm = colors.PowerNorm(0.3)
    ax.imshow(z.T,cmap=cmap,origin='lower',norm=norm)
    
def mandelbrot(c,maxiter):
    z = c
    for n in range(maxiter):
        if abs(z) > 2:
            return n
        z = z*z + c
    return 0

def mandelbrot_set(xmin,xmax,ymin,ymax,width,height,maxiter):
    r1 = np.linspace(xmin, xmax, width)
    r2 = np.linspace(ymin, ymax, height)
    n3 = np.empty((width,height))
    for i in range(width):
        for j in range(height):
            n3[i,j] = mandelbrot(r1[i] + 1j*r2[j],maxiter)
    return (r1,r2,n3)

mandelbrot_image(-2.0,0.5,-1.25,1.25,maxiter=80,cmap='gnuplot2')

## Order of Operations
<a id="Setting_up_your_environment"> </a>

The order of evaluation of expressions with more than one operator follows *rules of precedence* -- PEMDAS

* **Parentheses**
* **Exponentiation**
* **Multiplication and Division**
* **Addition and Subtraction**
* **Left to Right** - operators with the same precedence are evaluated left to right (exponentiation is the exception: it is evaluated right to left)
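
A few expressions that illustrate these rules are sketched below; the variable names are arbitrary. Two details often surprise beginners: exponentiation groups from right to left, and a leading minus sign is applied after exponentiation.

```python
a = 2 + 3 * 4      # multiplication before addition
b = (2 + 3) * 4    # parentheses first
c = 8 / 4 * 2      # same precedence: left to right, (8 / 4) * 2
d = 2 ** 3 ** 2    # exponentiation groups right to left: 2 ** (3 ** 2)
e = -3 ** 2        # evaluated as -(3 ** 2)

print(a, b, c, d, e)
```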

In [None]:
# Exponentiation, then Multiplication
3*1**3

## Operators

### Modulus operator
<a id="modulus_operator"> </a>

In computing, the modulo operation finds the remainder after division of one number by another (sometimes called modulus). Given two positive numbers, a (the dividend) and n (the divisor), a modulo n (abbreviated as a mod n) is the remainder of the Euclidean division of a by n.


In [None]:
# Divide 7 by 3
7/3

In [None]:
# Return the quotient
7//3

In [None]:
# Return the remainder
7 % 3
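
The quotient and remainder are related by dividend = divisor * quotient + remainder. The sketch below uses a made-up packing scenario along with the built-in divmod() function, which returns both values at once.

```python
items = 47        # hypothetical number of items to ship
box_size = 12     # hypothetical number of items per box

full_boxes, left_over = divmod(items, box_size)
print(full_boxes, left_over)

# The two results always reconstruct the original number
check = box_size * full_boxes + left_over
print(check == items)
```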

### Assignment vs. Comparison
---
### In Python, assignment of a value to a variable is accomplished using a single equal sign. Comparison is performed using a double equal sign.


In [None]:
x = 7
print(x)
x = 1000
print(x)

In [None]:
# Must use a double equal sign to compare values
if x == 7:
    print("Lucky Seven")
else:
    print("You lose!")

In [None]:
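# Adding a string and an integer directly (7 + userinput) raises a TypeError,
#   so convert the string to an int first.
userinput = "5"

total = 7 + int(userinput)  # "total" avoids hiding the built-in sum() function
print(total)

In [None]:
total = 7 + int(userinput)
print("total = {}".format(total))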

[1]: Source: https://www.mathsisfun.com

## Variable Scope - LEGB
When using variables, you must understand their scope. Scope refers to the visibility of a variable, that is, which parts of your program can access or change it. For example, if a variable is global, every part of your program can access it. However, variables used within a function should generally not be available outside the function.

Python looks for variables in this order:
* Local
* Enclosing
* Global
* Built-in

In [None]:
# Local scope

y = "outside of function"

def my_function():
    y = "inside function scope" # y is local to function--not available after function ends
    print(y)

my_function()

print(y) # Prints the global y; the assignment inside the function did not change it
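
The cell above covers local and global scope. The remaining two levels can be sketched as follows, using hypothetical function names: inner() finds label in the enclosing function, read_global() falls back to the global variable, and len is found in the built-in scope because nothing closer defines it.

```python
label = "global"

def outer():
    label = "enclosing"      # local to outer, enclosing for inner
    def inner():
        return label         # not local to inner, so Python looks outward
    return inner()

def read_global():
    return label             # no local or enclosing label, so the global is used

print(outer())
print(read_global())
print(len(label))            # len comes from the built-in scope
```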

# Conditional Execution

In Python, use the if statement to perform decision-making by allowing conditional execution of a statement or group of statements based on the value of an expression.

Remember that Python requires indentation to designate the start and end of code blocks. 

```<condition>``` has a value of True or False

```Python
if <condition>:
    <expression>
    <expression>
``` 

In [None]:
card_total = 22
if card_total > 21:
    print("busted!")

In [None]:
deal_again = input("deal again?")
if deal_again == 'yes':
    print("dealing again...")
    # code here that deals again...

## Conditionals with multiple expressions
The ```if``` statement may be followed by an ```else``` clause. The ```else``` clause is executed only when the ```if``` condition evaluates to ```False```.

You can use an ```if...else``` statement to execute a set of statements based on whether the value of a variable is even or odd.

![image](\images\if-then-elselogic.jpg)

The basic if statement form:

```Python
if <condition>:
    <statement>
else:
    <statement>
```

In [None]:
x = 9

if x%2 == 0:
    print('x is even')
else:
    print('x is odd')

## Logical Operators

In [None]:
shave = True
haircut = False

if shave and haircut:
    print("You know the secret knock!")
else:
    print("You're not one of us.")

In [None]:
shave = False
haircut = True

if shave or haircut:
    print("You know the secret knock!")
else:
    print("You're not one of us.")

In [None]:
# BOTH expressions must be true to satisfy the condition and print True
if 1 < 10 and -2 > -7:
    print(True)
else:
    print(False)

In [None]:
# Only ONE expression must be true to satisfy the condition and print True
if 100 < 10 or -2 > -7:
    print(True)
else:
    print(False)

## Chained Conditionals

If more than two possibilities exist, one way to express this programmatically is with ```elif``` (short for "else if").

In [None]:
x = 5
y = 5

if x > y:
    print("x is greater than y")
elif y > x:
    print("y is greater than x")
else:
    print("x and y are equal")

## Nested Conditionals

You can also nest one conditional inside another conditional. Consider the previous example, rewritten with a nested conditional:

In [None]:
x = 2
y = 3

if x == y:
    print("x and y are equal")
else:
    if x < y:
        print("x is less than y")
    else:
        print("x is greater than y")

## Grouping comparison operators
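Comparison operators can be grouped (chained): ```0 < x < 10``` is ```True``` only when both ```0 < x``` and ```x < 10``` are ```True```.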

In [None]:
x = 111
if 0 < x < 10:
    print("x is a positive single-digit number")
elif x < 0:
    print("x is a negative number")
elif x >= 10:
    print("x is a positive number with two or more digits")

In [None]:
x = 1
y = 2
z = 6

# The entire expression must be true to print values
if x < y < z:
    print(x)
    print(y)
    print(z)

## Operators and Operands
<a id="operators_and_operands"> </a>
Operators are special symbols that represent computations like addition and multiplication. The values the operator is applied to are called operands.
The operators +, -, *, /, and ** perform addition, subtraction, multiplication, division, and exponentiation, as in the following examples:


In [None]:
#Addition and subtraction
20+33-10

In [None]:
# Five squared
5**2

In [None]:
# Multiplication
(3+2)*(9+2)

In [None]:
# Division
100/25

# Iteration (looping)
Computers are very good at doing repetitive tasks. You will use iteration for many operations in Python. For example, you may loop through records in a database or examine lines in a text file.

Two methods for iterating are the ```while``` statement and the ```for``` loop.

## While statement

Computers are well-suited to repeating operations very quickly. Use the ```while``` statement to repeat a series of steps as long as a condition remains ```True```. Use it when the number of iterations is not known in advance.

```Python
while <condition>:
    <expression>
    <expression>
    ...
```

In [None]:
n = 0
while n < 7:
    print("day " + str(n))
    n = n + 1
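
The loop above could also be written with a for loop, because the number of repetitions is known. A while loop is a more natural fit when it is not. As an illustration with made-up figures, the sketch below counts how many years it takes a balance to double at 7% annual interest.

```python
balance = 1000.0     # hypothetical starting balance
target = 2 * balance
rate = 0.07          # hypothetical annual interest rate
years = 0

# Keep going as long as the target has not been reached
while balance < target:
    balance = balance * (1 + rate)
    years = years + 1

print(years)
```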

## For Loop
Use the For Loop when you have a specific number of iterations or you are iterating through a collection of items (e.g., lines in a text file).

```Python
for <variable> in range(<value>):
    <expression>
    <expression>
    ...
```

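* On the first loop, ```<variable>``` starts at 0 (or at the start value, if one is given)
* On each loop, ```<variable>``` is incremented (or decremented if the step is negative)
* After the iteration in which ```<variable>``` equals ```<value>``` - 1, the loop is complete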

### Range Function

```Python
range(start, stop, step)
```
Default values: start = 0, step = 1

With a positive step, range produces values up to, but not including, stop. With the default step of 1, the last value is stop - 1.
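
Wrapping range() in list() is a convenient way to see exactly which values it produces. A short sketch with arbitrary variable names:

```python
first_five = list(range(5))            # start defaults to 0, step to 1
evens = list(range(2, 11, 2))          # the stop value 11 is not included
countdown = list(range(10, 0, -3))     # a negative step counts down

print(first_five)
print(evens)
print(countdown)
```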

In [None]:
# Range function (see "Functions" section for more information about functions)
# range(start_value, end_value, step_value)

for x in range(11):
    print(x, end=" ")

In [None]:
# To count down, set the step value to negative.
for x in range(10,0,-1):
    print(x, end=" ")

A counter variable can be used in either the For Loop or the While loop, but the counter variable must be initialized prior to its use.

In [None]:
# Initialize a counter variable, z.
z = 0

#Loop ten times (1 through 10)
for y in range(1,11):

    #Increment z by 1 during each loop. 
    z += 1 # This is shorthand for z = z + 1
    
    print(z, end=" ")

## Looping through items

In addition to using the ```range()``` function or the ```while``` statement, you can also loop through a collection of objects such as the files in the working directory. Note that we do not need to obtain the number of items (files, in this case) present in the collection (folder). We simply ask Python to iterate through them.

In [None]:
import os  # imported here so this cell runs on its own
os.path.expanduser(r'~\Documents')

In [None]:
# Add the os module to our environment (covered later)
import os

# Get a list of all the files in the current folder and store them in the
#   variable files
files = os.listdir()

for f in files:
    print(f)

In [None]:
# Traverse the directory hierarchy below a starting folder and
#   display the full path of each file

# Start in this folder (use os.path.expanduser(r'~\Documents') for the user's Documents folder)
root_folder = os.path.expanduser('c:\\BDF')

# Return the path, folders, and files in the hierarchy
for root, dirs, files in os.walk(root_folder):
    for name in files:
        print(os.path.join(root, name))

## Break statement
Use the ```break``` statement to exit a loop. Break exits the current loop, so if you have a nested loop and wish to exit both, you'll need to provide two break statements, one for each loop.

In [None]:
total_loops = 0
for x in range(1,101):    
    for y in range(1,6):
        total_loops = total_loops + 1
        if y == 5:
            break
print(x,y, "total loops = ", total_loops)
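
In the cell above, break leaves only the inner loop, so the outer loop still runs all 100 times. One way to leave both loops, sketched here with a hypothetical search for two numbers whose product is 42, is to record the result and test it in the outer loop.

```python
found = None

for x in range(1, 11):
    for y in range(1, 11):
        if x * y == 42:
            found = (x, y)
            break            # exits the inner loop only
    if found is not None:
        break                # exits the outer loop as well

print(found)
```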


# Functions
A *function* is a discrete set of instructions typically designed to receive one or more values and return a value. The names listed in a function's definition are called *parameters*, the values passed in a function call are called *arguments*, and the function typically *returns* a value or object.


## Built-in functions
For example, the print() function takes an argument and sends output to the console.

### print() and input()
The ```print()``` function displays messages and is sometimes used as a simple debugging method.

The print function underwent major changes from Python 2 to Python 3 and is written differently depending on the version. Python 2 omits the parentheses (e.g., ```print a```) while Python 3 requires them (e.g., ```print(a)``` ).

Calling the print function without an argument results in a blank line.

In [None]:
print()

You will often print string literals such as:

In [None]:
print("Data loading complete.")

Or using a variable:

In [None]:
a = 'apple'
print(a)

In [None]:
print("The print() function did this.")

The ```print()``` function also takes optional parameters such as ```sep``` and ```end```, which override the default separator (a space) and the default end-of-line character (a newline).

In [None]:
print("Item 1","Item 2", sep="|")
print("This is on", end=" ")
print("the same line.")

The type() function takes a value or object and returns its type.

In [None]:
# Print and Type functions 
print(type(3.141))

In [None]:
# Print the highest value using the max() function. Here one function call, max(), is nested inside another, print().
print(max(1,10,3,4,5))

In [None]:
# max(words) returns the word that sorts last alphabetically ('z').
# With key=len, the len function is applied to each item in the list and
#   the item with the greatest length ('banana') is returned.
words = ['apple', 'court', 'banana','z'] # words is a list. See Data Structures.
print(max(words))
print(max(words, key=len))

In [None]:
#Display the number of characters (including the white space) in "Hello, world!"
len('Hello, world!')

## Type Conversion Functions

Python includes functions to convert values from one data type to another. 

For example, when requesting a number value from a user you may need to convert the resulting string input to a number type such as int.

In [None]:
# Input values are strings. Convert strings to the appropriate number type, if necessary.
# Entering a decimal value such as 32.5 raises a ValueError because int() cannot parse it.

tirepressure = int(input("Input current tire pressure:"))

if tirepressure < 32:
    print("Add air to tire.")
else:
    print(f"At {tirepressure} psi the tire does not require additional air pressure.")
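
One way to avoid the error is to convert with float(), which accepts decimal input, and to catch the ValueError raised by text that is not a number at all. The helper name parse_pressure is hypothetical; in a notebook you would pass it the string returned by input().

```python
def parse_pressure(text):
    """Return text converted to a float, or None if it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None

# Sample entries stand in for values a user might type
for entry in ["32.5", "30", "thirty"]:
    pressure = parse_pressure(entry)
    if pressure is None:
        print(f"'{entry}' is not a valid number.")
    elif pressure < 32:
        print("Add air to tire.")
    else:
        print(f"At {pressure} psi the tire does not require additional air pressure.")
```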

## Misc Functions
Below are common functions and explanations of how they work and when you might use them.

## Accessing functions in modules

One of the strengths of the Python language is the large number of modules available to it. To add functionality to your program, you make modules available using the import keyword. Below we import the math module and the random module.


In [None]:
# Get volume of a sphere using radius (r)

import math

def get_sphere_volume(r):
    """Returns volume of a sphere given the radius (r)."""
    
    #Use the pi constant from the math module 
    volume = (4/3) * math.pi * r**3
    return volume

#Call the function to find the volume of a sphere with a radius of 2.
get_sphere_volume(2)

### Generating random numbers
To generate random numbers, use the random module. Note that this module is not designed for cryptographic use. 

In [None]:
# The random module must be imported before use; otherwise random.randint raises a NameError
import random

# Print 10 numbers between 1 and 100 (inclusive)
for x in range(10):
    print(x, random.randint(1, 100))  # randint includes both endpoints

## Creating your own Functions
Use the def keyword to define custom functions. Empty parentheses following the function name indicate the function takes no arguments.

In [None]:
def print_lyrics():
    """ Prints lumberjack lyrics. How cool! """
    print("I'm a lumberjack and I'm okay.")
    
def repeat_lyrics():
    print_lyrics()
    print_lyrics()
    
repeat_lyrics()

## Docstrings
Docstrings (documentation strings) provide a helpful and convenient method of
displaying documentation with Python modules, functions, classes, and methods. 

An object's docstring is defined by including a string constant as the first
statement in the object's definition and can be viewed by calling help(function).

In [None]:
help(print_lyrics)

## Passing values
Functions defined with parameters accept values (arguments) when they are called. 

In [None]:
def print_stuff(mystuff):
    """ Prints the string passed to it. """
    print(mystuff)
    
newstuff = "really cool stuff"
print_stuff(newstuff)

### Optional arguments
The parameter in the previous function is required. Attempting to call the function without an argument results in an error (a TypeError).

In addition to required positional arguments, Python allows optional arguments, which are given a default value in the function definition. All required positional arguments must precede optional arguments.

In [1]:
# Optional arguments
def print_stuff(my_stuff="no stuff"):
    print(my_stuff)
    
print_stuff()
print_stuff("goodwill stuff")

no stuff
goodwill stuff


In [None]:
20%10

In [None]:
# Function using keyword and default arguments
def calc_tip(amount, percentage = .15):
    """ Calculate a tip based on an amount. 15% is default. """
    tip = amount * percentage
    return tip

print(calc_tip(10))

print(calc_tip(10, percentage = .25))
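
As a further sketch of the rule that required parameters come before optional ones, the function below takes one required parameter and two optional ones. The name `calc_total` and the 8% tax rate are hypothetical and chosen only for illustration.

```python
# Hypothetical example: one required parameter followed by two optional ones
def calc_total(amount, tip_pct=0.15, tax_pct=0.08):
    """Return amount plus tip and tax, rounded to cents."""
    return round(amount * (1 + tip_pct + tax_pct), 2)

print(calc_total(100))                              # uses both defaults
print(calc_total(100, tax_pct=0.10))                # overrides only the tax rate
print(calc_total(100, tax_pct=0.10, tip_pct=0.20))  # keyword order does not matter
```

Because the optional values are passed by keyword, you can override one default without having to supply the other.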

### \*args and \*\*kwargs
The \* symbol (by convention \*args) enables a variable number of positional arguments to be passed to a function. 

In [None]:
# Passing a variable number of parameters
def print_grades(*args):
    number_of_grades = len(args)
    sum_of_grades = sum(args)
    avg_grade = sum_of_grades/number_of_grades
    print(args)
    print(f"Average grade = {avg_grade}")
    
print_grades(88,99,56,100,92)
print_grades(88,82,99,97,89,100,84,96,92,92,90)

Using ```*args``` is by convention, not by requirement. You can use whatever name makes the most sense to you.

In [None]:
# Passing a variable number of parameters
def print_grades(*grades):
    number_of_grades = len(grades)
    sum_of_grades = sum(grades)
    avg_grade = sum_of_grades/number_of_grades
    print(grades)
    print(f"Average grade = {avg_grade}")
    
print_grades(88,99,56,100,92)
print_grades(88,82,99,97,89,100,84,96,92,92,90)

In [None]:
# Passing a variable number of parameters and iterating through them
def print_grades(*grades):
    for grade in grades:
        if grade == 100:
            print("Wow, a perfect score!!")
    
print_grades(88,99,56,100,92)
print_grades(88,82,99,97,89,100,84,96,92,92,90)

In [None]:
# Passing a variable number of parameters and iterating through them
def print_grades(*grades):
    for grade in grades:
        print(grade)

# Pass grades directly in the function call
print_grades(88,99,56,100,92)

# Get input from the user
grades = input("Enter grades separated by a comma:")

# Split the input by comma delimiter (results in a list)
grades = grades.split(',')

# Pass using the unpack operator
print_grades(*grades)

### Using \*\*kwargs
The \*\* symbol (by convention \*\*kwargs) enables a variable number of keyword arguments to be passed to a function.

In [None]:
# Passing a variable number of keyword arguments
def show_grades(**kwargs):
    
    print(f"kwargs = {kwargs}")
    
    print("Student - Grade")
    for key, value in kwargs.items():
        print(f"{key} - {value}")
    print() #blank line

show_grades(Alice=88,Joe=99,Jevontae=56,Subha=100,Kelly=92)
show_grades(Stewart=100,Mark=105,Joe=95,Eric=75)
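
Both forms can appear in the same function definition, provided the positional collector comes before the keyword collector. Below is a minimal sketch; `describe_order` and its parameter names are hypothetical.

```python
# Hypothetical function combining a required parameter, *args, and **kwargs
def describe_order(customer, *items, **options):
    """Return a one-line summary of an order."""
    summary = f"{customer} ordered {len(items)} item(s)"
    for key, value in options.items():
        summary += f", {key}={value}"
    return summary

print(describe_order("Alice", "coffee", "bagel", size="large", to_go=True))
print(describe_order("Joe"))
```

Inside the function, `items` is a tuple and `options` is a dictionary, so both may be empty.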

## What is the Dot Operator?
Just about everything in Python is an object. The dot operator enables you to access attributes (data values) and methods (functions) associated with the object. 

Press <TAB> after a dot to see a list of methods and properties associated with the object.

In the following code, the dot operator is used to access the pi attribute of the math module and the upper() method of a string:
```Python
# General form:
#   a_python_object.do_something()       <- calls a method
#   a_python_object.access_an_attribute  <- reads an attribute
import math

r = 2  # example radius

# Attribute example:
volume = (4/3) * math.pi * r**3

# Method example:
print("Hello".upper())
```

In [None]:
import math

r = 5
volume = (4/3) * math.pi * r**3

print(f"volume = {volume}")

In [None]:
# Using the upper() method on a string via the dot operator
print("Hello!".upper())

In [None]:
#What other functions are available in the math module? Use the dir() function to list a directory of math attributes.
dir(math)

## Void and Return Functions
PY4E calls functions that return values "fruitful." Functions that do not return values are void functions.

In [None]:
def addtwo(a,b):
    """ Returns the sum of two numbers."""
    added = a + b
    return added

z = addtwo(1,2) * 10
print("hi")

In [None]:
print(z)

In [None]:
print(addtwo(3,5))
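
To make the difference concrete, here is a small sketch contrasting the two kinds of functions. The names `greet` and `make_greeting` are hypothetical.

```python
def greet(name):
    """Void function: prints a greeting but has no return statement."""
    print(f"Hello, {name}!")

def make_greeting(name):
    """Fruitful function: returns the greeting instead of printing it."""
    return f"Hello, {name}!"

result = greet("Ada")             # prints the greeting
print(result)                     # a void function returns None
message = make_greeting("Ada")    # nothing is printed here
print(message.upper())            # the returned value can be used later
```

A void function still returns something: the special value None.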

## Currency and date formatting
Because currency and date formats vary by locale, a recommended way of formatting currency and dates is to use the locale module. This module accesses the locale of your current system and applies it to format values.

In [None]:
import locale
import datetime

#Sets locale for all categories to the user's default setting
locale.setlocale(locale.LC_ALL, '')

#To add commas, set grouping = True
print(locale.currency(100000.55977, grouping=True))

today = datetime.date.today()
print(today)

In [None]:
dir(datetime)

In [None]:
#Source: https://www.programiz.com/python-programming/datetime#datetime
now = datetime.datetime.now() # current date and time

year = now.strftime("%Y")
print("year:", year)

month = now.strftime("%m")
print("month:", month)

day = now.strftime("%d")
print("day:", day)

time = now.strftime("%H:%M:%S")
print("time:", time)

date_time = now.strftime("%m/%d/%Y, %H:%M:%S")
print("date and time:",date_time)	

In [None]:
#Source: https://www.programiz.com/python-programming/datetime#datetime

from datetime import datetime, date

t1 = date(year = 2018, month = 7, day = 12)
t2 = date(year = 2017, month = 12, day = 23)
t3 = t1 - t2
print("t3 =", t3)

t4 = datetime(year = 2019, month = 1, day = 12, hour = 7, minute = 9, second = 33)
t5 = datetime(year = 2019, month = 12, day = 25, hour = 5, minute = 55, second = 13)
t6 = t5 - t4
print("t6 =", t6)

print("type of t3 =", type(t3)) 
print("type of t6 =", type(t6))
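
Subtracting two dates produces a timedelta object, as shown above. A timedelta can also be added to a date to get a new date. The following sketch assumes an illustrative invoice date and a 30-day payment term.

```python
from datetime import date, timedelta

invoice_date = date(2019, 1, 12)              # illustrative date
due_date = invoice_date + timedelta(days=30)  # add 30 days

print("due date:", due_date)
print("days between:", (due_date - invoice_date).days)
```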

# File Operations

Use the open() function to read (r), append (a), or write (w) to a file. Opening a file returns a file handle, not the actual data in the file. After opening the file, you can read from or write to it. When you are finished with the file, ensure it is closed. Failing to close a file may lead to memory issues, inaccessible files, and possibly data loss.

In [None]:
# use the os module to access operating system information such as the current working directory (getcwd())
import os

# Create (or overwrite) a file
# If no path is specified, the file will be created in the current working directory

# If the file exists, opening with the "w" parameter will overwrite a file of the same name
#   if present in the same folder. To append instead of overwriting, use the "a" mode.
f = open("demofile.txt", "w")

f.write("This is the first line of the file.\n")

# Be sure to close your file. Failure to do so will cause problems.
f.close()

# Get the current directory
print(os.getcwd())

## Using the with statement for opening files
One advantage of using the with statement is that files opened this way are automatically closed, even if an error occurs inside the block.

In [None]:
# Append to the file
with open("demofile.txt", "a") as f:
    f.write("This is the second line.\n")

    # No need to explicitly close the file. Close() is automatically called.
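
You can confirm the automatic close by checking the file handle's closed attribute. This sketch writes to the system's temporary folder (an assumption made so that no existing file is touched) and removes the file afterward.

```python
import os
import tempfile

# Write to a temporary folder so no existing file is touched
path = os.path.join(tempfile.gettempdir(), "with_demo.txt")

with open(path, "w") as f:
    f.write("first line\n")
    print("inside the block, closed?", f.closed)

print("after the block, closed?", f.closed)
os.remove(path)  # clean up the temporary file
```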

## Reading files
There are several ways to read data from a file, including reading a specified number of characters, reading line-by-line, or reading all lines at once.

### Reading an entire file

In [None]:
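# Build the path with os.path.join so the correct separator is used on any operating system
file_path = os.path.join(os.getcwd(), "files")
file_abs = os.path.join(file_path, "gettysburg.txt")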
with open(file_abs,"r") as fh_getty:
    #read() will access the entire file. Not a good option for large files.
    print(fh_getty.read())

In [None]:
with open(file_abs,"r") as fh_getty:
    n = 100
    #read(n) reads at most n characters instead of the entire file.
    print(fh_getty.read(n)) # Read the first n characters

In [None]:
with open(file_abs,"r") as fh_getty:    
    #readline() reads one line at a time, including its trailing newline character.
    print(fh_getty.readline()) # Read a line
    print(fh_getty.readline()) # Read a line
    print(fh_getty.readline()) # Read a line

In [None]:
with open(file_abs,"r") as fh_getty:    
    #readlines() reads the entire file into a list. Not a good option for large files.
    x = fh_getty.readlines() # Returns a list of lines, each ending with a newline character
    print(x[0])

In [None]:
with open(file_abs,"r") as fh_getty:
    line_number = 0
    for x in fh_getty: # "x" will represent a line
        print(str(line_number) + ": " + x)
        line_number += 1
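
The manual counter in the loop above can also be written with the built-in enumerate() function. In this sketch, io.StringIO stands in for an open file so the example runs without a file on disk; the sample text is made up.

```python
import io

# io.StringIO stands in for an open file handle
fake_file = io.StringIO("Four score\nand seven\nyears ago\n")

numbered = []
for line_number, line in enumerate(fake_file, start=1):
    # rstrip() removes the trailing newline from each line
    numbered.append(f"{line_number}: {line.rstrip()}")

print(numbered)
```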

In [None]:
with open(".\\files\\fake_customer_list.txt", "r") as fh_customers:

    for line in fh_customers:
        #print(line)
        customer_list = line.strip().split("|")
        full_name = customer_list[0]
        email = customer_list[-1]
        print(full_name + " -- " + email)        

In [None]:
# List files in the current directory 
import os
print(os.listdir(os.getcwd()))

### Use the CSV module to read a file

In [None]:
import csv
with open('.\\files\\lyricsonly2M.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)

In [None]:
print(your_list)

In [None]:
len(your_list)
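
If the first row of a CSV file holds column names, csv.DictReader lets you refer to fields by name instead of by position. The sketch below uses made-up sample data, with io.StringIO standing in for an open file.

```python
import csv
import io

# Made-up sample data; io.StringIO stands in for an open CSV file
sample = io.StringIO("artist,title\nQueen,Bohemian Rhapsody\nEagles,Hotel California\n")

reader = csv.DictReader(sample)   # the first row supplies the column names
rows = list(reader)

for row in rows:
    print(row["artist"], "-", row["title"])
```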

In [None]:
import pandas as pd
# Use the first row as column names and the remaining rows as data
df_lyrics = pd.DataFrame(your_list[1:], columns=your_list[0])

In [None]:
print(df_lyrics.shape)
with pd.option_context('display.max_seq_items', None):
    print(df_lyrics.head(5000))


In [None]:
print(your_list[0])

## Writing to a File

In [None]:
import os
with open('notebook_list.txt','w') as f:
    for root, dirs, files in os.walk("C:\\Users\\gregb\\documents", topdown=False):
        for name in files:
            if name.endswith('.ipynb'):
                print("writing...",os.path.join(root, name))
                f.write(os.path.join(root, name)+"\n")

## Use Pandas to Read a file
Although built-in file operations in Python may be useful for trivial matters, Pandas and NumPy are much more effective for reading, shaping, and analyzing data. Using these libraries is beyond the scope of this course, but you should be aware of them. See the Pandas notebook for more information.

In [None]:
import pandas as pd

df = pd.read_csv(".\\files\\2017_instacart_products.csv")
df.head()

In [None]:
# Use the describe function to display descriptive statistics for numerical fields (even if that doesn't make sense...)
df.describe()

# String Operations
Text in Python is represented by a string. A string is an immutable sequence of Unicode characters. This means that once defined, a string cannot be changed. You can access (but not change) characters one at a time using the bracket [] operator.

## Strings are arrays
Because strings are stored as an array, you may access characters using array notation.

In [8]:
x = 'Guardians of the Galaxy Vol. 1'
print(x[0:6]) # Print the first six characters

Guardi


In [6]:
print(x[-5:]) # Print the last 5 characters

ol. 1


## Strings are immutable
Like tuples, strings cannot be changed.

In [None]:
# Error: Strings are immutable
print(x[29])
x[29] = '2' # Change Vol. 1 to Vol. 2

In [None]:
# Instead of attempting to change the array, reassign the variable to the new value
x = "Guardians of the Galaxy Vol. 2"

## Escape characters and raw strings
Prefix a string with ‘r’ or ‘R’ to force Python to treat backslash (\\) as a literal character, e.g., in Windows file paths.

| Escape Character | Prints as |
|------------------|-----------|
| \\' | Single quote |
| \\" | Double quote |
| \\t | Tab |
| \\n | Newline (line break) |
| \\\\ | Backslash |



In [None]:
# SyntaxError: \u begins a Unicode escape sequence, so this path is not a valid string
file_path = "c:\users\gregb\documents\python_projects"
print(file_path)

In [None]:
# Escaping backslash for file name
file_path = "c:\\users\\gregb\\documents\\python_projects"
print(file_path)

In [None]:
# Using raw string
file_path = r"c:\users\gregb\documents\python_projects"
print(file_path)
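
One way to see what the raw prefix does is to compare the lengths of a normal string and a raw string. This is a small illustrative sketch.

```python
normal = "a\tb"    # \t is interpreted as a single tab character
raw = r"a\tb"      # the backslash and the t are kept as two separate characters

print(len(normal), len(raw))
print(normal)
print(raw)
```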

## Using single, double, and triple quotes
You may use single, double, or triple quotes. Use double or triple quotes when a string contains an apostrophe (single quote). Use triple quotes when it contains both single and double quotes.

In [None]:
#Triple quotes specify a string and allow an apostrophe without an escape character.
statement = """I'm a Python programmer."""



#Single quotes also specify a string.
howdy = 'hello, world!'
print(howdy)

#To print a quote inside a string delimited by the same quote, you can use the escape character (\)
as_good_as_it_gets = 'Sell crazy someplace else. We\'re all stocked up here.'
print(as_good_as_it_gets)

#Triple quotes are helpful when you want to display single or double quotes within a string without using an escape character.
cannoli = """Clemenza said, "Leave the gun, take the cannoli." It's one of my 'fav' movie quotes."""
print(cannoli)

## String Capitalization

In [None]:
# Capitalize the first character of the string
print(howdy.capitalize())

In [None]:
# Capitalize each word
print(howdy.title())
print("this is title case".title())

In [None]:
# Convert all characters to uppercase
print(howdy.upper())
print("this is all caps".upper())

#Use upper() to compare strings ignoring case
myfavfruit = "Kiwi"

if myfavfruit.upper() == "KIWI":
    print("That's my fav!")
else:
    print("Not my fav")

In [None]:
book_title = "THE UNOFFICIAL GUIDE TO ETHICAL HACKING"

# Lowercase
print(book_title.lower())

In [3]:
# Store the string "banana" in the favorite_fruit variable. 
favorite_fruit = "banana"

# A string is essentially an array of characters and you may access them like you would an array.
print(favorite_fruit)
print(favorite_fruit[0])
print(favorite_fruit[-5:]) # Print the last five characters

banana
b
anana


In [None]:
# Strings are immutable. Unlike a typical array, you may NOT modify the items (characters) in the array.
# Rather than changing the first letter from 'b' to 'B', this code results in an error.
favorite_fruit[0] = 'B'

In [None]:
print(favorite_fruit)

# You can, however, replace the string value (e.g., "banana") with something else (e.g, "apple")
favorite_fruit = "apple"

print(favorite_fruit)

## String Concatenation

Use the '+' operator to join strings.

In [None]:
first_name = "Gregory"
middle_initial = "J"
last_name = "Bott"

full_name = first_name + " " + middle_initial + " " + last_name + ", Ph.D."
print(full_name)

## Removing white space
A common task when working with data is to remove white space (spaces, tabs, newlines) from the beginning and end of a string. To remove white space, use the **strip** method.

In [None]:
# value is preceded by three tabs and followed by a line break
data_column = "\t\t\t 133422.88\n"
print(str(data_column) + "[end]")

#tabs and the line break are stripped from the string
print(data_column.strip() + "[end]")
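
Related methods remove white space from only one side, and strip() can also remove other characters. A short sketch follows; the price string is a made-up example.

```python
data_column = "\t\t\t 133422.88\n"

print(repr(data_column.lstrip()))   # removes leading white space only
print(repr(data_column.rstrip()))   # removes trailing white space only

# strip() also accepts the characters to remove from both ends
price = "$1,299.00$".strip("$")
print(price)
```

Using repr() makes the remaining tabs and newlines visible in the output.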

## Format Operator
To substitute values from variables or functions into a string, use the *format operator* %. 

Do not confuse the format operator with the modulus operator. In the expression 4 % 2, which evaluates to 0, '%' is the modulus operator because it appears between two numbers. 

When % follows a string, it acts as the *format operator*, and the string contains conversion specifiers such as:
<br>%d = signed integer decimal
<br>%s = string
<br>%f = float

For more conversion types go to https://docs.python.org/3/library/stdtypes.html#old-string-formatting

In [None]:
b_of_b_on_wall = 99
beverage = "beer"

# Count down from the starting number of bottles to 1
for bottle_num in range(b_of_b_on_wall, 0, -1):
    print("%d bottles of %s on the wall, %d bottles of %s." % (bottle_num, beverage, bottle_num, beverage))
    print("Take one down and pass it around, %d bottles of %s on the wall." % (bottle_num - 1, beverage))
    

## Using str.format()
The format operator is a good option, but when you have multiple placeholders in a string, code becomes less readable. 

One advantage of the str.format() method is that you can use the replacement fields in any order. Simply use their index values.

In [None]:
name = "Greg"
age = "82"

print("Hello, {}. You are {}.".format(name, age))

In [None]:
name = "Greg"
age = "82"

#Reference index to use out of sequence order.
print("Hello, {1}. You are {0}.".format(age, name))

In [None]:
# Use dictionary values
person = {'name': 'Greg', 'age': 82}
print("Hello, {name}. You are {age}.".format(**person))

## Using f-Strings
Beginning with Python 3.6, you can use f-strings ("formatted string literals"). The syntax for f-strings is similar to str.format() but results in more readable code.

In [None]:
name = "Greg"
age = "82"

print(f"Hello, {name}. You are {age}.")
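
f-strings also accept a format specification after a colon, which is useful for business values such as currency and percentages. The values below are made up for illustration.

```python
revenue = 1234567.891
growth = 0.082
item = "Widgets"

print(f"Revenue: ${revenue:,.2f}")   # thousands separator, two decimals
print(f"Growth: {growth:.1%}")       # percentage with one decimal
print(f"[{item:<10}]")               # left-aligned in a field of width 10
print(f"[{item:>10}]")               # right-aligned in a field of width 10
```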

## Splitting and Joining strings

In [2]:
# Below is a database record exported using the pipe symbol ("|") to separate fields
exported_record = "Quin J. Alford|Proin Company|Ap #664-5782 Felis St.|Butte|35565|MT|-72.72653, -167.07764|4716 4071 8086 1415|436|eu@pellentesque.net"
print("original data:")
print(exported_record)
print()

#Split the data at each pipe symbol. The result of the split function is a Python list (essentially an array)
exported_record = exported_record.split("|")
print("converted to a list: ")
print(exported_record)
print()

original data:
Quin J. Alford|Proin Company|Ap #664-5782 Felis St.|Butte|35565|MT|-72.72653, -167.07764|4716 4071 8086 1415|436|eu@pellentesque.net

converted to a list: 
['Quin J. Alford', 'Proin Company', 'Ap #664-5782 Felis St.', 'Butte', '35565', 'MT', '-72.72653, -167.07764', '4716 4071 8086 1415', '436', 'eu@pellentesque.net']



In [3]:
#Print the first and last member of the list. Access the first element (element 0), and the last element (-1).
#The second to last element would be accessed using [-2].
print("Name: " + exported_record[0], "   email: " + exported_record[-1] +"\n")

#Join the list by comma and store in exported_record_csv
exported_record_csv = ",".join(exported_record)
print(exported_record_csv)

Name: Quin J. Alford    email: eu@pellentesque.net

Quin J. Alford,Proin Company,Ap #664-5782 Felis St.,Butte,35565,MT,-72.72653, -167.07764,4716 4071 8086 1415,436,eu@pellentesque.net


## Find() method for strings

In [1]:
gburg_text = "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal."

#Find first instance of a word, find(value, start default = 0, end default = end of string)
find_pos = gburg_text.find("score")
print(find_pos)

5
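
The optional start argument and the return value for a missing word are worth seeing as well. This sketch uses a shortened sample sentence.

```python
text = "Four score and seven years ago"

print(text.find("and"))        # index of the first match
print(text.find("e", 10))      # start searching at index 10
print(text.find("decade"))     # -1 means the value was not found
print("seven" in text)         # use 'in' when only a yes/no answer is needed
```

Because find() returns -1 rather than raising an error, check the result before using it as an index.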


# Working with Dates

In Python, a date is not a built-in data type. Use the datetime module to work with dates as date objects.

In [None]:
import datetime

datetime_obj = datetime.datetime.now()
print(datetime_obj)

# Error Handling

Robust programs anticipate and gracefully handle unexpected situations and errors. For example, when asking a user to input a number, a robust program gracefully handles unexpected or erroneous input. Other examples include attempting to open a file or connect to a database.

```Python
try:
    pass
except Exception:
    pass
else:
    pass
finally:
    pass
```

Error handling enables the developer to gracefully respond to exceptions in code. Without error handling, users will be confronted with error output they may not understand and that stops execution. 

Instead, use error handling to communicate resolution steps to the user and continue execution or exit gracefully.

In [1]:
# Error: File Does not exist
with open("demo_file.txt", 'r') as f:
    f.read()

FileNotFoundError: [Errno 2] No such file or directory: 'demo_file.txt'

In [3]:
# Wrap error-prone code in try...except blocks
try:
    with open("demo_file.txt", 'r') as f:
        f.read()
except:
    print("File not found")

File not found


In [9]:
# To take specific actions, place the type of error after the except statement.
#   This block will only catch FileNotFoundError exceptions
try:
    with open("demofile.txt", 'r') as f:
        f.read()
    newfile = myfile # NameError: not caught by the except clause below
except FileNotFoundError as e:
    print(e, "Please input the correct path and file name.")

NameError: name 'myfile' is not defined

When handling multiple exceptions, sort your exception handling with the most specific at the top and the more general towards the bottom. Otherwise, the specific exceptions will never be caught.

In [12]:
# Use as many except statements as needed
try:
    with open("demofile.txt", 'r') as f:
        f.read()
    newfile = myfile
except FileNotFoundError as e:
    print(e, "Please input the correct path and file name.") # Help users understand how to resolve the error
except NameError as e:
    print(e)

name 'myfile' is not defined


### Using Else and Finally
Use the ```else``` clause to run code if NO errors are thrown.
Code in the ```finally``` block runs irrespective of whether an exception was caught.

In [15]:
try:
    with open("demofile.txt", 'r') as f:
        f.read()
    newfile = ""
except FileNotFoundError as e:
    print(e, "Please input the correct path and file name.") # Help users understand how to resolve the error
except NameError as e:
    print(e)
else:
    print("No exceptions thrown.")
finally:
    print("Opening file process completed.")

No exceptions thrown.
Opening file process completed.


In [16]:
# You can also raise errors manually

if not isinstance(newfile, int):
    raise TypeError("Only integers are allowed") 

TypeError: Only integers are allowed

In [17]:
try:
    if not isinstance(newfile, int):
        raise TypeError("Only integers are allowed") 
except Exception as e:
    print(e)

Only integers are allowed
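
The same pattern applies to validating user input. The sketch below puts try, except, else, and finally in one function; `parse_pressure` is a hypothetical helper, and hard-coded strings stand in for input().

```python
def parse_pressure(text):
    """Hypothetical helper: return tire pressure as a float, or None if invalid."""
    try:
        value = float(text)
    except ValueError:
        print(f"'{text}' is not a number.")
        return None
    else:
        # Runs only when float() succeeded
        if value < 0:
            print("Pressure cannot be negative.")
            return None
        return value
    finally:
        # Runs in every case, even when a return statement was reached
        print("Finished checking input.")

print(parse_pressure("31.5"))
print(parse_pressure("thirty"))
```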


# Data Structures


## What's a data structure?
As its name implies, a data structure is a container that holds data. Just like some post office boxes hold packages and others hold letters, Python's built-in data structures have different purposes and uses. Use data structures to organize and perform operations on data. Python has the following built-in data structures: Lists, Dictionaries, Sets, and Tuples. Each container has different attributes and is used for a different purpose.

sources: (W3Schools, RealPython.com)

<img src="images/post_office_boxes.jpg" align="middle">

## Comparing Built-in Data Structures
Below is a comparison of four built-in data structures in Python. 

![](images/Structures.jpg)

## Tuples
What is the proper pronunciation of "tuple"? Answer: either TEW-pull or Tupple (like the 'u' sound in pup). 

* Ordered sequence of elements
* Tuples are immutable
* Parentheses denote a tuple

In [None]:
# Create an empty tuple
t = ()

# Create a tuple containing values of different types
t = (4, "hello", True, 3.1)
print(t[1])

In [None]:
# Concatenation with a tuple

print(t)
print("Concatenate '7'")
print((t) + (7,)) # Note the comma--the comma tells Python that this is a tuple and not an int

In [None]:
# Iterating a tuple
for v in t:
    print(v)

In [None]:
# zip function example

s = 'abc'
t = [0, 1, 2] 
zip(s, t)
for pair in zip(s, t):
    print(pair)

In [None]:
def tip_options(amount):
    # Use a tuple to return more than one value
    return(amount, amount*1.10, amount*1.15, amount*1.2)

print(tip_options(30))
print(type(tip_options(30)))

In [None]:
# Use a tuple to swap values
y = 5
x = 10
print('x =', x, 'y =', y)
(x, y) = (y, x)
print('x =', x, 'y =', y)
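
The swap above relies on tuple unpacking, which is also a convenient way to receive several values returned by a function. A minimal sketch follows; `min_max` is a hypothetical helper.

```python
def min_max(values):
    """Hypothetical helper: return the smallest and largest value as a tuple."""
    return min(values), max(values)   # the comma creates the tuple

lowest, highest = min_max([88, 99, 56, 100, 92])   # unpack into two variables
print(lowest, highest)

# Tuples are immutable: item assignment raises a TypeError
point = (3, 4)
try:
    point[0] = 10
except TypeError as e:
    print("TypeError:", e)
```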

## Lists

A list is an ordered sequence of *items*. Lists are similar to arrays in other languages. One difference is that Lists can contain different types of data.

### Creating Lists
 
Lists are created using several methods.

In [2]:
#Use square brackets to make list

cities = [] # an empty list



In [None]:
print(cities)

In [None]:
cities = ["Dallas","Chicago","Miami","Grand Rapids" ]
print(cities," is of type ", type(cities))

In [None]:
# Get the length of your list using the len() function
len(cities)

In [None]:
#Ordered -- accessible via index
print(cities[2]) # Print the third item in a list

In [None]:
print(cities[-1]) # Print the last item in a list

In [1]:
print(cities[1:3]) # print the second and third items. Second value is not inclusive.



In [None]:
print(cities[1:]) # print the second item through the end of the list.

In [None]:
# Lists are not limited to containing only values of a single type
# A list may contain objects such as another list 
my_list = [True, 0, "Greg Bott", 3.14159, ["steak","eggs","donuts"]]
print(my_list)

In [None]:
def my_function(my_parameter):
    pass
my_function(0)

### Lists are Mutable

In [None]:
print(cities)
cities[2] = "San Antonio" # Replace the third entry ('Miami') with 'San Antonio'
print(cities) # mutated list object

### Adding items to a list
```Python
<list>.append(element)
``` 
Use append to add elements to the end of a list. This operation *mutates* the list.

In [None]:
# Use insert() to add an item to specific location in the list. 
print(cities)
cities.insert(1, 'Austin') # Insert Austin in the second position
print(cities)

In [None]:
# Use the append() method to add an item to the list
cities.append("Columbia")
print(cities)

# Using append() to add multiple items results in a list within a list
more_cities = ['St. Louis', 'Tempe', 'Atlanta']

# append() accepts exactly one argument and adds it as a single item
cities.append(more_cities) 
print(cities)

In [None]:
# Reset the list to the original cities
cities = ['Dallas', 'Austin', 'Chicago', 'Miami', 'Grand Rapids', 'Columbia']

In [None]:
more_cities = ['St. Louis', 'Tempe', 'Atlanta']

# Use extend() when you want to add multiple values to a list
cities.extend(more_cities)
print(cities)

### Copying a list
If you use the following assignment to create a new list, what you have is two references to a single object, NOT two lists.
```Python
list_b = list_a
```

In [None]:
list_a = [1,2,3,4,5]

list_b = list_a
print("a=",list_a)
print("b=",list_b)

In [None]:
# Change ONLY list_b. Because both names refer to the same object, list_a changes too.
list_b[0] = 'Protein bar'
print("a=",list_a)
print("b=",list_b)

In [None]:
list_a = [1,2,3,4,5]

# To make a *copy* of the object instead of referencing the same object, use copy()
list_b = list_a.copy()
print("a=",list_a)
print("b=",list_b)

In [None]:
# Change ONLY list_b. list_a is unaffected because list_b is a separate copy.
list_b[0] = 'Protein bar'
print("a=",list_a)
print("b=",list_b)
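
Note that copy() makes a shallow copy: the outer list is new, but any lists nested inside it are still shared. When a list contains other lists, the standard library's copy.deepcopy() is one option. The menu data below is made up for illustration.

```python
import copy

menu = [["steak", "eggs"], ["donuts"]]   # a list that contains other lists

shallow = menu.copy()          # new outer list, but the inner lists are shared
deep = copy.deepcopy(menu)     # inner lists are copied as well

menu[0][0] = "tofu"            # change an inner list through the original

print("menu    =", menu)
print("shallow =", shallow)    # shows the change because inner lists are shared
print("deep    =", deep)       # unchanged
```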

### Converting Lists to Strings and Back
Using the split() method to separate a string at a delimiter (e.g., a comma) creates a list object.

In [None]:
states = "Missouri, Alabama, Texas, Washington, Florida"
print("States is of ", type(states))

states = states.split(",")
print(states, "this is of",type(states))

In [None]:
print(states[2])

In [None]:
states = ','.join(states)
print(states, "this is of",type(states))

In [None]:
# Remember that you may split on any character.
#   Here is an example of splitting an email address
#   into the user name (string prior to the '@' symbol)
#   and the domain (the part following the '@' symbol).

addr = 'monty@python.org'
uname, domain = addr.split('@')
split_email = addr.split('@' )
print(f"user name = {uname}")
print(f"domain name = {domain}")

### Using List Comprehensions

List comprehensions are a compact method to build lists using a single line of code. 

Basic syntax
```python
[ expr for item in iterable ]
```

Instead of:

In [None]:
# Iteration method to load a list
L = []
for n in range(12):
    L.append(n ** 2)
print(L)

In [2]:
# Loading items using a List Comprehension
L = [n ** 2 for n in range(12)]
print(L)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

### Removing Items from a List

In [None]:
# Remove elements by index
t = ['a', 'b', 'c', 'd', 'e']
del(t[1])
print(t)

In [None]:
# Delete the last item on the list. Returns the item deleted. Mutates list.
x = t.pop()

print('Deleted ', x)
print('New list is ', t)

In [None]:
# Remove specific element (e.g., remove 'Chicago'), mutates the list.
print(cities)
cities.remove('Chicago')
print(cities)

In [None]:
# ERROR: If not in list, error.
cities.remove('St. Louis')

Be careful when removing items from a list. If you attempt to remove items while iterating over the same list, items may be skipped. 

In [None]:
my_list = [1,2,3,4,5,6,7,8,9,10]
print(my_list[6])

In [None]:
# The intent of this code is to remove numbers greater than 5 from my list.

for item in my_list:
    if item > 5:
        my_list.remove(item)

# ERROR: However, 7 and 9 remain because they were skipped as items were removed.
print(my_list)

One solution is to use a list comprehension.

In [None]:
my_list = [1,2,3,4,5,6,7,8,9,10]

# Only keep items in the list that are less than 6
my_list = [item for item in my_list if item < 6]

print(my_list)

Another solution is to iterate over the list in reverse. That way, when an item near the end (e.g., the item at index 9) is deleted, the indexes of the items that have not yet been visited are not altered.

In [None]:
my_list = [1,2,3,4,5,6,7,8,9,10]

# Reverse the list and delete items greater than 5
for item in reversed(my_list):
    if item > 5:
        my_list.remove(item)
        
print(my_list)

### Testing for membership

Use the in keyword to test for list membership.

In [5]:
print("Dallas" in cities)
print("Tuscaloosa" in cities)

True
False


### Iterating a list

Use a for loop to iterate a list.

In [None]:
# Loop through list
for city in cities:
    print(city)

Use the len() function to determine how many items are in the list and use that within a range() function.

In [3]:
for i in range(len(cities)):
    print(cities[i],end=" ")

Dallas Chicago Miami Grand Rapids 

### List Concatenation


In [1]:
# Use the '+' operator to concatenate lists
a = [1,2,3]
b = [4,5,6]
c = a + b # does not mutate 'a' or 'b'

print("c = ", c)

print("a = ", a)
print("b = ", b)

c =  [1, 2, 3, 4, 5, 6]
a =  [1, 2, 3]
b =  [4, 5, 6]


### Extending a list

In [None]:
print("list 'a' = ", a)
print("list 'b' =", b)

a.extend(b) # This combines a and b, mutates a but not b

print("list 'a' =", a)
print("list 'b' =", b)
print(c)

In [None]:
# Use the '*' operator to repeat items
print(a * 3)

### Slicing Lists
You can return parts of a list using slicing operators. Other objects (e.g., strings and tuples) can also be sliced.

In [None]:
# Slicing operations

t = ['a', 'b','c','d','e','f','g']

# return the 2nd and 3rd elements in t
print(t[1:3])

In [None]:
# Omitting the first parameter tells the interpreter to start at the beginning
print(t[:3])

In [None]:
# Omitting the second parameter tells the interpreter to continue to the end
# start with the fourth element (index 3) and return all elements to the end of the list
print(t[3:])

In [None]:
# Get the last item
print("Last item in the list = ", t[-1])

In [None]:
# Use negative indexing to replace the last item in the list
t[-1] = 'watermelon'
print(t)
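
A slice may also take a third value, the step. The sketch below shows a few common uses.

```python
t = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

print(t[::2])            # every second element
print(t[::-1])           # a reversed copy of the list
print(t[1:6:2])          # from index 1 up to (not including) 6, in steps of 2
print("banana"[::-1])    # the same syntax works on strings
```

Slicing always returns a new object, so the original list is not mutated.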

### Sorting Lists

In [None]:
# Use sorted() to display a sorted list but not mutate it.
my_letters = ['n','r','y','x','a','w']

print("sorted list = ", sorted(my_letters))
print("my_letters = ", my_letters)

In [None]:
# Use the sort() method to sort the items in a list
my_letters.sort()
print(my_letters)

In [None]:
# Don't do this...sort() returns "None"
my_letters = my_letters.sort() 
print(my_letters)

In [None]:
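# Sort a list in reverse (descending) order
my_letters = ['n','r','y','x','a','w'] # Reset the list; the previous cell set my_letters to None
my_letters.sort(reverse=True)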
print(my_letters)
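
Both sorted() and sort() accept a key argument, a function applied to each item before comparing, just as max() did earlier with key=len. The city list here is a made-up example with mixed capitalization.

```python
cities = ["Dallas", "austin", "Chicago", "Grand Rapids", "Miami"]

print(sorted(cities))                  # uppercase letters sort before lowercase
print(sorted(cities, key=str.lower))   # ignore case when comparing
print(sorted(cities, key=len))         # shortest name first
```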

### Use a list to return more than one value from a function

In [2]:
# Use a list to return more than one value from a function
def tip_options(amount):
    # Return the bill amount followed by the totals with a 10%, 15%, and 20% tip
    return [amount, amount*1.10, amount*1.15, amount*1.2]

print(tip_options(30))
print(type(tip_options(30)))

[30, 33.0, 34.5, 36.0]
<class 'list'>


### Working With Nested Lists

In [None]:
my_list = [['Ford','Chevrolet','Volkswagen'],
           ['F150','Suburban','Passat'],
           ['Big Bang Theory','Young Sheldon','Mindhunter']]

print(my_list[0][1]) # row zero, item 2 (index 1)
print(my_list[2][2]) # row two, item 3 (index 2)


In [None]:
# Use a list to swap values
y = 5
x = 10

print('x =', x, 'y =', y)

[x, y] = [y, x]

print('x =', x, 'y =', y)

## Sets
* Sets are unordered.
* Set elements are unique. Duplicate elements are not allowed.
* You may add or remove items from the set, but you cannot edit an item in a set.
* Accessing items by index (e.g., myset[1]) is NOT supported.
* Sets are denoted by curly braces.
* Membership tests are more efficient using sets than lists or tuples.
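
The properties listed above can be checked with a few lines of code. This is a minimal sketch using a made-up colors set.

```python
# Duplicates are dropped when the set is built
colors = {"red", "green", "blue", "red"}
unique_count = len(colors)

# Membership tests use the 'in' keyword
has_red = "red" in colors
has_pink = "pink" in colors

# Sets do not support indexing; trying it raises a TypeError
try:
    colors[0]
    indexing_supported = True
except TypeError:
    indexing_supported = False

# Items can be added or removed, but not edited in place
colors.add("yellow")
colors.discard("green")

print(unique_count, has_red, has_pink, indexing_supported)
print(sorted(colors))  # sorted() returns a list, so the display order is predictable
```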

You can define a set using the set() function.
```python
x = set(<iter>)
```

In [None]:
my_list = ['a','b',1, 'c', 1]
set2 = set(my_list)
print(set2)
print(my_list)

You can also create a set using curly braces {}. However, an empty pair of curly braces creates a dictionary, not a set, so you cannot create an empty set the way you create an empty list with [].

In [None]:
# INCORRECT
my_set = {}  # <-- results in a dictionary, NOT a set
print(type(my_set))

# Instead use the set constructor
my_set = set()
print(type(my_set))

In [None]:
# Use curly braces to create a set
my_set = {1,1, 6,7, 3, 5,5,5,5,5, 'red'}
print(type(my_set))
print(my_set)

### Why do I care about sets?
Sets in Python provide the same benefits as sets in mathematics. Sets contain a well-defined collection of distinct objects called elements. Using the set object enables you to efficiently perform set operations such as union and intersection.

![](images/data_science_diagram.png) <br>
(image source: https://towardsdatascience.com)
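
As a small illustration of the set operations mentioned above, the sketch below applies union, intersection, and difference to two made-up sets of customer names (the data and variable names are hypothetical).

```python
# Hypothetical example data: customers reached by each channel
email_customers = {"Ann", "Bob", "Cara", "Dev"}
phone_customers = {"Cara", "Dev", "Eli"}

either = email_customers.union(phone_customers)           # in at least one set
both = email_customers.intersection(phone_customers)      # in both sets
email_only = email_customers.difference(phone_customers)  # in the first set only

print(sorted(either))
print(sorted(both))
print(sorted(email_only))
```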

### Creating sets
Use curly braces to denote a set or use the set() constructor. If you use set(), you must provide an iterable as the argument.

In [None]:
# Persons with expertise in specific areas
cs_expertise = {"Bill", "Matt", "Alexandra", "Joe", "Dexter"}
stats_expertise = set(["Dexter", "Subha", "Brad", "Bruce"])
business_expertise = {"Kay","Jonathan","Dexter","Suzanne", "Matt"}

The set() constructor also creates a set. Its argument must be an iterable.

In [None]:
# Use the set() constructor to create a set; the argument must be <iter> (an iterable, e.g., a list)
my_set2 = set(['foo', 'bar', 3.141, 'bar'])
print(my_set2)

In [None]:
#Error creating tropical_fruits set using set() constructor...why?
tropical_fruits = set("Guava", "Dragon Fruit", "Banana","Banana")
temperate_fruits = {"Apple", "Peach", "Plum"}

all_fruit = tropical_fruits.union(temperate_fruits)
print(all_fruit)
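
One way to resolve the error in the cell above: set() accepts at most one argument, and that argument must be an iterable. The sketch below catches the error and then passes the fruit names as a single list.

```python
# Passing several separate strings raises a TypeError
try:
    tropical_fruits = set("Guava", "Dragon Fruit", "Banana", "Banana")
except TypeError as err:
    print("TypeError:", err)

# Wrap the items in a list (or use curly braces) instead
tropical_fruits = set(["Guava", "Dragon Fruit", "Banana", "Banana"])
temperate_fruits = {"Apple", "Peach", "Plum"}

all_fruit = tropical_fruits.union(temperate_fruits)
print(sorted(all_fruit))
```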

In [None]:
# Who might be suited for Data Science (intersection of three topics)
data_scientists = cs_expertise.intersection(stats_expertise, business_expertise)
print(data_scientists)

In [None]:
#Empty sets are evaluated as False
loch_ness_monsters = set()
print("The set of Loch Ness Monsters is " + str(bool(loch_ness_monsters)))
print()

#You can add, update, and remove items, but you cannot change items in a set
loch_ness_monsters.add("Marvin")
print("Added Marvin to monster set...")
print("The set of Loch Ness Monsters is " + str(bool(loch_ness_monsters)), loch_ness_monsters)
print("The length of the monster set is " + str(len(loch_ness_monsters)))

In [None]:
# Reduce this list of grades to only have unique values
grades = {81,100,81,89,76,94,93,86,75,88,96,76,87,90,81,78,99,83,94,75,83,92,96,81,99,89,99,98,100,95,84,94,97,100,92,97,98,92,95,88,90,98,87,86,95,86,84,91,87,88,83,89,84,98,75,90,100,79,83,94,89,93,84,83,94,84,93,97,75,81,91,84,78,89,96,97,99,90,98,83,93,96,98,91,77,98,97,76,98,75,89,92,81,83,84,82,94,89,77,96,94,100,86,79,87,78,83,86,89,99,77,96,88,91,86,89,99,82,83,92,91,84,83,76,89,90,82,75,84,83,81,96,87,90,82,93,76,86,100,81,88,100,94,84,99,77,91,92,98,88,90,83,88}
print(grades)

In [None]:
c_and_higher = set(range(75,101))

# difference() returns the items in c_and_higher that are not in grades
missing_grades = c_and_higher.difference(grades)

print("What grades are missing from 75-100?: " + str(missing_grades))

## Dictionaries

Think of a dictionary as a list with a flexible index. A list index must be an integer, but a dictionary key can be any immutable (hashable) data type, such as a string, number, or tuple.

Dictionaries use key-value pairs to store and retrieve data, and values are looked up by key rather than by position. As of Python 3.7, dictionaries preserve insertion order. In other languages this structure might be called an *associative array*.


### Creating dictionaries
Use curly braces and a colon to indicate to the interpreter that you are creating a dictionary data structure. A key can be any immutable data type.
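
To illustrate the rule about immutable keys, the sketch below uses a made-up inventory dictionary with string, integer, and tuple keys, then shows that a list is rejected as a key.

```python
# Keys can be strings, integers, or tuples (all immutable)
inventory = {}
inventory["SKU-100"] = 25            # string key
inventory[2024] = "fiscal year"      # integer key
inventory[("Dallas", "TX")] = 12     # tuple key

print(inventory[("Dallas", "TX")])

# A list is mutable, so using one as a key raises a TypeError
try:
    inventory[["Dallas", "TX"]] = 12
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print("List allowed as key?", list_key_allowed)
```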

The pprint (Pretty Print) module displays dictionaries in a more human-readable format.

In [None]:
import pprint as pp

# The employee ID is associated with the employee name
employees = {"2334":"Greg Bott", "2335":"John Gilbert", "2336":"Bill Hampton","2337":"Joe Odom"}
pp.pprint(employees)

In [None]:
# Create an empty dictionary
person = {}

#Display the type of the 'person' variable
print(type(person))


person[1000] = {'first_name':'Greg', 'last_name':'Bott', 'spouse':'Amy', 'children':['John Davis', 'Piper', 'Will', 'Truett'], 'pets':{'Bama':'dog', 'TJ':'cat'}}
person[1001] = {'first_name':'Joe', 'last_name':'Devlin', 'spouse':'Suzanne', 'children':['CK', 'Alan', 'Devin', 'Tom'], 'pets':{'Orangey':'gold fish', 'Hammer':'turtle'}}

pp.pprint(person)

### Accessing data in a dictionary
Use ```dict[key]``` to return the value from the key-value pair. If the key doesn't exist, Python raises a KeyError. Use the get() method to retrieve values and handle missing keys more gracefully.

You can also use ```keys()``` to list the keys in your dictionary, ```values()``` to access the values, or ```items()``` to access both.
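
A quick sketch of these three methods on a made-up prices dictionary. Each method returns a view object, which is wrapped in list() here to make the contents easy to print.

```python
# Hypothetical example data
prices = {"apple": 0.50, "banana": 0.25, "cherry": 3.00}

key_list = list(prices.keys())      # just the keys
value_list = list(prices.values())  # just the values
item_list = list(prices.items())    # (key, value) tuples

print(key_list)
print(value_list)
print(item_list)
```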

In [None]:
# Using the employee ID (key), display the name of the employee (value)
print(employees["2334"])

In [None]:
# Print the first_name attribute of the person 1001 key.
print(person[1001]['first_name'])

### Using the get() method
Use the get() method to look up the value for a key. Using get() avoids a KeyError if the desired key does not exist. Instead, Python returns the None value.

In [None]:
print(employees.get("2334"))
print(employees.get("9999"))

In [None]:
# You can also provide a default value if a key does not exist
print(employees.get('ss_number', 'no SS# provided'))

In [None]:
# Use a list as a value in a dictionary
make_model = {"Ford":["Mustang","Explorer","Focus"],"Volkswagen":["Passat","Jetta","Beetle"]}
print(make_model["Ford"])

In [None]:
print(person[1000]['first_name'])

In [None]:
# Replace values using a key
pp.pprint(person[1001])
person[1001]['pets'] = {'Rocky':'flying squirrel'}  # name:type, matching the other pets entries
pp.pprint(person[1001])

### Using a loop to examine a dictionary
Although looping through a dictionary fails to take advantage of the speed of a dictionary, sometimes you may find it useful. Remember that dictionaries contain key-value pairs and that you must loop through them differently than you would a list.

In [None]:
# Attempting to iterate through a dictionary as you would a list 
#   will yield only the keys
for x in employees:
    print(x)

In [None]:
# Instead, use the items() method to return both the key and the value
for x, y in employees.items(): # x = key; y = value
    print(x,y) 

### Using a loop to add values to a dictionary.
So far we have manually added items to a dictionary. Most often you will add items programmatically (e.g., using a loop) rather than manually. Below is part of the code I use to find duplicate files using an MD5 hash. A hash function is a one-way algorithm that converts an object into a fixed-length string. Different contents almost always produce different hashes, so the hash works as a fingerprint for the file.

We'll use the os module to access the file system and the hashlib to apply the MD5 algorithm to the files and then store them in a dictionary using the MD5 value as the key.

In [None]:
import os
import hashlib
import pprint as pp

# Create a blank dictionary
os_files = {}

# Hash function: return the MD5 hex digest of the file at 'path'
def hashfile(path, blocksize=65536):
    hasher = hashlib.md5()
    # 'with' closes the file even if reading fails
    with open(path, 'rb') as file_to_hash:
        buf = file_to_hash.read(blocksize)
        while len(buf) > 0:
            hasher.update(buf)
            buf = file_to_hash.read(blocksize)
    return hasher.hexdigest()

for file in os.listdir():
    # Use error handling to skip directories and files that cannot be read
    try:
        # Use the hash value returned from hashfile() as the key and the file name as the value.
        # Files with identical contents share a hash, so a later file overwrites an earlier one.
        os_files[hashfile(file)] = file
    except OSError:
        pass
    
# Display the Dictionary
pp.pprint(os_files)
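
The cell above keeps only one file name per hash. To actually list duplicates, one option is to store a list of names under each hash. The sketch below is an illustration rather than the original duplicate-finder: it hashes made-up, in-memory byte strings (file_contents is hypothetical) so that it runs without touching the file system.

```python
import hashlib

# Hypothetical file names and contents, held in memory so the cell runs anywhere
file_contents = {
    "report.txt": b"quarterly numbers",
    "report_copy.txt": b"quarterly numbers",
    "notes.txt": b"meeting notes",
}

# Map each MD5 hash to the list of names that share it
files_by_hash = {}
for name, data in file_contents.items():
    digest = hashlib.md5(data).hexdigest()
    # setdefault() creates an empty list the first time a hash is seen
    files_by_hash.setdefault(digest, []).append(name)

# Any hash with more than one name marks a group of duplicates
duplicates = [names for names in files_by_hash.values() if len(names) > 1]
print(duplicates)
```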

### Check for Values in a Dictionary

To determine if a value is present in the list stored under a key, use the *in* keyword.

In [None]:
print("Focus" in make_model["Ford"])
print("Explorer II" in make_model["Ford"])

### Access a Specific Item in a Key's Value

When the value stored under a key is a list, individual items can be accessed by index.

In [None]:
# Print the third value associated with the Ford key.
print(make_model['Ford'][2])

### Check for Keys in a Dictionary

In [None]:
search_key = "Focus"
if search_key in make_model:
    print(f"'{search_key}' key found in dictionary!")
else:
    print(f"'{search_key}' key NOT found in dictionary.")

### Attempting to Access Keys that Don't exist
If you attempt to access a key that does not exist within the dictionary, Python will raise a KeyError exception.


In [None]:

person[1000] = {'first_name':'Greg', 'last_name':'Bott', 'spouse':'Amy', 'children':['John Davis', 'Piper', 'Will', 'Truett'], 'pets':{'Bama':'dog', 'TJ':'cat'}}
person[1001] = {'first_name':'Joe', 'last_name':'Devlin', 'spouse':'Suzanne', 'children':['CK', 'Alan', 'Devin', 'Tom'], 'pets':{'Orangey':'gold fish', 'Hammer':'turtle'}}


In [None]:
# ERROR: KeyError (key does not exist in the dictionary)
print(person['fname'])
print(person['ss_number'])
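
Two common ways to guard against a missing key are sketched below: catching the KeyError, or testing for the key first. The employee record here is a simplified, hypothetical stand-in for the entries above.

```python
# Hypothetical record, similar in shape to the entries above
employee = {"first_name": "Joe", "last_name": "Devlin"}

# Option 1: catch the KeyError
try:
    print(employee["ss_number"])
    lookup_failed = False
except KeyError as err:
    print("Missing key:", err)
    lookup_failed = True

# Option 2: test for the key before using it
if "ss_number" in employee:
    ss_number = employee["ss_number"]
else:
    ss_number = None

print(ss_number)
```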

### Updating dictionary values
Assigning a value to an existing key/value pair will replace the value.

You can also use the update method to replace multiple values in a dictionary and add new values.

```update()``` takes a dictionary as its parameter.

In [None]:
person[1000].update({'first_name':'Gregory','ss_number':'123-45-6789','middle':'Hamilton'})
pp.pprint(person)

In [None]:
# Use the append() method of the list stored under a key to add a value
#   Here we are adding Sally to Joe Devlin's children
person[1001]['children'].append('Sally')
pp.pprint(person[1001])

### Removing an item from the dictionary
You can use del or pop to remove items from a dictionary. Just as with a list, ```pop``` returns the deleted value, which you can store in a variable.

In [None]:
# 'pets' is a key inside each person's record, so select the person first
del person[1001]['pets']
print(person)

In [None]:
print(person)
# 'ss_number' was added to person 1000 by update() above
ss_number = person[1000].pop('ss_number')
print(ss_number)
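
The sketch below contrasts del and pop() on a small, made-up record. It also shows pop() with a second argument, which is returned instead of raising a KeyError when the key is absent.

```python
# Hypothetical record used only for this example
record = {"first_name": "Greg", "ss_number": "123-45-6789", "middle": "Hamilton"}

# del removes the pair and returns nothing
del record["middle"]

# pop() removes the pair and returns its value
removed = record.pop("ss_number")

# The key is gone now, so the default value is returned
missing = record.pop("ss_number", "not on file")

print(removed, "|", missing)
print(record)
```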

### Clear a dictionary
Use the clear() method to empty the contents of a dictionary.

In [None]:
print(person)
person.clear()
print(person)

# Regular Expressions
A regular expression (RegEx) is a sequence of characters that defines a search pattern for matching strings or sets of strings, using a specialized syntax. Python has a built-in module called re, which can be used to work with regular expressions. To use it, import re.

## Raw Strings

To keep Python from interpreting backslashes in a RegEx pattern as escape sequences, prefix the pattern string with 'r' to make it a raw string.
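
A short sketch of what the 'r' prefix does: without it, every backslash meant for the RegEx engine has to be doubled. The sample sentence is made up for illustration.

```python
import re

# The raw string and the doubled-backslash string are the same pattern
raw_pattern = r"\d+\.\d\d"
escaped_pattern = "\\d+\\.\\d\\d"
same_pattern = raw_pattern == escaped_pattern
print(same_pattern)

# One or more digits, a literal period, then exactly two digits
amounts = re.findall(raw_pattern, "Lunch was 12.50 and coffee was 3.75")
print(amounts)
```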

## Regex Cheat Sheet

source: https://regexone.com

For an excellent interactive tutorial, go to https://regexone.com/lesson/introduction_abcs

To test and learn more about RegEx, https://regexr.com/ is also a helpful site.

abc…	Letters<br>
123…	Digits<br>
\d	Any Digit<br>
\D	Any Non-digit character<br>
.	Any Character<br>
\.	Period<br>
[abc]	Only a, b, or c<br>
[^abc]	Not a, b, nor c<br>
[a-z]	Characters a to z<br>
[0-9]	Numbers 0 to 9<br>
\w	Any Alphanumeric character<br>
\W	Any Non-alphanumeric character<br>
{m}	m Repetitions<br>
{m,n}	m to n Repetitions<br>
\*	Zero or more repetitions<br>
\+	One or more repetitions<br>
?	Optional character<br>
\s	Any Whitespace<br>
\S	Any Non-whitespace character<br>
^…$	Starts and ends<br>
(…)	Capture Group<br>
(a(bc))	Capture Sub-group<br>
(.*)	Capture all<br>
(abc|def)	Matches abc or def<br>
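
A few entries from the cheat sheet, applied to a made-up sentence (the order text is hypothetical). This is a sketch for experimenting, not a complete reference.

```python
import re

# Hypothetical sample text for illustration
order = "Order 4471 shipped to Tuscaloosa, AL 35401 on 2019-03-15."

# {m}: exactly m repetitions -> five digits in a row
zip_codes = re.findall(r"\d{5}", order)

# (...): capture groups -> year, month, and day from a date
date_parts = re.findall(r"(\d{4})-(\d\d)-(\d\d)", order)

# [A-Z] and \s: two capital letters followed by whitespace
states = re.findall(r"[A-Z]{2}\s", order)

# (abc|def): match one alternative or the other
status = re.findall(r"(shipped|cancelled)", order)

print(zip_codes)
print(date_parts)
print(states)
print(status)
```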

In [None]:
# Import the built-in Regular Expressions package
import re

email_header = "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 Return-Path: <postmaster@collab.sakaiproject.org> for <source@collab.sakaiproject.org>;Received: (from apache@localhost) Author:  stephen.marquard@uct.ac.za"

# Raw strings (r'...') keep the backslashes intact for the RegEx engine
found_text = re.findall(r'\d\d:\d\d:\d\d', email_header)
print(found_text)
print("found text is of type",type(found_text))

author = re.findall(r'Author:\s+\S+', email_header)
print(author)

In [None]:
# 'with' closes the file automatically when the block ends
with open("mbox.txt", "r") as mboxfile:
    for line in mboxfile:
        line = line.rstrip()

        # Search for lines that start with 'F', followed by any 2 characters, followed by 'm:'
        if re.search(r'^F..m:', line):
            print(line)

In [None]:
# Store all email addresses into a list
all_emails_list = []
with open("mbox.txt", "r") as mboxfile:
    for line in mboxfile:
        line = line.rstrip()
        x = re.findall(r'\S+@\S+\.\D\D\D', line)
        if len(x) > 0:
            all_emails_list.extend(x)
print(all_emails_list)

In [None]:
# Collect every 'rev=' followed by five characters, then count the unique values
all_revs_list = []
with open("mbox.txt", "r") as mboxfile:
    for line in mboxfile:
        line = line.rstrip()
        x = re.findall(r'rev=.....', line)
        if len(x) > 0:
            all_revs_list.extend(x)

print(all_revs_list)
all_revs_set = set(all_revs_list)
print(len(all_revs_set))