### Python Programming for Data Science

A quick, hands-on introduction to Python programming elements for data science.

Jay Urbain, PhD

6/26/2018

#### What is Python?  

Python is a popular programming language. It was created in 1991 by Guido van Rossum.

It is used for:  

- data science, especially machine learning,
- web development (server-side),
- software development,
- mathematics,
- system scripting.  

#### What can Python do?  
- Data science applications.  
- Machine learning models.  
- Artificial intelligence applications. 
- Used on a server to create web applications.    
- Used alongside software to create workflows.  
- Connect to database systems. It can also read and modify files.  
- Used to handle big data and perform complex mathematics.  
- Used for rapid prototyping, or for production-ready software development.  

#### Why Python?  
- Works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).  
- Simple syntax similar to the English language.  
- Syntax that allows developers to write programs with fewer lines than some other programming languages.  
- Runs on an *interpreter system*, meaning that code can be executed as soon as it is written. This means that prototyping can be very quick.  
- Python can be treated in a procedural way, an object-orientated way or a functional way.  
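
As a rough illustration of the last point, the sketch below computes the same sum of squares in three styles. The names `SquareSummer` and `functional_total` are made up for this example and are not part of any library.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Procedural style: an explicit loop that updates a running total
total = 0
for n in numbers:
    total += n * n

# Object-oriented style: the data and its behaviour live in a class
class SquareSummer:
    def __init__(self, values):
        self.values = values

    def total(self):
        return sum(v * v for v in self.values)

# Functional style: fold the list into a single value without reassignment
functional_total = reduce(lambda acc, n: acc + n * n, numbers, 0)

print(total, SquareSummer(numbers).total(), functional_total)  # prints 30 30 30
```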

#### Good to know  
- The most recent major version of Python is Python 3.
- Python was designed for readability, and has some similarities to the English language with influence from mathematics.  
- Python uses new lines to complete a command, as opposed to other programming languages which often use semicolons or parentheses.  
- Python relies on indentation, using whitespace, to define scope; such as the scope of loops, functions and classes. Other programming languages often use curly-brackets for this purpose.  
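
A small sketch of how indentation defines scope. The function `count_evens` is a hypothetical example, and four spaces per level is a convention rather than a requirement.

```python
def count_evens(numbers):
    count = 0                  # indented once: body of the function
    for n in numbers:
        if n % 2 == 0:         # indented twice: body of the for loop
            count += 1         # indented three times: body of the if statement
    return count               # back at function level, so this runs after the loop

print(count_evens([1, 2, 3, 4]))  # prints 2
```

Moving the `return` line one level deeper would place it inside the loop and end the function after the first item.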

#### Getting Help

object? - help system  
help(object) - help system (inline)  
object?? - source code 
%quickref - Quick reference to magic commands 
 

? - intro and overview of IPython's features

In [5]:
len?

In [6]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.



In [7]:
import math
help(math)

Help on module math:

NAME
    math

MODULE REFERENCE
    https://docs.python.org/3.6/library/math
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module is always available.  It provides access to the
    mathematical functions defined by the C standard.

FUNCTIONS
    acos(...)
        acos(x)
        
        Return the arc cosine (measured in radians) of x.
    
    acosh(...)
        acosh(x)
        
        Return the inverse hyperbolic cosine of x.
    
    asin(...)
        asin(x)
        
        Return the arc sine (measured in radians) of x.
    
    asinh(...)
        asinh(x)
        
        Return the inverse hyperbolic sine of x.
    
    atan(...)
        atan(x)
        
 

In [9]:
L = [1,2,3]

In [10]:
L

[1, 2, 3]

In [11]:
L.append?

In [12]:
def square(x):
    """
    most awesome square function
    x - takes number
    returns square of x
    """
    return x*x

In [13]:
square?

Accessing Source Code with ??

In [14]:
square??

In [16]:
# source code is not shown if the function is not written in Python
len??

Exploring modules with TAB completion


In [20]:
L.count??

In [21]:
L.count?

In [22]:
L.count(3)

1

Tab completion when importing

In [24]:
from itertools import combinations

Wildcard matching

In [28]:
*Warning?

In [27]:
str.*find*?

#### Somewhat Useful IPython/Jupyter Shell Shortcuts

**Keyboard Shortcuts** 

| Keystroke                         | Action                                     |
|-----------------------------------|--------------------------------------------|
| ``Ctrl-a``                        | Move cursor to the beginning of the line   |
| ``Ctrl-e``                        | Move cursor to the end of the line         |
| ``Ctrl-b`` or the left arrow key  | Move cursor back one character             |
| ``Ctrl-f`` or the right arrow key | Move cursor forward one character          |

#### Text Entry Shortcuts

| Keystroke                     | Action                                           |
|-------------------------------|--------------------------------------------------|
| Backspace key                 | Delete previous character in line                |
| ``Ctrl-d``                    | Delete next character in line                    |
| ``Ctrl-k``                    | Cut text from cursor to end of line              |
| ``Ctrl-u``                    | Cut text from beginning of line to cursor        |
| ``Ctrl-y``                    | Yank (i.e. paste) text that was previously cut   |
| ``Ctrl-t``                    | Transpose (i.e., switch) previous two characters |

#### Command History Shortcuts

| Keystroke                           | Action                                     |
|-------------------------------------|--------------------------------------------|
| ``Ctrl-p`` (or the up arrow key)    | Access previous command in history         |
| ``Ctrl-n`` (or the down arrow key)  | Access next command in history             |
| ``Ctrl-r``                          | Reverse-search through command history     |

#### Data Types And Operators

Python uses dynamic typing.

Data Types:   
- Integers    
- Floats    
- Booleans    
- Strings    
- Lists    
- Tuples    
- Sets    
- Dictionaries   

Operators: 
- Arithmetic  
- Assignment  
- Comparison  
- Logical  
- Membership  
- Identity   
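
A brief sketch of the comparison, logical, membership, and identity operators listed above. The variable names are arbitrary.

```python
a, b = 7, 3

is_greater = a > b               # comparison
both = a > b and b > 5           # logical: both sides must be true
found = b in [1, 2, 3]           # membership

x = [1, 2]
y = x                            # y refers to the same list object as x
z = [1, 2]                       # z is a different object with equal contents

print(is_greater, both, found)   # prints True False True
print(x is y, x is z, x == z)    # prints True False True
```

`is` compares object identity while `==` compares values, which is why `x is z` is False even though `x == z` is True.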

Built-In Functions  
Compound Data Structures  
Type Conversion  


#### Arithmetic Operators


\+ Addition   
\- Subtraction  
\* Multiplication  
/ Division  
% Mod (the remainder after dividing)  
** Exponentiation (note that ^ does not do this operation, as you might have seen in other languages)  
// Divides and rounds down to the nearest integer  
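
The operators above can be tried out with two sample values. This is only a quick sketch; `a` and `b` are arbitrary.

```python
a, b = 7, 2

print(a + b)    # 9
print(a - b)    # 5
print(a * b)    # 14
print(a / b)    # 3.5  (true division returns a float)
print(a % b)    # 1
print(a ** b)   # 49
print(a // b)   # 3
print(-a // b)  # -4   (floor division rounds toward negative infinity)
print(a ^ b)    # 5    (^ is bitwise XOR, not exponentiation)
```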

Order of operations:  
http://mathforum.org/dr.math/faq/faq.order.operations.html



#### Dynamic typing

In [29]:
x = 4
print(type(x), x)
x = 4.0
print(type(x), x)
x = '4'
print(type(x), x)
x = int(x)
print(type(x), x)
x = str(x)
print(type(x), x)

<class 'int'> 4
<class 'float'> 4.0
<class 'str'> 4
<class 'int'> 4
<class 'str'> 4


#### Python implementation

The reference implementation of Python (CPython) is written in C. Every Python object, including simple data types such as integers, floats, strings, and booleans, is a C structure that contains several values.

In C, an integer is a label for a position in memory whose bytes encode an integer value.

In Python, an integer is a pointer to a position in memory containing all the Python object information.
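
One way to see that an integer is a full object rather than a bare machine value is to inspect it. This is a sketch only; the size reported by `sys.getsizeof` depends on the interpreter and platform, so no specific number is shown.

```python
import sys

x = 42
print(type(x))            # <class 'int'>
print(x.bit_length())     # 6, because integers carry methods like any other object
print(sys.getsizeof(x))   # size of the whole object in bytes (implementation-dependent)

y = x                     # assignment copies the reference, not the object
print(y is x)             # True
```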

#### Python Assignment

In [30]:
x = 1
y = 2
x + y

3

In [31]:
x

1

>**Question**: 
My electricity bills for the last three months have been \$23, \$32 and \$64. What is the average monthly electricity bill over the three month period? Write an expression to calculate the mean, and use print() to view the result.


In [38]:
def mean(inputs):
    total = 0
    for x in inputs:
        total += x
    # divide the accumulated total (not the last item) by the number of items
    return total / len(inputs)

In [39]:
avg = mean([23,32,64]) # Your work here. Print the results
print(avg)


39.666666666666664


In [40]:
format?

In [41]:
help('FORMATTING')

Format String Syntax
********************

The "str.format()" method and the "Formatter" class share the same
syntax for format strings (although in the case of "Formatter",
subclasses can define their own format string syntax).  The syntax is
related to that of formatted string literals, but there are
differences.

Format strings contain “replacement fields” surrounded by curly braces
"{}". Anything that is not contained in braces is considered literal
text, which is copied unchanged to the output.  If you need to include
a brace character in the literal text, it can be escaped by doubling:
"{{" and "}}".

The grammar for a replacement field is as follows:

      replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}"
      field_name        ::= arg_name ("." attribute_name | "[" element_index "]")*
      arg_name          ::= [identifier | digit+]
      attribute_name    ::= identifier
      element_index     ::= digit+ | index_string
      index_string      ::=

In [49]:
str.format("{}", avg)

'39.666666666666664'

In [53]:
str.format("{!a}", avg)

'39.666666666666664'

In [77]:
def testFormats(value, is_number = False):
    # conversion flags: !s calls str(), !a calls ascii(), !r calls repr()
    examples = ["!s: {}", "!a: {!a}", "!r: {!r}"]
    
    for x in examples:
        print(str.format(x, value))
        
    # precision specifiers only make sense for numeric values
    examples_number = ["{}", "{:.1}", "{:.3}", "{:.1f}", "{:.3f}"]
    if is_number:
        for x in examples_number:
            print(str.format(x, value))

In [78]:
testFormats("alsdjfklasjdf")

!s: alsdjfklasjdf
!a: 'alsdjfklasjdf'
!r: 'alsdjfklasjdf'


In [79]:
testFormats(avg, True)

!s: 39.666666666666664
!a: 39.666666666666664
!r: 39.666666666666664
39.666666666666664
4e+01
39.7
39.7
39.667


In [82]:
print("The average electricity bill for the last three months was ${:.2f}".format(avg))

The average electricity bill for the last three months was $39.67


In [86]:
print("The average electricity bill for the last three months was $%.2f" % avg)

The average electricity bill for the last three months was $39.67
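
The help text above mentions formatted string literals (f-strings), which are available from Python 3.6. A minimal sketch, using a made-up `amount` and `label` rather than the values from the cells above:

```python
amount = 1234.5   # hypothetical value, chosen only for illustration
label = "bill"    # hypothetical label

# An f-string evaluates the expressions inside {} and applies the same format specs
message = f"The {label} was ${amount:,.2f}"
print(message)    # prints: The bill was $1,234.50
```

The format spec after the colon follows the same mini-language as `str.format()`; here `,` adds a thousands separator and `.2f` fixes two decimal places.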


>**Question**: 
Complete the following.

Note that this code uses scientific notation to define large numbers. 4.445e8 is equal to 4.445 * 10 ** 8, which is equal to 444500000.0.

In [95]:
# The current volume of a water reservoir (in cubic metres)
reservoir_volume = 4.445e8
# The amount of rainfall from a storm (in cubic metres)
rainfall = 5e6

# decrease the rainfall variable by 10% to account for runoff
rainfall *= .9

# add the rainfall variable to the reservoir_volume variable
reservoir_volume += rainfall

# increase reservoir_volume by 5% to account for stormwater that flows
# into the reservoir in the days following the storm
reservoir_volume *= 1.05

# decrease reservoir_volume by 5% to account for evaporation
reservoir_volume *= .95

# subtract 2.5e5 cubic metres from reservoir_volume to account for water
# that's piped to arid regions.
reservoir_volume -= 2.5e5

# print the new value of the reservoir_volume variable
print("reservoir_volume: {:.0f}".format(reservoir_volume))

reservoir_volume: 447627500


#### Functions

<br>
`add_numbers` is a function that takes two numbers and adds them together.

In [96]:
def add_numbers(x, y):
    return x + y

add_numbers(1, 2)

3

<br>
`add_numbers` updated to take an optional 3rd parameter. Using `print` allows printing of multiple expressions within a single cell.

In [97]:
def add_numbers(x,y,z=None):
    if z is None:
        return x+y
    else:
        return x+y+z

print(add_numbers(1, 2))
print(add_numbers(1, 2, 3))

3
6


<br>
`add_numbers` updated to take an optional flag parameter.

In [98]:
def add_numbers(x, y, z=None, flag=False):
    if flag:
        print('Flag is true!')
    if z is None:
        return x + y
    else:
        return x + y + z
    
print(add_numbers(1, 2, flag=True))

Flag is true!
3


<br>
Assign function `add_numbers` to variable `a`.

In [99]:
def add_numbers(x,y):
    return x+y

a = add_numbers
a(1,2)

3
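
Because a function is an ordinary object, it can also be passed to another function or stored in a data structure. The sketch below assumes two small helpers; `multiply_numbers`, `apply`, and `operations` are hypothetical names introduced only for this example.

```python
def add_numbers(x, y):
    return x + y

def multiply_numbers(x, y):
    return x * y

def apply(operation, x, y):
    """Call whichever function is passed in as `operation`."""
    return operation(x, y)

print(apply(add_numbers, 3, 4))        # prints 7
print(apply(multiply_numbers, 3, 4))   # prints 12

# Functions can be dictionary values, selected by name at run time
operations = {'add': add_numbers, 'multiply': multiply_numbers}
print(operations['multiply'](2, 5))    # prints 10
```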

>**Question**: 
Complete the steps below.

In [103]:
# Write calc_mean(x,y,z) that returns the mean of 3 numbers.
def calc_mean(x,y,z):
    return (x + y + z) / 3
# Assign calc_mean to a variable 'avg'
avg = calc_mean
# Execute avg with 1, 3, and 5; display the results
avg(*[1,3,5])

3.0

#### Types


Use `type` to return the object's type.

In [104]:
type('This is a string')

str

In [105]:
type(None)

NoneType

In [106]:
type(1)

int

In [107]:
type(1.0)

float

In [108]:
type(add_numbers)

function

>**Question**: 

In the fishy situation below, some of the quantities are of type int and some are of type float. 

Identify the ones that should be of type float.

- How many people came on your fishing trip?  **int**
- Length of a fish caught, in meters. **float**
- Number of fish caught on a fishing trip. **int** or float if you fish with my friends
- Length of time it took to catch the first fish, in hours. **float**

#### Answer here:
- How many people came on your fishing trip? 

        int
- Length of a fish caught, in meters.  

        float
- Number of fish caught on a fishing trip.  

        int
- Length of time it took to catch the first fish, in hours. 

        float


#### Sequences

`Tuples` are an immutable (cannot be altered) data structure.

In [109]:
x = (1, 'a', 2, 'b')
type(x)

tuple
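
To see what immutable means in practice, the sketch below tries to change a tuple element and then builds a new tuple instead. The names `point` and `updated` are arbitrary.

```python
point = (1, 'a', 2, 'b')

try:
    point[0] = 99                 # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# To "change" a tuple, build a new one from pieces of the old one
updated = (99,) + point[1:]
print(updated)                    # prints (99, 'a', 2, 'b')
print(point)                      # the original is unchanged: (1, 'a', 2, 'b')
```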

In [112]:
tuple??

In [113]:
help(tuple)

Help on class tuple in module builtins:

class tuple(object)
 |  tuple() -> empty tuple
 |  tuple(iterable) -> tuple initialized from iterable's items
 |  
 |  If the argument is a tuple, the return value is the same object.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __getnewargs__(...)
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  __lt__(self, value, /)
 |      Return self<value.

`Lists` are a mutable data structure. 

Types can be mixed within a list.

In [110]:
x = [1, 'a', 2, 'b']
type(x)

list

<br>
Use `append` to append an object to a list.

In [111]:
x.append(3.3)
print(x)

[1, 'a', 2, 'b', 3.3]


<br>
This is an example of how to loop through each item `in` the list.

In [None]:
for item in x:
    print(item)

<br>
Or using the indexing `[]` operator:

In [None]:
i=0
while i < len(x):
    print(x[i])
    i = i + 1
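
When both the position and the item are needed, the built-in `enumerate` avoids managing the counter by hand. A short sketch, using a separate list named `items`:

```python
items = [1, 'a', 2, 'b']

# enumerate yields (index, item) pairs, starting at index 0
for index, item in enumerate(items):
    print(index, item)

pairs = list(enumerate(items))
print(pairs)   # prints [(0, 1), (1, 'a'), (2, 2), (3, 'b')]
```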

<br>
Use `+` to concatenate lists.

In [None]:
[1,2] + [3,4]

<br>
Use `*` to repeat lists.

In [None]:
[1]*3

<br>
Use the `in` operator to check if something is inside a list.

In [None]:
1 in [1, 2, 3]

>**Question**: 
Print out the following list in alphabetic order and reverse alphabetic order by the entire name:


In [None]:
x = ['Larry Page', 'Bill Gates', 'Mark Zuckerberg', 'Jeff Bezos']
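
One possible approach, sketched with the built-in `sorted`, which returns a new list and leaves the original untouched. The variable names are illustrative only.

```python
names = ['Larry Page', 'Bill Gates', 'Mark Zuckerberg', 'Jeff Bezos']

ascending = sorted(names)                  # compares the whole string, so first names decide
descending = sorted(names, reverse=True)

print(ascending)    # prints ['Bill Gates', 'Jeff Bezos', 'Larry Page', 'Mark Zuckerberg']
print(descending)   # prints ['Mark Zuckerberg', 'Larry Page', 'Jeff Bezos', 'Bill Gates']
```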


`Strings.` Use bracket notation to `slice` a string.

In [None]:
x = 'This is a string'
print(x[0]) #first character
print(x[0:1]) #first character, but we have explicitly set the end character
print(x[0:2]) #first two characters


<br>
This will return the last element of the string.

In [None]:
x[-1]

<br>
This will return the slice starting from the 4th element from the end and stopping before the 2nd element from the end.

In [None]:
x[-4:-2]

<br>
This is a slice from the beginning of the string, stopping before index 3 (the 4th character).

In [None]:
x[:3]

<br>
And this is a slice starting at index 3 (the 4th character) of the string and going all the way to the end.

In [None]:
x[3:]
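
The slices above, collected in one place with their results. The string is the same as in the cells above; the variable name `s` and the final reversed slice are additions for illustration.

```python
s = 'This is a string'

print(s[0])       # 'T'   first character (index 0)
print(s[0:2])     # 'Th'  indices 0 and 1; the stop index is excluded
print(s[-1])      # 'g'   last character
print(s[-4:-2])   # 'ri'  4th and 3rd characters from the end
print(s[:3])      # 'Thi' indices 0, 1, 2
print(s[3:])      # 's is a string'  index 3 to the end
print(s[::-1])    # 'gnirts a si sihT'  a step of -1 walks the string backwards
```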

Different ways of printing a string.

In [None]:
firstname = 'Sergey'
lastname = 'Brin'

print(firstname + ' ' + lastname)
print(firstname, lastname)
print(lastname, firstname, sep=',')
print(firstname*5)
print('Larry' in firstname)
print('Sergey' in firstname)

<br>
`split` returns a list of all the words in a string, or a list split on a specific character.

In [None]:
firstname = 'William Henry Gates III'.split(' ')[0] # [0] selects the first element of the list
lastname = 'William Henry Gates III'.split(' ')[-2] # [-2] selects the second-to-last element; [-1] would select the suffix 'III'
print(firstname)
print(lastname)

<br>
Make sure you convert objects to strings before concatenating. Note: TypeError expected.

In [None]:
'Bill' + 2

In [None]:
'Bill' + str(2)

<br>
Dictionaries associate keys with values.

In [None]:
x = {'Larry Page': 'larry@google.com', 'Bill Gates': 'gates@microsoft.com' , 'Mark Zuckerberg': 'zuck@fb.com'}
x['Larry Page'] # Retrieve a value by using the indexing operator


In [None]:
x['Jeff Bezos'] = None
x['Jeff Bezos']

In [None]:
x['Jeff Bezos'] = 'drevil@amazon.com'
x['Jeff Bezos']
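
Looking up a key that is not in the dictionary raises a `KeyError`. A short sketch of two common ways to guard against that, using a separate dictionary named `contacts`:

```python
contacts = {'Larry Page': 'larry@google.com', 'Bill Gates': 'gates@microsoft.com'}

# The in operator tests whether a key is present
print('Larry Page' in contacts)                # prints True
print('Jeff Bezos' in contacts)                # prints False

# get returns None, or a default you supply, instead of raising an error
print(contacts.get('Jeff Bezos'))              # prints None
print(contacts.get('Jeff Bezos', 'unknown'))   # prints unknown

try:
    contacts['Jeff Bezos']                     # indexing a missing key raises KeyError
except KeyError as err:
    missing_key = err.args[0]
    print('KeyError:', missing_key)
```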

Iterate over all of the keys, using each key to look up its value:

In [None]:
for name in x:
    print(x[name])


Iterate over all of the values:

In [None]:
for email in x.values():
    print(email)

Iterate over all of the (key, value) items:

In [None]:
for name, email in x.items():
    print(name)
    print(email)

You can unpack a sequence into different variables:

In [None]:
x = ('Bill', 'Gates', 'gates@gmail.com')
fname, lname, email = x

In [None]:
fname

In [None]:
lname

Make sure the number of values you are unpacking matches the number of variables being assigned. Note: ValueError expected.

In [None]:
x = ('Bill', 'Gates', 'gates@gmail.com', 'billy@msoe.edu')
fname, lname, email = x
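
When the number of values is not known in advance, a starred name can collect the extras. A sketch, using `record` and `emails` as illustrative names:

```python
record = ('Bill', 'Gates', 'gates@gmail.com', 'billy@msoe.edu')

try:
    fname, lname, email = record        # four values, three names
except ValueError as err:
    unpack_error = str(err)
    print('ValueError:', unpack_error)

# A starred name gathers any remaining values into a list
fname, lname, *emails = record
print(fname, lname, emails)   # prints Bill Gates ['gates@gmail.com', 'billy@msoe.edu']
```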

#### String format

Python has a built-in method for convenient string formatting.

In [None]:
sales_record = {
'price': 3.24,
'num_items': 4,
'person': 'Jay'}

print(sales_record['num_items'])

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items']*sales_record['price']))
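
The same statement can be written with named fields and a format spec for the prices. This is a sketch of one option, not the only way to do it; `total` and `statement` are names introduced here.

```python
sales_record = {'price': 3.24, 'num_items': 4, 'person': 'Jay'}

total = sales_record['num_items'] * sales_record['price']

# Named fields are filled from keyword arguments; :.2f fixes two decimal places
statement = ('{person} bought {num_items} item(s) at ${price:.2f} each '
             'for a total of ${total:.2f}')

result = statement.format(total=total, **sales_record)
print(result)   # prints: Jay bought 4 item(s) at $3.24 each for a total of $12.96
```

Unpacking the dictionary with `**sales_record` passes each key as a keyword argument, so the field names in the template must match the dictionary keys.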


>**Question**: 

Using the sales_record dictionary above, print out the average price per item for user Jay.

Reference:  
https://docs.python.org/3/library/string.html#format-specification-mini-language

In [None]:
# Your answer here



#### Partial list of debugging commands

There are many more available commands for interactive debugging than we've listed here; the following table contains a description of some of the more common and useful ones:

| Command         |  Description                                                |
|-----------------|-------------------------------------------------------------|
| ``list``        | Show the current location in the file                       |
| ``h(elp)``      | Show a list of commands, or find help on a specific command |
| ``q(uit)``      | Quit the debugger and the program                           |
| ``c(ontinue)``  | Quit the debugger, continue in the program                  |
| ``n(ext)``      | Go to the next step of the program                          |
| ``<enter>``     | Repeat the previous command                                 |
| ``p(rint)``     | Print variables                                             |

In [None]:
def funky(a, b):
    return a / b

funky(5,0)

In [None]:
%xmode plain

In [None]:
funky(5,0)

In [None]:
%xmode verbose

In [None]:
funky(5,0)

In [None]:
%debug
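
The `%xmode` and `%debug` magics only work inside IPython or Jupyter. In a plain Python script, a rough equivalent is to catch the exception and capture the traceback with the standard-library `traceback` module, as sketched below. The variable `report` is just an illustrative name.

```python
import traceback

def funky(a, b):
    return a / b

try:
    funky(5, 0)
except ZeroDivisionError:
    # format_exc returns the traceback of the exception being handled as a string
    report = traceback.format_exc()
    print(report)
```

For interactive inspection outside IPython, the standard-library `pdb` module provides `pdb.post_mortem()`, which accepts commands similar to those in the table above.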