# Data Science with Python 

## Python Review





# Before we begin, be sure you have Python 3 installed!

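1. Download the Anaconda distribution, which bundles Python 3, Jupyter, and the common data science libraries
 * go to https://www.anaconda.com/products/individual and pick the installer for your operating system at the bottom of the page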
 
You can use Jupyter Notebooks or your IDE of choice.  These days I like VS Code, but there are many good choices.

# Python Fundamentals



## *Dynamic* typing, no declarations

In [None]:
x = 3.9
print(x)
x = "Python"
print(x)

## ...but strongly typed

In [None]:
x = 'hello'
y = x + 1   # raises a TypeError: a str and an int are never combined implicitly
y
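
Because Python won't silently coerce a `str` into an `int` (or vice versa), you convert explicitly. A minimal sketch using the built-in `str()` and `int()` conversions:

```python
x = 'hello'
y = x + str(1)      # convert the int to a str first, then concatenate
print(y)            # hello1

n = int('41') + 1   # parse the str as an int, then add
print(n)            # 42
```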

## if/elif/else

In [None]:
x = 17

if x == 1: # no parens needed around expression
    print('hey, x is 1')
    print('foo')
elif x < 10:
    print('x is less than 10 and not 1')
else:
    print('x >= 10')
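
Two related idioms that often replace nested `if` statements are chained comparisons and the one-line conditional expression. A short sketch reusing `x = 17`:

```python
x = 17

# Chained comparison: equivalent to (0 < x) and (x < 20)
if 0 < x < 20:
    print('x is between 0 and 20')

# Conditional expression: value_if_true if condition else value_if_false
parity = 'even' if x % 2 == 0 else 'odd'
print(parity)   # odd
```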
        

## Functions

In [None]:
def myfunc(x):
    print('do something', x)
    if x == 1:
        return True
    else:
        return 'abc'
    
print(myfunc(4.8))

## ...functions return __`None`__ if no `return` statement is executed

In [None]:
def myfunc(x):
    print('do something', x)

print(myfunc(35))

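## What is __`None`__?
* it acts like __`False`__ in an `if` test (it is "falsy"), but it's a different object
* Rookie mistake: if you get an error complaining about `NoneType`, did you use the return value of a function that doesn't return anything (for example `list.sort()`, which sorts in place and returns `None`) when you meant to use another value?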

In [None]:
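if None is not False:
    print('None is not False')        # two distinct objects...
if not None:
    print('...but None is falsy')     # ...that behave alike in an if test
id(None), id(False)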
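
Because `None` is a singleton, the idiomatic test is `is None` (or `is not None`) rather than `== None` or a plain truthiness check, which would also treat `0`, `''`, and `[]` as missing. A small sketch; `find_first_even` is a hypothetical helper written for illustration:

```python
def find_first_even(values):
    """Return the first even number in values, or None if there isn't one."""
    for v in values:
        if v % 2 == 0:
            return v
    # falling off the end of the function returns None implicitly

result = find_first_even([1, 3, 0, 5])
if result is None:
    print('no even number found')
else:
    print('found', result)   # found 0 (a test like `if not result` would have missed it)
```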

## Iterating with __`for`__ loops
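* a `for` loop steps directly through the elements of any iterable: a list, string, range, dict (its keys), and so on

If you come from a language where loops count over a numeric index, this may feel strange at first.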

In [None]:
# This looks normal, right?
for num in range(25):
    print(num, end=' ')

In [None]:
# This is Pythonic
mylist = 'small medium large'.split()
for size in mylist:
    print(size)

In [None]:
# This is NOT Pythonic
mylist = 'small medium large'.split()
for i in range(len(mylist)):
    print(mylist[i])
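
When you genuinely need the position as well as the item, the Pythonic tool is the built-in `enumerate()`; to walk two sequences in step, use `zip()`. The `prices` list below is made-up example data:

```python
mylist = 'small medium large'.split()
prices = [2.50, 3.25, 4.00]   # hypothetical example data

# enumerate yields (index, item) pairs
for i, size in enumerate(mylist):
    print(i, size)

# zip pairs up items from several iterables, stopping at the shortest one
for size, price in zip(mylist, prices):
    print(f'{size}: ${price:.2f}')
```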

## Modules
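* just files of Python code
* export variables, functions, and/or classes
* names starting with an underscore are private *by convention only*: they can still be accessed, but `from module import *` skips them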

In [None]:
%%writefile mymodule.py
# The %%writefile magic saves this cell to mymodule.py instead of running it
def dummy():
    return 45
   
public_data = "public stuff!"
_private_data = "private stuff!"
print('__name__ =', __name__)

# When the file is run as a script (not imported), __name__ is '__main__'
if __name__ == '__main__':
    # test dummy
    if dummy() == 45:
        print('success')

In [None]:
!python3 mymodule.py
# The ! prefix runs the line above as a shell command, outside the notebook's Python process

In [None]:
import mymodule
mymodule.dummy() # must preface identifiers with module name

In [None]:
mymodule._private_data, mymodule.public_data

In [None]:
from mymodule import public_data as thismodule_data  # import one name under a new local name
thismodule_data

# Python Datatype Overview

## Ints, Floats and Booleans
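They work much as in most other languages. Two Python-specific details: ints have arbitrary precision (no overflow), and `/` always returns a float, while `//` does floor division.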
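
A few quick checks of integer, float, and boolean arithmetic. The last lines show that `bool` is a subclass of `int` and that floats are binary floating point, so decimal fractions such as 0.1 are not stored exactly; `math.isclose()` is the standard-library way to compare them:

```python
import math

big = 2 ** 100                   # ints grow as large as memory allows
print(big)                       # 1267650600228229401496703205376

print(7 / 2)                     # 3.5  (true division)
print(7 // 2)                    # 3    (floor division)
print(-7 // 2)                   # -4   (floors toward negative infinity)
print(7 % 2)                     # 1    (remainder)

print(True + True)               # 2    (bool is a subclass of int)
print(isinstance(True, int))     # True

print(0.1 + 0.2 == 0.3)          # False: 0.1 and 0.2 are not exact in binary
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare floats with a tolerance
```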

## Strings
* can use single or double quotes
* triple quotes (single or double) allow multi-line strings

In [None]:
s = "The embedded apostrophe isn't a problem!"
print(s)
s = 'Nor are embedded "quotes"'
print(s)
s = "This string is \"more difficult\" to read"
print(s)

In [None]:
s = '''A man,
a plan, 
a canal: Panama'''
print(s)

In [None]:
# What methods does str have? (dir() also lists the special __dunder__ methods)
dir(str)
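
A handful of the `str` methods you'll use most when cleaning text data, plus f-strings (Python 3.6+) for formatting. The sample string is made up:

```python
raw = '  Spam, Eggs, Ham  '

clean = raw.strip()          # remove surrounding whitespace
parts = clean.split(', ')    # split into a list on a separator
joined = ' & '.join(parts)   # join a list back into one string
shout = clean.upper()        # strings are immutable: methods return new strings

print(clean)    # Spam, Eggs, Ham
print(parts)    # ['Spam', 'Eggs', 'Ham']
print(joined)   # Spam & Eggs & Ham
print(shout)    # SPAM, EGGS, HAM

count = len(parts)
print(f'{count} items, first is {parts[0]!r}')   # 3 items, first is 'Spam'
```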

## Lists

A real workhorse of Python.

* ordered
* mutable
* comma-separated values in []
* types can be mixed
* append(), extend(), pop(), remove()
* clear(), copy(), sort(), reverse()
* count(), index()

In [None]:
years = [1215, 1620, 1812, 1941]
weird_list = [1, 'two', (3, 4), False]
[years[1], weird_list[2]]
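
A quick tour of several of the list methods named above, plus slicing. The methods that modify the list do so in place and, apart from `pop()`, return `None`:

```python
sizes = ['tall', 'grande']

sizes.append('venti')             # add one item to the end
sizes.extend(['trenta', 'kids'])  # add every item from another iterable
last = sizes.pop()                # remove and return the last item ('kids')
sizes.remove('grande')            # remove the first item equal to 'grande'
print(sizes)                      # ['tall', 'venti', 'trenta']

sizes.sort()                      # sorts in place and returns None
print(sizes)                      # ['tall', 'trenta', 'venti']
print(sizes.index('trenta'))      # 1
print(sizes.count('tall'))        # 1

# Slicing builds a new list; [start:stop] excludes the stop index
print(sizes[:2])                  # ['tall', 'trenta']
print(sizes[::-1])                # ['venti', 'trenta', 'tall']
```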

### List comprehensions

When programming, frequently we want to transform one type of data into another. As a simple example, consider the following code that computes square numbers:

In [None]:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print(squares)

[0, 1, 4, 9, 16]


You can make this code simpler using a list comprehension:

In [None]:
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print(squares)

[0, 1, 4, 9, 16]


List comprehensions can also contain conditions:

In [None]:
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)

[0, 4, 16]


## Tuples
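* immutable
* usually hold a fixed structure of related values, like a record (e.g. an `(x, y)` point)
* parentheses are optional when creating one (the comma is what makes a tuple; only the empty tuple needs `()`)
* used in the very Pythonic *unpacking*, e.g. `a, b, c = some_tuple`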

In [None]:
# empty tuple
t = ()
print(t)
# singleton tuple
t = 1,
print(t)

def unpack():
    return "Spam", 42, len

a, b, c = unpack()
print(a,b,c)
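
Two more unpacking idioms, and what immutability means in practice. The variable names are arbitrary:

```python
# Swap two variables without a temporary: the right side builds a tuple first
a, b = 1, 2
a, b = b, a
print(a, b)            # 2 1

# Extended unpacking: *rest collects whatever is left over into a list
first, *rest = (10, 20, 30, 40)
print(first, rest)     # 10 [20, 30, 40]

point = (3, 4)
try:
    point[0] = 99      # tuples cannot be modified in place
except TypeError as err:
    print('TypeError:', err)
```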

## Sets
* unordered
* no duplicates
* implement set logic: union, intersection, difference, etc.

In [None]:
t = set()   # note: {} creates an empty dict, not an empty set
type(t)

In [None]:
even = { 2, 4, 6 }
print(even)
even.add(8)
even.add(2)
print(even)

In [None]:
prime = set([int(x) for x in '2357'])
print(prime)
print('all numbers =', prime | even)
print('even primes =', prime & even)


In [None]:
7 in prime
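
Beyond union and intersection, sets support difference, symmetric difference, and subset tests, and converting a list to a set is the quickest way to drop duplicates (at the cost of the original order). A minimal sketch that redefines `prime` and `even` with the same contents they have in the cells above, so it runs on its own:

```python
prime = {2, 3, 5, 7}
even = {2, 4, 6, 8}

print(prime - even)    # difference: in prime but not in even (3, 5, 7)
print(prime ^ even)    # symmetric difference: in exactly one of the two sets

readings = [3, 1, 3, 2, 1]
unique = sorted(set(readings))   # drop duplicates, then sort for a predictable order
print(unique)          # [1, 2, 3]

print(prime <= {1, 2, 3, 5, 7, 11})   # True: subset test
```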

## Dictionaries
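* a mapping of key/value pairs; since Python 3.7, dicts preserve insertion order
* also known as an associative array, hash map, etc.
* looking up a value by key is fast (average constant time), even for large data sets that fit in memory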

In [None]:
d = { 'red': 0, 'blue': 1, 'green': 2 }
d['blue'] = 9
d['yellow'] = -1
print(d)

In [None]:
d = {}
d['tall'] = 12
d['grande'] = 16
d['venti'] = 20
print(d)

In [None]:
# keys(), values() and items() return *views*, which update dynamically
# as the dict changes; wrap one in list() to take a static snapshot
print('keys are', d.keys())
print('values are', d.values())
print('items are', d.items())

In [None]:
print(d.keys())

In [None]:
# now add to the dict...
d['trenta'] = 31
d.keys()

In [None]:
aa = d.keys()
print(aa)
d['lots'] = 64
print(aa)   # the view now includes 'lots' without calling keys() again
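
To freeze the keys at a point in time, copy the view into a list. A minimal sketch with a separate, made-up dict (so the `d` used in this notebook is left alone), which also shows `get()` with a default and iteration over `items()`:

```python
cups = {'tall': 12, 'grande': 16}   # hypothetical example dict

live = cups.keys()      # a view: reflects later changes to cups
snapshot = list(cups)   # a static list of the keys (same as list(cups.keys()))

cups['venti'] = 20
print(live)             # dict_keys(['tall', 'grande', 'venti'])
print(snapshot)         # ['tall', 'grande']

# get() returns a default instead of raising KeyError for a missing key
print(cups.get('trenta', 0))   # 0

# Iterating over items() yields (key, value) tuples
for size, ounces in cups.items():
    print(size, ounces)
```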

In [None]:
# values don't all need to have the same type
d['happy'] = ['pooh', 'bear']

In [None]:
d

In [None]:
# Keys must be hashable, so mutable types such as lists cannot be keys.
# This is a gotcha: the line below raises a TypeError (unhashable type).
d[['pooh', 'bear']] = 'tigger'
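
Immutable containers of hashable values are fine as keys, so the usual fix is to convert the list to a tuple. A minimal sketch using a separate, made-up dict; the built-in `hash()` is a quick way to check whether an object can be a key:

```python
characters = {}   # hypothetical example dict

key = tuple(['pooh', 'bear'])          # ('pooh', 'bear') is hashable
characters[key] = 'tigger'
print(characters[('pooh', 'bear')])    # tigger

# hash() succeeds for hashable objects and raises TypeError otherwise
try:
    hash(['pooh', 'bear'])
except TypeError as err:
    print('TypeError:', err)
```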

In [None]:
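# Skip this if frozendict is already installed
# (it is also available from PyPI: !pip install frozendict)
!conda install -y frozendict

# Nested JSON data often parses into nested dicts, and a plain dict is
# unhashable, so it can't be a dict key or a set member. frozendict is an
# immutable dict that is hashable (as long as its values are), which helps here.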

In [None]:
from frozendict import frozendict

fd = frozendict({ 'pooh': 'bear' })

print(fd)
d[fd] = 'tigger'

d[fd]

## Classes

The syntax for defining classes in Python is straightforward:

In [None]:
class Greeter:

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print(f'HELLO, {self.name.upper()}!')
        else:
            print(f'Hello, {self.name}!')

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred!"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"

Hello, Fred!
HELLO, FRED!


## Readability

Python code is often said to be almost like pseudocode, since it allows you to express powerful ideas in very few lines of code, while being *readable*. As an example, here is an implementation of the classic quicksort algorithm in Python:

In [None]:
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3,6,8,10,1,2,1]))

[1, 1, 2, 3, 6, 8, 10]
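
One way to gain confidence in a short implementation like this is to compare it with the built-in `sorted()` on many random inputs. A minimal sketch (the trial count and value ranges are arbitrary choices), repeating the function so the cell runs on its own:

```python
import random

def quicksort(arr):
    """Return a sorted list; the input list is not modified."""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

# Compare against sorted() on random lists, including empty lists
# and lists with many duplicate values
for trial in range(1000):
    data = [random.randint(-10, 10) for _ in range(random.randint(0, 30))]
    assert quicksort(data) == sorted(data), data

print('all trials passed')
```

This version trades memory for readability: each call builds new `left`, `middle`, and `right` lists rather than partitioning the input in place.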
