# Data Science with Python 

## Python Review





# Before we begin, be sure you have Python 3 installed!

1. Download from Anaconda
 * go to https://www.anaconda.com/products/individual and scroll to the installers at the bottom of the page
 
You can use Jupyter Notebooks or your IDE of choice.  These days I like VS Code, but there are many good choices.

# Python Fundamentals



## *Dynamic* typing, no declarations

In [1]:
x = 3.9
print(x)
x = "Python"
print(x)

3.9
Python


## ...but strongly typed

In [2]:
x = 'hello'
y = x + 1
y

TypeError: can only concatenate str (not "int") to str
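Because Python will not silently coerce between `str` and `int`, the conversion has to be spelled out. A minimal sketch (the names `greeting` and `count` are illustrative, not from the notebook):

```python
x = 'hello'

# Convert the int to a string in order to concatenate...
greeting = x + str(1)
print(greeting)          # hello1

# ...or convert a numeric string to an int in order to do arithmetic.
count = int('41') + 1
print(count)             # 42
```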

## if/elif/else

In [4]:
x = 1

if x == 1: # no parens needed around expression
    print('hey, x is 1')
    print('foo')
elif x < 10:
    print('x is less than 10 and not 1')
else:
    print('x >= 10')
        

hey, x is 1
foo


## Functions

In [None]:
def myfunc(x):
    print('do something', x)
    if x == 1:
        return True
    else:
        return 'abc'
    
print(myfunc(4.8))

## ...functions return __`None`__ if no __`return`__ statement is executed

In [5]:
def myfunc(x):
    print('do something', x)

print(myfunc(35))

do something 35
None


## What is __`None`__?
* it is falsy (it acts like __`False`__ in a condition), but it's a different object
* Rookie mistake: if you get an error complaining about `NoneType`, did you take the return value of a function that returns nothing when you meant to use another value?

In [6]:
if None is False:
    print('no! None is not False')
id(None), id(False)

(4478040168, 4477940112)
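The two points above can be sketched as follows. The helper `find_first_even` is hypothetical and only here for illustration; `list.sort()` is a real example of a method that works in place and returns `None`.

```python
def find_first_even(numbers):
    """Return the first even number, or None if there is none."""
    for n in numbers:
        if n % 2 == 0:
            return n
    # Falling off the end of a function returns None implicitly.

result = find_first_even([1, 3, 5])

# Test for None with `is`, not with truthiness:
# 0 would be a valid even result, but it is also falsy.
if result is None:
    print('no even number found')

# The rookie mistake: keeping the return value of an in-place method.
values = [3, 1, 2]
sorted_in_place = values.sort()
print(sorted_in_place)   # None
print(values)            # [1, 2, 3]
```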

## Iterating with __`for`__ loops
* iterating through a container
* if you come from a language with numeric loop counters, this can look weird

In [7]:
# This looks normal, right?
for num in range(25):
    print(num, end=' ')

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 

In [8]:
# This is Pythonic
mylist = 'small medium large'.split()
for size in mylist:
    print(size)

small
medium
large


In [9]:
# This is NOT Pythonic
mylist = 'small medium large'.split()
for i in range(len(mylist)):
    print(mylist[i])

small
medium
large
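When the index really is needed, the built-ins `enumerate()` and `zip()` usually replace the `range(len(...))` pattern. A short sketch (the `ounces` list is made up for illustration):

```python
mylist = 'small medium large'.split()

# enumerate() yields (index, item) pairs, so no manual indexing is needed.
for i, size in enumerate(mylist):
    print(i, size)

# zip() walks two containers in parallel.
ounces = [12, 16, 20]   # illustrative values
for size, oz in zip(mylist, ounces):
    print(size, oz)
```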


## Modules
* just files of Python code
* export variables, functions, and/or classes

In [None]:
# this code lives in mymodule.py
def dummy():
    return 45
   
public_data = "public stuff!"
_private_data = "private stuff!"
print('__name__ =', __name__)

# If this code is being *run*, then __name__ will be '__main__'
if __name__ == '__main__':
    # test dummy
    if dummy() == 45:
        print('success')

In [10]:
!python3 mymodule.py
# The above runs a command in the shell, outside of the notebook

__name__ = __main__
success


In [11]:
import mymodule
mymodule.dummy() # must prefix identifiers with the module name

__name__ = mymodule


45

In [12]:
mymodule._private_data, mymodule.public_data

('private stuff!', 'public stuff!')

In [13]:
from mymodule import public_data as thismodule_data
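The same import forms work for any module. Here is a sketch using the standard library's `math` module instead of `mymodule`, so that it runs anywhere (the alias `root` is an arbitrary choice):

```python
import math                      # names are reached through the module: math.sqrt
from math import pi              # pull one name into the current namespace
from math import sqrt as root    # ...optionally under a different name

print(math.sqrt(16))   # 4.0
print(pi)
print(root(9))         # 3.0
```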

# Python Datatype Overview

## Ints, Floats and Booleans
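They behave much as in other languages: ints have arbitrary precision, floats are usually implemented as C doubles, and the booleans are `True` and `False`.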
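A few details are still worth seeing once. The following is a small sketch of standard Python 3 behavior, not something taken from the original notebook:

```python
import math

# Ints have arbitrary precision: no overflow at 32 or 64 bits.
big = 2 ** 100
print(big)

# / is always true division; // is floor division.
print(7 / 2)    # 3.5
print(7 // 2)   # 3

# Floats are binary floating point, so some decimals are inexact.
print(0.1 + 0.2 == 0.3)               # False
print(math.isclose(0.1 + 0.2, 0.3))   # True

# bool is a subclass of int: True behaves like 1, False like 0.
print(True + True)             # 2
print(isinstance(True, int))   # True
```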

## Strings
* can use single or double quotes
* triple quotes (single or double) allow multi-line strings

In [14]:
s = "The embedded apostrophe isn't a problem!"
print(s)
s = 'Nor are embedded "quotes"'
print(s)
s = "This string is \"more difficult\" to read"
print(s)

The embedded apostrophe isn't a problem!
Nor are embedded "quotes"
This string is "more difficult" to read


In [15]:
s = '''A man,
a plan, 
a canal: Panama'''
print(s)

A man,
a plan, 
a canal: Panama


In [16]:
# What are all the __methods__?
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

In [19]:
x = int('0001', base=10)
x

1
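A few of the methods listed by `dir(str)` above come up constantly when cleaning data. A short sketch (the sample string is made up for illustration):

```python
s = '  Data Science with Python  '

words = s.strip().split()          # trim whitespace, then split on whitespace
print(words)                       # ['Data', 'Science', 'with', 'Python']

joined = '-'.join(words)           # join is called on the separator
print(joined.lower())              # data-science-with-python

print(joined.startswith('Data'))   # True
print('7'.zfill(3))                # 007
print('{} {}'.format('Python', 3)) # Python 3
```

Strings are immutable, so each of these methods returns a new string and leaves `s` unchanged.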

## Lists

A real workhorse of Python.

* ordered
* mutable
* comma-separated values in []
* types can be mixed
* append(), extend(), pop(), remove()
* clear(), copy(), sort(), reverse()
* count(), index()

In [21]:
years = [1215, 1620, 1812, 1941]
weird_list = [1, 'two', (3, 4), False]
[years[1], weird_list[2:4]]

[1620, [(3, 4), False]]
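The list methods named in the bullets above, in one short sketch. The extra years are arbitrary values added for illustration:

```python
years = [1215, 1620, 1812, 1941]

years.append(1969)            # add one item at the end
years.extend([2001, 1066])    # add every item from another iterable
print(years)                  # [1215, 1620, 1812, 1941, 1969, 2001, 1066]

last = years.pop()            # remove and return the last item
print(last)                   # 1066
years.remove(1812)            # remove the first matching value
print(years)                  # [1215, 1620, 1941, 1969, 2001]

years.reverse()               # reverses in place
print(years)                  # [2001, 1969, 1941, 1620, 1215]
years.sort()                  # sorts in place and returns None
print(years)                  # [1215, 1620, 1941, 1969, 2001]

print(years.index(1941))      # 2
print(years.count(1620))      # 1
```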

### List comprehensions:

When programming, frequently we want to transform one type of data into another. As a simple example, consider the following code that computes square numbers:

In [22]:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print(squares)

[0, 1, 4, 9, 16]


You can make this code simpler using a list comprehension:

In [23]:
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print(squares)

[0, 1, 4, 9, 16]


List comprehensions can also contain conditions:

In [25]:
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if (x % 2 == 0 and x > 0)]
print(even_squares)

[4, 16]


## Tuples
* immutable
* generally imply some structure
* parens not required when declaring
* used in the very Pythonic unpacking idiom

In [43]:
# empty tuple
t = ()
print(t)
# singleton tuple
t = 1,
print(t)

def unpack():
    return "Spam", 42, len

a, b, c = unpack()
print(a,b,c)

()
(1,)
Spam 42 <built-in function len>
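Unpacking is not limited to function return values. A brief sketch of two other common uses, plus what immutability means in practice (all names here are illustrative):

```python
# Swap two values without a temporary variable.
a, b = 1, 2
a, b = b, a
print(a, b)              # 2 1

# A starred name collects the leftovers into a list.
first, *rest = (10, 20, 30, 40)
print(first, rest)       # 10 [20, 30, 40]

# Tuples are immutable: item assignment raises TypeError.
point = (3, 4)
try:
    point[0] = 99
except TypeError as err:
    print('TypeError:', err)
```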


In [45]:
aaa = len  # functions are objects and can be bound to other names
print(aaa)

<built-in function len>


In [46]:
str(len)

'<built-in function len>'

In [50]:
str(len(()))  # calling len (here on the empty tuple) returns a value

'0'

## Sets
* unordered
* no duplicates
* implement set logic: Union, Intersection, etc.

In [28]:
t = set()
type(t)

set

In [29]:
even = { 2, 4, 6 }
print(even)
even.add(8)
even.add(2)
print(even)

{2, 4, 6}
{8, 2, 4, 6}


In [30]:
prime = set([int(x) for x in '2357'])
print(prime)
print('all numbers =', prime | even)
print('even primes =', prime & even)


{2, 3, 5, 7}
all numbers = {2, 3, 4, 5, 6, 7, 8}
even primes = {2}


In [31]:
7 in prime

True
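Beyond union and intersection, sets support the other usual operations. A small sketch that reuses the `even` and `prime` values from the cells above:

```python
even = {2, 4, 6, 8}
prime = {2, 3, 5, 7}

print(even - prime)        # difference: in even but not in prime
print(even ^ prime)        # symmetric difference: in exactly one of the two
print({2, 4} <= even)      # subset test: True

# A common use: dropping duplicates from a list (the original order is lost).
unique = set([3, 1, 3, 2, 1])
print(sorted(unique))      # [1, 2, 3]
```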

## Dictionaries
* collection of key/value pairs; insertion order is preserved since Python 3.7
* associative array, hash, etc.
* very fast at retrieving items from big data sets that fit in memory

In [32]:
d = { 'red': 0, 'blue': 1, 'green': 2 }
d['blue'] = 9
d['yellow'] = -1
print(d)

{'red': 0, 'blue': 9, 'green': 2, 'yellow': -1}


In [33]:
d = {}
d['tall'] = 12
d['grande'] = 16
d['venti'] = 20
print(d)

{'tall': 12, 'grande': 16, 'venti': 20}


In [34]:
# keys(), values() and items() return views, which are dynamic:
# they reflect later changes to the dict.
# To get a static snapshot, wrap the view in list(), e.g. list(d.keys())
# (in Python 2, keys() itself returned such a list).
print('keys are', d.keys())
print('values are', d.values())
print('items are', d.items())

keys are dict_keys(['tall', 'grande', 'venti'])
values are dict_values([12, 16, 20])
items are dict_items([('tall', 12), ('grande', 16), ('venti', 20)])


In [35]:
print(d.keys())

dict_keys(['tall', 'grande', 'venti'])


In [36]:
# now add to the dict...
d['trenta'] = 31
d.keys()

dict_keys(['tall', 'grande', 'venti', 'trenta'])

In [37]:
aa = d.keys()
print(aa)
d['lots'] = 64
print(aa)

dict_keys(['tall', 'grande', 'venti', 'trenta'])
dict_keys(['tall', 'grande', 'venti', 'trenta', 'lots'])


In [38]:
# not all elements need to have the same type
d['happy'] = ['pooh', 'bear']

In [39]:
d

{'tall': 12,
 'grande': 16,
 'venti': 20,
 'trenta': 31,
 'lots': 64,
 'happy': ['pooh', 'bear']}
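A few everyday access patterns, sketched on a fresh copy of the coffee-size dict, including the difference between a dynamic view and a static snapshot:

```python
d = {'tall': 12, 'grande': 16, 'venti': 20}

# Membership tests look at keys.
print('tall' in d)               # True

# d[key] raises KeyError for a missing key; get() returns a default instead.
print(d.get('short'))            # None
print(d.get('short', 8))         # 8

# items() yields (key, value) tuples, which unpack nicely in a for loop.
for size, ounces in d.items():
    print(size, ounces)

# list() takes a static snapshot of a view.
snapshot = list(d.keys())
d['trenta'] = 31
print(snapshot)                  # ['tall', 'grande', 'venti']
print(list(d.keys()))            # ['tall', 'grande', 'venti', 'trenta']
```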

In [40]:
# Keys must be hashable, so mutable types such as lists cannot be keys.
# This is a common gotcha.
d[['pooh', 'bear']] = 'tigger'

TypeError: unhashable type: 'list'
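When a compound key is needed, the built-in immutable types are often enough. A minimal sketch using a tuple and a `frozenset` (the key values are made up for illustration):

```python
d = {}

# A tuple of hashable items is itself hashable, so it can be a key.
d[('pooh', 'bear')] = 'tigger'
print(d[('pooh', 'bear')])              # tigger

# frozenset is the built-in immutable counterpart of set.
d[frozenset({'kanga', 'roo'})] = 'family'
print(d[frozenset({'roo', 'kanga'})])   # family (element order is irrelevant)
```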

In [41]:
# If you already have frozendict installed, this step is not necessary
!conda install -y frozendict

# When working with nested JSON data you are frequently parsing nested dicts.
# frozendict can help. This is common when dealing with big datasets.

Collecting package metadata (current_repodata.json): done
Solving environment: done



# All requested packages already installed.



In [42]:
from frozendict import frozendict

fd = frozendict({ 'pooh': 'bear' })

print(fd)
d[fd] = 'tigger'

d[fd]

frozendict({'pooh': 'bear'})


'tigger'

## Classes

The syntax for defining classes in Python is straightforward:

In [51]:
class Greeter:

    class_var = 5  # Class attribute, shared by all instances

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print('HELLO, {}'.format(self.name.upper()))
        else:
            print('Hello, {}!'.format(self.name))

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred!"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED"

Hello, Fred!
HELLO, FRED


In [53]:
g2 = Greeter('Praveen')
Greeter.greet(g2)  # same as g2.greet(): the instance is passed as self

Hello, Praveen!
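The `Greeter` class also defines `class_var`, which the cells above never use. The following sketch, with a trimmed-down copy of the class, shows how a class attribute differs from an instance attribute such as `name`:

```python
class Greeter:
    class_var = 5                 # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name          # instance attribute, one per object

g = Greeter('Fred')
g2 = Greeter('Praveen')

print(g.class_var, g2.class_var)  # 5 5 (both read the class attribute)

Greeter.class_var = 6             # rebinding on the class is seen by every instance
print(g.class_var, g2.class_var)  # 6 6

g.class_var = 99                  # assigning via an instance creates an instance attribute
print(g.class_var, g2.class_var)  # 99 6
print(Greeter.class_var)          # 6
```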


## Readability

Python code is often said to be almost like pseudocode, since it allows you to express powerful ideas in very few lines of code, while being *readable*. As an example, here is an implementation of the classic quicksort algorithm in Python:

In [44]:
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3,6,8,10,1,2,1]))

[1, 1, 2, 3, 6, 8, 10]
