# 1 Introducing Python with Jupyter

There are many ways to install Python. For Data Science, the easiest way to install both Python and the required packages is through the Anaconda distribution.

Anaconda comes with several core elements:

* A recent version of the Python interpreter (Python 3; Python 2 reached end of life in 2020)
* `conda`, the package and environment manager for Python
* An extensive collection of Python Data Science packages, including NumPy, SciPy and scikit-learn (`sklearn`).


The idea behind notebooks is that DOCUMENTATION comes first:
* most of the notebook is taken up by notes, graphs, plots, etc.
* there are relatively *few* lines of code overall

A standard software program commonly runs to 10,000+ lines of code. A data science notebook, by contrast, often needs only around 50 to 100.

If you need a lot of code, it is usually put in an external file (a module or library) that the notebook imports. The notebook itself then presents the results.

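__IMPORTANT THINGS TO NOTE__:
* Run order matters: cells execute in the order you run them, not the order they appear on the page
* If a later cell depends on an earlier one, you must (re-)run the earlier cell first
* Jupyter shows the run order as a number next to each cell, e.g. `In [3]:`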

## 1.1 Jupyter Commands

To navigate effectively around your Jupyter notebook, there are a number of useful keyboard shortcuts:

When in a Code cell:
* To run the code in the cell - CTRL ENTER
* To enter command mode ('exit' the cell) - ESC
* To get code suggestions/auto-complete statements - TAB

When in a Markdown Cell:
* To create headings - \# (the more \# symbols, the smaller the heading)
* To make a list - \*
* To run/compile the markdown - CTRL ENTER

When in Command mode:
* To enter edit mode - ENTER
* To navigate between cells - Up/Down arrow keys
* To create a cell above - a
* To create a cell below - b
* To delete a cell - dd (press d twice)
* To cut a cell - x
* To make a cell markdown - m
* To make cell code - y

## Python Overview
* data 
* variables
* operations
* simple functions
* objects
* defining functions
* data structure operations
* libraries
* conditions
* loops

* later on: numpy, pandas, matplotlib, sklearn 

# 2 Data In Python

## 2.1 Values and Objects - OOP

All computer programs manipulate data, and thus every programming language provides a means for the programmer to represent data, as well as to reference data for later use.

A __value__ is a piece of data in a program, such as the number `5` or the text `'Bob'`. In an assignment such as `x = 5`, the value appears on the right-hand side (RHS).

An __object__, in the programming sense, is an in-memory data structure. It contains the state or value of the entity it represents (often referred to as "fields" or "attributes"). It also contains a set of actions the structure can perform (often referred to as "methods").

Some of these methods could change the state of the fields within the object, while some could output information about this object; some can do both.

Objects in programs are meant to mimic real-life objects, in the sense that real-life objects have both a state and things they can do.

For example, a real-life pen has a length and a colour; these are its state. Two pens can have different lengths or colours, and thus different states. A pen can also write on paper, and what it writes depends on its colour: a red pen draws a red line, for example.

In a program, an in-memory object representing a pen would have the same state (fields/attributes), length and colour. It would also have a method write(), whose behaviour depends on the colour.

Objects always have a type. There is a distinction between a particular red pen (an "instance") and the type "pen". We can have multiple objects of the same type: 2 red pens, 3 blue pens and so on. A pen, however, always has a length, a colour and the ability to write. A bicycle, on the other hand, has an entirely different set of fields and methods.
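The pen analogy can be sketched in Python as follows. This is a minimal, hypothetical `Pen` class: the name, the units of `length`, and having `write()` return a string are all assumptions made for illustration. The `class` syntax itself is explained in the Classes section later on.

```python
class Pen:
    """Hypothetical pen type: every instance has a length and a colour."""

    def __init__(self, length, colour):
        self.length = length   # state (field)
        self.colour = colour   # state (field)

    def write(self, text):
        # the behaviour depends on the pen's state (its colour)
        return f"{text} (written in {self.colour})"


red_pen = Pen(14, "red")     # one instance of the type Pen
blue_pen = Pen(12, "blue")   # another instance, with a different state

print(red_pen.write("Hello"))           # Hello (written in red)
print(blue_pen.write("Hello"))          # Hello (written in blue)
print(type(red_pen) is type(blue_pen))  # True: two instances, one type
```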

The object data structure lets a programmer treat a program as a machine with many components, each component being an object. Just like in the real world, different "machines" can share the same kinds of components. Writing programs is therefore largely about defining different object types and creating instances of those types.

In Python, all values are objects. In other words, every value in Python is stored in memory together with its state and a set of methods.
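A quick way to convince yourself of this is to call methods on plain literals. The short sketch below uses only built-ins; the variable names are illustrative.

```python
# Even simple literals are objects with a type and methods
number = 255
text = "bob"

print(type(number))                # <class 'int'>
print(number.bit_length())         # 8: ints have methods too
print(text.upper())                # BOB
print(isinstance(number, object))  # True
print(isinstance(print, object))   # True: even functions are objects
```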

## 2.2 Helpful Python built-in functions

* `print()` converts any Python object to a string and prints it to standard output. For most built-in types, it prints the expected human-readable value. For example, text objects (strings) have their text printed out, and numbers have their numerical values printed out.

In [1]:
print("Hello World")

* `id()` returns an integer that uniquely identifies the object for as long as it exists (in CPython, the standard interpreter, this is the object's memory address)

In [2]:
obj = 'Bob'
print(obj)
print(id(obj))

* `type()` returns a reference to the type (referred to as a "class") of the object

In [3]:
print(type(obj))

<class 'str'>

* `dir()` returns a list of the names of the fields (attributes) and methods associated with the object

In [4]:
print(dir(obj))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


Object fields and methods are accessed using a dot (`.`). For example, text in Python is represented as a string (`str`) object. Every `str` object has a method called `upper()`, which returns an upper-case copy of the text:

In [5]:
new = obj.upper()
print(new)

BOB


## Exercise:

* #### Assign your name to a variable called "name"
* #### Print out the id of the string object you have created
* #### Assign your occupation to a variable called "job"
* #### Are the two objects the same? Check the id of job
* #### Print the type of name
* #### Use dir() to list the attributes available on the object associated with name
* #### Can you find the string method lower? Use help() to find out what lower() does
* #### Apply the lower() method to name and job, and print the output

## Solution

In [6]:
# TODO 1: assign your name to the variable called "name"
name = 'Thomas Holmes'

# TODO 2: print out the id of the string object you have created
print(id(name))

# TODO 3: assign your occupation to the variable called "job"
job = 'Trainer'

# TODO 4: Are the two objects the same? Check id of job
print(id(job))

# TODO 5: print the type of name
print(type(name))

# TODO 6: use dir() to list the attributes available to the object
# associated to name
print(dir(name))

# TODO 7: can we find the method lower? Use help() to find out
# what lower() does
help(name.lower)

# TODO 8: apply the lower() method to name and job, and print the
# output
print(name.lower())
print(job.lower())

4555279664
4555279728
<class 'str'>
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
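Help on built-in function lower:

lower() method of builtins.str instance
    Return a copy of the string converted to lowercase.

thomas holmes
trainer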

# Types

In [7]:
print(5)
print(5.0)
print("5")
print(True)
print(["Lovers", "Haters", "Needers"])

5
5.0
5
True
['Lovers', 'Haters', 'Needers']


In [8]:
print(type(5))
print(type(5.0))
print(type("5"))
print(type(True))
print(type(["Lovers", "Haters", "Needers"]))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
<class 'list'>


# Operations

###### Highlight the Python code and Click Play to execute the code snippet

In [9]:
print( "Thomas " + "Holmes")
print( 2 * 3 )
print( 2 ** 3 )    # 2 * 2 * 2 = 2^3
print( "-" * 3 )
print( "@" in "kofi.glover@qa.com")
print( True and False )
print( ["Eggs", "Butter"] + ["Flour", "Sugar"] )
print( 1.14 <=  2.13 )

Thomas Holmes
6
8
---
True
False
['Eggs', 'Butter', 'Flour', 'Sugar']
True


## Exercise:
* Take the number 1
* Add True to it
* Compute the 4th power of the result
* Convert the result to a string
* 'Add' the string "3" to this (i.e. concatenate it)
* Convert the result to a float
* Find the remainder when you divide this number by 3
#### What is the final number?
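One possible solution follows. It reads "'Add' the string 3" as concatenating the string `"3"`, and each step is kept in its own variable so the intermediate values can be inspected.

```python
step1 = 1 + True       # True behaves like the int 1, so this is 2
step2 = step1 ** 4     # 2 ** 4 = 16
step3 = str(step2)     # "16"
step4 = step3 + "3"    # string concatenation: "163"
step5 = float(step4)   # 163.0
result = step5 % 3     # remainder of 163.0 divided by 3

print(result)          # 1.0
```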

# Variables


In [10]:
name = "Thomas"
age = 24
height = 1.77
hobbies = ["Data Science", "Reading", "Cycling", "Eating"]
is_alive = True

print(name, age, height)  # print with multiple args = spaced output
print(hobbies)
print(name + " is alive: " + str(is_alive))

mean_yearly_growth = height / age * 100   # cm grown per year, using the variables above
print("Mean centimeters grown per year = " + str(mean_yearly_growth))

fmesg = f"{name} is {age} and {height * 100} cm" # formatted string

print(fmesg)

Thomas 24 1.77
['Data Science', 'Reading', 'Cycling', 'Eating']
Thomas is alive: True
Mean centimeters grown per year = 7.375
Thomas is 24 and 177.0 cm


In [11]:
age += 1

age   # the value of the last expression in a Jupyter cell is displayed automatically

25


### Simple Functions

Functions are blocks of code which perform some action (a procedure) that we can call upon. They may be built-in or user-defined, and they may or may not return an output. The general pattern is:

output = fn(input)

or, in other words:

returnValue = procedure(requirements)

Calling a function that returns something makes a new value. Note that Python functions are algorithms (step-by-step procedures), not "relationships between mathematical variables".

In [12]:
name = "Thomas"

print(name)          # output the value of name
print( id(name) )    # output the memory location of the value of name
print( type(name) )  # output the type of the value of name
print( len(name) )   # output the length of (the value of) name 
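print(print(name))   # print() itself returns None, which the outer print() displays

dir(__builtin__)[80:]   # IPython name; in plain Python use: import builtins; dir(builtins)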

Thomas
4520294448
<class 'str'>
6
Thomas
None


['abs',
 'all',
 'any',
 'ascii',
 'bin',
 'bool',
 'breakpoint',
 'bytearray',
 'bytes',
 'callable',
 'chr',
 'classmethod',
 'compile',
 'complex',
 'copyright',
 'credits',
 'delattr',
 'dict',
 'dir',
 'display',
 'divmod',
 'enumerate',
 'eval',
 'exec',
 'filter',
 'float',
 'format',
 'frozenset',
 'get_ipython',
 'getattr',
 'globals',
 'hasattr',
 'hash',
 'help',
 'hex',
 'id',
 'input',
 'int',
 'isinstance',
 'issubclass',
 'iter',
 'len',
 'license',
 'list',
 'locals',
 'map',
 'max',
 'memoryview',
 'min',
 'next',
 'object',
 'oct',
 'open',
 'ord',
 'pow',
 'print',
 'property',
 'range',
 'repr',
 'reversed',
 'round',
 'set',
 'setattr',
 'slice',
 'sorted',
 'staticmethod',
 'str',
 'sum',
 'super',
 'tuple',
 'type',
 'vars',
 'zip']

## Exercise:
* Pick one of the listed functions
* Call the help function on it
* Try to use it (If it looks complicated then pick another one!)

## Solution:

In [13]:
help(pow)

Help on built-in function pow in module builtins:

pow(x, y, z=None, /)
    Equivalent to x**y (with two arguments) or x**y % z (with three arguments)
    
    Some types, such as ints, are able to use a more efficient algorithm when
    invoked using the three argument form.



# Objects


In [14]:
# data.operation(requirements)
# obj.method(parameters)
# ask name to upper() itself 
# ask name if it startswith("T")

print( name.upper()  )
print( name.lower()   )
print( name.startswith("T") )
print( name.endswith("T") )

# all data in python is an object
# objects are data structures:  values (properties), types (class), id, methods

THOMAS
thomas
True
False


In [15]:
dir(name)[-5:] # lists the last 5 attributes & methods of the object 

['swapcase', 'title', 'translate', 'upper', 'zfill']


## Exercise 1:
* define variables of each type mentioned
* they should describe you (name, age, location, etc.)

* print these out
* print all strings in upper case
* print whether your age is over 18
* print 10 dashes

# Functions


### Defining Functions
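* An algorithm can be used to calculate the value of a mathematical function, for example:

* error(pred, obv) = (pred - obv)^2  (the squared error; averaging it over many predictions gives the MSE, or Mean Squared Error)

* def error(pred, obv)                <- LHS of the math
* return (pred - obv) ** 2            <- RHS of the math (`return` plays roughly the role of `=`; Python writes powers as `**`, because `^` means bitwise XOR)

* `return` hands the calculated value back to the caller, which can then use it or store it in a variable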

In [16]:
def error(pred, obv):
    return (pred - obv) ** 2

error(3, 3.3)

0.0899999999999999

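* Indentation groups operations together: the indented lines form the function body
* `def` defines a function
* Parameters are listed in parentheses after the function name
* The definition ends at the first line that is no longer indented
* Notice the colon at the end of the `def` line, before the indented body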


In [17]:
# functions = procedures
# can also not return anything

def show_results(results):
    print("-" * 10)
    print(results)
    print("-" * 10)
    
show_results([12, 12, 15])   # writes to screen, but has no return value

def distance(x1, x2):
    return (x2 - x1) ** 2   # squared distance (the square of the 1-D Euclidean/L2 distance)
    

dist = distance(10, 12)
print(dist * 1.1)  # calculated value can be stored in variable


rtn = show_results([10, 12])

print(type(rtn))
print(rtn) # nothing is stored here, no return value

----------
[12, 12, 15]
----------
4.4
----------
[10, 12]
----------
<class 'NoneType'>
None
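`distance()` above returns the *squared* distance. The actual Euclidean (L2) distance between two points takes the square root of the summed squared differences. The helper below, `euclidean_distance`, is a hypothetical sketch that works for points of any dimension; it uses a list comprehension and `zip()`, which pair up the coordinates. Recent Python versions (3.8+) also provide `math.dist` for the same purpose.

```python
import math

def euclidean_distance(p, q):
    """Straight-line (L2) distance between two points of equal dimension."""
    squared_diffs = [(b - a) ** 2 for a, b in zip(p, q)]  # per-coordinate squared differences
    return math.sqrt(sum(squared_diffs))

print(euclidean_distance([10], [12]))      # 2.0 (in 1-D this is just the absolute difference)
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```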



### Exercise 2:
Define functions called:

* mean, which takes three parameters and returns their mean
* cube, which cubes its first argument
* is_adult, which says whether its first argument is 18 or over



* Define three variables:
* mean_ages, which is the mean of 18, 18 and 20
* two_later, which is 2 cubed
* teen_is_adult, which says whether an age of 15 is adult


* Define a function show()
* which prints the three variables above


In [18]:
def mean(x, y, z):
    return (x + y + z)/3

def cube(x):
    return x ** 3

def is_adult(age):
    return age >= 18


def show(m, c, a):
    print("mean:", m)
    print("cube:", c)
    print("age:", a)
    
mean_ages = mean(18, 18, 20)
two_later = cube(2)
teen_is_adult = is_adult(15)

show(mean_ages, two_later, teen_is_adult)

mean: 18.666666666666668
cube: 8
age: False



### Data Structures
* strings - groups of characters
* lists - ordered groups of data where each element is indexed by an int
* sets - unordered groups of data where there is no indexing
* tuples - uneditable (immutable) groups of data where elements are int-indexed
* dictionaries - groups of key-value pairs, where the keys (indexes) are chosen by you


In [19]:
# strings

quote = "Be the change you wish to see in the world!"

print( quote[0] )   # first
print( quote[1] )   # second
print( quote[-2] )  # second from last
print( quote[-1] )  # last

B
e
d
!


In [20]:
print( quote[0:2] ) 

Be


In [21]:
print( quote[0:-6] ) # everything except the last 6 characters

Be the change you wish to see in the 


In [22]:
print( quote[0:-6]  + "bedroom" )

Be the change you wish to see in the bedroom


In [23]:
print(quote[:2])   
print(quote[-6:] )  # the last 6 characters

Be
world!


In [24]:
quote[::2]   # every second character (a step of 2)

'B h hneyuws osei h ol!'
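The third slice number is the step. The sketch below shows a negative step, which walks backwards through the string, and combines a start, a stop and a step in one slice.

```python
quote = "Be the change you wish to see in the world!"

print(quote[::-1])    # a negative step walks backwards: the reversed string
print(quote[3:13])    # characters 3 up to (not including) 13: 'the change'
print(quote[3:13:3])  # every third character of that slice: 't ae'
```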


In [25]:
# tuple

point = (10, 20, 30)

print( point[0] )
print( point[1] )
print( point[-1] )

print( point[0:2] ) #slice, as with strings

# point[0] = 15 # TypeError: tuples are immutable, so elements cannot be overwritten

# technically, the parentheses are not required: the commas create the tuple

address = "OldSt", "London"

print(address)

10
20
30
(10, 20)
('OldSt', 'London')
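The following sketch shows the immutability error without stopping the notebook, by catching the `TypeError`. It also shows tuple unpacking, where each element is assigned to its own name. The variable names are illustrative.

```python
point = (10, 20, 30)

try:
    point[0] = 15                 # tuples are immutable...
except TypeError as err:
    print("TypeError:", err)      # ...so item assignment raises TypeError

x, y, z = point                   # tuple unpacking: one name per element
print(x, y, z)                    # 10 20 30

street, city = "OldSt", "London"  # unpacking works without parentheses too
print(city)                       # London
```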



In [26]:
# lists
# y: the target, e.g. customer satisfaction
# x: the customer features
# (days-since-first-purchase, total-spent, nearest-store, address)

x = [300, 1000, "London", ("Old Street", "London")]

print(x)
print(len(x))

print(x[-1])
print(len(x[-1]))



[300, 1000, 'London', ('Old Street', 'London')]
4
('Old Street', 'London')
2


In [27]:
x.append(1)
x

[300, 1000, 'London', ('Old Street', 'London'), 1]

In [28]:
x.pop()

1

In [29]:
print(x)
x.insert(0, 1) # insert, at position 0, the element 1

print(x)

[300, 1000, 'London', ('Old Street', 'London')]
[1, 300, 1000, 'London', ('Old Street', 'London')]


### Sets
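Sets were listed above as unordered collections without indexing. This sketch shows the basic operations; the variable names are illustrative. Because sets have no guaranteed order, `sorted()` is used where a stable display is wanted.

```python
colours = {"red", "blue", "green", "red"}  # duplicates are dropped
print(len(colours))          # 3
print("blue" in colours)     # True: membership test

colours.add("black")         # sets are mutable: add an element
colours.add("red")           # adding an existing element changes nothing
print(len(colours))          # 4

warm = {"red", "orange"}
print(colours & warm)            # intersection: {'red'}
print(sorted(colours | warm))    # union, sorted for a stable display order

ratings = [5, 5, 6, 7, 8, 1]
unique_ratings = set(ratings)    # a quick way to find the unique values
print(sorted(unique_ratings))    # [1, 5, 6, 7, 8]
```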


### Using Lists in Functions

In [30]:
def error(y_pred, y, i):
    return (y_pred - y[i]) ** 2

In [31]:
y = [2, 3, 5, 8]
guess = 2.2

error(guess, y, 1)   #   (2.2 - 3) ** 2

0.6399999999999997
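`error()` above scores a single position. The Mean Squared Error mentioned earlier averages that squared error over every position. Below is a minimal sketch with a hypothetical `mean_squared_error` helper. It uses a `for` loop over `range()`, which is covered under Control Flow, and the prediction values are made up for illustration.

```python
def mean_squared_error(y_pred, y):
    """Average of the squared errors over all positions (illustrative helper)."""
    total = 0
    for i in range(len(y)):
        total += (y_pred[i] - y[i]) ** 2   # the same calculation as error(), for each i
    return total / len(y)

y = [2, 3, 5, 8]
y_pred = [2.2, 2.8, 5.0, 7.5]   # made-up predictions, one per target value
print(mean_squared_error(y_pred, y))   # approximately 0.0825
```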


## Exercise 3: Collections
* define a tuple "weekday" which contains the first letter of each weekday (in order)
* try to add a new day to the week (expect an error)
* print out the first, last and middle two days


* define a list "cart" which is a shopping cart
* add several items to it
* print out the first, last and middle two items


* Define a set containing 3 unique numbers
* check the set contains one of the numbers using `in`
* try to insert a new number
* Create a set containing 5 numbers, 3 of which are the same
* print the above


* Try to add a new item at the start of each list


### Dictionaries
* key-value data structures
* where the keys are defined by you (generally strings)

In [32]:
user = {
    "name": "Thomas",
    "age": 24,
    "location": "uk"
}

print(user["name"])     # use string keys to look up value rather than int index
print(user["age"])
print(user["location"])

Thomas
24
uk
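Dictionaries can also be changed after they are created. The sketch below adds a key, updates a value, and uses `.get()` to avoid a `KeyError` for a missing key. It also loops over the key-value pairs with `.items()`. The `"job"` and `"email"` keys are illustrative.

```python
user = {
    "name": "Thomas",
    "age": 24,
    "location": "uk"
}

user["job"] = "Trainer"              # add a new key-value pair
user["age"] += 1                     # update an existing value

print(user.get("email", "unknown"))  # .get() returns a default instead of raising KeyError
print("name" in user)                # True: 'in' checks the keys, not the values

for key, value in user.items():      # iterate over the key-value pairs
    print(key, "->", value)
```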


In [33]:
# data science example: labelling for Fraud|NotFraud
# dict keys can be many different things, not just strings...
# but they must be unique (and hashable: a tuple works, a list does not)!


# key = (age, days-since-purchase-of-insurance)  
user = {
    (18, 13) : "Fraud",
    (60, 300) : "NotFraud"
}

user[(18, 13)]

'Fraud'

In [34]:
# in data science, dictionaries are often used more like tables: each key is a column name, each value a list

users = {
    "age-at-purchase": [18, 60],
    "days-from-purcahse": [13, 300]
}

ages = users['age-at-purchase']

sum(ages)/len(ages)

39.0

## Exercise:
Create a dictionary containing a list of foods you like, and a list of foods you do not like (3 will do)

### Control Flow


In [35]:
user_age = 18

if user_age > 65:                      # colons
    print("See Retirement Plans")      # indentation
elif user_age > 21:                    # keyword, elif
    print("See Vocation Plans")
elif user_age > 13:
    print("See Education Plans")
else:
    print("See your mother!")
    

See Education Plans
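The branches are tested from top to bottom, and only the first true one runs, so the order of the conditions matters. Wrapping the same logic in a function makes it easy to check several ages. The name `plan_for` is only for illustration.

```python
def plan_for(age):
    """Return the plan message for an age, using the same thresholds as above."""
    if age > 65:
        return "See Retirement Plans"
    elif age > 21:
        return "See Vocation Plans"
    elif age > 13:
        return "See Education Plans"
    else:
        return "See your mother!"

for age in [10, 18, 30, 70]:
    print(age, plan_for(age))
```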



In [36]:
# while loops repeat as long as a condition is True; in data processing they are rarer than for loops

ratings = [5,5,6,7,8,1]

while len(ratings) > 0:
    print(ratings.pop())    # remove last one
    
    

1
8
7
6
5
5


In [37]:
ratings

[]


In [38]:
# for loop -- data processing loop

ratings = [5,5,6,7,8,1]

for element in ratings:      # for name-of-each-element  in source-data-input
    print(element)           # algorithm for processing each-element
    
ratings

5
5
6
7
8
1


[5, 5, 6, 7, 8, 1]
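Two built-ins from the list shown earlier are often combined with `for` loops. `range(n)` counts from 0 to n-1, and `enumerate()` pairs each element with its index. The sketch below also accumulates a total inside a loop; the threshold of 5 for a "low" rating is an arbitrary choice for illustration.

```python
ratings = [5, 5, 6, 7, 8, 1]

# range(n) produces the integers 0 .. n-1, handy for counting loops
for i in range(3):
    print("pass", i)

# enumerate() gives each element together with its index
for position, rating in enumerate(ratings):
    if rating < 5:
        print("low rating at position", position)   # low rating at position 5

# accumulate a result; looping does not change ratings itself
total = 0
for rating in ratings:
    total += rating
print(total / len(ratings))   # mean rating
```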

## Comprehensions!
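A comprehension builds a new collection from an existing one in a single expression. The sketch below shows a list comprehension next to the equivalent `for` loop, followed by the dict and set versions of the same pattern.

```python
ratings = [5, 5, 6, 7, 8, 1]

# list comprehension: [expression for element in source if condition]
doubled = [r * 2 for r in ratings]
high = [r for r in ratings if r >= 6]

# the same "high" list built with an ordinary for loop
high_loop = []
for r in ratings:
    if r >= 6:
        high_loop.append(r)

print(doubled)            # [10, 10, 12, 14, 16, 2]
print(high)               # [6, 7, 8]
print(high == high_loop)  # True

# dict and set comprehensions follow the same pattern
squares = {n: n ** 2 for n in range(4)}   # {0: 0, 1: 1, 2: 4, 3: 9}
unique = {r for r in ratings}             # a set of the unique ratings
```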

## Exercise: FizzBuzz Test
* Write a program that prints numbers from 1 to 100. 
* For multiples of three print “Fizz” instead of the number
* For the multiples of five print “Buzz”.
* For numbers which are multiples of both three and five print “FizzBuzz”.

__Scoring__: (200 - number of characters in code)/100

* If finished: output all elements to a list and count the number you have of each element.
* If doubly finished, try to do the same for the numbers between -50 and 50
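Here is one readable solution sketch; it is deliberately not minimised for the character score. It also covers the first extension, using `collections.Counter` to count each kind of output. The helper name `fizzbuzz` and the label `"number"` for plain numbers are illustrative choices.

```python
from collections import Counter

def fizzbuzz(n):
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:        # multiple of both 3 and 5: must be checked first
        return "FizzBuzz"
    elif n % 3 == 0:
        return "Fizz"
    elif n % 5 == 0:
        return "Buzz"
    return str(n)

outputs = [fizzbuzz(n) for n in range(1, 101)]
for word in outputs:
    print(word)

# count each kind of output, grouping all plain numbers together
counts = Counter(w if w in ("Fizz", "Buzz", "FizzBuzz") else "number" for w in outputs)
print(counts)
```

For the second extension, changing the range to `range(-50, 51)` works unchanged, because Python's `%` returns 0 for negative multiples too.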

# Classes

Classes are a central component of OOP. They allow you to define your own objects, complete with their own methods and attributes. These can be useful when you wish to create a structure carrying out a number of related functions.

Below, we define a simple class which contains a print method.

In [39]:
class Simple:
    def __init__(self):
        self.value = 3
    def print_method(self):
        print(self.value)

#Instantiate Class
simple_class = Simple()

# Observe type
print(type(simple_class))

# Print attribute/value
print(simple_class.value)

# use method
simple_class.print_method()

# Change attribute
simple_class.value = 17

# print again
simple_class.print_method()

# Add new attribute
simple_class.new_var = 'this is new'

print(simple_class.new_var)

# Familiar friends
dir(Simple)

<class '__main__.Simple'>
3
3
17
this is new


['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'print_method']

A word of warning about defining variables directly in the class body: such class variables are shared by every instance of the class.

In [40]:
class Dog:

    tricks = []             # mistaken use of a class variable

    def __init__(self, name):
        self.name = name

    def add_trick(self, trick):
        self.tricks.append(trick)

d = Dog('Fido')
e = Dog('Buddy')
d.add_trick('roll over')
e.add_trick('play dead')
d.tricks                # unexpectedly shared by all dogs (you can teach every dog every trick!!!! No matter how old)

['roll over', 'play dead']
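The usual fix, and the same one the official Python tutorial gives for this example, is to create the list inside `__init__`. Each instance then gets its own separate list.

```python
class Dog:

    def __init__(self, name):
        self.name = name
        self.tricks = []    # instance variable: a new, separate list for each dog

    def add_trick(self, trick):
        self.tricks.append(trick)

d = Dog('Fido')
e = Dog('Buddy')
d.add_trick('roll over')
e.add_trick('play dead')
print(d.tricks)   # ['roll over']
print(e.tricks)   # ['play dead']
```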

We can build on the above, defining attributes which must be supplied when the class is instantiated:

In [41]:
class LessSimple:
    
    def __init__(self, essential_arg):
        self.essential_arg = essential_arg
        self.inessential_arg = 'We just have this arg by default'
        self.value = 3
        
    def print_method(self):
        print('essential_arg\t:', self.essential_arg,
              '\ninessential_arg\t:', self.inessential_arg,
              '\nvalue\t:', self.value)
        
    def mult_by_3(self, number):
        return number * 3
        
ls = LessSimple("We have to define our essential arg")
ls.print_method()
print(ls.value)
print(ls.mult_by_3(4))

essential_arg	: We have to define our essential arg 
inessential_arg	: We just have this arg by default 
value	: 3
3
12


## Exercise:
* Define a class which describes a pen, with a single attribute/value and a single method which adds a hyphen to a list

## Libraries
* `import` includes a library in your program
* a library (or module) is a Python file which defines some functions, classes, etc.


## Importing Your Own Modules

In [42]:
import mymodule as m

ic = m.ImportedClass()

ic.give_proof()

m.imported_method()

print(type(ic))

ModuleNotFoundError: No module named 'mymodule'
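The error means Python could not find a file called `mymodule.py` in any directory listed in `sys.path`. In a Jupyter session, that list normally includes the notebook's working directory. The original module is not shown, so the contents below are a hypothetical sketch. Only the names `ImportedClass`, `give_proof` and `imported_method` come from the cell above; their bodies are assumptions. Saving this as `mymodule.py` next to the notebook should let the cell run.

```python
# mymodule.py: hypothetical contents, to be saved next to the notebook

class ImportedClass:
    def give_proof(self):
        # print and return a message naming this class and the module it was defined in
        message = f"{type(self).__name__} was imported from {__name__}"
        print(message)
        return message


def imported_method():
    print("imported_method was called")
```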

## Importing Other Modules

In [None]:
import os
os.listdir('.')

In [None]:
import sys
sys.platform

In [None]:
import re
quote = "Be the change you wish to see in the etc."
print(quote)
re.findall(r"\w+", quote)  # r will escape all backslashes, so they arent interpred as, eg., new lines

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  

# standard aliases (np, pd, plt) which make using the libraries easier

fast_array = np.array([1, 2, 3])
table = pd.DataFrame(
    {
        "ages": [10, 18, 30],
        "weights": [50, 70, 80]
    }
)

In [None]:
fast_array

In [None]:
table

In [None]:
plt.scatter(table["ages"], table["weights"])

### Dropping Prefixes

In [None]:
from os import listdir
from numpy.random import random as rn  # mathematicians like short names, makes math clearer

listdir('.')  # this is os.listdir

In [None]:
rn() # numpy.random.random
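The same pattern works with the standard library, so it can be tried even without NumPy installed. This sketch imports several names at once and renames one on import, like `rn` above. The alias `avg` is an arbitrary choice.

```python
from math import sqrt, pi               # import several names at once
from statistics import mean as avg      # rename on import, like rn above

print(sqrt(16))           # 4.0, no "math." prefix needed
print(round(pi, 2))       # 3.14
print(avg([18, 18, 20]))  # the same mean as in Exercise 2
```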