# 1 Introducing Python with Jupyter

There are many ways to install Python. For Data Science, the easiest way to install both Python and the required packages is through the Anaconda distribution.

Anaconda comes with several core elements:

* A recent version of the Python 3 interpreter
* `conda`, the package and environment manager for Python
* An extensive collection of Python Data Science modules, including numpy, scipy and sklearn.


The idea behind notebooks is that __documentation__ comes first:
* Most of the area is dedicated to notes, graphs, plots, etc.
* There are relatively *few* lines of code overall.

Whereas a standard software program commonly runs to 10,000 or more lines of code, 50 to 100 lines is often adequate for a data science notebook.

If you need many lines of code, they are usually put in an external file (a module or library), and the notebook is used to present the output.

__IMPORTANT THINGS TO NOTE__:
* Run order matters.
* You must re-run earlier cells if later ones depend on them.
* Jupyter shows the run order in the brackets next to each code cell.
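
The points above can be sketched in plain Python, with comments standing in for cell boundaries. The variable names `price` and `total` are invented for illustration; the idea is only that a later cell keeps whatever value it computed when it was last run.

```python
# Cell 1: define a value
price = 10

# Cell 2: depends on Cell 1
total = price * 3
print(total)    # 30

# Cell 3: change the earlier value
price = 20

# total is unchanged here; Cell 2 must be re-run to pick up the new price
print(total)    # 30
```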

## 1.1 Jupyter Commands

To navigate effectively around your Jupyter notebook, there are a number of useful keyboard shortcuts:

When in a Code cell:
* To run the code in the cell - CTRL ENTER
* To enter command mode ('exit' the cell) - ESC
* To get code suggestions/auto-complete statements - TAB

When in a Markdown Cell:
* To create heading font sizes - \# (the more \# characters, the smaller the heading)
* To make a list - \*
* To run/compile the markdown - CTRL ENTER

When in Command mode:
* To enter edit mode - ENTER
* To navigate between cells - Up/Down arrow keys
* To create a cell above - a
* To create a cell below - b
* To delete a cell - dd
* To cut a cell - x
* To make a cell markdown - m
* To make cell code - y

## Python Overview
* data 
* variables
* operations
* simple functions
* objects
* defining functions
* data structure operations
* libraries
* conditions
* loops

* later on: numpy, pandas, matplotlib, sklearn 

# 2 Data In Python

## 2.1 Values and Objects - OOP

All computer programs manipulate data, and thus every programming language provides a means for the programmer to represent data, as well as to reference data for later use.

A __value__ in a program is a term that can appear on the right-hand side (RHS) of an assignment operation.

An __object__, in the programming sense, is a type of in-memory data structure which contains not only the state or value of the entity the structure tries to represent (often referred to as "fields"), but also a set of actions the structure can perform (often referred to as "methods").

Some of these methods could change the state of the fields within the object, while some could output information about this object; some can do both.

Objects in programs are supposed to mimic real-life objects, in the sense that real-life objects have both states and things they can do.

For example, a real-life pen can have a length, and a colour, these are its state. Two pens can have different length or colour—thus different states. A pen can also write things on paper. And the write functionality also depends on the colour of the pen—a red pen would draw a red line for example.

In a program, an in-memory object representing a pen would have the same states (or fields/attributes), length and colour, and it will also have a method write(), whose behaviour is dependent on colour.

Objects always have a type. There is a distinction between a particular red pen (an "instance") and the type of pen. We can have multiple objects of the same type: we can have 2 red pens, 3 blue pens and so on, but a pen would always have a length, a colour, and the functionality of being able to write. On the other hand, a bicycle will have an entirely different set of fields and methods.

The object data structure allows a programmer to write programs by treating a program as a machine with many components, with each component being an object. And just like in the real world, different "machines" can share the same components. Writing programs is therefore about defining the different object types and creating instances of those types.

In Python, all values are objects. In other words, all values in Python are stored in memory with both a set of states and a set of methods.
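
As a rough illustration of the claim that every value is an object, the sketch below checks a handful of arbitrarily chosen values (the `values` list is made up for this example) and calls a method directly on an integer.

```python
values = [5, 5.0, "5", True, None, [1, 2], len]

for value in values:
    # type(value).__name__ is the name of the object's class
    print(type(value).__name__, isinstance(value, object))

# Even a plain integer carries methods
print((5).bit_length())   # 3, since 5 is 101 in binary
```

Every line printed by the loop ends in `True`, including the one for the built-in function `len`.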

## 2.2 Helpful Python built-in functions

* `print()` converts any Python object to a string and prints it to standard output. For most built-in types, it will print out the expected human-readable values. For example, text objects (strings) will have their text printed out, and numbers will have their numerical values printed out.

In [None]:
# Example: Printing a string
print("Hello World")

* `id()` returns an integer that identifies the object and is unique for as long as the object exists (in CPython, this is the memory address where the object is stored)

In [None]:
# Example: Object Identity
obj = 'Bob'
print(obj)
print(id(obj))
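
A small sketch of what identity means in practice, using made-up names (`first`, `second`, `duplicate`): two names can refer to the same object, while two equal objects can still be distinct.

```python
first = [1, 2, 3]
second = first            # a second name for the same object
duplicate = list(first)   # a new object with equal contents

print(id(first) == id(second))   # True
print(first is second)           # True
print(first == duplicate)        # True: same contents
print(first is duplicate)        # False: different objects
```

The `is` operator compares identities, whereas `==` compares values.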

* `type()` returns a reference to the type (referred to as a "class") of the object

In [None]:
# Example: Object Type
print(type(obj))

* `dir()` returns a list of the names of the fields and methods associated with the object

In [None]:
# Example: Object Directory
print(dir(obj))

Object fields and methods can be referenced using a dot (`.`). For example, text in Python is represented as string (`str`) objects, and every `str` object has a method called `upper()`, which outputs the upper-case version of the text:

In [None]:
# Example: Using Object Methods
new = obj.upper()
print(new)

## Exercise: Identities
1. Assign your favorite city to a variable called `city`.
2. Print the `id` of that object.
3. Print the `type` of that object.
4. Use the `.upper()` method to print the city in all capitals.

In [None]:
# YOUR TURN: Write your code below

# TODO: Create the variable 'city'

# TODO: Print the id

# TODO: Print the type

# TODO: Print it in uppercase


### Solution

In [None]:
# === SOLUTION (Click arrow to expand) ===

city = "London"
print(id(city))
print(type(city))
print(city.upper())

# Types

In [None]:
# Example: Standard Types
print(5)
print(5.0)
print("5")
print(True)
print(["Lovers", "Haters", "Needers"])

In [None]:
# Example: Checking Types
print(type(5))
print(type(5.0))
print(type("5"))
print(type(True))
print(type(["Lovers", "Haters", "Needers"]))

# Operations

In [None]:
# Example: Operators
print("Thomas " + "Holmes")
print(2 * 3)
print(2 ** 3)     # 2 * 2 * 2 = 2^3
print("-" * 3)
print("@" in "kofi.glover@qa.com")
print(True and False)
print(["Eggs", "Butter"] + ["Flour", "Sugar"])
print(1.14 <= 2.13)

## Exercise: Math
1. Create a variable `radius` with value 5.
2. Calculate the area of a circle ($3.14 * radius^2$).
3. Print the result as a string combined with text.

In [None]:
# YOUR TURN: Write your code below

# TODO: Set radius

# TODO: Calculate area (hint: use ** for power)

# TODO: Print result


### Solution

In [None]:
# === SOLUTION (Click arrow to expand) ===

radius = 5
area = 3.14 * (radius ** 2)
print("The area is " + str(area))

# Variables

In [None]:
# Example: Variable Assignment
name = "Thomas"
age = 24
height = 1.77
hobbies = ["Data Science", "Reading", "Cycling", "Eating"]
is_alive = True

print(name, age, height)  # print with multiple args = spaced output
print(hobbies)
print(name + " is alive: " + str(is_alive))

mean_yearly_growth = height / age * 100    # metres to centimetres, per year of age
print("Mean centimeters grown per year = " + str(mean_yearly_growth))

fmesg = f"{name} is {age} and {height * 100} cm"  # formatted string

print(fmesg)

In [None]:
# Example: Incrementing
age += 1

print(age)    # Jupyter also displays the value of the last expression in a cell automatically

## Exercise 1: Variables
* Define a variable `item` (e.g., "Apple").
* Define a variable `price` (e.g., 0.50).
* Define a variable `count` (e.g., 4).
* Calculate the `total`.
* Print a sentence using an **f-string**: "I bought 4 Apple(s) for a total of 2.0".

In [None]:
# YOUR TURN: Write your code below

# TODO: Define variables

# TODO: Calculate total

# TODO: Print f-string


### Solution

In [None]:
# === SOLUTION (Click arrow to expand) ===

item = "Apple"
price = 0.50
count = 4
total = price * count

print(f"I bought {count} {item}(s) for a total of {total}")

### Simple Functions

Functions are blocks of code which perform some action (procedure) that we can call upon. They may be built-in or user defined. They may return an output, or they may not.

In general, a call has the form `output = fn(input)`, or, more descriptively, `returnValue = procedure(requirements)`.

Calling a function makes a new value.

Functions here are algorithms, not "relationships between mathematical variables".

In [None]:
# Example: Built-in Functions
import builtins

name = "Thomas"

print(name)           # output the value of name
print(id(name))       # output the memory location of the value of name
print(type(name))     # output the type of the value of name
print(len(name))      # output the length of (the value of) name
print(print(name))    # inner call prints name and returns None, which the outer call prints

dir(builtins)[80:]

## Exercise: Built-ins
* Use the `help()` function to find out what the `pow()` function does.

In [None]:
# YOUR TURN: Write your code below


### Solution

In [None]:
# === SOLUTION (Click arrow to expand) ===
help(pow)

# Objects

In [None]:
# Example: Object Methods
# data.operation(requirements)
# obj.method(parameters)

print(name.upper())
print(name.lower())
print(name.startswith("T"))
print(name.endswith("T"))

# all data in python is an object
# objects are data structures:  values (properties), types (class), id, methods

In [None]:
# Example: Attributes
dir(name)[-5:]  # lists the last 5 attributes & methods of the object

# Functions

### Defining Functions
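* An algorithm can be used to calculate the value of a mathematical function.

* For example, error(pred, obv) = (pred - obv)^2 is the squared error of a single prediction; averaging it over many observations gives the MSE, or Mean Squared Error.

* `def error(pred, obv):` corresponds to the LHS of the maths
* `return (pred - obv) ** 2` corresponds to the RHS of the maths (`return` is approximately `=`)

* `return` hands the calculated value back to the caller, where it can be stored in memory under a name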

In [None]:
# Example: Error Function
def error(pred, obv):
    return (pred - obv) ** 2

error(3, 3.3)

* indentation groups operations together
* `def` defines a function
* parameters are listed in parentheses after the function name
* the definition ends where the indentation returns to the previous level
* notice the colon before the indented block
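
The syntax points above can be seen together in one short definition. The function `rectangle_area` and its parameters are hypothetical, chosen only to annotate each part.

```python
def rectangle_area(width, height):   # def, the name, the parameters, then a colon
    area = width * height            # indented lines form the function body
    return area                      # return hands the value back to the caller

# Back at the original indentation level, so the definition has ended
result = rectangle_area(3, 4)
print(result)   # 12
```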

In [None]:
# Example: Void Functions vs Return Functions

def show_results(results):
    print("-" * 10)
    print(results)
    print("-" * 10)

show_results([12, 12, 15])    # writes to screen, but has no return value


def distance(x1, x2):
    return (x2 - x1) ** 2     # squared Euclidean distance in one dimension


dist = distance(10, 12)
print(dist * 1.1)  # calculated value can be stored in variable


rtn = show_results([10, 12])

print(type(rtn))
print(rtn)  # nothing is stored here, no return value

### Exercise 2: Functions
Define two new functions:

1. `convert_to_seconds(minutes)`: takes an integer `minutes` and returns the number of seconds.
2. `is_passing(score)`: takes a score and returns `True` if score is > 50.

Test them both by printing the results.

In [None]:
# YOUR TURN: Write your code below

# TODO: Define convert_to_seconds

# TODO: Define is_passing

# TODO: Test them


### Solution

In [None]:
# === SOLUTION (Click arrow to expand) ===

def convert_to_seconds(minutes):
    return minutes * 60

def is_passing(score):
    return score > 50

print(convert_to_seconds(5))
print(is_passing(45))

### Data Structures
* strings - groups of characters
* lists - ordered groups of data where each element is indexed by an int
* sets - unordered groups of unique data where there is no indexing
* tuples - uneditable (immutable) groups of data where elements are int-indexed
* dictionaries - groups of data where indexes are chosen by you
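
As a quick side-by-side sketch of the five structures listed above (all names and values are invented for illustration):

```python
text = "data"                      # string
numbers = [3, 1, 2]                # list
unique = {1, 2, 2, 3}              # set: the duplicate 2 is dropped
point = (10, 20)                   # tuple
ages = {"Ann": 31, "Bob": 27}      # dictionary

print(text[0])        # d
print(numbers[0])     # 3
print(len(unique))    # 3
print(point[1])       # 20
print(ages["Bob"])    # 27
```

Strings, lists and tuples are indexed by integer position, dictionaries by key, and sets not at all.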

In [None]:
# Example: String Slicing
quote = "Be the change you wish to see in the world!"

print(quote[0])    # first
print(quote[1])    # second
print(quote[-2])   # second from last
print(quote[-1])   # last

In [None]:
# Example: Slicing Range
print(quote[0:2])

In [None]:
# Example: Slicing with Negative Index
print(quote[0:-6])

In [None]:
# Example: Concatenation
print(quote[0:-6] + "bedroom")

In [None]:
# Example: Start and End
print(quote[:2])
print(quote[-6:])

In [None]:
# Example: Steps
quote[::2]

In [None]:
# Example: Tuples

point = (10, 20, 30)

print(point[0])
print(point[1])
print(point[-1])

print(point[0:2])   # slice, as with strings

# point[0] = 15 # error: not allowed to overwrite

# technically, () not required...

address = "OldSt", "London"

print(address)

In [None]:
# Example: Lists
# y: target (customer satisfaction)
# x: customer features
# (days-since-first-purchase, total-spent, nearest-store, address)

x = [300, 1000, "London", ("Old Street", "London")]

print(x)
print(len(x))

print(x[-1])
print(len(x[-1]))

In [None]:
# Example: List Append
x.append(1)
x

In [None]:
# Example: List Pop
x.pop()

In [None]:
# Example: List Insert
print(x)
x.insert(0, 1)  # insert the element 1 at position 0

print(x)

### Sets
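
Sets were described earlier as unordered groups with no indexing. A minimal sketch of common set operations follows; the store names are made up, and `sorted()` is used only to give a predictable print order.

```python
store_a = {"London", "Leeds", "York"}
store_b = {"York", "Bath", "London"}

print("Leeds" in store_a)          # True: membership test
print(sorted(store_a & store_b))   # ['London', 'York'] (intersection)
print(sorted(store_a | store_b))   # ['Bath', 'Leeds', 'London', 'York'] (union)
print(sorted(store_a - store_b))   # ['Leeds'] (difference)

store_a.add("York")    # adding an existing element changes nothing
print(len(store_a))    # 3
```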

### Using Lists in Functions

In [None]:
# Example: Functions with Lists
def error(y_pred, y, i):
    return (y_pred - y[i]) ** 2

In [None]:
# Example: Calling Function with List
y = [2, 3, 5, 8]
guess = 2.2

error(guess, y, 1)    # (2.2 - 3) ** 2
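
One possible extension of the per-index error above is to average it over the whole list. In this sketch, `squared_error` and `mean_squared_error` are hypothetical names, and an integer guess is used so the arithmetic is exact.

```python
def squared_error(y_pred, y, i):
    return (y_pred - y[i]) ** 2

def mean_squared_error(y_pred, y):
    total = 0
    for i in range(len(y)):          # visit every index of the list
        total += squared_error(y_pred, y, i)
    return total / len(y)

y = [2, 3, 5, 8]
print(mean_squared_error(4, y))      # (4 + 1 + 1 + 16) / 4 = 5.5
```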

## Exercise 3: Collections
1. Define a list `sports` containing "Football", "Cricket", "Tennis".
2. Add "Rugby" to the end of the list.
3. Insert "Swimming" at the beginning (index 0).
4. Print the final list.

In [None]:
# YOUR TURN: Write your code below

# TODO: Create list

# TODO: Append 'Rugby'

# TODO: Insert 'Swimming'

# TODO: Print list


### Solution

In [None]:
# === SOLUTION (Click arrow to expand) ===

sports = ["Football", "Cricket", "Tennis"]
sports.append("Rugby")
sports.insert(0, "Swimming")
print(sports)

### Dictionaries
* key-value data structures
* where the keys are defined by you (generally strings)

In [None]:
# Example: Dictionary
user = {
    "name": "Thomas",
    "age": 24,
    "location": "uk"
}

print(user["name"])     # use string keys to look up value rather than int index
print(user["age"])
print(user["location"])

In [None]:
# Example: Tuple Keys
# data science example: labelling for Fraud|NotFraud
# dict keys can be lots of different things, not just strings...
# but must be unique!


# key = (age, days-since-purchase-of-insurance) 
user = {
    (18, 13): "Fraud",
    (60, 300): "NotFraud"
}

user[(18, 13)]

In [None]:
# Example: Dictionary Analysis
# dictionaries are more commonly used like tables: each key names a column of values

users = {
    "age-at-purchase": [18, 60],
    "days-from-purchase": [13, 300]
}

ages = users['age-at-purchase']

sum(ages) / len(ages)
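
Beyond lookups, dictionaries can be modified and iterated. The sketch below uses a fresh `user` dictionary with invented values; `get()` and `items()` are standard dictionary methods.

```python
user = {"name": "Thomas", "age": 24}

user["location"] = "uk"      # add a new key
user["age"] = 25             # overwrite an existing key

print(user.get("email", "unknown"))   # unknown (no KeyError for a missing key)
print("name" in user)                 # True: membership checks the keys

for key, value in user.items():       # iterate over key-value pairs
    print(key, value)
```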

## Exercise: Dictionaries
Create a dictionary called `car` with these keys:
* `brand`: "Tesla"
* `model`: "Model 3"
* `year`: 2023

Then print the `model` of the car.

In [None]:
# YOUR TURN: Write your code below

# TODO: Create dictionary

# TODO: Print the model


### Solution

In [None]:
# === SOLUTION (Click arrow to expand) ===

car = {
    "brand": "Tesla",
    "model": "Model 3",
    "year": 2023
}

print(car["model"])

### Control Flow

In [None]:
# Example: If/Else
user_age = 18

if user_age > 65:                       # colons
    print("See Retirement Plans")       # indentation
elif user_age > 21:                     # keyword, elif
    print("See Vocation Plans")
elif user_age > 13:
    print("See Education Plans")
else:
    print("See your mother!")

In [None]:
# Example: While Loop
# while loops are rare and usually best avoided: they keep repeating while the condition holds

ratings = [5, 5, 6, 7, 8, 1]

while len(ratings) > 0:
    print(ratings.pop())    # remove last one

In [None]:
# Example: Inspect List
ratings

In [None]:
# Example: For Loop
# for loop -- data processing loop

ratings = [5, 5, 6, 7, 8, 1]

for element in ratings:      # for name-of-each-element  in source-data-input
    print(element)           # algorithm for processing each-element

ratings

## Exercise: Loops
1. Create a list of colors: "Red", "Green", "Blue".
2. Use a `for` loop to iterate through the list and print each color.

In [None]:
# YOUR TURN: Write your code below


### Solution

In [None]:
# === SOLUTION (Click arrow to expand) ===

colors = ["Red", "Green", "Blue"]
for color in colors:
    print(color)

## Comprehensions!
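
A comprehension builds a list, set, or dictionary from an existing iterable in a single expression. The sketch below reuses the `ratings` values from the loop examples; the other names are illustrative.

```python
ratings = [5, 5, 6, 7, 8, 1]

doubled = [r * 2 for r in ratings]          # list comprehension
high = [r for r in ratings if r > 5]        # with a filter condition
squares = {r: r ** 2 for r in ratings}      # dictionary comprehension
distinct = {r for r in ratings}             # set comprehension

print(doubled)            # [10, 10, 12, 14, 16, 2]
print(high)               # [6, 7, 8]
print(squares)            # {5: 25, 6: 36, 7: 49, 8: 64, 1: 1}
print(sorted(distinct))   # [1, 5, 6, 7, 8]
```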

## Exercise: FizzBuzz Test
* Write a program that prints the numbers from 1 to 100.
* For multiples of three, print “Fizz” instead of the number.
* For multiples of five, print “Buzz”.
* For numbers which are multiples of both three and five, print “FizzBuzz”.

__Scoring__: (200 - number of characters in code)/100

* If finished: output all elements to a list and count the number you have of each element.
* If doubly finished, try to do the same for the numbers between -50 and 50

In [None]:
# YOUR TURN: Write your code below


### Solution

In [None]:
# === SOLUTION (Click arrow to expand) ===

for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)

# Classes

Classes are a central component of OOP. They allow you to define your own objects, complete with their own methods and attributes. These can be useful when you wish to create a structure carrying out a number of related functions.

Below, we define a simple class which contains a print method.

In [None]:
# Example: Simple Class
class Simple:
    def __init__(self):
        self.value = 3

    def print_method(self):
        print(self.value)

# Instantiate Class
simple_class = Simple()

# Observe type
print(type(simple_class))

# Print attribute/value
print(simple_class.value)

# use method
simple_class.print_method()

# Change attribute
simple_class.value = 17

# print again
simple_class.print_method()

# Add new attribute
simple_class.new_var = 'this is new'

print(simple_class.new_var)

# Familiar friends
dir(Simple)

A word of warning when it comes to defining variables within a class: a variable defined directly in the class body is shared by every instance of that class.

In [None]:
# Example: Class Variables Warning
class Dog:

    tricks = []             # mistaken use of a class variable

    def __init__(self, name):
        self.name = name

    def add_trick(self, trick):
        self.tricks.append(trick)


d = Dog('Fido')
e = Dog('Buddy')
d.add_trick('roll over')
e.add_trick('play dead')
d.tricks                # unexpectedly shared by all dogs (you can teach every dog every trick!!!! No matter how old)
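
One common way to avoid the sharing shown above is to create the list inside `__init__`, so that each instance gets its own. This is a sketch of that fix, not the only possible design.

```python
class Dog:

    def __init__(self, name):
        self.name = name
        self.tricks = []     # a new, separate list for each instance

    def add_trick(self, trick):
        self.tricks.append(trick)


d = Dog('Fido')
e = Dog('Buddy')
d.add_trick('roll over')
e.add_trick('play dead')
print(d.tricks)   # ['roll over']
print(e.tricks)   # ['play dead']
```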

We can build on the above by defining attributes that must be supplied when the class is instantiated.

In [None]:
# Example: Correct Class Structure
class LessSimple:

    def __init__(self, essential_arg):
        self.essential_arg = essential_arg
        self.inessential_arg = 'We just have this arg by default'
        self.value = 3

    def print_method(self):
        print('essential_arg\t:', self.essential_arg,
              '\ninessential_arg\t:', self.inessential_arg,
              '\nvalue\t:', self.value)

    def mult_by_3(self, number):
        return number * 3


ls = LessSimple("We have to define our essential arg")
ls.print_method()
print(ls.value)
print(ls.mult_by_3(4))

## Exercise: Classes
* Define a class `Pen`.
* In the `__init__` method, accept a `color` and store it as an attribute.
* Add a method `write(word)` that prints: "Writing [word] in [color]".

In [None]:
# YOUR TURN: Write your code below

# TODO: Define class


### Solution

In [None]:
# === SOLUTION (Click arrow to expand) ===

class Pen:
    def __init__(self, color):
        self.color = color

    def write(self, word):
        print(f"Writing {word} in {self.color}")

p = Pen("Blue")
p.write("Hello")

## Libraries
* use `import` to include a library
* a library is a Python file (or collection of files) which defines some functions, classes, etc.


## Importing Your Own Modules

In Python, you can import code from other files. These files are called modules.

First, let's create a simple module file named `mymodule.py` in our current directory using the `%%writefile` command. Then, we will try to import it.

In [None]:
%%writefile mymodule.py

# This cell creates the file 'mymodule.py' in your local directory
class ImportedClass:
    def give_proof(self):
        print("I am imported!")

def imported_method():
    print("Method called!")

In [None]:
# Example: Importing Local Modules
import mymodule as m

ic = m.ImportedClass()

ic.give_proof()

m.imported_method()

print(type(ic))

## Importing Other Modules

In [None]:
# Example: Import OS
import os
os.listdir('.')

In [None]:
# Example: Import Sys
import sys
sys.platform

In [None]:
# Example: Regex Import
import re
quote = "Be the change you wish to see in the etc."
print(quote)
re.findall(r"\w+", quote)  # the r prefix makes a raw string, so backslashes aren't interpreted as escapes (e.g. \n as a newline)
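
To make the effect of the `r` prefix concrete, this sketch compares a plain and a raw string literal and then uses a raw pattern with `re.findall`. The sample strings are invented.

```python
import re

plain = "a\tb"     # \t is interpreted as a single tab character
raw = r"a\tb"      # the backslash and the t are kept as two characters

print(len(plain))  # 3
print(len(raw))    # 4

# \d+ matches runs of digits; the raw string passes the backslash to re unchanged
print(re.findall(r"\d+", "Order 66 shipped in 3 days"))   # ['66', '3']
```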

In [None]:
# Example: Importing Data Science Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# std. aliases which make using libs easier

fast_array = np.array([1, 2, 3])
table = pd.DataFrame(
    {
        "ages": [10, 18, 30],
        "weights": [50, 70, 80]
    }
)

In [None]:
# Example: Numpy Array
fast_array

In [None]:
# Example: Pandas DataFrame
table

In [None]:
# Example: Matplotlib Plot
plt.scatter(table["ages"], table["weights"])

### Dropping Prefixes

In [None]:
# Example: Import From
from os import listdir
from numpy.random import random as rn  # mathematicians like short names, makes math clearer

listdir('.')  # this is os.listdir

In [None]:
# Example: Random
rn() # numpy.random.random