# Objectives

Introduce and/or remind everyone about:
- General coding practices
- Python fundamentals: variables, types, arithmetic, control flow

Note: The breakout session notebooks and structure of this lecture are based on David Beck's material for his Software Engineering for Data Science course at UW ([source](https://github.com/UWDIRECT/UWDIRECT.github.io/tree/master/Wi23_content/SEDS)).

# Basics: Code, Variables, and Comments

## Programming in Python
Python is a programming language that is increasingly the industry standard for data science applications. Some characteristics of Python:
- It is a high-level language, meaning that you don't need to know much about how a computer works under the hood in order to write Python code.
- It is an interpreted language, so unlike `C++` or `Rust` you do not need to compile your code before running it.
- It is dynamically typed, meaning that types are checked while the code runs rather than beforehand, so type errors show up as runtime failures.
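
As a small illustration of the dynamic typing point above, here is a minimal sketch. The function `add_one` is hypothetical (it is not part of the course material), and the `def` and `try/except` syntax is only there to let the cell finish running; the thing to notice is that nothing complains about the bad input until the offending line actually executes.

```python
def add_one(x):
    # Nothing checks the type of x when the function is defined
    return x + 1

result = add_one(2)  # Works: int + int

try:
    add_one("two")  # Fails only now, at runtime: str + int is not defined
    error_message = None
except TypeError as err:
    error_message = str(err)

print(result)
print(error_message)
```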

### Python Setup
If you do not have Python installed locally on your machine, you will need to install it. I would recommend following [these instructions](https://cloud.google.com/python/docs/setup#macos) up to "Installing an Editor".

Once you have Python installed, I would recommend creating a *virtual environment* that you use exclusively for this class. The instructions linked above include a section on using `venv` to create a virtual environment, so please follow those. 

### Jupyter Notebooks
In this class, we will mostly write Python code in Jupyter Notebooks, which is also the format these notes are written in. Jupyter is nice for combining text (it supports both Markdown and $\LaTeX$) with code. Both text and code go in *cells*. This paragraph is in a Markdown text cell, but the code below is in a code cell, which gets executed by whatever Python interpreter (or kernel) the notebook is connected to. You can execute a code cell by selecting it and pressing `Shift-Return`.

In [18]:
print("This is a code cell that demonstrates the print function")

This is a code cell that demonstrates the print function


## Variables
In a programming language, *variables* are the names we give to data objects so that they can be easily used (and re-used) in operations. There are a number of best practices to consider when choosing variable names in Python:
- variable names should be `lower_case` by default
- variable names should be intuitive and clear to readers of the code. Favor longer, clear names over abbreviated, but confusing names.
- variable names can contain the underscore `_` character to separate words. This is sometimes called "snake-casing".
- variable names cannot be any of the reserved **keywords**, words with a pre-defined meaning in the Python language, which are listed [here](https://docs.python.org/3.8/reference/lexical_analysis.html#keywords).
- variable names can contain numbers, e.g., `a1`, but they cannot start with a number. So something like `1a` would not be allowed.
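
The naming rules above can be checked programmatically. This is a minimal sketch using the standard-library `keyword` module and the built-in `str.isidentifier` method; the candidate names in the list are made up for illustration.

```python
import keyword

# Hypothetical candidate names, chosen to hit each rule above
candidate_names = ["credit_count", "a1", "1a", "class", "my-var"]

valid_names = []
for name in candidate_names:
    # isidentifier() checks the characters; iskeyword() checks the reserved list
    is_valid = name.isidentifier() and not keyword.iskeyword(name)
    if is_valid:
        valid_names.append(name)
    print(f"{name!r}: {'valid' if is_valid else 'not valid'}")

print(valid_names)
```

Here `1a` fails because it starts with a number, `class` fails because it is a keyword, and `my-var` fails because the hyphen is read as a minus sign.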

## Comments
Comments are text annotations that we add to code so that it is clear to a reader what the code is doing. Short comments start with a `#` and run to the end of the line. For longer annotations, people often use a block surrounded by triple quotes ` """ like this """ `; strictly speaking that is a string literal rather than a comment, but a string that is not assigned to anything has no effect on the program, and it is the standard form for docstrings. Your Python interpreter does not run `#` comments as code; they are purely for you and whoever else is using the code. Nevertheless, you should think of incorrect or misleading comments as *bugs* because they can cause users to misuse or misinterpret a piece of code. Some general best practices for commenting:
- Write comments, but not too many
- Don't comment obvious things, e.g., `a = 5  # defining variable a`
- Do comment potentially confusing things, though it's better to not write confusing code in the first place if you can avoid it.
- Write your comments with proper grammar and formatting. 
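
To make the guidelines above concrete, here is a short sketch contrasting an unhelpful comment with a helpful one. The variable names and the sensor scenario are invented for illustration.

```python
seconds_per_day = 24 * 60 * 60

# Unhelpful: this comment only restates what the code already says
elapsed_days = 3  # set elapsed_days to 3

# Helpful: this comment explains why the conversion is needed.
# The (hypothetical) sensor timestamps are in seconds, so convert days first.
elapsed_seconds = elapsed_days * seconds_per_day

print(elapsed_seconds)
```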

# Variable assignment and Data Types
We assign values to variables using the `=` operator. Let's see some examples below for different data types.

## Integers and Floating Point Numbers
Integers, or whole numbers, are an important data type in Python -- some operations will only work with integers, or will work differently with integers compared to floating point numbers, or "floats," which are numbers that contain a decimal point. Much of the time in Python, they can be used interchangeably when you are doing math, but it's good to keep track of your variable types regardless.

In [19]:
# Numerical types, and printing using f-strings
a = 2
b = 5
c = a + b  # Addition
d = a - b  # Subtraction
e = a * b  # Multiplication
f = b / a  # Division

print(f"a = {a}, b = {b}, a + b = c = {c}")
print(f"d = {d}")
print(f"e = {e}")
print(f"f = {f}")

a = 2, b = 5, a + b = c = 7
d = -3
e = 10
f = 2.5


In [20]:
print(f"a is type {type(a)}")  # Initialized as an int by default, unless you add a decimal (e.g., a = 2.0)
print(f"e is type {type(e)}")  # Type int because it is the product of two ints
print(f"but f is type {type(f)}")  # Ends up being a float because it is created via float division /

a is type <class 'int'>
e is type <class 'int'>
but f is type <class 'float'>
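
The section above mentions that some operations behave differently for integers and floats, or only accept integers. Here is a minimal sketch of both cases. It borrows a list (covered later in these notes) to show an integer-only operation, and the variable names are just for illustration.

```python
a = 7
b = 2

true_division = a / b    # 3.5: the / operator always returns a float
floor_division = a // b  # 3: rounds down, and stays an int when both operands are ints
remainder = a % b        # 1: what is left over after floor division
power = a ** b           # 49: exponentiation

letters = ["x", "y", "z"]
first = letters[0]  # Fine: list indices must be ints

try:
    letters[0.0]  # Not allowed, even though 0.0 == 0
    float_index_error = None
except TypeError as err:
    float_index_error = str(err)

print(true_division, floor_division, remainder, power)
print(float_index_error)
```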


## Breakout: Simple Math In Python

## String types
Python has different flavors of strings, which are used to hold characters.

In [21]:
standard_string = "this is a standard string"
f_string = f"this is an f-string, which can print variables in curly braces like a = {a}"

print(standard_string)
print(f_string)

this is a standard string
this is an f-string, which can print variables in curly braces like a = 2


In [22]:
a = "two"
b = "five"
c = a + b
print(f"a = {a}, b = {b}, a + b = c = {c}")

a = two, b = five, a + b = c = twofive


Clearly, addition works differently with strings than it does with numbers.
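
A couple of related behaviors are worth seeing once. In this sketch (variable names are illustrative), multiplying a string by an integer repeats it, while adding a string to a number is an error until you convert one of them explicitly with `str` or `int`.

```python
word = "ha"
repeated = word * 3  # "hahaha": multiplying a string by an int repeats it

count = 5
try:
    label = "count: " + count  # str + int is not defined
except TypeError:
    label = "count: " + str(count)  # Convert the number to a string instead

number = int("42") + 1  # Or convert a string of digits to a number

print(repeated)
print(label)
print(number)
```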

Strings have some very handy methods associated with them. Think of a method as a function that can be applied to a certain object. 

In [23]:
file_path = "/some/dir/on/your/machine.csv"
new_path = file_path.replace("your", "my")
print(new_path)

/some/dir/on/my/machine.csv


In [24]:
upper_path = file_path.upper()
dir_index = file_path.index("dir")
print(upper_path)  # Convert all to upper case
print(dir_index)  # Find index (starting at zero) where a certain substring occurs

/SOME/DIR/ON/YOUR/MACHINE.CSV
6


Here the `split` method returns a new data type called a list, which we store in the `split_path` variable; then we grab the final element of the list. More on lists below.

In [25]:
split_path = file_path.split("/")
filename = file_path.split("/")[-1]
print(split_path)  
print(filename)  

['', 'some', 'dir', 'on', 'your', 'machine.csv']
machine.csv
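
The `join` method goes the other way from `split`: it glues a list of strings back together with a separator. This is a small sketch using the same example path as above. For real file paths you would more likely reach for the standard-library `pathlib` or `os.path` modules, but string methods are enough to show the idea.

```python
file_path = "/some/dir/on/your/machine.csv"

parts = file_path.split("/")
rebuilt = "/".join(parts)         # join undoes split
directory = "/".join(parts[:-1])  # Everything except the filename
stem, extension = parts[-1].split(".")  # Split the filename at the dot

print(rebuilt)
print(directory)
print(stem, extension)
```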


## Lists
Lists are mutable, ordered container objects. This means they can store other objects within them, the objects remain in a particular order, and the elements of the list can be changed (that's the mutable part). The objects in a list do not need to be of the same type.

In [26]:
# Initialize a list with square brackets, and separate elements with commas
my_list = ["what", "a", "great", "object", "the", "list", "is", 987, [1, 2, 3]]
my_list



['what', 'a', 'great', 'object', 'the', 'list', 'is', 987, [1, 2, 3]]

Lists have a length which you can access with the `len` function:

In [27]:
print(len(my_list))

9


We can access individual elements of a list using square brackets. Just remember that Python indices start at `0`. This is similar to most sensible programming languages, though certain computing software packages masquerading as programming languages (MATLAB, R) index starting at `1`. Fortran does too, but Fortran gets a pass because it's so good at other stuff.

In [28]:
print(my_list[0])

what


In [29]:
print(my_list[1])

a


In [30]:
print(my_list[-1]) # We can also count starting at the end. This is the way to get the final element of a list

[1, 2, 3]


In [31]:
print(my_list[-2]) 

987


We can access multiple elements at a time by *slicing*. This usually involves the `:` operator. 

In [32]:
one_and_two = my_list[:2]  # == [my_list[0], my_list[1]] Slicing from the beginning up to and including index 1
one_and_two

['what', 'a']

In [33]:
start_at_two = my_list[2:]  # Slice from index 2 to the end
start_at_two

['great', 'object', 'the', 'list', 'is', 987, [1, 2, 3]]

We can extract elements N at a time too:

In [34]:
every_other = my_list[::2] # Start at the beginning, go to the end, grab every second element
every_other

['what', 'great', 'the', 'is', [1, 2, 3]]
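
All of the slices above are special cases of the general form `my_list[start:stop:step]`, where any of the three parts can be left out. Here is a short sketch on a list of integers (chosen so that each element equals its own index, which makes the results easy to read).

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

middle = numbers[2:5]             # [2, 3, 4]: start at index 2, stop before index 5
odds = numbers[1::2]              # [1, 3, 5, 7, 9]: start at index 1, take every second element
last_three = numbers[-3:]         # [7, 8, 9]: negative indices count from the end
reversed_numbers = numbers[::-1]  # A step of -1 walks the list backwards

print(middle)
print(odds)
print(last_three)
print(reversed_numbers)
```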

Recall that lists are mutable, meaning we can change what's in them:

In [35]:
my_list.append(3)  # Add something to the end
my_list

['what', 'a', 'great', 'object', 'the', 'list', 'is', 987, [1, 2, 3], 3]

In [36]:
my_list[1] = 4  # Modify the element at index 1
my_list

['what', 4, 'great', 'object', 'the', 'list', 'is', 987, [1, 2, 3], 3]
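
One consequence of mutability that trips people up: assigning a list to a second variable does not copy it. Both names refer to the same list, so a change made through one name is visible through the other. The sketch below (with made-up variable names) shows this, along with `copy` for when you do want an independent list and `pop` for removing elements.

```python
original = [1, 2, 3]
alias = original          # Same list object under a second name
copied = original.copy()  # A new list with the same elements

original.append(4)
removed = original.pop(0)  # Remove and return the element at index 0

print(original)  # [2, 3, 4]
print(alias)     # [2, 3, 4]: the alias sees every change
print(copied)    # [1, 2, 3]: the copy does not
print(removed)   # 1
```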

## Tuples
A tuple is an immutable list: it is an ordered collection, the elements of which cannot be changed once it is initialized. You can initialize one with parentheses `()`, with items separated by a comma. 

In [37]:
my_tuple = (1, 2, "a", "b")
my_tuple

(1, 2, 'a', 'b')

Slicing works the same:

In [38]:
my_tuple[:2]

(1, 2)

But don't try to change anything in a tuple!

In [39]:
# my_tuple[0] = 4  # Throws a TypeError
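
If you want to see the error without stopping the notebook, you can wrap the assignment in a `try/except` block, as in this minimal sketch. It also shows the usual workaround, which is to build a new tuple rather than modify the old one.

```python
my_tuple = (1, 2, "a", "b")

try:
    my_tuple[0] = 4
    message = "assignment worked"
except TypeError as err:
    message = str(err)

# Tuples cannot be modified, but you can always build a new one
new_tuple = (4,) + my_tuple[1:]

print(message)
print(my_tuple)
print(new_tuple)
```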

**Discussion: At first blush, immutability seems kind of annoying. Can you think of a situation where it's useful?**

## Dictionaries
Dictionaries are one of the most useful Python data types. If you know any Java, they are basically Python's version of the Java `HashMap`, and they look a lot like a JSON object. If you don't know any Java, don't worry -- I don't really either.

You can think of dictionaries as a list where each element is not just a single value, but a key-value pair. Instead of integer indices (like the List type), you access values by passing in the corresponding key. Dictionaries are initialized with curly braces `{}`:

In [40]:
sample_dict = {"key_1": 42, "key_2": 57}
value_of_key_1 = sample_dict["key_1"]
print(f"The value of key_1 is {value_of_key_1}")


The value of key_1 is 42


Keys don't have to be strings; they can be integers (or any other hashable type) too. It's best to keep them all the same type, though this isn't enforced.

In [41]:
a = {"one": 1, 0: 0}
print(a)
print(a["one"])
print(a[0])

{'one': 1, 0: 0}
1
0


Dictionaries are particularly useful for storing numerical values along with more easily-remembered descriptions for those numerical values. For example, we could have a dictionary that stores the number of credits associated with different classes in the DATA track:

In [42]:
credit_count = {"DATA5100": 3, "DATA3310": 5, "DATA5111": 3}
print(credit_count["DATA5100"])

3
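
Dictionaries are mutable, like lists, and come with a few methods that you will use constantly. This is a short sketch building on the credit example; the added course `DATA5200` and the lookup of `DATA9999` are hypothetical, included only to show the syntax.

```python
credit_count = {"DATA5100": 3, "DATA3310": 5, "DATA5111": 3}

credit_count["DATA5200"] = 4              # Add a new key-value pair (hypothetical course)
has_3310 = "DATA3310" in credit_count     # Check whether a key exists
missing = credit_count.get("DATA9999", 0) # get returns a default instead of raising KeyError
total_credits = sum(credit_count.values())

print(has_3310)
print(missing)
print(total_credits)
```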


The most common example is probably the Contacts app on your phone: it maps people's names (key) to their phone number (value). But you can imagine extending this to things like prices for different items in your store, the errors from an ML model training parameter sweep, purchase history associated with different customers in your database, etc.

It's often helpful to *nest* dictionaries, meaning to have sub-dictionaries as the values of higher-level keys:

In [43]:
nested_dict = {
    "level_1_key": {
        "level_2_key": {
            "level_3_key": "finally, a value that isn't another dictionary"
            }
        }
    }
value = nested_dict["level_1_key"]["level_2_key"]["level_3_key"]  # Can extract with multiple keys at once
print(nested_dict)
print(value)

{'level_1_key': {'level_2_key': {'level_3_key': "finally, a value that isn't another dictionary"}}}
finally, a value that isn't another dictionary
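
Nested dictionaries can be modified level by level as well. In this sketch (the keys `new_key` and `not_a_key` are made up), pulling out an inner dictionary gives you a reference to the same object, so changing it changes the outer dictionary too. Chaining `get` with an empty-dict default is one way to look up a nested key that might not exist.

```python
nested_dict = {"level_1_key": {"level_2_key": {"level_3_key": "value"}}}

inner = nested_dict["level_1_key"]["level_2_key"]
inner["new_key"] = 123  # Also visible through nested_dict, since inner is the same object

# Safe lookup: returns None rather than raising KeyError if a level is missing
missing = nested_dict.get("not_a_key", {}).get("level_2_key")

print(nested_dict)
print(missing)
```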


## Breakout: Collections

# Flow Control
Flow control refers to the logic that dictates what code gets run, and when. In other words, the logical flow of all the code in a script or a notebook cell. We can control that logical flow using a combination of Logical Operators and Looping Operators. 

## Logical Operators
Logical operators implement Boolean logic, i.e., handling when things are either True or False 

Python has the Boolean values `True` and `False` as reserved keywords. You can use them as-is, or assign them to variables:

In [44]:
a = True
b = False
c = False
print(a)
print(True)
print(c)

True
True
False


We can string different Boolean variables together using other logical operators: `and`, `or`, `not` 

In [45]:
# or gives True if one or the other is True
print(a or b)

True


In [46]:
# and only gives True if both are True
print(a and b)

False


In [47]:
# not will negate:
print(not a)

False


In [48]:
# can get complicated!
print((not b and a) or (b and c))

True
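
The parentheses in the last cell are there for readability. Without them, Python evaluates `not` first, then `and`, then `or`, which happens to give the same grouping. The sketch below checks that, and shows that a different grouping gives a different answer.

```python
a, b, c = True, False, False

without_parens = not b and a or b and c
with_parens = ((not b) and a) or (b and c)  # How Python groups the line above
regrouped = not (b and a or b) and c        # Different grouping, different result

print(without_parens)
print(with_parens)
print(regrouped)
```

When in doubt, add the parentheses. They cost nothing and save the reader from having to remember the precedence rules.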


These logical statements are usually used to control flow within `if/elif/else` blocks. A block starts with the `if` keyword, followed by a logical statement, followed by a `:`. The code on the next line, which must be indented, will only get run if the preceding statement evaluated to `True`.

In [49]:
# If/elif/else
if a:
  print("Must have been true")

if b:
  print("This will never get printed")
elif c:
  print("Nor will this")
else:
  print("There we go")

Must have been true
There we go


We can do logical operations with numbers too. Here we compare magnitudes with `>`, `<`, `==`, and `<=`.

In [50]:
# Logical operations on numbers
print(f"5 > 4 is {5 > 4}")  # Greater than
print(f"3 < 4 is {3 < 4}")  # Less than
print(f"1 == 2 is {1 == 2}")  # Equal to
print(f"2 <= 2 is {2 <= 2}")  # Less than or equal to

5 > 4 is True
3 < 4 is True
1 == 2 is False
2 <= 2 is True
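
A few more comparison tools round out the set. This sketch (with illustrative variable names) shows `!=` and `>=`, chained comparisons, and one caveat: comparing floats with `==` can surprise you because of rounding, so `math.isclose` from the standard library is usually the safer choice.

```python
import math

x = 7

not_equal = x != 8    # Not equal to
at_least = x >= 7     # Greater than or equal to
in_range = 1 < x < 10 # Comparisons can be chained

naive_equal = (0.1 + 0.2 == 0.3)             # False, because of floating point rounding
close_enough = math.isclose(0.1 + 0.2, 0.3)  # True

print(not_equal, at_least, in_range)
print(naive_equal, close_enough)
```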


Python treats any nonzero number as `True` when it is used as a condition:

In [51]:
if 0:
    print("Zero is True?")

if 467:
    print("No, everything else is True")

No, everything else is True


We can do logical operations on collections too, often using the built-in functions `all` and `any`.

In [52]:
all_true_list = [1, 2, 3, 4]
none_true_list = [0, 0, False]

print(f"all_true_list is all {all(all_true_list)}")
print(f"and it also satisfies the any condition: {any(all_true_list)}")
print(f"but none_true_list returns {any(none_true_list)} because all elements are False")

all_true_list is all True
and it also satisfies the any condition: True
but none_true_list returns False because all elements are False


Similar to the `0` rule, empty collections are defined as `False` by default:

In [53]:
if not []:
    print("Empty lists are False")

if not ():
    print("And empty tuples also")

if not {}:
    print("So are empty dicts")


Empty lists are False
And empty tuples also
So are empty dicts
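
If you are ever unsure how Python will treat a value in an `if` statement, you can ask directly with the built-in `bool` function. The sketch below covers a few cases that commonly cause confusion; the `name` variable stands in for some hypothetical user input.

```python
print(bool(0))     # False
print(bool(""))    # False: an empty string behaves like an empty collection
print(bool(None))  # False
print(bool("0"))   # True: a non-empty string, even though it contains a zero
print(bool([0]))   # True: a non-empty list, even though its only element is falsy

name = ""  # Hypothetical user input that was left blank
if not name:
    name = "anonymous"

print(name)
```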


## For Loops
A for-loop is a way to run the same portion of code some predetermined number of times. The structure is shown below, where we use the `range` function to generate an "iterable" of the integers 0 through 9 (it starts at 0 and stops just before the argument).

In [54]:
for i in range(10):
  print(i)

0
1
2
3
4
5
6
7
8
9


We can also loop through collections like lists, tuples, or dicts. 

In [55]:
a = [10, 9, 8, 7]
for index, elem in enumerate(a): # Wrapping the list in enumerate() makes it return both the element and the corresponding index
  print(index, elem)

0 10
1 9
2 8
3 7
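
Looping over a dictionary works similarly. In this sketch, which reuses the credit-count example from earlier, the `items` method hands back each key together with its value. The second loop shows that `range` also accepts a start, a stop, and a step.

```python
credit_count = {"DATA5100": 3, "DATA3310": 5, "DATA5111": 3}

total = 0
for course, n_credits in credit_count.items():  # items() yields (key, value) pairs
    print(course, n_credits)
    total += n_credits

evens = []
for i in range(0, 10, 2):  # Start at 0, stop before 10, step by 2
    evens.append(i)

print(total)
print(evens)
```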


Python also has a nice shortcut for-loop construct called a list comprehension.

In [56]:
b = [i * 2 for i in a]
print(b)

[20, 18, 16, 14]
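
To see what the comprehension is a shortcut for, here is the same computation written as an ordinary loop, followed by a comprehension with an `if` filter on the end. The variable names are illustrative.

```python
a = [10, 9, 8, 7]

# The long way: start with an empty list and append inside a loop
b_loop = []
for i in a:
    b_loop.append(i * 2)

# The comprehension does the same thing in one line
b_comprehension = [i * 2 for i in a]

# A trailing if keeps only the elements that pass the test (here, even numbers)
evens_doubled = [i * 2 for i in a if i % 2 == 0]

print(b_loop)
print(b_comprehension)
print(evens_doubled)
```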


## While Loops
Sometimes you don't know how many times you want to repeat a section of code, but you do know that you want it to keep running until some condition is met. This is common in numerical analysis when you want to iteratively solve a set of equations within some error tolerance. 

In [57]:
# Keep looping while the condition is True (i.e., until it becomes False)
a = 100
while a > 50:
  print(a)
  a -= 10  # Equivalent to a = a - 10

100
90
80
70
60
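
As a sketch of the numerical-analysis use case mentioned above, the loop below approximates the square root of 2 with Newton's method, stopping once the answer is within a tolerance. The tolerance, the starting guess, and the `max_iterations` safety cap are all choices made for this illustration. The cap matters because a `while` loop whose condition never becomes `False` will run forever.

```python
target = 2.0
tolerance = 1e-10
max_iterations = 100  # Safety cap so the loop cannot run forever

guess = 1.0
iterations = 0
while abs(guess * guess - target) > tolerance and iterations < max_iterations:
    guess = 0.5 * (guess + target / guess)  # Newton's update for x**2 - target = 0
    iterations += 1

print(guess)
print(iterations)
```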


# Functions and Classes 

## Functions
Throughout this notebook we have relied on various built-in functions, e.g., `print`, `any`, `all`, `enumerate`. But as with any good programming language, Python allows you to create your own functions. In general, if you are going to run a section of code more than once, it is good to define a function so that the logic can be reused, especially if it's a long or complicated operation you are trying to do.

For now, we'll just do an easy example by writing a function that squares every element of an input. You wouldn't need to write such a simple function yourself, but this will demonstrate good coding practice by including error handling and documentation. 

In [64]:
# Use the def keyword to initialize a function block
def square_the_input(x):
    """
    Inputs
    x: integer, float, or list to be squared

    Output
    y: square of the input
    """
    if isinstance(x, (int, float)):
        y = x ** 2
    elif isinstance(x, list):
        y = [elem ** 2 for elem in x]
    else:
        raise TypeError(f"Input x must be one of: int, float, or list, not {type(x)}")

    return y
    
some_list = [1, 2, 3, 4]
some_list_squared  = square_the_input(some_list)

print(some_list)
print(some_list_squared)

[1, 2, 3, 4]
[1, 4, 9, 16]


In [65]:
# Trying with a dictionary
some_dict = {0: "a", 1: "b"}
square_the_input(some_dict)

TypeError: Input x must be one of: int, float, or list, not <class 'dict'>
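
An uncaught error like the one above stops the cell. If you would rather handle it and carry on, you can wrap the call in a `try/except` block. The sketch below uses a condensed restatement of `square_the_input` so that it runs on its own, and also checks the scalar branches that were not exercised above.

```python
def square_the_input(x):
    """Square an int, a float, or every element of a list (condensed version)."""
    if isinstance(x, (int, float)):
        return x ** 2
    if isinstance(x, list):
        return [elem ** 2 for elem in x]
    raise TypeError(f"Input x must be one of: int, float, or list, not {type(x)}")

scalar_result = square_the_input(3)
float_result = square_the_input(1.5)

try:
    square_the_input({0: "a", 1: "b"})
    caught = None
except TypeError as err:
    caught = str(err)  # Keep the message and continue instead of crashing

print(scalar_result)
print(float_result)
print(caught)
```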

## Classes
Python Classes are an example of object-oriented programming: a paradigm where related bits of code and functionality are grouped together into objects, which can be initialized and used more intuitively than a bunch of disparate functions scattered throughout your code base. You won't have to define many of your own classes in this course, but many packages we use (`scikit-learn` and `pandas` in particular) rely heavily on their own custom Class objects, so it is worth understanding the building blocks. Let's define a simple custom class to demonstrate their various attributes.

In [88]:
class Dog:  # By convention, class names are capitalized (CapWords style)

    """
    This __init__ function defines how you initialize an instance of the Dog class. By convention, the first argument is always
    a sorta dummy variable called "self", which refers to the instance of the class you are initializing. The other arguments
    are things that you can optionally specify when you initialize an instance of the class in order to define its particular
    characteristics.
    """
    def __init__(self, name="Lassie", breed="Rough Collie", owner="Miller Family", friendly=True):
        # These characteristics attached to the class are called "attributes"
        self.name = name
        self.breed = breed
        self.owner = owner  # Store the owner too, so the argument is not silently discarded
        self.friendly = friendly

    # Functions associated with a class are called "methods", and they can be accessed from any instance of the class
    # We need to pass in self as an argument, which again refers to an instance of the class
    def is_friendly(self):
        return self.friendly

    def name_and_breed(self):
        print(f"{self.name}, {self.breed}")

Let's initialize an instance of the default dog class, and see how to work with it:

In [81]:
default_dog = Dog()

We can access the attributes by doing `class_instance.attribute`.

In [82]:
print(default_dog.name)
print(default_dog.breed)
print(default_dog.friendly)

Lassie
Rough Collie
True


And we can call any of the methods associated with the class:

In [83]:
default_dog.name_and_breed()

Lassie, Rough Collie


The real power of Classes, though, is that you can use them as a framework to initialize your own object with its own properties, and call all the same methods on that object. Here is an example of how I would initialize the `Dog` class for one of my dogs:

In [89]:
my_dog = Dog(name="Mandy", breed="Great Pyrenees", friendly=False, owner="Galen Egan")

In [94]:
print(my_dog.friendly)

False


In [92]:
my_dog.name_and_breed()

Mandy, Great Pyrenees
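
One last point worth seeing in code: each instance carries its own copy of the attributes, so changing one dog does not affect another. The sketch below uses a condensed version of the `Dog` class (with the `owner` argument left out for brevity) so that it runs on its own. The attribute update at the end is purely hypothetical.

```python
class Dog:
    def __init__(self, name="Lassie", breed="Rough Collie", friendly=True):
        self.name = name
        self.breed = breed
        self.friendly = friendly

    def is_friendly(self):
        return self.friendly

default_dog = Dog()
my_dog = Dog(name="Mandy", breed="Great Pyrenees", friendly=False)

# Attributes can be reassigned after initialization (hypothetical update)
my_dog.friendly = True

print(my_dog.is_friendly())       # True: my_dog was updated
print(default_dog.is_friendly())  # True: default_dog was never touched
print(default_dog.name, my_dog.name)
```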
