# Introduction to Python and Jupyter Notebooks

# A bit about Python

1) Python is an interpreted language. This means it does not require a compiler to run (though some do exist). Some of the consequences of this are:
> - Python generally runs slower than compiled languages like C++ and Java.
> - Type checking is done at run time. This means that errors with using the wrong type will not occur until the code is run. These type errors can be found in testing, but if the code is rarely run (like for rare events), these errors can remain silent for long periods.

2) Python, like most languages, starts indexing at 0. Excel and MATLAB start at 1. Keeping this in mind is important for selecting the correct value and avoiding off-by-one errors.

3) Python passes variables by object reference. This means that when you assign existing data to a new variable, a reference to the same data is passed rather than a copy of the data, so a change made to a mutable object through one name is visible through the other. Additional reading: https://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/

In [16]:
from copy import copy, deepcopy

x = [1, 2, 3, 4]
print("x is currently: ", x)
y = x
print("y is currently: ", y)
z = copy(x)
print("z is currently: ", z)


x is currently:  [1, 2, 3, 4]
y is currently:  [1, 2, 3, 4]
z is currently:  [1, 2, 3, 4]


We will now append the number 1 to y and change x[0] to 7

In [17]:
y.append(1)
x[0] = 7
print("x is currently: ", x)
print("y is currently: ", y)
print("z is currently: ", z)


x is currently:  [7, 2, 3, 4, 1]
y is currently:  [7, 2, 3, 4, 1]
z is currently:  [1, 2, 3, 4]


We will now append the number 2 to z and change z[3] to 42

In [18]:
z.append(2)
z[3] = 42
print("x is currently: ", x)
print("y is currently: ", y)
print("z is currently: ", z)


x is currently:  [7, 2, 3, 4, 1]
y is currently:  [7, 2, 3, 4, 1]
z is currently:  [1, 2, 3, 42, 2]
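The cells above import deepcopy but only use copy. As a minimal sketch of the difference (the variable names `nested`, `shallow`, and `deep` are made up for this illustration): copy makes a new outer list but shares any inner lists, while deepcopy also copies the inner lists.

```python
from copy import copy, deepcopy

nested = [[1, 2], [3, 4]]
shallow = copy(nested)   # new outer list, but the inner lists are shared
deep = deepcopy(nested)  # new outer list and new inner lists

# change a value inside one of the inner lists of the original
nested[0][0] = 99

print("nested is currently: ", nested)
print("shallow is currently: ", shallow)
print("deep is currently: ", deep)
```

Only the deep copy should keep the original inner values, because the shallow copy still refers to the same inner lists as `nested`.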


4) Python is object oriented. Nearly everything is considered an object in Python, so nearly everything has a class, attributes, and methods.
> - An example can be seen in the .append(1) in one of the previous cells. This is a method call. So even one of the basic data types is an object.

5) Python uses whitespace (indentation, conventionally four spaces) to define control blocks. The bodies of for/while loops, if statements, functions, and classes are all indented relative to the line that introduces them.

6) Python is case sensitive. x is not the same as X.
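To illustrate the run-time type checking mentioned in point 1, here is a minimal sketch. The function `describe_reading` and its threshold are hypothetical and only serve the example: the type error sits in a branch that runs only for a rare input, so nothing complains until that branch is reached.

```python
def describe_reading(value):
    """Return a short description of a reading (hypothetical example)."""
    if value < 1000:
        return "normal reading: " + str(value)
    # Bug: adding a str and an int raises TypeError,
    # but only when this rarely used branch actually runs.
    return "extreme reading: " + value


print(describe_reading(12))  # the common case works

try:
    describe_reading(5000)  # the rare case fails only at run time
except TypeError as error:
    print("TypeError:", error)
```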

# Importing libraries
There are a few options to import libraries.

1) Import the entire library as is

2) Import a portion of the library

3) Use either of the above methods with an alias

In [19]:
import numpy  # this imports the entire library
from math import (
    sin,
)  # this imports only sin, the rest of the library remains unimported
from math import (
    cos as Adj_over_Hyp,
)  # this imports only cos and renames its reference to Adj_over_Hyp


In [20]:
angle = numpy.array([1, 2, 3, 4])
# note that we must use the library name followed by . then the part of the library we would like to use
print(angle, "\n")

sin_of_angle = [sin(i) for i in angle]  # note that we use sin directly, without the library name
print(sin_of_angle, "\n")

cos_of_angle = [Adj_over_Hyp(i) for i in angle]
print(cos_of_angle)


[1 2 3 4] 

[0.8414709848078965, 0.9092974268256817, 0.1411200080598672, -0.7568024953079283] 

[0.5403023058681398, -0.4161468365471424, -0.9899924966004454, -0.6536436208636119]


# Downloading Packages
One benefit of Anaconda is that it already includes most of the packages you are likely to use. When you need a new package, I suggest installing it from the Anaconda Prompt rather than from within Jupyter Notebook. Here is a resource on how to install a package via the Anaconda Prompt: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html#:~:text=Use%20the%20terminal%20or%20an%20Anaconda%20Prompt%20for,myenv%2C%20the%20package%20installs%20into%20the%20current%20environment%3A

# Data Types
There are 8 major data types that you will usually use:

1) **Boolean (bool):** True/False. When other values are converted to bool, 0 becomes False and any other number becomes True.

2) **integer (int):** mathematical integer (ex. 1, 2, 3, 4, ...)

3) **float (float):** integer and non-integer real numbers (ex. 1., 1.0, 3.1415, etc.)

4) **string (str):** a string of letters, numbers, special characters, etc. Strings are surrounded by apostrophes (') or quotation marks (").

**A note about ordered/unordered:** for a structure that is ordered, you can expect a consistent ordering of the variables within. This is not true for an unordered structure.

5) **set {set}:** an _unordered_ series of items. Each item must be unique and immutable (it cannot be changed after it is added to the set). The set itself is mutable (items can be added or removed) and is useful for some operations. https://www.programiz.com/python-programming/set

6) **list [list]:** an _ordered_ series of items. The items need not be unique and are mutable.

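7) **dictionary {dict}:** a series of key: value pairs. Since Python 3.7, dictionaries keep the order in which items were inserted. Keys must be immutable (hashable) values such as strings, numbers, or tuples; values can be (nearly) anything. You can get a value from a dictionary by asking for its key (but not vice versa).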

8) **None/NaN:** Placeholder values that have no meaning and cannot be used for processing other than for testing for their existence (is this value a None/NaN?). None is general while NaN (or more specifically, float('nan')) is used for float values. Integer values have no NaN type. There are other specialized None types (ex. NaT - not a time).

In [21]:
# Boolean
a = True
b = False

print(bool(1))

print(bool(-3))

print(bool(0))

print(bool([]))

print(bool([0]))


True
True
False
False
True


In [22]:
# Boolean
a = [0, 0.0, 1, 2, 3.0, False, True]

b = [True if i == True else False for i in a]
print(b, "\n")

c = [True if i else False for i in a]
print(c)


[False, False, True, False, False, False, True] 

[False, False, True, True, True, False, True]


In [23]:
import math

# integers
a = 4
b = 7

# regular division
print(b / a)

# integer division
print(b // a)
# also works with floats, but returns a float
print(3.7 // 1.3)

# modulus (gives the remainder of division as an int)
print(b % a)

print(round(b / a))
print(math.ceil(b / a))
print(math.floor(b / a))


1.75
1
2.0
3
2
2
1


In [24]:
# Set
a = {1, 2, 3, 4}
b = set([5, 6, 7, 8])

# two options to combine these sets (take their union):
print("Union: ", a | b)
# or
a.update(b)  # note, this changes a
print(a, "\n")

# can check if an element is in a set using in or if statement:
print(0 in a)
if 1 in a:
    print(True)


Union:  {1, 2, 3, 4, 5, 6, 7, 8}
{1, 2, 3, 4, 5, 6, 7, 8} 

False
True


In [25]:
# List
a = [1, 2, 3, 4]
b = list({1, 2, 3, 4})
print(b)
a += b

print(a)

c = [[1, 2, 3], {4, 5, 6}, "this is a string"]
print(c)
print(c[0][0])


[1, 2, 3, 4]
[1, 2, 3, 4, 1, 2, 3, 4]
[[1, 2, 3], {4, 5, 6}, 'this is a string']
1


In [26]:
print(type(c))
print(type(c[1]))


<class 'list'>
<class 'set'>


In [27]:
# A note on the differences between list() and []
x = "abc def"

print(list(x))
print([x])


['a', 'b', 'c', ' ', 'd', 'e', 'f']
['abc def']


In [28]:
y = [1, 2, 3, 4]

print(list(y))
print([y])


[1, 2, 3, 4]
[[1, 2, 3, 4]]


In [29]:
# Dictionary
a = {1: "This", "2": "is", "three": ["the", "number"], 4.67: 4}
print(a, "\n")
print(a[1])
print(a[4.67], "\n")

b = a.keys()
print(b)
print(a.values())
print(a.items())


{1: 'This', '2': 'is', 'three': ['the', 'number'], 4.67: 4} 

This
4 

dict_keys([1, '2', 'three', 4.67])
dict_values(['This', 'is', ['the', 'number'], 4])
dict_items([(1, 'This'), ('2', 'is'), ('three', ['the', 'number']), (4.67, 4)])


In [30]:
import numpy as np

# None/NaN
a = None
b = float("nan")
c = np.nan
print(bool(a))
print(list(a))

a += 1


False


TypeError: 'NoneType' object is not iterable
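Because None and NaN cannot be used in calculations, the usual approach is to test for them first. A small sketch using only the standard library (the variable names mirror the cell above):

```python
import math

a = None
b = float("nan")

print(a is None)      # test for None with "is"
print(math.isnan(b))  # test for NaN with math.isnan
print(b == b)         # NaN is not equal to anything, including itself
```

The last line is why `==` cannot be used to look for NaN values.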

# Indexing
## Lists, numpy arrays, etc.
We can index into an element of a list (sets are unordered and cannot be indexed) by:

x[index]

If we want the values for a range we use a ':'

x[:index] -> this will give us the elements from the start of the list up to, but not including, the element at index

x[index:] -> this will give the index element and all elements after.

x[index_1:index_2] -> this will give us the elements starting with the first index up to, but not including, the second index

### What if we want to start from the end of the list?
Python has decided to use negative values to indicate starting from the end of the list.

x[-1] -> This is the last element of the list

x[:-2] -> This will give us all elements from the start of the list up to, but not including, the second to last item

x[-2:] -> This will give us the second to last element and all elements after.
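The indexing and slicing forms above can be tried on a short list. The list `x` below is made up for illustration.

```python
x = [10, 20, 30, 40, 50]

print(x[1])    # single element -> 20
print(x[:2])   # up to, but not including, index 2 -> [10, 20]
print(x[2:])   # index 2 and everything after -> [30, 40, 50]
print(x[1:3])  # index 1 up to, but not including, index 3 -> [20, 30]
print(x[-1])   # last element -> 50
print(x[:-2])  # everything except the last two -> [10, 20, 30]
print(x[-2:])  # the last two elements -> [40, 50]
```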

## Dictionaries
We index a dictionary by its keys:

x[key]

### How would we index an item of a list that is within a list? How about an item in a list within a dictionary?

In [32]:
import numpy as np

x = [[1, 2, 3], [4, 5, 6]]
y = {"i": [1, 2, 3, 4], "j": [5, 6, 7, 8], "k": {"a": 9, "b": 10, "c": 11}}
z = np.array(x)

a = z[0][1]
print(a)


2
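The cell above indexes the numpy array. As a sketch of the other cases in the question, indexes are chained from the outside in, reusing the same `x` and `y`:

```python
x = [[1, 2, 3], [4, 5, 6]]
y = {"i": [1, 2, 3, 4], "j": [5, 6, 7, 8], "k": {"a": 9, "b": 10, "c": 11}}

print(x[1][2])      # list within a list: second list, third item -> 6
print(y["j"][0])    # list within a dictionary: key "j", first item -> 5
print(y["k"]["b"])  # dictionary within a dictionary -> 10
```

For numpy arrays, `z[0][1]` can also be written as `z[0, 1]`.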


# Date, Time, Datetime, and Timedelta
Derived data types. As the names suggest, date deals only with dates, time only with times, and datetime with date and time together. Timedelta is a difference in time.

These can have different formats depending on the package used, and these different formats cannot always be used together.

In [33]:
from datetime import date, time, datetime, timedelta


In [34]:
DATE = date(2000, 1, 1)
print(DATE)

TIME = time(18, 55, 31)
print(TIME)

DATETIME = datetime(2000, 1, 1, 18, 55, 31)
print(DATETIME)

TIMEDELTA = timedelta(days=1, minutes=59)
print(TIMEDELTA)


2000-01-01
18:55:31
2000-01-01 18:55:31
1 day, 0:59:00


In [35]:
# Timedelta can be used to add or subtract from datetimes

new_DATETIME = DATETIME - TIMEDELTA
print(new_DATETIME)


1999-12-31 17:56:31


In [36]:
# different formats cannot always be added or subtracted
DATE + DATETIME

DATE + TIME


TypeError: unsupported operand type(s) for +: 'datetime.date' and 'datetime.datetime'
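A date and a time can still be joined into a single datetime. A minimal sketch using `datetime.combine` from the standard library, with the same values as the cells above (the name `combined` is made up for this example):

```python
from datetime import date, time, datetime

DATE = date(2000, 1, 1)
TIME = time(18, 55, 31)

# build one datetime from a separate date and time
combined = datetime.combine(DATE, TIME)
print(combined)
print(type(combined))
```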

In [37]:
numpy_dt = np.datetime64("2005-02-25T03:30")
print(numpy_dt)


2005-02-25T03:30


In [38]:
# numpy datetimes and python datetimes cannot be combined directly
numpy_dt + DATETIME


UFuncTypeError: ufunc 'add' cannot use operands with types dtype('<M8[m]') and dtype('O')

In [39]:
import pandas as pd


In [40]:
pd_dt = pd.Series(numpy_dt)
pd_dt = pd_dt[0].to_pydatetime()
print(pd_dt)

difference = pd_dt - DATETIME
print(difference)
print(type(difference))


2005-02-25 03:30:00
1881 days, 8:34:29
<class 'datetime.timedelta'>


# Comparisons
## If statements
Uses the format:

if condition 1:
> body

elif condition 2: (not required)
> body

else: (not required)
> body

## Comparators and intersect/union
>- **==**: equal to. Avoid using it on computed float values, because rounding error can make two values that should be equal differ slightly.
>- **!=**: not equal to.
>- **>, >=, <, <=**: greater than, greater or equal, less than, less or equal
>- **in**: True if the value is in an array-like object, otherwise False
>- **not in**: negation of in
>- **and**: logical and (True only if both conditions are true)
>- **&**: intersection for sets and element-wise and for numpy arrays; not a general substitute for and
>- **or**: logical or (True if either condition is true)
>- **|**: union for sets and element-wise or for numpy arrays; not a general substitute for or

>- **any()**: True if any of the items in an array-like structure are true
>- **all()**: True if all of the items in an array-like structure are true

>- **not**: negation
>- **~**: element-wise negation, useful for boolean numpy arrays
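A short sketch of two points from the list above, using made-up values. It shows why == is risky with floats (`math.isclose` from the standard library is one common alternative) and how and/or differ from &/| when applied to sets.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)              # False because of floating point rounding
print(math.isclose(total, 0.3))  # True

a = {1, 2, 3}
b = {3, 4}
print(a & b)  # set intersection -> {3}
print(a | b)  # set union -> {1, 2, 3, 4}

print(True and False)  # logical and -> False
print(True or False)   # logical or -> True
```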

### How do we check if there are no true values in a list?

In [41]:
x = [True, True, True, True]
y = [False, False, False, False]
z = [True, False, False, True]

a = [0, 1, 2, 3, 4]
print(all(a))
# print(5 in a)

# if not any(y):
#     print(True)
# else:
#     print(False)


False
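One way to answer the question directly is to negate any(). A minimal sketch using the same three lists:

```python
x = [True, True, True, True]
y = [False, False, False, False]
z = [True, False, False, True]

# "not any(values)" is True only when no item in the list is true
for name, values in [("x", x), ("y", y), ("z", z)]:
    print(name, "has no true values:", not any(values))
```

Only `y` should report True here.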


### How do we check if an element exists?

In [42]:
x = True
y = 0
z = None

if x:
    print(True)
else:
    print(False)


True
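Note that `if x:` tests truthiness rather than existence, so a legitimate value such as 0 is treated like a missing one. A sketch of the difference, assuming "exists" means "is not None":

```python
x = True
y = 0
z = None

for name, value in [("x", x), ("y", y), ("z", z)]:
    # bool(value) is what "if value:" checks; "is not None" checks for a real value
    print(name, "| truthy:", bool(value), "| is not None:", value is not None)
```

Here `y` is falsy but still exists, while `z` fails both checks.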


# for/while loops
## for loops
Iterate a set number of times. This can be either a set number (ex. 4) or for each item in an array-like object. (ex. [42, 17, 19, 27, ...])

Format:

x = [42, 17, 19, 27, ...]

for i in range(len(x)):
> body

> use x[i] to index x

or

for i in x:
> body

> i is now the element of x

### range(...)
range(...) is how you'll loop a set number of times.
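range accepts a stop value, a start and stop value, or a start, stop, and step. As with slicing, the stop value is not included. A quick sketch:

```python
print(list(range(4)))         # stop only -> [0, 1, 2, 3]
print(list(range(2, 6)))      # start, stop -> [2, 3, 4, 5]
print(list(range(0, 10, 3)))  # start, stop, step -> [0, 3, 6, 9]
```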

## while loops
Continue to loop while a condition is true. Note that because the number of loops is not set in the while section, this is where you can most often run into infinite loops!!!

Format:

i = 0

while i < 7:
> body

> i += 1 # This is important

or

while True:
> body

> if *some_condition*:

>        break

### Given a list, how do we square every other element (the elements at even indices)?

In [43]:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for i in range(len(x)):
    if i % 2 == 0:
        x[i] = x[i] ** 2

print(x)


[1, 2, 9, 4, 25, 6, 49, 8, 81, 10]


### Given two lists, how do we multiply each element of the first list by the corresponding element of the second and put that into a third list?

In [44]:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# initialize a new list
z = []

# go through each element of both lists and multiply the values
for i in range(len(x)):
    # put the new value into the new list
    z.append(x[i] * y[i])

print(z)


[9, 16, 21, 24, 25, 24, 21, 16, 9, 0]
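An alternative sketch of the same task uses the built-in zip, which pairs up the elements of the two lists so that no index is needed. The loop variable names are made up for this example.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

z = []
# zip yields one pair (x_value, y_value) per position
for x_value, y_value in zip(x, y):
    z.append(x_value * y_value)

print(z)
```

Note that zip stops at the end of the shorter list, so lists of unequal length do not raise an error.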


### Given the function:
$x_{i+1} = x_{i} - 0.25 \cdot x_{i}$

How to write equations in markdowns: https://medium.com/analytics-vidhya/writing-math-equations-in-jupyter-notebook-a-naive-introduction-a5ce87b9a214

### How many iterations do we need for x to be less than 1?

In [45]:
x = 25.0

# both the live code and commented code work

# i = x
# count = 0
# while i > 1:
#     count += 1
#     i = i - 0.25 * i
# print(count)

i = x
count = 0
a = True
while a:
    i -= 0.25 * i
    count += 1
    if i < 1:
        a = False

print(count)


12


# Functions
Structure

def function_name(variables):
>- body
>- return variable(s)_to_return

## Scope
A function should only know what it is given. **NEVER** reference a variable that exists outside a function unless it was given as an input!!!!

Variables that are defined within the function only exist in the function.

## Variables
Parameters are listed in the following order:
1) variables without default: ex. x, y, sst

2) variables with defaults: ex. x=1, y=[2, 3, 4], sst={}
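List and dictionary defaults like the ones above are legal, but a mutable default is created once and shared between calls. A common workaround, sketched here with two hypothetical functions (`add_reading_shared` and `add_reading` are not part of this notebook), is to default to None and create the list inside the function.

```python
def add_reading_shared(value, readings=[]):
    # The default list is created once and reused on every call.
    readings.append(value)
    return readings


def add_reading(value, readings=None):
    # A new list is created on each call when none is supplied.
    if readings is None:
        readings = []
    readings.append(value)
    return readings


print(add_reading_shared(1))
print(add_reading_shared(2))  # still contains the 1 from the previous call
print(add_reading(1))
print(add_reading(2))         # starts from an empty list again
```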

### Example: sum two numbers. Assume one of the numbers has a usual (default) value.

In [46]:
def sum_two_numbers(number_1, number_2=2):
    x = number_1 + number_2
    return x


a = 1
b = 2
c = 3
print(sum_two_numbers(a, b))
print("\n")
print(sum_two_numbers(1, 7))
print(c)


3


8
3
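The calls above always supply both arguments, so the default is never used. A short sketch of relying on the default and of passing the second argument by name (the function is repeated so the block runs on its own):

```python
def sum_two_numbers(number_1, number_2=2):
    x = number_1 + number_2
    return x


print(sum_two_numbers(1))              # number_2 falls back to its default of 2
print(sum_two_numbers(1, number_2=7))  # the default is overridden by name
```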


## \*args, \*\*kwargs
\*args is used when a variable (unknown) number of non-named items will be supplied to a function.

\*\*kwargs is used when a variable (unknown) number of named items will be supplied to a function.

https://www.geeksforgeeks.org/args-kwargs-python/

In [47]:
def simple_args_example(*args):
    sum_of_numbers = 0
    for i in args:
        sum_of_numbers += i
    return sum_of_numbers


print(simple_args_example(1))
print("\n")
print(simple_args_example(1, 2, 3, 4, 5, 8, 28))
args = [1, 2, 3, 4, 5, 8, 28]


1


51


In [48]:
def simple_kwargs_example(**kwargs):
    print(kwargs)
    for key, value in kwargs.items():
        print(f"The key is {key} and the value is {value}")


simple_kwargs_example(Adam="A", Amelia="C")
print("\n")
simple_kwargs_example(Adam="A", Amy="A", Amelia="C", Andrew="D")


{'Adam': 'A', 'Amelia': 'C'}
The key is Adam and the value is A
The key is Amelia and the value is C


{'Adam': 'A', 'Amy': 'A', 'Amelia': 'C', 'Andrew': 'D'}
The key is Adam and the value is A
The key is Amy and the value is A
The key is Amelia and the value is C
The key is Andrew and the value is D
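The same symbols can be used when calling a function: \* unpacks a list into separate non-named items and \*\* unpacks a dictionary into named items. A minimal sketch, where `total_of` and `collect_grades` are hypothetical stand-ins for the two functions above:

```python
def total_of(*args):
    # args arrives as a tuple of the non-named items
    return sum(args)


def collect_grades(**kwargs):
    # kwargs arrives as a dictionary of the named items
    return kwargs


numbers = [1, 2, 3, 4, 5, 8, 28]
grades = {"Adam": "A", "Amelia": "C"}

print(total_of(*numbers))        # same as total_of(1, 2, 3, 4, 5, 8, 28)
print(collect_grades(**grades))  # same as collect_grades(Adam="A", Amelia="C")
```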


### Let's say we have a list of raw chlorophyll voltages which need to be converted. Write a function to do so.
The conversion equation is:

$\text{chlorophyll} = \text{scale factor} \times (\text{voltage} - \text{dark count})$

We also have a global correction factor which halves the chlorophyll value from the equation above. We want our function's default mode to make this correction, but we should be able to exclude the correction if necessary.

In [49]:
x = [0.1, 0.2, 0.3, 0.4]
scale_factor = 9
dark_count = 0.08


def chlorophyll_volts_to_mgL(
    voltages, scale_factor, dark_count, global_correction=True
):
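    """
    converts chlorophyll volts to ug/l

    parameters:
    voltages: (iterable) a list of voltages
    scale_factor: (float) multiplier applied to the dark-corrected voltage
    dark_count: (float) offset subtracted from each voltage
    global_correction: (bool) if True (default), halve the converted value

    output:
    converted_chl: (list) converted values, rounded to 3 decimal places
    """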
    converted_chl = []

    for volt in voltages:
        if global_correction is True:
            chl = round(scale_factor * (volt - dark_count) / 2.0, 3)
            converted_chl.append(chl)
        else:
            chl = round(scale_factor * (volt - dark_count), 3)
            converted_chl.append(chl)

    return converted_chl


print(chlorophyll_volts_to_mgL(x, scale_factor, dark_count))
print(chlorophyll_volts_to_mgL(x, scale_factor, dark_count, False))


[0.09, 0.54, 0.99, 1.44]
[0.18, 1.08, 1.98, 2.88]


# Lambda Functions, List Comprehension, Dictionary Comprehension
These are specialty structures that allow a combination of for loop(s) and if statements that can occur in one line.

## Lambda
https://realpython.com/python-lambda/#:~:text=%20In%20particular%2C%20a%20lambda%20function%20has%20the,It%20can%20be%20immediately%20invoked%20%28IIFE%29.%20More%20

Lambda format:

lambda variable(s): expression
>- ex. lambda x: x + 2

How to call:

In [50]:
a = lambda x: x + 2

x = a(2)
print(x)

# note this doesn't work
# y = a(1, 2, 3, 4)
# print(y)

# we would need a more complicated lambda function
# (I don't suggest using this, a function or list comprehension would be preferable)
b = lambda x: list(map(lambda n: n + 2, x))
y = b([1, 2, 3, 4])
print(y)


4
[3, 4, 5, 6]


## List comprehension
Within a set of brackets, you can use the **for** keyword to iterate over a set of values. You can simultaneously perform mathematical manipulations and/or select data using **if** statements.

https://www.programiz.com/python-programming/list-comprehension

In [51]:
a = [1, 2, 3, 4, 2]

# Same thing as the lambda function above, but applied to a list
y = [i + 2 for i in a]

# Equivalent for loop:
# y = []
# for i in a:
#     y.append(i + 2)
print(y)

# Adjust our addition depending on the value of the item.
# Note the format: the if/else (both required) statement comes before the for statement
y = [i + 2 if i < 3 else i + 1 for i in a]
print(y)

# Only select items which meet a criterion. Note that the if statement comes after the for statement and has no else.
# Also note here that the list size may not be the same as the original list.
z = [i for i in a if i < 3]
print(z)


[3, 4, 5, 6, 4]
[3, 4, 4, 5, 4]
[1, 2, 2]


## Dictionary comprehension
Similar style to list comprehension, but requires both key and value to be defined.

https://www.datacamp.com/community/tutorials/python-dictionary-comprehension

In [52]:
student_names = ["Adam", "Amy", "Amelia", "Andrew"]
grades = ["A", "A", "C", "D"]
is_class_complete = [True, True, False, True]

print(list(zip(student_names, grades)), "\n")

name_and_grade = {key: value for key, value in zip(student_names, grades)}
print(name_and_grade)

name_and_complete_grade = {
    k: v if complete == True else "incomplete"
    for k, v, complete in zip(student_names, grades, is_class_complete)
}
print(name_and_complete_grade)

# check once, before looping, that the lists are the same length
assert len(student_names) == len(grades) == len(is_class_complete)

y = {}
for student, grade, iscomplete in zip(student_names, grades, is_class_complete):
    if iscomplete:
        y[student] = grade
    else:
        y[student] = "incomplete"

print(y)


[('Adam', 'A'), ('Amy', 'A'), ('Amelia', 'C'), ('Andrew', 'D')] 

{'Adam': 'A', 'Amy': 'A', 'Amelia': 'C', 'Andrew': 'D'}
{'Adam': 'A', 'Amy': 'A', 'Amelia': 'incomplete', 'Andrew': 'D'}
{'Adam': 'A', 'Amy': 'A', 'Amelia': 'incomplete', 'Andrew': 'D'}


The following suggestions will make your life easier in the long run.

# Naming Conventions
1) **Be descriptive and explicit.** There is a big difference between *x* and *xco2_sw*. You are not likely to be the only one who reads your code. Even if you are, what happens when you come back after a few months? Having good names for your variables, functions, and classes will make it easier to understand what is going on in your own and others' code. Also, most IDEs have autofill, so long names do not take long to write.

2) **Be consistent with your naming conventions.**

3) **Replace spaces with underscores.** Variable and function names cannot contain spaces or hyphens ('-'), so underscores are the typical way to go.

4) **Capitalization.** Generally, only class names get to be capitalized. Everything else is lower case. (This is not required, but some IDEs will tell you when you're breaking convention)

# Function suggestions
If you find yourself writing the same piece of code multiple times, it's time to put it into a function. The reasoning here is simple. Imagine you've written a piece of code that occurs 6 times. If you want to change anything about that piece of code, you have to change it 6 times. Not only is this time consuming, but the likelihood that you will miss one is high.
1) **Limit the scope of the function.** A function should only do one thing! If you find your function is doing too much, break it up into smaller functions and precompute some variables or have internal function calls.

2) **Limit the length of functions.** A function should never be more than a screen length long. I try to not go over 40 lines of actual code (excluding comments and whitespace). I would prefer even shorter.

3) **Limit the number of input variables.** If you find yourself using more than about 4 variables, then your function is likely doing more than one thing. This is a little different for classes, which have internal methods.

4) **Do not allow side effects!** Side effects are when you permanently change an external variable when that is not the stated purpose of the function.
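As a sketch of suggestion 4, the two hypothetical functions below do the same arithmetic. The first changes the caller's list (a side effect), while the second leaves it alone and returns a new list.

```python
def add_offset_in_place(values, offset):
    # Side effect: modifies the list that was passed in.
    for i in range(len(values)):
        values[i] += offset
    return values


def add_offset(values, offset):
    # No side effect: builds and returns a new list.
    return [value + offset for value in values]


raw = [1, 2, 3]
corrected = add_offset(raw, 10)
print(raw, corrected)  # raw is unchanged

add_offset_in_place(raw, 10)
print(raw)             # raw has been changed
```

This happens because lists are passed by object reference, as described at the start of this notebook.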

# Other suggestions
1) **Use spaces liberally.** For example, it is easier to read *z = x + y* than *z=x+y*, especially when formulas get long.

2) **Keep lines of code relatively short.** A line should never go off the page...

3) **Try to prevent nesting.** Rewrite/break up code into functions if it starts to look like this:
>        for ...
>            if ...
>                if ...
>                    for ...

4) **Expect to rewrite code.** Even top programmers expect to toss their first rendition of code. I normally will write out the process to get the code working, then rewrite it into functional/class format.
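As a sketch of suggestion 3 (preventing nesting), moving the inner checks into a small helper function keeps the loop shallow. The helper `is_valid`, its rule, and the data are hypothetical and only illustrate the shape of the rewrite.

```python
def is_valid(value):
    """Hypothetical check: keep positive, even values."""
    return value > 0 and value % 2 == 0


def valid_values(rows):
    """Collect the valid values from a list of lists."""
    result = []
    for row in rows:
        # the two nested if statements now live inside is_valid
        result.extend(value for value in row if is_valid(value))
    return result


rows = [[1, 2, -4], [6, 7], [8]]
print(valid_values(rows))
```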