## Introduction

A lil tutorial/overview/review of some basic computer science techniques and topics to help you with making and organizing scripts!

Topics in order include (to be updated as the notebook evolves):

* Data Structures (tuples vs. lists vs. dictionaries vs. NumPy arrays)
* Loops
	* With Arrays/Lists
	* Guidelines for Good Loops (goal -> code, how & when nested)
* Code Abstractions (Functions and Classes)
	* Ways to Structure Code
	* Documenting Code
* Variable Scope

* Git and GitHub
* Reading Documentation and Learning Libraries
* Python Environment Setup
* Parallelism Basics


## Data Structures

Different ways to store data or group related data together.

### Tuples

```python
some_book = ("The Watcher", 582, 4.5)

# Defined using ()
# Elements/items can have different types
# Items can be duplicates of each other
# Elements can be accessed by index because their order matters
print(some_book[0])
print(some_book[1])
print(some_book[2])
```

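```python
# Tuples themselves CANNOT be altered in any way
# Both lines raise a TypeError; try/except lets BOTH errors print instead of stopping at the first
try:
  some_book[0] = "Watchmen"   # Cannot reassign items in a tuple
except TypeError as err:
  print("TypeError:", err)
try:
  some_book[3] = "new item"   # Cannot add new items to (or delete items from) a tuple
except TypeError as err:
  print("TypeError:", err)
```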

Ideal for:
* Unchanging data &mdash; tuples themselves can't ever change
* Small groups of related data that's *only* used in nearby code
* Packaging multiple values together in places where only 1 is typically expected **(more on this later)**
  * e.g. returning multiple values at once from functions
  * e.g. dictionary keys based on multiple related values at once
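Here's a quick sketch of both uses. `min_and_max` and `grid_labels` are made-up names for illustration. The point is that the comma-separated return value and the `(row, column)` key are each a single tuple.

```python
# Hypothetical helper: the commas in the return line pack two values into ONE tuple
def min_and_max(numbers):
    return min(numbers), max(numbers)

low, high = min_and_max([4, 8, 15, 16, 23, 42])  # unpack the tuple into two names
print(low, high)   # 4 42

# Hypothetical grid: each key is ONE tuple built from two related values (row, column)
grid_labels = {(0, 0): "start", (2, 3): "treasure"}
print(grid_labels[(2, 3)])   # treasure
```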

Drawbacks:
* Accessing items *only* by index easily gets confusing
  * Especially if there are lots of items
  * Especially if the tuple was defined far from where it's used
  * Not very self-documenting; the coder must remember what each item means by themselves
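One common way around the index confusion is to unpack the tuple into named variables right where it's used. Here's a minimal sketch that rebuilds the same `some_book` tuple so it runs on its own:

```python
some_book = ("The Watcher", 582, 4.5)

# Unpacking gives every item a descriptive name in one line,
# so later code doesn't depend on remembering what index 1 or 2 means
title, pages, rating = some_book
print(f"{title}: {pages} pages, rated {rating}")

# The number of names must match the number of items, or Python raises a ValueError
try:
    title, pages = some_book
except ValueError as err:
    print("ValueError:", err)
```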

### Dictionaries

Strap in, because these things are powerful, but the baseline knowledge you need is a doozy.

```python
some_book = {
  "title": "The Watcher",
  "pages": 582,
  "rating": 4.5,
  42: 42,
  69.0: 69,
  (): "my key is a tuple"
}

# Defined using {}
# Contains "key-value pairs"
# Can be defined entirely on 1 line like the tuple above, but spreading across multiple lines helps with readability

# Keys are how you access values INSTEAD of indexes
# Keys MUST be unique (assigning to an existing key just overwrites its value)
# Keys must be immutable ("hashable"): strings, numbers, or tuples whose items are also hashable
# Mutable things like lists and dictionaries CANNOT be keys

# Values are the actual data stored under each key
# Values can be literally anything you want, and can be duplicates of existing values
# Even other entire tuples, lists, dictionaries, files, class objects, etc
```

```python
# Access items with [] like usual, but use the desired key instead of an index

print(some_book['title'])
print(some_book[42.0])  # Works because 42.0 == 42 and both hash the same, so Python finds the key 42
print(some_book[()])
```

```python
# New items can be added just by using a new key
print('before', some_book, '\n')

some_book['new_key'] = "I didn't exist before"

print('after', some_book)
```

```python
# Existing items can be changed
# The expression for accessing a value can also be used like a variable name to change the same value
print('before:', some_book[42])

some_book[42] = 420

print('after:', some_book[42])
```

Dictionaries have lots of methods (associated functions) that perform more useful and complex operations. A list of them can be found [here](https://www.w3schools.com/python/python_dictionaries_methods.asp).

Python's [official documentation](https://docs.python.org) does list them too (Library Reference > Built-in Types > Mapping Types), just not as cleanly as this.
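Here are a few of the most common methods in action. This sketch rebuilds a smaller `some_book` so it runs on its own. The `"owned"` key and the new rating are made-up example values.

```python
some_book = {"title": "The Watcher", "pages": 582, "rating": 4.5}

# .get() returns a fallback value instead of raising KeyError when the key is missing
print(some_book.get("author", "unknown"))   # unknown

# `in` checks whether a KEY exists (not a value)
print("pages" in some_book)                 # True

# .update() adds new pairs and overwrites existing ones, several at a time
some_book.update({"rating": 4.7, "owned": True})   # made-up example values

# .pop() removes a key and hands back the value it held
pages = some_book.pop("pages")
print(pages)                                # 582
print(list(some_book.keys()))               # ['title', 'rating', 'owned']
```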

Benefits:
* Easily represent things whose list of properties will grow and shrink a lot
  * Easy to add and remove key-value pairs
  * This is probably how the classes we've made so far should have been done: much easier than constantly creating new `self.X` attributes inside class methods.
* Can still loop through all the key-value pairs like tuple/list elements **(more on this later)**
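Here's a sketch of looping over a dictionary, again with a smaller, self-contained `some_book`. Since Python 3.7, dictionaries keep their keys in insertion order, so these loops visit them in the order they were added.

```python
some_book = {"title": "The Watcher", "pages": 582, "rating": 4.5}

# .items() gives (key, value) tuples, unpacked straight into two loop variables
for key, value in some_book.items():
    print(f"{key}: {value}")

# Looping over the dictionary itself gives just the keys
keys_seen = []
for key in some_book:
    keys_seen.append(key)
print(keys_seen)   # ['title', 'pages', 'rating']

# .values() gives just the values
numbers = []
for value in some_book.values():
    if isinstance(value, (int, float)):
        numbers.append(value)
print(numbers)     # [582, 4.5]
```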

Drawbacks:
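* Few in most cases: they use more memory than lists or tuples, keys must be hashable (so lists can't be keys), and there's no positional indexing. There are jokes about how, if you don't know the answer to a technical interview question, you can just throw a dictionary/hash map/map at it and get a good enough answer lmao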

### Lists

```python
titles = ["The Lightning Thief", "The Sea of Monsters", "The Titan's Curse"]
random_list = ["The Watcher", 420, ()]

# Defined using []
# Elements/items can have different types
# Items can be duplicates of each other
# Basically a tuple whose elements you can change, add, and remove
```

```python
# Access items using an index
print(titles[1])

# Change items in a similar way to dictionaries
titles[1] = "The Son of Neptune"
print(titles[1])
```
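Unlike tuples, lists can also grow. Here's a minimal sketch that starts from the original three titles. `"placeholder"` is just a stand-in item to show `insert`.

```python
titles = ["The Lightning Thief", "The Sea of Monsters", "The Titan's Curse"]

titles.append("The Battle of the Labyrinth")   # add one item at the end
titles.insert(1, "placeholder")                # stand-in item, placed at index 1
titles.extend(["The Last Olympian"])           # add every item from another list
print(len(titles))   # 6
print(titles)
```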

You may have noticed that I left something pretty important out of the dictionary and list sections so far. What if you need to delete elements from dictionaries or lists?

I don't think removing things from lists/dictionaries is something you will be doing often, at least not until you find yourself *desperately* needing to optimize your scripts.

You can always clone a data structure and edit that new copy. It will require more memory, but this way you'll always have access to old versions of that structure. It can be helpful if you need to retrace your steps while writing or if you just want to see data both before and after some change.
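For when you do need to delete things, here's a sketch of the usual tools. They're applied to copies so the originals survive. One catch worth knowing: `.copy()` only copies the outer structure, so nested lists and dictionaries are still shared unless you use `copy.deepcopy`. (`shelf` and what's added to it are illustration-only.)

```python
import copy

titles = ["The Lightning Thief", "The Sea of Monsters", "The Titan's Curse"]

# Copy first, then edit the copy, so the original list stays available
edited_titles = titles.copy()                 # shallow copy; titles[:] does the same
edited_titles.remove("The Sea of Monsters")   # delete by VALUE (first match only)
last = edited_titles.pop()                    # delete by index (default: last) and return it
del edited_titles[0]                          # delete by index, nothing returned
print(titles)          # original is untouched
print(edited_titles)   # []

some_book = {"title": "The Watcher", "pages": 582, "rating": 4.5}
edited_book = some_book.copy()
del edited_book["pages"]                      # delete by key
removed_rating = edited_book.pop("rating")    # delete by key and return the value
print(edited_book)     # {'title': 'The Watcher'}

# .copy() is SHALLOW: the inner list below is shared, so use copy.deepcopy for nested data
shelf = {"favorites": ["The Watcher"]}
shallow = shelf.copy()
deep = copy.deepcopy(shelf)
shallow["favorites"].append("Watchmen")       # also shows up in shelf (same inner list)
print(shelf["favorites"])   # ['The Watcher', 'Watchmen']
print(deep["favorites"])    # ['The Watcher']
```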

Changing data structures, especially in the middle of loops that use/reference them, can also easily cause bugs. Algorithms/loops you create may not work as expected because the data you're working on changes in the middle of execution.

We've run into this before on your projects, but here's another (contrived) example for reference:

```python
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

# You might initially expect this loop to print the entire alphabet, but...
for letter in alphabet:
  if letter == 'k':
    alphabet.remove(letter) # changing the list as we go through it...

  print(letter) # ...shifts every later item back one spot, so 'l' gets skipped and never printed
```
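There are two safe ways to write that loop. You can iterate over a copy while editing the original, or you can leave the original alone and build a new list. Here's a sketch of both. `visited`, `letters`, and `without_k` are just illustration names.

```python
alphabet = list("abcdefghijklmnopqrstuvwxyz")   # the same 26 letters, built from a string

# Option 1: loop over a snapshot (copy) that never changes, edit the original
visited = []
for letter in alphabet.copy():
    visited.append(letter)
    if letter == 'k':
        alphabet.remove(letter)   # safe: the loop isn't walking through `alphabet` itself
print(len(visited))     # 26, so 'l' was visited this time
print('k' in alphabet)  # False

# Option 2: leave the original untouched and build a new list instead
letters = list("abcdefghijklmnopqrstuvwxyz")
without_k = []
for letter in letters:
    if letter != 'k':
        without_k.append(letter)
print(len(letters), len(without_k))   # 26 25
```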

#### Slicing

You can easily isolate subsections of strings, tuples, or lists using **slice notation**. This involves up to three numbers/expressions between the `[]`, separated by colons.

```python
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

# [ start : end : step ]
# "end" is EXCLUSIVE
print(alphabet[0:3:1])
print(alphabet[0:len(alphabet):2])
print(alphabet[-3:len(alphabet):1])
print(alphabet[11:16:1])
```

Slice notation has lots of nice shortcuts and defaults that can make it really easy to use.

```python
# These all print the same value
# 0 is the default "start"
# 1 is the default "step"
print(alphabet[0:3:1])
print(alphabet[:3:1])
print(alphabet[:3:])
print(alphabet[:3])
```

```python
# These all also print the same value
# len(str) or len(list) is the default "end"
print(alphabet[11:len(alphabet):1])
print(alphabet[11:len(alphabet):])
print(alphabet[11::])  # leaving the "step" at the default

# These all also also print the same value
print(alphabet[0:len(alphabet):2])
print(alphabet[0::2])
print(alphabet[::2])
```
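Slicing isn't just for lists. Here's a sketch showing the same notation on a string and a tuple, plus one handy detail: slicing a list builds a new list, so changing the slice doesn't touch the original. `word` and `letters` are illustration-only names. A separate `letters` list is used so `alphabet` stays intact for the next cells.

```python
word = "notebook"
print(word[:4])       # note
print(word[4:])       # book

some_book = ("The Watcher", 582, 4.5)
print(some_book[1:])  # (582, 4.5): slicing a tuple gives back a tuple

# Slicing a list builds a NEW list, so editing the slice leaves the original alone
letters = ['a', 'b', 'c', 'd', 'e']
first_three = letters[:3]
first_three[0] = 'A'
print(first_three)    # ['A', 'b', 'c']
print(letters[:3])    # ['a', 'b', 'c']
```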

```python
# Negative "start" and "end" numbers indicate counting from the end

print(alphabet[-3:len(alphabet):1]) # sublist consisting of elements spanning:
print(alphabet[-3:])                # "3rd from the end" to "the end of the list"

print(alphabet[-3:-1])  # "3rd from the end" up to "final element" (exclusive)
print(alphabet[-4:-2])  # "4th from the end" up to "2nd from the end" (exclusive)

print(alphabet[:-21])  # everything BUT the last 21 items
```

Output:
```
['x', 'y', 'z']
['x', 'y', 'z']
['x', 'y']
['w', 'x']
['a', 'b', 'c', 'd', 'e']
```


```python
# Negative "step" numbers indicate reversing the order
# If you use these, "start" should come AFTER "end" in the list (start > end)
print(alphabet[::-1])
print(alphabet[len(alphabet):0:-1]) # "end" is still EXCLUSIVE, so index 0 ('a') gets left out;
                                    # leave "end" blank, like the line above, to reverse the ENTIRE list
print(alphabet[3:0:-1])
```

Output:
```
['z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r', 'q', 'p', 'o', 'n', 'm', 'l', 'k', 'j', 'i', 'h', 'g', 'f', 'e', 'd', 'c', 'b', 'a']
['z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r', 'q', 'p', 'o', 'n', 'm', 'l', 'k', 'j', 'i', 'h', 'g', 'f', 'e', 'd', 'c', 'b']
['d', 'c', 'b']
```
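To include the first element while counting backwards, leave "end" blank or pass `None`. An explicit `0` always stops just before index 0. Here's a sketch (the built-in `reversed()` is another readable option for a whole list):

```python
alphabet = list("abcdefghijklmnopqrstuvwxyz")

print(alphabet[3:0:-1])     # ['d', 'c', 'b']       explicit end of 0 excludes 'a'
print(alphabet[3::-1])      # ['d', 'c', 'b', 'a']  blank end runs through index 0
print(alphabet[3:None:-1])  # ['d', 'c', 'b', 'a']  None means the same as blank

# reversed() returns an iterator, so wrap it in list() to compare with a slice
print(list(reversed(alphabet)) == alphabet[::-1])   # True
```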


### NumPy Arrays

```python
import numpy as np
print(np.array([1,2,3]))
print(np.array([
  [1,2],
  [3,4]
]))

# Basically Python lists but with some extra restrictions that allow them to be MUCH more efficient
# All items MUST be the same type (NumPy will automatically convert as needed)
```

Benefits:
* MUCH faster and more space-efficient, especially for math done on a whole array at once
  * This may not be noticeable until you get to massive million-cell arrays or larger, but the savings are there whenever NumPy does the looping instead of Python
* Lots of free NumPy functions for faster calculations
* Closer to how arrays work in lower-level languages like C (one type, fixed size)
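Here's a sketch of what "math on the whole array at once" looks like. The prices and the 8% tax rate are made-up numbers. The point is that `prices * 1.08` touches every element without a Python loop, while `*` on a plain list does something completely different.

```python
import numpy as np

prices = np.array([3.5, 10.0, 7.25])   # made-up numbers

# One expression works on every element; NumPy runs the loop internally in compiled code
with_tax = prices * 1.08               # hypothetical 8% tax rate
print(with_tax)
print(with_tax.sum())

# Plain lists don't do math like this: * on a list REPEATS it
print([3.5, 10.0] * 2)                 # [3.5, 10.0, 3.5, 10.0]

# Mixed ints and floats get converted to one common type (float here)
print(np.array([1, 2.5]).dtype)        # float64
```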

Drawbacks:
* Less flexible
  * ALL items must be the same type
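  * Fixed size defined at creation, cannot add/remove items in place
    * Functions like `np.append` exist, but they return a brand new array every time
    * But you can create large, empty arrays (e.g. with `np.zeros`) and loop to populate them
  * For a list of lists (a 2D array), all sublists MUST be the same size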
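Here's a minimal sketch of the "create an empty array, then fill it" pattern. It also checks that `np.append` really builds a new array instead of growing the old one. `squares` and `grid` are illustration-only names.

```python
import numpy as np

n = 5
squares = np.zeros(n)            # fixed-size array of 5 floats, all 0.0
for i in range(n):
    squares[i] = i ** 2          # fill in place; the size never changes
print(squares.tolist())          # [0.0, 1.0, 4.0, 9.0, 16.0]

# np.append does NOT grow the array in place: it returns a new, larger copy
bigger = np.append(squares, 25)
print(squares.shape, bigger.shape)   # (5,) (6,)

# A 2D array is a grid: every row has the same number of columns
grid = np.zeros((2, 3))
print(grid.shape)                # (2, 3)
```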

## Loops

ALL loops have 4 parts to outline or consider:
1. the setup
   * creating empty lists, initializing counter variables, etc
2. the condition
   * what must be TRUE to keep the loop going (i.e. this condition will be false when the loop finishes)
3. the body
   * the code that will be executed inside the loop
   * this can include more nested loops
4. the (state) update
   * how the code moves on to the next step/item/iteration/etc
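Here's a sketch with each part labeled. The goal (add numbers until the total passes 10) and the `numbers` list are made up. Notice how a `for` loop handles the condition and the update for you.

```python
numbers = [4, 3, 5, 2, 8]   # made-up data

# 1. setup
total = 0
i = 0
# 2. condition: items remain AND the total hasn't passed 10 yet
while i < len(numbers) and total <= 10:
    # 3. body
    total += numbers[i]
    # 4. update
    i += 1
print(total, i)   # 12 3

# In a for loop, Python checks the condition and does the update automatically:
# it stops when the list runs out and moves to the next item on its own
squares = []              # 1. setup
for n in numbers:         # 2 + 4. condition and update
    squares.append(n * n) # 3. body
print(squares)            # [16, 9, 25, 4, 64]
```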



```python
# Some examples using Python's 2 types of loops: "while" and "for"
mini_alphabet = ['a', 'b', 'c', 'd', 'e']

i = 0
while i < len(mini_alphabet):
  print(mini_alphabet[i])
  i += 1
print('-----')  # separator for clearer output

for i in range(len(mini_alphabet)):
  print(mini_alphabet[i])
print('-----')  # separator for clearer output

for letter in mini_alphabet:
  print(letter)
print('-----')  # separator for clearer output

for index, letter in enumerate(mini_alphabet):
  print(f"{index}: {letter}")
```

Output:
```
a
b
c
d
e
-----
a
b
c
d
e
-----
a
b
c
d
e
-----
0: a
1: b
2: c
3: d
4: e
```


- range() details
- enumerate
- zip
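Here's a quick sketch of those three built-ins. It reuses `mini_alphabet` from above, and `uppercase` is an illustration-only list.

```python
mini_alphabet = ['a', 'b', 'c', 'd', 'e']

# range(start, stop, step): like slicing, "stop" is EXCLUSIVE and step defaults to 1
print(list(range(5)))           # [0, 1, 2, 3, 4]
print(list(range(1, 10, 3)))    # [1, 4, 7]
print(list(range(4, -1, -1)))   # [4, 3, 2, 1, 0]

# enumerate() pairs each item with a counter; start= changes where counting begins
for number, letter in enumerate(mini_alphabet, start=1):
    print(f"{number}: {letter}")

# zip() walks through several sequences side by side and stops at the shortest one
uppercase = ['A', 'B', 'C']
pairs = list(zip(mini_alphabet, uppercase))
print(pairs)                    # [('a', 'A'), ('b', 'B'), ('c', 'C')]
```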

- if a loop is too large (can't fit on 1 screen), that's a good sign to simplify
  - break the body into functions (more on this later)
  - create multiple smaller loops
- again, if looping through a list/dictionary, avoid changing the data structure in the middle of the loop (adding/removing/changing items)
  - can cause unexpected behaviors
  - safer & more consistent to create & edit a copy
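Here's a sketch of pulling a loop body out into a function. The `students` data and the `summarize` function are hypothetical. The idea is that the loop itself shrinks to one line, and the function can be tested on its own.

```python
# Hypothetical data: each tuple is (name, list of scores)
students = [("Ana", [90, 85, 77]), ("Ben", [60, 72, 88])]

def summarize(name, scores):
    """Former loop body: one student in, one summary line out."""
    average = sum(scores) / len(scores)
    best = max(scores)
    return f"{name}: average {average:.1f}, best {best}"

summaries = []
for name, scores in students:
    summaries.append(summarize(name, scores))   # the loop body is now a single call

for line in summaries:
    print(line)
```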

- imagine yourself as the computer going through the steps
  - start at element 1, what do you need to do to/with it?
  - next element, what do you do here?
  - next element, any patterns? patterns go inside the loop

## Git and GitHub

*To do: explain diagrams (UML?) and walk through an example on a call.*

* break down steps even further
  * get to usual stopping point
  * for each step/bullet, ask yourself "what data are needed to do this? what operations/changes are being done on that data?"
    * note those as sub-bullets or connected bubbles or something
    * for the data, note its structure (in other words, what does it consist of?)
      * for example, an object in some simulation might consist of variables for mass, momentum, velocity, acceleration, etc
  * do you notice any repetition or patterns?
    * patterns in data can be abstracted out into a class interface or some dictionary/list data structure
    * patterns in operations can be put inside functions
    * if you do notice patterns, write down what the generalized form of the data and functions will be (data properties, function signatures, class interfaces)
  * for each of those sub-bullets/bubbles/etc, consider how to represent them or how to do them and what libraries might be involved
  * execute according to plan
    * BE FLEXIBLE
      * you'll think of better ways to do things or encounter problems with your original plan as you go
      * let them happen, because nobody (not even senior engineers) can catch all the issues at the beginning, though with experience you can catch most
      * also, this is research: like you said, you don't fully know what you're gonna do or need to do, so change is arguably even more inevitable than in normal software
      * don't worry too much about cleanliness, code beauty, ease of use, etc until the code becomes annoying/slow/a hindrance to work with
        * nobody else is gonna see this unless you're making a library or publishing code on GitHub for a new process or something
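Here's a sketch of what those patterns might turn into for the simulation example above. Everything here is hypothetical: `make_body`, `momentum`, and `step` are made-up names, the values are arbitrary, and the update rule is a simple Euler step chosen only for illustration. The data pattern (mass, velocity, acceleration) becomes a dictionary, and each repeated operation becomes a function.

```python
def make_body(mass, velocity, acceleration):
    """Data pattern: every simulated object has the same properties."""
    return {"mass": mass, "velocity": velocity, "acceleration": acceleration}

def momentum(body):
    """Operation pattern: derived quantity computed the same way for every object."""
    return body["mass"] * body["velocity"]

def step(body, dt):
    """Operation pattern: advance one time step (simple Euler update, an assumption here)."""
    body["velocity"] += body["acceleration"] * dt

ball = make_body(mass=2.0, velocity=0.0, acceleration=9.8)   # arbitrary values
for _ in range(3):
    step(ball, dt=0.5)
print(ball["velocity"], momentum(ball))
```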

* if the steps above look like too much (understandable), remember that we're being intentionally excessive in breaking things down
  * ensures that we have as few questions as possible before starting, makes room for all the questions and debugging that might pop up
* bugs often come from faulty algos or wrong assumptions
  * thus debugging is, at a high level, mostly figuring out what special case we left out or what misunderstanding we have of some section of code
  * at a lower level, this can include going through code step by step (manually or with a debugging tool, more on this below) and printing values throughout the code
* don't worry too much about memorizing syntax
  * that's the stuff that's easy to look up in documentation, geeksforgeeks, tutorials, etc
* ideally, if you have time, study any solution you copy-paste from someone else's code
  * see what they did differently and why it works when yours didn't
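Here's a small sketch of print-debugging a wrong assumption. `average` is a hypothetical function. The assumption being checked is "the list is never empty", which is exactly the kind of special case that often gets left out. The `debug:` prints are meant to be deleted once the bug is found.

```python
def average(values):
    # Special case a wrong assumption would miss: "values is never empty"
    if len(values) == 0:
        print("debug: average() got an empty list")   # temporary print exposing the case
        return None
    total = 0
    for v in values:
        total += v
        print(f"debug: v={v}, total={total}")         # temporary print tracing each step
    return total / len(values)

print(average([2, 4, 9]))   # 5.0, after three debug lines
print(average([]))          # None, after the empty-list debug line
```

Python's built-in `breakpoint()` (Python 3.7+) is the debugger-based alternative. Calling it pauses execution at that line so you can inspect values interactively instead of printing them.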

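Guides for debugging Jupyter notebooks in VS Code:

* [How to debug Jupyter notebooks in Visual Studio Code (Medium)](https://medium.com/codex/how-to-debug-jupyter-notebooks-in-visual-studio-code-3d36039c6f86)
* [Debug a Jupyter notebook (VS Code documentation)](https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_debug-a-jupyter-notebook)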