
# Basics:  types, syntax, essential data structures, loops

This is a quick tour of Python and the Jupyter notebook environment.  It's meant to help students who aren't comfortable in Python yet, but who want a gentler introduction than HW1, which will also cover the A* algorithm.

To run a piece of code in this environment, you can click the arrow to the left of the code snippet.  Here are some expressions in Python to evaluate.

In [None]:
2+2

In [None]:
True and False

In [None]:
True or False

In [None]:
not True

In [None]:
5 / 3

In [None]:
5 // 3

As you can see, you don't need to include semicolons at the end of expressions in Python, boolean operators are spelled out as words, and the / operator always does floating point division, even on two integers.  Use // when you want integer (floor) division.
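Here is a small sketch of how the two division operators differ, including the case of negative numbers, where floor division rounds down and not toward zero.  The variable names are just for illustration.

```python
# / is true division and always gives a float; // is floor division.
quotient = 7 / 2         # 3.5
floored = 7 // 2         # 3
neg_floored = -7 // 2    # -4, because floor rounds toward negative infinity
remainder = 7 % 2        # 1, the remainder that goes with //
print(quotient, floored, neg_floored, remainder)
```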

It's often useful to have a concept of infinity and negative infinity, values that will always "win" or always "lose" in a comparison.  Here's how to get that.



In [None]:
real_big = float("inf")
print(real_big > 10000000000000000)
real_small = float("-inf")
print(real_small < -100000000000000)
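One place this tends to show up is as the starting value when searching for a minimum, since any real number will beat it.  The sketch below assumes a hypothetical helper named smallest; Python's built-in min() does the same job for simple cases.

```python
def smallest(values):
    """Return the smallest number in values, or infinity if it is empty."""
    best = float("inf")      # anything we see will be smaller than this
    for v in values:
        if v < best:
            best = v
    return best

print(smallest([7, 3, 9]))
```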

String concatenation works like it does in Java, with +. String equality works the way you'd want it to intuitively: == checks the characters. Assignment doesn't actually copy the string; both names refer to the same object. But strings are immutable, so an operation like += builds a new string instead of changing the old one, and you don't get the aliasing surprises that you might encounter in other languages (or in Python for lists and other mutable data structures).

In [None]:
a = "foo"
b = a # b refers to the same string object as a
a += "bar" # builds a new string for a; b still refers to "foo"
print(a)
print(b)
c = "foobar"
print(c == a) # == really does compare characters
print("a is {}, b is {}, c is {}".format(a,b,c)) # format method can be smoother than concatenation
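If you're curious what assignment does under the hood, the "is" operator checks whether two names refer to the very same object.  This is only a sketch for poking around; in normal code, compare strings with ==.  It also shows f-strings, a newer alternative to the format method.

```python
a = "foo"
b = a
same_before = a is b     # True: both names refer to one string object
a += "bar"               # builds a new string and rebinds a to it
same_after = a is b      # False: a is now a different object
print(same_before, same_after)

# An f-string puts the variables right inside the braces.
message = f"a is {a}, b is {b}"
print(message)
```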


Defining functions should be familiar to you.  There's no need to specify types, and indentation determines where the function body begins and ends; there are a lot fewer curly braces than in other languages.

In [None]:
def addThreeThings(a, b, c):
  return a + b + c

addThreeThings(2,4,5)

Python checks types while the code runs, not ahead of time (it's dynamically typed), and this can leave your function vulnerable to unintended behavior or runtime errors.  Nothing halts unless an operation actually hits a mismatch in type.

In [None]:
addThreeThings("foo", "bar", "baz")

In [None]:
addThreeThings("foo","bar",2)

Notice how the error message in the second case shows the exact line of the function that was problematic, as well as the exact line where that function was called, and so on.  Sometimes these errors dig deep into functions you didn't write, so look in the stack of calls for the code you did write.  You probably passed something bad to those other functions.

A common issue is wanting to automatically concatenate a number with a string, which you can do in Java seamlessly.  Here, the number needs an explicit conversion with str().

In [None]:
addThreeThings("foo","bar",str(2))

Two important data structures are used over and over in Python:  the list, and the dictionary.  These have their own shorthand syntax, unlike Java's ArrayList and HashMap, the two closest equivalents.

In [None]:
myList = [1,2,3,4]
myList.append(5)
print(myList)

In [None]:
myDict = {
    "foo" : 2,
    "bar" : 5
}
print(myDict["bar"])

Both data structures allow iteration with a foreach loop.

In [None]:
for i in myList:
  print(i*10)

In [None]:
for key in myDict:
  print("The value of {} is {}".format(key, myDict[key]))
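When you need both the key and the value, the dictionary's items() method hands them over as pairs, which saves the extra lookup.  The dictionary below is made-up data for illustration.

```python
prices = {"foo": 2, "bar": 5}    # hypothetical example data

lines = []
for key, value in prices.items():   # each item is a (key, value) pair
    lines.append("The value of {} is {}".format(key, value))

for line in lines:
    print(line)
```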

If you just want to do something a particular number of times, instead of iterating over a data structure, you can do the following:

In [None]:
for i in range(4):  # 4 is the number of times we will execute
  print("Iteration number {}".format(i))
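The range function also accepts a start, a stop, and a step, in case counting from zero by ones isn't what you want.  A quick sketch (wrapping in list() is only there so the values can be printed all at once):

```python
evens = list(range(2, 10, 2))       # start at 2, stop before 10, step by 2
countdown = list(range(3, 0, -1))   # a negative step counts downward
print(evens)
print(countdown)
```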

# Exercise 1

Try writing a Python function that takes a list of numbers and prints all possible products of two numbers within the list.  For example, [1, 2, 3]
would print

```
1
2
3
2
4
6
3
6
9
```



You can assume the input is good (a list with at least one element, all ints, etc.).

In [None]:
# TODO

# Tuples and multiple return values

A data structure that may seem similar to a list is a tuple, which uses parens instead of square brackets.  While a list is conceptually combining similar things and can be lengthened, a tuple is a lightweight way to combine multiple pieces of data, maybe of different types, and is more like a "struct" in C or a method-less object in Java.  A tuple with just two elements is a "pair."

In [None]:
letter_counts = [("hi", 2), ("how", 3), ("are", 3), ("you", 3)]

You can assign the different parts of a tuple to different comma-separated variables.

In [None]:
word, count = letter_counts[0]
print(word)
print(count)

This can be useful to get a function to effectively return multiple values.  In fact, comma-separated values in the return statement will be treated as building a tuple.

In [None]:
def glom_together(letter_count_list):
  final_string = ""
  final_count = 0
  for pair in letter_count_list:
    final_string += pair[0]
    final_count += pair[1]
  return final_string, final_count

allwords, count = glom_together(letter_counts)
print(allwords)
print(count)

# Numpy and other modules

Possibly the biggest strength of Python is its deep, wide selection of modules (libraries).  There are a few that are really handy for AI/ML:  numpy for numerical computing, scipy for scientific functions, matplotlib for plotting in 2D and 3D, scikit-learn for assorted out-of-the-box machine learning algorithms, and some very good libraries for neural networks.

Of these, the main one you should be aware of right away is numpy, which has a variety of functions for creating and manipulating vectors and matrices.  Modules often have conventional shorthand names to use throughout the program, but you still need to say "import [module] as [shorthand]" to use the shorthand name.  It's the difference between needing to say numpy.zeros() and np.zeros().

In [None]:
np.zeros([3,4]) # don't expect to recognize this without import

In [None]:
import numpy as np

np.zeros([3,4])

In [None]:
my_lol = [[1, 2, 3, 4], [5,6,7,8]]
my_array = np.array(my_lol)
print(my_array)


Though they look like lists of lists, numpy arrays are preferred to simple lists or lists of lists for efficiency when doing numerical things.  Their shape field, a tuple (a pair for a 2D array), can be useful for finding the dimensions.

In [None]:
a = np.zeros([3,4])
print(a)
height, width = a.shape
print(height)
print(width)

Matrices can come up a fair amount in machine learning.  One shortcut is that matrix multiplication has a shorthand in numpy: @.  In numpy, a vector is usually a one-dimensional array, neither a row nor a column; @ treats it as whichever one makes the multiplication legal.

In [None]:
A = np.array([[2, 2], [2, 2]])
B = np.array([[2, 0], [0, 2]])
print(A@B)
v = np.array([5, 6])
print(B@v) # v is 1-D, so @ treats it as a column here to get a legal mult
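Checking shapes is a good habit when a multiplication does something surprising.  The sketch below looks at what @ returns for a 1-D vector versus explicit row and column matrices, and contrasts @ with *, which multiplies element by element.

```python
import numpy as np

B = np.array([[2, 0], [0, 2]])
v = np.array([5, 6])

shape_v = v.shape            # (2,): one dimension, neither row nor column
shape_Bv = (B @ v).shape     # (2,): the result is 1-D as well

row = v.reshape(1, 2)        # an explicit 1x2 row matrix
col = v.reshape(2, 1)        # an explicit 2x1 column matrix
shape_rowB = (row @ B).shape # (1, 2)
shape_Bcol = (B @ col).shape # (2, 1)

elementwise = B * B          # * is NOT matrix multiplication
print(shape_v, shape_Bv, shape_rowB, shape_Bcol)
print(elementwise)
```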

Last, notice that numpy array indices can be comma-separated when accessing elements, instead of using multiple square brackets.  Both get used, but only the commas work with some more advanced features like slicing (see the end of this worksheet).

In [None]:
A = np.array([[1,2],[3,4]])
A[1,1]

In [None]:
A[1][1]

# More assorted control flow tips

*While* loops run until their condition is false.  This is good for lots of search, where we don't know when we'll be done.  It's also good for lots of machine learning, where we want to run until our error is sufficiently low.

If you want a quick instruction for jumping out of where you are in a loop, a *continue* jumps to the next iteration, and *break* quits the loop and moves on.  (If you're using these, ask yourself whether your function is too big and needs refactoring.)

The syntax for if-else is a bit unusual in Python, thanks to "elif," its shorthand for "else if."

Python had no "switch" statement until version 3.10 added "match," and you can also use dictionaries to a similar effect.

In [None]:
a = 2
while (a < 100):
  if a == 50:
    print("Fifty!")
    break
  elif a % 2 == 0:
    print("{}: even".format(a))
    a = a + 7
  else:
    print("{}: odd".format(a))
    a = a * 2
print(a)
print("Done!")
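The dictionary-as-switch idea mentioned above can be sketched as follows.  The command letters and descriptions are invented for this example; the get method supplies a default when the key is missing, much like a "default" case.

```python
def describe(command):
    """Map a one-letter command to a description, switch-style."""
    actions = {
        "n": "move north",
        "s": "move south",
        "v": "vacuum",
    }
    # get() returns the second argument when the key isn't in the dictionary
    return actions.get(command, "unknown command")

print(describe("v"))
print(describe("x"))
```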

# Exercise 2

Try writing code that squares the matrix [[1, 2],[3,4]] until the upper-left [0,0] element is greater than 200.  (Recall that A @ A will work to square it, as long as A is a numpy array, np.array(your_list_of_lists)).  Hint:  The value in that corner that triggers the end will be 165751.

In [None]:
import numpy as np

def first_big_matrix(A):
  while A[0,0] <= 200:
    A = A @ A
    print(A)
  return A

myA = np.array([[1,2],[3,4]])
print(first_big_matrix(myA))

# Advanced Interlude:  But why Python?

When Lisp, the language Racket was based on, was the primary language for AI, its basic syntax was a very good fit for what AI was doing:  processing massive lists of states to check for A* searches.  (Its compilers are still very efficient for this; I think some of the Google Flights codebase is still in Lisp.)  Python's relationship to AI/ML is based much more on the availability of nice libraries, and the ability to build more nice libraries on top of those existing ones.  Here's an example of two, scikit-learn (sklearn) for ML and matplotlib for plotting.  You can see that you can do a lot with just a little code with the right libraries.

You aren't expected to fully understand what's happening in the following code, but rather see how we can do a lot with few lines of code.

In [None]:
# Digit recognition, based on example at
# https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html

from sklearn import datasets
import matplotlib.pyplot as plt

digits = datasets.load_digits()  # sklearn's built-in 8x8 handwritten digits (a small UCI dataset, not the full MNIST)
plt.imshow(digits.images[0], cmap=plt.cm.gray_r) # The first digit in the data, which just happens to be 0, in grayscale

In [None]:
# Neural networks are complex, but we can just ask sklearn for one.
# Note that more complex networks need more complex libraries (e.g., Tensorflow)
# with which to specify more fine-grained details about the architecture.
from sklearn.neural_network import MLPClassifier  # MLP = "multilayer perceptron", so, nontrivial neural network

my_nn = MLPClassifier(hidden_layer_sizes=(30,20,)) # Lots of optional arguments that we're skipping
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1)) # Flatten the 3D images x width x height to the 2D images x pixels
my_nn.fit(data, digits.target)  # Do the learning with features (images) "data" and labels (digit names) "digits.target"
my_nn.predict([data[0,:]]) # The first row is the image above, in vector form.  Did the network learn it?

In [None]:
# What about a new digit image?  "Drawing" a 7
# (The training pixels range from 0 to 16, so this 0/1 image is fainter than the data.)
my_seven = np.array([[1,1,1,1,1,1,1,0],
                     [1,1,1,1,1,1,0,0],
                     [0,0,0,1,1,0,0,0],
                     [0,0,1,1,0,0,0,0],
                     [0,1,1,0,0,0,0,0],
                     [1,1,0,0,0,0,0,0],
                     [1,0,0,0,0,0,0,0],
                     [1,0,0,0,0,0,0,0]
              ])
flattened = my_seven.reshape((1,-1)) # One row of 64 pixels, the 2D shape predict expects
my_nn.predict(flattened)

If that all seems like a lot to learn, don't worry - even professional Python programmers spend a lot of time looking at documentation.  They definitely don't memorize every function signature, though they might begin to remember the ones they use a lot.

# Object-oriented Python

AI and ML often deal with state information that can contain multiple variables, like where the robot is in Vacuum World or the strengths of connections between neurons in a neural network.  So it makes sense that we'll often encounter objects - variables that hang together and describe the same entity, combined with functions designed to work with them called methods.

Unlike in Java, you don't really need to declare field names ahead of time.  The object is called "self," and assigning to a field like "self.position" will cause it to exist.

The constructor, which creates a new object, is always called __init__, and the function that gets called when printing the object is called __str__.  The first argument to every method is "self."


In [None]:
import numpy as np

class VacuumWorld(object):
    """ Represent a grid containing dirt or not in each square, plus a single robot.

    Attributes:
        tiles (numpy array): 2D array of numbers representing world. 1 for dirt, 0 for no dirt.
        robot_col, robot_row (int): describe robot location
        width, height (int):  map size, for convenience
    """

    def __init__(self, width, height, dirt_coords):
        """ Arguments:
          width (int): width of vacuum world
          height (int): height of vacuum world
          dirt_coords (list of int pairs):  dirt locations, each as (row, col)
          (assume robot always starts in upper-left corner)
        """
        self.width = width
        self.height = height
        self.tiles = np.zeros((height, width))
        self.robot_col = 0
        self.robot_row = 0
        for dirtpair in dirt_coords:
          self.tiles[dirtpair[0],dirtpair[1]] = 1

    def __str__(self):
        """String is grid with * for dirt, _ for no dirt, R for robot"""
        out = ""
        for i in range(self.height):
            for j in range(self.width):
              if self.robot_row == i and self.robot_col == j:
                out += "R"
              elif (self.tiles[i][j]):
                out += "*"
              else:
                out += "_"
            out += "\n"
        return out
    
    # Done if all dirt vacuumed.  "sum" is the kind of handy function that
    # you don't remember the name and syntax of, but you're pretty sure exists
    def done(self):
      return np.sum(self.tiles) == 0

my_world = VacuumWorld(3, 4, [(1,1), (2,2)])
print(my_world)
print(my_world.done())

# Exercise 3

Try defining your own object with a single piece of state that is set by the constructor and printed by __str__.  If you're looking for inspiration, consider modeling a pet, friend, or nemesis with their mood.  If there's time, define one additional method.  Check that your definition works.

In [None]:
# TODO

# Gotchas:  Indentation, Notebook evaluation, Object copying

One fairly common source of bugs in Python is issues with indentation.  The most horrible version of this is that a tab isn't considered the same amount of whitespace as the number of spaces necessary to cover it.  So 4 spaces are not treated the same as a tab that happens to be 4 spaces wide, even though they look identical on screen.  Oof.  Luckily, colab keeps you from shooting yourself in the foot by automatically converting your tabs to spaces, as any reasonable Python editor should.

But the problem remains that as your code gets big, it's harder to eyeball indentation levels to check whether they're the same.  Many a student has come to me with a "return" inside a loop instead of outside it, leading to exactly one iteration and then quitting.  The main solution is to be very conscious of this as you code, and don't assume the editor is autoindenting to the right place.  Better function breakdown often helps - if you have to scroll up to see your last deindent, maybe you could write a function for this stuff inside the loop and make the code more readable, too.
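Here's a sketch of the "return inside the loop" bug described above, next to the fixed version.  The function names are made up; the only difference between the two is the indentation of the return line.

```python
def total_buggy(nums):
    total = 0
    for n in nums:
        total += n
        return total    # indented into the loop: quits after one iteration

def total_fixed(nums):
    total = 0
    for n in nums:
        total += n
    return total        # lined up with "for": runs after the loop finishes

print(total_buggy([1, 2, 3]))
print(total_fixed([1, 2, 3]))
```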

A common source of bugs within Jupyter Notebook is evaluating the cells in an unexpected order.  The notebook doesn't run everything in order up to your current point unless you tell it to.  Telling it to is a good idea if things have gotten strange:  Runtime->Run all, up top.

We often need to copy objects in the early part of this course, in order to do search (and also in MCTS if you go that way for a project).  Simple assignment won't cut it; that just copies the reference.  This is true for lists and dictionaries, too.

In [None]:
a = VacuumWorld(3,4,[])
b = a
a.tiles[0,1] = 1  # will change both a and b
print(a)
print(b)

In [None]:
a = [1, 2]
b = a
a.append(3)
print(a)
print(b)

There are a variety of methods floating around for actually copying things successfully; consult the documentation webpages or Google "python copy [data structure]".  For a class, we may want to explicitly create a copy() method that calls the constructor with the right arguments, and modifies the result if necessary.  Of course, we could also just call the constructor ourselves.

In [None]:
a = VacuumWorld(3,4,[])
b = VacuumWorld(a.width,a.height,[])  # Same dimensions as a, but a separate object
a.tiles[0,1] = 1
print(a)
print(b)
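The copy() method idea could look something like the sketch below.  TinyWorld is a hypothetical stand-in class, kept small so the example runs on its own; the same pattern would carry over to a bigger class with more state.

```python
import numpy as np

class TinyWorld:
    """Hypothetical grid world holding one numpy array of state."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.tiles = np.zeros((height, width))

    def copy(self):
        # Call the constructor, then copy over any state that has changed.
        twin = TinyWorld(self.width, self.height)
        twin.tiles = self.tiles.copy()   # numpy arrays have their own copy()
        return twin

a = TinyWorld(3, 4)
b = a.copy()
a.tiles[0, 1] = 1      # changes a only
print(a.tiles[0, 1])
print(b.tiles[0, 1])
```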

In [None]:
a = [1, 2]
b = a.copy()
a.append(3)
print(a)
print(b)
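One more wrinkle: a list's copy() method is shallow, meaning it copies the outer list but not the things inside it.  For a list of lists, the standard library's copy.deepcopy copies all the way down.  A small sketch with a made-up grid:

```python
import copy

grid = [[0, 0], [0, 0]]
shallow = grid.copy()          # new outer list, but the same inner lists
deep = copy.deepcopy(grid)     # new outer list AND new inner lists

grid[0][1] = 1                 # modify an inner list of the original
print(shallow[0][1])           # the shallow copy sees the change
print(deep[0][1])              # the deep copy does not
```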

# Some random Python slickness

There are some idioms in Python that you probably won't feel comfortable writing yourself if you're just coming from Java, but Python lovers like these kinds of shortcuts.  None of these is strictly necessary if you want to just write in "translated Java," but it's good to recognize what they're doing when you see them.

## List comprehensions

The classic Python slickness, this lets you define a list easily as a function of another list.  Avoids the awkwardness of iterating and appending.

In [None]:
squares = [a ** 2 for a in range(10)] # ** is exponentiation
print(squares)
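A comprehension can also filter with a trailing "if," and the same idea works for building dictionaries.  The names below are just for illustration.

```python
# Keep only the even numbers, then square them.
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]
print(even_squares)

# A dictionary comprehension uses curly braces and key: value.
lengths = {word: len(word) for word in ["hi", "how", "are", "you"]}
print(lengths)
```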

## Slicing

The colon operator can be used to grab substrings, sublists, and subarrays from these data structures.  Index at [FirstElementIn:FirstElementNotIn].  Leaving one of these off goes all the way in that direction; putting just the colon down means "everything."  Negative indices count from the end, so an end index of -1 goes not quite to the end, but stops one element back.

In [None]:
mynums = [1,2,3,4]
print(mynums[1:3])

In [None]:
import numpy as np

myarray = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(myarray[:, 1:]) # "all rows, columns from 1"

In [None]:
s = "foobar"
print(s[0:3])
print(s[0:-1])
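A few more slicing variations, as a sketch: leaving off one side, indexing from the end, and adding a third "step" number after a second colon.

```python
s = "foobar"
first_three = s[:3]     # leaving off the start means "from the beginning"
from_three = s[3:]      # leaving off the end means "to the end"
last_char = s[-1]       # a single negative index counts from the end
every_other = s[::2]    # a step of 2 takes every other character
backwards = s[::-1]     # a step of -1 walks the string in reverse
print(first_three, from_three, last_char, every_other, backwards)
```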

## Numpy operations affecting the whole vector/matrix

When there's a size mismatch, numpy will often try to generously interpret the operation as applying to each item in the other matrix or vector, a behavior called "broadcasting."  This can be very confusing if you don't know it's possible.

In [None]:
v = np.array([1, 2, 3, 4])
print(v+1)

In [None]:
v < 5

This doesn't work with basic lists.

In [None]:
L = [1,2,3]
L+3
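With plain lists, + means concatenation, so the element-by-element effect needs a list comprehension (or a conversion to a numpy array).  The sketch below shows both, plus a 2D example of broadcasting where a single row gets added to every row of a matrix.

```python
import numpy as np

L = [1, 2, 3]
shifted = [x + 3 for x in L]   # the list version of "add 3 to everything"
joined = L + [3]               # + on two lists just glues them together
print(shifted, joined)

M = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
summed = M + row               # row is broadcast across each row of M
print(summed)
```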

## Ternary operator

For quick if-elses determining what value is returned, there's a shorthand that works all on one line.  Note that a normal if-else would have colons in there; this form has none.

In [None]:
b = 2
a = 1 if b == 2 else 3
print(a)

## Iterating over tuples

In [None]:
for word, count in letter_counts:
  print("{} is {} letters long".format(word, count))

# Final remarks

As usual, a lot of learning a new language is practice and seeing what experts do. When you get stuck with a "how do I..." question, consult the more extensive tutorial at:

https://docs.python.org/3/tutorial/

Modules like numpy, matplotlib, and scikit-learn have their own documentation that you can find by Googling.

Python is sometimes quirky, but has a lot of neat shortcuts, and more importantly, has some of the best libraries for AI and ML at present.  There's nothing about it that means it will be the best language for AI/ML for all time; but it's the clear choice in the present.  Happy coding!