<h1>Hack the Law: Coding Basics for Law Students</h1>
<i>Paul Gowder, 9-5-15</i>

First, we've gotta give a nod to tradition.  Everyone who learns to program starts off by telling the computer to say "Hello World!"  It's kind of like reading <i>Pierson v. Post</i> or <i>Hawkins v. McGee</i> in 1L year: you just haven't had the experience unless you've done it.  
<p>
SO!  There's a text entry field just below.  That's called a "cell" in the interface we're using.  (This thing, the ipython notebook, is not normally how people write code.  It's a special interface to an actual code interpreter running under the hood.  We're using it because it makes it easy to combine text, code, and the output of code, and then share the results.)  
<p>
Type the following line of code into it: <p>
    print 'Hello World!'
<p>
Then hit the shift key and the enter key at the same time to execute the cell.



In [1]:
print 'Hello World!'

Hello World!


<b>Congratulations!  You've written your first program.</b>  (Everyone always says that.)  Now to do serious stuff.

There are two basic ideas that underlie all programming.  
<ol>
<li>Repetition</li><p>
Computer code is built up out of simple tasks repeated over and over.  At the basic level, all computer programming is just switching electrical circuits between two states, on and off, but you can represent really complex stuff by doing that many times.
</p><br>
<li>Abstraction</li><p>
Programming is all about representing what you want to do in abstract form, so that you can apply it many times over, to different kinds of input.</p>
</ol>

<h2>Repetition:</h2>
This is an idea that should be familiar to you from math.  Multiplication is just addition over and over; exponentiation is just multiplication over and over.  The advantage of a computer over a human is that it can carry out these tasks over and over very accurately and at very high speeds. <p>Execute the following cell by putting the cursor in it then hitting shift+enter.

In [None]:
print 2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2+2
print 2*2*2*2*2*2
print 2**6
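The "over and over" part can be spelled out explicitly.  Here's a minimal sketch that reaches the same answers by repeating one simple step in a loop (loops get explained further down; the variable names are just made up for illustration):

```python
# print(...) with parentheses works in both Python 2 and Python 3.

# Multiplication as repeated addition: add 2 to a running total 32 times.
total = 0
for step in range(32):
    total = total + 2
print(total)  # same value as 2 * 32

# Exponentiation as repeated multiplication: multiply by 2 six times.
power = 1
for step in range(6):
    power = power * 2
print(power)  # same value as 2 ** 6
```

Both lines print 64, the same number the cell above produces.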

<h2>Abstraction</h2>
Abstraction is the powerful idea that when we represent something in computer code, we've captured that behavior.  If the thing we represent is a way to interact with data, then we can take that code and apply it to all kinds of different data, and still get predictable results.
<p>
Another example from math is the easiest way to think of this.  The plus symbol, <b>+</b>, is nothing more than an abstraction for an operation, namely adding.  It takes two pieces of data (numbers), and spits out a third piece of data, and it works for <i>any</i> two numbers.  Plus works the same whether you feed it 2 and 2 or 85817509860916283505986 and 34839863929323618004747466.  
<p>
Every computer program is an abstraction in that sense.  Your word processing program, for example, is nothing more than a complex bucket of operations that can take a bunch of text, plus some commands that you give it through the user interface ("make this bold" etc.), and then output a completed document.  It can take all kinds of text, and a wide variety of commands; think how useless it would be if you could only give it a few sentences!
<p>
We can understand <b>legal rules as abstractions</b> as well.  A "contract" specifies certain kinds of permissible input (promises, given in an offer-assent form, backed up with consideration), and then applies an operation to them, and spits out legal obligations.  Within the bounds of the legal rules that specify the input, contract law ideally has a predictable output.  (Of course, we know it doesn't work that way in reality, because judges aren't computers... yet.)  A contract doesn't care if you give it "A's promise to turn over a horse" and "B's promise to pay money" or "A's promise to show up at the party and sing jazz standards" and "B's promise to teach A how to program."  It just applies the same operations to both pairs of promises.  
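To make the contract-as-abstraction idea a bit more concrete, here is a toy sketch.  The function name form_contract and its inputs are made up for illustration, and this is a cartoon, not a statement of contract law.  The point is only that one operation handles both pairs of promises:

```python
# print(...) with parentheses works in both Python 2 and Python 3.

def form_contract(promise_a, promise_b, consideration=True):
    """Hypothetical toy model: two promises in, a list of obligations out."""
    if not consideration:
        # In this toy model, no consideration means no obligations.
        return []
    return ["A must " + promise_a, "B must " + promise_b]

horse_deal = form_contract("turn over a horse", "pay money")
party_deal = form_contract("sing jazz standards at the party",
                           "teach A how to program")
print(horse_deal)
print(party_deal)
```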
<p>
Actually, I think this is why you see lots of people going back and forth between law and coding: the same kinds of mental skills of representing abstractions are useful in both domains.  (The same goes for philosophers, who have been known to move in both directions.)

Execute the following cell.  Don't worry about anything except the last four lines right now, but observe that we've defined an abstract operation for the computer to do ("bluebookCite"), and that, because we have that operation, we can apply it to two different chunks of data and get predictable output.

In [2]:
def bluebookCite(citeDict):
    theString = citeDict["p1"] + ' v. ' + citeDict["p2"] + ', ' + citeDict["repVol"] +  ' ' + \
    citeDict['reporter'] + ' ' + citeDict["repPage"] + ' (' + citeDict["year"] + ').'
    return theString

case1 = {"p1": "Gowder", "p2": "Iowa Weather", "repVol": '101', "repPage": "999", "reporter": 'U.S.', "year": "2015"}
case2 = {"p1": "Policy Lab Students", "p2": "Gowder", "repVol": '24', "repPage": "3", "reporter": 'U.S.', "year": "2016"}
print bluebookCite(case1)
print bluebookCite(case2)

Gowder v. Iowa Weather, 101 U.S. 999 (2015).
Policy Lab Students v. Gowder, 24 U.S. 3 (2016).


Here's what that cell did: 
<ol>
<li>Created a special kind of an abstraction called a <i>function</i>, and named it bluebookCite.  I'll show you how to do that in detail later, but for now know that <b>a function is a reusable operation, and you can make your own</b>.  We can imagine that the plus sign is a function (and in some programming languages it explicitly is one), as are the formalities of making a contract.</li>
<li>Defined two cases using a special <i>data structure</i> called a "dictionary" (more on this later too). </li>
<li>Fed the two cases, "case1" and "case2," into the function bluebookCite.</li>
<li><i>Print</i>ed the output to the screen.  (Yes, for coders, "print" often means to the screen, not necessarily to paper.)</li>
</ol>

<h2>Algorithms</h2>
An "algorithm" is just a set of instructions.  We can express a complex task in terms of more basic elements.  For example, suppose I want to make coffee.  (I *always* want to make coffee.)  We might express the algorithm as follows: 
<ol>
<li>Get beans</li>
<li>Put beans in grinder</li>
<li>Activate grinder</li>
<li>Get French press</li>
<li>Put grounds in French press</li>
<li>Heat water, not all the way to boiling</li>
<li>Pour water into French press</li>
<li>Let the coffee sit for a few minutes</li>
<li>Press the plunger</li>
</ol>
<p>
As you can see, an algorithm is a lot like a recipe, or a set of instructions for assembling furniture.  Of course, those instructions wouldn't do us a lot of good for a coffee-making robot, because they're not written in computer language.  (An algorithm written in human language is called "pseudocode."  When you write code, it often helps to write it out as pseudocode first, so you can get straight in your brain what you want it to do, and then translate it to computer language.)
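As a rough illustration of that translation step, here is one way the coffee pseudocode might start to look in Python.  It's only a sketch: the steps are stored as plain text and the "robot" just reads them out in order (the names coffee_steps and step_number are invented for this example):

```python
# print(...) with parentheses works in both Python 2 and Python 3.

# Each step of the coffee algorithm, in the order it must happen.
coffee_steps = [
    "Get beans",
    "Put beans in grinder",
    "Activate grinder",
    "Get French press",
    "Put grounds in French press",
    "Heat water, not all the way to boiling",
    "Pour water into French press",
    "Let the coffee sit for a few minutes",
    "Press the plunger",
]

# Go through the steps one at a time, numbering them from 1.
step_number = 1
for step in coffee_steps:
    print(str(step_number) + ". " + step)
    step_number = step_number + 1
```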

<h2>Control Flow</h2>
Computer programs wouldn't be very useful if all they did was execute a fixed set of instructions, once, and then stop.  The real interesting ideas come in when we ask them to change their behavior and do things more than once.  That's all called "control flow."  Two big ideas in control flow are the "conditional" and the "loop."

<h2>Conditionals</h2>
The idea of a conditional is straightforward.  Sometimes you want a program to do X only if condition A holds, otherwise you want it to do Y (or nothing at all).  
<p>
This is a familiar idea from the law.  We can express the warrant requirement as a conditional algorithm, for example.  In pseudocode: 
<ol>
<li>**If** you have a warrant, **then** you may search.  **Otherwise**:</li>
<li>**If** you meet a named exception to the warrant requirement (search incident to arrest, plain view, consent, etc.), **then** you may search.  **Otherwise**:</li>
<li>You may not search.</li>
</ol>
<p>
If our warrant algorithm were in computational form (Robocop!), here's what would happen.
<ul>
<li>First, the computer would execute line 1.  If Robocop has a warrant, he would search, and wouldn't execute anything else in that chunk of code.</li>
<li>If he doesn't have a warrant, he'll execute line 2.  He'll review his list of warrant exceptions and see if any of them apply.  If they do, then he'll search, and won't execute anything else.</li>
<li>If he doesn't have a warrant, *and* he doesn't meet any of the exceptions, he'll execute line 3, and on line 3 he concludes that he can't search, so he doesn't.</li>
</ul>
<p>
Another way to think of it is that the last line in the conditional is a default rule, sort of like when someone dies without a will, the intestacy rules give the default behavior of the legal system. 
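Here's a sketch of how the warrant pseudocode might look as real Python.  The function name may_search and its two True/False inputs are assumptions made for illustration; the actual syntax for if, elif, and else is explained further down.

```python
# print(...) with parentheses works in both Python 2 and Python 3.

def may_search(has_warrant, exception_applies):
    """Toy version of the warrant pseudocode; both inputs are True or False."""
    if has_warrant:
        return True           # line 1: warrant, so search
    elif exception_applies:
        return True           # line 2: no warrant, but an exception applies
    else:
        return False          # line 3: the default rule

print(may_search(True, False))    # True
print(may_search(False, True))    # True
print(may_search(False, False))   # False
```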

<h2>Loops</h2>
A loop is just a way of telling the computer to do something multiple times.  You can specify a fixed number of times, or you can combine loops and conditionals to tell the computer to do something a number of times that depends on some other piece of information.  

<p>
Here's another lawyer example.  Suppose you're doing document review to respond to a discovery request.  You've got the following conditional algorithm, expressed as a function (abstraction, remember?).<p>

**Doc Review Function**
<ol>
<li>Read document.</li>
<li>**If** document is privileged, **then** do not produce it **and** add it to the privilege log.  **Otherwise**</li>
<li>**If** document is responsive, produce it.  **Otherwise**</li>
<li>Do not produce the document.</li>
</ol>

But you have more than one document.  So you can express your whole task in a loop: <p>
**For** each document, execute Doc Review Function
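
The Doc Review Function and the loop around it could be sketched in Python roughly as follows.  The documents, the dictionary keys "privileged" and "responsive", and the names review_document and privilege_log are all hypothetical, invented for this example:

```python
# print(...) with parentheses works in both Python 2 and Python 3.

def review_document(doc):
    """Toy Doc Review Function: returns what to do with one document."""
    if doc["privileged"]:
        return "withhold and log"
    elif doc["responsive"]:
        return "produce"
    else:
        return "do not produce"

# Three made-up documents, each described by a dictionary.
documents = [
    {"name": "memo to counsel", "privileged": True, "responsive": True},
    {"name": "sales contract", "privileged": False, "responsive": True},
    {"name": "lunch order", "privileged": False, "responsive": False},
]

privilege_log = []
for doc in documents:                 # For each document...
    decision = review_document(doc)   # ...execute the Doc Review Function
    if decision == "withhold and log":
        privilege_log.append(doc["name"])
    print(doc["name"] + ": " + decision)
```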


<h2>Variables, Expressions and Operators</h2>
One last big set of ideas before we get to writing actual code.  A piece of code that produces a value is called an *expression.*  This includes numbers, pieces of text (known as "strings" in programmer-ese), the special *boolean* values True and False, and other such things.  You can combine expressions with *operators* to produce new expressions.  
<p>
For example, we can combine the expression 2, the operator +, and the expression 2 to get the new expression 4.  
<p>
But we don't have to calculate the result of expressions ("evaluate" them, in programmer-ese) before using them in bigger expressions. The following two statements are equivalent: 
<p>
(2+2) * 2
<p>
4 * 2
<p>
Of course, you know they're mathematically equivalent, because you successfully graduated middle school.  But they're computationally equivalent too.  **Computer programming is basically like algebra** in this way.  (And like algebra, there's an order of operations, which you can override with parentheses.)  

<p>
Also like algebra, we have variables.  We can *assign* the value 2+2 to, say, x, as follows:
<p>
x = 2+2
<p>
then we have a third way of expressing the same thing: 
<p>
x * 2
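
A quick sketch to check those claims for yourself (the last line is an extra, added to show the order of operations at work):

```python
# print(...) with parentheses works in both Python 2 and Python 3.

x = 2 + 2
print((2 + 2) * 2)   # 8
print(4 * 2)         # 8
print(x * 2)         # 8
print(2 + 2 * 2)     # 6, because multiplication happens before addition
```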

<hr>
<p>
Ok, time to write some code!  Python's the easiest useful programming language, because it strips away a lot of the obnoxious *syntax* (rules for forming expressions) that other languages have.  A few things really matter in Python, though.  
<p>
First are lines.  Python executes your code line by line, unless something in control flow tells it to skip one.  Normally (with some exceptions) a single command occupies one line.
<p>
Second is indentation.  Most languages mark out "code blocks" (I'll explain them in a minute) with brackets or something similar.  For example, here's an ugly chunk of javascript (javascript is, in general, a really ugly language and it makes me sad): 

    function showHide(shID) {
       if (document.getElementById(shID)) {
          if (document.getElementById(shID+'-show').style.display != 'none') {
             document.getElementById(shID+'-show').style.display = 'none';
             document.getElementById(shID).style.display = 'block';
          }
          else {
             document.getElementById(shID+'-show').style.display = 'inline';
             document.getElementById(shID).style.display = 'none';
          }
       }
    }

You don't need the little curly-braces in Python.  Instead, you'd just indent the lines and it's all good.  
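
For comparison, here is a rough Python analogue of that javascript.  It is only a sketch: instead of a real web page, it assumes a dictionary (called display here, a made-up stand-in) that maps element names to 'none', 'block', or 'inline'.  Notice that the nesting is carried entirely by indentation.

```python
# print(...) with parentheses works in both Python 2 and Python 3.

def show_hide(display, sh_id):
    """Hypothetical stand-in for the javascript showHide function."""
    if sh_id in display:
        if display[sh_id + '-show'] != 'none':
            display[sh_id + '-show'] = 'none'
            display[sh_id] = 'block'
        else:
            display[sh_id + '-show'] = 'inline'
            display[sh_id] = 'none'
    return display

page = {'details': 'none', 'details-show': 'inline'}
show_hide(page, 'details')
print(page['details'])        # block
print(page['details-show'])   # none
```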

<h2>Now to work!</h2>
Let's start with basic functions.  To assign a variable to an expression, you put the variable name, then an equal sign, then the expression.  Case matters (*always*).  For example, to assign the variable x to the value 2+2, you'd type:
<p>
x = 2 + 2
<p>
(2+2 and 2 + 2 are equivalent)
<p>
As you can see, to describe numbers, like 2, you just type them in (in numeric form, not words).  Then you can do math and stuff with them.  You can also assign variables to pieces of text, called *strings*.  You denote something as a string by putting it in quotation marks (single or double, it's fine either way, but you can't mix and match in the same string).  So to assign the name catNoise to the string MEOW, you'd type: 
<p>
catNoise = 'MEOW'
<p>
or
<p>
catNoise = "MEOW"
<p>
Things like numbers and strings are called "types," as in "types of data."  There is also a *boolean* type, which takes the special values True and False (again, case matters), so we can, e.g., assign: 
<p>
priorCriminalRecord = True
<p>
We can reassign a variable (and lose its previous value) just by entering a new assignment statement.
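
A small sketch of reassignment (the variable name billableHours is made up).  The last assignment works because Python evaluates the right-hand side first, using the old value, and only then attaches the name to the result:

```python
# print(...) with parentheses works in both Python 2 and Python 3.

billableHours = 10
print(billableHours)                 # 10
billableHours = 12                   # the old value, 10, is gone
print(billableHours)                 # 12
billableHours = billableHours + 3    # uses the old value to compute the new one
print(billableHours)                 # 15
```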

<p>
Now you try.  Assign the variable *verdict* to the string *guilty*, and then assign the variable *fine* to the number 500.

In [4]:
x = 1000
X = 50
print x
print X

Xx = 'THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!'
for x in range(10):
    print Xx

1000
50
THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!
THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!
THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!
THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!
THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!
THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!
THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!
THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!
THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!
THIS IS THE BEST CLASS EVER!  YAY GOWDER!!!


Important note: in Python, as in many programming languages, numbers you type without a decimal point default to "integer"--that is, whole numbers, no fractions or decimals--and dividing one integer by another gives back a whole number.  This can be bad news when you do math, because you can accidentally lose information and get hilariously wrong answers.  There are reasons for this, though, related to the problem with representing decimal parts in binary computers.  Suffice it to say that you'll often want to use "floats," and when it comes up we'll talk about it.  (This behavior also changes when you get to Python version 3, but that's a whole 'nother conversation.)
<p>
Just to see the problem, though, divide 5 by 2 in the cell below (hint: the operator for division is a forward slash).

In [6]:
print 5/2
print 5.0/2
print float(5)/2

2
2.5
2.5


You wanna see something even worse?  Now using "floats" add 1.1 to 2.2. (Incidentally, if you want to tell Python to use a float, i.e., keep the decimals, adding something after a decimal, even if it's .0, will do it.)

In [7]:
1.1 + 2.2


3.3000000000000003

Yes, that's right.  Python sometimes gets the wrong answer with decimal (floating point) math.  So does every other programming language.  It's a limitation of computers, though for most day-to-day applications it doesn't matter, because the errors can be pushed to a sufficiently small point that nothing important breaks.
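
One place the tiny error can bite is when you test two floats for exact equality.  A common workaround, sketched here with an arbitrarily chosen tolerance, is to ask whether the two numbers are close enough, or to round before displaying:

```python
# print(...) with parentheses works in both Python 2 and Python 3.

total = 1.1 + 2.2
print(total == 3.3)                   # False, because of the tiny error
print(abs(total - 3.3) < 0.000001)    # True: close enough for most purposes
print(round(total, 2))                # 3.3
```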

<hr><p>There are also more complex kinds of "data structures."  For present purposes, the only two you really need to know about are the *list* and the *dictionary*.  <p>

A list is what it sounds like: an ordered list of elements.  They're denoted by brackets and commas.  You can put any of the basic types (strings, integers, floats, booleans) in a list.  For example, we can assign: 
<p>
myList = [1, 5, 'law', True, 'school', 2.0]
<p>
Lists can also contain more complex data types, and (as always) can refer to elements by variable names as well as their values.  
<p>
myList2 = [1, 5, 'law', True, 'school', 2.0, x]
<p>
works because we assigned x a value earlier on (remember?)---if not, that line of code would give you an error.  
<p>
myList3 = [myList, myList2] 
<p>
also works.  
<p>
Sometimes lists can produce weird, screwed-up behavior, because they have a property called *mutability*.  That's too advanced for our class today, but if you get into more serious code, you should know that there's a good alternative, called a *tuple*, that lacks that property.  
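
If you're curious, here's a minimal sketch of the kind of surprise mutability can cause, and of a tuple refusing to play along.  It uses a try/except block, which we haven't covered; just read it as "try this, and if it fails with this kind of error, do that instead."  All the names are made up for illustration.

```python
# print(...) with parentheses works in both Python 2 and Python 3.

original = ['offer', 'acceptance']
alias = original                  # not a copy: both names point at the same list
alias.append('consideration')     # change the list through one name...
print(original)                   # ...and the other name sees the change too

fixed = ('offer', 'acceptance')   # a tuple is written with parentheses
try:
    fixed.append('consideration')
except AttributeError:
    print('tuples cannot be changed in place')
```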

The second complex data structure we'll learn today is called the "dictionary."  It's a mapping of elements to names, and it doesn't come in any particular order.  Remember the case names we created way up above?  We used a dictionary for that, as follows:
<p>
case1 = {"p1": "Gowder", "p2": "Iowa Weather", "repVol": '101', "repPage": "999", "reporter": 'U.S.', "year": "2015"}
<p>
That code creates a dictionary named case1, then within it, names a bunch of different strings.  So the dictionary entry with the name "p1" maps to the string "Gowder," for example.  The colons and commas should be obvious, ditto the role of curly brackets.

We refer to elements of lists by their numerical position.  For some weird reason, **almost all programming languages start counting at zero**.  (This will cause lots of errors in your code; it happens to everyone.)  So if we have a list: 
<p>
myList = ["The", "Brown", "Cow"]
<p>
and we wanted to get "The" out of it, then we'd say: 
myList[0]
<p>
Let's actually do that.  Just execute the cell below.

In [8]:
myList = ["The", "Brown", "Cow"]
print myList
print myList[0]
print myList[1]
print myList[2]
# note that you can't get myList[3] because it doesn't exist!  counting from zero, remember?
# also, this pound sign/hash tag denotes a comment, a line you can have in a program 
# that doesn't get executed.  it's useful for documenting your code.
#
# also, it should be pretty clear what putting print at the start of a line does by now :-)

['The', 'Brown', 'Cow']
The
Brown
Cow


In [9]:
print myList[3]

IndexError: list index out of range
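
A sketch of how to stay out of that trouble: the built-in len function tells you how many items a list has, and because counting starts at zero, the last item always sits at position len minus 1.  (The try/except at the end is something we haven't covered; it just catches the error instead of letting it stop the program.)

```python
# print(...) with parentheses works in both Python 2 and Python 3.

myList = ["The", "Brown", "Cow"]
print(len(myList))                # 3 items...
print(myList[len(myList) - 1])    # ...so the last one is at position 2: Cow

try:
    print(myList[3])
except IndexError:
    print("no item at position 3")
```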

We refer to dictionaries by the names of their elements (the name is called the *key*, and the stuff attached to the name is called the *value*.  Keys can be numbers or strings, and a few other things too, values can be pretty much any data type, including more complex data structures like lists and other dictionaries).  Look at, then execute, the following cell.  

In [10]:
case1 = {"p1": "Gowder", "p2": "Iowa Weather", "repVol": '101', "repPage": "999", "reporter": 'U.S.', "year": "2015"}
print case1
print case1['p1']
print case1['year']

{'p2': 'Iowa Weather', 'p1': 'Gowder', 'reporter': 'U.S.', 'repPage': '999', 'repVol': '101', 'year': '2015'}
Gowder
2015
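
Two more things you can do with a dictionary, sketched with a made-up case3 (the "judges" key and its contents are invented for illustration): store a list as a value, and change or add entries after the dictionary already exists.

```python
# print(...) with parentheses works in both Python 2 and Python 3.

case3 = {"p1": "Gowder", "p2": "Iowa Weather", "year": "2015",
         "judges": ["Judge A", "Judge B"]}    # a list used as a value

case3["year"] = "2016"         # replace the value stored under an existing key
case3["reporter"] = "U.S."     # add a brand new key and value

print(case3["year"])           # 2016
print(case3["judges"][0])      # first item of the list stored under "judges"
print("reporter" in case3)     # True: the key now exists
```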


Now let's learn some control flow.  The next cell will pick a random number between 1 and 10, and then if it's even, it's going to say it's even; if it's odd, it'll say it's odd.  I'll explain every line of the next cell after we execute it.  Note that you can go back in an ipython notebook and execute a cell over and over (and it overwrites any previous variable assignments, etc.)

In [24]:
import random
dieroll = random.randint(1, 10)
print 'Randomly selected number is: ' + str(dieroll)
if dieroll % 2 == 0:
    print dieroll % 2 == 0
    print 'That\'s an even number!'
    print "kitty!!!"
    print "policy lab is the best!  Gowder is my hero in all things.  Yay gowder!"
else:
    print dieroll % 2 == 0
    print 'That\'s an odd number!'

Randomly selected number is: 7
False
That's an odd number!


Ok, let's talk about that code.  <p>

The first line is an *import statement*, followed by a *library*.  Basically, what that does is gives you access to a bunch of code written by other people.  It's sort of like what happens when a state just enacts a model/uniform code, like the ABA Model Rules, the Model Penal Code, or the Uniform Commercial Code---it doesn't have to write its own law on sales, it just imports UCC article 2.  Here, we've imported a bunch of pre-written functions allowing us to generate (kinda) random numbers.
<p>
The next line assigns the name "dieroll" to the output of one of the functions we've borrowed.  The dot notation random.randint means "go look in the random library we just imported, and find something in it called randint."  <p>
Then it calls the function.  This bears more explanation.  The basic form of calling a function is to type: <p>
functionName(parameters)
<p>
where, in our example, random.randint is the function name, and the parameters are 1 and 10.  (You always list a function's parameters in parentheses, adjoined to the function name, and separated by commas.)  As I've said before, a function is an abstraction for a chunk of code, and so what I actually did was look up the randint function, and found that it takes two parameters, both of which are integers, and then gives you a random number between them (inclusive).
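
To get a feel for "inclusive," here's a sketch that calls random.randint a thousand times and looks at the smallest and largest results.  The results are random, so your numbers will vary, but nothing should ever fall outside 1 to 10.  (It uses a for loop, which is explained further down, and the names rolls and count are made up.)

```python
# print(...) with parentheses works in both Python 2 and Python 3.
import random

rolls = []
for count in range(1000):
    rolls.append(random.randint(1, 10))

print(min(rolls))   # never below 1
print(max(rolls))   # never above 10
```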
<p>
The next line is a print statement.  It tells us to print the string 'Randomly selected number is: ' and then there's some weird stuff.  The weird stuff is a string concatenation.  Basically, whenever you put a plus sign between two strings, it runs them together.  However, since the variable dieroll points to an integer, not a string, this wouldn't normally work.  Wrapping it in the built-in str() function turns it from an integer to a string, so it can be concatenated.
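
Here's a sketch of what goes wrong without str, and how str fixes it.  The number 7 is just a stand-in for whatever randint returned, and try/except (not covered here) is used only to catch the error so the example keeps running.

```python
# print(...) with parentheses works in both Python 2 and Python 3.

dieroll = 7
try:
    message = 'Randomly selected number is: ' + dieroll        # string + integer
except TypeError:
    # Python refuses to glue a string to an integer, so convert first.
    message = 'Randomly selected number is: ' + str(dieroll)
print(message)
```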
<p>
The next few lines are the fun bits.  The line starting with an if statement is a conditional.  The python syntax for an if statement is: 
<p>
<pre>if [EXPRESSION THAT EVALUATES TO A BOOLEAN]:
    INDENTED BLOCK OF CODE</pre>
<p>
Let's take that in reverse order.  The indented block of code is all the stuff that gets executed if the expression in the if statement is True.  You can put as many lines in there as you want, so long as they're all indented to the same level.  
<p>
An "expression that evaluates to a boolean" means any expression that evaluates to True or False.  For example, the expression: 
<p>
1 + 1 == 2
<p>
evaluates to True, because 1 + 1  does equal 2.  
<p>
1+1 == 11 
<p> would evaluate to False, although, for amusement's sake go run the next cell.

In [25]:
print '1'+'1' == '11'

True


Python hasn't gone mad.  When you put it in quotes, you're not working with integers anymore, you're working with strings, and so the plus sign gives you string concatenation, not addition.  
<p>
ANYWAY!  As you've probably guessed, the double equals == means "equals" (the single equal sign is reserved for variable assignment).  Likewise, != means "not equals," > means "greater than," >= means "greater than or equals," and you get the picture.  
<p>
Onward.  So the only other thing that is new is the percentage sign.  That's a special math function called "modulo," and what it does is it just gives you the remainder just like in elementary school division.  Since the remainder of any even number divided by 2 is zero, that picks out the even numbers.   
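
A few quick examples of the comparison operators and of modulo, as a sketch you can paste into a cell (the variable fine is made up):

```python
# print(...) with parentheses works in both Python 2 and Python 3.

fine = 500
print(fine == 500)   # True
print(fine != 500)   # False
print(fine > 100)    # True
print(fine >= 500)   # True

print(7 % 2)         # 1: seven divided by two leaves a remainder of one (odd)
print(10 % 2)        # 0: no remainder, so ten is even
print(17 % 5)        # 2: modulo works with numbers other than 2 as well
```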
<p> Now the print statement on the next line you should get.  The only new thing is the backslash.  When you put a backslash in a string, it's an *escape character* that means "the next character has a special meaning, but Python, ignore that special meaning and just treat it like a character."  Since the apostrophe (a.k.a. single quote) would normally end the string, but here we just want it to signify a contraction *inside* the string, it must be escaped out. 
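
A sketch of two ways to get an apostrophe into a string: escape it with a backslash, or wrap the whole string in double quotes so the single quote no longer ends it.  Either way you end up with the same string (the variable names are made up):

```python
# print(...) with parentheses works in both Python 2 and Python 3.

escaped = 'That\'s an odd number!'        # backslash escapes the apostrophe
double_quoted = "That's an odd number!"   # double quotes: no escape needed
print(escaped)
print(escaped == double_quoted)           # True: the backslash is not part of the string
```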
<p>  Next is an else statement.  Like an if statement, it is followed by a colon and an indented block, but this time, the indented block is the code that is to be executed *only if the expression in the if statement above evaluates to False*.  
<p>
Else statements are actually optional.  The code above would work perfectly fine without one; it would just pass over odd numbers without remark.  If you want, go edit the random number code cell above to remove the last three lines (the else statement and its indented block), run it a few times, and see what happens.
<p>
You can also have multiple conditions using the elif statement, which basically says "ok, if the previous condition isn't True, then evaluate this condition and see if IT is True, and if so, execute the following code block, otherwise go on."  elif stands for "else if".  For example: 

In [34]:
dieroll2 = random.randint(1, 10)
print 'Randomly selected number is: ' + str(dieroll2)
if dieroll2 % 2 == 0:
    print 'That\'s an even number!'
elif dieroll2 < 4:
    print 'That\'s a small odd number!'
elif dieroll2 > 6:
    print 'That\'s a large odd number!'

Randomly selected number is: 5
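
Notice that in the run shown above the number was 5 and nothing else got printed.  That's because 5 is odd, isn't less than 4, and isn't greater than 6, so none of the three conditions is True, and there's no else to catch it.  Here's a sketch that runs the same chain of conditions on a few hand-picked numbers; the helper function describe is made up for illustration (function definitions are explained further down):

```python
# print(...) with parentheses works in both Python 2 and Python 3.

def describe(number):
    """Same conditions as the cell above, but returns a label instead of printing."""
    if number % 2 == 0:
        return 'even'
    elif number < 4:
        return 'small odd'
    elif number > 6:
        return 'large odd'
    return 'no condition matched'

for number in [1, 2, 5, 9]:
    print(str(number) + ': ' + describe(number))
```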


<hr><p>Now it's your turn to try.  Based on the code above, I want you to write me some code that gets a random number between 1 and 100, and then, if the number is above 50, says "BIG!" and otherwise says "small."  Put it in the next cell.
<p>
**NOTE:** as you may have noticed from the cell above, you don't need to import random again.  An import statement works for everything in the code that follows.  

In [48]:
dieroll2 = random.randint(1, 100)
if dieroll2 > 50:
    print 'BIG!'
else:
    print "small."

small.


<hr><p>
But maybe you're feeling lazy.  You really don't want to go back, click on the previous cell, and hit shift+enter again to run it over and over to see what happens.  Wouldn't it be nice if you could just run it a bunch of times all in one go?  Well, that's what loops are for!  Before we get to an example, I want you to run a bit of preparatory code and see what comes out.  Go run the next cell.

In [49]:
print range(10)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


What you can see is that range is a built-in function that produces (from here on, we will say *returns* which is a technical term for what a function outputs, and hence what it evaluates to in an expression) a list.  When you pass it a single number, it returns a list starting at 0 (because of course, computer programmers, jeez) and ending just one shy of the number in question.  (You can pass it multiple numbers too, but that's not important now.)  So range(10) produces a list of ten digits, but because coders are stupid, it goes from 0-9 rather than from 1-10.  
<p> Incidentally, if you want a proper 1-10 list, the easiest way to do it is the following code, which is advanced and I won't explain it: 

    [x + 1 for x in range(10)]
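
Another route to the same 1-10 list, sketched below, is to pass range two numbers: where to start, and where to stop (still just shy of the second number).  The list(...) wrapper is there only so the sketch also works on Python 3, where range doesn't hand back a list directly; in this notebook's Python 2 it's harmless.

```python
# print(...) with parentheses works in both Python 2 and Python 3.

print(list(range(10)))       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(1, 11)))    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print([x + 1 for x in range(10)] == list(range(1, 11)))   # True
```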


In [50]:
for count in range(10):
    dieroll3 = random.randint(1, 10)
    print 'Randomly selected number %s is: ' % str(count + 1) + str(dieroll3) 
    if dieroll3 % 2 == 0:
        print 'That\'s an even number!'
    elif dieroll3 < 4:
        print 'That\'s a small odd number!'
    elif dieroll3 > 6:
        print 'That\'s a large odd number!'
    else:
        print 'That\'s five!!'
    print

Randomly selected number 1 is: 4
That's an even number!

Randomly selected number 2 is: 10
That's an even number!

Randomly selected number 3 is: 5
That's five!!

Randomly selected number 4 is: 8
That's an even number!

Randomly selected number 5 is: 1
That's a small odd number!

Randomly selected number 6 is: 2
That's an even number!

Randomly selected number 7 is: 1
That's a small odd number!

Randomly selected number 8 is: 9
That's a large odd number!

Randomly selected number 9 is: 2
That's an even number!

Randomly selected number 10 is: 2
That's an even number!



In [53]:
xxx = "Public Policy FTW, also Gowder is Awesome."
for xxy in range(5):
    print xxx
    


Public Policy FTW, also Gowder is Awesome.
Public Policy FTW, also Gowder is Awesome.
Public Policy FTW, also Gowder is Awesome.
Public Policy FTW, also Gowder is Awesome.
Public Policy FTW, also Gowder is Awesome.


In [54]:
GradesList = ["A+", "A-", "F-"]
StudentList = ["Very Smart", "Kinda Smart", "Came to class stoned"]
for index in range(len(StudentList)):
    print "Student %s got grade %s" % (StudentList[index], GradesList[index])

Student Very Smart got grade A+
Student Kinda Smart got grade A-
Student Came to class stoned got grade F-


In [56]:
for grade in GradesList: 
    print grade + 'fish'

A+fish
A-fish
F-fish


So here's an explanation of the new stuff.<p>
The first line initiates a for loop.  The basic syntax of a for loop (there are other kinds, such as a while loop, but we don't need to worry about them now) is as follows:<p>
<pre>
for VARIABLE in LIST: 
    INDENTED BLOCK TO EXECUTE
</pre>
and what that means is "take the first item in LIST, temporarily assign that item to the name VARIABLE, then execute the indented block.  Then take the second item in the list, temporarily assign it to VARIABLE, then execute the indented block again.  And so on, until you run out."
<p>
Since here, the list is range(10), and we've seen what that returns already, what the for statement does is tells the Python interpreter to take the number 0, assign it the name count, run the code, then loop back around, take the number 1, reassign the name count to it, and so forth. 
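
To see that there's nothing magic going on, here's a sketch of a short for loop followed by the same work written out by hand, one assignment at a time.  The list of witnesses is made up for illustration.

```python
# print(...) with parentheses works in both Python 2 and Python 3.

witnesses = ['Ahab', 'Ishmael', 'Queequeg']

# The loop version.
for witness in witnesses:
    print('Call ' + witness)

# The same thing written out by hand: assign, run the block, repeat.
witness = witnesses[0]
print('Call ' + witness)
witness = witnesses[1]
print('Call ' + witness)
witness = witnesses[2]
print('Call ' + witness)
```

Both versions print the same three lines; the loop just saves you the typing, and keeps working if the list grows.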
<p>
As you've seen, we can nest indented blocks, so inside the block for the for loop we have indented blocks for our conditionals as before.
<p>
The only other new line is:
<p>
print 'Randomly selected number %s is: ' % str(count + 1) + str(dieroll3) 
<p>
The %s in there is a *format character*.  What that means is "go look after the end of the string for another percentage sign, then swap the result of the expression that follows it into this spot."  It allows us to stick the values assigned to variables in the middle of strings.  Here, I'm saying "take our count variable (which, you remember, is just the integer in the list we produced with the range function), add 1 to it, turn it to a string, then stick it in the middle of that other string."  Then we do another string concatenation.  
<p>
Actually, that could have been done a lot more concisely, but this way is clearer for learning.  Anyway, don't worry about format characters, you don't really need to know them right now.  Just wanted to explain what I was doing.  
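
For the curious, here's a sketch of one more concise way to build that string: use two %s placeholders and hand Python both values at once, in parentheses, after the percentage sign.  %s converts each value to a string for you, so the str calls go away.  The values of count and dieroll3 are fixed here just so the example can run on its own.

```python
# print(...) with parentheses works in both Python 2 and Python 3.

count = 0
dieroll3 = 4

long_way = 'Randomly selected number %s is: ' % str(count + 1) + str(dieroll3)
short_way = 'Randomly selected number %s is: %s' % (count + 1, dieroll3)

print(long_way)
print(short_way)
print(long_way == short_way)   # True
```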
<p>The print statement at the bottom adds a blank line.  Because it's in the top-level indented block, it adds a blank line in every iteration of the for loop.

Your turn!  In the next cell, first create a list containing the strings 'cat', 'dog', and 'zebra', then write a for loop which takes every item in the list, concatenates it with the string 'fish', and prints it.  So you should end up with:
<p>
<pre>
catfish
dogfish
zebrafish
</pre>

In [57]:
zoo = ['cat', 'dog', 'zebra']
for animal in zoo:
    print animal + 'fish'

catfish
dogfish
zebrafish


Ok, now we're ready for the last piece of our basic concepts in python!  Suppose we wanted to abstract out this repeat-random-number-getting-and-describing behavior, and apply it to different inputs?  Sometimes we want to get ten numbers, each between 1 and 10, but sometimes we want to get 20 numbers, each between 1 and 50!  Do we have to write the same code over and over again like lemmings?  No we do not.  Instead, we put it in a function, just like random.randint, and then we can call it ourselves whenever we want.  Watch: 

In [59]:
def describeNums(iterations, toprange):
    half = toprange / 2
    for count in range(iterations):
        dieroll = random.randint(1, toprange)
        print 'Randomly selected number %s of %s is: %s.' % (count + 1, iterations, dieroll)
        if dieroll % 2 == 0:
            print 'That\'s an even number!'
        elif dieroll < half:
            print 'That\'s a small odd number!'
        elif dieroll > half:
            print 'That\'s a large odd number!'
        else:
            print 'That\'s right in the middle!'
        print

Now we've got a function, which I'll explain in a moment.  First, demonstration.  Run the next few cells.

In [60]:
describeNums(1, 20)

Randomly selected number 1 of 1 is: 3.
That's a small odd number!



In [61]:
describeNums(5, 100)

Randomly selected number 1 of 5 is: 70.
That's an even number!

Randomly selected number 2 of 5 is: 8.
That's an even number!

Randomly selected number 3 of 5 is: 69.
That's a large odd number!

Randomly selected number 4 of 5 is: 91.
That's a large odd number!

Randomly selected number 5 of 5 is: 20.
That's an even number!



In [64]:
describeNums(10000, 10000000)

Randomly selected number 1 of 10000 is: 8679352.
That's an even number!

Randomly selected number 2 of 10000 is: 9711865.
That's a large odd number!

Randomly selected number 3 of 10000 is: 8381858.
That's an even number!

Randomly selected number 4 of 10000 is: 7035684.
That's an even number!

Randomly selected number 5 of 10000 is: 8160472.
That's an even number!

Randomly selected number 6 of 10000 is: 4037247.
That's a small odd number!

Randomly selected number 7 of 10000 is: 6560278.
That's an even number!

Randomly selected number 8 of 10000 is: 8606171.
That's a large odd number!

Randomly selected number 9 of 10000 is: 8533290.
That's an even number!

Randomly selected number 10 of 10000 is: 2601913.
That's a small odd number!

Randomly selected number 11 of 10000 is: 5977953.
That's a large odd number!

Randomly selected number 12 of 10000 is: 3190048.
That's an even number!

Randomly selected number 13 of 10000 is: 9826838.
That's an even number!

Randomly selected number 14

By now, most of this should be super-clear.  (Again, ignore the formatting operations on the strings.  It's unimportant.)  The new and important stuff is all in the first line.  
<p>
We start a function definition with the def statement, end the first line with a colon, and use an indented block to demarcate the stuff that counts as part of the function, just like we did with the loop and the conditional.  
<p>
In between, we give the function a name and its *formal parameters*.  Formal parameters are the stuff in the parentheses, and they serve as placeholders for the values you pass in when you call the function.  So in our first example, when we called describeNums(1, 20), the call passed 1 to the function and assigned it the temporary name iterations (from the first formal parameter); it passed 20 to the function and assigned it the temporary name toprange (from the second).  Then it executed the stuff in the function definition (the top-level indented block) using those values for the variables in it.  
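
If the placeholder idea still feels slippery, here's a minimal sketch.  The function showArguments is made up for this illustration (it isn't part of the lesson's code), and I've written print with parentheses so the cell runs on either Python 2 or Python 3.

```python
# Hypothetical function, for illustration only: it has two formal parameters.
def showArguments(iterations, toprange):
    # Inside the body, each name stands for whatever value the caller passed in.
    print('iterations is %s' % iterations)
    print('toprange is %s' % toprange)

showArguments(1, 20)     # 1 gets the name iterations, 20 gets the name toprange
showArguments(5, 100)    # same code, different values behind the same names
```

The body never changes between the two calls; only the values hiding behind the parameter names do.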
<p>
<hr>
<p>
There's one very important thing about functions.  You can only get a usable value out of them, to put in other expressions, if they include a return statement.  Otherwise, calling them just produces Python's special empty value, None.  The function we created just a moment ago is actually pretty useless in that respect: all it does is print stuff to the screen (printing is called a "side effect," don't worry too much about the term).  It doesn't, e.g., produce anything we could do math with.  By contrast, the following function is much more useful: 

In [None]:
import math
def crazyMath(aNumber):
    add1 = aNumber + 1
    subtract1 = aNumber - 1
    square = aNumber **2
    root = math.sqrt(aNumber)
    pluspi = aNumber + 3.1415
    pluse = aNumber + math.e
    naturallog = math.log(aNumber)
    stupidity = math.sqrt(((aNumber + 1)**2)-1)
    return stupidity

    

When we execute it, we get the stuff produced by the return statement, but *all we get out of it* is the stuff produced by the return statement.  All that other stuff, add1, subtract1, it gets created within the body of the function, and then forgotten when the function ends (this is a feature called "scoping" which we don't need to stress about right now).  Observe: 

In [None]:
print 5 + crazyMath(20)

If you actually carried out the math attached to the stupidity variable, you'd see that's the right number, and all the rest of that stuff just got forgotten.  Expanded:

In [None]:
a = 20
b = a + 1
c = b**2
d = c - 1
e = math.sqrt(d)
print 5 + e
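
To see the difference between printing and returning side by side, here's a small sketch.  Both function names are invented for this example, and print is written with parentheses so the cell runs on either Python 2 or Python 3.

```python
# Two hypothetical functions that do the same arithmetic.
def printsOnly(aNumber):
    doubled = aNumber * 2
    print(doubled)           # side effect only; there is no return statement

def returnsValue(aNumber):
    doubled = aNumber * 2
    return doubled           # hands the value back to whoever called the function

nothing = printsOnly(4)      # shows 8 on the screen, but the call evaluates to None
something = returnsValue(4)  # shows nothing, but the call evaluates to 8

print(nothing is None)       # True: there is no number here to do math with
print(something + 1)         # math works because a value came back
```

Note that doubled exists only inside each function while it runs; outside, only the returned value survives.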

<h2>Time for fun!</h2> 
This is really a solid basic grounding in coding in Python you have now!  We'll learn, in the next class, how to use some of these tools for data analysis, now that a block of python code won't be too intimidating.  If you want to learn how to put this stuff together in actual programs, here are some more resources: 
<ul>
<li><a href="http://learnpythonthehardway.org/book/">Learn Python the Hard Way</a> (free tutorial)</li>
<li><a href="https://automatetheboringstuff.com/">Automate the Boring Stuff with Python</a> (free version of a book)</li>
<li><a href="https://mitpress.mit.edu/books/introduction-computation-and-programming-using-python">Introduction to Computation and Programming Using Python</a> (a great book)</li>
</ul>

<b>Now let's mess around with some caselaw.</b>  Let's use our new coding skills to learn some stuff about the current Supreme Court!

I've created a big text file in a special format called JSON.  (You don't need to know how that works right now, just know it's *awesome*, and probably the only good thing JavaScript ever spawned.)  You can download it <a href="https://github.com/paultopia/code-data-lawstudents/blob/master/supcourt.json?raw=true">HERE</a>.  It contains ROUGHLY every Supreme Court decision from Jan 2011 (a rough guess as to when Kagan would have potentially started writing opinions) through as close as possible to today; it's the closest thing we have to a full dataset of the current court.  (n.b. *this has errors in it*: the source I downloaded it from seems to have made mistakes in the metadata used to get this, so there are old cases and non-supreme-court cases.)  Download the file now.  <p>Then let's find out what directory you're working in, via the next cell.  For now, don't worry about the specifics of the code; I'll explain the important stuff.

In [65]:
from os import getcwd as myDir
print myDir()

/Users/pauliglot/github/code-data-lawstudents


Go ahead and save the file you just downloaded into the directory that the last cell produced.  Call it *supcourt.json*.  Then run the next cell, which takes that file and reads it into memory.

In [66]:
import json
with open("supcourt.json") as supcourt:
    cases = json.load(supcourt)

What we have now is a complex list, where each element in the list is a case, and each case is a dictionary with stuff like year, opinions, etc. in it.  First, let's extract the data we want.  What we're really interested in is the text of the cases.  Unfortunately, it's not clean---the format of opinions in this file is basically equivalent to a scan with html code added, and all the opinions are run together.  But we can still learn interesting things from that ugly chunk of text.
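
If "a list of dictionaries" is hard to picture, here's a tiny stand-in you can poke at.  Everything in it is invented for illustration: the two cases, their text, and the "year" values are assumptions, and only the "html_with_citations" key is taken from the real file.  print is written with parentheses so the cell runs on either Python 2 or Python 3.

```python
import json

# Hypothetical two-case stand-in for supcourt.json (the real file is far larger).
sample = '''
[
  {"year": 2013, "html_with_citations": "<pre>first made-up opinion</pre>"},
  {"year": 2012, "html_with_citations": "<pre>second made-up opinion</pre>"}
]
'''

# json.loads reads JSON from a string; json.load (no s) reads it from a file.
sampleCases = json.loads(sample)

print(len(sampleCases))                    # how many cases are in the list
firstCase = sampleCases[0]                 # lists are numbered starting from 0
print(sorted(firstCase.keys()))            # the keys available for this case
print(firstCase["html_with_citations"])    # look up one value by its key
```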
<p>
I'd like you to write a loop that extracts all the text from each case and *appends* it to a new list.  Here are some tips that will help: 

* the text from each case can be collected using the dictionary key string "html_with_citations"

* in order to make a list and append to it, first create an empty list by the syntax:
<p>
LISTNAME = []
<p>
then, when it comes time to append an item to it, you'll want to use the following syntax:
<p>
LISTNAME.append(ITEMNAME)
<p>

Give your list the name *caselist*.

Ok, if all went well, you should have a list where each element is a string with the text of a case in it.  I'm going to go do the same thing in the next cell, but in a slightly more concise way. 

In [67]:
caselist = [item["html_with_citations"] for item in cases]
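
That one-liner is called a list comprehension.  In case it looks like magic, here's a sketch showing it does the same job as an empty list plus a for loop with append.  The two-item sampleCases list is a made-up stand-in for the real cases so the cell runs on its own, and print uses parentheses so it works on either Python 2 or Python 3.

```python
# Hypothetical stand-in for the real `cases` list.
sampleCases = [
    {"html_with_citations": "text of case one"},
    {"html_with_citations": "text of case two"},
]

# The long way: start with an empty list and append inside a for loop.
loopList = []
for item in sampleCases:
    loopList.append(item["html_with_citations"])

# The concise way: a list comprehension that builds the same list in one line.
conciseList = [item["html_with_citations"] for item in sampleCases]

print(loopList == conciseList)   # True: both approaches build the same list
```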

Let's look at the first item to make sure we have what we expect. 

In [68]:
print caselist[0]

<pre class="inline">                 Cite as: 570 U. S. ____ (2013)            1

                     SCALIA, J., dissenting

SUPREME COURT OF THE UNITED STATES
                         _________________

                          No. 13A57
                         _________________


 EDMUND G. BROWN, GOVERNOR OF CALIFORNIA,

     ET AL. v. MARCIANO PLATA AND RALPH 

                 COLEMAN, ET AL. 

                 ON APPLICATION FOR STAY
                       [August 2, 2013]

  The application for stay presented to JUSTICE KENNEDY
and by him referred to the Court is denied. JUSTICE ALITO
would grant the application for stay.
  JUSTICE SCALIA, with whom JUSTICE THOMAS joins,
dissenting.
  When this case was here two Terms ago, I dissented
from the Court’s affirmance of the injunction, because the
District Court’s order that California release 46,000 pris-
oners violated the clear limitations of the Prison Litigation
Reform Act, 18 U. S. C. §3626(a)(1)(A)—“besides defying
all sou

Looks like a case to me!  And as you can see, there's a lot of stuff mixed in there, text, HTML code, page headers, it's all kind of dirty.  With more time, we could clean that stuff up, but data cleaning is a slow and sometimes tedious task (it's how the data science types *really* earn their money), especially when working with text, and I don't know about you, but I have better things to do with my time.  So we'll work with what we got.
<p>
Here's something interesting we could maybe learn: which justices are the most influential?  Let's say that influentialness is an idea that captures how often they write, how long their opinions are, and how often they get cited.  
<p>
I'm being really lazy to define influentialness that way, because it's easy to get with the data we have.  Since the justices' names appear once per page of their opinions, they will show up more, the more they write and the longer they write; they'll also show up more the more they are cited by name (usually when their dissents or concurrences are cited).  So we can get this with a simple search.
<p>
First, we'll make a list of all the justices' names.  It looks like they tend to show up in all caps, and also that they tend to be followed by ", J.,"---this is very useful, for it will help us filter out, e.g., parties that share a name with the justices (especially Thomas and Roberts).  As the chief, Roberts tends to get a "C. J."
<p>
We'll also create a dictionary mapping each Justice's name to a count, and start it at 0.  That will give us an easy way to keep track of how many times a justice shows up.

In [69]:
justices = ['ROBERTS, C. J.,','SCALIA, J.,','KENNEDY, J.,','THOMAS, J.,','GINSBURG, J.,','BREYER, J.,','ALITO, J.,','SOTOMAYOR, J.,','KAGAN, J.,',]
justicecount = {justice: 0 for justice in justices}
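
The curly-brace line is the dictionary cousin of a list comprehension.  Here's a sketch of the same idea spelled out with a loop, using a shortened two-name list purely for illustration (print uses parentheses so it runs on either Python 2 or Python 3).

```python
shortList = ['SCALIA, J.,', 'KAGAN, J.,']   # shortened list, for illustration only

# The long way: make an empty dictionary, then add one key at a time.
longCount = {}
for justice in shortList:
    longCount[justice] = 0

# The concise way, in the same style as the cell above.
shortCount = {justice: 0 for justice in shortList}

print(longCount == shortCount)    # True: the two dictionaries are identical
print(shortCount['KAGAN, J.,'])   # look up a count by the justice's name
```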


Now we can loop over all the opinions and count the number of times each name shows up. <p> Pleasantly, python is kind enough to give us a function to count the number of occurrences of a given substring within a larger string.
<p>
Note the nested loops: we loop over our list of justices once per case.  Also note that the += syntax just adds a number to a preexisting number.  
<p>
Also note another use of the dot notation: we're using it this time not to say "call a function from this library," but to say "call a function that applies to this particular type of data."  Don't worry about that, it's an artifact of the fact that Python is (alas) an "object-oriented" language.  
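
Before we run this on the real opinions, here's a sketch of the counting function and += on a made-up snippet of text.  The snippet is invented for illustration, and print uses parentheses so the cell runs on either Python 2 or Python 3.

```python
# Made-up snippet of opinion text, for illustration only.
snippet = 'SCALIA, J., dissenting. THOMAS, J., concurring. SCALIA, J., dissenting.'

tally = 0
tally += snippet.count('SCALIA, J.,')   # same as: tally = tally + snippet.count(...)
tally += snippet.count('KAGAN, J.,')    # a substring that never appears adds 0
print(tally)
```

The count function belongs to strings, which is why it hangs off snippet with a dot rather than off a library name.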

In [70]:
for case in caselist:
    for justice in justices: 
        justicecount[justice] += case.count(justice)

That didn't do anything!  Why not?  Well, we didn't tell it to display any text.  Let's see what we got here: 

In [71]:
print justicecount

{'SOTOMAYOR, J.,': 1003, 'SCALIA, J.,': 1289, 'ALITO, J.,': 927, 'ROBERTS, C. J.,': 594, 'KAGAN, J.,': 487, 'GINSBURG, J.,': 641, 'KENNEDY, J.,': 400, 'THOMAS, J.,': 1292, 'BREYER, J.,': 962}


Ok, we have data!  But it looks like we have stupid data.  What's going on here?  Kennedy is less influential?  *Kennedy's the swing vote!  How can that be right?*
<p>
Well, there's an easy answer.  If you look at how opinions are written, the majority opinion isn't identified by the justice's name.  So when Kennedy writes the actual opinion of the court, it isn't counted.  And when one of Kennedy's opinions is *cited* as the opinion of the court, that isn't counted either.  So this is really more like an anti-influence metric: it captures Scalia writing rants in dissents in which he cites himself from prior dissents over and over, and doesn't capture Kennedy actually making the law.  
<p>
Guess our influence metric was pretty stupid, huh?  
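
The raw dictionary printout is hard to read as a ranking, so here's one way to sort it, sketched under the assumption that you just want the names ordered from most mentions to fewest.  The numbers are copied from the output above into a separate dictionary called counts so the cell runs on its own; sorted and its key and reverse options are standard Python, and print uses parentheses so it works on either Python 2 or Python 3.

```python
# Counts copied from the printed justicecount output above.
counts = {'SOTOMAYOR, J.,': 1003, 'SCALIA, J.,': 1289, 'ALITO, J.,': 927,
          'ROBERTS, C. J.,': 594, 'KAGAN, J.,': 487, 'GINSBURG, J.,': 641,
          'KENNEDY, J.,': 400, 'THOMAS, J.,': 1292, 'BREYER, J.,': 962}

# Sort the names by their counts, biggest first.
ranked = sorted(counts, key=counts.get, reverse=True)

for justice in ranked:
    print('%s %s' % (justice, counts[justice]))
```

Thomas and Scalia land at the top and Kennedy at the bottom, which is exactly the backwards-looking result the discussion above explains.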

Still, it's a pretty interesting figure.  What else can we learn?  Well, if we look at a few cases, we can get a sense of the format for what it looks like when there are concurrences and dissents.  Let's inspect a couple.  (Do that on your own.)

Based on looking through a few, it looks like we can get concurrences and dissents out of the syllabus, and for concurrences we would want at least "filed a concurring opinion," and "filed an opinion concurring in the judgment," while for dissents we'd want "filed a dissenting opinion."  So let's go search through and see if we can't count the number of cases that have concurrences and dissents.  (The percentages will be a little distorted, because our pool doesn't just have full-fledged cases in it, it has things like stay denials, etc.)

In [72]:
cddict = {'concurrence': 0, 'dissent': 0}
for case in caselist:
    if ('filed a concurring opinion' in case) or ('filed an opinion concurring in the judgment' in case):
        cddict['concurrence'] += 1
    if 'filed a dissenting opinion' in case:
        cddict['dissent'] += 1
print "RESULTS OF CONCURRENCE/DISSENT INVESTIGATION"
print "Number of cases: " + str(len(caselist))
print "Number of cases with concurrence: " + str(cddict['concurrence'])
print "Number of cases with dissent: " + str(cddict['dissent'])
percon = int(round((float(cddict['concurrence']) / len(caselist)) * 100))
perdis = int(round((float(cddict['dissent']) / len(caselist)) * 100))
print
print "There are concurrences in about %s%% of cases and dissents in about %s%% of cases." % (percon, perdis)

RESULTS OF CONCURRENCE/DISSENT INVESTIGATION
Number of cases: 522
Number of cases with concurrence: 98
Number of cases with dissent: 140

There are concurrences in about 19% of cases and dissents in about 27% of cases.
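
One detail in that cell is worth a second look: the float(...) in the percentage lines.  In Python 2, which this notebook uses, dividing one whole number by another with / throws away everything after the decimal point.  Here's a sketch of the arithmetic using the numbers from the results above; it uses // to show the whole-number division explicitly, and parenthesized print, so it behaves the same on Python 2 or Python 3.

```python
concurrences = 98    # from the results above
total = 522          # from the results above

# Whole-number division: this is what Python 2's / does with two whole numbers.
truncated = concurrences // total

# Converting one side to a float first keeps the decimals.
fraction = float(concurrences) / total
percent = int(round(fraction * 100))

print(truncated)     # 0, which would make a useless percentage
print(percent)       # 19, matching the result above
```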


<h2>Congratulations!</h2>  You've learned a bit of coding, and seen the cool stuff it can enable you to do in the law.  Now let's conclude with an: 
<p>
<hr>
<p>
<h1>Optional No-Credit Homework Assignment</h1>
<p>
We still would like to have some sense of how influential each justice is.  Our first influence metric was useless, but with the tools you have thus far, you ought to be able to build a better one.  Bragging rights to anyone who can come back next class and tell me: <p>
<b>In how many cases did each justice write the majority opinion?</b>