# Querying Textbook Data

### Objectives

- To use lists, loops, and conditionals to answer basic questions about a dataset
- To design data structures with specific purposes (use cases) in mind

Instructors should review the material in the `Preliminaries` section with the whole group before giving teams their assignments.

### Preliminaries

Before we return to our textbook dataset, we will introduce one new Python **control structure** and review a common **programming pattern**. 

#### Conditionals & Boolean values

In a very abstract sense, a computer is a machine for implementing binary logic. In binary logic, the only values allowed are `1`s and `0`s, which represent `True` and `False`, respectively.

Thus, at some level, _everything_ we do in computation can be reduced to `True` or `False` (from the computer's perspective). But from a programmer's perspective, this usually matters only in situations where we want the program to do different things depending on conditions that may or may not hold. These cases are called _conditionals_.

For instance, we can tell Python to compare two numbers, using the standard operators you might have learned in math classes: greater than, less than, equal to, etc.

Note that in Python, we represent equality by the **double equals sign** (`==`). A single equals sign is reserved for variable assignment.

Running the code below should return `True`, which is one of two special Python values called **Boolean values** (so named because the binary logic that computers implement was invented by the mathematician George Boole).

```python
book_price = 55.99 # Assignment: single equals sign
book_price < 60
```

```
True
```

##### Try it out!

Write an expression with the `book_price` variable that returns `False`.

(for instructors)

Solicit examples from the group. Make sure to demonstrate the case of equality, emphasizing the double equals sign, e.g.,

`book_price == 55` 

It may also help to point out that just as we saw that it's not possible to add a `str` value and an `int` (or `float`), we also can't use ordering comparisons (`<`, `>`, `<=`, `>=`) between string and numeric types:

```
'$55.99' > 60
```
The code above will produce a `TypeError`.
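For reference, here is a quick tour of all six comparison operators applied to `book_price`, plus the string-versus-number case from the note above. This is a minimal sketch; since the exact wording of a `TypeError` message can differ between Python versions, the code catches the error and prints whatever message Python produces.

```python
book_price = 55.99

# The six comparison operators; each expression evaluates to a Boolean
print(book_price < 60)      # True
print(book_price > 60)      # False
print(book_price <= 55.99)  # True
print(book_price >= 56)     # False
print(book_price == 55)     # False
print(book_price != 55)     # True

# Testing a string and a number for equality is allowed (it is simply False),
# but ordering comparisons between them raise a TypeError
print('$55.99' == 60)       # False
try:
    '$55.99' > 60
except TypeError as err:
    print("TypeError:", err)
```

Note the contrast on the last lines: `==` quietly returns `False` for mismatched types, while `>` refuses to guess and stops with an error.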


#### `if` statements

Usually, we want to do more than evaluate whether an expression (like `book_price < 60`) is `True` or `False`. Usually, we want the program to _take some action_ based on the outcome of that evaluation.

For this, we use the `if` statement. 

To print a message if the value of `book_price` is above a certain threshold, we can write the following:

```python
if book_price > 100:
    print("That's an expensive textbook!")
```

Running the code above produces no output, at least not if `book_price` is assigned as above (to the float value `55.99`). 

If we change the value of `book_price` so that the condition is `True`, we should see our message:

```python
book_price = 101.99
if book_price > 100:
    print("That's an expensive textbook!")
```

```
That's an expensive textbook!
```


What if we want to check for books within a certain range of prices? 

We can use the **Boolean operator** `and` to do this. The `and` operator links two conditions: if both sub-conditions are `True`, then the whole condition (with the `and`) is also `True`. Otherwise, it is `False`. 

The other common Boolean operators are `or` and `not`. `or` is `True` if _at least one_ sub-condition is `True`. `not` flips (inverts) a condition: so `not True` is `False`.

```python
book_price = 55.99
if book_price >= 20 and book_price <= 100:
    print("Not too expensive")
```

```
Not too expensive
```
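The cell above uses `and`; a similar sketch can illustrate `or` and `not`. Here `on_sale` is a hypothetical flag introduced only for this example. The last line shows Python's chained comparison syntax, which expresses the same range test as the `and` condition above.

```python
book_price = 55.99
on_sale = False  # hypothetical flag, for illustration only

# `or`: True if at least one sub-condition is True
if book_price < 20 or book_price > 100:
    print("Outside the usual price range")
else:
    print("Within the usual price range")

# `not`: inverts a condition, so `not False` is True
if not on_sale:
    print("Full price")

# Chained comparison: equivalent to book_price >= 20 and book_price <= 100
print(20 <= book_price <= 100)  # True
```

With these values the sketch prints `Within the usual price range`, then `Full price`, then `True`.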


In the examples above, our code either performed an action or not, depending on a single condition. We can also specify several alternative conditions, of which only one branch will run. For example, if `book_price` is between `20` and `100` (inclusive), the following code will print `Not too expensive`, but it will print other messages if `book_price` is less than `20` or greater than `100`.

```python
if book_price >= 20 and book_price <= 100:
    print("Not too expensive")
elif book_price < 20:
    print("That's a relatively cheap textbook.")
else:
    print("That's an expensive textbook!")
```

```
Not too expensive
```
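To see every branch run, we can loop over a handful of sample prices (made up for illustration), including the boundary values `20` and `100`. Because the first condition uses `>=` and `<=`, both boundaries land in the "Not too expensive" branch. The loop variable is named `price` so that it doesn't overwrite `book_price`.

```python
# Sample prices chosen to hit every branch and both boundaries
for price in [15.00, 20, 55.99, 100, 101.99]:
    if price >= 20 and price <= 100:
        message = "Not too expensive"
    elif price < 20:
        message = "That's a relatively cheap textbook."
    else:
        message = "That's an expensive textbook!"
    print(price, "->", message)
```

Only `15.0` reaches the `elif` branch and only `101.99` falls through to `else`; the other three prices match the first condition.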


Here are some rules of thumb for writing `if` statements in Python:
- You can have as many `elif` statements as you want, provided they follow an `if` statement.
- The `else` statement is a catch-all: it will be executed if none of the preceding `if` or `elif` statements evaluates to `True`.
- Otherwise, only the _first_ `if`/`elif` statement whose condition is `True` will be executed, and _all the rest_ will be skipped. This means that if the conditions you're testing for are not mutually exclusive, you should write the most specific test first.
- Each `if`/`elif`/`else` statement ends with a colon and is followed by an **indented block** of code. This is the same pattern we saw with `for` loops. 

(for instructors)

Make sure to take questions at this point. In particular, the implications of the third rule above may not be immediately apparent to learners. An example may help:

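```
pub_price = 100.99
on_sale = True
# Most specific test first: sale books over $100 get the smallest markup
if pub_price > 100 and on_sale:
    retail_price = pub_price * 1.25
# Any other sale book
elif on_sale:
    retail_price = pub_price * 1.35
# Everything not on sale
else:
    retail_price = pub_price * 1.40
```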
In this example, we calculate the retail price from a publisher's price, using a markup percentage. The percentage to be applied depends on whether or not the book is on sale, with an extra discount for books over $100 that are on sale.

If we didn't put the compound condition in the first position (`if pub_price > 100 and on_sale`), then sale books over $100 would not get the appropriate discount: the broader `on_sale` test would match them first, and the compound test would never be reached.
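If learners need convincing, a side-by-side sketch can help. It reuses the markup rates from the example above but uses a round, hypothetical price of $200 so the results are easy to read, and it runs the same tests in both orders.

```python
pub_price = 200.00  # hypothetical price, chosen for readable results
on_sale = True

# Correct order: the most specific test comes first
if pub_price > 100 and on_sale:
    retail_price = pub_price * 1.25
elif on_sale:
    retail_price = pub_price * 1.35
else:
    retail_price = pub_price * 1.40
print("Specific test first:", round(retail_price, 2))

# Reversed order: `on_sale` is True, so the first branch wins and
# the compound test below it is never reached
if on_sale:
    wrong_price = pub_price * 1.35
elif pub_price > 100 and on_sale:
    wrong_price = pub_price * 1.25
else:
    wrong_price = pub_price * 1.40
print("Broad test first:", round(wrong_price, 2))
```

With the specific test first, the sale book gets the 25% markup (`250.0`); with the broad test first, it gets the 35% markup (`270.0`).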
    

#### A `for` loop pattern

A very common use of `for` loops in Python is to **accumulate** (or aggregate, or otherwise keep track of) certain values when processing a list. 

We've seen a version of this pattern before: in the homework, you used a loop to convert some prices from strings to floats and then to adjust the prices for sales tax. You collected the adjusted prices in a new list.
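As a reminder, the list-building version of the pattern might look something like the sketch below. The price strings and the 7% tax rate are hypothetical; your homework may have used different values.

```python
# Hypothetical price strings and tax rate, for illustration only
price_strings = ['55.99', '101.99', '12.00']
tax_rate = 0.07

adjusted_prices = []  # accumulator: defined before the loop, starts empty
for price in price_strings:
    price_with_tax = float(price) * (1 + tax_rate)  # str -> float, then add tax
    adjusted_prices.append(round(price_with_tax, 2))

print(adjusted_prices)
```

The key move is that `adjusted_prices` lives outside the loop, so each pass adds to the same list instead of starting over.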

In the example below, we'll use this pattern to keep track of a single value. The value we're tracking will simply be the number of items in the list.

We'll use our bookstore dataset, so the first step is to read the file into Python from disk.

```python
import json
with open('../../data/bookstore-data-simplified.json') as f:
    bkst_data = json.load(f)
```

To use this pattern, you usually have at least _three_ variables to deal with:
1. The variable that holds your list (`bkst_data`)
2. A loop variable (`course` in the code below)
3. A variable defined _before_ the loop that will be used to track or accumulate values. Since this loop is just counting items, we'll call this variable `counter`. We set `counter` initially to `0`, since before we run the loop, we haven't counted any items.

Note also that `counter += 1` is shorthand for `counter = counter + 1`. Either way of writing that expression is fine.

```python
counter = 0
print("Counter before loop", counter)
for course in bkst_data:
    counter += 1
print("Counter after loop", counter)
```

```
Counter before loop 0
Counter after loop 1250
```


(for instructors)

Point out that the `print()` statements emphasize the fact that the `counter` variable is incremented by the loop, so that it goes from 0 to 1250 (the number of items in the list).

Point out that this code essentially does the same thing as `len(bkst_data)`. 
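The same pattern can track more than one value at once. In this sketch (the prices are made up), one accumulator counts items and another keeps a running total; together they give an average. The count matches `len()`, as noted above.

```python
# Hypothetical list of prices, for illustration only
prices = [55.99, 101.99, 12.00, 74.50]

counter = 0    # how many items we've seen so far
total = 0.0    # running sum of the prices seen so far
for price in prices:
    counter += 1
    total += price

print(counter == len(prices))  # True: the counter agrees with len()
print("Average price:", round(total / counter, 2))
```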

### Team Activity



```python
len([b for b in bkst_data if len(b['texts']) > 0])
```

```
55
```
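The expression in the cell above packs a loop and a condition into a single line, called a list comprehension. The same count can be computed with the accumulator pattern from the Preliminaries plus an `if` statement. So that the sketch runs on its own, it uses a tiny hypothetical stand-in for `bkst_data`: it assumes only that each record is a dictionary with a `'texts'` list, and the strings inside those lists are placeholders.

```python
# Hypothetical stand-in for bkst_data; real records have more fields
sample_data = [
    {'texts': []},
    {'texts': ['text A']},
    {'texts': ['text B', 'text C']},
    {'texts': []},
]

courses_with_texts = 0
for course in sample_data:
    # Only increment the counter when the condition is True
    if len(course['texts']) > 0:
        courses_with_texts += 1

print("Loop count:", courses_with_texts)
# The list-comprehension version gives the same answer
print("Comprehension count:", len([b for b in sample_data if len(b['texts']) > 0]))
```

Both lines report `2` for this stand-in data; run against the real `bkst_data`, the loop should agree with the comprehension's result.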