# Querying Textbook Data

## Objectives

- To use lists, loops, and conditionals to answer basic questions about a dataset
- To design data structures with specific purposes (use cases) in mind

## Preliminaries

Before we return to our textbook dataset, we will introduce one new piece of Python syntax and review a common programming pattern. 

### Conditionals & Boolean values

In a very abstract sense, a computer is a machine for implementing binary logic. In binary logic, the only values allowed are `1`s and `0`s, which represent `True` and `False`, respectively. 

Thus, at some level, _everything_ we do in computation can be reduced to `True` or `False` (from the computer's perspective). But from a programmer's perspective, this usually only matters in situations where we want the program to do different things based on certain conditions that may or may not obtain. These cases are called **conditional expressions**. 

For instance, we can tell Python to compare two numbers, using the standard operators you might remember from your math courses: greater than, less than, equal to, etc.

Note that in Python, we represent equality by the **double equals sign** (`==`). A single equals sign is reserved for [variable](https://gwu-libraries.github.io/python-camp/glossary.html#term-variable) assignment.

Running the code below should return `True`, which is one of two special Python values called **Boolean values** (so named because the binary logic that computers implement was invented by the mathematician George Boole).

```python
book_price = 55.99 # Assignment: single equals sign
book_price < 60
```
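
A comparison produces a value that you can store in a variable and inspect like any other. The short sketch below reuses the `book_price` value from above (the name `is_affordable` is made up for illustration). It shows that the result has the type `bool`, and that Python's Boolean values correspond to the `1` and `0` of binary logic when converted to integers.

```python
book_price = 55.99
is_affordable = book_price < 60   # store the Boolean result in a variable

print(is_affordable)        # True
print(type(is_affordable))  # <class 'bool'>

# Python's Boolean values correspond to the integers 1 and 0
print(int(True), int(False))  # 1 0
```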

```{admonition} Try it out!
:class: try-it-out

Write an expression with the `book_price` variable that returns `False`.

```

```python
# Your code here
```

### `if` statements

Usually, we want to do more than evaluate whether an expression (like `book_price < 60`) is `True` or `False`: we want the program to _take some action_ based on the outcome of that evaluation.

For this, we use an **if statement**. 

To print a message if the value of the `book_price` variable is above a certain threshold, we can write the following:

```python
if book_price > 100:
    print("That's an expensive textbook!")
```

Running the code above produces no output, at least not if `book_price` is assigned as above (to the [float](https://gwu-libraries.github.io/python-camp/glossary.html#term-float) value `55.99`). 

If we change the value of `book_price` so that the condition is `True`, we should see the intended message:

```python
book_price = 101.99
if book_price > 100:
    print("That's an expensive textbook!")
```

What if we want to check for books within a certain range of prices? 

We can use the **Boolean operator** `and` to do this. The `and` operator links two conditions: if both sub-conditions are `True`, then the whole condition (with the `and`) is also `True`. Otherwise, it is `False`. 

The other common Boolean operators are `or` and `not`. `or` is `True` if _at least one_ sub-condition is `True`. `not` flips (inverts) a condition: so `not True` is `False`.

```python
book_price = 55.99
if book_price >= 20 and book_price <= 100:
    print("Not too expensive")
```
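
The `or` and `not` operators follow the same pattern. Here is a sketch using the same `book_price` value; the thresholds and messages are arbitrary, chosen only for illustration. It also shows that Python lets you chain comparisons, so `20 <= book_price <= 100` is equivalent to the `and` condition above.

```python
book_price = 55.99

# or: True if at least one sub-condition is True
if book_price < 20 or book_price > 100:
    print("Outside the usual range")
else:
    print("Within the usual range")

# not: inverts a condition
if not book_price > 100:
    print("Not over $100")

# A chained comparison, equivalent to: book_price >= 20 and book_price <= 100
print(20 <= book_price <= 100)   # True
```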

In the examples above, our code either performed an action or did nothing, depending on a single condition. We can also chain several conditions with `elif` and `else`, so that exactly one of several blocks of code runs. For example, if `book_price` is between `20` and `100`, the following code prints `Not too expensive`; it prints a different message if `book_price` is less than `20` or greater than `100`.

```python
if book_price >= 20 and book_price <= 100:
    print("Not too expensive")
elif book_price < 20:
    print("That's a relatively cheap textbook.")
else:
    print("That's an expensive textbook!")
```

#### Notes

Here are some rules of thumb for writing `if` statements in Python:
- You can have as many `elif` statements as you want, provided they follow an `if` statement.
- The `else` statement is a catch-all: its block runs only if none of the preceding `if` or `elif` conditions is `True`.
- Python checks the conditions in order: the block under the _first_ `if`/`elif` whose condition is `True` runs, and _all the rest_ are skipped. So if the conditions you're testing for are not mutually exclusive, you should write the most specific test first. 
- Each `if`/`elif`/`else` statement ends with a colon and is followed by an indented [block](https://gwu-libraries.github.io/python-camp/glossary.html#term-block) of code. This is the same pattern we saw with `for` loops. 
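
To see why order matters when conditions overlap, consider the sketch below, which uses made-up prices and labels. Any price above `200` is also above `100`, so the more specific test comes first. If you swapped the first two tests, a price of `250.00` would match `price > 100` first and be labeled `Expensive`, and the `Very expensive!` branch would never run.

```python
# Made-up prices, chosen so that the two conditions below overlap
prices = [250.00, 150.00, 55.99]
labels = []

for price in prices:
    if price > 200:           # most specific test first
        label = "Very expensive!"
    elif price > 100:         # only checked when price > 200 is False
        label = "Expensive"
    else:                     # catch-all: neither condition was True
        label = "Reasonable"
    labels.append(label)

print(labels)  # ['Very expensive!', 'Expensive', 'Reasonable']
```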



### A `for` loop pattern

A very common use of a [for loop](https://gwu-libraries.github.io/python-camp/glossary.html#term-for-loop) in Python is to count, aggregate, or otherwise keep track of certain values when processing a list. 

We've seen a version of this pattern before: in the homework, you used a loop to convert some prices from strings to floats and then to adjust each price with sales tax. You collected the adjusted prices in a new list.

In the example below, we'll use this pattern to keep track of a single value. The value we're tracking will simply be the number of items in the list.

We'll use our bookstore dataset, so the first step is to read the file into Python from disk.

```python
import json
with open('../../../data/bookstore-data-summer-2023.json') as f:
    bkst_data = json.load(f)
```

To use this pattern, you usually have at least _three_ variables to deal with:
1. The variable that holds your list (`bkst_data`)
2. A loop variable (`course` in the code below)
3. A variable defined _before_ the loop that will be used to track or accumulate values. Since this loop is just counting items, we'll call this variable `counter`. We set `counter` initially to `0`, since before we run the loop, we haven't counted any items.

Note also that `counter += 1` is shorthand for `counter = counter + 1`. Either way of writing that expression is fine.

```python
counter = 0
print("Counter before loop", counter)
for course in bkst_data:
    counter += 1
print("Counter after loop", counter)
```
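
Because the loop above counts every item, its result should match what the built-in `len` function reports. The sketch below checks that on a small made-up list (the names `courses` and `prices` and their contents are hypothetical stand-ins, not values from `bkst_data`). It also shows that the same pattern can accumulate a running total instead of a count.

```python
# A small made-up list standing in for bkst_data
courses = ["Course A", "Course B", "Course C"]

counter = 0
for course in courses:
    counter += 1          # same as: counter = counter + 1
print(counter, len(courses))   # 3 3

# The same pattern can accumulate a running total instead of a count
prices = [55.99, 101.99, 12.50]
total = 0
for price in prices:
    total += price
print(round(total, 2))   # 170.48
```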

## Team Activities

### 1. Counting over data

As we saw on Day 1, not all courses report textbooks to the bookstore.

**Question**: _What percentage of courses have reported textbooks?_ 

#### Try it out!

Working with your team, write some code that will answer that question, using the `bkst_data` dataset. You'll find some hints below, but before you look at them, make sure your team discusses the _logic_ you might use to approach this problem. 

Here are some questions that can help:

- A percentage expresses a relationship between two quantities. What are the two quantities involved in calculating this percentage?
- How can you adapt the previous "counter" pattern to this task?
- You want to count those items in the list that meet a certain condition. What is that condition?

And if you get stuck, one of the Python Camp facilitators will be happy to help!



```python
# Your code here
```

#### Hint
- The textbooks associated with each course comprise a list.
- That list is associated with the `texts` key in the dictionary that represents each course in `bkst_data`.
- If there are no textbooks for a given course, the `texts` list will be empty.
- We can find the length of a list using the `len` function.
- An empty list has a length of 0.
- This exercise requires you to use an `if` statement inside a `for` loop. 
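
To see how `len` behaves on a `texts` list, here is a sketch with two hypothetical course dictionaries. Only the `texts` key comes from the hints above; the real records in `bkst_data` contain other keys as well, and the items in their `texts` lists are more detailed than the invented book names used here.

```python
# Two hypothetical course records (real records in bkst_data have more keys)
course_with_books = {"texts": ["Book A", "Book B"]}
course_without_books = {"texts": []}

print(len(course_with_books["texts"]))      # 2
print(len(course_without_books["texts"]))   # 0

# A condition you could test inside an if statement
print(len(course_without_books["texts"]) == 0)   # True
```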



### 2. Querying nested data

This next activity is a little more challenging. 

**Question**: _What is the cost of the most expensive textbook?_

#### Try it out!

Working with your team, write some code that will answer this question, using the `bkst_data` dataset. 



```python
# Your code here
```

#### Hint
Here are some hints upfront. As before, make sure you discuss the logical approach with your team before writing any code.

- In a variation on the "counter" pattern we've used above, instead of incrementing the counter variable, we simply replace its value whenever a certain condition is met. In other words, we can use a tracking variable to hold the price of the most expensive textbook we've seen so far, as we loop through the list.


- Since the textbooks for each course are in a list nested within the course dictionary, we'll need to use **nested** `for` loops. Below is an example, merely to illustrate the concept, that multiplies each of the numbers `1`, `2`, `3` by each of the numbers `4`, `5`, `6` (and prints the products):
  ```python
  for i in [1, 2, 3]:
      for j in [4, 5, 6]:
          print(i * j)        # Output: 4, 5, 6, 8, 10, 12, 12, 15, 18
  ```


- In the homework, you wrote some code to convert a book price from its string representation to a `float`. You'll need to do the same thing here in order to compare prices. 

  For example, run this expression in a code cell below: `'$100.99' > '$11.99'`. The output should be `False`. Can you guess why?
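
Putting the first and third hints together, here is a sketch of the "keep the largest value seen so far" variation on a flat, made-up list of price strings. It assumes prices are formatted like `'$100.99'`, with a single leading dollar sign, and it starts `highest` at `0.0` on the assumption that no price is negative. The real dataset may need additional cleaning, and you will still need a nested loop to reach the textbooks inside each course.

```python
# Made-up price strings, assumed to look like '$100.99'
prices = ["$11.99", "$100.99", "$55.99"]

highest = 0.0   # the largest price seen so far (assumes prices are not negative)
for price in prices:
    value = float(price.replace("$", ""))   # '$100.99' -> 100.99
    if value > highest:
        highest = value                      # replace the value, rather than increment it
print(highest)   # 100.99
```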
  


### 3. More queries

**Questions**

1. Can you and your team adapt the preceding solution so as to identify _the course_ with the most expensive textbook? 



2. What about the _title_ and _author_ of the most expensive textbook?

#### Try it out!

Working with your team, write some code that will answer these questions. If you want some hints, please just ask one of the facilitators!



```python
# Your code here
```

```python
# Your code here
```

### Wrap up

Today you did the following:

- Used Python dictionaries to store and update data about your team.
- Used conditionals, `for` loops, and the "counter" pattern to answer basic questions about the textbook dataset. 

In the homework tonight, you'll build on this knowledge to practice more ways of interacting with the textbook data. You'll also learn some new tools to help you take your 

Tonight's homework has two parts:

1. A section with self-guided exercises for you to do on your own. In your teams tomorrow, you'll build on the skills introduced in these exercises. 
2. A shorter section of autograded exercises which you will submit to our GitHub Classroom site to record your score. You can resubmit the exercises as many times as you like. Those who successfully complete the autograded homework exercises will receive a certificate of completion at the end of Python Camp. 