<a href="https://colab.research.google.com/github/organisciak/Scripting-Course/blob/master/labs/02-database-intropython-lab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab Part 1: SQLite

*Reminder - save your work. Go to* File > Save a Copy in Drive *to ensure that you have your work saved.*

For this portion of the lab, we won't use Python. Instead, we'll work directly with SQLite. Luckily, we can still do this in *Colab*.

1. To tell a *Colab* notebook that you'd like to talk to SQLite without using Python, run the following code:

In [None]:
%load_ext sql
%sql sqlite://

The first line loads the `sql` extension, which adds the special `%sql` and `%%sql` commands to *Colab*. The second line uses `%sql` to connect to a temporary database.

Now, if you have `%%sql` at the start of a code cell, it will run SQL in your connected database.

*Tip*: If you want your DB to be saved to a file, you can give it a name when you connect, like this: `%sql sqlite:///name.db`.
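You won't need Python for this part of the lab, but it may help to see what these two kinds of connection look like in plain Python. The sketch below uses the standard library's `sqlite3` module, which is an assumption here: the `%sql` commands come from a separate notebook extension. The special name `':memory:'` plays the role of `%sql sqlite://`, and `name.db` is the example file name from the tip above.

```python
import sqlite3

# ':memory:' opens a temporary database that disappears when the
# connection closes, similar to `%sql sqlite://`
temp_conn = sqlite3.connect(':memory:')

# A file name instead stores the database on disk,
# similar to `%sql sqlite:///name.db`
file_conn = sqlite3.connect('name.db')

# Ask SQLite for its version, just to show the connection works
version = temp_conn.execute('SELECT sqlite_version()').fetchone()[0]
print('SQLite version:', version)

temp_conn.close()
file_conn.close()
```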

## Create a Table, Insert Data, and Select in SQLite

Here is a small dataset of British cotton workers in 1886, and their average wage:

| worker | num_workers | wage |
|--------|-------------|------|
| Big piecer | 5902 | 233.59 |
| Engineman | 909 | 388.47 |
| Foreman | 2883 | 466.54 |
| Grinders | 1983 | 399.9 |
| Labourer | 208 | 269.73 |
| Mechanic | 669 | 440.82 |
| Others | 2966 | 311.64 |
| Sizer | 597 | 469.62 |
| Spinner | 6951 | 408.97 |
| Twister | 865 | 357.2 |

Let's consider how to put this into a database. 

First, you need to `CREATE TABLE`, then `INSERT` records. To see the records, use `SELECT`.

The following two cells run the first two of these commands.

In [None]:
%%sql
CREATE TABLE worker_wages (role, num_workers, wage);

 * sqlite://
Done.


[]

In [None]:
%%sql
INSERT INTO worker_wages VALUES
    ('Big piecer', 5902, 233.59),
    ('Engineman', 909, 388.47),
    ('Foreman', 2883, 466.54),
    ('Grinders', 1983, 399.9),
    ('Labourer', 208, 269.73),
    ('Mechanic', 669, 440.82),
    ('Others', 2966, 311.64),
    ('Sizer', 597, 469.62),
    ('Spinner', 6951, 408.97),
    ('Twister', 865, 357.2);

 * sqlite://
10 rows affected.


[]

*Consider the commands above. What happens if you try to run the cell again? What if you change `CREATE TABLE` to `CREATE TABLE IF NOT EXISTS`? What do you think adding `LIMIT 2` to the end of a `SELECT` command would do? (We haven't learned it yet!)*

*Tip: Have you inserted your data more than once? You can always delete the table and start from the beginning - `DROP TABLE worker_wages;`*

*Tip 2: If you get a 'This result object does not return rows' error, don't worry: your query still ran.*

---

Let's see **all** of our results:

In [None]:
%%sql
SELECT * FROM worker_wages;

 * sqlite://
Done.


role,num_workers,wage
Big piecer,5902,233.59
Engineman,909,388.47
Foreman,2883,466.54
Grinders,1983,399.9
Labourer,208,269.73
Mechanic,669,440.82
Others,2966,311.64
Sizer,597,469.62
Spinner,6951,408.97
Twister,865,357.2


It's in! Let's look for wages under 350:

In [None]:
%%sql
SELECT * FROM worker_wages WHERE wage < 350;

 * sqlite://
Done.


role,num_workers,wage
Big piecer,5902,233.59
Labourer,208,269.73
Others,2966,311.64


You can use `AND` to combine conditions in a `WHERE` clause. For example, here are the roles with more than 1,000 workers that earn over £300:

In [None]:
%%sql
SELECT * FROM worker_wages WHERE wage > 300 AND num_workers > 1000;

 * sqlite://
Done.


role,num_workers,wage
Foreman,2883,466.54
Grinders,1983,399.9
Others,2966,311.64
Spinner,6951,408.97
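For comparison, here is a rough sketch of the same steps (create, insert, and a `SELECT` with `WHERE ... AND`) run from Python with the standard library's `sqlite3` module. It opens its own throwaway in-memory database, so it doesn't touch the `worker_wages` table in your `%sql` connection. The `?` placeholders and `executemany()` are `sqlite3` features. You don't need them for the `%%sql` cells.

```python
import sqlite3

# A separate, temporary in-memory database
conn = sqlite3.connect(':memory:')

# CREATE TABLE: the same statement as the %%sql cell
conn.execute('CREATE TABLE worker_wages (role, num_workers, wage)')

# INSERT: executemany() runs the INSERT once per tuple; each `?` is filled in order
rows = [
    ('Big piecer', 5902, 233.59), ('Engineman', 909, 388.47),
    ('Foreman', 2883, 466.54), ('Grinders', 1983, 399.9),
    ('Labourer', 208, 269.73), ('Mechanic', 669, 440.82),
    ('Others', 2966, 311.64), ('Sizer', 597, 469.62),
    ('Spinner', 6951, 408.97), ('Twister', 865, 357.2),
]
conn.executemany('INSERT INTO worker_wages VALUES (?, ?, ?)', rows)
conn.commit()

# SELECT with two conditions joined by AND
query = 'SELECT * FROM worker_wages WHERE wage > 300 AND num_workers > 1000'
results = conn.execute(query).fetchall()
print(results)

conn.close()
```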


One final command: `DROP TABLE`. If you mess something up, you can always drop the table and start again. Be careful doing this in a real, important database: there is no _UNDO_!

Here's how you would run `DROP TABLE`:

```sql
%%sql
DROP TABLE worker_wages;
```

No need to drop it now, since the next exercise question uses the `worker_wages` table.
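To see what "no _UNDO_" means in practice, here is a small sketch using Python's built-in `sqlite3` module. It works on a hypothetical `scratch` table in a separate in-memory database, so your `worker_wages` table is safe. After `DROP TABLE`, the table and its rows are gone, and any query against it raises an error.

```python
import sqlite3

# A throwaway in-memory database with a hypothetical 'scratch' table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE scratch (name, value)')
conn.execute("INSERT INTO scratch VALUES ('a', 1)")

conn.execute('DROP TABLE scratch')

# The table no longer exists, so selecting from it fails
try:
    conn.execute('SELECT * FROM scratch')
except sqlite3.OperationalError as err:
    message = str(err)
    print('Error:', message)  # Error: no such table: scratch

conn.close()
```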

## Exercises

**Q1**: *(5pts)* Write the SQL to add these two rows to the `worker_wages` table:
        
| worker | num_workers |	wage |
|--------|-------------|-------|
|Drawer |	375|	328.98|
|Warehouseman|	1586|	308.73|

In [None]:
# Scratchpad / Workspace

In [None]:
# Answer-Q1 (Write your answer here, but keep this line at the top)

In [None]:
# Grade check
#@markdown *Check your answer*: Run this cell to check if you did Q1 properly
sqlresults = %sql SELECT * FROM worker_wages
df = sqlresults.DataFrame()
for role, num_workers, wage in [('Drawer', 375, 328.98), ('Warehouseman', 1586, 308.73)]:
  subset = df[df.role == role]
  assert subset.shape[0] > 0, "I don't see '{}' in the data.".format(role)
  assert subset.shape[0] == 1, ("I see '{}' more than once. Did you add the data more than once? "
                                "(This doesn't affect this question - you probably did it right, but ran it "
                                "twice, which is something to look out for. Try dropping the table and "
                                "creating it anew.)").format(role)
  assert subset.iloc[-1].num_workers == num_workers, "Num workers doesn't seem right for {}".format(role)
  assert subset.iloc[-1].wage == wage, "Wage doesn't seem right for {}".format(role)
print("It works. Good work!".upper())


**Q2** *(5pts)*: What's wrong with this SQL?
```sql
    INSERT INTO worker_wages VALUES (Weaver, 8577, 273.97);
```

In [None]:
q2_answer = "" #@param {type:"string"}

Consider the following dataset of people's heights and weights, along with their self-reported heights and weights:

![](https://github.com/organisciak/Scripting-Course/blob/master/images/week2-heights.png?raw=1)

- __Q3__ *(5pts)*: What's the SQL to create the table for this dataset? Include appropriate data types for the columns and call it 'heights'.

In [None]:
# Answer-Q3 (Write your answer here)


- **Q4**: To answer each of the following questions, fill in the blank:

```sql
    SELECT * FROM heights
    WHERE {{ blank }}
```

How would you select:

In [None]:
#@markdown - **Q4a**: *(5pts)* The men that say they're taller than 100cm?
q4a_answer = "" #@param {type:'string'}
#@markdown - **Q4b**: *(5pts)* The people that say they're taller than they are?
q4b_answer = "" #@param {type:'string'}
#@markdown - **Q4c**: *(10pts)* The women that overestimate their weight and underestimate their height?
q4c_answer = "" #@param {type:'string'}

In [None]:
#@markdown Tips:
#@markdown - Don't know how to combine conditions in a `WHERE` clause? It was introduced before the exercises, go back to check!
#@markdown - Think about what is being compared in the `WHERE` clause. How many comparisons are needed, connected by `AND`? What operators are needed to make those comparisons? What value or variable goes on each side of the operator?
#@markdown - If you want to test it, you can add a few dummy rows, or run this 
#@markdown cell, which will download the table and insert it into a `heights` table.
from sqlalchemy import create_engine
import pandas as pd
import os
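if not os.path.exists('week1.db'):
    # Download the height/weight data and save it as a 'heights' table
    data = pd.read_csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/carData/Davis.csv')
    engine = create_engine('sqlite:///week1.db')
    data.to_sql('heights', engine, index=False)
# Connect to the database file, whether or not it was just created
%sql sqlite:///week1.db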

'Connected: @week1.db'

**Q5** *(5pts)*: Describe in regular English what the following query does:

```sql
SELECT author,book FROM books
    WHERE release_year > 2000
        AND sales > 100000
        AND author != 'J.K. Rowling'
```

In [None]:
q5_answer = "" #@param {type:"string"}

# Part 2: More Python

Today, we're working on more Python basics as we build toward the SciPy stack of data science tools.

## Loops

Recall the `list` type, created like this:

In [None]:
fruits = ['apple', 'banana', 'strawberry', 'mango']

If we want to run through each value in the list, we can use a `for` loop:

```python
for value in list:
    do_something
```

For example:

In [None]:
for fruit in fruits:
    print(fruit)

apple
banana
strawberry
mango


A number of things happened here. Note:

- The `for` loop runs four times: once for each value of the `fruits` list.
- At the start of each pass through the loop, the next value from `fruits` is assigned to the temporary variable `fruit`. Essentially, the code above is running the following commands:

```python
fruit = 'apple'
print(fruit)
fruit = 'banana'
print(fruit)
fruit = 'strawberry'
print(fruit)
fruit = 'mango'
print(fruit)
```

- `print(...)` will print any variable to the screen.
- We use indentation to show that `print(fruit)` is part of the `for` loop. Python runs all of the indented code for each value before moving on to the code after the loop. See:

In [None]:
for fruit in fruits:
    print(fruit)

print('this prints after the loop')

apple
banana
strawberry
mango
this prints after the loop


- There can be multiple lines inside the loop:

In [None]:
for word in ['hello', 'world']:
    x = word.capitalize()
    print(x)

Hello
World
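Here is one more sketch that pulls these points together. Two indented lines run once per fruit, then an unindented line runs once after the loop. It also shows that the temporary variable `fruit` keeps the last value it was assigned. The variable name `length` is just for illustration.

```python
fruits = ['apple', 'banana', 'strawberry', 'mango']

for fruit in fruits:
    # Both indented lines run once for every value in the list
    length = len(fruit)
    print(fruit, length)

# Not indented, so this runs once, after the loop has finished.
# The loop variable still holds the last value it was given.
print('last fruit:', fruit)  # last fruit: mango
```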


## New Data Type: boolean

The boolean type is simply a `True` or `False` value. Like integers and floating point numbers, you don't need to put quotation marks around `True` or `False`.

In [None]:
newtype = True
newtype

True

In [None]:
newtype = False
newtype

False

## Comparisons

There is a set of symbols used to compare two values. These usually return a boolean value: whether the comparison is `True` or `False`. These are _comparison operators_ (sometimes loosely called _logical operators_).

The most basic operator is for equality, the `==` sign. For example:

In [None]:
1 == 1

True

In [None]:
'hello' == 'hello'

True

In [None]:
'hello' == 'world'

False

In [None]:
[1, 2, 3] == [1, 2, 3]

True

Other logical operators include:

- `!=` - Not equal
- `<` - Less than
- `>` - Greater than
- `<=` - Less than or equal to
- `>=` - Greater than or equal to

Each of these is used the same way as the equality operator.
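A quick sketch of the other operators in action, using two arbitrary numbers `a` and `b`. Each comparison evaluates to `True` or `False`, and you can store that result in a variable like any other value.

```python
a = 5
b = 7

print(a != b)   # True: 5 is not equal to 7
print(a < b)    # True
print(a > b)    # False
print(a <= 5)   # True: "or equal to" includes 5 itself
print(a >= b)   # False

# A comparison produces a boolean that can be saved in a variable
result = a < b
print(type(result))  # <class 'bool'>
```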

__Q6__: *(5pts)* Write a loop that prints the numbers from 1 to 5.

In [None]:
# Answer-Q6 (write your answer here)


**Q7**: *(5pts)* Set a variable `x` to 0, then write a loop to add the following numbers to it: 1,1,2,3,5,8,13,21. This one can be tricky - I've started it for you.

In [None]:
# Answer-Q7 (write your answer here)
x = 0
numbers_to_add = [1,1,2,3,5,8,13,21]

In [None]:
#@markdown *Run this to check your **Q7** work*
assert (x != 0), "The final value of x is 0. You didn't add anything to it!"
assert (x != 21), "The final value of x is 21 (the last number) - that means you're not adding to x each loop."
assert type(x) is int, "x should be a number, but your response is type: {}".format(type(x))
assert (x == 54), "x should be 54, you have {}".format(x)
print("It works, Good work!".upper())

In [None]:
#@markdown __Q8__: True or False
#@markdown - __a__: *(4pts)* "hello"+"world" == "hello world"
#@markdown - __b__: *(4pts)* 2 != 3
#@markdown - __c__: *(4pts)* 'a' < 'b'

q8a_answer = "" #@param ["", "True", "False"]
q8b_answer = "" #@param ["", "True", "False"]
q8c_answer = "" #@param ["", "True", "False"]
for answer in [q8a_answer, q8b_answer, q8c_answer]:
  assert (answer != ""), "You have blank answers"

In [None]:
#@markdown __Q9a__: *(4pts)* In Python, are `1` (an integer) and `'1'` (a string) equivalent?
#@markdown  - __b__: *(4pts)* What about `1` and `1.0`?
q9a_answer = "" #@param ["", "Yes", "No"]
q9b_answer = "" #@param ["", "Yes", "No"]
for answer in [q9a_answer, q9b_answer]:
  assert (answer != ""), "You have blank answers"


 
__Q10__: *(8pts)* Functionally, what is the difference between the following two code blocks?

_Code A_
```
s = 'hello world'
s == 'hello moon'
```

_Code B_
```
s = 'hello world'
s = 'hello moon'
```

In [None]:
q10_answer = "" #@param {type:"string"}
assert q10_answer != "", "Your answer is blank"

In [None]:
#@markdown __Q11__: *(7pts)* True or False: A list cannot have another list inside it.
q11_answer = "" #@param ["", "True", "False"]
assert q11_answer != "", "Your answer is blank"

## Colab

Hopefully by this week, you are growing more comfortable with starting Notebooks and adding/editing cells. Remember that the keyboard shortcuts are invaluable: running a cell with `Ctrl+Enter`, or adding a new cell below with `Ctrl+M B` (in command mode).

Two tricks to try this week: autocompletion and retrieving documentation.

**Autocomplete**

If you start typing a known object or function into Colab, you can pause for a moment and it will show all the options - then you can press `TAB` or `ENTER` to finish it. This is especially useful for seeing what functions are available.

In [None]:
test = "this is a string"

Above, I've set a string to `test`. If I type `te` on a new line, pause very briefly, then press tab, it will complete the word. This is especially useful for long variable names that you don't want to keep typing. If there are multiple candidates for auto-completion, it will show a scrollable list of options.

The `test` variable is a string. To see what options there are for acting upon a string, try typing `test.` (with the period) then pausing for autocomplete. Magic!

![Auto-fill](https://github.com/organisciak/Scripting-Course/blob/master/images/autofill.png?raw=1)
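Autocomplete is a Colab feature, but plain Python can produce a similar list with the built-in `dir()` function. Treat this as a rough equivalent, not a description of how Colab's autocomplete works internally. The sketch lists a string's public methods, then keeps only those starting with a prefix, much like typing `test.st` and pausing.

```python
test = "this is a string"

# dir() returns the names attached to an object, sorted alphabetically.
# Names starting with '_' are internal, so skip them.
methods = [name for name in dir(test) if not name.startswith('_')]

# Keep only the names with a given prefix, like typing `test.st`
st_methods = [name for name in methods if name.startswith('st')]
print(st_methods)  # ['startswith', 'strip']
```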

**Documentation reference**

If you want to look up information about a function, you can put a `?` in front of it (without calling it). For example, if I want to learn how I would use `split()` on `test`, I can type:

In [15]:
?test.split

This will open a panel that looks like this in Colab:

![Info](https://github.com/organisciak/Scripting-Course/blob/master/images/info.png?raw=1)

The documentation is only as good as the library's own documentation, so some libraries show more detail here than others.
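The *Docstring* text that `?` displays comes from the function itself, stored in its `__doc__` attribute, so you can also read it directly in Python. A small sketch:

```python
test = "this is a string"

# A documented function carries its description in __doc__
doc = test.split.__doc__
print(doc)
```

Outside Colab and Jupyter, where `?` isn't available, the built-in `help(test.split)` prints the same docstring.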

In [None]:
#@markdown - **Q12**: *(5pts)* What does `test.isalpha()` do? Fill in the blank: "Return True if {{   }}, False otherwise."

q12_answer = "" #@param {type: 'string'}

- **Q13**: *(10pts)* Strings have a method (whose name starts with `ce`) that will let you change "HEADING" to "====HEADING====" (that is, padding with `=` to make the string 15 characters wide). What's the code to do that? (Tip: this is an autocomplete question!)

In [None]:
# Answer-Q13


## Summary

- SQLite
    - Connecting to a simple database, via notebook (without Python) or command line
- SQL
    - `CREATE TABLE`
    - `DROP TABLE`
    - `SELECT`
    - `INSERT`
    - `WHERE` clause
- Python
    - Logical Operators
        - `==`, `!=`, `<`, `<=`, `>`, `>=`
    - `for` loops over lists
    - `print()`
    - Indentation to mark code inside a loop
    - boolean datatype: `True`, `False`
- Jupyter
    - Auto-complete
    - Documentation lookup

# Submission Instructions

In [None]:
#@markdown ### First, Enter your name for grading
my_name = "" #@param { type:'string' }

#@markdown _Have you saved your work for yourself? Don't forget to Save a Copy in Drive so that you have your progress._

In [None]:
#@markdown ### Second, check your work:

#@markdown - Have you answered all the questions?
#@markdown - Does this notebook run from top to bottom?
#@markdown     - Go to "Runtime > Restart and run all..." to check. Do all the cells run, to the very bottom, or is there a cell in the middle with an error?
#@markdown - Have you completed all the answers where you entered code, keeping the `# Answer-Qx` line at the start of those cells?

#@markdown *A lab that the professor has to fix manually will lose 10pts - run the checks!*