# Selecting with SQL 

### Introduction

Now that we have seen how to create and populate tables in our database, it's time to learn how to query our data.  As data scientists, querying data is one of the skills we will use most often.  Let's get to it.

### Setting up our Data

In [1]:
import pandas as pd

In [2]:
import sqlite3

In [3]:
conn = sqlite3.connect('mean_green.db')

Now that we've connected to our database, let's load some data into it.

In [4]:
url = "https://raw.githubusercontent.com/eng-6-22/mod-1-sql-curriculum/master/1-sql-fundamentals/7-sql-select/employees.csv"

In [5]:
df = pd.read_csv(url)

In [6]:
df.to_sql('employees', conn, index = False)

Above we used the pandas `to_sql` method to create the `employees` table and insert the rows from our CSV file.  If we had instead inserted rows ourselves with the cursor's `execute` method, we would also need to *commit* the change on the connection, because inserting data changes our database and sqlite3 wants us to confirm that we mean it.
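
As an illustration of that execute-then-commit step, here is a minimal sketch.  It assumes a throwaway in-memory database and a made-up `demo_employees` table, neither of which is part of the lesson's `mean_green.db`.

```python
import sqlite3

# Assumption: an in-memory database, so nothing is written to disk.
demo_conn = sqlite3.connect(':memory:')
demo_cursor = demo_conn.cursor()

# Hypothetical table used only for this illustration.
demo_cursor.execute('CREATE TABLE demo_employees (id INTEGER, name TEXT);')

# execute runs the INSERT; the ? placeholders are filled from the tuple.
demo_cursor.execute('INSERT INTO demo_employees VALUES (?, ?);', (1, 'fred'))

# commit ends the open transaction and saves the inserted row.
demo_conn.commit()

demo_cursor.execute('SELECT * FROM demo_employees;')
inserted_rows = demo_cursor.fetchall()
print(inserted_rows)
```

With a file-based database, a row that is inserted but never committed would not be there the next time we connect.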

### Selecting All Data

With our table created and data inserted, it's now time to select our data.  We retrieve data from our database with a SQL `SELECT` statement.  `SELECT` statements can become quite complex, but let's start with the simplest one.

```sql
SELECT * FROM employees;
```

To execute this SQL statement with Python we do the following: 

In [7]:
cursor = conn.cursor()

In [8]:
cursor.execute('SELECT * FROM employees;')
cursor.fetchall()
# [(1, 'fred', '555-333-4444', 10001, '8/17/1990'),
#  (2, 'bob', '555-331-4444', 10002, '3/22/1992'),
#  (3, 'sally', '555-332-4444', 10001, '6/22/1991')]

[(1, 'fred', '555-333-4444', 10001, '8/17/1990'),
 (2, 'bob', '555-331-4444', 10002, '3/22/1992'),
 (3, 'sally', '555-332-4444', 10001, '6/22/1991')]

So we can see what this does.  The statement selects all of the rows, and all of the columns from our employees table.  And we did this with the line: 

```sql
SELECT * FROM employees;
```

So now let's break this statement down.  We begin each query with the SQL keyword `SELECT`.  Then, if we would like to return all of the columns in each row, we follow `SELECT` with a `*`.  Finally, we indicate which table to select these rows from with `FROM table_name`.  Here our table name is `employees`.  So to select all of the columns and all of the rows from a table, the pattern is:

```sql
SELECT * FROM table_name;
```
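
Because `fetchall` returns plain tuples, it can help to see the column names next to the rows.  The sketch below reads them from the cursor's `description` attribute.  It assumes a throwaway in-memory table that copies only the `name`, `phone_number` and `zipcode` columns, with types chosen for illustration.

```python
import sqlite3

# Assumption: a small in-memory stand-in for the lesson's employees table.
demo_conn = sqlite3.connect(':memory:')
demo_cursor = demo_conn.cursor()
demo_cursor.execute(
    'CREATE TABLE employees (name TEXT, phone_number TEXT, zipcode INTEGER);'
)
demo_cursor.executemany(
    'INSERT INTO employees VALUES (?, ?, ?);',
    [('fred', '555-333-4444', 10001), ('bob', '555-331-4444', 10002)],
)

demo_cursor.execute('SELECT * FROM employees;')

# After a SELECT, description holds one entry per returned column;
# the first item of each entry is the column name.
column_names = [column[0] for column in demo_cursor.description]
rows = demo_cursor.fetchall()

print(column_names)
print(rows)
```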

### Selecting specific columns

Now that we've seen how to select *all* of the data from a table, let's see how to limit the data that we select.  Let's say that we would like to retrieve just the name of each employee.

In [9]:
sql_select_name = 'SELECT name FROM employees;'
cursor.execute(sql_select_name)
cursor.fetchall()
# [('fred',), ('bob',), ('sally',)]

[('fred',), ('bob',), ('sally',)]

> So above, we changed `SELECT *`, which selects all of the columns, to `SELECT name` to only retrieve the name from each row.

Now let's select just the phone number and the zip code.

In [10]:
sql_select_name = 'SELECT phone_number, zipcode FROM employees;'
cursor.execute(sql_select_name)
cursor.fetchall()
# [('555-333-4444', 10001),
#  ('555-331-4444', 10002),
#  ('555-332-4444', 10001)]

[('555-333-4444', 10001), ('555-331-4444', 10002), ('555-332-4444', 10001)]

> So above we can see that if we would like to select more than one column from each row, we simply separate each column name by a comma.
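
The order of the column names also sets the order of the values in each returned tuple.  A minimal sketch of this, assuming a throwaway in-memory table with a single made-up row:

```python
import sqlite3

# Assumption: a throwaway in-memory copy of a few employees columns.
demo_conn = sqlite3.connect(':memory:')
demo_cursor = demo_conn.cursor()
demo_cursor.execute(
    'CREATE TABLE employees (name TEXT, phone_number TEXT, zipcode INTEGER);'
)
demo_cursor.execute(
    "INSERT INTO employees VALUES ('fred', '555-333-4444', 10001);"
)

# The columns are listed as zipcode, name, so each tuple is (zipcode, name),
# even though the table itself stores name before zipcode.
demo_cursor.execute('SELECT zipcode, name FROM employees;')
reordered_rows = demo_cursor.fetchall()
print(reordered_rows)
```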

And once again if we would like to select all of the columns, we replace the column names with `*`.

In [11]:
sql_select_name = 'SELECT * FROM employees;'
cursor.execute(sql_select_name)
cursor.fetchall()

# [(1, 'fred', '555-333-4444', 10001, '8/17/1990'),
#  (2, 'bob', '555-331-4444', 10002, '3/22/1992'),
#  (3, 'sally', '555-332-4444', 10001, '6/22/1991')]

[(1, 'fred', '555-333-4444', 10001, '8/17/1990'),
 (2, 'bob', '555-331-4444', 10002, '3/22/1992'),
 (3, 'sally', '555-332-4444', 10001, '6/22/1991')]

### Selecting specific rows

So above we saw how to specify which columns we would like to retrieve.  What if we would like to specify the rows?  To do this, we tell SQL to only select rows that match specific criteria.  For example, below we tell SQL to only return the rows that have a name of `fred`.

In [12]:
sql_select_name = "SELECT * FROM employees WHERE name = 'fred';"
cursor.execute(sql_select_name)
cursor.fetchall()

# [(1, 'fred', '555-333-4444', 10001, '8/17/1990')]

[(1, 'fred', '555-333-4444', 10001, '8/17/1990')]

So we asked SQL to select all of the columns, and all of the rows where the name column equals "fred".  Above we said that we want to match the entire string, 'fred'.  Let's do one more where we select all that match a specified zip code.

In [13]:
sql_select_name = "SELECT * FROM employees WHERE zipcode = '10001';"
cursor.execute(sql_select_name)
cursor.fetchall()

# [(1, 'fred', '555-333-4444', 10001, '8/17/1990'),
#  (3, 'sally', '555-332-4444', 10001, '6/22/1991')]

[(1, 'fred', '555-333-4444', 10001, '8/17/1990'),
 (3, 'sally', '555-332-4444', 10001, '6/22/1991')]

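So here we can see that only the two rows that have a zip code of `10001` are a match.  We wrote the zip code as the string `'10001'` while the column holds integers, but SQLite converts the value for the comparison, so the rows still match.  Now if we only want to return the names of employees in the zip code of `10001`, we do the following: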

In [14]:
sql_select_name = "SELECT name FROM employees WHERE zipcode = '10001';"
cursor.execute(sql_select_name)
cursor.fetchall()

# [('fred',), ('sally',)]

[('fred',), ('sally',)]

So here let's think about how this SQL statement worked.  SQL went to the `employees` table, and found each of the rows with a matching zipcode column of `10001`, and then from each of those matching rows, returned the column of `name`. 
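
When the value to match comes from a Python variable, sqlite3 lets us pass it separately through a `?` placeholder instead of pasting it into the SQL string.  The sketch below assumes a throwaway in-memory table with the same three employees; `target_zipcode` is a made-up variable name.

```python
import sqlite3

# Assumption: a small in-memory stand-in for the lesson's employees table.
demo_conn = sqlite3.connect(':memory:')
demo_cursor = demo_conn.cursor()
demo_cursor.execute(
    'CREATE TABLE employees (name TEXT, phone_number TEXT, zipcode INTEGER);'
)
demo_cursor.executemany(
    'INSERT INTO employees VALUES (?, ?, ?);',
    [
        ('fred', '555-333-4444', 10001),
        ('bob', '555-331-4444', 10002),
        ('sally', '555-332-4444', 10001),
    ],
)

# Hypothetical variable holding the value we want to match.
target_zipcode = 10001

# The second argument is a tuple with one value per ? in the statement.
demo_cursor.execute(
    'SELECT name FROM employees WHERE zipcode = ?;', (target_zipcode,)
)
names_in_zipcode = demo_cursor.fetchall()
print(names_in_zipcode)
```

Passing the value this way also means we do not have to worry about quoting it correctly inside the SQL string.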

### Complex Where Statements

Finally, let's extend our `WHERE` clauses by combining conditions with `AND` and `OR`.  These work as we would expect.

With an `AND` statement, SQL will return the rows that match **all** of the specified criteria.

In [15]:
sql_select_name = "SELECT * FROM employees WHERE zipcode = '10001' AND name = 'fred';"
cursor.execute(sql_select_name)
cursor.fetchall()

[(1, 'fred', '555-333-4444', 10001, '8/17/1990')]

In [16]:
sql_select_name = "SELECT * FROM employees WHERE zipcode = '10002' AND name = 'fred';"
cursor.execute(sql_select_name)
cursor.fetchall()

[]

And with an `OR` statement, SQL returns the rows that match **any** of the specified criteria.

In [17]:
sql_select_name = "SELECT * FROM employees WHERE zipcode = '10002' OR name = 'fred';"
cursor.execute(sql_select_name)
cursor.fetchall()

# [(1, 'fred', '555-333-4444', 10001, '8/17/1990'),
#  (2, 'bob', '555-331-4444', 10002, '3/22/1992')]

[(1, 'fred', '555-333-4444', 10001, '8/17/1990'),
 (2, 'bob', '555-331-4444', 10002, '3/22/1992')]

So above, we return fred's row as the name `fred` is matched, and we return bob's row, as bob is in zipcode `10002`.
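
When `AND` and `OR` appear in the same `WHERE` clause, SQL applies `AND` first, so parentheses can change which rows match.  Here is a rough sketch of the difference, assuming a throwaway in-memory table with the same three employees:

```python
import sqlite3

# Assumption: a small in-memory stand-in for the lesson's employees table.
demo_conn = sqlite3.connect(':memory:')
demo_cursor = demo_conn.cursor()
demo_cursor.execute(
    'CREATE TABLE employees (name TEXT, phone_number TEXT, zipcode INTEGER);'
)
demo_cursor.executemany(
    'INSERT INTO employees VALUES (?, ?, ?);',
    [
        ('fred', '555-333-4444', 10001),
        ('bob', '555-331-4444', 10002),
        ('sally', '555-332-4444', 10001),
    ],
)

# Without parentheses this reads as:
# (zipcode = 10001 AND name = 'fred') OR name = 'bob'
demo_cursor.execute(
    "SELECT name FROM employees "
    "WHERE zipcode = 10001 AND name = 'fred' OR name = 'bob';"
)
without_parentheses = demo_cursor.fetchall()

# With parentheses the OR is evaluated first, and bob's zip code rules him out.
demo_cursor.execute(
    "SELECT name FROM employees "
    "WHERE zipcode = 10001 AND (name = 'fred' OR name = 'bob');"
)
with_parentheses = demo_cursor.fetchall()

print(without_parentheses)
print(with_parentheses)
```

The first query matches both fred and bob, while the second matches only fred.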

### Summary

In this lesson we saw how to query a table with SELECT statements.  We saw that a select statement follows the pattern of: 

```sql
SELECT columns FROM table_name WHERE column_name = criteria
```

**Selecting Columns**

If we would like to select all columns from matching rows, we use:

`SELECT * FROM table_name`

And if we would like to select multiple columns from matching rows, we use a comma-separated list like:

`SELECT column_name, column_name FROM table_name`

**Selecting Rows**

We select specific rows with a `WHERE` clause, in which we specify the criteria that a column must match.  For example:

`SELECT * FROM table_name WHERE column_name = criteria`

Finally, we can combine multiple criteria with `AND` or `OR`, like the following:

`SELECT columns FROM table_name WHERE column_name = criteria OR column_name = criteria`