# OOP and SQL


This morning we will use object-oriented programming (OOP) to interface with the Chinook SQL database. It is a two-birds-one-stone situation. Bird one: understanding OOP. Bird two: SQL practice.

Check out the entity relationship diagram of the Chinook database below. Chinook is a sample database whose media data was created from an Apple iTunes library. [info](https://docs.yugabyte.com/latest/sample-data/chinook/)
![Chinook Schema](images/schema.png)

We will use the sqlite3 package, which will connect to the database file stored in the data folder.

In [1]:
# SQL Connection and Querying
import sqlite3
# Data manipulation
import pandas as pd
# os is used to create paths to files
import os
# For testing code
from test_scripts.test_class import Test
test = Test()

We want to build a ```Chinook``` class that will allow us to easily access information in our database without having to write SQL queries every time. We can do this with *attributes* and *methods*.

In the cell below describe the difference between an attribute and a method.

-------Use this markdown cell to describe the difference between attributes and methods---------
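As a reference point while you write your answer, here is a minimal sketch of a class with both. The `Playlist` class and its names are hypothetical and are not part of the Chinook exercise.

```python
class Playlist:
    def __init__(self, name):
        # Attributes: data stored on the instance
        self.name = name
        self.tracks = []

    # Method: a function defined inside the class; `self` is the instance
    def add_track(self, title):
        self.tracks.append(title)
        return len(self.tracks)


playlist = Playlist("Morning Mix")
print(playlist.name)                 # attribute access, no parentheses
print(playlist.add_track("Song A"))  # method call, parentheses required
```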

Our class should have an attribute called ```tables``` that returns a list of tables within the database.

<u><b>Let's review the code for collecting this information.</b></u>

To collect the table names from a sqlite database, we can do the following:

### 1) Open up a connection to our database

In [2]:
path = os.path.join('data', 'chinook.db')
conn = sqlite3.connect(path)

### 2) Create a cursor for our database
>Note: A cursor does not need to be created when using ```pd.read_sql```

>But depending on the use case for your code, pandas is not always the best choice!

In [3]:
cursor = conn.cursor()
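To make the note above concrete, here is a small sketch comparing the two approaches. It assumes a throwaway in-memory database with a hypothetical two-row `genres` table, so it runs without `chinook.db`.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database so the example runs without chinook.db
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE genres (GenreId INTEGER PRIMARY KEY, Name TEXT)")
demo_conn.executemany("INSERT INTO genres (Name) VALUES (?)", [("Rock",), ("Jazz",)])

# With a cursor: rows come back as a list of tuples
demo_cursor = demo_conn.cursor()
rows = demo_cursor.execute("SELECT Name FROM genres ORDER BY Name").fetchall()
print(rows)

# With pandas: no cursor needed, the connection is passed directly
df = pd.read_sql("SELECT Name FROM genres ORDER BY Name", demo_conn)
print(df)

demo_conn.close()
```

The cursor version has no pandas dependency and returns plain Python objects, which can be preferable inside a class that only needs a short list of values.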

### 3) Execute a SQL query
`sqlite_master` is a table in every SQLite database that describes the database schema.

In [4]:
# This query should jog your memory about SQL syntax
cursor.execute('''SELECT name FROM sqlite_master
                                        WHERE
                                        type = 'table'
                                        AND
                                        name NOT LIKE 'sqlite_%';''').fetchall()

[('albums',),
 ('artists',),
 ('customers',),
 ('employees',),
 ('genres',),
 ('invoices',),
 ('invoice_items',),
 ('media_types',),
 ('playlists',),
 ('playlist_track',),
 ('tracks',)]

As you can see this returns a list of tuples. 

<u>For convenience, we will use a list comprehension to change this to a basic list.</u>

In [5]:
# NOT LIKE 'sqlite_%' ignores sqlite_sequence and sqlite_stat1 tables
tables = cursor.execute('''SELECT name FROM sqlite_master
                                        WHERE
                                        type = 'table'
                                         AND
                                        name NOT LIKE 'sqlite_%'
                                       ;''').fetchall()

tables = [table[0] for table in tables]
tables

['albums',
 'artists',
 'customers',
 'employees',
 'genres',
 'invoices',
 'invoice_items',
 'media_types',
 'playlists',
 'playlist_track',
 'tracks']

**Much better**
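To see what the `NOT LIKE 'sqlite_%'` filter is doing, here is a small sketch on a throwaway in-memory database. The `demo` table is hypothetical. Declaring a column `AUTOINCREMENT` causes SQLite to create its internal `sqlite_sequence` table, which the filter then hides.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()

# AUTOINCREMENT makes SQLite create its internal sqlite_sequence table
demo_cursor.execute(
    "CREATE TABLE demo (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)

# Without the filter, internal tables are listed too
all_tables = [row[0] for row in demo_cursor.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()]

# With the filter, only user-created tables remain
user_tables = [row[0] for row in demo_cursor.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'").fetchall()]

print(all_tables)
print(user_tables)
demo_conn.close()
```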

# Task 1

In the cell below, let's create a class called ```Chinook```.

The class should have an ```__init__()``` method.

>Hint: *methods* are just functions inside classes with ```self``` as the first argument of the function.

>**Example:** 

```
class NameOfClass():
    def name_of_method(self, other_arguments_if_needed):
        pass  # code here
```
        

The ```__init__()``` method should have two arguments:
1. ```self```
2. ```database_path```

Within the ```__init__()``` method:
1. A connection should be opened up to the database using the ```database_path``` variable and saved as an attribute.
2. A cursor attribute should be created.
3. A tables attribute should be created. 

The code to create the  ```tables``` attribute will be almost identical to the code up above. 

The main difference is that the final tables variable should look like this: ```self.tables```.

##  Take 5 minutes with a partner.

In [6]:
# Your code here
class Chinook():
    def __init__():
        pass

In [7]:
#__SOLUTION__
class Chinook():
    def __init__(self, database_path):
        self.conn = sqlite3.connect(database_path)
        self.cursor = self.conn.cursor()

        tables = self.cursor.execute('''SELECT name FROM sqlite_master
                                        WHERE
                                        type = 'table'
                                        AND
                                        name NOT LIKE 'sqlite_%';''').fetchall()
        self.tables = [x[0] for x in tables]


**Let's test your class!**

In [8]:
path = os.path.join('data', 'chinook.db')
data = Chinook(path)
test.run_test(data.tables, 'tables')

✅ **Hey, you did it.  Good job.**

# Task 2

**Let's add a *method* to our class called ```search_employee```.**

This method should use ```pd.read_sql``` to return a dataframe with a single row for the employee you search for.

<u>```search_employee``` should receive three arguments.</u>
1. ```self```
2. The firstname of an employee.
3. The lastname of an employee.

If the employee is not found, the method should return the string ```'Employee was not found.'```

**Hint**: Use an f-string in combination with a SQL statement. Interpolate the arguments given to the method into the f-string via the curly braces { }.

## Take 5 minutes with your partner

In [9]:
# Your code here

In [10]:
#__SOLUTION__
class Chinook():
    def __init__(self, database_path):
        self.conn = sqlite3.connect(database_path)
        self.cursor = self.conn.cursor()

        tables = self.cursor.execute('''SELECT name FROM sqlite_master
                                        WHERE
                                        type = 'table'
                                        AND
                                        name NOT LIKE 'sqlite_%';''')
        self.tables = [x[0] for x in tables]
    
    # Solution
    def search_employee(self, firstname, lastname):
        
        result = pd.read_sql(f'''SELECT * FROM employees
                    WHERE FirstName = '{firstname}'
                    AND LastName = '{lastname}'
                    ''', con = self.conn
                    )

        if len(result) == 0:
            return "Employee was not found."
        else:
            return result 

**Let's test your code on an existing employee!**

In [11]:
data = Chinook(path)
test.run_test(data.search_employee('Jane', 'Peacock'), 'employee1')

✅ **Hey, you did it.  Good job.**

**Now let's test on a nonexistent employee!**

In [12]:
test.run_test(data.search_employee("Joe", "Shmo"), 'employee2')

✅ **Hey, you did it.  Good job.**
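The f-string approach used in `search_employee` is fine for this exercise, but it breaks on names containing a quote character (for example `O'Brien`) and is open to SQL injection if the input comes from a user. Below is a sketch of an alternative that uses `?` placeholders with the `params` argument of `pd.read_sql`. The function name `search_employee_safe`, the in-memory table, and the `O'Brien` row are all hypothetical and only there to make the example runnable.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for the employees table in chinook.db
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE employees (FirstName TEXT, LastName TEXT)")
demo_conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Jane", "Peacock"), ("Michael", "O'Brien")],
)


def search_employee_safe(conn, firstname, lastname):
    # The ? placeholders are filled in by sqlite3, not by string formatting
    sql = "SELECT * FROM employees WHERE FirstName = ? AND LastName = ?"
    result = pd.read_sql(sql, conn, params=(firstname, lastname))
    if len(result) == 0:
        return "Employee was not found."
    return result


found = search_employee_safe(demo_conn, "Michael", "O'Brien")
missing = search_employee_safe(demo_conn, "Joe", "Shmo")
print(found)
print(missing)
```

Inside the class, the same idea would use `self.conn` in place of the `conn` argument.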

# Task 3: Query Method



Add a method called `query` that takes `self` and `query_string` as arguments. The method should call `pd.read_sql` with two arguments: 1) `query_string` and 2) the database connection defined in `__init__` (think `self`). The method should return a dataframe.

## Take 5 minutes with your partner

In [13]:
# your code here


In [14]:
#__SOLUTION__
class Chinook():
    def __init__(self, database_path):
        self.conn = sqlite3.connect(database_path)
        self.cursor = self.conn.cursor()

        tables = self.cursor.execute('''SELECT name FROM sqlite_master
                                        WHERE
                                        type = 'table'
                                        AND
                                        name NOT LIKE 'sqlite_%';''')
        self.tables = [x[0] for x in tables]
    ##### Solution to task 3
    def query(self, query_string):
        
        return pd.read_sql(query_string, self.conn)
    
    # search_employee from Task 2
    def search_employee(self, firstname, lastname):
        
        result = pd.read_sql(f'''SELECT * FROM employees
                    WHERE FirstName = '{firstname}'
                    AND LastName = '{lastname}'
                    ''', con = self.conn
                    )

        if len(result) == 0:
            return "Employee was not found."
        else:
            return result 


# Task 4: Use the query method to find unique genres in the database

Write a query that selects all unique genres from the `genres` table, ordered alphabetically. Because the `query` method uses `pd.read_sql`, the result should be a dataframe.

SQL Hint: DISTINCT

## Take 5 minutes with your partner

In [15]:
genre_query = '''Query here'''

In [16]:
#__SOLUTION__

genre_query = '''SELECT DISTINCT(name) as genre_name
                FROM genres
                ORDER BY genre_name ASC'''


The next cell instantiates the class and tests the query using the `query` method.


In [17]:
chinook = Chinook(path)

genre_result = chinook.query(genre_query)

In [18]:
assert genre_result.shape[0] == 25

# Task 5: Genres attribute

Within the `__init__` method, create a `genres` attribute that is a list of all unique genres in the database. Use the same query as above, but **don't** use the `query` method. Instead, execute the SQL statement inside `__init__` using `self.cursor.execute`. Then, use a list comprehension to create an attribute that is a **list** of all unique genres.

## Take 5 minutes with your partner

In [19]:
# your code here

In [20]:
#__SOLUTION__
class Chinook():
    def __init__(self, database_path):
        self.conn = sqlite3.connect(database_path)
        self.cursor = self.conn.cursor()

        tables = self.cursor.execute('''SELECT name FROM sqlite_master
                                        WHERE
                                        type = 'table'
                                        AND
                                        name NOT LIKE 'sqlite_%';''')
        self.tables = [x[0] for x in tables]
        
        # Solution for task 5
        genres = self.cursor.execute('''SELECT DISTINCT(name) as genre_name
                                        FROM genres
                                        ORDER BY genre_name ASC''').fetchall()
        
        self.genres = [genre[0] for genre in genres]
     
    def query(self, query_string):
        
        return pd.read_sql(query_string, self.conn)
    
    def search_employee(self, firstname, lastname):
        
        result = pd.read_sql(f'''SELECT * FROM employees
                    WHERE FirstName = '{firstname}'
                    AND LastName = '{lastname}'
                    ''', con = self.conn
                    )

        if len(result) == 0:
            return "Employee was not found."
        else:
            return result 


In [21]:
chinook = Chinook(path)
assert type(chinook.genres) == list
assert len(chinook.genres) == 25
print("nice job")

nice job
