<a href="https://colab.research.google.com/github/mcgmed/SQL/blob/main/SQLite_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import sqlite3
import pandas as pd
con = sqlite3.connect('/content/sample_data/chinook.db')
cur = con.cursor()

In [None]:
res = cur.execute('SELECT name FROM sqlite_master') # list every object in the database (tables and indexes).
res.fetchall()

[('albums',),
 ('sqlite_sequence',),
 ('artists',),
 ('customers',),
 ('employees',),
 ('genres',),
 ('invoices',),
 ('invoice_items',),
 ('media_types',),
 ('playlists',),
 ('playlist_track',),
 ('sqlite_autoindex_playlist_track_1',),
 ('tracks',),
 ('IFK_AlbumArtistId',),
 ('IFK_CustomerSupportRepId',),
 ('IFK_EmployeeReportsTo',),
 ('IFK_InvoiceCustomerId',),
 ('IFK_InvoiceLineInvoiceId',),
 ('IFK_InvoiceLineTrackId',),
 ('IFK_PlaylistTrackTrackId',),
 ('IFK_TrackAlbumId',),
 ('IFK_TrackGenreId',),
 ('IFK_TrackMediaTypeId',),
 ('sqlite_stat1',)]

In [None]:
res = cur.execute('PRAGMA table_info(albums)') # to see all the column names and specifications in the table.
res.fetchall()

[(0, 'AlbumId', 'INTEGER', 1, None, 1),
 (1, 'Title', 'NVARCHAR(160)', 1, None, 0),
 (2, 'ArtistId', 'INTEGER', 1, None, 0)]

In [None]:
res = cur.execute('PRAGMA table_info(artists)') # to see all the column names and specifications in the table.
res.fetchall()

[(0, 'ArtistId', 'INTEGER', 1, None, 1),
 (1, 'Name', 'NVARCHAR(120)', 0, None, 0)]

## EXCEPT

The SQLite EXCEPT operator compares the result sets of two queries and returns the distinct rows from the left query that are not output by the right query.

In [None]:
query = """SELECT ArtistId
           FROM artists
           EXCEPT
           SELECT ArtistId
           FROM albums"""
data = pd.read_sql(query, con)
data

Unnamed: 0,ArtistId
0,25
1,26
2,28
3,29
4,30
...,...
66,192
67,193
68,194
69,195
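
To see the "distinct" part of that definition in isolation, here is a minimal sketch on a throwaway in-memory database. The tables `left_t` and `right_t` and their contents are made up for illustration and are not part of chinook.db.

```python
import sqlite3

# Hypothetical in-memory tables, not part of chinook.db.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE left_t (id INTEGER);
    CREATE TABLE right_t (id INTEGER);
    INSERT INTO left_t VALUES (1), (1), (2), (3), (3);
    INSERT INTO right_t VALUES (2);
""")

# EXCEPT keeps rows of the left query that the right query does not return,
# and collapses duplicates in the result.
except_rows = demo.execute(
    "SELECT id FROM left_t EXCEPT SELECT id FROM right_t ORDER BY id"
).fetchall()
print(except_rows)  # [(1,), (3,)]
```

Although `left_t` holds the values 1 and 3 twice, each appears only once in the result.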


## INTERSECT

The SQLite INTERSECT operator compares the result sets of two queries and returns the distinct rows that are output by both queries.

In [None]:
query = """SELECT CustomerId
           FROM customers
           INTERSECT
           SELECT CustomerId
           FROM invoices
           ORDER BY CustomerId"""
data = pd.read_sql(query, con)
data

Unnamed: 0,CustomerId
0,1
1,2
2,3
3,4
4,5
5,6
6,7
7,8
8,9
9,10
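
One way to see what INTERSECT adds is to compare it with a join, which repeats a customer once per matching invoice. The sketch below uses hypothetical in-memory tables `buyers` and `orders` rather than the chinook tables.

```python
import sqlite3

# Hypothetical in-memory tables, not part of chinook.db.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE buyers (id INTEGER);
    CREATE TABLE orders (buyer_id INTEGER);
    INSERT INTO buyers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (1), (1), (3), (4);
""")

# INTERSECT returns each common value once.
intersect_rows = demo.execute(
    "SELECT id FROM buyers INTERSECT SELECT buyer_id FROM orders ORDER BY id"
).fetchall()

# An inner join returns one row per matching pair, so buyer 1 appears twice.
join_rows = demo.execute(
    "SELECT b.id FROM buyers AS b JOIN orders AS o ON o.buyer_id = b.id ORDER BY b.id"
).fetchall()

print(intersect_rows)  # [(1,), (3,)]
print(join_rows)       # [(1,), (1,), (3,)]
```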


## SUBQUERY

A subquery is a SELECT statement nested in another statement. You must enclose a subquery in a pair of parentheses. A subquery can itself contain another subquery, up to a certain depth. The following statement returns all the tracks in the album titled Let There Be Rock.

In [None]:
query = """SELECT trackid, name, albumid
           FROM tracks
           WHERE albumid = (SELECT albumid FROM albums WHERE title = 'Let There Be Rock')"""
data = pd.read_sql(query, con)
data

Unnamed: 0,TrackId,Name,AlbumId
0,15,Go Down,4
1,16,Dog Eat Dog,4
2,17,Let There Be Rock,4
3,18,Bad Boy Boogie,4
4,19,Problem Child,4
5,20,Overdose,4
6,21,Hell Ain't A Bad Place To Be,4
7,22,Whole Lotta Rosie,4
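
Nesting one subquery inside another could look like the following sketch. It runs on a small in-memory database whose table names mirror chinook but whose rows are invented for illustration.

```python
import sqlite3

# Hypothetical rows in an in-memory database; only the table names mirror chinook.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
    INSERT INTO artists VALUES (1, 'Artist A'), (2, 'Artist B');
    INSERT INTO albums VALUES (10, 'First', 1), (11, 'Second', 1), (12, 'Other', 2);
    INSERT INTO tracks VALUES (100, 'Intro', 10), (101, 'Outro', 11), (102, 'Elsewhere', 12);
""")

# The innermost subquery finds the artist, the middle one finds that
# artist's albums, and the outer query lists the tracks on those albums.
nested_rows = demo.execute("""
    SELECT Name
    FROM tracks
    WHERE AlbumId IN (SELECT AlbumId
                      FROM albums
                      WHERE ArtistId = (SELECT ArtistId
                                        FROM artists
                                        WHERE Name = 'Artist A'))
    ORDER BY TrackId
""").fetchall()
print(nested_rows)  # [('Intro',), ('Outro',)]
```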


The following query returns the customers whose sales representatives are in Canada.

In [None]:
query = """SELECT customerid, firstname, lastname
           FROM customers
           WHERE supportrepid IN (SELECT employeeid FROM employees WHERE country = 'Canada')"""
data = pd.read_sql(query, con)
data

Unnamed: 0,CustomerId,FirstName,LastName
0,1,Luís,Gonçalves
1,2,Leonie,Köhler
2,3,François,Tremblay
3,4,Bjørn,Hansen
4,5,František,Wichterlová
5,6,Helena,Holý
6,7,Astrid,Gruber
7,8,Daan,Peeters
8,9,Kara,Nielsen
9,10,Eduardo,Martins


The following query uses a subquery in the FROM clause to compute the average album size in bytes.

In [None]:
query = """SELECT AVG(album.size)
           FROM (SELECT SUM(bytes) AS size
                 FROM tracks
                 GROUP BY albumid) AS album"""
data = pd.read_sql(query, con)
data

Unnamed: 0,AVG(album.size)
0,338288900.0


The following query uses a correlated subquery to return the albums whose size is less than 10MB.

In [None]:
query = """SELECT albumid, title
           FROM albums
           WHERE 10000000 > (SELECT sum(bytes) FROM tracks WHERE tracks.AlbumId = albums.AlbumId)
           ORDER BY title"""
data = pd.read_sql(query, con)
data

Unnamed: 0,AlbumId,Title
0,296,"A Copland Celebration, Vol. I"
1,285,A Soprano Inspired
2,307,"Adams, John: The Chairman Dances"
3,272,Adorate Deum: Gregorian Chant from the Proper ...
4,273,Allegri: Miserere
...,...,...
77,252,Un-Led-Ed
78,275,Vivaldi: The Four Seasons
79,287,Wagner: Favourite Overtures
80,334,Weill: The Seven Deadly Sins
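
A correlated subquery is evaluated once per row of the outer query. As a rough cross-check, the sketch below compares it with a JOIN plus GROUP BY and HAVING on invented in-memory data; the table contents are assumptions made for the example.

```python
import sqlite3

# Hypothetical rows in an in-memory database.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER, Bytes INTEGER);
    INSERT INTO albums VALUES (1, 'Small'), (2, 'Large'), (3, 'Empty');
    INSERT INTO tracks VALUES (1, 1, 4000000), (2, 1, 5000000),
                              (3, 2, 8000000), (4, 2, 9000000);
""")

# Correlated form: the inner query refers to albums.AlbumId from the outer row.
correlated = demo.execute("""
    SELECT AlbumId, Title FROM albums
    WHERE 10000000 > (SELECT SUM(Bytes) FROM tracks WHERE tracks.AlbumId = albums.AlbumId)
    ORDER BY Title
""").fetchall()

# Grouped form: join first, then filter the per-album totals.
grouped = demo.execute("""
    SELECT albums.AlbumId, albums.Title
    FROM albums JOIN tracks ON tracks.AlbumId = albums.AlbumId
    GROUP BY albums.AlbumId, albums.Title
    HAVING SUM(tracks.Bytes) < 10000000
    ORDER BY albums.Title
""").fetchall()

print(correlated)  # [(1, 'Small')]
print(grouped)     # [(1, 'Small')]
```

The album without tracks is excluded by both forms: its SUM is NULL in the correlated query, and it has no rows to join in the grouped one.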


The following query uses a correlated subquery in the SELECT clause to return the number of tracks in an album.

In [None]:
query = """SELECT albumid, title, (SELECT count(trackid) FROM tracks WHERE tracks.AlbumId = albums.AlbumId) AS tracks_count
           FROM albums
           ORDER BY tracks_count DESC"""
data = pd.read_sql(query, con)
data

Unnamed: 0,AlbumId,Title,tracks_count
0,141,Greatest Hits,57
1,23,Minha Historia,34
2,73,Unplugged,30
3,229,"Lost, Season 3",26
4,230,"Lost, Season 1",25
...,...,...,...
342,343,Respighi:Pines of Rome,1
343,344,Schubert: The Late String Quartets & String Qu...,1
344,345,Monteverdi: L'Orfeo,1
345,346,Mozart: Chamber Music,1


## CASE

The SQLite CASE expression evaluates a list of conditions and returns an expression based on the result of the evaluation.

The CASE expression is similar to the IF-THEN-ELSE statement in other programming languages.

You can use the CASE expression in any clause or statement that accepts a valid expression. For example, you can use the CASE expression in clauses such as WHERE, ORDER BY, HAVING, SELECT and statements such as SELECT, UPDATE, and DELETE.
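
As a small illustration of CASE outside the SELECT list, the sketch below uses it in an ORDER BY clause and in an UPDATE statement. The `customers` table here is a hypothetical in-memory table with an extra `Tier` column, not the chinook one.

```python
import sqlite3

# Hypothetical in-memory table; the Tier column does not exist in chinook.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE customers (Name TEXT, Country TEXT, Tier TEXT);
    INSERT INTO customers (Name, Country) VALUES
        ('Ana', 'Brazil'), ('Bob', 'USA'), ('Cleo', 'France'), ('Dan', 'USA');
""")

# CASE in ORDER BY: USA customers first, then everyone else, each group by name.
ordered = [row[0] for row in demo.execute("""
    SELECT Name FROM customers
    ORDER BY CASE Country WHEN 'USA' THEN 0 ELSE 1 END, Name
""")]

# CASE in UPDATE: fill the Tier column using the same rule.
demo.execute("""
    UPDATE customers
    SET Tier = CASE WHEN Country = 'USA' THEN 'Domestic' ELSE 'Foreign' END
""")
tiers = dict(demo.execute("SELECT Name, Tier FROM customers"))

print(ordered)  # ['Bob', 'Dan', 'Ana', 'Cleo']
print(tiers)
```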

Suppose you have to produce a report of customer groups with the following logic: if a customer is located in the USA, the customer belongs to the domestic group; otherwise, the customer belongs to the foreign group.

To make this report, you use the simple CASE expression in the SELECT statement as follows:

In [None]:
query = """SELECT customerid, firstname, lastname, CASE country 
                                                     WHEN 'USA'
                                                       THEN 'Domestic'
                                                     ELSE 'Foreign'
                                                   END CustomerGroup
           FROM customers
           ORDER BY LastName, FirstName"""
data = pd.read_sql(query, con)
data

Unnamed: 0,CustomerId,FirstName,LastName,CustomerGroup
0,12,Roberto,Almeida,Foreign
1,28,Julia,Barnett,Domestic
2,39,Camille,Bernard,Foreign
3,18,Michelle,Brooks,Domestic
4,29,Robert,Brown,Foreign
5,21,Kathy,Chase,Domestic
6,26,Richard,Cunningham,Domestic
7,41,Marc,Dubois,Foreign
8,34,João,Fernandes,Foreign
9,30,Edward,Francis,Foreign


Suppose you want to classify tracks by length: a track shorter than one minute is short; a track between 1 and 5 minutes is medium; a track longer than 5 minutes is long.

To achieve this, you use the searched CASE expression as follows:

In [None]:
query = """SELECT trackid, name, CASE
                                   WHEN milliseconds < 60000
                                     THEN 'short'
                                   WHEN milliseconds >= 60000 AND milliseconds <= 300000
                                     THEN 'medium'
                                   ELSE 'long'
                                 END category
           FROM tracks"""
data = pd.read_sql(query, con)
data

Unnamed: 0,TrackId,Name,category
0,1,For Those About To Rock (We Salute You),long
1,2,Balls to the Wall,long
2,3,Fast As a Shark,medium
3,4,Restless and Wild,medium
4,5,Princess of the Dawn,long
...,...,...,...
3498,3499,Pini Di Roma (Pinien Von Rom) \ I Pini Della V...,medium
3499,3500,"String Quartet No. 12 in C Minor, D. 703 ""Quar...",medium
3500,3501,"L'orfeo, Act 3, Sinfonia (Orchestra)",medium
3501,3502,"Quintet for Horn, Violin, 2 Violas, and Cello ...",medium
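
Because a searched CASE returns the result of the first WHEN condition that is true, it is worth checking the boundary values explicitly. The sketch below assumes that exactly 1 minute and exactly 5 minutes both count as medium; `categorize` is a hypothetical helper written for this example.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

def categorize(milliseconds):
    """Classify a duration with a searched CASE; branches are tested top to bottom."""
    row = demo.execute("""
        SELECT CASE
                 WHEN ? < 60000 THEN 'short'
                 WHEN ? <= 300000 THEN 'medium'
                 ELSE 'long'
               END
    """, (milliseconds, milliseconds)).fetchone()
    return row[0]

# The second branch needs no lower bound: it is only reached
# when the first branch (< 60000) was false.
boundaries = {ms: categorize(ms) for ms in (59999, 60000, 300000, 300001)}
print(boundaries)
# {59999: 'short', 60000: 'medium', 300000: 'medium', 300001: 'long'}
```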


## INSERT

First, specify the name of the table into which you want to insert data after the INSERT INTO keywords.

Second, add a comma-separated list of columns after the table name. The column list is optional, but it is good practice to include it.

Third, add a comma-separated list of values after the VALUES keyword. If you omit the column list, you have to specify values for all columns in the value list. The number of values in the value list must be the same as the number of columns in the column list.
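
The three rules above can be tried out on a scratch table. This is a minimal sketch on an in-memory database; it passes values through `?` placeholders, which is one common way to supply them from Python, and the `artists` table here is a stand-in rather than the chinook table.

```python
import sqlite3

# Stand-in table in an in-memory database, not the chinook artists table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT)")

# With a column list: one value per listed column.
demo.execute("INSERT INTO artists (Name) VALUES (?)", ("Bud Powell",))

# Without a column list: a value for every column, in table order.
demo.execute("INSERT INTO artists VALUES (?, ?)", (10, "Buddy Rich"))

# A value count that does not match the column list is rejected.
try:
    demo.execute("INSERT INTO artists (Name) VALUES (?, ?)", (11, "Candido"))
    mismatch_error = None
except sqlite3.Error as exc:
    mismatch_error = str(exc)

inserted = demo.execute("SELECT ArtistId, Name FROM artists ORDER BY ArtistId").fetchall()
print(inserted)  # [(1, 'Bud Powell'), (10, 'Buddy Rich')]
print(mismatch_error)
```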

In [None]:
query = """SELECT	* FROM	artists"""
data = pd.read_sql(query, con)
data

Unnamed: 0,ArtistId,Name
0,1,AC/DC
1,2,Accept
2,3,Aerosmith
3,4,Alanis Morissette
4,5,Alice In Chains
...,...,...
270,271,"Mela Tenenbaum, Pro Musica Prague & Richard Kapp"
271,272,Emerson String Quartet
272,273,"C. Monteverdi, Nigel Rogers - Chiaroscuro; Lon..."
273,274,Nash Ensemble


In [None]:
cur.execute("INSERT INTO artists (name) VALUES ('Bud Powell')")
con.commit()

Because the ArtistId column is an auto-increment column, you can omit it from the statement. SQLite automatically generates a sequential integer to insert into the ArtistId column.
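
If you need the generated key in Python, the cursor's `lastrowid` attribute reports it. The sketch below uses a stand-in in-memory table seeded with one made-up row.

```python
import sqlite3

# Stand-in table in an in-memory database; the seeded row is hypothetical.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
demo.execute("INSERT INTO artists (ArtistId, Name) VALUES (275, 'Existing Artist')")

# ArtistId is omitted, so SQLite assigns the next integer after the largest one used.
cur = demo.execute("INSERT INTO artists (Name) VALUES ('Bud Powell')")
new_id = cur.lastrowid
print(new_id)  # 276
```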

In [None]:
query = """SELECT	* FROM	artists ORDER BY ArtistId DESC LIMIT 3"""
data = pd.read_sql(query, con)
data

Unnamed: 0,ArtistId,Name
0,276,Bud Powell
1,275,Philip Glass Ensemble
2,274,Nash Ensemble


To insert multiple rows into a table:

In [None]:
cur.execute("""INSERT INTO artists (name)
               VALUES	('Buddy Rich'),	('Candido'),	('Charlie Byrd')""")
con.commit()

In [None]:
query = """SELECT	* FROM	artists ORDER BY ArtistId DESC LIMIT 6"""
data = pd.read_sql(query, con)
data

Unnamed: 0,ArtistId,Name
0,279,Charlie Byrd
1,278,Candido
2,277,Buddy Rich
3,276,Bud Powell
4,275,Philip Glass Ensemble
5,274,Nash Ensemble


When you create a new table using the CREATE TABLE statement, you can specify a default value for each column. A column without an explicit default value uses NULL as its default.

The third form of the INSERT statement is INSERT DEFAULT VALUES, which inserts a new row into a table using the default values specified in the column definition or NULL if the default value is not available and the column does not have a NOT NULL constraint.

In [None]:
cur.execute("""INSERT INTO artists DEFAULT VALUES""")
con.commit()

In [None]:
query = """SELECT	* FROM	artists ORDER BY ArtistId DESC LIMIT 6"""
data = pd.read_sql(query, con)
data

Unnamed: 0,ArtistId,Name
0,280,
1,279,Charlie Byrd
2,278,Candido
3,277,Buddy Rich
4,276,Bud Powell
5,275,Philip Glass Ensemble


The default value of the ArtistId column is the next sequential integer. However, the Name column does not have a default value, so the INSERT DEFAULT VALUES statement inserts NULL into it.
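
To see a declared default and a NOT NULL column side by side, here is a minimal sketch. The tables `notes` and `strict_notes` are hypothetical and exist only in an in-memory database.

```python
import sqlite3

# Hypothetical tables in an in-memory database.
demo = sqlite3.connect(":memory:")
demo.execute("""
    CREATE TABLE notes (
        id INTEGER PRIMARY KEY,
        status TEXT DEFAULT 'draft',
        body TEXT
    )
""")

# status takes its declared default; body has none, so it becomes NULL.
demo.execute("INSERT INTO notes DEFAULT VALUES")
default_row = demo.execute("SELECT id, status, body FROM notes").fetchone()
print(default_row)  # (1, 'draft', None)

# A NOT NULL column without a default makes INSERT DEFAULT VALUES fail.
demo.execute("CREATE TABLE strict_notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
try:
    demo.execute("INSERT INTO strict_notes DEFAULT VALUES")
    strict_failed = False
except sqlite3.IntegrityError:
    strict_failed = True
print(strict_failed)  # True
```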

Suppose you want to back up the artists table. First, create a new table named artists_backup; then copy all rows from artists into it with an INSERT INTO ... SELECT statement, as follows:

In [None]:
cur.execute("""CREATE TABLE artists_backup(ArtistId INTEGER PRIMARY KEY AUTOINCREMENT,
                                           Name NVARCHAR)""")
cur.execute("""INSERT INTO artists_backup
               SELECT ArtistId, Name
               FROM artists;""")
con.commit()

In [None]:
query = """SELECT	* FROM	artists_backup"""
data = pd.read_sql(query, con)
data

Unnamed: 0,ArtistId,Name
0,1,AC/DC
1,2,Accept
2,3,Aerosmith
3,4,Alanis Morissette
4,5,Alice In Chains
...,...,...
275,276,Bud Powell
276,277,Buddy Rich
277,278,Candido
278,279,Charlie Byrd


## UPDATE

In [None]:
query = """SELECT	employeeid,	firstname, lastname, title,	email FROM employees"""
data = pd.read_sql(query, con)
data

Unnamed: 0,EmployeeId,FirstName,LastName,Title,Email
0,1,Andrew,Adams,General Manager,andrew@chinookcorp.com
1,2,Nancy,Edwards,Sales Manager,nancy@chinookcorp.com
2,3,Jane,Peacock,Sales Support Agent,jane@chinookcorp.com
3,4,Margaret,Park,Sales Support Agent,margaret@chinookcorp.com
4,5,Steve,Johnson,Sales Support Agent,steve@chinookcorp.com
5,6,Michael,Mitchell,IT Manager,michael@chinookcorp.com
6,7,Robert,King,IT Staff,robert@chinookcorp.com
7,8,Laura,Callahan,IT Staff,laura@chinookcorp.com


In [None]:
cur.execute("""UPDATE employees
               SET lastname = 'Smith'
               WHERE employeeid = 3""")
con.commit()

In [None]:
query = """SELECT	employeeid,	firstname, lastname, title,	email
           FROM employees
           WHERE employeeid = 3"""
data = pd.read_sql(query, con)
data

Unnamed: 0,EmployeeId,FirstName,LastName,Title,Email
0,3,Jane,Smith,Sales Support Agent,jane@chinookcorp.com


Update multiple columns example:

In [None]:
cur.execute("""UPDATE employees
               SET city = 'Toronto', state = 'ON', postalcode = 'M5P 2N7'
               WHERE employeeid = 4""")
con.commit()

In [None]:
query = """SELECT	employeeid,	firstname, lastname, state,	city,	PostalCode
           FROM employees
           WHERE employeeid = 4"""
data = pd.read_sql(query, con)
data

Unnamed: 0,EmployeeId,FirstName,LastName,State,City,PostalCode
0,4,Margaret,Park,ON,Toronto,M5P 2N7


In [None]:
query = """SELECT	employeeid,	firstname, lastname, email
           FROM employees"""
data = pd.read_sql(query, con)
data

Unnamed: 0,EmployeeId,FirstName,LastName,Email
0,1,Andrew,Adams,andrew@chinookcorp.com
1,2,Nancy,Edwards,nancy@chinookcorp.com
2,3,Jane,Smith,jane@chinookcorp.com
3,4,Margaret,Park,margaret@chinookcorp.com
4,5,Steve,Johnson,steve@chinookcorp.com
5,6,Michael,Mitchell,michael@chinookcorp.com
6,7,Robert,King,robert@chinookcorp.com
7,8,Laura,Callahan,laura@chinookcorp.com


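To update only one row in the employees table, add a LIMIT 1 clause. To make sure that the updated row is the first one when employees are sorted by first name, add an ORDER BY firstname clause. Note that ORDER BY and LIMIT in an UPDATE statement are available only when SQLite is compiled with the SQLITE_ENABLE_UPDATE_DELETE_LIMIT option.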

The LOWER() function converts the email to lower case.

In [None]:
cur.execute("""UPDATE employees
               SET email = LOWER(firstname || '.' || lastname || '@chinookcorp.com')
               ORDER BY	firstname
               LIMIT 1""")
con.commit()

In [None]:
query = """SELECT	employeeid,	firstname, lastname, email
           FROM employees"""
data = pd.read_sql(query, con)
data

Unnamed: 0,EmployeeId,FirstName,LastName,Email
0,1,Andrew,Adams,andrew.adams@chinookcorp.com
1,2,Nancy,Edwards,nancy@chinookcorp.com
2,3,Jane,Smith,jane@chinookcorp.com
3,4,Margaret,Park,margaret@chinookcorp.com
4,5,Steve,Johnson,steve@chinookcorp.com
5,6,Michael,Mitchell,michael@chinookcorp.com
6,7,Robert,King,robert@chinookcorp.com
7,8,Laura,Callahan,laura@chinookcorp.com
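
On SQLite builds where UPDATE does not accept ORDER BY and LIMIT, a subquery in the WHERE clause can pick the row instead. This is a sketch of that alternative on an in-memory stand-in for the employees table, with made-up rows; it is not the statement used above.

```python
import sqlite3

# In-memory stand-in for the employees table; the rows are made up.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE employees (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT,
                            LastName TEXT, Email TEXT);
    INSERT INTO employees VALUES
        (1, 'Nancy', 'Edwards', 'nancy@chinookcorp.com'),
        (2, 'Andrew', 'Adams', 'andrew@chinookcorp.com'),
        (3, 'Jane', 'Peacock', 'jane@chinookcorp.com');
""")

# The subquery selects the key of the first employee by first name;
# the UPDATE then touches only that row.
cur = demo.execute("""
    UPDATE employees
    SET Email = LOWER(FirstName || '.' || LastName || '@chinookcorp.com')
    WHERE EmployeeId = (SELECT EmployeeId FROM employees ORDER BY FirstName LIMIT 1)
""")
updated_count = cur.rowcount
emails = dict(demo.execute("SELECT EmployeeId, Email FROM employees"))

print(updated_count)  # 1
print(emails[2])      # andrew.adams@chinookcorp.com
```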


## DELETE

In [None]:
query = """SELECT	artistid,	name
           FROM	artists_backup"""
data = pd.read_sql(query, con)
data

Unnamed: 0,ArtistId,Name
0,1,AC/DC
1,2,Accept
2,3,Aerosmith
3,4,Alanis Morissette
4,5,Alice In Chains
...,...,...
275,276,Bud Powell
276,277,Buddy Rich
277,278,Candido
278,279,Charlie Byrd


In [None]:
cur.execute("""DELETE FROM artists_backup
               WHERE artistid = 1""")
con.commit()

In [None]:
query = """SELECT	artistid,	name
           FROM	artists_backup"""
data = pd.read_sql(query, con)
data

Unnamed: 0,ArtistId,Name
0,2,Accept
1,3,Aerosmith
2,4,Alanis Morissette
3,5,Alice In Chains
4,6,Antônio Carlos Jobim
...,...,...
274,276,Bud Powell
275,277,Buddy Rich
276,278,Candido
277,279,Charlie Byrd


In [None]:
cur.execute("""DELETE FROM artists_backup
               WHERE name LIKE '%Santana%'
               """)
con.commit()

In [None]:
query = """SELECT	artistid,	name
           FROM	artists_backup"""
data = pd.read_sql(query, con)
data

Unnamed: 0,ArtistId,Name
0,2,Accept
1,3,Aerosmith
2,4,Alanis Morissette
3,5,Alice In Chains
4,6,Antônio Carlos Jobim
...,...,...
265,276,Bud Powell
266,277,Buddy Rich
267,278,Candido
268,279,Charlie Byrd


To remove all rows from the artists_backup table, omit the WHERE clause, as in the following statement:

In [None]:
cur.execute("DELETE FROM artists_backup")
con.commit()

In [None]:
query = "SELECT	* FROM artists_backup"
data = pd.read_sql(query, con)
data

Unnamed: 0,ArtistId,Name
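
Since a DELETE without a WHERE clause removes every row, it can help to inspect the effect before calling commit(). The sketch below works on an in-memory stand-in for artists_backup with three made-up rows, and relies on the default transaction handling of Python's sqlite3 module.

```python
import sqlite3

# In-memory stand-in for artists_backup; the rows are made up.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE artists_backup (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
demo.executemany(
    "INSERT INTO artists_backup (Name) VALUES (?)",
    [("AC/DC",), ("Santana",), ("Santana Feat. Dave Matthews",)],
)
demo.commit()

# rowcount reports how many rows the filtered DELETE removed.
deleted_with_filter = demo.execute(
    "DELETE FROM artists_backup WHERE Name LIKE '%Santana%'"
).rowcount

# Without WHERE, every remaining row is removed.
demo.execute("DELETE FROM artists_backup")
count_after_delete = demo.execute("SELECT COUNT(*) FROM artists_backup").fetchone()[0]

# Nothing was committed after the deletes, so rollback() restores all rows.
demo.rollback()
count_after_rollback = demo.execute("SELECT COUNT(*) FROM artists_backup").fetchone()[0]

print(deleted_with_filter, count_after_delete, count_after_rollback)  # 2 0 3
```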


## REPLACE

The idea of the REPLACE statement is that when a UNIQUE or PRIMARY KEY constraint violation occurs, it does the following:

First, it deletes the existing row that causes the constraint violation.
Second, it inserts a new row.

If the second step causes any constraint violation, e.g., a NOT NULL constraint violation, the REPLACE statement aborts the action and rolls back the transaction.

In [None]:
cur.execute("""CREATE TABLE IF NOT EXISTS positions (id INTEGER PRIMARY KEY,
                                                     title TEXT NOT NULL,
                                                     min_salary NUMERIC)
            """)
cur.execute("""INSERT INTO positions (title, min_salary)
               VALUES ('DBA', 120000), ('Developer', 100000), ('Architect', 150000)
            """)
con.commit()

In [None]:
query = "SELECT	* FROM positions"
data = pd.read_sql(query, con)
data

Unnamed: 0,id,title,min_salary
0,1,DBA,120000
1,2,Developer,100000
2,3,Architect,150000


The following statement creates a unique index on the title column of the positions table to ensure that no two positions share the same title:

In [None]:
cur.execute("""CREATE UNIQUE INDEX idx_positions_title
               ON positions (title)
            """)
con.commit()

In [None]:
query = "SELECT	* FROM positions"
data = pd.read_sql(query, con)
data

Unnamed: 0,id,title,min_salary
0,1,DBA,120000
1,2,Developer,100000
2,3,Architect,150000


Suppose you want to add a position to the positions table if it does not exist and, if it does exist, update the current one.

The following REPLACE statement inserts a new row into the positions table because the position title Full Stack Developer is not in the positions table.

In [None]:
cur.execute("""REPLACE INTO positions (title, min_salary)
               VALUES('Full Stack Developer', 140000)
            """)
con.commit()

In [None]:
query = "SELECT	* FROM positions"
data = pd.read_sql(query, con)
data

Unnamed: 0,id,title,min_salary
0,1,DBA,120000
1,2,Developer,100000
2,3,Architect,150000
3,4,Full Stack Developer,140000


In [None]:
cur.execute("""REPLACE INTO positions (title, min_salary)
               VALUES('DBA', 170000)
            """)
con.commit()

In [None]:
query = "SELECT	* FROM positions"
data = pd.read_sql(query, con)
data

Unnamed: 0,id,title,min_salary
0,2,Developer,100000
1,3,Architect,150000
2,4,Full Stack Developer,140000
3,5,DBA,170000


First, SQLite checked the UNIQUE constraint. Second, because this statement violated the UNIQUE constraint by trying to add the DBA title that already exists, SQLite deleted the existing row. Third, SQLite inserted a new row with the data provided by the REPLACE statement.

Notice that the REPLACE statement means INSERT or REPLACE, not INSERT or UPDATE.

See the following statement.

In [None]:
cur.execute("""REPLACE INTO positions (id, min_salary)
               VALUES(2, 110000)
            """)
con.commit()

IntegrityError: ignored

The statement tried to update the min_salary of the position with id 2, which is the Developer position.

First, because the position with id 2 already exists, the REPLACE statement removes it. Then, SQLite tries to insert a new row with two columns: (id, min_salary). However, this violates the NOT NULL constraint on the title column, so SQLite rolls back the transaction.
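
The practical consequence is that the existing row survives the failed REPLACE. The sketch below checks this on an in-memory copy of the positions table that holds only the Developer row.

```python
import sqlite3

# In-memory copy of the positions table with a single row.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE positions (id INTEGER PRIMARY KEY,
                            title TEXT NOT NULL,
                            min_salary NUMERIC);
    INSERT INTO positions (id, title, min_salary) VALUES (2, 'Developer', 100000);
""")

# title is missing, so the replacement row would violate NOT NULL.
try:
    demo.execute("REPLACE INTO positions (id, min_salary) VALUES (2, 110000)")
    replace_failed = False
except sqlite3.IntegrityError:
    replace_failed = True

# The original row is still there, unchanged.
rows_after = demo.execute("SELECT id, title, min_salary FROM positions").fetchall()
print(replace_failed)  # True
print(rows_after)      # [(2, 'Developer', 100000)]
```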

If the title column did not have the NOT NULL constraint, the REPLACE statement would insert a new row whose title column is NULL.
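
If the goal is to change an existing row in place rather than delete and re-insert it, one alternative is SQLite's INSERT ... ON CONFLICT ... DO UPDATE clause. It is a different statement from REPLACE and, as far as I know, requires SQLite 3.24 or later. The sketch below runs on an in-memory positions table that declares title as UNIQUE directly instead of through a separate index.

```python
import sqlite3

# In-memory positions table; title is declared UNIQUE inline for brevity.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE positions (id INTEGER PRIMARY KEY,
                            title TEXT NOT NULL UNIQUE,
                            min_salary NUMERIC);
    INSERT INTO positions (title, min_salary)
    VALUES ('DBA', 120000), ('Developer', 100000);
""")

# On a title conflict, update the existing row; "excluded" refers to the
# row that the INSERT tried to add.
demo.execute("""
    INSERT INTO positions (title, min_salary) VALUES ('DBA', 170000)
    ON CONFLICT(title) DO UPDATE SET min_salary = excluded.min_salary
""")
upserted = demo.execute("SELECT id, title, min_salary FROM positions ORDER BY id").fetchall()
print(upserted)  # [(1, 'DBA', 170000), (2, 'Developer', 100000)]
```

Unlike the REPLACE example, the DBA row keeps id 1 because it is updated rather than deleted and re-inserted.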