# Improving Performance of Slow Queries in MySQL

## Objectives
After completing this lab, you will be able to:

* Use the **EXPLAIN** statement to check the performance of your query
* Add indexes to improve the performance of your query
* Apply other best practices such as using the **UNION ALL** clause to improve query performance


https://www.coursera.org/learn/relational-database-administration/ungradedLti/VByCa/hands-on-lab-improving-performance-of-slow-queries-in-mysql

In [1]:
!curl -O https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0231EN-SkillsNetwork/labs/MySQL/Lab%20-%20Improving%20Performance%20of%20Slow%20Queries%20in%20MySQL/images/employees-schema.png



## Database Used in this Lab
The Employees database used in this lab comes from the following source: https://dev.mysql.com/doc/employee/en/ under the CC BY-SA 3.0 License.

The following entity relationship diagram (ERD) shows the schema of the Employees database:

![](employees-schema.png)

## Exercise 1: Load the Database

In [2]:
#!curl -O https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0231EN-SkillsNetwork/datasets/employeesdb.zip

In [3]:
#!unzip employeesdb.zip

In [4]:
#!rm employeesdb.zip

In [5]:
#mysql --host=127.0.0.1 --port=3306 --user=root --password -t < employees.sql 

In [11]:
path = "/usr/local/mysql-8.0.31-macos12-arm64/bin/"

In [12]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Read the database password from the environment variable
password = os.getenv("DB_PASSWORD")

In [13]:
!{path}mysql --host=127.0.0.1 --port=3306 --user=root --password={password} --execute="SHOW DATABASES" 2>/dev/null;

+--------------------+
| Database           |
+--------------------+
| employees          |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| world              |
+--------------------+


In [14]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [15]:
# Create the connection URL
%sql mysql+pymysql://root:{password}@localhost:3306/employees
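The connection URL above embeds the password directly, which can break if the password contains characters such as `@` or `/`. A minimal sketch of one way to guard against that, assuming a hypothetical helper `build_mysql_url` and a made-up password (neither is part of the lab), is to percent-encode the password with the standard library before building the URL:

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host="localhost", port=3306, database="employees"):
    """Hypothetical helper: build a mysql+pymysql URL with a percent-encoded password."""
    return f"mysql+pymysql://{user}:{quote_plus(password)}@{host}:{port}/{database}"


# "p@ss/word" is a made-up password used only to show the encoding.
url = build_mysql_url("root", "p@ss/word")
print(url)  # mysql+pymysql://root:p%40ss%2Fword@localhost:3306/employees
```

The resulting string could then be passed to `%sql` in place of the hand-written URL.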

In [16]:
%sql show tables;

 * mysql+pymysql://root:***@localhost:3306/employees
8 rows affected.


Tables_in_employees
current_dept_emp
departments
dept_emp
dept_emp_latest_date
dept_manager
employees
salaries
titles


## Exercise 2: Check Your Query's Performance with EXPLAIN
The **EXPLAIN** statement shows how MySQL plans to execute a statement, including an estimate of how many rows it expects to examine. This is helpful when a query is running slowly. For example, is it slow because it scans the entire table each time?

Let’s start by selecting all the data from the employees table (the notebook cell below limits the output to 10 rows):

In [17]:
%sql select * from employees limit 10;

 * mysql+pymysql://root:***@localhost:3306/employees
10 rows affected.


emp_no,birth_date,first_name,last_name,gender,hire_date
10001,1953-09-02,Georgi,Facello,M,1986-06-26
10002,1964-06-02,Bezalel,Simmel,F,1985-11-21
10003,1959-12-03,Parto,Bamford,M,1986-08-28
10004,1954-05-01,Chirstian,Koblick,M,1986-12-01
10005,1955-01-21,Kyoichi,Maliniak,M,1989-09-12
10006,1953-04-20,Anneke,Preusig,F,1989-06-02
10007,1957-05-23,Tzvetan,Zielinski,F,1989-02-10
10008,1958-02-19,Saniya,Kalloufi,M,1994-09-15
10009,1952-04-19,Sumant,Peac,F,1985-02-18
10010,1963-06-01,Duangkaew,Piveteau,F,1989-08-24


In [18]:
#Use mysql CLI
#SELECT * FROM employees;

In [26]:
!curl -O https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0231EN-SkillsNetwork/labs/MySQL/Lab%20-%20Improving%20Performance%20of%20Slow%20Queries%20in%20MySQL/images/b-select_all_output.png



![](b-select_all_output.png)

We can use EXPLAIN to see how many rows MySQL expects to scan:

In [27]:
#EXPLAIN SELECT * FROM employees;

![](b-explain.png)

In [28]:
%sql Explain select * from employees;

 * mysql+pymysql://root:***@localhost:3306/employees
1 rows affected.


id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,employees,,ALL,,,,,298980,100.0,


Notice that EXPLAIN reports type as ALL, meaning a full table scan, and estimates that 298,980 rows will be examined, which is almost the entire table! With a larger table, this could result in the query running slowly.
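When checking many queries, it can help to flag full table scans programmatically. The sketch below is only an illustration: `is_full_table_scan` is a hypothetical helper, and the dictionary is a hand-copied version of the EXPLAIN row above, with empty cells written as `None`.

```python
def is_full_table_scan(explain_row: dict) -> bool:
    """Hypothetical helper: True when EXPLAIN reports type ALL and no index was chosen."""
    return explain_row.get("type") == "ALL" and not explain_row.get("key")


# Row copied by hand from the EXPLAIN output above.
row = {
    "table": "employees",
    "type": "ALL",
    "possible_keys": None,
    "key": None,
    "rows": 298980,
    "filtered": 100.0,
}

print(is_full_table_scan(row))  # True
```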

## Exercise 3: Add an Index to Your Table

In [29]:
%sql SHOW INDEX FROM employees;

 * mysql+pymysql://root:***@localhost:3306/employees
1 rows affected.


Table,Non_unique,Key_name,Seq_in_index,Column_name,Collation,Cardinality,Sub_part,Packed,Null,Index_type,Comment,Index_comment,Visible,Expression
employees,0,PRIMARY,1,emp_no,A,298980,,,,BTREE,,,YES,


Remember that an index is created automatically for the primary key, as we can see above for emp_no. This makes sense because each employee number is unique to the employee and is never NULL.



Now, let’s say we wanted to see all the information about employees who were hired on or after January 1, 2000. We can do that with the query:

In [30]:
%sql SELECT * FROM employees WHERE hire_date >= '2000-01-01';

 * mysql+pymysql://root:***@localhost:3306/employees
13 rows affected.


emp_no,birth_date,first_name,last_name,gender,hire_date
47291,1960-09-09,Ulf,Flexer,M,2000-01-12
60134,1964-04-21,Seshu,Rathonyi,F,2000-01-02
72329,1953-02-09,Randi,Luit,F,2000-01-02
108201,1955-04-14,Mariangiola,Boreale,M,2000-01-01
205048,1960-09-12,Ennio,Alblas,F,2000-01-06
222965,1959-08-07,Volkmar,Perko,F,2000-01-13
226633,1958-06-10,Xuejun,Benzmuller,F,2000-01-04
227544,1954-11-17,Shahab,Demeyer,M,2000-01-08
422990,1953-04-09,Jaana,Verspoor,F,2000-01-11
424445,1953-04-27,Jeong,Boreale,M,2000-01-03


The query returned 13 rows and took about 0.17 seconds to execute (the timing is reported by the mysql command-line client; the notebook output above does not show it). That may not seem like a long time with this table, but keep in mind that with larger tables, this time can vary greatly.

With the EXPLAIN statement, we can check how many rows this query is scanning:

In [31]:
%sql EXPLAIN SELECT * FROM employees WHERE hire_date >= '2000-01-01';

 * mysql+pymysql://root:***@localhost:3306/employees
1 rows affected.


id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,employees,,ALL,,,,,298980,33.33,Using where


This query results in an estimated scan of 298,980 rows, which is nearly the entire table!

By adding an index to the hire_date column, we’ll be able to reduce the query’s need to search through every entry of the table, instead only searching through what it needs.
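To see why an index cuts down the rows examined, here is a toy sketch in plain Python. It is not how MySQL implements its B-tree indexes; it only assumes that an index behaves like a sorted list of (hire_date, emp_no) entries that can be searched by bisection. The five rows are taken from the outputs in this lab.

```python
from bisect import bisect_left

# Toy table: (emp_no, hire_date) pairs; ISO date strings sort chronologically.
employees = [
    (10001, "1986-06-26"),
    (47291, "2000-01-12"),
    (10002, "1985-11-21"),
    (60134, "2000-01-02"),
    (10005, "1989-09-12"),
]

# Without an index, every row has to be checked against the condition.
scan_hits = [emp for emp in employees if emp[1] >= "2000-01-01"]
rows_examined_scan = len(employees)

# With an "index": entries sorted by hire_date, so the matching range is contiguous.
hire_date_index = sorted((hire_date, emp_no) for emp_no, hire_date in employees)
start = bisect_left(hire_date_index, ("2000-01-01",))  # first entry on or after the date
index_hits = hire_date_index[start:]
rows_examined_index = len(index_hits)

print(f"full scan examined {rows_examined_scan} rows; index range examined {rows_examined_index} rows")
```

Because the sorted entries are read in order, the matches come back ordered by hire date, which may be why the indexed query later in this exercise lists its rows in hire_date order.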

In [32]:
%sql CREATE INDEX hire_date_index ON employees(hire_date);

 * mysql+pymysql://root:***@localhost:3306/employees
0 rows affected.


[]

The CREATE INDEX command creates an index called hire_date_index on the table employees on column hire_date.

To check your index, you can use the SHOW INDEX command:



In [33]:
%sql SHOW INDEX FROM employees;

 * mysql+pymysql://root:***@localhost:3306/employees
2 rows affected.


Table,Non_unique,Key_name,Seq_in_index,Column_name,Collation,Cardinality,Sub_part,Packed,Null,Index_type,Comment,Index_comment,Visible,Expression
employees,0,PRIMARY,1,emp_no,A,298980,,,,BTREE,,,YES,
employees,1,hire_date_index,1,hire_date,A,5647,,,,BTREE,,,YES,


Now you can see that the table has both the PRIMARY index on emp_no and hire_date_index on hire_date.

Once more, let’s select all the employees who were hired on or after January 1, 2000.

In [34]:
%sql SELECT * FROM employees WHERE hire_date >= '2000-01-01';

 * mysql+pymysql://root:***@localhost:3306/employees
13 rows affected.


emp_no,birth_date,first_name,last_name,gender,hire_date
108201,1955-04-14,Mariangiola,Boreale,M,2000-01-01
60134,1964-04-21,Seshu,Rathonyi,F,2000-01-02
72329,1953-02-09,Randi,Luit,F,2000-01-02
424445,1953-04-27,Jeong,Boreale,M,2000-01-03
226633,1958-06-10,Xuejun,Benzmuller,F,2000-01-04
205048,1960-09-12,Ennio,Alblas,F,2000-01-06
227544,1954-11-17,Shahab,Demeyer,M,2000-01-08
422990,1953-04-09,Jaana,Verspoor,F,2000-01-11
47291,1960-09-09,Ulf,Flexer,M,2000-01-12
222965,1959-08-07,Volkmar,Perko,F,2000-01-13


The difference is quite evident! Rather than taking about 0.17 seconds to execute the query, it takes about 0.00 seconds, almost no time at all.

We can use the EXPLAIN statement to see how many rows were scanned:

In [35]:
%sql EXPLAIN SELECT * FROM employees WHERE hire_date >= '2000-01-01';

 * mysql+pymysql://root:***@localhost:3306/employees
1 rows affected.


id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,employees,,range,hire_date_index,hire_date_index,3,,13,100.0,Using index condition


Under rows, we can see that only the 13 necessary rows were scanned, leading to the improved performance.

Under key, you can see that hire_date_index is the index that was actually used; possible_keys lists the indexes MySQL considered. Under Extra, Using index condition confirms that the WHERE condition was evaluated using the index.

Now, if you want to remove the index, run the following statement:

In [36]:
%sql DROP INDEX hire_date_index ON employees;

 * mysql+pymysql://root:***@localhost:3306/employees
0 rows affected.


[]

This will remove the hire_date_index on the employees table. You can check with the SHOW INDEX command to confirm:



In [37]:
%sql SHOW INDEX FROM employees;

 * mysql+pymysql://root:***@localhost:3306/employees
1 rows affected.


Table,Non_unique,Key_name,Seq_in_index,Column_name,Collation,Cardinality,Sub_part,Packed,Null,Index_type,Comment,Index_comment,Visible,Expression
employees,0,PRIMARY,1,emp_no,A,298980,,,,BTREE,,,YES,


## Exercise 4: Use a UNION ALL Clause

Sometimes, you might want to run a query that combines LIKE conditions with the OR operator. In this case, using a UNION ALL clause can improve the speed of your query, particularly if the columns on both sides of the OR operator are indexed.

In [39]:
#CLI
#SELECT * FROM employees WHERE first_name LIKE 'C%' OR last_name LIKE 'C%';


28970 rows in set (0,13 sec)

This query searches for first names or last names that start with “C”. It returned 28,970 rows, taking about 0.13 seconds.

In [40]:
%sql EXPLAIN SELECT * FROM employees WHERE first_name LIKE 'C%' OR last_name LIKE 'C%';

 * mysql+pymysql://root:***@localhost:3306/employees
1 rows affected.


id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,employees,,ALL,,,,,298980,20.99,Using where


Once more, we can see that almost all the rows are being scanned, so let’s add an index to each of the first_name and last_name columns:

In [42]:
%%sql
CREATE INDEX first_name_index ON employees(first_name);
CREATE INDEX last_name_index ON employees(last_name);

 * mysql+pymysql://root:***@localhost:3306/employees
0 rows affected.
0 rows affected.


[]

You can also check to see if your indexes have been added with the SHOW INDEX command:

In [43]:
%sql SHOW INDEX FROM employees;

 * mysql+pymysql://root:***@localhost:3306/employees
3 rows affected.


Table,Non_unique,Key_name,Seq_in_index,Column_name,Collation,Cardinality,Sub_part,Packed,Null,Index_type,Comment,Index_comment,Visible,Expression
employees,0,PRIMARY,1,emp_no,A,298980,,,,BTREE,,,YES,
employees,1,first_name_index,1,first_name,A,1266,,,,BTREE,,,YES,
employees,1,last_name_index,1,last_name,A,1684,,,,BTREE,,,YES,


Great! With your indexes now in place, we can re-run the query:



In [45]:
#SELECT * FROM employees WHERE first_name LIKE 'C%' OR last_name LIKE 'C%';

28970 rows in set (0,13 sec)


Let’s also see how many rows are being scanned:



In [46]:
%sql EXPLAIN SELECT * FROM employees WHERE first_name LIKE 'C%' OR last_name LIKE 'C%';

 * mysql+pymysql://root:***@localhost:3306/employees
1 rows affected.


id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,employees,,ALL,"first_name_index,last_name_index",,,,298980,20.99,Using where


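Even with the indexes in place, the query still scans all the rows: possible_keys lists both indexes, but key is empty and type is still ALL, so neither index was chosen.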



Let’s use the UNION ALL clause to improve the performance of this query.

We can do this with the following:

In [None]:
#SELECT * FROM employees WHERE first_name LIKE 'C%' UNION ALL SELECT * FROM employees WHERE last_name LIKE 'C%';

This query runs faster than the version that used the OR operator.

Using the EXPLAIN statement, we can see why that might be:

In [49]:
%sql EXPLAIN SELECT * FROM employees WHERE first_name LIKE 'C%' UNION ALL SELECT * FROM employees WHERE last_name LIKE 'C%';

 * mysql+pymysql://root:***@localhost:3306/employees
2 rows affected.


id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,PRIMARY,employees,,range,first_name_index,first_name_index,58,,20622,100.0,Using index condition
2,UNION,employees,,range,last_name_index,last_name_index,66,,34168,100.0,Using index condition


As the EXPLAIN statement reveals, two SELECT operations were performed, each using its own index, with the estimated number of rows scanned totaling 54,790 (20,622 + 34,168). This is fewer than the original query, which scanned the entire table, and, as a result, the query performs faster.
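One caveat is worth checking before swapping OR for UNION ALL: a row that satisfies both conditions is returned by both branches, so it appears twice. The sketch below illustrates this with an in-memory SQLite database standing in for MySQL; the table is toy data (rows 2 and 3 are hypothetical), and the deduplicating variant assumes first_name is never NULL.

```python
import sqlite3

# In-memory SQLite is only a stand-in for MySQL; the rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Chirstian", "Koblick"),  # only the first name matches
        (2, "Georgi", "Carter"),      # only the last name matches (hypothetical row)
        (3, "Carl", "Cole"),          # both names match (hypothetical row)
        (4, "Parto", "Bamford"),      # neither name matches
    ],
)

or_query = "SELECT * FROM employees WHERE first_name LIKE 'C%' OR last_name LIKE 'C%'"
union_all_query = (
    "SELECT * FROM employees WHERE first_name LIKE 'C%' "
    "UNION ALL SELECT * FROM employees WHERE last_name LIKE 'C%'"
)
# One way to return each row once: skip first-branch matches in the second branch.
dedup_query = (
    "SELECT * FROM employees WHERE first_name LIKE 'C%' "
    "UNION ALL SELECT * FROM employees "
    "WHERE last_name LIKE 'C%' AND first_name NOT LIKE 'C%'"
)

or_count = len(conn.execute(or_query).fetchall())
union_all_count = len(conn.execute(union_all_query).fetchall())
dedup_count = len(conn.execute(dedup_query).fetchall())
print(or_count, union_all_count, dedup_count)  # 3 4 3
```

Whether MySQL still uses both indexes for the deduplicating variant is not shown here; it would need to be confirmed with EXPLAIN on the real table.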

Please note, if you choose to perform a leading wildcard search with an index, the entire table will still be scanned. You can see this yourself with the following query:

In [50]:
#SELECT * FROM employees WHERE first_name LIKE '%C';

With this query, we want to find all the employees whose first names end with “C”.

Checking with the EXPLAIN and SHOW INDEX statements, we can see that although first_name has an index, the index is not used, and the entire table is searched.

In the EXPLAIN output, both possible_keys and key are NULL (shown as empty cells below), so MySQL did not consider the index at all.

In [51]:
%sql EXPLAIN SELECT * FROM employees WHERE first_name LIKE '%C';

 * mysql+pymysql://root:***@localhost:3306/employees
1 rows affected.


id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,employees,,ALL,,,,,298980,11.11,Using where


In [52]:
%sql SHOW INDEX from employees;

 * mysql+pymysql://root:***@localhost:3306/employees
3 rows affected.


Table,Non_unique,Key_name,Seq_in_index,Column_name,Collation,Cardinality,Sub_part,Packed,Null,Index_type,Comment,Index_comment,Visible,Expression
employees,0,PRIMARY,1,emp_no,A,298980,,,,BTREE,,,YES,
employees,1,first_name_index,1,first_name,A,1266,,,,BTREE,,,YES,
employees,1,last_name_index,1,last_name,A,1684,,,,BTREE,,,YES,


On the other hand, indexes do work with trailing wildcards, as seen with the following query that finds all employees whose first names begin with “C”:

In [53]:
#SELECT * FROM employees WHERE first_name LIKE 'C%';

In [54]:
%sql EXPLAIN SELECT * FROM employees WHERE first_name LIKE 'C%';

 * mysql+pymysql://root:***@localhost:3306/employees
1 rows affected.


id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,employees,,range,first_name_index,first_name_index,58,,20622,100.0,Using index condition


Under the EXPLAIN statement’s key column, we can see that first_name_index is used, and Extra shows Using index condition. With only an estimated 20,622 rows scanned, the query performs better.
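The difference between leading and trailing wildcards can be pictured with a sorted list of names standing in for first_name_index. This is a simplified sketch, not MySQL's actual B-tree logic: the names are toy data (some hypothetical), `prefix_range` is a made-up helper, and the comparison is case-sensitive, so the suffix is written in lowercase.

```python
from bisect import bisect_left

# Sorted names play the role of first_name_index (toy data).
first_name_index = sorted(
    ["Anneke", "Bezalel", "Chirstian", "Cedric", "Duangkaew", "Eric", "Georgi", "Marc"]
)


def prefix_range(index, prefix):
    """Hypothetical helper: bounds of the contiguous slice whose entries start with prefix."""
    start = bisect_left(index, prefix)
    stop = start
    while stop < len(index) and index[stop].startswith(prefix):
        stop += 1
    return start, stop


# LIKE 'C%': all matches sit next to each other, so only that slice is read.
start, stop = prefix_range(first_name_index, "C")
prefix_hits = first_name_index[start:stop]
print(prefix_hits)  # ['Cedric', 'Chirstian']

# LIKE '%c': matches are scattered through the sorted order, so every entry must be checked.
suffix_positions = [i for i, name in enumerate(first_name_index) if name.endswith("c")]
print(suffix_positions)  # [2, 5, 7]
```

Sorting by the first characters groups names with the same prefix together, but says nothing about where names with a given ending will land.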

## Exercise 5: Be SELECTive

In general, it’s best practice to only select the columns that you need. For example, if you wanted to see the names and hire dates of the various employees, you could show that with the following query:



In [56]:
#SELECT * FROM employees;

Notice how the query loads 300,024 rows in about 0.26 seconds. With the EXPLAIN statement, we can see that the entire table is being scanned, which makes sense because we are looking at all the entries.

If we, however, only wanted to see the names and hire dates, then we should select those columns:

In [55]:
#SELECT first_name, last_name, hire_date FROM employees;

This query executes a little faster, even though it also scans the entire table, because less data has to be read and returned for each row.
