isabella232/data-engineering-exercise

Data Engineering Exercise

Instructions

  1. Download exercise.sql and exercise.tar.gz files
  2. Complete exercise
  3. Email optimized exercise.sql file and question answers to data-engineering-exercise@evolytics.com

Purpose

Here at Evolytics, we often receive requests to help optimize SQL. While the original SQL generates the correct output, it consumes more resources than necessary to execute, and its poor readability makes the code difficult to update and maintain.

This data engineering exercise focuses on the ability to identify sub-optimal and poorly written SQL, and then refactor it to improve performance and readability. The exercise is entirely query-based, so we are not looking for solutions that incorporate stored procedures, user-defined functions, or external programming.

Any query-based solution that meets the following objectives will be considered in scope for this exercise.

Objective

Refactor the SQL with the following criteria in mind:

  1. Optimized query plan
  2. Enhanced code readability
  3. Easier code maintenance
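
One way to check the first criterion is to compare the execution plan MySQL reports for a query before and after refactoring. The sketch below wraps the mysql client's EXPLAIN call in a small shell helper. The function name show_plan is hypothetical, and the helper assumes the evolytics database described in the Environment section and client credentials that are already configured (for example in ~/.my.cnf).

```shell
# show_plan QUERY
# Prints the execution plan MySQL reports for a single SELECT statement.
# Pass the statement without a trailing semicolon.
# Assumption: the mysql client is on the PATH and can log in without prompting.
show_plan() {
    mysql --database=evolytics --execute="EXPLAIN $1"
}
```

For example, `show_plan "SELECT visitor_id, COUNT(*) FROM exercise GROUP BY visitor_id"` prints the plan for an illustrative query (it is not taken from exercise.sql). Comparing the type and rows columns of the two plans gives a rough sense of whether the refactored version does less work.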

Final Output

The final output of this exercise is an updated SQL file that returns exactly the same dataset as the original SQL.
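
A simple way to confirm that two versions of the query return the same dataset is to export both results to files and compare them. The sketch below is one possible check, not part of the exercise itself. The function name same_dataset is hypothetical, and it assumes that row order is not significant; if the original SQL orders its output, drop the sort steps and compare the files directly.

```shell
# same_dataset FILE_A FILE_B
# Exits 0 when both files contain the same rows, ignoring row order,
# and non-zero otherwise.
same_dataset() {
    a_sorted=$(mktemp) || return 2
    b_sorted=$(mktemp) || { rm -f "$a_sorted"; return 2; }
    # Sort bytewise so the comparison does not depend on the locale.
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    if cmp -s "$a_sorted" "$b_sorted"; then
        status=0
    else
        status=1
    fi
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}
```

The two input files could be produced by running each SQL file through the mysql client in batch mode, for example `mysql --batch evolytics < exercise.sql > original.tsv`, and then calling `same_dataset original.tsv refactored.tsv`. The file names original.tsv and refactored.tsv are placeholders.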

Environment

This exercise was written and tested using MySQL on Mac with default install locations. All code and instructions needed to reproduce the environment are provided below. You are welcome to follow them as is, or to use any other database platform that you are comfortable with. Keep in mind that if you choose a different platform, you may need to modify the environment instructions as well as the original SQL to match that platform's keywords and syntax.

All steps were executed on macOS Mojave.

  1. Download and install MySQL. We tested using version 8.0.11.

  2. Create or add entries to /etc/my.cnf so that the load data infile command can read from your data directory

     sudo nano /etc/my.cnf
    
     [mysqld]
     secure_file_priv=/add_your_path_here
    
  3. Create or add entries to ~/.bash_profile so the MySQL binaries are in the PATH variable

     nano ~/.bash_profile
    
     export PATH="/usr/local/mysql/bin:${PATH}"
    
  4. Restart MySQL from System Preferences so the my.cnf changes take effect, and open a new terminal session so the PATH change is picked up

  5. Execute the following lines of code to create and populate the database. Modify the infile directory to match the secure_file_priv value set in /etc/my.cnf.

     create database evolytics;
    
     create table evolytics.exercise (
         visitor_id            bigint,
         visit_num             int,
         visit_start_timestamp datetime,
         hit_timestamp         datetime,
         transaction_type      varchar(6),
         transaction_action    varchar(21)
     );
    
     load data infile '/add_your_path_here/exercise.tsv' into table evolytics.exercise fields terminated by '\t' ignore 1 rows;
    
  6. Execute the original SQL to generate the dataset, then begin the refactoring exercise.
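
Before running the original SQL, it can be worth confirming that the load in step 5 picked up every row. The sketch below counts the data rows in the source file so the number can be compared with the table's row count. It assumes that exercise.tar.gz unpacks to the exercise.tsv file named in the load statement and that the file has a single header line, as the `ignore 1 rows` clause suggests. The function name expected_rows is hypothetical.

```shell
# expected_rows FILE
# Prints the number of data rows in a TSV file, skipping the one header
# line that the load statement also skips with "ignore 1 rows".
expected_rows() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}
```

After extracting the archive with `tar -xzf exercise.tar.gz`, the output of `expected_rows exercise.tsv` should match the result of `select count(*) from evolytics.exercise;`. A mismatch usually points to a wrong field terminator or a file that was only partially loaded.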
