# **Lead Data Scientist Interview**

Welcome to the Utility Warehouse Lead Data Scientist interview! 

This interview is not about scrutinising your results or checking that your Python syntax is correct; it is about your thought process: how you approach and solve the kinds of problems you may face as a data scientist.

You're free to Google anything you're unsure of, or to ask us for help, and don't worry, we won't penalise you for this! If you'd be more comfortable turning off your camera while you work, feel free to do so.

The interview will cover four core sections of a data scientist's work:

1. SQL
2. Algorithms
3. Python
4. Data Science Theory

Good luck!

As stated earlier, *we are more interested in how you think about and approach a problem than in how correct your Python code is*, so feel free to Google!

## **SQL**

The initial portion of this test concerns SQL.

```python
# CREATING THE TABLES
import sqlite3

conn = sqlite3.connect('test.db')
print("Opened database successfully")

# Drop both tables so that re-running the notebook does not insert duplicate rows
conn.execute("DROP TABLE IF EXISTS team_data;")
conn.execute("DROP TABLE IF EXISTS trophy_data;")

conn.execute('''
CREATE TABLE IF NOT EXISTS team_data(team text,
                      country text,
                      season integer,
                      total_goals integer);''')

conn.execute('''
CREATE TABLE IF NOT EXISTS trophy_data(team text,
                      country text,
                      season integer,
                      trophy_type text);''')

conn.commit()

print("Tables created successfully")

# conn.close()
```

```python
# INSERTING VALUES

conn.execute("INSERT INTO team_data VALUES('Real Madrid', 'Spain', 2019, 63);")
conn.execute("INSERT INTO team_data VALUES('Barcelona', 'Spain', 2019, 47);")
conn.execute("INSERT INTO team_data VALUES('Arsenal', 'UK', 2019, 52);")
conn.execute("INSERT INTO team_data VALUES('Real Madrid', 'Spain', 2018, 49);")
conn.execute("INSERT INTO team_data VALUES('Barcelona', 'Spain', 2018, 45);")
conn.execute("INSERT INTO team_data VALUES('Arsenal', 'UK', 2018, 50);")
conn.execute("INSERT INTO team_data VALUES('Real Madrid', 'Spain', 2022, 65);")
conn.execute("INSERT INTO team_data VALUES('Bayern', 'Germany', 2020, 55);")
conn.execute("INSERT INTO team_data VALUES('Bayern', 'Germany', 2021, 70);")
conn.execute("INSERT INTO team_data VALUES('Bayern', 'Germany', 2022, 55);")

conn.execute("INSERT INTO trophy_data VALUES('Bayern', 'Germany', 2022, 'Bundesliga');")
conn.execute("INSERT INTO trophy_data VALUES('Bayern', 'Germany', 2021, 'Bundesliga');")
conn.execute("INSERT INTO trophy_data VALUES('Bayern', 'Germany', 2018, 'Champions League');")
conn.execute("INSERT INTO trophy_data VALUES('Real Madrid', 'Spain', 2022, 'Champions League');")
conn.execute("INSERT INTO trophy_data VALUES('Real Madrid', 'Spain', 2022, 'LaLiga');")
conn.execute("INSERT INTO trophy_data VALUES('Real Madrid', 'Spain', 2020, 'Bundesliga');")
conn.execute("INSERT INTO trophy_data VALUES('Barcelona', 'Spain', 2021, 'LaLiga');")
conn.execute("INSERT INTO trophy_data VALUES('Barcelona', 'Spain', 2019, 'Champions League');")
conn.execute("INSERT INTO trophy_data VALUES('Barcelona', 'Spain', 2017, 'LaLiga');")

conn.commit()
```

```python
cursor = conn.execute("SELECT * FROM team_data;")

for row in cursor:
    print(row)
```

```python
cursor = conn.execute("SELECT * FROM trophy_data;")

for row in cursor:
    print(row)
```

### **Question 1**
Find the average number of goals scored per team.

**Output Columns**

    team, avg_goals

```python
# Write your query inside the triple-quoted string
sql = """
;"""

cursor = conn.execute(sql)

for row in cursor:
    print(row)
```
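Here is one possible answer. To make the snippet run on its own, it builds an in-memory copy of `team_data` from the same rows inserted above. The `demo_conn` name and the `ORDER BY` are this sketch's own choices; the `ORDER BY` is only there to make the output deterministic. Grouping by team lets `AVG` do the work:

```python
import sqlite3

# In-memory copy of team_data, using the same rows as the notebook
TEAM_ROWS = [
    ('Real Madrid', 'Spain', 2019, 63), ('Barcelona', 'Spain', 2019, 47),
    ('Arsenal', 'UK', 2019, 52), ('Real Madrid', 'Spain', 2018, 49),
    ('Barcelona', 'Spain', 2018, 45), ('Arsenal', 'UK', 2018, 50),
    ('Real Madrid', 'Spain', 2022, 65), ('Bayern', 'Germany', 2020, 55),
    ('Bayern', 'Germany', 2021, 70), ('Bayern', 'Germany', 2022, 55),
]

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    "CREATE TABLE team_data(team text, country text, season integer, total_goals integer);"
)
demo_conn.executemany("INSERT INTO team_data VALUES (?, ?, ?, ?);", TEAM_ROWS)

# One row per team, averaging that team's goals across all its seasons
sql = """
SELECT team, AVG(total_goals) AS avg_goals
FROM team_data
GROUP BY team
ORDER BY team;"""

rows = demo_conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

On the notebook data this returns Arsenal 51.0, Barcelona 46.0, Bayern 60.0 and Real Madrid 59.0.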

### **Question 2**
List the teams whose average goals are higher than 55.

**Output Columns**

    team_name, avg_goals

```python
# Write your query inside the triple-quoted string
sql = """
;"""

cursor = conn.execute(sql)

for row in cursor:
    print(row)
```
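Below is a sketch of one answer, again run against a self-contained in-memory copy of `team_data`. The filter has to go in `HAVING` rather than `WHERE`. `WHERE` filters individual rows before grouping, whereas `HAVING` filters the groups after `AVG` has been computed:

```python
import sqlite3

# In-memory copy of team_data, using the same rows as the notebook
TEAM_ROWS = [
    ('Real Madrid', 'Spain', 2019, 63), ('Barcelona', 'Spain', 2019, 47),
    ('Arsenal', 'UK', 2019, 52), ('Real Madrid', 'Spain', 2018, 49),
    ('Barcelona', 'Spain', 2018, 45), ('Arsenal', 'UK', 2018, 50),
    ('Real Madrid', 'Spain', 2022, 65), ('Bayern', 'Germany', 2020, 55),
    ('Bayern', 'Germany', 2021, 70), ('Bayern', 'Germany', 2022, 55),
]

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    "CREATE TABLE team_data(team text, country text, season integer, total_goals integer);"
)
demo_conn.executemany("INSERT INTO team_data VALUES (?, ?, ?, ?);", TEAM_ROWS)

# HAVING is applied after aggregation, so it can filter on the per-team average
sql = """
SELECT team AS team_name, AVG(total_goals) AS avg_goals
FROM team_data
GROUP BY team
HAVING AVG(total_goals) > 55
ORDER BY avg_goals DESC;"""

rows = demo_conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

With the notebook data, only Bayern (60.0) and Real Madrid (59.0) qualify.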


### **Question 3**
Create a table that lists each year's Champions League winner alongside the previous year's Champions League winner.

**Output Columns**

    year, champions_league_winner, previous_year_champions_league_winner

```python
# Write your query inside the triple-quoted string
sql = """
;"""

cursor = conn.execute(sql)

for row in cursor:
    print(row)
```
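The sketch below takes one reading of this question. It treats "previous year" literally (the season minus one) and self-joins the Champions League winners. A year whose preceding season has no recorded winner therefore gets `NULL` (`None` in Python). If the intent is instead the previous *recorded* winner, `LAG(team) OVER (ORDER BY season)` does that, but window functions need SQLite 3.25 or later. The in-memory table is a copy of the rows inserted above:

```python
import sqlite3

# In-memory copy of trophy_data, using the same rows as the notebook
TROPHY_ROWS = [
    ('Bayern', 'Germany', 2022, 'Bundesliga'), ('Bayern', 'Germany', 2021, 'Bundesliga'),
    ('Bayern', 'Germany', 2018, 'Champions League'),
    ('Real Madrid', 'Spain', 2022, 'Champions League'), ('Real Madrid', 'Spain', 2022, 'LaLiga'),
    ('Real Madrid', 'Spain', 2020, 'Bundesliga'), ('Barcelona', 'Spain', 2021, 'LaLiga'),
    ('Barcelona', 'Spain', 2019, 'Champions League'), ('Barcelona', 'Spain', 2017, 'LaLiga'),
]

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    "CREATE TABLE trophy_data(team text, country text, season integer, trophy_type text);"
)
demo_conn.executemany("INSERT INTO trophy_data VALUES (?, ?, ?, ?);", TROPHY_ROWS)

# cl holds one row per Champions League win; join each season to the season before it
sql = """
WITH cl AS (
    SELECT season, team
    FROM trophy_data
    WHERE trophy_type = 'Champions League'
)
SELECT cur.season AS year,
       cur.team AS champions_league_winner,
       prev.team AS previous_year_champions_league_winner
FROM cl AS cur
LEFT JOIN cl AS prev
       ON prev.season = cur.season - 1
ORDER BY cur.season;"""

rows = demo_conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

On the notebook data this gives (2018, 'Bayern', None), (2019, 'Barcelona', 'Bayern') and (2022, 'Real Madrid', None).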

### **Question 4**
Find the team(s) with average goals higher than 50 that have also won the **Champions League**.

**Output Columns**

    team_name, avg_goals, trophy_type

```python
# Write your query inside the triple-quoted string
sql = """
;"""

cursor = conn.execute(sql)

for row in cursor:
    print(row)
```
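Here is a possible answer, written as a self-contained sketch over in-memory copies of both tables. It computes each team's average in a CTE first and joins to `trophy_data` afterwards. Aggregating before the join stops a team's goal rows from being multiplied by its number of trophy rows. That duplication leaves `AVG` unchanged but would inflate `SUM` or `COUNT`. `DISTINCT` covers a team that has won the Champions League more than once:

```python
import sqlite3

# In-memory copies of both tables, using the same rows as the notebook
TEAM_ROWS = [
    ('Real Madrid', 'Spain', 2019, 63), ('Barcelona', 'Spain', 2019, 47),
    ('Arsenal', 'UK', 2019, 52), ('Real Madrid', 'Spain', 2018, 49),
    ('Barcelona', 'Spain', 2018, 45), ('Arsenal', 'UK', 2018, 50),
    ('Real Madrid', 'Spain', 2022, 65), ('Bayern', 'Germany', 2020, 55),
    ('Bayern', 'Germany', 2021, 70), ('Bayern', 'Germany', 2022, 55),
]
TROPHY_ROWS = [
    ('Bayern', 'Germany', 2022, 'Bundesliga'), ('Bayern', 'Germany', 2021, 'Bundesliga'),
    ('Bayern', 'Germany', 2018, 'Champions League'),
    ('Real Madrid', 'Spain', 2022, 'Champions League'), ('Real Madrid', 'Spain', 2022, 'LaLiga'),
    ('Real Madrid', 'Spain', 2020, 'Bundesliga'), ('Barcelona', 'Spain', 2021, 'LaLiga'),
    ('Barcelona', 'Spain', 2019, 'Champions League'), ('Barcelona', 'Spain', 2017, 'LaLiga'),
]

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    "CREATE TABLE team_data(team text, country text, season integer, total_goals integer);"
)
demo_conn.execute(
    "CREATE TABLE trophy_data(team text, country text, season integer, trophy_type text);"
)
demo_conn.executemany("INSERT INTO team_data VALUES (?, ?, ?, ?);", TEAM_ROWS)
demo_conn.executemany("INSERT INTO trophy_data VALUES (?, ?, ?, ?);", TROPHY_ROWS)

# Aggregate first, then join and filter on both conditions
sql = """
WITH team_avg AS (
    SELECT team, AVG(total_goals) AS avg_goals
    FROM team_data
    GROUP BY team
)
SELECT DISTINCT a.team AS team_name, a.avg_goals, t.trophy_type
FROM team_avg AS a
JOIN trophy_data AS t
  ON t.team = a.team
WHERE a.avg_goals > 50
  AND t.trophy_type = 'Champions League'
ORDER BY a.avg_goals DESC;"""

rows = demo_conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

With the notebook data this returns Bayern (60.0) and Real Madrid (59.0). Arsenal averages 51.0 but has no Champions League row.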

## **Algorithms**

This section will test your knowledge of machine learning algorithms and their implementation.

### **Question 1**

The algorithm we'll be discussing today is the k-means algorithm. Can you describe what this algorithm does to data, and how it can be used?
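As a reference point for this discussion, here is the objective k-means works towards. Given points $x_1, \dots, x_n$ and a chosen number of clusters $k$, it looks for a partition into clusters $C_1, \dots, C_k$ with centroids $\mu_1, \dots, \mu_k$ that minimises the within-cluster sum of squared distances (often called inertia):

$$
\min_{C_1, \dots, C_k} \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
$$

The standard (Lloyd's) algorithm never increases this objective from one iteration to the next, so it converges. However, it converges only to a local minimum that depends on the starting centroids.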

### **Question 2**

Using pseudocode, describe how you would create a k-means algorithm from scratch.
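One way to turn the pseudocode into something runnable is a minimal NumPy sketch of Lloyd's algorithm. The function name, the random initialisation from data points and the stopping rule (`tol` on total centroid movement) are choices made for this sketch, not the only options:

```python
import numpy as np


def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    if k > len(X):
        raise ValueError("k cannot exceed the number of points")
    rng = np.random.default_rng(seed)
    # 1. Initialise centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(n_iters):
        # 2. Assignment step: label each point with its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # 3. Update step: move each centroid to the mean of its assigned points
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)

        # 4. Stop once the centroids have (almost) stopped moving
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    # Final labels and inertia, consistent with the returned centroids
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return centroids, labels, inertia
```

For example, `kmeans(X, k=2)` on two well-separated blobs should place each blob in its own cluster. Keeping an empty cluster's previous centroid is one simple policy; re-seeding it from a far-away point is another.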

### **Question 3**

What are some of the drawbacks of this algorithm, and how would you go about improving it?
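Commonly cited drawbacks include:

- the need to choose $k$ up front;
- sensitivity to the initial centroids;
- an implicit assumption of roughly spherical, similarly sized clusters;
- sensitivity to outliers and to feature scaling.

For the initialisation problem, a standard improvement is k-means++ seeding, which spreads the starting centroids out. The sketch below assumes NumPy and that `X` contains at least $k$ distinct points:

```python
import numpy as np


def kmeans_plus_plus_init(X, k, seed=0):
    """k-means++ seeding: spread the initial centroids out across the data."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # First centroid: a data point chosen uniformly at random
    centroids = [X[rng.integers(n)]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest already-chosen centroid
        diffs = X[:, None, :] - np.array(centroids)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        if d2.sum() == 0:
            raise ValueError("X needs at least k distinct points")
        # Sample the next centroid with probability proportional to d2,
        # so points far from existing centroids are more likely to be picked
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(n, p=probs)])
    return np.array(centroids, dtype=float)
```

These seeds can replace uniformly random starting points in a Lloyd's loop. Running several seeded restarts and keeping the lowest-inertia result is a common complement.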

## **Python**

This section will assess your knowledge of Python.

### **Intro**

We have attached a CSV file to this notebook session containing the average UK and French temperatures over the last 100 years.

### **Question 1**

Using pandas, load the *average_temperates* file from the *files* section. The layout of this CSV is as follows:

    year [int], average_uk_temperature [float], average_fr_temperature [float]
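Here is a minimal loading sketch. It assumes the file has a header row with exactly the column names listed above. The `load_temperatures` helper and the `.csv` filename in the commented call are hypothetical, so adjust the path to wherever the session exposes the file. Passing `dtype` makes `read_csv` raise an error if a column cannot be parsed as the documented type:

```python
import pandas as pd

# Column types taken from the layout described above
DTYPES = {
    "year": "int64",
    "average_uk_temperature": "float64",
    "average_fr_temperature": "float64",
}


def load_temperatures(path):
    """Load the temperature CSV, enforcing the documented column types."""
    df = pd.read_csv(path, dtype=DTYPES)
    # Sort chronologically so later time-based operations behave as expected
    return df.sort_values("year").reset_index(drop=True)


# Hypothetical path: adjust to wherever the notebook session exposes the file
# df = load_temperatures("average_temperates.csv")
```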



### **Question 2**

If you were tasked with modelling the yearly temperature of the UK using the dataset above, how would you approach it?
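Here is one possible starting point. It assumes the data has been loaded into a DataFrame with the columns above. The idea is to treat it as a time series, hold out the most recent years rather than a random sample, and establish a simple linear-trend baseline before trying anything more elaborate. The helper name and the ten-year test window are illustrative choices:

```python
import numpy as np
import pandas as pd


def linear_trend_baseline(df, target="average_uk_temperature", n_test=10):
    """Fit temperature ~ year on older years and score it on the most recent n_test years."""
    df = df.sort_values("year")
    # Chronological split: never train on years that come after the test period
    train, test = df.iloc[:-n_test], df.iloc[-n_test:]

    # Least-squares straight line: the slope is the change in degrees per year
    slope, intercept = np.polyfit(train["year"], train[target], deg=1)
    preds = slope * test["year"] + intercept

    # Mean absolute error on the held-out years
    mae = float(np.mean(np.abs(test[target] - preds)))
    return slope, intercept, mae
```

A richer model is then worth keeping only if it beats this baseline's mean absolute error on the same held-out years.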

### **Question 3**

Inspect the dataset and describe the data; you may use any method you like!
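Beyond `df.info()`, `df.describe()` and a line plot of both series, a few programmatic checks are worth running. The sketch below uses a hypothetical helper name. It reports missing values, gaps or duplicates in the year sequence, and how closely the UK and French series move together:

```python
import pandas as pd


def describe_temperatures(df):
    """Quick first-pass checks on the temperature data."""
    # Missing values per column
    missing = df.isna().sum().to_dict()
    # Years that are absent or repeated within the observed range
    years = df["year"]
    expected = set(range(int(years.min()), int(years.max()) + 1))
    gaps = sorted(expected - set(years))
    duplicates = sorted(years[years.duplicated()].unique().tolist())
    # Pearson correlation between the UK and French series (NaNs dropped pairwise)
    corr = df["average_uk_temperature"].corr(df["average_fr_temperature"])
    return {
        "n_rows": len(df),
        "missing": missing,
        "missing_years": gaps,
        "duplicate_years": duplicates,
        "uk_fr_correlation": float(corr),
    }
```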

### **Question 4**

Count the number of years in which the average temperature was below 8 degrees, separately for the UK and for France (FR):

    uk_under_8 [int]
    fr_under_8 [int]
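Here is a sketch of the count. It assumes the DataFrame loaded in Question 1 and reads "below 8" as strictly less than 8. Summing a boolean Series counts the `True` values, and a missing temperature compares as `False`, so years with no data are not counted:

```python
import pandas as pd


def count_years_below(df, threshold=8.0):
    """Count years whose average temperature is strictly below the threshold."""
    # True counts as 1 and False as 0 when a boolean Series is summed
    uk_under_8 = int((df["average_uk_temperature"] < threshold).sum())
    fr_under_8 = int((df["average_fr_temperature"] < threshold).sum())
    return uk_under_8, fr_under_8
```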

## **Data Science Theory**

Congratulations! You've made it to the final section of the test. This part is more of an informal chat, so feel free to take a breather.

### **Question 1**

Can you give an example of a regression problem versus a classification problem?

### **Question 2**

Imagine we have built a machine learning model that takes numerous customer attributes and predicts whether or not someone would be a good customer for us. We use this prediction to decide whether or not to take on that new customer.

a) A key stakeholder wants to know how the model works, and what drives its predictions. How would you explain your model to a stakeholder?

b) We've been using your model for a few months now, and rejected some customers using it. One of these customers contacts us and asks us to explain why they were rejected. How would you go about explaining this individual result?
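For part (a), permutation importance is one model-agnostic way to show what drives predictions overall. It shuffles one feature at a time on held-out data and measures how much a metric drops. The sketch below is hand-rolled with NumPy and assumes a `predict` callable and accuracy as the metric; scikit-learn also ships a ready-made `sklearn.inspection.permutation_importance`. Part (b) needs a per-customer (local) explanation instead, which this sketch does not provide. Examples include SHAP values or a counterfactual ("which changes would have led to acceptance?"):

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Drop in accuracy when each feature column is shuffled (larger = more important)."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        # Average over repeats to smooth out shuffle-to-shuffle noise
        importances[j] = np.mean(drops)
    return importances
```

Importances near zero mean the model barely relies on that feature. Correlated features can share credit, so they are best interpreted together.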