Candidate interview project for a backend data engineer
hilliao committed Feb 8, 2024
1 parent e779ebd commit 88ecaa7
Showing 6 changed files with 354 additions and 0 deletions.
22 changes: 22 additions & 0 deletions googlecloud/apis/university-courses/Dockerfile
@@ -0,0 +1,22 @@
FROM python:3.9-slim-buster

# Set working directory
WORKDIR /app

# Copy requirements.txt and install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy your Python application files
COPY . /app

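# Database connection settings: placeholders only. Override them at run time
# with `docker run --env` instead of baking real secrets into the image.
ENV DB_PASSWORD="pass at run time"
ENV DB_HOST="pass at deployment time"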

# Expose port 5002
EXPOSE 5002

# Run the Flask app
CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0", "--port=5002"]

102 changes: 102 additions & 0 deletions googlecloud/apis/university-courses/README.md
@@ -0,0 +1,102 @@
# Sample microservices
Data Engineering – Interview Project

Summary: Design and develop an API for University Courses

Create a Docker container for the API and an associated Docker Compose file for deploying all necessary components, allowing it to run on Ubuntu. Use the programming language and database of your choice for this project.
Ensure the API is RESTful and includes data validation. Add comments describing functionality you think should be added if time allowed.


Key components:

As a user of the API, I should be able to:

* Create a student, including first and last name
* Create courses, including the course name, course code, and description
* Show all students and which courses the students have taken
* Show all students and which courses the students have not taken
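
The data-validation requirement for "Create a student" can be sketched as a pure function that checks the decoded JSON body before any database call. This is a minimal sketch, not the code in app.py: the helper name `validate_student_payload` is hypothetical, and the 1024-character limit is an assumption taken from the `varchar(1024)` name columns in the `students` table defined below.

```python
def validate_student_payload(payload):
    """Return a list of error messages; an empty list means the payload is valid.

    Hypothetical helper: assumes names must be non-empty, letters only,
    and at most 1024 characters (the width of the students name columns).
    """
    if not isinstance(payload, dict):
        return ["Request body must be a JSON object"]
    errors = []
    for field in ("first_name", "last_name"):
        value = payload.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field} is required")
        elif not value.isalpha():
            errors.append(f"{field} must contain letters only")
        elif len(value) > 1024:
            errors.append(f"{field} must be at most 1024 characters")
    return errors


print(validate_student_payload({"first_name": "Amy", "last_name": "Lyall"}))  # → []
print(validate_student_payload({"first_name": "Amy1"}))
# → ['first_name must contain letters only', 'last_name is required']
```

Returning every error at once, rather than stopping at the first, lets a client fix all fields in one round trip.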

The final output should include:
* Source code uploaded to Codility
* Dockerfile
* Docker Compose file
* Ability to execute on Ubuntu
* README file with instructions

NOTE: Do not use Object-Relational Mapping (ORM)

"Prerequisite: Docker must be installed in your local environment to test your solution."
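
Without an ORM, the API issues plain SQL through the database driver with parameterized queries; the driver, not string formatting, is responsible for passing values safely, which is what protects against SQL injection. The sketch below illustrates that pattern using Python's built-in `sqlite3` module as a stand-in so it runs without a PostgreSQL server; it is an illustration under that assumption, not the project's code, and `insert_student` is a hypothetical name. With psycopg2 the pattern is the same, but the placeholder is `%s` instead of `?`.

```python
import sqlite3
from contextlib import closing


def insert_student(conn, first_name, last_name):
    """Insert a student with a parameterized query and return the new id."""
    # "with conn" commits on success and rolls back on error; it does not close conn
    with conn:
        cur = conn.execute(
            "INSERT INTO students(first_name, last_name) VALUES (?, ?)",
            (first_name, last_name),
        )
        return cur.lastrowid


# An in-memory SQLite database stands in for PostgreSQL in this sketch;
# closing() guarantees the connection is closed when the block exits
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    # A hostile value is stored as plain text instead of being executed as SQL
    new_id = insert_student(conn, "Robert'); DROP TABLE students;--", "Tables")
    rows = conn.execute("SELECT id, first_name FROM students").fetchall()
    print(new_id, rows)
```

The same split applies to psycopg2: its connection context manager also ends the transaction without closing the connection, so closing has to be handled separately.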


# Configure the database
The database is PostgreSQL 15 running as a Google Cloud SQL instance. The database name is `courses`, and it contains two tables: `students` and `courses`.

```sql
-- Database: courses

-- DROP DATABASE IF EXISTS courses;

CREATE DATABASE courses
WITH
OWNER = cloudsqlsuperuser
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF8'
LC_CTYPE = 'en_US.UTF8'
LOCALE_PROVIDER = 'libc'
TABLESPACE = pg_default
CONNECTION LIMIT = -1
IS_TEMPLATE = False;

-- Connect to the new database before creating the tables (in psql: \c courses)

-- Table: public.courses

-- DROP TABLE IF EXISTS public.courses;

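-- The id default below depends on this sequence, so create it first
CREATE SEQUENCE IF NOT EXISTS public.courses_id_seq;

CREATE TABLE IF NOT EXISTS public.courses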
(
id integer NOT NULL DEFAULT nextval('courses_id_seq'::regclass),
name character varying(255) COLLATE pg_catalog."default" NOT NULL,
description character varying(10240) COLLATE pg_catalog."default",
CONSTRAINT courses_pkey PRIMARY KEY (id)
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.courses
OWNER to postgres;

-- Table: public.students

-- DROP TABLE IF EXISTS public.students;

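-- The id default below depends on this sequence, so create it first
CREATE SEQUENCE IF NOT EXISTS public.students_id_seq;

CREATE TABLE IF NOT EXISTS public.students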
(
id integer NOT NULL DEFAULT nextval('students_id_seq'::regclass),
first_name character varying(1024) COLLATE pg_catalog."default",
last_name character varying(1024) COLLATE pg_catalog."default",
courses_taken character varying(10240) COLLATE pg_catalog."default",
CONSTRAINT students_pkey PRIMARY KEY (id)
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.students
OWNER to postgres;
```
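
The `students.courses_taken` column stores course ids as a comma-separated string (for example `"123,120"`, as in the `/courses-taken` example output in app.py), or `NULL` when no course has been taken. A minimal sketch of turning that value into integer ids, assuming every id is an integer and tolerating stray spaces and empty entries; `parse_courses_taken` is a hypothetical helper name:

```python
def parse_courses_taken(value):
    """Convert a comma-separated course id string (or None) into a list of ints.

    Hypothetical helper: skips empty entries and surrounding whitespace,
    and raises ValueError if an entry is not an integer.
    """
    if not value:
        return []
    stripped = (part.strip() for part in value.split(","))
    return [int(part) for part in stripped if part]


print(parse_courses_taken("123,120"))  # → [123, 120]
print(parse_courses_taken(None))       # → []
```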


# Deployment
## Command to build the Docker image

`$ docker build -t dematic-interview:prod /path/to/folder`

The image is tagged as `dematic-interview:prod`. Consider re-tagging it and pushing it to Google Cloud Artifact Registry.

## Command to run the container on port 5002 of localhost
The database password and host IP must be passed at run time. The command below uses `XXXYYYZZZ` and `34.72.218.000` as placeholders:

`$ docker run --name dematic --env DB_PASSWORD=XXXYYYZZZ --env DB_HOST=34.72.218.000 -p 5002:5002 dematic-interview:prod`

For production, deploying to Google Cloud Run or GKE Autopilot is recommended.
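
Because `DB_PASSWORD` and `DB_HOST` are supplied at run time, it helps to fail fast with a readable message when one is missing; when app.py runs outside the container (where the Dockerfile does not set placeholders), a missing variable surfaces as a bare `KeyError` from `os.environ[...]` at import time. A minimal sketch, assuming the same variable names, port `5432`, user `postgres`, and database `courses` used in app.py; `load_db_config` is a hypothetical helper:

```python
import os


def load_db_config(environ=os.environ):
    """Build psycopg2.connect() keyword arguments from environment variables.

    Hypothetical helper: raises RuntimeError naming every missing variable.
    """
    missing = [name for name in ("DB_HOST", "DB_PASSWORD") if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {
        "host": environ["DB_HOST"],
        "password": environ["DB_PASSWORD"],
        "port": "5432",
        "user": "postgres",
        "database": "courses",
    }


config = load_db_config({"DB_HOST": "34.72.218.000", "DB_PASSWORD": "XXXYYYZZZ"})
print(config["host"])  # → 34.72.218.000
```

In the app, the result could be passed as `psycopg2.connect(**load_db_config())`, keeping the connection settings in one place instead of repeating them in every endpoint.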

# Unit testing
Each endpoint's docstring in app.py includes an example request or output that can be used to develop unit tests.
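
As one way to turn those docstring examples into tests, the sketch below checks the "courses not taken" computation against the `/courses-not-taken` example output, using only the standard `unittest` module. It assumes the logic is factored out of the Flask view into a pure function; `compute_courses_not_taken` is a hypothetical name, and the course list `[123, 456, 450, 120]` is taken from the docstring example.

```python
import unittest


def compute_courses_not_taken(all_courses, courses_taken):
    """Return the course ids in all_courses that are absent from courses_taken.

    courses_taken is the raw column value: a comma-separated string such as
    "123,120", or None. Ids are compared as strings, as in app.py.
    """
    if not courses_taken:
        return list(all_courses)
    taken = {part.strip() for part in courses_taken.split(",")}
    return [course for course in all_courses if str(course) not in taken]


class CoursesNotTakenTest(unittest.TestCase):
    # Course ids in the order shown in the /courses-not-taken docstring
    ALL_COURSES = [123, 456, 450, 120]

    def test_student_without_courses(self):
        self.assertEqual(compute_courses_not_taken(self.ALL_COURSES, None),
                         [123, 456, 450, 120])

    def test_single_course_taken(self):
        self.assertEqual(compute_courses_not_taken(self.ALL_COURSES, "456"),
                         [123, 450, 120])

    def test_multiple_courses_taken(self):
        self.assertEqual(compute_courses_not_taken(self.ALL_COURSES, "123,120"),
                         [456, 450])
```

Saved as a file such as `test_courses.py` (a hypothetical name), it runs with `python -m unittest test_courses`. Testing a pure function avoids needing a live database for these cases.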
228 changes: 228 additions & 0 deletions googlecloud/apis/university-courses/app.py
@@ -0,0 +1,228 @@
import os

import psycopg2
from flask import Flask, jsonify, request
from psycopg2.extras import RealDictCursor

app = Flask(__name__)

DB_PARAM_HOST = os.environ["DB_HOST"]
DB_PARAM_PORT = "5432"
DB_PARAM_PASS = os.environ["DB_PASSWORD"]
DB_PARAM_USER = "postgres"
DB_PARAM_DATABASE = "courses"


@app.route('/add_student', methods=['POST'])
def add_student():
    """
    Create a student, including first and last name
    Example curl:
    curl -X POST -H "Content-Type: application/json" -d '{"first_name": "test", "last_name":"test"}' http://localhost:5002/add_student
    [
      {
        "id": 5
      }
    ]
    """
    req = request.get_json(silent=True)
    if not isinstance(req, dict):
        return "Request body must be a JSON object", 400
    first_name = req.get('first_name')
    last_name = req.get('last_name')

    # Validate the names (non-empty, letters only). SQL injection is prevented
    # by the parameterized query below, not by this check.
    if not isinstance(first_name, str) or not first_name.isalpha():
        return "First Name not valid", 400
    if not isinstance(last_name, str) or not last_name.isalpha():
        return "Last Name not valid", 400

    conn = None
    try:
        conn = psycopg2.connect(database=DB_PARAM_DATABASE,
                                user=DB_PARAM_USER, password=DB_PARAM_PASS,
                                host=DB_PARAM_HOST, port=DB_PARAM_PORT)
        # "with conn" commits on success and rolls back on error, but does not close
        with conn:
            with conn.cursor(cursor_factory=RealDictCursor) as cur:
                insert_query = """INSERT INTO public.students(first_name, last_name) VALUES(%s,%s) RETURNING id"""
                cur.execute(insert_query, (first_name, last_name))
                rows = cur.fetchall()
    except psycopg2.Error as error:
        app.logger.error(error)
        return "Database error", 500
    finally:
        if conn is not None:
            conn.close()
    return jsonify(rows)


@app.route('/add_course', methods=['POST'])
def add_course():
    """
    Create courses, including the course name, course code, and description.
    The course code is stored as the course id.
    Example curl:
    curl -X POST -H "Content-Type: application/json" -d '{"name": "test", "description":"test", "code": "56"}' http://localhost:5002/add_course
    [
      {
        "id": 56
      }
    ]
    A description containing a non-letter character (here, a space) is rejected:
    curl -X POST -H "Content-Type: application/json" -d '{"name": "test", "description":"tes t", "code": "56"}' http://localhost:5002/add_course
    Course description not valid
    """
    req = request.get_json(silent=True)
    if not isinstance(req, dict):
        return "Request body must be a JSON object", 400
    name = req.get('name')
    code = req.get('code')
    description = req.get('description')

    # Validate the course name and description (letters only). SQL injection is
    # prevented by the parameterized query below, not by this check.
    if not isinstance(name, str) or not name.isalpha():
        return "Course Name not valid", 400
    if not isinstance(description, str) or not description.isalpha():
        return "Course description not valid", 400
    try:
        course_id = int(code)
    except (TypeError, ValueError):
        return "Course code not valid", 400

    conn = None
    try:
        conn = psycopg2.connect(database=DB_PARAM_DATABASE,
                                user=DB_PARAM_USER, password=DB_PARAM_PASS,
                                host=DB_PARAM_HOST, port=DB_PARAM_PORT)
        # "with conn" commits on success and rolls back on error, but does not close
        with conn:
            with conn.cursor(cursor_factory=RealDictCursor) as cur:
                insert_query = """INSERT INTO public.courses(id, name, description) VALUES(%s,%s,%s) RETURNING id"""
                cur.execute(insert_query, (course_id, name, description))
                rows = cur.fetchall()
    except psycopg2.Error as error:
        app.logger.error(error)
        return "Database error", 500
    finally:
        if conn is not None:
            conn.close()
    return jsonify(rows)


@app.route('/courses-taken')
def taken():
    """
    Show all students and which courses the students have taken.
    courses_taken is a comma-separated string of course ids, or null.
    Example output:
    [
      {
        "courses_taken": null,
        "first_name": "Joe",
        "id": 1,
        "last_name": "Smoth"
      },
      {
        "courses_taken": null,
        "first_name": "Jane",
        "id": 2,
        "last_name": "Doe"
      },
      {
        "courses_taken": "456",
        "first_name": "David",
        "id": 3,
        "last_name": "Gardner"
      },
      {
        "courses_taken": "123,120",
        "first_name": "Amy",
        "id": 4,
        "last_name": "Lyall"
      }
    ]
    """
    conn = psycopg2.connect(database=DB_PARAM_DATABASE,
                            user=DB_PARAM_USER, password=DB_PARAM_PASS,
                            host=DB_PARAM_HOST, port=DB_PARAM_PORT)
    try:
        with conn.cursor(cursor_factory=RealDictCursor) as cur:
            cur.execute("SELECT * FROM public.students")
            results = cur.fetchall()
    finally:
        # close the connection even if the query fails
        conn.close()

    return jsonify(results)


@app.route('/courses-not-taken')
def not_taken():
    """
    Show all students and which courses the students have not taken
    Example output:
    [
      {
        "courses_not_taken": [
          123,
          456,
          450,
          120
        ],
        "courses_taken": null,
        "first_name": "Joe",
        "id": 1,
        "last_name": "Smoth"
      },
      {
        "courses_not_taken": [
          123,
          456,
          450,
          120
        ],
        "courses_taken": null,
        "first_name": "Jane",
        "id": 2,
        "last_name": "Doe"
      },
      {
        "courses_not_taken": [
          123,
          450,
          120
        ],
        "courses_taken": "456",
        "first_name": "David",
        "id": 3,
        "last_name": "Gardner"
      },
      {
        "courses_not_taken": [
          456,
          450
        ],
        "courses_taken": "123,120",
        "first_name": "Amy",
        "id": 4,
        "last_name": "Lyall"
      }
    ]
    """
    conn = psycopg2.connect(database=DB_PARAM_DATABASE,
                            user=DB_PARAM_USER, password=DB_PARAM_PASS,
                            host=DB_PARAM_HOST, port=DB_PARAM_PORT)
    try:
        with conn.cursor(cursor_factory=RealDictCursor) as cur:
            cur.execute("SELECT * FROM public.students")
            # convert RealDictRow objects to plain dicts so they can be extended below
            students = [dict(row) for row in cur.fetchall()]
            cur.execute("SELECT id FROM public.courses")
            all_courses = [row['id'] for row in cur.fetchall()]
    finally:
        # close the connection even if a query fails
        conn.close()

    for student in students:
        courses_taken = student['courses_taken']
        if courses_taken:
            # courses_taken is a comma-separated string of course ids, e.g. "123,120"
            taken_ids = {c.strip() for c in courses_taken.split(',')}
            student['courses_not_taken'] = [c for c in all_courses if str(c) not in taken_ids]
        else:
            student['courses_not_taken'] = all_courses

    return jsonify(students)


if __name__ == '__main__':
    app.run(debug=True, port=5002)
2 changes: 2 additions & 0 deletions googlecloud/apis/university-courses/requirements.txt
@@ -0,0 +1,2 @@
flask
psycopg2-binary