Skip to content

magrathj/Data-Science-Challenge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Science Challenge

This challenge looks at using docker, postgres and python to analyse interesting trends in transcational banking data.

Preparation

We will be utilizing a PostgreSQL 11 server for storing and retrieving data and Docker for hosting it. To do this, first install Docker: https://docs.docker.com/.

Once installed, run the following commands in your terminal:

  1. To run the server: docker run -d --name ht_pg_server -v ht_dbdata:/var/lib/postgresql/data -p 54320:5432 postgres:11
  2. Check the logs to see if it is running: docker logs -f ht_pg_server
  3. Create the database: docker exec -it ht_pg_server psql -U postgres -c "create database ht_db"
  4. Modify paths in the config.py script and run it to load the csv data into the DB. Then populate the database: python etl.py

Interacting with the DB

In the first part of the home task you will be required to utilize the DB to query the data. If you are unfamiliar with PostgreSQL, we recommend installing DBeaver (Mac, Windows, Linux) or Postico.

For Postico visit: https://eggerapps.at/postico/. Once installed, open Postico and:

  1. click 'New Favorite'
  2. enter the following:
  • Nickname: ht_db
  • Host: localhost
  • Port: 54320
  • User: postgres
  • Database: ht_db
  1. click 'Done' and 'Connect' and you should be able to see the data you just loaded to the database.

Data

  1. devices.csv
    • a table of associated devices
    • brand: string corresponding to the phone brand
    • user_id: string uniquely identifying the user
  2. users.csv
    • a table of user data
    • user_id: string uniquely identifying the user
    • birth_year: integer corresponding to the user’s birth year
    • country: two letter string corresponding to the user’s country of residence
    • city: two string corresponding to the user’s city of residence
    • created_date: datetime corresponding to the user’s created date
    • user_settings_crypto_unlocked: integer indicating if the user has unlocked the crypto currencies in the app
    • plan: string indicating on which plan the user is on
    • attributes_notifications_marketing_push: float indicating if the user has accepted to receive marketing push notifications
    • attributes_notifications_marketing_email: float indicating if the user has accepted to receive marketing email notifications
    • num_contacts: integer corresponding to the number of contacts the user has
    • num_referrals: integer corresponding to the number of users referred by the selected user
    • num_successful_referrals: integer corresponding to the number of users successfully referred by the selected user (successfully means users who have actually installed the app and are able to use the product)
  3. notifications.csv
    • reason: string indicating the purpose of the notification
    • channel: string indicating how the user has been notified
    • status: string indicating the status of the notification
    • user_id: string uniquely identifying the user
    • created_date: datetime indicating when the notification has been sent
  4. transactions.csv
    • transaction_id: string uniquely identifying the transaction
    • transactions_type: string indicating the type of the transaction
    • transactions_currency: string indicating the currency of the transaction
    • amount_usd: float corresponding to the transaction amount in USD
    • transactions_state: string indicating the state of a transaction COMPLETED - the transaction was completed and the user's balance was changed DECLINED/FAILED - the transaction was declined for some reason, usually pertains to insufficient balance REVERTED - the associated transaction was completed first but was then rolled back later in time potentially due to customer reaching out
    • ea_cardholderpresence: string indicating if the card holder was present when the transaction happened
    • ea_merchant_mcc: float corresponding to the Merchant Category Code (MCC)
    • ea_merchant_city: string corresponding to the merchant’s city
    • ea_merchant_country: string corresponding to the merchant’s country
    • direction: string indicating the direction of the transaction
    • user_id: string uniquely identifying the user
    • created_date: datetime corresponding to the transaction’s created date

Hints & Tips

  1. Engineering practice is a strong component of ML Engineer role. Anything you include in the code/ sub-directory should be nearly production grade.
  2. Presentation, creativity and the ability to dive deep into the problem space is a strong component of the Data Scientist role. Your presentation should be one you could comfortably present to a team of technical and non-technical employees.
  3. Communicating your thought process is also important, we care a lot about the auditability & reproducibility of research/analysis for both Scientists and Engineers.
  4. Data visualisation and storytelling are important (don’t hesitate to use external packages)
  5. Backup assumptions with data
  6. All analysis should be accompanied with explanations (models, coeffiecients, tests and values)
  7. Keep your code clean and comment it

About

This challenge looks at using linux, docker, postgres and python to analyse interesting trends in transcational banking data.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages