pdcal/dataengineerapi

Data Engineer Technical Test

Airangel Data Engineering Technical Problem

The winemag-data-130k-v2-formatted.json file contains a list of wine reviews from various users. We would like you to demonstrate scripting, database and API knowledge to provide some insight into the wine reviews.

Insight Problem

Schema Creation

Create a database schema and a user, both called 'vino'. The user's password is 'vino'.
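As a rough illustration of this step, the sketch below builds the MySQL statements that would create the schema and user. It assumes MySQL 5.7 or later, which the dump requested under Submission suggests. The function name `schema_statements` and the `localhost` host are hypothetical choices, not part of the task.

```python
from __future__ import annotations


def schema_statements(
    name: str = "vino", password: str = "vino", host: str = "localhost"
) -> list[str]:
    """Return MySQL statements creating a schema and a same-named user.

    The values are interpolated directly, so this is only suitable for
    trusted constants such as the fixed 'vino' name from the task.
    The host 'localhost' is an assumption.
    """
    return [
        f"CREATE DATABASE IF NOT EXISTS `{name}`;",
        f"CREATE USER IF NOT EXISTS '{name}'@'{host}' IDENTIFIED BY '{password}';",
        f"GRANT ALL PRIVILEGES ON `{name}`.* TO '{name}'@'{host}';",
    ]


if __name__ == "__main__":
    # Print the statements so they can be piped into a MySQL client.
    print("\n".join(schema_statements()))
```

The statements would need to be run by an account that is allowed to create databases and users.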

Table Population

Create two tables: one called 'reviews', which matches the structure of the JSON records, and another called 'userinfo', which contains the following fields for a given Twitter user:

  • id - autogenerated primary key
  • name - name of the user
  • description - a description of the user
  • profile_image_url - Profile image URL
  • followers_count - Count of their followers
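One possible shape for the 'userinfo' table is sketched below. The column types are assumptions, since the task only names the fields. To keep the example runnable without a server, it executes against Python's built-in `sqlite3` module as a stand-in. The MySQL form of the same table is included as a string for comparison.

```python
import sqlite3

# Possible MySQL definition (column types are assumptions).
MYSQL_USERINFO_DDL = """
CREATE TABLE IF NOT EXISTS userinfo (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    description TEXT,
    profile_image_url VARCHAR(2048),
    followers_count INT
)
"""

# SQLite equivalent, used here only so the sketch runs anywhere.
SQLITE_USERINFO_DDL = """
CREATE TABLE IF NOT EXISTS userinfo (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    description TEXT,
    profile_image_url TEXT,
    followers_count INTEGER
)
"""


def create_userinfo(conn: sqlite3.Connection) -> None:
    """Create the userinfo table if it does not exist yet."""
    conn.execute(SQLITE_USERINFO_DDL)


def add_user(conn, name, description, profile_image_url, followers_count) -> int:
    """Insert one user and return the autogenerated primary key."""
    cur = conn.execute(
        "INSERT INTO userinfo (name, description, profile_image_url, followers_count) "
        "VALUES (?, ?, ?, ?)",
        (name, description, profile_image_url, followers_count),
    )
    return cur.lastrowid
```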

Write a script that reads and parses the JSON file, then inserts the data into the database.
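A minimal sketch of such a loader follows. It uses `sqlite3` as a stand-in for MySQL so that it runs without a server. The column names in `REVIEW_COLUMNS` are an assumed subset of the record fields and should be replaced with the keys actually present in the JSON file. `load_reviews` is a hypothetical helper name.

```python
import json
import sqlite3

# Assumed subset of the JSON record keys; adjust to the real file.
REVIEW_COLUMNS = ("title", "points", "taster_name", "taster_twitter_handle")


def load_reviews(conn, json_text, columns=REVIEW_COLUMNS) -> int:
    """Parse a JSON list of review records and insert them into 'reviews'.

    Keys missing from a record are stored as NULL.
    Returns the number of rows inserted.
    """
    records = json.loads(json_text)
    cols_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # Untyped columns are valid in SQLite; a MySQL table would need types.
    conn.execute(f"CREATE TABLE IF NOT EXISTS reviews ({cols_sql})")
    rows = [tuple(rec.get(c) for c in columns) for rec in records]
    conn.executemany(
        f"INSERT INTO reviews ({cols_sql}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)
```

In a real run, the text would come from the file, for example `pathlib.Path("winemag-data-130k-v2-formatted.json").read_text(encoding="utf-8")`.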

Twitter User Population

Write a script that queries the database table for all users with a Twitter handle, fetches the data required for the 'userinfo' table from the Twitter API (https://developer.twitter.com/en), and inserts that data into the database.
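The flow described here could be sketched as below, again with `sqlite3` standing in for MySQL. The API call itself is deliberately left out. `fetch_user` is a hypothetical callable that takes a handle and returns a dict with the four 'userinfo' fields, or `None` if the user cannot be found. The column name `taster_twitter_handle` and the leading `@` on handles are assumptions about the data.

```python
import sqlite3


def handles_to_lookup(conn):
    """Return the distinct non-empty Twitter handles found in 'reviews'.

    A leading '@' is stripped, on the assumption that handles are
    stored in that form.
    """
    rows = conn.execute(
        "SELECT DISTINCT taster_twitter_handle FROM reviews "
        "WHERE taster_twitter_handle IS NOT NULL "
        "AND taster_twitter_handle <> '' "
        "ORDER BY taster_twitter_handle"
    ).fetchall()
    return [row[0].lstrip("@") for row in rows]


def populate_userinfo(conn, fetch_user) -> int:
    """Look up each handle via fetch_user and insert the result.

    fetch_user is a hypothetical wrapper around the Twitter API.
    Handles for which it returns None are skipped.
    Returns the number of rows inserted.
    """
    inserted = 0
    for handle in handles_to_lookup(conn):
        user = fetch_user(handle)
        if user is None:
            continue
        conn.execute(
            "INSERT INTO userinfo "
            "(name, description, profile_image_url, followers_count) "
            "VALUES (?, ?, ?, ?)",
            (
                user.get("name"),
                user.get("description"),
                user.get("profile_image_url"),
                user.get("followers_count"),
            ),
        )
        inserted += 1
    conn.commit()
    return inserted
```

Passing the fetcher in as a parameter keeps API keys out of the script and lets the tests substitute a fake.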

Unique Reviewers Query

Write a script that counts the number of unique reviewers in the reviews table.
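A sketch of the count, using `sqlite3` as a stand-in for MySQL. It assumes that a reviewer is identified by a `taster_name` column, which the task does not state. The same query should also work in MySQL.

```python
import sqlite3


def count_unique_reviewers(conn) -> int:
    """Count distinct reviewer names; NULL names are ignored by COUNT."""
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT taster_name) FROM reviews"
    ).fetchone()
    return count


def write_count(path, count) -> None:
    """Write the count to an output file, one value on one line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(f"{count}\n")
```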

Multi-Reviewers Query

Write a script that outputs users with five or more reviews.
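One way to express this is a `GROUP BY` with a `HAVING` filter, sketched here against `sqlite3` as a stand-in for MySQL. As before, identifying a user by the `taster_name` column is an assumption.

```python
import sqlite3


def reviewers_with_min_reviews(conn, minimum=5):
    """Return (name, review_count) pairs for reviewers at or above minimum.

    Results are ordered by review count, highest first, then by name.
    """
    return conn.execute(
        "SELECT taster_name, COUNT(*) AS review_count "
        "FROM reviews "
        "WHERE taster_name IS NOT NULL "
        "GROUP BY taster_name "
        "HAVING COUNT(*) >= ? "
        "ORDER BY review_count DESC, taster_name",
        (minimum,),
    ).fetchall()
```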

Twitter Followers/Reviews

Write a script that looks at the Twitter users and calculates a score for each one, defined as followers_count * number of reviews for that user.
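The score could be computed with a join between the two tables, as in the sketch below (`sqlite3` as a stand-in for MySQL). The join key is an assumption: 'userinfo' as specified has no handle column, so this example matches `userinfo.name` to `reviews.taster_name`. That only works if the Twitter display name equals the reviewer name. A real solution may need a different key.

```python
import sqlite3


def follower_review_scores(conn):
    """Return (name, score) pairs, score = followers_count * review count.

    Assumes userinfo.name matches reviews.taster_name. Users with no
    matching reviews are left out by the inner join.
    """
    return conn.execute(
        "SELECT u.name, u.followers_count * COUNT(r.taster_name) AS score "
        "FROM userinfo AS u "
        "JOIN reviews AS r ON r.taster_name = u.name "
        "GROUP BY u.id, u.name, u.followers_count "
        "ORDER BY score DESC, u.name"
    ).fetchall()
```

For example, a user with 100 followers and 2 reviews would score 200.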

Submission

The output you should include in the final submission should be:

  • Table Population Script for managing the database (create, drop, list structure)
  • Twitter User Population script for querying the database, the API and inserting data into the userinfo table
  • Output file for the unique reviewers
  • Output file for the Twitter followers/reviews
  • Output file for the users with 5 reviews or more
  • A dumped MySQL database structure
  • Suitable tests for the scripts

We suggest that you either upload your response to GitHub (making sure not to include any API keys in publicly available code) or zip everything and share it via Dropbox.
