Run these commands to set up your local environment (assuming you already have a working Ruby environment installed):
$ git clone https://github.com/mykoweb/data-engineering.git
$ cd data-engineering
$ bundle install
$ rake db:migrate
$ rails server
You can now open a browser and point it to http://localhost:3000
To run the tests, run:
$ rake db:test:prepare
$ rspec
Some assumptions were made when solving this challenge:
- Purchaser and Merchant names must be unique.
- For a given Merchant, two items with the same description but different prices are considered two different items.
- Columns in the tab-delimited file will always be in the correct order and will have data in the correct format.
- There will always be a header line with the correct headings in the tab-delimited file.
- Special characters are allowed in the tab-delimited file only for Item objects (see Issues described below).
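The file-format assumptions above can be sketched in plain Ruby. This is a minimal illustration, not the app's actual importer: the header names in `EXPECTED_HEADERS` and the method name `parse_tab_delimited` are hypothetical, chosen only for the example.

```ruby
# Hypothetical header names; the real file's headings may differ.
EXPECTED_HEADERS = [
  "purchaser name", "item description", "item price",
  "purchase count", "merchant name"
].freeze

# Splits tab-delimited text into one hash per data line, keyed by header.
# Raises if the first line is not the expected header line.
def parse_tab_delimited(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  headers = lines.shift.to_s.split("\t")
  raise ArgumentError, "unexpected header line" unless headers == EXPECTED_HEADERS

  # The -1 limit keeps trailing empty fields so they can be validated later.
  lines.map { |line| headers.zip(line.split("\t", -1)).to_h }
end
```

Splitting on tabs directly, rather than using a CSV parser, is a deliberate choice in this sketch: quote characters in a field are treated as ordinary data.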
Issues:
- There was an issue querying for Items with special characters using the Rails find_by and find_or_create_by methods with SQLite. As a workaround, we are using Item.where in conjunction with the SQL LIKE operator. This seems to be an SQLite bug: we tested in PostgreSQL and the issue does not appear there. The issue was found when reading in the same tab-delimited file twice.
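One thing to watch with a LIKE-based lookup is that `%` and `_` in a description act as wildcards. The sketch below shows one way to escape them so the match stays literal. The helper name `escape_like` is hypothetical, and the commented query assumes `description`, `price`, and `merchant_id` columns on Item, which may not match the real schema.

```ruby
# Escapes LIKE wildcards (% and _) and the escape character itself,
# so the resulting pattern matches the original string literally.
def escape_like(value, escape_char = "\\")
  value.gsub(/[#{Regexp.escape(escape_char)}%_]/) { |ch| escape_char + ch }
end

# Hypothetical usage inside the Rails app (column names are assumptions):
#
#   item = Item.where("description LIKE ? ESCAPE '\\'", escape_like(description))
#              .where(price: price, merchant_id: merchant.id)
#              .first_or_create(description: description)
```

The explicit ESCAPE clause matters on SQLite, which has no default escape character for LIKE. Also note that SQLite's LIKE is case-insensitive for ASCII characters by default, so this lookup is looser than an equality match.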
To do:
- Add more validations for the tab-delimited file. Currently, we only perform simple validations, such as checking for empty fields or for non-integers in the 'Purchase Count' field. We also need to decide what to do if a validation fails on a line other than the first line of the tab-delimited file. In this case, should we discard all previously validated lines? Or should we keep the validated lines, stop reading the file as soon as we hit an invalid field, and ignore all subsequent lines? Decisions, decisions...
- Add handling for Authentication and Authorization.
- Make it aesthetically pleasing.
- Modify code to handle special characters in other columns (other than those columns pertaining to Items) in the tab-delimited file.
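As a starting point for the validation item above, here is a minimal sketch that checks every line up front and collects all errors before anything is saved. It assumes the rows have already been parsed into hashes keyed by header name; `validate_rows` and the `"purchase count"` key are hypothetical names, not taken from the app.

```ruby
# Hypothetical key for the 'Purchase Count' column.
PURCHASE_COUNT_FIELD = "purchase count"

# Returns a list of error messages; an empty list means every row passed.
def validate_rows(rows)
  errors = []
  rows.each_with_index do |row, index|
    line_no = index + 2 # 1-based numbering, plus one for the header line
    row.each do |field, value|
      errors << "line #{line_no}: '#{field}' is empty" if value.nil? || value.strip.empty?
    end
    count = row[PURCHASE_COUNT_FIELD].to_s.strip
    if !count.empty? && count !~ /\A\d+\z/
      errors << "line #{line_no}: '#{PURCHASE_COUNT_FIELD}' must be an integer"
    end
  end
  errors
end
```

Validating the whole file before importing gives an all-or-nothing policy: if `validate_rows` returns any errors, nothing is written. An alternative with the same effect would be to run the import inside `ActiveRecord::Base.transaction` so that a failure on a later line rolls back the earlier ones.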