Hargh allows you to store and query HAR records. This project was built to complete the Rigor coding exercise.
You can perform CRUD operations on HAR records by sending API requests. See the docs. For example, you can create a new HAR record by sending a POST request to `/hars` where the body is the JSON data of the HAR.
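For example, from Ruby that request might look roughly like the following; the host, port, and file path are placeholders for your own setup:

```ruby
# Illustrative only: create a HAR record by POSTing the raw HAR JSON to /hars.
# The host and the fixture path are placeholders, not part of the project.
require "net/http"

uri = URI("http://localhost:3000/hars")
har = File.read("example.har") # already JSON, so send it as the request body

response = Net::HTTP.post(uri, har, "Content-Type" => "application/json")
puts response.code
puts response.body
```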
Yes, I know the docs aren't the most beautiful. I was giving a new gem, RSpec API Documentation, a try; it generates docs from your API specs.
The input is validated against what I think is the whole spec, not just some pieces. I wanted to give dry-validation a shot and put it through the wringer. It performed very well. Check out `har_schema.rb`.
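For a flavor of the style (not the actual schema, which is much more complete and lives in `har_schema.rb`), a trimmed-down dry-validation contract might look like this:

```ruby
# A trimmed-down sketch in the dry-validation style; the real schema in
# har_schema.rb covers far more of the HAR spec than the few fields shown here.
require "dry/validation"
require "json"

class ExampleHarContract < Dry::Validation::Contract
  params do
    required(:log).hash do
      required(:version).filled(:string)
      required(:creator).hash do
        required(:name).filled(:string)
        required(:version).filled(:string)
      end
      required(:entries).array(:hash) do
        required(:startedDateTime).filled(:string)
        required(:request).hash do
          required(:url).filled(:string)
        end
      end
    end
  end
end

result = ExampleHarContract.new.call(JSON.parse(File.read("example.har")))
puts result.errors.to_h
```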
This is the interesting part. For these, you're going to look at the `/hars/:id/entries` and `/hars/:id/entries/aggregations` endpoints.
To answer the questions in the instructions, you might run some queries like this:

- `/hars/1/entries?sort=timings_blocked&direction=desc&limit=1`
- `/hars/1/entries?sort=timings_blocked&direction=asc&limit=1`
- `/hars/1/entries?sort=timings_blocked&direction=asc&limit=1&offset=1`
- `/hars/1/entries/aggregations?column_name=response_content_size&operation=average`
- `/hars/1/entries/aggregations?column_name=response_content_size&operation=sum`
- `/hars/1/entries?url=something`
There are a number of other queries, or combinations of the same params, that we could run. We could run similar mean or sum queries on the other timing measurements. I think we should also add max and min aggregations to help us figure out whether an outlier is unfairly skewing the results up or down. I think we should also add an index/search endpoint for the HAR records themselves. We currently cache the number of entries, and perhaps there is a correlation between `entry_count` and load time. That said, some of the entries may be loaded asynchronously, so it may not matter.
This API is backed by a Postgres database. However, there are a couple of alternative approaches I can think of right away:
- Don't parse key data points into their own columns. Just use Rails and Postgres support for `jsonb`. This would make it easier to dynamically query for whatever we wanted across a document structure. However, it still feels kind of complicated, and the Postgres process of adding and managing `jsonb` indexes to support specific types of queries is also not straightforward. (A rough sketch of this approach follows the list.)
- Use MongoDB and Mongoid. This makes a lot of sense for this use case. We're attempting to store and query a spec that is a JSON document, so let's use a document database. It's not very 'Railsy', but MongoDB aggregations also make a lot of sense for this use case.
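For illustration, under that jsonb-only alternative a query like "slowest blocked entry" could be answered straight from the stored document. This is only a hypothetical sketch; the table and column names (`hars`, `raw`) are assumptions, not the project's actual schema:

```ruby
# Hypothetical sketch of the jsonb-only alternative: no extracted columns,
# everything read straight out of a raw jsonb column on the hars table.
# Table and column names here are illustrative, not the project's schema.
class Har < ApplicationRecord
  # Entries with the highest blocked time, pulled directly from the document.
  def slowest_blocked_entries(limit = 1)
    sql = <<~SQL
      SELECT entry -> 'request' ->> 'url'                AS url,
             (entry -> 'timings' ->> 'blocked')::numeric AS blocked
      FROM hars, jsonb_array_elements(raw -> 'log' -> 'entries') AS entry
      WHERE hars.id = #{id.to_i}
      ORDER BY blocked DESC
      LIMIT #{limit.to_i}
    SQL
    self.class.connection.select_all(sql).to_a
  end
end
```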
You'll notice that there is a general aggregations endpoint for the entries rather than something super specific to the questions asked. I think it makes a lot of sense to have a more generalized endpoint where you pass in one of a set number of numerically-typed columns that we've already indexed (the `column_name` parameter on the aggregations endpoint) and then another parameter for whether we want it summed or averaged (the `operation` parameter on the aggregations endpoint). We can easily expand this pattern to allow for mins, maxes, and medians.
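A rough sketch of that pattern (the whitelists, routing, and class names below are illustrative, not the actual implementation):

```ruby
# Rough sketch of the generalized aggregation pattern described above.
# Column/operation whitelists and the Entry model are illustrative.
class AggregationsController < ApplicationController
  ALLOWED_COLUMNS    = %w[response_content_size timings_blocked timings_wait].freeze
  ALLOWED_OPERATIONS = %w[average sum].freeze # easy to grow: minimum, maximum, ...

  def show
    column    = params.require(:column_name)
    operation = params.require(:operation)

    unless ALLOWED_COLUMNS.include?(column) && ALLOWED_OPERATIONS.include?(operation)
      return render json: { error: "unsupported column or operation" },
                    status: :unprocessable_entity
    end

    value = Entry.where(har_id: params[:har_id]).public_send(operation, column)
    render json: { column_name: column, operation: operation, value: value }
  end
end
```

The nice part of whitelisting both the column and the operation is that `average` and `sum` map directly onto the matching ActiveRecord calculation methods, so growing the list of operations stays a small change.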
- Only supports single page HARs: The HAR spec allows files to include multiple pages. However, this project currently only supports single page HARs. We're essentially ignoring the `pageRef` key in each entry for expediency right now.
- Issues with Unicode Sequences: Some web pages will include Unicode sequences, and since we store the raw data in Postgres `jsonb` fields, there is a known issue (since 9.4.1) with certain Unicode sequences. A quick Google search reveals a lot of information. I haven't tried to implement any of the workarounds yet (one possible approach is sketched after this list).
- Performance: The create and update endpoints take half a second to a second to run. It might just be that we are validating and processing lots of data; some of these HAR files are quite large. However, the logs show that the issue is not in the view generation or at the database layer, and it doesn't look like the schema validation is causing the delay either. My suspicion for further investigation would be the use of a `jsonb` column for the raw data storage; I wonder if the Rails side has to do a lot of computation on these large data structures before save. I'm also using `HashWithIndifferentAccess` everywhere, and I wonder if that has a performance implication as well. (A rough way to test both suspicions is sketched after this list.)
- Test Quality: The high coverage may fool you; some of the tests right now are not very thorough and don't effectively use mocks. This can be remedied with time, and the current test suite still provides a lot of bang for the buck.
- Error Handling: This just isn't fully baked, in the interest of time. Bad user input that should return informative errors will probably return 500s instead. (A sketch of one way to tighten this up follows this list.)
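For the Unicode issue above: if it turns out to be the commonly reported rejection of `\u0000` escapes in `jsonb`, one untested workaround would be to scrub those characters before persisting. The `raw` attribute name below is an assumption about the schema:

```ruby
# Untested sketch of one possible workaround for the jsonb Unicode issue:
# strip the null characters (which arrive as \u0000 escapes) that Postgres
# jsonb refuses to store. The jsonb attribute name (raw) is illustrative.
class Har < ApplicationRecord
  before_validation :scrub_null_bytes

  private

  def scrub_null_bytes
    self.raw = deep_scrub(raw) if raw.present?
  end

  # Walk the parsed HAR and delete \u0000 characters from every string value.
  def deep_scrub(value)
    case value
    when Hash   then value.transform_values { |v| deep_scrub(v) }
    when Array  then value.map { |v| deep_scrub(v) }
    when String then value.delete("\u0000")
    else value
    end
  end
end
```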
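For the performance question, a quick way to check the two suspicions from a Rails console or `rails runner` script might look like this. The fixture path and the `raw` attribute are placeholders for the local setup:

```ruby
# Rough, illustrative benchmark for the two performance suspicions above.
# The fixture path and the Har#raw attribute are assumptions, not project code.
require "benchmark"
require "json"

har = JSON.parse(File.read("spec/fixtures/example.har"))

Benchmark.bm(32) do |x|
  x.report("deep HashWithIndifferentAccess") { har.with_indifferent_access }
  x.report("jsonb serialization on create")  { Har.create!(raw: har) }
end
```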
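For error handling, a minimal sketch of the direction I'd take, assuming standard Rails `rescue_from` hooks and an API-only base controller (the exact error classes to map would depend on the controllers):

```ruby
# Illustrative sketch only: map predictable bad-input failures to 4xx
# responses instead of letting them bubble up as 500s.
class ApplicationController < ActionController::API
  rescue_from ActiveRecord::RecordNotFound do
    render json: { error: "record not found" }, status: :not_found
  end

  rescue_from ActionController::ParameterMissing do |e|
    render json: { error: e.message }, status: :unprocessable_entity
  end
end
```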
- Install the Ruby version specified in the `.ruby-version` file via RVM or rbenv
- Install PostgreSQL and have it running
- Clone the repo and `cd` into it
- Install Ruby dependencies via `bundle install`
- Copy the `.env.example` file to `.env`
- Set all values in the `.env` file
- Set up the database using `bundle exec rails db:setup`
- Run the server using `bundle exec rails server`
This project relies on the excellent Overcommit gem, which adds Git hooks to run various linters and code checkers. You'll need to run the following command to make sure your code complies:

`overcommit --install`
If you don't want to bother with a local setup for now, just use the Heroku deploy button below.
If you want to see the current API docs, you can check them out on the deployed Heroku instance. Alternatively, if you prefer Postman, you can import the collection of endpoints from here.
There is a little helper script to load HARs from the command line called `load_har`. Download a HAR and call `./load_har '<path_to_har>'`.
This project is licensed under the MIT License.