Hargh allows you to store and query HAR records. This project was built to complete the Rigor coding exercise.
You can perform CRUD operations on HAR records by sending API requests. See the docs. For example, you can create a new HAR record by sending a POST request to `/hars` where the body is the JSON data of the HAR.
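For example, from Ruby that request might look roughly like the following; the host, port, and file path are placeholders for your own setup:

```ruby
# Illustrative only: create a HAR record by POSTing the raw HAR JSON to /hars.
# The host and the fixture path are placeholders, not part of the project.
require "net/http"

uri = URI("http://localhost:3000/hars")
har = File.read("example.har") # already JSON, so send it as the request body

response = Net::HTTP.post(uri, har, "Content-Type" => "application/json")
puts response.code
puts response.body
```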
Yes, I know the docs aren't the most beautiful. I was giving a new gem, RSpec API Documentation, a try; it generates docs from your API specs.
The input is validated against what I think is the whole spec, not just some pieces. I wanted to give dry-validation a shot and put it through the wringer. It performed very well. Check out `har_schema.rb`.
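For a flavor of the style (not the actual schema, which is much more complete and lives in `har_schema.rb`), a trimmed-down dry-validation contract might look like this:

```ruby
# A trimmed-down sketch in the dry-validation style; the real schema in
# har_schema.rb covers far more of the HAR spec than the few fields shown here.
require "dry/validation"
require "json"

class ExampleHarContract < Dry::Validation::Contract
  params do
    required(:log).hash do
      required(:version).filled(:string)
      required(:creator).hash do
        required(:name).filled(:string)
        required(:version).filled(:string)
      end
      required(:entries).array(:hash) do
        required(:startedDateTime).filled(:string)
        required(:request).hash do
          required(:url).filled(:string)
        end
      end
    end
  end
end

result = ExampleHarContract.new.call(JSON.parse(File.read("example.har")))
puts result.errors.to_h
```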
This is the interesting part. For these, you're going to look at the `/hars/:id/entries` and `/hars/:id/entries/aggregations` endpoints.
To answer the questions in the instructions, you might run some queries like this:

- `/hars/1/entries?sort=timings_blocked&direction=desc&limit=1`
- `/hars/1/entries?sort=timings_blocked&direction=asc&limit=1`
- `/hars/1/entries?sort=timings_blocked&direction=asc&limit=1&offset=1`
- `/hars/1/entries/aggregations?column_name=response_content_size&operation=average`
- `/hars/1/entries/aggregations?column_name=response_content_size&operation=sum`
- `/hars/1/entries?url=something`
There are a number of other queries, or combinations of the same params, that we could run. We could run similar mean or sum queries on the other timing measurements. I think we should also add max and min aggregations to help us figure out whether an outlier is unfairly skewing the results up or down. I think we should also add an index/search endpoint for the HAR records themselves. We currently cache the number of entries, and perhaps there is a correlation between `entry_count` and load time. That said, some of the entries may be loaded asynchronously, so it may not matter.
This API is backed by a Postgres database. However, there are a couple of alternative approaches I can think of right away:
- Don't parse key data points into their own columns. Just use Rails and Postgres support for `jsonb`. This would make it easier to dynamically query for whatever we wanted across a document structure. However, it still feels kind of complicated, and the Postgres process of adding and managing `jsonb` indexes to support specific types of queries is also not straightforward. (A rough sketch of this approach follows the list.)
- Use MongoDB and Mongoid. This makes a lot of sense for this use case. We're attempting to store and query a spec that is a JSON document, so let's use a document database. It's not very 'Railsy', but MongoDB aggregations also make a lot of sense for this use case.
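For illustration, under that jsonb-only alternative a query like "slowest blocked entry" could be answered straight from the stored document. This is only a hypothetical sketch; the table and column names (`hars`, `raw`) are assumptions, not the project's actual schema:

```ruby
# Hypothetical sketch of the jsonb-only alternative: no extracted columns,
# everything read straight out of a raw jsonb column on the hars table.
# Table and column names here are illustrative, not the project's schema.
class Har < ApplicationRecord
  # Entries with the highest blocked time, pulled directly from the document.
  def slowest_blocked_entries(limit = 1)
    sql = <<~SQL
      SELECT entry -> 'request' ->> 'url'                AS url,
             (entry -> 'timings' ->> 'blocked')::numeric AS blocked
      FROM hars, jsonb_array_elements(raw -> 'log' -> 'entries') AS entry
      WHERE hars.id = #{id.to_i}
      ORDER BY blocked DESC
      LIMIT #{limit.to_i}
    SQL
    self.class.connection.select_all(sql).to_a
  end
end
```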
You'll notice that there is a general aggregations endpoint for the entries rather than something super specific to the questions asked. I think it makes a lot of sense to have a more generalized endpoint where you pass in one of a set number of numerically-typed columns that we've already indexed (the `column_name` parameter on the aggregations endpoint) and then another parameter for whether we want it summed or averaged (the `operation` parameter on the aggregations endpoint). We can easily expand this pattern to allow for mins, maxes, and medians.
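A rough sketch of that pattern (the whitelists, routing, and class names below are illustrative, not the actual implementation):

```ruby
# Rough sketch of the generalized aggregation pattern described above.
# Column/operation whitelists and the Entry model are illustrative.
class AggregationsController < ApplicationController
  ALLOWED_COLUMNS    = %w[response_content_size timings_blocked timings_wait].freeze
  ALLOWED_OPERATIONS = %w[average sum].freeze # easy to grow: minimum, maximum, ...

  def show
    column    = params.require(:column_name)
    operation = params.require(:operation)

    unless ALLOWED_COLUMNS.include?(column) && ALLOWED_OPERATIONS.include?(operation)
      return render json: { error: "unsupported column or operation" },
                    status: :unprocessable_entity
    end

    value = Entry.where(har_id: params[:har_id]).public_send(operation, column)
    render json: { column_name: column, operation: operation, value: value }
  end
end
```

The nice part of whitelisting both the column and the operation is that `average` and `sum` map directly onto the matching ActiveRecord calculation methods, so growing the list of operations stays a small change.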
- Only supports single page HARs: The HAR spec allows files to include multiple pages. However, this project currently only supports single page HARs. We're essentially ignoring the `pageRef` key in each entry for expediency right now.
- Issues with Unicode Sequences: Some web pages will include Unicode sequences, and since we store the raw data in Postgres `jsonb` fields, there is a known issue (since 9.4.1) with certain Unicode sequences. A quick Google search reveals a lot of information. I haven't tried to implement any of the workarounds yet (one possible approach is sketched after this list).
- Performance: The create and update endpoints take half a second to a second to run. It might just be that we are validating and processing lots of data; some of these HAR files are quite large. However, the logs show that the issue is not in the view generation or at the database layer, and it doesn't look like the schema validation is causing the delay either. My suspicion for further investigation would be the use of a `jsonb` column for the raw data storage; I wonder if the Rails side has to do a lot of computation on these large data structures before save. I'm also using `HashWithIndifferentAccess` everywhere, and I wonder if that has a performance implication as well. (A rough way to test both suspicions is sketched after this list.)
- Test Quality: The high coverage may fool you; some of the tests right now are not very thorough and don't effectively use mocks. This can be remedied with time, and the current test suite still provides a lot of bang for the buck.
- Error Handling: This just isn't fully baked, in the interest of time. Bad user input that should return informative errors will probably return 500s instead. (A sketch of one way to tighten this up follows this list.)
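For the Unicode issue above: if it turns out to be the commonly reported rejection of `\u0000` escapes in `jsonb`, one untested workaround would be to scrub those characters before persisting. The `raw` attribute name below is an assumption about the schema:

```ruby
# Untested sketch of one possible workaround for the jsonb Unicode issue:
# strip the null characters (which arrive as \u0000 escapes) that Postgres
# jsonb refuses to store. The jsonb attribute name (raw) is illustrative.
class Har < ApplicationRecord
  before_validation :scrub_null_bytes

  private

  def scrub_null_bytes
    self.raw = deep_scrub(raw) if raw.present?
  end

  # Walk the parsed HAR and delete \u0000 characters from every string value.
  def deep_scrub(value)
    case value
    when Hash   then value.transform_values { |v| deep_scrub(v) }
    when Array  then value.map { |v| deep_scrub(v) }
    when String then value.delete("\u0000")
    else value
    end
  end
end
```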
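For the performance question, a quick way to check the two suspicions from a Rails console or `rails runner` script might look like this. The fixture path and the `raw` attribute are placeholders for the local setup:

```ruby
# Rough, illustrative benchmark for the two performance suspicions above.
# The fixture path and the Har#raw attribute are assumptions, not project code.
require "benchmark"
require "json"

har = JSON.parse(File.read("spec/fixtures/example.har"))

Benchmark.bm(32) do |x|
  x.report("deep HashWithIndifferentAccess") { har.with_indifferent_access }
  x.report("jsonb serialization on create")  { Har.create!(raw: har) }
end
```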
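For error handling, a minimal sketch of the direction I'd take, assuming standard Rails `rescue_from` hooks and an API-only base controller (the exact error classes to map would depend on the controllers):

```ruby
# Illustrative sketch only: map predictable bad-input failures to 4xx
# responses instead of letting them bubble up as 500s.
class ApplicationController < ActionController::API
  rescue_from ActiveRecord::RecordNotFound do
    render json: { error: "record not found" }, status: :not_found
  end

  rescue_from ActionController::ParameterMissing do |e|
    render json: { error: e.message }, status: :unprocessable_entity
  end
end
```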
- Install the Ruby version specified in the `.ruby-version` file via RVM or rbenv
- Install PostgreSQL and have it running
- Clone the repo and `cd` into it
- Install Ruby dependencies via `bundle install`
- Copy the `.env.example` file to `.env`
- Set all values in the `.env` file
- Set up the database using `bundle exec rails db:setup`
- Run the server using `bundle exec rails server`
This project relies on the excellent Overcommit gem, which adds Git hooks to run various linters and code checkers. You'll need to run the following command to make sure your code complies:

`overcommit --install`
If you don't want to bother with a local setup for now, just use the Heroku deploy button below.
If you want to see the current API docs, you can check them out on the deployed Heroku instance. Alternatively, if you prefer Postman, you can import the collection of endpoints from here.
There is a little helper script to load HARs from the command line called `load_har`. Download a HAR and call `./load_har '<path_to_har>'`.
This project is licensed under the MIT License.