-
-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backend API to instrument metrics reported by Bundler #2075
Conversation
end | ||
end | ||
|
||
def validate_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I did say people may send garbage data. Sorry I was not clear.
My intention was that create
should not throw exceptions. _json
key not being present was an obvious one, for which using params.require(:_json)
would be nicer.
Another is known_id?(params[:_json].last[:request_id])
. params[:_json].last
could return nil and known_id? nil
will also blow up. Please add checks for that.
As far as validation of values using regex goes.. I think we should remove it. Its fine if we store some garbage data, it should be rare and ignorable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't evil people just spam our database with garbage and mess it up?
Since its all open source, cant someone just find the domain and send thousands of requests to /api/metrics
with, say, bundler_version = "drfgredgedrgreger"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they could but my guess is that it would be rare. The issue with regex check is that we may end up ignoring valid data.
Attackers are generally interested in the app crashes (exceptions) than messing up data.
If you like we can wait for more opinion on this before removing it 🛩️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values we currently store have low cardinality so I think it would be possible to write the regex so that it doesn't ignore valid data.
For example, I've went over all possible Bundler versions to see that they aren't ignored.
I'll let you decide, but I can research farther into possible values and change the regex's to be more permissive if you wish.
I refactored according to your suggestions with require
and known_id?
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to use a regex for version numbers, there is a version number regex inside rubygems
that it uses for the Gem::Version class. That is what I would suggest using 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to use Gem::Version::ANCHORED_VERSION_PATTERN
.
…ed JSON, and validating that the same metrics don't increment the counter twice
…for them, and tests were added to ensure it
…ation, and add tests with garbage data
fdfc484
to
eea28be
Compare
…ormat, and metrics_controller was refactored to comply with that. the tests were also refactored to properly test YAMLed metrics
Bundler now sends metrics as YAML instead of JSON due to limitations, so the backend was refactored to accept and handle the metrics in YAML format. |
Can we close this in favour of https://ecosystem.rubytogether.org/? |
I think the hope was to one day report metrics directly from Bundler rather than infer them indirectly from Fastly logs like we do today for https://stats.rubygems.org, but it does seem like what we have so far is good enough for now. 👍🏻 I'm ok with closing this for now, with the option to pick up reporting metrics directly again in the future. |
By the way, what's the status of the recent improvements at https://ecosystem.rubytogether.org/? I'm very excited to see the new numbers! Currently if I click on the "2 weeks" view, I get no data. That time interval seems to match the landing of these improvements. |
We are… troubleshooting 🥲 |
This work has been done as part of Google Summer of Code 2019
Backend API to instrument the metrics reported by Bundler (Bundler branch is here, Bundler PR is here).
Added functionality:
Upon recieving a valid POST request to /api/metrics, uses StatsD to instrument all low cardinality metrics.
Before instrumenting, checks for garbage data and discards such if found.
The metrics are sent as a JSONified array of two or more ruby hashes.
Due to limitations on the usage of Datadog, the high cardinality metrics are currently still uninstrumented, and we will probably use timescale to instrument them.
The work is based on @sonalkr132 work here, changed to work with Bundler's new, more extensive metric reporting system.
@indirect @sonalkr132