Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend API to instrument metrics reported by Bundler #2075

Closed
wants to merge 13 commits into from

Conversation

Akrabut
Copy link
Contributor

@Akrabut Akrabut commented Jul 27, 2019

This work has been done as part of Google Summer of Code 2019
Backend API to instrument the metrics reported by Bundler (Bundler branch is here, Bundler PR is here).

Added functionality:

Upon recieving a valid POST request to /api/metrics, uses StatsD to instrument all low cardinality metrics.
Before instrumenting, checks for garbage data and discards such if found.
The metrics are sent as a JSONified array of two or more ruby hashes.
Due to limitations on the usage of Datadog, the high cardinality metrics are currently still uninstrumented, and we will probably use timescale to instrument them.

The work is based on @sonalkr132 work here, changed to work with Bundler's new, more extensive metric reporting system.

@indirect @sonalkr132

@Akrabut Akrabut changed the title Metrics project Backend API to instrument metrics reported by Bundler Jul 27, 2019
end
end

def validate_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I did say people may send garbage data. Sorry I was not clear.
My intention was that create should not throw exceptions. _json key not being present was an obvious one, for which using params.require(:_json) would be nicer.
Another is known_id?(params[:_json].last[:request_id]). params[:_json].last could return nil and known_id? nil will also blow up. Please add checks for that.
As far as validation of values using regex goes.. I think we should remove it. Its fine if we store some garbage data, it should be rare and ignorable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't evil people just spam our database with garbage and mess it up?
Since its all open source, cant someone just find the domain and send thousands of requests to /api/metrics with, say, bundler_version = "drfgredgedrgreger"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they could but my guess is that it would be rare. The issue with regex check is that we may end up ignoring valid data.
Attackers are generally interested in the app crashes (exceptions) than messing up data.
If you like we can wait for more opinion on this before removing it 🛩️

Copy link
Contributor Author

@Akrabut Akrabut Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values we currently store have low cardinality so I think it would be possible to write the regex so that it doesn't ignore valid data.
For example, I've went over all possible Bundler versions to see that they aren't ignored.

I'll let you decide, but I can research farther into possible values and change the regex's to be more permissive if you wish.

I refactored according to your suggestions with require and known_id?.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to use a regex for version numbers, there is a version number regex inside rubygems that it uses for the Gem::Version class. That is what I would suggest using 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to use Gem::Version::ANCHORED_VERSION_PATTERN.

…ormat, and metrics_controller was refactored to comply with that. the tests were also refactored to properly test YAMLed metrics
@Akrabut
Copy link
Contributor Author

Akrabut commented Aug 28, 2019

Bundler now sends metrics as YAML instead of JSON due to limitations, so the backend was refactored to accept and handle the metrics in YAML format.

@simi
Copy link
Member

simi commented Mar 5, 2022

Can we close this in favour of https://ecosystem.rubytogether.org/?

@indirect
Copy link
Member

indirect commented Mar 5, 2022

I think the hope was to one day report metrics directly from Bundler rather than infer them indirectly from Fastly logs like we do today for https://stats.rubygems.org, but it does seem like what we have so far is good enough for now. 👍🏻 I'm ok with closing this for now, with the option to pick up reporting metrics directly again in the future.

@indirect indirect closed this Mar 5, 2022
@deivid-rodriguez
Copy link
Member

By the way, what's the status of the recent improvements at https://ecosystem.rubytogether.org/? I'm very excited to see the new numbers! Currently if I click on the "2 weeks" view, I get no data. That time interval seems to match the landing of these improvements.

@indirect
Copy link
Member

indirect commented Mar 8, 2022

We are… troubleshooting 🥲

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants