Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor v1 HTTP to use middlewares and http.HandlerFuncs #325

Merged
merged 30 commits into from Apr 21, 2020

Conversation

metalmatze
Copy link
Contributor

Now that Telemeter doesn't have the hashring anymore, we can go ahead and get rid of the store.Store interface, allowing us to move to simpler architecture.
With this PR we're moving to a pkg/server that contains multiple HTTP middlewares and http.HandlerFuncs.

/cc @squat @kakkoyun @krasi-georgiev @bwplotka @brancz

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 3, 2020
@@ -325,7 +322,12 @@ func (o *Options) Run() error {
})
}
{
external := http.NewServeMux()
external := chi.NewRouter()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: above we call the router r and the mux internal and here we are following a different pattern. let's keep it consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the above still needs the internal to begin with. We need to refactor telemeter_http.DebugRoutes etc in another PR and then can have consistency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no I simply mean we should call the router r in both cases

@@ -403,17 +405,12 @@ func (o *Options) Run() error {
}

auth := jwt.NewAuthorizeClusterHandler(o.Logger, o.PartitionKey, o.TokenExpireSeconds, signer, o.RequiredLabels, clusterAuth)
validator := validate.New(o.PartitionKey, o.LimitBytes, 24*time.Hour, time.Now)
//validator := validate.New(o.PartitionKey, o.LimitBytes, 24*time.Hour, time.Now)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we intentionally ignoring this? do we just want to delete the line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented out during refactoring for reference. The validate.New func doesn't even exist anymore. Will remove.

}
}
return func(w http.ResponseWriter, r *http.Request) {
partitionKey, ok := PartitionFromContext(r.Context())
Copy link
Contributor

@squat squat Apr 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's rename this partitionKey thing. It's a relic from the partitioned metrics type but it's a weird name IMO. I think something like ID, key, tenant would be much more clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had the same thought during refactoring. I would have gone for clusterID. That's what it's ultimately is, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes exactly

}

if len(timeseries) == 0 {
level.Info(s.logger).Log("msg", "no time series to forward to receive endpoint")
return nil
level.Info(logger).Log("msg", "no time series to forward to receive endpoint")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also respond with an HTTP error here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a transient error before. We simply ignored those requests and returned 200. I don't really care, honestly.

"golang.org/x/time/rate"
)

// ErrWriteLimitReached is an error that is returned when a cluster has send too many requests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/send/sent/

}
defer r.Body.Close()

payload, _ := snappy.Decode(nil, body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we handle this error just in case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like I've forgotten 🙃

reader = ioutil.NopCloser(bytes.NewBuffer(payload))
}

r.Body = reader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can put this line and line 14 inside the if block, no? otherwise it's funny following why we are changing the body when the encoding is not snappy

labels := map[string]string{"cluster": "test"}
validator := validate.New("cluster", 0, 0, time.Now)
//validator := validate.New("cluster", 0, 0, time.Now)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we simply delete this line?

Copy link
Contributor

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall @metalmatze good work so far!

@squat
Copy link
Contributor

squat commented Apr 17, 2020

/retest

}

if resp.StatusCode/100 != 2 {
return fmt.Errorf("response status code is %s", resp.Status)
// surfacing upstreams error to our users too
http.Error(w, fmt.Errorf("response status code is %s", resp.Status).Error(), resp.StatusCode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct IMO. If telemeter-server gets a 400 from the upstream then it is not the user's fault that there is an auth error, but rather an internal server error. The user should not receive a 400 for an improperly configured server. Instead the user should get a 500.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. That makes sense. Keep in mind that this is the request send from Telemeter to Thanos Receivers. I guess it still holds true then too? (There's not auth in this step)

pkg/server/forward.go Outdated Show resolved Hide resolved
@@ -36,7 +36,8 @@ func init() {
prometheus.MustRegister(requestDuration, requestSize, requestsTotal)
}

func NewInstrumentedHandler(handlerName string, next http.Handler) http.Handler {
// InstrumentedHandler is a HTTP middleware that monitors HTTP requests and responses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ... is an HTTP ...

type ErrWriteLimitReached string

func (e ErrWriteLimitReached) Error() string {
return fmt.Sprintf("write limit reached for key %q", string(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should rename key to id or cluster now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write limit reached for cluster with id? 😊

t.Errorf("expected %d and got %d", http.StatusOK, resp.StatusCode)
}
}
//{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to re-enable this test or should we remove it simply?

return p, ok
}

type clusterIDCtxType int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's put these definitions at the top of the file to follow convention

transforms.With(metricfamily.NewRequiredLabels(client.Labels))
transforms.With(metricfamily.TransformerFunc(metricfamily.DropEmptyFamilies))

if limitBytes > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this is funny, shouldn't this come before the ioutil.ReadAll on line 66?

transforms.With(metricfamily.TransformerFunc(metricfamily.DropEmptyFamilies))

if limitBytes > 0 {
r.Body = reader.NewLimitReadCloser(r.Body, limitBytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what effect does this have? it seems like the entire body was already read into a new buffer called body on line 66

fakeAuthorizeHandler(
Validate(metricfamily.MultiTransformer{}, time.Hour, 512*1024, time.Now,
func(w http.ResponseWriter, r *http.Request) {
// TODO: Make the check proper to changing timestamps?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we delete or re-enable this?

@metalmatze
Copy link
Contributor Author

/retest

@squat
Copy link
Contributor

squat commented Apr 21, 2020

/retest

Copy link
Contributor

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 21, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: metalmatze, squat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@brancz
Copy link
Contributor

brancz commented Apr 21, 2020

This is entirely code that is not shipped in OpenShift, but needed for the telemetry server infrastructure. All CI green. Merging.

@brancz brancz merged commit 958dc61 into openshift:master Apr 21, 2020
@metalmatze metalmatze deleted the http-refactorings branch April 21, 2020 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants