Question / use case #1322

Closed
joeblew99 opened this Issue Jan 16, 2016 · 17 comments

@joeblew99

joeblew99 commented Jan 16, 2016

I have lots of devices that are on NAT networks, and mostly behind a VPN.
As I understand it, Prometheus is designed around a pull model, which means the server will not be able to reach my devices.

I really want to use Prometheus because it has proven itself many times to be awesome compared to other solutions, but I am stuck due to the pull model.

Does anyone have any suggestions?

@brian-brazil

Member

brian-brazil commented Jan 16, 2016

Firstly, we recommend running a Prometheus on each network so that you're not crossing failure domains.

Then you'd (optionally) have a Prometheus running that'd pull aggregated stats from those. You could use a number of ways to get that through your network, including VPN, SSL and proxies.

@joeblew99

Author

joeblew99 commented Jan 17, 2016

Ah that makes sense. Good idea.

I think Weave would be an excellent match to push the remote Prometheus data back up to the data centre.

@joeblew99

Author

joeblew99 commented Jan 17, 2016

Weave works with Docker and now also with non-Docker networks as a mesh.

@joeblew99 joeblew99 closed this Jan 17, 2016

@joeblew99

Author

joeblew99 commented Jan 17, 2016

I also just read that I can use the Pushgateway (https://github.com/prometheus/pushgateway) to send from the NAT network into the data center. Sounds like this also fits my use case?

@joeblew99 joeblew99 reopened this Jan 17, 2016

@brian-brazil

Member

brian-brazil commented Jan 17, 2016

It's not advised to use the Pushgateway in that manner; it's meant for batch jobs. The main issue you'd run into would be metrics hanging around for things that no longer exist.

@zeroware


zeroware commented Jan 20, 2016

We used OpenVPN to pull metrics from networks behind NAT or a firewall.
And you get encryption for free.

@joeblew99

Author

joeblew99 commented Jan 21, 2016

Hey all, thanks for the thoughtful advice.

For sure OpenVPN will do the job. I just can't use one because all the sites are already wrapped in one that we don't control. This is an IoT-style environment.

I am thinking of trying the Weave Go code. It creates mesh networks within data centres and between them. I need to try it, I guess.

Regarding the batch aspect of the push gateway: OK, good to know.
The sites are not always connected to the internet. Generally they are, but they definitely are not super redundant. I guess this is really the perfect reason to use batch then, despite the downsides you mention?

@brian-brazil

Member

brian-brazil commented Jan 21, 2016

That would be the perfect reason to have a Prometheus running at each site, so that you still have monitoring when the network is down. The Pushgateway will not let you work around network outages.

@joeblew99

Author

joeblew99 commented Jan 21, 2016

Thanks! But the push gateway will pick up where it left off once the network comes back up, right :)? It's just that getting every message is pretty vital for the use case.

Also, I had some nasty experiences with message ordering on another system, which will remain nameless. Basically, it would not deliver the messages in the order they were created in terms of time. That made for a nasty experience in the data center, with aggregates and other calculations being done over holes in the data. Could you comment on this aspect in terms of Prometheus and the gateway?


@brian-brazil

Member

brian-brazil commented Jan 21, 2016

> But the push gateway will pick up where it left off once the network comes back up, right

The Pushgateway only remembers the most recent value for a metric. Depending on how you've done things, this may or may not be okay.
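To make the "most recent value only" point concrete, here is a minimal sketch in Python. It is a toy model, not the Pushgateway's actual code: the class `LastValueGateway`, the job name `site_a`, the metric name `temperature_celsius` and the readings are all made up for illustration. It only shows that a store which overwrites on every push keeps no history to replay after an outage.

```python
# Toy model only: this is NOT the Pushgateway's implementation.
class LastValueGateway:
    """Keeps one value per (job, metric) pair; each push replaces the previous one."""

    def __init__(self):
        self._values = {}

    def push(self, job, metric, value):
        # A new push overwrites whatever was stored before; no history is kept.
        self._values[(job, metric)] = value

    def scrape(self):
        # A scrape sees only the current value of each series.
        return dict(self._values)


gateway = LastValueGateway()

# Hypothetical readings pushed while nothing was able to scrape the gateway.
for reading in [3.0, 7.5, 4.2]:
    gateway.push("site_a", "temperature_celsius", reading)

# When scraping resumes, only the last reading is visible.
snapshot = gateway.scrape()
print(snapshot)
```

The first two readings are gone by the time the scrape happens, which is why a design that relies on "getting every message" does not map onto this model.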

@matthiasr

Contributor

matthiasr commented Jan 21, 2016

> Thanks! But the push gateway will pick up where it left off once the network comes back up, right :)? It's just that getting every message is pretty vital for the use case.

Two misconceptions here, I think.

The Pushgateway does not batch metrics or samples; it only holds the
current value. It is meant to be used from batch jobs. Imagine a
background/cron job that processes items. It increments a counter in the
gateway, say items_processed_total. A Prometheus scrapes it on its own
cycle; this way you always know how many items were processed in a given
time frame.

The other is messages. Prometheus does not deal in these at all. Instead,
it tracks the counter. If it misses a scrape or two, the counter will still
increment, whether in the Pushgateway or directly in a long-running application.
This way, you still know how many items were processed, even if you lose a
little fidelity about when exactly.
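A small simulation may help illustrate why missed scrapes do not lose the total. This is a sketch under assumed numbers: the scrape schedule, the outage window and the rate of two items per second are all hypothetical, and only the counter name `items_processed_total` comes from the comment above.

```python
# Hypothetical timeline: a job processes items every second, and a scraper
# samples the counter on its own schedule.
items_processed_total = 0
samples = {}  # scrape time in seconds -> counter value seen at that time

scrape_times = {0, 10, 20, 30, 40}
missed = {20, 30}  # scrapes lost to a simulated network outage

for second in range(0, 41):
    if second in scrape_times and second not in missed:
        samples[second] = items_processed_total
    items_processed_total += 2  # assumed rate: two items per second

# The increase between the first and last successful scrape still covers
# everything processed in between, including the outage window.
first, last = min(samples), max(samples)
increase = samples[last] - samples[first]
print(samples)
print(increase)
```

The samples at 20 and 30 seconds are missing, so the curve between 10 and 40 seconds is unknown, but the difference between the surviving samples still accounts for every item processed in that window.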

Prometheus is not suited for tracking one-off events.

/MR

@joeblew99

Author

joeblew99 commented Jan 21, 2016

OK, so I gather that it basically increments counters and pushes the counter, not the raw data?

@matthiasr

Contributor

matthiasr commented Jan 21, 2016

The push is from your application to the gateway. Prometheus still only
pulls from that.

There is no "raw data" in this model, only sampled monotonically increasing
counters.

/MR


@joeblew99

Author

joeblew99 commented Jan 21, 2016

Got it, ta.

So the reason I need both is this.

Basically, the raw data is very high frequency.
So I want to do aggregation on the remote servers, and I think we have established that it does that.

But it's also very nice to be able to deliver raw data to the data center and do aggregation there too. Maybe a separate "channel" can be used for that?

The reason has to do with energy and network bandwidth. At some sites you have lots of energy and crappy bandwidth, so counters are perfect.
At other sites you have low energy but lots of bandwidth, so raw data is good there.

@matthiasr

Contributor

matthiasr commented Jan 21, 2016

You can configure the scrape frequency on a job-by-job basis. Federation is
a special case of scraping.
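As an illustration of "federation is a special case of scraping": in the Prometheus documentation, a federating server scrapes another Prometheus over plain HTTP at its `/federate` endpoint, selecting series with repeated `match[]` query parameters. The Python sketch below only builds such a URL. The helper `federate_url`, the host name `site-a-prometheus.example` and the selector `{job="site_aggregates"}` are assumptions made up for this example.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build the URL a central server would scrape on a per-site Prometheus.

    Each series selector becomes one repeated match[] query parameter.
    """
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url}/federate?{query}"


# Hypothetical per-site server and selector, for illustration only.
url = federate_url(
    "http://site-a-prometheus.example:9090",
    ['{job="site_aggregates"}'],
)
print(url)
```

In a real setup the same path and parameters would normally be set in the central server's scrape job rather than built by hand, and that job can carry its own scrape interval like any other job.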


@joeblew99

Author

joeblew99 commented Jan 21, 2016

Thanks again. I will go off and find more info on federation patterns.

@joeblew99 joeblew99 closed this Jan 21, 2016

@lock


lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
