This repository has been archived by the owner on Mar 25, 2022. It is now read-only.

Monitoring #44

Merged
16 commits merged into from
Jul 15, 2015

Conversation

@ghost ghost commented Jul 12, 2015

This turned out a bit big and monolithic. The changes are contained in individual commits though, each with a proper description.

Monitoring is now available at http://metrics.i.ipfs.io, once you've peered with cjdns and added yourself to the whitelist of allowed clients. More information about this comes with the updated README.md, and https://github.com/protocol/infrastructure/pull/73

Lars Gierth added 12 commits July 12, 2015 23:25
The value determines how many vhosts the nginx process
can have, so it's important it's not too small.

By default, nginx sets this to a value that depends on
the CPU's cache line width. We don't want to be too sure
that all solarnet machines have the same CPU, so we make
the value the same everywhere.

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
The role is to make metrics accessible, and
Prometheus is just an implementation detail.

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
This is one more step in decoupling nginx from the IPFS gateway.

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
... in order to make space at the top-level for Grafana.

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
Changes to the Dockerfile often require a rebuild of the image.
We can't delete the image to trigger a rebuild, because we don't
wanna kill off the Prometheus daemon.

We introduce a ref-file, similar to how cjdns upgrades are handled.
The ref-file contains the currently running git ref (commit, branch,
or tag). We build unless the file exists and contains the expected ref.

In order to trigger a rebuild, simply remove the ref-file:

    ssh root@host rm /opt/prometheus.ref

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
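
The ref-file check described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the playbook's actual implementation: the function names `needs_rebuild` and `record_ref` and their arguments are hypothetical, and only the `/opt/prometheus.ref` path comes from the commit message.

```shell
#!/bin/sh
# Sketch of the ref-file check. Function names and arguments are
# hypothetical; they are not taken from the playbook.

# needs_rebuild REF_FILE EXPECTED_REF
# Returns 0 (rebuild needed) unless REF_FILE exists and contains EXPECTED_REF.
needs_rebuild() {
    ref_file=$1
    expected_ref=$2
    # No ref-file at all: nothing has been built yet, or a rebuild was requested.
    [ -f "$ref_file" ] || return 0
    # Ref-file matches the expected ref: the running build is current.
    [ "$(cat "$ref_file")" = "$expected_ref" ] && return 1
    # Ref-file exists but holds a different ref: rebuild.
    return 0
}

# record_ref REF_FILE REF
# Writes the ref after a successful build so that the next run is a no-op.
record_ref() {
    printf '%s\n' "$2" > "$1"
}
```

A caller would run the image build only when `needs_rebuild /opt/prometheus.ref "$ref"` succeeds, then call `record_ref` with the same arguments. Removing the ref-file, as in the `ssh` command above, makes the next check report that a rebuild is needed without touching the running container.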
License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
Prometheus' node_exporter is a little daemon which exposes
metrics for things like CPU, RAM, network, filesystems.

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
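
As a rough illustration of what node_exporter exposes, the sketch below pulls a single value out of Prometheus' plain-text metrics format. The helper name `metric_value` is hypothetical, and it only handles samples without labels; it is a reading aid, not part of this pull request.

```shell
#!/bin/sh
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of the
# first sample whose name is exactly NAME. Only label-free samples match;
# "# HELP" and "# TYPE" comment lines are skipped because their first field
# is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}
```

Assuming node_exporter listens on its default port 9100 on the local host, which may not match how it is deployed here, `curl -s http://localhost:9100/metrics | metric_value node_load1` would print the 1-minute load average.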
Allows anyone with a password to peer with the Hyperboria
network, via the solarnet hosts.

Peering is needed in order to access monitoring.

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
We don't want the playbook to break every time someone fat-fingers
file changes into the build directory.

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
This bug is really silly: the copy step grabs the file from
the *local* disk, not the actual host's disk. That worked
purely by accident.

It obviously never installed the correct version of cjdns.

We now install the correct cjdroute, as well as the correct
upstart script. We also restart cjdroute in case the upstart
script changed.

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
This latest version introduces a default password for the
admin API. Additional configuration for the tools is no longer
necessary.

This default password is "NONE" by convention, and the tools
assume this.

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
@ghost ghost added the solarnet label Jul 12, 2015
Communications:
- Github repos
- IRC Channel
- Mailing Lists
Member

i want to keep (and expand) reference to these things here. the IPFS infrastructure is more than just the programs we run. IRC, mailing list, github, etc are all important to account.

Author

My idea here was to avoid duplication and keep that in ipfs/community.git, so it'd be about "using the infrastructure", while ipfs/infrastructure.git would be about "running the infrastructure".

But maybe this distinction doesn't make sense? We should maybe look into merging the two then.

Member

> it'd be about "using the infrastructure", while ipfs/infrastructure.git would be about "running the infrastructure".

Yeah this is the distinction i was going for too. i'm +1 to moving the listing over for users to consume. still want one listing somewhere that whoever will be taking care of infra can know where everything is + make sure it's running / accounted for. (e.g. i dont expect freenode or google groups or https://botbot.me/freenode/ipfs/ to fail at all, but should still be accounted for somewhere in case someone needs to replace a piece).

jbenet commented Jul 13, 2015

Aside from the simple comments above, this LGTM.

jbenet commented Jul 13, 2015

@lgierth i think re: #44 (comment) -- maybe having just one file in /infrastructure that tells someone all the pieces they have to worry about (if we all disappear) would be useful. (/community can be the user-friendly UX to it, which may not list everything, or whatever)

ghost commented Jul 13, 2015

> @lgierth i think re: #44 (comment) -- maybe having just one file in /infrastructure that tells someone all the pieces they have to worry about (if we all disappear) would be useful. (/community can be the user-friendly UX to it, which may not list everything, or whatever)

SGTM, that makes a lot of sense -- this is still infrastructure that needs to be "run" in a wider sense.

@ghost ghost force-pushed the monitoring branch 2 times, most recently from 7674b95 to a9cef2c Compare July 14, 2015 00:28
Lars Gierth added 4 commits July 15, 2015 02:53
License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
1. indentation matters
2. last variable definition wins
3. name the password list item

License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
License: MIT
Signed-off-by: Lars Gierth <larsg@systemli.org>
ghost pushed a commit that referenced this pull request Jul 15, 2015
@ghost ghost merged commit 61222af into master Jul 15, 2015
@ghost ghost deleted the monitoring branch July 15, 2015 23:38
@jbenet jbenet mentioned this pull request Jul 16, 2015
58 tasks
@ghost ghost mentioned this pull request Aug 11, 2015
This pull request was closed.