mirrors stats script#169
Conversation
|
|
||
| '/usr/local/sbin/record-mirrors-stats': | ||
| content => template('ocf_mirrors/record-mirrors-stats.py.erb'), | ||
| mode => '0644'; |
There was a problem hiding this comment.
- I don't think this should be world-readable to be consistent with our use of secrets elsewhere.
- Shouldn't this be executable?
There was a problem hiding this comment.
You should probably store config separately as @chriskuehl says and as we do for all of our other programs.
|
|
||
| '/usr/local/sbin/record-mirrors-stats': | ||
| content => template('ocf_mirrors/record-mirrors-stats.py.erb'), | ||
| mode => '0644'; |
There was a problem hiding this comment.
if it's in sbin it should be executable. additionally, if it has secrets, it shouldn't be world-readable and should have show_diff => false
ideally don't template a python file though. store the config separately.
There was a problem hiding this comment.
yeah, my mistake
I modeled this after how it's done in the labstats script, I guess I'll fix this and then fix that as well
There was a problem hiding this comment.
it now uses environment variables, seeded by cron. The password is stored in the root crontab so it should be secure enough + no need to add another config file. Does that seem reasonable?
| @@ -0,0 +1,54 @@ | |||
| #!/usr/bin/python | |||
|
|
||
| import os | ||
| import pymysql | ||
| from datetime import date |
There was a problem hiding this comment.
group imports into stdlib / 3rd party / application per PEP8: https://www.python.org/dev/peps/pep-0008/#imports
pre-commit should enforce this for you?
There was a problem hiding this comment.
pre-commit didn't say anything about this, I'll re-install it I guess
| sources = [mirrored for mirrored in os.listdir(MIRRORS_PATH) | ||
| if os.path.isdir(os.path.join(MIRRORS_PATH, mirrored)) and not mirrored.startswith('.')] | ||
|
|
||
| sources.append('other') # catchall |
There was a problem hiding this comment.
2 spaces before inline comments per PEP8: https://www.python.org/dev/peps/pep-0008/#inline-comments
pre-commit isn't enforcing this?
| def build_dists(): | ||
| return { mirrored: { 'up': 0, 'down': 0 } for mirrored in sources } | ||
|
|
||
| def process_file(fn): |
There was a problem hiding this comment.
2 blank lines before/after functions per PEP8 (going to stop commenting these, pre-commit should do this for you)
| cursorclass=pymysql.cursors.DictCursor, | ||
| ) | ||
|
|
||
| c = conn.cursor() |
There was a problem hiding this comment.
use a context manager
with conn as cursor:
...| for line in f: | ||
| stats = line.split() | ||
| dist = stats[6] | ||
| dist = dist.split('/')[1] if '/' in dist else dist |
There was a problem hiding this comment.
can you include a comment with an example line so people reading the script can see what is being parsed?
|
|
||
| cron { | ||
| 'mirrors-stats': | ||
| command => '/usr/local/sbin/record-mirrors-stats', |
There was a problem hiding this comment.
don't know how this works if it's not executable?
There was a problem hiding this comment.
seems to work fine for the last two weeks I let it run, but yeah I need to make it +x
There was a problem hiding this comment.
Maybe cron runs the crunchbang without checking if it's +x? Sounds fishy to me.
| dist = stats[6] | ||
| dist = dist.split('/')[1] if '/' in dist else dist | ||
|
|
||
| if stats[8][0] in ('2', '3') and dist in dists: # http status code is 2xx/3xx |
There was a problem hiding this comment.
Comment this as well as the above lines, please.
EDIT: I'm referring to the stats[8][0] part of course.
| @@ -0,0 +1,63 @@ | |||
| #!/usr/bin/python | |||
There was a problem hiding this comment.
python files should either drop the .py if they're not intended to be imported (and only executed), or use underscores so they can be imported
our tradition is typically to drop the extension
| OCFSTATS_PWD = os.environ['OCFSTATS_PWD'] | ||
| MIRRORS_PATH = '/opt/mirrors/ftp' | ||
| LOG_PATH = '/var/log/apache2' | ||
| LOG_NAME = 'mirrors.ocf.berkeley.edu_access.log' |
There was a problem hiding this comment.
I think I forgot to mention this before, but the reason why I was using mirrors.ocf.berkeley.edu_access.log.1 instead of this was because the logs get rotated at around 6 AM, so if you check them once a day, unless you check very close to when they are rotated, you lose some data. For example, this is currently checked at midnight, so all the data from midnight to around 6 AM is currently lost since it is rotated before the next midnight.
There was a problem hiding this comment.
Another option I just thought of is to scan both log files and check if the date is within the current day, that might be the best if you truly want the numbers to be representative of the actual day boundaries instead of 6 AM - 6 AM the next day. (honestly I don't think it matters, but it wouldn't be too bad to scan both log files either)
|
I changed the cron.daily timing to be at 12:05am for precisely this reason.
Was trying to figure out how to puppet that for logrotate only.
On Jun 27, 2017 9:17 PM, "Jason Perrin" <notifications@github.com> wrote:
*@jvperrin* commented on this pull request.
------------------------------
In modules/ocf_mirrors/files/process-mirrors-logs
<#169 (comment)>:
@@ -0,0 +1,82 @@
+#!/usr/bin/python
+import argparse
+import os
+from datetime import date
+
+import pymysql
+
+OCFSTATS_PWD = os.environ['OCFSTATS_PWD']
+MIRRORS_PATH = '/opt/mirrors/ftp'
+LOG_PATH = '/var/log/apache2'
+LOG_NAME = 'mirrors.ocf.berkeley.edu_access.log'
I think I forgot to mention this before, but the reason why I was using
mirrors.ocf.berkeley.edu_access.log.1 instead of this was because the logs
get rotated at around 6 AM, so if you check them once a day, unless you
check very close to when they are rotated, you lose some data. For example,
this is currently checked at midnight, so all the data from midnight to
around 6 AM is currently lost since it is rotated before the next midnight.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#169 (review)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AB0_lijtg8NWpmarbiqLEFFpJSwLbLNbks5sIdPTgaJpZM4N9W6j>
.
|
|
|
||
| # extract dist name from request url | ||
| # '/debian/pool/main/h/hwdata/...' -> 'debian' | ||
| dist = stats[6] |
There was a problem hiding this comment.
I wrote this script as a bit of a hack to quickly see what traffic levels we had across dists, but ideally there would be a comment here explaining why these indices were chosen with maybe an example line:
The format comes from here:
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O
which translates to something like
8.8.8.8 - - [26/Jun/2017:00:05:13 -0700] "GET /centos/7.3.1611/path/here/blah HTTP/1.1" 200 3735 "-" "urlgrabber/3.10 yum/3.4.3" 147 3971
so that's why the 6th index is needed (%r corresponds to the path because there is an extra space in the timestamp), and the 8th index below gives the status code so that can be filtered for only successes. -1 (%O) and -2 (%I) below give the bytes downloaded and uploaded as seen by Apache respectively.
|
Meh, I think that's a bit much and would be surprising behavior honestly that one system has a different |
|
+1 for jvperrin's approach |
add mirrors stats script and cronjob thank based @jvperrin
* fix(mirrors): reload nginx when certs change Previously, nginx was not picking up renewed certs. * refactor(mirrors): move+use variable for ocfstats password Some history: the variable was introduced in #169 when stat tracking was added, but #925 changed the implementation and didn't use the variable any more (probably because it was so far away from where it was used). This commit moves it next to where it is used now.
* fix(mirrors): reload nginx when certs change Previously, nginx was not picking up renewed certs. * refactor(mirrors): move+use variable for ocfstats password Some history: the variable was introduced in #169 when stat tracking was added, but #925 changed the implementation and didn't use the variable any more (probably because it was so far away from where it was used). This commit moves it next to where it is used now.
No description provided.