-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record installation date and time in DB #7334
Record installation date and time in DB #7334
Conversation
f6871cb
to
526055c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never ran into the situation where I wanted to do this but it seems like a useful feature. I find explicit argument lists easier to understand than opaque **kwargs, but no strong feelings there. But the Exception that is raised in case of a ill formatted input str should really state which argument is malformed
lib/spack/spack/database.py
Outdated
@@ -579,23 +626,33 @@ def _read(self): | |||
self._write(None, None, None) | |||
self.reindex(spack.store.layout) | |||
|
|||
def _add(self, spec, directory_layout=None, explicit=False): | |||
def _add(self, spec, directory_layout=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not keep the arguments explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below answer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is a bit different, how do I know why kwargs are accepted/expected? And documentation can will become outdated...
I agree with you that writing out arguments makes a call easier to read, but I also know that I always have to google the damn kwargs of any matplotlib call... The former could be enforced by convention (which will probably not hold in the long run either). So I see points for both sides -> no strong feelings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case _add
is by convention a private (if you pass the term in python) method. Usually I regard those methods as implementation details and don't document them using the same care as I have for user facing APIs.
If you think it should be documented with the same level of details as query
that won't be a problem - just let me know. In any case users are not supposed to be calling _add
, and the function itself won't appear in the docs online.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that _ means private ;) and agreed private functions shouldn't need to be documented in detail. However that means there is no longer any information in the function signature or documentation which kwargs are accepted. So if I were to use _add I would have no idea which kwargs are expected or accepted.
I don't have a good solution, either you could:
- document _add
- revert back to explicit variables and expect users to call it by name
If I were to use this in half a year I'd at least like to have the knowledge of what the parameters are, so that I can grep for them in the source code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extended help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this one be explicit=False
again too?
lib/spack/spack/database.py
Outdated
@@ -766,45 +828,64 @@ def activated_extensions_for(self, extendee_spec, extensions_layout=None): | |||
continue | |||
# TODO: conditional way to do this instead of catching exceptions | |||
|
|||
def query(self, query_spec=any, known=any, installed=True, explicit=any): | |||
"""Run a query on the database. | |||
def query(self, query_spec=any, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question, why not leave the arguments explicitly? Yes it makes the signature bulkier, but it's much easier to understand the function if you don't have to constantly ask yourself what else is contained ink kwargs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it forces users, in both python 2 and 3, to pass optional arguments by keyword (which makes it more readable at call site). If you look at this:
db.query(spec, True, False, False)
are you sure it's easier to understand than:
db.query(spec, known=True, installed=False, explicit=False)
? The longer signature is another reason, but in my opinion of secondary importance with respect to the above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree with you that the second one is easier to read, I was thinking about writing another db.query
statement, but then the kwargs are documented, so it is fine, I'll probably find what I need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually agree with @healther -- let's not change this everywhere just to force users to do this. I don't think most calls to these functions use positional arguments anyway, and we can either add a style check for this, or catch it in review.
lib/spack/spack/database.py
Outdated
start_date = kwargs.pop('start_date', datetime.datetime.min) | ||
end_date = kwargs.pop('end_date', datetime.datetime.max) | ||
|
||
# Raise an exception if we have still entries in kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to prevent that typos pass through kwargs.
lib/spack/spack/cmd/find.py
Outdated
for attribute in ('start_date', 'end_date'): | ||
date = getattr(args, attribute) | ||
if date: | ||
q_args[attribute] = spack.database.str2datetime(date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should add a catch if data is ill formatted that should raise an Exception telling which date was ill formed and how it should have been formatted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh? Is there something wrong with the current behavior?
$ spack find --start-date=456
==> Error: time data '456' does not match format '%Y-%m-%d %H:%M:%S'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm just still burned from the SyntaxError in package.py
... At the moment it is only used on the command line, where the user should immediately see what happened, I worry about a future use in less user-obvious situations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on the comment above? I don't get the request there...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you have a syntax error (I'm not too sure anymore of which kind it has to be) in either one of the configuration files or any package.py
in the repo, there were at least cases where spack wouldn't react to any spack install
or spack spec
like command except printing out something to the effect of SyntaxError in 'package.py'
, I'd have wanted there to be a feedback in the sense of SyntaxError in 'path-to-package.py'
.
In the same spirit: I'd like to get feedback on which argument is broken. Not how the broken context looks like. If it's only used from the command line it probably won't matter much but I spend more time with finding out which package.py
was broken than was good for my sanity :D
526055c
to
67b5ffc
Compare
@scheibelp @tgamblin Are you ok with merging this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some stye and date format (for storage and input) requests.
lib/spack/spack/database.py
Outdated
@@ -75,6 +76,27 @@ | |||
# Types of dependencies tracked by the database | |||
_tracked_deps = ('link', 'run') | |||
|
|||
#: Time format used in database entries | |||
_time_format = '%Y-%m-%d %H:%M:%S' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Database entries should just use precise timestamps (since the epoch), and formats should be used for display only.
spec (Spec): spec tracked by the install record | ||
path (str): path where the spec has been installed | ||
installed (bool): whether or not the spec is currently installed | ||
ref_count (int): number of specs that depend on this one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually agree with @healther here. Don't use **kwargs
for everything; use explicit arguments. I realize that allows them to be positional, but we can enforce that at the callsite instead of in the function. We could even add a style check for it. I think the benefits of good documentation and method signatures outweigh the use of **kwargs
, and I had started switching away from **kwargs
where possible.
In google pydoc strings, the way to document an optional parameter is ref_count (int, optional): ...
... see here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use **kwargs for everything; use explicit arguments.
I'll take this request as proper style in Spack, and will implement it. But if we are talking about idiomatic python, I think the subject is at least controversial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree it’s controversial. I like it because it makes the prototypes more readable and easier to document. It’s also closer to the way you’d write it in Python 3, where you can get the best of both worlds and force kwargs. I’d argue that doing things more like 3 is the better way to go... maybe one day we can drop 2 support altogether... one day far away 😭.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for dropping py2 👼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will probably be hpcpythonclock.org or craypythonclock.org or something long after pythonclock.org has counted down 😭
lib/spack/spack/database.py
Outdated
@@ -766,45 +828,64 @@ def activated_extensions_for(self, extendee_spec, extensions_layout=None): | |||
continue | |||
# TODO: conditional way to do this instead of catching exceptions | |||
|
|||
def query(self, query_spec=any, known=any, installed=True, explicit=any): | |||
"""Run a query on the database. | |||
def query(self, query_spec=any, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually agree with @healther -- let's not change this everywhere just to force users to do this. I don't think most calls to these functions use positional arguments anyway, and we can either add a style check for this, or catch it in review.
lib/spack/spack/cmd/find.py
Outdated
@@ -96,6 +97,14 @@ def setup_parser(subparser): | |||
action='store_true', | |||
help='show fully qualified package names') | |||
|
|||
subparser.add_argument( | |||
'--start-date', | |||
help='earliest date of installation [YYYY-MM-DD HH:MM:SS]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the format here is too cumbersome. Most people don't care about the seconds. It should really let you specify something telescoping like YYYY
, YYYY-MM
, YYYY-MM-DD
, etc. Or at least just YYYY-MM-DD
. If you want to get fancy it would be really cool to invert llnl.util.lang.pretty_date
and allow, e.g., --since yesterday
or --since "a month ago"
, but I don't think that's necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feed9bd
to
fb0592c
Compare
lib/spack/llnl/util/lang.py
Outdated
pattern[n_days_regexp] = _n_days_ago | ||
|
||
# {n} weeks ago | ||
n_weeks_regexp = re.compile('^(\d*) week[s]? ago') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably support "a week ago" if you're at it anyway. Also what if I write --since 'weeks ago'
? Should'nt that trigger a ValueError
at the end? \d*
matches 0 or more digits. I think you want ([\d+|a])
lib/spack/llnl/util/lang.py
Outdated
pattern[re.compile('^yesterday$')] = callback | ||
|
||
# {n} days ago | ||
n_days_regexp = re.compile('^(\d*) day[s]? ago') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing here as for weeks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could simplify the parsing like this:
regex = re.compile(r'(a|\d+)\s*(year|month|week|day|hour|minute|second)s?\s*ago')
match = regex.search(string)
if match:
how_many, time_period = match.groups()
# how many is 'a' or a number
# time_period is one of year/month/day/etc.
lib/spack/llnl/util/lang.py
Outdated
@@ -442,6 +442,59 @@ def pretty_date(time, now=None): | |||
return str(diff) + " years ago" | |||
|
|||
|
|||
def str2date(date_str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about pretty_str_to_date
or something to indicate it's reciprocal? Also needs a test case (can probably reuse the cases for pretty_date
)
@@ -347,7 +372,7 @@ def check(cond, msg): | |||
def invalid_record(hash_key, error): | |||
msg = ("Invalid record in Spack database: " | |||
"hash: %s, cause: %s: %s") | |||
msg %= (hash_key, type(e).__name__, str(e)) | |||
msg %= (hash_key, type(error).__name__, str(error)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
lib/spack/spack/database.py
Outdated
@@ -579,23 +626,33 @@ def _read(self): | |||
self._write(None, None, None) | |||
self.reindex(spack.store.layout) | |||
|
|||
def _add(self, spec, directory_layout=None, explicit=False): | |||
def _add(self, spec, directory_layout=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this one be explicit=False
again too?
lib/spack/spack/database.py
Outdated
concrete_specs = self.query(query_spec, known, installed) | ||
concrete_specs = self.query( | ||
query_spec, known=known, installed=installed | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: can we put this on the prior line? I wish there was a way to make it more consistent via the flake8 check but I don't think there is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean like:
concrete_specs = self.query(
query_spec, known=known, installed=installed)
?
p.s. we should definitely check yapf once again and consider using it like go fmt
lib/spack/spack/database.py
Outdated
@@ -442,12 +467,18 @@ def _read_suppress_error(): | |||
tty.debug( | |||
'RECONSTRUCTING FROM SPEC.YAML: {0}'.format(spec)) | |||
explicit = True | |||
inst_time = _now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the right thing to do really to update old installations that don't have an install time with the time of the last reindex? Seems like that could lead to surprising behavior. What about the creation date of the package's install prefix? That way people could just spack reindex
and get reasonable install times, even for old stuff.
Information on the date and time of installation of a spec is recorded into the database. The information is retained on reindexing.
The DB can now be queried for specs that have been installed in a given time window. This query possibility is exposed to command line via two new options of the `find` command.
fb0592c
to
7360e02
Compare
@tgamblin done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alalazo: LGTM! It would be good if you could also add docs for this!
@tgamblin Will do. What could be an appropriate section? |
@alalazo: probably the existing bit on |
* Added installation date and time to the database Information on the date and time of installation of a spec is recorded into the database. The information is retained on reindexing. * Expose the possibility to query for installation date The DB can now be queried for specs that have been installed in a given time window. This query possibility is exposed to command line via two new options of the `find` command. * Extended docstring for Database._add * Use timestamps since the epoch instead of formatted date in the DB * Allow 'pretty date' formats from command line * Substituted kwargs with explicit arguments * Simplified regex for pretty date strings. Added unit tests.
Information on the date and time of installation of a spec is recorded into the database. The information is retained on reindexing. The DB can now be queried for specs that have been installed in a given time window. This query possibility is exposed to command line via two new options of the
find
command.Examples
With the following command:
$ spack install hdf5~mpi~cxx~fortran
you'll generate a database that contains date and time information. Now you can query from command line based on date and time of installation:
A practical use-case we have in mind at EPFL is to use this feature in a cron job that, based on the result of the query, updates the MOTD on clusters front-end and advertizes the software that has been installed recently (e.g. in the last two weeks).
@nazavode