Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record installation date and time in DB #7334

Merged

Conversation

alalazo
Copy link
Member

@alalazo alalazo commented Feb 23, 2018

Information on the date and time of installation of a spec is recorded into the database. The information is retained on reindexing. The DB can now be queried for specs that have been installed in a given time window. This query possibility is exposed to command line via two new options of the find command.

Examples

With the following command:

$ spack install  hdf5~mpi~cxx~fortran

you'll generate a database that contains date and time information. Now you can query from command line based on date and time of installation:

$ spack find
==> 2 installed packages.
-- linux-ubuntu14.04-x86_64 / gcc@4.8 ---------------------------
hdf5@1.10.1  zlib@1.2.11

$ spack find --start-date="2019"
==> 0 installed packages.

$ spack find --start-date="2 days ago" 
==> 1 installed packages.
-- linux-ubuntu14.04-x86_64 / gcc@4.8 ---------------------------
hdf5@1.10.1

$ spack find --end-date="2018-02-23"
==> 1 installed packages.
-- linux-ubuntu14.04-x86_64 / gcc@4.8 ---------------------------
zlib@1.2.11

A practical use-case we have in mind at EPFL is to use this feature in a cron job that, based on the result of the query, updates the MOTD on clusters front-end and advertizes the software that has been installed recently (e.g. in the last two weeks).

@nazavode

Copy link
Contributor

@healther healther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never ran into the situation where I wanted to do this but it seems like a useful feature. I find explicit argument lists easier to understand than opaque **kwargs, but no strong feelings there. But the Exception that is raised in case of a ill formatted input str should really state which argument is malformed

@@ -579,23 +626,33 @@ def _read(self):
self._write(None, None, None)
self.reindex(spack.store.layout)

def _add(self, spec, directory_layout=None, explicit=False):
def _add(self, spec, directory_layout=None, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not keep the arguments explicitly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below answer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is a bit different, how do I know why kwargs are accepted/expected? And documentation can will become outdated...
I agree with you that writing out arguments makes a call easier to read, but I also know that I always have to google the damn kwargs of any matplotlib call... The former could be enforced by convention (which will probably not hold in the long run either). So I see points for both sides -> no strong feelings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case _add is by convention a private (if you pass the term in python) method. Usually I regard those methods as implementation details and don't document them using the same care as I have for user facing APIs.

If you think it should be documented with the same level of details as query that won't be a problem - just let me know. In any case users are not supposed to be calling _add, and the function itself won't appear in the docs online.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that _ means private ;) and agreed private functions shouldn't need to be documented in detail. However that means there is no longer any information in the function signature or documentation which kwargs are accepted. So if I were to use _add I would have no idea which kwargs are expected or accepted.
I don't have a good solution, either you could:

  • document _add
  • revert back to explicit variables and expect users to call it by name

If I were to use this in half a year I'd at least like to have the knowledge of what the parameters are, so that I can grep for them in the source code

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended help here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this one be explicit=False again too?

@@ -766,45 +828,64 @@ def activated_extensions_for(self, extendee_spec, extensions_layout=None):
continue
# TODO: conditional way to do this instead of catching exceptions

def query(self, query_spec=any, known=any, installed=True, explicit=any):
"""Run a query on the database.
def query(self, query_spec=any, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question, why not leave the arguments explicitly? Yes it makes the signature bulkier, but it's much easier to understand the function if you don't have to constantly ask yourself what else is contained ink kwargs

Copy link
Member Author

@alalazo alalazo Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it forces users, in both python 2 and 3, to pass optional arguments by keyword (which makes it more readable at call site). If you look at this:

db.query(spec, True, False, False)

are you sure it's easier to understand than:

db.query(spec, known=True, installed=False, explicit=False)

? The longer signature is another reason, but in my opinion of secondary importance with respect to the above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree with you that the second one is easier to read, I was thinking about writing another db.query statement, but then the kwargs are documented, so it is fine, I'll probably find what I need.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually agree with @healther -- let's not change this everywhere just to force users to do this. I don't think most calls to these functions use positional arguments anyway, and we can either add a style check for this, or catch it in review.

start_date = kwargs.pop('start_date', datetime.datetime.min)
end_date = kwargs.pop('end_date', datetime.datetime.max)

# Raise an exception if we have still entries in kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to prevent that typos pass through kwargs.

for attribute in ('start_date', 'end_date'):
date = getattr(args, attribute)
if date:
q_args[attribute] = spack.database.str2datetime(date)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should add a catch if data is ill formatted that should raise an Exception telling which date was ill formed and how it should have been formatted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh? Is there something wrong with the current behavior?

$ spack find --start-date=456
==> Error: time data '456' does not match format '%Y-%m-%d %H:%M:%S'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm just still burned from the SyntaxError in package.py... At the moment it is only used on the command line, where the user should immediately see what happened, I worry about a future use in less user-obvious situations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on the comment above? I don't get the request there...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you have a syntax error (I'm not too sure anymore of which kind it has to be) in either one of the configuration files or any package.py in the repo, there were at least cases where spack wouldn't react to any spack install or spack spec like command except printing out something to the effect of SyntaxError in 'package.py', I'd have wanted there to be a feedback in the sense of SyntaxError in 'path-to-package.py'.
In the same spirit: I'd like to get feedback on which argument is broken. Not how the broken context looks like. If it's only used from the command line it probably won't matter much but I spend more time with finding out which package.py was broken than was good for my sanity :D

@alalazo alalazo force-pushed the features/record_installation_date_in_db branch from 526055c to 67b5ffc Compare March 6, 2018 10:26
@alalazo
Copy link
Member Author

alalazo commented Mar 6, 2018

@scheibelp @tgamblin Are you ok with merging this?

Copy link
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some stye and date format (for storage and input) requests.

@@ -75,6 +76,27 @@
# Types of dependencies tracked by the database
_tracked_deps = ('link', 'run')

#: Time format used in database entries
_time_format = '%Y-%m-%d %H:%M:%S'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Database entries should just use precise timestamps (since the epoch), and formats should be used for display only.

spec (Spec): spec tracked by the install record
path (str): path where the spec has been installed
installed (bool): whether or not the spec is currently installed
ref_count (int): number of specs that depend on this one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually agree with @healther here. Don't use **kwargs for everything; use explicit arguments. I realize that allows them to be positional, but we can enforce that at the callsite instead of in the function. We could even add a style check for it. I think the benefits of good documentation and method signatures outweigh the use of **kwargs, and I had started switching away from **kwargs where possible.

In google pydoc strings, the way to document an optional parameter is ref_count (int, optional): ...... see here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use **kwargs for everything; use explicit arguments.

I'll take this request as proper style in Spack, and will implement it. But if we are talking about idiomatic python, I think the subject is at least controversial.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree it’s controversial. I like it because it makes the prototypes more readable and easier to document. It’s also closer to the way you’d write it in Python 3, where you can get the best of both worlds and force kwargs. I’d argue that doing things more like 3 is the better way to go... maybe one day we can drop 2 support altogether... one day far away 😭.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for dropping py2 👼

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will probably be hpcpythonclock.org or craypythonclock.org or something long after pythonclock.org has counted down 😭

@@ -766,45 +828,64 @@ def activated_extensions_for(self, extendee_spec, extensions_layout=None):
continue
# TODO: conditional way to do this instead of catching exceptions

def query(self, query_spec=any, known=any, installed=True, explicit=any):
"""Run a query on the database.
def query(self, query_spec=any, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually agree with @healther -- let's not change this everywhere just to force users to do this. I don't think most calls to these functions use positional arguments anyway, and we can either add a style check for this, or catch it in review.

@@ -96,6 +97,14 @@ def setup_parser(subparser):
action='store_true',
help='show fully qualified package names')

subparser.add_argument(
'--start-date',
help='earliest date of installation [YYYY-MM-DD HH:MM:SS]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the format here is too cumbersome. Most people don't care about the seconds. It should really let you specify something telescoping like YYYY, YYYY-MM, YYYY-MM-DD, etc. Or at least just YYYY-MM-DD. If you want to get fancy it would be really cool to invert llnl.util.lang.pretty_date and allow, e.g., --since yesterday or --since "a month ago", but I don't think that's necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented a minimal parser for this one. If in the future we want a better solution, we might consider this (thanks to @nazavode for the pointer).

@alalazo alalazo force-pushed the features/record_installation_date_in_db branch 2 times, most recently from feed9bd to fb0592c Compare March 17, 2018 23:19
pattern[n_days_regexp] = _n_days_ago

# {n} weeks ago
n_weeks_regexp = re.compile('^(\d*) week[s]? ago')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably support "a week ago" if you're at it anyway. Also what if I write --since 'weeks ago'? Should'nt that trigger a ValueError at the end? \d* matches 0 or more digits. I think you want ([\d+|a])

pattern[re.compile('^yesterday$')] = callback

# {n} days ago
n_days_regexp = re.compile('^(\d*) day[s]? ago')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here as for weeks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could simplify the parsing like this:

regex = re.compile(r'(a|\d+)\s*(year|month|week|day|hour|minute|second)s?\s*ago')
match = regex.search(string)
if match:
    how_many, time_period = match.groups()
    # how many is 'a' or a number
    # time_period is one of year/month/day/etc.

@@ -442,6 +442,59 @@ def pretty_date(time, now=None):
return str(diff) + " years ago"


def str2date(date_str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about pretty_str_to_date or something to indicate it's reciprocal? Also needs a test case (can probably reuse the cases for pretty_date)

@@ -347,7 +372,7 @@ def check(cond, msg):
def invalid_record(hash_key, error):
msg = ("Invalid record in Spack database: "
"hash: %s, cause: %s: %s")
msg %= (hash_key, type(e).__name__, str(e))
msg %= (hash_key, type(error).__name__, str(error))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@@ -579,23 +626,33 @@ def _read(self):
self._write(None, None, None)
self.reindex(spack.store.layout)

def _add(self, spec, directory_layout=None, explicit=False):
def _add(self, spec, directory_layout=None, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this one be explicit=False again too?

concrete_specs = self.query(query_spec, known, installed)
concrete_specs = self.query(
query_spec, known=known, installed=installed
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: can we put this on the prior line? I wish there was a way to make it more consistent via the flake8 check but I don't think there is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like:

concrete_specs = self.query(
    query_spec, known=known, installed=installed)

?

p.s. we should definitely check yapf once again and consider using it like go fmt

@@ -442,12 +467,18 @@ def _read_suppress_error():
tty.debug(
'RECONSTRUCTING FROM SPEC.YAML: {0}'.format(spec))
explicit = True
inst_time = _now()
Copy link
Member

@tgamblin tgamblin Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the right thing to do really to update old installations that don't have an install time with the time of the last reindex? Seems like that could lead to surprising behavior. What about the creation date of the package's install prefix? That way people could just spack reindex and get reasonable install times, even for old stuff.

Information on the date and time of installation of a spec is recorded
into the database. The information is retained on reindexing.
The DB can now be queried for specs that have been installed in a given
time window. This query possibility is exposed to command line via two
new options of the `find` command.
@alalazo alalazo force-pushed the features/record_installation_date_in_db branch from fb0592c to 7360e02 Compare March 20, 2018 22:09
@alalazo
Copy link
Member Author

alalazo commented Mar 20, 2018

@tgamblin done

Copy link
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alalazo: LGTM! It would be good if you could also add docs for this!

@tgamblin tgamblin merged commit 5655895 into spack:develop Mar 22, 2018
@alalazo alalazo deleted the features/record_installation_date_in_db branch March 22, 2018 18:09
@alalazo
Copy link
Member Author

alalazo commented Mar 22, 2018

@tgamblin Will do. What could be an appropriate section?

@tgamblin
Copy link
Member

@alalazo: probably the existing bit on spack find in the basics. Also, the command-line help could be updated so that --start-date and --end-date help better explain valid date formats (e.g. "yesterday", ability to do just the year, etc.) I'd use something like YYYY[-MM[-DD]] | yesterday | N (days|weeks|months|hours|minutes) ago | etc.

ch741 pushed a commit to ch741/spack that referenced this pull request Apr 20, 2018
* Added installation date and time to the database

Information on the date and time of installation of a spec is recorded
into the database. The information is retained on reindexing.

* Expose the possibility to query for installation date

The DB can now be queried for specs that have been installed in a given
time window. This query possibility is exposed to command line via two
new options of the `find` command.

* Extended docstring for Database._add

* Use timestamps since the epoch instead of formatted date in the DB

* Allow 'pretty date' formats from command line

* Substituted kwargs with explicit arguments

* Simplified regex for pretty date strings. Added unit tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants