
MongoDB - class and tests #8

Merged 1 commit on Nov 30, 2017

Conversation

dendisuhubdy
Contributor

No description provided.

from abstractdb import AbstractDB

try:
from StringIO import StringIO
Member

In which case is StringIO not available?

Contributor Author

In Python 3.0 StringIO and cStringIO modules are gone. See https://stackoverflow.com/questions/11914472/stringio-in-python3
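The usual compatibility shim looks like this; a minimal sketch, assuming only the class itself is needed (on Python 3 the class lives in io):

```python
try:
    # Python 2: prefer the C-accelerated version when present
    from cStringIO import StringIO
except ImportError:
    try:
        from StringIO import StringIO  # pure-Python Python 2 fallback
    except ImportError:
        from io import StringIO  # Python 3: StringIO moved into io

buf = StringIO()
buf.write(u"hello")
assert buf.getvalue() == u"hello"
```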

Member

Ah! You are indeed speaking to someone ignorant of Python 3.

Contributor Author

Gotcha 👍

Member

@tsirif tsirif Nov 17, 2017

self.password = str(password)

# we might have a different port for the database port
# so here we do a check if the port is there
Member

Is that a TODO comment?

Contributor Author

I left it over from an if block I had before; I should've removed it. Noted, thanks.

# according to MongoDB documentation
# https://docs.mongodb.com/getting-started/python/client/

self.uri = 'mongodb://' + self.dbhost + ':' + self.dbport
Member

Building the DB URL (you typed uri, by the way; was it intentional?) can get complicated if we support many options. That would deserve its own method. It could be worth making it a dynamic @property.

Contributor Author

I used the full URI until our discussion last week with Fred that for current localhost testing purposes we shouldn't put a default username and password because it would get passed on. Anyway, I'll comment out this URI and use the one following this line instead.

Member

I think that the @property is a good idea. I am not sure how we should ask for the password. There are multiple ways of authentication here: https://api.mongodb.com/python/current/examples/authentication.html

Also, there is this useful standard-library module in 2.7 and >=3.5: https://docs.python.org/3/library/getpass.html
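A sketch of how getpass could fit in; METAOPT_DB_PASSWORD is a hypothetical variable name for a non-interactive fallback, not anything from this PR:

```python
import getpass
import os

def prompt_password(username):
    # METAOPT_DB_PASSWORD is a hypothetical name for non-interactive runs
    # (CI, scripts); otherwise fall back to a prompt that does not echo
    # the password to the terminal.
    env = os.environ.get("METAOPT_DB_PASSWORD")
    if env is not None:
        return env
    return getpass.getpass("Password for {0}: ".format(username))
```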

conn = MongoClient(self.uri)
self.conn = conn
except pymongo.errors.ConnectionFailure:
raise Exception("Could not connect to server")
Member

Could ConnectionFailure contain important information that we would like to pass on?

Contributor Author

I don't see why we would need it. If we catch pymongo.errors.ConnectionFailure as e and pass on (like in C), the error value is null, so outside of the try/except block we would always return a valid connection object.

Member

By passing on, I meant passing the error information in the raised error so that it finds its way in the stack trace printed to the user.

Contributor Author

Hmmmn conn itself contains enough information to let the user know if things aren't right

Member

It is not a good idea to raise or catch pure Exception objects; there is no way to differentiate between cases. I agree with Dendi's thought to reraise an exception here, but it should be an Exception of our own w.r.t. common database problems, so that we can wrap in the future other libraries as well.

However, I think we should still keep the traceback, Pythonic stuff. raise wrapped_e from e clauses are the answer here, but unfortunately, however beautiful (I find) they are, they are Py3-only. But there is this:
http://python-future.org/compatible_idioms.html#raising-exceptions (from future.utils)
https://pythonhosted.org/six/#six.raise_from (from six, although they do not ship the correct implementation of this function; there's a current PR about this)

For truthful reference: https://www.python.org/dev/peps/pep-3134/#rationale
.. seemore:: difference between implicit and explicit exception chaining :P
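A minimal sketch of the wrapping idea, using a project-specific DatabaseError and Python 3's explicit chaining; connect and make_client are illustrative names, and ConnectionError stands in for pymongo.errors.ConnectionFailure:

```python
class DatabaseError(Exception):
    """Hypothetical project-level wrapper for backend-agnostic DB failures."""

def connect(make_client):
    try:
        return make_client()
    except ConnectionError as exc:  # stand-in for pymongo.errors.ConnectionFailure
        # Python 3 explicit chaining; on Python 2, six.raise_from or
        # future.utils.raise_from gives an equivalent effect.
        raise DatabaseError("Could not connect to server") from exc
```

With chaining, the original ConnectionFailure still shows up in the printed traceback as the `__cause__` of the DatabaseError, so no information is lost.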

Member

@tsirif I referred to this example in another comment for a way to reraise an exception. I agree we should not reraise it as a pure Exception. But again, I'm not sure we would need to reraise it. Isn't the ConnectionFailure explicit enough?

Contributor Author

Either way. I added it to the __init__.py in utils.py and redid all of the exception blocks.

def close_connection(self):
try:
close = self.conn.close()
except pymongo.errors.ConnectionFailure:
Member

Idem

except pymongo.errors.ConnectionFailure:
raise Exception("Could determine unlock due to connection")

def insert(self, experiment_name, trial_field, dictionary, field_filters=None):
Member

trial_field is used as a collection name. It's confusing. Is trial_field supposed to be a string inside the collection experiment_name pointing to a dictionary of trials ({id: trial})? Shouldn't that trial_field be a CONSTANT variable then?

Nevertheless, I'm not sure DB classes should have experiment and trial semantic tied to them. From Christos' UMLs, it seems to me that Experiment and Trial should have their own interface, with each using AbstractDB internally (which can be MongoDB or PostgreSQL, etc). In some way, workers and producers do not need to know they are communicating with a DB, anything could be hidden behind Experiment and Trial objects; Database, Sockets, whatever. @tsirif Any thoughts on that?

# 'hyperparam_type': 'int',
# 'hyperparam_value': '200'}

# check field_filters value
Member

Idem

if field_filters is None:
field_filters = {}

dbcollection = self.db[experiment_name][trial_field]
Member

pep8

return conn

def mongodb_insert_test(self):
result = self.db.insert(self.experiment, self.trial, self.query_test, field_filters=None)
Member

ok I see here what trial_field was intended to be. I'd prefer to combine them inside a single dict inside experiments otherwise they kind of pollute the experiment row.

Member

I am not sure that I understand what you mean. Could you give 2 cases?

Member

{ name: dummy,
  # some other properties,
  trial_1: {
      # properties of trial_1
   },
  # ...
  trial_n: {
      # properties of trial_n
   }
  # some other properties,
}

vs

{ name: dummy,
  # some other properties,
  trials: [
      {
          id: 1, 
          # properties of trial #1 
      },
      # ...
      trial_n: {
          # properties of trial_n
      }
   },
  # some other properties,
}

Trials could be either a list or a dictionary


from metaopt.io.database.mongodb import MongoDB

class MongoDB_test():
Member

Why not using unittest.TestCase and tests directly inside the class rather than gluing them together in a function later?

Contributor Author

Gotcha I adjusted this per assert in the functions inside the class

Member

No need for subclassing in general: https://docs.pytest.org/en/latest/getting-started.html

Member

No but the assert methods coming with unittest.TestCase are quite useful. 🙂

# try to initialize connection
# setting this as a function allows to make sure reconnection
# is properly done in the case of an unreliable network
self.uri = property(self.get_uri, self.set_uri, \
Member

Why not using this syntax?

@property
def uri(self):
    # ...

@uri.setter
def uri(self, value):
    # ...

uri = "mongodb://%s:%s@%s:%s" % (quote_plus(self._username), \
quote_plus(self._password), \
self._dbhost, self._dbport)
self._uri = uri
Member

uri passed to the setter is not used. Do we support editing dbhost, dbport, etc., or editing uri? Or both? I think we should support both, but for simplicity I would only support setting dbhost, dbport and the others for now. If we allow setting the uri, then we need to parse it and set dbhost and the others accordingly. We can keep that for later.
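A sketch of the derived, read-only property being discussed; MongoDBConfig and its attribute names are illustrative, not the merged code:

```python
from urllib.parse import quote_plus  # Python 2: from urllib import quote_plus

class MongoDBConfig(object):
    """Illustrative config object; uri is derived and read-only."""

    def __init__(self, dbhost, dbport, username=None, password=None):
        self.dbhost = dbhost
        self.dbport = dbport
        self.username = username
        self.password = password

    @property
    def uri(self):
        # Rebuilt on every access, so editing dbhost/dbport etc. is enough;
        # no uri setter (and no uri parsing) is needed for now.
        if self.username:
            return "mongodb://%s:%s@%s:%s" % (
                quote_plus(self.username), quote_plus(self.password or ""),
                self.dbhost, self.dbport)
        return "mongodb://%s:%s" % (self.dbhost, self.dbport)
```

Setting dbhost or dbport on the instance is immediately reflected in uri, which is the simpler of the two directions.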

# return connection string for checking purposes
return conn

def check_connection(self):
if self.db:
#print("Connection Successful!")
print("Connection is successful!")
Member

logger.info please :)

return True
else:
#print("Please check your connection")
print("Please check your connection")
Member

idem

return False

def close_connection(self):
try:
close = self.conn.close()
except pymongo.errors.ConnectionFailure:
raise Exception("Could not close socket connection")
except pymongo.errors.ConnectionFailure as e:
Member

I was not saying we should do this. I was just wondering what kind of information is passed through ConnectionFailure. Is there any relevant information that we should not lose? Additionally, I believe raising Exception like this might break the stack trace. Look at this for instance. For simplicity, could the ConnectionFailure error message be good enough to just let it pass with no try-except clause?

@bouthilx
Member

bouthilx commented Nov 15, 2017

@dendisuhubdy It would be preferable to use informative messages for the commits. Taken out of context, "adding suggested changes from code review" doesn't mean anything. Even then, the first thought I had reading the commit's name was "What changes did I ask for exactly? I don't remember". 🙂

A good reading about commit messages: http://who-t.blogspot.ca/2009/12/on-commit-messages.html

Member

@tsirif tsirif left a comment

Gogogo let's close this tomorrow!




Member

@bouthilx bouthilx left a comment

We will need to address the design issue of separating AbstractDB and its children from Experiment and Trial

if __debug__:
self.logger = logging.getLogger('database')
self.fh = logging.FileHandler('database.log')
self.fh.setLevel(logging.DEBUG)
Member

I understand why we would want a detailed log for the database but do we really want to have DEBUG by default? I guess that's like system's log that are always very verbose by default.

return self._uri

@uri.setter
def uri(self):
Member

That's not a correct setter interface. It should be

@uri.setter
def uri(self, value):

But that wouldn't make sense since we set it internally anyway. Maybe we just shouldn't have a setter.

Contributor Author

I'm kind of confused: if we do it this way, what value should we pass, the value of the URI itself?

Member

That's the thing. It would not make sense and that's why I'm saying that maybe there should not be any setter method for uri.

Contributor Author

So there would only be a @property? I'm not that familiar with Python yet, so does it work without a setter?

In the other case would a self.uri as the previous commits be preferable?

Member

It works without a setter. Using property can be more flexible if, for instance, the property is built using other arguments which can change over time. Since uri is built using other arguments, I think we should keep @property.
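A small illustration of the point: a @property with no setter is readable but raises AttributeError on assignment (Conn is a made-up class):

```python
class Conn(object):
    def __init__(self, host, port):
        self.host = host
        self.port = port

    @property
    def uri(self):  # read-only: no @uri.setter is defined
        return "mongodb://%s:%s" % (self.host, self.port)

c = Conn("localhost", 27017)
assert c.uri == "mongodb://localhost:27017"
try:
    c.uri = "mongodb://elsewhere:1234"  # no setter -> AttributeError
except AttributeError:
    pass
```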


return True
else:
#print("Please check your connection")
self.logger.info("Please check your connection")
Member

It seems weird to me to print logging info inside the check_connection like this. If False, I guess the code calling the function might raise an error or something. Why should the check_connection print logging info then? Maybe that would be better to set them as debug. I don't know. @tsirif What do you think?

return False

def close_connection(self):
try:
close = self.conn.close()
except pymongo.errors.ConnectionFailure:
raise Exception("Could not close socket connection")
self.logger.info("Could not close socket connection")
Member

This will never get executed

return close

def is_locked(self):
try:
is_locked = self.conn.is_locked()
except pymongo.errors.ConnectionFailure:
raise Exception("Could not determine if connection is locked")
self.logger.info("Could not determine if connection is locked")
Member

This will also never get executed

# 'hyperparam_type': 'int',
# 'hyperparam_value': '200'
# }
# }
Member

That would be worth turning into a docstring

Contributor Author

Cool

Member

@bouthilx bouthilx left a comment

DB classes still have experiment and trial semantic tied to them. Methods insert, update, load and remove should be more general

formatter = logging.Formatter('%(asctime)s - %(name)s -'
'%(levelname)s - %(message)s')
self.fh.setFormatter(formatter)
self.logger.addHandler(self.fh)
Member

Why set the filehandler as an attribute of the AbstractDB?

Contributor Author

Because it will be used if we would want to have a PostgreSQL or MySQL class down the line

Member

They will change the file handler?

Contributor Author

@dendisuhubdy dendisuhubdy Nov 20, 2017

Shouldn't one prefer just one type of database? Then we would need just one database.log file rather than a mongodb-database.log, and thus one database file handler. What do you think? :)

Member

Yes, but then I still don't see why set self.fh. It seems to me that they can all use self.logger just fine without messing with self.fh.

self.conn = conn
except pymongo.errors.ConnectionFailure as e:
raise_from(DatabaseError("Could not connect to server"), e)
self.logger.debug("Could not connect to server")
Member

This will never be executed

Contributor Author

I don't understand why this would never be executed.

Member

Lines under raise SomeError() never get executed. Once reraising the error, the process will move up to the code calling this method.

Contributor Author

Ok so let's settle for raise Exception then?

Member

He means that commands should be in reverse order 😛

Member

First logger.debug and then raise_from
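That ordering could look like the following sketch, with a hypothetical close_connection and OSError standing in for pymongo.errors.ConnectionFailure:

```python
import logging

logger = logging.getLogger("database")

class DatabaseError(Exception):
    """Hypothetical project-level wrapper for backend failures."""

def close_connection(conn):
    try:
        conn.close()
    except OSError as exc:  # stand-in for pymongo.errors.ConnectionFailure
        # Log first, then reraise: any statement placed after `raise`
        # would be unreachable.
        logger.debug("Could not close socket connection")
        raise DatabaseError("Could not close socket connection.") from exc
```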

Contributor Author

Oh lol 💃

close = self.conn.close()
except pymongo.errors.ConnectionFailure as e:
raise_from(DatabaseError("Could not close socket connection."), e)
self.logger.debug("Could not close socket connection")
Member

idem

is_locked = self.conn.is_locked()
except pymongo.errors.ConnectionFailure as e:
raise_from(DatabaseError("Could not determine if connection is locked."), e)
self.logger.debug("Could not determine if connection is locked.")
Member

idem

unlock = self.conn.unlock()
except pymongo.errors.ConnectionFailure as e:
raise_from(DatabaseError("Could not determine unlock due to connection."), e)
self.logger.debug("Could not determine unlock due to connection")
Member

idem

result = db.remove(experiment, trial, field_filters)
assert result is None

def mongodb_tests():
Member

I don't understand why you moved back to functions like this.

Contributor Author

I thought we agreed on working with pytest; that's why we moved to these functions. Now we've decided on unittest.TestCase, so are we moving back to the previous class?

Member

We can use pytest without unittest if Christos and you prefer this. From what I understand, Christos' comment was not against nosetest but simply that pytest supports class tests without inheriting from special classes like unittest.TestCase. You can avoid writing a test function to combine them all by structuring the tests in a class.
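A sketch of that structure; TestMongoDB and the dict-backed store are stand-ins for the real MongoDB-backed tests:

```python
class TestMongoDB(object):
    """pytest collects classes named Test* and methods named test_* even
    without unittest.TestCase; the dict here is a stand-in for a real DB."""

    def setup_method(self, method):
        # pytest calls this before each test method: a fresh fake database.
        self.db = {}

    def test_insert(self):
        self.db["dummy_experiment"] = {"trials": [{"id": 1}]}
        assert self.db["dummy_experiment"]["trials"][0]["id"] == 1

    def test_remove(self):
        self.db["dummy_experiment"] = {}
        del self.db["dummy_experiment"]
        assert "dummy_experiment" not in self.db
```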




@@ -32,19 +32,19 @@ def close_connection(self):
pass

@abstractmethod
-    def insert(self, experiment_name, trial_field, dictionary, field_filters):
+    def insert(self, collection_name, dictionary, field_filters):
Member

I don't think field_filters is necessary here

pass

@abstractmethod
-    def update(self, experiment_name, trial_field, update_dictionary, field_filters):
+    def update(self, collection_name, update_dictionary, field_filters):
Member

I think (self, collection_name, query, update_dictionary) would be more clear.

pass

@abstractmethod
-    def load(self, experiment_name, experiment_field, field_filters):
+    def load(self, collection_name, field_filters):
Member

I think (self, collection_name, query, projection) or (self, collection_name, query, selection) would be more clear. projection is the name used by mongodb for what is called a selection in SQL DBs. I don't have a clear preference. Sticking to mongodb's nomenclature could make the interface more coherent, but at the same time I always found projection to be less obvious than selection.

pass

@abstractmethod
-    def remove(self, experiment_name, trial_field, field_filters):
+    def remove(self, collection_name, field_filters):
Member

I think (self, collection_name, query) would be more clear.
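Putting the suggested signatures together, the abstract interface might read like this; a sketch of the reviewers' proposals, not the merged code:

```python
from abc import ABCMeta, abstractmethod

class AbstractDB(metaclass=ABCMeta):  # Py3 syntax; on Py2: six.with_metaclass
    """Generalized interface from the review: no experiment/trial semantics,
    only collections, queries, and documents."""

    @abstractmethod
    def insert(self, collection_name, document_dictionaries):
        """Insert documents; a plain insert needs no query."""

    @abstractmethod
    def update(self, collection_name, query, update_dictionary):
        """Apply update_dictionary to documents matching query."""

    @abstractmethod
    def load(self, collection_name, query, projection=None):
        """Read documents matching query; projection selects fields."""

    @abstractmethod
    def remove(self, collection_name, query):
        """Delete documents matching query."""
```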


-    if len(dbdocs) == 0:
+    if len(dbdocs) is 0:
Member

Why switch from == to is?

Contributor Author

Technically, is works on object identity and == works on value comparison. I guess in this case it's a comparison between the lengths; I was just trying to make the code look more functional.

Member

I know the difference :) and I would agree if it was meant for is False but here it's really about integers.
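A short illustration of why == is the right choice for integers (CPython's small-int cache makes `is` work only by accident):

```python
# `==` compares values; `is` compares object identity. CPython caches small
# ints, so `len(x) is 0` works only by accident, and the same pattern can
# break for values outside the cache. Use `==` for numeric comparison.
dbdocs = []
assert len(dbdocs) == 0        # the reliable emptiness check

big = int("1000")              # constructed at runtime, not a cached literal
assert big == 1000             # value equality: always correct
# `big is 1000` is typically False in CPython (and `is` with a literal
# raises a SyntaxWarning on Python 3.8+), so never rely on it.
```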

def insert(self, collection_name, query, dictionary):

dbcollection = self.db[collection_name]
result = dbcollection.insert(query, dictionary)
Member

AFAIK there is no such thing as insert(query, dict) in pymongo. It should be insert_many(query, objects) where objects is a list of dictionaries. Also I can't see why we would want to query something when we do an insert. It's not an update.

Contributor Author

insert does exist; see "What's the difference between insert(), insertOne() and insertMany() methods on MongoDB". I've double-checked it with a local execution against MongoDB on my computer too.

Contributor Author

You're right, though, that it's deprecated; I'll make some changes. objects does sound ambiguous, however; I propose dictionaries?

Member

Oh! I thought it did not exist, looking at the doc here. Still, insert does not take a query as first argument. I don't understand how your tests can pass 🤔

I agree objects is not very clear but dictionaries only tells about the container not the content. What about document_dictionaries?

Contributor Author

document_dictionaries sounds better 👍
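A sketch of the agreed-upon shape, with a fake collection standing in for pymongo's so it runs without a live server; in real pymongo the call would be self.db[collection_name].insert_many(document_dictionaries):

```python
class FakeCollection(object):
    """Test double for pymongo.collection.Collection (illustrative only)."""

    def __init__(self):
        self.documents = []

    def insert_many(self, document_dictionaries):
        # Mirrors pymongo's non-deprecated bulk insert: it takes a list of
        # dicts and no query argument, since an insert is not an update.
        self.documents.extend(document_dictionaries)

def insert(db, collection_name, document_dictionaries):
    return db[collection_name].insert_many(document_dictionaries)
```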

@tsirif
Member

tsirif commented Nov 21, 2017

Public Interface

  • Read data by attribute name from db, experiment as "value"/"list of values": [collection_name, attr_name, **filter] -> data
  1. From collection, with name collection_name, read attribute, with name attr_name and return a "value" (number, string or dict) or a "list of values", corresponding to that attribute.
  2. If attribute, with name attr_name, is known to correspond to a "list of values", then **filter can contain key=(how, value) or key=value entries:
    • key := attribute of an object in that list
    • how := way of comparing; one of ['=', '>=', '<=', '>', '<'] ?
    • value := reference "value"
    • A key=value entry assumes how == '='.
  3. A certain experiment can be accessed with attr_name == exp_name from collection 'experiments'. A certain attribute of an experiment can be accessed by attr_name == '<exp_name>.<attr_name>'.
  • Write data by attribute name to db, experiment as "value"/"list of values": [collection_name, attr_name, data, **filter] -> data
  1. len(filter) != 0 assumes an update at a "list of values" attribute, with name attr_name. data is a "value".
  2. len(filter) == 0 assumes:
    • If attribute, with attr_name, is known to correspond to a "list of values", then insert (many) data in that list.
    • Else, (over)write data of that attribute.
  • Provide context manager interface for a specific experiment, with name exp_name, in a specific collection, with name collection_name.

@tsirif
Member

tsirif commented Nov 21, 2017

@dendisuhubdy To be able not to break CI from now on, it would be better if, instead of merging master back, you would rebase on top of master (after #7 is merged ^_^)

@bouthilx
Member

@tsirif I don't understand why we would need 2 read/write interfaces for the DB rather than a single one. For now I would only rely on MongoDB's methods for doing complex queries and updates. All the information for this can be contained in the dictionaries passed to the DB methods. We can later port those dictionaries to SQL statements.

@tsirif
Member

tsirif commented Nov 22, 2017

Initially I thought it would be convenient to have two different sets, one for experiment entries and one for an experiment's attributes. But I see your point, you are right. I am retracting that.

def test_bad_connection():
    """Raise when connection cannot be achieved."""
    with pytest.raises(DatabaseError) as exc_info:
        MongoDB(dbhost='asdfada', username='uasdfaf', password='paasdfss')
Member

How do you test this? You don't have a database right?

Contributor Author

So for the wrong host we would assert that the connection will definitely fail.

Member

Yes I was confused because I didn't know we had a mongo database setup for the tests. I realized when looking at the CI setup files.

from metaopt.io.database.mongodb import MongoDB


@pytest.fixture(scope='module')
Member

Wow, I didn't know about pytest fixtures, that's nice!

.pylintrc Outdated
@@ -51,7 +51,7 @@ confidence=
# --enable=similarities". If you want to run only the classes checker, but have
# no Warning level messages displayed, use"--disable=all --enable=classes
# --disable=W"
disable=fixme,F0401,intern-builtin,nonzero-method,parameter-unpacking,backtick,raw_input-builtin,dict-view-method,filter-builtin-not-iterating,long-builtin,unichr-builtin,input-builtin,unicode-builtin,file-builtin,map-builtin-not-iterating,delslice-method,apply-builtin,cmp-method,setslice-method,coerce-method,long-suffix,raising-string,import-star-module-level,buffer-builtin,reload-builtin,unpacking-in-except,print-statement,hex-method,old-octal-literal,metaclass-assignment,dict-iter-method,range-builtin-not-iterating,using-cmp-argument,indexing-exception,no-absolute-import,coerce-builtin,getslice-method,suppressed-message,execfile-builtin,round-builtin,useless-suppression,reduce-builtin,old-raise-syntax,zip-builtin-not-iterating,cmp-builtin,xrange-builtin,standarderror-builtin,old-division,oct-method,next-method-called,old-ne-operator,basestring-builtin
disable=redefined-outer-name,fixme,F0401,intern-builtin,nonzero-method,parameter-unpacking,backtick,raw_input-builtin,dict-view-method,filter-builtin-not-iterating,long-builtin,unichr-builtin,input-builtin,unicode-builtin,file-builtin,map-builtin-not-iterating,delslice-method,apply-builtin,cmp-method,setslice-method,coerce-method,long-suffix,raising-string,import-star-module-level,buffer-builtin,reload-builtin,unpacking-in-except,print-statement,hex-method,old-octal-literal,metaclass-assignment,dict-iter-method,range-builtin-not-iterating,using-cmp-argument,indexing-exception,no-absolute-import,coerce-builtin,getslice-method,suppressed-message,execfile-builtin,round-builtin,useless-suppression,reduce-builtin,old-raise-syntax,zip-builtin-not-iterating,cmp-builtin,xrange-builtin,standarderror-builtin,old-division,oct-method,next-method-called,old-ne-operator,basestring-builtin
Member

Why do we disable so many checks?

Member

Most of them are default exclusions; I will try to introduce a small number of them at a time.

@@ -29,7 +29,7 @@ install:
- pip install tox
before_script:
- sleep 15
- mongo metaopt --eval 'db.createUser({user:"user",pwd:"pass",roles:["readWrite"]});'
- mongo metaopt_test --eval 'db.createUser({user:"user",pwd:"pass",roles:["readWrite"]});'
Member

Ah! Here is the test database. Fine.

def __init__(self, dbhost='localhost', dbname='metaopt',
             dbport=27017, username=None, password=None):
    super(MongoDB, self).__init__(dbhost, dbname, dbport, username, password)
def __init__(self, uri):
Member

It seems my comment on slack wasn't clear. You did the exact opposite of what I proposed. 😆

I believe the simplest functional interface for now would be __init__(uri, with, all, the, sugar, already, there, being, optional), but inside __init__ only self._uri = uri would be set. Then, the uri getter would be as you wrote. However, there would be no setter for now, as none of the extra sugar is stored in the database yet.

I know it's tempting to implement all of it already. It's what, just 2 or 3 more functions and a couple of attributes, after all? But then we need to test all use cases, make sure passing a URI works just as well as passing attributes, and think about what to do if the user passes a URI plus other arguments. How do we handle the setter for uri? How do we handle the setters for the other arguments? It's a lot more work than it looks like, enough to make us miss Friday's milestone for the DB. Furthermore, we would have to maintain this code while implementing the rest of the minimum viable product, hence spending even more time maintaining extra features.

Contributor Author

I don't really get it. If you mean we should use the function parser, we need to pass a uri into it. Before, it was __init__(self, dbhost, dbname, dbport, username, password), which you said we shouldn't do; we should just use the parser instead. If that gets parsed again inside the MongoClient, we should just pass in the uri, which is exactly what this looks like.

I don't see how this is incorrect. You don't need to pass both the uri and all of dbhost, dbname, dbpassword, and dbport through __init__ all over again; it's inefficient.

Member

I see why you are confused, I'm sorry I didn't detail enough my thoughts.

I believe we should eventually support both

db = Database(uri)

and

db = Database(dbhost, dbname, dbport, username, password, **options)

Just like pymongo's MongoClient does. But for now only one of the two would be enough for the MVP. I believe Database(uri) would be the simplest one to implement first. You can keep the arguments in the init as they are optional, but they wouldn't be used for now.

We could use the same mechanism than MongoClient:

class pymongo.mongo_client.MongoClient(host='localhost', port=27017, ...)
...
The host parameter can be a full mongodb URI, in addition to a simple hostname
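A hedged sketch of how such a MongoClient-style host parameter could be handled on our side, assuming we only need to tell a full mongodb:// URI apart from a plain hostname (resolve_host is a hypothetical helper, not part of the PR):

```python
def resolve_host(host='localhost', port=27017):
    """Build a connection URI whether `host` is already a URI or a hostname."""
    if host.startswith('mongodb://'):
        # A full URI may embed its own port and credentials; use it as-is.
        return host
    return 'mongodb://%s:%d' % (host, port)
```

This keeps the first positional argument doing double duty, the way pymongo's MongoClient describes it.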

raise_from(DatabaseError("Could not determine unlock due to connection."), e)
return unlock

def insert(self, collection_name, query, dictionaries):
Member

I believe previous discussion still holds: #8 (comment)

@@ -86,10 +86,10 @@ def unlock(self):
raise_from(DatabaseError("Could not determine unlock due to connection."), e)
return unlock

def insert(self, collection_name, query, dictionaries):
def insert(self, collection_name, query, document_dictionaries):
Member

What about the query? Based on the doc, it seems it should be insert_many(document_dictionaries). I don't understand what a query would be used for during an insertion.

# see http://api.mongodb.com/python/current/api/pymongo/collection.html
# for more details

result = dbcollection.update_one(query, update_dictionary)
Member

I guess we should support update_many here.

@dendisuhubdy
Contributor Author

@tsirif @bouthilx are we going to implement #8 (comment), or is this thin wrapper enough?

@bouthilx
Member

@dendisuhubdy I believe the wrapper is enough to handle all of this. We can write _load, _update and _save methods in Experiment and Trial if we want to spare ourselves the burden of always specifying the collection_name and id. This way we have a clear separation between databases and meta-optimization-specific classes.

@tsirif What do you think?

@@ -50,6 +50,7 @@ def initiate_connection(self):
    try:
        conn = MongoClient(self.uri)
        self.conn = conn
        self.conn.admin.command('ismaster')
Member

That's a good example of why detailed commit messages are necessary. I believe that would be a good time to read and apply what I suggested in this comment. In this case, the commit message should explain why we need the ismaster command, and maybe also the alternatives and why command('ismaster') is better.

Writing detailed commit messages does slow down the process of committing code, but it significantly speeds up review and maintenance. Also, you will sometimes realize while writing the commit message that something is wrong or missing in the commit. Many of your commits in this PR have been undo/redo, which is a sign that what you were doing was not totally clear to you. It was my fault many times, because my comments were not detailed enough, but if you had taken the time to write down the commit message you could have realized that information was missing, and you would have asked me for clarifications before committing the changes.


Finds a single document and updates it, returning either the original or the updated document.
Member

Why a string here? It won't be registered as a docstring.

result = dbcollection.find_one_and_update(filter=query,
                                          update=update_dictionary,
                                          projection=selection,
                                          upsert=upsert,
Member

upsert is a nice addition, but why limit ourselves to one update? Please don't make a new commit with update_many. 🙂 I'd prefer we discuss it and see whether we settle on find_one_and_update or update_many.

@tsirif Any thoughts on this?

Contributor Author

We recently encountered a problem when doing a nested query inside a dictionary, and by the way update is deprecated http://api.mongodb.com/python/current/api/pymongo/collection.html (at the bottom of the page)

Member

update is deprecated but update_{one, many} are not.

Do you have an example for the problem? I'd like to understand it better. It's not clear to me that update_one rather than update_many would be the best fix for that kind of problem.

Contributor Author

Also, you cannot update two documents at once with a single MongoDB query; you will always have to do that in two queries. You can of course set a field to the same value, or increment by the same number, but you cannot do two distinct updates in MongoDB with the same query. As a reference https://stackoverflow.com/questions/17207183/way-to-update-multiple-documets-with-different-values

Member

Yes, sure. Different updates need different calls to update(). I don't see that as an argument against update_many. The only argument I see in favor of find_one_and_update is the atomicity of the operation. I'm not sure, however, that it is worth the cost in flexibility.

@@ -188,4 +188,8 @@ def remove(self, collection_name, query):
Returns:
result -- the document/documents
"""
self.db[collection_name].remove(query)
result = self.db[collection_name].find_one_and_delete(filter=query,
Member

To complement this comment, I feel like deletions are very likely to be requested on many documents, and we are tying our hands here.

Contributor Author

Idem. But I'd still rather return the deleted document at the end of the function than use delete_one or delete_many.

Member

Being able to recover the deleted document could be handy but I'm not sure we are likely to encounter such a use case.

Contributor Author

Yes, although rarely. I'm seeing a case where we want to remove a node from the set of resources if it suddenly becomes idle; we need to remove it from the resources, but then if the node shows up again on our ping, we might need to add it back. Just removing it might be dangerous.

Member

@bouthilx bouthilx Nov 23, 2017

I don't know. If resources are represented as objects which can easily register themselves in the database, then it would do no harm to remove one and register it again using the object's info rather than the deleted row's info. To me, the best point in favor of find_and_remove is if we want to delete a row and make sure nobody changes it meanwhile. If we do db.load() and then db.remove(), it could have changed in between. But again, how likely is that to happen given our use cases, and is it worth limiting the flexibility of our interface?

@dendisuhubdy @tsirif What would you think about setting the base methods (insert, load, update, remove) as non-atomic ones supporting many operations, and later on extending the interface with find_and_(update, remove) if needed?

Member

@bouthilx bouthilx left a comment

@dendisuhubdy Please do not make any more modifications for now. Every comment is up for discussion, and it would be preferable that we agree on what to do before changing anything. Otherwise, comments may get hidden by GitHub and it becomes hard to keep track of the discussions.

@tsirif Please take part of the discussion. 🙂

@@ -79,6 +72,15 @@ def is_locked(self):
raise_from(DatabaseError("Could not determine if connection is locked."), e)
return is_locked

def lock(self):
Member

I would not add locks to databases now. As Fred said in previous discussions, locking databases is not trivial. I opened a new issue (#10) to keep track of the idea.

dictionary,
selection,
atomic=False,
upsert=False,
Member

Do we really want to give the option for upsert? I thought it would always be True.

Contributor Author

If we always set it to True, we'd lose the ability to choose between just updating and doing an insert.

Member

No, upsert only inserts if the document is not found using its id. A very common use case will be writing new documents, so it would be weird to always have to set database.write(new_documents, update=True). Also, in our case updates will usually be done by the objects representing the rows (Experiments, Trials, etc.), so it will be simple to pass the id to write() so that it does an update rather than an insert. All of that without specifying upsert.
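The upsert behavior discussed here can be illustrated with a toy in-memory "collection" (a dict keyed by _id). This is only a sketch of the semantics, not pymongo's implementation, and upsert_write is a hypothetical name:

```python
def upsert_write(collection, document):
    """Update the document matching _id, or insert it if absent."""
    doc_id = document['_id']
    if doc_id in collection:
        collection[doc_id].update(document)  # found: update in place
        return 'updated'
    collection[doc_id] = dict(document)      # not found: insert
    return 'inserted'
```

With upsert always on, a caller that supplies a known id gets an update and any other caller gets an insert, with no extra flag to pass.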

Member

  1. I think that dictionary should be renamed to data and put as the second argument (after collection_name).
  2. query and selection should be optional.
  3. I agree that upsert should always be True.
  4. MongoDB specifics should not be asked for in the implementation of an AbstractDB.
  5. So, I believe that the complete argument list should be:
    (self, collection_name, data, query=None, selection=None, atomic=False)

selection,
atomic=False,
upsert=False,
sort=None,
Member

Sort is specific to find_one_and_update. I believe the atomic argument should lead to the same behavior whether True or False, with the only difference being the atomicity of the operation (safer but slower). I see two solutions:

  1. We lock around update_many
  2. We loop on find_one_and_update until all matches are updated (in which case sort argument is useless)

The problem with solution 1 is that database locking is not trivial. The problem with solution 2 is that the operation is only atomic at the row-update level; other rows could be changed by another process while the current process updates one.

Solution 2 seems bad because it is only weakly atomic. Because of that, I would prefer solution 1. Still, since it is not trivial and atomicity is not that important, I would rather leave it for later.

Member

Forget about my last comment. Based on the discussion here #10, it seems atomic operations on many rows are something we should forget about.

Then I think the behaviors for atomic True and False are too different to be crammed inside the write() method. Since the default behavior is not atomic, we could postpone the addition of atomic operations to write/remove and finish this PR with only write(collection_name, query, dictionary, selection) (and maybe upsert if we decide to make it definable). We can decide later whether we integrate atomicity inside write() to keep only write/read/remove, or whether we add new explicit methods for atomic operations.

atomic=False,
upsert=False,
sort=None,
return_document=ReturnDocument.BEFORE,
Member

I feel like in the use cases where we want the document returned, it will be ReturnDocument.AFTER (even though I know BEFORE is the default for MongoDB). Also, shouldn't we make that part of the abstract interface? Other kinds of DBs will want that as well.

Contributor Author

I'm suspicious that for MySQL or PostgreSQL the interface might be different, I'll do research for now and get back to you for that.

Member

The interface will probably be different, but we could handle that inside the method. If we define our own codes for return_document, we could use the same interface for all types of databases. It could be something like this inside the abstractdb python file.

class ReturnDocument(object):
    NONE = 0
    BEFORE = 1
    AFTER = 2

For simplicity I would only implement NONE for now.
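Inside the MongoDB implementation, such backend-agnostic codes could then be translated to pymongo's flag (pymongo's find_one_and_* methods take False for BEFORE and True for AFTER). A sketch under that assumption; to_pymongo is a hypothetical helper:

```python
class ReturnDocument(object):
    # Backend-agnostic codes, as proposed for abstractdb.
    NONE = 0
    BEFORE = 1
    AFTER = 2

def to_pymongo(code):
    """Map our code to pymongo's return_document flag; None means skip fetching."""
    if code == ReturnDocument.NONE:
        return None
    return code == ReturnDocument.AFTER
```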

upsert=False,
sort=None,
return_document=ReturnDocument.BEFORE,
collation=None):
Member

I don't think we should have this specific argument. We can add it later if someone requires it. Otherwise we need to test it and maintain it for nothing.

Contributor Author

Ah I see 👍

Member

@bouthilx bouthilx left a comment

@dendisuhubdy @tsirif Please give your opinions on the review comments. We need to make sure we agree on the modifications before making them. When commits are submitted, comments get hidden and it is hard to keep track. Furthermore, we cannot continue at this pace. Making and reviewing many small edits that may be changed after discussion is sucking up too much time and hurting our other projects (research, namely). If you want to propose small code changes for specific comments, please do so in the comments using markdown's syntax.

Also, I will have to ask you to remove extra features from the PR to speed up our progress. There is for instance the locking system and the support for both URI and attributes. For the atomic option, I believe we still have to discuss whether many-updates or atomicity will be the feature we leave for later; in other words, what the default behavior should be.

For any such feature, please make sure to open an issue and explain it so that we do not lose track of it.

I would like this review to be the penultimate one. This means that all comments should be discussed so that when commits are pushed they address all of the comments. Some comments are requests, others are open questions so they need to be discussed and we need to reach agreement before any commit can address them.

Thanks for your cooperation and continue the good work.

else:
uri = self._uri
ret = parse_uri(uri)
if self._uri is not None:
Member

I disagree with this. It is a weird side effect. If we want the uri to be settable, it should be through an explicit setter. We don't need both for now; we should decide what we support first: URI or attributes. I remember Pascal was in favor of attributes so that we can prompt users for passwords and avoid passwords in URIs. Personally I would start with URI, as it is much simpler to handle, but then we should make support for attributes a priority when the MVP is ready.

Member

I hold the same opinion that I wrote in the comment above. I agree with host only positional, being also the uri. But I think that we should let MongoClient handle the rest.

Member

For reference, comments are here


"""

def __init__(self, uri=None, dbname='metaopt', host='localhost',
Member

What would you think about making host the first argument and definable as a host or URI, like in MongoClient?

The host parameter can be a full mongodb URI, in addition to a simple hostname. It can also be a list of hostnames or URIs. Any port specified in the host string(s) will override the port parameter. If multiple mongodb URIs containing database or auth information are passed, the last database, username, and password present will be used. For username and passwords reserved characters like ‘:’, ‘/’, ‘+’ and ‘@’ must be percent encoded following RFC 2396:

As I saw in one of Christos' commits, dbname could be the first positional (non-optional) argument, as we know we will always work with only one database. Data for an experiment will be scattered across different collections/tables, not across different databases. The database is a singleton anyway.

Member

I think making host/uri the first and only positional argument is the best practice here. dbname can be specified inside the uri. Also, in this case explicitly given arguments can override uri-specified ones. In fact, I suggest that we let MongoDB handle the priority and just pass all the arguments to MongoClient. This in effect would drop the need for a uri property. What do you think?

Member

I love it. Defining the attributes separately in the configuration file is more natural than the URI. The URI is more convenient for passing as a command-line argument.

So, as you propose, I would make host/uri the first and only positional argument, and all the arguments would be passed to MongoClient rather than the URI. That would also address Pascal's concern about the username/password in the URI. In that case, I would drop everything related to URI: the property and self._uri. Until we add support for URI, MongoDB("some_uri") would fail.

"""Disconnect from database."""
pass

@abstractproperty
Member

I would leave lock for later.

Member

I agree. Remove these methods?

pass

@abstractmethod
def write(self, collection_name, data, query, atomic, selection):
Member

Optional arguments should also be in AbstractDB if they are intended to stay the same for all children. The docstring should describe all of that.

Member

Oh indeed. I did not see that. Will do.

pass

@abstractmethod
def read(self, collection_name, query, selection, atomic):
Member

idem

upsert=True)
return result.acknowledged
elif atomic is True:
selected_one_key, selection = _get_selected_one_key(selection)
Member

What is _get_selected_one_key? Why do we select only the first item of the given selection?

Member

If only one item has been specified in the selection, then return the value for the specific projection, not a dictionary whose final value is what we want.

Member

Ok, I understand the intent. Then we would not need to do db.read({}, {'arg': 1})['arg']. But isn't it a bit weird? I'd be surprised as a user if making a selection of a single value suddenly returned items rather than dictionaries. It's convenient, but it makes the method's behavior more complex. What if some function makes dynamic reads and at some point its selection dictionary becomes unitary? It would need to take into account that when its dictionary becomes unitary, it has to rebuild the dictionary because the method suddenly returns an item.

Personally, I would favor coherence and simplicity over a convenience like this. Also, an accumulation of small tweaks like that adds to the burden of testing and maintenance, thus slowing down development. Writing db.read({}, {'arg':1})['arg'] is less elegant but doesn't have such an impact.

Member

The deeper the dotted selections get ('arg1.arg2.arg3'), the uglier it will be. But I agree with your point on coherence and simplicity. Will see what I can do 😉

Member

We should finish this discussion before spending time on selected_one_key, otherwise it might be a waste of time.

upsert=True,
sort=[('$natural', -1)],
return_document=ReturnDocument.AFTER)
return _get_leaf_value(selected_one_key, result) if selected_one_key else result
Member

We return the value of the selected_one_key leaf? Why do we integrate this in this method?

Member

If selected_one_key is not None, return the final value of a nested dictionary like {'a': {'b': {'c': 5}}}.
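The leaf lookup being described could be sketched as follows; get_leaf is an illustrative stand-in for the PR's _get_leaf_value, whose exact signature is not shown here:

```python
def get_leaf(document, dotted_key):
    """Follow a dotted key like 'a.b.c' through nested dicts to the leaf."""
    value = document
    for part in dotted_key.split('.'):
        value = value[part]
    return value
```

So get_leaf({'a': {'b': {'c': 5}}}, 'a.b.c') yields 5 instead of the nested dictionary.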

Member

See comment above

dbdocs = list(cursor)
elif atomic is True:
# Return the most recent of the query
dbdocs = list(dbcollection.find(query, selection, sort=[('$natural', -1)], limit=1))
Member

This is weird; find is atomic anyway, even on many documents. To me, this attempt to unify write/read behaviors highlights the problem of integrating many-updates with one atomic update in the same write() method.

Member

I wanted an interface that returns only the most recent document. atomic means "which cannot be split / have separate parts".

Member

@bouthilx bouthilx Nov 28, 2017

In this case atomic means that the operation is safe for concurrency. The atomicity is about the operation. The operation could return many documents.

With the current implementation, we are forced to return a single document in write() because of MongoDB particularities, but in general write() could be applied to many documents atomically. We are forcing read()'s behavior to stay coherent with write()'s behavior, while the latter was already constrained by a specific database particularity. That doesn't sound like a good design to me.

It is my belief that we should give ourselves some time to figure out what would be the best way to integrate atomicity. Not having it won't hurt us for the MVP.

Member

I see your point. I think, though, that I will find it useful to search for a single most recent document later on, for the MVP. We could change the name of the optional argument to something else, though. I believe that in general this is OK for now. AbstractDB is an internal API; we can change this later.

Member

I understand having a way to fetch the most recent document can be handy, but I disagree this is how it should be done. If I read the following code snippet, it's not clear why the output would be the most recent row unless I read the doc.

most_recent_row = db.read(some_query, atomic=True)

The following is less elegant but more explicit

most_recent_row = max(db.read(some_query), key=lambda d: d["date"])

And could later be turned into

most_recent_row = db.read(some_query, sort={"date": -1})

Anyhow, we don't need sort for now, and adding it later would not impact the interface, so we don't need to rush it. Adding it now, however, would slow our progress. When we need it, we can simply make a PR for sort only.

It might feel like I'm arguing for details and slowing down our progress as I advocate for removing stuff rather than changing or adding, but we need to focus more. The tendency of adding extra features will hurt us more and more as we progress with a larger and more complicated code base. Every extra feature is adding burden on development and maintenance. When the code is designed properly, extra features are easy to integrate later on, so there is no reason to rush them in. What we should rather rush for is users' feedback. We cannot afford to design hastily for that, but we can spare time on extra features.

Member

I would just like an optional interface to return the most recent thing. Databases can generally do that. I would not like a general sort argument; that can be done as you demonstrated. However, searching for the most recent something is probably going to be common. I am not sure.

I agree with your last comment. It takes unjustified motivation to remove stuff I tried right now to facilitate for my future coding. However, I will try to contain my urge to implement in front stuff before they become necessary, in the future.

What would you suggest that I should do? Should I remove it? I think the current tests describe pretty much what I will need to proceed. Interface can be polished later, no need to perfect it right now as you said.

Member

It takes unjustified motivation to remove stuff I tried right now to facilitate for my future coding. However, I will try to contain my urge to implement in front stuff before they become necessary, in the future.

I don't understand what you mean.

What would you suggest that I should do? Should I remove it? I think the current tests describe pretty much what I will need to proceed. Interface can be polished later, no need to perfect it right now as you said.

I think the best would be to move the extra feature to a new branch and point to it in an issue. This way we do not lose track of the code but do not have the burden of maintaining it. From what I see there should be 2 issues: one about atomicity and one about getting the most recent document only. Those two should not be coupled.

# Return the most recent of the query
dbdocs = list(dbcollection.find(query, selection, sort=[('$natural', -1)], limit=1))

if len(dbdocs) is 0:
Member

Please use the pythonic way: if some_integer == 0:

Member

agreed.

@pytest.mark.skip(reason="sugar")
def test_ctx_read_experiment(self, exp_config, moptdb):
"""Fetch a whole experiment entry using an experiment context (always atomic)."""
with moptdb('experiments', {'exp_name': 'supernaedo2',
Member
@bouthilx bouthilx Nov 27, 2017

I don't understand. What is moptdb here? Is it a MongoDB? Where was __enter__ defined?

Member

I have not yet implemented this. These tests are marked to be skipped because of 'sugar'. See line 100.

Member

But this would go to the abstract db, once it gets implemented.

Member

Yes I saw the sugar skip. 😆

If I may ask, could you please remove the tests that are for sugar features? You can keep them in a branch on your fork to avoid losing the code. I would also encourage you to make an issue where you explain your idea of a context manager if you do spend some time on that. Nevertheless, we are past the milestone for the DB and will soon be past the milestone for Experiment/Trials, so I would recommend we focus on the base first.
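For the record, the "sugar" context manager that the skipped tests imagine could look roughly like this. This is a sketch only: the class name, the write-back on exit, and the read()/write() signatures are assumptions, not the PR's implementation:

```python
class DBContext:
    """Hypothetical sketch of the 'sugar' context manager discussed above.

    Assumes a database object exposing read()/write() as in AbstractDB;
    the names and write-back semantics here are illustrative only.
    """

    def __init__(self, db, collection, query):
        self.db = db
        self.collection = collection
        self.query = query
        self.document = None

    def __enter__(self):
        # Fetch the matching document once, on entry.
        docs = self.db.read(self.collection, self.query)
        self.document = docs[0] if docs else None
        return self.document

    def __exit__(self, exc_type, exc_value, traceback):
        # Write back in-place modifications when leaving the block normally.
        if exc_type is None and self.document is not None:
            self.db.write(self.collection, self.document, query=self.query)
        return False  # do not suppress exceptions
```

Usage would then match the shape of the skipped test, e.g. `with DBContext(moptdb, 'experiments', {'exp_name': 'supernaedo2'}) as exp: ...`.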

@@ -6,8 +6,9 @@

@six.add_metaclass(AbstractSingletonType)
class AbstractDB(object):
def __init__(self, dbhost='localhost', dbname='metaopt',
def __init__(self, uri, dbhost='localhost', dbname='metaopt',
Member

@dendisuhubdy @bouthilx Should uri also be optional? In tests for instance, I expect to pass only dbname, username and password.

from urllib import quote_plus

class MongoDB(AbstractDB):
def __init__(self, uri, dbhost='localhost', dbname='metaopt', dbport=27017, username=None, password=None):
Member

uri here should probably be optional (=None).

# setting this as a function allows to make sure reconnection
# is properly done in the case of an unreliable network

self._uri = uri
Member

This is a duplicate from AbstractDB

@property
def uri(self):
""" The URI property of the class """
uri = "mongodb://%s:%s@%s:%s" % (quote_plus(self._username),
Member

Should probably overwrite the private self._uri only if the user has not already provided one (if self._uri is None).
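A minimal sketch of that suggestion, assuming the attribute names from the diff (the exact URI format and defaults here are illustrative, not the PR's final code):

```python
from urllib.parse import quote_plus  # Python 3; `from urllib import quote_plus` in Python 2

class MongoDBSketch:
    """Illustrative only: derive the Mongo URI from its parts, but leave a
    user-supplied URI untouched (the `if self._uri is None` check suggested
    in the review)."""

    def __init__(self, uri=None, dbhost='localhost', dbname='metaopt',
                 dbport=27017, username=None, password=None):
        self._uri = uri
        self._host = dbhost
        self._port = dbport
        self._dbname = dbname
        self._username = username
        self._password = password

    @property
    def uri(self):
        # Only build a URI when the user did not provide one explicitly.
        if self._uri is None:
            self._uri = "mongodb://%s:%s@%s:%d/%s" % (
                quote_plus(self._username or ''),
                quote_plus(self._password or ''),
                self._host, self._port, self._dbname)
        return self._uri
```

Note that `quote_plus` escapes characters such as '@' in credentials, which would otherwise break URI parsing.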

# return connection string for checking purposes
return conn

def check_connection(self):
Member

Name it in a similar fashion to is_locked.


@codecov-io

codecov-io commented Nov 28, 2017

Codecov Report

Merging #8 into master will increase coverage by 52.07%.
The diff coverage is 99.19%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master       #8       +/-   ##
===========================================
+ Coverage   13.63%   65.71%   +52.07%     
===========================================
  Files           7        9        +2     
  Lines         154      385      +231     
  Branches       22       39       +17     
===========================================
+ Hits           21      253      +232     
+ Misses        133      130        -3     
- Partials        0        2        +2
Impacted Files Coverage Δ
tests/dummy_test.py 100% <100%> (ø) ⬆️
src/metaopt/io/database/__init__.py 100% <100%> (ø)
src/metaopt/utils/__init__.py 100% <100%> (ø) ⬆️
tests/core/unittests/mongodb_test.py 100% <100%> (ø)
src/metaopt/io/database/mongodb.py 97.22% <97.22%> (ø)
src/metaopt/worker/__init__.py 0% <0%> (-100%) ⬇️
src/metaopt/cli.py 0% <0%> (ø) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 679f573...94f8e09. Read the comment docs.

@tsirif
Member

tsirif commented Nov 28, 2017

@dendisuhubdy @bouthilx I think this is done. What do you think? Dendi, could you merge the last commits I made?

@dendisuhubdy
Contributor Author

@tsirif I believe it is done. @bouthilx, could you approve the PR so @tsirif can merge it? :)

@bouthilx
Member

Overall yes but we didn't finish this discussion. When we are done discussing, I'll make a final review and we should be able to merge.

- Public API of AbstractDB class consists of: `initiate_connection`,
`is_connected`, `close_connection`, `write`, `read` and `remove`.
- `read` uses `MongoClient.find`
- `write` uses `MongoClient.insert_many`, when `query` is not supplied,
and `MongoClient.update_many` with `upsert=True` else.
- `remove` uses `MongoClient.delete_many`.
- Initializing `MongoDB` class needs either a proper Mongo URI or
separate args, which overwrite elements inferred from a possible URI.
- Create tests for each interface case.
- AbstractDB is a singleton.
- Fix docs configuration.
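As a quick reference, the mapping from the public API to pymongo calls summarized above can be captured in a small helper. This is purely illustrative, not part of the PR; it only records which pymongo method each AbstractDB operation is said to use:

```python
def pymongo_call_for(operation, query=None):
    """Map AbstractDB public methods to the underlying pymongo calls,
    per the summary above. A documentation helper only; the real class
    dispatches to these methods on a pymongo Collection object."""
    if operation == 'read':
        return 'find'
    if operation == 'write':
        # insert_many when no query is given; otherwise upsert via update_many
        return 'insert_many' if query is None else 'update_many(upsert=True)'
    if operation == 'remove':
        return 'delete_many'
    raise ValueError("unknown operation: %r" % operation)
```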
@dendisuhubdy dendisuhubdy dismissed bouthilx’s stale review November 30, 2017 02:28

Discussion has not yet concluded, but the major working functions of the database have been addressed. Issues have been written, along with associated links toward improvements.

@dendisuhubdy dendisuhubdy merged commit 07f89d9 into Epistimio:master Nov 30, 2017
@bouthilx
Member

bouthilx commented Nov 30, 2017

@dendisuhubdy Why did you dismiss the review? In the main comment of the review I wrote "I would like this review to be the penultimate one". Please do not dismiss reviews. I will never dismiss your reviews either. This mechanism is there for a reason.

@dendisuhubdy dendisuhubdy deleted the database branch January 10, 2018 17:03