Massive commit of decoupling from mongo and adding a PanFileDB type #414
wtgee merged 14 commits into panoptes:develop
Conversation
* PanDB is a factory class that will load either PanMongoDB or PanFileDB
* PanFileDB merely writes the json to a flat file
* One file per "collection" along with a `current_<collection>` for current entries. Note this means each "current" collection is a separate file.
* Use `bson.json_util` for serializing. Changes a few things with time because it _can_ store timezone aware info, which we don't use properly in `current_time`.
* Try to only rely on one DB for weakref keep - WIP
* Clear both mongo and file types of their "current" entries upon POCS init
* Change `include_collection` to `store_permanently` for calls to `insert_current`
* Change `use_mongo` to `store_result` where appropriate
* Private POCS method `_check_environment` changed to public class method, i.e. `POCS.check_environment`
* Tests that use the `db` now parameterize it across db types. Now testing takes almost twice as long! (should deal with this somehow)
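The factory-plus-flat-file design described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual POCS implementation: the class internals, file naming, and use of uuid4 ids for the file backend are assumptions based on the description in this PR.

```python
# Hypothetical sketch of a PanDB factory with a flat-file backend.
# Names and behaviour are assumptions drawn from the PR description.
import json
import os
import tempfile
import uuid


class PanFileDB:
    """Stores each 'collection' as a flat file of JSON lines."""

    def __init__(self, db_dir):
        self.db_dir = db_dir
        os.makedirs(db_dir, exist_ok=True)

    def insert_current(self, collection, obj, store_permanently=True):
        # The flat-file backend uses a uuid4 rather than a mongo ObjectId.
        rec_id = str(uuid.uuid4())
        record = {'_id': rec_id, 'data': obj}

        # Each "current" collection is a separate file, overwritten per insert.
        current = os.path.join(self.db_dir, f'current_{collection}.json')
        with open(current, 'w') as f:
            json.dump(record, f)

        # store_permanently also appends to the permanent collection file.
        if store_permanently:
            permanent = os.path.join(self.db_dir, f'{collection}.json')
            with open(permanent, 'a') as f:
                f.write(json.dumps(record) + '\n')

        return rec_id


class PanDB:
    """Factory that returns the configured backend instance."""

    def __new__(cls, db_type='file', **kwargs):
        if db_type == 'file':
            return PanFileDB(**kwargs)
        raise ValueError(f'Unknown db type: {db_type}')
```

A caller would then do `PanDB(db_type='file', db_dir=...)` and get back whichever backend the config selects, without the call sites knowing which one they hold.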
I think we can use Python's json module, as it supports custom encoder and decoder classes. I'm willing to help with that if you like.
I can see some advantages to a custom encoder (as mentioned in #206) but I don't see any immediate need when this already does what we want.
Don't you have to install mongodb to use bson.json_util?
No, but I could still see some use cases for a custom encoder/decoder, just not sure it's a priority.
FWIW, I notice that we don't use bson.json_util for much: Most of the |
…ple-storage-from-mongo-120
I get the feeling someone doesn't want us to use `bson.json_util`. We should just have a common encode/decode method in our utils folder that would let us change out the underlying implementation however we want. As for the files listed above, it's not a great example of the usage, and the main ones in messaging and database are the ones that have given us problems. I'm not necessarily tied to `bson.json_util`. For now I'll make a common function that can be called so it's easier to change in future.
I just don't want to be tied to Mongo. I think I now better understand that loads(default=bson.json_utils.default) provides the hook for serializing. We can leave that as is. BUT, let's put the json code into a single py file, so that we can swap it out easily. |
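The "single py file" idea above could look something like the sketch below, using only the stdlib json module's `default` hook. The function names (`to_json`/`from_json`) and the datetime handling are assumptions; swapping in `bson.json_util.default` later would only touch this one module.

```python
# Sketch of a single-module JSON helper, so the serializer implementation
# can be swapped (e.g. for bson.json_util) without touching call sites.
# Names and the datetime representation here are assumptions.
import json
from datetime import datetime


def _default(obj):
    # Minimal stand-in for bson.json_util.default: handle types that
    # json.dumps cannot serialize on its own.
    if isinstance(obj, datetime):
        return {'$date': obj.isoformat()}
    raise TypeError(f'Cannot serialize {obj!r}')


def to_json(obj):
    """Serialize obj to a JSON string, routing unknown types via _default."""
    return json.dumps(obj, default=_default)


def from_json(text):
    """Deserialize a JSON string produced by to_json."""
    return json.loads(text)
```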
Fixing some testing scope
Removing timezone info from utils function
…ple-storage-from-mongo-120
Codecov Report
@@ Coverage Diff @@
## develop #414 +/- ##
===========================================
+ Coverage 67.22% 67.68% +0.45%
===========================================
Files 61 62 +1
Lines 5058 5164 +106
Branches 702 719 +17
===========================================
+ Hits 3400 3495 +95
- Misses 1471 1476 +5
- Partials 187 193 +6
Continue to review full report at Codecov.
jamessynge left a comment
Rename the PR (remove Major WIP) when you're ready for a final review.
mounts: POCS/resources/mounts
db:
    name: panoptes
    type: mongo
When should we switch to the new type?
I actually want to add an sqlite type and make that the default. I think the flat file is a good alternative but I'm not sure it's ideal.
# Set up a Timer that will wait for the duration of the exposure then copy a dummy FITS file
# to the specified path and adjust the headers according to the exposure time, type.
# Set up a Timer that will wait for the duration of the exposure then
We keep having little formatting/cleanup changes like these inside of bigger changes. Perhaps after this you'd be willing to do another PR that scrubs the whole repo? No need to remove this change from this PR.
Good idea. I've just been doing them as I see them, which does pollute the PR some.
@@ -290,15 +290,12 @@ def analyze_recent(self):
    self.current_offset_info))
# Update the observation info with the offsets
is_safe = record['data'].get('safe', False)
timestamp = record['date']
timestamp = record['date'].replace(tzinfo=None)
What is that replace tzinfo about?
Ah, I thought I had made a comment about this. One thing about bson is that it correctly serializes the datetime, and that includes timezone information. However, we are not properly using that timezone information in current_time. Basic math between a naive datetime and a timezone-aware datetime cannot be done, so here we are just dropping the tzinfo, which reverts to the same behaviour as we currently have.
I had a change to current_time(datetime=True) that made it timezone aware based off what timezone is listed in the config file, but that had implications beyond the scope of this PR. Will file an issue about it.
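The naive/aware mismatch described above is easy to demonstrate: mixed subtraction raises TypeError, and stripping the tzinfo (as `record['date'].replace(tzinfo=None)` does in the diff) restores the old behaviour.

```python
# Demonstration of why the tzinfo is dropped: Python refuses arithmetic
# between a naive datetime and a timezone-aware one.
from datetime import datetime, timezone

aware = datetime(2018, 1, 1, tzinfo=timezone.utc)   # what bson round-trips
naive = datetime(2018, 1, 2)                        # what current_time() returns

try:
    delta = naive - aware
except TypeError:
    delta = None  # mixed naive/aware math is not allowed

# Same trick as the diff: drop the tzinfo to get a comparable naive datetime.
stripped = aware.replace(tzinfo=None)
assert (naive - stripped).days == 1
```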
def test_mongo_objectid(forwarder, sub, pub, config, db):
    db.insert_current('config', {'foo': 'bar'})
    id0 = db.insert_current('config', {'foo': 'bar'}, store_permanently=False)
Is this a test of serialization/deserialization of a mongo specific type?
It was for a mongo specific ObjectId. With flat files that is a uuid4 id. Will rename test, but fundamentally it is testing whether that is delivered via the messaging system correctly.
def test_insert_and_no_permanent(db):
    rec = {'test': 'insert'}
    db.insert_current('config', rec, include_collection=False)
    id0 = db.insert_current('config', rec, store_permanently=False)
I should have asked earlier: what does store_permanently mean?
It replaces include_collection and basically means that it stores it in the permanent collection and not just in the current collection.
config=config,
simulator=['all'],
ignore_local_config=True,
db_name='panoptes_testing',
I think config provides this db name.
pocs = POCS(observatory,
            run_once=True,
            config=config,
            db_name='panoptes_testing',
pocs.power_down()

def test_run_interrupt_with_reschedule_of_target(observatory):
Probably very worthwhile, but is this related?
Hm, not sure why this is in this PR. 😕
I skimmed the latest 5 commits, which look reasonable and pass travis. Is this still WIP?
I'm not too sure about the test_pocs.py as it now runs almost double the tests, so I was going to look into changing that. I removed a few of the
But yes, I think it's ready for checking and I'd like to get on PAN008 soon since I can't have mongo.
No mongo because it is a Raspberry Pi?
Correct. You can get it on there, it is just a pain to get a recent version. FWIW, I checked out this branch on PAN008 last night and it seemed to work well. It was cloudy and I was mostly doing alignment so I didn't run through a full observation run, but the basics worked ok. This PR doesn't handle any kind of cleanup/maintenance of the files it creates. I'm also not sure the