
Local datasets scaling graph v2 #77

Merged
merged 14 commits into local-datasets from local-datasets-scaling_graph on Jan 12, 2017

Conversation


@ploh ploh commented Jan 6, 2017

Scale the graph and time curves of a derived dataset alongside the area attributes - and refactor scaling of the latter.

(Follow-up PR, since #73 was accidentally closed and could not be reopened.)

@ploh ploh mentioned this pull request Jan 6, 2017

codecov-io commented Jan 6, 2017

Current coverage is 93.18% (diff: 100%)

Merging #77 into local-datasets will increase coverage by 0.27%

@@           local-datasets        #77   diff @@
================================================
  Files                  62         64     +2   
  Lines                1609       1674    +65   
  Methods                 0          0          
  Messages                0          0          
  Branches                0          0          
================================================
+ Hits                 1495       1560    +65   
  Misses                114        114          
  Partials                0          0          

Powered by Codecov. Last update dbd83a0...6f47a35

Contributor

antw commented Jan 6, 2017

I pushed a fix for the incorrect energy_regasification_lng demand. You needed to scale edge demand in addition to node demand. 😃

Author

ploh commented Jan 9, 2017

I pushed a fix for the incorrect energy_regasification_lng demand. You needed to scale edge demand in addition to node demand. 😃

@antw Thank you very much 👍
Now my tests pass for all node demands as well as final demand CO2 emissions.

By the way: I did not know that edges could have an explicitly set demand. I checked and found out that only these two do:

energy_regasification_bio_lng-energy_national_gas_network_natural_gas@greengas
energy_regasification_lng-energy_national_gas_network_natural_gas@natural_gas

What is so special about them?

Author

ploh commented Jan 9, 2017

I still have some open questions:

  1. demands/metal_demands.csv, primary_production.csv and central_producers.csv all have a demand column. These files are certainly used by dynamic node attributes, which are calculated in Atlas::Runner, and the computed values are stored in graph.yml for a derived dataset. Hence, we might not need to scale the CSV files, since the values in graph.yml will be scaled themselves.
    But... are these files maybe also directly accessed by ETEngine?
    My source code search found only one such place, namely in Etsource::Dataset::Import. There, only the full_load_hours attribute is read from the central_producers.csv.
    @antw How certain are you that there are no direct uses of demand (or max_demand) from one of these files?

  2. To ensure the correctness of the scaling, I successfully checked all node demands as well as all final demand CO2 emissions. But I do not think this is sufficient.
    I would like to (automatically) check all gqueries. (This can then also be set up as an automated CI task.)
    @ChaelKruip Do you think this is a good way of validating the scaling?
    @antw Is there some sort of "query rake task" in etengine which I could use for this?

@ChaelKruip

@ChaelKruip Do you think this is a good way of validating the scaling?
@antw Is there some sort of "query rake task" in etengine which I could use for this?

Yes.

I think there is already quite a bit of machinery to view the outcomes of all gqueries in the ETE back-end. For example, this list of all outcomes of queries.

antw added a commit that referenced this pull request Jan 9, 2017
Contributor

antw commented Jan 9, 2017

But... are these files maybe also directly accessed by ETEngine?
My source code search found only one such place, namely in Etsource::Dataset::Import. There, only the full_load_hours attribute is read from the central_producers.csv.

In hindsight, I think setting full_load_hours like this is wrong, and that it should be done with a dynamic attribute (like demand or share). This way, it would benefit from being scaled correctly without having to duplicate the CSV.

Ideally this would be handled automatically by Atlas (so that you didn't have to put the query in every central producer node document), but presently queries have to be saved in the file.

I pushed Atlas and ETSource branches with experimental changes, which should allow that code to be removed from ETEngine and the scaling to function correctly:

Author

ploh commented Jan 9, 2017

This way, it would benefit from being scaled correctly without having to duplicate the CSV.
...
and the scaling to function correctly

I think I should have been more precise above: the full_load_hours attribute is not a problem (at least in the outcome), since it does not need to be scaled. I was only wondering whether there might be similar direct references to the demand column as well.

Ideally this would be handled automatically by Atlas (so that you didn't have to put the query in every central producer node document), but presently queries have to be saved in the file.

As for the demand: it seems to be set via CENTRAL_PRODUCTION() for only some of the *.central_producer.ad nodes as well, which seems strange... but that is also off topic here 😜

I pushed Atlas and ETSource branches with experimental changes

👍
I will merge those changes into our local-datasets branches. Do you agree that the code in ETEngine can then simply be removed, since the full_load_hours for central producers will now work the same way as for other converters?

ploh pushed a commit that referenced this pull request Jan 9, 2017
ploh pushed a commit to quintel/etsource that referenced this pull request Jan 9, 2017
Contributor

antw commented Jan 9, 2017

I was only wondering whether there might be similar direct references to the demand column as well.

Not that I'm aware of.

The full_load_hours is not a problem (at least in the outcome) since it does not need to be scaled. [...] I will merge those changes into our local-datasets branches.

If it doesn't need to be scaled, no need to merge the branches. 😃 I think full_load_hours should be scaled though; @ChaelKruip what do you think?

Author

ploh commented Jan 9, 2017

If it doesn't need to be scaled, no need to merge the branches.

Well, as you said above...

In hindsight, I think setting full_load_hours like this is wrong, and that it should be done with a dynamic attribute (like demand or share)

... so I think we should merge it anyway, since it improves the system design. @antw What do you think?

I think full_load_hours should be scaled though; @ChaelKruip what do you think?

As far as I can tell, the numbers are "per unit per year", i.e. the hours for one power plant, and thus should not be scaled.


ChaelKruip commented Jan 9, 2017

I think full_load_hours should be scaled though; @ChaelKruip what do you think?

full_load_hours are fixed for most technologies but can be changed by ETE (for heating technologies in households) or by the Merit module (for the central producers). I don't think they need to be scaled, as they just express how many hours a technology typically runs during a year.

The energy (demand) of technologies can be scaled in many ways. The input_capacity, demand, full_load_hours and number_of_units are linked by:

demand [Joule] = input_capacity [Watt] * number_of_units [#] * full_load_seconds [s]

Either demand or number_of_units could be used as the scaling variable, to which the other of the two would then need to react. I think that currently demand is used. input_capacity is kept constant for all technologies except the ones mentioned in #73 (comment); full_load_seconds (or full_load_hours) needs to stay fixed as well.
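
To make this concrete with purely made-up numbers (a hypothetical illustration, not values from any dataset):

input_capacity    = 800.0e6   # Watt per unit, kept constant
full_load_hours   = 7_000.0   # hours per unit per year, kept constant
number_of_units   = 3.0

full_load_seconds = full_load_hours * 3600
demand            = input_capacity * number_of_units * full_load_seconds  # Joule

scaling_factor    = 0.01  # e.g. a region holding 1% of the national demand
scaled_demand     = demand * scaling_factor
scaled_units      = number_of_units * scaling_factor  # the attribute that reacts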

Author

ploh commented Jan 9, 2017

Partial answer to my own question about the two explicit-demand edges:

quintel/etsource@81463bdb3d0

ploh pushed a commit to quintel/etengine that referenced this pull request Jan 10, 2017
Peter Lohmann and others added 6 commits January 10, 2017 19:15
- ScaledAttributes -> AreaAttributesScaler
- Compute scaling_factor in top-level class Scaler
- Add a export_modifier hook to GraphPersistor
- Scale after conversion to a hash
- White-list of node attributes that need scaling
  - Most of the node attributes occurring in graph.yml are scaled
@ploh ploh force-pushed the local-datasets-scaling_graph branch from 8d19108 to 6f47a35 on January 10, 2017 18:24
@ploh ploh requested a review from grdw January 10, 2017 18:25
Author

ploh commented Jan 10, 2017

@grdw I think this is finally ready to be reviewed and merged 😄
@antw I would appreciate your feedback as well if your time permits


# Delete the header row for the internal representation -
# will be dynamically (re-)created when outputting
table.delete(0)
Contributor

I'm a bit confused by this line. On line 53 you specify that the CSV should return headers only for them to be deleted a few lines down. Could you explain why you are doing this?

Author

Because otherwise the @table will not contain the headers. So I return headers, retrieve them and then delete the superfluous line.
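
A minimal standalone illustration of the idea (this is just Ruby's CSV behaviour, not the actual CSVDocument code):

require 'csv'

# With return_headers: true the header line becomes row 0 of the table, so the
# headers can be read even when the file contains no data rows.
table = CSV.read('example.csv', headers: true, return_headers: true)

headers = table.headers  # read the headers while the header row is still present
table.delete(0)          # then drop the header row from the in-memory table;
                         # it is re-created when the table is written out again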

Author

I wrote comments in the source code for this. Do you have any suggestions how to make them clearer?

Contributor

Because otherwise the @table will not contain the headers

Just to be clear: line 56 will return [] instead?

Author

Just to be clear: line 56 will return [] instead?

Yes - only if the table is empty except for the header row, of course.

.merge(new_attributes))

derived_dataset.save!

GraphPersistor.call(@base_dataset, derived_dataset.graph_path)
GraphPersistor.call(@base_dataset, derived_dataset.graph_path, export_modifier: Scaler::GraphScaler.new(scaling_factor))
Contributor

Can't we call export_modifier just scaler for now? I understand that it's practical to keep it abstract; there might be a reason to one day support (multiple?) "export modifiers". But I'm not so sure we should do this already, considering that only 'scaling' is relevant for local datasets.

Author

The thing is that I would like to avoid the word "scaling" completely inside graph_persistor.rb since the latter has nothing to do with the former.

end

def persist!
data = EssentialExporter.dump(refinery_graph)
@export_modifier.call(data) if @export_modifier
Contributor

In which situation will export_modifier be blank?

Author

Well, not at all at the moment, but since it is an optional attribute, the code should be able to handle the blank case.

File.open(@path, 'w') do |f|
f.write EssentialExporter.dump(refinery_graph).to_yaml
f.write data.to_yaml
Contributor

Maybe data can be a separate method to keep the code clean.

For example:

def data
  @export_modifier.call(EssentialExporter.dump(refinery_graph))
end

Contributor

Nitpicking, but can this File.open block not be simplified to a one-liner?

File.write(@path, data.to_yaml)

I'm not sure I see the value in extracting data out to a separate method, unless it is something which is used in more than one place (the persist! method is already very short).

Author

Also done in 62cffb5

@@ -9,16 +9,30 @@ def initialize(base_dataset_key, derived_dataset_name, number_of_residences)
def create_scaled_dataset
derived_dataset = Dataset::DerivedDataset.new(
@base_dataset.attributes
.merge(scaled_attributes)
Contributor

Nitpicking: can this be indented two spaces further for the sake of readability? I don't care too strongly about it though, so I'm fine either way.

Author

Done in commit 8f85716

if value = @base_dataset[attr]
[attr, Util::round_computation_errors(value * @scaling_factor)]
end
end.compact
Contributor

Calling a method on an end breaks the style guide a little. You can use a tool like RuboCop to validate your code.

Edit: you might solve it like:

def scale
  @base_dataset.attributes.slice(*SCALEABLE_AREA_ATTRS).map do |key, value|
    [key, ... ]
  end
end

Author

But that way I would not avoid the attributes that are set to nil. I found another solution though.
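
For example, something along these lines skips the nil attributes without chaining a method onto end (just a sketch, not necessarily what the commit does):

def scale
  SCALEABLE_AREA_ATTRS.each_with_object({}) do |attr, scaled|
    value = @base_dataset[attr]
    scaled[attr] = Util.round_computation_errors(value * @scaling_factor) if value
  end
end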

Author

Solved in commit fdc9e3f

scaled_csv.set(row_key, column_key, base_value * @scaling_factor)
end
end
scaled_csv.save
Contributor

This method is a bit too long.

Author

Solved in 9e6e6e0


@scaling_factor = scaling_factor
end

# Public: Scales the demands in the graph - modifying the original graph!
Contributor

I think this documentation is out-of-date. It looks like graph is actually the hash of node and edge data, and it doesn't return the graph object anymore.

Author

Solved in commit a3ea7ce

@@ -0,0 +1,29 @@
module Atlas
class Scaler::TimeCurveScaler
Contributor

It would be nice to have some documentation for this class. The method signatures of call and initialize don't give much clue as to what the parameters should be, or what the result values are (if any).

For example: are base_dataset and derived_dataset keys for datasets (no), or actual dataset objects (yes). Is it acceptable for both to be any Dataset subclass, or does base have to be a Full and derived a Derived?

Does scale return the scaled CSV file when it has completed, or is the return value nothing usable?
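
For example, something roughly like the following would already answer those questions (the wording, parameter names and method layout here are only an assumption made for illustration, not the actual file):

module Atlas
  # Scales the time curves of a base dataset and saves the scaled copies
  # alongside a derived dataset.
  class Scaler::TimeCurveScaler
    # base_dataset    - the (full) Dataset whose time curve CSVs are read
    # derived_dataset - the DerivedDataset to which the scaled curves are saved
    def initialize(base_dataset, derived_dataset)
      @base_dataset    = base_dataset
      @derived_dataset = derived_dataset
    end

    # Public: Multiplies every value in each time curve by the scaling factor
    # and saves the result with the derived dataset.
    #
    # Returns nothing useful; the scaled CSVs are written as a side effect.
    def scale
      # ...
    end
  end
end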

Author

Fixed in 065345b


def initialize(dataset, path)
@dataset = dataset
@path = path
def initialize(dataset, path, export_modifier: nil)
Contributor

Nitpicking: I don't think path needs to be provided to initialize; it is only used in the persist! method. This is somewhat true of export_modifier too, but since that param forms part of the class "identity" I think it's fair to provide that when creating the class.

How about instead giving the destination path to persist?

GraphPersistor.new(dataset, modifier).persist!(path)

At the moment, the whole class feels a bit overkill to me; it can be substituted for a three-line lambda:

module Atlas
  GraphPersistor = lambda do |dataset, path, export_modifier: nil|
    data = EssentialExporter.dump(Runner.new(dataset).refinery_graph(:export))
    export_modifier.call(data) if export_modifier
    File.write(path, data.to_yaml)
  end
end

It could also use some documentation to explain what the params and return values are. export_modifier in particular is completely opaque to any new reader of this code. What values are yielded to the modifier? What should the modifier return?

Author

Solved in commit 62cffb5

initial_table = CSV::Table.new([CSV::Row.new(headers, headers, true)])

write(initial_table, Pathname.new(path))

Contributor

The current use of this method (in TimeCurveScaler) looks like: CSVDocument.create → add data → save.

That involves two writes to disk (in create and again in save), and if something goes wrong in "add data" or the second save, you end up with a half-complete CSV file on disk. Perhaps create could yield itself prior to writing, so that the user can add their initial data? (like File.open(path, 'w') { |f| ... })

CSVDocument.create(path, headers) do |doc|
  doc.set(:a, :b, 1)
  doc.set(:c, :d, 2)
end
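
One possible shape for this (a sketch only, assuming the constructor can build the in-memory table without writing it, and that saving stays in save):

def self.create(path, headers)
  doc = new(path, headers)   # build the in-memory table only
  yield doc if block_given?  # the caller adds its initial data here
  doc.save                   # a single write to disk once everything is set
  doc
end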

Author

Yes, you are so right 😔
I did it this way because Ruby's CSV class is not really good for read-write access and because I did not want to mess up the current CSVDocument.new signature.
But I found an okish way to build only the document and not save it until later.

Author

This will be commit 310ab25

Author

ploh commented Jan 12, 2017

@grdw @antw I addressed all your remarks and replied to each one with the corresponding commit hash for my fix, in case you want to track my latest changes.

I will merge this branch now and fix possible follow-up remarks in another PR.

@ploh ploh merged commit eafba60 into local-datasets Jan 12, 2017
@ploh ploh deleted the local-datasets-scaling_graph branch January 12, 2017 20:42
grdw pushed a commit to quintel/etsource that referenced this pull request Jan 27, 2017
grdw pushed a commit to quintel/etengine that referenced this pull request Jan 30, 2017