Local datasets scaling graph v2 #77
Conversation
Current coverage is 93.18% (diff: 100%)

@@          local-datasets    #77   diff @@
===========================================
  Files                 62     64     +2
  Lines               1609   1674    +65
  Methods                0      0
  Messages               0      0
  Branches               0      0
===========================================
+ Hits                1495   1560    +65
  Misses               114    114
  Partials               0      0
I pushed a fix for the incorrect …
@antw Thank you very much 👍 By the way: I did not know that edges could have a set demand. Now I checked and found that only these do:

…

What is so special about them?
I still have some open questions:

…
Yes. I think there is already quite a bit of machinery to view the outcomes of all gqueries in the ETE back-end, for example this list of all outcomes of queries.
In hindsight, I think setting full_load_hours like this is wrong, and that it should be done with a dynamic attribute (like …). Ideally this would be handled automatically by Atlas (so that you didn't have to put the query in every central producer node document), but presently queries have to be saved in the file. I pushed Atlas and ETSource branches with experimental changes, which should allow that code to be removed from ETEngine and the scaling to function correctly:

…
I think I should have been more precise above: the …

For the …

👍
Not that I'm aware of.
If it doesn't need to be scaled, there's no need to merge the branches. 😃 I think …
Well, as you said above...

... so I think we should merge it anyway, since it improves the system design. @antw What do you think?

As far as I can tell the numbers are "per unit per year", i.e. the hours for one power plant, and thus not to be scaled.
The energy (demand) of technologies can be scaled in many ways. The …, of which …
Partial answer to my own question about the two explicit-demand edges: …
- ScaledAttributes -> AreaAttributesScaler
- Compute scaling_factor in top-level class Scaler
- Add an export_modifier hook to GraphPersistor
- Scale after conversion to a hash
- White-list of node attributes that need scaling
- Most of the node attributes occurring in graph.yml are scaled
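For illustration, the export_modifier hook combined with a white-list could look roughly like the sketch below. This is only a sketch: it assumes the dumped graph hash keeps node data under a :nodes key, and the white-listed attribute names are made up rather than taken from the actual commits.

module Atlas
  class Scaler::GraphScaler
    # Hypothetical white-list; the real attribute names may differ.
    SCALED_ATTRIBUTES = [:demand, :max_demand].freeze

    def initialize(scaling_factor)
      @scaling_factor = scaling_factor
    end

    # Scales every white-listed attribute of every node in the dumped
    # hash, modifying it in place, and returns the hash.
    def call(data)
      data[:nodes].each_value do |node|
        SCALED_ATTRIBUTES.each do |attr|
          node[attr] *= @scaling_factor if node[attr]
        end
      end
      data
    end
  end
end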
Commits 8d19108 to 6f47a35
# Delete the header row for the internal representation -
# will be dynamically (re-)created when outputting
table.delete(0)
I'm a bit confused by this line. On line 53 you specify that the CSV should return headers only for them to be deleted a few lines down. Could you explain why you are doing this?
Because otherwise the @table will not contain the headers. So I return headers, retrieve them, and then delete the superfluous line.
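As a standalone illustration of that pattern with Ruby's stdlib CSV (the file contents here are made up):

require 'csv'

# Parse with return_headers: true so the header row is part of the table.
table = CSV.parse("year,demand\n2015,100\n",
                  headers: true, return_headers: true)

headers = table[0].fields  # => ["year", "demand"]

# Delete the header row; CSV re-creates it when the table is written out.
table.delete(0)

table.to_csv  # => "year,demand\n2015,100\n"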
I wrote comments in the source code for this. Do you have any suggestions on how to make them clearer?
"Because otherwise the @table will not contain the headers"

Just to be clear: line 56 will return [] instead?
"Just to be clear: line 56 will return [] instead?"

Yes - only if the table is empty except for the header row, of course.
  .merge(new_attributes))

derived_dataset.save!

- GraphPersistor.call(@base_dataset, derived_dataset.graph_path)
+ GraphPersistor.call(@base_dataset, derived_dataset.graph_path, export_modifier: Scaler::GraphScaler.new(scaling_factor))
Can't we call export_modifier scaler for now? I understand that it's practical to keep it abstract, as there might one day be a reason to support (multiple?) "export modifiers". But I'm not so sure we should do this already, considering that only 'scaling' is relevant for local datasets.
The thing is that I would like to avoid the word "scaling" completely inside graph_persistor.rb since the latter has nothing to do with the former.
end

def persist!
  data = EssentialExporter.dump(refinery_graph)
  @export_modifier.call(data) if @export_modifier
In which situation will export_modifier be blank?
Well, in none at the moment, but since it is an optional attribute, the code should be able to handle the blank case.
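Both cases side by side, mirroring the call sites in this diff (the 0.5 factor is made up):

# No modifier: the dumped graph hash is written out unchanged.
GraphPersistor.call(@base_dataset, derived_dataset.graph_path)

# With a modifier: the hash is transformed in place before writing.
GraphPersistor.call(@base_dataset, derived_dataset.graph_path,
  export_modifier: Scaler::GraphScaler.new(0.5))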
File.open(@path, 'w') do |f|
-   f.write EssentialExporter.dump(refinery_graph).to_yaml
+   f.write data.to_yaml
Maybe data can be a separate method to keep the code clean. For example:

def data
  @export_modifier.call(EssentialExporter.dump(refinery_graph))
end
Nitpicking, but can this File.open block not be simplified to a one-liner?

File.write(@path, data.to_yaml)

I'm not sure I see the value in extracting data out to a separate method, unless it is something which is used in more than one place (the persist! method is already very short).
Also done in 62cffb5
@@ -9,16 +9,30 @@ def initialize(base_dataset_key, derived_dataset_name, number_of_residences)

def create_scaled_dataset
  derived_dataset = Dataset::DerivedDataset.new(
    @base_dataset.attributes
+     .merge(scaled_attributes)
Nitpicking: can this be indented two spaces further for the sake of readability? I don't care too strongly about it though, so I'm fine either way.
Done in commit 8f85716
  if value = @base_dataset[attr]
    [attr, Util::round_computation_errors(value * @scaling_factor)]
  end
end.compact
Executing a method on an end breaks the style guide a little. You can use a tool like RuboCop to validate your code.

Edit: you might solve it like:

def scale
  @base_dataset.attributes.slice(*SCALEABLE_AREA_ATTRS).map do |key, value|
    [key, ... ]
  end
end
But that way I would not avoid the attributes set to nil. I found another solution though.
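Not necessarily the solution that landed in fdc9e3f; a sketch of one way to skip nil attributes without calling a method on an end, building a hash instead of an array of pairs:

def scale
  SCALEABLE_AREA_ATTRS.each_with_object({}) do |attr, scaled|
    if (value = @base_dataset[attr])
      scaled[attr] = Util::round_computation_errors(value * @scaling_factor)
    end
  end
end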
Solved in commit fdc9e3f
      scaled_csv.set(row_key, column_key, base_value * @scaling_factor)
    end
  end
  scaled_csv.save
This method is a bit too long.
Solved in 9e6e6e0
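Not necessarily what 9e6e6e0 does; a sketch of one way to shorten the method by extracting the nested loop, where create_target_csv, row_keys, column_keys and base_value are hypothetical helpers:

def scale
  scaled_csv = create_target_csv
  scale_values_into(scaled_csv)
  scaled_csv.save
end

private

# The nested loop from before, moved out of #scale unchanged.
def scale_values_into(scaled_csv)
  row_keys.each do |row_key|
    column_keys.each do |column_key|
      scaled_csv.set(row_key, column_key,
                     base_value(row_key, column_key) * @scaling_factor)
    end
  end
end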
  @scaling_factor = scaling_factor
end

# Public: Scales the demands in the graph - modifying the original graph!
I think this documentation is out-of-date. It looks like graph is actually the hash of node and edge data, and it doesn't return the graph object anymore.
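A sketch of what the updated comment might say (not necessarily the wording that ended up in a3ea7ce):

# Public: Scales the demands in the dumped graph data - modifying the
# given hash in place!
#
# data - A Hash of node and edge attributes, as produced by
#        EssentialExporter.dump.
#
# Returns the modified hash.
def call(data)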
Solved in commit a3ea7ce
@@ -0,0 +1,29 @@
module Atlas
  class Scaler::TimeCurveScaler
It would be nice to have some documentation for this class. The method signatures of call and initialize don't give much clue as to what the parameters should be, or what the result values are (if any).

For example: are base_dataset and derived_dataset keys for datasets (no), or actual dataset objects (yes)? Is it acceptable for both to be any Dataset subclass, or does base have to be a Full and derived a Derived?

Does scale return the scaled CSV file when it has completed, or is the return value nothing usable?
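For instance, a header along the following lines would answer those questions. It is inferred from the hints in this thread (datasets are passed as objects, base must be a Full, derived a Derived), not taken from 065345b:

module Atlas
  # Scales the time curves of a base dataset and writes the scaled
  # copies to a derived dataset.
  class Scaler::TimeCurveScaler
    # Public: Scales all time curve CSVs of base_dataset.
    #
    # base_dataset    - The Dataset::Full whose time curves are read.
    # derived_dataset - The Dataset::DerivedDataset that receives the
    #                   scaled curves.
    #
    # Returns nothing of interest.
    def self.call(base_dataset, derived_dataset)
      new(base_dataset, derived_dataset).scale
    end
  end
end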
Fixed in 065345b
- def initialize(dataset, path)
    @dataset = dataset
    @path = path
+ def initialize(dataset, path, export_modifier: nil)
Nitpicking: I don't think path needs to be provided to initialize; it is only used in the persist! method. This is somewhat true of export_modifier too, but since that param forms part of the class "identity" I think it's fair to provide that when creating the class. How about instead giving the destination path to persist!?

GraphPersistor.new(dataset, modifier).persist!(path)

At the moment, the whole class feels a bit overkill to me; it can be substituted for a three-line lambda:

module Atlas
  GraphPersistor = lambda do |dataset, path, export_modifier: nil|
    data = EssentialExporter.dump(Runner.new(dataset).refinery_graph(:export))
    export_modifier.call(data) if export_modifier
    File.write(path, data.to_yaml)
  end
end

It could also use some documentation to explain what the params and return values are. export_modifier in particular is completely opaque to any new reader of this code. What values are yielded to the modifier? What should the modifier return?
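A sketch of the documentation being asked for, attached to the lambda variant above (the wording is illustrative, not the text of 62cffb5):

module Atlas
  # Public: Dumps the dataset's refinery graph to a YAML file.
  #
  # dataset         - The Atlas::Dataset whose graph is dumped.
  # path            - Path of the YAML file to be written.
  # export_modifier - Optional object responding to #call; it receives
  #                   the dumped Hash of node and edge data and may
  #                   modify it in place before the file is written.
  #
  # Returns the number of bytes written (from File.write).
  GraphPersistor = lambda do |dataset, path, export_modifier: nil|
    data = EssentialExporter.dump(Runner.new(dataset).refinery_graph(:export))
    export_modifier.call(data) if export_modifier
    File.write(path, data.to_yaml)
  end
end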
Solved in commit 62cffb5
initial_table = CSV::Table.new([CSV::Row.new(headers, headers, true)])

write(initial_table, Pathname.new(path))
The current use of this method (in TimeCurveScaler) looks like: CSVDocument.create → add data → save. That involves two writes to disk (in create and again in save), and if something goes wrong in "add data" or the second save, you end up with a half-complete CSV file on disk. Perhaps create could yield itself prior to writing, so that the user can add their initial data (like File.open(path, 'w') { |f| ... })?

CSVDocument.create(path, headers) do |doc|
  doc.set(:a, :b, 1)
  doc.set(:c, :d, 2)
end
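A sketch of how create could support that block form, assuming a constructor that takes the path and a prepared CSV::Table (the real initializer may differ):

def self.create(path, headers)
  table = CSV::Table.new([CSV::Row.new(headers, headers, true)])
  doc = new(path, table)
  yield doc if block_given?  # caller adds rows before the single write
  doc.save                   # the only write to disk
  doc
end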
Yes, you are so right 😔 I did it this way because Ruby's CSV class is not really good for read-write access, and because I did not want to mess up the current CSVDocument.new signature. But I found an OK-ish way to build only the document and not save it until later.
This will be commit 310ab25
Scale the graph and time curves of a derived dataset alongside the area attributes, and refactor the scaling of the latter.

(Follow-up PR, since #73 was accidentally closed and could not be reopened.)