New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

outputlib with pandas - Improvements #54

Closed
ckaldemeyer opened this Issue Jan 14, 2016 · 37 comments

Comments

Projects
None yet
4 participants
@ckaldemeyer
Copy link
Member

ckaldemeyer commented Jan 14, 2016

Some changes still have to be done. I am using this issue to collect and discuss ideas.

Ideas for changes:

  • It should be possible to pass a result-object directly. For this, we need a way to deal with the datetime-index which normally comes from the energy system
  • Add a flag like "return_df". If set to True, the dataframe-object will be returned directly
  • Add possibility to exclude components in plot_bus() method
  • Add a plot method that gives a good overview about the bus balance. My idea would be a stacked area or a normal stacked stackplot. All inputs are positive and all outputs negative. Additionally, the residual load will be plotted as a line and should be zero.
  • Add methods to store/restore the dataframe using pickle or something equivalent
  • Realize dataframe creation and plotting in different classes (separate data and visualisation)
@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 14, 2016

@uvchik @simonhilpert : Add your ideas if you want

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 14, 2016

I have added some notes..

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 14, 2016

@uvchik: Just as an information. I have just used the class to analze results for renpassgis. For one year, the dataframe holds more than 11 million entries and it takes 20-30 minutes to generate it. But this is for all busses.

But I'll have a look if we can tweak something..

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Jan 15, 2016

@ckaldemeyer wrote:

  1. It should be possible to pass a result-object directly. For this, we need a way to deal with the datetime-index which normally comes from the energy system

The more options we have the more cases we have to check every time we change anything.
You have to pass both informations anyway, so what is the problem to create a simple EnergySystem just as a container.

my_es = EnergySystem(results=my_result_dict, time_idx=my_time_index)
my_df = EnergySystemDataFrame(energy_system=my_es)
  1. Add a flag like "return_df". If set to True, the dataframe-object will be returned directly

This works already for me. The init-method cannot return anything.

my_df = tpd.EnergySystemDataFrame(energy_system=my_es)
print(es_df.data_frame)
  1. Add possibility to exclude components in plot_bus() method

+1

  1. Add a plot method that gives a good overview about the bus balance. My idea would be a stacked area or a normal stacked stackplot. All inputs are positive and all outputs negative. Additionally, the residual load will be plotted as a line and should be zero.

Or a table that shows the balance of the year and a function that returns the dates and time where the balance is not valid.

  1. Add methods to store/restore the dataframe using pickle or something equivalent

+1 (e.g. hdf5, csv, postgres...)

  1. Realize dataframe creation and plotting in different classes (separate data and visualisation)

I really like this idea. This is all the more true, as we will have more kinds of outputs like key values, summary tables, reports... +1

So far I do not have any wishes but thank you for your incitements.

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 16, 2016

Thanks for your feedback! I'll start on the branch "features/outputlib-improvements".

My newest idea was to design the class to_pandas (better name would be nice) as a subclass of the pandas dataframe class. This way, all methods are operating on the object "self" and we have the advantage that all conventional pandas methods are also available to manipulate the object from outside.

By doing so, we could access the object like this:

my_df = tpd.EnergySystemDataFrame(energy_system=my_es)
print(my_df)
my_df.plot_bus(uid="bla")

I'll see how far I get in the next days and post my progress.

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 16, 2016

Ideas:

  • Add possibility to exclude components in plot-methods One can now pass a list of obj_uids or obj_uid-substrings that will be excluded. See newest commit for an example.
  • Add bus variables "shortage", "excess" and "duals" correctly to column "other" in dataframe Done. So far, "duals" were saved wrongly in the dataframe. I fixed it now and added the slack vars. But see my question below.
  • Inherit from pandas.dataframe class
  • Realize dataframe creation and plotting in different classes (separate data and visualisation)
  • Add a plot method that gives a good overview about the bus balance
  • Add methods to store/restore the dataframe using pickle or something equivalent

There's something I stumIed upon: I don't understand why the bus variables are saved the way they are at the moment in the result object.

optimization_model.py:

    if hasattr(self, "dual"):
        for bus in getattr(self, str(Bus)).objs:
            if bus.balanced:
                result[bus] = result.get(bus, {})
                result[bus][bus] = [
                    self.dual[getattr(self, str(Bus)).balance[(bus.uid, t)]]
                    for t in self.timesteps]

    for bus in getattr(self, str(Bus)).objs:
        if bus.excess:
            result[bus] = result.get(bus, {})
            result[bus]['excess'] = [
                getattr(self, str(Bus)).excess_slack[(bus.uid, t)].value
                for t in self.timesteps]
        if bus.shortage:
            result[bus] = result.get(bus, {})
            result[bus]['shortage'] = [
                getattr(self, str(Bus)).shortage_slack[(bus.uid, t)].value
                for t in self.timesteps]

I don't see a difference between the types of the variables since they either belong to one object (bus) or are a result of multiple objects (component interaction). So they should be treated in the same way in my opinion.

@gnn @simonhilpert

Why don't we just add an entry result[bus]['duals'] instead of result[bus][bus] for the dual variables?

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 17, 2016

Testing renpass-gis, I have just noticed that there are multiple entries for duals:

                                                               val
bus_uid    bus_type type  obj_uid                 datetime                
el: DEdr13 el       other dual                    2015-01-01 00:00:00    0
                                                  2015-01-01 00:00:00    0
                                                  2015-01-01 00:00:00    0
                                                  2015-01-01 00:00:00    0
                                                  2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                                                  2015-01-01 01:00:00    0
                                                  2015-01-01 01:00:00    0
                                                  2015-01-01 01:00:00    0
                                                  2015-01-01 01:00:00    0
                          excess                  2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          shortage                2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          storage_140081388946880 2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          storage_140081388978864 2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          storage_140081388982112 2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          storage_140081389014096 2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          storage_140081389041984 2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0

These all seem to be saved under the same key if I get it right:

    if hasattr(self, "dual"):
        for bus in getattr(self, str(Bus)).objs:
            if bus.balanced:
                result[bus] = result.get(bus, {})
                result[bus][bus] = [
                    self.dual[getattr(self, str(Bus)).balance[(bus.uid, t)]]
                    for t in self.timesteps]

@simonhilpert : Is it somehow possible to add unique identifiers here?

@simnh

This comment has been minimized.

Copy link
Member

simnh commented Jan 17, 2016

you are right. Dual variables exist for every constraint inside the (linear) optimization problem. We can extract all of them but the loop only gets all duals for balanced bus constraint. The unique identifier could then be the bus-uid.

For renpass-gis I think the major interest lies on the duals from balanced-electricity busses which can be interpreted as the marginal electricity price (shadowprice) for the bus.

In the future I would actually extract all duals (if wanted) because in terms of economics as all of them can be interpreted in a certain way. But we can start with duals for balanced busses...

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Jan 18, 2016

@ckaldemeyer wrote:

My newest idea was to design the class to_pandas (better name would be nice) as a subclass of the pandas dataframe class. This way, all methods are operating on the object "self" and we have the advantage that all conventional pandas methods are also available to manipulate the object from outside.

So i will wait with new changes until you finished the restructure process, to avoid difficult merge conflicts.

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 19, 2016

I have now fixed the dataframe creation and pushed my results. Somehow I still don't get the duals but everything else should work and is now a bit clearer in the creation. The result-object is such a brainf*** ...

@simonhilpert : Could you maybe have a look at the duals-problem? We could also talk about that on Wednesday..
@uvchik : Could you have a look if everything works as expected? I have already made some tests but you never know ;)

Concerning the inheritance from pandas classes: Obviously, it is not recommended to inherit from pandas.

See:

but we could use a composition instead:

Concerning the split into two classes: What about these class names?

  • EnergySystemDataFrame (dataframe and methods for slicing)
  • DataFramePlot (plotting with dataframe)

What do you think?

I'll go to bed now and will be back on Wednesday ;-)

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Jan 19, 2016

My example seems to work, thanks 😄

Maybe ResultsDataFrame instead of EnergySystemDataFrame but this is not a hard objection. I totally agree with splitting the class.

For me it is not important to inherit or use a composition if there are no big advantages but go ahead if you think it's better (either way).

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 20, 2016

Maybe ResultsDataFrame instead of EnergySystemDataFrame but this is not a hard objection. I totally agree with splitting the class.

Then, I agree on ResultsDataFrame and DataFramePlot! It's a bit shorter. At the moment I am quite busy with a paper. But I'll spend my spare time to implement the changes!

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 21, 2016

TODO:

  • Add possibility to exclude components in plot-methods One can now pass a list of obj_uids or obj_uid-substrings that will be excluded. See newest commit for an example.
  • Add bus variables "shortage", "excess" and "duals" correctly to column "other" in dataframe Done. So far, "duals" were saved wrongly in the dataframe. I fixed it now and added the slack vars. But see my question below.
  • Realize dataframe creation and plotting in different classes (separate data and visualisation)
  • Add a plot method that gives a good overview about the bus balance
  • Add methods to store/restore the dataframe using pickle or something equivalent
  • Add component input/output type or component input-/outbus-bus_uid in dataframe to allow technology grouping
@s3pp

This comment has been minimized.

Copy link

s3pp commented Jan 24, 2016

@ckaldemeyer: I edited the create func to improve its performance in features/outputlib-performance. Let me know what you think : )

@simnh

This comment has been minimized.

Copy link
Member

simnh commented Jan 25, 2016

Is this going to be finished and merged until the end of week or should we set the milestone to february?

@s3pp

This comment has been minimized.

Copy link

s3pp commented Jan 25, 2016

No, it`s likely not finished by the end of the week, we should set the milestone to february.

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 25, 2016

@s3pp : As I already said. Nice improvement!

I think we should at least have the improved dataframe creation in the next release. It's only an "internal" change and the plotting, etc. is already on the dev-Branch anyway.

The version with two classes can then become part of the next release or we already push things until Friday ;)

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 25, 2016

So we could merge from "outputlib-performance..." into "outputlib-improvements" and then back into "dev".

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Jan 28, 2016

I need to do some changes for my plots but I think it make no sense to do anything until you split the classes. Do you have a plan when you are going to do it?

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Jan 29, 2016

At the moment I got stuck with my paper and Martin is busy with renpass-gis. I can tell you more on Monday!

@s3pp

This comment has been minimized.

Copy link

s3pp commented Feb 4, 2016

I recently divided the code into two classes: A ResultsDataFrame and a DataFramePlot. Its far from finished unfortunately, because I have some trouble with renpass-gis and not enough time, but one can see the concept behind. The code is on features/fix-outpultlib-dev. Im still working on it, because the stackplot does not work properly atm.

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Feb 4, 2016

For me all plots work. Thank you. I activated the plots in the storage_invest example and everything works fine. I pushed it. Revert it if it doesn't work for you.

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Feb 4, 2016

If you are done, I will revise the plot classes according to the protocol of the web conference.

@s3pp

This comment has been minimized.

Copy link

s3pp commented Feb 4, 2016

Huch, thats nice 😄. You`re absolutely right. Its working as expected now. If you want to, you can already start to revise the plot methods. I was thinking to maybe implement other methods in ResultsDataFrame, but not in DataFramePlot.

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Feb 4, 2016

I think we can do that in the same branch.

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Feb 5, 2016

Go ahead ;-)

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Feb 5, 2016

The first proposal is finished 😄 I left the old stuff to compare the changes but everything below line 291 will be removed. I adapted the storage_invest example to show how the new plotting class works compared to the old one. On the first view there is more code to write but it is very easy code and it makes it much more flexible without handling hundreds of parameters.

The heart of the class is the slice_unstacked method which creates a ready to plot unstacked subset. With this subset you can just use the pandas plot method.

myplot = to_pandas.DataFramePlot(energy_system=energysystem)
subset = myplot.slice_unstacked(bus_uid="bel", type="input").subset
subset.plot(**pandas_plotting_arguments)

But the class offers you some helper methods to make plotting easier. Some things really work good and it took me and others some time to find the solution. So I wanted to keep them. So I created a plot method which only passes all arguments to pandas. So the following code does exactly the same than the code above.

myplot = to_pandas.DataFramePlot(energy_system=energysystem)
myplot.slice_unstacked(bus_uid="bel", type="input")
myplot.plot(**pandas_plotting_arguments)

If you use the second code block you have the opportunity to use the helper methods. But you still will be able to do all that matplotlib stuff.

myplot = to_pandas.DataFramePlot(energy_system=energysystem)
myplot.slice_unstacked(bus_uid="bel", type="input")
myplot.plot(**pandas_plotting_arguments)
myplot.ax.legend(loc='upper right')  
myplot.ax.set_ylabel('Power in MW') 
myplot.ax.set_xlabel('Date')  
myplot.set_datetime_ticks()  # our helper method to make the ticks nicer

With another helper method the legend can be placed outside the plot.

Now it works more like a framework because the helper methods can be used for every plot and we do not have to create overcrowded plotting methods with hundreds of parameters.

Anyway I created an io_plot that replaces the old stackplot method. I did it because the pandas plot has a problem when plotting a stacked line plot, so I had to correct it and this is half of the code Otherwise I would have used the plot method to do it instead of creating a new method.

@ckaldemeyer , @s3pp : What do you think?

@s3pp

This comment has been minimized.

Copy link

s3pp commented Feb 8, 2016

I just discussed this topic with @ckaldemeyer . We both think it's very well done and overall a strong improvement. The readability of the storage invest plot creation is also very nice imo. One point could be questionable though, whether the slice_unstacked method would better fit in the ResultsDataFrame class, because it`s single purpose is data handling in the end. Or are there any disadvantages by doing so?

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Feb 8, 2016

One point could be questionable though, whether the slice_unstacked method would better fit in the ResultsDataFrame class, because it`s single purpose is data handling in the end.

I totally agree. I had the same idea but did not want to write code in the EnergySystemDataFrame class because I did not know how you still want to restructure this class. So feel free to move it into this class.

@s3pp

This comment has been minimized.

Copy link

s3pp commented Feb 8, 2016

Ok. so I moved the method to the ResutlsDataFrame class. Imo we could start thinking about merging it back, don't we, @ckaldemeyer, @uvchik ? I thought that it might be beneficial to include methods for aggregating data on specific levels, but that's not so much restructuring than extending and eventually it isn't even necessary, because of pandas' easy-to-use functions. I will find out while working with the results of renpassg!s.

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Feb 8, 2016

Ok. so I moved the method to the ResutlsDataFrame class. Imo we could start thinking about merging it back, don't we, @ckaldemeyer, @uvchik ?

👍 for me. This would also make #80 obsolete since the fix is already contained.

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Feb 8, 2016

I do not know how it was intended but the ResultsDataFrame class does not have a subset attribute so far. After @s3pp changes the slice_unstacked method created one. But it is never used. So I removed it and returned the subset (return subset) instead of setting the internal attribute.

As I need the internal attribute in the plotting class I inherit the method and extended it by setting the internal attribute. If you decide to have a subset attribute in the ResultsDataFrame class we can remove the method in the plotting class.

I fixed pep8, added docstrings and removed the old stuff. For me we a ready to merge.

@ckaldemeyer : You are assigned. May you do it?

@s3pp

This comment has been minimized.

Copy link

s3pp commented Feb 8, 2016

I see, I see. Actually that was something I wasn't quite sure about.

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Feb 8, 2016

We do not have to decide that now. For now everything works fine and that is worth a merge 😄

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Feb 9, 2016

👍

@uvchik: Concerning the merge would you mind if one of you merges? I am really busy atm and do not have the time to fix errors which eventually might occur (even if I think there won't be any)..

@uvchik

This comment has been minimized.

Copy link
Member

uvchik commented Feb 9, 2016

No problem, but not today. I will do it tomorrow afternoon.

@ckaldemeyer

This comment has been minimized.

Copy link
Member Author

ckaldemeyer commented Feb 9, 2016

Thanks!!

@uvchik uvchik closed this in b99495c Feb 10, 2016

uvchik added a commit that referenced this issue Feb 10, 2016

Merge features/fix-outputlib-dev into dev and close #54
Conflicts:
	examples/development_examples/example_app.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment