outputlib - basic plots #36

Closed
ckaldemeyer opened this Issue Dec 23, 2015 · 39 comments

ckaldemeyer commented Dec 23, 2015

Uwe has already started on his outputlib and written a method that creates a DataFrame with all component time series around a given bus.

He started with basic matplotlib, which offers every configuration option, but in my opinion that is sometimes too many for standard plots and can be overwhelming...

After some experimenting, my idea is to use the basic pandas plotting functions (which are built on matplotlib) for the basic plots and for the renpass-gis plots.

Here's a small example with a handful of plots and some possible configuration options beyond the standard functionality (needs matplotlib >= 1.4):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from datetime import datetime as dt
mpl.style.use('ggplot')

# Generate sample data
sample_data = np.random.rand(24*365, 5)
df = pd.DataFrame(sample_data,
                  index=pd.date_range('1/1/2015 00:00',
                                      periods=len(sample_data), freq='H'))

# Select date range to plot
date_from = dt(2015, 12, 22, 0, 0)
date_to = dt(2015, 12, 22, 23, 0)
df = df.loc[date_from:date_to]

# Plotting
# Formatting-tuple (title, colormap, xlabel, ...)
# for matplotlib.axes.AxesSubplot object could
# be passed by kwargs later
# df.plot() returns the Axes it drew on, so it can be configured directly
ax = df.plot(kind='line', colormap='Spectral', title='Line Plot', linewidth=2)
ax.set_ylabel("Power in GW")
ax.set_xlabel("Date and Time")
ax.legend(('Wind', 'PV', 'Biomass', 'RoR', 'Demand'), loc='upper right')

ax = df.plot(kind='bar', stacked=True, colormap='Greens', title='Bar Plot')
ax.legend(loc='upper right')

ax = df.plot(kind='barh', stacked=True, colormap='Oranges', title='H-Bar Plot')
ax.legend(loc='upper right')

ax = df.plot(kind='area', stacked=False, alpha=0.5, colormap='Spectral',
             title='Area Plot')
ax.legend(('Wind', 'PV', 'Biomass', 'RoR', 'Demand'), loc='upper right')

ax = df.plot(kind='box', colormap='Reds', title='Box Plot')
ax.legend(loc='upper right')

# Histogram of columns 2 and 3 for the afternoon hours only
ax = df.loc['2015-12-22 12:00:00':'2015-12-22 18:00:00', 2:3] \
    .plot(kind='hist', stacked=True, bins=20, colormap='ocean',
          title='Histogram of a subset')
ax.legend(('Col1', 'Col2'), loc='upper right')

ax = df.plot(kind='scatter', x=0, y=1,
             title='Scatter Plot (first vs. second column)')
ax.legend(loc='upper right')

# Display all figures when run as a script
plt.show()

It could be implemented quickly on top of Uwe's work and should cover most needs, as I do not want to spend too much time tweaking visualisations.

@oemof/oemof-main

What's your opinion on this?

Happy Christmas
Cord

uvchik commented Jan 4, 2016

@ckaldemeyer wrote:

> He started with basic matplotlib, which offers every configuration option, but in my opinion that is sometimes too many for standard plots and can be overwhelming...

No, so far the devplots module is based on the pandas plotting functions and not on basic matplotlib.
One module creates DataFrames with all flows around one bus. Then you can create plots based on this DataFrame. One default plot is implemented, but if you have special wishes you can create your own.

The idea was to create a plotting library based on the EnergySystem class to make it easy to get default plots, so that people do not have to spend time on programming plots again and again.

@ckaldemeyer It would be helpful if you added your ideas to this library.

ckaldemeyer commented Jan 4, 2016

@uvchik : I'll push my state tomorrow. It took a bit longer dealing with a pandas MultiIndex (http://pandas.pydata.org/pandas-docs/stable/advanced.html) but I think it's worth spending time on it!

simnh commented Jan 4, 2016

Multiindexing looks pretty cool to me, especially as we always have tuples as sets for the optimization model.
Looking forward to your push ;-)

cswh commented Jan 5, 2016

Have a look at the features/pypower branch. There, the energy system class has a method to plot itself as a graph. This could be a blueprint for plotting functions.

ckaldemeyer commented Jan 5, 2016

I can't find it there. But I am almost done and we can still adjust it later

c-moeller commented Jan 5, 2016

I'm not sure whether this is the right thread for this, but last summer the RLI oemof team discussed a budget for support in data processing and results analysis (including plotting). Due to organizational problems this has not happened so far, but it came up again today and still seems relevant. We now have a student who is interested in doing this, and I have to clarify the budget once more. This is just for information. If this is not the right place, please move this comment or tell me :-)

ckaldemeyer commented Jan 5, 2016

I have pushed my current state to "features/outputlib-based-on-pandas" and adjusted the storage optimization example to show how it works!

Make sure that your pandas version is >= 0.17.0. Otherwise the MultiIndex will fail.

There is still some stuff to do (see TODOs), but I think that at least the idea is getting clearer. At the moment I am not sure whether it is really necessary to write more plotting functions, since they mostly just pass parameters through, and plots are always individual and a matter of taste. Instead, some "slicing methods" that return pre-formatted DataFrames for different purposes might also do the job. But that's more for the discussion.

ckaldemeyer commented Jan 5, 2016

I have tested the code with renpass-gis as well and it seems to work fine. Only creating the DataFrame at the beginning takes some time. But it is still quite fast and can be improved.

ckaldemeyer commented Jan 5, 2016

@caro-rli : Does that mean someone at the RLI gets paid to improve our plotting/result code? This would be great ;)

uvchik commented Jan 6, 2016

I tested your to_pandas module. I don't understand why you changed the example instead of just using the EnergySystem class. I added this possibility and reverted the storage example to the old version. Now both plots work. Revert the commit if you don't like it.

I will read more about the MultiIndex, and maybe I can adapt my plot to this DataFrame. I still like the combination of bar and line plots to check the results, but this is a matter of taste.

If the MultiIndex DataFrame proves its value in the long run, the method to create it could become part of the EnergySystem class (convert_results_to_dataframe).

ckaldemeyer commented Jan 6, 2016

That's fine. Go ahead

ckaldemeyer commented Jan 6, 2016

> I will read more about the MultiIndex, and maybe I can adapt my plot to this DataFrame. I still like the combination of bar and line plots to check the results, but this is a matter of taste.

Bar and line plots can still be plotted easily. In my opinion, a few slicer methods that convert subsets of the MultiIndex DataFrame into pre-formatted, easily plottable DataFrames should be enough, together with one or two common standard plots (e.g. power versus time and annual sums). These DataFrames can then be plotted with individual styles as described in the pandas visualization docs (http://pandas.pydata.org/pandas-docs/stable/visualization.html) or using matplotlib.

In my opinion, only the slicer methods and one or two standard plots should be part of the framework. Further plotting could then be done at app level by extending the class, which avoids bloating code that also has to be adapted with every change.
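
A minimal sketch of what such a slicer method could look like, assuming a result DataFrame with a (bus_uid, obj_uid, datetime) MultiIndex. The level names, the `slice_bus` helper and the sample values are made up for illustration and are not taken from the branch.

```python
import numpy as np
import pandas as pd

# Hypothetical result layout: one row per (bus, component, timestep)
timesteps = pd.date_range("2015-01-01", periods=4, freq="D")
index = pd.MultiIndex.from_product(
    [["bel", "bheat"], ["wind", "pv"], timesteps],
    names=["bus_uid", "obj_uid", "datetime"],
)
results = pd.DataFrame({"val": np.arange(len(index), dtype=float)},
                       index=index)


def slice_bus(df, bus_uid):
    """Return a time-indexed frame with one column per component."""
    # xs() selects one bus and drops that index level
    one_bus = df.xs(bus_uid, level="bus_uid")
    # unstack() moves the component level into the columns
    return one_bus["val"].unstack(level="obj_uid")


bel = slice_bus(results, "bel")  # power versus time, ready for bel.plot()
totals = bel.sum()               # one sum per component over the period
```

The returned frame has a plain DatetimeIndex, so any pandas plot kind can be applied to it without further reshaping.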

> If the MultiIndex DataFrame proves its value in the long run, the method to create it could become part of the EnergySystem class (convert_results_to_dataframe).

Either here or in the class as it is now. We should discuss that!

simnh commented Jan 6, 2016

I think the only thing that is missing now is an implementation of a stacked plot with steps. This is something pandas plotting does not easily provide, but I think we need it because our timesteps are discrete. (Setting kind="area" in the df.plot() method, for instance, doesn't satisfy me...)

ckaldemeyer commented Jan 6, 2016

No, that's something I see as well. I'll see if I can sort out something on Friday using pandas as well. Otherwise it will be matplotlib but based on a well pre-formatted dataframe.
For now I am back in bed... :/

uvchik commented Jan 6, 2016

@simonhilpert
For steps with pandas you can use the drawstyle='steps-mid' argument. I use it in the outputlib, as you can see in the current commit of the features/outputlib-based-on-pandas branch. Just execute the storage_invest example.
[Image: steps]
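
A small sketch of the drawstyle argument on made-up data, independent of the outputlib code. It covers only the unstacked line case; the column names and values are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

# Made-up hourly data for one day
index = pd.date_range("2015-12-22", periods=24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(np.random.rand(24, 2), index=index,
                  columns=["wind", "pv"])

# drawstyle is passed through to the matplotlib lines, so each value
# is drawn as a horizontal step centred on its timestamp
ax = df.plot(drawstyle="steps-mid", linewidth=2, title="Step plot")
ax.set_ylabel("Power in GW")
```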

uvchik commented Jan 6, 2016

@ckaldemeyer

I think now I fully understand the idea of the Multiindex DataFrame. Thank you. For me it looks good.

  • I added a stackplot method to your to_pandas class that is based on your plot_bus method. If it is okay like this, we can remove the devplot module.
  • I cleaned up the example file (pep8, removing unused lines, ...).
  • We should talk about the name: to_pandas is not a descriptive name for a plotting class.
  • If we use it like this, I have to write the docstring of the stackplot method.

I agree that we should not add too many plots to the library. Maybe we could create a gallery with nice plots based on the EnergySystem class or the Multiindex DataFrame (like matplotlib, just smaller: http://matplotlib.org/gallery.html). But it helps if some plots are ready to use within the outputlib.

Printing the results is missing in the example file, but maybe it should also use the Multiindex DataFrame.

ckaldemeyer commented Jan 7, 2016

> I added a stackplot method to your to_pandas class that is based on your plot_bus method. If it is okay like this, we can remove the devplot module.

To me it looks good. I wouldn't have expected it to be so easy. Thanks!

> I cleaned up the example file (pep8, removing unused lines, ...).

Thanks.

> We should talk about the name: to_pandas is not a descriptive name for a plotting class.

In my opinion, the name depends on what it is supposed to be. For me, the class provides a structured and easily usable data structure for results with an additional printing option. Thus, for me it's less plotting than structuring results. But it should be discussed.

Any suggestions for a good name?

> If we use it like this, I have to write the docstring of the stackplot method.

What about the others? Do you think this is a good way to go?

ckaldemeyer commented Jan 7, 2016

Just in case we go this way, here are some TODOs from my side:

  • Add self referenced entries (results[component][component]) to "other" (Cord)
    • Storage filling levels
    • Dispatch values for dispatchable renewables
  • Add the option to add a user defined legend (or one which is automatically labeled by the uid and not tuples) (Cord)
  • Make dataframe creation and plotting configurable with as little code as possible via **kwargs
    • Proposal Cord: method plot_bus(bus_uid, bus_type, date_from, date_to, **kwargs)
      where **kwargs holds everything that is plot-related like now in 'df_plot_kwargs', {}
  • Uniform code
    • Docstrings: r''' vs """ and completeness
    • kwargs['tick_distance'] vs. kwargs.get('xlabel')
  • Try to circumvent additional plotting code when using the class
    • # Plotting a combined stacked plot
      fig = plt.figure(figsize=(24, 14))
      ...
      es_df.stackplot(bus_uid="bel",
      plt.show()
    • Can we reduce this to "es_df.stackplot(bus_uid="bel"...)" and put the rest into the class?

uvchik commented Jan 7, 2016

> Make dataframe creation and plotting configurable with as little code as possible via **kwargs
> Proposal Cord: method plot_bus(bus_uid, bus_type, date_from, date_to, **kwargs) where **kwargs holds everything that is plot-related like now in 'df_plot_kwargs', {}

Good idea, but at least for the stackplot we have to distinguish between plot options and options for the stackplot method. So we still need something like df_plot_kwargs.

> Uniform code
> Docstrings: r''' vs """ and completeness

We decided to use numpydoc, which uses r""" text... """, so I will change that.

> kwargs['tick_distance'] vs. kwargs.get('xlabel')

I think we should use the get method if None is okay. If None causes errors somewhere in the pandas/matplotlib code, then it is better to use kwargs['blablubb'] to get the error directly. If you agree, I will check my code and do it this way.
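
The difference between the two access styles can be sketched as follows. `read_options` is a hypothetical helper written for this illustration, not part of the outputlib; only the two option names come from the thread.

```python
def read_options(**kwargs):
    """Hypothetical helper showing both access styles side by side."""
    # Optional: None is an acceptable value, so a missing key is fine
    xlabel = kwargs.get("xlabel")
    # Required: a missing key should fail here, not deep inside pandas
    tick_distance = kwargs["tick_distance"]
    return xlabel, tick_distance


print(read_options(tick_distance=24))  # xlabel falls back to None

try:
    read_options(xlabel="Date")
except KeyError as err:
    print("missing required option:", err)
```

With the bracket form the KeyError names the missing option at the point where it is read, which is usually easier to trace than a failure raised later by the plotting backend.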

> Try to circumvent additional plotting code when using the class
> Can we reduce this to "es_df.stackplot(bus_uid="bel"...)" and put the rest into the class?

The idea is that you can easily plot a combined plot:

fig = plt.figure(figsize=(24, 14))

# First part
ax = fig.add_subplot(2, 1, 1)
es_df.stackplot(bus_uid="bel"....)

# Second part
ax = fig.add_subplot(2, 1, 2)
es_df.stackplot(bus_uid="bheat"....)

But I can divide it into two methods and allow both ways. Okay?
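
One common way to allow both ways is an optional ax parameter. The sketch below assumes a free function named `plot_bus_sketch` and made-up data; it is not the actual stackplot signature from the branch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_bus_sketch(df, ax=None, **df_plot_kwargs):
    """Draw df on the given axes, or on a fresh figure if none is passed."""
    if ax is None:
        _, ax = plt.subplots()
    df.plot(ax=ax, **df_plot_kwargs)
    return ax


index = pd.date_range("2015-01-01", periods=24, freq=pd.Timedelta(hours=1))
el = pd.DataFrame(np.random.rand(24, 2), index=index, columns=["wind", "pv"])
heat = pd.DataFrame(np.random.rand(24, 1), index=index, columns=["boiler"])

# Stand-alone call: the function creates its own figure
single_ax = plot_bus_sketch(el, title="bel")

# Combined call: the caller owns the figure and hands in each subplot
fig = plt.figure(figsize=(24, 14))
top = plot_bus_sketch(el, ax=fig.add_subplot(2, 1, 1), title="bel")
bottom = plot_bus_sketch(heat, ax=fig.add_subplot(2, 1, 2), title="bheat")
```

Returning the axes keeps the short call usable for quick checks while still letting an app add labels or save the figure afterwards.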

ckaldemeyer commented Jan 7, 2016

Sounds good!

ckaldemeyer commented Jan 7, 2016

I have just talked to Günni and there are still two more entries missing in the dataframe.

Update in TODO-list:

  • Add self referenced entries (results[component][component]) to "other" (Cord)
    • Storage filling levels
    • Dispatch values for dispatchable renewables

ckaldemeyer commented Jan 10, 2016

Here's the updated TODO-list:

  • Add self referenced entries (results[component][component]) to "other" (Cord)
    • Storage filling levels
    • Dispatch values for dispatchable renewables
  • Add the option to add a user defined legend (or one which is automatically labeled by the uid and not tuples) (Cord)
  • Make dataframe creation and plotting configurable with as little code as possible via **kwargs
    • Proposal Cord: method plot_bus(bus_uid, bus_type, date_from, date_to, **kwargs)
      where **kwargs holds everything that is plot-related like now in 'df_plot_kwargs', {}
  • Uniform code
    • Docstrings: r''' vs """ and completeness
    • kwargs['tick_distance'] vs. kwargs.get('xlabel')
  • Try to circumvent additional plotting code when using the class
    • # Plotting a combined stacked plot
      fig = plt.figure(figsize=(24, 14))
      ...
      es_df.stackplot(bus_uid="bel",
      plt.show()
    • Can we reduce this to "es_df.stackplot(bus_uid="bel"...)" and put the rest into the class?

Additionally, I have added the possibility to create the DataFrame only for specific buses/bus types by passing a list of uids/types.
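
As an illustration of the idea only: the same restriction can be applied to an existing MultiIndex DataFrame with a boolean mask. The `filter_buses` helper, the level names and the data are assumptions, and the branch may well filter while building the frame instead.

```python
import pandas as pd

# Hypothetical result frame with one row per (bus, direction)
index = pd.MultiIndex.from_product(
    [["bel", "bheat", "bgas"], ["in", "out"]],
    names=["bus_uid", "direction"],
)
results = pd.DataFrame({"val": range(len(index))}, index=index)


def filter_buses(df, bus_uids):
    """Keep only rows whose bus_uid is in the given list."""
    mask = df.index.get_level_values("bus_uid").isin(bus_uids)
    return df[mask]


subset = filter_buses(results, ["bel", "bgas"])
```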

ckaldemeyer commented Jan 10, 2016

Oh, I have just read my mails and saw your pull request. Anyhow, we are making progress here ;)

uvchik commented Jan 11, 2016

My Todos for today are finished:

  • Using a time index attribute instead of a year attribute in the EnergySystem class
  • Using the time index of ES class in the to_pandas class.
  • Add the option to define and pass a color dictionary {'uid1': '#000000', 'uid2': 'red',....}
  • Add an option to decide whether the plot is shown or saved or both
  • Add autostyle=True option to plot the window and the legend in the default way
  • Add the option to set no value for the "tick_distance". In that case four equally spaced ticks are set. The number can be changed.
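
A sketch of how a colour dictionary like the one above could be turned into something pandas accepts. The `colors_for_columns` helper, the fallback colour and the sample columns are assumptions for illustration, not the merged implementation.

```python
import pandas as pd


def colors_for_columns(columns, color_dict, default="grey"):
    """Return one colour per column, in column order."""
    return [color_dict.get(col, default) for col in columns]


color_dict = {"wind": "#000000", "pv": "red"}
df = pd.DataFrame({"wind": [1.0, 2.0], "pv": [0.5, 0.0],
                   "demand": [1.5, 2.0]})

colors = colors_for_columns(df.columns, color_dict)
# The list can then be handed to pandas, e.g. df.plot(color=colors)
```

Building the list from the column order keeps each uid tied to its colour even when only some buses are plotted.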

📝 Still missing the full docstrings. I think we should add the optional parameters to the docstring. What do you think?

💬 As discussed I will merge the branch, but please test the example and give feedback.

⚠️ Be aware that you may have to change your Apps!

uvchik commented Jan 11, 2016

We still have to find a name for the class that builds the dataframe and provides some basic plots. Even though I'm already used to it, I think to_pandas is not very catchy for newcomers.

uvchik commented Jan 11, 2016

@ckaldemeyer : Please update the requirements in setup.py. Currently it is pandas >= 0.13.

ckaldemeyer commented Jan 12, 2016

Done. But the storage invest example is not working anymore. I am already searching for the error..

ckaldemeyer commented Jan 12, 2016

> Still missing the full docstrings. I think we should add the optional parameters to the docstring. What do you think?

I would leave them out. For me it should be lean and provide basic plotting. Everything beyond requires the user to look deeper into pandas/matplotlib anyway.

ckaldemeyer commented Jan 12, 2016

> We still have to find a name for the class that builds the dataframe and provides some basic plots. Even though I'm already used to it, I think to_pandas is not very catchy for newcomers.

What about solph_results_to_pandas? Basically this is what it does ;-)

ckaldemeyer commented Jan 12, 2016

Works now.

The error was

#energysystem.restore()
energysystem.optimize()

Probably from your testing procedures.

uvchik commented Jan 12, 2016

Sorry, I fixed it in the dev-branch after merging and forgot to fix it in the outputlib-based-on-pandas-branch, too.

uvchik commented Jan 12, 2016

> What about solph_results_to_pandas?

But it also provides basic plots. What about pandas_plots or pandas_output? In my opinion the DataFrame is just a tool to make plots or other outputs easier. The main goal is to get plots, CSV files, PDFs, etc.

uvchik commented Jan 12, 2016

> > Still missing the full docstrings. I think we should add the optional parameters to the docstring. What do you think?

> I would leave them out. For me it should be lean and provide basic plotting. Everything beyond requires the user to look deeper into pandas/matplotlib anyway.

I'm not talking about the pandas arguments but the parameters we set within our method such as kwargs.setdefault('date_format', '%d-%m-%Y').
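
A sketch of how such a method-level default could be documented in numpydoc style. The function `format_tick_labels` is hypothetical; only the date_format key and its default are taken from the line above.

```python
from datetime import datetime


def format_tick_labels(dates, **kwargs):
    r"""Format datetimes as tick labels (hypothetical helper).

    Parameters
    ----------
    dates : iterable of datetime.datetime
        Timestamps to format.

    Other Parameters
    ----------------
    date_format : str, default '%d-%m-%Y'
        strftime pattern applied to every entry.
    """
    # setdefault only fills the key if the caller did not pass it
    kwargs.setdefault('date_format', '%d-%m-%Y')
    return [d.strftime(kwargs['date_format']) for d in dates]


dates = [datetime(2015, 12, 22), datetime(2015, 12, 23)]
print(format_tick_labels(dates))                          # default pattern
print(format_tick_labels(dates, date_format='%Y-%m-%d'))  # caller override
```

Listing the key under "Other Parameters" tells users which options the method itself interprets, as opposed to those passed on to pandas.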

ckaldemeyer commented Jan 13, 2016

> > > Still missing the full docstrings. I think we should add the optional parameters to the docstring. What do you think?

> > I would leave them out. For me it should be lean and provide basic plotting. Everything beyond requires the user to look deeper into pandas/matplotlib anyway.

> I'm not talking about the pandas arguments but the parameters we set within our method such as kwargs.setdefault('date_format', '%d-%m-%Y').

For me it is not necessary. But go ahead if you want.

ckaldemeyer commented Jan 13, 2016

> > What about solph_results_to_pandas?

> But it also provides basic plots. What about pandas_plots or pandas_output? In my opinion the DataFrame is just a tool to make plots or other outputs easier. The main goal is to get plots, CSV files, PDFs, etc.

Then I would prefer pandas_output. It depends on what we want the module to be. For me it's more a data-extractor with the additional ability to create plots.

ckaldemeyer commented Jan 13, 2016

Btw: Should we delete the features/outputlib-with-pandas branch and create a new one for further developments?

uvchik commented Jan 13, 2016

> > > > Still missing the full docstrings. I think we should add the optional parameters to the docstring. What do you think?

> > > I would leave them out. For me it should be lean and provide basic plotting. Everything beyond requires the user to look deeper into pandas/matplotlib anyway.

> > I'm not talking about the pandas arguments but the parameters we set within our method such as kwargs.setdefault('date_format', '%d-%m-%Y').

> For me it is not necessary. But go ahead if you want.

That is really funny. Of course it is not necessary for you. You wrote it.

uvchik commented Jan 13, 2016

Agree with closing the branch. As far as I'm concerned, we can leave the name as it is and let somebody else find a new one. I think we did a great service to users/developers who want to plot. Thank you for your work.

I would also close this issue and start a new one with the remaining ToDos if there are any.

ckaldemeyer commented Jan 13, 2016

I have just fixed an issue that occurred with renpassgis on the dev branch. I'll close this issue and remove the branch, too.

Thank you and Simon for your contributions!
