Deviating statistical key indicators for different Python versions #315

Closed
3 of 4 tasks
gplssm opened this issue Sep 30, 2019 · 7 comments · Fixed by #324

@gplssm
Contributor

gplssm commented Sep 30, 2019

Work in progress can be found here

Problem

Statistical key indicators computed from ding0 data differ when the data was produced by running ding0 with different Python versions.

Comparing data from different runs of ding0 with the same Python version shows no differences.

Tasks

  • Close PR Preprocessed data #296 once this issue exists; the branch needs to be kept. State that the actual issue addressed by PR Preprocessed data #296 was resolved in a new branch
  • Generate 100 iterations of the same grid with python 3.5 and compare them
  • Find the reason for deviations
  • Propose and/or implement fixes

Material

@gplssm gplssm added the bug label Sep 30, 2019
@gplssm gplssm added this to the Release 0.1.13 milestone Sep 30, 2019
@gplssm gplssm self-assigned this Sep 30, 2019
@AnyaHe
Collaborator

AnyaHe commented Oct 1, 2019

It could be related to lists: test_rings_full_data fails for python=3.5 but is successful for python=3.6. Maybe it is easier to check this behaviour first before going through all statistical values.

@gplssm
Contributor Author

gplssm commented Oct 14, 2019

The deviation of the statistical numbers and their reproducibility (i.e. determinism) was assessed in more detail. It turns out that more than one parameter deviates at a time. The simultaneous deviation of multiple parameters is probably due to dependencies between these parameters.

Two kinds of comparison were distinguished:
a. Data generated on the fly was compared (marked as "create_data=True")
b. Data loaded from a file was compared (marked as "create_data=False")
This was done to locate the non-determinism. In case (a), deviations may result either from the execution of ding0 or from the execution of the stats function. In case (b), if the same deviation always results, it can only be explained by non-determinism within ding0.

Comparing (a) and (b) shows that when data loaded from file is compared, the deviation can be reproduced.
When data is generated on the fly, the deviation differs between runs. Hence, there is more salt flying around in ding0 :-/
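
The distinction between a reproducible and a varying deviation can be sketched roughly as follows. This is only an illustration, not the code used for the assessment: the helper names (deviation, is_reproducible, from_file, on_the_fly) and the plain dictionaries standing in for stats tables are assumptions.

```python
import itertools


def deviation(left, right):
    """Return the difference for every indicator whose values differ."""
    return {key: left[key] - right[key] for key in left if left[key] != right[key]}


def is_reproducible(make_pair, n_runs=5):
    """Check whether repeated comparisons always yield the same deviation.

    make_pair is a hypothetical callable returning two stats dicts
    (e.g. one per Python version) each time it is called.
    """
    deviations = [deviation(*make_pair()) for _ in range(n_runs)]
    return all(d == deviations[0] for d in deviations[1:])


# Stand-in for case (b): the same stored data is compared every time.
def from_file():
    return {"z": 2.0}, {"z": 1.0}


# Stand-in for case (a): freshly generated data changes from run to run.
_counter = itertools.count()


def on_the_fly():
    return {"z": float(next(_counter))}, {"z": 0.0}


reproducible_b = is_reproducible(from_file)
reproducible_a = is_reproducible(on_the_fly)
```

A constant deviation points at the stored data itself, a changing one at something that varies per run.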

Next steps

  • Identify the reasons for deviations within ding0
    1. Find reason for reproducible deviations when data is not created on-the-fly (case (b))
    2. Analyze other deviations of case (a)

Deviations for each run: mvgd_stats_diff.xlsx

@gplssm
Contributor Author

gplssm commented Oct 16, 2019

Deviations in case (a)

  • I_max of first segment of path from MV station to terminal node (mean value): -3.076329992289942
  • Impedance Z of path to terminal node (mean value): 46.34096261128434
  • Length of path from MV station to terminal node (mean value): -0.009671682332331955

Dependencies of deviating stats numbers (that might lead to the cause)

I_max of first segment of path from MV station to terminal node (mean value)

  • .type['I_max_th'] of outgoing lines at the HV/MV transformer (sum_thermal_limits)
  • n_terminal_nodes

Impedance Z of path to terminal node (mean value)

  • sum_impedances
  • n_terminal_nodes

Length of path from MV station to terminal node (mean value)

  • sum_path_lengths
  • n_terminal_nodes

Next Steps

  • Find out if n_terminal_nodes (or variables that determine this) already deviates while running through ding0's grid buildup steps

@gplssm
Contributor Author

gplssm commented Nov 4, 2019

In order to test if calculate_mvgd_stats/calculate_lvgd_stats (and the other four functions) return reproducible results, I ran these functions multiple times over the same ding0 pickle data.

That's the shortened output of assert_frame_equal()

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4618.688738326718]
[right]: [4616.6718504217115]

i=2, I_max
DataFrame.iloc[:, 8] values are different (100.0 %)
[left]:  [254.2479568234387]
[right]: [257.3242868157286]

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4617.803226952255]
[right]: [4616.67185042173]

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4618.688738326737]
[right]: [4616.671850421729]

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4571.462264340964]
[right]: [4569.445376435956]

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4617.80322695225]
[right]: [4616.671850421728]

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4618.688738326744]
[right]: [4616.671850421723]

What does this tell us?

  • The stats function(s) do(es) not return reproducible results
  • Note: assert_frame_equal() only shows one deviating column (there are more deviating values in other columns)

Next steps: find and eliminate the salt in the _stats_functions. I ran these functions repeatedly on the same pickled ding0 data to spot irreproducibility.
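
Since assert_frame_equal() stops at the first mismatch, a small helper can list every deviating column at once. The following is a minimal sketch, assuming two stats DataFrames with identical index and columns; the function name deviating_columns and the column labels are hypothetical, and the numbers are taken from the output above purely for illustration.

```python
import numpy as np
import pandas as pd


def deviating_columns(left, right, rtol=1e-9):
    """Return left - right for all numeric columns that are not (nearly) equal."""
    numeric = left.select_dtypes("number").columns
    close = np.isclose(
        left[numeric].to_numpy(), right[numeric].to_numpy(),
        rtol=rtol, equal_nan=True,
    )
    differs = ~close.all(axis=0)  # one flag per column
    columns = numeric[differs]
    return left[columns] - right[columns]


# Hypothetical one-row stats frames; column names are made up.
left = pd.DataFrame({"n_nodes": [10.0],
                     "impedance_z": [4618.688738326718],
                     "i_max": [254.2479568234387]})
right = pd.DataFrame({"n_nodes": [10.0],
                      "impedance_z": [4616.6718504217115],
                      "i_max": [257.3242868157286]})

diff = deviating_columns(left, right)
```

Here diff contains only the impedance_z and i_max columns; n_nodes is dropped because it matches.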

mvgd_stats_diff_runs-10_compares_0-1.xlsx
mvgd_stats_diff_runs-10_compares_2-3.xlsx

@gplssm
Contributor Author

gplssm commented Nov 11, 2019

Update

I_max of first segment of path from MV station to terminal node (mean value) depends, among others, on mvlv_thermal_limits which is set here. The actual value is calculated a few lines above.
Within this calculation, path[1] isn't always the same. As a consequence, different mv_thermal_limits are obtained.

Next step

  • Find out why path[1] changes

@gplssm
Contributor Author

gplssm commented Nov 13, 2019

  • path sometimes has different results
  • nx.shortest_path() always returns a single shortest path; which one depends on how the graph is stored in nx (i.e. how it was added), see
  • Use len(list(nx.all_shortest_paths())) to determine whether there are multiple paths from the HV-MV station to the terminal node (the call returns a generator, hence the list()). Result: when the path differs between the compared versions, there are 2 possible shortest paths
  • Actually, there should be exactly one path from the MV station to each node in the MV grid, unless the circuit breaker is closed. But then it should affect more nodes...
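
The behaviour described in the list above can be reproduced on a toy graph. This is a minimal sketch with made-up node names, not ding0's grid model: a closed ring offers two equally long routes to the terminal node, and nx.shortest_path() simply returns one of them.

```python
import networkx as nx

# Hypothetical toy grid: a ring with two equally long routes from the
# station to terminal node "t", as if a circuit breaker were closed.
ring = nx.Graph()
ring.add_edges_from([("station", "a"), ("station", "b"),
                     ("a", "t"), ("b", "t")])

# shortest_path() returns a single path; which of the two it picks
# depends on the internal ordering of the graph.
one_path = nx.shortest_path(ring, "station", "t")

# all_shortest_paths() returns a generator, so it has to be
# materialised before it can be counted.
all_paths = list(nx.all_shortest_paths(ring, "station", "t"))
n_closed = len(all_paths)

# Opening the ring (removing one edge) leaves a single route.
radial = ring.copy()
radial.remove_edge("b", "t")
n_open = len(list(nx.all_shortest_paths(radial, "station", "t")))
```

In this toy case path[1] is either "a" or "b" while the ring is closed, and becomes unambiguous once the ring is opened.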

@gplssm
Contributor Author

gplssm commented Nov 26, 2019

Closed circuit breakers caused most of the deviations. This was resolved in 4710233 by changing nw.control_circuit_breakers(mode='close') to mode='open', which opens the circuit breakers.

Afterwards, the only remaining deviation is between Length of MV type NA2XS(FL)2Y 3x1x400 RM/35 and Length of MV type NA2XS2Y 3x1x150 RE/25. It seems that ding0 sometimes uses one cable and sometimes the other.

When calculating the stats again and again based on the same .pkl file, the deviation was reproducible.
When generating new ding0 .pkl files, the deviation appears in different rows. Hence, the salt is in ding0.
