Deviating statistical key indicators for different Python versions #315

gplssm · 2019-09-30T14:56:13Z

Work in progress can be found here

Problem

Comparing statistical key indicators from ding0 data produced by running ding0 with different Python versions shows differences.

Comparing data from different runs of ding0 with the same Python version shows no differences.

Tasks

Close PR "Preprocessed data" #296 once this issue exists; the branch needs to be kept. State that the actual issue addressed by PR "Preprocessed data" #296 was resolved in a new branch
~~Generate 100 iterations of the same grid with python 3.5 and compare them~~
Find the reason for deviations
Propose and/or implement fixes

Material

Required code is available in branch fix/#296-cleaning-table-data
- update_stats_test_data
- test_grid_stats
A script to compare equality across multiple files is available in Anya's gists


AnyaHe · 2019-10-01T13:51:15Z

It could be related to lists: test_rings_full_data fails for python=3.5 but succeeds for python=3.6. Maybe it is easier to check this behaviour first before going into all statistical values.

gplssm · 2019-10-14T15:39:57Z

The deviation of the statistical numbers and their reproducibility (i.e. determinism) was assessed in more detail. It turns out that more than one parameter deviates at a time. The deviations of multiple parameters are probably due to dependencies between these parameters.

Two kinds of comparison were distinguished:
a. Data generated on the fly was compared (marked as "create_data=True")
b. Data loaded from a file was compared (marked as "create_data=False")
This was done to locate the non-determinism. In case (a), deviations may result either from the execution of ding0 or from the execution of the stats function. In case (b), the deviation can only be explained by non-determinism within ding0 if the same deviation always results.

Comparing (a) and (b), it becomes obvious that when data loaded from file is compared, the deviation can be reproduced.
When data is generated on the fly, the deviation differs between runs. Hence, there is more salt flying around in ding0 :-/
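
A minimal sketch of how such a repeated comparison could be scripted. This is not the script from the gist mentioned above; `make_stats` is a hypothetical callable that returns one stats DataFrame per call (for case (a) it would run ding0 and then the stats function, for case (b) it would load the stored data from file).

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_equal(left, right):
    """Return True if assert_frame_equal accepts the two DataFrames."""
    try:
        assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


def count_deviating_runs(make_stats, n_runs=5):
    """Call make_stats() n_runs times and count the results that differ
    from the first one. make_stats is a hypothetical stand-in for
    'produce the stats table once'."""
    reference = make_stats()
    return sum(
        not frames_equal(reference, make_stats()) for _ in range(n_runs - 1)
    )
```

A result of 0 means every run matched the first one; anything larger points to non-determinism somewhere inside whatever `make_stats` executes.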

Next steps

Identify the reasons for deviations within ding0
1. Find reason for reproducible deviations when data is not created on-the-fly (case (b))
2. Analyze other deviations of case (a)

Deviations for each run: mvgd_stats_diff.xlsx

gplssm · 2019-10-16T07:59:16Z

Deviations in case (a)

| Statistical indicator | Deviation |
|---|---|
| I_max of first segment of path from MV station to terminal node (mean value) | -3.076329992289942 |
| Impedance Z of path to terminal node (mean value) | 46.34096261128434 |
| Length of path from MV station to terminal node (mean value) | -0.009671682332331955 |

Dependencies of deviating stats numbers (that might lead to the cause)

I_max of first segment of path from MV station to terminal node (mean value)

.type['I_max_th'] of outgoing lines at the HV/MV transformer (sum_thermal_limits)
n_terminal_nodes

Impedance Z of path to terminal node (mean value)

sum_impedances
n_terminal_nodes

Length of path from MV station to terminal node (mean value)

sum_path_lengths
n_terminal_nodes

Next Steps

Find out if n_terminal_nodes (or variables that determine this) already deviates while running through ding0's grid buildup steps
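
To illustrate why n_terminal_nodes is a plausible common cause: assuming each mean value is simply the respective sum divided by n_terminal_nodes (the dependency lists above suggest this, but it is not confirmed here), a change in that single count shifts all three indicators at once. The function names and the numbers below are made up for illustration.

```python
def mean_per_terminal_node(total, n_terminal_nodes):
    """Mean of a summed quantity over all terminal nodes (assumed formula)."""
    if n_terminal_nodes <= 0:
        raise ValueError("n_terminal_nodes must be positive")
    return total / n_terminal_nodes


def path_means(sum_thermal_limits, sum_impedances, sum_path_lengths,
               n_terminal_nodes):
    """Compute the three mean values that share n_terminal_nodes."""
    return {
        "I_max": mean_per_terminal_node(sum_thermal_limits, n_terminal_nodes),
        "Z": mean_per_terminal_node(sum_impedances, n_terminal_nodes),
        "length": mean_per_terminal_node(sum_path_lengths, n_terminal_nodes),
    }


# Hypothetical sums; only the number of terminal nodes differs.
run_1 = path_means(100.0, 40.0, 20.0, n_terminal_nodes=4)
run_2 = path_means(100.0, 40.0, 20.0, n_terminal_nodes=5)
```

With identical sums, `run_1` and `run_2` differ in every entry, which is the pattern of several indicators deviating together.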

gplssm · 2019-11-04T15:59:59Z

To test whether calculate_mvgd_stats/calculate_lvgd_stats (and the other four functions) return reproducible results, I ran these functions multiple times on the same pickled ding0 data.

This is the shortened output of assert_frame_equal():

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4618.688738326718]
[right]: [4616.6718504217115]

i=2, I_max
DataFrame.iloc[:, 8] values are different (100.0 %)
[left]:  [254.2479568234387]
[right]: [257.3242868157286]

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4617.803226952255]
[right]: [4616.67185042173]

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4618.688738326737]
[right]: [4616.671850421729]

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4571.462264340964]
[right]: [4569.445376435956]

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4617.80322695225]
[right]: [4616.671850421728]

i=2, Impedance Z
DataFrame.iloc[:, 9] values are different (100.0 %)
[left]:  [4618.688738326744]
[right]: [4616.671850421723]

What does this tell us?

The stats function(s) do not return reproducible results
Note: assert_frame_equal() only shows one deviating column (there are more deviating values in other columns)

Next steps: find and remove the salt in the _stats_functions. I did several comparison runs of these functions on the same pickled ding0 data, over and over again, to spot irreproducibility.
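
Since assert_frame_equal() stops at one deviating column, a small helper can list all of them at once. This is a sketch, not code from ding0: `deviating_columns` is a hypothetical name, and it assumes both DataFrames have the same shape, index and column labels. The tolerances mirror the numpy defaults.

```python
import numpy as np
import pandas as pd


def deviating_columns(left, right, rtol=1e-5, atol=1e-8):
    """Return the maximum absolute difference of every numeric column
    in which left and right deviate beyond the given tolerances."""
    diffs = {}
    for col in left.select_dtypes(include="number").columns:
        a = left[col].to_numpy(dtype=float)
        b = right[col].to_numpy(dtype=float)
        if not np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True):
            diffs[col] = float(np.nanmax(np.abs(a - b)))
    return pd.Series(diffs, dtype=float)


# Illustrative frames with made-up column names and values.
left = pd.DataFrame({"a": [1.0], "b": [2.0], "c": [3.0]})
right = pd.DataFrame({"a": [1.0], "b": [2.5], "c": [2.0]})
report = deviating_columns(left, right)
```

Here `report` contains one entry each for the columns "b" and "c", while the matching column "a" is left out.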

mvgd_stats_diff_runs-10_compares_0-1.xlsx
mvgd_stats_diff_runs-10_compares_2-3.xlsx

gplssm · 2019-11-11T15:37:10Z

Update

I_max of first segment of path from MV station to terminal node (mean value) depends, among other things, on mvlv_thermal_limits which is set here. The actual value is calculated a few lines above.
Within this calculation, path[1] isn't always the same. As a result, different mv_thermal_limits are obtained.

Next step

Find out why path[1] changes

gplssm · 2019-11-13T16:58:52Z

The path sometimes differs between runs
nx.shortest_path() always returns exactly one shortest path; which one depends on how the graph is stored in nx (i.e. how it was built), see
Use len(list(nx.all_shortest_paths())) to determine whether there are multiple paths from the HV-MV station to the terminal node (all_shortest_paths() returns a generator, so it has to be converted to a list first). Result: when the path differs between the two versions, there are 2 possible shortest paths
Actually, there should be exactly one path from the MV station to each node in the MV grid, unless the circuit breaker is closed. But then it should affect more nodes...
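
A toy example of this check, assuming a strongly simplified grid: the node names and `build_feeder` are made up, and a real ding0 grid models the circuit breaker differently. It only shows how a closed ring produces two equally short paths, of which nx.shortest_path() returns just one.

```python
import networkx as nx


def count_shortest_paths(graph, source, target, weight=None):
    """Number of distinct shortest paths between source and target."""
    return len(list(nx.all_shortest_paths(graph, source, target, weight=weight)))


def build_feeder(closed_breaker):
    """Hypothetical grid: a station feeding two branches ('a' and 'b')."""
    graph = nx.Graph()
    graph.add_edges_from([("station", "a"), ("station", "b"), ("a", "load")])
    if closed_breaker:
        # Closing the breaker connects branch 'b' to the load: a ring forms.
        graph.add_edge("b", "load")
    return graph


open_grid = build_feeder(closed_breaker=False)
ring_grid = build_feeder(closed_breaker=True)

n_open = count_shortest_paths(open_grid, "station", "load")
n_ring = count_shortest_paths(ring_grid, "station", "load")
chosen = nx.shortest_path(ring_grid, "station", "load")
```

With the breaker open there is a single path; with the breaker closed there are two, and `chosen` is one of them. Any statistic that reads path[1] then depends on which of the two happens to be returned.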

gplssm · 2019-11-26T14:43:06Z

Closed circuit breakers caused most of the deviations. This was resolved in 4710233 by changing nw.control_circuit_breakers(mode='close') to 'open' the circuit breakers.

Afterwards, the only remaining deviation is between Length of MV type NA2XS(FL)2Y 3x1x400 RM/35 and Length of MV type NA2XS2Y 3x1x150 RE/25. It seems that ding0 sometimes uses one cable and sometimes the other.

When calculating the stats again and again from the same .pkl file, the deviation was reproducible.
When generating new ding0 .pkl files, the deviation appears in different rows. Hence, the salt is in ding0 itself.

Fix/#315 deviating stats

gplssm added the bug label Sep 30, 2019

gplssm added this to the Release 0.1.13 milestone Sep 30, 2019

gplssm self-assigned this Sep 30, 2019

gplssm mentioned this issue Nov 27, 2019

Key errors in tools.results.calculate_mvgd_stats() #318

Closed

gplssm mentioned this issue Jan 29, 2020

Fix/#315 deviating stats #324

Merged

1 task

gplssm added a commit that referenced this issue Jan 29, 2020

Merge branch 'dev' into fix/#315-deviating-stats

6fc2268

gplssm mentioned this issue Apr 27, 2020

Fix/#296 cleaning table data #301

Closed

nesnoj closed this as completed in #324 May 10, 2021

nesnoj added a commit that referenced this issue May 10, 2021

Merge pull request #324 from openego/fix/#315-deviating-stats

48bfbbf

Fix/#315 deviating stats

nesnoj modified the milestones: Release 0.1.13, Release 0.2.0 May 28, 2021
