Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor suggestions in docstr #53

Merged
merged 4 commits into from Dec 14, 2020

Conversation

Didou09
Copy link
Contributor

@Didou09 Didou09 commented Nov 15, 2020

Motivations:

  • As I read through the code to better understand how it works, I figured I would include small comments or reminders that I find helpful in some docstrings, to complement the documentation for beginner users like me, feel free to dismiss if considered not useful
  • Also, when I see a syntax that I feel could be made either more self-explaining or more numerically efficient, I suggest a minor change in that way. Also feel free to dismiss if not useful

Main changes:

  • Minor changes in tools/iomath.py where the calculation functions are for missing A, Z, x, L,...
  • Syntax: mostly suggesting [None, :] (faster and more explicit) instead of np.replace()

If you fell this PR is relevant, I will continue to suggest such minor changes as I go through the code.
Otherwise I won't

(but feel free to keep only a part of the changes and tell why you dismissed the others so I can do more focused suggestions next time)

else:
return A * x
return A * x[None, :]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x[None,:] is not doing a row vector - leads to different results

@konstantinstadler
Copy link
Member

Hi, very much appreciate the updates of the docstrings. Thanks a lot for that.
However, the [None,:] stuff is not the same as reshape. It adds another dimension and leads to different results:
`
In [1]: import pymrio

In [2]: import numpy as np

In [3]: t = pymrio.load_test().calc_all()

In [4]: x = t.x.values

In [5]: A = t.A.values

In [6]: Zorig = A.dot(np.diagflat(x))

In [7]: Zreshape = A*x.reshape((1,-1))

In [8]: Znone = A*x[None, :]

In [9]: np.allclose(t.Z.values, Zorig)
Out[9]: True

In [10]: np.allclose(t.Z.values, Zreshape)
Out[10]: True

In [11]: np.allclose(t.Z.values, Znone)
Out[11]: False

`

@Didou09
Copy link
Contributor Author

Didou09 commented Nov 16, 2020

Hi,

Hum, looks like there has been a semantic misunderstanding:

  • I was assuming, wrongly, that by 'vector' you meant a 1d array (i.e.: an array with only one dimension: shape (N,))
  • I now realize you mean a 2D array with shape (1, N) for a 'row vector', or (N, 1) for a 'column vector'

I suggest we explicitly mention the shape of the array to avoid mis-comprehensions about what we mean by 'vector'.

The syntax x[None, :] (respectively x[:, None]) is a fast numpy broadcasting syntax typically used on a 1D array of shape (N,), to add a dimension and make it a 2D array of shape (1, N), respectively (N, 1), to be used as a calculation intermediate, like in the operation A*x.

From your example, I understand that x was already a 2D array (of shape (N, 1)).
So in that case indeed you do not need to add a dimension, just to exchange the dimensions.
This is what you do by using x.reshape((1, -1)).

But in that case, this operation amounts to computing the transpose of x.
Then I would suggest using directly the transpose operator : x.T
It is ~x2 faster on large arrays.

In [13]: x = np.random.random((100000,1))                                                                                                                                               

In [14]: %timeit y = x.T                                                                                                                                                                
315 ns ± 27.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [15]: %timeit y = x.reshape((1, -1))                                                                                                                                                 
606 ns ± 11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [16]: np.allclose(x.T, x.reshape((1, -1)))                                                                                                                                           
Out[16]: True

Suggestion:

Maybe, when dealing with arrays like x that have a dimension equals to unity (shape (1, N) or (N, 1)), it might be better practice to just store it as a 1D array of shape (N,) (to avoid dimensions that do not carry information) and just use broadcasting when necessary.
Your choice :-)

@konstantinstadler
Copy link
Member

The point of this is to give people the possibility to provide the x "vector" (and other stuff) in various formats. It is not only to use the x (or y in that regard) coming from pymrio, but also to provide a different x or to use the io_math functions independently of pymrio. Kind of duck-typing for the vector.

I updated the tests (test_math) to explicitly test these things (hash daed302).

More general: this pull request now got very mixed up. I would suggest to separate the two issues. I am quite happy to have the docstrings updates. I could also just cherry pick them. For the code changes, the best way would be to first open an issue to discuss changes.

@Didou09
Copy link
Contributor Author

Didou09 commented Nov 18, 2020

OK, will split into 2 separate issues and PRs tomorrow, for better readability :-)

Copy link
Contributor Author

@Didou09 Didou09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed all code changes
Only docstr changes retained
Sorry for the long wait, lots to do elsewhere lately

I notified your last release to the PhD student so he can keep is version up to date

Hi @konstantinstadler , if you want only the docstr changes this PR is ready for validation (I removed all code changes)

@coveralls
Copy link

coveralls commented Nov 25, 2020

Pull Request Test Coverage Report for Build 216

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 89.505%

Totals Coverage Status
Change from base Build 196: 0.0%
Covered Lines: 1791
Relevant Lines: 2001

💛 - Coveralls

@@ -19,6 +19,8 @@
def calc_x(Z, Y):
"""Calculate the industry output x from the Z and Y matrix

industry output x = flows(Z) + final demand(Y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats not fully correct: it is: sum_columns(Z) + sum_columns(Y)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, corrected in the last commit

@konstantinstadler konstantinstadler merged commit dcba4d6 into IndEcol:master Dec 14, 2020
@konstantinstadler
Copy link
Member

thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants