New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pad arrays #239

Merged
merged 19 commits into from May 6, 2016

Conversation

Projects
None yet
2 participants
@drandykass
Contributor

drandykass commented Dec 2, 2015

This is an incomplete pull request for the addition of array padding and unpadding routines to gridder.

Some talking points:
--The padding function expects the array to be of the same dimension as intended--that is, no 'flattened' arrays.
--The unpadding function currently is set up to return unpadded coordinate vectors. However, I think that may be completely unnecessary.
--Still working on cleaning up the docs, the actual code, and need to create the test functions.

Update 03-12-2015 (@leouieda): Added the checklist below.

Checklist:

  • Make tests for new code
  • Create/update docstrings
  • Include relevant equations and citations in docstrings
  • Code follows PEP8 style conventions
  • Code and docs have been spellchecked
  • Include new dependencies in docs, requirements.txt, README, and .travis.yml
  • Documentation builds properly
  • All tests pass
  • Can be merged
  • Changelog entry (leave for last)
  • Firt-time contributor? Add yourself to doc/contributors.rst (leave for last)
@leouieda

This comment has been minimized.

Member

leouieda commented Dec 3, 2015

Thanks @drandykass!

A few initial comments:

The padding function expects the array to be of the same dimension as intended--that is, no 'flattened' arrays.

That's reasonable. The padding only makes sense for gridded data anyway. This should be explicit in the function docstring.

The unpadding function currently is set up to return unpadded coordinate vectors. However, I think that may be completely unnecessary.

I agree that it's redundant since the original arrays should be available from the start. It would make the function simpler if it takes only the padded array. Plus, you could unpad the coordinates by calling this function, right?

Still working on cleaning up the docs, the actual code, and need to create the test functions.

OK, let me know if you need help with that.

@@ -439,3 +444,244 @@ def cut(x, y, scalars, area):
if x[i] >= xmin and x[i] <= xmax
and y[i] >= ymin and y[i] <= ymax]
return [x[inside], y[inside], [s[inside] for s in scalars]]
def pad_array(a, xy=None, np=None, padtype='OddReflectionTaper'):

This comment has been minimized.

@leouieda

leouieda Dec 3, 2015

Member

@drandykass it's better to avoid np for variable names since that's usually what people import numpy as. We've been changing this import in Fatiando lately to make code a bit nicer to read. So if you want to change the import numpy to import numpy as np in this module, please feel free. You'd need to replace all calls to numpy with np though.

This comment has been minimized.

@drandykass

drandykass Dec 3, 2015

Contributor

Good point. I'll make the appropriate modifications (I'll leave numpy as
is and just change the variable name).

On Thu, Dec 3, 2015 at 10:26 AM, Leonardo Uieda notifications@github.com
wrote:

In fatiando/gridder.py
#239 (comment):

@@ -439,3 +444,244 @@ def cut(x, y, scalars, area):
if x[i] >= xmin and x[i] <= xmax
and y[i] >= ymin and y[i] <= ymax]
return [x[inside], y[inside], [s[inside] for s in scalars]]
+
+def pad_array(a, xy=None, np=None, padtype='OddReflectionTaper'):

@drandykass https://github.com/drandykass it's better to avoid np for
variable names since that's usually what people import numpy as. We've been
changing this import in Fatiando lately to make code a bit nicer to read.
So if you want to change the import numpy to import numpy as np in this
module, please feel free. You'd need to replace all calls to numpy with np
though.


Reply to this email directly or view it on GitHub
https://github.com/fatiando/fatiando/pull/239/files#r46582582.


Dum inter homines sumus, colamus humanitatem.

@drandykass

This comment has been minimized.

Contributor

drandykass commented Dec 3, 2015

OK, I think she's ready to go--tests fine on my system. Pending review by others, the only thing I think that's left on the checklist is a changelog entry and contributor addition. Let me know if you find anything!

@leouieda

This comment has been minimized.

Member

leouieda commented Dec 3, 2015

@drandykass great! That was fast.

I took a quick look at the code and I have a few suggestions. Look into this with more care later on this week and get back to you.

@leouieda leouieda self-assigned this Dec 3, 2015

@drandykass

This comment has been minimized.

Contributor

drandykass commented Dec 3, 2015

Great. Let me know if you expect the function call structure to change.
I'm going to finish up the fourier domain calculation functions, which will
use these.

On Thu, Dec 3, 2015 at 1:49 PM, Leonardo Uieda notifications@github.com
wrote:

@drandykass https://github.com/drandykass great! That was fast.

I took a quick look at the code and I have a few suggestions. Look into
this with more care later on this week and get back to you.


Reply to this email directly or view it on GitHub
#239 (comment).


Dum inter homines sumus, colamus humanitatem.

pads = []
xy = np.zeros((len(x),2))
xy[:,0] = x[:]
xy[:,1] = y[:]

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

@drandykass the handling of the coordinate arrays seems a bit too complicated. Wouldn't it be better to pass along (and return) a list of coordinate arrays? That would also avoid the memory copying involved in creating a new 2D array (the list holds only references to the arrays, not their values).

This comment has been minimized.

@drandykass

drandykass Dec 4, 2015

Contributor

Sure, a list is fine. There was a reason I did it this way--couldn't tell
you what that reason was now though...

It returns a list of coordinate arrays so no reason to not expect a list
coming in. Wonder why I did it that way?

Anyway, I'll start a \todo list and add this to it.

On Fri, Dec 4, 2015 at 12:04 PM, Leonardo Uieda notifications@github.com
wrote:

In cookbook/grid_pad.py
#239 (comment):

+import numpy as np
+
+# Generate gridded data
+shape = (101,172)
+x,y,z = gridder.regular((-5000, 5000, -5000, 5000), shape, z=-150)
+model = [mesher.Prism(-4000, -3000, -4000, -3000, 0, 2000, {'density':1000}),

  •    mesher.Prism(-1000, 1000, -1000, 1000, 0, 2000, {'density':-900}),
    
  •    mesher.Prism(2000, 4000, 3000, 4000, 0, 2000, {'density':1300})]
    
    +gz = prism.gz(x, y, z, model)
    +gz = gz.reshape(shape)
    +
    +# Pad arrays with all the padding options
    +pads = []
    +xy = np.zeros((len(x),2))
    +xy[:,0] = x[:]
    +xy[:,1] = y[:]

@drandykass https://github.com/drandykass the handling of the
coordinate arrays seems a bit too complicated. Wouldn't it be better to
pass along (and return) a list of coordinate arrays? That would also avoid
the memory copying involved in creating a new 2D array (the list holds only
references to the arrays, not their values).


Reply to this email directly or view it on GitHub
https://github.com/fatiando/fatiando/pull/239/files#r46718529.


Dum inter homines sumus, colamus humanitatem.

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

👍 that happens a lot to me. One of the advantages of writing long git commit messages (though it's easier said than done).

@@ -80,7 +85,7 @@ def load_surfer(fname, fmt='ascii'):
# Read the number of columns (ny) and rows (nx)
ny, nx = [int(s) for s in ftext.readline().split()]
shape = (nx, ny)
# Read the min/max value of columns/longitue (y direction)
# Read the min/max value of columns/longitude (y direction)

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Thanks for the typo fix!

[ OddReflectionTaper | OddReflection | Reflection | value | LinTaper
| edge | mean ]
OddReflectionTaper - Generates odd reflection then tapers to the

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

I think it would be more consistent to have all names lowercase. That would also remove the need to the .lower() calls below.

LinTaper - Linearly tapers to the mean
value - numeric value

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Could you clarify the explanation here? It's not that clear that one should pass '0' and not 'value' for this option. I was a bit confused until I saw the cookbook. Maybe add the comments in the recipe that you use to explain this.

This comment has been minimized.

@drandykass

drandykass Dec 4, 2015

Contributor

Lower case names: sure. I'll leave in the .lower() though so that it
catches if someone has caps lock or whatever on.

Clarify explanation: Can do.

On Fri, Dec 4, 2015 at 12:13 PM, Leonardo Uieda notifications@github.com
wrote:

In fatiando/gridder.py
#239 (comment):

  •    dimension
    
  • * padtype : string (optional)
  •    String describing what to pad the new values with. Options:
    
  •    [ OddReflectionTaper | OddReflection | Reflection | value | LinTaper
    
  •    | edge | mean ]
    
  •        OddReflectionTaper - Generates odd reflection then tapers to the
    
  •        mean using a cosine function (Default)
    
  •        OddReflection - Pads with the odd reflection, with no taper
    
  •        Reflection - Pads with simple reflection
    
  •        LinTaper - Linearly tapers to the mean
    
  •        value - numeric value
    

Could you clarify the explanation here? It's not that clear that one
should pass '0' and not 'value' for this option. I was a bit confused
until I saw the cookbook. Maybe add the comments in the recipe that you use
to explain this.


Reply to this email directly or view it on GitHub
https://github.com/fatiando/fatiando/pull/239/files#r46719655.


Dum inter homines sumus, colamus humanitatem.

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

I'll leave in the .lower() though so that it catches if someone has caps lock or whatever on.

I'd think that would be a good reason to remove them. Kind of like how good password fields warn you that you caps on. But I'm okay with leaving them.

PS: You seen to be replying from the e-mails. I find it easier to view the comments on the website as the code appears right above them.

dimension. This is effectively a concatenated xp,yp, etc...
* npd : tuple (optional)
Optional tuple containing the total number of desired elements in each
dimension

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Could you add some text saying what happens if npd is None? I see you put it in a comment but the user wouldn't be able to see that.

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Optional tuple containing the total number of desired elements in each dimension.

Another way to say this with a more "numpy" feel would be: "Desired shape of the new padded array". Maybe put this in as well as the current description? This would benefit both users familiar with numpy and those a bit new still.

Returns None if xy not provided
* nps : list
List of tuples containing the number of elements padded onto each
dimension.

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Do you really need to return this? This argument is a bit confusing. I see that this is used to unpad the array later but this information is contained in the shape of the padded array, right?

# Test to make sure padtype is valid
padopts = ['','oddreflectiontaper','oddreflection', 'reflection',
'lintaper', 'edge', 'value', 'mean']

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

What happens if padtype='' (empty string)? I'm curious because you added it as a valid option.

This comment has been minimized.

@drandykass

drandykass Dec 4, 2015

Contributor

Returning nps:
It's a luxury--it would be encoded in the padded array, although if you had
an odd number of padding elements, you would need to know which side had
the extra element. You could find that out by comparing your coordinate
vectors, or by knowing which side the extra element is padded on. I
personally think it makes life easier, if a tad more messy.

padtype = ' ' Oops, that should not be a valid option. The default is
provided in the function definition, so there really should not be an
option here. I'll fix.

Other previous comments: Agreed.

On Fri, Dec 4, 2015 at 12:20 PM, Leonardo Uieda notifications@github.com
wrote:

In fatiando/gridder.py
#239 (comment):

  • Returns:
  • * ap : numpy array
  •    Padded array. The array core is a deep copy of the original array
    
  • * cp : list
  •    List of coordinate arrays containing the extrapolated coordinate values
    
  •    Returns None if xy not provided
    
  • * nps : list
  •    List of tuples containing the number of elements padded onto each
    
  •    dimension.
    
  • """
  • Test to make sure padtype is valid

  • padopts = ['','oddreflectiontaper','oddreflection', 'reflection',
  •        'lintaper', 'edge', 'value', 'mean']
    

What happens if padtype='' (empty string)? I'm curious because you added
it as a valid option.


Reply to this email directly or view it on GitHub
https://github.com/fatiando/fatiando/pull/239/files#r46720655.


Dum inter homines sumus, colamus humanitatem.

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

It's a luxury--it would be encoded in the padded array, although if you had an odd number of padding elements, you would need to know which side had the extra element. You could find that out by comparing your coordinate vectors, or by knowing which side the extra element is padded on. I personally think it makes life easier, if a tad more messy.

You're completely right! I hadn't thought of odd number padding. That's perfect the way it is then.

Maybe you could add a small example of how that argument looks? That might help clarify. It could even be a doctest like this one. That would help documenting and testing at the same time.

This comment has been minimized.

@drandykass

drandykass Dec 11, 2015

Contributor

Could you clarify what you mean by adding example of how the argument looks? I'm not sure what I'm looking at in the example. It just returns a list with each element containing a tuple with the number of cells padded on the lower and upper sides, respectively. Dimension order is the same as what was input.

raise ValueError(et)
else:
npt = npd
for ii in numpy.arange(len(npt)):

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Be careful of numpy.arange for looping over an index because it might not produce integers (that would break the indexing code below). It's always better to use range or xrange.

# [ ] if statements below - Finally add an elif to the if statements
# below. Add a descriptive comment.
if padtype.lower() not in padopts and not _is_number(padtype):
raise ValueError('gridder.pad_array: Pad Type not understood')

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

There is no need to add gridder.pad_array: to the exception. That information is already included in the traceback.

for ii in numpy.arange(len(npt)):
if npt[ii] <= a.shape[ii]:
# How are you supposed to properly assign a multi-line string?
et1 = 'gridder.pad_array: Desired padding is less than array '

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

First, you can remove gridder.pad_array: from the message.

Also, this might look better:

raise ValueError(
    "Very long error message " + 
    "that I'll split into multiple lines " +
    "using + operations.")
if _is_number(padtype):
# Pad with value
ap = numpy.pad(numpy.copy(a), nps, mode='constant',
constant_values = float(padtype))

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Doesn't numpy.pad already produce a copy? I would think so. Usually the inplace operations are done via array methods.

rp=nps[ii][1])
ap += m
if xy is not None:
cp = _padcoords(xy, a.shape, nps)

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Hum. This makes me think that maybe it would be simpler to just make _padcoords into a public pad_coords function that does the extending of the coordinate arrays. That would greatly simplify the code, calling and returning of this function.

Effectively, this is a complement to gridder.cut for when you already
know the number of elements to remove.
.. note: Unlike pad_array, this returns a slice of the input array.

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

If you replace Unlike pad_array, with

Unlike :func:`~fatiando.gridder.pad_array`,

Sphinx will replace that with a link the pad_array function docs.

coords = []
d = []
coordspad = []
for ii in numpy.arange(len(s)):

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Is ii something from Matlab or Fortran to avoid conflict with complex numbers? If so, that wouldn't be a problem in Python.

mesher.Prism(-1000,1000,-1000,1000,0,2000,{'density':-900}),
mesher.Prism(2000,4000,3000,4000,0,2000,{'density':1300})]
gz = prism.gz(x,y,z,model)
gz = gz.reshape(s)

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

The tests are great! Very comprehensive!

One this I'd change is this call to the forward modeling code. I don't think it's very good to rely on such complex code for a unit test. It would be much better to use a simpler function, like x**2 + y**3 or random numbers like below.

This comment has been minimized.

@drandykass

drandykass Dec 11, 2015

Contributor

Do you have objection to using the Rosenbrock function as an example? If so, I can easily write the rosenbrock function, or I can use the scipy.optimize package. I'd prefer the latter, but can understand if you don't want to tie to that. Let me know what you prefer.

def test_pad_and_unpad_equal_1d():
'gridder.pad_array and subsequent .unpad_array gives original array: 1D'
x = np.random.rand(21)

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

It's better to use a defined random seed in unit tests to make sure that you can always repeat a failing test later on.

gz = gz.reshape(s)
gpad,_,nps = gridder.pad_array(gz)
gunpad = gridder.unpad_array(gpad,nps)
assert_almost(gunpad,gz)

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Could you add a for loop here that checks all available padding options?

mesher.Prism(-1000,1000,-1000,1000,0,2000,{'density':-900}),
mesher.Prism(2000,4000,3000,4000,0,2000,{'density':1300})]
gz = prism.gz(x,y,z,model)
gz = gz.reshape(s)

This comment has been minimized.

@leouieda

leouieda Dec 4, 2015

Member

Same thing here about the use of prisms.

@leouieda

This comment has been minimized.

Member

leouieda commented Dec 4, 2015

One last thing, for some reason the PEP8 unit test is not catching the format errors as it should. You can run make pep8 to get a full report. It will tell what is off and what you have to do to fix it.

@drandykass

This comment has been minimized.

Contributor

drandykass commented Dec 4, 2015

Yes, I was replying to the emails--gmail was showing me the code though, so it was working ok.

I will apply all these suggestions, with a bit of a discussion on a few.
--The ii is an old habit to simply distinguish from imaginary numbers. Actually, any repeated variable in my code is guaranteed to be an indexing variable (just a personal style thing)

--I hear you on the pad_coords. I can certainly make that public. Do you think there's still value in having it be called from pad_array though, for simplicity to the end user? I'm ok either way.
Option a: the user sends their array and coordinate arrays to pad_array, though pad_coords is public for whomever wants to use it
Option b: the user sends an array to pad_array, and their coordinate vectors to pad_coords

--I'm not sure if numpy.pad produces a copy. I'll check and do what is appropriate (let's not make 2 copies!)

--Other than that, I agree 100% on everything else. I'll apply those changes, write up a changelog, and get that submitted this weekend sometime, and we can iterate and see if there's anything else. Thanks for that detailed review!

@leouieda

This comment has been minimized.

Member

leouieda commented Dec 8, 2015

The ii is an old habit to simply distinguish from imaginary numbers. Actually, any repeated variable in my code is guaranteed to be an indexing variable (just a personal style thing)

OK, not a problem. Just found it curious and I think I've seen it before (I think Valeria uses that sometimes).

I hear you on the pad_coords. I can certainly make that public. Do you think there's still value in having it be called from pad_array though, for simplicity to the end user? I'm ok either way. Option a: the user sends their array and coordinate arrays to pad_array, though pad_coords is public for whomever wants to use it. Option b: the user sends an array to pad_array, and their coordinate vectors to pad_coords.

I tend towards option b: pad_array only acts on the data array and the user calls pad_coords if he/she wants. That way pad_array doesn't need to know about coordinates at all and the code doesn't have to deal with optional arguments.

I'm not sure if numpy.pad produces a copy. I'll check and do what is appropriate (let's not make 2 copies!)

I ran some tests here and it does already make a copy.

Other than that, I agree 100% on everything else. I'll apply those changes, write up a changelog, and get that submitted this weekend sometime, and we can iterate and see if there's anything else. Thanks for that detailed review!

No problem! Reviewing is something that I really want to establish as a habit for Fatiando. You can pay me back with some reviews in the future 😉 Post a comment here when you push your updates so that I get a notification.

@leouieda

This comment has been minimized.

Member

leouieda commented Dec 9, 2015

@drandykass I just merged in some changes to fix a few problems with the documentation build. Could you please merge these into your branch?

@leouieda leouieda force-pushed the drandykass:pad_arrays branch from a86a6f7 to dec6aa1 May 6, 2016

@leouieda leouieda added this to the 0.5 milestone May 6, 2016

@leouieda

This comment has been minimized.

Member

leouieda commented May 6, 2016

@drandykass I think I've got things working. For the record, what I did was:

  1. Got a clone of your fork

  2. Registered fatiando/fatiando as a remote option:

    git remote add upstread git@github.com:fatiando/fatiando.git
    
  3. Pull in changes from upstream (fatiando/fatiando) to your master branch:

    git checkout master
    git pull upstream master
    
  4. Update the master branch on your fork:

    git push origin master
    
  5. Rebase your branch on the current master branch. This applies the commits
    from your branch on top of the current master. What happens is that this ends
    up rewriting all your commits (keeping same message etc, just the hash
    changes).

    git checkout pad_arrays
    git rebase master
    
  6. Fix all the merge conflicts that appeared (only the changelog and
    contributors).

  7. Force push to you branch the new commits, overwriting the old ones:

    git push -f origin pad_arrays
    
@leouieda

This comment has been minimized.

Member

leouieda commented May 6, 2016

Now everything is up dated and passing tests. So I'll happily merge this! Congrats and thank you!

@leouieda leouieda merged commit d41d2cd into fatiando:master May 6, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.5%) to 70.432%
Details
@drandykass

This comment has been minimized.

Contributor

drandykass commented May 6, 2016

Thanks for all your help and hand-holding! Now I need the fourier forward
modeling done!

On Thu, May 5, 2016 at 6:12 PM, Leonardo Uieda notifications@github.com
wrote:

Merged #239 #239.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#239 (comment)


Dum inter homines sumus, colamus humanitatem.

@leouieda

This comment has been minimized.

Member

leouieda commented May 6, 2016

Not a problem! One thing I figured out these past 2 years as a prof is that I love teaching.

Welcome (officially) to the contributors list! http://www.fatiando.org/dev/contributors.html

mtb-za added a commit to mtb-za/fatiando that referenced this pull request May 31, 2016

Add array padding functions to gridder (fatiando#239)
Three new functions in gridder: pad_array, unpad_array, pad_coords.
Extend a given grid using various methods to pad values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment