# Aliasing and Varying Behavior of Different Data Types when Passed to Functions

This Jupyter notebook discusses two topics that, on their face, may seem unrelated. In fact, the same factor determines whether aliasing occurs and how data behaves when variables are passed to functions.

This factor is whether a variable behaves as if its memory address (i.e., a reference) is handed over, or as if a copy of its value(s) is handed over, both in assignment statements and when the variable is passed to a function. When a reference is handed over, an assignment statement can create an alias, and a function can change the caller's data directly; when a copy is handed over, neither happens.

What exactly is aliasing?  Let's answer that question by demonstration.

# Aliasing <a name = 'aliasing' />

Aliasing is the act of referring to the same memory location (the same data) with different variable names. It most often happens unintentionally and can lead to a frustrating debugging adventure.

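Aliasing does not cause problems with <font face='courier'>int</font> and <font face='courier'>float</font> data types. In the cells below, <code>q = p</code> makes both names refer to the same value, but <font face='courier'>int</font> and <font face='courier'>float</font> values are immutable: <code>p = 2</code> does not change that value in memory; it makes <code>p</code> refer to a new one. So although the value of <code>q</code> is set using <code>p</code>, we can change their values independently, as we might expect.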

In [None]:
p = 1
q = p
p = 2
print(f'p = {p}; q = {q}')

In [None]:
p = 1.0
q = p
p = 2.0
print(f'p = {p}; q = {q}')
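The `p`/`q` cells above can be examined more closely with Python's `is` operator, which reports whether two names refer to the same object. In this minimal sketch (the names `a` and `b` are just illustrative), `q = p` binds both names to one object; `p = 2` and `b += 1.0` do not alter that object but rebind a name to a new one, which is why the other name keeps its value.

```python
# Sketch: use `is` to see when two names refer to the same object
p = 1
q = p
same_before = q is p   # True: both names are bound to the same int object
p = 2                  # rebinds p to a different int; the object 1 is not changed
same_after = q is p    # False: p and q now refer to different objects
print(f'before: {same_before}; after: {same_after}; p = {p}; q = {q}')

a = 1.0
b = a                  # b and a are bound to the same float object
b += 1.0               # floats are immutable, so += creates a new float and rebinds b
print(f'a = {a}; b = {b}; same object: {b is a}')
```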

Aliasing can occur, however, with these data types:

- <code>list</code>
- <code>dict</code> (dictionary)
- <code>numpy</code> arrays
- <code>pandas DataFrame</code>s and <code>Series</code>

When we assign a list to another variable, both names point to the same data, which is to say that they both point to the same place in memory.

In [None]:
x = [1,2]
y = x
y[0] = 99
print(f'x = {x}; y = {y}')
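The same `is` check makes the list aliasing explicit. A small sketch: `y is x` is `True`, and a change made through `y` with the list method `append()` is visible through `x`.

```python
# Sketch: confirm that two list names refer to one list object
x = [1, 2]
y = x
print(y is x)           # True: one list object with two names
print(id(y) == id(x))   # True: identical identities
y.append(3)             # in-place change made through y...
print(x)                # ...is visible through x: [1, 2, 3]
```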

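Aliasing occurs with <code>numpy</code> when an assignment statement is used and its left-hand side results in the creation, or re-creation, of a complete <code>numpy</code> variable (rather than an indexed portion of a <code>numpy</code> array). The new variable then refers either to the same array as the right-hand side or, when the right-hand side is a basic index or slice such as <code>arr[2]</code>, to a view that shares that array's memory.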

In [None]:
import numpy as np

In [None]:
arr = np.arange(12).reshape(3,4)
arr

In [None]:
arr2 = arr
arr2[1] = np.zeros(arr.shape[1])
print(f'arr2 = \n{arr2}\n')
print(f'arr = \n{arr}')

In [None]:
arr3 = arr[2]
arr3[2] = 99

print(f'arr3 = \n{arr3}\n')
print(f'arr2 = \n{arr2}\n')
print(f'arr = \n{arr}')
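The `arr3 = arr[2]` cell aliases `arr` because basic indexing (an integer or a slice) returns a view of the original data. A sketch using `np.shares_memory` to check this; the names `row_view` and `row_copy` are illustrative. For contrast, indexing with a list (advanced indexing) returns a copy.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

row_view = arr[2]      # basic indexing: a view into arr's memory
row_copy = arr[[2]]    # advanced (list) indexing: a copy, shape (1, 4)

print(np.shares_memory(arr, row_view))  # True
print(np.shares_memory(arr, row_copy))  # False

row_view[0] = -1       # writes into arr's memory
row_copy[0, 1] = -2    # writes only into the copy
print(arr[2].tolist()) # [-1, 9, 10, 11]
```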

In [None]:
arr4 = np.zeros((4,4))
arr4

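Note that aliasing does not occur in the example below, where the left-hand side of the assignment statement refers to a slice of a <code>numpy</code> array (<code>arr4[2]</code>): the values of <code>arr3</code> are copied into <code>arr4</code>, so changing <code>arr4</code> afterward leaves <code>arr3</code> alone.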

In [None]:
arr4[2] = arr3
arr4[2,0] = 99

print(f'arr4 = \n{arr4}\n')
print(f'arr3 = \n{arr3}\n')
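Assigning into an indexed position differs from assigning to a bare name: `numpy` copies the element values into the existing array's memory, converting them to that array's `dtype` if needed. A sketch with illustrative names `source` and `target`:

```python
import numpy as np

source = np.array([1, 2, 3, 4])
target = np.zeros((4, 4))   # float64 array

target[2] = source          # element values are copied into row 2 of target
target[2, 0] = 99           # changes target only

print(source.tolist())                   # [1, 2, 3, 4]
print(target[2].tolist())                # [99.0, 2.0, 3.0, 4.0]
print(np.shares_memory(target, source))  # False: separate memory
```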


Be careful with functions that return <code>numpy</code> arrays or portions of them, because the returned value is usually assigned to a variable, and that assignment statement can create an alias just like the ones above.

In this example, the fitness of an individual in the population is simply the number of ones that it contains. No <code>mutate()</code> function is defined here; instead, a later cell changes one gene by hand to stand in for a mutation. The <code>stat()</code> function keeps track of the best fitness score and of the solution that produced it.

In [None]:
def fit(pop):
    return pop.sum(axis=1)

def stat(pop, fitness, best_fit, best_soln):
    if fitness.max() > best_fit:
        return fitness.max(), pop[fitness.argmax()]
    else:
        return best_fit, best_soln

In [None]:
# Initialize population
pop = np.array([[0,0,0,0,0],
                [1,0,0,0,0],
                [1,1,0,0,0]])
best_fit = 0
best_soln = np.array([0,0,0,0,0])  # one gene per column of pop

# Select parents

# Crossover

# Mutate

# Compute fitness
fitness = fit(pop)

# Keep track of best solution
best_fit, best_soln = stat(pop, fitness, best_fit, best_soln)
print(f'best_soln: {best_soln}')

In [None]:
pop[2,0] = 0

In [None]:
pop

In [None]:
best_soln

Changing the population (here, <code>pop[2,0] = 0</code>) unintentionally altered the variable <code>best_soln</code> as well, even though <code>best_soln</code> was never modified directly.

Can anybody see where/how aliasing happened in the code above?


# What Causes Aliasing with Some Variable Types

Aliasing is the situation in which two different variable names refer to the same data in memory. That is, both variables hold a reference to the same memory address, so when you alter the data through one variable, the change is also visible through the other.

Aliasing in an assignment statement is caused, when it does occur, by the right-hand side handing the left-hand side a reference to a memory location rather than a copy of a variable's value(s). These two cases are called "passing a variable by reference" and "passing a variable by value," respectively, and aliasing is associated with the former. Strictly speaking, Python always passes a reference; what differs between data types is whether the data can be changed in place. Immutable types such as <code>int</code> and <code>float</code> can only be replaced, so they behave as if passed by value, while mutable types such as <code>list</code>, <code>dict</code>, and <code>numpy</code> arrays behave as if passed by reference.
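The practical distinction is between rebinding a name and modifying an object in place. A minimal sketch with lists (variable names are illustrative): `+` builds a new list, so rebinding `y1` leaves `x1` alone, while `+=` on a list extends the existing object, so `x2` sees the change.

```python
# Rebinding: `+` builds a new list, and y1 is rebound to it
x1 = [1, 2]
y1 = x1
y1 = y1 + [3]
print(x1, y1)      # [1, 2] [1, 2, 3]

# In-place change: `+=` on a list extends the existing list object
x2 = [1, 2]
y2 = x2
y2 += [3]
print(x2, y2)      # [1, 2, 3] [1, 2, 3]
```

Since an `int` or `float` cannot be changed in place, the only way to "change" one is to rebind the name, which is why those types never show aliasing.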

An incomplete list of how variables are passed, in assignment statements and to functions, is shown below.

|Passed by Value (immutable)|Passed by Reference (mutable)|
|---|---|
|<code>int</code>|<code>list</code>|
|<code>float</code>|<code>dict</code>|
||<code>numpy</code> array|
||<code>pandas DataFrame</code> and <code>Series</code>|

# Avoiding Aliasing

Avoiding aliasing requires assigning a "copy" of a variable to another variable, and how to make that copy depends on the data type. Copying forces the information flowing from the right-hand side to the left-hand side of an assignment statement to be values rather than a reference, so the new variable gets its own copy of the elements instead of pointing at the same memory location as the original.

Here are a couple of strategies for lists: the <code>.copy()</code> method and a full slice (<code>[:]</code>).

In [None]:
x = [1,2]
y = x
y[0] = 99
print(f'x = {x}; y = {y}')

In [None]:
x = [1,2]
y = x.copy()
y[0] = 99
print(f'x = {x}; y = {y}')

In [None]:
x = [1,2]
y = x[:]
y[0] = 99
print(f'x = {x}; y = {y}')
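Both `.copy()` and `[:]` make shallow copies: a new outer list whose elements are the same objects as before. That is enough for a list of ints, as above, but a list of lists still shares its inner lists. A sketch using the standard library's `copy.deepcopy`, which also copies the nested objects (names are illustrative):

```python
import copy

x = [[1, 2], [3, 4]]      # a list whose elements are themselves lists
shallow = x.copy()        # new outer list, but the inner lists are shared (same as x[:])
deep = copy.deepcopy(x)   # new outer list AND new inner lists

shallow[0][0] = 99        # modifies an inner list that x also refers to
print(x)                  # [[99, 2], [3, 4]]

deep[1][0] = -1           # modifies only deep's own inner list
print(x)                  # [[99, 2], [3, 4]]
print(deep)               # [[1, 2], [-1, 4]]
```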

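Resolving aliasing with <code>numpy</code> also uses <code>.copy()</code>. The first two cells below reproduce the problem; the next two re-create <code>arr</code> and apply the fix.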

In [None]:
arr = np.arange(12).reshape(3,4)
arr

In [None]:
arr2 = arr
arr2[1] = np.zeros(arr.shape[1])
print(f'arr2 = \n{arr2}\n')
print(f'arr = \n{arr}')

In [None]:
arr = np.arange(12).reshape(3,4)
arr

In [None]:
arr2 = arr.copy()
arr2[1] = np.zeros(arr.shape[1])
print(f'arr2 = \n{arr2}\n')
print(f'arr = \n{arr}')

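Note that, unlike with lists, using a slice does not avoid aliasing in <code>numpy</code>: slicing a <code>numpy</code> array returns a view that shares the original array's memory rather than a copy.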

In [None]:
arr2 = arr[:,:]
arr2[1] = np.zeros(arr.shape[1])
print(f'arr2 = \n{arr2}\n')
print(f'arr = \n{arr}')
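A quick way to tell a view from a copy is `np.shares_memory`. This sketch (names are illustrative) shows that `arr[:, :]` is a different array object that still shares `arr`'s memory, whereas `arr.copy()` and `np.array(arr)` (which copies its input by default) do not.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

full_slice = arr[:, :]    # a view: new array object, same underlying data
explicit = arr.copy()     # a copy with its own data buffer
via_ctor = np.array(arr)  # np.array copies its input by default

print(full_slice is arr)                  # False: a different array object...
print(np.shares_memory(arr, full_slice))  # True: ...that shares arr's memory
print(np.shares_memory(arr, explicit))    # False
print(np.shares_memory(arr, via_ctor))    # False
```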

The cells below show how to avoid aliasing in the previous example with the function <code>stat()</code>, which now returns a <code>.copy()</code> of the best row.

In [None]:
def fit(pop):
    return pop.sum(axis=1)

def stat(pop, fitness, best_fit, best_soln):
    if fitness.max() > best_fit:
        return fitness.max(), pop[fitness.argmax()].copy()
    else:
        return best_fit, best_soln

In [None]:
# Initialize population
pop = np.array([[0,0,0,0,0],
                [1,0,0,0,0],
                [1,1,0,0,0]])
best_fit = 0
best_soln = np.array([0,0,0,0,0])  # one gene per column of pop

# Select parents

# Crossover

# Mutate

# Compute fitness
fitness = fit(pop)

# Keep track of best solution
best_fit, best_soln = stat(pop, fitness, best_fit, best_soln)
print(f'best_soln: {best_soln}')

In [None]:
pop[2,0] = 0
print(f'pop = \n{pop}\n')
print(f'best_soln = \n{best_soln}')
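Rather than relying on printed values, the fix can be confirmed with `np.shares_memory`. In this sketch, the one line that differs between the two versions of `stat()` is isolated in two hypothetical helpers, `best_row_view()` and `best_row_copy()`; indexing with the integer returned by `argmax()` is basic indexing, so without `.copy()` the result is a view of `pop`.

```python
import numpy as np

def best_row_view(pop, fitness):
    # Original stat() behavior: basic indexing returns a view of pop
    return pop[fitness.argmax()]

def best_row_copy(pop, fitness):
    # Revised stat() behavior: .copy() gives the row its own memory
    return pop[fitness.argmax()].copy()

pop = np.array([[0, 0, 0, 0, 0],
                [1, 0, 0, 0, 0],
                [1, 1, 0, 0, 0]])
fitness = pop.sum(axis=1)

view_soln = best_row_view(pop, fitness)
copy_soln = best_row_copy(pop, fitness)
print(np.shares_memory(pop, view_soln))  # True
print(np.shares_memory(pop, copy_soln))  # False

pop[2, 0] = 0                 # change the population in place
print(view_soln.tolist())     # [0, 1, 0, 0, 0]: changed along with pop
print(copy_soln.tolist())     # [1, 1, 0, 0, 0]: unchanged
```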

# When is Aliasing Useful?

In my experience, I have never needed to use it on purpose. It most often happens unintentionally and, because it is not intuitive behavior, it can take a long time to diagnose and resolve.

# Function Arguments by Value and by Reference

This topic is closely related to aliasing because the underlying factor is the same: whether a function receives a copy of a variable's values or a pointer to its memory location (i.e., a reference).

Bottom line: if a variable is passed to a function by value and the function changes it, the new value must be returned to the calling program. A variable passed by reference, on the other hand, can be modified in place inside a function without returning its revised value to the calling program.
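One caveat to the "passed by reference" case: a function changes the caller's data only when it modifies the object in place. Assigning a new object to the parameter name inside the function rebinds the local name and leaves the caller's variable untouched. A minimal sketch with two hypothetical functions, `rebind()` and `modify_in_place()`:

```python
def rebind(lst):
    lst = [0, 0]       # binds the local name to a new list; the caller's list is untouched

def modify_in_place(lst):
    lst[0] = 0         # changes the very list object the caller passed in

data_a = [1, 2]
rebind(data_a)
print(data_a)          # [1, 2]

data_b = [1, 2]
modify_in_place(data_b)
print(data_b)          # [0, 2]
```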

An incomplete list of how variables are passed to functions is shown below.

|Passed by Value (immutable)|Passed by Reference (mutable)|
|---|---|
|<code>int</code>|<code>list</code>|
|<code>float</code>|<code>dict</code>|
||<code>numpy</code> array|
||<code>pandas DataFrame</code> and <code>Series</code>|

Note that if a new variable is created within a function, it must be passed back to the calling program via a <code>return</code> statement. Note also that it is moot to discuss <code>tuple</code>s, since they are immutable and their elements cannot be reassigned within a function or by any other means.

In [None]:
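def change_int_float(x):
    x += 1                     # rebinds the local name x only; the caller's value is unchanged

def change_dictionary(x):
    x['hello'] = 'good bye'    # modifies the caller's dictionary in place

def change_numpy(x):
    x[0,0] = 99                # modifies the caller's array in place
    return x                   # not needed: the caller already sees the change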

In [None]:
import numpy as np

my_int = 3
print(f'my_int starting value: {my_int}')
change_int_float(my_int)
print(f'my_int ending value: {my_int}\n\n')

my_float = 5.0
print(f'my_float starting value: {my_float}')
change_int_float(my_float)
print(f'my_float ending value: {my_float}\n\n')

my_dictionary = {'hello':'greeting'}
print(f'my_dictionary starting value: {my_dictionary}')
change_dictionary(my_dictionary)
print(f'my_dictionary ending value: {my_dictionary}\n\n')

my_np_array = np.arange(16).reshape(4,4)
print(f'my_np_array starting value: \n{my_np_array}')
change_numpy(my_np_array)
print(f'my_np_array ending value: \n{my_np_array}')
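Because `change_int_float()` cannot change the caller's `int` or `float`, the usual pattern is to return the new value and reassign it in the calling program. A minimal sketch using a hypothetical `increment()` in place of `change_int_float()`:

```python
def increment(x):
    # int and float are immutable, so compute and return a new value
    return x + 1

my_int = 3
my_int = increment(my_int)      # rebind the caller's name to the returned value
print(f'my_int ending value: {my_int}')      # 4

my_float = 5.0
my_float = increment(my_float)
print(f'my_float ending value: {my_float}')  # 6.0
```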