# Goals of this Notebook

To learn or recall:
- Iteration through elements in data structures
  - <code>for</code> loops
  - <code>while</code> loops
  - When each type of loop is best
    - Accessing data in variables
    - Altering data in variables
    - Creating variables
- Alternatives for creating variables that are collections of elements
  - List comprehension
  - Dictionary comprehension
  - Set comprehension
- Other general functions
  - Ternary statements
- Helpful function for <code>for</code> loops
  - <code>enumerate()</code>
- Other functions for handling variables that are collections of elements
  - <code>zip()</code> Function
  - <code>map()</code> Function
  - <code>filter()</code> Function
  
While the <font face='courier'>numpy</font> and <font face='courier'>pandas</font> packages have utilities for reading data into <code>arrays</code> and <code>DataFrames</code>, respectively, programmers often need to read data files with base Python statements, so we will demonstrate how to do that.  This knowledge is also essential for understanding what <font face='courier'>numpy</font> and <font face='courier'>pandas</font> are doing under the covers. In addition, data input provides a good opportunity for gaining experience with <code>for</code> loops, which you will use in virtually every program you write.   

Moreover, although the <font face='courier'>numpy</font> and <font face='courier'>pandas</font> packages ease data acquisition, they hide details that are important to know.  If you understand how to input data with base Python, you will be in a better position to troubleshoot problems with inputting data with <font face='courier'>numpy</font> and <font face='courier'>pandas</font>.

  
# Topics & Navigation <a name='navigation' />
- <a href = #ItIt>Iteration and Iterables</a>
- <a href = #data_for_loop>Text File Input with For Loops</a>
- <a href = #OtherComp>Comprehension Methods</a>
  - <a href = #ListComprehension>List Comprehension</a>
  - <a href = #DictComp>Dictionary Comprehension</a>
  - <a href = #SetComp>Set Comprehension</a>
- <a href = #enumerate>For loops with <code>enumerate</code></a>
- <a href = #zip_map_filter><code>zip()</code>, <code>map()</code>, and <code>filter()</code> Functions</a>
  - <a href = #asterisk_op><code>*</code> Operator</a>
  - <a href = #zip><code>zip()</code> Function</a>
  - <a href = #map><code>map()</code> Function</a>
  - <a href = #filter><code>filter()</code> Function</a>
  - <a href = #apps_zip_map_filter>Applications of <code>zip()</code>, <code>map()</code>, and <code>filter()</code></a>
- <a href = #aliasing>Aliasing</a>
- <a href = #RefVal>Passing References and Values</a>
- <a href = #Ternary> If-Else (Ternary) Statement</a>


# Iteration and Iterables<a name='ItIt' />
Back to <a href = '#navigation'>Navigation</a>

## _Iterables_   <a name='Iterables' />

We are familiar with <code>for</code> loops, which provide a basis for an effective functional definition of an _iterable_ data type, which is a data type that:

- can contain 0, 1, or many elements
- can be used with a <code>for</code> statement so that each element of the _iterable_ data type is made available in the iterations of the <code>for</code> loop

<code>for x in _put-iterable-here_:
    ...</code>
    
These are well-known basic Python _iterable_ data types:

- list
- range()
- tuple
- dictionary
- set
- string

The term _iterable_ is often used as a noun, as a short form of '_iterable_ data type'.

The cells below illustrate how these data types are iterable in <code>for</code> loops.

We will later show other data types that are iterable and, therefore, can be included in <code>for</code> loops and in list comprehension statements.

Computers are powerful because they can iterate through large data sets and make computations. Iterable data and loops are the programming components that enable that capability.

### Looping by element

In [None]:
my_list = [0, 1, 2]
my_range = range(3)
my_tup = (0, 1, 2)
my_dct = {0:'zero', 1:'one', 2:'two'}
my_set = {0, 1, 2}
my_str = 'hello'

In [None]:
for e in my_list:
    print(e)

In [None]:
for e in my_range:
    print(e)

In [None]:
for e in my_tup:
    print(e)

In [None]:
for e in my_dct.items():
    print(e)

In [None]:
for e in my_set:
    print(e)

In [None]:
for e in my_str:
    print(e)

### Looping by index

In [None]:
for i in range(len(my_list)):
    print(my_list[i])

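Looping by index is required when the elements of a list are being changed in place.  Assigning to the loop variable in an element-wise loop only rebinds that variable and leaves the list untouched, as the first loop below shows.  The second loop assigns through the index, so it changes the list.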

In [None]:
this_list = ['1','2','3','4','5']

In [None]:
for element in this_list:
    element = int(element)
this_list

In [None]:
for i in range(len(this_list)):
    this_list[i] = int(this_list[i])
this_list

##  Dictionaries <a name = "dictionaries" />
Back to <a href = '#navigation'>Navigation</a>

Dictionaries:
- Are indicated by curly braces
- Contain elements separated by commas, each of which has two components separated by a colon
- Dictionary element _keys_ are indicated before the colon
- Dictionary element _values_ are indicated after the colon
- Each key is unique: only one element of a dictionary may have any particular key
- We often think of a dictionary key as a way to look up a dictionary value

You can simply use a key to look up a dictionary value or change the value.

In [None]:
# Example dictionary
myDiction = {0:'cero', 1:'uno', 2:'sod', 3:'tres'}

Referring to a dictionary element by _key_

In [None]:
print(myDiction[2])

A dictionary value associated with a key is created or changed by placing the reference to the dictionary element on the left side of an assignment statement.

In [None]:
myDiction[2] = 'dos'
print(myDiction)

In [None]:
myDiction[4] = 'cuatro'

## Looping through Dictionaries

We can iterate through dictionaries in three modes:
- By keys
- By values
- By keys and values

Dictionaries are by default _iterated_ through by key, which means that we access all of the unique keys in one manner or another.  Two of the ways are illustrated below.  The first shows that when we simply put the dictionary name in a for statement, we iterate through the keys by default.  We can also explicitly use the <font face = 'courier'>.keys()</font> method of dictionaries to remind ourselves that we are iterating through the keys: this is the second approach below.

In [None]:
for key in myDiction:
    print(key)

In [None]:
for k in myDiction.keys():
    print(k)

In [None]:
for v in myDiction.values():
    print(v)

A frequently used way of looping explicitly through both the keys and values in a dictionary uses the <font face = 'courier'>.items()</font> method of dictionaries, as follows.  When two dummy variables are used in the for statement, each key and value pair is _unpacked_ into the two variables.

In [None]:
for key,value in myDiction.items():
    print(key,value)

Or, you can use this more succinct code.  As you get more experience it will be obvious that you are dealing with a dictionary and <font face='courier'>k</font> is the key and <font face='courier'>v</font> is the value.

In [None]:
for k,v in myDiction.items():
    print(k,v)

# Data Input with For Loops <a name='data_for_loop' />
Back to <a href = '#navigation'>Navigation</a>

## Full Disclosure

As discussed previously, there exist easier ways to read data files into Python with <font face='courier'>numpy</font> and <font face='courier'>pandas</font> than using <code>for</code> loops, as shown below.

It is best in many circumstances, however, to input data with base Python commands, and avoiding this technique will be to your detriment later on.  Besides letting you input data and learn basic Python text cleansing at the same time, competence with <code>for</code> loops is an important basis for learning:
- List comprehension
- Dictionary and set comprehension
- Generators and iterators

In [None]:
import numpy as np
import pandas as pd

In [None]:
np_nw1999 = np.genfromtxt('files/NorfolkWeather1999.csv', delimiter = ',')
np_nw1999[:5]

In [None]:
df_nw2018 = pd.read_csv('files/NorfolkWeather2018.csv', names = ['index', 'temp'], index_col = 'index')
df_nw2018[:5]

# Handling Data with Basic Python


This first statement imports <font face = 'courier'>matplotlib</font>.  

In [None]:
import matplotlib.pyplot as plt 

## 1D Data


* In a nutshell
  * 1-dimensional time series data
  * Inputting data from text file
  * Computing moving average

_Time series data_ are data that are collected over time at constant intervals.  Examples include daily or hourly weather data, daily sales of an item or total company sales (in units or dollar volume), the number of daily visits to a web site, and annual gross domestic product.

One issue that arises with time series data is how to summarize the pattern over time despite frequent highs and lows (noise) and highs and lows over longer periods (either economic cycles or seasonal effects).  The _moving average_ is frequently used to summarize a trend that is muddied by short-term noise.

We will work with inputting two 1D time series from two text files.
- <code>NorfolkWeather1999_1.csv</code>
- <code>NorfolkWeather2018_1.csv</code>

Note that data read from text files is always interpreted as string data.  It must be converted to numerical quantities for plotting. 
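
The point about string data can be checked without the weather files.  The sketch below uses <code>io.StringIO</code> as a stand-in for an opened file; the three temperature values are made up for illustration and do not come from the course data.

```python
import io

# Hypothetical in-memory stand-in for a one-column temperature file
fake_file = io.StringIO("41.5\n39.0\n44.2\n")
raw_lines = fake_file.readlines()

print(raw_lines)            # every element is a string that ends in a newline
print(type(raw_lines[0]))

# float() ignores surrounding whitespace, including the trailing newline
temps = []
for line in raw_lines:
    temps.append(float(line))
print(temps)
```

Trying to plot or sum <code>raw_lines</code> directly would operate on text rather than numbers, which is why the conversion step appears in every input cell that follows.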

In [None]:
f = open('files/NorfolkWeather1999_1.csv','r')
nw1999 = f.readlines()
f.close()

for i in range(len(nw1999)):
    nw1999[i] = float(nw1999[i])

x = range(len(nw1999))
plt.plot(x, nw1999, label = '1999')
plt.show()

Note the improved method of opening files for input below.  No <code>f.close()</code> statement is needed as the file is implicitly closed after completion of the code within the <code>with</code> block.  This approach is less prone to errors.  The same method can be used also when writing to files.
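
As a rough sketch of the same pattern for output, the cell below writes a few values with a <code>with</code> block and reads them back.  The file name <code>temps_out.csv</code>, the temporary directory, and the values are assumptions made for this illustration only.

```python
import os
import tempfile

# Hypothetical output path inside a throwaway temporary directory
out_path = os.path.join(tempfile.mkdtemp(), 'temps_out.csv')
temps = [41.5, 39.0, 44.2]

# Mode 'w' opens the file for writing; the file is closed when the block ends
with open(out_path, 'w') as f_out:
    for t in temps:
        f_out.write(str(t) + '\n')

print(f_out.closed)   # the file object reports that it is closed

# Read the file back to confirm what was written
with open(out_path, 'r') as f_in:
    lines = f_in.readlines()
print(lines)
```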

In [None]:
with open('files/NorfolkWeather1999_1.csv','r') as f:
    nw1999 = f.readlines()

for i in range(len(nw1999)):
    nw1999[i] = float(nw1999[i])

x = range(len(nw1999))
plt.plot(x, nw1999, label = '1999')
plt.show()

We can iterate through copies of the values with a <code>for</code> loop in this situation as well, as long as we accumulate the converted data in another variable with <code>.append</code>.

In [None]:
with open('files/NorfolkWeather1999_1.csv','r') as f:
    data = f.readlines()

nw1999 = []
for t in data:
    nw1999.append(float(t))

x = range(len(nw1999))
plt.plot(x, nw1999, label = '1999')
plt.show()

## A 1D Data Application: The Moving Average

Let our time series data be collected for time indices $d \in \left\lbrace 0, ... , D-1 \right\rbrace$, where $d$ stands for the index for a given day. We will be considering daily time series for a year, so $D=365$.  A moving average, $ma_n(d)$ of $n$ days for day $d$ is defined as:

$ma_n(d) = \frac{1}{n} \sum_{i = d-n+1}^{d} t(i)$

where $t(i)$ is the average temperature for day $i$.  Note that $ma_n(d)$ is defined only for days in this range: $d \in \left\lbrace n-1, \ldots , D-1 \right\rbrace$.
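
To make the index range concrete, here is a small worked sketch of the formula with $n = 3$ on five made-up values (not the Norfolk data).  The first average is available at $d = n - 1 = 2$.

```python
# Hypothetical temperatures t(0), ..., t(4)
t = [10.0, 20.0, 30.0, 40.0, 50.0]
n = 3

ma = []
for d in range(n - 1, len(t)):        # d = n-1, ..., D-1
    window = t[d - n + 1 : d + 1]     # t(d-n+1), ..., t(d)
    ma.append(sum(window) / n)

print(ma)   # [20.0, 30.0, 40.0]
```

The result has $D - n + 1$ entries, one for each day on which a full window of $n$ values exists.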

*Click on the image below* to see a visualization of how a moving average is computed in a way that may be more clear than the math above.

[![Moving Average Animation](https://img.youtube.com/vi/rq__3P_rs4M/0.jpg)](https://www.youtube.com/watch?v=rq__3P_rs4M)

The files <font face = 'courier'>NorfolkWeather1999_1.csv</font> and <font face = 'courier'>NorfolkWeather2018_1.csv</font>  have the daily average temperatures for the years indicated in the filenames.  Load each of these text files into a list (load only the daily average temperature and not the row index), and then create new lists, <font face = 'courier'>ma1999</font> and <font face = 'courier'>ma2018</font>, that are 10-day moving averages of the original data.

Although we are not focusing on plotting in this session, we will compare the two moving average series using the <font face = 'courier'>matplotlib</font> package.

Let's compute and plot the moving averages of the two data series using <font face = 'courier'>for</font> loops and compare the two results.

Read in the 2018 data.

In [None]:
with open('files/NorfolkWeather2018_1.csv','r') as f:
    nw2018 = f.readlines()

for i in range(len(nw2018)):
    nw2018[i] = float(nw2018[i])

x = range(len(nw2018))
plt.plot(x, nw2018, label = '2018')
plt.show()

In [None]:
''' period: 
      * a variable representing the number of periods that are averaged together to create
        the moving average 
      * Given the zero-based indexing of Python, the first moving average can be computed
        in Period period - 1 '''
period = 10

ma1999 = []  # Initialize moving average list for 1999
ma2018 = []  # Initialize moving average list for 2018

for i in range(0, len(nw1999) - period + 1):
    ma1999.append(sum(nw1999[i:i + period])/period)
    ma2018.append(sum(nw2018[i:i + period])/period)

plt.plot(ma1999, label = '1999')
plt.plot(ma2018, label = '2018')
plt.legend()

plt.show()

In [None]:
plt.plot(ma1999, label = '1999 Avg.')
plt.plot(nw1999, label = '1999 Temps')
plt.legend()

plt.show()

## 2D Data Input

These files have two data elements per line, an integer index and a temperature, separated by commas.

- <code>NorfolkWeather1999.csv</code>
- <code>NorfolkWeather2018.csv</code>

We'll need to take extra care to split the elements of each line at the commas before we convert the data to numerical values.  __Please note that we do not split the data when there exists only one element per row, but we do need to split row elements apart when multiple are in each row of data.__
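
The strip, split, and convert steps can be seen on a single line of text.  The line below is a made-up example in the same index-and-temperature layout, not a row from the actual files.

```python
# Hypothetical line as readlines() would return it
line = '17,43.6\n'

stripped = line.strip()        # remove the trailing newline
fields = stripped.split(',')   # break the text apart at the comma
print(fields)                  # ['17', '43.6'] (still strings)

index = int(fields[0])         # convert each piece to the type we need
temp = float(fields[1])
print(index, temp)
```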

In [None]:
with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()

x = []
y1999 = []

for i in range(len(nw1999)):
    nw1999[i] = nw1999[i].strip()
    nw1999[i] = nw1999[i].split(',')
    x.append(int(nw1999[i][0]))
    y1999.append(float(nw1999[i][1]))

plt.plot(x, y1999, label = '1999')
plt.show()

Now, input the 2018 data.

In [None]:
with open('files/NorfolkWeather2018.csv','r') as f:
    nw2018 = f.readlines()

y2018 = []

for i in range(len(nw2018)):
    nw2018[i] = nw2018[i].strip()
    nw2018[i] = nw2018[i].split(',')
    y2018.append(float(nw2018[i][1]))

""" Plot both data series """
plt.plot(x, y1999, label = '1999')
plt.plot(x, y2018, label = '2018')
plt.legend()
plt.show()

## An Improvement: Chaining Commands

Commands can be chained whenever a command returns the data that is being manipulated, which is not always the case.  When an executed statement changes the data without returning it, we say the operation is being executed _in place_.  

Using <code>.sort()</code> with a list is an _in place_ operation: it sorts the list for the next time we access it, but it does not return the sorted list, so we cannot use the <code>.sort()</code> method in an assignment statement to set the value of another variable.

The <font style="font-family:'courier'">.strip()</font> and <font style="font-family:'courier'">.split()</font> methods do return the revised data, as shown below, so they are not _in place_ operations.
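
A short sketch of the distinction, using made-up values: <code>.sort()</code> returns <code>None</code>, whereas the built-in <code>sorted()</code> function and the string methods return new objects that can be assigned or chained.

```python
temps = [44.2, 39.0, 41.5]

# list.sort() works in place and returns None
result = temps.sort()
print(result)
print(temps)

# sorted() returns a new list and leaves its argument alone
original = [44.2, 39.0, 41.5]
ordered = sorted(original)
print(original)
print(ordered)

# strip() and split() each return a new object, so they can be chained
raw = '  3,41.5 \n'
fields = raw.strip().split(',')
print(repr(raw))    # the original string is unchanged
print(fields)
```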

In [None]:
with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()

x, y1999 = [], []

for i in range(len(nw1999)):
    nw1999[i] = nw1999[i].strip().split(',')
    #nw1999[i] = nw1999[i].split(',')
    x.append(int(nw1999[i][0]))
    y1999.append(float(nw1999[i][1]))

plt.plot(x, y1999, label = '1999')
plt.show()

## Another Improvement: A More General Function

The two code blocks could be combined and the repetitive cleansing steps can be put into a function to clean up the code.  The function name <code>ssc()</code> is an acronym for strip, split, and convert.

In [None]:
def ssc(data):
    x = []
    y = []
    for i in range(len(data)):
        data[i] = data[i].strip().split(',')
        x.append(int(data[i][0]))
        y.append(float(data[i][1]))
    return x, y
    
with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()

with open('files/NorfolkWeather2018.csv','r') as f:
    nw2018 = f.readlines()

x, y1999 = ssc(nw1999)
x, y2018 = ssc(nw2018)
plt.plot(x, y1999, label = '1999')
plt.plot(x, y2018, label = '2018')
plt.legend()
plt.show()

## Generalized 2D Input

Rows of data most often will have more than two elements, in which case it makes sense to loop through the row elements with a <code>for</code> loop rather than converting each row element explicitly, as we have done previously.

The <code>ssc()</code> function above is not flexible, however: we would need to write a new function whenever the number of fields in an input file, or the types we want to convert them to, changed.  The conversion function can be made more flexible by supplying it with a dictionary whose keys are the column indices and whose values are the data types to convert those columns to.

In [None]:
def ssc_2d_map(convert, data):
    for i in range(len(data)):
        data[i] = data[i].strip().split(',')
        for j, mk_type in convert.items():
            data[i][j] = mk_type(data[i][j])
    return data

In [None]:
with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()

''' convert_map dictionary: key = field index; value = data type'''
convert_map = {0:int, 1:float}
nw1999 = ssc_2d_map(convert_map, nw1999)

''' Create x and y series data '''
x = []
y1999 = []
for point in nw1999:
    x.append(point[0])
    y1999.append(point[1])
    
plt.plot(x, y1999, label = '1999')
plt.show()

We can simplify one of the functions above as follows.  This is possible because some Python data types are altered _in place_ in functions so that we do not need to return them.  For people with CS backgrounds, this is the difference between passing variables to a function by reference versus by value.  We will discuss this concept (and when you can use it) later in the lecture.
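
A minimal sketch of the idea with two hypothetical helper functions (<code>double_in_place</code> and <code>rebind_only</code> are names invented for this illustration): assigning through an index changes the caller's list, while assigning a new list to the parameter name does not.

```python
def double_in_place(values):
    # Assigning through an index alters the list object the caller passed in
    for i in range(len(values)):
        values[i] = values[i] * 2

def rebind_only(values):
    # Assigning to the parameter name only rebinds the local variable
    values = [0, 0, 0]

nums = [1, 2, 3]
returned = double_in_place(nums)
print(returned)   # None: nothing is returned
print(nums)       # the caller's list has changed

rebind_only(nums)
print(nums)       # unchanged by the second function
```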

In [None]:
def ssc_2d_map_1(convert, data):
    for i in range(len(data)):
        data[i] = data[i].strip().split(',')
        for j, mk_type in convert.items():
            data[i][j] = mk_type(data[i][j])
    #return data

In [None]:
with open('files/NorfolkWeather1999.csv','r') as f:
    data = f.readlines()

convert_map = {0:int, 1:float}
print(data)
#data = ssc3(data)
ssc_2d_map_1(convert_map, data)
        
print(data)

## 2D Input Exercise

In this case we will read in a matrix or array of values from the file <code>matrix.csv</code>.  Create a list of lists where each sublist contains the elements from each row of the data file.

In [None]:
with open('files/matrix.csv', 'r') as f:
    data = f.readlines()

''' Let us finish the code here '''
data

In [None]:
a = [[0,1,2,3],[4,5,6,7], [8,9,10,11], [12,13,14,15]]

In [None]:
rows = 8
cols = 2

# Flatten the list of lists: 1 by 16
temp = []
for row in a:
    for ele in row:
        temp.append(ele)
print(temp)

b = []
for r in range(rows):
    newRow = []
    for c in range(cols):
        newRow.append(temp[r*cols + c])
    b.append(newRow)
print(b)

# From Loops to List Comprehension <a name='ListComprehension' />
Back to <a href = '#navigation'>Navigation</a>

List comprehension is a technique for condensing the number of statements required to create a new list using for loops.  

- More succinct code
- More readable code
- Often faster than the equivalent <code>for</code> loop with <code>.append()</code>

The cells immediately below demonstrate the simplest form of list comprehension.

In [None]:
import random
numElements = 10
newList = []

for i in range(numElements):
    newList.append(random.random())
newList

The method for converting a <code>for</code> loop into a list comprehension statement is illustrated below.

![List Comprehension Structure](images/listComprehension1.jpg)
![List Comprehension Example](images/listComprehension.jpg)

In [None]:
import random
numElements = 10

newList1 = [random.random() for i in range(numElements)]
newList1

Logic can be applied within list comprehension statements to filter the elements appended to the list.

In [None]:
newList2 = []

for i in range(21):
    if i % 2 ==0:
        newList2.append(i)
newList2



![List Comprehension Structure w/Logic](images/listComprehension2.jpg)

For example, we could use the following code if we wanted to create a list of even numbers through 20.

In [None]:
newList3 = [i for i in range(21) if i%2 == 0]
newList3

Compound Boolean filtering logic can be used as well.

In [None]:
newList4 = [i for i in range(21) if i%2 == 0 and i>7]
newList4

More complex <code>if-elif-else</code> logic can be implemented, although having multiple levels can make the statement difficult to read.

Notice that the location of the Boolean statement changes when the <code>else</code> is introduced.

In [None]:
newList5 = []
for i in range(21):
    if i % 2 == 0:
        newList5.append(i)
    else:
        newList5.append('')

newList5


![List Comprehension Structure w/If-Else](images/list_comp_if_else.jpg)


In [None]:
newList5 = [i if i % 2 == 0 else '' for i in range(21)]
newList5

Keeping the <code>if</code> list comprehension syntax straight with the <code>if-else</code> syntax can be difficult.  The <code>if</code> and the <code>else</code> go together because, together, they form a coherent Python expression, called a _ternary_ statement, which can be used on the right side of an assignment statement.

When the ternary statement is viewed as a single expression, this form of list comprehension has the same form as the simplest version, without a conditional statement, that we discussed first.
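
The two positions can appear in the same statement, which may help keep them apart.  In this sketch the ternary statement before <code>for</code> chooses the value to store, and the trailing <code>if</code> decides which iterations contribute at all.

```python
# Ternary statement (left) picks the value; trailing if (right) filters
labels = ['even' if i % 2 == 0 else 'odd' for i in range(10) if i > 5]
print(labels)   # ['even', 'odd', 'even', 'odd']

# Equivalent for loop
labels_loop = []
for i in range(10):
    if i > 5:
        labels_loop.append('even' if i % 2 == 0 else 'odd')
print(labels_loop)
```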

# Ternary Statements<a name = 'Ternary' />
Back to <a href = '#navigation'>Navigation</a>

Ternary operations are written in different ways across programming languages.  In Python, a ternary statement takes the form of an <code>if-else</code> expression, which provides a succinct, readable way to write an <code>if-else</code> construct on a single line.

In [None]:
i = 5
'even' if i%2==0 else 'odd'

In [None]:
i = 6
result = 'even' if i%2==0 else 'odd'
result

In [None]:
e_o = lambda x: 'Even' if x%2 == 0 else 'Odd'

In [None]:
e_o(9)

Multiple levels of <code>if-else</code> are also possible.  These are actually nested _ternary_ statements.

In [None]:
data = [1, 2.1828, 'string', [0, 1, 2], (1,2), 2, 3.5]
newList6 = ['int' if isinstance(x, int) else 'float' if isinstance(x, float) else 'other' for x in data]
newList6

## Inputting Data and Computations with List Comprehension

Let's work with 1D data first.

In [None]:
with open('files/NorfolkWeather2018_1.csv','r') as f:
    nw2018 = f.readlines()

for i in range(len(nw2018)):
    nw2018[i] = float(nw2018[i].strip())

""" Plot the 2018 data series """
plt.plot(x, nw2018, label = '2018')
plt.legend()
plt.show()

In [None]:
with open('files/NorfolkWeather2018_1.csv','r') as f:
    nw2018 = f.readlines()

nw2018 = [float(nw2018[i].strip()) for i in range(len(nw2018))]

""" Plot the 2018 data series """
plt.plot(x, nw2018, label = '2018')
plt.legend()
plt.show()

In [None]:
with open('files/NorfolkWeather2018_1.csv','r') as f:
    nw2018 = f.readlines()

nw2018 = [float(d.strip()) for d in nw2018]

""" Plot the 2018 data series """
plt.plot(x, nw2018, label = '2018')
plt.legend()
plt.show()

We can make the code above more succinct by recognizing that <code>f.readlines()</code> creates an iterable data structure that we can iterate through with <code>for</code>.

In [None]:
with open('files/NorfolkWeather2018_1.csv','r') as f:
    nw2018 = [float(d.strip()) for d in f.readlines()]

""" Plot the 2018 data series """
plt.plot(x, nw2018, label = '2018')
plt.legend()
plt.show()

The cell below refreshes our memory on one of the 2D functions for stripping, splitting, and converting data.  Subsequent cells illustrate how the code can be simplified and made more succinct with list comprehension.  We will simplify this example a bit by assuming that both row elements need to be converted to floating point values.  More complex examples can be handled by using a mapping function.

In [None]:
def ssc_2d(data):
    for i in range(len(data)):
        data[i] = data[i].strip().split(',')
        for j in range(len(data[i])):
            data[i][j] = float(data[i][j])
    return data

with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()
nw1999 = ssc_2d(nw1999)
nw1999

In [None]:
def ssc_2d(data):
    result = []
    for row in data:
        row = row.strip().split(',')
        new_row = []
        for e in row:
            new_row.append(float(e))
        result.append(new_row)
        
    return result

with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()
nw1999 = ssc_2d(nw1999)
nw1999

In [None]:
def ssc_2d(data):
    result = []
    for row in data:
        row = row.strip().split(',')
        new_row = [float(e) for e in row]
        result.append(new_row)
        
    return result

with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()
nw1999 = ssc_2d(nw1999)
nw1999

In [None]:
def ssc_2d(data):
    result = []
    for row in data:
        row = row.strip().split(',')
        result.append([float(e) for e in row])
        
    return result

with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()
nw1999 = ssc_2d(nw1999)
nw1999

In [None]:
def ssc_2d(data):
    result = []
    for row in data:
        result.append([float(e) for e in row.strip().split(',')])
        
    return result

with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()
nw1999 = ssc_2d(nw1999)
nw1999

In [None]:
def ssc_2d(data):
    result = [[float(e) for e in row.strip().split(',')] for row in data]
    return result

with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()
nw1999 = ssc_2d(nw1999)
nw1999

In [None]:
def ssc_2d(data):
    return [[float(e) for e in row.strip().split(',')] for row in data]

with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()
nw1999 = ssc_2d(nw1999)
nw1999

This problem is more difficult if the row elements need to be converted to different data types.  But tackling this increased complexity is easy later on when we learn a new tool.
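
In the meantime, one possible approach with only the tools seen so far is to reuse the conversion-map dictionary idea inside a nested list comprehension.  This is a sketch on made-up rows, and it is not necessarily the tool introduced later.

```python
# Conversion map: key = field index; value = type to convert that field to
convert_map = {0: int, 1: float}

# Hypothetical raw lines in the index,temperature layout
raw_rows = ['0,41.5\n', '1,39.0\n', '2,44.2\n']

# First strip and split each row, then convert each field by its position
split_rows = [row.strip().split(',') for row in raw_rows]
typed_rows = [[convert_map[j](fields[j]) for j in range(len(fields))]
              for fields in split_rows]
print(typed_rows)   # [[0, 41.5], [1, 39.0], [2, 44.2]]
```

This version assumes every column index has an entry in <code>convert_map</code>; a missing index would raise a <code>KeyError</code>.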

## Computations with List Comprehension

The moving average computations with <code>for</code> loops can be done with list comprehension as well.

The first cell repeats our original code.

In [None]:
''' period: 
      * a variable representing the number of periods that are averaged together to create
        the moving average 
      * Given the zero-based indexing of Python, the first moving average can be computed
        in Period period - 1 '''
period = 10

ma1999 = []  # Initialize moving average list for 1999
ma2018 = []  # Initialize moving average list for 2018

for i in range(0, len(nw1999) - period + 1):
    ma1999.append(sum(nw1999[i:i + period])/period)
    ma2018.append(sum(nw2018[i:i + period])/period)

plt.plot(ma1999, label = '1999')
plt.plot(ma2018, label = '2018')
plt.legend()

plt.show()

In [None]:
period = 10

ma1999 = [sum(nw1999[i:i + period])/period for i in range(0,len(nw1999) - period + 1)]
ma2018 = [sum(nw2018[i:i + period])/period for i in range(0,len(nw2018) - period + 1)]

plt.plot(ma1999, label = '1999')
plt.plot(ma2018, label = '2018')
plt.legend()

plt.show()

# Other Comprehension Methods <a name='OtherComp' />

## Dictionary Comprehension <a name='DictComp' />
Back to <a href = '#navigation'>Navigation</a>

Dictionaries can be created with _dictionary comprehension_  just as lists can be generated with _list comprehension_.  We will create a dictionary from a text file, both with a for loop and with dictionary comprehension.  But first, let's practice with some hard-coded data.

The differences between list comprehension and dictionary comprehension are that the latter uses curly braces rather than square brackets and the specification for what is appended to the dictionary requires both a key and a value, separated by a colon. Here, a sequence of 2-tuples is converted into a dictionary:

<code>
    tups = [(0, 'cero'), (1, 'uno'), (2, 'dos')]
    newDictionary = {k:v for k,v in tups}
</code>

Let's convert a list of lists into a dictionary, where each sublist contains the name of a national park and its acreage.  Specifically, the national park name will be the key and the acreage will be the value.

The first cells show multiple ways to use for loops for this task and the corresponding methods with dictionary comprehension.

Source: National Parks data was extracted from [Wikipedia National Parks page](https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States).

In [None]:
natParksList = [['Acadia',49075.26], ['Yellowstone',2219790.71], ['Yosemite',761747.5], ['Shenandoah',199223.77], \
                ['Saguaro',91715.72], ['Petrified Forest',221390.21]]
natParksList

### Dictionaries from _for_ Loops
Back to <a href = '#navigation'>Navigation</a>

For this demonstration, let's assume that we have a list of lists, where each sublist contains the name of a national park and its acreage.  Let's construct a dictionary from this list of lists with the national park name as the key and the acreage as the value, first, with <code>for</code> loops.  Later, we will do the same with dictionary comprehension.


In [None]:
natParksDict1 = {}
for i in range(len(natParksList)):
    natParksDict1[natParksList[i][0]] = natParksList[i][1]
natParksDict1

In [None]:
natParksDict2 = {}
for x in natParksList:
    natParksDict2[x[0]] = x[1]
natParksDict2

In [None]:
natParksDict3 = {}
for name, acreage in natParksList:
    natParksDict3[name] = acreage
natParksDict3

### Dictionary Comprehension  <a name='DictComp' />
Back to <a href = '#navigation'>Navigation</a>

The logic of unpacking a for loop and inserting the contents into a dictionary comprehension statement is the same as with list comprehension, except for tweaks that are appropriate for dictionaries: 
- Curly braces specify a dictionary is being created rather than a list, which uses square brackets
- We need to specify a key and a value, separated by a colon, to be inserted in each implicit iteration.

![Dictionary Comprehension Template](images/dictComp.jpg)

Here is another prototypical statement of dictionary comprehension:

<font face='courier'>newDictionary = {k:v for k,v in some_iterator}</font>

Dictionary comprehension is useful when filestreams are read into dictionaries, or if, for example, lists need to be converted to dictionaries.

In [None]:
natParksDict1DC = {natParksList[i][0]:natParksList[i][1] for i in range(len(natParksList))}
natParksDict1DC

In [None]:
natParksDict2DC = {x[0]:x[1] for x in natParksList}
natParksDict2DC

In [None]:
natParksDict3DC = {name:acreage for name,acreage in natParksList}
natParksDict3DC
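
Filtering logic works in dictionary comprehension the same way it does in list comprehension.  The sketch below uses three of the parks from the list above, copied into a separate variable named <code>parks</code> (a name chosen for this example), and keeps only those larger than an arbitrary 200,000-acre threshold.

```python
# Three [name, acreage] pairs taken from the list above
parks = [['Acadia', 49075.26], ['Yellowstone', 2219790.71], ['Shenandoah', 199223.77]]

# Keep only the parks above the threshold
large_parks = {name: acreage for name, acreage in parks if acreage > 200000}
print(large_parks)   # {'Yellowstone': 2219790.71}
```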

Here is an example with input data from a text file.  Each line of the file <font face = 'courier'>npNameLoc.txt</font> contains the name of a United States National Park and the state in which that park is located.  Read the file and create a dictionary where the key is the name of the park and the value is a string indicating the state in which the park is located.

In [None]:
""" Read the data """
with open('files/npNameLoc.txt','r') as f_in:
    data = f_in.readlines()

""" Create the dictionary """
natParkDict1 = {}
for i in range(1,len(data)):
    data[i] = data[i].strip().split(',')
    natParkDict1[data[i][0]] = data[i][1]
natParkDict1

### Complete the task above with dictionary comprehension

The cells below progressively make the code more succinct.  At what point does reducing the number of lines of code cause the code to be less readable in your opinion?

In [None]:
""" Read the data """
with open('files/npNameLoc.txt','r') as f_in: 
    data = f_in.readlines()

""" Create the dictionary """
for i in range(1,len(data)):
    data[i] = data[i].strip().split(',')
natParkDict1 = {data[i][0]:data[i][1] for i in range(1, len(data))}
natParkDict1

In [None]:
""" Read the data """
with open('files/npNameLoc.txt','r') as f_in: 
    data = f_in.readlines()
data = [line.strip().split(',') for line in data]

""" Create the dictionary """
natParkDict1 = {data[i][0]:data[i][1] for i in range(1, len(data))}
natParkDict1

In [None]:
""" Read the data """
with open('files/npNameLoc.txt','r') as f_in: 
    data = [line.strip().split(',') for line in f_in.readlines()]

""" Create the dictionary """
natParkDict1 = {data[i][0]:data[i][1] for i in range(1, len(data))}
natParkDict1

In [None]:
""" Read the data """
with open('files/npNameLoc.txt','r') as f_in: 
    data = [line.strip().split(',') for line in f_in.readlines()]

""" Create the dictionary """
natParkDict1 = {line[0]:line[1] for line in data[1:]}
natParkDict1

In [None]:
''' Read the data and create the dictionary '''
with open('files/npNameLoc.txt','r') as f_in: 
    natParkDict1 = {line[0]:line[1] for line in [line.strip().split(',') for line in f_in.readlines()][1:]}
natParkDict1
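
As one possible middle ground between the verbose loop and the one-line version, the sketch below skips the header row with <code>next()</code> and unpacks each row into named variables.  To keep it self-contained, it reads from an <code>io.StringIO</code> object holding a few made-up sample rows; this is an assumption standing in for <font face = 'courier'>npNameLoc.txt</font>, whose exact contents are not shown here.

```python
import io

# Hypothetical stand-in for the file: a header row followed by name,state rows
f_in = io.StringIO("Name,State\nAcadia,Maine\nZion,Utah\nArches,Utah\n")

next(f_in)  # skip the header row, as range(1, len(data)) does in the cells above
rows = (line.strip().split(',') for line in f_in)  # generator: one [name, state] list per line
parks_sketch = {name: state for name, state in rows}
print(parks_sketch)
```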

## Set Comprehension  <a name='SetComp' />
Back to <a href = '#navigation'>Navigation</a> 

A <code>set</code> is a Python data type that contains unique items, that is, no element can be repeated.  These, too, can be created with a comprehension statement.  This data type can be viewed as a set of dictionary keys without values.

In [None]:
my_set = set([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5])
my_set

Elements can be added to a set using the <code>.add()</code> set method in a for loop as shown below.  If the element that is added already exists in the set, then the set remains unchanged.

In [None]:
mylist = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5]
my_set1 = set()
for ele in mylist:
    my_set1.add(ele)
my_set1

Comprehension can also be used to create sets.

In [None]:
my_set2 = {ele for ele in mylist}
my_set2
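
Like list and dictionary comprehensions, a set comprehension can transform each element before it is stored.  The short sketch below (the name <code>remainders</code> is illustrative) collects the distinct remainders after dividing by 3, so duplicates collapse automatically.

```python
mylist = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5]

# A remainder is computed for every element, but the set keeps only the unique values
remainders = {ele % 3 for ele in mylist}
print(remainders)
```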

# An Aside: <code>for</code> versus <code>while</code> loops

Thus far, we have used only <code>for</code> loops, but <code>while</code> loops are another type of loop.  <code>for</code> loops have made sense thus far because we can determine how many times we need to iterate through a list, for example, by using the <code>len()</code> function.  <code>while</code> loops make sense when we cannot determine in advance how many times the loop will need to execute.

<code>while</code> loops continue iterating until the condition specified with them is <code>False</code>.

Write a <code>while</code> loop to append powers of 2, $2^n$ for $n = 0,1, ...$ into a list for all powers of $n$ for $2^n < 2000$.

In [None]:
n = 0
result = []
while 2**n < 2000:
    result.append(2**n)
    n = n + 1
print(result)

While we might like to sometimes use <code>while</code> loops in list comprehension statements, only <code>for</code> loops can be used in list comprehension.
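
If the stopping condition depends on the values themselves, one possible workaround is <code>itertools.takewhile()</code> from the standard library, which stops consuming an iterable as soon as a condition fails.  The sketch below is one way to reproduce the powers-of-2 list; it is an alternative to, not a replacement for, the <code>while</code> loop above.

```python
from itertools import count, takewhile

# count() yields 0, 1, 2, ... without end; takewhile() stops at the first n where 2**n >= 2000
powers = [2**n for n in takewhile(lambda n: 2**n < 2000, count())]
print(powers)
```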

# <code>for</code> Loops with <code>enumerate()</code>  <a name='enumerate' />
Back to <a href = '#navigation'>Navigation</a>

... and <code>enumerate()</code> outside of <code>for</code> loops

As we already know, two ways to iterate through data with <code>for</code> loops are:
- By index
- Directly through the elements

While the latter approach (the second cell below) is thought by many to be more readable, you must use the first approach if you want to change the element values.

In [None]:
values = [0, 1, 2, 4, 5, 7, 8, 9]

for i in range(len(values)):
    print(i)

In [None]:
values = [0, 1, 2, 4, 5, 7, 8, 9]

for ele in values:
    print(ele)

In some circumstances you might want to use the latter approach for its readability, yet still need the index in order to change the values of the elements in the list.  The <code>enumerate()</code> function removes the need to choose a single approach, as it affords you the opportunity to use both simultaneously.
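
The sketch below illustrates the difference using a small made-up list: assigning to the loop variable leaves the list untouched, whereas assigning through the index supplied by <code>enumerate()</code> changes it.

```python
values = [0, 1, 2, 4, 5]

# Rebinding the loop variable does not alter the list
for ele in values:
    ele = ele * 10
print(values)   # unchanged

# enumerate() supplies the index alongside the element, so the list can be updated in place
for i, ele in enumerate(values):
    values[i] = ele * 10
print(values)
```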

Three example circumstances where you need <code>enumerate()</code> are illustrated below:
- You want to keep track of the iteration number for debugging purposes
- You have multiple lists that are the same length, with elements corresponding to one another, and you need to revise the values in at least one of the lists
- You want to create a list of tuples from an original data structure whose first element is an index

Caution: if you are in the second situation, you may want to rethink the data type you are using.  Perhaps both data fields should be in the same structure.
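
A minimal sketch of the second circumstance, using made-up lists of equal length whose elements correspond by position:

```python
names = ['a', 'b', 'c']      # hypothetical labels
scores = [55, 72, 91]        # hypothetical scores, aligned with names by position
grades = ['', '', '']        # list to be revised inside the loop

for i, score in enumerate(scores):
    grades[i] = 'pass' if score >= 60 else 'fail'
    print(names[i], score, grades[i])
```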

A simple example of what <code>enumerate()</code> does.

In [None]:
import random
values = [random.random() for _ in range(5)]

for i,v in enumerate(values):
    print(i,v)

In [None]:
import random
values = [random.random() for _ in range(50000)]

Monitor analysis process for big data...

In [None]:
for i,v in enumerate(values):
    # Do something useful with values v here
    if i%10000 == 0 and i > 0:
        print(f'{i} elements completed')

Find rows with defective data...

In [None]:
with open('files/defects.csv', 'r') as f:
    data = f.readlines()
    
results = []
empty_rows = []
for i,v in enumerate(data):
    if v.strip() != '':
        results.append(int(v.strip()))
    else:
        empty_rows.append(i)
print(results)
print(empty_rows)

In this example, let's assume that you are reading a large data file, which is mimicked by the list below, and converting string data to integers.  You are getting an error because some of the data cannot be converted to integers.  You can use <code>enumerate()</code> to let you know which line(s) of data cannot be transformed.

In [None]:
data = ['0', '1', '2', 'actual', '4', 'planned', '7', '9']

my_int = []
junk = []
for i,d in enumerate(data):
    try:
        my_int.append(int(d))
    except ValueError:  # int() raises ValueError for strings that are not integers
        junk.append((i,d))
        
print(f'junk has {len(junk)} elements: {junk}')

Index tuples in case you want to re-sort them back into the original order.

In [None]:
tups = [(i, v) for i,v in enumerate(values)]
tups[:5]

... or, more simply

In [None]:
tups1 = list(enumerate(values))
tups1

which shows that enumerate can be used outside of a for loop and might sometimes be useful in such a context.

Which days had high temperatures?

In [None]:
temps = [random.randint(0,110) for i in range(100)]

In [None]:
temp_thres = 90
hot_days = []
for d,t in enumerate(temps):
    if t >= temp_thres:
        print(f'Day {d} had a temperature of {t}.')
        hot_days.append(d)
print(hot_days)
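
<code>enumerate()</code> also accepts an optional <code>start</code> argument, which is convenient when the counter should begin at 1 rather than 0 (for example, to number days from Day 1).  A brief sketch with made-up temperatures:

```python
temps_sample = [88, 95, 91, 70]   # hypothetical temperatures

# Days are numbered from 1 instead of 0
hot_days_1based = [d for d, t in enumerate(temps_sample, start=1) if t >= 90]
print(hot_days_1based)
```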

# The <code>zip()</code>, <code>map()</code>, and <code>filter()</code>  Functions ... and the <code>*</code> Operator  <a name='zip_map_filter' />
Back to <a href = '#navigation'>Navigation</a>

These functions are very useful, although accessing the results can be a bit unintuitive until one realizes they are iterators.  We have described iterators and so you have the foundation to understand how to view and use the results of these functions.  Because they return iterators, these functions produce values only as they are requested rather than building the full result up front, which can save memory and time, especially with large inputs.
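
One practical consequence, sketched below, is that an iterator can be consumed only once: after its values have been retrieved, it is empty.  That is a good reason to re-create the iterator before each retrieval, as the cells in the following sections do.

```python
squares = map(lambda x: x**2, range(4))

first_pass = list(squares)    # consumes every value
second_pass = list(squares)   # nothing is left to retrieve

print(first_pass)
print(second_pass)
```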

## The <code>*</code> Operator <a name='asterisk_op' />  
Back to <a href = '#navigation'>Navigation</a>

Most of us think first of <code>*</code> as denoting multiplication.  But it has another meaning, which is to unpack Python data types that are collections of elements.  We introduce the notion of unpacking in this section in the context of well-known Python data types, including lists, tuples, and dictionaries.  The <code>*</code> operator becomes even more important when we work with generators and iterators, but it is easiest to introduce it in the context of basic Python types.

Take these lists and tuples, for example,
<code>
    ls_1 = [0, 1, 2]
    ls_2 = [[0, 1], [2, 3], [4, 5]]
    ls_3 = [(0, 1), (2, 3), (4, 5)]
    t_1 = (0, 1, 2)
    t_2 = ([0, 1], [2, 3], [4, 5])
    t_3 = ((0, 1), (2, 3), (4, 5))
</code>

You can think of the <code>*</code> placed in front of one of these variable names as removing the outer container type.  That is, for the first three variables above, we would have three uncontained integers, three uncontained sub-lists, and three uncontained tuples, respectively, or, in effect,

<code>
    *ls_1 = 0, 1, 2
    *ls_2 = [0, 1], [2, 3], [4, 5]
    *ls_3 = (0, 1), (2, 3), (4, 5)
    *t_1 = 0, 1, 2
    *t_2 = [0, 1], [2, 3], [4, 5]
    *t_3 = (0, 1), (2, 3), (4, 5)
</code>

The statements above are not valid Python statements but, rather, demonstrate, in effect, what <code>*</code> does. To further demonstrate, we will try to evaluate one of these starred expressions.

In [None]:
ls_1 = [0, 1, 2]
ls_2 = [[0, 1], [2, 3], [4, 5]]
ls_3 = [(0, 1), (2, 3), (4, 5)]
t_1 = (0, 1, 2)
t_2 = ([0, 1], [2, 3], [4, 5])
t_3 = ((0, 1), (2, 3), (4, 5))

In [None]:
*ls_1

So, why is this a useful operation?  One use is to repackage the inner elements into a different outer data type, such as in the following cells.

In [None]:
(*ls_1,)

In [None]:
[*t_1]

Well, that is not so impressive or useful: this can be accomplished with other obvious methods.  In the same vein, however, a starred expression can be used to conveniently specify part of a larger list or tuple that is being constructed.

In [None]:
[9, 8, 7, 6, *ls_1]

You can also use <code>*</code> with dictionaries.  Using just the variable name references the dictionary keys, as is the default in a <code>for</code> loop, or both keys and values can be referenced with <code>.items()</code>.

In [None]:
dct_nums = {0:'zero', 1:'one', 2:'two'}

In [None]:
[*dct_nums]

In [None]:
[*dct_nums.items()]
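
Unpacking is probably most common in function calls, where each unpacked element becomes a separate positional argument.  The sketch below uses <code>print()</code> and a small hypothetical function, <code>add3()</code>, to illustrate.

```python
ls_1 = [0, 1, 2]

def add3(a, b, c):
    """Hypothetical helper that expects three separate arguments."""
    return a + b + c

print(*ls_1)            # same as print(0, 1, 2)
total = add3(*ls_1)     # same as add3(0, 1, 2)
print(total)
```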

## <code>zip()</code> Function <a name='zip' />  
Back to <a href = '#navigation'>Navigation</a>

The <code>zip()</code> function takes as input two or more lists, tuples, or other iterable data types and outputs a number of tuples equal to the length of the inputs.

The 0th output tuple is the combination of the 0th element from each of the inputs, the 1st output tuple is the combination of the 1st elements from the inputs, and so forth.

Statement|Result|
:---: | :---: |
<code>zip((0,1),(2,3),(4,5))</code>|<code>((0, 2, 4),(1, 3, 5))</code>|

If the input tuples represented rows of a matrix, then <code>zip()</code> produces what one can consider the rows of the transpose matrix.

Notice that printing the output does not reveal the data, but merely indicates that the result is of the <code>zip</code> data type.  This data type is an example of an <code>iterator</code>, as we have previously discussed.  Its values can be retrieved in a number of ways as demonstrated below.

In [None]:
zip((0,1),(2,3),(4,5))

Retrieval with a <code>for</code> loop, list comprehension, and the asterisk (<code>*</code>) operator, respectively, followed by a couple of other methods.

In [None]:
for x in zip((0,1),(2,3),(4,5)):
    print(x)

In [None]:
[x for x in zip((0,1),(2,3),(4,5))]

The <code>*</code> (asterisk) operator affords the same functionality as list comprehension in terms of viewing the result: it unpacks the <code>zip</code> result, which yields a sequence of tuples in this case, and the tuples are repacked into a list.

In [None]:
result = zip((0,1),(2,3),(4,5))
[*result]

In [None]:
result = zip((0,1),(2,3),(4,5))
list(result)

In [None]:
result = zip((0,1),(2,3),(4,5))
tuple(result)

An example where zip is useful is when data for a <code>matplotlib</code> line plot is supplied in a list of lists, with each sublist representing the <code>x</code>-<code>y</code> coordinates of a point.  The data must be transformed into a series of <code>x</code> data and a series of <code>y</code> data in order to be plotted.

The <code>zip()</code> function reorganizes those data accordingly and the <code>*</code> operator 'unpacks' the two data series so they can be plotted in the code below.

In [None]:
import matplotlib.pyplot as plt

In [None]:
data = [[0,0], [1,1], [2,2], [3,0], [4,1], [5,2], [6,0], [7,1], [8,2], [9,0]]

In [None]:
data_xy = zip(*data)
for series in data_xy:
    print(series)

In [None]:
x, y = zip(*data)
plt.plot(x, y)
plt.show()

In a similar way, the <code>zip()</code> function is often used to prepare sets of parameters to be fed into a function in a multiprocessing context, as shown in a simplified example below to compute the distance between a list of origins and another list of corresponding destinations.

The code also demonstrates the formatting of output with an <code>f</code>-string.
  - Curly braces <code>{}</code> indicate where to insert variable values in the string
  - Floating-point values can be formatted as indicated with the expression after the colon

In [None]:
def dist(p, q):
    import math
    return math.sqrt(sum([abs(p[i] - q[i])**2 for i in range(len(p))]))

In [None]:
origin = [[0,0], [1,5], [4,2], [8,1]]
dest = [[9,5], [1,7], [0,5], [6,2]]

arg = zip(origin, dest)

for a in arg:
    print(f'The distance between {a[0]} and {a[1]} is {dist(*a):.2f}')

Note that if the two data structures being zipped do not have the same number of elements, then the number of tuples produced equals the number of elements in the shorter input.

In [None]:
origin = [[0,0], [1,5], [4,2], [8,1]]
dest = [[9,5], [1,7], [0,5]]

arg = zip(origin, dest)

for a in arg:
    print(f'The distance between {a[0]} and {a[1]} is {dist(*a):.2f}')
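
If silently dropping the unmatched elements is not what you want, the standard library offers <code>itertools.zip_longest()</code>, which pads the shorter input with a fill value instead.  A minimal sketch, reusing the lists from the cell above:

```python
from itertools import zip_longest

origin = [[0,0], [1,5], [4,2], [8,1]]
dest = [[9,5], [1,7], [0,5]]

pairs = list(zip_longest(origin, dest, fillvalue=None))
print(len(pairs))   # one pair per element of the longer list
print(pairs[-1])    # the unmatched origin is paired with the fill value
```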

## <code>map()</code> Function <a name='map' />  
Back to <a href = '#navigation'>Navigation</a>

The <code>map()</code> function takes a function as its first argument and an iterable data type (e.g., list, tuple) as its second argument.  It applies the function sequentially to each of the elements of the iterable data.

The result is a <code>map</code> data type, which is an _iterator_, as was the case with the <code>zip()</code> function.

The applied functions are often <code>lambda</code> functions, although custom functions and built-in functions can be applied as well.

In [None]:
result = map(lambda x:x**2, range(4))
type(result)

The contents of iterators cannot be printed directly.

In [None]:
result = map(lambda x:x**2, range(4))
result

In [None]:
result = map(lambda x:x**2, range(4))
print(result)

But, their contents can be iterated through in a for loop and they can be transformed into lists with multiple methods.

In [None]:
result = map(lambda x:x**2, range(4))
for r in result:
    print(r)

In [None]:
result = map(lambda x:x**2, range(4))
[*result]

In [None]:
result = map(lambda x:x**2, range(4))
list(result)

In [None]:
result = map(lambda x:x**2, range(4))
[r for r in result]

In [None]:
data = ['1', '3', '5', '6', '8']
data = map(int, data)
data = [*data]
print(data)
print(type(data[0]))

Other <code>map()</code> examples with...

- Custom functions
- <code>sum</code>
- <code>len()</code> function
- <code>min()</code> and <code>max()</code> functions

In [None]:
def sq(x):
    return x**2

[*map(sq, range(5))]

In [None]:
data = [[0,1,2,3], [4,5,6,7], [8,9,10,11]]
result = map(sum, data)
for r in result:
    print(r)

In [None]:
import random

my_list = [[random.random() for _ in range(random.randint(1,6))] for _ in range(5)]

print(my_list)
[*map(len, my_list)]

In [None]:
[*map(min, my_list)]

In [None]:
[*map(max, my_list)]
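
<code>map()</code> also accepts more than one iterable; the function is then called with one element from each, and iteration stops when the shortest iterable is exhausted.  A small sketch with made-up data:

```python
prices = [2.50, 4.00, 1.25]   # hypothetical unit prices
qty = [4, 1, 8]               # hypothetical quantities

# The lambda receives one price and one quantity per call
totals = [*map(lambda p, q: p * q, prices, qty)]
print(totals)
```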

## <code>filter()</code> Function  <a name='filter' />  
Back to <a href = '#navigation'>Navigation</a>

The <code>filter()</code> function takes a function as its first argument and an iterable data type as its second argument.  The function is applied to each element of the iterable argument and its output is interpreted as <code>True</code> or <code>False</code>, and <code>filter()</code> returns only those elements of the iterable argument where the function result is <code>True</code>.

Similar to <code>zip()</code> and <code>map()</code>, <code>filter()</code> returns an iterator whose values must be unpacked in order to view them.  The iterator can be used directly in a <code>for</code> loop or list comprehension.

In [None]:
filter(lambda x: x%2 == 0, range(10))

In [None]:
result = filter(lambda x: x%2 == 0, range(10))
[*result]

In [None]:
result_explain = map(lambda x: x%2 == 0, range(10))
[*result_explain]

In [None]:
import random

def big(x):
    return x >= 0.5

data = [random.random() for _ in range(20)]
result = filter(big, data)
[*result]
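
Because this notebook emphasizes comprehensions, it may help to see that a <code>filter()</code> call can usually be written as a list comprehension with an <code>if</code> clause.  The sketch below shows both forms producing the same list.

```python
evens_filter = [*filter(lambda x: x % 2 == 0, range(10))]
evens_comp = [x for x in range(10) if x % 2 == 0]

print(evens_filter)
print(evens_filter == evens_comp)
```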

## Applications of <code>zip()</code>,  <code>map()</code>,  and <code>filter()</code> Functions  <a name='apps_zip_map_filter' />  
Back to <a href = '#navigation'>Navigation</a>

We will use these functions throughout this notebook where they are useful.

For now, here are some exercises for absorbing how these functions work.

### <code>zip()</code> Exercise

Suppose you wanted to plot the data below with <code>matplotlib</code>.  Apply the <code>zip()</code> function to reorganize the data into two series, one for the <code>x</code>-axis and one for the <code>y</code>-axis.  You may need to use the <code>*</code> operator as well.

In [None]:
data = [[0,0], [1,2], [2,4], [3,6]]


Using the <code>*</code> operator and <code>zip()</code> for reorganizing data series.

Function repeated for convenience.

In [None]:
import matplotlib.pyplot as plt
    
def ssc_2d_map(convert, data):
    for i in range(len(data)):
        data[i] = data[i].strip().split(',')
        for j, mk_type in convert.items():
            data[i][j] = mk_type(data[i][j])
    return data

In [None]:
with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()

for i in range(len(nw1999)):
    nw1999[i] = nw1999[i].strip()
    nw1999[i] = nw1999[i].split(',')
    nw1999[i][0] = int(nw1999[i][0])
    nw1999[i][1] = float(nw1999[i][1])

''' Create x and y series data '''
x, y = zip(*nw1999)
plt.plot(x, y, label = '1999')
plt.show()

In [None]:
with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()
convert_map = {0:int, 1:float}
nw1999 = ssc_2d_map(convert_map, nw1999)

''' Create x and y series data '''
x, y1999 = zip(*nw1999)
    
plt.plot(x, y1999, label = '1999')
plt.show()

In [None]:
with open('files/NorfolkWeather1999.csv','r') as f:
    nw1999 = f.readlines()
convert_map = {0:int, 1:float}
nw1999 = ssc_2d_map(convert_map, nw1999)

plt.plot(*zip(*nw1999), label = '1999')
plt.show()

### <code>map()</code> Exercise

Write a function to return the strings <code>odd</code> or <code>even</code> depending on the value of an integer passed to it.  Then, use the <code>map()</code> function to apply that function to the list named <code>values</code>.

In [None]:
values = [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]



### <code>filter()</code> Exercise

Use the <code>filter()</code> function to filter out the elements from the list <code>values</code> that are less than 5.

In [None]:
values = [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]

Transpose the list of lists <code>lol</code> using the <code>zip</code> function.

In [None]:
import random
lol= [[random.randint(0,9) for _ in range(3)]  for  _ in range(5)]



# Passing Arguments to Functions by Reference or Value<a name = 'RefVal' />
Back to <a href = '#navigation'>Navigation</a>

One of Python's strengths is its simplicity, which makes Python easier for a beginner coder to grasp.  A downside of this strength is that Python hides some of the complexity inherent in computer operations, which can limit a more advanced programmer's ability to write better code (think speed and memory usage).

One mechanism that Python hides is the manner in which arguments are passed to functions.  In some programming languages (e.g., C++), the programmer specifies whether a copy of the variable's value is sent to a function (passing by value) or the memory address of the variable is sent (passing by reference or pointer).  Python does not offer that choice: it always passes a reference to the object.  What differs is the type of object: immutable types (e.g., <code>int</code>, <code>float</code>, <code>str</code>, <code>tuple</code>) cannot be changed in place, so they behave as if passed by value, whereas mutable types (e.g., <code>list</code>, <code>dict</code>, <code>set</code>) can be changed in place by the function, so they behave as if passed by reference.

## Passing by Value

When a variable is passed by value, just the value is sent to the function and it is assigned to the variable specified in the function.  For example, the code below sends the value of the variable <code>x</code> to the function and assigns it to the variable <code>y</code> within the function.

In [None]:
def f(y):
    y = y**2
    return(y)

x = 3
print(x, f(x))

In the example above, as illustrated in the <code>print()</code> statement, the variable <code>y</code> within the function is independent of the variable <code>x</code>; that is, the value of the variable <code>x</code> is unchanged by the function.

When variables of more complex, mutable data types (e.g., lists) are sent to functions, the function works with the same object in memory and, as a consequence, if the function modifies the passed argument in place, then the original variable is changed as well. This is demonstrated below for a variable of the <code>list</code> type that is passed to a function.

In [None]:
def g(y):
    y[0] = y[0]**2
    return y

x = [3,4]
print(x, g(x))

An advantage of passing a variable by reference is that the variable need not be passed back to the "calling" code.  The code can be faster because the entirety of the data represented by the argument need not be passed to the function and then back again to the calling program.

In [None]:
def h(y):
    y[0] = y[0]**2

x = [3,4]
h(x)
print(x)

In [None]:
def j(y):
    y['a'] = 99
    
x = {'a':0, 'b':1}
j(x)
print(x)

In [None]:
def k(y):
    y.add(2)
    
x = {0, 1}
k(x)
print(x)

In [None]:
import numpy as np

def m(y):
    y[0] = 99
    
x = np.array([0,1])
m(x)
print(x)

Try this with dictionaries passed to a function.

In [None]:
def add2dict(d):
    for k,v in d.items():
        d[k] = v+1
        
my_dct = {'zero':0, 'one':1, 'two':2}
add2dict(my_dct)
print(my_dct)

In [None]:
import pandas as pd

def n(y):
    y.iloc[0] = 99
    
x = pd.Series([0,1])
#print(x)
n(x)
print(x)
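
A subtlety worth noting: a function changes the caller's object only when it modifies that object in place.  If the function instead assigns a new object to its parameter name, the caller's variable is unaffected, even for a list.  The sketch below uses two hypothetical functions, <code>mutate()</code> and <code>rebind()</code>, to show the contrast.

```python
def mutate(y):
    y.append(99)      # modifies the list object that the caller also refers to

def rebind(y):
    y = [99]          # makes the local name y refer to a new list; the caller's list is untouched

a = [0, 1]
mutate(a)
print(a)

b = [0, 1]
rebind(b)
print(b)
```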

# Aliasing <a name = 'aliasing' />
Back to <a href = '#navigation'>Navigation</a>

Aliasing is the act of referencing the same memory location/variable using multiple names.  It most often happens unintentionally and can cause a frustrating debugging adventure.

With <font face='courier'>int</font> and <font face='courier'>float</font> data types there is no chance of aliasing.  Everything works intuitively.  When we change the value of <font face='courier'>p</font> we change only its value and not the value of <font face='courier'>q</font>.  The converse is true when we change the value of <font face='courier'>q</font>.

In [None]:
p = 1
q = p
p = 2
print('p =',p,'     q =',q)

When we assign a list to another list, however, both list names point to the same data, which is to say that they both point to the same place in memory.

In [None]:
x = [1,2]
y = x
y[0] = 99
print('x =',x,'     y =',y)
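
To check whether two names refer to the same object, you can use the <code>is</code> operator, as in the sketch below.  The third list, <code>z</code>, is an illustrative addition: it holds equal values but is a separate object.

```python
x = [1, 2]
y = x          # alias: a second name for the same list
z = [1, 2]     # a different list that happens to hold equal values

print(y is x)  # identity: same object?
print(z is x)
print(z == x)  # equality: same contents?
```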

We often need to create a list, or list of lists, to receive our computations, such as when we are computing the transpose of a list of lists.  Using the approach above would create aliasing and we would either end up with a matrix with the wrong values, or we would corrupt the original matrix, or both.

One safe way to create a list of lists of the same dimensions as an existing list of lists is to use the approach in the cell below.

In [None]:
X = [[0,1,2],[3,4,5],[6,7,8]]  # original list of lists
Y = [[0 for i in range(len(X[0]))] for j in range(len(X))] # list of lists to receive the transpose
print('Y  =',Y,'\n')
Y[0][0] = 99
print('Y =',Y,'\n')
print('X =',X,'\n')

You may also import the <font face='courier'>copy</font> module and use the <font face='courier'>copy.deepcopy()</font> function (see the [<font face='courier'>copy</font> module documentation](https://docs.python.org/3.6/library/copy.html)), but I prefer the approach above.
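
For reference, a minimal sketch of the <font face='courier'>copy.deepcopy()</font> alternative: it copies the nested lists as well as the outer list, so changes to the copy do not reach the original.

```python
import copy

X = [[0,1,2],[3,4,5],[6,7,8]]  # original list of lists
Y = copy.deepcopy(X)           # independent copy, including the inner lists
Y[0][0] = 99

print('Y =', Y)
print('X =', X)
```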

I have had issues with aliasing with the methods below, so <font color='red'><b>DO NOT USE THE METHODS BELOW</b></font>.

In [None]:
X = [[0,1,2],[3,4,5],[6,7,8]]  # original list of lists
Y = [[0] * len(X[0])]  * len(X)  # list of lists to receive the transpose
print('Y  =',Y,'\n')
Y[0][0] = 99
print('Y =',Y,'\n')
print('X =',X,'\n')

In [None]:
X = [[0,1,2],[3,4,5],[6,7,8]]  # original list of lists
Y = X.copy()  # list of lists to receive the transpose
print('Y  =',Y,'\n')
Y[0][0] = 99
print('Y =',Y,'\n')
print('X =',X,'\n')

These approaches successfully create copies of lists while avoiding aliasing for one-dimensional lists, but they create aliasing with nested (multidimensional) lists.

In [None]:
X = [[0,1,2],[3,4,5],[6,7,8]]  # original list of lists
Y = X[:]  # list of lists to receive the transpose
print('Y  =',Y,'\n')
Y[0][0] = 99
print('Y =',Y,'\n')
print('X =',X,'\n')

In [None]:
X = [[0,1,2],[3,4,5],[6,7,8]]  # original list of lists
Y = list(X) # list of lists to receive the transpose
print('Y  =',Y,'\n')
Y[0][0] = 99
print('Y =',Y,'\n')
print('X =',X,'\n')
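
The reason these methods fail for nested lists, sketched below, is that they copy only the outer list: the new outer list holds references to the very same inner lists.  The name <code>Z</code> is an illustrative addition showing the same effect for the list-multiplication method.

```python
X = [[0,1,2],[3,4,5],[6,7,8]]
Y = X[:]                 # shallow copy of the outer list

print(Y is X)            # the outer lists are different objects
print(Y[0] is X[0])      # but the inner lists are shared

Z = [[0] * 3] * 3        # repeats a reference to one inner list three times
print(Z[0] is Z[1])
```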