# Exercise 1: List comprehensions

Author: Laura Gutierrez Funderburk

Created on: April 18 2018

Last modified on: April 20 2018

### Abstract

In this notebook, we will use list comprehensions to store and parse data, building on the previous exercise.

### From for loops to list comprehensions

Suppose we are given an array and would like to apply a function function_on_array to each of its elements. Using basic for loop syntax, we would do something along the lines of

array = [...]

for i in range(len(array)):

    array[i] = function_on_array(array[i])

print(array)

When a notebook contains many loops of this kind, the pattern becomes verbose. A list comprehension produces the same result in a single, more compact expression, and it often runs somewhat faster as well:

array = [function_on_array(array[i]) for i in range(len(array))]


Let us take a couple of simple examples. We will then move on to our file. 

In [None]:
# Simple example
# Suppose we want to create an array with numbers from 1 to 10
number_arr = []
for i in range(10):
    number_arr.append(i+1)    
print("For loop approach")
print(number_arr)

print("\n")

# We can accomplish the same result using a list comprehension
number_array = [i+1 for i in range(10)]
print("List comprehension approach")
print(number_array)
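
As a second simple example, the sketch below applies a function to every element, mirroring the function_on_array pattern described earlier. The helper `square` and the list `values` are made up for illustration and are not part of the exercise files.

```python
# Hypothetical helper, used only for illustration
def square(x):
    return x * x

values = [1, 2, 3, 4, 5]

# For loop approach: build the result one element at a time
squares_loop = []
for value in values:
    squares_loop.append(square(value))

# List comprehension approach: the same result in a single expression
squares_comp = [square(value) for value in values]

print(squares_loop)
print(squares_comp)
```

Both lists hold the same values; the comprehension simply folds the loop and the append into one line.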

### List comprehensions with an if statement

Now that we have seen the basic syntax of list comprehensions, let us dial things up one more level. This time we will add an if statement.

Let us take our celestials array from the last exercise and suppose, as before, that we are interested in the elements which are planets. We can then skip the cumbersome for loop as follows.

In [None]:
# Take an existing array
celestials = ['Moon','Sun','Neptune','Mars','Jupiter','Venus']

# Time for a list comprehension
planets = [item for item in celestials if item != "Moon" and item != "Sun"]

# Print
print("Our celestials array is " + str(celestials))
print("\n")
print("The planets in our array are " + str(planets))
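
The filter above can also be written with a membership test, and an if/else placed before the `for` transforms every element instead of dropping some. The following is a minimal sketch; the set `non_planets` and the label strings are assumptions introduced for this example.

```python
celestials = ['Moon', 'Sun', 'Neptune', 'Mars', 'Jupiter', 'Venus']

# Names to exclude; this set is an assumption made for the example
non_planets = {'Moon', 'Sun'}

# Filtering: a trailing "if" keeps only the items that pass the test
planets = [item for item in celestials if item not in non_planets]

# Labelling: an "if ... else" before "for" keeps every item and changes its value
labels = [item + " (planet)" if item not in non_planets else item + " (not a planet)"
          for item in celestials]

print(planets)
print(labels)
```

Note the difference in position: a trailing `if` filters, while `if ... else` ahead of the `for` chooses between two values for each element.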

### List Comprehensions and File Handling

Now that we are more used to list comprehensions, we will use them to our advantage when handling files.

On the GitHub page, open the /DATA directory, where you will find a number of files.

Suppose we are given access to the file data.csv and are asked to extract from each row a pair of the form 'MZ22503562', 'MZ22507874'.

We can use list comprehensions to read the file content and store it in a table for later parsing.

In [None]:
# Import glob library to deal with files
import glob

# Run functions.py in this notebook's namespace so that its helper functions are available
%run -i functions.py

In [None]:
# Store directory location into variable
data_directory = "./DATA/"

# Access all file names that end with .csv
data_files = glob.glob(data_directory + "*.csv")
print(data_files)
# In our particular case, we are interested in the first entry in this array
data = data_files[0]
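
One caveat when indexing the result of `glob.glob`: the order of the returned names depends on the file system, so the first entry is not guaranteed to be the same file on every machine. The sketch below shows one way to make the order predictable by sorting. It uses a temporary directory and made-up file names rather than the real /DATA directory.

```python
import glob
import os
import tempfile

# Build a throwaway directory with hypothetical file names for the demonstration
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ["b.csv", "a.csv", "notes.txt"]:
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write("placeholder\n")

    # Sort the matches so that indexing gives the same file every time
    csv_files = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

    # Keep only the file names, using a list comprehension
    csv_names = [os.path.basename(path) for path in csv_files]

print(csv_names)
```

If only one specific file is needed, building its path directly (for example by joining the directory and the file name) avoids relying on the order altogether.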

In [None]:
# Open the file and save all entries within data.csv into a table
# Notice we are doing the same thing as in exercise 0, but using a list comprehension instead
with open(data,'r') as f:
    lines = [line for line in f]
# No f.close() is needed: the with statement closes the file automatically

print("We print the first 5 entries in our array lines.\n")
print(lines[0:5])

Say we are now interested in parsing the information. We notice each row is a string whose entries are separated by commas. We can use the split() method to separate the information into pieces. 

In [None]:
# Apply the split() method to lines[1] (the second line of the file) to dissect it into pieces
r_one = lines[1].split(",")
print(r_one)

If we are interested in only the first and fourth elements of this array, we can print them as follows.

In [None]:
print(r_one[0],r_one[3])
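
Because each line read from a file still ends with a newline character, the last field produced by split(",") carries that newline with it. The sketch below illustrates this on a made-up row; `sample_line` and its middle fields are hypothetical, and the real columns of data.csv may differ.

```python
# Hypothetical row in the style of data.csv; the real columns may differ
sample_line = "MZ22503562,field_b,field_c,MZ22507874\n"

# split(",") alone leaves the newline attached to the last field
raw_fields = sample_line.split(",")

# strip() removes the surrounding whitespace, including the newline, before splitting
clean_fields = sample_line.strip().split(",")

print(repr(raw_fields[3]))
print(clean_fields[0], clean_fields[3])
```

This only matters when the field of interest is the last one in the row, but stripping first is a cheap safeguard.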

### Your turn

Complete the following cells to extend what we have been working on to all the rows in our file data.csv.

In [None]:
# Number of elements in the lines array. Apply the len() function on the lines array.
size_of_lines = len(_ _ _)


# Using a list comprehension, apply the split() method on each entry of the lines array
# How many times should we iterate?

tabulate_lines = [_ _ _.split(",") for i in range(_ _ _)]

# Now that you have obtained an array whose entries are arrays with information separated by commas,
# your job is to extract from each row the first and the fourth element from tabulate_lines.
# Complete the list comprehension below and indicate an appropriate range

pair  = [[_ _ _, tabulate_lines[i][3]] for i _ _ _ ]

# Print the pair array
print(_ _ _)

# Apply the function remove_repetitions() on the pair array to ensure we do not have repetitions
unique_pairs = remove_repetitions(pair)

# Print the unique_pairs array
print(unique_pairs)

### Review 

In this exercise, we made extensive use of list comprehensions to read, store and dissect information within a data file.

In [None]:
# Read file and store contents into lines array. List comprehension version
with open(data,'r') as f:
    lines = [line for line in f]

# Number of elements in the lines array
size_of_lines = len(lines)

# Use a list comprehension to split array elements into subarrays
tabulate_lines = [lines[i].split(",") for i in range(size_of_lines)]

# Use a list comprehension again to extract the first and fourth pieces of information from each row
# (the range starts at 1, so the first line of the file is skipped)
pair = [[tabulate_lines[i][0], tabulate_lines[i][3]] for i in range(1, size_of_lines)]

# Apply remove_repetitions on the array pair to remove any repeated entries
unique_pairs = remove_repetitions(pair)
print(unique_pairs)
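
The function remove_repetitions is loaded from functions.py, whose contents are not shown in this notebook. For readers curious about how such a helper could work, here is a minimal sketch under that caveat. The name `remove_repetitions_sketch` and the sample data are hypothetical, and the actual implementation may differ.

```python
def remove_repetitions_sketch(pairs):
    """Return the pairs with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for pair in pairs:
        key = tuple(pair)  # lists are not hashable, so use a tuple as the set key
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique

# Made-up pairs, with one repetition
sample_pairs = [["A1", "B1"], ["A2", "B2"], ["A1", "B1"]]
sample_unique = remove_repetitions_sketch(sample_pairs)
print(sample_unique)
```

A plain for loop is used here on purpose: the check depends on what has already been collected, which does not fit neatly into a single list comprehension.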