# Computational Module 2: Intermediate Python

Please complete this notebook by filling in the cells provided.

For all problems that require you to write out explanations and sentences, you must provide your answer in the designated space. Moreover, throughout this homework and all future ones, please be sure not to reassign variables throughout the notebook! For example, if you use <code>max_temperature</code> in your answer to one question, do not reassign it later on.

Directly sharing answers with your fellow ULAB colleagues is not okay, but discussing problems with them or with your mentors is encouraged. You should start early so that you have time to get help if you're stuck! Drop-in office hours staffed by ULAB computational scientists will be held periodically; please keep an eye on ULAB Slack for more information.

## Lesson Plan

In this lesson, you will...
- be introduced to Python functions and how to create them, and learn about specific variations of functions like lambdas and nested functions
- learn some more specifics regarding common data types you might encounter in your project
- get some practice with Python shortcuts / tips / tricks, including list comprehensions

In [None]:
# [IMPORTANT] Change to your name and run this cell! Otherwise grading won't work!
%env grade_name='Name'

In [14]:
# MAKE SURE TO RUN THIS CELL (It imports the autograder file)
!pip install slacker
import csModule2

## Functions

A lot of the time when you're writing long chunks of code, you'll notice that some of the lines seem a bit redundant. Maybe you're changing one number or one string or some particular input at a time, but at the same time, you're not able to use a loop to reduce the number of lines you write because the offending "repetitive" lines are far apart from each other. For example, you have a dataset in which all of the columns are in string form, but as you begin analysis on one column, you would like to be able to convert all of the values in it to a different type.

In this case, if you are dealing with a <i>Pandas dataframe</i> (Pandas is a Python library for data science and manipulation, and a dataframe is the standard structure of table storage within Pandas), the method for converting column data types follows the general skeleton <code>[column].astype([type])</code>. For example, if you would like to convert a column assigned to the variable <code>ex_column</code> to <code>int</code> values, you could reassign <code>ex_column</code> to <code>ex_column.astype('int64')</code>.

The <code>astype</code> name here represents a function. Just as variables and their data types are the nouns of programming, functions are the verbs: series of lines that perform actions upon <i>objects</i> (variables, numbers, strings, even other functions, etc.) passed in as <i>parameters</i>. In the previous Module, we went over the basics of what a function is and explored some built-in functions, as well as the difference between <i>pure</i> and <i>non-pure</i> functions, a distinction based on whether or not the function returns a value.

However, <code>print</code> and <code>abs</code> are built-in functions and <code>astype</code> comes ready-made from the Pandas library, and ready-made functions have a limited range of capabilities. What if you need to write your own custom function to handle something specific to your needs that built-in functions or a simple library import cannot perform on their own?

You can define a function in Python using the following basic template:

<code>
def [function_name](... parameters ...):
    [lines of code that may include manipulations using parameter names]
    optional: return a value ex. return x
</code>

Let's walk through what's going on in the block of code above. The Python keyword <code>def</code> signifies the start of a function definition, similar to defining a variable, with <code>[name] = </code> being an analogy to <code>def [function_name]</code>. After the function name, the parameters of the function are listed separated by commas within parentheses. The names of the parameters can be anything, but they should be descriptive placeholders for the values that the programmer expects there to be when the function is called.

In the body of the function, which should be indented consistently from the encapsulating definition line, you can include code or calculations that might concern one or more of the parameters passed through. For example, if a parameter to a function <code>square</code> (which will give back the square of <code>n</code>) is <code>n</code>, a line inside the function to calculate the square could be <code>square = n ** 2</code>. Notice that the calculations and lines are all written in terms of the parameters. However, this means that the parameter variable names only have local scope i.e. you can't write <code>print(n)</code> outside the function in which it is defined because the computer won't know what it is. The name <code>n</code> is defined inside <code>square</code> and discarded as soon as the function finishes running.
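To make the idea of local scope concrete, here is a minimal sketch. The names <code>result</code>, <code>answer</code>, and <code>n_is_visible</code> are illustrative choices and not part of the assignment.

```python
def square(n):
    # n and result exist only while square is running
    result = n ** 2
    return result

answer = square(4)  # the returned value survives; the local names do not

try:
    n  # n was never defined outside the function
    n_is_visible = True
except NameError:
    n_is_visible = False
```

After this runs, <code>answer</code> holds 16, while looking up <code>n</code> outside the function raises a NameError.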

Here's an example of a fully defined function:

<code>
def n_sum(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    print("Now we're done!")
    return total
</code>

What do you think this function does? What will it print for different input values to <code>n</code>?

Now, you know how to define a function. But that's not very useful unless you also know how to call it! In fact, calling your own function works in exactly the same way as calling a built-in function or an imported function. If you define the function in the current environment, you can simply call it on some input <i>arguments</i> in the following way: <code>[function_name](... arguments ...)</code>.

You will lose that function as soon as you exit the Python environment in which it was created, which is why you have to rerun all of the cells whenever you reopen a Jupyter notebook and the kernel restarts. The cool thing about a Jupyter notebook is that the code for your functions is saved in cells, which makes rerunning more convenient than copy-pasting within the terminal.

However, if your function is in a different file and you want to import it into your Python environment, you have two options. You can run the line <code>import [file_name]</code> and reference the function as <code>[file_name].[function_name](... arguments ...)</code> from then on. Alternatively, you can import just the function you need from the file with <code>from [file_name] import [function_name]</code>; from then on, the function can be called in the format <code>[function_name](... arguments ...)</code>, which is often much more convenient.
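As a quick sketch of the two import styles, the example below uses the standard-library <code>math</code> module as a stand-in for <code>[file_name]</code>; importing from your own file should look the same, assuming the file is somewhere Python can find it.

```python
# Style 1: import the whole module, then use the module name as a prefix
import math
root_a = math.sqrt(16)

# Style 2: import only the function you need and call it directly
from math import sqrt
root_b = sqrt(16)
```

Both calls give the same result; the second style simply saves you from typing the module name each time.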

<b>Note</b>: Also, this little tidbit isn't just relevant to this problem, but it might be useful in terms of shorthand in future ones. When you want to increment a variable, there are two main ways you can go about doing this in Python. Assuming the variable <code>x</code> is something you want to increase by 5, one command is <code>x = x + 5</code> and the other which means the same thing but compresses the number of characters you use is <code>x += 5</code>. Replace 5 with any incrementing number and <code>x</code> with any variable and the template still holds! You can also change the operation to $-$, $*$ or even $/$ if you want to divide in place, for example, and you can replace the positive number with a negative one, or even a float.
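Here is a short sketch of the shorthand with each operator in turn. The starting value of 10 is arbitrary.

```python
x = 10
x += 5    # same as x = x + 5  -> 15
x -= 3    # same as x = x - 3  -> 12
x *= 2    # same as x = x * 2  -> 24
x /= 4    # same as x = x / 4  -> 6.0 (division always gives a float)
```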

#### Checkpoint 1

Write a function <code>find_max_sum</code> that finds the maximum sum of 2 consecutive numbers in a list that is passed as an argument. Assume that the list will contain only positive integers. For example, in the list <code>[3, 4, 2, 5, 4, 6, 1, 1, 2]</code>, your function should return <code>10</code> because the maximum 2-term consecutive sum is $4 + 6 = 10$.

Note: If you've heard of this problem or variants thereof before, don't worry about making your code maximally efficient! You just have to get the correct answer given different inputs (think about edge cases that can cause your code to crash).

In [21]:
def find_max_sum(lst):
    ... # insert your code here

In [None]:
# autograder cell: do not alter
csModule2.checkpoint1(find_max_sum)

### Lambdas

Let's delve a little deeper into the land of functions. As we learned above, when functions are defined with <code>def</code>, they must be assigned to a name. That way, a function is accessible in the future -- a name is how we specify that some function <code>f</code> in particular should be called.

<i>Lambda</i> expressions in Python are a less restrictive / formal way of defining functions. They are essentially shortcuts to defining small, anonymous functions for what are usually one-time calculations. Take a look! Run the code cell below.

In [None]:
# a lambda function!
multiplier = lambda x, y: x * y
multiplier(4, 5)

You might have guessed what would happen here simply from the explicit variable names, but let's walk through the syntax of defining lambdas and exactly what's going on above. 

In the first line, the name <code>multiplier</code> is assigned to a lambda expression, so we can use this one in future code. Zooming in on the expression, we know a lambda is an anonymous function (you can tell from the fact that it has no name), and this lambda takes 2 parameters <code>x</code> and <code>y</code>, returning their product. Hence, when passing in <code>x = 4</code> and <code>y = 5</code>, we get back $4 * 5 = 20$. 

However, you may be thinking: why are lambdas useful? Can't you just get the product of 2 numbers by doing this:

<code>
print(4 * 5)
</code>

or by doing this:

<code>
def multiplier(x, y):
    return x * y
</code>

? What's the point?

The point is that lambdas are extremely concise. You can write powerful one-line statements that would take multiple lines without lambdas. Try running the code cell below.

In [None]:
# a complicated lambda!
sorted(range(-5, 6), key = lambda x: x ** 2)

The function <code>sorted</code> is a built-in function that can be called on Python sequences, including lists and range objects -- <code>range(-5, 6)</code> is a list-like object that includes -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5 sequentially. The <code>key</code> parameter determines the value by which each element of the range is ordered, which is why some negative numbers are shown after 0 (they are ordered by their squares, which are larger positive numbers). Because <code>sorted</code> is stable, tied elements keep their original order, so each negative number appears before its positive counterpart.
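For comparison, the sketch below sorts the same range twice: once with a lambda as the key and once with an equivalent named function. The names <code>square_key</code>, <code>by_square</code>, and <code>by_square_named</code> are made up for this illustration.

```python
numbers = range(-5, 6)

# Sort by the square of each number; equal squares keep their original order
by_square = sorted(numbers, key=lambda x: x ** 2)

# The same key written as a named function, for comparison
def square_key(x):
    return x ** 2

by_square_named = sorted(numbers, key=square_key)
```

Both versions give <code>[0, -1, 1, -2, 2, -3, 3, -4, 4, -5, 5]</code>; the lambda just avoids a separate definition for a function that is used once.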

But can you see how lambdas can construct objects quickly based on specified conditions? They're confusing at first, but in certain coding situations can be important and time-saving to understand.

The general structure of a lambda resembles the following (and they can be assigned to variables for future reference):

<code>lambda [... parameters ...]: [returned expression]</code>

Remember that just as in functions, the parameters are comma-separated!

Lambdas probably won't be extremely useful while you're working on the data analysis for your project (try to use named functions as often as possible; they make your code more readable), but they're an important feature of Python and many other programming languages, and an interesting concept in CS in general.

<b>Note:</b> Lambdas are single expression functions. After the colon, you may only place what the lambda function will <i>return</i> (might be a <code>None</code> value if you don't want to return anything). However, you may not perform intermediate statements past the colon, including explicit return statements.

#### Checkpoint 2

Assign a lambda function to the variable <code>count_match</code> that returns the number of times an element occurs in a list.

<i>Hint</i>: The function <code>count</code> will perform this action like so: <code>[1, 2, 3, 8, 12, 1, 1].count(1)</code> will give <code>3</code>.

In [None]:
count_match = lambda lst, elem: ... # insert your code here
count_match([1, 2, 3, 8, 12, 1, 1], 1)

In [None]:
# autograder cell: do not alter
csModule2.checkpoint2(count_match)

### Nested Functions

A <i>nested function</i> is a function that is defined inside of another function. Often, this is referred to as a type of helper function, performing intermediate calculations before the final result is returned. There are 3 main reasons a programmer may use nested functions in their code:

1) <i>Encapsulation</i>. This is an important idea in CS in which individual functions or classes are defined specifically to do their own jobs and call each other instead of having one function do the work of multiple. This is a superior way of designing code that increases readability. Encapsulation relates to nested functions specifically because you can have inner functions inside a larger one to protect them from being in global scope, where they might not be necessary and names might conflict or be overwritten. For example:

<code>
def outer(x, y):
    def inner_mult(x, y):
        return x * y
    return inner_mult(x, y) + inner_mult(x, y)
    
def inner_mult(x, y):
    return x * y
</code>

Just because another function <code>inner_mult</code> is defined outside the function <code>outer</code> doesn't mean the two functions with the same name overwrite each other. The <i>scope</i> of the second <code>inner_mult</code> is global because it isn't inside another function. Global scope means that you can access the function from any part of your program. However, the scope of the nested <code>inner_mult</code> is limited to the inside of the <code>outer</code> function, which keeps it from being overwritten by the global <code>inner_mult</code>. When <code>outer(4, 5)</code> is called, <code>inner_mult(4, 5) + inner_mult(4, 5)</code> will be returned, where <code>inner_mult</code> is the one nested in <code>outer</code>. This is exactly what we expect!
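One way to see which version gets called is to make the two functions return distinguishable values. This is only a sketch: the name <code>helper</code> and the "nested" / "global" labels are invented for the demonstration.

```python
def outer(x, y):
    def helper(x, y):
        # this version is visible only inside outer
        return "nested: " + str(x * y)
    return helper(x, y)

def helper(x, y):
    # this version lives in global scope
    return "global: " + str(x * y)

from_outer = outer(4, 5)    # uses the nested helper
from_global = helper(4, 5)  # uses the global helper
```

Here <code>from_outer</code> is <code>"nested: 20"</code> and <code>from_global</code> is <code>"global: 20"</code>, so neither definition has replaced the other.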

2) <i>Abstraction</i>. Say you want to make your function robust and throw errors when an unexpected input is passed through. You can nest a function inside the main function that handles and validates inputs before any operations or calculations are performed. In most cases, you could un-nest this function and widen its scope to global so that you can extend its capabilities to multiple functions, but if a function has special inputs that make it different from the others in your file, nesting might be a good idea. You might have to deal with nesting validation functions when you're cleaning up messy, unformatted data before you begin to apply your astrophysics knowledge!
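A minimal sketch of a nested validation function might look like the following. The function name <code>mean_magnitude</code>, the checks it performs, and the sample values are all assumptions made for illustration.

```python
def mean_magnitude(values):
    def validate(values):
        # reject inputs the calculation below cannot handle
        if len(values) == 0:
            raise ValueError("values must not be empty")
        for v in values:
            if not isinstance(v, (int, float)):
                raise TypeError("every value must be a number")

    validate(values)
    return sum(values) / len(values)

average = mean_magnitude([4.5, 5.5, 6.5])
```

Because <code>validate</code> runs first, a bad input fails early with a clear error message rather than partway through the calculation.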

3) <i>Closures</i>. A closure is a situation in which the inner function "remembers" the values defined / set in its outer scope when it's called. This might be confusing to conceptualize, so take a look at the example below:

In [None]:
def exponentiate(base):
    def inner(exponent):
        return base ** exponent
    return inner
    
exp_4 = exponentiate(4) # line 1
exp_7 = exponentiate(7) # line 2

print(exp_4(3))

print(exp_7(2))

In the above example, <code>exponentiate</code> is called a factory function because each time it's called, it creates / returns a new function -- a variation of <code>inner</code> in which the base is set to the parameter of <code>exponentiate</code>. 

Check out the line labeled "line 1"! Here, we called <code>exponentiate</code> with argument 4 which set the parameter <code>base</code> to 4. The returned value (now <code>exp_4</code>) points to the function <code>inner</code> where <code>base</code> is 4. Similarly, <code>exp_7</code> points to the function <code>inner</code> where <code>base</code> is 7. Hence, when the <code>exp_4</code> or <code>exp_7</code> functions are called, their respective <code>inner</code> functions evaluate <code>base ** exponent</code> with the corresponding known base in mind. This explains why $4^3 = 64$ and $7^2 = 49$.
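If you are curious where the remembered value lives, Python exposes it through the function's <code>__closure__</code> attribute. The sketch below repeats the factory from above so that it runs on its own; the names <code>remembered_4</code> and <code>remembered_7</code> are illustrative.

```python
def exponentiate(base):
    def inner(exponent):
        return base ** exponent
    return inner

exp_4 = exponentiate(4)
exp_7 = exponentiate(7)

# Each returned function carries its own remembered value of base
remembered_4 = exp_4.__closure__[0].cell_contents
remembered_7 = exp_7.__closure__[0].cell_contents
```

You will rarely need to inspect <code>__closure__</code> in practice, but it shows that each returned function keeps its own copy of <code>base</code>.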

#### Checkpoint 3

Implement a nested function as a closure inside the function <code>partial_factorial</code> so that every time you call <code>partial_factorial</code> with a <code>start</code> argument, it returns a <i>function</i> that, when called with an <code>end</code> argument, will return the product <code>start \* (start + 1) \* ... \* (end - 1) \* end</code>. For example, if <code>start = 2</code> and <code>end = 5</code>, the program should return 2 \* 3 \* 4 \* 5 = 120.

In [None]:
def partial_factorial(start):
    def helper(end):
        ...
    ...
    
start_2 = partial_factorial(2)
print(start_2(5))

In [None]:
# autograder cell: do not alter
csModule2.checkpoint3(partial_factorial)

### Map, Filter, Reduce

This subsection is -- for the most part -- tangentially related to the functions content you just learned above. However, we will be going over 3 important functions that can be applied to Python sequences as a whole, sidestepping explicit for loop syntax, which can be slower and more verbose as lists grow larger.

If you want to know the reasoning behind why loops are time-hungry, you can take a look at some CS 61A and CS 61B content: search for Big O notation and time complexity of programs. In brief, if a sequence has $n$ elements, it will take $n$ time units for a program to make 1 pass over the entire list. This time gets longer and longer as $n \rightarrow \infty$ (i.e. for very large datasets). If you want to apply a certain function to every element of the list, this time (the $n$ time units) must be multiplied by the time it takes each function call to execute. This can get unreasonably large very quickly.

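The functions <code>map</code>, <code>filter</code>, and <code>reduce</code> are meant to help with this issue! They still visit every element of the sequence, but they hand the looping over to Python itself and let you express it in a single line. Let's take a closer look.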

1) <b>Map</b>: The general structure of the <code>map</code> function is the following: <code>map(function_to_apply, sequence)</code>. The function will output a <code>map</code> type object which you can then convert into a list or tuple by simply performing the functions <code>list(...)</code> or <code>tuple(...)</code> upon the object. The resultant sequence will have the same number of elements as the input list, except each corresponding element in the original list will have had the function applied to it. Hence, the resulting list will look like: <code>[func(seq[0]), func(seq[1]), ...]</code>.

This might seem a little abstract so let's look at a practical example. Note that when <code>map</code> is given a single sequence, its <code>function_to_apply</code> parameter must be a function that takes only 1 argument (the current element of the sequence).

In [None]:
map_ex = range(5, 100, 2)
list(map((lambda x: x ** 2), map_ex))

In [None]:
# the for loop alternative
squared_map_ex = []
for i in map_ex:
    squared_map_ex.append(i ** 2)
squared_map_ex

In the first line when defining <code>map_ex</code> we assign that name to a range object (a type of Python sequence) in which the first element is 5 and the next elements are progressively incremented by 2 upper bounded by 100 i.e. <code>map_ex = [5, 7, 9, ..., 99]</code>. In the second line, we define a lambda function that returns the square of whatever argument is passed into it. Because the function and <code>map_ex</code> are the parameters of <code>map</code>, the lambda function is applied to all of the elements of <code>map_ex</code>, so the resultant list as printed above consists of the elements $5^2 = 25, 7^2 = 49, ..., 99^2$.

We are using a simple example to illustrate the effects of <code>map</code> here, but during data analysis, you can use this to -- perhaps -- combine two columns in a table together according to a physics equation you encountered in a paper.
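As a sketch of the column-combining idea, <code>map</code> also accepts more than one sequence, in which case the function receives one element from each sequence at a time. The two lists below are made-up measurements standing in for table columns, and the kinetic energy formula $\frac{1}{2} m v^2$ is just a convenient example equation.

```python
# Hypothetical measurements: masses in kg and speeds in m/s
masses = [2.0, 4.0, 10.0]
speeds = [3.0, 1.0, 2.0]

# With two sequences, the function passed to map takes two arguments
kinetic_energies = list(map(lambda m, v: 0.5 * m * v ** 2, masses, speeds))
```

The result is <code>[9.0, 2.0, 20.0]</code>, one value per pair of inputs.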

2) <b>Filter</b>: The <code>filter</code> function follows the same general structure (<code>filter(function_to_apply, sequence)</code>) as the <code>map</code> function does, but their jobs are different. As its name implies, <code>filter</code> will filter out the elements in a sequence that do not satisfy a particular condition, and leave the rest behind in the final sequence. Run the code cell below to try this out!

In [None]:
filter_ex = range(5, 100, 5)
list(filter((lambda x: x % 2 == 0), filter_ex))

In [None]:
# the for loop alternative
even_filter_ex = []
for i in filter_ex:
    if i % 2 == 0:
        even_filter_ex.append(i)
even_filter_ex

In the first line, when we're defining <code>filter_ex</code> we assign that name to a range object that contains elements <code>[5, 10, ..., 95]</code> because the start of the object is at 5, and end is at 100 exclusive, and the skip factor is 5. In the second line, we're defining another lambda function that returns whether or not the argument passed through is even. Note that the function that is passed through as an argument in <code>filter</code> must always return a boolean value i.e. either True or False. Elements that when called with the function return True stay within the filtered list while those that return False are filtered out.

The answer that we get here makes sense; only the elements <code>[10, 20, ..., 90]</code> are printed because they are the only multiples of 5 less than 100 that are also divisible by 2. The <code>filter</code> performs the equivalent of essentially writing a loop with a conditional statement inside.

3) <b>Reduce</b>: The <code>reduce</code> function is a good alternative to performing a rolling operation on all of the elements in a sequence. Want to multiply all the elements in a list together?

In [None]:
from functools import reduce
reduce_ex = list(range(1, 5))
reduce((lambda x, y: x * y), reduce_ex)

In [None]:
# the for loop alternative
mult_reduce_ex = 1 # base value for multiplying (this really depends on the operation you're considering)
for i in reduce_ex:
    mult_reduce_ex *= i
mult_reduce_ex

In the case above, the rolling operation is \*, or multiplication. Note that the lambda function that is passed as the first argument to <code>reduce</code> takes two arguments -- the result accumulated so far and the next list element, effectively functioning as the running product <code>mult_reduce_ex</code> and the loop variable <code>i</code> in the for loop alternative. We know that <code>reduce_ex</code> is a variable that holds the list value <code>[1, 2, 3, 4]</code>, so what <code>reduce</code> does when this is provided as the second argument is multiply every element in the list together, computing $((1 \cdot 2) \cdot 3) \cdot 4$ to yield $24$, or $4!$. Note that <code>reduce</code> is not a built-in function and must be imported from a library called <code>functools</code>.
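To watch the rolling operation step by step, one option is <code>itertools.accumulate</code>, which applies the same two-argument function but keeps every intermediate result. This is only an illustration; the names <code>values</code>, <code>acc</code>, and <code>nxt</code> are chosen here and do not appear in the cells above.

```python
from functools import reduce
from itertools import accumulate

values = [1, 2, 3, 4]

# accumulate applies the same function as reduce but keeps every running result
running_products = list(accumulate(values, lambda acc, nxt: acc * nxt))

# reduce keeps only the last running result
final_product = reduce(lambda acc, nxt: acc * nxt, values)
```

Here <code>running_products</code> is <code>[1, 2, 6, 24]</code> and <code>final_product</code> is the last of those values, 24.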

#### Checkpoint 4

You are a librarian and have a nested list of book data in the following format: <code>[book_number, [status, num_checked_out, title, author, category, num_copies, cost], ..., book_number, [status, ...]]</code>. Write a Python program that returns a list with 2-tuples. Each tuple consists of a book number and the product of <code>num_copies</code> and <code>cost</code>.

Write a Python program using lambda, map, and the concepts explored in this section.

In [None]:
book_data = [
    1, ["checked out", 3, "Slaughterhouse Five", "Kurt Vonnegut", "science fiction", 3, 10],
    2, ["on shelf", 2, "Hunger Games", "Suzanne Collins", "dystopian", 5, 11],
    3, ["checked out", 5, "Harry Potter and the Philosopher's Stone", "J. K. Rowling", "fantasy", 5, 7],
    4, ["on shelf", 1, "The Sphere of Secrets", "Catherine Fisher", "fantasy", 2, 8],
    5, ["on shelf", 0, "The Handmaid's Tale", "Margaret Atwood", "dystopian", 2, 7]
]

... # write your code here

final_list = ...

In [None]:
# autograder cell: do not alter
csModule2.checkpoint4(final_list)

## Data Type Details

In this section, we will dive further into specific data types and various ways in which we can use their specific methods to avoid reinventing the wheel! Loops are useful, but built-in methods are often optimized for speed, especially with large inputs, which you will likely be dealing with during your project.

### String Concatenation

In Python, 2 numbers can be added pretty easily: we know if we type in <code>45 + 2</code>, we'll get 47 back. But what if we wanted to add 2 strings together? Run the code cell below.

In [None]:
# add the strings
s1 = "Hello, "
s2 = "world!"

s1 + s2

Hmm. String addition (or <i>concatenation</i>, which basically means pasting the end of a string onto the beginning of another) is formatted very similarly to regular addition, and because <code>s1</code> and <code>s2</code> are of the same data type, the line doesn't error. The purpose of this functionality might seem a bit frivolous now, but putting some deeper thought into it...

In [None]:
# user input string concatenation
name = input("Name: ")
major = input("Major: ")

print("Hi! I'm " + name + " and I'm a " + major + " major at Cal.")

String concatenation can be incredibly useful when trying to reformat data, move it to a new table, or consolidate data to write to files with specific extensions. Remember, however, that you must convert the elements to be concatenated into strings because adding an integer to a string, for example, will throw a TypeError.

In [None]:
# bad concatenation: TypeError

age = 19

print("Hi! I'm " + age + " years old.")

In [None]:
# good concatenation

age = 19

print("Hi! I'm " + str(age) + " years old.")

#### Checkpoint 5

Let's take a break and play some Mad Libs! Use the string concatenation principles you learned in this section to replace the <code>[part of speech]</code> blocks inside <code>str_w_blanks</code> so that the story reads more smoothly (you can write your own words corresponding to each part of speech in the <code>...</code> parts). When you print <code>final_str</code> there should be no words in square brackets left over!

Note that you can pass this checkpoint by just inserting random words; you don't have to use the Disney lyrics if you don't want to.

In [None]:
str_w_blanks = """Let's get down to [noun_1], to [verb_1] the [noun_2]
Did they send me [noun_3], when I [verb_2] for [noun_4]?
You're the [adj_1] [noun_4] I ever met
But [pronoun_1] can [verb_3] before we're through
[noun_5], I'll [verb_4] a man out of you"""

noun_1 = ...
noun_2 = ...
noun_3 = ...
noun_4 = ...
noun_5 = ...

verb_1 = ...
verb_2 = ...
verb_3 = ...
verb_4 = ...

adj_1 = ...

... # insert your code here

final_str = ...

print(final_str)

In [None]:
# autograder cell: do not alter
csModule2.checkpoint5(final_str)

Have you guessed where the original string is from?

<i>Hint</i>: In the next section, we'll get down to business! :)

### String Formatting

An alternative to string concatenation is Python's string formatting capabilities. Instead of having to laboriously think of where to end a component string so that there won't be any extra spaces or commas, string formatting allows programmers to write out the original string format and code replacements into the blank "spaces" afterward. See the below code cell!

In [None]:
num_cats = "45"

print("I have %s cats living in my house!" % num_cats)

The code cell above should print "I have 45 cats living in my house!" (by the way, I love cats, but that's way too many). What's going on in that cell though?

Notice that in the spot that "45" should be in the printed string, there is instead a "%s" -- a placeholder for the value (the "s" part stands for string, clarifying the data type) that should actually be there. The <code>% num_cats</code> after the string format specifies exactly what value should replace the placeholder.

What if you want to replace more than 1 value in a string? This is where tuples come in handy. Look at the code cell here to see how.

In [None]:
num_cats = 45
address = "123 Cleary St."

print("I have %d cats living in my house on %s!" % (num_cats, address))

As you can see from above, we are now using %d along with %s. The %d formats a number as an integer, while %s can be used for strings or actually any other data type: lists, dictionaries, etc. This is because %s converts its value with the built-in <code>str</code> function, which works on all Python data types, i.e. before a list can be dropped into the placeholder %s, it is actually converted to a string representation of that list!

In the tuple past the % sign, you can include any number of values to replace the placeholders as long as there are the same number of values as placeholders. The first tuple element is matched to the first placeholder, and so on. This explains why the 45 goes into %d and the address is placed in %s, in order.
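The sketch below puts three placeholders in one string, including a list dropped into a %s. The variable names and the sentence are invented for the example.

```python
num_planets = 8
star = "the Sun"
moons = ["Phobos", "Deimos"]

# The tuple values fill the placeholders from left to right
sentence = "There are %d planets orbiting %s; Mars has moons %s." % (num_planets, star, moons)
```

The list is converted to its string representation, so <code>sentence</code> ends with <code>['Phobos', 'Deimos'].</code>, brackets and quotes included.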

#### Checkpoint 6

A cool thing about Jupyter notebooks is that you can access the variables defined in previous cells (unless you've redefined them). In this checkpoint, we're going to use the following variables from <b>Checkpoint 5</b> to reimplement Mad Libs but with string formatting to get a better sense of its ease of use.

- <code>str_w_blanks</code>
- <code>noun_1</code>
- <code>noun_2</code>
- <code>noun_3</code>
- <code>noun_4</code>
- <code>noun_5</code>
- <code>verb_1</code>
- <code>verb_2</code>
- <code>verb_3</code>
- <code>verb_4</code>
- <code>adj_1</code>

You can check the correctness of your code as <code>new_final_str</code> in this section should end up looking the same as <code>final_str</code> in <b>Checkpoint 5</b>.

In [None]:
... # insert your code here

new_final_str = ...

print(new_final_str)

In [None]:
# autograder cell: do not alter
csModule2.checkpoint6(new_final_str)

### List Comprehensions

So far, we've learned that we can generate lists using the <code>range</code> function, for loops, or while loops. This often takes multiple lines. Analogous to the lambda for functions, list comprehensions are a condensed, efficient method of constructing lists using for loops and conditions for filtering purposes. Try out the code cell below:

In [None]:
# list constructed by for loop
lst_squares = []
for i in range(20):
    lst_squares += [i ** 2]
lst_squares

The output above should display a list of 20 elements, all integer squares from $0^2 = 0$ to $19^2 = 361$. We can actually obtain the same list in one line of code using a list comprehension.

In [None]:
lst_compr_squares = [i ** 2 for i in range(20)]

A list comprehension essentially looks like a rearranged for loop that condenses the entire process! The general form of a list comprehension is one of the following:

<code>
lst_compr = [[expression to evaluate] if [condition] else [expression for else condition] for [loop index] in [iterable: some sort of sequence]]
</code>

OR

<code>
lst_compr = [[expression to evaluate] for [loop index] in [iterable: some sort of sequence] if [condition]]
</code>

<b>Note</b>: In the 2nd format, the if condition acts as a filter and cannot be followed by an else. In the 1st format, the if ... else chooses between two expressions, so it will not work if the else condition is eliminated.

An example of all of these components in action is below so you can experiment with different aspects of it.

In [None]:
lst_compr = [i ** 2 if i % 2 == 0 else i ** 3 for i in range(20)]
lst_compr

The list <code>lst_compr</code> contains 20 elements. For even values of <code>i</code> (0, 2, 4, ...), the element is the square of <code>i</code>. For odd values of <code>i</code> (1, 3, 5, ...), the element is the cube of <code>i</code>.
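The example above uses the first format. For contrast, here is a small sketch of the second format next to another instance of the first; the names <code>even_squares</code> and <code>labels</code> are illustrative.

```python
# Second form: keep only the values that pass the condition
even_squares = [i ** 2 for i in range(10) if i % 2 == 0]

# First form, for contrast: every value is kept, but transformed differently
labels = ["even" if i % 2 == 0 else "odd" for i in range(4)]
```

Note the difference in length: <code>even_squares</code> has only 5 elements because the odd values were filtered out, while <code>labels</code> has one element for every value of <code>i</code>.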

Now that you know the basics of list comprehensions, let's do a problem to quiz you on what you've learned!

#### Checkpoint 7

Create a lambda function called <code>flatten</code> that takes a Python list as its argument and returns the <i>flattened</i> list when <code>flatten</code> is called. By flattened list, we mean that all elements in sublists should just be in the list at depth 1 (not in the sublist). For example, this deep list <code>[[1, 2, 3], [2, 3], [4]]</code> should look like this flattened: <code>[1, 2, 3, 2, 3, 4]</code>. For the sake of simplicity, assume that the list passed through does not have sublists of sublists i.e. does not look something like this: <code>[[1, 2, [3, 4]], 4]</code> and that the elements of your original list to flatten are only lists and not any other data type i.e. 5 all by itself will never be in the list.

In [42]:
flatten = lambda lst: [item for sublist in lst for item in sublist] # insert your code here

flatten([[1, 2, 3], [2, 3], [4]])

[1, 2, 3, 2, 3, 4]

In [44]:
# autograder cell: do not alter
csModule2.checkpoint7(flatten)

Checkpoint 7 Passed!


### List-specific Methods

Python has some list-specific methods that can help you shortcut common operations performed upon lists, like adding, removing, etc. Here are a few list methods you might find useful in your data analysis! You may use the code cells provided as a playground to experiment with edge cases to understand the functions better.

The first cell includes the list we will be using during our explorations of this topic!

In [None]:
# our original list

lst = [5, 8.0, 88.3, "sparkles", ["blue", "green", 'y', 5]]

In [None]:
# the append function

lst.append("cats")
lst

In [None]:
# the extend function

lst.extend(["cats", "dogs", 15])
lst

The major difference between the oft-confused <code>append</code> and <code>extend</code> is that <code>append</code> adds its argument to the list as a single element, while <code>extend</code> takes an iterable (such as a list) and adds each of its elements to the original list individually. If you append a sequence $s$, $s$ ends up as <code>lst</code>'s last element, one whole sequence nested inside the list. If you instead extend <code>lst</code> by $s$, then $s[0]$, $s[1]$, and so on are each added to the end of <code>lst</code>.
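
As a minimal sketch of this difference, the cell below applies both methods to two identical lists. The names <code>s</code>, <code>demo_append</code>, and <code>demo_extend</code> are hypothetical and exist only for this comparison.

```python
# s, demo_append and demo_extend are hypothetical names used only for this illustration.
s = ["cats", "dogs"]

demo_append = [1, 2, 3]
demo_append.append(s)      # s is added as ONE element (a nested list)

demo_extend = [1, 2, 3]
demo_extend.extend(s)      # each element of s is added individually

print(demo_append)         # [1, 2, 3, ['cats', 'dogs']]
print(len(demo_append))    # 4
print(demo_extend)         # [1, 2, 3, 'cats', 'dogs']
print(len(demo_extend))    # 5
```

Checking the length afterwards is a quick way to tell which of the two happened.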

In [None]:
# the insert function: inserts "sunshine" at index 2

lst.insert(2, "sunshine")
lst

In [None]:
# the remove function: removes the first occurrence of an element from the lst

lst.remove(88.3)
lst

In [None]:
# the count function: returns the number of times an element occurs in the lst

lst.count(5)

In [None]:
# the pop function: removes the last element from the lst and returns it

lst.pop()
lst
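
A few edge cases of <code>count</code>, <code>remove</code>, and <code>pop</code> are easy to miss, so here is a short sketch of them. It uses a separate, hypothetical list called <code>demo</code> so that <code>lst</code> is left alone.

```python
# demo is a hypothetical list used only for this illustration.
demo = [5, "a", 5, [5, 5], "b"]

# count only looks at top-level elements; the 5s inside the nested list are not counted.
print(demo.count(5))       # 2

# remove deletes only the FIRST matching element.
demo.remove(5)
print(demo)                # ['a', 5, [5, 5], 'b']

# pop returns the element it removes; with no argument it takes the last one.
last = demo.pop()
print(last)                # b

# pop also accepts an index.
first = demo.pop(0)
print(first)               # a
print(demo)                # [5, [5, 5]]

# remove raises a ValueError if the element is not present.
try:
    demo.remove("missing")
except ValueError as err:
    print("ValueError:", err)
```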

In [None]:
# the reverse function

lst.reverse()
lst

In [None]:
# the sort function: sorts the list (bad)

lst.sort()
lst

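Notice that there's a TypeError when you run the cell above. This is because Python's list sort method works by comparing elements with one another, and values of unrelated types (for example, a number and a string, or a number and a list) cannot be compared. Sorting works when all of the elements are mutually comparable, such as all numbers or all strings. Let's use a different list to experiment with sort.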

In [None]:
# the sort function: sorts the list (good)

lst_alt = [3, 5, 2, 8, 8, 1, 1, 1, 2]
lst_alt.sort()
lst_alt

Because duplicates compare as equal, they are simply placed next to each other in the resultant sorted list.
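
The sketch below shows a few more behaviors of sorting that the cells above do not cover: a list of strings, a list of mixed types, the <code>reverse</code> argument, and the built-in <code>sorted</code> function. The lists <code>words</code>, <code>mixed</code>, and <code>numbers</code> are hypothetical and only serve this illustration.

```python
# words, mixed and numbers are hypothetical lists used only for this illustration.
words = ["pear", "apple", "fig"]
words.sort()                       # strings can be compared with each other, so this works
print(words)                       # ['apple', 'fig', 'pear']

mixed = [3, "apple", 1.5]
try:
    mixed.sort()                   # an int and a str cannot be compared with <
except TypeError as err:
    print("TypeError:", err)

numbers = [3, 5, 2, 8, 8, 1]
numbers.sort(reverse=True)         # descending order, in place
print(numbers)                     # [8, 8, 5, 3, 2, 1]

# sorted() returns a new list and leaves its argument unchanged.
ascending = sorted(numbers)
print(ascending)                   # [1, 2, 3, 5, 8, 8]
print(numbers)                     # [8, 8, 5, 3, 2, 1]
```

Use <code>sort</code> when you want to change the list itself and <code>sorted</code> when you want to keep the original order around.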

#### Checkpoint 8

Perform a few basic operations on the list below by following the listed directions in order.

1. Remove the last element.
2. Remove all elements that are not numeric in type.
3. Insert 44 at position 2 and 4 at position 1.
4. Count how many times 4 occurs in the list.
5. Sort the list in descending order.

What does <code>original_lst</code> look like at the end?

In [None]:
original_lst = [4, 2.0, 88, "staircase", 5 / 6, 42, ["blue", "pineapples", 3], {"straw": "man", "4": "trees"}]

# operation 1

# operation 2

# operation 3

# operation 4

# operation 5

original_lst

In [None]:
# autograder cell: do not alter
csModule2.checkpoint8(original_lst)

In [None]:
csModule2.test_all(find_max_sum, count_match, partial_factorial, final_list, final_str, new_final_str,
                  flatten, original_lst)

## Summary

In this lesson we learned...
- what functions, lambdas, and nested functions are, and how to create them
- more about Python data structures, specifically lists, strings, and dictionaries, and intricacies about formatting them or type-specific methods
- how to construct lists concisely using list comprehensions
- how to use the <code>map</code>, <code>filter</code>, and <code>reduce</code> functions to avoid running too many / nested for loops

Things to look forward to...
- using dictionary-specific methods
- making graphs in Python (the deepest of rabbit holes!)
- science applications! First up: exoplanets. If you have any requests for applications, send a Slack message to Arjun Savel! Other plans include gravitational waves, relations between black holes and galaxies, and (hopefully) the Event Horizon Telescope data.