In [1]:
# Import libraries
import inspect
import expectexception

import pandas as pd

In [2]:
# Read data
gpa_grades = pd.read_fwf('datasets/gpa_grades.data', index_col=0)
gpa_grades.head(2)

Unnamed: 0,y1_gpa,y2_gpa,y3_gpa,y4_gpa
0,2.785877,2.052513,2.170544,0.06557
1,1.144557,2.666498,0.267098,2.884737


# 1. Best Practices

Learn best practices for writing functions in Python. You'll see how docstrings make a function easier to use, read, and maintain, how the DRY and "Do One Thing" principles tell you when to turn a chunk of code into a function, and how Python passes arguments to functions, including the pitfalls of mutable default arguments.

# <font color=darkred>1.1 Docstrings</font>

1. Docstrings
>Hi. My name is Shayne Miel. You've probably spent a lot of time using functions that someone else wrote. In this course, you'll learn how to write functions that others can use. Docstrings are a Python best practice that will make your code much easier to use, read, and maintain.

2. A complex function
>Look at this split_and_stack() function. If you wanted to understand what the function does, what the arguments are supposed to be, and what it returns, you would have to spend some time deciphering the code.

3. A complex function with a docstring
>With a docstring though, it is much easier to tell what the expected inputs and outputs should be, as well as what the function does. This makes it easier for you and other engineers to use your code in the future.

4. Anatomy of a docstring
>A docstring is a string written as the first line of a function. Because docstrings usually span multiple lines, they are enclosed in triple quotes, Python's way of writing multi-line strings. Every docstring has some (although usually not all) of these five key pieces of information: what the function does, what the arguments are, what the return value or values should be, info about any errors raised, and anything else you'd like to say about the function.

5. Docstring formats
>Consistent style makes a project easier to read, and the Python community has evolved several standards for how to format your docstrings. Google-style and Numpydoc are the most popular formats, so we'll focus on those.

6. Google Style - description
>In Google style, the docstring starts with a concise description of what the function does. This should be in imperative language. For instance: "Split the data frame and stack the columns" instead of "This function will split the data frame and stack the columns".

7. Google style - arguments
>Next comes the "Args" section where you list each argument name, followed by its expected type in parentheses, and then what its role is in the function. If you need extra space, you can break to the next line and indent as I've done here. If an argument has a default value, mark it as "optional" when describing the type. If the function does not take any parameters, feel free to leave this section out.

8. Google style - return value(s)
>The next section is the "Returns" section, where you list the expected type or types of what gets returned. You can also provide some comment about what gets returned, but often the name of the function and the description will make this clear. Additional lines should not be indented.

9. Google-style - errors raised and extra notes
>Finally, if your function intentionally raises any errors, you should add a "Raises" section. You can also include any additional notes or examples of usage in free form text at the end.

10. Numpydoc
>The Numpydoc format is very similar and is the most common format in the scientific Python community. Personally, I think it looks better than the Google style. It takes up more vertical space though, so this course will either use Google-style or leave out the docstrings entirely to keep the examples compact and legible.

11. Retrieving docstrings
>Sometimes it is useful for your code to access the contents of your function's docstring. Every function in Python comes with a __doc__ attribute that holds this information. Notice that the __doc__ attribute contains the raw docstring, including any tabs or spaces that were added to make the words line up visually. To get a cleaner version, with those leading spaces removed, you can use the getdoc() function from the inspect module. The inspect module contains a lot of useful methods for gathering information about functions.

12. Let's practice!
>Now it's your turn to practice writing and retrieving docstrings.
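
The transcript mentions that the inspect module offers more than getdoc(). As a minimal sketch, the snippet below combines `inspect.signature()` with `inspect.getdoc()` to summarize a function. `count_vowels()` is a hypothetical function written only for this illustration; it is not part of the course.

```python
import inspect


def count_vowels(text, vowels="aeiou"):
    """Count the vowels in `text`.

    Args:
        text (str): The string to search.
        vowels (str, optional): The characters to treat as vowels.

    Returns:
        int: Number of characters in `text` that appear in `vowels`.
    """
    return sum(1 for char in text.lower() if char in vowels)


# The signature object knows the parameter names and default values
signature = inspect.signature(count_vowels)

# getdoc() strips the indentation, so the first line is the summary
summary = inspect.getdoc(count_vowels).splitlines()[0]

# The raw attribute keeps the indentation of the source code
raw_args_line = count_vowels.__doc__.splitlines()[2]
clean_args_line = inspect.getdoc(count_vowels).splitlines()[2]

print("count_vowels{}".format(signature))
print(summary)
print(repr(raw_args_line), repr(clean_args_line))
```

The last line prints the "Args:" header twice: once with the four leading spaces kept by `__doc__`, and once without them as returned by `inspect.getdoc()`.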

In [3]:
# Google style
def function(arg_1, arg_2=42):
    """Description of what the function does. 
    
    Args: 
        arg_1 (str): Description of arg_1 that can break onto the next line 
            if needed. 
        arg_2 (int, optional): Write optional when an argument has a default 
            value. 
    
    Returns: 
        bool: Optional description of the return value 
        Extra lines are not indented. 
    
    Raises: 
        ValueError: Include any error types that the function intentionally 
            raises. 
    
    Notes: 
        See https://www.datacamp.com/community/tutorials/docstrings-python 
        for more info. 
    """

In [4]:
# Numpydoc style
def function(arg_1, arg_2=42):
    """
    Description of what the function does.
    
    Parameters
    ----------
    arg_1 : expected type of arg_1
        Description of arg_1.
    arg_2 : int, optional
        Write optional when an argument has a default value. 
        Default=42.
    
    Returns
    -------
    The type of the return value
        Can include a description of the return value.
        Replace "Returns" with "Yields" if this function is a generator.
    
    Raises
    -------
    ValueError: Include any error types that the function intentionally 
        raises. 
    
    Notes
    -------
    See https://www.datacamp.com/community/tutorials/docstrings-python 
    for more info.
    """
    return True

In [5]:
# Retrieving docstrings
print(function.__doc__)


    Description of what the function does.
    
    Parameters
    ----------
    arg_1 : expected type of arg_1
        Description of arg_1.
    arg_2 : int, optional
        Write optional when an argument has a default value. 
        Default=42.
    
    Returns
    -------
    The type of the return value
        Can include a description of the return value.
        Replace "Returns" with "Yields" if this function is a generator.
    
    Raises
    -------
    ValueError: Include any error types that the function intentionally 
        raises. 
    
    Notes
    -------
    See https://www.datacamp.com/community/tutorials/docstrings-python 
    for more info.
    


In [6]:
# Retrieving docstrings
print(inspect.getdoc(function))

Description of what the function does.

Parameters
----------
arg_1 : expected type of arg_1
    Description of arg_1.
arg_2 : int, optional
    Write optional when an argument has a default value. 
    Default=42.

Returns
-------
The type of the return value
    Can include a description of the return value.
    Replace "Returns" with "Yields" if this function is a generator.

Raises
-------
ValueError: Include any error types that the function intentionally 
    raises. 

Notes
-------
See https://www.datacamp.com/community/tutorials/docstrings-python 
for more info.


# <font color=darkred>1.2 Crafting a docstring</font> 

You've decided to write the world's greatest open-source natural language processing Python package. It will revolutionize working with free-form text, the way numpy did for arrays, pandas did for tabular data, and scikit-learn did for machine learning.

The first function you write is count_letter(). It takes a string and a single letter and returns the number of times the letter appears in the string. You want the users of your open-source package to be able to understand how this function works easily, so you will need to give it a docstring. Build up a Google Style docstring for this function by following these steps.

**Instructions**
- Copy the following string and add it as the docstring for the function: "Count the number of times 'letter' appears in 'content'".
- Now add the arguments section, using the Google style for docstrings. Use str to indicate a string.
- Add a returns section that informs the user the return value is an int.
- Finally, add some information about the ValueError that gets raised when the arguments aren't correct.

**Results**

<font color=darkgreen>What a delightful docstring! While it does require a bit more typing, the information presented here will make it very easy for others to use this code in the future. Remember that even though computers execute it, code is actually written for humans to read (otherwise you'd just be writing the 1s and 0s that the computer operates on).</font>

In [7]:
# Add a docstring to count_letter()
def count_letter(content, letter):
    """Count the number of times `letter` appears in `content`.

    Args:
        content (str): The string to search.
        letter (str): The letter to search for.
    
    Returns:
        int: Number of times `letter` appears in `content`.
    
    Raises: 
        ValueError: gets raised when the arguments aren't 
          correct.
        TypeError: gets raised if a variable is not defined.
  """
    if not isinstance(content, str):
        raise ValueError('`content` must be a string.')
    if len(letter) != 1:
        raise ValueError('`letter` must be a single character string.')
    
    return content.count(letter)
    #return len([char for char in content if char == letter])

In [8]:
%%expect_exception TypeError

count_letter('Calabazas')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-78b778baee8a> in <module>
----> 1 count_letter('Calabazas')

TypeError: count_letter() missing 1 required positional argument: 'letter'


In [9]:
%%expect_exception ValueError

count_letter('Calabazas', 'ae')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-b508a965742e> in <module>
----> 1 count_letter('Calabazas', 'ae')

<ipython-input-7-7f41781f0453> in count_letter(content, letter)
     18         raise ValueError('`content` must be a string.')
     19     if len(letter) != 1:
---> 20         raise ValueError('`letter` must be a single character string.')
     21 

In [10]:
print(inspect.getdoc(count_letter))

Count the number of times `letter` appears in `content`.

Args:
    content (str): The string to search.
    letter (str): The letter to search for.

Returns:
    int: Number of times `letter` appears in `content`.

Raises: 
    ValueError: gets raised when the arguments aren't 
      correct.
    TypeError: gets raised if a variable is not defined.
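
The "Raises" section is meant for errors that the function raises intentionally. The TypeError seen above for a missing argument comes from Python itself, before the function body runs. The sketch below is one possible stricter variant that validates both arguments and documents only the ValueError. The names `count_letter_strict()` and `error_message()` are hypothetical and exist only for this illustration. It uses a plain try/except, so it also works outside a notebook where the `%%expect_exception` magic is not available.

```python
def count_letter_strict(content, letter):
    """Count the number of times `letter` appears in `content`.

    Args:
        content (str): The string to search.
        letter (str): The letter to search for.

    Returns:
        int: Number of times `letter` appears in `content`.

    Raises:
        ValueError: If `content` is not a string or `letter` is not a
            single-character string.
    """
    if not isinstance(content, str):
        raise ValueError('`content` must be a string.')
    if not isinstance(letter, str) or len(letter) != 1:
        raise ValueError('`letter` must be a single character string.')
    return content.count(letter)


def error_message(func, *args):
    """Call `func` and return the message of the ValueError it raises, if any."""
    try:
        func(*args)
    except ValueError as error:
        return str(error)
    return None


print(count_letter_strict('Calabazas', 'a'))
print(error_message(count_letter_strict, 'Calabazas', 'ae'))
print(error_message(count_letter_strict, 42, 'a'))
```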


# <font color=darkred>1.3 Retrieving docstrings</font> 

You and a group of friends are working on building an amazing new Python IDE (integrated development environment -- like PyCharm, Spyder, Eclipse, Visual Studio, etc.). The team wants to add a feature that displays a tooltip with a function's docstring whenever the user starts typing the function name. That way, the user doesn't have to go elsewhere to look up the documentation for the function they are trying to use. You've been asked to complete the build_tooltip() function that retrieves a docstring from an arbitrary function.

You will be reusing the count_letter() function that you developed in the last exercise to show that we can properly extract its docstring.

**Instructions**
- Begin by getting the docstring for the function count_letter(). Use an attribute of the count_letter() function.
- Now use a function from the inspect module to get a better-formatted version of count_letter()'s docstring.
- Now create a build_tooltip() function that can extract the docstring from any function that we pass to it.

**Results**

<font color=darkgreen>This IDE is going to be an incredibly delightful experience for your users now! Notice how the count_letter.__doc__ version of the docstring had strange whitespace at the beginning of all but the first line. That's because the docstring is indented to line up visually when reading the code. But when we want to print the docstring, removing those leading spaces with inspect.getdoc() will look much better.</font>

In [11]:
# Get the "count_letter" docstring by using an attribute of the function
docstring = count_letter.__doc__

border = '#' * 28
print('{}\n{}\n{}'.format(border, docstring, border))

############################
Count the number of times `letter` appears in `content`.

    Args:
        content (str): The string to search.
        letter (str): The letter to search for.
    
    Returns:
        int: Number of times `letter` appears in `content`.
    
    Raises: 
        ValueError: gets raised when the arguments aren't 
          correct.
        TypeError: gets raised if a variable is not defined.
  
############################


In [12]:
# Inspect the count_letter() function to get its docstring
docstring = inspect.getdoc(count_letter)

border = '#' * 28
print('{}\n{}\n{}'.format(border, docstring, border))

############################
Count the number of times `letter` appears in `content`.

Args:
    content (str): The string to search.
    letter (str): The letter to search for.

Returns:
    int: Number of times `letter` appears in `content`.

Raises: 
    ValueError: gets raised when the arguments aren't 
      correct.
    TypeError: gets raised if a variable is not defined.
############################


In [13]:
def build_tooltip(function):
  """Create a tooltip for any function that shows the
  function's docstring.

  Args:
    function (callable): The function we want a tooltip for.

  Returns:
    str
  """
  # Get the docstring for the "function" argument by using inspect
  docstring = inspect.getdoc(function)
  border = '#' * 28
  return '{}\n{}\n{}'.format(border, docstring, border)

print(build_tooltip(count_letter))
print(build_tooltip(range))
print(build_tooltip(print))

############################
Count the number of times `letter` appears in `content`.

Args:
    content (str): The string to search.
    letter (str): The letter to search for.

Returns:
    int: Number of times `letter` appears in `content`.

Raises: 
    ValueError: gets raised when the arguments aren't 
      correct.
    TypeError: gets raised if a variable is not defined.
############################
############################
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
############################
############################
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the val

# <font color=darkred>1.4 Docstrings to the rescue!</font>

Some maniac has corrupted your installation of numpy! All of the functions still exist, but they've been given random names. You desperately need to call the numpy.histogram() function and you don't have time to reinstall the package. Fortunately for you, the maniac didn't think to alter the docstrings, and you know how to access them. numpy has a lot of functions in it, so we've narrowed it down to four possible functions that could be numpy.histogram() in disguise: numpy.leyud(), numpy.uqka(), numpy.fywdkxa() or numpy.jinzyxq().

**Instructions**

Examine each of these functions' docstrings in the IPython shell to determine which of them is actually numpy.histogram().

**Possible Answers**

- numpy.leyud()
    <code>print(numpy.leyud.__doc__)
    Gives a new shape to an array without changing its data.
    Parameters
    ----------</code>

- numpy.uqka()
    <code>print(numpy.uqka.__doc__)
    Returns the indices that would sort an array.
    Perform an indirect sort along the given axis using the algorithm specified
    by the `kind` keyword. It returns an array of indices of the same shape as
    `a` that index data along the given axis in sorted order.
    Parameters
    ----------</code>

- <font color=red>numpy.fywdkxa()</font>
    <code>print(numpy.fywdkxa.__doc__)
    Compute the histogram of a set of data.
    Parameters
    ----------</code>

- numpy.jinzyxq()
    <code>print(numpy.jinzyxq.__doc__)
    Return an array of zeros with the same shape and type as a given array.
    Parameters
    ----------</code>

**Results**

<font color=darkgreen>You found it! numpy.fywdkxa() is actually numpy.histogram() in disguise. If you've spent any time browsing numpy's online documentation, you will notice that it is built directly from the docstrings. There are some wonderful tools like Sphinx and pydoc that will automatically generate online documentation for you based on your docstrings.</font>
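
Checking four docstrings by hand is manageable, but the same search can be automated. The sketch below is a rough illustration, not part of the exercise: `find_by_docstring()` is a hypothetical helper, and the scrambled numpy installation is imitated with a `types.SimpleNamespace` holding two stand-in functions, so that the example runs without numpy.

```python
import inspect
from types import SimpleNamespace


def find_by_docstring(namespace, phrase):
    """Return the public callables in `namespace` whose docstring mentions `phrase`.

    Args:
        namespace (object): A module or any object with attributes.
        phrase (str): Text to look for, ignoring case.

    Returns:
        list of str: The matching attribute names, sorted by name.
    """
    matches = []
    for name, obj in inspect.getmembers(namespace, callable):
        if name.startswith('_'):
            continue  # skip private and dunder attributes
        doc = inspect.getdoc(obj) or ''
        if phrase.lower() in doc.lower():
            matches.append(name)
    return matches


# Stand-ins for the renamed numpy functions (assumption: only the
# first line of each docstring is reproduced here)
def _reshape_stub(a):
    """Gives a new shape to an array without changing its data."""


def _histogram_stub(a):
    """Compute the histogram of a set of data."""


scrambled = SimpleNamespace(leyud=_reshape_stub, fywdkxa=_histogram_stub)
print(find_by_docstring(scrambled, 'histogram'))
```

With a real module, the first argument would simply be the imported module object.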

# <font color=darkred>1.5 DRY and "Do One Thing"</font>

1. DRY and "Do One Thing"
>DRY (also known as "don't repeat yourself") and the "Do One Thing" principle are good ways to ensure that your functions are well designed and easy to test. Let's see how.

2. Don't repeat yourself (DRY)
>When you are writing code to look for answers to a research question, it is totally normal to copy and paste a bit of code, tweak it slightly, and re-run it. However, this kind of repeated code can lead to real problems. In this code snippet, I load my train, validation, and test data, and plot the first two principal components of each data set. I wrote the code for the train data set, then copied it and pasted it into the next two blocks, updating the paths and the variable names.

3. The problem with repeating yourself
>But one of the problems with copying and pasting is that it is easy to accidentally introduce errors that are hard to spot. If you'll notice in the last block, I accidentally took the principal components of the train data instead of the test data. Yikes!

4. Another problem with repeating yourself
>Another problem with repeated code is that if you want to change something, you often have to do it in multiple places. For instance, if we realized that our CSVs used the column name "label" instead of "labels", we would have to change our code in six places. Repeated code like this is a good sign that you should write a function. So let's do that.

5. Use functions to avoid repetition
>Wrapping the repeated logic in a function and then calling that function several times makes it much easier to avoid the kind of errors introduced by copying and pasting. And if you ever need to change the column "label" back to "labels", or you want to swap out PCA for some other dimensionality reduction technique, you only have to do it in one or two places.

6. Problem: it does multiple things
>However, there is still a big problem with this function.

7. Problem: it does multiple things
>First, it loads the data.

8. Problem: it does multiple things
>Then it plots the data.

9. Problem: it does multiple things
>And then it returns the loaded data. This function violates another software engineering principle: Do One Thing. Every function should have a single responsibility. Let's look at how we could split this one up.

10. Do One Thing
>Instead of one big function, we could have a more nimble function that just loads the data and a second one for plotting. We get several advantages from splitting the load_and_plot() function into two smaller functions. First of all, our code has become more flexible. Imagine that later on in your script, you just want to load the data and not plot it. That's easy now with the load_data() function. Likewise, if you wanted to do some transformation to the data before plotting, you can do the transformation and then call the plot_data() function. We have decoupled the loading functionality from the plotting functionality.

11. Advantages of doing one thing
>The code will also be easier for other developers to understand, and it will be more pleasant to test and debug. Finally, if you ever need to update your code, functions that each have a single responsibility make it easier to predict how changes in one place will affect the rest of the code.

12. Code smells and refactoring
>Repeated code and functions that do more than one thing are examples of "code smells", which are indications that you may need to refactor. Refactoring is the process of improving code by changing it a little bit at a time. This process is well described in Martin Fowler's book, "Refactoring", which is a good read for any aspiring software engineer.

13. Let's practice!
>Now you can do some refactoring of your own in the exercises!

In [14]:
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def load_data(path):
    """Load a data set.
    
    Args:
        path (str): The location of a CSV file.
    
    Returns:
        tuple of ndarray: (features, labels)
    """
    
    data = pd.read_csv(path)
    y = data['labels'].values
    X = data[[col for col in data.columns if col != 'labels']].values
    return X, y

def plot_data(X):
    """Plot the first two principal components of a matrix.
    
    Args:
        X (numpy.ndarray): The data to plot.
    """
    pca = PCA(n_components=2).fit_transform(X)
    plt.scatter(pca[:,0], pca[:,1])
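
The transcript points out that a column name that changes from "labels" to "label" should only need to be updated in one place. One way to push that idea further is to make the column name a parameter. The sketch below is an assumption-laden variant of load_data(): it uses the standard library's csv module instead of pandas so that it runs on its own, reads from any file-like object, and introduces a hypothetical `label_col` parameter that is not in the original function.

```python
import csv
import io


def load_data(source, label_col='labels'):
    """Load a data set from CSV text.

    Args:
        source (file-like): An open CSV file or any iterable of lines.
        label_col (str, optional): The name of the label column.

    Returns:
        tuple of list: (features, labels)
    """
    X, y = [], []
    for row in csv.DictReader(source):
        y.append(row[label_col])
        # Every column other than the label column is a numeric feature
        X.append([float(value) for key, value in row.items() if key != label_col])
    return X, y


# Hypothetical data whose label column is called "label"
sample = io.StringIO('f1,f2,label\n1.0,2.0,a\n3.0,4.0,b\n')
X, y = load_data(sample, label_col='label')
print(X, y)
```

The function still does one thing (loading), and a renamed column is now handled by the caller without touching the function body.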

# <font color=darkred>1.6 Extract a function</font> 

While you were developing a model to predict the likelihood of a student graduating from college, you wrote this bit of code to get the z-scores of students' yearly GPAs. Now you're ready to turn it into a production-quality system, so you need to do something about the repetition. Writing a function to calculate the z-scores would improve this code.

<code>
# Standardize the GPAs for each year
df['y1_z'] = (df.y1_gpa - df.y1_gpa.mean()) / df.y1_gpa.std()
df['y2_z'] = (df.y2_gpa - df.y2_gpa.mean()) / df.y2_gpa.std()
df['y3_z'] = (df.y3_gpa - df.y3_gpa.mean()) / df.y3_gpa.std()
df['y4_z'] = (df.y4_gpa - df.y4_gpa.mean()) / df.y4_gpa.std()
</code>

Note: df is a pandas DataFrame where each row is a student with 4 columns of yearly student GPAs: y1_gpa, y2_gpa, y3_gpa, y4_gpa

**Instructions**
- Finish the function so that it returns the z-scores of a column.
- Use the function to calculate the z-scores for each year (df['y1_z'], df['y2_z'], etc.) from the raw GPA scores (df.y1_gpa, df.y2_gpa, etc.).

**Results**

<font color=darkgreen>That's a fantastic function! standardize() will probably be useful in other places in your code, and now it is easy to use, test, and update if you need to. It's also easier to tell what the code is doing because of the docstring and the name of the function.</font>

In [15]:
def standardize(column, df):
    """Standardize the values in a column.
    
    Args:
        column (str): The name of the column to standardize.
        df (pandas.DataFrame): The DataFrame that contains `column`.
    
    Returns:
        pandas Series: the values as z-scores
    """
    # Finish the function so that it returns the z-scores
    z_score = (df[column] - df[column].mean()) / df[column].std()
    return z_score

# Use the standardize() function to calculate the z-scores
gpa_grades['y1_z'] = standardize('y1_gpa', gpa_grades)
gpa_grades['y2_z'] = standardize('y2_gpa', gpa_grades)
gpa_grades['y3_z'] = standardize('y3_gpa', gpa_grades)
gpa_grades['y4_z'] = standardize('y4_gpa', gpa_grades)

gpa_grades.head()

Unnamed: 0,y1_gpa,y2_gpa,y3_gpa,y4_gpa,y1_z,y2_z,y3_z,y4_z
0,2.785877,2.052513,2.170544,0.06557,0.790863,0.028022,0.172322,-1.711179
1,1.144557,2.666498,0.267098,2.884737,-0.872971,0.564636,-1.347122,0.82443
2,0.907406,0.423634,2.613459,0.03095,-1.113375,-1.395594,0.525883,-1.742317
3,2.205259,0.52358,3.984345,0.339289,0.202281,-1.308243,1.620206,-1.464991
4,2.877876,1.287922,3.077589,0.901994,0.884124,-0.64022,0.896379,-0.958884
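
The four calls to standardize() above still repeat the same pattern once per year. A loop or comprehension over the column names removes that last bit of repetition. The sketch below shows the idea with plain lists and the standard library's statistics module, so it runs without pandas; the GPA values are made up for the illustration. `statistics.stdev()` computes the sample standard deviation, which is also what the pandas `std()` method does by default.

```python
import statistics


def standardize(values):
    """Standardize a list of numbers.

    Args:
        values (list of float): The data to standardize.

    Returns:
        list of float: The values as z-scores.
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return [(value - mean) / std for value in values]


# Hypothetical yearly GPAs, one list per column
gpas = {
    'y1_gpa': [2.0, 3.0, 4.0],
    'y2_gpa': [1.0, 2.0, 3.0],
}

# One expression handles every year, however many columns there are
z_scores = {
    name.replace('_gpa', '_z'): standardize(values)
    for name, values in gpas.items()
}
print(z_scores)
```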


# <font color=darkred>1.7 Split up a function</font> 

Another engineer on your team has written this function to calculate the mean and median of a sorted list. You want to show them how to split it into two simpler functions: mean() and median()

<code>
def mean_and_median(values):
  """Get the mean and median of a sorted list of `values`

  Args:
    values (iterable of float): A list of numbers

  Returns:
    tuple (float, float): The mean and median
  """
  mean = sum(values) / len(values)
  midpoint = int(len(values) / 2)
  if len(values) % 2 == 0:
    median = (values[midpoint - 1] + values[midpoint]) / 2
  else:
    median = values[midpoint]

  return mean, median
</code>

**Instructions**
1. Write the mean() function.
2. Write the median() function.

**Results**

<font color=darkgreen>A perfect split! Each function does one thing and does it well. Using, testing, and maintaining these will be a breeze (although you'll probably just use numpy.mean() and numpy.median() for this in real life).</font>

In [16]:
def mean(values):
    """Get the mean of a sorted list of values
    
    Args:
        values (iterable of float): A list of numbers
        
    Returns:
        float
    """
    # Write the mean() function
    mean = sum(values) / len(values)
    return mean

In [17]:
def median(values):
    """Get the median of a sorted list of values
    
    Args:
        values (iterable of float): A list of numbers
    
    Returns:
        float
    """
    # Write the median() function
    midpoint = int(len(values) / 2)
    if len(values) % 2 == 0:
        median = (values[midpoint - 1] + values[midpoint]) / 2
    else:
        median = values[midpoint]
    return median
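
Note that median() only gives the right answer when `values` is already sorted, as its docstring says. A defensive variant can sort a copy first. The sketch below is one way to do that; `median_of()` is a hypothetical name, and the result is compared against `statistics.median()` from the standard library.

```python
import statistics


def median_of(values):
    """Get the median of a list of values in any order.

    Args:
        values (iterable of float): A list of numbers

    Returns:
        float
    """
    ordered = sorted(values)  # sorted() returns a new list
    midpoint = len(ordered) // 2
    if len(ordered) % 2 == 0:
        return (ordered[midpoint - 1] + ordered[midpoint]) / 2
    return ordered[midpoint]


unsorted = [3, 1, 4, 2]
print(median_of(unsorted))
print(statistics.median(unsorted))
print(unsorted)  # the caller's list is left untouched
```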

# <font color=darkred>1.8 Pass by assignment</font>

1. Pass by assignment
>The way that Python passes information to functions is different from many other languages. It is referred to as "pass by assignment", which I will explain in this lesson.

2. A surprising example
>Let's say we have a function foo() that takes a list and sets the first value of the list to 99. Then we set "my_list" to the value [1, 2, 3] and pass it to foo(). What do you expect the value of "my_list" to be after calling foo()? If you said "[99, 2, 3]", then you are right. Lists in Python are mutable objects, meaning that they can be changed. Now let's say we have another function bar() that takes an argument and adds ninety to it. Then we assign the value 3 to the variable "my_var" and call bar() with "my_var" as the argument. What do you expect the value of "my_var" to be after we've called bar()? If you said "3", you're right. In Python, integers are immutable, meaning they can't be changed.

3. Digging deeper
>Let's look at another example to understand what's going on. Imagine that this gray bar is your computer's memory.

4. Digging deeper
>When we set the variable "a" equal to the list [1, 2, 3], the Python interpreter says, "Okay, now 'a' points to this location in memory."

5. Digging deeper
>Then if we type "b = a", the interpreter says, "Okay, now 'b' points to whatever 'a' is pointing to."

6. Digging deeper
>So if we were to append 4 to the end of "a", both variables get it because there is only one list.

7. Digging deeper
>Likewise, if we append 5 to "b", both variables get it.

8. Digging deeper
>However, if we assign "a" to a different object in memory, that does not change where "b" is pointing. Now, things that happen to "a" are no longer happening to "b", and vice versa.

9. Pass by assignment
>How does this relate to the example functions we saw earlier?

10. Pass by assignment
>When we assign a list to the variable "my_list", it sets up a location in memory for it.

11. Pass by assignment
>Then, when we pass "my_list" to the function foo(), the parameter "x" gets assigned to that same location.

12. Pass by assignment
>So when the function modifies the thing that "x" points to, it is also modifying the thing that "my_list" points to.

13. Pass by assignment
>In the other example, we created a variable "my_var" and assigned it the value 3.

14. Pass by assignment
>Then we passed it to the function bar(), which caused the argument "x" to point to the same place "my_var" is pointing.

15. Pass by assignment
>But the bar() function assigns "x" to a new value, so the "my_var" variable isn't touched. In fact, there is no way in Python to have changed "x" or "my_var" directly, because integers are immutable variables.

16. Immutable or Mutable?
>There are only a few immutable data types in Python because almost everything is represented as an object. The only way to tell if something is mutable is to see if there is a function or method that will change the object without assigning it to a new variable.

17. Mutable default arguments are dangerous!
>Finally, here is a thing that can get you into trouble. foo() is a function that appends the value 1 to the end of a list. But, whoever wrote this function gave the argument an empty list as a default value. When we call foo() the first time, we get what you would expect, a list with one entry. But, when we call foo() again, the default value has already been modified! If you really want a mutable variable as a default value, consider defaulting to None and setting the argument in the function.

18. Let's practice!
>You can check your understanding with the following exercises.
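
The "Digging deeper" slides can be reproduced in a few lines. This is a minimal sketch of the steps described above; the value 42 used for the reassignment is an arbitrary choice for the illustration.

```python
a = [1, 2, 3]
b = a                 # b now points to the same list object as a

a.append(4)           # both names see the change
b.append(5)           # and this one too, because there is only one list
same_object = a is b  # identity check: True

a = 42                # rebinding a does not change where b points
print(same_object)
print(a)
print(b)
```

After the reassignment, `b` still refers to the list with all five elements, while `a` refers to the integer.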

In [18]:
# A surprising example
def foo(x):
    x[0] = 99

my_list = [1, 2, 3]
foo(my_list)
display(my_list)

def bar(x):
    x = x + 90

my_var = 3
bar(my_var)
display(my_var)

[99, 2, 3]

3

In [19]:
# Mutable default arguments are dangerous!
def foo(var=[]):
    var.append(1)
    return var

print(foo())
print(foo(),'\n')

def foo(var=None):
    if var is None:
        var = []
    var.append(1)
    return var

print(foo())
print(foo())

[1]
[1, 1] 

[1]
[1]
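
The reason for this behavior is that a default value is created once, when the def statement runs, and is then stored on the function object. The sketch below makes this visible through the `__defaults__` attribute of the function.

```python
def foo(var=[]):
    var.append(1)
    return var


# Copy the stored default before any call: it is still empty
defaults_before = foo.__defaults__[0].copy()

first = foo()
second = foo()

# Both calls returned the very same list object: the stored default
shared = first is second
defaults_after = foo.__defaults__[0]

print(defaults_before)
print(shared)
print(defaults_after)
```

Because `first` and `second` are the same object, both names show two entries after the second call.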


# <font color=darkred>1.9 Mutable or immutable?</font>

**Instructions**

The following function adds a mapping between a string and the lowercase version of that string to a dictionary. What do you expect the values of d and s to be after the function is called?

<code>
def store_lower(_dict, _string):
  """Add a mapping between '_string' and a lowercased version of '_string' to '_dict'
  Args:
    _dict (dict): The dictionary to update.
    _string (str): The string to add.
  """
  orig_string = _string
  _string = _string.lower()
  _dict[orig_string] = _string
  """
d = {}
s = 'Hello'
</code>

store_lower(d, s)

**Possible Answers**

1. d = {}, s = 'Hello'
2. d = {}, s = 'hello'
3. <font color=red>d = {'Hello': 'hello'}, s = 'Hello'</font>
4. d = {'Hello': 'hello'}, s = 'hello'
5. d = {'hello': 'hello'}, s = 'hello'

**Results**

<font color=darkgreen>Correct! Dictionaries are mutable objects in Python, so the function can change the dictionary in place with the _dict[orig_string] = _string statement. Strings, on the other hand, are immutable. When the function creates the lowercase version, it has to assign it to the _string variable. This disconnects what happens to _string from the external s variable.</font>

In [20]:
def store_lower(_dict, _string):
    """Add a mapping between `_string` and a lowercased version of `_string` to `_dict`
    
    Args:
        _dict (dict): The dictionary to update.
        _string (str): The string to add.
    """
    orig_string = _string
    _string = _string.lower()
    _dict[orig_string] = _string


d = {}
s = 'Hello'
store_lower(d, s)
print(d, s)

{'Hello': 'hello'} Hello


# <font color=darkred>1.10 Best practice for default arguments</font> 

One of your co-workers (who obviously didn't take this course) has written this function for adding a column to a pandas DataFrame. Unfortunately, they used a mutable variable as a default argument value! Please show them a better way to do this so that they don't get unexpected behavior.

<code>
def add_column(values, df=pandas.DataFrame()):
    """Add a column of 'values' to a DataFrame 'df'.
    The column will be named 'col_<n>' where 'n' is
    the numerical index of the column.
    Args:
        values (iterable): The values of the new column
        df (DataFrame, optional): The DataFrame to update.
            If no DataFrame is passed, one is created by default.
    Returns:
        DataFrame
    """
    df['col_{}'.format(len(df.columns))] = values
    return df
</code>

**Instructions**
- Change the default value of df to an immutable value to follow best practices.
- Update the code of the function so that a new DataFrame is created if the caller didn't pass one.

**Results**

<font color=darkgreen>Beautiful and best practice! When you need to set a mutable variable as a default argument, always use None and then set the value in the body of the function. This prevents unexpected behavior like adding multiple columns if you call the function more than once.</font>

In [21]:
# Use an immutable variable for the default argument
def better_add_column(values, df=None):
    """Add a column of `values` to a DataFrame `df`.
    The column will be named "col_<n>" where "n" is
    the numerical index of the column.
    
    Args:
        values (iterable): The values of the new column
        df (DataFrame, optional): The DataFrame to update.
            If no DataFrame is passed, one is created by default.
    
    Returns:
        DataFrame
    """
    # Update the function to create a default DataFrame
    if df is None:
        df = pd.DataFrame()
    df['col_{}'.format(len(df.columns))] = values
    return df

# Additional material

- DataCamp course: https://learn.datacamp.com/courses/writing-functions-in-python
- Python tool documentation:
    - Sphinx: https://www.sphinx-doc.org/en/master/
    - pydoc: https://docs.python.org/3/library/pydoc.html