In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("assignment04.ipynb")

# Assignment 04: Text and Arrays

Welcome to Assignment 04!  Throughout the course you will complete assignments like this one. You can't learn technical subjects without hands-on practice, so these assignments are an important part of the course.

Collaborating on labs is more than okay -- it's encouraged! You should rarely remain stuck for more than a few minutes on a question, so post to the discussion board or ask your instructor for help. Explaining things is beneficial, too -- the best way to solidify your knowledge of a subject is to explain it. You should **not** just copy/paste someone else's code, but rather work together to gain understanding of the task you need to complete.

To receive credit for this assignment, answer all questions correctly and submit before the deadline.

**Due Date:** Saturday, July 9, 2022 @ 11:59 pm

**Collaboration Policy:** Data science is a collaborative activity. While you may talk with others about the labs, we ask that you **write your solutions individually**. If you do discuss the assignments with others **please include their names below** (it's a good way to learn your classmates' names).

**Collaborators:** 

List collaborators here.

## Today's Assignment

In today's assignment, you'll learn how to:

- manipulate text strings.

- create an array.

- perform array operations.

Let's get started! Run the cell below.

In [None]:
from datascience import *
import numpy as np
import math

## Text
Programming doesn't just concern numbers. Text is one of the most common data types used in programs. 

Text is represented by a **string value** in Python. The word "string" is a programming term for a sequence of characters. A string might contain a single character, a word, a sentence, or a whole book.

To distinguish text data from actual code, we demarcate strings by putting quotation marks around them. Single quotes (`'`) and double quotes (`"`) are both valid, but the types of opening and closing quotation marks must match. The contents can be any sequence of characters, including numbers and symbols. 
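
As a small sketch of the quoting rule above (the names here are illustrative and not part of the assignment): either quote style produces the same string value, and using one style on the outside lets the other style appear inside the text.

```python
# Single and double quotes create the same kind of value.
single = 'Data Science'
double = "Data Science"
print(single == double)

# Use one quote style outside so the other can appear inside the string.
apostrophe = "It's a string"
quoted_word = 'The word "string" means a sequence of characters.'
print(apostrophe)
print(quoted_word)

# Digits inside quotes are text, not a number.
digits = "123"
print(type(digits))
```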

We've seen strings before in `print` statements.  Below, two different strings are passed as arguments to the `print` function.

In [None]:
print("I <3", 'Data Science')

Just as names can be given to numbers, names can be given to string values.  The names and strings aren't required to be similar in any way. Any name can be assigned to any string.

In [None]:
one = 'two'
plus = '*'
print(one, plus, one)

**Question 1.** Yuri Gagarin was the first person to travel through outer space.  When he emerged from his capsule upon landing on Earth, he [reportedly](https://en.wikiquote.org/wiki/Yuri_Gagarin) had the following conversation with a woman and girl who saw the landing:

    The woman asked: "Can it be that you have come from outer space?"
    Gagarin replied: "As a matter of fact, I have!"

The cell below contains unfinished code.  Fill in the `...`s so that it prints out this conversation **exactly** as it appears above.

In [None]:
woman_asking = ...
woman_quote = '"Can it be that you have come from outer space?"'
gagarin_reply = 'Gagarin replied:'
gagarin_quote = ...

print(woman_asking, woman_quote)
print(gagarin_reply, gagarin_quote)

In [None]:
grader.check("q1")

## String Methods

Strings can be transformed using **methods**. Recall that methods and functions are not technically the same thing, but we'll be using them interchangeably for the purposes of this course.

Here's a sketch of how to call methods on a string:

    <expression that evaluates to a string>.<method name>(<argument>, <argument>, ...)
    
One example of a string method is `replace`, which replaces every instance of some part of the original string (a *substring*) with a new string.

    <original string>.replace(<old substring>, <new substring>)
    
`replace` returns (evaluates to) a new string, leaving the original string unchanged.
    
Try to predict the output of this example, then run the cell.

In [None]:
# Replace one letter
bean = 'bean'
print(bean.replace('a', 'e'), bean)

You can also replace multiple letters.

Try to predict the output of this example, then run the cell.

In [None]:
# Replace multiple letters
pair = 'pair'
print(pair.replace('ir', 're'), pair)
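
Here is a further sketch of the two properties of `replace` described above, using a made-up word: it changes every occurrence of the old substring, and it returns a new string rather than modifying the original.

```python
fruit = 'banana'

# Every 'a' is replaced, not just the first one.
swapped = fruit.replace('a', 'o')
print(swapped)   # bonono

# The original string is unchanged.
print(fruit)     # banana

# If the old substring never appears, the result equals the original.
print(fruit.replace('z', 'q'))   # banana
```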

You can call functions on the results of other functions.  For example, `max(abs(-5), abs(3))` evaluates to 5.  Similarly, you can call methods on the results of other method or function calls.

You may have already noticed one difference between functions and methods - a function like `max` does not require a `.` before it's called, but a string method like `replace` does. Here's a handy [Python reference](http://data8.org/sp20/python-reference.html) on the Data 8 website. It's a good idea to refer to this whenever you're unsure of how to call a function or method.

In [None]:
# Calling replace on the output of another call to replace
'train'.replace('t', 'ing').replace('in', 'de')

Here's a picture of how Python evaluates a "chained" method call like that:

<img src="images/chaining_method_calls.png"/>
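
The chained call above can also be written one step at a time. This sketch uses hypothetical intermediate names (`first_step`, `second_step`) to show that each `replace` acts on the string returned by the previous call.

```python
# Step 1: 'train' with 't' replaced by 'ing'
first_step = 'train'.replace('t', 'ing')
print(first_step)    # ingrain

# Step 2: both occurrences of 'in' replaced by 'de'
second_step = first_step.replace('in', 'de')
print(second_step)   # degrade

# The chained form gives the same result.
print('train'.replace('t', 'ing').replace('in', 'de') == second_step)
```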

**Question 2.** Use `replace` to transform the string `'hitchhiker'` into `'matchmaker'`. Assign your result to `new_word`.

In [None]:
new_word = ...
new_word

In [None]:
grader.check("q2")

## Converting to and from Strings

Strings and numbers are different **types** of values, even when a string contains the digits of a number. For example, evaluating the following cell causes an error because an integer cannot be added to a string.

In [None]:
8 + "8"

However, there are built-in functions to convert numbers to strings and strings to numbers. Some of these built-in functions have restrictions on the type of argument they take:

|Function |Description|
|-|-|
|`int`|Converts a string of digits or a float to an integer ("int") value|
|`float`|Converts a string of digits (perhaps with a decimal point) or an int to a decimal ("float") value|
|`str`|Converts any value to a string|

Try to predict what data type and value `example` evaluates to, then run the cell.

In [None]:
example = 8 + int("10") + float("8")

print(example)
print("This example returned a " + str(type(example)) + "!")
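
As a rough sketch of the conversion functions in the table above (the values are chosen for illustration), note that `int` drops the fractional part of a float rather than rounding, and that adding an int to a string raises a `TypeError` unless one of them is converted first.

```python
print(int("12") + 1)      # 13
print(int(3.9))           # 3: the fractional part is dropped
print(float("2.5") * 2)   # 5.0
print(float(7))           # 7.0
print(str(8) + "8")       # 88: joining two strings, not arithmetic

# Mixing an int and a string without converting raises a TypeError.
try:
    8 + "8"
except TypeError as error:
    print("TypeError:", error)
```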

Suppose you're writing a program that looks for dates in a text, and you want your program to find the amount of time that elapsed between two years it has identified.  It doesn't make sense to subtract two texts, but you can first convert the text containing the years into numbers.

**Question 3.** Finish the code below to compute the number of years that elapsed between `one_year` and `another_year`.  Don't just write the numbers `1618` and `1648` (or `30`); use a conversion function to turn the given text data into numbers.

In [None]:
# Some text data
one_year = "1618"
another_year = "1648"

# Complete the next line. 
# Note that we can't just write:
# another_year - one_year
# If you don't see why, try seeing 
# what happens when you write that here.
difference = ...
difference

In [None]:
grader.check("q3")

## Passing Strings to Functions

String values, like numbers, can be arguments to functions and can be returned by functions. 

The function `len` (derived from the word "length") takes a single string as its argument and returns the number of characters (including spaces) in the string.

Note that it doesn't count *words*. `len("one small step for man")` evaluates to 22, not 5.

Run the cell below to see.

In [None]:
len("one small step for man")
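
A few more illustrative calls (the strings are made up) show that `len` counts every character, including spaces and punctuation, and that the empty string has length 0. The last line uses `split`, a standard string method that is not otherwise needed in this assignment, only to contrast counting characters with counting words.

```python
print(len("step"))       # 4
print(len("one step"))   # 8: the space counts
print(len("Hi, you!"))   # 8: punctuation counts too
print(len(""))           # 0

# Counting words takes a different tool, such as the string method split.
print(len("one small step for man".split()))   # 5
```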

**Question 4.**  Use `len` to find the number of characters in the long string in the next cell.  Characters include things like spaces and punctuation. Assign that number to `sentence_length`.

**Note:** The string is the first sentence of the English translation of the French [Declaration of the Rights of Man](http://avalon.law.yale.edu/18th_century/rightsof.asp).

In [None]:
a_very_long_sentence = "The representatives of the French people, organized as a National Assembly, \
                        believing that the ignorance, neglect, or contempt of the rights of man are \
                        the sole cause of public calamities and of the corruption of governments, \
                        have determined to set forth in a solemn declaration the natural, unalienable, \
                        and sacred rights of man, in order that this declaration, \
                        being constantly before all the members of the Social body, \
                        shall remind them continually of their rights and duties; \
                        in order that the acts of the legislative power, \
                        as well as those of the executive power, \
                        may be compared at any moment with the objects \
                        and purposes of all political institutions \
                        and may thus be more respected, and, lastly, \
                        in order that the grievances of the citizens, \
                        based hereafter upon simple and incontestable principles, \
                        shall tend to the maintenance of the constitution and redound to the happiness of all."
sentence_length = ...
sentence_length

In [None]:
grader.check("q4")

## Arrays

Computers are most useful when you can use a small amount of code to **do the same action** to **many different things**.

For example, in the time it takes you to calculate the 18% tip on a restaurant bill (and that's if you're pretty fast at doing arithmetic in your head), a laptop can calculate 18% tips for every restaurant bill paid by every human on Earth that day.

**Arrays** are how we put many values in one place so that we can operate on them as a group. For example, if `billions_of_numbers` is an array of numbers, the expression

    .18 * billions_of_numbers

gives a new array of numbers that contains the result of multiplying each number in `billions_of_numbers` by .18.  Arrays are not limited to numbers; we can also put all the words in a book into an array of strings.

Concretely, an array is a **collection of values of the same type**. 
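
Here is a minimal sketch of the tip calculation, with three made-up bill amounts standing in for `billions_of_numbers`. It builds the array with NumPy's `np.array`, an assumption made so that the sketch runs with NumPy alone; creating arrays with `make_array` is covered in the next section.

```python
import numpy as np

# Three hypothetical restaurant bills, in dollars
bills = np.array([20.00, 35.50, 100.00])

# One multiplication computes the tip for every bill at once.
tips = .18 * bills
print(tips)

# The result is a new array with one tip per bill.
print(len(tips))
```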

### Making arrays

First, let's learn how to manually input values into an array. This typically isn't how programs work. Normally, we create arrays by loading them from an external source, like a data file.

To create an array by hand, call the function `make_array`.  Each argument you pass to `make_array` will be in the array it returns.  

Run the cell below to see an example.

In [None]:
make_array(0.125, 4.75, -1.3)

Each value in an array (in the above case, the numbers $0.125$, $4.75$, and $-1.3$) is called an **element** of that array.

Arrays themselves are also values, just like numbers and strings. That means you can assign them to names or use them as arguments to functions. For example, `len(<some_array>)` returns the number of elements in `some_array`.

Let's see an example. Run the next two cells.

**Note:** `math.pi` and `math.e` correspond to the real numbers $\pi$ and $e$ (Euler's number, the base of the natural logarithm). Click [here](http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html) to see 100,000 digits of $\pi$ and click [here](https://www.math.utah.edu/~pa/math/e.html) to see 10,000 digits of $e$.

In [None]:
some_array = make_array(22/7, math.pi, 2.718281, math.e)
some_array

In [None]:
len(some_array)
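
Because arrays are values, they can be passed to other functions as well. This sketch assumes `np.array` in place of `make_array` (so that it runs with NumPy alone) and uses Python's built-in `len`, `max`, and `sum`; the name `approximations` is made up for illustration.

```python
import math
import numpy as np

approximations = np.array([22/7, math.pi, 2.718281, math.e])

print(len(approximations))   # 4 elements
print(max(approximations))   # the largest element, which is 22/7
print(sum(approximations))   # the total of all elements
```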

**Question 5.** Make an array containing the numbers 0, 1, -1, $\pi$, and $e$, in that order.  Name it `interesting_numbers`.  

**Hint:** How did you get the values $\pi$ and $e$ in the previous example?

In [None]:
interesting_numbers = ...
interesting_numbers

In [None]:
grader.check("q5")

**Question 6.** Make an array containing the five strings `"Hello"`, `","`, `" "`, `"world"`, and `"!"`.  (The third one is a single space inside quotes.)  Name it `hello_world_components`.

**Note:** If you evaluate `hello_world_components`, you'll notice some extra information in addition to its contents: `dtype='<U5'`.  That's just NumPy's extremely cryptic way of saying that the elements of the array are strings of at most 5 characters.

In [None]:
hello_world_components = ...
hello_world_components

In [None]:
grader.check("q6")
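
To unpack the `dtype='<U5'` note in Question 6: the `U` marks text (Unicode) elements, and the number is the length of the longest string in the array. Here is a small sketch with made-up strings, assuming `np.array` in place of `make_array` so that it runs with NumPy alone.

```python
import numpy as np

words = np.array(["a", "bcd", "ef"])
print(words.dtype)        # a U3 type: the longest string has 3 characters

numbers = np.array([1.5, 2.0])
print(numbers.dtype)      # float64

# Mixing a number with strings gives an array of strings,
# because all elements of an array share one type.
mixed = np.array(["one", 2])
print(mixed)
```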

###  `np.arange`

Arrays are provided by a package called [NumPy](http://www.numpy.org/) (pronounced "NUM-pie"). The package is called `numpy`, but it's standard to rename it `np` for brevity.  You can do that with:

    import numpy as np

At the top of this notebook we imported NumPy as `np`, so we are able to use functions from this package. Very often in data science, we want to work with many numbers that are evenly spaced within some range.  NumPy provides a special function for this called `arange`.  The expression `np.arange(start, stop, step)` evaluates to an array with all the numbers starting at `start` and counting up by `step`, stopping **before** `stop` is reached.

Run the following cells to see some examples.

In [None]:
# This array starts at 1 and counts up by 2
# and then stops before 6
np.arange(1, 6, 2)

In [None]:
# This array doesn't contain 9
# because np.arange stops *before* 
# the stop value is reached
np.arange(4, 9, 1)
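
Here is a sketch of how the three arguments interact (the values are chosen for illustration): the elements are `start`, `start + step`, `start + 2*step`, and so on, and the array ends with the last such value that is still less than `stop`.

```python
import numpy as np

# 20 is excluded because np.arange stops before the stop value.
print(np.arange(0, 20, 5))      # [ 0  5 10 15]

# The array ends at 8 because the next value, 11, is not less than 10.
print(np.arange(2, 10, 3))      # [2 5 8]

# The step can be a decimal number.
print(np.arange(0, 1, 0.25))    # [0.   0.25 0.5  0.75]
```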

**Question 7.** Use the `np.arange` function to create an array with the multiples of 99 from 0 up to (**and including**) 9999. So its elements are 0, 99, 198, 297, etc.

In [None]:
multiples_of_99 = ...
multiples_of_99

In [None]:
grader.check("q7")

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

When the export is done, download the .zip file by `SHIFT`-clicking on the file name and selecting **Save Link As**. Alternatively, find the .zip file on the left side of the screen, right-click it, and select **Download**. You'll submit this .zip file to Gradescope, through the assignment in Canvas, for grading.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)