# Lab 2b: Data Types Part II

Welcome to Lab 2b!

In this lab, we are going to look at a few different data types including numbers and text.  A piece of text is called a *string* in Python.

Last, you'll learn more about working with datasets in Python.

First, initialize the grader. Each time you come back to this site to work on the lab, you will need to run this cell again.

In [None]:
from gofer.ok import check

In [None]:
# Enter your name as a string
# Example
dogname = "Phineas"
# Your name
name = ...

### 4.3 String Methods

Strings can be transformed using **methods**, which are functions that involve an existing string and some other arguments. One example is the `replace` method, which replaces all instances of some part of a string with some alternative. 

A method is invoked on a string by placing a `.` after the string value, then the name of the method, and finally parentheses containing the arguments. Here's a sketch, where the `<` and `>` symbols aren't part of the syntax; they just mark the boundaries of sub-expressions.

    <expression that evaluates to a string>.<method name>(<argument>, <argument>, ...)

Try to predict the output of these examples, then execute them.

In [None]:
'hitchhiker'.replace('hi', 'ma')

In [None]:
# Replace a sequence of letters, which appears twice
'hitchhiker'.replace('hi', 'ma')

Once a name is bound to a string value, methods can be invoked on that name as well. The name is still bound to the original string, so a new name is needed to capture the result. 

In [None]:
sharp = 'edged'
hot = sharp.replace('ed', 'ma')
print('sharp:', sharp)
print('hot:', hot)

Just as you can nest function calls, as you did in question 4.2, you can invoke a method on the output of another method call. This is sometimes called *chaining* methods.

In [None]:
# Calling replace on the output of another call to replace
'train'.replace('t', 'ing').replace('in', 'de')

Here's a picture of how Python evaluates a "chained" method call like that:

<img src="chaining_method_calls.jpg" alt="In 'train'.replace('t', 'ing').replace('in', 'de'), 'train'.replace('t', 'ing') is run first and evaluates to 'ingrain'. Then 'ingrain'.replace('in', 'de') is evaluated to 'degrade'."/>
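
One way to watch the chain at work is to give the intermediate result its own name. The sketch below uses the illustrative names `first_step`, `second_step`, and `chained` (they are not part of the lab) and should end up with the same string as the chained call.

```python
# Break the chained call into two separate steps.
first_step = 'train'.replace('t', 'ing')      # 'ingrain'
second_step = first_step.replace('in', 'de')  # 'degrade'

# The chained form computes the same thing in a single expression.
chained = 'train'.replace('t', 'ing').replace('in', 'de')

print('first_step:', first_step)
print('second_step:', second_step)
print('same result?', chained == second_step)
```

Naming the intermediate value is often easier to debug; chaining is more compact once you trust each step.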

**Question 8** <br/> Assign strings to the names `you` and `this` so that the final expression evaluates to a 10-letter English word with three double letters in a row. In other words, we start with the word 'beeper' and convert it to another word using the string method `replace`.

*Hint:* The call to `print` is there to print out the intermediate result called `the`. This should be an English word with two double letters in a row.

*Hint 2:* Run the tests if you're stuck.  They'll give you some hints.

In [None]:
you = ...
this = ...
a = 'beeper'
the = a.replace('p', you) 
print('the:', the)
the.replace('bee', this)

In [None]:
check('tests/q8.py')

Other string methods do not take any arguments at all, because the original string is all that's needed to compute the result. In these cases, parentheses are still needed, but there's nothing in between the parentheses. Here are some methods that take no arguments:

|Method name|Value|
|-|-|
|`lower`|a lowercased version of the string|
|`upper`|an uppercased version of the string|
|`capitalize`|a version with the first letter capitalized|
|`title`|a version with the first letter of every word capitalized|

All these string methods are useful, but most programmers don't memorize their names or how to use them.  Instead, people usually just search the internet for documentation and examples. A complete [list of string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) appears in the Python language documentation. [Stack Overflow](http://stackoverflow.com) has a huge database of answered questions that often demonstrate how to use these methods to achieve various ends.
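
As a quick sketch of the four no-argument methods in the table above, the cell below applies each one to the same string. The name `phrase` and its contents are made up for illustration.

```python
phrase = 'the QUICK brown fox'

print(phrase.lower())       # the quick brown fox
print(phrase.upper())       # THE QUICK BROWN FOX
print(phrase.capitalize())  # The quick brown fox
print(phrase.title())       # The Quick Brown Fox

# The original string is unchanged by any of these calls.
print(phrase)
```

Note that `capitalize` also lowercases everything after the first letter, which is easy to miss when the input contains capitals.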

### 4.3.1 Strings as function arguments

String values, like numbers, can be arguments to functions and can be returned by functions.  The function `len` takes a single string as its argument and returns the number of characters in the string: its **len**gth.  

Note that it doesn't count *words*. `len("one small step for man")` is 22, not 5.
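
To make the difference between characters and words concrete, here is a small sketch. It relies on the string method `split`, which this lab has not introduced: called with no arguments, it breaks a string at whitespace and returns the pieces. The names `quote`, `number_of_characters`, and `number_of_words` are illustrative.

```python
quote = "one small step for man"

# len counts every character, including the four spaces.
number_of_characters = len(quote)

# split() breaks the string at whitespace; len of the result
# is the number of words.
number_of_words = len(quote.split())

print('characters:', number_of_characters)
print('words:', number_of_words)
```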

**Question 9**  <br/> Use `len` to find out the number of characters in the very long string in the next cell.  (It's the first sentence of the English translation of the French [Declaration of the Rights of Man](http://avalon.law.yale.edu/18th_century/rightsof.asp).)  The length of a string is the total number of characters in it, including things like spaces and punctuation.  Assign `sentence_length` to that number.

In [None]:
a_very_long_sentence = "The representatives of the French people, organized as a National Assembly, believing that the ignorance, neglect, or contempt of the rights of man are the sole cause of public calamities and of the corruption of governments, have determined to set forth in a solemn declaration the natural, unalienable, and sacred rights of man, in order that this declaration, being constantly before all the members of the Social body, shall remind them continually of their rights and duties; in order that the acts of the legislative power, as well as those of the executive power, may be compared at any moment with the objects and purposes of all political institutions and may thus be more respected, and, lastly, in order that the grievances of the citizens, based hereafter upon simple and incontestable principles, shall tend to the maintenance of the constitution and redound to the happiness of all."
sentence_length = ...
sentence_length

In [None]:
check('tests/q9.py')

### 4.3.2 Converting to and from Strings

Strings and numbers are different *types* of values, even when a string contains the digits of a number. For example, evaluating the following cell causes an error because an integer cannot be added to a string.

In [None]:
8 + "8"

However, there are built-in functions to convert numbers to strings and strings to numbers. 

|Function name|Effect|Example|
|-|-|-|
|`int`  |Converts a string of digits and perhaps a negative sign to an integer (`int`) value|`int("42")`|
|`float`|Converts a string of digits and perhaps a negative sign and decimal point to a decimal (`float`) value|`float("4.2")`|
|`str`  |  Converts any value to a string (`str`) value|`str(42)`|


What do you think the following cell will evaluate to?

In [None]:
8 + int("8")
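
The conversion functions from the table above can be sketched as follows. All of the names here (`whole`, `negative`, `decimal`, `text`, `joined`, `total`) are made up for illustration.

```python
# String to number.
whole = int("42")        # the integer 42
negative = int("-7")     # a leading minus sign is allowed
decimal = float("4.2")   # the float 4.2

# Number to string.
text = str(42)           # the string '42'

# Adding strings joins them; adding numbers does arithmetic.
joined = text + "1"      # '421'
total = whole + 1        # 43

print(joined, total)
```

The same digits behave very differently depending on their type, which is why the conversion has to be explicit.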

**Question 10** <br/> Use `replace` and `int` together to compute the time between the year 105 BC ([Ts'ai Lun invents paper based on tree bark for the Emperor of China](https://en.wikipedia.org/wiki/Paper)) and the year 1440 AD ([Start of the Print Revolution](https://en.wikipedia.org/wiki/Printing_press)). Try not to use any numbers in your solution, but instead manipulate the strings that are provided.

*Hint*: It's ok to be off by one year. In historical calendars, there is no year zero, but astronomical calendars do include [year zero](https://en.wikipedia.org/wiki/Year_zero) to simplify calculations.

In [None]:
invented = 'BC 105'
revolution = 'AD 1440'
start = ...
end = ...
print('The time between the first invention of paper and the print revolution is', end-start, 'years from', invented, 'to', revolution)

In [None]:
check('tests/q10.py')

### 4.4 Importing code

> What has been will be again,  
> what has been done will be done again;  
> there is nothing new under the sun.

Most programming involves work that is very similar to work that has been done before.  Since writing code is time-consuming, it's good to rely on others' published code when you can.  Rather than copy-pasting, we can **import** other code in Python, creating a **module** that contains all of the names created by that code.

Python includes many useful modules that are just an `import` away.  We'll look at the `math` module as a first example. The `math` module is extremely useful in computing mathematical expressions in Python. 

Suppose we want to very accurately compute the area of a circle with radius 5 meters.  For that, we need the constant $\pi$, which is roughly 3.14.  Conveniently, the `math` module has `pi` defined for us:

In [None]:
import math
radius = 5
area_of_circle = radius**2 * math.pi
area_of_circle

`pi` is defined inside `math`, and the way that we access names that are inside modules is by writing the module's name, then a dot, then the name of the thing we want:

    <module name>.<name>
    
In order to use a module at all, we must first write the statement `import <module name>`.  That statement creates a module object with things like `pi` in it and then assigns the name `math` to that module.  Above we have done that for `math`.


**Modules** can provide other named things, including **functions**.  For example, `math` provides the name `sin` for the sine function.  Having imported `math` already, we can write `math.sin(3)` to compute the sine of 3.  (Note that this sine function considers its argument to be in [radians](https://en.wikipedia.org/wiki/Radian), not degrees.  180 degrees are equivalent to $\pi$ radians.)
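
Because `math.sin` expects radians, an angle given in degrees has to be converted first. The sketch below uses `math.radians`, a function in the `math` module that this lab does not otherwise cover; the names `half_turn` and `sine_of_30_degrees` are illustrative.

```python
import math

# math.radians converts an angle in degrees to radians.
half_turn = math.radians(180)
print(half_turn, math.pi)

# The sine of 30 degrees is one half, but math.sin needs radians.
sine_of_30_degrees = math.sin(math.radians(30))
print(sine_of_30_degrees)

# Passing 30 directly means 30 radians, which is a different angle.
print(math.sin(30))
```

The printed sine of 30 degrees may differ from 0.5 in the last decimal places because floats are approximations.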

**Question 11** <br/> A $\frac{\pi}{4}$-radian (45-degree) angle forms a right triangle with equal base and height, pictured below.  If the hypotenuse (the radius of the circle in the picture) is 1, then the height is $\sin(\frac{\pi}{4})$.  Compute that using `sin` and `pi` from the `math` module.  Give the result the name `sine_of_pi_over_four`.

<img src="http://mathworld.wolfram.com/images/eps-gif/TrigonometryAnglesPi4_1000.gif">
(Source: [Wolfram MathWorld](http://mathworld.wolfram.com/images/eps-gif/TrigonometryAnglesPi4_1000.gif))

In [None]:
sine_of_pi_over_four = ...
sine_of_pi_over_four

In [None]:
check('tests/q11.py')

For your reference, here are some more examples of functions from the `math` module.

Note how different functions take different numbers of arguments. A module's documentation usually states how many arguments each function requires.

In [None]:
# Calculating factorials.
math.factorial(5)

In [None]:
# Calculating logarithms (the logarithm of 8 in base 2).
# The result is 3 because 2 to the power of 3 is 8.
math.log(8, 2)

There are several ways to import names from a module. For example, we can import just specific names, we can give a module a shorter nickname, or we can import every name the module defines.

In [None]:
# Importing just cos and pi from math.
# Now, we don't have to use "math." before these names.
from math import cos, pi
print(cos(pi))

In [None]:
# We can nickname math as something else, if we don't want to type the name math
import math as m
m.log(m.pi)

In [None]:
# Lastly, we can import everything from math and use all of its names without "math."
from math import *
log(pi)
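
The import styles above are different spellings for reaching the same names. As a small sketch, the three calls below should all print the same value:

```python
import math
import math as m
from math import cos, pi

# Each line reaches the same function and the same constant.
print(math.cos(math.pi))
print(m.cos(m.pi))
print(cos(pi))
```

Which style to use is mostly a matter of readability: the `math.` prefix makes it obvious where a name comes from, while the shorter forms save typing.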

## 5. Arrays

Up to now, we haven't done much that you couldn't do yourself by hand, without going through the trouble of learning Python.  Computers are most useful when a small amount of code performs a lot of work by *performing the same action* to *many different things*.

For example, in the time it takes you to calculate the 18% tip on a restaurant bill, a laptop can calculate 18% tips for every restaurant bill paid by every human on Earth that day.  (That's if you're pretty fast at doing arithmetic in your head!)

**Arrays** are how we put many values in one place so that we can operate on them as a group. For example, if `billions_of_numbers` is an array of numbers, the expression

    .18 * billions_of_numbers

gives a new array of numbers that's the result of multiplying each number in `billions_of_numbers` by .18 (18%).  Arrays are not limited to numbers; we can also put all the words in a book into an array of strings.
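
Here is a tiny version of that idea. It assumes NumPy is installed and builds the array with `np.array`, a NumPy function that this lab does not use; NumPy itself is introduced in section 5.2. The bill amounts are made up.

```python
import numpy as np

# Three made-up restaurant bills, in dollars.
bills = np.array([20.0, 50.0, 100.0])

# Multiplying the array by a number multiplies every element.
tips = .18 * bills

print('bills:', bills)
print('tips:', tips)
print('number of tips:', len(tips))
```

The multiplication produces a new array; `bills` itself is left unchanged.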

Concretely, an array is a **collection of values of the same type**, like a column in an Excel spreadsheet. 

<img src="excel_array.jpg" alt="In Excel, columns of text are like arrays of strings in a table. The same can be said about columns of numbers (ints, floats).">

### 5.1. Making arrays
You can type in the data that goes in an array yourself, but that's not typically how programs work. Normally, we create arrays by loading them from an external source, like a data file.

First, though, let's learn how to start from scratch. Execute the following cell so that all the names from the `datascience` module are available to you. The documentation for this module is available at [http://data8.org/datascience](http://data8.org/datascience/).

In [None]:
from datascience import *

Now, to create an array, call the function `make_array`.  Each argument you pass to `make_array` will be in the array it returns.  Run this cell to see an example:


In [None]:
make_array(0.125, 4.75, -1.3)

Each value in an array (in the above case, the numbers 0.125, 4.75, and -1.3) is called an *element* or *item* of that array.

Arrays themselves are also values, just like numbers and strings.  That means you can assign them names or use them as arguments to functions.

**Question 12** <br/> Make an array containing the numbers 0, 1, -1, $\pi$, and $e$, in that order.  Name it `interesting_numbers`.  *Hint:* How did you get the values $\pi$ and $e$ earlier?  You can refer to them in exactly the same way here.

In [None]:
interesting_numbers = ...
interesting_numbers

In [None]:
check('tests/q12.py')

### 5.2.  `np.arange`
Arrays are provided by a package called [NumPy](http://www.numpy.org/) (pronounced "NUM-pie" or, if you prefer to pronounce things incorrectly, "NUM-pee").  The package is called `numpy`, but it's standard to rename it `np` for brevity.  You can do that with:

    import numpy as np

Very often in data science, we want to work with many numbers that are evenly spaced within some range.  NumPy provides a special function for this called `arange`.  `np.arange(start, stop, space)` produces an array with all the numbers starting at `start` and counting up by `space`, stopping before `stop` is reached.

For example, the value of `np.arange(1, 6, 2)` is an array with elements 1, 3, and 5 -- it starts at 1 and counts up by 2, then stops before 6.  In other words, it's equivalent to `make_array(1, 3, 5)`.

`np.arange(4, 9, 1)` is an array with elements 4, 5, 6, 7, and 8.  (It doesn't contain 9 because `np.arange` stops *before* the stop value is reached.)
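
The two examples above, plus one more that shows the excluded stop value, can be sketched like this. The names `odds`, `four_to_eight`, and `fives` are illustrative.

```python
import numpy as np

odds = np.arange(1, 6, 2)           # 1, 3, 5
four_to_eight = np.arange(4, 9, 1)  # 4, 5, 6, 7, 8

# Counting by 5 from 0 with a stop of 10 gives only 0 and 5,
# because the stop value itself is never included.
fives = np.arange(0, 10, 5)

print(odds)
print(four_to_eight)
print(fives)
```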

**Question 13** <br/>Import `numpy` as `np` and then use `np.arange` to create an array with the multiples of 99 from 0 up to (**and including**) 9999.  (So its elements are 0, 99, 198, 297, etc.)

In [None]:
...
multiples_of_99 = ...
multiples_of_99

In [None]:
check('tests/q13.py')

### 5.3. Application: Temperature readings
NOAA (the US National Oceanic and Atmospheric Administration) operates weather stations that measure surface temperatures at different sites around the United States.  The hourly readings are [publicly available](http://www.ncdc.noaa.gov/qclcd/QCLCD?prior=N).

Suppose we download all the hourly data from the Oakland, California site for the month of December 2015.  To analyze the data, we want to know when each reading was taken, but we find that the data don't include the timestamps of the readings (the time at which each one was taken).

However, we know the first reading was taken at the first instant of December 2015 (midnight on December 1st) and each subsequent reading was taken exactly 1 hour after the last.

**Question 14** <br/>Create an array of the *time, in seconds, since the start of the month* at which each hourly reading was taken.  Name it `collection_times`.

*Hint 1:* There were 31 days in December, which is equivalent to ($31 \times 24$) hours or ($31 \times 24 \times 60 \times 60$) seconds.  So your array should have $31 \times 24$ elements in it.

*Hint 2:* The `len` function works on arrays, too.  If your `collection_times` isn't passing the tests, check its length and make sure it has $31 \times 24$ elements.

In [None]:
collection_times = ...
collection_times

In [None]:
check('tests/q14.py')

**Question 15** <br/>The powers of 2 ($2^0 = 1$, $2^1 = 2$, $2^2 = 4$, etc.) arise frequently in computer science.  (For example, you may have noticed that storage on smartphones or USB drives comes in powers of 2, like 16 GB, 32 GB, or 64 GB.)  Use `np.arange` and the exponentiation operator `**` to compute the first 15 powers of 2, starting from $2^0$.

In [None]:
powers_of_2 = ...
powers_of_2

In [None]:
check('tests/q15.py')

## 6. Success! 

Congratulations, you're done with Lab 2b!  Be sure to 
- **run all the tests and verify that they all pass** (the next cell has a shortcut for that), 
- **Save and Checkpoint** from the `File` menu,
- **Download as an HTML or ipynb file**!

In [None]:
from gofer.ok import check
# Run the tests for questions 8 through 15.
for x in range(8, 16):
    print('Testing question {}: '.format(x))
    display(check('tests/q{}.py'.format(x)))