# Data abstraction

The lecture notes for this assignment are written in Jupyter. See the adjoining file [video notes](03-01-data-abstraction-video-notes.ipynb) for details. Let's check your understanding of these concepts. 

## For a handy reference
**[Python Data Science Handbook:](http://shop.oreilly.com/product/0636920034919.do)** Essential Tools for Working with Data *By Jake VanderPlas*

Covers the following topics:
* **IPython and Jupyter:** provide computational environments for data scientists using Python
* **NumPy:** includes the ndarray for efficient storage and manipulation of dense data arrays in Python
* **Pandas:** features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python
* **Matplotlib:** includes capabilities for a flexible range of data visualizations in Python
* **Scikit-Learn:** for efficient and clean Python implementations of the most important and established machine learning algorithms

It's available to read in electronic form through the Tufts Library website.

## For the mid-term exam

1. The mid-term will be 75 minutes long, *not 120 minutes as previously announced.* This is because it was difficult to find a location outside of class hours.
1. The exam will be held in two locations: in the classroom or in Boston. I will be sending out a survey to ask which you prefer. There are only about 4 spots available in Boston.
1. The exam will be available in our usual place on JupyterHub but will be encrypted until 10:30 on the day of the exam. The decryption key will be provided at 10:30 via Zoom.
1. A mock exam will be available on or before 10/7 so that you can run through the mechanics of taking the test.
1. Unless you have informed us of any accommodations, what you have submitted by 11:15 will be considered your submission.
1. Assignments 3-01, 3-02, and 3-03 will not be included.
2. You may bring some paper-based notes to the exam.
3. During the exam, you may search the internet in case you forget a specific function.
4. Not all questions will involve writing code. Some may involve answering text questions in Markdown cells. For code cells, running successfully is important but we may award partial credit, especially if a question is one that most students are struggling with.

In [None]:
# Don't change this cell; just run it. 
from IPython.display import IFrame
IFrame('https://1813261-1.kaf.kaltura.com/media/t/1_govlzyqa/133896931', width=800, height=560)

from client.api.notebook import Notebook
ok = Notebook('03-01-data-abstraction.ok')
ok.auth(inline=True)

1. Make up an array of the numbers 1 to 5. Put it into a variable `x`.

In [9]:
# your answer: 
import numpy as np
x = np.arange(1, 6)  # the integers 1 through 5
# scratch work: x followed by two copies of 2*x
y = np.concatenate((x, 2 * x, 2 * x))
y

array([ 1,  2,  3,  4,  5,  2,  4,  6,  8, 10,  2,  4,  6,  8, 10])

In [None]:
_ = ok.grade('q01')  # test that your answer is correct 

2. Write code that sets `y` to the vector created by adding 5 to each element of `x`. 

In [None]:
# Your answer: 
y = ...
y

In [None]:
_ = ok.grade('q02')  # test that your answer is correct 

# Is the 'for' loop obsolete? 

Sort of. Let's just say that `numpy` offers very efficient ways to do things without `for` loops. I'm sure that you could tell me whether 7 is a member of `y` via a `for` loop. But you can also do that with an `array` much more simply: 

3. (Advanced) Consider that `y` *is an iterable* and write an expression that is True if 7 is in `y`, and False if not. Put that value into `z`.

In [12]:
# your answer: 
if (y == 7).any(): 
    z = True 
else: 
    z = False 
z

False

In [14]:
z = (y == 7).any()
z

False
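
For comparison, here is a minimal sketch of three ways to test membership that should agree with one another: an explicit `for` loop, Python's `in` operator, and the vectorized comparison used in the cell above. The array `sample` and the names `found_loop`, `found_in`, and `found_any` are made up for illustration and are not part of the assignment.

```python
import numpy as np

sample = np.array([3, 7, 9])  # hypothetical data, not the y from above

# 1. Explicit loop: stop as soon as a match is found.
found_loop = False
for value in sample:
    if value == 7:
        found_loop = True
        break

# 2. Python's `in` operator works because the array supports membership tests.
found_in = 7 in sample

# 3. Vectorized: compare every element at once, then ask if any matched.
found_any = bool((sample == 7).any())

print(found_loop, found_in, found_any)
```

All three give the same answer; the second and third simply let `numpy` do the looping for you.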

# Whoa there! 
The advanced problem shows that an `array` gets some of its behavior from being something more general: a Python iterable. E.g., the following also works:

In [16]:
type(y)

numpy.ndarray

In [17]:
for i in y: 
    print(i)

1
2
3
4
5
2
4
6
8
10
2
4
6
8
10
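
Because an `array` can be iterated over, Python built-ins that accept any iterable should accept it too. The sketch below uses a made-up array `a` (an assumption for illustration, not one of the assignment's variables) to show a few of them.

```python
import numpy as np

a = np.array([2, 4, 6])  # hypothetical data for illustration

as_list = list(a)                   # list() consumes any iterable
total = sum(a)                      # the built-in sum() iterates element by element
pairs = list(enumerate(a))          # enumerate() yields (index, element) pairs
doubled = [int(v) * 2 for v in a]   # list comprehensions iterate as well

print(len(a), total, doubled)
```

These built-ins work, but they loop in Python one element at a time, so the `numpy` equivalents are usually the faster choice on large arrays.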


# The treasure hunt
Almost every common thing that you might want to do to an `array` with a `for` loop is easier to do with some `numpy.ndarray` method and/or some combination of those methods and native Python. A very large user community has gone to great lengths to make using an `array` as simple as possible! 

What this means -- in practical terms -- is that it is often simpler to look around for a solution in the *numpy user manual* than to code it yourself. Thus, programming with `numpy` requires both knowledge of native Python and "treasure hunting" in the `numpy` documentation! 
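
As one illustration of what a treasure hunt can turn up, the sketch below finds the largest value in an array and its position twice: once with a hand-written loop, and once with the `ndarray` methods `max()` and `argmax()`. The array `temps` and the variable `best_index` are hypothetical names chosen for this example.

```python
import numpy as np

temps = np.array([12, 19, 15, 22, 17])  # hypothetical data for illustration

# Hand-written loop: track the index of the largest value seen so far.
best_index = 0
for i in range(len(temps)):
    if temps[i] > temps[best_index]:
        best_index = i

# The same information from two ndarray methods found in the documentation.
print(best_index, temps.argmax())        # position of the largest value
print(temps[best_index], temps.max())    # the largest value itself
```

The loop is five lines; each method call is one, and the reader does not have to check the loop logic.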

Let's have some fun with a few treasure hunts through https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html 

4. Complete the function below so that it always returns the sum of the one-dimensional array `x` passed to it. Beware: I will test it on multiple arrays `x`!

In [20]:
def mysum(x): 
    # your answer: 
    return np.sum(x)

In [21]:
# run this to check your code
mysum(np.array([1,2,3,4,5]))

15

In [None]:
_ = ok.grade('q04')  # test that your answer is correct

5. In the function below, return a normalized set of data whose mean is 0.0, by subtracting the current mean from `x`. 

In [None]:
def renorm(x): 
    # your answer: 
    return ...

In [None]:
# run this to check your code
x = np.array([5, 6, 7, 8, 9])
renorm(x)

In [None]:
_ = ok.grade('q05')  # test that your answer is correct

6. (Advanced) What happens if you try to do to lists the same things you did to arrays? 

___Your answer:___

# When you are done with this notebook, 
* Save and checkpoint. 
* Ensure that the name of this file is precisely `03-01-data-abstraction.ipynb`. That notebook will be submitted by the code below whether or not you are currently editing it.
* Change `ready` to `True` in the cell below. 
* Run the cell below to submit your work for grading. 

In [None]:
ready = False  # change to True when ready to submit
print("submitting file {} for assignment {} as {}".format(ok.assignment.src[0], 
                                                          ok.assignment.name, 
                                                          ok.assignment.get_student_email()))
if not ready: 
    raise Exception("change ready to True when ready to submit")
_ = ok.submit()