# Python Exam Questions

In [1]:
from unittest import TestCase
import random
import json

## Functions

### Question 1

Given an array (Python list) of integers sorted in ascending order, write a function to find if a number exists in the array. If the number can be found in the array, return the corresponding index, otherwise return -1. You may assume the numbers are unique.

Your function will be called as:

```
>>> array = [1,2,4,7,9,11]
>>> print(search(array, 4))
2
>>> print(search(array, 5))
-1
```

In [371]:
# Answer:

def search(array, target):
    """Return the index of a target number in a sorted array.
    
    :param array: a sorted array of unique integers
    :param target: target number
    :return: index of the target, or -1 if the target is not in the array
    """
    if target in array:
        return array.index(target)
    return -1

In [372]:
# Test:
# DO NOT modify this cell

test = TestCase()
fails = 0

for i in range(5):
    k = 10**i - random.randint(1, 5**i)
    array = sorted(random.sample(range(2*k), k=k))
    idx = random.randint(0, k) - 1
    target = random.choice([-1, 2*k]) if idx < 0 else array[idx]
    try:
        test.assertEqual(search(array, target), idx)
    except AssertionError as e:
        fails += 1
        print('Failed to pass the test case with array of size {}.\nError: {}\n'.format(k, e))
else:
    if fails == 0:
        print('All test cases passed!')

All test cases passed!
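
The answer above scans the list linearly with `in` and `list.index`. Because the question guarantees a sorted array of unique integers, a binary search is a natural alternative. The following is a minimal sketch, assuming the standard-library `bisect` module is acceptable; the name `binary_search` is introduced here for illustration and is not part of the exam.

```python
from bisect import bisect_left


def binary_search(array, target):
    """Return the index of target in a sorted list of unique integers, or -1."""
    # bisect_left returns the leftmost position where target could be inserted
    # while keeping the list sorted.
    pos = bisect_left(array, target)
    if pos < len(array) and array[pos] == target:
        return pos
    return -1


array = [1, 2, 4, 7, 9, 11]
print(binary_search(array, 4))  # 2
print(binary_search(array, 5))  # -1
```

This variant needs O(log n) comparisons instead of O(n), which matters for the larger arrays generated by the test cell.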


### Question 2

JSON (JavaScript Object Notation) is one of the most widely used formats for data interchange on the web. Below is an example of a tweet in JSON format from the Twitter API.


In [171]:
json1 = '''{
  "created_at": "Thu Apr 06 15:24:15 +0000 2017",
  "id_str": "850006245121695744",
  "text": "Today we\u2019re sharing our vision for the future of the Twitter API platform!",
  "user": {
    "id": 2244994945,
    "name": "Twitter Dev",
    "screen_name": "TwitterDev",
    "location": "Internet",
    "url": "https:\/\/dev.twitter.com\/",
    "description": "Your official source for Twitter Platform news, updates & events. Need technical help? Visit https:\/\/twittercommunity.com\/ \u2328\ufe0f #TapIntoTwitter"
  },
  "place": {
    "id": "01a9a39529b27f36",
    "place_type": "city",
    "name": "Manhattan",
    "full_name": "Manhattan, NY",
    "country_code": "US",
    "country": "United States"
  }
}'''

In Python, JSON data can be parsed into a dictionary with the `json.loads` function. The parsed dictionary may be nested, i.e., the values of some keys (attributes) are themselves Python dictionaries.
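
As a small illustration of this parsing step, the sketch below loads a shortened, hypothetical tweet (not the full example above) and shows that a nested JSON object becomes a nested `dict`.

```python
import json

# Hypothetical, shortened tweet used only for illustration.
raw = '{"text": "A nice day!", "user": {"name": "Twitter Dev", "id": 2244994945}}'

tweet = json.loads(raw)

print(type(tweet).__name__)          # dict
print(type(tweet["user"]).__name__)  # dict
print(tweet["user"]["name"])         # Twitter Dev
```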

Given a dictionary that may contain nested dictionaries, we need to "normalize" it into a flattened dictionary before we can create a flat table. For example:

```
{'foo': 
    {'bar': 1, 
     'baz': 2}, 
 'qux': 3}
```

can be normalized as follows (each child key is joined to its parent's key with a period `.`):

```
{'foo.bar': 1, 
 'foo.baz': 2,
 'qux': 3}
```

**Question**: Implement the function `flattenJSON`, which reads a JSON string and returns a flattened dictionary. Only built-in Python modules are allowed.

Your function will be called as:

```
>>> json_data = '{"text": "A nice day!", "user": {"name": "Twitter Dev"}, "place" : {"name": "Manhattan", "country": "United States"}}'
>>> flattenJSON(json_data)
{'text': 'A nice day!', 'user.name': 'Twitter Dev', 'place.name': 'Manhattan', 'place.country': 'United States'}
```

In [375]:
def flattenJSON(json_data):
    """Flatten a JSON object into a single-level Python dictionary.

    :param json_data: str, a serialized JSON object
    :return: dict
    """
    # load json data as a (nested) Python dictionary
    data = json.loads(json_data)
    return flatdict(data, '')


def flatdict(data, prefix):
    """Recursively flatten `data`, prepending `prefix` to every key."""
    flat = {}
    for k, v in data.items():
        if isinstance(v, dict):
            # descend into the nested dictionary, extending the key prefix
            flat.update(flatdict(v, prefix + k + '.'))
        else:
            flat[prefix + k] = v
    return flat


In [376]:
# Test:
# DO NOT modify this cell

from pandas.io.json import json_normalize

test = TestCase()

json_1 = '{"id": 1, "name": {"first": "Coleen", "last": "Volk"}}'
json_2 = '{"text": "A nice day!", "user": {"name": "Twitter Dev"}, "place" : {"name": "Manhattan", "country": "United States"}}'
json_3 = """
{
  "created_at": "Thu Apr 06 15:24:15 +0000 2017",
  "id_str": "850006245121695744",
  "text": "Today we\u2019re sharing our vision for the future of the Twitter API platform!",
  "user": {
    "id": 2244994945,
    "name": "Twitter Dev",
    "screen_name": "TwitterDev",
    "location": "Internet",
    "url": "https:\/\/dev.twitter.com\/",
    "description": "Your official source for Twitter Platform news, updates & events. Need technical help? Visit https:\/\/twittercommunity.com\/ \u2328\ufe0f #TapIntoTwitter"
  },
  "place": {
    "id": "01a9a39529b27f36",
    "place_type": "city",
    "name": "Manhattan",
    "full_name": "Manhattan, NY",
    "country_code": "US",
    "country": "United States"
  }
}
"""

for json_data in [json_1, json_2, json_3]:
    ans = flattenJSON(json_data)
    data = json.loads(json_data)
    sol = json_normalize(data).to_dict('records')[0]
    try:
        test.assertDictEqual(ans, sol)
    except AssertionError as e:
        print('Failed to pass the test case.\nError: {}\n'.format(e))
else:
    print('All test cases passed!')

All test cases passed!
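
The recursive answer above could also be written with an explicit stack, which avoids Python's recursion limit on very deeply nested input. The following is only a sketch under stated assumptions: the function name `flatten_iterative` and the `sep` parameter are introduced here for illustration, and the top-level JSON value is assumed to be an object.

```python
import json


def flatten_iterative(json_data, sep="."):
    """Flatten a JSON object string into a single-level dict without recursion."""
    flat = {}
    # Each stack entry is (key prefix, dictionary still to be expanded).
    stack = [("", json.loads(json_data))]
    while stack:
        prefix, node = stack.pop()
        for key, value in node.items():
            full_key = prefix + key
            if isinstance(value, dict):
                # Defer the nested dictionary, remembering its dotted prefix.
                stack.append((full_key + sep, value))
            else:
                flat[full_key] = value
    return flat


print(flatten_iterative('{"foo": {"bar": 1, "baz": 2}, "qux": 3}'))
```

The keys may come out in a different order than in the recursive version, but dictionary equality in Python does not depend on key order, so the two results compare equal.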


## Object Oriented Programming

### Question 3

Fahrenheit and Celsius are two commonly used temperature units. Fahrenheit can be converted to Celsius using the following formula:

$$t(C) = \frac{t(F) - 32}{1.8}$$

Below is a comparison of the two temperature scales:

|                                  | Celsius | Fahrenheit |
|----------------------------------|---------|------------|
| Absolute zero                    | -273.15 | -459.67    |
| Ice melts                        | 0.00    | 32.00      |
| Average human body temperature   | 37      | 98.6       |
| Water boils at standard pressure | 100     | 212        |
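
The rows of the table can be checked against the conversion formula. The sketch below uses two helper functions, `fahrenheit_to_celsius` and `celsius_to_fahrenheit`, whose names are introduced here for illustration only; the second is simply the formula solved for $t(F)$.

```python
def fahrenheit_to_celsius(t_f):
    """Apply t(C) = (t(F) - 32) / 1.8."""
    return (t_f - 32) / 1.8


def celsius_to_fahrenheit(t_c):
    """Inverse of the formula above: t(F) = t(C) * 1.8 + 32."""
    return t_c * 1.8 + 32


# (label, Celsius, Fahrenheit) rows copied from the table above.
rows = [
    ("Absolute zero", -273.15, -459.67),
    ("Ice melts", 0.00, 32.00),
    ("Average human body temperature", 37, 98.6),
    ("Water boils at standard pressure", 100, 212),
]

for label, t_c, t_f in rows:
    print("{}: {:.2f}F -> {:.2f}C".format(label, t_f, fahrenheit_to_celsius(t_f)))
```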



**Question**: Design and implement a `Temperature` class. Your class should be initialized with string inputs such as '-273.15C', '0F' or '37C', where 'F' stands for Fahrenheit and 'C' stands for Celsius.

Implement the following functions in your `Temperature` class:

* `Temperature.__init__(temperature)`: initialize the Temperature class with a string input
* `Temperature.to_fahrenheit()`: return the string representation (round to 2 decimal places) of the temperature in Fahrenheit
* `Temperature.to_celsius()`: return the string representation (round to 2 decimal places) of the temperature in Celsius

You may assume the input is always composed of two parts, the numerical part and the unit part. The unit is a single letter, and only `'F'` and `'C'` are valid. If the unit is not valid, e.g., `'0K'`, then an exception should be raised.

Your implementation will be called as:

```
>>> t = '0C'
>>> temp = Temperature(t)
>>> print(temp.to_fahrenheit())
32.00F
>>> print(temp.to_celsius())
0.00C
>>>
>>> t = '0K'
>>> temp = Temperature(t)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-78-a969fb612f68> in <module>()
----> 1 Temperature('0K')

... ...

ValueError: Unit invalid or not supported
```


In [386]:
# Answer:
import re

class Temperature(object):

    def __init__(self, temperature):
        """Initialize your data structure here.

        :param temperature: str
        """
        # Expect a number immediately followed by a single unit letter, e.g. '-273.15C'.
        match = re.fullmatch(r'(-?\d*\.?\d+)([A-Za-z])', temperature.strip())
        if match is None or match.group(2) not in ('F', 'C'):
            raise ValueError('Unit invalid or not supported')
        self.degree = float(match.group(1))
        self.unit = match.group(2)

    def to_fahrenheit(self):
        """String representation (round to 2 decimal places) of the temperature in Fahrenheit.

        :return: str
        """
        if self.unit == 'F':
            return '%.2fF' % self.degree
        return '%.2fF' % ((self.degree*1.8)+32)

    def to_celsius(self):
        """String representation (round to 2 decimal places) of the temperature in Celsius.

        :return: str
        """
        if self.unit == 'C':
            return '%.2fC' % self.degree
        return '%.2fC' % ((self.degree-32)/(1.8))



In [387]:
# Test:
# DO NOT modify this cell

test = TestCase()
fails = 0

celsius = ('-273.15C', '0C', '37C', '100C')
fahrenheit = ('-459.67F', '32.00F', '98.60F', '212.00F')

for c, f in zip(celsius, fahrenheit):
    x = Temperature(c)
    y = Temperature(f)
    try:
        test.assertSetEqual(set([x.to_celsius()[-1], y.to_celsius()[-1]]), set('C'))
        test.assertSetEqual(set([x.to_fahrenheit()[-1], y.to_fahrenheit()[-1]]), set('F'))
        test.assertAlmostEqual(float(c[:-1]), float(x.to_celsius()[:-1]), places=1)
        test.assertAlmostEqual(float(c[:-1]), float(y.to_celsius()[:-1]), places=1)
        test.assertAlmostEqual(float(f[:-1]), float(x.to_fahrenheit()[:-1]), places=1)
        test.assertAlmostEqual(float(f[:-1]), float(y.to_fahrenheit()[:-1]), places=1)
    except AssertionError as e:
        fails += 1
        print('Failed to pass the test case with input: "{}"/"{}".\nError: {}\n'.format(c, f, e))
else:
    t = '12K'
    try:
        test.assertRaises(ValueError, Temperature, t)
    except AssertionError as e:
        fails += 1
        print('Failed to raise Exception with input: "{}".\nError: {}\n'.format(t, e))
    if fails == 0:
        print('All test cases passed!')

All test cases passed!

## Numpy and Scientific Computing 

### Question 4

In a simple linear regression model, the relationship between the response variable $y$ and the feature $x$ is modeled as:

$$y = \beta_0 + \beta_1 x$$

or, in matrix form, if we concatenate a column of ones with the feature $x$:

$$y = X \cdot \beta$$

where $X$ is an $n \times 2$ matrix and $\beta = [\beta_{0}, \beta_{1}]^T$ is a $2 \times 1$ matrix.

The coefficient matrix $\beta$ can be estimated using the following formula:

$$\beta = (X^T \cdot X)^{-1} \cdot X^T \cdot y$$

where $X^T$ is the matrix transpose of $X$.

**Question**: Complete the function `fit(X, y)` below to calculate the coefficient matrix $\beta$ using `numpy`.

Your function will be called as:

```
>>> X = np.concatenate([np.ones((4, 1)), np.array([-1,0,1,2]).reshape(-1, 1)], axis=1)
>>> X
array([[ 1., -1.],
       [ 1.,  0.],
       [ 1.,  1.],
       [ 1.,  2.]])
>>> y = np.array([4,2,0,-2]).reshape(-1, 1)
>>> y
array([[ 4],
       [ 2],
       [ 0],
       [-2]])
>>> fit(X, y)
array([[ 2.],
       [-2.]])
```
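
As a check on the formula, the example above can be worked by hand. With $x = (-1, 0, 1, 2)$ and $y = (4, 2, 0, -2)$:

$$X^T \cdot X = \begin{bmatrix} 4 & 2 \\ 2 & 6 \end{bmatrix}, \qquad X^T \cdot y = \begin{bmatrix} 4 \\ -8 \end{bmatrix}$$

$$(X^T \cdot X)^{-1} = \frac{1}{20} \begin{bmatrix} 6 & -2 \\ -2 & 4 \end{bmatrix}$$

$$\beta = (X^T \cdot X)^{-1} \cdot X^T \cdot y = \frac{1}{20} \begin{bmatrix} 6 \cdot 4 + (-2) \cdot (-8) \\ (-2) \cdot 4 + 4 \cdot (-8) \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 40 \\ -40 \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \end{bmatrix}$$

which agrees with the expected output of `fit(X, y)`.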

*Hint*:
* `numpy.matmul(a, b)`: Matrix product of two arrays.
* `numpy.linalg.inv(a)`: Compute the (multiplicative) inverse of a matrix.

In [388]:
import numpy as np


def fit(X, y):
    """Compute the coefficients of a linear model.

    :param X: Training data, shape (n_samples, n_features)
    :param y: Target values, shape (n_samples, 1)
    :return: Coefficient, shape (n_features, 1)
    """
    # (X^T X)^{-1}
    a = np.linalg.inv(np.matmul(X.T, X))
    b = X.T
    # (X^T X)^{-1} X^T y
    return np.matmul(np.matmul(a, b), y)

In [389]:
# Test:
# DO NOT modify this cell

test = TestCase()
fails = 0

for n in range(10, 100, 20):
    b0, b1 = random.random(), random.random()
    X = np.concatenate([np.ones((n, 1)), 
                        np.array([i for i in range(n)]).reshape(-1, 1)], axis=1)
    y = np.array([b0 + b1*i for i in range(n)]).reshape(-1, 1)
    coef = fit(X, y)
    try:
        test.assertAlmostEqual(b0, coef[0, 0], places=1)
        test.assertAlmostEqual(b1, coef[1, 0], places=1)
    except AssertionError as e:
        fails += 1
        print('Failed to pass the test case with coefficients: "{}"/"{}".\nError: {}\n'.format(b0, b1, e))
else:
    print('All test cases passed!')

All test cases passed!
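
The answer above follows the formula literally and forms an explicit inverse. Outside of an exam setting, a least-squares solver is often preferred because it avoids inverting $X^T \cdot X$. The sketch below uses `numpy.linalg.lstsq`; the function name `fit_lstsq` is introduced here for illustration and is not part of the exam.

```python
import numpy as np


def fit_lstsq(X, y):
    """Least-squares coefficients computed without forming an explicit inverse."""
    # np.linalg.lstsq returns (solution, residuals, rank, singular values).
    coef, _residuals, _rank, _singular_values = np.linalg.lstsq(X, y, rcond=None)
    return coef


X = np.concatenate([np.ones((4, 1)), np.array([-1, 0, 1, 2]).reshape(-1, 1)], axis=1)
y = np.array([4, 2, 0, -2]).reshape(-1, 1)
print(fit_lstsq(X, y))
```

On the example data from the question this should agree with `fit(X, y)` up to floating-point rounding.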


## Python Data Analysis

### Question 5

The `sklearn.datasets` package embeds some small toy datasets. The original data format is a NumPy array, so we create a pandas DataFrame called `data` for you. You can read the dataset description via `boston.DESCR`.

Answer the following questions using `pandas`:

1. Among all the features, which 3 are most positively correlated and which 3 are most negatively correlated with the target variable `'MEDV'`? Calculate the correlation coefficient using Pearson correlation.
2. The feature `'CHAS'` is a binary categorical variable. Compute the *median* and *standard deviation* of the target variable `'MEDV'` in each group.

In [390]:
# Answer:

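# Q 1
from sklearn import datasets
import pandas as pd

# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
boston = datasets.load_boston()
data = pd.DataFrame(boston.data, columns=boston.feature_names)
data['MEDV'] = boston.target


def negativecorr():
    # Pearson correlation of every column with MEDV, sorted ascending,
    # so the first three rows are the most negatively correlated features.
    cordata = data.corr(method='pearson')[['MEDV']].sort_values('MEDV')
    return cordata[:3]


negativecorr()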


|         | MEDV      |
|---------|-----------|
| LSTAT   | -0.737663 |
| PTRATIO | -0.507787 |
| INDUS   | -0.483725 |
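
The cell above returns only the most negatively correlated features, while the question also asks for the most positively correlated ones. The following is a minimal sketch of one way to get both; the function name `top_correlations` and the small synthetic DataFrame are made up for illustration and do not reproduce the Boston data.

```python
import pandas as pd


def top_correlations(df, target, n=3):
    """Return the n most positively and n most negatively correlated features."""
    # Pearson correlation of every column with the target, excluding the target itself.
    corr = df.corr(method="pearson")[target].drop(target)
    return corr.nlargest(n), corr.nsmallest(n)


# Small synthetic frame (made-up numbers) standing in for `data`.
demo = pd.DataFrame({
    "up": [1, 2, 3, 4, 5],
    "down": [5, 4, 3, 2, 1],
    "noisy": [2, 1, 4, 3, 5],
    "MEDV": [10, 20, 30, 40, 50],
})

positive, negative = top_correlations(demo, "MEDV", n=1)
print(positive)
print(negative)
```

With the exam's DataFrame, the call would presumably be `top_correlations(data, 'MEDV')`. Dropping the target first matters, because `'MEDV'` correlates perfectly with itself and would otherwise always rank first.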


In [391]:


# Q 2
def medianchas():
    med = data.groupby('CHAS').median()
    return pd.DataFrame(med.MEDV)
medianchas()


| CHAS | MEDV |
|------|------|
| 0.0  | 20.9 |
| 1.0  | 23.3 |
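
The cell above reports only the median, while the question also asks for the standard deviation. One possible way to compute both in a single pass is `GroupBy.agg`, sketched below on a small synthetic DataFrame with made-up numbers; the function name `median_and_std` is introduced here for illustration.

```python
import pandas as pd


def median_and_std(df, group_col, target_col):
    """Median and sample standard deviation of target_col within each group."""
    # pandas' "std" aggregation uses ddof=1 (sample standard deviation) by default.
    return df.groupby(group_col)[target_col].agg(["median", "std"])


# Synthetic stand-in for `data`; the values are invented for the example.
demo = pd.DataFrame({
    "CHAS": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    "MEDV": [10.0, 20.0, 30.0, 20.0, 40.0, 60.0],
})

summary = median_and_std(demo, "CHAS", "MEDV")
print(summary)
```

With the exam's DataFrame, the call would presumably be `median_and_std(data, 'CHAS', 'MEDV')`.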
