# CSCI2000U - Scientific Data Analysis
## Tutorial 04: Functional Programming 

**Goal**
1. Apply functional programming to the analysis of the Boston housing dataset

In this tutorial we will explore the Boston housing dataset. You are given the code that loads the `json` file with the data. After loading the data, solve each of the tasks outlined in this document to complete your tutorial assignment.

*Please note that you need to upload the dataset (the `json` file) to your Jupyter notebook server and modify `path` to match the location where the file was uploaded.*

- Importing libraries

In [1]:
# rija baig (100746674)

In [2]:
import json
from functools import reduce

- loading the dataset into `data` from the `path`

In [3]:
path = 'boston_housing.json'
with open(path) as f:
    data = json.load(f)

- exploring the keys of `data`

In [4]:
data.keys()

dict_keys(['rows', 'descr'])

- Accessing the data field `descr` which shows the name and description of each of the data columns. We can refer to them as *attributes*.

In [5]:
data['descr']

{'CRIM': 'per capita crime rate by town',
 'ZN': 'proportion of residential land zoned for lots over 25,000 sq.ft.',
 'INDUS': 'proportion of non-retail business acres per town',
 'CHAS': 'Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)',
 'NOX': 'nitric oxides concentration (parts per 10 million)',
 'RM': 'average number of rooms per dwelling',
 'AGE': 'proportion of owner-occupied units built prior to 1940',
 'DIS': 'weighted distances to five Boston employment centres',
 'RAD': 'index of accessibility to radial highways',
 'TAX': 'full-value property-tax rate per $10,000',
 'PTRATIO': 'pupil-teacher ratio by town',
 'B': '1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town',
 'LSTAT': '% lower status of the population',
 'PRICE': 'actual housing price'}

- accessing the data field `rows`

In [6]:
rows = data['rows']
len(rows)

506

- To access a data field/column of a given row, we can use 
```
rows[<row index>][<attribute name>]
``` 
See example:

In [7]:
rows[0]['PRICE']

24.0
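As a minimal sketch of the access pattern above, assume `rows` is a list of dictionaries keyed by attribute name. The two `sample_rows` records below are made up for illustration and are not taken from the dataset. A single field is read by index and key, and a whole column can be extracted by mapping an accessor over every row:

```python
# Hypothetical stand-in for `rows`: a list of dicts keyed by attribute name.
# These values are invented for illustration only.
sample_rows = [
    {'CRIM': 0.10, 'RM': 6.5, 'PRICE': 20.0},
    {'CRIM': 0.25, 'RM': 5.9, 'PRICE': 30.0},
]

# Single field of a single row: rows[<row index>][<attribute name>]
first_price = sample_rows[0]['PRICE']

# Whole column: map an accessor function over every row
sample_prices = list(map(lambda row: row['PRICE'], sample_rows))

print(first_price)
print(sample_prices)
```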

### TASK 1

**Compute the average price of all houses**

 We will use reduce because we are converting a list of elements into a single number

 We want to compute the total first, and divide by `len(rows)`. Store the result in `average_price`.

In [8]:
from functools import reduce
prices = [ row['PRICE'] for row in rows ]



total = reduce(lambda x, y: x+y, prices)

average_price = total/len(rows)


In [9]:
print("AVERAGE PRICE: ", average_price)

AVERAGE PRICE:  22.532806324110698
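One possible variation, shown here on made-up numbers rather than the dataset, is to pass an initial accumulator to `reduce`. This is only a sketch: the name `sample_prices` and its values are hypothetical. With an initial value, `reduce` also works on an empty list, and the result can be cross-checked against `statistics.mean`:

```python
from functools import reduce
from statistics import mean

# Hypothetical prices, invented for illustration only.
sample_prices = [20.0, 30.0, 25.0]

# The third argument (0.0) is the initial accumulator value,
# so reduce returns 0.0 instead of raising on an empty list.
sample_total = reduce(lambda acc, price: acc + price, sample_prices, 0.0)
sample_average = sample_total / len(sample_prices)

# Cross-check the hand-rolled average against the standard library.
matches_stdlib = sample_average == mean(sample_prices)

print(sample_average, matches_stdlib)
```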


### TASK 2
**Find the houses that are *under* the average price.**

We should create a predicate function to test whether the price is under `average_price`.

Then, display the first 3 results as a list using `filter`.

In [10]:
#

def below_avg(i):
    return (i < average_price)
   
num = filter(below_avg, prices)
under_avg = list(num)



In [11]:
# List three houses that are UNDER the average price.


print (under_avg[:3])


[21.6, 16.5, 18.9]
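The cells above filter the list of prices, so each result is a number. If the full house records are needed later (for example, for a report that also uses other attributes), the predicate can take a row instead. A minimal sketch on invented records follows; `sample_rows`, `sample_average`, and `is_below_average` are hypothetical names, not part of the tutorial code:

```python
# Hypothetical house records, invented for illustration only.
sample_rows = [
    {'CRIM': 0.10, 'PRICE': 20.0},
    {'CRIM': 0.25, 'PRICE': 30.0},
    {'CRIM': 0.40, 'PRICE': 10.0},
]

sample_average = sum(row['PRICE'] for row in sample_rows) / len(sample_rows)

def is_below_average(row):
    # Predicate on a whole row: compares only the PRICE field,
    # but filter keeps the entire dictionary.
    return row['PRICE'] < sample_average

cheap_houses = list(filter(is_below_average, sample_rows))
print(cheap_houses[:3])
```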


### TASK 3
**Find the houses with prices *between* `average_price-std_dev` and `average_price+std_dev`, where `std_dev` is the standard deviation.**

> standard deviation = square root of `variance`

> variance = [sum of (`house["PRICE"]` - `average_price`) ** 2] / `len(houses)`

Strategy
1. We will use `reduce` because we are converting a list of elements into a single number `sum_variance`.
2. We want to compute `sum_variance` first, and divide it by `len(rows)`. Store the result in `variance_price`.
3. We can use the `variance_price` to calculate the `std_dev`.
4. We should create a predicate function to test whether the price is between `average_price - std_dev` and `average_price + std_dev`.

Then, display the first 3 results as a list using `filter`.
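The strategy above can be sketched on a small made-up list, assuming the population form of the variance (dividing by the number of elements, as in the definition given). The values in `sample_prices` are hypothetical, and `statistics.pstdev` is used only as a cross-check:

```python
from functools import reduce
from statistics import pstdev

# Hypothetical prices, invented for illustration only.
sample_prices = [10.0, 20.0, 30.0, 40.0]
n = len(sample_prices)

# Step 0: mean via reduce
mean_price = reduce(lambda acc, p: acc + p, sample_prices, 0.0) / n

# Steps 1-2: sum of squared deviations, then divide by n (population variance)
sum_sq = reduce(lambda acc, p: acc + (p - mean_price) ** 2, sample_prices, 0.0)
variance = sum_sq / n

# Step 3: standard deviation is the square root of the variance
std_dev = variance ** 0.5

# Step 4: predicate keeping prices within one standard deviation of the mean
lower, upper = mean_price - std_dev, mean_price + std_dev
within = list(filter(lambda p: lower < p < upper, sample_prices))

print(variance, std_dev, within)
```

Here the mean is 25.0 and the variance is 125.0, so only the two middle prices fall inside the band.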

In [12]:
# Calculate the variance / std_dev

n = len(rows)
# Squared deviation of each price from the average (average_price comes from Task 1)
squared_devs = [(x - average_price)**2 for x in prices]
sum_variance = reduce(lambda x, y: x+y, squared_devs)
variance_price = sum_variance/n
std_dev = variance_price**.5


def is_between(x):
    # Return a bool, not a list: a non-empty list is always truthy,
    # which would make filter keep every price.
    return average_price - std_dev < x < average_price + std_dev
   
filtering1 = filter(is_between, prices)
filtering2 = list(filtering1)



In [13]:
# List first 3 houses with prices between average_price-std_dev and average_price+std_dev
print(filtering2[:3])

[24.0, 21.6, 28.7]


### TASK 4

**Generate a report of CRIME, ROOMS, and PRICE for the 297 houses that are below the average price.**

The `houses` should be sorted by `PRICE`. 

The result should be reported using the format: `"CRIME: %.2f, ROOMS: %d, PRICE: %.2f"`

Strategy:
 - Use `filter` from **Task 2** to get the `houses` below average.
 - Sort the `houses` using 
<!--          ``` -->
         sorted(<iterable>, key=<key>)
<!--          ``` -->
 - Use `map` to map each of the houses to a report message.
 
 Reference for `sorted` function: https://www.w3schools.com/python/ref_func_sorted.asp
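The sort-then-map steps in the strategy can be sketched on two invented records, assuming each house is a dictionary with `CRIM`, `RM`, and `PRICE` fields. `sample_houses` and its values are hypothetical. Note that `%d` truncates the average room count to an integer:

```python
# Hypothetical house records, invented for illustration only.
sample_houses = [
    {'CRIM': 0.25, 'RM': 5.9, 'PRICE': 30.0},
    {'CRIM': 0.10, 'RM': 6.5, 'PRICE': 20.0},
]

# Sort whole records by price; the key function picks the field to compare.
by_price = sorted(sample_houses, key=lambda house: house['PRICE'])

# Map each record to one formatted report line.
report = list(map(
    lambda h: "CRIME: %.2f, ROOMS: %d, PRICE: %.2f" % (h['CRIM'], h['RM'], h['PRICE']),
    by_price,
))

for line in report:
    print(line)
```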

In [14]:
# report message
def report_message(row):
    # %d truncates the average room count (RM) to an integer, as the required format specifies
    return "CRIME: %.2f, ROOMS: %d, PRICE: %.2f" % (row['CRIM'], row['RM'], row['PRICE'])



In [15]:
# 1. Get the list of houses below the average price
#    (filter over rows, not prices, so every attribute stays available)
houses = filter(lambda row: row['PRICE'] < average_price, rows)
# 2. Sort by their price
houses = sorted(houses, key=lambda row: row['PRICE'])
# 3. Generate the report of the houses, one line per house
report = map(report_message, houses)
print("\n".join(report))

## Tutorial Report

At the **end of this tutorial session**, you will deliver a report via Canvas. 

Your report will be the compiled version of this notebook with your solution. You **MUST** submit:
- both the `ipynb` and `PDF` (`File > Download as > PDF`) versions of this notebook,
- both named `<lastname-firstname>-tutorial04`,
- both containing your full name and student ID.


*Late tutorial submission policy:*
- All tutorial reports are due at the **end of your tutorial session**.
- Late tutorial reports will be accepted without penalty until your next tutorial session. **No late reports will be accepted after that.**

*TA grading and feedback inquiries*
- Your report grades will be posted via Canvas using the rubric provided by the instructor. You are encouraged to ask your TA on MS Teams for feedback, as needed, as soon as your grades are published.
