# Metadata

```
title:   Fighting Forest Fires with Functions
course:  DS 5100
module:  04 Functions HW
topics:  User-defined functions, lambda functions, comprehensions, nested functions
updated: 18 June 2022 (adapted)
```

# Student Info

**<span style="color:red;">Write your name and user ID below, and delete this line when done.</span>**

* Name:
* User ID:

# Objective

Work with functions (built-in and user-defined), lambda functions, and list comprehensions.

# Summary

You will work with the Forest Fires Data Set from UCI.  

Information about the dataset: https://archive.ics.uci.edu/ml/datasets/Forest+Fires

Background: This dataset was used in a regression task, where the aim was to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data.

We will apply some of the steps leading to an ML task.

# Instructions

Run the pre-populated code, and along the way, you will be asked to perform several graded tasks <span style="color:blue">(prompted in blue font)</span>.  

Show your code and solutions clearly in the cells following each question.   
When the file is completed, submit the notebook through Collab.

**TOTAL POINTS: 14**

---


# Read in the dataset from the UCI Machine Learning Repository  

We have a local copy of the data. It is in the same directory as this notebook. We can inspect it by importing the CSV file as a list using the `open()` function and the `.readlines()` method.

In [1]:
with open('uci_mldb_forestfires.csv', 'r') as f:
    data_file = f.readlines()

We look at the first ten lines. Note that we replace commas with tabs for readability.
Tools like NumPy and pandas will handle this formatting for you.

In [7]:
for row in data_file[:10]:
    row = row.replace(',', '\t')
    print(row, end='')

X	Y	month	day	FFMC	DMC	DC	ISI	temp	RH	wind	rain	area
7	5	mar	fri	86.2	26.2	94.3	5.1	8.2	51	6.7	0.0	0.0
7	4	oct	tue	90.6	35.4	669.1	6.7	18.0	33	0.9	0.0	0.0
7	4	oct	sat	90.6	43.7	686.9	6.7	14.6	33	1.3	0.0	0.0
8	6	mar	fri	91.7	33.3	77.5	9.0	8.3	97	4.0	0.2	0.0
8	6	mar	sun	89.3	51.3	102.2	9.6	11.4	99	1.8	0.0	0.0
8	6	aug	sun	92.3	85.3	488.0	14.7	22.2	29	5.4	0.0	0.0
8	6	aug	mon	92.3	88.9	495.6	8.5	24.1	27	3.1	0.0	0.0
8	6	aug	mon	91.5	145.4	608.2	10.7	8.0	86	2.2	0.0	0.0
8	6	sep	tue	91.0	129.5	692.6	7.0	13.1	63	5.4	0.0	0.0
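
Each element of the list returned by `.readlines()` is one raw line of text, so the fields are still joined by commas. As a minimal sketch of how such lines could be split by hand, the snippet below uses two lines copied from the output above; the names `sample_lines`, `header`, `first_row`, and `record` are illustrative and are not part of the helper script.

```python
# Two sample lines copied from the output above; in the notebook these
# would come from data_file (the list returned by .readlines()).
sample_lines = [
    "X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area\n",
    "7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0.0\n",
]

# Strip the trailing newline, then split each line on commas.
header = sample_lines[0].strip().split(",")
first_row = sample_lines[1].strip().split(",")

# Pair each column name with its value (values are still strings here).
record = dict(zip(header, first_row))
print(record["month"], record["temp"])
```

Note that every value is still a string at this point; converting numeric columns would be a separate step.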


We have a helper script to give us the data in the form of a simple object. Later, we will use a pandas DataFrame to do this work.

In [8]:
from HW04_uci_mldb_firedata import firedata

# Working with spatial coordinates X, Y

X: x-axis spatial coordinate within the Montesinho park map: 1 to 9  
Y: y-axis spatial coordinate within the Montesinho park map: 2 to 9

In [42]:
X, Y = firedata.X, firedata.Y

In [43]:
X[:10], Y[:10]

(array([7, 7, 7, 8, 8, 8, 8, 8, 8, 7]), array([5, 4, 4, 6, 6, 6, 6, 6, 6, 5]))

## Q1

**<span style="color:blue">(2 PTS) 1. Write a function called `coord_builder()` with these requirements:</span>**

- Takes two lists, X and Y, as inputs. X and Y must be of equal length.
- Returns a list of tuples `[(x1,y1), (x2,y2), ..., (xn,yn)]` where `(xi,yi)` are the ordered pairs from X and Y.
- Uses the `zip()` function to create the returned list.
- Uses a list comprehension to build the returned list.
- Contains a docstring with a short description of the function.
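
As a reminder of the general pattern (not the graded function itself), here is a small sketch that combines `zip()` with a list comprehension on toy data. The function name `pair_up` and its inputs are hypothetical, and the length check is one possible way to handle unequal inputs, not a stated requirement.

```python
def pair_up(names, scores):
    """Return a list of (name, score) tuples built from two equal-length lists."""
    # zip() stops at the shorter input, so check lengths explicitly
    # to avoid silently dropping items.
    if len(names) != len(scores):
        raise ValueError("inputs must have the same length")
    return [(name, score) for name, score in zip(names, scores)]


pairs = pair_up(["ann", "bob", "cy"], [90, 85, 77])
print(pairs[:2])  # [('ann', 90), ('bob', 85)]
```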

In [14]:
# WRITE FUNCTION

## Q2
**<span style="color:blue">(1 PT) 2. Call your `coord_builder()` function, passing in `X` and `Y`.  
    Then print the FIRST FIVE TUPLES. </span>**

In [15]:
# CALL FUNCTION

In [16]:
# SHOW RESULTS

# Working with AREA

In [17]:
area = firedata.area

In [19]:
area[-10:]

array([ 0.  ,  0.  ,  2.17,  0.43,  0.  ,  6.44, 54.29, 11.16,  0.  ,
        0.  ])

## Q3
**<span style="color:blue">(1 PT) 3. Write code to print the minimum area and maximum area as a tuple
`(min_value, max_value)`, where `min_value` and `max_value` are floats.</span>** 

In [20]:
# CODE

## Q4
**<span style="color:blue">(2 PTS) 4. Write a lambda function that applies the following function to $x$:</span>**   

$\log_{10}(1 + x)$

Assign the function to the variable `mylog10`.

**<span style="color:blue">Then call the lambda function on `area` and print the LAST 10 values.</span>**  

Hints: 
* Use the `log10` function from Python's [`math` module](https://docs.python.org/3/library/math.html). You'll need to import it.
* Use a list comprehension to make the lambda function a one-liner.
* To get the last members of a list, use negative offset slicing. See [the Python documentation on lists](https://docs.python.org/3/tutorial/introduction.html#lists) for a refresher on slicing.
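
To illustrate how these hints fit together, the sketch below uses a different function, $\sqrt{1 + x}$, on a toy list. The names `root1p` and `toy` are hypothetical; the point is only the shape of a lambda that wraps a list comprehension, followed by a negative slice.

```python
from math import sqrt

# A lambda that takes a whole list and applies sqrt(1 + v)
# to each element via a list comprehension.
root1p = lambda values: [sqrt(1 + v) for v in values]

toy = [0, 3, 8, 15, 24]
result = root1p(toy)

# A negative start index counts from the end of the list.
print(result[-3:])  # [3.0, 4.0, 5.0]
```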

In [24]:
# FUNCTION

In [25]:
# RESULT

# Working with MONTH

The month column contains months of the year in abbreviated form, from 'jan' to 'dec'.

In [32]:
month = firedata.month

In [33]:
month[:10]

array(['mar', 'oct', 'oct', 'mar', 'mar', 'aug', 'aug', 'aug', 'sep',
       'sep'], dtype=object)

## Q5
**<span style="color:blue">(1 PT) 5. Create a function called `unique()` that extracts the unique values from a list. The function should optionally return the list sorted in ascending order. Then apply it to the *month* column of our data with sorting turned on. Then print the unique months.</span>**   

In [16]:
# WRITE FUNCTIONS

In [17]:
# CALL FUNCTION

In [26]:
# PRINT RESULTS

## Q6
**<span style="color:blue">(1 PT) 6. Write a list comprehension to select all months starting with the letter 'a' from the list of unique *month* names you just created. The list should contain uppercase strings. Print this result.</span>**   
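
A list comprehension can filter and transform in one expression: the `if` clause selects items and the leading expression reshapes them. Here is a small sketch on unrelated toy data (the `colors` list is made up for illustration):

```python
colors = ["blue", "black", "green", "brown", "red"]

# Keep only the strings that start with "b", and uppercase each one kept.
b_colors = [c.upper() for c in colors if c.startswith("b")]
print(b_colors)  # ['BLUE', 'BLACK', 'BROWN']
```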

In [27]:
# WRITE CODE

# Working with DMC
DMC - DMC index from the FWI system: 1.1 to 291.3  

In [29]:
dmc = firedata.DMC

In [30]:
dmc[:10]

array([ 26.2,  35.4,  43.7,  33.3,  51.3,  85.3,  88.9, 145.4, 129.5,
        88. ])

## Q7
**<span style="color:blue">(2 PTS) 7. Write a function called `bandpass_filter()` with these requirements:</span>**

- Takes three inputs: 
  - A numeric array (or list).
  - An integer serving as a lower bound `lower_bound`.
  - An integer serving as an upper bound `upper_bound`.
- Returns a new array containing only the values from the original array which are greater than `lower_bound` and less than `upper_bound`.
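
The wording "greater than" and "less than" reads as strict inequalities, so values equal to either bound would be excluded. The sketch below contrasts strict and inclusive bounds on a made-up list (`readings`, `lower`, and `upper` are illustrative names). It returns plain lists, so adapt it if you need a NumPy array back.

```python
readings = [10, 25, 30, 35, 40]
lower, upper = 25, 35

# Strict bounds: values equal to lower or upper are dropped.
strict = [r for r in readings if lower < r < upper]

# Inclusive bounds, shown only for comparison.
inclusive = [r for r in readings if lower <= r <= upper]

print(strict)     # [30]
print(inclusive)  # [25, 30, 35]
```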

In [34]:
# WRITE FUNCTION

## Q8
**<span style="color:blue">(1 PT) 8. Call `bandpass_filter()` passing `dmc` as the array, with `lower_bound=25` and `upper_bound=35`. Then print the result. </span>**


In [35]:
# CALL FUNCTION

In [44]:
# PRINT RESULT

# Working with FFMC
FFMC - FFMC index from the FWI system: 18.7 to 96.20

In [37]:
ffmc = firedata.FFMC

In [38]:
ffmc[:10]

array([86.2, 90.6, 90.6, 91.7, 89.3, 92.3, 92.3, 91.5, 91. , 92.5])

## Q9
**<span style="color:blue">(2 PTS) 9. Write a function called `sum_sq_err()` with these requirements:</span>**

- Takes a numeric list as input.
- Computes the mean $\mu$ of the list. 
- Computes the squared deviation from $\mu$ of each item in the list and sums them.
- Returns the sum of squared deviations.

To implement this in your function, use these techniques:
- Write a subfunction to compute the squared deviation of a list item.
- Apply that function in a for-loop.

Hints: 
* The mean is just the sum of a list of numeric values divided by the length of that list.
* The squared deviation of a list element $x_i$ is $(x_i - \mu)^2$.
* It will be necessary to keep a running sum of the squared deviations.
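
The same structure (a nested helper, a for-loop, and a running sum) can be sketched with a different statistic. The hypothetical function below sums *absolute* deviations rather than squared ones, so it is an analogy and not the required `sum_sq_err()`.

```python
def sum_abs_dev(values):
    """Return the sum of absolute deviations from the mean of values."""
    mu = sum(values) / len(values)

    def abs_dev(x):
        # The inner function can read mu from the enclosing scope.
        return abs(x - mu)

    total = 0.0
    for v in values:
        total += abs_dev(v)  # accumulate the running sum
    return total


print(sum_abs_dev([1, 2, 3, 6]))  # 6.0
```

Here the mean is 3.0 and the absolute deviations are 2, 1, 0, and 3, which add up to 6.0.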

In [40]:
# WRITE FUNCTION

## Q10
**<span style="color:blue">(1 PT) 10. Call `sum_sq_err()` passing `ffmc` as the array, printing the result. </span>**

In [41]:
# CALL FUNCTION

---  

# END

Push this to your private repo and then link to it in HW04 in Assessments on Collab.