
In Colaboratory, you cannot make permanent changes to this notebook without making a copy to your Google Drive. **Closing the page without copying the notebook to your drive will result in all of your changes being lost.**

This week we will focus on questions that analyze the CSV file section2data.csv. This file contains five columns holding the answers to the "Just for fun" questions that you answered in the starting survey. We also provide a smaller file called section2data-truncated.csv, which contains a subset of the data from section2data.csv for testing purposes.


| number | countries visited | low temperature | high temperature | ideal weather condition |
|--------|-------------------|-----------------|------------------|-------------------------|
| 1      | 2                 | 50              | 60               | Sunny                   |


# Format 

This document consists of three different parts. The first part lists important functions to call in order to complete testing and parse the file. 

The next two parts contain problems. 

Manual problems are meant to be done by hand using the data structure produced by the `parse` function. Pandas problems assume your function is passed a pandas `DataFrame` and should be solved using pandas functions. The problems in the manual and pandas sections are the same; they differ only in the restrictions on how you solve them.

# Practice Problems: Important Functions

Reminder: for each of the following problems, there will be a blank cell for you to write the solution and a cell that calls your solution to test if it works. You should use those test cells to see examples of what the function should return for a particular input (the first value passed to `assert_equals` is the expected value). You will have to make sure to run the cell with a solution to your problem BEFORE you run the test cells (otherwise your function won't be defined). 

For all of these problems, you may assume we will only pass parameters of the specified type and that they are not `None`, but otherwise you should make no assumptions about the parameters.

**Important:** Please make sure that you have run the following cells before running any of the test cells.

`check_approx_equals` allows you to check floats. 
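
As a brief illustration of why an approximate check is needed, the sketch below compares a floating-point sum both exactly and with `math.isclose`, using the same `abs_tol=0.001` tolerance that `check_approx_equals` uses. The variable name `total` is only for illustration.

```python
import math

# 0.1 + 0.2 is not exactly 0.3 in binary floating point
total = 0.1 + 0.2

print(total == 0.3)                             # False
print(math.isclose(total, 0.3, abs_tol=0.001))  # True
```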

In [0]:
import math

import requests

import pandas

from google.colab import files

def check_approx_equals(expected, received):
    """
    Checks received against expected, and returns whether or 
    not they match (True if they do, False otherwise). 
    If the argument is a float, will do an approximate check.
    If the argument is a data structure, will do an approximate check
    on all of its contents.
    """
    try:
        if type(expected) == dict:
            # first check that keys match, then check that the
            # values approximately match
            return expected.keys() == received.keys() and \
                all([check_approx_equals(expected[k], received[k])
                    for k in expected.keys()])
        elif type(expected) == list or type(expected) == set:
            # Checks both lists/sets contain the same values
            return len(expected) == len(received) and \
                all([check_approx_equals(v1, v2)
                    for v1, v2 in zip(expected, received)])
        elif type(expected) == float:
            return math.isclose(expected, received, abs_tol=0.001)
        else:
            return expected == received
    except Exception as e:
        print(f'EXCEPTION: Raised when checking check_approx_equals {e}')
        return False


def assert_equals(expected, received):
    """
    Checks received against expected, throws an AssertionError
    if they don't match. If the argument is a float, will do an approximate
    check. If the argument is a data structure will do an approximate check
    on all of its contents.
    """
    assert check_approx_equals(expected, received), \
        f'Failed: Expected {expected}, but received {received}'

You will also need to download our starter data file [section2data.csv](https://homes.cs.washington.edu/~nriley16/section2data.csv) and its truncated version by running the following cell.

In [0]:
import requests

def save_file(url, file_name):
  r = requests.get(url)
  with open(file_name, 'wb') as f:
    f.write(r.content)

save_file('https://courses.cs.washington.edu/courses/cse163/19sp/files/section/'
          + '04-11/section2data.csv', 'section2data.csv')
save_file('https://courses.cs.washington.edu/courses/cse163/19sp/files/section/'
          + '04-11/section2data-truncated.csv', 'section2data-truncated.csv')

You will also need to run the following cell, which defines the `parse` function used to read the data in the CSV files.

In [0]:
def parse(file_name, int_cols):
    """
    Parses the CSV file specified by file_name and returns the data as a list
    of dictionaries where each row is represented by a dictionary that
    has keys for each column and value which is the entry for that column
    at that row.

    Also takes a list of column names that should have the data for that column
    converted to integers. All other data will be str.
    """
    data = []
    with open(file_name) as f:
        headers = f.readline().strip().split(',')
        num_cols = len(headers)

        for line in f.readlines():
            row_data = line.strip().split(',')
            row = {}
            for i in range(num_cols):
                if headers[i] in int_cols:
                    row[headers[i]] = int(row_data[i])
                else:
                    row[headers[i]] = row_data[i]
            data.append(row)
    return data
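
To make the returned structure concrete, here is a minimal sketch that applies the same splitting logic to a few lines of made-up CSV text instead of a file. The helper name `parse_lines` and the sample rows are hypothetical and are not part of the section materials.

```python
def parse_lines(lines, int_cols):
    """Hypothetical helper: like parse, but reads from a list of strings."""
    headers = lines[0].strip().split(',')
    data = []
    for line in lines[1:]:
        values = line.strip().split(',')
        row = {}
        for header, value in zip(headers, values):
            # Convert only the columns named in int_cols; keep the rest as str
            row[header] = int(value) if header in int_cols else value
        data.append(row)
    return data


sample = [
    'number,countries visited,ideal weather condition',
    '1,19,Sunny',
    '2,13,Rain',
]
rows = parse_lines(sample, ['number', 'countries visited'])
print(rows[0])
# {'number': 1, 'countries visited': 19, 'ideal weather condition': 'Sunny'}
print(rows[1]['ideal weather condition'])  # Rain
```

Each row is a dictionary keyed by column name, so a value is looked up as `rows[i][column_name]`.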


In [0]:
integer_cols = ['number', 'countries visited', 'low temperature', 
                'high temperature']
parsed_entire_file = parse('section2data.csv', integer_cols)
parsed_truncated_file = parse('section2data-truncated.csv', integer_cols)

In [0]:
print(parsed_entire_file)

[{'number': 1, 'countries visited': 19, 'low temperature': 60, 'high temperature': 70, 'ideal weather condition': 'Sunny'}, {'number': 2, 'countries visited': 13, 'low temperature': 80, 'high temperature': 90, 'ideal weather condition': 'Sunny'}, {'number': 3, 'countries visited': 15, 'low temperature': 70, 'high temperature': 40, 'ideal weather condition': 'Mostly Cloudy'}, {'number': 4, 'countries visited': 2, 'low temperature': 60, 'high temperature': 70, 'ideal weather condition': 'Sunny'}, {'number': 5, 'countries visited': 6, 'low temperature': 60, 'high temperature': 70, 'ideal weather condition': 'Sunny'}, {'number': 6, 'countries visited': 3, 'low temperature': 60, 'high temperature': 70, 'ideal weather condition': 'Sunny'}, {'number': 7, 'countries visited': 10, 'low temperature': 50, 'high temperature': 60, 'ideal weather condition': 'Partly Cloudy'}, {'number': 8, 'countries visited': 5, 'low temperature': 30, 'high temperature': 40, 'ideal weather condition': 'Sunny'}, {'n

# Manual Processing Problems

For each of these problems, use the data from the file you parsed using the provided `parse` function and create solutions manually (that is, do not use libraries like pandas).

# Problem 1) `weather_count`

Write a function called `weather_count` that takes in the parsed weather data and an ideal weather condition and returns the number of people who prefer this weather condition.

For example, the call `weather_count(parsed_truncated_file, 'Thunder Storm')` should return 2.

In [0]:
# Type your solution here
def weather_count(data, weather_condition):
  count = 0
  for row in data:
    if row['ideal weather condition'] == weather_condition:
      count += 1
  return count

In [0]:
assert_equals(2, weather_count(parsed_truncated_file, 'Thunder Storm'))
assert_equals(1, weather_count(parsed_truncated_file,'Mostly Cloudy'))

# Problem 2) `temperature_range`

Write a function called `temperature_range` that takes in the parsed student data, a `low` temperature (inclusive), and a `high` temperature (exclusive) and returns the number of students who prefer temperatures within that range. These temperatures are in Fahrenheit.

Consider the dataset [{'number': 1, 'countries visited': 3, 'low temperature': 50, 'high temperature': 60, 'ideal weather condition': 'Sunny'}, {'number': 2, 'countries visited': 5, 'low temperature': 30, 'high temperature': 40, 'ideal weather condition': 'Partly Cloudy'}, {'number': 3, 'countries visited': 10, 'low temperature': 0, 'high temperature': 30, 'ideal weather condition': 'Rainy'}]. Calling `temperature_range` on this dataset with a `low` of 0 and a `high` of 31 would return 1, because only the third row has both a low and a high temperature that fall between the `low` and `high` passed to the function.

For example, the call `temperature_range(parsed_truncated_file, 0, 31)` returns 1.
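
The inclusive lower bound and exclusive upper bound can be checked by hand on the three-row dataset above. The sketch below is one possible way to express the check, not the required solution; the names `rows` and `in_range` are illustrative, and only the temperature columns are kept for brevity.

```python
rows = [
    {'number': 1, 'low temperature': 50, 'high temperature': 60},
    {'number': 2, 'low temperature': 30, 'high temperature': 40},
    {'number': 3, 'low temperature': 0, 'high temperature': 30},
]


def in_range(row, low, high):
    # low is inclusive, high is exclusive
    return row['low temperature'] >= low and row['high temperature'] < high


print([row['number'] for row in rows if in_range(row, 0, 31)])  # [3]
print([row['number'] for row in rows if in_range(row, 0, 30)])  # []
```

With a `high` of 30, the third row no longer counts, because its high temperature of 30 is not strictly below the bound.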



In [0]:
# Type your solution here
def temperature_range(data, low, high):
  temperature_count = 0
  for line in data:
    row_low = line['low temperature']
    row_high = line['high temperature']
    if row_low >= low and row_high < high:
      temperature_count += 1
  return temperature_count


In [0]:
assert_equals(1, temperature_range(parsed_truncated_file, 0, 31))

# Problem 3) `max_countries_visited`

Write a function called `max_countries_visited` that takes in the parsed `data`, finds the maximum number of countries visited by anyone in the class, and returns a tuple containing that person's number in the dataset and the number of countries they have visited.

If multiple students are tied for the maximum number of countries visited, this function should return the student who appears first in the dataset.
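
The tie-breaking rule can be illustrated with Python's built-in `max`, which returns the first maximal item when several items compare equal under the `key`. This is an alternative sketch on made-up rows, not the loop-based solution the problem asks you to write.

```python
rows = [
    {'number': 1, 'countries visited': 4},
    {'number': 2, 'countries visited': 9},
    {'number': 3, 'countries visited': 9},  # ties with student 2
]

# max keeps the first item it sees among equal maxima
best = max(rows, key=lambda row: row['countries visited'])
print((best['number'], best['countries visited']))  # (2, 9)
```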

In [0]:
# Type your solution here
def max_countries_visited(data):
  student_number = 0
  max_visited_countries = 0
  for line in data:
    # Strict > keeps the earliest student when there is a tie
    if line['countries visited'] > max_visited_countries:
      student_number = line['number']
      max_visited_countries = line['countries visited']
  return student_number, max_visited_countries
                          
                                

In [0]:
assert_equals((1, 19), max_countries_visited(parsed_truncated_file))

# Problem 4) `unique_weather_combinations`

Write a function called `unique_weather_combinations` that takes in the parsed weather `data` and returns the number of unique combinations of ideal weather type and low temperature that individuals in our class listed.
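
One way to count combinations is to store each pair in a `set`, which discards duplicates automatically. The sketch below uses tuples as the set elements; the rows are made up for illustration, and this is only one of several reasonable approaches.

```python
rows = [
    {'low temperature': 60, 'ideal weather condition': 'Sunny'},
    {'low temperature': 60, 'ideal weather condition': 'Sunny'},  # duplicate
    {'low temperature': 60, 'ideal weather condition': 'Rain'},
    {'low temperature': 50, 'ideal weather condition': 'Sunny'},
]

combinations = set()
for row in rows:
    # A tuple keeps the two fields separate, so no delimiter is needed
    combinations.add((row['ideal weather condition'], row['low temperature']))

print(len(combinations))  # 3
```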



In [0]:
# Type your solution here
def unique_weather_combinations(data):
  weather_conditions_set = set()
  for line in data: 
    weather_condition = str(line['low temperature']) + " " \
      + line['ideal weather condition']
    weather_conditions_set.add(weather_condition)
  return len(weather_conditions_set)


In [0]:
assert_equals(11, unique_weather_combinations(parsed_truncated_file))

# Problem 5) `average_countries_visited`

Write a function called `average_countries_visited` that takes in the parsed `data` and returns the average number of countries visited in the dataset. 

In [0]:
# Type your code here
def average_countries_visited(data):
  count = 0
  total = 0  # named total to avoid shadowing the built-in sum
  for line in data:
    count += 1
    total += line['countries visited']
  return total / count

In [0]:
assert_equals(7.5, average_countries_visited(parsed_truncated_file))

# Problem 6) `average_countries_visited_per_weather_type`

Write a function called `average_countries_visited_per_weather_type` that takes in the parsed `data` and returns a dictionary that has the average number of countries visited for each preferred weather type (e.g. Sunny, Cloudy, etc.).

For `average_countries_visited_per_weather_type(parsed_truncated_file)` the result would be

{'Sunny': 7.777777777777778, 'Mostly Cloudy': 15.0, 'Partly Cloudy': 6.285714285714286, 'Rain': 16.0, 'Thunder Storm': 2.5}

**Note**: We are not being thorough with this problem. In actual testing, we would encourage you to make a small dataset for which you know the expected result and test the function on that file.

In [0]:
# Type your code here
def average_countries_visited_per_weather_type(data):
  counts_dictionary = dict()
  sum_dictionary = dict()
  for line in data: 
    weather_type = line['ideal weather condition']
    if weather_type in counts_dictionary: 
      counts_dictionary[weather_type] += 1
      sum_dictionary[weather_type] += line['countries visited']
    else: 
      counts_dictionary[weather_type] = 1
      sum_dictionary[weather_type] = line['countries visited']
  average_dictionary = dict()
  for weather_condition in counts_dictionary:
    average_dictionary[weather_condition] \
      = sum_dictionary[weather_condition] / counts_dictionary[weather_condition]
  return average_dictionary
    

In [0]:
#testing statements 
print(average_countries_visited_per_weather_type(parsed_truncated_file))

{'Sunny': 7.777777777777778, 'Mostly Cloudy': 15.0, 'Partly Cloudy': 6.285714285714286, 'Rain': 16.0, 'Thunder Storm': 2.5}


# Pandas Problems

For each of these problems, use the data from the file and create solutions using the pandas library. For a reference to the library, click [here](https://colab.research.google.com/drive/1fsW0sTvsMcD79eM4st1fBJgnXiCPU41T#scrollTo=VPFCCiumej8-&forceEdit=true&offline=true&sandboxMode=true).

Word bank: 

*   Get a column of a `DataFrame`
*   Get a row of a `DataFrame` (`loc`)
*   Filtering
*   Loop over Series
*   `groupby`
*   `min`
*   `max`
*   `idxmin`
*   `idxmax`
*   `count`
*   `mean`
*   `unique`
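
As a quick refresher, the sketch below exercises several of the word-bank operations on a tiny hand-made `DataFrame`. The column names mirror the section data, but the values and the variable name `df` are made up for illustration.

```python
import pandas

df = pandas.DataFrame({
    'number': [1, 2, 3, 4],
    'countries visited': [4, 9, 9, 2],
    'ideal weather condition': ['Sunny', 'Rain', 'Sunny', 'Rain'],
})

# Get a column (a Series)
visited = df['countries visited']

# Filtering with a boolean mask
sunny = df[df['ideal weather condition'] == 'Sunny']
print(len(sunny))                    # 2

# min, max, and mean of a column
print(visited.min(), visited.max())  # 2 9
print(visited.mean())                # 6.0

# idxmax gives the index label of the first maximum; loc fetches that row
top_row = df.loc[visited.idxmax()]
print(top_row['number'])             # 2

# unique returns the distinct values in a column
print(len(df['ideal weather condition'].unique()))  # 2
```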





Note: in pandas we parse our data differently, and the object we will be processing is a pandas `DataFrame` instead of a list of dictionaries. Run the cell below to get the data in a form compatible with pandas.

In [0]:
pandas_truncated_data = pandas.read_csv('section2data-truncated.csv')
pandas_data = pandas.read_csv('section2data.csv')

# Problem 1) `weather_count_pandas`

Write a function called `weather_count_pandas` that takes in a pandas `DataFrame` of the weather `data` and an ideal weather condition and returns the number of people who prefer this weather condition.

For example, the call `weather_count_pandas(pandas_truncated_data, 'Thunder Storm')` should return 2.

In [0]:
# Type your solution here
def weather_count_pandas(data, weather_condition):
  return len(data[(data['ideal weather condition'] == weather_condition)])


In [0]:
assert_equals(2, weather_count_pandas(pandas_truncated_data, 'Thunder Storm'))
assert_equals(1, weather_count_pandas(pandas_truncated_data,'Mostly Cloudy'))

# Problem 2) `temperature_range_pandas`

Write a function called `temperature_range_pandas` that takes in a pandas `DataFrame` of the weather `data`, a `low` temperature (inclusive), and a `high` temperature (exclusive) and returns the number of students who prefer temperatures within that range. These temperatures are in Fahrenheit.

Consider the dataset [{'number': 1, 'countries visited': 3, 'low temperature': 50, 'high temperature': 60, 'ideal weather condition': 'Sunny'}, {'number': 2, 'countries visited': 5, 'low temperature': 30, 'high temperature': 40, 'ideal weather condition': 'Partly Cloudy'}, {'number': 3, 'countries visited': 10, 'low temperature': 0, 'high temperature': 30, 'ideal weather condition': 'Rainy'}]. Calling `temperature_range_pandas` on this dataset with a `low` of 0 and a `high` of 31 would return 1, because only the third row has both a low and a high temperature that fall between the `low` and `high` passed to the function.

For example, the call `temperature_range_pandas(pandas_truncated_data, 0, 31)` returns 1.



In [0]:
# Type your solution here
def temperature_range_pandas(data, low, high):
  filtered = data[(data['low temperature'] >= low) 
                  & (data['high temperature'] < high)]
  return len(filtered)

In [0]:
assert_equals(1, temperature_range_pandas(pandas_truncated_data, 0, 31))

# Problem 3) `max_countries_visited_pandas`

Write a function called `max_countries_visited_pandas` that takes in a pandas `DataFrame` of the weather `data`, finds the maximum number of countries visited by anyone in the class, and returns a tuple containing that person's number in the dataset and the number of countries they have visited.

In [0]:
# Type your solution here
def max_countries_visited_pandas(data):
  val = data.loc[data['countries visited'].idxmax()]
  return val['number'], val['countries visited']


In [0]:
assert_equals((1, 19), max_countries_visited_pandas(pandas_truncated_data))

# Problem 4) `unique_weather_combinations_pandas`

Write a function called `unique_weather_combinations_pandas` that takes in a pandas `DataFrame` of the weather `data` and returns the number of unique combinations of ideal weather type and low temperature in the dataset.

You can use the `astype` function on a `Series` to convert the values to a different type. For example, to turn high temperature into a `float`:

```python
data['high temperature'].astype(float)
```

In [0]:
# Type your solution here
def unique_weather_combinations_pandas(data):
  combinations = data['low temperature'].astype(str) + ' ' \
    + data['ideal weather condition']
  return len(combinations.unique())

In [0]:
assert_equals(11, unique_weather_combinations_pandas(pandas_truncated_data))

# Problem 5) `average_countries_visited_pandas`

Write a function called `average_countries_visited_pandas` that takes in a pandas `Dataframe` of the `data` and returns the average number of countries visited in the dataset. 

In [0]:
# Type your solution here
def average_countries_visited_pandas(data):
  return data['countries visited'].mean()

In [0]:
#testing here 
assert_equals(7.5, average_countries_visited_pandas(pandas_truncated_data))

# Problem 6) `average_countries_visited_per_weather_type_pandas`

Write a function called `average_countries_visited_per_weather_type_pandas` that takes in a pandas `DataFrame` of the weather `data` and returns a dictionary that has the average number of countries visited for each preferred weather type (e.g. Sunny, Cloudy, etc.).


For `average_countries_visited_per_weather_type_pandas(pandas_truncated_data)` the result would be

{'Sunny': 7.777777777777778, 'Mostly Cloudy': 15.0, 'Partly Cloudy': 6.285714285714286, 'Rain': 16.0, 'Thunder Storm': 2.5}

**Note**: We are not being thorough with this problem. In actual testing, we would encourage you to make a small dataset for which you know the expected result and test the function on that file.
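
Following the note above, one possible small test is sketched below: a hand-made `DataFrame` whose per-weather averages are easy to compute by hand. The data and the name `small` are hypothetical, and the `groupby` expression is just one way to get a dictionary of means, not necessarily the intended solution.

```python
import math

import pandas

small = pandas.DataFrame({
    'countries visited': [2, 4, 10],
    'ideal weather condition': ['Sunny', 'Sunny', 'Rain'],
})

# Group rows by weather type, then average the countries visited per group
averages = small.groupby('ideal weather condition')['countries visited'] \
    .mean().to_dict()

# By hand: Sunny -> (2 + 4) / 2 = 3.0, Rain -> 10 / 1 = 10.0
assert math.isclose(averages['Sunny'], 3.0, abs_tol=0.001)
assert math.isclose(averages['Rain'], 10.0, abs_tol=0.001)
```

Because the expected values are computed by hand, a failing assertion points at the function rather than at the test data.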

In [0]:
# Type your solution here
def average_countries_visited_per_weather_type_pandas(data):
  return dict(data.groupby('ideal weather condition')['countries visited'].mean())

In [0]:
#testing here 
print(average_countries_visited_per_weather_type_pandas(pandas_truncated_data))

{'Mostly Cloudy': 15.0, 'Partly Cloudy': 6.285714285714286, 'Rain': 16.0, 'Sunny': 7.777777777777778, 'Thunder Storm': 2.5}
