# Filtering Data Based on Criteria
In this lesson, we will use a small dataset of weather projections for Chapel Hill from Thursday, March 25th, through Saturday, April 3rd. Each row (the values at the same index in every column) is the projection for one day, in order across that timeframe.

Our analysis goal is to find the average high temperature on days when rain is unlikely (a chance of rain below 30%).

We will approach this problem from a column-oriented perspective: each column of data (`high`, `low`, and `rain`) is stored as its own list.

First, let's look at our dataset.

In [104]:
col_data: dict[str, list[float]] = {
    "high": [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
    "low": [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
    "rain": [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1]
}

col_data

{'high': [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
 'low': [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
 'rain': [0.3, 0.2, 0.4, 0.8, 0.0, 0.2, 0.4, 0.5, 0.1, 0.1]}
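Because the data is stored by column, one day's forecast is spread across the three lists at the same index. As a minimal sketch, a hypothetical helper `row_at` (not part of the lesson) can gather the values for a single day:

```python
def row_at(data: dict[str, list[float]], index: int) -> dict[str, float]:
    """Collect the value at `index` from every column (hypothetical helper)."""
    row: dict[str, float] = {}
    for column_name, values in data.items():
        row[column_name] = values[index]
    return row


col_data: dict[str, list[float]] = {
    "high": [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
    "low": [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
    "rain": [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1],
}

print(row_at(col_data, 0))  # {'high': 77, 'low': 67, 'rain': 0.3}
```

Index 0 is the first day in the timeframe (Thursday, March 25th); index 9 is the last.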

## Produce a "Mask" Based on Criteria

We consider a chance of rain below 0.3 (30%) to mean that rain is unlikely.

In [105]:
def less_than(col: list[float], threshold: float) -> list[bool]:
    result: list[bool] = []  # This list of Booleans is called a "mask": one True/False per item in col.
    for item in col:
        result.append(item < threshold)
        # The above line does the same as the following:
        # if item < threshold:
        #     result.append(True)
        # else:
        #     result.append(False)
    return result

# Example, testing call:
no_rain_mask: list[bool] = less_than(col_data["rain"], 0.3)
no_rain_mask

[False, True, False, False, True, True, False, False, True, True]
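The loop in `less_than` can also be written as a list comprehension, which builds the same list of Booleans in a single expression. A sketch follows; the name `less_than_comprehension` is ours, chosen so it does not shadow the original function:

```python
def less_than_comprehension(col: list[float], threshold: float) -> list[bool]:
    # One Boolean per item: True when the item is below the threshold.
    return [item < threshold for item in col]


rain: list[float] = [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1]
print(less_than_comprehension(rain, 0.3))
# [False, True, False, False, True, True, False, False, True, True]
```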

## Masked Function

`masked` takes in a column and a mask (a list of bool values) and returns only the values in the column whose corresponding mask value is True.

Masking is also known as Boolean selection or Boolean indexing: we "mask out" the values whose mask entry is False in order to keep the values whose entry is True.

In [106]:
# To use this, you must already have a mask (a list of bools) built for the column.
def masked(col: list[float], mask: list[bool]) -> list[float]:
    result: list[float] = []
    for i in range(len(mask)):
        if mask[i]:  # keep the value at index i only when the mask says True
            result.append(col[i])
    return result

# Test
highs_of_no_rain_days: list[float] = masked(col_data["high"], no_rain_mask)
highs_of_no_rain_days


[84, 65, 67, 55, 61]
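`masked` walks the indices with `range(len(mask))`. An alternative sketch pairs each value with its mask entry using `zip`. It also raises a `ValueError` when the two lists differ in length, a safeguard the original function does not have; `masked_zip` is a hypothetical name:

```python
def masked_zip(col: list[float], mask: list[bool]) -> list[float]:
    # Refuse mismatched inputs instead of silently dropping or misaligning values.
    if len(col) != len(mask):
        raise ValueError("col and mask must have the same length")
    # Keep each value whose paired mask entry is True.
    return [value for value, keep in zip(col, mask) if keep]


highs: list[float] = [77, 84, 78, 79, 65, 67, 74, 61, 55, 61]
no_rain_mask: list[bool] = [False, True, False, False, True, True, False, False, True, True]
print(masked_zip(highs, no_rain_mask))  # [84, 65, 67, 55, 61]
```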

## Compute the Average

In [107]:
def mean(col: list[float]) -> float:
    return sum(col) / len(col)

mean(highs_of_no_rain_days)

# OR

# compute the average with an explicit loop:

# def mean_avg(list_of_floats: list[float]) -> float:
#     total_of_floats: float = 0
#     for element in list_of_floats:
#         total_of_floats += element
#     mean_calculation = total_of_floats / len(list_of_floats)
#     return mean_calculation

# avg_temp_no_rain_days = mean_avg(highs_of_no_rain_days)
# avg_temp_no_rain_days


66.4
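`mean` divides by `len(col)`, so it raises `ZeroDivisionError` when a mask selects no days (for example, if no day had a chance of rain below the threshold). One way to handle that case, sketched below under the assumption that returning `None` is an acceptable signal for "no data", is a hypothetical `safe_mean`:

```python
from typing import Optional


def safe_mean(col: list[float]) -> Optional[float]:
    # An empty selection has no average; signal that with None instead of crashing.
    if len(col) == 0:
        return None
    return sum(col) / len(col)


print(safe_mean([84, 65, 67, 55, 61]))  # 66.4
print(safe_mean([]))  # None
```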

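To select the opposite set of days, we can invert a mask. Here we find the average low on days whose high is 80 degrees or more, the inverse of `less_than(col_data["high"], 80)`:

In [108]:
def not_mask(mask: list[bool]) -> list[bool]:
    result: list[bool] = []
    for item in mask:
        result.append(not item)  # flip each True to False and each False to True
    return result

mask_a: list[bool] = less_than(col_data["high"], 80)  # days with a high below 80
mask_b: list[bool] = not_mask(mask_a)  # days with a high of 80 or more

values: list[float] = masked(col_data["low"], mask_b)
print(mean(values))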

51.0
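Masks can also be combined element by element. As a sketch, the hypothetical helper `and_masks` (not part of the lesson) keeps only the days that pass both criteria: unlikely to rain and a low below 50. Compact versions of `less_than` and `masked` are repeated so the block runs on its own; they behave the same as the originals for equal-length inputs.

```python
def less_than(col: list[float], threshold: float) -> list[bool]:
    return [item < threshold for item in col]


def masked(col: list[float], mask: list[bool]) -> list[float]:
    return [value for value, keep in zip(col, mask) if keep]


def and_masks(mask_a: list[bool], mask_b: list[bool]) -> list[bool]:
    # True only where both input masks are True at the same index.
    result: list[bool] = []
    for a, b in zip(mask_a, mask_b):
        result.append(a and b)
    return result


col_data: dict[str, list[float]] = {
    "high": [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
    "low": [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
    "rain": [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1],
}

dry_and_cool: list[bool] = and_masks(
    less_than(col_data["rain"], 0.3),
    less_than(col_data["low"], 50),
)
print(dry_and_cool)  # [False, False, False, False, True, False, False, False, True, True]
print(masked(col_data["high"], dry_and_cool))  # [65, 55, 61]
```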


## With these helper functions, we can perform many analyses!

In [109]:
# What is the average likelihood of precipitation when the day's low is less than 50 degrees?

# Produce mask.

mask_day_low_less_50: list[bool] = less_than(col_data["low"], 50)
print(mask_day_low_less_50)

list_rain_on_days_less_50: list[float] = masked(col_data["rain"], mask_day_low_less_50)
print(list_rain_on_days_less_50)

mean_rain_days_less_50: float = mean(list_rain_on_days_less_50)
print(mean_rain_days_less_50)

[False, False, False, True, True, False, False, True, True, True]
[0.8, 0.0, 0.5, 0.1, 0.1]
0.30000000000000004
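The result prints as `0.30000000000000004` rather than `0.3` because values such as 0.1 have no exact binary floating-point representation, so small rounding errors accumulate in `sum`. For display, one option (a sketch, not something the lesson requires) is the built-in `round`; alternatively, the standard library's `math.fsum` computes a correctly rounded sum, which avoids the accumulated error for this data:

```python
import math

rain_on_cool_days: list[float] = [0.8, 0.0, 0.5, 0.1, 0.1]

# Plain sum, then round only for presentation.
plain_mean: float = sum(rain_on_cool_days) / len(rain_on_cool_days)
print(round(plain_mean, 2))  # 0.3

# math.fsum tracks exact partial sums before rounding once at the end.
precise_mean: float = math.fsum(rain_on_cool_days) / len(rain_on_cool_days)
print(precise_mean)  # 0.3
```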
