# Down and outlier

Imagine you're a public safety official trying to reduce the number of injuries due to car crashes. You want to know how frequently car crashes result in injuries. You've already made some predictions, but you want to see where your predictions are going wrong. You'll do some counting and filtering to address any outliers you find.

Four reminders:

- Recall that the UNIQUE() function returns a list of unique values for a given row or column. This will help you summarize the data.
- The SORT() function can be used to sort the values of a given range/array (in ascending order by default).
- The COUNTIF() function is helpful for counting cells that meet certain criteria.
- Remember that absolute deviation refers to the absolute value of the difference between your prediction and the actual value.
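
If it helps to see the same ideas outside a spreadsheet, here is a rough Python sketch of the four reminders. The sample values and variable names are made up for illustration and are not the exercise data.

```python
# Hypothetical sample: vehicles involved in each crash (not the exercise data).
vehicles = [2, 1, 2, 3, 2, 1, 5]

# UNIQUE() followed by SORT(): distinct values in ascending order.
unique_sorted = sorted(set(vehicles))

# COUNTIF(range, value): how many entries equal each distinct value.
counts = {value: vehicles.count(value) for value in unique_sorted}

# Absolute deviation: the size of the gap between prediction and actual value.
predicted, actual = 1.5, 3
absolute_deviation = abs(predicted - actual)

print(unique_sorted)       # [1, 2, 3, 5]
print(counts)              # {1: 2, 2: 3, 3: 1, 5: 1}
print(absolute_deviation)  # 1.5
```

A value that shows up only once and sits far from the rest, like the 5 here, is the kind of row you would flag as a possible outlier.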

- See 03.02

# No filter

Now that you've summarized the data and found outliers, you may want to remove them to see if that has an effect on the accuracy of your predictions.

Recall that we can measure prediction accuracy using the average absolute deviation. This indicates how much your predictions differ from your actual values.
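
As a minimal sketch of that comparison, the Python below computes the average absolute deviation with and without an outlier. The rows and the cutoff of 5 are assumptions chosen for illustration only.

```python
from statistics import mean

# Hypothetical (predicted, actual) injury counts; the last row is an outlier.
rows = [(1.0, 1), (2.0, 2), (1.0, 2), (2.0, 1), (3.0, 11)]

def average_absolute_deviation(pairs):
    """Mean of |predicted - actual|, like AVERAGE() over an ABS() column."""
    return mean(abs(predicted - actual) for predicted, actual in pairs)

# Keep only rows whose actual value is at or below an assumed cutoff of 5.
filtered = [(p, a) for p, a in rows if a <= 5]

aad_all = average_absolute_deviation(rows)
aad_filtered = average_absolute_deviation(filtered)
print(aad_all, aad_filtered)  # 2.0 0.5
```

In this toy data a single extreme row accounts for most of the average error, which is why filtering changes the result so much.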

- See 03.03

# Addressing outliers

We saw that one way to counter the impact of outliers is to remove them from our analysis. This makes our predictions more accurate, but it also might exclude meaningful variation from our data.

An alternative is to explore the outliers and determine why they vary so much from the norm. Now you'll take a look at the outliers where your predictions are least accurate to try to spot patterns.
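
One way to picture this step is to rank rows by absolute deviation and read off the worst ones. The sketch below uses made-up rows; the column layout is an assumption, not the exercise sheet.

```python
# Hypothetical crashes: (vehicles involved, predicted injuries, actual injuries).
crashes = [(1, 1.0, 1), (2, 1.5, 2), (4, 2.0, 6), (1, 1.0, 0), (5, 2.5, 9)]

# Sort by absolute deviation, largest first, like SORT() on a helper column.
by_error = sorted(crashes, key=lambda c: abs(c[1] - c[2]), reverse=True)

# Inspect the two worst predictions and look for a shared trait.
worst = by_error[:2]
print(worst)  # [(5, 2.5, 9), (4, 2.0, 6)]
```

In this invented sample, both of the worst predictions involve four or more vehicles, which is the sort of pattern you would look for in the real data.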

- See 03.04

# Can't start a fire without a spark(line)

So you've made some predictions, you've checked their accuracy, and you see that crashes with multiple vehicles tend to cause the most injuries. This isn't all that surprising. Since you're charged with public safety, your task is to see how to use your resources to make the roads safer without spending all day creating dashboards.

This is a great opportunity to use sparklines to compare data quickly. As a reminder, the SPARKLINE() function takes the data (in this case, a cell) as its first argument and the sparkline options as its second. The options are enclosed in a set of curly braces ({}), with a comma (,) separating each key from its setting and a semicolon (;) separating individual options. Thus, your options argument should take the following form:

`{"option1", option1setting; "option2", option2setting}`
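
To make the punctuation concrete, here is a small Python sketch that assembles an options argument from key and setting pairs. The helper name `sparkline_options` and the cell `B2` are hypothetical, and the two options shown (`"charttype"` and `"max"`) are just one plausible choice for a bar sparkline.

```python
def sparkline_options(options):
    """Format (key, setting) pairs as a Sheets-style options literal."""
    parts = []
    for key, setting in options:
        # Text settings are quoted; numeric settings are written bare.
        value = f'"{setting}"' if isinstance(setting, str) else str(setting)
        parts.append(f'"{key}", {value}')
    # Commas separate key from setting; semicolons separate options.
    return "{" + "; ".join(parts) + "}"

options = sparkline_options([("charttype", "bar"), ("max", 10)])
formula = f"=SPARKLINE(B2, {options})"
print(formula)  # =SPARKLINE(B2, {"charttype", "bar"; "max", 10})
```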

- See 03.06

# The max matters

Now you have an easy visual of how many injuries crashes cause in each precinct, which can help you use your resources more effectively. But you might also want to know how frequently car crashes result in injury in different precincts. This might tell you whether lowering the speed limit in an area would save lives.

Now you'll add sparklines for the proportion of accidents that cause injuries. You'll set the limits of the scale at 0 and 1 so that you make absolute rather than relative comparisons. This will illustrate the impact of context when interpreting sparklines.

The frequency of car crashes causing injury is given in cells J2:J10 as a starting point for your sparklines.
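
The effect of fixing the scale can be sketched with plain text bars. This is only an illustration of the idea: the proportions are invented, and `text_bar` is a hypothetical helper rather than anything the spreadsheet provides.

```python
def text_bar(value, scale_max, width=20):
    """Draw a bar whose filled length is value / scale_max of the width."""
    filled = round(width * value / scale_max)
    return "#" * filled + "." * (width - filled)

# Hypothetical injury proportions for three precincts.
proportions = [0.20, 0.25, 0.30]

# Relative scale: the largest value fills the bar, so gaps look large.
relative = [text_bar(p, max(proportions)) for p in proportions]

# Absolute scale: the maximum is fixed at 1, so the bars look similar.
absolute = [text_bar(p, 1) for p in proportions]

for rel, ab in zip(relative, absolute):
    print(rel, ab)
```

On the relative scale the three bars fill 13, 17, and 20 of 20 cells. On the 0 to 1 scale they fill only 4, 5, and 6, so the same data looks far less dramatic.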

- See 03.07

# There are consequences

Another way to think about reducing car crashes and accidents is to quantify risk in an attempt to minimize it. Recall that we define the risk of an event as the probability of it occurring multiplied by the severity of its consequences. In this exercise you'll compute the consequences of car crashes, as well as the probability that they'll occur.

Remember that we can use the FORECAST() function in this instance to calculate the consequences of the crash (i.e., estimate the probability that the crash will cause injuries).
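
For intuition, FORECAST() fits a straight line through known data and reads a prediction off that line. The Python sketch below does the same with a least-squares fit and then multiplies the result by a probability to get risk. The history, the probability of 0.1, and the argument order of the hypothetical `forecast` helper are assumptions for illustration.

```python
def forecast(x, known_y, known_x):
    """Least-squares linear prediction at x from known y and x values."""
    n = len(known_x)
    mean_x = sum(known_x) / n
    mean_y = sum(known_y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y)
                     for xi, yi in zip(known_x, known_y))
    variance = sum((xi - mean_x) ** 2 for xi in known_x)
    slope = covariance / variance
    return mean_y + slope * (x - mean_x)

# Hypothetical history: vehicles involved (x) and injuries (y) per crash.
vehicles = [1, 2, 3, 4]
injuries = [0, 1, 1, 2]

consequence = forecast(5, injuries, vehicles)  # estimate for a 5-vehicle crash
probability = 0.1                              # assumed chance of such a crash
risk = probability * consequence
print(consequence, risk)
```

With this toy history the fitted slope is 0.6 injuries per extra vehicle, so the estimate for five vehicles comes out at about 2.5 and the risk at about 0.25.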

- See 03.09

# A likely story

You have estimated the severity of the consequences, but you still need to calculate the likelihood of the event occurring to measure risk. This will help you make decisions about which precinct is the riskiest place to drive (and consequently which one could use more support).

To calculate how likely crashes are in each precinct, you'll want to divide the number of crash-related injuries in the precinct by the number of vehicles involved in crashes there. After calculating likelihood and consequences, you can use the matrix described in the video to calculate total risk and determine which precinct needs the most help.

Remember that the SUMIF() function is useful for summing up only the events that meet certain criteria. Remember to specify the range to check, the criterion, and the range to sum.
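
The three SUMIF() arguments map onto a short Python sketch like the one below. The helper `sumif`, the precinct name "Harbor", and all of the numbers are hypothetical; only the division of injuries by vehicles follows the description above.

```python
def sumif(check_range, criterion, sum_range):
    """Sum sum_range entries whose matching check_range entry equals criterion."""
    return sum(s for c, s in zip(check_range, sum_range) if c == criterion)

# Hypothetical rows: precinct, injuries, and vehicles involved per crash.
precincts = ["Midtown", "Harbor", "Midtown", "Harbor", "Midtown"]
injuries = [1, 0, 2, 1, 0]
vehicles = [2, 1, 3, 2, 1]

# Likelihood per precinct: total injuries divided by total vehicles involved.
likelihood = {
    p: sumif(precincts, p, injuries) / sumif(precincts, p, vehicles)
    for p in sorted(set(precincts))
}
print(likelihood["Midtown"])  # 0.5
```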

- See 03.10

# Risky business

Now you know the risk associated with each event. Great job! But how can you use this information to make decisions to improve public safety?

To get a sense of the risk in each precinct, add together all the risk values you calculated for each precinct. This will help you make an informed decision about which precinct is the riskiest and where you want to allocate your resources.
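
Adding up risk by precinct and picking the largest total could look like this in Python. The risk values and the precinct name "Harbor" are invented for the sketch.

```python
from collections import defaultdict

# Hypothetical (precinct, risk value) pairs, one per event.
event_risks = [("Midtown", 4), ("Harbor", 2), ("Midtown", 6), ("Harbor", 3)]

# Add together every risk value that belongs to the same precinct.
total_risk = defaultdict(int)
for precinct, risk in event_risks:
    total_risk[precinct] += risk

# The precinct with the largest total is the candidate for extra resources.
riskiest = max(total_risk, key=total_risk.get)
print(dict(total_risk), riskiest)  # {'Midtown': 10, 'Harbor': 5} Midtown
```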

Note that as you do this, sparklines will start to populate column J to visualize your calculations.

- See 03.11

# Random numbers

Imagine you've presented your recommendations to your supervisor, and she is nearly convinced. As a final step, she asks you to go back and run a few simulations under different conditions and slightly different data. This is a perfect case for using random numbers as we noted in the video. Remember that by multiplying the actual data by some random factor, we can vary the data slightly and test our predictions under slightly different conditions.
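
As a rough sketch of the idea, the Python below draws one random factor per value and multiplies it into the data. The risk values, the fixed seed, and the range of -0.1 to 0.1 are assumptions. The exercise may use a different range or distribution.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical risk values for four precincts.
risks = [10.0, 5.0, 8.0, 12.0]

# One multiplier per value, drawn uniformly from an assumed range.
multipliers = [random.uniform(-0.1, 0.1) for _ in risks]

# Multiplying the data by the random factor gives the size of each nudge.
nudges = [r * m for r, m in zip(risks, multipliers)]
print(nudges)
```

Each nudge is at most 10% of the original value under these assumptions, so the simulated data stays close to the real data.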

- See 03.13

# How random

Introducing variation will help you model your predictions under other conditions and with other data. But you'll want to know how much variation you are introducing into the data. Recall that we can summarize data with descriptive statistics such as the mean and standard deviation using the AVERAGE() and STDEV() functions, respectively.

Keep in mind that positive and negative random numbers tend to cancel out, so their average will be close to 0. As such, you will also want to know how much absolute variation exists. This is a good opportunity to use the ABS() function to summarize absolute values.
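
The cancellation effect is easy to see with a small made-up set of multipliers. In this Python sketch, `stdev` is the sample standard deviation, which is also what STDEV() computes.

```python
from statistics import mean, stdev

# Hypothetical multipliers, balanced around zero.
multipliers = [-0.2, 0.1, -0.1, 0.2]

# Like AVERAGE(): positives and negatives cancel out.
average = mean(multipliers)

# Like STDEV(): sample standard deviation of the multipliers.
spread = stdev(multipliers)

# Like AVERAGE() over ABS() values: typical size of a change, ignoring sign.
average_absolute = mean(abs(m) for m in multipliers)

print(average, spread, average_absolute)
```

Here the plain average is 0 even though every multiplier changes the data, while the average absolute value is about 0.15.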

- See 03.14

# Be fruitful and multiply

Now that you have multipliers, you can adjust your data. To do so, you'll multiply your original risk by the random number you created, and then add that product to your original risk. Keep in mind that you want to remove negative numbers so that you don't skew estimates of risk.

As a reminder, the MAX() function returns the highest value in a set of values (such as a range or array). You can use this to set a minimum value to your calculation.

For example, `MAX(0, (-2*1))` will return 0, whereas `MAX(0, (2*1))` will return 2.
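
Put together, the adjustment and the floor at zero can be sketched in Python as follows. The function name `adjust_risk` and the sample inputs are hypothetical.

```python
def adjust_risk(risk, multiplier):
    """Original risk plus risk times multiplier, floored at 0 like MAX(0, ...)."""
    return max(0, risk + risk * multiplier)

print(adjust_risk(10, 0.5))   # 15.0
print(adjust_risk(10, -1.5))  # 0
```

The second call would give -5 without the floor, and the `max` with 0 replaces it with 0.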

- See 03.15

# Revisiting sparklines

Now you can compare the noisy data you generated with the original predictions you made. Keep in mind what the videos said about framing risk to affect behavior.

Given framing effects, which of the following would be the most effective way to persuade your manager of the risks of spending money to reduce traffic accidents?

- If we invest to improve safety in Midtown, we are certain to save $250,000. (See 03.16)