# CrunchieMunchies

You work in marketing for a food company, YummyCorps, which is developing a new kind of tasty, wholesome cereal called CrunchieMunchies. You want to demonstrate to consumers how healthy your cereal is in comparison to other leading brands, so you've dug up nutritional data on several different competitors.

Your task is to use NumPy statistical calculations to analyze this data and prove that your CrunchieMunchies cereal is the healthiest choice for consumers.

Look over the cereal.csv file. This file contains the reported calorie amounts for different cereal brands. Load the data from the file and save it as calorie_stats. 

In [38]:
import numpy as np
calorie_stats = np.genfromtxt('cereal.csv', delimiter=',')
calorie_stats

array([ 70., 120.,  70.,  50., 110., 110., 110., 130.,  90.,  90., 120.,
       110., 120., 110., 110., 110., 100., 110., 110., 110., 100., 110.,
       100., 100., 110., 110., 100., 120., 120., 110., 100., 110., 100.,
       110., 120., 120., 110., 110., 110., 140., 110., 100., 110., 100.,
       150., 150., 160., 100., 120., 140.,  90., 130., 120., 100.,  50.,
        50., 100., 100., 120., 100.,  90., 110., 110.,  80.,  90.,  90.,
       110., 110.,  90., 110., 140., 100., 110., 110., 100., 100., 110.])

There are 60 calories per serving of CrunchieMunchies. How much higher is the average calorie count of your competition?

Save the answer to the variable average_calories and print the variable to the terminal to see the answer.

In [5]:
average_calories = np.mean(calorie_stats)
average_calories - 60 # CrunchieMunchies is 46.9 calories under the average

46.883116883116884

Does the average calorie count adequately reflect the distribution of the dataset? Let's sort the data and see.

Sort the data and save the result to the variable calorie_stats_sorted. Print the sorted data to the terminal.

In [7]:
calorie_stats_sorted = np.sort(calorie_stats)
calorie_stats_sorted

array([ 50.,  50.,  50.,  70.,  70.,  80.,  90.,  90.,  90.,  90.,  90.,
        90.,  90., 100., 100., 100., 100., 100., 100., 100., 100., 100.,
       100., 100., 100., 100., 100., 100., 100., 100., 110., 110., 110.,
       110., 110., 110., 110., 110., 110., 110., 110., 110., 110., 110.,
       110., 110., 110., 110., 110., 110., 110., 110., 110., 110., 110.,
       110., 110., 110., 110., 120., 120., 120., 120., 120., 120., 120.,
       120., 120., 120., 130., 130., 140., 140., 140., 150., 150., 160.])

Do you see what I'm seeing? Looks like the majority of the cereals are higher than the mean. Let's see if the median is a better representative of the dataset.

Calculate the median of the dataset and save your answer to median_calories. Print the median so you can see how it compares to the mean.

In [16]:
median_calories = np.median(calorie_stats)
print('median:', median_calories, '- mean:', np.round(np.mean(calorie_stats), 1))

median: 110.0 - mean: 106.9
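
The observation that most cereals sit above the mean can be checked directly rather than by eye. The following is a minimal sketch, assuming the dataset is rebuilt from the sorted output shown earlier instead of being loaded from cereal.csv; the names `values`, `counts`, and `share_above_mean` are illustrative and not part of the original exercise.

```python
import numpy as np

# Stand-in for loading cereal.csv: each distinct calorie value and how often
# it appears in the sorted output shown earlier (77 cereals in total).
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(np.array(values, dtype=float), counts)

mean_calories = np.mean(calorie_stats)
median_calories = np.median(calorie_stats)

# Averaging a boolean mask gives the fraction of entries that are True.
share_above_mean = np.mean(calorie_stats > mean_calories)

print(f"mean: {mean_calories:.1f}, median: {median_calories:.1f}")
print(f"share of cereals above the mean: {share_above_mean:.0%}")
```

A handful of very low values (50 to 80 calories) pull the mean below the median, which is why more than half of the cereals end up above the mean.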


While the median demonstrates that at least half of our values are over 100 calories, it would be more impressive to show that a significant portion of the competition has a higher calorie count than CrunchieMunchies.

Calculate different percentiles and print them to the terminal until you find the lowest percentile whose value is greater than 60 calories. Save this value to the variable nth_percentile.

In [29]:
nth_percentile = np.percentile(calorie_stats, 4)
nth_percentile

70.0
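
Instead of trying percentiles by hand, the search can be written as a loop. This is a sketch under two assumptions: only whole-number percentiles are tried, and the data is rebuilt from the sorted output shown earlier rather than read from cereal.csv. The names `crunchie_calories` and `lowest_percentile` are hypothetical helpers, not part of the exercise.

```python
import numpy as np

# Stand-in for loading cereal.csv, rebuilt from the sorted output above.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(np.array(values, dtype=float), counts)

crunchie_calories = 60  # calories per serving of CrunchieMunchies

lowest_percentile = None
nth_percentile = None
# Walk up through whole-number percentiles and stop at the first one
# whose value exceeds the CrunchieMunchies calorie count.
for p in range(1, 101):
    value = np.percentile(calorie_stats, p)
    if value > crunchie_calories:
        lowest_percentile, nth_percentile = p, value
        break

print(lowest_percentile, nth_percentile)
```

With NumPy's default linear interpolation, the 3rd percentile of this data still falls below 60, so the loop stops at the 4th.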

While the percentile shows us that the majority of the competition has a much higher calorie count, it's an awkward concept to use in marketing materials.

Instead, let's calculate the percentage of cereals that have more than 60 calories per serving. Save your answer to the variable more_calories and print it to the terminal.

In [32]:
more_calories = np.mean(calorie_stats > 60)
more_calories #=> 96% of other cereals have more calories

0.961038961038961
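
Taking `np.mean` of a boolean array works because `True` counts as 1 and `False` as 0, so the mean is the fraction of cereals that pass the comparison. A small sketch of the same calculation done by explicit counting, assuming the data is rebuilt from the sorted output shown earlier (the name `more_count` is illustrative):

```python
import numpy as np

# Stand-in for loading cereal.csv, rebuilt from the sorted output above.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(np.array(values, dtype=float), counts)

# Count the cereals above 60 calories, then divide by the total.
more_count = np.count_nonzero(calorie_stats > 60)
more_calories = more_count / calorie_stats.size

# The explicit count agrees with the boolean-mean shortcut.
assert np.isclose(more_calories, np.mean(calorie_stats > 60))

print(f"{more_count} of {calorie_stats.size} cereals ({more_calories:.1%}) "
      "have more than 60 calories per serving")
```

Reporting the raw count alongside the fraction gives marketing a plainer statement than the percentile does.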

Wow! That's a really high percentage. That's going to be very useful when we promote CrunchieMunchies. One question remains: how much variation exists in the dataset? Can we make the generalization that most cereals have around 100 calories, or is the spread even greater?

Calculate the amount of variation by finding the standard deviation. Save your answer to calorie_std and print to the terminal. How can we incorporate this value into our analysis?

In [34]:
calorie_std = np.std(calorie_stats)
calorie_std

19.35718533390827
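
One possible way to incorporate the standard deviation is to express the CrunchieMunchies calorie count as a distance from the mean in units of standard deviation, and to check how much of the competition lies within one standard deviation of the mean. This is a sketch of that idea, not a step the exercise prescribes; the data is rebuilt from the sorted output shown earlier, and `z_score` and `within_one_std` are hypothetical names. Note that `np.std` defaults to the population standard deviation (`ddof=0`).

```python
import numpy as np

# Stand-in for loading cereal.csv, rebuilt from the sorted output above.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(np.array(values, dtype=float), counts)

average_calories = np.mean(calorie_stats)
calorie_std = np.std(calorie_stats)  # population standard deviation (ddof=0)

# How many standard deviations CrunchieMunchies (60 calories) sits from the mean.
z_score = (60 - average_calories) / calorie_std

# Fraction of cereals within one standard deviation of the mean.
within_one_std = np.mean(np.abs(calorie_stats - average_calories) <= calorie_std)

print(f"z-score of CrunchieMunchies: {z_score:.2f}")
print(f"share within one standard deviation: {within_one_std:.0%}")
```

A z-score well below -2 together with a large share of cereals clustered near the mean would support the claim that CrunchieMunchies is unusually low in calories for this dataset.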

Write a short paragraph that sums up your findings and how you think this data could be used to YummyCorps' advantage when marketing CrunchieMunchies.

96% of other cereals have more calories than CrunchieMunchies. The median calorie count is 110 and the mean is 106.9, with a standard deviation of 19.4. At 60 calories per serving, CrunchieMunchies falls below the 4th percentile (70 calories) of the breakfast cereals in the dataset. Of 77 breakfast cereals, only three have fewer calories than CrunchieMunchies.