# Overview of sugar content of menu items at various restaurants

This project fundamentally relies on data from the [Nutritionix API](http://www.nutritionix.com/api). I am very grateful for the use of their data.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import requests
import json
import os
import minimum_sugar

# Load credential data from file
with open("credentials.json", "r") as f:
    credentials = json.load(f)
    
# Load menu data from file
with open("menu_data.json", "r") as f:
    menu_data = json.load(f)

## Restaurant IDs
Nutritionix identifies restaurants by a unique number, but from what I can tell they do not list those numbers. The code in this section creates a mapping between the names of restaurants I frequent and Nutritionix's restaurant ID.

In [2]:
restaurant_names = ["Wendy's",
                    "McDonald's",
                    "Five Guys",
                    "Qdoba",
                    "Taco Bell",
                    "Chipotle",]

In [None]:
restaurant_ids = [minimum_sugar.fetch_restaurant_id(name, credentials) for name in restaurant_names]

In [3]:
# Save the data to a file for reading at a later date.
# with open("restaurant_ids.json", "w") as f:
#     f.write(json.dumps(restaurant_ids, indent=4, separators=(',', ': ')))

## Restaurant menus
Download restaurant menu nutrition data.

In [4]:
# Fetch menu data and add to each dict in `restaurant_ids`
# for restaurant in restaurant_ids:
#     restaurant["menu"] = minimum_sugar.fetch_menu_item_data(restaurant["id"], credentials)

In [10]:
# A list comprehension would work here, but I think code readability would suffer.
menu_data = []
for restaurant in restaurant_ids:
    menu_data.extend(minimum_sugar.fetch_menu_item_data(restaurant["id"], credentials))

In [20]:
# Write the data to a file for future analysis
# with open("menu_data.json", "w") as f:
#     f.write(json.dumps(menu_data, indent=4, separators=(',', ': ')))

## Categorizing menu items
It turns out that the menu items of these various restaurants are not categorized according to any universal scheme (e.g. beverage, condiment, etc.). Moreover, some restaurants (like McDonald's) list beverages on their menu whereas others (e.g. Chipotle) do not. Beverages really skew the sugar content of the histogram, and the data I really wanted was simply the sugar content of the entree items.

I ended up categorizing the menu items partially by hand. Some of the intermediate details can be found in `menu_item_categorization.ipynb`. The goal was to categorize menu items according to the following categories:

* beverage
* dessert
* condiment
* side order
* entree

The overall workflow was to first create a list of `item_name`s from the list of menu item data. The `item_names` list was written to a file (`item_names.dat`) with one `item_name` per line. I started with the "beverage" category and deleted every line of `item_names.dat` that looked like a drink. I then used the remainder and the original `item_names` list to create Python `set` objects and recover the list of beverage item names. The following code illustrates the process (it is not exactly what I ran, but it captures the essence):

```python
# Remove beverage item names from the `item_names.dat` file by hand.

# Bring the result back into memory.
with open("item_names.dat", "r") as f:
    remainder = [line.strip() for line in f]

item_names_set = set(item_names)
remainder_set = set(remainder)

beverage_item_names_set = item_names_set - remainder_set
beverage_item_names = list(beverage_item_names_set)
beverage_item_names.sort()

with open("beverage_item_names.json", "w") as f:
    f.write(json.dumps(beverage_item_names, indent=4, separators=(',', ': ')))
```

I repeated this process of elimination to generate lists of menu items according to each category.
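The repeated elimination can be sketched as a small helper. This is only an illustration with made-up item names; `split_off_category` is a hypothetical function and is not part of the `minimum_sugar` module.

```python
def split_off_category(item_names, remainder):
    """Return (removed, remainder): `removed` holds the names deleted by hand."""
    removed = sorted(set(item_names) - set(remainder))
    return removed, sorted(set(remainder))


# Toy data standing in for the real `item_names` list.
item_names = ["Cola", "Cheeseburger", "Vanilla Shake", "Fries", "Iced Tea"]

# Pretend the drinks were deleted from `item_names.dat` by hand.
after_beverages = ["Cheeseburger", "Vanilla Shake", "Fries"]
beverages, rest = split_off_category(item_names, after_beverages)

# The next pass starts from what is left, so categories never overlap.
after_desserts = ["Cheeseburger", "Fries"]
desserts, rest = split_off_category(rest, after_desserts)

print(beverages)
print(desserts)
print(rest)
```

Because each pass starts from the previous remainder, an item can land in at most one category.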

## Update the "database" with `menu_category` field and check results
Once this process was completed, I had several files of item names; each file contained the items of a particular category. For maximum utility, these categories should be included in the makeshift database in `menu_data.json`. Once the `menu_category` field is populated for each item in the `menu_data.json` list, I can write some simple code to check my categorization, broken down by each restaurant.

The code to check the categorization and update `menu_data.json` is as follows:

```python
categories = ["beverage",
    "dessert",
    "condiment",
    "side",
    "entree",]

categorized_item_names = {}

for category in categories:
    filename = category + "_item_names.json"
    with open(filename, "r") as f:
        item_names = json.load(f)

    categorized_items = {item_name: category for item_name in item_names}
    categorized_item_names.update(categorized_items)
    
# Check that items weren't dropped during categorization.
# The best way to do this check is to create two sets:
# One set of all of the menu item names in `menu_data`.
# The other set from the keys of `categorized_item_names`.
# I can then use set operations to compare these two sets.

assert set(minimum_sugar.extract_variable(menu_data, "item_name")) == set(categorized_item_names.keys())

# Categorize each menu item in `menu_data` by adding 
# the "menu_category" field and data from above.
for item in menu_data:
    # I could fold the following line into the set operation on the `item` dict, 
    # but I'm separating the operations for the sake of code readability.
    item_name = item["item_name"]
    item["menu_category"] = categorized_item_names[item_name]
    
# Reformat data into a dict so I'm not viewing superfluous data and so that I can categorize
# according to restaurant and menu category.
categorized_item_names = {}

for restaurant_name in restaurant_names:
    restaurant_menu_data = minimum_sugar.filter_menu_items(menu_data, "brand_name", restaurant_name)

    subdict = {}
    for category in categories:
        restaurant_categorized_menu_items = minimum_sugar.filter_menu_items(restaurant_menu_data, "menu_category", category)
        restr_categ_menu_item_names = minimum_sugar.extract_variable(restaurant_categorized_menu_items, "item_name")
        restr_categ_menu_item_names.sort()

        subdict[category] = restr_categ_menu_item_names

    categorized_item_names[restaurant_name] = subdict
    
# Print the result to check categorization
print(json.dumps(categorized_item_names, indent=4, separators=(',', ': ')))

# Everything is categorized properly, save the result.
with open("menu_data.json", "w") as f:
    f.write(json.dumps(menu_data, indent=4, separators=(',', ': ')))
```
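When the set comparison fails, a bare `assert` does not say which names are at fault. The following sketch reports the two set differences instead. It uses toy data, and `categorization_diff` is a hypothetical helper, not something from `minimum_sugar`.

```python
def categorization_diff(menu_item_names, categorized_names):
    """Compare the two name collections and return (uncategorized, unknown)."""
    all_names = set(menu_item_names)
    categorized = set(categorized_names)
    uncategorized = sorted(all_names - categorized)  # in the menu, no category
    unknown = sorted(categorized - all_names)        # categorized, not in the menu
    return uncategorized, unknown


# Toy data: "Fries" was never categorized, "Frise" is a typo made by hand.
menu_item_names = ["Cola", "Cheeseburger", "Fries"]
categorized = {"Cola": "beverage", "Cheeseburger": "entree", "Frise": "side"}

uncategorized, unknown = categorization_diff(menu_item_names, categorized)
print("Missing a category:", uncategorized)
print("Not in menu data:", unknown)
```

Both lists are empty exactly when the original `assert` would pass.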

## Visualize results

In [4]:
entree_items = minimum_sugar.filter_menu_items(menu_data, "menu_category", "entree")
x_max = max(minimum_sugar.extract_variable(entree_items, "nf_sugars"))

# for restaurant_name in restaurant_names:
#     restaurant_menu_items = minimum_sugar.filter_menu_items(menu_data, "brand_name", restaurant_name)
#     restaurant_entree_items = minimum_sugar.filter_menu_items(restaurant_menu_items, "menu_category", "entree")
#     f = minimum_sugar.menu_histogram(restaurant_entree_items, "nf_sugars", title=restaurant_name, param_name="Sugar [g]")
#     ax = f.axes[0]
#     ax.set_xlim([0, x_max])
#     plt.show()

[The American Heart Association recommends](http://www.heart.org/HEARTORG/GettingHealthy/NutritionCenter/HealthyEating/Frequently-Asked-Questions-About-Sugar_UCM_306725_Article.jsp) that women consume no more than 100 calories per day of added sugars and men consume no more than 150 calories per day of added sugars. These values work out to about (VALUE)g for women and (VALUE)g for men.
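A rough way to convert those calorie limits to grams is sketched below. It assumes the common approximation of 4 calories per gram of sugar; that factor is an assumption on my part and is not taken from the AHA page.

```python
# Assumed general energy density for sugar (a carbohydrate), in calories per gram.
CALORIES_PER_GRAM_SUGAR = 4.0


def calories_to_grams(calories):
    """Convert a calorie budget for added sugar into grams of sugar."""
    return calories / CALORIES_PER_GRAM_SUGAR


for label, limit in [("women", 100), ("men", 150)]:
    print(label, calories_to_grams(limit), "g")
```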

It's also very important to note that the American Heart Association makes a distinction between naturally occurring sugars and added sugar:

> **Are all sugars bad?**
>
> No, but added sugars add calories and zero nutrients to food. Adding a limited amount of sugars to foods that provide important nutrients—such as whole-grain cereal, flavored milk or yogurt—to improve their taste, especially for children, is a better use of added sugars than nutrient-poor, highly sweetened foods.

Unfortunately, if you are someone who wants to limit their added/refined sugar consumption, there is no US federal regulation requiring manufacturers to include information about the amount of added vs. naturally occurring sugar in a food product. From what I've read, one can get an idea of the source of sugar in a food product by reading the list of ingredients; if the word "sugar" appears in the list, chances are that sugar is refined. For example, [a giant 32oz container of Dannon plain yogurt](http://www.amazon.com/Dannon-Natural-Quart-Plain-Yogurt/dp/B00RASDV2E/) contains 12g of sugar per serving, which seems like a lot in light of the American Heart Association recommendations above. However, this yogurt is made only from milk and yogurt cultures. Thus, this food product contains no added sugar.


## Analysis
The six restaurants I frequently patronize fall into two main categories: burger places and Mexican. They are a mix of legacy fast-food places, namely McDonald's, Wendy's, and (to perhaps a lesser extent) Taco Bell, and upstart fast-casual places: Five Guys, Chipotle, and Qdoba.

I first wanted to get a sense of the sugar in each restaurant's menu. I downloaded the nutrition information of each restaurant's entire menu using the [Nutritionix API](http://www.nutritionix.com/api) and attempted to plot a histogram, but I soon found that each restaurant's menu didn't yield an apples-to-apples comparison. Some restaurants included beverages, condiments, etc. while others did not. Therefore, I categorized the menu items of each restaurant by adding category data to each database element. The categories I used were beverage, dessert, condiment, side, and entree. Note that I did not categorize menu items according to meal, i.e. breakfast, lunch, and dinner. Once the items were categorized, I was able to plot histograms of the entree menu items for each restaurant; those histograms are given below.

(FIGURES)

Perhaps not surprisingly, the individual menu items featuring the maximum amount of sugar are offered by Wendy's and McDonald's. Wendy's is the winner in this category with a healthy-sounding menu item: "[Steel-Cut Oatmeal with Cranberries and Pecans](http://www.nutritionix.com/i//steel-cut-oatmeal-with-cranberries-and-pecans/ae4920268b167116cadff337)". I wanted to give Wendy's the benefit of the doubt and believe that the sugar comes from the fruit. Unfortunately, Wendy's website features so much unnecessary HTML bling that I wasn't able to find the ingredients for this menu item within a single click. This poor website design choice soured my positive feelings about Wendy's and now I assume they are trying to obfuscate their nutrition information because they have something to hide.

The most sugar-rich menu item for McDonald's is the [Big Breakfast With Hotcakes And Egg Whites (Large Biscuit)](http://www.nutritionix.com/i/mcdonalds/big-breakfast-with-hotcakes-and-egg-whites-large-biscuit-/521b95c74a56d006d578b11b) with 18g. Note that Wendy's has (VALUE) entree menu items with sugar greater than McDonald's Big Breakfast.

Of the burger places, Five Guys has the lowest ceiling on sugar. Unfortunately for the vegetarians out there, the two menu items with 14g sugar are the [Veggie Sandwich](http://www.nutritionix.com/i//veggie-sandwich/521b95cb4a56d006d578b9bc) and [Cheese Veggie Sandwich](http://www.nutritionix.com/i/five-guys/cheese-veggie-sandwich/521b95cb4a56d006d578b9a7). I've never ordered either of these sandwiches (nor did I realize they exist), but my guess is the sugar comes from the [bun](http://www.nutritionix.com/i//bun/521b95cb4a56d006d578b9a3) and ketchup -- removing those components should lower the total sugar. On the other hand, removing the bun from these sandwiches leaves you with a pile of vegetables, at which point you should probably just head over to Chopt. In fact, what vegetarian even considers Five Guys as a viable option for food?

Looking at the histograms, it seems that Chipotle's menu has the lowest amount of sugar per menu item. This conclusion is a little deceiving: the menu items listed for Chipotle are actually components used to assemble menu items such as delicious burritos. The [Sofritas](http://www.nutritionix.com/i//sofritas/52cdcbe1051cb9eb320014de) has the most sugar at 5g, but there's no mention of sugar in the [list of ingredients](http://chipotle.com/ingredient-statement) for this item.

Taco Bell has 9 entree menu items tied for the most sugar (7g). In list form:

* [Biscuit Taco - Bacon, Egg & Cheese]()
* [Biscuit Taco - Sausage, Egg & Cheese]()
* [Biscuit Taco]()
* [Fiesta Taco Salad]()
* [Biscuit Taco - Sausage & Cheese]()
* [Fiesta Taco Salad - Chicken]()
* [Fiesta Taco Salad - Beef]()
* [Fiesta Taco Salad - Steak]()
* [Biscuit Taco - Egg & Cheese]()

Generally speaking, the burger places (McDonald's, Wendy's, and Five Guys) have a wider distribution in terms of sugar than the "Mexican" places.

It's possible to calculate things like average and standard deviation for the histograms above, but I don't see the utility in that information. Nobody goes to a restaurant and selects random items off the menu. My rule of thumb is to not order anything with more than 3g of sugar, and zero sugar is preferred. Using this rubric, the restaurants I considered have the following number of menu items at that part of the distribution:

* Wendy's - 
* McDonald's - 
* Five Guys - 
* Qdoba - 
* Taco Bell - 
* Chipotle - 
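The counts for the list above could be produced along these lines. This is a sketch on made-up records: the field names `brand_name`, `menu_category`, and `nf_sugars` match the ones used elsewhere in this notebook, but `count_low_sugar_entrees` is a hypothetical helper and the item names and most values are invented for illustration.

```python
from collections import Counter


def count_low_sugar_entrees(menu_data, max_sugar=3):
    """Count entree items per restaurant with at most `max_sugar` grams of sugar."""
    counts = Counter()
    for item in menu_data:
        if item["menu_category"] != "entree":
            continue
        sugar = item.get("nf_sugars")
        # Skip items with no sugar value rather than treating them as zero.
        if sugar is not None and sugar <= max_sugar:
            counts[item["brand_name"]] += 1
    return counts


# Invented records, not real menu data.
toy_menu = [
    {"brand_name": "Chipotle", "item_name": "Item A", "menu_category": "entree", "nf_sugars": 0},
    {"brand_name": "Chipotle", "item_name": "Item B", "menu_category": "entree", "nf_sugars": 5},
    {"brand_name": "Five Guys", "item_name": "Item C", "menu_category": "entree", "nf_sugars": 14},
    {"brand_name": "Five Guys", "item_name": "Item D", "menu_category": "beverage", "nf_sugars": 0},
    {"brand_name": "Five Guys", "item_name": "Item E", "menu_category": "entree", "nf_sugars": None},
]

counts = count_low_sugar_entrees(toy_menu)
for name in ["Chipotle", "Five Guys"]:
    print("* {} - {}".format(name, counts[name]))
```

A `Counter` returns 0 for restaurants with no qualifying items, so every restaurant can be printed without special cases.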

Moreover, I'm about 6'2" at 200lbs and I lift weights (bro), and so I am interested in consuming more calories than someone smaller or with a different exercise regimen. Thus, another visualization of interest for me is the ratio of sugar to calories.
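One way to express that ratio is grams of sugar per 100 calories. The sketch below assumes each item carries a calorie field named `nf_calories` alongside `nf_sugars`; that field name is an assumption, and the records are invented.

```python
def sugar_per_100_calories(item):
    """Grams of sugar per 100 calories, or None when the ratio is undefined."""
    calories = item.get("nf_calories")
    sugar = item.get("nf_sugars")
    if not calories or sugar is None:
        return None  # missing data or a zero-calorie item
    return 100.0 * sugar / calories


# Invented records, not real menu data.
toy_items = [
    {"item_name": "Item A", "nf_sugars": 7, "nf_calories": 350},
    {"item_name": "Item B", "nf_sugars": 18, "nf_calories": 1200},
    {"item_name": "Item C", "nf_sugars": 2, "nf_calories": 0},
]

# Rank items from least to most sugar per calorie, dropping undefined ratios.
ranked = sorted(
    (item for item in toy_items if sugar_per_100_calories(item) is not None),
    key=sugar_per_100_calories,
)
for item in ranked:
    print(item["item_name"], round(sugar_per_100_calories(item), 2))
```

In this toy data the item with the most total sugar has the lower ratio, which is the kind of difference the ratio is meant to expose.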


## Todo
Either write these in the report or transfer them to the issue tracker:

* Plot beverage histograms on entree histograms. In this way I will be able to make the point that many of these beverages contain a large amount of sugar.
* Issue disclaimer that I am not a nutritionist or a physician and these results should be taken with a grain of salt (heh). The responsible thing to do is to talk to your doctor if you have concerns about how your diet might be affecting your health.
* List menu items containing less than 4g sugar for each restaurant.
* Add links to menu item names in the final report.

## Notes
It would be great if the FDA required restaurants and food manufacturers to list the amount of added and/or refined sugars on their labels.

It would also be great if ingredient information were ubiquitous for these restaurants.

## Misc. observations during development
This section contains some notes on observations I made during development. I intend to include these observations in the report, but rewritten into the body itself and not this section.

First, the legacy/non-fast casual restaurants have a **ton** of menu items. McDonald's has >350 where Chipotle only has like 25. I can't imagine the amount of complexity that number of menu items adds to the management of the company. I also can't see how McDonald's gets rid of this complexity (i.e. sheds menu items) without alienating the customers these items were intended to serve. It seems like this amount of complexity is an accretion over many years and is likely a result of their success. It seems eminently plausible that over the years executives at McDonald's thought, "We are dominating this part of the market which is basically tapped out. In order to experience even more growth, we need to expand into other markets. How do we expand into other markets while leveraging the power of this brand to crush the competition?"

Second, executing this project has made me vaguely aware that some of my development practices may not be suited for data science projects. For example, I think the workflow I am using to perform this analysis may not be the most effective. The workflow is: download all data from the server, then write filtering code to eventually get the data I want. I feel like some incarnation of this workflow is what a seasoned data scientist might do, but I suspect most of the filtering will be done by the server or at the database as opposed to at the level of the data scientist's local machine.

As of commit [e297afb9](https://github.com/jrsmith3/minimum_sugar/commit/e297afb990153e07a80e8442aedcd4babb6b458b), I switched to a flat data structure. I thought this approach was going to make things easier, but I didn't realize how much easier it made things. I now just consider the menu item data to be a big pile of data, and I let the computer extract what I need based on queries I submit. I suspect this situation is how things are when one has a well-constructed database. Based on this experience, I should learn how to use SQL.
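To give a sense of what the flat structure might look like in SQL, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The table name `menu_items` and the records are invented for illustration; this is not how the project currently stores its data.

```python
import sqlite3

# Invented flat records using the same field names as `menu_data`.
toy_menu = [
    {"brand_name": "Taco Bell", "item_name": "Item A", "menu_category": "entree", "nf_sugars": 7},
    {"brand_name": "Taco Bell", "item_name": "Item B", "menu_category": "beverage", "nf_sugars": 40},
    {"brand_name": "Chipotle", "item_name": "Item C", "menu_category": "entree", "nf_sugars": 5},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE menu_items "
    "(brand_name TEXT, item_name TEXT, menu_category TEXT, nf_sugars REAL)"
)
# Named placeholders let each dict be inserted directly as a row.
conn.executemany(
    "INSERT INTO menu_items VALUES (:brand_name, :item_name, :menu_category, :nf_sugars)",
    toy_menu,
)

# Maximum entree sugar per restaurant, filtered and grouped by the database.
rows = conn.execute(
    "SELECT brand_name, MAX(nf_sugars) FROM menu_items "
    "WHERE menu_category = ? GROUP BY brand_name ORDER BY brand_name",
    ("entree",),
).fetchall()
print(rows)
conn.close()
```

The query does in one statement what the nested `filter_menu_items` and `extract_variable` calls do in the scratch cell below.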

## Scratch

In [32]:
restaurant_name = "Taco Bell"

restaurant_menu_items = minimum_sugar.filter_menu_items(menu_data, "brand_name", restaurant_name)
restaurant_entree_items = minimum_sugar.filter_menu_items(restaurant_menu_items, "menu_category", "entree")
max_sugar = max(minimum_sugar.extract_variable(restaurant_entree_items, "nf_sugars"))

print("Max sugar: {}".format(max_sugar))
print("")

menu_items = minimum_sugar.filter_menu_items(restaurant_entree_items, "nf_sugars", 7)

print("Number of items: {}".format(len(menu_items)))
for menu_item in menu_items:
    print("* [" + menu_item["item_name"] + "]()")

Max sugar: 7

Number of items: 9
* [Biscuit Taco - Bacon, Egg & Cheese]()
* [Biscuit Taco - Sausage, Egg & Cheese]()
* [Biscuit Taco]()
* [Fiesta Taco Salad]()
* [Biscuit Taco - Sausage & Cheese]()
* [Fiesta Taco Salad - Chicken]()
* [Fiesta Taco Salad - Beef]()
* [Fiesta Taco Salad - Steak]()
* [Biscuit Taco - Egg & Cheese]()


In [40]:
for restaurant_name in restaurant_names:
    print("* " + restaurant_name)

* Wendy's
* McDonald's
* Five Guys
* Qdoba
* Taco Bell
* Chipotle
