<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Project 2: Analyzing Chipotle Data

_Author: Joseph Nelson (DC)_

---

For Project 2, you will complete a series of exercises exploring [order data from Chipotle](https://github.com/TheUpshot/chipotle), compliments of _The New York Times'_ "The Upshot."

For these exercises, you will conduct basic exploratory data analysis (Pandas not required) to understand the essentials of Chipotle's order data: how many orders are being made, the average price per order, how many different ingredients are used, etc. These exercises let you practice business analysis skills while becoming comfortable with Python.

---

## Basic Level

### Part 1: Read in the file with `csv.reader()` and store it in an object called `file_nested_list`.

Hint: This is a TSV (tab-separated value) file, and `csv.reader()` needs to be told [how to handle it](https://docs.python.org/2/library/csv.html).
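
As a minimal sketch of what this hint points at: `csv.reader()` accepts a `delimiter` argument. The in-memory sample below is made up for illustration (it is not taken from the Chipotle file), and the names `sample_tsv`, `sample_header`, and `sample_data` are hypothetical, chosen so the snippet runs without any data on disk.

```python
import csv
import io

# Hypothetical two-line sample in a tab-separated layout (not real data)
sample_tsv = "order_id\tquantity\titem_name\n1\t2\tChips\n"

# delimiter='\t' tells csv.reader to split fields on tabs instead of commas
rows = list(csv.reader(io.StringIO(sample_tsv), delimiter='\t'))

# The first row holds the column names; the rest are data rows
sample_header, sample_data = rows[0], rows[1:]

print(sample_header)  # ['order_id', 'quantity', 'item_name']
print(sample_data)    # [['1', '2', 'Chips']]
```

Note that every field comes back as a string, so numeric columns need an explicit conversion before any arithmetic.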

In [1]:
import csv
from collections import namedtuple   # Convenient to store the data rows

DATA_FILE = './data/chipotle.tsv'

In [2]:
with open(DATA_FILE) as csvfile:
    
    # Use tab delimiter since the file is tab delimited
    file_nested_list = csv.reader(csvfile, delimiter='\t')
    
    # Capture the first row in the variable 'header' by using the 'next' function
    header = next(file_nested_list)
    
    # Capture the remaining data rows in the 'data' list
    data = list(file_nested_list)

### Part 2: Separate `file_nested_list` into the `header` and the `data`.


In [3]:
# This was done in the previous section

---

## Intermediate Level

### Part 3: Calculate the average price of an order.

Hint: Examine the data to see if the `quantity` column is relevant to this calculation.

Hint: Think carefully about the simplest way to do this!
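
One possible way to keep this simple is to accumulate a total per `order_id` in a dictionary and then average those totals. The sketch below runs on made-up rows, not the real file. The function `average_order_price` and its `price_is_line_total` flag are hypothetical names: the flag exists because whether `quantity` matters depends on whether `item_price` already covers the whole line, which should be checked against the data.

```python
from collections import defaultdict

# Hypothetical rows: [order_id, quantity, item_name, choice_description, item_price]
price_rows = [
    ['1', '1', 'Chips', 'NULL', '$2.00'],
    ['1', '2', 'Canned Soda', '[Sprite]', '$3.00'],
    ['2', '1', 'Chicken Bowl', '[Rice]', '$9.00'],
]

def average_order_price(rows, price_is_line_total=True):
    """Average total per order. The flag is an assumption to verify in the data."""
    totals = defaultdict(float)
    for order_id, quantity, _, _, price in rows:
        amount = float(price.lstrip('$'))  # strip the leading '$'
        if not price_is_line_total:
            amount *= int(quantity)        # treat item_price as a per-unit price
        totals[order_id] += amount
    return sum(totals.values()) / len(totals)

print(average_order_price(price_rows))                             # 7.0
print(average_order_price(price_rows, price_is_line_total=False))  # 8.5
```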

In [4]:
# 'order_id_counter' = order_id of the order currently being totaled. Updated whenever a new order starts
order_id_counter = data[0][0]

# 'order_price_counter' = Running total for the current order. Restarts at the first item's price for every new order
order_price_counter = float(data[0][1]) * float(data[0][4][1:])

# 'orders_counter' = Tracks the number of unique orders.  Increments by 1 for every new order
orders_counter = 1

# 'order_prices' = list that captures total price of each order
order_prices = []

for i in range(1, len(data)):
    if data[i][0] == order_id_counter:
        
        # Convert the currency to a float using 'float(data[i][4][1:])', which strips the '$'
        order_price_counter += float(data[i][1]) * float(data[i][4][1:])
    else:
        # A new order starts: store the finished order's total, then reset the trackers
        order_prices.append(order_price_counter)
        order_price_counter = float(data[i][1]) * float(data[i][4][1:])
        order_id_counter = data[i][0]
        orders_counter += 1

# Append the total of the very last order, which the loop never closes out
order_prices.append(order_price_counter)

In [5]:
# Average price is the sum of the items in the 'order_prices' list divided by the number of unique orders, captured in 'orders_counter'
avg_order_price = sum(order_prices)/orders_counter

# Convert float to price format
print('${:,.2f}'.format(avg_order_price))

$21.39


### Part 4: Create a list (or set) named `unique_sodas` containing all of the unique sodas and soft drinks that Chipotle sells.

Note: Just look for `'Canned Soda'` and `'Canned Soft Drink'`, and ignore other drinks like `'Izze'`.
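
A set comprehension is one compact way to filter and de-duplicate in a single pass. The rows below are invented for illustration, and `soda_rows`, `soda_items`, and `sample_unique_sodas` are hypothetical names.

```python
# Hypothetical rows: [order_id, quantity, item_name, choice_description, item_price]
soda_rows = [
    ['1', '1', 'Canned Soda', '[Sprite]', '$1.00'],
    ['2', '1', 'Canned Soft Drink', '[Coke]', '$1.00'],
    ['3', '1', 'Canned Soda', '[Sprite]', '$1.00'],
    ['4', '1', 'Izze', '[Clementine]', '$3.00'],
]

# Only these two item names count as sodas for this exercise
soda_items = {'Canned Soda', 'Canned Soft Drink'}

# A set keeps one copy of each choice_description
sample_unique_sodas = {row[3] for row in soda_rows if row[2] in soda_items}

print(sorted(sample_unique_sodas))  # ['[Coke]', '[Sprite]']
```

Sets are unordered, so sorting before printing gives a stable display.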

In [6]:
# Capture non-unique soda names in the list 'sodas'
sodas = []

# Loop through rows.  If 'item_name' is 'Canned Soda' or 'Canned Soft Drink', append the 'choice_description' value to 'sodas'
for row in data:
    if row[2] == 'Canned Soda' or row[2] == 'Canned Soft Drink':
        sodas.append(row[3])

In [7]:
# Convert sodas to set to get unique values
unique_sodas = list(set(sodas))
print(unique_sodas)

['[Sprite]', '[Coke]', '[Dr. Pepper]', '[Diet Dr. Pepper]', '[Lemonade]', '[Mountain Dew]', '[Diet Coke]', '[Coca Cola]', '[Nestea]']


---

## Advanced Level


### Part 5: Calculate the average number of toppings per burrito.

Note: Let's ignore the `quantity` column to simplify this task.

Hint: Think carefully about the easiest way to count the number of toppings!
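
One shortcut, assuming every topping in `choice_description` is separated by a comma, is to count commas and add one. The descriptions below are made up in the same bracketed style, and `sample_descriptions` and `topping_counts` are hypothetical names.

```python
# Hypothetical choice_description strings in the file's bracketed style
sample_descriptions = [
    '[Fresh Tomato Salsa, [Rice, Black Beans, Cheese]]',
    '[Tomatillo Red Chili Salsa, [Rice, Lettuce]]',
]

# n commas separate n + 1 toppings (the salsa is counted as a topping here)
topping_counts = [desc.count(',') + 1 for desc in sample_descriptions]

print(topping_counts)                             # [4, 3]
print(sum(topping_counts) / len(topping_counts))  # 3.5
```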


In [8]:
# Assume that each salsa counts as a topping
# Capture number of toppings of each Burrito item in list 'toppings[]'
toppings = []
count_of_burritos = 0

for row in data:
    
    # Look for the word 'Burrito' in 'item_name'
    if 'Burrito' in row[2]:
        
        # The number of toppings is equal to the number of commas in 'choice_description' + 1
        num_toppings = 1 + row[3].count(',')
        
        toppings.append(num_toppings)
        count_of_burritos += 1

In [9]:
print(sum(toppings)/count_of_burritos)

5.395051194539249


### Part 6: Create a dictionary. Let the keys represent chip orders and the values represent the total number of orders.

Expected output: `{'Chips and Roasted Chili-Corn Salsa': 18, ... }`

Note: Please take the `quantity` column into account!

Optional: Learn how to use `collections.defaultdict` to simplify your code.
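
As a rough sketch of how `defaultdict(int)` can take `quantity` into account, each matching row can add its quantity, rather than 1, to the running count. The rows are invented for illustration, and `chip_rows` and `sample_chips_count` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical rows: [order_id, quantity, item_name, choice_description, item_price]
chip_rows = [
    ['1', '2', 'Chips and Guacamole', 'NULL', '$8.00'],
    ['2', '1', 'Chips', 'NULL', '$2.00'],
    ['3', '1', 'Chips and Guacamole', 'NULL', '$4.00'],
    ['3', '1', 'Chicken Bowl', '[Rice]', '$9.00'],
]

# Missing keys start at int() == 0, so no existence check is needed
sample_chips_count = defaultdict(int)
for row in chip_rows:
    if 'Chips' in row[2]:
        sample_chips_count[row[2]] += int(row[1])  # add the quantity, not 1

print(dict(sample_chips_count))  # {'Chips and Guacamole': 3, 'Chips': 1}
```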

In [10]:
# The list 'chips[]' stores all item_names with the word 'Chips' in 'item_name'
chips = []
for i in range(0,len(data)):
    if 'Chips' in data[i][2]:
        chips.append(data[i][2])

In [11]:
# Define 'chips_count' as a defaultdict with int as the default factory, so missing keys start at 0
from collections import defaultdict
chips_count = defaultdict(int)

# Loop through the list 'chips' and increment the 'chips_count' defaultdict
for x in chips:
    chips_count[x] += 1

In [12]:
# Convert chips_count to dict and print
print(dict(chips_count))

{'Chips and Fresh Tomato Salsa': 110, 'Chips and Tomatillo-Green Chili Salsa': 31, 'Side of Chips': 101, 'Chips and Guacamole': 479, 'Chips and Tomatillo Green Chili Salsa': 43, 'Chips': 211, 'Chips and Tomatillo Red Chili Salsa': 48, 'Chips and Roasted Chili-Corn Salsa': 18, 'Chips and Roasted Chili Corn Salsa': 22, 'Chips and Tomatillo-Red Chili Salsa': 20, 'Chips and Mild Fresh Tomato Salsa': 1}


---

## Bonus: Craft a problem statement about this data that interests you, and then answer it!


### Problem statement: For each major meat type (Chicken, Steak, Carnitas, Barbacoa), find the most common salsa topping.

In [13]:
# Strategy is to convert 'choice_description' into a list of readable strings
# Then loop through the list and, if the word 'Salsa' appears, append it to the list for the corresponding meat

chicken, steak, carnitas, barbacoa = ([] for x in range(4))
for i in range(0,len(data)):
    
    # Split 'choice_description' into a list of strings and store it in the variable 'item'
    item = data[i][3].split(',')
    
    # Remove brackets and spaces to convert elements to a readable string
    item = [x.replace("[","").replace("]","").strip() for x in item]
    
    # Loop through each element of 'item'. If it has 'Salsa' in the name, append it to the list for the matching meat
    for y in item:
        if 'Salsa' in y:
            if 'Chicken' in data[i][2]:
                chicken.append(y)
            elif 'Steak' in data[i][2]:
                steak.append(y)
            elif 'Carnitas' in data[i][2]:
                carnitas.append(y)
            elif 'Barbacoa' in data[i][2]:
                barbacoa.append(y)

In [37]:
# Use Counter from the 'collections' library to get the most commonly occurring item in a list
from collections import Counter
most_common = [Counter(chicken).most_common(1), Counter(carnitas).most_common(1), Counter(steak).most_common(1), Counter(barbacoa).most_common(1)]
meats = ['Chicken', 'Carnitas', 'Steak', 'Barbacoa']

# Print results
for i in range(4):
    print('The most common salsa in ' + meats[i] + ' is ' + str(most_common[i][0][0]) + '. It appears ' + str(most_common[i][0][1]) + ' times.')

The most common salsa in Chicken is Fresh Tomato Salsa. It appears 654 times.
The most common salsa in Carnitas is Fresh Tomato Salsa. It appears 60 times.
The most common salsa in Steak is Fresh Tomato Salsa. It appears 215 times.
The most common salsa in Barbacoa is Tomatillo Red Chili Salsa. It appears 55 times.
