# Python Homework with Chipotle Data

## Part 1

- Read in the file with csv.reader() and store it in an object called 'file_nested_list'.
- Hint: This is a TSV file, and csv.reader() needs to be told how to handle it.

In [1]:
!pwd

/Users/Pavani/Desktop/GA/Project-1-master


In [2]:
# Change the working directory to the 'data' directory
%cd data


/Users/Pavani/Desktop/GA/Project-1-master/data


In [3]:
# To use csv.reader, we must import the csv module
import csv
# csv.reader has a delimiter parameter, which we set to '\t' because the file is tab-separated
# We temporarily refer to the open file by the variable name f
# newline='' lets the csv module handle line endings itself, as the csv documentation recommends
# Build a list of rows by looping through each line in f
with open('chipotle.tsv', mode='r', newline='') as f:
    file_nested_list = [row for row in csv.reader(f, delimiter='\t')]
    header = file_nested_list[0]
    data = file_nested_list[1:]
    print(header)
    print(data[0])

['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']
['1', '1', 'Chips and Fresh Tomato Salsa', 'NULL', '$2.39 ']


### Why use csv.reader?

As stated in the [CSV file reading and writing documentation](https://docs.python.org/2/library/csv.html):

> There is no "CSV standard", so the format is operationally defined by the many applications which 
read and write it. The lack of a standard means that subtle differences often exist in the data 
produced and consumed by different applications. These differences can make it annoying to process 
CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the 
overall format is similar enough that it is possible to write a single module which can efficiently
manipulate such data, hiding the details of reading and writing the data from the programmer.

In other words, data formats can differ in subtle ways depending on the source, and the differences are not always easy to spot: invisible newline characters, for instance. The `csv.reader` function (part of the `csv` module) is built to handle these details, and thus provides a reliable way to load data.

This is why we prefer: `file_nested_list = [row for row in csv.reader(f, delimiter='\t')]`

Instead of: `file_nested_list = [row.split('\t') for row in f]`
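A minimal sketch of the difference, using a made-up two-line TSV string rather than the Chipotle file. The sample text and the names `sample`, `naive`, and `parsed` are assumptions for illustration only:

```python
import csv
import io

# Hypothetical TSV text: the last field of the data row is quoted and contains a tab
sample = 'item_name\tnote\n"Chips"\t"extra\tsalty"\n'

# Naive approach: split each line on tabs
naive = [line.split('\t') for line in io.StringIO(sample)]

# csv.reader approach: quoting and line endings are handled for us
parsed = [row for row in csv.reader(io.StringIO(sample), delimiter='\t')]

print(naive[1])   # ['"Chips"', '"extra', 'salty"\n']
print(parsed[1])  # ['Chips', 'extra\tsalty']
```

The naive split breaks the quoted field in two, keeps the quote characters, and leaves the trailing newline on the last piece, while `csv.reader` returns two clean fields.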

## Part 2

- Separate 'file_nested_list' into the 'header' and the 'data'.

In [4]:
# The first row holds the column names; every remaining row is data
header = file_nested_list[0]
data = file_nested_list[1:]
print(header)
print(data[0])

['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']
['1', '1', 'Chips and Fresh Tomato Salsa', 'NULL', '$2.39 ']


## Part 3

- Calculate the average price of an order.
- **Hint:** Examine the data to see if the 'quantity' column is relevant to this calculation.
- **Hint:** Think carefully about the simplest way to do this!

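We want to find the average price of an order. Each row of the data is a line item, and one order (one 'order_id') can span several rows, so we need the **sum of all item prices** and the **number of unique orders**. The 'quantity' column is not needed here, because 'item_price' already reflects the quantity on each row.

In [5]:
# item_price looks like '$2.39 ', so drop the '$' before converting to float
prices = [float(row[4].replace('$', '')) for row in data]
# Several rows can share an order_id, so count the unique IDs
num_orders = len(set(row[0] for row in data))
average_price = sum(prices) / num_orders
print('Average price of an order: $', round(average_price, 2))


Average price of an order: $ 18.81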
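
The distinction between rows and orders is easy to miss, so here is a small sketch with made-up rows in the same column order as the file. The rows and the names `sample_rows`, `total`, `num_rows`, and `num_orders` are hypothetical:

```python
# Hypothetical rows: order_id, quantity, item_name, choice_description, item_price
sample_rows = [
    ['1', '1', 'Chips', 'NULL', '$2.00 '],
    ['1', '1', 'Izze', '[Clementine]', '$3.00 '],
    ['2', '2', 'Chicken Bowl', '[Rice]', '$10.00 '],
]

# float() tolerates the trailing space once the '$' is removed
total = sum(float(row[4].replace('$', '')) for row in sample_rows)
num_rows = len(sample_rows)                        # 3 line items
num_orders = len({row[0] for row in sample_rows})  # 2 distinct order IDs

print(total / num_rows)    # 5.0, the average per line item
print(total / num_orders)  # 7.5, the average per order
```

Dividing by the number of rows understates the per-order average whenever an order contains more than one line item.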


## Part 4

- Create a list (or set) of all unique sodas and soft drinks that they sell.
- **Note:** Just look for 'Canned Soda' and 'Canned Soft Drink', and ignore other drinks like 'Izze'.

In [6]:
unique_sodas=[]
for item in data:
    if item[2] == 'Canned Soda' or item[2] == 'Canned Soft Drink':
        unique_sodas.append(item[3])
        
print(set(unique_sodas))

{'[Diet Dr. Pepper]', '[Nestea]', '[Mountain Dew]', '[Coke]', '[Sprite]', '[Coca Cola]', '[Lemonade]', '[Diet Coke]', '[Dr. Pepper]'}
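
As an alternative, the list-then-`set()` step can be collapsed into a single set comprehension. This is only a sketch on made-up rows; `sample_rows` and its prices are hypothetical, not taken from the file:

```python
# Hypothetical rows: order_id, quantity, item_name, choice_description, item_price
sample_rows = [
    ['1', '1', 'Canned Soda', '[Sprite]', '$1.00 '],
    ['2', '1', 'Canned Soft Drink', '[Coke]', '$1.00 '],
    ['3', '2', 'Canned Soda', '[Sprite]', '$2.00 '],
    ['4', '1', 'Izze', '[Clementine]', '$3.00 '],
]

# A set keeps each choice_description once, so duplicates vanish as we go
unique_sodas = {row[3] for row in sample_rows
                if row[2] in ('Canned Soda', 'Canned Soft Drink')}

print(sorted(unique_sodas))  # ['[Coke]', '[Sprite]']
```

Sets are unordered, so the sketch sorts the result before printing to get a stable display.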


## Part 5

- Calculate the average number of toppings per burrito.
- **Note:** Let's ignore the 'quantity' column to simplify this task.
- **Hint:** Think carefully about the easiest way to count the number of toppings!

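To calculate the average number of toppings, we simply need to divide the **total number of toppings** by the **total number of burritos**. The toppings in 'choice_description' are comma-separated, so the number of toppings on a row is the number of commas plus one.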

In [7]:
topping_count = 0
burrito_count = 0
for item in data:
    # Only rows whose item_name ends in 'Burrito' count toward the average
    if item[2].endswith('Burrito'):
        # Toppings are comma-separated, so toppings = commas + 1
        topping_count += item[3].count(',') + 1
        burrito_count += 1
average_toppings_per_burrito = topping_count / burrito_count
print('Average number of toppings per burrito: ', int(round(average_toppings_per_burrito)))

Average number of toppings per burrito:  5
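
To see why counting commas works, here is a sketch on a single made-up 'choice_description' string. The helper name `count_toppings` and the sample string are hypothetical:

```python
def count_toppings(choice_description):
    """Count comma-separated entries in a choice_description string."""
    return choice_description.count(',') + 1

# Hypothetical description in the bracketed style used by the file
sample = '[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]'

print(count_toppings(sample))  # 5: four commas, plus one
```

Note that this shortcut counts the salsa as a topping and returns 1 for a description with no commas at all, including the placeholder 'NULL', so it is only meaningful for rows that really carry a topping list.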


## Part 6

- Create a dictionary in which the keys represent chip orders and the values represent the total number of orders.
- **Expected output:** {'Chips and Roasted Chili-Corn Salsa': 18, ... }
- **Note:** Please take the 'quantity' column into account!
- **Optional:** Learn how to use 'defaultdict' to simplify your code.

In [8]:
chips_dict = {}
for item in data:
    if 'Chips' in item[2]:
        # Convert quantity to int on the first insert as well, so every value is an integer
        chips_dict[item[2]] = chips_dict.get(item[2], 0) + int(item[1])
print(chips_dict)

{'Chips and Fresh Tomato Salsa': 130, 'Chips and Tomatillo-Green Chili Salsa': 33, 'Side of Chips': 110, 'Chips and Guacamole': 506, 'Chips and Tomatillo Green Chili Salsa': 45, 'Chips': 230, 'Chips and Tomatillo Red Chili Salsa': 50, 'Chips and Roasted Chili-Corn Salsa': 18, 'Chips and Roasted Chili Corn Salsa': 23, 'Chips and Tomatillo-Red Chili Salsa': 25, 'Chips and Mild Fresh Tomato Salsa': 1}
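
For the optional part, a sketch of the same tally with `collections.defaultdict`, run on made-up rows. `sample_rows` and `chips_orders` are hypothetical names, and the rows are not taken from the file:

```python
from collections import defaultdict

# Hypothetical rows: order_id, quantity, item_name, choice_description, item_price
sample_rows = [
    ['1', '1', 'Chips and Guacamole', 'NULL', '$4.00 '],
    ['2', '2', 'Chips', 'NULL', '$4.00 '],
    ['3', '1', 'Chips and Guacamole', 'NULL', '$4.00 '],
    ['3', '1', 'Steak Burrito', '[Rice]', '$9.00 '],
]

# defaultdict(int) starts every missing key at 0, so no membership check is needed
chips_orders = defaultdict(int)
for row in sample_rows:
    if 'Chips' in row[2]:
        chips_orders[row[2]] += int(row[1])

print(dict(chips_orders))  # {'Chips and Guacamole': 2, 'Chips': 2}
```

Because a missing key is created with the value 0 on first access, the first and later occurrences of an item go through the same line of code, and every value ends up an integer.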
