# Data wrangling


## Notebook Content
Data often has to be reshaped to meet specific requirements before it can be used. Here is a detailed example: we start with a string listing items and their costs, and we want the total cost. Here's the input:

In [None]:
budget = """Airfare 300.00
Hotel 200.00
Food 100.00"""
budget

Our first step is to split the data into chunks we can handle. We start by splitting it into lines. The line-break character is `'\n'`.

In [None]:
# splitting at \n converts this to a list of lines. 
lines = budget.split('\n')
lines
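
One caveat, shown here as a minimal sketch: if the input ends with a line break, `split('\n')` leaves an empty string at the end of the list, while the string method `splitlines()` does not. The variable `budget_with_newline` is a hypothetical variant of the input, not part of the example above.

```python
# Hypothetical variant of the budget string that ends with a line break.
budget_with_newline = "Airfare 300.00\nHotel 200.00\nFood 100.00\n"

# split('\n') keeps an empty string after the final line break.
split_lines = budget_with_newline.split('\n')

# splitlines() does not produce that trailing empty string.
clean_lines = budget_with_newline.splitlines()

print(split_lines)
print(clean_lines)
```

For the `budget` string used here, which has no trailing line break, both approaches give the same list.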

Our next step is to split up each line. Each line has a single space between the item and its cost, so we split at the space.

In [None]:
costs = []
for l in lines: 
    # splitting at ' ' separates the item from its cost.
    thing, cost = l.split(" ")
    # but cost is still a string, and we want a number. 
    number = float(cost)
    costs.append(number)
costs
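
To see what the loop body does, it can help to apply the same two operations to a single line. This is a small sketch; `sample` and `parts` are hypothetical names introduced only for illustration.

```python
# A single line in the same format as the budget lines.
sample = "Airfare 300.00"

# Splitting at the space gives a list with two strings.
parts = sample.split(" ")
thing, cost = parts

# The cost is still text until float() converts it to a number.
number = float(cost)

print(parts)
print(type(cost).__name__, type(number).__name__)
```

The tuple assignment `thing, cost = parts` relies on the split producing exactly two pieces.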

Finally, we sum the costs. Summing a list follows a standard accumulator pattern: start a running total at zero and add each element to it. Here's the pattern:

In [None]:
total = 0
for c in costs: 
    total += c
total   

Or we could use Python's built-in `sum` function:

In [None]:
total = sum(costs)
total

# Some basic observations
1. Each transformation requires the previous one. 
2. I printed the result of each transformation to ensure that I was doing things correctly.
3. The cells are written in the order in which they should be executed.
4. One can thus visually determine whether everything is working correctly. 
5. The postconditions of each step are at least sufficient to serve as the preconditions for the next step.
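
The last observation can be made concrete with `assert` statements that check a step's postconditions before the next step runs. The sketch below is only illustrative: `check_costs` is a hypothetical helper, not part of this notebook, and it checks just two properties that the summing step relies on.

```python
def check_costs(lines, costs):
    """Hypothetical helper: check postconditions of the conversion step."""
    # One cost should have been produced for every input line.
    assert len(costs) == len(lines), "expected one cost per line"
    # Every cost should already be a number, not a string.
    assert all(isinstance(c, float) for c in costs), "costs must be floats"
    return True

# Passes: two lines, two float costs.
ok = check_costs(["Airfare 300.00", "Hotel 200.00"], [300.0, 200.0])
print(ok)
```

If a check fails, Python raises an `AssertionError` with the given message, which points at the step that broke its promise.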

# Some exercises

1. Write the above calculation -- involving multiple cells -- in one cell. 

In [None]:
# replace this comment with your answer

2. **Try this same code on the string below.** Does it work reasonably? 

In [None]:
budget = """Airfare 300.00 TWA
Hotel 200.00 Marriott
Food 100.00 Legal-Seafoods"""
budget

In [None]:
# copy the previous code here, and run it

___Your answer:___

3. **What happens with the budget code if, in one line of data, two of the columns are in the wrong order?** Hint: reverse the second and third columns. Try it here:

In [None]:
# { copy code here to try the experiment, and run it. }

___Your answer:___

4. **Based upon this, what are the complete preconditions for this code to work correctly?** 

___Your answer:___

5. Very often, we need to deal with exceptions. For example, the code 
    
```
    try:
        foo = float(x)
    except ValueError:  # raised by float() for strings such as 'N/A'
        foo = 0.0
```
   tries to convert `x` to a `float`; if the conversion fails, it sets `foo` to 0.0 instead.
   
   Write a version of 

```
costs = []
for l in lines: 
    thing, cost = l.split(" ")
    number = float(cost)
    costs.append(number)
```

   that gracefully handles cost values that are not numbers, adding zero for each such value. Run it on the example below.

In [None]:
budget = """Airfare 300.00 TWA
Hotel 200.00 Marriott
Snack N/A Street-Vendor
Food 100.00 Legal-Seafoods"""

# fill in details here

# When you're done, submit the notebook
You can submit a notebook by saving it as a PDF. In the cluster environment, use File | Print (Save as PDF) and submit the result to Gradescope at https://www.gradescope.com/courses/182658. In other versions, it may be File | Download As (PDF); then submit to Gradescope.

To submit to Gradescope, log into the website, add course 9W7PW3 (if not already added) and submit. The assignment name should match the name of this notebook.