## Get revenue per purchase
Develop a function that returns a new Spark Data Frame with an additional column holding the revenue of each purchase.
* Each record in the Data Frame has 4 columns: order id, book title, quantity, and the retail price of the book.
* The revenue of each purchase is the quantity multiplied by the retail price of the book. See the expected output for further details.
* The function should take `purchases_df` as input and return a new Data Frame with the order id, book title, and revenue. The revenue should be rounded to 2 decimals.
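
The rule above can be checked by hand before involving Spark. The following is a minimal plain-Python sketch of the arithmetic only; the helper name `revenue` is hypothetical and not part of the exercise, and the values passed in are sample numbers.

```python
def revenue(quantity: int, retail_price: float) -> float:
    """Revenue of one purchase: quantity times unit price, rounded to 2 decimals."""
    return round(quantity * retail_price, 2)

# 4 copies at 40.95 each
print(revenue(4, 40.95))
```

The Spark solution applies the same computation to every row at once through column expressions rather than a Python loop.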

In [0]:
purchases = [ 
    [34587, "Learning Python, Mark Lutz", 4, 40.95], 
    [98762, "Programming Python, Mark Lutz", 5, 56.80], 
    [77226, "Head First Python, Paul Barry", 3, 32.95],
    [88112, "Einführung in Python3, Bernd Klein", 3, 24.99]
]

purchases_df = spark.createDataFrame(purchases, 'order_id INT, book_title STRING, quantity INT, retail_price FLOAT')

## Step 1: Preview the data
* Let us first preview the data.

In [0]:
display(purchases_df)

order_id,book_title,quantity,retail_price
34587,"Learning Python, Mark Lutz",4,40.95
98762,"Programming Python, Mark Lutz",5,56.8
77226,"Head First Python, Paul Barry",3,32.95
88112,"Einführung in Python3, Bernd Klein",3,24.99


In [0]:
purchases_df.count()

## Step 2: Provide the solution
Develop the required logic inside the function below. Once the function is ready, move on to the next step to validate it.

In [0]:
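from pyspark.sql.functions import col, round as spark_round  # alias avoids shadowing Python's built-in round

def get_revenue_per_item(purchases_df):
    # revenue = quantity * retail_price, rounded to 2 decimals
    revenue_per_item = (
        purchases_df
        .withColumn('revenue', spark_round(col('quantity') * col('retail_price'), 2))
        .drop('quantity', 'retail_price')  # keep only order_id, book_title and revenue
    )
    return revenue_per_item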

## Step 3: Validate the function

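Here is the expected output. Because `retail_price` is declared as `FLOAT`, the rounded revenue shows small representation errors (for example `163.8000030517578` instead of `163.8`) once it is converted to a Python list. Ignore these differences while validating the output.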
```python
[{'order_id': 34587,
  'book_title': 'Learning Python, Mark Lutz',
  'revenue': 163.8000030517578},
 {'order_id': 98762,
  'book_title': 'Programming Python, Mark Lutz',
  'revenue': 284.0},
 {'order_id': 77226,
  'book_title': 'Head First Python, Paul Barry',
  'revenue': 98.8499984741211},
 {'order_id': 88112,
  'book_title': 'Einführung in Python3, Bernd Klein',
  'revenue': 74.97000122070312}]
```
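
The long decimal tails in this list are what a 32-bit float looks like after it is widened to Python's 64-bit `float`. The sketch below reproduces the effect with the standard library only; `to_float32` is a hypothetical helper introduced for illustration, and the claim that Spark stores the column as a 32-bit float is inferred from the `FLOAT` schema rather than verified here.

```python
import struct

def to_float32(value: float) -> float:
    """Round-trip a Python float through a 32-bit IEEE 754 float."""
    return struct.unpack('f', struct.pack('f', value))[0]

widened = to_float32(163.8)
print(widened)            # no longer exactly 163.8 after widening back to 64 bits
print(round(widened, 2))  # rounding again in Python recovers the 2-decimal value
```

This is why the values in the Data Frame display as 2-decimal numbers while the Python list does not.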

In [0]:
items = get_revenue_per_item(purchases_df)

In [0]:
display(items) # There should be 4 records with 3 columns. Revenue should be rounded off to 2 decimals.

order_id,book_title,revenue
34587,"Learning Python, Mark Lutz",163.8
98762,"Programming Python, Mark Lutz",284.0
77226,"Head First Python, Paul Barry",98.85
88112,"Einführung in Python3, Bernd Klein",74.97


In [0]:
items.toPandas().to_dict(orient='records')
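
One way to validate the list of records without tripping over float representation is to compare the revenue with a tolerance. This is a sketch, not part of the original exercise: `records_match` and its `abs_tol` default are hypothetical, and it assumes both lists are in the same row order.

```python
import math

def records_match(actual, expected, abs_tol=0.005):
    """Compare two lists of record dicts, allowing a small tolerance on revenue."""
    if len(actual) != len(expected):
        return False
    for got, want in zip(actual, expected):
        # order_id and book_title must match exactly
        if got['order_id'] != want['order_id'] or got['book_title'] != want['book_title']:
            return False
        # revenue may differ by float representation error only
        if not math.isclose(got['revenue'], want['revenue'], abs_tol=abs_tol):
            return False
    return True

expected = [
    {'order_id': 34587, 'book_title': 'Learning Python, Mark Lutz', 'revenue': 163.80},
    {'order_id': 98762, 'book_title': 'Programming Python, Mark Lutz', 'revenue': 284.00},
    {'order_id': 77226, 'book_title': 'Head First Python, Paul Barry', 'revenue': 98.85},
    {'order_id': 88112, 'book_title': 'Einführung in Python3, Bernd Klein', 'revenue': 74.97},
]
```

In the notebook, the check would be `records_match(items.toPandas().to_dict(orient='records'), expected)`.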