In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 2 * matplotlib.rcParams['savefig.dpi']

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

## Introduction

Recall from the previous lesson on best-fit lines:

1. Given some $x_i$ and $y_i$, we can use a *best-fit line* to approximate values of $y$ at unmeasured locations $x$. The line has an equation of the form $y = mx + b$.
2. By using a higher-order polynomial instead of a line, we can approximate data that follows a curve.
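
As a quick refresher, both kinds of fit can be sketched with `np.polyfit`. The points below (`x_demo`, `y_demo`) are made up for illustration and are not data from this lesson; they lie exactly on $y = x^2 + 1$, so the order-2 fit should recover that curve while the order-1 fit can only approximate it.

```python
import numpy as np

# Hypothetical sample points lying exactly on y = x**2 + 1 (illustration only)
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = x_demo**2 + 1.0

# Order-1 fit: polyfit returns [m, b] for the line y = m*x + b
m, b = np.polyfit(x_demo, y_demo, 1)

# Order-2 fit: polyfit returns [c2, c1, c0] for y = c2*x**2 + c1*x + c0
c2, c1, c0 = np.polyfit(x_demo, y_demo, 2)

print("line:     m = %.3f, b = %.3f" % (m, b))
print("parabola: c2 = %.3f, c1 = %.3f, c0 = %.3f" % (c2, c1, c0))
```

Note that `np.polyfit` returns coefficients from the highest power down, which is the order `np.poly1d` expects.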

This module explores a practical example of this technique.

### Problem statement
([inspiration](http://classroom.synonym.com/everyday-examples-situations-apply-quadratic-equations-10200.html))

Suppose your company sells a product at price $p$. You have been put in charge of determining the best price. If you set the price too low, such as $p=0$, your company will clearly earn no revenue. Additionally, if the price is too high, nobody will buy your product and again your revenue is zero. How can we find the best price?

### Your experiment
In order to determine the best price, you run an experiment. You suspect that the best price is between 40 and 75. At each of your 7 retail outlets, you set a different price.

In [None]:
n = 7   # Number of stores

# At each store, you choose a different price.
# The prices are evenly spaced from 40 to 75
p = np.round(np.linspace(40,75,n),decimals=2)
print(p)

### Results of your experiment
Each store sells the product at its prescribed price. At the end of the month, the stores report how many items they sold:

In [None]:
# Number of sales of product
sales = np.array([59,54,47,43,37,31,25])

# Revenue gained from sales
r = p*sales

You make a pretty table to give to your boss:

In [None]:
# Header row, then one row per store (store IDs start at 1)
print("%15s %15s  %15s" % ("Store ID", "Product price", "1-month revenue"))
for j in range(n):
    print("%15d %15.2f  %15.2f" % (j+1, p[j], r[j]))

Your boss looks at the list and says "Great work! We'll set the price at 45.83 for all stores!"

As a data scientist, though, you know better. In the real world, measurements fluctuate. Some stores do better than average and others do worse just by random chance. What can we do?

Let's visualize the data:

In [None]:
# Draw the plot
plt.plot(p,r,'s')

# Label the axes and title
plt.xlabel("Product Price")
plt.ylabel("1-month revenue")
plt.title("Results of experiment")

# Set the display size
plt.xlim([35,80])
plt.ylim([1800,2600])

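We know from economics that price/revenue curves can be modelled by a parabola (an order-2 polynomial). One way to see why: if the number of sales falls roughly linearly with price, $q \approx \alpha - \beta p$, then the revenue $r = pq \approx \alpha p - \beta p^2$ is quadratic in $p$. Let's fit the data using `np.polyfit()`: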

In [None]:
# Get the polynomial coefficients for an order-2 polynomial
a = np.polyfit(p,r,2)

# Define a fit function
fit_func = np.poly1d(a)

# Define a range for plotting
pfit = np.linspace(40,75,100)

# Evaluate fit
rfit = fit_func(pfit)

In [None]:
# Plot the fit along with the original data
plt.plot(p,r,'s')
plt.plot(pfit,rfit,'k-')

plt.xlabel("Product Price")
plt.ylabel("1-month revenue")
plt.title("Results of experiment")
plt.legend(["Measurements","Fit"])

plt.xlim([35,80])
plt.ylim([1800,2600])
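
Before reading an optimum off the curve, it can help to check how closely the parabola follows the measurements. The following is a minimal sketch, not part of the original analysis: it rebuilds the same fit, then prints the residual at each store and the coefficient of determination $R^2$. The names `residuals`, `ss_res`, `ss_tot` and `r_squared` are introduced here for illustration.

```python
import numpy as np

# Data from the experiment above
p = np.round(np.linspace(40, 75, 7), decimals=2)
sales = np.array([59, 54, 47, 43, 37, 31, 25])
r = p * sales

# Order-2 least-squares fit, as in the lesson
fit_func = np.poly1d(np.polyfit(p, r, 2))

# Residuals: measured revenue minus fitted revenue at each store
residuals = r - fit_func(p)

# Coefficient of determination R^2 (1.0 would be a perfect fit)
ss_res = np.sum(residuals**2)
ss_tot = np.sum((r - r.mean())**2)
r_squared = 1.0 - ss_res / ss_tot

for price, res in zip(p, residuals):
    print("price %6.2f   residual %8.2f" % (price, res))
print("R^2 = %.3f" % r_squared)
```

Residuals that are small compared with the spread in revenue, and that show no obvious trend with price, suggest the parabola is a reasonable model over the tested range.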

Using the fit, you deduce that the naive optimal price your boss suggested was likely a fluke! The optimal product price is closer to 50.

Next, we can find the optimal price using `scipy.optimize.minimize()`.

(Note: There is no `maximize()` in `scipy`. To find a maximum, we compute the negative revenue and call `minimize()`.)

In [None]:
# Minimize the negated fit, starting the search from an initial guess of 50
opt = minimize(-fit_func, 50) # Q: What is the type of opt?
print(opt)

In [None]:
# The opt object contains our optimal price. Let's print it out:
print("Optimal price: {}".format(np.round(opt['x'][0], 2)))
print("Revenue at optimal price: {}".format(np.round(-opt['fun'], 2)))
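
As a cross-check on the optimizer, the maximum of a downward-opening parabola $a_2 p^2 + a_1 p + a_0$ can also be found analytically at its vertex, $p^* = -a_1 / (2 a_2)$. The sketch below is an addition to the lesson: it compares that closed-form value with the result of `minimize()`. The names `p_vertex` and `p_numeric` are introduced here, and the objective is written as an explicit `lambda` so that it returns a plain scalar.

```python
import numpy as np
from scipy.optimize import minimize

# Data and order-2 fit from the lesson
p = np.round(np.linspace(40, 75, 7), decimals=2)
sales = np.array([59, 54, 47, 43, 37, 31, 25])
r = p * sales
a = np.polyfit(p, r, 2)
fit_func = np.poly1d(a)

# Closed-form vertex; it is a maximum only when the leading coefficient is negative
a2, a1, a0 = a
p_vertex = -a1 / (2.0 * a2)

# Numerical optimum: minimize the negative revenue, starting from a guess of 50
opt = minimize(lambda x: -fit_func(x[0]), 50)
p_numeric = opt.x[0]

print("Leading coefficient a2: %.4f" % a2)
print("Vertex formula:         %.4f" % p_vertex)
print("scipy minimize():       %.4f" % p_numeric)
```

The two values should agree to within the optimizer's tolerance. For a quadratic the vertex formula is all that is needed, but `minimize()` carries over unchanged to fit functions that have no closed-form maximum.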

### Final thoughts

The analysis here gives a better result than naively choosing the observed price $p$ with the highest revenue $r$. The improvement comes from using a statistical technique (least-squares regression, which is linear in the polynomial coefficients) to approximate the whole price-revenue curve rather than relying on a single measurement. This procedure smooths out some of the noise inherent in the measurements.
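
One way to see this smoothing at work is a leave-one-out check, sketched below as an addition to the lesson. Each store is dropped in turn and both choices are recomputed: the naive one (the observed price with the highest revenue) and the fitted one (the vertex of the refitted parabola). The names `raw_best` and `fitted_best` are introduced here for illustration.

```python
import numpy as np

# Data from the experiment above
p = np.round(np.linspace(40, 75, 7), decimals=2)
sales = np.array([59, 54, 47, 43, 37, 31, 25])
r = p * sales

raw_best = []     # best observed price with one store left out
fitted_best = []  # vertex of the parabola fitted without that store

for j in range(len(p)):
    keep = np.arange(len(p)) != j   # boolean mask that drops store j
    p_j, r_j = p[keep], r[keep]

    # Naive choice: the observed price with the highest revenue
    raw_best.append(p_j[np.argmax(r_j)])

    # Fitted choice: vertex -a1/(2*a2) of the order-2 fit
    # (a maximum as long as a2 is negative)
    a2, a1, a0 = np.polyfit(p_j, r_j, 2)
    fitted_best.append(-a1 / (2.0 * a2))

raw_best = np.array(raw_best)
fitted_best = np.array(fitted_best)

print("naive  best price ranges over %.2f to %.2f" % (raw_best.min(), raw_best.max()))
print("fitted best price ranges over %.2f to %.2f" % (fitted_best.min(), fitted_best.max()))
```

If the fitted optimum moves much less than the naive one when a single store is removed, that indicates the fit depends less on any one noisy measurement.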

In practice, there could be many other considerations which affect the price in different stores. An obvious example is that some stores may be situated in more populated areas. Can you think of a way to control for this? Can you think of other factors which would affect the results?