# Anomaly detection (in non-sequential data).

Anomaly detection algorithms detect observations that are significantly different from most of what you've seen before.

One classic example here is in detecting credit card fraud: how do we automatically detect purchases that a legitimate credit card owner is very unlikely to have made?

Another is in systems security: how do we detect activity on a network that's unlikely to be caused by a legitimate user?

The simplest possible case for anomaly detection is observational data with a single, normally distributed feature. 


Let's draw a data sample from a normal distribution:

In [None]:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import scipy as sp
import scipy.stats  # make sure the sp.stats submodule is loaded
import pandas as pd
import seaborn as sns

%matplotlib inline

sns.set_style('darkgrid')
plotBlue = sns.color_palette()[0]
np.random.seed(3)

N = 1000
X1 = np.random.normal(4, 12, N)
f, axes = plt.subplots(nrows=2, sharex=True)
axes[0].set_xlim(-50, 50)
axes[0].scatter(X1, np.zeros(N), marker='x', c=plotBlue)
axes[1].hist(X1, bins=50)
plt.show()

To model this data as a normal distribution, we compute the mean and the standard deviation from the sample we have.

In [None]:
sample_mean = X1.mean()
sample_sigma = X1.std()
print('Sample Mean:', sample_mean)
print('Sample Standard Deviation:', sample_sigma)

Now we just have to decide on some $\epsilon$ value, which dictates our probability threshold for anomalous events. 

If we set $\epsilon$ to .01, we're saying that any draw that falls in the most extreme 1% of the fitted normal distribution should be marked as anomalous.
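
The link between $\epsilon$ and a distance from the mean can be sketched with the inverse normal CDF. This is a minimal example, assuming $\epsilon$ is read as the total probability in both tails; the helper name `z_for_epsilon` is hypothetical, and the mean and standard deviation are the ones used to generate the sample above.

```python
from scipy import stats

def z_for_epsilon(epsilon):
    """Number of standard deviations that leaves `epsilon` in the two tails combined."""
    return stats.norm.ppf(1 - epsilon / 2)

mu, sigma = 4.0, 12.0  # parameters used to generate the sample

for epsilon in (0.10, 0.05, 0.01, 0.001):
    z = z_for_epsilon(epsilon)
    lower, upper = mu - z * sigma, mu + z * sigma
    print(f"epsilon={epsilon}: z={z:.3f}, normal range=({lower:.2f}, {upper:.2f})")
```

Smaller values of $\epsilon$ push the bounds further from the mean, so fewer draws are flagged.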

In general, if a variable follows the normal model with mean $\mu$ and standard deviation $\sigma$, then the
**confidence interval** that contains a chosen fraction of its values is:

$$ \mu \pm z \times \sigma $$

where $z$ corresponds to the confidence level selected:

| Confidence level | $z$ value |
|---|---|
| 90% | 1.65 |
| 95% | 1.96 |
| 99% | 2.58 |
| 99.9% | 3.291 |


These values are the upper and lower bounds for what we consider 'normal'. In the graph below, the area shaded in red lies outside these bounds and marks the anomalous region. Our estimate for the distribution therefore looks like this:

In [None]:
base = np.linspace(-50, 50, 100)
normal = sp.stats.norm.pdf(base, sample_mean, sample_sigma)

lower_bound = sample_mean - (2.58 * sample_sigma)
upper_bound = sample_mean + (2.58 * sample_sigma)
# Boolean mask of the points that fall outside the bounds
anomalous = (base < lower_bound) | (base > upper_bound)

plt.plot(base, normal)
plt.fill_between(base, normal, where=anomalous, color=[1, 0, 0, 0.4])
plt.xlim(-50, 50)
plt.show()
print('Lower Bound:', lower_bound)
print('Upper Bound:', upper_bound)

Let's look at two sample draws to see if they're anomalous.

In [None]:
plt.scatter(X1, np.zeros(N), marker='x', c=plotBlue)
plt.xlim(-50, 50)
plt.scatter(-29, 0, marker='x', color='red', s=150, linewidths=3)
plt.scatter(17, 0, marker='x', color='green', s=150, linewidths=3)
plt.axvline(lower_bound, ymin=.25, ymax=.75, color='red', linewidth=1)
plt.axvline(upper_bound, ymin=.25, ymax=.75, color='red', linewidth=1)
plt.show()

We'll now expand our analysis to multiple variables. Initially we will assume that they are **independent** and normally distributed.

In [None]:
N = 1000
X1 = np.random.normal(4, 12, N)
X2 = np.random.normal(9, 5, N)
plt.scatter(X1, X2, c=plotBlue, alpha=0.5)
plt.show()

As before, we can estimate the means and standard deviations of the normal distributions through the samples.

In [None]:
x1_sample_mean = X1.mean()
x2_sample_mean = X2.mean()
x1_sample_sigma = X1.std()
x2_sample_sigma = X2.std()
print('Sample Mean 1:', x1_sample_mean)
print('Sample Mean 2:', x2_sample_mean)
print('Sample Standard Deviation 1:', x1_sample_sigma)
print('Sample Standard Deviation 2:', x2_sample_sigma)

As we would expect, these are not far from the actual values we used to generate the data.

Next, let's look at a heatmap of where we would expect to find observations given the joint probability density implied by these two distributions.

In [None]:
delta = 0.025
x1 = np.arange(-60, 60, delta)
x2 = np.arange(-15, 30, delta)
x, y = np.meshgrid(x1, x2)

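# Independent features: the joint density is the product of the two marginal densities.
# (mlab.bivariate_normal is no longer available in current matplotlib releases.)
z = (sp.stats.norm.pdf(x, x1_sample_mean, x1_sample_sigma)
     * sp.stats.norm.pdf(y, x2_sample_mean, x2_sample_sigma))
plt.contourf(x, y, z, cmap='bwr')

# Plot a random subset of 300 points so the heatmap stays visible
thinned_points = np.random.choice(N, 300, replace=False)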
plt.scatter(X1[thinned_points], X2[thinned_points], c='yellow', alpha=0.5)

plt.show()

As we move in towards the means, we're increasingly likely to draw an observation with those features. As we move away, we're less likely to see an observation with features at those values. We might, for instance, decide that anything in the dark-blue region is anomalous.

We need a way to calculate the probability density of a data point under a normal distribution with a given set of parameters. Fortunately, SciPy has this built in.

In [None]:
from scipy import stats  

# Stack the two features into an (N, 2) array
X = np.column_stack((X1, X2))
dist = stats.norm(x1_sample_mean, x1_sample_sigma)  
dist.pdf(X[:,0])[:50]

We just calculated the probability density of each of the first 50 values of our data set's first dimension under the distribution that we defined earlier by estimating the mean and standard deviation for that dimension.

Essentially it's computing how far each instance is from the mean and how that compares to the "typical" distance from the mean for this data.
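
To make the "distance from the mean" reading concrete, the density can be written in terms of the standardized distance $z = (x - \mu) / \sigma$. The sketch below compares that formula with `stats.norm.pdf`. It uses the generating parameters (4, 12) and the two sample draws (-29 and 17) from earlier cells purely as illustrative inputs.

```python
import numpy as np
from scipy import stats

mu, sigma = 4.0, 12.0
x = np.array([-29.0, 4.0, 17.0])

# Standardized distance from the mean, in units of sigma
z = (x - mu) / sigma

# Normal density written out explicitly in terms of z
manual_pdf = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2 * np.pi))
library_pdf = stats.norm(mu, sigma).pdf(x)

for xi, zi, pi in zip(x, z, manual_pdf):
    print(f"x={xi:6.1f}  z={zi:5.2f}  density={pi:.5f}")
```

The further a value is from the mean in units of $\sigma$, the lower its density, which is what the threshold is later applied to.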

Let's compute and save the probability density of each of the values in our data set given the Gaussian model parameters we calculated above.

In [None]:
p = np.zeros((X.shape[0], X.shape[1]))  
p[:,0] = stats.norm(x1_sample_mean, x1_sample_sigma).pdf(X[:,0])  
p[:,1] = stats.norm(x2_sample_mean, x2_sample_sigma).pdf(X[:,1])

In [None]:
outliers = np.where(p < 0.0009)

fig, ax = plt.subplots(figsize=(6,4))  
ax.scatter(X[:,0], X[:,1], alpha=0.4)  
ax.scatter(X[outliers[0],0], X[outliers[0],1], s=30, color='r', marker='o')  

The threshold value can be selected by using the $F_1$ score on a cross-validation set where true anomalies should be manually labeled.

The $F_1$ score is a measure of a test's accuracy. It considers both the precision $p$ and the recall $r$ of the test:

$$ F_1 = 2 \cdot \frac{p \cdot r}{p + r} $$

where $tp$, $fp$ and $fn$ are the numbers of true positives, false positives and false negatives:

$$ p = \frac{tp}{tp+fp} $$

$$ r = \frac{tp}{tp+fn} $$
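
A small worked example may help here. The labels and predictions below are made up for illustration (1 marks an anomaly), and the helper name `precision_recall_f1` is hypothetical.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = anomaly)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_pred & y_true)    # flagged and truly anomalous
    fp = np.sum(y_pred & ~y_true)   # flagged but actually normal
    fn = np.sum(~y_pred & y_true)   # missed anomalies
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

y_true = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [0, 1, 1, 0, 1, 0, 0, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here three anomalies are caught, one normal point is flagged by mistake and one anomaly is missed, so precision, recall and $F_1$ all come out at 0.75.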

In [None]:
def select_threshold(pval, yval):
    # Scan candidate thresholds and keep the one with the best F1 score.
    # pval: densities of the validation points, yval: labels (1 = anomaly).
    best_epsilon = 0
    best_f1 = 0

    step = (pval.max() - pval.min()) / 1000

    for epsilon in np.arange(pval.min(), pval.max(), step):
        preds = pval < epsilon

        tp = np.sum(np.logical_and(preds == 1, yval == 1)).astype(float)
        fp = np.sum(np.logical_and(preds == 1, yval == 0)).astype(float)
        fn = np.sum(np.logical_and(preds == 0, yval == 1)).astype(float)

        # With no true positives, precision and recall are zero or undefined
        if tp == 0:
            continue

        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = (2 * precision * recall) / (precision + recall)

        if f1 > best_f1:
            best_f1 = f1
            best_epsilon = epsilon

    return best_epsilon, best_f1

## Exercise

In this exercise, you will implement the anomaly detection algorithm and apply it to detect failing servers on a network.

The features measure the throughput (mb/s) and latency (ms) of response of each server. While your servers were operating, you collected $m = 307$ examples of how they were behaving, and thus have an unlabeled dataset $(x_1, \dots, x_m)$.

You suspect that the vast majority of these examples are “normal” (non-anomalous) examples of the servers operating normally, but there might also be some examples of servers acting anomalously within this dataset.

You will use a Gaussian model to detect anomalous examples in your dataset.

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(context="notebook", style="white", palette=sns.color_palette("RdBu"))

import numpy as np
import pandas as pd
from scipy import stats

from sklearn.model_selection import train_test_split

+ Read the CSV file ``files/ex8data1.csv``. 

In [None]:
# Your code here

+ Visualize the distribution

In [None]:
# Your code here

+ Assume the features are independent. Create a simple function that calculates the mean and variance for each feature in our data set.

In [None]:
# Your code here

+ Find (manually) a threshold value for detecting outliers. Visualize the result.

In [None]:
# Your code here

### Advanced Methods: One-class SVM with non-linear kernel (RBF)

One-class SVM is an unsupervised algorithm that learns a decision function for novelty detection: classifying new data as similar or different to the training set.

`nu`: an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors.
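
The effect of `nu` can be checked empirically. The sketch below fits `OneClassSVM` on synthetic two-cluster data (an assumption chosen for illustration) and reports, for a few values of `nu`, the fraction of training points flagged as outliers and the fraction that become support vectors. The helper name `nu_report` is hypothetical.

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
cluster = 0.3 * rng.randn(100, 2)
X_train = np.r_[cluster + 2, cluster - 2]  # two tight clusters, 200 points

def nu_report(nu):
    """Return (fraction flagged as outliers, fraction of support vectors)."""
    clf = svm.OneClassSVM(nu=nu, kernel="rbf", gamma=0.1).fit(X_train)
    flagged = np.mean(clf.predict(X_train) == -1)
    support = len(clf.support_) / len(X_train)
    return flagged, support

for nu in (0.05, 0.1, 0.5):
    flagged, support = nu_report(nu)
    print(f"nu={nu}: flagged={flagged:.3f}, support vectors={support:.3f}")
```

In practice the flagged fraction tends to sit close to `nu`, so `nu` acts roughly as the share of training data you are willing to treat as anomalous.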

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager
from sklearn import svm

xx, yy = np.meshgrid(np.linspace(-5, 5, 500), np.linspace(-5, 5, 500))
# Generate train data
X = 0.3 * np.random.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * np.random.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
# Generate some abnormal novel observations
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))

# fit the model
clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
n_error_train = y_pred_train[y_pred_train == -1].size
n_error_test = y_pred_test[y_pred_test == -1].size
n_error_outliers = y_pred_outliers[y_pred_outliers == 1].size

# plot the line, the points, and the nearest vectors to the plane
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.title("Novelty Detection")
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
a = plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
plt.contourf(xx, yy, Z, levels=[0, Z.max()], colors='palevioletred')

s = 40
b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white', s=s)
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='blueviolet', s=s)
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='gold', s=s)
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
# ContourSet.collections is not available in recent matplotlib; use legend_elements()
plt.legend([a.legend_elements()[0][0], b1, b2, c],
           ["learned frontier", "training observations",
            "new regular observations", "new abnormal observations"],
           loc="upper left",
           prop=matplotlib.font_manager.FontProperties(size=11))
plt.xlabel(
    "error train: %d/200 ; errors novel regular: %d/40 ; "
    "errors novel abnormal: %d/40"
    % (n_error_train, n_error_test, n_error_outliers))
plt.show()

### Advanced Methods: Robust linear model estimation using RANSAC.

Random sample consensus (RANSAC) is an iterative method to estimate parameters of a mathematical model from a set of observed data that contains outliers, when outliers are to be accorded no influence on the values of the estimates. Therefore, it also can be interpreted as an outlier detection method.

The input to the RANSAC algorithm is a set of observed data values, a way of fitting some kind of model to the observations, and some confidence parameters. RANSAC achieves its goal by repeating the following steps:

+ Select a random subset of the original data. Call this subset the hypothetical inliers.
+ A model is fitted to the set of hypothetical inliers.
+ All other data are then tested against the fitted model. Those points that fit the estimated model well, according to some model-specific loss function, are considered as part of the consensus set.

The estimated model is reasonably good if sufficiently many points have been classified as part of the consensus set.
Afterwards, the model may be improved by reestimating it using all members of the consensus set.

This procedure is repeated a fixed number of times, each time producing either a model which is rejected because too few points are part of the consensus set, or a refined model together with a corresponding consensus set size. In the latter case, we keep the refined model if its consensus set is larger than the previously saved model.

From Wikipedia [https://en.wikipedia.org/wiki/Random_sample_consensus]
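
The steps above can be sketched from scratch for a straight-line model. This is a simplified illustration, not the scikit-learn implementation: it samples two points per iteration, keeps the largest consensus set, and refits only once at the end. The function name `ransac_line`, its parameters and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def ransac_line(x, y, n_iter=200, residual_threshold=1.0, min_consensus=10, seed=0):
    """Fit y = slope * x + intercept while ignoring gross outliers."""
    rng = np.random.default_rng(seed)
    best_mask, best_size = None, 0
    for _ in range(n_iter):
        # Select a random subset (two points define a candidate line)
        idx = rng.choice(len(x), size=2, replace=False)
        # Fit the model to the hypothetical inliers
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        # Test all data against the model: the consensus set
        mask = np.abs(y - (slope * x + intercept)) < residual_threshold
        size = int(mask.sum())
        # Reject models with too small a consensus set; keep the largest one
        if size >= min_consensus and size > best_size:
            best_mask, best_size = mask, size
    if best_mask is None:
        raise ValueError("no candidate model reached the minimum consensus size")
    # Re-estimate the model using all members of the best consensus set
    slope, intercept = np.polyfit(x[best_mask], y[best_mask], deg=1)
    return slope, intercept, best_mask

# Synthetic data: a line with slope 2 plus a cluster of gross outliers
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.2, size=x.size)
y[:20] = 30.0 + rng.normal(scale=3.0, size=20)

slope_ols, intercept_ols = np.polyfit(x, y, deg=1)
slope_r, intercept_r, inliers = ransac_line(x, y)
print("Ordinary least squares slope:", slope_ols)
print("RANSAC slope:", slope_r)
print("Points marked as outliers:", int((~inliers).sum()))
```

The points left out of the final consensus set are the detected outliers, which is why RANSAC can double as an outlier detection method.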

In [None]:
import numpy as np
from matplotlib import pyplot as plt

from sklearn import linear_model, datasets


n_samples = 1000
n_outliers = 50


X, y, coef = datasets.make_regression(n_samples=n_samples, n_features=1,
                                      n_informative=1, noise=10,
                                      coef=True, random_state=0)

# Add outlier data
np.random.seed(0)
X[:n_outliers] = 3 + 0.5 * np.random.normal(size=(n_outliers, 1))
y[:n_outliers] = -3 + 10 * np.random.normal(size=n_outliers)

# Fit line using all data
model = linear_model.LinearRegression()
model.fit(X, y)

# Robustly fit linear model with RANSAC algorithm
model_ransac = linear_model.RANSACRegressor(linear_model.LinearRegression())
model_ransac.fit(X, y)
inlier_mask = model_ransac.inlier_mask_
outlier_mask = np.logical_not(inlier_mask)

# Predict data of estimated models
line_X = np.arange(-5, 5)
line_y = model.predict(line_X[:, np.newaxis])
line_y_ransac = model_ransac.predict(line_X[:, np.newaxis])

# Compare estimated coefficients
print("Estimated coefficients (true, normal, RANSAC):")
print(coef, model.coef_, model_ransac.estimator_.coef_)

lw = 2
plt.scatter(X[inlier_mask], y[inlier_mask], color='yellowgreen', marker='.',
            label='Inliers')
plt.scatter(X[outlier_mask], y[outlier_mask], color='gold', marker='.',
            label='Outliers')
plt.plot(line_X, line_y, color='navy', linestyle='-', linewidth=lw,
         label='Linear regressor')
plt.plot(line_X, line_y_ransac, color='cornflowerblue', linestyle='-',
         linewidth=lw, label='RANSAC regressor')
plt.legend(loc='lower right')
plt.show()

# Anomaly detection (in time series).

People spend a lot of time watching the data.

### The problem

It’s hard to manually spot when something has changed enough to care about. If something has changed, it’s hard to identify why.

### The solution

Implement a system which watches for unexpected changes. When a change occurs, offer likely explanations for the change so people can investigate.

### Examples

Let's suppose we are an online retailer. The business metrics we are interested in are:

+ Number of orders
+ Total value of orders
+ Average value of order
+ Number of web visits
+ Order rate of web visits
+ Bounce rate

And the possible origins of change are:

+ Country
+ Device type (e.g. mobile)
+ Web landing page
+ Web traffic source
+ Retailer

Expected explanations:

+ One retailer forgets to send us order data. > The number of orders that we make is lower than expected.
+ Checkout is broken for mobile web. > The number of orders that we make is lower than expected.
+ One retailer is having a sale. > The average value of an order falls.
+ It’s sale season in a particular country. > The average value of an order falls.
+ A common landing point of our website is broken. > Bounce rate increases.

There are many types of signals, and "change" can mean very different things.

**Broken trend or seasonality**: These are monthly averages of daily calls to directory assistance, from January 1962 to December 1976. We see a sudden drop in activity in this signal.

In [None]:
signal = [350,339,351,364,369,331,331,340,346,341,357,398,381,367,383,375,353,361,375, \
          371,373,366,382,429,406,403,429,425,427,409,402,409,419,404,429,463,428,449, \
          444,467,474,463,432,453,462,456,474,514,489,475,492,525,527,533,527,522,526, \
          513,564,599,572,587,599,601,611,620,579,582,592,581,630,663,638,631,645,682, \
          601,595,521,521,516,496,538,575,537,534,542,538,547,540,526,548,555,545,594, \
          643,625,616,640,625,637,634,621,641,654,649,662,699,672,704,700,711,715,718, \
          652,664,695,704,733,772,716,712,732,755,761,748,748,750,744,731,782,810,777, \
          816,840,868,872,811,810,762,634,626,649,697,657,549,162,177,175,162,161,165, \
          170,172,178,186,178,178,189,205,202,185,193,200,196,204,206,227,225,217,219, \
          236,253,213,205,210,216,218,235,241]
plt.plot(signal, alpha = 0.5)
plt.title('Example: Directory Assistance Calls')

**Aberration in periodicity**: This is a fairly regular sinusoidal signal, shown with an anomaly where one of the waves is squashed.

In [None]:
signal = np.sin(np.linspace(0, 15*np.pi, num=300))
signal[105:155] *= 0.1
signal = 10 * signal + 50

noise = np.random.normal(scale = 1.5, size=300)
signal = signal + noise

plt.ylim(0,100)
plt.plot(signal)
plt.title('Example: periodic signal anomaly')

## Static Mean Detector

We'll start with the most basic signal we can consider: a single value, repeated, that changes at some point.

In [None]:
sig1 = np.ones(150)
sig1[:100] *= 50
sig1[100:] *= 40

# Size of change in our test signal
jump_size = sig1[0] - sig1[-1]

# We'll add a small amount of Gaussian noise (standard deviation 0.1 x jump_size) to the signal.
noise = np.random.normal(
    size=sig1.shape,
    scale=jump_size * 0.1)

sig1 = sig1 + noise

plt.figure(figsize=(15, 5))
plt.plot(sig1, 'b.')
plt.plot(sig1, '-', alpha=0.2)
plt.ylim(0,100)
plt.title("sig1 : A trivial signal")

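The simplest detector tracks the mean of the signal at each step and applies a stopping rule: it reports a change when incoming values differ from that mean by more than some threshold. The version below compares a short rolling mean (3 samples) with a longer one (20 samples) and marks the point where the two differ most: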

In [None]:
import pandas as pd
ser = pd.Series(sig1)
mean1 = ser.rolling(window=3,center=False).mean()
mean2 = ser.rolling(window=20,center=False).mean()
mean_dif = abs(mean1 - mean2)
change = mean_dif.argmax()

In [None]:
plt.figure(figsize=(15, 5))
plt.plot(sig1, 'b.', alpha=0.5)
plt.plot(mean1, '-')
plt.plot(mean2, '-')
plt.plot(mean_dif*5, '-')
# ymin/ymax are axes fractions in [0, 1]; the defaults already span the full height
plt.axvline(x=change, linewidth=2, color='k')
plt.ylim(0,100)
plt.title("sig1 : A trivial signal")

This change detection method has several weaknesses:

+ It is sensitive to the threshold value, which we are determining manually.
+ It is sensitive to anomalous values and outliers.
+ The signal must be constant: the detector doesn't work well with drift (trend) or local variation (seasonality).
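
The threshold sensitivity can be illustrated with a literal version of the stopping rule. The sketch below compares each incoming value with the running mean of everything seen before it. The function name `static_mean_detector`, the `warmup` parameter and the noise level are assumptions made for this example.

```python
import numpy as np

def static_mean_detector(signal, threshold=0.1, warmup=5):
    """Return the first index whose value differs from the running mean of all
    earlier values by more than `threshold` (as a fraction of that mean)."""
    total = 0.0
    for i, value in enumerate(signal):
        if i >= warmup:
            mean = total / i
            if abs(value - mean) > threshold * abs(mean):
                return i
        total += value
    return None

# A constant signal that drops from 50 to 40 at index 100, plus Gaussian noise
rng = np.random.default_rng(0)
sig = np.concatenate([np.full(100, 50.0), np.full(50, 40.0)])
sig = sig + rng.normal(scale=1.0, size=sig.shape)

good = static_mean_detector(sig, threshold=0.1)
too_tight = static_mean_detector(sig, threshold=0.02)
print("Threshold 10%:", good)
print("Threshold 2%:", too_tight)
```

With a 10% threshold the detector stops at the real change. With a 2% threshold it stops early on ordinary noise, which is the first weakness in the list above.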

In [None]:
# Create a seasonal signal
# I imagined a metric that rises from 0 to 5 each calendar month

sig2 = np.linspace(0, 5, num=30)
sig2 = np.concatenate([sig2 for _ in range(12)])  # range, not xrange, in Python 3

# Add a jump
jump_size = 5
sig2[250:] = sig2[250:] + jump_size

# Noise
noise = np.random.normal(
    size=sig2.shape,
    scale=jump_size * 0.1)

plt.figure(figsize=(15,5))
plt.plot(sig2 + noise, 'b.', linestyle='')
plt.plot(sig2 + noise, 'b-', alpha=0.15)
plt.ylim(0,15)
plt.xlim(0,365)
plt.title("Imaginary Seasonal signal")

In [None]:
ser = pd.Series(sig2)+noise
mean1 = ser.rolling(window=3,center=False).mean()
mean2 = ser.rolling(window=20,center=False).mean()
mean_dif = abs(mean1 - mean2)
change = mean_dif.argmax()

plt.figure(figsize=(15, 5))
plt.plot(ser, 'b.', alpha=0.5)
plt.plot(mean1, '-')
plt.plot(mean2, '-')
plt.plot(mean_dif, '-')
# ymin/ymax are axes fractions in [0, 1]; the defaults already span the full height
plt.axvline(x=change, linewidth=2, color='k')
plt.ylim(0,15)
plt.xlim(0,365)
plt.title("sig2 : A non-trivial signal")

## Exercise

Write a method to eliminate seasonality and apply a static mean detector.

In [None]:
# Your code.

## What is an anomaly? (v1.0)

An anomaly is a point which deviates from our expectation by a significant margin.

> Statistics to the rescue: So we can totally just use $[\mu \pm 2\sigma]$, right? 

No! Things change over time.

In [None]:
import numpy as np
from matplotlib import pyplot as plt
from sklearn import linear_model, datasets
%matplotlib inline

n_samples = 100
X, y, coef = datasets.make_regression(n_samples=n_samples, n_features=1,
                                      n_informative=1, noise=5,
                                      coef=True, random_state=0)

X[50]= [-0.90729836]
y[50]= -75.759

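# Looking at y alone, the modified point (red) does not stand out
plt.plot(np.zeros(n_samples), y, '.g', label='Inliers')
plt.plot([0], [y[50]], '.r', label='Outlier')

plt.show()

# Plotted against X, the same point is clearly far from the trend
plt.plot(X, y, '.g', label='Inliers')
plt.plot(X[50], y[50], '.r', label='Outlier')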

plt.show()

Recent data should carry more weight. 

## What is an anomaly? (v2.0)

An anomaly is a point which deviates from our **expectation** by a **significant** margin.

Our expectation should be more dependent on the recent past than the whole history.

Our definition of significant should be more dependent on the recent past than the whole history. 
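
One simple way to make both quantities depend on the recent past is a rolling window. This is an illustrative alternative to the exponentially weighted approach discussed in this notebook, not the same method. The sketch below uses a made-up drifting signal with one spike and compares a global $\mu \pm k\sigma$ rule with a rolling one; the window length, $k = 3$ and the signal itself are assumptions.

```python
import numpy as np
import pandas as pd

# Drifting signal with a regular wiggle and a single spike at index 60
n = 200
i = np.arange(n)
values = 0.1 * i + np.where(i % 2 == 0, 1.0, -1.0)
values[60] += 8.0
ser = pd.Series(values)

k = 3.0

# Global rule: expectation and spread come from the whole history
global_flags = (ser - ser.mean()).abs() > k * ser.std()

# Rolling rule: expectation and spread come from the previous 20 values only
recent = ser.shift(1).rolling(window=20)
rolling_flags = (ser - recent.mean()).abs() > k * recent.std()

flagged_global = global_flags[global_flags].index.tolist()
flagged_rolling = rolling_flags[rolling_flags].index.tolist()
print("Global rule flags:", flagged_global)
print("Rolling rule flags:", flagged_rolling)
```

Because of the drift, the global spread is wide enough to hide the spike, while the rolling rule sees it clearly against its local context.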

### Idea

Predict the future using an exponentially weighted mean of the past. This takes all of the past into account, but weights the most recent past as more predictive. This is exponential smoothing, the basis of Holt's method.

Imagine a weighted average where we consider all of the data points, while assigning exponentially smaller weights as we go back in time. For example if we started with 0.9, our weights would be (going back in time):

$$ 0.9^1, 0.9^2, 0.9^3 ... $$

Or

$$ 0.9, 0.81, 0.729, ... $$

…eventually approaching zero. The smaller the starting weight, the faster it approaches zero.

Only… there is a problem: weights do not add up to 1. The sum of the first 3 numbers alone is already 2.439! 

What assures Holt a permanent place in the history of mathematics is solving this with a succinct and elegant formula:

$$ \hat{y}_x = \alpha \cdot y_x + (1 - \alpha) \cdot \hat{y}_{x-1}  $$

You can think of $\alpha$ as playing the role of the starting weight (0.9) in the example above. It is called the smoothing factor or smoothing coefficient. 

$\alpha$  is a value that dictates how much weight we give the most recent observed value versus the last expected. It’s a kind of a lever that gives more weight to the left side when it’s higher (closer to 1) or the right side when it’s lower (closer to 0): the higher the $\alpha$, the faster the method “forgets”.
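
How quickly the method "forgets" can be quantified by counting the steps it takes to follow a sudden jump. The sketch below feeds a constant input of 1 into a smoother that starts at 0 and counts the steps until 95% of the gap is closed. The helper name `steps_to_adapt` and the 95% target are assumptions chosen for illustration.

```python
def steps_to_adapt(alpha, fraction=0.95):
    """Steps needed for the smoothed value to cover `fraction` of a unit jump."""
    smoothed, steps = 0.0, 0
    while smoothed < fraction:
        # Same recursion as the smoothing formula, with a constant observation of 1
        smoothed = alpha * 1.0 + (1 - alpha) * smoothed
        steps += 1
    return steps

for alpha in (0.1, 0.5, 0.9):
    print(f"alpha={alpha}: {steps_to_adapt(alpha)} steps")
```

With $\alpha = 0.9$ the smoother catches up in 2 steps, with $\alpha = 0.5$ in 5, and with $\alpha = 0.1$ it needs 29.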

Observation:

$$ \begin{eqnarray}
\hat{y}_x & = & \alpha \cdot y_x + (1 - \alpha) \cdot \hat{y}_{x-1} \\
          & = & \alpha \cdot y_x + \alpha \cdot (1 - \alpha) \cdot y_{x-1} + (1 - \alpha)^2 \cdot \hat{y}_{x-2} \\
          & = & \alpha \cdot [ y_x + (1 - \alpha) \cdot y_{x-1} + (1 - \alpha)^2 \cdot y_{x-2} + \dots + (1 - \alpha)^{x-1} \cdot y_1] + (1 - \alpha)^{x} \cdot \hat{y}_0
\end{eqnarray} $$

Here $\hat{y}_0 = y_0$ is the initial value of the recursion.
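
The expansion can be checked numerically. The sketch below runs the recursion on a few values, rebuilds the same result from the explicit weights $\alpha (1 - \alpha)^k$ plus the weight $(1 - \alpha)^x$ on the initial value, and confirms that those weights sum to 1. The function name `smooth_last` is hypothetical, and the recursion is assumed to start from the first observation.

```python
def smooth_last(series, alpha):
    """Run the smoothing recursion and return the final smoothed value."""
    smoothed = series[0]  # initial value: the first observation
    for y in series[1:]:
        smoothed = alpha * y + (1 - alpha) * smoothed
    return smoothed

series = [3, 9.3, 11.73, 12.87, 12.08]
alpha = 0.7
x = len(series) - 1

# Weights for y_x, y_{x-1}, ..., y_1, followed by the weight on the initial value
weights = [alpha * (1 - alpha) ** k for k in range(x)]
initial_weight = (1 - alpha) ** x

explicit = sum(w * y for w, y in zip(weights, reversed(series[1:])))
explicit += initial_weight * series[0]

print("Recursive:", smooth_last(series, alpha))
print("Explicit: ", explicit)
print("Sum of weights:", sum(weights) + initial_weight)
```

This is how the formula resolves the earlier problem of weights that do not add up to 1.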


Choose $\alpha$ between 0 and 1. Lower values give smoother, more stable predictions, but they are slower to adapt to genuine change.

In [None]:
def exponential_smoothing(series, alpha):
    # given a series and alpha, return series of smoothed points
    smoothed = [series[0]]
    for i in range(1,len(series)):
        smoothed.append(alpha * series[i] + (1 - alpha) * smoothed[i-1])
    return smoothed

series= [3, 9.3, 11.73, 12.87, 12.08, 10.20, 11.82, 12.89, 13.78, 14.65]

In [None]:
plt.plot(np.arange(len(series)), series, 'or', alpha=0.5)
plt.plot(np.arange(len(series)), exponential_smoothing(series, 0.7), 'g', alpha=0.5)
plt.plot(np.arange(len(series)), exponential_smoothing(series, 0.9), 'b', alpha=0.5)


plt.show()

We can measure the exponentially **weighted mean-squared-error** of previous predictions from the actual values. This gives an expected range of current deviation from predicted value. 

In [None]:
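def std_exponential_smoothing(series, alpha, beta):
    # Exponentially weighted estimate of the spread of the prediction errors.
    # alpha smooths the level; beta weights the squared one-step-ahead errors.
    var = np.zeros(len(series))
    smoothed = np.zeros(len(series))
    smoothed[0] = series[0]
    for i in range(1, len(series)):
        smoothed[i] = alpha * series[i] + (1 - alpha) * smoothed[i-1]
        # error of the previous smoothed value used as a prediction of series[i]
        var[i] = (1 - beta) * (var[i-1] + beta * (series[i] - smoothed[i-1])**2)
    return np.sqrt(var)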

In [None]:
series= [3, 9.3, 11.73, 12.87, 12.08, 10.20, 11.82, 12.89, 13.78, 14.65]
pred = exponential_smoothing(series, 0.5)
std = std_exponential_smoothing(series, 0.5, 0.3)

plt.plot(np.arange(len(series)), series, 'or', alpha=0.9)
plt.plot(np.arange(len(series)), pred + std, 'g', alpha=0.5)
plt.plot(np.arange(len(series)), pred - std, 'g', alpha=0.5)
plt.plot(np.arange(len(series)), pred , 'b', alpha=0.5)

In [None]:
series= [3, 9.3, 11.73, 12.87, 12.08, 10.20, 11.82, 12.89, 13.78, 14.65, 8.8, 15.0]
pred = exponential_smoothing(series, 0.5)
std = std_exponential_smoothing(series, 0.5, 0.05)
plt.plot(np.arange(len(series)), series, 'or', alpha=0.9)
plt.plot(np.arange(len(series)), pred + std, 'g', alpha=0.5)
plt.plot(np.arange(len(series)), pred - std, 'g', alpha=0.5)
plt.plot(np.arange(len(series)), pred , 'b', alpha=0.5)
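
The bands can also be turned into an explicit anomaly flag. The sketch below marks a point when its one-step-ahead prediction error exceeds $k$ times the current error estimate. The variance update here is a plain exponentially weighted average of squared errors, which differs slightly from the cell above. The function name `flag_anomalies`, the `warmup` period, $k = 3$ and the test series are all assumptions made for illustration.

```python
import numpy as np

def flag_anomalies(series, alpha=0.5, beta=0.3, k=3.0, warmup=5):
    """Return indices whose prediction error exceeds k weighted standard deviations."""
    level = series[0]  # current smoothed value, used as the next prediction
    var = 0.0          # exponentially weighted mean of squared errors
    flagged = []
    for i in range(1, len(series)):
        error = series[i] - level
        # Compare against the spread estimated from earlier points only
        if i >= warmup and abs(error) > k * np.sqrt(var):
            flagged.append(i)
        var = (1 - beta) * var + beta * error ** 2
        level = alpha * series[i] + (1 - alpha) * level
    return flagged

series = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.2, 10.1, 9.9, 10.0, 15.0, 10.1, 9.9]
flagged = flag_anomalies(series)
print("Anomalous indices:", flagged)
```

Only the spike at index 10 is flagged. The points right after it are not, because the spike itself inflates the error estimate for a while.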