# Percentage of stocks over their simple moving average, compared against a composite index

More information about this Jupyter notebook can be found on [my blog](http://dbadailystuff.com/percentage-of-time-series-over-its-sma-simple-moving-average-compared-against-a-weighted-index).

In [1]:
# IPython magic to create plots within cells
%matplotlib inline

#### Load the libraries we need

In [2]:
import pandas as pd
import numpy as np
import bqplot as bq

from datetime import datetime

In [3]:
print('pandas: {v}'.format(v=pd.__version__))
print('bqplot: {v}'.format(v=bq.__version__))

pandas: 0.23.4
bqplot: 0.11.1


---

## Read and explore the datasets

#### Read the dataset with all S&P 500 components

Filename: `WIKI_PRICES_212b326a081eacca455e13140d7bb9db.csv`

Column header:

`ticker,date,open,high,low,close,volume,ex-dividend,split_ratio,adj_open,adj_high,adj_low,adj_close,adj_volume`

Example row:

`A,1999-11-18,45.5,50.0,40.0,44.0,44739900.0,0.0,1.0,31.041951216877,34.112034304261,27.289627443409,30.018590187749,44739900.0`

Load the CSV file into a pandas DataFrame, using the `dtype` parameter to specify the type of each column:

In [4]:
%%time
prices_file = 'WIKI_PRICES_212b326a081eacca455e13140d7bb9db.csv'

# The dtype keys must match the CSV header exactly (the column is named 'ex-dividend')
stocks = (pd.read_csv(prices_file, sep=',', header=0, index_col=1, 
                    parse_dates=['date'], date_parser=lambda dt: datetime.strptime(dt, '%Y-%m-%d'),
                    dtype={'ticker': str, 'open': np.float32, 'high': np.float32, 'low': np.float32, 'close': np.float32,
                           'volume': np.float32, 'ex-dividend': np.float32, 'split_ratio': np.float32, 
                           'adj_open': np.float32, 'adj_high': np.float32, 'adj_low': np.float32, 'adj_close': np.float32,
                           'adj_volume': np.float32 })
         .sort_index())

CPU times: user 3min 17s, sys: 3.89 s, total: 3min 21s
Wall time: 3min 18s
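
As a small illustration of what `dtype` and `parse_dates` do, here is a minimal sketch on an in-memory CSV. The rows, the ticker `AAA`, and the variable names are made up for the example; only the column layout follows the prices file. Because the dates are in ISO format, `parse_dates` on its own should be enough here, which avoids calling a Python function for every row.

```python
import io

import numpy as np
import pandas as pd

# Two made-up rows in a reduced version of the prices-file layout (illustrative values only)
sample_csv = io.StringIO(
    "ticker,date,close,adj_close\n"
    "AAA,2018-01-03,10.5,10.0\n"
    "AAA,2018-01-02,10.0,9.5\n"
)

sample = pd.read_csv(
    sample_csv,
    index_col="date",
    parse_dates=["date"],  # ISO dates (YYYY-MM-DD) are parsed without a custom parser
    dtype={"ticker": str, "close": np.float32, "adj_close": np.float32},
).sort_index()

print(sample.dtypes)       # the price columns are stored as 32-bit floats
print(sample.index.min())  # the index holds timestamps, sorted ascending
```

Storing prices as `float32` uses half the memory of the default `float64`, at the cost of some precision.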


In [5]:
stocks.head(3)

date,ticker,open,high,low,close,volume,ex-dividend,split_ratio,adj_open,adj_high,adj_low,adj_close,adj_volume
1962-01-02,ARNC,65.559998,65.75,65.379997,65.379997,5600.0,0.0,1.0,3.458163,3.468185,3.448668,3.448668,44800.0
1962-01-02,DD,241.5,244.25,241.5,241.5,2000.0,0.0,1.0,4.718414,4.772143,4.718414,4.718414,36000.0
1962-01-02,GE,75.0,76.25,74.25,74.75,21600.0,0.0,1.0,0.329505,0.334997,0.32621,0.328407,2073600.0


In [6]:
stocks.loc[stocks.ticker=='AAPL',['close', 'adj_close']].head(3)

date,close,adj_close
1980-12-12,28.75,0.422706
1980-12-15,27.25,0.400652
1980-12-16,25.25,0.371246


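#### S&P 500 index

Filename: `HistoricalPrices_SP500.csv`

Column header and example rows (each comma is followed by a space, so the column names are read with a leading space and renamed after loading):

`Date, Open, High, Low, Close
04/24/18, 2680.80, 2683.55, 2617.32, 2634.56
04/23/18, 2675.40, 2682.86, 2657.99, 2670.29`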

In [7]:
%%time
sp500 = (pd.read_csv('HistoricalPrices_SP500.csv', sep=',', header=0, index_col=0, 
                    parse_dates=['Date'], date_parser=lambda dt: datetime.strptime(dt, '%m/%d/%y'),
                    dtype={' Open': np.float32, ' High': np.float32, ' Low': np.float32, ' Close': np.float32})
         .sort_index())

sp500 = sp500.rename(columns={' Open': 'open', ' High': 'high', ' Low': 'low', ' Close': 'close'})

CPU times: user 80.5 ms, sys: 11 µs, total: 80.5 ms
Wall time: 79.3 ms
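
An alternative way to tidy the column names is to strip and lower-case them all at once instead of listing each rename. The sketch below uses the two example rows of the index file in an in-memory CSV; the variable names are hypothetical and the approach is only one option, not necessarily better than the explicit `rename`.

```python
import io

import pandas as pd

# The two example rows of the index file; note the space after each comma
sample_csv = io.StringIO(
    "Date, Open, High, Low, Close\n"
    "04/24/18, 2680.80, 2683.55, 2617.32, 2634.56\n"
    "04/23/18, 2675.40, 2682.86, 2657.99, 2670.29\n"
)

index_sample = pd.read_csv(sample_csv)
# Strip the leading space and lower-case every column name in one step
index_sample.columns = index_sample.columns.str.strip().str.lower()
# Parse the US-style dates with an explicit format, then use them as the index
index_sample["date"] = pd.to_datetime(index_sample["date"], format="%m/%d/%y")
index_sample = index_sample.set_index("date").sort_index()

print(index_sample)
```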


---

### Prepare the datasets

#### Adjust datasets to be on the same dates

The two datasets do not start on the same date, but they need to for the plot. The components dataset starts earlier than the S&P 500 dataset, so we trim it to start on the same date.

In [8]:
sp500.index.min()

Timestamp('1993-04-05 00:00:00')

In [9]:
stocks.index.min()

Timestamp('1962-01-02 00:00:00')

In [10]:
stocks = stocks.loc[stocks.index >= sp500.index.min()]
stocks.index.min()

Timestamp('1993-04-05 00:00:00')
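
The trim above relies on knowing that the components start first. A slightly more general sketch, on made-up data, takes the later of the two start dates and trims both frames, so it works whichever series begins earlier. The frame names `components` and `index_prices` and all values are hypothetical.

```python
import pandas as pd

# Made-up series with different start dates (illustrative values only)
components = pd.DataFrame(
    {"adj_close": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(["1993-04-01", "1993-04-02", "1993-04-05", "1993-04-06"]),
)
index_prices = pd.DataFrame(
    {"close": [441.0, 442.0]},
    index=pd.to_datetime(["1993-04-05", "1993-04-06"]),
)

# Use the later of the two start dates so the trim works whichever series starts first
common_start = max(components.index.min(), index_prices.index.min())
components = components.loc[components.index >= common_start]
index_prices = index_prices.loc[index_prices.index >= common_start]

print(components.index.min(), index_prices.index.min())
```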

---

#### Weekly

Resample the data to weekly frequency and keep only the adjusted close (the weekly value is the mean of the daily adjusted closes). This step is optional; the daily quotes could be used instead.

In [11]:
# Select adj_close before resampling so the weekly mean is computed for that column only
stocks_week = (stocks.groupby(['ticker'])
               .adj_close
               .resample('W')
               .mean()
               .to_frame())

stocks_week.tail()

ticker,date,adj_close
ZUMZ,2018-03-04,19.99
ZUMZ,2018-03-11,20.23
ZUMZ,2018-03-18,20.32
ZUMZ,2018-03-25,23.610001
ZUMZ,2018-04-01,24.125
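
To see how the weekly bins are formed, here is a minimal sketch on made-up daily data for a single hypothetical ticker `AAA`. With `'W'`, pandas labels each bin with the Sunday that ends the week, which is why the dates in the output above fall on Sundays.

```python
import pandas as pd

# Five trading days in one week plus the Monday of the next week (illustrative values)
daily = pd.DataFrame(
    {
        "ticker": ["AAA"] * 6,
        "adj_close": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0],
    },
    index=pd.to_datetime(
        ["2018-03-26", "2018-03-27", "2018-03-28", "2018-03-29", "2018-03-30", "2018-04-02"]
    ),
)
daily.index.name = "date"

# One weekly mean per ticker; the result is indexed by (ticker, date)
weekly = daily.groupby("ticker")["adj_close"].resample("W").mean().to_frame()

print(weekly)
```

The first week (Monday 2018-03-26 to Friday 2018-03-30) is labelled 2018-04-01 and averages to 3.0; the single quote on 2018-04-02 lands in the bin labelled 2018-04-08.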


## Percentage of stocks over the SMA (Simple Moving Average)



In [12]:
def percentage_series_over_sma(df: pd.DataFrame, shift: int, column_name: str):
    '''
        Fraction (0 to 1) of stocks over their SMA (Simple Moving Average) on each date
        
        parameters:
        - df: Pandas DataFrame indexed by (grouping field, datetime), e.g. (ticker, date)
        - shift: the number of rows in the SMA window. 
                    example: if the DataFrame is in weeks and shift is 30, it means the SMA of 30 weeks.
        - column_name: column name to be used to calculate the SMA.
    '''
    if len(df.index.names) != 2:
        raise ValueError('index must be made of two columns: grouped_field + datetime')
    # second index level must be a datetime level!
    
    groupby_column = df.index.names[0]
    date_column = df.index.names[1]
    
    # Compute the SMA per group so a window never mixes rows from two different stocks
    sma = (df.groupby(level=groupby_column)[column_name]
             .transform(lambda s: s.rolling(shift).mean()))
    # Rows without a full window have no SMA yet, so they are left out of the ratio
    has_sma = sma.notna()
    over_sma = (df.loc[has_sma, column_name] > sma[has_sma]).astype(int).rename('over_sma')
    
    return over_sma.groupby(level=date_column).mean().to_frame()
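
To make the calculation concrete, here is a toy example with two hypothetical tickers (`AAA`, `BBB`), four weeks of made-up prices, and a window of 2 instead of 30. It is a sketch under those assumptions: the SMA is computed separately for each ticker, and weeks where a ticker has no full window yet are left out.

```python
import pandas as pd

weeks = pd.to_datetime(["2018-03-04", "2018-03-11", "2018-03-18", "2018-03-25"])
toy = pd.DataFrame(
    {"adj_close": [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 5.0, 1.0]},
    index=pd.MultiIndex.from_product([["AAA", "BBB"], weeks], names=["ticker", "date"]),
)

window = 2
# Rolling mean computed separately for each ticker
sma = toy.groupby(level="ticker")["adj_close"].transform(
    lambda s: s.rolling(window).mean()
)
# Keep only the rows where the window is complete
valid = sma.notna()
over_sma = (toy.loc[valid, "adj_close"] > sma[valid]).astype(int)
# Share of tickers above their own SMA on each date
fraction_over = over_sma.groupby(level="date").mean()

print(fraction_over)
```

`AAA` is above its 2-week SMA in every week that has one, while `BBB` is above it only on 2018-03-18, so the fractions are 0.5, 1.0 and 0.5.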

#### Percentage of stocks over the 30-week SMA

In [13]:
df = percentage_series_over_sma(stocks_week, 30, 'adj_close')

---

# Plot the percentage of stocks over the 30-week SMA and the S&P 500

The S&P 500 is plotted on a log scale, while the percentage uses a linear scale. On a log scale, equal percentage moves cover equal vertical distances, which makes the index easier to compare with the share of stocks over the 30-week SMA.
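
A quick numeric check of why the log scale suits the index, using made-up levels: two moves that both double the value are very different distances on a linear axis but the same distance on a log axis.

```python
import numpy as np

# Two moves that both double the value, at very different index levels
low_start, low_end = 100.0, 200.0
high_start, high_end = 1000.0, 2000.0

# Distance covered on a linear axis
linear_low = low_end - low_start
linear_high = high_end - high_start

# Distance covered on a base-10 log axis
log_low = np.log10(low_end) - np.log10(low_start)
log_high = np.log10(high_end) - np.log10(high_start)

print(linear_low, linear_high)
print(log_low, log_high)
```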

Both lines, the S&P 500 and the percentage over SMA, are smoothed with a 30-period SMA to soften them (30 weeks for the percentage series; the S&P 500 data is daily, so its window is 30 trading days).

In [14]:
sc_x = bq.DateScale()
sc_y_sma = bq.LinearScale()
sc_y_sp500 = bq.LogScale()

ax_x = bq.Axis(scale=sc_x)
ax_y_sma = bq.Axis(scale=sc_y_sma, orientation='vertical', side='right', label='percentage over SMA', color='blue',
                    label_color='blue')
ax_y_sp500 = bq.Axis(scale=sc_y_sp500, orientation='vertical', label='S&P 500 (log scale)', tick_format='0.0f', color='orange',
                    label_color='orange')

line_perc_over_sma = bq.Lines(x=df.index, y=df.over_sma.rolling(30).mean(), scales={'x': sc_x, 'y': sc_y_sma}, colors=['blue'],
                   display_legend=False, labels=['percentage over SMA'])
line_sp500 = bq.Lines(x=sp500.index, y=sp500.close.rolling(30).mean(), scales={'x': sc_x, 'y': sc_y_sp500}, 
                     colors=['orange'], display_legend=False, labels=['S&P 500 (log scale)'])

bq.Figure(marks=[line_sp500, line_perc_over_sma], axes=[ax_y_sma, ax_y_sp500, ax_x])

Figure(axes=[Axis(color='blue', label='percentage over SMA', label_color='blue', orientation='vertical', scale…
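
To smooth the index over 30 weeks rather than 30 trading days, one option is to resample it to weekly first. A minimal sketch on synthetic data (the dates, values, and variable names are made up and are not real index quotes):

```python
import numpy as np
import pandas as pd

# Synthetic daily closes on business days (illustrative values, not real index data)
days = pd.date_range("2017-01-02", periods=400, freq="B")
daily_close = pd.Series(np.linspace(2000.0, 2800.0, len(days)), index=days)

# Weekly mean first, then a 30-period SMA, so the window covers 30 weeks
weekly_close = daily_close.resample("W").mean()
weekly_sma = weekly_close.rolling(30).mean()

# For comparison: the same window on daily data spans only 30 trading days
daily_sma = daily_close.rolling(30).mean()

print(weekly_sma.first_valid_index(), daily_sma.first_valid_index())
```

The first smoothed value appears 29 weeks after the first weekly point, whereas on the daily series it appears after 29 trading days.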

---