# Stock Market Analysis for Swing Trading
Shifa Salsabiila

## Table of Contents
[1. Background](#1.-Background)<br>
[2. Goals](#2.-Goals)<br>
[3. Historical Aggregates](#Historical-Aggregates)<br>
&nbsp;&nbsp;&nbsp; [3.1. API Call](#API-Call)<br>
&nbsp;&nbsp;&nbsp; [3.2. Data Preprocessing](#Data-Preprocessing)<br>
&nbsp;&nbsp;&nbsp; [3.3. Visualization](#Visualization)<br>
[4. Descriptive Analytics](#4.-Descriptive-Analytics)

## 1. Background
> This notebook is a personal, initial analysis of swing trading strategies, their potential returns, and their risks. The analysis focuses solely on the stock prices themselves, without considering external factors, and may include personal opinions.

Attempting to make money from the stock market takes many forms. Some people pursue long-term investing in companies they believe will succeed in the future, while others try to capitalize on short-term fluctuations in stock prices. Long-term investments usually aim for larger gains in the long run, possibly returns of more than 30% over a span of perhaps 5, 10, or even 15 years. Long-term investing is built on an analysis of a company's value: a person invests in a company they believe has good value that will continue to improve over the years.

Now let's talk about trading. Trading is stretched over a shorter time period than investing, sometimes a single day (day traders) or a little longer, up to several weeks (swing traders). In trading, the value or expected value of a company isn't really a key aspect to observe. Instead, daily and weekly price trends become the key metrics that help traders decide which stocks to buy, when to buy them, and when to sell them. The gain from a single trade is usually lower than that of a long-term investment, but since each trade takes place over a much shorter period, gains are realized more frequently.

What makes trading, and specifically swing trading, interesting to me is that, intuitively, small gains of around 4-10% happen a lot more often than larger gains. So instead of letting the price of your assets fluctuate up and down over a long period until you eventually reach a high gain, you can take smaller and faster gains, limit your exposure to price drops, and build a compounding return from those earlier gains.
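To make the compounding idea concrete, here is a minimal sketch that reinvests the proceeds of each trade in the next one. The function name `compound_return` and the example figures are illustrative assumptions, not results from this analysis.

```python
from typing import Iterable


def compound_return(gains: Iterable[float]) -> float:
    """Total return from reinvesting the proceeds of each trade in the next one."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


# Hypothetical example: six consecutive winning trades of 5% each
six_small_wins = compound_return([0.05] * 6)
print(f"{six_small_wins:.4f}")
```

Under these made-up numbers, six 5% wins compound to roughly 34%, slightly more than the 30% a simple sum would suggest. Losing trades compound in the same way, so the sketch says nothing about how likely such a streak is.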

## 2. Goals
This analysis focuses on swing trading, with the attempt to answer the following questions: <br>
&nbsp;&nbsp;&nbsp; 1.  <br>
&nbsp;&nbsp;&nbsp; 2.  <br>

## Dependencies

In [576]:
import pandas as pd
import requests
import os
import scipy

from enum import Enum
from dotenv import load_dotenv
from datetime import date, datetime
from typing import Optional, List
import plotly.graph_objects as go
import plotly.figure_factory as ff

In [577]:
load_dotenv()

API_KEY = os.getenv('API_KEY')

## Historical Aggregates

Endpoint: /v2/aggs/ticker/{stocksTicker}/range/{multiplier}/{timespan}/{from}/{to}

[Documentation Link](https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to)

### API Call

In [578]:
# Path and parameter configurations
stocks_ticker = 'AAPL'
multiplier = '1'
timespan = 'day'
from_date = '2021-01-01'
to_date = date.today().strftime('%Y-%m-%d')

URL = f'https://api.polygon.io/v2/aggs/ticker/{stocks_ticker}/range/{multiplier}/{timespan}/{from_date}/{to_date}?adjusted=true&sort=asc&apiKey={API_KEY}'


In [579]:
r = requests.get(url = URL)
data = r.json()
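
The request above assumes the call succeeds and that the payload contains a `results` key. A slightly more defensive sketch is shown here. The helper names `build_aggs_url` and `extract_results` are hypothetical, and the only things assumed about the API are the endpoint path and the three query parameters already used above.

```python
from urllib.parse import urlencode


def build_aggs_url(ticker: str, multiplier: int, timespan: str,
                   from_date: str, to_date: str, api_key: str) -> str:
    """Build the aggregates URL with the same path and query parameters as above."""
    path = f"/v2/aggs/ticker/{ticker}/range/{multiplier}/{timespan}/{from_date}/{to_date}"
    query = urlencode({"adjusted": "true", "sort": "asc", "apiKey": api_key})
    return f"https://api.polygon.io{path}?{query}"


def extract_results(payload: dict) -> list:
    """Return the list of bars, or raise a readable error if there are none."""
    results = payload.get("results")
    if not results:
        raise ValueError(f"No 'results' in response; keys present: {sorted(payload)}")
    return results
```

With `requests`, this could be used as `r = requests.get(build_aggs_url(...), timeout=30)`, followed by `r.raise_for_status()` and `extract_results(r.json())`, so that an HTTP error or an empty response fails early instead of surfacing later as a `KeyError`.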

### Data Preprocessing

Notes:
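* For simplicity, daily gain/loss is defined here as `(close - open) / 100`, that is, the dollar change from open to close scaled by 100. It is not a percentage return relative to the open price.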
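
To show how this definition differs from a return relative to the open price, here is a small sketch using the open and close of the first row in the resulting dataframe (open 135.01, close 134.39). Both function names are hypothetical, and the relative version is only shown for comparison. It is not what the notebook computes.

```python
def scaled_change(open_price: float, close_price: float) -> float:
    """Dollar change from open to close divided by 100 (the notebook's gain_loss)."""
    return (close_price - open_price) / 100


def relative_return(open_price: float, close_price: float) -> float:
    """Change from open to close as a fraction of the open price."""
    return (close_price - open_price) / open_price


# First row of the dataframe: open 135.01, close 134.39
print(round(scaled_change(135.01, 134.39), 5))    # -0.0062
print(round(relative_return(135.01, 134.39), 5))  # -0.00459
```

The two values only coincide when the open price is exactly 100, so the `gain_loss` statistics further down should be read as scaled dollar moves rather than percentage returns.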

In [580]:
df = pd.json_normalize(data['results']) 

# Rename columns
df.rename(columns = {
    'v': 'volume',
    'vw': 'avg',
    'o': 'open',
    'c': 'close',
    'h': 'high',
    'l': 'low',
    't': 'window_start_timestamp',
    'n': 'number_of_trx'
}, inplace=True)

# Add date column
df['date'] = df['window_start_timestamp'].apply(lambda x: (datetime.fromtimestamp(x/1000).date()))

# Add gain_loss
df = df.assign(gain_loss = lambda x: ((x['close'] - x['open'])/100))

# Explicitly add index as row column
df['row_index'] = df.index

# Add time to win column
df['time_to_win'] = -1

df

| | volume | avg | open | close | high | low | window_start_timestamp | number_of_trx | date | gain_loss | row_index | time_to_win |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 66015804.0 | 134.5907 | 135.010 | 134.39 | 135.4100 | 134.110 | 1619496000000 | 480003 | 2021-04-27 | -0.00620 | 0 | -1 |
| 1 | 107746597.0 | 134.6526 | 134.310 | 133.58 | 135.0200 | 133.080 | 1619582400000 | 783355 | 2021-04-28 | -0.00730 | 1 | -1 |
| 2 | 151100953.0 | 134.1771 | 136.470 | 133.48 | 137.0700 | 132.450 | 1619668800000 | 1059387 | 2021-04-29 | -0.02990 | 2 | -1 |
| 3 | 109425466.0 | 132.1940 | 131.780 | 131.46 | 133.5600 | 131.065 | 1619755200000 | 701292 | 2021-04-30 | -0.00320 | 3 | -1 |
| 4 | 75135100.0 | 132.8122 | 132.040 | 132.54 | 134.0700 | 131.830 | 1620014400000 | 580631 | 2021-05-03 | 0.00500 | 4 | -1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 499 | 52456377.0 | 166.7790 | 166.090 | 166.65 | 167.8700 | 165.560 | 1681963200000 | 493153 | 2023-04-20 | 0.00560 | 499 | -1 |
| 500 | 57736141.0 | 165.0754 | 165.050 | 165.02 | 166.4521 | 164.490 | 1682049600000 | 520279 | 2023-04-21 | -0.00030 | 500 | -1 |
| 501 | 41449581.0 | 164.9397 | 165.000 | 165.33 | 165.6000 | 163.890 | 1682308800000 | 459499 | 2023-04-24 | 0.00330 | 501 | -1 |
| 502 | 48652863.0 | 164.6479 | 165.190 | 163.77 | 166.3050 | 163.730 | 1682395200000 | 501548 | 2023-04-25 | -0.01420 | 502 | -1 |
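
One detail in the preprocessing cell is worth noting: `datetime.fromtimestamp` without a `tz` argument converts to the local timezone of the machine running the notebook, so the derived `date` can shift by a day depending on where the code runs. A minimal sketch of a timezone-explicit conversion follows. The choice of UTC is an assumption, and the helper name `ms_to_utc_date` is hypothetical.

```python
from datetime import date, datetime, timezone


def ms_to_utc_date(timestamp_ms: int) -> date:
    """Convert a Unix timestamp in milliseconds to a calendar date in UTC."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).date()


# First timestamp in the table above
print(ms_to_utc_date(1619496000000))  # 2021-04-27
```

In the notebook this could replace the lambda, for example `df['date'] = df['window_start_timestamp'].apply(ms_to_utc_date)`.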


### Visualization

In [581]:
fig = go.Figure(data=[go.Candlestick(x=df['date'],
                open=df['open'],
                high=df['high'],
                low=df['low'],
                close=df['close'])])

fig.show()

## 4. Descriptive Analytics

### 4.1. Overall Statistical Description

In [582]:
df.describe()

| | volume | avg | open | close | high | low | window_start_timestamp | number_of_trx | gain_loss | row_index | time_to_win |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 |
| mean | 82916810.0 | 151.287009 | 151.149738 | 151.337044 | 153.016851 | 149.48858 | 1650871000000.0 | 652038.8 | 0.001873 | 251.5 | -1.0 |
| std | 25506850.0 | 13.679457 | 13.68903 | 13.707318 | 13.817261 | 13.538297 | 18256120000.0 | 203526.3 | 0.024359 | 145.636534 | 0.0 |
| min | 35195860.0 | 123.0599 | 123.16 | 122.77 | 124.64 | 122.25 | 1619496000000.0 | 319570.0 | -0.0708 | 0.0 | -1.0 |
| 25% | 65136120.0 | 142.6575 | 142.65625 | 142.77 | 144.34 | 140.94875 | 1635070000000.0 | 508985.0 | -0.011675 | 125.75 | -1.0 |
| 50% | 77269280.0 | 149.46805 | 149.45 | 149.755 | 151.21 | 147.86 | 1650902000000.0 | 612306.0 | 0.00205 | 251.5 | -1.0 |
| 75% | 94611410.0 | 161.82575 | 161.43375 | 161.865 | 164.11 | 159.6325 | 1666606000000.0 | 739259.0 | 0.017225 | 377.25 | -1.0 |
| max | 194579000.0 | 181.4156 | 182.63 | 182.01 | 182.94 | 179.12 | 1682482000000.0 | 1678454.0 | 0.1016 | 503.0 | -1.0 |


### 4.2. Daily Gain/Loss Distribution

**Summary**<br>
The daily gains and losses appear to follow a roughly symmetric, bell-shaped curve centered near zero: the mean (0.0019) and median (0.0021) almost coincide, which suggests a skew close to zero.

In [583]:
df['gain_loss'].describe()

count    504.000000
mean       0.001873
std        0.024359
min       -0.070800
25%       -0.011675
50%        0.002050
75%        0.017225
max        0.101600
Name: gain_loss, dtype: float64

In [584]:
fig = ff.create_distplot([df['gain_loss'].to_list()], ['gain_loss'], bin_size=.01)
fig.show()
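
The symmetry claim in the summary can be checked numerically rather than read off the plot. Below is a minimal sketch of the Fisher-Pearson skewness coefficient in plain Python. The function name `sample_skewness` is hypothetical, and I believe `scipy.stats.skew` with its default arguments computes the same biased estimator.

```python
from typing import Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Fisher-Pearson coefficient of skewness: m3 / m2**1.5 (biased estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5  # undefined (division by zero) if all values are equal


print(sample_skewness([-2, -1, 0, 1, 2]))   # symmetric sample
print(sample_skewness([1, 1, 1, 10]) > 0)   # long right tail gives positive skew
```

With the notebook's data this would be called as `sample_skewness(df['gain_loss'].to_list())`. A value near zero supports the symmetric reading, while a clearly positive or negative value indicates a longer right or left tail.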

### 4.3. Weekly Gain/Loss

### 4.4. Average Time to Reach 0.05 (Max Days)

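Using a window that slides forward one day at a time, count from each starting day how many trading days it takes to reach a gain of 0.05, subject to a maximum holding period (`max_days`, set to 40 in the run below). A starting day counts as a fail if the maximum holding period is reached without hitting the target.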
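
Before the dataframe version, the idea can be sketched on a plain list of prices. This is a simplified illustration under my own assumptions: the name `days_to_target` and the toy prices are hypothetical, and the target counts as reached when the gain is greater than or equal to it.

```python
from typing import List, Tuple


def days_to_target(prices: List[float], start: int, target: float, max_days: int) -> Tuple[int, float]:
    """Return (days held, gain) for the first day the gain reaches target, or (-1, last gain) on failure."""
    gain = 0.0
    for days in range(1, max_days + 1):
        idx = start + days
        if idx >= len(prices):
            break  # ran out of data before the target was reached
        gain = (prices[idx] - prices[start]) / abs(prices[start])
        if gain >= target:
            return days, gain
    return -1, gain


toy_prices = [100.0, 102.0, 104.0, 106.0, 103.0]
print(days_to_target(toy_prices, start=0, target=0.05, max_days=7))  # (3, 0.06)
print(days_to_target(toy_prices, start=0, target=0.05, max_days=2))  # (-1, 0.04)
```

In the first call the 5% target is reached on the third day. In the second, the holding limit of two days runs out first, so the window is reported as a fail together with the gain at that point.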

In [585]:
# Enums
class ComparisonType(str, Enum):
    AVG = "avg"
    OPEN = "open"
    CLOSE = "close"
    HIGH = "high"
    LOW = "low"

In [586]:
# Supporting Functions
def get_value(
        df: pd.DataFrame, 
        idx_start: int, 
        idx_end: int,
        comparator: ComparisonType
    ) -> float:
    """
    Returns the relative gain/loss between two rows, as a fraction of the starting value
    [Input]
        df: source dataframe that will be analysed
        idx_start: starting date row
        idx_end: ending date row
        comparator: what metric to use as comparison
    [Output]
        float
    """
    start_val = df[comparator.value].iloc[idx_start]
    end_val = df[comparator.value].iloc[idx_end]
    
    return (end_val - start_val)/abs(start_val)

def window_slide(
        row: pd.Series,
        df: pd.DataFrame, 
        x: float,
        idx_start: int, 
        max_days: int,
        comparator: ComparisonType
    ) -> (int, float):
    """
    Returns a tuple that denotes the minimum number of days it takes to reach a certain gain value from 
    a starting date along with its gain value. Returns -1 if fail condition is reached before win. The fail
    condition is in the form of a maximum number of days. 
    [Input]
        row: the row from the dataframe that denotes the starting date
        df: source dataframe that will be analysed
        x: target gain to be reached
        idx_start: the index of the starting row
        max_days: maximum days of withholding before declaring failure
        comparator: what metric to use as comparison
    [Output]
        (int, float)
    """
    idx_curr = idx_start
    gain_curr = 0
    while (gain_curr < x) and (idx_curr - idx_start < max_days) and (idx_curr < len(df)-1):
        idx_curr += 1
        gain_curr = get_value(df=df, idx_start=idx_start, idx_end=idx_curr, comparator=comparator)

    # Success condition reached
    if (gain_curr >= x):
        return idx_curr - idx_start, gain_curr
    
    # Fail condition reached
    return -1, gain_curr


In [587]:
def time_to_reach_x(
        df: pd.DataFrame,
        x: float = 0.05, 
        max_days: int = 14,
        comparison_type: ComparisonType = ComparisonType.AVG
    ) -> pd.DataFrame:
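    """
    For each day that the market is open, this function calculates how many days it takes to reach a gain of x,
    with a fail condition of a maximum number of holding days.
    [Input]
        df: source dataframe that will be analysed
        x: target gain to be reached
        max_days: maximum days of holding before declaring failure
        comparison_type: what metric to use as comparison
    [Output]
        pd.DataFrame with columns start_date, time_to_win (-1 on failure), and gain
    """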
    # Initiate new dataframes
    res = pd.DataFrame()
    temp = pd.DataFrame()
    
    temp['window_res'] = df.apply(lambda row : window_slide(row, df, x, row['row_index'], max_days, comparison_type), axis = 1)
    res['start_date'] = df['date']
    res['time_to_win'], res['gain'] = zip(*temp['window_res'])

    return res
    

In [588]:
result = time_to_reach_x(df, x=0.05, max_days=40)
result

| | start_date | time_to_win | gain |
|---|---|---|---|
| 0 | 2021-04-27 | -1 | -0.005970 |
| 1 | 2021-04-28 | -1 | -0.006582 |
| 2 | 2021-04-29 | -1 | -0.007012 |
| 3 | 2021-04-30 | -1 | 0.019763 |
| 4 | 2021-05-03 | -1 | 0.023567 |
| ... | ... | ... | ... |
| 499 | 2023-04-20 | -1 | -0.017212 |
| 500 | 2023-04-21 | -1 | -0.007069 |
| 501 | 2023-04-24 | -1 | -0.006253 |
| 502 | 2023-04-25 | -1 | -0.004491 |


In [589]:
wins = result.drop(result[result['time_to_win'] == -1].index)
wins

| | start_date | time_to_win | gain |
|---|---|---|---|
| 5 | 2021-05-04 | 38 | 0.053461 |
| 6 | 2021-05-05 | 38 | 0.051967 |
| 7 | 2021-05-06 | 37 | 0.056834 |
| 8 | 2021-05-07 | 39 | 0.067120 |
| 9 | 2021-05-10 | 34 | 0.054477 |
| ... | ... | ... | ... |
| 477 | 2023-03-20 | 10 | 0.055827 |
| 478 | 2023-03-21 | 19 | 0.050605 |
| 481 | 2023-03-24 | 17 | 0.050482 |
| 482 | 2023-03-27 | 16 | 0.051955 |


In [590]:
win_rate = len(wins)/len(result)
win_rate

0.6329365079365079

In [591]:
result.describe()

| | time_to_win | gain |
|---|---|---|
| count | 504.0 | 504.0 |
| mean | 8.765873 | 0.012516 |
| std | 11.229255 | 0.073655 |
| min | -1.0 | -0.221031 |
| 25% | -1.0 | -0.015675 |
| 50% | 6.0 | 0.052352 |
| 75% | 14.0 | 0.058031 |
| max | 40.0 | 0.086643 |
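
Because failed windows are encoded as `time_to_win = -1`, the `describe()` output above mixes that sentinel with real holding periods, so its mean and quartiles for `time_to_win` are not directly interpretable. A small sketch of separating the two follows. The function name `summarize_wins` and the toy input are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List


def summarize_wins(time_to_win: List[int]) -> Dict[str, float]:
    """Win rate over all start days, plus mean/median holding time over wins only."""
    wins = [t for t in time_to_win if t != -1]  # drop the failure sentinel
    summary = {"win_rate": len(wins) / len(time_to_win)}
    if wins:
        summary["mean_days"] = mean(wins)
        summary["median_days"] = median(wins)
    return summary


# Toy input: two failed windows and three wins
print(summarize_wins([-1, 3, 5, -1, 10]))
```

With the notebook's data this would be called as `summarize_wins(result['time_to_win'].to_list())`, which reports the typical holding time of successful windows separately from the win rate.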


In [592]:
fig = ff.create_distplot([wins['time_to_win'].to_list()], ['time_to_win'])
fig.show()

### 4.5. Average Time to Reach 0.05 (Loss Threshold)

TODO: Using a window that slides forward one day at a time, count from each starting day how many days it takes to reach a gain of 0.05, with a maximum tolerated loss of 0.05 as well. Count as a fail if the loss threshold is hit before the target.
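
One possible reading of this TODO is sketched below on a plain list of prices. It is not an implementation from this notebook: the name `days_to_target_with_stop` and the toy prices are hypothetical, and I assume a window without a holding limit that also fails if the data ends before either condition is met.

```python
from typing import List, Tuple


def days_to_target_with_stop(prices: List[float], start: int,
                             target: float = 0.05,
                             loss_threshold: float = 0.05) -> Tuple[int, float]:
    """Return (days held, gain) if the target is hit before the loss threshold, else (-1, gain)."""
    gain = 0.0
    for idx in range(start + 1, len(prices)):
        gain = (prices[idx] - prices[start]) / abs(prices[start])
        if gain >= target:
            return idx - start, gain  # win: target reached first
        if gain <= -loss_threshold:
            return -1, gain           # fail: loss threshold hit first
    return -1, gain                   # fail: data ended with neither condition met


print(days_to_target_with_stop([100.0, 98.0, 103.0, 106.0], start=0))  # target reached on day 3
print(days_to_target_with_stop([100.0, 94.0, 110.0], start=0))         # stopped out on day 1
```

The second example shows the effect of the threshold: the price later recovers above the target, but the window has already been counted as a fail once the 5% loss is hit.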