# Stock Market Analysis for Swing Trading
Shifa Salsabiila

## Table of Contents
[1. Background](#1.-Background)<br>
[2. Goals](#2.-Goals)<br>
[3. Historical Aggregates](#Historical-Aggregates)<br>
&nbsp;&nbsp;&nbsp; [3.1. API Call](#API-Call)<br>
&nbsp;&nbsp;&nbsp; [3.2. Data Preprocessing](#Data-Preprocessing)<br>
&nbsp;&nbsp;&nbsp; [3.3. Visualization](#Visualization)<br>
[4. Descriptive Analytics](#4.-Descriptive-Analytics)

## 1. Background
> This notebook is a personal, initial analysis of swing trading strategies, potential returns, and risks. The analysis focuses solely on the stock prices themselves, without considering external factors, and may involve personal opinions.

Attempting to make money off the stock market takes many forms. Some people pursue long-term investing in companies they believe will succeed in the future, while others try to capitalize on short-term fluctuations in stock prices. Long-term investments usually aim for larger gains over the long run, possibly returns of more than 30% over a span of perhaps 5, 10, or even 15 years. Long-term investing is built on an analysis of a company's value: a person invests in a company they believe has good value that will continue to improve over the years.

Now let's talk about trading. Trading takes place over a shorter time period than investing, sometimes within a single day (day traders) or over a slightly longer period of up to a few weeks (swing traders). In trading, the value or expected value of a company isn't really a key aspect to observe. Instead, daily and weekly price trends become the key metrics that help traders decide which stocks to buy, when to buy them, and when to sell them. The gain from a single trade is usually lower than that of a long-term investment, but since a trade takes place over a much shorter period, gains can be realized more frequently.

What makes trading, specifically swing trading, interesting to me is that, intuitively, small gains of around 4-10% happen a lot more often than larger gains. So instead of letting the price of your assets fluctuate up and down over a long period until you eventually reach a high gain, you can take smaller and faster gains, limit your exposure to price drops, and build a compounding return from those earlier gains.
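
To make the compounding idea concrete, here is a small arithmetic sketch. The helper name `compound_return` and the sequence of ten 5% gains are hypothetical, chosen only for illustration; real trades would include losses and fees, which this ignores.

```python
def compound_return(gains):
    """Total return when the full balance is reinvested after each trade.

    `gains` is a sequence of per-trade fractional returns, e.g. 0.05 for +5%.
    """
    balance = 1.0
    for g in gains:
        balance *= 1.0 + g  # each trade scales the running balance
    return balance - 1.0


# Ten hypothetical winning trades of 5% each (illustrative numbers only)
print(round(compound_return([0.05] * 10), 4))  # 0.6289
```

Ten such trades compound to roughly 63%, compared with 50% if the gains were simply added together.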

## 2. Goals
This analysis focuses on swing trading and attempts to answer the following questions: <br>
&nbsp;&nbsp;&nbsp; 1.  <br>
&nbsp;&nbsp;&nbsp; 2.  <br>

## Dependencies

In [10]:
import pandas as pd
import requests
import os
import scipy

from dotenv import load_dotenv
from datetime import date, datetime
import plotly.graph_objects as go
import plotly.figure_factory as ff

In [11]:
load_dotenv()

API_KEY = os.getenv('API_KEY')

## Historical Aggregates

Endpoint: /v2/aggs/ticker/{stocksTicker}/range/{multiplier}/{timespan}/{from}/{to}

[Documentation Link](https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to)

### API Call

In [12]:
# Path and parameter configurations
stocks_ticker = 'AAPL'
multiplier = '1'
timespan = 'day'
from_date = '2021-01-01'
to_date = date.today().strftime('%Y-%m-%d')

URL = f'https://api.polygon.io/v2/aggs/ticker/{stocks_ticker}/range/{multiplier}/{timespan}/{from_date}/{to_date}?adjusted=true&sort=asc&apiKey={API_KEY}'


In [13]:
r = requests.get(url = URL)
data = r.json()
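
The cell above passes the parsed JSON straight on to preprocessing, where `data['results']` raises a `KeyError` if the response carries no results (for example, after a rejected API key or an empty date range). A minimal sketch of a defensive check follows. `extract_results` is a hypothetical helper, and the `status` and `error` field names are assumptions about the shape of an error payload, not something confirmed in this notebook.

```python
def extract_results(payload):
    """Return the list of aggregate bars from a parsed JSON response.

    Raises ValueError with whatever diagnostic fields are present when the
    payload has no non-empty 'results' list.
    """
    results = payload.get("results")
    if not results:
        # 'status' and 'error' are assumed field names; .get() keeps this
        # safe even if the payload does not contain them.
        raise ValueError(
            "No results in response: "
            f"status={payload.get('status')!r}, error={payload.get('error')!r}"
        )
    return results


# Usage with a stand-in payload (no network call is made here)
bars = extract_results({"status": "OK", "results": [{"o": 134.83, "c": 134.72}]})
print(len(bars))  # 1
```

In the notebook, calling `r.raise_for_status()` before `r.json()` would additionally surface HTTP-level errors early.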

### Data Preprocessing

Notes:
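* For simplicity, daily gain/loss is computed as (close - open) / 100, that is, the intraday price change in dollars divided by 100. Note that this is not the same as the fractional return (close - open) / open.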
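
The preprocessing cell computes `gain_loss` as `(close - open) / 100`. The sketch below compares that with a return measured relative to the opening price, using the open and close of the 2021-04-26 AAPL bar from this dataset. Both function names are hypothetical, and the fractional definition is offered as an alternative rather than as what the notebook uses.

```python
def scaled_change(open_price, close_price):
    """Definition used in this notebook: dollar change divided by 100."""
    return (close_price - open_price) / 100


def fractional_return(open_price, close_price):
    """Alternative definition: change relative to the opening price."""
    return (close_price - open_price) / open_price


# AAPL on 2021-04-26: open 134.83, close 134.72
print(round(scaled_change(134.83, 134.72), 4))      # -0.0011
print(round(fractional_return(134.83, 134.72), 6))  # -0.000816
```

The two definitions agree only when the opening price is exactly 100; for prices above 100 the scaled change is larger in magnitude than the fractional return.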

In [14]:
df = pd.json_normalize(data['results']) 

# Rename columns
df.rename(columns = {
    'v': 'volume',
    'vw': 'volume_weighted_avg',
    'o': 'open',
    'c': 'close',
    'h': 'high',
    'l': 'low',
    't': 'window_start_timestamp',
    'n': 'number_of_trx'
}, inplace=True)

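# Add date column (timestamps are Unix milliseconds; parsing as UTC avoids depending on the local timezone)
df['date'] = pd.to_datetime(df['window_start_timestamp'], unit='ms', utc=True).dt.date

# Add gain_loss: intraday price change (close - open) divided by 100
df = df.assign(gain_loss = lambda x: ((x['close'] - x['open'])/100))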

df

| | volume | volume_weighted_avg | open | close | high | low | window_start_timestamp | number_of_trx | date | gain_loss |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 66888509.0 | 134.5418 | 134.83 | 134.72 | 135.0600 | 133.560 | 1619409600000 | 484069 | 2021-04-26 | -0.0011 |
| 1 | 66015804.0 | 134.5907 | 135.01 | 134.39 | 135.4100 | 134.110 | 1619496000000 | 480003 | 2021-04-27 | -0.0062 |
| 2 | 107746597.0 | 134.6526 | 134.31 | 133.58 | 135.0200 | 133.080 | 1619582400000 | 783355 | 2021-04-28 | -0.0073 |
| 3 | 151100953.0 | 134.1771 | 136.47 | 133.48 | 137.0700 | 132.450 | 1619668800000 | 1059387 | 2021-04-29 | -0.0299 |
| 4 | 109425466.0 | 132.1940 | 131.78 | 131.46 | 133.5600 | 131.065 | 1619755200000 | 701292 | 2021-04-30 | -0.0032 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 499 | 47621166.0 | 167.2806 | 165.80 | 167.63 | 168.1600 | 165.540 | 1681876800000 | 473866 | 2023-04-19 | 0.0183 |
| 500 | 52456377.0 | 166.7790 | 166.09 | 166.65 | 167.8700 | 165.560 | 1681963200000 | 493153 | 2023-04-20 | 0.0056 |
| 501 | 57736141.0 | 165.0754 | 165.05 | 165.02 | 166.4521 | 164.490 | 1682049600000 | 520279 | 2023-04-21 | -0.0003 |
| 502 | 41449581.0 | 164.9397 | 165.00 | 165.33 | 165.6000 | 163.890 | 1682308800000 | 459499 | 2023-04-24 | 0.0033 |


### Visualization

In [15]:
fig = go.Figure(data=[go.Candlestick(x=df['date'],
                open=df['open'],
                high=df['high'],
                low=df['low'],
                close=df['close'])])

fig.show()

## 4. Descriptive Analytics

### 4.1. Overall Statistical Description

In [16]:
df.describe()

| | volume | volume_weighted_avg | open | close | high | low | window_start_timestamp | number_of_trx | gain_loss |
|---|---|---|---|---|---|---|---|---|---|
| count | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 | 504.0 |
| mean | 82959280.0 | 151.228742 | 151.093736 | 151.279425 | 152.956891 | 149.430564 | 1650745000000.0 | 652020.3 | 0.001857 |
| std | 25462190.0 | 13.68813 | 13.697961 | 13.716027 | 13.829504 | 13.543791 | 18255170000.0 | 203541.2 | 0.024358 |
| min | 35195860.0 | 123.0599 | 123.16 | 122.77 | 124.64 | 122.25 | 1619410000000.0 | 319570.0 | -0.0708 |
| 25% | 65162920.0 | 142.599375 | 142.5225 | 142.65 | 144.3175 | 140.845 | 1634854000000.0 | 508985.0 | -0.011675 |
| 50% | 77269280.0 | 149.42345 | 149.405 | 149.705 | 151.185 | 147.84 | 1650730000000.0 | 612306.0 | 0.002 |
| 75% | 94611410.0 | 161.6029 | 161.285 | 161.8025 | 163.9 | 159.5325 | 1666390000000.0 | 739259.0 | 0.017225 |
| max | 194579000.0 | 181.4156 | 182.63 | 182.01 | 182.94 | 179.12 | 1682395000000.0 | 1678454.0 | 0.1016 |


### 4.2. Daily Gain/Loss Distribution

**Summary**<br>
The distribution of daily gains and losses looks roughly like a symmetric normal curve: the mean (0.0019) and the median (0.0020) nearly coincide, which is consistent with a skew close to zero.

In [17]:
df['gain_loss'].describe()

count    504.000000
mean       0.001857
std        0.024358
min       -0.070800
25%       -0.011675
50%        0.002000
75%        0.017225
max        0.101600
Name: gain_loss, dtype: float64

In [18]:
fig = ff.create_distplot([df['gain_loss'].to_list()], ['gain_loss'], bin_size=.01)
fig.show()
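
The summary for this section describes the distribution as close to normal with near-zero skew, but no skew value is computed in the notebook. A minimal sketch of how that could be checked with `scipy.stats` follows. `shape_summary` is a hypothetical helper, and the sample values passed to it below are synthetic, not taken from the AAPL data.

```python
import numpy as np
from scipy import stats


def shape_summary(values):
    """Return sample skewness and excess kurtosis of a 1-D sequence.

    Both are 0 for a perfectly normal distribution; a positive skew means
    a longer right tail, and a positive excess kurtosis means heavier tails.
    """
    arr = np.asarray(values, dtype=float)
    return {
        "skew": float(stats.skew(arr)),
        "excess_kurtosis": float(stats.kurtosis(arr)),  # Fisher definition
    }


# Synthetic, perfectly symmetric sample: skew should be zero
print(shape_summary([-2, -1, 0, 1, 2])["skew"])
```

In the notebook this could be called as `shape_summary(df['gain_loss'])` to put a number on the visual impression from the distribution plot.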

### 4.3. Weekly Gain/Loss

### 4.4. Average Time to Reach 0.05 (Loss Threshold)

TODO: Starting from each day in turn (a window that advances by 1 day), count how many days it takes to reach a gain of 0.05, with a maximum loss of 0.05. Count the attempt as a fail if the loss threshold is hit first.

**Put the result in a table with the columns:**
- start_date: str
- days_to_win: int (-1 if fail)

Output: win rate

### 4.5. Average Time to Reach 0.05 (Max Days)

TODO: Starting from each day in turn (a window that advances by 1 day), count how many days it takes to reach a gain of 0.05, with a maximum holding period of 7 days. Count the attempt as a fail if the maximum holding period is reached first.
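
Both TODOs (the loss-threshold variant and the max-days variant) could be sketched with a single helper. Everything about the entry and exit rule here is an assumption, since the TODOs do not specify it: the position is entered at the start day's open, the gain is measured as `close / entry - 1` on each following close, and a start day that runs out of data before reaching the target is counted as a fail. `days_to_target` is a hypothetical name.

```python
import pandas as pd


def days_to_target(df, target=0.05, stop_loss=None, max_hold=None):
    """Count trading days from each start day until `target` gain is reached.

    Expects 'date', 'open' and 'close' columns. Returns a DataFrame with
    start_date (str) and days_to_win (int, -1 if fail).
    """
    opens = df["open"].to_numpy()
    closes = df["close"].to_numpy()
    dates = df["date"].astype(str).to_numpy()
    n = len(df)
    rows = []
    for i in range(n):
        entry = opens[i]  # assumption: enter at the start day's open
        days_to_win = -1
        last = n if max_hold is None else min(n, i + max_hold)
        for j in range(i, last):
            ret = closes[j] / entry - 1.0  # gain relative to entry price
            if stop_loss is not None and ret <= -stop_loss:
                break  # loss threshold hit first: fail
            if ret >= target:
                days_to_win = j - i + 1  # the start day counts as day 1
                break
        rows.append({"start_date": dates[i], "days_to_win": days_to_win})
    return pd.DataFrame(rows)


def win_rate(result):
    """Fraction of start days that reached the target."""
    return float((result["days_to_win"] > 0).mean())
```

With the notebook's data, `days_to_target(df, stop_loss=0.05)` would correspond to the loss-threshold variant and `days_to_target(df, max_hold=7)` to the max-days variant. Checking only closing prices is a simplification; using the daily `high` and `low` would catch intraday threshold crossings.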