# Binance Project

# The Plan

## 1. Tidy Data Analysis

All imports are collected in `imports.py` (under `utils/`). Please install the libraries it lists before running the notebook.

In this stage, I obtained candlestick data for the BTCUSD trading pair by querying the Binance REST API hosted at <https://api.binance.us/api/v3/klines>.
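As a rough sketch of this step (not the project's actual code), a klines request could be built and sent with the standard library alone. The helper names `build_klines_url` and `fetch_klines`, the `1m` interval, and the `limit` of 1000 are assumptions for illustration; `symbol`, `interval`, and `limit` are query parameters of the klines endpoint.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

KLINES_URL = "https://api.binance.us/api/v3/klines"


def build_klines_url(symbol="BTCUSD", interval="1m", limit=1000):
    """Return the full request URL for one page of candlesticks."""
    query = urlencode({"symbol": symbol, "interval": interval, "limit": limit})
    return f"{KLINES_URL}?{query}"


def fetch_klines(symbol="BTCUSD", interval="1m", limit=1000, timeout=10):
    """Download candlesticks as a list of 12-element lists (needs network access)."""
    url = build_klines_url(symbol, interval, limit)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_klines()` returns the raw rows described in the appendix; nothing is requested until the function is called.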

I cleaned and prepped the data by:
  - preparing column labels.
  - converting `'close_time'` to `datetime`.
  - setting `'close_time'` as the index.
  - splitting the dataset into train, validate, and test.
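The cleaning steps above can be sketched with pandas roughly as follows. This is an illustration, not the contents of `utils/tidy.py`: the function names, the snake_case column labels (other than `close_time`), and the 50/30/20 chronological split are assumptions.

```python
import pandas as pd

# Assumed snake_case labels for the 12 fields of a kline row.
COLUMNS = [
    "open_time", "open", "high", "low", "close", "volume", "close_time",
    "quote_asset_volume", "number_of_trades",
    "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume", "ignore",
]


def tidy_klines(raw):
    """Label columns, cast prices to float, and index by close time."""
    df = pd.DataFrame(raw, columns=COLUMNS)
    numeric = ["open", "high", "low", "close", "volume"]
    df[numeric] = df[numeric].astype(float)  # the API returns these as strings
    df["close_time"] = pd.to_datetime(df["close_time"], unit="ms")
    return df.set_index("close_time").sort_index()


def split_by_time(df, train_frac=0.5, validate_frac=0.3):
    """Split chronologically so later rows never leak into earlier sets."""
    train_end = int(len(df) * train_frac)
    validate_end = train_end + int(len(df) * validate_frac)
    return df.iloc[:train_end], df.iloc[train_end:validate_end], df.iloc[validate_end:]
```

Splitting by position rather than at random keeps the time order intact, which matters for the forecasting models later on.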

## 2. Exploratory Data Analysis

I began by examining the relationships between each feature and the close price, which I treated as the target. I then explored further with Python, pandas, and statsmodels to answer my initial questions, starting with the descriptive statistics of the dataset. Further investigation covered up- and down-sampling, frequency analysis, lag response, and autocorrelation. The frequency analysis revealed potential price indicators.
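To make a few of these techniques concrete, here is a small pandas sketch of down-sampling, lagging, and autocorrelation. The helper names and the synthetic price series are assumptions for illustration only; they are not the notebook's code or data.

```python
import pandas as pd


def downsample_mean(close, rule="5min"):
    """Average a close-price series into coarser time buckets."""
    return close.resample(rule).mean()


def add_lag(close, periods=1):
    """Pair each close price with the price observed `periods` steps earlier."""
    return pd.DataFrame({"close": close, f"close_lag_{periods}": close.shift(periods)})


def autocorrelation(close, max_lag=5):
    """Pearson autocorrelation of the series at lags 1..max_lag."""
    return {lag: close.autocorr(lag=lag) for lag in range(1, max_lag + 1)}


# Synthetic, evenly spaced prices purely for illustration (not project data).
index = pd.date_range("2022-04-26 03:30", periods=10, freq="1min")
close = pd.Series([100.0 + i for i in range(10)], index=index, name="close")

five_minute = downsample_mean(close)
lagged = add_lag(close)
acf = autocorrelation(close, max_lag=3)
```

Up-sampling works the same way with a finer `rule`, followed by a fill method such as `ffill()` to populate the new rows.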

## 3. Predictive Model Analysis

I used data from 26 April 2022, from approximately 03:30 to 20:30, to determine whether the candlestick close price, in conjunction with the time index, could be used to predict future close prices, and then compared the predicted values against the actual values. I offer several models that attempt to predict the future price of the BTCUSD trading pair: last observed value (LOV), simple average, a 15-minute simple moving average from TA-Lib, and a basic Holt's linear trend. Root mean square errors (RMSE) are reported for comparison.
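The four baselines and the RMSE comparison can be sketched in plain Python as below. This is a simplified stand-in, not the code in `utils/model.py`: the moving average here is the mean of the last 15 observations held flat over the horizon rather than a TA-Lib call, Holt's method is hand-rolled, and the smoothing parameters `alpha` and `beta` are assumed values.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def last_observed_value(train, horizon):
    """Repeat the final training value for every future step."""
    return [train[-1]] * horizon


def simple_average(train, horizon):
    """Repeat the mean of all training values."""
    return [sum(train) / len(train)] * horizon


def moving_average(train, horizon, window=15):
    """Repeat the mean of the last `window` training values."""
    recent = train[-window:]
    return [sum(recent) / len(recent)] * horizon


def holt_linear(train, horizon, alpha=0.8, beta=0.2):
    """Holt's linear trend: smooth a level and a trend, then extrapolate."""
    level, trend = train[0], train[1] - train[0]
    for value in train[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


# Toy numbers purely for illustration (not project data).
train = [100.0, 101.0, 102.0, 103.0, 104.0]
actual = [105.0, 106.0]
scores = {
    "last observed value": rmse(actual, last_observed_value(train, len(actual))),
    "simple average": rmse(actual, simple_average(train, len(actual))),
    "moving average": rmse(actual, moving_average(train, len(actual), window=3)),
    "holt linear": rmse(actual, holt_linear(train, len(actual))),
}
```

On a perfectly linear toy series Holt's method tracks the trend while the flat forecasts fall behind; real price data will not be this clean, so the ranking there has to come from the reported RMSE values.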


# Executive Summary

## Big Idea

Predict near-future bitcoin prices.

## Goals

1. Use descriptive statistics to identify key price points
2. Predict future price of bitcoin

## Key Findings

> "The future is certain. It is just not known."
> - Johnny Rich

1. Frequency analysis revealed potential price indicators.
2. Additional modeling may be more predictive.

## Recommendations

1. Use the price indicators to inform portfolio decisions.
2. Explore multi-model performance.
   - e.g., SMA(15) overlaid on SMA(30)


# Findings' Visualizations

Here we see the data flow to our final product.

```python
# imports.py in /utils/
from utils.imports import *

# plotting magic
%matplotlib inline
# plotting defaults
plt.rc('figure', figsize=(16, 9))
plt.style.use('seaborn-darkgrid')
plt.rc('font', size=16)
# plt.style.available
# ^^^ show available seaborn styles

# !!! Warning !!!
# *** no more warnings ***
# import warnings
# warnings.filterwarnings("ignore")

# custom mods
from utils.tidy import *
from utils.model import *
```

```text
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
```


# Recommendations

# "What If... " Visualizations

# Conclusion

# Next Steps

# Appendix

Data dictionary:

Variable | Definition
--- | ---
Open time | time the candlestick opened
Open | price at open
High | highest price during the 1-minute interval
Low | lowest price during the 1-minute interval
Close | price at close
Volume | amount of the base asset (BTC) traded during the 1-minute interval
Close time | time candlestick closed
Quote asset volume | n/a
Number of trades | n/a
Taker buy base asset volume | n/a
Taker buy quote asset volume | n/a
Ignore | n/a

```text
Example data entry:

1499040000000,      // Open time
"0.00386200",       // Open
"0.00386200",       // High
"0.00386200",       // Low
"0.00386200",       // Close
"0.47000000",       // Volume
1499644799999,      // Close time
"0.00181514",       // Quote asset volume
1,                  // Number of trades
"0.47000000",       // Taker buy base asset volume
"0.00181514",       // Taker buy quote asset volume
"0"                 // Ignore.

```
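The open and close times in the example entry are integers rather than readable dates. Assuming they are millisecond Unix timestamps in UTC, a small sketch like the following decodes them; the helper name `ms_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(milliseconds):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    # Integer milliseconds go through timedelta to avoid float rounding.
    return EPOCH + timedelta(milliseconds=milliseconds)


open_time = ms_to_utc(1499040000000)   # "Open time" from the example entry
close_time = ms_to_utc(1499644799999)  # "Close time" from the example entry
```

Under that assumption, the example entry opens at 2017-07-03 00:00:00 UTC and closes at 2017-07-09 23:59:59.999 UTC.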
