# Fundamentals of Machine Learning, Fall 2021, Final Project

## Stock Closing Price Prediction

1. Download ```final_project.pdf```, ```Kaggle & Colab Guide.pptx```, and ```utils.py``` from i-campus.
2. Go to the [Kaggle competition page](https://www.kaggle.com/c/2021mlfinal), join Kaggle and the competition, and download the dataset.
3. Following the guide slides, upload ```utils.py```.
4. Mount Google Drive.
5. Implement your own model and predict on test dates.
6. Download and submit ```submission.csv``` to Kaggle.
7. Write a report on your project and submit it on i-campus.

# INITIAL PACKAGES

In [1]:
# INITIAL PACKAGES
import os
import numpy as np
import pandas as pd

from utils import load_data, run


## Mount Google Drive

Assume you made a ```final_project``` directory at the root of your Google Drive,
and the data files are in it.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
gdrive_root = '/content/gdrive/My Drive'
data_path = os.path.join(gdrive_root, 'final_project')
os.listdir(data_path)

In [None]:
train_data, valid_data, test_input = load_data(data_path)

In [None]:
print(f'train id answer:\n {train_data[0].head()}')
print(f'train input shape: {train_data[1].shape}\n')

print(f'valid id answer:\n {valid_data[0].head()}')
print(f'valid input shape: {valid_data[1].shape}\n')

print(f'test id:\n {test_input[0].head()}')
print(f'test input shape: {test_input[1].shape}\n')
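The four-axis shape printed above can be hard to picture, so here is a minimal sketch on a small synthetic array. It assumes the layout given in the ```train_and_predict``` docstring, (timestamps, 1 + stocks, previous dates, features), and follows the example code in treating slot 0 of the second axis as an index series. All sizes and names below are made up for illustration and are not those of the real dataset.

```python
import numpy as np

# Synthetic stand-in for one input array; every size here is hypothetical.
n_timestamps, n_stocks, n_prev_dates, n_features = 4, 3, 10, 5
inputs = np.random.default_rng(0).normal(
    size=(n_timestamps, 1 + n_stocks, n_prev_dates, n_features)
)

# Slot 0 on axis 1 is split off as the index series; the rest are stocks.
index_part = inputs[:, 0]    # shape (4, 10, 5)
stock_part = inputs[:, 1:]   # shape (4, 3, 10, 5)

# Keep only the last 3 entries along the previous-dates axis.
recent = stock_part[:, :, -3:]   # shape (4, 3, 3, 5)

# One row per (timestamp, stock) pair, with dates and features flattened.
flat = recent.reshape(n_timestamps * n_stocks, -1)   # shape (12, 15)

print(index_part.shape, stock_part.shape, recent.shape, flat.shape)
```

Because the reshape is row-major, row 0 of ```flat``` is the first stock at the first timestamp, row 1 the second stock at the same timestamp, and so on. The answer vector has to be ordered the same way for the rows and targets to line up.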

---

# SHOW YOUR WORK
From here, import the packages you need, as long as they are permitted. <br>
Fill in the ```train_and_predict``` function with your code. <br>
If you want, you can implement your own classes or functions within the "SHOW YOUR WORK" block. <br>
The rest of the work is ours.
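As one example of a helper you might define in this block, the sketch below checks a prediction frame against the contract that ```train_and_predict``` is expected to satisfy: columns ```['id', 'answer']```, the same ids as the test input, and no missing answers. The name ```check_prediction``` and the sample ids are hypothetical; this is not part of ```utils.py```.

```python
import pandas as pd


def check_prediction(pred, test_id):
    """Return a list of problems found in a prediction frame (empty if none).

    Hypothetical helper: `pred` is the frame returned by train_and_predict,
    `test_id` is the DataFrame holding the test ids.
    """
    problems = []
    if list(pred.columns) != ['id', 'answer']:
        problems.append(f"unexpected columns: {list(pred.columns)}")
        return problems
    if len(pred) != len(test_id):
        problems.append(f"expected {len(test_id)} rows, got {len(pred)}")
    elif pred['id'].tolist() != test_id['id'].tolist():
        problems.append("ids differ from the test ids")
    if pred['answer'].isna().any():
        problems.append("answer contains NaN")
    return problems


# Usage on made-up data.
test_id = pd.DataFrame({'id': ['a', 'b', 'c']})
good = pd.DataFrame({'id': ['a', 'b', 'c'], 'answer': [1.0, 0.99, 1.01]})
bad = pd.DataFrame({'id': ['a', 'b', 'c'], 'answer': [1.0, float('nan'), 1.01]})

print(check_prediction(good, test_id))
print(check_prediction(bad, test_id))
```

Returning a list of messages instead of raising lets you see every problem at once before handing the function to ```run```.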

In [None]:
# IMPORT PACKAGES YOU NEED
from sklearn.tree import DecisionTreeRegressor



In [None]:
# YOUR OWN CLASSES OR FUNCTIONS



In [None]:
def train_and_predict(train_data, valid_data, test_data):
    """Train a model and return prediction on test input.

    Given train and valid data, build your model and optimize.
    Then, return predictions on test_input.

    You can import packages you want inside 'EDIT HERE' as long as they are permitted.
    (See document for the list of possible packages)

    arguments:
        train_data: tuple of (pandas.DataFrame, np.array).
        - 0: pandas.DataFrame with columns ['id', 'answer']
          'id' contains unique id assigned to each timestamp.
          'answer' contains closing price ratio corresponding to its timestamp.
        - 1: train input in np.array of (# of train timestamps, 1 + # of stocks, # of previous dates to be input, # of features)

        valid_data: tuple of (pandas.DataFrame, np.array).
        - 0: pandas.DataFrame with columns ['id', 'answer']
          'id' contains unique id assigned to each timestamp.
          'answer' contains closing price ratio corresponding to its timestamp.
        - 1: valid input in np.array of (# of valid timestamps, 1 + # of stocks, # of previous dates to be input, # of features)

        test_data: tuple of (pandas.DataFrame, np.array).
        - 0: pandas.DataFrame with columns ['id']
          'id' contains unique id assigned to each timestamp.
        - 1: test input in np.array of (# of test timestamps, 1 + # of stocks, # of previous dates to be input, # of features)
    
    returns:
        pandas.DataFrame, predictions on test input with columns ['id', 'answer'].
        'id' should contain the unique id assigned to each test input.
        'answer' should contain the prediction on the test input corresponding to its id.

    """
    # Example code for DecisionTreeRegressor:
    train_id_answer, train_input = train_data
    valid_id_answer, valid_input = valid_data
    test_id, test_input = test_data

    num_train = len(train_input)
    num_valid = len(valid_input)
    num_test = len(test_input)

    # Separate index
    index_train = train_input[:, 0]
    x_train = train_input[:, 1:]
    y_train = train_id_answer['answer'].values

    index_valid = valid_input[:, 0]
    x_valid = valid_input[:, 1:]
    y_valid = valid_id_answer['answer'].values

    index_test = test_input[:, 0]
    x_test = test_input[:, 1:]

    # Use the last 300 train samples and the last 100 valid samples (first axis)
    x_train = x_train[-300:]
    y_train = y_train[-300*x_train.shape[1]:]

    x_valid = x_valid[-100:]
    y_valid = y_valid[-100*x_valid.shape[1]:]

    # Use previous 3 days to predict
    x_train = x_train[:, :, -3:]
    x_valid = x_valid[:, :, -3:]
    x_test = x_test[:, :, -3:]

    # Fit data shape for model
    x_train_shape = x_train.shape
    x_train = x_train.reshape(x_train_shape[0] * x_train_shape[1], -1)

    x_valid_shape = x_valid.shape
    x_valid = x_valid.reshape(x_valid_shape[0] * x_valid_shape[1], -1)

    x_test_shape = x_test.shape
    x_test = x_test.reshape(x_test_shape[0] * x_test_shape[1], -1)
    
    # Build DecisionTreeRegressor
    # You must set the random seed to a specific number;
    # we will check the reproducibility of your model.
    model = DecisionTreeRegressor(random_state=2021, criterion="absolute_error")

    # Fit model
    print(x_train.shape, y_train.shape)
    model.fit(x_train, y_train)

    print("validation score (R^2): ", model.score(x_valid, y_valid))

    prediction = model.predict(x_test)

    print(prediction.shape)
    
    # Make prediction data frame
    test_id['answer'] = prediction
    pred = test_id.loc[:, ['id', 'answer']]

    return pred
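The example above prints ```model.score```, which for scikit-learn regressors is the coefficient of determination (R^2). If you also want an error expressed in the same units as the closing price ratio, a mean absolute error can be computed by hand. The sketch below is a minimal illustration; the helper name ```mae``` and the numbers are hypothetical, and nothing here is meant to imply which metric the competition uses.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between two equally shaped arrays (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    return float(np.mean(np.abs(y_true - y_pred)))


# Made-up validation targets and predictions.
y_valid_demo = np.array([1.00, 1.02, 0.98, 1.01])
y_hat_demo = np.array([1.01, 1.00, 0.99, 1.01])

print("validation MAE:", mae(y_valid_demo, y_hat_demo))
```

Inside ```train_and_predict``` the same call would be ```mae(y_valid, model.predict(x_valid))```, placed next to the existing score print.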

---

# YOUR WORK IS DONE!
Do not touch any line below. <br>
The ```run``` function will grab your prediction and create ```submission.csv```. <br>
Take it and submit to Kaggle!

In [None]:
run(train_and_predict, train_data, valid_data, test_input)