## Prerequisites

### Course Server Setup

#### Miniconda 3 setup

Miniconda is a lightweight, open-source package and environment manager distributed by Anaconda, Inc. It provides a simple way to install and manage Python packages and their dependencies on Windows, macOS, and Linux. Unlike Anaconda, which ships a large collection of pre-installed scientific packages, Miniconda includes only conda, Python, the packages they depend on, and a few other useful packages, so you install just what your project needs. With Miniconda you can create isolated environments, switch between them, and share them with others by exporting them to an `environment.yml` file (`conda env export`). Recent conda releases (23.10 and later) also use the faster libmamba dependency solver by default. This makes Miniconda a flexible choice for data scientists, researchers, and developers who manage several projects with different dependencies.

To install Miniconda 3, open a shell on your server. In this case, I use the course server mcs1.wlu.ca.

```bash
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
```

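Once Miniconda 3 is installed, run the following command to add conda's initialization code to your shell startup file (`~/.bashrc` for bash on Linux, `~/.bash_profile` on macOS). Open a new shell afterwards so that the `conda` command is available.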

```bash
~/miniconda3/bin/conda init bash
```

#### Create conda environment

Conda is a versatile tool for managing packages, dependencies, and environments for many languages, including Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, and FORTRAN. It is particularly popular in data science and machine learning. The `conda create --name` command creates a new isolated environment; the `--name` flag is followed by the environment name, in this case `cp631-final`.


```bash
conda create --name cp631-final
```

#### Required packages installation by conda

The `conda install` command installs packages into a specific conda environment: the `--name` flag selects the environment, and the `--file` flag reads the package specifications, one per line, from a text file.

```bash
conda install --name cp631-final --file requirement.txt
```
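The contents of `requirement.txt` are not shown in this notebook. As a hypothetical starting point, the sketch below assumes the file simply mirrors the packages the notebook installs later (`numba=0.55.0`, `mpi4py=3.1.4`, `kaggle`, `yfinance`, `distro`, and the scientific stack) plus `jupyter`, which the `jupyter notebook` command needs. It is not the author's actual file.

```python
from pathlib import Path

# Hypothetical package list for requirement.txt, mirroring what this notebook installs
PACKAGES = [
    "jupyter",
    "numpy",
    "pandas",
    "matplotlib",
    "seaborn",
    "distro",
    "numba=0.55.0",
    "mpi4py=3.1.4",
    "kaggle",
    "yfinance",
]


def format_requirements(packages=PACKAGES):
    """Return one conda match spec per line, the format `conda install --file` reads."""
    return "\n".join(packages) + "\n"


def write_requirements(path="requirement.txt", packages=PACKAGES):
    """Write the package list to `path` and return it as a Path."""
    path = Path(path)
    path.write_text(format_requirements(packages))
    return path


if __name__ == "__main__":
    print(format_requirements(), end="")
```

The notebook installs `mpi4py`, `kaggle`, and `yfinance` from conda-forge, so with this list the install would likely need the channel too: `conda install --name cp631-final -c conda-forge --file requirement.txt`.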

#### SSH Tunneling

To let your local machine connect to the Jupyter Notebook server running on the course server, the VPN connection must be up. You can then use SSH tunnelling to forward local port 8888 to port 8888 on the course server.

```bash
ssh -L 8888:localhost:8888 wlai11@mcs1.wlu.ca
```
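Before opening the notebook in a browser, it can help to confirm that something is listening on the forwarded local port. A minimal standard-library sketch, assuming the tunnel above is open; `port_is_open` is a hypothetical helper name:

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    print("Tunnel port 8888 reachable:", port_is_open())
```

A True result only shows that the local SSH client is listening; ssh accepts the local connection even if Jupyter has not been started on the server yet.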

#### Start Jupyter Notebook

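Once the SSH session on the remote server is open, run the following commands to activate the new conda environment `cp631-final` and start the Jupyter Notebook server. Then open the `localhost:8888` URL printed by Jupyter, including its access token, in your local browser.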

```bash
conda activate cp631-final
jupyter notebook --no-browser --port=8888
```

### Declare params dict

In [39]:
params = {}

### Google Colab Environment Checking

In [40]:
import os
import sys

params["in_colab"] = 'google.colab' in sys.modules
print("In colab: ", params["in_colab"])
os.environ["PROJECT_ROOT"] = "./"
print("Project root: ", os.environ["PROJECT_ROOT"])



In colab:  False
Project root:  ./


### Jupyter Notebook Environment Checking

In [41]:
def in_notebook():
    """Return True when running inside a Jupyter (IPython kernel) session."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this cannot be a notebook
        return False
    ipython_instance = get_ipython()
    if ipython_instance is None:
        return False
    # Terminal IPython sessions have no IPKernelApp entry in their config
    return 'IPKernelApp' in ipython_instance.config

params["in_notebook"] = in_notebook()
print(f"in_notebook: {params['in_notebook']}")

in_notebook: True


### Operating System Environment Checking

In this code, `platform.system()` returns the name of the operating system: 'Darwin' for macOS, 'Linux' for Linux, 'Windows' for Windows, and so on. If it returns 'Darwin', you are using macOS. On Linux, `distro.id()` identifies the distribution, which sets the Debian/Ubuntu and CentOS/RHEL flags; the first cell installs the `distro` package if it is missing.

In [42]:
import subprocess
import importlib.util

if params["in_notebook"] and importlib.util.find_spec("distro") is None:
    # !conda install distro -y
    subprocess.run(["conda", "install", "distro", "-y"])

In [43]:
import platform
import distro

if platform.system() == 'Darwin':
    params["is_macos"] = True
else:
    params["is_macos"] = False

print(f'is_macos: {params["is_macos"]}')

if platform.system() == 'Linux':
    distro_name = distro.id()
    if 'debian' in distro_name.lower() or 'ubuntu' in distro_name.lower():
        params["is_debian"] = True
        params["is_redhat"] = False
    elif 'centos' in distro_name.lower() or 'rhel' in distro_name.lower():
        params["is_debian"] = False
        params["is_redhat"] = True
    else:
        params["is_debian"] = False
        params["is_redhat"] = False
else:
    params["is_debian"] = False
    params["is_redhat"] = False
    
print(f'is_debian: {params["is_debian"]}')
print(f'is_redhat: {params["is_redhat"]}')

if platform.system() == 'Windows':
    params["is_windows"] = True
else:
    params["is_windows"] = False
print(f'is_windows: {params["is_windows"]}')

is_macos: False
is_debian: False
is_redhat: True
is_windows: False
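The branching above can be condensed into one pure function, which makes the rules testable without running on each operating system. This is a sketch: `classify_platform` is a hypothetical helper that takes the `platform.system()` string and the `distro.id()` string as arguments and applies the same substring rules as the cell above.

```python
def classify_platform(system_name, distro_id=""):
    """Map platform.system() and distro.id() values to the notebook's boolean flags."""
    distro_id = distro_id.lower()
    is_linux = system_name == "Linux"
    return {
        "is_macos": system_name == "Darwin",
        "is_windows": system_name == "Windows",
        # Same substring checks as the cell above; only meaningful on Linux
        "is_debian": is_linux and ("debian" in distro_id or "ubuntu" in distro_id),
        "is_redhat": is_linux and ("centos" in distro_id or "rhel" in distro_id),
    }


if __name__ == "__main__":
    import platform

    print(classify_platform(platform.system()))
```

For example, `classify_platform("Linux", "rhel")` reproduces the `is_redhat: True` result recorded above for the course server.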


### Check if running under Windows Subsystem for Linux (WSL)

In [44]:
def is_wsl():
    try:
        with open('/proc/version', 'r') as fh:
            return 'microsoft' in fh.read().lower()
    except FileNotFoundError:
        return False

params["is_wsl"] = is_wsl()
print(f"is_wsl: {params['is_wsl']}")


is_wsl: False


### Check if MPI installed in OS

Use `mpirun --version` to check whether MPI is installed.

In [45]:
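import shutil
import subprocess

def is_mpi_installed():
    # Prefer mpirun from PATH; on macOS fall back to the /usr/local/bin path used originally
    mpirun = shutil.which("mpirun")
    if mpirun is None and params["is_macos"]:
        mpirun = "/usr/local/bin/mpirun"
    if mpirun is None:
        return False
    try:
        subprocess.check_output([mpirun, "--version"])
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        return False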

params["mpi_installed"] = is_mpi_installed()
print(f'MPI installed: {params["mpi_installed"]}')

if not params["mpi_installed"]:
    print("[FATAL] MPI is not installed")

MPI installed: True
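The check above only confirms that `mpirun` exists. Inside a Jupyter kernel the notebook runs as a single process, so `MPI.COMM_WORLD` has size 1 (the run logs later show `Rank: 0, Size: 1`). Using several ranks requires launching the MPI code from a script through `mpirun`. A hedged sketch that builds such a command; the script name `emarsi_hybrid.py` is hypothetical, and `-np` is Open MPI's process-count option.

```python
import sys


def mpirun_command(script, nprocs=4, mpirun="mpirun"):
    """Build an Open MPI command that runs `script` with `nprocs` Python ranks."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    # Use the current interpreter so every rank sees the same conda environment
    return [mpirun, "-np", str(nprocs), sys.executable, script]


if __name__ == "__main__":
    # emarsi_hybrid.py is a hypothetical script that would call main_hybrid()
    print(" ".join(mpirun_command("emarsi_hybrid.py", nprocs=4)))
```

Passing the returned list to `subprocess.run(..., check=True)` would start the ranks from the notebook or from a shell script.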


### Check if NVIDIA CUDA toolkit installed

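Use numba's CUDA support to check whether an NVIDIA GPU and a working CUDA driver are available. The `numba` package (pinned to 0.55.0 here) is installed first if it is missing.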

In [46]:
if params["in_notebook"] and importlib.util.find_spec("numba") is None:
    subprocess.run(["conda", "install", "numba=0.55.0", "-y"])

In [47]:
from numba import cuda

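def is_cuda_installed():
    # is_available() returns False instead of raising when no GPU or driver is found
    if not cuda.is_available():
        return False
    # detect() prints a summary of the GPUs and returns True if any is supported
    return bool(cuda.detect())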

params["cuda_installed"] = is_cuda_installed()
print(f'CUDA installed: {params["cuda_installed"]}')

if not params["cuda_installed"]:
    print("[FATAL] CUDA is not installed")

Found 1 CUDA devices
id 0    b'Tesla P100-PCIE-16GB'                              [SUPPORTED]
                      Compute Capability: 6.0
                           PCI Device ID: 0
                              PCI Bus ID: 59
                                    UUID: GPU-f059d201-c530-5269-8b00-0300bdfd892b
                                Watchdog: Disabled
             FP32/FP64 Performance Ratio: 2
Summary:
	1/1 devices are supported
CUDA installed: True


### Install MPI and CUDA if not installed

**Reminder**: Recent MacBooks do not ship with an NVIDIA CUDA-capable GPU, NVIDIA stopped releasing the CUDA toolkit for macOS after CUDA 10.2, and the CUDA toolkit cannot run CUDA code without an NVIDIA GPU. This program therefore does not support macOS.

If `mpi_installed` above shows False, install the Open MPI binaries and development libraries for your platform.

On Ubuntu, you can install Open MPI as follows:

```bash
sudo apt update
sudo apt install openmpi-bin
sudo apt install libopenmpi-dev
```

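The following cell installs Open MPI with conda when the notebook runs in Google Colab and MPI is missing. Google Colab does not ship conda by default, so conda may need to be set up first (for example with the `condacolab` package).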

In [48]:
import os

if params["in_notebook"]:
    if params["in_colab"] and not params["mpi_installed"]:
        print("Installing MPI")
        subprocess.run(["conda", "install", "openmpi", "-y" ])
        print("MPI installed")
    elif params["mpi_installed"]:
        print("MPI is installed")

MPI is installed


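If `cuda_installed` shows False, install the NVIDIA CUDA toolkit for your platform.

On Debian-based Linux (except Ubuntu on WSL 2 under Windows 10/11), you can install CUDA as follows. These commands use NVIDIA's Debian 10 repository, so pick the repository that matches your distribution and release. NVIDIA also rotated its repository signing key in 2022, so the `apt-key` step may need the current key listed in NVIDIA's CUDA installation guide.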

```bash
sudo apt update
sudo apt install -y gnupg2
wget https://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/cuda-repo-debian10_10.2.89-1_amd64.deb
sudo dpkg -i cuda-repo-debian10_10.2.89-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/7fa2af80.pub
sudo apt update
sudo apt-get install cuda
```

In Google Colab, CUDA comes preinstalled; select a GPU runtime to use it.

## Environment Setup

### Kaggle Authentication

In this notebook, we download a dataset from Kaggle, so you need a Kaggle account. The `kaggle` package authenticates with an API token: download `kaggle.json` from your Kaggle account settings page and save it as `~/.kaggle/kaggle.json`, readable only by you (`chmod 600`), or set the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables. In Google Colab, upload the file and copy it to the same location. Do this before running the notebook, because `import kaggle` attempts to authenticate as soon as the module is imported.
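A sketch that copies a downloaded `kaggle.json` into `~/.kaggle` with owner-only permissions, so the Kaggle client finds it and does not warn that the key is readable by other users. It assumes the file was first saved in the project root; the helper name `install_kaggle_credentials` is hypothetical.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_credentials(source="kaggle.json", home=None):
    """Copy kaggle.json to <home>/.kaggle/kaggle.json with 0600 permissions."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(source, target)
    # Owner read/write only, as the Kaggle client recommends
    os.chmod(target, 0o600)
    return target


if __name__ == "__main__":
    src = Path(os.environ.get("PROJECT_ROOT", "./")) / "kaggle.json"
    if src.exists():
        print("Credentials installed at", install_kaggle_credentials(src))
    else:
        print("No kaggle.json found at", src)
```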

### Install required packages

Installing the required packages is an essential step in this notebook. The cell below uses conda (the conda-forge channel for `mpi4py`, `kaggle`, and `yfinance`) and installs a package only when `importlib.util.find_spec` reports that it is missing. Among the mandatory packages, `mpi4py` provides the MPI bindings for distributed computing, `kaggle` downloads the S&P 500 constituents dataset, and `yfinance` fetches stock price history. Google Colab already bundles packages such as numpy, matplotlib, pandas, and seaborn, but they must be installed separately in a local environment.

In [49]:
if params["in_notebook"]:
    subprocess.run(["conda", "install", "pip", "-y"])
    if importlib.util.find_spec("mpi4py") is None:
        subprocess.run(["conda", "install", "-c", "conda-forge", "mpi4py=3.1.4", "-y"])
    if importlib.util.find_spec("kaggle") is None:
        subprocess.run(["conda", "install", "-c", "conda-forge", "kaggle", "-y"])
    # if importlib.util.find_spec("opendatasets") is None:
    #     subprocess.run(["conda", "install", "-c", "conda-forge", "opendatasets", "-y"])
    if importlib.util.find_spec("yfinance") is None:
        subprocess.run(["conda", "install", "-c", "conda-forge", "yfinance", "-y"])

    if not params["in_colab"]:
        print("Installing required packages for local environment")
        if importlib.util.find_spec("numpy") is None:
            subprocess.run(["conda", "install", "numpy", "-y"])
        if importlib.util.find_spec("matplotlib") is None:
            subprocess.run(["conda", "install", "matplotlib", "-y"])
        if importlib.util.find_spec("seaborn") is None:
            subprocess.run(["conda", "install", "seaborn", "-y"])
        if importlib.util.find_spec("pandas") is None:
            subprocess.run(["conda", "install", "pandas", "-y"])
        
        if params["cuda_installed"]:
            # cudatoolkit is a conda package with no importable module, so find_spec() cannot
            # detect it; conda install is a no-op when it is already present
            subprocess.run(["conda", "install", "cudatoolkit", "-y"])

        print("Common required packages installed")


Channels:
 - defaults
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Installing required packages for local environment
Channels:
 - defaults
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Common required packages installed
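The install cell repeats one pattern per package: ask `importlib.util.find_spec` whether a module is importable and install it only if not. That check can be factored into a single helper. A sketch; `missing_modules` is a hypothetical name, and `find_spec` works on import names, which can differ from conda package names (`cudatoolkit`, for instance, installs no importable module).

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names that cannot currently be imported."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["mpi4py", "kaggle", "yfinance", "numpy", "matplotlib", "seaborn", "pandas"]
    print("Missing:", missing_modules(required) or "none")
```

Each returned name could then be passed to `conda install -y`, as the cell above does one package at a time.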


Check the CUDA compiler (`nvcc`) version and numba's system information.

In [50]:
subprocess.run(["nvcc", "--version"])

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


CompletedProcess(args=['nvcc', '--version'], returncode=0)

In [51]:
if params["in_notebook"]:
    subprocess.run(["numba", "-s"])

System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time)                   : 2024-03-26 22:19:38.803977
UTC start time                                : 2024-03-27 02:19:38.803984
Running time (s)                              : 1.881194

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : skylake-avx512
CPU Count                                     : 80
Number of accessible CPUs                     : 80
List of accessible CPUs cores                 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit adx aes avx avx2 avx512bw
                          

### Import required packages

In [52]:
# import datetime
import csv
import logging
import numpy as np
# import opendatasets as od
import kaggle
import os
import pandas as pd
import random
import time
import yfinance as yf

from datetime import datetime, timedelta

if params["mpi_installed"]:
    from mpi4py import MPI

if params["cuda_installed"]:
    from numba import cuda, float32


## S&P 500 Constituents Dataset Download

First, download the S&P 500 constituents list from my Kaggle dataset.

In [53]:
# od.download("https://www.kaggle.com/datasets/reidlai/s-and-p-500-constituents")
kaggle.api.authenticate()
kaggle.api.dataset_download_files('reidlai/s-and-p-500-constituents', path="s-and-p-500-constituents", unzip=True)

## Stock Price History Download

In [54]:
class Row:
    def __init__(self, timestamp, open, high, low, close, adjclose, volume):
        self.timestamp = timestamp
        self.open = open
        self.high = high
        self.low = low
        self.close = close
        self.adjclose = adjclose
        self.volume = volume

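def get_stock_price_history_quotes(stock_symbol, start_date, end_date):
    start_date = datetime.strptime(start_date, "%Y-%m-%dT%H:%M:%S")
    end_date = datetime.strptime(end_date, "%Y-%m-%dT%H:%M:%S")

    try:
        # auto_adjust=False keeps the 'Adj Close' column used below
        data = yf.download(stock_symbol, start=start_date, end=end_date, auto_adjust=False)
    except Exception as e:
        logging.error(f"Download failed for {stock_symbol}: {e}")
        return pd.DataFrame()

    # yfinance returns an empty DataFrame, rather than raising, for unknown symbols
    if data.empty:
        logging.error(f"Symbol not found: {stock_symbol}")
        return pd.DataFrame()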

    quotes = []
    for index, row in data.iterrows():
        quote = Row(index, row['Open'], row['High'], row['Low'], row['Close'], row['Adj Close'], row['Volume'])
        quotes.append(quote)

    quotes.sort(key=lambda x: x.timestamp)

    # convert quotes into dataframe
    quotes_df = pd.DataFrame([vars(quote) for quote in quotes])
    # add symbol column
    quotes_df['symbol'] = stock_symbol
    return quotes_df

## Technical Analysis

### EMA

In [55]:
def ema(values, days=12):
    alpha = 2 / (days + 1)
    ema_values = np.empty(len(values), dtype=float)  # float array so integer prices are not truncated
    ema_values[0] = values[0]  # start with the first value
    for i in range(1, len(values)):
        ema_values[i] = alpha * values[i] + (1 - alpha) * ema_values[i - 1]
    return ema_values
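The recurrence is `EMA[t] = alpha * x[t] + (1 - alpha) * EMA[t-1]` with `alpha = 2 / (days + 1)`, seeded with the first value. This is the same recursion pandas uses for `Series.ewm(span=days, adjust=False).mean()`, which `emarsi()` calls for the EMA12 and EMA26 columns. A plain-Python sketch with a small worked example (`ema_list` is a hypothetical name):

```python
def ema_list(values, days=12):
    """Exponential moving average seeded with the first value (like pandas ewm adjust=False)."""
    alpha = 2 / (days + 1)
    result = []
    for x in values:
        if not result:
            result.append(float(x))  # seed with the first observation
        else:
            result.append(alpha * x + (1 - alpha) * result[-1])
    return result


if __name__ == "__main__":
    # days=3 gives alpha = 0.5, so each step averages the new price with the previous EMA
    print(ema_list([1, 2, 3, 4], days=3))  # [1.0, 1.5, 2.25, 3.125]
```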


### RSI

In [56]:
def rsi(values, days=14):
    # Simple-average (Cutler) RSI over the first `days` price changes in `values`
    changes = np.diff(np.asarray(values, dtype=float))[:days]
    gains = changes[changes > 0].sum()
    losses = -changes[changes < 0].sum()
    if gains == 0 and losses == 0:
        rsi_value = np.nan  # no price movement: RSI is undefined
    elif losses == 0:
        rsi_value = 100.0  # only gains in the window
    else:
        # avg_gain / avg_loss equals gains / losses because both averages share a divisor
        rs = gains / losses
        rsi_value = 100 - (100 / (1 + rs))
    return rsi_value
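`rsi()` averages the gains and losses in its window with plain means (sometimes called Cutler's RSI); Wilder's original RSI smooths them with a running average instead, so values from charting tools can differ. A plain-Python worked example of the simple-average formula `RSI = 100 - 100 / (1 + RS)` with `RS = average gain / average loss` (`simple_rsi` is a hypothetical name):

```python
def simple_rsi(prices):
    """RSI from simple averages of all price changes in `prices`."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = -sum(c for c in changes if c < 0)
    if losses == 0:
        # Only gains gives RSI 100; no movement at all leaves RSI undefined
        return 100.0 if gains > 0 else float("nan")
    rs = gains / losses  # equal to avg_gain / avg_loss because both share the same divisor
    return 100 - 100 / (1 + rs)


if __name__ == "__main__":
    # Changes +1, -1, +2: gains 3, losses 1, RS = 3, RSI = 100 - 100 / 4
    print(simple_rsi([10, 11, 10, 12]))  # 75.0
```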

### MACD

In [57]:

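def macd(df, short_period=12, long_period=26, signal_period=9):
    # MACD line = EMA12 - EMA26 from the columns computed in emarsi();
    # the period arguments are currently unused and no signal line is computed
    df["MACD"] = df["EMA12"] - df["EMA26"]
    return df

if params["cuda_installed"]:
    @cuda.jit
    def macd_cuda(ema12, ema26, out):
        # One thread per row; surplus threads in the last block do nothing
        i = cuda.grid(1)
        if i < ema12.shape[0]:
            out[i] = ema12[i] - ema26[i]

    def macd_gpu(df, signal_period=9):
        ema12 = np.ascontiguousarray(df["EMA12"].to_numpy(dtype=np.float64))
        ema26 = np.ascontiguousarray(df["EMA26"].to_numpy(dtype=np.float64))
        out_device = cuda.device_array_like(ema12)
        # Enough 128-thread blocks to cover every row (ceiling division)
        threads_per_block = 128
        blocks = max(1, (ema12.shape[0] + threads_per_block - 1) // threads_per_block)
        macd_cuda[blocks, threads_per_block](cuda.to_device(ema12), cuda.to_device(ema26), out_device)
        df["MACD"] = out_device.copy_to_host()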
        return df
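A CUDA kernel launch `kernel[blocks, threads_per_block](...)` runs `blocks * threads_per_block` threads, and `cuda.grid(1)` gives each thread the global index `blockIdx.x * blockDim.x + threadIdx.x`. Enough blocks must be launched to cover every row, and the bounds check inside the kernel discards the surplus threads. The arithmetic in plain Python, assuming 128 threads per block (a common choice, not a requirement); the helper names are hypothetical:

```python
def launch_config(n_rows, threads_per_block=128):
    """Return (blocks, threads_per_block) covering n_rows with one thread per row."""
    if n_rows < 0 or threads_per_block < 1:
        raise ValueError("n_rows must be >= 0 and threads_per_block >= 1")
    # Ceiling division: the last block may be only partly used
    blocks = (n_rows + threads_per_block - 1) // threads_per_block
    return max(blocks, 1), threads_per_block


def global_index(block_idx, block_dim, thread_idx):
    """What cuda.grid(1) computes for a 1-D launch."""
    return block_idx * block_dim + thread_idx


if __name__ == "__main__":
    # 1000 rows -> 8 blocks of 128 = 1024 threads; the last 24 fail the bounds check
    print(launch_config(1000))  # (8, 128)
```

A launch of one block per row with a single thread each is also valid, but single-thread blocks leave most lanes of every GPU warp idle.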

## Core Main Program

### Read CSV files

In [58]:
# Read symbols from the CSV file
def read_symbols_from_csvfile(csvfile_path):
    symbols = []
    with open(csvfile_path, 'r') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # Skip the header
        for row in reader:
            symbols.append(row[0])  # Assuming the symbol is the first column
    return symbols
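`read_symbols_from_csvfile` keeps the first column of every row after the header. A self-contained sketch of the same logic reading from an in-memory string, so it runs without the Kaggle download; the column names and tickers below are placeholders, not the real contents of `sandp500-20240310.csv`, and `read_symbols` is a hypothetical name:

```python
import csv
import io


def read_symbols(lines):
    """Return the first column of each non-empty row, skipping the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header (no error on an empty file)
    return [row[0] for row in reader if row]


if __name__ == "__main__":
    sample = io.StringIO("Symbol,Name\nAAA,Alpha\nBBB,Beta\n")
    print(read_symbols(sample))  # ['AAA', 'BBB']
```

Filtering out empty rows guards against blank lines in the file, where `row[0]` on the empty list returned by `csv.reader` would raise an `IndexError`.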

### Core Logic

In [59]:
def emarsi(mode, symbols, start_date, end_date, rank, size, params):

    results = pd.DataFrame()
    # Fetch stock price history quotes using the local symbols
    for symbol in symbols:

        # Load the stock price history data into pandas DataFrame
        stock_price_history_df = get_stock_price_history_quotes(symbol, start_date, end_date)
        if stock_price_history_df.shape[0] > 0:
            stock_price_history_df['EMA12'] = stock_price_history_df['close'].ewm(span=12, adjust=False).mean()
            stock_price_history_df['EMA26'] = stock_price_history_df['close'].ewm(span=26, adjust=False).mean()
            # A 15-price window gives the 14 price changes a 14-day RSI needs
            stock_price_history_df['RSI'] = stock_price_history_df['close'].rolling(window=15).apply(rsi, raw=True)
            results = pd.concat([results, stock_price_history_df])
    return results

### Main Logic with Serial Programming

In [60]:
def main_serial(params):

    previous_day = datetime.now() - timedelta(days=1)
    first_day = previous_day - timedelta(days=int(params["numberOfDays"]))

    start_date = first_day.strftime('%Y-%m-%dT%H:%M:%S')
    end_date = previous_day.strftime('%Y-%m-%dT%H:%M:%S')
    
    data_dir = './data'

    gpu_cores = 0
    rank = 0
    size = 1
    serial_fetching_stock_start_time = time.time()

    print(f"Rank: {rank}, Size: {size}")

    # Read symbols from the CSV file
    symbols = read_symbols_from_csvfile(os.environ["PROJECT_ROOT"] + "s-and-p-500-constituents/sandp500-20240310.csv")
    symbols = symbols[:params["numberOfStocks"]]


    # ************** #
    # * Core logic * #
    # ************** #
    results = emarsi("serial", symbols, start_date, end_date, rank, size, params)
    results = macd(results)

    serial_fetching_stock_end_time = time.time()
    print(f"Serial fetching stock price history quotes completed in {serial_fetching_stock_end_time - serial_fetching_stock_start_time} seconds")
    serial_elapsedtime = serial_fetching_stock_end_time - serial_fetching_stock_start_time
    return results, serial_elapsedtime
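Both main functions build the download window the same way: it ends yesterday and starts `numberOfDays` calendar days earlier, formatted as `%Y-%m-%dT%H:%M:%S` so that `get_stock_price_history_quotes` can parse it back with `strptime`. Because the window counts calendar days, it contains fewer trading days. A sketch with a fixed clock so the result is reproducible; `date_window` is a hypothetical helper, and the timestamp is the one from the numba report above:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S"


def date_window(number_of_days, now=None):
    """Return (start_date, end_date) strings for the window ending the day before `now`."""
    if now is None:
        now = datetime.now()
    previous_day = now - timedelta(days=1)
    first_day = previous_day - timedelta(days=int(number_of_days))
    return first_day.strftime(FMT), previous_day.strftime(FMT)


if __name__ == "__main__":
    start, end = date_window(30, now=datetime(2024, 3, 26, 22, 19, 38))
    print(start, end)  # 2024-02-24T22:19:38 2024-03-25T22:19:38
```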

### Main Logic with Hybrid Programming

In [61]:
def main_hybrid(params):
    # Download window: the `numberOfDays` calendar days ending yesterday
    previous_day = datetime.now() - timedelta(days=1)
    first_day = previous_day - timedelta(days=int(params["numberOfDays"]))

    start_date = first_day.strftime('%Y-%m-%dT%H:%M:%S')
    end_date = previous_day.strftime('%Y-%m-%dT%H:%M:%S')

    # Report the GPU used for the MACD kernel
    if params["cuda_installed"]:
        device = cuda.get_current_device()
        print(f"GPU name: {device.name.decode('utf-8')}")

        gpu_cores = len(cuda.gpus)
        print(f"GPU cores: {gpu_cores}")
    else:
        print("CUDA is not available")
        gpu_cores = 0

    # Use MPI when available; otherwise run as a single process
    comm = MPI.COMM_WORLD if params["mpi_installed"] else None
    if comm is not None:
        rank = comm.Get_rank()
        size = comm.Get_size()
        start_time = MPI.Wtime()
    else:
        rank = 0
        size = 1
        start_time = time.time()

    print(f"Rank: {rank}, Size: {size}")

    # Root process reads the symbols and splits them into exactly `size` chunks
    if rank == 0:
        symbols = read_symbols_from_csvfile(os.environ["PROJECT_ROOT"] + "s-and-p-500-constituents/sandp500-20240310.csv")
        symbols = symbols[:params["numberOfStocks"]]

        # Contiguous chunks whose lengths differ by at most one, in the original order
        chunk, remainder = divmod(len(symbols), size)
        bounds = [i * chunk + min(i, remainder) for i in range(size + 1)]
        chunks = [symbols[bounds[i]:bounds[i + 1]] for i in range(size)]
    else:
        chunks = None

    # Each rank receives one chunk of symbols
    local_symbols = comm.scatter(chunks, root=0) if comm is not None else chunks[0]

    # ************** #
    # * Core logic * #
    # ************** #
    print(f"params: {params}")

    results = emarsi("parallel", local_symbols, start_date, end_date, rank, size, params)

    # Gather every rank's DataFrame on the root; the list includes the root's own results
    gathered = comm.gather(results, root=0) if comm is not None else [results]

    if rank == 0:
        results = pd.concat(gathered)
        # MACD line on the GPU when CUDA is available, otherwise on the CPU
        results = macd_gpu(results) if params["cuda_installed"] else macd(results)
        if comm is not None:
            elapsed_time = MPI.Wtime() - start_time
            print(f"Parallel fetching stock price history quotes completed in {elapsed_time} seconds")
        else:
            elapsed_time = time.time() - start_time
            print(f"Serial fetching stock price history quotes completed in {elapsed_time} seconds")
        return results, elapsed_time
    else:
        return None, None
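`comm.scatter` needs a list with exactly one element per rank. Splitting into contiguous chunks whose lengths differ by at most one always yields `size` chunks (some possibly empty) and keeps the symbols in their original order, so the gathered hybrid results line up row for row with the serial run. Chunking by a fixed length instead can produce the wrong count: 10 symbols in chunks of 2 across 6 ranks gives only 5 chunks, and `scatter` fails. A sketch of the balanced split; `split_evenly` is a hypothetical name:

```python
def split_evenly(items, parts):
    """Split `items` into exactly `parts` contiguous chunks whose lengths differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, remainder = divmod(len(items), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `remainder` chunks take one extra item
        end = start + base + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    symbols = [f"S{i}" for i in range(10)]
    print([len(c) for c in split_evenly(symbols, 6)])  # [2, 2, 2, 2, 1, 1]
```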
    


### Serial vs. Hybrid Comparison

In [62]:
import shutil

def core_logic(df, index, params):
    data_dir = os.path.join(os.environ["PROJECT_ROOT"], "data")
    # Remove the data directory recursively if it exists
    shutil.rmtree(data_dir, ignore_errors=True)

    results, serial_elapsedtime = main_serial(params)
    results.rename(columns={
        "EMA12": "EMA12_S",
        "EMA26": "EMA26_S",
        "RSI": "RSI_S",
        "MACD": "MACD_S",
    }, inplace=True)

    temp_result, temp_elapsedtime = main_hybrid(params)
    if temp_result is not None:
        results["EMA12_P"] = temp_result["EMA12"]
        results["EMA26_P"] = temp_result["EMA26"]
        results["RSI_P"] = temp_result["RSI"]
        results["MACD_P"] = temp_result["MACD"]

    os.makedirs(data_dir, exist_ok=True)
    results.to_csv(os.path.join(data_dir, "results-%d-%d.csv" % (params["numberOfStocks"], params["numberOfDays"])), index=False)

    df.loc[index, "numberOfRows"] = results.shape[0]
    df.loc[index, "serialElapsedTimes"] = serial_elapsedtime
    df.loc[index, "parallelElapsedTimes"] = temp_elapsedtime

    print("Returning df from core_logic")
    return df
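`core_logic` records `serialElapsedTimes` and `parallelElapsedTimes` for each configuration, and their ratio is the usual speedup figure. Taking the first configuration from the Main Body log (10 stocks, 30 days: about 0.849 s serial and 0.185 s hybrid) gives roughly 4.59. The log reports `Rank: 0, Size: 1` for the hybrid run, so this ratio should not be read as multi-rank MPI speedup. `speedup` and `efficiency` are hypothetical helpers:

```python
def speedup(serial_seconds, parallel_seconds):
    """Ratio of serial to parallel wall-clock time (above 1 means the parallel run was faster)."""
    if parallel_seconds <= 0:
        raise ValueError("parallel_seconds must be positive")
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, processes):
    """Speedup divided by the number of processes used."""
    return speedup(serial_seconds, parallel_seconds) / processes


if __name__ == "__main__":
    # Timings copied from the first run in the log (10 stocks, 30 days, Size: 1)
    print(round(speedup(0.8489174842834473, 0.18507775000000493), 2))  # 4.59
```

On the results table, the same ratio is simply `df["serialElapsedTimes"] / df["parallelElapsedTimes"]`.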
        
    

## Main Body

In [63]:

df = pd.DataFrame()
df["numberOfStocks"] = [10, 50, 100, 200, 400]
df["numberOfDays"] = [30, 90, 180, 365, 730]

df["numberOfRows"] = df["numberOfStocks"] * df["numberOfDays"]

# Fill zeros
df["serialElapsedTimes"] = [0.0] * len(df)
df["parallelElapsedTimes"] = [0.0] * len(df)

for index, row in df.iterrows():
    print(f"Processing {row['numberOfStocks']} stocks for {row['numberOfDays']} days")
    params["numberOfStocks"] = row["numberOfStocks"].astype(int)
    params["numberOfDays"] = row["numberOfDays"].astype(int)
    
    print(f"Params: {params}")
    df = core_logic(df, index, params)
    
    print("Received df from core_logic")
    
print(df)
        


[*********************100%%**********************]  1 of 1 completed


Processing 10.0 stocks for 30.0 days
Params: {'in_colab': False, 'in_notebook': True, 'is_macos': False, 'is_debian': False, 'is_redhat': True, 'is_windows': False, 'is_wsl': False, 'mpi_installed': True, 'cuda_installed': True, 'numberOfStocks': 10, 'numberOfDays': 30}
Rank: 0, Size: 1



Serial fetching stock price history quotes completed in 0.8489174842834473 seconds
GPU name: Tesla P100-PCIE-16GB
GPU cores: 1
Rank: 0, Size: 1
params: {'in_colab': False, 'in_notebook': True, 'is_macos': False, 'is_debian': False, 'is_redhat': True, 'is_windows': False, 'is_wsl': False, 'mpi_installed': True, 'cuda_installed': True, 'numberOfStocks': 10, 'numberOfDays': 30}
Parallel fetching stock price history quotes completed in 0.18507775000000493 seconds
Returning df from core_logic
Received df from core_logic
Processing 50.0 stocks for 90.0 days
Params: {'in_colab': False, 'in_notebook': True, 'is_macos': False, 'is_debian': False, 'is_redhat': True, 'is_windows': False, 'is_wsl': False, 'mpi_installed': True, 'cuda_installed': True, 'numberOfStocks': 50, 'numberOfDays': 90}


Rank: 0, Size: 1



Serial fetching stock price history quotes completed in 4.934847116470337 seconds
GPU name: Tesla P100-PCIE-16GB
GPU cores: 1
Rank: 0, Size: 1
params: {'in_colab': False, 'in_notebook': True, 'is_macos': False, 'is_debian': False, 'is_redhat': True, 'is_windows': False, 'is_wsl': False, 'mpi_installed': True, 'cuda_installed': True, 'numberOfStocks': 50, 'numberOfDays': 90}


Parallel fetching stock price history quotes completed in 4.8684129820000095 seconds
Returning df from core_logic
Received df from core_logic
Processing 100.0 stocks for 180.0 days
Params: {'in_colab': False, 'in_notebook': True, 'is_macos': False, 'is_debian': False, 'is_redhat': True, 'is_windows': False, 'is_wsl': False, 'mpi_installed': True, 'cuda_installed': True, 'numberOfStocks': 100, 'numberOfDays': 180}
Rank: 0, Size: 1


Serial fetching stock price history quotes completed in 11.432635068893433 seconds
GPU name: Tesla P100-PCIE-16GB
GPU cores: 1
Rank: 0, Size: 1
params: {'in_colab': False, 'in_notebook': True, 'is_macos': False, 'is_debian': False, 'is_redhat': True, 'is_windows': False, 'is_wsl': False, 'mpi_installed': True, 'cuda_installed': True, 'numberOfStocks': 100, 'numberOfDays': 180}


Parallel fetching stock price history quotes completed in 10.62521291200001 seconds
Returning df from core_logic
Received df from core_logic
Processing 200.0 stocks for 365.0 days
Params: {'in_colab': False, 'in_notebook': True, 'is_macos': False, 'is_debian': False, 'is_redhat': True, 'is_windows': False, 'is_wsl': False, 'mpi_installed': True, 'cuda_installed': True, 'numberOfStocks': 200, 'numberOfDays': 365}
Rank: 0, Size: 1


Serial fetching stock price history quotes completed in 23.169722318649292 seconds
GPU name: Tesla P100-PCIE-16GB
GPU cores: 1
Rank: 0, Size: 1
params: {'in_colab': False, 'in_notebook': True, 'is_macos': False, 'is_debian': False, 'is_redhat': True, 'is_windows': False, 'is_wsl': False, 'mpi_installed': True, 'cuda_installed': True, 'numberOfStocks': 200, 'numberOfDays': 365}


Parallel fetching stock price history quotes completed in 22.70458286799999 seconds


Returning df from core_logic
Received df from core_logic
Processing 400.0 stocks for 730.0 days
Params: {'in_colab': False, 'in_notebook': True, 'is_macos': False, 'is_debian': False, 'is_redhat': True, 'is_windows': False, 'is_wsl': False, 'mpi_installed': True, 'cuda_installed': True, 'numberOfStocks': 400, 'numberOfDays': 730}
Rank: 0, Size: 1



Serial fetching stock price history quotes completed in 54.41808271408081 seconds
GPU name: Tesla P100-PCIE-16GB
GPU cores: 1
Rank: 0, Size: 1
params: {'in_colab': False, 'in_notebook': True, 'is_macos': False, 'is_debian': False, 'is_redhat': True, 'is_windows': False, 'is_wsl': False, 'mpi_installed': True, 'cuda_installed': True, 'numberOfStocks': 400, 'numberOfDays': 730}



Parallel fetching stock price history quotes completed in 54.24068215300002 seconds
Returning df from core_logic
Received df from core_logic

| numberOfStocks | numberOfDays | numberOfRows | serialElapsedTimes (s) | parallelElapsedTimes (s) |
|---:|---:|---:|---:|---:|
| 10 | 30 | 210 | 0.848917 | 0.185078 |
| 50 | 90 | 3050 | 4.934847 | 4.868413 |
| 100 | 180 | 12054 | 11.432635 | 10.625213 |
| 200 | 365 | 49698 | 23.169722 | 22.704583 |
| 400 | 730 | 198939 | 54.418083 | 54.240682 |


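## Export the Notebook to a Python Script and Run It with mpirun

After exporting the notebook to a plain Python script (for example with `jupyter nbconvert --to script`), run it under MPI using the Python interpreter from the `cp631-final` conda environment. `-np 1` launches a single MPI process, which matches the `Rank: 0, Size: 1` lines in the output above, and `-mca opal_cuda_support 1` turns on Open MPI's CUDA-aware buffer support.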

```bash
mpirun -np 1 -mca opal_cuda_support 1 ~/miniconda3/envs/cp631-final/bin/python ~/cp631-final/cp631_final.py
```
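Each run prints `Rank: 0, Size: 1`, meaning the script reads its MPI rank and the communicator size. The sketch below shows one way a script could do that with mpi4py and hand each rank a round-robin share of the tickers. It is an illustration under assumptions, not the notebook's code: mpi4py being installed, the helper names `get_rank_and_size` and `split_tickers`, and the ticker list are all hypothetical. When mpi4py is missing it falls back to a single rank, so it also runs as a plain `python` script.

```python
"""Hypothetical sketch: divide a ticker list across MPI ranks."""
from typing import List, Tuple


def get_rank_and_size() -> Tuple[int, int]:
    """Return (rank, size) from mpi4py if it can be loaded, else (0, 1)."""
    try:
        from mpi4py import MPI  # assumption: mpi4py is installed in the conda env
    except (ImportError, RuntimeError):
        # mpi4py or the MPI library is unavailable: behave like a single rank
        return 0, 1
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


def split_tickers(tickers: List[str], rank: int, size: int) -> List[str]:
    """Round-robin split: rank r gets tickers r, r + size, r + 2*size, ..."""
    return tickers[rank::size]


if __name__ == "__main__":
    rank, size = get_rank_and_size()
    tickers = ["AAPL", "MSFT", "GOOG", "AMZN", "NVDA"]  # hypothetical ticker list
    mine = split_tickers(tickers, rank, size)
    print(f"Rank: {rank}, Size: {size}, tickers: {mine}")
```

Launched with `mpirun -np 4`, each rank would print a different slice of the list; with `-np 1`, as in the command above, rank 0 receives every ticker, which is the situation reported in the output.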

## Data Visualization

## Performance Analysis
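A simple way to read the timing table is the speedup of the parallel path over the serial one: serial elapsed time divided by parallel elapsed time, where a value above 1 means the parallel run was faster. The sketch below computes it from the values in the table; `RUNS` and `speedups` are hypothetical names, and the numbers are copied from that table rather than re-measured.

```python
"""Compute serial/parallel speedup from the timing table (times in seconds)."""
from typing import Dict, List, Tuple

# (numberOfStocks, numberOfDays, serialElapsedTimes, parallelElapsedTimes), copied from the table
RUNS: List[Tuple[int, int, float, float]] = [
    (10, 30, 0.848917, 0.185078),
    (50, 90, 4.934847, 4.868413),
    (100, 180, 11.432635, 10.625213),
    (200, 365, 23.169722, 22.704583),
    (400, 730, 54.418083, 54.240682),
]


def speedups(runs: List[Tuple[int, int, float, float]]) -> Dict[Tuple[int, int], float]:
    """Map (stocks, days) to serial_time / parallel_time."""
    return {(stocks, days): serial / parallel for stocks, days, serial, parallel in runs}


for (stocks, days), s in speedups(RUNS).items():
    print(f"{stocks:>4} stocks x {days:>3} days: speedup {s:.2f}x")
```

With these numbers the 10-stock run gives roughly 4.6x, while the four larger runs stay between about 1.00x and 1.08x. Since every run reports `Size: 1`, the parallel path used a single MPI rank, which may explain why the larger workloads see little benefit.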

## Exit

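```python
import sys

# When the notebook is exported and run as a script (e.g. under mpirun), stop here;
# inside Jupyter, skip the exit so the kernel keeps running.
if not params["in_notebook"]:
    sys.exit(0)
```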
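The exit cell relies on the `params["in_notebook"]` flag, which is set earlier in the notebook. One common way to compute such a flag, shown here as an assumption rather than the notebook's actual logic, is to check whether the IPython shell class is `ZMQInteractiveShell`, the shell used by Jupyter kernels. The function name `running_in_notebook` and the `example_params` dictionary are hypothetical; the dictionary is deliberately not called `params` so pasting the sketch into the notebook does not overwrite the real one.

```python
def running_in_notebook() -> bool:
    """Return True inside a Jupyter kernel, False otherwise (hypothetical helper)."""
    try:
        # get_ipython() exists only when the code runs under IPython
        shell = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        return False  # plain `python` or `mpirun ... python`: no IPython shell
    # Jupyter kernels use ZMQInteractiveShell; the terminal IPython REPL uses TerminalInteractiveShell
    return shell == "ZMQInteractiveShell"


example_params = {"in_notebook": running_in_notebook()}  # other params keys omitted
print(example_params)
```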