# Portfolio Optimization on the Jamaica Stock Exchange

##### Steps:
1. Scrape data
2. Clean data
3. Put data into a useful format
4. Add and calculate relevant metrics for risk and return
5. Store these data in MongoDB
6. Use this data, plus a risk level determined from user data, to optimize the portfolio with genetic algorithms (details coming soon)
7. Display results

### Importing libraries for web scraping and data cleaning

In [97]:
import requests # Get URL data
from bs4 import BeautifulSoup # Manipulate URL data
import datetime # stock prices are time-series data, so we need dates
from pandas import DataFrame as df # short alias for building dataframes
# for manipulating data after scraping
import numpy as np
import pandas as pd

### Steps 1-3: Scrape, clean and format data
We need a function that retrieves stock price data from the JSE website for a given date. It should also clean the data, convert it to useful data types and put it into a pandas DataFrame.

In [239]:
def scrapePrices(date):
    url_string = "https://www.jamstockex.com/trading/trade-summary/?market=combined-market&date="
    date = str(date)[:10] # keep only the YYYY-MM-DD part
    test_page = requests.get(url_string + date)
    soup = BeautifulSoup(test_page.text, "html.parser") # parse the page's HTML

    rows = soup.find_all("tr")
    tickers = []
    closingPrices = []
    for row in rows:
        rowData = row.get_text().split()
        # skip empty rows and header rows
        if not rowData or rowData[0] == "Symbol":
            continue
        ticker = rowData[0]
        # mark USD-denominated listings whose symbol does not already say so
        if ('USD' in rowData or "(USD)" in rowData) and "USD" not in rowData[0]:
            ticker += "USD"
        # the closing price is usually third from the end; fall back to the last field
        try:
            price = float(rowData[-3])
        except ValueError:
            price = float(rowData[-1])
        tickers.append(ticker)
        closingPrices.append(price)

    data = {"Ticker": tickers,
            date: np.array(closingPrices,dtype=object)}
    pdframe = df(data)
    return pdframe

Here's an example of this working:

In [235]:
scrapePrices(datetime.datetime(2021,1,29))

| | Ticker | 2021-01-29 |
|---|---|---|
| 0 | 138SL | 4.02 |
| 1 | AFS | 23.21 |
| 2 | CAC | 10.6 |
| 3 | CFF | 1.99 |
| 4 | CPJ | 2.67 |
| ... | ... | ... |
| 78 | PBSUSD | 0.79 |
| 79 | PROVEN | 37.0 |
| 80 | RJR | 1.36 |
| 81 | SELECTMD | 0.7 |
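
The row-cleaning rules inside `scrapePrices` can be checked without hitting the network. The sketch below pulls them into a hypothetical helper, `parse_row` (not part of the original notebook), adds a guard for empty rows, and runs it on made-up row text. The column layout of the sample rows is an assumption, not a copy of the JSE page.

```python
def parse_row(row_text):
    """Return (ticker, price) for one table row, or None for header/empty rows.

    Hypothetical helper mirroring the rules used in scrapePrices.
    """
    row_data = row_text.split()
    if not row_data or row_data[0] == "Symbol":
        return None
    ticker = row_data[0]
    # Append "USD" when the row is flagged as USD but the symbol does not say so
    if ("USD" in row_data or "(USD)" in row_data) and "USD" not in row_data[0]:
        ticker += "USD"
    # Try the third field from the end first, then fall back to the last field
    try:
        price = float(row_data[-3])
    except ValueError:
        price = float(row_data[-1])
    return ticker, price


# Made-up rows for illustration only
print(parse_row("Symbol Last Close Change Volume"))  # header row
print(parse_row("PBS (USD) 0.79 0.00 1500"))
print(parse_row("AFS 23.21 0.10 200"))
```

Keeping the parsing separate from the HTTP request makes it easier to test the cleaning rules against saved rows if the page layout changes.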


### Running for (about) a year's worth of data
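Time to scrape, clean and format more than one day of data. The loop below covers only four consecutive days (1 to 4 February 2021) as a demonstration; increasing the `range` extends it toward a full year.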

In [247]:
date = datetime.datetime(2021,2,1)
pdframe = scrapePrices(date)
pdframe = pdframe.set_index('Ticker')
for i in range(3): # three more days after the start date
    date += datetime.timedelta(days=1)
    frame = scrapePrices(date)
    frame = frame.set_index('Ticker')
    # outer merge keeps tickers that are missing on some days (filled with NaN)
    pdframe = pdframe.merge(frame, on = 'Ticker', how = 'outer')
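
The loop above steps one calendar day at a time, so a longer run would also request weekends, when the exchange is presumably closed (an assumption about the JSE calendar; public holidays are not handled either). A minimal sketch of one alternative: generate weekdays with `pandas.bdate_range` and combine the per-day frames with `pandas.concat`. `collect_prices` and `fake_scraper` are hypothetical names, and the fake scraper stands in for `scrapePrices` so the example runs offline.

```python
import pandas as pd


def collect_prices(start, end, scraper):
    """Scrape every weekday from start to end and join the results by ticker."""
    frames = []
    for day in pd.bdate_range(start, end):  # Monday to Friday only
        frame = scraper(day).set_index("Ticker")
        frames.append(frame)
    # axis=1 aligns on the Ticker index; tickers missing on a day get NaN
    return pd.concat(frames, axis=1)


def fake_scraper(day):
    # Stand-in for scrapePrices: returns made-up prices, no network access
    return pd.DataFrame({"Ticker": ["AAA", "BBB"], str(day)[:10]: [1.0, 2.0]})


prices = collect_prices("2021-02-01", "2021-02-08", fake_scraper)
print(list(prices.columns))
```

This assumes each ticker appears at most once per day; if the scraped page repeats a ticker, each frame would need deduplicating before the `concat`.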

In [248]:
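# drop_duplicates() ignores the index, so move Ticker into a column first;
# otherwise two different tickers with identical prices would be collapsed
pdframe = pdframe.reset_index().drop_duplicates().set_index('Ticker')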
pdframe.to_csv('takealook.csv')
pdframe

| Ticker | 2021-02-01 | 2021-02-02 | 2021-02-03 | 2021-02-04 |
|---|---|---|---|---|
| AMG | 1.61 | 1.6 | 1.6 | 1.62 |
| BIL | 82.29 | 82.32 | 82.36 | 81.49 |
| CAC9.50 | 1.11 | NaN | 1.11 | 1.11 |
| CABROKERS | 1.82 | 1.84 | 1.84 | 1.85 |
| KREMI | 5.05 | 5.04 | 4.95 | 5.0 |
| ... | ... | ... | ... | ... |
| SML | NaN | NaN | 5.35 | NaN |
| GWEST | NaN | NaN | 0.74 | NaN |
| JMMBGL7.25C | NaN | NaN | 1.84 | NaN |
| ELMIC | NaN | NaN | 1.96 | NaN |
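
The gaps in the table above come from the `how = 'outer'` merge: a ticker that is missing on some day is kept, and its price for that day is filled with `NaN`. A small sketch with made-up tickers and prices (not JSE data) shows the effect, along with one possible way to count the missing days per ticker before any later cleaning step.

```python
import pandas as pd

# Two made-up daily frames; "CCC" only appears on the second day
day1 = pd.DataFrame({"Ticker": ["AAA", "BBB"], "day1": [1.0, 2.0]}).set_index("Ticker")
day2 = pd.DataFrame({"Ticker": ["BBB", "CCC"], "day2": [2.5, 3.0]}).set_index("Ticker")

# Same merge call as in the loop: keep every ticker seen on either day
merged = day1.merge(day2, on="Ticker", how="outer")
print(merged)

# Count how many days each ticker is missing a price
missing_days = merged.isna().sum(axis=1)
print(missing_days)
```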
