## Portfolio Optimization Using Python
> Original content by [areed1192](https://github.com/areed1192)
> Available at [portfolio-optimization](https://github.com/areed1192/portfolio-optimization)


### Modules imported below and what they do

1. `pathlib`: Module for working with file paths in a platform-independent manner. It simplifies tasks related to file and directory manipulation.

2. `numpy`: NumPy is a powerful library for numerical operations in Python. It provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays.

3. `pandas`: Pandas is a data manipulation library that provides data structures like DataFrames and Series. It simplifies data analysis and manipulation tasks, especially for working with tabular data.

4. `matplotlib.pyplot`: Matplotlib is a plotting library in Python, and `pyplot` is a module within it. It provides a MATLAB-like interface for creating a variety of plots and visualizations.

5. `scipy.optimize`: SciPy is an open-source library for mathematics, science, and engineering. The `optimize` module within SciPy provides functions for optimization problems, including various optimization algorithms.

6. `fake_useragent.UserAgent`: The `UserAgent` class from the `fake_useragent` module is used to generate random User-Agent strings. This is commonly used in web scraping to mimic different web browsers and devices, reducing the chances of being blocked by websites.

7. `pprint`: Stands for "pretty-print." It's a module for printing data structures in a more human-readable and aesthetically pleasing way. It's often used for debugging and displaying complex structures.

8. `StandardScaler` from `sklearn.preprocessing`: Scikit-learn is a machine learning library, and `StandardScaler` is a class within scikit-learn used for scaling features. It standardizes features by removing the mean and scaling to unit variance.

9. `pyopt.client.PriceHistory`: The `PriceHistory` class from the project's own `pyopt.client` module, located in `pyopt/client.py`. It is a custom class for fetching and holding price history data for the requested symbols.
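
To make the description of `StandardScaler` concrete, the following is a minimal sketch of the same arithmetic written with NumPy only. The sample values are made up for illustration, and the sketch assumes the population standard deviation (`ddof=0`), which is what scikit-learn's scaler uses by default.

```python
import numpy as np

# Hypothetical sample: three observations (rows) of two features (columns)
features = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
])

# Per-column mean and population standard deviation (ddof=0)
column_means = features.mean(axis=0)
column_stds = features.std(axis=0)

# Standardize: remove the mean, then scale to unit variance
standardized = (features - column_means) / column_stds

print(standardized)
```

After this transformation each column has mean 0 and standard deviation 1, so features measured on different scales become directly comparable.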


In [1]:
# Import necessary modules
import pathlib           # Module for working with file paths
import numpy as np       # NumPy, a library for numerical operations in Python
import pandas as pd      # Pandas, a powerful data manipulation library
import matplotlib.pyplot as plt  # Matplotlib for creating plots
import scipy.optimize as sci_plt  # SciPy's optimization module

from fake_useragent import UserAgent  # Generates browser-like User-Agent strings
from pprint import pprint  # Pretty-print module for enhanced printing of data structures
from sklearn.preprocessing import StandardScaler  # Scikit-learn's StandardScaler for feature scaling
from pyopt.client import PriceHistory  # Project-local PriceHistory class defined in pyopt/client.py

# Set display options for Pandas
pd.set_option('display.max_colwidth', None)  # Display full content of DataFrame columns without truncation
pd.set_option('expand_frame_repr', False)    # Prevent DataFrame from wrapping across multiple lines when displayed
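
The effect of `display.max_colwidth` is easiest to see on a column with long strings. The sketch below is illustrative only: the `frame` contents are made up, and it uses `pd.option_context` to change the option temporarily instead of calling `pd.set_option` globally as the cell above does.

```python
import pandas as pd

# Hypothetical frame with one long string value
frame = pd.DataFrame({"note": ["x" * 80]})

# With a small column width, long values are cut off and end in "..."
with pd.option_context("display.max_colwidth", 20):
    truncated = repr(frame)

# With None, the full content of each cell is shown
with pd.option_context("display.max_colwidth", None):
    full = repr(frame)

print(truncated)
print(full)
```

Using a context manager keeps the change local to one block, which can be convenient when only a single table needs the wider display.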


In [2]:
# Define the symbols for the stock portfolio
symbols = ['AAPL', 'MSFT', 'SQ', 'AMZN']

# Get the number of stocks in the portfolio
number_of_symbols = len(symbols)

# If no cached data file exists yet, fetch the data from NASDAQ
if not pathlib.Path("data/stock_data.csv").exists():
    # Initialize the PriceHistory client with a Chrome User-Agent string,
    # so the request looks like it comes from a browser
    price_history_client = PriceHistory(symbols=symbols, user_agent=UserAgent().chrome)

    # Create the data directory if needed, then save the data to a CSV file
    pathlib.Path("data").mkdir(parents=True, exist_ok=True)
    price_history_client.price_data_frame.to_csv("data/stock_data.csv", index=False)

    # Display the obtained data
    display(price_history_client.price_data_frame)
    
    # Store the data frame for further use
    price_data_frame: pd.DataFrame = price_history_client.price_data_frame
    
else:
    # Load the existing CSV file if data already exists
    price_data_frame: pd.DataFrame = pd.read_csv("data/stock_data.csv")

# Display the first few rows of the loaded or obtained data
display(price_data_frame.head())


,date,close,volume,open,high,low,symbol
0,2024-01-08,149.1,46757050,146.74,149.4,146.15,AMZN
1,2024-01-05,145.24,45153150,144.69,146.59,144.53,AMZN
2,2024-01-04,144.57,56039810,145.59,147.38,144.05,AMZN
3,2024-01-03,148.47,49425500,149.2,151.05,148.33,AMZN
4,2024-01-02,149.93,47339420,151.54,152.38,148.39,AMZN
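
The load-or-fetch logic in cell 2 can be isolated into a small helper. The following is a minimal sketch, not the notebook author's code: `load_or_fetch` and `fake_fetch` are hypothetical names, and the stand-in fetcher returns one row copied from the output above instead of calling `PriceHistory`.

```python
import pathlib
import tempfile

import pandas as pd


def load_or_fetch(csv_path, fetch):
    """Return the cached CSV if it exists; otherwise fetch, cache, and return."""
    csv_path = pathlib.Path(csv_path)
    if csv_path.exists():
        return pd.read_csv(csv_path)

    frame = fetch()
    # Create the parent directory so that to_csv does not fail on a fresh checkout
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(csv_path, index=False)
    return frame


fetch_calls = []


def fake_fetch():
    """Hypothetical stand-in for the real download; records each call."""
    fetch_calls.append(1)
    return pd.DataFrame(
        {"date": ["2024-01-08"], "symbol": ["AMZN"], "close": [149.1]}
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    path = pathlib.Path(tmp_dir) / "data" / "stock_data.csv"
    first = load_or_fetch(path, fake_fetch)   # no file yet: fetches and caches
    second = load_or_fetch(path, fake_fetch)  # file exists: reads the cache

print(len(fetch_calls))
```

The fetcher runs only on the first call; the second call reads the cached file, which is the behavior the `if`/`else` in the cell implements inline.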


In [3]:
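# Keep only the columns needed for the analysis
price_data_frame = price_data_frame[['date', 'symbol', 'close']]

# Pivot from long to wide format: one row per date, one column per symbol,
# with the closing price as the cell value
price_data_frame = price_data_frame.pivot(
    index='date',
    columns='symbol',
    values='close')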

display(price_data_frame)
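
To see what the pivot does in isolation, here is a small sketch on four rows. The closing prices are taken from the outputs in this notebook; the names `long_frame` and `wide_frame` are illustrative only.

```python
import pandas as pd

# Long format: one row per (date, symbol) pair
long_frame = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-05", "2024-01-04", "2024-01-04"],
    "symbol": ["AMZN", "AAPL", "AMZN", "AAPL"],
    "close": [145.24, 181.18, 144.57, 181.91],
})

# Wide format: dates become the index, symbols become the columns
wide_frame = long_frame.pivot(index="date", columns="symbol", values="close")

print(wide_frame)
```

Note that `pivot` returns the dates and symbols in sorted order, and it raises a `ValueError` if the same `(date, symbol)` pair appears more than once.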

date,AAPL,AMZN,MSFT,SQ
2023-07-10,188.61,127.13,331.83,69.06
2023-07-11,188.08,128.78,332.47,71.10
2023-07-12,189.77,130.80,337.20,71.22
2023-07-13,190.54,134.30,342.66,76.20
2023-07-14,190.69,134.68,345.24,75.46
...,...,...,...,...
2024-01-02,185.64,149.93,370.87,72.22
2024-01-03,184.25,148.47,370.60,68.63
2024-01-04,181.91,144.57,367.94,68.15
2024-01-05,181.18,145.24,367.75,66.96
