## Prerequisites

*Coding along with the Udemy Course [Machine Learning Applied to Stock & Crypto Trading](https://www.udemy.com/course/machine-learning-applied-to-stock-crypto-trading-python/) by Shaun McDonogh.*

### Different Sections of Machine Learning

<img src="../assets/images/sections_of_ml.png" width="80%" />

*Screenshot taken from the Udemy Course [Machine Learning Applied to Stock & Crypto Trading](https://www.udemy.com/course/machine-learning-applied-to-stock-crypto-trading-python/) by Shaun McDonogh.*

#### __Methods used in Unsupervised Learning:__

- If we're doing __dimensionality reduction__, we want to reduce the number of features. For this we use principal component analysis (PCA).

- If we're doing __clustering__, we use k-means clustering and hidden Markov models.

#### __Methods used in Supervised Learning:__

- __Classification__ is a supervised machine learning problem where we use xgboost or random forest classifier.

- For __regression__ we use a random forest regressor (xgboost regression can be used here too).

#### __Methods used in Reinforcement Learning:__

- For __reinforcement learning (deep reinforcement learning)__ we use a policy gradient method known as proximal policy optimization (PPO).

### Performance Evaluation – Confusion Matrix

<img src="../assets/images/confusion_matrix_1.png" width="80%" />

*Screenshot taken from the Udemy Course [Machine Learning Applied to Stock & Crypto Trading](https://www.udemy.com/course/machine-learning-applied-to-stock-crypto-trading-python/) by Shaun McDonogh.*

A very useful tool in both binary and multiclass classification is the ***confusion matrix***.

- True Positive (bottom right): we predict a one and get a one (one=market goes up)

- True Negative (top left): we predict a zero and get a zero (zero=market goes down)

__TRUE means we got them right.__

- False Positive (top right): a false positive is where we predicted a one but it was actually a zero

- False Negative (bottom left): a false negative is where we predicted a zero but it was actually a one

__FALSE means we got things wrong.__

The confusion matrix gives us insight into how the model is making predictions and which way it is leaning. If the model's predictions turn out to be false, we can ask:
- Why might it be predicting that?
- Have we given it the right data?
- Have we given it the right size of data?
- Have we, for example, only given it data where the market's gone up and therefore it's learned just to predict up?

<img src="../assets/images/confusion_matrix_2.png" width="80%" />

*Screenshot taken from the Udemy Course [Machine Learning Applied to Stock & Crypto Trading](https://www.udemy.com/course/machine-learning-applied-to-stock-crypto-trading-python/) by Shaun McDonogh.*

Based on a confusion matrix we can calculate some metrics:

1. __Accuracy:__ How many predictions did I get correct out of the total number of predictions? Accuracy is straightforward but not very useful for us, because it does not show which kind of error the model makes.

2. __Precision:__ Precision takes the true positives as a percentage of the true positives plus the false positives. In other words: out of all my predictions of a one (market goes up), what percentage did I get right? Where I'm wrong here, __I will lose money__, therefore I want that percentage to be as high as possible. ***When we get into supervised machine learning and optimise xgboost, I will aim to improve precision.***

3. __Recall:__ What are my true positives as a percentage of my true positives plus false negatives? Recall looks at how much of the available pool you caught: out of all the ones I could have predicted, how many did I catch? A missed opportunity means you missed a positive trade, but __you didn't lose money__.

4. __F1 score:__ The F1 score is the harmonic mean of precision and recall: 2 x ((Precision x Recall) / (Precision + Recall)). It gives a balanced view of the two and can be seen as a more honest summary than accuracy.
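
The four metrics above can be sketched in a few lines of plain Python. The helper name `classification_metrics` and the counts passed to it are hypothetical and only serve as an illustration; they are not taken from the course.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only for illustration
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts precision (0.8) is higher than recall (about 0.667): the model is right most of the time when it predicts "up", but it misses a third of the actual up moves.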

### Data Sources

In [39]:
import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt

In [40]:
# download daily AAPL prices; start is inclusive, end is exclusive
data = yf.download("AAPL", start="2017-01-01", end="2022-01-01")

[*********************100%***********************]  1 of 1 completed


In [41]:
data.head()

Price,Adj Close,Close,High,Low,Open,Volume
Ticker,AAPL,AAPL,AAPL,AAPL,AAPL,AAPL
Date,,,,,,
2017-01-03,26.891964,29.0375,29.0825,28.690001,28.950001,115127600
2017-01-04,26.861856,29.004999,29.127501,28.9375,28.9625,84472400
2017-01-05,26.998465,29.1525,29.215,28.952499,28.98,88774400
2017-01-06,27.299446,29.477501,29.540001,29.1175,29.195,127007600
2017-01-09,27.549498,29.747499,29.8575,29.485001,29.487499,134247600


### Retrieve Data and Calculate Returns

In [42]:
start_date = "2017-01-01"
end_date = "2022-06-01"
symbol = "ETH-USD"
df = yf.download(symbol, start=start_date, end=end_date)

[*********************100%***********************]  1 of 1 completed


In [43]:
df

Price,Adj Close,Close,High,Low,Open,Volume
Ticker,ETH-USD,ETH-USD,ETH-USD,ETH-USD,ETH-USD,ETH-USD
Date,,,,,,
2017-11-09,320.884003,320.884003,329.451996,307.056000,308.644989,893249984
2017-11-10,299.252991,299.252991,324.717987,294.541992,320.670990,885985984
2017-11-11,314.681000,314.681000,319.453003,298.191986,298.585999,842300992
2017-11-12,307.907990,307.907990,319.153015,298.513000,314.690002,1613479936
2017-11-13,316.716003,316.716003,328.415009,307.024994,307.024994,1041889984
...,...,...,...,...,...,...
2022-05-27,1724.922852,1724.922852,1814.656494,1721.264771,1802.543823,25470760032
2022-05-28,1757.941772,1757.941772,1757.941772,1724.635986,1724.635986,23214777872
2022-05-29,1812.031006,1812.031006,1818.776611,1765.937378,1792.184448,10642556101
2022-05-30,1996.441284,1996.441284,2005.210815,1804.456055,1811.885986,19580808705


In [46]:
# explicitly rename the columns to flatten the two-level (Price, Ticker) column index
# if we keep the two index levels (a nested index), adding columns at a later step
# with loc and a conditional runs into an index error
df.columns = ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
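
Assigning a hard-coded list works as long as the column order matches the download. As a sketch of an alternative, the first column level can be kept programmatically with `get_level_values(0)`. The small `demo` frame below is hypothetical and only mimics the (Price, Ticker) layout shown above.

```python
import pandas as pd

# Hypothetical two-level columns mimicking the (Price, Ticker) layout
columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["ETH-USD"]], names=["Price", "Ticker"]
)
demo = pd.DataFrame([[320.88, 308.64], [299.25, 320.67]], columns=columns)

# Keep only the first level ("Price") so the columns become plain strings
if isinstance(demo.columns, pd.MultiIndex):
    demo.columns = demo.columns.get_level_values(0)

print(list(demo.columns))
```

The `isinstance` check makes the step safe to run on a frame whose columns are already flat.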

In [47]:
df.head()

Date,Adj Close,Close,High,Low,Open,Volume
2017-11-09,320.884003,320.884003,329.451996,307.056,308.644989,893249984
2017-11-10,299.252991,299.252991,324.717987,294.541992,320.67099,885985984
2017-11-11,314.681,314.681,319.453003,298.191986,298.585999,842300992
2017-11-12,307.90799,307.90799,319.153015,298.513,314.690002,1613479936
2017-11-13,316.716003,316.716003,328.415009,307.024994,307.024994,1041889984


In [48]:
df.tail(3)

Date,Adj Close,Close,High,Low,Open,Volume
2022-05-29,1812.031006,1812.031006,1818.776611,1765.937378,1792.184448,10642556101
2022-05-30,1996.441284,1996.441284,2005.210815,1804.456055,1811.885986,19580808705
2022-05-31,1942.328003,1942.328003,2005.490967,1932.352051,1996.408081,18363115560


In [49]:
df.describe()

,Adj Close,Close,High,Low,Open,Volume
count,1665.0,1665.0,1665.0,1665.0,1665.0,1665.0
mean,1094.638688,1094.638688,1130.148491,1052.779558,1093.925164,12688490000.0
std,1254.965284,1254.965284,1294.074634,1210.697752,1255.461068,11143680000.0
min,84.308296,84.308296,85.342743,82.829887,84.279694,621733000.0
25%,199.841003,199.841003,205.151001,195.242996,199.690994,3533410000.0
50%,407.252014,407.252014,418.63797,392.953003,406.800995,10024180000.0
75%,1918.362061,1918.362061,1985.395996,1827.571533,1916.598389,18102280000.0
max,4812.087402,4812.087402,4891.70459,4718.039062,4810.071289,84482910000.0


In [50]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1665 entries, 2017-11-09 to 2022-05-31
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Adj Close  1665 non-null   float64
 1   Close      1665 non-null   float64
 2   High       1665 non-null   float64
 3   Low        1665 non-null   float64
 4   Open       1665 non-null   float64
 5   Volume     1665 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 91.1 KB


### Working with Returns

In [51]:
# add a column
df["Returns"] = df["Close"].pct_change()  # percentage change of the close versus the previous row
df.head(3)

Date,Adj Close,Close,High,Low,Open,Volume,Returns
2017-11-09,320.884003,320.884003,329.451996,307.056,308.644989,893249984,
2017-11-10,299.252991,299.252991,324.717987,294.541992,320.67099,885985984,-0.067411
2017-11-11,314.681,314.681,319.453003,298.191986,298.585999,842300992,0.051555


In [52]:
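# drop rows containing NaN (the first row has no previous close, so its return is NaN)
df.dropna(inplace=True)  # inplace=True modifies df directly instead of returning a new DataFrame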
df.head(3)

Date,Adj Close,Close,High,Low,Open,Volume,Returns
2017-11-10,299.252991,299.252991,324.717987,294.541992,320.67099,885985984,-0.067411
2017-11-11,314.681,314.681,319.453003,298.191986,298.585999,842300992,0.051555
2017-11-12,307.90799,307.90799,319.153015,298.513,314.690002,1613479936,-0.021523


In [53]:
# calculate log returns
# log returns are additive over time, so whenever we sum returns for an asset
# (backtesting, equity curves) we need the natural log return rather than the simple return
# shift(1) is the previous row, so we divide each close by the close one row before
df["lrets"] = np.log(df["Close"] / df["Close"].shift(1))
df.head(3)

Date,Adj Close,Close,High,Low,Open,Volume,Returns,lrets
2017-11-10,299.252991,299.252991,324.717987,294.541992,320.67099,885985984,-0.067411,
2017-11-11,314.681,314.681,319.453003,298.191986,298.585999,842300992,0.051555,0.05027
2017-11-12,307.90799,307.90799,319.153015,298.513,314.690002,1613479936,-0.021523,-0.021758
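
A minimal sketch of why log returns are the ones to sum, using a hypothetical three-point price path (not market data): the price rises and then falls back to where it started, so the total return over the path is zero.

```python
import math

# Hypothetical price path: up 50 %, then back to the starting price
prices = [100.0, 150.0, 100.0]

simple_returns = [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]
log_returns = [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]

# Simple returns do not sum to the total return of the path ...
print(sum(simple_returns))  # clearly above zero although the price is unchanged
# ... but log returns do: their sum equals log(last price / first price)
print(sum(log_returns))
print(math.log(prices[-1] / prices[0]))
```

The sum of the log returns is zero up to floating-point rounding, which matches the unchanged price.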


<span style="color:green;line-height:1.5;font-weight:bold">Claude.ai, What is the natural logarithm and why is it preferable or necessary to use it when calculating returns in finance?</span>

> The natural logarithm (ln) is a mathematical function with some unique properties that make it particularly useful in financial calculations, especially when analyzing returns and compounding.
> 
> Let me break this down in a few key points:
> 
> __1. Definition of Natural Logarithm__<br/>
> The natural logarithm is the logarithm to the base e (approximately 2.71828), which is a fundamental mathematical constant. In mathematical notation, it's written as ln(x), which represents the power to which e must be raised to obtain the value x.
> 
> __2. Why It's Useful in Finance__<br/>
> In financial calculations, the natural logarithm is preferable to simple percentage returns for several important reasons:
> 
> - Symmetry of Returns: When you use natural logarithmic returns (also called log returns), the mathematical properties create a more symmetric representation of gains and losses. With percentage returns, a 50% gain followed by a 50% loss does not bring you back to the original value, but log returns solve this asymmetry.
> 
> - Compounding Behavior: Log returns can be easily added together to calculate multi-period returns. This makes them superior for analyzing cumulative performance over time, especially when dealing with multiple investment periods.
> 
> - Statistical Properties: Log returns are more normally distributed, which makes them more suitable for statistical analysis and financial modeling. This characteristic is crucial for risk management and portfolio theory.
> 
> __3. Calculation Example__<br/>
> If a stock price changes from \\$100 to \\$110, the calculation differs:
> - Percentage Return: (110 - 100) / 100 = 10%
> - Log Return: ln(110/100) = ln(1.1) ≈ 0.0953 or 9.53%
> 
> The log return provides a more mathematically consistent representation of the price change.
> 
> __4. Practical Application__<br/>
> Financial analysts and quantitative traders often use log returns when:
> - Calculating portfolio performance
> - Developing statistical models
> - Analyzing time series data
> - Constructing risk metrics
> 
> The natural logarithm transforms multiplicative processes (like stock price changes) into additive processes, which simplifies many complex financial calculations.

In [54]:
# cumulative sum of the log returns
df["lrets_cumsum"] = df["lrets"].cumsum()
df.head()

Date,Adj Close,Close,High,Low,Open,Volume,Returns,lrets,lrets_cumsum
2017-11-10,299.252991,299.252991,324.717987,294.541992,320.67099,885985984,-0.067411,,
2017-11-11,314.681,314.681,319.453003,298.191986,298.585999,842300992,0.051555,0.05027,0.05027
2017-11-12,307.90799,307.90799,319.153015,298.513,314.690002,1613479936,-0.021523,-0.021758,0.028512
2017-11-13,316.716003,316.716003,328.415009,307.024994,307.024994,1041889984,0.028606,0.028204,0.056716
2017-11-14,337.631012,337.631012,340.177002,316.763,316.763,1069680000,0.066037,0.063948,0.120664


In [55]:
# normalise log returns
# exponentiating the cumulative log returns and subtracting 1 converts them back to simple cumulative returns
df["rets_normal"] = np.exp(df["lrets_cumsum"]) - 1
df.head()

Date,Adj Close,Close,High,Low,Open,Volume,Returns,lrets,lrets_cumsum,rets_normal
2017-11-10,299.252991,299.252991,324.717987,294.541992,320.67099,885985984,-0.067411,,,
2017-11-11,314.681,314.681,319.453003,298.191986,298.585999,842300992,0.051555,0.05027,0.05027,0.051555
2017-11-12,307.90799,307.90799,319.153015,298.513,314.690002,1613479936,-0.021523,-0.021758,0.028512,0.028922
2017-11-13,316.716003,316.716003,328.415009,307.024994,307.024994,1041889984,0.028606,0.028204,0.056716,0.058355
2017-11-14,337.631012,337.631012,340.177002,316.763,316.763,1069680000,0.066037,0.063948,0.120664,0.128246
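
As a quick check of the normalisation step, the sketch below repeats the calculation on the first four closes from the table and compares it with the cumulative return computed directly from the prices. The variable names are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

# The first four ETH-USD closes from the table above
close = pd.Series([299.252991, 314.681000, 307.907990, 316.716003])

lrets = np.log(close / close.shift(1))       # first value is NaN
cum_from_logs = np.exp(lrets.cumsum()) - 1   # same steps as in the cells above
cum_from_prices = close / close.iloc[0] - 1  # direct calculation from prices

print(pd.DataFrame({"from_logs": cum_from_logs, "from_prices": cum_from_prices}))
```

Apart from the first row (NaN versus 0), both columns agree, so `rets_normal` is the simple return relative to the first close in the DataFrame.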


In [56]:
# filling the NaNs with 0 instead of dropping them
df.fillna(0, inplace=True)
df.head(3)

Date,Adj Close,Close,High,Low,Open,Volume,Returns,lrets,lrets_cumsum,rets_normal
2017-11-10,299.252991,299.252991,324.717987,294.541992,320.67099,885985984,-0.067411,0.0,0.0,0.0
2017-11-11,314.681,314.681,319.453003,298.191986,298.585999,842300992,0.051555,0.05027,0.05027,0.051555
2017-11-12,307.90799,307.90799,319.153015,298.513,314.690002,1613479936,-0.021523,-0.021758,0.028512,0.028922


### Structure Changes of the pandas DataFrame

In [57]:
# create a new copy of the dataframe
df_new = df.copy()
df_new.head(3)

Date,Adj Close,Close,High,Low,Open,Volume,Returns,lrets,lrets_cumsum,rets_normal
2017-11-10,299.252991,299.252991,324.717987,294.541992,320.67099,885985984,-0.067411,0.0,0.0,0.0
2017-11-11,314.681,314.681,319.453003,298.191986,298.585999,842300992,0.051555,0.05027,0.05027,0.051555
2017-11-12,307.90799,307.90799,319.153015,298.513,314.690002,1613479936,-0.021523,-0.021758,0.028512,0.028922


In [58]:
# dropping some columns in the copied dataframe
df_new.drop(columns=["High", "Low", "Adj Close"], inplace=True)
df_new.head()

Date,Close,Open,Volume,Returns,lrets,lrets_cumsum,rets_normal
2017-11-10,299.252991,320.67099,885985984,-0.067411,0.0,0.0,0.0
2017-11-11,314.681,298.585999,842300992,0.051555,0.05027,0.05027,0.051555
2017-11-12,307.90799,314.690002,1613479936,-0.021523,-0.021758,0.028512,0.028922
2017-11-13,316.716003,307.024994,1041889984,0.028606,0.028204,0.056716,0.058355
2017-11-14,337.631012,316.763,1069680000,0.066037,0.063948,0.120664,0.128246


In [59]:
# selecting rows and columns
# work with certain rows and columns using iloc
# iloc[rows before the comma, columns after the comma]
df_new.iloc[1:3, 2:-1]

Date,Volume,Returns,lrets,lrets_cumsum
2017-11-11,842300992,0.051555,0.05027,0.05027
2017-11-12,1613479936,-0.021523,-0.021758,0.028512
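
For comparison, here is a small sketch of the same kind of selection done by position (`iloc`) and by label (`loc`). The `demo` frame is hypothetical and uses rounded values.

```python
import pandas as pd

demo = pd.DataFrame(
    {"Close": [299.25, 314.68, 307.91], "Volume": [885, 842, 1613]},
    index=pd.to_datetime(["2017-11-10", "2017-11-11", "2017-11-12"]),
)

# iloc: integer positions, the end of a slice is excluded
by_position = demo.iloc[1:3, 0:1]

# loc: index labels and column names, the end of a slice is included
by_label = demo.loc["2017-11-11":"2017-11-12", ["Close"]]

print(by_position.equals(by_label))
```

Both selections return the same two rows of the `Close` column; the difference is only in how the rows and columns are addressed.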


### Conditionals

In [60]:
# add conditional statements
# df_new["Close"].shift(-1) is the close one step in the future
# is the close 'one step in the future' greater than today's close?
# then set the TARGET column to 1, otherwise to -1
# the TARGET column is the future we want to predict
df_new.loc[df_new["Close"].shift(-1) > df_new["Close"], "TARGET"] = 1
df_new.loc[df_new["Close"].shift(-1) <= df_new["Close"], "TARGET"] = -1
df_new.head()

# a similar result can be achieved with numpy's np.where;
# note that np.where labels the last row -1, whereas the loc version leaves it NaN
# df_new["TARGET"] = np.where(df_new["Close"].shift(-1) > df_new["Close"], 1, -1)

Date,Close,Open,Volume,Returns,lrets,lrets_cumsum,rets_normal,TARGET
2017-11-10,299.252991,320.67099,885985984,-0.067411,0.0,0.0,0.0,1.0
2017-11-11,314.681,298.585999,842300992,0.051555,0.05027,0.05027,0.051555,-1.0
2017-11-12,307.90799,314.690002,1613479936,-0.021523,-0.021758,0.028512,0.028922,1.0
2017-11-13,316.716003,307.024994,1041889984,0.028606,0.028204,0.056716,0.058355,1.0
2017-11-14,337.631012,316.763,1069680000,0.066037,0.063948,0.120664,0.128246,-1.0
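
The two ways of building the target differ in one detail: the last row has no future close. The sketch below shows this on a hypothetical three-row frame; the column names `TARGET_loc` and `TARGET_where` exist only for this comparison.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"Close": [100.0, 101.0, 99.0]})
next_close = demo["Close"].shift(-1)  # NaN in the last row

# Two boolean masks, as in the cell above: the last row matches neither
demo.loc[next_close > demo["Close"], "TARGET_loc"] = 1
demo.loc[next_close <= demo["Close"], "TARGET_loc"] = -1

# np.where: a comparison with NaN is False, so the last row gets -1
demo["TARGET_where"] = np.where(next_close > demo["Close"], 1, -1)

print(demo)
```

With the `loc` approach the unknown last target stays NaN and can be dropped before training; with `np.where` it is silently labelled as a down move.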


In [61]:
df_new.describe()

,Close,Open,Volume,Returns,lrets,lrets_cumsum,rets_normal,TARGET
count,1664.0,1664.0,1664.0,1664.0,1664.0,1664.0,1664.0,1663.0
mean,1095.103684,1094.397087,12695580000.0,0.002406,0.001124,0.614432,2.659458,0.039086
std,1255.199062,1255.690746,11143280000.0,0.051004,0.051771,1.177616,4.194441,0.999536
min,84.308296,84.279694,621733000.0,-0.423472,-0.550732,-1.266809,-0.718271,-1.0
25%,199.679073,199.534943,3550877000.0,-0.021585,-0.021777,-0.404579,-0.332742,-1.0
50%,407.698105,407.073502,10033600000.0,0.001541,0.001539,0.309237,0.362386,1.0
75%,1918.65506,1917.117065,18104100000.0,0.028098,0.027711,1.85809,5.411482,1.0
max,4812.087402,4810.071289,84482910000.0,0.264581,0.234741,2.777597,15.080332,1.0


In [62]:
df_new.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1664 entries, 2017-11-10 to 2022-05-31
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Close         1664 non-null   float64
 1   Open          1664 non-null   float64
 2   Volume        1664 non-null   int64  
 3   Returns       1664 non-null   float64
 4   lrets         1664 non-null   float64
 5   lrets_cumsum  1664 non-null   float64
 6   rets_normal   1664 non-null   float64
 7   TARGET        1663 non-null   float64
dtypes: float64(7), int64(1)
memory usage: 117.0 KB


### Iterations

In [63]:
# iterating over DataFrame
# for each row in a dataframe do something
i = 0
for index, row in df_new.iterrows():
    print(index, row["Close"], row["TARGET"])
    if i >= 10:
        break
    i += 1

2017-11-10 00:00:00 299.25299072265625 1.0
2017-11-11 00:00:00 314.6809997558594 -1.0
2017-11-12 00:00:00 307.9079895019531 1.0
2017-11-13 00:00:00 316.71600341796875 1.0
2017-11-14 00:00:00 337.6310119628906 -1.0
2017-11-15 00:00:00 333.35699462890625 -1.0
2017-11-16 00:00:00 330.92401123046875 1.0
2017-11-17 00:00:00 332.3940124511719 1.0
2017-11-18 00:00:00 347.61199951171875 1.0
2017-11-19 00:00:00 354.385986328125 1.0
2017-11-20 00:00:00 366.7300109863281 -1.0
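
As an alternative sketch, the row limit can be applied with `head()` before iterating, which removes the manual counter. `itertuples()` is used here instead of `iterrows()`; it yields named tuples whose `Index` field holds the row label. The `demo` frame is hypothetical.

```python
import pandas as pd

demo = pd.DataFrame(
    {"Close": [299.25, 314.68, 307.91, 316.72],
     "TARGET": [1.0, -1.0, 1.0, 1.0]},
    index=pd.to_datetime(
        ["2017-11-10", "2017-11-11", "2017-11-12", "2017-11-13"]
    ),
)

# head(n) limits the rows up front, so no counter or break is needed
rows = []
for row in demo.head(3).itertuples():
    rows.append((row.Index, row.Close, row.TARGET))
    print(row.Index, row.Close, row.TARGET)
```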


### Extract Arrays and Values from a DataFrame

In [67]:
# get close prices
df_ex = df_new.copy()
close_prices = df_ex["Close"].values
close_prices

array([ 299.25299072,  314.68099976,  307.9079895 , ..., 1812.03100586,
       1996.44128418, 1942.32800293])

In [69]:
# first four
close_prices[:4]

array([299.25299072, 314.68099976, 307.9079895 , 316.71600342])

In [70]:
# putting them into a python list
list(close_prices[:4])

[np.float64(299.25299072265625),
 np.float64(314.6809997558594),
 np.float64(307.9079895019531),
 np.float64(316.71600341796875)]

In [71]:
# change Close Prices
mod_close_prices = close_prices / 2
mod_close_prices[:4]

array([149.62649536, 157.34049988, 153.95399475, 158.35800171])

In [72]:
# add back to DataFrame with new column
df_ex["Mod_Close"] = mod_close_prices
df_ex.head()

Date,Close,Open,Volume,Returns,lrets,lrets_cumsum,rets_normal,TARGET,Mod_Close
2017-11-10,299.252991,320.67099,885985984,-0.067411,0.0,0.0,0.0,1.0,149.626495
2017-11-11,314.681,298.585999,842300992,0.051555,0.05027,0.05027,0.051555,-1.0,157.3405
2017-11-12,307.90799,314.690002,1613479936,-0.021523,-0.021758,0.028512,0.028922,1.0,153.953995
2017-11-13,316.716003,307.024994,1041889984,0.028606,0.028204,0.056716,0.058355,1.0,158.358002
2017-11-14,337.631012,316.763,1069680000,0.066037,0.063948,0.120664,0.128246,-1.0,168.815506


In [76]:
# get just one item
price = df_ex["Close"].iloc[1:2].item()  # .item() returns the single value of a one-element selection
price

314.6809997558594

### Multiple Adjustments

In [77]:
# Change Close, Volume and Open
df_ex[["Open", "Close", "Volume"]] = df_ex[["Open", "Close", "Volume"]] / df_ex[["Open", "Close", "Volume"]].max()
df_ex.tail()

Date,Close,Open,Volume,Returns,lrets,lrets_cumsum,rets_normal,TARGET,Mod_Close
2022-05-27,0.358456,0.374744,0.30149,-0.043788,-0.044776,1.751648,4.764096,1.0,862.461426
2022-05-28,0.365318,0.358547,0.274787,0.019142,0.018961,1.77061,4.874433,1.0,878.970886
2022-05-29,0.376558,0.37259,0.125973,0.030769,0.030305,1.800914,5.055181,1.0,906.015503
2022-05-30,0.414881,0.376686,0.231772,0.10177,0.096918,1.897832,5.671416,-1.0,998.220642
2022-05-31,0.403635,0.415048,0.217359,-0.027105,-0.027479,1.870353,5.490588,,971.164001
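
The division above works column by column because `max()` returns a Series indexed by column name, and pandas aligns that Series with the DataFrame's columns. A minimal sketch on hypothetical numbers:

```python
import pandas as pd

demo = pd.DataFrame(
    {"Open": [2.0, 4.0], "Close": [5.0, 10.0], "Volume": [100, 400]}
)
cols = ["Open", "Close", "Volume"]

col_max = demo[cols].max()     # Series indexed by column name
scaled = demo[cols] / col_max  # each column is divided by its own maximum

print(scaled)
```

After scaling, the largest value in every column is 1.0, which matches the maximum-based scaling applied to `Open`, `Close` and `Volume` in the cell above.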


### Create Series and DataFrame 

In [78]:
# create a dataframe from scratch with our own numbers
# Create a Series
series_1_list = [1, 3, 8, 4, 3]
series_1 = pd.Series(series_1_list)
series_1 # index plus values

0    1
1    3
2    8
3    4
4    3
dtype: int64

In [79]:
# create a DataFrame from the same list of values (not from the Series object itself)
df_created = pd.DataFrame(series_1_list, columns=["SERIES"])
df_created

,SERIES
0,1
1,3
2,8
3,4
4,3
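
The cell above builds the DataFrame from the underlying list. As a sketch of an alternative, the Series object itself can be converted with `to_frame()`, which keeps its index. The name `df_from_series` is chosen for this illustration only.

```python
import pandas as pd

series_1 = pd.Series([1, 3, 8, 4, 3])

# to_frame turns the Series into a one-column DataFrame and keeps its index
df_from_series = series_1.to_frame(name="SERIES")
print(df_from_series)
```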


### Save and Load DataFrame

In [84]:
# save DataFrame to csv file
df_ex.to_csv("../assets/data/pandas_test_1.csv")

In [85]:
# load DataFrame from csv file
my_df = pd.read_csv("../assets/data/pandas_test_1.csv")
my_df.head()

,Date,Close,Open,Volume,Returns,lrets,lrets_cumsum,rets_normal,TARGET,Mod_Close
0,2017-11-10,0.062188,0.066667,0.010487,-0.067411,0.0,0.0,0.0,1.0,149.626495
1,2017-11-11,0.065394,0.062075,0.00997,0.051555,0.05027,0.05027,0.051555,-1.0,157.3405
2,2017-11-12,0.063986,0.065423,0.019098,-0.021523,-0.021758,0.028512,0.028922,1.0,153.953995
3,2017-11-13,0.065817,0.06383,0.012333,0.028606,0.028204,0.056716,0.058355,1.0,158.358002
4,2017-11-14,0.070163,0.065854,0.012661,0.066037,0.063948,0.120664,0.128246,-1.0,168.815506
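
In the loaded frame `Date` has become an ordinary column and the index is a plain integer range. A sketch of how the original `DatetimeIndex` could be restored while loading, using the `index_col` and `parse_dates` parameters of `pd.read_csv`. An in-memory buffer stands in for the CSV file here, and the `demo` frame is hypothetical.

```python
import io

import pandas as pd

demo = pd.DataFrame(
    {"Close": [299.25, 314.68]},
    index=pd.to_datetime(["2017-11-10", "2017-11-11"]).rename("Date"),
)

# An in-memory buffer stands in for the CSV file on disk
buffer = io.StringIO()
demo.to_csv(buffer)
buffer.seek(0)

# Use the Date column as the index and parse it as dates
restored = pd.read_csv(buffer, index_col="Date", parse_dates=True)
print(restored.index)
```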
