# Class Exercises

This file contains solutions to the exercises proposed during class. Note that each exercise has several possible solutions.

## Exercise 1.1

_From `prices.csv`, read the data for the `tsla` symbol, select 90% of the `Close` column for training, and reserve the remaining 10% for checking the model later._

**Step 1:** Read the prices data and select the `tsla` rows

```python
import pandas as pd

# index_col=[0, 1] builds a (Symbol, Date) MultiIndex
prices = pd.read_csv('./data/prices.csv', index_col=[0, 1])
tsla = prices.loc['tsla']  # selecting on the first level drops it from the index
tsla.head(2)

# Alternatively:
# tsla = prices[prices.index.get_level_values('Symbol').isin(['tsla'])]
# tsla.index = tsla.index.droplevel(0)
```

| Date | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| 2010-06-29 | 23.89 | 25.0 | 17.54 | 19.0 | 18783276 |
| 2010-06-30 | 23.83 | 30.4192 | 23.3 | 25.96 | 17194394 |
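To see why `prices.loc['tsla']` and the commented alternative return the same frame, here is a minimal sketch on a tiny hand-made `(Symbol, Date)` frame (the `msft` row and its value are made up for illustration). It also shows `DataFrame.xs`, which names the level explicitly.

```python
import pandas as pd

# Hypothetical miniature of prices.csv with a (Symbol, Date) MultiIndex
index = pd.MultiIndex.from_tuples(
    [("msft", "2010-06-29"), ("tsla", "2010-06-29"), ("tsla", "2010-06-30")],
    names=["Symbol", "Date"],
)
prices = pd.DataFrame({"Close": [25.0, 23.89, 23.83]}, index=index)

# 1. Label lookup on the first level; the Symbol level is dropped
by_loc = prices.loc["tsla"]

# 2. Cross-section on a named level (drop_level=True by default)
by_xs = prices.xs("tsla", level="Symbol")

# 3. Boolean mask on the level values, then drop the level by hand
mask = prices.index.get_level_values("Symbol").isin(["tsla"])
by_mask = prices[mask].droplevel("Symbol")

print(by_loc.equals(by_xs), by_loc.equals(by_mask))
```

`xs` is handy when the symbol is not on the outermost level, since `level=` accepts any level name.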


**Step 2:** Get the DataFrame length and pick the rows with `head`/`tail`

```python
import math

tsla_count = len(tsla)
train_size = math.floor(tsla_count * 0.9)
train = tsla.Close.head(train_size)               # first 90% of the rows
check = tsla.Close.tail(tsla_count - train_size)  # remaining rows, no overlap with train
```
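The same head/tail idea can be wrapped in a small helper. `chronological_split` below is a hypothetical name, not a pandas function, and the sketch assumes the Series is already sorted by date, so the check part always lies after the training part in time and the two never overlap.

```python
import math
import pandas as pd

def chronological_split(series, train_frac=0.9):
    """Hypothetical helper: leading rows for training, trailing rows for checking."""
    n_train = math.floor(len(series) * train_frac)
    train = series.head(n_train)
    check = series.tail(len(series) - n_train)  # everything not taken by train
    return train, check

# Made-up daily closing values, already in chronological order
closes = pd.Series(
    range(1, 21),
    index=pd.date_range("2010-06-29", periods=20, freq="D"),
    name="Close",
)
train, check = chronological_split(closes)
print(len(train), len(check))  # 18 2
```

Taking the check size as "whatever is left" instead of flooring 10% separately guarantees that no row is lost when the length is not a multiple of 10.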

If you only need to pick _one_ data set, you can use the `sample` function. Watch out for duplicates when sampling different fractions from the same data: each call draws rows independently, so two samples can share rows. See the documentation for [pandas.DataFrame.sample](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sample.html?highlight=sample#pandas.DataFrame.sample).

```python
tsla.Close[0:5].sample(frac=0.2, random_state=1)
```

```
Date
2010-07-01    21.96
Name: Close, dtype: float64
```
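A minimal sketch of the duplicates problem on a made-up Series of 100 values: two independent `sample` calls with fractions 0.9 and 0.2 must share at least 10 rows, since 90 + 20 exceeds 100. Sampling the training rows once and removing them from the original with `drop` gives disjoint sets instead.

```python
import pandas as pd

closes = pd.Series(range(100), name="Close")  # hypothetical data

# Independent draws: overlap is guaranteed here because 90 + 20 > 100
a = closes.sample(frac=0.9, random_state=1)
b = closes.sample(frac=0.2, random_state=2)
shared = a.index.intersection(b.index)
print(len(shared))  # at least 10

# Disjoint split: whatever is not in train goes to check
train = closes.sample(frac=0.9, random_state=1)
check = closes.drop(train.index)
print(len(train), len(check))  # 90 10
```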

**Step 3:** To avoid the duplicates problem, you can generate both data sets in a single call with the [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) function from `scikit-learn`:

```python
from sklearn.model_selection import train_test_split
train, test = train_test_split(tsla.Close, test_size=0.15, random_state=42)
[len(train), len(test)]
```

```
[1768, 313]
```
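The lengths `[1768, 313]` follow from how `train_test_split` sizes the split when `test_size` is a float: as far as we know from scikit-learn's implementation, the test count is rounded up and the training set takes the remaining rows. A quick check of that arithmetic with a hypothetical helper `split_sizes`, assuming the series has 1768 + 313 = 2081 rows:

```python
import math

def split_sizes(n_samples, test_size):
    """Hypothetical helper: test count rounded up, training set gets the rest."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

print(split_sizes(2081, 0.15))  # (1768, 313), matching the output above
print(split_sizes(2081, 0.10))  # (1872, 209), the 90/10 split asked in the exercise
```

Note that `train_test_split` shuffles the rows by default (`shuffle=True`), so the test set is a random sample rather than the most recent dates; passing `shuffle=False` keeps the chronological order used in Step 2.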

## Exercise 1.2

_From the `prices.csv` file, read the `msft` data and write it as a Feather file to `data/msft.feather`._

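```python
prices = pd.read_csv('./data/prices.csv', index_col=[0, 1])
msft = prices.loc['msft'].copy()
# reset_index() turns the Date index into a regular column before writing;
# writing Feather files requires the pyarrow package
msft.reset_index().to_feather('data/msft.feather')
```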

The `copy()` call matters: it makes `msft` an independent DataFrame rather than a slice of `prices`, which avoids the classic `SettingWithCopyWarning` when a column is added in the next exercise. More information is available in [this Stack Overflow thread](https://stackoverflow.com/questions/42379818/correct-way-to-set-new-column-in-pandas-dataframe-to-avoid-settingwithcopywarnin).
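A minimal sketch of what `copy()` buys, on a made-up two-symbol frame: after the explicit copy, adding a column to `msft` leaves `prices` untouched, and the assignment runs cleanly even with every warning promoted to an error. Whether the un-copied version warns depends on the pandas version and its Copy-on-Write setting, so the sketch only shows the copied case.

```python
import warnings
import pandas as pd

# Hypothetical miniature of prices.csv with a (Symbol, Date) MultiIndex
index = pd.MultiIndex.from_tuples(
    [("msft", "2018-06-01"), ("msft", "2018-06-04"), ("tsla", "2018-06-01")],
    names=["Symbol", "Date"],
)
prices = pd.DataFrame({"Open": [99.28, 101.26, 250.0]}, index=index)

msft = prices.loc["msft"].copy()  # independent DataFrame, not tied to prices

with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning raised here becomes an exception
    msft["Important"] = msft["Open"] > 100

print(list(prices.columns))  # ['Open']: the new column exists only on the copy
```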

## Exercise 1.3

_Add a boolean column to the previous `msft` data frame indicating whether each trading day opened above 100._

```python
msft.loc[:, 'Important'] = msft.Open > 100
msft["2018-06":].head(2)
```

| Date | Close | High | Low | Open | Volume | Important |
|---|---|---|---|---|---|---|
| 2018-06-01 | 100.79 | 100.86 | 99.17 | 99.2798 | 28655624 | False |
| 2018-06-04 | 101.67 | 101.86 | 100.851 | 101.26 | 27281623 | True |
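Once the boolean column exists, it can be summed (each `True` counts as 1) or used directly as a row mask. A small sketch with made-up opening prices; note that `> 100` is strict, so a day opening at exactly 100 is not flagged.

```python
import pandas as pd

# Hypothetical opening prices around the 100 threshold
msft = pd.DataFrame(
    {"Open": [99.28, 101.26, 100.0, 102.5]},
    index=pd.to_datetime(["2018-06-01", "2018-06-04", "2018-06-05", "2018-06-06"]),
)
msft["Important"] = msft["Open"] > 100  # strict: an open of exactly 100 is False

n_important = int(msft["Important"].sum())  # True counts as 1, False as 0
important_days = msft[msft["Important"]]     # keep only the flagged rows
print(n_important)  # 2
```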


## Exercise 1.4

_Look in the Pandas documentation for styling functions and mark in red all `Close` values below 30._

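```python
def warning_style(v):
    # CSS for cells whose value is below 30; empty string leaves the cell unstyled
    return 'background-color: #ff4081; color: white' if v < 30.0 else ''

# applymap calls warning_style on every cell of the Close column
# (pandas >= 2.1 renames Styler.applymap to Styler.map)
s = msft['2010-05':].head().style.applymap(warning_style, subset=['Close'])
s
```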

| Date | Close | High | Low | Open | Volume | Important |
|---|---|---|---|---|---|---|
| 2010-05-03 | 30.86 | 31.0606 | 30.58 | 30.67 | 43990036 | False |
| 2010-05-04 | 30.13 | 30.55 | 29.75 | 30.5 | 82085579 | False |
| 2010-05-05 | 29.85 | 30.09 | 29.69 | 29.72 | 66833777 | False |
| 2010-05-06 | 28.98 | 29.88 | 27.91 | 29.56 | 128612951 | False |
| 2010-05-07 | 28.21 | 28.95 | 27.32 | 28.86 | 173718024 | False |
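`Styler.applymap` calls the style function once per cell of the `Close` column and uses the returned string as that cell's CSS, so the last three rows above are the highlighted ones. Rendering a `Styler` needs the optional jinja2 package; as a dependency-light sketch, the same function can be applied with `Series.map` to preview which cells would be styled (the values are the five `Close` prices shown above).

```python
import pandas as pd

def warning_style(v):
    # Same rule as the exercise: CSS for values below 30, empty string otherwise
    return 'background-color: #ff4081; color: white' if v < 30.0 else ''

close = pd.Series([30.86, 30.13, 29.85, 28.98, 28.21], name="Close")
css = close.map(warning_style)   # one CSS string per cell, as the Styler would use
highlighted = close[css != ""]
print(highlighted.tolist())  # [29.85, 28.98, 28.21]
```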


## Exercise 1.5

_Read the CSV data for Tesla (`tsla`) directly into a `Series` and compute the closing value of each quarter._

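```python
# usecols=[0, 4] keeps Date plus column 4, which is Open in this file (see `Name: Open` below)
# squeeze("columns") and ffill() replace squeeze=True and pad(), both removed in pandas 2.0
tsla = pd.read_csv("./data/tsla.csv", header=0, index_col=0, usecols=[0, 4],
                   parse_dates=True).squeeze("columns")
quarter_ohlc = tsla.resample("Q").ffill()  # last available value at each quarter end
quarter_ohlc.head()
```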

```
Date
2010-06-30    25.96
2010-09-30    22.00
2010-12-31    26.57
2011-03-31    26.50
2011-06-30    28.50
Freq: Q-DEC, Name: Open, dtype: float64
```
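An alternative sketch that avoids resampling frequency aliases (`"Q"` is being renamed `"QE"` in recent pandas): group by calendar quarter with `to_period("Q")` and take the last observation in each group, which is the quarter's closing value. Aggregating `first`, `max`, `min` and `last` over the same grouping gives a per-quarter open/high/low/close summary. The prices below are made up.

```python
import pandas as pd

# Hypothetical daily values covering three calendar quarters of 2010
prices = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    index=pd.to_datetime([
        "2010-06-29", "2010-06-30",  # Q2
        "2010-07-01", "2010-09-30",  # Q3
        "2010-10-01", "2010-12-31",  # Q4
    ]),
    name="Close",
)

quarters = prices.index.to_period("Q")      # 2010Q2, 2010Q3, 2010Q4
closing = prices.groupby(quarters).last()   # last value inside each quarter
ohlc = prices.groupby(quarters).agg(["first", "max", "min", "last"])
print(closing)
print(ohlc)
```

Unlike `resample(...).ffill()`, grouping only produces quarters that actually contain data, and the result is labelled with periods (`2010Q2`) instead of quarter-end dates.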