# Step 3 - Daily Prices

Our next dataset to explore will be the `trading.prices` table which contains the daily price and volume data for the 2 cryptocurrency tickers: `ETH` and `BTC` (Ethereum and Bitcoin!)

## View The Data
Before we try to solve our next set of questions below - you can try viewing a few rows from the trading.prices dataset:

In [1]:
import pandas as pd
import mysql.connector as sql
import os

In [2]:
connection = sql.connect(
    host = os.environ.get('mysql_host'),
    user = os.environ.get('mysql_user'),
    password = os.environ.get('mysql_password')
)

Example Bitcoin price data:

In [3]:
pd.read_sql_query(
    """
    SELECT *
    FROM trading.prices
    WHERE ticker = 'BTC'
    LIMIT 5;
    """,
    connection
)

Unnamed: 0,ticker,market_date,price,open,high,low,volume,change
0,BTC,2021-08-29,48255.0,48899.7,49621.7,48101.9,40.96K,-1.31%
1,BTC,2021-08-28,48897.1,49062.8,49289.4,48428.5,36.73K,-0.34%
2,BTC,2021-08-27,49064.3,46830.2,49142.0,46371.5,62.47K,4.77%
3,BTC,2021-08-26,46831.6,48994.4,49347.8,46360.4,73.79K,-4.41%
4,BTC,2021-08-25,48994.5,47707.4,49230.2,47163.3,63.54K,2.68%


Example Ethereum price data:

In [5]:
pd.read_sql_query(
    """
    SELECT *
    FROM trading.prices
    WHERE ticker = 'ETH'
    LIMIT 5;
    """,
    connection
)

Unnamed: 0,ticker,market_date,price,open,high,low,volume,change
0,ETH,2021-08-29,3177.84,3243.96,3282.21,3162.79,582.04K,-2.04%
1,ETH,2021-08-28,3243.9,3273.78,3284.58,3212.24,466.21K,-0.91%
2,ETH,2021-08-27,3273.58,3093.78,3279.93,3063.37,839.54K,5.82%
3,ETH,2021-08-26,3093.54,3228.03,3249.62,3057.48,118.44K,-4.17%
4,ETH,2021-08-25,3228.15,3172.12,3247.43,3080.7,923.13K,1.73%


## Data Dictionary
|Column Name|	Description|
|---|---|
|ticker	|one of either BTC or ETH|
|market_date|	the date for each record|
|price	|closing price at end of day|
|open|	the opening price|
|high|	the highest price for that day|
|low|	the lowest price for that day|
|volume	|the total volume traded|
|change	|% change price in price|

## Data Exploration Questions
Let's answer a few simple questions to help us better understand the `trading.prices` table.

## Question 1
How many total records do we have in the `trading.prices` table?

In [7]:
pd.read_sql_query(
    """
    SELECT
        COUNT(*) AS records_count
    FROM trading.prices;
    """,
    connection
)

Unnamed: 0,records_count
0,3404


## Question 2
How many records are there per `ticker` value?

In [9]:
pd.read_sql_query(
    """
    SELECT
        ticker,
        COUNT(*) AS ticker_count
    FROM trading.prices
    GROUP BY ticker;
    """,
    connection
)

Unnamed: 0,ticker,ticker_count
0,ETH,1702
1,BTC,1702


## Question 3
What is the minimum and maximum `market_date` values?

In [11]:
pd.read_sql_query(
    """
    SELECT
        MIN(market_date) AS date_min,
        MAX(market_date) AS date_max
    FROM trading.prices;
    """,
    connection
)

Unnamed: 0,date_min,date_max
0,2017-01-01,2021-08-29


## Question 4
Are there differences in the minimum and maximum `market_date` values for each ticker?

In [12]:
pd.read_sql_query(
    """
    SELECT
        ticker,
        MIN(market_date) AS date_min,
        MAX(market_date) AS date_max
    FROM trading.prices
    GROUP BY ticker;
    """,
    connection
)

Unnamed: 0,ticker,date_min,date_max
0,ETH,2017-01-01,2021-08-29
1,BTC,2017-01-01,2021-08-29


## Question 5
What is the average of the `price` column for Bitcoin records during the year 2020?

In [17]:
pd.read_sql_query(
    """
    SELECT
        AVG(price)
    FROM trading.prices
    WHERE ticker = 'BTC' 
        AND EXTRACT(YEAR FROM market_date)='2020'
    """,
    connection
)

Unnamed: 0,AVG(price)
0,11111.631152


## Question 6
What is the monthly average of the `price` column for Ethereum in 2020? Sort the output in chronological order and also round the average price value to 2 decimal places

In [29]:
pd.read_sql_query(
    """
    SELECT
        EXTRACT(MONTH FROM market_date) AS month,
        ROUND(AVG(price), 2) AS average_eth_price
    FROM trading.prices
    WHERE ticker='ETH'
        AND market_date BETWEEN '2020-01-01' AND '2020-12-31'
    GROUP BY month
    ORDER BY month;
    """,
    connection
)

Unnamed: 0,month,average_eth_price
0,1,156.65
1,2,238.76
2,3,160.18
3,4,171.29
4,5,207.45
5,6,235.92
6,7,259.57
7,8,401.73
8,9,367.77
9,10,375.79


## Question 7
Are there any duplicate `market_date` values for any ticker value in our table?

In [26]:
pd.read_sql_query(
    """
    SELECT
        ticker,
        COUNT(market_date) AS total_count,
        COUNT(DISTINCT market_date) AS unique_count
    FROM trading.prices
    GROUP BY ticker;
    """,
    connection
)

Unnamed: 0,ticker,total_count,unique_count
0,BTC,1702,1702
1,ETH,1702,1702


## Question 8
How many days from the `trading.prices` table exist where the high price of Bitcoin is over $30,000?

In [31]:
pd.read_sql_query(
    """
    SELECT 
        COUNT(*) as days
    FROM trading.prices
    WHERE ticker = 'BTC'
        AND price > 30000 
    """,
    connection
)

Unnamed: 0,days
0,239


## Question 9
How many "breakout" days were there in 2020 where the `price` column is greater than the open column for each ticker?

In [39]:
pd.read_sql_query(
    """
    SELECT
        ticker,
        SUM(CASE WHEN price > open THEN 1 ELSE 0 END) AS breakout_days
    FROM trading.prices
    WHERE market_date BETWEEN '2020-01-01' AND '2020-12-31'
    GROUP BY ticker;
    """,
    connection
)

Unnamed: 0,ticker,breakout_days
0,ETH,200.0
1,BTC,207.0


## Question 10
How many "non_breakout" days were there in 2020 where the `price` column is less than the open column for each ticker?

In [50]:
pd.read_sql_query(
    """
    SELECT
        ticker,
        SUM(CASE WHEN price < open THEN 1 ELSE 0 END) AS non_breakout_days
    FROM trading.prices
    WHERE market_date BETWEEN '2020-01-01' AND '2020-12-31'
    GROUP BY ticker;
    """,
    connection
)

Unnamed: 0,ticker,non_breakout_days
0,ETH,166.0
1,BTC,159.0


## Question 11
What percentage of days in 2020 were breakout days vs non-breakout days? Round the percentages to 2 decimal places

In [49]:
pd.read_sql_query(
    """
    SELECT
        ticker,
        ROUND(SUM(CASE WHEN open < price THEN 1 ELSE 0 END)/COUNT(*), 2) AS breakout_percentage,
        ROUND(SUM(CASE WHEN open > price THEN 1 ELSE 0 END)/COUNT(*), 2) AS non_breakout_percentage
    FROM trading.prices
    WHERE market_date >= '2020-01-01' AND market_date <= '2020-12-31'
    GROUP BY ticker;
    """,
    connection
)

Unnamed: 0,ticker,breakout_percentage,non_breakout_percentage
0,ETH,0.55,0.45
1,BTC,0.57,0.43


# Appendix
## Date Manipulations

There are all valid methods to qualify `DATE` or `TIMESTAMP` values within a range using a `WHERE` filter:

```
- market_date BETWEEN '2020-01-01' AND '2020-12-31'
- EXTRACT(YEAR FROM market_date) = 2020
- DATE_TRUNC('YEAR', market_date) = '2020-01-01' (doesn'work in MySQL)
- market_date >= '2020-01-01' AND market_date <= '2020-12-31'
```
The only additional thing to note is that `DATE_TRUNC` returns a `TIMESTAMP` data type which can be cast back to a regular `DATE`.

# References
- [Data With Danny Course - Step 3](https://github.com/DataWithDanny/sql-masterclass/blob/main/course-content/step3.md)