### If someone wants to collaborate I can share the notebook
This notebook grew out of wanting to understand how VWAP is implemented/calculated. I started a discussion on this topic, which can be found at this [link](https://www.kaggle.com/c/g-research-crypto-forecasting/discussion/286491).


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

The data tab of the competition (https://www.kaggle.com/c/g-research-crypto-forecasting/data) explains each file and its contents.
### Files and features:
* train.csv - The training set
 * timestamp - A timestamp for the minute covered by the row.
 * Asset_ID - An ID code for the cryptoasset.
 * Count - The number of trades that took place this minute.
 * Open - The USD price at the beginning of the minute.
 * High - The highest USD price during the minute.
 * Low - The lowest USD price during the minute.
 * Close - The USD price at the end of the minute.
 * Volume - The number of cryptoasset units traded during the minute.
 * VWAP - The volume weighted average price for the minute.
 * Target - 15 minute residualized returns. See the '[Prediction and Evaluation](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)' section of this notebook for details of how the target is calculated.


* example_test.csv - An example of the data that will be delivered by the time series API. The data is just copied from train.csv.

* example_sample_submission.csv - An example of the data that will be delivered by the time series API. The data is just copied from train.csv.

* asset_details.csv - Provides the real name of the cryptoasset for each Asset_ID and the weight each cryptoasset receives in the metric.

* gresearch_crypto - An unoptimized version of the time series API files for offline work. You may need Python 3.7 and a Linux environment to run it without errors.

* supplemental_train.csv - After the submission period is over, this file's data will be replaced with cryptoasset prices from the submission period. The current copy, which is just filled with approximately the right amount of data from train.csv, is provided as a placeholder.
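The VWAP feature listed above is conventionally defined as the sum of price times quantity over all trades in the minute, divided by the total quantity traded. The individual trades are not part of this dataset, so the sketch below uses made-up trades (the `trades` list and its values are hypothetical) only to illustrate the definition and how it can differ from a price summary built from High, Low and Close.

```python
# Hypothetical trades inside a single minute: (price, quantity) pairs.
# These values are invented for illustration, not taken from train.csv.
trades = [(100.0, 1.0), (101.0, 2.0), (99.0, 1.0)]

# Volume weighted average price: sum(price * qty) / sum(qty)
total_value = sum(price * qty for price, qty in trades)
total_qty = sum(qty for _, qty in trades)
vwap = total_value / total_qty

# The same minute summarised the way the OHLC columns would see it
prices = [price for price, _ in trades]
high, low, close = max(prices), min(prices), prices[-1]
typical_price = (high + low + close) / 3

print(vwap, typical_price)
```

The two numbers differ because the typical price ignores how much volume traded at each price.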

In [None]:
mainFolder = '/kaggle/input/g-research-crypto-forecasting/'

In [None]:
%%time 
# datatable installation with internet
!pip install datatable==0.11.0 > /dev/null

In [None]:
import datatable as dt

As suggested by [@julian3833](https://www.kaggle.com/julian3833) in this [notebook](https://www.kaggle.com/julian3833/s-proposal-for-a-meaningful-lb), I will leave the test data out of the training data. In his words: "Only keep data from before 2021-06-13 00:00:00."

In [None]:
# Optimize reading speed of csv with datatable library
def read_csv_strict(fullPath='/kaggle/input/g-research-crypto-forecasting/train.csv'):
    csvDT = dt.fread(fullPath)
    df = csvDT.to_pandas()
    df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')
    df = df[df['datetime'] < '2021-06-13 00:00:00']
    return df

In [None]:
%%time
df = read_csv_strict()

In [None]:
df.shape

In [None]:
df.head(10)

## Let's explore the asset details

In [None]:
asset_details = pd.read_csv(mainFolder + 'asset_details.csv')
asset_details.sort_values(by='Weight', ascending=False)

In [None]:
bitcoinAsset = asset_details[asset_details['Asset_Name'] == 'Bitcoin']
bitcoinWeight = bitcoinAsset.Weight
bitcoinWeight

In [None]:
# Get all bitcoin asset data (Asset_ID 1); copy so new columns can be added without a SettingWithCopyWarning
bitcoin = df[df['Asset_ID'] == 1].copy()

In [None]:
# Check that only the bitcoin asset is present (should return 1)
bitcoin['Asset_ID'].nunique()

In [None]:
bitcoin.head()

In [None]:
from sklearn.metrics import mean_absolute_error

As [@abdelghanibelgaid](https://www.kaggle.com/abdelghanibelgaid) suggested, I tried this function to recreate VWAP:

In [None]:
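def VWAP(df, period=1):
    # Typical price (High + Low + Close) / 3, weighted by the minute's volume
    df['VWAP_Pre'] = ((df['High'] + df['Low'] + df['Close']) / 3) * df['Volume']
    # Rolling means of the weighted price and of the volume over `period` rows
    df['VWAP_Source'] = df['VWAP_Pre'].rolling(period, min_periods=period).mean()
    df['VolumeSMA'] = df['Volume'].rolling(period, min_periods=period).mean()
    # Their ratio is the approximate VWAP; note that the columns are added to df in place
    df['VWAP2'] = df['VWAP_Source'] / df['VolumeSMA']
    return df['VWAP2']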

In [None]:
VWAP(bitcoin)

In [None]:
bitcoin.head(10)

In [None]:
mean_absolute_error(bitcoin['VWAP'], bitcoin['VWAP2'])

But there was still some error.
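One possible reason for the mismatch, offered as a hypothesis rather than a confirmed explanation: with `period=1` each rolling mean covers a single row, so `Volume` cancels and the function returns the typical price `(High + Low + Close) / 3`. A true VWAP depends on the individual trades inside the minute, which the OHLCV columns do not contain. A minimal sketch on made-up candles (the `candles` frame and the `vwap_approx` helper are illustrative names, not part of the competition data):

```python
import numpy as np
import pandas as pd

def vwap_approx(df, period=1):
    """Rolling typical-price approximation of VWAP; does not modify df."""
    typical = (df["High"] + df["Low"] + df["Close"]) / 3
    pv = (typical * df["Volume"]).rolling(period, min_periods=period).mean()
    vol = df["Volume"].rolling(period, min_periods=period).mean()
    return pv / vol

# Invented one-minute candles, for illustration only
candles = pd.DataFrame({
    "High": [101.0, 103.0, 102.0],
    "Low": [99.0, 100.0, 98.0],
    "Close": [100.0, 102.0, 99.0],
    "Volume": [5.0, 1.0, 10.0],
})

typical = (candles["High"] + candles["Low"] + candles["Close"]) / 3
approx = vwap_approx(candles, period=1)

# With period=1 the volume cancels out, so the result equals the typical price
same = bool(np.allclose(approx, typical))
print(same)
```

If this is what happens on the real data, the remaining error would simply be the gap between the typical price and the trade-level VWAP in each minute.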