<img src=https://user-images.githubusercontent.com/49825286/90585259-06971800-e19a-11ea-88e2-c4866aa8caaf.png width="350">

# Whitehole Package (vers. 0.0.3)
***

_Whitehole is a Zarr-based tool that helps individuals or small teams interested in processing financial information to make investment decisions._

**General objective:**

The main goal of the `whitehole` project is to create a proper environment to load high-quality data and to analyze top-tier strategies and complex models efficiently. Tools and features such as a backtester, parallelization and Machine Learning structures will be implemented in future updates.

**Specific objective:**

`whitehole` package version **0.0.3** can read [free tick-market data provided by Quantmoon Technologies](https://www.quantmoon.com/tickdata).

### Abstract:

The current `whitehole` package version **0.0.3** works as a _**decryptor**_ for Quantmoon free tick-market data. This data is stored as [Zarr](https://zarr.readthedocs.io/en/stable/) files with **metadata**, a format that makes the data convenient to download and store. However, the data must be restored before it can be used. `whitehole` reads this metadata properly and returns the data selected through a few user inputs.

The basic parameters are:

- `repo_path`: e.g., `'D:/data_zarr/'` (directory where the Zarr files are located)
- `symbol`: e.g., `'AAPL'`
- `date`: e.g., `'2020-08-12'`
- `full_day`: e.g., `True`; if `False`, a range of hours must be set

The full list of parameters is:

- `repo_path`: str, path to the directory containing the Zarr data files
- `symbol`: str, equity symbol
- `date`: str, a single date as _YYYY-MM-DD_
- `start_date`: str, start date of a range as _YYYY-MM-DD_
- `end_date`: str, end date of a range as _YYYY-MM-DD_
- `full_day`: bool | if `False`, `start_time` and `end_time` must be defined
- `start_time`: str, start hour of the range as "HH:MM" (24-hour format)
- `end_time`: str, end hour of the range as "HH:MM" (24-hour format)
- `save`: bool | if `True`, `storage_path` must be defined
- `dataframe`: bool | if `False`, the output is a NumPy array
- `storage_path`: str, path where the selected data is saved
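The rules stated in the parameter list (a single `date` or a `start_date`/`end_date` pair, an "HH:MM" window when `full_day=False`, a `storage_path` when `save=True`) can be checked before calling `Decryptor`. The helper `check_decryptor_params` below is hypothetical and not part of `whitehole`; it is a sketch that only encodes the constraints listed above, and the assumption that `date` and the range parameters are mutually exclusive is inferred from the examples. `whitehole`'s own validation may differ.

```python
import re
from datetime import datetime

# Hypothetical helper: NOT part of whitehole. It only encodes the parameter
# rules described in this README; whitehole's own checks may differ.
_TIME_RE = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")  # "HH:MM", 24-hour clock


def _is_date(value):
    """Return True if value is a 'YYYY-MM-DD' string."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def check_decryptor_params(date=None, start_date=None, end_date=None,
                           full_day=False, start_time=None, end_time=None,
                           save=False, storage_path=None):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    # Assumption: either a single `date` or a `start_date`/`end_date` pair is given.
    if date is not None:
        if start_date is not None or end_date is not None:
            problems.append("use either `date` or `start_date`/`end_date`, not both")
        elif not _is_date(date):
            problems.append("`date` must be 'YYYY-MM-DD'")
    elif not (_is_date(start_date) and _is_date(end_date)):
        problems.append("`start_date` and `end_date` must both be 'YYYY-MM-DD'")
    elif start_date > end_date:  # ISO dates compare correctly as strings
        problems.append("`start_date` is after `end_date`")
    # With full_day=False, a valid HH:MM window is required.
    if not full_day:
        if not (isinstance(start_time, str) and _TIME_RE.match(start_time)
                and isinstance(end_time, str) and _TIME_RE.match(end_time)):
            problems.append("`start_time` and `end_time` must be 'HH:MM' when full_day=False")
        elif start_time >= end_time:  # zero-padded HH:MM compares correctly as strings
            problems.append("`start_time` must be earlier than `end_time`")
    # save=True needs somewhere to write to.
    if save and not storage_path:
        problems.append("`storage_path` is required when save=True")
    return problems


print(check_decryptor_params(date="2020-08-12", full_day=True))  # []
print(check_decryptor_params(date="2020-08-12", full_day=True, save=True))
# ['`storage_path` is required when save=True']
```

Returning a list of messages rather than raising on the first error lets all inconsistencies be reported at once.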


The standard output is a two-dimensional numpy array: 

```python
array([[1.59724260e+12, 4.45410000e+02, 4.40000000e+01],
       [1.59724260e+12, 4.45410000e+02, 1.00000000e+00],
       [1.59724260e+12, 4.45410000e+02, 1.00000000e+00],
       ...,
       [1.59726599e+12, 4.52010000e+02, 1.00000000e+00],
       [1.59726599e+12, 4.52050000e+02, 1.00000000e+00],
       [1.59726599e+12, 4.52000000e+02, 1.00000000e+00]])
```
Each row contains three variables: **[TICK, PRICE, VOLUME]**, where TICK is the tick timestamp as a millisecond epoch value (e.g., `1.59724260e+12` falls on 2020-08-12).

Hence, 
```
        output dimensions: (#ticks, #variables)
```

While `#variables` is always 3, `#ticks` depends on how much data the selection covers.
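To work with the three variables separately, the array can be unpacked by column position. A minimal NumPy sketch, assuming (as the sample values suggest) that TICK holds milliseconds since the Unix epoch; the three rows are copied from the sample output above and stand in for a real `run_decryptor()` result.

```python
import numpy as np

# Stand-in for the array returned by run_decryptor(); values copied from the
# first rows of the sample output (rounded as printed there).
ticks = np.array([
    [1.59724260e+12, 445.41, 44.0],
    [1.59724260e+12, 445.41, 1.0],
    [1.59724260e+12, 445.41, 1.0],
])

# Column order follows the README: [TICK, PRICE, VOLUME]
tick, price, volume = ticks.T

# Assumption: TICK is milliseconds since the Unix epoch.
# NumPy datetime64 values carry no time zone.
timestamps = tick.astype("int64").astype("datetime64[ms]")

n_ticks, n_vars = ticks.shape
print(n_ticks, n_vars)  # 3 3
print(volume.sum())     # 46.0
```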

***

### Previous requirements:

It is not mandatory, but it is advisable to download [Quantmoon free tick-market data](https://www.quantmoon.com/tickdata), because Quantmoon `Zarr` files label their contents with metadata and `whitehole` was developed to read that metadata.
<div>
<img src="attachment:imagen.png" width="750"/>
</div>

#### How to get data?

Once you access the Quantmoon Data Repository, you will find a list of `.rar` files named after each equity. Quantmoon distributes the Zarr files in this compressed format to save space:

<div>
<img src="attachment:imagen-2.png" width="550"/>
</div>

After downloading a `.rar` file, extract its contents into a directory that `whitehole` will read later, for example:

![imagen.png](attachment:imagen.png)

That directory will work as **your own data repository** with `whitehole` package.

_Note: Quantmoon provides a free tick-data repository of more than 500 US equities._

***

### `whitehole` Installation:

Once you have your data, install the package from the Anaconda Prompt or the Command Prompt with:

`pip install whitehole`

The required dependencies are:

* `pandas>=0.22.0`
* `numpy==1.18.2`
* `zarr==2.3.0`
* `xarray>=0.13.0`
* `pandas_market_calendars>=1.4.2`

`whitehole` is currently available only on Windows and requires `python>3.5`.

Important: dependencies are installed automatically.
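To confirm which versions of the dependencies ended up in your environment, the standard library can report them. A small sketch using `importlib.metadata`, which needs Python 3.8 or newer (stricter than the `python>3.5` requirement above); the helper name `installed_versions` is illustrative, and the calendar package is looked up under its published name `pandas_market_calendars`.

```python
from importlib import metadata  # Python 3.8+

# Distribution names of the dependencies listed above.
REQUIRED = ["pandas", "numpy", "zarr", "xarray", "pandas_market_calendars"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:25} {version or 'NOT INSTALLED'}")
```

Compare the printed versions with the pins above, in particular `numpy==1.18.2` and `zarr==2.3.0`.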

***

### How to use

The only entry point in the `whitehole` package is the following statement:

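```python
import whitehole

# Pass the parameters described above as keyword arguments
data = whitehole.Decryptor(repo_path="D:/data_zarr/", symbol="AAPL",
                           date="2020-08-12", full_day=True).run_decryptor()
```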

This call returns the information contained in the Zarr files.

### Examples

```python
# Import the package
import whitehole as wh
```

##### Example 1: full day data

```python
# Parameters
repo_path = "D:/data_zarr/"
symbol = "AAPL"
date = "2020-08-12"
full_day = True

# whitehole
example1_ = wh.Decryptor(repo_path=repo_path,
                         symbol=symbol,
                         date=date,
                         full_day=full_day).run_decryptor()
```


```python
>>> example1_  # output
array([[1.59724260e+12, 4.45410000e+02, 4.40000000e+01],
       [1.59724260e+12, 4.45410000e+02, 1.00000000e+00],
       [1.59724260e+12, 4.45410000e+02, 1.00000000e+00],
       ...,
       [1.59726599e+12, 4.52010000e+02, 1.00000000e+00],
       [1.59726599e+12, 4.52050000e+02, 1.00000000e+00],
       [1.59726599e+12, 4.52000000e+02, 1.00000000e+00]])
```

```python
>>> example1_.shape  # (#ticks, #variables)
(377575, 3)
```

##### Example 2: partial day data

```python
# Parameters
repo_path = "D:/data_zarr/"
symbol = "AAPL"
date = "2020-08-12"
start_time = "10:10"
end_time = "14:05"

# whitehole
example2_ = wh.Decryptor(repo_path=repo_path,
                         symbol=symbol,
                         date=date,
                         start_time=start_time,
                         end_time=end_time).run_decryptor()
```


```python
>>> example2_  # output
array([[1.5972450e+12, 4.4738780e+02, 2.0000000e+00],
       [1.5972450e+12, 4.4739960e+02, 5.0000000e+00],
       [1.5972450e+12, 4.4737000e+02, 7.2000000e+01],
       ...,
       [1.5972591e+12, 4.5070000e+02, 3.0000000e+01],
       [1.5972591e+12, 4.5069000e+02, 3.0000000e+01],
       [1.5972591e+12, 4.5072000e+02, 2.0000000e+02]])
```

```python
>>> example2_.shape  # (#ticks, #variables)
(258780, 3)
```
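If a full-day array is already in memory, a narrower window can be cut out locally instead of calling `run_decryptor()` again. The sketch below uses a hypothetical helper, `slice_window` (not part of `whitehole`), that keeps rows whose TICK lies in a half-open interval of epoch-millisecond bounds, and it assumes the rows are sorted by TICK. Turning clock times such as `"10:10"` into those bounds depends on the time zone of the stored timestamps, which this README does not state, so the bounds are passed in directly.

```python
import numpy as np


def slice_window(ticks, start_ms, end_ms):
    """Return rows of a [TICK, PRICE, VOLUME] array with start_ms <= TICK < end_ms.

    Hypothetical helper, not part of whitehole. Assumes rows are sorted by TICK.
    """
    t = ticks[:, 0]
    lo = np.searchsorted(t, start_ms, side="left")  # first row with TICK >= start_ms
    hi = np.searchsorted(t, end_ms, side="left")    # first row with TICK >= end_ms
    return ticks[lo:hi]


# Toy data: one tick per minute, timestamps in epoch milliseconds.
start = 1_597_245_000_000
toy = np.column_stack([
    start + 60_000 * np.arange(10),  # TICK
    np.linspace(447.0, 448.0, 10),   # PRICE
    np.ones(10),                     # VOLUME
])

window = slice_window(toy, start + 2 * 60_000, start + 5 * 60_000)
print(window.shape)  # (3, 3)
```

Binary search via `np.searchsorted` avoids building a boolean mask over hundreds of thousands of rows; if the rows might not be sorted, a mask such as `(t >= start_ms) & (t < end_ms)` is the safer choice.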

##### Example 3: partial range data

```python
# Parameters
repo_path = "D:/data_zarr/"
symbol = "AAPL"
start_date = "2020-08-01"
end_date = "2020-08-12"
start_time = "10:10"
end_time = "14:05"

# whitehole
example3_ = wh.Decryptor(repo_path=repo_path,
                         symbol=symbol,
                         start_date=start_date,
                         end_date=end_date,
                         start_time=start_time,
                         end_time=end_time).run_decryptor()
```


```python
>>> example3_  # output
array([[1.5962082e+12, 4.0954240e+02, 4.3200000e+02],
       [1.5962082e+12, 4.0959000e+02, 1.0000000e+02],
       [1.5962082e+12, 4.0959000e+02, 1.0000000e+02],
       ...,
       [1.5962223e+12, 4.1823000e+02, 1.0000000e+02],
       [1.5962223e+12, 4.1825500e+02, 5.0000000e+00],
       [1.5962223e+12, 4.1822020e+02, 2.0000000e+02]])
```

```python
>>> example3_.shape  # (#ticks, #variables)
(406476, 3)
```

##### Example 4: full range data

```python
# Parameters
repo_path = "D:/data_zarr/"
symbol = "AAPL"
start_date = "2020-08-01"
end_date = "2020-08-12"
full_day = True

# whitehole
example4_ = wh.Decryptor(repo_path=repo_path,
                         symbol=symbol,
                         start_date=start_date,
                         end_date=end_date,
                         full_day=full_day).run_decryptor()
```


```python
>>> example4_  # output
array([[1.5962058e+12, 4.0576000e+02, 7.0000000e+01],
       [1.5962058e+12, 4.0575000e+02, 5.0000000e+01],
       [1.5962058e+12, 4.0575000e+02, 1.0000000e+00],
       ...,
       [1.5962292e+12, 4.2635000e+02, 5.0000000e+00],
       [1.5962292e+12, 4.2639000e+02, 3.9000000e+01],
       [1.5962292e+12, 4.2620000e+02, 1.0000000e+01]])
```

```python
>>> example4_.shape  # (#ticks, #variables)
(761163, 3)
```
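A range query returns all selected days stacked in one array. To process them day by day, the rows can be grouped by calendar day. The helper `split_by_day` below is hypothetical (not part of `whitehole`) and is a sketch under two assumptions: TICK is epoch milliseconds, and rows are sorted by TICK so each day's rows are contiguous. Days are cut at midnight UTC; shift TICK first if your session boundaries live in another time zone.

```python
import numpy as np

MS_PER_DAY = 86_400_000


def split_by_day(ticks):
    """Split a [TICK, PRICE, VOLUME] array into {'YYYY-MM-DD': sub-array}.

    Hypothetical helper, not part of whitehole. Assumes TICK is epoch
    milliseconds and rows are sorted by TICK.
    """
    day = (ticks[:, 0] // MS_PER_DAY).astype("int64")  # days since 1970-01-01 (UTC)
    days, starts = np.unique(day, return_index=True)   # first row of each day
    bounds = list(starts[1:]) + [len(ticks)]           # row where each day ends
    return {str(np.datetime64(int(d), "D")): ticks[s:e]
            for d, s, e in zip(days, starts, bounds)}


day1 = 1_597_190_400_000  # 2020-08-12 00:00 UTC in epoch milliseconds
toy = np.array([
    [day1 - 3_600_000, 409.5, 100.0],  # 2020-08-11 23:00 UTC
    [day1 + 60_000, 445.4, 44.0],
    [day1 + 120_000, 445.4, 1.0],
], dtype=float)

for date, rows in split_by_day(toy).items():
    print(date, rows.shape[0])
# 2020-08-11 1
# 2020-08-12 2
```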

##### Example 5: DataFrame output

Setting `dataframe=True` returns a pandas DataFrame instead of a NumPy array. This option is also available for range or partial-day selections.

```python
# Parameters
repo_path = "D:/data_zarr/"
symbol = "AAPL"
date = "2020-08-12"
dataframe = True
full_day = True

# whitehole
example5_ = wh.Decryptor(repo_path=repo_path,
                         symbol=symbol,
                         date=date,
                         full_day=full_day,
                         dataframe=dataframe).run_decryptor()
```


```python
>>> example5_.head()  # output DataFrame
```

|   | ts | price | vol |
|---|---|---|---|
| 0 | 1597243000000.0 | 445.41 | 44.0 |
| 1 | 1597243000000.0 | 445.41 | 1.0 |
| 2 | 1597243000000.0 | 445.41 | 1.0 |
| 3 | 1597243000000.0 | 445.42 | 2.0 |
| 4 | 1597243000000.0 | 445.42 | 13.0 |
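If you already hold the NumPy output (`dataframe=False`), an equivalent table can be built with pandas. This sketch reuses the column names shown in the output above (`ts`, `price`, `vol`) and adds a `datetime` column on the assumption that `ts` is epoch milliseconds; it is not necessarily how `whitehole` builds its own DataFrame.

```python
import numpy as np
import pandas as pd

# Stand-in for run_decryptor() output with dataframe=False (rows from the table above).
ticks = np.array([
    [1597243000000.0, 445.41, 44.0],
    [1597243000000.0, 445.41, 1.0],
    [1597243000000.0, 445.42, 13.0],
])

df = pd.DataFrame(ticks, columns=["ts", "price", "vol"])
# Assumption: `ts` is milliseconds since the Unix epoch; pandas returns naive timestamps.
df["datetime"] = pd.to_datetime(df["ts"], unit="ms")
print(df.head())
```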


##### Example 6: store data

Setting `save=True` stores the selected data in `.txt` format under `storage_path`. This example reuses `repo_path`, `symbol` and `date` from Example 5.

```python
# whitehole: save the selected data as .txt under storage_path
example6_ = wh.Decryptor(repo_path=repo_path,
                         symbol=symbol,
                         date=date,
                         full_day=True,
                         save=True,
                         storage_path="D:/storage_path/").run_decryptor()
```

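The exact layout of the `.txt` file written with `save=True` is not documented here. If you need a text file whose layout you control, one option is to write the array returned by `run_decryptor()` yourself with NumPy's `savetxt` and read it back with `loadtxt`. In this sketch the file name, header and number formats are arbitrary choices, and a temporary directory stands in for `storage_path`.

```python
import os
import tempfile

import numpy as np

# Stand-in for run_decryptor() output: [TICK, PRICE, VOLUME]
ticks = np.array([
    [1.59724260e+12, 445.41, 44.0],
    [1.59724260e+12, 445.41, 1.0],
])

with tempfile.TemporaryDirectory() as storage_path:
    path = os.path.join(storage_path, "AAPL_2020-08-12.txt")  # arbitrary file name
    # "%.0f" writes TICK and VOLUME as whole numbers (volumes are whole in the
    # sample output); prices keep 4 decimals.
    np.savetxt(path, ticks, fmt=["%.0f", "%.4f", "%.0f"], delimiter=",",
               header="ts,price,vol", comments="")
    restored = np.loadtxt(path, delimiter=",", skiprows=1)  # skip the header line

print(np.allclose(restored, ticks))  # True
```

An explicit format avoids the default `%.18e` scientific notation, which is harder to read and larger on disk.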

***

About us:

_Quantmoon Technologies is the first Quant Research Group in Peru. Our mission is to democratize quantitative finance, filling the gap between the industry and enthusiasts like us. Our vision is to create a Quantitative Private Investment Fund to serve as our laboratory, with the quantmoons as its scientists._

If you want to contribute to Whitehole, feel free to do so, or contact us at contacto@quantmoon.com

Best,

<div>
<img style="float: right;" src="attachment:imagen.png" width="100"/>
</div>