# stock2vec

Create vectors for stocks based on their relative volatility.

## Explore

Let's see what the data looks like by printing the first row of each of the first 20 chunks (10,000 rows each).

In [37]:
import numpy as np
import pandas as pd

def calculate_volatility(row):
    # Square root of the day's high-low range relative to the close.
    return np.sqrt((row.high - row.low) / row.close)

def pretty_print_ticker(chunk):
    # Print the ticker, date, and volatility of the first row in the chunk.
    volatility = calculate_volatility(chunk)
    print(chunk.ticker.values[0], '  ', chunk.date.values[0], ' ', volatility.values[0])

# Stream the CSV in 10,000-row chunks instead of loading it all at once.
reader = pd.read_csv('data/wiki_prices.csv', chunksize=10000, iterator=True)

print('ticker     date     volatility')

for _ in range(20):
    chunk = reader.get_chunk()
    pretty_print_ticker(chunk)

ticker     date     volatility
A    1999-11-18   0.476731294623
AAN    1991-01-10   0.0
AAON    2003-01-21   0.225701809375
AAPL    1991-03-13   0.237915475715
AAWW    2013-06-06   0.177802169323
ABC    2001-12-06   0.212854152101
ABCO    2003-07-18   0.169252636615
ABG    2006-04-24   0.142171607424
ABM    2013-05-02   0.157470995798
ABT    1989-06-05   0.111571200496
ACAD    2016-04-28   0.214440126612
ACAT    2009-12-18   0.366929849405
ACCO    2007-03-26   0.237255091996
ACET    1997-03-03   0.16629255471
ACFN    2011-10-12   0.155881574987
ACHC    2008-08-19   0.20203050891
ACI    2009-05-27   0.23020727987
ACLS    2011-08-04   0.347228132337
ACO    1996-08-14   0.174197668142
ACRX    2017-03-06   0.246182981959


## Preprocessing

The data is ordered by ticker, but we want it ordered by date, so we import it into Postgres with [pgfutter](https://github.com/lukasmartinelli/pgfutter).

```shell
DB_NAME=stocks pgfutter csv wiki_prices.csv
```

Convert the `date` column to the `date` type and add an index on it.

```sql
ALTER TABLE "import"."wiki_prices" ALTER COLUMN "date" 
  SET DATA TYPE date using(date::date);
  
CREATE INDEX idx_date ON import.wiki_prices(date);
```

Now we can build batches based on the date.

In [38]:
from sqlalchemy import create_engine
engine = create_engine('postgresql://localhost:5432/stocks')

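def get_month():
    """Return ticker, date, and volatility for January 2016, most volatile first within each day.

    Rows with an empty high, low, or close are skipped. Unlike
    calculate_volatility above, no square root is taken here.
    """
    query = """SELECT ticker, date, abs((high::numeric(12,4) - low::numeric(12,4)) / close::numeric(12,4)) as volt
    FROM import.wiki_prices
    WHERE date >= '2016-01-01'::date AND date < '2016-02-01'::date
    AND high != '' AND low != '' AND close != ''
    ORDER BY date, volt DESC"""
    return pd.read_sql_query(query, con=engine)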

stocks = get_month()

print(stocks.head())

  ticker        date      volt
0   UNIS  2016-01-04  0.617558
1   ULTR  2016-01-04  0.404441
2   RBCN  2016-01-04  0.388752
3   CIDM  2016-01-04  0.278358
4   SFXE  2016-01-04  0.275641
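
The query above hard-codes January 2016. To batch over other months, the same half-open date range (`date >= start AND date < end`) has to be computed for each month. The helper below is a minimal sketch of that step; `month_bounds` is a hypothetical name, not part of the original notebook, and it only computes the dates without touching the database.

```python
from datetime import date

def month_bounds(year, month):
    """Return (first day of the month, first day of the next month).

    The pair is meant for a half-open range: date >= start AND date < end.
    """
    start = date(year, month, 1)
    # December rolls over into January of the following year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

# The range used by the query above.
start, end = month_bounds(2016, 1)
print(start, end)
```

The two dates could then replace the literals in the `WHERE` clause, ideally as bound query parameters rather than through string formatting.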


**Build Batches**

For each stock, find the C stocks whose volatility is closest to its own on the same day.
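
One possible way to do this is sketched below. It assumes a frame with the `ticker`, `date`, and `volt` columns returned by `get_month()`. The function name `closest_by_volatility` and the toy rows are made up for illustration. The sketch compares every pair of stocks within a day, so memory grows quadratically with the number of tickers per day.

```python
import numpy as np
import pandas as pd

def closest_by_volatility(day, c=2):
    """Map each ticker in one day's rows to the c tickers with the nearest volt."""
    tickers = day["ticker"].to_numpy()
    volts = day["volt"].to_numpy(dtype=float)
    if len(tickers) < 2:
        return {t: [] for t in tickers}
    # Pairwise absolute volatility differences, shape (n, n).
    diff = np.abs(volts[:, None] - volts[None, :])
    # A stock must not be its own neighbour.
    np.fill_diagonal(diff, np.inf)
    # There are at most n - 1 other stocks to choose from.
    k = min(c, len(tickers) - 1)
    nearest = np.argsort(diff, axis=1, kind="stable")[:, :k]
    return {tickers[i]: [tickers[j] for j in nearest[i]] for i in range(len(tickers))}

# Toy rows (hypothetical tickers and values), shaped like the output of get_month().
toy = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "date": ["2016-01-04"] * 5,
    "volt": [0.10, 0.12, 0.30, 0.33, 0.90],
})

# One neighbour table per trading day.
batches = {d: closest_by_volatility(g, c=2) for d, g in toy.groupby("date")}
print(batches["2016-01-04"]["AAA"])  # ['BBB', 'CCC']
```

Because the rows are already sorted by `volt` within each day, a sliding window over the sorted order would be a cheaper alternative for large days.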

## Vector Math

Apple - Google = ?
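
Once each stock has a vector, a question like this could be answered by subtracting the two vectors and looking for the stock whose vector points in the most similar direction. The sketch below shows only the mechanics. The vectors are invented toy values, not trained embeddings, and `TICK_A` and `TICK_B` are hypothetical tickers, so the result says nothing about the real stocks.

```python
import numpy as np

# Toy 3-d vectors, invented purely for illustration (NOT trained embeddings).
vectors = {
    "AAPL": np.array([0.9, 0.1, 0.3]),
    "GOOG": np.array([0.8, 0.2, 0.1]),
    "TICK_A": np.array([0.2, -0.2, 0.4]),   # hypothetical ticker
    "TICK_B": np.array([-0.1, 0.1, -0.2]),  # hypothetical ticker
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(query, vectors, exclude=()):
    """Return the ticker whose vector is most similar to query, skipping exclude."""
    candidates = {t: v for t, v in vectors.items() if t not in exclude}
    return max(candidates, key=lambda t: cosine(query, candidates[t]))

# "Apple - Google" as a vector, then the closest remaining stock.
difference = vectors["AAPL"] - vectors["GOOG"]
answer = nearest(difference, vectors, exclude=("AAPL", "GOOG"))
print(answer)  # TICK_A for these toy values
```

The operands are excluded from the search because the difference vector is often still closest to one of them.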