# P2V-MAP in PyTorch (and Plotly)

This notebook covers the following steps:
1. Build a data streamer
1. Train the P2V model (`torch`) and visualize the weight matrix as a function of training iterations using `plotly`
1. Apply t-SNE dimensionality reduction using `sklearn` and plot the product map using `plotly`

In [1]:
import pandas as pd
from loguru import logger

import p2vmap_lib

In [2]:
config = p2vmap_lib.read_yaml("p2vmap_config.yaml")
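
The rest of the notebook reads several nested keys from `config` (`paths.data`, `paths.results`, `data.data_streamer`, `p2v.data-loader`, `p2v.model`, `p2v.trainer` and `tsne`). The sketch below is one way to check that a loaded config contains them before any training starts. The helper `missing_keys` is hypothetical and not part of `p2vmap_lib`, and the example dictionary uses empty placeholders rather than real settings.

```python
# Key paths that the cells in this notebook look up in `config`.
REQUIRED_KEYS = [
    ("paths", "data"),
    ("paths", "results"),
    ("data", "data_streamer"),
    ("p2v", "data-loader"),
    ("p2v", "model"),
    ("p2v", "trainer"),
    ("tsne",),
]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the dotted key paths that are absent from a nested dict (hypothetical helper)."""
    missing = []
    for path in required:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Placeholder config: only the structure matters here, the values are empty.
example_config = {
    "paths": {"data": "", "results": ""},
    "data": {"data_streamer": {}},
    "p2v": {"data-loader": {}, "model": {}},
    "tsne": {},
}
print(missing_keys(example_config))
```

In this placeholder, `p2v.trainer` is deliberately left out, so it is the only path reported as missing.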

In [3]:
baskets = pd.read_parquet(f"{config['paths']['data']}/market-baskets.parquet")
products = pd.read_parquet(f"{config['paths']['data']}/products.parquet")
n_products = products.shape[0]
logger.info(f"n_products = {n_products}")

2021-10-30 11:57:05.921 | INFO     | __main__:<module>:4 - n_products = 300


<br>

## Step 1: Build P2V data streamer

In [4]:
p2v_data_stream = p2vmap_lib.DataStreamP2V(
    data=baskets, **config["data"]["data_streamer"]
)

In [5]:
p2v_data_stream.generate_batch()

(array([ 51,  51,  51, ..., 164, 174, 174]),
 array([113, 154, 165, ...,  35, 251, 291]),
 array([[ 15,  38, 258, ..., 128, 232,  35],
        [108, 182, 206, ..., 206, 194,  36],
        [217, 288, 114, ..., 177, 188, 108],
        ...,
        [119, 242, 122, ..., 128,  66, 243],
        [275, 150, 258, ...,  90, 215,  41],
        [214, 271, 113, ..., 127, 224, 283]], dtype=int32))
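
The batch above consists of two 1-D arrays and one 2-D array, which is the shape one would expect from skip-gram training with negative sampling: a center product, a context product from the same basket, and a row of negative samples per pair. The internals of `DataStreamP2V` are not shown in this notebook, so the following is only a minimal sketch under that assumption. The function `make_sgns_batch` and its arguments are hypothetical, and negatives are drawn uniformly without excluding the positive products.

```python
import numpy as np


def make_sgns_batch(baskets, n_products, n_negative, rng):
    """Build (center, context, negatives) arrays from a list of baskets (hypothetical helper)."""
    center, context = [], []
    for basket in baskets:
        # Every ordered pair of distinct positions in a basket is one positive example.
        for i, c in enumerate(basket):
            for j, o in enumerate(basket):
                if i != j:
                    center.append(c)
                    context.append(o)
    center = np.asarray(center, dtype=np.int32)
    context = np.asarray(context, dtype=np.int32)
    # Simplification: uniform negative sampling, collisions with positives are not filtered.
    negatives = rng.integers(
        0, n_products, size=(len(center), n_negative), dtype=np.int32
    )
    return center, context, negatives


rng = np.random.default_rng(0)
toy_baskets = [[0, 1, 2], [3, 4]]
center, context, negatives = make_sgns_batch(
    toy_baskets, n_products=5, n_negative=4, rng=rng
)
print(center, context, negatives.shape)
```

A production streamer would typically sample negatives from a frequency-based distribution and produce batches lazily. The sketch only illustrates how the three arrays line up row by row.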

<br>

## Step 2: P2V

### Build data loader

In [6]:
dl_train, dl_valid = p2vmap_lib.build_data_loader(
    streamer=p2v_data_stream,
    **config["p2v"]["data-loader"],
)
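
`build_data_loader` returns a training and a validation loader. Its implementation is not shown here, so the sketch below is an assumption about what such a function might do with standard `torch.utils.data` tools: wrap the three arrays in a `TensorDataset`, split off a validation share, and create two `DataLoader` objects. The name `build_loaders`, the parameters `batch_size` and `valid_fraction`, and the random stand-in data are all hypothetical.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split


def build_loaders(center, context, negatives, batch_size, valid_fraction, seed=0):
    """Wrap (center, context, negatives) arrays in train/validation loaders (hypothetical helper)."""
    dataset = TensorDataset(
        torch.as_tensor(center, dtype=torch.long),
        torch.as_tensor(context, dtype=torch.long),
        torch.as_tensor(negatives, dtype=torch.long),
    )
    n_valid = int(len(dataset) * valid_fraction)
    n_train = len(dataset) - n_valid
    # A seeded generator keeps the split reproducible across runs.
    generator = torch.Generator().manual_seed(seed)
    train_set, valid_set = random_split(
        dataset, [n_train, n_valid], generator=generator
    )
    dl_train = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    dl_valid = DataLoader(valid_set, batch_size=batch_size, shuffle=False)
    return dl_train, dl_valid


# Random stand-in data with 300 products, matching n_products in this notebook.
rng = np.random.default_rng(0)
center = rng.integers(0, 300, size=1000)
context = rng.integers(0, 300, size=1000)
negatives = rng.integers(0, 300, size=(1000, 20))

dl_train, dl_valid = build_loaders(
    center, context, negatives, batch_size=128, valid_fraction=0.1
)
batch_center, batch_context, batch_negatives = next(iter(dl_train))
print(batch_center.shape, batch_context.shape, batch_negatives.shape)
```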

### Train product vectors

In [7]:
p2v_model = p2vmap_lib.P2V(
    n_products=n_products,
    **config["p2v"]["model"],
)
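
The `P2V` class itself lives in `p2vmap_lib` and is not reproduced in this notebook. As a rough illustration of the kind of model being trained, here is a minimal skip-gram model with negative sampling in PyTorch. It is a sketch under assumptions, not the library's implementation: the class name `SkipGramNS`, the attribute names `wi` and `wo`, the embedding size of 30, and the random inputs are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkipGramNS(nn.Module):
    """Skip-gram with negative sampling (illustrative stand-in, not p2vmap_lib.P2V)."""

    def __init__(self, n_products, size):
        super().__init__()
        self.wi = nn.Embedding(n_products, size)  # center-product vectors
        self.wo = nn.Embedding(n_products, size)  # context-product vectors

    def forward(self, center, context, negatives):
        v_c = self.wi(center)        # (B, D)
        v_o = self.wo(context)       # (B, D)
        v_n = self.wo(negatives)     # (B, K, D)
        pos_logit = (v_c * v_o).sum(dim=-1)                        # (B,)
        neg_logit = torch.bmm(v_n, v_c.unsqueeze(-1)).squeeze(-1)  # (B, K)
        # Push positive pairs towards label 1 and negative samples towards label 0.
        loss = -(F.logsigmoid(pos_logit) + F.logsigmoid(-neg_logit).sum(dim=-1))
        return loss.mean()


torch.manual_seed(0)
model = SkipGramNS(n_products=300, size=30)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One optimisation step on a random batch of 64 pairs with 20 negatives each.
center = torch.randint(0, 300, (64,))
context = torch.randint(0, 300, (64,))
negatives = torch.randint(0, 300, (64, 20))

loss = model(center, context, negatives)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```

Keeping separate center and context embeddings follows the usual word2vec layout. Which of the two matrices a downstream map uses is a design choice that this sketch leaves open.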

In [8]:
p2v_trainer = p2vmap_lib.TrainerP2V(
    model=p2v_model,
    train=dl_train,
    validation=dl_valid,
    path=config["paths"]["results"],
)

In [None]:
p2v_trainer.fit(**config["p2v"]["trainer"])

epoch = 0


Visualize the training loss in TensorBoard:
```
make runtb
```

### Visualize embedding

In [None]:
dashboard = p2vmap_lib.DashboardP2V(
    path=f"{config['paths']['results']}/weights",
)

In [None]:
dashboard.plot_product_embedding(
    idx=[0, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 40, 70, 80, 100, 150, 200]
)

<br>

## Step 3: t-SNE

In [None]:
dashboard.plot_tsne_map(products, config["tsne"])
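
`plot_tsne_map` wraps both the projection and the plot. For readers who want to see the projection step on its own, the following sketch runs `sklearn.manifold.TSNE` on a random matrix that stands in for the trained product vectors. The matrix shape (300 products, 30 dimensions) and the t-SNE settings are assumptions for illustration and are not taken from `config["tsne"]`.

```python
import numpy as np
from sklearn.manifold import TSNE

# Random stand-in for the trained product vectors (300 products, 30 dimensions).
rng = np.random.default_rng(0)
embedding = rng.normal(size=(300, 30))

# Project to two dimensions; the settings are illustrative, not the notebook's config.
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
xy = tsne.fit_transform(embedding)
print(xy.shape)
```

The resulting coordinates can be joined to the `products` table by row index and passed to a `plotly` scatter plot, for example colored by product category, to obtain a product map.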