Merge pull request #1 from p768lwy3/dev

Merge dev to master

p768lwy3 committed Oct 24, 2019
2 parents dd08963 + d6a34c6 commit b357829
Showing 48 changed files with 1,802 additions and 933 deletions.
11 changes: 10 additions & 1 deletion README.md
@@ -53,10 +53,19 @@ Thank you for ReadTheDocs!!!
| [Deep Field-Aware Factorization Machine](torecsys/models/ctr/deep_ffm.py) | [Junlin Zhang et al, 2019. FAT-DeepFFM: Field Attentive Deep Field-aware Factorization Machine](https://arxiv.org/abs/1905.06336) | Click Through Rate |
| [Deep Factorization Machine](torecsys/models/ctr/deep_fm.py) | [Huifeng Guo et al, 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/abs/1703.04247) | Click Through Rate |
| [Deep Matching Correlation Prediction](torecsys/models/ctr/deep_mcp.py) | [Wentao Ouyang et al, 2019. Representation Learning-Assisted Click-Through Rate Prediction](https://arxiv.org/pdf/1906.04365.pdf) | Click Through Rate |
| [Elaborated Entire Space Supervised Multi Task Model](torecsys/models/ctr/elaborated_entire_space_supervised_multi_task.py) | [Hong Wen et al, 2019. Conversion Rate Prediction via Post-Click Behaviour Modeling](https://arxiv.org/abs/1910.07099) | Click Through Rate |
| [Entire Space Multi Task Model](torecsys/models/ctr/entire_space_multi_task.py) | [Xiao Ma et al, 2019. Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931) | Click Through Rate |
| [Factorization Machine](torecsys/models/ctr/factorization_machine.py) | [Steffen Rendle, 2010. Factorization Machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf) | Click Through Rate |
| [Factorization Machine Support Neural Network](torecsys/models/ctr/factorization_machine_supported_neural_network.py) | [Weinan Zhang et al, 2016. Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction](https://arxiv.org/abs/1601.02376) | Click Through Rate |
| [Field Attentive Deep Field Aware Factorization Machine](torecsys/models/ctr/fat_deep_ffm.py) | [Junlin Zhang et al, 2019. FAT-DeepFFM: Field Attentive Deep Field-aware Factorization Machine](https://arxiv.org/abs/1905.06336) | Click Through Rate |
| [Field-Aware Factorization Machine](torecsys/models/ctr/field_aware_factorization_machine.py) | [Yuchin Juan et al, 2016. Field-aware Factorization Machines for CTR Prediction](https://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf) | Click Through Rate |
| [Field-Aware Neural Factorization Machine](torecsys/models/ctr/field_aware_neural_factorization_machine.py) | [Li Zhang et al, 2019. Field-aware Neural Factorization Machine for Click-Through Rate Prediction](https://arxiv.org/abs/1902.09096) | Click Through Rate |
| [Logistic Regression](torecsys/models/ctr/logistic_regression.py) | / | Click Through Rate |
| [Neural Collaborative Filtering](torecsys/models/ctr/neural_collaborative_filtering.py) | [Xiangnan He, 2017. Neural Collaborative Filtering](https://arxiv.org/abs/1708.05031) | Click Through Rate |
| [Neural Factorization Machine](torecsys/models/ctr/neural_factorization_machine.py) | [Xiangnan He et al, 2017. Neural Factorization Machines for Sparse Predictive Analytics](https://arxiv.org/abs/1708.05027) | Click Through Rate |
| [Product Neural Network](torecsys/models/ctr/product_neural_network.py) | [Yanru Qu et al, 2016. Product-based Neural Networks for User Response Prediction](https://arxiv.org/abs/1611.00144) | Click Through Rate |
| [eXtreme Deep Factorization Machine](torecsys/models/ctr/xdeep_fm.py) | [Jianxun Lian et al, 2018. xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems](https://arxiv.org/abs/1803.05170) | Click Through Rate |
| [Matrix Factorization](torecsys/models/emb/matrix_factorization.py) | / | Embedding |
| [StarSpace](torecsys/models/emb/starspace.py) | [Ledell Wu et al, 2017. StarSpace: Embed All The Things!](https://arxiv.org/abs/1709.03856) | Embedding |

## More About ToR[e]cSys

74 changes: 59 additions & 15 deletions doc/source/conf.py
@@ -12,18 +12,11 @@
#
import os
import sys
sys.path.insert(0, os.path.abspath("../.."))


# -- Project information -----------------------------------------------------

project = 'torecsys'
copyright = '2019, Jasper, Li Wai Yin'
author = 'Jasper, Li Wai Yin'

# The full version, including alpha/beta/rc tags
release = '0.0.1-dev'
## import pytorch_sphinx_theme
import sphinx_theme

sys.path.insert(0, os.path.abspath("../.."))

# -- General configuration ---------------------------------------------------

@@ -32,38 +25,89 @@
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosectionlabel',
'sphinx.ext.autosummary',
'sphinx.ext.coverage',
'sphinx.ext.doctest',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.coverage',
'sphinx.ext.mathjax',
'sphinx.ext.napoleon'
'sphinx.ext.napoleon',
'sphinx.ext.todo',
'sphinx.ext.viewcode'
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

pygments_style = 'sphinx'

# -- Project information -----------------------------------------------------

needs_sphinx = "2.1.2"

project = 'torecsys'
copyright = '2019, Jasper, Li Wai Yin'
author = 'Jasper, Li Wai Yin'

# The full version, including alpha/beta/rc tags
version = "dev-0.0.1"
release = "dev"

# The master toctree document.
master_doc = 'index'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True

# Enable docstring inheritance
autodoc_inherit_docstrings = True

# autosectionlabel throws warnings if section names are duplicated.
# The following tells autosectionlabel to not throw a warning for
# duplicated section names that are in different documents.
autosectionlabel_prefix_document = True

# Configuration of sphinx
add_function_parentheses = False
add_module_names = False
autoclass_content = "both"
autodoc_mock_imports = ["torch", "torchaudio", "torchvision", "torchtext"]
master_doc = 'index'


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'classic'
html_theme = 'neo_rtd_theme'
html_theme_path = [sphinx_theme.get_html_theme_path()]

html_theme_options = {
'canonical_url': 'https://torecsys.readthedocs.io/en/latest/',
}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# -- Options for intersphinx -------------------------------------------------

# Cross-link to external projects' documentation.
intersphinx_mapping = {
'python': ('https://docs.python.org/', None),
'numpy': ('https://docs.scipy.org/doc/numpy/', None),
'pytorch': ('https://pytorch.org/docs/', None)
}
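
A note on `autodoc_mock_imports` above: Sphinx substitutes mock modules for the listed names so the docs can be built without installing PyTorch. A minimal sketch of the same idea using `unittest.mock` (Sphinx's own implementation also mocks submodules such as `torch.nn`; this only covers the top-level gist):

```python
import sys
from unittest import mock

# Register stand-ins so `import torch` (and friends) succeeds without the
# real packages being installed.
for name in ("torch", "torchaudio", "torchvision", "torchtext"):
    sys.modules[name] = mock.MagicMock()

import torch  # now a MagicMock rather than the real library
print(type(torch))  # <class 'unittest.mock.MagicMock'>
```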
2 changes: 2 additions & 0 deletions requirements.txt
@@ -3,6 +3,8 @@ numpy>=1.17.0
pandas>=0.25.0
scipy>=1.3.1
scikit-learn>=0.21.3
sphinx==2.1.2
sphinx_theme==1.0
sqlalchemy>=1.3.6
tensorboard==1.14.0
texttable==1.6.2
20 changes: 11 additions & 9 deletions torecsys/__init__.py
@@ -3,14 +3,16 @@

__version__ = "0.0.1-dev"

from .data import *
from .estimators import *
from .functional import *
from .inputs import *
from .layers import *
from .losses import *
from .metrics import *
from .models import *
import torecsys.data
import torecsys.estimators
import torecsys.functional
import torecsys.inputs
import torecsys.layers
import torecsys.losses
import torecsys.models
import torecsys.utils

from .metrics import metrics
from .utils.training.ranking_trainer import RankingTrainer
from .utils.training.trainer import Trainer
import torecsys.utils
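
Replacing the `from .x import *` lines with explicit `import torecsys.x` statements keeps the top-level namespace predictable: submodules become attributes of the package, and only deliberately re-exported names such as `Trainer` stay at the top level. A hedged usage sketch, assuming the package is installed:

```python
import torecsys as trs

print(trs.__version__)  # "0.0.1-dev"

# Submodules are reachable as attributes instead of star-imported names:
models = trs.models
layers = trs.layers

# Deliberate re-exports remain at the top level:
trainer_cls = trs.Trainer
```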

2 changes: 1 addition & 1 deletion torecsys/data/subsampling/__init__.py
@@ -1,4 +1,4 @@
r"""torecsys.data.subsampling is a module of subsampling algorithms
"""

from .subsampler import subsampling
import torecsys.data.subsampling.subsampler
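
The `subsampler` module itself is not shown in this diff, so its exact signature is unknown; the name suggests word2vec-style frequency subsampling. A generic, self-contained sketch of that heuristic (function name, threshold, and keep-probability follow Mikolov et al., 2013, not necessarily this repo's implementation):

```python
import numpy as np

def subsample(tokens, threshold=1e-5, seed=0):
    """Drop frequent tokens, keeping token t with probability
    min(1, sqrt(threshold / freq(t))), as in word2vec subsampling."""
    rng = np.random.default_rng(seed)
    total = len(tokens)
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    kept = []
    for t in tokens:
        keep_prob = min(1.0, np.sqrt(threshold / (counts[t] / total)))
        if rng.random() < keep_prob:
            kept.append(t)
    return kept

# Keep-probability here: ~0.003 for "the" vs ~0.1 for "rare".
print(len(subsample(["the"] * 1000 + ["rare"])))
```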
22 changes: 10 additions & 12 deletions torecsys/inputs/base/concat_inputs.py
@@ -1,7 +1,8 @@
from . import _Inputs
import torch
import torch.nn as nn
from torecsys.utils.decorator import jit_experimental, no_jit_experimental_by_namedtensor
from typing import Dict, List
from typing import Dict, List, Union


class ConcatInputs(_Inputs):
@@ -18,6 +19,7 @@ def __init__(self, inputs: List[_Inputs]):
i.e. class of trs.inputs.base. e.g.
.. code-block:: python
import torecsys as trs
# initialize embedding layers used in ConcatInputs
@@ -64,7 +66,7 @@ def __init__(self, inputs: List[_Inputs]):
self.length = sum([len(inp) for inp in self.inputs])

def __getitem__(self, idx: Union[int, slice, str]) -> Union[nn.Module, List[nn.Module]]:
"""Get Embedding Layer by index of the schema.
"""Get Embedding Layer by index from inputs.
Args:
idx (Union[int, slice, str]): index to get embedding layer from the schema.
@@ -73,24 +75,24 @@ def __getitem__(self, idx: Union[int, slice, str]) -> Union[nn.Module, List[nn.Module]]:
Union[nn.Module, List[nn.Module]]: Embedding layer(s) of the given index
"""
if isinstance(idx, int):
emb_layers = self.schema[idx][0]
emb_layers = self.inputs[idx]

elif isinstance(idx, slice):
emb_layers = []

# parse the slice object into integers used in range()
start = idx.start if idx.start is not None else 0
stop = idx.stop if idx.stop is not None else len(self.schema)
stop = idx.stop if idx.stop is not None else len(self.inputs)
step = idx.step if idx.step is not None else 1

for i in range(start, stop, step):
emb_layers.append(self.schema[i][0])
emb_layers.append(self.inputs[i])

elif isinstance(idx, str):
emb_layers = []
for i in self.schema:
if idx in i[1]:
emb_layers.append(i[0])
for inp in self.inputs:
if idx in inp.schema.inputs:
emb_layers.append(inp)

else:
raise ValueError("getitem only accept int, slice, and str.")
@@ -128,10 +130,6 @@ def forward(self, inputs: Dict[str, torch.Tensor]) -> torch.Tensor:
# check if output dimension is less than 3, then .unsqueeze(1)
if output.dim() < 3:
output = output.unflatten("E", [("N", 1), ("E", output.size("E"))])

# embed / transform tensors
embedded = embedding(*args)
## embedded.names = ("B", "N", "E")

# append tensor to outputs
outputs.append(output)
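
The `__getitem__` fix above swaps the stale `self.schema` lookups for `self.inputs`, so int, slice, and str indexing all resolve against the same list of embedding layers. A toy stand-in illustrating the dispatch (not the repo's actual class; the layer and field names are made up):

```python
from typing import List, Union

class Layer:
    def __init__(self, name: str, fields: List[str]):
        self.name, self.fields = name, fields

class Container:
    """Miniature version of the fixed indexing behaviour."""
    def __init__(self, inputs: List[Layer]):
        self.inputs = inputs

    def __getitem__(self, idx: Union[int, slice, str]):
        if isinstance(idx, (int, slice)):
            return self.inputs[idx]  # int -> one layer, slice -> list
        if isinstance(idx, str):
            return [inp for inp in self.inputs if idx in inp.fields]
        raise ValueError("getitem only accepts int, slice, and str.")

c = Container([Layer("emb_0", ["userId"]), Layer("emb_1", ["movieId"])])
print(c[0].name)                      # emb_0
print([l.name for l in c["userId"]])  # ['emb_0']
```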
6 changes: 4 additions & 2 deletions torecsys/inputs/base/multi_indices_emb.py
@@ -50,8 +50,10 @@ def __init__(self,
self.embedding = nn.Embedding(sum(field_sizes), embed_size, **kwargs)

# create offsets to re-index inputs by adding them up
self.offsets = torch.Tensor((0, *np.cumsum(field_sizes)[:-1])).long().unsqueeze(0)
self.offsets.names = ("B", "N")
## self.offsets = torch.Tensor((0, *np.cumsum(field_sizes)[:-1])).long().unsqueeze(0)
self.offsets = torch.Tensor((0, *np.cumsum(field_sizes)[:-1])).long()
self.offsets.names = ("N", )
self.offsets = self.offsets.unflatten("N", [("B", 1), ("N", self.offsets.size("N"))])
self.offsets.to(device)

# bind length to embed_size * length of field_sizes (i.e. num_fields) if flatten is True
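
The offsets tensor being reshaped here is what lets several categorical fields share a single `nn.Embedding`: cumulative field sizes shift each field's local indices into disjoint ranges of one table, and the named-tensor `unflatten` merely adds a broadcastable batch dimension on top. A plain-tensor sketch of the re-indexing:

```python
import numpy as np
import torch

field_sizes = [10, 20, 30]  # vocabulary size of each categorical field
offsets = torch.tensor([0, *np.cumsum(field_sizes)[:-1]])  # tensor([ 0, 10, 30])

x = torch.tensor([[3, 5, 7],   # per-field indices, shape (B, N)
                  [1, 0, 2]])
x_shared = x + offsets.unsqueeze(0)  # rows of one table of size sum(field_sizes)
print(x_shared)  # tensor([[ 3, 15, 37], [ 1, 10, 32]])
```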
17 changes: 9 additions & 8 deletions torecsys/inputs/base/multi_indices_field_aware_emb.py
@@ -7,15 +7,14 @@


class MultiIndicesFieldAwareEmbedding(_Inputs):
r"""Base Inputs class for field-aware embedding of multi-indices, which is used in Field Aware
Factorization (FFM) or its variants. The shape of output is :math:`(B, N * N, E)`, where the
embedding tensor :math:`E_{feat_{i, k}, field_{j}}` are looked up the k-th row from the j-th
matrix of i-th feature.
r"""Base Inputs class for Field-aware embedding of multiple indices, which is used in Field-aware
Factorization (FFM) or the variants. The shape of output is :math:`(B, N * N, E)`, where the embedding
tensor :math:`E_{feat_{i, k}, field_{j}}` are looked up the k-th row from the j-th tensor of i-th feature.
:Reference:
#. `Yuchin Juan et al, 2016. Field-aware Factorization Machines for CTR Prediction <https://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf>`_.
"""
@no_jit_experimental_by_namedtensor
def __init__(self,
@@ -49,8 +48,10 @@ def __init__(self,
])

# create offsets to re-index inputs by adding them up
self.offsets = torch.Tensor((0, *np.cumsum(field_sizes)[:-1])).long().unsqueeze(0)
self.offsets.names = ("B", "N")
## self.offsets = torch.Tensor((0, *np.cumsum(field_sizes)[:-1])).long().unsqueeze(0)
self.offsets = torch.Tensor((0, *np.cumsum(field_sizes)[:-1])).long()
self.offsets.names = ("N", )
self.offsets = self.offsets.unflatten("N", [("B", 1), ("N", self.offsets.size("N"))])
self.offsets.to(device)

# initialize nn.Embedding with xavier_uniform_ initializer
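
Field-aware embedding keeps one table per field and looks every feature up in every table, which is where the :math:`(B, N * N, E)` output shape in the docstring comes from. A miniature sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

num_fields, vocab, embed_size = 3, 60, 4
tables = nn.ModuleList(nn.Embedding(vocab, embed_size) for _ in range(num_fields))

x = torch.randint(0, vocab, (2, num_fields))            # (B, N) re-indexed inputs
out = torch.cat([table(x) for table in tables], dim=1)  # one lookup per field
print(out.shape)  # torch.Size([2, 9, 4]), i.e. (B, N * N, E)
```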
6 changes: 3 additions & 3 deletions torecsys/inputs/base/sequence_indices_emb.py
@@ -61,7 +61,7 @@ def __init__(self,

# bind embedding to pre-trained embedding module if nn_embedding is not None
if nn_embedding is not None:
self.length = nn_embedding.size(1)
self.length = nn_embedding.size("E")
self.embedding = nn.Embedding.from_pretrained(nn_embedding)
# else, create a embedding module with the given arguments
else:
@@ -104,11 +104,11 @@ def __init__(self,
elif output_method == "max_pooling":
self.aggregation = nn.AdaptiveMaxPool1d(1)
elif output_method == "mean":
self.aggregation = partial(torch.mean, dim=1, keepdim=True)
self.aggregation = partial(torch.mean, dim="N", keepdim=True)
elif output_method == "none":
self.aggregation = torch.Tensor
elif output_method == "sum":
self.aggregation = partial(torch.sum, dim=1, keepdim=True)
self.aggregation = partial(torch.sum, dim="N", keepdim=True)
else:
raise ValueError('output_method only allows ["avg_pooling", "max_pooling", "mean", "none", "sum"].')
self.output_method = output_method
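
The change from positional `dim=1` to `dim="N"` relies on PyTorch's named tensors: reduction ops accept dimension names, so the pooling stays tied to the sequence axis even if the layout changes. A quick demo (named tensors are an experimental PyTorch feature and emit a warning):

```python
import torch

t = torch.randn(2, 3, 4, names=("B", "N", "E"))  # (batch, sequence, embed)
mean_pooled = t.mean(dim="N", keepdim=True)      # shape (2, 1, 4)
sum_pooled = t.sum(dim="N", keepdim=True)
print(mean_pooled.names)  # ('B', 'N', 'E')
```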
2 changes: 1 addition & 1 deletion torecsys/inputs/base/single_index_emb.py
@@ -33,7 +33,7 @@ def __init__(self,

# bind embedding to pre-trained embedding module if nn_embedding is not None
if nn_embedding is not None:
embed_size = nn_embedding.size(1)
embed_size = nn_embedding.size("E")
self.embedding = nn.Embedding.from_pretrained(nn_embedding)
# else, create a embedding module with the given arguments
else:
10 changes: 4 additions & 6 deletions torecsys/inputs/base/stacked_inp.py
@@ -1,7 +1,8 @@
from . import _Inputs
import torch
import torch.nn as nn
from torecsys.utils.decorator import jit_experimental, no_jit_experimental_by_namedtensor
from typing import Dict, List
from typing import Dict, List, Union


class StackedInputs(_Inputs):
@@ -17,6 +18,7 @@ def __init__(self, inputs: List[_Inputs]):
i.e. class of trs.inputs.base. e.g.
.. code-block:: python
import torecsys as trs
# initialize embedding layers used in StackedInputs
@@ -96,7 +98,7 @@ def __getitem__(self, idx: Union[int, slice, str]) -> Union[nn.Module, List[nn.Module]]:
emb_layers = []
for inp in self.inputs:
if idx in inp.schema.inputs:
emb_layers.append(i)
emb_layers.append(inp)

else:
raise ValueError("getitem only accept int, slice, and str.")
@@ -143,10 +145,6 @@ def forward(self, inputs: Dict[str, torch.Tensor]) -> torch.Tensor:
if output.dim() < 3:
output = output.unflatten("E", [("N", 1), ("E", output.size("E"))])

# embed / transform tensors
embedded = embedding(*args)
## embedded.names = ("B", "N", "E")

# append tensor to outputs
outputs.append(output)

1 change: 1 addition & 0 deletions torecsys/inputs/inputs_wrapper.py
@@ -17,6 +17,7 @@ def __init__(self,
where keys are names of inputs' fields, and values are tensor of fields. e.g.
.. code-block:: python
import torecsys as trs
# initialize embedding layers used in InputsWrapper
