# [WIP][GSoC 2018] Similarity Learning #2050
Changes from 107 commits
New file (`@@ -0,0 +1,9 @@`), a documentation stub for the `drmm_tks` module:

```rst
:mod:`models.experimental.drmm_tks` -- Similarity Learning
============================================================================

.. automodule:: gensim.models.experimental.drmm_tks
    :synopsis: Neural Network Similarity Learning
    :members:
    :inherited-members:
    :undoc-members:
    :show-inheritance:
```
*(One large file in this diff is not rendered by default.)*
New file (`@@ -0,0 +1,6 @@`), the `experimental` package `__init__.py`:

```python
"""This package will host some experimental modules for Similarity Learning"""

from .drmm_tks import DRMM_TKS  # noqa:F401
from .custom_losses import rank_hinge_loss  # noqa:F401
from .custom_layers import TopKLayer  # noqa:F401
from .custom_callbacks import ValidationCallback  # noqa:F401
```

Review thread:

> **Reviewer:** All of these imports require Keras, so this will be broken by default. Need to add conditional imports (probably directly in your NN models).
>
> **Author:** Something like the below?
>
> **Reviewer:** 2 variants:
>
> ```python
> """My module"""
> try:
>     import keras
>     IS_KERAS_AVAILABLE = True
> except ImportError:
>     IS_KERAS_AVAILABLE = False
> ...
> class MyNet:
>     def __init__(...):
>         if not IS_KERAS_AVAILABLE:
>             raise ...
> ```
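The reviewer's guard pattern can be sketched end to end; a minimal, runnable version (the module name `keras_not_installed` is a deliberate stand-in so the fallback path runs even without Keras installed):

```python
# Sketch of the conditional-import guard suggested in review.
# "keras_not_installed" is a hypothetical name used to force the
# ImportError branch; a real module would `import keras` here.
try:
    import keras_not_installed as keras  # stand-in for `import keras`
    KERAS_AVAILABLE = True
except ImportError:
    KERAS_AVAILABLE = False


class MyNet:
    """Model class that fails loudly, but only on use, when Keras is missing."""
    def __init__(self):
        if not KERAS_AVAILABLE:
            raise ImportError("Please install Keras to use MyNet")


try:
    MyNet()
except ImportError as e:
    print(e)  # prints: Please install Keras to use MyNet
```

The point of the pattern is that merely importing the package stays safe; the error is deferred until someone actually instantiates a Keras-dependent class.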
New file `custom_callbacks.py` (`@@ -0,0 +1,73 @@`):

```python
import logging

try:
    from keras.callbacks import Callback
    KERAS_AVAILABLE = True
except ImportError:
    KERAS_AVAILABLE = False
    Callback = object  # fallback base class so this module still imports without Keras

logger = logging.getLogger(__name__)

logging.basicConfig(
    format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)


class ValidationCallback(Callback):
    """Callback for providing validation metrics on the model trained so far"""

    def __init__(self, test_data):
        """
        Parameters
        ----------
        test_data : dict
            A dictionary which holds the validation data. It consists of
            the following keys:

            "X1" : numpy array
                The queries as a numpy array of shape (n_samples, text_maxlen)
            "X2" : numpy array
                The candidate docs as a numpy array of shape (n_samples, text_maxlen)
            "y" : list of int
                The labels for each query-doc pair, 1 or 0, with shape (n_samples,),
                where 1 means the doc is relevant to the query and 0 means it is not
            "doc_lengths" : list of int
                The length of each document group, i.e. the number of queries
                which represent one topic. It is needed for calculating the metrics.
        """
        if not KERAS_AVAILABLE:
            raise ImportError("Please install Keras to use this class")

        # Check that test_data is a dictionary with all the right keys
        try:
            # If an empty dict is passed
            if len(test_data.keys()) == 0:
                raise ValueError(
                    "test_data dictionary is empty. It doesn't have the keys: 'X1', 'X2', 'y', 'doc_lengths'"
                )
            for key in test_data.keys():
                if key not in ['X1', 'X2', 'y', 'doc_lengths']:
                    raise ValueError("test_data dictionary doesn't have the keys: 'X1', 'X2', 'y', 'doc_lengths'")
        except AttributeError:
            raise ValueError("test_data must be a dictionary with the keys: 'X1', 'X2', 'y', 'doc_lengths'")
        self.test_data = test_data
```

Review comments on this part:

> General question: does it make sense to have distinct files for callbacks/layers/losses here? Why not place them directly in the model file, or at least merge them into one file like utils?
>
> Why do you need a logger here?
>
> Are you sure that the documentation will be built correctly?
>
> Good idea to check that the input matches your criteria here (because this is a non-trivial structure).
>
> Incorrect check: if a required key is missing but every key that is present is valid, this check still passes.
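One way to fix the check the reviewer flagged is a set comparison that catches missing keys as well as unexpected ones; a minimal standalone sketch (not the PR's code; only the key names are taken from the docstring):

```python
REQUIRED_KEYS = {'X1', 'X2', 'y', 'doc_lengths'}


def validate_test_data(test_data):
    """Reject non-dicts, missing keys, and unexpected keys.

    The PR's loop only rejected unexpected keys, so a dict with a
    missing required key slipped through.
    """
    try:
        keys = set(test_data.keys())
    except AttributeError:
        raise ValueError(
            "test_data must be a dictionary with the keys: %s" % sorted(REQUIRED_KEYS))
    missing = REQUIRED_KEYS - keys
    extra = keys - REQUIRED_KEYS
    if missing or extra:
        raise ValueError(
            "test_data has missing keys %s and unexpected keys %s"
            % (sorted(missing), sorted(extra)))


validate_test_data({'X1': [], 'X2': [], 'y': [], 'doc_lengths': []})  # passes silently
```

With this version, `validate_test_data({'X1': []})` raises because `'X2'`, `'y'`, and `'doc_lengths'` are missing, which the original per-key loop would have accepted.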
`custom_callbacks.py` continued (inside `ValidationCallback`):

```python
    def on_epoch_end(self, epoch, logs=None):
        # Import has to be here to prevent a cyclic import
        from evaluation_metrics import mapk, mean_ndcg

        X1 = self.test_data["X1"]
        X2 = self.test_data["X2"]
        y = self.test_data["y"]
        doc_lengths = self.test_data["doc_lengths"]

        predictions = self.model.predict(x={"query": X1, "doc": X2})

        Y_pred = []
        Y_true = []
        offset = 0

        for doc_size in doc_lengths:
            Y_pred.append(predictions[offset: offset + doc_size])
            Y_true.append(y[offset: offset + doc_size])
            offset += doc_size

        logger.info("MAP: %.2f", mapk(Y_true, Y_pred))
        for k in [1, 3, 5, 10, 20]:
            logger.info("nDCG@%d : %.2f", k, mean_ndcg(Y_true, Y_pred, k=k))
```

> **Reviewer:** what's the type of `X1`?
>
> **Author:** done
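The offset-based grouping in `on_epoch_end` can be seen in isolation; a pure-Python sketch with made-up scores (5 query-doc pairs belonging to two topics of sizes 3 and 2):

```python
# Illustrative values only; in the callback these come from
# model.predict and test_data.
predictions = [0.9, 0.2, 0.7, 0.4, 0.8]
labels = [1, 0, 1, 0, 1]
doc_lengths = [3, 2]

# Split the flat prediction/label arrays into one sublist per topic,
# exactly as on_epoch_end does before computing MAP and nDCG.
Y_pred, Y_true = [], []
offset = 0
for doc_size in doc_lengths:
    Y_pred.append(predictions[offset: offset + doc_size])
    Y_true.append(labels[offset: offset + doc_size])
    offset += doc_size

print(Y_pred)  # [[0.9, 0.2, 0.7], [0.4, 0.8]]
print(Y_true)  # [[1, 0, 1], [0, 1]]
```

This is why `doc_lengths` is required: the ranking metrics are averaged per topic, so the flat batch has to be re-segmented first.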
New file `custom_layers.py` (`@@ -0,0 +1,42 @@`):

```python
"""Script where all the custom keras layers are kept."""

try:
    from keras.engine.topology import Layer
    import keras.backend as K
    KERAS_AVAILABLE = True
except ImportError:
    KERAS_AVAILABLE = False
    Layer = object  # fallback base class so this module still imports without Keras


class TopKLayer(Layer):
    """Layer to get the top k values from the interaction matrix in the drmm_tks model"""

    def __init__(self, output_dim, topk, **kwargs):
        """
        Parameters
        ----------
        output_dim : tuple of int
            The dimension of the tensor after going through this layer.
        topk : int
            The k topmost values to be returned.
        """
        self.output_dim = output_dim
        self.topk = topk
        super(TopKLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        super(TopKLayer, self).build(input_shape)

    def call(self, x):
        return K.tf.nn.top_k(x, k=self.topk, sorted=True)[0]

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim[0], self.output_dim[1])

    def get_config(self):
        config = {
            'topk': self.topk,
            'output_dim': self.output_dim
        }
        base_config = super(TopKLayer, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
```

> Docstring formatting: an empty line is needed before any section start.
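What `K.tf.nn.top_k(..., sorted=True)[0]` computes can be mimicked in NumPy for intuition; a small sketch (not the layer itself, just the same selection along the last axis):

```python
import numpy as np


def top_k_last_axis(x, k):
    # Sort descending along the last axis and keep the first k values,
    # mirroring tf.nn.top_k(x, k, sorted=True)[0] (values only, no indices).
    return -np.sort(-x, axis=-1)[..., :k]


m = np.array([[0.1, 0.9, 0.5, 0.3],
              [0.7, 0.2, 0.8, 0.4]])
print(top_k_last_axis(m, 2))
# [[0.9 0.5]
#  [0.8 0.7]]
```

In the DRMM-TKS model this is applied to the query-doc interaction matrix, so each query term keeps only its k strongest matching signals.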
New file `custom_losses.py` (`@@ -0,0 +1,29 @@`):

```python
"""Script where all the custom loss functions will be defined"""

try:
    from keras import backend as K
    from keras.layers import Lambda
    KERAS_AVAILABLE = True
except ImportError:
    KERAS_AVAILABLE = False


def rank_hinge_loss(y_true, y_pred):
    """Loss function for Ranking Similarity Learning tasks.
    More details here: https://en.wikipedia.org/wiki/Hinge_loss

    Parameters
    ----------
    y_true : list of list of int
        The true relation between a query and a doc.
        It can be either 1 (relevant) or 0 (not relevant).
    y_pred : list of list of float
        The predicted relation between a query and a doc.
    """
    if not KERAS_AVAILABLE:
        raise ImportError("Please install Keras to use this function")
    margin = 0.5
    y_pos = Lambda(lambda a: a[::2, :], output_shape=(1,))(y_pred)
    y_neg = Lambda(lambda a: a[1::2, :], output_shape=(1,))(y_pred)
    loss = K.maximum(0., margin + y_neg - y_pos)
    return K.mean(loss)
```

> Docstring (with a link to the description of the current loss?)
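The even/odd slicing assumes positive and negative docs are interleaved in the batch; a NumPy sketch of the same arithmetic (illustrative numbers only, not the Keras tensor version):

```python
import numpy as np


def rank_hinge_loss_np(y_pred, margin=0.5):
    # Predictions are interleaved: even rows score the positive doc,
    # odd rows the paired negative doc, as in the Keras implementation.
    y_pos = y_pred[::2, :]
    y_neg = y_pred[1::2, :]
    # Penalize pairs where the positive doc does not beat the negative
    # doc by at least the margin.
    return np.maximum(0., margin + y_neg - y_pos).mean()


preds = np.array([[0.9], [0.1],   # gap 0.8 > margin -> zero loss
                  [0.4], [0.3]])  # gap 0.1 < margin -> loss 0.5 + 0.3 - 0.4 = 0.4
print(rank_hinge_loss_np(preds))  # prints 0.2 (mean of 0.0 and 0.4)
```

So the loss is zero only when every positive doc outranks its paired negative doc by at least `margin`.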
Closing review thread:

> **Reviewer:** Also need to include the other files (callbacks, layers, etc.) in the documentation build.
>
> **Author:** @menshikh-iv Please refer to the link below, which shows the diff of the requested changes: 451e3b1?utf8=%E2%9C%93&diff=unified
>
> **Author:** Please note that `tox -e docs` will throw errors, not on my files but on some Keras files, since I am inheriting from the Keras `Layer` class, which has some unformatted docs.
>
> **Reviewer:** @aneesh-joshi that shouldn't happen (because you include only your files, not Keras). Can you show me a log of `tox -e docs` that mentions the error in some Keras file (not yours)?
>
> **Author:** I haven't implemented any of the above functions, just inherited the `Layer` class.
>
> **Reviewer:** Aha, looks like you are right (an issue with the docstring of the parent class that we can't control). A simple workaround is to define these methods yourself and call super, but don't worry much about it now; you have more critical tasks.