problem with WCD #26
Comments
Ah sorry for being unclear: the WCD is the distance you get by first
representing each document as the weighted average of its word vectors,
where the weights are the normalized BOW weights, and then computing the
Euclidean distance between those two averages. So for instance, to compute
the WCD in Python you would do:
import pickle
import numpy as np

# load data (pickle files must be opened in binary mode)
with open(load_file, "rb") as f:
    [X, BOW_X, y, C, words] = pickle.load(f)
# compute WCD between documents i and j #
# -------------------------------------------------------- #
# normalize BOW for document i, j
bow_i = BOW_X[i]
bow_i = bow_i / np.sum(bow_i)
bow_j = BOW_X[j]
bow_j = bow_j / np.sum(bow_j)
# I haven't debugged the code below; it may need an additional
# dimension on bow_i, bow_j (i.e., make them (n,1) instead of (n,))
v_i = np.dot(X[i].T, bow_i)
v_j = np.dot(X[j].T, bow_j)
# Euclidean distance (can use distance.m to parallelize this in MATLAB;
# similar functions exist in Python as well)
wcd_ij = np.sqrt( np.sum( (v_i - v_j)**2 ) )
Does this make sense?
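For anyone who wants to try the steps above without the original data file, here is a minimal, self-contained sketch on toy data. The function names `word_centroid` and `wcd` are made up for illustration, and the sketch assumes each document's word vectors are stored as an array of shape `(n_words, dim)`, which is what the `X[i].T` in the snippet above implies. If your `X[i]` is stored as `(dim, n_words)` instead, drop the transpose.

```python
import numpy as np


def word_centroid(word_vecs, bow):
    """Weighted average of a document's word vectors.

    Assumption: word_vecs has shape (n_words, dim) and bow holds one
    count per word, as shape (n_words,) or (n_words, 1).
    """
    word_vecs = np.asarray(word_vecs, dtype=float)
    bow = np.asarray(bow, dtype=float).ravel()  # flatten (n, 1) to (n,)
    weights = bow / bow.sum()                   # normalized BOW weights
    return word_vecs.T @ weights                # shape (dim,)


def wcd(word_vecs_i, bow_i, word_vecs_j, bow_j):
    """Euclidean distance between the two weighted centroids."""
    diff = word_centroid(word_vecs_i, bow_i) - word_centroid(word_vecs_j, bow_j)
    return float(np.linalg.norm(diff))


# Toy example: two tiny "documents" with 2-dimensional word vectors.
X_i = np.array([[1.0, 0.0], [0.0, 1.0]])  # two unique words
bow_i = np.array([1, 3])                  # word counts
X_j = np.array([[1.0, 1.0]])              # one unique word
bow_j = np.array([2])

print(wcd(X_i, bow_i, X_j, bow_j))
```

Flattening `bow` with `ravel()` sidesteps the `(n,1)` versus `(n,)` question raised in the comment above, since the product then always returns a plain 1-D centroid.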
On Mon, Nov 19, 2018 at 11:50 PM 08s011003 wrote:
@mkusner I read your paper and want to use your WCD+RWMD method to
calculate docs similarity in my doc recommendation project. I found the
code for RWMD in matlab, but didn't find the code for WCD. Is it the file
named distance.m?
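@mkusner Thank you very much. But I have another question about "v_i = np.dot(X[i].T, bow_i)" and "v_j = np.dot(X[j].T, bow_j)". You didn't save a variable T in load_file. So are you sure the operation "X[j].T" is needed here?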
Yes, X[i].T is the transpose of X[i], not another variable.
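To make this concrete, here is a small sketch (toy arrays, not the project's data) showing that `.T` is an attribute every NumPy array already has, so nothing named `T` needs to be saved or loaded:

```python
import numpy as np

# .T is a built-in attribute of NumPy arrays that returns the transpose.
A = np.arange(6).reshape(2, 3)   # shape (2, 3)
A_t = A.T                        # shape (3, 2)
same_as_function = np.array_equal(A_t, np.transpose(A))

# On a 1-D array, .T leaves the shape unchanged.
v = np.array([1, 2, 3])
v_t_shape = v.T.shape

print(A.shape, A_t.shape, same_as_function, v_t_shape)
```

The 1-D case is one reason the earlier comment mentions possibly reshaping `bow_i` and `bow_j` to `(n,1)`: transposing a `(n,)` vector does not turn it into a column.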
On Thu, Nov 22, 2018, 4:29 AM 08s011003 wrote:
@mkusner Thank you very much. But I have another question about
"v_i = np.dot(X[i].T, bow_i)" and "v_j = np.dot(X[j].T, bow_j)".
You didn't save a variable T in load_file. So are you sure the
operation "X[j].T" is needed here?