reuse this in scicloj.ml.smile ? #1

Closed
behrica opened this issue Oct 19, 2022 · 2 comments

Comments

behrica commented Oct 19, 2022

For reference:

https://github.com/scicloj/scicloj.ml.smile/blob/d70c7e3caff93935d05ab81ed6b2d1e4846ad42b/src/scicloj/ml/smile/nlp.clj#L281

If possible I would like to re-use this implementation.

I only wrote my own because I could not find an existing one a few months ago.

@behrica behrica changed the title reuse this in sciclloj.ml.smile ? reuse this in scicloj.ml.smile ? Oct 19, 2022

behrica commented Oct 19, 2022

I had a quick look and have two comments:

  1. I was thinking of changing my implementation to convert the "vocabulary" into a list of ints instead of a list of strings.
    I just hope to save memory with this.
    Any comments on this?

  2. I implemented several "alternative" calculations,
    either by doing "the same as sklearn" or by following the formulas listed here:
    https://en.wikipedia.org/wiki/Tf%E2%80%93idf
    For both tf and idf it lists several weighting schemes.
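To illustrate point 1, here is a minimal sketch of an integer-encoded vocabulary. It is written in plain Python purely for illustration; it is not code from this repository or from scicloj.ml.smile, and the names `build_vocabulary` and `encode` are hypothetical. Whether this actually saves memory depends on the runtime and on how the strings are stored, so it is worth measuring before committing to it.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct token a small integer id, in first-seen order."""
    token_to_id = {}
    for tokens in documents:
        for token in tokens:
            if token not in token_to_id:
                token_to_id[token] = len(token_to_id)
    return token_to_id


def encode(tokens, token_to_id):
    """Replace every token string with its integer id."""
    return [token_to_id[token] for token in tokens]


# Hypothetical, already-tokenized documents.
docs = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]

vocab = build_vocabulary(docs)
encoded = [encode(doc, vocab) for doc in docs]

# Term counts can then be keyed by int id instead of by string.
term_counts = [Counter(doc) for doc in encoded]
```

Each string is kept once, in the vocabulary map, and everything downstream (counts, document frequencies, tf-idf vectors) works on ids only.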

The current public API here looks good to me for potentially plugging into scicloj.ml.smile.

Point 2 is important, as I noticed big differences in downstream use cases of tf-idf (comparing documents, in my case).

I have maybe 4 out of 10 implemented.
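As a rough illustration of why the choice of weighting scheme in point 2 matters when comparing documents, the sketch below computes cosine similarity under two idf variants: the plain `log(N / df)` from the Wikipedia article, and the smoothed `ln((1 + N) / (1 + df)) + 1` that sklearn documents for `smooth_idf=True`. It is plain Python, not the implementation from either library; all function names are hypothetical, and sklearn's default L2 normalization of the vectors is left out.

```python
import math
from collections import Counter


def tf_raw(count, doc_len):
    """Raw count of the term in the document."""
    return float(count)


def tf_log(count, doc_len):
    """Log-normalized term frequency: log(1 + count)."""
    return math.log(1 + count)


def idf_plain(n_docs, df):
    """Plain inverse document frequency: log(N / df)."""
    return math.log(n_docs / df)


def idf_smooth(n_docs, df):
    """Smoothed idf as documented by sklearn: ln((1 + N) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + df)) + 1


def tfidf_vectors(docs, tf_fn, idf_fn):
    """Return one dense tf-idf vector per document, over sorted terms."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    terms = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # missing terms count as 0
        vectors.append(
            [tf_fn(counts[t], len(doc)) * idf_fn(n_docs, df[t]) for t in terms]
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, already-tokenized documents.
docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "bird", "flew"]]

plain = tfidf_vectors(docs, tf_raw, idf_plain)
smooth = tfidf_vectors(docs, tf_raw, idf_smooth)

sim_plain_01 = cosine(plain[0], plain[1])
sim_smooth_01 = cosine(smooth[0], smooth[1])
sim_plain_02 = cosine(plain[0], plain[2])
sim_smooth_02 = cosine(smooth[0], smooth[2])
```

With the plain idf, a term that occurs in every document ("the" here) gets weight zero, so documents 0 and 2 end up with no similarity at all. With the smoothed variant the same term still contributes, so the same pair gets a non-zero score. Swapping `tf_raw` for `tf_log` changes the numbers again.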

@simongray
Member

Hi @behrica, I've added an MIT licence, so you're welcome to use the code in any way that fits the licence. Basically, it just means that you need to maintain the copyright notice. :)
