A question about glint #54

Open
codlife opened this issue Sep 12, 2016 · 4 comments

Comments


codlife commented Sep 12, 2016

Hello Rolf!
I have had a look at your code. What puzzles me is how Glint interfaces with Spark; I don't see even a single line of code related to Spark.
Best wishes!
Codelife

@rjagerman
Owner

You're right that Glint is stand-alone and not necessarily interfaced with Spark. You could use it entirely without Spark. The documentation has a section that shows how Glint can easily be used within Spark: http://rjagerman.github.io/glint/gettingstarted/spark/

The main idea is that "BigVector" and "BigMatrix" objects are serializable and safe to be used within Spark closures. You can iterate over a dataset like you would in Spark but simultaneously use Glint to "pull" and "push" parts of a distributed model. The entire documentation is in need of an overhaul to make all this more clear.
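To make the pull/push pattern concrete, here is a minimal, self-contained Python sketch. It is not Glint's actual API (Glint is a Scala library). `ToyServer`, `ToyBigVector`, `process_partition`, the cyclic key partitioning, and the additive `push` are all hypothetical stand-ins chosen for illustration. A plain loop over lists takes the place of Spark running a closure on each partition.

```python
from typing import Dict, List, Sequence, Tuple


class ToyServer:
    """In-process stand-in for one parameter server holding a slice of the model."""

    def __init__(self) -> None:
        self.values: Dict[int, float] = {}


class ToyBigVector:
    """Hypothetical stand-in for a distributed vector; NOT Glint's real API."""

    def __init__(self, num_servers: int) -> None:
        self.servers = [ToyServer() for _ in range(num_servers)]

    def _server_for(self, key: int) -> ToyServer:
        # Assumption: keys are spread cyclically over the servers.
        return self.servers[key % len(self.servers)]

    def pull(self, keys: Sequence[int]) -> List[float]:
        """Fetch only the requested entries; unset entries read as 0.0."""
        return [self._server_for(k).values.get(k, 0.0) for k in keys]

    def push(self, keys: Sequence[int], deltas: Sequence[float]) -> None:
        """Add each delta to the stored entry (assumed additive update)."""
        for k, d in zip(keys, deltas):
            server = self._server_for(k)
            server.values[k] = server.values.get(k, 0.0) + d


def process_partition(partition: Sequence[Tuple[int, float]],
                      model: ToyBigVector) -> None:
    """What a closure might do for one partition: pull, compute, push."""
    for key, observed in partition:
        (current,) = model.pull([key])
        # Move the stored value halfway toward the observation.
        model.push([key], [0.5 * (observed - current)])


model = ToyBigVector(num_servers=2)
partitions = [[(0, 4.0), (3, 2.0)], [(0, 4.0), (5, -2.0)]]
for part in partitions:  # Spark would run these on executors in parallel
    process_partition(part, model)

print(model.pull([0, 3, 5]))  # → [3.0, 1.0, -1.0]
```

The point of the sketch is that each worker touches only the model entries its data needs, so the full model never has to fit on a single machine or be shipped with the closure.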

I am still debating whether to integrate Glint more closely with Spark. One advantage is that we could run Glint within the Spark runtime (I have a proof-of-concept of this ready). This means we would not have to run the parameter servers as separate Java processes. Anyone could just include Glint as a dependency and it would run automatically in their Spark cluster together with their code.

An example of Glint working together with Spark is GlintLDA, a state-of-the-art LDA algorithm that achieves Web-scale topic modeling beyond what was possible with MLlib.

Author

codlife commented Sep 12, 2016

Thank you! I will have a look at your docs. Thanks again!

Author

codlife commented Sep 12, 2016

Does your current implementation not support clusters? How do you store the BigMatrix if there are many servers? I think Glint could be a component of Spark, built on top of Spark core.


cstur4 commented Oct 15, 2016

We can set up parameter servers in a Spark application and use Glint as a component.
