
Question: Has anyone used snorkel for tabular numerical data? #803

Closed
matt256 opened this issue Sep 15, 2017 · 15 comments

@matt256

matt256 commented Sep 15, 2017

I have a very large sample of tabular data, mostly numerical fields, where each row is an example I would like to label. Looking through the documentation and examples, I don't see a way to use the tool in this manner, or at least to easily get the data into a usable format. Does anyone know if this can be or has been done? Any thoughts? The concept seems similar, but your target audience appears to be text-based labeling. Just wondering if it could be adapted. Thanks! Also, great work. I heard about the package on the O'Reilly Data Show.
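One way to adapt the Snorkel idea to rows of a CSV file is to write labeling functions as small heuristics over a row's fields. Each one either votes for a label or abstains, and the votes are then combined. The sketch below is plain Python, not Snorkel's API. The column names `temperature` and `pressure`, the thresholds, and the `NORMAL`/`ANOMALY` labels are hypothetical. Simple majority vote also stands in for Snorkel's learned generative model, which weights labeling functions by their estimated accuracies.

```python
import csv
import io
from collections import Counter

# Label constants; ABSTAIN means a labeling function declines to vote.
ABSTAIN, NORMAL, ANOMALY = -1, 0, 1

# Hypothetical labeling functions over one CSV row (a dict of column -> string).
# Column names and thresholds are illustrative assumptions, not real data.
def lf_high_temperature(row):
    return ANOMALY if float(row["temperature"]) > 100.0 else ABSTAIN

def lf_low_pressure(row):
    return ANOMALY if float(row["pressure"]) < 1.0 else ABSTAIN

def lf_nominal_range(row):
    t = float(row["temperature"])
    return NORMAL if 20.0 <= t <= 80.0 else ABSTAIN

LFS = [lf_high_temperature, lf_low_pressure, lf_nominal_range]

def apply_lfs(rows, lfs):
    """Build the label matrix: one row per example, one column per LF."""
    return [[lf(row) for lf in lfs] for row in rows]

def majority_vote(votes):
    """Combine non-abstaining votes; ties and all-abstain rows stay ABSTAIN."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

if __name__ == "__main__":
    data = io.StringIO(
        "temperature,pressure\n"
        "120.0,0.5\n"
        "50.0,2.0\n"
        "90.0,3.0\n"
    )
    rows = list(csv.DictReader(data))
    label_matrix = apply_lfs(rows, LFS)
    print(label_matrix)                              # [[1, 1, -1], [-1, -1, 0], [-1, -1, -1]]
    print([majority_vote(v) for v in label_matrix])  # [1, 0, -1]
```

Rows where every function abstains, or where the votes tie, are left as `ABSTAIN` rather than forced into a class.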

@ajratner
Contributor

Hi @matt256, thanks for listening and checking Snorkel out! Is this tabular data in a standardized, machine-readable format, or is it embedded in text, PDFs, etc.? If the latter, you can check out Fonduer (see this blog post), which will be merged into master as a module soon!

@ajratner
Contributor

You might also want to check out http://pages.cs.wisc.edu/~thodrek/; he does some work in the area of structured tabular data that might be of interest!

@matt256
Author

matt256 commented Sep 17, 2017

Thank you for the responses, @ajratner. It's actually already in a CSV/tabular format. I'm betting there are ways to make it work, though; there are a lot of possibilities here. The link to Fonduer was quite valuable. Thank you again.

@ajratner
Contributor

Great!

@chrismre

The Snorkel idea is leveraged in HoloClean (https://arxiv.org/abs/1702.00820). As Alex pointed out, @thodrek is going to release some open-source code too! My guess is that these techniques might be helpful for the type of structured data that you're describing.

@thodrek

thodrek commented Sep 18, 2017

Hi @matt256, please check out our blog post on HoloClean (http://dawn.cs.stanford.edu/2017/05/12/holoclean/). I believe the problem you are describing can be viewed as a data cleaning task. Think of labeling as trying to suggest a correct value for each cell in your data; HoloClean will do this for you. The weak-supervision rules here correspond to a set of integrity constraints over the data. We are actively refactoring the HoloClean code, and it will be released at the end of this month. I will keep you posted.
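To make "integrity constraints as weak supervision" concrete, here is a toy sketch using one functional dependency, `zip -> city`. Rows that share a `zip` should share a `city`, so any cell that disagrees with its group's majority value is flagged and that value is suggested as the repair. This is only an illustration of the idea, not HoloClean's actual inference, which is considerably more involved. The function name `suggest_repairs` and the column names are hypothetical.

```python
from collections import Counter, defaultdict

def suggest_repairs(rows, lhs, rhs):
    """Toy repair for a functional dependency lhs -> rhs.

    Groups rows by their lhs value. Within each group, any rhs cell that
    disagrees with the group's most common rhs value is flagged, and that
    majority value is suggested as the repair. Returns a list of
    (row_index, column, current_value, suggested_value) tuples.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[lhs]].append(i)
    repairs = []
    for indices in groups.values():
        counts = Counter(rows[i][rhs] for i in indices)
        majority, _ = counts.most_common(1)[0]
        for i in indices:
            if rows[i][rhs] != majority:
                repairs.append((i, rhs, rows[i][rhs], majority))
    return repairs

if __name__ == "__main__":
    rows = [
        {"zip": "60601", "city": "Chicago"},
        {"zip": "60601", "city": "Chicago"},
        {"zip": "60601", "city": "Chicago"},
        {"zip": "60601", "city": "Chicgo"},   # likely typo
        {"zip": "10001", "city": "New York"},
    ]
    print(suggest_repairs(rows, "zip", "city"))
    # [(3, 'city', 'Chicgo', 'Chicago')]
```

A single majority vote per constraint is brittle. It is used here only because it keeps the mapping from "constraint" to "suggested cell value" easy to see.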

@jim-bo

jim-bo commented Dec 12, 2017

I was wondering if your group is still working on the multi-modal problem you mentioned on your website? I'm looking to incorporate some tabular data with my unstructured text to aid in label generation and, eventually, in the discriminative model itself.

@matt256
Author

matt256 commented Dec 13, 2017

We ended up delaying that project a bit so we could see what the released HoloClean code looked like and learn from others' experiences. Your question was pretty timely, though, as we are about to get things rolling again. I noticed that it hadn't been released yet.

But after reading the blog post recommended above, I think thodrek's response was spot on for what we want to do.

@thodrek, do you all still intend to release a version? Looking forward to it, if so. Looks like you all have done some great work.

@thodrek

thodrek commented Dec 13, 2017

@matt256 @jim-bo Hey guys, the HoloClean release will happen very soon. We are done refactoring our initial code and are now cleaning it up; we expect the first release to happen in December. The repo is still in "private" mode, but once released the code will be hosted here: https://github.com/HoloClean

I will keep you up to date :)

@matt256
Author

matt256 commented Dec 13, 2017

Thank you so much, @thodrek. Looking forward to it.

ajratner added the Q&A label Dec 14, 2017
@ajratner
Contributor

Closing for now; this will be accessible via the "Q&A" link in the README. Feel free to re-open!

@thodrek

thodrek commented Apr 6, 2018

@matt256 @jim-bo I just wanted to let you know that HoloClean has been released. You can find more info here: http://www.holoclean.io. Please do not hesitate to ping me if you have any questions.

@asstergi

@matt256 Did you manage to use Snorkel, Fonduer or Holoclean for your purposes? I have a similar task and I'm looking for some guidance.

@thodrek

thodrek commented Jan 11, 2019

@asstergi @matt256 HoloClean can be applied to tabular numerical data. Please post an issue here https://github.com/HoloClean/holoclean and we will follow up there.

@asstergi

@thodrek I posted a new issue in HoloClean. Looking forward to your reply.
