Doubts loading data to shogun. #4147

Closed
spothound opened this issue Feb 3, 2018 · 3 comments

@spothound
Contributor

Hi there! I'm a student trying to start using Shogun on real-world problems...

I've started using the Shogun toolbox with Kaggle's Titanic problem and, to avoid getting stuck, I'm also following the Shogun introduction notebook... but I have some doubts about loading data the Shogun way.

This is an example of my dataset:

1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q

I started by trying to load it using

f = SparseRealFeatures()
trainlab = f.load_with_labels(data_file)
mat = f.get_full_feature_matrix()

as in the introduction to Shogun, since I thought it would be good to use a sparse matrix from the beginning, but when I ran it I got a segmentation fault (core dumped). I think that's because SparseRealFeatures doesn't accept non-numerical data, so I tried to use CStringFeatures instead, which resulted in this output:

"from shogun import CStringFeatures
ImportError: cannot import name 'CStringFeatures'"

I'm working in Python 3 and I've installed Shogun from source... and I don't know why I can't import this class... I'm a bit lost with the Shogun docs.

I've seen other examples in the showroom, but they use scipy.io's loadmat to load files... which seems to be the same as loading the dataset with, for example, a pandas DataFrame and then passing the resulting NumPy array to RealFeatures...
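To make the "load it in Python, then hand over a numeric array" route concrete, here is a minimal sketch using the six sample rows above. It uses only the standard library so that it stays self-contained; pandas would do the same job with less code. The column order is the usual Kaggle Titanic one, and the choices made here (one-hot encoding for the port, median fill for the missing age, dropping name, ticket and cabin, the helper name encode) are illustrative assumptions, not something Shogun prescribes.

```python
import csv
import io
import statistics

# Sample rows copied from the post. Assumed column order (Kaggle Titanic):
# PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket,
# Fare, Cabin, Embarked
RAW = '''1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
'''

# Hypothetical fixed port order used for the one-hot encoding.
EMBARKED_PORTS = ["C", "Q", "S"]


def encode(raw_text):
    """Turn the raw CSV text into numeric feature rows and +1/-1 labels."""
    # csv.reader copes with the quoted names that contain commas.
    rows = list(csv.reader(io.StringIO(raw_text)))

    # Fill missing ages with the median of the known ones (an assumption).
    known_ages = [float(r[5]) for r in rows if r[5] != ""]
    age_fill = statistics.median(known_ages)

    features, labels = [], []
    for r in rows:
        sex = 1.0 if r[4] == "female" else 0.0
        age = float(r[5]) if r[5] != "" else age_fill
        embarked = [1.0 if r[11] == port else 0.0 for port in EMBARKED_PORTS]
        # Pclass, Sex, Age, SibSp, Parch, Fare, then the one-hot port.
        features.append(
            [float(r[2]), sex, age, float(r[6]), float(r[7]), float(r[9])]
            + embarked
        )
        # Survived mapped to +1 / -1, as binary classifiers usually expect.
        labels.append(1.0 if r[1] == "1" else -1.0)
    return features, labels


features, labels = encode(RAW)
```

Here features has one row per passenger. As far as I understand, Shogun's dense features expect one column per sample, so the remaining step would be to convert to a NumPy float64 array and transpose it before passing it to RealFeatures; treat that as an assumption to check against the docs.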

The reason for opening this issue is to ask people used to working with Shogun how they load this kind of dataset with non-numerical features and work with it, whether anyone knows what's happening with CStringFeatures, and whether they think it really matters to use the CFile class to load datasets (at least when working in languages like Python, which offer lots of ways of doing the same thing), since Shogun seems to need only the final RealFeatures and Labels to work.

That's all! I'm asking here because there isn't a forum or anything like that; sorry if it's not correct to open issues for something like this!

@karlnapf
Member

karlnapf commented Feb 3, 2018

This is not a GitHub issue, but a topic for the mailing list. Please post it there.

@karlnapf karlnapf closed this as completed Feb 3, 2018
@karlnapf
Member

karlnapf commented Feb 3, 2018

Or StackOverflow

@spothound
Contributor Author

Sorry, I didn't know there was a mailing list :/ I should have found it... I'll post it there, thanks :)
