self.assertEqual(dataset.name, 're1.wc')
self.assertEqual(feature.name, 'CLASS_LABEL')
self.assertEqual(feature.data_type, 'nominal')
self.assertEqual(len(feature.nominal_values), 25)
Could you please add a check for the type of the output value?
Sorry for not being clear enough here. Could you please load X and y and check their type, dtype and shape?
As discussed, I'm refraining from changing this test for now. I have created an issue to handle such checks separately.
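For reference, the kind of check requested above (type, dtype and shape of X and y) might look roughly like the sketch below. It uses synthetic arrays rather than the real dataset, and the helper name `describe` is hypothetical and not part of openml-python; the actual types returned by the loader may differ, especially for sparse data.

```python
import numpy as np


def describe(arr):
    """Return the (type, dtype, shape) triple a test could assert on."""
    return type(arr), arr.dtype, arr.shape


# Synthetic stand-ins for the arrays a dataset loader would return
# (assumption: dense NumPy arrays; a sparse dataset may yield a sparse matrix).
X = np.zeros((4, 3), dtype=np.float32)
y = np.array([0, 1, 0, 2], dtype=np.int64)

x_info = describe(X)
y_info = describe(y)
```

In a unit test, each element of these triples would be compared against the values expected for the dataset under test.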
np.array(type_, dtype=np.float32)
# checks if the strings which should be the class labels
# can be encoded into integers
pd.factorize(type_)[0]
Didn't you mention in person that you need to assign the value of this function call?
Yes, but after further testing I realized that the assignment doesn't make sense here, since this loop iterates over the attributes. If anything needs to be checked, it should be the data, which does not seem to raise any issue.
I ran this chunk of code with both the sparse ARFF and the dense ARFF data formats; in both cases type_ has exactly the same type and structure. I don't know why the attribute list is checked for type when arff.ArffDecoder.decode() seems to return the target feature as a list of classes, nor why a sparse format would require numeric encoding of that attribute list.
Hence, I replaced the NumPy check with pandas categorical encoding.
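To illustrate the difference between the two checks, here is a minimal sketch. The label values are made up for the example and are not taken from dataset 395; the list is also wrapped in an object array before calling `pd.factorize`, which is an assumption made to avoid depending on how a given pandas version handles plain lists.

```python
import numpy as np
import pandas as pd

# Hypothetical nominal values for a class-label attribute.
type_ = ["acq", "earn", "acq"]

# Old check: casting label strings to float32 raises ValueError.
try:
    np.array(type_, dtype=np.float32)
    numpy_cast_ok = True
except ValueError:
    numpy_cast_ok = False

# New check: factorize maps each distinct label to an integer code,
# in order of first appearance.
codes, uniques = pd.factorize(np.array(type_, dtype=object))
```

With string labels the float cast fails, whereas `pd.factorize` succeeds and yields one integer code per label.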
Codecov Report
@@             Coverage Diff             @@
##           develop     #823      +/-   ##
===========================================
+ Coverage    88.05%   89.28%   +1.23%
===========================================
  Files           36       36
  Lines         4243     4768     +525
===========================================
+ Hits          3736     4257     +521
- Misses         507      511       +4
Reference Issue
Fixes #758.
What does this PR implement/fix? Explain your changes.
There was a try block that checked for a NumPy-based conversion. I replaced it with pandas categorical encoding.
How should this PR be tested?
import openml
openml.datasets.get_dataset(395)