Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finish tests for Field class #119

Merged
merged 18 commits into from
Oct 6, 2017
Merged

Conversation

nelson-liu
Copy link
Contributor

@nelson-liu nelson-liu commented Sep 13, 2017

This PR is a continuation of #47 and finishes testing the various functions for the field class (build_vocab and numericalize).

I also wanted to fix #78 and add a test for it, but I'm unsure how to properly do this. One way would be to simply convert the numerical features to python ints dictated by the input tensor_type member (e.g. int for LongTensor, float for FloatTensor, etc). This feels pretty brittle, does anyone else have any other suggestions?

@jekbradbury
Copy link
Contributor

I think that would be okay? I can't imagine any situation where someone has data in a string that looks like a float and is asking for a FloatTensor, but doesn't want the data converted from string to float. If it's already of the target datatype, this is a no-op anyway.

@nelson-liu
Copy link
Contributor Author

nelson-liu commented Sep 14, 2017 via email

@jekbradbury
Copy link
Contributor

If you write a custom Dataset that e.g. loads from HDF5, maybe?

"Please raise an issue at "
"https://github.com/pytorch/text/issues".format(self.tensor_type))
numericalization_func = self.tensor_types[self.tensor_type]
# It doesn't make sense to explictly coerce to a numeric type if

This comment was marked as off-topic.

@jekbradbury jekbradbury mentioned this pull request Sep 22, 2017
@jekbradbury jekbradbury merged commit fdfc1a6 into pytorch:master Oct 6, 2017
jekbradbury pushed a commit that referenced this pull request Oct 9, 2017
Also:
- Add a dataset for testing numeric features (float and int)
- Coerce non-sequential data with use_vocab=False to numeric types
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using a field representing real numbers with the iterator
2 participants