Finish tests for Field class #119

nelson-liu · 2017-09-13T06:57:00Z

This PR is a continuation of #47 and finishes the tests for the remaining Field class methods (build_vocab and numericalize).

I also wanted to fix #78 and add a test for it, but I'm unsure how to do this properly. One way would be to simply convert the numerical features to the Python numeric type dictated by the field's tensor_type member (e.g. int for LongTensor, float for FloatTensor, etc.). This feels pretty brittle; does anyone have other suggestions?

jekbradbury · 2017-09-14T19:10:48Z

I think that would be okay? I can't imagine any situation where someone has data in a string that looks like a float and is asking for a FloatTensor, but doesn't want the data converted from string to float. If it's already of the target datatype, this is a no-op anyway.
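The approach discussed here can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: the mapping `PYTHON_TYPE_FOR_TENSOR` and the helper `coerce_value` are hypothetical names, and plain strings stand in for the torch tensor classes so the sketch runs without torch installed.

```python
# Hypothetical mapping from tensor type names to Python numeric types.
# Plain strings stand in for torch tensor classes (assumption for this sketch).
PYTHON_TYPE_FOR_TENSOR = {
    "LongTensor": int,
    "IntTensor": int,
    "FloatTensor": float,
    "DoubleTensor": float,
}


def coerce_value(value, tensor_type_name):
    """Convert one raw feature to the Python type implied by the tensor type."""
    target = PYTHON_TYPE_FOR_TENSOR[tensor_type_name]
    # int("3") and float("0.5") parse strings; int(3) and float(0.5) return
    # equal values, so data already of the target type passes through unchanged.
    return target(value)


# Features read from a text file arrive as strings.
from_text = [coerce_value(v, "FloatTensor") for v in ["0.5", "2", "1e-3"]]

# Features from a loader that already yields floats are effectively untouched.
already_typed = [coerce_value(v, "FloatTensor") for v in [0.5, 2.0, 0.001]]

print(from_text == already_typed)
```

One caveat of this sketch: a string such as "3.5" passed with "LongTensor" raises a ValueError, because int() does not parse float-formatted strings.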

nelson-liu · 2017-09-14T19:27:44Z

> If it's already of the target datatype, this is a no-op anyway.

Is this even possible with the standard data flow? I feel like everything is converted to string.

jekbradbury · 2017-09-14T19:29:12Z

If you write a custom Dataset that e.g. loads from HDF5, maybe?

torchtext/data/field.py

+                    "Please raise an issue at "
+                    "https://github.com/pytorch/text/issues".format(self.tensor_type))
+            numericalization_func = self.tensor_types[self.tensor_type]
+            # It doesn't make sense to explictly coerce to a numeric type if
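The fragment above looks up a conversion function by tensor type and points users to the issue tracker when none is available. A minimal sketch of that pattern is shown here, with several assumptions: `SketchField` is a hypothetical stand-in rather than the real Field class, string keys replace the torch tensor classes, the choice of ValueError is assumed, and the first sentence of the error message is invented because the source only shows its last two lines.

```python
class SketchField:
    """Hypothetical stand-in for a non-sequential field with use_vocab=False."""

    # Assumed mapping; the real one in the PR is keyed by tensor classes.
    tensor_types = {"LongTensor": int, "FloatTensor": float}

    def __init__(self, tensor_type):
        self.tensor_type = tensor_type

    def numericalize(self, batch):
        if self.tensor_type not in self.tensor_types:
            # The first sentence of this message is an assumption.
            raise ValueError(
                "No numeric conversion known for tensor type {}. "
                "Please raise an issue at "
                "https://github.com/pytorch/text/issues".format(self.tensor_type))
        numericalization_func = self.tensor_types[self.tensor_type]
        # Coerce each raw feature (often a string) to the target numeric type.
        return [numericalization_func(x) for x in batch]


field = SketchField("LongTensor")
print(field.numericalize(["1", "2", "3"]))
```

Failing loudly on an unknown tensor type, instead of silently skipping the conversion, keeps unsupported configurations visible to whoever hits them.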


Also:
- Add a dataset for testing numeric features (float and int)
- Coerce non-sequential data with use_vocab=False to numeric types

nelson-liu added 9 commits September 12, 2017 23:15

Edit unicode input for field preprocess test

aa224c9

Clarify that preprocess ignores include_lengths if not sequential

92bc490

Fix Field preprocess test ground truth

5c6961d

Add test for Field.build_vocab

7b988f1

Enhance field.numericalize docstring and slightly clean up

a033fc2

Add test cases for field.numericalize with various args

f6d8168

Make verify_numericalized_example gold lengths optional

accc4bb

Add test for Field postprocessing

9ec8d18

Add test for Field.numericalize input validation

f283157

nelson-liu and others added 3 commits September 14, 2017 18:47

Merge branch 'master' into field_tests

d017622

Add a dataset for testing numeric features (float and int)

e05a0eb

Coerce non-sequential data with use_vocab=False to numeric types

0e74af5

nelson-liu commented Sep 15, 2017

nelson-liu and others added 4 commits September 14, 2017 19:32

Fix lint

7c715e5

Move verify_numericalized_example to torchtext_test_case

9024e0d

Fix lint

1a77c3a

Merge branch 'master' into field_tests

b635fcb

jekbradbury mentioned this pull request Sep 22, 2017

A Field of tensors #127

Closed

nelson-liu and others added 2 commits September 21, 2017 21:36

Merge branch 'master' into field_tests

d17e57d

Merge branch 'master' into field_tests

8995ef0

jekbradbury merged commit fdfc1a6 into pytorch:master Oct 6, 2017

jekbradbury pushed a commit that referenced this pull request Oct 9, 2017

Finish tests for Field class (#119)

2bc3948

Also:
- Add a dataset for testing numeric features (float and int)
- Coerce non-sequential data with use_vocab=False to numeric types
