Sequence Labeling Dataset #157

sivareddyg · 2017-10-21T02:01:01Z

This dataset class is very handy for working with sequences tagged with labels. Once this pull request is accepted, I will send a pull request adding a sequence labeling task to pytorch/examples.
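As an illustration of what "sequences tagged with labels" usually means on disk, here is a minimal sketch. It assumes a column-formatted file with one token per line, the token and its tag(s) separated by a tab, and a blank line between sentences. The exact format read by this PR's class is an assumption here, and `read_tagged_sentences` is a hypothetical helper, not torchtext's API.

```python
# Hypothetical helper (not torchtext's API): group column-formatted lines
# into sentences. Assumed format: one token per line, columns separated by
# `separator`, and a blank line marks the end of a sentence.
import io
from typing import Iterable, List, Tuple


def read_tagged_sentences(lines: Iterable[str],
                          separator: str = "\t") -> List[Tuple[List[str], ...]]:
    """Return one tuple of columns (e.g. words, tags) per sentence."""
    sentences: List[Tuple[List[str], ...]] = []
    columns: List[List[str]] = []
    for line in lines:
        line = line.strip()
        if not line:
            # Blank line: close the current sentence, if one is open.
            if columns:
                sentences.append(tuple(columns))
                columns = []
            continue
        fields = line.split(separator)
        if not columns:
            # First token of a sentence: create one list per column.
            columns = [[] for _ in fields]
        for column, value in zip(columns, fields):
            column.append(value)
    if columns:  # the file may not end with a blank line
        sentences.append(tuple(columns))
    return sentences


sample = io.StringIO("The\tDET\ndog\tNOUN\nbarks\tVERB\n\nHi\tINTJ\n")
for words, tags in read_tagged_sentences(sample):
    print(words, tags)
# ['The', 'dog', 'barks'] ['DET', 'NOUN', 'VERB']
# ['Hi'] ['INTJ']
```

Keeping the columns generic (rather than hard-coding "word" and "tag") lets the same reader handle files with extra annotation columns.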

TODO more documentation

jekbradbury · 2017-10-21T03:30:24Z

Thanks so much, this is really useful!

I believe you don't need to provide a custom splits method; the standard Dataset.splits should work fine here. Also, we're using root as the name for the directory like .data and path as the name for the full dataset path, so your keyword args should be updated to agree with that.
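A minimal sketch of the two points above, using toy stand-in classes rather than torchtext's actual code. It shows `root` as the top-level data directory (such as .data), `path` as one dataset's full directory derived from it, and a subclass that inherits a generic `splits` without overriding it. The `dirname` value and the filenames are hypothetical.

```python
# Toy stand-ins (not torchtext's actual classes) illustrating the
# root/path naming convention and an inherited `splits` classmethod.
import os
from typing import Optional, Tuple


class Dataset:
    """Base class with a generic `splits`."""

    dirname = ""

    def __init__(self, path: str):
        self.path = path

    @classmethod
    def splits(cls, path: Optional[str] = None, root: str = ".data",
               train: Optional[str] = None, validation: Optional[str] = None,
               test: Optional[str] = None) -> Tuple["Dataset", ...]:
        # `root` is the top-level data directory (e.g. ".data");
        # `path` is one dataset's full directory, derived from `root` if absent.
        if path is None:
            path = os.path.join(root, cls.dirname)
        # `cls` is whichever subclass called `splits`, so subclasses
        # get instances of their own type without overriding this method.
        return tuple(cls(os.path.join(path, name))
                     for name in (train, validation, test) if name is not None)


class SequenceLabelingDataset(Dataset):
    dirname = "sequence_labeling"  # hypothetical directory name
    # No custom splits(): the inherited classmethod already does the job.


train, test = SequenceLabelingDataset.splits(train="train.txt", test="test.txt")
print(type(train).__name__, train.path)
```

Because `splits` is a classmethod that constructs `cls(...)`, a subclass-specific copy of it would only duplicate code.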

sivareddyg · 2017-10-22T01:11:36Z

@jekbradbury thanks! I have incorporated your suggestion. The Travis CI tests pass.

jekbradbury · 2017-10-22T09:33:33Z

What I mean is that you can delete the entire splits method because the parent class already has exactly the same implementation.

sivareddyg · 2017-10-22T16:49:56Z

That makes sense. I removed it. Thanks!

sivareddyg · 2017-10-26T14:39:29Z

@jekbradbury, is there anything else you'd like me to change before this is merged? Feel free to suggest anything. The only difference from the other datasets is that I introduced a load_default_dataset function. I can get rid of it, but I thought it was more transparent than loading the default dataset through splits.

I plan to push the sequence labeling task to pytorch/examples and would appreciate it if this could be merged. Thanks!

jekbradbury · 2017-10-26T18:01:35Z

Sorry, one last thing: I think the clearest approach would be to make SequenceLabelingDataset (or maybe call it TaggingDataset) an abstract class that doesn't have a "default" dataset, and then add a subclass UniversalDependenciesPOS that provides URLs/dirname attributes and a splits method with the default filenames that uses super to call Dataset.splits, along the lines of what we do for TranslationDatasets. I can make these changes and merge tomorrow if you want.
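The suggested split into an abstract class and a concrete subclass might look roughly like this. It is a sketch with toy stand-in classes, not the code that was eventually merged. The abstract `SequenceTaggingDataset` has no default data, and `UniversalDependenciesPOS` only adds `urls`/`dirname` plus a `splits` with default filenames that defers to the parent via `super()`. The `dirname` value, the empty `urls` list, and the default filenames are all hypothetical.

```python
# Toy sketch (not the merged torchtext code) of an abstract tagging dataset
# plus a concrete subclass that only supplies defaults.
import os
from typing import Optional, Tuple


class Dataset:
    """Stand-in base class with a generic `splits`."""

    dirname: Optional[str] = None

    def __init__(self, path: str):
        self.path = path

    @classmethod
    def splits(cls, path: Optional[str] = None, root: str = ".data",
               train: Optional[str] = None, validation: Optional[str] = None,
               test: Optional[str] = None) -> Tuple["Dataset", ...]:
        if path is None:
            if cls.dirname is None:
                raise ValueError(f"{cls.__name__} has no default data; pass `path`")
            path = os.path.join(root, cls.dirname)
        return tuple(cls(os.path.join(path, name))
                     for name in (train, validation, test) if name is not None)


class SequenceTaggingDataset(Dataset):
    """Abstract: file-reading logic would live here, but there is no
    default dataset, so callers must pass `path` and filenames."""


class UniversalDependenciesPOS(SequenceTaggingDataset):
    """Concrete subclass that only adds defaults (all values hypothetical)."""

    urls = []  # hypothetical placeholder: real download URLs would go here
    dirname = "udpos"  # hypothetical directory name under `root`

    @classmethod
    def splits(cls, root: str = ".data", train: str = "train.txt",
               validation: str = "dev.txt", test: str = "test.txt",
               **kwargs) -> Tuple["Dataset", ...]:
        # Fill in default filenames, then defer to the generic implementation.
        return super().splits(root=root, train=train,
                              validation=validation, test=test, **kwargs)


train, val, test = UniversalDependenciesPOS.splits()
print(train.path, val.path, test.path)
```

With this layout, the default-dataset convenience that load_default_dataset provided moves into the concrete subclass's `splits`, and the abstract class stays reusable for any tagged corpus.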

sivareddyg · 2017-10-26T21:31:30Z

No worries, I can do this. Thanks!

sivareddyg · 2017-10-26T21:32:38Z

I would prefer SequenceTagging or SequenceLabeling instead of just Tagging. What do you suggest?

jekbradbury · 2017-11-10T21:27:27Z

I like SequenceTaggingDataset.

sivareddyg added 10 commits October 13, 2017 10:50

bug correction: fd_org should be in read mode in order to read lines

7caa9b5

Merge branch 'master' of github.com:pytorch/text

8203209

Merge branch 'master' of github.com:pytorch/text

c2f15e0

Sequence Labeling Dataset

f73e95e

Merge branch 'master' of github.com:pytorch/text

8df4df4

sequence labeling dataset documentation

00a790b

Merge branch 'master' of github.com:pytorch/text

f10e152

Sequence Labeling basic documentation

302700a

TODO more documentation

linter

c399ea2

Merge branch 'master' of github.com:sivareddyg/text

b0d1169

sivareddyg added 2 commits October 21, 2017 17:40

splits to match Dataset

2d77f68

default root value

692f4f8

redundant splits function removed

d41c25d

unused import

95219db

Merge branch 'master' into master

b6ae6f3

jekbradbury added 6 commits December 22, 2017 03:27

Update README.md

83f3532

Update sequence_labeling.py

0cbd41b

Rename sequence_labeling.py to sequence_tagging.py

2b521cd

Update __init__.py

2872259

separate into abstract and concrete classes

aa4d108

Rename sequence_labeling.py to sequence_tagging.py

b95e2f4

jekbradbury added 10 commits December 22, 2017 03:39

Update README.md

f8d9144

Merge branch 'master' into master

8f501b3

Update sequence_tagging.py

1110126

Update __init__.py

67f8196

lint

4f742e9

Update sequence_tagging.py

63c66fc

Update sequence_tagging.py

3483414

Update sequence_tagging.py

64d3e7b

Merge branch 'master' into master

6f36053

Update sequence_tagging.py

89fba94

jekbradbury merged commit 7a2e442 into pytorch:master Dec 23, 2017
