Merged
162 commits
aec2140
Put pytest as a requirement
kmkurn Jan 25, 2019
4e30dcb
Add flake8 and yapf config files
kmkurn Jan 25, 2019
1172de9
Make package installable
kmkurn Jan 25, 2019
29cf0d9
Implement Dataset class
kmkurn Jan 25, 2019
eb98947
Implement StreamDataset class
kmkurn Jan 25, 2019
94db6d6
Make an abstract class for datasets
kmkurn Jan 25, 2019
a0dd117
Install pytest-cov
kmkurn Jan 25, 2019
1f809f8
Add pytest config file
kmkurn Jan 25, 2019
d3f16d8
Make tests for shuffle stronger
kmkurn Jan 25, 2019
9fd4786
Ignore abstract class from coverage
kmkurn Jan 25, 2019
cea2965
Refactor a bit
kmkurn Jan 25, 2019
a0c355f
Make coverage 100%
kmkurn Jan 25, 2019
5dc2820
Fix import order
kmkurn Jan 25, 2019
6c9f557
Use Counter as finite stream
kmkurn Jan 25, 2019
77f8ea6
Add docstrings
kmkurn Jan 25, 2019
eabde78
Rename "minibatches" to "batches"
kmkurn Jan 25, 2019
26ef7ce
Make type hints a bit more specific
kmkurn Jan 25, 2019
b8be8b3
Make the simplest kind of dataset: iterable of ints
kmkurn Jan 26, 2019
f7c85f0
Implement Batches class
kmkurn Jan 26, 2019
07655df
Test if batch size is nonpositive
kmkurn Jan 26, 2019
5f417d5
Implement StreamBatches class fully
kmkurn Jan 26, 2019
c57ec2b
Refactor a bit
kmkurn Jan 26, 2019
14d4e9f
Add docstrings
kmkurn Jan 26, 2019
b4507c0
Create an abstract class for batches
kmkurn Jan 26, 2019
8fd68ca
Add property drop_last to batches
kmkurn Jan 26, 2019
47db7f5
Let batch() and batch_exactly() return Batches object
kmkurn Jan 26, 2019
110f829
Combine some tests
kmkurn Jan 26, 2019
9a8e7ed
Reorganize into modules
kmkurn Jan 26, 2019
ca588cc
Fix Counter not reset
kmkurn Jan 26, 2019
d6a224f
Add test to check StreamDataset can be iterated > 1x
kmkurn Jan 26, 2019
29fa8a4
Fix when dataset size is divisible by batch size
kmkurn Jan 26, 2019
c3b6f1d
Refactor some tests
kmkurn Jan 26, 2019
ca4b91a
Convert to numpy's ndarray instead of torch's tensor
kmkurn Jan 26, 2019
fb5fe78
Change name to text2array
kmkurn Jan 26, 2019
ec423b9
Make sure to report only the project's files
kmkurn Jan 26, 2019
cf57cfa
Also report coverage as HTML
kmkurn Jan 26, 2019
d369e4e
Simplify some tests
kmkurn Jan 26, 2019
2279ab3
Strengthen and refactor tests
kmkurn Jan 26, 2019
16d2195
Make all stream fixtures finite
kmkurn Jan 26, 2019
8609841
Shorten streams
kmkurn Jan 26, 2019
7694af1
Mention that samples must support what kinds of indexing
kmkurn Jan 26, 2019
e095086
Make Batches.to_array() return type more abstract
kmkurn Jan 26, 2019
0f883b1
Try to make samples/stream contains objects
kmkurn Jan 26, 2019
5428186
Try to make custom Batch class
kmkurn Jan 27, 2019
5ac1e51
Let DatasetABC.batch() return iterator of Batch
kmkurn Jan 27, 2019
f50422f
Remove Batches stuff and refactor
kmkurn Jan 27, 2019
d3fcf77
Add tests for when batch size evenly divides number of samples
kmkurn Jan 27, 2019
c4e255a
Properly ignore abstract methods from coverage
kmkurn Jan 27, 2019
adfc724
Measure branch coverage as well
kmkurn Jan 27, 2019
436886b
Change Sample class name to SampleABC for consistency
kmkurn Jan 27, 2019
65fefaf
Add docstring to Batch class
kmkurn Jan 27, 2019
77d3099
Implement Batch.to_array()
kmkurn Jan 27, 2019
bdce90a
Fix typo
kmkurn Jan 27, 2019
b82c825
Allow FieldValue to be float
kmkurn Jan 27, 2019
3717766
Initial attempt to let FieldValue be str
kmkurn Jan 27, 2019
2bdd1b1
Handle when there are more than one str fields
kmkurn Jan 27, 2019
afe221f
Add a todo
kmkurn Jan 27, 2019
1ef946f
Use clearer field names
kmkurn Jan 27, 2019
f83719b
Don't do vocab in to_array() and use Mapping instead of BatchArray
kmkurn Jan 27, 2019
595a9cf
Sample is just a mapping from field name to field value
kmkurn Jan 27, 2019
13970ac
Allow FieldValue to be a sequence of floats or ints
kmkurn Jan 27, 2019
ea42090
Allow custom padding value
kmkurn Jan 27, 2019
ca2ebeb
Add a todo
kmkurn Jan 27, 2019
752f9b8
Refactor tests for Batch.to_array() into a class
kmkurn Jan 27, 2019
01e6c55
Support recursive sequences for field values
kmkurn Jan 28, 2019
4692371
Explicitly setup sequences in test_batch.py
kmkurn Jan 28, 2019
bc0383a
Add todos
kmkurn Jan 28, 2019
6d911da
Let Dataset._shuffle_copy() use Dataset._shuffle_inplace()
kmkurn Jan 28, 2019
bf637c8
Implement Dataset.shuffle_by() method
kmkurn Jan 28, 2019
53bce6e
Remove unnecessary comments in config files
kmkurn Jan 28, 2019
43e5edb
Add a todo
kmkurn Jan 28, 2019
51ae4c0
Initial implementation of Vocab class
kmkurn Jan 28, 2019
1a44a31
Handle several fields need vocab
kmkurn Jan 28, 2019
1e56fe5
Refactor tests
kmkurn Jan 28, 2019
13d7534
Make Vocab a mapping
kmkurn Jan 28, 2019
50f7d13
Shorten variable names when the type is clear
kmkurn Jan 28, 2019
6da5138
Complete type hints in _StringStore class
kmkurn Jan 28, 2019
99fdf07
Handle sequence of str field values
kmkurn Jan 28, 2019
5f02c65
Handle sequence of sequence field values
kmkurn Jan 28, 2019
8206507
Improve coverage
kmkurn Jan 28, 2019
7a5215e
Relax so vocab can be built from just iterable of samples
kmkurn Jan 28, 2019
ca5e4b3
Let Vocab._get_values return iterator to conserve memory
kmkurn Jan 28, 2019
0e20f67
Fill some type hints
kmkurn Jan 28, 2019
1217173
Simplify some code
kmkurn Jan 28, 2019
9860fac
Handle when samples is empty
kmkurn Jan 28, 2019
0a0d3f3
Add todos
kmkurn Jan 28, 2019
31a8fda
Turns an assertion into runtime check
kmkurn Jan 28, 2019
d8ca752
Add todos
kmkurn Jan 28, 2019
bafff97
Write docstrings
kmkurn Jan 28, 2019
b2113f1
Customize min count
kmkurn Jan 28, 2019
f4a1a1d
Fix customizing min count for each field names
kmkurn Jan 28, 2019
eb40321
Add some todos
kmkurn Jan 28, 2019
5e9b0b2
Change the Vocab API; now it's a mapping to str-to-int mapping
kmkurn Jan 28, 2019
aff3a57
Allow customizing unknown token
kmkurn Jan 28, 2019
0f49d30
Allow customizing padding token
kmkurn Jan 28, 2019
81ea45b
Allow customizing max vocab size
kmkurn Jan 28, 2019
c0982ef
Make better docstrings for options
kmkurn Jan 28, 2019
e61b465
Fix a todo
kmkurn Jan 28, 2019
3ba1ea0
Implement Dataset.apply_vocab()
kmkurn Jan 29, 2019
6342fee
Handle when value is not in vocab
kmkurn Jan 29, 2019
885da68
Add a todo
kmkurn Jan 29, 2019
bf5c96d
Raise error when value is not in vocab
kmkurn Jan 29, 2019
315fd49
Say "key error" instead of "value not in vocab"
kmkurn Jan 29, 2019
d6ae658
Add a test with actual vocab object
kmkurn Jan 29, 2019
11ff9b8
Organize tests for Dataset.apply_vocab() in a class
kmkurn Jan 29, 2019
58ed68e
Refactor and make Dataset.apply_vocab() modify the dataset
kmkurn Jan 29, 2019
083ba6a
Do apply vocab inplace if possible
kmkurn Jan 29, 2019
085c884
Remove unnecessary Dataset._shuffle_copy() method
kmkurn Jan 29, 2019
aa667c2
Refactor a bit
kmkurn Jan 29, 2019
63c910e
Update docstring
kmkurn Jan 29, 2019
22493d8
Implement StreamDataset.apply_vocab()
kmkurn Jan 29, 2019
b171f59
Move _apply() classmethod to DatasetABC._app_vb_to_val()
kmkurn Jan 29, 2019
5d5454b
Refactor applying vocab to a sample
kmkurn Jan 29, 2019
ac7ad66
Refactor name a bit
kmkurn Jan 29, 2019
faddc98
Add a todo
kmkurn Jan 30, 2019
67e4e0f
Update a todo
kmkurn Jan 30, 2019
bb3dd88
Make Batch.get() private
kmkurn Jan 30, 2019
8557fe3
Assume all samples have the same fields
kmkurn Jan 30, 2019
e668913
Refactor a bit and add more test
kmkurn Jan 30, 2019
564721c
Delete unused import
kmkurn Jan 30, 2019
8500181
Add contains test for vocab
kmkurn Jan 30, 2019
74807b4
Add a todo
kmkurn Jan 30, 2019
4e6f0f4
Make Vocab.from_samples() accept iterator
kmkurn Jan 30, 2019
33fbe38
Add more tests and refactor a bit
kmkurn Jan 30, 2019
eb82df2
Add more todos
kmkurn Jan 30, 2019
e677c65
Refactor tests to only check what's necessary
kmkurn Jan 30, 2019
7d299e1
Add padding only for sequential fields
kmkurn Jan 30, 2019
7266d61
Add a todo
kmkurn Jan 30, 2019
abd54e8
Use typing classes for isinstance check
kmkurn Jan 30, 2019
6a3137d
Change to class variable
kmkurn Jan 30, 2019
1b7b5a2
Refactor dataset tests to use fixtures only when makes sense
kmkurn Jan 30, 2019
2978228
Refactor test_batch.py
kmkurn Jan 30, 2019
f8f6912
Use classes from typing instead of collections.abc
kmkurn Jan 30, 2019
29eee98
Refactor datasets.py a bit
kmkurn Jan 30, 2019
e290924
Handle when nesting depth is inconsistent
kmkurn Jan 31, 2019
da00bf6
Handle when some values are strings
kmkurn Jan 31, 2019
4b4f261
Refactor Batch.to_array()'s types and var names
kmkurn Jan 31, 2019
108e450
Add a todo
kmkurn Jan 31, 2019
e4cd29e
Add numpy as dependency
kmkurn Jan 31, 2019
7436afa
Write longer readme
kmkurn Jan 31, 2019
87cad0c
Finish tutorial stuff on readme
kmkurn Jan 31, 2019
e192126
Add str as FieldValue as well
kmkurn Jan 31, 2019
22f8b6e
Fix readme
kmkurn Jan 31, 2019
411110d
Complete readme
kmkurn Jan 31, 2019
250c938
Add license
kmkurn Jan 31, 2019
042eba7
Add Makefile
kmkurn Jan 31, 2019
91a650d
Add manifest file
kmkurn Jan 31, 2019
87e0086
Delete unnecessary manifest file
kmkurn Jan 31, 2019
b71ebd8
Fix .PHONY target
kmkurn Jan 31, 2019
70c9aab
Remove todo
kmkurn Jan 31, 2019
0ddb42e
Improve examples on readme and add nested fields for padding
kmkurn Feb 1, 2019
87c72f8
Fix unnamed language code block
kmkurn Feb 1, 2019
93d22e2
Add todos
kmkurn Feb 1, 2019
2a7aa07
Add spacemacs badge
kmkurn Feb 1, 2019
a99af40
Add .travis.yml
kmkurn Feb 1, 2019
72555d5
Add coveralls setup to travis
kmkurn Feb 1, 2019
51c8a13
Fix installing dependencies in travis
kmkurn Feb 1, 2019
031432a
Try to fix no coverage data found warning
kmkurn Feb 1, 2019
7bbb320
Try to fix KeyError when running coveralls
kmkurn Feb 1, 2019
8cc0f85
Change to coveralls
kmkurn Feb 1, 2019
b3c267c
Add travis and coveralls badges [skip ci]
kmkurn Feb 1, 2019
4396407
Remove todos [skip ci]
kmkurn Feb 1, 2019
11 changes: 11 additions & 0 deletions .coveragerc
@@ -0,0 +1,11 @@
[run]
branch = True

[report]
exclude_lines =
    # Re-enable standard pragma
    pragma: no cover
    # abstract method
    abstractmethod
    # debugging stuff
    __repr__
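
As a rough illustration of what the `exclude_lines` entries in this `.coveragerc` do: coverage.py treats each entry as a regular expression and leaves any matching source line out of the report. The sketch below only mimics that per-line matching with the standard `re` module; the helper name `is_excluded` and the sample lines are hypothetical, and the real tool additionally excludes the whole block that a matched line introduces.

```python
import re

# Patterns copied from the [report] exclude_lines setting in .coveragerc.
EXCLUDE_PATTERNS = ["pragma: no cover", "abstractmethod", "__repr__"]


def is_excluded(line: str) -> bool:
    """Return True if any exclusion pattern occurs anywhere in the line.

    Hypothetical helper: it approximates the per-line regex search only,
    not coverage.py's handling of whole blocks.
    """
    return any(re.search(pattern, line) for pattern in EXCLUDE_PATTERNS)


# Hypothetical source lines, used only to show which ones would match.
sample_lines = [
    "    @abstractmethod",
    "    def __repr__(self):",
    "    x = compute()  # pragma: no cover",
    "    return x",
]

for line in sample_lines:
    print(f"{is_excluded(line)!s:5} {line}")
```

Because the patterns are searched rather than anchored, a line such as `from abc import abstractmethod` would match as well.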
5 changes: 5 additions & 0 deletions .flake8
@@ -0,0 +1,5 @@
# Config file for flake8

[flake8]
ignore = E,W # let yapf handle stylistic issues
show-source = True
8 changes: 8 additions & 0 deletions .style.yapf
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Config file for YAPF Python formatter

[style]
based_on_style = pep8
coalesce_brackets = True
column_limit = 96
split_before_first_argument = True
split_complex_comprehension = True
9 changes: 9 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
language: python
python: 3.6
install:
- pip install -r requirements.txt
- pip install -e .
script: pytest
after_success:
- pip install coveralls
- coveralls
21 changes: 21 additions & 0 deletions LICENSE.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2019 Kemal Kurniawan

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
11 changes: 11 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
.PHONY: build upload test-upload

build:
	python setup.py bdist_wheel
	python setup.py sdist

upload: build
	twine upload --skip-existing dist/*

test-upload: build
	twine upload -r testpypi --skip-existing dist/*