Rewrite for spaCy v3 #173

Closed
wants to merge 279 commits into from
Commits (279)
1b41182
Fix flake errors
honnibal Apr 24, 2020
901dc7f
Clean up annotation_setters module
honnibal Apr 24, 2020
2836a80
Update
honnibal Apr 24, 2020
1561dfd
Format
honnibal Apr 24, 2020
1045511
Fix flake errors
honnibal Apr 24, 2020
0fca4fe
Add extension implementations
honnibal Apr 25, 2020
69d2a9b
Tidy and fix
honnibal Apr 25, 2020
e22ce04
Format
honnibal Apr 25, 2020
fa9fec5
Fix model wrapper
honnibal Apr 25, 2020
4da3daf
Fix component
honnibal Apr 25, 2020
f827ed7
Add imports
honnibal Apr 25, 2020
a9e62b2
Fix bug
honnibal Apr 25, 2020
05e4b8a
Work on dummy tokenizer
honnibal Apr 25, 2020
5f6fdf6
Add test util
honnibal Apr 26, 2020
35729d2
Move dummy test thing to tests module
honnibal Apr 26, 2020
54b0ab6
Fix dummy model
honnibal Apr 26, 2020
4816a20
Fix model wrapper
honnibal Apr 26, 2020
9682176
Add test for pipeline component
honnibal Apr 26, 2020
718973c
Fix test
honnibal Apr 26, 2020
3283791
Fix alignment
honnibal Apr 26, 2020
cb41214
Test set_annotations
honnibal Apr 26, 2020
0199a57
Fix TransformerOutput.empty method
honnibal Apr 26, 2020
7c59d91
Remove print statement
honnibal Apr 26, 2020
1e4654e
Try to set nO dimension
honnibal Apr 26, 2020
7a38302
Fix listener
honnibal Apr 26, 2020
3ee96f8
Add test for pipeline component
honnibal Apr 26, 2020
1091604
Move add_extensions to util.py
honnibal Apr 26, 2020
f623c90
Add layers
honnibal Apr 26, 2020
ac0a58a
Rethink extensions
honnibal Apr 26, 2020
437cdd6
Reorg
honnibal Apr 26, 2020
0270bbd
Add docs property to TransformerOutput
honnibal Apr 26, 2020
c53977c
Remove print statement
honnibal Apr 26, 2020
0ef4339
Reorg
honnibal Apr 26, 2020
93946d4
Reorg
honnibal Apr 26, 2020
43b5fc4
Reorg
honnibal Apr 26, 2020
15c2827
Reorg
honnibal Apr 26, 2020
b2dbaa1
Get tests passing after refactor
honnibal Apr 26, 2020
a500fd2
Starting to train
honnibal Apr 27, 2020
06b2afa
Update util
honnibal Apr 27, 2020
9b14684
Temporarily add train-from-config script
honnibal Apr 27, 2020
816ae2b
Fix training
honnibal Apr 27, 2020
b7ba70d
Dont pad by default
honnibal Apr 27, 2020
b762ccf
Add pos cfg
honnibal Apr 27, 2020
6363aed
Update cfg
honnibal Apr 27, 2020
6b3f90b
Update tok2vec
honnibal Apr 28, 2020
19c10ae
Update examples
honnibal Apr 28, 2020
67693b7
Refactor
honnibal Apr 28, 2020
64c3230
Refactor
honnibal Apr 28, 2020
e6a5db7
Add BatchEncoding
honnibal Apr 28, 2020
cf29807
Fix flake8 errors
honnibal Apr 28, 2020
82d9ae4
Format
honnibal Apr 28, 2020
1ba3808
Get test_pipeline_component passing
honnibal Apr 28, 2020
401df8f
Get test_model_wrapper passing
honnibal Apr 28, 2020
a62df8f
Fix tok2vec
honnibal Apr 28, 2020
ab52347
Fix alignment
honnibal Apr 28, 2020
4fd2fd3
Improve example script
honnibal Apr 28, 2020
c3d6f18
Fixes
honnibal Apr 28, 2020
d1d5bbc
Fix train-from-config script
honnibal Apr 28, 2020
e27aa7e
Add dep config
honnibal Apr 29, 2020
4653aca
Set nO for transformer
honnibal Apr 30, 2020
99ceed8
Update example configs
honnibal Apr 30, 2020
81d2292
Update example scripts
honnibal Apr 30, 2020
7a1ad3b
Fix lr rate in config
honnibal Apr 30, 2020
2926159
update srsly requirement
svlandeg May 4, 2020
fc10aa2
also update package requirements in setup.cfg
svlandeg May 4, 2020
dec669f
Hack in a function for strided spans
honnibal May 5, 2020
18be0a7
Update wrapper: average tokens instead of last
honnibal May 5, 2020
bdfea45
Update train_from_config script
honnibal May 5, 2020
09fd320
Fix tok2vec wrapper
honnibal May 5, 2020
00f1dbe
Merge branch 'feature/spacy-v3' of https://github.com/explosion/spacy…
honnibal May 5, 2020
9db8f9a
Update DummyTransformer test util
honnibal May 6, 2020
03b4e93
Fix width property of TransformerData
honnibal May 6, 2020
9fc6ba5
Add find_last_hidden util
honnibal May 6, 2020
6ab6a9d
Add linear layer in transformer tok2vec
honnibal May 6, 2020
32cc467
Add linear layer for transformers
honnibal May 6, 2020
de9aa66
Remove print statements
honnibal May 8, 2020
a45a398
Cleanup
honnibal May 8, 2020
ba44f41
Allow max_batch_size in transformer component
honnibal May 8, 2020
025b448
Fix alignment
honnibal May 8, 2020
be28ed4
Fix tok2vec gradient
honnibal May 8, 2020
eeb96b8
Clean up wrapper a bit
honnibal May 8, 2020
906f46a
Register span functions
honnibal May 8, 2020
023aea5
Update configs
honnibal May 8, 2020
cff9f5a
Clean up wrapper
honnibal May 8, 2020
506a7d6
Refactor alignment
honnibal May 10, 2020
ffe2c1e
Move types
honnibal May 10, 2020
7a529f2
Update requirements
honnibal May 10, 2020
0ef4a18
Update alignment
honnibal May 10, 2020
0806d1f
Use Ragged for alignment
honnibal May 10, 2020
6269f7f
Use configurable alignment pooling
honnibal May 10, 2020
1eae8d7
Update alignment
honnibal May 10, 2020
95282fd
Start testing alignment
honnibal May 10, 2020
e595810
Remove unused code
honnibal May 10, 2020
8eaafc9
WIP on refactor
honnibal May 10, 2020
46b78e6
WIP on new alignment
honnibal May 10, 2020
a7710da
Tmp
honnibal May 10, 2020
a5a416c
Configure mypy
honnibal May 10, 2020
8f9013e
Fix test util
honnibal May 10, 2020
3aa21e5
Update alignment
honnibal May 10, 2020
f7546f8
Hack types
honnibal May 10, 2020
164e8b5
Update tok2vec
honnibal May 10, 2020
333b7f4
Format
honnibal May 10, 2020
c92407c
Remove unused imports
honnibal May 10, 2020
85580c3
Work on updating tests
honnibal May 10, 2020
42bb18a
Work on updating tests
honnibal May 10, 2020
458e81f
Use thinc 8.0.0a6
honnibal May 11, 2020
199a257
Fix wrapper
honnibal May 11, 2020
9758951
Fix dummy tokenizer
honnibal May 11, 2020
1b250fd
Fix test
honnibal May 11, 2020
362e7eb
Update alignment test
honnibal May 11, 2020
51408da
Fix test
honnibal May 11, 2020
b2ad4a6
New alignment mostly working, apart from strided
honnibal May 11, 2020
cadac25
Mostly working -- need to fix overlapping spans
honnibal May 12, 2020
34d22e9
WIP on span alignment
honnibal May 12, 2020
6b233eb
Update alignment to use spans
honnibal May 12, 2020
a1d42ec
Update wrapper
honnibal May 12, 2020
3ffa762
fix config files
svlandeg May 13, 2020
c2a68ac
Fix alignment
honnibal May 13, 2020
50d7a28
Kludge type errors
honnibal May 13, 2020
c083d2f
Tweak alignment code
honnibal May 13, 2020
212e9c7
Fix span stride
honnibal May 14, 2020
21c54f1
Fix strided alignment
honnibal May 14, 2020
b5c99db
Fix strided alignment
honnibal May 14, 2020
6165495
Flake8
honnibal May 14, 2020
4dbec80
Format
honnibal May 14, 2020
76f481c
Fix doc stride
honnibal May 14, 2020
c6357f4
Merge remote-tracking branch 'upstream/feature/spacy-v3' into feature…
svlandeg May 14, 2020
3301ead
Document joint-core-bert config file
honnibal May 14, 2020
c10ef4e
set doc extension when creating pipeline component, force to True
svlandeg May 14, 2020
9765e41
testing a TransformerFromFile IO solution (WIP)
svlandeg May 14, 2020
d06eb21
shift IO responsibility to pipeline component
svlandeg May 14, 2020
0ed9d22
add newline
svlandeg May 14, 2020
e65d73f
tidying
svlandeg May 14, 2020
c08d6a8
Add comment on get_alignment
honnibal May 16, 2020
e0eefe5
Update configs for simple vs tb NER
honnibal May 17, 2020
7f7421f
Set accumulate_gradient = 2
honnibal May 17, 2020
f8afee3
Set accumulate_gradient = 2
honnibal May 17, 2020
1587637
Fallback to rule-based sentencizer if no SBD in get_sent_spans spanner
honnibal May 17, 2020
3534797
Update joint-core-bert config
honnibal May 17, 2020
ab044fa
Add config for joint-dep-pos-bert
honnibal May 17, 2020
590eb8d
Tmp
honnibal May 17, 2020
85b2a34
Remove distilbert config
honnibal May 17, 2020
fb7fc1b
Add nan check
honnibal May 18, 2020
cc5dce4
Update joint-core-bert config. Getting pretty great onto5 results
honnibal May 18, 2020
f0e5139
Update joint-dep-pos-bert config
honnibal May 18, 2020
23b7983
More explicitly clear gradient in case called multiple times
honnibal May 18, 2020
daf28b5
Update joint-core-bert.cfg
honnibal May 18, 2020
e04cc22
merge upstream spacy-v3 branch
svlandeg May 18, 2020
9e15dc4
Merge pull request #178 from svlandeg/feature/v3-io
honnibal May 18, 2020
5e8f275
Call cuda on transformer if necessary
honnibal May 18, 2020
b489fe1
Add todo
honnibal May 18, 2020
eec24c2
Merge branch 'feature/spacy-v3' of https://github.com/explosion/spacy…
honnibal May 18, 2020
26ad481
Document and test apply_alignment function
honnibal May 19, 2020
a9f76c6
Clean up unused stuff
honnibal May 19, 2020
74090c6
Add length assertion
honnibal May 19, 2020
fc5c22d
Fix alignment if tokens excluded from spans
honnibal May 19, 2020
446c3d6
Improve strided span
honnibal May 19, 2020
7e8cf0f
Add test for strided spans
honnibal May 19, 2020
48a2cc3
Pin spacy and thinc versions
honnibal May 19, 2020
209662d
Fix merge conflicts
honnibal May 19, 2020
f54eee1
Handle case where no tokens align
honnibal May 20, 2020
7d78a9e
Update backprop
honnibal May 20, 2020
562efd6
Merge branch 'feature/spacy-v3' of https://github.com/explosion/spacy…
honnibal May 20, 2020
9e2fde0
Update joint-core-bert example
honnibal May 20, 2020
b153992
Update readme
honnibal May 20, 2020
7d41650
Merge branch 'feature/spacy-v3' of https://github.com/explosion/spacy…
honnibal May 20, 2020
c20cc7d
Start refactor
honnibal May 21, 2020
f0c8bc4
Refactor
honnibal May 21, 2020
7c6445d
Update pipeline component
honnibal May 21, 2020
75c0455
Refactor
honnibal May 21, 2020
20be836
Try to untangle serialization
honnibal May 21, 2020
374d67f
Refactor
honnibal May 21, 2020
08003d5
Refactor layers
honnibal May 21, 2020
fe4f62a
Refactor
honnibal May 21, 2020
270f343
Update init
honnibal May 21, 2020
4b1b091
Refactor
honnibal May 21, 2020
1c19f1e
Refactor
honnibal May 21, 2020
ad3f0f7
Refactor
honnibal May 21, 2020
15fb75b
Refactor
honnibal May 21, 2020
225eb7d
Add arch
honnibal May 21, 2020
fdaff21
Refactor
honnibal May 21, 2020
c3154ee
Update
honnibal May 21, 2020
ae6cd8c
Fix flake8
honnibal May 21, 2020
61990bf
Fix flake8
honnibal May 21, 2020
c06478a
Format
honnibal May 21, 2020
76d6a0f
Refactor
honnibal May 21, 2020
fc3f21e
Clean up assert
honnibal May 21, 2020
7da9a12
Fix init
honnibal May 21, 2020
32dc792
Fix imports
honnibal May 21, 2020
4ea210a
Fix bugs
honnibal May 21, 2020
c5a0dce
Format
honnibal May 21, 2020
37b9fb8
Refactor
honnibal May 21, 2020
2e6a709
Update test
honnibal May 21, 2020
86e845b
Update
honnibal May 21, 2020
b9b52a2
Get tests passing
honnibal May 21, 2020
93578be
Register TransformerModel architecture
honnibal May 21, 2020
d03eaa1
Update config
honnibal May 21, 2020
23c11d6
Remove width arg
honnibal May 21, 2020
a570aab
Remove width
honnibal May 21, 2020
31bb778
Remove width arg
honnibal May 21, 2020
6aefe52
Remove old example tasks
honnibal May 21, 2020
dd6447f
Update component
honnibal May 21, 2020
a53f18a
Fix type annotation
honnibal May 21, 2020
87281e6
Remove old examples
honnibal May 21, 2020
697c565
Move configs
honnibal May 21, 2020
e48a08a
Trim redundant configs
honnibal May 21, 2020
48d99b2
Update readme
honnibal May 21, 2020
cf919b7
Rename config
honnibal May 21, 2020
94bdc05
Update albert config
honnibal May 21, 2020
cc8fbbb
Update spacy_transformers/tests/test_alignment.py
honnibal May 21, 2020
510b466
Merge branch 'feature/spacy-v3' of https://github.com/explosion/spacy…
honnibal May 21, 2020
3126e37
Fix span_getters
honnibal May 21, 2020
5cd4632
Fix catalogue
honnibal May 21, 2020
01b7a0e
Improve shape inference
honnibal May 21, 2020
6922dfa
Fix max batch size
honnibal May 21, 2020
83fc2c9
Update train-from-config
honnibal May 21, 2020
3d36e4a
Update bert config
honnibal May 21, 2020
f67b4e1
Update config
honnibal May 21, 2020
dc5d9dd
Update spacy_transformers/pipeline_component.py
honnibal May 21, 2020
3d3d790
Try to register spaCy entry-points
honnibal May 22, 2020
c507f82
Simplify get_token_positions
honnibal May 22, 2020
8919620
Preserve doc structure in span getters
honnibal May 22, 2020
1bf37e8
Fix entry points
honnibal May 22, 2020
99ae851
Fix import
honnibal May 22, 2020
4bfe673
Increment version
honnibal May 22, 2020
69c7289
Specify use_pytorch_for_gpu_memory
honnibal May 22, 2020
ae4269d
Align via offset mapping if possible
honnibal May 23, 2020
aed918d
Align via offset mapping if possible
honnibal May 23, 2020
f8ad5ad
Update config
honnibal May 23, 2020
0a723e9
Update config
honnibal May 24, 2020
f2e3e2d
Merge master into v3 branch (#185)
svlandeg May 26, 2020
64796f1
Merge branch 'master' into feature/spacy-v3
svlandeg May 26, 2020
f57f246
Delete language.py
svlandeg May 26, 2020
1af2a00
fix align indices in split_by_doc
svlandeg May 27, 2020
b03e542
remove duplicate pytokenizations requirement
svlandeg May 27, 2020
e9fecdb
width argument in Listener was removed
svlandeg May 27, 2020
72128f0
small edits in the readme documentation
svlandeg May 27, 2020
64ab863
typing fixes
svlandeg May 27, 2020
eefcd97
remove unused imports
svlandeg May 27, 2020
e80dd87
update Azure pipelines
svlandeg May 27, 2020
0f28289
Update Azure pipelines
svlandeg May 27, 2020
cf73466
update Azure pipelines
svlandeg May 27, 2020
6a2ac81
Fix requirement
honnibal May 27, 2020
c83c831
Fix pipeline
honnibal May 27, 2020
2f7d3ca
Fix CI
honnibal May 27, 2020
6e0a2f8
Fix CI
honnibal May 27, 2020
9a38d52
Fix test
honnibal May 27, 2020
7e6cd50
Format
honnibal May 27, 2020
7e91931
Merge pull request #188 from svlandeg/fixes/varia
honnibal May 27, 2020
2c61bfe
Merge pull request #186 from svlandeg/bugfix/split-by-doc
honnibal May 27, 2020
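
Several commits above ("Hack in a function for strided spans", "Fix span stride", "Fix strided alignment", "Add test for strided spans") deal with splitting long documents into overlapping windows. The usual motivation is to keep each window within the transformer's maximum input length. The sketch below only illustrates the idea and is not the code from this PR. It assumes hypothetical helpers `strided_windows(n_tokens, window, stride)` and `strided_spans(docs, window, stride)` that work on plain token lists instead of spaCy `Doc` objects. It also returns one inner list per document, in the spirit of the "Preserve doc structure in span getters" commit.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def strided_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (start, end) token offsets of overlapping windows.

    Consecutive windows start `stride` tokens apart, so they overlap by
    `window - stride` tokens. The last window is truncated at `n_tokens`.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride > window:
        # Windows would skip the tokens between them, leaving them uncovered.
        raise ValueError("stride must not exceed window")
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # this window reaches the end of the doc
        start += stride
    return spans


def strided_spans(docs: Sequence[Sequence[T]], window: int, stride: int) -> List[List[Sequence[T]]]:
    """Hypothetical span getter: one list of windows per doc, keeping doc boundaries."""
    return [
        [doc[start:end] for start, end in strided_windows(len(doc), window, stride)]
        for doc in docs
    ]


if __name__ == "__main__":
    tokens = "a b c d e f g h i j".split()
    print(strided_windows(len(tokens), window=4, stride=3))
    # [(0, 4), (3, 7), (6, 10)]
    print(strided_spans([tokens], window=4, stride=3))
```

With `window=4` and `stride=3`, adjacent windows share one token. Any token covered by more than one window therefore needs its outputs combined during alignment, and averaging is one option. Requiring `stride <= window` is a choice made for this sketch so that no token falls between windows.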

697 changes: 131 additions & 566 deletions README.md

Large diffs are not rendered by default.

3 changes: 1 addition & 2 deletions azure-pipelines.yml
```diff
@@ -34,7 +34,6 @@ jobs:
   - script: |
       python -m pip install --upgrade pip wheel
       pip install -r requirements.txt
-      python -m spacy download en
     displayName: 'Install dependencies'
 
   - script: python setup.py sdist
@@ -43,5 +42,5 @@ jobs:
   - script: pip install dist/*.tar.gz
     displayName: 'Install from sdist'
 
-  - script: python -m pytest tests --cov=spacy_transformers
+  - script: python -m pytest spacy_transformers --cov=spacy_transformers
     displayName: 'Run tests'
```