
Re-added Universe readme #3688

Merged
merged 1 commit into from May 6, 2019

Conversation

BramVanroy
Contributor

@BramVanroy BramVanroy commented May 6, 2019

Re-added Universe readme. This fixes a broken link on the website, specifically at the bottom of the Universe page. This closes issue #3680.

Types of change

Documentation.

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@ines
Member

ines commented May 6, 2019

Thanks so much! 🙏

@ines ines added the docs Documentation and website label May 6, 2019
@ines ines merged commit 8e6f8de into explosion:master May 6, 2019
ines pushed a commit that referenced this pull request May 6, 2019
kiku-jw added a commit to kiku-jw/spaCy that referenced this pull request Jun 18, 2019
* Add failing test for explosion#3356

* Fix test that caused pytest to choke on Python3

* adding kb_id as field to token, el as nlp pipeline component

* annotate kb_id through ents in doc

* kb snippet, draft by Matt (wip)

* documented some comments and todos

* hash the entity name

* add pyx and separate method to add aliases

* fix compile errors

* adding aliases per entity in the KB

* very minimal KB functionality working

* adding and retrieving aliases

* get candidates by alias

* bugfix adding aliases

* use StringStore

* raising error when adding alias for unknown entity + unit test

* avoid value 0 in preshmap and helpful user warnings

* check and unit test in case prior probs exceed 1

* correct size, not counting dummy elements in the vector

* check the length of entities and probabilities vector + unit test

* create candidate object from entry pointer (not fully functional yet)

* store entity hash instead of pointer

* unit test on number of candidates generated

* property getters and keep track of KB internally

* Entity class

* ensure no candidates are returned for unknown aliases

* minimal EL pipe

* name per entity

* select candidate with highest prior probability

* use nlp's vocab for stringstore

* error msg and unit tests for setting kb_id on span

* delete sandbox folder

* Update v2-1.md

* Fix xfail marker

* Update wasabi pin

* Fix tokenizer on Python2.7 (explosion#3460)

spaCy v2.1 switched to the built-in re module, where v2.0 had been using
the third-party regex library. When the tokenizer was deserialized on
Python2.7, the `re.compile()` function was called with expressions that
featured escaped unicode codepoints that were not in Python2.7's unicode
database.

Problems occurred when we had a range between two of these unknown
codepoints, like this:

```
    '[\\uAA77-\\uAA79]'
```

On Python2.7, the unknown codepoints are not unescaped correctly,
resulting in arbitrary out-of-range characters being matched by the
expression.

This problem does not occur if we instead have a range between two
unicode literals, rather than the escape sequences. To fix the bug, we
therefore add a new compat function that unescapes unicode sequences
using the `ast.literal_eval()` function. Care is taken to ensure we
do not also escape non-unicode sequences.

Closes explosion#3356.

- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
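
A minimal sketch of the unescaping approach described above, for illustration only; the helper name and the exact escaping rules are assumptions rather than the merged compat function:

```python
import ast


def unescape_unicode(string):
    # Illustrative helper: turn literal "\uXXXX" escape sequences into real
    # unicode characters, leaving other backslash escapes untouched.
    if string is None or "\\u" not in string:
        return string
    # Protect every backslash, then re-expose only the unicode escapes, so
    # sequences like "\d" are not interpreted by literal_eval.
    string = string.replace("\\", "\\\\").replace("\\\\u", "\\u")
    # literal_eval only evaluates a string literal; it cannot run code.
    return ast.literal_eval("u'''" + string + "'''")


print(unescape_unicode("[\\uAA77-\\uAA79]"))  # a range over real unicode characters
```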

* Set version to 2.1.2

* 'entity_linker' instead of 'el'

* specify unicode strings for python 2.7

* Merge branch 'spacy.io' [ci skip]

* Add missing space in landing page (explosion#3462) [ci skip]

* Fix train loop for train_textcat example

* property annotations for fields with only a getter

* adding future import unicode literals to .py files

* error and warning messages

* Update Binder [ci skip]

* Fix typo [ci skip]

* Update landing example [ci skip]

* Improve landing example [ci skip]

* Add xfailing test for explosion#3468

* Slightly modify test for explosion#3468

Check for Token.is_sent_start first (which is serialized/deserialized correctly)

* Fix test for explosion#3468

* Add xfail test for explosion#3433. Improve test for add label.

* 💫 Fix class mismap on parser deserializing (closes explosion#3433) (explosion#3470)

v2.1 introduced a regression when deserializing the parser after
parser.add_label() had been called. The code around the class mapping is
pretty confusing currently, as it was written to accommodate backwards
model compatibility. It needs to be revised when the models are next
retrained.

Closes explosion#3433

* 💫 Add better and serializable sentencizer (explosion#3471)

* Add better serializable sentencizer component

* Replace default factory

* Add tests

* Tidy up

* Pass test

* Update docs

* Add cheat sheet to spaCy 101

* Add blog post to v2.1 page

* Bug fixes and options for TextCategorizer (explosion#3472)

* Fix code for bag-of-words feature extraction

The _ml.py module had a redundant copy of a function to extract unigram
bag-of-words features, except one had a bug that set values to 0.
Another function allowed extraction of bigram features. Replace all three
with a new function that supports arbitrary ngram sizes and also allows
control of which attribute is used (e.g. ORTH, LOWER, etc).

* Support 'bow' architecture for TextCategorizer

This allows efficient ngram bag-of-words models, which are better when
the classifier needs to run quickly, especially when the texts are long.
Pass architecture="bow" to use it. The extra arguments ngram_size and
attr are also available, e.g. ngram_size=2 means unigram and bigram
features will be extracted.
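
A hedged usage sketch of the option described above, passing architecture="bow" plus the ngram_size and attr settings through the pipe config; the exact config keys accepted may differ from this sketch:

```python
import spacy

nlp = spacy.blank("en")
# Create a TextCategorizer that uses the ngram bag-of-words model: unigrams
# and bigrams over the lowercased form of each token.
textcat = nlp.create_pipe(
    "textcat",
    config={"architecture": "bow", "ngram_size": 2, "attr": "LOWER"},
)
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")
nlp.add_pipe(textcat)
```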

* Fix size limits in train_textcat example

* Explain architectures better in docs

* Fix formatting [ci skip]

* Merge branch 'spacy.io' [ci skip]

* Small tweak to ensemble textcat model

* Set version to v2.1.3

* Update binderVersion

* Update favicon (closes explosion#3475) [ci skip]

* Update Thai tag map (explosion#3480)

* Update Thai tag map

Update Thai tag map

* Create wannaphongcom.md

* Add Estonian to docs [ci skip] (closes explosion#3482)

* entity as one field instead of both ID and name

* Fix GPU training for textcat. Closes explosion#3473

* Fix social image

* DOC: Update tokenizer docs to include default value for batch_size in pipe (explosion#3492)

* Fix/irreg adverbs extension (explosion#3499)

* extended list of irreg adverbs

* added test to exceptions

* fixed typo

* fix(util): fix decaying function output (explosion#3495)

* fix(util): fix decaying function output

* fix(util): better test and adhere to code standards

* fix(util): correct variable name, pytestify test, update website text

* adds textpipe to universe (explosion#3500) [ci skip]

* Adds textpipe to universe

* signed contributor agreement

* Adjust formatting, code style and use "standalone" category

* Fix meta description in universe projects [ci skip]

* Tags are joined with a comma and padded with asterisks (explosion#3491)

## Description

Fix a bug in the test of JapaneseTokenizer.
This PR may require @polm's review.

### Types of change

Bug fix

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Add spaCy IRL to landing [ci skip]

* Update landing.js

* added tag_map for indonesian

* changed tag map from .py to .txt to see if tests pass

* added symbols import

* added utf8 encoding flag

* added missing SCONJ symbol

* Auto-format

* Remove unused imports

* Make tag map available in Indonesian defaults

* Auto-format

* added tag_map for indonesian (explosion#3515)

* added tag_map for indonesian

* changed tag map from .py to .txt to see if tests pass

* added symbols import

* added utf8 encoding flag

* added missing SCONJ symbol

* Auto-format

* Remove unused imports

* Make tag map available in Indonesian defaults

* Update compatibility [ci skip]

* failing test for Issue explosion#3449

* failing test for Issue explosion#3521

* fixing Issue explosion#3521 by adding all hyphen variants for each stopword

* unicode string for python 2.7

* specify encoding in files

* Update links and http -> https (explosion#3532)

* update links and http -> https

* SCA

* Update Thai tokenizer_exception list (explosion#3529)

* add tokenizer_exceptions word (ก-น) from https://goo.gl/JpJ2qq

* update tokenizer_exceptions word list

* add contributor file

* Remove non-existent example (closes explosion#3533)

* Don't make "settings" or "title" required in displaCy data (closes explosion#3531)

* addressed all comments by Ines

* Improved Dutch language resources and Dutch lemmatization (explosion#3409)

* Improved Dutch language resources and Dutch lemmatization

* Fix conftest

* Update punctuation.py

* Auto-format

* Format and fix tests

* Remove unused test file

* Re-add deleted test

* removed redundant infix regex pattern for ','; note: brackets + simple hyphen remains

* Cleaner lemmatization files

* updated tag map with missing tags

* fixed tag_map.py merge conflict

* fix typos in tag_map flagged by `python -m debug-data` (explosion#3542)

## Checklist
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.


Co-authored-by: Ines Montani <ines@ines.io>

* Update Thai stop words (explosion#3545)

* test sPacy commit to git fri 04052019 10:54

* change Data format from my format to master format

* ทัทั้งนี้ ---> ทั้งนี้

* delete stop_word translate from Eng

* Adjust formatting and readability

* Added Ludwig among the projects (explosion#3548) [ci skip]

* Added Ludwig among the projects

* Create w4nderlust.md

* Add Uber to logo wall

* Removes duplicate in table (explosion#3550)

* Removes duplicate in table

Just fixing typos.

* Remove newline


Co-authored-by: Ines Montani <ines@ines.io>

* Auto-format

* Make sure path is string (resolves explosion#3546)

* Add xfailing test for explosion#3555

* Fix typo in web docs cli.md (explosion#3559)

* Tidy up and auto-format

* Ensure match pattern error isn't raised on empty errors (closes explosion#3549)

* Fix website docs for Vectors.from_glove (explosion#3565)

* Fix website docs for Vectors.from_glove

* Add myself as a contributor

* Added project gracyql to Universe (explosion#3570) (resolves explosion#3568)

As discussed with Ines in explosion#3568, this adds a new project proposal for the community to the spaCy Universe website.

GracyQL is a tiny GraphQL wrapper around spaCy using Graphene and Starlette.

## Description
Change only in universe.json file to add a new project

### Types of change
New project reference in Universe

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Add myself to contributors (explosion#3575)

* Signed agreement (explosion#3577)

* Added Turkish Lira symbol(₺) (explosion#3576)

Added Turkish Lira symbol (₺)
https://en.wikipedia.org/wiki/Turkish_lira

* Change default output format from `jsonl` to `json` for cli convert (explosion#3583) (closes explosion#3523)

* Changing default output format from jsonl to json for cli convert

* Adding Contributor Agreement

* Remove Datacamp

* Fix formatting

* Improved training and evaluation (explosion#3538)

* Add early stopping

* Add return_score option to evaluate

* Fix missing str to path conversion

* Fix import + old python compatibility

* Fix bad beam_width setting during cpu evaluation in spacy train with gpu option turned on

* Fix symlink creation to show error message on failure (explosion#3589) (resolves explosion#3307)

* Fix symlink creation to show error message on failure. Update tests to reflect those changes.

* Fix test to succeed on non-Windows systems.

* Fix issue explosion#3551: Upper case lemmas

If the Morphology class tries to lemmatize a word that's not in the
string store, it's forced to just return it as-is. While loading
exceptions, the class could hit a case where these strings weren't in
the string store yet. The resulting lemmas could then be cached, leading
to some words receiving upper-case lemmas. Closes explosion#3551.

* Set version to v2.1.4.dev0

* Create fizban99.md (explosion#3601)

* entity types for colors should be in uppercase (explosion#3599)

Although the text indicates the entity types should be in lowercase, the sample code shows uppercase, which is the correct format.
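
For illustration, a hedged sketch of uppercase entity type keys in displaCy's colors option (the text and color values here are arbitrary):

```python
from spacy import displacy

# Entity type keys in the "colors" option are uppercase, matching the
# corrected sample code referenced above.
example = {
    "text": "Apple is looking at buying a U.K. startup.",
    "ents": [{"start": 0, "end": 5, "label": "ORG"}],
}
html = displacy.render([example], style="ent", manual=True, jupyter=False,
                       options={"colors": {"ORG": "#ffd700"}})
```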

* Create Dobita21.md (explosion#3614)

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Update landing and feature overview

* Remove unused image

* Add course to 101

* Update link [ci skip]

* Add Thai norm_exceptions (explosion#3612)

* test sPacy commit to git fri 04052019 10:54

* change Data format from my format to master format

* ทัทั้งนี้ ---> ทั้งนี้

* delete stop_word translate from Eng

* Adjust formatting and readability

* add Thai norm_exception

* Add Dobita21 SCA

* editรึ : หรือ,

* Update Dobita21.md

* Auto-format

* Integrate norms into language defaults

* Add save after `--save-every` batches for `spacy pretrain` (explosion#3510)

When using `spacy pretrain`, the model is saved only after every epoch. But each epoch can be very big since `pretrain` is used for language modeling tasks. So I added a `--save-every` option in the CLI to save after every `--save-every` batches.

## Description

To test...

Save this file to `sample_sents.jsonl`

```
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
```

Then run `--save-every 2` when pretraining.

```bash
spacy pretrain sample_sents.jsonl en_core_web_md here -nw 1 -bs 1 -i 10 --save-every 2
```

And it should save the model to the `here/` folder after every 2 batches. The models that are saved during an epoch will have a `.temp` appended to the save name.

At the end the training, you should see these files (`ls here/`):

```bash
config.json     model2.bin      model5.bin      model8.bin
log.jsonl       model2.temp.bin model5.temp.bin model8.temp.bin
model0.bin      model3.bin      model6.bin      model9.bin
model0.temp.bin model3.temp.bin model6.temp.bin model9.temp.bin
model1.bin      model4.bin      model7.bin
model1.temp.bin model4.temp.bin model7.temp.bin
```

### Types of change

This is a new feature to `spacy pretrain`.

🌵 **Unfortunately, I haven't been able to test this because compiling from source is not working (cythonize error).** 

```
Processing matcher.pyx
[Errno 2] No such file or directory: '/Users/mwu/github/spaCy/spacy/matcher.pyx'
Traceback (most recent call last):
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 169, in <module>
    run(args.root)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 158, in run
    process(base, filename, db)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 124, in process
    preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp")
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 87, in preserve_cwd
    func(*args)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 63, in process_pyx
    raise Exception("Cython failed")
Exception: Cython failed
Traceback (most recent call last):
  File "setup.py", line 276, in <module>
    setup_package()
  File "setup.py", line 209, in setup_package
    generate_cython(root, "spacy")
  File "setup.py", line 132, in generate_cython
    raise RuntimeError("Running cythonize failed")
RuntimeError: Running cythonize failed
```

Edit: Fixed! after deleting all `.cpp` files: `find spacy -name "*.cpp" | xargs rm`

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

* Allow jupyter=False to override Jupyter mode (closes explosion#3598)

* Make flag shortcut consistent and document

* Update spacy evaluate example

* Auto-format

* Rename early_stopping_iter to n_early_stopping

* Document early stopping

* update norm_exceptions (explosion#3627)

* test sPacy commit to git fri 04052019 10:54

* change Data format from my format to master format

* ทัทั้งนี้ ---> ทั้งนี้

* delete stop_word translate from Eng

* Adjust formatting and readability

* add Thai norm_exception

* Add Dobita21 SCA

* editรึ : หรือ,

* Update Dobita21.md

* Auto-format

* Integrate norms into language defaults

* add acronym and some norm exception words

* Update seo.js

* Update Universe Website for pyInflect (explosion#3641)

* Improve redundant variable name (explosion#3643)

* Improve redundant variable name

* Apply suggestions from code review

Co-Authored-By: pickfire <pickfire@riseup.net>

* Doc changes for local website setup (explosion#3651)

* Create yaph.md so I can contribute (explosion#3658)

* Fix broken link to Dive Into Python 3 website (explosion#3656)

* Fix broken link to Dive Into Python 3 website

* Sign spaCy Contributor Agreement

* Remove dangling M (explosion#3657)

I assume this is a typo. Sorry if it has a meaning that I'm not aware of.

* Update French example sents and add two German stop words (explosion#3662)

* Update french example sentences

* Add 'anderem' and 'ihren' to German stop words

* update response after calling add_pipe (explosion#3661)

* update response after calling add_pipe

The print_info component is appended last, so it needs to be shown at the end of the pipeline.

* Create henry860916.md

* Add Thai lex_attrs (explosion#3655)

* test sPacy commit to git fri 04052019 10:54

* change Data format from my format to master format

* ทัทั้งนี้ ---> ทั้งนี้

* delete stop_word translate from Eng

* Adjust formatting and readability

* add Thai norm_exception

* Add Dobita21 SCA

* editรึ : หรือ,

* Update Dobita21.md

* Auto-format

* Integrate norms into language defaults

* add acronym and some norm exception words

* add lex_attrs

* Add lexical attribute getters into the language defaults

* fix LEX_ATTRS


Co-authored-by: Donut <dobita21@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>

* Update universe.json (explosion#3653) [ci skip]

* Update universe.json

* Update universe.json

* Relax jsonschema pin (closes explosion#3628)

* Adjust wording and formatting [ci skip]

* Fix inconsistent lemmatizer issue explosion#3484 (explosion#3646)

* Fix inconsistent lemmatizer issue explosion#3484

* Remove test case

* Rewrite example to use Retokenizer (resolves explosion#3681)

Also add helper to filter spans
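
A sketch of the pattern referenced above, assuming the helper is exposed as spacy.util.filter_spans (see the util.filter_spans commit further down): drop overlapping spans, keeping the longest, before merging with the retokenizer.

```python
import spacy
from spacy.util import filter_spans

nlp = spacy.blank("en")
doc = nlp("London and New York City")
# Two overlapping candidate spans: "New York" and "New York City".
spans = [doc[2:4], doc[2:5]]
with doc.retokenize() as retokenizer:
    # filter_spans keeps the longest span from each overlapping group.
    for span in filter_spans(spans):
        retokenizer.merge(span)
print([t.text for t in doc])  # ['London', 'and', 'New York City']
```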

* Fix typo (see explosion#3681)

* Simplify helper (see explosion#3681) [ci skip]

* Auto-format [ci skip]

* Re-added Universe readme (explosion#3688) (closes explosion#3680)

* Fix offset bug in loading pre-trained word2vec. (explosion#3689)

* Fix offset bug in loading pre-trained word2vec.

* add contributor agreement

* Add util.filter_spans helper (explosion#3686)

* Request to include Holmes in spaCy Universe (explosion#3685)

* Request to add Holmes to spaCy Universe

Dear spaCy team, I would be grateful if you would consider my Python library Holmes for inclusion in the spaCy Universe. Holmes transforms the syntactic structures delivered by spaCy into semantic structures that, together with various other techniques including ontological matching and word embeddings, serve as the basis for information extraction. Holmes supports several use cases including chatbot, structured search, topic matching and supervised document classification. I had the basic idea for Holmes around 15 years ago and now spaCy has made it possible to build an implementation that is stable and fast enough to actually be of use - thank you! At present Holmes supports English and German (I am based in Munich) but could easily be extended to support any other language with a spaCy model.

* Added

* Add version tag to `--base-model` argument (closes explosion#3720)

* Submit contributor agreement (explosion#3705)

* fix thai bug (explosion#3693)

fix tokenize for pythainlp

* Update glossary.py to match information found in documentation (explosion#3704) (closes explosion#3679)

* Update glossary.py to match information found in documentation

I used regexes to add any dependency tag that was in the documentation but not in the glossary. Solves explosion#3679 👍

* Adds forgotten colon

* fixing regex matcher examples (explosion#3708) (explosion#3719)

* Improve Token.prob and Lexeme.prob docs (resolves explosion#3701)

* Fix DependencyParser.predict docs (resolves explosion#3561)

* Make "text" key in JSONL format optional when "tokens" key is provided (explosion#3721)

* Fix issue with forcing text key when it is not required

* Extending the docs to reflect the new behavior
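
For illustration, a record could then supply pre-tokenized input instead of a raw "text" value; a hedged sketch of such a JSONL line (the exact accepted shape is an assumption based on the description above):

```
{"tokens": ["hello", "there", "."]}
```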

* Call rmtree and copytree with strings (closes explosion#3713)

* Auto-format

* Add TWiML podcast to universe [ci skip]

* Fix return value of Language.update (closes explosion#3692)

* Set version to v2.1.4.dev1

* Fix push-tag script

* Fix .iob converter (closes explosion#3620)

* Replace cytoolz.partition_all with util.minibatch

* Set version to v2.1.4

* Merge branch 'spacy.io' [ci skip]

* 💫 Improve introspection of custom extension attributes (explosion#3729)

* Add custom __dir__ to Underscore (see explosion#3707)

* Make sure custom extension methods keep their docstrings (see explosion#3707)

* Improve tests

* Prepend note on partial to docstring (see explosion#3707)

* Remove print statement

* Handle cases where docstring is None

* Add check for callable to 'Language.replace_pipe' to fix explosion#3737 (explosion#3741)

* Fix lex_id docs (closes explosion#3743)

* Enhancing Kannada language Resources  (explosion#3755)

* Updated stop_words.py

Added more stopwords

* Create ujwal-narayan.md

Enhancing Kannada language resources

* Update Scorer and add API docs

* Update Language.update docs

* Document Language.evaluate

* Marathi Language Support (explosion#3767)

* Adding Marathi language details and folder to it

* Adding few changes and running tests

* Adding few changes and running tests

* Update __init__.py

mh -> mr

* Rename spacy/lang/mh/__init__.py to spacy/lang/mr/__init__.py

* mh -> mr

* Update norm_exceptions.py (explosion#3778)

* Update norm_exceptions.py

Extended the Currency set to include Franc, Indian Rupee, Bangladeshi Taka, Korean Won, Mexican Dollar, and Egyptian Pound

* Fix formatting [ci skip]

* Use string name in setup.py

Hopefully this will trick GitHub's parser into recognising it as a Python package and show us the dependents / "used by" statistics 🤞

* Corrected example model URL in requirements.txt (explosion#3786)

The URL used to show how to add a model to the requirements.txt had the old release path (excl. explosion).

* Make jsonschema dependency optional (explosion#3784)

* fix all references to BILUO annotation format (explosion#3797)

* Incorrect Token attribute ent_iob_ description (explosion#3800)

* Incorrect Token attribute ent_iob_ description

* Add spaCy contributor agreement

* Fix typos in docs (closes explosion#3802) [ci skip]

* Improve E024 text for incorrect GoldParse (closes explosion#3558)

* Update UNIVERSE.md

* Create NirantK.md (explosion#3807) [ci skip]

* Add Baderlab/saber to universe.json (explosion#3806)

* Overwrites default getter for like_num in Spanish by adding _num_words and like_num to lex_attrs.py (explosion#3810) (closes explosion#3803)

* (explosion#3803) Spanish like_num returning false for number-like token

* (explosion#3803) Spanish like_num now returning True for number-like token

* Add multiple packages to universe.json (explosion#3809) [ci skip]

* Add multiple packages to universe.json

Added following packages: NLPArchitect, NLPRe, Chatterbot, alibi, NeuroNER

* Auto-format

* Update slogan (probably just copy-paste mistake)

* Adjust formatting

* Update tags / categories

* Tidy up universe [ci skip]

* Update universe [ci skip]

* Update universe [ci skip]

* Update universe [ci skip]

* Fix for explosion#3811 (explosion#3815)

Corrected type of seed parameter.

* Create intrafindBreno.md (explosion#3814)

* minor fix to broken link in documentation (explosion#3819) [ci skip]

* Update universe [ci skip]

* Update srsly pin

* Add merge_subtokens as parser post-process. Re explosion#3830

* Create Azagh3l.md (explosion#3836)

* Update lex_attrs.py (explosion#3835)

Corrected typos, added french (from France) versions of some numbers.

* Add resume logic to spacy pretrain (explosion#3652)

* Added ability to resume training

* Add to readme

* Remove duplicate entry

* Tidy up [ci skip]

* Add regression test for explosion#3839

* Update exemples.py (explosion#3838)

Added missing hyphen and accent.

* Update error raising for CLI pretrain to fix explosion#3840 (explosion#3843)

* Add check for empty input file to CLI pretrain

* Raise error if JSONL is not a dict or contains neither `tokens` nor `text` key

* Skip empty values for correct pretrain keys and log a counter as warning

* Add tests for CLI pretrain core function make_docs.

* Add a short hint for the `tokens` key to the CLI pretrain docs

* Add success message to CLI pretrain

* Update model loading to fix the tests

* Skip empty values and do not create docs out of it

* Change vector training to work with latest gensim (fix explosion#3749) (explosion#3757)

* Dependency tree pattern matcher (explosion#3465)

* Functional dependency tree pattern matcher

* Tests fail due to inconsistent behaviour

* Renamed dependencymatcher and added optimizations

* Add optional `id` property to EntityRuler patterns (explosion#3591)

* Adding support for entity_id in EntityRuler pipeline component

* Adding spaCy Contributor agreement

* Updating EntityRuler to use string.format instead of f strings

* Update Entity Ruler to support an 'id' attribute per pattern that explicitly identifies an entity.

* Fixing tests

* Remove custom extension entity_id and use built in ent_id token attribute.

* Changing entity_id to ent_id for consistent naming

* entity_ids => ent_ids

* Removing kb, cleaning up tests, making util functions private, use rsplit instead of split
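
A hedged sketch of the optional pattern id described above, assuming the identifier is surfaced on matched entities via ent_id_ (per the renaming commits); details may differ from the merged implementation:

```python
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.blank("en")
ruler = EntityRuler(nlp)
# Two surface forms share one explicit entity id.
ruler.add_patterns([
    {"label": "GPE", "pattern": "San Francisco", "id": "san-francisco"},
    {"label": "GPE", "pattern": "SF", "id": "san-francisco"},
])
nlp.add_pipe(ruler)
doc = nlp("I moved from SF to San Francisco")
print([(ent.text, ent.label_, ent.ent_id_) for ent in doc.ents])
```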

* Update tokenizer.md for construction example (explosion#3790)

* Update tokenizer.md for construction example

Self-contained example. You should say what `nlp` is so that the example works as-is.
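
For example, a self-contained construction snippet along the lines suggested above (a sketch, not necessarily the wording that ended up in tokenizer.md):

```python
import spacy
from spacy.tokenizer import Tokenizer

# Define nlp explicitly so the example runs on its own.
nlp = spacy.blank("en")
tokenizer = Tokenizer(nlp.vocab)
print([t.text for t in tokenizer("Hello world!")])
```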

* Update CONTRIBUTOR_AGREEMENT.md

* Restore contributor agreement

* Adjust construction examples

* Auto-format [ci skip]
ines added a commit that referenced this pull request Aug 5, 2019
* Update from master

* Re-added Universe readme (#3688) (closes #3680)

* Fix typo

* Add version tag to `--base-model` argument (closes #3720)

* fixing regex matcher examples (#3708) (#3719)

* Improve Token.prob and Lexeme.prob docs (resolves #3701)

* Fix DependencyParser.predict docs (resolves #3561)

* Update languages.json


Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Aaron Kub <aaronkub@gmail.com>
polm pushed a commit to polm/spaCy that referenced this pull request Aug 18, 2019
* Update from master

* Re-added Universe readme (explosion#3688) (closes explosion#3680)

* Fix typo

* Add version tag to `--base-model` argument (closes explosion#3720)

* fixing regex matcher examples (explosion#3708) (explosion#3719)

* Improve Token.prob and Lexeme.prob docs (resolves explosion#3701)

* Fix DependencyParser.predict docs (resolves explosion#3561)

* Update languages.json


Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Aaron Kub <aaronkub@gmail.com>