Commit

Update spacy requirements across projects for weasel
adrianeboyd committed Nov 7, 2023
1 parent 50ea0de commit 5e7d16c
Showing 87 changed files with 372 additions and 379 deletions.
14 changes: 7 additions & 7 deletions benchmarks/healthsea_spancat/README.md
@@ -1,19 +1,19 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Healthsea-Spancat
# 🪐 Weasel Project: Healthsea-Spancat

This spaCy project uses the Healthsea dataset to compare the performance between the Spancat and NER architecture.

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -29,7 +29,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -42,11 +42,11 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
| --- | --- | --- |
| `assets/annotation.jsonl` | URL | NER annotations exported from Prodigy with 5000 examples and 2 labels |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
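
The README text in this hunk states that commands are only re-run if their inputs have changed. A minimal sketch of that general idea follows, comparing file digests against a stored snapshot. It is not Weasel's implementation: the function names, the use of SHA-256, and the in-memory `recorded` dictionary are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def record(inputs):
    """Snapshot the digests of all inputs, to be stored after a successful run."""
    return {path: file_digest(path) for path in inputs}


def needs_rerun(inputs, recorded):
    """True if any input is unrecorded or its digest differs from the snapshot."""
    return any(
        path not in recorded or file_digest(path) != recorded[path]
        for path in inputs
    )
```

With an empty snapshot every command counts as stale, so a first run always executes; afterwards only a change to an input file's contents triggers another run.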
12 changes: 6 additions & 6 deletions benchmarks/nel/README.md
@@ -1,19 +1,19 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: NEL Benchmark
# 🪐 Weasel Project: NEL Benchmark

Pipeline for benchmarking NEL approaches (incl. candidate generation and entity disambiguation).

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -36,7 +36,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -45,7 +45,7 @@ inputs have changed.
| `all` | `download_mewsli9` &rarr; `download_model` &rarr; `wikid_clone` &rarr; `preprocess` &rarr; `wikid_download_assets` &rarr; `wikid_parse` &rarr; `wikid_create_kb` &rarr; `parse_corpus` &rarr; `compile_corpora` &rarr; `train` &rarr; `evaluate` &rarr; `compare_evaluations` |
| `training` | `train` &rarr; `evaluate` |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->

Notes:
> **Warning**: Parts of this project are currently not platform-agnostic and run only on Linux. Making the entire
1 change: 0 additions & 1 deletion benchmarks/nel/project.yml
@@ -1,6 +1,5 @@
title: 'NEL Benchmark'
description: "Pipeline for benchmarking NEL approaches (incl. candidate generation and entity disambiguation)."
spacy_version: ">=3.0.0,<3.6.0"
vars:
run: "cg-default"
language: "en"
1 change: 1 addition & 0 deletions benchmarks/nel/requirements.txt
@@ -7,3 +7,4 @@ rapidfuzz>=2.0.0
spacyfishing
virtualenv
pysqlite3-binary
spacy>=3.0.0,<3.6.0
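
The constraint dropped from `project.yml` (`spacy_version`) reappears here as a pin in `requirements.txt`, so the installer enforces it rather than the project file. As a rough illustration of what a specifier such as `>=3.0.0,<3.6.0` means, here is a simplified sketch that handles only `>=` and `<` clauses on purely numeric versions. The helper names are hypothetical. Real tooling follows the full PEP 440 rules, and the `packaging` library's `SpecifierSet` is the more likely choice in practice.

```python
import re


def parse_version(text, width=3):
    """Turn '3.5.2' into (3, 5, 2), padding short versions with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies(version, spec):
    """Check a numeric version against comma-separated '>=' and '<' clauses."""
    checks = {
        ">=": lambda current, bound: current >= bound,
        "<": lambda current, bound: current < bound,
    }
    current = parse_version(version)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|<)\s*([\d.]+)\s*", clause)
        if match is None:
            # Other operators (==, ~=, !=, ...) are out of scope for this sketch.
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not checks[op](current, parse_version(bound)):
            return False
    return True
```

For example, `satisfies("3.5.4", ">=3.0.0,<3.6.0")` is true, while `satisfies("3.6.0", ">=3.0.0,<3.6.0")` is false because the upper bound is exclusive.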
14 changes: 7 additions & 7 deletions benchmarks/ner_conll03/README.md
@@ -1,17 +1,17 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Named Entity Recognition (CoNLL-2003)
# 🪐 Weasel Project: Named Entity Recognition (CoNLL-2003)

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -25,7 +25,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -36,7 +36,7 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
@@ -47,4 +47,4 @@ in the project directory.
| `assets/conll2003/train.iob` | Local | Training data (not available publicly so you have to add the file yourself) |
| `assets/orth_variants.json` | URL | A file containing orth variants for data augmentation |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
14 changes: 7 additions & 7 deletions benchmarks/ner_embeddings/README.md
@@ -1,6 +1,6 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Comparing embedding layers in spaCy
# 🪐 Weasel Project: Comparing embedding layers in spaCy

This project contains the code to reproduce the results of the
[Multi hash embeddings in spaCy](https://arxiv.org/abs/2212.09255) technical report by Explosion.
@@ -29,12 +29,12 @@ the hash embedding layers. We apologize for the inconvenience.

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -54,7 +54,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -66,7 +66,7 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
@@ -76,4 +76,4 @@ in the project directory.
| `assets/fasttext.nl.gz` | URL | Dutch fastText vectors. |
| `span-labeling-datasets` | Git | |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
14 changes: 7 additions & 7 deletions benchmarks/parsing_penn_treebank/README.md
@@ -1,17 +1,17 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Dependency Parsing (Penn Treebank)
# 🪐 Weasel Project: Dependency Parsing (Penn Treebank)

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -25,7 +25,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -36,7 +36,7 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
@@ -47,4 +47,4 @@ in the project directory.
| `assets/vectors.zip` | URL | GloVe vectors |
| `assets/orth_variants.json` | URL | A file containing orth variants for data augmentation |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
14 changes: 7 additions & 7 deletions benchmarks/pretraining_morphologizer_oscar/README.md
@@ -1,19 +1,19 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Enhancing Morphological Analysis with spaCy Pretraining
# 🪐 Weasel Project: Enhancing Morphological Analysis with spaCy Pretraining

This project explores the effectiveness of pretraining techniques on morphological analysis (morphologizer) by conducting experiments on multiple languages. The objective of this project is to demonstrate the benefits of pretraining word vectors using domain-specific data on the performance of the morphological analysis. We leverage the OSCAR dataset to pretrain our vectors for tok2vec and utilize the UD_Treebanks dataset to train a morphologizer component. We evaluate and compare the performance of different pretraining techniques and the performance of models without any pretraining.

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -43,7 +43,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -59,11 +59,11 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
| --- | --- | --- |
| `assets/ud-treebanks-v2.5.tgz` | URL | |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
2 changes: 0 additions & 2 deletions benchmarks/pretraining_morphologizer_oscar/project.yml
@@ -17,8 +17,6 @@ vars:
# Choose -1 for CPU
gpu: -1

spacy_version: ">=3.5.2,<4.0.0"

# These are the directories that the project needs. The project CLI will make
# sure that they always exist.
directories: ["assets", "scripts", "data", "training", "pretraining", "metrics"]
3 changes: 2 additions & 1 deletion benchmarks/pretraining_morphologizer_oscar/requirements.txt
@@ -1,4 +1,5 @@
spacy
datasets
spacy-transformers
matplotlib
matplotlib
spacy>=3.5.2,<4.0.0
14 changes: 7 additions & 7 deletions benchmarks/span-labeling-datasets/README.md
@@ -1,6 +1,6 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Span labeling datasets
# 🪐 Weasel Project: Span labeling datasets

This project compiles various NER and more general spancat datasets
and their converters into the [spaCy format](https://spacy.io/api/data-formats).
@@ -12,12 +12,12 @@ or to potentially pre-train them for your application.

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -47,7 +47,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -63,7 +63,7 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
@@ -78,4 +78,4 @@ in the project directory.
| `assets/restaurant-train_raw.iob` | URL | Training data from the MIT Restaurants Review dataset |
| `assets/restaurant-test_raw.iob` | URL | Test data from the MIT Restaurants Review dataset |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
2 changes: 0 additions & 2 deletions benchmarks/span-labeling-datasets/project.yml
@@ -5,8 +5,6 @@ description: |
You can use this to try out experiment with `ner` and `spancat`
or to potentially pre-train them for your application.
spacy_version: ">=3.2.5,<4.0.0"

vars:
spans_key: "sc"
gpu_id: 0
2 changes: 1 addition & 1 deletion benchmarks/span-labeling-datasets/requirements.txt
@@ -1,4 +1,4 @@
spacy
spacy>=3.2.5,<4.0.0
typer
wasabi
pandas