flair with the icelandic_ner dataset #2114
Thanks for adding this! Unfortunately, this code does not compile for me. Some of the variables are undefined. Can you check and update?
flair/datasets/sequence_labeling.py
data_folder = base_path / dataset_name

# download data if necessary
ZipFile.extractall(path=icelandic_ner, members="https://repository.clarin.is/repository/xmlui/handle/20.500.12537/42/allzip", pwd=None)
The variable `icelandic_ner` is not defined.
I changed it this morning. Can you please check it again?
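For reference, here is a minimal sketch of extracting a downloaded archive with the standard library. It assumes the zip has already been fetched to a local path, for example with flair's `cached_path` as used elsewhere in this PR. `extractall` is a method on an open `ZipFile` object, and its `members` argument expects names inside the archive, not a download URL. The helper name `extract_zip` is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_zip(zip_path: Path, target_dir: Path) -> List[str]:
    """Extract all members of zip_path into target_dir and return the member names.

    Hypothetical helper: the download is assumed to have happened already,
    so only a local archive path is passed in.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    # Open the archive first; extractall() is an instance method, and its
    # optional `members` argument takes names inside the zip, not a URL.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(path=target_dir)
        return archive.namelist()
```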
flair/datasets/sequence_labeling.py
outfile.write(contents)

# download files if not present locally
cached_path(f"{icelandic_ner_path}ned.testa", data_folder / 'raw')
The variable `icelandic_ner_path` is not defined.
I changed it this morning. Can you please check it again?
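Here is a minimal sketch of the "download files if not present locally" step. It assumes a base URL and a list of file names, both hypothetical here. The base location is defined before it is used, joined to each file name with an explicit separator, and files that already exist in the raw folder are skipped. The helper `files_to_download` is illustrative and not part of flair. Its result could then be handed to a downloader such as `cached_path`.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def files_to_download(base_url: str, names: Iterable[str], raw_folder: Path) -> List[Tuple[str, Path]]:
    """Return (url, local_path) pairs for every file not yet present in raw_folder."""
    # Normalise the base URL so joining never drops or doubles the "/" separator,
    # which f"{base}{name}" silently does when base has no trailing slash.
    base = base_url.rstrip("/") + "/"
    pending = []
    for name in names:
        local_path = raw_folder / name
        # Only schedule a download when the file is missing locally.
        if not local_path.is_file():
            pending.append((base + name, local_path))
    return pending
```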
flair/datasets/sequence_labeling.py
# we need to slightly modify the original files by adding some new lines after document separators
train_data_file = data_folder / 'train.txt'
if not train_data_file.is_file():
Why is this part necessary? Are extra offsets needed? Maybe you can use the files as they are?
I changed it this morning. Can you please check it again?
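On the question of whether the files can be used as they are: a column reader can usually treat a document separator line as a sentence boundary on its own, so no extra blank lines need to be written to disk. This is a minimal sketch that assumes whitespace-separated columns and `-DOCSTART-` as the separator. The separator string is an assumption, so check what the Icelandic files actually use. The function `read_column_sentences` is hypothetical and does not describe how flair's `ColumnCorpus` is implemented.

```python
from typing import Iterable, List, Tuple


def read_column_sentences(lines: Iterable[str], doc_separator: str = "-DOCSTART-") -> List[List[Tuple[str, ...]]]:
    """Group column-format lines into sentences without rewriting the file."""
    sentences: List[List[Tuple[str, ...]]] = []
    current: List[Tuple[str, ...]] = []
    for raw in lines:
        line = raw.strip()
        # A document separator or an empty line both close the current sentence,
        # so no blank line is required after the separator.
        if not line or line.startswith(doc_separator):
            if current:
                sentences.append(current)
                current = []
            continue
        # Each remaining line is one token with its columns (e.g. word, tag).
        current.append(tuple(line.split()))
    if current:
        sentences.append(current)
    return sentences
```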
flair/datasets/sequence_labeling.py
with open("icelandic_ner_path/train.txt", "w") as outfile:
# download zip
icelandic_ner ="https://repository.clarin.is/repository/xmlui/handle/20.500.12537/42/allzip"
icelandic_ner_path = cached_path(icelandic_ner, Path("datasets") / dataset_name)
There are indentation problems here, causing the program to break.
Hello @alanakbik, the default dataset folder is the cache root, and this is the error I get:
/home/aimsgh/home/aimsgh/SCIoI/flair/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
Process finished with exit code 1
You are not specifying the correct path to the file. You are only passing the file name to `open`; you need to specify the full path to the file. The same applies to the outfile.
Alright, I just added a commit.
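The point about full paths and indentation can be sketched as follows, assuming the raw files have already been downloaded into `data_folder`. Every path is built from `data_folder` instead of passing a bare file name to `open`, and the write loop stays indented inside the `with` block. The helper name `merge_into` and the file names are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def merge_into(data_folder: Path, sources: Iterable[str], target_name: str = "train.txt") -> Path:
    """Concatenate data_folder/<source> files into data_folder/<target_name>."""
    data_folder.mkdir(parents=True, exist_ok=True)
    target = data_folder / target_name  # full path, not just the file name
    with open(target, "w", encoding="utf-8") as outfile:
        # Everything that writes to outfile must stay indented inside this block.
        for source in sources:
            with open(data_folder / source, encoding="utf-8") as infile:
                contents = infile.read()
            outfile.write(contents)
            # End each file with a newline plus a blank line so sentences
            # from consecutive files stay separated.
            if not contents.endswith("\n"):
                outfile.write("\n")
            outfile.write("\n")
    return target
```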
Looks good, but please remove the local file and add the tag_to_bioes parameter.
load_dataset.py
@@ -0,0 +1,15 @@
from flair.datasets import ICELANDIC_NER
Please do not add local files to git!
data_folder,
columns,
train_file='icelandic_ner.txt',
in_memory=in_memory,
Can you also add the `tag_to_bioes` parameter here, i.e. `tag_to_bioes=tag_to_bioes,`?
Done!
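For context, our understanding is that `tag_to_bioes` asks the corpus loader to convert the named tag column into the BIOES scheme. The conversion itself can be sketched as below, assuming well-formed BIO input. This illustrates the scheme only and is not flair's own implementation. The name `bio_to_bioes` is hypothetical.

```python
from typing import List


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert a BIO tag sequence (e.g. B-Person, I-Person, O) to BIOES."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # A B- tag that does not continue is a single-token entity (S-).
            converted.append(f"B-{label}" if continues else f"S-{label}")
        elif prefix == "I":
            # An I- tag that does not continue closes the entity (E-).
            converted.append(f"I-{label}" if continues else f"E-{label}")
        else:
            raise ValueError(f"unexpected tag: {tag}")
    return converted
```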
@TatianaMoteuN, thanks for adding this!