ERROR: section papers using --run_sectioning before search #13
Comments
Although it does download the papers and make the CProject, I get this error message after the command finishes executing.
Additional info that might help understand the issue: I get the error message "docanalysis: error: unrecognized arguments: --run_sectioning" when I try to section the papers.
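For anyone reading along: the wording of that message is what Python's argparse prints when a flag is not defined in the parser it is given. Here is a minimal sketch, assuming (not confirmed in this thread) that the installed build simply predates `--run_sectioning`. `build_parser` and `try_parse` are hypothetical stand-ins, not docanalysis code.

```python
import argparse
import contextlib
import io


def build_parser(with_sectioning: bool) -> argparse.ArgumentParser:
    # Hypothetical stand-in for the docanalysis parser, NOT its real code.
    parser = argparse.ArgumentParser(prog="docanalysis")
    parser.add_argument("--run_pygetpapers", action="store_true")
    parser.add_argument("-q", "--query")
    parser.add_argument("-k", "--hits", type=int)
    if with_sectioning:
        # Only "newer" parsers in this sketch know the flag.
        parser.add_argument("--run_sectioning", action="store_true")
    return parser


def try_parse(parser: argparse.ArgumentParser, argv: list):
    # argparse reports errors on stderr and then raises SystemExit,
    # so capture both instead of letting the process die.
    captured = io.StringIO()
    try:
        with contextlib.redirect_stderr(captured):
            return parser.parse_args(argv), ""
    except SystemExit:
        return None, captured.getvalue()


if __name__ == "__main__":
    argv = ["-q", "lantana", "-k", "5", "--run_pygetpapers", "--run_sectioning"]
    for label, flag_known in (("without flag", False), ("with flag", True)):
        args, err = try_parse(build_parser(flag_known), argv)
        print(label, "->", "parsed OK" if args is not None else err.strip())
```

If this is what is happening, the fix would be on the installation side rather than in the command itself.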
I have just run this:
pm286macbook:awena-wikidata-crawler pm286$ docanalysis --help
/opt/anaconda3/lib/python3.8/site-packages/_distutils_hack/__init__.py:36:
UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
usage: docanalysis [-h] [--run_pygetpapers] [--run_sectioning] [-q QUERY]
                   [-k HITS] [--project_name PROJECT_NAME] [-d DICTIONARY]
                   [-o OUTPUT] [--make_ami_dict MAKE_AMI_DICT] [-l LOGLEVEL]
                   [-f LOGFILE] [--section [SECTION [SECTION ...]]]
                   [--entities [ENTITIES [ENTITIES ...]]]
                   [--spacy_model SPACY_MODEL] [--html HTML]
Welcome to Docanalysis version 0.0.7. -h or --help for help
optional arguments:
-h, --help show this help message and exit
…--run_pygetpapers queries EuropePMC via pygetpapers
--run_sectioning make sections
-q QUERY, --query QUERY
query to pygetpapers
-k HITS, --hits HITS numbers of papers to download from pygetpapers
--project_name PROJECT_NAME
name of CProject folder
-d DICTIONARY, --dictionary DICTIONARY
Ami Dictionary to tag sentences and support
supervised entity
extraction
-o OUTPUT, --output OUTPUT
Output CSV file [default=entities.csv]
--make_ami_dict MAKE_AMI_DICT
if provided will make ami dict with given title
-l LOGLEVEL, --loglevel LOGLEVEL
[All] Provide logging level. Example --log warning
<<info,warning,debug,error,critical>>,
default='info'
-f LOGFILE, --logfile LOGFILE
[All] save log to specified file in output
directory as well as
printing to terminal
--section [SECTION [SECTION ...]]
Which section to get
--entities [ENTITIES [ENTITIES ...]]
Which entities to get. Default(ALL)
--spacy_model SPACY_MODEL
Optional. (spacy, scispacy). Default(spacy)
--html HTML Saves output in html format to given path
[...]
(base) pm286macbook:projects pm286$ docanalysis -q "lantana" -k 5 --run_pygetpapers --run_sectioning
/opt/anaconda3/lib/python3.8/site-packages/_distutils_hack/__init__.py:36:
UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
*INFO:* making project/searching lantana for 5 hits into
/Users/pm286/projects/2022_05_29_09_04_26
*INFO:* Total Hits are 2174
1it [00:00, 323.31it/s]
*INFO:* Saving XML files to
/Users/pm286/projects/2022_05_29_09_04_26/*/fulltext.xml
100%|█████████████████████████████████████████████████████████████| 5/5
[00:01<00:00, 3.79it/s]
*WARNING:* Making sections in
/Users/pm286/projects/2022_05_29_09_04_26/PMC9095257/fulltext.xml
*INFO:* dict_keys: dict_keys(['abstract', 'acknowledge', 'affiliation',
'author', 'conclusion', 'discussion', 'ethics', 'fig_caption', 'front',
'introduction', 'jrnl_title', 'keyword', 'method', 'octree', 'pdfimage',
'pub_date', 'publisher', 'reference', 'results_discuss', 'search_results',
'sections', 'svg', 'table', 'title'])
*WARNING:* loading templates.json
*INFO:* wrote XML sections for
/Users/pm286/projects/2022_05_29_09_04_26/PMC9095257/fulltext.xml
/Users/pm286/projects/2022_05_29_09_04_26/PMC9095257/sections
*WARNING:* Making sections in
/Users/pm286/projects/2022_05_29_09_04_26/PMC8933013/fulltext.xml
*INFO:* wrote XML sections for
/Users/pm286/projects/2022_05_29_09_04_26/PMC8933013/fulltext.xml
/Users/pm286/projects/2022_05_29_09_04_26/PMC8933013/sections
*WARNING:* Making sections in
/Users/pm286/projects/2022_05_29_09_04_26/PMC8879267/fulltext.xml
*INFO:* wrote XML sections for
/Users/pm286/projects/2022_05_29_09_04_26/PMC8879267/fulltext.xml
/Users/pm286/projects/2022_05_29_09_04_26/PMC8879267/sections
*WARNING:* Making sections in
/Users/pm286/projects/2022_05_29_09_04_26/PMC8593682/fulltext.xml
*INFO:* wrote XML sections for
/Users/pm286/projects/2022_05_29_09_04_26/PMC8593682/fulltext.xml
/Users/pm286/projects/2022_05_29_09_04_26/PMC8593682/sections
*WARNING:* Making sections in
/Users/pm286/projects/2022_05_29_09_04_26/PMC8896935/fulltext.xml
*INFO:* wrote XML sections for
/Users/pm286/projects/2022_05_29_09_04_26/PMC8896935/fulltext.xml
/Users/pm286/projects/2022_05_29_09_04_26/PMC8896935/sections
*INFO:* starting tokenization on 1 paragraphs
100%|████████████████████████████████████████████████████████| 847/847
[00:01<00:00, 716.46it/s]
*INFO:* Found 2610 sentences
*INFO:* getting terms from/to False
*INFO:* Loading spacy
100%|██████████████████████████████████████████████████████| 2610/2610
[00:14<00:00, 175.90it/s]
/opt/anaconda3/lib/python3.8/site-packages/docanalysis/entity_extraction.py:257:
FutureWarning: The default value of regex will change from True to False in
a future version. In addition, single character regular expressions will
*not* be treated as literal strings when regex=True.
df[col] = df[col].astype(str).str.replace(
*INFO:* wrote output to
/Users/pm286/projects/2022_05_29_09_04_26/entities.csv
(base) pm286macbook:projects pm286$ ls -lt | more
total 88200
drwxr-xr-x 9 pm286 staff 288 29 May 10:04 2022_05_29_09_04_26
drwxr-xr-x 21 pm286 staff 672 22 May 18:29 presentations
[...]
(base) pm286macbook:projects pm286$ tree 2022_05_29_09_04_26/ | more
2022_05_29_09_04_26/
├── PMC8593682
│ ├── eupmc_result.json
│ ├── fulltext.xml
│ └── sections
│ ├── 0_processing-meta
│ │ └── 0_restricted-by.xml
│ ├── 1_front
│ │ ├── 0_journal-meta
│ │ │ ├── 0_journal-id.xml
│ │ │ ├── 1_journal-id.xml
│ │ │ ├── 2_journal-id.xml
│ │ │ ├── 3_journal-title-group.xml
│ │ │ ├── 4_issn.xml
│ │ │ └── 5_publisher.xml
│ │ └── 1_article-meta
│ │ ├── 0_article-id.xml
│ │ ├── 10_pub-date.xml
│ │ ├── 11_pub-date.xml
│ │ ├── 12_pub-date.xml
│ │ ├── 13_volume.xml
│ │ ├── 14_issue.xml
│ │ ├── 15_elocation-id.xml
│ │ ├── 16_history.xml
│ │ ├── 17_permissions.xml
│ │ ├── 18_self-uri.xml
│ │ ├── 19_abstract.xml
│ │ ├── 1_article-id.xml
│ │ ├── 20_kwd-group.xml
│ │ ├── 21_funding-group
│ │ │ ├── 0_award-group
│ │ │ │ ├── 0_funding-source
│ │ │ │ │ └── 0_institution-wrap
│ │ │ │ │ ├── 0_institution.xml
│ │ │ │ │ └── 1_institution-id.xml
│ │ │ │ ├── 1_award-id.xml
│ │ │ │ ├── 2_principal-award-recipient
│ │ │ │ │ └── 0_name.xml
│ │ │ │ ├── 3_principal-award-recipient
│ │ │ │ │ └── 0_name.xml
│ │ │ │ ├── 4_principal-award-recipient
│ │ │ │ │ └── 0_name.xml
│ │ │ │ ├── 5_principal-award-recipient
│ │ │ │ │ └── 0_name.xml
│ │ │ │ ├── 6_principal-award-recipient
│ │ │ │ │ └── 0_name.xml
│ │ │ │ ├── 7_principal-award-recipient
│ │ │ │ │ └── 0_name.xml
│ │ │ │ └── 8_principal-award-recipient
│ │ │ │ └── 0_name.xml
│ │ │ ├── 1_award-group
[...]
So it works for me, although pip install seems to give version 0.0.7.
Shweata, any thoughts?
P.
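One way to compare the two setups is to check which version is actually installed and whether its help text mentions the flag. This is a rough sketch only: it assumes the pip distribution and the console command are both named `docanalysis` (the command name is taken from the session above, the distribution name is a guess), and the helper names are made up for illustration.

```python
import shutil
import subprocess
from importlib import metadata


def installed_version(dist_name: str = "docanalysis"):
    # Returns the installed version string, or None if pip does not know it.
    # The distribution name is an assumption.
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def help_mentions(flag: str, command: str = "docanalysis"):
    # Returns True/False depending on whether `<command> --help` mentions
    # the flag, or None if the command is not on PATH.
    exe = shutil.which(command)
    if exe is None:
        return None
    result = subprocess.run([exe, "--help"], capture_output=True, text=True)
    return flag in (result.stdout + result.stderr)


if __name__ == "__main__":
    print("installed version:", installed_version())
    print("--run_sectioning in help:", help_mentions("--run_sectioning"))
```

If the help check comes back False on the failing machine, that would point to an older install rather than a problem with the command.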
On Sun, May 29, 2022 at 9:21 AM Kaartik7 ***@***.***> wrote:
Additional info that might help understand the issue : I get this error
message "docanalysis: error: unrecognized arguments: --run_sectioning" when
I try to section the papers
--
Peter Murray-Rust
Founder ContentMine.org
and
Reader Emeritus in Molecular Informatics
Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
When I run the following command in the terminal on my Mac, I run into the above-mentioned error: docanalysis --run_pygetpapers -q "terpene" -k 10 --project_name terpene_10. Kindly help me with it.