ERROR: section papers using --run_sectioning before search #13

Kaartik7 · 2022-05-29T08:13:24Z

When I run the following command in the terminal on my Mac, I run into the above-mentioned error:

docanalysis --run_pygetpapers -q "terpene" -k 10 --project_name terpene_10

Kindly help me with it.

Kaartik7 · 2022-05-29T08:16:10Z

It does download the papers and create the CProject, but I get this error message after the command finishes executing.

Kaartik7 · 2022-05-29T08:21:00Z

Additional info that might help in understanding the issue: I get the error message "docanalysis: error: unrecognized arguments: --run_sectioning" when I try to section the papers.
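For readers unfamiliar with this message: the wording "error: unrecognized arguments" is what Python's argparse prints when a flag was never registered on the parser. The sketch below is a toy parser, not docanalysis's actual code; `build_parser` and `try_parse` are hypothetical helpers, and the flag names are simply copied from this thread. It only illustrates that the same command line can fail or succeed depending on whether the running copy of the tool defines `--run_sectioning`. Whether that is the cause here is not confirmed in this thread.

```python
import argparse
import contextlib
import io


def build_parser(with_sectioning: bool) -> argparse.ArgumentParser:
    """Toy stand-in for a CLI parser; flag names mirror those in this thread."""
    parser = argparse.ArgumentParser(prog="docanalysis")
    parser.add_argument("--run_pygetpapers", action="store_true")
    parser.add_argument("-q", "--query")
    parser.add_argument("-k", "--hits", type=int)
    parser.add_argument("--project_name")
    if with_sectioning:
        # Only some (hypothetical) builds register this flag.
        parser.add_argument("--run_sectioning", action="store_true")
    return parser


def try_parse(parser: argparse.ArgumentParser, argv: list):
    """Return (namespace, error_text). argparse exits on error, so trap it."""
    captured = io.StringIO()
    try:
        with contextlib.redirect_stderr(captured):
            return parser.parse_args(argv), ""
    except SystemExit:
        return None, captured.getvalue()


if __name__ == "__main__":
    argv = ["--run_pygetpapers", "-q", "terpene", "-k", "10", "--run_sectioning"]
    for has_flag in (False, True):
        namespace, error_text = try_parse(build_parser(has_flag), argv)
        print(f"flag defined={has_flag}: ok={namespace is not None}")
        if error_text:
            print(error_text.strip().splitlines()[-1])
```

If the flag is missing from the parser, the last line of the captured text has the same shape as the message quoted above.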

petermr · 2022-05-29T09:11:20Z

I have just run this:

pm286macbook:awena-wikidata-crawler pm286$ docanalysis --help
/opt/anaconda3/lib/python3.8/site-packages/_distutils_hack/__init__.py:36: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
usage: docanalysis [-h] [--run_pygetpapers] [--run_sectioning] [-q QUERY] [-k HITS]
                   [--project_name PROJECT_NAME] [-d DICTIONARY] [-o OUTPUT]
                   [--make_ami_dict MAKE_AMI_DICT] [-l LOGLEVEL] [-f LOGFILE]
                   [--section [SECTION [SECTION ...]]]
                   [--entities [ENTITIES [ENTITIES ...]]]
                   [--spacy_model SPACY_MODEL] [--html HTML]

Welcome to Docanalysis version 0.0.7. -h or --help for help

optional arguments:
  -h, --help            show this help message and exit

…

  --run_pygetpapers     queries EuropePMC via pygetpapers
  --run_sectioning      make sections
  -q QUERY, --query QUERY
                        query to pygetpapers
  -k HITS, --hits HITS  numbers of papers to download from pygetpapers
  --project_name PROJECT_NAME
                        name of CProject folder
  -d DICTIONARY, --dictionary DICTIONARY
                        Ami Dictionary to tag sentences and support supervised entity extraction
  -o OUTPUT, --output OUTPUT
                        Output CSV file [default=entities.csv]
  --make_ami_dict MAKE_AMI_DICT
                        if provided will make ami dict with given title
  -l LOGLEVEL, --loglevel LOGLEVEL
                        [All] Provide logging level. Example --log warning <<info,warning,debug,error,critical>>, default='info'
  -f LOGFILE, --logfile LOGFILE
                        [All] save log to specified file in output directory as well as printing to terminal
  --section [SECTION [SECTION ...]]
                        Which section to get
  --entities [ENTITIES [ENTITIES ...]]
                        Which entities to get. Default(ALL)
  --spacy_model SPACY_MODEL
                        Optional. (spacy, scispacy). Default(spacy)

  --html HTML           Saves output in html format to given path
[...]

(base) pm286macbook:projects pm286$ docanalysis -q "lantana" -k 5 --run_pygetpapers --run_sectioning
/opt/anaconda3/lib/python3.8/site-packages/_distutils_hack/__init__.py:36: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
*INFO:* making project/searching lantana for 5 hits into /Users/pm286/projects/2022_05_29_09_04_26
*INFO:* Total Hits are 2174
1it [00:00, 323.31it/s]
*INFO:* Saving XML files to /Users/pm286/projects/2022_05_29_09_04_26/*/fulltext.xml
100%|█████████████████████████████████████████████████████████████| 5/5 [00:01<00:00, 3.79it/s]
*WARNING:* Making sections in /Users/pm286/projects/2022_05_29_09_04_26/PMC9095257/fulltext.xml
*INFO:* dict_keys: dict_keys(['abstract', 'acknowledge', 'affiliation', 'author', 'conclusion', 'discussion', 'ethics', 'fig_caption', 'front', 'introduction', 'jrnl_title', 'keyword', 'method', 'octree', 'pdfimage', 'pub_date', 'publisher', 'reference', 'results_discuss', 'search_results', 'sections', 'svg', 'table', 'title'])
*WARNING:* loading templates.json
*INFO:* wrote XML sections for /Users/pm286/projects/2022_05_29_09_04_26/PMC9095257/fulltext.xml /Users/pm286/projects/2022_05_29_09_04_26/PMC9095257/sections
*WARNING:* Making sections in /Users/pm286/projects/2022_05_29_09_04_26/PMC8933013/fulltext.xml
*INFO:* wrote XML sections for /Users/pm286/projects/2022_05_29_09_04_26/PMC8933013/fulltext.xml /Users/pm286/projects/2022_05_29_09_04_26/PMC8933013/sections
*WARNING:* Making sections in /Users/pm286/projects/2022_05_29_09_04_26/PMC8879267/fulltext.xml
*INFO:* wrote XML sections for /Users/pm286/projects/2022_05_29_09_04_26/PMC8879267/fulltext.xml /Users/pm286/projects/2022_05_29_09_04_26/PMC8879267/sections
*WARNING:* Making sections in /Users/pm286/projects/2022_05_29_09_04_26/PMC8593682/fulltext.xml
*INFO:* wrote XML sections for /Users/pm286/projects/2022_05_29_09_04_26/PMC8593682/fulltext.xml /Users/pm286/projects/2022_05_29_09_04_26/PMC8593682/sections
*WARNING:* Making sections in /Users/pm286/projects/2022_05_29_09_04_26/PMC8896935/fulltext.xml
*INFO:* wrote XML sections for /Users/pm286/projects/2022_05_29_09_04_26/PMC8896935/fulltext.xml /Users/pm286/projects/2022_05_29_09_04_26/PMC8896935/sections
*INFO:* starting tokenization on 1 paragraphs
100%|████████████████████████████████████████████████████████| 847/847 [00:01<00:00, 716.46it/s]
*INFO:* Found 2610 sentences
*INFO:* getting terms from/to False
*INFO:* Loading spacy
100%|██████████████████████████████████████████████████████| 2610/2610 [00:14<00:00, 175.90it/s]
/opt/anaconda3/lib/python3.8/site-packages/docanalysis/entity_extraction.py:257: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
  df[col] = df[col].astype(str).str.replace(
*INFO:* wrote output to /Users/pm286/projects/2022_05_29_09_04_26/entities.csv

(base) pm286macbook:projects pm286$ ls -lt | more
total 88200
drwxr-xr-x   9 pm286  staff  288 29 May 10:04 2022_05_29_09_04_26
drwxr-xr-x  21 pm286  staff  672 22 May 18:29 presentations
[...]

(base) pm286macbook:projects pm286$ tree 2022_05_29_09_04_26/ | more
2022_05_29_09_04_26/
├── PMC8593682
│   ├── eupmc_result.json
│   ├── fulltext.xml
│   └── sections
│       ├── 0_processing-meta
│       │   └── 0_restricted-by.xml
│       ├── 1_front
│       │   ├── 0_journal-meta
│       │   │   ├── 0_journal-id.xml
│       │   │   ├── 1_journal-id.xml
│       │   │   ├── 2_journal-id.xml
│       │   │   ├── 3_journal-title-group.xml
│       │   │   ├── 4_issn.xml
│       │   │   └── 5_publisher.xml
│       │   └── 1_article-meta
│       │       ├── 0_article-id.xml
│       │       ├── 10_pub-date.xml
│       │       ├── 11_pub-date.xml
│       │       ├── 12_pub-date.xml
│       │       ├── 13_volume.xml
│       │       ├── 14_issue.xml
│       │       ├── 15_elocation-id.xml
│       │       ├── 16_history.xml
│       │       ├── 17_permissions.xml
│       │       ├── 18_self-uri.xml
│       │       ├── 19_abstract.xml
│       │       ├── 1_article-id.xml
│       │       ├── 20_kwd-group.xml
│       │       ├── 21_funding-group
│       │       │   ├── 0_award-group
│       │       │   │   ├── 0_funding-source
│       │       │   │   │   └── 0_institution-wrap
│       │       │   │   │       ├── 0_institution.xml
│       │       │   │   │       └── 1_institution-id.xml
│       │       │   │   ├── 1_award-id.xml
│       │       │   │   ├── 2_principal-award-recipient
│       │       │   │   │   └── 0_name.xml
│       │       │   │   ├── 3_principal-award-recipient
│       │       │   │   │   └── 0_name.xml
│       │       │   │   ├── 4_principal-award-recipient
│       │       │   │   │   └── 0_name.xml
│       │       │   │   ├── 5_principal-award-recipient
│       │       │   │   │   └── 0_name.xml
│       │       │   │   ├── 6_principal-award-recipient
│       │       │   │   │   └── 0_name.xml
│       │       │   │   ├── 7_principal-award-recipient
│       │       │   │   │   └── 0_name.xml
│       │       │   │   └── 8_principal-award-recipient
│       │       │   │       └── 0_name.xml
│       │       │   ├── 1_award-group
[...]

So it works for me, although pip install seems to give version 0.0.7.

Shweata, any thoughts?

P.
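Because the run above reports version 0.0.7 and accepts `--run_sectioning`, it may help to compare against the version installed on the machine where the flag is rejected. This is a minimal sketch using the standard library's `importlib.metadata`. It assumes the distribution is registered under the name `docanalysis`, and `installed_version` is a hypothetical helper, not part of the tool.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "docanalysis" as the distribution name is an assumption based on this thread.
    found = installed_version("docanalysis")
    if found is None:
        print("docanalysis is not installed in this Python environment")
    else:
        print(f"docanalysis version: {found}")
```

Run it with the same Python interpreter that provides the `docanalysis` command; a different environment may hold a different copy.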


-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
