Schema & Scraping helpers #9
Name stems for drugs and chemicals |
Categories for Drug Activities — No ID numbers though |
On Thu, Aug 29, 2019 at 5:27 PM Emanuel Faria ***@***.***> wrote:
Name stems for drugs and chemicals
https://druginfo.nlm.nih.gov/drugportal/jsp/drugportal/DrugNameGenericStems.jsp
Very useful for generic drugs. Thanks. I need to mend the regex (regular
expression) parser.
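A minimal sketch of how stem matching with a regex might look, not the actual ContentMine parser. The four stems and their classes are a small illustrative sample (assumed here, not taken from the page above), and `find_stemmed_names` is a hypothetical helper name.

```python
import re

# Illustrative sample only: suffix stem -> drug class.
# A real table would be built from the Drug Portal stems page.
STEMS = {
    "olol": "beta-blocker",
    "pril": "ACE inhibitor",
    "cillin": "penicillin antibiotic",
    "mab": "monoclonal antibody",
}

# Longest stems first so the alternation prefers the more specific suffix.
_STEM_RE = re.compile(
    r"\b([a-z]+?)(" + "|".join(sorted(STEMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def find_stemmed_names(text):
    """Return (word, stem, drug_class) for each word ending in a known stem."""
    hits = []
    for m in _STEM_RE.finditer(text):
        stem = m.group(2).lower()
        hits.append((m.group(0), stem, STEMS[stem]))
    return hits


print(find_stemmed_names("Patients received atenolol or captopril."))
```

Suffix matching alone over-generates (the word "April" ends in "pril", for instance), so a stop list or a dictionary check would probably be needed on top of this.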
--
"I always retain copyright in my papers, and nothing in any contract I sign
with any publisher will override that fact. You should do the same".
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
|
Useful. We can hack activities out of this. Messy - the sort of thing
to do while watching cricket.
On Thu, Aug 29, 2019 at 5:33 PM Emanuel Faria ***@***.***> wrote:
Categories for Drug Activities — No ID numbers though
https://druginfo.nlm.nih.gov/drugportal/drug/categories
|
@petermr I found some more schema links. Let me know if you want me to keep posting these. I really don't know how to help you identify things we can connect to or pull from easily.
I googled "taxonomy of pharmalogical Activities" and ended up here: http://apps.who.int/medicinedocs/en/d/Js4895e/5.html
Maybe contacting this journal could help? https://www.tandfonline.com/doi/full/10.1080/13880209.2017.1323225
https://en.wikipedia.org/wiki/Semantic_Web
https://www.mediawiki.org/wiki/Wikibase/EntityData
https://schema.org/MedicalEnumeration
https://schema.org/docs/tree.jsonld
|
On Mon, Sep 2, 2019 at 9:16 PM Emanuel Faria ***@***.***> wrote:
> @petermr <https://github.com/petermr> I found some more schema links. Let
> me know if you want me to keep posting these. I really don't know how to
> help you identify things we can connect to or pull from easily.
Slow down on this! There are a million terms in UMLS/MeSH and we need about
1%.
We'll talk.
It is more important to see whether there is a consistent structure to the papers.
That's what will be valuable.
If we get sections on
activity
constitution
plant
that's what we need.
(I have used complex taxonomies in the past. They're sometimes useful, but a
simple list of words in Wikipedia is the most valuable.) One problem with
multiple taxonomies is that they don't map onto each other.
> I googled "taxonomy of pharmalogical Activities" and ended up here:
> http://apps.who.int/medicinedocs/en/d/Js4895e/5.html
> Maybe contacting this journal could help?
> https://www.tandfonline.com/doi/full/10.1080/13880209.2017.1323225
No
schema.org is valuable and works closely with Wikidata.
https://schema.org/MedicalEnumeration
https://schema.org/DrugClass
https://schema.org/DietarySupplement
https://schema.org/docs/tree.jsonld
https://schema.org/version/3.9/schema-all.html
https://schema.org/docs/releases.html
schemaorg/schemaorg#2306
The key thing is that hierarchies can be expanded or contracted according
to granularity. Thus
we may wish to search for "infective diseases" and have that automatically
expanded to, say, 300 diseases, or contrariwise find terms in text and want
to know the general sort.
I'll explain.
But first we'll do it with plants, and I need your help.
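To make the expand/contract idea concrete, here is a small sketch over a toy hierarchy. The terms, the `CHILDREN` table and the `expand` / `generalise` helpers are all illustrative assumptions, not an existing ContentMine or schema.org API; a real hierarchy would be loaded from a source such as Wikidata.

```python
# Toy hierarchy: parent -> children. All terms are illustrative assumptions.
CHILDREN = {
    "disease": ["infective disease", "skin disease"],
    "infective disease": ["malaria", "tuberculosis"],
    "skin disease": ["acne vulgaris", "eczema"],
}

# Inverted index used for contraction: child -> parent.
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def expand(term):
    """Return all leaf terms under `term` (the term itself if it is a leaf)."""
    kids = CHILDREN.get(term)
    if not kids:
        return [term]
    leaves = []
    for kid in kids:
        leaves.extend(expand(kid))
    return leaves


def generalise(term, levels=1):
    """Walk up `levels` steps in the hierarchy, stopping at the root."""
    for _ in range(levels):
        if term not in PARENT:
            break
        term = PARENT[term]
    return term


print(expand("infective disease"))  # search term -> specific diseases
print(generalise("eczema"))         # term found in text -> its general sort
```

A single-parent table like this is the simplest case; it would not cope with terms that sit under several parents, which is where multiple taxonomies stop mapping onto each other.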
--
Peter Murray-Rust
Founder ContentMine.org
and
Reader Emeritus in Molecular Informatics
Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
|
Whew. That’s great. You lead, I’ll follow.
|
We have 3200 diseases in ContentMine already. Some are not well organized
for searching. I'm keen that we develop some simple Machine Learning
(word2vec or TensorFlow or Keras) so you can recognize a term from its environment,
e.g.
"12 patients suffering from acne vulgaris, 10 from eczema and 9 from acute
vogonitis".
You've never heard of vogonitis? Nor have I - I made it up, but you can tell
it's a disease. This is called a Hearst Pattern and the software will pick
it up. We can do the same for chemicals and plants.
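A rough sketch of the idea using a plain regex rather than machine learning. Hearst's classic patterns are of the "X such as Y" kind; this adapts the same cue-phrase approach to the "suffering from" sentence above. The cue phrase, the "<count> from" splitting rule and the helper name `candidate_diseases` are assumptions made for illustration.

```python
import re

# Cue phrase that introduces the first list item; capture up to the sentence end.
_CUE = re.compile(r"suffering from\s+(.+?)(?:[.;]|$)", re.IGNORECASE | re.DOTALL)
# Later items are introduced by "<count> from", after a comma or "and".
_SPLIT = re.compile(r"\s*(?:,|\band\b)\s*\d+\s+from\s+", re.IGNORECASE)


def candidate_diseases(text):
    """Return phrases that the surrounding context marks as likely diseases."""
    found = []
    for m in _CUE.finditer(text):
        for item in _SPLIT.split(m.group(1)):
            item = " ".join(item.split())  # normalise whitespace and line breaks
            if item:
                found.append(item)
    return found


sentence = ("12 patients suffering from acne vulgaris, 10 from eczema "
            "and 9 from acute vogonitis.")
print(candidate_diseases(sentence))
```

The made-up "acute vogonitis" is picked up purely from its position in the sentence, with no dictionary lookup. The same trick should carry over to chemicals and plants with different cue phrases.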
|
:-)
On Tue, Sep 3, 2019 at 2:13 PM Emanuel Faria ***@***.***> wrote:
Vogonitis description:
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcS6ZhrKHEQZhfNFlx43aIGGjvZKIUs8smMKWEHCXpCA-rN0Y8-cjAlrBo5U
|
### Stuff to help us identify useful Text and Terms