Put 583 remaining compounds in Wikidata #5
Comments
@petermr, where can I find the list of 200 CIDs? |
It's on https://github.com/gilienv/EssOilDB/tree/master/tables/chemistry/
It's a bit messy, as we have split / forked the chemistry, and the
disambiguation and cleaning is going on there. Ambarish Kumar is doing a
good job, but he's not working on CEV. See
gilienv/EssOilDB#76, which has 100 comments, and
look at the latest. It may be easiest just to download the tables from
/tables/chemistry/
I have posted an HTML table today.
Do you need access? I'll post something.
On Wed, Aug 28, 2019 at 11:27 AM Egon Willighagen wrote:
> @petermr, where can I find the list of 200 CIDs?
--
Peter Murray-Rust
Founder ContentMine.org
and
Reader Emeritus in Molecular Informatics
Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
|
I need to park this until at least next week, I'm afraid. I've got some urgent stuff to solve first :( |
There's no rush
On Wed, Aug 28, 2019 at 2:57 PM Egon Willighagen wrote:
> I need to park this until at least next week, I'm afraid. I've got some
> urgent stuff to solve first :(
|
Sorry, there are too many files in that folder... I have no idea at this moment how to see which compounds have not been found in Wikidata yet (and that I should add). Help/suggestions welcome. |
We have copied the 2112 EssoilDB compounds to CEVOpen. @ambarish Kumar
<ambari73_sit@jnu.ac.in> is working on them. We found far too many synonyms
in PubChem and ChEBI, so we've cut those down to about 300 that were found
in EssoilDB 1.0.
Ambarish,
do we have a simple list of compounds in CEVOpen that do not have Wikidata
entries?
On Tue, Sep 10, 2019 at 8:50 AM Egon Willighagen wrote:
> Sorry, there are too many files in that folder... I have no idea at this
> moment how to see which compounds have not been found in Wikidata yet (and
> that I should add).
> Help/suggestions welcome.
|
Yes sir.
Please check the list of compounds that do not have a Wikidata ID.
notFoundWikidata.csv -
https://github.com/petermr/CEVOpen/blob/master/notFoundWikidata.csv
Total number of records: 583.
On Tue, Sep 10, 2019 at 4:11 PM Peter Murray-Rust wrote:
> We have copied the 2112 EssoilDB compounds to CEVOpen. @ambarish Kumar
> is working on them. We found far too many synonyms in PubChem and ChEBI,
> so we've cut those down to about 300 that were found in EssoilDB 1.0.
> Ambarish,
> do we have a simple list of compounds in CEVOpen that do not have Wikidata
> entries?
--
*AMBARISH KUMAR*
*M.Tech - 2014-2016*
*SC&IS ; Jawaharlal Nehru University,*
*New Delhi, INDIA*
*+91 - 8377964303*
*ambari73_sit@jnu.ac.in*
*er.ambarish@gmail.com*
|
Thanks Ambarish,
Egon, does this look tractable?
(I suspect that some of these - especially the esters, which may have
missing spaces - are not what the authors intended, but that doesn't alter
the validity of linking Wikidata to PubChem - it just means they may not
get used frequently.)
On Tue, Sep 10, 2019 at 12:29 PM Ambarish Kumar wrote:
> Yes sir.
> Please check the list of compounds that do not have a Wikidata ID.
> notFoundWikidata.csv -
> https://github.com/petermr/CEVOpen/blob/master/notFoundWikidata.csv
> Total number of records: 583.
|
Yes, thanks! |
Processing the file... |
This looks promising :)
|
For the next batch, I do find hits in Wikidata, tho. But that's not a problem. |
Okay, this is the workflow. On the above linked CSV file, I run this script: https://github.com/egonw/ons-wikidata/blob/master/EssOil/prepareInput.groovy This prepares the content for https://github.com/egonw/ons-wikidata/blob/master/Wikidata/createWDitemsFromSMILES.groovy which I run after that. The first (new) script fetches the SMILES for the compounds from PubChem. |
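A minimal sketch of the first step described above (fetching SMILES from PubChem), written in Python rather than the Groovy of the linked scripts. It is not a port of prepareInput.groovy. It assumes the CSV has a CID column named `cid` (a hypothetical header; the real layout of notFoundWikidata.csv may differ) and uses the PubChem PUG REST property endpoint with plain-text output, assuming one value per line in request order.

```python
import csv
import io
import urllib.request

# PubChem PUG REST base for compound lookups by CID.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid"


def read_cids(csv_text, column="cid"):
    """Return the non-empty values of `column` (a hypothetical header name)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row.get(column) or "").strip()
            for row in reader
            if (row.get(column) or "").strip()]


def smiles_url(cids):
    """Build the request URL for the isomeric SMILES of several CIDs."""
    return f"{PUG_REST}/{','.join(cids)}/property/IsomericSMILES/TXT"


def parse_smiles(cids, txt):
    """Pair each CID with one response line (assumes request order is kept)."""
    lines = [line.strip() for line in txt.splitlines() if line.strip()]
    if len(lines) != len(cids):
        raise ValueError("response does not have one SMILES per CID")
    return dict(zip(cids, lines))


def fetch_smiles(cids):
    """Fetch SMILES over the network; not called on import."""
    with urllib.request.urlopen(smiles_url(cids), timeout=30) as response:
        return parse_smiles(cids, response.read().decode("utf-8"))
```

Calling `fetch_smiles(read_cids(text))` needs network access; for a list of 583 records it is probably sensible to send the CIDs in modest chunks rather than in one request.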
I'm now doing the remaining batch: https://tools.wmflabs.org/quickstatements/#/batch/18772 |
Egon, this is great.
We are writing a paper for Mat Todd and it would be great to put all this in.
On Wed, 18 Sep 2019, 15:59 Egon Willighagen wrote:
> I'm now doing the remaining batch:
> https://tools.wmflabs.org/quickstatements/#/batch/18772
|
Hi all, so what is next for this issue? |
By creating a Bacting script that takes PubChem CIDs and adds the corresponding compounds to Wikidata.
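
As a rough illustration of what such a script has to produce, here is a minimal Python sketch (not the Bacting script itself, and much simpler than createWDitemsFromSMILES.groovy) that turns one compound into QuickStatements V1 commands. The function name and its parameters are hypothetical. The Wikidata identifiers used are P31 (instance of), Q11173 (chemical compound), P662 (PubChem CID) and P233 (canonical SMILES); a real run should first check that no item with the same CID already exists.

```python
def quickstatements_for(label, cid, canonical_smiles):
    """Build tab-separated QuickStatements V1 commands that create one item.

    Hypothetical helper for illustration; it adds an English label, an
    'instance of chemical compound' claim, the PubChem CID and the SMILES.
    """
    for value in (label, cid, canonical_smiles):
        # Keep the sketch simple: refuse values that would need escaping.
        if '"' in value or "\t" in value or "\n" in value:
            raise ValueError(f"unsupported character in {value!r}")

    lines = [
        "CREATE",
        f'LAST\tLen\t"{label}"',             # English label
        "LAST\tP31\tQ11173",                 # instance of: chemical compound
        f'LAST\tP662\t"{cid}"',              # PubChem CID
        f'LAST\tP233\t"{canonical_smiles}"', # canonical SMILES
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(quickstatements_for("ethanol", "702", "CCO"))
```

The printed commands can be pasted into a QuickStatements batch like the one linked earlier in the thread.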