Data visualisation #40

TS404 · 2020-04-07T10:25:59Z

I'm a big fan of a good visualisation. I'm going to start thinking about some possible visualisations using [R] and Shinyapps. Any assistance and ideas welcomed on possible individual or combined visualisations for:

Topics, themes, findings
Citations
Authors
Changes over time
Others?

Some existing examples:

Obviously networks and multidimensional scaling projections could be useful. Also probably circos plots between themes?

petermr · 2020-04-08T17:45:44Z

Very keen on this.
If you can make this appeal to a citizen audience that would be great. Citations and Authors may be seen as niche academic subjects whereas themes in everyday discourse (respirator, social distance) are likely to engage people.

TS404 · 2020-04-14T12:30:29Z

It's possible to do simple static diagrams via [R] packages like igraph.

A major bonus, however, might be interactive/responsive graphics. I've tested out the networkD3 and chorddiag packages (both of which are based on D3.js run via the R2D3 package). See

These should at least be sufficient to adapt for grouping and displaying sets of articles based on coauthors, citations or topics.

Ideally, I would eventually love to use the bundle variant of a chord diagram (example1 or example2, tutorial).

Initial tests of network for some covid authors:

petermr · 2020-04-14T13:21:41Z

Great!

Thomas Shafee wrote: It's possible to do simple static diagrams via [R] packages like igraph <https://igraph.org/r/>. A major bonus, however, might be interactive/responsive graphics. I've tested out the networkD3 <https://christophergandrud.github.io/networkD3> and chorddiag <https://github.com/mattflor/chorddiag> packages (both of which are based on D3.js <https://d3js.org/> run via the R2D3 package <https://rstudio.github.io/r2d3/articles/gallery.html>). See

Excellent. This could work for co-occurrences, e.g. in countries or diseases. I have just created (but not pushed) the first extraction of biorxiv700 (695 papers); due to coding bugs there are only 600.

These should at least be sufficient to adapt for grouping and displaying sets of articles based on coauthors, citations or topics. Ideally, eventually would love to use the bundle variant of a chord diagram (example1 ***@***.***/hierarchical-edge-bundling> or example2 ***@***.***/hierarchical-edge-bundling/2>, tutorial <https://www.youtube.com/watch?v=ROflkF1CVhI>). Initial tests of network for some covid authors:

Excellent. How do you want to receive the data? P.

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

TS404 · 2020-04-15T10:54:06Z

I've now added the visualisation code to wikiPackageTesting.R in the #Visualisations section

Data options:

Easiest: I should be able to read the HTML tables directly (e.g. full.dataTables.html). Failing that, a matrix of article vs (author / topic / citing article) in any tabular format (csv, tsv, whatever) should be sufficient to import.
Most size-efficient: a table of edges (start, end, weight for each) and a table of nodes (name and properties for each), per MisLinks and MisNodes here.
Ideal: Store all info in Wikidata, where I can then pull via SPARQL e.g. all publications with a main subject (P921) of covid-19 (Q84263196), SARS-CoV-2 (Q82069695), Coronavirus (Q290805) etc., along with their other topics, authors, citations, etc., e.g.:

SELECT DISTINCT ?work ?workLabel ?pdate ?topic ?topicLabel ?author1 ?author1Label ?citing_work WHERE {
  VALUES ?topics { wd:Q82069695 wd:Q84263196 wd:Q81068910 }
  ?work wdt:P31 wd:Q13442814;
    wdt:P921 ?topics.
  OPTIONAL { ?work wdt:P577 ?pdate. }
  OPTIONAL { ?work wdt:P921 ?topic. }
  OPTIONAL { ?work wdt:P50  ?author1. }
  OPTIONAL { ?citing_work wdt:P2860 ?work. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?pdate ?work ?workLabel  ?topic ?topicLabel ?author1 ?author1Label ?citing_work
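To make the "most size-efficient" option above more concrete, here is a minimal sketch of turning article-to-author rows into an edge table (start, end, weight) and a node table. The sample rows and the function name `build_edge_and_node_tables` are hypothetical, and the zero-based node ids are an assumption chosen because networkD3's forceNetwork expects zero-indexed links. It is written in Python only to stay dependency-free; the same table shapes should carry over to [R].

```python
from collections import Counter
from itertools import combinations


def build_edge_and_node_tables(article_authors):
    """Turn {article: [authors]} into co-author edge and node tables.

    Edges are dicts with start, end and weight, where weight is the
    number of articles two authors share. Node ids are zero-based
    positions in the node table.
    """
    # Node table: one row per distinct author, sorted for stable ids.
    names = sorted({a for authors in article_authors.values() for a in authors})
    index = {name: i for i, name in enumerate(names)}

    # Count each unordered author pair once per article.
    pair_counts = Counter()
    for authors in article_authors.values():
        for a, b in combinations(sorted(set(authors)), 2):
            pair_counts[(a, b)] += 1

    nodes = [{"id": index[n], "name": n} for n in names]
    edges = [
        {"start": index[a], "end": index[b], "weight": w}
        for (a, b), w in sorted(pair_counts.items())
    ]
    return edges, nodes


# Hypothetical sample data, not taken from biorxiv700.
sample = {
    "paper1": ["Author A", "Author B", "Author C"],
    "paper2": ["Author A", "Author B"],
}
edges, nodes = build_edge_and_node_tables(sample)
```

Swapping the author lists for topic lists would give topic co-occurrence edges in the same format.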

petermr · 2020-04-15T12:34:26Z

Thanks @TS404

I've now added the visualisation code to wikiPackageTesting.R in the #Visualisations section

Well done. Can you give some screen shots?

Data options:

Easiest: I should be able to read the HTML tables directly (e.g. full.dataTables.html). Failing that, a matrix of article vs (author / topic / citing article) in any tabular format (csv, tsv, whatever) should be sufficient to import.

That should be possible. Note there are usually many entries in a facet-cell. If you are just looking at bibliographic data we may manage things. There are multiple authors per article. How do we manage that?

And what is a "citing" article? We don't have, and won't have, a citation graph.

Most size-efficient: a table of edges (start, end, weight for each) and a table of nodes (name and properties for each), per MisLinks and MisNodes here.

I don't understand where these edges come from, or what a MisLink or MisNode is.

Ideal: Store all info in wikidata, where I can then pull via SPARQL e.g. all publications with a main subject (P921) of covid-19 (Q84263196), SARS-CoV-2 (Q82069695), Coronavirus (Q290805) etc along with their other topics, authors, citations, etc. e.g:

That would be great. Presumably a question of getting this accepted by Wikidatans, but DanielM put millions of bibliographic references into Wikidata.

(HEY! we should be adding QIDs for publications. That would be great!)

Note also that I have not got pointers back to Biorxiv working properly.

Are they putting preprints into Wikidata?

deadlyvices · 2020-04-15T13:21:07Z

When I worked at AZ the New Opportunities group did a visualization where they examined the author list, and then ranked authors by first author, last author and secondary contributor count in papers. It was a triangular plot, as you'd use for a three-component phase diagram. I think they call it a 'ternary plot':

So we could do that for a particular topic search and then get to identify the key opinion leaders.

petermr · 2020-04-15T13:56:44Z

Note that we haven't got a simple approach to bibliography. We can do JATS from EPMC. JATS is not always much fun as there can be authorstrings (i.e. all authors run together) and disambiguation problems (no ORCIDs).
What is the driver for this? I suspect academics will use it but who else?

deadlyvices · 2020-04-15T15:14:49Z

I think it's useful to know who is helping to lead an area of investigation. I've been playing around and have been able to generate the percentages of publications for each author as first author, last author and other. Spotfire doesn't do ternary plots, so I generated a parallel coordinate plot:

deadlyvices · 2020-04-15T15:15:36Z

So it's possible to generate the input.
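A minimal sketch of how those first/last/other percentages could be computed from ordered author lists is below. The function name `author_role_shares` and the sample papers are hypothetical, and it assumes each author appears at most once per paper and that a sole author counts as first author. The three shares sum to 100 for each author, which is the property a ternary plot relies on.

```python
from collections import defaultdict


def author_role_shares(author_lists):
    """Return {author: (first %, last %, other %)} across all papers."""
    counts = defaultdict(lambda: [0, 0, 0])  # first, last, other
    for authors in author_lists:
        for pos, name in enumerate(authors):
            if pos == 0:
                counts[name][0] += 1  # first author (also sole authors)
            elif pos == len(authors) - 1:
                counts[name][1] += 1  # last author
            else:
                counts[name][2] += 1  # any middle position
    shares = {}
    for name, (first, last, other) in counts.items():
        total = first + last + other
        shares[name] = tuple(
            round(100 * c / total, 1) for c in (first, last, other)
        )
    return shares


# Hypothetical ordered author lists, one per paper.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author C"],
    ["Author B", "Author A", "Author C"],
]
shares = author_role_shares(papers)
```

With real data, the hard part is likely to be the name disambiguation already raised in this thread rather than the counting itself.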

TS404 · 2020-04-16T12:10:12Z

Data format and storage

The facet cell listing multiple items is fine (essentially I'll aim to turn it into a nested list in [R]). Similarly, ideally there should be a column listing all the authors of a publication (disambiguating to QIDs will be the greatest challenge) but as plaintext strings is fine as a backup.
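As a rough sketch of the nested-list idea, the snippet below reads a tabular export and splits multi-valued cells into lists. It assumes, purely for illustration, that items within a cell are separated by semicolons and that the columns are called `authors` and `topics`; the real layout of full.dataTables.html may well differ. It is in Python to stay dependency-free, but the equivalent in [R] would be a list-column.

```python
import csv
import io


def read_faceted_table(text, multi_columns, sep=";"):
    """Read CSV text and split the named multi-valued columns into lists."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in multi_columns:
            # Empty cells become empty lists; stray whitespace is trimmed.
            row[col] = [
                item.strip() for item in row[col].split(sep) if item.strip()
            ]
        rows.append(row)
    return rows


# Hypothetical two-row table; column names are made up for the example.
sample_csv = (
    "article,authors,topics\n"
    "paper1,Author A; Author B,respirator; social distance\n"
    "paper2,Author C,\n"
)
table = read_faceted_table(sample_csv, multi_columns=["authors", "topics"])
```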

I've checked over at Wikidata's Wikiproject COVID-19 and it seems there are already a few hundred preprints listed in Wikidata, so it shouldn't be too controversial to add all the covid-relevant ones (and eventually others).

Visualisations

Visualisations focusing on topics and publications have the clearest immediate public value, showing where the main research threads are heading.

Visualisations focusing on authors can demonstrate which authors are collaborative (and which are in silos) and in what roles and can help researchers to identify people to watch or contact for collaboration. I like the idea of separating first/middle/last if possible (like this query).

I've done a bit more stress-testing of the code for networks of different sizes (e.g. see Anthony Fauci's co-author network below). Next step, I'll start tweaking it to make the nodes=publications and the links=topic_similarity.

Anthony Fauci's co-author network, larger circles and thicker lines indicate people he's co-authored more with. For interactive version, see WDNetworkVis.nb.html
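For the planned nodes=publications, links=topic_similarity variant, one possible reading is sketched below using Jaccard overlap between topic sets. The choice of Jaccard is an assumption (the thread does not say which similarity measure will be used), and the names `jaccard`, `similarity_links` and the sample data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared topics divided by all topics across the two publications."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_links(pub_topics, threshold=0.0):
    """Return (pub1, pub2, score) for every pair scoring above threshold."""
    links = []
    for (p, tp), (q, tq) in combinations(sorted(pub_topics.items()), 2):
        score = jaccard(tp, tq)
        if score > threshold:
            links.append((p, q, score))
    return links


# Hypothetical publications and topics.
pubs = {
    "paper1": ["covid-19", "respirator"],
    "paper2": ["covid-19", "social distance"],
    "paper3": ["influenza"],
}
links = similarity_links(pubs)
```

Raising `threshold` would thin out weak links, which may matter for readability once every publication shares a very common topic such as covid-19.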

TS404 · 2020-04-19T09:54:16Z

Ok, so I've managed to get the concomitant co-topic graph working reasonably robustly!

In order to make it interactive, I've built a simple shiny app. It works fine locally, but the version on shinyapps.io seems to still be having problems (I've left a query on Stack Overflow).

Website: https://ts404.shinyapps.io/topicnetwork
Code: https://github.com/TS404/TopicNetwork

Once I've managed to get it properly working online, next steps for the visualisation:

Take the topics graph from biorxiv700/full.dataTables.html as the input rather than only wikidata
Present chord diagram as well
Improve the click actions
a. select node to list publications on that topic?
b. select multiple nodes to subset?
c. easy navigation to wikidata/publication/scholia
d. loading time indicator? (larger wikidata queries can take >30)

Local instance of TS404/topicnetwork.

Same data visualised as chord diagram (not yet included in TS404/topicnetwork).
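Chord diagrams generally take a square matrix as input, so the same co-topic data can be reshaped as sketched below. This is a hedged illustration rather than the code in TS404/topicnetwork: the function name `cooccurrence_matrix` and the sample data are hypothetical, and each cell simply counts the publications that list both topics.

```python
from itertools import combinations


def cooccurrence_matrix(pub_topics):
    """Return (labels, matrix) where matrix[i][j] counts shared publications."""
    labels = sorted({t for topics in pub_topics.values() for t in topics})
    pos = {t: i for i, t in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for topics in pub_topics.values():
        # Each unordered topic pair is counted once per publication,
        # and mirrored so that the matrix stays symmetric.
        for a, b in combinations(sorted(set(topics)), 2):
            matrix[pos[a]][pos[b]] += 1
            matrix[pos[b]][pos[a]] += 1
    return labels, matrix


# Hypothetical publications and topics.
pubs = {
    "paper1": ["covid-19", "respirator"],
    "paper2": ["covid-19", "respirator", "social distance"],
}
labels, matrix = cooccurrence_matrix(pubs)
```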

petermr · 2020-04-19T15:20:05Z

Well done! I also get "Disconnected from the server" - is that the problem? (Chrome)


petermr · 2020-04-19T15:20:42Z

Ah... you seem to have got some suggestions on Stack Overflow.

TS404 · 2020-04-20T00:53:43Z

The CJ Yetman comment fixed it! Try https://ts404.shinyapps.io/topicnetwork now! I'll have to test why the fix works to avoid re-introducing the problem later, but v. useful for now.

TS404 · 2020-04-27T11:02:44Z

Updates to https://ts404.shinyapps.io/topicnetwork now enable it to report back the list of publications that are about a set of subjects. Currently picked based on checkboxes, but eventually I'd like it to be based on clicking the nodes.
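The subsetting step can be sketched as below. Whether a publication must match all of the ticked subjects or any one of them is not stated here, so the `match_all` switch is an assumption, as are the function name `publications_about` and the sample data.

```python
def publications_about(pub_topics, selected, match_all=True):
    """List publications tagged with the selected subjects.

    match_all=True keeps publications carrying every selected subject;
    match_all=False keeps those carrying at least one.
    """
    wanted = set(selected)
    if not wanted:
        return []  # nothing ticked, nothing listed
    hits = []
    for pub, topics in sorted(pub_topics.items()):
        have = set(topics)
        matched = wanted <= have if match_all else bool(wanted & have)
        if matched:
            hits.append(pub)
    return hits


# Hypothetical publications and topics.
pubs = {
    "paper1": ["covid-19", "respirator"],
    "paper2": ["covid-19", "social distance"],
    "paper3": ["influenza"],
}
both = publications_about(pubs, ["covid-19", "respirator"])
either = publications_about(pubs, ["respirator", "influenza"], match_all=False)
```

The same function would work unchanged if the selection came from clicked nodes instead of checkboxes, since it only needs the list of chosen subjects.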

petermr · 2020-04-27T11:40:44Z

This is fantastic. (Small comments: it's somewhat slow computationally, and it's not easy to read the labels. But it shows new clusters. Exciting.)


TS404 closed this as completed Apr 15, 2020

TS404 reopened this Apr 15, 2020

This was referenced Apr 15, 2020

Feature request: collapse or bundle nodes christophergandrud/networkD3#258

Closed

Customizing the label/text for each link in ForceNetwork output? christophergandrud/networkD3#271

Closed

New visualisations WDscholia/scholia#1108

Open

petermr closed this as completed Apr 15, 2020

petermr reopened this Apr 15, 2020
