Teams & Ideas

Martin Fenner edited this page Jul 6, 2013 · 36 revisions

Don't wait until the day of the event (July 6) to come up with your idea or organize your team. Hack events are about self-organization, whether you decide to join a team or go solo. Use this space to pitch your ideas ahead of time and start forming teams around those ideas.

Suggested format for those with an idea:

  1. The idea (no more than 1-2 paragraphs, but links to other pages are OK).
  2. Your name
  3. What skills you bring to make your idea happen
  4. What complementary skills you still need in any teammates.

Anyone wanting to join that team should add their name below the idea.


Figure Mining & Enrichment

  1. Idea: Figure Mining & Enrichment. Mash up PLOS figures, classify them into types, and enrich or further annotate their existing metadata. See if we can extract data from figures (e.g. the coordinates of the points in an x,y plot) and provide that data in a machine-readable form.

Tools/Approaches: OCR, machine learning (including supervised machine learning), a broad metadata ontology
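If the axes of a plot can be located (for example by OCR of the tick labels), turning the pixel positions of plotted points into data values is a linear mapping per axis. A minimal Python sketch, assuming a linear (not logarithmic) scale and two already-identified ticks per axis; the class and function names and the calibration numbers are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class AxisCalibration:
    """Two reference ticks on one axis: their pixel positions and the values they label."""
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        # Linear interpolation between the two ticks; assumes a linear (not log) axis.
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        return self.value_a + (pixel - self.pixel_a) * scale


def pixels_to_data(points, x_axis: AxisCalibration, y_axis: AxisCalibration):
    """Convert (px, py) image coordinates of plotted points into (x, y) data values."""
    return [(x_axis.to_value(px), y_axis.to_value(py)) for px, py in points]


# Hypothetical calibration: x tick "0" at pixel 50 and "10" at pixel 450;
# y tick "0" at pixel 400 and "100" at pixel 0 (image rows grow downwards).
x_cal = AxisCalibration(50, 0.0, 450, 10.0)
y_cal = AxisCalibration(400, 0.0, 0, 100.0)
data = pixels_to_data([(250, 200), (450, 0)], x_cal, y_cal)
```

With this calibration the point at pixel (250, 200) comes out as (5.0, 50.0). Finding the axes and the points in the first place is the hard OCR / machine-learning part and is not shown.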

I also have 4 million unique DOIs from CiteULike that I'd like to explore and classify by publisher, journal, etc. (not sure if this is relevant to the hack day aims, but I'll just throw it out there; it's an interesting chunk of data...)
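A cheap first pass at classifying DOIs by publisher is to group them by registrant prefix, the "10.NNNN" part before the first slash (10.1371, for example, is the prefix PLOS uses). A prefix identifies whoever registered the DOI, which is often but not always the publisher, and mapping prefixes to publisher or journal names would need an external lookup such as CrossRef metadata, which is not shown. A Python sketch with made-up sample DOIs:

```python
from collections import Counter
from typing import Optional

# Common ways a DOI is written besides the bare "10.xxxx/suffix" form.
DOI_LEADS = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def doi_prefix(doi: str) -> Optional[str]:
    """Return the registrant prefix ("10.xxxx") of a DOI, or None if it looks malformed."""
    doi = doi.strip()
    for lead in DOI_LEADS:
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        return None
    return prefix


def count_prefixes(dois):
    """Tally DOIs per registrant prefix; malformed entries are counted under None."""
    return Counter(doi_prefix(d) for d in dois)


# Made-up example DOIs (only the prefixes are meaningful here).
sample = ["10.1371/example.1", "doi:10.1371/example.2",
          "https://doi.org/10.1038/example.3", "not-a-doi"]
counts = count_prefixes(sample)
```

For the full set, `counts.most_common(20)` would give a quick picture of which registrants dominate.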

  2. Ross Mounce, Community Coordinator for Open Science at the Open Knowledge Foundation

  3. Skills: enthusiasm

  4. Need teammates!

Peter Murray-Rust: bringing PDF2XML technology (alpha); let's see what we can read.


Suggestion engine for relevant / interesting scholarly articles

  1. No idea if this is in the remit of the event, or if it is too big / ambitious (I'm really not sure what to expect on the day!) but I'm going to throw it in and see what people think!

Mendeley is a great tool for organising papers / articles / conference proceedings for academic work. However, for discovering new articles it could do more. Listening to Spotify Radio one day, it occurred to me: could the same algorithms that Spotify and its like (Pandora, TasteKid, etc.) use to find new music be used to discover new research articles? Could we use some technique (e.g. a multivariate classifier, an SVM, etc.) to learn associations between articles and use these to recommend articles that are not yet in a user's Mendeley account? Mendeley has over 2 million accounts from which associations between articles could be derived. Citation relationships, which could perhaps be pulled from databases such as PubMed, could also be used. The matching algorithm could also be restricted to consider only one article or a group of articles if the user wants to find something on a specific topic.

As a researcher, I often stumble on articles that are highly relevant to me and that I wish I had found earlier, but didn't because I was using the wrong search terms or looking in the wrong databases / journals. Such a tool would increase researchers' exposure to the latest trends in their research field.
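One well-known way to get "people who saved this also saved" suggestions is item-based collaborative filtering. It is a swapped-in technique, not necessarily what Spotify or Pandora actually do. A minimal Python sketch, assuming each user's library is already available as a set of article IDs (no Mendeley API calls are shown, and the toy data is made up): two articles are similar when the same users save them, and candidates are ranked by summed cosine similarity to a seed set, which can be a whole library or just one article or group of articles on a specific topic.

```python
import math
from collections import defaultdict


def item_similarities(libraries):
    """Cosine similarity between articles, based on which users saved them together.

    `libraries` maps a user ID to the set of article IDs in that user's library.
    """
    readers = defaultdict(set)  # article ID -> users who saved it
    for user, articles in libraries.items():
        for article in articles:
            readers[article].add(user)

    sims = defaultdict(dict)
    items = list(readers)
    # Quadratic in the number of articles: fine for a toy, not for millions.
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(readers[a] & readers[b])
            if shared:
                score = shared / math.sqrt(len(readers[a]) * len(readers[b]))
                sims[a][b] = sims[b][a] = score
    return sims


def recommend(seed_articles, sims, top_n=5):
    """Rank unseen articles by summed similarity to the seeds (a library or a topic subset)."""
    seeds = set(seed_articles)
    scores = defaultdict(float)
    for seed in seeds:
        for other, score in sims.get(seed, {}).items():
            if other not in seeds:
                scores[other] += score
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


libraries = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"B", "D"},
}
sims = item_similarities(libraries)
print(recommend(libraries["u2"], sims))  # ['C', 'D']
```

Here user u2 (who has A and B) gets C ranked above D, because C is saved alongside both A and B. The pairwise loop is quadratic, so at the scale of 2 million accounts a sparse-matrix or approximate nearest-neighbour approach would be needed.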

  2. Mark Drakesmith, a post-doctoral neuroscientist at Cardiff University

  3. Some programming skills (MATLAB, Python and a bit of C++) but no experience of 'hacking', handling databases, etc. A keenness to do something outside my comfort zone!

  4. Anyone who is interested! Particularly people with more knowledge of, or experience with, accessing and using this type of data.

Georg Walther:

Love the idea. We could start by fingerprinting the abstract and/or main text (if available) of articles. One way to make a start would be to use a text-processing library to parse the corresponding bodies of text and count the occurrences of all / some words as a fingerprint. This would probably require storing these fingerprints for later queries. Your idea also seems to be of general interest to Mendeley:

Just a bit of fun ... probably nothing useful at this stage.
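The word-count fingerprint described above is a bag-of-words vector, and cosine similarity is one common way to compare two of them. A minimal Python sketch using only the standard library; the tokenisation and the tiny stop-word list are simplifying assumptions:

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "we", "for", "on", "with"}


def fingerprint(text):
    """Bag-of-words fingerprint: lower-cased word counts, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(fp_a, fp_b):
    """Cosine similarity of two fingerprints (0 = no shared words, 1 = same proportions)."""
    dot = sum(count * fp_b[word] for word, count in fp_a.items())
    norm_a = math.sqrt(sum(c * c for c in fp_a.values()))
    norm_b = math.sqrt(sum(c * c for c in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


a = fingerprint("Diffusion MRI of white matter in the brain")
b = fingerprint("White matter tractography with diffusion MRI")
c = fingerprint("Citation counts of open access journals")
print(cosine(a, b), cosine(a, c))
```

The first two titles share four of their five content words, giving a similarity of 0.8, while the third shares none. Storing the fingerprints for later queries could start as simply as a dict from DOI to `Counter`.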

PDFs can probably be handled with pdftotext and pdfinfo (command-line tools available on Linux), which seem to be used by Zotero for metadata extraction.
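Wrapping the two tools from Python might look like the sketch below. It assumes Poppler (or Xpdf) is installed and says nothing about how Zotero itself calls them; the sample output is made up but follows pdfinfo's "Key: value" layout.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict:
    """Turn pdfinfo's "Key:   value" lines into a dict, splitting on the first colon only."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def pdf_metadata(path: str) -> dict:
    """Run pdfinfo on a PDF (needs Poppler or Xpdf installed and on the PATH)."""
    result = subprocess.run(["pdfinfo", path], capture_output=True,
                            encoding="utf-8", errors="replace", check=True)
    return parse_pdfinfo(result.stdout)


def pdf_text(path: str) -> str:
    """Extract plain text with pdftotext; "-" as the output file sends the text to stdout."""
    result = subprocess.run(["pdftotext", "-enc", "UTF-8", path, "-"], capture_output=True,
                            encoding="utf-8", errors="replace", check=True)
    return result.stdout


# Made-up output in pdfinfo's layout; note that values may themselves contain colons.
sample = """Title:          An example paper
Author:         A. Researcher
CreationDate:   Thu Jun 27 10:00:00 2013
Pages:          12"""
print(parse_pdfinfo(sample)["CreationDate"])  # Thu Jun 27 10:00:00 2013
```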

Browser Plugin(s) for PubPeer

  1. PubPeer seems to be a new and promising platform for post-publication review / discussion of scientific articles.

Let's have a stab at developing a browser plugin, assuming we can get hold of PubPeer's API.

Ideas for the plugin:

  • add a browser button / GUI element for the plugin
  • when the user visits the abstract or full-text view of an article, detect the article's DOI and query PubPeer for existing comments; to fetch DOIs or other identifiers we can probably make use of Zotero translators (if I understand these correctly)
  • if comments exist, alert the user to their existence (non-intrusively)
  • give the user the ability to jump directly to the corresponding PubPeer page of the viewed article
  • (more advanced?) open a new panel / window that shows the corresponding PubPeer comments and allows the user to leave comments while keeping the browser window pointed at the article

  2. Georg Walther

  3. Python, C; hardly any experience with web services or browser plugins, but keen to learn

  4. Browser plugin people; JavaScript; XUL (XML)
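The DOI-detection step from the plugin idea could start with a simple heuristic: prefer a `<meta name="citation_doi">` tag, which many publisher pages include, and otherwise take the first DOI-like string in the page. The sketch is in Python for readability (a real Firefox or Chrome plugin would be JavaScript), uses a pattern modelled on the one CrossRef suggests for modern DOIs, and is a simpler stand-in for Zotero translators rather than a description of them. Querying PubPeer is left out because its API is still an open question.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Modelled on CrossRef's suggested pattern for modern DOIs (covers most, not all, DOIs).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


class _MetaDOIParser(HTMLParser):
    """Collects DOIs from <meta name="citation_doi" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "citation_doi" and attrs.get("content"):
            self.dois.append(attrs["content"].strip())


def detect_doi(html: str) -> Optional[str]:
    """Return the article DOI from a page: metadata first, then the first DOI-like string."""
    parser = _MetaDOIParser()
    parser.feed(html)
    if parser.dois:
        return parser.dois[0]
    match = DOI_RE.search(html)
    # Trailing punctuation is usually sentence punctuation, not part of the DOI.
    return match.group(0).rstrip(".,;") if match else None


page = '<html><head><meta name="citation_doi" content="10.1371/example.123"></head></html>'
print(detect_doi(page))  # 10.1371/example.123
```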

Author contributions in PLOS papers

All PLOS papers (currently about 80,000) include author contributions in a format similar to this:

Conceived and designed the experiments: HQ JKC AR NH. 
Performed the experiments: HQ JKC AR MP. 
Analyzed the data: HQ JKC AR MP NH. 
Contributed reagents/materials/analysis tools: CH. 
Wrote the paper: HQ JKC AR NH.

I want to use the PLOS Search API to do a systematic analysis of these author contributions, e.g. how many times the first author was involved in writing the paper, or how often we have co-authors who appear only in the "contributed reagents/materials/analysis tools" section.
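Before pulling data through the API, the parsing step itself can be sketched. The project plans to use R and rplos; this Python sketch only illustrates the logic on the sample above and assumes every paper follows that exact "Role: initials." layout. The first author's initials would have to come from the article's author list, which is not fetched here.

```python
import re
from collections import defaultdict

# Assumes the PLOS style shown above: "Role: INITIALS INITIALS." with no periods inside initials.
CONTRIB_RE = re.compile(r"([^:.]+):\s*([^.]*)(?:\.|$)")
REAGENTS = "Contributed reagents/materials/analysis tools"


def parse_contributions(text):
    """Map each contribution role to the list of author initials given for it."""
    return {role.strip(): initials.split() for role, initials in CONTRIB_RE.findall(text)}


def roles_by_author(roles):
    """Invert role -> initials into initials -> set of roles."""
    by_author = defaultdict(set)
    for role, initials in roles.items():
        for person in initials:
            by_author[person].add(role)
    return by_author


def reagents_only(roles):
    """Authors whose only listed contribution is reagents/materials/analysis tools."""
    return sorted(p for p, r in roles_by_author(roles).items() if r == {REAGENTS})


sample = """Conceived and designed the experiments: HQ JKC AR NH.
Performed the experiments: HQ JKC AR MP.
Analyzed the data: HQ JKC AR MP NH.
Contributed reagents/materials/analysis tools: CH.
Wrote the paper: HQ JKC AR NH."""

roles = parse_contributions(sample)
first_author = "HQ"  # hypothetical: would come from the article's author list
print(first_author in roles.get("Wrote the paper", []))  # True
print(reagents_only(roles))  # ['CH']
```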

This idea is also an exercise in searching the PLOS CC-BY content for machine-readable information, and in using R for data analysis and visualization. I would be happy to introduce people to R and to the rplos package created by rOpenSci, which makes working with the PLOS Search API much easier. We will do some nice visualizations with the results and write a report in Markdown (using the R knitr package) that can be posted to the hack4ac website.

  1. Martin Fenner, technical lead of the PLOS article-level metrics project
  2. Experience in R, Ruby, Javascript, PHP
  3. People who can help with asking good questions, data analysis and writing. People with skills in R or interested in learning R; experience with Solr query syntax is a bonus.

Scott Chamberlain: Do let me know if you have any problems (via GitHub issues), or yell at me on Twitter at @recology_

Scott, the basic function we need is working fine:

library(rplos)
result <- searchplos(terms = "*:*", fields = "id,author_notes", toquery = "doc_type:full", limit = 10, key = "YOUR_PLOS_API_KEY")

Great! (from Scott)

More information in the README of the repo for this hackathon.

Open Journal Typesetting and Citation Parsing

  1. Idea: Open Journal Typesetting and Citation Parsing. Improve the layout mechanisms of, and help me work on, an automated Word/OpenOffice to NLM journal XML typesetter, including citation parsing.

Tools/Approaches: XSLT, XML, regular expressions, machine learning, free_cite
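The real pipeline is planned around XSLT and free_cite; as a much smaller illustration of what citation parsing has to produce, here is a hand-rolled Python sketch that splits one hypothetical reference style into fields and builds an NLM/JATS-style `<element-citation>`. The style, the regex and the function names are assumptions; real reference lists are far messier (a title containing a full stop already breaks this pattern), which is why trained parsers like free_cite exist.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical reference style:
#   Surname, Initials; Surname, Initials (Year). Title. Journal, Volume(Issue), First-Last.
REF_RE = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<source>[^,]+),\s+(?P<volume>\d+)(?:\((?P<issue>[^)]+)\))?,\s+"
    r"(?P<fpage>\d+)(?:[-\u2013](?P<lpage>\d+))?\.?$"
)


def parse_reference(ref: str):
    """Return the reference's fields as a dict, or None if it doesn't fit the assumed style."""
    m = REF_RE.match(ref.strip())
    return m.groupdict() if m else None


def to_element_citation(fields: dict) -> ET.Element:
    """Build an NLM/JATS-style <element-citation> element from parsed fields."""
    cit = ET.Element("element-citation", {"publication-type": "journal"})
    group = ET.SubElement(cit, "person-group", {"person-group-type": "author"})
    for author in fields["authors"].split(";"):
        surname, _, given = author.strip().partition(",")
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname.strip()
        ET.SubElement(name, "given-names").text = given.strip()
    for tag, key in [("year", "year"), ("article-title", "title"), ("source", "source"),
                     ("volume", "volume"), ("issue", "issue"),
                     ("fpage", "fpage"), ("lpage", "lpage")]:
        if fields.get(key):  # optional parts such as the issue may be missing
            ET.SubElement(cit, tag).text = fields[key]
    return cit


ref = "Smith, J.; Jones, A. B. (2012). A study of things. Journal of Examples, 12(3), 45-67."
fields = parse_reference(ref)
print(ET.tostring(to_element_citation(fields), encoding="unicode"))
```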

  2. Martin Eve, Founder, The Open Library of Humanities

  3. Skills: XSLT, XML, Regex

  4. Need teammates!


  1. Open Educational Resources (OERs) are valuable for independent learners, but contextualising them and adapting them to a specific curriculum or to local needs is often impossible. Learning facilitators need a platform that allows them to 'remix' OERs from a range of different sources into the sequence that suits their context. The first step towards this is to adapt an LMS (I am currently thinking of working with the edX platform) with course remixing and mashups in mind.

While publishing is the main output for some academics, many also produce teaching content and want to share that as well. Sharing these teaching resources alongside academic publications creates a rich pool of Open Educational Resources for remixing by lecturers, institutions and others who may lack certain areas of expertise, or who may simply want to work with a "flipped classroom" pedagogical approach. I'd like to create an easy tool for sharing and remixing high-quality OERs into other formats, making academia more widely accessible and pedagogically progressive.

  2. Joel Mitchell

  3. design, UX/UI, html, js, ruby

  4. Python, Django (thinking of the edX platform here, but could use something else)

BIOTEA, RDFizing PubMed Central in support of the paper as an interface to the Web of Data

  1. BIOTEA provides RDF for the full-text, open-access subset of PubMed Central (PMC). PMC is a free full-text archive of biomedical literature; it currently includes 1,679 journals. We identify semantic entities in the content and structure them using the Annotation Ontology, so the content is fully immersed in the web of data. BIOTEA is fully compliant with Bio2RDF. We want to create a geo map of authors and affiliations. We are also interested in new interfaces to scholarly documents, in this case this specific dataset: How can we deliver new reading experiences? How can we facilitate rapid, concept-based reading of scholarly documents? How can we best use mashup technology to build a new reading experience for scholarly documents? We have some prototypes we would like to share and discuss. BTW, there is also a corresponding paper.

  2. Leyla Garcia, Alexander Garcia

  3. design, Interactive Interfaces, js, Human Interface Interaction, creative, imaginative, think out of the box

  4. a pencil and a piece of paper where you can draft a workflow supporting a new experience in reading, searching and retrieving documents

JailBreaking the PDF

  1. Currently, the bulk of peer-reviewed scientific knowledge is locked up in PDF documents, from which it is difficult to extract information. We want to change that. We recently had a hackathon at ESWC in Montpellier, France. It was a great experience; we want to share the outcomes and challenges and invite you all to join us. How do we extract meaningful information from PDFs? How do we transform everything in a PDF into usable data? How do we get all citations out of PDFs in usable formats? How do we regain control over our content instead of keeping it jailed in PDFs? We want to build a REAL, TRULY OPEN library of scholarly communication. Our technology is applicable to any other domain. We have datasets available.

  2. Leyla Garcia, Alexander Garcia

  3. design, Interactive Interfaces, js, Java, creative, imaginative, think out of the box, any programming language

  4. a pencil and a piece of paper where you can draft a workflow supporting a new experience in archiving and delivering usable data from PDFs.

Science gists - A thingamajig to bridge the science gap, a "simple English" version of science

  1. One of the key issues (there are quite a few) in science today is the distance between the general public and the scientist (the science gap). For an entertaining overview of the problem, see the talk by Jorge Cham (the creator of PhD Comics). There are simply too many steps a scientific discovery has to take before it makes it from the mind of a scientist to the mind of a non-scientist, so the message often gets quite distorted along the way (a game of Chinese whispers played by way too many people). In the age of the internet, this is unacceptable. I want scientists to communicate more with the world, and I want to enable them to do this by building a website where they can contribute simple explanations of the work they did in a particular paper. Each "gist" is linked to a paper and can be referenced in a paper with a friendly URL. The CC BY license is perfect for this, since it allows remixing of the paper content (key figures, quotes). The "gists" can also be contributed by other people (i.e. not the authors of the paper), which the CC BY license also enables.

  2. My name is: Jure Triglav

  3. What I bring to the table: I'm a full stack developer / designer.

  4. What skills I need most in my team: Story writers to help convey the idea to both ends of the spectrum of users (scientists and the public)

Martin: PLOS Medicine (and most certainly other journals that provide their content as CC-BY) has the Editor's Summary, a second abstract that describes the paper in simpler terms. Here is an example.
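A gist needs little more than a link to its paper, the explanation itself and a stable, readable URL. A minimal Python sketch of such a record; the field names and the `/gists/<slug>` URL scheme are hypothetical:

```python
import re
import unicodedata
from dataclasses import dataclass


def slugify(title: str, max_words: int = 8) -> str:
    """Lower-case, ASCII-only, hyphen-separated slug built from a paper title."""
    # NFKD splits accented letters into base letter + accent; the accent is then dropped.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words[:max_words])


@dataclass
class Gist:
    doi: str            # the paper the gist explains
    paper_title: str
    summary: str        # the "simple English" explanation
    contributor: str    # an author of the paper or anyone else (CC BY allows both)

    def friendly_path(self) -> str:
        # Hypothetical URL scheme; a real site would also need to handle slug collisions.
        return f"/gists/{slugify(self.paper_title)}"


gist = Gist("10.1371/example.1", "Why do cats purr? A review", "Cats purr because...", "Jure")
print(gist.friendly_path())  # /gists/why-do-cats-purr-a-review
```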

CC0 images

  1. Images are often the most important part of a paper: photographs in astronomy, gels, histology, animals, electron micrographs, etc. In my opinion (and, I hope, hack4ac's), these are data, not copyrightable by publishers. However, some publishers such as Springer have copyrighted every image they publish (even when the images are not theirs to copyright; see previous blogs). If authors added a stamp to images asserting that their material was Open (CC0), then the image would be protected for all time. It should be easy (I hope) to do this, and I will try to hack it before the workshop.

  2. Peter Murray-Rust

  3. I can hack Java and hope to bring a working prototype.

  4. I'd like at least one person who is happy to set up a Java server for running a demonstration service and also anyone who wants to evangelise it. It would be fantastic if a CC-BY publisher (or all of them) thought it was a Good Idea and offered it to their authors. Then the idea would spread virally (we can at least hope).
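The proposal doesn't say what form the stamp takes (and the prototype will be in Java). As one possible approach, here is a Python sketch that writes a machine-readable CC0 statement into a PNG file as a tEXt chunk using the PNG-registered "Copyright" keyword. It only covers PNG (JPEG or TIFF images would need EXIF or XMP metadata instead), and image tools can strip such chunks.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
CC0_TEXT = "CC0 1.0 Universal (public domain dedication): https://creativecommons.org/publicdomain/zero/1.0/"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialise one PNG chunk: 4-byte length, 4-byte type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (offset, chunk type) for every chunk after the 8-byte signature."""
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield pos, png[pos + 4:pos + 8]
        pos += 12 + length  # length + type + data + CRC


def stamp_cc0(png: bytes, keyword: str = "Copyright", text: str = CC0_TEXT) -> bytes:
    """Return a copy of the PNG with a tEXt chunk declaring CC0, placed just before IEND."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    for pos, chunk_type in iter_chunks(png):
        if chunk_type == b"IEND":
            # tEXt payload: Latin-1 keyword, a NUL separator, then Latin-1 text.
            data = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
            return png[:pos] + make_chunk(b"tEXt", data) + png[pos:]
    raise ValueError("no IEND chunk found")


# Usage: build a 1x1 greyscale PNG in memory, then stamp it.
tiny = (PNG_SIGNATURE
        + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
        + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
        + make_chunk(b"IEND", b""))
stamped = stamp_cc0(tiny)
print([t for _, t in iter_chunks(stamped)])  # [b'IHDR', b'IDAT', b'tEXt', b'IEND']
```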

Sort scientific research (specifically psychology) by method used

  1. When designing research, it is vital to find all the relevant research that has already been done. This sometimes leads scientists to search for a specific methodology used in a specific context. At present, it is not possible to sort papers by their methodology, even into the two most basic categories: qualitative and quantitative. A sorting tool of this type would be very useful to scientists in the field of psychology.

    I have no experience in coding, so I am not quite sure how this could be accomplished. I have, however, read plenty of psychological research, so I have some knowledge of what such a tool could look for. In an ideal world, this information would be put in the metadata of each paper. This is probably not going to happen, so if the papers can be processed in some way, there are some cues that can indicate the methodology (again, this applies to psychology research), such as sample size.

  2. Nevelina Aleksandrova

  3. Basic Python and experience in the scientific field as an undergraduate psychology student.

  4. Programming experience and ideas of any sort would be welcomed.
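The cues mentioned above (sample size, plus typical analysis vocabulary) could be turned into crude signals with plain text matching. This Python sketch is purely illustrative: the keyword lists are made up, substring matching will misfire (e.g. "t-test" inside "post-test"), and a serious tool would more likely train a classifier on papers labelled by hand.

```python
import re

# Illustrative keyword lists only; they are assumptions, not a validated coding scheme.
QUANT_TERMS = ("anova", "regression", "t-test", "correlation", "chi-square",
               "statistically significant")
QUAL_TERMS = ("thematic analysis", "semi-structured interview", "grounded theory",
              "interpretative phenomenological", "focus group")

# Sample sizes written as "N = 120" or "n=45".
SAMPLE_SIZE_RE = re.compile(r"\b[Nn]\s*=\s*(\d+)")


def method_hints(text: str) -> dict:
    """Crude signals about a psychology paper's methodology from its text."""
    lower = text.lower()
    quant = sum(lower.count(term) for term in QUANT_TERMS)
    qual = sum(lower.count(term) for term in QUAL_TERMS)
    sizes = [int(n) for n in SAMPLE_SIZE_RE.findall(text)]
    if quant > qual:
        label = "quantitative"
    elif qual > quant:
        label = "qualitative"
    else:
        label = "unclear"
    return {"label": label, "quant_hits": quant, "qual_hits": qual,
            "max_sample_size": max(sizes, default=None)}


print(method_hints("Participants (N = 120) completed questionnaires. "
                   "A regression and an ANOVA were run."))
```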

Audio files of research papers

  1. Reading papers on a computer screen is exhausting and a real strain on the eyes and brain. Printing the papers creates a lot of waste and is not environmentally friendly. I have tried using PDF-to-audio software previously; however, it does not quite work, mostly because of the two-column layout of most papers. A tool that can essentially "read" a PDF out loud would be great.

  2. Nevelina Aleksandrova

  3. Basic Python and experience in the scientific field as an undergraduate psychology student.

  4. Programming experience of any sort would be welcomed.

  • Matt thinks this is cool, and believes a naive, manual prototype (along with any coding project or automation) would be excellent! Reminds me of Spoken Wikipedia. It's possible to record and then edit with open source software like Audacity, and publish on Wikimedia Commons or similar. Chat with me and Daniel (from the project below)!
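The two-column problem arises when text is read straight across the page, interleaving lines from both columns. If the text comes with positions (the `TextBlock` input here is an assumption; layout-aware PDF extractors can provide something similar), reordering column by column fixes the simplest case. A Python sketch:

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    x0: float   # left edge, in points from the left of the page
    y0: float   # top edge, in points from the top of the page
    text: str


def reading_order(blocks, page_width: float):
    """Order text blocks for a simple two-column page: left column top-to-bottom, then right."""
    middle = page_width / 2
    left = sorted((b for b in blocks if b.x0 < middle), key=lambda b: (b.y0, b.x0))
    right = sorted((b for b in blocks if b.x0 >= middle), key=lambda b: (b.y0, b.x0))
    return [b.text for b in left + right]


# Blocks as a naive converter might emit them: straight across the page, line by line.
blocks = [TextBlock(50, 100, "Left line 1"), TextBlock(320, 100, "Right line 1"),
          TextBlock(50, 115, "Left line 2"), TextBlock(320, 115, "Right line 2")]
print(reading_order(blocks, 612))  # ['Left line 1', 'Left line 2', 'Right line 1', 'Right line 2']
```

Full-width titles, figures, captions and footnotes are not handled; the reordered text could then be passed to any text-to-speech engine.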

Crossing the Streams: OA Media, Videos on YouTube

  1. Put OA videos on YouTube and remix them through the YouTube editor, Popcorn from Mozilla Webmaker, or other tools.
  2. Daniel Mietchen + Matt Senate
  3. Bringing Wikimedia/Wikipedia expertise, open science and open access experience, building off the OAMI codebase, Python coding, video knowledge, enthusiasm for the future of science (duh, it's going to be open) with lots of moving pictures.
  4. Folks at all skill levels welcome! Could use folks with skills in coding, design, text editing, video editing, or even basic web usage. There might be multiple parallel directions and opportunities under re-use of media, and using social media sites like YouTube :)
