Skip to content
No description, website, or topics provided.
HTML Other
  1. HTML 99.1%
  2. Other 0.9%
Branch: pmr-remote-mas…
Clone or download
Pull request Compare This branch is 6 commits ahead, 20 commits behind ContentMine:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
docs
examplesOld
regex
searchwords
src
.gitignore
.hgignore
LICENSE
README.md
amiGraph.png
notes.txt
pom.xml
word.frequencies.count.xml
word.frequencies.snippets.xml

README.md

AMI

AMI provides a generic infrastructure where plugins can search, index or transform structured documents on a high-through basis. The typical input is structured, normalized, tagged XHTML, possibly containing (or linked to) SVG and PNG files. The plugins are designed to analyse text or graphics or a combination according to the discipline.

Running

There are binaries provided with releases which enable running of ami's plugins; shell and batch scripts are provided to make running these easier. Detailed tutorials of these can be found in the ContentMine software tutorials.

Run on a CTree which contains scholarly.html files of the papers you are analysing. This can be made using Norma.

Schema

ami2-gene --project zika --g.gene --g.type human

Building

mvn package

After the build, shell scripts and batch files to run plugins are in target/appassembler/bin/. Further documentation is in docs/.

Plugins

AMI has a plugin architecture where each problem or community has its own plugin. Examples are "species", "sequence", "regex" and soon some chemistry.

It is often straightforward to develop text-based searches, and this is accessible to most committed scientists. Graphics is always harder and requires bespoke programming. Plugins have been developed for at least:

text targets

  • indexing text by regular expressions (regex).
  • Genbank IDs
  • PDB ids
  • farm-related / agronomy terms
  • chemical species (OSCAR)
  • computational phylogenetics
  • terms identifying Ebola and other haemorrhagic diseases

graphical targets

  • phylogenetic trees
  • chemical structures

The plugin architecture is moderately stable and it requires very little alteration to the codebase to add a new one (hopefully soon this can be done automatically by configuration files).

file structure

The input must be a QuickscrapeNorma directory (QSNorma). This must contain scholarly.html which is used for analysis. When a plugin is run, the output is to the results directory, wiwth a subdirectory for each plugin and a sub-subdirectory for each plugin option: Example:

http_www.foo_1_2/
    fulltext.xml
    fulltext.pdf
    scholarly.html
    results/
        words/
            frequency/
                results.xml
            lengths/
                results.xml
        regex/
            consort0/
                results.xml
            publication/
                results.xml
You can’t perform that action at this time.