This repository contains all files used for the sequential pattern mining applied to sentences of PubMed abstracts about the anti-cancer activity of polyphenols, using the R language.

ramongsilva/Sequential-pattern-mining-in-pubmed-abstracts-sentences-on-anticancer-activity

Sequential pattern mining in PubMed abstracts sentences on anticancer activity

This repository contains all files used for the sequential pattern mining applied to 72,019 sentences with entity associations, taken from PubMed abstracts classified as positive in the Text Classification Step. The files are described below:

  • sequential-pattern-mining-pubmed-abstract-sentences-gh.R: R script for sequential pattern mining in PubMed abstract sentences on the anticancer activity of polyphenols.
  • anotated_sentences.tsv: TSV file with the 72,019 sentences annotated with entities for polyphenols, cancers and genes, used as input for the sequential pattern mining. Save this file in the same folder as the sequential-pattern-mining-pubmed-abstract-sentences-gh.R script, because the script needs it to run.
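
To give a rough idea of what sequential pattern mining over annotated sentences involves, here is a minimal base-R sketch. It is not the script from this repository, which may rely on a dedicated package and algorithm. The toy sentences, the `sentence` column name, the entity placeholders (POLYPHENOL, CANCER, GENE) and the helper functions `tokenize`, `ordered_pairs` and `mine_pairs` are all assumptions made for illustration. The sketch counts only length-2 patterns (token a appears before token b in a sentence, gaps allowed) and keeps those that reach a minimum support.

```r
# Toy stand-in for the annotated sentences. The real file could be loaded with
# read.delim("anotated_sentences.tsv"), but its column names are not documented
# here, so the `sentence` column is an assumption.
sentences <- data.frame(
  sentence = c(
    "POLYPHENOL inhibits proliferation of CANCER cells",
    "POLYPHENOL induces apoptosis in CANCER cells",
    "GENE expression is reduced by POLYPHENOL in CANCER"
  ),
  stringsAsFactors = FALSE
)

# Split a sentence into an ordered sequence of lowercase tokens.
tokenize <- function(s) {
  tokens <- strsplit(tolower(s), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

# All ordered pairs "a -> b" where a occurs before b in the sequence.
# Pairs are deduplicated so each sentence contributes at most once per pattern.
ordered_pairs <- function(tokens) {
  n <- length(tokens)
  if (n < 2) return(character(0))
  idx <- combn(n, 2)
  unique(paste(tokens[idx[1, ]], tokens[idx[2, ]], sep = " -> "))
}

# Support = number of sentences containing the pattern; keep frequent ones.
mine_pairs <- function(texts, min_support = 2) {
  pairs <- unlist(lapply(texts, function(s) ordered_pairs(tokenize(s))))
  support <- table(pairs)
  sort(support[support >= min_support], decreasing = TRUE)
}

patterns <- mine_pairs(sentences$sentence, min_support = 2)
print(patterns)
```

Longer patterns and large corpora are usually handled with a purpose-built algorithm rather than the exhaustive pair enumeration used here, which grows quadratically with sentence length.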

For more information about this and other steps of the Kaphta Architecture, see the sections of the Kaphta Web Tool, available at https://portal.ifsuldeminas.edu.br/kaphtawebtool/.

Patterns mined

The files below contain the mined patterns, which were used to create the rules for extracting information about anticancer activity from PubMed abstracts:

Rules Dictionary

The sequential pattern mining contributed to the creation of a dictionary of 25 rules for the Information Extraction Step. See the Rules Dictionary Implementation for more information.
