This repository has been archived by the owner on Oct 20, 2020. It is now read-only.

msrch/extractorjs

Extractor.js

Extract common information from a string.

About

Extractor.js is a small library that helps to extract common information like dates, times, emails or links from text. It also provides an easy way to add new patterns to extract custom things.

Patterns

The following patterns are built into the library:

  • Date formats
  • Email formats
  • Hash tags
  • Web links
  • Mentions
  • Phone formats
  • Time formats
  • YouTube links

Example

Here is an example paragraph of text:

@friend I have sent you an email to your address name@web.com
on 3rd of June 2013 at 12:36pm about your web www.some-website.com.
Watch this youtu.be/5Jp9_sgJcN0 and then call me (123) 456 7890.
#video #website

The following values will be extracted:

{
  "dates": ["3rd of June 2013"],
  "emails": ["name@web.com"],
  "hashtags": ["video", "website"],
  "links": ["www.some-website.com", "youtu.be/5Jp9_sgJcN0"],
  "mentions": ["friend"],
  "phones": ["(123) 456 7890"],
  "times": ["12:36pm"],
  "youtube":[
    {
      "id": "5Jp9_sgJcN0",
      "link": "youtu.be/5Jp9_sgJcN0",
      "thumb": "http://img.youtube.com/vi/5Jp9_sgJcN0/default.jpg",
      "thumbHQ": "http://img.youtube.com/vi/5Jp9_sgJcN0/hqdefault.jpg"
    }
  ]
}

How to use

There are two main ways to use Extractor.js:

1. Pass a text string → receive an object with results

var results = Extractor('Lorem #ipsum text...');
// results.hashtags = ['ipsum']

The result is an object containing the structure mentioned above.
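
For intuition, the sketch below shows how a result object of this shape could be built with regular expressions. It is not the library's source: the function name `extractSketch` and both expressions are assumptions made for illustration, and only hashtags and mentions are covered.

```javascript
// Hypothetical sketch, not Extractor.js itself: collect hashtags and mentions
// from a string and return them keyed by pattern name.
function extractSketch(text) {
    var patterns = {
        hashtags: /#(\w+)/g, // assumed expression: '#' followed by word characters
        mentions: /@(\w+)/g  // assumed expression: '@' followed by word characters
    };
    var results = {};

    Object.keys(patterns).forEach(function (name) {
        var regexp = patterns[name];
        var match;
        results[name] = [];
        while ((match = regexp.exec(text)) !== null) {
            results[name].push(match[1]); // keep the captured word without its prefix
        }
    });

    return results;
}

var sketch = extractSketch('Lorem #ipsum text for @friend');
// sketch.hashtags -> ['ipsum'], sketch.mentions -> ['friend']
```

A real implementation needs stricter expressions. This naive mention pattern would, for example, also match the `@web` part of an email address.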

YouTube results

Besides the id, link, thumb and thumbHQ values, each YouTube result also has an embed method that generates the embed code. You can customise the width and height of the <iframe>; the default dimensions are 560x315.

var yt = Extractor('Example youtu.be/5Jp9_sgJcN0 link.'),
    ytEmbed = yt.youtube[0].embed;

ytEmbed();
// <iframe width="560" height="315" src="//www.youtube.com/embed/5Jp9_sgJcN0" frameborder="0" allowfullscreen></iframe>

ytEmbed(640, 480);
// <iframe width="640" height="480" src="//www.youtube.com/embed/5Jp9_sgJcN0" frameborder="0" allowfullscreen></iframe>
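
A method like embed can be pictured as a function that remembers the video id and fills in default dimensions. The sketch below is an assumption about how that could look, not the library's code; `buildEmbed` and `makeEmbed` are hypothetical names.

```javascript
// Hypothetical helper, not the library's code: build the embed markup for a
// video id, falling back to the default 560x315 dimensions.
function buildEmbed(id, width, height) {
    var w = width || 560;
    var h = height || 315;
    return '<iframe width="' + w + '" height="' + h + '" ' +
        'src="//www.youtube.com/embed/' + id + '" ' +
        'frameborder="0" allowfullscreen></iframe>';
}

// Bind the id once so the returned function can be called like embed() above.
function makeEmbed(id) {
    return function (width, height) {
        return buildEmbed(id, width, height);
    };
}

var embedSketch = makeEmbed('5Jp9_sgJcN0');
embedSketch();         // uses the default 560x315
embedSketch(640, 480); // uses 640x480
```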

2. Call without arguments → receive pattern methods

var ex = Extractor();

ex.dates('Try 3rd of June 2013');
// ["3rd of June 2013"]

ex.emails('Try some@email.com');
// ["some@email.com"]

Method names match the keys of the result object shown above.

Duplicates

Passing a Boolean as the second argument controls duplicate values. Duplicates are kept by default (true); pass false to remove them.

ex.mentions('Try @one @two and @one');
ex.mentions('Try @one @two and @one', true);
// ["one", "two", "one"]

ex.mentions('Try @one @two and @one', false);
// ["one", "two"]
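
The effect of the flag can be sketched in a few lines. This is an illustration under the assumption that the first occurrence of each value is the one kept; `removeDuplicates` and `applyDuplicates` are hypothetical names, not part of the library.

```javascript
// Hypothetical helper: drop repeated values while keeping first-seen order.
function removeDuplicates(values) {
    return values.filter(function (value, index) {
        return values.indexOf(value) === index;
    });
}

// Mirrors the second argument described above:
// true (or omitted) keeps duplicates, false removes them.
function applyDuplicates(values, keepDuplicates) {
    return keepDuplicates === false ? removeDuplicates(values) : values.slice();
}

applyDuplicates(['one', 'two', 'one'], true);  // ['one', 'two', 'one']
applyDuplicates(['one', 'two', 'one'], false); // ['one', 'two']
```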

Advanced usage

Options for Extractor()

You can pass additional settings when parsing a string directly.

var results = Extractor('Lorem ipsum...', {/* additional settings */});

filter

type: Array default: []

The returned object contains only the results of the patterns listed in the filter.

var dateAndTime = Extractor('Try 1st Jun at 2:00 pm', {
        filter: ['dates', 'times']
    });
// {"dates": ["1st Jun"], "times": ["2:00 pm"]}

without

type: Array default: []

The returned object contains the results of all patterns except those listed in the array.

var withoutExample = Extractor('Try 1st Jun at 2:00 pm', {
        without: ['emails', 'links', 'mentions', 'times', 'youtube']
    });
// {"dates": ["1st Jun"], "hashtags": [], "phones": []}
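
The two options amount to a selection over pattern names, which the sketch below illustrates. It is not the library's code: `selectPatterns` is a hypothetical name, and applying both options together when both are given is an assumption, since the README does not say how they combine.

```javascript
// Hypothetical sketch: decide which pattern names end up in the result object.
// 'available' lists every known pattern; 'options' may carry filter and without.
function selectPatterns(available, options) {
    var filter = (options && options.filter) || [];
    var without = (options && options.without) || [];

    return available.filter(function (name) {
        // An empty filter means "no restriction".
        var allowed = filter.length === 0 || filter.indexOf(name) !== -1;
        var excluded = without.indexOf(name) !== -1;
        return allowed && !excluded;
    });
}

var names = ['dates', 'emails', 'hashtags', 'links', 'mentions', 'phones', 'times', 'youtube'];

selectPatterns(names, { filter: ['dates', 'times'] });
// ['dates', 'times']

selectPatterns(names, { without: ['emails', 'links', 'mentions', 'times', 'youtube'] });
// ['dates', 'hashtags', 'phones']
```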

duplicates

type: Boolean default: true

Controls whether duplicate values are kept in the results. Set it to false to remove them.

var uniqueResults = Extractor('Try @one @two and @one', {
        duplicates: false
    }).mentions;
// ["one", "two"]

Adding new pattern - Extractor.addPattern()

You can add a new pattern as follows.

// Add a "test" pattern that matches the word "test"
// and appends "1" to each matched value.
Extractor.addPattern({
    name: 'test',
    regexp: /\btest\b/gim,
    trim: true,
    postProcessor: function (value) {
        return value + '1';
    }
});

The pattern is then used across the whole library, so the next call to Extractor(...) also returns results for it. It is also available as a method when you call Extractor() without any arguments.

Extractor().test('test and test');
// ["test1", "test1"]

Extractor('test and #other', {
    filter: ['test', 'hashtags', 'times']
});
// {"hashtags": ["other"], "times": [], "test": ["test1"]}

Here is a list of configuration options:

name

type: String default: null

Name of the new pattern. It must not match an existing pattern name and may contain only lowercase and uppercase letters.

name: 'test',

regexp

type: RegExp default: null

Regular expression that defines what the pattern matches.

regexp: /\btest\b/gim,

trim [optional]

type: Boolean default: true

Whether whitespace around the result value is stripped.

trim: true,

postProcessor [optional]

type: Function default: null

A post-processing function that receives each result value and returns the amended value.

// Example - just add '1' after a result.
postProcessor: function (value) {
    return value + '1';
}
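
To show how these four options could fit together, here is a small stand-alone sketch. It is an assumption about the mechanics, not the library's implementation: `sketchPatterns`, `addPatternSketch` and `runPatternSketch` are hypothetical names, and throwing an error for an invalid name is a choice made for this example only.

```javascript
// Hypothetical registry, not the library's code: shows how the options
// name, regexp, trim and postProcessor could work together.
var sketchPatterns = {};

function addPatternSketch(config) {
    if (!/^[a-zA-Z]+$/.test(config.name)) {
        throw new Error('Pattern name may contain letters only.');
    }
    if (Object.prototype.hasOwnProperty.call(sketchPatterns, config.name)) {
        throw new Error('Pattern "' + config.name + '" already exists.');
    }
    sketchPatterns[config.name] = {
        regexp: config.regexp,
        trim: config.trim !== false, // defaults to true
        postProcessor: config.postProcessor || null
    };
}

function runPatternSketch(name, text) {
    var pattern = sketchPatterns[name];
    var matches = text.match(pattern.regexp) || []; // assumes the regexp has the g flag
    return matches.map(function (value) {
        if (pattern.trim) {
            value = value.trim();
        }
        return pattern.postProcessor ? pattern.postProcessor(value) : value;
    });
}

addPatternSketch({
    name: 'test',
    regexp: /\btest\b/gim,
    postProcessor: function (value) {
        return value + '1';
    }
});

runPatternSketch('test', 'test and test');
// ['test1', 'test1']
```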

Development

The dev environment is set up using Grunt; tests are written in Jasmine.

Licensed MIT.

Grunt

Requires Grunt ~0.4.0.

If you haven't used Grunt before, be sure to check out the Getting Started guide, as it explains how to create a Gruntfile as well as install and use Grunt plugins.

Here is a list of some notable tasks:

default

Runs the dev tests and starts watching source files (see the "watch" task).

build

Runs all the tests and builds the production files.

test-dev

Runs JSHint and tests for source files only.

test-build

Runs tests on production/build files.

watch

Watches source files and runs the "test-dev" task whenever a change is detected.
