This repository has been archived by the owner on Sep 8, 2022. It is now read-only.

phptek/silverstripe-staticsiteconnector

 
 

WARNING: Project Has Been Archived! Please use phptek/silverstripe-exodus which works with Silverstripe 4.

SilverStripe Static Site Connector

Introduction

This module extracts content from another website by crawling it and parsing its DOM structure. It transforms that content directly into native SilverStripe objects, then imports those objects into SilverStripe's database as though they had been created via the CMS.

Although this leaves the module unable to extract any information or structure that isn't represented in the site's markup, it means no special access to, or reliance on, particular back-end systems is required. This makes the module well suited to legacy and experimental site imports, as well as to connections with websites generated by obscure CMSs.

How it works

Importing a site is a two- or three-step process, depending on the options the user selects:

  1. Crawl
  2. Import
  3. Rewrite Links (Automatic, if selected in step 2.)

A list of URLs is fetched and extracted from the site via PHPCrawl, and cached in a text file under the assets directory.

Each cached URL corresponds to a page or asset (CSS, image, PDF, etc.) that the module will attempt to import as a native SilverStripe object, e.g. SiteTree or File.
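The page-versus-asset split can be pictured with a minimal sketch. This is not the module's own logic: the function name `classify_url` is hypothetical, and deciding by file extension is an assumption made for illustration (the module may well rely on the MIME type reported during the crawl instead).

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: prints the kind of
# SilverStripe object a URL would roughly map to.
# Assumption: the decision is made on the file extension alone.
classify_url() {
    path=${1%%\?*}      # drop any query string
    path=${path%%\#*}   # drop any fragment
    # Lower-case the path so that ".PNG" and ".png" are treated alike.
    path=$(printf '%s' "$path" | tr 'A-Z' 'a-z')
    case "$path" in
        *.css|*.js|*.png|*.jpg|*.jpeg|*.gif|*.pdf)
            echo "File" ;;
        *)
            echo "SiteTree" ;;
    esac
}

# Example, using made-up URLs:
#   classify_url 'http://example.org/about-us'       # page
#   classify_url 'http://example.org/docs/guide.pdf' # asset
```

Anything that does not look like a known asset type falls through to SiteTree here, which keeps the sketch short but would misclassify unusual file types.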

Page content is imported page by page using cURL, and the desired DOM elements are extracted with phpQuery using configurable CSS selectors.

Migration

See the included migration documentation for detailed instructions on migrating a legacy site into SilverStripe using the module.

Installation

This module requires the PHP semaphore functions to work. These are installed by default on Debian and some OS X PHP distributions, but if you're using MacPorts you'll need to add the +ipc flag when installing php5.

If compiling PHP from source you need to pass three additional flags to PHP's configure script:

./configure <usual flags> '--enable-sysvsem' '--enable-sysvshm' '--enable-sysvmsg'
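After installing or rebuilding PHP, it can be worth confirming that the three System V IPC extensions are actually loaded. The following is a minimal sketch and not part of the module: the function name `check_sysv_extensions` is hypothetical, and it only scans a module list such as the one printed by `php -m`.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the module): reads a PHP module
# list on stdin, one name per line, and reports whether each System V IPC
# extension is present. Returns non-zero if any of them is missing.
check_sysv_extensions() {
    modules=$(cat)
    status=0
    for ext in sysvsem sysvshm sysvmsg; do
        # -x matches whole lines only, -i ignores case, -q suppresses output.
        if printf '%s\n' "$modules" | grep -qix "$ext"; then
            echo "$ext: loaded"
        else
            echo "$ext: missing"
            status=1
        fi
    done
    return $status
}

# Typical use, assuming the php CLI is on your PATH:
#   php -m | check_sysv_extensions
```

Reading the list from stdin rather than calling `php` directly keeps the helper usable against any PHP binary, for example `/path/to/php -m | check_sysv_extensions`.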

Once that's done, you can use Composer to add the module to your SilverStripe project:

#> composer require phptek/staticsiteconnector

Please see the included migration document, which describes exactly how to configure the tool to perform a site scrape or migration.

An example database dump (MySQL/MariaDB only) is also provided, which you can import into your database to get up and running quickly.

License

This code is available under the BSD license, with the exception of the PHPCrawl library bundled with this module, which is licensed under GPL version 2.

Authors

About

Connector plugin for the SilverStripe External Content module that uses web scraping to import content.
