Add a context-free grammar generator and a lipsum generator

Summary: See discussion in D5605. Add support for generating realistic-seeming test data.

Test Plan:
Generated lipsum:

Lorem laboris irure proident ut voluptate non ("Ipsum Lorem Dolor") minim aliqua ea anim nisi sunt. Ipsum ipsum consequat reprehenderit laborum anim commodo. Amet magna quis sed ("Dolor Dolor Lorem") minim quis laboris ea. Dolor laboris ut sit excepteur exercitation ("Lorem Dolor Sit") labore quis aliquip nisi.

Lorem lorem! Amet adipisicing ullamco duis enim in. Lorem excepteur aute officia ("Sit Lorem Ipsum") labore velit.

Amet proident minim elit dolor sit ("Sit Sit Sit") sed occaecat est. Sit officia.

Lorem voluptate eiusmod lorem amet aute esse ("Ipsum Dolor Sit") eu ea nisi eu. Lorem labore. Lorem dolore fugiat: excepteur, in, elit sunt. Amet lorem qui fugiat dolore! Lorem occaecat fugiat! Ipsum nisi eu: ex, ipsum, nostrud magna.

Dolor ad mollit minim sunt; lorem duis amet quis dolor. Amet culpa magna commodo irure: eu, esse, mollit anim. Dolor sint officia fugiat? Dolor anim. Dolor cillum nostrud fugiat qui ("Sit Sit Amet") tempor mollit. Ipsum amet voluptate magna labore!

Amet eu deserunt amet in. Dolor amet veniam sed aliqua ullamco sit. Amet ea sint aute nulla adipisicing, "Dolor et incididunt duis." Sit enim laborum proident elit ex. Dolor cupidatat; adipisicing irure irure ea consequat excepteur.

Ipsum sunt do amet aliqua aliqua culpa! Lorem dolore nostrud amet ut; nulla est deserunt officia ipsum. Sit ipsum. Lorem esse elit sit ut ex, "Ipsum reprehenderit labore duis fugiat sed." Ipsum sit non aute in; et est.

Sit ad nostrud consectetur esse anim! Ipsum magna aliqua ipsum consequat culpa aute. Ipsum ex sint nulla sit aliquip ("Amet Sit Amet") nulla. Lorem occaecat enim ipsum proident; irure officia sunt aliqua excepteur est.

Amet reprehenderit adipisicing et nisi et sed!

Ipsum enim aute deserunt adipisicing eu; do id laboris fugiat est. Lorem adipisicing. Lorem mollit tempor, "Amet amet magna." Amet laborum ut; culpa dolor sint.

Sit ipsum et laborum deserunt adipisicing. Sit do voluptate sint sint sunt aute, "Lorem proident non cupidatat non amet." Ipsum pariatur ad aliquip mollit aliqua ut: tempor, commodo, duis labore. Sit adipisicing quis duis et aliqua minim?

Sit et sint cupidatat est sint eiusmod?

Ipsum sint magna laborum magna duis; in officia sint in. Dolor labore. Lorem ex qui ea do commodo laborum. Amet dolore laborum tempor? Ipsum anim veniam nulla officia nisi esse; proident proident tempor cupidatat exercitation. Amet commodo quis mollit nisi lorem incididunt: non, exercitation, in laborum.

Ipsum irure veniam consectetur elit nostrud. Lorem non reprehenderit aute aliqua sed culpa: sed, nostrud, sit excepteur. Amet aute in elit laboris sed veniam? Lorem nisi reprehenderit nisi?

Reviewers: chad, vrana, AnhNhan

Reviewed By: chad

CC: aran

Differential Revision: https://secure.phabricator.com/D5606
commit 2c4192c0b5a9f343ced55825d1fc1d9c683de48c 1 parent f456c6d
Evan Priestley epriestley authored
43 scripts/test/lipsum.php
@@ -0,0 +1,43 @@
+#!/usr/bin/env php
+<?php
+
+$root = dirname(dirname(dirname(__FILE__)));
+require_once $root.'/scripts/__init_script__.php';
+
+$args = new PhutilArgumentParser($argv);
+$args->setTagline('test context-free grammars');
+$args->setSynopsis(<<<EOHELP
+**lipsum.php** __class__
+ Generate output from a named context-free grammar.
+EOHELP
+  );
+$args->parseStandardArguments();
+$args->parse(
+  array(
+    array(
+      'name' => 'class',
+      'wildcard' => true,
+    ),
+  ));
+
+$class = $args->getArg('class');
+if (count($class) !== 1) {
+  $args->printHelpAndExit();
+}
+$class = reset($class);
+
+$symbols = id(new PhutilSymbolLoader())
+  ->setAncestorClass('PhutilContextFreeGrammar')
+  ->setConcreteOnly(true)
+  ->selectAndLoadSymbols();
+$symbols = ipull($symbols, 'name', 'name');
+
+if (empty($symbols[$class])) {
+  $available = implode(', ', array_keys($symbols));
+  throw new PhutilArgumentUsageException(
+    "Class '{$class}' is not a defined, concrete subclass of ".
+    "PhutilContextFreeGrammar. Available classes are: {$available}");
+}
+
+$object = newv($class, array());
+echo $object->generate()."\n";
8 src/__phutil_library_map__.php
@@ -103,6 +103,7 @@
'PhutilConsoleStdinNotInteractiveException' => 'console/PhutilConsoleStdinNotInteractiveException.php',
'PhutilConsoleSyntaxHighlighter' => 'markup/syntax/highlighter/PhutilConsoleSyntaxHighlighter.php',
'PhutilConsoleWrapTestCase' => 'console/__tests__/PhutilConsoleWrapTestCase.php',
+ 'PhutilContextFreeGrammar' => 'grammar/PhutilContextFreeGrammar.php',
'PhutilDaemon' => 'daemon/PhutilDaemon.php',
'PhutilDaemonOverseer' => 'daemon/PhutilDaemonOverseer.php',
'PhutilDefaultSyntaxHighlighter' => 'markup/syntax/highlighter/PhutilDefaultSyntaxHighlighter.php',
@@ -156,6 +157,7 @@
'PhutilLanguageGuesserTestCase' => 'parser/__tests__/PhutilLanguageGuesserTestCase.php',
'PhutilLexer' => 'lexer/PhutilLexer.php',
'PhutilLexerSyntaxHighlighter' => 'markup/syntax/highlighter/PhutilLexerSyntaxHighlighter.php',
+ 'PhutilLipsumContextFreeGrammar' => 'grammar/PhutilLipsumContextFreeGrammar.php',
'PhutilLock' => 'filesystem/PhutilLock.php',
'PhutilLockException' => 'filesystem/PhutilLockException.php',
'PhutilMarkupEngine' => 'markup/PhutilMarkupEngine.php',
@@ -298,7 +300,9 @@
'phutil_get_library_root' => 'moduleutils/moduleutils.php',
'phutil_get_library_root_for_path' => 'moduleutils/moduleutils.php',
'phutil_implode_html' => 'markup/render.php',
+ 'phutil_is_hiphop_runtime' => 'utils/utils.php',
'phutil_is_utf8' => 'utils/utf8.php',
+ 'phutil_is_windows' => 'utils/utils.php',
'phutil_passthru' => 'future/exec/execx.php',
'phutil_render_tag' => 'markup/render.php',
'phutil_safe_html' => 'markup/render.php',
@@ -391,8 +395,7 @@
'LinesOfALargeFile' => 'LinesOfALarge',
'LinesOfALargeFileTestCase' => 'PhutilTestCase',
'PhageAgentTestCase' => 'PhutilTestCase',
- 'PhagePHPAgent' => 'PhageAgent',
- 'PhagePHPAgentBootloader' => 'PhageAgent',
+ 'PhagePHPAgentBootloader' => 'PhageAgentBootloader',
'PhutilAWSEC2Future' => 'PhutilAWSFuture',
'PhutilAWSException' => 'Exception',
'PhutilAWSFuture' => 'FutureProxy',
@@ -453,6 +456,7 @@
'PhutilKeyValueCacheTestCase' => 'ArcanistPhutilTestCase',
'PhutilLanguageGuesserTestCase' => 'PhutilTestCase',
'PhutilLexerSyntaxHighlighter' => 'PhutilSyntaxHighlighter',
+ 'PhutilLipsumContextFreeGrammar' => 'PhutilContextFreeGrammar',
'PhutilLockException' => 'Exception',
'PhutilMarkupTestCase' => 'PhutilTestCase',
'PhutilMetricsChannel' => 'PhutilChannelChannel',
44 src/grammar/PhutilContextFreeGrammar.php
@@ -0,0 +1,44 @@
+<?php
+
+/**
+ * Generate nonsense test data according to a context-free grammar definition.
+ */
+abstract class PhutilContextFreeGrammar {
+
+  private $limit = 65535;
+
+  abstract protected function getRules();
+
+  public function generate() {
+    $count = 0;
+    return $this->applyRules('[start]', $count);
+  }
+
+  public function applyRules($input, &$count) {
+    $rules = $this->getRules();
+
+    if (++$count > $this->limit) {
+      throw new Exception("Token replacement count exceeded limit!");
+    }
+
+    $matches = null;
+    preg_match_all('/(\\[[^\\]]+\\])/', $input, $matches, PREG_OFFSET_CAPTURE);
+
+    foreach (array_reverse($matches[1]) as $token_spec) {
+      list($token, $offset) = $token_spec;
+      $token_name = substr($token, 1, -1);
+
+      if (empty($rules[$token_name])) {
+        throw new Exception("Invalid token '{$token_name}' in grammar.");
+      }
+
+      $key = array_rand($rules[$token_name]);
+      $replacement = $this->applyRules($rules[$token_name][$key], $count);
+
+      $input = substr_replace($input, $replacement, $offset, strlen($token));
+    }
+
+    return $input;
+  }
+
+}
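The expansion performed by `applyRules()` can be tried outside libphutil with a standalone sketch. The function name `sketch_expand()`, its parameter list, and the three-rule sample grammar below are hypothetical and exist only for illustration. Each sample rule has a single production, so the random choice is forced and the result is deterministic.

```php
<?php

/**
 * Standalone sketch of the recursive token expansion shown in the diff.
 * The name, signature, and sample grammar are assumptions made for
 * illustration; they are not part of libphutil.
 */
function sketch_expand(array $rules, $input, &$count, $limit = 65535) {
  // Guard against grammars that keep recursing without terminating.
  if (++$count > $limit) {
    throw new Exception('Token replacement count exceeded limit!');
  }

  // Find every "[token]" together with its byte offset in the input.
  $matches = null;
  preg_match_all('/(\[[^\]]+\])/', $input, $matches, PREG_OFFSET_CAPTURE);

  // Replace from right to left so the offsets of earlier tokens stay
  // valid while the string changes length.
  foreach (array_reverse($matches[1]) as $token_spec) {
    list($token, $offset) = $token_spec;
    $name = substr($token, 1, -1);

    if (empty($rules[$name])) {
      throw new Exception("Invalid token '{$name}' in grammar.");
    }

    // Pick one production at random and expand it recursively.
    $key = array_rand($rules[$name]);
    $replacement = sketch_expand($rules, $rules[$name][$key], $count, $limit);

    $input = substr_replace($input, $replacement, $offset, strlen($token));
  }

  return $input;
}

// Hypothetical grammar with exactly one production per rule.
$rules = array(
  'start' => array('[greeting], [name].'),
  'greeting' => array('Hello'),
  'name' => array('world'),
);

$count = 0;
$output = sketch_expand($rules, '[start]', $count);
echo $output."\n"; // prints "Hello, world."
```

Processing the matches in reverse order is what allows the recorded offsets to be reused: a replacement only shifts text to its right, which has already been handled.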
117 src/grammar/PhutilLipsumContextFreeGrammar.php
@@ -0,0 +1,117 @@
+<?php
+
+final class PhutilLipsumContextFreeGrammar
+  extends PhutilContextFreeGrammar {
+
+  protected function getRules() {
+    return array(
+      'start' => array(
+        '[sentence]',
+        '[sentence] [sentence]',
+        '[sentence] [sentence] [sentence]',
+        '[sentence] [sentence] [sentence] [sentence]',
+        '[sentence] [sentence] [sentence] [sentence]',
+        '[sentence] [sentence] [sentence] [sentence] [sentence]',
+        '[sentence] [sentence] [sentence] [sentence] [sentence]',
+        '[sentence] [sentence] [sentence] [sentence] [sentence] [sentence]',
+      ),
+      'sentence' => array(
+        '[words].',
+        '[words].',
+        '[words].',
+        '[words]: [word], [word], [word] [word].',
+        '[words]; [lowerwords].',
+        '[words]!',
+        '[words], "[words]."',
+        '[words] ("[upperword] [upperword] [upperword]") [lowerwords].',
+        '[words]?',
+      ),
+      'words' => array(
+        '[upperword] [lowerwords]',
+      ),
+      'upperword' => array(
+        'Lorem',
+        'Ipsum',
+        'Dolor',
+        'Sit',
+        'Amet',
+      ),
+      'lowerwords' => array(
+        '[word]',
+        '[word] [word]',
+        '[word] [word] [word]',
+        '[word] [word] [word] [word]',
+        '[word] [word] [word] [word] [word]',
+        '[word] [word] [word] [word] [word]',
+        '[word] [word] [word] [word] [word] [word]',
+        '[word] [word] [word] [word] [word] [word]',
+      ),
+      'word' => array(
+        'ad',
+        'adipisicing',
+        'aliqua',
+        'aliquip',
+        'amet',
+        'anim',
+        'aute',
+        'cillum',
+        'commodo',
+        'consectetur',
+        'consequat',
+        'culpa',
+        'cupidatat',
+        'deserunt',
+        'do',
+        'dolor',
+        'dolore',
+        'duis',
+        'ea',
+        'eiusmod',
+        'elit',
+        'enim',
+        'esse',
+        'est',
+        'et',
+        'eu',
+        'ex',
+        'excepteur',
+        'exercitation',
+        'fugiat',
+        'id',
+        'in',
+        'incididunt',
+        'ipsum',
+        'irure',
+        'labore',
+        'laboris',
+        'laborum',
+        'lorem',
+        'magna',
+        'minim',
+        'mollit',
+        'nisi',
+        'non',
+        'nostrud',
+        'nulla',
+        'occaecat',
+        'officia',
+        'pariatur',
+        'proident',
+        'qui',
+        'quis',
+        'reprehenderit',
+        'sed',
+        'sint',
+        'sit',
+        'sunt',
+        'tempor',
+        'ullamco',
+        'ut',
+        'velit',
+        'veniam',
+        'voluptate',
+      ),
+    );
+  }
+
+}
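Several productions in the grammar above appear more than once, for example `'[words].'` in the `sentence` rule. Since `applyRules()` picks a key with `array_rand()`, repeating an entry appears to act as a simple weighting scheme. The sketch below computes the implied weights, assuming `array_rand()` selects each list entry with roughly equal probability; the helper name `sketch_production_weights()` is hypothetical.

```php
<?php

// The 'sentence' productions, copied from the grammar in the diff.
$sentence = array(
  '[words].',
  '[words].',
  '[words].',
  '[words]: [word], [word], [word] [word].',
  '[words]; [lowerwords].',
  '[words]!',
  '[words], "[words]."',
  '[words] ("[upperword] [upperword] [upperword]") [lowerwords].',
  '[words]?',
);

/**
 * Hypothetical helper: map each distinct production to the fraction of
 * list entries it occupies, i.e. its selection probability under the
 * assumption of a uniform choice over entries.
 */
function sketch_production_weights(array $productions) {
  $total = count($productions);
  $weights = array();
  foreach (array_count_values($productions) as $production => $n) {
    $weights[$production] = $n / $total;
  }
  return $weights;
}

$weights = sketch_production_weights($sentence);
printf("%.3f\n", $weights['[words].']); // prints 0.333
printf("%.3f\n", $weights['[words]?']); // prints 0.111
```

Under that assumption a plain sentence is about three times as likely as any single punctuated variant, which matches the mix visible in the generated sample in the test plan.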