first functional and tested release
martinsik committed May 13, 2015
1 parent 5ea3d32 commit 8b78ed4
Showing 10 changed files with 203 additions and 121 deletions.
7 changes: 7 additions & 0 deletions CHANGES.md
@@ -0,0 +1,7 @@
# 2.0.0 / 2015-05-13

* Generated JSON is incompatible with previous versions!
* First proper release to packagist.org
* Test rewritten to Behat
* Totally refactored code
* Added CLI interface
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2015 Martin Sikora

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
206 changes: 137 additions & 69 deletions README.md
@@ -1,123 +1,191 @@
# PHP Documentation Parser

This is a standalone script that takes entire PHP documentation in "many HTML files" version and generates single JSON file with all standard classes and functions.
This package downloads gzipped documentation from php.net, parses it and outputs all found functions as JSON with Markdown syntax. It comes with a CLI interface for comfortable usage.

## Try it
[![](https://raw.githubusercontent.com/martinsik/php-doc-parser/master/doc/animation.gif)](https://raw.githubusercontent.com/martinsik/php-doc-parser/master/doc/animation.gif)

This repository comes out of the box with already parsed JSON output for English documentation in [`output\en`](https://github.com/martinsik/php-doc-parser/tree/master/output/en) so if you just want to see whether it's useful for you, you can try it right away.
## Installation

By the way, at the bottom of this page there's parsed `str_replace` function in prettified JSON.

## Usage

1. **Download documentation**
Choose the language you prefer, download the "Many HTML files" documentation from http://php.net/download-docs.php and unpack it wherever you want.

2. **Run the parser script:**
Add `martinsik/php-doc-parser` to your `composer.json` dependencies:
```
"require": {
...
"martinsik/php-doc-parser": "~2.0"
}
```

php54 parser.php unpacked_documentation output_directory
Then run `composer.phar install`.

Note: `parser.php` requires PHP 5.4 because it uses some new `json_encode` options.
## Usage

When it's finished you should see in your `output_directory` three files: `database.json`, `functions.json` and `stats.json`.
### As a CLI script

Composer adds the `doc-parser` file to your binaries directory (`vendor/bin` by default). Run it and follow the instructions on the screen.

$ vendor/bin/doc-parser

For more information about available parameters type: `php parser.php --help`.
Results are saved into the `output` directory by default. This creates the following files (names are generated from the selected language and mirror):

Output directory should contain three files:
- `en_php_net.json` - Very large associative array with all parsed functions and their data. See sample output below.
- `en_php_net.list.json` - List of all function names in lowercase.
- `en_php_net.examples.json` (optional) - If you chose to export examples it'll put them into a separate file.
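
The `.list.json` file lends itself to simple autocomplete. The following is a minimal sketch, assuming the file decodes to a flat JSON array of lowercase names as described above. `suggest()` is a hypothetical helper that is not part of this package, and the inline sample stands in for the real file contents.

```php
<?php
// Hypothetical helper (not part of php-doc-parser): prefix search over
// the list of lowercase function names.
function suggest(array $names, $prefix, $limit = 10)
{
    $prefix = strtolower($prefix);
    $matches = array_filter($names, function ($name) use ($prefix) {
        // An empty prefix matches everything; otherwise require a prefix match.
        return $prefix === '' || strpos($name, $prefix) === 0;
    });
    sort($matches); // sort() also reindexes the filtered array
    return array_slice($matches, 0, $limit);
}

// Inline sample standing in for the contents of en_php_net.list.json.
$names = json_decode(
    '["str_pad", "str_repeat", "str_replace", "strpos", "datetime::setdate"]',
    true
);

print_r(suggest($names, 'STR_'));
```

Because every name in the list is already lowercase, only the user's input needs to be normalized before matching.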

1. `database.json` - entire parsed documentation as JSON.
2. `stats.json` - contains 3 variables. `methods` and `examples` are just for debugging. The first one means the total number of functions/classes parsed from the source documentation and saved in `database.json`. `examples` is number of functions with lang snippets. The last one `timestamp` means when was the `database.json` generated and is used to upgrade the Web SQL database when a new version of PHP Ninja Manual is released.
3. `functions.json` - one big array of all parsed functions (useful for autocomplete).
For full list of options run:

$ vendor/bin/doc-parser help parser:run

### As a 3rd party package

## What is it good for?
Create an instance of the `DocParser\Package` class for the language and mirror you want to parse; it downloads and unpacks the documentation for you.
Then pass the directory with the files you want to parse to `DocParser\Parser`, and it returns a `DocParser\ParserResult` object with all data as arrays.

IDEs, tools that need to use somehow structured PHP documentation.
```php
use DocParser\Package;
use DocParser\Parser;

## Why?
$package = new Package('en', 'php.net');
$tmpFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . $package->getOrigFilename();
$package->download($tmpFile);
$unpackedDir = $package->unpack();

I use this script to generate "database" for my Google Chrome Extension called [PHP Ninja Manual](https://chrome.google.com/webstore/detail/clbhjjdhmgeibgdccjfoliooccomjcab "PHP Ninja Manual"). It takes all classes and functions in `database.json` and indexes Web SQL database which is very fast and easy to use.
// Assumes Parser can be constructed without arguments.
$parser = new Parser();
$result = $parser->processDir($unpackedDir, Parser::EXPORT_EXAMPLES);
// you can parse just a single file with: $parser->processFile('file.html');

By the way there's an official [PHP Documentation generator](https://wiki.php.net/doc/articles/phd_ide) for IDEs, but when I started developing my extension it didn't suit my needs. I don't know what are its capabilities now but maybe it's worth a try.
foreach ($result->getResult() as $funcName => $funcData) {
// Note that all function names used as keys are lowercase.
// Proper function names are in parameter lists (see sample below).
// eg.: $funcData['params'][0]['name']

// Get all examples for this function.
// $result->getExamples($funcName);

// If you used Parser::IMPORT_EXAMPLES then examples are right in $funcData.
// With Parser::SKIP_EXAMPLES they're not parsed at all.
}

## What it looks like
// Remove all temporary files
$package->cleanup();
```
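
Each entry in `$funcData['params']` describes one definition of the function, so a readable signature can be assembled from it. This is a rough sketch that assumes the field names shown in the sample output (`list`, `type`, `var`, `beh`, `name`, `ret_type`). `formatSignature()` is a hypothetical helper, not something this package provides, and the array is hand-written sample data.

```php
<?php
// Hypothetical helper: render one entry of $funcData['params'] as a
// readable signature. Field names follow the sample output.
function formatSignature(array $definition)
{
    $args = array_map(function ($param) {
        $arg = $param['type'] . ' ' . $param['var'];
        // Wrap optional parameters in brackets, similar to the PHP manual.
        return $param['beh'] === 'optional' ? '[' . $arg . ']' : $arg;
    }, $definition['list']);

    return $definition['ret_type'] . ' ' . $definition['name']
        . '(' . implode(', ', $args) . ')';
}

// Hand-written sample mirroring the first DateTime::setDate definition.
$definition = array(
    'list' => array(
        array('type' => 'int', 'var' => '$year', 'beh' => 'required', 'desc' => 'Year of the date.'),
        array('type' => 'int', 'var' => '$month', 'beh' => 'required', 'desc' => 'Month of the date.'),
        array('type' => 'int', 'var' => '$day', 'beh' => 'required', 'desc' => 'Day of the date.'),
    ),
    'name' => 'DateTime::setDate',
    'ret_type' => 'DateTime',
);

echo formatSignature($definition), "\n";
```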

Structure of `database.json` is pretty straight forward.
## Sample output

This is how `str_replace` looks like deep inside in [`output\en`](https://github.com/martinsik/php-doc-parser/tree/master/output/en).
This is what `DateTime::setDate` looks like deep inside `en_php_net.json`.

{
...
"str_pad": { ... }
"str_repeat": { ... }
"str_replace":
{
"name": "str_replace",
"desc": "Replace all occurrences of the search string with the replacement string.",
"long_desc": "This function returns a string or an array with all occurrences of `search` in `subject` replaced with the given `replace` value.\\n\\nIf you don't need fancy replacing rules (like regular expressions), you should always use this function instead of preg\\_replace().",
"ver": "PHP 4, PHP 5",
"ret_desc": "This function returns a string or an array with the replaced values.",
"abs": { ... },
"array_pop": { ... },
...
"datetime::add": { ... },
"datetime::setdate": {
"desc": "Sets the date.",
"long_desc": "Resets the current date of the DateTime object to a different date.",
"ver": "PHP 5 >= 5.2.0",
"ret_desc": "Returns the DateTime object for method chaining or FALSE on failure.",
"seealso": [
"str_ireplace",
"substr_replace",
"preg_replace",
"strtr"
"DateTime::setISODate",
"DateTime::setTime"
],
"url": "function.str-replace",
"class": null,
"filename": "datetime.setdate",
"params": [
{
"list": [
{
"type": "mixed",
"var": "$search",
"beh": 0,
"desc": "The value being searched for, otherwise known as the needle`. An array may be used to designate multiple needles."
"type": "int",
"var": "$year",
"beh": "required",
"desc": "Year of the date."
},
{
"type": "int",
"var": "$month",
"beh": "required",
"desc": "Month of the date."
},
{
"type": "int",
"var": "$day",
"beh": "required",
"desc": "Day of the date."
}
],
"name": "DateTime::setDate",
"ret_type": "DateTime"
},
{
"list": [
{
"type": "DateTime",
"var": "$object",
"beh": "required",
"desc": "Procedural style only: A DateTime object returned by date\\_create(). The function modifies this object."
},
{
"type": "mixed",
"var": "$replace",
"beh": 0,
"desc": "The replacement value that replaces found `search` values. An array may be used to designate multiple replacements."
"type": "int",
"var": "$year",
"beh": "required",
"desc": "Year of the date."
},
{
"type": "mixed",
"var": "$subject",
"beh": 0,
"desc": "The string or array being searched and replaced on, otherwise known as the haystack`.\\n\\nIf `subject` is an array, then the search and replace is performed with every entry of `subject`, and the return value is an array as well."
"type": "int",
"var": "$month",
"beh": "required",
"desc": "Month of the date."
},
{
"type": "int",
"var": "&$count",
"beh": 1,
"desc": "If passed, this will be set to the number of replacements performed."
"var": "$day",
"beh": "required",
"desc": "Day of the date."
}
],
"ret_type": "mixed"
"name": "date_date_set",
"ret_type": "DateTime"
}
],
"examples": [
{
"title": "Basic str_replace() examples",
"source": "\/\/ Provides: <body text='black'>\n$bodytag = str_replace(\"%body%\", \"black\", \"<body text='%body%'>\");\n\n\/\/ Provides: Hll Wrld f PHP\n$vowels = array(\"a\", \"e\", \"i\", \"o\", \"u\", \"A\", \"E\", \"I\", \"O\", \"U\");\n$onlyconsonants = str_replace($vowels, \"\", \"Hello World of PHP\");\n\n\/\/ Provides: You should eat pizza, beer, and ice cream every day\n$phrase  = \"You should eat fruits, vegetables, and fiber every day.\";\n$healthy = array(\"fruits\", \"vegetables\", \"fiber\");\n$yummy   = array(\"pizza\", \"beer\", \"ice cream\");\n\n$newphrase = str_replace($healthy, $yummy, $phrase);\n\n\/\/ Provides: 2\n$str = str_replace(\"ll\", \"\", \"good golly miss molly!\", $count);\necho $count;",
"output": null
"title": "DateTime::setDate() example",
"source": "$date = new DateTime();\n$date->setDate(2001, 2, 3);\necho $date->format('Y-m-d');",
"output": "2001-02-03"
},
{
"title": "Examples of potential str_replace() gotchas",
"source": "\/\/ Order of replacement\n$str     = \"Line 1\\nLine 2\\rLine 3\\r\\nLine 4\\n\";\n$order   = array(\"\\r\\n\", \"\\n\", \"\\r\");\n$replace = '<br \/>';\n\n\/\/ Processes \\r\\n's first so they aren't converted twice.\n$newstr = str_replace($order, $replace, $str);\n\n\/\/ Outputs F because A is replaced with B, then B is replaced with C, and so on...\n\/\/ Finally E is replaced with F, because of left to right replacements.\n$search  = array('A', 'B', 'C', 'D', 'E');\n$replace = array('B', 'C', 'D', 'E', 'F');\n$subject = 'A';\necho str_replace($search, $replace, $subject);\n\n\/\/ Outputs: apearpearle pear\n\/\/ For the same reason mentioned above\n$letters = array('a', 'p');\n$fruit   = array('apple', 'pear');\n$text    = 'a p';\n$output  = str_replace($letters, $fruit, $text);\necho $output;",
"output": null
"title": "Values exceeding ranges are added to their parent values",
"source": "$date = new DateTime();\n\n$date->setDate(2001, 2, 28);\necho $date->format('Y-m-d') . \"\\n\";\n\n$date->setDate(2001, 2, 29);\necho $date->format('Y-m-d') . \"\\n\";\n\n$date->setDate(2001, 14, 3);\necho $date->format('Y-m-d') . \"\\n\";",
"output": "2001-02-28\n2001-03-01\n2002-02-03"
}
]
},
"str_rot13": { ... },
"str_shuffle": { ... },
....
"date_date_set": "DateTime::setDate",
"datetime::createfromformat": { ... },
"date_create_from_format": "DateTime::createFromFormat",
...
"strpos": { ... }
"tempnam": { ... }
...
}

Note that this function has two different definitions, `DateTime::setDate` and `date_date_set`, where each takes different parameters. In order to be able to search both functions there are two keys for this function, where the second key, `date_date_set`, is just a reference to the first one. Also, all keys are lowercase.
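
A lookup therefore has to follow such references. Here is a minimal sketch, assuming a reference is stored as a plain string holding the target's proper-case name, as in the sample above. `resolveFunction()` is a hypothetical helper and not part of this package, and the inline JSON is a cut-down stand-in for `en_php_net.json`.

```php
<?php
// Hypothetical helper: look up a function and follow a string alias
// to the entry it points at.
function resolveFunction(array $database, $name)
{
    $key = strtolower($name); // all keys are lowercase
    if (!isset($database[$key])) {
        return null;
    }
    $entry = $database[$key];
    // A string value is a reference to another key, stored in proper case.
    if (is_string($entry)) {
        $key = strtolower($entry);
        return isset($database[$key]) ? $database[$key] : null;
    }
    return $entry;
}

// Cut-down stand-in for the decoded en_php_net.json.
$database = json_decode('{
    "datetime::setdate": {"desc": "Sets the date."},
    "date_date_set": "DateTime::setDate"
}', true);

$entry = resolveFunction($database, 'date_date_set');
echo $entry['desc'], "\n";
```

The sketch follows one level of indirection only, which is all the sample output shows.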

## Why?

I use this script to generate "database" for my Google Chrome Extension called [PHP Ninja Manual](https://chrome.google.com/webstore/detail/clbhjjdhmgeibgdccjfoliooccomjcab "PHP Ninja Manual").

By the way, there's an official [PHP Documentation generator](https://wiki.php.net/doc/articles/phd_ide) for IDEs, but when I started developing my extension it didn't exist. I don't know what its capabilities are now, but maybe it's worth a try.

## Known limitations

* There are no PHP statements (for, if, while, ...)
* It's not able to tell object-oriented style from procedural style in classes like `mysqli`.

## Testing

This package uses [Behat](https://github.com/Behat/Behat) for testing. Run tests with:

$ bin/behat

## License

PHP Documentation Parser is licensed under the Beerware license.
PHP Documentation Parser (this package) is licensed under MIT license.

PHP Documentation pages ([php.net/docs.php](http://php.net/docs.php)) are licensed under [Creative Commons Attribution 3.0 License](http://creativecommons.org/licenses/by/3.0/legalcode).
2 changes: 1 addition & 1 deletion composer.json
@@ -1,7 +1,7 @@
{
"name": "martinsik/php-doc-parser",
"homepage": "https://github.com/martinsik/php-doc-parser",
"description": "Parser for PHP documentation with CLI interface.",
"description": "Parser for PHP documentation with CLI interface and output to JSON + Markdown.",
"license": "MIT",
"require": {
"php": ">=5.4",
Empty file modified doc-parser 100644 → 100755
Empty file.
Binary file added doc/animation.gif
6 changes: 3 additions & 3 deletions features/test-manual-files/datetime.setdate.json
@@ -15,19 +15,19 @@
"type": "int",
"var": "$year",
"beh": "required",
"desc": "Procedural style only: A DateTime object returned by date\\_create(). The function modifies this object."
"desc": "Year of the date."
},
{
"type": "int",
"var": "$month",
"beh": "required",
"desc": "Year of the date."
"desc": "Month of the date."
},
{
"type": "int",
"var": "$day",
"beh": "required",
"desc": "Month of the date."
"desc": "Day of the date."
}
],
"name": "DateTime::setDate",
2 changes: 1 addition & 1 deletion features/test-manual-files/eventhttp.setcallback.json
@@ -26,7 +26,7 @@
"type": "string",
"var": "$arg",
"beh": "optional",
"desc": "EventHttpRequest object."
"desc": "Custom data."
}
],
"name": "EventHttp::setCallback",
14 changes: 14 additions & 0 deletions src/Command/RunCommand.php
@@ -220,6 +220,8 @@ private function parse(Package $package, $outDir, $includeExamples) {
$this->saveOutput($basePath, $functions);
$this->output->writeln("Total functions: <info>" . count($functions) . "</info>");

$this->saveFunctionsList($basePath, $functions);

if ($includeExamples == Parser::EXPORT_EXAMPLES) {
$this->saveExamples($basePath, $examples);
$this->output->writeln("Total examples: <info>" . $results->countAllExamples() . "</info>");
@@ -246,6 +248,18 @@ private function saveExamples($basePath, $examples) {
$this->printJsonError();
}

private function saveFunctionsList($basePath, $functions) {
$normalized = array_map(function($name) {
return strtolower($name);
}, array_keys($functions));

$json = json_encode($normalized, $this->getJsonEncoderFlags());
$filePath = $basePath . '.list.json';
file_put_contents($filePath, $json);
$this->output->writeln("Saving list of all functions to <info>${filePath}</info>");
$this->printJsonError();
}

private function getTmpDir() {
return ($this->input->getOption('tmp-dir') ?: sys_get_temp_dir());
}
