# Python Helpers

The Python helpers in this IPython notebook serve two purposes:

1. Generate Elixir modules out of Pygments's styles. This gives us 29 styles for free, even if some of the styles are a little weird. It's good to have a choice. Because this introspects Python code, it is written in Python and will continue to be written in Python for as long as it's needed.

2. Generate a module with a bunch of helper parsers to recognize ASCII character classes. This should really be done using macros, but it's non-trivial and I'll turn them into macros at a later time.

I've done this in an IPython notebook because it's the best environment for exploratory programming using Python.

This is not meant to be used regularly during development, so I didn't bother creating a proper Python package or even a requirements file. The external dependencies needed to run this notebook are:

* jupyter
* pygments
* jinja2

## Style Modules Generation

The code is pretty simple. It introspects the Python classes with some help from Pygments itself, and then generates Elixir modules with the same functionality. The architecture is of course quite different (Elixir lexers are *data*, not Objects).
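
As a rough illustration of the name mapping involved, the sketch below converts stringified Pygments token names into the `Tok.*` names used on the Elixir side. It mirrors the `py_to_ex` helper in the code cell further down, but works on plain strings so that it runs without Pygments installed. The function name `token_name_to_elixir` is made up for this example.

```python
def token_name_to_elixir(py_name):
    """Convert a stringified Pygments token name into a `Tok.*` macro name."""
    # Pygments nests strings and numbers under Token.Literal; flatten that level.
    name = py_name.replace("Token.Literal.", "Token.")
    # Drop the leading "Token." prefix.
    trimmed = name[len("Token."):]
    # Lowercase the rest and join the remaining parts with underscores.
    return "Tok." + trimmed.lower().replace(".", "_")


for example in [
    "Token.Keyword",
    "Token.Literal.String.Double",
    "Token.Name.Function.Magic",
]:
    print(example, "->", token_name_to_elixir(example))
```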

Please don't edit the part of the `lib/makeup/styles/html/style_map.ex` file between these markers, because it will be overwritten if this is run again. Don't remove the marker lines themselves either, since the generator uses them to find the region to replace:

```elixir
  # %% Start Pygments %%
  ...
  # %% End Pygments %%
```
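
To make the overwriting concrete, here is a minimal sketch of the marker-based replacement. It uses the marker text that the regular expression in the generator cell looks for (`# %% Start Pygments %%` and `# %% End Pygments %%`, each indented by two spaces). The helper name `replace_between_markers` and the sample module are assumptions made for this example only.

```python
import re

# Same pattern as in the generator cell: everything between the two marker lines.
MARKER_PATTERN = re.compile(
    r"(?<=  # %% Start Pygments %%)(\r?\n)"
    r"(.*?\r?\n)"
    r"(?=  # %% End Pygments %%)",
    re.DOTALL,
)


def replace_between_markers(source, fragment):
    # A function is used as the replacement so that any backslashes in
    # `fragment` are inserted literally instead of being read as escapes.
    return MARKER_PATTERN.sub(lambda match: fragment, source)


example_source = (
    "defmodule Example do\n"
    "  # %% Start Pygments %%\n"
    "  def old_style, do: :old\n"
    "  # %% End Pygments %%\n"
    "end\n"
)

print(replace_between_markers(example_source, "\n  def new_style, do: :new\n"))
```

Anything typed by hand between the two markers (here `old_style`) is lost, while the text outside them is left alone.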

In [67]:
import pygments.styles
import jinja2
import textwrap
from itertools import chain
import os
import re

tokens = [
  'Tok.text',
  'Tok.whitespace',
  'Tok.escape',
  'Tok.error',
  'Tok.other' ,
  'Tok.keyword',
  'Tok.keyword_constant',
  'Tok.keyword_declaration',
  'Tok.keyword_namespace',
  'Tok.keyword_pseudo',
  'Tok.keyword_reserved',
  'Tok.keyword_type' ,
  'Tok.name',
  'Tok.name_attribute',
  'Tok.name_builtin',
  'Tok.name_builtin_pseudo',
  'Tok.name_class',
  'Tok.name_constant',
  'Tok.name_decorator',
  'Tok.name_entity',
  'Tok.name_exception',
  'Tok.name_function',
  'Tok.name_function_magic',
  'Tok.name_property',
  'Tok.name_label',
  'Tok.name_namespace',
  'Tok.name_other',
  'Tok.name_tag',
  'Tok.name_variable',
  'Tok.name_variable_class',
  'Tok.name_variable_global',
  'Tok.name_variable_instance',
  'Tok.name_variable_magic',
  'Tok.literal',
  'Tok.literal_date',
  'Tok.string',
  'Tok.string_affix',
  'Tok.string_backtick',
  'Tok.string_char',
  'Tok.string_delimiter',
  'Tok.string_doc',
  'Tok.string_double',
  'Tok.string_escape',
  'Tok.string_heredoc',
  'Tok.string_interpol',
  'Tok.string_other',
  'Tok.string_regex',
  'Tok.string_sigil',
  'Tok.string_single',
  'Tok.string_symbol',
  'Tok.number',
  'Tok.number_bin',
  'Tok.number_float',
  'Tok.number_hex',
  'Tok.number_integer',
  'Tok.number_integer_long',
  'Tok.number_oct',
  'Tok.operator',
  'Tok.operator_word',
  'Tok.punctuation',
  'Tok.comment',
  'Tok.comment_hashbang',
  'Tok.comment_multiline',
  'Tok.comment_preproc',
  'Tok.comment_preproc_file',
  'Tok.comment_single',
  'Tok.comment_special',
  'Tok.generic',
  'Tok.generic_deleted',
  'Tok.generic_emph',
  'Tok.generic_error',
  'Tok.generic_heading',
  'Tok.generic_inserted',
  'Tok.generic_output',
  'Tok.generic_prompt',
  'Tok.generic_strong',
  'Tok.generic_subheading',
  'Tok.generic_traceback']

style_module_template = jinja2.Template('''
defmodule Makeup.Styles.HTML.{{module_name}} do
  @moduledoc false

  require Makeup.Token.TokenTypes
  alias Makeup.Token.TokenTypes, as: Tok

  @styles %{
    {% for tok in tokens %}
    {%- if styles[ex_to_py[tok]] %}{{ tok }} => "{{ styles[ex_to_py[tok]] }}"{% if not loop.last %},{% endif %}
    {% endif -%}
    {%- endfor %}
  }
  
  alias Makeup.Styles.HTML.Style
  
  @style_struct Style.make_style(
      short_name: "{{ short_name }}",
      long_name: "{{ long_name }}",
      background_color: "{{ background_color }}",
      highlight_color: "{{ highlight_color }}",
      styles: @styles)
      
  def style() do
    @style_struct()
  end
end
''')

style_map_file_fragment = jinja2.Template('''
  {% for (lowercase, uppercase) in pairs %}
  @doc """
  The *{{ lowercase }}* style. Example [here](https://tmbb.github.io/makeup_demo/elixir.html#{{ lowercase }}).
  """
  def {{ lowercase }}_style, do: HTML.{{ uppercase }}.style()
  
  {% endfor -%}
  
  # All styles
  @pygments_style_map_binaries %{
  {% for (lowercase, uppercase) in pairs %}  "{{ lowercase }}" => HTML.{{ uppercase }}.style(),
  {% endfor %}  }
    
  @pygments_style_map_atoms %{
  {% for (lowercase, uppercase) in pairs %}  {{ lowercase }}: HTML.{{ uppercase }}.style(),
  {% endfor %}}


''')

def py_to_ex(cls):
    # We don't want to operate on token classes, only their names
    name = str(cls)
    # They are of the form "Token.*"
    # Trim the "Token." prefix
    name = name.replace('Token.Literal.', 'Token.')
    trimmed = name[6:]
    # Convert to lower case
    # It would be confusing to have them in uppercase in Elixir
    # because they could be mistaken for aliases.
    # Besides, having them in lowercase allows us to use macros
    # to make sure at compile time we're not using any nonexistent styles.
    lowered = trimmed.lower()
    # Continue turning them into valid identifiers
    replaced = lowered.replace('.', '_')
    # Turn it into a macro under the Tok alias
    return (str(cls), 'Tok.' + replaced)

def invert(pairs):
    return [(y, x) for (x, y) in pairs]
        
def stringify_styles(styles):
    return dict((str(k), v) for (k,v) in styles.items())

def correct_docs(text, level=2):
    # The module docs are written in rST.
    # rST is similar enough to markdown that we can fake it
    # by removing the first lines with the title and
    # replacing some directives.
    
    # First, remove all indent
    md = textwrap.dedent(text)
    # Replace the :copyright: directive
    md = md.strip().replace(':copyright:', '&copy;')
    # Replace the :license: directive
    md = md.replace(':license:', 'License:')
    # Add a link to the BSD license
    md = md.replace('see LICENSE for details',
                    'see [here](https://opensource.org/licenses/BSD-3-Clause) for details')
    # Escape the '*' character, which is probably not used for emphasis by the license
    md = md.replace('*', '\\*')
    # remove the first 3 lines, which contain the title
    # and indent all lines (2 spaces by default)
    indented = "\n".join(((" " * level) + line) for line in md.split('\n')[3:])
    return indented

def style_to_ex_module(key, value, tokens):
    # Pygments stores the module name and the class name under this weird format
    module_name, class_name = value.split('::')
    # Import the module
    __import__('pygments.styles.' + module_name)
    # Store the module in a variable
    module = getattr(pygments.styles, module_name)
    short_name = module_name
    long_name = class_name[:-5] + " " + class_name[-5:]
    # Extract the class from the module
    style_class = getattr(module, class_name)
    # Map the Elixir styles into Python stringified token classes
    ex_to_py = dict(invert([py_to_ex(k) for k in style_class.styles.keys()]))
    stringified_styles = stringify_styles(style_class.styles)
    # Render the tokens
    return style_module_template.render(
        # Preprocess the docs
        moduledoc=correct_docs(module.__doc__, 2),
        # We take the style name unchanged from Python
        # (including the *Style suffix)
        module_name=style_class.__name__,
        # The elixir token styles
        tokens=tokens,
        # Other class attributes
        short_name=short_name,
        long_name=long_name,
        background_color=style_class.background_color,
        highlight_color=style_class.highlight_color,
        styles=stringified_styles,
        ex_to_py=ex_to_py)

def all_styles(style_map, tokens):
    # This function generates one Elixir file (with one module) for each Pygments style.
    # It will overwrite existing files.
    for key, value in style_map.items():
        source = style_to_ex_module(key, value, tokens)
        # The path where we'll generate the file
        file_path = os.path.join('lib/makeup/styles/html/pygments/', key + '.ex')
        with open(file_path, 'wb') as f:
            f.write(source.encode())

def generate_style_map_file(style_map):
    sorted_pairs = sorted([
      # Turn the key into a valid Elixir identifier
      (key.replace('-', '_'), value.split('::')[1])
        for (key, value) in style_map.items()
    ])
    # Generate the new text fragment
    new_fragment = style_map_file_fragment.render(pairs=sorted_pairs)
    file_path = os.path.join('lib/makeup/styles/html/style_map.ex')
    with open(file_path, 'r') as f:
        source = f.read()
    
    # Recognize the pattern to replace
    pattern = re.compile(
                 "(?<=  # %% Start Pygments %%)(\r?\n)"
                 "(.*?\r?\n)"
                 "(?=  # %% End Pygments %%)", re.DOTALL)
    # Replace the text between the markers
    replaced = re.sub(
        pattern,
        new_fragment,
        source)
    # Check we've done the right thing
    print(replaced)
    # Replace the file contents
    with open(file_path, 'wb') as f:
        f.write(replaced.encode())
    
# (Re)generate modules for all styles
all_styles(pygments.styles.STYLE_MAP, tokens)
# Regenerate the style_map file
generate_style_map_file(pygments.styles.STYLE_MAP)

defmodule Makeup.Styles.HTML.StyleMap do
  alias Makeup.Styles.HTML

  # %% Start Pygments %%
  
  @doc """
  The *abap* style. Example [here](https://tmbb.github.io/makeup_demo/elixir.html#abap).
  """
  def abap_style, do: HTML.AbapStyle.style()
  
  
  @doc """
  The *algol* style. Example [here](https://tmbb.github.io/makeup_demo/elixir.html#algol).
  """
  def algol_style, do: HTML.AlgolStyle.style()
  
  
  @doc """
  The *algol_nu* style. Example [here](https://tmbb.github.io/makeup_demo/elixir.html#algol_nu).
  """
  def algol_nu_style, do: HTML.Algol_NuStyle.style()
  
  
  @doc """
  The *arduino* style. Example [here](https://tmbb.github.io/makeup_demo/elixir.html#arduino).
  """
  def arduino_style, do: HTML.ArduinoStyle.style()
  
  
  @doc """
  The *autumn* style. Example [here](https://tmbb.github.io/makeup_demo/elixir.html#autumn).
  """
  def autumn_style, do: HTML.AutumnStyle.style()
  
  
  @doc """
  The *borland* style. Example [here](https://tmbb.github.io/makeup

## Generating Helper Functions

The following part is an ugly hack until I figure out how to do it with macros or functions called at compile time. Whatever we do, the generated functions must support `@doc` attributes.

For each "master" function (say, `letter`), six functions are generated:

* `letter` - recognizes 1 character among the matchers given as arguments
* `letters` - recognizes 0 or more characters among the matchers given as arguments
* `letters1` - recognizes 1 or more characters among the matchers given as arguments
* `letter_` - recognizes 1 character among the matchers given as arguments OR an underscore
* `letters_` - recognizes 0 or more characters among the matchers given as arguments OR underscores
* `letters1_` - recognizes 1 or more characters among the matchers given as arguments OR underscores
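
The naming scheme in the list above can be sketched in a few lines of Python. The helper `variant_names` is hypothetical and not part of the notebook. The `underscore` flag reflects the fact that the template below skips the underscore variants for `space`.

```python
def variant_names(master, underscore=True):
    """Return the names generated for one "master" name, in the order listed above."""
    plural = master + "s"
    names = [master, plural, plural + "1"]
    if underscore:
        # Same three names, each with a trailing underscore.
        names += [name + "_" for name in names]
    return names


print(variant_names("letter"))
print(variant_names("space", underscore=False))
```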

These functions are trivial, but getting character matchers right can be a source of errors (did you forget the underscore? Did you forget the uppercase versions?), and having these functions helps in avoiding those errors, as well as making the grammars a little more readable sometimes.
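
One way to catch such slips before generating any code is to expand a matcher list into the concrete set of characters it accepts and check it against what was intended. The sketch below is an assumption-laden helper, not part of the notebook: the names `parse_char` and `expand_matchers` are made up, and it only understands the forms used in this notebook (`?x`, `?x..?y`, and the whitespace escapes).

```python
# Elixir character escapes that appear in the `space` matcher.
ESCAPES = {"s": " ", "t": "\t", "n": "\n", "r": "\r", "f": "\f", "v": "\v"}


def parse_char(literal):
    """Turn an Elixir character literal such as `?a` or `?\\s` into a Python character."""
    body = literal.strip()[1:]  # drop the leading '?'
    if body.startswith("\\"):
        return ESCAPES[body[1:]]
    return body


def expand_matchers(matchers):
    """Expand a matcher list such as `[?A..?Z, ?_]` into a set of characters."""
    inner = matchers.strip()[1:-1]  # drop the surrounding brackets
    chars = set()
    for item in inner.split(","):
        item = item.strip()
        if ".." in item:
            start, end = item.split("..")
            first, last = ord(parse_char(start)), ord(parse_char(end))
            chars.update(chr(code) for code in range(first, last + 1))
        else:
            chars.add(parse_char(item))
    return chars


# An "uppercase" hex digit matcher must accept 'F' but not 'f'.
accepted = expand_matchers("[?0..?9, ?A..?F]")
print("F" in accepted, "f" in accepted)
```

Running a check like this over every line of the `macros` string would flag a matcher whose ranges don't match its name.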

I'm still not very happy with the names of the underscore versions. Maybe they should become `parser_underscore`?

In [3]:
import jinja2

macros = r"""
space: [?\s, ?\t, ?\n, ?\r, ?\f, ?\v]
letter: [?A..?Z, ?a..?z]
lowercase_letter: [?a..?z]
uppercase_letter: [?A..?Z]
alphanum: [?A..?Z, ?a..?z, ?0..?9]
lowercase_alphanum: [?a..?z, ?0..?9]
uppercase_alphanum: [?A..?Z, ?0..?9]
lowercase_word_char: [?a..?z, ?0..?9, ?_]
uppercase_word_char: [?A..?Z, ?0..?9, ?_]
digit: [?0..?9]
hex_digit: [?0..?9, ?a..?f, ?A..?F]
lowercase_hex_digit: [?0..?9, ?a..?f]
uppercase_hex_digit: [?0..?9, ?A..?F]
"""

template = jinja2.Template('''
defmodule Makeup.Lexer.Common.ASCII do
  @moduledoc """
  Helpers to work with ASCII character classes
  """
  
  alias Makeup.Lexer.Common.Macros, as: M
  require M
  
{% for macro in macros %}
  @doc """
  Recognizes an ASCII {{ macro['name_pretty'] }} (`{{ macro['escaped_matchers'] }}`).
  """
  defmacro {{ macro["name"] }}(ast_context) do
    quote do
      M.char(unquote(ast_context), {{ macro["matchers"] }})
    end
  end
  
  @doc """
  Recognizes zero or more ASCII {{ macro['name_pretty'] }}s (`{{ macro['escaped_matchers'] }}`).
  """
  defmacro {{ macro["name"] }}s(ast_context) do
    quote do
      M.chars(unquote(ast_context), {{ macro["matchers"] }}, 0)
    end
  end

  @doc """
  Recognizes one or more ASCII {{ macro['name_pretty'] }}s (`{{ macro['escaped_matchers'] }}`).
  """
  defmacro {{ macro["name"] }}s1(ast_context) do
    quote do
      M.chars(unquote(ast_context), {{ macro["matchers"] }}, 1)
    end
  end

  {% if macro['name'] != 'space' %}
  @doc """
  Recognizes an ASCII {{ macro['name_pretty'] }} (`{{ macro['escaped_matchers'] }}`) or an underscore (`?_`).
  """
  defmacro {{ macro["name"] }}_(ast_context) do
    quote do
      M.char(unquote(ast_context), {{ macro["matchers_underscore"] }})
    end
  end
  
  @doc """
  Recognizes zero or more ASCII {{ macro['name_pretty'] }}s (`{{ macro['escaped_matchers'] }}`) or underscores (`?_`).
  """
  defmacro {{ macro["name"] }}s_(ast_context) do
    quote do
      M.chars(unquote(ast_context), {{ macro["matchers_underscore"] }}, 0)
    end
  end

  @doc """
  Recognizes one or more ASCII {{ macro['name_pretty'] }}s (`{{ macro['escaped_matchers'] }}`) or underscores (`?_`).
  """
  defmacro {{ macro["name"] }}s1_(ast_context) do
    quote do
      M.chars(unquote(ast_context), {{ macro["matchers_underscore"] }}, 1)
    end
  end
  {% endif %}
{% endfor %}
end
''')

def add_underscore(matchers):
    # 'matchers' is a string, so we'll need some string manipulation
    # to insert the underscore character.
    return matchers[:-1] + ", ?_]"

# Split at the obvious place
pairs = [tuple(line.split(': ', 1)) for line in macros.strip().split('\n')]
macros = [
 dict(name=name,
      name_pretty=name.replace('_', ' '),
      matchers=matchers,
      matchers_underscore=add_underscore(matchers),
      # Escape the matchers for the @doc attribute (in case they contain backslashes)
      escaped_matchers=matchers.replace('\\', '\\\\'))
 for (name, matchers) in pairs]
# Print the template so that we can copy-paste to the file 
# (inefficient, but enough for now)
print(template.render(macros=macros))


defmodule Makeup.Lexer.Common.ASCII do
  @moduledoc """
  Helpers to work with ASCII character classes
  """
  
  alias Makeup.Lexer.Common.Macros, as: M
  require M
  

  @doc """
  Recognizes an ASCII space (`[?\\s, ?\\t, ?\\n, ?\\r, ?\\f, ?\\v]`).
  """
  defmacro space(ast_context) do
    quote do
      M.char(unquote(ast_context), [?\s, ?\t, ?\n, ?\r, ?\f, ?\v])
    end
  end
  
  @doc """
  Recognizes zero or more ASCII spaces (`[?\\s, ?\\t, ?\\n, ?\\r, ?\\f, ?\\v]`).
  """
  defmacro spaces(ast_context) do
    quote do
      M.chars(unquote(ast_context), [?\s, ?\t, ?\n, ?\r, ?\f, ?\v], 0)
    end
  end

  @doc """
  Recognizes one or more ASCII spaces (`[?\\s, ?\\t, ?\\n, ?\\r, ?\\f, ?\\v]`).
  """
  defmacro spaces1(ast_context) do
    quote do
      M.chars(unquote(ast_context), [?\s, ?\t, ?\n, ?\r, ?\f, ?\v], 1)
    end
  end

  

  @doc """
  Recognizes an ASCII letter (`[?A..?Z, ?a..?z]`).
  """
  defmacro letter(ast_context) do
    quote do
      M.char(unquote(ast_contex