- At the top of the Jupyter Notebook there must be a markdown cell with the student ID and name of each team member, as well as a brief description of the project.
- Every Clojure function defined in the program must have a markdown cell (immediately above the corresponding code cell) documenting its purpose in a brief statement.
- The code implementing the solution must follow Clojure's style and coding conventions.


# Sequential syntax highlighter for Python 3.9.5

## Authors:
* Luis Ignacio Ferro Salinas A01378248

This notebook implements a syntax highlighter for the Python language, based on its most recent lexical specification, described in [Python lexical analysis].
The categories we chose to highlight are the following.

* Comments.
* Keywords.
* Identifiers.
* Strings.
* Bytes.
* Integers.
* Floats.
* Imaginaries.
* Operators.
* Delimiters.

[Python lexical analysis]: <https://docs.python.org/3/reference/lexical_analysis.html>

### Comments
According to the lexical specification, comments in Python start with a `#` and run to the end of the line.

In [41]:
(def regex-comment  #"\#.*")

#'user/regex-comment

In [42]:
(re-seq regex-comment "# lol.\n# Primer comentario serio.")

("# lol." "# Primer comentario serio.")
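Used on its own, a comment pattern like this one cannot tell a real comment from a `#` that appears inside a string literal. The snippet below is a small illustration of that limitation; `comment-only` is a hypothetical name that repeats the same pattern so the example is self-contained.

```clojure
;; Same pattern as regex-comment, redefined so the snippet runs on its own.
(def comment-only #"\#.*")

;; The first # sits inside a string literal, yet the match starts there.
(re-seq comment-only "s = \"a # b\"  # real comment")
;; => ("# b\"  # real comment")
```

This is one reason to try the categories together in a single alternation: a string that starts earlier in the line is consumed before the `#` inside it can be seen as a comment.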

### Keywords
Keywords in Python are reserved words that cannot be used as identifiers because they have specific purposes.
They are the following:
- False
- True
- None
- and
- as
- assert
- async
- await
- break
- class
- continue
- def
- del
- elif
- else
- except
- finally
- for
- from
- global
- if
- import
- in
- is
- lambda
- nonlocal
- not
- or
- pass
- raise
- return
- try
- while
- with
- yield

In [43]:
(def regex-keyword #"\bFalse\b|\bTrue\b|\bNone\b|\band\b|\bas\b|\bassert\b|\basync\b|\bawait\b|\bbreak\b|\bclass\b|\bcontinue\b|\bdef\b|\bdel\b|\belif\b|\belse\b|\bexcept\b|\bfinally\b|\bfor\b|\bfrom\b|\bglobal\b|\bif\b|\bimport\b|\bin\b|\bis\b|\blambda\b|\bnonlocal\b|\bnot\b|\bor\b|\bpass\b|\braise\b|\breturn\b|\btry\b|\bwhile\b|\bwith\b|\byield\b")



#'user/regex-keyword

In [44]:
(re-seq regex-keyword "\nif True:
        print()")

("if" "True")
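Writing the alternation by hand makes it easy to drop a `\b` on one of the keywords. As a minimal sketch, the same kind of pattern can be generated from the keyword list; the names `python-keywords` and `regex-keyword-generated` are hypothetical and not part of the notebook.

```clojure
(require '[clojure.string :as string])

;; The keyword list from the section above.
(def python-keywords
  ["False" "True" "None" "and" "as" "assert" "async" "await" "break" "class"
   "continue" "def" "del" "elif" "else" "except" "finally" "for" "from"
   "global" "if" "import" "in" "is" "lambda" "nonlocal" "not" "or" "pass"
   "raise" "return" "try" "while" "with" "yield"])

;; One shared pair of word boundaries around the whole alternation.
(def regex-keyword-generated
  (re-pattern (str "\\b(?:" (string/join "|" python-keywords) ")\\b")))

(re-seq regex-keyword-generated "if nothing is not None: pass")
;; => ("if" "is" "not" "None" "pass")
```

The word `nothing` is not reported, because the closing `\b` applies to every alternative.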

### Identifiers
In Python, identifiers are defined after NFKC normalization; leaving normalization aside, the specification gives a regular expression:
````
id_start id_continue*
````
In id_start the underscore is allowed, along with the following categories:
- Lu
- Ll
- Lt
- Lm
- Lo
- Nl


In id_continue, everything allowed in id_start is permitted, plus the following categories:
- Mn
- Mc
- Nd
- Pc

The categories have the following meanings:
- Lu (uppercase letters)
- Ll (lowercase letters)
- Lt (titlecase letters)
- Lm (modifier letters)
- Lo (other letters)
- Nl (letter numbers)
- Mn (nonspacing marks)
- Mc (spacing combining marks)
- Nd (decimal numbers)
- Pc (connector punctuations)


In [45]:
(def regex-identifiers #"\b(?:_|\p{L}|\p{Nl})(?:_|\p{L}|\p{Nl}|\p{Mn}|\p{Mc}|\p{Nd}|\p{Pc})*\b")


#'user/regex-identifiers

In [46]:
(re-seq regex-identifiers "print('buenas tardes')")

("print" "buenas" "tardes")
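The pattern above relies on `\b`, and the way `\b` treats non-ASCII letters may vary between Java versions, so an identifier that begins with a letter such as `ñ` might be cut short on some runtimes. The following is a minimal sketch of an alternative, assuming a lookbehind built from the same Unicode categories is acceptable; `regex-identifier-lookbehind` is a hypothetical name.

```clojure
;; An identifier may not start right after another identifier character,
;; so the lookbehind replaces the leading \b. The trailing \b is not needed
;; because the id_continue class is matched greedily.
(def regex-identifier-lookbehind
  #"(?<![\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}])[\p{L}\p{Nl}_][\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}]*")

(re-seq regex-identifier-lookbehind "ñu = señal_1 + 2x")
;; => ("ñu" "señal_1")
```

The `x` in `2x` is skipped because it is preceded by a decimal digit, which is the role `\b` plays in the original pattern.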

### Strings
String literals in Python have an optional prefix. They can be short strings delimited by ' or by ".
They can also be long strings, which may span multiple lines and are delimited by ''' or by """.
The escape sequences that require additional states are:
- Up to 3 octal digits: \ooo
- Exactly 2 hexadecimal digits: \xhh
- A 16-bit Unicode character with 4 hexadecimal digits: \uxxxx
- A 32-bit Unicode character with 8 hexadecimal digits: \Uxxxxxxxx
- A Unicode character by name: \N{name}

In [47]:
(def regex-string #"(?xm)
  (?:[rRuUfF]|fr|fR|FR|Fr|rf|rF|RF|Rf)? 
    (?:
      (?:'''
        (?:[^\\]
         | \\
          (?:[abfnrt'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}
           | u[a-fA-F0-9]{4}
           | U[a-fA-F0-9]{8}
           | N\{[^}]+\}))*?''')
    | (?:\"\"\"
        (?:[^\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}
           | u[a-fA-F0-9]{4}
           | U[a-fA-F0-9]{8}
           | N\{[^}]+\}))*?\"\"\")
    | (?:'
        (?:[^\n\'\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}
           | u[a-fA-F0-9]{4}
           | U[a-fA-F0-9]{8}
           | N\{[^}]+\}))*')
    | (?:\"
        (?:[^\n\"\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}
           | u[a-fA-F0-9]{4}
           | U[a-fA-F0-9]{8}
           | N\{[^}]+\}))*\"))")


#'user/regex-string

In [48]:
(re-seq regex-string "\"\"\"buenas\"\"\"")

("\"\"\"buenas\"\"\"")

In [49]:
(def texto-ejemplo (slurp "texto_ejemplo.txt"))

#'user/texto-ejemplo

In [50]:
texto-ejemplo

"# This is a comment.\nvariable = 10\n# This is another comment.\nmy_string = \"this is a string\"\n\nmy_second_string = 'this is another string'\nmy_third_string = \"\"\" This string goes for multiple lines\n\t\t      weird\"\"\"\nmy_fourth_string = f'''A formatted string that goes for\n \t\t       multiple lines'''\n"

In [51]:
(re-seq regex-string texto-ejemplo)

("\"this is a string\"" "'this is another string'" "\"\"\" This string goes for multiple lines\n\t\t      weird\"\"\"" "f'''A formatted string that goes for\n \t\t       multiple lines'''")
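When a file contains more than one triple-quoted string, it matters whether the body of a long-string pattern is matched greedily or lazily. The sketch below uses two deliberately simplified patterns (no prefixes, no escape handling) to show the difference; `two-docstrings` is a hypothetical sample input.

```clojure
(def two-docstrings "a = '''x'''\nb = '''y'''")

;; Greedy body: runs to the LAST ''' in the text, merging both strings.
(re-seq #"(?s)'''.*'''" two-docstrings)
;; => ("'''x'''\nb = '''y'''")

;; Lazy body: stops at the first closing ''' after each opening one.
(re-seq #"(?s)'''.*?'''" two-docstrings)
;; => ("'''x'''" "'''y'''")
```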

In [52]:
(def matches (re-seq regex-comment (slurp "texto_ejemplo.txt")))
matches

("# This is a comment." "# This is another comment.")

### Bytes
Bytes literals in Python always have a prefix. They can use single or double quotes, and can be single-line or multi-line, enclosed in one quote or three quotes respectively.

They accept a subset of the escape sequences that strings accept:
- Up to 3 octal digits: \ooo
- Exactly 2 hexadecimal digits: \xhh

In [53]:
(def regex-byte #"(?xm)
  (?:b|B|br|bR|Br|BR|rb|rB|Rb|RB)
    (?:
      (?:'''
        (?:[^\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}))*?''')
    | (?:\"\"\"
        (?:[^\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}))*?\"\"\")
    | (?:'
        (?:[^\n\'\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}))*')
    | (?:\"
        (?:[^\n\"\\]
         | \\
          (?:[abfnrt\'\"]
            | [0-7]{1,3}
            | x[a-fA-F0-9]{2}))*\"))")


#'user/regex-byte

In [54]:
(re-seq regex-byte "my_byte")

nil

### Combined expression with capture groups: each match returns a vector with the matched string and the capture groups

In [55]:
(def regex-python #"(?xm)
  (\#.*)
| (\bFalse\b|\bTrue\b|\bNone\b|\band\b|\bas\b|\bassert\b|\basync\b|\bawait\b|\bbreak\b|\bclass\b|\bcontinue\b|\bdef\b|\bdel\b|\belif\b|\belse\b|\bexcept\b|\bfinally\b|\bfor\b|\bfrom\b|\bglobal\b|\bif\b|\bimport\b|\bin\b|\bis\b|\blambda\b|\bnonlocal\b|\bnot\b|\bor\b|\bpass\b|\braise\b|\breturn\b|\btry\b|\bwhile\b|\bwith\b|\byield\b)
| ((?:b|B|br|bR|Br|BR|rb|rB|Rb|RB)
    (?:
      (?:'''
        (?:[^\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}))*?''')
    | (?:\"\"\"
        (?:[^\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}))*?\"\"\")
    | (?:'
        (?:[^\n\'\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}))*')
    | (?:\"
        (?:[^\n\"\\]
         | \\
          (?:[abfnrt\'\"]
            | [0-7]{1,3}
            | x[a-fA-F0-9]{2}))*\"))) 
| ((?:[rRuUfF]|fr|fR|FR|Fr|rf|rF|RF|Rf)? 
    (?:
      (?:'''
        (?:[^\\]
         | \\
          (?:[abfnrt'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}
           | u[a-fA-F0-9]{4}
           | U[a-fA-F0-9]{8}
           | N\{[^}]+\}))*?''')
    | (?:\"\"\"
        (?:[^\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}
           | u[a-fA-F0-9]{4}
           | U[a-fA-F0-9]{8}
           | N\{[^}]+\}))*?\"\"\")
    | (?:'
        (?:[^\n\'\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}
           | u[a-fA-F0-9]{4}
           | U[a-fA-F0-9]{8}
           | N\{[^}]+\}))*')
    | (?:\"
        (?:[^\n\"\\]
         | \\
          (?:[abfnrt\'\"]
           | [0-7]{1,3}
           | x[a-fA-F0-9]{2}
           | u[a-fA-F0-9]{4}
           | U[a-fA-F0-9]{8}
           | N\{[^}]+\}))*\")))
| (\b(?:_|\p{L}|\p{Nl})(?:_|\p{L}|\p{Nl}|\p{Mn}|\p{Mc}|\p{Nd}|\p{Pc})*\b)")


#'user/regex-python

In [56]:
(defn strings-indices-spans
  "Turns each match into [matched-string start-index span-html]; the span color depends on which capture group matched."
  [matches]
  (map (fn [match] ; match is [full-match group-1 ... group-5 start-index]
         (cond 
           (match 1) [(match 0) (last match) (str "<span style='color:red'>" (match 0) "</span>")]       ; comment
           (match 2) [(match 0) (last match) (str "<span style='color:green'>" (match 0) "</span>")]     ; keyword
           (match 3) [(match 0) (last match) (str "<span style='color:blue'>" (match 0) "</span>")]      ; bytes
           (match 4) [(match 0) (last match) (str "<span style='color:brown'>" (match 0) "</span>")]     ; string
           (match 5) [(match 0) (last match) (str "<span style='color:turquoise'>" (match 0) "</span>")])) ; identifier
       matches))

#'user/strings-indices-spans

In [57]:
(defn replace-strings-by-spans
  "Replaces every matched string in the text with its span.
  Each entry is [matched-string start-index span-html], with start indices relative to the original text."
  [strings-indices-spans-arg temporal-string-arg]
  (loop [temporal-string temporal-string-arg
         strings-indices-spans strings-indices-spans-arg
         offset 0] ; characters added so far by earlier replacements
    (if (empty? strings-indices-spans)
      temporal-string
      (let [[matched start span] (first strings-indices-spans)]
        (if (and matched start span)
          (let [begin (+ start offset)] ; where the match now starts in temporal-string
            (recur (str (subs temporal-string 0 begin)
                        span
                        (subs temporal-string (+ begin (count matched))))
                   (rest strings-indices-spans)
                   (+ offset (- (count span) (count matched)))))
          (recur temporal-string (rest strings-indices-spans) offset))))))

#'user/replace-strings-by-spans

In [58]:
(defn generate-matches 
  "Collects every match of matcher as a vector [full-match group-1 ... group-n start-index]."
  [matcher] 
  (loop [matches []
         match (re-find matcher)]
    (if (not (first match))
      matches
      (recur (conj matches 
                   (conj match 
                         (.start matcher))) 
             (re-find matcher)))))

#'user/generate-matches
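To make the shape of the collected data concrete, here is a small self-contained sketch of the same idea on a toy pattern with two capture groups. `matches-with-start` is a hypothetical stand-in for `generate-matches` that takes the regex and the text directly.

```clojure
(defn matches-with-start
  "Returns a vector of [full-match group-1 ... group-n start-index] for each match."
  [regex text]
  (let [m (re-matcher regex text)]
    (loop [acc []]
      (if (.find m)
        ;; re-groups returns [full-match group-1 ...] when the pattern has groups.
        (recur (conj acc (conj (re-groups m) (.start m))))
        acc))))

(matches-with-start #"(a)|(b)" "xaxb")
;; => [["a" "a" nil 1] ["b" nil "b" 3]]
```

The group that did not take part in a match is `nil`, which is what lets a `cond` over the groups pick the category.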

In [59]:
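(defn big-guy
  "Highlights the Python file file-name with language-regex and writes the result
  to an .html file of the same name, using starting_html.html as the template."
  [file-name language-regex]
  (let [source     (slurp file-name) ; read the source file only once
        split-html (clojure.string/split (slurp "starting_html.html") #"\<pre\>")
        matches    (generate-matches (re-matcher language-regex source))]
    (spit (clojure.string/replace file-name ".py" ".html")
          (str (split-html 0)
               "<pre>"
               (replace-strings-by-spans (strings-indices-spans matches) source)
               (split-html 1)))))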


#'user/big-guy

In [60]:
(big-guy "python-file.py" regex-python)

nil
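The pipeline above copies the source text into the HTML unchanged, so Python code that contains `<`, `>` or `&` could be interpreted as markup by the browser. The following is a minimal sketch of one way to handle this, assuming it is acceptable to build the output piece by piece instead of replacing by index; `escape-html` and `highlight-escaped` are hypothetical helpers, and a single color is used to keep the example short.

```clojure
(require '[clojure.string :as string])

(defn escape-html
  "Escapes the characters that are significant in HTML text."
  [s]
  (string/escape s {\& "&amp;" \< "&lt;" \> "&gt;"}))

(defn highlight-escaped
  "Wraps every match of regex in a colored span, escaping both the matches
  and the text between them."
  [regex color text]
  (let [m (re-matcher regex text)]
    (loop [pos 0, pieces []]
      (if (.find m)
        (recur (.end m)
               (conj pieces
                     (escape-html (subs text pos (.start m)))
                     (str "<span style='color:" color "'>" (escape-html (.group m)) "</span>")))
        (apply str (conj pieces (escape-html (subs text pos))))))))

(highlight-escaped #"\#.*" "red" "x < 1 # a<b")
;; => "x &lt; 1 <span style='color:red'># a&lt;b</span>"
```

Because each piece is escaped before it is appended, no index has to be adjusted after a replacement.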

### Integers

In [2]:
(def regex-integer #"([+-]?\d+)")

#'user/regex-integer
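`regex-integer` accepts an optional sign followed by decimal digits. The lexical specification also defines binary, octal and hexadecimal literals, allows underscores between digits, and treats a leading sign as a separate operator. A minimal sketch that follows those productions is shown below; `regex-integer-sketch` is a hypothetical name and the pattern is an approximation, not a replacement verified against every case in the specification.

```clojure
(def regex-integer-sketch
  #"(?x)
    \b(?: 0[bB](?:_?[01])+          # binary
        | 0[oO](?:_?[0-7])+         # octal
        | 0[xX](?:_?[0-9a-fA-F])+   # hexadecimal
        | [1-9](?:_?\d)*            # non-zero decimal
        | 0+(?:_?0)*                # zero
      )\b")

(re-seq regex-integer-sketch "x = 0xFF + 1_000 - 0b101 + 0o17 + 0")
;; => ("0xFF" "1_000" "0b101" "0o17" "0")
```

Used alone, this pattern would also pick out the digits on either side of the point in a float such as `1.5`, so in a combined expression the float alternative would need to come first.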

### Floats

In [4]:
(def regex-floats #"(([-+]?(\d+[.]|[.]\d+)\d*([eE][-+]?\d+)?)|([-+]?\d+[eE][-+]?\d+))")

#'user/regex-floats

### Imaginaries

In [5]:
(def regex-complex #"(([-+]?(\d+[.]|[.]\d+)?\d*([eE][-+]?\d+)?([-+]?(\d+[.]|[.]\d+)?\d*([eE][-+]?\d+)?[j]))|([-+]?\d+[eE][-+]?\d+[j]))")

#'user/regex-complex
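In `regex-complex`, every part before `[j]` in the first alternative is optional, so a bare `j` (a valid identifier) can match, and an uppercase `J` suffix is not accepted. The following is a minimal sketch that builds float and imaginary patterns from the `digitpart`, `pointfloat`, `floatnumber` and `imagnumber` productions of the lexical specification. All names are hypothetical, and signs are left out because the specification treats them as operators.

```clojure
;; Building blocks as plain strings, so they can be composed with str.
(def digitpart "\\d(?:_?\\d)*")
(def exponent (str "[eE][+-]?" digitpart))
(def pointfloat (str "(?:(?:" digitpart ")?\\." digitpart "|" digitpart "\\.)"))
(def floatnumber (str "(?:" pointfloat "(?:" exponent ")?|" digitpart exponent ")"))

(def regex-float-sketch (re-pattern floatnumber))

;; imagnumber ::= (floatnumber | digitpart) ("j" | "J")
(def regex-imaginary-sketch
  (re-pattern (str "(?:" floatnumber "|" digitpart ")[jJ]")))

(re-seq regex-float-sketch "3.14 10. .5 1e10 6.02_2e+23")
;; => ("3.14" "10." ".5" "1e10" "6.02_2e+23")

(re-seq regex-imaginary-sketch "3.14j 10J 1e-3j j")
;; => ("3.14j" "10J" "1e-3j")
```

In a combined expression the imaginary alternative would need to come before the float and integer alternatives, since its matches begin with the same characters.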