
sQeeZ Lexer

The sQeeZ Lexer is designed to break down the source code of the sQeeZ scripting language into tokens, which are the fundamental building blocks used by the parser and interpreter. This lexer adheres to the core principles of sQeeZ: compactness, readability by convention, and a focus on data structures.

Table of Contents

  • How to Use
  • Tokens

How to Use

Note:

For convenience, each category (Dependencies, Build, Node-Build, Testing, Code Formatting) has a corresponding script:

  • dependencies.sh
  • build.sh
  • node.sh
  • test.sh
  • checkstyle.sh

These scripts can be run directly from the root directory of the project to automate the respective tasks.

Install Dependencies

1. Install Node Dependencies

npm install

Build & Run

To compile the project, follow these steps:

1. Create and Navigate to Build Directory

mkdir build
cd build

2. Configure the Project with CMake

cmake ..

3. Build the Project

cmake --build .

4. Run the Generated Executable

cd build
./sQeeZ-Lexer-Exe $FILE_PATH.sqz [--flag]

Node Module

To compile the Lexer API as a Node Module, do the following:

1. Compile the Node Module

cmake-js compile --CDNODE=true

2. Import the Node Module into a JS-File

const addon = require('./build/Release/sQeeZ-Lexer-Node');
const lexer = new addon.LexerNode(require);

An example of how to use this library can be found in the index.js file.

Testing

To run the tests, do the following:

1. Navigate to the Build Directory

cd build

2. Execute the Tests

./lexer_test

or

ctest --output-on-failure

Code Formatting

To format your code, execute the following script, which applies clang-format to all .cpp and .hpp files in the project directory:

1. Apply clang-format

function format_file {
  local file=$1
  clang-format -i "$file"
}

for file in $(find . -name '*.cpp' -o -name '*.hpp'); do
  format_file "$file"
done

Tokens

The lexer identifies several token types that reflect the compact nature of the sQeeZ language. Each token is categorized based on its role in the language and is represented by the Token class, which combines a type tag (an enum) with a union so that every token category can be stored in a single, compact structure.

Token Structure

struct Token {
  enum class TypeTag { BASIC, DATA, KEYWORD, LOG, LOGICAL, OPERATOR, SHORT_NOTATION, SYNTAX } tag;

  union TokenType {
    BasicToken basicToken;
    DataToken dataToken;
    KeywordToken keywordToken;
    LogToken logToken;
    LogicalToken logicalToken;
    OperatorToken operatorToken;
    ShortNotationToken shortNotationToken;
    SyntaxToken syntaxToken;

    TokenType() {}
    TokenType(BasicToken t) : basicToken(t) {}
    TokenType(DataToken t) : dataToken(t) {}
    TokenType(KeywordToken t) : keywordToken(t) {}
    TokenType(LogToken t) : logToken(t) {}
    TokenType(LogicalToken t) : logicalToken(t) {}
    TokenType(OperatorToken t) : operatorToken(t) {}
    TokenType(ShortNotationToken t) : shortNotationToken(t) {}
    TokenType(SyntaxToken t) : syntaxToken(t) {}
  } type;

  size_t size;
  size_t pos;
  std::string value;
  std::string plainText;
  std::string desc;

  Token(BasicToken token, size_t size = 0, size_t pos = 0, std::string value = "", std::string plainText = "",
        std::string desc = "")
      : tag(TypeTag::BASIC),
        type(token),
        size(size),
        pos(pos),
        value(std::move(value)),
        plainText(std::move(plainText)),
        desc(std::move(desc)) {}
  Token(DataToken token, size_t size = 0, size_t pos = 0, std::string value = "", std::string plainText = "",
        std::string desc = "")
      : tag(TypeTag::DATA),
        type(token),
        size(size),
        pos(pos),
        value(std::move(value)),
        plainText(std::move(plainText)),
        desc(std::move(desc)) {}
  Token(KeywordToken token, size_t size = 0, size_t pos = 0, std::string value = "", std::string plainText = "",
        std::string desc = "")
      : tag(TypeTag::KEYWORD),
        type(token),
        size(size),
        pos(pos),
        value(std::move(value)),
        plainText(std::move(plainText)),
        desc(std::move(desc)) {}
  Token(LogToken token, size_t size = 0, size_t pos = 0, std::string value = "", std::string plainText = "",
        std::string desc = "")
      : tag(TypeTag::LOG),
        type(token),
        size(size),
        pos(pos),
        value(std::move(value)),
        plainText(std::move(plainText)),
        desc(std::move(desc)) {}
  Token(LogicalToken token, size_t size = 0, size_t pos = 0, std::string value = "", std::string plainText = "",
        std::string desc = "")
      : tag(TypeTag::LOGICAL),
        type(token),
        size(size),
        pos(pos),
        value(std::move(value)),
        plainText(std::move(plainText)),
        desc(std::move(desc)) {}
  Token(OperatorToken token, size_t size = 0, size_t pos = 0, std::string value = "", std::string plainText = "",
        std::string desc = "")
      : tag(TypeTag::OPERATOR),
        type(token),
        size(size),
        pos(pos),
        value(std::move(value)),
        plainText(std::move(plainText)),
        desc(std::move(desc)) {}
  Token(ShortNotationToken token, size_t size = 0, size_t pos = 0, std::string value = "", std::string plainText = "",
        std::string desc = "")
      : tag(TypeTag::SHORT_NOTATION),
        type(token),
        size(size),
        pos(pos),
        value(std::move(value)),
        plainText(std::move(plainText)),
        desc(std::move(desc)) {}
  Token(SyntaxToken token, size_t size = 0, size_t pos = 0, std::string value = "", std::string plainText = "",
        std::string desc = "")
      : tag(TypeTag::SYNTAX),
        type(token),
        size(size),
        pos(pos),
        value(std::move(value)),
        plainText(std::move(plainText)),
        desc(std::move(desc)) {}

  std::string toString() const;
  std::string getTagString() const;
  std::string getTypeString() const;
};
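
As a usage illustration, the following minimal sketch constructs two tokens by hand and prints their tag and type names. The constructors and fields match the struct above; the include path and the exact enumerator names (taken from the token tables below) are assumptions about the project layout.

#include <iostream>
#include "token/Token.hpp"  // hypothetical include path for the Token definitions shown above

int main() {
  // "var" keyword: 3 characters long, starting at position 0
  Token keyword(KeywordToken::VARIABLE, 3, 0, "var", "var", "Keyword for declaring a Variable");
  // Integer literal "42": 2 characters long, starting at position 4
  Token number(DataToken::INTEGER_LITERAL, 2, 4, "42", "42", "Integer Literal");

  // getTagString() and getTypeString() yield human-readable names for the tag and type
  std::cout << keyword.getTagString() << " " << keyword.getTypeString() << ": " << keyword.value << std::endl;
  std::cout << number.getTagString() << " " << number.getTypeString() << ": " << number.value << std::endl;
  return 0;
}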

Basic Tokens

This enum class covers the fundamental lexical states, ensuring that the lexer properly tracks the beginning and end of the input as well as any unrecognized characters encountered during tokenization.

Token Size Value Description
INIT 0 \0  Represents the initial state before any token has been identified.
TOKEN_EOF 0 \0  Indicates the end of the input stream (end-of-file).
UNKNOWN 1 The Next Character  Marks the next character as a token that the lexer cannot categorize.
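
Because TOKEN_EOF closes every token stream, a consumer typically scans tokens until it reaches it. Below is a minimal sketch of that pattern, relying only on the tag and union layout of the Token struct above; the helper name, the include path, and the use of std::vector are assumptions.

#include <vector>
#include "token/Token.hpp"  // hypothetical include path, see Token Structure above

// Hypothetical helper: walk a token stream until the end-of-file token is reached.
void consumeTokens(const std::vector<Token>& tokens) {
  for (const Token& token : tokens) {
    if (token.tag == Token::TypeTag::BASIC && token.type.basicToken == BasicToken::TOKEN_EOF) {
      break;  // end of the input stream
    }
    // ... hand the token to the parser here ...
  }
}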

Syntax Tokens

This enum class categorizes various syntax tokens used in the sQeeZ language. These tokens represent punctuation and structural elements that define the organization of the code, such as grouping expressions, separating statements, and denoting specific syntax patterns.

Token Size Value Description
SEMICOLON 1 ; The Semicolon, used to terminate statements
COMMA 1 , The Comma, used to separate elements in lists or parameters
DOT 1 . The Dot, typically used for accessing members or methods
COLON 1 : The colon, used in various syntax constructs like slice operations
SINGLE_QUOTE 1 ' The Single Quote, used for character or string literals
DOUBLE_QUOTE 1 " The Double Quote, used for string literals
OPEN_PARENTHESIS 1 ( The Opening Parenthesis, used for grouping expressions or function calls
CLOSE_PARENTHESIS 1 ) The Closing Parenthesis, used for grouping expressions or function calls
OPEN_BRACKET 1 [ The Opening Bracket, used for array or list indexing
CLOSE_BRACKET 1 ] The Closing Bracket, used for array or list indexing
OPEN_BRACE 1 { The Opening brace, used to define blocks of code or object literals
CLOSE_BRACE 1 } The Closing brace, used to define blocks of code or object literals
INLINE_COMMENT 2 // The Inline Comment, used for single-line comments in the code
PIPE 1 | The Pipe
PIPE_OPERATOR 2 |> The Pipe Operator
QUESTION_MARK 1 ? The Question Mark, used in Lambda Expressions
ARROW 2 -> The Arrow Operator, used in lambda expressions
HASHTAG 1 # The Hashtag, used for Hex-Codes
AT 1 @ The At Sign

Keyword Tokens

This enum class categorizes the reserved keywords used in the sQeeZ language. These keywords are integral to the language’s syntax and control flow, defining key constructs such as variable declarations, conditional statements, loops, and function definitions. By identifying these tokens, the lexer can correctly parse and interpret language constructs, ensuring that the code adheres to the language's syntax rules and functionality.

Token Size Value Description
VARIABLE 3 var Keyword for declaring a Variable
CONSTANT 5 const Keyword for declaring a Constant
IF 2 if Keyword for initiating a Conditional Statement
ELSE 4 else Keyword for specifying an Alternative Branch in a Conditional Statement
ELSE_IF 4 elif Keyword for adding Additional Conditions to a Conditional Statement
FOR 3 for Keyword for starting a Loop
WHILE 5 while Keyword for starting a While Loop
DO 2 do Keyword for starting a Do-While Loop
FUNCTION 2 fn Keyword for defining a Function
RETURN 6 return Keyword for Returning a Value from a Function
IN 2 in Keyword used in For-In Loops
OF 2 of Keyword used in For-Of Loops

Operator Tokens

This enum class categorizes various operators used in the sQeeZ language for mathematical operations and assignments. These tokens represent both basic arithmetic operations and their compound assignment counterparts, along with special operators like increment, decrement, and potentiation.

Token Size Value Description
ASSIGN 1 = Assignment operator
ADDITION 1 + Addition operator
SUBTRACTION 1 - Subtraction operator
MULTIPLICATION 1 * Multiplication operator
DIVISION 1 / Division operator
MODULUS 1 % Modulus operator
POTENTIATION 2 ** Potentiation operator (Exponentiation)
ADDITION_ASSIGNMENT 2 += Addition Assignment operator (Adds and Assigns)
SUBTRACTION_ASSIGNMENT 2 -= Subtraction Assignment operator (Subtracts and Assigns)
MULTIPLICATION_ASSIGNMENT 2 *= Multiplication Assignment operator (Multiplies and Assigns)
DIVISION_ASSIGNMENT 2 /= Division Assignment operator (Divides and Assigns)
MODULUS_ASSIGNMENT 2 %= Modulus Assignment operator (Modulus and Assigns)
POTENTIATION_ASSIGNMENT 3 **= Potentiation Assignment operator (Potentiation and Assigns)
INCREMENT 2 ++ Increment operator (Increases value by 1)
DECREMENT 2 -- Decrement operator (Decreases value by 1)

Logical Tokens

This enum class categorizes logical operators used in conditional expressions within the sQeeZ language. These tokens facilitate the creation of complex logical conditions by providing various operators for equality checks, comparisons, and logical operations.

Token Size Value Description
EQUAL 2 == Compare two values for Equality
NOT_EQUAL 2 != Compare two values for Non-Equality
GREATER 1 > Compare if one value is Larger than another
LESS 1 < Compare if one value is Smaller than another
GREATER_EQUAL 2 >= Compare if one value is Greater Than or Equal to another
LESS_EQUAL 2 <= Compare if one value is Smaller Than or Equal to another
AND 2 && The logical AND used to combine two boolean expressions
OR 2 || The logical OR used to combine two boolean expressions
NOT 1 ! The logical NOT used to negate a boolean expression

Log Tokens

This enum class categorizes various types of log tokens used in the sQeeZ language for output and debugging purposes. These tokens enable different logging behaviors, including basic output, colored output for better visibility, and specific log levels such as warnings and errors.

Token Size Value Description
BASIC 3 log Basic Log Output without any special formatting
COLORED 4 logc Log Output with Color Formatting for better visibility
WARN 4 warn Warning Log Output indicating a Potential Issue
ERROR 5 error Error Log Output indicating a Critical Issue that needs attention

Data Tokens

This enum class categorizes various types of data tokens used in the sQeeZ language. These tokens represent different literal types and identifiers, enabling the recognition and handling of various data structures within the language.

Token Size Value Description
COMMENT_LITERAL Length of the Comment Literal The Comment Literal Represents a Comment Literal
STRING_LITERAL Length of the String Literal The String Literal Represents a String Literal
HEX_CODE_LITERAL Length of the Hex Code Literal The Hex Code Literal Represents a Hex Code Literal
INTEGER_LITERAL Length of the Integer Literal The Integer Literal Represents an Integer Literal
DOUBLE_LITERAL Length of the Double Literal The Double Literal Represents a Double Literal
CHAR_LITERAL Length of the Character Literal The Character Literal Represents a Character Literal
BOOLEAN_LITERAL Length of the Boolean Literal true or false Represents a Boolean Literal
NULL_LITERAL 4 null Represents the absence of a value
IDENTIFIER Length of the Identifier The Identifier Marks an Identifier
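
Since each data token carries its raw text in the value field, a consumer can branch on the token type to decide how to interpret that text. Below is a minimal sketch, assuming DataToken is an enum class with the enumerators listed above; the helper name and include path are hypothetical.

#include <string>
#include "token/Token.hpp"  // hypothetical include path, see Token Structure above

// Hypothetical helper: describe a data token based on its type and raw text.
std::string describeDataToken(const Token& token) {
  if (token.tag != Token::TypeTag::DATA) return "not a data token";
  switch (token.type.dataToken) {
    case DataToken::INTEGER_LITERAL: return "integer: " + token.value;
    case DataToken::STRING_LITERAL:  return "string: " + token.value;
    case DataToken::BOOLEAN_LITERAL: return "boolean: " + token.value;
    case DataToken::IDENTIFIER:      return "identifier: " + token.value;
    default:                         return "other literal: " + token.value;
  }
}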

Short Notation Tokens

This enum class provides predefined function tokens that represent commonly used operations such as map, filter, reduce, and others designed for concise and efficient functional programming. These short notations streamline the syntax, allowing for faster implementation and interpretation while improving code readability.

Token Size Value Description
LENGTH 6 LENGTH Returns the length of an array or string.
CONCAT 6 CONCAT Concatenates multiple arrays or strings together.
INCLUDES 8 INCLUDES Checks if an array or string contains a value.
INDEX_OF 8 INDEX_OF Finds the first index of a value.
LAST_INDEX_OF 13 LAST_INDEX_OF Finds the last index of a value.
SLICE 5 SLICE Extracts part of an array or string.
PUSH 4 PUSH Adds elements to the end of an array.
POP 3 POP Removes the last element from an array.
SHIFT 5 SHIFT Removes the first element from an array.
UNSHIFT 7 UNSHIFT Adds elements to the beginning of an array.
SPLICE 6 SPLICE Adds/removes elements in an array at a position.
REVERSE 7 REVERSE Reverses the elements of an array.
SORT 4 SORT Sorts the elements of an array.
FILL 4 FILL Fills an array with a static value.
JOIN 4 JOIN Joins all elements of an array into a string.
COUNT 5 COUNT Returns the number of elements matching a condition.
EVERY 5 EVERY Checks if all elements satisfy a condition.
SOME 4 SOME Checks if at least one element satisfies a condition.
FIND 4 FIND Returns the first element matching a condition.
FIND_INDEX 10 FIND_INDEX Returns the index of the first matching element.
FIND_LAST 9 FIND_LAST Returns the last element matching a condition.
FIND_LAST_INDEX 15 FIND_LAST_INDEX Returns the index of the last matching element.
FILTER 6 FILTER Creates a new array with matching elements.
MAP 3 MAP Creates a new array by applying a function.
REDUCE 6 REDUCE Reduces an array to a single value.
FLAT 4 FLAT Flattens nested arrays into a single array.
FLAT_MAP 8 FLAT_MAP Maps and flattens the result into a new array.
FOR_EACH 8 FOR_EACH Executes a function for each element.
HAS_KEY 7 HAS_KEY Checks if an object has a specific key.
KEYS 4 KEYS Returns an array of an object's keys.
VALUES 6 VALUES Returns an array of an object's values.
ENTRIES 7 ENTRIES Returns key-value pairs of an object.
GET 3 GET Retrieves a value from an object.
CHAR_AT 7 CHAR_AT Returns a character at a specific position.
CHAR_CODE_AT 12 CHAR_CODE_AT Returns the Unicode code of a character.
PAD_END 7 PAD_END Pads a string at the end with characters.
PAD_START 9 PAD_START Pads a string at the beginning with characters.
REPEAT 6 REPEAT Repeats a string multiple times.
REPLACE 7 REPLACE Replaces part of a string with another string.
REPLACE_ALL 11 REPLACE_ALL Replaces all occurrences of a pattern.
SPLIT 5 SPLIT Splits a string into an array.
STARTS_WITH 11 STARTS_WITH Checks if a string starts with a substring.
ENDS_WITH 9 ENDS_WITH Checks if a string ends with a substring.
SUBSTRING 9 SUBSTRING Extracts part of a string.
LOWERCASE 9 LOWERCASE Converts a string to lowercase.
UPPERCASE 9 UPPERCASE Converts a string to uppercase.
TRIM 4 TRIM Removes whitespace from both ends of a string.
TRIM_END 8 TRIM_END Removes trailing whitespace from a string.
TRIM_START 10 TRIM_START Removes leading whitespace from a string.

Back to Top
