
A modern C++17 lazy sequence library inspired by JavaScript generators, MySQL indexes and the Java Stream API, offering expressive, composable data pipelines with functional-style transformations, collectors, and rich statistics utilities.


eloyhere/semantic-cpp


Semantic - C++ Stream Processing Library

Overview

Semantic is a C++ stream-processing library inspired by JavaScript generators, MySQL indexes and the Java Stream API. It provides lazily evaluated data-stream operations in a functional programming style.

Core Features

  • Lazy Evaluation: Intermediate operations are lazy; nothing executes until a terminal operation is called
  • Functional Programming: Supports higher-order functions, lambda expressions, and function composition
  • Type Safety: Template-based strong type system
  • Cache Optimisation: Automatic caching of statistical computation results
  • Multiple Data Sources: Supports arrays, standard containers, individual values, numeric ranges and generators

Quick Start

  • Basic Usage
    #include "semantic.h"
    
    int main() {
      // Create stream from array
      int data[] = {1, 2, 3, 4, 5};
      auto stream = semantic::from(data, 5);
      
      // Chain operations
      stream.filter([](int x) { return x % 2 == 0; })
            .map([](int x) { return x * 2; })
            .cout();  // Output: 4 8
      
      return 0;
    }
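The Quick Start pipeline computes the same result as the eager version below, which uses only the standard library. This is a sketch for checking the expected output, not a picture of how semantic-cpp works internally: each step here materializes an intermediate vector, whereas the library evaluates lazily. The helper name `evenDoubled` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: keep the even numbers, then double them (eagerly).
std::vector<int> evenDoubled(const std::vector<int>& input) {
    std::vector<int> evens;
    std::copy_if(input.begin(), input.end(), std::back_inserter(evens),
                 [](int x) { return x % 2 == 0; });

    std::vector<int> doubled;
    doubled.reserve(evens.size());
    std::transform(evens.begin(), evens.end(), std::back_inserter(doubled),
                   [](int x) { return x * 2; });
    return doubled;
}
```

For `{1, 2, 3, 4, 5}` this returns `{4, 8}`, which matches the output shown in the comment above.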

Core Components

  • Semantic Stream Class.
    The main stream processing class providing rich intermediate and terminal operations.
    • Creating Streams
      // Empty stream
      auto emptyStream = semantic::empty<int>();
      
      // Unindexed Semantic: redirect, distinct, sorted, reverse, translate and
      // shuffle have no effect until reindex() is called.
      auto unorderedStream = semantic::fromUnordered<int>({1, 2, 3, 4, 5})
          .redirect([](const int& element, const auto& index) -> auto {
              return -index;  // No effect: the stream is not indexed.
          })
          .distinct()         // No effect.
          .cout();            // [1,2,3,4,5]
      
      // Indexed Semantic: redirect, distinct, sorted, reverse, translate and
      // shuffle take effect (the same holds for any stream after reindex()).
      auto orderedStream = semantic::fromOrdered<int>({1, 2, 3, 4, 5})
          .redirect([](const int& element, const auto& index) -> auto {
              return -index;  // Reverses the order: [5,4,3,2,1]
          })
          .redirect([](const int& element, const auto& index) -> auto {
              // Translates by 3: a positive offset moves tail elements to the
              // head, a negative offset moves head elements to the tail, and
              // zero has no effect.
              return index + 3;
          })
          .cout();  // [3,2,1,5,4]
      
      
      // From values
      auto single = semantic::of(42);
      auto multiple = semantic::of(1, 2, 3, 4, 5);
      
      // From containers
      std::vector<int> vec = {1, 2, 3};
      auto fromVec = semantic::from(vec);
      auto fromList = semantic::from(std::list{1, 2, 3});
      
      // From arrays
      int arr[] = {1, 2, 3};
      auto fromArray = semantic::from(arr, 3);
      
      // Numeric ranges
      auto rangeStream = semantic::range(1, 10);      // 1 to 9
      auto stepStream = semantic::range(1, 10, 2);    // 1,3,5,7,9
      
      // Generated streams
      auto generated = semantic::fill(42, 5);          // Five 42s
      auto randomStream = semantic::fill([]{ return rand() % 100; }, 10);
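You can reproduce the two `redirect` calls in the `fromOrdered` example with standard algorithms. Stable-sorting by the key `-index` reverses the sequence, and a translation by 3 is a right rotation by 3 positions. The sketch below uses the hypothetical helpers `redirectByKey` and `translate` and reproduces the documented `[3,2,1,5,4]`. It illustrates the effect only. It is not semantic-cpp's implementation, and it does not model how the library handles a redirected index such as `index + 3` that exceeds the stream length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of redirect: compute a new key from (element, index),
// then stable-sort the elements by that key.
std::vector<int> redirectByKey(const std::vector<int>& input,
                               const std::function<long long(int, long long)>& key) {
    std::vector<std::pair<long long, int>> keyed;
    keyed.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        keyed.emplace_back(key(input[i], static_cast<long long>(i)), input[i]);
    }
    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
    std::vector<int> out;
    out.reserve(keyed.size());
    for (const auto& kv : keyed) out.push_back(kv.second);
    return out;
}

// Translation by k positions: positive k moves tail elements to the head
// (right rotation), negative k moves head elements to the tail.
std::vector<int> translate(std::vector<int> v, long long k) {
    const long long n = static_cast<long long>(v.size());
    if (n == 0) return v;
    const long long shift = ((k % n) + n) % n;  // normalize into [0, n)
    std::rotate(v.begin(), v.begin() + (n - shift), v.end());
    return v;
}
```

Applying `redirectByKey` with `-index` to `{1, 2, 3, 4, 5}` and then `translate(..., 3)` gives `{3, 2, 1, 5, 4}`.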

Intermediate Operations

  • Filtering Operations
    .filter(predicate)          // Filter elements
    .distinct()                 // Remove duplicates
    .distinct(comparator)       // Custom duplicate removal
    .limit(n)                   // Limit quantity
    .skip(n)                    // Skip first n elements
    .takeWhile(predicate)       // Take consecutive elements satisfying condition
    .dropWhile(predicate)       // Drop consecutive elements satisfying condition
  • Transformation Operations
    .map(mapper)                // Element transformation
    .flatMap(mapper)            // Flattening map
    .sorted()                   // Natural sorting
    .sorted(comparator)         // Custom sorting
    .reindex()                  // Build indexes (enables the index-based operations)
    .redirect(indexer)          // Remap element indexes
    .reverse()                  // Reverse order
    .shuffle()                  // Random shuffle
  • Debugging Operations
    .peek(consumer)             // Inspect elements without modifying stream
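The difference between `filter`, `takeWhile` and `dropWhile` shows most clearly on a sequence where the predicate fails and then holds again. This standalone sketch uses hypothetical free functions over `std::vector` rather than the library's stream methods. For `{1, 2, 5, 1, 2}` and the predicate `x < 3`, `filter` would keep `{1, 2, 1, 2}`. `takeWhile` stops at the 5, and `dropWhile` starts there.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// takeWhile: keep the leading run of elements that satisfy the predicate,
// stopping at the first element that fails it.
template <typename T, typename Pred>
std::vector<T> takeWhile(const std::vector<T>& v, Pred pred) {
    auto firstFail = std::find_if_not(v.begin(), v.end(), pred);
    return std::vector<T>(v.begin(), firstFail);
}

// dropWhile: discard that leading run and keep everything after it,
// including later elements that satisfy the predicate again.
template <typename T, typename Pred>
std::vector<T> dropWhile(const std::vector<T>& v, Pred pred) {
    auto firstFail = std::find_if_not(v.begin(), v.end(), pred);
    return std::vector<T>(firstFail, v.end());
}
```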

Terminal Operations

  • Matching Checks
    .anyMatch(predicate)       // Any element matches
    .allMatch(predicate)       // All elements match  
    .noneMatch(predicate)      // No elements match
  • Search Operations
    .findFirst()               // Find first element
    .findAny()                 // Find any element
  • Reduction Operations
    .reduce(accumulator)       // Reduction operation
    .reduce(identity, accumulator) // Reduction with initial value
  • Collection Operations
    .toVector()               // Convert to vector
    .toList()                 // Convert to list
    .toSet()                  // Convert to set
    .toMap(keyMapper, valueMapper) // Convert to map
    .collect(collector)       // Custom collection
  • Grouping and Partitioning
    .group(classifier)        // Group by classifier
    .partition(n)            // Partition by size
  • Output Operations
    .cout()                  // Output to standard output
    .forEach(consumer)       // Execute operation for each element
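The sketch below shows the semantics these terminal operations usually have, modelled on the Java Stream API that inspired the library. A reduction without an identity has no result for an empty stream, so this version returns `std::optional`. `partition(n)` is read as splitting the sequence into consecutive chunks of at most `n` elements. The free functions `reduceOptional`, `reduceWithIdentity` and `partitionBySize` are hypothetical, and semantic-cpp's actual return types may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// reduce(accumulator) without an identity: an empty input has no result.
template <typename T, typename Op>
std::optional<T> reduceOptional(const std::vector<T>& v, Op op) {
    if (v.empty()) return std::nullopt;
    T acc = v.front();
    for (std::size_t i = 1; i < v.size(); ++i) acc = op(acc, v[i]);
    return acc;
}

// reduce(identity, accumulator): the identity is returned for an empty input.
template <typename T, typename Op>
T reduceWithIdentity(const std::vector<T>& v, T identity, Op op) {
    T acc = identity;
    for (const T& x : v) acc = op(acc, x);
    return acc;
}

// partition(n): consecutive chunks of at most n elements (n == 0 yields none).
template <typename T>
std::vector<std::vector<T>> partitionBySize(const std::vector<T>& v, std::size_t n) {
    std::vector<std::vector<T>> chunks;
    if (n == 0) return chunks;
    for (std::size_t i = 0; i < v.size(); i += n) {
        const std::size_t end = std::min(i + n, v.size());
        chunks.emplace_back(v.begin() + static_cast<std::ptrdiff_t>(i),
                            v.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return chunks;
}
```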

Statistics Class

Provides comprehensive statistical computation functionality with cache optimisation.

std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0};
auto stats = semantic::Statistics<double, double>(data);

// Basic statistics
auto count = stats.count();           // Count
auto sum = stats.sum();               // Sum
auto mean = stats.mean();             // Mean
auto min = stats.minimum();           // Minimum
auto max = stats.maximum();           // Maximum

// Dispersion statistics
auto variance = stats.variance();     // Variance
auto stdDev = stats.standardDeviation(); // Standard deviation
auto range = stats.range();           // Range

// Advanced statistics
auto median = stats.median();         // Median
auto mode = stats.mode();             // Mode
auto quartiles = stats.quartiles();   // Quartiles
auto skewness = stats.skewness();     // Skewness
auto kurtosis = stats.kurtosis();     // Kurtosis

// Frequency analysis
auto frequency = stats.frequency();   // Frequency distribution
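The caching idea fits in a few lines. Compute a derived value such as the mean on first request, store it, and reuse it in later computations like the variance. `CachedStatistics` is a hypothetical class, not `semantic::Statistics`. It assumes non-empty data and uses the population variance (dividing by n), whereas the library may use the sample variance. For the data above it gives a mean of 3, a variance of 2 and a median of 3.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical statistics helper that computes the mean once and reuses it.
// Assumes the data is non-empty.
class CachedStatistics {
public:
    explicit CachedStatistics(std::vector<double> data) : data_(std::move(data)) {}

    double mean() const {
        if (!mean_) {  // compute on first use only
            double sum = 0.0;
            for (double x : data_) sum += x;
            mean_ = sum / static_cast<double>(data_.size());
        }
        return *mean_;
    }

    // Population variance: mean of squared deviations (divides by n, not n - 1).
    double variance() const {
        const double m = mean();  // reuses the cached mean
        double acc = 0.0;
        for (double x : data_) acc += (x - m) * (x - m);
        return acc / static_cast<double>(data_.size());
    }

    double standardDeviation() const { return std::sqrt(variance()); }

    double median() const {
        std::vector<double> sorted = data_;
        std::sort(sorted.begin(), sorted.end());
        const std::size_t n = sorted.size();
        return n % 2 == 1 ? sorted[n / 2]
                          : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

private:
    std::vector<double> data_;
    mutable std::optional<double> mean_;  // cache, filled on first call to mean()
};
```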

Collector

Supports custom collection strategies.

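// String concatenation collector: folds a stream of ints into one string
auto concatenator = semantic::Collector<std::string, std::string>(
    []() { return std::string(""); },                                  // supplier: empty accumulator
    [](std::string& acc, int value) { acc += std::to_string(value); }, // accumulator: append each element
    [](const std::string& a, const std::string& b) { return a + b; },  // combiner: merge partial results
    [](const std::string& result) { return result; }                   // finisher: return the result as-is
);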

auto result = stream.collect(concatenator);
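The four lambdas play the roles of the supplier, accumulator, combiner and finisher from the Java Stream API's `Collector`. The standalone sketch below shows how such a collector is driven over a sequential range. `SimpleCollector` and `collect` are hypothetical names, not semantic-cpp's types. The combiner goes unused here because nothing runs in parallel.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical four-part collector, mirroring java.util.stream.Collector:
// E = element type, A = accumulation type, R = result type.
template <typename E, typename A, typename R>
struct SimpleCollector {
    std::function<A()> supplier;                    // creates an empty accumulator
    std::function<void(A&, const E&)> accumulator;  // folds one element in
    std::function<A(const A&, const A&)> combiner;  // merges two partial results
    std::function<R(const A&)> finisher;            // converts the accumulator to R
};

// Sequential driver: supplier once, accumulator per element, finisher at the end.
template <typename E, typename A, typename R>
R collect(const std::vector<E>& elements, const SimpleCollector<E, A, R>& c) {
    A acc = c.supplier();
    for (const E& e : elements) c.accumulator(acc, e);
    return c.finisher(acc);  // the combiner is only needed when merging partial results
}
```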

Advanced Features

Lazy Evaluation Example

auto stream = semantic::range(1, 1000)
    .filter([](int x) { 
        std::cout << "Filtering: " << x << std::endl;
        return x % 2 == 0; 
    })
    .map([](int x) {
        std::cout << "Mapping: " << x << std::endl;
        return x * 2;
    })
    .limit(3);  // Stop once 3 elements have passed through filter and map

// Nothing executed yet, only executes when terminal operation is called
auto result = stream.toVector();  // Execution starts
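The standalone push-based sketch below shows why only a handful of elements are touched. The hypothetical `rangeGenerator` hands values to a consumer until an interrupt check returns true, so `limit(3)` becomes "stop once three results exist". The filter runs for 1 through 6 only, and the map runs three times, rather than once per element of the range. `rangeGenerator`, `LazyTrace` and `runPipeline` are illustrative names. They model the idea, not semantic-cpp's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical push-based source: feeds [start, end) to `consumer`, checking
// `interrupt` before each value and stopping as soon as it returns true.
void rangeGenerator(int start, int end,
                    const std::function<void(int)>& consumer,
                    const std::function<bool()>& interrupt) {
    for (int x = start; x < end && !interrupt(); ++x) {
        consumer(x);
    }
}

struct LazyTrace {
    std::vector<int> result;
    int filterCalls = 0;
    int mapCalls = 0;
};

// Models range(1, 1000).filter(even).map(x * 2).limit(limit).toVector():
// nothing runs until this terminal function is called, and the source stops
// producing once `limit` results have been collected.
LazyTrace runPipeline(std::size_t limit) {
    LazyTrace trace;
    auto filter = [&](int x) { ++trace.filterCalls; return x % 2 == 0; };
    auto map = [&](int x) { ++trace.mapCalls; return x * 2; };
    rangeGenerator(
        1, 1000,
        [&](int x) {
            if (filter(x)) trace.result.push_back(map(x));
        },
        [&] { return trace.result.size() >= limit; });
    return trace;
}
```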

Custom Generators

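// consumer receives each element; interrupt returns true to stop early;
// redirect is not used by this generator
auto fibGenerator = [](const auto& consumer, const auto& interrupt, const auto& redirect) {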
    int a = 0, b = 1;
    for (int i = 0; i < 10; ++i) {
        if (interrupt && interrupt(b)) break;
        if (consumer) consumer(b);
        int next = a + b;
        a = b;
        b = next;
    }
};

auto fibStream = semantic::iterate(fibGenerator);
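The generator checks `interrupt` before emitting each value, so it does not strictly need its own cap of 10 iterations. Whoever consumes the stream can decide when to stop. The standalone sketch below drops the unused `redirect` parameter and the fixed loop bound. The names `fibonacci` and `firstFibonacci` are hypothetical, and the driver only imitates what a `limit(n)` might do.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Consumer = std::function<void(long long)>;
using Interrupt = std::function<bool(long long)>;

// Unbounded Fibonacci generator: it never stops on its own, so it relies on
// the interrupt callback to end the sequence.
void fibonacci(const Consumer& consumer, const Interrupt& interrupt) {
    long long a = 0, b = 1;
    while (!interrupt(b)) {
        consumer(b);
        const long long next = a + b;
        a = b;
        b = next;
    }
}

// Driver in the style of limit(n): collect values until n have been emitted.
std::vector<long long> firstFibonacci(std::size_t n) {
    std::vector<long long> out;
    fibonacci([&](long long v) { out.push_back(v); },
              [&](long long) { return out.size() >= n; });
    return out;
}
```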

Performance Characteristics

  • Lazy Evaluation: Avoids unnecessary computations
  • Cache Optimisation: Automatic caching of statistical results
  • Zero-copy: Uses references where possible to avoid copying
  • Memory Safety: Smart pointer resource management

Compilation Requirements

  • C++17 or higher
  • Standard Template Library support

API Reference

Key Type Definitions

namespace semantic {
    typedef long long Timestamp;
    typedef unsigned long long Module;
    
    using Runnable = std::function<void()>;
    template <typename R> using Supplier = std::function<R()>;
    template <typename T, typename R> using Function = std::function<R(T)>;
    template <typename T> using Consumer = std::function<void(T)>;
    template <typename T> using Predicate = std::function<bool(T)>;
    // ... and more
}

Factory Functions

// Creation functions
template<typename E> Semantic<E> empty();
template<typename E, typename... Args> Semantic<E> of(Args &&... args);
template<typename E> Semantic<E> from(const E* array, const Module &length);
template<typename E> Semantic<E> range(const E& start, const E& end);
template<typename E> Semantic<E> iterate(const Generator<E>& generator);

Examples

  • Data Processing Pipeline
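// Process user data (upper-casing needs <algorithm>, <cctype> and <string>)
auto processedUsers = semantic::from(users)
    .reindex()
    .filter([](const User& u) { return u.isActive(); })
    .map([](const User& u) {
        std::string name = u.getName();  // assuming getName() returns std::string
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        return name;  // std::string has no toUpperCase(); transform each character instead
    })
    .distinct()
    .sorted()
    .toList();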
  • Statistical Analysis
// Analyse sales data
auto salesStats = semantic::from(salesRecords)
    .map([](const Sale& s) { return s.amount(); })
    .toStatistics();

std::cout << "Average sale: " << salesStats.mean() << std::endl;
std::cout << "Sales variance: " << salesStats.variance() << std::endl;
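You can cross-check the data-processing pipeline against a plain standard-library version. In this sketch, `User` is a hypothetical struct with public `name` and `active` fields (the example above uses accessor methods instead), and `activeNamesUpper` is a hypothetical helper. Distinct-then-sorted is done as a sort followed by `std::unique`. That yields the same set of names in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for the User type in the example above.
struct User {
    std::string name;
    bool active;
};

std::vector<std::string> activeNamesUpper(const std::vector<User>& users) {
    std::vector<std::string> names;
    for (const User& u : users) {
        if (!u.active) continue;     // filter: active users only
        std::string upper = u.name;  // map: upper-case the name
        std::transform(upper.begin(), upper.end(), upper.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        names.push_back(upper);
    }
    std::sort(names.begin(), names.end());                              // sorted
    names.erase(std::unique(names.begin(), names.end()), names.end());  // distinct
    return names;
}
```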

Partition Semantics — The Hidden Superpower

In semantic-cpp, concat(), flat(), and flatMap() do not merge indices globally.
Instead, they preserve the index space of each source stream, effectively treating every concatenated or flattened stream as an independent partition.

This is deliberate and extremely powerful.

What It Means

auto s1 = of(1,2,3).reindex().reverse();     // [3,2,1]
auto s2 = of(4,5,6).reindex().reverse();     // [6,5,4]
auto s3 = of(7,8,9).reindex().reverse();     // [9,8,7]

auto merged = s1.concat(s2).concat(s3)
                 .flat();                    // flatten partitions

merged.reverse().cout();     
// Output: 9 8 7  6 5 4  3 2 1
// → Each partition is reversed independently, then concatenated

All indexing operations (redirect, distinct, sorted, reverse, shuffle, etc.) act only within their original partition when the stream is composed via concat / flat / flatMap.
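One way to model partition-local behaviour is to apply each operation to every partition separately and concatenate only afterwards. In this sketch, the helper `flatPerPartition` and the `Partition` alias are hypothetical. The sketch shows the semantics described in this section on plain vectors, not semantic-cpp's implementation. With a per-partition distinct, the value 2 survives once in each partition.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using Partition = std::vector<int>;

// Apply an indexing operation to each partition separately, then concatenate
// the partitions in their original order (a model of flat() + operation).
std::vector<int> flatPerPartition(std::vector<Partition> partitions,
                                  const std::function<void(Partition&)>& op) {
    std::vector<int> out;
    for (Partition& p : partitions) {
        op(p);
        out.insert(out.end(), p.begin(), p.end());
    }
    return out;
}
```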

Real-World Superpowers

Operation after flat() / flatMap()   Effect
.sorted()                            Sort each group/partition independently
.distinct()                          Remove duplicates within each partition
.reverse()                           Reverse each group independently
.redirect(...)                       Reindex each partition independently
.limit(n) / .skip(n)                 Applied globally across all partitions

Common Idioms

// Group-wise sort (classic big-data pattern)
logs_by_shard.flat().sorted().cout();
// Each shard is sorted internally; the combined output is ordered only within each shard

// Group-wise deduplication
events_by_node.flat().distinct().cout(); 
// Duplicates removed per node, not globally

// Group-wise reverse (e.g. latest-first per user)
messages_by_user.flat().reverse().cout(); 
// Latest messages first in each user partition

When You Need Global Indexing

If you require a single unified index across all partitions:

auto global = streams.flat().reindex();   // materializes a new global index
auto sortedGlobally = global.sorted();    // sorting now spans all partitions
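A small input makes the difference between partition-local and global sorting visible. In this standalone sketch, the helpers `sortWithinShards` and `sortGlobally` and the `Shard` alias are hypothetical. The first mirrors `flat().sorted()` as described above, and the second mirrors `flat().reindex().sorted()`. For shards {5, 1}, {4, 2}, {3}, the local sort yields 1 5 2 4 3 and the global sort yields 1 2 3 4 5.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Shard = std::vector<int>;

// Partition-local: sort inside each shard and keep the shard order.
std::vector<int> sortWithinShards(std::vector<Shard> shards) {
    std::vector<int> out;
    for (Shard& s : shards) {
        std::sort(s.begin(), s.end());
        out.insert(out.end(), s.begin(), s.end());
    }
    return out;
}

// Global: concatenate first (one index over everything), then sort once.
std::vector<int> sortGlobally(const std::vector<Shard>& shards) {
    std::vector<int> out;
    for (const Shard& s : shards) out.insert(out.end(), s.begin(), s.end());
    std::sort(out.begin(), out.end());
    return out;
}
```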

Summary

concat / flat / flatMap + indexing = automatic partition-aware processing.
No extra API, no manual grouping — just pure, composable, partition-local semantics.

This is not a limitation.
This is memory-level distributed computing, for free.

Why semantic-cpp? (The Indexable Revolution)

  • redirect(): declares how each element maps to a new index.
  • reindex(): builds indexes, enabling redirect, distinct, sorted, reverse, translate and shuffle.
  • Small data (fewer than OrderedThreashold elements): indexed immediately. Big data: fully lazy.
fromUnordered(huge_data)  // No order assumed
    .reindex()            // Build indexes now
    .redirect([](auto e, auto i){ return e.key; })  // redirect/sorted/distinct/reverse/translate/shuffle now take effect
    .filter(...)
    .sorted()   // O(1)!
    .toVector();
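The MySQL analogy suggests what an index provides. An index is a permutation of positions ordered by some key, so reordering operations can work on positions while the data stays put. The sketch below illustrates that general technique. `Row`, `buildIndex` and `valuesInIndexOrder` are hypothetical, and the sketch does not describe semantic-cpp's internal data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical record type with a sort key.
struct Row {
    int key;
    std::string value;
};

// Build an "index": a permutation of row positions ordered by key.
// The rows themselves are never moved, much like a secondary index in a database.
std::vector<std::size_t> buildIndex(const std::vector<Row>& rows) {
    std::vector<std::size_t> order(rows.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return rows[a].key < rows[b].key;
    });
    return order;
}

// Materialize the rows in index order only when a result is requested.
std::vector<std::string> valuesInIndexOrder(const std::vector<Row>& rows,
                                            const std::vector<std::size_t>& order) {
    std::vector<std::string> out;
    out.reserve(order.size());
    for (std::size_t i : order) out.push_back(rows[i].value);
    return out;
}
```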

License

MIT License
