Skip to content

OLTP multi‐thread Design Doc

Wu Chencan edited this page Dec 5, 2023 · 1 revision

背景介绍

在算法实现过程中,支持批量和多线程并发的执行方式

实现方案

  • 以Kout算法的BFS模式为例,对可进行并行执行的部分进行分析,然后给出具体的方案

案例分析

  • 以上为测试用图,我们需要从marko节点出发,寻找满足条件depth=2&nearest=true的节点
  1. 阶段0,从marko节点出发,得到邻接节点joshlopvadas
  2. 阶段1,从节点joshlopvadas出发互不干扰,所以可以同时从这几个节点出发,采用多线程并行的方式,得到邻接节点ripplepeter
  • 阶段2中的lopjosh因为nearest参数限制被过滤
  1. 阶段2,depth达到阈值,得到深度距离为2的节点ripplepeter

具体实现

嵌套的Edge Iterator

  • 对需要迭代遍历的Iterator<Id> vertices,我们需要一种数据结构,能够迭代获取每个vertex对应的邻边edges,即一种嵌套的迭代器Iterator<Iterator<Edge>>
    • 第一层Iterator是对目标节点vertices的迭代
    • 第二层Iterator是对具体节点vertex的邻接边edges的迭代
// HugeTraverser.java
public EdgesIterator edgesOfVertices(Iterator<Id> sources,
                                     Directions dir,
                                     List<Id> labelIds,
                                     long degree) {
    return new EdgesIterator(new EdgesQueryIterator(sources, dir, labelIds, degree));
}

public class EdgesIterator implements Iterator<Iterator<Edge>>, Closeable {
    private final Iterator<Iterator<Edge>> currentIt;

    // highlight
    public EdgesIterator(EdgesQueryIterator queryIterator) {
        List<Iterator<Edge>> iteratorList = new ArrayList<>();
        while (queryIterator.hasNext()) {
            iteratorList.add(graph().edges(queryIterator.next()));
        }
        this.currentIt = iteratorList.iterator();
    }

    @Override
    public boolean hasNext() {
        return this.currentIt.hasNext();
    }

    @Override
    public Iterator<Edge> next() {
        return this.currentIt.next();
    }

    @Override
    public void close() throws IOException {
        CloseableIterator.closeIterator(currentIt);
    }
}
  • EdgesQueryIterator:能够根据vertices生成queryIterator,可以迭代获得vertex对应的边查询语句query
  • EdgesIterator:根据可迭代的query,查询到每个vertex对应的edges的迭代器
    • 代码中highlight注释的部分,在内部版server的实现中,后端存储中提供直接接口,可以通过EdgesQueryIterator直接查询并返回一个嵌套的边迭代器Iterator<Iterator<Edge>>
    • 在本次实现中,采用了一种简化的方法,通过迭代获得query进行查询,将查询结果存储在list,然后返回list的迭代器,从而实现嵌套的迭代器

并行执行

// OltpTraverser.java
protected <K> long traverseBatch(Iterator<Iterator<K>> iterator,
                                 Consumer<Iterator<K>> consumer,
                                 String name, int queueWorkerSize) {
    if (!iterator.hasNext()) {
        return 0L;
    }
    AtomicBoolean done = new AtomicBoolean(false);
    Consumers<Iterator<K>> consumers = null;
    try {
        consumers = getConsumers(consumer, queueWorkerSize, done,
                                 executors.getExecutor());
        return consumersStart(iterator, name, done, consumers);
    } finally {
        assert consumers != null;
        executors.returnExecutor(consumers.executor());
    }
}

private <K> long consumersStart(Iterator<Iterator<K>> iterator, String name,
                                AtomicBoolean done,
                                Consumers<Iterator<K>> consumers) {
    long total = 0L;
    try {
        consumers.start(name);
        while (iterator.hasNext() && !done.get()) {
            total++;
            Iterator<K> v = iterator.next();
            consumers.provide(v);
        }
    }
    ...
    return total;
}
  • traverseBatch方法中,进行批量执行和多线程并发执行
    • 参数iterator:是嵌套的Iterator<Iterator<K>>,存储vertices对应的邻接边edges
    • 参数consumer:是一个消费者函数接口,接受Iterator<K>作为参数,执行预设的方法
  • consumersStart方法中,将任务提交给consumers,在consumers中通过多线程的方式进行任务消费

并发控制

  • 目前在KoutKneighbor中完成了并行执行的实现,主要通过KoutRecordsKneighborRecords自带的并发控制完成控制

改造结果

正确性

  • Request URL:POST http://localhost:8080/graphs/hugegraph/traversers/kout
  • Request Body:
{
    "source": "1:marko",
    "steps": {
        "direction": "BOTH",
        "edge_steps": [
            {
                "label": "knows",
                "properties": {

                }
            },
            {
                "label": "created",
                "properties": {

                }
            }
        ],
        "vertex_steps": [
            {
                "label": "person",
                "properties": {

                }
            },
            {
                "label": "software",
                "properties": {}
            }
        ],
        "max_degree": 10000,
        "skip_degree": 100000
    },
    "max_depth": 2,
    "nearest": false,
    "limit": 10000,
    "with_vertex": false,
    "with_path": true,
    "with_edge": false
}
  • 串行Kout Post
{
    "kout": [
        "1:peter",
        "1:josh",
        "2:ripple",
        "2:lop"
    ],
    "size": 4,
    "paths": [
        {
            "objects": [
                "1:marko",
                "2:lop",
                "1:josh"
            ]
        },
        {
            "objects": [
                "1:marko",
                "2:lop",
                "1:peter"
            ]
        },
        {
            "objects": [
                "1:marko",
                "1:josh",
                "2:ripple"
            ]
        },
        {
            "objects": [
                "1:marko",
                "1:josh",
                "2:lop"
            ]
        }
    ],
    ...,
    "measure": {
        "edge_iterations": 10,
        "vertice_iterations": 4,
        "cost(ns)": 329856375
    }
}
  • 并行Kout Post
{
    "kout": [
        "1:peter",
        "1:josh",
        "2:ripple",
        "2:lop"
    ],
    "size": 4,
    "paths": [
        {
            "objects": [
                "1:marko",
                "2:lop",
                "1:josh"
            ]
        },
        {
            "objects": [
                "1:marko",
                "2:lop",
                "1:peter"
            ]
        },
        {
            "objects": [
                "1:marko",
                "1:josh",
                "2:ripple"
            ]
        },
        {
            "objects": [
                "1:marko",
                "1:josh",
                "2:lop"
            ]
        }
    ],
    ...,
    "measure": {
        "edge_iterations": 10,
        "vertice_iterations": 4,
        "cost(ns)": 191413166
    }
}
  • 在多组数据的测试中,并未发现正确性方面存在问题

多线程加速

  • 目前数据量较小,无法测试是否有显著加速效果