An unbounded priority queue based on a priority heap.
The elements of the priority queue are ordered according to their natural ordering, or by a Comparator provided at queue construction time, depending on which constructor is used.
A priority queue does not permit null elements.
A priority queue relying on natural ordering also does not permit insertion of non-comparable objects (doing so may result in ClassCastException).
The head of this queue is the least element with respect to the specified ordering. If multiple elements are tied for least value, the head is one of those elements; ties are broken arbitrarily.
The queue retrieval operations poll, remove, peek, and element access the element at the head of the queue.
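As a minimal sketch of the ordering rules above, the following shows that the head is the least element under whichever ordering was chosen at construction time. The class and method names (`OrderingSketch`, `headNatural`, `headReversed`) are illustrative only and are not part of the JDK; both helpers assume at least one value is passed.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class OrderingSketch {
    // Head under natural ordering: the smallest value.
    // Assumes values is non-empty (peek() returns null on an empty queue).
    static int headNatural(int... values) {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        for (int v : values) {
            queue.add(v);
        }
        return queue.peek();
    }

    // Head under a Comparator supplied at construction time. With a reversed
    // ordering, the "least" element is the largest number.
    static int headReversed(int... values) {
        Comparator<Integer> largestFirst = Comparator.reverseOrder();
        PriorityQueue<Integer> queue = new PriorityQueue<>(largestFirst);
        for (int v : values) {
            queue.add(v);
        }
        return queue.peek();
    }

    public static void main(String[] args) {
        System.out.println(headNatural(5, 1, 3));  // prints 1
        System.out.println(headReversed(5, 1, 3)); // prints 5
    }
}
```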
A priority queue is unbounded, but has an internal capacity governing the size of an array used to store the elements on the queue. It is always at least as large as the queue size. As elements are added to a priority queue, its capacity grows automatically. The details of the growth policy are not specified.
This class and its iterator implement all of the optional methods of the Collection and Iterator interfaces.
The Iterator provided in method iterator() is not guaranteed to traverse the elements of the priority queue in any particular order. If you need ordered traversal, consider using Arrays.sort(pq.toArray()).
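A small sketch of two ways to get an ordered view, assuming Integer elements with natural ordering. `OrderedTraversalSketch`, `sortedSnapshot`, and `drain` are hypothetical names introduced for illustration. The first helper follows the `Arrays.sort(pq.toArray())` suggestion; the second polls a copy so that the original queue is left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class OrderedTraversalSketch {
    // Copy the elements into an array and sort it (natural ordering).
    static Integer[] sortedSnapshot(PriorityQueue<Integer> pq) {
        Integer[] snapshot = pq.toArray(new Integer[0]);
        Arrays.sort(snapshot);
        return snapshot;
    }

    // Poll a copy of the queue; each poll() returns the current least element.
    static List<Integer> drain(PriorityQueue<Integer> pq) {
        PriorityQueue<Integer> copy = new PriorityQueue<>(pq);
        List<Integer> ordered = new ArrayList<>();
        while (!copy.isEmpty()) {
            ordered.add(copy.poll());
        }
        return ordered;
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(Arrays.asList(4, 1, 3, 2));
        // Iteration order is unspecified, so it is only printed, not relied on.
        System.out.println("iterator order: " + pq);
        System.out.println(Arrays.toString(sortedSnapshot(pq))); // prints [1, 2, 3, 4]
        System.out.println(drain(pq));                           // prints [1, 2, 3, 4]
    }
}
```

For a queue built with a Comparator, the same comparator would need to be passed to `Arrays.sort` to reproduce the queue's ordering.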
Note that this implementation is not synchronized. Multiple threads should not access a PriorityQueue instance concurrently if any of the threads modifies the queue. Instead, use the thread-safe PriorityBlockingQueue class.
Implementation note: this implementation provides
O(log(n)) time for the enqueuing and dequeuing methods (offer, poll, remove() and add);
linear time for the remove(Object) and contains(Object) methods;
constant time for the retrieval methods (peek, element, and size).
This class is a member of the Java Collections Framework.
/**
 * Establishes the heap invariant (described above) in the entire tree,
 * assuming nothing about the order of the elements prior to the call.
 */
@SuppressWarnings("unchecked")
private void heapify() {
    // Start at the last non-leaf (parent) node and visit every parent node up to the root.
    for (int i = (size >>> 1) - 1; i >= 0; i--) {
        // Sift down: of the parent and its one or two children, the larger element sinks.
        siftDown(i, (E) queue[i]);
    }
}
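To make the bottom-up construction concrete, here is a standalone sketch on a plain `int[]` min-heap. It is not the JDK code: `HeapifySketch` and `isMinHeap` are names made up for this example, and the sift-down loop is written from the description above (move the smaller child up until the item fits).

```java
import java.util.Arrays;

class HeapifySketch {
    // Demote heap[k] until it is <= both children or becomes a leaf.
    static void siftDown(int[] heap, int size, int k) {
        int x = heap[k];
        int half = size >>> 1;              // indices >= half are leaves
        while (k < half) {
            int child = 2 * k + 1;          // left child
            int right = child + 1;
            if (right < size && heap[right] < heap[child]) {
                child = right;              // pick the smaller child
            }
            if (x <= heap[child]) {
                break;
            }
            heap[k] = heap[child];          // smaller child moves up
            k = child;
        }
        heap[k] = x;
    }

    // Visit every parent node from the last one back to the root.
    static void heapify(int[] heap) {
        for (int i = (heap.length >>> 1) - 1; i >= 0; i--) {
            siftDown(heap, heap.length, i);
        }
    }

    // Check the invariant: no element is smaller than its parent.
    static boolean isMinHeap(int[] heap) {
        for (int i = 1; i < heap.length; i++) {
            if (heap[(i - 1) >>> 1] > heap[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] data = {9, 4, 7, 1, 8, 2};
        heapify(data);
        System.out.println(Arrays.toString(data) + " valid=" + isMinHeap(data));
    }
}
```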
siftDown (sift down)
/**
 * Inserts item x at position k, maintaining heap invariant by
 * demoting x down the tree repeatedly until it is less than or
 * equal to its children or is a leaf.
 *
 * @param k the position to fill
 * @param x the item to insert
 */
private void siftDown(int k, E x) {
    if (comparator != null) {
        // Sift down using the ordering of the custom Comparator.
        siftDownUsingComparator(k, x);
    } else {
        // Sift down using the elements' natural ordering.
        siftDownComparable(k, x);
    }
}
/**
 * Inserts item x at position k, maintaining heap invariant by
 * promoting x up the tree until it is greater than or equal to
 * its parent, or is the root.
 *
 * @param k the position to fill
 * @param x the item to insert
 */
private void siftUp(int k, E x) {
    if (comparator != null) {
        siftUpUsingComparator(k, x);
    } else {
        siftUpComparable(k, x);
    }
}
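The promotion step can be sketched on a plain `int[]` as follows. This is an illustration rather than the JDK implementation; `SiftUpSketch` and `offerAll` are hypothetical names, and the array is sized up front instead of growing automatically.

```java
import java.util.Arrays;

class SiftUpSketch {
    // Place x at index k, then promote it while it is smaller than its parent.
    static void siftUp(int[] heap, int k, int x) {
        while (k > 0) {
            int parent = (k - 1) >>> 1;
            if (x >= heap[parent]) {
                break;
            }
            heap[k] = heap[parent];   // parent moves down one level
            k = parent;
        }
        heap[k] = x;
    }

    // Insert values one by one at the end of the heap, as offer/add would.
    static int[] offerAll(int... values) {
        int[] heap = new int[values.length];
        int size = 0;
        for (int v : values) {
            siftUp(heap, size, v);
            size++;
        }
        return heap;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(offerAll(5, 3, 8, 1))); // prints [1, 3, 8, 5]
    }
}
```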
/**
 * Removes the ith element from queue.
 * <p>
 * Normally this method leaves the elements at up to i-1,
 * inclusive, untouched. Under these circumstances, it returns null.
 * Occasionally, in order to maintain the heap invariant,
 * it must swap a later element of the list with one earlier than i.
 * Under these circumstances, this method returns the element that was
 * previously at the end of the list and is now at some position before i.
 * This fact is used by iterator.remove so as to avoid missing traversing elements.
 */
@SuppressWarnings("unchecked")
private E removeAt(int i) {
    assert i >= 0 && i < size;
    // Increment the modification count.
    modCount++;
    // Index of the tail (last) element of the heap.
    int s = --size;
    if (s == i) {
        // Removing the tail element: no sifting is needed.
        queue[i] = null;
    } else {
        // Take out the tail element.
        E moved = (E) queue[s];
        queue[s] = null;
        // Put the tail element at the removed position and sift it down.
        siftDown(i, moved);
        // If siftDown left the element in place, the removed node and the tail node
        // may be in different subtrees, or the removed node may be a leaf.
        if (queue[i] == moved) {
            // The removed element and the tail element are not in the same subtree,
            // so siftUp is needed.
            siftUp(i, moved);
            if (queue[i] != moved) {
                return moved;
            }
        }
    }
    return null;
}
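The case where siftDown alone is not enough is easiest to see on a concrete heap. The sketch below is a simplified `int[]` version, not the JDK code: `RemoveAtSketch` and `contents` are made-up names, `removeAt` returns a boolean instead of the moved element, and values are assumed to be distinct because the check `heap[i] == moved` compares values here, whereas the JDK compares references.

```java
import java.util.Arrays;

class RemoveAtSketch {
    private final int[] heap;
    private int size;

    // The argument is assumed to already satisfy the min-heap invariant.
    RemoveAtSketch(int[] alreadyAHeap) {
        heap = alreadyAHeap.clone();
        size = heap.length;
    }

    private void siftDown(int k, int x) {
        int half = size >>> 1;
        while (k < half) {
            int child = 2 * k + 1;
            int right = child + 1;
            if (right < size && heap[right] < heap[child]) child = right;
            if (x <= heap[child]) break;
            heap[k] = heap[child];
            k = child;
        }
        heap[k] = x;
    }

    private void siftUp(int k, int x) {
        while (k > 0) {
            int parent = (k - 1) >>> 1;
            if (x >= heap[parent]) break;
            heap[k] = heap[parent];
            k = parent;
        }
        heap[k] = x;
    }

    // Returns true when the tail element ended up at a position before i.
    boolean removeAt(int i) {
        int s = --size;
        if (s == i) return false;          // removed the tail itself
        int moved = heap[s];
        siftDown(i, moved);
        if (heap[i] == moved) {            // did not sink: it may need to rise
            siftUp(i, moved);
            return heap[i] != moved;
        }
        return false;
    }

    int[] contents() {
        return Arrays.copyOf(heap, size);
    }

    public static void main(String[] args) {
        // Tail element 4 lives under root child 2; index 3 lives under root child 10.
        RemoveAtSketch h = new RemoveAtSketch(new int[]{1, 10, 2, 11, 12, 3, 4});
        System.out.println(h.removeAt(3));                 // prints true
        System.out.println(Arrays.toString(h.contents())); // prints [1, 4, 2, 10, 12, 3]
    }
}
```

Removing index 3 moves the tail value 4 into a leaf of the other subtree, where it is smaller than its new parent 10, so only siftUp can restore the invariant.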
Scanning through a large collection of statistics to report the top N items,
e.g. the N busiest network connections, the N most valuable customers, the N largest disk users.
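One common way to report the top N items in a single pass is a bounded min-heap: keep at most N elements, and replace the head whenever a larger value arrives. The sketch below assumes integer scores; `TopNSketch` and `topN` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNSketch {
    // Returns the n largest values, largest first.
    static List<Integer> topN(int[] values, int n) {
        // Min-heap: the head is the smallest of the current candidates.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) {
            if (heap.size() < n) {
                heap.offer(v);
            } else if (n > 0 && v > heap.peek()) {
                heap.poll();      // evict the smallest candidate
                heap.offer(v);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        int[] scores = {5, 1, 9, 3, 7, 8};
        System.out.println(topN(scores, 3)); // prints [9, 8, 7]
    }
}
```

Because the heap never holds more than N elements, each update costs O(log N) regardless of how large the input collection is.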
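Concepts
Structure
A one-dimensional array representing a balanced binary heap: ordered by comparator, or by the elements' natural ordering; the lowest value is in queue[0], assuming the queue is nonempty.
Parameters
initialCapacity: the initial capacity, 11 by default;
comparator: used to order the elements in the queue;
size: the number of elements in the queue;
modCount: the number of times the queue has been modified;
When a priority queue is built from an already ordered structure such as SortedSet or PriorityQueue, the data is copied directly into the queue array with Arrays.copyOf; when it is built from any other collection, a heap-construction (heapify) operation is also required.
Source code analysis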
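heapify (build the heap)
siftDown (sift down)
offer
add
remove
If the removed node and the tail node are in the same subtree, siftDown is enough; if they are in different subtrees, siftUp must also be performed from the removal position.
[Figure: the heap after deleting 5 and running siftDown]
At this point one more siftUp is needed to satisfy the binary heap structure.
poll
peek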
Performance
See the performance of a binary heap:
O(log(n)) time for the enqueuing and dequeuing methods (offer, poll, remove() and add);
linear time for the remove(Object) and contains(Object) methods;
constant time for the retrieval methods (peek, element, and size).
Thread safety
PriorityQueue is not thread-safe when the queue is modified concurrently; use PriorityBlockingQueue as the thread-safe version.
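A minimal sketch of concurrent producers sharing one `PriorityBlockingQueue`. The class name `BlockingQueueSketch`, the helper `produceAndDrain`, and the thread and item counts are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

class BlockingQueueSketch {
    // Several threads insert distinct integers; the caller then drains them in order.
    static List<Integer> produceAndDrain(int producers, int perProducer) {
        PriorityBlockingQueue<Integer> queue = new PriorityBlockingQueue<>();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int base = p * perProducer;
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    queue.put(base + i);   // never blocks: the queue is unbounded
                }
            });
            threads[p].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for producers", e);
        }
        List<Integer> drained = new ArrayList<>();
        Integer next;
        while ((next = queue.poll()) != null) {   // poll() returns null when empty
            drained.add(next);
        }
        return drained;
    }

    public static void main(String[] args) {
        List<Integer> result = produceAndDrain(4, 100);
        System.out.println(result.size() + " elements, first=" + result.get(0));
    }
}
```

Running the same producers against a plain PriorityQueue would be a data race and could corrupt the heap or lose elements.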
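Use cases
Handling prioritized work with PriorityQueue
For example, a hospital emergency department must treat patients in order of the severity of their condition; once the priority queue is built, simply poll the elements one by one.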
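The emergency-department scenario might look like the sketch below. `TriageSketch`, `Patient`, the `severity` field, and the sample patients are all hypothetical; a higher severity is assumed to mean more urgent, and severities are kept distinct because ties are broken arbitrarily.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TriageSketch {
    static final class Patient {
        final String name;
        final int severity;   // assumption: larger means more urgent

        Patient(String name, int severity) {
            this.name = name;
            this.severity = severity;
        }
    }

    // Build the queue once, then poll until it is empty.
    static List<String> treatmentOrder(List<Patient> arrivals) {
        Comparator<Patient> mostSevereFirst =
                Comparator.comparingInt((Patient p) -> p.severity).reversed();
        PriorityQueue<Patient> queue = new PriorityQueue<>(mostSevereFirst);
        queue.addAll(arrivals);
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);
        }
        return order;
    }

    public static void main(String[] args) {
        List<Patient> arrivals = Arrays.asList(
                new Patient("sprained ankle", 2),
                new Patient("chest pain", 9),
                new Patient("deep cut", 5));
        System.out.println(treatmentOrder(arrivals));
        // prints [chest pain, deep cut, sprained ankle]
    }
}
```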
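Finding the top K largest/smallest elements with PriorityQueue
To find the K largest elements, use a min-heap: maintain a min-heap of size K, and whenever a new element is larger than the head of the heap, replace the head with it. The numbers left in the heap are then the Top K. For the K smallest elements, use a max-heap of size K in the same way.
Massive data
With massive data, a single machine usually cannot hold all of the data.
Split the data across multiple machines by hash modulo;
maintain a heap of size K on each machine;
merge the per-machine heaps into the final Top K heap.
PriorityQueue in Hadoop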
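In Hadoop, sorting is the soul of MapReduce: both MapTask and ReduceTask sort data by key. This is the default behavior of the MR framework, whether or not your business logic needs it.
Two sorting techniques are used: quicksort and a heap-based priority queue.
On the map side, records are sorted by partition and key; when the buffer is 80% full, data is spilled to disk as IFile files. After the Map finishes, the IFile files are sorted and merged into one large file (using the heap-based priority queue), from which the different reduce tasks fetch their data.
The data fetched from the Mapper side is already partially sorted, so the Reduce Task needs only one merge sort to guarantee that the data is sorted overall.
To improve efficiency, Hadoop runs the sort phase and the reduce phase in parallel. In the sort phase, the Reduce Task builds a min-heap over the files in memory and on disk and keeps an iterator pointing at the root of that min-heap. It keeps advancing the iterator to hand records with the same key, in order, to the reduce() function. Advancing the iterator is in effect a continual readjustment of the min-heap (build heap → take the top element → rebuild heap → take the top element ...), which is how sort and reduce can proceed in parallel.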
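The "take the top element, then readjust the heap" loop is the classic k-way merge. The sketch below is a standalone illustration over in-memory `int[]` runs and does not use any Hadoop API; `KWayMergeSketch`, `Cursor`, and `merge` are names invented for the example, and each run is assumed to be sorted in ascending order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

class KWayMergeSketch {
    // A read position inside one sorted run (stands in for a spill file or segment).
    static final class Cursor {
        final int[] run;
        int pos;

        Cursor(int[] run) {
            this.run = run;
        }

        int head() {
            return run[pos];
        }

        // Move to the next record; returns false when the run is exhausted.
        boolean advance() {
            return ++pos < run.length;
        }
    }

    static int[] merge(int[][] sortedRuns) {
        Comparator<Cursor> byHead = Comparator.comparingInt(Cursor::head);
        PriorityQueue<Cursor> heap = new PriorityQueue<>(byHead);
        int total = 0;
        for (int[] run : sortedRuns) {
            total += run.length;
            if (run.length > 0) {
                heap.add(new Cursor(run));
            }
        }
        int[] merged = new int[total];
        int n = 0;
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();   // run whose current record is smallest
            merged[n++] = smallest.head();
            if (smallest.advance()) {
                heap.add(smallest);          // re-insert with its next record as key
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        int[][] runs = {{1, 4, 9}, {2, 3, 10}, {}, {5}};
        System.out.println(Arrays.toString(merge(runs))); // prints [1, 2, 3, 4, 5, 9, 10]
    }
}
```

The heap holds one cursor per run, so merging k runs with n records in total takes O(n log k) time. A cursor is only mutated while it is outside the heap, which keeps the heap invariant intact.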