Storm in Practice: Basic Concepts #21

johnnian opened this issue Jul 25, 2017 · 0 comments


Basic Concepts

[Figure: storm-flow]

1). Topologies
The logical unit of work that Storm runs: a directed graph made up of spouts and bolts.

2). Tuples
The basic data structure in Storm. A tuple's values can be integers, longs, shorts, bytes, strings, doubles, floats, booleans, and byte arrays; custom types can be supported by registering serializers.
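To make the idea of a tuple as a named list of values concrete, here is a minimal plain-Java sketch. `SimpleTuple` is a hypothetical class written for this note, not Storm's own tuple type; it only mirrors the idea of reading a value by position or by field name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tuple: an ordered list of values paired
// with the field names declared by the component that emitted it.
final class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values must have the same length");
        }
        // Defensive copies so later changes to the caller's lists do not leak in.
        this.fields = new ArrayList<>(fields);
        this.values = new ArrayList<>(values);
    }

    // Look up a value by position.
    Object getValue(int index) {
        return values.get(index);
    }

    // Look up a value by its declared field name.
    Object getValueByField(String field) {
        int index = fields.indexOf(field);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        SimpleTuple t = new SimpleTuple(
                Arrays.asList("word", "count"),
                Arrays.<Object>asList("storm", 3L));
        System.out.println(t.getValueByField("word") + " -> " + t.getValueByField("count"));
    }
}
```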

3). Spouts
The data source: a spout reads data from an external system and emits it into the topology.

4). Bolts
The processing unit: all processing in a topology happens in bolts.

5). Streams
Storm's data flow: a stream is an unbounded sequence of tuples.
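The five concepts above fit together as a pipeline: a spout emits tuples into a stream, and bolts consume and transform them. The following plain-Java sketch imitates that flow in a single thread for a word count. It does not use the Storm API; `sentenceSpout`, `splitBolt`, and `countBolt` are hypothetical names chosen for illustration, and a real topology would run these components in parallel across a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-threaded imitation of a spout -> bolt -> bolt topology.
final class MiniTopology {

    // "Spout": the data source. Here it returns a fixed batch of sentences.
    static List<String> sentenceSpout() {
        return Arrays.asList("storm processes streams", "streams of tuples");
    }

    // "Bolt" 1: splits each incoming sentence into individual words.
    static List<String> splitBolt(List<String> sentences) {
        List<String> words = new ArrayList<>();
        for (String sentence : sentences) {
            words.addAll(Arrays.asList(sentence.split(" ")));
        }
        return words;
    }

    // "Bolt" 2: counts how often each word was seen.
    static Map<String, Integer> countBolt(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Wire the components together in the order of the directed graph.
    static Map<String, Integer> run() {
        return countBolt(splitBolt(sentenceSpout()));
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```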

6). Stream groupings
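Determine how a stream is distributed among a bolt's tasks. Each grouping corresponds to one routing strategy. Storm currently has 8 built-in groupings:

  • Shuffle grouping: tuples are distributed randomly across the bolt's tasks, so that each task receives roughly the same number of tuples;
  • Fields grouping: the stream is partitioned by the specified fields, so tuples with the same field values always go to the same task;
  • Partial Key grouping: similar to Fields grouping, but the load is balanced between two downstream tasks, which helps when the data is skewed;
  • All grouping: the stream is replicated to every task of the downstream bolt;
  • Global grouping: the entire stream goes to a single task of the bolt (the one with the lowest id);
  • None grouping: currently equivalent to Shuffle grouping;
  • Direct grouping: the producer of a tuple decides which task of the consumer receives it;
  • Local or shuffle grouping: if the target bolt has tasks in the same worker process, tuples are shuffled among just those in-process tasks; otherwise it behaves like Shuffle grouping.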
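To make the routing rules concrete, the sketch below picks target task indexes for a few of the groupings in plain Java. It is a simplified illustration, not Storm's implementation: the class `GroupingSketch` and its methods are hypothetical, shuffle grouping is approximated with round-robin instead of random choice, and fields grouping is modeled as a hash of the grouping values modulo the number of tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical helper that decides which task(s) of a bolt receive a tuple.
final class GroupingSketch {
    private final int numTasks; // number of tasks of the consuming bolt
    private int next = 0;       // round-robin cursor used by shuffle()

    GroupingSketch(int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        this.numTasks = numTasks;
    }

    // Shuffle grouping (approximated by round-robin): spreads tuples evenly.
    int shuffle() {
        int task = next;
        next = (next + 1) % numTasks;
        return task;
    }

    // Fields grouping: equal grouping values always map to the same task.
    int fields(Object... groupingValues) {
        return Math.floorMod(Objects.hash(groupingValues), numTasks);
    }

    // All grouping: every task receives a copy of the tuple.
    List<Integer> all() {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(i);
        }
        return tasks;
    }

    // Global grouping: the whole stream goes to the task with the lowest index.
    int global() {
        return 0;
    }

    public static void main(String[] args) {
        GroupingSketch g = new GroupingSketch(4);
        System.out.println("shuffle: " + g.shuffle() + ", " + g.shuffle());
        System.out.println("fields(\"storm\") is stable: " + (g.fields("storm") == g.fields("storm")));
        System.out.println("all: " + g.all());
        System.out.println("global: " + g.global());
    }
}
```

The property worth noticing is in `fields`: because the target depends only on the grouping values, a bolt task that keeps per-key state (such as a word count) sees every tuple for its keys.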

7). Reliability

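Storm guarantees that every tuple emitted by a spout is fully processed by the topology. It tracks the tree of tuples derived from each spout tuple, and a tuple that is not fully processed within the timeout is marked as failed so that the spout can replay it.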
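Storm's documentation describes the bookkeeping behind this guarantee: for each spout tuple, an acker keeps a 64-bit value that is the XOR of every tuple id emitted or acked in its tree, and the tree is complete when that value returns to zero. The sketch below imitates only that XOR trick in plain Java. `AckTracker` and its method names are hypothetical, the ids are small fixed numbers instead of random 64-bit values, and timeouts and replays are left out.

```java
// Hypothetical tracker for one spout tuple's tree of derived tuples.
final class AckTracker {
    private long ackVal = 0L;        // XOR of all ids emitted or acked so far
    private boolean started = false; // becomes true once the first tuple is emitted

    // Record that a tuple with this id was emitted into the tree.
    void emitted(long tupleId) {
        started = true;
        ackVal ^= tupleId;
    }

    // Record that the tuple with this id was processed.
    void acked(long tupleId) {
        ackVal ^= tupleId;
    }

    // Each id is XORed in twice (emit + ack), so the value is zero
    // exactly when every emitted tuple has also been acked.
    boolean fullyProcessed() {
        return started && ackVal == 0L;
    }

    public static void main(String[] args) {
        AckTracker tracker = new AckTracker();
        tracker.emitted(1L);  // the spout tuple
        tracker.emitted(2L);  // first derived tuple
        tracker.emitted(4L);  // second derived tuple
        tracker.acked(1L);
        System.out.println("after one ack: " + tracker.fullyProcessed());
        tracker.acked(2L);
        tracker.acked(4L);
        System.out.println("after all acks: " + tracker.fullyProcessed());
    }
}
```

The appeal of this design is that the tracker needs constant memory per spout tuple, no matter how large the tuple tree grows.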

8). Tasks & Workers

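A topology runs across one or more worker processes. Each worker is a JVM that executes a subset of the topology's tasks, and Storm tries to spread the tasks evenly across the workers.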
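As a rough picture of what spreading tasks evenly means, the sketch below assigns task ids to workers round-robin. This is an illustrative assumption, not Storm's actual scheduler; `TaskAssignment` and `assignTasks` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class TaskAssignment {

    // Returns one list of task ids per worker, filled round-robin so that
    // worker sizes differ by at most one task.
    static List<List<Integer>> assignTasks(int numTasks, int numWorkers) {
        if (numWorkers <= 0) {
            throw new IllegalArgumentException("numWorkers must be positive");
        }
        List<List<Integer>> workers = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) {
            workers.add(new ArrayList<>());
        }
        for (int task = 0; task < numTasks; task++) {
            workers.get(task % numWorkers).add(task);
        }
        return workers;
    }

    public static void main(String[] args) {
        // 7 tasks over 3 workers.
        System.out.println(assignTasks(7, 3));
    }
}
```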

