Skip to content

Latest commit

 

History

History
179 lines (158 loc) · 11.1 KB

Phoenix的使用.md

File metadata and controls

179 lines (158 loc) · 11.1 KB

Phoenix的使用

安装过程参见:安装Phoenix

启动命令

bin/sqlline.py hadoop101,hadoop102,hadoop103:2181

使用

hbase不存在表
  • 创建表(创建的表会同步建到hbase中)
CREATE TABLE IF NOT EXISTS us_population (
      state CHAR(2) NOT NULL,
      city VARCHAR NOT NULL,
      population BIGINT
      CONSTRAINT my_pk PRIMARY KEY (state, city))
column_encoded_bytes=0;

# 这里没有指定列族,那么默认是0号列族 

主键映射到 HBase 中会成为 Rowkey. 如果有多个主键(联合主键), 会把多个主键的值拼成 rowkey

在 Phoenix 中, 默认会把表名,字段名等自动转换成大写. 如果要使用小写, 需要把他们用双引号括起来.

启动并创建表之后,在hbase中可以看到多了Phoenix的表

hbase(main):001:0> list
TABLE                                                                                                                                      
SYSTEM.CATALOG                                                                                                                             
SYSTEM.FUNCTION                                                                                                                            
SYSTEM.LOG                                                                                                                                 
SYSTEM.MUTEX                                                                                                                               
SYSTEM.SEQUENCE                                                                                                                            
SYSTEM.STATS                                                                                                                               
US_POPULATION                                                                                                                              
fruit                
...
  • 插入数据
upsert into us_population values('NY','NewYork',8143197);
upsert into us_population values('CA','Los Angeles',3844829);
upsert into us_population values('IL','Chicago',2842518);
  • 在hbase中查看结果
hbase(main):015:0> scan 'US_POPULATION'
ROW                                             COLUMN+CELL                                                                                                                               
 CALos Angeles                                  column=0:POPULATION, timestamp=1636257287696, value=\x80\x00\x00\x00\x00:\xAA\xDD                                                         
 CALos Angeles                                  column=0:_0, timestamp=1636257287696, value=x                                                                                             
 ILChicago                                      column=0:POPULATION, timestamp=1636257288767, value=\x80\x00\x00\x00\x00+_\x96                                                            
 ILChicago                                      column=0:_0, timestamp=1636257288767, value=x                                                                                             
 NYNewYork                                      column=0:POPULATION, timestamp=1636257287663, value=\x80\x00\x00\x00\x00|A]                                                               
 NYNewYork                                      column=0:_0, timestamp=1636257287663, value=x                                                                                             
3 row(s) in 0.0260 seconds
hbase表存在

需要映射先到Phoenix中,否则无法查看

  • 方式1 视图映射,适用于只做查询的场景
create view "student" (
id varchar primary key,
"info"."name" varchar,
"info"."age" varchar);

drop view "student";
  • 方式2 表映射,能进行增删改查
create table "student" (
id varchar primary key,
"info"."name" varchar,
"info"."age" varchar
) column_encoded_bytes=0;

upsert into "student" values('1008', 'phoenix', '66');

# 注意,这里的student,因为在hbase创建的时候是小写,那么在Phoenix使用的时候,需要加上双引号
# 否则,Phoenix会找不到表

添加column_encoded_bytes=0这个参数之后, 在 HBase 中添加的数据在 Phoenix 中也可以查询到. 否则查询不到

  • 总结:
    • 相比于直接创建映射表,视图的查询效率会低, 原因是:创建映射表的时候,Phoenix 会在表中创建一些空的键值对,这些空键值对的存在可以用来提高查询效率。
    • 使用create table创建的关联表,如果对表进行了修改,源数据也会改变,同时如果关联表被删除,源表也会被删除。但是视图就不会,如果删除视图,源数据不会发生改变

二级索引

HBase 里面只有 rowkey 作为一级索引, 如果要对库里的非 rowkey 字段进行数据检索和查询, 往往要通过 MapReduce/Spark 等分布式计算框架进行,硬件资源消耗和时间延迟都会比较高

从 0.94 版本开始, HBase 开始支持二级索引。

二级索引的分类,有两种:全局索引、本地索引

  • global全局索引:创建后在hbase中会生成一个表专门储存索引
    • 适合多读少写的场景。因为每次写操作,不仅需要更新数据,还需要更新索引
    • 网络开销大,家中RegionServer的压力
  • local本地索引:创建后在原表中新增一个列族,在列族储存索引信息
    • 使用多写少读的场景
# 全局索引
create index idx_name on user_1(name);
drop index idx_name on user_1;
# 本地索引
create local index idx_addr on user_1(addr);

案例测试:

  • 准备数据
create table user_1(id varchar primary key, name varchar, addr varchar);

upsert into user_1 values ('1', 'zs', 'beijing');
upsert into user_1 values ('2', 'lisi', 'shanghai');
upsert into user_1 values ('3', 'ww', 'sz');
  • 查看查询是否索引
0: jdbc:phoenix:hadoop101:2181> explain select * from user_1 where id='1';
+-----------------------------------------------------------------------------------------------+-----------------+----------------+------+
|                                             PLAN                                              | EST_BYTES_READ  | EST_ROWS_READ  | EST_ |
+-----------------------------------------------------------------------------------------------+-----------------+----------------+------+
| CLIENT 1-CHUNK 1 ROWS 205 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 1 KEY OVER USER_1  | 205             | 1              | 0    |
+-----------------------------------------------------------------------------------------------+-----------------+----------------+------+
1 row selected (0.052 seconds)
0: jdbc:phoenix:hadoop101:2181> explain select * from user_1 where name='1';
+------------------------------------------------------------------+-----------------+----------------+--------------+
|                               PLAN                               | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER USER_1  | null            | null           | null         |
|     SERVER FILTER BY NAME = '1'                                  | null            | null           | null         |
+------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.033 seconds)

出现full scan表示全部列扫描,没有使用上索引

  • 创建索引
0: jdbc:phoenix:hadoop101:2181> create index idx_name on user_1(name);
3 rows affected (6.366 seconds)
0: jdbc:phoenix:hadoop101:2181> explain select * from user_1 where name='1';
+------------------------------------------------------------------+-----------------+----------------+--------------+
|                               PLAN                               | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER USER_1  | null            | null           | null         |
|     SERVER FILTER BY NAME = '1'                                  | null            | null           | null         |
+------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.056 seconds)
0: jdbc:phoenix:hadoop101:2181> explain select name,addr from user_1 where name='1';
+------------------------------------------------------------------+-----------------+----------------+--------------+
|                               PLAN                               | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER USER_1  | null            | null           | null         |
|     SERVER FILTER BY NAME = '1'                                  | null            | null           | null         |
+------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.049 seconds)
0: jdbc:phoenix:hadoop101:2181> explain select name from user_1 where name='1';
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                   PLAN                                    | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER IDX_NAME ['1']  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY                                       | null            | null           | null         |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.049 seconds)
0: jdbc:phoenix:hadoop101:2181> explain select id,name from user_1 where name='1';
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                   PLAN                                    | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER IDX_NAME ['1']  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY                                       | null            | null           | null         |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.038 seconds)

可以发现如果select中出现没有加索引的的字段,那么就会full scan