A Zeppelin v0.8.0 notebook demonstrating how to build a recommender system with Spark v2.3.2 + Elasticsearch v6.3.2

spark-elasticsearch-recommender

A Zeppelin notebook demonstrating how to build a recommender system with Spark + Elasticsearch

Components

  • Zeppelin 0.8.0
  • Spark 2.3.2
  • Elasticsearch 6.3.2

1. Environment setup

Mac OS X

Zeppelin
# http://www.apache.org/dyn/closer.cgi/zeppelin/zeppelin-0.8.0/zeppelin-0.8.0-bin-netinst.tgz
$ wget http://mirrors.shu.edu.cn/apache/zeppelin/zeppelin-0.8.0/zeppelin-0.8.0-bin-netinst.tgz
$ tar -zxf zeppelin-0.8.0-bin-netinst.tgz
$ cd zeppelin-0.8.0-bin-netinst

# Install the required interpreters
$ ./bin/install-interpreter.sh --name md,elasticsearch
$ ./bin/zeppelin-daemon.sh start
Spark
# http://spark.apache.org/downloads.html
$ wget https://archive.apache.org/dist/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz
$ tar -zxf spark-2.3.2-bin-hadoop2.7.tgz
Elasticsearch
# https://www.elastic.co/downloads/past-releases
# Elasticsearch 6.3.2
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.zip
$ unzip elasticsearch-6.3.2.zip

# ES-Hadoop 6.3.2
$ wget https://artifacts.elastic.co/downloads/elasticsearch-hadoop/elasticsearch-hadoop-6.3.2.zip
$ unzip elasticsearch-hadoop-6.3.2.zip
Elasticsearch vector scoring plugin
# Modify build.gradle as follows so that you do not need to check out the Elasticsearch source
# https://github.com/muhleder/elasticsearch-vector-scoring/issues/1#issuecomment-415267767
buildscript {
  repositories {
    jcenter()
    mavenLocal()
  }
  dependencies {
    classpath "org.elasticsearch.gradle:build-tools:6.3.2"
  }
}

apply plugin: 'idea'
apply plugin: 'java'
apply plugin: 'elasticsearch.esplugin'

licenseFile = rootProject.file('LICENSE')
noticeFile = rootProject.file('NOTICE')

esplugin {
  name 'elasticsearch-vector-scoring'
  description 'Provides a fast vector multiplication script.'
  classname 'com.gosololaw.elasticsearch.VectorScoringPlugin'
}

dependencies {
  compile "org.elasticsearch:elasticsearch:6.3.2"
}
# Install the plugin
$ ./bin/elasticsearch-plugin install {file:///path/to/plugin.zip}
Python dependencies
$ pip install elasticsearch
$ pip install numpy
$ pip install tmdbsimple # optional; not used at the moment
Download the MovieLens dataset
$ cd data # sibling directory of zeppelin-0.8.0-bin-netinst; the note sets PATH_TO_DATA = "../data/ml-latest-small"
$ wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
$ unzip ml-latest-small.zip
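
Before opening the note, it can help to confirm that the archive was unpacked where PATH_TO_DATA expects it. The following is a minimal sketch, not part of the notebook: the helper name missing_dataset_files is hypothetical, and the file list assumes the standard contents of ml-latest-small.zip.

```python
import os

# CSV files shipped in ml-latest-small.zip (README.txt is not checked here).
EXPECTED_FILES = ("ratings.csv", "movies.csv", "links.csv", "tags.csv")


def missing_dataset_files(path_to_data):
    """Return the expected MovieLens files that are absent from path_to_data.

    Hypothetical helper for illustration; the notebook itself only needs
    PATH_TO_DATA to point at the unpacked ml-latest-small directory.
    """
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(path_to_data, name))
    ]
```

Calling missing_dataset_files("../data/ml-latest-small") from the Zeppelin directory should return an empty list once the download and unzip steps have succeeded.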

2. Start the services

Start Elasticsearch
$ ./bin/elasticsearch
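
Once the node is up, a quick check that it answers on the HTTP port can save debugging time later in the notebook. This is a minimal sketch using only the Python standard library; the function name elasticsearch_version is hypothetical, and the default URL assumes Elasticsearch is listening on localhost:9200.

```python
import http.client
import json
import urllib.request


def elasticsearch_version(base_url="http://localhost:9200", timeout=3.0):
    """Return the version string reported by the node, or None if unreachable.

    Hypothetical helper: it reads the JSON document served at the root
    endpoint and extracts version.number from it.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, refused connections and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return None
    if not isinstance(info, dict):
        return None
    version = info.get("version")
    return version.get("number") if isinstance(version, dict) else None
```

With the setup described here, elasticsearch_version() is expected to report the installed release; a result of None suggests the node has not finished starting or is bound to a different address.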
Configure and start Zeppelin
$ cp conf/shiro.ini.template conf/shiro.ini
$ vim conf/shiro.ini
# Administrator account and password
[users]
admin = 123456, admin

$ cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh
$ vim conf/zeppelin-env.sh
# Spark settings
export SPARK_HOME=/{apache-spark-path}/spark-2.3.2-bin-hadoop2.7
export SPARK_SUBMIT_OPTIONS="--driver-memory 2G"

$ cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
$ vim conf/zeppelin-site.xml
# Adjust settings such as zeppelin.server.port as needed

# Start (use restart instead if the daemon from step 1 is still running)
$ ./bin/zeppelin-daemon.sh start

3. Notebook

http://localhost:8080

# Create new interpreter
# md

# elasticsearch
elasticsearch.client.type http
elasticsearch.port 9200

# spark
# Add to Dependencies
artifact /{elasticsearch-hadoop-path}/elasticsearch-hadoop-6.3.2/dist/elasticsearch-spark-20_2.11-6.3.2.jar
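
With the elasticsearch-spark jar on the interpreter's classpath, a DataFrame can be written to Elasticsearch from a %pyspark paragraph. The sketch below is an illustration under stated assumptions rather than the notebook's actual code: the helper names es_write_options and save_to_es are hypothetical, es.nodes and es.port are ES-Hadoop settings, and the host and port defaults assume the local node configured above.

```python
def es_write_options(host="localhost", port=9200):
    """Build ES-Hadoop connection options for a DataFrame writer.

    Hypothetical helper; the defaults assume the local node from step 2.
    """
    return {"es.nodes": host, "es.port": str(port)}


def save_to_es(df, resource, host="localhost", port=9200):
    """Append a Spark DataFrame to an Elasticsearch resource ("index/type").

    `df` is expected to be a pyspark.sql.DataFrame; the data source name
    is provided by the elasticsearch-spark jar added under Dependencies.
    """
    (
        df.write.format("org.elasticsearch.spark.sql")
        .options(**es_write_options(host, port))
        .mode("append")
        .save(resource)
    )
```

In a notebook paragraph this would be called as save_to_es(some_df, "some_index/some_type"), where both the DataFrame and the resource name are placeholders to be replaced with the ones the note defines.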
