
Describes the architecture of Persona, stores its configuration files and builds the big data platform based on Spark.


persona-project/persona-core


Persona - Core

Describes the architecture of Persona and stores its configuration files.

Architecture

[Figure: architecture diagram]

Deployment

bash install.sh

Running the script generates four systemd services: persona-offline, persona-realtime, persona-flume and persona-backend.
They can then be managed like any other systemd service (for example with systemctl).
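Because the four units are ordinary systemd services, they can be started, stopped or inspected together. The following is a minimal sketch, not part of install.sh: the helper names `systemctl_commands` and `manage` are hypothetical, and only the unit names come from the list above. By default it prints the commands instead of running them.

```python
import subprocess
from typing import List, Sequence

# Unit names generated by install.sh, as listed above.
SERVICES = ("persona-offline", "persona-realtime", "persona-flume", "persona-backend")


def systemctl_commands(action: str, services: Sequence[str] = SERVICES) -> List[List[str]]:
    """Build one `systemctl <action> <unit>` command per Persona service."""
    return [["systemctl", action, name] for name in services]


def manage(action: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one by one."""
    for cmd in systemctl_commands(action):
        if dry_run:
            print(" ".join(cmd))
        else:
            # Usually requires root privileges; stops at the first failure.
            subprocess.run(cmd, check=True)
```

For example, `manage("status")` prints the four status commands, and `manage("start", dry_run=False)` actually starts the services in order.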

Key points

  1. user_tag_value, moc_post, moc_reply and moc_comment come from the mooc MySQL database.
  2. wda_mooc may come from the mooc HDFS.
  3. Spark is used for offline data processing.
  4. Spark Streaming is used for real-time data processing.
  5. Redis has been chosen for data caching.
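The split between offline and real-time processing can be illustrated without a cluster. This is a minimal pure-Python sketch, not Persona's actual Spark job: it assumes each row of moc_post, moc_reply and moc_comment carries a hypothetical `user_id` field and simply counts activity per user. The offline function processes the complete tables in one pass, as a batch job would, while the streaming function folds small batches into running state, which mirrors the micro-batch model used by Spark Streaming.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional

# Hypothetical row shape: every row is assumed to have a "user_id" field.
# The real table schemas may differ.
Row = Mapping[str, object]


def count_activity(rows: Iterable[Row]) -> Counter:
    """Count how many rows (posts, replies or comments) each user produced."""
    return Counter(row["user_id"] for row in rows)


def offline_job(posts: Iterable[Row], replies: Iterable[Row], comments: Iterable[Row]) -> Counter:
    """Batch mode: process the complete imported tables in a single pass."""
    total = Counter()
    for table in (posts, replies, comments):
        total.update(count_activity(table))
    return total


def streaming_job(micro_batches: Iterable[Iterable[Row]], state: Optional[Counter] = None) -> Counter:
    """Micro-batch mode: fold each small batch into the running per-user state."""
    state = Counter() if state is None else state
    for batch in micro_batches:
        state.update(count_activity(batch))
    return state
```

Feeding the same rows through either path yields the same counts; the difference is that the streaming path keeps updating its state as new batches arrive instead of recomputing over everything.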

How to choose between MySQL, HBase and Redis?
- Redis: fastest, but data can easily be lost.
- HBase: data is not lost. Is its deployment easy?
- MySQL: too slow.
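One common way to combine these trade-offs is the cache-aside pattern: reads go to the fast cache first and fall back to a durable store on a miss, so losing cached data costs only extra reads, not data. This is a minimal sketch under stated assumptions: a plain dict stands in for Redis, and the `load_from_db` callable stands in for a durable backend such as HBase or MySQL. The class and its methods are hypothetical, not Persona's actual interfaces.

```python
from typing import Callable, Dict, Optional


class CacheAsideStore:
    """Cache-aside reads: a dict plays Redis, a callable plays the durable store."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]):
        self.cache: Dict[str, str] = {}  # stand-in for Redis (fast, volatile)
        self.load_from_db = load_from_db  # stand-in for HBase/MySQL (durable)
        self.db_reads = 0  # number of times the durable store was hit

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:  # cache hit: served from memory
            return self.cache[key]
        self.db_reads += 1  # cache miss: fall back to the durable store
        value = self.load_from_db(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value

    def evict_all(self) -> None:
        """Simulate losing the cache, e.g. a Redis restart without persistence."""
        self.cache.clear()
```

After `evict_all()`, the next `get` for a key goes back to the durable store and repopulates the cache, which is why a volatile cache is acceptable as long as a durable store sits behind it.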

Open questions

  1. How should the persona - ml module be arranged?

Notes

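Persona big data platform development log 1: importing business logic data

Persona big data platform development log 2: offline data processing

Persona big data platform development log 3: real-time log collection and transmission

Persona big data platform development log 4: deployment process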
