Add script to test TiSpark #33

Merged
merged 5 commits into from Jul 25, 2018
2 changes: 2 additions & 0 deletions .travis.yml
@@ -12,3 +12,5 @@ before_install:
script:
- docker ps -a --format="{{.Names}} {{.Image}} {{.Status}}" | grep -v 'Up' | grep -v 'Exited (0)' | awk '{print} END {if (NR>0) {exit 1;}}'
- mysql -h 127.0.0.1 -P 4000 -u root -e "select tidb_version()\G" # test if tidb-server is working
- docker-compose exec tispark-master bash /opt/tispark/tests/loaddata.sh # add some data for tests
Member

I think it's better to load the data from the host:

  1. Download and extract tispark-sample-data.
  2. Load the data from the host.

Both steps can be done in the `before_install` section.
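A minimal sketch of step 2, assuming the sample data has already been downloaded and extracted on the host (its download location is not given in this thread) and that TiDB is published on 127.0.0.1:4000, as in the `tidb_version()` check. The helpers `mysql_command` and `load_sql_file` are hypothetical; they just feed a SQL file to the `mysql` client, the same way `loaddata.sh` does with `mysql ... < file`.

```python
import subprocess
from pathlib import Path
from typing import List


def mysql_command(host: str = "127.0.0.1", port: int = 4000, user: str = "root") -> List[str]:
    """Hypothetical helper: the mysql client invocation used in this PR, as an argv list."""
    return ["mysql", "-h", host, "-P", str(port), "-u", user]


def load_sql_file(sql_file: Path, host: str = "127.0.0.1", port: int = 4000) -> None:
    """Hypothetical helper: equivalent of `mysql -h HOST -P PORT -u root < sql_file`.

    check=True raises subprocess.CalledProcessError when mysql exits non-zero,
    so a successful load also shows that TiDB is reachable from the host.
    """
    with open(sql_file, "rb") as f:
        subprocess.run(mysql_command(host, port), stdin=f, check=True)
```

For example, `load_sql_file(Path("tispark-sample-data/dss.ddl"))`, where the host-side path is an assumption about where the archive was extracted.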

Contributor Author

Should we do that, and if so, why? The data is already extracted in the master image/container.
This call only injects the data into TiDB through the MySQL interface; nothing is downloaded.

Member

If we load the data from the host, a successful load already proves we can access TiDB, so the `select tidb_version()` step is no longer needed. We also wouldn't need to add an extra script to the tispark-master container. But if you prefer your approach, I'm OK with it.

- docker-compose exec tispark-master /opt/spark/bin/spark-submit /opt/tispark/tests/tests.py # run tispark tests
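The first `script` step of the `.travis.yml` change fails the build when any container is neither running nor exited with status 0: `grep -v` drops lines containing `Up` or `Exited (0)`, and `awk` prints whatever remains and exits 1 if anything did. The same filter, written as a hypothetical Python function over text shaped like the `docker ps -a --format=...` output, is sketched below to make that logic explicit (it is not part of the PR).

```python
from typing import List, Tuple


def check_containers(ps_output: str) -> Tuple[int, List[str]]:
    """Mirror of: grep -v 'Up' | grep -v 'Exited (0)' | awk '{print} END {if (NR>0) {exit 1;}}'

    Returns (exit_code, offending_lines). Like grep, this is a plain substring
    match on the whole line (name, image and status). Blank lines are ignored.
    """
    bad = [
        line
        for line in ps_output.splitlines()
        if line.strip() and "Up" not in line and "Exited (0)" not in line
    ]
    # awk prints every remaining line and exits 1 if it saw at least one.
    return (1 if bad else 0), bad
```

Because the match covers the whole line, a container whose name or image happened to contain `Up` would be treated as healthy; matching only the status field would be stricter.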
1 change: 1 addition & 0 deletions docker-compose.yml
@@ -123,6 +123,7 @@ services:
- /opt/spark/sbin/start-master.sh
volumes:
- ./config/spark-defaults.conf:/opt/spark/conf/spark-defaults.conf:ro
- ./tispark/tests:/opt/tispark/tests:ro
Member

We could start another container connected to the tidb-docker-compose network and run the tests there, without affecting the tispark-master container.
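A sketch of that suggestion, assuming the compose network name and the image to run are passed in; neither is given in this thread. docker-compose normally names its default network `<project>_default`, and `docker network ls` shows the actual name. The helper `docker_run_tests_command` is hypothetical: it builds a command for a throwaway container that joins the same network, mounts `./tispark/tests` read-only, and runs the same `spark-submit` command, leaving `tispark-master` untouched. The chosen image would also need Spark and pytispark installed.

```python
import os
from typing import List


def docker_run_tests_command(network: str, image: str, tests_dir: str = "./tispark/tests") -> List[str]:
    """Hypothetical helper: argv for a throwaway container that runs tests.py.

    Joining `network` lets the container resolve the compose service names
    (tispark-master, tidb) that tests.py connects to.
    """
    # Bind-mount sources should be absolute host paths.
    host_dir = os.path.abspath(tests_dir)
    return [
        "docker", "run", "--rm",  # remove the container when the tests finish
        "--network", network,
        "-v", f"{host_dir}:/opt/tispark/tests:ro",
        image,
        "/opt/spark/bin/spark-submit", "/opt/tispark/tests/tests.py",
    ]
```

Passing the result to `subprocess.run(..., check=True)` makes a non-zero exit from the container fail the calling script.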

environment:
SPARK_MASTER_PORT: 7077
SPARK_MASTER_WEBUI_PORT: 8080
3 changes: 3 additions & 0 deletions tispark/tests/loaddata.sh
@@ -0,0 +1,3 @@
#!/usr/bin/env bash

mysql -h tidb -P 4000 -u root < /opt/spark/data/tispark-sample-data/dss.ddl
12 changes: 12 additions & 0 deletions tispark/tests/tests.py
@@ -0,0 +1,12 @@
from pyspark.sql import SparkSession
import pytispark.pytispark as pti

spark = SparkSession.builder.master("spark://tispark-master:7077").appName("TiSpark tests").getOrCreate()

ti = pti.TiContext(spark)

ti.tidbMapDatabase("TPCH_001")

count = spark.sql("select count(*) as cnt from lineitem").first()['cnt']  # alias the column so it can be looked up by name; the default name is not "count"

assert 60175 == count