
2. Code and Data Setup

helena edited this page Nov 23, 2014 · 5 revisions

Clone the repo

git clone https://github.com/killrweather/killrweather.git
cd killrweather

Build the code

If this is your first time running SBT, expect the first build to download a large number of dependencies, which can take a while.

cd killrweather
sbt compile
# For IntelliJ users: generate IntelliJ IDEA project files
sbt gen-idea

Setup - 3 Steps

  1. Download the latest Apache Cassandra release and extract the archive.

     Optional: open apache-cassandra-{latest.version}/conf/cassandra.yaml and increase batch_size_warn_threshold_in_kb to 64, which reduces batch-size warnings in the Cassandra log.
    
  2. Start Cassandra. You may need to prefix the command with sudo, or chown /var/lib/cassandra to your user first. On the command line:

    ./apache-cassandra-{latest.version}/bin/cassandra -f

  3. Run the setup CQL scripts to create the schema and populate the weather stations table.

On the command line, change to the data directory and start a cqlsh shell:

cd /path/to/killrweather/data
~/apache-cassandra-{latest.version}/bin/cqlsh

You should see:

Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh {latest.version} | Cassandra {latest.version} | CQL spec {latest.version} | Native protocol {latest.version}]
Use HELP for help.
cqlsh>

Run the scripts:

cqlsh> source 'create-timeseries.cql';
cqlsh> source 'load-timeseries.cql';
cqlsh> quit;
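The setup steps can also be scripted instead of typed into an interactive cqlsh session. The sketch below is a minimal example, assuming a POSIX shell, a GNU or BSD sed, and a cqlsh that can reach the node started in step 2. The function names set_batch_warn_threshold and run_setup_scripts are hypothetical helpers for illustration, not part of KillrWeather. It relies on cqlsh's -f option, which executes the commands in a file and then exits, in place of the source command.

```shell
# Hypothetical helpers (not part of the KillrWeather repo) that script the
# optional cassandra.yaml tweak and the CQL setup step.

# Optional: set batch_size_warn_threshold_in_kb in cassandra.yaml.
# Run this before starting Cassandra (or restart it afterwards).
# Usage: set_batch_warn_threshold /path/to/cassandra.yaml 64
# "sed -i.bak" edits in place on both GNU and BSD sed and keeps a .bak copy.
set_batch_warn_threshold() {
  conf="$1"
  kb="${2:-64}"
  sed -i.bak "s/^batch_size_warn_threshold_in_kb:.*/batch_size_warn_threshold_in_kb: $kb/" "$conf"
}

# Run the schema script, then the data script, with cqlsh -f.
# Stops at the first script that fails.
# Usage: run_setup_scripts /path/to/cqlsh /path/to/dir/with/cql/files
run_setup_scripts() {
  cqlsh_bin="$1"
  data_dir="$2"
  for script in create-timeseries.cql load-timeseries.cql; do
    "$cqlsh_bin" -f "$data_dir/$script" || return 1
  done
}
```

For example, call set_batch_warn_threshold on apache-cassandra-{latest.version}/conf/cassandra.yaml before starting Cassandra, then, with Cassandra running, call run_setup_scripts with the path to bin/cqlsh and the directory holding the .cql files.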

Run the app and the client app

Both can be run from the command line or from an IDE.

To Run from an IDE

First start com.datastax.killrweather.KillrWeatherApp, then com.datastax.killrweather.KillrWeatherClientApp.

To Run from Command Line

cd /path/to/killrweather
sbt app/run

You should see:

Multiple main classes detected, select one to run:

[1] com.datastax.killrweather.SimpleSparkJob
[2] com.datastax.killrweather.KillrWeatherClientApp
[3] com.datastax.killrweather.KillrWeatherApp

Select 3 to start KillrWeatherApp. Then open a new terminal window, run sbt app/run again, and select 2 to start KillrWeatherClientApp.
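Alternatively, sbt's runMain task takes the main class name directly, which skips the selection prompt. A minimal sketch, assuming sbt is on the PATH and is invoked from the project root; run_killrweather is a hypothetical wrapper, not part of the project, and the SBT variable exists only so the launcher can be overridden.

```shell
# Hypothetical wrapper: start the app or the client without the
# "Multiple main classes detected" prompt, using sbt's runMain task.
# Usage (from /path/to/killrweather):
#   run_killrweather app      # first terminal
#   run_killrweather client   # second terminal, once the app is running
run_killrweather() {
  case "$1" in
    app)    main=com.datastax.killrweather.KillrWeatherApp ;;
    client) main=com.datastax.killrweather.KillrWeatherClientApp ;;
    *)      echo "usage: run_killrweather app|client" >&2; return 2 ;;
  esac
  # Passed as ONE argument so sbt treats "app/runMain <class>" as a single
  # command. SBT defaults to "sbt".
  "${SBT:-sbt}" "app/runMain $main"
}
```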

Test The Data Setup

In cqlsh:

cqlsh> describe keyspace isd_weather_data;
cqlsh> use isd_weather_data;
cqlsh:isd_weather_data> select * from weather_station limit 10;
 

 id           | call_sign | country_code | elevation | lat    | long    | name                  | state_code
--------------+-----------+--------------+-----------+--------+---------+-----------------------+------------
 408930:99999 |      OIZJ |           IR |         4 |  25.65 |  57.767 |                  JASK |       null
 725500:14942 |      KOMA |           US |     299.3 | 41.317 |   -95.9 | OMAHA EPPLEY AIRFIELD |         NE
 725474:99999 |      KCSQ |           US |       394 | 41.017 | -94.367 |               CRESTON |         IA
 480350:99999 |      VBLS |           BM |       749 | 22.933 |   97.75 |                LASHIO |       null
 719380:99999 |      CYCO |           CN |        22 | 67.817 | -115.15 |    COPPERMINE AIRPORT |       null
 992790:99999 |     DB279 |           US |         3 |   40.5 | -69.467 |   ENVIRONM BUOY 44008 |       null
  85120:99999 |      LPPD |           PO |        72 | 37.733 |   -25.7 |   PONTA DELGADA/NORDE |       null
 150140:99999 |      LRBM |           RO |       218 | 47.667 |  23.583 |             BAIA MARE |       null
 435330:99999 |      null |           MV |         1 |  6.733 |   73.15 |              HANIMADU |       null
 536150:99999 |      null |           CI |      1005 | 38.467 |  106.27 |       YINCHUAN (CITY) |       null 

(10 rows)

cqlsh:isd_weather_data>      
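The same check can be run without an interactive session, for example from a script. A minimal sketch, assuming your cqlsh version supports the -e option (execute one statement and exit) and prints a "(N rows)" summary after a SELECT, as in the session above; check_weather_stations is a hypothetical helper.

```shell
# Hypothetical helper: succeed if isd_weather_data.weather_station has rows.
# Usage: check_weather_stations /path/to/cqlsh
check_weather_stations() {
  cqlsh_bin="$1"
  out=$("$cqlsh_bin" -e "SELECT id FROM isd_weather_data.weather_station LIMIT 1;") || return 1
  # cqlsh ends a SELECT with a "(N rows)" summary; "(0 rows)" means the
  # load script has not populated the table.
  case "$out" in
    *"(0 rows)"*) echo "weather_station is empty" >&2; return 1 ;;
    *"rows)"*)    echo "weather_station is populated" ;;
    *)            echo "unexpected cqlsh output" >&2; return 1 ;;
  esac
}
```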

To clear everything out and start fresh, drop the keyspace:

cqlsh> drop keyspace isd_weather_data;
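Starting fresh usually means dropping the keyspace and then re-running both setup scripts. A minimal non-interactive sketch, assuming a cqlsh that supports -e and -f; reset_weather_data is a hypothetical helper, and IF EXISTS makes the drop a no-op when the keyspace is already gone.

```shell
# Hypothetical helper: drop isd_weather_data, then rebuild it from the
# setup scripts. Stops at the first failing cqlsh call.
# Usage: reset_weather_data /path/to/cqlsh /path/to/dir/with/cql/files
reset_weather_data() {
  cqlsh_bin="$1"
  data_dir="$2"
  "$cqlsh_bin" -e "DROP KEYSPACE IF EXISTS isd_weather_data;" || return 1
  for script in create-timeseries.cql load-timeseries.cql; do
    "$cqlsh_bin" -f "$data_dir/$script" || return 1
  done
}
```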

Note: In production you would use NetworkTopologyStrategy with a minimum replication factor of 3 (see the Cassandra documentation on NetworkTopologyStrategy).
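For example, an existing keyspace can be switched over with ALTER KEYSPACE. The sketch below only prints the statement. The data center name DC1 used in the usage line is a placeholder assumption: it must match the data center name your snitch reports (nodetool status lists it). nts_replication_cql is a hypothetical helper. After raising replication on a cluster that already holds data, run nodetool repair so existing rows reach the new replicas.

```shell
# Hypothetical helper: print an ALTER KEYSPACE statement that moves a
# keyspace to NetworkTopologyStrategy with a per-data-center replication factor.
# Usage: nts_replication_cql KEYSPACE DATACENTER [REPLICATION_FACTOR]
nts_replication_cql() {
  ks="$1"
  dc="$2"
  rf="${3:-3}"   # defaults to the production minimum of 3
  printf "ALTER KEYSPACE %s WITH replication = {'class': 'NetworkTopologyStrategy', '%s': %s};\n" \
    "$ks" "$dc" "$rf"
}
```

Usage: cqlsh -e "$(nts_replication_cql isd_weather_data DC1)"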