### Generate a set of random stock portfolios

As inputs, we have:
- 4000names.csv
- stock_symbols.csv

In [None]:
val names = sc.textFile("file:///root/summit-spark/notebooks-unfinished/datastax/4000names.csv")
val num_names = names.cache.count

### Read in the stock symbols, and give each one a number

In [None]:
val stock_symbols = sc.textFile("file:///root/summit-spark/notebooks-unfinished/datastax/stock_symbols.csv")
.zipWithIndex
.map(_.swap)
val num_stock_symbols = stock_symbols.cache.count.toInt
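
As a rough illustration of the numbering step, the sketch below applies the same `zipWithIndex` and `swap` calls to a plain Scala `Seq` rather than an RDD. The sample symbols are made up. One difference worth noting: on a local collection the index is an `Int`, whereas `RDD.zipWithIndex` yields a `Long`, which is why a later cell converts the random stock number with `.toLong`.

```scala
// Hypothetical sample standing in for the lines of stock_symbols.csv
val sampleSymbols = Seq("AAA", "BBB", "CCC")

// zipWithIndex pairs each element with its position: ("AAA", 0), ("BBB", 1), ...
val symbolWithIndex = sampleSymbols.zipWithIndex

// swap turns (symbol, index) into (index, symbol), so the index becomes the key
val indexedSymbols = symbolWithIndex.map(_.swap)
```

Putting the number first matters because pair operations such as `join` treat the first tuple element as the key.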

### For each name, pick a random number and duplicate the name that many times

To duplicate a name, we use List.fill to turn the single name into a list holding several copies of it. We use flatMap rather than map so that, instead of emitting one List per row (a list of lists), the result is flattened and each copy is emitted as its own row.

- Why do we do it within a mapPartitions?
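
The difference between `map` and `flatMap` here can be sketched on a plain Scala `List`. The names and the fixed copy counts are hypothetical stand-ins: the counts replace the `rnd.nextInt(12)` call so that the result is predictable.

```scala
// Hypothetical sample names
val sampleNames = List("alice", "bob", "carol")

// Fixed copy counts, standing in for the random rnd.nextInt(12)
val copies = Map("alice" -> 2, "bob" -> 0, "carol" -> 3)

// map keeps one List per input name: a list of lists
val nested = sampleNames.map(n => List.fill(copies(n))(n))

// flatMap concatenates those inner lists into one flat sequence of names
val flat = sampleNames.flatMap(n => List.fill(copies(n))(n))
```

Here `nested` is `List(List(alice, alice), List(), List(carol, carol, carol))` and `flat` is `List(alice, alice, carol, carol, carol)`. A count of 0 produces an empty list, so that name disappears from the flattened output. The same can happen in the notebook, since `nextInt(12)` returns values from 0 to 11.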

In [None]:
val multi_names = names.mapPartitions { r =>
  val rnd = scala.util.Random
  // nextInt(12) returns 0 to 11, so a name is repeated up to 11 times or dropped entirely
  r.flatMap(n => List.fill(rnd.nextInt(12))(n))
}

### For each row, add a random stock (by number)

In [None]:
val withstockIndex = multi_names.mapPartitions { r =>
  val rnd = scala.util.Random
  // key: random stock number as a Long; value: (name, quantity from 100 to 1500)
  r.map(n => (rnd.nextInt(num_stock_symbols).toLong, (n, rnd.nextInt(15) * 100 + 100)))
}
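
To see which quantities the expression `rnd.nextInt(15) * 100 + 100` can produce, the small sketch below applies the same arithmetic to every value that `nextInt(15)` can return (0 through 14). The variable name is made up for illustration.

```scala
// nextInt(15) returns an Int from 0 to 14; apply the cell's arithmetic to each one
val possibleQuantities = (0 until 15).map(i => i * 100 + 100)
```

The result is the 15 round lots from 100 to 1500 in steps of 100, so every generated holding is a multiple of 100 shares.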

### Join the names and stock numbers with the stock symbols

In [None]:
val raw_portfolios = withstockIndex.join(stock_symbols)
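
The shape of the joined rows determines the pattern match used when saving, so it may help to see it on a small example. The sketch below mimics an inner join by key using plain Scala collections. It is not the Spark implementation, and the sample data and variable names are hypothetical.

```scala
// Hypothetical stand-ins for withstockIndex: (stock number, (name, quantity))
val left = Seq((0L, ("alice", 300)), (1L, ("bob", 100)), (0L, ("carol", 700)))

// Hypothetical stand-ins for stock_symbols: (stock number, symbol)
val right = Seq((0L, "AAA"), (1L, "BBB"))

// Inner join on the key: each result is (key, (left value, right value))
val joined = for {
  (k1, nameQty) <- left
  (k2, sym) <- right
  if k1 == k2
} yield (k1, (nameQty, sym))

// Unpack the nested tuple into (name, symbol, quantity), ignoring the key
val rows = joined.map { case (_, ((name, qty), sym)) => (name, sym, qty) }
```

Each joined element has the form `(key, ((name, qty), sym))`, which is the structure the `case` pattern in the save step takes apart.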

In [None]:
%%cql create table if not exists stock.portfolios (
        name text,
        stock_symbol text,
        quantity int,
        price float,
        value float,
        primary key (name, stock_symbol))

### Save it

In [None]:
raw_portfolios.map{ case (n, ((name,qty),sym)) => (name, sym, qty)}
 .saveToCassandra("stock","portfolios",SomeColumns("name","stock_symbol","quantity"))

### Check it out

In [None]:
%%cql select * from stock.portfolios limit 20;