# Project 3

**Due: Wednesday 27 Nov 2019 11:59PM**

The goal of this project is to give you practice with some of the libraries and use cases covered in class. It has the following learning outcomes:

1. Make you comfortable using JIFF, SQLAlchemy, and CILK.
2. Give you hands-on experience with MPC, ORM/Database programming, and parallel programming.
3. Demonstrate differences between MPC/parallel programs and their non-secure/sequential counterparts.
4. Demonstrate the effect of certain embedding and programming-language techniques, as well as API design, on the programmer's experience.

**Submission:** This mini-project consists of three problems. Please submit your solutions through this Google Form: [https://forms.gle/Spps1b71ZU8ndAFu7](https://forms.gle/Spps1b71ZU8ndAFu7). **Submit the solution to each problem as a separate Zip file containing your solution files. Your files should have the format described in each problem.**

**IMPORTANT: do not include the jiff git submodule (\<course-repo\>/project-3/Problem1/jiff) in your submission!**

**Programming Languages:** You will have to use JavaScript, Python, and a little C++ for problems 1, 2, and 3 respectively.

**Collaboration:** You should solve the problems individually. You are allowed to discuss the problems and your ideas with classmates, but you are prohibited from sharing or copying your solution code (in whole or part). You are free to use any ideas or code you find online.

**Grading:** To receive full credit, you only need to solve any two of the three problems below. Choose whichever ones you find more interesting. You can solve all three problems for bonus credit.

The grade assigned to each part is displayed next to its description. You will receive full credit if your solution meets all the requirements in the description and passes our correctness tests. You will receive partial credit for parts that are not completely solved or that fail some tests. In addition, you can receive bonus points for cleaner or more efficient solutions, as specified in each problem.

Feel free to ask any questions on Piazza. Remember, Piazza participation = extra credit!

## Problem 1 - JIFF (50 points)

In this problem, your task is to implement an MPC version of blackjack using JIFF. You can find all the files you need for this problem in the [course GitHub repo](https://github.com/KinanBab/CS591L1/tree/master/project-3/).

### Motivation

A big problem with several card and board games is cheating. Most of these games require each player to keep a secret state and to perform operations consistent with it. Sometimes, it is difficult to check whether the operations were indeed consistent without revealing the secret state.

One example is blackjack. This is a card game between a dealer and a player. The dealer is responsible for providing the player with **random** cards. However, if the dealer does not shuffle the cards properly, or can see the cards as they are being served, the dealer is almost always guaranteed to win. This is a problem with in-person blackjack, as certain dealers may be very skilled at shuffling and marking cards, but it is a bigger problem with electronic blackjack, since the dealer's computer has access to all the information about the cards.

To resolve this issue, we will use MPC. The dealer and player will jointly sample cards randomly using MPC, so that neither of them knows which cards the other got.

### Rules

Blackjack is a simple game with a few rules. You can play it online to learn the rules (for example [here](https://www.arkadium.com/games/blackjack/)). We will not support any betting in our implementation. It is strictly for fun. We will only consider one player and one dealer. The dealer will have id 1, and the player will have id 2. We will also consider slightly simpler rules:

##### Setup phase: (25 points)
1. We have a collection of cards with numerical values (more on this later).
2. At the start of the game, the player is given a random card, then the dealer is given a random card. Both cards are shown to both players (they are public).
3. Then, the player is given another random public card, while the dealer is given a random secret card, that no one can see yet.

##### Player's turn: (15 points)
1. Now it is the player's turn. The player looks at their cards, and can choose either to receive another random card or to stop. All received cards are public.
2. This is repeated until the player chooses to stop. If the total value of the player's cards exceeds 21, the player loses immediately.
3. If the player decides to stop, and the total value of their cards does not exceed 21, then it is the dealer's turn.

##### Dealer's turn: (10 points)
1. The dealer's secret card is revealed to everyone.
2. If the total value of the dealer's cards is less than 17, the dealer must take another random card. If it is greater than or equal to 17, the dealer must stop. This is repeated until 17 or greater is reached.
3. If the total value of the dealer's cards exceeds 21, the dealer loses. Otherwise, the one with the highest total value wins. If both player and dealer have the same total value, then it is a tie.
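
The dealer's rule and the final comparison can be summarized in a few lines. The following is only a plain sketch of the rules above, written in Python for brevity (your solution must be in JavaScript and use JIFF). The function names `dealer_should_draw` and `outcome` are made up for illustration, and a hand is assumed to be a list of numeric card values.

```python
def dealer_should_draw(dealer_total):
    """The dealer must draw below 17 and must stop at 17 or above."""
    return dealer_total < 17


def outcome(player_cards, dealer_cards):
    """Return 'player', 'dealer', or 'tie' for two finished hands."""
    player_total = sum(player_cards)
    dealer_total = sum(dealer_cards)
    if player_total > 21:
        return "dealer"  # the player loses immediately when exceeding 21
    if dealer_total > 21:
        return "player"  # the dealer exceeded 21
    if player_total > dealer_total:
        return "player"
    if dealer_total > player_total:
        return "dealer"
    return "tie"
```

For example, `outcome([10, 8], [9, 9])` is a tie because both hands total 18.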

### Interface

To keep things simple, you can implement the game as a command line game with a simple UI. The basic building blocks for this UI are given to you in [UI.js](https://github.com/KinanBab/CS591L1/tree/master/project-3/Problem1/UI.js). The UI.js file contains a usage sample that demonstrates how to use its API. You can also run it using `node UI.js`. This file provides four functions:

1. `display(dealers_hand, players_hand)`: displays the two hands in a nice way. Each hand is an array of strings or numbers.
2. `readBoolean()`: prompts the player with a message asking whether to take a new card or stop. Returns a promise that resolves to true if the user asks for a new card, or false otherwise.
3. `clear()`: clears the screen.
4. `stop()`: stops the interface. Use this when everything is done to close the application.

Depending on the user's input through the UI, certain code/actions (like sampling) must be performed by both players. The player code must communicate the user's input to the dealer. JIFF provides an `emit` and `listen` API that supports such communication. Sample usage is shown in the given skeleton files.

### MPC

The MPC component of this problem is really random sampling, to ensure that cards are indeed random. JIFF provides a `<instance>.bits.rejection_sampling(...)` function, that allows parties to jointly sample a uniform number in a specified range using MPC. The problem files contain sample usage of this function to demonstrate how it works.

Note that this function by itself is not sufficient to sample cards correctly: once a card has been drawn, subsequent draws must never produce that same card again. This means that you have to store all previously drawn cards in an array. Every time a new card is drawn, check whether it is already in that array; if it is, call rejection\_sampling again, and repeat until the drawn card is not in the array.
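
The redraw loop can be sketched as follows. This is a plaintext illustration only, written in Python for brevity: `random.randint` stands in for the joint MPC sampling, the names `draw_card`, `sample`, and `deck_size` are made up for this sketch, and cards are plain numbers. In your JIFF solution, the sampling and any comparison against a secret card must happen under MPC.

```python
import random


def draw_card(drawn, deck_size=13, sample=None):
    """Sample repeatedly until the result is not among the drawn cards."""
    if sample is None:
        # Stand-in for the joint sampling done under MPC.
        sample = lambda: random.randint(1, deck_size)
    if len(drawn) >= deck_size:
        raise ValueError("the deck is exhausted")
    while True:
        card = sample()
        if card not in drawn:  # reject cards that were already drawn
            drawn.append(card)
            return card


drawn = []
first = draw_card(drawn)
second = draw_card(drawn)  # guaranteed to differ from `first`
```

The guard against an exhausted deck is an addition of this sketch: without it, the loop would never terminate once every card has been drawn.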

**HINT: most cards are public, only some are secret. Think about which ones are, and try to do most of the comparisons to check if a card has been seen before or not in the clear (not under MPC).**

### Cards

We want to use a single deck of cards for blackjack. A card consists of two things: a suit and a value. Cards range from an ace to a king. For our simplified version of blackjack, we will consider an ace to be 1, and a jack, queen, and king to be 11, 12, and 13 respectively; the other cards range between 2 and 10 inclusive, for a total of 13 different values. Decks usually have 4 suits in them; to simplify things, we will refer to them using numbers 0-3 inclusive.

To simplify things, your code can work with suits and values as plain numbers as above. You may also display them to the user that way, so you do not have to worry about representing special cards (J, Q, K, and ace) or suits as strings.

### Implementation Steps

First, clone the course's GitHub repo, update all its submodules, and install the dependencies.

```bash
git clone https://github.com/KinanBab/CS591L1
cd CS591L1 # the submodule commands must run inside the cloned repo
git submodule init
git submodule update # may take a couple of minutes
cd project-3/Problem1
npm install
```

Go to the project-3/Problem1 directory. We provide two skeleton files for you there: [player.js](https://github.com/KinanBab/CS591L1/tree/master/project-3/Problem1/player.js) and [dealer.js](https://github.com/KinanBab/CS591L1/tree/master/project-3/Problem1/dealer.js). Additionally, we provide complete implementations of the needed JIFF server [server.js](https://github.com/KinanBab/CS591L1/tree/master/project-3/Problem1/server.js) and interface [UI.js](https://github.com/KinanBab/CS591L1/tree/master/project-3/Problem1/UI.js).

You can run these files to see what the code samples in them do. You can also run the UI file using `node UI.js`.

```bash
# run each of these commands in a different terminal
node server.js
node dealer.js
node player.js
```

We suggest you follow these steps when trying to solve this problem:
1. Look at the provided files and run them, understand the rules as above, try blackjack out online or on a piece of paper.
2. Ignore suits, and consider only a single set of cards between 1 and 13 inclusive. Implement the setup phase with these cards only.
3. Add the suits into your implementation. You can do this by sampling a card using two calls to rejection\_sampling: the first samples the suit from \[0-3\], and the second samples the value from \[1-13\]. Make sure the setup phase is still correct.
4. Organize your code so that card drawing, including checking that a card has not been drawn before, is factored out into a single reusable function.
5. Implement the player's phase, use the card drawing function described above.
6. Implement the dealer's phase.

**HINT: You only need to perform secret comparison between a newly drawn card and the dealer's secret card. All other comparisons can be performed outside MPC. Notice that in the dealer's turn, the secret card is revealed!**


## Problem 2 - SQLAlchemy (50 points)

In this problem, you will create a simple database schema, insert data into it, and run a few queries against it. Use the official [sqlalchemy tutorial](https://docs.sqlalchemy.org/en/13/orm/tutorial.html) and the lecture notes as resources to help you with this problem.

The database will store information about NFL seasons 2010-2015, including the results of all regular season games and the team rosters for every season.

For this problem, you have to write a program that:
1. Connects to an in-memory SQLite database.
2. Defines the schema below as sqlalchemy classes, and creates the schema in the database.
3. Inserts the data from the given csv files into the database using sqlalchemy.
4. Runs the designated queries using sqlalchemy.

You can organize your code into several python files for clarity. You should provide a `problem2.py` python file that contains the main entry point to your program, imports any other python file(s), and performs the 4 tasks above in order.

### Schema (15 points)

You need to create the following tables with the following schema:

**Teams:**
1. Team id: auto-generated integer, primary key
2. Team name: string, unique
3. Players: relationship listing the players of the team per year

**Players:**
1. Player id: auto-generated integer, primary key
2. Player name: string, unique
3. Position: string
4. Teams: relationship listing the teams of the player per year

*Note: a player may change position throughout his career. To simplify things, ignore any discrepancies like that, and use the first position you encounter for every player.*

**Player/Teams Association Table:**
1. Association id: auto-generated integer, primary key
2. Player id: integer, foreign key
3. Team id: integer, foreign key
4. Season: integer (2010-2015)

Columns 2, 3, and 4 should be unique together.

**Games:**
1. Game id: auto-generated integer, primary key.
2. Home team id: integer, foreign key
3. Away team id: integer, a different foreign key
4. Season: integer (2010-2015)
5. Week: integer (1-17)
6. Home score: integer
7. Away score: integer
8. Home team: relationship with teams
9. Away team: a different relationship with teams

**Remember: Relationships are logical sqlalchemy entities that do not create physical columns. Relationships are defined based on physical foreign key columns.**
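
As an illustration of that reminder, here is a minimal sketch of one table holding two foreign keys to the same table, each backing its own relationship. It is deliberately incomplete: the class and attribute names are our own choices, and most of the required columns are omitted. The `try`/`except` import is only there so the sketch runs on both older and newer SQLAlchemy versions.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import relationship, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4 and later
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()


class Team(Base):
    __tablename__ = "teams"
    id = Column(Integer, primary_key=True)  # auto-generated by SQLite
    name = Column(String, unique=True)


class Game(Base):
    __tablename__ = "games"
    id = Column(Integer, primary_key=True)
    # Two physical foreign key columns pointing at the same table.
    home_team_id = Column(Integer, ForeignKey("teams.id"))
    away_team_id = Column(Integer, ForeignKey("teams.id"))
    # Each relationship names the column it is based on; without
    # foreign_keys, SQLAlchemy cannot tell the two join paths apart.
    home_team = relationship("Team", foreign_keys=[home_team_id])
    away_team = relationship("Team", foreign_keys=[away_team_id])


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Made-up team names, for illustration only.
home, away = Team(name="Home Example"), Team(name="Away Example")
game = Game(home_team=home, away_team=away)
session.add(game)  # the related teams are saved along with the game
session.commit()
```

After the commit, `game.home_team_id` and `game.away_team_id` hold the generated ids of the two teams, even though only the relationships were assigned.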

### Data (20 points)

We provide you with CSV files containing the data required to fill the above database. You must write and use python functions to read this data from the CSV files and insert it into the database via sqlalchemy. The csv files are available at the [course github repo](https://github.com/KinanBab/CS591L1/tree/master/project-3/Problem2/data/).

Note that the csv files contain many columns that are not in our schema above; such columns should be ignored. Additionally, the csv files contain unique identifiers for games and players. You are free to keep the same ids, or to ignore them and use auto-generated ones from sqlalchemy.

For every season, we provide two csv files: reg_games and reg_roster. The first is a list of all the regular season games played that season, including team names and scores. The second is a list of all the players in that regular season and their corresponding team.

Hint: Your script should read the reg_games files first, and use them to fill the teams and games tables (when you encounter a game with a team that does not exist in the database, insert it into the teams table!). Your script should then read the reg_roster files and fill in the players and player/teams association tables.
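
The "insert the team when you first encounter it" pattern from the hint can be sketched with the standard `csv` module. Everything specific here is an assumption: the column names and the sample rows are made up (check the headers of the real files), and plain dictionaries and lists stand in for the database tables that your solution would fill through sqlalchemy.

```python
import csv
import io

# Made-up sample data; the real files have different and additional columns.
SAMPLE = """home_team,away_team,home_score,away_score,extra
A,B,21,14,x
B,C,10,17,y
"""


def load_games(csv_file):
    """Read game rows, creating a team entry the first time a name appears."""
    team_ids = {}  # team name -> id, stands in for the teams table
    games = []     # stands in for the games table
    for row in csv.DictReader(csv_file):
        for key in ("home_team", "away_team"):
            # Assign the next id only if this team has not been seen yet.
            team_ids.setdefault(row[key], len(team_ids) + 1)
        games.append((
            team_ids[row["home_team"]],
            team_ids[row["away_team"]],
            int(row["home_score"]),
            int(row["away_score"]),
        ))  # columns not in the schema (here: "extra") are simply ignored
    return team_ids, games


team_ids, games = load_games(io.StringIO(SAMPLE))
```

With a real file, you would pass `open(path, newline="")` instead of the `io.StringIO` object.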

### Query (15 points)

After all data has been inserted, you must implement these two queries using sqlalchemy and run them, see if you can predict their results ;)

1. Find the three teams with largest difference between total points scored and conceded overall in all the seasons combined.
2. Find the ten players with the most wins overall in all the seasons combined. If a player is on a team's roster during a season, consider him to have won all the games that team won in that season.

Try to encode most of the query using the sqlalchemy API so it runs on the server side. You can code part of the query in python outside of sqlalchemy if you need to, but try to keep it to a minimum. Clean and optimal solutions (efficiency-wise) get bonus points!

## Problem 3 - CILK (50 points)

In this problem, you will implement, analyze, and benchmark a parallel algorithm using CILK and C++.

Use the CILK lecture notes and [CILK Hub](http://cilk.mit.edu/) as resources to help you with your solutions.

You need to produce four files for a complete solution:
1. mult.cpp
2. analysis.pdf|doc|txt
3. plot_n.png|jpg
4. plot_w.png|jpg

### Implementation (20 points)

You should implement an efficient parallel inner-product function using CILK, and use it to implement efficient matrix-vector multiplication.

##### Inner-product

You must implement an inner-product function with header `int inner(int n, int* vec1, int* vec2)`, where n is the length of both vectors.

Your function must behave as follows:
1. if n is less than 10, it performs the inner product sequentially.
2. otherwise, it must make two recursive parallel calls (using cilk_spawn): the first computes the inner product of the first half of the vectors, and the second computes it for the second half. It then returns the sum of the two results.
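
The shape of this recursion can be sketched sequentially. The following Python sketch only illustrates the base case and the split; it contains no parallelism. In your C++ solution the recursive calls would be spawned with cilk_spawn, and you would pass pointer offsets rather than copying halves as Python slices do.

```python
def inner(n, vec1, vec2):
    """Divide-and-conquer inner product (sequential sketch)."""
    if n < 10:
        # Base case: small inputs are handled with a plain loop.
        return sum(vec1[i] * vec2[i] for i in range(n))
    half = n // 2
    # In CILK, these two calls would run in parallel.
    left = inner(half, vec1[:half], vec2[:half])
    right = inner(n - half, vec1[half:], vec2[half:])
    return left + right
```

Note that the second half has length `n - half`, which handles odd values of n correctly.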

##### Matrix-vector multiplication

You must implement a matrix-vector multiplication function with header `void mult(int n, int** mat, int* vec, int* result)`, where vec and mat are a vector and square matrix of dimension n.

Your code should compute the inner product between every row i of the matrix mat and vec in parallel, and store the result at index i of the output vector. You can use a cilk_for to perform the parallelization for you automatically.

cilk_for is susceptible to a lot of implicit bottlenecks. It is important that you use cilk_for for the matrix-vector mult function and not for the inner function; otherwise, you will not see a speedup. See [here](https://jayconrod.com/posts/29/parallelization--harder-than-it-looks) for more details.

##### Main function

Your main function should take the dimension n as the first command line argument, and then generate a random vector and matrix of size n with values between 0 and 10. Your main function should then call mult(...) on these parameters.

Your program must measure the time required to compute mult. Be sure to measure the time accurately: generating the random matrix and allocating arrays should not be a part of the time measurement.

Your program should output the time to the console.

### Analysis (10 points)

Compute the work, span, and parallelism of your program analytically as a function of n. Show your derivation in analysis.pdf|doc|txt. Choose the file format that is easiest for you!
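
As a reminder of the standard definitions (stated here in general form, not applied to this problem), write $T_1(n)$ for the work, $T_\infty(n)$ for the span, and recall how both compose when two computations $A$ and $B$ run in series or in parallel:

$$
\begin{aligned}
\text{parallelism} &= \frac{T_1(n)}{T_\infty(n)} \\
\text{series: } T_1(A; B) &= T_1(A) + T_1(B), & T_\infty(A; B) &= T_\infty(A) + T_\infty(B) \\
\text{parallel: } T_1(A \parallel B) &= T_1(A) + T_1(B), & T_\infty(A \parallel B) &= \max\bigl(T_\infty(A),\, T_\infty(B)\bigr)
\end{aligned}
$$

Your derivation should apply these rules to the recursion in inner and to the loop in mult.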

### Benchmarks (20 points)

Compile and run your program to produce benchmarking plots, and see if they match the analytical result from the analysis part.

Run your program with 2 CILK workers on n = 1000, 2000, 4000, 8000. Plot n (as the x-axis) against the time (as the y-axis). Make sure to run the program several times for each value of n, and plot the average time to avoid anomalies and outliers.

Run your program for n = 4000 using 1, 2, and 4 cilk workers (and 8, if your computer has enough cores). Plot the result with the number of workers on the x-axis and the time on the y-axis.
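
Repeating runs and averaging is easy to script. The sketch below is one possible helper, under several assumptions: your program prints only the measured time as a number, the number of workers is controlled through the `CILK_NWORKERS` environment variable (check that this applies to your CILK installation), and the binary name `./mult` in the comment is hypothetical. A stand-in command that prints a fixed number is used so that the sketch runs on its own.

```python
import os
import statistics
import subprocess
import sys


def average_time(command, workers, runs=5):
    """Run `command` several times and return the mean of the printed times."""
    # Assumption: the CILK runtime reads the worker count from CILK_NWORKERS.
    env = dict(os.environ, CILK_NWORKERS=str(workers))
    times = []
    for _ in range(runs):
        result = subprocess.run(
            command, env=env, capture_output=True, text=True, check=True
        )
        times.append(float(result.stdout.strip()))
    return statistics.mean(times)


# Stand-in for the real program; replace with e.g. ["./mult", "4000"].
fake_mult = [sys.executable, "-c", "print(0.25)"]
average = average_time(fake_mult, workers=2, runs=3)
```

Calling the helper once per value of n (or per worker count) gives the averaged data points for each plot.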

Produce two image files for the two plots: plot_n and plot_w respectively. Discuss whether the plots confirm the analysis from the previous part!