# Boxcar: Not just a game for railcar hobos.

Complexity and data science go hand in hand. Our ability to train models with thousands of features while varying numerous hyperparameters is awe-inspiring, but it's also difficult to understand! As I started my own path to becoming a data scientist, I quickly realized that I would need to start at the beginning with an easy-to-understand problem. What I wanted was a way to generate reliable, yet unpredictable data. Luckily, I'm a game maker with OOP experience and a love for puzzles.

This series chronicles the steps I took in analyzing the dice game Boxcar using real-world Python tools. First we will generate data using the Boxcar virtual player that I wrote, then analyze that data for a better understanding of the game. You will be given the opportunity to code a Virtual Player's AI and test it against my own. Next we will use our game to train a predictive model and gain some insights into one of Python's most powerful libraries: scikit-learn. By the end of the series you will be able to use a model to develop an AI that plays with the highest level of skill possible.

## Intro to Boxcar

While play styles, techniques, and theory are well documented for popular games like chess, the game of Boxcar is still widely unknown. It's fine if you haven't heard of Boxcar; I encourage you to learn as you go by exploring my code! However, if you are finding it difficult to understand and would like a quick read on the game, there is a [Wikipedia page](https://en.wikipedia.org/wiki/Dice_10000) (Boxcar is a version of 10,000). I'll also go over a few necessities here so that we can jump right in.

The game is believed to have originated during the Great Depression, when railcar-hopping hobos played the dice game to pass the time. Players use six dice and are allowed to roll up to three times. Each roll, if taken, has the potential to gain the player more points, or to take them all away. The essence of the game comes down to one decision: will you roll or stay?

At this point, I suggest you spend some time looking over the game files. I've included a handy [documentation guide](https://github.com/gavinraym/Boxcar/blob/master/src/docs.md) that explains each class in detail. This is a good place to start. I'd especially like you to look at [engine.py](https://github.com/gavinraym/Boxcar/blob/master/src/engine.py), as this is the script we use to play the game. If you are interested in how the game is scored, that is done in [cabouse.py](https://github.com/gavinraym/Boxcar/blob/master/src/cabouse.py). Also, it's important to check out [coach.py](https://github.com/gavinraym/Boxcar/blob/master/src/coach.py), as this is where our players live. When you're ready, the rest of this notebook will finish preparing you to generate some initial data.

## The Virtual Player (VP)

Now that you know the layout of the source files, let's play the game! I've included three built-in VPs for you to use in your EDA. The most important, and the one we will begin with, is the Perfect VP. It is important to note that this VP is different from all the others in that it cheats. *WHAT?!* I know, that's probably not what you were expecting me to say! But let me explain.

The Perfect VP is allowed to cheat for a very good reason. It allows us to establish a nice initial representation of the distribution of scores at the game's upper boundary. This will become clearer as you begin to understand how the Perfect VP works. Because this series is about coding, I'd like to explain by showing the code that defines the Perfect VP:




```python
''' From coach.py, this is the code that determines how Perfect VP will play. 
The VP receives the dice as they exist after the last roll, and the round's 
current score. The VP then must return either True to keep rolling, or False
to stop rolling and keep the points. 

As you can see, the Perfect VP always rolls no matter what!'''

#==============================================================================

def perfect(self, dice, score):
    # This VP always rolls.
    return True

#==============================================================================
```
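
To make the roll-or-stay interface concrete, here is a minimal sketch of what a VP of your own might look like. It assumes only the signature shown above (`dice`, `score`, returning `True` to roll or `False` to stay). The `DemoCoach` class, the `cautious` method, and the 300-point threshold are hypothetical names and values chosen for illustration; they are not taken from coach.py.

```python
class DemoCoach:
    """Hypothetical stand-in for the class in coach.py that holds the VPs."""

    def perfect(self, dice, score):
        # Same behavior as the excerpt above: always roll.
        return True

    def cautious(self, dice, score, threshold=300):
        # Illustrative strategy (an assumption, not one of the built-in VPs):
        # roll again only while the round's score is below the threshold.
        return score < threshold


coach = DemoCoach()
dice = [1, 5, 2, 3, 4, 6]  # example dice left over from the last roll

print(coach.perfect(dice, 450))   # the Perfect VP ignores the score entirely
print(coach.cautious(dice, 150))  # below the threshold, so keep rolling
print(coach.cautious(dice, 450))  # at or above the threshold, so stay
```

A strategy like this ignores the dice and looks only at the score, which is about the simplest rule that still makes a real decision.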

```python
''' Taken from engine.py, here you can see that most players will lose all the
points accumulated on the current round if they lose on a roll. For the 
Perfect VP, however, points are never lost!'''

#==============================================================================

def _lose_round(self):
    # If a loss, lose all points for this round
    if self._player_name != 'Perfect':
        self.round_points = 0
    # terminal is used for human players
    if self._player_name == 'terminal':
        display_loss(self._score.dice)

#==============================================================================
```

This essentially makes the Perfect VP into the perfect player. Its scores are similar to those that would be gained by a clairvoyant player who knows ahead of time what a roll's outcome will be. To be clear, the Perfect VP **DOES NOT** represent an outcome that is in any way achievable in real life without cheating. At the end of this series we will develop a VP that represents the best play style that *can* be achieved **in real life**. Until then, we will use the Perfect VP's scores to gain insight into how we expect the scores of the best player to behave.
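
To see why keeping points after a loss produces an upper boundary, here is a rough, self-contained sketch. It does not use the Boxcar engine. The scoring rule (100 per 1, 50 per 5, re-rolling all six dice each time) and the rule that a losing roll ends the round are simplifying assumptions of mine, not Boxcar's actual rules, and every function name below is hypothetical. Both players always roll; the only difference is the branch from `_lose_round`.

```python
import random


def toy_roll_points(dice):
    """Toy scoring (an assumption, not Boxcar's real rules): 100 per 1, 50 per 5."""
    return 100 * dice.count(1) + 50 * dice.count(5)


def play_round(rng, keep_points_on_loss, max_rolls=3, n_dice=6):
    """Play one round for a player that always rolls."""
    round_points = 0
    for _ in range(max_rolls):
        dice = [rng.randint(1, 6) for _ in range(n_dice)]
        points = toy_roll_points(dice)
        if points == 0:
            # A losing roll: mimic the branch in _lose_round.
            if not keep_points_on_loss:
                round_points = 0
            break  # assumption: a losing roll ends the round
        round_points += points
    return round_points


def average_score(keep_points_on_loss, n_rounds=10_000, seed=0):
    """Average round score; the shared seed gives both players the same dice."""
    rng = random.Random(seed)
    total = sum(play_round(rng, keep_points_on_loss) for _ in range(n_rounds))
    return total / n_rounds


perfect_like = average_score(keep_points_on_loss=True)
ordinary = average_score(keep_points_on_loss=False)
print(f"keeps points on a loss: {perfect_like:.1f}")
print(f"loses points on a loss: {ordinary:.1f}")
```

Because both runs see identical dice, the player who keeps points can never score less in any round than the one who loses them, which is the sense in which the Perfect VP bounds the scores from above.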

We will evaluate the Perfect VP in the next chapter by running some simulations, then plotting the scores with Python's matplotlib library. We will talk about one of the main foundations of data science, *the Central Limit Theorem*, and one important tool that takes advantage of it, called *Bootstrapping*. See you there!