# Week 2 In-Class Exercises

Let's start by grabbing our dataset. We'll be using a dataset of poker hands from UC Irvine located [here](http://archive.ics.uci.edu/ml/datasets/Poker+Hand). I've left the data in the class GitHub folder for your convenience, but downloading it would look something like this:

In [22]:
download.file(url = "http://archive.ics.uci.edu/ml/machine-learning-databases/poker/poker-hand-training-true.data",
             destfile = "./poker_hands.csv")

Now we'll read the file into R as a data frame.

In [23]:
poker.hands <- read.csv(file = "./poker_hands.csv")

It's a good idea to start by exploring the data a bit. We'll do a few variations on this before we move on to the dplyr stuff.

In [24]:
head(poker.hands)
str(poker.hands)

X1,X10,X1.1,X11,X1.2,X13,X1.3,X12,X1.4,X1.5,X9
2,11,2,13,2,10,2,12,2,1,9
3,12,3,11,3,13,3,10,3,1,9
4,10,4,11,4,1,4,13,4,12,9
4,1,4,13,4,12,4,11,4,10,9
1,2,1,4,1,5,1,3,1,6,8
1,9,1,12,1,10,1,11,1,13,8


'data.frame':	25009 obs. of  11 variables:
 $ X1  : int  2 3 4 4 1 1 2 3 4 1 ...
 $ X10 : int  11 12 10 1 2 9 1 5 1 1 ...
 $ X1.1: int  2 3 4 4 1 1 2 3 4 2 ...
 $ X11 : int  13 11 11 13 4 12 2 6 4 1 ...
 $ X1.2: int  2 3 4 4 1 1 2 3 4 3 ...
 $ X13 : int  10 13 1 12 5 10 3 9 2 9 ...
 $ X1.3: int  2 3 4 4 1 1 2 3 4 1 ...
 $ X12 : int  12 10 13 11 3 11 4 7 3 5 ...
 $ X1.4: int  2 3 4 4 1 1 2 3 4 2 ...
 $ X1.5: int  1 1 12 10 6 13 5 8 5 3 ...
 $ X9  : int  9 9 9 9 8 8 8 8 8 1 ...
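
The `X1`, `X10`, `X1.1`, ... column names above are what `read.csv()` produces when it treats the first row of a file as a header (its default is `header = TRUE`). Here is a minimal sketch, on a made-up three-line file rather than the real data, of how `header = FALSE` keeps that first row as an observation:

```r
# Hypothetical three-row file with no header line (not the real dataset)
csv_text <- "1,10,1,11\n2,11,2,13\n3,12,3,11"

# Default behaviour: the first row is consumed as column names
with_header <- read.csv(text = csv_text)

# header = FALSE keeps every row as data and names the columns V1, V2, ...
no_header <- read.csv(text = csv_text, header = FALSE)

names(with_header)  # "X1" "X10" "X1.1" "X11"
nrow(with_header)   # 2
nrow(no_header)     # 3
```

With `header = FALSE` the columns come in as `V1`, `V2`, and so on, and renaming them with `names()` works the same way.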


The dataset is tidy, but it's pretty ambiguous. We can get some information about it from the [description file](http://archive.ics.uci.edu/ml/machine-learning-databases/poker/poker-hand.names) that will help us understand it, but we can also do a lot with dplyr to help make the dataset more useful and maybe draw some conclusions about it. 

But before that, let's add some more descriptive column names.

In [25]:
names(poker.hands) <- c("Suit.1","Rank.1","Suit.2","Rank.2","Suit.3",
                       "Rank.3","Suit.4","Rank.4","Suit.5","Rank.5","Class")
head(poker.hands)

Suit.1,Rank.1,Suit.2,Rank.2,Suit.3,Rank.3,Suit.4,Rank.4,Suit.5,Rank.5,Class
2,11,2,13,2,10,2,12,2,1,9
3,12,3,11,3,13,3,10,3,1,9
4,10,4,11,4,1,4,13,4,12,9
4,1,4,13,4,12,4,11,4,10,9
1,2,1,4,1,5,1,3,1,6,8
1,9,1,12,1,10,1,11,1,13,8


Much better! Now then, let's load up dplyr and get started.

In [26]:
suppressMessages(library(dplyr))

## Exercises

### 1. Starting simple

To begin with, notice that there are 10 types of hands. Return to me a summary of the percentage (not probability) of the time we get each type of hand (Class). The list should be in order of descending percentage. 

For extra "credit", do so while returning the actual name of each hand rather than a number. 

In [33]:
poker.hands %>%
group_by(Class) %>%
summarise(n_class = n(), pcnt = n_class/nrow(poker.hands)*100) %>%
arrange(desc(pcnt))

Class,n_class,pcnt
0,12493,49.95401655
1,10599,42.38074293
2,1206,4.82226398
3,513,2.05126155
4,93,0.37186613
5,54,0.21592227
6,36,0.14394818
7,6,0.02399136
8,5,0.0199928
9,4,0.01599424
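
The same kind of summary can be written without referring to the total row count at all. This is a minimal sketch on a made-up `toy` data frame (not the poker data), assuming a dplyr version where `count()` accepts the `name` argument:

```r
suppressMessages(library(dplyr))

# Hypothetical toy data: six hands with three different classes
toy <- data.frame(Class = c(0, 0, 0, 1, 1, 2))

toy_pct <- toy %>%
  count(Class, name = "n_class") %>%            # one row per class with its count
  mutate(pcnt = n_class / sum(n_class) * 100) %>%  # share of all counted rows
  arrange(desc(pcnt))

toy_pct
```

Dividing by `sum(n_class)` means the percentages always add up to 100 for whatever rows were counted, even if the data frame was filtered earlier in the pipeline.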


### 2. Assessing Straight Flushes

I think the most exciting thing about poker is the straight flush, the rarest hand, of which royal flushes are a subset. What percentage (not fraction) of the hands in this dataset are straight flushes?

Furthermore, among straight flushes, what percentage occur in each suit?

In [51]:
poker.hands %>%
filter(Class %in% c(8, 9)) %>%
summarise(p_straight = n()/nrow(poker.hands)*100)


p_straight
0.03598704


In [53]:
poker.hands %>%
filter(Class == 8 | Class == 9) %>%
mutate(p_each = 100/n()) %>%
group_by(Suit.1) %>%
summarise(p_straight = sum(p_each))

Suit.1,p_straight
1,22.22222
2,22.22222
3,22.22222
4,33.33333


### 3. Stuck on a pair

Sometimes when I'm playing poker, I pick up the first 2 cards and I've already got a pair...but then I can't seem to get anything else. How often does this occur in the dataset? I'm looking for the percentage of time that I end up with just a pair after starting with 2 paired cards. 

For part 2, list the ranked probabilities (0 <= p <= 1) and occurrences of the different hand outcomes when I start with 2 paired cards. They should be listed in increasing order.

In [77]:
poker.hands %>%
filter(Rank.1 == Rank.2) %>%
mutate(jp = Class == 1, perc = 100*jp/n()) %>%
summarise(sum = sum(perc))

sum
72.06823


In [78]:
poker.hands %>%
filter(Rank.1 == Rank.2) %>%
mutate(perc = 1/n()) %>%
group_by(Class) %>%
summarise(probs = sum(perc)) %>%
arrange(probs)

Class,probs
7,0.00355366
6,0.009950249
3,0.105899076
2,0.159914712
1,0.720682303


### 4. Ranking flushes

An interesting thing about the dataset is that it groups all flushes (Class = 5) together. But in fact, some flushes are higher than others, with Ace-high flushes (Rank = 1) being the highest. 

Similar to the previous questions, I would like to see a ranked breakdown of flushes. The flushes should be in order from highest to lowest, but no probabilities needed this time.

Note: you can do this with aces being low and that's just fine; for extra "credit", try doing it with aces being high.

### 5. Inside straights

Inside straights are the best kind of gift; you're sitting there with nothing and then suddenly, your last card elevates your hand to something much more fantastic. It's when your first 4 cards are unconnected, but your 5th card completes a 5-card straight. So for instance, if my current ranks are:

2 3 4 5

And I get a 6, that's not an inside straight. But if I have:

2 3 5 6

And I get a 4, that is. 

For this exercise, return to me a summary of inside vs. normal straights, including the probability that if I have a straight, it's of the given type. 

### 6. Creating new variables

There are two things I very much dislike about how this dataset is currently laid out:

1. The suits are numbers; I want to see actual suit names!
2. Aces are "1", but Aces are high. And for that matter, Jack/Queen/King make more sense to me than 11/12/13. I want to change 1/11/12/13 to A/J/Q/K, respectively. 

For this task, make these changes to the dataset (creating a new dataset) by creating new variables that satisfy my criteria.
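
One possible approach is a pair of named-vector lookups. The sketch below runs on a made-up `toy` data frame with a single card. The suit order (1 = Hearts, 2 = Spades, 3 = Diamonds, 4 = Clubs) is an assumption that should be checked against the description file, and the `.Name` column names are purely illustrative:

```r
suppressMessages(library(dplyr))

# Assumed suit coding; verify against the dataset's description file
suit_lut <- c("1" = "Hearts", "2" = "Spades", "3" = "Diamonds", "4" = "Clubs")
# Only the four ranks that should change get an entry
rank_lut <- c("1" = "A", "11" = "J", "12" = "Q", "13" = "K")

# Hypothetical toy data: one card per row
toy <- data.frame(Suit.1 = c(1, 4, 2), Rank.1 = c(1, 7, 12))

toy_named <- toy %>%
  mutate(
    Suit.1.Name = unname(suit_lut[as.character(Suit.1)]),
    # Ranks missing from rank_lut look up as NA, so fall back to the number itself
    Rank.1.Name = coalesce(unname(rank_lut[as.character(Rank.1)]),
                           as.character(Rank.1))
  )

toy_named
```

For the full dataset, the same two expressions would need to be repeated (or applied in a loop) for all five suit and rank columns.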

## Sample Answers

Keep in mind that there's always more than 1 way to do things; these are just examples that work and may differ from your answers.

### 1. Starting simple

In [27]:
lut <- c("0" = "Nothing", "1" = "Pair", "2" = "Two Pair",
        "3" = "Three of a Kind", "4" = "Straight",
        "5" = "Flush", "6" = "Full House",
        "7" = "Four of a Kind", "8" = "Straight Flush",
        "9" = "Royal Flush")

poker.hands$Class.Name <- lut[as.character(poker.hands$Class)]

poker.hands %>%
mutate(p_hands = 100/n()) %>%
group_by(Class.Name) %>%
summarise(percent = sum(p_hands)) %>%
arrange(desc(percent))

Class.Name,percent
Nothing,49.95401655
Pair,42.38074293
Two Pair,4.82226398
Three of a Kind,2.05126155
Straight,0.37186613
Flush,0.21592227
Full House,0.14394818
Four of a Kind,0.02399136
Straight Flush,0.0199928
Royal Flush,0.01599424
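
The `as.character()` call in the lookup matters because a named vector can be indexed either by name or by position. A small sketch with a shortened, illustrative lookup table (`lut_demo` is not part of the exercise):

```r
# Shortened, illustrative lookup table
lut_demo <- c("0" = "Nothing", "1" = "Pair", "9" = "Royal Flush")
classes <- c(0, 9, 1)

# Character index: each value is matched against the names of lut_demo
by_name <- lut_demo[as.character(classes)]

# Numeric index: values are treated as positions, so 0 is dropped,
# 9 is out of range (NA), and 1 returns the first element
by_position <- lut_demo[classes]

unname(by_name)      # "Nothing" "Royal Flush" "Pair"
unname(by_position)  # NA "Nothing"
```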


### 2. Assessing Straight Flushes

In [28]:
poker.hands %>%
mutate(is.sf = Class == 9 | Class == 8, p_hands = 100*is.sf/n()) %>%
summarise(sum(p_hands))

sum(p_hands)
0.03598704


In [29]:
poker.hands %>% 
filter(Class == 9 | Class == 8) %>%
mutate(p_rf = 100*1/n()) %>%
group_by(Suit.1) %>%
summarise(sum = sum(p_rf))

Suit.1,sum
1,22.22222
2,22.22222
3,22.22222
4,33.33333


Looks like straight flushes occur 0.036% of the time. 22.2% of these are in each of Suits 1, 2, and 3, and 33.3% of them are in Suit 4.

### 3. Stuck on a pair

In [79]:
poker.hands %>%
filter(Rank.1 == Rank.2) %>%
mutate(is.pair = Class == 1, p_hands = 100*is.pair/n()) %>%
summarise(total_percent = sum(p_hands))

total_percent
72.06823


In [31]:
poker.hands %>%
filter(Rank.1 == Rank.2) %>%
mutate(probs = 1/n()) %>%
group_by(Class) %>%
summarise(occurrences = n(), probs = sum(probs)) %>%
arrange(probs)

Class,occurrences,probs
7,5,0.00355366
6,14,0.009950249
3,149,0.105899076
2,225,0.159914712
1,1014,0.720682303


No wonder I feel like the world has it out for me; I only get better than a pair about 28% of the time. 
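
Since `TRUE` counts as 1 and `FALSE` as 0, a percentage like this can also be computed as the mean of a logical vector. A minimal sketch on a made-up `toy` data frame (not the poker data):

```r
suppressMessages(library(dplyr))

# Hypothetical toy data: three hands start with a pair, one does not
toy <- data.frame(
  Rank.1 = c(5, 5, 9, 3),
  Rank.2 = c(5, 5, 9, 8),
  Class  = c(1, 2, 1, 0)
)

pct_pair <- toy %>%
  filter(Rank.1 == Rank.2) %>%                       # hands that start with a pair
  summarise(total_percent = 100 * mean(Class == 1))  # share that stay "just a pair"

pct_pair
```

Two of the three hands that start paired end as a plain pair, so `total_percent` is 100 * 2/3.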

### 4. Ranking Flushes

First, we'll keep only the flushes. Then we'll mutate, adding 5 columns that hold each hand's ranks from highest to lowest. Finally, we'll arrange on our 5 new columns. Note that this version leaves aces as "1", so they sort lowest; converting them from "1" to "14" beforehand would make them highest.

In [75]:
poker.hands %>%
filter(Class == 5) %>%              # keep only flushes
select(starts_with("Rank")) %>%
rowwise() %>%                       # evaluate max()/min() within each hand
# Ranks in a flush are all distinct, so setdiff() removes exactly the cards already placed
mutate(High = max(Rank.1,Rank.2,Rank.3,Rank.4,Rank.5),
       Second = max(setdiff(c(Rank.1,Rank.2,Rank.3,Rank.4,Rank.5),High)),
       Third = max(setdiff(c(Rank.1,Rank.2,Rank.3,Rank.4,Rank.5),c(High,Second))),
       Fourth = max(setdiff(c(Rank.1,Rank.2,Rank.3,Rank.4,Rank.5),c(High,Second,Third))),
       Fifth = min(Rank.1,Rank.2,Rank.3,Rank.4,Rank.5)) %>%
arrange(desc(High),desc(Second),desc(Third),desc(Fourth),desc(Fifth))

Rank.1,Rank.2,Rank.3,Rank.4,Rank.5,High,Second,Third,Fourth,Fifth
11,12,13,4,9,13,12,11,9,4
11,13,1,12,9,13,12,11,9,1
10,12,9,1,13,13,12,10,9,1
6,12,4,13,9,13,12,9,6,4
13,8,3,12,5,13,12,8,5,3
12,2,13,5,8,13,12,8,5,2
4,8,13,12,3,13,12,8,4,3
13,3,7,5,12,13,12,7,5,3
12,1,13,4,3,13,12,4,3,1
9,11,13,10,3,13,11,10,9,3


### 5. Inside Straights

In [42]:
lut <- c("0" = "Nothing", "1" = "Pair", "2" = "Two Pair",
        "3" = "Three of a Kind", "4" = "Straight",
        "5" = "Flush", "6" = "Full House",
        "7" = "Four of a Kind", "8" = "Straight Flush",
        "9" = "Royal Flush")


In [43]:
lut
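
One way the classification could go, sketched on a made-up `toy` data frame rather than the real data: a straight counts as "Inside" when the fifth card's rank falls strictly between the lowest and highest ranks in the hand, which means it filled a gap in the first four cards. The helper `is_inside()` is hypothetical. Treating classes 4, 8, and 9 as straights and counting the ace as 14 in a 10-J-Q-K-A hand are both assumptions of this sketch:

```r
suppressMessages(library(dplyr))

# Hypothetical toy straights (not taken from the dataset)
toy <- data.frame(
  Rank.1 = c(2, 2, 10, 1),
  Rank.2 = c(3, 3, 11, 2),
  Rank.3 = c(4, 5, 12, 3),
  Rank.4 = c(5, 6, 13, 5),
  Rank.5 = c(6, 4, 1, 4),
  Class  = c(4, 4, 4, 4)
)

# Hypothetical helper: TRUE when the fifth card fills a gap in the first four
is_inside <- function(r1, r2, r3, r4, r5) {
  ranks <- c(r1, r2, r3, r4, r5)
  # A straight holding both an ace and a king is 10-J-Q-K-A, so the ace is high
  if (all(c(1, 13) %in% ranks)) ranks[ranks == 1] <- 14
  last <- ranks[5]
  last > min(ranks) && last < max(ranks)
}

straight_summary <- toy %>%
  filter(Class %in% c(4, 8, 9)) %>%
  rowwise() %>%
  mutate(Type = if (is_inside(Rank.1, Rank.2, Rank.3, Rank.4, Rank.5)) "Inside" else "Normal") %>%
  ungroup() %>%
  count(Type, name = "occurrences") %>%
  mutate(prob = occurrences / sum(occurrences))

straight_summary
```

In the toy data, 2-3-5-6 followed by a 4 and A-2-3-5 followed by a 4 are inside straights, while 2-3-4-5 followed by a 6 and 10-J-Q-K followed by an ace are normal ones.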