# Day 3: Binary Diagnostic

## Part 1:

The submarine has been making some odd creaking noises, so you ask it to produce a diagnostic report just in case.

The diagnostic report (your puzzle input) consists of a list of binary numbers which, when decoded properly, can tell you many useful things about the conditions of the submarine. The first parameter to check is the **power consumption**.

You need to use the binary numbers in the diagnostic report to generate two new binary numbers (called the **gamma rate** and the **epsilon rate**). The power consumption can then be found by multiplying the gamma rate by the epsilon rate.

Each bit in the gamma rate can be determined by finding the **most common bit in the corresponding position** of all numbers in the diagnostic report. For example, given the following diagnostic report:
```
00100
11110
10110
10111
10101
01111
00111
11100
10000
11001
00010
01010
```

Considering only the first bit of each number, there are five 0 bits and seven 1 bits. Since the most common bit is 1, the first bit of the gamma rate is 1.

The most common second bit of the numbers in the diagnostic report is 0, so the second bit of the gamma rate is 0.

The most common value of the third, fourth, and fifth bits are 1, 1, and 0, respectively, and so the final three bits of the gamma rate are 110.

So, the gamma rate is the binary number `10110`, or `22` in decimal.

The epsilon rate is calculated in a similar way; rather than use the most common bit, the least common bit from each position is used. So, the epsilon rate is 01001, or 9 in decimal. Multiplying the gamma rate (22) by the epsilon rate (9) produces the power consumption, 198.

Use the binary numbers in your diagnostic report to calculate the gamma rate and epsilon rate, then multiply them together. **What is the power consumption of the submarine?** (Be sure to represent your answer in decimal, not binary.)
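As a quick sanity check on the rule above, here is a minimal sketch of "most common bit per position" using only the standard library. The name `most_common_bits` is hypothetical, and treating a tie as `0` is an assumption, since the puzzle text for this part does not say what happens on a tie.

```python
from collections import Counter

def most_common_bits(report):
    """Return the most common bit in each position as a binary string."""
    bits = []
    # zip(*report) transposes the report: one tuple of characters per position
    for column in zip(*report):
        counts = Counter(column)
        # Assumption: a tie falls back to '0'
        bits.append('1' if counts['1'] > counts['0'] else '0')
    return ''.join(bits)

example = ['00100', '11110', '10110', '10111', '10101', '01111',
           '00111', '11100', '10000', '11001', '00010', '01010']
print(most_common_bits(example))  # 10110
```

Transposing with `zip` avoids tracking an index by hand, at the cost of building one `Counter` per position.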

In [3]:
# First on the test input
with open('./test_input_3.txt') as f:
    lines = f.readlines()
test_diag = [line.strip() for line in lines]
test_diag

['00100',
 '11110',
 '10110',
 '10111',
 '10101',
 '01111',
 '00111',
 '11100',
 '10000',
 '11001',
 '00010',
 '01010']

In [30]:
bits = [0] * len(test_diag[0])
total = len(test_diag)

In [31]:
for num in test_diag:
    for i, bit in enumerate(num):
        bits[i] += int(bit)

In [32]:
bits

[7, 5, 8, 7, 5]

In [41]:
final = ''
for bit in bits:
    final += '1' if bit>total/2 else '0'
final = int(final)
final

10110

In [49]:
11111-final

1001

In [50]:
int('1001', 2)

9

In [129]:
def solution(report):
    bits = [0] * len(report[0])
    total = len(report)

    for num in report:
        for i, bit in enumerate(num):
            bits[i] += int(bit)
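    # Build gamma bit by bit: '1' where 1s are the strict majority
    final = ''
    for bit in bits:
        final += '1' if bit > total/2 else '0'
    # Flip every bit by subtracting from all 1s; reading the strings as
    # decimal is safe here because no digit ever needs a borrow
    inverse = '1'*len(final)

    return int(final, 2) * int(str(int(inverse)-int(final)), 2)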

In [130]:
solution(test_diag)

198

Ugh, that's pretty gross. There's got to be a better way... I feel like I'm missing something about binary numbers and bitwise operations.
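One possible bitwise take on the inversion step: since the epsilon rate is just the gamma rate with every bit flipped, XOR against an all-ones mask of the same width should do it. This is a sketch, not necessarily the cleanest way; `flip_bits` is a hypothetical helper, and it assumes no position is tied (otherwise "least common" is not simply the opposite of "most common").

```python
def flip_bits(value, width):
    """Flip the lowest `width` bits of a non-negative integer."""
    mask = (1 << width) - 1  # e.g. width=5 gives 0b11111
    return value ^ mask      # XOR with 1 flips a bit, XOR with 0 keeps it

gamma = int('10110', 2)        # 22
epsilon = flip_bits(gamma, 5)  # 0b01001
print(gamma * epsilon)         # 198
```

The mask matters because Python's `~` operator on its own returns a negative number (`~22` is `-23`), so the result has to be limited to the report's bit width.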

In [131]:
# Now on the full input
with open('./input_3.txt') as f:
    lines = f.readlines()
diag = [line.strip() for line in lines]
diag

['010100010111',
 '100100100110',
 '100110111001',
 '011001011011',
 '010000110111',
 '000011101001',
 '011000011101',
 '101111011111',
 '011001011010',
 '111100001001',
 '111111000110',
 '100010100110',
 '011100100100',
 '011111010000',
 '111010001100',
 '010111001110',
 '100010100100',
 '101000010000',
 '011101110100',
 '100010011000',
 '001111110011',
 '111001100001',
 '010000011001',
 '000011101010',
 '100010101100',
 '111011100010',
 '110000100001',
 '101010110001',
 '111101110101',
 '001010010100',
 '001001111001',
 '100001110010',
 '100100000111',
 '000101010101',
 '001101111011',
 '111100011000',
 '100111110101',
 '010101111000',
 '100110011001',
 '010001111010',
 '010111111001',
 '111000100010',
 '011000011011',
 '100010111111',
 '010110110010',
 '010100000001',
 '100011000100',
 '100000010001',
 '101010001000',
 '100111011001',
 '101011001010',
 '010110110111',
 '111011000100',
 '110010111110',
 '010101000111',
 '000101111101',
 '101101101101',
 '010100010001',
 '000111101111',
 ...]

In [132]:
solution(diag)

4139586

I'm realizing now that instead of counting each bit in each position, it's likely possible to just use the *average* value of that position. If the average is greater than 0.5, then there are more 1s than 0s! (Reminds me of the sigmoid function in a logistic regression...)

It sounds like this would be easier to solve with either a pandas DataFrame or just a NumPy array, with each number making up a row and each bit in a column. But how to get the numbers into that format?

In [133]:
import pandas as pd
import numpy as np

Is there a way to do this with numpy?

In [151]:
x = np.array(test_df)
x

array([['0', '0', '1', '0', '0'],
       ['1', '1', '1', '1', '0'],
       ['1', '0', '1', '1', '0'],
       ['1', '0', '1', '1', '1'],
       ['1', '0', '1', '0', '1'],
       ['0', '1', '1', '1', '1'],
       ['0', '0', '1', '1', '1'],
       ['1', '1', '1', '0', '0'],
       ['1', '0', '0', '0', '0'],
       ['1', '1', '0', '0', '1'],
       ['0', '0', '0', '1', '0'],
       ['0', '1', '0', '1', '0']], dtype=object)

But now that I have this, I'm actually not sure how to, for instance, access the mean of the first element of each row...

Pandas it is!

In [148]:
test_df = pd.DataFrame.from_records(test_diag)
test_df

    0  1  2  3  4
0   0  0  1  0  0
1   1  1  1  1  0
2   1  0  1  1  0
3   1  0  1  1  1
4   1  0  1  0  1
5   0  1  1  1  1
6   0  0  1  1  1
7   1  1  1  0  0
8   1  0  0  0  0
9   1  1  0  0  1
10  0  0  0  1  0
11  0  1  0  1  0


Omg that was so much easier than I thought!

In [136]:
test_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       12 non-null     object
 1   1       12 non-null     object
 2   2       12 non-null     object
 3   3       12 non-null     object
 4   4       12 non-null     object
dtypes: object(5)
memory usage: 608.0+ bytes


In [137]:
test_df = test_df.astype(int)
test_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       12 non-null     int64
 1   1       12 non-null     int64
 2   2       12 non-null     int64
 3   3       12 non-null     int64
 4   4       12 non-null     int64
dtypes: int64(5)
memory usage: 608.0 bytes


In [138]:
test_df.describe().loc['mean',:]

0    0.583333
1    0.416667
2    0.666667
3    0.583333
4    0.416667
Name: mean, dtype: float64

Now that I have the mean of each column, all I have to do is map each one to a bit: `1` if the mean is above 0.5, otherwise `0`.

In [139]:
def pandas_solution(report):
    df = pd.DataFrame.from_records(report).astype(int)
    means = df.describe().loc['mean',:]
    
    final = ''.join([str(int(mean>0.5)) for mean in means])
    inverse = int('1'*len(final)) - int(final)
    
    return int(final, 2) * int(str(inverse), 2)

In [140]:
pandas_solution(test_diag)

198

It does take about 5 times as long (15ms vs 3ms), but it does seem more elegant than counting with nested `for` loops.

In [141]:
pandas_solution(diag)

4139586

Again, about 5 times as long (29ms vs 6ms). Interesting.
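Coming back to the NumPy question from earlier: the per-position mean seems to be reachable with the `axis` argument, as long as the array holds integers rather than strings. A minimal sketch, assuming the report is a list of equal-length bit strings; `column_means` and `gamma_bits` are hypothetical names, not part of the solution above.

```python
import numpy as np

def column_means(report):
    """Mean of each bit position across all numbers in the report."""
    # One row per number, one integer column per bit
    grid = np.array([[int(ch) for ch in line] for line in report])
    # axis=0 averages down the rows, giving one value per column
    return grid.mean(axis=0)

def gamma_bits(report):
    """Binary string with a '1' wherever the column mean is above 0.5."""
    return ''.join('1' if mean > 0.5 else '0' for mean in column_means(report))
```

For the test report, `gamma_bits(test_diag)` should give `'10110'`, matching the counting version. The mean of just the first position would be `column_means(test_diag)[0]`.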

## Part 2:

Next, you should verify the life support rating, which can be determined by multiplying the oxygen generator rating by the CO2 scrubber rating.

Both the oxygen generator rating and the CO2 scrubber rating are values that can be found in your diagnostic report - finding them is the tricky part. Both values are located using a similar process that involves filtering out values until only one remains. Before searching for either rating value, start with the full list of binary numbers from your diagnostic report and consider just the first bit of those numbers. Then:

 - Keep only numbers selected by the bit criteria for the type of rating value for which you are searching. Discard numbers which do not match the bit criteria.
 - If you only have one number left, stop; this is the rating value for which you are searching.
 - Otherwise, repeat the process, considering the next bit to the right.

The bit criteria depends on which type of rating value you want to find:

 - To find oxygen generator rating, determine the most common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 1 in the position being considered.
 - To find CO2 scrubber rating, determine the least common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 0 in the position being considered.

For example, to determine the oxygen generator rating value using the same example diagnostic report from above:

 - Start with all 12 numbers and consider only the first bit of each number. There are more 1 bits (7) than 0 bits (5), so keep only the 7 numbers with a 1 in the first position: 11110, 10110, 10111, 10101, 11100, 10000, and 11001.
 - Then, consider the second bit of the 7 remaining numbers: there are more 0 bits (4) than 1 bits (3), so keep only the 4 numbers with a 0 in the second position: 10110, 10111, 10101, and 10000.
 - In the third position, three of the four numbers have a 1, so keep those three: 10110, 10111, and 10101.
 - In the fourth position, two of the three numbers have a 1, so keep those two: 10110 and 10111.
 - In the fifth position, there are an equal number of 0 bits and 1 bits (one each). So, to find the oxygen generator rating, keep the number with a 1 in that position: 10111.
 - As there is only one number left, stop; the oxygen generator rating is 10111, or 23 in decimal.

Then, to determine the CO2 scrubber rating value from the same example above:

 - Start again with all 12 numbers and consider only the first bit of each number. There are fewer 0 bits (5) than 1 bits (7), so keep only the 5 numbers with a 0 in the first position: 00100, 01111, 00111, 00010, and 01010.
 - Then, consider the second bit of the 5 remaining numbers: there are fewer 1 bits (2) than 0 bits (3), so keep only the 2 numbers with a 1 in the second position: 01111 and 01010.
 - In the third position, there are an equal number of 0 bits and 1 bits (one each). So, to find the CO2 scrubber rating, keep the number with a 0 in that position: 01010.
 - As there is only one number left, stop; the CO2 scrubber rating is 01010, or 10 in decimal.

Finally, to find the life support rating, multiply the oxygen generator rating (23) by the CO2 scrubber rating (10) to get 230.

Use the binary numbers in your diagnostic report to calculate the oxygen generator rating and CO2 scrubber rating, then multiply them together. **What is the life support rating of the submarine?** (Be sure to represent your answer in decimal, not binary.)

In [183]:
test_df

    0  1  2  3  4
0   0  0  1  0  0
1   1  1  1  1  0
2   1  0  1  1  0
3   1  0  1  1  1
4   1  0  1  0  1
5   0  1  1  1  1
6   0  0  1  1  1
7   1  1  1  0  0
8   1  0  0  0  0
9   1  1  0  0  1
10  0  0  0  1  0
11  0  1  0  1  0


In [210]:
''.join(test_df.iloc[[1]].values[0])

'11110'

In [279]:
o_df = test_df.copy().astype(int)
o_df

    0  1  2  3  4
0   0  0  1  0  0
1   1  1  1  1  0
2   1  0  1  1  0
3   1  0  1  1  1
4   1  0  1  0  1
5   0  1  1  1  1
6   0  0  1  1  1
7   1  1  1  0  0
8   1  0  0  0  0
9   1  1  0  0  1
10  0  0  0  1  0
11  0  1  0  1  0


In [280]:
o_df.describe()

              0         1         2         3         4
count      12.0      12.0      12.0      12.0      12.0
mean   0.583333  0.416667  0.666667  0.583333  0.416667
std    0.514929  0.514929  0.492366  0.514929  0.514929
min         0.0       0.0       0.0       0.0       0.0
25%         0.0       0.0       0.0       0.0       0.0
50%         1.0       0.0       1.0       1.0       0.0
75%         1.0       1.0       1.0       1.0       1.0
max         1.0       1.0       1.0       1.0       1.0


In [281]:
means = o_df.describe().loc['mean']
means

0    0.583333
1    0.416667
2    0.666667
3    0.583333
4    0.416667
Name: mean, dtype: float64

In [315]:
o_df[0]

0     0
1     1
2     1
3     1
4     1
5     0
6     0
7     1
8     1
9     1
10    0
11    0
Name: 0, dtype: int64

In [180]:
def solution(report):
    df = pd.DataFrame.from_records(report).astype(int)
    return oxygen_rating(df) * co2_rating(df)

In [336]:
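def oxygen_rating(df):
    o_df = df.copy().astype(int)
    
    # Walk the bit positions (columns) left to right, not the rows
    for i in range(o_df.shape[1]):
        # Bit to discard: the least common one (0 on a tie, so the 1s are kept)
        discard = int(o_df[i].mean() < 0.5)
        o_df = o_df.drop(list(o_df.loc[o_df[i]==discard].index))
        if len(o_df)==1:
            return int(''.join([str(val) for val in o_df.values[0]]), 2)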

In [337]:
oxygen_rating(test_df)

23

In [338]:
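def co2_rating(df):
    co2_df = df.copy().astype(int)
    
    # Walk the bit positions (columns) left to right, not the rows
    for i in range(co2_df.shape[1]):
        # Bit to discard: the most common one (1 on a tie, so the 0s are kept)
        discard = int(co2_df[i].mean() >= 0.5)
        co2_df = co2_df.drop(list(co2_df.loc[co2_df[i]==discard].index))
        if len(co2_df)==1:
            return int(''.join([str(val) for val in co2_df.values[0]]), 2)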

In [339]:
co2_rating(test_df)

10

In [340]:
solution(test_diag)

230

Wow, that was very frustrating and took much longer than I thought! Let's give it a shot on the full input.

In [341]:
solution(diag)

1800151

Woot!!!! I'm still super curious if there's a faster/better way to do this that involves bitwise operators though.
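For what it's worth, here is one way the filtering might look with shifts and masks instead of a DataFrame. It is only a sketch: `rating` and its `keep_most_common` flag are hypothetical names, and skipping a position where every remaining number has the same bit is my own assumption, since the puzzle text does not cover that case.

```python
def rating(report, keep_most_common):
    """Filter the report one bit position at a time until one number is left."""
    width = len(report[0])
    candidates = [int(line, 2) for line in report]
    # shift = width-1 selects the leftmost bit, shift = 0 the rightmost
    for shift in range(width - 1, -1, -1):
        if len(candidates) == 1:
            break
        ones = [n for n in candidates if (n >> shift) & 1]
        zeros = [n for n in candidates if not (n >> shift) & 1]
        # Assumption: if all candidates agree on this bit, keep them all
        if not ones or not zeros:
            continue
        if keep_most_common:
            # Oxygen generator: most common bit, ties keep the 1s
            candidates = ones if len(ones) >= len(zeros) else zeros
        else:
            # CO2 scrubber: least common bit, ties keep the 0s
            candidates = zeros if len(zeros) <= len(ones) else ones
    return candidates[0]

example = ['00100', '11110', '10110', '10111', '10101', '01111',
           '00111', '11100', '10000', '11001', '00010', '01010']
print(rating(example, True) * rating(example, False))  # 230
```

Working on integers means the final answer needs no string joining or `int(..., 2)` conversion at the end, though I have not timed it against the pandas version.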