# Binaries and Bitstrings
From [the Getting Started guide](https://elixir-lang.org/getting-started/binaries-strings-and-char-lists.html)

Strings in Elixir are UTF-8 encoded binaries.

In [1]:
string = "hello"
is_binary(string)

true

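Every character has a numeric code point. For example, `ł` has code point `322`.

A byte can store a number from `0` to `255`, so `322` does not fit in a single byte.

UTF-8 is an encoding that represents each code point as a sequence of one to four bytes. A string is a binary holding those bytes.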

In [2]:
byte_size("hełło")

7

In [3]:
String.length("hełło")

5

In [4]:
byte_size("ł")

2
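
To make the difference between bytes and code points concrete, the sketch below lists both for the same string. It relies on Erlang's `:binary.bin_to_list/1` and on `String.to_charlist/1`; the variable names are only illustrative.

```elixir
string = "hełło"

# Raw UTF-8 bytes: each "ł" takes two bytes (197, 130).
bytes = :binary.bin_to_list(string)

# Code points: each "ł" is the single integer 322.
codepoints = String.to_charlist(string)

IO.inspect(bytes, label: "bytes")
IO.inspect(codepoints, label: "code points")
IO.inspect({length(bytes), length(codepoints)}, label: "counts")
```

The two counts line up with `byte_size/1` and `String.length/1` for this string.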

Prefix a character with `?` to get its code point.

In [5]:
?ł

322

`String.codepoints/1` splits a string into its code points, each returned as a string rather than an integer.

In [6]:
String.codepoints("hełło")

["h", "e", "ł", "ł", "o"]

## Binaries (and bitstrings)
You can define a binary using `<<>>`.

In [7]:
<<0, 1, 2, 3>>

<<0, 1, 2, 3>>

In [8]:
byte_size(<<0, 1, 2, 3>>)

4

---
A sequence of bytes is not necessarily a valid string.

In [2]:
String.valid?(<<239, 191, 19>>)

false
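
`String.valid?/1` returns `false` for `<<239, 191, 19>>` because `239` opens a three-byte UTF-8 sequence, so the next two bytes must be continuation bytes in the range 128 to 191, and `19` is not. As a small sketch, swapping the last byte for `189` gives the valid encoding of U+FFFD, the replacement character.

```elixir
invalid = <<239, 191, 19>>
valid = <<239, 191, 189>>

IO.inspect(String.valid?(invalid))  # false
IO.inspect(String.valid?(valid))    # true

# An invalid string is still a perfectly good binary.
IO.inspect(is_binary(invalid))      # true
```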

---
Or, a valid string.

In [11]:
?a

97

In [10]:
String.valid?(<<97, 97, 97>>)

true

In [12]:
<<97, 97, 97>>

"aaa"

---
And this is a `charlist`, which we'll get to later.

In [14]:
[97, 97, 97]

'aaa'

In [3]:
[239, 19]

[239, 19]

---
String concatenation is really binary concatenation.

In [15]:
<<0, 1>> <> <<2, 3>>

<<0, 1, 2, 3>>

In [16]:
"Hello " <> "World"

"Hello World"

---
Concatenating the null byte `<<0>>` to a string is a common trick for revealing its underlying bytes: the result is no longer printable, so IEx shows it as a raw binary.

In [17]:
"hełło" <> <<0>>

<<104, 101, 197, 130, 197, 130, 111, 0>>
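
If appending a byte feels like a hack, `inspect/2` and `IO.inspect/2` accept a `binaries: :as_binaries` option that prints the bytes without changing the string. A minimal sketch (the variable names are illustrative):

```elixir
string = "hełło"

# Prints <<104, 101, 197, 130, 197, 130, 111>>
IO.inspect(string, binaries: :as_binaries)

# inspect/2 returns the same representation as a string.
raw = inspect(string, binaries: :as_binaries)
```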

---
Each number in `<<>>` is stored in 8 bits by default, so larger values are truncated to their lowest 8 bits.

In [21]:
<<256>>

<<0>>

In [23]:
<<258>>

<<2>>
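
Put differently, an unsized integer segment stores the value modulo 256. The following sketch checks that for a few sample values chosen for illustration:

```elixir
# Build a one-byte binary from each number and pair it with the input.
truncated =
  for n <- [255, 256, 258, 511] do
    {n, <<n>>}
  end

IO.inspect(truncated)
# [{255, <<255>>}, {256, <<0>>}, {258, <<2>>}, {511, <<255>>}]

# The stored byte equals the remainder after dividing by 256.
IO.inspect(<<258>> == <<rem(258, 256)>>)  # true
```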

---
Use 16 bits (2 bytes) to store the number.

In [18]:
<<256 :: size(16)>>

<<1, 0>>
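
The two bytes are the high and low halves of the 16-bit number, most significant byte first. A short sketch, with illustrative variable names, that takes the value apart and puts it back together:

```elixir
<<high, low>> = <<256::size(16)>>
IO.inspect({high, low})               # {1, 0}
IO.inspect(high * 256 + low)          # 256

# Reading both bytes back as one 16-bit integer.
<<n::size(16)>> = <<1, 0>>
IO.inspect(n)                         # 256

# The `little` modifier stores the least significant byte first.
IO.inspect(<<256::little-size(16)>>)  # <<0, 1>>
```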

---
With the `utf8` modifier, the number is treated as a code point and encoded as UTF-8.

In [25]:
<<256 :: utf8>>

"Ā"

In [26]:
<<322 :: utf8>>

"ł"

---
See the binary representation

In [27]:
"ł" <> <<0>>

<<197, 130, 0>>

In [28]:
<<322 :: utf8, 0>>

<<197, 130, 0>>
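
To see where `197` and `130` come from, here is a hand-rolled sketch of the two-byte UTF-8 layout (`110xxxxx 10xxxxxx`), which covers code points 128 through 2047. It is only meant to illustrate the encoding; in practice the `utf8` modifier does this for you.

```elixir
codepoint = 322

# Split the code point into its upper 5 bits and lower 6 bits.
upper = div(codepoint, 64)  # 5
lower = rem(codepoint, 64)  # 2

# Two-byte UTF-8 layout: 110xxxxx 10xxxxxx
encoded = <<0b110::size(3), upper::size(5), 0b10::size(2), lower::size(6)>>

IO.inspect(encoded == <<codepoint::utf8>>)  # true
IO.inspect(:binary.bin_to_list(encoded))    # [197, 130]
```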

In [29]:
is_bitstring(<<322>>)

true

---
Bytes have 8 bits... what if we make a list of bits?

In [30]:
<<1 :: size(1)>>

<<1::size(1)>>

---
A value that does not fit in the given size is truncated here too.

In [31]:
<<2 :: size(1)>>

<<0::size(1)>>

---
It's no longer a binary.

In [32]:
is_binary(<<1 :: size(1)>>)

false

---
But, it's still a bitstring.

In [33]:
is_bitstring(<<1 :: size(1)>>)

true

---
It holds exactly one bit.

In [34]:
bit_size(<<1 :: size(1)>>)

1

---
A binary is a bitstring whose number of bits is divisible by 8.

In [35]:
is_binary(<<1 :: size(16)>>)

true

In [36]:
is_binary(<<1 :: size(15)>>)

false
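
As a small sketch of the same rule, a 15-bit bitstring becomes a binary again once a 16th bit is appended. Note that `byte_size/1` rounds up for bitstrings that are not whole bytes.

```elixir
bits = <<1::size(15)>>
IO.inspect(bit_size(bits))    # 15
IO.inspect(byte_size(bits))   # 2, rounded up

# Appending one more bit completes the second byte.
whole = <<bits::bitstring, 0::size(1)>>
IO.inspect(is_binary(whole))  # true
IO.inspect(whole)             # <<0, 2>>
```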

## And finally, pattern matching
We can match on binaries / bitstrings.

In [38]:
<<0, 1, x>> = <<0, 1, 2>>
x

2

---
Without a modifier, each segment in the pattern matches exactly 8 bits, so the pattern needs as many segments as the binary has bytes.

In [39]:
<<0, 1, x>> = <<0, 1, 2, 3>>

MatchError: 1

---
To match on a binary of unknown size, use the `binary` modifier at the end of the pattern.

In [40]:
<<0, 1, x :: binary>> = <<0, 1, 2, 3>>
x

<<2, 3>>
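
A `binary` segment can also appear earlier in the pattern as long as it has an explicit size, and `bitstring` plays the same role when the remainder is not a whole number of bytes. A brief sketch with illustrative names:

```elixir
# Take exactly two bytes, then whatever is left.
<<header::binary-size(2), body::binary>> = <<0, 1, 2, 3>>
IO.inspect(header)          # <<0, 1>>
IO.inspect(body)            # <<2, 3>>

# Take one bit, then the remaining seven bits.
<<flag::size(1), rest::bitstring>> = <<255>>
IO.inspect(flag)            # 1
IO.inspect(bit_size(rest))  # 7
```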

---
This also works with string concatenation.

In [41]:
"he" <> rest = "hello"
rest

"llo"

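When matching on strings, a plain segment takes one byte, not one character. The sketch below, using a sample string chosen for illustration, contrasts that with the `utf8` modifier, which consumes a whole code point.

```elixir
<<first::utf8, rest::binary>> = "łódź"
IO.inspect(first)                  # 322
IO.inspect(rest)                   # "ódź"

# Matching a single byte instead splits "ł" in the middle.
<<byte, broken::binary>> = "łódź"
IO.inspect(byte)                   # 197
IO.inspect(String.valid?(broken))  # false
```
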
## More Pattern Matching
Based on the [Special Forms docs](https://hexdocs.pm/elixir/Kernel.SpecialForms.html#%3C%3C%3E%3E/1)

In [44]:
"808" <> <<0>>

<<56, 48, 56, 0>>

In [46]:
area_code = <<56::size(8), 48::size(8), 56::size(8)>>

"808"

In [8]:
defmodule Telephone do
  @area_code <<56::size(8), 48::size(8), 56::size(8)>>
  
  def parse(<<@area_code, "-", rest::binary>>) do
    "Local Number: #{rest}"
  end
  
  def parse(<<other_area_code::binary-size(3), "-", rest::binary>>) do
    "Long Distance Number: #{rest} in the #{other_area_code}"
  end
end

{:module, Telephone, <<70, 79, 82, 49, 0, 0, 6, 12, 66, 69, 65, 77, 65, 116, 85, 56, 0, 0, 0, 143, 0, 0, 0, 15, 16, 69, 108, 105, 120, 105, 114, 46, 84, 101, 108, 101, 112, 104, 111, 110, 101, 8, 95, 95, 105, 110, 102, ...>>, {:parse, 1}}
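
Two observations about this module: `<<56, 48, 56>>` is simply the string `"808"`, and `parse/1` raises a `FunctionClauseError` for input that fits neither clause. The sketch below is a hypothetical variant (`TelephoneSketch` and its return values are assumptions, not part of the original notebook) that writes the attribute as a string and adds a catch-all clause.

```elixir
defmodule TelephoneSketch do
  # Hypothetical variant of Telephone, for illustration only.
  @area_code "808"

  def parse(<<@area_code, "-", rest::binary>>), do: {:local, rest}

  def parse(<<area_code::binary-size(3), "-", rest::binary>>),
    do: {:long_distance, area_code, rest}

  # Catch-all clause: anything that does not fit the shapes above.
  def parse(_other), do: :error
end

IO.inspect(TelephoneSketch.parse("808-555-1234"))  # {:local, "555-1234"}
IO.inspect(TelephoneSketch.parse("408-555-1234"))  # {:long_distance, "408", "555-1234"}
IO.inspect(TelephoneSketch.parse("5551234"))       # :error
```

Returning tagged tuples instead of formatted strings is a design choice made here so that callers can pattern match on the result.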

In [5]:
import Telephone, only: [parse: 1]

Telephone

In [9]:
parse("808-555-1234")

"Local Number: 555-1234"

In [63]:
parse("408-555-1234")

"Long Distance Number: 555-1234 in the 408"

## For completeness: Charlists
A `charlist` is a list of integer code points. Use single quotes `'` to create one (recent Elixir versions prefer the `~c"..."` sigil and print charlists that way).

In [66]:
'hełło'

[104, 101, 322, 322, 111]

In [70]:
is_list 'hełło'

true

In [71]:
'hello'

'hello'

---
Charlists contain the code points of the characters in the single quotes.

In [76]:
List.first('world')

119

In [77]:
"world" <> <<0>>

<<119, 111, 114, 108, 100, 0>>

In [78]:
[119, 111, 114, 108, 100]

'world'

---
IEx prints a list of integers as a charlist only when every integer is a printable ASCII character; otherwise it prints the integers themselves.

In [73]:
?l

108

In [10]:
[108]

'l'

In [79]:
[108, 1]

[108, 1]
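
The check IEx applies can be approximated with `List.ascii_printable?/1`, and the `charlists: :as_lists` option of `IO.inspect/2` forces the integer view. A minimal sketch:

```elixir
IO.inspect(List.ascii_printable?([108]))     # true
IO.inspect(List.ascii_printable?([108, 1]))  # false

# Force the integer view regardless of the list's contents.
IO.inspect([108], charlists: :as_lists)      # [108]
```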

---
Convert a charlist to a string:

In [80]:
to_string('hełło')

"hełło"

And back to a charlist:

In [81]:
to_charlist "hełło"

[104, 101, 322, 322, 111]

---
Concatenate charlists with the list concatenation operator `++`.

In [82]:
'this ' ++ 'works'

'this works'

In [83]:
'this ' <> 'fails'

ArgumentError: 1

In [83]:
"this" ++ "fails too"

ArgumentError: 1
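
Both errors come from mixing types: `<>` only accepts binaries and `++` only accepts lists. One way around this, sketched here with illustrative variable names, is to convert both operands to the same type before concatenating.

```elixir
charlist = to_charlist("this ")
string = "works"

# Convert the charlist so that <> sees two binaries.
as_string = to_string(charlist) <> string

# Or convert the string so that ++ sees two lists.
as_charlist = charlist ++ to_charlist(string)

IO.inspect(as_string)                                 # "this works"
IO.inspect(as_charlist == to_charlist("this works"))  # true
```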