# Programming Concepts with Python

Topics Covered:
* Learn how data is represented under the hood
* Learn about encodings
* Learn how to work with text files
* Learn how to optimize data usage

## Binary and Positional Number Systems

Bits:
![Bits](https://dq-content.s3.amazonaws.com/450/binary.png)

**Refresher: How to add Commas to Large Numbers:**

In [3]:
val = 2 ** 32
print(val)
print(f'{val:,}')  # the ',' format specifier inserts thousands separators

4294967296
4,294,967,296


### Binary Digits
**Base 10:**

In [4]:
# for number 645231:

# assign the weight of digit '2':
weight_digit_2 = 10**2
# assign the value of digit '2':
value_digit_2 = 2 * weight_digit_2

# assign the weight of digit '5':
weight_digit_5 = 10 ** 3
# assign the value of digit '5':
value_digit_5 = 5 * weight_digit_5
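The same weight-times-digit calculation can be applied to every digit at once. A minimal sketch, reusing the number 645231 from the cell above (the names `digits` and `values` are only illustrative):

```python
number = 645231
digits = str(number)

# The rightmost digit has weight 10**0, the next 10**1, and so on,
# so the digit at index i (counting from the left) has weight 10**(len - 1 - i).
values = [int(d) * 10 ** (len(digits) - 1 - i) for i, d in enumerate(digits)]

print(values)       # [600000, 40000, 5000, 200, 30, 1]
print(sum(values))  # 645231
```

Adding the digit values back together recovers the original number, which is exactly what positional notation means.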

**Base 2**
Numbers in base 2 are written in parentheses with a subscript of 2, for example **(101)<sub>2</sub>**.

In [5]:
base = 2

decimal_1 = 1*(base**0) + 1*(base**3) + 1*(base**4)
decimal_2 = 1*(base**1) + 1*(base**2) + 1*(base**3)

print(decimal_1)
print(decimal_2)

25
14
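Writing out each power of 2 by hand gets tedious for longer numbers. A sketch of a hypothetical helper, `binary_to_decimal()`, that applies the same rule (each bit times 2 raised to its position, counting from the right) to any string of bits:

```python
def binary_to_decimal(bits: str) -> int:
    """Sum bit * 2**position, where position 0 is the rightmost bit."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(binary_to_decimal('11001'))  # 25, same as decimal_1
print(binary_to_decimal('1110'))   # 14, same as decimal_2
```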


**Convert a base-2 number to base 10:**

In [7]:
num = 11010101
str_num = str(num)
print(int(str_num, 2)) # int(string, base)

213


**Convert a number to binary:**

In [6]:
print(bin(25))

0b11001
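`bin()` does the conversion for us, but the underlying method is repeated division by 2: each remainder is the next bit, starting from the least significant one. A sketch of that method as a hypothetical `to_binary()` helper, checked against the built-in `format(n, 'b')`, which gives the same bits without the `0b` prefix:

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to a string of bits by repeated division by 2."""
    if n == 0:
        return '0'
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # the remainder is the next bit, least significant first
        n //= 2
    return ''.join(reversed(bits))  # reverse so the most significant bit comes first

print(to_binary(25))    # 11001
print(format(25, 'b'))  # 11001 (built-in, without the '0b' prefix)
```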


### Differences Between Base 2 and Base 10

![Base Differences](https://dq-content.s3.amazonaws.com/450/tb1.png)

**Convert the following numbers/bases to base 10:**

In [8]:
base_8_to_10 = int('435', 8)
base_7_to_10 = int('10', 7)

print(base_8_to_10, base_7_to_10)

285 7
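`int(string, base)` converts from another base to base 10. Python has no single built-in for the reverse direction with an arbitrary base (only `bin()`, `oct()`, and `hex()`), so here is a hypothetical `to_base()` helper. It uses repeated division by the base and covers bases 2 to 36, the same range `int()` accepts, with the letters a-z standing for the digits 10-35:

```python
import string

# Digits for bases up to 36: '0'-'9' followed by 'a'-'z'
DIGITS = string.digits + string.ascii_lowercase

def to_base(n: int, base: int) -> str:
    """Convert a non-negative integer to its representation in the given base."""
    if not 2 <= base <= 36:
        raise ValueError('base must be between 2 and 36')
    if n == 0:
        return '0'
    out = []
    while n > 0:
        n, remainder = divmod(n, base)  # remainder is the next digit, least significant first
        out.append(DIGITS[remainder])
    return ''.join(reversed(out))

print(to_base(285, 8))    # 435
print(to_base(7, 7))      # 10
print(to_base(3501, 16))  # dad
```

Passing the result back through `int(result, base)` should always return the original number, which makes a convenient sanity check.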


### Hexadecimal

**Hexadecimal and Beyond**
Python has a built-in `hex()` function that converts an integer to a base-16 string prefixed with `'0x'`.
More generally, `int(string, base)` accepts any base from 2 to 36, inclusive.

In [9]:
hex_3501 = hex(3501)
decimal_F = int('F', 16)

print(hex_3501)
print(decimal_F)

0xdad
15


#### What's So Great About Hexadecimal?
* A group of 4 bits is called a 'nibble'.
    * **2<sup>4</sup>** = 16 values (0-15)
* Hexadecimal lets us represent a nibble with a single character (0-9, a-f).
* A group of 8 bits is called a [byte](https://en.wikipedia.org/wiki/Byte), so any byte can be written as exactly two hexadecimal digits.
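Splitting a byte into its two nibbles with bit shifting and masking shows the nibble/hex-digit correspondence directly. A short sketch using 213, the red value from the RGB cell:

```python
byte = 213

high_nibble = byte >> 4   # shift right 4 places: keep only the upper 4 bits
low_nibble = byte & 0x0F  # mask with 00001111: keep only the lower 4 bits

print(f'{byte:08b}')                          # 11010101
print(f'{high_nibble:04b} {low_nibble:04b}')  # 1101 0101
print(f'{high_nibble:x}{low_nibble:x}')       # d5, same as hex(213) without '0x'
```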

Hexadecimal is also common in RGB color codes: each of the red, green, and blue channels is one byte (0-255), written as two hex digits.

In [10]:
red_hex = hex(213)
green_hex = hex(111)
blue_hex = hex(56)

rgb = red_hex, green_hex, blue_hex
rgb_formatted = ''
for color in rgb:
    formatted = color.replace('0x','')
    rgb_formatted += formatted

print(rgb_formatted)

d56f38
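Stripping `'0x'` works for these three values, but `hex()` drops leading zeros, so a channel below 16 would produce one digit instead of two (for example, `hex(8)` is `'0x8'`) and the color code would come out too short. A hedged alternative uses the `02x` format spec, which always pads to two lowercase hex digits; the helper names `rgb_to_hex` and `hex_to_rgb` are hypothetical:

```python
def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Format three 0-255 channel values as a six-digit hex color code."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError('each channel must fit in one byte (0-255)')
    # '02x' = lowercase hex, zero-padded to a width of 2
    return f'{red:02x}{green:02x}{blue:02x}'

def hex_to_rgb(code: str) -> tuple:
    """Split a six-digit hex color code back into (red, green, blue)."""
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(rgb_to_hex(213, 111, 56))  # d56f38
print(rgb_to_hex(0, 8, 255))     # 0008ff
print(hex_to_rgb('d56f38'))      # (213, 111, 56)
```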


**Octal - Base 8**
The built-in `oct(integer)` converts an integer to a base-8 string prefixed with `'0o'`.

In [11]:
octal_999 = oct(999)

original = int(octal_999, 8)  # oct() already returns a str; int() accepts the '0o' prefix when base is 8

print(octal_999, original)

0o1747 999
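The prefixes that `bin()`, `oct()`, and `hex()` add (`0b`, `0o`, `0x`) are also valid in Python integer literals, and `int()` with base `0` reads the base from the prefix. A quick check using values from this section:

```python
# Integer literals written directly in binary, octal, and hexadecimal
print(0b11001, 0o1747, 0xdad)  # 25 999 3501

# With base=0, int() infers the base from the string's prefix
for text in ('0b11001', '0o1747', '0xdad'):
    print(text, '->', int(text, 0))
```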


## Encodings and Representing Text in a Computer

To represent more complex information such as text, all we need is a set of rules that translates that information into a sequence of zeros and ones. The simplest such rule is a table that explicitly lists the binary representation of each symbol we want to represent. Such a rule is called an encoding.

[ASCII Table](https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html)
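To make the idea of an encoding table concrete, here is a sketch of a tiny, entirely hypothetical encoding for a four-letter alphabet (it is not a real standard). Four symbols need only 2 bits each, and because every code has the same length, decoding is just reading 2 bits at a time:

```python
# A hypothetical 2-bit encoding table (not a real standard)
ENCODING = {'A': '00', 'C': '01', 'G': '10', 'T': '11'}
# Reverse the table to decode
DECODING = {bits: char for char, bits in ENCODING.items()}

def encode(text: str) -> str:
    return ''.join(ENCODING[char] for char in text)

def decode(bits: str) -> str:
    # Every code is exactly 2 bits long, so split the string into 2-bit chunks
    return ''.join(DECODING[bits[i:i + 2]] for i in range(0, len(bits), 2))

bits = encode('GATTACA')
print(bits)          # 10001111000100
print(decode(bits))  # GATTACA
```

ASCII follows the same principle, only with a larger table: 128 characters, each assigned a 7-bit code.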

In [12]:
data = "QUEST"

for char in data:
    ordinal = ord(char)
    binary = bin(ordinal)
    print(binary)

0b1010001
0b1010101
0b1000101
0b1010011
0b1010100
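`bin()` drops leading zeros, so each code above is only 7 bits long. ASCII's 128 characters fit in 7 bits, but text is normally stored one character per 8-bit byte, so it is often clearer to pad to 8 bits with the `08b` format spec. A sketch that also decodes the bits back into text:

```python
data = "QUEST"

# Pad each code to 8 bits (one byte) instead of using bin()'s variable width
bytes_as_bits = [format(ord(char), '08b') for char in data]
print(' '.join(bytes_as_bits))  # 01010001 01010101 01000101 01010011 01010100

# Reverse the process: parse each 8-bit string as base 2, then map back with chr()
decoded = ''.join(chr(int(bits, 2)) for bits in bytes_as_bits)
print(decoded)  # QUEST
```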


`chr()` and `ord()` are inverses: `ord()` returns the integer code of a character, and `chr()` returns the character for an integer code.

In [13]:
print(chr(65))

A


In [14]:
print(ord('A'))

65


In [1]:
chr(65) == ord('A')  # False: chr() returns a str ('A') and ord() returns an int (65)

False
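The inverse relationship appears when the two functions are composed rather than compared. A quick check over the ASCII range:

```python
import string

# ord() undoes chr() for every ASCII code 0-127
print(all(ord(chr(code)) == code for code in range(128)))  # True

# chr() undoes ord(), checked here on the ASCII letters
print(all(chr(ord(char)) == char for char in string.ascii_letters))  # True

print(chr(ord('A')), ord(chr(65)))  # A 65
```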

In [1]:
text = "The Swedish word for quest is sökande"

encoded = text.encode(encoding='ascii', errors='replace')

print(encoded)
print(type(encoded))

b'The Swedish word for quest is s?kande'
<class 'bytes'>
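`errors='replace'` swaps each character that ASCII cannot represent for `?`. A sketch comparing a few other standard error handlers of `str.encode()`, plus UTF-8, which can represent `ö` using two bytes:

```python
word = "sökande"

# The default errors='strict' raises instead of substituting
try:
    word.encode('ascii')
except UnicodeEncodeError:
    print('strict: raised UnicodeEncodeError')

print(word.encode('ascii', errors='ignore'))             # b'skande'
print(word.encode('ascii', errors='xmlcharrefreplace'))  # b's&#246;kande'

# UTF-8 can encode every character; here ö takes two bytes
utf8 = word.encode('utf-8')
print(utf8, len(word), len(utf8))    # b's\xc3\xb6kande' 7 8
print(utf8.decode('utf-8') == word)  # True
```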


### Exploring the `bytes` Class
A `bytes` object is an immutable sequence of integers, each between 0 and 255 (the range of a single byte).

In [2]:
b = 'DATA'.encode(encoding='ascii')
print(b[0]) # access the byte corresponding to D
print(b[1]) # access the byte corresponding to A
print(b[2]) # access the byte corresponding to T
print(b[3]) # access the byte corresponding to the second A

68
65
84
65


In [3]:
print(b)

b'DATA'


In [4]:
B = bytes.fromhex('ff a9 c8 44 41 54 41')

In [5]:
print(B)

b'\xff\xa9\xc8DATA'
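`print()` shows bytes in the printable ASCII range as characters and everything else as `\x..` escapes, but the underlying values are still integers from 0 to 255. A few ways to inspect the same object; this sketch recreates `B` so it runs on its own, and `bytes.hex()` with a separator argument assumes Python 3.8 or later:

```python
B = bytes.fromhex('ff a9 c8 44 41 54 41')

print(list(B))                # [255, 169, 200, 68, 65, 84, 65]
print(B.hex(' '))             # ff a9 c8 44 41 54 41
print(B[3:].decode('ascii'))  # DATA: the last four bytes are valid ASCII

# bytes objects are immutable, so item assignment fails
try:
    B[0] = 0
except TypeError:
    print('bytes objects cannot be modified in place')
```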


In [6]:
# Check whether a character is an ASCII lowercase letter ('a' to 'z' are codes 97-122):
def is_lowercase(c):
    return 97 <= ord(c) <= 122

In [8]:
is_lowercase('f')

True
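The range 97-122 works because ASCII stores `a`-`z` contiguously, and the uppercase letters `A`-`Z` sit exactly 32 positions earlier (65-90). A sketch of a hypothetical `ascii_upper()` helper that relies on that offset; Python's own `str.upper()` already does this (and also handles non-ASCII letters), so this is only for illustration:

```python
def ascii_upper(text: str) -> str:
    """Uppercase only the ASCII letters a-z by subtracting 32 from their codes."""
    result = []
    for c in text:
        if 97 <= ord(c) <= 122:  # 'a'..'z'
            result.append(chr(ord(c) - 32))
        else:
            result.append(c)     # leave everything else unchanged
    return ''.join(result)

print(ord('a') - ord('A'))       # 32
print(ascii_upper('quest 42!'))  # QUEST 42!
```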

In [9]:
# provided inputs
string_1 = 'lowercase'
string_2 = 'UPPERCASE'

# ASCII codes 65-90 (inclusive) are the uppercase letters 'A' to 'Z'
def check_uppercase(string):
    for c in string:
        # if a non-uppercase character is found, return False
        if not 65 <= ord(c) <= 90:
            return False
    return True

check_uppercase('AA')

True
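The loop returns `False` as soon as it finds a character outside 65-90. The same check can be written more compactly with `all()`, which also stops at the first failing character. Here it is applied to the provided `string_1` and `string_2` (redefined so the sketch runs standalone):

```python
string_1 = 'lowercase'
string_2 = 'UPPERCASE'

def check_uppercase(string: str) -> bool:
    # True only if every character is an ASCII uppercase letter (codes 65-90)
    return all(65 <= ord(c) <= 90 for c in string)

print(check_uppercase(string_1))  # False
print(check_uppercase(string_2))  # True
```

Like the loop version, this returns `True` for an empty string, since there is no character that fails the check.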

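Two bytes (16 bits) can represent 2<sup>16</sup> different values, far more than the 256 values of a single byte:

In [11]: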
val = 2 ** 16
print(f'{val:,}')

65,536
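Each extra byte multiplies the number of distinct values by 256. A short loop showing how many values 1 to 4 bytes can hold, which is why one byte is not enough for writing systems with thousands of characters:

```python
for n_bytes in range(1, 5):
    n_values = 2 ** (8 * n_bytes)  # 8 bits per byte
    print(f'{n_bytes} byte(s): {n_values:,} values')
```

The last line printed, 4,294,967,296, matches the 2<sup>32</sup> computed at the start of the notebook.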


Characters outside ASCII have much larger codes:

In [12]:
print(ord('你'))

20320
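The number `ord()` returns is the character's Unicode code point, conventionally written in hexadecimal as `U+XXXX`. A sketch showing the code point in that form and how many bytes the character takes in UTF-8:

```python
char = '你'

code_point = ord(char)
print(code_point)             # 20320
print(f'U+{code_point:04X}')  # U+4F60

# 20320 is far above 255, so this character cannot fit in a single byte
utf8 = char.encode('utf-8')
print(utf8, len(utf8))        # b'\xe4\xbd\xa0' 3
```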


### BIG5 Encoding (2 Bytes per Chinese Character)
BIG5 is a double-byte encoding for traditional Chinese characters: each Chinese character is encoded as two bytes, while ASCII characters such as `?` keep their one-byte ASCII codes. That is why the four-character string below encodes to 7 bytes.

In [13]:
trad_chinese = "你好嗎?"

encoded = trad_chinese.encode(encoding='BIG5')
print(encoded)

print(len(encoded))

b'\xa7A\xa6n\xb6\xdc?'
7
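To see where the 7 bytes come from, here is a sketch that checks the per-character byte counts, decodes the bytes back, and compares against UTF-8, where each of these Chinese characters takes three bytes:

```python
trad_chinese = "你好嗎?"

# Bytes used by each character in BIG5
print([len(char.encode('big5')) for char in trad_chinese])  # [2, 2, 2, 1]

# Decoding with the same encoding restores the original string
encoded = trad_chinese.encode('big5')
print(encoded.decode('big5') == trad_chinese)  # True

# The same text in UTF-8 is longer: 3 bytes per Chinese character here
print(len(trad_chinese.encode('utf-8')))  # 10
```

Decoding only works with the encoding that produced the bytes; the BIG5 bytes are not valid UTF-8 for the same text.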
