# Builtin Python Data Structures

## Introduction

As a data scientist, you need a thorough understanding of the basic Python data structures and of how numbers are represented in the computer. We will build on this foundation when we discuss the NumPy and pandas libraries.

The foundational Python data model is documented on the [docs.python.org](https://docs.python.org/3.8/reference/datamodel.html) site and is your go-to reference for Python data. However, the docs can be confusing at times, so we will go through each data structure with explanations and examples. Our goal is to ensure that everybody has a solid foundation on which to build.

<a class="anchor" id="0"></a>
## Table of Contents
1. [Notebooks](#1)
1. Fundamental Python Data Structures
    1. Introduction
    1. [Numbers](#3)
        1. Integers
        1. Boolean
        1. Real
        1. Complex
    1. [Sequences](#4)
        1. Strings
        1. Tuples
        1. Bytes
        1. Lists
        1. Byte Arrays
    1. [Sets](#5)
        1. Sets
        1. Frozen Sets
    1. [Mappings](#6)
        1. Dictionaries
    1. [Special Types](#7)
        1. None
        1. NotImplemented
        1. Ellipsis

---
<a class="anchor" id="1"></a>
## Notebooks
[Back to Table of Contents](#0)

JupyterLab is based on Jupyter Notebooks.  Each Notebook is composed of one or more **cells** and is associated with a **kernel**. 

### Kernels
Kernels are the execution engines which execute programming languages such as **Ju**lia, **Py**thon or **R** (hence Jupyter). Today many other languages are also supported by Jupyter.  We will only be dealing with Python kernels in this course. The Python kernel is based on IPython, which is the modern Python REPL (Read-Evaluate-Print Loop) interpreter.  You can run IPython from a shell command line if you like and get much of the same functionality as a Notebook with a Python kernel.

### Cells
#### Each cell has a "type":
* **Code** - The cell contains code for the kernel that you are running.  The cell "understands" your code. 
* **Markdown** - The cell contains documentation formatted with *Markdown*.
* **Raw** - No formatting or interpretation of the content of a raw cell is executed.  It's left as is.

#### The color of the box around the cell is important:
* **Blue** - If a cell is bordered by a blue box, then you are in **editing** mode for that cell.
* **Gray** - If a cell is bordered by a gray box, then you are in **command** mode for that cell.

#### Keyboard shortcuts

* **ESC** - The esc key switches a cell from Edit Mode (blue box) to Command Mode (gray box)
* **Enter** - The enter key switches a cell from Command Mode to Edit Mode

##### Editing Mode
* **TAB** - brings up a completion menu for the object you are working with
* **Shift-TAB** - brings up helpful information on the nearest object
* **?obj** - shows help for obj (its docstring)
* **??obj** - shows the source code of obj
* **dir(obj)** - lists an object's attributes and methods


##### Command Mode
* **DD** - delete the cell
* **X** - cut the cell
* **C** - copy the cell
* **V** - paste the cell below
* **A** - insert a new cell above this one
* **B** - insert a new cell below this one

### Getting Help
JupyterLab, and to some extent Jupyter Notebook, has help on almost any topic you may need under the Help menu.

---


### Introduction
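* Everything in Python is an object.
* Every object has:
    * _Identity_ (in CPython, its memory address; it never changes once the object is created)
    * _Type_ (determines what the object can do and which values it can hold)
    * _Value_ (the data the object currently holds)
* Objects are accessed by reference: a variable name is a pointer to the object, and the name is the thing you see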
* Information about Python objects is at your fingertips:
    * type(obj)
    * dir(obj)
    * help(obj)
    * obj Shift-TAB
    * obj. TAB
    * id(obj)

In [2]:
# The variable "a" points to the float object whose value is 3.14
a = 3.14
type(a)

float

In [4]:
# The address of the object pointed to by "a"
id(a)

140510598477680

In [5]:
# The variable "b" now points to the same object as "a"
b = a
id(b)

140510598477680

In [6]:
a is b

True

In [None]:
# Point variable "a" to a different object
a = 6.28 
id(a)

In [None]:
a is b
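
The cells above compare identities with `is`. As a small sketch of how identity differs from equality (the names `x`, `y`, and `z` are illustrative and not part of the original notebook), `is` asks whether two names refer to the same object, while `==` asks whether two objects have the same value:

```python
# Two separately created list objects that hold the same value
x = [1, 2, 3]
y = [1, 2, 3]
z = x  # z is a second name for the object that x refers to

same_value = (x == y)    # compares values
same_object = (x is y)   # compares identities
alias = (x is z)         # both names refer to one object

print(f"x == y: {same_value}, x is y: {same_object}, x is z: {alias}")
```

In most code you want `==`; reserve `is` for identity checks such as `obj is None`.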

---
<a class="anchor" id="3"></a>
## Numbers
[Back to Table of Contents](#0)

#### Integers
* There are two types of integers:
    * int - regular integers (0, 1, 2, ...)
    * bool - __True__ or __False__
* Integers are immutable
* Range is limited only by the machine's available memory

In [None]:
x = 2
print(f"The value of x is {x} and type is {type(x)}")
b = True
print(f"The value of b is {b} and type is {type(b)}")
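
The two claims above can be checked directly. This is a minimal sketch (the names `big` and `total` are illustrative): Python integers grow as needed rather than overflowing at 32 or 64 bits, and `bool` is a subclass of `int`, so `True` and `False` behave like 1 and 0 in arithmetic.

```python
# Integers have arbitrary precision; there is no fixed-width overflow
big = 2 ** 100
print(f"2**100 = {big}, which needs {big.bit_length()} bits")

# bool is a subclass of int
print(f"issubclass(bool, int) is {issubclass(bool, int)}")

# True and False take part in arithmetic as 1 and 0
total = True + True
print(f"True + True = {total}")
```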

#### Real
* float (3.14159)
* Reals are immutable
* [Double-precision 64-bit](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)

In [None]:
y = 9.123456789123456789123456789e300
print(f"The value of y is {y} and type is {type(y)}")
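
Because a float is a 64-bit binary value, many decimal fractions cannot be stored exactly. The sketch below illustrates one common consequence using only the standard library (`math` and `sys` are not used elsewhere in this notebook, and the name `total` is illustrative):

```python
import math
import sys

total = 0.1 + 0.2

# An exact comparison fails because 0.1 and 0.2 are stored as binary approximations
print(f"0.1 + 0.2 == 0.3 is {total == 0.3}")

# Comparing within a tolerance is usually what you want
print(f"math.isclose(0.1 + 0.2, 0.3) is {math.isclose(total, 0.3)}")

# The exact stored value of 0.1, shown in hexadecimal
print(f"(0.1).hex() is {(0.1).hex()}")

# Limits of the float type on this machine
print(f"largest float: {sys.float_info.max}")
print(f"machine epsilon: {sys.float_info.epsilon}")
```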

#### Complex
* complex - a number with a real and an imaginary part, where i^2 = -1; Python writes the imaginary unit as `j`, so 2+3i is written `2+3j`
* Complex numbers are immutable
* A pair of double precision floating point numbers 

In [None]:
z = complex(-4, 9)
print(f'The value of z is {z} and type is {type(z)}')
print(f'Its real part is {z.real} and its imaginary part is {z.imag}')

In [None]:
z**0.5

In [None]:
(z**0.5)**2
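
Since a complex number is a pair of floats, squaring a square root generally gives a value very close to the original rather than exactly equal to it. A short sketch using the standard library `cmath` module (not used elsewhere in this notebook; the name `root` is illustrative):

```python
import cmath

z = complex(-4, 9)

# cmath.sqrt returns the principal square root of a complex number
root = cmath.sqrt(z)
print(f"sqrt(z) = {root}")

# Squaring the root recovers z only up to floating-point rounding
print(f"root**2 = {root ** 2}")
print(f"close to z? {cmath.isclose(root ** 2, z)}")

# Other built-in operations on complex numbers
print(f"magnitude of z: {abs(z)}")
print(f"conjugate of z: {z.conjugate()}")
```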

---
<a class="anchor" id="4"></a>
## Sequences
[Back to Table of Contents](#0)

### Strings
* Immutable
* Unicode code points in the range U+0000 - U+10FFFF
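
As a small sketch of both properties (the variable names are illustrative), `ord()` and `chr()` convert between a character and its code point, and an attempt to change a string in place raises a `TypeError`:

```python
s = 'Hello World'

# Each character corresponds to a Unicode code point
print(f"ord('H') = {ord('H')}, chr(72) = {chr(72)!r}")
print(f"ord('€') = {ord('€')} = {hex(ord('€'))}")

# Strings cannot be modified in place
try:
    s[0] = 'J'
except TypeError as err:
    print(f"TypeError: {err}")

# Build a new string instead; the original is untouched
t = 'J' + s[1:]
print(f"s = {s!r}, t = {t!r}")
```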

In [None]:
s = 'Hello World'
print(f"""
The value  of s is        : "{s}"
The type   of s is        : {type(s)}
The length of s is        : {len(s)}
The value of s in caps is : {s.upper()}
""")

### Tuples
* Immutable
* Sequence of arbitrary Python objects
* Special Cases
    * Empty Tuple
    * Singleton Tuple, only 1 object

In [None]:
# empty tuple
empty = ()
print(f'The value of empty is {empty} and type is {type(empty)}')

In [None]:
# tuple with one object
singleton = ('a',)
print(f'The value of singleton is "{singleton}" and type is "{type(singleton)}"')

In [None]:
# a string with parens, not a tuple
not_a_tuple = ('a')
print(f'The value of not_a_tuple is "{not_a_tuple}" and type is "{type(not_a_tuple)}"')

In [None]:
# a tuple of an integer and two floats
t1 = (3, 0.1, 0.04)
print(t1)

In [None]:
# operations on a tuple
tup_sum = sum(t1)
tup_sum

In [None]:
# can't always operate on a tuple - sum() raises a TypeError since you can't add numbers and strings
t1 = (3, 0.1, 0.04, "pi")
tup_sum = sum(t1)

In [None]:
# parentheses are optional

t2 = 'a third', 1/3
t2

In [None]:
# parentheses are optional unless needed for clarity
# e.g. a tuple of two tuples and a string

t3 = (1, 2, 3), (4, 5, 6), "Hello"
t3

In [None]:
# t3 and t4 are equivalent

t4 = ((1, 2, 3), (4, 5, 6), "Hello")
t3 == t4

In [None]:
# t5 is not the same as t3 and t4

t5 = (1, 2, 3, 4, 5, 6, "Hello")
t5 == t4

In [None]:
# a tuple that contains a list - which is mutable
t6 = ('abc', 'def', ['h','i','j'])
t6

In [None]:
# you can modify a mutable object inside a tuple
t6[2][1] = 'k'
t6
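
The cell above changes the list that the tuple refers to, not the tuple itself. As a sketch of the distinction (the names `first`, `second`, and `letters` are illustrative), rebinding one of the tuple's own slots fails, while unpacking gives you names for the same underlying objects:

```python
t6 = ('abc', 'def', ['h', 'i', 'j'])

# The tuple's slots cannot be reassigned
try:
    t6[0] = 'xyz'
except TypeError as err:
    print(f"TypeError: {err}")

# Unpacking binds one name per item
first, second, letters = t6
print(first, second, letters)

# letters is the very list stored in the tuple, so changes show up in both
letters.append('k')
print(t6)
```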

### Bytes
* Immutable
* Sequence of bytes
* Each item is an 8-bit byte, an integer in the range 0 <= x < 256

In [None]:
a = b'abc'
b = bytes('def', "ascii")
print(f'The value of a is "{a}" and type is {type(a)}')
print(f'The value of b is "{b}" and type is {type(b)}')
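
A short sketch of how bytes relate to strings (the name `data` and the sample text are illustrative): encoding a string produces a bytes object, indexing it yields integers rather than characters, and decoding turns it back into a string.

```python
# Encode a string that contains a non-ASCII character
data = 'café'.encode('utf-8')
print(f"data = {data}, length = {len(data)}")

# Indexing a bytes object returns an integer, not a one-character string
print(f"data[0] = {data[0]}")

# Every item is an integer between 0 and 255
print(list(data))

# Decoding with the same encoding recovers the original text
print(data.decode('utf-8'))
```

Note that the length in bytes can differ from the length of the original string, because some characters need more than one byte in UTF-8.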

### Lists
The items of a list are arbitrary Python objects. Lists are formed by placing a comma-separated list of expressions in square brackets. (Note that there are no special cases needed to form lists of length 0 or 1.)

In [None]:
l = [1, 2, 3]
print(l)
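
Unlike tuples, lists are mutable. The sketch below (the names `alias` and `copy` are illustrative) shows in-place changes, and why two names bound to the same list see the same changes while a copy does not:

```python
l = [1, 2, 3]
alias = l          # a second name for the same list object
copy = list(l)     # a new list with the same items

# Modify the list in place
l.append(4)
l[0] = 'one'

print(f"l     = {l}")
print(f"alias = {alias}  (same object: {alias is l})")
print(f"copy  = {copy}  (same object: {copy is l})")
```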

### Byte Arrays
A bytearray object is a mutable array of bytes. Byte arrays are created by the built-in bytearray() constructor. Aside from being mutable (and hence unhashable), byte arrays otherwise provide the same interface and functionality as immutable bytes objects.

In [None]:
b = bytearray(b'abc')
print(f'The value of b is "{b}" and type is {type(b)}')
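
As a sketch of what "mutable and hence unhashable" means in practice (the names `buf` and `frozen` are illustrative), a bytearray can be changed in place, cannot be hashed, and can be converted to an immutable bytes object when needed:

```python
buf = bytearray(b'abc')

# Items are integers, so assign an integer code
buf[0] = ord('x')
buf.extend(b'de')
print(buf)

# Mutable objects such as bytearray cannot be hashed
try:
    hash(buf)
except TypeError as err:
    print(f"TypeError: {err}")

# Convert to an immutable, hashable bytes object
frozen = bytes(buf)
print(frozen, hash(frozen) == hash(bytes(buf)))
```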

---
<a class="anchor" id="5"></a>
## Sets
[Back to Table of Contents](#0)

### Properties
* Set of immutable objects, which are...
    * Unordered
    * Finite 
    * Unique
* Sets do not have indices 
* Sets can be iterated over

#### Common uses:
* fast membership testing
* removing duplicates from a sequence
* computing mathematical operations such as intersection, union, difference, and symmetric difference.

#### Two ways to create sets

In [None]:
# give a list to the set() function
a = set([2, 4, 6, 2, 3, 6, 6, 6, 4])
a

In [None]:
# set literal with curly braces
b = {2, 2, 40, 60, 60, 70}
b

#### Set Operations

In [None]:
# Union
a | b

In [None]:
# Intersection
a & b

In [None]:
# Difference (elements in a that are not in b)
a - b

In [None]:
# Symmetric Difference (elements in either a or b but not both)
a ^ b
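
The other common uses listed earlier, removing duplicates and membership testing, can be sketched as follows (the sample data and the names `names` and `unique` are illustrative). The last step shows that a set's items must be hashable, so a list cannot be added:

```python
names = ['ann', 'bob', 'ann', 'cy', 'bob']

# Building a set from a sequence drops the duplicates
unique = set(names)
print(unique)

# Membership testing
print('bob' in unique, 'dan' in unique)

# The set itself is mutable
unique.add('dan')
unique.discard('ann')
print(unique)

# Items must be hashable; a list is not
try:
    unique.add(['a', 'list'])
except TypeError as err:
    print(f"TypeError: {err}")
```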

### Frozen Sets
* These represent an immutable set. They are created by the built-in frozenset() constructor. As a frozenset is immutable and hashable, it can be used again as an element of another set, or as a dictionary key.
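
A minimal sketch of those two uses (the names `fs`, `nested`, and `lookup` are illustrative):

```python
fs = frozenset([1, 2, 3])

# A frozenset can be an element of another set
nested = {fs, frozenset([4, 5])}
print(nested)

# A frozenset can be a dictionary key; item order does not matter
lookup = {fs: 'first group'}
print(lookup[frozenset([3, 2, 1])])

# There are no mutating methods such as add()
try:
    fs.add(4)
except AttributeError as err:
    print(f"AttributeError: {err}")
```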

---
<a class="anchor" id="6"></a>
## Mappings
[Back to Table of Contents](#0)

* These represent finite sets of objects indexed by arbitrary index sets. The subscript notation a[k] selects the item indexed by k from the mapping a; this can be used in expressions and as the target of assignments or del statements. The built-in function len() returns the number of items in a mapping.

#### Dictionaries
* finite sets of objects indexed by arbitrary keys
* keys must be hashable, which in practice means immutable (a key's hash value must remain constant)
* insertion order is preserved (guaranteed since Python 3.7)

In [None]:
dictionary = {'a': 1, 'b': 2, 'c':3}
dictionary
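
The mapping operations described above (subscripting, assignment, `del`, and `len()`) can be sketched like this. The name `d` is illustrative:

```python
d = {'a': 1, 'b': 2, 'c': 3}

# Select an item by key
print(d['b'])

# Subscripts can be the target of assignment and del
d['d'] = 4
del d['a']

# len() returns the number of items
print(len(d))

# Iteration follows insertion order
print(list(d))

# get() returns a default instead of raising KeyError for a missing key
print(d.get('z', 0))

# Keys must be hashable; a list is not
try:
    d[['x']] = 1
except TypeError as err:
    print(f"TypeError: {err}")
```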

# Additional Topics
* NumPy, pandas, and Matplotlib
* deep dive on DataFrames - compare with SAS datasets
* Python packaging and pypi.org, etc.



---
<a class="anchor" id="7"></a>
## Special Types
[Back to Table of Contents](#0)

####  None
* Object of type "NoneType"
* Has no other value
* There is only one None


In [None]:
a = None
print(f"a is of type {type(a)} at address {id(a)}")
b = None
print(f"b is of type {type(b)} at address {id(b)}")
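
Because there is only one `None`, the usual way to test for it is with `is`. A small sketch (the functions `no_return` and `greet` are hypothetical examples, not part of the original notebook):

```python
def no_return():
    """A function without a return statement returns None."""
    pass

result = no_return()
print(result is None)

def greet(name=None):
    """Use None as a marker for 'no argument given'."""
    if name is None:
        name = 'world'
    return f'Hello, {name}'

print(greet())
print(greet('Ada'))
```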

#### NotImplemented
* Object of type "NotImplementedType"
* Has no other value
* There is only one NotImplemented

In [None]:
m = NotImplemented
print(f"m is of type {type(m)} at address {id(m)}")
n = NotImplemented
print(f"n is of type {type(n)} at address {id(n)}")
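
`NotImplemented` is typically returned from a binary special method to tell Python "this operand type is not supported here, try something else". The sketch below uses a hypothetical `Meters` class that is not part of the original notebook:

```python
class Meters:
    """Hypothetical length type used only for illustration."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented  # let Python try the other operand
        return self.value == other.value

    def __add__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented
        return Meters(self.value + other.value)


print(Meters(1) == Meters(1))
print(Meters(1) == 1)          # neither side knows how; falls back to identity
print((Meters(2) + Meters(3)).value)

# If both operands decline, Python raises TypeError for arithmetic
try:
    Meters(1) + 1
except TypeError as err:
    print(f"TypeError: {err}")
```

Returning `NotImplemented` (rather than raising an error directly) gives the other operand a chance to handle the operation.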

#### Ellipsis
* Object of type "ellipsis"
* Has no other value
* Truth value is True
* There is only one Ellipsis
* Useful for slicing multi-dimensional arrays in NumPy

In [None]:
e = Ellipsis
print(f"e is of type {type(e)} at address {id(e)}")
if e:
    print("e is True")
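
The literal `...` is another spelling of `Ellipsis`. To show how it reaches an object during indexing without requiring NumPy, the sketch below uses a hypothetical `ShowIndex` class whose `__getitem__` simply returns the key it receives. NumPy arrays receive the same kind of key and interpret `...` as "all remaining dimensions".

```python
# The literal ... and the name Ellipsis refer to the same object
print(... is Ellipsis)


class ShowIndex:
    """Hypothetical helper that echoes whatever is placed inside []."""

    def __getitem__(self, key):
        return key


# The key arrives as a tuple that contains the Ellipsis object
idx = ShowIndex()[..., 0]
print(idx)
```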