# Section 2 - Python Data Types

Objectives:
- Output
- Common data types in basic Python

## 2.1 Output

The **print** function is your go-to for outputting any numbers or text to the screen. It can output one thing at a time, or multiple things at once, separated by commas:

In [None]:
x = 12
print(x)
print('x squared is',x**2)
x**2

Notice that the last line output 144 to the screen without the "print" command. Whenever the last line of a notebook cell is an expression whose result isn't stored anywhere, the notebook will just display its value. I use this often with SymPy, as it allows expressions to be displayed with nice formatting, whereas print does not. See this output, for example:

In [None]:
import sympy as sp
x = sp.symbols('x')
print(x**2 / 3) # Outputs expression in plain text. Not as readable.
x**2 / 3

## 2.2 Numerical Data Types

Unlike C, Java, etc., you do NOT have to specify a data type when creating a numerical variable. Python will infer it from the input, though you don't always have to rely on that.

For more information on all data types mentioned in this notebook and much more, see https://docs.python.org/3/library/stdtypes.html

### int (Integers) and float (Floating Point Numbers)

- Basic numerical data types. They hold a single number.
- int = integer.
- float = decimal number. The largest it can hold is ~$1.8 \times 10^{308}$. Anything larger will be stored as "inf" (and anything below the negative of that as "-inf").

Integers can be written as floats by putting a decimal point at the end.

In [None]:
a = 12 # int
b = 12. # float
print(a, b)

c = 1.82*10**308 # too large
d = -1.82*10**308 # too negative
print(c,d)

To check a variable's data type (useful for SymPy), use the **type()** function:

In [None]:
print(type(a))
print(type(b))

#### Operations on int and float

- **Assignment**: = (assign right-hand side to the variable named in the left-hand side)
- **Addition**: +
- **Subtraction**: -
- **Negative**: Don't need to define a negative by "0 - x". Just put "-x"
- **Multiplication**: *
- **Standard Division**: /
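- **Floor Division**: // (returns the quotient rounded down to a whole number, ignores remainder)
- **Modulus**: % (returns the remainder left over from floor division)
- **Power**: ** (two stars, NOT the caret symbol ^, which means bitwise XOR in Python. This is different from many other languages!)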
- **Power**: pow(x,y) (same as the above, just written differently)
- **Absolute Value**: abs(x)

Arithmetic operations behave as we expect, for the most part.

For example, int + int = int, while int + float = float. Division by 0 will stop your script and give you an error message.
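Here's a quick sketch of those operations in action. The variable names and values are made up for illustration; swap in your own numbers to experiment.

```python
# Hypothetical sample values, chosen only for illustration.
a = 7      # int
b = 2      # int
c = 2.0    # float

int_sum = a + b          # int + int stays an int: 9
mixed_sum = a + c        # int + float becomes a float: 9.0
true_div = a / b         # standard division: 3.5
floor_div = a // b       # floor division: 3
remainder = a % b        # modulus: 1
power = a ** b           # power: 49
same_power = pow(a, b)   # same as a ** b: 49
magnitude = abs(-a)      # absolute value: 7

print(int_sum, mixed_sum, true_div, floor_div, remainder, power, same_power, magnitude)
```

Use `type()` on any of these results to see whether you ended up with an int or a float.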

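Some caveats:
- int / int = float, even if the result is an integer.
- inf - inf = nan ("not a number"), since the result is undefined. Subtracting a negative infinity is different: inf - (-inf) = inf. See the example below.

In [None]:
print(c - d) # inf - (-inf) = +inf, since d is negative infinity
print(c + d) # inf + (-inf) = nan (undefined)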

#### The math package

This provides a wider variety of operations to be applied to ints and floats than basic Python can do. Some examples:
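- **math.floor(x)** = x rounded down, returns an int
- **math.ceil(x)** = x rounded up, returns an int

Rounding to the nearest value is built into basic Python, so it does not use the "math." prefix:
- **round(x)** = x rounded to the nearest int (exact halves go to the nearest even int, so round(2.5) is 2)
- **round(x,n)** = x rounded to n decimal places, returns a float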
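A minimal sketch of rounding, assuming a made-up sample value x. Note that floor and ceil come from the math package, while round is built into Python and needs no "math." prefix.

```python
import math

x = 2.718  # hypothetical sample value

down = math.floor(x)      # rounded down: 2 (an int)
up = math.ceil(x)         # rounded up: 3 (an int)
nearest = round(x)        # nearest int: 3 (built in, not math.round)
two_places = round(x, 2)  # two decimal places: 2.72 (a float)

print(down, up, nearest, two_places)
```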

### Booleans

- A data type that can only hold one of two values: False (which behaves like 0) or True (which behaves like 1)
- Output from logical tests

#### Logical tests

- **Equal to**: x == y
- **Greater than**: x > y
- **Greater than or equal to**: x >= y
- **Less than**: x < y
- **Less than or equal to**: x <= y
- **Not Equal to**: x != y
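- **Has an integer value**: x.is_integer() (for a float x, returns True if x has no fractional part, so (12.0).is_integer() is True. To test whether x is stored as an int, use isinstance(x, int) instead)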
- And many more like the example above.
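Here's a short sketch of a few of these tests. The values of x and y are made up for illustration. Each test produces a boolean that you can store in a variable like any other value.

```python
x = 5      # an int (hypothetical value)
y = 8.0    # a float (hypothetical value)

equal = x == y                       # False
smaller = x < y                      # True
different = x != y                   # True
whole = y.is_integer()               # True: 8.0 has no fractional part
fractional = (2.5).is_integer()      # False
stored_as_int = isinstance(y, int)   # False: y is a float, not an int

print(equal, smaller, different, whole, fractional, stored_as_int)
```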

#### Operations on booleans

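- **or**: x or y, or x | y (bitwise OR of x and y)
- **xor**: x ^ y (bitwise EXCLUSIVE OR of x and y)
- **and**: x and y, or x & y (bitwise AND of x and y)
- **not**: not x (Do not use ~x on a plain Python boolean: the bitwise NEGATION ~True evaluates to the integer -2, not False)

Here, x and y can be booleans or logical tests that return a boolean. If you use the symbols |, ^, or & with logical tests, wrap each test in parentheses, e.g. (x > 1) & (x < 5), because these symbols are evaluated before the comparisons. The keywords **or**, **and**, and **not** don't need the parentheses.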
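A minimal sketch of combining logical tests, using a made-up value of x and made-up cutoffs. It shows the keyword forms (and, or, not) alongside the symbol forms, which need parentheses around each test.

```python
x = 21  # hypothetical value

between = x > 10 and x < 50            # True: both tests pass
odd_or_huge = x % 2 == 1 or x > 100    # True: the first test passes
outside = not between                  # False

# The symbol forms need parentheses around each logical test.
between_symbols = (x > 10) & (x < 50)  # True
exactly_one = (x > 10) ^ (x > 50)      # True: only the first test passes

print(between, odd_or_huge, outside, between_symbols, exactly_one)
```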

**Your turn**: Write a line of code that tests the statement "$x \geq 30$ or $x < 1$". The variable $x$ has been defined for you, but you may change its value to test your code.

In [None]:
x = 21



####  (Optional) Bitwise operations on binary numbers

Booleans are just 1-digit binary numbers. OR, XOR, AND, and NOT all have natural extensions to binary numbers of the same length. In addition, we have
- **Bitshift left by n places**: x << n
- **Bitshift right by n places**: x >> n

All of these can be directly applied to integer numbers. Under the hood, the computer sees them as binary numbers anyway.
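Here's a small sketch with made-up numbers. The built-in bin() function shows an integer's binary digits, which makes it easier to see what each operation did.

```python
n = 12                  # binary 1100 (hypothetical value)
m = 10                  # binary 1010 (hypothetical value)

left = n << 2           # 110000 in binary = 48
right = n >> 2          # 11 in binary = 3
both = n & m            # 1000 in binary = 8
either = n | m          # 1110 in binary = 14
differ = n ^ m          # 0110 in binary = 6

print(bin(n), bin(m))
print(left, right, both, either, differ)
```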

**(Optional) Your turn:** First, bitshift the number 255 to the right by 2 places. Then in the second cell, compute the bitwise AND of 50 and 53.

## 2.3 Sequence Data Types

There are three basic sequence types: lists, tuples, and ranges. Other sequence types, such as text strings, behave in much the same way.

Technical note: you'll see the word "**iterable**" in Python documentation and in some error messages. This is a category including all data types that can be iterated over (like in a for loop). Sequence data types are all iterables.

- **list**: changeable sequence of data entries. Allows for collections of different data types, though this can decrease the speed of operations on the list.
- **tuple**: immutable sequence of data, usually heterogeneous data (entries of different types)
- **range**: Looks like a list, but is also immutable. Holds the integers $A\leq x < B$, optionally with an integer step $k$: $A, A + k, A + 2k, \ldots$ (stopping before $B$). Used mostly for looping, as it can only hold integers.

In [None]:
listA = [1, 2, 3] # Lists are in SQUARE BRACKETS
tupleA = (1, 2, 3) # Tuples are in PARENTHESES
rangeA = range(1,4) # all integers 1 <= x < 4

### Common Operations on Sequence Data Types

Let **s** be a sequence data type. Then we can do...

- **x in s** (Membership test, returns a Boolean)
- **x not in s** (Non-membership test, returns a Boolean)
- **s + t** (Concatenation of two sequences of the same type. Works for lists and tuples, but not ranges)
- **s * n** (Self-concatenation n times. **n * s** will also work. Again, not for ranges)
- **len(s)** (Length of s)
- **min(s)** (Minimum of s. Can fail if mixed data types are not comparable.)
- **max(s)** (Maximum of s. Can fail if mixed data types are not comparable.)
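A quick sketch of these operations on two made-up lists, s and t. The same lines work if you turn both into tuples.

```python
s = [3, 1, 2]     # hypothetical sample lists
t = [10, 20]

has_two = 2 in s          # True
lacks_five = 5 not in s   # True
joined = s + t            # [3, 1, 2, 10, 20]
repeated = t * 2          # [10, 20, 10, 20]
length = len(s)           # 3
smallest = min(s)         # 1
largest = max(s)          # 3

print(has_two, lacks_five, joined, repeated, length, smallest, largest)
```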

### Indexing and Slicing

You can retrieve an entry from such a data type by referring to the entry's *index* using square brackets. **In Python, the first entry is at index 0**.

- **s[i]** gives the *i*th item
- **s[i:j]** gives a new sequence (of the same data type) containing entries *i* through *j* (not including the *j*th entry). This is called *slicing*.
- **s[i:j:k]** gives a slice containing entries *i*, *i+k*, *i+2k*, ..., through *j* (not including the *j*th entry even if *j=i+nk* for some *n*).
- You can let *j = -n* if you want to tell Python to slice from *i* to *n* entries before the end.
- **s[i:]** gives a slice containing all entries from *i* through the end of the sequence
- **s[:i]** gives a slice containing all entries from the beginning of the sequence to *i* (excluding the *i*th entry)
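Here's a sketch of each indexing and slicing pattern, using a made-up list of letters so the positions are easy to follow.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']  # hypothetical sample list

first = letters[0]            # 'a' (index 0 is the first entry)
middle = letters[2:5]         # ['c', 'd', 'e'] (index 5 is excluded)
every_other = letters[1:6:2]  # ['b', 'd', 'f']
trimmed = letters[2:-1]       # ['c', 'd', 'e', 'f'] (stops 1 before the end)
tail = letters[4:]            # ['e', 'f', 'g']
head = letters[:3]            # ['a', 'b', 'c']

print(first, middle, every_other, trimmed, tail, head)
```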

**Your Turn:** I've created a list X containing a handful of numbers. Write a line of code that returns the 4th entry (by Python counting) through the second to last entry (including that entry).

In [None]:
X = [x for x in range(0,100)] # We'll talk about this line of code in a bit.



### (Optional) Ranges

Typically used as a quick number list for a loop. They are NOT changeable. Here is how you create one:

In [None]:
# range(start, end, step_size)
samplerange = range(0,10,2) # all even integers 0 <= x < 10.

### (Optional) Tuples

Also not changeable. Can be different data types. Several ways to create one:

In [None]:
emptytuple = ()
sampletuple1 = 1, 2, 3 # The first two are the same
sampletuple1 = (1, 2, 3)
singletontuple = 1, # shortcut for a singleton tuple
singletontuple = (1,)
sampletuple1 = tuple([1, 2, 3]) # use "tuple()" to make a tuple out of any iterable

Not too much to say about these for now, except that a print statement will take several comma-separated text strings and variables (much like a tuple) and output them in order, with spaces inserted for you:

In [None]:
print('The tuple', sampletuple1, 'is pretty boring.')

### Lists

The most important sequence data type. They're changeable, sliceable, have more features than tuples or ranges, and there are many ways to create one:

In [None]:
emptylist = []

listA = [1, 2, 3]
listA = list((1, 2, 3)) # use "list()" to make a list out of any iterable

### List Comprehension

The most important means of creating a list. The basic idea is: for each entry in some iterable data type (usually a range or another list), perform some operations on it, then save the result to the same index position in a new list. The syntax looks like:

newlist = \[ new_entry_depending_on_x **for** x **in** some_iterable\]

Many SymPy exercises the department has assigned require the use of list comprehension.
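A minimal sketch of the syntax, with made-up inputs. The first line builds a list from a range, and the second builds one from another list.

```python
values = range(1, 6)                  # hypothetical input: 1, 2, 3, 4, 5
words = ['int', 'float', 'bool']      # hypothetical input list

doubled = [2 * x for x in values]     # [2, 4, 6, 8, 10]
lengths = [len(w) for w in words]     # [3, 5, 4]

print(doubled, lengths)
```

In both cases the new list has one entry for each entry of the iterable, in the same order.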

**Your turn**: Use list comprehension to compute $x^2 - 4$ for all $x = \{0, 1, 2, 3, 4\}$. I've set up X as a range for you.

In [None]:
X = range(0,5)



### Features of Lists

- **listname[i] = x** Replaces the entry at index *i* with *x*
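- **listname[i:j] = x** Replaces the entries from *i* to *j-1* with the entries of *x*, where *x* must itself be an iterable (such as another list). The list may grow or shrink as a result.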
- **listname.append(x)** Appends *x* to the end of the list as a new entry
- **listname.extend(secondlist)** List concatenation. Appends secondlist's entries to the end of the list
- **listname.remove(x)** removes the first entry from the list that is equal to x
- **del listname[i:j]** deletes multiple entries from the list

Some more neat features:
- **listname.sort()** will sort the list according to the relation <, however it's defined for the data type. Caution: this modifies the original list, so use **sorted(listname)** instead if you need to keep the original ordering for something else, as it returns a sorted copy. Will fail if mixed data types are not comparable, and depending on how far the sort got, may leave the list in a partially-sorted state.
- **sum(listname)** will add up all terms of the list (in index order) according to the binary operation +, however it's defined for the data types involved. If the data types can't be added together, it will fail.
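Here's a sketch that walks a made-up list of grades through these features one step at a time. The comments show what the list looks like after each line. Note that sorted() returns a sorted copy, while the .sort() method changes the list itself.

```python
grades = [88, 72, 95]       # hypothetical sample list

grades[1] = 75              # [88, 75, 95]
grades.append(60)           # [88, 75, 95, 60]
grades.extend([91, 75])     # [88, 75, 95, 60, 91, 75]
grades.remove(75)           # removes the first 75: [88, 95, 60, 91, 75]
del grades[0:2]             # [60, 91, 75]
grades[0:1] = [61, 62]      # [61, 62, 91, 75]

ordered = sorted(grades)    # sorted copy: [61, 62, 75, 91]
snapshot = list(grades)     # copy of the list before sorting in place
grades.sort()               # sorts grades itself
total = sum(grades)         # 289

print(snapshot, ordered, grades, total)
```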

### (Optional) Strings

Text strings are usually written in Python with single OR double-quotation marks.

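Text strings in Python behave like sequences of single characters, so indexing, slicing, concatenation, and the other common sequence operations all work on strings as well. The one big difference from lists is that strings cannot be changed in place. Some examples: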

In [None]:
sometext = 'Here is some text. Aren\'t I creative?'
print(sometext[19:])
print(sometext[0])
print(sometext + ' Why yes I am.')

Notice that I had to put a " \\ " before the " ' ". Ordinarily, the " ' " would have ended the string, but this tells Python to interpret that as a character in the string and keep going. This is called an *escape sequence*. Other useful escape sequences include:
- **\n** (newline character)
- **\t** (tab character)

Be careful when counting, as an escape sequence counts as a single character.
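A short sketch of that counting rule, using made-up strings. Each escape sequence takes two keystrokes to type but occupies only one position in the string.

```python
text = 'one\ttwo\nthree'   # hypothetical sample string
quote = 'Aren\'t'

text_length = len(text)    # 13: 3 + 1 (tab) + 3 + 1 (newline) + 5
tab_char = text[3]         # the tab sits at a single index
quote_length = len(quote)  # 6: the backslash itself is not stored

print(text)
print(text_length, quote_length)
```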

## 2.4 Set Data Types

A **set** is a mathematical set, equipped with all the set operations we know and love. As you'd expect, there are no repeating elements in a set. A **frozenset** is the same thing, but you cannot change a frozenset once it is created. Operations on it give you a new set instead.

There are several ways to make a set:

In [None]:
set1 = {1, 2, 3} # Notice the pattern: (tuple), [list], {set}
set1 = set([1, 2, 2, 3]) # Use "set" to create a set from some other iterable
set1 # Notice I repeated the 2 in the input, but it was eliminated during conversion.

Similarities between a set type and a sequence data type:
- They are also iterable, meaning you can use one for a for loop
- **len**, **in**, and **not in** all work the same way

Differences:
- They are NOT indexable nor sliceable
- In fact, there is no order to them whatsoever

Sets do have a wide variety of features that sequence data types cannot use:
- **A.isdisjoint(B)**: are A and B disjoint? Answers with a boolean
- **A.issubset(B)** or **A <= B**: Is A a subset of B? Answers with a boolean
- **A < B**: similar, but tests for proper subset
- **A.issuperset(B)** or **A >= B**: Is A a superset of B?
- **A > B**: similar, but tests for proper superset
- **A.union(B,C)** or **A | B | C**: computes $A\cup B\cup C$
- **A.intersection(B,C)** or **A & B & C**: computes $A\cap B\cap C$
- **A.difference(B,C)** or **A - B - C**: computes $A\backslash B\backslash C$
- **A.symmetric_difference(B)** or **A ^ B**: computes the symmetric difference between $A$ and $B$
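Here's a sketch of several of these features on three small made-up sets. Since sets have no order, the elements may print in a different order than the comments show.

```python
A = {1, 2, 3, 4}   # hypothetical sample sets
B = {3, 4, 5}
C = {4, 6}

union_ab = A | B                  # {1, 2, 3, 4, 5}
common_ab = A & B                 # {3, 4}
only_a = A - B                    # {1, 2}
sym_diff = A ^ B                  # {1, 2, 5}
union_all = A.union(B, C)         # {1, 2, 3, 4, 5, 6}
common_all = A.intersection(B, C) # {4}

is_subset = {1, 2} <= A           # True
is_proper = A < A                 # False: a set is not a proper subset of itself
no_overlap = A.isdisjoint({7, 8}) # True

print(union_ab, common_ab, only_a, sym_diff, union_all, common_all)
print(is_subset, is_proper, no_overlap)
```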

## 2.5 (Optional) Mapping Data Types

A changeable data type whose index is not the natural numbers, but almost any unchangeable data type, most commonly text strings. Each index is called a *key*. The only basic data type in this category is the **dictionary**, or **dict**. The following all create the same dictionary:

In [None]:
a = dict(one=1, two=2, three=3)

b = {'one': 1, 'two': 2, 'three': 3}

c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))

d = dict([('two', 2), ('one', 1), ('three', 3)])

e = dict({'three': 3, 'one': 1, 'two': 2})

f = dict({'one': 1, 'three': 3}, two=2)

Like an actual dictionary, you look up the entry using its *key*:

In [None]:
a['one'] # Note we use square brackets here

If you are solving a system of multiple equations in SymPy, the output will often be a dictionary whose keys are the variables involved.
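A minimal sketch of working with a dictionary after it's been created, using a made-up dictionary of ages. Entries can be looked up, added, and changed by their key.

```python
ages = {'Ada': 36, 'Alan': 41}   # hypothetical sample dictionary

ada_age = ages['Ada']        # look up an entry: 36
ages['Grace'] = 85           # add a new entry
ages['Alan'] = 42            # change an existing entry
has_ada = 'Ada' in ages      # "in" tests the keys: True
names = list(ages.keys())    # ['Ada', 'Alan', 'Grace']

print(ada_age, has_ada, names, ages)
```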

## 2.6 Troubleshooting

In general, an error message has two parts: a *traceback*, or an attempt to show you exactly where the error occurred, and the error type with its message (for example, *TypeError*), which often gives a helpful description of what happened. The traceback will often go through layers of underlying code (for example, if the error occurred when using SymPy's integrate function, often it will show you the underlying code for that function), which may or may not be helpful, so if you get a long traceback, look for the part that shows YOUR code.

Throughout the workshop, I'll conclude a notebook by pointing out common mistakes related to the notebook's topics.

1) Data type mismatch between operations

In [None]:
27 > [1] # Can't compare an integer to a list!

2) Accessing a list/tuple/range entry that doesn't exist. (Don't forget that Python indexing begins at 0!)

In [None]:
list1 = ['A', 'B', 'C'] # Index only goes from 0 to 2, not 1 to 3
list1[3]