# Python Tutorial 02 - Python Data Types
## M. Virginia McSwain (Lehigh University; mcswain@lehigh.edu)

## Table of contents
* [Numbers](#numbers)
* [Strings](#strings)
* [Lists](#lists)
* [Tuples](#tupels)
* [Dictionaries](#dictionaries)
* [Sorting Lists and the Zip Command](#sorting)
* [Using External Packages](#packages)
* [NumPy Arrays](#arrays)
* [Formatting Print Output](#format)

This notebook covers several basic functionalities that will get you up and running with some simple Python procedures.  

There are 5 standard data types in Python: numbers, strings, lists, tuples, and dictionaries.  Python usage is *greatly* expanded with the use of external packages and additional data types that add nearly infinite potential.  The **`NumPy`** (Numerical Python) package is one of the most commonly used packages for scientific computing. 

<a class="anchor" id="numbers"></a>
## Numbers

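Python 3 has 3 built-in types of *numbers*.  (Python 2 also had a separate **Long** type, written with a trailing `L` as in `51924361L`; in Python 3 that syntax is an error, and **Int** handles integers of any size.)

* **Int** (signed integer of any size)
    
    Examples:  10, 1000, -786, 51924361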


* **Float** (floating point real values)

    Examples:  0.0, -21.935, -100.


* **Complex** (complex numbers; a + bj)

    Examples:  3.14j, 0.786 + 0j
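
As a quick check, the built-in `type` function reports which numeric type Python assigns to a value.  The sketch below assumes Python 3, and the variable names are only for illustration.

```python
# Inspect the numeric type that Python 3 assigns to each value.
small = 10
big = 2**100          # very large integers are still plain int in Python 3
real = -21.935
comp = 3.14j

for value in (small, big, real, comp):
    print(value, type(value).__name__)
```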


In [None]:
a = 2
b = 4.99
c = 4.1 + 0j
print (a, b, c)

You can do simple numerical calculations with the *numbers* data type.

In [None]:
x = 1
y = 2
z = x+y
print (z)
print (z*y)
print (z**y)   # The double asterisk (**) is the code for an exponent in Python.
print (x/y)
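
Note that in Python 3 the `/` operator always returns a float, even when both operands are integers.  A minimal sketch of the related operators, floor division (`//`) and remainder (`%`), using illustrative values:

```python
x = 7
y = 2
true_div = x / y     # true division gives a float: 3.5
floor_div = x // y   # floor division rounds down to an integer: 3
remainder = x % y    # remainder left over after floor division: 1
print(true_div, floor_div, remainder)
```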

### Try it now!  

Add a cell below and try your own operations (multiply, divide, add, subtract, powers, and parentheses):

<a class="anchor" id="strings"></a>
## Strings

A *string* is a sequence of characters that may include letters, numbers, and special characters.  

Strings are immutable, meaning that you cannot change them later without overwriting them completely. 

Either single or double quotes can be used to enclose strings.  If your string contains single or double quotes, you can accommodate this.

In [None]:
a = 'Hello world!'
b = "isn't"
c = "I want \"double quotes\" and \'single quotes\' inside my string!"
print (a)
print (b)
print (c)

You can use triple quotes to break a string across multiple lines.

In [None]:
a = '''This is an example 
      of a multi-line string.'''
print (a)

In [None]:
# You can concatenate two strings by adding them together.
a = "This is the first part, "
b = "and this is the second."
print (a+b)

You can access (but not change) any element of a string using the concept of **index numbers**. 

<img src = "Example Data Files/index_numbers.jpg">

In [None]:
# Find the first occurrence of the letter 'e' in the string and return its index number
a = "Hello"
a.find('e')

In [None]:
# Print the character at the index position 4
a[4]

You can also use a range of indices to make a **slice** of part of a string. Note that the slice does not change the original string, so if you want to save the slice, you should define it as a new variable.

In [None]:
# Start a slice at index 1 and go up to (but not including) index 3
b = a[1:3]

In [None]:
# Start slice at index 1 and automatically go to end
b = a[1:]

In [None]:
# Slice from beginning to end
b = a[:]

In [None]:
# Equivalent to a[1:3]
b = a[-4:-2]

In [None]:
# Omit the last 3 characters
b = a[:-3]

In [None]:
# Return only the last 3 characters
b = a[-3:]
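
Because strings are immutable, assigning to an index raises an error, and slices are the usual way to build a modified copy.  A minimal sketch, with `greeting` as an illustrative variable name:

```python
greeting = "Hello"

try:
    greeting[0] = "J"          # item assignment is not allowed on a string
except TypeError as err:
    print("TypeError:", err)

# Build a new string from a slice and rebind the name instead.
greeting = "J" + greeting[1:]
print(greeting)                # Jello
```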

In [None]:
# We cannot delete or remove characters from a string. 
# But deleting the string entirely is possible using the keyword del.
del a
print (a)  # This will cause an error since a was just deleted!

There are many ways to manipulate strings in Python that you might find useful.  

In [None]:
a = ['ccc', 'aaaa', 'd', 'bb']

# Join the strings in list a using colon delimiter
b = ':'.join(a)
print (b)

In [None]:
# Join the strings in list a using newline delimiter
c = '\n'.join(a)
print (c)

In [None]:
# Split the string b at each colon delimiter
# Note: this will produce a list of strings!  See below for more info about lists.
d = b.split(':')
print (d)

In [None]:
# Find the length of a string
a = 'Hello, world!'
print (len(a))

### Try it now! 

A comma-separated value (csv) formatted table functions like a primitive version of a spreadsheet, and it's a common data format used in astrophysics since it doesn't require proprietary software (e.g., Microsoft Excel).

Imagine that you have a directory that is formatted as a csv file, and your string represents a single row from the file.  Use the ``split`` command to split the string into variables that represent each column.

| Name | Phone | Office |
|:-:|:-|:-:|
| Prof. McSwain | 610-758-5322 | LL 405 |

In [None]:
row = "'Prof.McSwain', '8-5322', 'LL 405'"

#name = 
#phone = 
#office = 

<a class="anchor" id="lists"></a>
## Lists

*Lists* always appear in **square brackets**.  Any data type can be used within a list, and there is no requirement that items in a single list have the same data type.  Lists are mutable (can be changed).

In [None]:
a = [1,2,3]
b = [1, 2, 'aaa']

In [None]:
# Note that adding 2 lists concatenates them.  It does *not* do mathematical addition on them.
c = [4,5,6]
d = a + c
print (d)

In [None]:
# You can also append a new value to the end of the list.
d.append(7)
print (d)

In [None]:
# Change the value of index 0 in the list d
d[0] = 12
print (d)

In [None]:
# You can delete an element from the list in either of two ways:
d.pop(0)
print (d)

del c[2]
print (c)

In [None]:
# Reverse the list
a = [1, 2, 3, 4, 5]
a[::-1]

In [None]:
# What happens when you try mathematical operations on a list?
test = [1, 2, 3]
print (test*2)

In [None]:
# You can also select a slice from a list, just as you can with a string.
print (d[3:5])

<a class="anchor" id="tupels"></a>
## Tuples

*Tuples* are similar to lists, except enclosed within **parentheses** instead of brackets.  Like strings, they are immutable (cannot be updated).  In this way, tuples act like read-only lists.

In [None]:
a = (1, 2, 3)
#a[0] = 13         # doesn’t work, cannot change an element of a
print (a[0])
a = (2, 3, 4)      # have to overwrite a in order to change it

It is possible to make a list of tuples that can be used for sorting or other manipulation.

In [None]:
a = [(1, 'b'), (3, 'a'), (2, 'a'), (1, 'a')]

Tuples are often used to associate a fixed number of elements that are related.  Some examples might be storing a set of (x, y, z) coordinates or making a phone book. 

In [None]:
tel = [('Ginny McSwain', '610-758-5322'), ('Physics Dept', '610-758-3930'), ('Dean of CAS', '610-758-4570')]
sorted(tel)
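
The (x, y, z) coordinate use mentioned above can be sketched as follows.  The values and the name `point` are illustrative only.

```python
point = (1.0, 2.5, -3.0)   # one (x, y, z) coordinate stored as a tuple

# Unpack the tuple into three separate variables in a single statement.
x, y, z = point
print(x, y, z)

# Tuples can also be indexed, just like lists and strings.
print(point[2])
```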

A list of tuples can be changed.  (It is the list that is changing, not the tuples themselves.)

In [None]:
tel[0]=('HELP Desk', '610-758-HELP')
print (tel)
new = ('Ginny McSwain', '610-758-5322')
tel.append(new)
print (tel)

In [None]:
test = (1, 2, 3)
test2 = (4, 5, 6)
result = test+test2
print (result)

### Try it now! 

Using a list of coordinate pairs, assign the first x value and first y value each to their own new variable.

In [None]:
coords = [ (1, 2), (3,4), (5,6), (7,8) ]

#x1 =                # write an expression that returns the value 1 from the list
#y1 =                # write an expression that returns the value 2

<a class="anchor" id="dictionaries"></a>
## Dictionaries

Lists and strings are indexed by numbers, but **dictionaries** are indexed by **keys** and are defined by curly brackets:

```
Dict = {key: value}
```

Keys may be strings, numbers, or tuples (as long as the tuple itself contains only immutable items).  Keys may not be lists since lists are mutable.
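
A minimal sketch of which objects are accepted as keys.  The dictionary name `labels` and its contents are made up for illustration.

```python
# A tuple of immutable items can serve as a dictionary key.
labels = {(0, 0): 'origin', (1, 0): 'on the x axis', (0, 1): 'on the y axis'}
print(labels[(1, 0)])

# A list cannot, because lists are mutable and therefore unhashable.
try:
    bad = {[0, 0]: 'origin'}
except TypeError as err:
    print('TypeError:', err)
    bad = None
```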

Within one dictionary, the keys must be unique.  There may be more than one key:value pair. 

```
Dict1 = {'Class': 'ASTR 302', 'Professor': 'McSwain'}
Dict2 = {'Class': 'ASTR 105', 'Professor': 'Pepper'}
```

Personally, I don't encounter dictionaries very often in Python computing, but I think they are common in certain applications (especially databases).  We will use them in Tutorial 6 to save new data to a spreadsheet format.

In [None]:
tel = {'Jill': 81234, 'Jack': 82345, 'Anne': 89876, 'Andy': 80345}

# Print the dictionary tel
tel

In [None]:
# Return Anne's number
tel['Anne']

In [None]:
# Change Jack's number
tel['Jack'] = 12345

In [None]:
# Add Sam to the telephone directory
tel['Sam'] = 87654

In [None]:
# Delete Jill's entry
del tel['Jill']

In [None]:
# Return all of the keys in tel
tel.keys()

In [None]:
# Test whether Sam is a key in tel (returns True if so, False if not)
'Sam' in tel
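
A few other dictionary methods are often handy alongside `keys`.  The sketch below uses a separate, illustrative dictionary named `phone_book` so that it does not disturb `tel`.

```python
phone_book = {'Jack': 12345, 'Anne': 89876, 'Andy': 80345, 'Sam': 87654}

# Loop over the key:value pairs.
for name, number in phone_book.items():
    print(name, number)

# get returns a default value instead of raising a KeyError for a missing key.
missing = phone_book.get('Jill', 'not listed')
print(missing)

# values returns all of the values; list() turns them into an ordinary list.
numbers = list(phone_book.values())
print(numbers)
```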

Dictionaries may also have more than one value per key.

In [None]:
tel = {'Jenny': [5558675309, 6108675309]}
tel['Jenny']
#tel['Jenny'][1]       # returns index 1 from the list of values

<a class="anchor" id="sorting"></a>
## Sorting Lists and the Zip Command

A common task in programming is to sort lists of items.  By default, sorting a list will place its items in increasing numeric (or alphabetic) order, but there are other sorting keys that you might choose to use.  Reverse sorting is easy, too.  Here are a few simple examples.

In [None]:
a = [4, 2, 1, 6]
sorted(a)
#help(sorted)               # view optional features
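
Note that `sorted` returns a new list and leaves the original alone, while the list method `sort` rearranges the list in place.  A short sketch with an illustrative list named `values`:

```python
values = [4, 2, 1, 6]

b = sorted(values)   # returns a new, sorted list
print(values)        # the original is unchanged: [4, 2, 1, 6]
print(b)             # [1, 2, 4, 6]

values.sort()        # sorts the list in place and returns None
print(values)        # [1, 2, 4, 6]
```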

In [1]:
# Reverse sort (a is defined again here so that this cell runs on its own)
a = [4, 2, 1, 6]
sorted(a, reverse=True)

You can custom sort using any attribute of your list that can be pulled from built-in Python functions.  For example, you can sort a list of strings according to their lengths.

In [None]:
a = ['ccc', 'aaaa', 'd', 'bb']
sorted(a, key=len)    # sorts using the built-in function len as the sorting key

You can even define your own sort key!  For example, let's sort a list of strings alphabetically by the last character of each string. 

In [None]:
def Last(s):
    return s[-1]

a = ['ccc', 'aaaz', 'd', 'bb']
sorted(a, key=Last)

You can sort a list of tuples, too.  By default, **`sorted`** will sort by the first element then the second.  

In [None]:
a = [(1, 'b'), (3, 'a'), (2, 'a'), (1, 'a')]
sorted(a)   

In [None]:
a = [(5, 'b'), (2, 'a'), (3, 'e'), (1, 'c'), (7, 'd')]
sorted(a)

You can also sort tuples on the second element alone.

In [None]:
def takeSecond(elem):
    return elem[1]

a = [(1, 'b'), (2, 'a'), (3, 'e'), (5, 'c'), (7, 'd')]
sorted(a, key=takeSecond)
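
For a short key like this one, an anonymous `lambda` function is a common alternative to a named function.  A minimal sketch using the same list:

```python
a = [(1, 'b'), (2, 'a'), (3, 'e'), (5, 'c'), (7, 'd')]

# The lambda plays the same role as a named function that returns elem[1].
by_second = sorted(a, key=lambda elem: elem[1])
print(by_second)

# reverse=True can be combined with a key.
by_second_desc = sorted(a, key=lambda elem: elem[1], reverse=True)
print(by_second_desc)
```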

### Zipping two related lists

Let's say you have 2 separate lists that need to be sorted together (such as x and y coordinate pairs).  Python has a function called `zip` that pairs up their elements into tuples, and then you can sort the pairs. 

In [None]:
x = [1, 5, 1, 9]
y = [1, 25, 9, 81]
coordlist = zip(x, y)
newlist = sorted(coordlist)
print (newlist)

You can also "unzip" the list of tuples.  Here, we'll unzip the newly sorted pairs into new sequences of x and y values.

In [None]:
x2, y2 = zip(*newlist)
print (x2)
print (y2)
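
In Python 3, `zip` returns an iterator rather than a list, and unzipping produces tuples.  If you need lists that can be modified later, convert them explicitly, as in this sketch:

```python
x = [1, 5, 1, 9]
y = [1, 25, 9, 81]

pairs = sorted(zip(x, y))   # sorted consumes the zip iterator and returns a list of tuples
x2, y2 = zip(*pairs)        # unzipping yields two tuples
x2 = list(x2)               # convert back to lists so they can be modified
y2 = list(y2)
print(x2)
print(y2)
```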

<a class="anchor" id="packages"></a>
## Using External Packages

Most of the power of using Python comes from incorporating the wide assortment of open source packages that have been developed.  For scientific computing, the following packages are widely used (and come with Anaconda by default):
* **`numpy`** (Numerical Python)
* **`scipy`** (Scientific Python)
* **`matplotlib`** (A plotting library with a MATLAB-like interface)
* **`pandas`** (A data analysis package)

The following are a few packages that have been developed specifically for professional and amateur astronomers:
* **`astropy`** (A community Python library for Astronomy)
* **`APLpy`** (Astronomical Plotting Library in Python)
* **`astroquery`** (Query common astronomical datasets)
* **`pyephem`** (Computes the positions of astronomical objects)
* **`galpy`** (A Python library for Galactic dynamics)

If you need to install a new package that you don't already have (say, **`astroquery`**), Anaconda includes a program called **`pip`** that makes this easy!  Use JupyterLab's Launcher to start a terminal.  From the terminal command line, type: **`pip install astroquery`** .  (Data Lab users, you probably need to contact the NOIRLab tech support to make the request.)
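
Before installing anything, it can help to check whether a package is already available.  The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `is_installed` is hypothetical, and the result depends on your own environment.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None

# Report which of these packages can be imported in the current environment.
for name in ['numpy', 'astropy', 'astroquery']:
    print(name, is_installed(name))
```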


In [2]:
import numpy as np

In [None]:
# Show all available commands in the numpy package (Warning: long output!)
dir (np)

In [None]:
# Get the help file for numpy (VERY long output!!!)
help(np)

In [None]:
# Get the help file for a specific numpy procedure
help(np.zeros)

<a class="anchor" id="arrays"></a>
## NumPy Arrays

For scientific computing, using lists of numbers isn't very practical since arithmetic operators don't act element by element on lists (as shown in the Lists section, `test*2` repeats the list rather than doubling each value).  Instead, it's common to use numerical **arrays** from the **`numpy`** package.  You will find that it's usually more efficient to use numpy arrays than lists of numbers.
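
The difference can be sketched by applying the same operator to a list and to an array.  The names `values` and `arr` are illustrative.

```python
import numpy as np

values = [1, 2, 3]
repeated = values * 2     # list repetition: [1, 2, 3, 1, 2, 3]
print(repeated)

arr = np.array(values)
doubled = arr * 2         # element-wise multiplication: [2 4 6]
print(doubled)
print(arr + arr)          # element-wise addition: [2 4 6]
```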

In [None]:
# Create a one-dimensional array of 4 zeros.
a = np.zeros(4)
print (a)

In [None]:
# Create a 3x5 matrix of zeros.
a = np.zeros( (3,5) )
print (a)

In [None]:
# Create a numerical array from a list of numbers.
a = np.array( [1., 2., 3.])
print (a)
print (a+1)

In [None]:
# Create an evenly spaced array of numbers from 1 to 10, spaced 0.1 apart.  
# Note that the stop value (10.1) is NOT included in the array,
# so the last value is 10 (to within floating-point rounding).
a = np.arange(1, 10.1, 0.1)
print (a)

In [None]:
# Create an evenly spaced array of 15 numbers between 1 and 10.
# Note that the final value IS included in this array.
a = np.linspace(1, 10, 15)
print (a)
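
The different endpoint behavior of the two functions is easiest to see with integer-spaced values, as in this short sketch:

```python
import numpy as np

a = np.arange(1, 10, 1)      # the stop value (10) is excluded
b = np.linspace(1, 10, 10)   # 10 points, with both endpoints included
print(a)                     # ends at 9
print(b)                     # ends at 10.0
```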

In [None]:
# Convert an existing list of numbers to a numpy array
x = [1, 2, 3, 4, 5]
newx = np.array(x)

**`NumPy`** includes lots of useful built-in numerical features, including the very useful ability to determine where a certain condition is met using **`numpy.where`**.  (See also Tutorial 3 for the complete list of Boolean operators you can use here.)

Consider a data set that has a large number of values, some of which might be flagged with a quality indicator.  A flag might indicate values of 0 are ok, but non-zero values have some inherent problem.  We want to retrieve only the good values and dismiss the others. 

In [18]:
x = np.array([1, 2, 3, 4, 5])
flag = np.array([0, 0, 3, 4, 0])

# This usage of np.where returns the indices where the flag is equal to 0. 
good = np.where(flag == 0)
x_good = x[good]
print (good)  # contains an array of indices
print (x_good)  # print the results of the array at those indices

(array([0, 1, 4]),)
[1 2 5]
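
An alternative that gives the same selection is to index the array directly with the Boolean condition (often called a mask), without calling **`numpy.where`**.  A minimal sketch, with `mask` as an illustrative name:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
flag = np.array([0, 0, 3, 4, 0])

mask = (flag == 0)     # Boolean array: True where the flag is 0
x_good = x[mask]       # keeps only the elements where mask is True
print(mask)
print(x_good)          # [1 2 5]
```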


Another way to use the **`numpy.where`** function is to return a particular value when a condition is true.  For example, let's return a value of 1 when the condition is true, and a value of 0 when it is not.  This is useful if you want to quickly count how many values meet the condition. 

In [19]:
x = np.array([1, 2, 3, 4, 5])
large = np.where(x >= 3, 1, 0)
print (large)

howmany = np.sum(large)
print (howmany)

[0 0 1 1 1]
3


You can apply multiple conditions with **`numpy.where`**.  For example, consider the case of many points in a Cartesian plane, and you want to retrieve only the points within a given quadrant. 

In [25]:
x = np.array([-3, -2, -1, 0, 1, 2, 3])
y = np.array([5, 4, 3, 2, 1, 0, -1])
keep = np.where((x < 0) & (y > 0))
print (keep)

(array([0, 1, 2]),)


Note that using **`numpy.where`** requires that the operand be in the **`numpy`** array format, so if you're using an ordinary list you will need to convert it.  This list could include strings, too!  For example:

In [29]:
x = np.array(['A', 'B', 'C'])
print (np.where(x == 'B'))

(array([1]),)


<a class="anchor" id="format"></a>
## Formatting Print Output

When you are printing results from some numerical calculation, it's often helpful to format your output for clarity.  You might want to align output into columns of specific width, or you may just want to limit the displayed results to a certain number of sig-figs.  Either way, Python offers several methods to do this. 

For starters, consider the following mathematical output (unformatted).  (I'm using a few simple **`for`** and **`while`** loops here, too...  More on these in the next tutorial!)

In [None]:
import numpy as np

a = np.arange(1, 10, 1)
b = 7
for item in a:
    c = item/b
    print (c)

I don't know about you, but my head hurts trying to read all those decimal places!

The following code prints the calculation for the variable **`c`**, but **`c`** is now formatted into a fixed-width column of 4 characters (including the decimal point) with 2 decimal places.

In [None]:
a = np.arange(1, 10, 1)
b = 7
for item in a:
    c = item/b
    print ('{:4.2f}'.format(c))

This can be particularly useful when printing output that might be combined with some text.  But consider the combination of strings and numbers below.  Without formatting the length of the string, we end up with output that is still difficult to read. 

In [None]:
string = ['Betelgeuse', 'Rigel', 'Vega']
a = [1, 2, 3]
b = [7, 7, 7]

i = 0
while i < 3:
    c = a[i]/b[i]
    print (string[i], '{:4.2f}'.format(c))
    i += 1

If we specify a fixed-width of 12 spaces for the strings (s), and 4 spaces for the floating point values (f), we find the result is much easier to read. 

In [None]:
string = ['Betelgeuse', 'Rigel', 'Vega']
a = [1, 2, 3]
b = [7, 7, 7]

i = 0
while i < 3:
    c = a[i]/b[i]
    print ('{:12s}{:4.2f}'.format(string[i], c))
    i += 1