These notes are based on the material in Chapter 2 of Dive into Python.


# Concepts

Computer languages have the concept of a _data type_ that comes down to the question, _what sort of information am I storing or looking at here?_  Python lets you intermix types more easily than many languages, but we need to have a good understanding of what these data types are and why we'd use one over another.  

The simplest split in some ways is between a number and a string of characters.  So, while we know **1** and **"one"** in many ways mean the same thing, they're very different to your program.  The first is a number and the second is a string of three characters.  That may seem obvious (I hope it does), but **1** and **"1"** are different as well.  Heck, **1** is not always the same thing as **1.0**.  The former is an integer and the latter a floating point number.  Again, Python takes a lot of the hurt out of working with these, but we need to know what they are, how to convert between them, and why we'd care about one or the other.


# Int and float (numbers)

Integers are "whole numbers" and floating point numbers let you have a decimal place. That's the 1 vs. 1.0 noted above.  If you use a number like 10, Python assumes you're talking about an integer.  If you give it 10.1, it assumes you're talking about a float.  Python defaults to integers, but will convert on the fly as needed.  You never declare a variable's type up front, and a variable can hold a different type later on; that is what makes Python a _dynamically typed_ language.  We'll touch on dynamic vs. static typing later.

_Question: Why do you think it defaults to integers?_


In [None]:
x=1
y=1.1
type(x)
type(y)
print('x is {} and y is {}'.format(type(x),type(y)))
z=x+y
print('z is {}'.format(type(z)))
x=x+0.1
print('Now x is {}'.format(type(x)))

# Boolean

Booleans can have values of only _True_ or _False_.  There is no other option.  Flags, logic statements, etc. are either explicitly or implicitly booleans.


```
def do_something():
    print('Did something')

do_it = True
if do_it:        # do_it is True
    do_something()
x = 10
if x > 5:        # the comparison evaluates to True
    do_something()
if x:            # a non-zero number counts as True
    do_something()
```


In each of those cases, `do_something()` would be called.  In the first, we made a variable called do_it and set it to True, so when evaluating that if-statement, True is True, so it gets done.  In the second case, we tested if x>5.  That x>5 first gets evaluated by Python and turned into True or False.  Since x is 10, it's True.  So, again, we have an "if True" and we run the do_something line.

The last one there shows a handy shortcut.  All non-zero numbers (ints or floats) are True.  Anything non-zero, even 0.0000000000001, is True.
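If you're ever unsure how Python will treat a value in an if-statement, you can pass it to the built-in `bool()` and look at the result.  Here's a small sketch (the variable name `tiny` is just for illustration):

```python
# bool() shows the True/False value Python would use in an if-statement
tiny = 0.0000000000001

print(bool(10))     # non-zero int
print(bool(tiny))   # non-zero float, however small
print(bool(0))      # zero int
print(bool(0.0))    # zero float
print(3 > 2)        # a comparison evaluates to a boolean on its own
```

Only the two zero values come out as False here; everything else is True.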


# Strings

Anything vaguely resembling text (including a text-based version of a number) is a string (or sequence of characters).  We tell Python it's a string by putting it in single-quotes or double-quotes. Python really doesn't care if you say 'foo' or "foo".

One nice thing about this is that you can make strings that have quotes in them (in other languages, this is a royal pain, so be happy to be in Python!).  Let's say you want _He said "Whuddup yo?" to me!_ as a string.



*   `a="He said "Whuddup yo?" to me!"` won't work as that has _"He said "_ as one string and then some non-Python command of `Whuddup yo?` and then another string of _" to me!"_
*   `a='He said "Whuddup yo?" to me!'` though will work.
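Here's a short sketch of the quoting options in action.  The backslash-escape version in `c` isn't covered above, but it's standard Python and gives you the same string:

```python
a = 'He said "Whuddup yo?" to me!'     # double quotes inside single quotes
b = "It's fine to flip them around"    # single quote inside double quotes
c = "He said \"Whuddup yo?\" to me!"   # or escape the inner quotes with a backslash

print(a)
print(b)
print(a == c)   # both spellings produce the same string
```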

If you've got a big, multi-line block of text to define, you can use triple-single or triple-double quotes.  For example:


```
"""There are only two kinds of people in the world
1. Those who can extrapolate from incomplete data
"""
```


Keep in mind that while you can store numbers in text, they're not numbers per se anymore if you do.

Try running this:

In [None]:
a=1.0
b='1.0'
a==b

## Indexing and formatting strings

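Internally, strings are represented much like "arrays" or lists.  Think of them as an ordered sequence of characters.  So, you can index strings like _a[3]_ to get the character at index 3 in _a_ (the 4th character, since Python counts from zero).  See more indexing in lists below for the syntax, but this does let you grab portions of a string nicely.  Unlike lists, strings can't be changed in place; to "change" one, you build a new string from the pieces you want.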
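A quick sketch of string indexing (the example string is arbitrary).  Indices count from zero, and the usual way to get a modified string is to build a new one from slices:

```python
a = 'Python'

print(a[0])      # character at index 0
print(a[3])      # character at index 3 (counting from 0)
print(a[0:2])    # characters at index 0 and 1
print(len(a))    # number of characters

# Build a new string out of pieces of the old one
b = 'J' + a[1:]
print(b)
```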

More often, though, we'll want to create strings that have numbers in them that are the result of some calculations we've done and that are in a nice, readable format.  Much of this is covered in Chapter 4 of Dive into Python, but a few highlights here.

Say you want to create a string that's got some text with a few numbers or other variables inserted in key places.  Maybe you even want to format those inserted numbers in a nice, pretty way. The _.format()_ function is your friend here.


### Basic examples


```
'I have {0} eyes and {1} spleen'.format(2,1)
```


This means take the value 2 (first one specified) and insert it into the first place-holder {0}.  Take the value 1 and put it into the second one {1}. (Yes, Python always counts from zero).

This is the same as:


```
'I have {1} eyes and {0} spleen'.format(1,2)
```

It's also the same as:

```
'I have {} eyes and {} spleen'.format(2,1)
```
Python will assume that you're going in order if you just use that format without any indices inside.  

You can put anything that Python can evaluate here into your `.format`.  For example:


```
'I have {} neurons in my {}'.format(10 ** 11,'brain')
```
***Exercise - Try something here in the code box below:***

### Don't look like a Python newbie
The `.format` syntax is very useful in that it gives the most flexibility. But, that flexibility comes at the cost of ease.  We now have a newer variant of `.format` called an *f-string* that you invoke by putting an `f` at the start of your string (before the quotation mark).  Here's how it looks:
```
region='brain'
n_neurons=10 ** 11
f'I have {n_neurons} neurons in my {region}'
```
Anything inside that `{}` gets evaluated by Python before being formatted and put into the string.  So, here's a bit of sample code you can run:


In [None]:
dogs=10
cats=3
birds=4
print(f'I have {dogs+cats+birds} pets: {dogs} dogs, {cats} cats and {birds} birds. I have {cats/birds}x as many cats as birds')

### Fancier formatting

In either format, instead of just the placeholder on its own (e.g., `{0}` or `{birds}`), you can tell Python how to format a number put in that placeholder.  You can control things like the width (number of characters used), the decimal point precision, and all sorts of things with this.  For example:

*   `{0:.1f}`  = Insert the 0th (first) parameter here, treat it as a floating point number (the `f` bit) but only show one digit after the decimal point (the `.1`)
*   `{0:^20}` = Put in the 0th parameter, but pad it with spaces on either side to center it such that the total width is 20 characters
*   `{0:e}` = Use exponential notation

If you can imagine it possibly being vaguely thought of as potentially useful, there's a format specifier for it:

[https://docs.python.org/3.1/library/string.html#format-specification-mini-language](https://docs.python.org/3.1/library/string.html#format-specification-mini-language)
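The three specifiers listed above can be tried out directly.  This is just a sketch with made-up values (`value` and `big` are illustrative names); the square brackets in the second line are only there so you can see the padding:

```python
value = 3.14159
big = 123456.789

print('{0:.1f}'.format(value))           # one digit after the decimal point
print('[{0:^20}]'.format('centered'))    # centered in a 20-character field
print('{0:e}'.format(big))               # exponential notation
print(f'{value:8.3f}')                   # the same specifiers work in f-strings
```

The last line asks for a total width of 8 characters with 3 digits after the decimal point, so the number is padded with spaces on the left.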

Here's an example of how that might be handy:

In [None]:
dogs=10
cats=3
birds=4
print(f'I have {dogs+cats+birds} pets: {dogs} dogs, {cats} cats and {birds} birds. I have {birds/cats}x as many birds as cats')
print('vs.')
print(f'I have {dogs+cats+birds} pets: {dogs} dogs, {cats} cats and {birds} birds. I have {birds/cats:.2f}x as many birds as cats')


# Lists

Until this point, we might as well have been talking about C/C++ or MATLAB as ints, floats, bools, strings, etc. are all pretty standard.  Python's lists, however, set Python apart as they are incredibly easy to use and versatile.  A list is just an ordered set of items and in Python, those items can be whatever you want.  We define a list by putting the elements in square brackets - `[]` - and we separate elements with the comma.

Here's an example to run:


In [None]:
a=[1,2,3]
print(a)

b=['foo','bar', 3.14]
print(b)

c=['start',a,'mid',b,'end']
print(c)


It's really as easy as that.  Just put things - any things - in square brackets and separate them with commas.  Heck, put lists inside of lists.  Python doesn't care.

Note, this is one-dimensional: it's a list, not an array (one of the reasons we use NumPy and its ndarray).


## Can slice, dice, and even julienne your lists

Commit now.  Zero is a number and everything starts at 0.  Get your head in that space and life will be easier in Python where indices are 0-indexed (first one is 0 not 1).  So `a[0]` is always the first item in `a`.

With that out of the way, in our prior example, we could say something like `b[1]` and Python would evaluate this to `'bar'`.  But we often don't just want elements, but a range of elements.  We specify the start and stop for the range by `a[start:stop]`. 

So far so good, but there's a classic hangup here for folks.  When giving a range like `a[index1:index2]`, know that the **_index2 is not included_**.  So `a[0:2]` gives `a[0]` and `a[1]` but not `a[2]`.  You can think of it as `a[start:don't_include]`, i.e., the items from `start` through `stop-1`, if you like.  While it seems odd, it does have its uses in that if you're going from the beginning of the list and want N items, it's just `a[0:N]` (since Python is zero-indexed).
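To make the "stop is not included" rule concrete, here's a small sketch using a throwaway list whose items name their own index:

```python
a = ['zero', 'one', 'two', 'three', 'four']

first_two = a[0:2]    # items 0 and 1; item 2 is not included
middle = a[1:4]       # items 1, 2, and 3
N = 3
first_n = a[0:N]      # the first N items

print(first_two)
print(middle)
print(first_n)
print(len(middle))    # a slice has stop - start items (here 4 - 1)
```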

That `c` list still exists for you here (if you ran the code above).  ***Exercise - Try printing out _'start', [1, 2, 3], 'mid', ['foo', 'bar', 3.14]_ in the code below and _[1, 2, 3], 'mid', ['foo', 'bar', 3.14], 'end'_ as well.***

In [None]:
print(c[:])
print(c)

print(c[0:4])
print(c[:4])
print(c[:-1])

print(c[1:5])
print(c[1:])

## The julienne bit

Whoever wrote Python really wanted to be able to have these lists easy to work with as there are a number of neat features here.  One is a negative index feature with `a[-1]` being the last item, `a[-2]` being the next-to-last, etc.  Since this is 0-indexed it kinda makes sense.  In the above, `a` is 3-items long and the first is `a[0]`, the last `a[2]`.  So, _a_'s length of 3 minus 1 is 2, and the last item is `a[2]` aka `a[-1]`.  Of course `a[-2]` then is really `a[1]` here -- aka 2nd from the end.

In addition, when using ranges, you can leave out one or both sides and the start or end of the list will be used as the default, so you don't need to know how long the list actually is, for example.



*   `a[:3]` - From start to 2
*   `a[2:]` - From 2 to end
*   `a[:]` - From start to end (aka all items)

You also have a _stride_ you can apply.  The default stride is 1, meaning you go from start up to end (but not end) hitting every item.  If the stride is 2, it'll skip over one.  So `a[0:5:2]` would give `a[0]`, `a[2]`, and `a[4]`.  Strides can be negative to go backwards as well.
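Here's a sketch that pulls the negative indices, omitted bounds, and strides together on one list (the values are chosen so each item is ten times its index):

```python
a = [0, 10, 20, 30, 40, 50]

print(a[-1], a[-2])   # last and next-to-last items: 50 40
print(a[:3])          # from the start up to (not including) index 3: [0, 10, 20]
print(a[2:])          # from index 2 to the end: [20, 30, 40, 50]
print(a[0:5:2])       # every other item from 0 up to 5: [0, 20, 40]
print(a[::-1])        # negative stride walks backwards: [50, 40, 30, 20, 10, 0]
```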

**_Exercise - Use the 's' list below to create the following, using only the start, stop, stride indexing_**

`s=['c','d','a','o','t','g','s']`

And create:
```
['c', 'a', 't', 's']
['d', 'o', 'g']
['s', 'o']
```

In [None]:
s=['c','d','a','o','t','g','s']

print(s[::2])
print(s[1::2])
print(s[-1:-5:-3])

['c', 'a', 't', 's']
['d', 'o', 'g']
['s', 'o']


## Ranges

Here, we might as well introduce the `range` function that lets you create a, well, range of elements.  The format is `range([start], stop[, step])`. So, you **_can_** give it a start and a step but you **_must_** give it a stop.  The start defaults to 0, the step defaults to 1, so if you say `range(4)` it will give you a list that is 4 items long, starting at 0 (aka 0-3).

Now, technically, I just lied.  In Python 2, `range` creates and returns a list.  In Python 3, it creates a lazy `range` object that produces its values on demand (what Python 2 called `xrange`).  If you're using this for something like a for-loop (which we'll get to), it really doesn't matter.  If you want an honest-to-goodness list, you need to do something like `list(range(10))` or `list(range(2,10,3))`.
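A short sketch of `range` in Python 3, showing the difference between the range object itself and the list you get by wrapping it in `list()`:

```python
r = range(4)
print(r)                         # a range object, not a list
print(list(r))                   # [0, 1, 2, 3]
print(list(range(2, 10, 3)))     # start 2, stop before 10, step 3: [2, 5, 8]
print(list(range(10, 0, -2)))    # negative step counts down: [10, 8, 6, 4, 2]

# In a for-loop you can use the range object directly
total = 0
for i in range(4):
    total += i
print(total)                     # 0 + 1 + 2 + 3 = 6
```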


## Extending lists, inserting into lists, etc.

Python lists aren't MATLAB or NumPy vectors/arrays, so things that look like math aren't math.  Lists are just ordered sets of things.  Keep this clear in your head as we introduce NumPy later.  It does mean we can have a simple syntax to do things like stick two lists (a and b) together:



In [1]:
a=[1,2,3]
b=['foo','bar', 3.14]
c=a + b
print(c)

[1, 2, 3, 'foo', 'bar', 3.14]


This makes a new list, calls it `c` and puts the concatenation of `a` and `b` into it.  If you'd wanted to store that in `a` (aka, append `b` onto `a`) rather than save it in a new `c`, you could have just done `a=a+b`.

There also is `a.append()`.  This will append a single item to `a` (this is destructive, meaning calling this alters `a`).


In [2]:
a=[1,2,3]
b=['foo','bar', 3.14]
print(a)
a.append(4)
print(a)
a.append(b)
print(a)


[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4, ['foo', 'bar', 3.14]]



Note that if `b` is a list, this will turn the list `b` into a single item and append it as the next item in `a` rather than concatenating, giving you something like `[1, 2, 3, 4, ['foo', 'bar', 3.14]]`.  Lists can have lists as elements.  That may not be what you want, though, so we also have things like:

*   `a.extend(b)`: Take whatever items in list `b` that are there and put each into _a_ (destructive). 
*   `a.insert(index,item)`: Put a single _item_ into the place just before _index_ (destructive)
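Here's a sketch of those two in action, reusing the `a` and `b` lists from before (the `'new'` item is just a placeholder value):

```python
a = [1, 2, 3]
b = ['foo', 'bar', 3.14]

a.extend(b)           # each item of b is added to the end of a
print(a)              # [1, 2, 3, 'foo', 'bar', 3.14]

a.insert(1, 'new')    # 'new' goes in just before what was at index 1
print(a)              # [1, 'new', 2, 3, 'foo', 'bar', 3.14]

print(b)              # b itself is untouched
```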

### Quick diversion into object-oriented programming

Note these last 3 (`append`, `extend`, and `insert`) are calls to functions built into each and every list.  Functions that belong to an object like this are called _methods_.  Everything is an "object" and can (and does) have many built-in functions.  This is what "classes" are all about.  We have a **variable** `a` that is an **object** of **class** list. 

Try this:

In [3]:
dir(a)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

See all the stuff in there?  Want to know what something like "reverse" does?

In [4]:
help(a.reverse)

Help on built-in function reverse:

reverse() method of builtins.list instance
    Reverse *IN PLACE*.



_Remember, you can open a Scratch cell in Google Colab (Insert, Scratch code cell) and use that for things like "help" commands._

Each thing in Python knows how to do a lot of stuff to itself.  This is what "object-oriented programming" is all about.

You can, when you want to, take an existing class and make your own version of this class to add new features you want to have so you have a customized version of something like Python's list.  Maybe, you'd want to write a function `mathadd()` so that you could write `a.mathadd(b)` and have it actually return a+b in a numeric sense rather than a concatenation.  If you did a lot of that, you'd have written NumPy.
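To give a feel for what that might look like, here's a minimal sketch.  The class name `MathList` is made up for this example, and the choice to raise an error on mismatched lengths is an assumption; a real numeric library does far more than this:

```python
class MathList(list):
    """Hypothetical list subclass that adds element-wise addition."""

    def mathadd(self, other):
        # Assumption: both sequences must be the same length
        if len(self) != len(other):
            raise ValueError('lists must be the same length')
        # Add matching elements pairwise and return a new MathList
        return MathList(x + y for x, y in zip(self, other))


a = MathList([1, 2, 3])
b = [10, 20, 30]

print(a.mathadd(b))   # numeric addition: [11, 22, 33]
print(a + b)          # + is still concatenation: [1, 2, 3, 10, 20, 30]
```

Because `MathList` inherits from `list`, everything a regular list can do (indexing, `append`, `len`, etc.) still works on it.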

## Removing elements

Want to remove some elements?  One way is to use index ranges to keep what you want with something like `a=a[0:2]`.  Another is to use the _del_ statement to destructively remove an element with `del a[place]`.
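A small sketch of both approaches.  Slicing builds a new list and leaves the original alone, while `del` changes the list in place (and also accepts a range of indices):

```python
a = [1, 2, 3, 4, 5]

kept = a[0:2]     # new list with just the first two items; a is unchanged
print(kept)       # [1, 2]
print(a)          # [1, 2, 3, 4, 5]

del a[0]          # removes the first item in place
print(a)          # [2, 3, 4, 5]

del a[1:3]        # removes the items at index 1 and 2
print(a)          # [2, 5]
```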



## Other handy list functions


### How big is the list?


```
len(a)
a.__len__()
```


_Thought: Do you find it strange that these both exist?  Why do both exist?  What's the difference between them?_


### Finding things in lists

How many times does 2 appear in _a_?


```
a.count(2)
```


Is 2 in the list _a_?


```
2 in a
```


What is the **_first_** occurrence of 2 in _a_?


```
a.index(2)
```


Note, if the item is not in there, `a.index()` raises an exception (a `ValueError`).  If you don't want to handle the exception, check to see if it's in there first.
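Here's a sketch of the three lookups, plus both ways of coping with a missing item.  The names `target` and `position` are just for illustration:

```python
a = [1, 2, 3, 2]

print(a.count(2))    # 2 appears twice
print(2 in a)        # True
print(a.index(2))    # first occurrence is at index 1

target = 7

# Option 1: check first, then ask for the index
if target in a:
    position = a.index(target)
else:
    position = None
print(position)

# Option 2: ask anyway and handle the exception
try:
    position = a.index(target)
except ValueError:
    print(f'{target} is not in the list')
```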


### Copy-on-create

 File this away as a handy function that Python lists can do for you that NumPy's arrays (which you'll use a ton) won't do.  Let's say that you have:


```
a=[1,2,3]
b=a*3 
```


What do you think _b_ has become?  It's not `[3,6,9]` but it'll make 3 copies of `a` (aka `[1,2,3,1,2,3,1,2,3]`).  It's just like the `a+b` thing above, but it's `a+a+a`.  Remember, lists are lists not matrices / vectors (contrary to MATLAB) and they can have anything in them - strings, other lists, whatever you want (and that might not make sense for math).  If you want math here, you'll need to either use NumPy (which would make `b` into `[3,6,9]`) or use a list comprehension (more on that later).
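As a quick preview of the list-comprehension route (the full explanation comes later), here's a sketch comparing it with the repetition behavior.  The names `repeated` and `scaled` are just for illustration:

```python
a = [1, 2, 3]

repeated = a * 3                  # three copies of a, stuck together
scaled = [x * 3 for x in a]       # multiply each item by 3

print(repeated)   # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(scaled)     # [3, 6, 9]
```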

## Tuples

Whether it's a tuh-pul or too-pull, it's a read-only list.  Once made, you can't change them, but they're fast, and their immutable nature can make them safer.  To make a tuple, just use `()` instead of the `[]` you used for lists.

One nice plus here is that you can assign multiple values at once (this "unpacking" works on any sequence, so the right-hand side can be a tuple or, as here, a list):
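A brief sketch of making a tuple and what happens if you try to change it.  The one-item case is a common gotcha: it's the trailing comma, not the parentheses, that makes it a tuple.

```python
point = (3, 4)
print(point[0])          # indexing works just like a list

try:
    point[0] = 10        # but assignment is not allowed
except TypeError as err:
    print('Cannot change a tuple:', err)

single = (5,)            # one-item tuple needs the trailing comma
not_a_tuple = (5)        # this is just the int 5 in parentheses
print(type(single), type(not_a_tuple))
```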


```
a=[1,2,3]
(x,y,z)=a
```


This will make x=1, y=2, z=3.  This comes in really handy with functions:


```
def circ_and_area(r):
   circ = 2 * 3.14159 * r
   area = 3.14159 * r * r
   return (circ,area)

a=7
circ_a, area_a = circ_and_area(a)
```


So, we package two numbers (the result of the circumference and area calculation) into a tuple and return that tuple from the function `circ_and_area()`.  We assign that returned tuple into the variables we want to use: `circ_a` and `area_a`.

# NumPy Arrays

We're going to use these **a lot** in the course as they're the fundamental building block of the scientific part of Scientific Python.  When NumPy was made, the goal was to give Python the same kind of vectors and arrays we have in MATLAB.  So we can say things like:

In [6]:
import numpy as np
a=np.random.randn(100)
sse=np.power(a,2).sum()
rms=np.sqrt(sse/100)
print('SSE={:.2f}  RMS={:.2f}'.format(sse,rms))

SSE=127.26  RMS=1.13


This starts by making `a` into an array of 100 random Gaussian numbers.  Then it computes the sum-squared error (here, error is deviation from 0) and stores that in `sse`.  Then, this computes the square root of the mean of that and stores it in the variable `rms`.  It's NumPy that lets us do lots of math and manipulation of data large-scale.  FWIW, NumPy is really a Python library (just a big set of functions and classes) that accesses core math routines often written in C for high-speed computations.


# Coercing between data types
Have a string that represents a number and want the number? Have an int and want it to be a float?  Want a string of a number?  You can coerce variables into another type by using the `int`, `float`, `str`, `list`, etc. functions, so long as the conversion makes sense (`float('2.3')` works, `float('foo')` raises an error).  Be aware that some conversions quietly drop information: `int(3.9)` gives `3`.  So, I might say:

In [2]:
a=1
b=3.1
c='2.3'

d=b+float(c)
print(d)

print(str(a) + ' is less than ' + str(b))

5.4
1 is less than 3.1
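A few more conversions are worth trying, including one that fails.  This is just a sketch of standard behavior of the built-in conversion functions:

```python
print(int('42') + 1)     # string to int, then math: 43
print(float('2.3'))      # string to float: 2.3
print(int(3.9))          # float to int drops the fractional part: 3
print(list('abc'))       # string to list of characters: ['a', 'b', 'c']

# A string holding a decimal number can't go straight to int
try:
    int('2.3')
except ValueError as err:
    print('Conversion failed:', err)

print(int(float('2.3')))   # go through float first: 2
```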


# Native Python types we'll use less

## Sets

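Sets are unordered groups of unique items that you encase in `{}` (e.g., `{1, 2, 3}`).  Unlike tuples, they are mutable (you can add and remove items; `frozenset` is the immutable version), and unlike both lists and tuples, they're not ordered, so you can't index them.  You can do things like union and intersection on them if you're into that sort of thing.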
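If you are into that sort of thing, here's a small sketch.  Since sets have no order, the printed order of the items may differ from what you typed:

```python
a = {1, 2, 3, 3}      # the duplicate 3 is stored only once
b = {3, 4}

print(len(a))         # 3 items, not 4
print(a | b)          # union: everything in either set
print(a & b)          # intersection: only what's in both
print(2 in a)         # membership test

# A handy use: strip duplicates out of a list
unique = set(['x', 'y', 'x'])
print(len(unique))
```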


## Dictionaries

Dictionaries are key-value pairs that are like sets, but indexed nicely by these keys.  You might define one like this:


```
params = {'ID': 47183,
'Duration': 2.0,
'ISI':0.5,
'TrialOrder': [1,5,2,3,4]}
```


and then access the ISI portion via `params['ISI']`.  We saw a dictionary in the humansize example:


```
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}
```


So, we have two key-value pairs.  Each key (1000 or 1024) has a value.  That value is a list.  When we wanted the value associated with the 1000 key, we said `SUFFIXES[1000]` and it returned a list.  We could then get the nth value in that list, etc.  Note how in the first example, we used a string as the key and here we used a number. Python is fine with almost anything being the key, as long as it's immutable (hashable): numbers, strings, and tuples work, but a list does not.
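Here's a sketch of working with both dictionaries.  The `'Subject'` key and its value are made up for illustration, and `.get()` is a standard dictionary method that returns a fallback instead of raising an error when a key is missing:

```python
params = {'ID': 47183,
          'Duration': 2.0,
          'ISI': 0.5,
          'TrialOrder': [1, 5, 2, 3, 4]}

print(params['ISI'])                  # look up a value by its key
print(params['TrialOrder'][0])        # the value is a list, so index into it

params['Subject'] = 'S01'             # add a new key-value pair (hypothetical key)
print(params.get('Missing', 'n/a'))   # fallback for a key that isn't there

for key, value in params.items():     # walk through all the pairs
    print(key, value)

SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}
print(SUFFIXES[1000][1])              # second suffix for the 1000 key
```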
