In [1]:
# One of the reasons we want to use Python is to utilise its versatility when it comes to object types.

In [2]:
# In what follows, we will go through the basics of some data structures.

# Strings, Characters, Calculations

In [3]:
# Strings are words; characters are the elements in words; calculations on words are possible in Python! For example,

In [4]:
hello_me = "hello me"
print(hello_me)

hello me


In [5]:
# One can also put two words together:

In [6]:
ystday = "from yesterday"
print(hello_me + " " + ystday)

hello me from yesterday


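Note: the syntax is variable + " " + variable, where the middle string is a single space. This works because both operands are strings, and for strings Python defines `+` as concatenation. Python is **dynamically typed**, i.e. the type belongs to the value rather than being declared for the variable, but the operands must still be compatible: adding a string to a number raises an error.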
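As a minimal sketch of the point above: `+` joins two strings, but fails when one operand is a number. The variable `n_guests` is a hypothetical name introduced only for this illustration.

```python
hello_me = "hello me"
ystday = "from yesterday"

# + concatenates because both operands are strings
greeting = hello_me + " " + ystday
print(greeting)

# n_guests is a hypothetical integer used only for illustration
n_guests = 3
try:
    message = hello_me + n_guests  # str + int is not defined
except TypeError as err:
    message = "TypeError: " + str(err)
print(message)

# Converting explicitly, or using an f-string, avoids the error
converted = hello_me + " " + str(n_guests)
formatted = f"{hello_me} {n_guests}"
print(converted, "|", formatted)
```

The explicit `str()` call and the f-string produce the same text; which one to use is mostly a matter of readability.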

It is possible to use Python strictly as a calculator (though the motivation for that is unclear to me); in particular, it can be a very sophisticated calculator that is fairly quick.

In [7]:
1+1

2

In [8]:
29021+1

29022

In [9]:
# One might expect + to add two lists of numbers element-wise, but that is NOT what it does:

In [10]:
print([1,2] + [3,4]) # This is concatenating two lists
print(sum([1,2]+[3,4])) # This adds all four numbers

[1, 2, 3, 4]
10


In [11]:
# To do this, we need to develop some ideas for what other data types there are.

Numbers, strings, lists, dictionaries, tuples, files, and sets are generally considered to be the core object (data) types. Types, None, and Booleans are sometimes classified this way as well. There are multiple number types (integer, floating point, complex, fraction, and decimal) and multiple string types (simple strings and Unicode strings in Python 2.X, and text strings and byte strings in Python 3.X).

* Lists are kind of like vectors (the ones created using c()) in R, except that a single list can hold elements of different types.
* Sets are unordered collections of unique elements, roughly what unique() gives you in R.
* Dictionaries and tuples are **incredibly powerful** tools in Python, which we will use below. Dictionaries map keys to values; tuples are like lists that cannot be changed in any way.
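The differences between lists, sets, and tuples can be sketched in a few lines. The names `numbers`, `unique_numbers`, and `point` are hypothetical and chosen only for this example.

```python
# A list is ordered and mutable
numbers = [3, 1, 2]
numbers.append(1)

# A set keeps only unique elements and has no guaranteed order
unique_numbers = set(numbers)

# A tuple is ordered like a list, but cannot be changed after creation
point = (1, 2)
try:
    point[0] = 5
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(numbers)
print(sorted(unique_numbers))  # sorted() returns a list in ascending order
print(point, tuple_changed)
```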

In [12]:
# A dictionary is a collection of key: value pairs.
# Caution: the name `dict` shadows Python's built-in dict type for the rest of the session.
dict = {"a":1, "b":2}

In [13]:
[dict["a"], dict["b"]] # returns the VALUES

[1, 2]

Going from values back to keys is actually much harder, because a dictionary is designed for looking up values by key, not the other way round. So, we make use of the method items(), which returns the (key, value) pairs.

In [14]:
for (k,v) in dict.items():
    print("The key", k, "corresponds to the value", v)

The key a corresponds to the value 1
The key b corresponds to the value 2
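A rough sketch of going from a value back to its key(s) using items(). The dictionary `scores` is hypothetical and separate from the one above; it deliberately contains a repeated value to show why the reverse direction is awkward.

```python
# `scores` is a hypothetical dictionary used only for illustration
scores = {"a": 1, "b": 2, "c": 2}

# All keys that map to a given value (there may be more than one)
keys_for_2 = [k for k, v in scores.items() if v == 2]
print(keys_for_2)

# Inverting the dictionary only works cleanly when values are unique;
# here "c" overwrites "b" because both map to 2
inverted = {v: k for k, v in scores.items()}
print(inverted)
```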


Because dictionaries are mutable, we can do ALL sorts of fun things, including deleting and adding elements.

In [15]:
del dict["a"]
dict["e"] = 9
dict["f"] = 7
dict["g"] =10
dict

{'b': 2, 'e': 9, 'f': 7, 'g': 10}

In [16]:
dict["a","c","e"] = [3,4,6] # the key is the tuple ("a","c","e") and the value is a list; a list cannot be a key because it is mutable

In [17]:
dict

{'b': 2, 'e': 9, 'f': 7, 'g': 10, ('a', 'c', 'e'): [3, 4, 6]}

In [18]:
for (k,v) in dict.items():
    print("The key", k, "corresponds to the value(s)", v)

The key b corresponds to the value(s) 2
The key e corresponds to the value(s) 9
The key f corresponds to the value(s) 7
The key g corresponds to the value(s) 10
The key ('a', 'c', 'e') corresponds to the value(s) [3, 4, 6]


Iterating over pairs like this is VERY powerful, because the same idea lets us solve one of the problems we found earlier: that we cannot add vectors component-wise as easily as we did in R.

In [19]:
list1 = (1,2) # despite the names, these are tuples; zip() works on any sequence
list2 = (3,4)

In [20]:
[sum(x) for x in zip(list1,list2)] # zip() pairs up elements by position, giving (1,3) and (2,4)

[4, 6]

In [21]:
# A less clever, but equally useful, option:
from operator import add
list(map(add, list1, list2))

[4, 6]
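To see what zip() actually produces, it may help to materialise the pairs. This is a small sketch; the names `pairs`, `elementwise`, and `truncated` are introduced here for illustration only.

```python
list1 = (1, 2)
list2 = (3, 4)

# zip() pairs up elements by position; list() makes the pairs visible
pairs = list(zip(list1, list2))
print(pairs)

# Unpacking each pair gives an explicit element-wise sum
elementwise = [a + b for a, b in zip(list1, list2)]
print(elementwise)

# zip() stops at the shorter input, silently dropping extra elements
truncated = [a + b for a, b in zip((1, 2, 3), (10, 20))]
print(truncated)
```

The last case is worth keeping in mind: unlike R, there is no recycling and no warning when the lengths differ.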

A slightly more sophisticated use of paired data is to sort a BUNCH of people by last name, say, on a wedding invitation list.

In [43]:
lst_names = ["f","g","r","h","a","b","i"]
fst_names = ["w","x","c","j","q","l","z"]
dd = zip(fst_names,lst_names)

In [44]:
sorted(dd) # This sorts by the first element of each pair (the first name)

[('c', 'r'),
 ('j', 'h'),
 ('l', 'b'),
 ('q', 'a'),
 ('w', 'f'),
 ('x', 'g'),
 ('z', 'i')]

In [46]:
sorted(zip(fst_names,lst_names), key=lambda x: x[1]) # This sorts by the second element of each pair (the last name); key is a function applied to each pair before comparing

[('q', 'a'),
 ('l', 'b'),
 ('w', 'f'),
 ('x', 'g'),
 ('j', 'h'),
 ('z', 'i'),
 ('c', 'r')]
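Two details of the sorting cells can be sketched as follows: a zip object can only be consumed once, which is why the second call zips the names again, and `operator.itemgetter(1)` is a standard-library alternative to the lambda. The names `pairs`, `by_first`, `leftover`, and `by_last` are hypothetical.

```python
from operator import itemgetter

fst_names = ["w", "x", "c", "j", "q", "l", "z"]
lst_names = ["f", "g", "r", "h", "a", "b", "i"]

# zip() returns a one-shot iterator: once sorted() has consumed it, it is empty
pairs = zip(fst_names, lst_names)
by_first = sorted(pairs)
leftover = sorted(pairs)

# itemgetter(1) plays the same role as lambda x: x[1]
by_last = sorted(zip(fst_names, lst_names), key=itemgetter(1))

print(by_first[0], leftover, by_last[0])
```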

Another application: counting. This can get complicated, but with Python dictionaries, it is MUCH easier.

In [51]:
letter_counts = {}
word = "Kentucky"
for letter in word:
    letter_counts[letter] = letter_counts.get(letter,0) + 1

letter_counts

{'K': 1, 'e': 1, 'n': 1, 't': 1, 'u': 1, 'c': 1, 'k': 1, 'y': 1}

We start with an empty dictionary. For each letter in the string, we find the current count (possibly zero) and increment it. At the end, the dictionary contains pairs of letters and their frequencies.
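The same counting pattern is available ready-made as `collections.Counter` in the standard library. The sketch below is an alternative to the loop, not a replacement for understanding it.

```python
from collections import Counter

word = "Kentucky"

# Counter builds the same letter -> frequency mapping in one step
letter_counts = Counter(word)
print(letter_counts)

# Uppercase "K" and lowercase "k" are counted separately
print(letter_counts["K"], letter_counts["k"])

# A missing key returns 0 instead of raising KeyError
print(letter_counts["z"])
```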

We can also ignore case when counting.

In [64]:
letter_counts = {}
word = "ThiS is String with Upper and lower case Letters"
for letter in word.lower():
    letter_counts[letter] = letter_counts.get(letter,0) + 1

# Rebuild the dictionary with keys in alphabetical order
# (one pass over the sorted keys is enough; no outer loop is needed)
alph_letter_counts = {}
for x in sorted(letter_counts.keys()):
    alph_letter_counts[x] = letter_counts[x]

print(alph_letter_counts)

{' ': 8, 'a': 2, 'c': 1, 'd': 1, 'e': 5, 'g': 1, 'h': 2, 'i': 4, 'l': 2, 'n': 2, 'o': 1, 'p': 2, 'r': 4, 's': 5, 't': 5, 'u': 1, 'w': 2}
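The output above also counts the space character. If only letters are wanted, one possible variation, sketched here with `collections.Counter` and `str.isalpha()`, is to filter while counting. The names `letters_only` and `alphabetical` are hypothetical.

```python
from collections import Counter

word = "ThiS is String with Upper and lower case Letters"

# Lowercase first, then keep only alphabetic characters (drops the spaces)
letters_only = Counter(ch for ch in word.lower() if ch.isalpha())

# A dictionary comprehension over the sorted keys gives alphabetical order
alphabetical = {k: letters_only[k] for k in sorted(letters_only)}
print(alphabetical)

# most_common(n) lists the n most frequent letters with their counts
print(letters_only.most_common(3))
```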


In [None]:
# This is a little complicated, but this is basically how things work here.