# Types and Sequences

The absence of static typing in Python doesn't mean that there aren't types. Python has a built-in function called `type` that shows the type of the object a given reference points to. As mentioned in the previous course, some of the common types include strings, the none type, integers, and floats, and functions have a type as well.

In [1]:
type("I am a string")

str

In [2]:
type(None)

NoneType

In [3]:
type(1)

int

In [4]:
type(1.2)

float

In [10]:
x = [1,2,3,4,5,6]
type(x)

list

In [5]:
type(sum)

builtin_function_or_method

Typed objects have properties associated with them, and these properties can be data or functions. Python is built around different kinds of sequence and collection types. There are three native kinds of collections that we have talked about before: tuples, lists, and dictionaries.
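As a small illustration of properties that are data versus properties that are functions, the sketch below looks at a complex number and a string. The variable names are made up for this example.

```python
# A complex number carries data attributes as well as methods.
z = 3 + 4j
real_part = z.real          # data attribute: the real component
conjugate = z.conjugate()   # method: a function attached to the object

# Strings expose methods too.
text = "carrot"
shouted = text.upper()

print(real_part, conjugate, shouted)
```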

Let's have a brief review here. A tuple is a sequence of values that is itself immutable. That means a tuple has items in order, but it cannot be changed once created. We write tuples using parentheses, and we can mix types in the contents of a tuple.

In [None]:
x = (1, 'a', 2, 'b')
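
One way to see this immutability is to attempt an item assignment. The sketch below reuses the same tuple and catches the `TypeError` that Python raises. The `changed` flag is only there for illustration.

```python
x = (1, 'a', 2, 'b')

try:
    x[0] = 99  # tuples do not support item assignment
except TypeError as err:
    changed = False
    print("Cannot modify a tuple:", err)
else:
    changed = True
```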

Lists are very similar, but they are mutable. You can change the number of elements as well as the element values.

In [11]:
x = [1, 2, 3, 4, 5, 6]

There are a couple of different ways to change the contents of a list. One is the `append` method, which adds a new item to the end of the list:

In [12]:
x.append(7)
print(x)

[1, 2, 3, 4, 5, 6, 7]


Both lists and tuples are iterable types, so you can write loops to go through every value they hold. You can use the normal `for` statement.
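
A minimal sketch of looping over both kinds of sequence follows. The variable names `numbers`, `pair`, `total`, and `seen` are chosen for this example only.

```python
numbers = [1, 2, 3, 4, 5, 6, 7]
pair = (1, 'a', 2, 'b')

# Loop over a list, accumulating a running total.
total = 0
for item in numbers:
    total += item

# The same statement works on a tuple.
seen = []
for item in pair:
    seen.append(item)

print(total, seen)
```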

Lists and tuples can also be accessed using the square bracket `[]` operator, which is called the indexing operator. The first item is at position 0, and to get the length of a list or tuple we use the built-in `len` function. There are some other common functions that you might expect, like `min` and `max`, which find the minimum or maximum value in a given list or tuple.
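
The indexing operator and these built-in functions can be sketched as follows, using the same list of numbers as before.

```python
x = [1, 2, 3, 4, 5, 6, 7]

first = x[0]            # positions start at 0
length = len(x)         # number of items
last = x[length - 1]    # the final item sits at position len(x) - 1
smallest = min(x)
largest = max(x)

print(first, last, length, smallest, largest)
```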

Python lists and tuples also support some basic mathematical operators. The plus sign (`+`) concatenates lists, for instance, and the asterisk (`*`) repeats the values of a list. A very common operator is the `in` operator. It tests membership and returns a boolean value of `True` or `False` depending on whether an item is in a given list.
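
A short sketch of these three operators follows. The values are arbitrary.

```python
combined = [1, 2] + [3, 4]    # concatenation builds a new list
repeated = [1] * 3            # repetition copies the values three times
has_one = 1 in [1, 2, 3]      # membership test on a list
has_nine = 9 in (1, 2, 3)     # works on tuples as well

print(combined, repeated, has_one, has_nine)
```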

**Dictionaries** are similar to lists and tuples in that they hold a collection of items, but they are labeled collections that **are accessed by key rather than by position** (since Python 3.7 a dictionary does remember insertion order, but you still cannot index it by position). This means that for each value you insert into the dictionary, you must also give a **key** to get that value out. In other languages this structure is often called a **map**. In Python we use curly braces (`{}`) to denote a dictionary. Here is an example that links a few labels to values. Each item of the dictionary is written as a key and a value separated by a colon, and you can retrieve the value for a given key using the indexing operator.

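The types you use for keys or values in the dictionary can be almost anything: values can be any object, while keys must be hashable (strings, numbers, and tuples of hashable items are common choices). You can also mix types if you prefer.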
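
As a small illustration of mixing types, the sketch below uses a string, an integer, and a tuple as keys in one dictionary. It also shows that a list is rejected as a key because lists are not hashable. The dictionary contents are made up for this example.

```python
mixed = {"Rabbit": "Carrot", 7: "seven", (1, 2): [1, 2]}

# Look up a value using a tuple key.
value_for_tuple_key = mixed[(1, 2)]

# A list cannot be a key, so this assignment raises TypeError.
try:
    mixed[[1, 2]] = "list key"
except TypeError:
    list_key_allowed = False
else:
    list_key_allowed = True

print(value_for_tuple_key, list_key_allowed)
```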

In [4]:
x = {"Rabbit": "Carrot", "Panda": "Banboo", "Jan": 1}
x['Rabbit']

'Carrot'

We can add new items to the dictionary using the same indexing operator we are used to, just on the left-hand side of an assignment statement.

In [5]:
x['Feb'] = 2

In [6]:
x

{'Feb': 2, 'Jan': 1, 'Panda': 'Banboo', 'Rabbit': 'Carrot'}

You can iterate over all of the items in a dictionary in a number of ways. First, you can iterate over all of the keys and pull the contents out as you see fit.

In [7]:
for key in x:
    print(x[key])

Carrot
Banboo
1
2


Or you can iterate over the values and just ignore the keys. 

In [9]:
for value in x.values():
    print(value)

Carrot
Banboo
1
2


Finally, you can iterate over both the keys and the values at once using the `items` method.

In [10]:
for key, value in x.items():
    print(key)
    print(value)

Rabbit
Carrot
Panda
Banboo
Jan
1
Feb
2


This last example is a little bit different, and it's an example of something called **unpacking**. **In Python, you can take a sequence, such as a list or a tuple of values, and unpack its items into different variables through assignment in one statement.** Here's another example of that, where we have a tuple that has a first name, a last name, and a favourite food:

In [12]:
x = ('Happy', 'Rabbit', 'Carrot')
fname, lname, food = x

In [13]:
fname

'Happy'

In [14]:
lname

'Rabbit'

In [15]:
food

'Carrot'

# Read and Write CSV Files

In [21]:
import csv
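# IPython magic: show floating point results with 2 decimal places
%precision 2

# Each row becomes a dict keyed by the header row; every value is read as a string
with open('data/SegData.csv', newline='') as csvfile:
    segdat = list(csv.DictReader(csvfile))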

In [23]:
segdat[:3]

[OrderedDict([('age', '57'),
              ('gender', 'Female'),
              ('income', '120963.400958119'),
              ('house', 'Yes'),
              ('store_exp', '529.134363087558'),
              ('online_exp', '303.512474550009'),
              ('store_trans', '2'),
              ('online_trans', '2'),
              ('Q1', '4'),
              ('Q2', '2'),
              ('Q3', '1'),
              ('Q4', '2'),
              ('Q5', '1'),
              ('Q6', '4'),
              ('Q7', '1'),
              ('Q8', '4'),
              ('Q9', '2'),
              ('Q10', '4'),
              ('segment', 'Price')]),
 OrderedDict([('age', '63'),
              ('gender', 'Female'),
              ('income', '122008.104949511'),
              ('house', 'Yes'),
              ('store_exp', '478.005780681606'),
              ('online_exp', '109.529710262832'),
              ('store_trans', '4'),
              ('online_trans', '2'),
              ('Q1', '4'),
              ('Q2', '1'),
       

In [4]:
len(segdat)

1000

In [5]:
segdat[0].keys()

odict_keys(['age', 'gender', 'income', 'house', 'store_exp', 'online_exp', 'store_trans', 'online_trans', 'Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6', 'Q7', 'Q8', 'Q9', 'Q10', 'segment'])

In [6]:
sum(float(d['store_exp']) for d in segdat)/len(segdat)

1356.85

In [13]:
segment = set(d['segment'] for d in segdat)
segment

{'Conspicuous', 'Price', 'Quality', 'Style'}

In [15]:
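# Average store expense per segment, collected as (segment, mean) tuples
StoreExpBySegment = []
for s in segment:
    sumexp = 0
    segmentcount = 0
    for d in segdat:
        if d['segment'] == s:
            # CSV values are strings, so convert before adding
            sumexp += float(d['store_exp'])
            segmentcount += 1
    StoreExpBySegment.append((s, sumexp/segmentcount))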

In [18]:
StoreExpBySegment.sort(key=lambda x: x[0])

In [19]:
StoreExpBySegment

[('Conspicuous', 5214.19),
 ('Price', 499.85),
 ('Quality', 301.16),
 ('Style', 368.05)]
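
The cells above only read a CSV file. For the writing half, one possible approach is `csv.DictWriter`, sketched below under a few assumptions: the output file name `store_exp_by_segment.csv`, the column names `segment` and `mean_store_exp`, and the temporary directory are all hypothetical, and the averages are copied from the output above so that the block runs on its own.

```python
import csv
import os
import tempfile

# Averages copied from the output above; in the notebook you could use
# StoreExpBySegment directly instead.
store_exp_by_segment = [
    ('Conspicuous', 5214.19),
    ('Price', 499.85),
    ('Quality', 301.16),
    ('Style', 368.05),
]

# Hypothetical output location inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'store_exp_by_segment.csv')

# newline='' lets the csv module control line endings itself.
with open(out_path, 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['segment', 'mean_store_exp'])
    writer.writeheader()
    for seg, mean_exp in store_exp_by_segment:
        writer.writerow({'segment': seg, 'mean_store_exp': mean_exp})

# Read the file back to confirm the round trip; values come back as strings.
with open(out_path, newline='') as csvfile:
    rows = list(csv.DictReader(csvfile))

print(rows[0]['segment'], rows[0]['mean_store_exp'])
```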

# Python Dates and Times

Dates and times can be stored in many different ways. One of the most common legacy methods for storing the date and time in online transaction systems is based on the offset from the epoch, which is January 1, 1970. There's a lot of historical cruft around this, but it's not uncommon to see systems storing the date of a transaction in seconds or milliseconds since this date. So if you see large numbers where you expect to see a date and time, you'll need to convert them to make sense of the data.

In Python, you can get the current time as the number of seconds since the epoch using the `time` module.

In [25]:
import datetime as dt
import time as tm
tm.time()

1511756959.48

You can then create a `datetime` object from that timestamp using the `fromtimestamp` method of the `datetime` class. When we print this value out, we see that the year, month, day, and so forth are also printed out.

In [27]:
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow

datetime.datetime(2017, 11, 26, 22, 29, 54, 738388)

In [30]:
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second

(2017, 11, 26, 22, 29, 54)
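
The same method can also convert stored epoch values, such as the large numbers described at the start of this section. The sketch below assumes one value stored in seconds (the `tm.time()` result shown earlier, truncated to whole seconds) and a hypothetical copy of the same instant stored in milliseconds. It passes `tz=dt.timezone.utc` so the result does not depend on the machine's local time zone.

```python
import datetime as dt

seconds = 1511756959        # epoch seconds, taken from the earlier tm.time() output
millis = 1511756959480      # hypothetical: the same instant stored in milliseconds

# Seconds can be passed straight to fromtimestamp.
from_seconds = dt.datetime.fromtimestamp(seconds, tz=dt.timezone.utc)

# Milliseconds must be divided by 1000 first.
from_millis = dt.datetime.fromtimestamp(millis / 1000, tz=dt.timezone.utc)

print(from_seconds)  # 2017-11-27 04:29:19+00:00
print(from_millis)
```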

`datetime` objects allow for simple math using time deltas. For instance, here we can create a `timedelta` of 100 days, then do subtraction and comparisons with a date object. This is commonly used in data science for creating sliding windows.

In [31]:
delta = dt.timedelta(days = 100)
delta

datetime.timedelta(100)

For instance, you might want to look for any five-day span of time where sales were highest, and flag that for follow-up.

In [34]:
today = dt.date.today()
today - delta

datetime.date(2017, 8, 18)

In [35]:
today > today - delta

True
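
The five-day sliding window mentioned above could be sketched as follows. The daily sales figures and the start date are invented for illustration, and the brute-force loop is just one simple way to do it, not a recommended implementation for large data sets.

```python
import datetime as dt

# Hypothetical daily sales, one value per consecutive day (made-up numbers).
start = dt.date(2017, 11, 1)
sales = [120, 95, 130, 200, 210, 190, 220, 205, 90, 80]
daily = {start + dt.timedelta(days=i): amount for i, amount in enumerate(sales)}

window = dt.timedelta(days=5)
one_day = dt.timedelta(days=1)
last_day = max(daily)

best_start, best_total = None, float('-inf')
for day in sorted(daily):
    window_end = day + window            # exclusive end of the window
    if window_end - one_day > last_day:
        break                            # window would run past the data
    # Sum sales for dates falling inside [day, window_end).
    total = sum(amount for d, amount in daily.items() if day <= d < window_end)
    if total > best_total:
        best_start, best_total = day, total

print(best_start, best_total)
```

Comparing dates with `<=` and `<` and shifting them with a `timedelta` is all the window needs; no manual calendar arithmetic is involved.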