# Week 4: Text and time

1. Text
    - Dealing with text data
    - Cleaning dirty integer data
    - Textual statistics 
    - Trimming strings
2. Dates and times
    - What does it mean to have dates and times in programming / data?
    - Time deltas
    - Time series
    - Resampling 

In [2]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [3]:
# if I create a series of integers, the dtype will (by default) be an integer type (np.int64)

s = Series([10, 20, 30, 40, 50])
s

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [4]:
# what if, though, I have a series of strings?

s = Series('this is a bunch of words'.split())
s

0     this
1       is
2        a
3    bunch
4       of
5    words
dtype: object

The `object` dtype in Pandas means: I'm not storing the values themselves in NumPy, because it's easier to treat each one as a Python object. Instead, the back-end NumPy array holds only a "pointer," or "reference," to the memory location of each Python object.

If you see a `dtype` of `object`, the odds are pretty good that it contains strings.

Pandas is moving, slowly but surely, toward having its own string types, but we don't have to worry about that right now.
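One way to check what an `object` series actually holds is to inspect its elements directly. This is a minimal sketch: the variable names are illustrative, and `dtype=object` is passed explicitly as an assumption, because newer Pandas versions may infer a dedicated string dtype for a list of strings.

```python
import pandas as pd

# Ask for object storage explicitly, so the result does not depend on
# which string dtype this version of Pandas would infer by default.
words = pd.Series('this is a bunch of words'.split(), dtype=object)

print(words.dtype)           # object
print(type(words.iloc[0]))   # <class 'str'>

# An object series can hold any mix of Python objects, not only strings.
mixed = pd.Series([10, 'twenty', 30.5], dtype=object)
element_types = [type(item).__name__ for item in mixed]
print(element_types)         # ['int', 'str', 'float']
```

Because each element is an ordinary Python object, an `object` dtype usually signals strings but does not guarantee them.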

Let's say I want to find out how long each of these strings is. How can I do that? Python provides me with the `len` function, so can I run that on my series?

In [5]:
len(s)  # this returns the length of the series, not of the individual strings in the series

6

In [7]:
# what about a for loop?

for one_item in s:
    print(len(one_item))    # this works, but don't do this! Pandas can do the iteration for us

4
2
1
5
2
5


Pandas provides us with a special attribute, known as an "accessor," which lets us invoke string methods on every element in our series. Instead of writing a `for` loop ourselves, we let Pandas do the iteration on our behalf, inside the library, which can also be faster than a hand-written loop. The result comes back as a new series.

The accessor for string methods is called `.str`.



In [8]:
s.str    # this brings up the accessor

<pandas.core.strings.accessor.StringMethods at 0x12194fd50>
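With the accessor in hand, the length question from earlier can be answered without a loop. The sketch below rebuilds the same series so that it runs on its own; `lengths` and `shouted` are illustrative names, and `str.len` and `str.upper` are just two of the methods the accessor offers.

```python
import pandas as pd

s = pd.Series('this is a bunch of words'.split())

# .str.len() applies len to every element and returns a new series
lengths = s.str.len()
print(lengths.tolist())    # [4, 2, 1, 5, 2, 5]

# other string methods work the same way, one result per element
shouted = s.str.upper()
print(shouted.tolist())    # ['THIS', 'IS', 'A', 'BUNCH', 'OF', 'WORDS']
```

The lengths match what the `for` loop printed, but they arrive as a series with the same index as `s`, so they can be used directly for filtering or further calculations.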