# Strings

In this brief notebook we cover a few additional features of Python strings that you may find useful and conclude with an example of preparing data for text analytics.

### Library Dependencies
This notebook uses the collections and string modules, both of which are part of the Python Standard Library and need no separate installation.

## Splitting strings, joining strings and replacing substrings

There are multiple ways to split a string into characters, but the easiest by far is to just pass the string as an argument to the list function.

In [None]:
str1 = 'abcdefgh'
list1 = list(str1)
list1

The elements of a list can be combined into a string using join. Note that we apply the join method to the string that serves as the "glue" and pass the list as an argument.

In [None]:
join_str = ' + '
str2 = join_str.join(list1)
str2
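One detail worth knowing about join is that every element of the list must already be a string; passing numbers raises a TypeError. The sketch below uses a made-up list of integers (not part of the notebook's data) to show one common way to convert the items first.

```python
# Hypothetical example data: join only accepts strings, so convert first.
numbers = [1, 2, 3]

try:
    ', '.join(numbers)          # raises TypeError: items are ints, not str
    join_failed = False
except TypeError:
    join_failed = True

# Convert each item to a string before joining.
joined_numbers = ', '.join(str(n) for n in numbers)
print(joined_numbers)           # 1, 2, 3
```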

We can split a string on a specified substring using split. The separator itself is not included in the results. If no separator is given, the string is split on runs of whitespace.

In [None]:
split_str = ' + '
list2 = str2.split(split_str)
list2
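The difference between the default whitespace split and an explicit separator can be seen in a small sketch. The sample string here is invented for illustration; note how splitting on a single space keeps empty strings and does not treat tabs or newlines as separators.

```python
# Hypothetical sample with leading spaces, repeated spaces, a tab and a newline.
sample = '  the quick   brown\tfox\n'

# No argument: split on any run of whitespace and drop empty strings.
default_split = sample.split()
print(default_split)    # ['the', 'quick', 'brown', 'fox']

# Explicit ' ': split on every single space, keeping empty strings.
space_split = sample.split(' ')
print(space_split)      # ['', '', 'the', 'quick', '', '', 'brown\tfox\n']
```

For breaking text into words, the no-argument form is usually the more convenient of the two.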

The replace method returns a new string in which every occurrence of a substring has been replaced by another string.

In [None]:
str2.replace('+', 'XXX')
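Because Python strings are immutable, replace leaves the original string untouched and returns a new one. It also accepts an optional third argument that limits how many occurrences are replaced. A minimal sketch, using an invented expression string:

```python
# Hypothetical example string.
expr = 'a + b + c + d'

# Replace every occurrence.
replaced_all = expr.replace('+', 'XXX')
print(replaced_all)     # a XXX b XXX c XXX d

# Replace only the first occurrence (optional count argument).
replaced_first = expr.replace('+', 'XXX', 1)
print(replaced_first)   # a XXX b + c + d

# The original string is unchanged.
print(expr)             # a + b + c + d
```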

## String constants

The Python string module contains some very useful string constants.

In [None]:
import string

In [None]:
string.ascii_letters

In [None]:
string.ascii_lowercase

In [None]:
string.ascii_uppercase

In [None]:
string.digits
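These constants are ordinary strings, so they can be used with the in operator to test which class a character belongs to. The helper below, count_char_classes, is a hypothetical name introduced only for this illustration.

```python
import string

def count_char_classes(s):
    """Return (number of ASCII letters, number of digits) in s."""
    letters = sum(ch in string.ascii_letters for ch in s)
    digits = sum(ch in string.digits for ch in s)
    return letters, digits

# Hypothetical input: 9 letters ("Room", "Floor") and 4 digits ("101", "3").
class_counts = count_char_classes('Room 101, Floor 3')
print(class_counts)     # (9, 4)
```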

## Example - breaking text into words

While these string constants are mostly a convenience to save you from having to type out lists of letters or digits, the punctuation constant is particularly useful when processing data for text analytics.

In [None]:
string.punctuation

In [None]:
with open('moby.txt', 'rt', encoding='latin1') as file:
    text = file.read()

In [None]:
print(text)

In [None]:
for c in string.punctuation:
    text = text.replace(c, ' ')

text = text.lower()

In [None]:
print(text)
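As an alternative to calling replace once per punctuation character, the built-in str.maketrans and str.translate can do the same substitution in a single pass. This is a sketch on a short invented sample rather than the full moby.txt text, and it should give the same result as the loop above.

```python
import string

# Hypothetical short sample standing in for the full text.
sample = 'Call me Ishmael. Some years ago--never mind!'

# Map every punctuation character to a space.
table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
cleaned = sample.translate(table).lower()

# Same result using the loop-based approach, for comparison.
loop_cleaned = sample
for c in string.punctuation:
    loop_cleaned = loop_cleaned.replace(c, ' ')
loop_cleaned = loop_cleaned.lower()

words = cleaned.split()
print(words)
# ['call', 'me', 'ishmael', 'some', 'years', 'ago', 'never', 'mind']
```

Replacing punctuation with a space, rather than deleting it, keeps words such as "ago--never" from being fused together.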

In [None]:
word_list = text.split()

In [None]:
import collections
collections.Counter(word_list)
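A Counter built from a long text can be large, so it is often more useful to look only at the most frequent words. The most_common method does this; the sketch below uses a tiny invented word list instead of the real word_list.

```python
import collections

# Hypothetical word list standing in for word_list.
sample_words = 'the whale the sea the ship a whale'.split()

counts = collections.Counter(sample_words)

# The two most frequent words, as (word, count) pairs.
top_two = counts.most_common(2)
print(top_two)          # [('the', 3), ('whale', 2)]

# Words that never appear have a count of 0 rather than raising KeyError.
print(counts['harpoon'])    # 0
```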